Mining Big Data

Looking for a business advantage? Take a cue from Moneyball, which dramatized how sports teams could win if they played by the numbers instead of gut instinct. Regardless of the size of your business, it’s time to get on top of the relentlessly growing and invaluable information stream generated by nearly every sector of society. Whatever software you’re using to process data today is almost certainly inadequate to meet the challenge of a world that’s starting to think in zettabytes (that’s 1 billion terabytes, with each terabyte being 1 trillion bytes). The challenge is not just to store all that information, but to understand the opportunities it offers and effectively analyze it ahead of the competition.

Big Data, as it’s come to be called, refers to large data sets that come from just about everywhere—including online sales records, shipping information, climate information, satellite photos and remote surveillance video, computer-generated stock market trades, arrest records, posts to social media sites, flight information, cellphone GPS signals… and much more.

Police departments routinely sift through huge volumes of such information to predict and plan for crime trends. They may look, for instance, at weather, traffic patterns, sporting event schedules, holidays and dates of paydays to pinpoint crime hot spots where targets of opportunity (like distracted people flush with cash) intersect with would-be bad guys.

Savvy retailers can evaluate sales performance of products, pricing trends and demographics to better understand their customers’ rapidly shifting needs.

Lawyers could study individual judges’ decisions to gain insights into strategies to use in their courtrooms—in far less time than it would take them in the analog law library.

Airlines can know before a plane lands that a passenger’s baggage didn’t make the flight, then alert the passenger about the bag’s whereabouts and when he will get it, before the passenger’s blood begins to boil as he waits next to an empty carousel.

And athletic team managers can analyze data and stats to identify undervalued players, as the Oakland A’s baseball team did in the Moneyball story chronicled in the 2003 book by Michael Lewis and last year’s movie starring Brad Pitt.

If you’ve never heard of Big Data or its importance, it’s no wonder. Consider that 90 percent of the world’s data was created in the last two years, IBM tells us, with more than 2.5 quintillion bytes of data being created daily.

Just a year ago, jobs that involve crunching Big Data barely existed, but now the United States faces a shortage of up to 190,000 workers with analytical expertise, as well as 1.5 million managers and analysts to understand and make decisions based on that analysis, according to the McKinsey Global Institute, the research arm of the international management consultant McKinsey & Co.

The market for Big Data technology and services will grow from $3.2 billion in 2010 to $16.9 billion in 2015, according to a 2012 report from the forecasting company International Data Corp. The growth is even higher in certain sectors such as storage, estimated by IDC to be 61.4 percent over the next five years. And specialized data handlers will pioneer new markets; companies that provide clinical medical information, for example, could see a market worth more than $10 billion by 2020, McKinsey says.
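
For a sense of what that forecast implies year over year, here is a minimal sketch in Python that turns the two IDC endpoints quoted above into a compound annual growth rate. The function name is ours, and the calculation assumes steady growth across the five years, which the forecast itself does not promise.

```python
def compound_annual_growth_rate(start_value, end_value, years):
    """Return the constant yearly growth rate that turns start_value into end_value."""
    return (end_value / start_value) ** (1 / years) - 1


# Endpoints quoted in the article, in billions of dollars.
rate = compound_annual_growth_rate(3.2, 16.9, 2015 - 2010)
print(f"Implied annual growth: {rate:.1%}")  # Implied annual growth: 39.5%
```

In other words, the market would have to grow by roughly 40 percent every year to hit the 2015 figure.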

How best to tap this veritable gold mine is a question being addressed by tech companies, entrepreneurs, academics and even the Obama administration. Many companies already are doing it. Ever heard of Apache Hadoop? It’s a free, open-source suite of software programs that allows precisely tailored processing of large data sets. (It was named for a toy elephant that belonged to the creator’s son.)

The skill set necessary to effectively use Hadoop needs to be in the wheelhouse of large corporations (which may want to develop teams in-house) as well as small businesses (which are more likely to farm it out to consultants). Facebook processes billions of communications through Hadoop every day. Yahoo is a big user, too, calling it “the open source technology at the epicenter of Big Data and cloud computing.” Last year, Yahoo spun off a company called Hortonworks to further develop Hadoop, and its CEO, Eric Baldeschwieler, predicts that by 2016 half the world’s data could be trusted to Hortonworks’ care. The client list is long, including Apple, LinkedIn, Microsoft, Netflix and StumbleUpon.

Data-Driven Sales

Mollie Lombardi, research director for human capital management at the Aberdeen Group, sees rich opportunities for Big Data in the sales arena, and she uses an extremely basic personal example. “I checked into a Westin/Starwood hotel,” she says, “and the clerk said to me, ‘Welcome back; I see you were with us before—would you like to stay in the same room?’ ”

By having this information at his fingertips, the clerk was able to make a personal connection. “They had the technology to bring up that prompt to the person at the desk,” Lombardi says. “In the same way, data gathering can tell a marketing firm that I’m not going to make a purchase with a 15 percent discount—but I have a record of responding to 30 percent offers.”

Sales forces should be power users of Big Data. Assume a business manager is on the phone talking to a regular customer who says that, for $1 off per piece, he will order another 500 units. With a Big Data front end, the manager can take five or six seconds to access the customer’s history over 20 business cycles. Did the customer actually make good on his volume promises? If not, the manager is in a good position to either deny the discount or offer it conditionally on the purchase of 1,000, not 500, units.
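
The check the manager runs in those five or six seconds might look something like the sketch below. Everything in it is hypothetical: the `Cycle` record, the 80 percent threshold and the sample history are stand-ins for whatever a real sales system would hold.

```python
from dataclasses import dataclass


@dataclass
class Cycle:
    promised_units: int   # volume the customer said he would order
    delivered_units: int  # volume the customer actually ordered


def fulfillment_rate(history):
    """Share of business cycles in which the customer met or beat his promise."""
    if not history:
        return 0.0
    kept = sum(1 for c in history if c.delivered_units >= c.promised_units)
    return kept / len(history)


def discount_decision(history, threshold=0.8):
    """Grant the discount outright only to customers who usually keep their word."""
    if fulfillment_rate(history) >= threshold:
        return "grant"
    return "grant only on a larger order"


# Hypothetical 20-cycle history: the customer fell short in 8 of 20 cycles.
history = [Cycle(500, 500)] * 12 + [Cycle(500, 300)] * 8
print(fulfillment_rate(history))   # 0.6
print(discount_decision(history))  # grant only on a larger order
```

A real front end would pull the history from order records rather than a hand-built list, but the decision rule can stay this simple.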

The opportunity is there to put rich customer data in front of salespeople—and it can go well beyond a list of client kids’ birthdays to include detailed analysis of buying patterns put together from many sources in real time.

Within companies, Big Data analysis will let firms study their top-performing salespeople and gain insights into what makes them good. “We could look at graduates of X, Y and Z college and see how they’ve performed,” Lombardi says, “or study the results with people hired from competitor A versus competitor B. With information gained from sources like that, you can create a competency profile and use it to replicate the best sales hires.”

Exciting stuff, right? Not so fast. One of the problems with Big Data is that much of it is useless; according to the B2B Sales Intelligence Blog, only 0.01 percent of the massive amounts of data spewing forth from social networks, blogs and product reviews is helpful in discovering a buyer’s intent. Again, the key is processing and interpreting the data and gaining insight from it.

Solutions for Health Care

Medicine is another big data generator, and Big Data helps in analyzing it effectively, with results that in some cases can save lives. The data science team at California-based enterprise software company Cloudera used Apache Hadoop to analyze adverse drug events that can occur when two or more prescriptions are combined. Four percent of Americans over 55 are at risk from drug interactions. The problem in analyzing the 1 million reports received annually by the Food and Drug Administration, Cloudera quickly found, is a computational explosion: there are more than 3 trillion potential combinations of triple drug interactions.

But getting answers from such huge data sets is no longer beyond our technical reach. Cloudera’s deep dive into the medical data revealed tens of thousands of adverse reactions in patients taking combinations of three drugs, all meriting further investigation. For instance, a seizure medication used in conjunction with a certain pain reliever was found to correlate with memory impairment.
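
To see why three-drug combinations explode so quickly, consider this small Python sketch. It counts the unordered triples that can be drawn from a catalog of drugs. The catalog sizes are illustrative assumptions, not figures from Cloudera or the FDA.

```python
from math import comb


def triple_count(n_drugs):
    """Number of distinct unordered three-drug combinations among n_drugs."""
    return comb(n_drugs, 3)


# Hypothetical catalog sizes, chosen only to show the growth.
for n in (100, 1_000, 10_000, 30_000):
    print(f"{n:>6} drugs -> {triple_count(n):,} possible triples")
```

Under those assumptions, a catalog in the tens of thousands of drug names is enough to push the count past the 3 trillion mark cited above, because the number of triples grows roughly with the cube of the catalog size.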

Drug interactions are only one medical application among many. Salient Management Co. uses Big Data to help New York State control Medicaid spending. Over five years, the state’s computerized payment system processed almost 2 billion medical transactions involving more than 200,000 providers and 9 million recipients.

Weeding out fraud is difficult and made harder because illegal schemes involve a large number of records. The Medicaid system generates 2 terabytes of data annually, says Salient CEO Guy Amisano. But Salient’s technology can sort through all that data quickly, looking for odd patterns and trends that may be red flags for fraud, like sudden hikes in billing from a particular location or concentrated cases of the same procedure.
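
A "sudden hike in billing from a particular location" can be spotted with very little machinery. The sketch below is a simple stand-in, not Salient's technology: it flags any location whose latest monthly total sits well above that location's own history. The clinic names, figures and threshold are all made up for illustration.

```python
from statistics import mean, stdev


def flag_billing_spikes(monthly_billing, z_threshold=3.0):
    """Return locations whose latest month sits far above their own history.

    monthly_billing maps a location name to a list of monthly totals,
    oldest first. The last entry is compared with the earlier ones.
    """
    flagged = []
    for location, totals in monthly_billing.items():
        history, latest = totals[:-1], totals[-1]
        if len(history) < 2:
            continue  # not enough history to estimate the usual spread
        baseline = mean(history)
        spread = stdev(history)
        # Flag when the latest total exceeds the usual level by more than
        # z_threshold standard deviations.
        if latest > baseline + z_threshold * spread:
            flagged.append(location)
    return flagged


# Hypothetical monthly billing totals, in thousands of dollars.
billing = {
    "Clinic A": [100, 104, 98, 101, 99, 102],
    "Clinic B": [100, 103, 97, 102, 98, 180],
}
print(flag_billing_spikes(billing))  # ['Clinic B']
```

A flag like this is only a reason to look closer. A new wing or a flu season can raise billing just as surely as fraud can.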

The Human Factor

Big Data also offers a major opportunity for human resource professionals. Brenda Kowske, Ph.D., a senior analyst at Bersin & Associates, says the use of data-based analytics for HR is still in its infancy. “We use data in marketing to figure out what consumers will buy and in finance for risk management,” she says. “In human resources, we can use it to predict how employees will perform on the job and how to get them engaged and motivated.”

Confidentiality laws present one big hurdle to accessing human-resource Big Data. Companies face limits on how long they can store data pertaining to individuals, and sharing human resource data across different companies is difficult.

But within the legal confines, there is much that can be done. Specifically, HR managers can study data from past employees, including patterns in their behavior on the job, which will lead to identifying personality attributes that are helpful if people are to perform at the level necessary for the position. “It requires managers to think like researchers rather than HR people,” Kowske says. “Firms need to not only collect data, but store it in forms that are minable. It would, in fact, be useful to have smart tools that could crawl across different HR systems, because the data are not likely to be all in the same place.”

A cottage industry is growing to help HR departments get up to speed in working with Big Data. One such company is Spring International, whose CEO, Robert Berrier, has a background in presidential polling. Politicians divide voters into segments that can then be specifically targeted with campaign advertising, says Matt Fumento, a vice president of strategy and development at Spring. In HR, he says, firms are trying to better understand their own workforces (and potential hires) and maximize their levels of engagement on the job. Spring assesses employee engagement by surveying employees and studying that data, along with information such as employee absenteeism and sick time. Spring also looks into factors such as customer satisfaction, revenue generation and profitability.

Engaged professionals definitely add to the bottom line. For an airline client, Spring correlated pilots’ levels of engagement with the amount of time they were spending on the tarmac before taking off and found that delays in getting airborne were costing the company $100 million. For retail clients such as Lowe’s, it helps identify the impact of engagement on revenue generated per square foot of store space. Lowe’s was able to confirm the link between engaged employees, customer satisfaction and revenue generation.

At its simplest level, a Lowe’s customer looking for a gallon of paint would get that and nothing else from a disengaged employee. But if the employee is listening, he or she will take an interest in the project—and the customer could end up with spackling paste, sandpaper, brushes and rollers in addition to the paint. Lowe’s found that the difference between its highest- and lowest-engaged stores was more than $1 million in sales annually.
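
Linking engagement to revenue starts with a plain correlation. The Python sketch below computes a Pearson coefficient for a handful of stores. The numbers are invented for illustration and are not Lowe's or Spring's data, and a strong coefficient on its own shows that two measures move together, not that one causes the other.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length, at least 2 items")
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    # Note: this divides by zero if either sequence has no variation at all.
    return cov / sqrt(var_x * var_y)


# Hypothetical per-store figures.
engagement = [3.1, 3.4, 3.8, 4.0, 4.3, 4.6]      # survey score on a 1-5 scale
sales_per_sqft = [210, 225, 240, 238, 262, 275]  # dollars per year
r = pearson(engagement, sales_per_sqft)
print(f"r = {r:.2f}")
```

A value near 1 means the stores with more engaged employees also tend to be the ones with higher sales per square foot.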

Looking for actual revenue results is important, because according to the book Strategy Maps: Converting Intangible Assets into Tangible Outcomes, 70 to 90 percent of companies fail in their business strategies. And one reason for that is that HR—with potentially very valuable information on how to increase employee performance—lacks a seat at the table when important strategic decisions are made. In one 2011 survey, Engagement Maturity Practices, only four of 200 firms studied had the ability to equate employee engagement with business outcomes.

Fumento says access to Big Data—and to information generated in real time throughout an employee’s work life, not just in annual or quarterly reviews—will make it clear that HR provides a return on investment to the company. “The workforce intelligence model has the potential of revolutionizing the HR function,” he says.

Into the Cloud

Data isn’t just growing, it’s also migrating online, which poses additional challenges as well as opportunities. While cloud computing constitutes less than 2 percent of IT spending today, says a Digital Universe Study, by 2015, almost 20 percent of information will be processed by the cloud and 10 percent will be stored there. More virtual servers used for cloud computing were purchased in 2010 than physical servers, IDC says.

Managing Big Data is a challenge as the cloud takes over because information stored away from the office on remote servers has to be integrated with bytes stored on company hard drives. Company officials will want to ensure that their cloud data is secure and off-limits to third parties, and that it’s backed up regularly and archived properly. But housing Big Data on the cloud has many advantages. Writing for ZDNet, Phil Wainewright uses the iPhone 4S’s digital voice assistant, Siri, to illustrate that point. Previous generations of voice recognition had to be trained in the user’s voice over time. Siri dispenses with that: it matches the user to the nearest voice pattern in an ever-expanding library of tens of thousands made possible by its home on the cloud. For most companies, small and large, cloud storage will make sense because there’s no space limit that matters, and because the data is as accessible from remote locations as it is when stored in-house.

It’s not just the ability to analyze big data pools. “What really matters,” Wainewright says, “is the broad base of that data, gathered from a large mix of users within which patterns of behavior can be analyzed and then applied elsewhere. Think of it as swarm data—lots of individual, autonomous behavior that collectively add up to reusable patterns.”

Another advantage to storing Big Data on the cloud is the savings it offers in energy costs, according to 62 percent of IT managers surveyed in the 2012 Energy Efficient IT Report by CDW, a technology and services provider. Energy usage is no trivial matter: consider the case of Google, said to run as many as 900,000 servers requiring 220 megawatts of power generation, which is nearly 1 percent of global data center energy use and 0.01 percent of the world’s total energy demand. According to the CDW survey, the virtual solution reduced energy demand an average of 28 percent among respondents.

Ideally, a company’s cloud solutions would combine huge data storage with the ability to analyze all that information, a one-stop shop. Just such a solution was announced by Global Computer Enterprises in April as SMART Cloud for Big Data and Analytics. It was developed with open-source tools such as the aforementioned Apache Hadoop. Government agencies are major targeted users.

The Obama administration is, not coincidentally, taking note of the possibilities of Big Data. In March, it announced the Big Data Research and Development Initiative, a $200 million package of commitments in six agencies, including the departments of Energy, Defense and Homeland Security, designed to “greatly improve the tools and techniques needed to access, organize and glean discoveries from huge volumes of digital data,” says Tom Kalil, deputy director for policy of the Office of Science and Technology Policy.

Just as a government system called ARPANET was a precursor to today’s Internet, similar opportunities exist now with Big Data, says John Holdren, a science advisor to Obama. “In the same way that past federal investments in information technology [research and development] led to dramatic advances in supercomputing and the creation of the Internet, the initiative we are launching today promises to transform our ability to use Big Data,” he says.

Part of the federal plan is to provide $10 million, through the National Science Foundation, to fund research at the University of California at Berkeley on cloud computing, crowdsourcing (using modern technology to gather information and images from the public) and techniques to help computers “learn” from experience. That’s exactly the kind of cutting-edge project we need as Big Data matures, especially if the U.S. is to maintain a technological lead. We’re at an exciting crossroads, and both Big Data and the study of it are in their infancy. We will definitely see data accumulation grow exponentially in the near future. The question is how wisely we will access it.

Big Data in the Real World

Practical uses for Big Data aren’t merely theoretical—they’re here and now. Here are five ways innovative people and companies are making huge information streams work for them:

  • Desktop Warriors. Crunching a huge amount of publicly available Wikileaks information about the war in Afghanistan, New York University Ph.D. student Drew Conway was able to draw some conclusions about peak periods and locations of conflict, according to a report by Gigaom. Conway, who runs the Zero Intelligence Agents blog, organized the Big Data dump by geography and by the nature of encounters (hostile or friendly) between U.S. troops and Afghans. The conclusions lent credence to the idea that conflict with the Taliban tends to peak during certain seasons and is concentrated around the Ring Road, the highway loop that links Kabul with the country’s other major cities.
  • Sales Targets. British supermarket chain Tesco has experienced a 12 percent uptick in sales during early trials using data analysis to determine which top-selling items to discount and when. Tesco’s recently acquired subsidiary Dunnhumby, a shopping information company, tracked sales data from 16 million families, who make approximately 6 million transactions a day using Tesco Clubcards to accrue reward points. The company also profits from the sale of its shopping preference data to other businesses. The program is not without controversy, however, because some critics say shoppers aren’t told their information is being used for Tesco’s profit. The company says it’s only identifying trends, not offering a peek into its customers’ lives.
  • Who’s Driving Our Kids? Not all uses for Big Data are highly complex or technical. In Iowa, Gov. Terry Branstad signed into law a new mandate that school bus drivers will be subject to background checks. To pass muster, an applicant has to survive a search of public records—including the sex offender registry, the central registry for child abuse, files on dependent adult abuse and driving infractions, if any. These records aren’t sequestered for official use, as they once were, but available online through the Iowa Courts Online Search. The procedure has to be followed every five years, when the driver renews his license. The record shows that data cross-checks can be valuable in keeping kids out of harm’s way. An Oregon school bus driver was arrested in 2010 after a forensic computer investigation found eight child pornography videos on a social networking site that had been uploaded with his email address and password. He received a seven-year sentence and, needless to say, won’t be driving any more kids to school.
  • Charged by the Volt. General Motors was the first auto manufacturer to deliver a full range of services, from finding your lost car in a parking lot to emergency response and driving directions, through the wireless connectivity of its OnStar service. Through OnStar, GM now juggles an amazing three petabytes of data annually (one petabyte being equal to 1 quadrillion bytes). OnStar Chief Information Officer Jeffrey Liedel admits that GM hasn’t fully figured out how to make its data flow work for its customers and for the company’s bottom line. But it knows that OnStar will be of major benefit to its future electric car buyers, and is testing an app that will let drivers remotely check their battery charge and start or stop a charging session from the comfort of their living room lounge chair.
  • Predicting Global Crises. The United Nations’ Global Pulse initiative utilizes digital data such as social media chatter, mobile phone calls and online transactions to predict and better understand economic crises, health epidemics and natural disasters. Researchers with Pulse and the analytics software specialist SAS analyzed more than 500,000 blogs, online forums and news sites in Ireland and the U.S. to determine that social media chatter (particularly about “cutting back,” “public transportation usage” and “downgrading the car”) could predict spikes in unemployment that occurred three to five months later. Global Pulse researchers have also used digital data such as mobile phone usage to monitor the movement of people following Haiti’s 2010 earthquake as well as the spread of a subsequent cholera outbreak there.

Big Data is like an iceberg, with only a tiny bit of its practical uses visible to us. What’s exciting is what we’ll be able to do as the rest of the iceberg becomes visible. And of course, with privacy issues more at stake than ever, one must wonder: Will the discovery of this iceberg save the global economy, sink our humanity, or both?
