

Does Correlation Trump Causation?

A new book called Big Data: A Revolution That Will Transform How We Live, Work and Think, written by Viktor Mayer-Schönberger of Oxford and Kenneth Cukier of The Economist, raises some intriguing and provocative issues for data publishers. Among them is this one:

 “…society will need to shed some of its obsession for causality in exchange for simple correlation: not knowing why but only what.”

As I understand it, the underlying argument is that Big Data is both incredibly powerful and uncannily accurate because it can analyze, and extract insight from, millions or even billions of data points. The massive sample sizes do much of the work.

But are all Big Data insights created equal?

Without a doubt, some Big Data insights are both useful and low-risk. If, for example, Big Data analysis determined that the cheapest time to buy an airline ticket is 11 days before departure, I would have useful information and not a care in the world about causation. Ironically, in this example, Big Data would be used to outsmart the Big Data analytics that airlines use to maximize revenue through variable pricing.

But relying solely on correlation often creates situations where heavy-handed or even ridiculous steps would be needed to act on Big Data insights. Consider a vexing issue such as alcoholism. What if Big Data analytics revealed that left-handed males who play tennis and drive red cars have an unusually high propensity to become alcoholics? Correlation identifies the problem, but it doesn't offer much of a solution. Do we ban alcohol for this entire group? Do we tell left-handed males that they can play tennis or drive a red car, but not both? Does breaking the correlated pattern actually prevent the correlated outcome? Things get strange and confusing very quickly when you rely entirely on correlation.

Am I calling into question the value of Big Data analytics? Not at all. The ability to powerfully analyze massive data sets will be beneficial to all of us, in many different ways. But to suggest that Big Data correlations can largely supplant causation research plays into the Big Data hype by suggesting it is a pat, “plug and play” solution to all problems. Big Data can very usefully shape and define causal research, but there are numerous situations where it can’t simply replace it.

The lesson here is that while you should embrace Big Data and its big potential, you should also remain objective and ask tough questions to separate Big Data from Big Hype, because lately the two have been tightly correlated.



Walking Around Money

A young company called Placed is deep into Big Data analytics, but with a twist: it marries customer data with its own proprietary data to yield insights into customer behavior. Essentially, Placed wants to give its clients context about how their customers use the clients' mobile apps. When do they use the app, and where do they use it?

The "where" part of the analysis is what's interesting. Placed could simply report back to its clients that their customers are in certain ZIP codes or fit other dry demographic categories. Like so many analytics reports, that would be interesting but not particularly useful.

Instead, Placed matches customer locations against its own proprietary database of places: named stores, major buildings and points of interest. By connecting the two, Placed can tell its clients where their apps are actually being used. For example, if a client's customers use its mobile app inside a competitor's store, that suggests they are comparing prices. Knowing that its customers frequent Starbucks and nightclubs might shape a client's marketing strategy or advertising campaign design. Knowing that the app is used most often while someone is walking (yes, Placed can tell you that) can matter for user interface design. You get the idea.

And therein lies an important insight. Countless companies offer Big Data analytics capabilities, but almost all of them expect their customers to bring both the problem and the data. That's a sure recipe for commoditization, and as analytics software evolves, it's also certain that the companies with the biggest analytics needs will decide to do the work themselves.

The solution? Big Data analytics players should bring proprietary data to the party. Placed is a perfect case study. It differentiates itself by providing answers others can't, adding value to its analytics by integrating proprietary and licensed data with customer data and its own optimized analytical tools. As I discussed in my presentation at DataContent 2012, there are lots of ways publishers can profit from the Big Data revolution, even if they don't have big data themselves.

In a market where companies like Placed can make money by tracking people walking around, it behooves data publishers to walk around to some of these Big Data analytics players and propose data partnerships that will help them stand out from the crowd.