Not All Datasets Are Good Datasets


As a long-time proponent of data, I find it intriguing to see the number of new start-ups whose revenue models are based partially – sometimes entirely – on the sale of data, even though they are not data publishers in the conventional sense. Rather, they are seeking to monetize data they collect incidentally in the course of other activities.

A fashion website or app, for example, might realize that by tracking which new fashions its users viewed most, it was collecting valuable intelligence that could be sold to fashion manufacturers. The early players in this area usually did, in fact, have valuable and readily saleable data collections, and they had indeed identified an important new revenue stream.

But now “data” is turning into a buzz-term, up there with “the cloud” and “social.” Purported data opportunities are being used to mask weak business models, because everyone these days knows “it’s all about the data.” Just as start-ups feel compelled to be in the cloud and to have a strong social component, they now feel they need a data opportunity as well.

Not every new business can create value from the incidental data it generates. Those that do are the exception, not the rule. Here are a few reasons why these data opportunities may not be as strong as the entrepreneurs behind them would like to believe:

1. You generate too little data. While everyone talks about quality data, there is a quantity aspect as well. Even for things as valuable as sales leads, most companies will turn up their noses if you can’t deliver a certain volume of leads regularly and dependably. Depending on the data itch you’re trying to scratch, 100,000 or even a million users may not cut it.

2. You generate too much data. Having the most data about something can be as much a burden as an opportunity. Think Twitter. Everyone “knows” that the huge collective stream of consciousness its users generate is enormously valuable, but extracting that value is complex and expensive, and much of the final output still amounts to conjecture and surmise.

3. You don’t really know much about the data you’ve got. I’ve been in numerous meetings where the issue on the table was, “we’ve got tons of data, but we’re not sure how to monetize it.” This situation naturally calls for advanced TAPITS (There’s A Pony In There Somewhere) analysis to assess value. More often than not, the chosen solution is simply to sell the raw data and hope the buyer can find the value. Of course, when you sell data by the ton, you have to price it by the ton too. It’s just not that valuable if the buyer has to do all the thinking and all the work.

4. A sample of none. Online businesses want lots of traffic and lots of users: the more the merrier. This is good for business generally, but not necessarily great from a data perspective. If your user base is too disparate, the aggregate insights from the data it generates may not be all that valuable. And if your user base is largely anonymous, good luck with that.

5. Buy me a drink first. Often, an online company is in possession of extremely detailed and valuable data. Unfortunately, this typically means the data can only be had by violating the trust, if not the privacy, of its users. It’s even more complicated if the company built its business on a strong privacy policy that prohibits it from ever selling all this valuable information.

6. Exclusive insights. These days, if you say you have “near-real-time insight into bus station storage locker utilization rates,” it will automatically be assumed that you’ve tapped a huge data opportunity. Every bus station certainly needs this information, bus lines probably have a use for it, there’s probably a government market, some hedge funds will want it, and there might even be a consumer opportunity as well – think of an app that shows you available storage lockers nationwide! But in reality, not every market is a viable data market. The market might be too small, too marginally profitable, too localized or too consolidated. It is entirely possible to have data that nobody cares about, or that too few people care about to create a meaningful revenue stream.

7. Competition. Your data may indeed be valuable, but chances are you don’t have the full picture. That makes your data less valuable than that of a company that can supply the full picture, and it means the market for your data may be limited to the one company that knows more about the market than you do. Yes, there’s revenue to be had in this case, but you won’t get rich.

8. Raw data follies. Typically, companies trying to sell the data they collect incidentally want to sell it, get the money, and get back to their core business. But if you don’t clean and organize your data, you’re leaving lots of money on the table. And if you decide to get serious about your data, you’re moving into a different business, one you probably don’t understand very well.

I could keep going, but hopefully you get the point: the chances that the data you generate incidentally from some other business activity are valuable are pretty low. And even if you do have valuable data, getting maximum value from it generally demands getting a lot more serious about your data, which starts to move you into a totally different business.