InfoCommerce Group Blog

Not All Datasets Are Good Datasets

As a long-time proponent of data, I find it intriguing to see the number of new start-ups that have revenue models based partially – sometimes entirely – on the sale of data, even though they are not data publishers in the conventional sense. Rather, they are seeking to monetize data they collect incidentally in the course of other activities.

A fashion website or app, for example, might realize that by tracking which new fashions its users view the most, it is collecting valuable intelligence that could be sold to fashion manufacturers. The early players in this area usually did have valuable and readily saleable data collections, and they had indeed identified an important new revenue stream.

But now “data” is transforming into a buzz-term, up there with “the cloud” and “social.” Purported data opportunities are being used to mask weak business models because everyone these days knows “it’s all about the data.” Just as start-ups these days feel compelled to be in the cloud and have a strong social component, so too do they now need a data opportunity.

Not every new business can create value from the incidental data it generates. Those that do represent the exception, not the rule. Here are a few reasons why these data opportunities may not be as strong as the entrepreneurs behind them would like to believe:

1. You generate too little data. While everyone talks about quality data, there is still a quantity aspect as well. Even for things as valuable as sales leads, most companies will turn up their noses at them if you can’t deliver a certain volume of leads regularly and dependably. Depending on the data itch you’re trying to scratch, 100,000 or even a million users may not cut it.

2. You generate too much data. Having the most data about something can be as much a burden as an opportunity. Think Twitter. Everyone “knows” that the huge collective stream of consciousness that its users generate is enormously valuable, but extracting that value is very complex and expensive, and much of the final output still represents conjecture and surmise.

3. You don’t really know much about the data you’ve got. I’ve been in numerous meetings where the issue on the table was, “we’ve got tons of data, but we’re not sure how to monetize it.” This situation naturally calls for advanced TAPITS (There’s A Pony in There Somewhere) analysis to assess value. More often than not, the chosen solution is simply to sell the raw data and hope that the buyer can find value. Of course, when you sell data by the ton, you have to charge for it by the ton too. It’s just not that valuable if the buyer needs to do all the thinking and all the work.

4. A sample of none. Online businesses want lots of traffic and lots of users, the more the merrier. This is good for business generally, but not necessarily great from a data perspective. If your user base is too disparate, the aggregate insights from the data they generate may not be all that valuable. And if your user base is largely anonymous, good luck with that.

5. Buy me a drink first. Many times, an online company is in possession of extremely detailed and valuable data. Unfortunately, this typically means that these data can only be had by violating the trust, if not the privacy, of the user. It’s even more complicated if the company built its business with a strong privacy policy that prohibits it from ever selling all this valuable information.

6. Exclusive insights. These days, if you say you have “near-real-time insight into bus station storage locker utilization rates,” it will automatically be assumed that you've tapped a huge data opportunity. Every bus station certainly needs this information, bus lines probably have a use for it, there’s probably a government market, some hedge funds will want it and there might even be a consumer opportunity as well – think of an app that shows you available storage lockers nationwide! But in reality, every market is not a viable data market. The market might be too small, marginally profitable, too localized or too consolidated. It is absolutely possible to have data that nobody cares about, or that too few people care about to create a meaningful revenue stream.

7. Competition. Your data may indeed be valuable, but chances are, you don’t have the full picture. This means your data is less valuable than that of a company that can supply the full picture. That means the market for your data may be the one company that knows more about the market than you do. Yes, there’s revenue to be had in this case, but you won’t get rich.

8. Raw data follies. Typically, companies trying to sell the data they collect incidentally want to sell the data, get the money, and get back to their core business activities. But if you don’t clean and organize your data, you’re leaving lots of money on the table. And if you decide to get serious about your data, you’re moving into a different business, one you probably don’t understand very well.

I could keep going, but hopefully you get the point: the chances that the incidental data you generate from some other business activity are valuable are pretty low. And even if you have valuable data, getting maximum value from it generally demands getting a lot more serious about your data, which starts to move you into a totally different business.





Data Insights from Bitsight

A Boston-area start-up called Bitsight is pulling in investor money so quickly ($95 million in total) that it doesn’t know what to do with it all … yet.

And what does Bitsight do, to justify this level of investment? It examines company websites, evaluates them for the quality of their website security, and assigns them a rating, much like a credit score.

How do they do it? There’s a bit of proprietary secret sauce in how the company evaluates the security of a website, but what’s particularly interesting is that they do it all with publicly available information. And that raises another fascinating aspect of the business: the companies that Bitsight rates are not its clients. Bitsight is not an online security consultant with an automated assessment tool. Indeed, it has evaluated over 60,000 websites to date, and ultimately may evaluate tens or even hundreds of thousands of websites.

Why would anyone want this information? The uses for this data are surprisingly numerous. You can sell it in the form of a benchmark product to the companies you have rated. What IT manager wouldn’t want to know how their company stacks up against its peers? A better opportunity is to help insurance companies properly price data breach insurance policies.

But perhaps the best opportunity is to help big companies evaluate and manage risk with their vendors – a huge issue, as a number of recent headline-grabbing data breaches resulted from a company’s network being penetrated via one of its connected vendors.

While Bitsight may look like a cutting-edge analytics company, what’s significant is that so much of its business model is drawn from very basic approaches used by many other data publishers. It aggregates publicly available data into a database. It normalizes this information, then applies an algorithm to assess it and produce comparable company ratings. It sells this data product for internal benchmarking, risk management and due diligence applications.
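The aggregate-normalize-score pattern described above can be sketched in a few lines. This is a purely hypothetical illustration – the signal names, weights, and scoring scale are invented for the example, and Bitsight's actual methodology is proprietary – but it shows how normalized public signals can be combined into a credit-score-style rating:

```python
def security_rating(signals, weights, scale=(250, 900)):
    """Combine normalized 0..1 signals into a single comparable rating.

    signals: dict of signal name -> normalized value in [0, 1]
    weights: dict of signal name -> relative importance
    scale:   output range, styled after a credit score
    """
    # Weighted average of the normalized signals.
    score = sum(weights[name] * signals[name] for name in weights)
    total_weight = sum(weights.values())
    lo, hi = scale
    # Map the 0..1 weighted average onto the rating scale.
    return round(lo + (score / total_weight) * (hi - lo))


# Example with three invented signals, already normalized to 0..1.
signals = {"tls_config": 0.9, "patch_cadence": 0.6, "botnet_traffic": 0.8}
weights = {"tls_config": 2.0, "patch_cadence": 3.0, "botnet_traffic": 5.0}
print(security_rating(signals, weights))  # prints 744
```

The value of a product like this lies less in the arithmetic than in the consistent methodology applied across tens of thousands of companies, which is what makes the ratings comparable.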

In short, despite its high tech trimmings, Bitsight very much has data publishing DNA. It is also a great example that data products don’t have to be perfect right out of the gate. By relying on public information, Bitsight can’t possibly know everything about the security of a company’s website. But by relying just on public data, it can quickly build a large database of comparable company ratings using a credible methodology and solve market needs that require a certain scale of coverage. If you’re the first data provider serving a serious market need, you can launch with good-enough data and improve it over time. Trying to perfect your data prior to launch can mean missing the opportunity entirely.

Do You Rate?

An article in the New York Times today discusses the growing proliferation of college rankings as focus shifts to trying to evaluate colleges based on their economic value.

Traditionally, rankings of colleges have tended to focus on their selectivity/exclusivity, but now the focus has shifted to what are politely called “outcomes,” in particular, how many graduates of a particular college get jobs in their chosen fields, and how well they are paid. Interestingly, many of the existing college rankings, such as the well-known one produced by U.S. News, have been slow to adapt to this new area of interest, creating opportunities for new entrants. For example, PayScale (an InfoCommerce Model of Excellence winner) has produced earnings-driven college rankings since 2008. Much more recently, both the Economist and the Wall Street Journal have entered the fray with outcomes-driven college rankings. And let’s not forget still another college ranking system, this one from the U.S. Department of Education.

At first blush, the tendency is to say, “enough is enough.” Indeed, one professor quoted in the Times article somewhat humorously noted that there are so many college rankings that, “We’ll soon be ranking the rankings.”

However, there is almost always room for another useful ranking. The key is utility. Every ranking system is inherently an alchemic blend of input data and weightings. What data are used and how they are evaluated depend on what the ratings service thinks is important. For some, it is exclusivity. For others it is value. There are even the well-known (though somewhat tongue in cheek) rankings of top college party schools.

And since concepts like “quality” and “value” are in the eye of the beholder with results often a function of available data, two rating systems can produce wildly varying results. That’s why when multiple rating systems exist, most experts suggest considering several of them to get the most rounded picture and most informative result.
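A toy example makes this concrete. Here, two hypothetical rating systems score the same two colleges on the same two input measures (all names and numbers are invented for illustration), and simply shifting the weights reverses the ranking:

```python
# Two invented colleges, each scored 0..1 on two input measures.
colleges = {
    "Alpha U": {"selectivity": 0.95, "median_salary": 0.60},
    "Beta C":  {"selectivity": 0.50, "median_salary": 0.90},
}


def rank(colleges, weights):
    """Order colleges by a weighted sum of their input measures."""
    def score(name):
        return sum(weights[k] * colleges[name][k] for k in weights)
    return sorted(colleges, key=score, reverse=True)


# A selectivity-focused system vs. an outcomes-focused system.
exclusivity_weights = {"selectivity": 0.8, "median_salary": 0.2}
outcomes_weights = {"selectivity": 0.2, "median_salary": 0.8}

print(rank(colleges, exclusivity_weights))  # ['Alpha U', 'Beta C']
print(rank(colleges, outcomes_weights))     # ['Beta C', 'Alpha U']
```

Same data, same math, opposite conclusions – which is exactly why a new entrant with a different but defensible weighting can coexist with established rankings.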

It’s this lack of a single right way to create a perfect ranking that means that in almost every market, multiple competing rating systems can exist and thrive. Having a strong brand that can credential your results always helps, but in many cases, you can be competitive just with a strong and transparent methodology. It helps too when your rankings aren’t too far out of whack with general expectations. Totally unintuitive ranking results are great for a few days of publicity and buzz, but longer term they struggle with credibility issues.

A take-away for publishers: even if you weren’t first to market with rankings for your industry, there may still be a solid opportunity for you if you have better data, a better methodology and solid credibility as a neutral information provider.

Data's Brave New World

The ACLU has just released a report highlighting the growing relationship between law enforcement agencies and a Chicago-based company called Geofeedia. In a nutshell, Geofeedia is apparently marketing to law enforcement agencies a crowd surveillance tool that mixes geolocation with social media sentiment analysis.

This illustrates the gray area we operate in as data providers, especially those of us dealing with consumer data. Things that are perfectly legal may be seen by others as unethical and inappropriate. And, perhaps ironically, the power and pervasiveness of social media means that reputational risk becomes an outsized area of concern for those of us who deal in data.

On the one hand, Geofeedia is simply aggregating and analyzing information that individuals have voluntarily and publicly posted on various social media platforms. On the other hand, its particular application for these data can be seen to be chilling to lawful speech, dissent and free assembly. And as noted earlier, the law lags far behind these new technologies, and thus provides little guidance.

Facebook reacted to the ACLU report by quickly severing ties with Geofeedia. It understands that anything that creates even the slightest hesitancy to use its platform is detrimental to its own business. Instagram suspended Geofeedia as well. Even Twitter, which we have previously noted seems content to be a datastream for others to monetize, has suspended Geofeedia from commercial access to its data.

As we have noted, it’s difficult to come down on one side or the other of this issue. As a data producer, I think that aggregating and analyzing publicly available data is generally a beneficial activity. Indeed, what Geofeedia is doing is conceptually not all that different from the many social sentiment analysis companies selling aggregated insights to hedge funds seeking early warning on news and emerging trends. Yet at the same time, even if Geofeedia was working with the best of intentions, the optics of its product offering should have received greater attention. And that’s the lesson here for data publishers: just because you can do something doesn’t always mean you should do it. Perception has become as important as reality. Don’t let ignorance or arrogance crater your products or your entire business. Keep firmly in mind at all times that, especially when it comes to data, optics do matter.




Should Governments Sell Data?

Under the broad label of “open data,” governments around the world are opening up increasing numbers of fascinating and often valuable datasets to public access, in many cases, via API.

As a recent article in Network World notes, London makes nearly 500 datasets available, and even smaller cities in the UK like Leeds make hundreds of datasets available as well. Perhaps most interesting of all is the initiative by the city of Copenhagen, called City Data Exchange, which takes open data in two important new directions. First, it intends to charge for its data, and second, it is also offering relevant databases from for-profit data producers, also for a fee.

The US has not been a leader in the open data movement, though more government data comes online on almost a daily basis now. Typically, the model in the US is that government data made available to the public is made available for free. That makes sense, since it was gathered at taxpayer expense and should therefore be made available for free – keeping the “free” in Freedom of Information if you will.

But when you think about it, there may be some merit to governments charging reasonable fees to access public datasets. Simply put, it forces governments to treat their data and the people using their data with more professionalism and respect. I’ve been involved in several promising projects that were to be based on government databases that suddenly disappeared because funding was cut, or the person who was responsible for the initiative left the agency and wasn’t replaced. It’s great to have a business based on free government data – until it isn’t. You are at the mercy of an organization that collects data its own way, for its own purposes, and only for as long as it feels it needs to collect it. Putting a revenue stream behind a dataset starts to change that dynamic.

Also of interest is Copenhagen’s plan to be a reseller of private databases. On the one hand, I celebrate the innovation and progressive thinking in this move. On the other hand, it feels backwards to me. If there is a commercial database that complements a government-created database, I think it makes a lot more sense for the commercial database publisher to resell the government data alongside its own. After all, it has the larger financial incentive, it has the staff that really understands data, and it has the marketing and sales capability the government lacks. Government entities are not well positioned to sell their own data, much less someone else’s data, and the better they get at it, the more likely they will cross the line and start competing with private business.

Government is a great source of data, though historically it has been a somewhat undependable source of data. Perhaps putting some modest revenue around it could improve that situation. But moving into the business of selling commercial data products, however well intentioned, is a bridge too far. There are too many specialized skills involved that government entities don’t have and shouldn’t develop.

Meet DiscoverOrg at BIMS
Want to find out why DiscoverOrg won a 2016 Model of Excellence Award?
This year’s winners will be showcased at BIMS, November 14-16 in Ft. Lauderdale. It’s a peer-to-peer forum complete with exclusive tracks on Data and the unique opportunity to hear from the MOE founders firsthand. Register now to attend!
Here’s just a taste of the brilliance behind DiscoverOrg – be sure to attend BIMS to get the full story.
DiscoverOrg is a leading global sales and marketing intelligence tool used by over 2,000 companies to accelerate growth. DiscoverOrg’s solutions provide a constant stream of accurate and actionable company, contact, and buying intelligence that can be used to find, connect with, and sell to target buyers more effectively. CMO Katie Ballard says, “We believe accurate data is the foundation to faster revenue growth. You can’t make good decisions without it. How are you going to grow if you don’t have accurate data to build your sales and marketing strategy on? DiscoverOrg offers the most accurate, actionable, and integrated sales and marketing intelligence—covering contact, company, org charts, buying triggers, and predictive purchase data—that allows our customers to generate more leads, set more meetings, and close more deals. One of the reasons that the data is so accurate is that we have a team of 150 in-house researchers that verifies every single piece of data in our platform. We work mostly with technology, staffing, marketing, and consulting firms. Our clients run the gamut from the biggest brands and companies down to startups, and about 80% fall into technology (including hardware, software, information security, etc…), 10% in staffing, and 10% in the other industries.”
Hear more at BIMS!