Data Marketplaces: Almost There

There has been much excitement about the recent launch of the Salesforce Data Studio, a new data-sharing platform within the Salesforce Marketing Cloud.

The idea of the Data Studio is simple: marketers can, on a fully automated basis, identify, order and integrate datasets that others are offering for sale. In its early implementation, the Data Studio seems mostly like a cool way for marketers to buy email lists. But the vision is much bigger and more interesting: to allow marketers to augment and overlay existing email lists with more data so that they become smarter about their lists, target their efforts more effectively, and get better results.

At launch, the Data Studio is heavy on audience data, mostly from larger publishers, but there’s no reason data publishers of any kind couldn’t participate as well, especially if the Data Studio wants to realize its full potential.

Interestingly, Salesforce is not the only big player that has an interest in data marketplaces. The Amazon Web Services Marketplace sells software – again, a totally automated buying experience – but it also offers a selection of public domain datasets for free. It’s a small jump, then, for Amazon to start selling databases on behalf of others.

As you can see, neither of these two marketplaces is quite ready for prime time as far as becoming a meaningful sales channel for data publishers, but they’re tantalizingly close. Keep an eye on these marketplaces: they could become very important to data publishers very quickly.

ADS.DATA

It’s not news that fraud is rampant in online advertising. It turns out that one of the biggest reasons is that the buyers and sellers of online advertising in large part do not deal directly. They transact through third-party brokers and marketplaces. Increasingly, it’s now computers ordering through third-party brokers and marketplaces – the wonderful world we call programmatic. With no humans watching, much less policing, the buying process, it is not surprising that crooks and thieves have rushed in.

One of the easiest types of fraud is simply to misrepresent yourself online. You can tell an online marketplace that you represent the CNN website, collect the revenue, then run the ads you sold on some other website, often one that gets lots of bot traffic and other fake clicks in order to show performance.

To fight this type of misrepresentation, the Interactive Advertising Bureau (IAB) created a new standard called ADS.TXT. It’s a small file in a standardized format that a website owner creates and places on the website, listing all of the website’s authorized sellers. If you’re familiar with ROBOTS.TXT, it is exactly analogous.

The idea is that programmatic advertising buyers can easily and confidently check a website’s list of authorized resellers. It’s a full, workable solution to a significant problem, but it comes with one big catch: the ADS.TXT file is necessarily open to everyone who wants to view it. And a lot of publishers and other website owners aren’t thrilled about exposing what they consider proprietary information.
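To make the buyer-side check concrete, here is a minimal sketch in Python. It assumes the comma-separated record layout of the published ADS.TXT format (advertising system domain, publisher account ID, DIRECT or RESELLER, and an optional certification authority ID). The domains, account IDs and function names are invented for illustration and do not describe any real publisher's file.

```python
from typing import List, NamedTuple, Optional


class SellerRecord(NamedTuple):
    domain: str                       # advertising system's domain
    account_id: str                   # publisher's account ID in that system
    relationship: str                 # "DIRECT" or "RESELLER"
    cert_authority_id: Optional[str]  # optional fourth field


def parse_ads_txt(text: str) -> List[SellerRecord]:
    """Parse the text of an ads.txt file into seller records.

    Comments (after '#'), blank lines, and lines that are not
    three- or four-field records (e.g. 'contact=...') are skipped.
    """
    records = []
    for raw_line in text.splitlines():
        line = raw_line.split("#", 1)[0].strip()
        if not line:
            continue
        fields = [field.strip() for field in line.split(",")]
        if len(fields) < 3:
            continue
        relationship = fields[2].upper()
        if relationship not in ("DIRECT", "RESELLER"):
            continue
        cert_id = fields[3] if len(fields) > 3 and fields[3] else None
        records.append(
            SellerRecord(fields[0].lower(), fields[1], relationship, cert_id)
        )
    return records


def is_authorized(records: List[SellerRecord], domain: str, account_id: str) -> bool:
    """Return True if the seller (domain + account ID) appears in the list."""
    return any(
        r.domain == domain.lower() and r.account_id == account_id
        for r in records
    )


# Hypothetical file contents; every name and ID here is made up.
SAMPLE = """\
# ads.txt for a hypothetical publisher
exchange.example, 12345, DIRECT, abc123
reseller.example, 67890, RESELLER
contact=adops@publisher.example
"""

records = parse_ads_txt(SAMPLE)
print(is_authorized(records, "exchange.example", "12345"))
print(is_authorized(records, "exchange.example", "99999"))
```

A buyer would fetch the file from the publisher's site and run a check like this before bidding. Nothing in the check depends on the file being public, so the same lookup could just as easily run against a restricted source.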

The solution? In my view, it’s a central database, operated by an independent third party. The same information can be placed in the database, but access can be easily restricted to those who “need to know” the information. I’ve always liked opportunities where an industry needs to share information but at the same time doesn’t want to make that information public. A neutral data provider is most times the perfect answer, as I think it is in this case.

Moreover, a central database can add additional value, because it can track what is happening. It can automatically nag website owners who don’t update their reseller lists regularly. It can check which advertising marketplaces are using the service. In these and many other ways, it can actively work to keep all players engaged and honest.

And of course, data being data, there’s an easy opportunity to aggregate this reseller data to look for sales trends and market share. This information can be given or sold back to the industry without any privacy concerns.

ADS.TXT is just one example of a good idea that could be a much better idea if there were a trusted data provider in the middle, protecting privacy while mediating and recording access to ensure compliance and data accuracy. I’d like to see ADS.TXT as what you might call ADS.DATA. You’d be wise to look for analogous opportunities in your own market.

 

Top Level Domains/Low-Level Trustmarks

If you’re not immediately familiar with the term top level domain (TLD), think of “.com” and “.net” and “.edu” – they are all top-level domains, along with hundreds of others, and by the way, they are not limited to three characters anymore.

In the early days of the Internet, domain names were free for the asking, and I stocked up on quite a few for no other reason than a gut feeling they had some value. I did ultimately sell a lot of them, including to several Fortune 500 companies that bought their corporate names back from me. By the time I realized there might be a bigger opportunity here, the rules of the game had changed, and big companies that had previously shown up with checkbooks now showed up with lawyers. Ah, well!

But for all my domain name hoarding, I couldn’t ever get domain names with the “.edu” TLD because they were reserved for schools. Similarly, “.net” was reserved for Internet Service Providers, and “.org” was reserved for non-profits. These distinctions were widely understood back then, and even today, I hear people telling me some organization “must” be a non-profit because it has a “.org” domain name. Old naming conventions die hard. More importantly, people are hungry for trustmarks.

But TLDs were never great trustmarks, for two reasons. First, validating an organization’s credentials before handing out a domain name is hard and expensive work. Second, domain names don’t sell for a lot, so you can only make money with volume. The pickier you are, the less money you make.

Despite this, the non-profit sector is now pushing the “.ngo” TLD. Think of it as a do-over of the “.org” TLD, because the operator of the domain is trying to limit sales to non-profit entities with the explicit hope that the TLD will become a trustmark over time. Similarly, the AICPA, the big association of certified public accountants, is in a fierce battle to control the forthcoming “.cpa” TLD, again with the hope it can restrict its use to certified public accountants and build it into a trustmark.

My view is that TLDs make for poor trustmarks. The economics make it hard to enforce standards, and there are too many sleazy operators in the business that drag down the credibility of TLDs across the board. The need for online trustmarks remains high. Who better than data companies to seize the opportunity?

 

Survey Says ... It Depends

The data for data products can come from a wide array of sources. Traditionally, datasets were compiled through primary research, usually via questionnaires or by phone. There is also secondary research, where staff gathers data using online sources. There are also public domain databases that can be leveraged. We have also seen a rise in technologically driven data gathering, such as web harvesting. And a growing number of data publishers license third-party data to augment their data gathering. Almost anything goes these days, and the savviest data publishers are mixing and matching their collection techniques for maximum effectiveness. (This is a topic that will be addressed at the Business Information and Media Summit in November.)

This brings me to a question I have been asked more than a few times: can survey data be turned into a data product? When I talk about surveys, I mean the types of surveys most of us do routinely: you ask, say, 20,000 restaurant owners to answer questions about their businesses and the market generally, and if you’re lucky, you’ll get 1,000 responses. My take? While a survey does in fact generate data, I don’t think a survey automatically qualifies as a commercial data product. The reason is subtle, but important.

Much of the value of a data product is in its granularity and specificity. Typically, a data product focuses on organizations, individuals or products and attempts to collect as much detail as possible on each unit of coverage, as comprehensively as possible. Most surveys, by contrast, are anonymous by nature and hit-and-miss in coverage. Using our earlier example, a survey of restaurants might well be useful and valuable even if it didn’t get any response from Taco Bell operators. A restaurant database without any listings of Taco Bell locations, on the other hand, would have no credibility. And since most surveys promise anonymity to increase survey participation rates, only aggregate reporting is possible. From my perspective, surveys of this type are useless as data products.

But not all surveys are the same. Some surveys ask respondents to list the vendors they use, or which of a specified set of companies they like the most and the least. Surveys where you ask the anonymous respondent to list or opine on specific companies or products actually can yield a very compelling type of commercial data product. That’s because the companies or products that come out of the survey effort are not anonymous. If the owner of the Blue Duck restaurant tells you that she likes National Restaurant Supply, you’re developing lots of valuable data about National Restaurant Supply that you can publish, even while keeping Blue Duck restaurant anonymous. Your survey data can report on attitudes or adoption or market share of specific products or firms and compare them and rank and rate them. That’s very valuable because the data are highly proprietary, difficult to collect and actionable.
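As a rough illustration of how this works in practice, the Python sketch below rolls anonymous responses up into vendor-level figures. The response layout (respondent ID, vendor named, 1-to-5 rating), the second vendor name and the function name are all assumptions made for the example, not a description of any actual survey.

```python
from collections import defaultdict


def vendor_report(responses):
    """Aggregate (respondent_id, vendor, rating) rows by vendor.

    The respondent ID is read but never carried into the output,
    so the published figures describe vendors, not respondents.
    """
    totals = defaultdict(lambda: [0, 0])  # vendor -> [mentions, rating sum]
    for _respondent_id, vendor, rating in responses:
        totals[vendor][0] += 1
        totals[vendor][1] += rating

    all_mentions = sum(mentions for mentions, _ in totals.values())
    rows = [
        {
            "vendor": vendor,
            "mentions": mentions,
            "share_of_mentions": mentions / all_mentions,
            "avg_rating": rating_sum / mentions,
        }
        for vendor, (mentions, rating_sum) in totals.items()
    ]
    # Rank by number of mentions, breaking ties alphabetically.
    return sorted(rows, key=lambda row: (-row["mentions"], row["vendor"]))


# Hypothetical responses; respondent IDs stand in for anonymous restaurants.
responses = [
    ("r1", "National Restaurant Supply", 5),
    ("r2", "National Restaurant Supply", 4),
    ("r3", "Acme Foodservice", 3),
]

for row in vendor_report(responses):
    print(row)
```

The output has one row per named vendor, which is what makes it publishable: the respondents stay anonymous while the companies they talk about do not.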

My bottom line on surveys is that “traditional format” surveys with anonymous submissions and aggregate reporting are truly surveys, not data products. But if your survey asks respondents to tell you how much they use or like specific companies or products – you’ve got yourself the makings of a data product!

Inexhaustible Data Opportunities

A new product from LexisNexis Risk Solutions monitors newly listed homes for sale on behalf of home insurance companies to alert them when a customer is preparing to move. The insurer can use this advance notice to contact these customers to help retain their business. 

This is a great idea. For a long time now, data companies have offered so-called “new mover” databases, identifying people who have recently moved into a new home. These are prime prospects because they’re in the market for all sorts of things, sometimes urgently, meaning the first offer they get stands a strong chance of being accepted.

This LexisNexis product shows how to combine databases to up your game. What could be a better prospect than a new mover? How about a pre-mover! While LexisNexis is focused on insurance companies, there are all sorts of companies that would be very interested to have at-risk current customers identified for them so that they can focus their customer retention efforts.

What makes this big leap in sales targeting possible isn’t cutting-edge technology in this case. It’s having the insight to see that data produced by one type of organization (in this case, real estate agents) is valuable to another type of organization (in this case, insurance companies). Add in some additional value by matching the database of one organization to the database of another, and you almost assuredly have a nice business opportunity for the taking.
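The matching step itself can be quite simple. The Python sketch below flags customers whose address appears in a feed of new listings. It is only an illustration under stated assumptions: the record layouts, the sample addresses and the crude address normalization are all invented here, and it says nothing about how LexisNexis actually does its matching.

```python
import re


def normalize_address(address: str) -> str:
    """Lowercase, drop punctuation and collapse whitespace.

    Deliberately crude: real address matching would also need to handle
    abbreviations, unit numbers and misspellings.
    """
    cleaned = re.sub(r"[^a-z0-9 ]", " ", address.lower())
    return " ".join(cleaned.split())


def find_pre_movers(customers, listings):
    """Return the customers whose address matches a newly listed home."""
    listed = {normalize_address(listing["address"]) for listing in listings}
    return [c for c in customers if normalize_address(c["address"]) in listed]


# Hypothetical records from the two organizations.
customers = [
    {"id": "C1", "address": "12 Oak St., Springfield"},
    {"id": "C2", "address": "99 Elm Ave, Springfield"},
]
listings = [
    {"address": "12 OAK ST SPRINGFIELD"},
    {"address": "7 Pine Rd, Shelbyville"},
]

for customer in find_pre_movers(customers, listings):
    print(customer["id"])
```

Most of the real-world effort goes into the normalization, since the two databases were never designed to be joined. That effort is also where much of the added value comes from.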

That’s what is so exciting and fun about the data business today: with so many new databases coming together, opportunity is everywhere. The key is to look at every new database you see and ask, “who else could use these data, and what could I do to these data to make them even more valuable to others?”

The people who create databases are almost always trying to solve a specific, single problem or need. Flip, spin, match or sometimes simply re-sort these databases, and you can often solve someone else’s problem or need. Am I talking about what’s known as data exhaust? To some extent yes, but some of the biggest and most interesting opportunities are right in front of us in plain sight – far less complex and challenging than most of the data exhaust opportunities I have seen.