Where Disruptors Fear to Tread

A recent article in the New York Times paints a stunning and detailed picture of the lengths to which some lead generation businesses (in this case, ones offering locksmith services) will go in pursuit of top search result listings and, even more importantly, the perception of a local presence.

The article details business practices that are pretty rough. The idea is that when people are locked out of their homes or cars, they want the fastest possible service at some reasonable price. These lead generation services, which are centrally located and sometimes based overseas, promise both to callers. They then sell the lead to unsavory local locksmiths who show up and demand a much higher price – in cash – from the distraught victim. So how does a lead generation shop based in a foreign country make it look like its offices are right around the corner?

The secret sauce of these lead generation firms is two Google programs: Google Business and Map Maker. Both depend on user-generated content, hence the opportunity to manipulate Google. If you’re willing to be sleazy, it’s easy to create a fake business in Google Business. And it’s not much harder to create a fake entry for your business in Map Maker either. Indeed, as the New York Times article details with actual screen captures, one lead generation company turned an empty lot into a shopping center with its phony storefront in the prime corner location!

What makes the corruption of Map Maker possible is that Google relies on volunteers to monitor additions and changes. Yes, one of the most valuable corporations in the world doesn’t see fit to spend the money to do the job itself. A quote from the article sums up the mentality nicely: “Fighting spam is boring. The employees who cared didn’t have the political clout in the company.”

That’s an important point and one every data publisher should take to heart. Many of the companies that are trying to disrupt the business of existing data providers are, first and foremost, programming shops. They are interested in the app. The content, not so much. That’s why so many of these disruptive startups gravitate to public domain datasets. To them, it’s plug-and-play content that they don’t have to worry about or maintain. The idea of creating content from scratch is anathema to these companies. If they can’t get it for free, they’ll license it. If they can’t license it, they’ll try to crowdsource it. And even the crowdsourcing effort reflects this software bias: the goal is to build a front-end to make it easy to enter relatively clean data. Beyond that, the programmers lose interest.

If there is something that gives a data provider an edge these days, it is the willingness to roll up its sleeves and source data, aggregate it, clean it and normalize it. This is simply a place where today’s disruptors really don’t want to go.

 

Monetizing Information Flows

StreetContxt is a hot Canadian start-up that just raised $8 million from A-list investors, including a number of big banks and brokerage houses. Its mission is simple: to maximize the value of the mountain of investment research that gets generated each year. But what really makes StreetContxt stand out to me is that it offers a very compelling business proposition both to those who create the research and to those who use it.

For the sell-side (those who create the content), it’s currently difficult to measure the impact, much less the ROI, of the huge volume of research they create annually. They send it out to presumably interested and qualified recipients, with no way of knowing if it is acted on, or even viewed.

For the buy-side (those who receive and use the content), it’s impossible to keep up with the blizzard of information being pushed out to them. Even more significantly, some of this research is very good, but a lot of it isn’t. How do you identify the good stuff?

StreetContxt offers the sell-side a powerful intelligence platform. By distributing research through StreetContxt, research producers can learn exactly who viewed their research and whether it was forwarded to others (multiple forwards are used as a signal of a timely and important research report). What naturally falls out of this is the ability to assess which research is having the most market impact. But StreetContxt also helps research producers correlate research with trading activity, to make sure that their research insights are being rewarded with adequate commission revenue. Even better, StreetContxt gives the sell-side insight into who is reading research on what topics and with what level of engagement, in order to help power sales conversations. In short, StreetContxt tracks “who’s reading what” at a very granular level, both to measure impact and to inform selling activity.
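To make the mechanics a bit more concrete, here is a minimal sketch of how view and forward events might be rolled up into per-report impact scores and reader lists. This is not StreetContxt’s actual implementation; the event fields and the extra weight on forwards are assumptions for illustration.

from collections import defaultdict

# Hypothetical event stream: one record per reader interaction.
# Field names and the 3x weight on forwards are assumptions, not StreetContxt's schema.
events = [
    {"report_id": "r-101", "reader": "fund_a", "action": "view"},
    {"report_id": "r-101", "reader": "fund_b", "action": "forward"},
    {"report_id": "r-102", "reader": "fund_a", "action": "view"},
]

WEIGHTS = {"view": 1, "forward": 3}  # forwards treated as a stronger signal of importance

def score_reports(events):
    """Roll raw interactions up into per-report impact scores and reader lists."""
    scores = defaultdict(int)
    readers = defaultdict(set)
    for e in events:
        scores[e["report_id"]] += WEIGHTS.get(e["action"], 0)
        readers[e["report_id"]].add(e["reader"])
    return scores, readers

scores, readers = score_reports(events)
print(dict(scores))   # which reports are having the most market impact
print(dict(readers))  # who to talk to about each report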

On the buy-side, StreetContxt helps those who use research with a recommendation engine. Research users can specify topical areas of interest that get tuned by StreetContxt based on who is reading and forwarding which research reports. In other words, StreetContxt has found an approach to automatically surface the best and most important research. StreetContxt also helps research users by monitoring relevant research from sources to which the research user may not currently subscribe. And since much research is provided in exchange for trading commissions, StreetContxt can help research users get the most value from those credits.
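The recommendation side can be sketched just as simply: filter reports to the user’s stated topics, then rank them by how heavily the broader market is reading and forwarding them. Again, the field names and scoring are assumptions for illustration, not StreetContxt’s actual algorithm.

# Per-report topic tags plus a market-wide engagement score (for example, the
# weighted views and forwards above). All names and numbers are illustrative.
reports = [
    {"id": "r-101", "topics": {"energy", "canada"}, "engagement": 42},
    {"id": "r-102", "topics": {"tech"}, "engagement": 17},
    {"id": "r-103", "topics": {"energy"}, "engagement": 5},
]

user_interests = {"energy"}  # topics the buy-side user has specified

def recommend(reports, interests, limit=10):
    """Surface reports matching the user's topics, ranked by how heavily
    the broader market is reading and forwarding them."""
    matches = [r for r in reports if r["topics"] & interests]
    return sorted(matches, key=lambda r: r["engagement"], reverse=True)[:limit]

for r in recommend(reports, user_interests):
    print(r["id"], r["engagement"])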

The magic described here happens because the content creators post to the central StreetContxt portal, and research users access content from the same portal. This allows StreetContxt to monitor exactly who is using what research.

Why would research users allow their every click to be tracked and turned into sales leads? Because StreetContxt offers them a powerful incentive in the form of curated research recommendations, a better way to manage research instead of having it flood their in-boxes as it does now, and, most important of all, a way to ferret out the best and most important research.

The big lesson for me is that with a sufficiently compelling value proposition on both sides, smart companies can position themselves in the middle of an information flow and monetize the resulting data in powerful and profitable ways.

Could You Be a LinkedIn for Product Information?

A young Canadian company called Hubba is making waves internationally with its goal to become the “single source of truth” for product information. Described simply, they’d like every company with products to post and maintain its product information in Hubba’s database so that everyone who needs that product information (retailers, wholesalers, online merchants) can find it in one place, in one format, and with the knowledge that it is always the most current information available.
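To picture what “one place, in one format” might mean in practice, here is a hypothetical, highly simplified central catalog record. The field names are illustrative only, not Hubba’s actual schema.

# A hypothetical, highly simplified central catalog record; field names are
# illustrative, not Hubba's actual schema.
product_record = {
    "upc": "012345678905",
    "brand": "Acme",
    "name": "Widget Pro 500",
    "specs": {"weight_kg": 1.2, "color": "black"},
    "images": ["https://example.com/widget-front.jpg"],
    "last_updated": "2016-05-01",  # everyone downstream can trust this is current
}

catalog = {}

def publish(record, catalog):
    """The brand maintains one record; every retailer and merchant reads the same copy."""
    catalog[record["upc"]] = record

publish(product_record, catalog)
print(catalog["012345678905"]["name"])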

Ambitious? You bet. But the company, only founded in 2013, has taken on over $45 million in investment. As importantly, there are signs of market traction: 10,000 customers have already listed over 1 million products on the service. The company’s founder described Hubba as “a little bit like LinkedIn for products.”

Organizing product information is no small challenge. Most companies of any size struggle just to keep their own product information organized and current. And product information is a nightmare for those who sell products. The work involved in finding and accessing current product specifications, images and brochures is slow, painful and never-ending. And good product information isn’t only in the interest of the retailer; manufacturers increasingly see the value of consistent, accurate and attractive presentation of their products across the web.

Somewhat surprisingly, this isn’t an entirely new idea. Indeed, it’s an application model we call “Central Catalogs” in our business information framework. The first company I found doing this was working in the audio-visual equipment industry. Currently called AV-IQ (and recently acquired by NewBay Media), the site lets manufacturers centralize all their product information for the benefit of their retailers.

Another company called EdgeNet performs a similar service in the hardware market. And there are others with slightly varying models. Some see themselves as content syndicators, pushing product information out to the world. Some are built on closed networks. But to date, given the scale of the challenge, most services limit themselves to a single vertical. While Hubba is currently tackling just a few verticals, it’s clearly positioning itself to become the central product information repository for everything.

Hubba isn’t just notable for its ambition. It’s also adopted a freemium model that makes it a small decision for a manufacturer to participate. And consider too how far this company has come in just three years with an offering that’s good, but in a category that certainly isn’t new. That’s Internet speed for you!

Central catalog opportunities belong naturally to companies in the center of their markets, particularly those that are data-savvy. Having a neutral market position is critical too, because product information gets political very quickly. It remains to be seen how Hubba evolves, but in the meantime, vertical market central catalog opportunities abound, and data publishers in vertical markets are the best positioned players to take advantage of these opportunities.

 

Data With Backbone

You may recall a political dust-up this summer when the Clinton campaign accused the Sanders campaign of gaining illicit access to its voter data. How was this even possible? Well, it all traces back to a private company called NGP VAN.

NGP VAN, which works only with Democratic and progressive candidates for office, maintains a central national voter database that it regularly updates and enhances. Providing campaigns with convenient access to enhanced voter data is a good product in and of itself, but there’s a lot more going on in addition to that. Each campaign uploads its own donor files to NGP VAN, where they are matched to the central voter database. In a sense, NGP VAN offers a comprehensive backbone file of all voters, and each campaign attaches its transaction history to that backbone file. If a single voter gives to two campaigns, the voter will have two records attached. Critically, campaigns can only view their own records, or that’s the way it’s supposed to work. A botched software upgrade apparently gave the Sanders campaign the ability to view records from the Clinton campaign.

This software snafu notwithstanding, this “backbone” data model is an interesting one. Because NGP VAN only works with candidates blessed by the Democratic Party, all the activity by all the campaigns is viewable in the aggregate by the Democratic Party, providing real-time insight into campaign field activities nationwide.

In addition, by matching to a dependable backbone file, each campaign has unduplicated access both to its known supporters and to everyone else in its area, all in a single, normalized dataset. The matching process also helps campaigns identify and eliminate duplicate records. As NGP VAN identifies new third-party data elements that might be helpful to its campaign clients, it appends them to its backbone database, making them immediately accessible to all its clients. NGP VAN also supplies a sophisticated user interface to its clients, including door-to-door canvassing tools that operate on an iPad. Finally, NGP VAN makes it easy to export client data to any of a number of analytical tools and services used by the campaigns.
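For data publishers curious about the mechanics, here is a minimal sketch of the backbone pattern: one shared voter file, per-campaign records keyed to the same voter IDs, and a view that exposes each campaign only to its own attachments. The structure and field names are assumptions for illustration, not NGP VAN’s actual system.

# One shared "backbone" voter file, with each campaign's own records attached
# and visible only to that campaign. Structure and field names are assumptions.
backbone = {
    "V001": {"name": "Pat Jones", "state": "IA"},
    "V002": {"name": "Sam Lee", "state": "NH"},
}

# Per-campaign history keyed to the same voter IDs as the backbone.
attachments = {
    "campaign_a": {"V001": [{"type": "donation", "amount": 50}]},
    "campaign_b": {"V001": [{"type": "canvass", "result": "supporter"}]},
}

def campaign_view(campaign_id):
    """Return the full backbone, with only this campaign's own records attached."""
    view = {}
    for voter_id, voter in backbone.items():
        history = attachments.get(campaign_id, {}).get(voter_id, [])
        view[voter_id] = {**voter, "history": history}
    return view

print(campaign_view("campaign_a")["V001"])  # sees its own donation, not campaign B's canvass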

The basic idea of building a comprehensive industry database and software platform, and letting clients deeply integrate their data into it, so that all their customers and every possible prospect reside in one place in one format – that’s deep embedment. A number of data companies are backing into this model, albeit slowly, but SaaS really opens up the prospect of everyone in a vertical industry sharing the same data without giving up any confidential information. Could your market use some backbone?

The 50% Solution

A saying attributed to the famous Philadelphia retailer John Wanamaker is that, “Half the money I spend on advertising is wasted; the trouble is I don't know which half.” Apparently, that saying can be updated for the Internet age to read, “Half the traffic to my website is non-human; the trouble is I don't know which half.”

In fact, the percentage is worse than that. According to a study by online researcher Imperva, a whopping 61.5% of traffic on the web is non-human. What do we mean by non-human? Well, it’s a category that includes search engines, software that’s scraping your website, hackers, spammers and others who are up to no good.

And yes, it gets worse. The lower the traffic to your website, the greater the percentage that is likely to be non-human. Indeed, if your site gets 1,000 or fewer visits per day, the study suggests that as much as 80% of your traffic may be non-human.

Sure, a lot of this non-human traffic is search engines (and you’d be amazed how many there still are out there), and that’s probably a good thing. After all, we want exposure. But the rest of this traffic is more dubious. About 5% of your overall site traffic is likely to be scrapers – people using software to grab all the content on your site, for purposes benign or evil. Sure, they can’t get to your password-protected content, but if you publish any amount of free data on your site in structured form, chances are that others now have that data in their databases.

Obviously, if you sell online advertising, these statistics represent an inconvenient truth. The only saving grace is that your competitors are in the same boat. But if you are a subscription site, does any of this even matter?

I think it does, because all this non-human activity distorts all of our web analytics in addition to our overall visitor counts. Half the numbers we see are not real. These non-human visitors could lead you to believe certain pages on your site are more popular than they really are, and you could end up fashioning your marketing strategy around those bad insights. And if you are using paid search to generate traffic, you could be getting similarly bad marketing data, and paying for the privilege as well.
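To see how much this matters, here’s a back-of-the-envelope sketch. The traffic and conversion figures are invented for illustration; the non-human share is the figure from the study cited above.

# Back-of-the-envelope adjustment; traffic and conversion figures are invented,
# the 61.5% non-human share comes from the study cited above.
reported_uniques = 10_000    # what the analytics report shows
demo_requests = 100          # actions only real humans take
non_human_share = 0.615      # try 0.80 for a low-traffic site

human_uniques = reported_uniques * (1 - non_human_share)

print(f"Apparent conversion rate: {demo_requests / reported_uniques:.1%}")   # 1.0%
print(f"Human-only conversion rate: {demo_requests / human_uniques:.1%}")    # ~2.6%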

Most importantly, this non-human traffic distorts reality. If you’re beating yourself up because of low response, lead generation or order rates, especially given the number of uniques and page views you appear to be getting, start by dividing by two. Do your numbers suddenly look a lot better? Bots and scrapers and search engines don’t request demos, don’t download white papers and certainly don’t buy merchandise. Keep that in mind next time you’re looking at your site analytics reports or puzzling over why some pages on your site get so much more attention than others. Remember, not all data are good data.