When Data Is Smarter Than Its Users

In my review of the decade past and my predictions for our new decade, the common thread is that the quality of commercial data products has advanced immeasurably, as has their insight and predictive capability. As an industry, we’ve accomplished some truly remarkable things in the past ten years by making data more powerful, more useful and more current.

That said, data buyers remain far less sophisticated than the datasets they are buying. While buyers of data used for research and planning purposes seem to both appreciate and use powerful new data capabilities, marketers – generally speaking – do not. Worse still, this problem is age-old.

Earlier in my career, I spent several years in the direct marketing business. Even back in the 1980s we were doing file overlays, assessing purchase propensity and building detailed prospect profiles from hundreds of individual data elements. It was slower, sloppier and harder back then, but we were doing it. We even had artificial intelligence software, though I recall one project involving a million customer records that required us to rent exclusive use of a mainframe computer for two weeks! And not only did we have the capability, we had the buy-in of the marketing world. Interest in the incredible potential of super-targeted marketing was at a fever pitch.

But what we quickly learned as mailing list providers was that while sales and marketing types talked quality, what they bought was quantity. If you went to any organization of any size and said, “we have identified the 5,000 absolute best prospects in the country for you, all ready, willing and able to buy,” you would get interest but few if any takers. At best, you’d have marketers say that they’d throw these prospects in the pot with all the others – as long as they weren’t too expensive. 

From this experience came my epiphany: marketers had no experience with high quality prospects. They were so used to crappy data they had built processes and organizations optimized to churn through vast quantities of poor quality prospects. As to our 5,000 perfect prospects, we heard things like, “we’d chew through them in a week.” Note the operative word “chew.” 

We have new and better buzzwords now, but the broad problem is the same. Nowadays, when it comes to sales leads, companies are feeding the beast in the form of their marketing automation platforms. And everything has to flow through the platform, because otherwise reports would be inaccurate and KPIs would be wrong.

Companies today will pay handsomely for qualified sales leads – sometimes up to several hundred dollars per lead. But these top-quality leads won’t get treated any better than the mediocre ones. How do I know? Because the marketers spending all these big bucks will insist the leads be formatted for easy loading into their marketing platforms, and I’ve also been told, “we’re not interested unless you can guarantee at least 100 leads per week.” And that’s how far we have progressed in 30 years: marketers have resolved the tension between quality and quantity by simply insisting on both, and the pressure to deliver volume will necessarily come at the expense of quality. This essential disconnect won’t be solved easily, but when it is, a new golden age of data will arrive.

Looking Ahead: The Application Decade

As I noted in my previous post, the data business was the right place to be in the last decade. Commercial data producers were well-positioned in 2010, and the value of data products was already well understood. The quaint subscription model that data producers had been stubbornly clinging to for years suddenly became all the rage. The birth of Big Data and the growth of data science as a profession put a spotlight on the need for high-quality datasets.

From 2010-2019, things only got better as Big Data tools proliferated, the cloud offered cheap, efficient storage, computer processing power continued to increase and we were finally able to build and make effective use of truly massive, multi-sourced databases, many updated in real time.

The advances we have made as an industry in the last ten years have been truly breathtaking. But if the last decade was characterized by a wondrous growth in the accumulation of data, the decade in front of us will be about the smart application of that data.

A picture of what’s in store for us is already emerging. Artificial intelligence will take the data industry to the next evolutionary plane by enabling us to predict buyers and sellers and other transactional activity with confidence and in advance. That’s no small statement when you consider that the vast majority of commercial data products exist to bring buyers and sellers together or otherwise enable business transactions.

Our new decade will also be notable for its embrace of data governance. There simply won’t be any place for poorly managed and sloppily maintained datasets. Those who properly see data governance as an opportunity and not a burden will prosper mightily. And yes, the commercial data business will enjoy a first-mover advantage, because we understood the power of data governance even before it had a name.

Boil it all down, and my prediction is that we will be entering the decade of data-driven predictions. By 2030, commercial data producers will literally be able to predict the future, at least from a sales and marketing enablement perspective. The new tools required already exist, and they will continue to improve. All that’s needed is the creativity to apply them to the oldest, most basic objective of business: buying and selling. And our industry is nothing if not creative!

Looking Back: The Data Decade

In so many respects, the last ten years can be fairly called the Data Decade. In large part, that’s because the data business came into the last decade on a strong footing. While the ad-based media world was decimated by the likes of Google and Facebook, data companies held firm to their subscription-based revenue models and thrived as a result. And while legacy print publishers struggled to make online work for them, data publishers moved online without issues or complications, in large part because their products were inherently more useful when accessed online. As importantly, data entered the last decade with a lot of buzz, because the value and power of data products had become broadly understood.

At the highest level, data got both bigger and better in the last decade. The much-used and much-abused term “Big Data” came into popular usage. While Big Data was misunderstood by many, its impact for data publishers was that, for the first time, we were able to both aggregate and productively use truly massive amounts of data, creating endless opportunities for both new and enhanced data products.

While life without the cloud is unimaginable today, at the beginning of the last decade it was just getting started and its importance was vastly underappreciated. But the cloud profoundly improved both the cost and the convenience of maintaining and manipulating large amounts of data.
 
I’d argue too that APIs came into their own in the last decade, becoming a necessary component of almost every online data business. As a result, data became more portable and easier to aggregate, mix and integrate in ways that generated substantial new revenue for data owners, while also building powerful lock-in with data licensees, who became increasingly reliant on these data feeds. That’s one of the reasons the data business didn’t feel the impact of the Great Recession as severely as many other industries.
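
To picture what that kind of integration looks like in practice, here is a toy sketch that merges two hypothetical data feeds (one firmographic, one behavioral) on a shared identifier. The feeds, field names and values are all invented for illustration.

```python
# Toy illustration of "mix and match": enriching a firmographic feed with an
# intent feed, joined on a shared (hypothetical) company identifier.
firmographics = [
    {"company_id": "C100", "name": "Acme Corp", "employees": 250},
    {"company_id": "C200", "name": "Globex Inc", "employees": 1200},
]
intent_feed = [
    {"company_id": "C100", "intent_topic": "CRM software", "intent_score": 87},
]

# Index one feed by its key, then enrich the other feed record by record.
intent_by_id = {row["company_id"]: row for row in intent_feed}
enriched = [
    {**firm, **intent_by_id.get(firm["company_id"], {})} for firm in firmographics
]
print(enriched)
```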
 
Through a combination of Big Data, the cloud and APIs, the last decade saw incredible growth in the collection and use of behavioral signals to infer such critical things as purchase interest and intent, opening both new markets and new revenue opportunities. This, of course, allowed many data publishers to tap into the many household-name marketing automation platforms. Hopefully, companies will someday develop marketing campaigns as sophisticated as the data powering them, as the holy grail of fewer but more effective email messages still seems badly out of reach.
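
To make the idea of intent inference concrete, here is a minimal sketch of how raw behavioral signals might be rolled up into a purchase-intent score. The signal names and weights are hypothetical, not any particular platform’s model, and real systems weigh recency, frequency and fit in far more sophisticated ways.

```python
# Hypothetical behavioral signals and the (invented) weights assigned to each.
SIGNAL_WEIGHTS = {
    "visited_pricing_page": 5,
    "downloaded_whitepaper": 3,
    "attended_webinar": 4,
    "opened_email": 1,
}

def intent_score(events: list[str]) -> int:
    """Sum the weights of a prospect's observed behavioral events."""
    return sum(SIGNAL_WEIGHTS.get(event, 0) for event in events)

# A prospect who hit the pricing page twice and opened one email scores 11.
print(intent_score(["visited_pricing_page", "visited_pricing_page", "opened_email"]))
```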
 
Another fascinating development of the last decade is the growing understanding of the power and value of data. The cutesy term “data exhaust” came into common usage in the last few years, referring to data created as a by-product of some other activity. And just as start-ups once rushed to add social media elements to their products, however inappropriately, venture capitalists now rarely see a business plan without a reference to a start-up’s data opportunity. There will be backlash here as both entrepreneurs and venture capitalists learn the expensive lesson that “not all data is good data,” but in the meantime, the gold rush continues unabated.
 
Somewhat related to this trend, we’ve seen much interest and activity around the concept of “data governance,” an acknowledgement that while poor-quality data is close to useless, top-quality data is enormously powerful, in large part because it can be trusted implicitly. Indeed, listen in at any gathering of data scientists and the grousing you will hear is that they see themselves as “data janitors,” because they spend far more of their time cleaning and structuring data than actually analyzing it.
 
I can’t close out this decade without also mentioning the trend towards open data, which in large part refers to the increasing availability of public sector databases that often can be used to enhance commercial data products.
 
In all, it was a very good decade for the data business, a happy outcome that resulted primarily from the increased technical ability to aggregate and process huge amounts of data, growing willingness to share data on a computer-to-computer basis, and much greater attention to improving the overall quality of data. 

And the decade now in front of us? Next week, I'll take a look ahead.

Is Time Up for 230?

In 1996, several Internet lifetimes ago, Congress passed a bill called the Communications Decency Act (officially, it is Title V of the Telecommunications Act of 1996). The law was a somewhat ham-handed attempt at prohibiting the posting of indecent material online (big chunks of the law were ultimately ruled unconstitutional by the Supreme Court). But one of the sections of the law that remained in force was Section 230. In many ways, Section 230 is the basis for the modern Internet.

The key provision of Section 230 is short – just 26 words – but those 26 words are so important that an entire book has been written about their implications. It reads as follows:

“No provider or user of an interactive computer service shall be treated as the publisher or speaker of any information provided by another information content provider.”

The impetus for Section 230 was a string of court decisions in which website owners were held liable for things posted by users of their sites. Section 230 stepped in to provide near-absolute immunity for website owners. Think of it as the “don’t shoot the messenger” defense. Without Section 230, websites like Facebook, Twitter and YouTube probably wouldn’t exist. And most newspapers and online publications probably wouldn’t let users post comments. Without Section 230, the Internet would look very different. Some might argue we’d be better off without it. But the protections of Section 230 extend to many information companies as well.

That’s because Section 230 also provides strong legal protection for online ratings and reviews. Without it, sites as varied as Yelp, TripAdvisor and even Wikipedia might find it difficult to operate. Indeed, any crowdsourced data site would instantly become very risky to run.

The reason that Section 230 is in the news right now is that it also provides strong protection to sites that traffic in hateful and violent speech. That’s why there are moves afoot to change or even repeal Section 230. Some of these actions are well intentioned. Others are blatantly political. But regardless of intent, these are actions that publishers need to watch, because if it becomes too risky to publish third-party content, the unintended consequences will be huge indeed.

Use Your Computer Vision

Those familiar with the powerhouse real estate listing site Zillow will likely recall that it burst on the scene in 2006 with an irresistible new offering: a free online estimate of the value of every house in the United States. Zillow calls them Zestimates. The site crashed continuously from too much traffic when it first launched, and Zillow now gets a stunning 195 million unique visitors monthly, all with virtually no advertising. Credit the Zestimates for this.

As you would expect, Zestimates are derived algorithmically, using a combination of public record data and recent sales data. The algorithm selects recent sales of comparable nearby houses to compute an estimated value.
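
For readers who like to see the mechanics, here is a minimal sketch of how a comparable-sales (“comps”) estimate might be computed. The field names, the straight-line distance measure and the price-per-square-foot averaging are my own simplifying assumptions; Zillow’s actual algorithm is proprietary and far more sophisticated.

```python
from dataclasses import dataclass
from math import sqrt

@dataclass
class Sale:
    lat: float    # latitude of the sold home
    lon: float    # longitude of the sold home
    sqft: float   # living area in square feet
    price: float  # recent sale price

def comps_estimate(subj_lat: float, subj_lon: float, subj_sqft: float,
                   recent_sales: list[Sale], k: int = 5) -> float:
    """Estimate a home's value from the k nearest recent sales, scaled by size."""
    # Rank recent sales by simple straight-line distance to the subject property.
    nearest = sorted(
        recent_sales,
        key=lambda s: sqrt((s.lat - subj_lat) ** 2 + (s.lon - subj_lon) ** 2),
    )[:k]
    # Average the comps' price per square foot, then apply it to the subject home.
    avg_price_per_sqft = sum(s.price / s.sqft for s in nearest) / len(nearest)
    return avg_price_per_sqft * subj_sqft
```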

As you would also expect, professional appraisers hate Zestimates. They believe they produce better valuation estimates because they hand-select the comparable nearby homes. However, in the interest of producing consistent appraisals, the hand-selection process appraisers use is so prescribed and formulaic that it operates much like an algorithm. At this level, you could argue that appraisers have little advantage over the computed Zestimate.

However, one area in which appraisers have a distinct advantage is that they are able to assess the condition and interiors of the properties they are appraising. They visually inspect the home and can use interior photos of comparable homes that have recently sold to refine their estimates.

Not to be outdone, Zillow is employing artificial intelligence to create what it calls “computer vision.” Using interior and exterior photos of millions of recently sold homes, Zillow now assesses such things as curb appeal, construction quality and even landscaping; quantifies what it finds; and factors that information into its valuation algorithm. When it has interior photos of a house, it scans for such things as granite countertops, upgraded bathrooms and even how much natural light the house enjoys, and incorporates this information into its algorithm as well.
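
As a rough illustration of how photo-derived features could feed a valuation algorithm, here is a toy sketch that nudges a base estimate up or down according to what a vision model reports detecting. The feature names and adjustment percentages are invented for illustration; they are not Zillow’s.

```python
# Hypothetical photo-derived features and the (invented) adjustments they trigger.
PHOTO_FEATURE_ADJUSTMENTS = {
    "granite_countertops": 0.02,      # +2% to the estimate if detected
    "upgraded_bathrooms": 0.03,
    "abundant_natural_light": 0.01,
    "poor_curb_appeal": -0.03,
}

def adjust_for_photos(base_estimate: float, detected: set[str]) -> float:
    """Apply small multiplicative adjustments for features a vision model detected."""
    multiplier = 1.0
    for feature in detected:
        multiplier += PHOTO_FEATURE_ADJUSTMENTS.get(feature, 0.0)
    return base_estimate * multiplier

# Example: a $400,000 comps estimate with granite counters and good light detected.
print(adjust_for_photos(400_000, {"granite_countertops", "abundant_natural_light"}))
```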

With this advance, it looks very much like appraisers’ remaining competitive advantage is owning “the last mile,” because they are the feet on the street who actually visit the house being appraised. But you can see where things are heading: as companies like Zillow refine their technology, the day may well come when an appraisal is performed by the homeowner uploading interior pictures of her house and perhaps confirming public record data, such as the number of rooms.

There are many market verticals where automated inspection and interpretation of visual data can be used. While the technology is in its infancy, its power is undeniable, so it’s not too early to think about possible ways it might enhance your data products.