Building Databases

Standard Stuff Is Actually Cool

In the not-too-distant past, there was something close to an agreed-upon standard for the user interface for software applications. Promoted by Microsoft, it is the reason that so much software still adheres to conventions such as a “file” menu in the upper left corner of the screen.

The reason Microsoft promoted this open standard is that it saw clear benefit in bringing order out of chaos. If most software functioned in largely the same way, users could become comfortable with new software faster, meaning greater productivity, reduced training time and associated cost, and greater overall levels of satisfaction.

Back up a bit more and you can see that the World Wide Web itself represented a standard – a single way to access websites that all function, in every critical respect, in the same way. Before that, companies with online offerings had varying login conventions, different communications networks, and totally proprietary software that looked like nobody else’s software. Costs were high, learning curves were steep and user satisfaction was low.

There are clear benefits to adhering to high-level user interface standards, even ones that bubble up out of nowhere to become de facto standards. Consider the term “grayed out.” Thanks to this de facto standard, users learned that website features and functions that were “grayed out” were inaccessible to them, either because the user hadn’t paid for them, or because they weren’t relevant to what the user was currently doing within the application. Having a common understanding of what “grayed out” meant was important to many data publishers because it was a key part of the upsell strategy.

That’s why I am so disappointed to see the erosion of these standards. On many websites and mobile apps now, a “grayed out” tab represents the active tab the user is working in, not an unavailable tab. And virtually all other standards have evaporated as designers have been allowed to favor “pretty” and “cool” over functional and intuitive. I could go on for days about software developers who similarly run amok, employing all kinds of functionality mostly because it is new and with absolutely no consideration for the user experience. What we are doing is reverting to the balkanized state of applications software before the World Wide Web.

And while I call out designers and developers, the fault really lies with the product managers who favor speed above all, or who themselves start to believe that “cutting edge” somehow confers prestige or competitive advantage. Who’s getting left out of the conversation? The end-user customer. What does the customer want? At a basic level the answer is simple: a clean, intuitive interface that allows them to access data and get answers as quickly and painlessly as possible. Standard stuff, and the best reason that being different for the sake of being different isn’t in your best interest.

Where the Value Is in Visual Data

The New York Times recently reported on the results of a fascinating project conducted at Stanford University. Using over 50 million images drawn from Google Street View, along with ZIP code data, the researchers were able to associate automobile ownership preferences with voting patterns. For example, the researchers found that the type of vehicles most strongly associated with Republican voting districts are extended-cab pickup trucks.

While this particular finding may not surprise you, the underlying work represents a programmatic tour de force, because artificial intelligence software was used to identify and classify the vehicles found in these 50 million images. The researchers used automotive experts to identify specific makes and models of cars from the images, giving the software a basis for training itself to find and identify vehicles all by itself, regardless of the angle of the photo, shadows and a host of other factors that make this anything but an easy task.

This project is believed to represent the first time that images have been used on a large scale to develop data. And while this image identification is a technically impressive example of both artificial intelligence and Big Data, most of the really useful insights come from associating the findings with other datasets, what I like to refer to as Little Data.

Think about it. The artificial intelligence software is given as input an image, and the ZIP code associated with that image. The software identifies an automobile make and model from the image, and creates an output record with two elements: the ZIP code and a normalized make and model description of the automobile. With this, you can explore auto ownership patterns by geography. But with just a few more steps, you can go a lot further.

You can use “little data” government and private datasets to link ZIP code to voting districts and thus voting patterns. With this information, you can determine that people living in Republican districts prefer extended-cab pickup trucks.

You can also use the ZIP code in the record to link to “little data” Census demographic data summarized at ZIP level. With this, you can correlate car ownership patterns to such things as income, race, education and ethnicity. Indeed, the study found it could predict demographics and voting patterns based on auto ownership.

And you can go further. You can link your normalized automobile make and model data to “little data” datasets of automobile technical specifications, which is how the study determined, for example, that based on miles per gallon, Burlington, Vermont is the greenest city in the United States.
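To make the mechanics concrete, here is a minimal sketch of that linking step using Python and pandas. The file names, column names and values are hypothetical placeholders, not the study’s actual data; the point is simply that the two keys in each output record (ZIP code and normalized make and model) are enough to join in voting, Census and vehicle-specification “little data.”

```python
import pandas as pd

# Hypothetical inputs; the file and column names are illustrative only.
# cars.csv:          zip_code, make_model       (one row per image, the AI pipeline's output)
# zip_votes.csv:     zip_code, party_share      ("little data" voting records by ZIP)
# zip_census.csv:    zip_code, median_income    (Census summaries at ZIP level)
# vehicle_specs.csv: make_model, mpg, body_style  (technical specifications)

cars = pd.read_csv("cars.csv")

# Link the Big Data output to the Little Data sets via the two keys it carries:
# the ZIP code and the normalized make/model description.
enriched = (
    cars
    .merge(pd.read_csv("zip_votes.csv"), on="zip_code", how="left")
    .merge(pd.read_csv("zip_census.csv"), on="zip_code", how="left")
    .merge(pd.read_csv("vehicle_specs.csv"), on="make_model", how="left")
)

# Examples of the granular analysis the linked table now supports:
# 1) Which body styles skew toward Republican-leaning areas?
by_party = enriched.groupby("body_style")["party_share"].mean().sort_values()

# 2) Which ZIP codes have the "greenest" vehicle fleets by average miles per gallon?
greenest = enriched.groupby("zip_code")["mpg"].mean().sort_values(ascending=False)

print(by_party.head())
print(greenest.head())
```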

Using artificial intelligence on a Big Data image database to build a normalized text database is impressive. But all the real insights in this study could only be developed by linking Big Data to Little Data to allow for granular analysis.

While Big Data and artificial intelligence are getting all the breathless coverage, we should never forget that Little Data is what’s providing the real value behind the scenes.  

Survey Says ... It Depends

The data for data products can come from a wide array of sources. Traditionally, datasets were compiled through primary research, usually via questionnaires or by phone. There is also secondary research, where staff gathers data using online sources. There are also public domain databases that can be leveraged. We have also seen a rise in technologically driven data gathering, such as web harvesting. And a growing number of data publishers license third-party data to augment their data gathering. Almost anything goes these days, and the savviest data publishers are mixing and matching their collection techniques for maximum effectiveness (a topic that will be addressed at the Business Information and Media Summit in November).

This brings me to a question I have been asked more than a few times: can survey data be turned into a data product? When I talk about surveys, I mean the types of surveys most of us do routinely: you ask, say, 20,000 restaurant owners to answer questions about their businesses and the market generally, and if you’re lucky, you’ll get 1,000 responses. My take? While a survey does in fact generate data, I don’t think a survey automatically qualifies as a commercial data product. The reason is subtle, but important.

Much of the value of a data product is in its granularity and specificity. Typically, a data product focuses on organizations, individuals or products and attempts to collect as much detail as possible on each unit of coverage, as comprehensively as possible. Most surveys, by contrast, are anonymous by nature and hit-and-miss in coverage. Using our earlier example, a survey of restaurants might well still be useful and valuable even if it didn’t get any responses from Taco Bell operators. A restaurant database without any listings of Taco Bell locations, however, would have no credibility. Since most surveys promise anonymity to increase participation rates, only aggregate reporting is possible. From my perspective, surveys of this type are useless as data products.

But not all surveys are the same. Some surveys ask respondents to list the vendors they use, or which of a specified set of companies they like the most and the least. Surveys where you ask the anonymous respondent to list or opine on specific companies or products actually can yield a very compelling type of commercial data product. That’s because the companies or products that come out of the survey effort are not anonymous. If the owner of the Blue Duck restaurant tells you that she likes National Restaurant Supply, you’re developing lots of valuable data about National Restaurant Supply that you can publish, even while keeping Blue Duck restaurant anonymous. Your survey data can report on attitudes or adoption or market share of specific products or firms and compare them and rank and rate them. That’s very valuable because the data are highly proprietary, difficult to collect and actionable.
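To illustrate the distinction, here is a minimal sketch, again with hypothetical file and column names, of how vendor-specific survey answers can be rolled up into a publishable table while the respondents themselves stay anonymous.

```python
import pandas as pd

# Hypothetical survey file; column names and the 1-5 rating scale are assumptions.
# responses.csv: respondent_id, vendor_named, satisfaction_1_to_5, will_buy_again
responses = pd.read_csv("responses.csv")

# Aggregate by the *named* vendor. The respondent column is used only for counting
# and is never published, so participants stay anonymous while the vendors do not.
vendor_report = (
    responses
    .groupby("vendor_named")
    .agg(
        mentions=("respondent_id", "nunique"),
        avg_satisfaction=("satisfaction_1_to_5", "mean"),
        repeat_intent=("will_buy_again", "mean"),
    )
    .sort_values("avg_satisfaction", ascending=False)
)

# vendor_report can now be ranked, compared and published as a data product
# without ever revealing who said what.
print(vendor_report.head())
```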

My bottom line on surveys is that “traditional format” surveys with anonymous submissions and aggregate reporting are truly surveys, not data products. But if your survey asks respondents to tell you how much they use or like specific companies or products – you’ve got yourself the makings of a data product!

Inexhaustible Data Opportunities

A new product from LexisNexis Risk Solutions monitors newly listed homes for sale on behalf of home insurance companies to alert them when a customer is preparing to move. The insurer can use this advance notice to contact these customers to help retain their business. 

This is a great idea. For a long time now, data companies have offered so-called “new mover” databases, identifying people who have recently moved into a new home. These are prime prospects because they’re in the market for all sorts of things, sometimes urgently, meaning the first offer they get stands a strong chance of being accepted.

This LexisNexis product shows how to combine databases to up your game. What could be a better prospect than a new mover? How about a pre-mover! While LexisNexis is focused on insurance companies, there are all sorts of companies that would be very interested to have at-risk current customers identified for them so that they can focus their customer retention efforts.

What makes this big leap in sales targeting possible isn’t cutting-edge technology. It’s having the insight to see that data produced by one type of organization (in this case, real estate agents) is valuable to another type of organization (in this case, insurance companies). Add in some additional value by matching the database of one organization to the database of another, and you almost assuredly have a nice business opportunity for the taking.
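As a rough illustration of the matching step, here is a minimal sketch that assumes hypothetical listing and policyholder files. A real implementation would need much more careful address standardization, but the core idea is just a join on a shared key.

```python
import pandas as pd

# Hypothetical files: new_listings.csv (address, list_date) from listing data,
# policyholders.csv (policy_id, address, renewal_date) from the insurer.
listings = pd.read_csv("new_listings.csv")
customers = pd.read_csv("policyholders.csv")

def normalize(addresses: pd.Series) -> pd.Series:
    """Crude address normalization so the two files can be joined on a common key."""
    return addresses.str.upper().str.replace(r"[^A-Z0-9 ]", "", regex=True).str.strip()

listings["match_key"] = normalize(listings["address"])
customers["match_key"] = normalize(customers["address"])

# Customers whose home just hit the market are the at-risk, "pre-mover" prospects.
at_risk = customers.merge(listings[["match_key", "list_date"]], on="match_key", how="inner")
print(at_risk[["policy_id", "address", "list_date"]].head())
```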

That’s what is so exciting and fun about the data business today: with so many new databases coming together, opportunity is everywhere. The key is to look at every new database you see and ask, “who else could use these data, and what could I do to these data to make them even more valuable to others?”

The people who create databases are almost always trying to solve a specific, single problem or need. Flip, spin, match or sometimes simply re-sort these databases, and you can often solve someone else’s problem or need. Am I talking about what’s known as data exhaust? To some extent yes, but some of the biggest and most interesting opportunities are right in front of us in plain sight – far less complex and challenging than most of the data exhaust opportunities I have seen.

Data as the Decider

I have discussed before how data providers can leverage their central, neutral market positions to collect highly valuable data that otherwise couldn’t be collected. Examples abound of data providers that have convinced companies to provide them with their information crown jewels – sales data, pricing data and the like – in return for getting it back (on a paid or unpaid basis) in aggregate, anonymized form. Fundamentally, the companies realize that their data, no matter how sensitive they consider it to be, has even more value to them when combined with or compared to a larger set of similar data. These situations are wonderful opportunities for data publishers, and they are cropping up more and more as companies get better about organizing their internal data and then become more sophisticated about how to optimize it.
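Mechanically, that aggregate-and-return model is simple. Here is a minimal sketch, with hypothetical file and column names, in which each contributor submits its sensitive figures and gets back its own number alongside an anonymized peer benchmark, never anyone else’s individual data.

```python
import pandas as pd

# Hypothetical contributions file: submissions.csv with company, segment, avg_selling_price
subs = pd.read_csv("submissions.csv")

# The industry benchmark is published only in aggregate, by market segment.
benchmark = (
    subs.groupby("segment")["avg_selling_price"]
    .agg(peer_median="median", peer_count="count")
    .reset_index()
)

# Each contributor gets back its own figure next to the anonymized benchmark;
# no other company's individual number is ever exposed.
report = subs.merge(benchmark, on="segment")
for company, rows in report.groupby("company"):
    rows[["segment", "avg_selling_price", "peer_median", "peer_count"]].to_csv(
        f"benchmark_{company}.csv", index=False
    )
```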

But there is a level above this enviable market position. It’s when data actually starts to drive commercial transactions. I have worked with companies whose data products actually drive the bonus compensation of salespeople and managers across entire industries. I have seen data products that are used to set valuations of companies for sale. And of course, there are industry giants such as Nielsen, with its well-known television ratings that drive billions in ad dollars.

The commonality among this rarefied group of data providers is that their data is survey-driven. These companies leverage not only their neutrality and impartiality but also their ability to gather data that no individual organization could easily or credibly collect on its own. In many cases, these data companies are gathering customer and user experiences and actions.

Yes, for the right kind of opportunity, a simple survey can be turned into an extraordinarily valuable data product. Again, the key drivers of such opportunities are: 1) a need to gather customer/subscriber/user opinions, ratings or activities; 2) information that is difficult for industry players to gather themselves; and 3) a need for trust and objectivity in the collected data.

It may sound hard and complicated, but in the right situations, a well-executed survey can be the path to a very valuable data franchise.