Building Databases

Data Flipping

One of the best things about government databases is that even when the government agency makes the database available on its website for free, it isn’t very useful. That’s because government agencies put these databases online for regulatory or compliance reasons. They’re designed for looking up known entities, because the expectation is that you’re checking the license status of a company, or perhaps its compliance history.

Occasionally, a government agency will get ambitious and permit geographic searches, but in these cases, there are real limitations. That’s because the underlying data were collected for regulatory, not marketing purposes. So, for example, a manufacturer with 30 plants around the country may only appear in one ZIP code because the government agency wants filings only from headquarters locations.

Taking a regulatory database and changing it into, say, a marketing database is something I call “flipping the file,” because while the underlying data remains the same, the way the database is accessed is different. Sometimes this is as simple as offering more search options; sometimes it involves normalizing or re-structuring the data to make it more useful and accessible. As just one example, a company called Labworks built a product called the RIA Database. It started with an investment advisor database that the SEC maintains for regulatory purposes, and then flipped the file to make the same database useful to companies that wanted to market to investment advisors. There are hundreds of data publishers doing this in different markets, and as you might expect, it’s a very attractive model since the underlying data can be obtained for free.
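To make the idea concrete, here is a minimal Python sketch of what flipping a file can look like. The file name and field names (crd_number, firm_name, state, assets_under_mgmt) are my own inventions standing in for whatever a real regulatory extract actually contains:

    import csv
    from collections import defaultdict

    # The regulatory view: one record per registration number, looked up by ID.
    by_registration = {}
    # The flipped, marketing-oriented view: the same records indexed by state.
    by_state = defaultdict(list)

    with open("adviser_filings.csv", newline="") as f:
        for row in csv.DictReader(f):
            by_registration[row["crd_number"]] = row   # original access path
            by_state[row["state"]].append(row)         # new access path

    # A marketer can now pull every adviser in a territory, a query the
    # compliance-oriented source site never offered.
    for firm in by_state.get("TX", []):
        print(firm["firm_name"], firm["assets_under_mgmt"])

The underlying records never change; only the access path does.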

In addition to simply flipping a file, you can also enhance a database. The shortcoming of many government databases is that they focus on companies, not people, so while there may be a wealth of information on the company, data buyers typically want to know the names of contacts at those companies. Companies such as D&B and ZoomInfo do a brisk business licensing their contact information to be appended onto government databases of company information.
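A contact append of this kind is essentially a join on a shared company identifier. Here is a rough Python sketch of that step, with hypothetical files, keys and field names (a DUNS-style number is assumed as the match key):

    import csv

    # Licensed third-party contacts, keyed by a company identifier.
    contacts = {}
    with open("licensed_contacts.csv", newline="") as f:
        for row in csv.DictReader(f):
            contacts.setdefault(row["duns"], []).append(row)

    # Walk the government company file and append any matching contacts.
    with open("gov_companies.csv", newline="") as f, \
         open("enhanced.csv", "w", newline="") as out:
        writer = csv.writer(out)
        writer.writerow(["company", "contact_name", "title"])
        for row in csv.DictReader(f):
            for person in contacts.get(row["duns"], []):
                writer.writerow([row["company_name"], person["name"], person["title"]])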

This is one of the truly magical aspects of the data business. Databases built for one reason can often be re-purposed for an entirely different use. And re-purposing can involve something as simple as a new user interface. This magic isn’t limited to government data, of course. Another great place to look for flipping opportunities is so-called “data exhaust”: data created in the course of some other activity, and thus not considered valuable by the entity creating it. You can even license data from other data providers and re-purpose it. There are a number of mapping products, for example, that take licensed company data and essentially create a new user interface by displaying the data in a map context.

Increasingly, identifying the data need is as important as identifying the data source. With data, it’s all in how you look at it. 

Standard Stuff Is Actually Cool

In the not-too-distant past, there was something close to an agreed-upon standard for the user interface for software applications. Promoted by Microsoft, it is the reason that so much software still adheres to conventions such as a “file” menu in the upper left corner of the screen.

The reason Microsoft promoted this open standard is that it saw clear benefit in bringing order out of chaos. If most software functioned in largely the same way, users could become comfortable with new software faster, meaning greater productivity, reduced training time and associated cost, and greater overall levels of satisfaction.

Back up a bit more and you can see that the World Wide Web itself represented a standard: it provided one path to access all websites, which functioned in all critical respects in the same way. Before that, companies with online offerings had varying login conventions, different communications networks, and totally proprietary software that looked like nobody else’s software. Costs were high, learning curves were steep and user satisfaction was low.

There are clear benefits to adhering to high-level user interface standards, even ones that bubble up out of nowhere to become de facto standards. Consider the term “grayed out.” By virtue of this de facto standard, users learned that website features and functions that were “grayed out” were inaccessible to them, either because the user hadn’t paid for them, or because they weren’t relevant to what the user was currently doing within the application. Having a common understanding of what “grayed out” meant was important to many data publishers because it was a key part of the upsell strategy.

That’s why I am so disappointed to see the erosion of these standards. On many websites and mobile apps, a “grayed out” tab now represents the active tab the user is working in, not an unavailable one. And virtually all other standards have evaporated as designers have been allowed to favor “pretty” and “cool” over functional and intuitive. I could go on for days about software developers who similarly run amok, employing all kinds of functionality mostly because it is new, with absolutely no consideration for the user experience. What we are doing is reverting to the balkanized state of applications software before the World Wide Web.

And while I call out designers and developers, the fault really lies with the product managers who favor speed above all, or who themselves start to believe that “cutting edge” somehow confers prestige or competitive advantage. Who’s getting left out of the conversation? The end-user customer. What does the customer want? At a basic level the answer is simple: a clean, intuitive interface that allows them to access data and get answers as quickly and painlessly as possible. Standard stuff, and the best reason that being different for the sake of being different isn’t in your best interest.

Where the Value Is in Visual Data

The New York Times recently reported on the results of a fascinating project conducted at Stanford University. Using over 50 million images drawn from Google Street View, along with ZIP code data, the researchers were able to associate automobile ownership preferences with voting patterns. For example, the researchers found that the type of vehicles most strongly associated with Republican voting districts are extended-cab pickup trucks.

While this particular finding may not surprise you, the underlying work represents a programmatic tour de force, because artificial intelligence software was used to identify and classify the vehicles found in these 50 million images. The researchers used automotive experts to identify specific makes and models of cars from a sample of the images, giving the software a basis for learning to find and identify vehicles on its own, regardless of the angle of the photo, shadows and a host of other factors that make this anything but an easy task.

This project is believed to represent the first time that images have been used on a large scale to develop data. And while this image identification is a technically impressive example of both artificial intelligence and Big Data, most of the really useful insights come from associating the findings with other datasets, what I like to refer to as Little Data.

Think about it. The artificial intelligence software is given as input an image, and the ZIP code associated with that image. The software identifies an automobile make and model from the image, and creates an output record with two elements: the ZIP code and a normalized make and model description of the automobile. With this, you can explore auto ownership patterns by geography. But with just a few more steps, you can go a lot further.
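In code terms, the output record might look something like this minimal Python sketch. The class and field names are my own invention, not the study’s actual schema:

    from dataclasses import dataclass

    @dataclass
    class VehicleSighting:
        zip_code: str      # ZIP code associated with the source image
        make_model: str    # normalized make/model, e.g. "Ford F-150 ext. cab"

    # 50 million images boil down to a flat table of records like these,
    # which is what makes the downstream "little data" joins possible.
    sightings = [
        VehicleSighting("05401", "Toyota Prius"),
        VehicleSighting("79901", "Ford F-150 ext. cab"),
    ]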

You can use “little data” government and private datasets to link ZIP code to voting districts and thus voting patterns. With this information, you can determine that people living in Republican districts prefer extended-cab pickup trucks.

You can also use the ZIP code in the record to link to “little data” Census demographic data summarized at ZIP level. With this, you can correlate car ownership patterns to such things as income, race, education and ethnicity. Indeed, the study found it could predict demographics and voting patterns based on auto ownership.

And you can go further. You can link your normalized automobile make and model data to “little data” datasets of automobile technical specifications, which is how the study determined, for example, that based on miles per gallon, Burlington, Vermont is the greenest city in the United States.
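Here is a rough Python sketch of that linking chain. Every lookup table and value below is an illustrative stand-in for the real voting-district, Census and vehicle-specification datasets:

    # Little-data lookup tables (illustrative values only).
    zip_to_district = {"79901": "D-16", "05401": "D-01"}       # ZIP -> district
    district_lean = {"D-16": "R", "D-01": "D"}                 # district -> lean
    zip_income = {"79901": 48000, "05401": 62000}              # ZIP -> income
    model_mpg = {"Ford F-150 ext. cab": 20, "Toyota Prius": 52}  # model -> MPG

    # Records as they come out of the image-classification step: (ZIP, model).
    sightings = [("05401", "Toyota Prius"), ("79901", "Ford F-150 ext. cab")]

    for zip_code, model in sightings:
        district = zip_to_district.get(zip_code)
        print(model,
              "| district:", district,
              "| lean:", district_lean.get(district),
              "| income:", zip_income.get(zip_code),
              "| mpg:", model_mpg.get(model))

Each join adds a new analytical dimension to the same two-field record.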

Using artificial intelligence on a Big Data image database to build a normalized text database is impressive. But all the real insights in this study could only be developed by linking Big Data to Little Data to allow for granular analysis.

While Big Data and artificial intelligence are getting all the breathless coverage, we should never forget that Little Data is what’s providing the real value behind the scenes.  

Survey Says ... It Depends

The data for data products can come from a wide array of sources. Traditionally, datasets were compiled through primary research, usually via questionnaires or by phone. There is also secondary research, where staff gathers data using online sources. There are also public domain databases that can be leveraged. We have also seen a rise in technologically-driven data gathering, such as web harvesting. And a growing number of data publishers license third-party data to augment their data gathering. Almost anything goes these days, and the savviest data publishers are mixing and matching their collection techniques for maximum effectiveness (a topic that will be addressed at the Business Information and Media Summit in November).

This brings me to a question I have been asked more than a few times: can survey data be turned into a data product? When I talk about surveys, I mean the types of surveys most of us do routinely: you ask, say, 20,000 restaurant owners to answer questions about their businesses and the market generally, and if you’re lucky, you’ll get 1,000 responses. My take? While a survey does in fact generate data, I don’t think a survey automatically qualifies as a commercial data product. The reason is subtle, but important.

Much of the value of a data product is in its granularity and specificity. Typically, a data product focuses on organizations, individuals or products and attempts to collect as much detail as possible on each unit of coverage, as comprehensively as possible. Most surveys, by contrast, are anonymous by nature and hit-and-miss in coverage. Using our earlier example, a survey of restaurants might well be useful and valuable even if it didn’t get any response from Taco Bell operators, but a restaurant database without any listings of Taco Bell locations would have no credibility. Since most surveys promise anonymity to increase participation rates, only aggregate reporting is possible. From my perspective, surveys of this type are useless as data products.

But not all surveys are the same. Some surveys ask respondents to list the vendors they use, or which of a specified set of companies they like the most and the least. Surveys where you ask the anonymous respondent to list or opine on specific companies or products actually can yield a very compelling type of commercial data product. That’s because the companies or products that come out of the survey effort are not anonymous. If the owner of the Blue Duck restaurant tells you that she likes National Restaurant Supply, you’re developing lots of valuable data about National Restaurant Supply that you can publish, even while keeping Blue Duck restaurant anonymous. Your survey data can report on attitudes or adoption or market share of specific products or firms and compare them and rank and rate them. That’s very valuable because the data are highly proprietary, difficult to collect and actionable.
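A minimal Python sketch of the idea, with invented respondents, vendors and scores: the respondent IDs are never published, so anonymity is preserved, while the named vendors are tallied into publishable ratings.

    from collections import defaultdict

    # (respondent_id, vendor_named, satisfaction 1-5)
    responses = [
        ("r001", "National Restaurant Supply", 5),
        ("r002", "National Restaurant Supply", 4),
        ("r003", "Acme Foodservice", 2),
    ]

    totals = defaultdict(lambda: [0, 0])   # vendor -> [sum of scores, count]
    for _, vendor, score in responses:
        totals[vendor][0] += score
        totals[vendor][1] += 1

    # The publishable product: vendors ranked by average rating.
    for vendor, (s, n) in sorted(totals.items(),
                                 key=lambda kv: -kv[1][0] / kv[1][1]):
        print(f"{vendor}: {s / n:.1f} avg across {n} responses")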

My bottom line on surveys is that “traditional format” surveys with anonymous submissions and aggregate reporting are truly surveys, not data products. But if your survey asks respondents to tell you how much they use or like specific companies or products – you’ve got yourself the makings of a data product!

Inexhaustible Data Opportunities

A new product from LexisNexis Risk Solutions monitors newly listed homes for sale on behalf of home insurance companies to alert them when a customer is preparing to move. The insurer can use this advance notice to contact these customers to help retain their business. 

This is a great idea. For a long time now, data companies have offered so-called “new mover” databases, identifying people who have recently moved into a new home. These are prime prospects because they’re in the market for all sorts of things, sometimes urgently, meaning the first offer they get stands a strong chance of being accepted.

This LexisNexis product shows how to combine databases to up your game. What could be a better prospect than a new mover? How about a pre-mover! While LexisNexis is focused on insurance companies, there are all sorts of companies that would be very interested to have at-risk current customers identified for them so that they can focus their customer retention efforts.

What makes this big leap in sales targeting possible in this case isn’t cutting-edge technology. It’s having the insight to see that data produced by one type of organization (in this case, real estate agents) is valuable to another type of organization (in this case, insurance companies). Add in some additional value by matching the database of one organization to the database of another, and you almost assuredly have a nice business opportunity for the taking.
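Here is a deliberately simple Python sketch of that matching step, flagging current customers whose address appears in a feed of newly listed homes. The normalization below is crude by design; real products use far more robust address matching and identifiers:

    def normalize(addr: str) -> str:
        # Uppercase, strip periods, collapse whitespace.
        return " ".join(addr.upper().replace(".", "").split())

    # Hypothetical feed of newly listed homes.
    new_listings = {normalize(a) for a in [
        "123 Oak St., Springfield, IL",
        "9 Elm Ave., Dayton, OH",
    ]}

    # Hypothetical customer file: (customer_id, home address).
    customers = [
        ("C-1001", "123 Oak St, Springfield, IL"),
        ("C-1002", "42 Maple Dr, Austin, TX"),
    ]

    # Customers whose home just hit the market are likely pre-movers,
    # prime targets for a retention outreach.
    at_risk = [cid for cid, addr in customers if normalize(addr) in new_listings]
    print(at_risk)  # ['C-1001']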

That’s what is so exciting and fun about the data business today: with so many new databases coming together, opportunity is everywhere. The key is to look at every new database you see and ask, “who else could use these data, and what could I do to these data to make them even more valuable to others?”

The people who create databases are almost always trying to solve a specific, single problem or need. Flip, spin, match or sometimes simply re-sort these databases, and you can often solve someone else’s problem or need. Am I talking about what’s known as data exhaust? To some extent yes, but some of the biggest and most interesting opportunities are right in front of us in plain sight – far less complex and challenging than most of the data exhaust opportunities I have seen.