Make the Product, Not Just the Raw Material

Twitter exhausts me. Even though I feel I have been very selective in who I choose to follow, the volume is overwhelming. Every time I go to review my Twitter feed, I waste far too much time in an exercise to separate the wheat from the chaff to find useful nuggets of news or insight. Twitter ought to be incredibly valuable, but in its current design, users find that to overcome the sheer volume of tweets to get noticed, they have to pump out an increasing number of tweets themselves. It’s an endless game of volumetric one-upmanship that is ultimately self-defeating.

A recent article in the Wall Street Journal takes the view that Twitter is very good as a raw content creation platform, but a failure at making that content useful or even intelligible. We know that Twitter content has value: consider the number of companies looking for trends, breaking news and other signals to gain an edge and generate profits. But it is companies other than Twitter that are adding the value and making the money.

This got me thinking. Many data publishers still focus on the quantity of the data they provide, not its value. And this inevitably leads to a mentality of selling data by the pound. These publishers deliver lots of data, and their customers figure out what to do with it. For a long time, this was a good business approach for publishers, but hardly an optimized one.

By wrapping their content in software, publishers have added value by allowing customers to act on their data more powerfully. But while data-software integration has been a boon for data publishers, there may still be entirely new products and even entirely new businesses hiding in your data. There are clues to this. Do you have lots of consultants buying your data year after year? Do they renew easily, rarely complaining about price increases? Chances are at least a few of them are productizing your data in some way. Get familiar with their specialties and their services, and you can often come away with new product ideas.

Have you ever changed your file layouts or stopped delivering a specific data field, only to get immediate panic calls from some of your customers? Chances are, they’ve built software around your content and are doing something very valuable with it. A few casual inquiries about how they’re using your data will often yield tremendous insights. Do you have whole categories of customers where you have no idea why they buy your data? Chances are, it will be worth your time to find out. It’s not unusual to find that markets you never considered are making valuable use of your data.

Data-software integration is great, but in the majority of cases, publishers are simply helping their customers better manipulate their data. But there’s a whole additional level of value that can be created by turning your data into finished products. And while I am not arguing that you should try to run all your customers out of business, if some of them have found a way to make money by re-formatting, augmenting or manipulating your data to add value to it, I’d argue that such opportunities properly belong to the owner of that data. And your subscriber file is often the first and best place to look for clues to such opportunities.

Credit Scores: Not Just for Credit Anymore

A credit score, like it or not, is something that exists for all of us. Pioneered by a company called Fair Isaac (now just known as FICO), the credit score provided powerful advantages to credit granters in two key ways. First, using massive samples of consumer payment data, FICO analysts were able to tease out what characteristics were predictive of an individual’s willingness to repay their debts. With this knowledge, the company built sophisticated algorithms to automatically assess and score consumers. This approach was obviously more efficient than manual credit reviews by humans, but it offered consistency and dependability as well. Second, FICO reduces your credit history to a single number in a fixed range. The higher the number, the better your credit. This innovation made it possible for banks and others to write software to offer instant credit decisions, online credit approvals and more. Moreover, a consistent national scoring system made it easy for banks to both manage and benchmark their credit portfolios, as well as watch for early signs of credit erosion.

There’s little doubt that credit scoring was a brilliant innovation, but is it so specialized it can’t be replicated elsewhere? Well, it appears that creative data types are seeing scoring opportunities everywhere these days.

Consider just one example: computer network security scores. There are several companies (and FICO just acquired one of them) that use a variety of publicly available inputs to score the computer networks of companies to assess their vulnerability to hackers. Is this even possible to do? A lot of smart people in the field say it is, and pretty much everyone agrees the need is so great that even if these scores aren’t perfect, they’re better than nothing.

You may also be asking whether or not there is a business opportunity here and indeed there is. Companies buy their own scores to assess how they are doing and to benchmark themselves against their peers. Insurance companies writing policies to cover data hacks and other cybercrimes are desperate for these objective assessments. And increasingly, companies are asking potential vendors to provide them with their scores to make sure all their vendors are taking cybersecurity seriously.

While scoring started with credit, it certainly doesn’t end there. Are there scoring opportunities in your own market? Put on your thinking cap and get creative!

eBay Revamps By Adding Structure

eBay, the giant online marketplace/flea market, is reacting to lackluster growth in an interesting way: with a new focus on structured data. The goal, simply put, is to make it easier for users to find merchandise on its site.

Currently, eBay merchants upload free-text descriptions of the products they are offering for sale. This works reasonably well, but as we all know, searching on unstructured text is ultimately a hit-or-miss proposition. And with over one million merchants on eBay doing their own data entry with very few rules and little data validation, you can imagine the number of errors that result, ranging from typos to inconsistent terminology to missing data elements. The consequence of this is that buyers can’t efficiently and confidently discover all items available for sale, and sellers can’t sell their products because they are not being seen.

It may seem odd that after several decades in business, eBay is just getting around to this. But in fact it hasn’t been standing still. Rather, it’s been investing its resources in perfecting its search software, trying to use algorithms to overcome weaknesses in the descriptive product data. And while eBay has made great strides, this shift to structured data is really an admission that there are limits to free text searching.

Granular, precise search results can’t be better or more accurate than the underlying data. If you want to be able to distinguish between copper and aluminum fasteners in your search results, you need your merchants to specify copper or aluminum, spell the words correctly and consistently, and have agreement on how to handle exceptions such as copper-plated aluminum. Ideally, you also want your merchants to tag the metal used in the fastener so that you don’t have to hunt for the information in a block of text, with the associated chance of an erroneous result.
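To make the fastener example concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration: the Listing record, the material tag, the sample listings and both search functions are hypothetical and have nothing to do with eBay's actual systems. The choice to tag copper-plated aluminum as "aluminum" is just one possible convention.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Listing:
    title: str                       # free text typed by the merchant
    material: Optional[str] = None   # structured tag (hypothetical field)


# Made-up sample listings, including a typo and a plated-metal edge case.
LISTINGS = [
    Listing("Copper fasteners, box of 100", material="copper"),
    Listing("Aluminum fasteners (not copper!)", material="aluminum"),
    Listing("Coper fastener assortment", material="copper"),
    Listing("Copper-plated aluminum fasteners", material="aluminum"),
]


def free_text_search(listings, term):
    """Return listings whose title contains the term (case-insensitive)."""
    term = term.lower()
    return [item for item in listings if term in item.title.lower()]


def structured_search(listings, material):
    """Return listings whose material tag equals the requested value."""
    return [item for item in listings if item.material == material]


if __name__ == "__main__":
    print([item.title for item in free_text_search(LISTINGS, "copper")])
    print([item.title for item in structured_search(LISTINGS, "copper")])
```

In this toy data, the free-text search for "copper" returns two aluminum listings and misses the misspelled copper one, while the search on the tag returns exactly the two copper listings. The tag only helps, of course, if merchants fill it in consistently.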

While we’ve come to believe there are no limits to full-text search wizardry, remember the best software in the world breaks down when the data is wrong or doesn’t exist. Google spent many years and millions of dollars trying to build online company directories, before finally admitting that even it couldn’t overcome missing and incorrect data.

Databases and data products are all about structure. Cleaning up and organizing data is slow, expensive and not a lot of fun, but it is a huge value-add. Indeed, one of the biggest complaints of those working in the Big Data arena is that the data they want to analyze is simply too inconsistent and undependable to use.

These days, anyone can aggregate giant pots of data. But increasingly, value is being created by making these pots of data more accessible by adding more structure. This is the essence of data publishing, and something successful data publishers fully appreciate and never forget.  

Time to Get a New Address?

I’ve long been fascinated by unique identifier systems, because while often hard to implement, they can provide enormous value and constitute a great business opportunity. We’re all familiar with the D&B DUNS system, but there are far more identifier systems in use in vertical markets than you might expect. Don’t, for example, try to publish a book without an ISBN number. Similarly, don’t try to get into the advertising specialties business without an ASI number.

Identifier systems are not just for companies. They exist for people too. Physicians in the U.S. have government-issued unique identifiers. LexisNexis has implemented a similar private sector solution for lawyers called the International Standard Lawyer Number (ISLN). And we’re all of course familiar with Social Security numbers. For geographic locations, think about such identifiers as ZIP codes and their value in identifying specific geographic areas.

The power of unique identifiers is that they serve as a sort of numeric lingua franca. Everyone agrees that a specific company, person or location is identified by a single permanent identifier. This removes ambiguity. It makes all sorts of transactions easier and more efficient. It allows for better and more precise record-keeping. And in this data-centric age, it makes matching of datasets easier and more precise. If everyone can agree on a unique identifier system, all sorts of things happen more easily and smoothly. Needless to say, the operator of the identifier system is in a powerful and lucrative position.
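The dataset-matching point is easy to see in a small sketch. The two record sets below, the company_id values and the function names are all made up for illustration; the company_id field stands in for any shared identifier and is not a real DUNS number or any other real scheme.

```python
# Two hypothetical datasets describing the same two companies.
billing = [
    {"company_id": "100001", "name": "Acme Corp.", "balance": 1200},
    {"company_id": "100002", "name": "Globex Inc", "balance": 0},
]
crm = [
    {"company_id": "100001", "name": "ACME Corporation", "contact": "J. Smith"},
    {"company_id": "100002", "name": "Globex, Incorporated", "contact": "R. Jones"},
]


def match_by_name(left, right):
    """Pair records whose name strings are exactly equal."""
    right_by_name = {record["name"]: record for record in right}
    return [
        (record, right_by_name[record["name"]])
        for record in left
        if record["name"] in right_by_name
    ]


def match_by_id(left, right):
    """Pair records that share the same identifier."""
    right_by_id = {record["company_id"]: record for record in right}
    return [
        (record, right_by_id[record["company_id"]])
        for record in left
        if record["company_id"] in right_by_id
    ]


if __name__ == "__main__":
    print(len(match_by_name(billing, crm)))  # name spellings differ
    print(len(match_by_id(billing, crm)))    # identifiers agree
```

With these sample records, matching on name finds no pairs because each file spells the names differently, while matching on the identifier pairs both companies. Real-world name matching can do better than exact comparison, but it becomes a fuzzy-matching problem that a shared identifier avoids entirely.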

But how ambitious can you get with a non-governmental unique identifier system? After all, if you can’t mandate adoption of your identifiers, you’ve got to build voluntary participation. That’s tough in a narrow, vertical market. Imagine trying to build participation on a broad-based, global basis.

That’s why we were intrigued to run across perhaps the most ambitious attempt at a unique identifier system we have seen. It’s operated by a company called What3Words. Its goal is to assign a unique identifier to every inch of the planet, in 3-meter-square blocks. Further, much like the Internet’s Domain Name System, What3Words assigns each block a three-word name instead of numbers, believing the system will be easier to use with words rather than hard-to-remember random numbers or latitude and longitude coordinates.
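For a feel of how three words can stand in for a number, here is a toy sketch. It is emphatically not What3Words' actual algorithm, which I have not seen; the four-word list and both function names are invented for illustration, and the step of turning a latitude and longitude into a block number is left out. The only idea shown is that a list of n words yields n × n × n distinct three-word names, each of which can be converted back to its block number.

```python
def cell_to_words(cell_index, wordlist):
    """Encode a block number as three words (base-n digits, n = list size)."""
    n = len(wordlist)
    if not 0 <= cell_index < n ** 3:
        raise ValueError("cell index out of range for this word list")
    first, rest = divmod(cell_index, n * n)
    second, third = divmod(rest, n)
    return (wordlist[first], wordlist[second], wordlist[third])


def words_to_cell(words, wordlist):
    """Decode three words back into the block number they were made from."""
    n = len(wordlist)
    position = {word: index for index, word in enumerate(wordlist)}
    first, second, third = (position[word] for word in words)
    return first * n * n + second * n + third


# A made-up four-word list: enough for 4 * 4 * 4 = 64 blocks.
WORDS = ["apple", "brick", "cloud", "delta"]

if __name__ == "__main__":
    name = cell_to_words(27, WORDS)
    print(name)                        # ('brick', 'cloud', 'delta')
    print(words_to_cell(name, WORDS))  # 27
```

Covering the whole planet in 3-meter blocks would obviously take a far longer word list, but the arithmetic is the same.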

You may be saying, “cool, but who needs this?” Well, start with obvious examples of aid agencies trying to serve areas of rural Africa, where no neat systems like ZIP codes exist. Indeed, the founders of What3Words are quick to note that 75% of the population of the earth essentially don’t exist because they have no physical address. Similarly, hikers and travelers will benefit from being able both to find and describe remote areas. And with much talk of delivery by drones in the near future, a uniform global geo-identifier could be very useful. A consistent system also benefits government administration, development of consistent and comparable statistics, and much more. Those of us who regularly deal with international addresses know they are an inconsistent mess, and these are addresses in advanced, developed countries. There are vast swaths of the planet that still lack addressing systems at all.

It’s a big project, but there’s a big need. And hopefully this brief overview inspires some big thinking about the potential of unique identifiers to make all kinds of activities take place more smoothly and efficiently, with some of those productivity savings accruing to the operator of the identifier system.

Proposed Bill Puts the OPEN in Government Data

Should federal government data be open to the public? Perhaps a better way to frame the question is whether or not the federal government should make public data publicly available. Because databases compiled by the government are, with few exceptions, already open to the public, if you can track them down in the first place. And this problem with discovering government datasets has long been the rub.

The federal government collects data for many reasons, but generally data gathering is for regulatory, compliance or statistical reasons. When this data gathering relates to business entities, there’s usually a business opportunity to be found. That’s because government agencies usually collect data for one specific purpose only. For example, the Federal Aviation Administration maintains a database of all airplanes that are licensed for operation in the United States. It collects a lot of data about both the plane and its owner, but its overall objective is simply to keep a record of whether or not a given plane is licensed to operate. Even if it puts this database online for public access, your ability to search the database is limited to looking up specific airplanes by tail number or owner. This is the compliance focus of government manifesting itself. But that’s great news for commercial data publishers who can get the underlying database and add tremendous value simply by making the data parametrically searchable. Online government databases are almost always designed to help the user find information on a single, known entity. Parametric search creates a powerful sales prospecting tool. Suddenly, the database can be searched by make and model and age of the plane, with the ability to limit search results to specific geographies.
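The difference between single-entity lookup and parametric search can be sketched in a few lines of Python. The Aircraft record, its field names, the tail numbers, the owners and both functions are hypothetical; this is not the FAA's actual file layout, just an illustration of the two kinds of access.

```python
from dataclasses import dataclass


@dataclass
class Aircraft:
    tail_number: str
    owner: str
    make: str
    model: str
    year_built: int
    state: str


# Made-up sample records standing in for a government registry file.
REGISTRY = [
    Aircraft("N101AA", "Blue Sky Charters", "Cessna", "172", 1998, "TX"),
    Aircraft("N202BB", "Prairie Air LLC", "Cessna", "172", 2015, "OK"),
    Aircraft("N303CC", "Blue Sky Charters", "Piper", "PA-28", 1985, "TX"),
    Aircraft("N404DD", "Lone Star Flyers", "Cessna", "182", 1979, "TX"),
]


def lookup_by_tail_number(registry, tail_number):
    """Compliance-style lookup: find one known aircraft, or None."""
    for aircraft in registry:
        if aircraft.tail_number == tail_number:
            return aircraft
    return None


def parametric_search(registry, make=None, model=None, state=None,
                      built_before=None):
    """Prospecting-style search: every aircraft matching all given criteria."""
    results = []
    for aircraft in registry:
        if make is not None and aircraft.make != make:
            continue
        if model is not None and aircraft.model != model:
            continue
        if state is not None and aircraft.state != state:
            continue
        if built_before is not None and aircraft.year_built >= built_before:
            continue
        results.append(aircraft)
    return results


if __name__ == "__main__":
    print(lookup_by_tail_number(REGISTRY, "N202BB").owner)
    prospects = parametric_search(REGISTRY, make="Cessna", state="TX",
                                  built_before=2000)
    print([aircraft.owner for aircraft in prospects])
```

The first call answers "is this plane registered, and to whom?" The second answers "who in Texas owns an older Cessna?", which is the question a salesperson actually wants to ask. The underlying data is identical; only the access path changed.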

Needless to say, federal government databases can offer huge business opportunities because the government has done all the compilation work, at its own expense, and even keeps the database updated for you. But again, the challenge is finding and accessing these databases in the first place. Government agencies have no incentive to merchandise their internal databases, and many continue to resist opening their datasets to the public, usually out of bureaucratic fear or inertia.

Yes, there is data.gov, a much-heralded federal government initiative to not only move more data online, but to put it all in a central place. But the datasets of interest to commercial data publishers will rarely be found there. However, if you’re interested in data on migratory butterflies in Oklahoma, data.gov is a great place to go.

That’s why I am excited by the OPEN Government Data Act (OPEN Data Act, S. 2852, H.R. 5051), which would mandate that all federal government agencies make all of their datasets immediately available for public use, subject only to a handful of exceptions. This is a bill worth watching and supporting. Fortunes have already been made by commercial data publishers with the savvy and persistence to navigate the federal labyrinth. The OPEN Data Act would level the playing field and open even more opportunities to leverage government data for commercial applications. What’s not to like?