eBay, the giant online marketplace/flea market, is reacting to lackluster growth in an interesting way: with a new focus on structured data. The goal, simply put, is to make it easier for users to find merchandise on its site.
Currently, eBay merchants upload free-text descriptions of the products they are offering for sale. This works reasonably well, but as we all know, searching on unstructured text is ultimately a hit-or-miss proposition. And with over one million merchants on eBay doing their own data entry with very few rules and little data validation, you can imagine the number of errors that result, ranging from typos to inconsistent terminology to missing data elements. The consequence is that buyers can’t efficiently and confidently discover all items available for sale, and sellers can’t sell their products because they are not being seen.
It may seem odd that after several decades in business, eBay is just getting around to this. But in fact it hasn’t been standing still. Rather, it’s been investing its resources in perfecting its search software, trying to use algorithms to overcome weaknesses in the descriptive product data. And while eBay has made great strides, this shift to structured data is really an admission that there are limits to free-text searching.
Granular, precise search results can’t be better or more accurate than the underlying data. If you want to be able to distinguish between copper and aluminum fasteners in your search results, you need your merchants to specify copper or aluminum, spell the words correctly and consistently, and agree on how to handle exceptions such as copper-plated aluminum. Ideally, you also want your merchants to tag the metal used in the fastener so that you don’t have to hunt for the information in a block of text, with the associated chance of an erroneous result.
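To make the fastener example concrete, here is a minimal sketch in Python of the difference between hunting through free text and filtering on a tagged attribute. It is purely illustrative: the `Listing` class, the `metal` field, the `ALLOWED_METALS` vocabulary and the sample listings are all assumptions made up for this example, and say nothing about how eBay actually stores or searches its data.

```python
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical controlled vocabulary that merchants must choose from.
ALLOWED_METALS = {"copper", "aluminum", "copper-plated aluminum"}


@dataclass
class Listing:
    description: str             # free text, exactly as the merchant typed it
    metal: Optional[str] = None  # hypothetical structured tag


def free_text_search(listings: List[Listing], term: str) -> List[Listing]:
    """Naive substring match on the description, ignoring case."""
    term = term.lower()
    return [item for item in listings if term in item.description.lower()]


def structured_search(listings: List[Listing], metal: str) -> List[Listing]:
    """Exact match on the tagged attribute, validated against the vocabulary."""
    if metal not in ALLOWED_METALS:
        raise ValueError(f"unknown metal: {metal!r}")
    return [item for item in listings if item.metal == metal]


# Invented sample data for illustration only.
listings = [
    Listing("Copper rivets, pack of 50", metal="copper"),
    Listing("Coper rivets, box of 100", metal="copper"),  # typo in the text
    Listing("Aluminum screws with copper plating", metal="copper-plated aluminum"),
    Listing("Aluminium bolts", metal="aluminum"),  # spelling variant
]

text_hits = free_text_search(listings, "copper")
tag_hits = structured_search(listings, "copper")

print([item.description for item in text_hits])
print([item.description for item in tag_hits])
```

In this toy data set, the free-text search for "copper" misses the misspelled listing and picks up the plated aluminum screws, while the tagged search returns exactly the two copper items. The tag only helps, of course, if merchants fill it in correctly, which is why validation against an agreed vocabulary matters as much as the field itself.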
While we’ve come to believe there are no limits to full-text search wizardry, remember that even the best software in the world breaks down when the data is wrong or doesn’t exist. Google spent many years and millions of dollars trying to build online company directories before finally admitting that even it couldn’t overcome missing and incorrect data.
Databases and data products are all about structure. Cleaning up and organizing data is slow, expensive and not a lot of fun, but it is a huge value-add. Indeed, one of the biggest complaints of those working in the Big Data arena is that the data they want to analyze is simply too inconsistent and undependable to use.
These days, anyone can aggregate giant pots of data. But increasingly, value is being created by making these pots of data more accessible by adding more structure. This is the essence of data publishing, and something successful data publishers fully appreciate and never forget.