Form Follows Function

Numerous online marketing trade associations have announced their latest initiative to bring structure and transparency to an industry that can only be called the Wild, Wild West of the data world: online audience data. Their approach offers some useful lessons to data publishers.

At their brand-new one-page website, this industry coalition is introducing its “Data Transparency Label.” In an attempt to be hip and clever, the coalition has modeled its data record on the familiar nutrition labels found on most food packaging today. It’s undeniably cute, but it’s a classic case of form not following function. Having decided on this approach, the designers of this label immediately boxed themselves in as to what kind and how much data they could present to buyers. I see this all the time with new data products: so much emphasis is placed on how the data looks, its visual presentation, that important data elements often end up getting minimized, hidden or even discarded. Pleasing visual presentation is desirable, but it shouldn’t come at the expense of our data.

The other constraint you immediately see is that this label format works great if an audience is derived from a single source by a single data company. But the real world is far messier than that. What if the audience is aggregated from multiple sources? What if its value derives from complex signal data that may be sourced from multiple third parties? What about resellers? Life is complicated. This label pretends it is simple. Having spent many years involved with data cards for mailing lists, during which time I became deeply frustrated by the lost opportunities caused by a simple approach used to describe increasingly sophisticated products, I see history about to repeat itself.

My biggest objection to this new label is that its focus seems to be 100% on transparency, with little attention being paid to equally valuable uses such as sourcing and comparison. The designers of this label allude to a taxonomy that will be used for classification purposes, but it’s only mentioned in passing and doesn’t feel like a priority focus at all. Perhaps most importantly, there’s no hint of whether these labels will be offered as a searchable database. There’s a potentially powerful audience sourcing tool here, and if anyone is considering that, they aren’t talking about it.

Take-aways to consider:

·     When designing a new data product, don’t allow yourself to get boxed in by design

·     The real world is messy, with lots of exceptions. If you don’t provide for these exceptions, you’ll have a product that will never reach its full potential

·     Always remember that a good data product is much more than a filing cabinet that is used to look up specific facts. A thoughtful, well-organized dataset can deliver a lot more value to users and often to multiple groups of users. Don’t limit yourself to a single use case for your product – you’ll just be limiting your opportunity.

Regulating by the Numbers

While so many large financial institutions were teetering during the Great Recession, regulators trying to bring stability to the global financial system quickly learned a startling, shocking fact: there was really no way to net out how much money one financial institution owed to another.

The reason for this is that the complex financial trades that banks were engaged in weren’t straightforward bank-to-bank deals. JP Morgan didn’t just do trades with Citibank, for example. Rather, they were done through a web of subsidiaries, many of them set up specifically to be opaque and obscure. And that’s just the banks. Add in hedge funds and other investors, and their offshore companies and subsidiaries that also were designed to be opaque, and you quickly get to mind-numbing complexity. 

With an eye to better regulation and better information during a future financial crisis, an idea was proposed during a 2011 meeting of the G-20 countries to create a numbering system called the Legal Entity Identifier (LEI). The simple idea was that if every legal entity engaged in financial transactions had a unique number, and the record of that legal entity also contained the number for its parent company, it would be easy to roll up these records to see the total financial exposure of any institution.
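
To make the roll-up idea concrete, here is a minimal sketch in Python. It assumes each record carries an entity number, its parent's number and an exposure figure. The record layout, the placeholder IDs and the dollar amounts are all hypothetical, chosen for illustration; they are not the actual LEI record format.

```python
from collections import defaultdict

# Hypothetical records: (entity_id, parent_id, exposure_in_millions).
# These IDs are made-up placeholders, not real LEI codes.
RECORDS = [
    ("BANK-A", None, 100),
    ("BANK-A-SUB1", "BANK-A", 40),
    ("BANK-A-SUB2", "BANK-A-SUB1", 25),
    ("FUND-B", None, 10),
    ("FUND-B-OFFSHORE", "FUND-B", 60),
]


def ultimate_parent(entity_id, parent_of):
    """Follow parent links until reaching an entity with no recorded parent."""
    seen = set()
    while parent_of.get(entity_id) is not None:
        if entity_id in seen:
            raise ValueError(f"cycle in ownership chain at {entity_id}")
        seen.add(entity_id)
        entity_id = parent_of[entity_id]
    return entity_id


def roll_up(records):
    """Sum each entity's exposure under its top-level parent."""
    parent_of = {entity_id: parent for entity_id, parent, _ in records}
    totals = defaultdict(int)
    for entity_id, _, exposure in records:
        totals[ultimate_parent(entity_id, parent_of)] += exposure
    return dict(totals)


totals = roll_up(RECORDS)
print(totals)
```

The whole trick is the parent number on each record: with it, a chain of subsidiaries of any depth collapses to one top-level institution. Without it, or with gaps in registration, the roll-up silently treats an orphaned subsidiary as its own institution.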

While you may never have heard of it, the LEI system actually exists, and most financial institutions now have LEI numbers. There is a push in some countries (in the United States, the Treasury Department is leading the charge) to require all companies to obtain an LEI number, but it’s been slow going so far.

If this discussion has you wondering about the DUNS number from D&B, not to worry: it’s alive and well. It’s also far more evolved and comprehensive than the LEI system. However, because the DUNS system is privately maintained, D&B not unreasonably wants to be paid for its use. This rankles some government agencies that are paying substantial sums to D&B for access to the DUNS system, and more than a few are pushing for broad expansion of the LEI system as a replacement for the DUNS system. Suffice it to say there is a lot going on behind the scenes.

There are a number of free lookup services for LEI records, and the information is in the public domain. Some data publishers may find immediate uses for LEI data, but its fundamental weakness at this point is that it’s hit and miss as to what companies have registered. Still, it’s a database to know about and watch, particularly if you have an interest in company relationships. Over time, it’s likely its coverage and importance will grow.

Fresh Data Sold Here! 

While many successful data publishers obsess about continually adding new features and functionality to their data products, there are lots of good reasons to be regularly evaluating your data as well.

Don’t get me wrong: new features and functionality are critically important, particularly if you have a data product that offers a workflow solution.

But adding new, well-selected data elements can add significant value and appeal as well. Here are a few examples:

Morningstar just enhanced its suite of investment analysis tools by introducing a single new data element: a Carbon Risk Score. This score assesses how vulnerable a company is financially to the transition away from a fossil-fuel-based economy to a lower-carbon economy. Not only does the score hold significant value in its own right, but as an individual and consistently presented data element, it can be used for discovery and filtering by investment analysts. Moreover, as a proprietary piece of information, it gives Morningstar additional differentiation and strengthens its competitive edge.

Data-driven real estate listings sites such as Zillow and Trulia have moved away from tussling over who has the most complete listings to trying to outdo each other with deeper datasets. Various combinations of these sites now give detailed information and ratings on local schools, crime data, traffic data, neighborhood data, walkability data … even data on whether or not a particular home is likely to be a good candidate for solar panels! And in a move I particularly admire, they have gotten major cable and companies to pay to indicate if a particular house is eligible for their services. In the hotly competitive world of real estate data sites, it’s a relentless battle at the data element level, all with the goal of providing the most attractive one-stop shop for prospective homebuyers.

Consider too the intensely competitive market of hotel booking databases. Think of services such as Expedia, TripAdvisor and Oyster. Having exhausted themselves by all claiming to offer the lowest rates, they’re now seeking to differentiate themselves at the data element level. Using filters, site visitors can draw on specific data elements to locate hotels with free wi-fi, that accept pets, that have handicapped access, that are green or sustainable, that are LGBT-welcoming and even hotels that have a party atmosphere.
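
Filtering at the data element level is simple to picture in code. The sketch below assumes each hotel record stores its amenities as individual true/false fields; the hotel names and field names are invented for illustration and don't reflect how any of these sites actually store their data.

```python
# Hypothetical hotel records; every field here is an illustrative assumption.
HOTELS = [
    {"name": "Harbor Inn", "free_wifi": True, "pets_allowed": True, "accessible": True},
    {"name": "City Lodge", "free_wifi": True, "pets_allowed": False, "accessible": True},
    {"name": "Beach Hut", "free_wifi": False, "pets_allowed": True, "accessible": False},
]


def filter_hotels(hotels, **required):
    """Return names of hotels whose fields match every required value."""
    return [
        hotel["name"]
        for hotel in hotels
        if all(hotel.get(field) == value for field, value in required.items())
    ]


pet_friendly_with_wifi = filter_hotels(HOTELS, free_wifi=True, pets_allowed=True)
print(pet_friendly_with_wifi)
```

Each new filter costs one new field, consistently populated. That's why a single well-chosen data element can do so much work: it only becomes searchable once it exists as its own element rather than being buried in a description.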

Features and functionality matter, but a single new and well-chosen data element can add tremendous value, while simultaneously providing competitive advantage and product differentiation. Keep your data fresh of course, but always be on the lookout for fresh new data elements as well.

Data Flipping

One of the best things about government databases is that even when the government agency makes the database available on its website for free, it isn’t very useful. That’s because government agencies put these databases online for regulatory or compliance reasons. They’re designed to search for known entities because the expectation is that you are checking the license status of a company, or perhaps its compliance history.

Occasionally, a government agency will get ambitious and permit geographic searches, but in these cases, there are real limitations. That’s because the underlying data were collected for regulatory, not marketing purposes. So, for example, a manufacturer with 30 plants around the country may only appear in one ZIP code because the government agency wants filings only from headquarters locations.

Taking a regulatory database and changing it into, say, a marketing database, is something I call “flipping the file,” because while the underlying data remains the same, the way the database is accessed is different. Sometimes this is as simple as offering more search options; sometimes it involves normalizing or re-structuring the data to make it more useful and accessible. As just one example, a company called Labworks built a product called the RIA Database. It started with an investment advisor database that the SEC maintains for regulatory purposes, and then flipped the file to make the same database useful to companies that wanted to market to investment advisors. There are hundreds of data publishers doing this in different markets, and as you might expect, it’s a very attractive model since the underlying data can be obtained for free.
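
In its simplest form, flipping the file means adding a new access path to the same records. The sketch below is one way to picture it, assuming a registry that can only be looked up by registration number. The registration numbers, firm names and fields are all hypothetical and are not drawn from the SEC data or the RIA Database.

```python
from collections import defaultdict

# Hypothetical regulatory records, keyed the way a regulator would use them:
# by registration number, for checking a known entity.
REGISTRY = {
    "R-001": {"name": "Alpha Advisors", "state": "NY", "status": "active"},
    "R-002": {"name": "Beta Capital", "state": "CA", "status": "active"},
    "R-003": {"name": "Gamma Wealth", "state": "NY", "status": "suspended"},
}


def lookup(registration_number):
    """The regulatory use case: check one known entity."""
    return REGISTRY.get(registration_number)


def flip_by(field, registry):
    """Build an index from a field's values to the matching registration numbers."""
    index = defaultdict(list)
    for registration_number, record in registry.items():
        index[record[field]].append(registration_number)
    return dict(index)


# The marketing use case: find every active firm in a state.
by_state = flip_by("state", REGISTRY)
ny_active = [
    REGISTRY[number]["name"]
    for number in by_state.get("NY", [])
    if REGISTRY[number]["status"] == "active"
]
print(ny_active)
```

Nothing in the underlying records changed. The only new thing is the index, which is exactly why the model is so attractive when the source data is free.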

In addition to simply flipping a file, you can also enhance a database. The shortcoming of many government databases is that they focus on companies, not people, so while there may be a wealth of information on the company, data buyers typically want to know the names of contacts at those companies. Companies such as D&B and ZoomInfo do a brisk business licensing their contact information to be appended onto government databases of company information.

This is one of the truly magical aspects of the data business. Databases built for one reason can often be re-purposed for an entirely different use. And re-purposing can involve something as little as a new user interface. This magic isn’t limited to government data of course. Another great place to look for flipping opportunities is so-called “data exhaust,” data created in the course of some other activity, and thus not considered valuable by the entity creating it. You can even license data from other data providers and re-purpose it. There are a number of mapping products, for example, that take licensed company data and essentially create a new user interface by displaying data in a map context.

Increasingly, identifying the data need is as important as identifying the data source. With data, it’s all in how you look at it. 

Standard Stuff Is Actually Cool

In the not-too-distant past, there was something close to an agreed-upon standard for the user interface for software applications. Promoted by Microsoft, it is the reason that so much software still adheres to conventions such as a “file” menu in the upper left corner of the screen.

The reason Microsoft promoted this open standard is that it saw clear benefit in bringing order out of chaos. If most software functioned in largely the same way, users could become comfortable with new software faster, meaning greater productivity, reduced training time and associated cost, and greater overall levels of satisfaction.

Back up a bit more and you can see that the World Wide Web itself represented a standard – it provides one path to access all websites that function in all critical respects in the same way. Before that, companies with online offerings had varying login conventions, different communications networks, and totally proprietary software that looked like nobody else’s software. Costs were high, learning curves were steep and user satisfaction was low.

There are clear benefits to adhering to high-level user interface standards, even ones that bubble up out of nowhere to become de facto standards. Consider the term “grayed out.” By virtue of this de facto user standard, users learned that website features and functions that were “grayed out” were inaccessible to them, either because the user hadn’t paid for them, or because they weren’t relevant to what the user was currently doing within the application. Having a common understanding of what “grayed out” meant was important to many data publishers because it was a key part of the upsell strategy.

That’s why I am so disappointed to see the erosion of these standards. On many websites and mobile apps, a “grayed out” tab now represents the active tab the user is working in, not an unavailable tab. And virtually all other standards have evaporated as designers have been allowed to favor “pretty” and “cool” over functional and intuitive. I could go on for days about software developers who similarly run amok, employing all kinds of functionality mostly because it is new and with absolutely no consideration for the user experience. What we are doing is reverting to the balkanized state of applications software before the World Wide Web.

And while I call out designers and developers, the fault really lies with the product managers who favor speed above all, or who themselves start to believe that “cutting edge” somehow confers prestige or competitive advantage. Who’s getting left out of the conversation? The end-user customer. What does the customer want? At a basic level the answer is simple: a clean, intuitive interface that allows them to access data and get answers as quickly and painlessly as possible. Standard stuff, and the best reason that being different for the sake of being different isn’t in your best interest.