Recently signed into law, the Foundations for Evidence-Based Policymaking Act is going to have a big impact on the data business. It contains provisions to open up all non-sensitive government databases and make them easily available in machine-readable, non-proprietary formats. Moreover, every federal agency is now obliged to publish a master catalog of all its datasets in order to make them more readily accessible.
Federal government databases are the gift that keeps on giving. Because they are generally the result of regulatory/compliance activity by the government, they are quite complete, and the data quite trustworthy. Moreover, the great shift online has made it easier for government agencies to require more frequent data updates. And with more data coming to these agencies electronically, the notoriously bad government data entry of years past has largely disappeared. Best of all, you can obtain these databases at little or no charge to use as you please.
However, this new push for open formats is a two-edged sword. Many of the great data companies that have been built in whole or in part on government data gained a significant advantage from the complexity and obscurity of that data. Indeed, government data has been open for decades now – you just needed to know it existed, what it was called and who to talk to in order to get your hands on it. This was actually a meaningful barrier to entry for many years.
While it won’t happen overnight, increased data transparency and availability are likely to create a new wave of industry disruption. These government datasets are catnip to Silicon Valley start-ups because these companies develop software and don’t have the skills or interest to compile data. “Plug and play” data will assuredly attract these new players, and they will cause havoc with many established data providers.
How do you fight back against this coming onslaught? The key is to understand the Achilles heel of these companies. Not only do these companies tend not to understand data, but most of them actively dislike it. That means that you can find competitive advantage by augmenting your data with proprietary data elements or even other public data that might need to be cleaned and normalized. Think about emphasizing historical data, which is often harder for new entrants to obtain. These disruptive players will win every time if the battlefield is around the user interface or fancy reports. Change the battlefield to the data itself, and the advantage shifts back to you.