Google's mission is “to organize the world's information and make it universally accessible and useful.” How does its newest foray into data stack up? Early this year, Google officially launched something it calls Dataset Search. It’s been in public beta since 2018 (I still contend that the concept of “public beta” remains Google’s single greatest technological innovation), but now it’s for real and, according to Google, already contains information on over 25 million datasets.
Dataset Search is loosely tied to Google Scholar, a specialized version of the Google search engine intended to make it easier to search for academic papers. Along those lines, Google sees Dataset Search as something most useful to scholars and data journalists.
Improving discovery of datasets is a worthy and important task. Quite likely, 25 million datasets are only a tiny fraction of what exists online. And in this age of open data, Google is tackling a big task at just the right time.
Anyone can add a dataset to Dataset Search. Include some metadata tags on the relevant webpage (Google reads structured data markup based on the schema.org Dataset vocabulary), and the Google crawler will find them and automatically inject a record into Dataset Search. Is it worth the effort? Well, it’s free, it’s fairly easy to participate, and it’s Google. Google does note that information in Dataset Search is added to the Google Knowledge Graph, meaning it connects Dataset Search records to all the other information it knows about the organization that owns the dataset. Some suspect this may improve your overall Google search ranking, though Google is necessarily playing coy on this point.
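For readers wondering what those tags look like in practice: as I understand Google's developer documentation, the usual approach is a block of schema.org Dataset markup, written as JSON-LD and embedded in the dataset's landing page. What follows is only a minimal sketch in Python of how such a block might be generated. The dataset name, description, URL, organization, and keywords are all hypothetical placeholders, the helper function names are my own, and the set of properties shown is a small illustrative subset. Check Google's current guidelines for which properties it actually requires or recommends.

```python
import json


def dataset_jsonld(name, description, url, creator_name,
                   license_url=None, keywords=()):
    """Build a schema.org Dataset record as a plain dict.

    The properties used here are standard schema.org terms; which ones
    Google requires is not asserted by this sketch.
    """
    record = {
        "@context": "https://schema.org/",
        "@type": "Dataset",
        "name": name,
        "description": description,
        "url": url,
        "creator": {"@type": "Organization", "name": creator_name},
    }
    # Optional properties are added only when supplied.
    if license_url:
        record["license"] = license_url
    if keywords:
        record["keywords"] = list(keywords)
    return record


def to_script_tag(record):
    """Wrap the record in the script tag that carries JSON-LD in HTML."""
    body = json.dumps(record, indent=2)
    # Escape "</" so a stray "</script>" inside a value cannot close the tag.
    # "\/" is a valid JSON escape for "/", so the data is unchanged.
    body = body.replace("</", "<\\/")
    return f'<script type="application/ld+json">\n{body}\n</script>'


# Hypothetical example values, for illustration only.
example = dataset_jsonld(
    name="Example Nonprofit Financials",
    description="Annual financial filings for a sample of nonprofit "
                "organizations, compiled into a single dataset.",
    url="https://data.example.org/nonprofit-financials",
    creator_name="Example Data Publisher",
    license_url="https://creativecommons.org/licenses/by/4.0/",
    keywords=["nonprofit", "finance", "tax filings"],
)

print(to_script_tag(example))
```

The printed script tag would be pasted into the HTML of the page that describes the dataset as a whole. Describing the dataset once, on its landing page, rather than on every individual record page, is the design choice that matters most here.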
What’s in Dataset Search today? I have to say, while it has potential, it’s going through some growing pains. ProPublica has a very good database of financial data on non-profit organizations. However, rather than list the dataset once, ProPublica appears to have coded its pages so that all 800,000 records in its database have become separate records in Dataset Search. Humorously, for some other organizations, a CEO headshot is displayed instead of a company logo. This will all be corrected in time. My biggest disappointment, however, is likely to remain: Dataset Search is a database of databases searchable primarily by full-text queries. There are very few parameters that can be applied to usefully narrow a search, so much like the primary Google search engine itself, you will still have to manually browse through endless search results to find what you want.
I do want to stress that Dataset Search is open to commercial data products. It’s an easy, free way to get some additional online exposure for your products, and if it bumps up your search result rankings, it’s well worth the effort. And as Dataset Search evolves, it may well become an accepted way to discover and source commercial data products. Why not get in on the ground floor?