History Matters

A fascinating article in Technology Review highlights and quantifies a problem that is also an opportunity: disappearing data.

The article reports on a new research study of social media. Analyzing content tied to recent major cultural events (e.g., the Arab Spring), the study found that 11% of it had disappeared within a year and 27% within two years. Beyond that, the study authors calculate that the world loses 0.02% of its culturally significant social media material every day.

Keep in mind that the study authors are not measuring the number of tweets that disappear, but the web pages those tweets link to.

And while this study only looked at headline news events, you've probably had experiences similar to mine: links to news stories and press releases that are no longer active, businesses removing all traces of failed product launches from their websites, online stories that are undated and thus difficult to rely on, news outlets that continue to treat yesterday's news like yesterday's news - and the list goes on.

The short message is that historical knowledge is always valuable, and it becomes even more valuable as it becomes less available. And that's what we are seeing online. Whether by accident or design, more and more content is "aging off" the web, creating opportunities for those companies that hold onto it in an organized fashion.

Yes, there are amazing resources like The Internet Archive that attempt to maintain searchable, historical copies of the entire web, but as you might suspect, this is hard, and there is a limit to the amount of detail it can store. Taking on this task on a vertical market basis, however, is doable, and potentially quite remunerative.

Many data products are designed to provide the latest and most current data, and that's smart. But as you add new data, don't delete the old data. It's amazing what insights can be gleaned by watching the changes to a business over time, and as the art of data analytics evolves, this capability will only get more compelling. But you can't analyze data you don't have. More than a few data companies I know owe their strength and their profitability to the fact that they maintained historical data. And as you become the sole source for content that has disappeared from the rest of the web, you create a proprietary aspect to your content, along with an often insurmountable competitive barrier.
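For readers who maintain a database, here is one way the "add, don't delete" idea might look in practice. This is only a minimal sketch: the HistoryStore class, its method names, and the "acme" sample records are all hypothetical, invented for illustration, and it assumes updates arrive in date order. Each update is stored as a new dated snapshot rather than overwriting the last one, so the current view and the record of changes over time both remain available.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Snapshot:
    """One dated observation of a record; not replaced once stored."""
    as_of: date
    fields: dict


class HistoryStore:
    """Hypothetical append-only store: updates add snapshots, nothing is deleted."""

    def __init__(self):
        self._history = {}  # record id -> list of Snapshot, oldest first

    def update(self, record_id, as_of, fields):
        # Append a new snapshot instead of overwriting the previous one.
        # Assumption: callers supply updates in chronological order.
        self._history.setdefault(record_id, []).append(Snapshot(as_of, dict(fields)))

    def latest(self, record_id):
        # The "most current data" view that most products expose.
        return self._history[record_id][-1].fields

    def changes(self, record_id):
        # Compare consecutive snapshots and list every field whose value changed.
        snaps = self._history[record_id]
        out = []
        for old, new in zip(snaps, snaps[1:]):
            for key in sorted(set(old.fields) | set(new.fields)):
                before, after = old.fields.get(key), new.fields.get(key)
                if before != after:
                    out.append((new.as_of, key, before, after))
        return out


# Made-up sample data for a fictional company.
store = HistoryStore()
store.update("acme", date(2011, 1, 1), {"employees": 40, "ceo": "A. Smith"})
store.update("acme", date(2012, 1, 1), {"employees": 65, "ceo": "A. Smith"})
store.update("acme", date(2013, 1, 1), {"employees": 65, "ceo": "B. Jones"})

for change in store.changes("acme"):
    print(change)
```

The trade-off is storage: keeping every snapshot costs more than keeping only the latest one, but it is what makes the over-time analysis described above possible at all.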

So take a little time to think about what information (specific data points or full-text documents) you can preserve, and how valuable this information might become if it weren't available anywhere else, because that's the trend I see. What's past isn't just prologue; it may also be a new profit center!