Dirty Data


The recent pronouncement from the research firm Gartner that "dirty data is a business problem, not an IT problem," puts a spotlight on an important issue: automating your business processes won't help -- and might even hurt -- if the underlying data is old, inaccurate, poorly fielded or inconsistent.

Data publishers fully appreciate that their value is based on well-managed data. But businesses -- our customers -- continue to avoid the issue, which most of them find confusing if not overwhelming. What we consistently hear from executives at end-user companies is that because their data is "in the computer," keeping it clean is an IT problem. Those of us who have worked with corporate IT departments know that IT folks typically go to absurd lengths to avoid directly touching data, ever.

To their credit, IT departments are increasingly investing in data hygiene software to clean up dirty databases, and there is a growing understanding that the only long-term solution is to catch bad data at input, before it gets into the system. But initiatives on both these fronts have been limited and slow.

This has created a burgeoning opportunity for data publishers, driven by a growing need for clean lookup databases, matching services that help separate good data from bad, and even manual and automated data scrubbing services. Once these companies get their databases in shape, there are further opportunities to sell data augmentation services, or even to provide databases on a turnkey basis to companies that lack the interest or resources to maintain good databases themselves.

As an industry, we have many ways to help tackle the dirty data problem at its roots and make the world of data a lot cleaner, while cleaning up in the process.
