Education Data: Lessons Learned

A recent Reuters story described a new national database of student information. Reportedly built at a cost of $100 million, and backed by prestigious non-profits such as the Bill and Melinda Gates Foundation and the Carnegie Corporation, the project aims to build a standardized database of information on all students in the country, grades K-12. This is not aggregate data. It is detailed, specific information on every student, which can include grades, learning disabilities, hobbies, and interests. Surely, you say, this database doesn’t include student names and other identifiers. But in fact it does. And that’s the point. It’s also why this database is so exciting to so many companies in the education market. The goal is to jump-start technology-driven individualized learning for students.

According to the article, school administrators have long (and legally) maintained all sorts of data on students for educational purposes. And, as you would suspect, every school did things a little differently: they collected different data elements and held them in different formats in different locations. So if you were marketing educational technology that tried to personalize the learning experience, you faced a painful data-interface challenge for every new school you sold. Seeing a real impediment to growth for cutting-edge educational technology, several big foundations jumped in. And rather than just developing a data standard, which would take decades to gain widespread adoption, they invested to actually build a single database. Participation by schools is voluntary and (currently) free, but lots of incentives have been created to spur participation.

We can draw a few fascinating lessons and trends from this initiative.

First, we see a wonderful acknowledgement of what I modestly call Perkins’ Law: no organization will voluntarily build and maintain a database if it is outside its core competencies and there is a viable alternative to doing so. The commercial data publishing business is really built around this law: data publishers succeed because people want the data, but don’t want to collect or maintain it themselves.

Second, we see another great example of a “data pipe,” where one organization provides data that developers can tap into via APIs to build applications driven by that data. The data provider seeks to become an information utility, while dozens or even hundreds of different developers can identify and mine niche opportunities faster and better than any single data publisher. This is a relatively young model, but it’s quickly gaining a following.
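The "data pipe" pattern described above can be sketched in a few lines. This is a hypothetical illustration in Python, not the actual database's API: all class names, fields, and records here are invented. The point is the shape of the model: one provider holds standardized records behind a single query interface, and any number of independent developer applications tap that same interface for their own niches.

```python
class DataPipe:
    """Hypothetical provider-side API: serves standardized records
    to many independent developer applications."""

    def __init__(self, records):
        self._records = records

    def query(self, **filters):
        # Return every record matching all supplied field filters.
        return [r for r in self._records
                if all(r.get(k) == v for k, v in filters.items())]


# The provider loads its standardized data once (invented sample records)...
pipe = DataPipe([
    {"district": "A", "grade": 5, "subject": "math"},
    {"district": "A", "grade": 5, "subject": "reading"},
    {"district": "B", "grade": 7, "subject": "math"},
])

# ...and different developer apps query the same pipe for different niches.
math_app_records = pipe.query(subject="math")        # a math-tutoring app
district_a_records = pipe.query(district="A")        # a district dashboard

print(len(math_app_records))    # prints 2
print(len(district_a_records))  # prints 2
```

In practice the pipe would be a web API rather than an in-process object, but the economics are the same: the provider bears the cost of standardization once, and each developer pays only the cost of querying.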

Third, valuable data is more often than not sensitive data as well. As this database hits the radar of parents and civil liberties advocates, the inevitable questions around privacy and security are being asked. And the answers to date, according to the article, do not seem particularly robust or reassuring. The non-profit managing the database makes all the appropriate noises about protecting the data, while at the same time the database exists in large part to benefit commercial entities. While the goal of the database is laudable, we have a classic example of a database that will likely succeed only with strong governance and privacy policies. This is something that commercial data publishers will need to attend to in the years to come.

It’s a fascinating initiative, and one where we can all learn by example.