
Education Data: Lessons Learned

A recent Reuters story described a new national database of student information. Reportedly built at a cost of $100 million and backed by prestigious non-profits such as the Bill and Melinda Gates Foundation and the Carnegie Corporation, the project aims to build a standardized database of information on all students in the country, grades K-12. No, this is not aggregate data. This is detailed, specific information on every student that can include grades, learning disabilities, hobbies and interests. Surely this database doesn’t include student names and other identifiers, you say. But in fact it does. And that’s the point. It’s also why this database is so exciting to so many companies in the education market. The goal is to jump-start technology-driven individualized learning for students.

According to the article, school administrators have long (and legally) maintained all sorts of data on students for educational purposes. And, as you would suspect, every school did things a little differently. They collected different data elements and held them in different formats in different locations. So if you were marketing educational technology that tried to personalize the learning experience, you faced a painful data interface challenge with every new school you sold to. Seeing a real impediment to growth for cutting-edge educational technology, several big foundations jumped in. And rather than just developing a data standard, which would take decades to gain widespread adoption, they invested to actually build a single database. Participation by schools is voluntary and (currently) free, but lots of incentives have been created to spur participation.

We can draw a few fascinating lessons and trends from this initiative.

First, we see a wonderful acknowledgement of what I modestly call Perkins’ Law: no organization will voluntarily build and maintain a database if it is outside their core competencies and there is a viable alternative to doing so. The commercial data publishing business is really built around this law: data publishers succeed because people want the data, but don’t want to collect or maintain it themselves.

Second, we see another great example of a “data pipe,” where one organization provides data that developers can tap into via APIs to build applications driven by that data. The data provider seeks to become an information utility, while dozens or even hundreds of different developers can identify and mine niche opportunities faster and better than any single data publisher. This is a relatively young model, but it’s quickly gaining a following.

Third, valuable data is more often than not sensitive data as well. As this database hits the radar of parents and civil liberties advocates, the inevitable questions around privacy and security are being asked. And the answers to date, according to the article, do not seem particularly robust or reassuring. The non-profit managing the database makes all the appropriate noises about protecting the data, while at the same time the database exists in large part to benefit commercial entities. While the goal of the database is laudable, we have a classic example of a database that will likely succeed only with strong governance and privacy policies. This is something that commercial data publishers will need to become attentive to in years to come.

It’s a fascinating initiative, and one where we can all learn by example.


When It Pays to License Data

I have long advocated for something I modestly call "Perkins' Law." It holds that "No company outside the information industry will voluntarily build and maintain a database if there is a viable alternative to doing so." Over the years, I have found this posit to hold up pretty well. Indeed, when I find an industry where multiple companies are collecting largely the same information themselves, there is usually a data opportunity. This dynamic stems from an even more fundamental truth: building and maintaining a quality database is hard and expensive work.

So when companies start bragging about building databases when their primary business and expertise is elsewhere, watch out. Back in the early 1990s, a very successful software company decided it wanted to take on media information company SRDS. Their software was impressive. The company had keyed all the necessary listings data. I saw a demo of the new product on a giant screen in a huge conference room. The software folks prattled on about their cutting edge code. The marketing folks described endless features. I have to say, I was impressed. Then I asked about what type of editorial group they had put in place to maintain the data. The room got very quiet. Apparently, the content had been viewed as a one-off activity. The product was quietly scuttled a year or so later.

This, and many other related experiences over the years, came flashing back to me this week when I read that high-flying Internet darling Airbnb was busy building its own global city guides that will assess such factors as nightlife, quality and quantity of restaurants, etc. Yes, pretty much the same information you can get 100 different places online. And if you have any remaining doubt that Airbnb is in over its head, note that to build its database, it found a need to fly "neighborhood experts" in from all over the world. Further proof: it's working on printed guides as well.

An isolated case? Maybe not. Consider the well-publicized debacle of Apple launching its own map product with its own business and points of interest database. Apparently Apple, having read too many of its own press clippings and deciding it could do no wrong, went out and promptly did wrong.

When companies not in the data business decide they need data, the smart answer is to license whenever possible. Data is hard. Some companies build databases out of ignorance of options; some build them out of hubris. But you should stay alert to both such situations, because whether educating a company or bailing it out, Perkins' Law can be a dependable source of attractive new revenue for you.


Unlocking the Value of LinkedIn

TechCrunch posted a long and thoughtful analysis of LinkedIn this week. In short, it suggests that LinkedIn is at risk from a growing number of vertical market professional networking sites, and faces a "death by a thousand cuts." The risk of being "sliced and diced" is real. Any horizontal provider of information stands at perpetual risk of a competitor targeting an attractive vertical segment and doing a better job. The same holds true for professional networks. It might be nice that you could theoretically interact with any other professional anywhere, but in reality, most of your interaction is with others in your specific industry.

But is LinkedIn really at risk? I don't think so, primarily because I don't think of LinkedIn as a professional network.

Blasphemy? It may sound like it, since LinkedIn has routinely been classified as a social media play almost since its inception. Indeed, if you think back to the early days of LinkedIn, it was explicitly designed as a novel attempt to codify the concept of "six degrees of separation" among business professionals. If you continue to think back, you will also remember how quickly that concept fell on its face. Nobody used LinkedIn as originally intended, but it was in the right time and place to become the largest and richest biographical database in the history of the world, and that's where its money does and should come from.

LinkedIn has reached a stage where your failure to have a LinkedIn profile raises questions about you in many circles. It has quietly grown its company profiles (all cross-linked to individual profiles) to the point where a number of business information providers are taking worried notice. Many individuals base their job hunting on well-burnished LinkedIn profiles. In some professions, having the largest number of connections is a sign of success. An increasing number of data publishers are tapping the LinkedIn API to gather, augment and maintain their own databases. I could go on, but in its own weird, wacky and wonderful way, LinkedIn has become part of the fabric of business.

Right now, LinkedIn profits handsomely selling access to its structured database to recruiters and others. But this is just the beginning. LinkedIn is poised to become a critical backbone database with numerous uses. In one obvious application, CRM software companies are all rushing to integrate LinkedIn into their applications. But think beyond the obvious. For example, LinkedIn could be a hugely powerful trust and identity tool. After all, it's hard to fake more than a handful of connections, and are you likely to blatantly misrepresent your career in front of all the friends and colleagues whom you invited to link to you? There is very valuable confirmation and verification locked up in all this linking that remains to be fully exploited.

The LinkedIn database could be used in a number of ways to tune up spam filters. You might even use it to prioritize incoming messages from your connections. LinkedIn is ever-eager to suck in your contact list from your computer, but what if it maintained that list for you (remember Plaxo)? Suddenly, LinkedIn would become the central database of business. And let's get inferential for a moment. I am certain that someone somewhere is trying to correlate the number and quality of LinkedIn connections with creditworthiness. And what about evaluating the quality and prospects of a company based on the extent and quality of the LinkedIn connections of its management team?

Crazy ideas? Maybe. But all I am trying to do is illustrate that while the social elements of LinkedIn are nice and necessary, don't lose sight of the fact that it has only just begun to mine one of the most remarkable databases ever created. There's gold in them thar hills!


History Matters

A fascinating article in Technology Review highlights and quantifies a problem that is also an opportunity: disappearing data.

The article reports on a new study of social media content tied to recent major cultural events (e.g., the Arab Spring). The study found that 11% of that content had disappeared within a year and 27% within two years. Beyond that, the study authors have calculated that the world loses 0.02% of its culturally significant social media material every day.

Keep in mind that what the study authors are measuring is not the number of tweets that are disappearing, but the web pages that these tweets are linking to.

And while this study only looked at headline news events, you've probably had experiences similar to mine: links to news stories and press releases that are no longer active, businesses removing all traces of failed product launches from their websites, online stories that are undated and thus difficult to rely on, news outlets that continue to treat yesterday's news like yesterday's news - and the list goes on.

The short message is that historical knowledge is always valuable, and it becomes even more valuable as it becomes less available. And that's what we are seeing online. Whether by accident or design, more and more content is "aging off" the web, creating opportunities for those companies that hold onto it in an organized fashion.

Yes, there are amazing resources like The Internet Archive that attempt to maintain searchable, historical copies of the entire web, but as you might suspect, this is hard, and there is a limit to the amount of detail any such archive can store. But taking on this task on a vertical market basis is doable, and potentially quite remunerative.

Many data products are designed to provide the latest and most current data, and that's smart. But as you add new data, don't delete the old data. It's amazing what insights can be gleaned by watching the changes to a business over time, and as the art of data analytics evolves, this capability will only get more compelling. But you can't analyze data you don't have. More than a few data companies I know owe their strength and their profitability to the fact that they maintained historical data. And as you become the sole source for content that has disappeared elsewhere from the web, you create a proprietary aspect to your content, along with an often insurmountable competitive barrier.
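To make the "add, don't delete" idea concrete, here is a minimal sketch in Python of an append-only store that keeps every dated version of a record instead of overwriting it. Everything in it is hypothetical and for illustration only: the `HistoryStore` class, its method names, and the "acme" company record do not come from any real product.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Optional


@dataclass
class HistoryStore:
    """Hypothetical append-only store: updates add dated versions, nothing is overwritten."""
    _versions: dict = field(default_factory=dict)

    def record(self, key: str, as_of: date, values: dict) -> None:
        # Append a new dated snapshot rather than replacing the old one.
        self._versions.setdefault(key, []).append((as_of, dict(values)))
        self._versions[key].sort(key=lambda snap: snap[0])

    def current(self, key: str) -> dict:
        # The latest snapshot is what a "current data" product would show.
        return self._versions[key][-1][1]

    def as_of(self, key: str, when: date) -> Optional[dict]:
        # Most recent snapshot dated on or before `when`, or None if there is none.
        match = None
        for snap_date, values in self._versions.get(key, []):
            if snap_date <= when:
                match = values
        return match

    def changes(self, key: str, field_name: str) -> list:
        # Dated values for one field, keeping only the points where it changed.
        out = []
        for snap_date, values in self._versions.get(key, []):
            value = values.get(field_name)
            if not out or out[-1][1] != value:
                out.append((snap_date, value))
        return out


# Illustrative data only: three yearly snapshots of a made-up company.
store = HistoryStore()
store.record("acme", date(2011, 1, 1), {"employees": 40, "ceo": "A. Smith"})
store.record("acme", date(2012, 1, 1), {"employees": 55, "ceo": "A. Smith"})
store.record("acme", date(2013, 1, 1), {"employees": 55, "ceo": "B. Jones"})
print(store.current("acme"))
print(store.changes("acme", "ceo"))
```

The design choice is simply that `record` never discards anything, so the same data can answer both "what is true now?" and "what changed, and when?".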

So take a little time to think about what information (specific data points or full-text documents) you can preserve, and how valuable this information might become if it weren't available anywhere else, because that's the trend I see. What's past isn't just prologue; it may also be a new profit center!
