Building Databases


Source Data’s True Worth

In my discussion of the Internet of Things (IoT) a few weeks back, I mentioned that there is a big push underway to put sensors in farm fields to collect and monitor soil condition data as a way to optimize fertilizer application, planting dates and the like. But who owns this information, which everyone in agriculture believes to be exceedingly valuable? Apparently, this is far from decided. An association of farmers, the Farm Bureau, recently testified before Congress that it believes farmers should have control over this data, and indeed should be paid for providing access to it.

We’ve heard this notion advanced in many different contexts over the past few years. Many consumer advocates maintain that consumers should be compensated by third parties who are accessing their data and generating revenue from it.

Generally, this push for compensation centers on the notion of fairness, but others have suggested it could have motivational value as well: pay consumers to voluntarily supply their data, and more of them will do so.

The notion of paying for data certainly makes logical sense, but does it work in practice? Usually not.

The first problem with paying to collect data on any scale is that it is expensive. More often than not, it's simply not an economical approach for the data publisher. And while the aggregate cost is large, the amount an individual typically receives is somewhere between small and tiny, which largely eliminates its motivational value.

The other issue (and I’ve seen this first-hand) is the perception of value. Offer someone $1 for their data, and they immediately assume it is worth $10. True, the data is valuable, but only once aggregated. Individual data points in fact aren’t worth very much at all. But try arguing this nuance to the marketplace. It’s hard.

I still get postal mail surveys with the famous “guilt dollar” enclosed. This is a form of paying for data, but as the name suggests, it trades on guilt, which makes for undependable results. Further, these payments are made to ensure an adequate aggregate response: whether or not you in particular respond to the survey really doesn’t matter. It’s a different situation for, say, a data publisher trying to collect retail store sales data. Not having data from Wal-Mart really does matter.

Outside of the research world, I just haven’t seen many successful examples of data publishers paying to collect primary source data. When a data publisher does feel a need to provide an incentive, it’s almost always in the form of some limited access to the aggregated data. That makes sense because that’s when the data becomes most valuable: once aggregated. And supplying users with a taste of your valuable data often results in them purchasing more of it from you.


The Billion Prices Project

Last week, I discussed how the Internet of Things creates all sorts of potential opportunities to create highly valuable, highly granular data. The Billion Prices Project, which is based at MIT, provides another route to the same result. Summarized very simply, two MIT professors, Alberto Cavallo and Roberto Rigobon, collect data from hundreds of online retailers all over the world to build a massive database of product-level pricing data, updated daily. It’s an analytical goldmine that can be applied to solve a broad range of problems.

One obvious example is the measurement of inflation. Currently, the U.S. Government develops its Consumer Price Index inflation data the old-fashioned way: mail, phone and field surveys. Inherently, this process is slow. Contrast that with the Billion Prices Project, which can measure inflation on a daily basis, and do so for a large number of countries.
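
To make the contrast concrete, here is a toy sketch of how a daily price index might be computed from matched product-level prices. It is purely illustrative: the function and the use of a simple geometric mean of price relatives are my own assumptions, not the Billion Prices Project's actual methodology, which involves careful product matching, weighting and imputation.

```python
# Toy daily price index: geometric mean of day-over-day price relatives
# across products observed on both days. Illustrative only; not the
# Billion Prices Project's methodology.
import math


def daily_index(prices_today: dict[str, float],
                prices_yesterday: dict[str, float]) -> float:
    """Return the day-over-day index (1.0 = no change) over matched products."""
    common = prices_today.keys() & prices_yesterday.keys()
    if not common:
        return 1.0
    log_relatives = [math.log(prices_today[p] / prices_yesterday[p]) for p in common]
    return math.exp(sum(log_relatives) / len(log_relatives))


# Example: both tracked products rose 1% overnight, so the index is ~1.01.
print(daily_index({"milk": 3.03, "bread": 2.02}, {"milk": 3.00, "bread": 2.00}))
```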

But measuring inflation is just the beginning. The Billion Prices Project is exploring a range of intriguing questions, such as the premiums that are charged for organic foods and the impact of exchange rates on pricing. You’re really only limited by your specific business information needs – and your imagination.

The Billion Prices Project also offers some useful insights for data publishers. First, the underlying data is scraped from websites. The Billion Prices Project didn’t ask for it or pay for it. That means you can build huge datasets quickly and economically. Second, the dataset is significantly incomplete. For example, it entirely ignores the huge service sector of the economy. But it’s better than the existing dataset in many ways, and that’s what really matters.

When considering building a database, new web extraction technology gives you the ability to build massive, useful and high quality datasets quickly and economically. And as we have seen time after time, the old aphorism, “don’t let the perfect be the enemy of the good” still holds true. If you can do better than what’s currently available, you generally have an opportunity. Don’t focus on what you can’t get. Instead, focus on whether what you can get meaningfully advances the ball.
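
For data publishers wondering what “scraping from websites” looks like at its simplest, here is a bare-bones sketch using Python's requests and BeautifulSoup libraries. The URL and CSS selectors are hypothetical placeholders; a production pipeline would add scheduling, change detection, error handling and polite rate limiting, and would respect each site's terms of use.

```python
# Bare-bones price scraper. The URL and the CSS selectors below are
# hypothetical placeholders, not a real retailer's markup.
import datetime

import requests
from bs4 import BeautifulSoup

CATALOG_URL = "https://example-retailer.com/catalog"  # placeholder


def scrape_prices(url: str) -> list[dict]:
    """Return one record per product found on the page."""
    html = requests.get(url, timeout=30).text
    soup = BeautifulSoup(html, "html.parser")
    records = []
    for item in soup.select("div.product"):  # hypothetical markup
        name = item.select_one(".name").get_text(strip=True)
        price = float(item.select_one(".price").get_text(strip=True).lstrip("$"))
        records.append({
            "product": name,
            "price": price,
            "scraped_at": datetime.date.today().isoformat(),
        })
    return records


if __name__ == "__main__":
    for row in scrape_prices(CATALOG_URL):
        print(row)
```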


A New Push to End Passwords

I hate passwords. But I don’t hate passwords as a concept. Certainly I understand the need, but password protection implemented poorly creates friction and often frustration, and that’s not good for business or for my own personal protection.

Now there’s a new initiative out of Silicon Valley called the “Petition Against Passwords.” It’s not proposing a specific alternative, but the basic premise is that we can do better. And the initiative seems to be getting some early traction. But I think that before we try to improve, we also need to address our failings.


In my view, because online security has become such a high profile concern, many companies have given their programmers carte blanche to “beef up security.” And beef they have, adding all sorts of onerous restrictions, cool new programming and faddish techniques that satisfy their intellectual curiosity, but put a big dent in the overall user experience.

Several years ago, I bought one of the most popular password management programs, RoboForm. It generates long, random passwords for every site where I have an account. Once set up, I could access any site with a single click. Nirvana! I was fully protected, and friction was eliminated. This was a win for everyone. And it worked. For a while.

But I’ve watched as RoboForm has become less effective, as more sites institute cool new login processes that force you to do more, remember more, and defeat the popular password managers.

I have one site that insists I manually input my password into a virtual keypad on the screen. Way cool, but essentially pointless. I have another site with no fewer than ten challenge questions that it presents randomly, with responses that have to be entered perfectly, or you are locked out and forced to spend 20 minutes with its call center to get back in. Still another site wants a ten-character password that includes both a capital letter and two non-alphanumeric characters. And the latest cool approach is “two-factor authentication,” which sends a separate code to your cellphone every single time you want to log in. Honestly, can you picture yourself doing this several times (or more) a day? We want more user engagement, not less.

Where I come out is with this simple, three-point proposition:

  1. Login security should be proportionate to what you are protecting, a point of particular relevance to online content providers. Let’s be honest with ourselves: we’re not protecting nuclear launch codes.
  2. Don’t leave login protocols completely in the hands of your programmers. Logins are a critical component of the overall user experience and need to be assessed accordingly. If users aren’t logging in, they’re also not renewing.
  3. For most of us, time would be better spent improving our back-end system security, to reduce the chance of wholesale theft of user logins, credit card data and personal information. That’s where the big business risk resides, although the necessary programming is admittedly less glamorous than virtual keypads (a minimal sketch of the idea follows this list).
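
To illustrate point 3, here is a minimal sketch of the unglamorous back-end work: storing only salted password hashes rather than plain-text passwords. It uses nothing beyond Python's standard library, and the function names are my own; a real system would more likely rely on a maintained library such as bcrypt or argon2 behind a vetted authentication framework.

```python
# Minimal sketch: store only salted password hashes, never plain text.
# Standard library only; production systems should prefer a maintained
# library (bcrypt, argon2-cffi) and a vetted auth framework.
import hashlib
import hmac
import os

ITERATIONS = 600_000  # PBKDF2 work factor; tune to your hardware


def hash_password(password: str) -> tuple[bytes, bytes]:
    """Return (salt, digest) suitable for storing in the user table."""
    salt = os.urandom(16)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, ITERATIONS)
    return salt, digest


def verify_password(password: str, salt: bytes, stored: bytes) -> bool:
    """Recompute the hash and compare in constant time."""
    candidate = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, ITERATIONS)
    return hmac.compare_digest(candidate, stored)


# Usage: on signup, persist both values; on login, recompute and compare.
salt, digest = hash_password("correct horse battery staple")
assert verify_password("correct horse battery staple", salt, digest)
```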

So sure, let’s start talking about eliminating passwords. But first, let’s acknowledge that a lot of the problem is self-inflicted by the way in which we have implemented passwords.


The Gamification of Data

I attended the Insight Innovation Conference this week, a conference where marketing research professionals gather to think about the future of their industry. A number of the sessions dealt with the topic of gamification. Marketing research is really all about gathering data, and a lot of that data is gathered via surveys. Not surprisingly, market researchers are finding it harder than ever to get people to participate in their surveys, finish the surveys even when they do participate, and supply trustworthy, high-quality answers all the way through. It’s a vexing problem, and one that is central to the future of the industry.

That’s where gamification comes in. Some of the smartest minds in the research business think that by making surveys more fun and more engaging, they can not only improve response rates, but actually gather better quality data. And this has implications for all of us.

One particularly interesting presentation provided some fascinating “before and after” examples: boring “traditional” survey questions, and the same questions after they had been “gamified.” Just as significantly, the presenter showed encouraging evidence that gamified surveys do in fact deliver more and better data.

And while it’s relatively easy to see how a survey, once made more fun and engaging, would lead people to answer more questions, it’s less obvious how gamification leads to better data.

In one example, the survey panel was asked to list the names of toothpaste brands. In a standard survey, survey respondents would often get lazy, mentioning the top three brands and moving to the next question. This didn’t provide researchers with the in-depth data they were seeking. When the question was designed to offer points for supplying more than three answers and bonus points for identifying a brand that wasn’t in the top five, survey participants thought harder, and supplied more complete and useful data.
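
To make the mechanics concrete, here is a small, purely hypothetical scoring function along the lines described. The point values and the “top five” brand list are invented for illustration, not taken from the presentation.

```python
# Hypothetical scoring for a gamified brand-recall question: points for
# every answer, extra points beyond the third answer, and a bonus for
# naming a brand outside an assumed "top five" list.
TOP_FIVE = {"colgate", "crest", "sensodyne", "oral-b", "aquafresh"}  # illustrative only


def score_answers(brands: list[str],
                  base_points: int = 1,
                  extra_threshold: int = 3,
                  bonus_points: int = 5) -> int:
    """Return the respondent's score for this question."""
    score = 0
    for i, brand in enumerate(brands):
        score += base_points
        if i >= extra_threshold:           # reward going beyond three answers
            score += base_points
        if brand.lower() not in TOP_FIVE:  # bonus for a less obvious brand
            score += bonus_points
    return score


print(score_answers(["Crest", "Colgate", "Sensodyne", "Marvis"]))  # 10
```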

In another example, survey participants were given $20 at the start of the survey, and could earn more or lose money based on how their responses compared to the aggregate response. Participation was extremely high and data quality was top-notch.

Still other surveys provided feedback along the way, generally letting the survey participants know how their answers compared to the group.

Most intriguing to me is that gamification allows for tremendous subtlety in questions. In a game format, it’s very easy to ask both “what do you think” and “what do you think others think,” but these are devilishly hard insights to get at in a traditional survey format.

Gamification already intersects with crowdsourcing and user generated content quite successfully. Foursquare is just one well-known example. But when the marketing research industry begins to embrace gamification in a big way, it’s a signal that this is a ready-for-prime-time technique that can be applied to almost any data gathering application. Maybe it’s time to think about adding some fun and games!


Is Data the Salvation of News?

Doubtless by now you’ve heard the buzz around the travel news start-up called Skift. Skift is the brainchild of Rafat Ali, the founder of PaidContent. Skift appears to be a disruptive entry into the B2B travel information market, and seeks to distinguish itself through a fresh style of reportage and eclectic editorial coverage (news of innovative airport design merits the same level of coverage as news about major airlines). Given Rafat’s track record and the fondness these days for all things disruptive, Skift has recently attracted an additional $1 million from investors. Where this gets really interesting is that Skift wants to broadly cover the incredibly huge global travel industry with only a handful of reporters. That means Skift will deliver a mix of original reporting along with licensed and curated content. So where’s the innovation and disruption? The answer, in a word, is data.


Skift’s plan is to deliver most of its news free on an advertising-supported model, but to also offer paid subscriptions (reportedly to range from $500 to $1,000) to give subscribers access to travel data. It’s no surprise then, that Skift is positioning itself as a “competitive intelligence engine.”

Skift may be on to something. I first got interested in the intersection of news and data back in 2007, when I read some fascinating articles written by Mike Orren, the founder of an online newspaper called The Pegasus News. Orren had discovered that, despite his focus on hyper-local news (the editorial content consumers are ostensibly hungering for), fully 75% of those who came to his site were there for some sort of data content. Others in the newspaper industry have reported similar findings.

In this context Skift seems to have a firm grasp of the new dynamics of the information marketplace: while there is an important role for news, it’s increasingly hard to monetize. That’s why news married to data is a much smarter business model. News provides context and helps with SEO. It can be monetized to some extent through advertising. Data offers premium value that is easily monetized with a subscription model, and the two types of content, intelligently combined, offer a compelling, one-stop proposition to those who need to know what’s going on in a specific market.

This is, of course, a conceptually simple model that not many legacy news publishers have been able to execute on. That’s because the two types of content are inherently different, from how they are created to how they are sold. Perhaps a disruptive market entrant like Skift will be able to crack the code and produce both types of content successfully itself. Personally, I think the fastest and surest path to success is to build strong partnerships with data publishers.

