Nothing New

February 4, 2011

The recent dust-up between Google and Microsoft (in short, Google is accusing Microsoft of copying its search results) is more entertaining than informative. What is of particular interest to me is that the web cognoscenti have largely come down on the side of Microsoft, many going so far as to proclaim Microsoft clever and creative for trying to use Google search results to improve its own.

This latest controversy -- building a search engine, in part, by taking results from another search engine -- reminded me of a larger issue I have been pondering for a while: the seeming tilt in favor of aggregating data over creating it.

Consider "Web 2.0." There are thousands of competing definitions for this term, but to me it stands for a period when we celebrated the power of programmers who aggregated large and complex datasets in clean and powerful user interfaces. These Web 2.0 sites were exciting, useful, powerful and innovative, but their existence depended on OPD: Other People's Data. Sometimes that data was licensed, but just as often it was public domain data or simply scraped and re-formatted. To the extent all this data was legally obtained, I take no issue with it. But it does seem to have created a mindset, even among publishers. As we discuss new products with publishers of all different shapes and sizes, it's not uncommon that one of the first questions asked is, "where will we get the data?"

I jump from that thought to some interesting insights from the Media Dealmakers Summit I attended yesterday. A number of speakers brought up the topic of curation, usually with near-glee in their voices. That's because curation looks to be the next big thing, and who better than publishers to offer content made more valuable through curation? But aggregation is curation-lite. By that I mean you add relatively little value simply deciding what sources to aggregate. Real curation, and hence real value, comes from getting under the hood and selecting, standardizing, normalizing and editing individual units of content. Arguably, the highest form of curation is compilation, where you not only create a unique dataset to meet a specific need, but you make highly granular decisions about what to include or exclude.

At the Dealmakers Summit Dan Lagani, president of Reader's Digest, reminded us that Reader's Digest was created in 1922 specifically to address information overload. What resulted was one of the most successful magazines in the history of publishing. If content curation was that valuable back then, imagine its value today! But again, simple aggregation is the lowest form of curation and compilation is the highest. And if we want to have high value, differentiated products, we must never let our content creation skill atrophy. Aim high.


Taking Out the Garbage

January 21, 2011

Last week, I wrote that a growing number of people claim the major search engines are getting long in the tooth. The key issue: commercial forces (some ethical, many not) have thoroughly compromised search results by forcing marginal or inappropriate sites into the coveted top positions, frustrating searchers with false starts and wasted time.

I noted as far back as 2005 that even the advertising in search engines had been similarly compromised. Some retailers and e-commerce sites were so crazed for traffic they would advertise products they didn't sell or products that didn't even exist.

The net result is that we have moved from a situation not too long ago where the search engines indexed only 50% of the web to one where it can be said they now index 150% of the web, the extra 50% being the junk, clutter, scammers and garbage that work to obscure meaningful search results.

A lot of companies have sought to address this growing problem with their own search engines. I wrote, for example, about a new search engine called Blekko that allows users to powerfully filter search results, or even to use filters built by others. Conceptually, it's a clever idea, but on a practical level, it's a lot of work, and if you rely on the work of others, you never know what you're getting (or missing).

Now there is a lot of buzz around a new search engine called duckduckgo.com, a quirky (quacky?) search engine that tries to improve search by doing less, not more. Its unique selling proposition is that it aggressively filters out garbage search results, and won't track or retain your search results in any way. Does a site like this even have a prayer?

I gave duckduckgo a workout this morning. My first reaction: it's surprisingly good. Its design is so spare you get this unsettling feeling you're missing something, but what really seems to be missing is a lot of the garbage we've become accustomed to in search results. It's rather like the first time you put on your glasses with your new prescription. Things jump out at you that you might not have seen before. It takes only a few searches to become convinced you're probably not missing anything important in the search results it returns. It's worth a look.

What may be happening, finally, is that we are beginning a long-term shift back to basics in search, a shift that recognizes that search engines can't and shouldn't do everything, and that search engines are best when they stay true to their purpose: to index original content, not try to become content. A shift like this can only be good news for those of us who own original content.


Wake Up and Smell the Curation

January 14, 2011

In a very short period of time, it appears that it has become acceptable to say in public what was formerly only whispered in darkened rooms, to wit: Google search isn't cutting it anymore. One great example of the genre can be found here:

Boil the criticism down to its essence, and what is being said is that the Google search algorithm has been thoroughly steamrolled by big merchants and spammers with powerful SEO capabilities, pushing themselves into the important early search results and making it much harder to find what you are looking for. The junk -- that Google in its early days did such a stellar job filtering out -- is back.

There are some who believe it is only a matter of time until Google re-engineers its search algorithm, and then all these problems will magically go away. More people seem to believe that this problem is big, profound and permanent.

Consider too the much-publicized statistic from marketing firm iProspect: the typical knowledge worker spends 16 hours a month searching for information and 50 percent of all those searches fail. More evidence that when it comes to search, something is broken.

Intriguingly, the solution being advanced by many is curation: some more active type of selection that isn't totally driven by algorithms. At one extreme, it is hand-assembled lists. At the other extreme, we have the concept of the "social graph," the concept that search results can be driven by what your friends and colleagues like and recommend.

Of course, right in the middle of that continuum sits a group we all know and love: data publishers. What data publishers do, by definition, is curate: they collect, classify and arrange information to make it more useful.

The majority of data publishers have spent years now trying to prove to themselves, their subscribers and their advertisers where they fit in the world of search, and why they still matter. Intriguingly, there is a growing belief that the general search engines, who believed they could do everything and do it better, are actually finding hard limits to their ambitions. And that puts new importance on information providers who cover an area particularly well and make that information easily accessible. Think data publishers.

This does not mean that Google is going to go away. But it is likely to be a very different company, especially given its rapid diversification into so many non-search businesses. Perhaps Google itself woke up and smelled the coffee and now sees the limits of general search?


News You Can Use

January 7, 2011

An article in the current issue of Wired discussing a new product from Dow Jones called Lexicon offers up this irresistible line:

"But many of the professional investors subscribing to Lexicon aren't human -- they're algorithms."

Okay, algorithms don't actually call and order up fresh, hot data for delivery like pizza ... but the people in charge of those algorithms do, and that's the real point.

Let me step back and explain Lexicon. It's an XML feed of breaking news stories with an intriguing twist: Lexicon pre-processes each news story to add fielded sentiment analysis, which is expressed quantitatively. In other words, the tone of the article is reduced to a number. That means that the customers of Lexicon -- institutional traders for the most part -- can more easily feed news content into the computerized models that drive their stock trading. Imagine, for example, a story about copper prices with a strongly negative numeric value associated with it. Traders feed that into their software, which is likely looking at real-time copper prices and who knows what else, and formulate a computer-based buy/sell decision.
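For the programmers in the audience, here is a rough sketch of what consuming a feed like this might look like. To be clear, this is my own illustration and not Lexicon's actual format: the element names (`story`, `subject`, `sentiment`), the score range and the 0.5 threshold are all assumptions made up for the example, and a real trading model would weigh far more than a single number.

```python
import xml.etree.ElementTree as ET

# Hypothetical feed snippet: the element names and score scale are invented
# for illustration and are NOT Lexicon's real schema.
SAMPLE_FEED = """
<feed>
  <story id="1">
    <headline>Copper stockpiles surge as demand weakens</headline>
    <subject>copper</subject>
    <sentiment>-0.8</sentiment>
  </story>
  <story id="2">
    <headline>Copper miners report steady output</headline>
    <subject>copper</subject>
    <sentiment>0.1</sentiment>
  </story>
</feed>
"""


def parse_stories(xml_text):
    """Return a list of (subject, sentiment) pairs from the feed text."""
    root = ET.fromstring(xml_text.strip())
    return [
        (story.findtext("subject"), float(story.findtext("sentiment")))
        for story in root.iter("story")
    ]


def signal(sentiment, threshold=0.5):
    """Map a sentiment score to a toy signal; the threshold is arbitrary."""
    if sentiment <= -threshold:
        return "sell"
    if sentiment >= threshold:
        return "buy"
    return "hold"


# Turn each story's numeric tone into a toy buy/sell/hold signal.
signals = [(subject, signal(score)) for subject, score in parse_stories(SAMPLE_FEED)]
print(signals)
```

The point is less the code than the shape of the data: once the tone of a story is a fielded number, it takes only a few lines to fold it into whatever else the software is watching.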

There are two elements to this product that really get me excited. First, we have a perfect example of how publishers, by pre-processing data to impute, infer or summarize, can add tremendous value to their content. Second, we have a wonderful example of the blurring lines between content formats. News and data used to live in distinct worlds, and Lexicon illustrates how they are coming together, by analyzing the news and assigning a structured numerical summation to it. As importantly, Lexicon makes the news more amenable to machine processing, and that's at the heart of the value proposition for data products.

Lexicon stands as a great illustration of the increasingly rapid evolution of data-text integration, an InfoCommerce Group "mega-trend" we've been advancing since 2001 (better a little early than a little late!).


The Beef Jerky Business

December 17, 2010

Kudos to American Business Media for providing a heads-up on an important legal development relating to data publishing. Here's the story (I'll keep the legal details short - my holiday gift to you!):

A while back, both Vermont and New Hampshire passed laws severely restricting the collection and sale of data on drug prescriptions. Since that is the primary business of data behemoth IMS Health, you will not be surprised to learn that IMS Health challenged these laws in federal court. The federal First Circuit court (which covers New Hampshire) ruled against IMS Health, drawing the now notorious comparison between data producers and producers of beef jerky:

The plaintiffs, who are in the business of harvesting, refining, and selling this commodity, ask us in essence to rule that because their product is information instead of, say, beef jerky, any regulation constitutes a restriction of speech. We think that such an interpretation stretches the fabric of the First Amendment beyond any rational measure.

Just to keep things interesting, the federal Second Circuit court, which covers Vermont, has now rendered a decision that goes in the opposite direction and supports IMS Health:

Vermont here aims to do exactly that which has been so highly disfavored -- namely, put the state's thumb on the scales of the marketplace of ideas in order to influence conduct ... In other words, the statute seeks to alter the marketplace of ideas by taking out some truthful information that the state thinks could be used too effectively.

These cases and decisions turn on some nuanced points, but more broadly what is taking shape is a view as to whether data publishers are entitled to First Amendment protections. If so, the ability of any level of government to regulate what types of data could or could not legally be collected or sold would be highly restricted. Arguably, this could also impact data gathered for online ad targeting and other related purposes. In short, this is big.

Split decisions between Circuit Courts are often resolved by the U.S. Supreme Court, and there is already some movement in that direction for these decisions. It's important that we all stay alert to this potential Supreme Court review and make our voices heard on this important issue.

In the meantime, pass the Slim Jims!


InfoCommerce Group Blog
