
Taking Out the Garbage

Last week, I wrote that a growing number of people are claiming that the major search engines are getting long in the tooth. The key issue: they have been thoroughly compromised by commercial forces (some ethical, many not) that force marginal or inappropriate sites into the coveted top positions, frustrating searchers with false starts and wasted time.

I noted as far back as 2005 that even the advertising in search engines had been similarly compromised. Some retailers and e-commerce sites were so crazed for traffic that they would advertise products they didn't sell, or products that didn't even exist.

The net result was that we moved from a situation not too long ago where the search engines indexed only 50% of the web, to a situation where it can be said they now index 150% of the web, the extra 50% being the junk, clutter, scams and garbage that work to obscure meaningful search results.

A lot of companies have sought to address this growing problem with their own search engines. I wrote, for example, about a new search engine called Blekko that allows users to powerfully filter search results, or even to use filters built by others. Conceptually, it's a clever idea, but on a practical level it's a lot of work, and if you rely on the work of others, you never know what you're getting (or missing).
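To make that filtering idea a bit more concrete, here is a minimal sketch of a whitelist-style filter of the kind a user might build or borrow; the domain list, sample results and function names are illustrative assumptions only, not Blekko's actual implementation.

```python
# A toy illustration (not Blekko's actual mechanism) of filtering search
# results through a user-built or shared list of trusted domains.
from urllib.parse import urlparse

# A hypothetical filter someone else built and shared; you inherit its
# choices (and its blind spots) without necessarily knowing what was left out.
HEALTH_FILTER = {"nih.gov", "mayoclinic.org", "cdc.gov"}

def apply_filter(results: list[dict], allowed_domains: set[str]) -> list[dict]:
    """Keep only results whose domain appears in the shared filter."""
    kept = []
    for result in results:
        domain = urlparse(result["url"]).netloc.removeprefix("www.")
        if domain in allowed_domains:
            kept.append(result)
    return kept

results = [
    {"title": "Flu symptoms", "url": "https://www.cdc.gov/flu/symptoms"},
    {"title": "Miracle flu cure!!!", "url": "https://www.spammy-pills.example/flu"},
]
print(apply_filter(results, HEALTH_FILTER))  # only the cdc.gov result survives
```

Even in this toy version, the trade-off is visible: whoever built the filter decided what is in and what is out, and you inherit those choices sight unseen.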

Now there is a lot of buzz around a new search engine called duckduckgo.com, a quirky (quacky?) search engine that tries to improve search by doing less, not more. Its unique selling proposition is that it aggressively filters out garbage search results, and won't track or retain your searches in any way. Does a site like this even have a prayer?

I gave duckduckgo a workout this morning. My first reaction: it's surprisingly good. Its design is so spare you get this unsettling feeling you're missing something, but what really seems to be missing is a lot of the garbage we've become accustomed to in search results. It's rather like the first time you put on glasses with a new prescription: things jump out at you that you might not have seen before. It takes only a few searches to become convinced you're probably not missing anything important in the results it returns. It's worth a look.

What may be happening, finally, is that we are beginning a long-term shift back to basics in search, a shift that recognizes that search engines can't and shouldn't do everything, and that search engines are best when they stay true to their purpose: to index original content, not try to become content. A shift like this can only be good news for those of us who own original content.


Wake Up and Smell the Curation

In a very short period of time, it appears that it has become acceptable to say in public what was formerly only whispered in darkened rooms, to wit: Google search isn't cutting it anymore. One great example of the genre can be found here:

Boil the criticism down to its essence, and what is being said is that the Google search algorithm has been thoroughly steamrolled by big merchants and spammers with powerful SEO capabilities, who push themselves into the important early search results and make it much harder to find what you are looking for. The junk -- which Google in its early days did such a stellar job of filtering out -- is back.

There are some who believe it is only a matter of time until Google re-engineers its search algorithm, and then all these problems will magically go away. More people seem to believe that this problem is big, profound and permanent.

Consider too the much-publicized statistic from marketing firm iProspect: the typical knowledge worker spends 16 hours a month searching for information and 50 percent of all those searches fail. More evidence that when it comes to search, something is broken.

Intriguingly, the solution being advanced by many is curation: a more active type of selection that isn't totally driven by algorithms. At one extreme, it's hand-assembled lists. At the other extreme, we have the "social graph," the idea that search results can be driven by what your friends and colleagues like and recommend.
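To illustrate the social-graph end of that continuum, here is a minimal sketch of re-ranking results by friends' recommendations; the sample data and the one-point-per-friend scoring rule are assumptions for illustration, not any search engine's actual algorithm.

```python
# A toy illustration of re-ranking search results by friends' recommendations.
# The data and the simple per-friend boost are assumptions for illustration only.

friend_recommendations = {
    "alice": {"https://example.com/review", "https://example.com/guide"},
    "bob": {"https://example.com/guide"},
}

def social_rerank(results: list[str]) -> list[str]:
    """Sort results so pages recommended by more friends come first."""
    def friend_score(url: str) -> int:
        return sum(url in pages for pages in friend_recommendations.values())
    return sorted(results, key=friend_score, reverse=True)

print(social_rerank([
    "https://example.com/seo-spam",
    "https://example.com/guide",    # recommended by two friends
    "https://example.com/review",   # recommended by one friend
]))
```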

Of course, right in the middle of that continuum sits a group we all know and love: data publishers. What data publishers do, by definition, is curate: they collect, classify and arrange information to make it more useful.

The majority of data publishers have spent years now trying to prove to themselves, their subscribers and their advertisers where they fit in the world of search, and why they still matter. Intriguingly, there is a growing belief that the general search engines, which believed they could do everything and do it better, are actually finding hard limits to their ambitions. And that puts new importance on information providers that cover an area particularly well and make that information easily accessible. Think data publishers.

This does not mean that Google is going to go away. But it is likely to be a very different company, especially given its rapid diversification into so many non-search businesses. Perhaps Google itself woke up and smelled the coffee and now sees the limits of general search?


News You Can Use

An article in the current issue of Wired discussing a new product from Dow Jones called Lexicon offers up this irresistible line:

"But many of the professional investors subscribing to Lexicon aren't human -- they're algorithms."

Okay, algorithms don't actually call and order up fresh, hot data for delivery like pizza ... but the people in charge of those algorithms do, and that's the real point.

Let me step back and explain Lexicon. It's an XML feed of breaking news stories with an intriguing twist: Lexicon pre-processes each news story to add fielded sentiment analysis, expressed quantitatively. In other words, the tone of the article is reduced to a number. That means that the customers of Lexicon -- institutional traders for the most part -- can more easily feed news content into the computerized models they use to drive stock trading. Imagine, for example, a story about copper prices with a strongly negative numeric value associated with it. Traders feed that into their software, which is likely looking at real-time copper prices and who knows what else, and the software formulates a buy/sell decision.
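To make the mechanics concrete, here is a minimal sketch of how a sentiment-scored news item might be consumed by a trading model; the XML element names, the sample story and the decision thresholds are illustrative assumptions only, not the actual Lexicon schema or any trader's real model.

```python
# A minimal sketch (not the actual Lexicon schema) of consuming a news item
# whose tone has been reduced to a number, and feeding it into a toy model.
import xml.etree.ElementTree as ET

SAMPLE_ITEM = """
<story>
  <headline>Copper inventories surge as demand forecasts weaken</headline>
  <topic>copper</topic>
  <sentiment>-0.82</sentiment>
</story>
"""

def parse_story(xml_text: str) -> dict:
    """Extract the fields a model would care about: topic and sentiment score."""
    root = ET.fromstring(xml_text)
    return {
        "headline": root.findtext("headline"),
        "topic": root.findtext("topic"),
        "sentiment": float(root.findtext("sentiment")),
    }

def decide(story: dict, price_momentum: float) -> str:
    """Toy decision rule: strongly negative news plus falling prices -> sell."""
    if story["sentiment"] < -0.5 and price_momentum < 0:
        return "SELL"
    if story["sentiment"] > 0.5 and price_momentum > 0:
        return "BUY"
    return "HOLD"

story = parse_story(SAMPLE_ITEM)
print(story["headline"], "->", decide(story, price_momentum=-0.3))
```

The point of the structured sentiment field is exactly what the sketch shows: once the tone is a number in a known place, a machine can act on the news without ever "reading" it.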

There are two elements to this product that really get me excited. First, we have a perfect example of how publishers, by pre-processing data to impute, infer or summarize, can add tremendous value to their content. Second, we have a wonderful example of the blurring lines between content formats. News and data used to live in distinct worlds, and Lexicon illustrates how they are coming together, by analyzing the news and assigning a structured numerical summation to it. As importantly, Lexicon makes the news more amenable to machine processing, and that's at the heart of the value proposition for data products.

Lexicon stands as a great illustration of the increasingly rapid evolution of data-text integration, an InfoCommerce Group "mega-trend" we've been advancing since 2001 (better a little early than a little late!).


The Beef Jerky Business

Kudos to American Business Media for providing a heads-up on an important legal development relating to data publishing. Here's the story (I'll keep the legal details short - my holiday gift to you!):

A while back, both Vermont and New Hampshire passed laws severely restricting the collection and sale of data on drug prescriptions. Since that is the primary business of data behemoth IMS Health, you will not be surprised to learn that IMS Health challenged these laws in federal court. The federal First Circuit court, which covers New Hampshire, ruled against IMS Health, drawing the now notorious comparison between data producers and producers of beef jerky:

The plaintiffs, who are in the business of harvesting, refining, and selling this commodity, ask us in essence to rule that because their product is information instead of, say, beef jerky, any regulation constitutes a restriction of speech. We think that such an interpretation stretches the fabric of the First Amendment beyond any rational measure.

Just to keep things interesting, the federal Second Circuit court, which covers Vermont, has now rendered a decision that goes in the opposite direction and supports IMS Health:

Vermont here aims to do exactly that which has been so highly disfavored -- namely, put the state's thumb on the scales of the marketplace of ideas in order to influence conduct ... In other words, the statute seeks to alter the marketplace of ideas by taking out some truthful information that the state thinks could be used too effectively.


These cases and decisions turn on some nuanced points, but more broadly what is taking shape is a fundamental question: are data publishers entitled to First Amendment protections? If so, the ability of any level of government to regulate what types of data could or could not legally be collected or sold would be highly restricted. Arguably, this could also impact data gathered for online ad targeting and other related purposes. In short, this is big.


Split decisions between Circuit Courts are often resolved by the U.S. Supreme Court, and there is already some movement in that direction for these decisions. It's important that we all stay alert to this potential Supreme Court review and make our voices heard on this important issue.


In the meantime, pass the Slim Jims!


Information Alchemy

One of the first rules I learned as a consultant was to avoid people you barely know who wanted to take you to lunch. Such meals invariably were suggested by people who wanted to tap into your knowledge and expertise without paying for it. Worst of all, you always left the meal hungry, because you spent the whole time answering a non-stop stream of questions.

These lunchtime "brain drains" have been largely replaced by so-called "expert networks," online marketplaces that allow consultants and other experts to sell their knowledge on an hourly basis to those who want a crash course on a specific industry, trend, technology or company. These expert networks have long been popular with private equity firms, hedge funds and other investors. It's a fast and efficient way for investors to check out a proposed investment. It's also a way to develop ideas and insights that give an investor an edge.

If you've been following the recent spate of insider trading indictments, you'll note that a number of them involve expert networks that allegedly went over the edge, moving beyond providing ideas and insights into providing insider information.

So if an expert network has crossed the line into providing insider information, shut 'em down and move on, right? Yes, except for one thing: the definition of insider information has always been a bit murky, and seems poised to get murkier, so much so that data providers may feel some of the fallout. That's because there are those who contend that you can create non-public information (the stuff of insider information) out of publicly available information. Information alchemy, or so it would seem.

Yes, the public data most of us sell, often for purposes of business and competitive intelligence, can, when combined with other public data, become through some type of information alchemy non-public data that is subject to insider information rules. Even scarier, nobody seems to know exactly when that line gets crossed.

As I understand it, data providers need not worry directly, because the information we individually provide is clearly public, and only part of the picture being assembled -- the "mosaic," in Wall Street parlance. However, this heightened scrutiny puts an uncomfortable spotlight on the expert networks (for example, MedaCorp, the first expert network in the healthcare area, announced this morning it will close).

Even more worrisome, we could see a chilling effect on how extensively and aggressively investors acquire data to build their mosaics, and that could have a bottom line impact for many data publishers. As investors seek to turn obscure bits of data into gold, regulators seem poised to turn that gold into dross. 
