Thomas Register: The End of the Beginning
There was a very consistent reaction to the news this week that Thomas Publishing was ending the print edition of Thomas Register. I'd characterize it as being "shocked but not surprised."
I think the shock was driven by the symbolism of the decision. When this publishing icon, one of the largest and most successful buying guides of all time, says so clearly that print is passé, we all then have to acknowledge that the future of our industry is online. By now haven't all database publishers acknowledged that their future is online? Yes, but that acknowledgement hasn't always been backed up with action, in part because there is no clear path to get from here to there.
The lack of surprise comes from the fact that Thomas was an early and aggressive player on the Web. At a point in time when the future of online was anything but clear, Thomas made the huge gamble that the value of an online audience would ultimately be worth more than the print subscription revenue it was putting at risk. It won, and it won big. The transition for Thomas has been anything but painless, but by starting so early, it became a major Web destination well before there were even such things as keywords to buy, and that has given it a huge competitive advantage. Also, by starting early, Thomas learned a lot and was able to make some mistakes without incurring much damage, all while hedging its bet by maintaining its print edition and selling a print/online package. If there isn't one already, this would make one amazing business school case study.
Now that Thomas has killed the print version of Thomas Register, there will be a lot of soul searching by the industry with publishers asking themselves whether it's time to discontinue their own print products. While our research says that print will move into a period of accelerating decline over the next 2-5 years, most publishers will find that 20-25% of their customers will continue to prefer print for the foreseeable future. Most publishers can still economically accommodate this market. Thomas, with its 33-volume annual behemoth, was dealing with atypical print economics.
As someone who cut his teeth (and his hands, re-making pages with an X-acto knife) working on the print edition of Thomas Register, I am certainly going to miss those "big green books." At the same time, we are watching a whole new era unfold before our eyes, one where our information and our value are no longer constrained by the limitations of the print format, but only by our imaginations.
Searching for Subscriptions
Yahoo! has announced a beta version of what it calls Yahoo! Search Subscriptions, which allows users to search for content on password-protected, subscription content sites.
The service operates by arrangement with publishers; those currently participating are Consumer Reports, Financial Times, Forrester Research, IEEE, New England Journal of Medicine, TheStreet.com and the Wall Street Journal. It's being reported that content from Gale, LexisNexis and Factiva is due to be added shortly.
Yahoo! indexes the content and shows a snippet of it in search results. Users then click on a search result link, and they are presented with a page controlled by the publisher that allows users to log in (if they are existing subscribers), subscribe online, or in some cases, purchase a specific article or report on a one-off basis. It's an approach that is simple and effective. And it's a model where everybody wins. Users get greater access to so-called "deep Web" content. Publishers with subscription content get more search engine visibility, ultimately leading to more revenue. Yahoo! gets lots of new content under its index, giving it some nice differentiation and competitive advantage (at least for a while).
Best of all, publishers don't have to compromise their business models in any way. Consumer Reports subscribers, for example, can access content directly through the Consumer Reports site, or indirectly through Yahoo!, using the same username and password in both places. The subscription to Consumer Reports is no less valuable to the subscriber because of this arrangement with Yahoo! It's just one more doorway to the same content.
But the most amazing thing about this new service is that it wasn't launched five years ago. Both the idea and the execution are breathtakingly simple, and the underlying technology has been in place for years.
It's the rare subscription-based data publisher who won't benefit from being part of this new service, so run, don't walk, over to Yahoo! and get on board now. Even if Yahoo! ultimately starts looking for a revenue share on content sales and new subscriptions, this will probably still be a good financial deal for subscription-based publishers. And besides, the faster this new service from Yahoo! grows, the faster Google and all the others will copy it, providing even more low-cost promotional opportunities for subscription-based publishers!
We're pleased to announce that Richard P. Malloch, President of Hearst Business Media, and Craig Pisaris-Henderson, Chairman and CEO of Miva Inc. (formerly FindWhat.com), will be the keynote speakers at InfoCommerce 2005.
InfoCommerce 2005:
Cracking the Quality Conundrum
November 6-8, 2005 - Philadelphia, PA
Caching In
There's been a flurry of activity lately around the obscure but important practice of Web page caching -- taking and preserving copies of someone else's Web pages.
One of the more remarkable Web sites I have ever run across is www.archive.org, also known as "The Wayback Machine," a non-profit venture co-founded by Brewster Kahle to essentially take snapshots of the Web at different points in time. Using The Wayback Machine, you can easily take a look at how any individual Web site has evolved over time. Perhaps not surprisingly, not everyone wants their history to be so readily accessible.
In a recent court case, a law firm used The Wayback Machine to uncover some evidence to support its case. The other party in the lawsuit turned around and sued the Internet Archive, operators of The Wayback Machine, for inappropriately making and holding copies of their Web pages. The case is complicated, and actually revolves more around something called a "robots.txt" file than the cached pages themselves, but an adverse decision could have a chilling effect on archiving and making available historical content gathered on the Web.
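For readers who haven't encountered it, robots.txt is simply a plain-text file placed at the root of a Web site that tells well-behaved crawlers which pages they may and may not fetch. A minimal illustration (the site and directory names here are hypothetical):

```
# Placed at http://www.example.com/robots.txt
# Ask all crawlers to stay out of one directory:
User-agent: *
Disallow: /private/

# Ask the Internet Archive's crawler (ia_archiver) to skip the whole site:
User-agent: ia_archiver
Disallow: /
```

Note that compliance is voluntary; the legal dispute turns in part on what obligations, if any, such a file creates for a service that archives pages.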
At the same time, Canadian legislators are considering an amendment to the Canadian Copyright Act that would prohibit anyone from making and holding a cached copy of someone else's Web site without permission. While this could be a speed bump for search engines that cache content, it's not likely to disrupt them too much, as they don't need to cache content to index Web sites, and indexing itself will not be prohibited.
But this is all part of a larger trend towards history disappearing from the Web, and therein may reside a real opportunity for data publishers.
I regularly see examples of companies removing all traces of unsuccessful products, ousted executives and failed ventures from their Web sites, leaving no clue they ever existed. A large percentage of companies, mostly for benign reasons, "age off" old press releases and announcements from their Web sites, leaving only a narrow window of corporate history. Most companies seem to feel that the primary value of their Web sites is to provide current if not real-time information, with only a small nod to what has happened in the past. That means that those who capture and retain this type of business information will ultimately end up with a vast repository of business intelligence, much of it unavailable elsewhere.
Even in the pre-Internet days, historical data had real value. I published one directory that ran a small index of corporate name changes that was one of the most popular and heavily used sections of the publication. I know of another healthcare directory that didn't simply delete companies that went out of business, merged or were acquired. Instead, it ran them in an index called "Mutations," which proved incredibly popular. I know one financial publisher that actually retains all the previous positions held by the executives in its database, valuable information that could be the basis for a number of specialized, high-value products. In many industries, there are successful databases that cross-reference old and new part numbers, or suggest equivalent parts to replace discontinued parts. And knowing what products a company used to make, what ventures it has exited, and what executives it used to employ will become increasingly valuable as the information becomes harder to access. When it comes to data, the past can be a prelude to lucrative opportunities.
The Vertical Challenge
Few would argue that over the last five years the major search engines have made enormous strides in improving their coverage of the open Web. They now find new sites more quickly, re-index them more often and even provide searchable access to non-textual content. It's all very impressive and very much for the good. However, as we all well know, too much information can be as much a curse as a blessing. That's why so much effort has been invested in trying to improve the precision of search, an effort often referred to as "improving relevancy." Providing a list of Web pages that contain a keyword or phrase is no longer considered innovative or even particularly valuable. Value is now embodied in identifying the most relevant Web pages for searchers. Every major search engine has its own secret sauce of techniques, processes and algorithms to divine relevance, and they seem to be getting better every day. But this isn't where the battle for search primacy ends, not by a long shot.
The next phase in this competitive battle is to get more paid content under index. Opening shots in this battle have already been fired by both Google and Yahoo. LexisNexis used to run an ad campaign in the early days of the Internet touting that it held far more data than the entire Internet. As a competitive response to the Web, it missed the whole point, but it does underscore another one: even at this late date, some of the most important, powerful and useful content is still not to be found through search engines. This may be the simple explanation why content aggregators continue to do well despite the long shadows cast by the big search engines: their content is valuable and not available elsewhere.
To its credit, the content aggregation industry realized years ago that this distinction would not provide protection forever, which is why they've upped the ante, moving beyond delivery of raw data, and even moving beyond the search precision issue to focus on the biggest value-add of all: making data truly useful. OneSource built a nice business by taking on the hard work of integrating disparate databases to create highly comparable company profiles. Alacra has a nice niche providing customized data feeds to clients for use in their internal systems. Factiva continues to develop increasingly elaborate and powerful taxonomies that can even be extended to the internal data of its clients. LexisNexis builds virtual company profiles drawing on its vast data warehouses.
Interestingly, publishers are jumping on this bandwagon as well. Gale is now out with a product that assembles content from across its range of databases to present deep and comparable profiles. infoUSA is also jumping into the fray, having become a recent convert to the power of data mining to deepen its databases.
Where's this all heading? I think once the gee-whiz factor of all the new content assembly technology wears off, it will become evident that the marketplace has moved beyond giant, one-size-fits-all databases. No matter how big, deep and accessible a database is, the fact remains that engineers, purchasing agents and analysts need different data in different ways, and it's unlikely that anyone will cook up a single product that will keep them all equally happy. And just as there is growing user sophistication in terms of data elements and search interfaces, so too is there growing sophistication in terms of the overall dataset. Users are going to value 98% coverage of what really matters to them over 80% coverage of everything in the world. All this suggests to me that the future of search looks vertical. Business success will be a function of limited coverage, tailored to certain specific types of users, and executed very, very well.
Winners and losers in this scenario? Most data publishers already have a vertical orientation, and those that quickly figure out how to deliver data as well as they compile it will be very nicely positioned. Aggregators should have a solid, continuing role serving the distinct market that will continue to need convenient access to broad swaths of content.
It's the search engines that seem to be the ones not invited to this party. They are simply too wedded to serving up the most stuff to the most people. That will still be a great business for them, but it's a different business. And as the data content business gets comfortable with its distinctive place in the market, the industry will see greater stability and a much clearer path to profits.
Know Thy Customer
I opened up two recently purchased music CDs yesterday and found they both contained postcards I could fill out to get on the mailing list for the music labels producing the CDs. I have seen this in scattered instances before, and what's particularly interesting to me is that only the small labels ever seem to bother, even though the cost to insert a postcard into a jewel case is virtually zero, and the information gained can be priceless. Yet for whole segments of the publishing industry, the customer continues to be an intermediary, not a true end-user.
I've never been particularly excited about any form of publishing where there isn't some direct connection with the end-user. I say this even for advertising-based publications. Many print publishers to this day continue to rent lists and ship out their ad-based publications to strangers, hoping that the large quantity they are sending out will compensate for their lack of knowledge about who they are sending to. This happens with many ad-based Web sites as well, with publishers evaluating their success based on level of site traffic -- eyeballs -- with no real knowledge of the users behind that traffic.
Subscription-based publishers usually have better information on the end-user, but not always. Many data publishers sell a significant percentage of their subscriptions to libraries. Even when the customer appears to be an individual in a company, the subscription ends up in an internal information center, and the individual subscriber of record may not use the information at all.
Of course, many data products are sold through distributors and aggregators, another type of intermediary sale where the ultimate user is unknown. Distributors have traditionally been loath to release any end-user usage information. I can remember sitting in meetings when Dialog ruled the roost, begging and pleading for the tiniest shred of information on who might be using our content. Ironically, with aggregators and distributors increasingly feeding corporate intranets, even they don't truly know the ultimate user anymore.
Interestingly, a lot of publishers are wringing their hands and worrying about maintaining their brands in an environment where information distribution is increasingly anonymous and diffuse. The focus on branding content is in one sense an admission of defeat: publishers are effectively saying, "I probably will never know who you are, so I want you to at least know who I am." Presumably these end-users will then seek out the publisher directly for additional content. At least, that's the hope. But this is not the time for passivity when it comes to knowing your customers. It's not just a sales issue. It means understanding how and why your content is being used, and intermediaries will never be able to truly answer that question for you. Because if you don't know exactly who is using your data, as well as how and why it is being used, you won't be able to apply the high-value infocommerce characteristics that are critical to continued success and growth.
HEAR FROM THE MOST IMPORTANT NAMES IN THE DATA PUBLISHING BUSINESS ... TODAY'S AND TOMORROW'S. InfoCommerce 2005 is proud to announce that Tim DeMello, Chairman and CEO of Ziggs, has been confirmed as a speaker.
InfoCommerce 2005 November 6-8 | Philadelphia
The Working Conference for the Thinking Publisher.