It’s Hard to Trust This One

March 13, 2020

Recently, ADP (the association of yellow page publishers, not the payroll company) announced something called “Trusted Local Directory,” an online directory of “Trusted Local Businesses.” To become a Trusted Local Business, a company is “thoroughly investigated” and, if worthy, receives both a Trusted Local Business seal for its use and a listing in the Trusted Local Directory.

I give ADP kudos for trying to find ways to breathe new life and relevance into the yellow page directory business, but I have to admit some skepticism as well.

First, this model is not a new one, and the track record of third-party trust evaluators isn’t a good one. Trust is hard. Perhaps more to the point, trust is expensive. And to a great extent, trust is in the eye of the beholder – simply defining how a company can objectively prove it is trustworthy is remarkably challenging. That’s why this is one tough model.

Consider as a case study the Better Business Bureau (BBB). They’ve been providing assurances of trust for over 100 years. But they’ve come to be viewed as a consumer advocacy organization when in fact they are supported by their business members, setting up all sorts of inherent conflicts. Moreover, new BBB business members automatically receive a top rating upon joining. The rating may then be reduced over time depending on how the business handles its complaints. That’s a loophole that scammers can drive a truck through. In addition, BBB has set itself up to process and resolve mountains of consumer complaints, something it doesn’t get paid to do. More fundamentally, BBB has a pay-to-play business model. It makes no money unless a business becomes a member, and once a member the business automatically receives a top rating from BBB.

If BBB has trouble with this model, consider that ADP has the additional hurdle of being an unknown brand. Moreover, rather than leveraging the directories of its members, ADP has created the Trusted Local Directory as a new directory site that will need to build usage from scratch, a daunting task at this late date. And lest you think that the Trusted Local Directory is a directory of trusted local businesses, be advised that it appears to be a national directory of all businesses, one that offers no more than business name, address and phone.

An online directory of trusted local businesses could be a good and useful product. But the business model inherently fights you every step of the way. A directory like this needs a critical mass of businesses to be useful and viable. But assessing trust at anything more than a cursory level is slow, manual, expensive and difficult to scale. So you can’t do it for free. But by charging for inclusion, fewer businesses will want to be included. To combat this you can reduce your price, which means a less rigorous assessment, which in turn limits the value of the product. Alternately, you can give the impression of a rigorous review without actually doing the work, but that is more likely to lead to court than to success.

Crowdsourced reviews have come closest to making the third-party review model work. They are low cost and do readily scale, but many suffer from gaming and have credibility issues of their own. To succeed, they need a lot of policing and quality control, and that quickly gets complex and expensive. There are only a few examples (TrustPilot is one good one) of meaningful monetization with this model.

Again, kudos to ADP for thinking outside the box, but it doesn’t seem to me they’ve cracked the code on this inherently challenging business model. And for anyone else considering this model, trust me, it’s hard.

Google: Now Organizing the World’s Data

February 14, 2020

Google's mission is “to organize the world's information and make it universally accessible and useful”. How does its newest foray into data stack up? Early this year, Google officially launched something it calls Dataset Search. It’s been in public beta since 2018 (I still contend that the concept of “public beta” remains Google’s single greatest technological innovation), but now it’s for real and according to Google, already contains information on over 25 million datasets.

Dataset Search is loosely tied to Google Scholar, a specialized version of the Google search engine intended to make it easier to search for academic papers. Along those lines, Google sees Dataset Search as something most useful to scholars and data journalists.

Improving discovery of datasets is a worthy and important task. Quite likely, 25 million datasets are only a tiny fraction of what exists online. And in this age of open data, Google is tackling a big task at just the right time.

Anyone can add a dataset to Dataset Search. Include some metatags on the relevant webpage, and the Google crawler will find it and automatically add a record to Dataset Search. Is it worth the effort? Well, it’s free and it’s fairly easy to participate, and it’s Google. Google does note that information in Dataset Search is added to the Google Knowledge Graph, meaning it connects Dataset Search records to all other information it knows about the organization that owns the dataset. Some suspect this may improve your overall Google search ranking, though Google is necessarily playing coy on this point.
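To make the “metatags” step a little more concrete, here is a minimal sketch of what the markup might look like. It assumes the page describes the dataset using the schema.org Dataset vocabulary embedded as JSON-LD, which is the kind of structured data the Google crawler reads. The helper name dataset_jsonld, the organization, and every URL below are placeholders invented for illustration, and the handful of fields shown is a bare minimum rather than a complete record.

```python
import json


def dataset_jsonld(name, description, url, publisher):
    """Build a schema.org Dataset description wrapped in a JSON-LD script tag.

    Hypothetical helper for illustration; field values are supplied by the caller.
    """
    record = {
        "@context": "https://schema.org/",
        "@type": "Dataset",
        "name": name,
        "description": description,
        "url": url,
        "creator": {"@type": "Organization", "name": publisher},
    }
    body = json.dumps(record, indent=2)
    return f'<script type="application/ld+json">\n{body}\n</script>'


# Placeholder values only; none of these refer to a real product.
tag = dataset_jsonld(
    name="Example Business Directory Data",
    description="A hypothetical commercial dataset of business listings.",
    url="https://example.com/datasets/business-directory",
    publisher="Example Data Co.",
)
print(tag)
```

The resulting tag would be pasted into the HTML of the dataset’s landing page. Describing the dataset once, on that one page, is probably the safer choice compared with tagging every individual record page.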

What’s in Dataset Search today? I have to say, while it has potential, it’s going through some growing pains. ProPublica has a very good database of financial data on non-profit organizations. However, rather than list the dataset once, ProPublica appears to have coded its database so all 800,000 records in its database have become separate records in Dataset Search. Humorously, for some other organizations, a CEO headshot will be displayed instead of a company logo. This will all be corrected in time. My biggest disappointment, however, is likely to remain: Dataset Search is a database of databases searchable primarily by full text queries. There are very few parameters that can be applied to usefully narrow a search, so much like the primary Google search engine itself, you will still have to manually browse through endless search results to find what you want.

I do want to stress that Dataset Search is open to commercial data products. It’s an easy, free way to get some additional online exposure for your products and if it bumps up your search result rankings, it’s well worth the effort. And as Dataset Search evolves, it may well become an accepted way to discover and source commercial data products. Why not get in on the ground floor?

This Score Doesn't Compute

May 17, 2019

This week the College Board, operator of the SAT college admissions tests, made a very big announcement: in addition to its traditional verbal and mathematical skills measurement scores, it will be adding a new score, which it is calling an “adversity score.”

In a nutshell, the purpose of the adversity score is to help college admissions officers “contextualize” the other two scores. Primarily based on area demographic data (crime rates, poverty rates, etc.) and school-specific data (number of AP courses offered, etc.) this new assessment will generate a score from 1 to 100, with 100 indicating that the student has experienced the highest level of adversity.

Public reaction so far has been mixed. Some see it as an honest effort to help combat college admission disparities. Others see it as a desperate business move by the College Board, which is facing an accelerating trend towards colleges adopting test-optional admission policies (over 1,000 colleges nationwide are currently test-optional).

I’m willing to stipulate that the College Board had its heart in the right place in developing this new score, but I am underwhelmed by its design and execution.

My first concern is that the College Board is keeping the design methodology of the score secret. I find that odd since the new score seems to rely on benign and objective Census and school data. However, at least a few published articles seemed to suggest that the College Board has included “proprietary data” as well. Let the conspiracy theories begin!

Secondly, the score is being kept secret from students for no good reason that I can see. All this policy does is add to adolescent and parental angst and uncertainty, while creating lots of new opportunities for high-priced advisors to suggest ways to game the score to advantage. And the recent college admissions scandal shows just how far some parents are willing to go to improve the scores of their children.

My third concern is that this new score is assigned to each individual student, when it is in reality a score of the school and its surrounding area. If the College Board had created a school scoring data product (one that could be easily linked to any student’s application) and sold it as a freestanding product, there would likely be no controversy around it.

Perhaps most fundamentally though, the new score doesn’t work to strengthen or improve the original two scores. That’s because what it is measuring and how it measures is completely at odds with the original two scores. The new score is potentially useful, but it’s a bolt-on. Moreover, the way this score was positioned and launched opens it up to all the scrutiny and criticism the original scores have attracted, and that can’t be what the College Board wants. Already, Twitter is ablaze with people citing specific circumstances where the score would be inaccurate or yield unintended outcomes.

Scores and ratings can be extremely powerful. But the more powerful they become, the more carefully you need to tread in updating, modifying or extending them. The College Board hasn’t just created a new Adversity Score for students. It’s also likely to have caused a lot of new adversity for itself.

A Healthy New Year

January 4, 2019

We’re in the midst of a transformational shift in the healthcare industry. Likely you have experienced it yourself, and it’s probably already hit you in the pocketbook. It’s the shift to what is called consumer-directed healthcare.

While on the surface consumer-directed healthcare may seem like nothing more than an attempt by employers to shift some of their spiraling healthcare costs onto their employees, there is much more going on behind the scenes. There is a lot of public policy driving this shift. The general idea is that healthcare costs are out of control because those buying healthcare services traditionally haven’t been the ones paying for them. By shifting healthcare costs to the consumer, the reasoning goes, consumers will demand better value for their money by becoming smart healthcare shoppers, and healthcare costs will begin to decline.

It all makes sense on paper, but there is one huge stumbling block in making this approach work: it’s hard to be a smart shopper when none of the things you are buying have price tags on them.

Data entrepreneurs have already seen this opportunity. Companies like Healthcare Blue Book and ClearCost Health have made real strides, but it’s a big and enormously complicated problem to solve. In part, that’s because hospitals don’t like to disclose their prices and insurers are often contractually prohibited from sharing what they pay specific hospitals for specific procedures.

Recognizing the issue, the federal government has mandated that as of January 1 of this year, hospitals must post their pricing for common procedures on their websites in an easily downloadable format.

There’s a quick opportunity here to put your website scraping tools to work to gather all this pricing data in one place and normalize it. Certainly, there is an analytical product in there somewhere. But it’s less of an opportunity than it seems because what hospitals are generally posting are their list prices – and virtually nobody pays these prices.

The challenge in hospital pricing is to find out what a specific insurance plan pays a specific hospital for, say, a hip replacement. This could be an ideal opportunity to turn to the crowd.

One approach might be to aggregate all the pricing data that hospitals are now required to publish and use it as a data backbone – essentially a starting point. Then you could turn to consumers and ask them to anonymously submit their hospital bills and insurance statements. Take those images, use optical character recognition to get them into raw data format, then develop software to extract the valuable pricing data. When specific price data isn’t available, you could back off to list price data that would at least show if a hospital is relatively more or less expensive.
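The “back off to list price” idea can be sketched in a few lines. This is only an illustration under stated assumptions: the hospitals, plan, and dollar amounts are invented, the function name estimate_price is hypothetical, and taking the median of submitted bills is one plausible way to summarize crowd data, not a method described above. The OCR and extraction steps are assumed to have already produced the numbers.

```python
from statistics import median

# Hypothetical backbone: published list prices keyed by (hospital, procedure).
list_prices = {
    ("Hospital A", "hip replacement"): 60000.0,
    ("Hospital B", "hip replacement"): 45000.0,
}

# Hypothetical crowd-submitted paid amounts keyed by (hospital, plan, procedure).
paid_prices = {
    ("Hospital A", "Plan X", "hip replacement"): [31000.0, 29000.0, 30000.0],
}


def estimate_price(hospital, plan, procedure):
    """Return (amount, source), preferring crowd data over list price."""
    submitted = paid_prices.get((hospital, plan, procedure))
    if submitted:
        # Assumption: summarize multiple submitted bills with the median.
        return median(submitted), "crowd"
    listed = list_prices.get((hospital, procedure))
    if listed is not None:
        # No plan-specific data yet, so fall back to the published list price.
        return listed, "list"
    return None, "unknown"


print(estimate_price("Hospital A", "Plan X", "hip replacement"))
print(estimate_price("Hospital B", "Plan X", "hip replacement"))
```

Returning the source alongside the amount lets a product label list-price fallbacks as rough indicators rather than as what anyone actually pays.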

Obviously it will take a long time to build a comprehensive database consisting of millions of price points, but there are a lot of consumer groups and other constituencies that would be very interested in your success and would work with you to increase the number of bills submitted. Hospitals won’t like this a bit, but as is so often the case, if one group doesn’t want the data out there, you have immediate confirmation that the data are valuable to some other group. Ironically, hospitals submit their price quotes for medical devices to a fascinating data company called MDBuyline to make sure they aren’t over-paying for their purchases.

Sure, there is lots of complexity hiding under this simple framework. Also, it’s obvious that it will take a long time to build a comprehensive database. But the bromide “don’t let the perfect be the enemy of the good” nicely describes a key to success in the data business. As long as your database is the best available, it doesn’t have to be either complete or perfect. In almost every case, data is so important to decision-making that buyers will take what they can get, warts and all. This is not an invitation to be lazy or sloppy. Rather, it is recognition that you’ll have a marketable product long before you have a complete and perfect product. Just one more reason data is such a great business. Should hospital price data be on your New Year’s resolution list?

Relationship Scoring

November 9, 2018

No, this is not about online dating. I am referring to the growing use of consumer scores to help companies determine how much time and energy to invest with individual customers.

We’re all familiar with credit scores that yield a single number meant to reflect how dependably you pay your bills. A high credit score can mean easy access to credit, often at lower interest rates that reflect your low re-payment risk. A poor credit score can mean limited access to credit and loans, in addition to higher interest rates.

The folks behind the credit scores have been relentless in their work to find new markets for their product. With the notion that a credit score is also a reflection of someone’s level of personal responsibility as well, credit information is increasingly used in hiring decisions. You’ll also find credit scores used to determine pricing for such things as automobile insurance, the insurance companies having concluded that if you pay your bills on time, you likely drive carefully as well.

But credit scores are not the only consumer scores out there. In parallel with credit scores, a number of companies have been building out consumer scores based on Customer Lifetime Value (CLV). The CLV concept has been around forever. What’s changed recently is increasingly easy access to a wide variety of input datasets (a/k/a “signals”) that work to increase the precision of these scores, along with increasing computer power that makes it possible to access and act on these scores in real-time.

And how are these scores used? A recent Wall Street Journal article suggests that CLV scores are increasingly used by companies to determine how they will interact with their customers. A higher scoring customer may actually get faster and better customer service. Companies will offer bigger incentives and better deals to their best customers in order to retain them. CLV scores start with numeric calculations of the likely dollar value of a customer over the entirety of the projected relationship (and yes, your score typically declines as you get older because … less lifetime). More recently, these relatively simple calculations have been enhanced with demographic overlays and a wide array of lifestyle and even behavioral data points. For example, customers who complain too much or call customer service too often may have their scores reduced as a result.
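For readers who want to see what the “relatively simple calculation” at the core might look like, here is a rough sketch. It uses a common textbook simplification (expected yearly margin, weighted by the chance the customer is still around, discounted back to today) and is not any particular vendor’s formula. The function name simple_clv and all the input numbers are assumptions for illustration.

```python
def simple_clv(annual_margin, years, retention_rate, discount_rate):
    """Sum expected yearly margin, weighted by retention and discounted.

    Textbook-style simplification; real scoring systems layer many
    more signals on top of a base calculation like this one.
    """
    total = 0.0
    for t in range(years):
        survival = retention_rate ** t  # chance the customer is still active in year t
        total += annual_margin * survival / (1 + discount_rate) ** t
    return total


# Hypothetical customers that differ only in the projected length of the relationship.
long_horizon = simple_clv(annual_margin=100, years=10, retention_rate=0.9, discount_rate=0.1)
short_horizon = simple_clv(annual_margin=100, years=5, retention_rate=0.9, discount_rate=0.1)
print(round(long_horizon, 2), round(short_horizon, 2))
```

With everything else held equal, the shorter projected relationship produces the lower value, which is the “less lifetime” effect in miniature.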

Currently, companies implement their own CLV scoring systems, sometimes with the help of third-party vendors. Using CLV scores as a data-driven way to make sure better customers are treated better sounds benign. Where it could take a more worrisome turn is if a third-party vendor tries to centralize all of this information to build a single CLV score for all consumers. This would be a fraught undertaking, especially since it would likely not be subject to any regulatory scrutiny and control. Such a scoring system would also look uncomfortably similar to the social credit system recently introduced by the Chinese government, the implications of which are not yet fully understood but are likely to be profound.

InfoCommerce Group Blog
