
Shine a Light on Your Hidden Data

If you watch the technology around sales and marketing closely, you’ll know that beacon technology is all the rage. Stores can purchase beacon broadcasting equipment, and when shoppers enter with beacon-enabled apps, the apps respond to the beacon signals – even when they aren’t in use. Stores see nirvana in pushing sale offers and the like to customers who are already on the premises. And of course, it is expected that some mainstream apps (Twitter is often cited, though this is unconfirmed) will become beacon-enabled as well.

Beacons represent a concrete manifestation of the larger frenzy surrounding geolocation. Everyone wants to know where consumers are at any given moment, as epitomized by big players such as Foursquare, which has evolved from its gimmicky “check ins” to become more of a location-driven discovery service.

That’s why I was so intrigued by Foursquare’s most recent product announcement, Pinpoint. Shifting the focus from where people are now to where they have been, Pinpoint will mine valuable insights from location history and let companies use them for precise ad targeting.

Details about Pinpoint are scarce right now, but Foursquare is smart to start mining its historical data. At the lowest level, it means that Foursquare can help, say, Starbucks target lots of Starbucks customers. Useful, but not too sophisticated. If Pinpoint can roll up businesses by type (such as pet food stores), it starts to get a lot more interesting. But the real home run would be to be able to divine purchase intent. If someone visits three car dealers in a short period of time, you suddenly have an amazingly valuable sales lead. And mining insights like this is now practical with Big Data tools.
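To make the idea concrete, here is a minimal sketch of how such a purchase-intent pattern might be detected in visit history. Everything here – the record layout, the category names, the three-visits-in-fourteen-days threshold – is a hypothetical illustration, not Foursquare’s actual method:

```python
from datetime import datetime, timedelta
from collections import defaultdict

# Hypothetical visit records: (user, venue_category, timestamp)
checkins = [
    ("u1", "car dealer", datetime(2015, 5, 1)),
    ("u1", "car dealer", datetime(2015, 5, 3)),
    ("u1", "car dealer", datetime(2015, 5, 6)),
    ("u2", "car dealer", datetime(2015, 5, 1)),
    ("u2", "coffee shop", datetime(2015, 5, 2)),
]

def intent_leads(checkins, category, min_visits=3, window_days=14):
    """Flag users with >= min_visits to one venue category inside a rolling window."""
    by_user = defaultdict(list)
    for user, cat, ts in checkins:
        if cat == category:
            by_user[user].append(ts)
    leads = []
    for user, times in by_user.items():
        times.sort()
        # Slide a window of min_visits consecutive visits over the sorted timestamps
        for i in range(len(times) - min_visits + 1):
            if times[i + min_visits - 1] - times[i] <= timedelta(days=window_days):
                leads.append(user)
                break
    return leads

print(intent_leads(checkins, "car dealer"))  # ['u1']
```

The same rolling-window test works for any category roll-up – three pet food stores in a week, three car dealers in a month – which is what makes historical data so much richer than a single check-in.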

But the real insight here is that your historical data isn’t just ancient history: it provides the multiple data points you need to find patterns and trends. Knowing that a company replaces its CEO every 18 months or so is a hugely valuable insight that you can identify simply by comparing your current data to your historical data. At a minimum, you’ve got a powerful sales lead for recruiters. But that level of volatility might also signal a company with problems, creating useful insights in a business or competitive intelligence context. We’ve all heard about the predictive power of social media sentiment analysis. You may have equally valuable insights lurking in your own data. All you need to do is shine a light on them.

How Starbucks in Mall of America looks to Foursquare

User Interface Design: No Small Matter

In advance of big changes to the way pensions are managed, the UK government set up a quasi-independent service called Money Advice Service (MAS). MAS has the worthy goal of trying to improve financial literacy, particularly among those about to retire.

As part of its program, MAS set up an online directory of financial advisors, just launched in beta. Given its high profile and semi-official status, the MAS directory has come under a lot of scrutiny, particularly from the financial advisors it lists, all of which are keen to be highly visible in this important new directory that anticipates very heavy use. But let’s look at it from a user’s perspective to see some important lessons on how not to create an online directory.

Sample Directory Listing

The directory database itself is quite mundane. It presents such information as advisor name, contact details, certifications (if any), and the types of services each advisor provides (from a fixed list of categories). But here’s how a seemingly basic directory quickly becomes complicated.

First, there is the issue of business locations. It’s easy to list ABC Advisors at its headquarters address in London. But what if ABC Advisors has 400 branch offices scattered around the country? Does each get an individual listing? Even more confusing, how do you properly represent advisory firms made up of independent advisors, many of whom work from home? What about advisory firms that are affiliated with other advisory firms? You may think all of this is annoying but not a huge deal. It becomes a huge deal, though, when the user interface is location-centric.

As it happens, the MAS directory is location-centric: users enter a postal code, and the search returns results ordered by proximity. But depending on how you handle the entity issues described above, ABC Advisors might appear 100 times in the results of a specific search (with each of its offices or advisors appearing as a separate listing), or not at all (because only the headquarters location was listed and it wasn’t anywhere nearby). This can be very confusing to users (who often see the multiple records as annoying duplicates and the absence of major companies as questionable data quality). And if you are selling paid participation or paid enhancements in the directory, it can cause an advertiser revolt.

The MAS directory also lets you search by specialty service. Here, results are not returned by proximity, and because there is no secondary sort on distance, the first search result may list a firm 500 miles away, while a firm 1 mile away appears on page three of search results.
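A secondary sort on distance is straightforward to add once each listing is geocoded. The sketch below is a hypothetical illustration (the listing names and coordinates are invented), not the MAS implementation:

```python
from math import radians, sin, cos, asin, sqrt

def haversine_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance in miles between two lat/lon points."""
    lat1, lon1, lat2, lon2 = map(radians, (lat1, lon1, lat2, lon2))
    a = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 3959 * 2 * asin(sqrt(a))

# Hypothetical listings already matching a specialty search
listings = [
    {"name": "Far Away Advisors", "lat": 55.95, "lon": -3.19},  # Edinburgh
    {"name": "Nearby Advisors",   "lat": 51.52, "lon": -0.10},  # central London
]

user_lat, user_lon = 51.50, -0.12  # centroid of the user's postcode (assumed geocoded)

# Keep the specialty filter, but order the results by proximity to the user
ranked = sorted(
    listings,
    key=lambda l: haversine_miles(user_lat, user_lon, l["lat"], l["lon"]),
)
print([l["name"] for l in ranked])  # ['Nearby Advisors', 'Far Away Advisors']
```

One sort key is all it takes to keep the firm a mile away off page three of the results.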

Perhaps the biggest issue of all is that searches tend to return hundreds of listings, and the thin dataset gives the user very little information or tools to differentiate or compare them. Apparently, the plan is to add fees and charges in the near future to build out the database. In the meantime, users struggle with a marginally useful directory. Governments can get away with this. But those of us in the business know how to do it a lot better – or at least we should. User interface design starts with the design of the database itself, which is in turn informed by the user needs and problems you are trying to address. Shortcuts in the design phase mean expensive additional work later, and can potentially endanger the success of your data product.


The Power of Predictive Prospecting

Out of all data products, the single largest group is what we call "opportunity finders," databases used by customers to identify sales prospects. These databases, many of which originated as print directories, have followed the normal trajectory of data publishing: moving from being a mile wide and an inch deep to adding tremendous amounts of depth. As publishers add more information to each listing (e.g., revenue, number of employees, year founded, line of business), they enable their users to engage in much more sophisticated targeting of sales prospects.

In situations where a company is looking to sell into a very specific market segment and the data exists to isolate those prospects, it's pretty much mission accomplished for the data publisher. For example, if you sell a product that is only of interest to banks with more than ten branch offices, you can probably find a database that will quickly help you identify a manageable list of qualified prospects.

But an awful lot of situations aren't so neat and tidy. Some companies have huge target markets, such as "all companies with revenues under $5 million." Some companies literally target everybody. And a great many companies are seeking highly defined target markets for which data doesn't exist (e.g., all private companies that are considering starting a 401(k) plan).

Until recently, what this meant is that companies were required to slog through a huge number of semi-qualified prospects. Using expensive telesales and field sales teams, they would eventually identify some good prospects, but the work to do so was expensive, slow and not a lot of fun. Could there be a better way?

What we're seeing now are remarkable advances in lead scoring and predictive sales software. The premise is simple: by bringing to bear a lot of information and a lot of smarts about what data points might identify a good prospect, we are getting better at separating strong prospects from weak prospects. Some of the companies leading the way in this area are Lattice Engines (a DataContent 2012 presenter), Context Relevant and Infer.

The potential opportunity for data publishers is to move more aggressively into lead scoring for your customers. Imagine being able (possibly in combination with one of these firms) to let your customers enter parameters about their sales targets, then search your data to receive not only the raw information but also a predictive score indicating the quality of each prospect.
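As a rough illustration of what such a predictive score might look like in its simplest form, here is a toy weighted-signal model. Real products like those mentioned above use far more sophisticated statistical methods; all the signal names and weights below are invented for the sketch:

```python
# Hypothetical scoring model: each signal present contributes a weight
# toward a 0-100 prospect-quality score.
WEIGHTS = {
    "revenue_over_5m": 25,
    "recent_move": 20,
    "hiring": 30,
    "target_industry": 25,
}

def score_prospect(signals):
    """Sum the weights of the signals that are present; cap the score at 100."""
    return min(100, sum(w for s, w in WEIGHTS.items() if signals.get(s)))

prospect = {"revenue_over_5m": True, "hiring": True, "target_industry": False}
print(score_prospect(prospect))  # 55
```

Even a crude score like this lets a user sort hundreds of raw listings by likely value, which is exactly the differentiation tool a flat directory lacks.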

It's all part of the continued push for data publishers to surround their data with more powerful tools. And is there a more powerful tool you can offer your customers than one that helps pinpoint where their next sales are most likely to come from?


Smarter Data is Right Inside the Box

Most of us are at least somewhat familiar with the concept of the “sales trigger,” something I lump into a larger category I call “inferential data.” If you’re not familiar with the concept, what we are talking about is taking a fact, for example that a company has just moved, and drawing inferences from that fact. We can infer from a recent company move that the company in question is likely to imminently be in the market for a host of new vendors for a whole range of mundane but important office requirements. So if we learn about this company move right after it happens (or, ideally, right before it happens), we have an event that will trigger a number of sales opportunities, hence the name “sales trigger.” But as I noted above, sales triggers in my view are a subset of inferential data. I say that because sales triggers tend to be rather basic and obvious, while true inferential data can get extremely nuanced and powerful, especially when you start analyzing multiple facts and drawing conclusions from them. Tech-savvy folks refer to these multiple input streams as “signals.”

Let’s go back to our example above. The company has moved. That means they likely need a new coffee service and cleaning service, among others. That’s fine as far as it goes. But let’s go deeper. Let’s take the company’s old address and new address, and bounce them against a commercial property database. If the company is moving from $20/square foot space to $50/square foot space, chances are this company is doing well. At a minimum, this makes for a more interesting prospect for coffee service vendors. But it can also be the basis for assigning a company a “high growth” flag, making it interesting to a much broader range of vendors, many of whom will pay a premium to learn about such companies.
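Here is a minimal sketch of that inference. The property lookup table, the addresses, and the 1.5x rent-ratio threshold are all hypothetical:

```python
# Hypothetical commercial property data: address -> rent per square foot
RENT_PSF = {
    "12 Old Lane": 20.0,
    "8 New Plaza": 50.0,
}

def growth_flag(old_address, new_address, ratio_threshold=1.5):
    """Flag a company as 'high growth' if its new space costs markedly
    more per square foot than its old space."""
    old_rent = RENT_PSF.get(old_address)
    new_rent = RENT_PSF.get(new_address)
    if old_rent is None or new_rent is None:
        return None  # can't draw the inference without both data points
    return new_rent / old_rent >= ratio_threshold

print(growth_flag("12 Old Lane", "8 New Plaza"))  # True
```

The `None` case matters: an inference is only as good as the signals behind it, and an honest inferential product has to distinguish "no" from "don't know."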

Or perhaps we know this company has changed addresses three times in five years. We could infer from this either extremely high growth or extreme financial distress. Since this relocation signal doesn’t give us enough clarity, we need to marry it with other signals such as number of employees during the same period, or the cost of the space or amount of square feet leased. Of course, signals go far beyond real estate. If the company had a new product launch or acquisition during that period, these signals would suggest the address changes signify rapid growth.

You can see the potential power in inferential data, as well as the complexity. That’s because in the business of signals, the more the better. Pretty soon, you’re in the world of Big Data, and you’ll also need the analytical horsepower to make sense of all these data signals, and to test your assumptions. It’s not a small job to get it right.

That’s why I was excited to learn about a company called – what else – Infer. Infer collects and interprets signals to help score sales leads. And it sells this service to anyone who wants to integrate it with their existing applications. It’s essentially SaaS for lead scoring. Intriguingly, Infer licenses data from numerous data providers to get the critical signals it needs.

Inferential data makes any data it is added to smarter, which in turn makes that data more valuable. Many publishers have latent inferential data they can make use of, but for others, watch out for those “signals in a box” products from what I suspect will be a growing number of vendors in this space. It’s the smart thing to do.


Source Data’s True Worth

In my discussion of the Internet of Things (IoT) a few weeks back, I mentioned that there was a big push underway to put sensors in farm fields to collect and monitor soil conditions as a way to optimize fertilizer application, planting dates, etc. But who would be the owner of this information, which everyone in agriculture believes to be exceedingly valuable? Apparently, this is far from decided. An association of farmers, The Farm Bureau, recently testified in Congress that it believes that farmers should have control over this data, and indeed should be paid for providing access to it.

We’ve heard this notion advanced in many different contexts over the past few years. Many consumer advocates maintain that consumers should be compensated by third parties who are accessing their data and generating revenue from it.

Generally, this push for compensation centers on the notion of fairness, but others have suggested it could have motivational value as well: if you offer to pay consumers to voluntarily supply data, more consumers will supply data.

The notion of paying for data certainly makes logical sense, but does it work in practice? Usually not.

The first problem with paying to collect data on any scale is that it is expensive. More often than not, it’s just not an economical approach for the data publisher. And while the aggregate cost is large, the amount an individual typically receives is somewhere between small and tiny, which largely negates its motivational value.

The other issue (and I’ve seen this first-hand) is the perception of value. Offer someone $1 for their data, and they immediately assume it is worth $10. True, the data is valuable, but only once aggregated. Individual data points in fact aren’t worth very much at all. But try arguing this nuance to the marketplace. It’s hard.

I still get postal mail surveys with the famous “guilt dollar” enclosed. This is a form of paying for data, but as noted, it trades on guilt, which means undependable results. Further, these payments are made to assure an adequate aggregate response: whether or not you in particular respond to the survey really doesn’t matter. It’s a different situation for, say, a data publisher trying to collect retail store sales data. Not having data from Wal-Mart really does matter.

Outside of the research world, I just haven’t seen many successful examples of data publishers paying to collect primary source data. When a data publisher does feel a need to provide an incentive, it’s almost always in the form of some limited access to the aggregated data. That makes sense because that’s when the data becomes most valuable: once aggregated. And supplying users with a taste of your valuable data often results in them purchasing more of it from you.
