
Where the Value Is in Visual Data

The New York Times recently reported on the results of a fascinating project conducted at Stanford University. Using over 50 million images drawn from Google Street View, along with ZIP code data, the researchers were able to associate automobile ownership preferences with voting patterns. For example, the researchers found that the type of vehicle most strongly associated with Republican voting districts is the extended-cab pickup truck.

While this particular finding may not surprise you, the underlying work represents a programmatic tour de force, because artificial intelligence software was used to identify and classify the vehicles found in these 50 million images. The researchers used automotive experts to identify specific makes and models of cars in the images, giving the software a basis for training itself to find and identify vehicles on its own, regardless of the angle of the photo, shadows and a host of other factors that make this anything but an easy task.

This project is believed to represent the first time that images have been used on a large scale to develop data. And while this image identification is a technically impressive example of both artificial intelligence and Big Data, most of the really useful insights come from associating the findings with other datasets, what I like to refer to as Little Data.

Think about it. The artificial intelligence software is given an image and the ZIP code associated with that image as input. The software identifies an automobile make and model from the image, and creates an output record with two elements: the ZIP code and a normalized make and model description of the automobile. With this, you can explore auto ownership patterns by geography. But with just a few more steps, you can go a lot further.
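To make the shape of that output concrete, here is a minimal Python sketch of the two-element record and a count of ownership by geography. The VehicleSighting record, the ownership_by_zip helper, the ZIP codes and the vehicle descriptions are all hypothetical illustrations, not the study's actual schema or data.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class VehicleSighting:
    """One hypothetical output record: where the image was taken, what was seen."""
    zip_code: str
    make_model: str  # normalized make and model description


def ownership_by_zip(records):
    """Count how often each normalized make/model appears in each ZIP code."""
    counts = defaultdict(Counter)
    for record in records:
        counts[record.zip_code][record.make_model] += 1
    return counts


# Made-up records standing in for the classifier's output.
records = [
    VehicleSighting("11111", "Hybrid hatchback"),
    VehicleSighting("11111", "Compact wagon"),
    VehicleSighting("11111", "Hybrid hatchback"),
    VehicleSighting("22222", "Extended-cab pickup"),
]

by_zip = ownership_by_zip(records)
print(by_zip["11111"].most_common(1))
```

Everything after this step is a join of these records against some other dataset, which is where the Little Data comes in.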

You can use “little data” government and private datasets to link ZIP codes to voting districts and thus to voting patterns. With this information, you can determine that people living in Republican districts prefer extended-cab pickup trucks.
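A rough sketch of that join in Python, assuming a hypothetical lookup table that maps each ZIP code to the political lean of a single voting district. Real ZIP codes can straddle several districts, so an actual analysis would need a weighted crosswalk rather than this one-to-one dictionary; all names and values here are made up.

```python
from collections import Counter

# Hypothetical classifier output: (zip_code, normalized make/model).
sightings = [
    ("11111", "Extended-cab pickup"),
    ("11111", "Extended-cab pickup"),
    ("11111", "Compact sedan"),
    ("22222", "Compact sedan"),
    ("22222", "Hybrid hatchback"),
]

# Hypothetical "little data" lookup: ZIP code -> lean of its voting district.
zip_to_lean = {"11111": "Republican", "22222": "Democratic"}


def vehicles_by_lean(sightings, zip_to_lean):
    """Join each sighting to its district lean and tally vehicle types per lean."""
    tallies = {}
    for zip_code, vehicle in sightings:
        lean = zip_to_lean.get(zip_code)
        if lean is None:
            continue  # ZIP code not covered by the lookup table; skip it
        tallies.setdefault(lean, Counter())[vehicle] += 1
    return tallies


tallies = vehicles_by_lean(sightings, zip_to_lean)
for lean, counter in sorted(tallies.items()):
    print(lean, counter.most_common(1)[0][0])
```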

You can also use the ZIP code in the record to link to “little data” Census demographic data summarized at ZIP level. With this, you can correlate car ownership patterns to such things as income, race, education and ethnicity. Indeed, the study found it could predict demographics and voting patterns based on auto ownership.
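One simple way to "correlate car ownership patterns" with a ZIP-level demographic is a Pearson correlation across ZIP codes. The sketch below is an illustration only: the study's actual statistical methods are not described here, the per-ZIP figures are invented, and a correlation on its own says nothing about causation.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences of at least 2 values")
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical per-ZIP values: share of sighted vehicles that are pickups,
# and median household income from a ZIP-level Census summary.
pickup_share = {"11111": 0.40, "22222": 0.10, "33333": 0.25}
median_income = {"11111": 52000, "22222": 81000, "33333": 60000}

# Only ZIP codes present in both datasets can be compared.
zips = sorted(pickup_share.keys() & median_income.keys())
r = pearson([pickup_share[z] for z in zips], [median_income[z] for z in zips])
print(f"r = {r:.2f}")
```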

And you can go further. You can link your normalized automobile make and model data to “little data” datasets of automobile technical specifications, which is how the study determined, for example, that based on miles per gallon, Burlington, Vermont is the greenest city in the United States.
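A minimal sketch of that specifications join, assuming a hypothetical spec table keyed by the same normalized make/model strings and sightings already tagged with a city. The cities, models and MPG figures are placeholders, not the study's data.

```python
# Hypothetical specification table: normalized make/model -> combined MPG.
mpg_spec = {"Hybrid hatchback": 50, "Compact sedan": 32, "Extended-cab pickup": 19}

# Hypothetical sightings already tagged with a city.
sightings = [
    ("City A", "Hybrid hatchback"),
    ("City A", "Compact sedan"),
    ("City B", "Extended-cab pickup"),
    ("City B", "Compact sedan"),
]


def average_mpg_by_city(sightings, mpg_spec):
    """Join sightings to the spec table and average MPG per city."""
    totals = {}
    for city, vehicle in sightings:
        mpg = mpg_spec.get(vehicle)
        if mpg is None:
            continue  # no specification on file for this model
        total, count = totals.get(city, (0, 0))
        totals[city] = (total + mpg, count + 1)
    return {city: total / count for city, (total, count) in totals.items()}


averages = average_mpg_by_city(sightings, mpg_spec)
greenest = max(averages, key=averages.get)
print(greenest, averages[greenest])  # City A 41.0
```

The arithmetic mean of MPG is the simplest choice; if the goal were total fuel consumed, averaging gallons per mile (a harmonic mean of MPG) would be the more defensible measure.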

Using artificial intelligence on a Big Data image database to build a normalized text database is impressive. But all the real insights in this study could only be developed by linking Big Data to Little Data to allow for granular analysis.

While Big Data and artificial intelligence are getting all the breathless coverage, we should never forget that Little Data is what’s providing the real value behind the scenes.  

The 50% Solution

A saying attributed to the famous Philadelphia retailer John Wanamaker is that, “Half the money I spend on advertising is wasted; the trouble is I don't know which half.” Apparently, that saying can be updated for the Internet age to read, “Half the traffic to my website is non-human; the trouble is I don't know which half.”

In fact, the percentage is worse than that. According to a study by online researcher Imperva, a whopping 61.5% of traffic on the web is non-human. What do we mean by non-human? Well, it’s a category that includes search engines, software that’s scraping your website, hackers, spammers and others who are up to no good.

And yes, it gets worse. The lower the traffic to your website, the greater the percentage that is likely to be non-human. Indeed, if your site gets 1,000 or fewer visits per day, the study suggests that as much as 80% of your traffic may be non-human.

Sure, a lot of this non-human traffic is search engines (and you’d be amazed how many there still are out there), and that’s probably a good thing. After all, we want exposure. But the rest of this traffic is more dubious. About 5% of your overall site traffic is likely to be scrapers: people using software to grab all the content on your site, for purposes benign or evil. Granted, they can’t get to your password-protected content, but if you publish any amount of free data on your site in structured form, chances are that others now have that data in their databases.

Obviously, if you sell online advertising, these statistics represent an inconvenient truth. The only saving grace is that your competitors are in the same boat. But if you are a subscription site, does any of this even matter?

I think it does, because all this non-human activity distorts our web analytics as well as our overall visitor counts. Half the numbers we see are not real. These non-human visitors could lead you to believe certain pages on your site are more popular than they really are, which could lead you to base your marketing strategy on bad insights. And if you are using paid search to generate traffic, you could be getting similarly bad marketing data, and paying for the privilege as well.

Most importantly, this non-human traffic distorts reality. If you’re beating yourself up because of low response, lead generation or order rates, especially given the number of uniques and page views you appear to be getting, start by dividing by two. Do your numbers suddenly look a lot better? Bots and scrapers and search engines don’t request demos, don’t download white papers and certainly don’t buy merchandise. Keep that in mind next time you’re looking at your site analytics reports or puzzling over why some pages on your site get so much more attention than others. Remember, not all data are good data.
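The "divide by two" adjustment is simple arithmetic: divide conversions by estimated human visits rather than total visits. The sketch below uses the 50%, 61.5% and 80% non-human shares mentioned above; the visit and demo-request counts are hypothetical, and the human_adjusted_rate helper is an illustrative name.

```python
def human_adjusted_rate(conversions, visits, non_human_share):
    """Conversion rate computed against estimated human visits only."""
    if not 0 <= non_human_share < 1:
        raise ValueError("non_human_share must be in [0, 1)")
    human_visits = visits * (1 - non_human_share)
    return conversions / human_visits


# Hypothetical month: 10,000 recorded visits and 50 demo requests.
visits, demo_requests = 10_000, 50
print(f"raw rate: {demo_requests / visits:.2%}")
for share in (0.5, 0.615, 0.8):
    adjusted = human_adjusted_rate(demo_requests, visits, share)
    print(f"{share:.1%} non-human -> {adjusted:.2%}")
```

At a 50% non-human share the rate doubles; at the 80% share suggested for low-traffic sites it is five times the raw figure.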

Ad Blocking in Perspective

There has been tremendous anxiety in the media world around Apple’s move to allow ad blocking software on iPhones and iPads. After all, eliminate ads from mobile devices, and you take a big bite out of most publishers’ ad revenue. Publishers are describing this move by Apple in near-apocalyptic terms. But let’s get a grip.

First, we need to be clear that this ad blocking capability applies to the mobile web, not to apps. In that respect, this move by Apple is really just a big kick in the pants to build an app and get your audience onto it as quickly as possible.

Second, this move makes a lot more sense when you consider what’s driving it. Apple doesn’t make money from mobile search advertising; Google does. Apple has a variety of reasons to dislike Google, and this aggressive move cuts into Google’s main source of revenue. We’re all just collateral damage in this war of the titans. But this perspective also helps you understand why apps are (and will likely remain) protected from ad blocking technology. The Apple ecosystem depends on apps, and Apple makes a lot of money from apps. Apple is not really against all mobile advertising; it’s against mobile advertising that benefits Google.

Third, some of these new mobile ad blockers will reportedly strip out some content as well as advertising (not text, but elements such as bloated masthead graphics). Indeed, the new breed of ad blockers is less focused on eliminating advertising than on improving the mobile user experience by speeding up page loads as much as possible.

Fourth, once again, publishers are feeling the pain of a self-inflicted wound. By cluttering our websites (and by extension our mobile websites) with all manner of trackers, ad networks, auto-play video, re-targeting ads, overlays and, perhaps most ironic of all, ads urging users to download the publisher’s own app, we’ve junked up the mobile experience quite thoroughly. When was the last time you had a satisfactory (as in fast and easy) mobile web session?

I certainly agree that a lot of people are using ad blocking software out of a sense of entitlement – they truly believe they should have limitless access to content, free of charge and free of ads. Of course, that’s another self-inflicted wound (a topic I’ve discussed many times over the years). But the more important reason that users are flocking to ad blocking software is that it actually improves their online experiences. That’s a sad statement, but the resolution of the problem is firmly under our control.

Tapping Into Phone Data

For all marketers, B2B marketers in particular, the telephone has long been both a great friend and a big problem. Telephones are a great friend, because someone who calls you, particularly if it’s in response to your advertising, is a top quality prospect. At the same time, telephone calls resulting from ad campaigns have remained difficult to count, measure and evaluate.

And it’s not for lack of trying. I’ve been in this industry long enough to remember the glory days of “key phone” numbers. In essence, publishers would convince advertisers to use a dedicated phone number in each ad campaign as a crude way to track results. This approach worked, but because it yielded only call counts, all it could really do was prove a point for the publisher. Key phone numbers yielded very little insight into the nature and quality of these calls.

Lest you think key phones are a dated concept, it’s interesting to note that this is essentially what Google is doing with its recent launch of call tracking for AdWords. Intriguingly, Google hasn’t really advanced this technology much; it’s all about using dedicated phone numbers to count the calls generated by your AdWords campaigns.
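The key phone idea reduces to a lookup and a count, which is also why it yields so little insight. Here is a minimal sketch, assuming a hypothetical table that assigns one dedicated number per campaign and a hypothetical log of the numbers inbound calls were dialed to. This is not Google's implementation; the numbers use the fictional 555-01xx range.

```python
from collections import Counter

# Hypothetical mapping: each campaign gets its own dedicated tracking number.
tracking_numbers = {
    "+1-555-0101": "Spring print ad",
    "+1-555-0102": "Search campaign",
}

# Hypothetical call log: the number each inbound call was dialed to.
dialed_numbers = ["+1-555-0101", "+1-555-0102", "+1-555-0102", "+1-555-0199"]


def calls_per_campaign(dialed, tracking):
    """Attribute each call to a campaign by the number that was dialed."""
    counts = Counter()
    for number in dialed:
        counts[tracking.get(number, "untracked")] += 1
    return counts


print(calls_per_campaign(dialed_numbers, tracking_numbers))
```

Note what the output cannot tell you: whether any of those calls was a good prospect or turned into a sale.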

Yes, for 30 years, call tracking technology hasn’t advanced very much. At least that’s what I thought until I recently ran across a company called Convirza.

Convirza offers basic call counting. But it goes much, much further. It has developed software that analyzes every incoming call (most companies already announce that incoming calls may be recorded, putting to bed any privacy issues), actually listening to each call to provide a call quality score. It can determine the outcome of each call, presumably by listening for keywords, to calculate a call conversion rate. It can even flag calls where it feels the salesperson left money on the table by not trying to upsell or cross-sell the customer. More generally, it can provide a quantitative assessment of each salesperson’s selling skills.
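To illustrate what "listening for keywords" might look like, here is a deliberately crude Python sketch. It assumes calls have already been transcribed to text (the speech-to-text step is not shown), and the phrase lists and helper names are hypothetical. It is not Convirza's method, which is not public in this account and is presumably far more sophisticated.

```python
# Hypothetical phrase lists; a production system would use far richer models.
CONVERSION_PHRASES = ("place the order", "schedule the appointment", "credit card")
UPSELL_PHRASES = ("add on", "upgrade", "also interested")


def score_call(transcript):
    """Return (converted, missed_upsell) flags for one call transcript."""
    text = transcript.lower()
    converted = any(phrase in text for phrase in CONVERSION_PHRASES)
    offered_upsell = any(phrase in text for phrase in UPSELL_PHRASES)
    # A converted call where no upsell was raised may have left money on the table.
    missed_upsell = converted and not offered_upsell
    return converted, missed_upsell


def conversion_rate(transcripts):
    """Share of calls whose transcript contains a conversion phrase."""
    if not transcripts:
        return 0.0
    return sum(score_call(t)[0] for t in transcripts) / len(transcripts)


calls = [
    "Great, let's place the order today.",
    "Would you like to upgrade to the premium plan? Yes. Then I'll place the order.",
    "Just checking your store hours, thanks.",
]
for call in calls:
    print(score_call(call))
print(f"conversion rate: {conversion_rate(calls):.0%}")
```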

But wait, there’s more. Convirza integrates with marketing automation software, and can even be used to adjust online ad spending in real-time. If a particular program is generating a solid percentage of calls that convert, that program can be immediately scaled up.
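The "scale up what converts" rule can be sketched as a threshold check on each program's call conversion rate. This is a hypothetical illustration, not Convirza's integration or any ad platform's API; the threshold, the 20% step and the program figures are all assumed values.

```python
def adjust_daily_budgets(programs, threshold=0.25, step=1.2):
    """Scale up budgets for programs whose calls convert at or above threshold.

    programs maps a name to a dict with 'budget', 'calls' and 'converted'.
    threshold and step are hypothetical tuning values.
    """
    new_budgets = {}
    for name, p in programs.items():
        rate = p["converted"] / p["calls"] if p["calls"] else 0.0
        new_budgets[name] = p["budget"] * step if rate >= threshold else p["budget"]
    return new_budgets


programs = {
    "Search campaign A": {"budget": 100.0, "calls": 40, "converted": 14},
    "Display campaign B": {"budget": 100.0, "calls": 30, "converted": 3},
}
print(adjust_daily_budgets(programs))
```

A fixed multiplier keeps the example readable; a real system would also cap total spend and wait for enough calls before trusting a rate.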

This isn’t even everything that Convirza does, but you get the idea. By analyzing and deconstructing recorded phone conversations, Convirza is generating high-value, actionable data where none existed before. And stunningly, it’s left Google in the dust, because while Google is fine for counting calls, Convirza solves for the “last mile” problem: whether or not that call converted.

We should follow Convirza’s example and expand our thinking about how to extract data from unconventional sources to solve real-world business problems. It’s also a technology that advertising-based publishers could likely adapt to provide not only proof of performance, but a remarkable level of added value to their online advertisers.

Upping the Data Ante

Step back a bit from the fray and you’ll see an interesting evolution in the world of data: from providing lists of people or entities that might be prospects, to lists of people or entities that should be prospects, based on something they have done (think sales triggers). Now we’re beginning to move squarely into what used to be the realm of science fiction: identifying prospects before they have done anything at all.

We’re blazing new trails here, and pre-prospecting (for lack of a better name) depends heavily on lots of input data and Big Data analytics. The 800-pound gorilla in this space right now is a company called InsideSales that calls its analytical secret sauce “Neuralytics.”

All hype, you say? Well, some level of hype is a given these days, but the company has raised over $139 million to date, and one investor in particular has fallen hard for the company’s pitch, actually leading its most recent funding round, which also included Microsoft.

I don’t have any inside knowledge of what InsideSales is up to, but from the tantalizing tidbits that have surfaced in the press, it seems to be a combination of obvious inputs such as social media feeds, plus less intuitive things such as weather patterns and sports team scores. I can only guess that you’re a somewhat better prospect if it’s sunny out and your team won last night, but perhaps these data are being used in a more subtle and sophisticated way.

The other hint I picked up is that InsideSales depends on “email and phone records” to perform its analytical alchemy. Needless to say, these tend not to be public records, so to deliver the holy grail of sales prospecting, InsideSales apparently depends on the holy grail of input data as well!

I’m not dismissing InsideSales, primarily because I am doing some big league speculating here. But I will say there are data sources available today that get us a long way towards the notion of pre-prospecting. What excites me the most is what is going on today with online ad re-targeting. Ad re-targeting is based on what might be described as networked cookies. Visit a site, and a common cookie is placed on your computer. As you move to other sites that are part of the network, ads can be displayed based on sites you’ve previously visited. More importantly, your travels around the Internet can be centrally stored, creating a wealth of information about you, your interests, your habits and much more. It isn’t easy, but it is a logical next step to learn not only what interests you, but also the early signs that you are beginning to contemplate a purchase.
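A toy Python model of the networked-cookie mechanism described above: one cookie ID shared across member sites, a central visit history, ad selection from that history, and a crude purchase-intent signal. The RetargetingNetwork class, the site categories and the "three visits" rule are hypothetical simplifications, not how any particular ad network works. Note that the history is keyed only by an opaque cookie ID, never by a name.

```python
from collections import Counter, defaultdict


class RetargetingNetwork:
    """Toy model of an ad network that shares one cookie ID across member sites."""

    def __init__(self):
        self.history = defaultdict(list)  # cookie ID -> list of (site, category)

    def record_visit(self, cookie_id, site, category):
        """Centrally store a visit to a member site."""
        self.history[cookie_id].append((site, category))

    def ad_category(self, cookie_id):
        """Pick the most frequently visited category to target ads, if any."""
        visits = self.history.get(cookie_id)
        if not visits:
            return None
        return Counter(category for _, category in visits).most_common(1)[0][0]

    def shows_purchase_intent(self, cookie_id, category, min_visits=3):
        """Crude intent signal: repeated visits to sites in one category."""
        visits = self.history.get(cookie_id, [])
        return sum(1 for _, c in visits if c == category) >= min_visits


net = RetargetingNetwork()
for site in ("carreviews.example", "dealer.example", "autoloans.example"):
    net.record_visit("cookie-123", site, "autos")
net.record_visit("cookie-123", "news.example", "news")
print(net.ad_category("cookie-123"), net.shows_purchase_intent("cookie-123", "autos"))
```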

Privacy isn’t the issue in re-targeting (at least for now), because nobody needs to know who you are for re-targeting to work. But as your movements around the Internet are recorded and analyzed, it is entirely possible that we’ll someday know when you’re thinking about buying something, and perhaps even a little before.

The next generation of sales insights likely isn’t all that far away, so now is a good time to do some pre-pondering on what it might mean to you and your business.