Use Your Computer Vision

Those familiar with the powerhouse real estate listing site Zillow will likely recall that it burst on the scene in 2006 with an irresistible new offering: a free online estimate of the value of every house in the United States. Zillow calls them Zestimates. The site crashed repeatedly from too much traffic when it first launched, and Zillow now gets a stunning 195 million unique visitors monthly, all with virtually no advertising. Credit the Zestimates for this.

As you would expect, Zestimates are derived algorithmically, using a combination of public record and recent sales data. The algorithm selects recent sales of comparable nearby houses to compute an estimated value.

As you would also expect, professional appraisers hate Zestimates. They believe their valuations are more accurate because they hand-select the comparable nearby homes. However, with the goal of consistent appraisals, the hand-selection process that appraisers use is so prescribed and formulaic that it operates much like an algorithm does. At this level, you could argue that appraisers have little advantage over the computed Zestimate.

However, one area in which appraisers have a distinct advantage is that they are able to assess the condition and interiors of the properties they are appraising. They visually inspect the home and can use interior photos of comparable homes that have recently sold to refine their estimates.

Not to be outdone, Zillow is employing artificial intelligence to create what it calls “computer vision.” Using interior and exterior photos of millions of recently sold homes, Zillow now assesses such things as curb appeal, construction quality and even landscape; quantifies what it finds; and factors that information into its valuation algorithm. When it has interior photos of a house, it scans for such things as granite countertops, upgraded bathrooms and even how much natural light the house enjoys, and incorporates this information into its algorithm as well.

With this advance, the appraisers’ remaining competitive advantage looks very much like owning “the last mile,” because they are the feet on the street that actually visit the house being appraised. But you can see where things are heading: as companies like Zillow refine their technology, the day may well come when an appraisal is performed by the homeowner uploading interior pictures of her house, and perhaps confirming public record data, such as the number of rooms in the house.

There are many market verticals where automated inspection and interpretation of visual data can be used. While the technology is in its infancy, its power is undeniable, so it’s not too early to think about possible ways it might enhance your data products.

Where the Value Is in Visual Data

The New York Times recently reported on the results of a fascinating project conducted at Stanford University. Using over 50 million images drawn from Google Street View, along with ZIP code data, the researchers were able to associate automobile ownership preferences with voting patterns. For example, the researchers found that the type of vehicles most strongly associated with Republican voting districts are extended-cab pickup trucks.

While this particular finding may not surprise you, the underlying work represents a programmatic tour de force, because artificial intelligence software was used to identify and classify the vehicles found in these 50 million images. The researchers used automotive experts to identify specific makes and models of cars from the images, giving the software a basis for training itself to find and identify vehicles all by itself, regardless of the angle of the photo, shadows and a host of other factors that make this anything but an easy task.

This project is believed to represent the first time that images have been used on a large scale to develop data. And while this image identification is a technically impressive example of both artificial intelligence and Big Data, most of the really useful insights come from associating the findings with other datasets, what I like to refer to as Little Data.

Think about it. The artificial intelligence software is given as input an image, and the ZIP code associated with that image. The software identifies an automobile make and model from the image, and creates an output record with two elements: the ZIP code and a normalized make and model description of the automobile. With this, you can explore auto ownership patterns by geography. But with just a few more steps, you can go a lot further.

You can use “little data” government and private datasets to link ZIP code to voting districts and thus voting patterns. With this information, you can determine that people living in Republican districts prefer extended-cab pickup trucks.

You can also use the ZIP code in the record to link to “little data” Census demographic data summarized at ZIP level. With this, you can correlate car ownership patterns to such things as income, race, education and ethnicity. Indeed, the study found it could predict demographics and voting patterns based on auto ownership.

And you can go further. You can link your normalized automobile make and model data to “little data” datasets of automobile technical specifications, which is how the study determined, for example, that based on miles per gallon, Burlington, Vermont is the greenest city in the United States.
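To make the linking steps above concrete, here is a minimal sketch in Python. It is not the Stanford researchers' code. The ZIP codes, vehicle labels, voting leans and MPG figures are all invented for illustration, and the function names are hypothetical. It only shows how a two-field record (ZIP code, make and model) can be joined to small lookup tables on either field.

```python
from collections import Counter, defaultdict

# Hypothetical classifier output: (ZIP code, normalized make/model).
# Every value here is made up for illustration; none comes from the study.
vehicle_records = [
    ("00001", "pickup_extended_cab"),
    ("00001", "pickup_extended_cab"),
    ("00001", "sedan"),
    ("00002", "sedan"),
    ("00002", "hybrid_hatchback"),
    ("00002", "sedan"),
]

# Hypothetical "little data" lookup tables, keyed by ZIP or by make/model.
zip_to_lean = {"00001": "R", "00002": "D"}
model_to_mpg = {"pickup_extended_cab": 18, "sedan": 30, "hybrid_hatchback": 50}


def top_vehicle_by_lean(records, lean_lookup):
    """Join records to voting lean via ZIP, then pick the most common vehicle per lean."""
    counts = defaultdict(Counter)
    for zip_code, model in records:
        lean = lean_lookup.get(zip_code)
        if lean is not None:  # skip ZIPs that cannot be linked
            counts[lean][model] += 1
    return {lean: c.most_common(1)[0][0] for lean, c in counts.items()}


def average_mpg_by_zip(records, mpg_lookup):
    """Join records to vehicle specs via make/model, then average MPG per ZIP."""
    totals = defaultdict(lambda: [0, 0])  # zip -> [sum of mpg, vehicle count]
    for zip_code, model in records:
        if model in mpg_lookup:
            totals[zip_code][0] += mpg_lookup[model]
            totals[zip_code][1] += 1
    return {z: total / n for z, (total, n) in totals.items()}


print(top_vehicle_by_lean(vehicle_records, zip_to_lean))
print(average_mpg_by_zip(vehicle_records, model_to_mpg))
```

The image recognition step produces only the first list. Every insight after that comes from a simple lookup against a much smaller dataset.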

Using artificial intelligence on a Big Data image database to build a normalized text database is impressive. But all the real insights in this study could only be developed by linking Big Data to Little Data to allow for granular analysis.

While Big Data and artificial intelligence are getting all the breathless coverage, we should never forget that Little Data is what’s providing the real value behind the scenes.  

The 50% Solution

A saying attributed to the famous Philadelphia retailer John Wanamaker is that, “Half the money I spend on advertising is wasted; the trouble is I don't know which half.” Apparently, that saying can be updated for the Internet age to read, “Half the traffic to my website is non-human; the trouble is I don't know which half.”

In fact, the percentage is worse than that. According to a study by online researcher Imperva, a whopping 61.5% of traffic on the web is non-human. What do we mean by non-human? Well, it’s a category that includes search engines, software that’s scraping your website, hackers, spammers and others who are up to no good.

And yes, it gets worse. The lower the traffic to your website, the greater the percentage that is likely to be non-human. Indeed, if your site gets 1,000 or fewer visits per day, the study suggests that as much as 80% of your traffic may be non-human.

Sure, a lot of this non-human traffic is search engines (and you’d be amazed how many there still are out there), and that’s probably a good thing. After all, we want exposure. But the rest of this traffic is more dubious. About 5% of your overall site traffic is likely to be scrapers: people using software to grab all the content on your site, for purposes benign or evil. Sure, they can’t get to your password-protected content, but if you publish any amount of free data on your site in structured form, chances are that others now have that data in their databases.

Obviously, if you sell online advertising, these statistics represent an inconvenient truth. The only saving grace is that your competitors are in the same boat. But if you are a subscription site, does any of this even matter?

I think it does, because all this non-human activity distorts all of our web analytics in addition to our overall visitor counts. Half the numbers we see are not real. These non-human visitors could lead you to believe certain pages are more popular on your site than they really are; this could cause you to use bad insights to fashion your marketing strategy. And if you are using paid search to generate traffic, you could be getting similarly bad marketing data, and paying for the privilege as well.

Most importantly, this non-human traffic distorts reality. If you’re beating yourself up because of low response, lead generation or order rates, especially given the number of uniques and page views you appear to be getting, start by dividing by two. Do your numbers suddenly look a lot better? Bots and scrapers and search engines don’t request demos, don’t download white papers and certainly don’t buy merchandise. Keep that in mind next time you’re looking at your site analytics reports or puzzling over why some pages on your site get so much more attention than others. Remember, not all data are good data.
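The “divide by two” rule of thumb is simple arithmetic, and a short Python sketch shows its effect. The function name and the sample figures (50 orders on 10,000 reported visits) are invented for illustration. The 50% default follows the rule of thumb above, and the 61.5% figure is the Imperva number cited earlier. Your own site's non-human share is an assumption you would need to estimate.

```python
def human_adjusted_rate(conversions, reported_visits, non_human_share=0.5):
    """Conversion rate after discounting an assumed share of non-human visits."""
    if not 0 <= non_human_share < 1:
        raise ValueError("non_human_share must be at least 0 and less than 1")
    if reported_visits <= 0:
        raise ValueError("reported_visits must be positive")
    human_visits = reported_visits * (1 - non_human_share)
    return conversions / human_visits


# Hypothetical month: 50 orders on 10,000 reported visits.
raw_rate = 50 / 10_000
print(f"raw: {raw_rate:.2%}")
print(f"assuming 50% non-human: {human_adjusted_rate(50, 10_000):.2%}")
print(f"assuming 61.5% non-human: {human_adjusted_rate(50, 10_000, 0.615):.2%}")
```

With these made-up figures, the rate doubles under the 50% assumption, and rises further if the non-human share is closer to the study's overall figure.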

Ad Blocking in Perspective

There has been tremendous anxiety in the media world around Apple’s move to allow ad blocking software on iPhones and iPads. After all, eliminate ads from mobile devices, and you take a big bite out of most publishers’ ad revenue. Publishers are describing this move by Apple in near-apocalyptic terms. But let’s get a grip.

First, we need to be clear that this ad blocking capability applies to the mobile web, not to apps. In that respect, this move by Apple is really just a big kick in the pants to build an app and get your audience onto it as quickly as possible.

Second, this move makes a lot more sense when you consider what’s driving it. Apple doesn’t make money from mobile search advertising; Google does. Apple doesn’t like Google for a variety of reasons, hence this aggressive move cuts into Google’s main source of revenue. We’re all just collateral damage in this war of the titans. But this perspective also helps you understand why apps are (and will likely remain) protected from ad blocking technology. The Apple ecosystem depends on apps, and Apple makes a lot of money from apps. Apple is not really against all mobile advertising; it’s against mobile advertising that benefits Google.

Third, some of these new mobile ad blockers will reportedly strip out some content as well as advertising (not text, but some things such as bloated masthead graphics). Indeed, the new breed of ad blockers is really less focused on eliminating advertising than on improving the mobile user experience by speeding up page loads as much as possible.

Fourth, once again, publishers are feeling the pain of a self-inflicted wound. By junking up our websites (and by extension our mobile websites) with all manner of trackers, ad networks, auto-play video, re-targeting ads, overlays, and perhaps most ironic of all, ads to get the user to download the publisher’s own app, we’ve junked up the mobile experience quite thoroughly. When was the last time you recall having a satisfactory (as in fast and easy) mobile web session?

I certainly agree that a lot of people are using ad blocking software out of a sense of entitlement – they truly believe they should have limitless access to content without fee and ad-free. Of course that’s another self-inflicted wound (a topic I’ve discussed many times over the years). But the more important reason that users are flocking to ad blocking software is that it actually improves their online experiences. That’s a sad statement, but the resolution of the problem is firmly under our control.

Tapping Into Phone Data

For all marketers, B2B marketers in particular, the telephone has long been both a great friend and a big problem. Telephones are a great friend, because someone who calls you, particularly if it’s in response to your advertising, is a top quality prospect. At the same time, telephone calls resulting from ad campaigns have remained difficult to count, measure and evaluate.

And it’s not for lack of trying. I go back in this industry long enough to remember the glory days of “key phone” numbers. In essence, publishers would convince advertisers to use a dedicated phone number in each ad campaign as a crude way to track results. This approach worked, but because it really only yielded call counts, all it could do was prove a point for the publisher. Key phone numbers yielded very little insight into the nature and quality of these calls.

Lest you think key phones are a dated concept, it’s interesting to note that this is essentially what Google is doing with its recent launch of call tracking for AdWords. Intriguingly, Google hasn’t really advanced this technology much: it’s all about using dedicated phone numbers to count the calls generated by your AdWords campaign.

Yes, for 30 years, call tracking technology hasn’t advanced very much. At least that’s what I thought until I recently ran across a company called Convirza.

Convirza offers basic call counting. But it goes much, much further. It has developed software that analyzes every incoming call (most companies already announce that incoming calls may be recorded, putting to bed any privacy issues), actually listening to each call to provide a call quality score. It can measure the outcome of the call, presumably by listening for keywords, to measure call conversion rate. It can even flag calls where it feels the salesperson left money on the table by not trying to upsell or cross-sell the customer. More generally, it can provide a quantitative assessment of the quality of each salesperson’s selling skills.

But wait, there’s more. Convirza integrates with marketing automation software, and can even be used to adjust online ad spending in real-time. If a particular program is generating a solid percentage of calls that convert, that program can be immediately scaled up.
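As a rough illustration of the idea of tying ad spend to call conversions, here is a minimal Python sketch. It is not Convirza's software or API. The function name, thresholds, campaign names and figures are all hypothetical. It simply boosts the budget of any program whose calls convert at or above a chosen rate, and leaves the others unchanged.

```python
def scaled_budgets(campaigns, min_rate=0.25, min_calls=20, boost=1.5):
    """Return new budgets, boosting programs whose calls convert well.

    campaigns maps a program name to a dict with "budget", "calls" and
    "converted" keys. All thresholds are illustrative assumptions.
    """
    new_budgets = {}
    for name, stats in campaigns.items():
        calls = stats["calls"]
        rate = stats["converted"] / calls if calls else 0.0
        # Require a minimum call volume so a handful of calls cannot trigger a boost.
        if calls >= min_calls and rate >= min_rate:
            new_budgets[name] = stats["budget"] * boost
        else:
            new_budgets[name] = stats["budget"]
    return new_budgets


# Hypothetical programs with made-up numbers.
campaigns = {
    "search": {"budget": 1000, "calls": 40, "converted": 14},
    "display": {"budget": 1000, "calls": 40, "converted": 4},
    "social": {"budget": 500, "calls": 5, "converted": 3},
}
print(scaled_budgets(campaigns))
```

The minimum call count is a design choice in this sketch: without it, a program with three conversions out of five calls would look like a star performer on very thin evidence.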

This isn’t even everything that Convirza does, but you get the idea. By analyzing and deconstructing recorded phone conversations, Convirza is generating high-value, actionable data where none existed before. And stunningly, it’s left Google in the dust, because while Google is fine for counting calls, Convirza solves for the “last mile” problem: whether or not that call converted.

We should follow Convirza’s example and expand our thinking about how to extract data from unconventional sources to solve real-world business problems. It’s also a technology that advertising-based publishers could likely adapt to provide not only proof of performance, but a remarkable level of added value to their online advertisers.