Tuesday, June 22, 2010

Google to personalize metashopping

In an interview with CNet, Googler Sameer Samat talks about Google's future plans for shopping search, including personalization and recommendations.

An excerpt:
One thing Google doesn't do very well is provide the shopping-as-adventure experience ... You might go to the mall with a specific product in mind, but a well-designed mall ... forces you to discover -- and hopefully purchase -- other products that you might not have even known you wanted: the marketing types like to call this "serendipity." Google wants to be known as a destination for that kind of experience, said Sameer Samat, director of product management.

After years of trying and failing to reach that goal, Google plans to give it another go over the coming months. Don't expect Google to turn into a full-blown online retailer among the likes of Amazon.com or Buy.com just yet. But the combination of personalized features for product search pages and what Samat thinks is "the largest database of products that has been created" could entice people to actually shop on Google.

Google's current approach works best for those who are on a mission when they shop, shoppers who already know what they want and are just looking for additional information before sealing the deal ... [But] there are millions of other people who treat shopping as leisure, rather than a simple transaction. These are people who ... prefer browsing to targeted shopping, knowing that every now and then they'll discover something totally unique or completely unexpected.

Google wants to serve more of those people ... [by making] recommendations based on that list of products and lists submitted by others to help you discover new products: sort of like Amazon's recommendations page meets Pandora's radio stations meets Google.

"Shopping is not just about search, it's not just about intent, it's about discovery," Samat said. "If we can do it, and do it well, we will have built something that's really amazing; it should be the most comprehensive experience for shopping you could ever find."
On a related note, Google is pushing aggressively to get retailers to use Google's commerce search engine to run their search experience. Each deal Google signs gives it more detailed information about another retailing vertical. It's all about the data.

Monday, June 14, 2010

Google on presentation bias in search

A few Googlers at WWW 2010 had a paper, "Beyond Position Bias: Examining Result Attractiveness as a Source of Presentation Bias in Clickthrough Data" (PDF), that explores how much people tend to click on eye-catching search results rather than seeking the most relevant search results.

The work itself was pretty simple -- just looking at how bolding title and abstract terms changes clickthrough rates in A/B tests -- but I think the paper is worth a peek for two reasons. First, it is a decent survey of some of the current work on position and presentation bias. Second, it exposes some of Google's struggles with the difficulty of deriving searcher satisfaction from the noisy proxies that we have available like click data.
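To make the A/B comparison concrete, here is a minimal sketch of how one might check whether a clickthrough difference between two presentation variants is more than noise. This is a plain two-proportion z-test, not the method from the paper, and the function name and the counts are made up for illustration.

```python
import math

def ctr_lift_z(clicks_a, views_a, clicks_b, views_b):
    """Two-proportion z-statistic for the CTR of arm B minus the CTR of arm A.

    Hypothetical helper: assumes each view is an independent trial and that
    the pooled CTR is strictly between 0 and 1.
    """
    p_a = clicks_a / views_a
    p_b = clicks_b / views_b
    # Pooled CTR under the null hypothesis that both arms click at the same rate.
    pooled = (clicks_a + clicks_b) / (views_a + views_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / views_a + 1 / views_b))
    return (p_b - p_a) / se

# Made-up counts: arm A shows plain results, arm B bolds more terms.
z = ctr_lift_z(clicks_a=1000, views_a=10000, clicks_b=1100, views_b=10000)
```

A |z| above roughly 1.96 is the usual 5% two-sided threshold. Even then, the test only says the variants differ in clicks, not that searchers were more satisfied, which is exactly the gap the paper is worried about.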

By the way, I love the fact, noted in the paper, that people tend to click on the last result much more than you would expect. The reason is that people don't linearly scan down a page, but often jump to the bottom and focus attention there. A decade ago at Amazon, the personalization team exploited this effect and seized the space at the bottom of most pages on the site for our features. You see, when we saw no one had built tools to track click and conversion data, we built them, and then we used them. No one else realized the value of the space at the bottom of the page, but we did.

For more on the struggle to evaluate search results from noisy click data, please see some of my older posts, "Modeling how searchers look at search results", "Finding task boundaries in search logs", and "Testing rankers by interleaving search results".

Tuesday, June 08, 2010

Travel itineraries from Flickr photo trails

Every once in a while, you hit a paper that seems like a startup waiting to happen. A paper that will be presented next week at HT 2010 by Yahoo Research is one of these.

The paper, "Automatic Construction of Travel Itineraries using Social Breadcrumbs" (PDF), cleverly uses the data often embedded in Flickr photos (e.g. timestamp, tags, sometimes GPS) to produce trails of where people have been in their travels. Then, they combine all those past trails to generate high-quality itineraries for future tourists that tell them what to see, where to go, how long to expect to spend at each sight, and how long to allow for travel times between the sights.

Some excerpts from the paper:
Shared photos can be seen as billions of geo-temporal breadcrumbs that can promisingly serve as a latent source reflecting the trips of millions of users ... [We] automatically construct travel itineraries at large scale from those breadcrumbs.

By analyzing these breadcrumbs associated with a person's photo stream, one can deduce the cities visited by a person, which Points of Interest (POI) that the person took photos at, how long that person spent at each POI, and what the transit time was between POIs visited in succession.

By aggregating such timed paths of many users, one can construct itineraries that reflect the "wisdom" of touring crowds. Each such itinerary is composed of a sequence of POIs, with recommended visit times and approximate transit times between them.

[In surveys] users perceive our automatically generated itineraries to be as good as (or even slightly better than) itineraries provided by professional tour companies.
This reminds me quite a bit of the work on using GPS trails from mobile devices like phones (e.g. [1] or [2]) or search histories on maps (e.g. [3]). But, the use of Flickr photos as the data source is clever, especially for this application where the photos are also useful in the final output and the gaps in the data stream are not important.

Fun idea, nicely implemented, and very convincing results. Definitely worth a read. Don't miss the thoughts at the end on expansions to the idea, such as changing how the trails are filtered and aggregated based on individual preferences to generate personalized itineraries.
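To give a feel for the aggregation step, here is a rough sketch of turning per-user photo streams into typical visit and transit times. It is not the paper's algorithm: it assumes each photo has already been mapped to a POI, uses timestamps in minutes, takes a simple median across users, and all the names and data are hypothetical.

```python
from collections import defaultdict
from itertools import groupby
from statistics import median

def visits_for_user(photos):
    """photos: list of (timestamp_minutes, poi) for one user.

    Returns [(poi, first_photo_time, last_photo_time), ...] in time order,
    treating a run of consecutive photos at the same POI as one visit.
    """
    ordered = sorted(photos)
    visits = []
    for poi, group in groupby(ordered, key=lambda p: p[1]):
        times = [t for t, _ in group]
        visits.append((poi, times[0], times[-1]))
    return visits

def aggregate(photo_streams):
    """Median stay per POI and median transit time per (from, to) POI pair."""
    stay_times = defaultdict(list)
    transit_times = defaultdict(list)
    for photos in photo_streams.values():
        visits = visits_for_user(photos)
        for poi, start, end in visits:
            stay_times[poi].append(end - start)
        # Transit is the gap between the last photo at one POI and the first at the next.
        for (a, _, a_end), (b, b_start, _) in zip(visits, visits[1:]):
            transit_times[(a, b)].append(b_start - a_end)
    stay = {poi: median(v) for poi, v in stay_times.items()}
    transit = {pair: median(v) for pair, v in transit_times.items()}
    return stay, transit

# Made-up photo streams for two users.
streams = {
    "u1": [(0, "museum"), (40, "museum"), (70, "tower"), (100, "tower")],
    "u2": [(10, "museum"), (70, "museum"), (90, "tower"), (110, "tower")],
}
stay, transit = aggregate(streams)
```

A real system would still have to choose and order the POIs to fit a time budget, and the first and last photos only give a lower bound on how long someone actually stayed.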

Monday, June 07, 2010

A Findory buyout offer from Yahoo?

I just received this check from Yahoo payable to Findory.com:


I've never seen a check for $0.00 before. Is that a buyout bid, Yahoo? Your offer for Findory?

In all seriousness, I'm sure this is just a mistake over in Yahoo-land, perhaps leftovers from the experiments Findory did with layering its personalized advertising engine over ads pulled from the Yahoo Publisher Network.

Still, years after Findory ended, years after Findory talked to Yahoo, and seeing how Yahoo now struggles to personalize content, I can't help but laugh at this check for $0.00 from Yahoo to Findory.

Tuesday, June 01, 2010

How Bing predicts the CTR of ads

An upcoming ICML 2010 paper, "Web-Scale Bayesian Click-Through Rate Prediction for Sponsored Search Advertising in Microsoft’s Bing Search Engine", describes the algorithm actually used in the Bing search engine to predict the clickthrough rates of ads.

From the paper:
Recognising the importance of CTR estimation for online advertising ... Bing/adCenter decided to run a competition to entice people across the company to develop the most accurate and scalable CTR predictor.

The algorithm described in this publication tied for first place in the first competition and won the subsequent competition based on prediction accuracy. As a consequence, it was chosen to replace Bing's previous CTR prediction algorithm, a transition that was completed in the summer of 2009.
The paper goes on to describe why the problem is important, the algorithm used, and some of the nastiness of getting something that works in the lab to run on the live site.

Don't miss the tidbit at the end where they say that they are "investigating the use of more powerful models, such as the feature-based collaborative filtering method Matchbox (Stern, Herbrich, & Graepel, 2009) for latent feature discovery and personalisation."