Wednesday, January 02, 2008

Upcoming Yahoo talk on computational advertising

Andrei Broder from Yahoo Research will be giving a talk on "Computational Advertising" next week (Thursday, Jan 10) at the University of Washington. Video of the talk will be available live and archived a few days afterward.

The talk looks like a great one. Excerpts from the description:
Computational advertising ... [attempts to] find the "best match" between a given user in a given context and a suitable advertisement.

The information about the user can vary from scarily detailed to practically nil. The number of potential advertisements might be in the billions. Thus, depending on the definition of "best match" this challenge leads to a variety of massive optimization and search problems, with complicated constraints.

This talk will give an introduction to this area and give a "taste" of some recent results.
From the way Andrei is framing the problem -- matching advertisements not only to content, but also to what we know about each user -- Andrei clearly is talking about personalized advertising.

Personalized advertising is a tremendous computational challenge. Traditional contextual advertising matches ads to static content. We only have to do the match infrequently, then we can show a selection of the ads that we think will work well for a given piece of content to everyone who views that content.

With personalized advertising, we match ads to content and each user's interests, and then show different ads for each user. As with all personalization, caching no longer works, because each user sees a different page. Targeting ads now means we have to find matches in real-time for each page view and each user.
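To make the caching argument concrete, here is a minimal sketch in Python. Everything in it is hypothetical and for illustration only: `CountingCache`, `match_ads`, and `simulate` are invented names, and `match_ads` is a stand-in for whatever expensive matching a real ad system performs. It compares a cache keyed by page alone with a cache keyed by (page, user) and counts how often the expensive match has to run.

```python
from typing import Callable, Dict, Hashable, Iterable, List, Tuple


class CountingCache:
    """Memoizes a matching function and counts how often it actually runs."""

    def __init__(self, match_fn: Callable[[Hashable], List[str]]) -> None:
        self.match_fn = match_fn
        self.store: Dict[Hashable, List[str]] = {}
        self.misses = 0

    def get(self, key: Hashable) -> List[str]:
        if key not in self.store:
            # Cache miss: the expensive ad match has to be computed.
            self.misses += 1
            self.store[key] = self.match_fn(key)
        return self.store[key]


def match_ads(key: Hashable) -> List[str]:
    # Hypothetical placeholder for an expensive ad-matching computation.
    return [f"ad-for-{key}"]


def simulate(page_views: Iterable[Tuple[str, str]]) -> Tuple[int, int]:
    """Return (contextual misses, personalized misses) for (page, user) views."""
    contextual = CountingCache(match_ads)    # keyed by page only
    personalized = CountingCache(match_ads)  # keyed by (page, user)
    for page, user in page_views:
        contextual.get(page)
        personalized.get((page, user))
    return contextual.misses, personalized.misses


# 1000 different users viewing the same article.
views = [("news/libby", f"user{i}") for i in range(1000)]
contextual_misses, personalized_misses = simulate(views)
print(contextual_misses, personalized_misses)
```

In this toy setup the contextual cache computes the match once for the article, while the personalized cache computes it once per user. A real system would also expire entries over time, which this sketch ignores.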

On a related note, Yahoo Researcher Omid Madani and ex-Yahoo Researcher Dennis DeCoste had an interesting short paper back in 2005, "Contextual Recommender Problems" (PDF), that has some more thoughts on this problem. As I wrote in an older post, Omid and Dennis treated personalized advertising as a recommendations problem and proposed a few methods of attacking the problem.

By the way, this talk seems to be a shift for Andrei away from a more general problem of personalized information -- which he called "information supply" -- toward focusing on the more specific task of personalized advertising.

See also the Computational Advertising page at Yahoo Research and its list of team members and papers.

Update: It appears this talk will not be archived. It will be broadcast live. If you want to see it remotely, you will have to watch it at the time of the talk.

Update: It was an interesting talk, but, frankly, a bit disappointing in its lack of depth.

Andrei spent most of the talk describing the state of online advertising today, including market size and how targeted advertising works. He touched on some of the more interesting and harder problems, but only touched on them, and only very briefly.

For example, on one slide, Andrei criticized Google AdSense for showing ads for Libby shoes on an article about Dick Cheney and Scooter Libby, saying that the match is spurious. But Andrei did not say what would be a better ad to show for that news article. In response to my question later, Andrei did say that perhaps no ad is appropriate in that case, but he did not expand on this to talk about how to detect, in general, when it might be undesirable to show ads because of lack of value and commercial intent. When I asked a follow-up question after the talk, he expanded briefly into ideas around personalized advertising -- showing ads that might interest this user based on this person's history rather than ads targeting the current content -- and an advertising engine that explores and attempts to learn what ads might be effective, but not in any depth.

For another example, on one slide, Andrei drew a parallel between web search and advertising search, arguing that both can be seen as searches for information, but pointed out that advertisements are a smaller database of smaller documents and that the relevance rank of a search for ads depends on the bids. He did not discuss the issue that web search in some ways is an easier problem, though, in that the results are more easily cached. Web search relevance rank is static over substantial periods of time, but the relevance rank of ads is not, because it depends not only on keyword matching, but also on bids, competing bids, budgets, and clickthrough rates, all of which can vary rapidly.
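As a rough illustration of why a cached ad ranking goes stale, here is a small Python sketch. It assumes a common textbook simplification, ranking eligible ads by bid times estimated clickthrough rate and dropping ads whose remaining budget cannot cover a click. This is not something Andrei presented, nor a description of how any particular search engine ranks ads, and the `Ad` class, `rank_ads` function, and all the numbers are made up for the example.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Ad:
    name: str
    bid: float     # what the advertiser pays per click (hypothetical)
    ctr: float     # estimated clickthrough rate (hypothetical)
    budget: float  # remaining budget (hypothetical)


def rank_ads(ads: List[Ad]) -> List[Ad]:
    """Rank eligible ads by expected revenue per impression (bid * ctr)."""
    # An ad whose remaining budget cannot cover one click is not eligible.
    eligible = [ad for ad in ads if ad.budget >= ad.bid]
    return sorted(eligible, key=lambda ad: ad.bid * ad.ctr, reverse=True)


ads = [
    Ad("a", bid=1.00, ctr=0.02, budget=50.0),
    Ad("b", bid=0.50, ctr=0.05, budget=50.0),
    Ad("c", bid=2.00, ctr=0.01, budget=1.0),  # budget too low for its bid
]

before = [ad.name for ad in rank_ads(ads)]

# One advertiser raises a bid; any ranking cached a moment ago is now wrong.
ads[0].bid = 1.50
after = [ad.name for ad in rank_ads(ads)]

print(before, after)
```

Even in this toy version, a single bid change reorders the results for the same query, which is the sense in which ad rankings are harder to cache than web search results.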

For a third example, Andrei briefly mentioned using a user profile for personalized advertising, but only scratched the surface of what that profile should contain, how it should be used, where it should be stored, in what cases personalized advertising is likely to outperform unpersonalized advertising, and how trying to show different ads to different people massively increases the computation necessary for ad targeting.

The details that did come on the hard problems mostly were in the form of references to other papers. When talking about how to approximately match keywords picked for ads to the keywords for content, Andrei mentioned Ribeiro-Neto et al., "Impedance coupling in content-targeted advertising", and Yih et al., "Finding Advertising Keywords on Web Pages" (PDF), the latter of which is excellent, by the way. When talking about determining intent, Andrei referred to two of his own recent papers, "A semantic approach to contextual advertising" and "Robust classification of rare queries using web knowledge".

Overall, there was an unfortunate lack of detail on how to solve the "best match" massive optimization challenge of online advertising under all its complicated constraints. Andrei did hint at one point that all the search giants are reluctant to talk about these details, but it is too bad that we were not able to explore the fun issues in more depth.

Update: Andrei gave a version of his talk at WWW 2008. During the question and answer time, I asked him to expand on what is the "best match" for an advertisement given a user and a context.

He described three major categories of utility: advertiser utility, user utility, and publisher utility. In response to further questions, he suggested that advertiser utility is complicated by the fact that ad agencies may have different incentives than the advertiser and by branding effects. He also pointed out that publisher utility is not as simple as just revenue because of publisher branding issues (e.g. the New York Times will not accept ads for pornography).

As for user utility, he suggested that it is complicated and difficult to measure, but that clickthrough rate may be one proxy for it.

Andrei did not expand on how these utility functions could or should be combined or how to deal with conflicts between them.


Anonymous said...

Greg, so you are the one who asked which ad should be served for the Libby article. I think Andrei said a politics themed ad should be served. I do agree with you that the talk was more about 'advertising' than 'computational', and indeed it was possibly just a recruiting talk for Yahoo.

Good luck with your new adventure in Microsoft.

Greg Linden said...

Thanks, Anonymous. Andrei did at one point suggest showing political ads for that article, which, more generally, might have been hinting toward the approach he mentioned later from one of his papers of targeting the general category of a document rather than keywords.

Nevertheless, there was little discussion of whether that was preferable to not showing ads at all -- another option he suggested -- or how to determine if either of these two strategies is optimal when put against other options.

Anonymous said...

I'm not sure I agree with you that personalized advertising represents that much greater of a computational challenge.

All ad servers today can't cache too much as it is, due to all the rules/reqs placed on the queue for ad selection (schedules, volume ceilings, freq caps, etc.). Let's say your cache can't be more than 2 minutes. When targeting people instead of content, seems you could cache personal profiles for 2+ minutes as well. To the extent you target them in real-time, then aren't your ads apt to be about the same as targeting the content/context?

Furthermore, on a macro level, there are a lot more pages on the Web than people!

Jordan Mitchell