In summary, the researchers describe a system used in production at Yahoo that does daily builds of large user profiles. Each profile contains tens of thousands of features that summarize the interests of each user from the web pages they have viewed, searches they made, and ads they have viewed, clicked on, and converted (bought something) on. They explain how important it is to use conversions, not just ad clicks, to train the system. They measure the importance of using recent history (what you did in the last couple days), of using fine-grained data (detailed categories and even some specific pages and queries), of using large profiles, and of including data about ad views (which is a huge and low quality data source since there are multiple ad views per page view), and find all those significantly help performance.
Some excerpts from the paper:
We present the experiences from building a web-scale user modeling platform for optimizing display advertising targeting at Yahoo .... Our work ... [looks] into understanding the effect of different user activities on prediction, [gives] insights about the temporal aspect of user behavior (recency vs. long-term trends), and [explores] different variants (user representation and target label) through large offline and online experiments .... We deployed our platform to production and achieved a [large] boost in online metrics, such as eCPA, compared to the old system.Very interesting. A couple things I am left wondering:
Our objective is to refine the targeting constraints using the past behavior of the users ... [so] we can improve the number of conversions per ad impression without greatly increasing the number of impressions.
User profiles are aggregated logs from different systems/products (e.g. user logs of Yahoo News, Yahoo Finance, etc.) .... We consider several different events ... [including] pages visited .. the category of the page ... searches issued, clicks on search links, clicks on search advertising links ... [and] the category of the search query ... [and] views and clicks on ads ... [and] the ad category ... from an existing hierarchical ad categorizer.
Our results show [a] large performance loss incurred in favoring long-term history over short-term history. This is obvious as the recent history clearly communicates with a high probability the current interest of the user ... Although recent history is more important than older history, we still need to include older history to get the most complete idea about the user.
Results show ... many of our raw features are completely non-discriminative. However, a small percentage of these features are actually important ... [For example, just] ... dropping all raw ad views ... [or if] we drop all raw features and only keep categorical features ... [causes] dropping [of] the weighted AUC measure by 3.69% and 4.26%, respectively ... In production ... we apply a coarse feature selection through mutual information, then we apply a rigorous feature selection through l1 regularization.
First, they found recent history is very effective, yet only update the profiles daily. Wouldn't their results on the value of recent behavior (which others found too) suggest that there would be benefit from hourly or, even better, real-time updates of the profiles (perhaps with a second memory-based, unreliable, and partial coverage system supplementing the data in the more complete and more accurate older profiles)? That would allow the system to adapt immediately when someone, for example, starts looking at information for a vacation to Hawaii and show relevant offers immediately instead of only being able to do it the next day when it is usually too late. Unfortunately, I suspect we're not going to see really big gains in relevance and usefulness of ads without real-time updates to profiles of fine-grained interests; results that show that data only 24 hours old is better than data a week old may only be a tease of the gains to be seen with data only seconds old.
Second, they find that features based on individual search queries and pages viewed ("raw features") usually have no value, but occasionally have enough value that it is important to include some. Wouldn't that suggest that the categorization scheme for pages viewed, searches made, and ads need to be more fine-grained (e.g. not just the category "pants", but the category "men's boot cut jeans")? Or, better, perhaps more fine-grained while also correctly cross correlated (interest in "men's boot cut jeans" not only shows in the data a weak interest in all pants, but also maybe has been shown to indicate a fairly strong interest in "men's flannel shirts")?
If you are interested in this paper, you might also want to look at another recent paper out of Yahoo Research, "Learning to Target: What Works for Behavioral Advertising" (ACM), which is referenced multiple times by this paper and describes the features used in the user profile in a bit more detail, as well as the results of some other experiments.
Please see also my 2007 post, "What to advertise when there is no commercial intent?"