Sunday, February 25, 2007

Marissa Mayer interview on personalized search

Gord Hotchkiss at Search Engine Land posted an interview (partial with comments, full transcript) with Google VP Marissa Mayer on personalized search.

Some selected excerpts:
Search engines in the future will become better for a lot of different reasons, but one of the reasons will be that we understand the user better ... Search engines will have become more personalized.

We've been working on personalized search now for almost 4 years. It goes back to the Kaltix acquisition .... They were on the cutting edge of how personalization would be done on the web, and they were capable of looking at things like a searcher’s history and their past clicks, their past searches, the websites that matter to them.

We acquired them in 2003 and we've worked for some time since to outfit our production system to be capable of doing that [personalization] computation and holding a vector for each user in parallel to the base [vector].

Our standards are really high. We only want to offer personalized search if it offers a huge amount of end user benefit ... We're very comfortable and confident in the relevance seen from those technologies.

Overall, we really feel that personalized search is something that holds a lot of promise, and we're not exactly sure of the signals that will yield the best results. We know that search history, your clicks and your searches together provide a really rich set of signals ... It's a matter of understanding how... The more signals that you have and the more data you have about the user, the better it gets.
At the end of the interview, Marissa also indicated that they will not be personalizing advertising any time soon; the focus is on using individual searcher history to improve relevance, not to improve revenue directly.
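The "vector for each user in parallel to the base vector" that Mayer describes is essentially personalized PageRank, the technique the Kaltix papers cover: the teleport distribution is biased toward pages a particular user cares about instead of being uniform. A minimal sketch of the idea (the tiny link graph, damping factor, and preference vectors here are illustrative, not anything from Google's actual system):

```python
import numpy as np

def personalized_pagerank(adj, preference, damping=0.85, iters=100):
    """Power iteration with a user-specific teleport vector.

    adj[i][j] = 1 if page i links to page j.
    preference: probability distribution over pages this user cares about.
    """
    n = len(adj)
    out = adj.sum(axis=1, keepdims=True)
    out[out == 0] = 1  # dangling pages: avoid divide-by-zero
    M = (adj / out).T  # column-stochastic: M[j][i] = P(i -> j)
    rank = np.full(n, 1.0 / n)
    for _ in range(iters):
        rank = damping * M @ rank + (1 - damping) * preference
    return rank

# Tiny three-page web: 0 -> 1, 1 -> 2, 2 -> 0 and 2 -> 1.
adj = np.array([[0, 1, 0],
                [0, 0, 1],
                [1, 1, 0]], dtype=float)

base = personalized_pagerank(adj, np.full(3, 1 / 3))        # global PageRank
user = personalized_pagerank(adj, np.array([0., 0., 1.]))   # user who favors page 2
print(base, user)
```

The base vector is the one everyone shares; the per-user vector re-ranks the same graph toward that user's interests, which is why it can be computed "in parallel to the base" vector.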

See also some of the papers -- "Scaling Personalized Web Search" and "An Analytical Comparison of Approaches to Personalizing PageRank" -- by the Kaltix folks.

See also my Feb 2007 post, "Google expands personalization", and my June 2005 post, "More on Google personalized search".

Thursday, February 22, 2007

Startupping launches

Mark Fletcher -- founder of two successful internet startups, Bloglines and ONElist -- has launched a new website called Startupping.

From the About page:
Startupping's goal is to help entrepreneurs, to take the mystery out of starting and running an Internet company and to share in the experience.
The site includes a weblog, discussion forums for asking questions, and a wiki with some introductory information and sample documents.

I have a couple paragraphs posted on the site today as part of Mark's "Best and Worst Decisions" series.

Update: VentureBeat summarizes startup advice from several entrepreneurs from a Y Combinator Startup School event. Worth reading.

Update: Paul Graham has some good advice in his recent essay, "Why to not not start a startup" (scroll down several paragraphs to where the numbered list starts).

Google Apps vs. Microsoft Office

Miguel Helft at the NYT reports on Google Apps in his article "A Google Package Challenges Microsoft". A brief excerpt:
Google is taking aim at one of Microsoft's most lucrative franchises.

Google Apps combines two sets of previously available software bundles. One included programs for e-mail, instant messaging, calendars and Web page creation; the other, called Docs and Spreadsheets, included programs to read and edit documents created with Microsoft Word and Excel, the mainstays of Microsoft Office, an $11 billion annual franchise.
Don Dodge thinks the threat to Microsoft is overdone:
Google Apps is missing some fundamental features ... no Powerpoint ... offline usage ... privacy and security ... Users are very demanding and have become accustomed to powerful, intuitive features in Microsoft Office.

Office Excel, Word, and Powerpoint are world class. I have tried using Google Docs and Spreadsheets and it is a frustrating experience. Obvious features that you have come to expect just aren't there.
However, Paul Kedrosky makes good points about small businesses and the low end of the market:
There is a very large group of people out there who need basic apps/sharing/scheduling/email, don't want to have IT support, and are stone-petrified at the idea of installing Exchange.

Those people are not the Excel macro-using, Exchange-expert-paying, Fortune 500s. Matter of fact, most of the latter group will predictably sniff at Google Apps as a mere toy, useless for real world work.

We can all just watch ... [as] the toy-like G Apps chews steadily away at tiny, but growing pieces of MSFT's hide.
Many people have very simple needs with an Office suite. They mostly want to be able to read documents sent to them by others. They occasionally might want to write a letter, edit a document, or add a few numbers in a spreadsheet.

Right now, those people usually buy Microsoft Office. It's just the easiest thing to do. Get Microsoft Office like everyone else and, when someone e-mails you a document, you can read it. No fighting with the dang computer. It all just works.

Now, when someone e-mails me a document, I can just open it inside GMail. Now what is the easiest thing to do? Download the document, launch MS Office, and open it? Or click a link and view it in Google Docs & Spreadsheets?

I suspect a lot of this depends on how well Google Apps handles compatibility with Microsoft Office. If there are even minor issues, the "no fighting with the dang computer" instinct will take over, and most people will install Microsoft Office just to have it for the times when they really need it.

If the easiest path becomes Google Apps -- if it all just works -- Microsoft could see the low end of the Office market fall away to the effortless laziness of a Google click.

Update: Henry Blodget writes, "Disruption begins when a dominant market leader has built so much functionality into its core products that it has begun to over-serve its core customers. Some of these customers, realizing that a simpler, cheaper product will do, gradually abandon the old technology."

Wednesday, February 21, 2007

Combating tag spam paper

Gary Price points to a paper out of Stanford CS titled "Combating Spam in Tagging Systems" (PDF).

From the abstract:
As tagging systems are gaining in popularity, they become more susceptible to tag spam: misleading tags that are generated in order to increase the visibility of some resources or simply to confuse users.

We are interested in answers to questions such as: How many malicious users can a tagging system tolerate before results significantly degrade? What types of tagging systems are more vulnerable to malicious attacks? What would be the effort and the impact of employing a trusted moderator to find bad postings? Can a system automatically protect itself from spam, for instance, by exploiting user tag patterns?
The paper is heavy on theory, spending most of its time laying down frameworks for analyzing problems with spam in tagging systems. I particularly liked the idea of determining tagger reputation and reliability that is discussed in Section 4.2.

Though the paper is a good read, my personal view on tagging is that, rather than being a solution to web spam, it does little more than shuffle the problem around. Instead of sorting spam out of web pages and the keywords extracted from those pages, we now have to sort spam out of web pages and the tags associated with those pages.

In fact, I suspect tag spam is going to be an even harder problem than web spam. With web spam, spammers can only modify the pages on the Web they have access to. With tag spam, spammers can label any item anywhere.

Right now, tagging is not mainstream. With small audiences, the potential profit is too tiny to attract much attention from spammers. That is unlikely to remain true if tagging systems continue to grow.

See also the discussion on tagging and tag spam in my previous posts, "Questioning tags" and "Chris Sherman on social search".

Update: This paper will be presented at WWW 2007 AIRWeb in a couple of days. The latest version (PDF) appears to have some changes.

I reread the paper recently and wanted to highlight a few other parts of the paper that may be of interest:
We propose an approach to tag search that takes into account not only the number of postings that associate a document with a tag but also the "reliability" of taggers that made these postings. In order to measure the reliability of a user, we define the coincidence factor c(u) of a user u ... [which] shows how often u's postings coincide with other users' postings.

Our hypothesis is that the coincidence factor is an indication of how "reliable" a tagger is. A high factor signifies that a user agrees with other taggers to a great extent; thus, the user's postings are more "reliable".

Overall, the existence of popular tags provides many opportunities for malicious users to misuse tags and spam searches. We hope that the model we have proposed here, and the results it yields, can provide useful insights on how to wage these ongoing "spam wars."
I emphasized an analysis of tagger reputation in this excerpt, but the paper also contains analyses of the impact of different types of attacks and a discussion of the value of human moderators.

On human moderators, the verdict of the paper appears mixed. At one point, it says that, in their simulations, even perfect moderators who only reject bad tags would have to review the tags on 5% of all documents to get a factor of 2 drop in spam. That's quite a lot of work for a relatively small gain. However, later, the authors say that moderators may be the best countermeasure for "focused spam attacks" that may defeat the coincidence analysis they use for tagger reputation.
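The coincidence factor is simple enough to sketch in a few lines: for each of a user's postings, count how many other users made the same (document, tag) association, then average. The scoring here is my simplification of the idea, not the paper's exact formula:

```python
from collections import Counter, defaultdict

def coincidence_factors(postings):
    """postings: list of (user, doc, tag) tuples.

    Returns c(u): the average number of OTHER users who agree
    with each of u's postings."""
    pair_counts = Counter((doc, tag) for _, doc, tag in postings)
    per_user = defaultdict(list)
    for user, doc, tag in postings:
        per_user[user].append(pair_counts[(doc, tag)] - 1)  # exclude u itself
    return {u: sum(v) / len(v) for u, v in per_user.items()}

postings = [
    ("alice", "d1", "python"), ("bob", "d1", "python"),
    ("carol", "d1", "python"), ("spammer", "d1", "viagra"),
    ("spammer", "d2", "viagra"),
]
c = coincidence_factors(postings)
print(c)  # honest taggers agree with each other; the spammer's factor is 0
```

This also shows why the authors worry about "focused spam attacks": a group of colluding spammers posting the same tags would agree with each other and inflate their own coincidence factors.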

The rise of wildly speculative execution

I enjoyed Tim O'Reilly's post with thoughts from Tim and others on the changes we may see with increased hardware parallelization.

Like them, I was struck by the news of an 80 processor chip prototype from Intel and wondered what changes it might cause.

However, as I think about this, I suspect the changes we will see will go well beyond increased use of threaded programming to parallelize one task or data mining frameworks like MapReduce.

So, if you do not mind, please indulge me while I go all wacky visionary with this post.

With hundreds of processors available on the desktop, I think we will be moving toward a model of wildly speculative execution. Processors will be used to do work that may be necessary soon rather than work that is known to be necessary now.

Modern, single core, pipelined processors already do this to a very limited extent. Speculative execution sends a processor down the most likely path of a conditional branch, executing a few cycles of machine instructions that may have to be thrown away if the branch prediction was incorrect.

What I think we may see is a radically expanded version of speculative execution, running code seconds or minutes ahead of when it may be needed. Most of this work will be thrown away, but some will be available later just when it is needed.
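At the software level, the same pattern can be sketched with ordinary futures: start work for every outcome before the choice is known, keep the result you turn out to need, and throw the rest away. The tasks here are made up for illustration:

```python
from concurrent.futures import ThreadPoolExecutor

def render_preview(doc):
    # Work we might need if the user asks for a preview.
    return f"preview of {doc}"

def run_spellcheck(doc):
    # Alternative work we might need instead.
    return f"spellcheck of {doc}"

with ThreadPoolExecutor() as pool:
    # Speculate: launch both before we know which the user will pick.
    futures = {
        "preview": pool.submit(render_preview, "report.txt"),
        "spellcheck": pool.submit(run_spellcheck, "report.txt"),
    }
    user_choice = "preview"                 # becomes known later
    result = futures[user_choice].result()  # already (or nearly) done
    futures["spellcheck"].cancel()          # discard the wasted speculation
print(result)
```

With two cores this is wasteful; with hundreds of idle cores, the discarded work costs little and the winning branch appears to finish instantly.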

It is easier to imagine how this might work for some tasks than others. For example, I could imagine a speech recognition engine running on your desktop that simultaneously runs hundreds of models analyzing and voting on what you have said and what you are about to say. I could imagine a computer immune system that was using a fraction of the processors to search for anomalous patterns in the usage of the rest of the hardware, growing in size as potential threats are detected, shrinking away as the threat passes.

I think our model for programming for many tasks may move from one of controlled, orderly execution of code to one of letting loose many competing predictive models executing in parallel.

In that sense, I think Larry Page was right when he said, "My prediction is that when AI happens, it's going to be a lot of computation, and not so much clever blackboard/whiteboard kind of stuff, clever algorithms. But just a lot of computation." Larry then went on to compare this vision of computer AI to how the brain works.

The brain is a giant pattern matching, prediction engine. On its 100,000,000,000 processors, it speculatively matches patterns, creates expectations for the future, competes potential outcomes against each other, and finds consensus. These predictions are matched against reality, then adapted, improved, and modified.

With a few hundred processors, we are a long way from the parallel processing abilities of the human brain. Yet, as we look for uses for the processing power we soon will have available, I suspect the programs we see will start to look more like the messy execution of the prediction engine in our head than the comfortable, controlled, sequential execution of the past.

Update: Nine months later, Andrew Chien (Director of Intel Research) says something similar in an interview with MIT Technology Review:
Terascale computing ... [is] going to power unbelievable applications ... in terms of inference. The ability for devices to understand the world around them and what their human owners care about is very exciting.

In order to figure out what you're doing, the computing system needs to be reading data from sensor feeds, doing analysis, and computing all the time. This takes multiple processors running complex algorithms simultaneously.

The machine-learning algorithms being used for inference are based on rich statistical analysis of how different sensor readings are correlated, and they tease out obscure connections.
[Chien interview found via Nick Carr]

Tuesday, February 20, 2007

eBay to launch personalization?

It appears that eBay is finally going to implement some personalization and recommendation features. From a NYT article by Brad Stone:
[eBay CTO] Matt Carey, the former chief technology officer of Wal-Mart ... said he was working on building computer systems that can look at customers' past purchases and make educated assumptions about what they might be looking for.
When I worked on Amazon Auctions back in 1999, I implemented a nifty recommendation feature for that site. It was a feature I was not supposed to be working on -- it had been cut from the required features for launch -- but it was too cool not to do, so I built it anyway.

The feature was pulled at the demand of an SVP a few weeks after launch. It was yanked not because it was not effective -- it easily and overwhelmingly won in A/B tests -- but because it generated complaints from sellers who did not like competitors' items shown next to theirs.

I still think the feature was the right thing for buyers and, in the long term, the right thing for sellers. Seven years later, it will be interesting to see what happens as eBay experiments with recommendations.

Update: By the way, don't miss the details on how eBay's systems have evolved over the years in this 2006 talk on eBay's architecture.

Update: Seven months later, a BusinessWeek article claims eBay is "trying to make [the shopping experience] simpler, more personalized, and more relevant" and that eBay will "recommend items to users based on their shopping history and the shopping histories of other people like them."

Google class at UW CS

Laurie Burkitt at the Seattle PI reports that Googler Christophe Bisciglia is directing a "Google 101" class in the Computer Science department at the University of Washington. The class explores programming and problem solving on massively parallel distributed systems (such as the one Google happens to have).

The class is actually called CSE 490h. The lecture slides and class tutorials are publicly available.

Of particular interest may be Lecture 3 on MapReduce (PPT), the notes in "Introduction to Distributed Systems" (PDF), and Lecture 4 on GFS and distributed systems (PPT).

Also, the tutorial on Hadoop and the Hadoop mini-wiki may be useful for those who might want to try some of this out on their own cluster using the open-source clone of MapReduce, Hadoop.
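For a sense of the programming model the class teaches, here is the canonical word-count example expressed as map and reduce functions in plain Python -- a single-machine sketch of what Hadoop and MapReduce distribute across a cluster:

```python
from collections import defaultdict
from itertools import chain

def mapper(doc):
    # Map phase: emit (word, 1) for every word in the document.
    for word in doc.split():
        yield (word.lower(), 1)

def reducer(word, counts):
    # Reduce phase: sum all counts emitted for one word.
    return (word, sum(counts))

def mapreduce(docs):
    # Shuffle phase: group intermediate pairs by key.
    groups = defaultdict(list)
    for key, value in chain.from_iterable(mapper(d) for d in docs):
        groups[key].append(value)
    return dict(reducer(k, v) for k, v in groups.items())

result = mapreduce(["the quick fox", "the lazy dog", "the fox"])
print(result)  # {'the': 3, 'quick': 1, 'fox': 2, 'lazy': 1, 'dog': 1}
```

The appeal is that mapper and reducer contain no distribution logic at all; the framework handles partitioning the documents, shuffling the keys, and recovering from machine failures.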

Looks like a fun class!

[Seattle PI article found via John Cook and Barry Schwartz]

Sunday, February 18, 2007

Talk on A/B testing

Two former Directors of the Personalization group at, Ronny Kohavi and Matt Round, jointly gave a talk called "Front Line Internet Analytics at" (PDF) at EMetrics in 2004.

I think the talk is most likely to be of interest starting on slide 10, where it goes into detail about's A/B testing framework and the results of past tests.

By the way, Ronny Kohavi is now the GM of Microsoft's Experimentation Platform. That group seeks to "enable product groups at Microsoft and developers using Windows Live to innovate using controlled experiments with live users."

Saturday, February 17, 2007

Relevance at Google, Yahoo, and Microsoft

Gord Hotchkiss at Search Engine Land has an interesting post where he summarizes his conclusions after interviewing members of the usability groups at Google, Yahoo and Microsoft.

Some selected excerpts:
At Google, relevance is a religion. The Golden Triangle is sacred. Nothing can appear here if it's not, in Google's judgment, absolutely relevant to the user.

Both Yahoo and Microsoft said similar things, but in each case, the importance of relevance was always counterbalanced with other factors. At Google, relevance is the only thing that mattered.

The urge to monetize is winning the battle at Yahoo. Yahoo was by far the most aggressive both in terms of how often top sponsored ads were shown, and how much [space] was devoted to them ... Relevance, at least in terms of the user's expectation, was pushed down the page.

The religion that is relevance at Google is not found to the same extent at Microsoft ... Relevance does not define the core purpose of Microsoft ... Microsoft approaches the user experience as one of a number of best practices ... It's not that relevance isn't important at Microsoft. It's just that it's treated more as a business objective than a sacred cow.

I believe it's Google's obsession with relevance that makes the difference. Relevance drives a better user experience, which drives market share, which drives monetization. Google seems to have the best understanding of this basic cause-and-effect relationship in the search marketing ecosystem.
I think the history of MSN and Yahoo as portals may be part of the explanation here.

Google has never had a portal. Google has always focused on getting people the information they need quickly and then sending them off to other sites to do whatever they need to get done.

Microsoft, AOL, and Yahoo have large websites with a lot of content. They traditionally have wanted to capture people on their properties and keep them there. Because of this history, I suspect Microsoft and Yahoo may still be conflicted about whether their goal in search is helping people get what they need quickly or capturing audiences for their sites.

Gord's thoughts on where Google, Yahoo, and Microsoft may be going with search are also interesting:
Google is heading the push towards greater personalization of search results. It's an essential next step to reach the ultimate objective, which is to give their user exactly what they're looking for at any given time.

Expect Yahoo to start exploring ways to thread ... individual user experiences together with search as the common element. They'll look for ways in which Yahoo's community can help enhance the search experience and provide social context to it. How this will be accomplished is a little cloudy at this point.

Microsoft's approach to the future of search seems to be the most fragmented of all the three major players .... [but] I would expect Microsoft to start using the vast amount of data that's available to them in the form of click stream behavior across all their properties, and begin to use this not only as an advertising targeting opportunity, but also to help improve the relevance of search results. From this, further personalization is the logical next step.
Google is focused on using better algorithms, big data, and personalization to improve relevance. Yahoo is focused on social search, which is poorly defined but seems to mean building tools to help humans help humans find what they need. Microsoft is scattered, but probably will stumble toward personalization at some point. All sounds about right.

For more on these topics, see also some of my earlier posts, "Google dominates, MSN Search sinks", "Google expands personalization" and "Social software is too much work".

Update: There is a fun debate on search personalization going on in the comments for this post. Don't miss it.

Google scalability conference in Seattle

Amanda Camp announces on the Google Research Blog that Google will host a new conference on scaling massive systems here in Seattle on June 23.

Registration is not open yet, but I hope to be able to attend.

Update: Registration is now open. Unfortunately, I cannot attend (the conference overlaps with Foo Camp). Too bad, the talks look good.

MSN Digg clone even easier to spam than Digg

An amusing post by Joost de Valk describes how he was able to trivially spam MSN's Dutch Digg clone, MSN Reporter, and get his content featured on the site.

See also my earlier posts, "Spam is ruining Digg" and "Digg struggles with spam".

[Found via Gary Price]

myFeedz from Adobe Labs

Gary Price posts about myFeedz, a "social newspaper" offered by Adobe Labs.

From the myFeedz About page:
myFeedz finds what's important from the sea of information out there ... It learns from what you like.

myFeedz uses artificial intelligence techniques to show you personalized news about topics you are interested in.

Some of the things used to create an article's importance are its source, tags, popularity, rating, language and more.

Your profile (tags, feeds and reading history) is then also taken into account to determine how important that article is for you.
The site was not working when I attempted to use it -- a search for [google desktop vista] and other pages I tried to access were timing out all day today -- but I still think it is worth a mention given that it is at Adobe. I certainly was surprised to see something like this coming out of Adobe Labs.

Friday, February 16, 2007

Personalization, localization, and remote storage

Googler Matt Cutts -- when asked "Where do you see Google going in the next 3-5 years?" -- said:
In my own opinion – personalization, and localization. Also if you have your data, you can store it at Google.

Google's ambition to organize the world's information -- this is really where it's going.
See also my previous post, "Google expands personalization".

For more on local search, see also a couple of my previous posts -- "Local search is hard" and "Newspapers should own local". Don't miss the comments by former Director of Yahoo Local Ali Diab on the first of those posts.

For more on storing all your data at Google, see Philipp Lenssen's post, "Google Gdrive Client Leaked", and my post, "In a world with infinite storage, bandwidth, and CPU power".

Monday, February 12, 2007

Saturday, February 10, 2007

Can community help with data extraction?

Raghu Ramakrishnan from Yahoo Research gave a wide-ranging talk at University of Washington Computer Science titled "Community Systems: The World Online".

The lightweight talk covered topics from social search to advertising, but the primary focus was on using user communities to improve and extend information and structure extracted from the Web.

The motivation here is that search would be more useful if we had structured data (e.g. a book has a title, price, and number of pages) and understood relationships between data (e.g. an apple is a type of fruit). We may be able to extract that data from the Web (e.g. the KnowItAll project), but that extraction is noisy and unreliable.

So, what are we to do? Perhaps we can learn from people's actions and behavior. Not only is there much implicit information in the clickstream trail of what people do on the Web, but millions of people also seem surprisingly willing to do a fair amount of work to improve data on the Web (e.g. Wikipedia, the ESP Game, Yahoo Answers) for only token rewards.

The main idea here then appears to be to start by extracting structured information from the web, then build tools that make it easy for people to improve the data. Give the community a starting point from which to work, then make it easy for them to build and improve.

Raghu mentioned the DBLife project at a few points in his talk. The project page is not all that informative, but there is a little more information in a CIDR 2007 paper (PDF).

Better understanding through big data

An interesting tidbit from Googler (and recent ACM Fellow) Peter Norvig in an interview with Matt Marshall:
The way to get better understanding of text is through statistics rather than through hand-crafted grammars and lexicons.

The statistical approach is cheaper, faster, more robust, easier to internationalize, and so far more effective.
This reminds me a bit of what Peter said in some of his recent talks:
Rather than argue about whether this algorithm is better than that algorithm, all you have to do is get ten times more training data. And now all of a sudden, the worst algorithm ... is performing better than the best algorithm on less training data.

Worry about the data first before you worry about the algorithm.

Having more machines is a very important part because it allows us to turn around the experiments much faster than the other guys ... It's the -- gee, I have an idea, I think we should change this -- and we can get the answer in two hours which I think is a big advantage over someone else who takes two days.
Learning from big data is what Google's infrastructure was built to do. It is particularly obvious in their results in machine translation, but also impacts everything they do, from search to ad targeting to personalization.

The rest of the interview is worth reading. Unfortunately, it is appended to yet another hype piece on vaporware from PowerSet, but just ignore the first part of the article and get to the good stuff at the end.

Update: Matthew Hurst has some interesting thoughts after reading Peter's interview:
The huge redundancy in ... documents suggests approaches to serving the user that don't require the perfect analysis of every document.

The basic [paradigm] of text mining ... the one document at a time pipeline ... is limiting. It fails to leverage redundancy ... [and assumes] that perfection is required at every step.

The key to cracking the problem open is the ability to measure, or estimate, the confidence in the results ... Given 10 different ways in which the same information is presented, one should simply pick the results which are associated with the most confident outcome - and possibly fix the other results in that light.
Update: I also liked the challenge Matthew Hurst described in this later post:
A more sophisticated search engine would be explicit about ambiguity (rather than let the user and documents figure this out for themselves) and would take information from many sources to resolve ambiguity, recognize ambiguity and synthesize results.

Thursday, February 08, 2007

Yahoo Pipes launches, goes down

Yahoo Pipes launched yesterday to much fanfare ([1] [2] [3] [4] [5] [6] [7]) and then immediately went down.

The site now displays nothing but the text, "Our Pipes are clogged! We've called the plumbers!"

Yahoo Pipes does sound pretty cool from what I can tell from the gushing write-ups. I am eager to try it too, though I have to say that crashing immediately after launch under load is not encouraging for those of us who might be interested in building something on top of it.

See also Dare Obasanjo's post, "The Problem with Web Scale: Yahoo! Pipes", for some interesting thoughts on why Yahoo Pipes went down and why it may take quite a bit of effort to unclog it.

See also my previous posts, "Lowered uptime expectations?" and "The folly of ignoring scaling".

Update: I have had a chance to play with Yahoo Pipes now. I think the concept is great and the UI gorgeous, but the actual capabilities currently fall short of what I expected from the other reviews I saw.

It appears that Yahoo Pipes is limited to combining and filtering feeds in a pretty simple way. For example, you can take a feed, then exclude every item in the feed that does not match a keyword string. Or you can combine five feeds, then re-sort them based on publication date.

You cannot rip apart the data in the feed in any complicated way. For example, I was interested in taking all the keywords for all the titles and descriptions in a feed, counting them, then sorting them, truncating to the top keywords, then finally running a Yahoo search against those top keywords that is restricted to a specific website. That is not possible. It is not even possible to take a feed, hook up a content analysis module, then run a Yahoo search against the results of the content analysis, since there seems to be no way to extract the y:content_analysis field from the items.

These more complex features may be added with time but, given the scaling difficulties Yahoo Pipes has had, I suspect I shouldn't hold my breath. For now, it appears Yahoo Pipes is limited to fairly simple combining and filtering of feeds.
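For what it's worth, the pipeline I wanted takes only a few lines of ordinary code, which is part of the frustration. A rough sketch using Python's standard library -- the feed XML here is inline sample data, and the stopword list is ad hoc:

```python
import xml.etree.ElementTree as ET
from collections import Counter

FEED = """<rss><channel>
  <item><title>Google expands personalization</title></item>
  <item><title>Yahoo Pipes launches</title></item>
  <item><title>Google personalization update</title></item>
</channel></rss>"""

STOPWORDS = {"the", "a", "an", "and"}

# Pull every keyword out of the item titles, count them,
# and keep the top few -- the step Pipes could not express.
titles = [item.findtext("title") for item in ET.fromstring(FEED).iter("item")]
counts = Counter(
    word for title in titles
    for word in title.lower().split() if word not in STOPWORDS
)
top_keywords = [word for word, _ in counts.most_common(3)]
print(top_keywords)
```

The final step -- feeding those top keywords into a site-restricted Yahoo search -- is exactly the kind of module composition Pipes looks like it should support but currently does not.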

Overall, I would say that Yahoo Pipes sure is shiny and pretty, but not that dissimilar from FeedShake, FeedDigest, and other RSS mixing services that are already available. It certainly is not the mashup revolution that some gushingly proclaimed, at least not yet.

Google Book Bar and News Bar

Philipp Lenssen reports on two new little widgets from Google, the Book Bar and the News Bar. Both are pieces of Javascript that let you show on your website a selection of books or news based on a keyword search.

The Book Bar reminds me of the Recommended Product Links of Amazon Associates. Using that, you can also show books (and other products) that match specific keywords or categories. Unlike the Google Book Bar, Amazon's widget pays you for sales generated through the widget.

The News Bar reminds me of the News Widgets (category and zip only) or the keyword search and other widgets of Findory Inline.

Looking at this, what I think would be really cool would be if Google's Book Bar and News Bar self-optimized based on usage of the widget, much like Google AdSense does.

I could imagine two ways of doing that. One would be to adapt to usage of the widget on a particular site. For example, if I put Google Book Bar on my site, the books picked might depend on what books people click on in the Google Book Bar on my site as well as using aggregate information across all sites.

Another would be to personalize the selection of books or news articles to the interests of each viewer based on their individual history. For example, the items in your Google search history could influence the news items you see if you view the widget on some weblog. The news personalization in Google Sidebar (part of Google Desktop Search) already does something much like this, featuring news articles based on your past behavior. Amazon Omakase is a good example of a books widget that makes recommendations based on your past behavior, in this case based on your purchase and clickstream history at
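The first kind of adaptation could be as simple as a bandit-style loop: mostly feature the books with the best observed click-through on this site, but keep exploring the alternatives. A hedged sketch, with made-up candidate titles and click rates:

```python
import random

class EpsilonGreedyWidget:
    """Pick which book to feature based on observed clicks on one site."""

    def __init__(self, books, epsilon=0.1):
        self.books = books
        self.epsilon = epsilon
        self.shows = {b: 0 for b in books}
        self.clicks = {b: 0 for b in books}

    def pick(self):
        if random.random() < self.epsilon:
            return random.choice(self.books)  # explore occasionally
        # Exploit: highest smoothed click-through rate so far.
        return max(self.books,
                   key=lambda b: (self.clicks[b] + 1) / (self.shows[b] + 2))

    def record(self, book, clicked):
        self.shows[book] += 1
        self.clicks[book] += clicked

widget = EpsilonGreedyWidget(["Book A", "Book B"])
random.seed(0)
# Simulate visitors who click Book B three times as often as Book A.
for _ in range(1000):
    b = widget.pick()
    widget.record(b, random.random() < (0.3 if b == "Book B" else 0.1))
print(widget.shows)  # the widget learns to show Book B far more often
```

AdSense-style self-optimization is presumably far more sophisticated, but even this simple loop adapts a widget to each site's audience with no manual tuning.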

Findory has some examples that approach some of these ideas as well. For example, the book ads in the upper left corner of the Findory home page are targeted based on each reader's history as well as the content of the page. Findory Inline allows you to "personalize" the selection of articles shown in the Javascript widget, though that personalizes based on your reading history on Findory, not the behavior of everyone (or to each individual) who sees your widget.

Sunday, February 04, 2007

Google expands personalization

Googlers Sep Kamvar and Marissa Mayer post on the Official Google Blog about expanding Google's personalization and recommendations.

Some extended excerpts:
We have two main ways of personalizing your Google experience. First, you can customize products and services like the Google Personalized Homepage. Personalizing your homepage gives you the at-a-glance information that you care about -- such as your latest Gmail messages, news headlines, or to-do list -- right at your fingertips, just the way you want it.

Second, we offer automatic personalization through things like personalized search and recommendations. Our goal with these types of technologies is to make your Google search experience better based on what we know about your preferences, without you having to do any extra work.

Today, we're taking another step toward making personalization more available to you by combining these two into a single signed-in experience. Now, when you're signed in, you'll have access to a personalized Google -- one that combines personalized search results and a personalized homepage.

Keep in mind that personalization is subtle -- at first you may not notice any difference. But over time, as the search engine learns your preferences, you'll see it. For example, as an avid Miami Dolphins fan (no joke), searching for [dolphins] gives me info about my favorite football team, while a marine biologist colleague gets more information about her salt-water friends.
The change, as described on this Google help page, is that anyone who signs up for a Google Account will be given a search history, a personalized home page, and individually personalized search results based on their past behavior.

Before, everyone had to explicitly enable search history and personalized search to get those features, so very few did. This change of the default will mean that many more people -- anyone who has used a feature that requires a Google account -- will see personalized web search results and the other personalization features when they go to Google.

See also Danny Sullivan's detailed review of the Google personalization features and his thoughts on the impact of making them more easily and widely available.

See also Eric Schmidt and Larry Page's comments on personalization from their recent Q4 conference call.

See also some of my previous posts ([1] [2] [3] [4] [5]) on Google's personalization.

See also my earlier post, "Potential of web search personalization".

Friday, February 02, 2007

Mockups of search engine evolution

Philipp Lenssen has some fun mockups of what Google might look like in the future as they add more artificial intelligence such as better text summarization and knowledge extraction, user modeling to determine intent, and personalization based on past behavior.

Amazon trying to be too much?

Paul Kedrosky posts about Amazon's quarterly numbers. I particularly agree with this part:
This is going to be a tough year for Amazon.

It is multiple companies in one -- an online retailer, an analytics company, a web services firm, etc. -- and the longer it tries to be all things (it is up to 38 categories in retail alone) the more likely it becomes that the eventual disentanglement of all these overlapping commitments becomes unpleasant to all concerned.
See also my earlier post, "Innovation and learning to love destruction", where I said:
Amazon, for example, has 63 links on their "all product categories" page, a confusing mess that paralyzes anyone looking for a book or DVD with irrelevant and useless choices.

Why do all these continue to exist? Why do Auctions and zShops hang around for years after they failed to attract an audience? Why do detail pages accumulate more and more "exciting new features" until I cannot find the customer reviews anymore under the sea of crap?
See also my previous post, "Doubling down at Amazon", about Amazon's commitment to web services and the reaction from Wall Street.

Thursday, February 01, 2007

Google Q4 2006 call and personalization

There are a few brief mentions of personalization in the Google Q4 2006 Earnings Call. First, Eric Schmidt said:
Our commitment is to provide the most relevant information across the board in a personalized and targeted way across any device.

What you want, when you want it and how you want it.
Later, Larry Page said:
We're very excited about personalization ... We're starting to really get very healthy usage [of the Google Personalized Home Page] ... I think we've got a lot more growth in store.

Also the quality improvements we get with personalized search are also quite significant.

So I think overall, I think we're very excited about personalization.
Larry also suggested that Google's personalization and recommendations features will be more visible and more widely applied in the future.

[Thanks, Jeremy, for pointing out the transcript]