Thursday, March 31, 2005

A relevance rank for news and weblogs

If you use a feed reader like Bloglines, there must have been at least a few times you've looked at the overwhelming pile of unread articles with a sigh. So much to read.

All feed readers organize the articles in the same way. They group the articles by feed and sort the articles by date. So, you go through, click on each feed, skim the articles, and slog on through.

"Wouldn't it be nice," you've probably thought, "if these articles were sorted by relevance? Maybe the most important articles at the top and least important at the bottom? Then I could just read the articles from top to bottom, stopping when I get bored or run out of time."

That would be nice. But, what does it mean? What's the most relevant news?

Let's explore it. What if all the articles from the news and weblog feeds were sorted by how many people read them? The more people have read an article, the higher in your list of unread articles.

Hmm... That might help, but it'd be ordered by popularity, not relevance. Yahoo News has an example of ordering news by popularity. You can see that it tends toward the sensationalistic and tabloid. It pulls you toward the mainstream and away from the long tail. That's the wrong direction, folks. You want interesting and useful, not bland and mediocre.

Okay, if it's not most popular, what is the most relevant news?

Maybe the problem is that we're defining popularity too broadly. Does it matter to me if a teenage surfer chick thought an article with rumors of Britney Spears' pregnancy was really awesome? Not in the slightest. Does it matter if one of my computer geek friends really enjoyed an article on the upcoming MySQL 5 release? Yes, that does matter.

So, perhaps relevance is what people like me like. Okay, so I'll just list hundreds of people I know who are like me, get them all to use the same feed reader, and then... oh, shucks, that's never going to happen, is it?

Fortunately, it doesn't have to. We can find people like me, people I don't even know, automatically and anonymously using some clever algorithms. Put that computer to work, I say.

Great! Now we know how to sort news by relevance. We take all the news and sort by what people like me like. So, why isn't anyone doing this? Well, someone is doing it -- and doing it quite well, I might add -- but why isn't anyone else?

Well, it's hard. Really hard. Maybe I made it sound easy, but the devil is in the details. For example, the most interesting articles for a subgroup aren't actually the same as the most popular ones; they're a little different, and that's just one of tens of spots where you can trip up and hork the quality of the relevance rank. These "clever algorithms" I mentioned can be really expensive; doing this at scale for millions of readers requires a lot of careful thought. News is perishable -- old news is no news -- so you'd better find a good solution to the cold start problem. And the list goes on. It's not easy.
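To make "sort by what people like me like" a bit more concrete, here is a toy sketch. It is not how any real feed reader or news site does it. The Jaccard similarity measure, the function names, and the made-up reading histories are all assumptions picked for illustration, and it ignores the scale and cold start problems entirely.

```python
from collections import defaultdict


def jaccard(a, b):
    """Overlap between two sets of article ids, from 0.0 to 1.0."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def rank_unread(me, histories):
    """Order the articles `me` has not read by similarity-weighted votes.

    `histories` maps a reader id to the set of article ids that reader read.
    Each other reader votes for the articles they read, and the vote counts
    in proportion to how much their reading history overlaps with mine.
    """
    mine = histories[me]
    scores = defaultdict(float)
    for reader, read in histories.items():
        if reader == me:
            continue
        weight = jaccard(mine, read)
        for article in read - mine:
            scores[article] += weight
    # Highest score first; ties broken alphabetically so the order is stable.
    return sorted(scores, key=lambda a: (-scores[a], a))


# Hypothetical reading histories, invented for this example.
histories = {
    "me": {"mysql5", "linux-kernel"},
    "geek1": {"mysql5", "linux-kernel", "postgres-tuning"},
    "geek2": {"mysql5", "postgres-tuning"},
    "fan1": {"britney"},
    "fan2": {"britney"},
    "fan3": {"britney", "mysql5"},
}

print(rank_unread("me", histories))  # ['postgres-tuning', 'britney']
```

In this made-up data, the Britney article has more readers (three versus two), so a pure popularity sort would put it first. Weighting each reader by how much their history overlaps with mine flips the order.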

But it's got to be done. It takes too long and too much effort to use the current generation of feed readers. To break into the mainstream, next generation feed readers will have to sort articles by relevance.

Search engines and blog search

Gary Price posts about IceRocket's new weblog search. It's a good example of how web search products easily can be extended to offer blog search as well.

How long before the search giants offer weblog search? Ask Jeeves already supports it through Bloglines (and, very soon I'd expect, through their main page).

What about Google, Yahoo, and MSN? What's stopping them from launching blog search?

Profile of

Alarm:clock posts an interesting profile of, the popular social bookmarking site.

See also the recent Slashdot article on, an open source competitor to

Piling on Google

Om Malik rips Google and praises Yahoo.

Ben Hammersley at the Guardian claims Google "has been overtaken" and that "Yahoo is the new Google."

Danny Sullivan responds, rebutting many of the points and providing some much needed perspective.

Wednesday, March 30, 2005

Desktop search should not exist

Rob Pegoraro at the Washington Post reviewed desktop search tools. He ends his article by saying:
    These programs also shouldn't exist: Their capabilities should be built into the operating system, something both Microsoft and Apple are working on.
Exactly right. The opportunity for third-party desktop search apps exists because the Microsoft Windows file search is pitifully weak. As soon as Microsoft corrects this flaw, the opportunity will evaporate, as will the numerous also-ran desktop search apps.

[via Marc Orchant]

Tuesday, March 29, 2005

The best of the old and the new

Dan Gillmor posts that newspapers are "trapped -- highly profitable businesses that can't or won't take the kind of risks that will be crucial to survival." He's looking for "ways to combine the best of the old and the new" to help newspapers and ensure a vibrant future for journalism.

Dan's right that newspapers are trapped. Newspapers used to have a local monopoly on distribution of news, local advertising, and classifieds that generated extraordinary profits. Online sites have created a disruptive new distribution channel. Some newspapers are responding by grasping for their fading monopoly -- throwing up barriers to online and restricting access to their content -- but that will only cause it to slip away faster.

Instead, newspapers should focus on their advantage, a unique understanding of their local market. No one can do local reporting better than newspapers. No one has more experience dealing with local advertisers than newspapers. By focusing on their competitive advantage, newspapers can benefit from new distribution channels.

For example, the San Jose Mercury News should be the world experts on Silicon Valley and Silicon Valley businesses. The Seattle PI should be the world experts on the Puget Sound, including Boeing and Microsoft. Newspapers should focus on what they do better than anyone else. They should embrace the new distribution channels to provide their content to the world.

I'm not alone in believing this is the future. Tom Curley (CEO, AP) made a similar prediction at the Online News Association Conference a few months ago.

So, how can the online media help newspapers? Online news aggregators like Findory provide one example. Findory shows short excerpts of news content from thousands of sources. We direct readers to the newspaper websites to read the full articles.

We see ourselves as having two customers, the readers of our sites and the providers of our news. We help our readers find interesting and relevant news. We help newspapers find readers for their content. We are new media helping old media and old media helping new. The combination makes it easier for readers to get the information they need.

Monday, March 28, 2005

Another day, another deal

Michael Bazeley reports that Google acquired Urchin, a web analytics company.

The deals are coming fast and furious lately. Gannett, Knight-Ridder, and Tribune acquired Topix. IAC acquired Ask Jeeves. Ask Jeeves acquired Bloglines. Yahoo acquired Oddpost, Farechase, and Flickr. MSN acquired Lookout. Google acquired Keyhole and Picasa. Looksmart acquired Mamma acquired Copernic. Six Apart acquired LiveJournal. Whew!

Update: SiliconBeat reports on two more Google acquisitions, Zipdash and Where2.

Sunday, March 27, 2005

Personalized TV advertising

Lorne Manly at the New York Times writes about the future of television ads, personalized advertising:
    The television commercial -- a blunt instrument that often reaches as many disinterested people as desired ones -- is beginning to behave like a smarter version of direct mail. Ads can be customized, not just by neighborhood, but ultimately by household and perhaps by viewing habits.

    If you don't own a dog, you won't be bombarded by ads for Puppy Chow or Iams. If the technology determines that the man of the house has wrested control of the remote from his teenage daughters, he will not have to sit through feminine hygiene ads during the most popular network shows.

    Their ultimate goal is ... to make ads more relevant to the lives of viewers, so that they might just stick around and watch. Instead of commercials being an annoyance, they become information a viewer needs, perhaps even craves.
I've seen two major trends lately as businesses try to improve the effectiveness of advertising. Some are making ads more intrusive and more obnoxious, hoping to make them more difficult to ignore. Others are making ads more relevant, hoping that people will find them useful.

If you view advertising as being propaganda -- trying to hoodwink you into buying something you don't want or need -- then this new trend toward targeting may be disturbing. After all, targeted, effective propaganda is worse than untargeted, ineffective propaganda.

But if you view advertising as information -- useful information about products and services you don't know about and might want -- then this new trend toward targeting will seem positive to you. It will waste less of your time and help you get the information you need.

Friday, March 25, 2005

Behind the scenes at Findory

Sometimes the impact of our work at Findory is obvious. When we launch a new feature, such as source pages, everyone immediately can see what we've done.

But sometimes our work is more subtle. Most of our work this month was on the backend systems. Sure, you might notice that the site is a little faster, but the effect is subtle, especially since we already served our personalized pages in under 100 ms. If you were very observant, you might notice that we partitioned off our RSS traffic to new servers, but since it all just works like it's supposed to, you probably wouldn't notice. And that's okay. We like it when it all just works.

But work on the backend systems can be satisfying. We substantially improved the quality of the unpersonalized Findory front page recently. Sure, Findory is about personalized news, but we want newbies to Findory to immediately see interesting and useful content, and our latest improvements will make Findory more attractive to new users. It's satisfying when we make a change that we think helps our readers find the news they need.

We completely rewrote our deployment system. When you have a cluster of servers, you want to be able to image the servers and put new content on them quickly and easily. This allows you to scale easily and rapidly as new traffic floods in. Our old system was a little too manual and unreliable for our taste, so we made it nice and bulletproof.

Ahh, yes, deployment systems. Both Alex and I have done this before. Way back in 1997, Amazon ran on one big piece of iron, a multiprocessor DEC Alpha box. It might be hard to believe now when clusters are seen as the obvious solution, but, in 1997, web server clusters weren't all that common. I worked on a team of two to split Amazon's website across a cluster of four boxes, a job that mostly involved ferreting out assumptions that there was only one box, but that also included writing the deployment system to push and pull content from the boxes. It was the first step toward Amazon's current cluster architecture, and it was exciting to be on the bleeding edge of it. More recently, Alex wrote a deployment system for some of Amazon's famous web services. We know scalable systems, and it's fun to be building them again at Findory.

Finally, we've been fielding a lot of interest in Findory from the search giants, newspapers, and VC firms. Perhaps some of this is that the exponential growth we've been seeing is now visible in less accurate sources like Alexa. Perhaps some of this is the growing realization that the future of news is personalization. Either way, we're excited to be part of that future.

Early peek at Yahoo 360

A few reviews are starting to come in from a closed beta of Yahoo's upcoming social networking and blogging site, Yahoo 360. Charlene Li says:
    Central to the whole service is the concept that you want to communicate and connect with the people that you already know, rather than try to meet new people. To this end, your home page on the service shows the most recent content published by people within your network ... In essence, the content is being pushed to you by the service.
Curiously, it sounds a lot like Amazon's About You pages.

But I think Yahoo is on to the right idea. Social networking needs a purpose. It has to be about more than just building a network. I'm not sure whether sharing Yahoo content is a compelling enough purpose, but at least it has a purpose.

Some reviews of Yahoo 360 seem to say that it doesn't pass the "Would your mother use this?" test. Danah Boyd said, "The controls are really overwhelming" and "I'm really worried about the novice user."

Complexity is a problem with a lot of social networking sites, and it's too bad that Yahoo didn't solve this problem. Ideally, using the site should be fun, effortless, and feel like play, not work.

Update: Looks like invites are flying for Yahoo 360. I spent some time on it. It's like some combination of Orkut, MSN Spaces, and Amazon's About You pages.

I'm enjoying playing with it, but I am finding setting it up to be a lot of work, perhaps too much of a startup hurdle for many users. It's got nice integration with some blogging and messaging tools. We'll see if that's enough of a purpose to keep people coming back.

Seeing Yahoo 360 makes me think about missed opportunities for both Amazon and Google. Amazon has had About You pages for 4-5 years but didn't expand them further into a general social software tool. Google has had Orkut and Blogger for a couple years, but we've seen little additional innovation from them.

Update: Two and a half years later, Yahoo 360 shuts down.

Thursday, March 24, 2005

Personalization is hard. So what?

Philipp Lenssen lists some problems with doing personalized search. Boiling it all down, Philipp is saying that personalized search is hard to do well and can't ever be perfect, so it's not worth doing.

He's right that personalization is hard. It has to work from noisy, sparse information. It has to deal with changing preferences. It has to avoid pigeonholing. It has to make good predictions. And it has to do it all in real time for millions of users.

And he's right that it can't ever be perfect. Personalization is so hard that it's going to make mistakes. Probably a lot of mistakes.

Does this mean personalization is useless? Of course not. Personalization doesn't have to be perfect. It just has to be good enough.

If personalization helps people find what they need faster on average, it's a win. It doesn't have to be right all the time. It just needs to be helpful.

Take one of the examples from Philipp's critique, someone who searches for the single word "restaurant". If those results aren't targeted to a best guess at your location, they're completely useless. Try it on Google and look at the top results. In the vast majority of cases, it would be more helpful to emphasize your local restaurants than return the generic results. Personalization would be helpful.

Or let's take Amazon's personalization. It is far from perfect, but it doesn't have to be. A generic storefront emphasizing top sellers is much less useful to you than a storefront emphasizing mostly products you like. When Amazon's personalization guesses wrong, it shows you something you didn't want, which is what the generic storefront would have done anyway. In general, their personalization is helpful.

Personalization is hard. Personalized search will make mistakes. And that's okay. It doesn't have to be perfect. It just has to be helpful.

Personalized search at PC Forum

Dan Farber at ZDNet reports on the search panel at PC Forum. Some excerpts on personalized search:
    Google's [Marissa] Mayer is looking at providing more personalization capabilities. "We don't know how to do [personalization] well, so we are starting with baby steps, such as knowing where you are as a context."

    "We need to get better not at doing searches, but at providing answers people are looking for, " [said Marissa.] "There will be a day when ten HTML links regardless of who you are is not the answer any more." She also said that the idea of everybody getting the same search result isn't reasonable.

    Mayer [said] that Google's goal isn't to force users to have to think about search ... One of the Google principles is that if your "mother" can't figure out how to use a feature, it shouldn't be released.
Google has it exactly right. Personalization is hard. Figuring out how to do it well isn't obvious or easy. But it has to be done to get further improvements in relevance rank. It has to be done to help people find what they need.

Udi Manber (CEO, A9) appears to have a different view. Udi is quoted as saying, "People will learn to use search better but have to invest the thinking -- we are not in the mind reading business."

I disagree. Searchers will not do extra work unless they see immediate, extraordinary, and obvious results from doing more work.

People are lazy, appropriately so. If you need to read minds to prevent them from having to do work, well then you better read minds. They'll think it's your fault, not theirs, if you don't give them what they need.

Dan ends his article with this prediction:
    The last five years of search history was more about monetization and continuity than delivering more relevance and personalization ... The next five years might yield more in the way of personalized answers.
[ZDNet article via Gary Price]

Update: Turns out the transcript for this talk is available as a PDF file after free registration on the site. Interesting reading. [via Paul Kedrosky]

Update: Four years later, Udi Manber (now at Google) appears to have changed his mind, saying, for example, that if users can't spell or don't know how to search, it is Google's problem.

Writing code at Google

Joe Beda posts some interesting details about the development environment at Google. Some excerpts:
    It is really easy to look at and contribute to code in other projects without having to talk to anyone.

    Teams are actively encouraged to share the most intimate details of their projects with the rest of the company.

    When someone comes up with a new idea, the most common response is excitement and a brainstorming session. Politics and who owns what area rarely enter into it.
This is the way it should be. Everyone should constantly be seeking to improve everything. There should be no sacred cows. Assumptions should be questioned. Ideas, prototypes, and code should come from everywhere. Nothing is so good that it can't be improved.

Wednesday, March 23, 2005

Sticky, sticky TiVo

Thomas Hawk discusses an article by Jeff Macke on the TiVo/Comcast deal. The key point is that TiVo customers are exceedingly loyal, reducing costly churn:
    TiVo users ... are sticky -- apparently very sticky. For the fiercest TiVo owners they treat their TiVo like a religion ...

    Churn costs cable operators and satellite companies money. Basically it costs them about $800 to get you as a customer in the first place.

    The churn rate for DirecTV/TiVo users is .2% per month. "That's one full percentage point lower than the levels of churn for Comcast DVR customers ..." Reducing churn [by] 1% ... would save about $60 million per month.

    [Comcast] recognize[s] that their own ... engineers have come nowhere close to matching TiVo's offering.
TiVo has an excellent user interface and novel features like recommendations. These features make the experience on any other PVR seem hollow and pathetic, making switching costs higher and churn lower.
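The numbers in the quote can be checked with a little arithmetic. This is only a back-of-the-envelope sketch. It assumes the quoted savings is simply the number of customers no longer lost each month times the $800 acquisition cost, which the excerpt does not actually say, and the function and variable names are made up for the example.

```python
def monthly_churn_cost(subscribers, monthly_churn_rate, acquisition_cost):
    """Dollars spent per month replacing the customers who leave."""
    return subscribers * monthly_churn_rate * acquisition_cost


ACQUISITION_COST = 800       # dollars per customer, from the quote
CHURN_REDUCTION = 0.01       # one percentage point per month, from the quote
QUOTED_SAVINGS = 60_000_000  # dollars per month, from the quote

# Assumption (not stated in the article):
#   savings = subscribers * churn reduction * acquisition cost
# Solving for subscribers gives the customer base the quote implies.
implied_subscribers = QUOTED_SAVINGS / (CHURN_REDUCTION * ACQUISITION_COST)
print(f"Implied subscriber base: {implied_subscribers:,.0f}")
```

Under that assumption, the quoted figures imply a base of about 7.5 million subscribers, since each point of monthly churn costs $8 per subscriber per month.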

Gannett on personalized news

In the press release for the investment, Jack Williams (SVP, Gannett) said:
    People want the news that's relevant to them - where they live, the business that they are in, the topics they care about.
Jack says people should get the news that is relevant to them. He thinks each person should see news on the topics and issues they care about. Jack says people want personalized news.

Seems like Gannett is joining Reuters and AP in believing personalized news is the future.

Tuesday, March 22, 2005

Topix acquired!

Katharine Seelye at the New York Times reports that Gannett, Knight-Ridder, and Tribune are each taking a 25% stake in Topix.

Congratulations to Rich Skrenta and the rest of the team! Topix focuses on fine-grained categorization of news by subject and location. They combined a broad news crawl with a clever, automated classification system. Topix has hundreds of thousands of categories, including narrow categories like nanotechnology and small towns like Sequim, WA.

While it may be a little surprising to see newspapers making this deal, mainstream newspapers have been struggling mightily to deal with online news. This deal has substantial benefits.

Gannett, Knight-Ridder, and Tribune gain a well-differentiated news aggregator to compete against Yahoo News and Google News. The acquisition also gives them a powerful tool for enhancing their own news sites with related content and customization features using Topix's fine-grained subject categories. Finally, Topix has developed some expertise in targeting advertising to news -- something newspapers desperately need -- and Topix can transfer those skills to its new partners.

Great news, Rich. Congrats again!

Tony Gentile deserves credit for breaking this story on rumors earlier today. Also see the comments from Gary Price, Susan Mernit, John Battelle, Michael Bazeley, and Mike Masnick.

Update: Rich Skrenta (CEO, Topix) posts about the investment deal.

Update: There seems to be a fair bit of confusion about the valuation received, in part created by Bambi Francisco first reporting only that "the funding was less than $5 million" but then updating her article to say:
    The three newspapers bought 75 percent of the company for a market valuation below $100 million, but north of what venture capitalists valued Friendster, according to someone close to the deal. Friendster ... received a market valuation of $50-plus million.
My interpretation of this is that Topix was valued between $50-100M for the buyout and also received $5M in cash for expansion and future operations. Very impressive.

Update: A year and a half later, Topix raises another $15M in funding to add fuel to its impressive growth. Very cool.

The network is just the beginning

Bill Burnham has an excellent post on the problems with a lot of social networking tools:
    Social networking companies face several inherent challenges. The first is ... "input-output asymmetry" ... that in order to get utility out of a social networks, users must first invest significant amounts of time setting up and maintaining their networks .... They are neither autonomous nor self-healing.

    Social networking is inherently an intermittent and dynamic activity. Without some kind of application to force the regular use and maintenance of such networks, pure play online social networks are destined to become as stale and appealing as two week old bread.
Bill's argument is against networking just for networking. For example, on Friendster and Orkut, there is no goal. You go there, list your friends, and then... what? Once you have your network built, what do you do with it?

Instead, Bill says, focus on an application. A social network should be a tool that helps accomplish some other goal. Building a network should be a means to an end, not the end itself.

See also my earlier post, "What has become of Orkut?" and another post by Bill, "Earth to Friendster: We have a problem".

Monday, March 21, 2005

AOL + Kayak = Pinpoint Travel

Chris Sherman reports that AOL took a minority stake in Kayak, a travel metasearch startup, and partnered with them to launch a new site, Pinpoint Travel.

Sunday, March 20, 2005

Ask Jeeves to be acquired?

Geraldine Fabrikant at the New York Times reports that:
    IAC/InterActiveCorp ... is close to an agreement to acquire Ask Jeeves Inc. ... for about $1.9 billion.

    IAC/InterActive owns ... Expedia, Ticketmaster, Home Shopping Network, and CitySearch.
There have been a lot of deals lately. Ask Jeeves itself acquired Bloglines just a few weeks ago. Yahoo acquired Oddpost, Farechase, and Flickr. MSN acquired Lookout. Google acquired Keyhole and Picasa. Looksmart acquired Mamma acquired Copernic. Six Apart acquired LiveJournal.

But this would be a much larger and riskier deal. While there's a certain logic to combining Ask and IAC, the business literature shows that the vast majority of mergers of large companies either destroy value or fail to create value. Promises of "synergies" usually fall victim to the staggering complexity of aligning the organizations.

[via Steve Rubel]

Update: Charlene Li has a good list of the expected synergies.

Update: It's official. Here's the press release.

Paul Graham, startups, and VCs

The latest essay from ubergeek Paul Graham is titled "A Unified Theory of VC Suckage".

Inflammatory title, to be sure, but the essay makes several good points. Paul's argument is similar to many of the points made by Joel Spolsky (of Joel on Software fame) in his excellent post, "Fixing Venture Capital".

[via Andrew Chen]

Update: Looks like Paul Graham is putting his money where his mouth is. His new Summer Founders Program is a paid summer internship designed to build new startups. From their FAQ:
    We suspect that students, and particularly undergrads, are undervalued ... Our hypothesis is that many quite young hackers could start viable startups if someone just gave them enough seed money to develop a prototype. The Summer Founders Program is an experiment to test this hypothesis.
[via Joho]

Update: Joel and Paul did not go unnoticed by the VCs out there. Fred Wilson takes a conciliatory tone with his post, "We suck less". But Jeff Nolan gets out the big guns and flames entrepreneurs, Joel Spolsky, and just about everyone else. Kaboom! All very amusing.

Update: Seven months later, Paul Graham posts about the mostly positive results of the first Summer Founders Program.

Flickr purchased by Yahoo

The rumors are true. Flickr has been acquired by Yahoo.

[via Jeremy Zawodny]

Update: The deals just keep on coming. HP just bought Snapfish, a photo site (it's closer to Ofoto and Shutterfly than Flickr). [via Findory]

Saturday, March 19, 2005

Findory on CNBC

We heard that Findory was mentioned on CNBC a few days ago. Fun! That's the first TV coverage for our personalized news site.

We've had radio coverage on KPLU and KOMO. We've had press coverage in the Seattle PI, Seattle Times, San Jose Mercury News, Slate, Searcher Magazine, and the Puget Sound Business Journal. More details and links to many of the articles are available on our press page.

We're also thrilled by the positive reaction we've received from the weblog community. Hundreds of weblogs have talked about Findory and what we're building. Thanks, everyone!

Friday, March 18, 2005

The Long Tail and the cost of information

Kevin Laws has a great post about The Long Tail and the cost of information. Some excerpts:
    [Search cost] is the cost of finding the item you need - often measured in time and effort, rather than money. When sorting through the list of all music ever released, it would take you forever to find that piece of music you'd actually enjoy ... you'd probably buy nothing, because you'd give up long before finding anything you'd like.

    Amazon provides a variety of tools to help reduce search costs: recommendations, samples, listmania, and many other tools ... Amazon ... leads customers to buy items they've never heard of before.

    It is not enough for a company to aggregate lots of small things. Reducing search costs by matching content to users is critical for Long Tail businesses.
Massive selection isn't enough. To make The Long Tail accessible, we need to lower the cost of information. It needs to be quick and easy to discover the gems stuck out in the tail.

Irrelevant items should be hidden. Interesting items should be emphasized. Reduce millions of poor choices to tens of good ones. Help people find what they need.

Bloglines, eTech, and advice on startups

Mark Fletcher (CEO, Bloglines) gave a talk called "From the Garage: Lessons Learned Birthing and Building Web Start-Ups" at eTech.

Best summary of the talk I've seen was by Andrej Gregov. Andrej has had excellent coverage of eTech and the other posts on his blog are well worth reading. Reemer and Mike Rodriquez also have summaries of Mark's talk.

At Findory, we practice many of the same principles Mark advocates. We're deeply passionate about what we're building. We run fast, lean, and cheap. We keep it simple. We launch early and often. We leverage open source technology wherever we can.

For more advice along the lines of Mark's talk, I'd recommend Paul Graham's recent essay, "How to Start a Startup", and Guy Kawasaki's new book, "The Art of the Start".

Update: Mark Fletcher posts a link to the slides from the talk. Thanks, Mark!

The Long Tail at eTech

Chris Anderson posts about eTech and his presentation (.ppt) on The Long Tail.

I'm disappointed that I wasn't able to make it to eTech. This and many of the other talks looked excellent.

Thursday, March 17, 2005

Feedster to exclude Blogspot blogs?

Scott Johnson (VP Engineering, Feedster) says Feedster is so tired of spam from bogus weblogs at Google's Blogger that they're thinking of excluding Blogspot blogs from their blog search engine.

A drastic step. The weblog spam problem is severe. And it's just getting worse and worse.

Update: Feedster CEO Scott Rafer chimes in too.

Wednesday, March 16, 2005

Blogging tools and Yahoo 360

Charlene Li posts about Yahoo 360, a blogging and social networking service currently in closed beta.

According to Charlene, there are two major pieces to the product:
  1. A blogging tool similar to Google's Blogger or MSN Spaces with some sharing of content.
  2. Sharing product reviews among people you know (on the theory that reviews have more credibility coming from a source you know).
All the major portals -- MSN, Yahoo, and Google -- now will have blogging tools. With these giants in the space, it's going to be hard for other, smaller players.

I'm not surprised to see Yahoo produce a blogging tool, but I am surprised to see most of the innovation in this area coming from Yahoo and MSN, not Google.

Google has had Orkut and Blogger for a while now. But there's been precious little innovation on these products. Orkut has languished. Blogger seems to be overwhelmed with scaling problems. Unfortunate.

interview on CNet

GuruNet CEO Bob Rosenschein has an interview in CNet with a good discussion of the company's service. Some excerpts:
    The Web, for better and for worse, is largely unedited, unfiltered, cluttered ... The average user is just overloaded.

    Our goal is to deliver concise, relevant information in one click .... You just want the information.
GuruNet's service is now used by Google for word definitions, a deal that's been giving them a lot of attention and traffic.

[via Brad Hill]

Tuesday, March 15, 2005

What's open about A9's OpenSearch?

A9 announced OpenSearch today at the Emerging Technologies Conference. OpenSearch is a list of specifications of how other search engines can make their search results available.

Wait, what was that? Other search engines?

Yes, the idea behind OpenSearch is that other search engines can be listed on A9 if they obey A9's specifications for publishing their data. The "open" in OpenSearch is other people opening their search engines.

No, A9 itself doesn't publish access to its search in OpenSearch format. Accessing and republishing A9's results is prohibited by their terms of use, you silly goose. When they said open, they meant you could be open, not them.

A9's OpenSearch initiative would have a lot more credibility if Amazon and A9 started by providing unrestricted access to all of their search engines in the OpenSearch format.

[OpenSearch announcement via John Battelle, Werner Vogels, and Danny Sullivan]

Reuters CEO on personalized news

Reuters CEO Tom Glocer gave a speech on personalized news at the FT New Media and Broadcasting Conference. A transcript (.doc) is available. Some excerpts:
    If we can characterise the 19th century as the age of the newspaper, the 20th century the age of radio and television, this century will be defined as the age of media personalization. The news you want, when you want it.

    The concept is simple -- forget the old media that decided what was news and when and how you would consume it. Personalization is all about delivering news to the individual -- stories that are relevant to you, your life, your job, your family.

    Personalization of news will cut through the clutter ... In a world of information overload a premium will be placed on personalization.

    Serendipity is part of [it]. In the same way when you read a newspaper you stumble across stories that you otherwise weren't looking for. Customized content needs to be flexible enough to also provide the quirky and strange. It's the chance encounters that make life interesting.

    Provide what's most relevant to me. Give me the news on the topics that I am interested in and give it to me quickly and accurately ... Show me what other people like me are looking at.

    Personalization will be the dominant theme of media for the next hundred years. How we respond now will determine the future for all our companies.
Tom Glocer's speech follows a similar talk by Associated Press CEO Tom Curley. Both AP and Reuters see personalized news as the future.

[transcript via The Bitter Vat]

Rewriting pages and the Autolink backlash

When I commented on the Google Toolbar Autolink feature, I said the problem was not what was currently in Autolink, but the slippery slope of what could follow.

Google is now seeing part of that backlash in Butler, a tool written by Mark Pilgrim that "removes ads on most Google pages" and adds links to Google's competitors on those pages.

With Autolink, Google stepped into the thicket of rewriting web pages. Now that others are rewriting pages in ways they don't like, they're going to have a hard time complaining without sounding like hypocrites.

[via Tony Gentile and Danny Sullivan]

Monday, March 14, 2005

The Economist on personalization

The March 12 Economist has a long article called "United we find" on personalization.

Unfortunately, the article is subscription only, but here are some short excerpts:
    Collaborative filtering software is changing the way people choose music, books and other things, by helping them find things they like, but did not know about .... It helps people find things they might otherwise miss.

    Keyword-based search engines (such as Google) have a fundamental constraint: they can only help you find something if you already have an idea of what it is. Two people's idea of "good music" may differ substantially, but Google would return the same results to both of them. To find things you might like, but not already familiar with, requires a different technology.
The article explains how collaborative filtering and similar algorithms work in a fair level of detail. Worth reading if you can get your hands on a copy.

There is one glaring error in an otherwise good piece. The article claims that Badrul Sarwar "pioneered" item-item collaborative filtering in 2001, "around the same time" as Amazon "had similar ideas". Amazon invented item-item collaborative filtering and deployed it on their website in 1998, three years earlier.
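
For readers who haven't seen it, here is a minimal sketch of the general item-item idea: items are similar when the same people bought them, and you recommend items similar to what someone already has. This is an illustration only, not Amazon's or Sarwar's actual implementation; the function names and the choice of cosine similarity over sets of buyers are my assumptions.

```python
from collections import defaultdict
from math import sqrt

def item_similarities(purchases):
    """purchases maps each customer to the set of items they bought.
    Returns cosine similarity between every pair of co-purchased items."""
    # Invert the data: for each item, who bought it?
    buyers = defaultdict(set)
    for customer, items in purchases.items():
        for item in items:
            buyers[item].add(customer)
    sims = defaultdict(dict)
    for a in buyers:
        for b in buyers:
            if a == b:
                continue
            shared = len(buyers[a] & buyers[b])
            if shared:
                sims[a][b] = shared / sqrt(len(buyers[a]) * len(buyers[b]))
    return sims

def recommend(customer, purchases, sims, top=3):
    """Score items the customer doesn't own by similarity to items they do."""
    owned = purchases[customer]
    scores = defaultdict(float)
    for item in owned:
        for other, sim in sims.get(item, {}).items():
            if other not in owned:
                scores[other] += sim
    return sorted(scores, key=scores.get, reverse=True)[:top]
```

The similarity table depends only on the items, so in a sketch like this it could be computed ahead of time and the per-person step stays cheap.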

Update: The article is now available for free.

Update: Tom Standage (Technology Editor, Economist) contacted me about their error, admitted they were wrong, and said they will issue a correction.

Update: The correction.

Newspapers struggle with online

Katharine Seelye at the New York Times reports on the problems newspapers are facing with adapting to news online.

Lucrative print subscriptions are declining. Online sites are "booming", the "fastest-growing source of revenue" for newspapers, but still represent only 2-3% of overall revenues.

Katharine says a few times that online newspapers are free, even claiming online readers are getting a "free ride". This just isn't true. They're not free, they're advertising-supported. Just like with almost all print newspapers and magazines, readers mostly pay for the content by viewing advertising; subscription fees, when they exist, typically represent a small amount of the revenue supporting the newspaper.

The article also takes a defeatist tone toward generating revenue from online advertising. It's true that online advertising is currently a small percentage of overall newspaper revenues, but I don't think that the current amount of revenue is a good indicator of what could be achieved.

Newspapers have barely started to experiment with targeted advertising and content. No one knows what kind of clickthrough rates ultimately can be achieved.

A few years ago, no one would have believed Google could yield $0.09 per search. Google achieved this extraordinary revenue stream using targeted, relevant, useful advertising.

I think newspapers need to learn a lot more about how to deliver highly targeted, relevant advertising and related content before renouncing the online model.

Amazon's Statistically Improbable Phrases

Amazon seems to be doing an experiment with a feature they call "Statistically Improbable Phrases". From their help page:
    [Statistically Improbable Phrases, or] "SIPs", show you the interesting, distinctive, or unlikely phrases that occur in the text of books in Search Inside the Book. Our computers scan the text of all books in the Search Inside program. If they find a phrase that occurs a large number of times in a particular book relative to how many times it occurs across all Search Inside books, that phrase [is] a SIP in that book.
For example, for the business book "The Human Equation", the SIPs are "high performance work arrangements", "profits through people", "high performance management practices", "high commitment work practices", and "more cooperative labor relations".
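
The help page's description can be sketched in a few lines. This is only my guess at the shape of the computation, not Amazon's code: the function names, the two-word phrase length, and the rule that a phrase must appear at least twice in the book are all assumptions for illustration.

```python
from collections import Counter

def phrases(text, n=2):
    """Return all n-word phrases (lowercased) in the text."""
    words = text.lower().split()
    return [" ".join(words[i:i + n]) for i in range(len(words) - n + 1)]

def improbable_phrases(book_id, books, n=2, top=3):
    """Rank phrases in one book by how over-represented they are
    relative to the whole collection of books."""
    in_book = Counter(phrases(books[book_id], n))
    in_all = Counter()
    for text in books.values():
        in_all.update(phrases(text, n))
    book_total = sum(in_book.values())
    all_total = sum(in_all.values())
    scores = {}
    for phrase, count in in_book.items():
        if count < 2:  # assumed threshold: ignore one-off phrases
            continue
        book_rate = count / book_total
        corpus_rate = in_all[phrase] / all_total
        scores[phrase] = book_rate / corpus_rate
    return sorted(scores, key=scores.get, reverse=True)[:top]
```

A phrase that is frequent in one book but also common everywhere else gets a low ratio, which is why the SIPs tend to be the distinctive jargon of a book rather than its most common words.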

Cute idea. Data mining in action.

Unfortunately, I'm not sure it is all that useful. The idea seems to be to help people discover other interesting titles that contain the same phrases. In my experiments with it, there were many clicks involved, too many spurious results, and too much work. But, it is a clever way of trying to expose more of the features of Search Inside the Book.

The feature may be in weblab, so I'm not sure everyone can see it, but, for those who can see it, it appears at the very top of book detail pages under the title and author.

Update: Mike at TechDirt reports that Amazon apparently forgot to implement a filter for naughty words and gives some examples of the amusing consequences.

Friday, March 11, 2005

Who's behind the prototypes?

An old friend, Michael McDaniel, apparently is one of a small team of people behind MSN's mysterious prototypes.

One prototype is an RSS reader that looks a little like My Yahoo or CNet's Newsburst. The other appears to be more of a bookmark service with some inline content. Both are in early development and work better in IE.

It'll be fun to see where MSN goes with this.

Michael McDaniel was at Amazon for many years. I worked with him closely in Amazon's personalization group.

Save TiVo and the perfect machine

In an article called "Save TiVo!", Farhad Manjoo writes about TiVo's goal of building the "perfect machine":
    [TiVo is] embracing our self-indulgence, greed and laziness by working toward a device that I like to call the Perfect Machine: a cheap, small, quiet, stylish thing that sits in your living room and can display all of your entertainment, from TV shows to music to movies to photos; it also hooks into the Web and gives you access to all manner of audio and video available online.

    The new project -- which TiVo calls "Tahiti" -- essentially aims to create a souped-up super-TiVo, a box so inviting, so enthralling, you'll never leave the couch .... The Perfect Machine ameliorates laziness, refines sloth, embellishes indulgence.

    The Perfect Machine is flexible in the way that a computer is, but works as flawlessly as a DVD player. The Perfect Machine, the ultimate hybrid of a PC and a consumer electronics device, would be upgradable and minimally programmable, but it would never freeze up or slow down.
The article makes TiVo users sound like lazy slobs, but I think it's just exaggerating to make a point. People want things that make their lives simpler. It should be easy. It should just work.

And why shouldn't entertainment be this easy? I want a box that has my TV, movies, and music, all available, all the time. And I'm so lazy, I don't even want to try to find content. I want a box that helps me discover shows I didn't even know existed, that finds content for me. It should all just work. It should be that easy.

I hope TiVo does build the "perfect machine". At this point, it looks like they're either going to build it or die trying.

[via Jeff Nolan -> Om Malik]

Thursday, March 10, 2005

Customizing Google News

Google News soft-launched a feature last night that lets readers customize their Google News front page. Fun!

Much like My Yahoo, readers can rearrange news categories on the front page. For example, I moved "Business" and "Sci/Tech" to the top and removed "Entertainment". Readers can also add a custom section to the page that contains all news articles matching a keyword search. For example, I added a section with all news stories containing the word "Google".

This is customization, not personalization. It requires readers to explicitly customize the page; it doesn't learn implicitly from reader behavior. It requires effort. As easy as Google made it, the unfortunate truth is that the majority of Google News users won't bother with it. But it's fun for those of us who do like to tinker.

Unlike My Yahoo, Google News requires no registration or sign in to use. Readers can just go to Google News and customize immediately, a nice touch that reduces the effort required. Also unlike My Yahoo, you cannot add RSS feeds to your Google News page.

This is an intriguing move by Google. Until now, Google hasn't shown much interest in personalized portals like My Yahoo and My MSN. But this edges Google News closer to those products. Add a left column with stock quotes, weather, and other widgets, and you've got yourself a My Yahoo look-alike.

It's unclear where Google is going with this. Will we see a My Google that looks like My Yahoo? It's not like Google to launch something that's imitating other products in the marketplace. Google is about innovation, not imitation, and I expect we'll see them take their own path.

More coverage from Chris Sherman, Slashdot, and ZDNet.

Update: Nathan Weinberg posts a detailed review including this very interesting tidbit:
    One possibly undocumented feature is that your customizations influence the secondary stories in the top right-hand corner of the page.
It's true. The "top stories" under the "Edit this customized page" link do appear to be influenced by how I've customized the page. Perhaps an experiment with greater personalization of the page?

Nathan also has some kind words for Findory. Thanks, Nathan!

Wednesday, March 09, 2005

Amazon India

Werner Vogels (CTO, Amazon) talks about Amazon's new development center in India.

Monday, March 07, 2005

Google desktop search API

When Google updated their desktop search on Sunday, they added a developer SDK.

Very clever. Rather than laboriously adding code for thousands of bizarre and obscure file formats themselves, Google provided the tools for others to do it.

More details on the updated version and the API from Chris Sherman, John Battelle, Tara Calishain, and Nathan Weinberg.

Update: And the official announcement on the Google Blog.

Thursday, March 03, 2005

Restaurant reviews in Google Local

Gary Price posts about recent improvements to Google Local:
    Let's say you're searching for restaurants located in the Zip Code 60611 (Chicago). The first entry is for Eli's the Place for Steak on Chicago Avenue.

    When you click on the listing link, you'll now find a page with additional information (links to menus, hours of operation, payment info, etc.) automatically extracted from open web sources like [...] and Frommers. Below this info, you'll find reviews of the Eli's extracted from various web sites.
John Battelle adds:
    Unlike Yahoo, which allows for users to submit reviews at the point of search, Google crawls the web for reviews which are already extant, then rolls them into its results. Users cannot add their own reviews on the spot.

    This is an important distinction, and yet another declaration of how Google differs from its competitors.
I love this idea of spidering reviews from the web. By building specialized indexes of product, movie, restaurant, and other reviews, Google is essentially building a reviews search engine.

Objective, trustworthy, high quality information about products and services would be of enormous value to consumers.

See also the official announcement by Thai Tran and Bret Taylor on the Google Blog.

Amazon Zuggest

Francis Shanahan created Amazon Zuggest, a little tool that combines a Google Suggest UI with Amazon web services to search for books while you type. Fun!

Another great example of how Amazon is stimulating external innovation by offering web services.

Wednesday, March 02, 2005

Personalizing search using your desktop files

Todd Bishop reports on Microsoft's TechFest, including a project on personalized search:
    Projects on display during a Microsoft Research event yesterday included a method for personalizing Web search results ... The prototype developed by the Microsoft researchers comes up with those personal preferences automatically by consulting the index generated by MSN Desktop Search.

    "Other people have tried to do this by requiring you to specify a profile -- so you say, 'I'm interested in technology or sports,' or whatever the case may be," said Susan Dumais, a Microsoft senior researcher working on the project. "The nice thing about using the desktop search index is that it captures all of that ... and it's updated continuously."

    Microsoft senior researcher Eric Horvitz ... called the personalized search technology his top priority for transfer from Microsoft's research division to the product development side of the company.
Clever. The idea is to use the files on your PC to build a profile implicitly and then use that to modify your search results.

It sounds like this is a coarse-grained approach, building something like a general subject and keyword profile that then skews all future searches. Coarse-grained approaches are easier to implement, but they can make inappropriate changes -- How do you use my interest in Cooking to bias a search for "personalization"? -- and don't do a great job at discovery, surfacing really interesting little gems I would have otherwise missed.
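
To make the coarse-grained idea concrete, here is a rough sketch of a keyword profile built from a user's files and used to skew search results. It is a guess at the general shape only, not Microsoft's prototype; the function names, the `weight` parameter, and the scoring rule are all assumptions for illustration.

```python
from collections import Counter

def build_profile(documents):
    """Count how often each word appears across the user's documents."""
    profile = Counter()
    for text in documents:
        profile.update(text.lower().split())
    return profile

def rerank(results, profile, weight=0.5):
    """results is a list of (title, base_score) pairs. Each result's score is
    boosted by the fraction of its title words that appear in the profile."""
    def personalized(result):
        title, base = result
        words = title.lower().split()
        if not words:
            return base
        overlap = sum(1 for w in words if w in profile) / len(words)
        return base + weight * overlap
    return sorted(results, key=personalized, reverse=True)
```

Note that the same profile is applied to every query. A profile full of cooking terms would nudge a search for "Java" toward coffee, which is exactly the kind of inappropriate change a coarse-grained approach can make.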

Susan's dig at "other people" is probably referring to Google's personalized search, which is also coarse-grained but does require you to explicitly specify your interests.

Building a profile implicitly is nice, but it's not clear to me that my desktop files are a good predictor for personalized web search. Are the files on your desktop correlated with what web search results you find most interesting? I'm not sure they are. A better predictor might be previous searches or the web pages you've viewed, not whatever data you have stored in Word and Excel files.

But it's a clever idea. Excellent to see Microsoft Research pushing personalization.

Update: According to a recent SIGIR 2005 paper that appears to be about the same work, it sounds like this personalized search prototype is using a keyword-based approach, not a subject-based approach.

Tuesday, March 01, 2005

Yahoo Movies recommendations

Scott Gatz (Director, My Yahoo) just told me about Yahoo's public beta test of movie recommendations.

Try it out. You rate a few movies, then it recommends movies that might interest you.

Quality of the recommendations seems just okay to me, but this is just a beta, and perhaps things will improve over time. Great to see Yahoo being more aggressive about personalization and recommendations.

It appears that the recommendation engine is provided by ChoiceStream. ChoiceStream uses a content-based algorithm (matching metadata about the movies) that they call "Attributized Bayesian Choice Modeling".

The nice thing about these kinds of approaches is that they work well without any ratings data, solving the cold start problem (new movies or when the system doesn't have a lot of users). But, in every implementation I've seen, the quality of the recommendations seems like an issue.
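
As a small illustration of why content-based matching sidesteps the cold start problem, here is a sketch that recommends movies purely from metadata overlap. This is not ChoiceStream's algorithm, which I have not seen; the function names, the attribute sets, and the use of Jaccard overlap are assumptions for the example.

```python
def jaccard(a, b):
    """Overlap between two attribute sets, from 0 (disjoint) to 1 (identical)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def recommend_by_metadata(liked, catalog, top=2):
    """catalog maps each movie to a set of metadata attributes.
    Score each movie not in `liked` by its best overlap with any liked movie."""
    scores = {}
    for movie, attrs in catalog.items():
        if movie in liked:
            continue
        scores[movie] = max(
            (jaccard(attrs, catalog[m]) for m in liked), default=0.0
        )
    return sorted(scores, key=scores.get, reverse=True)[:top]
```

A brand new movie with no ratings at all can still be recommended here, as long as someone has filled in its attributes. The flip side is that the recommendations are only as good as the metadata.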

You can also see ChoiceStream's engine running MyBestBets, a recommendation site for TV shows and movies.

Update: A couple months later, this feature has launched.

Yahoo web services

Only five months later, Yahoo launches a web services API.

It looks pretty nice, a good set of features for researchers and tinkerers. It includes image and news search, unlike the Google API.

Jeremy posted links to some good press and weblog coverage.

The key challenge is personalization

Paul La Monica at CNN reports on Yahoo co-founder Jerry Yang's speech at the Search Engine Strategies conference. An excerpt on personalization:
    [Yang] also said that the key challenge for Yahoo! and all search companies going forward will be to find ways to increase the personalization of results, i.e. making sure that a user truly finds what he or she is looking for when typing in a keyword search.

    "The relevance of search is still the Holy Grail for any search application," Yang said.
With only one generalized relevance rank, further improvements to search quality become increasingly difficult because people disagree on how relevant a particular page is to a particular search.

At some point, to get further improvements, relevance rank will have to be customized to each person's definition of relevance. When that happens, you have personalized search.

The Ungoogle

Michael Malone at Wired calls Yahoo the "Ungoogle":
    While Google was busy becoming what Yahoo! used to be, Yahoo! has become what AOL should have been.

    Google is on its way to Redmond to battle Microsoft later this decade, while Yahoo! is going Hollywood.
I think this is overstated. While Google is more focused on technology, both companies are innovating, both are competing against Microsoft, and both continue to develop media products.

In the same article, Yahoo CEO Terry Semel talks about Yahoo's focus on increased customization and personalization:
    Customizing the site down to the neighborhood level will make it more appealing to users and indispensable to advertisers. "If you are looking for a plumber or a pizza parlor, you don't want one 3,000 miles away," Semel says. "You want your search to be customized just for you."
Yahoo does have an advantage here. Many Yahoo users have already entered their zip code at some point or another into Yahoo's systems. Although implicit data like geolocation based on your IP address can provide a guess at your location, that might only be good enough for targeting local advertising, not local search.