Wednesday, December 29, 2004

BitTorrent, Internet TV, and personalization

Clive Thompson at Wired has an interesting article on BitTorrent, the filesharing software that has tens of millions of users and generates about a third of all internet traffic.

The entire article is worth reading, but I wanted to highlight this excerpt on using BitTorrent for watching TV:
    BitTorrent is something deeper and more subtle. It's a technology that is changing the landscape of broadcast media.

    "All hell's about to break loose," says Brad Burnham, a venture capitalist with Union Square Ventures ... BitTorrent does not require the wires or airwaves that the cable and network giants have spent billions constructing and buying ... BitTorrent transforms the Internet into the world's largest TiVo.

    If enough people start getting their TV online, it will drastically change the nature of the medium ... The whole concept of must-see TV changes from being something you stop and watch every Thursday to something you gotta check out right now, dude. Just click here.

    What exactly would a next-generation broadcaster look like? The VCs at Union Square Ventures ... suspect the network of the future will resemble Yahoo! or - an aggregator that finds shows, distributes them in P2P video torrents, and sells ads or subscriptions to its portal. The real value of the so-called BitTorrent broadcaster would be in highlighting the good stuff, much as the collaborative filtering of Amazon and TiVo helps people pick good material.
In a flood of information, we need focus. With tens of thousands of TV shows, we need personalization to filter, to help us find what we need.

See also my earlier post, "Will somebody please fix TiVo Suggestions?"

Tuesday, December 28, 2004

The problem for newspapers

Michael Bazeley posts that Craigslist has cost SF Bay Area newspapers $50-60M in classified advertising. Bob Cauthorn (former VP at the SF Chronicle) is quoted as saying:
    The problem for newspapers isn't Craigslist. The problem for newspapers is the newspapers themselves. Specifically, that class of slow-blink-rate executive who refuses to see today through the lens of today....They recite from business self-help manuals and reduce the hard work of innovation and creativity to comic book parables. Meanwhile, they lose market share, circulation and audience. Ultimately these people will cost an industry its future.
Harsh words. I'd say that Cauthorn is being unreasonable, but moves by newspapers such as mandatory registration seem to support his fears.

Newspapers used to have localized monopolies on distribution. Reading the local newspaper was the only way to see local news and local classifieds.

Increasingly, newspapers have to live in a world of decentralized distribution. Advertisements that used to run in a local paper may now run on Craigslist, Yahoo Local, Monster, or eBay. More visitors will come to read local news not through the front page of a newspaper's website, but via RSS feeds or aggregators like Google News.

Newspapers know local better than anyone. They know the local advertisers. They know the local news. They are the kings of local content.

Tom Curley (CEO of AP) said it best: "The franchise is not the newspaper; it's not the broadcast; it's not even the Web site. The franchise is the content itself."

Newspapers should take advantage of decentralized distribution. Before, advertising and classifieds would run in a print newspaper to a small subscriber base. Now, newspapers could distribute local advertisements out across many channels, with the newspaper managing the key relationship with the local advertisers. Before, reporters for the paper often found their articles condemned to the back pages, read by only a few thousand readers. Now, a vast audience of readers can discover their work through RSS and news aggregators, pulling readers to the newspaper's website through the strength of their content.

Grasping for the fading monopoly on local distribution will only cause it to slip away faster. Focus on the content. Embrace change.

Friday, December 24, 2004

In 2005, search becomes personal

Alfred Hermida at the BBC writes about personalized web search:
    For all these advances, search is still a clumsy tool, often failing to come up with exactly what you had in mind.

    In order to do a better job, search engines are trying to get to know you better, doing a better job of remembering, cataloguing and managing all the information you come across.

    "Personalisation is going to be a big area for the future," said [Marketing Director at Yahoo] Yonca Brunini.

    "Whoever cracks that and gives you the information you want is going to be the winner. We have to understand you to give you better results that are tailored to you."

    This is perhaps the Holy Grail of search, understanding what it is you are looking for and providing it quickly.
[via Findory]

Thursday, December 23, 2004

What has become of Orkut?

Nathan Weinberg said:
    There's no shortage of people at Google who are disappointed with the way Orkut is not catching on. Google really wanted to build a powerful community, and it isn't going to happen through Orkut.
When I read Nathan's post, I realized it had been months since I used Orkut. Like many people, I played with it a bit when it first came out, set up my little network, got in contact with a few old friends and colleagues.

It was a fun toy. But the fun died quickly. The discussion forums were useless, all noise, no signal. The messaging system was full of spam, people foolishly broadcasting inane messages out to all friends of friends. And Orkut became so slow as to be unusable (something that, I can only assume, is quite embarrassing to the rest of Google).

My visits, initially a couple times a day, dropped to once a week, then dropped off entirely.

The toy wasn't fun anymore, so I stopped playing with it. Had it been more than a toy -- if it were a useful tool that helped me with my life -- I would have stuck with it, but there was no real value to Orkut.

Checking it out again now, it seems that everyone else abandoned Orkut too. The only ones left seem to be Brazilian teenagers. Oh, Orkut. What has become of you?

Webfeeds and ease of use

Rich Gordon says using webfeeds is still too difficult for most:
    Talking to novices about webfeeds is like trying to explain the World Wide Web in 1995 to someone who'd never used a browser. But as soon as browser software became easily accessible and there was good content to view through it, the significance of the Web became clear to most everyone.

    Because the Web (and XML) already existed when RSS was invented, it was relatively easy to generate webfeeds with interesting content. But we're still waiting for the equivalent of the first Netscape browser -- the software that makes ordinary consumers ... go, "Aha. I get it."
It's not at all clear to me that ordinary users want to know what a webfeed is. They just want news. They want their news to be quick to access, easy to read, and relevant to their lives.

Focusing on webfeeds confuses the tool with the goal. Webfeeds are a means to an end, not the end itself.

Smart aggregation

Mike Davidson posts on the need for smart aggregators:
    Information overload. It’s the next big issue in publishing, and technology in general.

    With the internet still growing and changing at such a rapid rate, the raw amount of information your brain processes will see a huge increase ... The flow of information into our lives is only going up and our free time is only going down ...

    The key to our information gathering lives is all about smart aggregation. The days of media companies deciding what’s on your "front page" are numbered. Within five years, I believe customizable newsreader technology ... will be as prevalent as the web is right now.
[via The Shifted Librarian]

Wednesday, December 22, 2004

Battelle's 2005 predictions

It's that time of year again. John Battelle has his predictions for the search war for 2005. It's a great list.

Personalized news and search isn't mentioned explicitly, but is implied in the long tail (#4) and in redefining what's possible in search (#10).

The further entry of Yahoo and Google into e-commerce (#7) seems to me like a bigger threat to Amazon and eBay than John says. If Google's AdWords intrudes on classified advertising (see "Google, small business, and eBay") and Froogle becomes the place to find and buy anything online (see "Froogle adds product reviews"), eBay and Amazon will be hurt.

See also John's predictions from 2004 and how they turned out.

Innovation and the GYM triumvirate

Adam Rifkin has some great ramblings on the search wars in 2004-2005.
    In 2004, Google is the leader of GYM -- the triumvirate of Google/Yahoo/Microsoft, which in turn leads a dozen other related companies in the web-related innovations that improve peoples' lives.

    Google lays down one gauntlet after another -- a better email experience and a Gig of storage, and a better desktop experience in searching my stuff, to name two examples from 2004 alone -- and Yahoo and Microsoft follow the leader by improving their email experiences and announcing their desktop search tools. Often then others follow the troika -- even if, as in X1's and Lycos's cases they actually had desktop search before Google did, once Google plants a flag it's like a shot hearing round the world, and everyone seems like a follower.

    Together GYM and their followers offer a suite of tools that give me hope that I can manage my personal Web -- and accelerate my ability to search and research simply, to discover and find again easily, to filter and incorporate suggestions collaboratively. As the web grows, so does each of our personal Webs, and tools become not just important but critical to productivity.
And, dipping my knife into that peanut butter, a "Googlecalifragilisticexpialidocious" and "Yahoocalifragilisticexpialidocious" 2005 to you too, Adam.

News moves to the Web

"The only news source showing an increase in daily use since Gallup's 2002 poll on media usage is the Internet."

Update: More details from an article at Editor & Publisher. [via JD Lasica]

Tuesday, December 21, 2004

Yahoo and Google, vive la difference

Michael Liedtke has an AP article on the differences between Yahoo and Google:
    Google ... is devoted to ... transforming the way the world finds and stores information, even if that means sending people somewhere else.

    Yahoo ... strives to be all things for all people — a one-stop destination for recreation, work and research.

    Google ... takes a ... laissez faire approach toward innovation, embracing new ideas and products long before the company's management figures out how everything fits into the overall business plan.

    Yahoo takes a more practical approach to technology, first identifying what people want and then building or buying a product designed to give visitors one less reason to leave its Web site.
[via Andy Beal and Gary Price]

Monday, December 20, 2004

Killing comment spam

Jeremy Zawodny argues that search engines should stop using links in weblog comments for PageRank in order to reduce the incentive for comment spam.

As with e-mail spam, the basic problem is that, at least for some, the benefits of posting spam exceed the costs. So, how do you attack the problem? Increase the costs or reduce the benefits.

Not counting links in weblog comments for PageRank reduces the benefits. People won't be able to use weblog comments to inflate their PageRank.

But this alone is not sufficient. There's value to a spammer in just having a link or even just a product name mentioned in a public forum. Since the costs are so low -- just like with e-mail spam -- a spammer only needs a tiny fraction of spammed people to respond to make their campaign of annoyance worthwhile.

Increasing the costs will have to be part of the solution. Spammers rely on being able to hit tens of thousands of weblogs automatically, so anything that makes this automation more difficult increases costs.

And there are many strategies out there to make weblog spam more difficult. Blacklists ban specific IP addresses from posting comments. Some weblogs require an account or a verified e-mail address before posting. Requiring the poster to enter a code from a distorted image (one that is difficult for a robot to read) is another technique. Even asking a simple question (e.g. "What's the third word in this sentence?") before posting can be enough of a hassle to block spammers if everyone asks a different question.
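To make that last idea concrete, here is a minimal Python sketch of a question check a weblog could run before accepting a comment. The function names (make_challenge, check_answer) and the question wording are assumptions made up for illustration, not part of any existing weblog package.

```python
import random

def make_challenge(sentence, rng=random):
    """Pick a random word position and return (question, expected_answer)."""
    words = sentence.split()
    index = rng.randrange(len(words))
    question = f"What is word number {index + 1} in this sentence: '{sentence}'?"
    # Strip trailing punctuation so "sentence." and "sentence" both match.
    expected = words[index].strip(".,!?").lower()
    return question, expected

def check_answer(expected, submitted):
    """Accept the comment only if the submitted answer matches the expected word."""
    return submitted.strip().strip(".,!?").lower() == expected
```

Because each weblog can pick its own sentence and the word position is random, a robot cannot reuse one canned answer across thousands of sites, which is the cost increase the argument above is after.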

But this will be an ongoing problem. The full costs of spam are not borne by the spammers. As long as someone, somewhere finds comment spam rewarding, the problem will exist.

[via John Battelle and Joseph Scott]

Paper on cracking Google Desktop Search

Seth Nielson, Seth Fogarty, and Dan Wallach released a paper, "Attacks on Local Search Tools" (PDF), that discusses in detail the widely reported security flaw in Google Desktop Search.

The paper is worth reading. Most interesting are the details on the implementation of Google Desktop Search. They found:
  1. Google Desktop must be observing all outgoing network connections.
  2. Google Desktop performs packet analysis to identify HTTP proxy connections in addition to looking for direct connections to Google.
  3. The search requests did not need to originate from a web browser visiting
  4. Integration is triggered by observing outgoing packets, and occurs after packets are received, but before they are given to the web browser or application.
This is pretty cool. Google Desktop Search integrates local results into a Google search by watching for the outgoing request to Google and rewriting the response before it gets to the web browser.
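The flow described in the paper's findings can be sketched roughly as follows. This is only a toy illustration in Python under my own assumptions: the real product works at the packet level, and the function names, the "<body>" marker, and the HTML it emits are all hypothetical rather than anything Google Desktop Search actually does.

```python
import html

def is_google_search(host, path):
    """Decide whether an outgoing request looks like a Google web search."""
    on_google = host == "google.com" or host.endswith(".google.com")
    return on_google and path.startswith("/search")

def integrate_local_results(response_html, local_snippets, marker="<body>"):
    """Insert local result snippets right after `marker` in the response.

    Returns the response unchanged if there is nothing to add or the
    marker is missing.
    """
    if not local_snippets or marker not in response_html:
        return response_html
    # Escape snippets so file contents cannot inject markup into the page.
    items = "".join(f"<li>{html.escape(s)}</li>" for s in local_snippets)
    block = f"<ul class='local-results'>{items}</ul>"
    return response_html.replace(marker, marker + block, 1)
```

Note that nothing in this sketch checks which program made the request, which is the kind of gap the paper goes on to exploit.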

At this point, Nielson et al. had already found the chink in the armor, that the request doesn't have to be from a web browser directly. They tried a few tricks to get Google Desktop Search to show local data inappropriately. And were successful.
    We found that the Google Desktop personal search engine contained serious security flaws that would allow a third party to read the search result summaries that are embedded in normal Google web searches by the local search engine. While an attacker would not be able to read the victim’s files directly, the search results often contain snippets of the file results that will be visible to the attacker.
Doh. No need to panic though. Google has already patched the problem and automatically updated everyone.

Google Desktop Search's integration of the local search results into a Google web search was really clever. Ever since I saw it, I've been curious about the details of how it was implemented. This paper was an enjoyable read.

[via eWeek and InsideGoogle]

Update: Nikhil Bhatla (PM, Google Desktop Search) posts about the security patch on the official Google weblog.

Saturday, December 18, 2004

Unfortunate AdWords

El Cogote discovered Amazon, eBay, and others running silly ads on Google for many terms.

For example, a search for "misfortune" brought up an eBay affiliate ad that said, "Find it on eBay! Misfortune and much more."

It's easy to play this game with other terms. I just did a search for "fraudulent" and got an eBay affiliate ad saying:
    Low prices and huge selection!

[via Xeni Jardin]

Friday, December 17, 2004

Business model for Bloglines

Eric Peterson (Jupiter Research) posts about Bloglines' plan for advertising:
    "AdWords on Steroids" ... Any article or feed I'm interested in [has] content that can be mined and transformed into relevant pay-per-click advertising.

    While Google and Overture sell advertising based on a limited number of keywords, the content in feeds is rich with information that can be mined to laser-target the advertising.

    [Bloglines CEO Mark Fletcher] commented that the aggregate of subscriptions could also be mined to provide additional inventory, e.g., if I subscribe to Engadget and Gizmodo there is A) a strong chance I am a personal technology person and B) I am probably subscribed to other blogs that are gadget-relevant.

    Mark's idea makes sense and is a better idea than injecting advertisements into my feeds.
There is a lot of rich data here. There is an opportunity to do well-targeted, relevant, useful, and unobtrusive advertising. We at Findory are planning something similar for our advertising engine.
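As a rough illustration of mining the aggregate of subscriptions, here is a small Python sketch that suggests feeds often read alongside the ones I already subscribe to. It is a toy co-occurrence count under my own assumptions, not the method Bloglines or Findory uses; the function name and the data layout are hypothetical.

```python
from collections import Counter

def related_feeds(subscriptions, my_feeds, limit=3):
    """Suggest feeds often subscribed to alongside `my_feeds`.

    `subscriptions` maps a user name to the set of feeds that user reads.
    """
    mine = set(my_feeds)
    counts = Counter()
    for feeds in subscriptions.values():
        overlap = len(mine & feeds)
        if overlap == 0:
            continue  # This user shares no feeds with me, so ignore them.
        for feed in feeds - mine:
            # Weight each candidate by how many feeds we have in common.
            counts[feed] += overlap
    return [feed for feed, _ in counts.most_common(limit)]
```

The same counts that suggest related feeds could, in principle, be used to guess which advertising categories fit a reader who subscribes to Engadget and Gizmodo.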

Thursday, December 16, 2004

Froogle adds product reviews

Stefanie Olsen at CNet reports that Google has added product reviews to Froogle. Google is aggregating the product reviews from other sites.

It's a good move by Google. A couple weeks ago, I said:
    Currently, Froogle is a price comparison engine, helping users find a specific product at a low price. With product reviews, Froogle gives users the information they need to differentiate between products, helping them find the right product at the right price. It's a much more useful service.
Froogle is suddenly moving quickly. It was only a few weeks ago that they added merchant reviews and wish lists.

[via Search Engine Watch Blog]

MIT Tech Review on Amazon Web Services

Wade Roush raves about Amazon's web services in MIT Technology Review:
    While companies such as Google and Microsoft are also experimenting with the idea of letting outsiders tap into their databases and use their content in unpredictable ways, none is proceeding more aggressively than Amazon.

    The company has, in essence, outsourced much of its R&D, and a growing portion of its actual sales, to an army of thousands of software developers ... The result: a syndicate of mini-Amazons operating at very little cost to Amazon itself and capturing customers who might otherwise have gone elsewhere.

    It's as if Starbucks were to recruit 50,000 of its most loyal caffeine addicts to strap urns of coffee to their backs each morning and, for a small commission, spend the day dispensing the elixir to their officemates.

    The strategy behind Amazon Web Services is to give programmers virtually unlimited access to the very foundation of Amazon's business -- its product database -- whether they are inside or outside the company's walls.
Web services engage creative and talented outside software developers. It's distributed research and development, reaching beyond the walls of the firm to seek innovation.

See also my earlier posts ([1] [2] [3] [4]) on Amazon web services.


Fine-grained, implicit, and anonymous

Laurianne McLaughlin comments on personalized web search in a light article in IEEE Distributed Systems:
    Enhancing personalized results is a large near-term goal for Web search.

    "Our challenge is to read a user's mind," says Daniel Read, vice president of product management for Ask Jeeves. It's an intriguing challenge, given that most Web searches today still contain just two to three words.
The example given of personalized search -- learning of a general interest in cooking and biasing all search results toward cooking -- is coarse-grained and doesn't capture the potential of personalization. Biasing all my searches toward a general subject interest isn't likely to work very well. How does my interest in cooking help when I'm searching for a camera? Fine-grained personalization focuses on your mission -- what you are doing right now -- and how to help you find what you want faster.

There's a brief mention of implicit vs. explicit personalization in the article. While it's true that implicit personalization is hard, working from sparse and noisy data, the article missed the major issue with explicit personalization: Most people won't do it. It takes work. People don't want more work. The entire point of personalization is to make things easier.

There's also a brief mention of privacy, something that can be handled by making users anonymous.

Personalized web, news, and blog search on Findory is fine-grained, implicit, and anonymous. We keep our eye on the goal, helping searchers find what they want quickly and easily.

[via Gary Price]

Blinkx TV

Blinkx announces their new search for TV clips called Blinkx TV. More information on their About page.

A search for "Jon Stewart" turns up a number of mentions of the Daily Show host on several news programs (but no clips of the Daily Show). Fun and probably useful for some.

Blinkx's core technology is implicit search, finding information you need automatically without you having to ask for it. I'd assume the idea behind their TV search is to automatically surface TV clips if they're relevant to your current task.

For example, if you're reading a web page on RSS, perhaps they would surface some relevant video clips of news programs talking about RSS. It's a very hard problem, but it'd be pretty cool if they can do it right.

Google supposedly is also working on TV search, but it's still vaporware.

[via Search Engine Journal]

Update: Gary Price says Blinkx has always had video search and that the only thing new here is the standalone web interface. I didn't realize that.

Gary also describes the search, which does search the transcript of the TV broadcast, and points out some other small tools that do something similar.

Wednesday, December 15, 2004

Google getting off task?

After hearing about Google's library project, David Coursey at eWeek says that Google is losing its focus.
    My Google searches today are significantly less useful than the searches I made just a year ago. This is partially a reflection of the ever-increasing size of Google's collection, but it also shows how information providers have learned to spoof Google's robotic system.

    I'd rather see Google concentrate on getting search right than trumpet how much is being added to its sea of information.

    The company's first task should be throwing us a line, not building a bigger ocean.
Help me find focus in a flood of information. Help me find order in chaos. Help me find what I need.

That is the opportunity and the challenge.

[via Search Engine Watch Blog]

The threat to Microsoft

Joe Wilcox (Jupiter Research) explains why Microsoft considers Google a threat:
    The real threat remains the Web and how a vendor like Google has found a new way to exploit the Internet's utility beyond Windows.

    Search is one of several mechanisms (fast data connectivity is another) that could catalyst alternative platforms. Search would give tremendous utility to portable devices connected to the Internet or home or corporate networks. With so much computing focus on information and so much information stored somewhere else (meaning not locally), ubiquitous search could unify the utility of many disparate types of devices.

    So like Microsoft integrated the browser into Windows to fight off the threat posed by the Web, so the company is looking to tie the utility of search to its operating system. Because any technology utility where no Windows is required threatens Microsoft's core franchise.
The threat is much larger than just Google. It's about the future of Windows as the dominant computing platform.

Microsoft has been fighting this battle for many years. They worried about the rising power of handheld devices like Palm Pilots and cell phones, so they launched Windows CE. They worried about the additional functionality being built into game consoles, so they launched XBox. They worried about the rise of entertainment devices like TiVo and Replay, so they launched Windows Media Center. They worried about the threat from web-based applications, so they launched IE and MSN.

The latest shining star is Google. It's popular to talk about the search war as involving just two players, Microsoft and Google. In fact, the search war involves many players: Google, Yahoo, AOL, Microsoft, Amazon, and many smaller firms. And, the search war is only one front in the broader war Microsoft must fight to continue its dominance.

Tuesday, December 14, 2004

Google's war with Microsoft

Charles Ferguson publishes a long article in MIT Tech Review on Google's "war with Microsoft":
    Google's defeat is not a foregone conclusion. Indeed, if it does everything right, it could become an enormously powerful and profitable company, representing the most serious challenge Microsoft has faced since the Apple Macintosh. But if Microsoft gets serious about search -- and there is every reason to believe that it will -- Google will need brilliant strategy and flawless execution simply to survive.

    What should Google do? Google should understand that it faces an architecture war and act accordingly. Its most urgent task must be to turn its website into a major platform, as [Amazon has] already done.

    Google should first create APIs for Web search services and make sure they become the industry standard. Second, it should spread those standards and APIs, through some combination of technology licensing, alliances, and software products, over all of the major server software platforms, in order to cover the dark Web and the enterprise market.
The Microsoft giant is awake, says Charles, and it's hungering for a Google snack.

The impressive Google cluster is one part of Google's competitive advantage. I'm curious to see if Google does start offering better web services APIs. I'd certainly love to get my hands on that juicy cluster.

But will Google lose the search war if it doesn't offer better web service APIs? I doubt it.

Google has an impressive track record of innovation on its own. Amazon has web services APIs because it is seeking outside developers to boost innovation. Yahoo is considering them for similar reasons. But it's not clear Google has a problem with innovation. Google's biggest problem seems to be getting all the innovations available internally out the door and available to the public.

Furthermore, Google's lifeblood is advertising. Google is in the middle of building an advertising revolution. I think it is the AdSense revolution that will empower small websites and businesses, wrapping them around Google, not a software API into Google's infrastructure.

That being said, I do expect Google to launch services that allow users to further exploit the power of the Google cluster. But I expect these to be finished services like GMail that target end users, not web services targeting developers.

[via Gary Price]

Monday, December 13, 2004

Google Library

John Battelle reports that Google is digitizing the collections of major libraries.
    Google is working with Stanford, the University of Michigan, Harvard, Oxford, and the New York Public Library to make millions of books available in its index.

    The idea that the world's knowledge, as held through books and libraries, is opening up to all via a web browser cannot be understated. It's one thing to have the an original copy of The Origin of Species on the shelves, where students and interested parties have to travel to find it. It's another to have it available to everyone via a search index and your web browser.

    This could well be a step toward diversifying Google's revenue streams away from advertising and into ... the content business ... Google is not doing this only out of the kindness of its heart - there is a lot of money to be made in selling books, in particular books with no copyright.
Are you paying attention to this, Amazon?

Update: Felicia Lee from the New York Times writes about Google Library and the reaction to it from scholars.

Update: Tara Calishain posts that the Internet Archive is expanding its text archive with support from ten international universities.

Ask Jeeves desktop search

Andy Beal says Ask Jeeves will be launching their desktop search on Wednesday and gives some details.

Goodie. Now everyone's got one. Google, MSN, Ask, Yahoo, and (soon) AOL.

I'm not sure why these companies are launching poorly differentiated products into a crowded space. Google has some nifty integration with their web search that I kind of like. Other than that, I can't tell the difference between these offerings.

Even if one were better, it's not clear that'd be enough. The only reason this market opportunity exists is that the default search in WinXP and MS Office is poor. As soon as MSN integrates their desktop search into Windows, this game is over.

Usama Fayyad joins Yahoo

Usama Fayyad led the data mining group at Microsoft Research a few years ago. He just joined Yahoo.

He and his group were relatively early in applying statistical analysis techniques to massive data sets. I remember particularly liking his "Scaling Clustering Algorithms to Large Databases" paper.

Yahoo certainly has plenty of juicy, tasty data. Mmm... data.

[via Yahoo Search Blog]

MSN desktop search

Microsoft joins the desktop search party.

MSN seems to be using Lookout (a company they acquired) for much of their desktop search. Yahoo is licensing X1's desktop search. AOL will be using Copernic. Only Google decided to build their own.

MSN's entry is a little amusing since, aside from searching your browsing history, it seems like most of what they're doing is just fixing the miserable file search functionality built into WinXP and MS Office products.

I do find the hype behind desktop search mystifying. You're searching a few thousand files and e-mails on a desktop box. The major problem is grokking thousands of different file formats, painful, sure, but not exciting. With web search, it's the scale -- billions of documents -- that makes the problem interesting.

That being said, there is some interesting innovation going on in desktop search. Dashboard and Blinkx are trying to do personalized information retrieval. They have taken early first steps toward making your computer figure out what you are doing and what information might be helpful for that task. Very cool.

Saturday, December 11, 2004

Turning noise to knowledge

A few days ago, Findory started personalizing the recent articles on our source pages.

For example, when I go to Wired magazine on Findory, because of my reading history, two articles are marked as personalized, "Troops stay in touch on the internet" and "Yahoo searches desktops too". When I go to Scobleizer on Findory, four articles are highlighted for me.

The problem with current web feed readers is that they don't solve the information overload problem. Sure, I can pick and choose which RSS feeds I subscribe to. But, once you have tens of subscribed feeds, reading them becomes this cumbersome process. Click on a feed, skim the articles. Anything interesting in that one? No. Click, skim. Click, skim. Click, skim. Ugh.

With Findory, the important news bubbles to the top. On the home page, interesting articles are selected just for you, pulled from thousands of news sources and blogs. On a search, relevant articles are highlighted. When you read a blog on Findory, important posts are highlighted.

Current RSS readers merely reformat XML for display. That isn't enough. They need to filter and prioritize. Show me what matters. Help me find what I need. Next-generation RSS readers will be personalized.
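To show what "filter and prioritize" could mean at its very simplest, here is a toy Python sketch that reorders feed items by how many words they share with articles the reader has already clicked on. This is not Findory's algorithm; the word-overlap scoring and the function names are assumptions chosen only to keep the example short.

```python
from collections import Counter

def build_profile(read_titles):
    """Count words across the titles of articles the reader has clicked."""
    profile = Counter()
    for title in read_titles:
        profile.update(title.lower().split())
    return profile

def prioritize(articles, profile):
    """Sort article titles so those sharing more words with the profile come first."""
    def score(title):
        # Each distinct word counts once, weighted by how often it was read before.
        return sum(profile[w] for w in set(title.lower().split()))
    return sorted(articles, key=score, reverse=True)
```

A reader who clicked on "Yahoo desktop search" and "Google desktop search launches" would see "MSN desktop search arrives" ranked above "Local weather report". A real system would need much more than title words, but the shape is the same: learn from what was read, then reorder what is shown.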

This is about more than just reading news. This is about information. Where before there was an undifferentiated glut of information, now there is focus. Where before there was noise, now there is knowledge.

What will this future look like? Findory has taken the first steps. Come and take a look.

Friday, December 10, 2004

Yahoo, Google, and those pesky humans

In a long post, John Battelle describes the difference between Yahoo and Google:
    Yahoo is a natural media company - the company is willing to have overt editorial and commercial agendas, and to let humans intervene in search results so as to create media which supports those agendas. Google, on the other hand, is repelled by the idea of becoming a content- or editorially-driven company.

    While both companies ... lay claim to the mission of "organizing the world's information and making it accessible" ... they approach the task with vastly different stances.

    Google sees the problem as one that can be solved mainly through technology - clever algorithms and sheer computational horsepower will prevail. Humans enter the search picture only when algorithms fail.

    But Yahoo has always viewed the problem as one where human beings, with all their biases and brilliance, are integral to the solution ... Humans first, technology second.
See also my earlier post, "Humans vs. Robots == Yahoo vs. Google".

Jeff Barr and Amazon web services

Earlier today, I had a chance to chat with Jeff Barr. Jeff evangelizes web services for Amazon and runs the web feed directory Syndic8.

It's remarkable what Amazon exposes through their web services. I'm particularly surprised that they provide access to all of Amazon's customer reviews.

All of this has resulted in some clever applications. All Consuming, Amazon Lite, and Delicious Library are my favorites.

See also my earlier post, "May the best services win".

Yahoo desktop search coming soon

John Battelle was quick to announce Yahoo's new desktop search product. They decided to buy rather than build. Yahoo licensed X1's search application and are rebranding it as their own.

Todd Bishop (Seattle PI), Danny Sullivan (Search Engine Watch), and Charlene Li (Forrester Research) have informative details.

See also Gary Stein's comments after Copernic was acquired when he said, "Desktop Search has fully entered into the world of hype ... There's not really a revenue model from Desktop Search."

Google Suggest

Google Labs just released Google Suggest, a little tool that tries to guess your search query as you type it in.

It appears to be a simple UI layer over their spelling correction feature. No real query refinement, no clustering to offer suggestions of different terms (synonyms, related topics).

A cute toy, but I don't see anything really interesting here.

[via Nathan Weinberg and Danny Sullivan]

Update: Chris DiBona points to the ABC's of Google Suggest. Cute.

Update: Okay, I take it back. Looking at this more, I'm impressed, not with the data, but with the UI. Google is using some clever Javascript tricks (you can see them in the page's code) to constantly talk back to the server and retrieve data about how to expand your search string.

Neat-o-jet. Like GMail, this is a remarkable use of Javascript to create a simple, clean, functional UI within the web browser.

Update: Kevin Gibbs talks about Google Suggest on the Google Blog. It's another 20% free time project. Cool.

Update: Chris Justus reverse engineers the Javascript for the Google Suggest UI in a detailed post. Very interesting and worth reading.

Update: A clever knock-off of Google Suggest does the same nifty UI for lookups in a dictionary. Nice work.

Update: And a nifty version of the suggest UI for CPAN (a Perl module archive). [via Joseph Scott]
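
To make the idea concrete, here is a minimal sketch of the server side of a suggest-as-you-type feature: given what the user has typed so far, return the most popular logged queries that start with it. This is not Google's implementation. The query log, the counts, and the function names are all made up for illustration.

```python
from bisect import bisect_left

def build_index(query_counts):
    """Return the logged queries sorted alphabetically for prefix lookup."""
    return sorted(query_counts)

def suggest(prefix, sorted_queries, query_counts, limit=5):
    """Return up to `limit` logged queries starting with `prefix`, most popular first."""
    prefix = prefix.lower()
    # Binary search for the first query that could start with the prefix.
    start = bisect_left(sorted_queries, prefix)
    matches = []
    for query in sorted_queries[start:]:
        if not query.startswith(prefix):
            break  # sorted order means no later query can match
        matches.append(query)
    # Most frequently searched first; alphabetical order breaks ties.
    matches.sort(key=lambda q: (-query_counts[q], q))
    return matches[:limit]

# Hypothetical query log: query -> number of times it was searched.
QUERY_COUNTS = {
    "desktop search": 120,
    "desktop wallpaper": 300,
    "design patterns": 80,
    "delicious library": 40,
}
SORTED_QUERIES = build_index(QUERY_COUNTS)

print(suggest("desk", SORTED_QUERIES, QUERY_COUNTS))
```

The browser side would call something like this on each keystroke and redraw the dropdown, which is the part that makes the UI feel so responsive.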

Thursday, December 09, 2004

Personalized search and social networks

Chris Sherman writes about the new partnership between Eurekster and Friendster. I suppose these "-ster" companies just can't help but get together.

Their new search engine personalizes Yahoo search results using your Friendster social network. From Chris' article:
    Search results are prioritized with results viewed by anyone in your personal network appearing at the top of the list. These results are highlighted with a smiley face icon.
It's an interesting approach, but it remains to be seen how well it works. On the one hand, you trust your friends, so things your friends clicked on might be interesting for you to know about.

On the other hand, I'm not sure how often this will change the search results, whether the changes will focus your attention on the most relevant result for your search, and whether it is scalable to access search and clickstream history for everyone in your social network on every web search you do.

Nevertheless, it's an interesting development, an unusual use of a large social network to do a version of personalized web search.
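
As a rough sketch of what the quoted description implies, the code below moves any result clicked by someone in your network to the top and flags it (the smiley face). I have no knowledge of how Eurekster actually does this. The data structures and names here are assumptions for illustration only.

```python
def personalize(results, clicks_by_user, network):
    """Move results clicked by anyone in `network` to the top.

    results: list of URLs in the search engine's original order.
    clicks_by_user: dict mapping user -> set of URLs that user clicked.
    network: iterable of users in the searcher's social network.
    Returns a list of (url, clicked_by_friend) pairs.
    """
    friend_clicks = set()
    for user in network:
        friend_clicks.update(clicks_by_user.get(user, ()))
    # sorted() is stable, so within each group the original ranking is kept.
    ranked = sorted(results, key=lambda url: url not in friend_clicks)
    return [(url, url in friend_clicks) for url in ranked]

# Hypothetical data for illustration.
results = ["a.example", "b.example", "c.example"]
clicks = {"alice": {"c.example"}, "bob": {"z.example"}}
print(personalize(results, clicks, ["alice", "bob"]))
```

Note that the loop touches the click history of every friend on every search, which is exactly the scalability worry above.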

Amazon UK DVD rentals

Amazon UK launches DVD rentals.
    "Amazon is determined to be the best place to rent DVDs -- online or off," CEO Jeff Bezos said.
See also "Is Amazon the new Netflix?"

Wednesday, December 08, 2004

A dime per search

John Battelle posts incredible numbers on the performance of Google AdWords:
  • Average payment per click on a Google ad is $.54.
  • Nearly 17% of searches end in a click on an ad.
Combining these two pieces of data, it appears Google makes $.09 per search.

That's right. Every search you do on Google generates nearly a dime of revenue for the company.

Impressive numbers. And they're only expected to go higher as demand for internet advertising increases.

Google has firmly established that relevant, useful, and unobnoxious advertising can be stunningly lucrative.

Update: John Battelle reports that the latest numbers (Oct 2005) from Google show a 33% increase, now $0.12 per search. Impressive.
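
For anyone who wants to check the arithmetic, here it is spelled out. The inputs are the figures quoted above; the function name is just for illustration.

```python
def revenue_per_search(avg_payment_per_click, ad_click_rate):
    """Expected ad revenue per search: payment per click times share of searches with an ad click."""
    return avg_payment_per_click * ad_click_rate

# Figures quoted above: $.54 per click, about 17% of searches end in an ad click.
base = revenue_per_search(0.54, 0.17)
print(f"${base:.2f} per search")      # about nine cents

# The later update reports a 33% increase.
updated = base * 1.33
print(f"${updated:.2f} per search")   # about twelve cents
```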

Tuesday, December 07, 2004

MSN vs. eBay

Neowin reports that MSN will be launching a new e-commerce service called Messenger Marketplace:
    Buy and Sell within social network, also list wants and share recommendations. List items you want to sell, things you are looking for, and your recommendations. Your buddies notice new items you’ve listed when they login. They can either buy, sell or refer you to one of their buddies.

    It is like eBay except with people that you already know and trust directly (or a few degrees out).
Clever idea. Hard to do this without already having a good social network built, but MSN does have that for MSN Messenger users.

It's starting to look like eBay will have some new competition.

[via Todd Bishop]

Human editors and web search

Danny Sullivan (Founder of Search Engine Watch) says the solution to manipulation of search result rankings is to:
    ... involve human editors as part of the search equation. At one time, several search engines allowed human beings to make editorial choices about what would be shown in response to a query, to complement technological selections. Today, all the major services have sadly followed Google's lead in assuming all things can be solved through automation and search algorithms.
I assume Danny doesn't literally mean human editors hardcoding which results are returned for queries. How do human editors scale to billions of web pages? How do you do this efficiently and effectively, at low cost with high quality?

You might imagine that humans could provide canned responses to the most frequent queries. But this would only apply to a small subset of queries, and even this would be prohibitively expensive to maintain.

A more scalable and more common form of this is shortcuts where a search engine will detect particular categories of queries and return some results from a specialized data source. This is automated, of course, but humans are involved in identifying and creating the shortcuts.

I do wonder how much this debate of human vs. robots is a real issue. Truth be told, search engines have teams of good ol' humans analyzing data behind the scenes. These humans discover patterns in the data that are lowering the quality of the relevance rank, such as search engine spam, and change the algorithms to adapt.

Is this different than using "human editors as part of the search equation"?

Thunderbird launches

Thunderbird 1.0, the Mozilla e-mail client, is out. Excellent.

Interesting that it includes an RSS reader. It's increasingly common to see RSS readers integrated in e-mail clients (Thunderbird), web browsers (Firefox, Safari soon), and portals (My Yahoo).

[via Steve Rubel and Eric Bangeman]

Blocking RSS advertising?

Kottke discusses blocking or filtering advertising in RSS feeds.

[via therssweblog and Jeremy Zawodny]

Monday, December 06, 2004

Most hated advertising

Jakob Nielsen summarizes research on users' perceptions of online advertising and which advertising practices (popups, playing sound, blinking) are most annoying.

Jakob ends with some "Lessons for Websites":
    Sites that accept advertising should think twice before accepting ads that 80 to 90% of users strongly dislike. The resulting drop in customer satisfaction will damage your long-term prospects.

    Advertisers themselves might be tempted to continue with these nasty design techniques as long as they can find sites that will run them. After all, they typically yield higher clickthrough rates. But clickthrough is not the only goal. Users who are deceived into clicking on a misleading ad might drive up your CTR, but they're unlikely to convert into paying customers. And your brand suffers a distinct negative impact when you antagonize customers.
See also my earlier post, "Bringing sense to web advertising".

[via Alex Edelman]

Update: Jeff Boulter slams Yahoo for their annoying advertising.

Affiliate tags in search engines

Rayg (from Feedster) asks, "If search engines are so willing to pimp their space to sponsored links, why ... [not add] an affiliate ID in the [search result] links?" Ray goes on to argue that the move would largely go unnoticed and wouldn't damage credibility.

Jeremy Zawodny (from Yahoo) disagrees, saying that this would blur the lines between sponsored and non-sponsored results too badly. Even if the relevance rank is unbiased by the affiliates revenue, the perception that some links are paid would damage the credibility of the search engine.

I have wondered if Yahoo and Google have considered adding affiliate links, not to their search engine, but to their metashopping searches (Yahoo Shopping and Froogle).

But Jeremy's point probably applies to shopping search as well. It would look like a conflict of interest and potentially damage credibility, even if the affiliates revenue did not influence their relevance rank.

Sunday, December 05, 2004

Google Reviews?

Gary Price reports that Google recently registered a few new domains, including

As Nathan Weinberg points out, there isn't really a Google Reviews product out there yet. Closest to it is the merchant reviews spidered into Froogle.

I'm curious to see if Google will be releasing a review service that is more of a competitor with Epinions, CitySearch, Zagat, and Amazon's customer reviews. What would this look like?

One version of this could be a combination of product reviews spidered from the web and reviews entered by Google users. I imagine these product reviews would be integrated into Froogle, supplementing the store reviews. Currently, Froogle is a price comparison engine, helping users find a specific product at a low price. With product reviews, Froogle gives users the information they need to differentiate between products, helping them find the right product at the right price. It's a much more useful service.

Or perhaps we'll see small business merchant reviews integrated into Google Local, as Yahoo Local already has done. Currently, Google Local is essentially the Yellow Pages, helping users find a local merchant. With merchant reviews, Google Local would help users differentiate between merchants and find the right merchant for the task. Another much more useful service.

I'd expect to see both of these from Google soon.

Le Monde offers weblogs

Le Monde is reportedly now offering weblogs, "the largest newspaper in the world to do so."

Robert Andrews says, "Sacre bleu! C'est magnifique! Because Le Monde now provides weblogs to its readers under the brand of the newspaper."

Susan Mernit praises the move and then slams the rest of the news media, saying, "US newspapers, how about having some courage yourselves?"

Friday, December 03, 2004

Google, small businesses, and eBay

Fortune has an interview with Eric Schmidt (CEO of Google). It's interesting and worth reading.

One particular quote on helping small businesses caught my eye:
    The longer term goal is to have businesses give us very timely local information. So, for example, they'll say we have too much of this or too much of that product, and we want to have a sale. The goal is to have the computers arrange that real time and send out targeted advertising to interested parties nearby.
Eric is saying that they want to help small businesses tell interested people about individual products.

Is this more than just targeted advertising? To me, it's starting to look like allowing merchants to use Google to sell their products.

What do I mean? Consider eBay for a moment. What is eBay really? It's classified advertising. Small merchants post advertisements for their products on eBay. Buyers come to eBay, find what they need, and close the sale, often using a non-eBay site for payment. eBay's business is essentially classified advertising.

Now, look back at what Eric Schmidt said. Google will help merchants target individual products to interested people. Advertising at this level of granularity is very similar to eBay's product. Using Froogle, AdWords, and AdSense, small merchants could sell their products through Google instead of eBay.

[via John Battelle and Gary Price]

Update: Three years later, a BusinessWeek article reports the eBay "magic is gone ... Shoppers are simply not buying all the inventory anymore. Some items languish without a single bidder. Many shoppers opt for other sites including, use sophisticated search engines such as Google and Yahoo!, or head to store sites directly."

eBay and Craigslist vs. traditional media

Steve Rubel predicts eBay and Craigslist will merge in 2005 and "usher in a new era where citizen journalism is directly funded by person-to-person commerce."

Steve also notes that eBay and Craigslist "have already eaten away at one of the core underpinnings of big media - the classified advertising dollar" and quotes Dan Gillmor as saying:
    The real threat to traditional journalism isn't blogging. It's eBay, the largest classified ads publisher.
See also my earlier posts ([1] [2]) on Craigslist.

Thursday, December 02, 2004

Data-driven headlines

Eric Peterson is one of many to look at what's going on at Las Ultimas Noticias. The Chilean newspaper is using click data to see what stories are popular and picking headlines for the next day's paper based on that data. It's a clever use of online data (clicks on their website) to change an offline publication (their print newspaper).

As Danna Harman describes, the technique has been part of turning the paper from "a middle-of-the-road piece of nothing" into "Chile's most widely read newspaper today."

But some are concerned that newspapers "just cater to the lowest common denominator" if they use this kind of data without exercising good judgment.
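
The mechanics here are simple enough to sketch. Assuming a click log that is just a list of story identifiers, one per click (a made-up format, not the newspaper's actual system), picking the candidates for tomorrow's headlines is a counting exercise:

```python
from collections import Counter

def top_stories(click_log, n=3):
    """Return the n most-clicked stories, most popular first."""
    counts = Counter(click_log)
    return [story for story, _ in counts.most_common(n)]

# Hypothetical click log: one entry per click on the website.
clicks = ["football", "election", "football", "weather", "football", "election"]
print(top_stories(clicks, n=2))
```

The editorial judgment the critics are asking for is everything this code leaves out.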

Wednesday, December 01, 2004

We're on a roll!

Findory has grown tremendously in the last month:

Personalized Search: We're the first commercial search engine to modify web search results in real-time based on the searcher's behavior. Our changes are modest for now, but will increase over the coming months. Our News and Blogs search engine is also personalized.

New Look: Our redesigned website combines Findory News and Blogory, making it easy for you to keep up with current events.

Source Pages: Now every news source and blog has its own page. This makes it easy to find recently published articles. Our readers are also using it to discover related articles and explore related sources.

Search History: Findory keeps track of all your web, news, and blog searches in one convenient place, so you can easily retrace your steps.

Fountains of Feeds: Findory content, both personalized and unpersonalized, is now accessible from 44 categorized RSS feeds.

Findory Blogs by E-mail: Weblog headlines are now available as a daily e-mail delivery, alongside Findory News daily e-mails.

Last but not least, just today we launched My Recent Sources, a new feature which makes it easy for you to keep track of what news and blog sources you've been reading.

We're so pleased by the positive reaction to all our hard work. In the last month, traffic to Findory more than doubled! Thanks to all our readers for using our service and providing valuable feedback.

The magic behind Google

Matt Loney at ZDNet UK has a fantastic article that describes Google's infrastructure and their scalability and reliability challenges. Very much worth reading.

If you want more, much of this is covered in more detail in the Google Cluster Architecture and Google File System papers.

[via Andy Beal]

Tuesday, November 30, 2004

MSN Spaces

Mary Jo Foley reports that Microsoft will launch a new weblog service this week called MSN Spaces.

MSN Spaces is "a direct competitor to blog creation and hosting tools such as [Google's] Blogger [and] Blog*Spot, LiveJournal and TypePad."

I'm curious where Yahoo is in all of this. Given all their other community and content creation features, you'd expect Yahoo to be faster out of the gate on a Blogger knockoff than Microsoft. But not this time.

Update: MSN Spaces has launched.

Update: Charlene Li lauds MSN Spaces' integration with MSN Photos, MSN Music, and MSN Messenger and lists it as an advantage over the competition. Yahoo? Are you paying attention to this?

Amazon's citation links

Rageboy apparently saw Amazon doing an A/B test of citation links between books. It seems to be gone now, but Rageboy's post includes lengthy descriptions of the new feature.

Paul Bausch also noticed the new feature and has a link to a still-active Amazon help page with a little more information about it.

Interesting. And very useful for technical books. I hope they launch it soon.

Google Scholar offers similar functionality for academic papers and books. Seems that Amazon and Google are butting heads more and more frequently lately.

[Rageboy link was via Traffick]

Update: Gary Price points to some interesting work on citation analysis.

Bloggers talking about Findory

We at Findory have been thrilled by the reaction we've gotten from the blogging community to our site. I wanted to highlight some of the blog postings we've noticed lately:

Brian Dennis at New Media Hack proclaimed the "Daily Me is here" and that "people will lock onto sources out in the long tail."

John Battelle at Searchblog praised our pace of innovation, saying that Findory "keeps on truckin'" by "very often" announcing "cool new features".

Gary Price at Search Engine Watch writes about Findory frequently ([1] [2] [3]), most recently saying, "It seems like every week or so Findory launches a new service."

On seeing Findory's related sources, Steve Rubel said, "Looks like Rex Hammock and I are separated at birth."

Rex Hammock responded: "Someone much smarter than I can speculate what algorithm causes that to happen. Whatever it is, it makes perfect sense to me as we discovered each other through our blogs and have since become friends and even got together for lunch recently when I was in New York."

Nathan Weinberg at InsideGoogle called Findory "a powerful, smart news site". Nathan even went as far as to say, "I think I can now replace Google with Findory for [some] searches." since Findory "'just works', and works far better than anything out there."

Cindy Chick at Law Lib Tech said: "Personalization obviously has a lot of advantages in a many different areas. Personalization is what Amazon uses to display other items that might interest you, and it's what Findory uses to give you the news that you want to read."

Thanks, everyone! We're glad you're enjoying Findory!

Google desktop search security

Bruce Schneier responds to some of the hype ([1] [2] [3]) over so-called security flaws in Google Desktop Search:
    Google's desktop search software is so good that it exposes vulnerabilities on your computer that you didn't know about.

    Some people blame Google for these problems and suggest, wrongly, that Google fix them. What if Google were to bow to public pressure and modify GDS to avoid showing confidential information? The underlying problems would remain: The private Web pages would still be in the browser's cache; the encryption program would still be leaving copies of the plain-text files in the operating system's cache; and the administrator could still eavesdrop on anyone's computer to which he or she has access. The only thing that would have changed is that these vulnerabilities once again would be hidden from the average computer user.

    GDS is very good at searching. It's so good that it exposes vulnerabilities on your computer that you didn't know about. And now that you know about them, pressure your software vendors to fix them. Don't shoot the messenger.
Exactly right. These security issues exist whether you have Google Desktop Search installed or not.

Monday, November 29, 2004

Personalized news and blog search

Less than two weeks ago, Findory launched personalized web search.

Today, Findory launched personalized news and blog search.

Want to try it? Read a few news or blog articles on Findory, then do a news or blog search for something related to some of the articles you read.

For example, if you read the Wired article "Google Treads on Microsoft's Turf" through Findory, then do a news search for "desktop search", you'll see some of the articles will be marked with our orange personalized icon. Clicking on the icon will explain why the article was recommended.

As with our personalized web search, our personalized news and blogs search is a first step. As we learn more about how to help people find what they need, we'll begin to make more dramatic changes to the search results.

Personalized search is the future. We at Findory are excited to be part of it.

Newsbreak vs. Google News

Steve Outing talks about the launch of Newsbreak, an Australian news aggregator from Fairfax Digital. The aggregator itself seems indistinguishable from many others, but what is interesting is that it comes from a traditional news organization, Fairfax Digital.

Steve says Newsbreak is a reaction to Google News and other online news aggregators. And he thinks this is a positive sign:
    This continues the trend -- a good one, I think -- of traditional news organizations realizing that they can't continue to operate as islands on the Internet. Linking to other sources (even competitors, in many cases) serves the interests of readers, and establishes the news entity as a portal to the world of news, not just its own coverage. Such services give readers of a news brand less of a reason to turn to Google News, et al.
If only they would embrace the opportunity, traditional news organizations should be better positioned to innovate in online news than Google or Yahoo. Up to this point, innovation has been coming from elsewhere.

Google TV search

Stefanie Olsen at CNet reports on Google, Yahoo, and MSN's efforts on search for video streams.

One particularly interesting excerpt on Google TV search:
    Google's project for TV search is ultra-secretive; only a handful of broadcast executives have seen it demonstrated so far. To build the service, the company is recording live TV shows and indexing the related closed-caption text of the programming. It uses the text to identify themes, concepts and relevant keywords for video so they can be triggers for searching.

    The software allows people to type in keywords, such as "Jon Stewart," to retrieve video clips of the comedian's TV appearances, marked with a thumbnail picture with some captioning text, for example. Refining the search results for the show "Crossfire" would display a page that looks similar to a film reel, with various still images paired with excerpts of closed captioned text of the now-infamous fight between Stewart and CNN's "Crossfire" hosts. The searcher could click on and watch a specific segment of the show.
Watch out TiVo (and Comcast, DirecTV, ...).

See also my earlier post, "Query-free news search" which mentions a Google paper on searching television closed caption text to find related news articles.
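
The core trick described in the excerpt, indexing closed-caption text so keywords can retrieve video segments, can be sketched with a tiny inverted index. This is only an illustration of the general technique, not Google's system. The segment format and the sample captions are invented.

```python
import re
from collections import defaultdict

WORD = re.compile(r"[a-z0-9']+")

def build_caption_index(segments):
    """Map each caption word to the ids of the segments containing it.

    segments: list of (show, start_seconds, caption_text) tuples.
    """
    index = defaultdict(set)
    for segment_id, (_, _, text) in enumerate(segments):
        for word in WORD.findall(text.lower()):
            index[word].add(segment_id)
    return index

def search(query, index, segments):
    """Return (show, start_seconds) for segments whose captions contain every query word."""
    words = WORD.findall(query.lower())
    if not words:
        return []
    hits = set.intersection(*(index.get(w, set()) for w in words))
    return [segments[i][:2] for i in sorted(hits)]

# Hypothetical caption data for illustration.
SEGMENTS = [
    ("Crossfire", 0, "Welcome to Crossfire, our guest is Jon Stewart"),
    ("Crossfire", 300, "Stewart argues with the hosts"),
    ("Evening News", 0, "Weather and sports tonight"),
]
INDEX = build_caption_index(SEGMENTS)
print(search("Jon Stewart", INDEX, SEGMENTS))
```

Each hit carries a show and a start time, which is what would let a searcher jump to a specific segment.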

[via Search Engine Watch Blog]

Finding authoritative reviews

Chris DiBona digs up an interesting old article by Mimi Sheraton that criticizes Zagat's user reviews:
    The Zagat surveys stand or fall on their central premise: that thousands of separate opinions add up to something like the truth ... [But] the majority can be wrong, and one well-informed opinion is worth more than those of a thousand amateurs.
It's a great point. How do you find the authoritative, well-informed, useful opinions? Not only does this apply to community-generated content like customer reviews and product ratings, but even to blog postings and discussion forum comments where the signal-to-noise ratio is equally poor.

One common approach is to allow people to rate the reviews. Amazon does this for customer reviews, allowing people to vote on whether the review was helpful. Slashdot takes this a step further, not only allowing users to moderate (rate comments), but also allowing users to metamoderate (rate the rating of the comment).

Mimi Sheraton would probably criticize this approach as just layering a popularity contest on top of a popularity contest. And it does have problems. For example, positive reviews on Amazon seem to get many more "helpful" votes than negative reviews. Slashdot moderators seem to have an adolescent sense of humor and favor ill-informed rants, perhaps seeking entertainment more than information.

So, what else can we do? Another approach is to attempt to identify authoritative people and treat all of their reviews or comments as higher quality. This is closer to what Mimi wants, well-informed reviewers to count more than uninformed reviewers. The trick is identifying informed reviewers. Amazon and Slashdot both emphasize active users, I'd guess on the theory that those that bother to put in the effort to be involved probably have something useful to say. Users could rate each other, but this again reverts into a popularity contest.

This does seem like a spot where social networks actually could be useful. Who is an authoritative reviewer? Someone who is considered authoritative by other authoritative users. Yes, it's circular, but identifying a seed set of authoritative users is enough to start the process going.
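
Here is a minimal sketch of that circular definition, assuming we know who each user considers authoritative. It is similar in spirit to PageRank restarted from a seed set: trust starts at the seeds and flows along endorsements. No existing site is known to work this way. The graph, the damping value, and the names are all assumptions.

```python
def authority_scores(endorsements, seeds, damping=0.85, iterations=50):
    """Propagate authority from a seed set along endorsement links.

    endorsements: dict mapping user -> list of users they consider authoritative.
    seeds: set of users assumed authoritative to start the process.
    """
    users = set(endorsements) | set(seeds)
    for targets in endorsements.values():
        users.update(targets)
    seed_weight = {u: (1.0 / len(seeds) if u in seeds else 0.0) for u in users}
    scores = dict(seed_weight)
    for _ in range(iterations):
        # Each round, a fixed share of authority is re-injected at the seeds...
        new_scores = {u: (1 - damping) * seed_weight[u] for u in users}
        # ...and the rest flows from each user to the people they endorse.
        for user, targets in endorsements.items():
            if not targets:
                continue  # users who endorse no one simply pass nothing on
            share = damping * scores[user] / len(targets)
            for target in targets:
                new_scores[target] += share
        scores = new_scores
    return scores

# Hypothetical endorsement graph for illustration.
graph = {
    "critic": ["chef"],
    "chef": ["critic"],
    "fan": ["critic"],
    "spammer1": ["spammer2"],
    "spammer2": ["spammer1"],
}
scores = authority_scores(graph, seeds={"critic"})
print(sorted(scores, key=scores.get, reverse=True))
```

In this toy graph the two spammers endorse each other enthusiastically but end up with no authority, because nothing connects them to the seed. That is the property that might keep this from being just another popularity contest, though everything then depends on choosing the seeds well.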

Would this work? Or would it be just another popularity contest?

Got suggestions for Findory?

Got a suggestion for Findory? We'd love to hear from you.

Most e-mails to Findory are either suggestions or oh-my-god-this-is-so-great fan letters. We're thrilled by the feedback we've been getting. It's great to have such an enthusiastic and supportive community using and enjoying Findory.

If you do have ideas, suggestions, or things you'd like to see at Findory, please feel free to drop us an e-mail anytime or comment on this post. We'd always love to hear from you.

Sunday, November 28, 2004

Mamma buys Copernic

Gary Stein (analyst at Jupiter Research) posts that Mamma, a metasearch engine based in Canada, just purchased Copernic, one of the leading desktop search companies.

Gary doesn't seem to think this was a very smart move by the so-called "mother of all search engines". He says, "Desktop Search has fully entered into the world of hype," and criticizes desktop search as having no business model: "No one's going to be cool with seeing ads -- contextual or otherwise -- displayed with their desktop results."

It also seems to me that, if Microsoft makes the default file and e-mail search on Windows "good enough" for most users -- perhaps by releasing MSN Desktop Search as part of Windows -- most of the opportunity for third-party desktop search applications will evaporate.

Update: Five months later, Copernic kills the deal due to an ongoing SEC investigation of Mamma.

Friday, November 26, 2004

The decline of web directories

Tara Calishain bemoans how Yahoo has deemphasized its web directory. Yahoo's latest redesign relegates their web directory to a corner at the bottom of the Yahoo home page.

It's an interesting point, particularly since Yahoo started as a web directory.

Google also deemphasized its directory a few months ago. Google Directory is based on DMOZ, the "largest, most comprehensive human-edited directory of the Web." At the time Google deemphasized Google Directory, I thought Google would be releasing a new, automated version of a web directory soon. That hasn't happened.

Keyword web search is great, but there are times when a browseable web directory is really useful, such as when you want a list of related sites, a comprehensive list of sites, or you're having a hard time specifying a search query that gets you what you need.

Update: Andrew Goodman says, "The lack of a definitive directory or two is the single biggest glaring hole in online search."

Wednesday, November 24, 2004

Newsprint is wasted on the young

Adam Penenberg at Wired says, "Newspapers Should Really Worry":
    [Young] focus-group participants declared they wouldn't accept a Washington Post subscription even if it were free. The main reason (and I'm not making this up): They didn't like the idea of old newspapers piling up in their houses.

    Don't think for a minute that young people don't read ... They access The Washington Post website or surf Google News, where they select from literally thousands of information sources. They receive RSS feeds on their PDAs or visit bloggers whose views mesh with their own.

    In short, they customize their news-gathering experience in a way a single paper publication could never do. And their hands never get dirty from newsprint.
But should newspapers be worried? This trend toward online news is an opportunity. No longer are newspaper articles competing for scarce space on the front page and the limited space available on the newsprint. No longer are articles limited to distribution to localized markets.

The online news audience is massive and worldwide. It's hungry for your content. All you have to do is give it to them.

[via Scripting News]

Tuesday, November 23, 2004

Froogle wish lists

Google launches wish lists in Froogle. Google is getting better and better for online shopping.

How do you feel about that, Amazon?

Ask Jeeves and advertising

Jefferson Graham at USA Today writes about Ask Jeeves. Some excerpts:
    The company's signature cartoon butler, known as Jeeves, was a symbol of dot-com excess ... "We had great marketing, but the product just didn't deliver," [CEO Steve] Berkowitz admits about Jeeves' early days.

    Jeeves was initially known for its gimmick: It promised to answer any query formed in a question. Most of the time, though, Jeeves replied with irrelevant links, sending millions away to alternatives such as Google.

    [Acquiring Teoma in 2001] enabled Jeeves to acquire its own search technology and make its search results more relevant to queries ... Jeeves' most profitable move of all [was] deciding to partner with rival Google. Google-placed text ads, which appear atop Jeeves' search results, represent nearly 70% of Jeeves' income.

    "We look at the Web differently — at the credibility of a source, as opposed to just the popularity of a site," says Jim Lanzone, Jeeves' senior vice president.

    For instance, a search for "Bay Area airports" on Jeeves displays official airport sites for San Francisco, Oakland and San Jose. The same search on Google highlights local newspaper articles about the airports.
The biggest problem I have with Ask Jeeves is the focus on advertising. SiliconBeat illustrates this well with screenshots of the same search on Google, Yahoo, and Ask Jeeves.

On Google, search results are at the top. On Ask Jeeves, advertising (sponsored results) fills the top of the page. Which is more appealing to someone trying to find something?

Ask Jeeves' advertising-focused page may result in higher short-term revenue, but Ask is crippling its long-term growth with its obnoxious and intrusive advertising.

[via Gary Price and Andy Beal]

Artist similarities in music

Brian Dennis points to an interesting paper by Brian Whitman and Steve Lawrence, "Inferring Descriptions and Similarity for Music from Community Metadata".

If that title didn't turn you off completely, the paper does have an interesting idea. Basically, they mine text in web pages, discussion groups, and blogs (which they call "community metadata") to discover information about music artists. They extract phrases from the community metadata and use them to find relationships between artists. Because they analyze the web pages and discussion groups continuously, they claim to be able to capture short-term trends, like a groundswell of buzz around a particular song or artist.
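
To make the idea concrete, here is a minimal sketch of finding similar artists from text written about them. It is not the method from the paper. It assumes a plain bag-of-words profile per artist compared with cosine similarity, and the artist names and snippets are made up for illustration.

```python
import math
import re
from collections import Counter

def term_counts(texts):
    """Count lowercase word occurrences across all snippets about one artist."""
    counts = Counter()
    for text in texts:
        counts.update(re.findall(r"[a-z']+", text.lower()))
    return counts

def cosine_similarity(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def most_similar(artist, profiles):
    """Rank the other artists by similarity to the given artist."""
    scores = [(other, cosine_similarity(profiles[artist], profiles[other]))
              for other in profiles if other != artist]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)

# Hypothetical snippets standing in for crawled pages, posts, and blogs
community_text = {
    "Artist A": ["dreamy shoegaze guitars", "hazy dreamy vocals"],
    "Artist B": ["dreamy guitars and hazy reverb"],
    "Artist C": ["aggressive thrash metal riffs"],
}
profiles = {name: term_counts(texts) for name, texts in community_text.items()}
ranking = most_similar("Artist A", profiles)
```

Rebuilding the profiles as new text arrives is what would let a system like this pick up short-term buzz, though a real version would need phrase extraction and some weighting of common words.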

This idea of extracting data and relationships from community metadata is clever. is an interesting example of this for books. It "watches weblogs for books that they're talking about". Memeorandum is an interesting example for news. It watches blogs to see what news articles they are talking about.

By the way, one of the authors of this paper, Steve Lawrence, is now at Google.

Monday, November 22, 2004

Findory's source pages

Findory launched another new feature this weekend, pages for every news site and blogger in our database. Alex Edelman has a good write-up.

Check out the pages for BBC, Wired, or Nature.

Take a peek at our pages for the blogs ResourceShelf, Searchblog, InsideGoogle, or Scobleizer.

Every article on Findory has a link to the appropriate source page just under the title. Each source page has recent articles, related sources, and related articles. Related sources and related articles are a great way for readers to discover interesting news stories and sources.

Try surfing from source to source using the related sources! I just used them to read recent articles from National Geographic, then clicked over to Nature, then New Scientist, then ScienceDaily.

Personalized search vs. clustering

Raul Valdes-Perez (CEO of Vivisimo) has a CNet article attacking personalized web search and touting the virtues of document clustering.

Raul makes some excellent points on the difficulties of doing personalized web search well. He says people's interests are fleeting and noisy. Raul says it's difficult to accurately infer interests from clickstream data, which is also noisy and imprecise.

It's true that personalized search is challenging. But Raul's criticism is overstated. If personalized search learns immediately in response to new data, it can react to people's immediate goals and interests, even if they differ from their long-term behavior. If the personalization helps in more cases than it hurts, then the personalization has value, even if the data is noisy and the assumptions made from the data are speculative.

Raul's solution is to give up on personalized search and do document clustering instead. Vivisimo's Clusty is an excellent clustering web search -- if you haven't tried it, go try it, it's great -- but it requires effort. Users have to refine their query repeatedly using the clusters to find what they want.

People are lazy. They want what they want and they want it now. Google recognizes this, providing an "I'm feeling lucky" button that just sends you to the top search result immediately. They recognize that it's better to just find what the searcher wants on the first try, no refining, no effort.

Personalized search offers improvements to relevance rank by recognizing that relevance differs from individual to individual. Personalized search makes it more likely that you find what you need on the first try.

Sunday, November 21, 2004

Robin Sloan's EPIC 2014

Robin Sloan produced a clever Flash movie called "EPIC 2014" speculating about the future of personalized news and information. Worth watching.

After a brief recap of events of the last decade, the movie speculates about a future product from Google, the Google Grid, a vast file and content-sharing network that appears to be some combination of Blogger, TiVo, Napster, and the Google cluster. In Robin's vision, this is followed by MSN Newsbotster, a personalized news site that appears to be some combination of Findory, Slashdot, Memeorandum, and social networking tools like Friendster. Next up is Googlezon and EPIC, an "evolving personalized media construct" that provides personalized information by summarizing and rewriting content dynamically for each user.

Summarizing news and documents as described in EPIC is very difficult, but it is an active area of research. One of the most interesting examples out there now is Columbia Newsblaster. Microsoft Research also is doing work in this area.

Aside from the silly brand names of Newsbotster and Googlezon, Robin Sloan has created an interesting and thought-provoking vision of the future. Definitely watch the movie.

The movie ends with criticism of this new world of personalized news and information, complaining that it will be dominated by "narrow, shallow, sensationalist trivia", apparently what Robin Sloan thinks is all people really want and all they'll get from personalized news. He also claims Googlezon and EPIC will cause the death of large and well-respected news organizations like the New York Times.

The death of the New York Times? Clearly hyperbole. At best, Google News is another distribution channel for news. While it may reduce traffic to the front page of online newspapers, it drives traffic to their content, to individual articles. As the CEO of AP said recently, the "content will be more important than its container." News organizations will continue and thrive in the future. The only differences are that content -- the work of talented reporters and writers -- will be emphasized and that the content will be distributed more widely.

Are personalized news sites more shallow or more narrow? Compare a personalized news site to the current front page of CNN. The unpersonalized front page of CNN provides only a shallow view targeting some mishmash of the general interests of millions of readers. By trying to satisfy everyone, it satisfies no one, a bland blend of interests that results in mediocrity. And, I only get the perspective of CNN, what they think is important to their readers.

Personalized news provides an opportunity to broaden readers' interests, exposing them to news sources, perspectives, and viewpoints they otherwise would never have seen. A personalized news aggregator provides both breadth and focus, sorting through huge numbers of sources and articles and helping you find what you need.

Personalized news helps you discover news you would otherwise miss. It makes it easier to get the information you need to be well-informed about the events that impact your life. If this is the future, it is a future which should excite us.

Update: A few weeks later, EPIC is making the rounds again and getting some additional coverage (CNet, Slashdot, Traffick, InsideMicrosoft). A couple folks even started a Googlezon blog.

Friday, November 19, 2004

Findory's personalized web search

A couple weeks ago, Findory launched search history for web, news, and blog search. As I've said before, search history is not personalized search.

This week, Findory took our first step toward true personalized web search. In subtle and small ways, we are starting to modify web search results based on your history at

To see the impact, do a web search at Findory, then click on one or two of the search results, then do another search for something fairly similar. In cases where we believe we can help, we'll modify and highlight some of the search results.

Here are a couple of specific examples:
  • Search for "Yahoo".
    Click on the top link for
    Search for "Dilbert".
    The Dilbert page at Google a few results down will be highlighted and modestly reranked.

  • Search for "Incredibles"
    Click on the IMDB link (fourth down).
    Search for "Nemo".
    The IMDB page on Finding Nemo will be highlighted and popped up to the top slot.
Please keep in mind these are our first, early, baby steps. The changes are small, infrequent, and subtle. Findory needs to learn to walk before it can run. Over time, Findory will better understand how to help people find what they need and the changes will become larger and more frequent.
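
For the curious, the flavor of the examples above can be sketched in a few lines. This is only an illustration, not Findory's actual algorithm. It assumes a hypothetical rule that results from sites you recently clicked on get highlighted and moved up a couple of positions, and the URLs are placeholders.

```python
from urllib.parse import urlparse

def site(url):
    """Return the host part of a URL, lowercased."""
    return urlparse(url).netloc.lower()

def personalize(results, clicked_urls, boost=2.5):
    """Rerank results, favoring sites the searcher clicked on recently.

    Returns (url, highlighted) pairs. `boost` is a made-up parameter:
    roughly how many positions a result from a familiar site may move up.
    """
    familiar = {site(url) for url in clicked_urls}
    scored = []
    for position, url in enumerate(results):
        highlighted = site(url) in familiar
        score = position - boost if highlighted else position
        scored.append((score, url, highlighted))
    # list.sort() is stable, so ties keep the original ordering
    scored.sort(key=lambda item: item[0])
    return [(url, highlighted) for _, url, highlighted in scored]

# Placeholder history and results for a second, related search
clicked = ["https://movies.example/title/incredibles"]
results = [
    "https://studio.example/nemo",
    "https://reviews.example/nemo",
    "https://movies.example/title/nemo",
    "https://fans.example/nemo",
]
reranked = personalize(results, clicked)
```

With an empty click history nothing changes, which matches the goal of only modifying results when there is some reason to think it helps.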

As small as this step may be, we believe it is a first for a commercial web search engine. Many are talking about personalized search, but no one is doing it. Our personalized web search learns from your behavior, modifies your search results, and helps you find what you need.

Update: It took many months, but a new version of Findory personalized web search has launched that makes more substantial changes in the relevance rank.

Google Kirkland open house

The Google Kirkland open house party was last night. It was a great time. Quite a turnout, totally packed. Strong UW presence, which wasn't surprising, but I was amazed by the number of Amazon and MSN people there.

Brady Forrest (PM at MSN Search, frequent poster on MSN Search blog) was there. Scott Pitasky (former, now head of HR at MSN). Erik Selberg (author of Metacrawler, one of the first metasearch engines, now at MSN Search). Robert Scoble was apparently there, but I didn't bump into him.

I got a chance to catch up with a few Googlers, Joshua Redstone (old friend from graduate school, works on GFS), Peter Norvig, Jeff Dean. Jeff Dean and I had an interesting discussion about the potential for abuse of MapReduce; I was arguing you might see tragedy of the commons issues because the system makes it so easy to consume vast resources on the Google cluster, but Jeff said everyone plays nice and that it isn't an issue. I was hoping to see Joe Beda, but he couldn't make it, unfortunately. David Krane was there, but I didn't see him.

Bumped into a couple of the Slashcode guys too, Brian Aker and Chris Nandor. Unbelievable that Slashdot uses NFS in a production system, but Brian and Chris insist it's not a serious problem.

I finally got a chance to meet Todd Bishop from the Seattle PI in person. Great to see you there, Todd.

Making sense of the chaos

Bill Joy (co-founder, Sun Microsystems) on the Charlie Rose show:
    Our lives are overwhelmed by all the information coming at us in a very disorganized way. We're going to hunger for something that will make sense of all the chaos--that will look at all the things happening in the world and filter and order them in a way that's personalized to us. That will be the next great revolution--that is something that doesn't take an index of the dead information on the Net, but the live information of things as they are occurring and as they are relevant to us.
The next great revolution is finding focus and relevance in the flood of new information. The next great revolution is personalized news.

[via Musing on Technology]

Thursday, November 18, 2004

AI is the mainstream

Adam Bosworth (who left BEA for Google recently) at his ICSOC 2004 talk:
    You want to see the future. Don’t look at Longhorn. Look at Slashdot. 500,000 nerds coming together everyday just to manage information overload. Look at BlogLines. What will be the big enabler? Will it be Attention.XML as Steve Gillmor and Dave Sifry hope? Or something else less formal and more organic? It doesn’t matter. The currency of reputation and judgment is the answer to the tragedy of the commons and it will find a way. This is where the action will be. Learning Avalon or Swing isn’t going to matter. Machine learning and inference and data mining will. For the first time since computers came along, AI is the mainstream.
Managing information overload with AI. That's the future.

[via Niall Kennedy]

Google, Glog, and G-nius

The Motley Fool writes about Google's track record of innovation.

In addition to coining some odd words like "g-nius" and "glog", the article has an interesting piece on Google AdWords and AdSense:
    In 2003, Google derived 97% of its revenues from advertising.

    Google's simple text-based ad results often segue so well with the search that they can hold actual interest for the user. Even the most grudging critic probably has to acknowledge that there was a time when he or she clicked on one of Google's ads because of its relevance to their interests.

    Relevance. That's where Google's got it down, especially with its AdWords and AdSense programs.
See also my earlier post, "Bringing sense to web advertising".

[via Andy Beal]

The big, bad VC

An interesting essay, "10 Reasons to Shy Away from Venture Capital".

The essay echoes advice I've gotten from many others. The distraction and poorly aligned goals are the biggest issues for me. We're too busy executing to play these games.

[via Joel on Software]

Intent marketing

Charlene Li at Forrester writes about personalized advertising, which she labels "intent marketing":
    Today, publishers announce that they have content and an audience that is attracted to that content – so if you're a marketer interested in that audience, the publisher will sell you access to those users in the form of advertising at a specified price. The onus falls to the marketer to figure out where the audience is, hence the important role of media buyers and ad agencies.

    In the future, marketers will announce that they want to reach a certain segment – let's say, women in-market for a car – and are willing to pay $25 per qualified lead. The onus now falls to the publisher to deliver that audience to the marketer. Publishers will be able to see what the "bids" are within the system for a particular user profile and optimize their ad serving to maximize revenue per page.

    This is the development of what I call "intent marketing" where the marketer targets intent, in this case, inferred from past behaviors.
I'd like to see this go one step further. I'd like to see the entire process of targeting advertisements handled as an optimization problem.

In this future, marketers create a large pool of advertisements with specific segments in mind for each ad. The advertisements go out on the network of publishers, mostly showing to people who match the segments, but also sampling related segments outside of the marketers' intent. Quickly, the advertisements focus in on narrow clusters of readers who are interested or, if no one seems interested, the advertisements are dropped completely.
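
One simple way to picture this is as an explore/exploit loop. The sketch below is only an assumption about how it might work: it uses an epsilon-greedy rule (my choice for illustration, not something any ad network is known to use), and the class name, thresholds, and segment labels are all hypothetical.

```python
import random

class AdTargeter:
    """Epsilon-greedy choice of audience segment for a single ad."""

    def __init__(self, target_segments, related_segments, epsilon=0.1, rng=None):
        # The marketer's intended segments come first so they win ties
        self.segments = list(target_segments) + list(related_segments)
        self.epsilon = epsilon
        self.rng = rng or random.Random()
        self.shows = {s: 0 for s in self.segments}
        self.clicks = {s: 0 for s in self.segments}

    def click_rate(self, segment):
        shows = self.shows[segment]
        return self.clicks[segment] / shows if shows else 0.0

    def choose_segment(self):
        """Mostly show to the best segment so far; sometimes sample any segment."""
        if self.rng.random() < self.epsilon:
            return self.rng.choice(self.segments)
        return max(self.segments, key=self.click_rate)

    def record(self, segment, clicked):
        """Log one impression and whether it was clicked."""
        self.shows[segment] += 1
        if clicked:
            self.clicks[segment] += 1

    def should_drop(self, min_shows=100, min_rate=0.01):
        """Drop the ad once every segment has enough data and none responds."""
        return all(self.shows[s] >= min_shows and self.click_rate(s) < min_rate
                   for s in self.segments)

# Hypothetical ad aimed at car shoppers, also sampling a related segment
targeter = AdTargeter(["car shoppers"], ["truck shoppers"], epsilon=0.1)
segment = targeter.choose_segment()
targeter.record(segment, clicked=False)
```

The small amount of random sampling is what lets an ad discover a responsive audience the marketer did not think to target.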

I'm not alone in having this vision. Many have talked about it. But it's quite a challenge to implement. It requires a massive amount of data, only possible at scale.

But Google AdWords seems close to doing it. They suggest alternative keywords, show ads for queries that aren't exact matches to the specified keywords, and drop ads that perform poorly. The next step is to use the vast amount of data they have on what ads are effective to start showing ads for other keywords than what was specified and to further narrow the targets when responsive audiences are found.

Already, I click on Google ads much more than other ads because they're relevant, especially when I do a Google search for a specific product. Perhaps advertising actually can be informative, unobnoxious, and useful.

See also my earlier post, "Bringing sense to web advertising".