Wednesday, December 29, 2004

BitTorrent, Internet TV, and personalization

Clive Thompson at Wired has an interesting article on BitTorrent, the filesharing software that has tens of millions of users and generates about a third of all internet traffic.

The entire article is worth reading, but I wanted to highlight this excerpt on using BitTorrent for watching TV:
    BitTorrent is something deeper and more subtle. It's a technology that is changing the landscape of broadcast media.

    "All hell's about to break loose," says Brad Burnham, a venture capitalist with Union Square Ventures ... BitTorrent does not require the wires or airwaves that the cable and network giants have spent billions constructing and buying ... BitTorrent transforms the Internet into the world's largest TiVo.

    If enough people start getting their TV online, it will drastically change the nature of the medium ... The whole concept of must-see TV changes from being something you stop and watch every Thursday to something you gotta check out right now, dude. Just click here.

    What exactly would a next-generation broadcaster look like? The VCs at Union Square Ventures ... suspect the network of the future will resemble Yahoo! or - an aggregator that finds shows, distributes them in P2P video torrents, and sells ads or subscriptions to its portal. The real value of the so-called BitTorrent broadcaster would be in highlighting the good stuff, much as the collaborative filtering of Amazon and TiVo helps people pick good material.
In a flood of information, we need focus. With tens of thousands of TV shows, we need personalization to filter, to help us find what we need.
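For the curious, here's roughly what collaborative filtering over TV shows could look like. This is a minimal sketch of item-to-item collaborative filtering (cosine similarity over who watched what), not how Amazon or TiVo actually implement theirs. The viewers, show names, and function names are made up for illustration.

```python
from collections import defaultdict
from itertools import combinations
from math import sqrt


def item_similarities(histories):
    """Cosine similarity between shows, based on how many viewers watched both.

    histories: dict mapping a (hypothetical) viewer id -> set of shows watched.
    Returns a dict mapping (show_a, show_b) -> similarity, for both orderings.
    """
    viewers_per_show = defaultdict(int)
    co_counts = defaultdict(int)
    for shows in histories.values():
        for show in shows:
            viewers_per_show[show] += 1
        # Sorting keeps each pair in one canonical order while counting.
        for a, b in combinations(sorted(shows), 2):
            co_counts[(a, b)] += 1
    sims = {}
    for (a, b), n in co_counts.items():
        s = n / sqrt(viewers_per_show[a] * viewers_per_show[b])
        sims[(a, b)] = sims[(b, a)] = s
    return sims


def recommend(watched, sims, k=3):
    """Score unseen shows by summed similarity to the shows this viewer watched."""
    scores = defaultdict(float)
    for (a, b), s in sims.items():
        if a in watched and b not in watched:
            scores[b] += s
    return sorted(scores, key=lambda show: (-scores[show], show))[:k]


histories = {
    "v1": {"A", "B", "C"},
    "v2": {"A", "B"},
    "v3": {"B", "C"},
    "v4": {"C", "D"},
}
sims = item_similarities(histories)
print(recommend({"A"}, sims))  # ['B', 'C']
```

The similarity table can be built offline, so serving a recommendation is just a few lookups per show the viewer has already watched.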

See also my earlier post, "Will somebody please fix TiVo Suggestions?"

Tuesday, December 28, 2004

The problem for newspapers

Michael Bazeley posts that Craigslist has cost SF Bay Area newspapers $50-60M in classified advertising. Bob Cauthorn (former VP at the SF Chronicle) is quoted as saying:
    The problem for newspapers isn't Craigslist. The problem for newspapers is the newspapers themselves. Specifically, that class of slow-blink-rate executive who refuses to see today through the lens of today....They recite from business self-help manuals and reduce the hard work of innovation and creativity to comic book parables. Meanwhile, they lose market share, circulation and audience. Ultimately these people will cost an industry its future.
Harsh words. I'd like to say that Cauthorn is being unreasonable, but moves by newspapers such as mandatory registration seem to support his argument.

Newspapers used to have localized monopolies on distribution. Reading the local newspaper was the only way to see local news and local classifieds.

Increasingly, newspapers have to live in a world of decentralized distribution. Advertisements that used to run in a local paper may now run on Craigslist, Yahoo Local, Monster, or eBay. More visitors will come to read local news not through the front page of the newspaper's website, but via RSS feeds or aggregators like Google News.

Newspapers know local better than anyone. They know the local advertisers. They know the local news. They are the kings of local content.

Tom Curley (CEO of AP) said it best: "The franchise is not the newspaper; it's not the broadcast; it's not even the Web site. The franchise is the content itself."

Newspapers should take advantage of decentralized distribution. Before, advertising and classifieds would run in a print newspaper to a small subscriber base. Now, newspapers could distribute local advertisements across many channels, with the newspaper managing the key relationship with the local advertisers. Before, reporters for the paper often found their articles condemned to the back pages, read by only a few thousand readers. Now, a vast audience of readers can discover their work through RSS and news aggregators, pulling readers to the newspaper's website through the strength of their content.

Grasping for the fading monopoly on local distribution will only cause it to slip away faster. Focus on the content. Embrace change.

Friday, December 24, 2004

In 2005, search becomes personal

Alfred Hermida at the BBC writes about personalized web search:
    For all these advances, search is still a clumsy tool, often failing to come up with exactly what you had in mind.

    In order to do a better job, search engines are trying to get to know you better, doing a better job of remembering, cataloguing and managing all the information you come across.

    "Personalisation is going to be a big area for the future," said [Marketing Director at Yahoo] Yonca Brunini.

    "Whoever cracks that and gives you the information you want is going to be the winner. We have to understand you to give you better results that are tailored to you."

    This is perhaps the Holy Grail of search, understanding what it is you are looking for and providing it quickly.
[via Findory]

Thursday, December 23, 2004

What has become of Orkut?

Nathan Weinberg said:
    There's no shortage of people at Google who are disappointed with the way Orkut is not catching on. Google really wanted to build a powerful community, and it isn't going to happen through Orkut.
When I read Nathan's post, I realized it had been months since I used Orkut. Like many people, I played with it a bit when it first came out, set up my little network, got in contact with a few old friends and colleagues.

It was a fun toy. But the fun died quickly. The discussion forums were useless, all noise, no signal. The messaging system was full of spam, people foolishly broadcasting inane messages out to all friends of friends. And Orkut became so slow as to be unusable (something that, I can only assume, is quite embarrassing to the rest of Google).

My visits, initially a couple times a day, dropped to once a week, then dropped off entirely.

The toy wasn't fun anymore, so I stopped playing with it. Had it been more than a toy -- if it were a useful tool that helped me with my life -- I would have stuck with it, but there was no real value to Orkut.

Checking it out again now, it seems that everyone else abandoned Orkut too. The only ones left seem to be Brazilian teenagers. Oh, Orkut. What has become of you?

Webfeeds and ease of use

Rich Gordon says using webfeeds is still too difficult for most:
    Talking to novices about webfeeds is like trying to explain the World Wide Web in 1995 to someone who'd never used a browser. But as soon as browser software became easily accessible and there was good content to view through it, the significance of the Web became clear to most everyone.

    Because the Web (and XML) already existed when RSS was invented, it was relatively easy to generate webfeeds with interesting content. But we're still waiting for the equivalent of the first Netscape browser -- the software that makes ordinary consumers ... go, "Aha. I get it."
It's not at all clear to me that ordinary users want to know what a webfeed is. They just want news. They want their news to be quick to access, easy to read, and relevant to their lives.

Focusing on webfeeds confuses the tool with the goal. Webfeeds are a means to an end, not the end itself.

Smart aggregation

Mike Davidson posts on the need for smart aggregators:
    Information overload. It’s the next big issue in publishing, and technology in general.

    With the internet still growing and changing at such a rapid rate, the raw amount of information your brain processes will see a huge increase ... The flow of information into our lives is only going up and our free time is only going down ...

    The key to our information gathering lives is all about smart aggregation. The days of media companies deciding what’s on your "front page" are numbered. Within five years, I believe customizable newsreader technology ... will be as prevalent as the web is right now.
[via The Shifted Librarian]

Wednesday, December 22, 2004

Battelle's 2005 predictions

It's that time of year again. John Battelle has his predictions for the search war for 2005. It's a great list.

Personalized news and search aren't mentioned explicitly, but are implied in the long tail (#4) and in redefining what's possible in search (#10).

The further entry of Yahoo and Google into e-commerce (#7) seems to me like a bigger threat to Amazon and eBay than John says. If Google's AdWords intrudes on classified advertising (see "Google, small business, and eBay") and Froogle becomes the place to find and buy anything online (see "Froogle adds product reviews"), eBay and Amazon will be hurt.

See also John's predictions from 2004 and how they turned out.

Innovation and the GYM triumvirate

Adam Rifkin has some great ramblings on the search wars in 2004-2005.
    In 2004, Google is the leader of GYM -- the triumvirate of Google/Yahoo/Microsoft, which in turn leads a dozen other related companies in the web-related innovations that improve peoples' lives.

    Google lays down one gauntlet after another -- a better email experience and a Gig of storage, and a better desktop experience in searching my stuff, to name two examples from 2004 alone -- and Yahoo and Microsoft follow the leader by improving their email experiences and announcing their desktop search tools. Often then others follow the troika -- even if, as in X1's and Lycos's cases they actually had desktop search before Google did, once Google plants a flag it's like a shot hearing round the world, and everyone seems like a follower.

    Together GYM and their followers offer a suite of tools that give me hope that I can manage my personal Web -- and accelerate my ability to search and research simply, to discover and find again easily, to filter and incorporate suggestions collaboratively. As the web grows, so does each of our personal Webs, and tools become not just important but critical to productivity.
And, dipping my knife into that peanut butter, a "Googlecalifragilisticexpialidocious" and "Yahoocalifragilisticexpialidocious" 2005 to you too, Adam.

News moves to the Web

"The only news source showing an increase in daily use since Gallup's 2002 poll on media usage is the Internet."

Update: More details from an article at Editor & Publisher. [via JD Lasica]

Tuesday, December 21, 2004

Yahoo and Google, vive la difference

Michael Liedtke has an AP article on the differences between Yahoo and Google:
    Google ... is devoted to ... transforming the way the world finds and stores information, even if that means sending people somewhere else.

    Yahoo ... strives to be all things for all people — a one-stop destination for recreation, work and research.

    Google ... takes a ... laissez faire approach toward innovation, embracing new ideas and products long before the company's management figures out how everything fits into the overall business plan.

    Yahoo takes a more practical approach to technology, first identifying what people want and then building or buying a product designed to give visitors one less reason to leave its Web site.
[via Andy Beal and Gary Price]

Monday, December 20, 2004

Killing comment spam

Jeremy Zawodny argues that search engines should stop using links in weblog comments for PageRank in order to reduce the incentive for comment spam.

As with e-mail spam, the basic problem is that, at least for some, the benefits of posting spam exceed the costs. So, how do you attack the problem? Increase the costs or reduce the benefits.

Not counting links in weblog comments for PageRank reduces the benefits. People won't be able to use weblog comments to inflate their PageRank.

But this alone is not sufficient. For a spammer, there's value just in having a link, or even just a product name, mentioned in a public forum. Since the costs are so low -- just like with e-mail spam -- a spammer only needs a tiny fraction of spammed people to respond to make their campaign of annoyance worthwhile.
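Put as a back-of-the-envelope formula (the symbols and numbers here are my own illustration, not from Jeremy's post): say a spammer posts N comments at a cost of c each, each comment draws on average r responses, and each response is worth v to the spammer. Then the spammer's profit is

$$\Pi = N\,(r\,v - c), \qquad \Pi > 0 \iff r > \frac{c}{v}.$$

With hypothetical values of c = 0.0001 dollars (a hundredth of a cent per automated comment) and v = 10 dollars per response, the threshold is c/v = 0.00001, or one response per 100,000 comments. Dropping PageRank credit shrinks v, but as long as c stays near zero, the ratio c/v stays tiny.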

Increasing the costs will have to be part of the solution. Spammers rely on being able to hit tens of thousands of weblogs automatically, so anything that makes this automation more difficult increases costs.

And there are many strategies out there to make weblog spam more difficult. Blacklists ban specific IP addresses from posting comments. Some weblogs require an account or a verified e-mail address before posting. Requiring commenters to enter a code from a distorted image (one that is difficult for a robot to read) is another technique. Even asking a simple question (e.g. "What's the third word in this sentence?") before posting can be enough of a hassle to block spammers if everyone asks a different question.
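To make the "everyone asks a different question" idea concrete, here's a minimal sketch of a comment gate combining an IP blacklist with a per-weblog challenge question. The class name, the question wording, and seeding the question by weblog name are all hypothetical. A real weblog package would also need to keep the expected answer on the server between showing the form and receiving the comment.

```python
import random


class CommentGate:
    """Hypothetical comment filter: IP blacklist plus a per-weblog question."""

    SENTENCE = "please type the requested word from this very sentence"

    def __init__(self, blog_name, blacklist=()):
        self.blacklist = set(blacklist)
        # Seeding by weblog name means each weblog asks its own sequence of
        # questions, so a script written for one weblog fails on the next.
        self.rng = random.Random(blog_name)

    def make_challenge(self):
        """Return (question shown to the commenter, expected answer)."""
        words = self.SENTENCE.split()
        n = self.rng.randrange(len(words))
        question = f'What is word number {n + 1} in "{self.SENTENCE}"?'
        return question, words[n]

    def accept(self, ip, expected, answer):
        """Reject blacklisted IPs and wrong answers (case-insensitive)."""
        if ip in self.blacklist:
            return False
        return answer.strip().lower() == expected
```

None of this stops a determined human, but it does raise the cost of hitting tens of thousands of weblogs automatically, which is the point.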

But this will be an ongoing problem. The full costs of spam are not borne by the spammers. As long as someone, somewhere finds comment spam rewarding, the problem will exist.

[via John Battelle and Joseph Scott]

Paper on cracking Google Desktop Search

Seth Nielson, Seth Fogarty, and Dan Wallach released a paper, "Attacks on Local Search Tools" (PDF), that discusses in detail the widely reported security flaw in Google Desktop Search.

The paper is worth reading. Most interesting are the details on the implementation of Google Desktop Search. They found:
  1. Google Desktop must be observing all outgoing network connections.
  2. Google Desktop performs packet analysis to identify HTTP proxy connections in addition to looking for direct connections to Google.
  3. The search requests did not need to originate from a web browser visiting
  4. Integration is triggered by observing outgoing packets, and occurs after packets are received, but before they are given to the web browser or application.
This is pretty cool. Google Desktop Search integrates local results into a Google search by watching for the outgoing search request to Google and then rewriting the results page that comes back before it gets to the web browser.
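Here's a rough sketch of that idea as the paper describes it, written as two hypothetical hooks: one that watches outgoing requests for a Google search and remembers the query, and one that splices local results into the matching response before the application sees it. The function names, the `/search?q=` URL check, and the `<body>` insertion point are assumptions for illustration; the real product worked at the packet level and surely differed in detail. Notice that nothing in `on_outgoing_request` checks which program sent the request.

```python
from html import escape
from urllib.parse import parse_qs, urlsplit

pending_queries = {}  # connection id -> query seen on the way out


def on_outgoing_request(conn_id, request_line, host):
    """Remember the query if this looks like a Google web search.

    Hypothetical hook; note there is no check on which program sent it.
    """
    method, target, _version = request_line.split(" ", 2)
    url = urlsplit(target)
    is_google = host == "google.com" or host.endswith(".google.com")
    if method == "GET" and is_google and url.path == "/search":
        query = parse_qs(url.query).get("q", [""])[0]
        if query:
            pending_queries[conn_id] = query


def on_incoming_response(conn_id, html, local_search):
    """Splice local results into the page before handing it to the application."""
    query = pending_queries.pop(conn_id, None)
    if query is None:
        return html  # not a search we saw go out; pass through untouched
    items = "".join(f"<li>{escape(title)}</li>" for title in local_search(query))
    block = f'<ul class="desktop-results">{items}</ul>'
    return html.replace("<body>", "<body>" + block, 1)
```

Because the trigger is "a request for a Google search went out," anything that can cause such a request and then read the response can see the injected snippets, which is essentially the hole the paper describes.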

At this point, Nielson et al. had already found the chink in the armor: the request doesn't have to come from a web browser directly. They tried a few tricks to get Google Desktop Search to show local data inappropriately, and they succeeded.
    We found that the Google Desktop personal search engine contained serious security flaws that would allow a third party to read the search result summaries that are embedded in normal Google web searches by the local search engine. While an attacker would not be able to read the victim’s files directly, the search results often contain snippets of the file results that will be visible to the attacker.
Doh. No need to panic though. Google has already patched the problem and automatically updated everyone.

Google Desktop Search's integration of the local search results into a Google web search was really clever. Ever since I saw it, I've been curious about the details of how it was implemented. This paper was an enjoyable read.

[via eWeek and InsideGoogle]

Update: Nikhil Bhatla (PM, Google Desktop Search) posts about the security patch on the official Google weblog.

Saturday, December 18, 2004

Unfortunate AdWords

El Cogote discovered Amazon, eBay, and others running silly ads on Google for many terms.

For example, a search for "misfortune" brought up an eBay affiliate ad that said, "Find it on eBay! Misfortune and much more."

It's easy to play this game with other terms. I just did a search for "fraudulent" and got an eBay affiliate ad saying:
    Low prices and huge selection!
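These ads read like the product of a fill-in-the-blank template with the search term dropped in. Here's a minimal sketch of that kind of keyword insertion, assuming that's roughly what the affiliates were doing (the template text comes from the "misfortune" ad quoted above; the blocked-terms list and function names are hypothetical), along with the obvious fix of a negative-keyword list.

```python
AD_TEMPLATE = "Find it on eBay! {keyword} and much more."

# Hypothetical negative keywords: queries containing these get no ad at all.
NEGATIVE_TERMS = {"misfortune", "fraudulent", "fraud", "scam"}


def naive_ad(query):
    """Insert whatever the searcher typed into the ad text, unchecked."""
    return AD_TEMPLATE.format(keyword=query.capitalize())


def safer_ad(query):
    """Return None (show no ad) when the query contains a blocked term."""
    if set(query.lower().split()) & NEGATIVE_TERMS:
        return None
    return naive_ad(query)


print(naive_ad("misfortune"))  # Find it on eBay! Misfortune and much more.
```

A blocklist only catches the embarrassing terms someone thought of in advance, so the better fix is to bid only on terms the advertiser actually sells.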

[via Xeni Jardin]

Friday, December 17, 2004

Business model for Bloglines

Eric Peterson (Jupiter Research) posts about Bloglines' plan for advertising:
    "AdWords on Steroids" ... Any article or feed I'm interested in [has] content that can be mined and transformed into relevant pay-per-click advertising.

    While Google and Overture sell advertising based on a limited number of keywords, the content in feeds is rich with information that can be mined to laser-target the advertising.

    [Bloglines CEO Mark Fletcher] commented that the aggregate of subscriptions could also be mined to provide additional inventory, e.g., if I subscribe to Engadget and Gizmodo there is A) a strong chance I am a personal technology person and B) I am probably subscribed to other blogs that are gadget-relevant.

    Mark's idea makes sense and is a better idea than injecting advertisements into my feeds.
There is a lot of rich data here. There is an opportunity to do well-targeted, relevant, useful, and unobtrusive advertising. We at Findory are planning something similar for our advertising engine.
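Mark's subscription-mining idea is easy to sketch. Assuming a hypothetical mapping from feeds to topics and a toy ad inventory keyed by topic (neither comes from Bloglines, and this isn't a description of Findory's engine either), inferring interests from a set of subscriptions is little more than counting:

```python
from collections import Counter

# Hypothetical mapping from feeds to topic labels.
FEED_TOPICS = {
    "engadget": {"gadgets"},
    "gizmodo": {"gadgets"},
    "searchenginewatch": {"search"},
    "boingboing": {"culture", "gadgets"},
}

# Hypothetical ad inventory keyed by topic.
ADS_BY_TOPIC = {
    "gadgets": ["Pocket MP3 player sale", "Digital camera deals"],
    "search": ["Search marketing conference"],
    "culture": ["Indie film festival"],
}


def infer_interests(subscriptions, min_feeds=2):
    """Topics supported by at least `min_feeds` of the user's subscriptions."""
    counts = Counter(t for feed in subscriptions for t in FEED_TOPICS.get(feed, ()))
    return [topic for topic, n in counts.most_common() if n >= min_feeds]


def pick_ads(subscriptions, limit=2):
    """Collect ads for the inferred topics, strongest topic first."""
    ads = []
    for topic in infer_interests(subscriptions):
        ads.extend(ADS_BY_TOPIC.get(topic, []))
    return ads[:limit]
```

Requiring two supporting feeds is a crude guard against reading too much into a single subscription; Engadget plus Gizmodo is a strong gadget signal, Engadget alone much less so.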

Thursday, December 16, 2004

Froogle adds product reviews

Stefanie Olsen at CNet reports that Google has added product reviews to Froogle. Google is aggregating the product reviews from other sites.

It's a good move by Google. A couple weeks ago, I said:
    Currently, Froogle is a price comparison engine, helping users find a specific product at a low price. With product reviews, Froogle gives users the information they need to differentiate between products, helping them find the right product at the right price. It's a much more useful service.
Froogle is suddenly moving quickly. It was only a few weeks ago that they added merchant reviews and wish lists.

[via Search Engine Watch Blog]

MIT Tech Review on Amazon Web Services

Wade Roush raves about Amazon's web services in MIT Technology Review:
    While companies such as Google and Microsoft are also experimenting with the idea of letting outsiders tap into their databases and use their content in unpredictable ways, none is proceeding more aggressively than Amazon.

    The company has, in essence, outsourced much of its R&D, and a growing portion of its actual sales, to an army of thousands of software developers ... The result: a syndicate of mini-Amazons operating at very little cost to Amazon itself and capturing customers who might otherwise have gone elsewhere.

    It's as if Starbucks were to recruit 50,000 of its most loyal caffeine addicts to strap urns of coffee to their backs each morning and, for a small commission, spend the day dispensing the elixir to their officemates.

    The strategy behind Amazon Web Services is to give programmers virtually unlimited access to the very foundation of Amazon's business -- its product database -- whether they are inside or outside the company's walls.
Web services engage creative and talented outside software developers. It's distributed research and development, reaching beyond the walls of the firm to seek innovation.

See also my earlier posts ([1] [2] [3] [4]) on Amazon web services.


Fine-grained, implicit, and anonymous

Laurianne McLaughlin comments on personalized web search in a light article in IEEE Distributed Systems:
    Enhancing personalized results is a large near-term goal for Web search.

    "Our challenge is to read a user's mind," says Daniel Read, vice president of product management for Ask Jeeves. It's an intriguing challenge, given that most Web searches today still contain just two to three words.
The example given of personalized search -- learning of a general interest in cooking and biasing all search results toward cooking -- is coarse-grained and doesn't capture the potential of personalization. Biasing all my searches toward a general subject interest isn't likely to work very well. How does my interest in cooking help when I'm searching for a camera? Fine-grained personalization focuses on your mission -- what you are doing right now -- and how to help you find what you want faster.

There's a brief mention of implicit vs. explicit personalization in the article. While it's true that implicit personalization is hard, working from sparse and noisy data, the article missed the major issue with explicit personalization: Most people won't do it. It takes work. People don't want more work. The entire point of personalization is to make things easier.

There's also a brief mention of privacy, something that can be handled by making users anonymous.

Personalized web, news, and blog search on Findory is fine-grained, implicit, and anonymous. We keep our eye on the goal, helping searchers find what they want quickly and easily.
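To make the difference between a coarse, long-term interest and a fine-grained "mission" concrete, here's a toy reranker that boosts results sharing terms with what the searcher did in the last few minutes, keyed only by an anonymous random session ID. It's a sketch under my own assumptions (plain term overlap, a 30-minute half-life, a fixed boost weight), not Findory's actual ranking.

```python
import time
import uuid

HALF_LIFE_SECONDS = 30 * 60  # assumed: a term's influence halves every 30 minutes


class Session:
    """Recent activity for one anonymous visitor (no account, no name)."""

    def __init__(self):
        self.id = uuid.uuid4().hex  # random ID, e.g. stored in a cookie
        self.events = []  # (timestamp, set of terms) from queries and clicks

    def record(self, text, now=None):
        """Remember the terms of a query typed or a result clicked."""
        ts = time.time() if now is None else now
        self.events.append((ts, set(text.lower().split())))

    def term_weights(self, now=None):
        """Sum of exponentially decayed weights per term."""
        now = time.time() if now is None else now
        weights = {}
        for ts, terms in self.events:
            decay = 0.5 ** ((now - ts) / HALF_LIFE_SECONDS)
            for term in terms:
                weights[term] = weights.get(term, 0.0) + decay
        return weights


def rerank(results, session, boost=1.0, now=None):
    """results: list of (title, base_score). Boosts titles with recent terms."""
    weights = session.term_weights(now)

    def score(item):
        title, base = item
        terms = set(title.lower().split())
        return base + boost * sum(weights.get(t, 0.0) for t in terms)

    return sorted(results, key=score, reverse=True)
```

A minute after someone searches for "canon digital camera reviews," a result titled "Canon digital camera prices" jumps ahead of "Camera obscura history"; ten hours later the boost has decayed to almost nothing and the original order returns. A general interest in cooking never enters into it.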

[via Gary Price]

Blinkx TV

Blinkx announces their new search for TV clips called Blinkx TV. More information on their About page.

A search for "Jon Stewart" turns up a number of mentions of the Daily Show host on several news programs (but no clips of the Daily Show). Fun and probably useful for some.

Blinkx's core technology is implicit search, finding information you need automatically without you having to search for it. I'd assume the idea behind their TV search is to automatically surface TV clips if they're relevant to your current task.

For example, if you're reading a web page on RSS, perhaps they would surface some relevant video clips of news programs talking about RSS. It's a very hard problem, but it'd be pretty cool if they can do it right.
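Blinkx hasn't said how this works, so as a guess at the general shape of implicit search, here's a sketch that takes the text of the page you're reading and ranks TV clips by TF-IDF weighted overlap with their transcripts. The clip IDs, transcripts, and function names are hypothetical, and a real system would need stemming, phrase handling, and far better text than a toy tokenizer provides.

```python
import math
import re
from collections import Counter


def tokenize(text):
    return re.findall(r"[a-z0-9]+", text.lower())


def suggest_clips(page_text, transcripts, max_clips=3):
    """Rank clips by TF-IDF weighted overlap with the page being read.

    transcripts: dict mapping a clip id -> transcript text.
    """
    page_tf = Counter(tokenize(page_text))
    clip_terms = {clip: set(tokenize(text)) for clip, text in transcripts.items()}
    n = len(clip_terms)

    def idf(term):
        # Terms found in every transcript (like "the") get zero weight.
        df = sum(term in terms for terms in clip_terms.values())
        return math.log(n / df) if df else 0.0

    scores = {}
    for clip, terms in clip_terms.items():
        s = sum(count * idf(t) for t, count in page_tf.items() if t in terms)
        if s > 0:
            scores[clip] = s
    return sorted(scores, key=lambda c: (-scores[c], c))[:max_clips]
```

The hard part isn't this scoring; it's deciding when a match is strong enough to interrupt you, since surfacing an irrelevant clip is worse than surfacing nothing.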

Google supposedly is also working on TV search, but it's still vaporware.

[via Search Engine Journal]

Update: Gary Price says Blinkx has always had video search and that the only thing new here is the standalone web interface. I didn't realize that.

Gary also describes how the search works (it searches the transcripts of the TV broadcasts) and points out some other small tools that do something similar.

Wednesday, December 15, 2004

Google getting off task?

After hearing about Google's library project, David Coursey at eWeek says that Google is losing its focus.
    My Google searches today are significantly less useful than the searches I made just a year ago. This is partially a reflection of the ever-increasing size of Google's collection, but it also shows how information providers have learned to spoof Google's robotic system.

    I'd rather see Google concentrate on getting search right than trumpet how much is being added to its sea of information.

    The company's first task should be throwing us a line, not building a bigger ocean.
Help me find focus in a flood of information. Help me find order in chaos. Help me find what I need.

That is the opportunity and the challenge.

[via Search Engine Watch Blog]

The threat to Microsoft

Joe Wilcox (Jupiter Research) explains why Microsoft considers Google a threat:
    The real threat remains the Web and how a vendor like Google has found a new way to exploit the Internet's utility beyond Windows.

    Search is one of several mechanisms (fast data connectivity is another) that could catalyst alternative platforms. Search would give tremendous utility to portable devices connected to the Internet or home or corporate networks. With so much computing focus on information and so much information stored somewhere else (meaning not locally), ubiquitous search could unify the utility of many disparate types of devices.

    So like Microsoft integrated the browser into Windows to fight off the threat posed by the Web, so the company is looking to tie the utility of search to its operating system. Because any technology utility where no Windows is required threatens Microsoft's core franchise.
The threat is much larger than just Google. It's about the future of Windows as the dominant computing platform.

Microsoft has been fighting this battle for many years. They worried about the rising power of handheld devices like Palm Pilots and cell phones, so they launched Windows CE. They worried about the additional functionality being built into game consoles, so they launched Xbox. They worried about the rise of entertainment devices like TiVo and Replay, so they launched Windows Media Center. They worried about the threat from web-based applications, so they launched IE and MSN.

The latest shining star is Google. It's popular to talk about the search war as involving just two players, Microsoft and Google. In fact, the search war involves many players: Google, Yahoo, AOL, Microsoft, Amazon, and many smaller firms. And, the search war is only one front in the broader war Microsoft must fight to continue its dominance.

Tuesday, December 14, 2004

Google's war with Microsoft

Charles Ferguson publishes a long article in MIT Tech Review on Google's "war with Microsoft":
    Google's defeat is not a foregone conclusion. Indeed, if it does everything right, it could become an enormously powerful and profitable company, representing the most serious challenge Microsoft has faced since the Apple Macintosh. But if Microsoft gets serious about search -- and there is every reason to believe that it will -- Google will need brilliant strategy and flawless execution simply to survive.

    What should Google do? Google should understand that it faces an architecture war and act accordingly. Its most urgent task must be to turn its website into a major platform, as [Amazon has] already done.

    Google should first create APIs for Web search services and make sure they become the industry standard. Second, it should spread those standards and APIs, through some combination of technology licensing, alliances, and software products, over all of the major server software platforms, in order to cover the dark Web and the enterprise market.
The Microsoft giant is awake, says Charles, and it's hungering for a Google snack.

The impressive Google cluster is one part of Google's competitive advantage. I'm curious to see if Google does start offering better web services APIs. I'd certainly love to get my hands on that juicy cluster.

But will Google lose the search war if it doesn't offer better web service APIs? I doubt it.

Google has an impressive track record of innovation on its own. Amazon has web services APIs because it is seeking outside developers to boost innovation. Yahoo is considering them for similar reasons. But it's not clear Google has a problem with innovation. Google's biggest problem seems to be getting its many internal innovations out the door and into the hands of the public.

Furthermore, Google's lifeblood is advertising. Google is in the middle of building an advertising revolution. I think it is the AdSense revolution that will empower small websites and businesses, wrapping them around Google, not a software API into Google's infrastructure.

That being said, I do expect Google to launch services that allow users to further exploit the power of the Google cluster. But I expect these to be finished services like Gmail that target end users, not web services targeting developers.

[via Gary Price]

Monday, December 13, 2004

Google Library

John Battelle reports that Google is digitizing the collections of major libraries.
    Google is working with Stanford, the University of Michigan, Harvard, Oxford, and the New York Public Library to make millions of books available in its index.

    The idea that the world's knowledge, as held through books and libraries, is opening up to all via a web browser cannot be understated. It's one thing to have an original copy of The Origin of Species on the shelves, where students and interested parties have to travel to find it. It's another to have it available to everyone via a search index and your web browser.

    This could well be a step toward diversifying Google's revenue streams away from advertising and into ... the content business ... Google is not doing this only out of the kindness of its heart - there is a lot of money to be made in selling books, in particular books with no copyright.
Are you paying attention to this, Amazon?

Update: Felicia Lee from the New York Times writes about Google Library and the reaction to it from scholars.

Update: Tara Calishain posts that the Internet Archive is expanding its text archive with support from ten international universities.

Ask Jeeves desktop search

Andy Beal says Ask Jeeves will be launching their desktop search on Wednesday and gives some details.

Goodie. Now everyone's got one. Google, MSN, Ask, Yahoo, and (soon) AOL.

I'm not sure why these companies are launching poorly differentiated products into a crowded space. Google has some nifty integration with their web search that I kind of like. Other than that, I can't tell the difference between these offerings.

Even if one were better, it's not clear that'd be enough. The only reason this market opportunity exists is that the default search in WinXP and MS Office is poor. As soon as MSN integrates their desktop search into Windows, this game is over.

Usama Fayyad joins Yahoo

Usama Fayyad led the data mining group at Microsoft Research a few years ago. He just joined Yahoo.

He and his group were relatively early in applying statistical analysis techniques to massive data sets. I remember particularly liking his "Scaling Clustering Algorithms to Large Databases" paper.

Yahoo certainly has plenty of juicy, tasty data. Mmm... data.

[via Yahoo Search Blog]

MSN desktop search

Microsoft joins the desktop search party.

MSN seems to be using Lookout (a company they acquired) for much of their desktop search. Yahoo is licensing X1's desktop search. AOL will be using Copernic. Only Google decided to build their own.

MSN's entry is a little amusing since, aside from searching your browsing history, it seems like most of what they're doing is just fixing the miserable file search functionality built into WinXP and MS Office products.

I do find the hype behind desktop search mystifying. You're searching a few thousand files and e-mails on a desktop box. The major problem is grokking thousands of different file formats: painful, sure, but not exciting. With web search, it's the scale -- billions of documents -- that makes the problem interesting.
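To see why desktop search is more tedious than hard, here's a minimal sketch of the core of one: an inverted index plus a registry of per-format text extractors. Only plain text is handled. The extractor table, the function names, and the in-memory list of files are assumptions for illustration; in a real product nearly all the work would go into the missing extractors for Word documents, PDFs, mail stores, and the rest.

```python
import re
from collections import defaultdict
from pathlib import PurePath


def _decode_text(data):
    return data.decode("utf-8", errors="ignore")


# Hypothetical registry: the tedious part is writing one of these per format.
EXTRACTORS = {".txt": _decode_text, ".md": _decode_text}


def tokenize(text):
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(files):
    """files: iterable of (path, raw bytes). Returns (term -> paths, skipped)."""
    index = defaultdict(set)
    skipped = []
    for path, data in files:
        extract = EXTRACTORS.get(PurePath(path).suffix.lower())
        if extract is None:
            skipped.append(path)  # a format we can't parse yet
            continue
        for term in set(tokenize(extract(data))):
            index[term].add(path)
    return index, skipped


def search(index, query):
    """AND query: paths containing every query term."""
    terms = tokenize(query)
    if not terms:
        return set()
    return set.intersection(*(index.get(t, set()) for t in terms))
```

At a few thousand files, the whole index fits comfortably in memory on one machine; nothing here needs the distributed machinery that web-scale search does.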

That being said, there is some interesting innovation going on in desktop search. Dashboard and Blinkx are trying to do personalized information retrieval. They have early, first steps toward making your computer figure out what you are doing and what information might be helpful for that task. Very cool.

Saturday, December 11, 2004

Turning noise to knowledge

A few days ago, Findory started personalizing the recent articles on our source pages.

For example, when I go to Wired magazine on Findory, because of my reading history, two articles are marked as personalized, "Troops stay in touch on the internet" and "Yahoo searches desktops too". When I go to Scobleizer on Findory, four articles are highlighted for me.

The problem with current web feed readers is that they don't solve the information overload problem. Sure, I can pick and choose which RSS feeds I subscribe to. But, once you have tens of subscribed feeds, reading them becomes this cumbersome process. Click on a feed, skim the articles. Anything interesting in that one? No. Click, skim. Click, skim. Click, skim. Ugh.

With Findory, the important news bubbles to the top. On the home page, interesting articles are selected just for you, pulled from thousands of news sources and blogs. On a search, relevant articles are highlighted. When you read a blog on Findory, important posts are highlighted.

Current RSS readers merely reformat XML for display. That isn't enough. They need to filter and prioritize. Show me what matters. Help me find what I need. Next-generation RSS readers will be personalized.
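The filtering itself doesn't have to be exotic to beat click-and-skim. As a toy illustration (not Findory's actual algorithm), here's a sketch that merges every subscribed feed into one list and floats up items whose titles share terms with articles the reader clicked on before. The term-overlap scoring, the thresholds, and the sample feeds are assumptions.

```python
import re
from collections import Counter


def tokenize(text):
    return re.findall(r"[a-z0-9]+", text.lower())


def interest_profile(read_titles):
    """How many previously read titles each term appeared in."""
    return Counter(t for title in read_titles for t in set(tokenize(title)))


def prioritize(feeds, read_titles, top_n=5, min_score=1):
    """Merge items from all feeds into one list, most relevant first.

    feeds: dict mapping a feed name -> list of item titles.
    Items scoring below `min_score` are dropped instead of shown.
    """
    profile = interest_profile(read_titles)
    scored = []
    for feed, titles in feeds.items():
        for title in titles:
            score = sum(profile[t] for t in set(tokenize(title)))
            if score >= min_score:
                scored.append((score, feed, title))
    scored.sort(key=lambda x: (-x[0], x[1], x[2]))
    return [(feed, title) for _, feed, title in scored[:top_n]]
```

One merged, ranked list replaces a click per feed; the unread remainder is still there if you want it, just not in your way.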

This is about more than just reading news. This is about information. Where before there was an undifferentiated glut of information, now there is focus. Where before there was noise, now there is knowledge.

What will this future look like? Findory has taken the first steps. Come and take a look.

Friday, December 10, 2004

Yahoo, Google, and those pesky humans

In a long post, John Battelle describes the difference between Yahoo and Google:
    Yahoo is a natural media company - the company is willing to have overt editorial and commercial agendas, and to let humans intervene in search results so as to create media which supports those agendas. Google, on the other hand, is repelled by the idea of becoming a content- or editorially-driven company.

    While both companies ... lay claim to the mission of "organizing the world's information and making it accessible" ... they approach the task with vastly different stances.

    Google sees the problem as one that can be solved mainly through technology - clever algorithms and sheer computational horsepower will prevail. Humans enter the search picture only when algorithms fail.

    But Yahoo has always viewed the problem as one where human beings, with all their biases and brilliance, are integral to the solution ... Humans first, technology second.
See also my earlier post, "Humans vs. Robots == Yahoo vs. Google".

Jeff Barr and Amazon web services

Earlier today, I had a chance to chat with Jeff Barr. Jeff evangelizes web services for Amazon and runs the web feed directory Syndic8.

It's remarkable what Amazon exposes through their web services. I'm particularly surprised that they provide access to all of Amazon's customer reviews.

All of this has resulted in some clever applications. All Consuming, Amazon Lite, and Delicious Library are my favorites.

See also my earlier post, "May the best services win".

Yahoo desktop search coming soon

John Battelle was quick to announce Yahoo's new desktop search product. They decided to buy rather than build. Yahoo licensed X1's search application and are rebranding it as their own.

Todd Bishop (Seattle PI), Danny Sullivan (Search Engine Watch), and Charlene Li (Forrester Research) have informative details.

See also Gary Stein's comments after Copernic was acquired when he said, "Desktop Search has fully entered into the world of hype ... There's not really a revenue model from Desktop Search."

Google Suggest

Google Labs just released Google Suggest, a little tool that tries to guess your search query as you type it in.

It appears to be a simple UI layer over their spelling correction feature. No real query refinement, no clustering to offer suggestions of different terms (synonyms, related topics).

A cute toy, but I don't see anything really interesting here.

[via Nathan Weinberg and Danny Sullivan]

Update: Chris DiBona points to the ABC's of Google Suggest. Cute.

Update: Okay, I take it back. Looking at this more, I'm impressed, not with the data, but with the UI. Google is using some clever Javascript tricks (you can see them in the page's code) to constantly talk back to the server and retrieve data about how to expand your search string.

Neat-o-jet. Like GMail, this is a remarkable use of Javascript to create a simple, clean, functional UI within the web browser.
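How Google computes its suggestions isn't described here; the browser side just keeps asking the server how to expand the string typed so far. A minimal sketch of what a server-side lookup might look like, assuming a hypothetical log of past queries with their counts. `PrefixSuggester` and the sample data are made up for illustration, not Google's implementation.

```python
import bisect
import heapq

class PrefixSuggester:
    """Hypothetical server-side lookup: complete a typed prefix from past queries."""

    def __init__(self, query_counts: dict[str, int]):
        # Assumes query strings are already lowercased.
        self.counts = query_counts
        self.sorted_queries = sorted(query_counts)

    def suggest(self, prefix: str, k: int = 5) -> list[str]:
        prefix = prefix.lower()
        # All queries sharing a prefix sit in one contiguous run of the sorted list.
        start = bisect.bisect_left(self.sorted_queries, prefix)
        matches = []
        for query in self.sorted_queries[start:]:
            if not query.startswith(prefix):
                break
            matches.append(query)
        # Offer the k most popular completions.
        return heapq.nlargest(k, matches, key=self.counts.__getitem__)

suggester = PrefixSuggester({"google maps": 80, "google suggest": 50,
                             "google news": 30, "gmail": 40, "findory": 10})
print(suggester.suggest("Goo", k=2))
```

Each keystroke would trigger one such call, which is why keeping the per-request work small (a binary search plus a short scan) matters.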

Update: Kevin Gibbs talks about Google Suggest on the Google Blog. It's another 20% free time project. Cool.

Update: Chris Justus reverse engineers the Javascript for the Google Suggest UI in a detailed post. Very interesting and worth reading.

Update: A clever knock-off of Google Suggest does the same nifty UI for lookups in a dictionary. Nice work.

Update: And a nifty version of the suggest UI for CPAN (a Perl module archive). [via Joseph Scott]

Thursday, December 09, 2004

Personalized search and social networks

Chris Sherman writes about the new partnership between Eurekster and Friendster. I suppose these "-ster" companies just can't help but get together.

Their new search engine personalizes Yahoo search results using your Friendster social network. From Chris' article:
    Search results are prioritized with results viewed by anyone in your personal network appearing at the top of the list. These results are highlighted with a smiley face icon.
It's an interesting approach, but it remains to be seen how well it works. On the one hand, you trust your friends, so things your friends clicked on might be interesting for you to know about.

On the other hand, I'm not sure how often this will change the search results, whether the changes will focus your attention on the most relevant result for your search, and whether it is scalable to access search and clickstream history for everyone in your social network on every web search you do.

Nevertheless, it's an interesting development, an unusual use of a large social network to do a version of personalized web search.
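The reranking rule quoted from Chris' article is simple to sketch. This assumes `results` is the ranked list of URLs from the underlying engine and `viewed_by_network` is a hypothetical, already-computed set of URLs that anyone in your network has clicked. Building that set quickly on every search is exactly the scalability question raised above, and this sketch doesn't attempt it.

```python
def rerank_with_network(results: list[str],
                        viewed_by_network: set[str]) -> list[tuple[str, bool]]:
    """Move results anyone in your network viewed to the top, flagging them.

    Within each group the engine's original order is kept. The boolean flag
    marks results that would get the smiley-face icon.
    """
    flagged = [(url, True) for url in results if url in viewed_by_network]
    others = [(url, False) for url in results if url not in viewed_by_network]
    return flagged + others

print(rerank_with_network(["a.com", "b.com", "c.com", "d.com"], {"c.com", "a.com"}))
```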

Amazon UK DVD rentals

Amazon UK launches DVD rentals.
    "Amazon is determined to be the best place to rent DVDs -- online or off," CEO Jeff Bezos said.
See also "Is Amazon the new Netflix?"

Wednesday, December 08, 2004

A dime per search

John Battelle posts incredible numbers on the performance of Google AdWords:
  • Average payment per click on a Google ad is $.54.
  • Nearly 17% of searches end in a click on an ad.
Combining these two pieces of data, it appears Google makes $.09 per search.

That's right. Every search you do on Google generates nearly a dime of revenue for the company.
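The arithmetic behind that estimate, as a quick check using the two figures above. Since the click rate is "nearly" 17%, the true figure is slightly under the result.

```python
# Figures quoted in the post.
avg_payment_per_click = 0.54  # dollars
ad_click_rate = 0.17          # fraction of searches that end in an ad click

# Expected ad revenue per search = price per click x chance of a click.
revenue_per_search = avg_payment_per_click * ad_click_rate
print(f"${revenue_per_search:.4f} per search")  # prints $0.0918 per search
```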

Impressive numbers. And they're only expected to go higher as demand for internet advertising increases.

Google has firmly established that relevant, useful, and unobnoxious advertising can be stunningly lucrative.

Update: John Battelle reports that the latest numbers (Oct 2005) from Google show a 33% increase, now $0.12 per search. Impressive.

Tuesday, December 07, 2004

MSN vs. eBay

Neowin reports that MSN will be launching a new e-commerce service called Messenger Marketplace:
    Buy and Sell within social network, also list wants and share recommendations. List items you want to sell, things you are looking for, and your recommendations. Your buddies notice new items you’ve listed when they login. They can either buy, sell or refer you to one of their buddies.

    It is like eBay except with people that you already know and trust directly (or a few degrees out).
Clever idea. Hard to do this without already having a good social network built, but MSN does have that for MSN Messenger users.

It's starting to look like eBay will have some new competition.
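The "a few degrees out" part amounts to a breadth-first search over the buddy graph. A minimal sketch, assuming a hypothetical dict mapping each person to their buddies and listings stored as (seller, item) pairs; this is an illustration, not how MSN described building Messenger Marketplace.

```python
from collections import deque

def within_degrees(buddies: dict[str, list[str]], me: str,
                   max_degree: int) -> dict[str, int]:
    """Return {person: degree} for everyone within max_degree hops of me."""
    degree = {me: 0}
    queue = deque([me])
    while queue:
        person = queue.popleft()
        if degree[person] == max_degree:
            continue  # don't expand past the allowed distance
        for friend in buddies.get(person, ()):
            if friend not in degree:
                degree[friend] = degree[person] + 1
                queue.append(friend)
    del degree[me]
    return degree

def visible_listings(listings: list[tuple[str, str]], buddies: dict[str, list[str]],
                     me: str, max_degree: int = 2) -> list[tuple[str, str, int]]:
    """Keep only items listed by sellers in my network, tagged with their distance."""
    network = within_degrees(buddies, me, max_degree)
    return [(seller, item, network[seller]) for seller, item in listings
            if seller in network]

buddies = {"me": ["ann"], "ann": ["bob"], "bob": ["cat"]}
listings = [("bob", "bike"), ("cat", "lamp"), ("ann", "desk")]
print(visible_listings(listings, buddies, "me"))
```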

[via Todd Bishop]

Human editors and web search

Danny Sullivan (Founder of Search Engine Watch) says the solution to manipulation of search result rankings is to:
    ... involve human editors as part of the search equation. At one time, several search engines allowed human beings to make editorial choices about what would be shown in response to a query, to complement technological selections. Today, all the major services have sadly followed Google's lead in assuming all things can be solved through automation and search algorithms.
I assume Danny doesn't literally mean human editors hardcoding which results are returned for queries. How do human editors scale to billions of web pages? How do you do this efficiently and effectively, at low cost with high quality?

You might imagine that humans could provide canned responses to the most frequent queries. But this would only apply to a small subset of queries, and even this would be prohibitively expensive to maintain.

A more scalable and more common form of this is shortcuts, where a search engine detects particular categories of queries and returns some results from a specialized data source. This is automated, of course, but humans are involved in identifying and creating the shortcuts.
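A minimal sketch of that division of labor, assuming hypothetical hand-written rules: people pick the query patterns and the specialized source for each, and matching incoming queries is automatic. The rule set and source names below are made up for illustration.

```python
import re
from typing import Optional

# Hypothetical rules written by people: (query pattern, specialized source).
SHORTCUT_RULES = [
    (re.compile(r"^weather\s+\S+", re.IGNORECASE), "weather"),
    (re.compile(r"^\d{5}$"), "local_map"),         # looks like a US ZIP code
    (re.compile(r"^[A-Z]{1,5}$"), "stock_quote"),  # all-caps, ticker-like token
]

def detect_shortcut(query: str) -> Optional[str]:
    """Return the specialized source for a query, or None for plain web results."""
    q = query.strip()
    for pattern, source in SHORTCUT_RULES:
        if pattern.match(q):
            return source
    return None

for q in ["weather Seattle", "MSFT", "98101", "personalized search"]:
    print(q, "->", detect_shortcut(q))
```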

I do wonder how much this debate of human vs. robots is a real issue. Truth be told, search engines have teams of good ol' humans analyzing data behind the scenes. These humans discover patterns in the data that are lowering the quality of the relevance rank, such as search engine spam, and change the algorithms to adapt.

Is this different than using "human editors as part of the search equation"?

Thunderbird launches

Thunderbird 1.0, the Mozilla e-mail client, is out. Excellent.

Interesting that it includes an RSS reader. It's increasingly common to see RSS readers integrated in e-mail clients (Thunderbird), web browsers (Firefox, Safari soon), and portals (My Yahoo).

[via Steve Rubel and Eric Bangeman]

Blocking RSS advertising?

Kottke discusses blocking or filtering advertising in RSS feeds.

[via therssweblog and Jeremy Zawodny]

Monday, December 06, 2004

Most hated advertising

Jakob Nielsen summarizes research on users' perceptions of online advertising and which advertising practices (popups, playing sound, blinking) are most annoying.

Jakob ends with some "Lessons for Websites":
    Sites that accept advertising should think twice before accepting ads that 80 to 90% of users strongly dislike. The resulting drop in customer satisfaction will damage your long-term prospects.

    Advertisers themselves might be tempted to continue with these nasty design techniques as long as they can find sites that will run them. After all, they typically yield higher clickthrough rates. But clickthrough is not the only goal. Users who are deceived into clicking on a misleading ad might drive up your CTR, but they're unlikely to convert into paying customers. And your brand suffers a distinct negative impact when you antagonize customers.
See also my earlier post, "Bringing sense to web advertising".

[via Alex Edelman]

Update: Jeff Boulter slams Yahoo for their annoying advertising.

Affiliate tags in search engines

Rayg (from Feedster) asks, "If search engines are so willing to pimp their space to sponsored links, why ... [not add] an affiliate ID in the [search result] links?" Ray goes on to argue that the move would largely go unnoticed and wouldn't damage credibility.

Jeremy Zawodny (from Yahoo) disagrees, saying that this would blur the lines between sponsored and non-sponsored results too badly. Even if the relevance rank is unbiased by the affiliate revenue, the perception that some links are paid would damage the credibility of the search engine.

I have wondered if Yahoo and Google have considered adding affiliate links, not to their search engine, but to their metashopping searches (Yahoo Shopping and Froogle).

But Jeremy's point probably applies to shopping search as well. It would look like a conflict of interest and potentially damage credibility, even if the affiliate revenue did not influence their relevance rank.

Sunday, December 05, 2004

Google Reviews?

Gary Price reports that Google recently registered a few new domains, including

As Nathan Weinberg points out, there isn't really a Google Reviews product out there yet. Closest to it is the merchant reviews spidered into Froogle.

I'm curious to see if Google will be releasing a review service that is more of a competitor with Epinions, CitySearch, Zagat, and's customer reviews. What would this look like?

One version of this could be a combination of product reviews spidered from the web and reviews entered by Google users. I imagine these product reviews would be integrated into Froogle, supplementing the store reviews. Currently, Froogle is a price comparison engine, helping users find a specific product at a low price. With product reviews, Froogle gives users the information they need to differentiate between products, helping them find the right product at the right price. It's a much more useful service.

Or perhaps we'll see small business merchant reviews integrated into Google Local, as Yahoo Local already has done. Currently, Google Local is essentially the Yellow Pages, helping users find a local merchant. With merchant reviews, Google Local would help users differentiate between merchants and find the right merchant for the task. Another much more useful service.

I'd expect to see both of these from Google soon.

Le Monde offers weblogs

Le Monde is now offering weblogs, reportedly making it "the largest newspaper in the world to do so."

Robert Andrews says, "Sacre bleu! C'est magnifique! [Good heavens! It's magnificent!] Because Le Monde now provides weblogs to its readers under the brand of the newspaper."

Susan Mernit praises the move and then slams the rest of the news media, saying, "US newspapers, how about having some courage yourselves?"

Friday, December 03, 2004

Google, small businesses, and eBay

Fortune has an interview with Eric Schmidt (CEO of Google). It's interesting and worth reading.

One particular quote on helping small businesses caught my eye:
    The longer term goal is to have businesses give us very timely local information. So, for example, they'll say we have too much of this or too much of that product, and we want to have a sale. The goal is to have the computers arrange that real time and send out targeted advertising to interested parties nearby.
Eric is saying that they want to help small businesses tell interested people about individual products.

Is this more than just targeted advertising? To me, it's starting to look like allowing merchants to use Google to sell their products.

What do I mean? Consider eBay for a moment. What is eBay really? It's classified advertising. Small merchants post advertisements for their products on eBay. Buyers come to eBay, find what they need, and close the sale, often using a non-eBay site for payment. eBay's business is essentially classified advertising.

Now, look back at what Eric Schmidt said. Google will help merchants target individual products to interested people. Advertising at this level of granularity is very similar to eBay's product. Using Froogle, AdWords, and AdSense, small merchants could sell their products through Google instead of eBay.

[via John Battelle and Gary Price]

Update: Three years later, a BusinessWeek article reports the eBay "magic is gone ... Shoppers are simply not buying all the inventory anymore. Some items languish without a single bidder. Many shoppers opt for other sites including, use sophisticated search engines such as Google and Yahoo!, or head to store sites directly."

eBay and Craigslist vs. traditional media

Steve Rubel predicts eBay and Craigslist will merge in 2005 and "usher in a new era where citizen journalism is directly funded by person-to-person commerce."

Steve also notes that eBay and Craigslist "have already eaten away at one of the core underpinnings of big media - the classified advertising dollar" and quotes Dan Gillmor as saying:
    The real threat to traditional journalism isn't blogging. It's eBay, the largest classified ads publisher.
See also my earlier posts ([1] [2]) on Craigslist.

Thursday, December 02, 2004

Data-driven headlines

Eric Peterson is one of many to look at what's going on at Las Ultimas Noticias. The Chilean newspaper is using click data to see what stories are popular and picking headlines for the next day's paper based on that data. A clever use of online data (clicks on their website) to change an offline publication (their print newspaper).

As Danna Harman describes, the technique has been part of turning the paper from "a middle-of-the-road piece of nothing" into "Chile's most widely read newspaper today."

But some are concerned that newspapers "just cater to the lowest common denominator" if they use this kind of data without exercising good judgment.
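The basic mechanism is easy to sketch: count clicks per story and promote the most-clicked. The `vetoed` parameter is a hypothetical stand-in for the editorial judgment critics are asking for; nothing here describes Las Ultimas Noticias's actual process.

```python
from collections import Counter

def pick_headlines(click_log: list[str], n: int = 3,
                   vetoed: frozenset[str] = frozenset()) -> list[str]:
    """Return the n most-clicked stories, skipping any that editors vetoed."""
    clicks = Counter(story for story in click_log if story not in vetoed)
    # most_common breaks ties by first appearance in the log.
    return [story for story, _ in clicks.most_common(n)]

log = ["soccer", "election", "soccer", "scandal", "election", "soccer"]
print(pick_headlines(log, n=2))
print(pick_headlines(log, n=2, vetoed=frozenset({"soccer"})))
```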

Wednesday, December 01, 2004

We're on a roll!

Findory has grown tremendously in the last month:

Personalized Search: We're the first commercial search engine to modify web search results in real-time based on the searcher's behavior. Our changes are modest for now, but will increase over the coming months. Our News and Blogs search engine is also personalized.

New Look: Our redesigned website combines Findory News and Blogory, making it easy for you to keep up with current events.

Source Pages: Now every news source and blog has its own page. This makes it easy to find recently published articles. Our readers are also using it to discover related articles and explore related sources.

Search History: Findory keeps track of all your web, news, and blog searches in one convenient place, so you can easily retrace your steps.

Fountains of Feeds: Findory content, both personalized and unpersonalized, is now accessible from 44 categorized RSS feeds.

Findory Blogs by E-mail: Weblog headlines are now available as a daily e-mail delivery, alongside Findory News daily e-mails.

Last but not least, just today we launched My Recent Sources, a new feature which makes it easy for you to keep track of what news and blog sources you've been reading.

We're so pleased by the positive reaction to all our hard work. In the last month, traffic to Findory more than doubled! Thanks to all our readers for using our service and providing valuable feedback.

The magic behind Google

Matt Loney at ZDNet UK has a fantastic article that describes Google's infrastructure and their scalability and reliability challenges. Very much worth reading.

If you want more, much of this is covered in more detail in the Google Cluster Architecture and Google File System papers.

[via Andy Beal]