Monday, January 31, 2005

Personalization and the future of search

Richard Waters at the Financial Times wrote an excellent article on the future of search.

Craig Silverstein (data mining researcher, now a Director at Google) has a quote that leads into a discussion of personalized search:
    "It's clear that a list of links, though very useful, doesn't match the way people give information to each other ... How can the computer become more like your friend when answering your questions?"

    That means giving direct answers to questions, extracting data from online sources rather than giving links to web pages. It also means doing a better job of divining what the searcher is looking for, tailoring results more closely to what, based on past experience, appear to be the user's particular interests.
How can the computer be more like your friend? It has to know you. Like a friend would.

Your friend will give you and someone else different answers to the same question. Why? Because your friend knows you and what you want without you having to say everything explicitly. Information is implied in the long history of your interactions. Not everything has to be stated. Your friend knows what you mean.

So, why does your search engine give the same search results to you as it does to everyone else? Shouldn't it know you? Shouldn't it help you? Like a friend would?

[FT article via John Battelle]

Sunday, January 30, 2005

For feeds, one-click is one click too many

Tim Bray writes about how to make RSS easier for mainstream users. The problem, he says, is:
    All over the web, you see these ugly little orange "XML" stickers, but to subscribe you have to get the URI they're pointing at and paste it into your feed-reader somehow, which is awkward and maybe a little too geeky for non-geeky ordinary people.
Tim proposes a solution, a one-click button that works by sending an "application/atom+xml" data stream that your feed reader uses to subscribe to the feed.

Unfortunately, this only works for client-side readers. Web-based readers like Bloglines, My Yahoo, and My MSN are stuck with the current solution, scattering their little "subscribe on X" buttons across blogs everywhere.

But, stepping back for a second, is Tim solving the right problem? Is the problem that it's too hard to cut-and-paste a URL? Sure, that's awkward, but it's not the core problem.

The problem with RSS is that it's too difficult to find interesting feeds.

Right now, feed readers are used by early adopters. Early adopters love to tinker. They love pain. They'll spend hours fiddling with their feed reader, hunting down and adding new feeds. They'll deal with the hassle of manually skimming through thousands of articles to ferret out the few gems that interest them. Mainstream readers won't.

A one-click button makes it easier to add a good feed once you've discovered it, but it does nothing to help you discover good feeds in the first place.

Next-generation feed readers, feed readers built for the mainstream, will solve this problem by abstracting away the cryptic and unimportant details of the data feed. Next-generation feed readers will help readers discover interesting news and weblog articles without caring where the data comes from.

What will these next-generation feed readers look like? Our vision of it is at Findory.

See also "XML is for geeks" and "Getting your grandmother to use RSS".

Saturday, January 29, 2005

Google chasing digital identities

Niall Kennedy posted recordings of a few talks at Future Salon, including one by Eric Sachs (PM at Google).

Eric made some intriguing comments about your digital identity, the information online about who you are and what you do. He started by motivating the problem:
    We [at Google] quite frequently hear that what doesn't work well is doing a Google search about someone else.
So, just searching for someone by first and last name often doesn't work very well. And that's a very common search; it needs to work well.

The problem isn't trivial to solve. For example, a home page by itself isn't enough if you can't find it. Even when you can find data, the information is often stale and incomplete. Finally, there are nasty issues with separating identities, differentiating between people with the same or similar names.

Eric even went as far at one point as to claim that the "primary purpose" of both Orkut and Blogger is to solve this problem, to help people "define some type of digital identity on the internet."

One interesting approach to this problem is Eliyon People Search. They try to extract information about people from web pages. "Try" is the key word here -- it's far from perfect -- but the idea of summarizing web pages to do people search has some promise. I wouldn't be surprised to see something similar from Google at some point.

Findory helps bloggers

How's that blog of yours going?

If you're like most bloggers, you sometimes might get a little frustrated with your weblog. Sometimes you might feel like no one reads your blog. Maybe you can't find anything to write about. Let Findory help.

Want traffic? Add your Blog to Findory. We'll show your posts to our hungry readers and send 'em your way.

Want to improve your blog? Add Inline Findory to your page and show your readers interesting news every time they come. You can even show your readers your personalized news, your personalized top headlines based on your reading history.

Looking for something to blog about? You need some interesting news. No, not the mainstream stuff that you see on every news site with an AP feed. You need to dig deep and find things people haven't seen. You need Findory. Read a few articles, teach Findory your interests, and Findory will pull up interesting news and weblog posts that you'd otherwise miss.

Want even more news? Subscribe to your related articles feed and get a constant stream of weblog posts related to what you've been talking about on your blog. Nothing else like it out there. You can even put your related articles up on your blog using Inline.

Use Findory for your blog. We can help.

Thursday, January 27, 2005

Local search from A9 and Amazon.com

It's being widely reported that A9 announced their version of a Yellow Pages local search.

Most of the coverage is focusing on A9's "20 million images of businesses and their local surroundings" that gives searchers a picture of the storefront.

More interesting to me is the integration with Amazon.com, particularly the fact that every business has a detail page at Amazon. For example, here's the detail page for Wild Ginger, a Seattle restaurant.

If every business has a page at Amazon, you can write reviews of the restaurants, rate the restaurants, even get Amazon.com recommendations.

The ability to evaluate and differentiate local businesses is what has been sorely lacking in Yellow Pages. This new feature brings Amazon and A9 into the same space as guidebooks such as Fodor's, Zagat, and Citysearch.

Wednesday, January 26, 2005

What's related at Findory

For every news and weblog source at Findory, we now have related articles available as an RSS feed and an inline widget. That's right, thousands of feeds helping you discover interesting news from thousands of sources, all built on the same innovative personalization system that drives Findory.

Do you read Wired? How about subscribing to the RSS feed for articles related to Wired?

Read Slashdot? Discover other interesting "news for nerds" with our Slashdot related articles RSS feed.

Are you a blogger? Show your readers related articles! Our inline widget constantly updates as you add new posts and as other weblogs post new articles. Take a look over in the right column at the inline for articles related to Geeking with Greg. It's a fun way to spiffy up your weblog and help your readers find other interesting weblog posts.

Findory interview

I did an interview a while back with Search Lounge about Findory.

Here's an excerpt on finding relevant articles in weblogs:
    The real issue with weblogs is finding good content. Weblogs are self-published. No publisher means no filter. That’s a good thing and a bad thing. It’s good because it opens the floodgates for so much new content. It’s bad because those filters are sometimes useful, helping readers differentiate useful from useless. This problem will only get worse and worse as the blogging phenomenon accelerates.

    Findory is all about relevance. In a sea of information, how do you surface what people need? Current web and weblog search engines all use the same relevance rank for all searchers, but not everyone has the same definition of relevance. Findory learns your interests -- what is relevant for you -- and surfaces that content.
We need to expose the long tail of weblog articles, surfacing the interesting posts buried in the less popular, largely unknown blogs. There's so much useful information out there. We need to help people find it.

Social networking + RSS = Rojo

Corie Lok at MIT Technology Review writes about Rojo, "the first company to combine RSS aggregation with social networking."

The basic idea is that people list all their friends on Rojo, read news on Rojo, and share interesting articles and feeds between each other.

Of course, building yet another social network is a lot of work. From the article:
    The company will have to deal with more fundamental questions, like whether people will build social networks at Rojo just to help them sort through RSS feeds, when they probably already maintain networks at places like Friendster.
It's quite a hurdle. Seems like an existing large social network would be in a much better position to do this than someone starting from scratch.

But that's not the only problem. Another problem is that it's just not clear how effective this approach will be.

Just because I'm someone's friend doesn't mean we like the same news. And just because I'm not someone's friend doesn't mean we won't like the same news. Unless your friends are all clones of you, friendships probably aren't the best predictor of your interests. Your friends are different than you. That's what makes them interesting.

What might work better is reaching out to the entire community -- beyond just your friends -- finding the people like you, and having them recommend interesting articles. But then you'd have Findory.

[Tech Review article via Simon Waldman]

Tuesday, January 25, 2005

Google video search

Looks like the rumors of Google video search are real.

It's being widely reported that Google launched their video search in Google Labs. Right now, it searches the closed caption text from eight TV channels. Obviously, just a first step, but it is an interesting step.

Many great comments on this development across the blogosphere. Some excerpts:
    Chris Sherman
    "One interesting twist to the service is that it indexes all content broadcast by the television stations [including advertising] ... Google Video ... differs from Yahoo's recently announced video search prototype and AOL's Singingfish streaming media search, both of which use metadata rather than closed caption information."

    Gary Price
    "After taking a look at Google Video, here are other video search services you'll want to take a look at..."

    Eric Bangeman
    "The new service underscores Google's ambitions to be the premier information source for computer users. Whether you want to search for a phrase in an old college paper on your home computer, look for highlights from last night's sports action, or find a passage in a favorite book, Google wants to be the company serving up the information. And some ads to go along with it."

    Michael Bazeley
    "We're thinking a lot right now about the distributed media landscape (blogs, podcasting, RSS, Tivo, videoblogging), the convergence of the Internet and the TV, and which aggregator/search engine is going to cobble together our daily media experiences in the future."

    Brad Hill
    "Google Video is all about TV, and would be better named Google TV ... As Google TV the service would be unique. As Google Video it is embarrassing [compared to Yahoo Video]."

    Charlene Li
    "Google has built their own technology to capture broadcasts (they set up their own rabbit ears to pull programming from local Bay Area stations) and index the video streams. But you can’t view the videos themselves!"

    Nathan Weinberg
    "You want some sad news for Google lovers? For the moment, Yahoo Video Search is way better. Google has no watchable videos, and no video older than seven weeks ago."

    John Piscitello
    "It's just an early-stage beta product at this point; you'll only see stills and text snippets from shows that match your search terms, and you can only search shows from a few channels, dating back to December, 2004, when we started compiling the index. But we'll be steadily improving Google Video in the months to come, so as they say in the TV biz, stay tuned."
See also my earlier post, "Query-free news search".

Monday, January 24, 2005

Nearest Neighbor News Network

Jacob Lee and Ben Hodes from University of Illinois, Urbana-Champaign have a research project called the Nearest Neighbor News Network. From their about page:
    The Nearest Neighbor News Network (NNNN) is a collaborative filtering RSS aggregator. NNNN collects articles from news sites and weblogs of your choosing and then displays them all together. It also displays other articles that it believes you would be interested in.
Very interesting. You explicitly sign up for various RSS feeds, then it shows you recent articles from those feeds (ordered by date, it appears) and recommended articles from other feeds (ordered by relevance, it appears).

It doesn't learn from what you read like Findory. You have to tell it what you like. It's more like a version of Bloglines that focuses mostly on helping you discover new weblogs and news sources. It's a very cool idea, an interesting step toward a next generation weblog reader.

Unfortunately, it's slow as a dog. The trick with personalization is providing high quality recommendations to hundreds of thousands of users in a fraction of a second. Not sure what algorithm they're using, but a simple implementation of collaborative filtering breaks down pretty badly with only a few thousand users. I suppose that might be their problem.
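To make the scaling problem concrete, here is a minimal sketch of a naive user-based collaborative filter. This is not NNNN's algorithm (I don't know what they use); the function names, the `likes` data layout, and the cosine similarity over sets are all assumptions chosen for illustration. The point is the full scan over every other user on each request.

```python
from math import sqrt

def cosine(a, b):
    """Cosine similarity between two sets of liked article ids."""
    if not a or not b:
        return 0.0
    return len(a & b) / sqrt(len(a) * len(b))

def recommend(user, likes, k=2, n=3):
    """Naive user-based collaborative filtering (illustrative only).

    likes maps a user id to the set of article ids that user liked.
    Each call compares `user` against every other user, so serving
    everyone costs on the order of (number of users)^2 comparisons.
    """
    mine = likes[user]
    # Score every other user by similarity; this full scan is the bottleneck.
    neighbors = sorted(
        ((cosine(mine, theirs), other)
         for other, theirs in likes.items() if other != user),
        reverse=True,
    )[:k]
    # Sum neighbor similarity for each article the user has not seen yet.
    scores = {}
    for sim, other in neighbors:
        for item in likes[other] - mine:
            scores[item] = scores.get(item, 0.0) + sim
    # Highest score first; ties broken by article id for a stable order.
    return sorted(scores, key=lambda item: (-scores[item], item))[:n]
```

With a handful of users this is instant. With a few thousand, every page view triggers thousands of set comparisons, which is roughly where a simple implementation starts to hurt unless similarities are precomputed offline or the neighbor search is approximated.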

Anyway, it's a great prototype, a good demonstration of what a simple, easy-to-use, mass market weblog reader could look like. Worth a look.

[via Steven Cohen]

Better web searches

Javed Mostafa at Scientific American writes about the future of web search.

After a good introduction on the basics of web search, he talks about clustering, personalized search, desktop search, search from mobile devices, location-based search, and advances in image search. A good read.

Some excerpts on personalized search:
    If search engines could take the broader task context of a person's query into account -- that is, a user's recent search subjects, personal behavior, work topics, and so forth -- their utility would be greatly augmented.

    Good sources of information on personal interests are the records of a user's Web browsing behavior and other interactions with common applications in their systems. As a person opens, reads, plays, views, prints or shares documents, engines could track his or her activities and employ them to guide searches of particular subjects.
By looking at what you seem to be trying to do, the search engine can modify the relevance rank to help bubble up articles that you might otherwise miss. From your actions, the search engine learns what you want and helps you find it.
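As a rough sketch of what modifying the relevance rank could mean, the snippet below blends a result's base score with a boost from the topics a user has clicked on before. Nothing here comes from the article or from any real search engine; the topic labels, the `weight` parameter, and both function names are hypothetical.

```python
from collections import Counter

def interest_profile(history):
    """Turn a list of topics from past clicks into topic -> share of clicks."""
    counts = Counter(history)
    total = sum(counts.values())
    return {topic: n / total for topic, n in counts.items()} if total else {}

def rerank(results, profile, weight=0.5):
    """Re-order results using an implicit interest profile.

    results is a list of (title, topic, base_score) tuples, where
    base_score is the engine's one-size-fits-all relevance score.
    """
    def score(result):
        _, topic, base = result
        # Boost results whose topic the user has shown interest in.
        return base + weight * profile.get(topic, 0.0)
    return [title for title, _, _ in sorted(results, key=score, reverse=True)]
```

A searcher whose history is mostly programming articles would see a "Python tutorial" result move above a "Monty Python" result for the query "python", without ever filling out a profile form.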

It's interesting that Javed seems pretty negative toward personalization approaches that require explicitly stating your interests (e.g. Google Personalized Search). He argues that "most people are unlikely to put up with the bother of entering personal data." Exactly right. Personalization is supposed to make my life easier, remember? Don't make me do work, especially if you don't have to.

[via Gary Price]

Interview with Rich Skrenta

Peter Da Vanzo interviews Rich Skrenta (CEO, Topix.net). A great excerpt on the information overload problem:
    We're experiencing an explosion in the number of news outlets ... To scan this massive amount of incremental information available each day for items which can be personally relevant is a big job, but one amenable to computer automation. At a high level Topix.net's mission is to read everything new on the Internet every 30 minutes and let you know about new, relevant information that's of interest to you -- whether that interest is based on a local city, a hobby interest, a business sector, or some other content channel.
Rich has it exactly right. It's all about relevance. In a glut of news, we need to help readers find the news they need.

Friday, January 21, 2005

Windows file search and desktop search

Jon Udell has the details on why Windows file search is so broken.

Not only is indexing turned off by default, but, even if you turn it on, Windows will bypass the index "as soon as even one unindexed file lands on your disk."

Think that's weird? It gets weirder. According to Simon Burns, you can force Windows to use the indexes by adding '!' or '@' as a prefix to your query, making it lightning fast. As it should be.

Desktop search applications are hot right now. Everyone and their mother seems to have written one. But the only reason there's even an opportunity here is that the default file search in Windows is so pitifully slow.

Of course, it's not hard for Microsoft to eliminate the opportunity they created. If they fix Windows file search, either by patching the existing functionality or by integrating MSN Desktop Search into Windows, this game is over.

Thursday, January 20, 2005

Do people need desktop search?

Joe Wilcox at Microsoft Monitor criticizes MSN Desktop Search, saying:
    I'm still not convinced all this rush for desktop search delivers consumers what they really want or need. Don the dad probably could find that letter he wrote to Susie's teacher in the "My Documents" folder; no search tool required. A media player manages his music just fine. Sure, he needs help finding his photos, but those search capabilities aren't here yet. Right now, I'm convinced desktop search's real value is e-mail ...
Joe is right that desktop search doesn't provide much that mainstream users need. But I'm not sure I see faster e-mail search as making it any more compelling. Even with my gigabytes of archived mail, e-mail search is fast enough. There's got to be more here.

Why doesn't MSN Desktop Search search web history? I want to search every website I've ever seen. Very useful when I know I saw something on the web before but can't remember where. This is where desktop search needs to go to bring real value. I should be able to search and easily find again anything I ever saw on my computer before.

Google Desktop Search does search browsing history (unfortunately only for IE), and Microsoft's Stuff I've Seen project suggests we'll be seeing steps in this direction from MSN soon too.

See also my earlier posts, "Yahoo desktop search" and "Seruku: Search what you've seen".

Wednesday, January 19, 2005

AOL Search improvements

Chris Sherman and Gary Price report that AOL will be rolling out several improvements to AOL web search next week.

Clustering using Vivisimo will be one of the major new features. If you haven't tried Vivisimo's clustering, you should. It's quite good. This move by AOL is pretty interesting, the first of the big search engines to do clustered search results.

AOL apparently will also go out the door with "Smartbox", a query refinement tool that sounds similar to Google Suggest. AOL will also be the first here, the first big search engine to put query refinement suggestions on their live site (Google Suggest is still in the labs).

They'll also have saved web search queries (like A9, My Ask Jeeves, My Yahoo Search, and even good old Findory) and, later, will launch a new desktop search product (like Yahoo, Google, MSN, and Ask Jeeves).

AOL web search, like A9, is just a thin layer over Google. It's all Google search underneath. It's an interesting strategy. Yahoo and Microsoft are doing their own crawl and indexing. A9 and AOL are skipping that costly step and trying to find ways to add value on top of Google search results.

Update: John Battelle posts a good analysis, including some thoughts on how AOL is giving up its "walled garden" and opening its content to everyone.

Update: The new AOL Search is now live.

Peeking in at the Amazon Developers Conference

Real-time notes from the Amazon Developers Conference at the Amazon Web Services Blog.

The list of speakers includes Joel Spolsky (of Joel on Software fame), Brian Aker (MySQL), James Gosling (Sun), and many other uber geeks. Sounds like a fun time.

There's a note that says that video from the conference may be available online eventually.

Tuesday, January 18, 2005

Shiny, happy search engines killing spam

Yahoo, Google, MSN, and others all join hands in an attempt to reduce weblog comment spam.

The idea is to have search engines ignore links in comments on weblog posts when computing PageRank, somewhat reducing the incentive to engage in comment spam.

I say somewhat because there's value to spammers in having a link or even just a mention of a product name in a public forum. The proposed solution will prevent spammers from improving their rank in search engine results using weblog comments, but that's only one of several reasons spammers spam.

The problem is similar to e-mail spam where, at least for some, the value from the tiny number of people who respond positively to the spamvertising exceeds the very low cost of sending out the spam. Raising the cost of spamming also has to be part of the solution.
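The mechanism the search engines announced is a rel="nofollow" attribute that weblog software adds to links in comments. Here is a minimal sketch of how a crawler might honor it when collecting links to feed into a link-based ranking. The class and function names are hypothetical, and real crawlers are certainly more involved than this.

```python
from html.parser import HTMLParser

class FollowedLinks(HTMLParser):
    """Collect hrefs from <a> tags, skipping any marked rel="nofollow"."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attrs = dict(attrs)
        # rel can hold several space-separated values, in any letter case.
        rel = (attrs.get("rel") or "").lower().split()
        href = attrs.get("href")
        if href and "nofollow" not in rel:
            self.links.append(href)

def followed_links(html):
    """Return the links on a page that should count toward ranking."""
    parser = FollowedLinks()
    parser.feed(html)
    parser.close()
    return parser.links
```

Note what this does and doesn't do. The spam link is still on the page where readers can see it and click it; it just stops passing rank. That's why the incentive is only somewhat reduced.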

See also my earlier post, "Killing comment spam".

Adam Bosworth on personalization

Adam Bosworth (VP of Engineering at Google) talked to the Gillmor Gang on IT Conversations. Early in the discussion, he seemed to be proposing a personalized feed reader:
    Imagine that one could take any number of RSS feeds coming in every day ... and you could put all that in a database on a very large scale, so 100 million people are posting to each other every day ... I'd get [a] list in sort of a relevance way, so I could wade through the ones that are most likely to be interesting to the ones that are least likely to be interesting to me. That would be very cool. It would help me find out what's going on out there in a richer way...
His emphasis on relevance made me think that Adam was proposing something like Findory. Adam said he wants a personalized relevance rank of RSS feed posts that focuses you in on the most interesting ones for you. That's personalized news.

But, later, I was surprised to hear Adam criticize personalized feed readers:
    One of the things that works pretty well today -- even with Amazon -- is things that are global where the personalization is a global and essentially says here's what I know about you and here's what I know about the world. Based what I know about the world and how you fit in, here are the recommendations I can make. And that model seems to work, partially because people assume it's not perfect. They understand that this is a pretty imperfect model.

    But if it started filtering -- you know, it's one thing to say, what are the recommendations. It's another thing to say here what are the new posts and it only shows you the ones it thinks you need to see. That would be kind of frightening. And it's hard to be that smart.
It wasn't extremely clear what Adam's concerns were, but, pulling from many other comments in his talk, they seemed to be around three issues: loss of control, loss of breadth, and loss of serendipity.

For a power user like Adam, loss of control is a really big deal. Spending hours managing hundreds of subscriptions in Bloglines or configuring a customizable portal is just fine. The more knobs, the better.

But, as Adam said at one point, his mother doesn't agree. Adam's mother doesn't want control. She won't customize. She just wants the right thing to happen. She just wants to read news.

Most current RSS readers are for people like Adam, techies who love to push buttons and get great joy out of programming their VCRs. For RSS to enter the mainstream, it needs to be easy. No effort. No configuration. No hunting down feeds. It needs to just work. That's what personalization does. It makes it so it all just works.

Adam also seemed concerned that personalization might cause loss of breadth and serendipity; he seemed to think it might pigeonhole readers and only show them a small selection of content. To explore this, let's start with what Adam said about why he likes Google News:
    Like I go to Google News and I look at their news .. Most of the time, I find it actually of intriguing because there are stories and I look at who's writing about it and I see all these people like the Times of India or the Australian-whatever that I wouldn't normally see ... with totally different points of view ...
Adam likes Google News because it helps him discover articles he otherwise would have missed. What's so interesting about this comment is that Google News is stunningly bad at discovery. It shows the same front page to everyone. With 100k+ articles available, everyone sees the same thin slice of 20-30 articles. All the depth of information is lost.

Personalization offers a way to show different front pages to different people. It plucks the interesting bits and pieces out of a sea of information. Everyone sees a different slice of the data. Readers see new sources, are exposed to new viewpoints, and discover articles they otherwise would have missed.

For example, if you read Google News over the last couple weeks, there were hundreds of articles on the tsunami. Buried somewhere in there was what I thought was a fascinating article on the science of the tsunami from National Geographic. Google News had no way of showing this article to me; it shows the same thing to everyone. A personalized news site like Findory could (and did) surface this article for me by learning my interests and pulling the article out of the noise.

In weblogs, it's even better. There are millions of weblogs out there. It's quite hard to find good ones. The signal to noise ratio is shockingly poor. A personalized weblog reader can recommend relevant weblogs you've never heard of (like, perhaps, this one). And it can surface the occasional gem on high traffic but low value weblogs.

Discovery in vast quantities of data is what personalization is designed to do. The key is to make sure the personalization reaches beyond the obvious and into the surprising. If you do that, personalization reveals the full breadth of the data and enhances serendipity.

Adam certainly seems to recognize the value of personalization for reading news and weblogs:
    If you have participatory client like a blog reader obviously you can do more tracking and obviously that would be useful information to have. I would love to know what are the usage patterns in terms of reading each of my posts and more importantly I think someone else would be interested in looking at that and how that correlates to other things that are read and thinking about how to make suggestions about what people might want to read.
And he talked about the value of personalization for information overload:
    You want to actually -- because you have so much information overload -- find out what other people are reading as a way to filter what you read.
But he seems concerned about how difficult it is to do the personalization right.

In addition to personalization, Adam also talked quite a bit about distributed databases. He described wanting a generic virtual database that has "data routers" that know where data is stored on a very large cluster of databases and routes the request, much like Google does with replicated shards of its search index distributed across its cluster. Very cool stuff.
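Here is a toy sketch of the "data router" idea as I understood it from the talk: hash a key to find the shard that owns it, then send the request to any replica of that shard. This is my own illustration, not Adam's design or Google's implementation; the class name, the modulo hashing scheme, and the replica names are all made up.

```python
import hashlib
import random

class DataRouter:
    """Route a key to one replica of the shard that owns it."""

    def __init__(self, shards):
        # shards: a list of shards, each a list of replica names
        # (hypothetical host names for illustration).
        self.shards = shards

    def shard_for(self, key):
        # A stable hash, so the same key always maps to the same shard.
        digest = hashlib.md5(key.encode("utf-8")).hexdigest()
        return int(digest, 16) % len(self.shards)

    def route(self, key, rng=random):
        # Any replica of the owning shard can serve a read.
        return rng.choice(self.shards[self.shard_for(key)])
```

Usage would look like `DataRouter([["a1", "a2"], ["b1", "b2"]]).route("user:42")`. A real system would need to handle adding shards without remapping most keys (simple modulo hashing does not) and routing around dead replicas.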

All in all, a very interesting talk. It's long and time consuming -- and there's no transcript available, unfortunately -- but it's worth a listen.

[Bosworth talk via Scoble]

Monday, January 17, 2005

John Doerr on personalized information

John Doerr (of Kleiner Perkins, on the board of Google, Amazon, and others) spoke at Web 2.0 and audio of the talk is available at IT Conversations. A great excerpt on personalization:
    I was talking with you and I fantasized about something we haven't invested in yet but is Amazon or Google-sized in terms of its technical challenge ...

    Maybe we'll get to 3 billion people on the web and say that what matters to all of us is information, and products, and more. Which is we live in time and we're assaulted by events. And, so, let's just say there's 3 billion events going on at any given time. And if you wanted to compute the cross product of the 3 billion people and the 3 billion events -- 'cause you need to filter very carefully the information that's going to get to this device -- I don't want to be assaulted by anything but the most relevant information ...
John Doerr is talking about personalized information streams, personalized filtering of information about events. John's saying, show me the relevant news, interesting new products, and useful new documents I need to see. Surface the events that matter to me.

[Doerr talk via Scoble]

Friday, January 14, 2005

Up, up, and away!

[Findory traffic graph] Findory.com's quarterly traffic growth for 2004. We're just about to add two additional servers.

Great to see so many people using and enjoying Findory! Upward and onward!

Targeting and online newspapers

Steve Outing at Poynter Online suggests that online newspapers should put carefully selected, targeted classified ads onto their article pages.
    It's the same concept as Google's AdSense, which puts Google text ads on websites, with the ads complementing article content. Those bicycle classifieds might show up on sports and recreation stories, for example. Classified-ad buyers could even bid on placement, a la the models of Google AdWords and Overture.
It's surprising newspapers aren't doing this more already.

Readers aren't coming to newspapers just through the front page. Web feeds and deep linking mean that readers are often coming directly to article pages.

When they come, it's a perfect time to show them a bunch of interesting and relevant other content. "Here's the article you want. Want to know more? Here's a few related articles and links to related areas of our site. And here's a few of our classifieds and local businesses that might be useful."

The article page could be like a mini-front page, but focused around the article. It could be a selected view into the newspaper, surfacing parts of the newspaper relevant to the article, hiding parts that are not. It could help readers learn more, draw them in, and show them what the paper has to offer.

Delicious Library, Delicious Monster

Leander Kahney at Wired interviews Delicious Monster, the team behind the clever Delicious Library application. It creates a gorgeous virtual bookshelf of the books you own.

From the article:
    Delicious Library ... generated $250,000 worth of sales in its first month .... [and] won an "innovators award" from O'Reilly & Associates.

    One of the niftiest features is the ability to use a video camera to read a product's barcode, which is used to fetch its details from the net.
Amazon web services are what made it all possible. Remarkable.

Interesting to see they're up here in Seattle.

[via Library Stuff]

Update: A year later, Todd Bishop at the Seattle PI takes a look at how Delicious Monster is doing.

Update: Sixteen months later, Amazon.com launches a web-based Delicious Library knock-off called "Your Media Library". While not quite as pretty as Delicious Library, Amazon's version does have the advantages of being web-based and building your bookshelf automatically from your Amazon.com purchases.

Thursday, January 13, 2005

Will Technorati die?

Andrew Chen gives us "5 Reasons Why Feedster and Technorati will Die". Brutal, but the post makes some good points and is worth reading.

The biggest issue I see with Technorati and Feedster is that they seem to have no competitive advantage over the search giants. Technorati is a weblog search engine. It's a web search restricted to weblogs.

What would it take for Google to offer the same product? Identify weblogs in their crawl, crawl them more frequently, and then put up a version of their search index that only contains weblogs. I've got weblog search and trackbacks. Looks like I'm done. Frankly, I'd be really surprised if Google didn't already have this available internally.

It's unfortunate. I like Technorati. I'd love to see them thrive and succeed.

What should Technorati do? Andrew Chen suggests they should focus on helping people read the news, a version of Google News for weblogs. This would be entering the same space as My Yahoo, My MSN, and Bloglines though, a space that has its own issues.

Nevertheless, Feedster definitely appears to be moving in this direction with My Feedster. And Technorati is including more and more news on its home page.

But I think the answer lies in what appears to be a comment from Dave Sifry (CEO, Technorati) at the bottom of Andrew Chen's post. Dave says:
    ... I think you're missing out the huge 90% of Technorati that is "under the surface" ...
And there it is. Technorati has huge amounts of oh-so-tasty data. They're swimming in an ocean of information. If they can surface the right data to the right people, bubble up the news people need, then they'd have something truly unusual.

And Technorati is moving in this direction. For example, some of the clever uses of Technorati from the Technorati developers contest surface otherwise hidden tidbits of data in some quite interesting ways.

But they'll have to move quickly to avoid the search giants. Right now, Technorati is underfoot. That's not a good place to be.

Wednesday, January 12, 2005

XML is for geeks

Danny Sullivan summarizes a blog discussion between Jeremy Zawodny, Dave Winer, and others on "more consistency in how people can find and subscribe to RSS, Atom and other feed content."

When you boil all of this down, the problem really is that the XML shouldn't be exposed in the first place. Dan Isaacs nails it in his post:
    What the hell is the point of making the XML simply link to the .rss page? Are there really more than 5 people on the planet that would be interested in the actual RSS?
RSS is a data format. No one but us geeks cares about XML feeds. People just want to read news.

See also "Getting your grandmother to use RSS".

Tuesday, January 11, 2005

My MSN vs. My Yahoo

John Battelle has a tease about some new features coming out of My MSN including "the ability to discover, read and search through blog and RSS content."

Sounds like we're about to get another web-based blog reader to compete with My Yahoo and Bloglines.

See also "Bloglines and feed readers" and "All new My Yahoo".

Update: More details from SiliconBeat and CNet.

Update: Brady Forrest announces the My MSN RSS Reader on the official MSN Search weblog.

Monday, January 10, 2005

Yahoo desktop search

Yahoo Desktop Search (a rebranded version of X1) has launched.

Yes, Google, MSN, Ask Jeeves, and others already have free desktop search downloads. Here's another to throw on the pile.

In an excellent and detailed review of Yahoo Desktop Search, Chris Sherman makes the interesting point that "missing from YDS is one of the most useful features of Google's desktop search -- the automatic indexing and caching of web pages you've viewed with Internet Explorer."

Without search over your browse history, desktop search is merely a fix for the dismally slow file search built into Windows.

But search over the browsing history moves desktop search toward Memex, the memory extender. It would allow you to recall anything you've seen on your computer before with the touch of a button. That's where Google is headed.

See also my comments on personalized desktop search (Blinkx, Dashboard) in my earlier post, "MSN desktop search", and my comments on the limited business prospects for desktop search in "Mamma buys Copernic".

Defining the long tail

Chris Anderson posts reader-submitted definitions of the term "The Long Tail".

I particularly liked Eric Akawie's contribution:
    The Long Tail is what you get when the obscure becomes ubiquitous.
Another of the submissions -- "An embarrassment of niches" (by kpk) -- reminds me of an amusing story from a while back.

Jeff Bezos, during a presentation to hundreds of people, showed his Amazon.com recommendations. His top recommendation was the DVD "Slave Girls from Beyond Infinity".

Turns out he had just bought the DVD of "Barbarella", so the recommendation was fairly accurate. But still an embarrassment of niches.

MSN Search results as RSS

Gary Price discovers that the MSN Search beta allows you to get your search results as an RSS feed.

For example, here's a search for "Findory" as an RSS feed from MSN Search.

Aside from the geeky cool factor, this might be a useful alternative to Google Alerts and other services that monitor for changes in web search results.
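To make the monitoring idea concrete, here is a rough sketch of how a script might spot new search results by comparing two snapshots of a search feed. The feed contents and example.com links below are made up for illustration, and the sketch assumes a plain RSS 2.0 layout with one link per item; it is not based on the actual structure of the MSN Search feed.

```python
import xml.etree.ElementTree as ET

def result_links(rss_text):
    """Extract the <link> of every <item> in an RSS 2.0 document."""
    root = ET.fromstring(rss_text)
    return [item.findtext("link") for item in root.iter("item")]

def new_results(previous_rss, current_rss):
    """Return links present in the current feed but not the previous one."""
    seen = set(result_links(previous_rss))
    return [link for link in result_links(current_rss) if link not in seen]

# Hypothetical snapshots of the same search feed taken at two different times.
yesterday = """<rss version="2.0"><channel><title>Search: Findory</title>
<item><title>A</title><link>http://example.com/a</link></item>
</channel></rss>"""

today = """<rss version="2.0"><channel><title>Search: Findory</title>
<item><title>B</title><link>http://example.com/b</link></item>
<item><title>A</title><link>http://example.com/a</link></item>
</channel></rss>"""

print(new_results(yesterday, today))  # ['http://example.com/b']
```

A real monitor would fetch the feed on a schedule and store the links it has already seen, but the comparison step would look much the same.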

Update: Brady Forrest announces and promotes the new feature on the official MSN Search blog.

Subscriptions and the New York Times

The cover story of BusinessWeek is "The Future of the New York Times".

Apparently, the online version of the New York Times is quite profitable ($17.3M net on $53.1M of revenue in Q1 & Q2 2004). Despite this success, the paper is considering requiring online subscriptions.

It's a little curious. Only a few months ago, the Wall Street Journal experimented with allowing unrestricted access to its content to see the impact on its traffic. Now the New York Times is considering restricting access to its content to paid subscribers.

The BusinessWeek article spends a fair amount of time on the distinction between subscription and "free" news websites. It isn't really correct to say the New York Times is free for users. It isn't uncompensated; it is supported by advertising. Readers pay for the content by viewing advertising.

Google is compensated and compensated quite well by users of its "free" search engine. On average, Google makes 54 cents per click on an advertisement and nearly 17% of searches end in a click on an advertisement.
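A quick back-of-the-envelope calculation shows what those two figures imply per search. This is only a sketch: it treats the quoted numbers as simple averages, rounds "nearly 17%" to 0.17, and the batch of 1,000 searches is just an illustrative scale.

```python
# Figures quoted above, treated here as rough averages.
revenue_per_ad_click = 0.54  # dollars earned per click on an advertisement
ad_click_rate = 0.17         # fraction of searches ending in an ad click ("nearly 17%")

# Expected advertising revenue from a single "free" search.
revenue_per_search = revenue_per_ad_click * ad_click_rate

# Scale to an illustrative batch of searches to make the number tangible.
searches = 1_000
print(f"Revenue per search: ${revenue_per_search:.4f}")
print(f"Revenue per {searches:,} searches: ${revenue_per_search * searches:.2f}")
```

Under these assumptions, each search is worth a bit more than nine cents of advertising revenue, which is the sense in which users "pay" for the free service.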

Rather than destroying their traffic by restricting access to their content, perhaps newspapers should improve the relevance and usefulness of their online advertising.

[BusinessWeek article via Steve Klein]

Update: As Maarten pointed out (in the comments), I made a couple of mistakes in my original post. They have been corrected.

Friday, January 07, 2005

Zen and the art of Amazon recommendations

Chris Anderson at Wired interviews Jeff Bezos. Jeff had some interesting comments on Amazon's personalization:
    We not only help readers find books, we also help books find readers, with personalized recommendations based on the patterns we see.

    I remember one of the first times this struck me. The main book on the page was on Zen. There were other suggestions for Zen books, and in the middle of those was a book on how to have a clutter-free desk.

    That's not something that a human editor would have ever picked. But statistically, the people who were interested in the Zen books also wanted clutter-free desks. The computer is blind to the fact that these things are dissimilar in some way that's important to humans. It looks right through that and says yes, try this. And it works.
What makes the recommendations so effective is that they're non-obvious but still relevant. Recommending other books with Zen in the title is obvious and not particularly useful. Readers could easily find those books themselves.

Recommending a book on simplifying your work environment is surprising and interesting. It's a book that would have been difficult to discover on your own.

But Jeff's comments might overemphasize the difference between humans selecting the recommendations and a computer selecting the recommendations. The computer merely does an analysis of what humans are doing. It's going out to the community and saying, "What other books do you like?"

It's as if everyone who bought that Zen book e-mailed you recommendations on other books to buy, but all the wisdom of this crowd is gathered automatically and with no effort from the community. It's easy to use. It requires no effort. It just works.
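Here is a minimal sketch of that kind of analysis, counting which items are bought together and recommending the most frequent companions. The titles and purchase histories are invented, and this is a generic co-occurrence count for illustration, not a description of Amazon's actual system.

```python
from collections import Counter, defaultdict
from itertools import permutations

# Hypothetical purchase histories: one set of titles per customer.
purchases = [
    {"Zen Mind", "Zen Garden", "Clutter-Free Desk"},
    {"Zen Mind", "Clutter-Free Desk"},
    {"Zen Mind", "Zen Garden"},
    {"Cookbook", "Clutter-Free Desk"},
]

def build_cooccurrence(histories):
    """Count, for each item, how often every other item was bought alongside it."""
    counts = defaultdict(Counter)
    for basket in histories:
        for a, b in permutations(basket, 2):
            counts[a][b] += 1
    return counts

def recommend(item, counts, n=3):
    """Return up to n items most often bought together with `item`."""
    return [other for other, _ in counts[item].most_common(n)]

counts = build_cooccurrence(purchases)
print(recommend("Zen Mind", counts))
```

Nothing in the code knows that a desk-organizing book and a Zen book are "dissimilar"; it only sees that the same people bought both. Raw counts like these tend to favor popular items, so a real system would likely normalize for popularity.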

Thursday, January 06, 2005

Tyranny of choice and the long tail

Chris Anderson writes about the "tyranny of choice":
    Studies showed that people actually felt worse about their purchases when they had more to choose from (did I make the right decision?), and often even bought less because of it.
It's information overload. Lack of easy ways to differentiate and filter the options leads to paralysis.

Chris continues:
    What was missing ... [is] "data" about products, and "metadata" about information itself. For the physical goods, this can take the form of something as simple as Amazon's "rank by bestselling" lists to more complex background information such as reviews, price comparisons, version histories and manufacturing details.

    As Amazon's Jeff Bezos explains it, for a product that a potential purchaser has a great deal of interest in, no amount of information is too much: from reader and trade reviews to service records, the more they can learn about a product the more comfortable they are buying it. But for products that they just don't care much about, even something as simple as knowing what most other people bought can make the difference between being frozen by overwhelming choice and purchasing with confidence.
Jeff has always said that no amount of information is too much, but I think this misses the key point with tyranny of choice.

We don't just need more information. We need the right information.

Specifically, we need information that allows us to easily and quickly differentiate between the products. Flooding me with tons of information isn't useful if the information I need is lost in the clutter.

See also my earlier post, "Froogle adds product reviews", about how reviews transform Froogle from a mere price comparison engine to a service that gives "users the information they need to differentiate between products, helping them find the right product at the right price."

Getting your grandmother to use RSS

Citing the Pew Internet & American Life Project report, "The State Of Blogging", Charlene Li says:
    5% or 6 million online consumers currently use RSS.

    I knew that the number was getting up there, especially with the push from media heavies like Yahoo!, WSJ.com, and nytimes.com, but it’s still amazing that it’s gotten so much traction while being still so kludgy to find and add content.
Current RSS readers are clunky and difficult to use. Readers have to manually hunt down and add web feeds. These kinds of tools are only suitable for early adopters, people who see the vision and are willing to endure this level of suffering.

What will it take for RSS to get into the mainstream? What would it take to get your grandmother to use RSS?

Some say that integration into existing applications like web browsers and mail readers will lead to mainstream adoption. While this makes it easier to install an RSS reader, it does nothing to reduce the amount of work of configuring and using an RSS reader.

Some say it might be sufficient to recommend additional feeds to add to your weblog reader. Bloglines and My Yahoo have taken some early steps here. While useful for discovering new feeds, finding and reading the feeds you want is still a long and labor-intensive process.

Stepping back a second, why are we exposing things called RSS, Atom, and XML to readers at all? Do they care what these data formats are? No, only geeks like us care. Mainstream readers just want to read news.

Next-generation RSS readers will get past exposing RSS feeds. Readers will just read news. All the magic of locating the content will be appropriately hidden. It will all just work.

What will this future look like? We're trying to build it.

We see Findory as the next generation of RSS reader. RSS feeds are hidden. Discovering new content is automatic. Readers just read news. That's it. Just read news.

We recognize what is necessary to bring RSS to the mainstream. We're building a next-generation RSS reader today.

Wednesday, January 05, 2005

Bill Gates on search

Michael Kanellos at CNet interviews Bill Gates. Some interesting excerpts on search:
    The commitment we made is to build unique search technology across the board. And if you look at the Microsoft Research things that we've had breakthroughs in -- natural language, document analysis, personalization, image analysis, language translation -- our research agenda will allow us to take today's search from ourselves and Google, and make what we have today look like a joke.

    Just take the idea of finding your local pizza place and doing that right; search doesn't do that well today. Search is really crummy today -- it's just that it used to be really crummy, and now it's better, and there never was anything like this before. So most of the results people get back today are irrelevant results. Deep analysis can take us much further, and that's why we're investing a lot, and you'll see us move very rapidly.
There is a lot of great work going on at Microsoft Research. The trick at Microsoft has always been to get this work into their products. At Google, the research teams are so integrated into the product teams that you can't tell them apart.

See also my earlier post, "The secret weapon: An army of PhDs".

[CNet interview via Search Engine Watch Blog]

Personalized search RSS feeds

Findory just launched RSS feeds for our personalized search for news and blogs.

Execute a news or blog search on Findory and, at the bottom of the search results, you'll see links to RSS feeds for those searches. The personalized version of the RSS feed (visible only if you are signed in) highlights recommended articles from the search results.

It's true that many others offer RSS feeds for searches over news or blogs. What makes Findory's RSS feeds so different is our personalization.

Normal RSS feeds are generic. Every subscriber sees the same content. Findory's personalized RSS feeds are one-to-one. The feeds are built just for you with your recommended news and blog articles.

Any article you read through the RSS feed is included in your reading history, teaches Findory more about your interests, changes any personalized RSS feeds you use, and further personalizes the Findory front page.

Most uses of RSS out there view it as a static, generic channel, the same content doled out to tens of thousands of subscribers. We at Findory expect more from RSS. We think RSS should be dynamic, customized, and personalized. We think every RSS feed can be different, unique, and targeted. We're pushing RSS beyond what it has done before.

Battelle's BlogPlasma

John Battelle writes about his new idea, BlogPlasma, a visualization tool for seeing the relationship between blogs. He imagines it would work like MusicPlasma does for music artists.

If you haven't seen MusicPlasma, by the way, definitely check it out. It's a clever and fun visualization tool for viewing Amazon.com's similar artist data.

Although it's not as flashy as MusicPlasma, Findory does have a pretty cool way of viewing relationships between blogs. Our source pages show all related blogs and articles from related blogs.

For example, here's the page for this blog, Geeking with Greg, on Findory. It's related to Search Engine Watch Blog, Inside Google, and several other search-related blogs.

It's fun to surf from blog to blog, following the relationships and seeing where they lead. The page itself has articles and related articles on it already, so it's an easy way to keep track of your favorite blogs and discover other blogs.

Findory's all about keeping it simple, but I can see the appeal of putting a flashy MusicPlasma-like UI on top of this kind of data. What a fun way to browse the relationships in the blogging community!

Tuesday, January 04, 2005

Six Apart buying Live Journal?

Om Malik reports that Six Apart is about to acquire Live Journal.

Om says that the deal "gives the company a very fighting chance against Google’s Blogger and Microsoft’s MSN Spaces."

[via Niall Kennedy]

Update: SiliconBeat posts some good background on the companies. Danah Boyd thinks there may be a culture mismatch.

Update: It's official. Xeni Jardin posts the news and links out to announcements from LiveJournal and SixApart.

Local search lacking local ads

Rob McGann reports that the top local search queries on Google only generate ads for local businesses 5% of the time.
    Google, with its advertising base in the hundreds of thousands, has, "barely scratched the surface of what's out there in terms of reaching a critical mass of advertisers that could be relevant to local search. Yahoo! has the same problem," [TACODA Systems CEO Dave] Morgan [said].

    "Google has loads of technology, but this is one marketplace it cannot automate its way into," [SiteLab International EVP Dana] Todd said. "In order for them to monetize local search with locally based advertisers, they are going to have to have people walking down the streets, knocking on doors explaining AdWords to local dry cleaners. That's not going to happen anytime soon, because they don't have the people to do that."
This is exactly the problem with local search. Local search is hard because there are many tiny little local businesses and advertisers, all being created, changing, and dying rapidly.

See also my previous post, "Down on local search".

[via Danny Sullivan]

Monday, January 03, 2005

Personalized news in 2005

Rick Edmonds at Poynter Online sees heavy competition for newspapers from Yahoo, Google, and Microsoft in 2005.
    Google and Yahoo go first among the new forces in news for three reasons. They have boatloads of capital to invest in new ventures and acquisitions. They have strong existing news aggregation products, increasingly able to become the personalized "Daily Me," so long a staple of thinking about the-newspaper-of-the-future. Their principal revenue base is advertising, the lifeblood of newspapers' income.

    Until now RSS (really simple syndication) feeds, the standard means for compiling your own news report from multiple sources, required a degree of user sophistication and have displayed content choices in uninviting, super-abridged list form. Neither issue is disappearing overnight, but [Morgan Stanley analyst Mary] Meeker contends that My Yahoo gets a lot closer to real simplicity and also serves a menu of special interests from the content-rich blogosphere.

    Look for 2005 to be the year the heat gets turned up on newspapers to get better, and innovate faster online as credible rival versions of the "Daily Me" emerge in the marketplace.
Personalized news (the "Daily Me") will be big in 2005, Rick says, with Yahoo, Google, and Microsoft charging on in.

See also "Daily Me is here", "Making sense of the chaos", and "Turning noise to knowledge".

Answers.com launches

Gary Price reports on the launch of Answers.com, a new service from GuruNet.

It's interesting. It's a metasearch engine that hits a series of specialized databases to try to answer your question. From their about page:
    We've collected authoritative facts in Answers.com by licensing top-quality reference work to give you concise, relevant information on each of over a million topics. We handpicked reference from publishers such as Houghton Mifflin, Columbia University Press, Merriam Webster, Computer Desktop Encyclopedia, Inlumen, Investopedia and Who2 (just to name a few)...
It's more focused than a web search engine and sometimes more useful.

Try some searches on it. For example, ask Answers.com, "What is a blog?" "What is a tsunami?" "Who is Jeff Bezos?"

At a time when some are talking about how big they can make their search index, it's interesting to see the opposite approach, focusing on a few specialized, high-quality data sources. Sometimes less is more.

Update: Apparently, Google is experimenting with using Answers.com instead of Dictionary.com for word definitions.

Update: Google has switched over to Answers.com for all word definitions.

Update: Walter Mossberg at the WSJ has an interesting and well-written review of Answers.com.
    Answers.com is ... a start toward a new search paradigm where the object is to provide real instant information, not just links to pages where that information may, or may not, be found.
[via Emergic and David Jackson]

Saturday, January 01, 2005

Reading news should be easy

Simon Waldman (Director, UK Guardian) sees everyone reading news from a feed aggregator in the future:
    Over time, you develop a rich cocktail of sources and you develop a new habit for browsing information. Some things you look at hourly, some daily, and some you deliberately save till Friday pm for a catch up. This is light years away from sitting down at the table in the morning looking at your paper, or even your paper’s website.

    In this new environment, no single organisation gives shape to the world in this way - there is no single front page, or lead story.
It's an interesting vision, but it sounds too time-consuming for all but dedicated news junkies.

Most people don't want to spend hours hunting down good sources, setting up RSS readers, and skimming tens or even hundreds of web feeds every day. Most people just want to read news.

Simon needs to look beyond the current generation of aggregators. They're designed for early adopters, not the mainstream. There's no prioritization, no filtering, no sorting. Almost all of the work of hunting down quality sources and good information is put on the reader.

We at Findory have a different vision. We think news shouldn't require any effort to read. We think your newspaper should adapt to you and help you find the news you need. We think a newspaper should help readers discover important articles and sources buried deep in the long tail of news.

People need information. People need to know the news that impacts their lives. We should help them.