Tuesday, November 30, 2004

MSN Spaces

Mary Jo Foley reports that Microsoft will launch a new weblog service this week called MSN Spaces.

MSN Spaces is "a direct competitor to blog creation and hosting tools such as [Google's] Blogger [and] Blog*Spot, LiveJournal and TypePad."

I'm curious where Yahoo is in all of this. Given all their other community and content creation features, you'd expect Yahoo to be faster out of the gate on a Blogger knockoff than Microsoft. But not this time.

Update: MSN Spaces has launched.

Update: Charlene Li lauds MSN Spaces' integration with MSN Photos, MSN Music, and MSN Messenger and lists it as an advantage over the competition. Yahoo? Are you paying attention to this?

Amazon's citation links

Rageboy apparently saw Amazon.com doing an A/B test of citation links between books. It seems to be gone now, but Rageboy's post includes lengthy descriptions of the new feature.

Paul Bausch also noticed the new feature and has a link to a still-active Amazon help page with a little more information about it.

Interesting. And very useful for technical books. I hope they launch it soon.

Google Scholar offers similar functionality for academic papers and books. It seems Amazon and Google are butting heads more and more frequently.

[Rageboy link was via Traffick]

Update: Gary Price points to some interesting work on citation analysis.

Bloggers talking about Findory

We at Findory have been thrilled by the reaction we've gotten from the blogging community to our site. I wanted to highlight some of the blog postings we've noticed lately:

Brian Dennis at New Media Hack proclaimed the "Daily Me is here" and that "people will lock onto sources out in the long tail."

John Battelle at Searchblog praised our pace of innovation, saying that Findory "keeps on truckin'" by "very often" announcing "cool new features".

Gary Price at Search Engine Watch writes about Findory frequently ([1] [2] [3]), most recently saying, "It seems like every week or so Findory launches a new service."

On seeing Findory's related sources, Steve Rubel said, "Looks like Rex Hammock and I are separated at birth."

Rex Hammock responded: "Someone much smarter than I can speculate what algorithm causes that to happen. Whatever it is, it makes perfect sense to me as we discovered each other through our blogs and have since become friends and even got together for lunch recently when I was in New York."

Nathan Weinberg at InsideGoogle called Findory "a powerful, smart news site". Nathan even went as far as to say, "I think I can now replace Google with Findory for [some] searches," since Findory "'just works', and works far better than anything out there."

Cindy Chick at Law Lib Tech said: "Personalization obviously has a lot of advantages in a many different areas. Personalization is what Amazon uses to display other items that might interest you, and it's what Findory uses to give you the news that you want to read."

Thanks, everyone! We're glad you're enjoying Findory!

Google desktop search security

Bruce Schneier responds to some of the hype ([1] [2] [3]) over so-called security flaws in Google Desktop Search:
    Google's desktop search software is so good that it exposes vulnerabilities on your computer that you didn't know about.

    Some people blame Google for these problems and suggest, wrongly, that Google fix them. What if Google were to bow to public pressure and modify GDS to avoid showing confidential information? The underlying problems would remain: The private Web pages would still be in the browser's cache; the encryption program would still be leaving copies of the plain-text files in the operating system's cache; and the administrator could still eavesdrop on anyone's computer to which he or she has access. The only thing that would have changed is that these vulnerabilities once again would be hidden from the average computer user.

    GDS is very good at searching. It's so good that it exposes vulnerabilities on your computer that you didn't know about. And now that you know about them, pressure your software vendors to fix them. Don't shoot the messenger.
Exactly right. These security issues exist whether you have Google Desktop Search installed or not.

Monday, November 29, 2004

Personalized news and blog search

Less than two weeks ago, Findory launched personalized web search.

Today, Findory launched personalized news and blog search.

Want to try it? Read a few news or blog articles on Findory, then do a news or blog search for something related to some of the articles you read.

For example, if you read the Wired article "Google Treads on Microsoft's Turf" through Findory, then do a news search for "desktop search", you'll see some of the articles will be marked with our orange personalized icon. Clicking on the icon will explain why the article was recommended.

As with our personalized web search, our personalized news and blog search is a first step. As we learn more about how to help people find what they need, we'll begin to make more dramatic changes to the search results.

Personalized search is the future. We at Findory are excited to be part of it.

Newsbreak vs. Google News

Steve Outing talks about the launch of Newsbreak, an Australian news aggregator from Fairfax Digital. The aggregator itself seems indistinguishable from many others; what is interesting is that it comes from a traditional news organization.

Steve says Newsbreak is a reaction to Google News and other online news aggregators. And he thinks this is a positive sign:
    This continues the trend -- a good one, I think -- of traditional news organizations realizing that they can't continue to operate as islands on the Internet. Linking to other sources (even competitors, in many cases) serves the interests of readers, and establishes the news entity as a portal to the world of news, not just its own coverage. Such services give readers of a news brand less of a reason to turn to Google News, et al.
If they would only embrace the opportunity, traditional news organizations would be better positioned to innovate in online news than Google or Yahoo. So far, though, the innovation has been coming from elsewhere.

Google TV search

Stefanie Olsen at CNet reports on efforts by Google, Yahoo, and MSN to search video streams.

One particularly interesting excerpt on Google TV search:
    Google's project for TV search is ultra-secretive; only a handful of broadcast executives have seen it demonstrated so far. To build the service, the company is recording live TV shows and indexing the related closed-caption text of the programming. It uses the text to identify themes, concepts and relevant keywords for video so they can be triggers for searching.

    The software allows people to type in keywords, such as "Jon Stewart," to retrieve video clips of the comedian's TV appearances, marked with a thumbnail picture with some captioning text, for example. Refining the search results for the show "Crossfire" would display a page that looks similar to a film reel, with various still images paired with excerpts of closed captioned text of the now-infamous fight between Stewart and CNN's "Crossfire" hosts. The searcher could click on and watch a specific segment of the show.
Watch out, TiVo (and Comcast, DirecTV, ...).
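The description above suggests that closed-caption text gets indexed much like any other document, with each caption segment pointing back to a time in the recording. Here is a minimal inverted-index sketch, assuming captions arrive already split into timestamped segments. The function names and caption data are hypothetical, and this is not Google's implementation.

```python
import re
from collections import defaultdict

def tokenize(text):
    """Lowercase words and numbers; a real system would also handle stemming."""
    return re.findall(r"[a-z0-9']+", text.lower())

def build_caption_index(captions):
    """Map each caption word to the (show, start_seconds) segments containing it.

    captions: list of (show, start_seconds, caption_text) tuples.
    """
    index = defaultdict(set)
    for show, start, text in captions:
        for word in tokenize(text):
            index[word].add((show, start))
    return index

def search_captions(index, query):
    """Return segments whose captions contain every query word, sorted by show and time."""
    words = tokenize(query)
    if not words:
        return []
    hits = set.intersection(*(index.get(word, set()) for word in words))
    return sorted(hits)

# Hypothetical caption segments; a real system would pull these from the broadcast.
captions = [
    ("Crossfire", 120, "Jon Stewart joins us today"),
    ("Crossfire", 185, "Stewart and the hosts argue about the show"),
    ("Evening News", 60, "Weather for the weekend"),
]
index = build_caption_index(captions)
print(search_captions(index, "Jon Stewart"))  # [('Crossfire', 120)]
print(search_captions(index, "stewart"))      # [('Crossfire', 120), ('Crossfire', 185)]
```

Each hit identifies a show and an offset, which is what a results page would need in order to show a thumbnail and jump straight to that segment.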

See also my earlier post, "Query-free news search," which mentions a Google paper on searching television closed caption text to find related news articles.

[via Search Engine Watch Blog]

Finding authoritative reviews

Chris DiBona digs up an interesting old article by Mimi Sheraton that criticizes Zagat's user reviews:
    The Zagat surveys stand or fall on their central premise: that thousands of separate opinions add up to something like the truth ... [But] the majority can be wrong, and one well-informed opinion is worth more than those of a thousand amateurs.
It's a great point. How do you find the authoritative, well-informed, useful opinions? This applies not only to community-generated content like customer reviews and product ratings, but also to blog postings and discussion forum comments, where the signal-to-noise ratio is equally poor.

One common approach is to allow people to rate the reviews. Amazon.com does this for customer reviews, allowing people to vote on whether the review was helpful. Slashdot takes this a step further, not only allowing users to moderate (rate comments), but also allowing users to metamoderate (rate the rating of the comment).
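One wrinkle with helpfulness votes is small samples. Ranking by the raw fraction of "helpful" votes puts a review with 1 of 1 votes above one with 90 of 100. A common statistical fix is to rank by the lower bound of the Wilson score confidence interval, although Amazon has not said that it uses this. Here is a minimal sketch; the review names and vote counts are made up.

```python
import math

def wilson_lower_bound(helpful, total, z=1.96):
    """Lower bound of the Wilson score interval for the fraction of 'helpful' votes.

    z=1.96 corresponds to roughly 95% confidence. Reviews with few votes are
    pulled toward zero until enough votes accumulate.
    """
    if total == 0:
        return 0.0
    p = helpful / total
    denominator = 1 + z * z / total
    centre = p + z * z / (2 * total)
    margin = z * math.sqrt(p * (1 - p) / total + z * z / (4 * total * total))
    return (centre - margin) / denominator

# Hypothetical (helpful_votes, total_votes) per review.
reviews = {"review_a": (1, 1), "review_b": (90, 100), "review_c": (3, 10)}
ranked = sorted(reviews, key=lambda r: wilson_lower_bound(*reviews[r]), reverse=True)
print(ranked)  # ['review_b', 'review_a', 'review_c']
```

This only corrects for sample size. It does nothing about which kinds of reviews attract votes in the first place, or about whether the voters know what they are talking about.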

Mimi Sheraton would probably criticize this approach as just layering a popularity contest on top of a popularity contest. And it does have problems. For example, positive reviews on Amazon.com seem to get many more "helpful" votes than negative reviews. Slashdot moderators seem to have an adolescent sense of humor and favor ill-informed rants, perhaps seeking entertainment more than information.

So, what else can we do? Another approach is to identify authoritative people and treat all of their reviews or comments as higher quality. This is closer to what Mimi wants: well-informed reviewers counting for more than uninformed ones. The trick is identifying the informed reviewers. Amazon and Slashdot both emphasize active users, I'd guess on the theory that those who bother to put in the effort to be involved probably have something useful to say. Users could rate each other, but this again reverts to a popularity contest.

This does seem like a spot where social networks actually could be useful. Who is an authoritative reviewer? Someone who is considered authoritative by other authoritative users. Yes, it's circular, but identifying a seed set of authoritative users is enough to get the process started.
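You can compute a circular definition like this by iterating until the scores settle, much as PageRank does. Here is a minimal sketch in the spirit of TrustRank. It assumes a hypothetical `endorsements` map, in which each user lists the users they consider authoritative, and a hand-picked seed set. The damping factor and iteration count are arbitrary illustrative values.

```python
def propagate_authority(endorsements, seeds, damping=0.85, iterations=50):
    """Seed-biased authority propagation (a TrustRank-style sketch).

    endorsements: dict mapping each user to the users they consider authoritative.
    seeds: non-empty set of users trusted up front.
    Returns a dict of user -> authority score; higher means more authoritative.
    """
    users = set(endorsements) | set(seeds)
    for targets in endorsements.values():
        users.update(targets)
    # All of the "restart" weight goes to the hand-picked seed users.
    restart = {u: (1.0 / len(seeds) if u in seeds else 0.0) for u in users}
    score = dict(restart)
    for _ in range(iterations):
        new_score = {u: (1 - damping) * restart[u] for u in users}
        for user, targets in endorsements.items():
            if not targets:
                continue
            # An endorsement is worth a share of the endorser's own authority,
            # so being vouched for by authoritative users counts the most.
            share = damping * score[user] / len(targets)
            for target in targets:
                new_score[target] += share
        score = new_score
    return score

endorsements = {
    "alice": ["bob", "carol"],
    "bob": ["carol"],
    "carol": ["alice"],
    "spammer1": ["spammer2"],  # two accounts vouching only for each other
    "spammer2": ["spammer1"],
}
scores = propagate_authority(endorsements, seeds={"alice"})
for user in sorted(scores, key=scores.get, reverse=True):
    print(user, round(scores[user], 3))
```

All of the restart weight goes to the seeds. As a result, a clique of accounts that only vouch for each other earns no authority unless someone trusted vouches for them. That is one way this might avoid becoming a pure popularity contest, though it can only be as good as its seed set.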

Would this work? Or would it be just another popularity contest?

Got suggestions for Findory?

Got a suggestion for Findory? We'd love to hear from you.

Most e-mails to Findory are either suggestions or oh-my-god-this-is-so-great fan letters. We're thrilled by the feedback we've been getting. It's great to have such an enthusiastic and supportive community using and enjoying Findory.

If you do have ideas, suggestions, or things you'd like to see at Findory, please feel free to drop us an e-mail anytime at suggestions@findory.com or comment on this post. We'd always love to hear from you.

Sunday, November 28, 2004

Mamma buys Copernic

Gary Stein (analyst at Jupiter Research) posts that Mamma, a metasearch engine based in Canada, just purchased Copernic, one of the leading desktop search companies.

Gary doesn't seem to think this was a very smart move by the so-called "mother of all search engines". He says, "Desktop Search has fully entered into the world of hype," and criticizes desktop search as having no business model: "No one's going to be cool with seeing ads -- contextual or otherwise -- displayed with their desktop results."

It also seems to me that, if Microsoft makes the default file and e-mail search on Windows "good enough" for most users -- perhaps by releasing MSN Desktop Search as part of Windows -- most of the opportunity for third-party desktop search applications will evaporate.

Update: Five months later, Copernic kills the deal due to an ongoing SEC investigation of Mamma.

Friday, November 26, 2004

The decline of web directories

Tara Calishain bemoans how Yahoo has deemphasized its web directory. Yahoo's latest redesign relegates the directory to a corner at the bottom of the Yahoo home page.

It's an interesting point, particularly since Yahoo started as a web directory.

Google also deemphasized its directory a few months ago. Google Directory is based on DMOZ, the "largest, most comprehensive human-edited directory of the Web." At the time Google deemphasized Google Directory, I thought Google would be releasing a new, automated version of a web directory soon. That hasn't happened.

Keyword web search is great, but there are times when a browsable web directory is really useful: when you want a list of related sites or a comprehensive list of sites, or when you're having a hard time specifying a search query that gets you what you need.

Update: Andrew Goodman says, "The lack of a definitive directory or two is the single biggest glaring hole in online search."

Wednesday, November 24, 2004

Newsprint is wasted on the young

Adam Penenberg at Wired says, "Newspapers Should Really Worry":
    [Young] focus-group participants declared they wouldn't accept a Washington Post subscription even if it were free. The main reason (and I'm not making this up): They didn't like the idea of old newspapers piling up in their houses.

    Don't think for a minute that young people don't read ... They access The Washington Post website or surf Google News, where they select from literally thousands of information sources. They receive RSS feeds on their PDAs or visit bloggers whose views mesh with their own.

    In short, they customize their news-gathering experience in a way a single paper publication could never do. And their hands never get dirty from newsprint.
But should newspapers be worried? This trend toward online news is an opportunity. No longer are newspaper articles competing for scarce space on the front page and the limited space available in print. No longer is their distribution limited to a local market.

The online news audience is massive and worldwide. It's hungry for your content. All you have to do is give it to them.

[via Scripting News]

Tuesday, November 23, 2004

Froogle wish lists

Google launches wish lists in Froogle. Google is getting better and better for online shopping.

How do you feel about that, Amazon?

Ask Jeeves and advertising

Jefferson Graham at USA Today writes about Ask Jeeves. Some excerpts:
    The company's signature cartoon butler, known as Jeeves, was a symbol of dot-com excess ... "We had great marketing, but the product just didn't deliver," [CEO Steve] Berkowitz admits about Jeeves' early days.

    Jeeves was initially known for its gimmick: It promised to answer any query formed in a question. Most of the time, though, Jeeves replied with irrelevant links, sending millions away to alternatives such as Google.

    [Acquiring Teoma in 2001] enabled Jeeves to acquire its own search technology and make its search results more relevant to queries ... Jeeves' most profitable move of all [was] deciding to partner with rival Google. Google-placed text ads, which appear atop Jeeves' search results, represent nearly 70% of Jeeves' income.

    "We look at the Web differently — at the credibility of a source, as opposed to just the popularity of a site," says Jim Lanzone, Jeeves' senior vice president.

    For instance, a search for "Bay Area airports" on Jeeves displays official airport sites for San Francisco, Oakland and San Jose. The same search on Google highlights local newspaper articles about the airports.
The biggest problem I have with Ask Jeeves is the focus on advertising. SiliconBeat illustrates this well with screenshots of the same search on Google, Yahoo, and Ask Jeeves.

On Google, search results are at the top. On Ask Jeeves, advertising (sponsored results) fills the top of the page. Which is more appealing to someone trying to find something?

Ask Jeeves' advertising-focused page may result in higher short-term revenue, but Ask is crippling its long-term growth with its obnoxious and intrusive advertising.

[via Gary Price and Andy Beal]

Artist similarities in music

Brian Dennis points to an interesting paper by Brian Whitman and Steve Lawrence, "Inferring Descriptions and Similarity for Music from Community Metadata".

If that title didn't turn you off completely, the paper does have an interesting idea. Basically, they mine text in web pages, discussion groups, and blogs (which they call "community metadata") to discover information about music artists. They extract phrases from the community metadata and use them to find relationships. Because they analyze the web pages and discussion groups continuously, they claim to be able to capture short-term trends, like a groundswell of buzz around a particular song or artist.
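Here is a much simplified sketch of the general idea, not the paper's actual method. It gathers the text that mentions each artist, turns that text into a bag-of-words vector, and compares artists by cosine similarity. The snippets, stopword list, and artist names are all hypothetical.

```python
import math
import re
from collections import Counter

STOPWORDS = {"the", "a", "and", "of", "is", "to", "with", "this"}

def term_vector(snippets):
    """Bag-of-words counts over all text snippets that mention one artist."""
    counts = Counter()
    for text in snippets:
        for word in re.findall(r"[a-z']+", text.lower()):
            if word not in STOPWORDS:
                counts[word] += 1
    return counts

def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

# Hypothetical scraped snippets; in practice these would come from web pages,
# discussion groups, and blogs that mention each artist.
community_text = {
    "Artist A": ["dreamy shoegaze guitars", "fuzzy dreamy guitars and reverb"],
    "Artist B": ["reverb drenched shoegaze", "dreamy guitars live"],
    "Artist C": ["aggressive rap verses", "hard hitting beats and rap"],
}
vectors = {artist: term_vector(texts) for artist, texts in community_text.items()}
print(round(cosine(vectors["Artist A"], vectors["Artist B"]), 3))
print(cosine(vectors["Artist A"], vectors["Artist C"]))  # 0.0
```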

This idea of extracting data and relationships from community metadata is clever. AllConsuming.net is an interesting example of this for books. It "watches weblogs for books that they're talking about". Memeorandum is an interesting example for news. It watches blogs to see what news articles they are talking about.

By the way, one of the authors of this paper, Steve Lawrence, is now at Google.

Monday, November 22, 2004

Findory's source pages

Findory launched another new feature this weekend, pages for every news site and blogger in our database. Alex Edelman has a good write-up.

Check out the pages for BBC, Wired, or Nature.

Take a peek at our pages for the blogs ResourceShelf, Searchblog, InsideGoogle, or Scobleizer.

Every article on Findory has a link to the appropriate source page just under the title. Each source page has recent articles, related sources, and related articles. Related sources and related articles are a great way for readers to discover interesting news stories and sources.
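The post doesn't say how related sources are computed, so the following is only an illustration under that assumption. One common approach is item-to-item co-occurrence: two sources are related if the same readers tend to read both. Here is a minimal sketch with hypothetical reading histories.

```python
from collections import Counter, defaultdict
from itertools import combinations

def related_sources(reading_histories, top_n=3):
    """Item-to-item co-occurrence: sources read by the same readers are related.

    reading_histories: iterable of sets, one set of source names per reader.
    Returns dict source -> list of (related_source, co_read_count), best first.
    """
    co_counts = defaultdict(Counter)
    for sources in reading_histories:
        for a, b in combinations(sorted(sources), 2):
            co_counts[a][b] += 1
            co_counts[b][a] += 1
    return {src: counts.most_common(top_n) for src, counts in co_counts.items()}

# Hypothetical per-reader reading histories.
histories = [
    {"National Geographic", "Nature", "New Scientist"},
    {"Nature", "New Scientist", "ScienceDaily"},
    {"Nature", "New Scientist"},
    {"Wired", "Slashdot"},
]
related = related_sources(histories)
print(related["Nature"])
```

Raw co-occurrence counts favor popular sources. A production system would probably normalize them, for example with cosine similarity over sets of readers, so that a source everyone reads does not end up related to everything.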

Try surfing from source to source using the related sources! I just used them to read recent articles from National Geographic, then clicked over to Nature, then New Scientist, then ScienceDaily.

Personalized search vs. clustering

Raul Valdes-Perez (CEO of Vivisimo) has a CNet article attacking personalized web search and touting the virtues of document clustering.

Raul makes some excellent points on the difficulties of doing personalized web search well. He argues that people's interests are fleeting and noisy, and that it's difficult to infer interests accurately from clickstream data, which is itself noisy and imprecise.

It's true that personalized search is challenging. But Raul's criticism is overstated. If personalized search learns immediately in response to new data, it can react to people's immediate goals and interests, even if they differ from their long-term behavior. If the personalization helps in more cases than it hurts, then the personalization has value, even if the data is noisy and the assumptions made from the data are speculative.

Raul's solution is to give up on personalized search and do document clustering instead. Vivisimo's Clusty is an excellent clustering web search -- if you haven't tried it, go try it, it's great -- but it requires effort. Users have to refine their query repeatedly using the clusters to find what they want.
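For comparison, here is a toy version of clustering search results. It greedily groups result snippets in a single pass, using Jaccard similarity over shared terms. This is not Vivisimo's algorithm, which isn't described here and is surely more sophisticated. The snippets, stopwords, and threshold are made up for illustration.

```python
import re

STOPWORDS = {"and", "in", "the", "of", "a"}

def terms(text):
    """Set of lowercase words in a snippet, minus a few stopwords."""
    return set(re.findall(r"[a-z]+", text.lower())) - STOPWORDS

def jaccard(a, b):
    """Fraction of terms two snippets share: |a & b| / |a | b|."""
    return len(a & b) / len(a | b) if a | b else 0.0

def cluster_results(snippets, threshold=0.2):
    """Greedy single-pass clustering of search-result snippets.

    Each snippet joins the first cluster whose seed snippet shares enough
    terms with it (Jaccard >= threshold); otherwise it starts a new cluster.
    """
    clusters = []  # list of (seed_terms, member_snippets)
    for snippet in snippets:
        snippet_terms = terms(snippet)
        for seed_terms, members in clusters:
            if jaccard(snippet_terms, seed_terms) >= threshold:
                members.append(snippet)
                break
        else:
            clusters.append((snippet_terms, [snippet]))
    return [members for _, members in clusters]

results = [
    "Jaguar car dealers and prices",
    "Jaguar big cat habitat in the rainforest",
    "Used Jaguar car prices",
    "Jaguar cat diet and habitat",
]
for group in cluster_results(results):
    print(group)
```

Even in this toy, the searcher still has to pick a cluster and dig into it. That is the extra step users have to take.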

People are lazy. They want what they want and they want it now. Google recognizes this, providing an "I'm feeling lucky" button that just sends you to the top search result immediately. They recognize that it's better to just find what the searcher wants on the first try, no refining, no effort.

Personalized search improves relevance rank by recognizing that relevance differs from individual to individual. It makes it more likely that you find what you need on the first try.

Sunday, November 21, 2004

Robin Sloan's EPIC 2014

Robin Sloan produced a clever Flash movie called "EPIC 2014" speculating about the future of personalized news and information. Worth watching.

After a brief recap of events of the last decade, the movie speculates about a future product from Google, the Google Grid, a vast file and content-sharing network that appears to be some combination of Blogger, TiVo, Napster, and the Google cluster. In Robin's vision, this is followed by MSN Newsbotster, a personalized news site that appears to be some combination of Findory, Slashdot, Memeorandum, and social networking tools like Friendster. Next up are Googlezon and EPIC, an "evolving personalized media construct" that provides personalized information by summarizing and rewriting content dynamically for each user.

Summarizing news and documents as described in EPIC is very difficult, but it is an active area of research. One of the most interesting examples out there now is Columbia Newsblaster. Microsoft Research also is doing work in this area.

Aside from the silly brand names of Newsbotster and Googlezon, Robin Sloan has created an interesting and thought-provoking vision of the future. Definitely watch the movie.

The movie ends with criticism of this new world of personalized news and information, complaining that it will be dominated by "narrow, shallow, sensationalist trivia", apparently what Robin Sloan thinks is all people really want and all they'll get from personalized news. He also claims Googlezon and EPIC will cause the death of large and well-respected news organizations like the New York Times.

The death of the New York Times? Clearly hyperbole. At best, Google News is another distribution channel for news. While it may reduce traffic to the front page of online newspapers, it drives traffic to their content, to individual articles. As the CEO of AP said recently, the "content will be more important than its container." News organizations will continue and thrive in the future. The only differences are that content -- the work of talented reporters and writers -- will be emphasized and that the content will be distributed more widely.

Are personalized news sites more shallow or more narrow? Compare a personalized news site to the current front page of CNN. The unpersonalized front page of CNN provides only a shallow view targeting some mishmash of the general interests of millions of readers. By trying to satisfy everyone, it satisfies no one, a bland blend of interests that results in mediocrity. And, I only get the perspective of CNN, what they think is important to their readers.

Personalized news provides an opportunity to broaden readers' interests, exposing them to news sources, perspectives, and viewpoints they otherwise would never have seen. A personalized news aggregator provides both breadth and focus, sorting through huge numbers of sources and articles and helping you find what you need.

Personalized news helps you discover news you would otherwise miss. It makes it easier to get the information you need to be well-informed about the events that impact your life. If this is the future, it is a future which should excite us.

Update: A few weeks later, EPIC is making the rounds again and getting some additional coverage (CNet, Slashdot, Traffick, InsideMicrosoft). A couple folks even started a Googlezon blog.

Friday, November 19, 2004

Findory's personalized web search

A couple weeks ago, Findory launched search history for web, news, and blog search. As I've said before, search history is not personalized search.

This week, Findory took our first step toward true personalized web search. In subtle and small ways, we are starting to modify web search results based on your history at Findory.com.

To see the impact, do a web search at Findory, then click on one or two of the search results, then do another search for something fairly similar. In cases where we believe we can help, we'll modify and highlight some of the search results.

Here are a couple of specific examples:
  • Search for "Yahoo".
    Click on the top link for Yahoo.com.
    Search for "Dilbert".
    The Dilbert page at Google a few results down will be highlighted and modestly reranked.

  • Search for "Incredibles".
    Click on the IMDB link (fourth down).
    Search for "Nemo".
    The IMDB page on Finding Nemo will be highlighted and popped up to the top slot.
Please keep in mind these are our first, early, baby steps. The changes are small, infrequent, and subtle. Findory needs to learn to walk before it can run. Over time, Findory will better understand how to help people find what they need and the changes will become larger and more frequent.

As small as this step may be, we believe it is a first for a commercial web search engine. Many are talking about personalized search, but no one is doing it. Our personalized web search learns from your behavior, modifies your search results, and helps you find what you need.
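The post doesn't describe Findory's algorithm, so this is not it. It is only a minimal illustration of reranking results from recent clicks, assuming a crude notion of "same site" and a hypothetical boost and decay. It also keeps the click that caused each boost, which is the kind of information an "explain why this was recommended" icon needs. All URLs are hypothetical.

```python
from urllib.parse import urlparse

def site(url):
    """Host name without a leading 'www.', used as a crude notion of 'same site'."""
    host = urlparse(url).netloc.lower()
    return host[4:] if host.startswith("www.") else host

def personalize(results, recent_clicks, boost=0.5, decay=0.5):
    """Rerank (url, base_score) results using the user's recent clicks.

    recent_clicks is ordered oldest to newest. Results from a site the user
    clicked get a boost; older clicks count less (geometric decay), so the
    ranking reacts quickly to what the user is doing right now.
    Returns (url, score, reason) tuples, best first; reason is None if unchanged.
    """
    affinity, reason = {}, {}
    weight = 1.0
    for url in reversed(recent_clicks):  # newest click first
        s = site(url)
        affinity[s] = affinity.get(s, 0.0) + weight
        reason.setdefault(s, url)  # remember the most recent click per site
        weight *= decay
    reranked = []
    for url, base in results:
        s = site(url)
        why = f"you recently clicked {reason[s]}" if s in reason else None
        reranked.append((url, base + boost * affinity.get(s, 0.0), why))
    reranked.sort(key=lambda r: r[1], reverse=True)
    return reranked

# Hypothetical search for "Nemo" after clicking an IMDB page in an earlier search.
results = [
    ("http://www.example-studio.com/nemo", 0.9),
    ("http://www.example-encyclopedia.org/Finding_Nemo", 0.8),
    ("http://www.imdb.com/hypothetical-finding-nemo-page", 0.6),
]
clicks = ["http://www.imdb.com/hypothetical-incredibles-page"]
for url, score, why in personalize(results, clicks):
    print(url, round(score, 2), why)
```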

Update: It took many months, but a new version of Findory personalized web search has launched that makes more substantial changes in the relevance rank.

Google Kirkland open house

The Google Kirkland open house party was last night. It was a great time. Quite a turnout, totally packed. Strong UW presence, which wasn't surprising, but I was amazed by the number of Amazon and MSN people there.

Brady Forrest (PM at MSN Search, frequent poster on MSN Search blog) was there. Scott Pitasky (former Amazon.com, now head of HR at MSN). Erik Selberg (author of Metacrawler, one of the first metasearch engines, now at MSN Search). Robert Scoble was apparently there, but I didn't bump into him.

I got a chance to catch up with a few Googlers, Joshua Redstone (old friend from graduate school, works on GFS), Peter Norvig, Jeff Dean. Jeff Dean and I had an interesting discussion about the potential for abuse of MapReduce; I was arguing you might see tragedy of the commons issues because the system makes it so easy to consume vast resources on the Google cluster, but Jeff said everyone plays nice and that it isn't an issue. I was hoping to see Joe Beda, but he couldn't make it, unfortunately. David Krane was there, but I didn't see him.

Bumped into a couple of the Slashcode guys too, Brian Aker and Chris Nandor. Unbelievable that Slashdot uses NFS in a production system, but Brian and Chris insist it's not a serious problem.

I finally got a chance to meet Todd Bishop from the Seattle PI in person. Great to see you there, Todd.

Making sense of the chaos

Bill Joy (co-founder, Sun Microsystems) on the Charlie Rose show:
    Our lives are overwhelmed by all the information coming at us in a very disorganized way. We're going to hunger for something that will make sense of all the chaos--that will look at all the things happening in the world and filter and order them in a way that's personalized to us. That will be the next great revolution--that is something that doesn't take an index of the dead information on the Net, but the live information of things as they are occurring and as they are relevant to us.
The next great revolution is finding focus and relevance in the flood of new information. The next great revolution is personalized news.

[via Musing on Technology]

Thursday, November 18, 2004

AI is the mainstream

Adam Bosworth (who left BEA for Google recently) at his ICSOC 2004 talk:
    You want to see the future. Don’t look at Longhorn. Look at Slashdot. 500,000 nerds coming together everyday just to manage information overload. Look at BlogLines. What will be the big enabler? Will it be Attention.XML as Steve Gillmor and Dave Sifry hope? Or something else less formal and more organic? It doesn’t matter. The currency of reputation and judgment is the answer to the tragedy of the commons and it will find a way. This is where the action will be. Learning Avalon or Swing isn’t going to matter. Machine learning and inference and data mining will. For the first time since computers came along, AI is the mainstream.
Managing information overload with AI. That's the future.

[via Niall Kennedy]

Google, Glog, and G-nius

The Motley Fool writes about Google's track record of innovation.

In addition to coining some odd words like "g-nius" and "glog", the article has an interesting piece on Google AdWords and AdSense:
    In 2003, Google derived 97% of its revenues from advertising.

    Google's simple text-based ad results often segue so well with the search that they can hold actual interest for the user. Even the most grudging critic probably has to acknowledge that there was a time when he or she clicked on one of Google's ads because of its relevance to their interests.

    Relevance. That's where Google's got it down, especially with its AdWords and AdSense programs.
See also my earlier post, "Bringing sense to web advertising".

[via Andy Beal]

The big, bad VC

An interesting essay, "10 Reasons to Shy Away from Venture Capital".

The essay echoes advice I've gotten from many others. The distraction and poorly aligned goals are the biggest issues for me. We're too busy executing to play these games.

[via Joel on Software]

Intent marketing

Charlene Li at Forrester writes about personalized advertising, which she labels "intent marketing":
    Today, publishers announce that they have content and an audience that is attracted to that content – so if you're a marketer interested in that audience, the publisher will sell you access to those users in the form of advertising at a specified price. The onus falls to the marketer to figure out where the audience is, hence the important role of media buyers and ad agencies.

    In the future, marketers will announce that they want to reach a certain segment – let's say, women in-market for a car – and are willing to pay $25 per qualified lead. The onus now falls to the publisher to deliver that audience to the marketer. Publishers will be able to see what the "bids" are within the system for a particular user profile and optimize their ad serving to maximize revenue per page.

    This is the development of what I call "intent marketing" where the marketer targets intent, in this case, inferred from past behaviors.
I'd like to see this go one step further. I'd like to see the entire process of targeting advertisements handled as an optimization problem.

In this future, marketers create a large pool of advertisements with specific segments in mind for each ad. The advertisements go out on the network of publishers, mostly showing to people who match the segments, but also sampling related segments outside of the marketer's intent. Quickly, the advertisements focus in on narrow clusters of readers who are interested or, if no one seems interested, the advertisements are dropped completely.
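One standard way to frame this kind of optimization is as a multi-armed bandit. The sketch below uses epsilon-greedy selection only because it is simple; it does not describe how Google or any ad network actually works. Each (segment, ad) pair is an arm. Most of the time, a segment sees the ad with the highest expected revenue per impression, which is the price per response times the estimated response rate. A small fraction of the time, it sees another ad, and this is how ads get sampled outside their intended audience. An ad that keeps failing in a segment is dropped there. Segment names, ad names, prices, and response rates are hypothetical.

```python
import random
from collections import defaultdict

class SegmentAdOptimizer:
    """Epsilon-greedy ad selection per audience segment (a bandit sketch).

    With probability epsilon a random live ad is shown (explore); otherwise the
    ad with the best price x estimated response rate is shown (exploit). An ad
    whose response rate in a segment is below drop_below after at least
    min_impressions impressions is dropped for that segment.
    """

    def __init__(self, price_per_response, epsilon=0.1,
                 min_impressions=100, drop_below=0.01, seed=0):
        self.price = dict(price_per_response)  # ad -> amount paid per response
        self.epsilon = epsilon
        self.min_impressions = min_impressions
        self.drop_below = drop_below
        self.rng = random.Random(seed)
        self.shows = defaultdict(int)
        self.responses = defaultdict(int)
        self.dropped = set()

    def estimated_rate(self, segment, ad):
        # Laplace smoothing so an ad that has never been shown is not scored as zero.
        return (self.responses[segment, ad] + 1) / (self.shows[segment, ad] + 2)

    def choose(self, segment):
        live = [ad for ad in self.price if (segment, ad) not in self.dropped]
        if not live:
            return None
        if self.rng.random() < self.epsilon:
            return self.rng.choice(live)  # explore
        # Exploit: expected revenue per impression = price x estimated response rate.
        return max(live, key=lambda ad: self.price[ad] * self.estimated_rate(segment, ad))

    def record(self, segment, ad, responded):
        key = (segment, ad)
        self.shows[key] += 1
        self.responses[key] += int(responded)
        if (self.shows[key] >= self.min_impressions
                and self.responses[key] / self.shows[key] < self.drop_below):
            self.dropped.add(key)

# Hypothetical simulation: in this segment, car ads get responses, cruise ads never do.
true_rates = {"car_ad": 0.1, "cruise_ad": 0.0}
optimizer = SegmentAdOptimizer({"car_ad": 25.0, "cruise_ad": 25.0})
world = random.Random(42)
segment = "women_in_market_for_a_car"
for _ in range(5000):
    ad = optimizer.choose(segment)
    if ad is None:
        break
    optimizer.record(segment, ad, world.random() < true_rates[ad])

print(dict(optimizer.shows), optimizer.dropped)
```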

I'm not alone in having this vision. Many have talked about it. But it's quite a challenge to implement. It requires a massive amount of data, only possible at scale.

But Google AdWords seems close to doing it. They suggest alternative keywords, show ads for queries that aren't exact matches to the specified keywords, and drop ads that perform poorly. The next step is to use the vast amount of data they have on which ads are effective to start showing ads for keywords other than those specified and to further narrow the targets when responsive audiences are found.

Already, I click on Google ads much more than other ads because they're relevant, especially when I do a Google search for a specific product. Perhaps advertising actually can be informative, unobnoxious, and useful.

See also my earlier post, "Bringing sense to web advertising".

Wednesday, November 17, 2004

Google Scholar

Shirl Kennedy and Gary Price gush about the launch of Google Scholar, a version of Google focusing on scholarly sources.

After using it a bit, I think it's very cool. It seems similar to Citeseer. If you haven't tried Citeseer, you should. Among other great features, Citeseer links academic articles by citations and similarity. The new Google Scholar also has a feature to show all the citations of an article. Nice.
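At its core, a "cited by" view inverts reference lists: each paper says what it cites, and "cited by" is the reverse mapping. Here is a minimal sketch. It assumes the references have already been extracted from the papers and matched to paper identifiers, which is probably the hard part in practice. The paper names are placeholders.

```python
from collections import defaultdict

def build_cited_by(references):
    """Invert 'paper -> papers it cites' into 'paper -> papers that cite it'."""
    cited_by = defaultdict(set)
    for paper, cites in references.items():
        for target in cites:
            cited_by[target].add(paper)
    return cited_by

# Hypothetical reference lists, already extracted and matched to paper IDs.
references = {
    "Paper A": ["Paper C", "Paper D"],
    "Paper B": ["Paper C"],
    "Paper D": ["Paper C"],
}
cited_by = build_cited_by(references)
for paper in sorted(cited_by, key=lambda p: len(cited_by[p]), reverse=True):
    print(paper, "cited by", sorted(cited_by[paper]))
# Paper C cited by ['Paper A', 'Paper B', 'Paper D']
# Paper D cited by ['Paper A']
```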

You know, Citeseer was created by NEC Labs. One of the authors was Steve Lawrence. Steve Lawrence is now at Google. Hmm...

See also David Krane's post and the New York Times article on Google Scholar.

Update: John Battelle says Google Scholar was built by Anurag Acharya, a former UCSB professor and now a principal engineer at Google. Here's Anurag's old home page and a brief bio from a UW talk he gave back in February 2003.

Update: Danny Sullivan also has a good post on Google Scholar.

Update: Anurag Acharya announces Google Scholar on the Google blog.

Update: 1.5 years later, Microsoft launches Windows Live Academic Search. Apparently, another creator of Citeseer, Lee Giles, was involved with Microsoft's effort.

In my usage, I found myself agreeing with Dare that the coverage of Windows Live Academic is too thin for it to be competitive. I also found the lack of advanced search annoying.

Top online news websites

CyberJournalist posts the top online news websites for October 2004 (from Nielsen/NetRatings).

Top four are CNN, MSNBC, Yahoo News, and AOL News. Google News is small by comparison at about 1/3 the reach. Interesting to see several newspapers have large online audiences.

Et tu, TiVo?

TiVo will stop skipping past advertisers. As you would expect, reaction to this has been negative. Musing on Technology calls the move "customer-uncentric". BoingBoing says, "Time to build a MythTV."

This comes on top of restrictions on moving recordings made with TiVo and restrictions on recording of pay-per-view shows. Alternatives to TiVo, such as MythTV and SnapStream's Beyond TV, have no such limitations.

Yet another disappointing business move by TiVo.

See also my earlier post, "Will somebody please fix TiVo Suggestions?"

Update: Scott Johnson (VP of Engineering at Feedster) rails against TiVo and says there's a business opportunity here for someone who builds the PVR that people want.

Update: Interesting new twist on this. PVRblog cites a CBS study that says that people fast forwarding through commercials actually have a higher recall rate than people watching them normally. The counterintuitive result is probably because people fast forwarding are paying close attention to the screen.

Tuesday, November 16, 2004

RocketNews personalization

RocketNews issues a press release claiming "unique personalization features". Among other things, they say:
    RocketNews tracks the articles visitors click to read, and these individual choices are used to determine the relevance and ranking of search results every time the individual returns to RocketNews to view their searches or their personal news portal.
But I wasn't able to see any changes to the RocketNews portal after reading several articles.
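Here is roughly what the press release describes, as a hypothetical Python sketch (not RocketNews's actual code): words from clicked articles build up an interest profile, and later results that share those words move up. The function names, the tiny stopword list, and scoring on title words alone are all assumptions made for the illustration.

```python
from collections import Counter

# Hypothetical stopword list; a real system would use a much larger one.
STOPWORDS = {"the", "a", "an", "of", "in", "on", "and", "to", "for"}

def words(text: str) -> list[str]:
    """Lowercased title words with stopwords removed."""
    return [w for w in text.lower().split() if w not in STOPWORDS]

def record_click(profile: Counter, clicked_title: str) -> None:
    """Add the words of a clicked article's title to the reader's profile."""
    profile.update(words(clicked_title))

def personal_score(profile: Counter, title: str) -> int:
    """How strongly past clicks favor this title (sum of word counts)."""
    return sum(profile[w] for w in words(title))

def rerank(results: list[str], profile: Counter) -> list[str]:
    """Move favored results up; sorted() is stable, so ties keep the
    engine's original order."""
    return sorted(results, key=lambda t: personal_score(profile, t), reverse=True)

profile = Counter()
record_click(profile, "Mars rover finds water")
record_click(profile, "NASA plans new Mars mission")
results = ["Stocks rally on earnings", "Mars mission delayed", "Local election results"]
print(rerank(results, profile))
# prints ['Mars mission delayed', 'Stocks rally on earnings', 'Local election results']
```

Even a crude profile like this should visibly change the ordering after a few clicks, which is exactly the effect I was looking for and didn't see.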

RocketNews does appear to keep search history, as do many others (My Yahoo Search, A9, MyJeeves, Findory).

Gary Price has more on the RocketNews announcement.

Yahoo ads in RSS

Yahoo's Overture takes the first steps towards advertising in RSS feeds.

Sigh. I really don't think advertising in RSS is a good idea. I'd rather see excerpts only in RSS feeds and advertising on websites when people click through to read the full article.

But it seems that advertising in RSS is inevitable. I hope it's well targeted and unobtrusive, like Google AdWords, not untargeted and obnoxious, like popup advertising.

Update: Dave Winer comments on advertising in RSS, arguing that excerpt-only feeds are already advertisements (for the site where you have to go to read the full article) and that bloggers should "think carefully about how much advertising you think your readers can endure."

Exactly right. Heavy advertising may generate more short-term revenue, but only at a long-term loss in revenue from driving away readers. It's a careful balance. The best advertising is unobnoxious, relevant, and useful.

RSS information overload

Chris Sherman talks about information overload with RSS:
    The information overload problem comes from sources that update frequently. Sometimes these sources provide must-read information. However, in many cases, new information may consist of a rambling post, a post on a topic of little interest to you, or worst of all (in my opinion) a post that simply points you to another feed that you're already tracking.
and recommends PubSub as the solution (even though Google News Alerts, Yahoo News, and Feedster all offer similar functionality).

The problem with all these solutions is that you still have to go through all the effort of explicitly specifying what you want. You need to enter a bunch of keywords, refine them to narrow down to what you actually want, and then maintain them over time. Only the most dedicated information junkies will bother.

What's really needed is an automated solution. Something that just figures out what you want. With no effort. Something like this.

Mooter CEO Liesl Capper

Red Herring interviews Mooter CEO Liesl Capper. This company certainly has a different view of search. From the interview:
    Our entire focus has been on building our algorithms from the human side of the search equation, not the data side. I have spent the last decade studying cognitive styles, and how who you are as a person affects how you deal with information.

    [We] use human sciences to predict what a person wants to see. Our information amplifier learns from current interest patterns, and amplifies implicit search interests.

    Humanity is facing a subtle but pervasive shift in the way in which we perceive reality and interact with the world and other humans, and that this would be increasingly mediated by technology. In the future we will walk into a room, and an immersive world of information will flow around us, engaging all of our senses.
Well, okay, then.

But, I have to say, it's nice to see people approaching search from such a different perspective. After all, experimentation is the key to innovation.

Monday, November 15, 2004

MSN's long road ahead

US Bancorp analyst Safa Rashtchy says of MSN's new search engine:
    This is not a move that will give MSN a big market-share gain; instead, it's a first step to stop the market share loss that MSN Search was experiencing. A new search engine was an absolute strategic necessity, but the hard work comes after the engine is perfected and fully launched to the entire population of MSN. This will include positioning not only the search site, but also MSN itself to customers, as a better web experience than Google.
[via Searchblog]

Sunday, November 14, 2004

MSN desktop search leaked

Here's an apparently unauthorized review, with several screenshots, of the prototype of MSN's desktop search. Supposedly, it will be released by December 2004.

[via msnsearch's weblog]

More on Google in Seattle

Todd Bishop at the Seattle PI writes about Google's new Kirkland office.

See also my earlier post "Google's Kirkland Office".

Saturday, November 13, 2004

It's the content itself

Tom Curley (CEO of Associated Press) talked at the Online News Association Conference about online news:
    New technologies allow news consumers to get the information they wish when they wish in the forms they wish. "Content will be more important than its container in this next phase," Curley said. "The franchise is not the newspaper; it's not the broadcast; it's not even the Web site. The franchise is the content itself."

    He imagined a hypothetical "My Personalized News" of the future, which might include: the latest headlines and photos delivered to his computer by the AP; video news and ESPN highlights delivered to his set-top box; a list of upcoming earnings reports delivered by The Wall Street Journal to his PDA; and a BusinessWeek analysis delivered as a PDF to his printer. The challenge is therefore, he said, to first "get comfortable with this ice-cold shower of 'disintermediation'" and then for companies to begin "tagging our news for delivery in discrete pieces" while keeping control of their intellectual property and earning money to support their businesses.

    "We believe that world needs AP's primary content more than ever," Curley said, "that authoritative voice that we -- and you -- provide, precisely because there are so many new voices and free-flowing content 'atoms' out there."
Content is king. News sources need to take advantage of new, decentralized distribution mechanisms such as RSS and news aggregators, exploiting them to reach a wider audience.

Update: The full text of Tom Curley's keynote is available. I particularly liked this part:
    Discrete pieces of content -- stories, photos and video clips -- all categorized and branded, will be disassembled from whatever presentation you create and magically reassembled ... That's the fundamental behind personalization. The content comes to you; you don't have to come to the content.

The value of simplicity

An InternetNews article has an interesting discussion of simplicity in search interfaces:
    Google took the lead in the early days of search because it provided better results in a clean layout. MSN Search includes a bevy of buttons, drop-down menus and sliders to help searchers refine results. But it's not clear to experts that many searchers even care about advanced query techniques, no matter how simplified.

    "Users are notoriously very lazy," said usability expert Jakob Nielsen, a principal of Nielsen/Norman Group. "They don't want to go to page two; they don't even want to scroll. The average behavior is to type two or three terms [into the query box], look at what's visible and click on those links."

    Nielsen said that Google rose to the top at a time when other search services had piled on content and features in the rush to become portals. "They were very cluttered. Google had the opposite approach -- very lean and cut back and very good at prioritizing. Those are the two reasons for its big success."

    Nielsen compared Google to A9, the search service from Amazon.com, which combines Google's search technology with proprietary features, such as personalization, Search Inside the Book and results from different sources appearing in multiple columns. "A9 has a variety of extra features, so you would think it must be better than Google. But it's worse."
See also my earlier post "Google wants to keep it simple".

Friday, November 12, 2004

MSN Search's beta blunder?

A brutal article at BusinessWeek:
    Nov. 11 was supposed to be a big day for the folks running MSN Search. The new service, albeit a test version, was slated to launch and begin Microsoft's big push into the lucrative and competitive world of Internet search technology. But the day didn't start off too well. The site went down almost as soon as it went up.

    It would be one thing if a startup's beta had sputtered. Not too many folks would have noticed. But this is Microsoft, earth's largest software company. And it trumpeted this test launch with a public-relations campaign to ensure that users around the world knew the service was ready for widespread use. So when MSN Search went down, a bit of Microsoft's credibility in the search-engine business went with it.

Silliness with page rank

Scoble asks Google, "Who has the best geek blog?" and notes that he gets top honors. But, oh mighty Scoble, what happens if you ask who has the "best geeking blog"?

Yes, we have a lot to be proud of. We're both thoroughly geeky.

Of course, the top Google result for "personalized news" is exactly what you would expect. Google knows a good thing when it sees it.

Meeting Topix.net

I just had the pleasure of chatting with Rich Skrenta and meeting the rest of the Topix.net team.

Although Findory.com is not listed on their partners page, we've had a partnership with Topix.net for some time. Topix.net is focused on news classification and local news. Findory is focused on personalization of information. But the two companies have a lot in common.

Google Firefox start page

Sushubh Mittal discusses the new default start page for Firefox, which is hosted by Google, and the benefits for both the Mozilla Foundation and Google.

Of course, the default start page for Internet Explorer is MSN. The major competitor to Internet Explorer is Firefox/Mozilla. And the major competitors to MSN are Google, Yahoo, and AOL.

All very interesting.

Online shopping up substantially

2004 online holiday shopping sales will be up 20-30% over 2003, according to SearchEngineJournal's summary of several analyst reports.

Thursday, November 11, 2004

MSN Search and the competition

Many are saying that MSN Search isn't better than Google at this point. But does it have to be? Charlene Li at Forrester explains:
    [MSN Search] needs to be "good enough" for MSN to serve its current users, and hopefully, to entice wayward MSN.com and Hotmail users who search with other engines to come back to the roost.

    Also, now that MSN has its own search algorithm, it will be able to develop search features like personalization that meets its users' needs. Note that I mean MSN's users, not necessarily the loyal users of Google or Yahoo! Search is fragmented enough that I think each player will carve out their own audience.
There can be many players in this market and, with Microsoft's market power, it's guaranteed to be a big one.

Wednesday, November 10, 2004

MSN Search launch

MSN Search launched, albeit in beta form. As you would expect, Chris Sherman and John Battelle have good writeups. The MSN press release is also available.

Other than implicit geolocation for local search, no personalization yet. But the MSN Search about page still promises it is coming:
    A Personalized Experience. Your Search service should learn from you. What you like, what you read, where you live. Search should deliver results that are more personal and relevant to you.

Does size matter?

In his announcement about Google's index size, Bill Coughran makes a good counterargument against my claim that index size doesn't matter.

His basic point is that there are large classes of searches that return few or no useful results. The bar is pretty low in these cases. If you only return three results for a query, relevance rank isn't going to matter much. If you increase the number of results to five, that's probably helpful if the results you added are at all relevant. So, it's true. If you can increase the number of results in these cases without reducing relevance on other queries, you're helping people find what they need.

Nevertheless, is expanding a general crawl really the right approach? When you already crawl 4B pages, any additional pages you crawl will be deep in the crufty back alleys of the web. These kinds of documents cannot only be useless, but can hurt overall search quality if they get surfaced inappropriately. Perhaps it would be more useful to do directed crawls of high quality or specialized data sources? Target specific holes in your coverage?

And there are other ways to be helpful. For example, perhaps the query is just the wrong query. Maybe the searcher needs help with query refinement (replacing search terms) or query expansion (broadening a search with synonymous or related terms).
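Query expansion is easy to sketch. This hypothetical Python example ORs each search term with synonyms from a small hand-made table. The SYNONYMS dictionary and the OR syntax are assumptions for illustration; a production system would presumably derive related terms from far more data. Refinement would instead replace a term rather than add alternatives to it.

```python
# Hypothetical hand-made synonym table, for illustration only.
SYNONYMS = {
    "car": ["automobile", "auto"],
    "cheap": ["inexpensive", "discount"],
    "laptop": ["notebook"],
}

def expand_query(query: str, synonyms: dict[str, list[str]] = SYNONYMS) -> str:
    """Broaden each term into an OR group of the term plus its synonyms;
    terms with no known synonyms are left alone."""
    groups = []
    for term in query.lower().split():
        alternatives = [term] + synonyms.get(term, [])
        if len(alternatives) == 1:
            groups.append(term)
        else:
            groups.append("(" + " OR ".join(alternatives) + ")")
    return " ".join(groups)

print(expand_query("cheap car rental"))
# prints (cheap OR inexpensive OR discount) (car OR automobile OR auto) rental
```

The risk, of course, is that expansion drags in irrelevant results, so it is probably best reserved for queries that return few or no useful results.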

Again, it's all about relevance. If you can improve relevance in a minority of cases by expanding the general crawl without hurting the common cases, you'll improve the overall usefulness of the search engine. But increasing index size isn't the only way to improve relevance.

Update: Interesting post from Danny Sullivan on the index sizes of the various search engines and how much of each page they index.

Google's index doubles

Bill Coughran (VP Engineering at Google) announces that Google doubled their search index to 8B pages today.

I'm curious what's going on at MSN Search right now. They're launching their search engine in just hours. The MSN announcement supposedly was going to include boasting of the impressive size of their search index.

An amusingly well-timed move by Google.

Update: It does appear MSN had to change their press release at the last minute. Comparing the final and pre-release versions, they went from saying
    The largest index of information. More than 5 billion web documents – larger than any web indexes reported today.
to saying
    Vast index of information. The MSN Search index of more than 5 billion Web documents is one of the largest indexes offered today.

MSN Search launching tomorrow?

Many are saying that MSN Search is launching tomorrow. John Battelle has some interesting tidbits on it.

John Markoff writes that "Microsoft will stress the size and completeness of its service" as its advantage over Google and Yahoo.

As I said in "It's not how big it is, it's what you do with it", improvements to search are all about relevance. The size of your index only matters if you're missing documents that should be at the top of the search results in a substantial number of cases. Otherwise, you should really be focused on getting the documents already in your index surfaced at the right times.

I'd hoped there would be more here given Microsoft Research's work in question answering, personalization, and natural language processing. We'll see what they end up releasing, but I hope this isn't just an "ours is bigger" play.

Update: Danny Sullivan posts an interesting timeline that shows MSN Search's development over the last several years.

Tuesday, November 09, 2004

Solving the Google puzzles

MathWorld posts detailed solutions for the problems on the Google billboard and Google Labs Aptitude Test.

Oh yeah, baby. That's some geeky stuff right there.

[via Google Blog]

Monday, November 08, 2004

Personalized ads on cable TV

Saul Hansell reports that Music Choice on cable TV will start doing personalized advertising:
    It will be the first company to display commercials that are selected for each user based on behavior. Using a special computer installed in each cable "head end" - the control center for each neighborhood - the system will track each user's music video choices to deduce demographic information. Different sorts of people, even neighbors, will be shown different commercials. "If we see that you listen to soft rock, and we know what else you picked, we know you are an over-35 female," said David Del Beccaro, the chief executive of Music Choice.

    This sort of targeting is common on the Internet, but it has not been used in cable television until now.
Looks like this isn't quite able to work at the individual level, just at the neighborhood level. And it's doing demographic targeting, not personalizing to individual interests (e.g. categories like soft rock or even particular subgenres, artists, or related songs).

Nevertheless, an interesting new development in personalized advertising.

Thursday, November 04, 2004

Query-free news search

Interesting paper by several people at Google, including Sergey Brin (co-founder). It describes research work to match text news articles to TV news reports. Clever idea.

It uses the text in the closed captions and tries to find news articles on the same event or topic. It's not a recommendation system -- they're finding other writeups of the same story, not different but relevant stories -- but it's still worth a peek.
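To give a feel for the matching problem, here is a simple Python sketch that picks the news article most similar to a snippet of closed-caption text using TF-IDF weighted cosine similarity. This is a generic bag-of-words technique, not the method described in the Google paper, and the example texts are made up.

```python
import math
from collections import Counter

PUNCTUATION = ".,!?\"'"

def tokenize(text: str) -> list[str]:
    """Lowercase words with surrounding punctuation stripped."""
    tokens = (w.strip(PUNCTUATION).lower() for w in text.split())
    return [t for t in tokens if t]

def tfidf_vectors(docs: list[str]) -> list[dict[str, float]]:
    """TF-IDF weights per document; IDF is computed over just these docs."""
    tokenized = [tokenize(d) for d in docs]
    n = len(docs)
    df = Counter(term for toks in tokenized for term in set(toks))
    return [
        {t: count * math.log(n / df[t]) for t, count in Counter(toks).items()}
        for toks in tokenized
    ]

def cosine(a: dict[str, float], b: dict[str, float]) -> float:
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def best_matching_article(caption: str, articles: list[str]) -> int:
    """Index of the article most similar to a closed-caption snippet."""
    vectors = tfidf_vectors([caption] + articles)
    caption_vec, article_vecs = vectors[0], vectors[1:]
    scores = [cosine(caption_vec, v) for v in article_vecs]
    return max(range(len(articles)), key=lambda i: scores[i])

caption = "the hurricane made landfall in florida tonight forcing evacuations"
articles = [
    "Senate passes budget bill after long debate",
    "Hurricane makes landfall in Florida, thousands evacuate",
    "Local team wins championship game",
]
print(best_matching_article(caption, articles))  # prints 1
```

Real captions are noisy and unpunctuated, which is part of what makes the problem in the paper hard; this toy version only shows the basic idea of matching on shared, distinctive words.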

On a related note, according to Sergey Brin's amusingly outdated resume, he once prototyped a movie recommendation system. Small world.

Update: Nathan Weinberg calls this "GoogleTV" and discusses the implications.

Update: Gary Price lists a bunch of tools that provide keyword search over closed-caption text (although none of them seem to do it implicitly, which is the focus of the Google paper). And Danny Sullivan talked about blending search and TV a couple weeks ago.

Topix and Citysearch

Rich Skrenta announces a partnership between Topix.net and Citysearch.

Topix.net is a news aggregator with a deep browse hierarchy. For example, you can find news on asthma, not just health news, or news on nanotechnology, not just tech/science news. They have local news sections even for very small cities. And they have RSS feeds for everything. It's worth checking out.

There's a good write up on SiliconBeat about the deal and Topix.net.

Gary Price points out that Topix.net has been on a deal-making spree lately, announcing deals with Yahoo, Info.com, and Ask Jeeves in addition to Citysearch. Citysearch has also been making deals lately. They just announced they're powering part of local search for Ask Jeeves.

As Google, MSN, and Yahoo increasingly move into local search (including providing merchant reviews), there's going to be a lot of competition for smaller local search players. But there's also a lot of opportunity.

Update: Topix.net makes yet another deal, this one with Dogpile, a metasearch engine.

Wednesday, November 03, 2004

A proper burial for old code

A little humor on this day, perhaps, in what the programmers at LexisNexis do with old code:
    Among the tiny graves on Blocker Hill, the wind echoes with the tortured cries of computer programmers. Beneath the eight grave markers, and perhaps in a rumored unmarked grave nearby, lie reams of paper printouts of code for software that has left this mortal operating system.

    The cemetery is a quirky tradition among the programmers at LexisNexis ... Rather than simply delete programs that are retired or replaced, they print them out for a proper send-off — not always with fond regards.

    Workers had to "drive a stake" through the heart of a poorly performing program named CCI, which received an ignominious burial beneath an emblem of a pig.
[AP via Niall Kennedy]

Tuesday, November 02, 2004

Findory's search history

One of the cool features in our recent launch is search history. It's true that others have search history -- A9, Ask Jeeves, and My Yahoo Search -- but Findory has an interesting take on it.

First, we make it easier. No registration required. Just use Findory and we'll keep track of your searches for you. It's that easy. As always, you're anonymous on Findory.

Second, we do more. It's not just web searches in your search history. Findory keeps search history for news, blog, and web searches. It's all right there for you.
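A minimal Python sketch of registration-free search history, assuming the visitor is identified by a random ID kept in a browser cookie. This is an illustration, not Findory's actual implementation; the class, the "web"/"news"/"blog" kinds, and the in-memory storage (a real site would use a database) are all hypothetical.

```python
import uuid
from collections import defaultdict, deque
from typing import Optional

SEARCH_KINDS = {"web", "news", "blog"}

class SearchHistory:
    """Per-visitor search history keyed by an anonymous ID (for example, a
    random value stored in a browser cookie). In-memory only."""

    def __init__(self, max_entries: int = 50):
        # Each visitor keeps only their most recent max_entries searches.
        self._history = defaultdict(lambda: deque(maxlen=max_entries))

    @staticmethod
    def new_visitor_id() -> str:
        return uuid.uuid4().hex

    def record(self, visitor_id: str, kind: str, query: str) -> None:
        if kind not in SEARCH_KINDS:
            raise ValueError(f"unknown search kind: {kind}")
        self._history[visitor_id].append((kind, query))

    def recent(self, visitor_id: str, kind: Optional[str] = None) -> list[tuple[str, str]]:
        """Most recent searches first, optionally limited to one kind."""
        entries = reversed(self._history.get(visitor_id, ()))
        return [e for e in entries if kind is None or e[0] == kind]

history = SearchHistory(max_entries=3)
visitor = SearchHistory.new_visitor_id()
history.record(visitor, "web", "findory")
history.record(visitor, "news", "election results")
history.record(visitor, "blog", "personalized search")
history.record(visitor, "web", "tivo")  # pushes out the oldest entry
print(history.recent(visitor))
# prints [('web', 'tivo'), ('blog', 'personalized search'), ('news', 'election results')]
```

Because the key is just a random ID, nothing here needs a name, e-mail address, or login, which is what keeps the visitor anonymous.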

I've talked before about how search history is not personalized search, but it is a step toward personalized search. Personalized search is the future. Findory is going to be part of it.

AOL and personalized search

AOL expects to launch personalized web search in the next few months.

Not clear if they're just talking about matching A9, Ask, and My Yahoo Search by offering search history features or if they're talking about something more.

Monday, November 01, 2004

A new Findory

Findory.com launched a major redesign this Halloween weekend, all without any scares.

Using Findory is as easy as ever -- just read articles and Findory builds a newspaper just for you -- and our new features make it even easier to find the information you want. Some highlights of what's new:
  • Home page: A new home page, built just for you, with everything in one place.
  • News and blogs: We merged Findory News and Blogory into one site. The home page has all your top stories, both from the mainstream media and from independent weblog authors, right there for you to read. All part of helping you find the news you'd otherwise miss.
  • Web search: Web search right on your Findory home page. Quick and easy, like it should be.
  • Search history: We remember your previous web, news, or blog searches and make it easy for you to go back and find something again.
  • More feeds: We expanded our selection of RSS feeds. Now there are even more ways to read Findory.
  • More by e-mail: Just like with news, you can now get your personalized top blog articles by e-mail every morning. Delivered right to you just like your morning paper.
In addition, our entire infrastructure changed. New webservers, new database servers, new databases, new build boxes, new development boxes, and a new HTML delivery engine. It's all shiny and new. And, as it should be, no one ever saw the mess.

We're proud of the new Findory. Try it out. Read a few articles. Play with it a bit. I think you'll be surprised to see how much interesting news you've been missing every day.

May the best services win

In "Ubiquity in the Internet Age", Jeremy Zawodny argues that ease of distribution and free flow of information on the internet leads to three keys for success:
  • A great product
  • Web services for the community to build on that product
  • Revenue sharing
The idea seems to be to build grassroots distribution and viral marketing through revenue sharing and spur innovation through web services.

He holds up Amazon, Google, and eBay as positive examples of companies using this strategy. Amazon certainly has benefited from its affiliates program (revenue sharing) and has extensive web services, many of which are publicly exposed. Most of Google's advertising growth does come from AdSense (revenue sharing), but Google's web services are limited and probably have no substantial impact on their business at this point. eBay's seller tools do have a huge impact on their business, but I think it's a bit of a stretch to call those web services, especially since eBay is well known for prohibiting outside access to its searches and catalog.

But Jeremy's article is more about the future than the past. I think he's right that web services and revenue sharing can create rapid growth and innovation. It'd be great to see more web services and revenue sharing from Yahoo.

Tell your readers to go away

Robert Scoble posts about the value of directing traffic off of your website:
    Yahoo. How did they start? Two kids in college telling their readers to go away and check out some other site. Craig's List? He took the Yahoo concept further. His list sent readers away to check out jobs, housing, and other stuff.

    And, then, there's the now famous Google. They couldn't find enough ways to send their readers away, so they started selling advertising to companies and people who'd pay to have Google's readers come to them.

    It's the new marketing ... Instead of being desperate and saying "look at me look at me" you tell your readers to get lost. Go someplace else.

    The evidence is clear. Want some attention? Tell your readers to go away.
Although Scoble mentions Yahoo as a positive example, nowadays Yahoo tries to keep people on the Yahoo site. Larry Page (co-founder of Google) had a great quote on this in an interview with Playboy a while back:
    PLAYBOY: With the addition of e-mail, Froogle -- your new shopping site -- and Google news, plus your search engine, will Google become a portal similar to Yahoo, AOL or MSN? Many Internet companies were founded as portals. It was assumed that the more services you provided, the longer people would stay on your website and the more revenue you could generate from advertising and pay services.

    PAGE: We built a business on the opposite message. We want you to come to Google and quickly find what you want. Then we're happy to send you to the other sites. In fact, that's the point. The portal strategy tries to own all of the information.
Findory is built around a similar philosophy. You come to Findory to find what you want, then we're happy to send you to other sites. We help you find information. We don't try to control information.