Geeking with Greg: 05/01/2005

Monday, May 30, 2005

Personalized advertising on Findory

Findory launched our personalized advertising engine today.

This early version is built on top of Google AdSense, but these are not normal AdSense ads. They are not targeted merely to the content of the page, but to the individual behavior of each reader.

For example, my personalized Findory front page right now is showing me ads for load balancers and networking equipment. If I load the home page again, it then shows me ads for help with filing patent applications. These ads are unusually relevant and useful for AdSense ads, a good match to my specific interests in business and technology.

Other examples across Findory are also interesting. The top ad when I looked at Findory's page for the popular weblog InsideGoogle was an ad for Google Enterprise Solutions. The ads for a news search on "Star Wars" were appropriately related to Star Wars. The advertising on our science news page included products for science teachers.

Just as Findory's personalization engine matches content to interested audiences, our personalized advertising matches advertisements to interested people. After all, at its best, advertising is a form of content. It is useful when it is relevant. When it is not relevant, it is annoying. We firmly believe advertising should be useful, not annoying.

At Findory, we launch early and often. Our advertising is no different. This release is an early, first step for our personalized advertising engine. As we learn more, we will refine our algorithms, enhance the personalization, and improve the relevance. We will make advertisements helpful and useful to our readers.

Bloglines vs. My Yahoo

SearchViews has a brief interview with Jim Lanzone (SVP of Search at Ask Jeeves). Jim talks about how Bloglines is going to try to grab market share from My Yahoo:

We envision Bloglines as the homepage of the 21st century: 'the Universal Inbox'. With recent rollouts like Weather and Package Tracking, you see it beginning to move beyond news and blogs. We're going to follow this to its logical conclusion. Watch out, My Yahoo!

Bloglines CEO Mark Fletcher also has said some interesting things about My Yahoo. See my earlier post, "Bloglines, My Yahoogle, and information overload".

[via Danny Sullivan]

Friday, May 27, 2005

Yahoo MindSet from Yahoo Research

Bernard Mangold has the announcement on the Yahoo Search blog about Yahoo MindSet, a prototype that reorders search results "according to whether they are more commercial or more informational" as you move a little slider around.

It's pretty similar to but more limited than Google Labs Personalized Search, which allows searchers to check boxes for their high level subject interests and then gives you a slider to reorder search results.

MSN Search also has some similar sliders (though they don't reorder dynamically) on their main search site inside their Search Builder advanced interface.

I'm surprised to see this focus on sliders. They aren't particularly useful. They fail the grandma test. Most novice users will not use or understand the slider; they just want the top result to be useful. Sliders aren't even that useful to power users, since they fail to provide the level of granularity power users need.

More knobs, more buttons, more complexity. With less than 1% of users even bothering with the existing advanced search options on search engines, are more controls really what people need? Most searchers just want the right thing to happen. They want it to just work.

See also my earlier posts, "Personalized search at PC Forum" and "Peeking at the future of MSN Search".

Thursday, May 26, 2005

Ask Jeeves Web Answers

Ask Jeeves just launched Web Answers, a service similar to Google Q&A that tries to isolate the direct answer to a query from the other text in the page. For example, the first search result on Ask for "Who is the CEO of Amazon.com?" is a Web Answer.

Like Google Q&A, coverage is poor, but I would expect this to improve with time.

I find these attempts to extract answers from web pages fascinating. It's an early step on the long road toward understanding the vastness of knowledge stored in the Web.

[via Chris Sherman & Gary Price, John Battelle, and Nathan Weinberg]

Tuesday, May 24, 2005

Eric Schmidt talk at UW

Google CEO Eric Schmidt is giving a talk titled "Perspectives on the Information Industry" this Thursday (May 26) at University of Washington.

If you're not lucky enough to be living in Seattle, it will be broadcast live on the internet.

Update: The talk was similar to many of Eric's past talks. Same slides, even the same jokes. The most interesting part was the Q&A.

Eric answered several questions about Google's 20% time and what they do to promote innovation. He talked about small teams and the value of controlled chaos. The comments were similar to the quote I have in my earlier post, "Making innovation run rampant".

He received several questions about A9. Eric responded by praising A9's effort and saying that Google and Amazon cooperated in many more ways than they compete.

In response to my question about personalized search, Eric said that Google definitely intends to show different people different search results using information about their behavior, but also said that it would be optional and respectful of privacy. That's as clear a statement as I've ever heard from Google that they will pursue personalized search.

Update: MP3 audio of the talk is already available. ~~On-demand video usually takes a few days.~~

Update: On-demand video of the talk is now available.

Monday, May 23, 2005

Bloglines, My Yahoogle, and information overload

Bloglines CEO Mark Fletcher comments on Google's launch of a customizable version of its home page:

The surprising thing ... is that they're trying to copy My Yahoo.

As many people have found out, the My Yahoo metaphor of a customizable page displaying static information doesn't scale ... With millions of blogs and other sites of interest, you need a different interface paradigm to deal with all that information.

Each time you visit your My Yahoogle page, it takes time to scan the page to see if there's new information. This is a complete waste. If you only show new things, the amount of information that needs to be displayed decreases greatly. There's less information, and it's all new.

Even better, Mark, is if there's less information, it's all new, and it's all relevant.

Most people don't have the time to read 200+ feeds every day. At some point, you're going to have to give up on ordering the articles by date or feed and start ordering by relevance.

Doing better than Google

In the New York Times today, Bob Tedeschi quotes Google Director Marissa Mayer as saying:

"Thinking long term, my gut sense is that, yes, there will be a search engine that knows more about me and as a result does a better job than Google does today. It's my hope that that search engine is us, but it's a further-reaching thing."

Personalized search is the future. What is and is not relevant varies from person to person. At some point, to get further improvements in relevance rank, search engines will have to learn each searcher's needs and show different results to different people.

Findory has taken some first steps ([1] [2]) toward personalized search. Findory is alone in the field. No other commercial search engine learns each searcher's interests and changes the order of search results depending on who you are and what you have done.

If you want to see the future, take a look at what the innovative little startups are doing.

Thursday, May 19, 2005

Google Factory Tour slides

Google has a long webcast of their "Factory Tour" meeting today.

If you're short of time like me, you might just skim the slides. Like I did for the recent shareholders meeting, I put the slides all together for quick consumption here.

It's a long presentation, 165 slides. Whew. I like slide #131, the ultimate search engine.

Starting on slide 156, it appears Marissa Mayer will be announcing a new product in Google Labs, a customizable version of the Google.com home page.

The product doesn't appear to have launched yet, though I assume it will later this afternoon. It appears to be similar to My Yahoo in that you can stick little widgets on your Google home page for news and other Google content.

Update: The customizable Google home page has launched at http://www.google.com/ig. Pretty simple. A few checkboxes for plopping a small selection of content like headlines from BBC and Google News underneath the Google search box. Useful, but limited. I assume they will be expanding this over time.

Update: John Battelle gives a little more context and says, "This is an all out response to the success of Yahoo and others in the personalized/RSS space." I see it more as Google dipping a toe in the water, but it's true that My Yahoo and others should be concerned and looking for new ways to keep ahead of Google.

Making innovation run rampant

An InfoWorld article quotes Google CEO Eric Schmidt on how to encourage innovation:

"We prefer [our engineers] to run rampant," said Schmidt. "The most clever ideas don't come from the leaders, but rather from the leaders listening and encouraging and kind of creating a discussion."

Focusing resources on activities that are not directly related to the company's core business will ultimately lead to new discoveries, Schmidt said.

"You want to see every conceivable demo, no matter how wacky it is. People love that. ... They get a chance to present to someone important like yourselves. All of a sudden the whole (corporate culture) becomes about leadership and innovation."

The people closest to the problem are in the best position to innovate. Let them go. Help them create. And watch as they amaze you.

Microsoft takes on the information glut

Jay Greene at BusinessWeek reports that Microsoft Chairman Bill Gates is about to give a speech where he "will talk about how businesses can help workers wade through the information glut" and "will unveil a whole new vision for Microsoft about the way people use information."

Workers are increasingly deluged with e-mail and instant messages. They can troll through scads of information on the Web and in corporate databases. But finding just what they need when they need it is tough.

"The software challenges that lie ahead are less about getting access to the information people need, and more about making sense of the information they have," Gates writes.

We are overwhelmed by all the information coming at us in our daily lives. We need something that makes sense of the chaos, that orders and filters the information streams.

Personalization must be part of this vision. Search can help you find things when you already know what you want; personalization helps surface useful information when you don't already know what you want.

Personalization offers a way to find focus. It learns what you like, shows you what you want to see, and filters out the rest. It extracts knowledge from the information chaos and helps you get the information you need.

[Found on Findory]

Wednesday, May 18, 2005

Findory's Mac OS X Dashboard widget

Findory just launched a cute little Dashboard widget for Mac OS X Tiger 10.4. Your personalized news and blogs, right on your Mac desktop.

Tuesday, May 17, 2005

AdSense for RSS and Atom feeds

Shuman Ghosemajumder announces on the Google Blog that AdSense now supports advertising in RSS and Atom feeds.

The idea is simple: advertisers have their ads placed in the most appropriate feed articles; publishers are paid for their original content; readers see relevant advertising - and in the long run, more quality feeds to choose from.

The key here is relevance. Advertising needs to be useful, interesting, and relevant to the reader. It's a challenging problem to match advertising to interested audiences using only the content of the feeds.

As I've said before, advertising in feeds isn't an obvious win. Excerpt-only feeds are already advertisements, advertisements of the full content of the article at the website. Layering ads on the ads will drive traffic away.

Publishers that do include the full text of their articles in their feeds may find the feeds become the primary channel to some of their readers. In that case, advertising in that channel makes more sense.

NewsGator eats FeedDemon

Nick Bradbury, creator of RSS reader FeedDemon, has announced that his one-person company has been acquired by NewsGator.

FeedDemon is a fairly popular feed reader with a good number of subscribers, so an acquisition to try to move those subscribers over to NewsGator might make sense.

For the remaining hundreds of small, undifferentiated feed readers, I suspect we'll mostly see them stall or fade away in the next year or two as Yahoo, MSN, IAC/Ask/Bloglines, and other giants gobble up the market. Simple products that merely reformat feeds for display are sitting in a saturated market.

But so much more can be done with RSS and Atom feeds. Feeds are data. They can be sliced and diced, reorganized, analyzed, and filtered. Sorting articles by feed or by date is trivial. Sorting articles by relevance is the future.
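To make the contrast with date ordering concrete, here is a rough sketch of sorting feed articles by relevance. Everything in it is an assumption for illustration: the `interest_profile` and `sort_by_relevance` names, the article dictionaries with `title` and `published` keys, and the simple word-overlap score. It is not how Findory or any other product actually ranks articles.

```python
from collections import Counter


def interest_profile(read_titles):
    """Count the words in titles the reader has already clicked on (hypothetical signal)."""
    profile = Counter()
    for title in read_titles:
        profile.update(title.lower().split())
    return profile


def sort_by_relevance(articles, profile):
    """Order articles by word overlap with the profile, newest first among ties."""
    def score(article):
        words = set(article["title"].lower().split())
        # Counter returns 0 for words the reader has never seen
        return sum(profile[word] for word in words)

    return sorted(
        articles,
        key=lambda article: (score(article), article["published"]),
        reverse=True,
    )


profile = interest_profile([
    "Google launches personalized search",
    "Personalized news at Findory",
])
articles = [
    {"title": "Local sports roundup", "published": "2005-05-17"},
    {"title": "Personalized search experiments", "published": "2005-05-10"},
]
ranked = sort_by_relevance(articles, profile)
```

Sorted by date, the sports roundup would come first. Sorted by this toy relevance score, the older article that matches the reader's history moves to the top.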

Every day, people face an overwhelming flood of information. Products that help users find focus and discover the information they need are entering a wide open field with massive consumer demand. The opportunity is huge for those willing to take it.

See also Chris Pirillo's interview with Nick and NewsGator CTO Greg Reinacker that includes some discussion of competition with Bloglines. Tony Gentile has some useful comments on the acquisition. Om Malik deserves credit for breaking the news.

Friday, May 13, 2005

Google shareholders meeting

Nathan Weinberg posts some comments on the recent Google shareholders meeting and links to the webcast.

If, like me, you don't really want to plow through the entire webcast, you might just take a peek at the slides from Eric Schmidt's talk. I threw a quick page together with all the slides for easy consumption.

The talk looks pretty similar to the analysts day talk a few months ago. New tidbits included elaborating on serving the long tail, mention of integration of Keyhole's nifty satellite data into Google Maps, and showing off Google Search History.

Dogpile's Missing Pieces

The metasearch engine Dogpile released a snazzy little Flash app they call "Missing Pieces" that lets you query Google, Ask Jeeves, and Yahoo simultaneously and see the overlap between results. Well designed, fun little app.

Chris Sherman has a detailed review of Dogpile's recent redesign and new tools.

Thursday, May 12, 2005

Millions of feeds

Alex has the scoop on Findory's millions of feeds, all available by RSS or displayed in our nifty inline widget.

Get a feed for Tech news, a personalized selection of articles from political blogs, even all news or weblog articles that match specific keywords like "Microsoft", "Seattle Sonics", "San Diego", or "Star Wars". There are millions of possible combinations, a feed for every taste.

And you can put news and blog articles on any topic you want right on to your website. Want to show the latest news about Linux on your blog? Go to our inline page, type "Linux" into the search box, and copy the Javascript code it gives you into your blog page.

Just like our RSS feeds, inline works for millions of possible categories and keyword searches. You can even show your weblog readers a view of your personalized Findory page by putting up your personalized version of inline. If you need to match the look and feel of your website, just click "Show optional style code" and customize inline as much as you want.

There are three examples -- related blog posts to Geeking with Greg, a snippet of the news from my personalized front page, and news for "Google" -- in the right side column on my weblog. They are customized to match my weblog style. They update as new news comes in. The personalized version ("My Findory News") even changes in real time as I read new articles on Findory. Very cool.

Wednesday, May 11, 2005

Googleball

Google acquires Dodgeball, a two-person mobile social networking startup.

More details from Michael Bazeley and Gary Price.

Update: About two years later, both the founder and the second employee of Dodgeball leave Google, saying:

It's no real secret that Google wasn't supporting dodgeball the way we expected. The whole experience was incredibly frustrating for us.

Perhaps Google is not as proficient at handling these small acquisitions as they claim to be.

Tuesday, May 10, 2005

Answers.com about Shopping.com

Gary Price posts about Answers.com's deal to integrate product information from Shopping.com.

Answers.com is a metasearch engine that hits specialized databases to try to directly answer your query rather than returning a list of web pages that might contain the answer to your query. More information in my earlier post, "Answers.com launches".

Google.com recently switched their word definitions links from Dictionary.com to Answers.com. I'm curious if this new deal could put some new strain on that relationship. After all, Shopping.com and Google's Froogle are similar metashopping search engines.

As the overlap increases, helping GuruNet's Answers.com may start to be less and less in Google's interests. Google's word definitions could just as easily be provided by Wikipedia directly or by a homegrown combination of dictionary and Wikipedia content.

Saturday, May 07, 2005

Profiting from the long tail

In "Profiting from Obscurity", The Economist talks about how companies profit from helping customers find products in the long tail. Some excerpts:

[The long tail] is a shift from mass markets to niche markets, as electronic commerce aggregates and makes profitable what were previously unprofitable transactions.

How can people find content they want when it is buried far down the tail? Already, a number of mechanisms have emerged, based around user recommendations. Perhaps the best known is "collaborative filtering", in which purchase histories are analysed to work out what else is likely to interest the buyer of a particular product ("Customers who bought this item also bought...", as Amazon puts it). This approach allows users to navigate from hits that they know they like to more obscure titles further down the tail.

Many successful online businesses, such as Amazon, Rhapsody or the iTunes Music Store, already exploit the effects of the long tail. So too do other internet companies, such as Google (which makes money not just by selling adverts to big firms, but also by placing obscure adverts alongside obscure web pages) and eBay (which aggregates low levels of demand for obscure products to make a huge business).

Sadly, no mention of Findory and its unique ability to surface articles from the long tail of news sources and weblogs.
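The "Customers who bought this item also bought..." idea from the excerpt can be sketched in a few lines. This is a minimal co-occurrence count under my own assumptions, not Amazon's actual algorithm; the `also_bought` and `recommend` names and the sample purchase histories are made up for illustration.

```python
from collections import Counter, defaultdict
from itertools import permutations


def also_bought(purchase_histories):
    """For each item, count how often every other item shares a purchase history with it."""
    co_counts = defaultdict(Counter)
    for history in purchase_histories:
        # set() so an item bought twice by one customer is counted once
        for item, other in permutations(set(history), 2):
            co_counts[item][other] += 1
    return co_counts


def recommend(co_counts, item, top_n=3):
    """Return the items most often bought alongside the given item."""
    counts = co_counts.get(item, Counter())
    return [other for other, _ in counts.most_common(top_n)]


histories = [
    ["hit album", "obscure album"],
    ["hit album", "obscure album", "rare import"],
    ["hit album", "bestselling novel"],
]
co_counts = also_bought(histories)
suggestions = recommend(co_counts, "hit album", top_n=1)
```

Starting from the hit, the top suggestion is the obscure title that most often sits beside it in purchase histories, which is the navigation down the tail that the article describes.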

Friday, May 06, 2005

Controversy about Google Web Accelerator

Many have been talking about Google Labs' latest release, Google Web Accelerator. It promises to reduce wait times when web browsing mostly by a combination of caching and pre-fetching.

Unfortunately, it appears to have some issues. First, as Nathan Weinberg writes, it appears that Google Web Accelerator is sometimes serving up the wrong cached page, showing you a page with someone other than you logged into the site, for example.

Second, as Fons Tuinstra reports, Google Web Accelerator effectively acts like a proxy and, among other things, allows users to bypass China's firewall. While I'm no supporter of that firewall, I suspect China will not be happy about this development. The response is unlikely to be favorable to Google.

Finally, many have pointed out the privacy implications of Google knowing about most pages each user has ever visited anywhere on the web and the full contents of those pages. Google may have earned a lot of trust, but this is a big step, and one that is likely to cause some concern.

But keep the big picture in mind here. Google is providing a free product that can save people time if they choose to use it. On top of that, properly anonymized information about what pages people are visiting could be used by something like TrustRank to help reduce web spam and improve the relevance rank of search results. In the end, Google is helping people find the information they need more quickly and efficiently.

Update: Hmm... I have to say I'm feeling a lot less charitable toward Google Web Accelerator after seeing Jason Fried's post about how Google Web Accelerator can delete data and have other undesirable behaviors. It looks like a lot of sites, including Findory, are going to have to go through the effort of explicitly disabling Google's prefetch. Ugh, what a mess.

Tuesday, May 03, 2005

Seattle Times on small search companies

Findory was mentioned in a Seattle Times article covering search companies in the Seattle area. There's a brief discussion of the rapid growth of our tiny personalization startup. And an amusing picture of me that I think truly captures the inner geek.

UW CS Professor Oren Etzioni has a great quote in the article: "Could we build a search engine that learns from the Web? ... Imagine a program that is running and constantly learning from the Web and getting smarter over time."

It's an excellent vision, the dream of many AI researchers, understanding the vastness of knowledge stored in the Web.

Monday, May 02, 2005

Web spam and TrustRank

I finally managed to get a good look at the "Combating Web Spam with TrustRank" paper by Gyongyi et al. this weekend.

TrustRank takes a manually designated set of good or bad pages and propagates that information across the link graph. It's an interesting modification to PageRank. Definitely worth a read.
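As a rough illustration of the propagation idea, here is a small sketch written as a PageRank variant whose random jump always lands on a seed page. It is a simplification under my own assumptions, not the paper's exact procedure: the `trust_rank` name, the damping value, the fixed iteration count, the trusted-only seed set, and the toy link graph are all made up for the example.

```python
def trust_rank(links, trusted_seeds, damping=0.85, iterations=50):
    """Propagate trust from seed pages along links.

    links maps each page to the list of pages it links to.
    """
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    pages = sorted(pages)

    seeds = [page for page in pages if page in trusted_seeds]
    if not seeds:
        raise ValueError("need at least one trusted seed page in the graph")

    # Unlike plain PageRank, the random jump goes only to seed pages
    seed_share = {page: (1.0 / len(seeds) if page in trusted_seeds else 0.0)
                  for page in pages}
    trust = dict(seed_share)

    for _ in range(iterations):
        new_trust = {page: (1.0 - damping) * seed_share[page] for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if not targets:
                continue  # dangling page: its trust is not passed on in this sketch
            share = damping * trust[page] / len(targets)
            for target in targets:
                new_trust[target] += share
        trust = new_trust
    return trust


links = {
    "good": ["a"],
    "a": ["b"],
    "spam1": ["spam2"],
    "spam2": ["spam1"],
}
scores = trust_rank(links, trusted_seeds={"good"})
```

In this toy graph, trust decays with distance from the seed, and the two spam pages that only link to each other never receive any, no matter how many links they trade.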

The paper describes a manual process for determining the seed set of trusted sites. I'm curious what we'd find by instead analyzing user behavior. For example, we could consider websites used over the past month by trusted people to be trusted. That is, trusted sites would be the sites the community uses and trusts.

Noisier data, to be sure, but there's a sea of data here, enough that we should be able to be robust to the noise and discover the wisdom hidden within. Ah... So many interesting possibilities with this kind of juicy data.

By the way, if you're interested in web spam, don't miss "Web Spam Taxonomy", also by Zoltan Gyongyi and Hector Garcia-Molina. It's a light paper that describes many of the devious techniques used by web spammers.

Update: There appears to be a recent March 2006 technical report on this TrustRank work, "Link Spam Detection Based on Mass Estimation" (PDF).

Seattle Times on the search war

In "Microsoft Learns to Crawl", Kim Peterson at the Seattle Times gives an inside view of MSN Search's efforts in the search war. Some excerpts:

The search battle is bigger than beating Google. Microsoft is carving its path in the next generation of computing -- one in which search becomes a platform, not a feature ....

Some ... are eyeing perhaps the biggest weapon in Microsoft's arsenal: the operating system. The company is building search into its upcoming operating system, code-named Longhorn and expected next year, and likely will give it a prominent spot in front of the user.

Kim also wrote a good article on Sunday about the past and future of search.

See also Fred Vogelstein's excellent Fortune article on MSN and the search war.