Thursday, September 30, 2004

Clusty from Vivisimo

Vivisimo launches Clusty, an expanded and enhanced version of their clustering search engine that includes web, news, image, shopping, blog, and other types of search.

The quality of the clustering seems quite high. I found a shopping search for "turtle beach" created appropriate clusters for sound cards and even subclusters for 5.1 and 7.1 cards. A vanity search does a fairly good job of separating my work from that of others with the same name and clustering various parts of my career. Although the coverage and quality of the default search results seem lower than Google's to me, the clustering makes it easy to refine searches quickly.

Vivisimo's clustering technology was already impressive. This new offering seems quite compelling, a strong alternative to Google.

Chris Sherman and Gary Price have an excellent write-up on Clusty and its features over at Search Engine Watch. The New York Times has an article with a good high level summary. Steve Rubel points out that Clusty includes a blog search; it's disabled by default, but you can turn it on at the customize page.

Wednesday, September 29, 2004

Microsoft and the search war

Angus Kidman summarizes Microsoft's new high priority focus on search. Some excerpts:
    As any user of the existing Windows "File find" option can attest, Microsoft's current search technology isn’t exactly an advertisement for speed or efficiency. CEO Steve Ballmer admitted as much last March ... "People say that Microsoft does it all, but this is a case where we didn't do it all."

    The principle cause for Ballmer's chagrin is Google, which has become virtually synonymous with the notion of effective Internet searching ... Microsoft's plan is to change that, and the company is talking up its new approach to search at every possible opportunity.

    Microsoft has long promoted the slogan "Windows everywhere", but now it's search that has the Redmond developer team huddled over their PCs. With data volumes online and on hard drives continuing to expand, the ability to quickly find relevant information seems rather more important than adding a flashy interface to the next operating system iteration.

    "Search is a very pervasive thing," chief software architect Bill Gates remarked during a recent visit to Australia. "You want to search the Web, you want to search your corporate network, you want to search your local machine, and sometimes you want search to work against multiples of those things."
This last quote is important. It signals the importance of desktop search to Microsoft, not just as a separate application, but also as a means of beating Google on web search.

Google News, publishers, and fair use

Adam Penenberg suggests Google News is still in beta and runs no advertising because of concern over whether the news excerpts violate fair use.

Danny Sullivan at Search Engine Watch has one of the better responses, pointing out that the same fair use issues apply to web search. Moreover, publishers realize that being listed in web search engines increases their traffic, and publishers actively seek to be listed in news aggregators for the same reason.

Adam's argument does seem quite weak. For those publishers who don't want the traffic Google News sends them, the answer is simple: remove yourself from the index. It's their loss: lost traffic, lost market share, and lost advertising revenue.

Most publishers understand the value of being listed in a news aggregator. Most publishers actively seek to be listed and even optimize their site for news crawlers. The irrational minority of publishers that ask to be removed won't significantly impact the coverage or quality of the Google News product.

Google Labs Aptitude Test

Many are pointing at Adam Rifkin's excerpt of the Google Labs Aptitude Test from Dr. Dobb's Journal.

It's hilarious, thoroughly geeky, and well worth the read.

Tuesday, September 28, 2004

Finding the right content

Alex said this to me earlier today:
    What we're doing is unique because it actually tackles the problem of selecting interesting content and ranking/organizing the content sources. The word "find" is distinct from "browse" (My Yahoo) and "search" (Technorati).
Couldn't agree more.

Overstock Auctions

Overstock, the successful e-commerce store that sells mostly liquidation merchandise, announced their entry into auctions. They appear to be trying to undermine eBay by offering lower fees to sellers and lower prices to buyers.

The idea of using liquidated merchandise as a way to bootstrap into eBay's market sounds familiar.

Amazon products in your web feeds

FeedBurner announces a tool to put ads for products into your RSS feeds tagged with your associate id (so you get the payment for the referral). Sounds similar to the Blender prototype, the interesting but half-baked feed advertising solution from Think Tank 23.

FeedBurner says that Amazon is doing the actual matching of blog content with Amazon products for their version, so I'd assume that the quality of the matches would be relatively high. It's a challenging problem to recommend products given arbitrary free text weblog posts, but, if anyone is up to it, Amazon is.

[via RSS Weblog]

Bloglines new web services

Bloglines just launched web services, including the ability to access web feeds cached at Bloglines instead of from the source, a cool trick which normalizes the data formats and eases the load on the source servers.

Mark Fletcher talks about some of what can be done with the nifty new service and, in a press release, announced deals with FeedDemon, NetNewsWire, and Blogbot (no, not MSN Blogbot).

Geodog praises the innovation coming out of Mark Fletcher's world, and I have to agree. This is a great new feature.

Update: Marc Hedlund has an excellent tutorial over at the O'Reilly Network on using Bloglines web services.

Monday, September 27, 2004

All new My Yahoo

My Yahoo is doing a live beta of their redesign. As a long time fan and user of My Yahoo, I was eager to check it out.

Other than UI twiddles like different fonts, the main changes look to be integration of RSS with My Yahoo (providing a lot more content) and customization on steroids.

It's pretty interesting that Yahoo is trying to turn My Yahoo into an RSS reader but, unfortunately, it isn't a very good RSS reader, at least not compared to Bloglines. I have both set up, but I find My Yahoo to be so slow to update and so cumbersome to use that I rarely bother with it. I'm not sure turning My Yahoo into an RSS reader is a good idea, at least not in the current state of the product.

Even worse, I'm really not sure more content and more customization are what My Yahoo needs. For example, after upgrading to the beta UI, try clicking on the "Add Content" buttons at the bottom of the page. You're given a directory of content with what appears to be thousands of largely undifferentiated widgets to add, most of them RSS feeds. Browse, search, even try looking at "popular" content or "editor's picks", and you'll get hundreds of items to review. Completely overwhelming.

Given that My Yahoo's biggest hurdle is that most people don't bother to customize the page, I don't see how an overwhelming and confusing blast of content helps matters. The site should focus on helping me find what I want. Recommend content. Filter content. Show me a few good choices. Help me.

See also Cory Kleinschmidt's review of the My Yahoo beta on Traffick.

See also my earlier posts about customization vs. personalization, suggestions for a Yahoo home page redesign, and a review of Bloglines.

Update: Jeremy Zawodny claims the new My Yahoo "brings RSS to the masses", asserts it's super easy to use, and argues comparisons with Bloglines aren't valid. Just about the opposite of everything I just said. Odd that we disagree so completely.

Feedster is expanding

Feedster is hiring and moving into new office space. These folks are doing some great work. Not surprising that they're growing.

Gmail and Google revenue growth

The NY Post discusses Gmail as an important source of revenue growth for Google. [via Search Engine Lowdown]

See also my earlier post arguing that AdSense is the key to Google's revenue growth.

Sunday, September 26, 2004

4M weblogs?

Technorati now claims to be watching over four million weblogs. That's a lot of blogging going on out there.

I wonder how many of these are real weblogs (active, not duplicates, not just search engine spam). Feedster crawls fewer than 1M. Blogstreet lists only 140k. lists 2.8M.

It seems that no one really knows the answer. There's even a blog dedicated to trying to figure out how many blogs there are out there.

In the end, it probably depends a lot on how you define an active blog. One good metric might be the number of unique weblogs that were updated at least once in the last 90 days. The NITLE Weblog Census was the closest thing I could find to this. They list 1.3M active English weblogs (and 2.2M total worldwide). Their methodology sounds pretty solid, so this might be the best estimate of the number of blogs out there.
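To make that 90-day metric concrete, here is a minimal sketch of how one might compute it from crawl data. The function name, the URL-to-timestamp dictionary, and the sample data are all hypothetical; this is not how NITLE or anyone else actually counts.

```python
from datetime import datetime, timedelta

def count_active_blogs(last_updated, now, window_days=90):
    """Count unique weblogs updated at least once in the last `window_days`.

    `last_updated` is a hypothetical mapping from a canonical blog URL to the
    datetime of its most recent post; keying by URL collapses duplicates.
    """
    cutoff = now - timedelta(days=window_days)
    return sum(1 for updated in last_updated.values() if updated >= cutoff)

# Made-up sample data, purely for illustration.
now = datetime(2004, 9, 26)
sample = {
    "http://example.com/a": datetime(2004, 9, 20),  # updated last week
    "http://example.com/b": datetime(2004, 3, 1),   # abandoned in March
    "http://example.com/c": datetime(2004, 7, 15),  # updated this summer
}
active = count_active_blogs(sample, now)
print(active)  # 2
```

The hard parts (deciding which URLs are duplicates and which are spam) happen before this step, which is probably why the published estimates differ so much.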

[Thanks, Micro Persuasion, for pointing out the Technorati stat]

Saturday, September 25, 2004

Friday, September 24, 2004

Looking for something to blog about?

So, you've started your own blog. Great. Now you just need something interesting to write about. No, not about you. Please, nobody cares about that curious thing you found stuck in your teeth this morning. People read blogs for interesting information and, hate to break it to you, but you really aren't that interesting.

So what do you write about? You need some way of discovering interesting news. Not the mainstream stuff that you see on every news site with an AP feed. You need to dig deep and find something people haven't seen.

Well, good friends, Blogory is just the ticket. Read a few weblog articles on it and it'll help you discover weblog articles you'd otherwise never have seen. Give it a whirl. Just what you need for inspiration.

We at Findory eat our own dog food. We use Findory and Blogory all day long every day. We came up with so much interesting news that we created a new blog, Findory Finds, to capture some of those goodies.

So, give Findory and Blogory a try. Discover just how much news you're missing.

Thursday, September 23, 2004

Humans vs. Robots == Yahoo vs. Google

JD Lasica writes about potential biases in Google News' automated selection of articles for their front page, apparently mostly due to small news sites trying to game the system.

But a fascinating sub-theme in the article is the differences between how Google News and Yahoo News select articles for their front pages. For Google News:
    Google News uses a mix of techniques to ensure that users are presented a diverse range of perspectives. The ranking and prominence of stories are based on several factors: How many publications are writing about a topic; how recent the articles are; the size of the story, with substantive pieces ranking higher than short items; and the frequency of the search term within the article. The computer algorithms, [Krishna Bharat, chief scientist for Google News] said, "are trying to understand how hot and how big the story is."

    Every 15 minutes a new edition of Google News is generated and the ranking changes. The formula rearranges the headline blurbs in each story cluster based on the freshness of each article and the importance of the source.
Yahoo News uses humans:
    A small editorial staff programs the Yahoo News front page as well as plucking out hidden gems that appear on other sites ... the factors include the source, the freshness of the story, and a method of determining relevance.

    "We use actual humans," [Jeff Birkeland, product manager for Yahoo News] added. "News is far too human of an endeavor to rely 100 percent on automation."
Yahoo's human-based approach is more typical. Most news sites seem wary of automation. CNet went a step further, mocking Google's approach with the tagline "The Web filtered by humans, not bots" on their Extra product.

But some things can only be done with automation. With over 4,500 sources, Google News has such a deep database of articles that it would be impossible for human editors to review even a small subset. Using human editors means many of these articles would never even have the potential to be featured; the depth of knowledge is wasted.

And there's more. What's the most relevant news of the day? It varies from person to person, depending on interests, career, and location. Until you personalize the news, you're still wasting the depth of knowledge in your database. So, you personalize. Now, instead of one front page, you're building millions of front pages, each with a different view into your hundreds of thousands of news articles. It's a task that's simply impossible to do with human editors.
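As a rough illustration of why this has to be automated, here is a toy sketch of scoring stories by the kinds of factors mentioned above (number of sources, freshness, substance) and then weighting by each reader's interests. The formula, the weights, the field names, and the sample stories are all my own assumptions; this is not Google's or Yahoo's actual algorithm.

```python
import math

def story_score(num_sources, age_hours, word_count, interest=1.0):
    """Toy score for a story; the formula and constants are assumptions."""
    coverage = math.log1p(num_sources)         # more publications -> bigger story
    freshness = 1.0 / (1.0 + age_hours / 6.0)  # newer articles rank higher
    substance = min(word_count / 800.0, 1.0)   # substantive pieces beat short items
    return interest * coverage * freshness * (0.5 + 0.5 * substance)

def front_page(stories, reader_interests, size=3):
    """Rank stories for one reader; `reader_interests` maps topic -> weight."""
    def score(story):
        interest = reader_interests.get(story["topic"], 0.1)
        return story_score(story["sources"], story["age_hours"],
                           story["words"], interest)
    ranked = sorted(stories, key=score, reverse=True)
    return [story["title"] for story in ranked[:size]]

# Made-up stories, purely for illustration.
stories = [
    {"title": "Election update", "topic": "politics",
     "sources": 120, "age_hours": 2, "words": 900},
    {"title": "New search engine", "topic": "tech",
     "sources": 15, "age_hours": 5, "words": 700},
    {"title": "Local team wins", "topic": "sports",
     "sources": 40, "age_hours": 1, "words": 300},
]
tech_reader = front_page(stories, {"tech": 1.0})
sports_reader = front_page(stories, {"sports": 1.0})
```

The same pool of stories yields a different front page for each reader. Calling `front_page` once per reader is cheap for a machine and hopeless for an editorial staff.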

In the end, it will be robots. It is inevitable.

Update: Yahoo News contacted me to clarify Jeff Birkeland's comment, saying that some of Yahoo's front page is programmed by hand, some is automated, and story rankings are mostly automated. This doesn't impact the thrust of my argument that total automation of the entire front page will be necessary to expose the full depth of news available, especially as personalized news becomes mainstream.

acquired

Looksmart acquires in a move toward personalized search. [via Searchblog and SearchEngineWatch]

Wednesday, September 22, 2004

Yahoo web services?

Jeremy Zawodny asks, "What web services do you wish Yahoo offered?"

I'd start with the basics. Web services for search and browse for the web, images, and shopping. Improve the visibility and expand the functionality of the existing news web services. Access to Yahoo Finance data (stock quotes, etc.) would be great too.

Google only provides access to web search through their API. Amazon only provides access to shopping. If Yahoo just provided a comprehensive suite of the basics -- web search, shopping, news, images, etc. -- they'd already be the strongest offering out there.

Update: Only a couple weeks after I wrote this, Amazon expanded their web services and released an Alexa API that gives access to web search, directory, and other interesting data. Yahoo still has an opportunity, but things are moving fast.

Bloglines and feed readers

I love Bloglines and use it every day. But, for a while now, I've been wondering how Bloglines will survive. There are literally hundreds of feed readers available. Though most of them are desktop applications, not web-based like Bloglines, I'd think it'd be straightforward to copy the basic Bloglines functionality of caching and displaying Atom and RSS feeds.

But, lately, it seems that Bloglines is getting more difficult to knock off. Not only is Bloglines reviewed very highly and building market share, but also it's adding some impressive functionality that increases stickiness and switching costs.

For example, Bloglines has related and recommended feeds, making it easier to discover new feeds. Bloglines clip blogs are clever. And I find Bloglines' basic keyword search to be better than Technorati and on par with Feedster. All very cool.

It'll be interesting to see how this plays out. RSS is already integrated into Firefox and probably will soon be in Safari, IE, and Mozilla. My Yahoo already has a web-based feed reader in beta; Google and MSN may follow soon. Will independent feed readers survive the entry of these giants?

Update: An SJ Mercury News article with a nice overview of Bloglines. [via]

Update: As of Jan 2005, both My Yahoo and My MSN have integrated feed readers. Here come the giants.

Bloglines related feeds

Bloglines now shows related feeds for any feed. Interesting and a fun way to discover new blogs.

On a related note, Bloglines feed recommendations, which are based on your blog subscriptions, seem to be much better lately. I was shocked to see the blogs of people I used to work with at on the top of the list of recommendations. Impressive.

Tuesday, September 21, 2004

Personalized search, today and tomorrow

USA Today summarizes the current state of search personalization, mentioning products from A9, Ask Jeeves, and Google.

Danny Sullivan, editor of Search Engine Watch, is quoted in the article as saying personalized search is inevitable.
    This is where search is eventually headed. Everything will be personalized to make you feel like you have a more personal relationship with the Web site.
Sullivan also has a good post on Search Engine Watch with tidbits on the past, present, and future of personalized search.

Evidence of a Google web browser

Jason Kottke lists the evidence that Google is developing a web browser.

Update: A BBC article on the rumors. [via Scripting News]

Monday, September 20, 2004

My oh my, MyJeeves

John Battelle summarizes the new features from, focusing on MyJeeves, their new personalized search. It appears to have many of the search history features of A9 with the annotation features of Furl, all without requiring login. This is some pretty nice work.

But it doesn't appear that MyJeeves changes the search results based on your history. So, while we've got the foundation for personalized search results, we aren't quite there yet. also announced that (congrats, Rich!) and CitySearch will be included in Ask Local.

See also coverage from Gary Price and AP.

Challenging eBay in categories

quotes a WSJ article (paid subscription required) describing how eBay faces substantial competition in specific narrow categories such as tickets, automobiles, real estate, books, music, and videos.

See also my post from earlier this year, "Kill eBay, Vol. 1."

Google and going global

Verne Kopytoff at the SF Chronicle reports on Google's international efforts:
    Google's international business accounts for nearly 30 percent of its revenue. That total is expected to increase as the company matures.

    Internal Google documents obtained by The Chronicle paint an even more detailed picture of the firm's business abroad. They include project updates, contract specifications and a company study.

    They show that Google's business, though global, is concentrated in just 10 countries. They also reflect the company's frustration over its billing system, which couldn't accept some popular payment methods from advertisers in some major countries. The system, scheduled for replacement earlier this year, was blamed for lost revenue.
[Via Search Engine Watch]

AOL's In-Store and Pinpoint Search

Search Engine Watch reviews AOL's shopping metasearch engine, a new competitor to Froogle, Yahoo Shopping, and MySimon.

Blogory in Seattle Times

Seattle Times reporter Kim Peterson chimes in on the web feeds scaling debate. After a high level introduction to RSS and some of the scaling issues with web feeds, the article includes a nice mention of Blogory:
    Seattle-based is offering what Chief Executive Greg Linden says is the next-generation RSS reader. The Web-based service, called Blogory, collects RSS feeds into a massive pool and sends its users individual snippets to read, based on the history of what they've read. Users don't have to subscribe to individual feeds.

    RSS feeds are popular with early adopters and news junkies, Linden said, but the technology may not become as widely used as some have predicted.

    "Other people don't really have the time to set up an RSS reader, hunt down these feeds and copy and paste (them) into a reader, which is exactly the problem that Findory is trying to solve," he said.
To be clear, Blogory is trying to solve two problems, ease of use and relevance. We make reading blogs trivially easy. Just read articles, we find the weblogs for you. And, we provide focus. We filter out irrelevant posts and help you find interesting articles you would otherwise miss.

See also my earlier posts ([1] [2]) on RSS scaling issues.

Saturday, September 18, 2004

Registration? For what?

John Dvorak's take on mandatory registration on online news sites:
    I have to conclude that the typical newspaper in this country does not want you going on its Web site, and deliberately creates a barrier in order to prove to the shareholders that the Web is losing them money. It's a feeble attempt to emphasize the printed version of the paper at the Web site's expense.
See also my earlier comments ([1] [2] [3]) on the folly of mandatory registration.

[Thanks, David Carlson, for pointing out the John Dvorak article]

Update: Rich Skrenta discusses the Dvorak article, focusing on the distorted incentives created by undervaluing online advertising. Well worth a read.

Friday, September 17, 2004

Orkut unusable

Google's social networking site, Orkut, has become so slow as to be unusable, as many have noticed.

Search as a dialogue

Microsoft Research's text mining group says:
    A search engine should be more helpful than merely delivering a large list of documents to a user. We are building prototypes that give users a better search experience by allowing users to more effectively navigate results and have a dialog with the search engine to find what they’re looking for.
The focus on search as a dialogue is interesting. A major flaw of current search engines is that each search is treated as independent.

For example, let's say I'm trying to find discussions of some of the topics covered at Foo Camp. I might start by searching for "foo camp". Not satisfied with those results, I might change it to "foo camp blogs". That doesn't get me what I want. I try "foo camp web feeds". And so on. I'm repeatedly refining my search query, trying to find the information I need.

But current search engines ignore this stream of related queries, this dialogue, instead treating each search as independent. There is an opportunity for techniques that focus explicitly on this kind of refinement process, using all the information to help you find what you need more efficiently and reliably. Personalized search is one of these techniques.
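A first step toward treating search as a dialogue is simply recognizing which queries belong together. Here is a minimal sketch that groups a query stream into refinement sessions by word overlap. The overlap rule and the function names are assumptions for illustration, not a description of how any search engine works.

```python
def group_into_sessions(queries, min_overlap=1):
    """Group a stream of queries into refinement sessions.

    A query joins the current session when it shares at least `min_overlap`
    words with the previous query; otherwise it starts a new session.
    """
    sessions = []
    prev_words = set()
    for query in queries:
        words = set(query.lower().split())
        if sessions and len(words & prev_words) >= min_overlap:
            sessions[-1].append(query)
        else:
            sessions.append([query])
        prev_words = words
    return sessions

def session_terms(session):
    """All distinct terms tried in a session, in first-seen order."""
    seen = []
    for query in session:
        for word in query.lower().split():
            if word not in seen:
                seen.append(word)
    return seen

stream = ["foo camp", "foo camp blogs", "foo camp web feeds", "seattle weather"]
sessions = group_into_sessions(stream)
print(sessions[0])                 # the three "foo camp" queries
print(session_terms(sessions[0]))  # ['foo', 'camp', 'blogs', 'web', 'feeds']
```

Once the engine knows the three "foo camp" queries are one conversation, it can use the whole set of terms, and the results that were already seen and rejected, when ranking the next page.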

[Thanks, Mary Jo Foley, for pointing out the quote on the MSR page]

MS Research reorgs around search

Mary Jo Foley reports on a reorganization at Microsoft Research that appears to be emphasizing search technology:
    It is not just the product teams at Microsoft that are focusing heavily on search. The Microsoft Research unit is doing so, as well.

    Microsoft Research (MSR) recently reorganized its Redmond labs by creating four new research teams. A text mining, search and navigation team is prominent among the newly minted groups. At the same time, MSR Asia ... created a handful of new research teams. Among them: A Web search and mining team.

    The Redmond text mining/search team will be "a new hub for search," [Kevin] Schofield [GM at MSR] said. "And our investment in this area will only grow. That's where the industry is headed."
[Thanks, Gary Price, for pointing out the article]

Thursday, September 16, 2004

Motley Fool on A9

Alyce Lomax at writes:
    While it was in beta, I kind of thought that Amazon's A9 was just a casual dalliance into the search space, fueled by its old-school Internet credentials and a desire to keep up with the Joneses. However, I can see it may very well be a force to contend with in the space between e-commerce and search -- that is, if folks don't feel a bit too pinpointed and profiled. I'm curious as to how this experiment will turn out.
[Via Search Engine Lowdown]

Update: Alyce follows up with a discussion of Amazon's new 1.57% discount for users of A9.

Tuesday, September 14, 2004

A9's new "discover" feature

A9 launches some new features (reports from New York Times and Business 2.0). Most interesting is a "Discover" section that tries to recommend other interesting web pages based on your history. From John Battelle:
    New to this version of the site is a feature A9 calls "Discover," which finds sites you might be interested in based on your clickstream and -- here’s the neat part -- the clickstream of others.

    This powerful feature smells an awful lot like Amazon’s fabled recommendation system, and over time, may well become the basis of an entirely new relevance scheme that builds upon Google’s link-based PageRank.
Still wondering if A9 will do personalized search?

See also my previous posts ([1] [2] [3] [4]) on A9 and personalized search.

War on comment spam

Mark Glaser posted an excellent article today about fighting comment spam. Suggestions include turning off comments on old posts, using redirects on links in comments (preventing Google from indexing them), and mandatory registration or preview pages (making it more difficult for robot spammers). In the article, Dave Winer makes this excellent point:
    While [Winer] thinks that all the war tactics by bloggers will ultimately fail, he says that Google itself could solve the problem by adjusting PageRank so that it doesn't weight links from comments as heavily as links within blog posts or on other pages.
Joel on Software also has a good post (scroll down to "Discussion Group Software") where he talks about using Bayesian filtering to detect comment spam and META tags to prevent search engines from following links on comments.
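For the curious, Bayesian filtering of comments can be sketched in a few lines. This is a tiny naive Bayes classifier with add-one smoothing, trained on made-up examples. The class name and training data are hypothetical, and it is not the code Joel describes; a real filter would need far more training data and better tokenization.

```python
import math
from collections import Counter

class CommentFilter:
    """Tiny naive Bayes classifier; assumes both labels have been trained."""

    def __init__(self):
        self.word_counts = {"spam": Counter(), "ham": Counter()}
        self.doc_counts = {"spam": 0, "ham": 0}

    def train(self, text, label):
        self.word_counts[label].update(text.lower().split())
        self.doc_counts[label] += 1

    def _log_score(self, words, label, vocab_size):
        counts = self.word_counts[label]
        total = sum(counts.values())
        prior = self.doc_counts[label] / sum(self.doc_counts.values())
        score = math.log(prior)
        for word in words:
            # Add-one smoothing so unseen words don't zero out the score.
            score += math.log((counts[word] + 1) / (total + vocab_size))
        return score

    def is_spam(self, text):
        words = text.lower().split()
        vocab = set(self.word_counts["spam"]) | set(self.word_counts["ham"])
        spam = self._log_score(words, "spam", len(vocab))
        ham = self._log_score(words, "ham", len(vocab))
        return spam > ham

# Made-up training comments, purely for illustration.
f = CommentFilter()
f.train("cheap pills buy now online casino", "spam")
f.train("buy cheap watches online now", "spam")
f.train("great post thanks for the link", "ham")
f.train("interesting point about search personalization", "ham")

print(f.is_spam("buy cheap pills online"))           # True
print(f.is_spam("thanks for the interesting post"))  # False
```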

[Thanks, Niall Kennedy, for pointing to the OJR article]

Findory buzz

Some nice blog buzz on Findory lately (ResourceShelf, Weblog, ResearchBuzz, New Media Hack). Thanks, all.

Changes at Google Local

Searchblog quotes the PR announcement from Google on new features and improvements on Google Local.

Still missing are the detail pages of Yahoo Local that allow people to post reviews and comments about each individual local business. This feature is a huge advantage for Yahoo since customer reviews and ratings provide substantial additional value and serve as a differentiator.

See also my earlier comments on why local search isn't likely to be particularly lucrative.

Yahoo buys MusicMatch

AP is reporting that Yahoo just bought MusicMatch for $160M.

MusicMatch has a pretty nifty recommendation engine that helps listeners discover new music.

Monday, September 13, 2004

Blender from Think Tank 23

John Battelle and others pointed to Blender, a prototype of a tool that tries to add related Amazon books (with associate tags) to any RSS feed. If you must put advertising in your feed -- and please consider why you really shouldn't -- it's true that this might be a relatively non-obnoxious way to do it.

Unfortunately, the idea seems better than the implementation. The first few recommended books for this blog, Geeking with Greg, were:

Hmm... Nearly random, from what I can tell. The description of the prototype says they "scan the items in the original feed(s) and use our concept matching technology to find books that relate to the ideas expressed in the feed." If I grok that, it means they're doing keyword or phrase matches between individual blog posts and book descriptions or reviews. A reasonable first approach, but I'm not surprised that they seem to get a lot of spurious matches.
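To see why plain keyword matching produces spurious matches, here is a minimal sketch of that kind of approach: rank books by word overlap (Jaccard similarity) between a post and each book description. This is my guess at the general technique, not Think Tank 23's actual code; the stopword list, function names, and sample books are made up.

```python
STOPWORDS = {"the", "a", "an", "of", "and", "to", "in", "for", "on", "is", "with"}

def keywords(text):
    """Lowercased words, stripped of punctuation, minus a small stopword list."""
    words = {w.strip(".,!?\"'()") for w in text.lower().split()}
    return words - STOPWORDS - {""}

def match_books(post, books, top_n=2):
    """Rank books by Jaccard overlap between post and description keywords."""
    post_words = keywords(post)
    scored = []
    for title, description in books.items():
        book_words = keywords(description)
        union = post_words | book_words
        score = len(post_words & book_words) / len(union) if union else 0.0
        scored.append((score, title))
    scored.sort(reverse=True)
    return [title for score, title in scored[:top_n] if score > 0]

# Made-up catalog, purely for illustration.
books = {
    "Search Engine Basics": "An introduction to web search engines and ranking",
    "Gardening for Beginners": "Planting and growing vegetables in a small garden",
    "Camp Cooking": "Recipes for cooking on a camp stove",
}
post = "Notes from Foo Camp on search engines and ranking weblogs"
matches = match_books(post, books)
print(matches)  # ['Search Engine Basics', 'Camp Cooking']
```

The second match is the problem in miniature: a post about Foo Camp pulls in a camping cookbook because the two share one word.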

To be fair, it is just a prototype. The idea is clever. I'm sure we'll see a compelling implementation soon enough.

Sunday, September 12, 2004

I, Cringely on distributed backups

This week's I, Cringely column is on a distributed backup system that takes advantage of the free space on everyone's machines:
    Here's my idea for a data backup service I call Baxter. This is NOT a virtual drive available on your system, but a virtualized backup system that works transparently and requires some time to restore your data.

    It's a RAID system using donated disk space on a wide area network. Your data is compressed, then cut into chunks, and those chunks are distributed to dozens of places with enough forward error correction thrown in to cover any storage that is lost or happens to be down when recovery is needed. The data is both encrypted (on the customer end, so unencrypted data never enters the system and that vulnerability is eliminated) and split into chunks so no one person has enough to make any sense of it even if they could decrypt it.
It's a clever idea, but not a new one. Farsite, FreeNet, and many others have explored distributed file systems using free space on client machines, though they didn't focus on backups.

Coming out of the Sloan Program in 2003, one of the startup ideas I explored was a distributed backup system fairly similar to what Bob Cringely now proposes. As I got deeper into it, the idea started to look much less attractive.

First, the service isn't particularly attractive to businesses. Disk space is cheap. Concerns about the reliability of this service and of storing sensitive data out on the cloud of unknown machines would trump any minor cost advantage.

Second, bandwidth is an issue. You certainly can't use machines that are connected over modems. And slow upstream connections on cable modems or other broadband are an issue too. A 10G backup over a 256Kbit upstream connection (very common with DSL and cable) would take about 4 days.
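The arithmetic behind that estimate is quick to check. This sketch assumes decimal units (1G = 10^9 bytes, 1Kbit = 1000 bits) and a fully saturated upstream link with no protocol overhead, so real-world numbers would be somewhat worse.

```python
def upload_days(size_gigabytes, upstream_kbits_per_sec):
    """Days needed to upload, assuming decimal units and a saturated link."""
    bits = size_gigabytes * 1e9 * 8
    seconds = bits / (upstream_kbits_per_sec * 1e3)
    return seconds / 86400  # seconds per day

days = upload_days(10, 256)
print(round(days, 1))  # 3.6
```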

Third, it isn't obvious there really is much of a cost advantage. Under Cringely's proposal, I pay $4/month and lose disk space equal to what I need to back up. But I can buy a 40G internal drive for $40 and use that for backups. Or spend a little more ($100-200) to get an external Firewire/USB drive for easier installation and backups. Cringely's $50/year and half of my disk space just isn't an obvious win over just buying another disk.

Nevertheless, it's an interesting idea. As SETI@home and others have done for idle CPU, it would be nice to find a way to use all this idle disk space sitting on the network. But it isn't the obvious killer app Cringely makes it out to be.

Update: A year and a half later, it looks like a startup,, has implemented Cringely's idea almost exactly as he described it.

Foo Camp on web feeds

Folks at Foo Camp are discussing next-generation weblog readers:
    Building the Next-Gen Feed Reader

    Modern feedreaders work well with 10 or 50 feeds, but fail to be useful at 300+ feeds. Skimming feeds are difficult, and any intelligent sorting and filtering is practically non-existent. Bayesian filtering, adaptive sorting, link aggregation and popularity (your own personal Blogdex!), feed ranking, and internal statistics are all interesting angles.

    Let's talk about building a microcontent browser that can scale to 1000 feeds and beyond.
Seems like they haven't seen Blogory yet. Blogory solves exactly this problem, managing thousands of feeds with adaptive sorting and filtering. It uses personalization to help readers find focus in the overwhelming glut of information. It is the next generation of feed readers.

There's also a session on weblog search:
    Weblog Search Engines

    Technorati, Feedster, Bloglines, Blogdex and MSN BlogBot are all competing for the blog search space, not to mention the popularity trackers like Blogdex, Daypop and Popdex. What are they doing well, and what could be improved? What features are they ignoring entirely? Will Google step in at the last minute and kick everyone's ass?
See also my earlier post saying that, yes, Google (and Yahoo) will come in and kick everyone's ass. It's not just that Google and Yahoo are bigger. It's that the current small players have scaling trouble and other quality problems, providing an opportunity for the big guys.

There are several other interesting sessions as well on topics such as machine learning, web services, internet advertising, and security. Sounds like a good time.

Friday, September 10, 2004

Personalized news is everywhere

Want your news on your blog? Want to show your readers what news you read? Not just top headlines, but the news that matters to you?

You got it. Findory Inline News lets you put your personalized Findory News right on your blog. This isn't just top headlines, the same news for everyone. This is your news, selected just for you. And now you can put it on your website.

As always, you don't have to do anything to personalize your news. Just read news on Findory. We'll learn your interests and find the news that matters most to you, building you a personalized front page, personalized RSS feeds, personalized daily e-mail, and now personalized inline news.

What? That's not enough? You want one for blogs too?

Okay, okay. Findory Inline Blogs will show your personalized selection of blog articles to your readers. Just read blog articles using Blogory. We'll figure out your interests and find you other weblog articles. And now, using Findory Inline Blogs, you can put your personalized blog feed up on your website. Wowsers.

Too many Microsoft blog feeds

Scoble complains that aggregating all 1300 feeds of Microsoft bloggers, as is done currently, "doesn't seem like a great idea (and the RSS bandwidth problems seem to bear that out)." But, he says, individual feeds have many disadvantages too, including harder discovery, worse bandwidth issues, and overwhelming your weblog reader:
    One person, one feed, seems like the best idea (particularly when joined with services like pubsub, feedster, and technorati), but means that discovering bloggers gets harder ... [and] the bandwidth problem is still there, only worse, cause now everyone will subscribe to 1300 feeds instead of only one.

    Added into the mix is that my news aggregator of choice, NewsGator, gets very slow when you go over 1000 separate feeds. So, I'm not going to be adding a whole bunch into the mix.
Scoble, take a look at Blogory, Findory's personalized weblog reader. It helps you discover new feeds, aggregates thousands of feeds and helps you find and focus on interesting weblog articles in those feeds, and caches to reduce bandwidth costs. Isn't this exactly what you need?

Thursday, September 09, 2004

Alex Edelman joins Findory

Alex Edelman, a seven-year veteran of, has joined Findory. The impact of Alex's raw enthusiasm and impressive talent is already visible on Findory. Stay tuned for more.

Google's news alert manager

Steve Rubel noticed that Google now has a way to manage all your news alerts. Before, you had to manage each news alert individually by e-mail and could only change a news alert by unsubscribing and then resubscribing.

It's interesting that Google seems to be moving toward accounts for users, much like Yahoo.

Web Search Garage

Tara Calishain wrote me about her new book, "Web Search Garage". Looks interesting. Tara is right that being able to find information quickly and efficiently is an enormous advantage.

The section "Seven Ways to Save Time Searching" includes a mention of Findory News, so you know it has to be good.

Wednesday, September 08, 2004

The perfect search

John Battelle asks, what is perfect search?
    Imagine the ability to ask any question and get not just an accurate answer, but your perfect answer -- an answer that suits the context and intent of your question, an answer that is informed by who you are and why you might be asking. The engine providing this answer is capable of incorporating all the world's knowledge to the task at hand -- be it captured in text, video, or audio. It's capable of discerning between straightforward requests -- who was the third president of the United States? -- and more nuanced ones -- under what circumstances did the third president of the United States foreswear his views on slavery?

    This perfect search also has perfect recall -- it knows what you've seen, and can discern between a journey of discovery -- where you want to find something new -- and recovery -- where you want to find something you've seen before.
This is the Oracle of search. Not only would this search engine have to contain all the world's information, it would have to understand it and be able to determine the relevance of it to any particular task. It would have to build a model of each user, understanding what each person knows and doesn't know. Building this would mean we've built an omniscient perfect intelligence. Why this intelligence would bother spending its time answering our puny questions is a mystery to me.

However, perhaps we can take baby steps toward this goal. Much of the work in relevance ranking is going into further understanding the text of the page. Question answering systems tear through volumes of data frantically applying grammatical models and trying desperately to ferret out the answer buried in the sea of natural language goo. Personalized search, in some forms, builds a model of the user, understanding a little of what they know and don't know, and applies information gleaned from other similar users. We are getting better, but we're a long way from the Oracle.
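
To make the "similar users" idea a little more concrete, here is a minimal Python sketch. Everything in it is an assumption made for illustration: the user names, the article ids, and the choice of Jaccard overlap as the similarity measure. It is not how any particular search engine works, just one simple way to apply what similar users have read.

```python
# Hypothetical click histories: user -> set of article ids they have read.
history = {
    "alice": {"a1", "a2", "a3"},
    "bob": {"a2", "a3", "a4"},
    "carol": {"a7", "a8"},
}

def jaccard(a, b):
    """Overlap between two sets of read articles, from 0.0 to 1.0."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def recommend(user, history):
    """Score unread articles by the similarity of the users who read them."""
    seen = history[user]
    scores = {}
    for other, items in history.items():
        if other == user:
            continue
        sim = jaccard(seen, items)
        if sim == 0.0:
            continue  # users with nothing in common contribute nothing
        for item in items - seen:
            scores[item] = scores.get(item, 0.0) + sim
    # Highest score first; ties broken by article id for a stable order.
    return sorted(scores, key=lambda i: (-scores[i], i))

recs = recommend("alice", history)
print(recs)
```

Here alice and bob share two of four distinct articles, so bob's one unread article gets suggested to alice, while carol, who overlaps with no one, gets nothing. Baby steps indeed.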

The news through their eyes

Steve Outing and Laura Ruel studied people's eye movements as they read news online. Some tidbits:
  • Upper-left of the page attracts the most attention
  • Large or underlined headlines discouraged reading of the blurbs
  • Navigation at the top of the page is preferable
  • Text ads attracted more attention than display-type ads
The full article is well worth reading.

Tuesday, September 07, 2004

Findory Inc.

Many things are changing at Findory these days. The most recent is our business structure. Findory LLC just converted to Findory Inc.

An LLC is an advantageous structure for a small, closely held firm that is taking losses. It was appropriate for what Findory was, but not for what it will become.

Social software gets the money

The SJ Mercury News claims social software will be the latest fad for VC funding:
    Last year, the VC big-swingers ... converged upon social networking start-ups .... This time, they're piling together into a sector we'll call "social software." The term was apparently coined by Clay Shirky, an adjunct professor at New York University and expert on the social Internet. He is an adviser to SocialText, one of the path-breaking companies in the area. SocialText provides Wiki services to corporations. A Wiki, in its simplest form, is a single Web page that can be written upon, and edited, by multiple users at once.

    Meanwhile, action is just as hot in a related set of social software start-ups, including "news aggregators" and "Really Simple Syndication" companies. RSS companies like Technorati and Feedster crawl blog sites and allow users to search them.
[Thanks, Ross Mayfield, for pointing to the article]

Monday, September 06, 2004

More on Craigslist

In addition to being in the top 25 in web traffic, Craigslist generates almost $10M/year in revenue and has a projected valuation of $100M, all with just 14 employees.

It's a fantastic example of the strength of a community website. Some background from a recent NYT article:
    The foremost lesson would be about community and how to sustain one online. Craigslist started in 1995 as an e-mail newsletter that Mr. Newmark sent to friends informing them of San Francisco cultural events. As interest grew, the newsletter became an online flea and job market and an essential community bulletin board.

    As investor-backed Internet companies began to surge in the late 1990's, Craigslist remained the tortoise. When the dot-coms fizzled, Craigslist was celebrated as an antidot-com, achieving - despite its lack of business plans, profit projections and tchotchkes with logos - the kind of mass acceptance that high-tech investors clawed for. When the bubble burst, Craigslist was left standing - a low-maintenance community site used by, among many others, former dot-com workers looking for jobs. [Craigslist] accepts no banner advertising. It posts no pop-up ads, requires no visitor registration and charges no fees, except to employers posting job offers.
And Craig Newmark's obsessive focus on customers:
    The other key to the success of Craigslist was Mr. Newmark's fastidious personal commitment to keeping scammers off the site ... Mr. Newmark ... has a kind of condition: obsessive customer-service disorder. He is not totally at peace if there are e-mail messages in his in-box complaining that someone is falsely advertising, defacing or hacking into the site or blanketing various forums or channels with sales spam.

    Craig Newmark is the founder and chairman of Craigslist, but his primary job is as its foremost customer-service representative. He is the vigilant overseer of the company's integrity.

Friday, September 03, 2004

Amazoning the news

Rich Gordon points to an old article, "Amazoning the News", about adding personalization, browse, and community features to news sites. After talking about browse features in music services, Rich jumps back to news and says that he wants to discover news that he is "interested in but might never come across otherwise" and wishes that the "service existed for editorial content."

Findory News is what you want, Rich. It's a personalized news site. Searching over 2,000 sources, Findory News helps you discover news you're interested in but might never come across otherwise. There's even an Opinion section with editorial content.

Thursday, September 02, 2004

Web advertising misery

Jakob Nielsen, usability guru, writes about web advertising practices that make customers miserable:
    The third prevailing ideology of Web design is oppression, as mainly espoused by certain analysts who wish the Web would turn into television and offer users no real choices at all. Splash pages, pop-ups, and breaking the Back button are typical examples of the misery ideology.

    One of misery design's most insidious recent examples is the idea of embedding links to advertising on the actual words of an article using a service like IntelliTxt. By sullying the very concept of navigation, such ads not only damage the user experience on the host site, they poison the well for all websites. Such links make users even less likely to navigate sites, and more likely to turn to trusted search engines to guide them to the next page.

    Like much Web advertising, embedded ad links rely on interruption marketing, intruding as much as possible on users and preventing them from doing what they want to do. As such, many of these ads have been failures. The most successful Web ads empower -- rather than annoy -- users. Examples include search engine advertising, sites with classified ads, and request marketing.
See also my earlier posts on behavioral targeted advertising and bringing sense to web advertising.

Hell-bent Ballmer

Hiawatha Bray writes:
    An ebullient Steve Ballmer, president and chief executive of Microsoft Corp., said yesterday that his company is "hell-bent and determined" to challenge Google Inc. for leadership in the Internet search business.
A longer excerpt from a CRN article:
    "We see search as in the early phase of innovation and an area in which we underinvested," said Ballmer acknowledging that other smart ISVs, notably Google, spearheaded the category.

    Ballmer pointed to AltaVista as Search Version 1.0, Yahoo as Search Version 2.0 and Google as Search Version 3.0. Microsoft, he said, will develop a next generation of search technology that would not only allow companies to search the Internet more effectively but search their computer networks, hard drives and e-mail more effectively.

    "We're hell bent and determined to be an innovation leader in that," Ballmer said. "We had fragmented efforts and now we're pulling them all together."
[Thanks, SearchBlog, for pointing out the quote]

Wednesday, September 01, 2004

Behavioral targeted advertising

A light overview of personalized advertising at Search Engine Journal. A couple excerpts:
    Sites visited, content viewed, and length of visit are ... analyzed to predict an online behavioral pattern for such a user ... Behavioral ad networks then serve targeted advertising related to that user's behavioral classification ...

    According to some recent studies, behavior-based ads are faring much better than content-placed ads ... [In one example], when compared with the basic web ads, the behavior ads were seen by 115 percent more business travelers making at least one trip a year. The targeted consumers also scored three percent higher than the average viewers in brand awareness ... Because behavioral marketing enables advertisers to more easily determine and then postulate about user preferences and purchasing habits, the advertiser is able to treat each prospect more as an individual than an advertising collective.
Rather than blast a broad audience with your ad, you target just the people who might be interested. Personalized advertising is more effective because it is more relevant and useful to customers.
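
As a rough sketch of the approach the excerpt describes, here is a small Python example. The visit log, the categories, the ad inventory, and the rule of picking the category with the most time spent are all hypothetical, chosen only to show the shape of the idea. Real ad networks surely use far richer signals.

```python
from collections import defaultdict

# Hypothetical page-view log for one user: (content category, seconds spent).
visits = [
    ("travel", 120),
    ("sports", 30),
    ("travel", 300),
    ("finance", 45),
]

# Hypothetical ad inventory keyed by the category each ad targets.
ads = {
    "travel": "Discount business-class fares",
    "sports": "Season tickets on sale",
    "finance": "Low-fee brokerage accounts",
}

def build_profile(visits):
    """Sum time spent per category as a crude interest profile."""
    profile = defaultdict(int)
    for category, seconds in visits:
        profile[category] += seconds
    return dict(profile)

def pick_ad(profile, ads, default="Generic banner"):
    """Serve the ad for the category with the most time spent, if any."""
    candidates = [c for c in profile if c in ads]
    if not candidates:
        return default  # no usable history, fall back to an untargeted ad
    best = max(candidates, key=lambda c: profile[c])
    return ads[best]

profile = build_profile(visits)
chosen = pick_ad(profile, ads)
print(profile)
print(chosen)
```

This user spends most of their time on travel content, so they see the travel ad. A visitor with no history falls back to the generic banner, which is the broad blast the targeting is meant to avoid.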

The article only describes the more common profile-based approach to personalized advertising, but there are many other techniques that could work well on this problem.