Geeking with Greg: 05/01/2004

Monday, May 31, 2004

E-commerce strong in 2003

Online retail was a $72B industry in 2003, representing 5.4% of all retail sales in the US, the New York Times reported today. Some amazing statistics: 43% of computer hardware and software will be sold online this year, 19% of books, 22% of travel.

CNet news aggregator

CNet is testing a news aggregator, The Guardian Blog and others ([1] [2]) reported today. It allows readers to get news on specific topics or about specific companies, similar to Topix.net, but focused on tech news.

Curious whether this might be a first step toward a generalized news aggregator to compete with Google News.

Saturday, May 29, 2004

More on GMail privacy

After California pushes a bill to restrict GMail, Google publishes a statement about GMail privacy. Excellent coverage on BattelleMedia.

Friday, May 28, 2004

Beyond "Six Degrees"

A post on Pattern Hunting proposes the idea of expanding social networking tools like Orkut to automatically find links between people who share common interests.

The idea would be to set a profile that will automatically introduce you to someone based on a shared interest and a minimum reputation rating. If both parties accept the introduction, then the connection is made. You might also want the option to "watch this person" (similar to "watch this auction item" at eBay) before making the stronger commitment to contact that person and invite him/her into your network.

Regardless of which methods are used to track reputations, they will likely need to have to go beyond FOAF rankings to include some measure(s) of quality regarding the information objects that a person publishes.

Thursday, May 27, 2004

MSN betting on personalization

MSN is "going to make a very big investment in personalization", says Yusuf Mehdi, head of Microsoft's MSN division:

The company hopes to soon have on its MSN web site a system similar to Amazon.com's technology that will recognize a user even if that person hasn't expressly signed on to the Web site, he said. It also is working on a system that will track a user's movements over the Internet and use that data to build a more personalized Web page based on the person's surfing habits.

Mehdi conceded that such efforts create thorny privacy issues. "We're going to make a very big investment in personalization, but it's very clear that privacy and consumer trust is really a key thing in getting your arms around personalization," he said.

Both Yahoo and Microsoft seem to see personalization as the key to attacking Google.

Update: 15MB of Fame has another interesting quote from Yusuf Mehdi and a link to the original speech at a Goldman Sachs conference.

FindForward

FindForward is a clever combination of web APIs (Google, Amazon, etc.) and data feeds (RSS, DMOZ, etc.) to allow a large variety of different types of searches. (from ResearchBuzz).

Wednesday, May 26, 2004

Infospace on personalized search

Search Engine Lowdown has an interview with Arnaud Fischer from Infospace (owners of Dogpile and WebCrawler).

An interesting little snippet on personalized search:

Monitoring navigation behavior at a user-level could conceivably be the basis to developing an understanding of users' individual interests over time, in essence personalizing the equivalent of Google's PageRank scores. If you consistently browse music-related content, search engines should become smart enough to understand that your query "Prince" most probably relates to the singer than to the royal family. Personalizing search relevancy algorithms presents some major scalability and performance challenges, though. It takes days, if not weeks to process link analyses and compute authority scores for individual Web sites after a crawl.

Update: Another interesting quote from this article that I missed the first time I looked at it.

Analyzing click popularity at an aggregate level along IP-associated parameters could be leveraged to extrapolate personalized ranking for clusters of users exhibiting similar behaviors. This technique would not be unlike Amazon's implementation of collaborative filtering technology.

Arnaud Fischer is definitely on the right track here. It's quite tricky to get this kind of thing right though. It'll be interesting to see if Infospace makes an attempt.

Why do personalized search?

Personalized search is showing different results to different people on the same search. Rather than just using the keywords provided for a search, everything the search engine knows about a person is brought to bear and impacts the search results.

The goal is to provide more relevant search results. Since different people have different interpretations of what is relevant, at some point, the only way to further improve the quality of search results will be to show different people different results. So, the advantage of personalized search is that it promises to deliver better search results. The difference is likely to be particularly substantial when a search is very ambiguous (e.g. a single word like "desk") or someone has difficulty finding something and refines a search repeatedly.
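To make the idea concrete, here is a minimal sketch of one way a search engine might re-rank results using a per-user interest profile. Everything in it is an assumption for illustration: the `Result` tuple, the topic labels, the `boost` parameter, and the example URLs are hypothetical, and no real search engine is known to work this way.

```python
from collections import Counter
from typing import Dict, List, Tuple

# Hypothetical result record: (url, base relevance score, topic label).
Result = Tuple[str, float, str]


def build_profile(clicked_topics: List[str]) -> Dict[str, float]:
    """Turn a history of clicked topics into topic weights that sum to 1."""
    counts = Counter(clicked_topics)
    total = sum(counts.values())
    return {topic: n / total for topic, n in counts.items()}


def personalize(results: List[Result], profile: Dict[str, float],
                boost: float = 1.0) -> List[str]:
    """Re-rank results, boosting topics the user has clicked on before."""
    def score(result: Result) -> float:
        _url, base, topic = result
        return base * (1.0 + boost * profile.get(topic, 0.0))

    return [r[0] for r in sorted(results, key=score, reverse=True)]


# Two hypothetical results for an ambiguous one-word query.
results = [("royal.example", 0.9, "royalty"), ("singer.example", 0.8, "music")]
music_fan = build_profile(["music", "music", "music", "news"])
print(personalize(results, music_fan))  # the music result moves to the top
print(personalize(results, {}))         # no profile: base ranking is kept
```

The same query gives two different orderings depending on the profile, which is both the point of personalized search and the source of the consistency problem discussed next.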

What are the disadvantages?

First, privacy is an issue. Search engines would build a profile of everything you tell them about your interests, every search you've done, and every search result you ever clicked on. That's a lot of information and, unless handled in the strictest confidence, could make many people nervous.

Second, search results will no longer be consistent for the same search, certainly not for searches by different users and not even for many searches by the same user. This means that e-mailing a search to someone, or finding something again by repeating a search, becomes a bit more difficult.

Third, personalized search is computationally expensive. Caching and many other optimizations become impossible when every search result list is customized in real-time.

Will the advantages outweigh the disadvantages? Big players are betting yes. Google, Microsoft, A9, and Yahoo are all testing personalized search.

Tuesday, May 25, 2004

Newsimages

ResearchBuzz has a post about Newsimages, a site that displays news as a series of images instead of text. Interesting that it's almost the opposite of the simple, text-only design of Findory News.

See also Newsmap, a clever (although not very useful) way of displaying news.

Google vs. Microsoft

Emergic.org has a good article on Google's business strategy against Microsoft.

I think Microsoft is a bigger threat to Google than Google is to Microsoft, but Microsoft should be worried about the growth of Linux in the server and international markets.

Monday, May 24, 2004

More Findory News coverage

Findory News was mentioned on Search Engine Lowdown today.

Internet advertising is back

Internet advertising revenue was about $2.3B in Q1 2004. That's the highest ever, beating Q4 2002 revenue of $2.2B and surpassing previous peaks in early 2000.

Sunday, May 23, 2004

Findory on refdesk.com

Findory News is the Site of the Day on refdesk.com today. Generating a fair amount of traffic.

Friday, May 21, 2004

Findory in Slate

Jack Shafer at Slate mentions Findory News in today's Press Box column.

A more selective ignorance?

ResourceShelf points to an interesting Vivisimo paper where they discuss information overload and solutions for "ignoring information more knowingly." The Vivisimo solution is to show clusters of search results, allowing you to quickly see what information is available and pick what to explore and what to ignore.

But I'm not sure I agree this is the main problem in information overload. Do you really care about what you've ignored? Or do you just want to be confident that you've seen the most relevant information? The success of the "I'm Feeling Lucky" button on Google (which takes you directly to the first search result) argues that prioritization and relevance are key. Is ignorance about irrelevant information a real issue?

A9 and personalized search

Udi Manber says that search result "relevancy is different from user to user." Is there any doubt that A9 intends to pursue personalized search ([1] [2])?

Thursday, May 20, 2004

Findory in Micro Persuasion

Great article by Steve Rubel on Findory News.

Bill Gates on RSS

Several blogs ([1] [2] [3]) are reporting on comments made by Bill Gates today about RSS. You can read the speech, but he's basically saying that the publish-subscribe nature of RSS is a convenient way to provide information, a nice compromise between manually reading web pages to see if they've changed since the last time you visited and sending out potentially distracting and disruptive e-mails to mailing lists.

The last sentence in the RSS section of his speech is interesting.

The ultimate idea is that you should get the information you want when you want it, and we're progressively getting better and better at that by watching your behavior, ranking things in different ways.

He's no longer talking about RSS here. He's talking about personalization. Learning from your behavior, the system should get you the information you want when you want it. The real problem is prioritizing and filtering information, not accessing information.

Wednesday, May 19, 2004

Seruku: Search what you've seen

ResourceShelf discusses a new toolbar application called Seruku. "Seruku is toolbar-based application that ... makes a copy (called a snapshot) of every html web page you've viewed in your browser, stores it locally, indexes the content and then, when needed, allows you to keyword search the full text of this material."

My first thought was that this is silly, basically reimplementing the browser history and cache with a search layered on top. But then I realized that this really could be useful. Wouldn't it be nice to be able to do a Google search that was limited to web pages you have seen before?

It may just be a matter of time before Google puts this functionality in their toolbar, since this is an easy extension to their upcoming search of your hard drive.

Update: John Battelle briefly compares Seruku to Furl.

Update: Microsoft Research's "Stuff I've Seen" project is a generalized version of Seruku.

Findory in the San Jose Mercury News

Brief coverage in today's San Jose Mercury News of Findory News.

Weblog comment spam

John Battelle is getting increasingly frustrated in his battle with weblog spam. It is a huge problem. But, as Mark Pilgrim points out, the common solution to weblog spam, blacklists and filters, isn't likely to work.

Instead, the focus should be on slightly increasing the cost of posting. For example, some sites put up an image of a random code that needs to be typed in to register or post a comment. The image is difficult for an automated process to read, so most spammers won't bother trying. Other sites require confirmation that you own an e-mail address by sending you an e-mail with a link that needs to be clicked for your registration or post to be accepted. Both of these methods increase the cost to spammers with only a minor annoyance to non-spammers.

Blog comment spam exists because it works. Increasing the costs, attacking the economics of spam, is the best way to solve the problem.
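As an illustration of the e-mail confirmation approach, here is a minimal sketch of how a site might generate and check the link token. The function names, the `SECRET_KEY` value, and the idea of tying the token to a comment ID are all assumptions made for this example, not a description of any particular weblog package.

```python
import hashlib
import hmac

# Hypothetical server-side secret; a real site would generate and store
# a long random value rather than hard-coding one.
SECRET_KEY = b"replace-with-a-random-server-secret"


def confirmation_token(email: str, comment_id: str) -> str:
    """Token to embed in the confirmation link sent to the e-mail address."""
    message = f"{email}|{comment_id}".encode("utf-8")
    return hmac.new(SECRET_KEY, message, hashlib.sha256).hexdigest()


def is_valid(email: str, comment_id: str, token: str) -> bool:
    """Check the token from a clicked link before publishing the comment."""
    expected = confirmation_token(email, comment_id)
    # Constant-time comparison, so timing does not leak the expected token.
    return hmac.compare_digest(expected, token)


token = confirmation_token("reader@example.com", "42")
print(is_valid("reader@example.com", "42", token))  # True
print(is_valid("reader@example.com", "43", token))  # False
```

A spammer now needs a working mailbox and an extra round trip for every comment, while a legitimate reader only has to click one link.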

Tuesday, May 18, 2004

I want my PC TV

In an ideal world, you could watch high definition TV on demand on your PC, with any of thousands of programs available at the push of a button. How can we get there?

One approach is to have the PC record directly from broadcast or cable. This has the advantage of using existing TV broadcast streams, but current solutions are difficult to set up and don't usually support HDTV.

We could download shows directly on demand. RealVideo and Windows Media streams are very low quality, but the BBC will soon test BBC on Demand with high quality video streams. Unfortunately, with high quality streams, the file sizes mean that "on demand" may require waiting tens of minutes or more from requesting the feed to being able to watch it.

AtomFilms Hi-Def (using a technology from Maven Networks) downloads short films in high definition resolutions using idle capacity on your broadband connection. It's very cool. And it suggests an alternative approach to TV on the PC.

What if you had a Tivo-like application running on your PC that downloaded TV shows when your network was idle? List the shows you like, watch them on your PC. The shows would be downloaded a few hours after they were broadcast. True, there would be no live TV, but how many Tivo owners would mind losing Live TV?

Where would the downloaded shows come from? One option is a peer-to-peer network where content recorded on Tivos or HTPCs is shared with other clients who can't record live TV. The second would be for the networks to provide Internet broadcasts of their existing broadcast content. With either option, as long as the recorded show includes the original advertising, the networks have an incentive to participate, since more viewers of their broadcast means higher advertising rates and more revenue.

The application could also download content you don't request explicitly, much like Tivo suggestions. Some of this could be other programs that might be of interest; if you watch Simpsons, you might also like Futurama. Some of it might be movie previews or BMW short films, forms of advertising that are useful, interesting, and could be targeted to your interests.

The technology to build this application is out there, a combination of Maven Networks' AtomFilms Hi-Def application, Tivo software, and existing broadcast streams. All that remains is for someone to build it.

Update: Emergic.org has an interesting proposal on how to build this system using BitTorrent and RSS.

Update: John Markoff is reporting that Tivo, Microsoft, and a couple of startups are all planning products that download TV over the Internet.

Update: Two of the startups are Akimbo Systems and timeshifTV. Channel selection on both seems to be quite limited, but it's a start.

Update: On a related topic, Texas Instruments is attempting to allow high-definition television over DSL lines. Interesting alternative to cable.

Update: Three months later, a Salon article describes a system using BitTorrent and RSS that allows users to download any show they want automatically to a PC (apparently, about as described on Emergic.org on May 31, 2004). Not legal, I'd assume, but apparently popular and growing rapidly. [Thanks, Niall Kennedy]

Monday, May 17, 2004

Orkut and AdWords

Jeremy Zawodny speculates on the value Google would get from using Orkut profiles for advertising. Interesting (if evil) idea, but the privacy concerns are serious. Instead, I suspect Google will try using your Google search and clickstream history to improve their targeted advertising.

Merging Orkut and Blogger

Steve Rubel argues that combining Orkut and Blogger would "make it easier for us to identify the most credible/valuable bloggers who write about the subjects that matter to us." It's a great point. In the recent redesign, Blogger did add Orkut-like author profiles, communities, and interests, but it's missing the deeper social network. Just as Orkut could add credibility to reputations for auction transactions, it could add credibility to bloggers' reputations.

Thursday, May 13, 2004

Google vs. eBay?

What if Google decided to move into e-commerce? The business case is compelling. Not only would this be a major new market for Google (at the expense of eBay and Amazon.com), but also it would be an excellent strategic move against Yahoo (Yahoo! Shopping) and MSN (MSN Shopping).

What would it take? Google already has a shopping search engine, Froogle. Merchants even provide product catalogs directly to Google. To actually handle the entire transaction, they'd need a payment system. And they'd need a reputation management system; using Orkut, they could build a powerful one that included having your network of friends vouch for your reliability.

How would a Google shopping site gain critical mass? One easy way would be to show Froogle results for any search. Initially, most searches would yield products selling on other merchants' sites. Over time, as Google added more and more listings, Google would be managing more and more of the transactions. Combine this with an approach of trying to dominate specific categories of products first, and you have a viable and useful shopping site from day one.

So, what would it take? Google would extend its AdSense payment system to support shopping transactions, extend Orkut to help manage shopping reputations, and start handling the sale of some of the products listed on Froogle directly. That's it. That's all it would take for Google to enter the e-commerce market. Think it will happen?

Google to sell banner ads

Only days after my prediction that Google would lead a revolution in text-only advertising, Google reverses itself and announces that it plans to sell graphical banner ads. Doh. What happened to "don't be evil?"

Wednesday, May 12, 2004

Using ferns to clean water

Arsenic in the water supply can be cheaply eliminated by growing ferns in the water, according to an article in the journal Nature. The ferns can reduce arsenic concentrations by two orders of magnitude in less than 24 hours.

RSS and scaling

A Wired article expresses concern that RSS polling will crush sites under heavy traffic, but scaling RSS is really pretty straightforward using caching. Web-based RSS readers like My Yahoo!'s RSS Beta cache RSS feeds to share among several readers, reducing load on the server. Because immediate updates aren't important when delivering RSS content, it's also straightforward to set up intermediary caching servers, just like many do for static web content.

Unfortunately, personalized RSS feeds can't easily be cached and shared. Each RSS feed contains different content; there are effectively millions of different RSS feeds. This is a more serious scaling challenge. Findory News has heavily optimized its personalized RSS feeds because of this issue.
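The shared-cache idea can be sketched in a few lines. This is only an illustration under stated assumptions: `FeedCache`, its `ttl` default, and the injected `fetch` and `clock` callables are hypothetical names for this example, not how My Yahoo! or Findory actually implement caching.

```python
import time
from typing import Callable, Dict, Tuple


class FeedCache:
    """Serve one fetched copy of a feed to many readers for `ttl` seconds."""

    def __init__(self, fetch: Callable[[str], str], ttl: float = 900.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._fetch = fetch      # function that downloads a feed by URL
        self._ttl = ttl          # how long a cached copy stays fresh
        self._clock = clock      # injectable clock, handy for testing
        self._entries: Dict[str, Tuple[float, str]] = {}
        self.origin_hits = 0     # requests that reached the publisher

    def get(self, url: str) -> str:
        now = self._clock()
        entry = self._entries.get(url)
        if entry is not None and now - entry[0] < self._ttl:
            return entry[1]      # fresh copy: the publisher is not contacted
        self.origin_hits += 1
        body = self._fetch(url)
        self._entries[url] = (now, body)
        return body


# A thousand readers of the same feed cost the publisher one request.
cache = FeedCache(fetch=lambda url: "<rss/>")
for _ in range(1000):
    cache.get("http://example.com/feed.xml")
print(cache.origin_hits)  # 1
```

If every reader instead requests a feed URL unique to them, each URL gets its own cache entry and `origin_hits` grows with the number of readers, which is the personalized-feed scaling problem described above.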

Tuesday, May 11, 2004

Thumbshots Ranking

Thumbshots has a useful tool that lets you easily and graphically compare the overlap between top results for searches on major search engines such as Google and Yahoo. SearchEngineWatch has a good article with details.

Your conclusion might be that this proves the value of metasearch engines, but that's not obvious. The data only shows little overlap in the top search results; that is, search engines differ in how they prioritize search results. When combining data from multiple search engines, the trick is to find a combination that results in a better prioritization than any of the individual search engines. That's very hard to do because the metasearch engine is working with less data (often just the page summary) and usually has to do the prioritization in real-time.
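To show what "combining" means in practice, here is a small sketch that measures overlap between two result lists and merges them with a Borda-style count. Borda counting is a simple technique swapped in purely for illustration; it is not claimed to be what Thumbshots or any metasearch engine uses, and the engine names and URLs are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List


def overlap(a: List[str], b: List[str]) -> float:
    """Jaccard overlap: shared results divided by all distinct results."""
    set_a, set_b = set(a), set(b)
    union = set_a | set_b
    return len(set_a & set_b) / len(union) if union else 0.0


def borda_merge(rankings: Dict[str, List[str]]) -> List[str]:
    """Merge ranked lists; each engine awards more points to higher ranks."""
    scores: Dict[str, int] = defaultdict(int)
    for results in rankings.values():
        n = len(results)
        for position, url in enumerate(results):
            scores[url] += n - position
    # Highest score first; ties broken alphabetically for a stable order.
    return sorted(scores, key=lambda url: (-scores[url], url))


rankings = {
    "engine_a": ["x", "y", "z"],
    "engine_b": ["y", "w", "x"],
}
print(overlap(rankings["engine_a"], rankings["engine_b"]))  # 0.5
print(borda_merge(rankings))  # ['y', 'x', 'w', 'z']
```

Note that the merge only sees rank positions. It has no access to the signals each engine used to produce them, which is exactly why beating the individual engines is hard.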

Feedster News Alerts

Feedster News Alerts are like Google News Alerts that cover blogs instead of mainstream news sources. Useful for checking blog or news coverage on very specific topics.

While news alerts services like these are useful, they have severe limitations. First, it's a pain to set up these services, providing lists of specific keywords that match your interests, then refining them if they return too many or too few results. Second, as anyone who's used these services knows, you rapidly become saturated by the unprioritized flood of news alerts e-mails coming into your inbox.

Combining all the e-mails and properly prioritizing the combined results would be a step in the right direction. At the extreme, when everything is easy and well prioritized, you may end up with something that looks a lot like the Daily Findory News.

Monday, May 10, 2004

Interview with Craig Silverstein from Google

Interesting interview with Craig Silverstein from Google. A brief discussion of personalization:

Q: There are some personalisation tools emerging. Amazon's A9.com and MSN are using different techniques. Google's tool is a little bit more like, "Give us information, and we will help you out," and the others take the approach, "We will learn from you, and then we will help you out." Tell me why your approach is superior.
A: In the latter scenario, where first you learn, and then you help the visitor out, you have two places where the computer has to make intelligent judgments. I am not saying that is not an interesting or promising approach, but it does put more strain on the computer. When you tell it what your interests are, then the computer only has to be intelligent to use that information to try to help you out. They are both part of the same goal of trying to help people out with personal information -- it is just a matter of how you get there. We will be seeing more of this in the future.

Craig is an incredibly sharp guy, but I think he's missing a key issue here. When you rely on people to tell you what their interests are, they (1) usually won't bother, (2) if they do bother, they often provide partial information or even lie, and (3) even if they bother, tell the truth, and provide complete information, they usually fail to update their information over time.

Sunday, May 09, 2004

Kill eBay, Vol. 2

An interesting article in the New York Times claims the biggest threat to eBay's growth may be from traditional retailers. Traditional retailers offer much better customer service, especially on returns.

As eBay sells more new goods, the brand's lack of a money-back guarantee will become a hindrance. "History suggests that in order to remain competitive, retailers must match the offerings of others," Professor Koehn said. "EBay is unlikely to be an exception."

You might also be interested in reading Kill eBay, Vol. 1.

Friday, May 07, 2004

Bringing sense to web advertising

Google's $30-50B IPO valuation means investors are making strong assumptions about Google's future growth. Where will that growth come from?

At 40-50% of the worldwide search market, Google is close to saturating its core market. Any additional search market share will be fought tooth-and-nail by Yahoo and MSN. New products may offer some opportunity for traffic growth, but not on the scale required to support Google's lofty valuation.

But, Google doesn't need to drive traffic on its site to drive advertising revenue. Through the Google AdSense program, Google's ads are distributed across the web, already accounting for 22% of revenue in 2004. Google can grow revenue by helping other sites with advertising.

And other sites do need help. Despite evidence that big flash banner ads and popups cause long-term reductions in traffic, marketers seem addicted to them. Marketers ignore the long-term costs and optimize for short-term clickthrough rates. Sites like Salon have gone even further, requiring clickthrough on multipage full screen ads before being able to view content. Annoying is in.

But advertising doesn't have to be annoying. By targeting the right customers, marketers can get their message in front of people who are interested in that message. Google is remarkably successful with unobtrusive, text-only ads. How do they do it? The ads are targeted and optimized. Ads are matched to relevant searches and sites. Ads are optimized; ads that attract clickthroughs are shown more frequently and ads that don't appear to be working are quickly removed. Google ads are so well targeted and optimized that they're helpful, often yielding the exact product or information you were looking for with your search.
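The "show what works, remove what doesn't" loop might look something like the sketch below. This is a guess at the general shape only, not Google's actual algorithm: the `Ad` class, the smoothing constants, and the `min_impressions` and `min_ctr` thresholds are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Ad:
    text: str
    impressions: int = 0
    clicks: int = 0

    def ctr(self) -> float:
        """Smoothed clickthrough rate; a brand-new ad starts at 1%."""
        return (self.clicks + 1) / (self.impressions + 100)


def prune(ads: List[Ad], min_impressions: int = 1000,
          min_ctr: float = 0.005) -> List[Ad]:
    """Drop ads that have had a fair number of showings but few clicks."""
    return [ad for ad in ads
            if ad.impressions < min_impressions or ad.ctr() >= min_ctr]


def display_weights(ads: List[Ad]) -> Dict[str, float]:
    """Share of future impressions per ad, proportional to smoothed CTR."""
    total = sum(ad.ctr() for ad in ads)
    return {ad.text: ad.ctr() / total for ad in ads}


ads = [Ad("popular", 2000, 100), Ad("ignored", 2000, 2), Ad("new")]
kept = prune(ads)
print([ad.text for ad in kept])  # ['popular', 'new']
print(display_weights(kept))     # 'popular' gets the larger share
```

The smoothing gives a new ad a few showings before it can be judged, while an ad that has been seen many times and rarely clicked is removed.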

Google isn't the only one doing text-only ads. But their product has some unusual characteristics. Any site, no matter how small, can buy ads or sell advertising space using Google AdWords and Google AdSense. And Google's impressive infrastructure, in-house expertise, and substantial market share should allow them to produce higher quality targeting than the competition.

AdSense is the future growth channel for Google. As Google grows AdSense, it will change the face of web advertising. Welcome to the new world.