Tuesday, August 31, 2004

Tableau Software is the next Google?

Apparently, some think Tableau Software is the next Google.

It's hard for me to believe that one of many database visualization software companies could be the next Google. Even if the product is exceptional, it is only applicable to a small market. As jdray said, the only thing Tableau Software seems to have in common with Google is that "they used to work down the hall from the Google founders." They sure have some nice hype, though.

See also my previous post, "Everyone is the next Google".

Monday, August 30, 2004

Fool.com on misinterpreting Google

David Meier's excellent response to an analyst who is criticizing Google's business strategy:
    [Analyst:] Google needs to diversify its revenue base

    Okay, you lost me with this one. If you say Google is well positioned strategically, why diversify? That sounds like a surefire way to reduce returns on invested capital and destroy value, not create it. It draws resources away from their strengths and prevents them from strengthening their capabilities. Google needs to find the best ways to invest the pile of cash from the IPO, and "diworsification" is not the right way to do it.

    [Analyst:] Become more like a portal

    Man overboard!

    If Google is positioned so well and has a competitive advantage, why would it want to compete the same way that Yahoo does? Competing like a portal is not Google's specialty, and in doing so, it would forgo its unique position.

    The analyst can say whatever he wants about Google's stock price. But the analyst should check his logic before saying Google has a competitive advantage and yet should change its spots to look like everyone else in the fray.
[Thanks, Garrett French, for pointing out the article]

Update: David has a follow-up to his first article.

Sunday, August 29, 2004

GMailFS and storage on the Google cluster

Many have mentioned an amusing hack that allows you to use your 1G of GMail storage as a mountable file system (for you non-geeks, essentially another drive visible on your desktop). Heh, heh. Very geeky, though totally impractical.

In speculating about Google's next moves, some have talked about Google's potential as a massive remote storage system. The idea is that users could store data remotely, replicated across the Google cluster, providing a service similar to iBackup, Xdrive, or Yahoo Briefcase. I'm not sure about this one. Google's cluster is heavy on CPU and memory and light on disk, perfect for operating over distributed search indexes, but probably not what you want for a simple remote storage system.

However, the Google cluster may be a good platform for remote services and applications, since this would take advantage of the available computing power.

Saturday, August 28, 2004

What's wrong with feed readers?

Another interesting rant, this one on the lack of innovation in feed readers. At one point, Ponyboy writes:
    Feed readers have at their disposal near infinite processing power, well-differentiated and -defined data and… do nothing with them. You can sort your feed items by date. Exciting!

    Where are the extrapolations, based on the data? Where is Bayesian filtering? Why isn’t there auto-correlation between like items? Why isn’t there sorting by link popularity? Or inter-linking between feeds? Why can’t I rank feeds or categories higher than others? Why can’t I rate items and let the cumulative ratings over time determine feed rankings? Why isn’t there some statistical combination of each of the above to put what I’m actually going to care about at the top of the list and the discussions about which syndication protocol is best at the bottom?
I couldn't agree more. And here is what you're looking for, Ponyboy.
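
To make the Bayesian filtering idea concrete, here is a minimal sketch (in Python, with invented data) of how a reader might rank feed items using naive Bayes trained on a user's thumbs-up/thumbs-down ratings. A real reader would need proper tokenization, stemming, and feature selection on top of this; the sketch just shows the shape of the idea.

    import math
    from collections import defaultdict

    # Naive Bayes ranking of feed items from thumbs-up/down ratings.
    # Illustrative only; class and variable names are invented.
    class FeedItemRanker:
        def __init__(self):
            self.words = {"up": defaultdict(int), "down": defaultdict(int)}
            self.labels = {"up": 0, "down": 0}

        def train(self, text, label):
            self.labels[label] += 1
            for word in text.lower().split():
                self.words[label][word] += 1

        def score(self, text):
            # Log-odds that the user likes this item, with Laplace smoothing.
            vocab = len(set(self.words["up"]) | set(self.words["down"])) + 1
            total = {c: sum(self.words[c].values()) for c in ("up", "down")}
            log_odds = math.log((self.labels["up"] + 1.0) / (self.labels["down"] + 1.0))
            for word in text.lower().split():
                p_up = (self.words["up"].get(word, 0) + 1.0) / (total["up"] + vocab)
                p_down = (self.words["down"].get(word, 0) + 1.0) / (total["down"] + vocab)
                log_odds += math.log(p_up / p_down)
            return log_odds

    ranker = FeedItemRanker()
    ranker.train("new personalized search engine launches", "up")
    ranker.train("which syndication protocol is best", "down")
    items = ["personalized news search", "rss vs atom protocol debate"]
    for item in sorted(items, key=ranker.score, reverse=True):
        print(item)

With a few hundred ratings instead of two, the same scoring puts the items you're actually going to care about at the top of the list and the protocol flamewars at the bottom.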

Friday, August 27, 2004

Feed search engines review

Joseph Scott writes a great review of current feed searches, including Technorati and Feedster. Joseph bemoans the current offerings and then ends by wondering when Google, Yahoo, and MSN will launch their own feed search engine.

I suspect it will be soon. I'm curious what Technorati and Feedster will do to compete when faced with an equal or superior product from the big guys. Blogory is a more unusual product, more of a discovery tool to easily find interesting weblog articles than a blog search engine, but Technorati and Feedster are pure search engines specialized to a particular type of news source. Can they survive the entry of a Google or Yahoo?

Rich Skrenta and Jeremy Zawodny also have some interesting comments on this topic.

Yahoo on Yahoo Local

Paul Levine, GM of Yahoo Local, talks about problems with coverage and data quality:
    Of course we've also gotten some constructive criticism too. Most of it's around holes in the data. Just like cell phone service...there can be dead zones. Comprehensiveness is one of the biggest challenges and we're focusing a lot of time and energy on it.

    Here's the thing about local content: some of the best stuff isn't on the Web, which makes the aggregation process pretty manual. Right now, we have more depth in some areas than others, largely because restaurants and hotels have taken to electronic publishing more quickly than, say, roofers and barbers.
The fact that improving quality on local search requires a "pretty manual" process is worrisome. Manual means slow and expensive. Automating data collection is hard because there are so many very small local businesses and advertisers, all changing rapidly. This is the primary reason I'm skeptical about local search.

Wednesday, August 25, 2004

MSN personalized search by December?

In a recent Search Engine Watch article, an interesting tidbit from MSN:
    MSN is pursuing what [Chris] Payne called "implicit personalization." He hinted that the personalization technology used in its recently launched NewsBot service will migrate over into the mainstream web search service the company plans to launch later this year.
I've been trying to figure out what this means. Chris Payne seems to be claiming that MSN will be rolling out some form of personalized search by the end of this year. Seems ambitious. From the reviews ([1] [2]) of MSN's new search engine, they've got a long way to go between now and December to even match Google and Yahoo's current offerings. Personalization is something you do after you've nailed the basics. I wouldn't expect to see a competitive personalized search product from Microsoft any time soon.

Chris also seems to be claiming that the personalization technology developed for MSN Newsbot can be transferred to MSN Search. This is also surprising. News is very different from web search. Different data and different problems require different approaches. Findory is developing our own personalized search product. The techniques being used are quite different from the personalization for Findory News.

The Google browser (and OS)?

Jason Kottke speculates on Google developing its own Mozilla-based browser:
    A Google Browser would give the Mozilla platform instant credibility and would be a big hit ... IT departments wanting to switch away from IE would have some formidable firepower when pitching to upper management..."Mozilla? What? Oh, it's Google? Go for it!"
Kottke also repeats parts of his argument for a Google OS.

I've seen this argument before and I have a hard time with it. The Google OS idea seems to be suggesting making the PC more of a thin client, possibly running Linux instead of Windows and mainly using web-based applications like GMail. Sometimes, people seem to go even further, suggesting that Google would partner with a PC vendor to sell a Google-branded inexpensive Linux box with OpenOffice, Mozilla, and a default UI that encouraged use of Google web-based applications like GMail.

I don't understand why this is a good idea for Google. To me, this seems like a huge and dangerous distraction from their core mission of making "the world's information universally accessible and useful."

I suppose the primary justification is to counter Microsoft, a preemptive attack on the Redmond giant. I think that's a battle they'd be sure to lose. Microsoft is an order of magnitude larger and well entrenched in the OS, browser, and office applications markets. Do you really think Google should move outside of its expertise to mount a full frontal assault on an entrenched competitor 10x its size?

Tuesday, August 24, 2004

More on the Google playboys

Playboy has published an interesting additional excerpt from the earlier interview with Google founders Larry and Sergey. The quotes from Larry Page on Google's management are worth reading.

Larry explains he'd rather have "too few than too many" managers because management layers can alienate people at the bottom and reduce productivity. But he acknowledges that people, "especially junior people", may not get the attention they need. The lack of management apparently works because many management tasks, such as employee reviews, project reports, and project planning, are automated and distributed. Google has "hundreds" of "small projects going on all the time", increasing innovation, reducing complexity, and reducing risk.

The problem with junior people not getting attention can be handled by setting up mentoring relationships. My understanding is that Google already does this, but perhaps not enough. Automating project planning is an excellent idea, allowing everyone to contribute ideas for new projects and promoting innovation. Interesting that Amazon also runs its projects small and quick.

[Thanks, Searchblog]

It's who you know

A depressing article on inefficiencies in the labor market by Daniel Gross:
    If labor markets were truly efficient, pay among workers with similar credentials would not vary much. But within groups of similarly situated workers, income inequality has risen in recent decades. What's more, [Stanford] Professor [Kenneth] Arrow said, "Observable characteristics like intelligence, education, experience and age explain only half of the difference."

    To get at the other 50 percent, he ... constructed a model that views wages as functions of competitive bidding among companies ... About half of all jobs are still found through personal contacts of some sort. And the more connections you have, the more you end up being paid. Why? Companies that make judgments based solely on a resume are flying blind, to a degree. By contrast, if a job applicant once worked with a current company employee, or attends the same church as a company worker, the company can glean hints about how that applicant will perform.
On the one hand, this confirms what we all know, that networking matters and matters a lot. On the other hand, this is disappointing, since this inefficiency will reduce productivity by failing to put the best person in the best position for their skills.

What's the root cause of this inefficiency? The problem is that information is incomplete, inaccurate, and costly. A resume provides only limited data about a person's skills and lacks credibility (without a background check). Formal interviews expand on the information available in a resume. Reference checks increase credibility further and provide information about reliability. But doing all of this over a broad pool of applicants can be very costly and, in the end, you will still have uncertainty about the skill set of the candidates.

Networking may be a shortcut, providing information about a candidate at low cost. This information could be much more complete than information available elsewhere. For example, working with someone for five years gives you excellent information about their skills, reliability, and trustworthiness. But the network relationship may provide only a false sense of security. For example, you have no relevant information about a friend of a friend, someone who went to the same college as you, a member of your golf club, or someone who goes to your church.

The key here is information. Where networking provides relevant information, it's a valuable resource for firms when recruiting. If the network relationship is distant and provides little information about work skills, biasing toward that person will miss more qualified candidates and hurt productivity.

Monday, August 23, 2004

Search engine personalization: The fallout

Jonathan Oxer writes about personalized search, focusing on privacy issues and the impact on search engine optimization (SEO) companies. As Jonathan points out, privacy issues can be overcome by keeping users anonymous, as Findory does. After agonizing about the impact on SEO firms, he ends up saying they should just do the right thing and "create a site that is genuinely useful to real people, with content they are interested in and good usability." Seems obvious, doesn't it? Instead of focusing on cheating the system, focus on making your site genuinely interesting and useful.

Searching for the next Google

Paul La Monica at CNN says "get ready for the Google wannabes." Noting that "there still is room for upstarts to make inroads," he looks at two areas for improvements to web search, more relevant results (primarily using personalized search) and exposing the "Deep Web" (crawling or aggregating databases currently missing from major search engines). Notably missing is any mention of local search or specialized search engines.

Technorati gets $6.5M in funding?

Om Malik reports that Technorati is about to close a $6.5M round of funding. There's some serious capital backing RSS-related companies nowadays.

Update: Good article, "RSS Attracts Really Serious Money", about all this in Wired.

Saturday, August 21, 2004

Yahoo's no-limit search queries

ResearchBuzz has an interesting post about Yahoo's lack of a limit on query strings for searches. Tara has one example with a search restricted to only state government sites (i.e. a search restricted to matches on one of the 50 domains wa.gov OR ca.gov OR ...).

By exploiting this, someone could build a bunch of interesting little tools. For example, a tool could restrict a search to only the blogs you read, the sites you have bookmarked, or even (using a toolbar) the sites you viewed in the last N days. Tools could do various forms of query term expansion (e.g. turning a search for "greg" into a search for "greg OR gregory"). At the extreme, if you're willing to seriously abuse the system, you might even be able to use Yahoo search to find related pages or clusters of web documents on the fly by searching for subsets of keywords or phrases in common. A rough sketch of one such tool follows.
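
Here is a hypothetical sketch (not a documented API) of how a tool might assemble one of these giant queries. It assumes Yahoo's web interface keeps accepting arbitrarily long queries via its "p" parameter.

    import urllib.parse

    def site_restricted_query(terms, sites):
        # Restrict a search to a list of sites, e.g. the feeds you read,
        # your bookmarks, or the 50 state government domains.
        site_clause = " OR ".join("site:%s" % s for s in sites)
        return "%s (%s)" % (" ".join(terms), site_clause)

    def search_url(query):
        # Assumes Yahoo's endpoint takes the query in a "p" parameter,
        # as its web search interface did at the time.
        return "http://search.yahoo.com/search?p=" + urllib.parse.quote(query)

    state_domains = ["wa.gov", "ca.gov", "or.gov"]  # ... and the other 47
    print(search_url(site_restricted_query(["property", "tax"], state_domains)))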

Of course, I wouldn't count on Yahoo maintaining this feature. These kinds of very long queries have to be expensive on their servers. Tara's state government search demo took 5-10 seconds to respond when I tried it. But, if you have the time, it might be fun to play with it while it lasts.

Being the biggest brain

I liked this line in the I, Cringely article I posted recently.
    Being the biggest brain in the room didn't always make you the best decision maker. The Google founders have yet to learn that lesson, but they will.
A common mistake, especially for new managers. It can be hard to let go. I've been guilty of it myself in the past. But it is a lesson the Google founders have to learn to be successful.

But why doesn't having the biggest brain entitle you to make all the decisions? Because it just doesn't matter. It's a fundamental misconception about the nature of intelligence that smart people can make decisions in the absence of information. You see this all the time in bad SciFi books or poorly written movies. The genius solves the crime without any evidence. The brainiac learns to speak French after hearing a few sentences of the language. But this isn't how intelligence works. Intelligence is an ability to process, understand, and synthesize information.

So, a big, juicy, chess-club brain doesn't help you if you lack information. In terms of management, this means that the people with the most information are usually in the best position to make a decision. And the people with the most information are usually the people closest to the problem, the ones on the ground, the ones trying to build something, not the people in upper management.

This is why pushing down authority results in better decision making. You make better decisions with better information. Pushing down authority allows people with the information to make the decisions.

Update: Looking at the trackbacks on this post, I don't want to get into a debate about whether the biggest brains are in management. I very much doubt it, but that's not the point. The point is that, if you are a manager, even if you think you're the smartest person in the room (regardless of whether this is true), you should still delegate decision making to the people with the most information. Your team will make better decisions that way.

Jeff Bezos, the explorer

BusinessWeek has an interview with Jeff Bezos. Jeff argues that small projects allow rapid iteration, exploration, and innovation:
    You need to set up and organize so that you can do as many experiments per unit of time as possible. If doing an experiment costs $100 million and takes three years, well, you're not going to be able to do very much innovation. If, on the other hand, you can organize in small, lightweight teams that have certain tools so they can do a lot of experiments per week or per month or whatever the right unit of time is, then you'll get a lot more invention from that.
Excellent. Small projects minimize risk and allow rapid iteration. This will increase innovation, especially if some of the projects are rapid prototypes of radical new ideas. I'd recommend pushing this even further by giving people a fraction of their time -- 20% is common -- to work on and explore anything they like.

Jeff also defends the idea of two-pizza teams (a product group with 4-8 people), claiming they enhance productivity:
    Q: Amazon uses small teams, which you call "two-pizza teams." How do you organize projects so that such teams can work?

    A: The idea of using small teams is a pretty well-accepted notion. What happens is, as the teams get bigger, they have to spend more time coordinating. This is sometimes very misunderstood, but if you want to have a good work environment where people can really build, you don't want them to have to spend a lot of time coordinating. To the degree that you can get people in a team small enough that they can be fed on two pizzas, you'll get a lot more productivity.
As I said before, it's not at all clear to me that two-pizza teams spend less time coordinating or are more productive. You want autonomous, independent teams. You want to minimize complex interdependencies between your teams. You want to maximize informal networks, learning, and knowledge sharing. But setting an arbitrary limit on team size doesn't automatically achieve these goals.

[Thanks, Innovation Weblog, for pointing out the article]

Friday, August 20, 2004

I, Cringely on Google

Interesting I, Cringely column on Google.

On Google's management:
    The problem is that the company has built layers and layers of folks who aren't allowed to make any decisions. That feels like delegation, but isn't ... Being the biggest brain in the room didn't always make you the best decision maker. The Google founders have yet to learn that lesson, but they will.
On Google's business strategy:
    Google needs to grow ... because they have become a target. Any billion dollar market involving IT is one that Microsoft wants to own.

    Google's strengths are its technology, its brand recognition, its current status as a stem cell of Internet business, and of course there's that $1.75 billion. Look for the company to accelerate its acquisition pace with a strong emphasis on acquiring smaller companies with interesting technologies.

    The key to making money in search is to get between people and what they are searching for, and that's where Google is on a collision course not only with Microsoft and Yahoo, but also with Amazon and eBay. Amazon is vulnerable to the Googlization of all the millions of retailers who aren't running Amazon storefronts just as eBay is vulnerable to the Googlization of auctions where localization, pricing, and seller fees can all be improved.

Software project failures

InfoWorld is running a series of articles on the "Six great myths of IT".

Myth #5, "Most software projects fail", hedges by saying that it "depends on how you define failure", but then provides hard data by citing a Standish Group survey that says that only 34% of software projects are "unqualified successes", 51% are "challenged projects" with "cost overruns, time overruns, and projects not delivered with the right functionality to support the business," and the remainder are unqualified disasters. Sounds pretty dismal.

The graph accompanying the article is particularly interesting. Very large projects appear to be nearly guaranteed to fail. Shouldn't be surprising to anyone in the industry. It's hard to overcome the complexity and lack of accountability in very large projects.

Perspectives on the Google IPO

Some interesting views on the success of the Google IPO, starting with Wired:
    Though the 18 percent jump boosted the paper worth of Google shareholders and insiders, it also raised questions about the effectiveness of the unorthodox auction, which was designed to gauge true demand and set a rational price that wouldn't be subject to big swings.
From The Economist:
    However inept Google’s handling of the IPO has been, there are many with an interest in seeing its innovative auction fail. The usual IPO process involves a group of investment banks building interest among their institutional clients (pension and mutual funds, insurance companies and the like) and setting a price that, they hope, will give those clients a decent return on their money. Google wanted to bypass this process for two reasons: so it could open its flotation from the start to a broad base of individual shareholders; and to avoid the “pop” (sudden price rise once trading starts) that often comes with bank-priced IPOs. But the strategy seems to have backfired: a policy of offering low or no sales commissions left brokers with little incentive to push the shares, and with the $15 price rise on the first day of trading the shares got a sharp pop anyway.
And from the San Jose Mercury News:
    Yet today, having bettered [a $20B market cap] by a considerable margin in its IPO, Google is beset by second-guessers -- including people with distinct axes to grind -- who are calling the exercise a failure.

    Google's management certainly did screw up in some ways. But overall, this IPO was a success. Don't let the naysayers, especially the Wall Street crowd, tell you otherwise.

    The most important element of the success was the auction, which Google forced down the throats of investment bankers. Under the cozy old system, the bankers would set the price of hot IPO stocks grossly low and make sure their friends and favored clients got shares at the offering price, which they could then unload for fat profits.

    This time, the company got the money, except for some relatively low fees to the banks. The stock had a modest ... first-day rise anyway, but the people profiting from that were those who bid in the public auction process, not insiders.

Thursday, August 19, 2004

Bug me again?

Xeni Jardin reports that BugMeNot, a site for bypassing registration requirements on news sites, appears to have been taken down. Unfortunate if true.

Update: Apparently, the hosting company for BugMeNot turned off the site, but the developers of the tool appear to be looking for a new host so they can get the service back up.

Update: And BugMeNot is back again. Glad to hear it.

Update: A Wired article on the whole saga.

Wednesday, August 18, 2004

Recall Toolbar

The Recall Toolbar is a clever variation on Seruku and other applications that maintain a searchable index of every web page you've ever seen. [via Searchblog]

Google: Now comes the hard part

An oddly negative NYT article on competing with Google:
    Most industry analysts and search engine experts say that Google arrived at its current position by offering more relevant search results than its rivals could, with a vast cache of Web pages and a stripped-down site that does not distract users. The question being debated by analysts and executives is whether the company will be able to use its technological resources to protect that position after it goes public.

    Google has been forced to place bets on certain technologies, and some industry executives argue that if the company has bet wrong, it is likely to find itself vulnerable.
True, but Google is in a fairly good position, with an incredible team and track record of innovation.

The article holds out personalized search as an opportunity to unseat Google, but then just as quickly snatches it away:
    There are some indications that the major search engines - Google, Yahoo and Microsoft - are all betting on personalization, the idea that collecting user information and tracking the Web sites a user visits can create more precisely tailored search results.

    But [Raul] Valdez-Perez [CEO of Vivisimo] argued that personalization efforts were likely to fail. "The problem is that everything you type will return an overload of information," he said. "Companies such as ours are betting on new ways to organize information."
While it's obviously in Raul's interests to argue against personalized search -- Vivisimo's (admittedly very cool) clustering technology doesn't involve personalization -- his argument is nonsense. The entire point of personalization is to provide focus in the overload of information. If search results are personalized to your interests, you'll see fewer irrelevant results, not more.

The article ends by claiming that Google has been unsuccessful outside of web search:
    So far the company has not had proven success with services like e-mail, catalogs, personalized searches, news and wireless search services.
C'mon, this is just silly. GMail has been an enormous phenomenon that has changed the face of free e-mail by causing all other free e-mail providers to increase their offerings to match. Google News (while not as compelling as one super-nifty-cool news site) gets an incredible amount of traffic after only about two years in service. Yes, Google has proven success with services outside of web search.

Gigablast is growing

Gigablast, an impressive one-person search engine run by former Infoseek engineer Matt Wells, is hiring and expanding. I was wondering when this promising upstart would start growing more aggressively.

More information on Gigablast on their About page.

[Thanks, ResourceShelf]

Tuesday, August 17, 2004

MSN Search tech preview has ended

The tech preview of the new MSN search is over. [via ResearchBuzz]

GOOG and EBAY

Andrew Goodman points out that Google's latest 10-Q shows their six month revenues (ending June 30) to be an impressive $1.35B with nearly 100% growth over the year ago six month period.

Taking this one step further, it's interesting to compare GOOG with EBAY. In Q2 2004, Google had $700M in revenue, eBay had $773M. Net income from operations is about the same for both at 25-30% of revenue. Google grew at 125% compared to Q2 2003, eBay grew at 52%. Google's market cap is projected to be about $30B, eBay's is $52B.

There's so much uncertainty about Google's future income that I have a hard time determining what a fair value is for the company. But I think this comparison with eBay is instructive.

Update: It's being widely reported that Google's IPO is undersubscribed. The initial share price will be set at only $85, yielding a market cap of $23B, less than half of eBay's.

Monday, August 16, 2004

Down on local search

Nate Elliot, a Jupiter Research analyst, argues that local search doesn't work very well and doesn't provide the growth prospects the search engines need. An excerpt on the growth prospects:
    Local search marketing will generate $502 million in 2004. That's just 19 percent of the total search marketing spend. Local search spending is actually growing more slowly than the rest of the industry. In 2009, local search will bring in $879 million, or only 16 percent of the total search marketing spend.

    Only 4 percent of searchers say local search availability attracts them to a search engine, ranking it among the least-demanded search engine features. When consumers do use local search, it's mostly to find known merchants, not discover new ones.
Nate Elliot also points out that local search is hard. In particular, it's hard to get good, accurate, up-to-date, and clean data on all these tiny local businesses. And it's hard to market to and manage hordes of tiny local advertisers.

[Thanks, Andy Beal, for pointing out this article]

More ads in RSS

Feedster joins the growing trend of including advertising in RSS feeds.

See also my earlier argument against advertising in RSS feeds and a discussion of full text RSS feeds with advertising versus providing only excerpts without advertising.

Friday, August 13, 2004

Craigslist and eBay sitting in a tree

Craigslist, a scrappy free classified advertising site that gets 1B page views per month and is in the Top 25 for web traffic, just sold a 25% stake in their 15-employee company to eBay.

Update: An interesting SF Gate interview with Craig Newmark, founder of Craigslist. One particularly cool excerpt on how their community-based moderation reduces the need for editors:
    A: If anyone sees an ad they feel is wrong, they can flag it for removal. If enough people agree with them, it's removed.

    Q: So how does that work?

    A: It's automated. I'm oversimplifying it a bit, but if you see an ad that's inappropriate, you flag it. If a few other people see that, they flag it, and it's removed.
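A minimal sketch of the mechanism Craig is describing might look like the following; the threshold and data structures are invented, and Craigslist's real system is presumably more nuanced (weighting flaggers, category-specific thresholds, and so on).

    # Community flagging: an ad is removed automatically once enough
    # distinct users flag it. The threshold is made up for illustration.
    FLAG_THRESHOLD = 5

    class Ad:
        def __init__(self, ad_id):
            self.ad_id = ad_id
            self.flaggers = set()  # distinct users who flagged this ad
            self.removed = False

        def flag(self, user_id):
            self.flaggers.add(user_id)
            if not self.removed and len(self.flaggers) >= FLAG_THRESHOLD:
                self.removed = True
                print("removed %s after %d flags" % (self.ad_id, len(self.flaggers)))

    ad = Ad("apt-rental-123")
    for user in ["u1", "u2", "u3", "u4", "u5"]:
        ad.flag(user)

The appeal is obvious: the community scales with the site, while an editorial staff would not.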
Update: Another good interview, this time in Wired.

The Google playboys

In the middle of the Google IPO quiet period, Larry and Sergey decided to sit down for a little chat with Playboy magazine. The interview is published near the end of the latest amendment to their S-1. A couple of excerpts, first on portals:
    PLAYBOY: With the addition of e-mail, Froogle -- your new shopping site -- and Google News, plus your search engine, will Google become a portal similar to Yahoo, AOL or MSN? Many Internet companies were founded as portals. It was assumed that the more services you provided, the longer people would stay on your website and the more revenue you could generate from advertising and pay services.

    PAGE: We built a business on the opposite message. We want you to come to Google and quickly find what you want. Then we're happy to send you to the other sites. In fact, that's the point. The portal strategy tries to own all of the information.

    PLAYBOY: Portals attempt to create what they call sticky content to keep a user as long as possible.

    PAGE: That's the problem. Most portals show their own content above content elsewhere on the web. We feel that's a conflict of interest, analogous to taking money for search results. Their search engine doesn't necessarily provide the best results; it provides the portal's results. Google conscientiously tries to stay away from that. We want to get you out of Google and to the right place as fast as possible. It's a very different model.
And on the GMail targeted advertising privacy flap:
    BRIN: Any web mail service will scan your e-mail. It scans it in order to show it to you; it scans it for spam. All I can say is that we are very up-front about it. That's an important principle of ours.

    PLAYBOY: But do you agree that it raises a privacy issue? If you scan for keywords that will trigger ads, you could easily scan for political content.

    BRIN: All we're doing is showing ads. It's automated. No one is looking, so I don't think it's a privacy issue. To me, if it's a choice between big, intrusive ads and our smaller ones, it's a pretty obvious choice. I've used Gmail for a while, and I like having the ads.
I think The Register has the best comment on the wisdom of doing this interview in the middle of the IPO quiet period.

Wednesday, August 11, 2004

Everyone is the next Google

John Battelle writes:
    Remember when I predicted that there would be a company claiming to be the new Google every month or so this year? I was wrong. It's more like every two weeks, and it's either the new Google, or the "Google of" (insert vertical here - travel, shopping, etc.).
The latest Google of the Week is apparently Kozoru, a startup that somehow got $3M to build a new natural language search engine. No product yet, and an article on Kozoru in the Johnson County Sun (via ResourceShelf) doesn't raise confidence that there ever will be:
    The fundamental paradox, as Flowers puts it, is that computers are really good with math but really bad with language. Flowers struggled with this dilemma through a stint working for Microsoft

    "Then I gave up, frankly. January 2003, I said the heck with it. Technology was no longer interesting to me, and the really hard problem that I wanted to solve is unsolvable," Flowers said.

    Flowers, who holds degrees in English and philosophy, spent the next few months writing books and screenplays ... In February of this year, Flowers came up with an answer and came back to the states to put it to work.

    After translating more than 980,000 words in the English language into codes of ones and zeroes, Kozoru's first objective will be to establish a knowledge base. To do this they will first turn to the most objective source for language information, the dictionary. After establishing that system, they will incorporate the most objective source for historical information, the encyclopedia.

    Flowers said he hopes to have the initial Kozoru prototype developed in the next nine to 12 months.
I hope this is just bad reporting. Given that they got $3M in investment, one would assume there is, at the very minimum, a strong founding team, existing prototypes, and novel technology at Kozoru.

Update: The next Google? They're everywhere.

Update: Two years later, Kozoru appears to be dead. They never launched their promised natural language search engine. The only thing that surprises me about this sad tale is that this company got $3M in funding and so much press attention. It was a remarkable display of hype over substance.

Tuesday, August 10, 2004

Yahoo search rumors

John Battelle throws out some teasers about Yahoo's new search products:
    They want to grok RSS, blogs, mobile, desktop search - and beyond ... I have to admit the things he spoke of and showed me, much of which unfortunately I can't report on yet, were pretty damn cool. Suffice to say Yahoo is continuing and strengthening its approach of driving search results based on intent of the user, and in particular discerning what the "task" is the user is attempting to do, then helping complete that task.
So, Yahoo has been blabbing about RSS, desktop search, and personalized search for a while. Nothing new there. But I was surprised to see the seasoned John Battelle say that he saw new products that hit his "pretty damn cool" level. Particularly in personalization -- anything that attempts the difficult task of determining the "intent of the user" -- I wouldn't have expected Yahoo to execute quickly. Curious to see what they have under wraps.

Google settles patent dispute

Dan Gillmor comments on Google's patent dispute settlement with Overture:
    It looks like there really was something to Overture's complaint (Overture was purchased by Yahoo) that Google was infringing. The size of this settlement tells you that Google was very afraid.

Monday, August 09, 2004

Voluntary vs. mandatory registration

Steve Outing continues the discussion on mandatory registration at newspaper websites and points to Findory News as a good example of voluntary registration.

Poynter E-media Tidbits is read by many people in the newspaper industry. I'm hopeful that discussions like this might convince many traditional newspapers to switch to voluntary registration. Mandatory registration is an annoying hurdle that repels readers, loses traffic, and reduces advertising revenues. Using techniques such as geolocation and personalization, newspapers can target advertising more effectively, drive traffic, and increase revenue without requiring registration.

Protecting fair use using Slashdot

Pocketpctools.com was threatened by overzealous lawyers from Ziff-Davis/eWeek after it posted an article that included a small excerpt from and a link to an article originally published in eWeek.

After a Slashdot post appeared that publicly aired the issue, the executive editor of eWeek retracted the threat and apologized. Cory Doctorow has an interesting comment on using public humiliation as a tool to protect fair use.

Saturday, August 07, 2004

Amazon.com's two-pizza teams

The "Book of Bezos" on page 4 of the Fast Company article I mentioned earlier has some good advice. But the idea of "two-pizza teams" (under "Communication is terrible") seems well-intentioned but misguided. The goal is to create autonomous and independent teams where communication costs are low, innovation is high, and execution unrestricted. In practice, what I suspect you'll get is teams that are too small to execute on their tasks, a nest of confusing, disruptive dependencies between two-pizza teams, and a competitive, insular atmosphere that reduces morale and productivity.

At the risk of criticizing someone who has built a $15B company from scratch, Jeff, I think you're wrong. It isn't all communication that is terrible, but just hierarchical communication that is terrible. What you don't want is for someone to have to ask a question to their manager, have it go up the hierarchy, down another hierarchy, and then finally get answered. You want informal networks where people freely contact the people they need to contact directly. Two-pizza teams don't give you informal networks; in fact, they probably hinder their development. Practices that promote informal networks include sharing team members across projects, encouraging (but not forcing) people to switch groups occasionally, socializing between groups, easy access to project documentation, and incentive structures that don't penalize helping other groups (by only rewarding individual or small team performance).

Fast Company on Jeff Bezos

Fast Company profiles Amazon.com CEO Jeff Bezos saying, "Amazon.com's founder is a study in contradictions." An excerpt:
    You have to get past his reputation within the industry as the ultimate quant jock, the by-the-numbers boss who supposedly wants to measure everything with spreadsheets, and base all decisions on data, not judgment or instinct ... What really distinguishes Bezos is his harrowing leaps of faith. His best decisions can't be backed up by studies or spreadsheets. He makes nervy gambles on ideas that are just too big and too audacious and too long-term to try out reliably in small-scale tests before charging in. He has introduced innovations that have measurably hurt Amazon's sales and profits, at least in the short run, but he's always driven by the belief that what's good for the customer will ultimately turn out to be in the company's enlightened self-interest. Bezos sees himself as a "change junkie," and the culture he has created is adept at coming up with innovations, but he's also surprisingly blatant and unabashed about copying ideas from competitors. And while Amazon has benefited from Bezos's forceful convictions, he's remarkably good at listening to outside critics and following their advice when they convince him that he's wrong.

    The seeming contradictions are what make Bezos so unusual -- and so formidable. He's the rare leader who obsesses over finding small improvements in efficiency at Amazon's huge warehouses right now while sustaining an entrepreneur's grand vision of changing the world over decades. Depending on the situation, he can be hyperrational or full of faith, left- or right-brained, short or long term.
From my experience, this is a reasonably accurate characterization of Jeff Bezos. He's sharp, visionary, full of zany ideas, and willing to bet the entire company on any one of them at any moment.

Friday, August 06, 2004

Amazon.com's blog aggregator?

Erik Benson points out that Amazon.com appears to be testing its own blog aggregator. Blogcast supposedly has "interesting blog posts from all over the web." Right now, as Erik said, it seems to be only on the electronics early adopter page and is pulling from only two blogs, Engadget and Gizmodo.

Weak initial offering and a weird move for Amazon. But is there more to come?

The mood shifts against Google

John Battelle summarizes the current anti-Google mood in the media. An excerpt:
    There's blood in the water. WashPost (reg required) rounds up the scathing headlines and commentary here. The Merc, NYT, WSJ, the Post, the FT....the list goes on and on. The journos are piling on. And Google is in a quiet period, so it can't defend itself. But there are plenty of folks willing to say nasty things, especially those on Wall St. who felt snubbed by Google's middle finger of an S-1.
This cycle seems familiar. The press hypes up something beyond all possible reason, then gleefully tears it back down.

Old vs. new media

Dan Gillmor points to a hilarious critique of blogging:
    Blogs are going to change the world. Example:

    OLD, TIRED MEDIA: "The Associated Press reported that Saddam Hussein was captured yesterday by American forces."

    NEW, EXCITING MEDIA: "omg like kos reported that he saw on chris's blog that john trackbacked to mike's journal where he read about bob's girlfriend's brother's cousin who was like watching Fox News (fair and balanced my ass! lol) and they said something about saddam i dunno current music: brittney cleary - im me current mood: corpulent"

    Notice the synergy of information and the ease by which information propagates throughout the blogosphere.
This apparently is an excerpt from a Slashdot discussion on Dan Gillmor's latest book, We the Media, but I wasn't able to find this comment or the author when I looked for it.

Thursday, August 05, 2004

Feedster: An engineer's personal hell

Scott Johnson (VP of Engineering at Feedster) has a desperate-sounding post about dealing with quality, reliability, and scaling issues at Feedster:
    I don't think anyone out there will deny that Feedster is, sadly, not delivering the best possible quality these days. Although our complexity, features and traffic have all grown dramatically -- our QA resources have not ... I no longer have the ability to reliably predict that "If I add feature X, feature Y will still function correctly". More likely it's like "feature Z will decide to take the weekend off, feature Q will go on a diet and feature X.12 will turn around, moon me and then give me the bird". Now Feedster is a highly interlinked system and the levels of isolation that perhaps should be there just aren't.
Feedster and Technorati are fantastic blog search engines, more targeted and useful than Google for finding weblogs and weblog articles. Lately, because of Technorati's own scaling issues, I've been tending to use Feedster a lot more. I haven't noticed quality or performance issues with Feedster, but it does sound like they're struggling.

My advice to Scott is basically the standard stuff. Add automated tests, constantly refactor the code and your architecture, and keep your code as simple and easy to maintain as possible. Unit and other automated tests allow you to quickly check for unexpected behavior after making a change. Use them in addition to, not instead of, manual QA. Constant refactoring means redesigning the code around a change any time you touch it, cleaning up the interfaces, reorganizing the components, and reducing dependencies. Generally plan on spending half your time refactoring any time you go in. Beyond avoiding balls of mud through constant refactoring, the other part of keeping your code easy to maintain is to avoid undocumented complexities like lengthy regular expressions, weird special cases, or cryptic algorithms. If it's not immediately obvious what the code does and why it's there, stick a comment by it that explains both.
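
To make the testing advice concrete, here is a minimal sketch of the kind of automated check I mean, using Python's standard unittest framework. The function under test is hypothetical; the point is that adding feature X immediately reveals whether it broke feature Y.

    import unittest

    def extract_feed_title(xml_fragment):
        # Hypothetical function under test: pull the title out of a feed entry.
        start = xml_fragment.find("<title>")
        end = xml_fragment.find("</title>")
        if start == -1 or end == -1:
            return None
        return xml_fragment[start + len("<title>"):end].strip()

    class ExtractFeedTitleTest(unittest.TestCase):
        # Tests like these run in seconds after every change, catching
        # "feature Y stopped working" before manual QA ever sees it.
        def test_simple_title(self):
            self.assertEqual(extract_feed_title("<title>Feedster</title>"), "Feedster")

        def test_missing_title_returns_none(self):
            self.assertIsNone(extract_feed_title("<item>no title</item>"))

    if __name__ == "__main__":
        unittest.main()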

Some might recommend taking time for a big rearchitecture project. "Stop doing anything, freeze the code, and rewrite everything," they'll say. I'd recommend against that approach. I've never seen anyone successfully deliver a rearchitecture project that had no other purpose but "cleaning up the code." You can't optimize without a goal to optimize toward. Rearchitecture should always be done as part of a larger goal. Want better performance? Rearchitect the code while seeking better performance. Adding feature X.12? Rearchitect the code to fit the new feature in gracefully.

Wednesday, August 04, 2004

Downside of controlled chaos

No management and an atmosphere of controlled chaos isn't always a good thing. Like when it comes to handling your equity compensation. Google apparently forgot to register 23M shares it issued to various people. Oopsie.

Geolocation

I'm a little late on this one, but Wired had an interesting article on geolocation back in mid-July. Geolocation maps the IP address of your computer to a physical location. So, using the example from the article, if I do a search for "dentist" on Google, it will show ads for dentists in Seattle.

Doing a simple version of geolocation is pretty straightforward, but it gets more challenging once the low-hanging fruit are gone. For example, with some domains (e.g. stanford.edu), you can make a pretty accurate guess at the location of a computer with an IP address in a block owned by that domain. For others (e.g. aol.com), it's a little trickier, especially as proxies get involved. Here's a quick summary of some of the issues.
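
The simple version looks something like this sketch: a sorted table of IP ranges and a binary search. The table entries below are illustrative, not authoritative data; real geolocation services maintain millions of constantly updated ranges plus heuristics for proxies like AOL's.

    import bisect
    import ipaddress

    # Toy IP-to-location table: (range start, range end, location).
    RANGES = sorted([
        (int(ipaddress.ip_address("171.64.0.0")),
         int(ipaddress.ip_address("171.67.255.255")), "Stanford, CA"),
        (int(ipaddress.ip_address("216.239.32.0")),
         int(ipaddress.ip_address("216.239.63.255")), "Mountain View, CA"),
    ])
    STARTS = [r[0] for r in RANGES]

    def locate(ip):
        n = int(ipaddress.ip_address(ip))
        i = bisect.bisect_right(STARTS, n) - 1  # binary search on range starts
        if i >= 0 and RANGES[i][0] <= n <= RANGES[i][1]:
            return RANGES[i][2]
        return None  # no range covers this address

    print(locate("171.64.10.5"))  # -> Stanford, CA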

The wide availability of geolocation services is yet another reason why it doesn't make sense ([1] [2]) for newspapers to require registration to access their websites. Why bother asking people for their location when you can compute it yourself (and probably with greater accuracy, given that people often lie on registration forms)?

CNet's Search.com updates

John Battelle mentions Search.com's latest update and Gary Price posts a brief review.

I have to agree with Gary. Picking the colors for your page and checking off which data sources to use do not equal personalization.

CNet's press release claims "with its new personalization features, Search.com is raising the bar even higher" and that "users easily [can] adjust virtually every aspect of the site to create a completely customized experience." I can understand why CNet is trying to jump on the personalization bandwagon, but this looks like a lot of marketing and very little technology.

Tuesday, August 03, 2004

Yahoo local search

Yahoo's local search beta launched today. Chris Sherman has a detailed review.

One interesting feature that Chris underemphasized is user ratings. From Yahoo's press release, "Users can now rate and review almost every business in the country - from dry cleaners and dentists to florists and fine dining establishments."

Very cool and very useful. Not only does it provide a differentiator with business directories like the Yellow Pages, but also Yahoo is now offering a service much like what guidebooks provide. Citysearch, Zagat, Travelocity, Fodor's, and many others should be concerned about this new development.

Monday, August 02, 2004

ChoiceStream

A couple interesting if somewhat self-serving articles from ChoiceStream recently. ChoiceStream, among other things, is the company behind MyBestBets, a fun recommendation service for TV programs.

The first article claims that "more than 80 percent of online consumers are interested in personalization" and that "56 percent of respondents [are] willing to provide demographic data in exchange for personalized content." Good news for personalization solutions that rely on filling out lengthy profiles, but the conclusions are directly from a ChoiceStream survey, perhaps not the most objective source.

The second article is by the CEO of ChoiceStream, Steve Johnson. Steve claims we are experiencing a "personalization renaissance" where companies look to personalization to deal with an "unfiltered flood of information" and focus on satisfying existing customers in a more mature online marketplace. The article makes some interesting points, but ends with a bizarre claim that "Attributized Bayesian Choice Modeling" is the obvious solution to every personalization issue. Coincidentally, ChoiceStream's product just happens to use this technique. While I don't have anything against this particular technique -- it's a content-based approach that might work well for some problems -- what I really object to is the (biased) claim that it's the obvious solution to all your personalization needs.

Personalization is hard, folks. Different approaches have different characteristics on different types of data. Some have higher predictive accuracy in some cases, some have better scalability. Not only do you have to pick the right approach for your problem and your data, but also the approach that works best may change over time as your data changes (especially as your user base grows). There aren't easy solutions. But, if you do put in the effort, personalization can be a powerful and sustainable advantage over your competition; no product your competitors buy off the shelf will replicate it.
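
As one concrete example of an alternative approach, here is a bare-bones sketch of item-to-item collaborative filtering: cosine similarity on binary viewing vectors. It needs no demographic profile at all, and whether it beats a content-based model like ChoiceStream's depends entirely on your data. All the data below is invented.

    from collections import defaultdict
    from itertools import combinations

    # Item-to-item similarity from user histories: cosine similarity on
    # binary vectors, i.e. co-occurrence / sqrt(count_a * count_b).
    histories = {
        "alice": {"show_a", "show_b", "show_c"},
        "bob":   {"show_b", "show_c"},
        "carol": {"show_a", "show_c", "show_d"},
    }

    pair_counts = defaultdict(int)
    item_counts = defaultdict(int)
    for items in histories.values():
        for item in items:
            item_counts[item] += 1
        for a, b in combinations(sorted(items), 2):
            pair_counts[(a, b)] += 1

    def similarity(a, b):
        pair = (min(a, b), max(a, b))
        return pair_counts[pair] / (item_counts[a] * item_counts[b]) ** 0.5

    print(similarity("show_b", "show_c"))  # often watched together -> ~0.82

This approach scales with the number of items rather than the number of users, but it says nothing useful about a brand-new item; a content-based model has the opposite problem. That's exactly the kind of tradeoff that makes one-size-fits-all claims suspect.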

Microsoft's personalized search

Giving up on the more subtle hints ([1] [2]) of earlier comments, Bill Gates says outright that Microsoft will be launching personalized search:
    Microsoft said on Monday it was aiming to make search services customized for users so that results would be based on individual preferences and interests. "We're going to make search extremely personal," Microsoft chairman Bill Gates told an audience of computer science and technology researchers.

    Personalized search promises to deliver search results that are more relevant by taking into account an individual's interests based on previous search queries and other information.
[Thanks, Gary Price, for pointing out the article.]
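
Nobody outside Microsoft knows what they will actually ship, but the simplest form of personalization from search history can be sketched like this (entirely invented, for illustration): re-rank results by boosting pages that match terms from the user's past queries.

    # Re-rank results by boosting titles that overlap the user's past
    # queries. Invented sketch; not how MSN (or anyone) actually does it.
    def rerank(results, past_queries, boost=0.5):
        profile = {word for q in past_queries for word in q.lower().split()}
        def personalized_score(result):
            title_words = set(result["title"].lower().split())
            return result["score"] + boost * len(title_words & profile)
        return sorted(results, key=personalized_score, reverse=True)

    results = [
        {"title": "Jaguar the animal", "score": 1.0},
        {"title": "Jaguar car dealers", "score": 0.9},
    ]
    # A user who has been searching about cars sees the dealer page first.
    print(rerank(results, ["used car prices", "car insurance"]))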

Topix.net and a new NewsRank

Rich Skrenta announces that Topix.net has launched "a next-gen version of our NewsRank(tm) story technology [which] powers the relevance, accuracy and magnitude of the stories categorized on Topix.net."

In many ways, Topix.net and Findory News are trying to solve the same problem, surfacing the news you need from thousands of different news sources. But they use very different approaches. Topix.net provides fine-grained categorization of news articles (e.g. a news category for paleontology, not just for science), so users can dive in to specific categories to get what they want. Findory uses personalization to learn your interests and automatically surface the news you need. Different, perhaps even complementary, approaches to the same problem.

Topix does have some impressive technology. It's not at all easy to categorize and prioritize articles using content analysis, but Topix.net does a pretty good job. How do they do it?
    Not with human editing, source tagging, or keyword scanning. The Topix.net NewsRank engine is reading each story individually, determining locality and subject information based on the content of the article.

    Categorizing sources in order to produce topic aggregations doesn't work. Susan Mernit writes a great blog about online media, but she also writes about food and other personal topics. Blindly adding her entries to a food or media industry aggregation would result in inappropriate posts showing up.

    Source-based categorization doesn't work for local, either. The San Francisco Chronicle runs stories that aren't about San Francisco. Conversely, there are many stories about events in SF that show up in news sources based outside of San Francisco. These stories would be missed with source-based tagging.

    Keyword-driven filters are also a poor solution. Pulling every story out of the news stream with "San Francisco" in it will not make a good SF rollup, but instead will yield a random jumble of posts, most of which merely mention "San Francisco", but overall have nothing to do with it:

    ... on a business trip to San Francisco, ...
    ... an unrestricted free agent from San Francisco, ...
    ... was bound from Alaska to San Francisco in the winter of 1860 ...
    ... moved, with her family to San Francisco in 1960, ...

    The situation is even worse if the keyword is ambiguous ("Kerry", "Bush", "Springfield").

    Our solution is to disambiguate references to people, places and subjects, and match them against our Knowledge Base of 150,000 topics. The result lets our algorithmic story editing technology leverage a much finer-grained idea of what a story is about than simply using the big 7 news categories (US, World, Business, Sci/Tech, Sports, Entertainment, Health.)
Clever, and it seems to work quite well. Nice work.
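
Topix.net hasn't published the details, but the flavor of knowledge-base disambiguation they describe can be sketched: score each candidate topic for an ambiguous mention by how many of its known context words appear in the article. The tiny knowledge base below is invented; theirs has 150,000 topics.

    # Disambiguate a mention by overlap between the article's words and
    # context words stored per candidate topic. Toy knowledge base.
    KNOWLEDGE_BASE = {
        "Springfield, IL": {"illinois", "capitol", "lincoln", "sangamon"},
        "Springfield, MA": {"massachusetts", "basketball", "hampden"},
        "The Simpsons":    {"homer", "bart", "cartoon", "fox"},
    }

    def disambiguate(article_words):
        words = {w.lower().strip(".,") for w in article_words}
        best_topic, best_overlap = None, 0
        for topic, context in KNOWLEDGE_BASE.items():
            overlap = len(words & context)
            if overlap > best_overlap:
                best_topic, best_overlap = topic, overlap
        return best_topic

    article = "The Illinois legislature met at the capitol in Springfield."
    print(disambiguate(article.split()))  # -> Springfield, IL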

Sunday, August 01, 2004

MSN Newsbot biased toward MSNBC

The Washington Post reports that Newsbot biases its front page toward MSNBC stories:
    Another key difference between the Microsoft and Google services is that Google's story-selection formula doesn't favor any particular news source. MSN Newsbot, by contrast, gives favorable placement to articles from Microsoft's own MSNBC.com news site.
Like paid placement in search results, this kind of bias is likely to reduce quality. Not a good move.