Wednesday, November 30, 2005
Glenn Fleishman was on NPR's The Works yesterday talking about Web 2.0. Glenn defined Web 2.0 as mashups, accessing and combining web APIs. Mashups and nothing but mashups.
When asked about business models for these mashups, Glenn talked about how they could start small with low costs, but said nothing about how they might generate revenue or how far they can grow.
Similarly, I talked to Rael Dornfest a couple weeks ago. He also made it clear that he thought mashups are the next big thing.
When I asked Rael about some basic problems with mashups as a business (no service guarantees, limits on queries to the APIs, limits on commercial use of the APIs, numbingly slow performance, no barriers to entry), he had no answer.
I keep hearing people talk as if companies are creating web services because they just dream of setting all their data free. Sorry, folks, that isn't the reason.
Companies offer web services to get free ideas, exploit free R&D, and discover promising talent. That's why the APIs are crippled with restrictions like no more than N hits a day, no commercial use, and no uptime or quality guarantees. They offer the APIs so people can build clever toys, the best of which the company will grab -- thank you very much -- and develop further on their own.
There is no business model for mashups. If Web 2.0 really is just mashups, this is going to be one short revolution.
See also my previous post, "Can Web 2.0 mashups be startups?"
Update: Richard MacManus has some good thoughts on this in his post, "Mashups: who's really in control?"
Tuesday, November 29, 2005
Microsoft Fremont vs. Google Base
Ben Charny at eWeek reports that Microsoft will soon be offering a competitor to Craigslist and Google Base:
Microsoft Corp. said it is readying an online marketplace, code-named Fremont, which is apparently in response to a similar feature that rival Google Inc. introduced a few weeks ago.
Fremont is a free service in which people contribute listings, whether it's about a couch for sale or someone looking for a commuting partner.
See also my previous post, "Google Base and getting the crap out".
[Found on Findory]
Update: The news about Microsoft Fremont appears to have been first reported by Michael Arrington at TechCrunch. Congrats on the scoop, Mike.
Update: Todd Bishop at the Seattle PI gives us a longer article about Microsoft Fremont. It sounds like Microsoft intends to do some interesting social network stuff with it, allowing selling to your network of friends.
Update: A little fun trivia, Todd Bishop got confirmation that Microsoft Fremont is named after the Fremont community in Seattle, the "center of the universe."
Update: Charlene Li saw a demo of Microsoft Fremont and describes in some detail why she thinks "Microsoft's classifieds service will be better than Google Base." Most of her criticism centers around poor usability of Google Base for mainstream users.
However, at this point, Google Base is more of a database than an end-user product, literally a base on which to build. It is probably premature to evaluate it directly against Craigslist or other classified sites in its current form. We'll likely soon see new products launched with a more mainstream look-and-feel and better usability layered on top of Google Base.
Update: Danny Sullivan makes a similar point, that Microsoft Fremont/Craigslist should be compared to a later Google Classifieds product, not to Google Base.
Update: Four months later, Microsoft renamed Fremont to Windows Live Expo and launched it. It has no payment mechanism, requires MS Passport to use, and generally has the feel of free classifieds with some social networking goo thrown on top.
It is interesting to note that Google went the other direction with this, integrating a payment mechanism into Google Base. This makes Windows Live Expo (aka Fremont) look more like a competitor to Craigslist and Google Base more like a competitor to eBay, Amazon, and Yahoo Stores.
Is personalized search a dead end?
Raul Valdes-Perez, CEO of the excellent clustering search engine Vivisimo, wrote a one-page paper called "Why Search Personalization is a Dead End" (PDF).
He lists five reasons why personalized search is doomed:
- People are not static; they have many fleeting and seasonal interests.
- The surfing data used for personalizing search is weak [compared to purchase data].
- The user's decision to visit the page is based on the title and brief excerpt (snippet) that are shown in the search results, not the whole page.
- Home computers are often shared among family members.
- Queries tend to be short.
The criticisms might be summarized as a claim that clickstream data is too dynamic, noisy, and sparse to support personalization.
There are two problems with this argument. First, Amazon.com's personalization works just fine from similar clickstream data. Sure, it's true that the data is dynamic, noisy, and sparse, but Amazon deals with that by using algorithms that adapt rapidly, are tolerant to errors, and work from very little data.
Second, personalization doesn't have to be perfect. It just has to be better than the alternative. In Amazon's case, the alternative to a personalized front page is a generic front page with a top sellers list or a bunch of marketing goo. It's easy to be more useful to shoppers than that. Mistakes are just fine. The guesses just need to be right more often than the alternative.
Personalized search is no different. The algorithms need to adapt rapidly, be tolerant to noise, and work from little data. Mistakes are fine. Personalized search just needs to be more useful than unpersonalized search.
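To make that concrete, here is a minimal sketch of the kind of algorithm I have in mind: a decayed interest profile built from a sparse clickstream that gently re-ranks results. This is my own toy illustration, not Amazon's or any search engine's actual code; the topic labels, decay rate, and weights are invented.

    from collections import defaultdict

    DECAY = 0.8           # older clicks fade quickly, so the profile adapts rapidly
    PROFILE_WEIGHT = 0.2  # small nudge; base relevance still dominates, so noise is tolerable

    class ClickstreamProfile:
        def __init__(self):
            self.topic_weight = defaultdict(float)

        def record_click(self, topics):
            # Decay old interests, then credit the topics of the clicked item.
            for topic in list(self.topic_weight):
                self.topic_weight[topic] *= DECAY
            for topic in topics:
                self.topic_weight[topic] += 1.0

        def rerank(self, results):
            # results: list of (title, base_score, topics); boost matches to the profile.
            def score(item):
                _, base, topics = item
                return base + PROFILE_WEIGHT * sum(self.topic_weight[t] for t in topics)
            return sorted(results, key=score, reverse=True)

    profile = ClickstreamProfile()
    profile.record_click(["fishing"])  # a single click is enough to start adapting
    results = [("Bass guitar lessons", 1.0, ["music"]),
               ("Bass fishing lures", 0.9, ["fishing"])]
    print(profile.rerank(results))     # the fishing result now edges ahead

The point of the sketch is only that sparse, noisy data is workable if the profile decays quickly and is applied as a small boost rather than a hard filter.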
See also my earlier posts, "Perfect Search and the clickstream" and "Personalized search vs. clustering".
[Valdes-Perez paper via John Battelle]
Monday, November 28, 2005
Is personalized advertising evil?
In a long post about unethical behavior at software firms, Alex Bosworth has an interesting rant against personalized advertising:
Until privacy advocates raised a ruckus, Google engineers had big plans for mining a user's email trove to offer them precisely targeted advertisements ...
A real life equivalent to Google's personalized advertising dream would be a store where an anonymous greeter already knew not only your name but everything about you including the contents of your most intimate communications, and attempted to direct you to things they thought you might buy.
This is not only extremely off-putting and an abuse of trust to many, it's also potentially disruptive to the decisions a person might normally take.
On the one hand, personalized advertising could open the door to new forms of invasive annoyances. I am reminded of the scene in Minority Report where sales avatars on screens call out to you by name as you walk past. Or the disturbing vision of the future in the Golden Age series where flying banner advertisements swarm through the air, something so obnoxious that everyone uses cybernetic implants that alter their perception of the world and filter out the ads.
On the other hand, personalization may offer the key to relevance. We are all bombarded by advertising in our daily lives. Junk mail, ads in magazines, it is all ineffective mass market noise pummeling us with things we don't want. But, companies with a helpful product need some way for interested people to find out about it. What I would like is a way for the ads to be limited to only interested people. Don't waste my time, tell me about something that actually might interest me. I'd like the advertising to be useful.
I think that being obnoxious always fails in the long run. Spam e-mail is now filtered. Pop up ads are blocked. Obnoxious personalized advertising would be no different. People hate obnoxious.
But being relevant and useful always pays off. Personalization can help people find and discover relevant information they wouldn't have found on their own. Personalized advertising can be useful. And people like useful.
Saturday, November 26, 2005
Amazon adds product wikis, tagging
Amazon has added a "ProductWiki" to some product pages, following a move a couple weeks ago to test tagging of products.
For as long as I can remember (at least 1996), Amazon has allowed customers to review and comment on products on their site. These ProductWiki and tagging experiments seem to be attempting to build off the success of customer reviews by gathering additional user contributed content.
I think these experiments are pretty interesting, but I'm not sure these particular efforts are likely to bear fruit. Wikipedia fights off spam and crap by having a couple thousand dedicated volunteer editors who track recent changes closely and revert bad content changes quickly. Amazon will not have that for their ProductWikis. Tagging in Flickr works well because metadata isn't available for photos unless users provide it explicitly. Products already have metadata, including keywords extracted from the descriptions, so the value of tagging isn't as obvious.
Regardless, it's great to see this kind of experimentation from Amazon. From customer reviews to user pages to friends pages, Amazon was very early with community and social networking features. There's much opportunity here.
[Found on Findory]
Wednesday, November 23, 2005
Web 2.0 bingo
Steven Cohen points to Web2.0bingo.com, a site that lets you quickly print random bingo boards with Web 2.0 buzzwords.
Looking at this with Findory... Bingo! 13 out of 24. We're highly buzzword compliant, yippie.
Tuesday, November 22, 2005
Perfect Search and the clickstream
John Battelle generously sent me a signed copy of his new book, "The Search". Thanks, John!
The book is a fun read, a great overview of the history of search companies with some interesting thoughts on the future of search.
The last chapter, Perfect Search, looks forward to the next-generation of search engines. It has several pages on clickstream personalization. An excerpt:
Perfect search ... means nothing if the engine does not understand you -- your likes and dislikes, your tendencies and tics.
A solution to this problem lies in the domain of your clickstream. Through the actions we take in the digital world, we leave traces of our intent, and the more those traces become trails, the more strongly an engine might infer our intent given any particular query ... I expect those trails ... to turn into relevance gold ....
Clickstreams can provide a level of intelligence about how people use the Web that will be on an order of magnitude more nuanced than mere links, which formed the basis for Google's PageRank revolution ....
Clickstreams are the seeds that will grow into our culture's own memex -- a new ecology of potential knowledge -- and search will be the spade that turns the Internet's soil.
Engines that leverage clickstreams will make link analysis-based search (nearly all of commercial search today) look like something out of the Precambrian era ... We have yet to aggregate the critical mass of clickstreams upon which a next-generation engine might be built ... [but] we're already pouring its foundations.
Current search engines treat each search independently, ignoring the valuable information about what you just did, about what you just found or failed to find. Paying attention to that clickstream will allow search to become more relevant, more useful, and more helpful, all with no effort from searchers.
Mixing local into Google's Froogle
It's being widely reported that Google's metashopping site Froogle now lets shoppers see what items are available at physical stores nearby.
Coverage seems quite weak at the moment. Here's a search for "earrings 94301" that returns almost no results in the Palo Alto area. I would hope that would improve quickly since the service is nearly useless as is.
However, looking forward, this brings Google much closer to the disintermediation threat that has struck fear into retailing giants. Improve the coverage, hook this up to Google SMS, and you've got a service that might "be able to tell Wal-Mart shoppers if better bargains are available nearby."
Monday, November 21, 2005
Food, perks, and competing with Google
Marc Ramirez at the Seattle Times has a fun article on the food at prominent Seattle companies.
At a couple points, Marc compares the offerings to the "Google Nirvana," saying that "Google single-handedly has redefined the meaning of corporate cafeterias."
I am quoted in the article as saying that Google is "in a class of its own." The article also quotes from this blog post where I said that "investing in your people pays for itself."
It is amazing to me that Microsoft cut its perks at a time when it is trying so hard to compete with Google. That is not helpful if you want to retain your best people.
Sunday, November 20, 2005
How Google tamed advertising
Randall Stross at the New York Times has an interesting column today, "How Google Tamed Ads".
An excerpt:
Five years ago, Web advertisers were engaged in an ever-escalating competition to grab our attention. Monkeys that asked to be punched, pop-ups that spawned still more pop-ups, strobe effects that imparted temporary blindness - these were legal forms of assault.
The most brazen advertiser of all, hands down, was X10, a little company hawking security cameras, whose ubiquitous "pop under" ads were the nasty surprise discovered only when you closed a browser window in preparation for doing something else.
Today, Web advertisers by and large have put down their weapons and sworn off violence. They use indoor voices now. This is a remarkable change.
Thank you, Google.
Without intending to do so, the company set in motion multilateral disarmament by telling its first advertisers in 2000: text only, please. No banner ads, no images, no animation. Just simple words.
Exactly right. Be relevant, not annoying.
If the ads are well targeted and interesting, people will stop ignoring them. When I search for products on Google, I often skim over the ads as well as the web search results. The Google search ads are useful, often a link to exactly what I need, as relevant as much of the other content on the page.
But, most ads still are targeted only using the content on the page. This works okay for search, when I'm on a focused mission, but less well for content sites like news or weblogs. We can do better.
The next step is to personalize the advertising. Sites need to learn about me. Pay attention to what I'm doing and what I like. They shouldn't waste my time with things they know will be irrelevant to me. Content sites should show me ads for things I might actually want.
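As a rough sketch of what that could look like mechanically, consider scoring candidate ads by blending how well they match the page with how well they match the reader's history. This is a hypothetical illustration under my own assumptions; the ads, topics, and weights are made up, and no real ad system is this simple.

    def score_ad(ad_topics, page_topics, reader_topics, history_weight=0.5):
        # Blend match-to-page-content with match-to-reader-history.
        content_match = len(ad_topics & page_topics)
        history_match = len(ad_topics & reader_topics)
        return content_match + history_weight * history_match

    def pick_ad(ads, page_topics, reader_topics):
        return max(ads, key=lambda ad: score_ad(ad["topics"], page_topics, reader_topics))

    ads = [{"name": "generic credit card offer", "topics": {"finance"}},
           {"name": "fly fishing gear sale", "topics": {"fishing", "outdoors"}}]
    page_topics = {"news", "outdoors"}
    reader_topics = {"fishing", "hiking"}  # inferred from articles this reader clicked on
    print(pick_ad(ads, page_topics, reader_topics)["name"])  # fly fishing gear sale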
See also my previous posts, "Make advertising useful" and "The content should find you".
Friday, November 18, 2005
A data center in a trailer
The I, Cringely article this week, "Google-Mart", has an interesting rumor about Google's prototype of a data center stuffed into a truck trailer:
In one of Google's underground parking garages in Mountain View ... in a secret area off-limits even to regular GoogleFolk, is a shipping container. But it isn't just any shipping container. This shipping container is a prototype data center.
Google hired a pair of very bright industrial designers to figure out how to cram the greatest number of CPUs, the most storage, memory and power support into a 20- or 40-foot box. We're talking about 5000 Opteron processors and 3.5 petabytes of disk storage that can be dropped-off overnight by a tractor-trailer rig.
The idea is to plant one of these puppies anywhere Google owns access to fiber, basically turning the entire Internet into a giant processing and storage grid.
The article goes on to claim that Google will take over the internet, crush Yahoo and Microsoft, enslave all of humanity to the new Google hive mind, blah, blah, blah. Not so sure about that part.
But this trailer rumor, if true, is an interesting step in mass production of data centers. Google already just wheels in racks, 40-80 computers in each, plugs 'em in, and sets up their data centers very quickly. Now, maybe they'll just drop off a couple trailers, plug them in, and -- poof! -- instant data center.
[via Don Dodge]
Update: In his latest column, Cringely extends this vision with Google Cubes in every house, on every TV, on every phone, everywhere. Interesting mind trip, though a little too reminiscent of the Borg for my taste.
Security hole in Google Sitemaps
Danny Sullivan reports that David Naylor and others discovered a security hole in rights to access statistics through Google Sitemaps.
To prove you own the website you want to access, Google Sitemaps asks you to drop a file with a long code in the filename at the root level of your website (e.g. 1029392729387.html). It then checks to make sure this file exists and, if it does, it gives you access.
The problem is that it only checks if the file exists. As David and Danny point out, many websites -- including huge ones like eBay, AOL, and Google's own Orkut -- display a nice error message to users on invalid pages that say something like, "Hey, this page doesn't exist!" Google Sitemaps sees that error page is returned with a 200 code, not a 404 code, and concludes, "Huh, look, the page exists!"
Oopsie. Because of this error, Danny and David managed to access the Google Sitemap stats for eBay, AOL, and other websites.
On the one hand, Google is right that websites really should return an HTTP "not found" code (404) for pages that are not found. On the other hand, many, many sites don't.
This reminds me of the problems with caching and prefetching when Google Web Accelerator launched. Google assumed all websites strictly obeyed the HTTP spec, but they don't, so the tool didn't work properly. You need to work with reality, Google, not the way things should be.
This really is pretty lame of Google. Many other sites have to deal with this same kind of "claim your site" problem. They often do it by requiring you to put a code in a comment in one of your webpages, not by creating a new file, but there's any number of other ways to do it that work just dandy and don't open huge security holes.
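For example, a verification check could require the claim file's body to contain the token, rather than trusting the status code alone. Here is a minimal sketch of that idea; it is not Google's implementation, and the site name and token below are made up.

    import urllib.request
    import urllib.error

    def owns_site(site, token):
        # The claim file must exist AND start with the token, so a friendly
        # "page not found" page served with a 200 code does not pass.
        url = "http://%s/%s.html" % (site, token)
        try:
            with urllib.request.urlopen(url, timeout=10) as response:
                if response.getcode() != 200:
                    return False
                body = response.read(65536).decode("utf-8", errors="replace")
        except urllib.error.URLError:
            return False
        return body.strip().startswith(token)

    print(owns_site("example.com", "google1029392729387"))

The design point is simply that existence of a URL proves nothing on the modern web; the verifier has to check for content only the site owner could have put there.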
C'mon, Google, you're supposed to be better than this.
Update: About 8 hours later, Stefanie Olsen says that Google has fixed the issue. Quick response, excellent.
Update: Another security problem at Google, a cross-site scripting vulnerability in Google Base. Apparently, the problem already has been fixed. [via Nathan Weinberg and Danny Sullivan]
Update: When it rains, it pours. Another recent security issue, this one in the Google Mini, that could have allowed arbitrary command execution. It already has been patched. [via Danny Sullivan]
Thursday, November 17, 2005
Netflix and personalization
Davis Freeberg has an interesting writeup of an investor presentation by Netflix CFO Barry McCarthy.
An excerpt on Netflix personalization and recommendations:
The personalization of their site is really what makes their service so unique. At this point Netflix has now collected over 1 billion ratings for movies. They use these ratings to make recommendations of long tail content for their consumers.
[McCarthy said,] "We help you find movies you've never heard of ... There were 554 movies released theatrically last year and I bet most of us can't name 20. A lot of those movies you would enjoy, if you knew that they existed. If you don't know that they existed then they might as well have not been invented."
He later goes on to compare this approach with Blockbuster.
"Historically Blockbuster has reported that about 90% of the movies they rent are new theatrical releases ... They have a slightly different mix online ... 70% of what they rent online is new releases and about 30% is back catalog."
"About 30% of what we rent is new releases and about 70% is back catalog and it's not because we have a different subscriber. It's because we create demand for content and we help you find great movies that you'll really like, we do it algorithmically and we do it with recommendations and ratings."
Personalization aids discovery in the long tail, pushing consumer demand into the back catalog.
Personalization surfaces interesting items you didn't know about and wouldn't have found on your own. It's a complement to search. Search helps when you know what you want. Personalization helps when you don't know what's out there.
[via TechDirt]
Free Google WiFi in Mountain View
Matt Marshall at SiliconBeat reports that Google will be providing free wireless internet access to everyone in the city of Mountain View.
Google also has an offer out to provide free WiFi to all of San Francisco.
So, Google, when are you coming to Seattle? Pretty please?
Google gobbling up Riya?
Om Malik posts a rumor that Riya may be getting acquired by Google.
Riya does automatic people and object recognition in photos. The technology supposedly can automatically tag photos with descriptions of the content of the image. It has obvious applicability to Google Image Search and Picasa.
Interesting if it turns out to be true. Yahoo's focus seems to be on community and user-generated content (like Flickr tagging). Google focuses on automation using clever algorithms. Google acquiring Riya would fit that pattern.
[via Niall Kennedy]
Update: Michael Arrington has a fun sneak peek at Riya's technology.
Update: The rumor was bogus.
Update: A year later, Riya gives up on face recognition as too hard and switches to recognizing characteristics of products. Very lame given all the Riya hype. Riya appears to have been yet another company with vaporware. They made strong claims about solving hard problems that they never could actually solve.
Wednesday, November 16, 2005
Google Base and getting the crap out
Bindu Reddy has the post on the Google Blog announcing the launch of Google Base.
Widely hyped as a possible Craigslist and eBay killer, Google Base looks to me a lot more like a slightly more structured version of a wiki. You can add nearly any content you want defined by nearly any fields you want.
There isn't all that much content yet but, as content is added, the trick will be keeping spam and crap out. I expect Google Base to be treated like a free version of Google AdWords by many. I doubt it will take long for people to upload your usual assortment of credit card offers, domain name services, get-rich-quick schemes, and exciting new ways to increase the size of your willy to elephantine proportions.
So, how will they help people find the relevant stuff and filter out the crap? At this point, it isn't clear. We'll have to wait and watch.
See also my previous post, "Getting the crap out of user-generated content".
See also good comments on Google Base by Nathan Weinberg, John Battelle, Tara Calishain, Gary Price, Danny Sullivan, and TechDirt.
Update: Google Base does appear to be using a few techniques to reduce crap, including automated detection of naughty or spammy words, community reporting of bad items, and, when searching, suggestions of categories and tag terms to refine the search and improve relevance.
It will be interesting to watch and see how well these techniques scale over time.
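As a toy illustration of the first of those techniques, a submission filter might score each item against a list of spammy terms and flag high scorers for review. This is only a sketch under my own assumptions; the word list, weights, and threshold are invented, and a real filter would use many more signals.

    import re

    SPAMMY_TERMS = {"get rich quick": 3, "free credit report": 2,
                    "work from home": 1, "click here": 1}
    FLAG_THRESHOLD = 3

    def spam_score(text):
        text = text.lower()
        return sum(weight * len(re.findall(re.escape(term), text))
                   for term, weight in SPAMMY_TERMS.items())

    def should_flag(title, description):
        # Hold suspicious items for review instead of publishing them immediately.
        return spam_score(title + " " + description) >= FLAG_THRESHOLD

    print(should_flag("Get rich quick!!!", "Work from home, click here"))   # True
    print(should_flag("Used couch, lightly worn", "Comfortable and clean")) # False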
Update: Wow, it sure is easy to upload crap to Google Base. It takes RSS feeds of 1k items at a time.
I think we're going to see some heavy abuse of this, especially if Google starts including Google Base search results in Google web search (like they do for Google News and Froogle search results). That would create quite the profit motive.
Here's one crazy idea for abusing the system. Someone should try uploading all of Amazon.com's RSS feeds but inserting an associate tag into the URL. Same thing is probably possible with eBay and many other places that offer referral kickbacks. Woo hoo, go wild, script kiddies.
Update: A week later, John Leyden at The Register reports "Google Base awash with smut".
Update: Three weeks later, Nathan Weinberg reports that "product listings on Google Base have become almost entirely filled with [affiliate] redirect URLs to Amazon." Woo hoo, looks like the script kiddies went wild.
Monday, November 14, 2005
Alex takes the red pill
Alex Edelman is leaving Findory and joining IMDb today.
If you have never checked out the Internet Movie Database (IMDb), you definitely should. It's filled with detailed information and reviews about movies. Don't miss their remarkable power search that lets you do elaborate searches for things like "highly rated Sci Fi movies made after 2001 with more than 500 votes".
The problem with being a frugal, self-funded startup is that you're a frugal, self-funded startup. After 14 months without a salary, Alex felt he needed a steady source of income, and Findory is not able to provide that for him.
I am proud of what Alex and I built together at Findory. The last 14 months have been extraordinary:
- Traffic growth: In the last 14 months, Findory traffic grew nearly 1800% from 250k hits/month to 4.4M hits/month.
- Redesigns: We had two major redesigns (original, first, current).
- Servers: Findory deployed four additional servers as we grew, bringing our total to six.
- Press: We enjoyed seeing press coverage in Time Magazine, Spiegel, The Times, eWeek, Seattle PI, Seattle Times, Puget Sound Business Journal, Searcher Magazine, Forbes.com, Search Engine Watch, InsideGoogle, Online Journalism Review, and many other places.
- Millions of feeds: We launched millions of different RSS feeds, helping people read and consume information from Findory any way they like it. Our unusual personalized versions of our feeds learn and adapt as you read articles from the feeds.
- Inline Findory: Inline Findory lets bloggers put a snippet of their Findory front page on their weblog.
- Findory API: The Findory API lets other sites remix Findory data for fun and profit.
- Personalized web search: Our alpha of our personalized web search modifies web search results using each person's search and clickstream history.
- Personalized news and blogs searches: Our personalized news and blogs searches highlight articles in your search results that Findory recommends based on your reading history.
- Search history: Search history makes it easy to find things you found once before.
- Source pages: For any news site or weblog in our database, source pages show recent articles, related news sites and weblogs, and related articles.
- Findory feed reader: In September 2005, we launched our personalized feed reader. Unlike other feed readers, Findory's feed reader recommends articles that are particularly likely to be interesting to you.
- Personalized advertising: Recently, we launched our personalized advertising engine. It picks advertisements based on not only the content of the page, but also which articles each reader has read in the past.
Thursday, November 10, 2005
Personalized news search from Google?
Chris Sherman reports that Google will be launching personalized news search soon:
Google plans to integrate personalized search with Google News ... You'll be able to see the history of past news searches and the articles that you clicked on.
Since Google only maintains links to news stories up to a maximum of 30 days after publication, you may not be able to retrieve the article from your history. However, both the title and URL of stories are preserved, and you will be directed to the news site to search for the article using the news service's own site search or archive tools.
Google says that the integration of Google News into personalized search will be coming "soon."
Excellent. Findory has had this for a long time, but it is great to see it from Google. It doesn't appear that your Google News reading history will personalize the Google News front page like Findory, but that may be coming at some point as well.
The beginning part of Chris' article talks about Google Personalized Web Search. If you haven't tried that yet, you should. It's the only example from any of the search giants of showing different search results to different people based on what each person has done in the past.
There are smaller folks exploring personalized web search including Findory. And it's interesting to look at the differences between Findory's alpha personalized web search and Google's personalized web search. Google's technique is to bias all your search results toward your profile (e.g. read an article on fly fishing, then a future search on "bass" is biased toward fishing, as is a future search on "computer").
Findory's personalized search (which admittedly is much less mature) tries to change your search results based on what you just did. If you do a search, don't find what you want, then twiddle your keywords and search again, there's valuable information there. What you did or didn't find in your first search should influence what you see in your second search.
That's the big difference. Findory's technique tries to make fine-grained changes to your search results to help you with whatever you're trying to do right now. Google's technique makes coarse-grained changes (e.g. a bias toward fishing) using a long-term profile.
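To make the contrast concrete, here is a toy sketch of the two styles. Neither function is Findory's nor Google's actual algorithm; the scoring, weights, and data are invented purely for illustration.

    def rerank_long_term(results, profile_topics, bias=0.3):
        # Coarse-grained: bias every query toward a long-term interest profile.
        return sorted(results,
                      key=lambda r: r["score"] + bias * len(r["topics"] & profile_topics),
                      reverse=True)

    def rerank_session(results, recent_queries, skipped_urls, boost=0.3, penalty=0.3):
        # Fine-grained: react to what the user just did in this session --
        # favor terms from the reworded query, demote results already skipped.
        recent_terms = set(" ".join(recent_queries).lower().split())
        def score(r):
            s = r["score"] + boost * len(set(r["title"].lower().split()) & recent_terms)
            if r["url"] in skipped_urls:
                s -= penalty
            return s
        return sorted(results, key=score, reverse=True)

    results = [{"url": "a", "title": "Bass guitar chords", "topics": {"music"}, "score": 1.0},
               {"url": "b", "title": "Bass fishing guide", "topics": {"fishing"}, "score": 0.9}]
    print(rerank_long_term(results, {"fishing"}))                         # fishing result first
    print(rerank_session(results, ["bass fishing"], skipped_urls={"a"}))  # also fishing first, but only because of this session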
Fun stuff. It's great to see Google pushing hard on personalization of information.
Sunday, November 06, 2005
Topix.net adds blogs
Rich Skrenta posts that Topix.net just added thousands of blogs to their site, crawling them and categorizing them along with articles from thousands of mainstream sources.
Topix.net says they added the 15k "top weblogs" to their search results mixed in with mainstream news sources.
For example, a search for "Amazon Mechanical Turk" on Topix.net brings up articles from news sites and a few dozen blogs including my post.
It doesn't dive as deep as Technorati, Feedster, or Google Blog Search, but does give a good number of high quality results. I saw no spam in my search results for the examples I tried, something that is a serious problem on those other blog search engines.
Rich did say that the 15k blogs are just a start and that they hope to expand their coverage up to 1M weblogs. It'll be interesting to see if they can maintain their spam-free goodness as they broaden their coverage.
Speaking of spam, Rich said:
What we're seeing is that 85-90% of the daily posts hitting ping services such as weblogs.com are spam (take a look for yourself). Of well-ranked non-spam blogs that we've discovered, we've found about half haven't been updated in the past 60 days. Our filters sift through what's left, which even after discarding 95%, is still a great deal of good material.
Lots of crap out there, isn't there?
For more on blog spam, see my earlier post, "How many feeds matter?", where I said that, based on Findory's experience and data from Bloglines, 95% or more of the supposed 20M+ weblogs out there appear to be fake, not useful, or spam.
Saturday, November 05, 2005
Just Google it and disintermediation
Steve Lohr at the New York Times reports that fear of disintermediation by Google is hitting retailers as powerful as Wal-Mart:
In Google, Wal-Mart sees both a technology pioneer and the seed of a threat ... The worry is that by making information available everywhere, Google might soon be able to tell Wal-Mart shoppers if better bargains are available nearby.
Google is pretty close to that already. In a retail store, when you're looking at something on the store shelf, try using your cell phone to send a text message to Google SMS with the word "price" and the UPC code or a brief description. Google will get back to you in a few seconds with what online retailers are charging for that item. Fun stuff.
[Found on Findory]
Friday, November 04, 2005
Amazon Mechanical Turk?
This has to be the strangest thing I've seen in a while.
Amazon is apparently behind the site mturk.com which calls itself "Amazon Mechanical Turk: Artificial Artificial Intelligence".
According to part of their FAQ:
Amazon Mechanical Turk provides a web services API for computers to integrate "artificial, artificial intelligence" directly into their processing by making requests of humans.
A network of humans fuels this artificial, artificial intelligence by coming to the web site, searching for and completing tasks, and receiving payment for their work.
For software developers, the Amazon Mechanical Turk web service solves the problem of building applications that until now have not worked well because they lack human intelligence. Humans are much more effective than computers at solving some types of problems, like finding specific objects in pictures, evaluating beauty, or translating text.
For businesses and entrepreneurs who want tasks completed, the Amazon Mechanical Turk web service solves the problem of getting work done in a cost-effective manner by people who have the skill to do the work.
For people who want to earn money in their spare time, the Amazon Mechanical Turk web site solves the problem of finding work that they can do wherever and whenever they want.
For those that doubt that Amazon would do something this... umm... innovative, a quick view of the page source shows that many of the images and links are served from Amazon.com. This really does appear to be Amazon.
I really don't know what to say. I have a hard time seeing how this idea can succeed.
Google Answers works because the fees are high, answers quite complex, and experts well vetted. The core idea behind Amazon's Mechanical Turk seems to be to take the success of Google Answers and try to scale it up by a few orders of magnitude.
But there are problems with that. If you scale up by doing cheaper answers, you can't filter experts as carefully, and the quality of the answers will be low. Many of the answers will be utter crap, just made up, quick bluffs in an attempt to earn money from little or no work. How will they deal with this?
It seems to me that Amazon has just changed the problem from finding the answer in the available data to digging out the correct answer from all the crappy answers provided. Filtering crap out of user-generated content at large scale is a difficult problem too.
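One standard way to attack that problem is redundancy: send the same task to several people and only accept an answer when enough independent responses agree. This is my own sketch of the idea, not anything Amazon has announced; the task and answers are made up.

    from collections import Counter

    def aggregate(answers, min_agreement=0.6):
        # Accept the majority answer only if enough independent workers agree.
        if not answers:
            return None
        answer, count = Counter(a.strip().lower() for a in answers).most_common(1)[0]
        return answer if count / len(answers) >= min_agreement else None

    # Five workers label the same image; three agree, so the answer is accepted.
    print(aggregate(["cat", "cat", "dog", "cat", "bird"]))  # "cat"
    # No consensus, so the task would be re-posted or routed to more workers.
    print(aggregate(["cat", "dog", "bird"]))                # None

Of course, redundancy multiplies the cost per answer, which cuts directly against the cheap-answers-at-scale pitch.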
More comments and discussion at Metafilter, TechDirt, Google Blogoscoped, Rob Hof, Jason Fried, Greg Yardley, and Slashdot.
Update: Don't miss the ongoing discussion in the comments to this post.
Update: I have a short quote in a Seattle PI article by Kristen Bolt on Amazon Mechanical Turk.
Thursday, November 03, 2005
Amazon Upgrade and Amazon Pages
For a while, Amazon has had "Look Inside the Book", which lets shoppers see images of a few pages of the book, and "Search Inside the Book", which lets shoppers search for keywords in the text of books.
Amazon is taking the next step toward online access of books. According to their press release, Amazon Upgrade will let shoppers buy both a physical copy of the book and access to an online copy of the entire book. Amazon Pages will let shoppers buy online access to individual chapters or even individual pages from a book. Fun stuff, particularly useful for technical books, I'd expect.
The big question is coverage. "Search Inside the Book" is only available for a fraction of Amazon's catalog. I suspect many publishers will be even more skittish about Amazon Upgrade and Amazon Pages, making them much less useful.
Stepping back for a second and looking at this from a consumer's point of view, what would be ideal is if I could access the full text of every book I own online at Amazon for no additional charge. Search it, read it from anywhere, full online access. Something like that certainly would make me visit Amazon more frequently and buy a lot more books.
Unfortunately, this is unlikely to happen. When I buy a book or a CD, I typically think that I'm buying the right to enjoy that creative work for personal use. But, if I understand the debate correctly, many publishers argue that I'm only buying the right to that particular copy of the material, not to any other copies, and that I have to pay again if I want a copy of the work in a different format.
MP3.com, back in the late 1990's, actually had a service that let you access your music library online. If you could prove you owned a CD, they let you access an MP3 of the music from that CD. For this blasphemy, they were pummeled by an orgy of litigation until they fell screaming into dot-com oblivion.
Amazon Upgrade isn't going to let me access my whole library online, but it's a small step closer. I wonder if Amazon will take us the rest of the way down this path.
Update: Just a few days later, word leaks out that Google is talking to publishers about a new service that would let people rent an online copy of a book for a week for 10% of the cover price. [via TechDirt]
Tuesday, November 01, 2005
Windows Live and Start.com
Bill Gates announced a new effort at Microsoft called Windows Live.
This site supposedly will be the default start page in IE 7. If it does become the default home page for millions of users, it is definitely a trend worth watching.
If you noticed it looks a lot like MSN's Start.com, there's a good reason for that. Sanaz Ahari posts that the same team that developed Start.com developed Live.com.
After using Windows Live for a little bit, I'm pretty surprised that this is their attempt to compete with My Yahoo, My Google, and My AOL. The initial experience is poor; the site has little or no value until it is customized.
Most users won't bother with the effort of extensive customization. Windows Live needs to provide a great initial experience and be trivially easy to use before it's going to appeal to the mainstream.