Thursday, March 02, 2006

In a world with infinite storage, bandwidth, and CPU power

Google is hosting an analyst day today. I found skimming the 94-slide presentation (PPT, PDF alternative) to be interesting and worthwhile.

In particular, I liked slides 19, 20, and 31, all of which make it clear that Google isn't losing its wide-eyed optimism.

Slide 31 says that Google's philosophy for new product development is "no constraints" and that they initially ignore "CPU power, storage, bandwidth, and monetization."

Slide 20 says (in the notes) that Google plans to "get all the worlds information, not just some."

And slide 19 (in the notes) talks about how their work is inspired by the idea of "a world with infinite storage, bandwidth, and CPU power." They say that "the experience should really be instantaneous". They say that they should be able to "house all user files, including: emails, web history, pictures, bookmarks, etc and make it accessible from anywhere (any device, any platform, etc)" which leads to a world where "the online copy of your data will become your Golden Copy and your local-machine copy serves more like a cache". And, they say that they want "transparent personalization" that uses user "data to transparently optimize the user's experience ... implicitly."
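To make the "Golden Copy" idea concrete, here is a minimal sketch of what it might look like if the online copy is authoritative and each local machine only holds a cache of it. This is purely illustrative: the class names `GoldenCopyStore` and `LocalCache` are hypothetical, the "server" is just an in-memory dictionary, and nothing here reflects how Google actually builds or plans to build such a system.

```python
class GoldenCopyStore:
    """Hypothetical stand-in for the online, authoritative copy of a user's files."""

    def __init__(self):
        self._files = {}
        self.reads = 0  # counts how often a device had to go to the server

    def read(self, name):
        self.reads += 1
        return self._files[name]

    def write(self, name, data):
        self._files[name] = data


class LocalCache:
    """A device's local copy, treated as a write-through cache of the golden copy."""

    def __init__(self, store):
        self._store = store
        self._cached = {}

    def read(self, name):
        # Serve from the local cache when possible; otherwise fetch and remember.
        if name not in self._cached:
            self._cached[name] = self._store.read(name)
        return self._cached[name]

    def write(self, name, data):
        # Write through: the online copy is updated first, then the local one.
        self._store.write(name, data)
        self._cached[name] = data

    def evict(self, name):
        # Losing the local copy is harmless; the golden copy still has the data.
        self._cached.pop(name, None)


store = GoldenCopyStore()
laptop = LocalCache(store)
phone = LocalCache(store)

laptop.write("notes.txt", "analyst day notes")
laptop.evict("notes.txt")        # e.g. the laptop's disk is wiped
print(phone.read("notes.txt"))   # another device still sees the file
```

The sketch ignores everything that makes this hard in practice (staleness between devices, conflicts, bandwidth, security), which are exactly the constraints the slide notes propose to set aside.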

Google also recommits to a future with personalized search. They say in the notes on slide 12 that they will "introduce new personalization elements" and that they view that as one of two major directions for their efforts to improve relevance rank.

Some might be inclined to dismiss all this talk as the wild fantasies of engineers with too much caffeine, but I think Google does see their ability to build out their massive cluster as one of their primary competitive advantages. I think they do intend to continue extending their computing infrastructure until everyone everywhere really does feel that they have near infinite CPU power and storage at their fingertips.

[link to presentation via Paul Kedrosky]

Update: It appears Google suddenly removed the PPT file. Ugh. Well, sorry, but, unless you moved quickly, looks like there's no way to see it anymore.

Update: Google just made a PDF version of the slides available.

Unfortunately, this new PDF version of the slides no longer has the notes attached to each slide, so you can't see some of what I was referring to in my comments above.

However, I did download the original PPT presentation. Though I didn't keep a copy, I recently discovered that my Google Desktop cache does contain a text-only copy of notes for slide 12 and most of slide 19. The cached copy ends in the middle of the notes for slide 19.

Here are the notes from slide 12 with the reference to using personalized search to improve relevance rank:
Lead in Search
As the market leader, we need to ensure search doesn't become a commodity. Our focus on search is nothing new. We built our brand on being the best search engine, with the best results, and as our competitors have caught up to us, it's become even more important for us to focus on:
1) Speed
Solve international speed issues and bring international users to US performance
2) Comprehensiveness and freshness
"All webpages included in the Google index and searched all the time" -- Teragoogle makes this possible
Expand to other sources of data
Become the leader in geo search (any search with a geographic component).
New forms of content -- video, audio, offline printed materials
3) Relevance
Leverage implicit and explicit user feedback to improve popular and nav queries
Introduce new personalization elements
4) User Interface
Experiment with several new UI features to make the user experience better
And here is part of the notes from slide 19. Unfortunately, my cached copy ends right before the discussion of "transparent personalization" that I mentioned above:
In a world with infinite storage, bandwidth, and CPU power, here's what we could do with consumer products --
Theme 1: Speed
Seems simple, but should not be overlooked because impact is huge. Users don't realize how slow things are until they get something faster.
Users assume it takes time for a webpage to load, but the experience should really be instantaneous.
Gmail started to do this for webmail, but that's just a small first step. Infinite bandwidth will make this a reality for all applications.
Theme 2: Store 100% of User Data
With infinite storage, we can house all user files, including: emails, web history, pictures, bookmarks, etc and make it accessible from anywhere (any device, any platform, etc).
We already have efforts in this direction in terms of GDrive, GDS, Lighthouse, but all of them face bandwidth and storage constraints today. For example: Firefox team is working on server side stored state but they want to store only URLs rather than complete web pages for storage reasons. This theme will help us make the client less important (thin client, thick server model) which suits our strength vis-a-vis Microsoft and is also of great value to the user.
As we move toward the "Store 100%" reality, the online copy of your data will become your Golden Copy and your local-machine copy serves more like a cache. An important implication of this theme is that we can make your online copy more secure than it would be on your own machine.
Another important implication of this theme is that storing 100% of a user's data makes each piece of data more valuable because it can be access across applications. For example: a user's Orkut profile has more value when it's accessible from Gmail (as addressbook), Lighthouse (as access lis... [...TRUNCATED...]
Update: Derrick made the full notes for slide 19 available in the comments to this post.

Update: The full story about why the PPT version of these slides disappeared is now clear.

When I first posted a few excerpts from the notes to the slides, I had assumed that the notes were intended for the speakers of the presentation. I was annoyed and even a bit angry when the PPT was pulled, not fully comprehending why Google wouldn't want to make the notes generally available.

It now appears that many of the notes in the slides were cut-and-pasted from other presentations, never intended for Google Analyst Day. As mb points out in the comments to this post, the notes for slide 10 contain an odd reference to CBS, something I didn't notice when I originally was reviewing the slide deck.

Even worse, the notes to slide 14 contain revenue projections for next year, also something I didn't notice previously. Because Google published these projections to their website, even briefly, they were forced to file an 8-K with the SEC. In that filing, they say that the notes were "not speaker notes prepared for the Analyst Day presentation."

All very unfortunate.

Google's mission may be "to organize the world's information and make it universally accessible," but some information is not intended to be accessed by all.

Update: After waiting for the press storm to fade, Paul Kedrosky posts the original PPT file with the troublesome notes included.

Update: Nearly two years later, the WSJ reports that "a service that would let users store on its computers essentially all of the files they might keep ... could be released as early as a few months from now."

64 comments:

Anonymous said...

Greg the .ppt got removed from the Google site. Any chance you still have a copy?

Greg Linden said...

Huh, I don't see any way to get to it anymore, you're right.

No, I don't have a cached copy. I guess I should have made one, but it never occurred to me that Google would pull the slides after posting them.

Sorry about that.

Anonymous said...

Looks like they didn't delete them, they just converted them to pdf.

Found the links on this page:

http://investor.google.com/webcast.html

http://investor.google.com/pdf/20060302_nonGAAP_recon.pdf
http://investor.google.com/pdf/20060302_analyst_day.pdf

Anonymous said...

new link is
http://investor.google.com/pdf/20060302_analyst_day.pdf

Greg Linden said...

Thanks, Anonymous. I updated my post.

Anonymous said...

Greg, I do have a copy of the PPT:
check my blog.
http://tomcaster.com/blog/2006/03/03/google-2006-strategy/

Derrick said...

Here is the full text comment of pg 19. What the heck is lighthouse and is there a GDrive project within Google?

Purpose of this slide:
In a world with infinite storage, bandwidth, and CPU power, here's what we could do with consumer products…
Theme 1: Speed
Seems simple, but should not be overlooked because impact is huge. Users don't realize how slow things are until they get something faster.
Users assume it takes time for a webpage to load, but the experience should really be instantaneous.
Gmail started to do this for webmail, but that's just a small first step. Infinite bandwidth will make this a reality for all applications.
Theme 2: Store 100% of User Data
With infinite storage, we can house all user files, including: emails, web history, pictures, bookmarks, etc and make it accessible from anywhere (any device, any platform, etc).
We already have efforts in this direction in terms of GDrive, GDS, Lighthouse, but all of them face bandwidth and storage constraints today. For example: Firefox team is working on server side stored state but they want to store only URLs rather than complete web pages for storage reasons. This theme will help us make the client less important (thin client, thick server model) which suits our strength vis-a-vis Microsoft and is also of great value to the user.
As we move toward the "Store 100%" reality, the online copy of your data will become your Golden Copy and your local-machine copy serves more like a cache. An important implication of this theme is that we can make your online copy more secure than it would be on your own machine.
Another important implication of this theme is that storing 100% of a user's data makes each piece of data more valuable because it can be access across applications. For example: a user's Orkut profile has more value when it's accessible from Gmail (as addressbook), Lighthouse (as access list), etc.
Theme 3: Transparent Personalization
The more data, access, and processing Google can handle for the user, the greater our ability to use that data to transparently optimize the user's experience.
Google Desktop w/ RSS Feeds is a good first example: the user should not have to tell us which RSS feeds they want to subscribe to. We should be able to determine this implicitly.
Other potential examples: User should not have to specify the "From" address in Google Maps; user should not have to specify which currency they want to see Froogle prices in; user should not have to manually enter their buddy list into Google Talk.

Greg Linden said...

Thanks, Derrick!

No idea on Lighthouse and GDrive. Good catch, I haven't heard about those before. Perhaps projects that are only available internally?

Anonymous said...

Hmm, the ppt from the link above seems to be corrupt--Powerpoint choked when it opened it. (?)

Anonymous said...

For example: a user's Orkut profile has more value when it's accessible from Gmail (as addressbook), Lighthouse (as access lis... [...TRUNCATED...]

Goddamn cliffhanger of the month.

Anonymous said...

Oho, just saw derrick's comment. Too bad it doesn't clarify anything.

Anonymous said...

I have some ideas on what Lighthouse might be.

Anonymous said...

sorry, the folder was wrong
now it works
http://tomcaster.com/blog/wp-content/uploads/2006/20060302_analyst_day.ppt

Anonymous said...

the PPT above from Tomy doesn't have any comments in the deck. Slide 19 says nothing more than "Store 100%".

hubert said...

I like the story of the disappearing ppt:
- For some weird (?) reason, if you search for 20060302_analyst_day.ppt on Google, nothing comes up (not even in the Google cache ;-). alltheweb at least finds Greg's mention, Yahoo even tomcaster's copy (which lacks the comments) ...

Anonymous said...

What's GDS? Google Digital Storage?

/Henrik

Greg Linden said...

I think GDS is Google Desktop Search.

The latest version of Google Desktop Search can store the index to your files on Google's servers. See

http://desktop.google.com/features.html#searchremote

Anonymous said...

Hot news, too bad nobody seems able to accurately make a copy and post it on their own site.

Anonymous said...

An important implication of this theme is that we can make your online copy more secure than it would be on your own machine.

Except now the government does not need a warrant any more to access your information; it can be requested with just a subpoena, which is surprisingly easy to get. Admittedly, though, for the average user this would be more secure; it's just the legal ramifications that worry me.

Anonymous said...

That's actually really weird. Though I happened to download the PPT right after Greg commented on it, I never saw the attached comments myself.

Does Eric Schmidt mention the GDrive and Lighthouse during the webcast?

Dr Aniruddha Malpani said...

I think it's ironic. Google promises "no constraints" - and then promptly deletes its own content, so it has actually created a constraint and has prevented readers from seeing the PPT! Sounds like something Big Brother would do. "Constraints" are defined by Google, not by the user!

Anonymous said...

Who doesn't have at least one copyright-violating media file on their computer, and how many will trust Google to store it?

Anonymous said...

Great stuff, Greg. It’s a classic blunder that many companies make, inadvertently leaving information in Microsoft Office documents that they don’t want the outside world to see. They also revealed that they have little understanding of usability for investor relations websites. I've explained that here.

MikeZ said...

Obviously, searches will go faster if everything (that means Everything) is on Google's petabyte drives.

But is it a good idea for Our Stuff to be on Their Machine? Some, yes, everything, perhaps not.

Anonymous said...

Great stuff,

John said...

Perhaps lighthouse is an IM app based on the google talk format that implicitly finds your important contacts. Maybe the bandwidth constraint is the use of voice recognition to search gtalk voice calls, or maybe some sort of video conferencing and video conference archive search. I wouldn't put it past them, especially given their recent foray into "voicemail" inside gmail.

Anonymous said...

Google just stole Microsoft's idea of central computers and all the others linking to it being dumb terminals. That is a few years old!

Derrick said...

I put the rest of the slide comments up:

http://absolutevalue.blogspot.com

None as interesting as pg 19, but for googleholics like me, more is always better!

Greg Linden said...

Derrick, do you have a copy of the original PPT file? Are you willing to make it available?

Anonymous said...

Google just stole Microsoft's idea of central computers and all the others linking to it being dumb terminals. That is a few years old!

You're kidding right? That concept predates Microsoft by a long time.

Anonymous said...

Lighthouse. If you look at the context, it is referring to an interface which can access information. Gmail accesses information from Orkut. Lighthouse must also be a Web-interface. My guess is that it is file access. Similar to Flickr except files. So, when I upload a MS Word document, I can post it to Lighthouse for it to be viewed/reviewed by anyone I designate who has a gmail account. Consider the technology of Sharepoint and put it into Google terms.. makes a lot of sense.

Anonymous said...

he's either kidding, fell asleep in history class or too young to know better.

why even bother with thin clients? embed 3d graphics cards into our monitors and send opengl instructions directly to the monitor. we can all throw ms's os, intel's cpu's into the trash.

mb said...

It's kinda funny that Google got burned by accidental release of information, just like privacy advocates are worried that users will get burned when the Google Grid archives our digital lives.

Today Google filed an 8-K with the SEC since some of the presentation comments contained financial projections. In the 8-K, they say some slides were copied from an "internal product strategy presentation." (See http://www.sec.gov/Archives/edgar/data/1288776/000119312506047267/d8k.htm)

From reading the full notes at Derrick's site, it seems that other slides may have been copied from presentations to CBS. Witness Slide 10 which attempts to sell CBS on exposing their video "assets" to the "wisdom of crowds."

(See http://absolutevalue.blogspot.com/2006/03/analyst-day-slide-comments.html)

Anonymous said...

Yeah infinite storage so they can sell all our secret information to the U.S. government damn it!

Anonymous said...

100% storage ... except when it comes to this instance. Oh, the irony!

Anonymous said...

i can't believe nobody saved a copy of the original PPT. surely someone must have it. ???

Anonymous said...

I have a copy of that ppt. if u want i can send that by mail.

Anonymous said...

if you do then send it to greg

Anonymous said...


> Anonymous said...
> I have a copy of that ppt.
> if u want i can send that by mail


Hi Anonymous,

Do you have the original .ppt file with All the original NOTES intact ??

If so, you could always zip it (it compresses from 19MB to just 6MB) and upload it to the free site http://www.rapidshare.de and then just post the download link :)

Cheers

Anonymous said...

Google is really becoming evil now.. if the whole world keeps its information on Google then the information will be in Google's hands, not in ours.. the removal of the ppt is an example...

Google can do whatever it wants...

after the Google China issue I don't believe Google...

looks like Google Grid will eventually turn into "The Matrix" the evil AI from the Movie...

Anonymous said...

I remember Mr. Ellison from Oracle wanting to do this a long time ago... everything will be in a thin client experience with no "pc" needed. I disagree with the part about upgrading everyone to U.S. speeds....try Eastern world speeds with their fiber everywhere. The U.S. needs to catch up man.

Anonymous said...

Can we all get back on topic !!

ie. Someone please post the original .ppt file with the original NOTES

Zip it first (it compresses from 19MB to just 6MB) and upload it to any free site like http://www.rapidshare.de and then just post the download link :)

Dave

Anonymous said...

Someone has to come up with the original PPT that includes the notes! I've downloaded the few copies available and none of them have the notes. Did they ever exist or is this a hoax!

Ed Gerck said...

...
> An important implication of this
> theme is that we can make your online copy more secure than it
> would be on your own machine.
...
> Another important implication of this theme is that storing 100%
> of a user's data makes each piece of data more valuable

Google likes to throw ideas out there, to use the market as their
computing device for what works. However, by the reasons (*) listed below, google already knows that this one will not fly.

Why do this, then? They might be just trying to raise market awareness for the problems of such approach. Even though Microsoft already had to pull the plug on a very similar program (google "hailstorm microsoft"), Microsoft is still in an ideal situation to try it again and better. Which (given users' notorious naivete') would kill a large market segment
for google's search -- namely, every Internet user. Of course, google's
search appliance for enterprises (and later, a more affordable gadget
for the masses) would not have these problems...

Brilliant preventive move by google, it seems, as it looks for options
-- and time -- to better place its technology.

Cheers,
Ed Gerck

(*) What google proposes is a direct contradiction, for several reasons:

(1) Because you *still* have your local copy, the online copy becomes
an _additional_ risk. Risk MUST increase with the added online copy.

(2) Even if the online copy is encrypted (best case) with a key that
google does NOT have, the file may still be attacked and decrypted by
a variety of methods -- some of them not even cryptographically or
computationally limited.

(3) Losing physical control of your data (by placing a copy under
google's control) cannot be recalled. It's a final revocation of
your sole control rights.

(4) Creates a single point of failure.

(5) The more valuable your data becomes, it also becomes a more valuable target.

(6) Goes against usual confidentiality principles, including "need to know"
and "least privilege".

(7) Either contradicts legal requirements for confidentiality or makes google legally liable for safekeeping everyone's data against any disclosure risk (including disclosure that is legally mandated, which is always a risk to comply with because any order is potentially disputable).

Greg Linden said...

Paul Kedrosky just posted the original PPT file with the notes included.

Anonymous said...

Why the google paranoia and "evil" witch-hunt? Storing files on the net IS the future. Don't store stuff online if you don't want to, but don't imagine it won't happen and don't say it's a bad thing that should be stopped.

Guru said...

Nice coverage on this matter, Greg.
About six months back, I was wondering why Google was not delving into a chat service and GDump (my abbreviation). And now I see Google launching its chat, which is taking a toll on other chat services. About GDump: I thought of something like a big dumping storage space provided by Google, where users can dump the things they want to share with others, with some categorization based on region or type of dumped material. And now I am hearing about this online storage space by Google. Quite excited! Waiting to see what it will be like.

Anonymous said...

I love Google, and all that they are doing. I look forward to what becomes of this, especially after all the publicity they are getting about it.

mb said...

With all the ruckus raised by this, I've yet to see one shred of info on where the original ppt posted by Kedrosky came from, or if it's legitimate.

Greg Linden retrieved partial comments from his Google Desktop cache, but not the original file. Then Derrick mysteriously posted all the comments, but didn't post the original. Now Kedrosky posts what is supposed to be the original but with no explanation of where it came from.

It's a little suspicious that the zip file Kedrosky posted contains a ppt doc dated 3/6/06 at 11:15 PM, while the Analyst Day was held back on 3/2/06. Did someone change the document after the Analyst Day? Did someone copy the comments from Derrick's site into the redacted slide deck?

What gives?

Greg Linden said...

MB, Google filed an 8-K with the SEC acknowledging that they accidentally published these notes.

Sorry, no big conspiracy here.

mb said...

Sorry if I sounded like a crackpot conspiracy theorist. I'm not disputing that Google published the ppt notes accidentally, or that your Google Desktop cached some of the notes.

I was pointing out that there's not enough information to determine if the ppt published by Kedrosky is authentic. He doesn't say where he got it, and the file date is four days after the Analyst Day. For all the drama (Google Desktop archives, SEC filings), it would be nice to have some reason to trust the authenticity of the file.

Paul said today (http://paul.kedrosky.com/archives/2006/03/08/the_google_ppts.html) that he personally downloaded the file, and may have changed the modified date of the file. That's good enough for me.

Anonymous said...

Is GDrive a P2P Storage Application?
Has anyone heard of "Pond" before?
Are these two systems the same in nature?

Anonymous said...

ha they're using microsoft software to present their ish...lol microsoft owns the world and google is no competition. also..with this leaked, microsoft now has the upper hand with this new user storage thing and they can improve on google's way of things...look out google, the new live is coming strong and hard!

Anonymous said...

I would not trust Google at all or for that matter any company with such a vast amount of personal information. Google has been known to track people, it would not stop there. This sounds like they are moving into P2P?

Anonymous said...

I find it quite funny that many people distrust any Government institution with personal information and yet they are readily willing to hand over their desktop and content to some private institution which is much less constrained by law and can do as it pleases with the information ...and with an ultimate goal of simply making money.

It should also be noted that any corporate Goliath can be brought down by simple events so read the fine print. If gdrive is where you will store your cherished data then you may be interested to know what would happen to that data should the company ever become insolvent. Nothing is "free" and there is a price to pay for any service ...pay me now or later is the only difference.

Sorry folks, gdrive is no-go from the get-go for me. As disk drives become less costly by the day, I will continue to store my data at home on a couple of external drives ...simple and effective.

Installing a secure remote desk-top to access my data from anywhere allows me the freedom which gdrive suggests. Add to this that my important data is encrypted in case my remote desktop ever gets compromised.

All Microsoft needs to do, for the average home user, is simplify the above process to install and support remote desktop, storage device add-ons and data encryption.

I remember when the server was the king and I didn't like it and neither did my colleagues ...it was called the "mainframe". More power to the desktop baby!!!

Wes said...

It's hard to trust a public company who has the vast amount of information like google does. Google may have more information than the government.

Anonymous said...

Yes, I agree. Google has so much information that it can become a new "big brother". I think it's normal to make some rules for this monster...

Anonymous said...

Back-up needs to be more than just making an extra encrypted copy to leave at the house! Sure, you can remote into your desktop from anywhere in the world but what would you do if you were remoted into your PC and your house burned down? Where would your data be then? Also, what makes you think the media you are using to copy/save your data is going to be available in 10 or 20 years? You can't find anything to translate music from an 8-track tape and VHSs are on their way out now. In digital format the playback media isn't a concern and you are always up to date!

Anonymous said...

Searching through the web, I found the answer to the mystery of Google Lighthouse:
http://blogs.zdnet.com/Google/?p=254

It seems like it was the codename for Google Picasa Web Albums...

Anonymous said...

well, that's thinking big

well hats off google

Anonymous said...

Is it just me, or do all of these anti-google kook comments seem so astroturfed?

Anonymous said...

Google can't shoot anybody, so I think any worries are overstated. They're trying to make a buck, not take over the world. Now if they're selling private data to the Federal Government, then that IS a problem.

Anonymous said...

Well, beyond comments and doubts, this is gonna happen.
Why? Well, world power has gone through three phases: first, whoever held the most land was the most powerful; then whoever held the most capital was the most powerful; and now, since about 1970, the more INFORMATION you have, the more powerful you are.
First of all, the internet is dangerous for any government, because it (currently) puts any kind of personal opinion or argument within your reach, including ones that differ from what governments want us to believe or accept as true.
This is going to be "solved" before long by the "internet privatization" that has been set in motion by all the "pipes" companies like AT&T, Cisco, Motorola, etc.
These companies want to charge more the more you use their pipes, as with YouTube, for example.
This means that you will buy a package for the kind of information you want. Besides that, other programs like P2P will be eliminated or reduced to the lowest connection speeds, to say nothing of websites with "incorrect" material, like pages with political information that they find "incorrect" for the people.
That means the extermination of free expression on the internet. So, you'll hear what they want you to hear. Exactly like TV or radio.
And finally, the centralization of information will make all this easier, and of course Google will give the government access to all your web information; not the porn information, but the real information, like: are you a believer in such-and-such a belief, are you a "terrorist", do you have or manage information that could be dangerous for "national security".
The stupid media news about Google denying information to the government is just news to make people believe that Google is going to be a guardian of your most important and confidential information, and that you can trust it.
Well, you cannot avoid it, but you can make money from all this. For example, buy stock in companies that create the next generation of massive information-storage "disks", and short-sell the stock of companies that make and are dedicated to PCs.
This may sound paranoid, but, well, that is how the world moves and always has.

Anonymous said...

These comments have been invaluable to me as is this whole site. I thank you for your comment.