Saturday, January 26, 2008

Optimizing Web 2.0 applications

Microsoft researchers Benjamin Livshits and Emre Kiciman have a fun idea in their recent paper, "Doloto: Code Splitting for Network-Bound Web 2.0 Applications" (PDF): automatically optimize the downloading of JavaScript code for large Web 2.0 applications in a way that minimizes the delay before users can interact with the website.

The basic concept is simple but clever. Look at the code to see which JavaScript functions are called immediately and which are called only after a user interacts with the website. Then, replace the functions that are not needed immediately with stubs that get the JavaScript function body on demand. Finally, as network bandwidth becomes available, preemptively load the stubbed functions' code, ordered by how likely each function is to be needed soon.
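
To make the stub idea concrete, here is a minimal sketch of my own, not code from the paper. It assumes a hypothetical in-memory table (serverCode) standing in for the server and a hypothetical blockingDownload helper in place of a real blocking network request. The names makeStub, closureCache, and formatPrice are likewise made up for illustration.

```javascript
// Hypothetical stand-in for the server: function name -> source text.
// A real deployment would fetch this text over the network instead.
const serverCode = {
  formatPrice: "(function (cents) { return '$' + (cents / 100).toFixed(2); })",
};

let downloads = 0; // counts simulated downloads, for illustration only

// Hypothetical helper: pretend to download a function body synchronously.
function blockingDownload(name) {
  downloads += 1;
  return serverCode[name];
}

const closureCache = {}; // locally cached function closures

// Build a small stub that downloads, evals, and caches the real body
// the first time it is called, then forwards every call to the cache.
function makeStub(name) {
  return function stub(...args) {
    if (!(name in closureCache)) {
      const source = blockingDownload(name);
      closureCache[name] = (0, eval)(source); // indirect eval, global scope
    }
    return closureCache[name].apply(this, args);
  };
}

const formatPrice = makeStub("formatPrice");
const first = formatPrice(1999); // triggers the simulated download
const second = formatPrice(500); // served from the local cache
console.log(first, second, downloads);
```

The stub is tiny compared with a real function body, which is the point: only the stubs need to arrive before the page becomes usable, and the cost of the body is paid once, on first use.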

From the paper:
[We] rewrite existing application JavaScript code based on a given access profile to split the existing code base into small stubs that are transferred eagerly and the rest of the code that is transferred either on-demand or in the background using a prefetch queue.

When a new JavaScript file is received from the server on the client, we let the browser execute it normally. This involves running the top-level code that is contained in the file and creating stubs for top-level functions contained therein.

When a function stub is hit at runtime, if there is no locally cached function closure, download the function code using helper function blocking download, apply eval to it, ... cache the resulting function closure locally ... [and then] apply the locally cached closure and return the result.

When the application has finished its initialization and a timer is hit, fetch the next cluster [of Javascript code] from the server and save functions contained in it on the client.
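
The background prefetch step in that last quoted paragraph might look roughly like the sketch below. This is my own illustration, not the paper's implementation: the cluster names, the likelihood numbers, and the helpers prefetchNext and startPrefetching are all assumptions, and the "server" is just an in-memory array.

```javascript
// Hypothetical access profile: clusters of function sources, each with a
// made-up estimate of how likely it is to be needed soon.
const clusters = [
  { id: "settings", likelihood: 0.1,
    functions: { openSettings: "(function () { return 'settings'; })" } },
  { id: "compose", likelihood: 0.7,
    functions: { openCompose: "(function () { return 'compose'; })" } },
  { id: "search", likelihood: 0.4,
    functions: { runSearch: "(function () { return 'search'; })" } },
];

// Prefetch queue: most likely cluster first.
const prefetchQueue = clusters.slice().sort((a, b) => b.likelihood - a.likelihood);

const saved = {};      // function name -> source text saved on the client
const fetchOrder = []; // records the order in which clusters were fetched

// Fetch the next cluster and save its functions; false when the queue is empty.
function prefetchNext() {
  const cluster = prefetchQueue.shift();
  if (!cluster) return false;
  fetchOrder.push(cluster.id);
  Object.assign(saved, cluster.functions);
  return true;
}

// In a browser, a timer would drive this once initialization has finished.
function startPrefetching(intervalMs) {
  const timer = setInterval(() => {
    if (!prefetchNext()) clearInterval(timer);
  }, intervalMs);
  return timer;
}

// For the demo, drain the queue synchronously instead of waiting on a timer.
while (prefetchNext()) {}
console.log(fetchOrder.join(", "));
```

In a real page you would call something like startPrefetching(200) after initialization rather than draining the queue in a loop, so the prefetching only uses bandwidth that would otherwise sit idle.
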

The paper motivates this optimizer with some pretty remarkable statistics on how big Web 2.0 applications are becoming. For example, the paper reports that Pageflakes is over 1 MB gzipped, GMail nearly 900 KB, and Live Maps 500 KB. By automatically splitting the code and data into pieces that are needed immediately and those that are not, "the time to download and begin interacting with large applications is reduced by 20-40%."

As Web applications add more and more functionality to try to match their offline competition, their code will become bigger and bigger, and techniques like Doloto are sure to be necessary.

On a related note, if you have not seen Steve Souders' work at Yahoo on exceptional performance for Web 2.0 sites, you definitely should. What makes Steve's work so compelling is that it correctly focuses on the user experience -- the delay a user sees when trying to view a web page -- and not the server side. Because of that, much of his advice and the YSlow tool look for ways to reduce the size of the download and the number of connections needed to render a web page.

Update: About two years later, Doloto became publicly available.


paul.querna said...

For what it's worth, Bloglines beta is effectively already doing this.... Dojo, which Bloglines beta is based on, also has the ability to do dojo.require on demand...

Anonymous said...

Didn't we do something similar with Java applets in the '90s?

-- dave