Sunday, September 12, 2004

I, Cringely on distributed backups

This week's I, Cringely column proposes a distributed backup system that takes advantage of the free disk space on everyone's machines:
    Here's my idea for a data backup service I call Baxter. This is NOT a virtual drive available on your system, but a virtualized backup system that works transparently and requires some time to restore your data.

    It's a RAID system using donated disk space on a wide area network. Your data is compressed, then cut into chunks, and those chunks are distributed to dozens of places with enough forward error correction thrown in to cover any storage that is lost or happens to be down when recovery is needed. The data is both encrypted (on the customer end, so unencrypted data never enters the system and that vulnerability is eliminated) and split into chunks so no one person has enough to make any sense of it even if they could decrypt it.
It's a clever idea, but not a new one. Farsite, Freenet, and many others have explored distributed file systems using free space on client machines, though they didn't focus on backups.
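
The pipeline in Cringely's description (compress, cut into chunks, add redundancy) can be sketched in a few lines of Python. This is only an illustration under loud assumptions, not Baxter's or anyone else's actual design. Single XOR parity, as in RAID 5, stands in for real forward error correction such as Reed-Solomon coding, so it can rebuild only one lost chunk. The client-side encryption step is left out entirely because Python's standard library has no cipher. The names make_chunks and restore are made up for this example.

```python
import zlib
from functools import reduce
from typing import List, Optional, Tuple


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """XOR two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def make_chunks(data: bytes, k: int) -> Tuple[List[bytes], int]:
    """Compress data, split it into k equal chunks, append one parity chunk.

    Returns the k + 1 chunks and the compressed length (needed to strip
    the zero padding again on restore).
    """
    compressed = zlib.compress(data)
    size = -(-len(compressed) // k)  # ceiling division
    padded = compressed.ljust(size * k, b"\0")
    chunks = [padded[i * size:(i + 1) * size] for i in range(k)]
    parity = reduce(xor_bytes, chunks)  # XOR of all data chunks
    return chunks + [parity], len(compressed)


def restore(chunks: List[Optional[bytes]], compressed_len: int) -> bytes:
    """Rebuild the original data; at most one chunk may be None (lost)."""
    missing = [i for i, c in enumerate(chunks) if c is None]
    if len(missing) > 1:
        raise ValueError("single parity can only rebuild one lost chunk")
    chunks = list(chunks)
    if missing:
        present = [c for c in chunks if c is not None]
        # XOR of all surviving chunks equals the missing one.
        chunks[missing[0]] = reduce(xor_bytes, present)
    data_chunks = chunks[:-1]  # drop the parity chunk
    return zlib.decompress(b"".join(data_chunks)[:compressed_len])


# Usage: lose one of five chunks and still get the data back.
original = b"family photos and tax returns " * 50
stored, length = make_chunks(original, k=4)
stored[2] = None  # the machine holding chunk 2 is offline
assert restore(stored, length) == original
```

In a real system each chunk would go to a different machine, and a stronger code would let the backup survive many simultaneous losses rather than just one.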

Coming out of the Sloan Program in 2003, one of the startup ideas I explored was a distributed backup system fairly similar to what Bob Cringely now proposes. As I got deeper into it, the idea started to look much less attractive.

First, the service isn't particularly attractive to businesses. Disk space is cheap. Concerns about the reliability of the service, and about storing sensitive data on a cloud of unknown machines, would trump any minor cost advantage.

Second, bandwidth is an issue. Machines connected over modems are certainly out. Slow upstream connections on cable modems and other broadband are a problem too: a 10 GB backup over a 256 Kbit/s upstream connection (very common with DSL and cable) would take about 4 days.
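
The transfer-time estimate is easy to check. The sketch below assumes decimal units (1 GB = 10^9 bytes, 1 Kbit = 1,000 bits), a link that stays saturated the whole time, and no protocol overhead. The function name is made up for illustration.

```python
def upload_days(size_gb: float, upstream_kbit_per_s: float) -> float:
    """Days needed to upload size_gb at the given upstream rate.

    Assumes decimal units and a fully saturated link with no overhead.
    """
    bits = size_gb * 1e9 * 8
    seconds = bits / (upstream_kbit_per_s * 1e3)
    return seconds / 86_400  # seconds per day


print(f"{upload_days(10, 256):.1f} days")
```

Under those assumptions the result is a little over three and a half days. Real-world overhead, and sharing the link with normal use, pushes it to roughly four.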

Third, it isn't obvious there really is much of a cost advantage. Under Cringely's proposal, I pay $4/month and lose disk space equal to what I need to back up. But I can buy a 40 GB internal drive for $40 and use that for backups. Or I can spend a little more ($100-200) to get an external FireWire/USB drive for easier installation and backups. Cringely's $50/year plus half of my disk space isn't an obvious win over simply buying another disk.
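
To put rough numbers on that comparison, here is a small break-even sketch using the prices quoted above. It counts only the subscription fee against the one-time price of a drive, and ignores the value of the donated disk space, electricity, and the chance of the drive failing. The function name is made up for illustration.

```python
def break_even_months(drive_price: float, monthly_fee: float = 4.0) -> float:
    """Months of subscription fees that add up to the price of a drive."""
    return drive_price / monthly_fee


for price in (40, 100, 200):
    months = break_even_months(price)
    print(f"${price} drive: subscription costs as much after {months:.0f} months")
```

At $4/month, the $40 internal drive is matched after 10 months of fees, and the $100-200 external drives after 25 to 50 months.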

Nevertheless, it's an interesting idea. As SETI@home and others have done for idle CPU cycles, it would be nice to find a way to use all the idle disk space sitting on the network. But it isn't the obvious killer app Cringely makes it out to be.

Update: A year and a half later, it looks like a startup, Allmydata.com, has implemented Cringely's idea almost exactly as he described it.

4 comments:

Greg Linden said...

Hi, Maarten. Great point on the advantages of remote backup over a network for protection against burglary, fire, and other localized disasters. Thanks for mentioning it.

Travis said...

Remote backup is definitely a requirement to insure against physical problems. I actually made pretty much the same system as Cringely describes for a university project. Architecture docs are posted here:

http://pages.cpsc.ucalgary.ca/~reeder/cpsc502/

om said...

i know i'm way behind on commenting on this, but i think a killer feature of such a system would not just be backup, but making your data accessible anywhere. such a feature would be quite useful.

Anonymous said...

imho the killer app saves huge costs !

think replacing the daily helpdesk trips and tape replacing, documenting, storing of tapes, cleaning tape drives, buying libraries : with just trusted machines using a backup client !

Think of a company that has 10 TB of data to backup.
Doing that on tape is very expensive.

Say that company has 1000 trusted machines on its LAN, each with 50 GB to spare, each with backup client.
That would give me the ability to backup 50 TB, so i can even save a load of old backups fifo style.
Need more backup space ? Use a few more pc's !

Wow this is PURE GENIUS !!!