On the timely issue of preserving privacy in anonymized data sets, Dan Frankowski (researcher at GroupLens, intern at Google, currently attending SIGIR) recently gave a talk at Google called "You Are What You Say: Privacy Risks of Public Mentions". I had a chance to watch it this morning.
The talk discusses some interesting examples of how anonymized data sets can be combined with other public data sets to reveal private information.
The most dramatic example given in the talk is how a former governor's medical records were revealed by combining an anonymized data set of medical records with publicly available voter registration records.
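To make the idea concrete, here is a minimal sketch of this kind of linkage attack. It is not from the talk: the records are invented, and the choice of ZIP code, birth date, and sex as the shared fields is my assumption about which attributes the two data sets have in common. The names `QUASI_IDENTIFIERS`, `key`, and `link` are hypothetical.

```python
# Toy linkage attack. Every record below is invented for illustration,
# and the shared fields are an assumption, not something from the talk.
from collections import defaultdict

# Fields assumed to appear in BOTH data sets.
QUASI_IDENTIFIERS = ("zip", "birth_date", "sex")

# "Anonymized" medical records: names removed, other fields kept.
anonymized_medical = [
    {"zip": "00001", "birth_date": "1950-01-01", "sex": "M", "diagnosis": "condition A"},
    {"zip": "00002", "birth_date": "1962-03-15", "sex": "F", "diagnosis": "condition B"},
    {"zip": "00002", "birth_date": "1962-03-15", "sex": "F", "diagnosis": "condition C"},
]

# Public records that carry names alongside the same fields.
public_voter_list = [
    {"name": "Pat Example", "zip": "00001", "birth_date": "1950-01-01", "sex": "M"},
    {"name": "Sam Sample", "zip": "00002", "birth_date": "1962-03-15", "sex": "F"},
]


def key(record):
    """Build the join key from the shared fields."""
    return tuple(record[field] for field in QUASI_IDENTIFIERS)


def link(medical, voters):
    """Return {name: diagnosis} for voters matching exactly one medical record."""
    by_key = defaultdict(list)
    for row in medical:
        by_key[key(row)].append(row)

    matches = {}
    for voter in voters:
        candidates = by_key.get(key(voter), [])
        # Only a unique match re-identifies someone; several candidates
        # with the same key leave the attacker guessing.
        if len(candidates) == 1:
            matches[voter["name"]] = candidates[0]["diagnosis"]
    return matches


reidentified = link(anonymized_medical, public_voter_list)
print(reidentified)  # {'Pat Example': 'condition A'}
```

In this toy data, the first person is re-identified because only one medical record shares their combination of fields, while the second is not, because two records share it. That difference is why coarsening or suppressing fields helps privacy, and also why it costs the data set some of its usefulness.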
Unfortunately, the talk largely concludes that it is "hard to preserve privacy" against all forms of this kind of attack while preserving the usefulness of the data set.
Update: Dan gave a nearly identical talk at SIGIR today.