I've been responding to these individually, but that seems inefficient, so I decided to go ahead and post a list of a few of my favorites.
This list favors breadth: mostly surveys that provide a good introduction, and mostly work that used very large data sets. Follow the citations on CiteSeer if you want to explore in more depth.
- John S. Breese, David Heckerman, Carl Kadie, "Empirical Analysis of Predictive Algorithms for Collaborative Filtering", 1998
- Paul Resnick, Neophytos Iacovou, Mitesh Suchak, Peter Bergstrom, John Riedl, "GroupLens: An Open Architecture for Collaborative Filtering of Netnews", 1994
- Badrul Sarwar, George Karypis, Joseph Konstan, John Riedl, "Analysis of Recommendation Algorithms for E-Commerce", 2000
- Paul S. Bradley, Usama M. Fayyad, Cory A. Reina, "Scaling Clustering Algorithms to Large Databases", 1998
- Douglass R. Cutting, David R. Karger, Jan O. Pedersen, John W. Tukey, "Scatter/Gather: A Cluster-based Approach to Browsing Large Document Collections", 1992
- Jeffrey Dean, Monika R. Henzinger, "Finding Related Pages in the World Wide Web", 1999
- Luiz Barroso, Jeffrey Dean, Urs Hoelzle, "Web Search for a Planet: The Google Cluster Architecture", 2003
- Alistair Moffat, Justin Zobel, David Hawking, "Recommended Reading for IR Research Students", 2005
- Ian H. Witten, Alistair Moffat, Timothy C. Bell, "Managing Gigabytes: Compressing and Indexing Documents and Images", 1999
- Zoltan Gyongyi, Hector Garcia-Molina, Jan Pedersen, "Combating Web Spam with TrustRank", 2004
- Sergey Brin, Lawrence Page, "The Anatomy of a Large-Scale Hypertextual Web Search Engine", 1998
- R. Guha, Ravi Kumar, Prabhakar Raghavan, Andrew Tomkins, "Propagation of Trust and Distrust", 2004