Tuesday, November 14, 2006

Excellent data mining lecture notes

I have been reading and enjoying the slides from the Stanford CS Data Mining class being taught by Anand Rajaraman and Jeff Ullman.

The talk on recommender systems (PDF) was particularly interesting, with a thorough and insightful look at different techniques (e.g. collaborative filtering, item-to-item, content-based, clustering) for personalization and recommendations. Note that one of the options for the class project is working on the Netflix contest.

The talks on association rules (PDF #1, PDF #2) were fun with some clever applications discussed (e.g. detecting plagiarism) and nice optimizations (e.g. sampling the data set at the limit of main memory multiple times to determine which data can be ignored in a full run).

The clustering talks are also worthwhile, focused on handling very large data sets and clearly explained. Finally, if you are working on web search (or are an evil SEO), it is worth reviewing the talks on PageRank and web spam.

Looks like a great class. Impressive that this is all being covered at the undergraduate level.

5 comments:

Anonymous said...

Thanks Greg!

Anonymous said...

You know, Greg, the universe is so much better because of guys like you.

Thank you, thank you very much, for the content you share.

Ben Hosken said...

Thanks for the link, Greg. For an undergrad class, that is a fantastic overview of recommender systems. Even for most of our clients, it would provide a really good primer on the differences between the systems.

zman said...

I just saw this post months after Greg gave the talk (I was a student in the audience). Just wanted to point out that this is not an undergrad course but a graduate one. (I think pretty much all the students in the class were grads.)

Aravapalli Rama Satish said...

The links mentioned here show "page not available", and some require a password. If anybody has those materials, or any other materials on data mining, please send them to me at: ramsatpm@gmail.com