We currently have three datasets available:
- 100,000 ratings for 1,682 movies by 943 users
- 1 million ratings for 3,900 movies by 6,040 users
- 10 million ratings and 100,000 tags for 10,681 movies by 71,567 users
Before using these datasets, please review the README files for the usage license and other details. The README files are available separately below.
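As a rough guide to how sparse each dataset is, the counts listed above can be turned into a rating-matrix density (ratings divided by users times movies). The sketch below is only illustrative: the `density` helper and the dataset labels are hypothetical names, and the rating counts are the rounded figures quoted above, not exact values from the data files.

```python
def density(n_ratings: int, n_users: int, n_movies: int) -> float:
    """Fraction of the user x movie rating matrix that is filled in."""
    return n_ratings / (n_users * n_movies)


# (ratings, users, movies) taken from the rounded counts in the list above.
# The labels are hypothetical shorthand, not official dataset names.
DATASETS = {
    "100K": (100_000, 943, 1_682),
    "1M": (1_000_000, 6_040, 3_900),
    "10M": (10_000_000, 71_567, 10_681),
}

if __name__ == "__main__":
    for label, (ratings, users, movies) in DATASETS.items():
        print(f"{label}: {density(ratings, users, movies):.2%} of possible ratings present")
```

Under these rounded counts, density falls as the datasets grow, so methods that work on the smallest set may need to cope with a much sparser matrix on the largest one.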
WikiLens was a generalized collaborative recommender system that allowed its community to define item types (e.g. beer) and categories (e.g. microbrews, pale ales, stouts), and then rate and get recommendations for items.
It was taken offline in 2009 due to lack of system maintenance and support.
The WikiLens dataset was extracted in February 2008.
The BookCrossing (BX) dataset was collected by Cai-Nicolas Ziegler in a 4-week crawl (August / September 2004) from the Book-Crossing community with kind permission from Ron Hornbaker, CTO of Humankind Systems. It contains 278,858 users (anonymized but with demographic information) providing 1,149,780 ratings (explicit / implicit) about 271,379 books.
Ken Goldberg from UC Berkeley has also released a dataset from the Jester Joke Recommender System. This dataset contains 4.1 million continuous ratings (-10.00 to +10.00) of 100 jokes from 73,496 users.
HP/Compaq Research (formerly DEC Research) ran the EachMovie movie recommender. When EachMovie was shut down, the dataset was made available to the public for use in research. MovieLens was originally based on this dataset. It contains 2,811,983 ratings entered by 72,916 users for 1,628 different movies, and it has been used in numerous CF publications. HP retired the EachMovie dataset in October 2004, and it is no longer available for download.