We currently have three datasets available:
- 100,000 ratings for 1,682 movies by 943 users
- 1 million ratings for 3,900 movies by 6,040 users
- 10 million ratings and 100,000 tags for 10,681 movies by 71,567 users
Before using these datasets, please review the README files for the usage license and other details. The README files are available separately below.
The BookCrossing (BX) dataset was collected by Cai-Nicolas Ziegler in a 4-week crawl (August / September 2004) from the Book-Crossing community with kind permission from Ron Hornbaker, CTO of Humankind Systems. It contains 278,858 users (anonymized but with demographic information) providing 1,149,780 ratings (explicit / implicit) about 271,379 books.
Ken Goldberg from UC Berkeley has also released a dataset from the Jester Joke Recommender System. This dataset contains 4.1 million continuous ratings (-10.00 to +10.00) of 100 jokes from 73,496 users.
HP/Compaq Research (formerly DEC Research) ran the EachMovie movie recommender. When EachMovie was shut down, the dataset was made available to the public for use in research. MovieLens was originally based on this dataset. It contains 2,811,983 ratings entered by 72,916 users for 1,628 different movies, and it has been used in numerous CF publications. HP retired the EachMovie dataset in October 2004, and it is no longer available for download.