GroupLens Research has collected and made available rating data sets from the MovieLens web site (http://movielens.umn.edu). The data sets were collected over various periods of time, depending on the size of the set.
You can click the links in the following list or the links in the list of file attachments. The MovieLens 10M data set is not displayed in the file attachments list.
- MovieLens 100k - Consists of 100,000 ratings from 1000 users on 1700 movies.
- MovieLens 1M - Consists of 1 million ratings from 6000 users on 4000 movies.
- MovieLens 10M - Consists of 10 million ratings and 100,000 tag applications applied to 10,000 movies by 72,000 users.
Before using these data sets, please review their README files for the usage licenses and other details. The README files are available as separate downloads below.
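As a rough illustration of how sparse these rating sets are, the approximate counts listed above can be turned into a rating density, meaning the fraction of all possible user-movie pairs that actually carry a rating. The sketch below is only an estimate: it assumes the rounded figures quoted in the list, and the names MOVIELENS_SIZES, density, and density_table are made up for this example and are not part of any GroupLens tooling.

```python
# Approximate sizes as quoted in the list above: (ratings, users, movies).
# These are rounded figures, so the results are estimates only.
MOVIELENS_SIZES = {
    "MovieLens 100k": (100_000, 1_000, 1_700),
    "MovieLens 1M": (1_000_000, 6_000, 4_000),
    "MovieLens 10M": (10_000_000, 72_000, 10_000),
}


def density(ratings, users, items):
    """Return the fraction of the user-by-item matrix that holds a rating."""
    return ratings / (users * items)


def density_table(sizes):
    """Map each data set name to its estimated rating density."""
    return {name: density(*counts) for name, counts in sizes.items()}


if __name__ == "__main__":
    for name, value in density_table(MOVIELENS_SIZES).items():
        print(f"{name}: {value:.2%} of user-movie pairs rated")
```

With these rounded counts, each set has ratings for only a few percent of its possible user-movie pairs, and the larger sets are sparser than the smaller ones.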
The 2nd International Workshop on Information Heterogeneity and Fusion in Recommender Systems (HetRec 2011, http://ir.ii.uam.es/hetrec2011) has released datasets obtained from the Delicious, Last.fm Web 2.0, MovieLens, IMDb, and Rotten Tomatoes systems. These datasets contain social networking, tagging, and resource-consuming (Web page bookmarking and music artist listening) information from sets of around 2,000 users.
The datasets were generated by the Information Retrieval Group at Universidad Autónoma de Madrid (http://ir.ii.uam.es):
- Delicious bookmark data (105,000 bookmarks from 1867 users)
- Last.fm music artist data (92,800 artist listening records from 1892 users)
- MovieLens rating data with IMDb and Rotten Tomatoes links (86,000 ratings from 2113 users)
Before using these datasets, please review the README files for the usage license and other details. The README files are available separately below.
WikiLens was a generalized collaborative recommender system that allowed its community to define item types (e.g. beer) and categories (e.g. microbrews, pale ales, stouts), and then rate and get recommendations for items.
It was taken offline in 2009 due to lack of system maintenance and support.
This data set was extracted in February 2008.
The BookCrossing (BX) dataset was collected by Cai-Nicolas Ziegler in a 4-week crawl (August / September 2004) from the Book-Crossing community with kind permission from Ron Hornbaker, CTO of Humankind Systems. It contains 278,858 users (anonymized but with demographic information) providing 1,149,780 ratings (explicit / implicit) about 271,379 books.
Ken Goldberg from UC Berkeley has also released a dataset from the Jester Joke Recommender System. This dataset contains 4.1 million continuous ratings (-10.00 to +10.00) of 100 jokes from 73,496 users.
HP/Compaq Research (formerly DEC Research) ran the EachMovie movie recommender. When EachMovie was shut down, the dataset was made available to the public for use in research. MovieLens was originally based on this dataset. It contains 2,811,983 ratings entered by 72,916 users for 1628 different movies, and it has been used in numerous CF publications. HP retired the EachMovie dataset in October 2004, and it is no longer available for download.