A new MovieLens data set was made available today.  Known internally as the 10M100K data set, it contains 10,000,000 movie ratings and 100,000 tags.  Like every previous MovieLens data set, it contains user ratings data, but it is ten times as large as the last release.

This new release also contains, for the first time, tag data.  Tags are small bits of user-generated metadata about movies.  MovieLens first added tagging features two years ago, in January 2006, and has since grown an active movie-tagging community.

Also included in the release is a tool for splitting the ratings data into subsets for cross-validation of prediction algorithms.
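
For readers unfamiliar with cross-validation splits, here is a minimal Python sketch of the general idea.  It is not the tool shipped with the release, and it does not read the release's files.  The function name `k_fold_splits`, the `(user_id, movie_id, rating)` tuple layout, and the sample ratings are all assumptions made up for illustration.

```python
import random


def k_fold_splits(ratings, k=5, seed=0):
    """Yield k (train, test) pairs; each rating lands in exactly one test fold."""
    shuffled = list(ratings)
    # A seeded generator keeps the split reproducible between runs.
    random.Random(seed).shuffle(shuffled)
    # Deal the shuffled ratings into k folds, round-robin.
    folds = [shuffled[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [r for j, fold in enumerate(folds) if j != i for r in fold]
        yield train, test


# Hypothetical (user_id, movie_id, rating) tuples, invented for this example.
ratings = [(user, movie, 3.0) for user in range(1, 5) for movie in range(1, 6)]
splits = list(k_fold_splits(ratings, k=5))
```

A prediction algorithm would be trained on each `train` list and scored on the matching `test` list, with the k scores averaged.  This sketch splits at random across all ratings; a real evaluation might instead split per user or by time, depending on what the algorithm is meant to predict.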

The read-me file and the data are available for download on the MovieLens Data Sets page.
