This post describes work being presented at WWW 2014 by Tien Nguyen.
Those of you following recommender systems have almost certainly heard the debate about filter bubbles. This concept, perhaps best articulated by Eli Pariser, argues that recommenders have the potential to trap users into increasingly similar content, isolating them from the diversity of content that makes people rich learners.
We decided to test this concept empirically, using longitudinal data from MovieLens. Specifically, we wanted to answer two questions:
Do recommended movies get narrower as users continue to rate movies?
Do users consume narrower movies — and if so, is this a consequence of taking recommendations?
What we found surprised us.
First, we did find that users of MovieLens received narrower recommendations as they continued to rate movies. And we also found that user consumption (measured by the movies they rated) narrowed over time.
But we found something neither we nor Pariser expected. Users who took recommendations from MovieLens actually narrowed their consumption less than users who didn’t! In other words, it seems that narrowing consumption is just the natural state of developing preferences, and that recommenders can help maintain exposure to more diverse content than users would otherwise consume.
To tackle this question, we had to take on a few interesting challenges. The approach we used may be valuable to others, so we summarize the three most important parts here and encourage you to read our paper (link) for more details.
Measuring movie content similarity
The traditional method of computing movie diversity from user rating vectors does not tell us how movies are similar in content. Instead, we use the tag-genome dataset proposed by Vig et al.1, which provides a more expressive way to describe the content of a movie than user rating vectors do. For more details, please consult our paper.
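As a minimal sketch of the idea (not the exact metric from our paper): each movie is represented by a vector of tag-relevance scores from the tag genome, and the diversity of a set of movies can be taken as the average pairwise distance between those vectors. The `genome` mapping and the choice of Euclidean distance here are illustrative assumptions.

```python
from itertools import combinations
from math import sqrt

def tag_genome_distance(u, v):
    # Euclidean distance between two tag-genome relevance vectors
    # (an illustrative choice; other distance metrics work too).
    return sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def content_diversity(movies, genome):
    # Average pairwise distance over a set of movies.
    # `genome` maps movie id -> tag-genome vector (hypothetical layout).
    pairs = list(combinations(movies, 2))
    return sum(tag_genome_distance(genome[a], genome[b])
               for a, b in pairs) / len(pairs)
```

A higher value means the set of movies is more spread out in tag-genome space, i.e. more diverse in content.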
Assessing users’ lifecycles
Our objective in this study is to examine the temporal effect of recommender systems on users throughout their lifecycles. To do so, we divide the rating history of each user into discrete intervals called ‘rating blocks’. A rating block is defined as follows:
Given a history of what a user rated, we perform the following steps:
We remove the first 15 ratings, because these are given on movies MovieLens suggests to new users in order to learn their preferences.
We remove the first 3 months of ratings after the first 15 ratings in order to:
avoid counting ratings of movies the user watched before joining MovieLens,
give the user sufficient time to learn how to use MovieLens, and
give MovieLens sufficient time to learn the user’s preferences.
For the rest of the rating history, we define a rating block as 10 consecutive ratings. If the last few ratings cannot form a complete 10-rating block, we remove them from the analysis.
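The steps above can be sketched in a few lines. This is an illustrative implementation, not our exact pipeline; it assumes ratings arrive as chronologically sorted `(timestamp, movie_id)` pairs with timestamps in seconds, and measures the 3-month warm-up from the first rating after the initial 15.

```python
def rating_blocks(ratings, skip_first=15, warmup_days=90, block_size=10):
    # `ratings`: chronologically sorted list of (timestamp, movie_id).
    rest = ratings[skip_first:]                 # drop the first 15 sign-up ratings
    if not rest:
        return []
    cutoff = rest[0][0] + warmup_days * 86400   # skip the ~3-month warm-up period
    rest = [r for r in rest if r[0] >= cutoff]
    # Keep only complete 10-rating blocks; drop the leftover tail.
    n = len(rest) // block_size
    return [rest[i * block_size:(i + 1) * block_size] for i in range(n)]
```

Each returned block is then a unit of analysis: we can compute content diversity per block and track how it changes over a user’s lifecycle.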
Identifying recommendation takers
The purpose of our study is to investigate the long-term effect of using recommender systems on content diversity. To this end, it is useful to compare two groups of users: one that consumes recommendations consistently over time, and one that does not. For each user, we compute the percentage of rating blocks in which he followed recommendations from MovieLens. We then plot these percentages from highest to lowest, as in the figure below. From the figure, we can see two natural cut-off points: 50% and 0%. Hence, we define:
Following Group: users who took recommendations in at least 50% of their rating blocks.
Ignoring Group: users who did not take recommendations at all.
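A minimal sketch of this classification, assuming we already have a per-block boolean flag saying whether the user followed a recommendation in that block (the flag computation itself is covered in the paper):

```python
def classify_user(block_follow_flags):
    # `block_follow_flags`: one boolean per rating block, True if the
    # user followed a recommendation in that block (assumed given).
    pct = sum(block_follow_flags) / len(block_follow_flags)
    if pct >= 0.5:
        return "following"   # took recommendations in >= 50% of blocks
    if pct == 0:
        return "ignoring"    # never took recommendations
    return None              # users between the cut-offs are excluded
```

Users falling between the two cut-offs are left out of the comparison, which keeps the two groups cleanly separated.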
After identifying our two groups of interest, we examine the effect of recommender systems on content diversity within each group. We measure the shift in means of the content diversity distributions between the beginning and the end of the rating histories of all users in the group. We do the same for measuring the effect on user experience. We call this the within-group comparison.
To examine how the effect on content diversity or user experience differs between the two groups, we measure the shift in means of the two groups’ distributions at the beginning of users’ rating histories, and again at the end of their rating histories. We call this the between-group comparison.
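In code, both comparisons reduce to differences of distribution means. This sketch assumes we have, per user, the content diversity of their first and last rating blocks (the statistical testing around these shifts is in the paper):

```python
def mean(xs):
    return sum(xs) / len(xs)

def within_group_shift(first_block_div, last_block_div):
    # Within-group: shift in mean content diversity from the first
    # to the last rating block, across all users in one group.
    return mean(last_block_div) - mean(first_block_div)

def between_group_gap(group_a_div, group_b_div):
    # Between-group: difference in mean diversity between two groups
    # measured at the same point in their lifecycles.
    return mean(group_a_div) - mean(group_b_div)
```

Computing the between-group gap at both the beginning and the end of users’ lifecycles shows whether the two groups diverge over time.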
The figure below visualizes our comparison method.
We have described a few of the challenges in studying the long-term effect of recommender systems on users. There are several others, such as how we know which movies come from recommendations, or how we measure the experience and the content similarity of the movies consumed by the two groups. For curious readers, please see our paper for more details. See you at WWW 2014.
1 J. Vig, S. Sen, and J. Riedl. Navigating the tag genome. In Proceedings of the 16th international conference on Intelligent user interfaces, pages 93–102. ACM, 2011.