Are you trying to launch an online site for customers? Did you know that, on average, 60% of users do not return to a site after their first visit?

In this research, we identify factors that predict whether first-time users return to MovieLens, our movie recommendation site. A model based on these factors correctly predicts 70% of returning users (and of non-returning ones). Notably, the best single predictor of user return is the diversity of features explored in the user’s first session! Along the way, we develop a process and a metric for activity diversity that can be applied to any site or context. Interested in further details? Read on!

Activity diversity of a user on a site

From a discussion with psychologist Mark Snyder, a professor at the University of Minnesota, my collaborator (Joe Konstan) and I learned that a volunteer who is aware of more ways of getting involved with an organization is likely to remain committed longer than one who isn’t. For example, a Red Cross volunteer who is rotated through different activities, such as giving out cookies, checking people in for blood drives, and helping hurricane victims, is more likely to stay committed to the Red Cross in the long run than a volunteer who only gives out cookies. We wondered whether the same holds for online sites. A site like MovieLens has 17 different and very diverse features (see figure). We hypothesized that someone who tries out more features in the first session is more likely to return to the site than someone who doesn’t.

But how do we capture the diversity of features tried? From the above figure, we see that some features, such as providing ratings and providing tags, are very similar (both are annotations), but they are very different from inviting your friends to the site or managing your personal profile. So a simple count of the number of features used is not a good diversity metric. Also, most existing diversity metrics, such as the Gini-Simpson Index, increase with the level of overall activity, which makes it difficult to tease out and study the effect of diversity independently of the amount of activity. These challenges led us to design a new metric.
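To make the first limitation concrete, here is a minimal Python sketch. It computes a plain feature count and the standard Gini-Simpson Index (one minus the sum of squared proportions) for two sessions. The feature names and the sessions are hypothetical and are not taken from MovieLens logs. Both measures treat every pair of features as equally different, so they cannot tell a session with two similar features from a session with two very different ones.

```python
from collections import Counter


def feature_count(events):
    """Number of distinct features used in a session."""
    return len(set(events))


def gini_simpson(events):
    """Gini-Simpson Index: 1 - sum(p_i ** 2) over feature usage proportions."""
    counts = Counter(events)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return 1.0 - sum((c / total) ** 2 for c in counts.values())


# Hypothetical sessions: two similar features vs. two very different ones.
similar_session = ["rate_movie", "tag_movie"]
different_session = ["rate_movie", "invite_friend"]

for name, session in [("similar", similar_session), ("different", different_session)]:
    print(name, feature_count(session), gini_simpson(session))
```

Both sessions receive the same count and the same index value, even though rating plus tagging is intuitively a less diverse experience than rating plus inviting a friend.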

In this work, we developed a metric called DSCORE, based on a distance-tree analysis of the site’s features, that addresses all these challenges. First, we used a card-sorting technique to organize all of the site’s features into a tree (asking site users and administrators to cluster together related features). We then computed DSCORE, which aggregates the diversity of a user’s experiences, giving higher weight to activities that are farther apart in the classification tree. Using DSCORE together with another metric, ASCORE, which captures the total amount of activity, we built machine learning models for user retention. Our results show that diversity is the best single predictor of first-time user retention. If future experiments confirm our findings, sites could be redesigned to offer a more diverse first-time experience that gets new users to stay longer.
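The idea of weighting activities by how far apart they sit in a feature tree can be sketched in a few lines of Python. This is an illustration only and not the DSCORE formula from the paper: the tree, the feature names, and the choice to sum pairwise distances over distinct features are all assumptions made for this example.

```python
from itertools import combinations

# Hypothetical feature tree: each feature maps to its path from the root.
# A real tree would come from a card-sorting exercise.
FEATURE_PATHS = {
    "rate_movie": ("contribute", "annotate", "rate_movie"),
    "tag_movie": ("contribute", "annotate", "tag_movie"),
    "invite_friend": ("social", "invite_friend"),
    "edit_profile": ("account", "edit_profile"),
}


def tree_distance(a, b):
    """Number of tree edges between two features via their lowest common ancestor."""
    path_a, path_b = FEATURE_PATHS[a], FEATURE_PATHS[b]
    common = 0
    for x, y in zip(path_a, path_b):
        if x != y:
            break
        common += 1
    return (len(path_a) - common) + (len(path_b) - common)


def session_diversity(events):
    """Assumed aggregation: sum of tree distances over all pairs of distinct features.

    Repeated use of the same feature does not change the result, so the
    value does not grow just because the user was more active.
    """
    features = sorted(set(events))
    return sum(tree_distance(a, b) for a, b in combinations(features, 2))


print(session_diversity(["rate_movie", "tag_movie", "rate_movie"]))
print(session_diversity(["rate_movie", "invite_friend"]))
```

In this toy tree, rating and tagging share a parent and so contribute a small distance, while rating and inviting a friend meet only at the root and contribute a larger one. A separate activity measure, for example the number of events in the session, could then play the role that the post assigns to ASCORE.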

Check out my paper on early activity diversity for more details!

Written by

Raghav Pavan Karumur is a Ph.D. Candidate at GroupLens Research. Broadly, his research interests include recommender systems, personalization, user modeling, social computing, applied machine learning, and feature engineering. He applies concepts from social science, behavioral science, and organizational theory to understand individual as well as crowd behaviors in online communities, in order to personalize and design better systems for users, recommend items matching their preferences, and devise strategies that increase their commitment and motivate contributions. He was a member of IEEE and is currently a member of ACM. He has published in top-tier conferences in Human-Computer Interaction such as CHI, CSCW, UMAP, and RecSys, and in top-tier journals such as IJHCI and Information Systems Frontiers. He has reviewed for and served on the Program Committee of conferences and journals such as UMAP, RecSys, CHI, DIS, and the Journal of Intelligent Information Systems.
