GroupLens has gathered for a photograph


Look at this group of nice folks! This GroupLens group photo was taken in the atrium of Keller Hall, where we work.


GroupLens, Fall 2013, back to front:

  • Jacob, Raghav, Zihong, Kate
  • Yilin, Brent, Morten
  • Pik-Mai, Steven, Dan
  • Zihong, Derian, Fernando
  • Vlad, Anu, Michael
  • Ting, Alison, Loren
  • Vikas, Max, Joe
  • Andrew, Daniel, Tien

Technology Review on Wikipedia’s decline



The number of active editors in the English-language Wikipedia, plotted over time


Tom Simonite at Technology Review just published a great piece covering “The Decline of Wikipedia,” in which he cites my work (published in American Behavioral Scientist; see also the free preprint) with Geiger, Morgan, and Riedl exploring potential reasons for Wikipedia’s declining pool of editors (see figure above). In that work, we manually categorized newcomers to Wikipedia by the quality of their edits and built a set of models to predict which high-quality newcomers would continue editing and which ones would leave the project. We showed that the decline is not due to the quality of newcomers but rather the reception they receive: newcomers whose work is immediately rejected and who are sent warning messages about their behavior don’t come back. It looks like the dramatic change in 2007 corresponds to the introduction of counter-vandalism robots and automated tools in Wikipedia that were used to reject newcomers’ edits.


Similarity Functions for User-User Collaborative Filtering


Typically, user-user collaborative filtering has used Pearson correlation to compare users. Early work tried Spearman correlation and (raw) cosine similarity, but found Pearson to work better, and the issue wasn’t revisited for quite some time.

When I was revisiting some of these algorithmic decisions for the LensKit paper, I tried cosine similarity on mean-centered vectors (sometimes called ‘Adjusted Cosine’) and found it to work better (on our offline evaluation metrics) than Pearson correlation, even without any significance weighting. So now my recommendation is to use cosine similarity over mean-centered data. But why the change, and why does it work?
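To make the comparison concrete, here is a sketch of the two similarity functions (NumPy, with NaN marking unrated items; the function names and rating values are mine, not from LensKit):

```python
import numpy as np

def pearson_sim(u, v):
    """Pearson correlation computed over the items both users rated."""
    mask = ~np.isnan(u) & ~np.isnan(v)
    if mask.sum() < 2:
        return 0.0
    # Center each user's co-rated ratings by their mean over the co-rated items only
    uc = u[mask] - u[mask].mean()
    vc = v[mask] - v[mask].mean()
    denom = np.linalg.norm(uc) * np.linalg.norm(vc)
    return float(uc @ vc / denom) if denom > 0 else 0.0

def mean_centered_cosine(u, v):
    """'Adjusted cosine': center by each user's overall mean rating,
    treat unrated items as zero, then take ordinary cosine similarity."""
    uc = np.where(np.isnan(u), 0.0, u - np.nanmean(u))
    vc = np.where(np.isnan(v), 0.0, v - np.nanmean(v))
    denom = np.linalg.norm(uc) * np.linalg.norm(vc)
    return float(uc @ vc / denom) if denom > 0 else 0.0

u = np.array([4.0, 5.0, np.nan, 3.0])
v = np.array([4.0, 5.0, 3.0, np.nan])
print(pearson_sim(u, v))          # 1.0 (perfectly correlated on co-rated items)
print(mean_centered_cosine(u, v))  # 0.5 (full-vector norms shrink the score)
```

One structural difference worth noting: the adjusted-cosine denominator uses each user's full rating vector rather than just the co-rated items, so pairs with little overlap get smaller similarities, which behaves somewhat like a built-in form of significance weighting.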