Blogs

Kids these days: the quality of new Wikipedia editors over time

I just posted an entry in Wikimedia's blog explaining part of a study I'm working on with some Wikimedians (Wikipedians working at the Wikimedia Foundation). In response to speculation that the English Wikipedia's editor decline could be the result of a general decrease in the quality of newcomers to the site, we performed a hand-coded evaluation of the first few edits performed by editors over time.

Overall, we found that the quality of newcomers has not substantially decreased since 2006. However, the rate at which these good newcomers have their contributions reverted or deleted has been rising over time, while the survival rate of good new editors has been falling. This supports our working hypothesis that the increased rate of rejection for new editors is causally related to the decline in the survival of new editors.

See the full report here: http://meta.wikimedia.org/wiki/Research:Newcomer_quality

This analysis is part of a larger contribution in submission to a special issue of American Behavioral Scientist on Wikis. Stay tuned.

Wikipedia's Gender Gap

When we saw Noam Cohen's January 2011 New York Times article about Wikipedia's large gender gap, we wondered what light we could shed on the questions and observations raised by Mr. Cohen and the results of the Wikimedia Foundation's 2009 survey. Drawing upon the experience and the data sets that we've accumulated while researching Wikipedia and other online communities for the past decade, we explored Wikipedia's gender imbalance and wrote a paper about our findings. We've recently heard that our paper has been accepted for presentation at WikiSym 2011.

Look below the jump for a summary of our findings. For those who are interested in more details, the full paper is available here on our publications list.

Goog-411 Hanging Up

I'm sad to see that goog-411 is shutting down on November 12, 2010. If you haven't used it, goog-411 is a service you could dial with an 800 number (1-800-GOOG-411) to access a speech recognition system that would help you find businesses in any city and state in the US. I used it frequently on the road, to find places to eat in cities that were coming up on the map. The service was impressively accurate, simple to use, and could be used from *any* phone in the country, generally for free.

Now that I'm an Android user, I confess that Google Maps has just about wiped out the need for GOOG-411. But I feel sad for all the non-Android folk in the country. What are they going to do now that Google is "putting all of our resources into speech-enabling the next generation of Google products and services across a multitude of languages"? Will there be tools for the non-smartphone generation?

In addition to being sad on its own merits, the shuttering of GOOG-411 is an important reminder that not all useful services can find a way to be paid for. I'm sure that part of the problem for GOOG-411 was that Google could not figure out a way to put ads onto the service without annoying its users. That's a difficult balance, and one that I'm sad Google could not manage for its excellent goog-411 service.

John

Google TV: Finally a device that recognizes that TV is just a way of consuming content

The Read/Write Web story on why Google TV might be a game changer (http://www.readwriteweb.com/archives/google_tv_will_change_the_way_people_live_their_li.php?utm_source=feedburner&utm_medium=feed&utm_campaign=Feed%3A+readwriteweb+%28ReadWriteWeb%29&utm_content=Google+Reader)
does a nice job of explaining the many advantages of a television device that lets you display all of the content you have permission to display on one device. It has been an amazingly slow path to get here: producers of television content are on the one hand doing deals to get their content onto the Internet, while on the other hand working to prevent people from displaying that Internet content on their televisions! This is a crazy world! We should be focused on creating fair ways to compensate the people who create content, and then working on making the consumption of that content as free as possible. There are thousands of ways to consume a television show -- most of them not invented yet -- only one of which starts with the show coming over the air, down an antenna, and being displayed on a television device in real time. I, for one, am very excited to see the Google TV, and to see how much it opens the television platform.

John

Talk to Me -- in German!

A wonderful innovation in the study of foreign languages is the use of the Internet to connect learners to native speakers. In some cases the learners write text that is commented on by the native speakers, while in other cases the two can talk with each other, such as in the Skype foreign language forums. These services provide a wonderful way for people to learn the truly important parts of a language: how to communicate with someone else from a different place and with a different background. Too often language skill acquisition is about formalisms and structure, rather than about communication.

An even more innovative way of learning language may be the ideas that Luis von Ahn is exploring in yet another one of his creative games. He is developing tools that allow native speakers of one language to help translate texts from another language that they do not know. The idea is that the tools will show the native speaker how to translate individual words, and the speaker will then fashion the result into idiomatically correct language in his or her native tongue. It is too early to know how well this will work or, if it does work, whether the native speaker will actually be learning the other tongue or just volunteering time in a useful way. In either case, the idea is fresh and interesting and I look forward to seeing how it works in practice.

Net Neutrality and Innovation

This story in the New York Times (http://www.nytimes.com/2010/08/05/technology/05secret.html?_r=2) discusses negotiations going on between Google and Verizon so that Google services can get special access to Verizon's data network. These sorts of agreements are a serious threat to competition on the Internet. The problem is that only large established players are going to be able to afford to pay for the enhanced service. Startup companies will be unable to get fast access to their services for consumers who might otherwise be interested.

The growth of the Internet is threatened if innovation can be stifled by these sorts of pairwise agreements. In order to encourage freedom and innovation we need to find a way to regulate the types of agreements that can be formed, and to ensure that others have access to the same levels of service quality. These principles are important, and they require regulation in order to create and maintain a fair and competitive marketplace of ideas.

Groundhog Day, Usability Testing, and Creativity

There's a lovely article about the movie Groundhog Day. The article talks about a lot of issues in the movie, but ends with the wonderful quote:

A/B testing is like sandpaper. You can use it to smooth out details, but you can't actually create anything with it.

This thought reminds me of Don Norman's comment that one of the risks for the field of CHI is that we become so focused on analysis that we never actually create anything new.

John

On Critical Mass

I finally got around to carefully reading "A Theory of the Critical
Mass..." by Oliver, Marwell, and Teixeira. Now I'm asking: what took
me so long?

The article formalizes the notion of critical mass in collective
action. It identifies two main independent variables that can
influence the "probability, extent, and effectiveness of group actions
in pursuit of collective goods":

  • The form of the "production function" that relates "contributions of
    resources to the level of the collective good". Two important
    categories of production functions are: (a) decelerating: the
    "first few units of resources contributed have the biggest effect on the
    collective good, and subsequent contributions progressively less"; (b)
    accelerating: "successive contributions generate progressively
    larger payoffs; therefore, each contribution makes the next one more
    likely."
  • The "heterogeneity of interests and resources" in the population of
    potentially interested actors.
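
To make the two categories concrete, here is a small sketch. The specific shapes (a square root for decelerating, a square for accelerating) are my own illustrative assumptions, not functions taken from the paper; the point is only how the marginal return of each successive unit behaves.

```python
import math
from typing import Callable, List


def decelerating(total_resources: float) -> float:
    """Illustrative (assumed) decelerating production function: square root."""
    return math.sqrt(total_resources)


def accelerating(total_resources: float) -> float:
    """Illustrative (assumed) accelerating production function: square."""
    return total_resources ** 2


def marginal_returns(production: Callable[[float], float], units: int) -> List[float]:
    """Gain in the collective good produced by each successive unit contributed."""
    return [production(k) - production(k - 1) for k in range(1, units + 1)]


if __name__ == "__main__":
    # Decelerating: each unit adds less than the one before it.
    print("decelerating:", [round(g, 3) for g in marginal_returns(decelerating, 4)])
    # Accelerating: each unit adds more than the one before it.
    print("accelerating:", [round(g, 3) for g in marginal_returns(accelerating, 4)])
```

With these assumed shapes, the first list shrinks from one unit to the next and the second grows, which is all the two definitions above require.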

The authors then show that the problems and opportunities for
collective action are very different for accelerating vs. decelerating
production functions and for homogeneous vs. heterogeneous populations
of actors. I'm not going to summarize the findings: the paper is a
joy to read, so I mostly want to urge you to do that.

However, there were a couple of ideas that I found particularly
relevant to issues in open content systems that I care about, so I did
want to mention them.

First, this work looks at critical mass in "public" goods, where all
the value is created by a group of people. This is true for many open
content systems: Wikipedia and OpenStreetMap are two good
examples. However, this isn't true of other systems, including our
Cyclopath bicycle routing system. Cyclopath began with a nearly
complete transportation map created from Mn/DOT data and with a good
objective route-finding algorithm that did not require user
input. While we have shown that user input improves route-finding
significantly and that algorithms based on user input are better than
purely objective algorithms, I think it's fair to say that most of the
value of the Cyclopath "good" already was present before any user
contributions were made. It's interesting to consider how the concepts
of this paper can be applied to a system like Cyclopath.

Second, Oliver et al. show that with decelerating production
functions, the optimal outcome would be achieved if the *least*
interested people contribute first and the *most* interested people
contribute later. This obviously isn't the way it usually works. They
point out that one way to make this happen is for the most interested
parties to "hold back"; perhaps they can offer "matching
contributions" to entice less interested parties to contribute early
in the process. This might suggest new
intelligent-task-routing-like strategies to elicit participation in
open content communities.

Third, many of the illustrative examples the authors give concern the
different opportunities for collective action in "upper middle class"
vs. "lower income" neighborhoods. I wonder: what's the equivalent of
an "upper middle class" open content system?

Fourth, the notion of "interest" presumed here is one of direct
tangible personal benefit: if I give N dollars, I'm increasing the
chances that I'll receive M dollars (M >> N). However, we know that
many contributors to open content systems (and many 'volunteers', too)
contribute for other types of reasons, e.g., they "believe" in the
public good, they are altruistic, or they want to build a
reputation. For example, in Cyclopath, our most active editors don't
request many routes. For another example, other researchers have shown
that there are many users in discussion forums who just answer
questions and don't ask any of their own.

Fifth, finally, and simply, I'd like to empirically measure the
production function in various open content systems. I suspect that in
many cases it is decelerating: i.e., early units of contribution are
proportionally more valuable. I'd also like to measure this for
individual users. Doing this calculation requires a way to measure the
global quality of an open content system as well as the quality for a
particular user. We can do both of these for Cyclopath. We can do the
latter for MovieLens... not sure about the former.
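
As a rough sketch of what that measurement could look like: assume we already have some quality score recorded after each successive unit of contribution (how to compute such a score is the hard part, and is not addressed here). The function names and the classification rule below are hypothetical, and real measurements would be noisy enough to need smoothing before a test like this means anything.

```python
from typing import List, Sequence


def marginal_gains(quality: Sequence[float]) -> List[float]:
    """Change in the (assumed) quality score from one contribution to the next."""
    return [later - earlier for earlier, later in zip(quality, quality[1:])]


def classify_production_function(quality: Sequence[float]) -> str:
    """Label a series of cumulative quality scores by the trend in its gains.

    Returns "decelerating" if the gains never grow and shrink at least once,
    "accelerating" if they never shrink and grow at least once,
    "undetermined" if there are fewer than two gains to compare,
    and "neither" otherwise (for example, constant or mixed gains).
    """
    gains = marginal_gains(quality)
    if len(gains) < 2:
        return "undetermined"
    pairs = list(zip(gains, gains[1:]))
    if all(b <= a for a, b in pairs) and any(b < a for a, b in pairs):
        return "decelerating"
    if all(b >= a for a, b in pairs) and any(b > a for a, b in pairs):
        return "accelerating"
    return "neither"


if __name__ == "__main__":
    # Made-up scores purely for illustration, not data from any real system.
    print(classify_production_function([0, 5, 8, 10, 11]))
    print(classify_production_function([0, 1, 3, 6, 10]))
```

The same function could be run on a single user's quality series or on the system-wide series, which matches the two measurements described above.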

Netbeans + Subversion + Windows XP

For my teaching I've been using Netbeans this semester, and overall it has been wonderful: an even better experience than Eclipse for teaching, though both have a steeper learning curve than I'd prefer.
I've enjoyed Netbeans' built-in Subversion support.  (This is not a differentiator with Eclipse, just a comment.)  However, getting Subversion working reliably with Netbeans on a Windows box is a bit fiddly, and the online documentation makes it seem easier than it is.  It's easiest to break the setup into steps, and get each of them working before moving on to the next step.  (Part of what makes the documentation a bit complicated is that there are many alternatives.  I'm just going to describe one simple alternative, which assumes that you have a shell account on the Unix computer that contains your Subversion repository.)  Here are the steps:
1. Get plink (from putty) working on your box.  Plink will be used by CollabNet to tunnel svn+ssh subversion connections.  First install the full putty from the web site.  Then create a .ssh key for putty using ssh-keygen, store it in a safe place on your Windows computer, and install the key in the authorized_keys file on your Unix server.  Then test with:
./PLINK.EXE -v -l <username> -i c:/path/to/key/file/id_rsa_putty.ppk <remote-host>
The result should be an ssh session to your remote host.  (plink is not a good client to actually use for ssh -- prefer putty -- but this is a simple test that it's working.)  (I'm using forward slashes in the above because I run it in cygwin shells.  You'll need backward slashes if you run it in the traditional Windows command console.)
2. Install CollabNet's Subversion Client.  They have a simple installer.
3. Look in your Application Data directory for the Subversion subdirectory.  (It's possible you have to run the Subversion Client once to cause this directory to be created.)  Edit the config file in that directory.  Look for the tunneling section, which is headed "[tunnels]". In that section, after all the comments, add a line:
ssh = c:/Program Files/putty-0.60/plink.exe -v -l <username> -i c:/path/to/keyfile/id_rsa_putty.ppk
Here you use forward slashes, because the Subversion Client will translate them.  The path to plink.exe should be changed to wherever you put plink. Adding this line to the config file tells the Subversion Client what command to use with URLs of the form svn+ssh.
4. Test the subversion client from the command line with:
./svn ls svn+ssh://<remote-host>/path/to/remote/svn-repo
If this works you have a working Subversion client on Windows, which is 80% of the battle!
5. In Netbeans go to Tools/Options/Miscellaneous/Versioning and set the Path to the SVN Client to:
C:\Program Files\CollabNet\Subversion Client
(or wherever you installed Subversion).
6. Right click on a directory and you should be able to use Subversion Update and Commit commands!
Occasionally when things are tricky the Netbeans client gets confused.  I just use the command-line client to do an svn update, and all is usually well after that.
One issue to watch out for: Subversion is very sensitive to version changes.  The working copy (checked out version) will be updated by the Subversion client to the style that version of the client expects.  So if you use both a Netbeans client and a command-line client you should make sure they're the same "point" version number.  (E.g., they should both be 1.6.x, though they can have different xs.)
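If you want to check this mechanically, here is a minimal sketch that compares two version strings by their major and minor numbers only. The function names are hypothetical, and it assumes plain version strings such as "1.6.5"; how you obtain each client's version is up to you.

```python
import re
from typing import Tuple


def point_series(version: str) -> Tuple[int, int]:
    """Return (major, minor) from a version string such as "1.6.5"."""
    match = re.match(r"(\d+)\.(\d+)", version.strip())
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return int(match.group(1)), int(match.group(2))


def same_point_series(version_a: str, version_b: str) -> bool:
    """True when both versions share major.minor (e.g. both are 1.6.x)."""
    return point_series(version_a) == point_series(version_b)


if __name__ == "__main__":
    # Example version strings; substitute the ones your own clients report.
    print(same_point_series("1.6.5", "1.6.17"))
    print(same_point_series("1.6.5", "1.7.0"))
```

For the command-line client, `svn --version --quiet` prints just the version number, which is a convenient input for a check like this.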
Good luck!
John

An Exciting Time for Cyclopath

One of the premier research platforms around here is Cyclopath, a geowiki and route-finding service for Twin Cities bicyclists.

Now, we've expected Google's announcement that they were getting into the bicycle routing business for some time. But that doesn't mean yesterday was relaxed for us. :)

After sleeping on it (and speaking for myself), I think this development is actually either neutral or good. We're in a different niche than Google -- we're focused on open content and community, not just maps, and we're strongly local with personal connections to the cycling community and local agencies. And on the plus side: almost all of the reactions from the community I saw on the social web were very supportive of us, and I've never seen so much passion at Cyclopath Headquarters as I did yesterday!

We'll continue to write and publish consistent with our excellent track record (e.g., of the 5 papers we've submitted to top-tier conferences, 4 have been accepted on the first try and 2 have been nominated for Best Paper).

Details on what Google's announcement means for Cyclopath, from the user perspective, are here.

Lastly, and off-topic, please follow @grouplens and @cyclopath_hq on Twitter!
