Zotero Citation Manager

I’ve been a moderately happy citeulike user for quite some time, but their community-developed scrapers have become less and less reliable: more and more often, I have had to add citations to my repository by hand. So I’m going to try something new; perhaps you’ve heard of it?


Zotero is a Firefox extension that is (also) able to scrape citations from many types of web pages. It also incorporates a nice feature set for tasks like organizing, note taking, and exporting. The site claims it will support sharing citations in the future, though I admit that I’ve never used citeulike’s community features in a serious way. Finally, I was able to move my citations from citeulike to Zotero in about a minute (!).

Have fun citing things!


Wired Article on Netflix Prize

Wired magazine recently published an interesting article on the Netflix Prize:

This Psychologist Might Outsmart the Math Brains Competing for the Netflix Prize

The article is a fun read. It provides some perspective on the importance of tuning algorithms and the potential for combining many algorithms for one prediction task. It also makes it clear that the prize-seeking community is very open to sharing results and techniques. Cool.

I would have liked to read more about why the researchers think going from an 8% RMSE improvement to a 10% improvement will be so hard. Is it because they’ve (finally) bumped up against individuals’ ability to accurately represent their own movie preferences on a 1-5 scale? I ask because I had thought we were already there before this contest began! How much room is left for algorithms to get better at predicting our individual rating idiosyncrasies and inconsistencies?
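For readers who haven’t followed the contest: entries are scored by RMSE (root mean squared error) between predicted and actual ratings, and "improvement" means the percentage reduction in RMSE relative to Netflix’s existing recommender, Cinematch. The Python sketch below shows that arithmetic on a handful of made-up ratings; the function names are my own and the numbers are purely illustrative, not contest data.

```python
import math


def rmse(predicted, actual):
    """Root mean squared error between predicted and actual ratings."""
    if not predicted or len(predicted) != len(actual):
        raise ValueError("need two non-empty lists of equal length")
    squared_errors = [(p - a) ** 2 for p, a in zip(predicted, actual)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def percent_improvement(baseline_rmse, new_rmse):
    """Relative RMSE reduction over a baseline, expressed as a percentage."""
    return 100.0 * (baseline_rmse - new_rmse) / baseline_rmse


def target_rmse(baseline_rmse, percent):
    """RMSE a system must reach to beat the baseline by `percent` percent."""
    return baseline_rmse * (1.0 - percent / 100.0)


# Made-up ratings on the 1-5 scale (illustrative only, not Netflix data).
actual = [4, 3, 5, 2, 4]
baseline_predictions = [3.5, 3.5, 4.0, 3.0, 3.5]
new_predictions = [3.8, 3.2, 4.5, 2.5, 3.9]

base = rmse(baseline_predictions, actual)
new = rmse(new_predictions, actual)
print(f"baseline RMSE: {base:.4f}")
print(f"new RMSE: {new:.4f} ({percent_improvement(base, new):.1f}% better)")
print(f"RMSE needed for 8%: {target_rmse(base, 8):.4f}, "
      f"for 10%: {target_rmse(base, 10):.4f}")
```

Because the target is relative, the gap between the 8% and 10% goals is just 2% of the baseline RMSE in absolute terms; the question above is whether noise in how people rate leaves even that much room.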


Community Policies Fighting Communities

Two interesting examples of businesses fighting their communities came up this morning in my reading.

Example 1. Amazon is selling Bill Clinton’s new book, Giving. They also deleted about 20 reviews from the product page. Presumably, these reviews were mostly low ratings, given mostly by folks who don’t agree with Bill’s politics. Amazon, however, is in the business of selling books, and low reviews posted by people who are clearly not the target demographic do not help it make money. Thus, the reviews go away, and people get angry. Would better moderation interfaces or different moderation policies reduce the need for this type of "censorship"?

Example 2. Yahoo! Answers developers posted a blog entry that basically says: stop using Yahoo! Answers as a social space, and start asking intelligent questions. There are two prominent sub-communities of YA that are clearly not about Q&A discourse: the polls community, where people ask questions like "which pair of jeans should I wear tonight", and the politics community, which focuses on flame wars. Why doesn’t Yahoo! seem to care about these thriving sub-communities? Well, perhaps they don’t promote Yahoo’s vision of social search, where a huge Q&A database helps Yahoo reclaim the #1 search engine slot.

These examples illustrate to me the challenges that businesses face in leveraging users’ work as part of their core business.


OpenSocial: impact on recommenders

Google and several partners recently launched OpenSocial, an open API for accessing social network information. If you haven’t read about this yet, check out the NY Times article for a primer.

Social networks aren’t the only domain where sharing information between sites is a much-requested feature. Indeed, we’ve all invested significant time in building (perhaps redundant) databases of our preferences in recommendation systems such as Netflix, Amazon, and MovieLens. Wouldn’t it be nice if those ratings were portable? Then I could sign up for Netflix and instantly bring my hundreds of movie ratings with me.

Of course, there are a number of issues with ratings portability, such as:

  • What is an entity, and how is it uniquely identified? (e.g., is the entire first season of the TV show The Wire a single "TV show entity", or several?) How is an entity’s category/type determined?
  • How do I communicate my preferences for an entity to the system? Do I use a single 5 star scale, a set of yes/no questions, open text, or something else?
  • Why would any large company wish to share its ratings database? (Yes, Netflix and MovieLens have published ratings data sets, but those ratings were anonymized.)
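To make the first two issues concrete, here is a minimal Python sketch of what a single exported rating might need to carry. Everything in it is hypothetical: the `PortableRating` record, its ID format, and the assumption that ratings on different scales can be compared by a straight linear rescaling are my own illustration, not part of OpenSocial or any existing site’s API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PortableRating:
    """One user's preference for one entity, in a hypothetical site-neutral format."""

    entity_id: str      # hypothetical shared ID, e.g. "tv-season:the-wire:1"
    entity_type: str    # hypothetical category label, e.g. "movie" or "tv-season"
    value: float        # the rating exactly as entered on the source site
    scale_min: float    # lowest value on the source site's scale
    scale_max: float    # highest value on the source site's scale

    def __post_init__(self):
        if self.scale_max <= self.scale_min:
            raise ValueError("scale_max must be greater than scale_min")
        if not self.scale_min <= self.value <= self.scale_max:
            raise ValueError("value lies outside the declared scale")

    def normalized(self) -> float:
        """Linearly map the rating onto [0, 1]."""
        return (self.value - self.scale_min) / (self.scale_max - self.scale_min)

    def to_scale(self, new_min: float, new_max: float) -> float:
        """Linearly re-express the rating on another site's scale."""
        return new_min + self.normalized() * (new_max - new_min)

    def as_yes_no(self, threshold: float = 0.5) -> bool:
        """Collapse the rating to a yes/no preference (the threshold is arbitrary)."""
        return self.normalized() >= threshold


# A 4-star rating on a 1-5 scale, exported from one site...
rating = PortableRating("tv-season:the-wire:1", "tv-season", 4, 1, 5)
# ...and re-read by a site that uses a 1-10 scale or a simple yes/no question.
print(rating.normalized(), rating.to_scale(1, 10), rating.as_yes_no())
```

The linear rescaling is the simplest possible choice and is probably too naive: a 3 on one site’s five-star scale may not mean the same thing as a 6 on another’s ten-point scale, which is the kind of question a real ratings API would have to settle.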

While these are significant issues, I wonder if an open ratings API (built on a platform such as OpenSocial) would clear the path to more ubiquitous recommendations. We could allow MovieLens users to publish their ratings to OpenSocial. Myspace developers could then create a variety of social- or algorithm-driven recommendation systems based on this new source of data. If we were running MovieLens to make money, we’d probably try to build these apps ourselves, and sneak in viral features to bring more new users back to our site.

In the short run, OpenSocial offers other nice benefits for small-ish recommendation sites such as MovieLens. For example, could we do away with our "buddy" feature and just allow users to import their social network from Myspace?


(Note: some newer recommendation companies, such as Flixster and iLike, are OpenSocial "launch partners".)

What Makes a Tech Center?

This morning’s local paper featured an article about Control Data Corporation, a major player in the olden days of mainframe computing.

By the late 1970s, [Control Data] had made the Twin Cities one of five U.S. computer industry centers (a distinction that is now only a memory). By encouraging entrepreneurship among employees, it spawned dozens of local spinoff companies, including the supercomputer firm Cray Research (also now gone). At its peak, CDC had 60,000 employees and about $5 billion in revenue.

This summer, I worked in Silicon Valley to see what a modern-day computer industry center was like. It was indeed an exciting environment, full of new companies, people with ideas, and support for those ideas. Contrast that with Minneapolis (a city I very much love), where technology innovation feels limited to a few industries. And yet, Minneapolis/St. Paul ranks as the #1 metro area for business. Where are the tech startups?

It feels as though Minneapolis is primed for a computing technology resurgence. But I’m not sure what the catalyst for that resurgence will be, or when it will happen.