I've been a moderately happy citeulike user for quite some time, but their community-developed scrapers have been growing less consistently useful over time - more and more, I have been forced to add citations to my repository by hand. I'm going to try something new, perhaps you've heard of it?
Zotero is a Firefox extension that is (also) able to scrape citations from many types of web pages. It also incorporates a nice feature set for tasks like organizing, note taking, and exporting. The site claims it will support sharing citations in the future, though I admit that I've never used citeulike's community features in a serious way. Finally, I was able to move my citations from citeulike to Zotero in about a minute (!).
Have fun citing things!
Max
Interesting article on Code Climber about why working at night can seem so much more productive. I think he's nailed the most important issues (e.g., No interruptions!) -- though he's missed one of my faves: if I'm up late working, it's important to me to get it done, and since I'm not going to sleep until it is done, I'm harder to distract. (During the day, Google Reader is taking away from the man. At night, it's taking away from me!)
Reminds me of Don Knuth's arguments about why he doesn't use email. He says that most of the work he cares about requires long periods of concentration, and that email distracts him from the concentration he needs. It's interesting that for me lots of the work I care about the most requires lots of concentration -- but that I still use IM, email, &c. Perhaps these tools help me stay connected to my network, and perhaps my most important jobs are related to being connected to that network. It would be interesting to see what sort of work would get done if we separated from that network for a while, and did more solo work. The productivity would be very different. Better? Worse?
John
The Internet is a silly place, and one of the silliest phenomena is a meme known as Rickrolling -- if you follow a link, expecting e.g. burrito recipes, and instead find yourself watching Rick Astley and a backflipping bartender explaining how Mr. Astley will never give you up nor desert you, you've been Rickrolled. And sometimes, you might be Rickrolled en masse -- on April 1, all YouTube "Featured Videos" were really Rick Astley, and readers of fark.com and other sites engineered a coordinated and successful bid to place Rick Astley on the sing-along schedule for the April 8th New York Mets game (fans booed).
Anyway, how many Americans have been Rickrolled? According to SurveyUSA -- a highly respected polling firm well-known to political junkies -- at least 18 million.
Lately I've been doing some Python multi-threading to make the best use of some of our amazing server resources. As I was pondering the reasons why one of our 8-core servers reported 83% idle despite 8 threads banging away, I re-discovered the Global Interpreter Lock.
BLECH!
The GIL enforces Python's requirement that only a single bytecode operation is executed at a time. My nicely coded multi-threaded app was only being executed serially!! Sadly, this seems unlikely to change, even in Python 3000. Last year Guido said:
"Just Say No to the combined evils of locking, deadlocks, lock granularity, livelocks, nondeterminism and race conditions."
I was brought up to believe that threading was dirty and independent communicating processes were the way to go. But even I realize that this just isn't practical in these days of GUIs, multi-core processors, and application servers.
Why does the Python community accept the GIL? Is it because most people only use Python as a scripting language? Are there simple workarounds (e.g. not forking, shared memory, or the like) that I'm missing?
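One workaround people commonly suggest is to use processes instead of threads for CPU-bound work, since each process gets its own interpreter and its own GIL. Here is a minimal sketch of that idea, not a description of our actual server code: `count_primes`, `run_with`, and `compare` are hypothetical names made up for illustration, and the sketch assumes a Python recent enough to ship `concurrent.futures`.

```python
import time
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor


def count_primes(limit):
    """CPU-bound toy task: count primes below `limit` by trial division."""
    count = 0
    for n in range(2, limit):
        if all(n % d for d in range(2, int(n ** 0.5) + 1)):
            count += 1
    return count


def run_with(executor_cls, limits, workers=8):
    """Map count_primes over `limits` with a thread pool or a process pool."""
    with executor_cls(max_workers=workers) as pool:
        return list(pool.map(count_primes, limits))


def compare(limits):
    """Return wall-clock seconds per executor type for the same workload."""
    timings = {}
    for executor_cls in (ThreadPoolExecutor, ProcessPoolExecutor):
        start = time.perf_counter()
        run_with(executor_cls, limits)
        timings[executor_cls.__name__] = time.perf_counter() - start
    return timings
```

Calling something like `compare([50000] * 8)` from under an `if __name__ == "__main__":` guard should, on a multi-core machine, show the process pool finishing noticeably faster for this kind of pure-Python number crunching. The price is that arguments and results are pickled and copied between processes, so it suits coarse-grained tasks better than code that shares lots of mutable state.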
An article on Read/WriteWeb says that Google Maps are now editable by anyone with a Google login (which surely must be everyone by now). The idea is that you can change the address, location, and link for any of the designated places on the map. This is a very cool step forward in crowdsourcing. It will be interesting to see how well they control vandalism.
One of the interesting questions is what methods Google is using for watching for vandalism. Many crowdsourcing sites seem to be using a "security through obscurity" approach. I wonder if it will prove over time that open approaches that publish effective security methods work as well? The ESP Game seems like one good example: the rules are published, but by working to ensure the players are strangers, most attempts to hack the system are not effective. (The slashdot attack is a good counter-example.)
What do you think? Excited? Worried?
John
Anyone with an entrepreneurial spirit and a good idea for a recommender system should take note. Strands will invest $100,000 in the best recommender system startup. Finalists will present, and the winner will be announced, at RecSys08.
There's a fun article in the Winter 2007 AI Magazine about "Machine Ethics". The basic argument is that as machines get more and more in control (e.g., planned army robots that would fire weapons), it is more and more important (to humans) that they behave in an ethical manner.
The article argues that there is a fundamental difference between implicit and explicit ethics. Implicit ethics would be programmed into a machine by its designer, much as Asimov's imagined three laws of robotics. Explicit ethics would also be programmed in by a designer, but at a more fundamental level: the robot would be able to compute the ethics of new situations based on a fundamental understanding of ethics. The authors argue that explicit ethics are necessary for several reasons:
1) So it could explain why a particular action is right or wrong by appealing to an ethical principle.
2) Because otherwise it would be lacking something essential to being accepted as an ethical agent. (Kant admired agents that work consciously from ethical principles more than those that work slavishly from rules.)
3) So it could adjust to new situations, evolving the appropriate ethics.
(1) is a red herring: explanation systems often appeal to principles they don't understand in any sort of principled way. For instance, in our work on explanations for recommender systems some of the most effective (for humans) explanations were only loosely connected to the operation of the recommender.
(2) is in contradiction with an argument the authors make later in the paper. They argue that even though computers won't be conscious in the near term, they should be accepted as ethical agents if they act ethically. Agreed! So, then, all we need is that they act ethically.
(3) is intriguing. On the one hand, it would be remarkable if an AI agent could evolve new ethical patterns for situations it has never seen, based on core ethical principles. On the other hand, the results of that evolution might be very surprising. For instance, if a military robot were to decide, based on ethical principles, that it ought to prevent an attack on Iran that a general wishes to carry out, how would that be perceived by the military, by the loyal opposition, by the anti-war effort? What if the robot assassinates the general to prevent the war? Overall, given our track record in predicting the performance of complicated software systems, I have some doubts about this approach.
I liked a later quote in the article, which says that ethical relativists cannot say that anything is absolutely good, even tolerance.
John
I just set up an account for my daughter with audible.com, and downloaded a book for her to listen to on the bus. The good news is that it appears to be all set up now, and ready to download to her iTouch. The bad news is ... everything else.
We spent nearly an hour buying a single audio "book", and getting it copied down to her computer. The problems were nearly all related to digital rights management, though I'd class them in two groups: fundamental, and incompetent.
The fundamental problem is that DRM makes downloading and using media much more difficult. It restricts which programs and devices you can use it with. Further, is it any surprise that downloading a program whose fundamental purpose is to prevent proscribed uses of a media file makes it more difficult to successfully use that media file? In the case of Audible, we had to download a program to my daughter's laptop that insinuated itself into Firefox and iTunes in unspecified ways, so that she could download the Audible files she had paid for to her laptop, and thence to her iPod. This program failed to install itself properly the first time -- apparently it doesn't check to see whether iTunes is running, but fails mysteriously if it is. When we tried to download the book we had paid for to her computer, we kept getting mysterious error messages. These went away once we reinstalled the software.
The problems of incompetence were mostly caused by a user interface that tries to pretend that the challenge is easier than it actually is. The Web site makes a big thing out of the four simple steps required to get going with Audible. Step 1 is "Pick a plan". We didn't want to sign up for a plan, so it took us a while to figure out that you can buy books without a plan. Step 2 is "Download Audible software". In the description it says "You can also use ITunes to download audio ...". We decided to go that route initially, before figuring out that apparently the audible.com software is required in addition to iTunes. It didn't help that the iPod Touch is not listed in the "supported devices" list, so we had to guess which software we needed. Step 3 is "Purchase and download". Our problems with this step are described in the previous paragraph. Step 4 is "Transfer your audio to your AudibleReady device". Here the solution was easy: we just had to figure out that Audible had created a new sort of "playlist" in iTunes, and that we had to tell iTunes to sync that playlist with the iPod Touch. A common step in iTunes -- but it would have been nice for Audible to walk us through that step.
A very frustrating hour later, my daughter is pretty happy with having her book ready for the bus. I'm much less happy. Audible seems like a company that is going to fail if they don't figure out these user interface issues. What, then, will happen to the DRM that requires a "phone home" to install the book on a different device? (Yes, even the iPod Touch will one day seem outdated.) Even though I'm eager to listen to "books on pod" while I exercise, I refuse to buy these DRM-crippled alternatives. Yes, convenience is worth a lot, but more important to me is the principle that media that I buy must be usable for me into the murky future, independent of the survival of any one company, format, device, or business model.
What do you think?
John
Newsweek has an article that argues that Web 3.0 is going to be all about injecting the experts back into the information production and dissemination process. I think they've gotten the big picture badly wrong, but the saddest quote in the article is about why one of the 'experts' they interview thinks this change will come about:
Fueling all this podium worship is the potential for premium audiences—and advertising revenue. "The more trusted an environment, the more you can charge for it," says Mahalo founder Jason Calacanis, a former AOL executive who was previously involved with several Web start-ups. It's also easier to woo advertisers with the promise of controlled content than with hit-and-miss blog blather. "Nobody wants to advertise next to crap," says Andrew Keen, author of "The Cult of the Amateur," a jeremiad against the ills of the unregulated Web.
Pretty amazing that the argument is that advertisers are going to fight to prevent the amateurs from taking over information processes so they can protect their advertising revenue. (Newsweek is, of course, heavily supported by advertising.)
It's also interesting that none of the examples they give in the article -- from Google's Wikipedia killer to the Mahalo search engine -- have any real traction in the marketplace. I think we're seeing a fantasy here. People whose business depends on the elites managing who reads what where and when are arguing that we have to return to that model to make sure "good" information gets out.
I was speculating the other day about how different the world would be if there had been some way that radio and television could have been supported through a fee-based model, rather than the advertising-based model that we have today ...
John
Wired magazine recently published an interesting article on the Netflix Prize:
This Psychologist Might Outsmart the Math Brains Competing for the Netflix Prize
The article is a fun read. It provides some perspective on the importance of tuning algorithms and the potential for combining many algorithms for one prediction task. It also makes it clear that the prize-seeking community is very open to sharing results and techniques. Cool.
I would have been interested in reading more about why the researchers think going from 8% RMSE improvement to 10% improvement will be so hard. Is it because they've (finally) bumped up against individuals' abilities to accurately represent their own movie preferences on the 1-5 scale? I ask, because I had thought we were already there before this contest! How much room is there for algorithms to get better at predicting our individual rating idiosyncrasies and inconsistencies?
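For anyone who hasn't worked with the metric, here is a rough sketch of what "percent RMSE improvement" means, assuming it is measured as the relative drop in root-mean-squared error against a baseline predictor. The function names and the ratings below are made up for illustration; they are not contest data.

```python
import math


def rmse(predicted, actual):
    """Root-mean-squared error between predicted and actual ratings."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("need two non-empty sequences of equal length")
    squared = sum((p - a) ** 2 for p, a in zip(predicted, actual))
    return math.sqrt(squared / len(actual))


def improvement_pct(baseline_rmse, new_rmse):
    """Relative RMSE reduction over the baseline, as a percentage."""
    return 100.0 * (baseline_rmse - new_rmse) / baseline_rmse


# Hypothetical ratings on the 1-5 scale (illustrative only).
actual = [4, 3, 5, 2, 4]
baseline = [3.5, 3.5, 4.0, 3.0, 3.5]   # assumed baseline predictions
candidate = [3.8, 3.2, 4.4, 2.6, 3.7]  # assumed improved predictions

gain = improvement_pct(rmse(baseline, actual), rmse(candidate, actual))
print(f"improvement over baseline: {gain:.1f}%")
```

Because the errors are squared before averaging, a user who rates the same movie a 3 one day and a 4 the next puts a floor under the RMSE that no algorithm can get beneath, which is exactly the ceiling I'm wondering about.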
Max