Sunday, February 17, 2013

Cut off the freeloaders: a potential model for maintenance of academic software

I've been wondering about the distinct lack of incentives for academics to maintain the code and software they develop. (As some people have noted, by the way, the situation is getting better, but it still feels like a crapshoot.) One type of incentive that's not usually available for software maintenance is taxpayer or nonprofit money. Sometimes the (very generous) Sloan Foundation will swoop in and unexpectedly bestow millions of dollars upon a deserving project, but that's the exception, not the rule. And altruism doesn't pay the bills. I wonder if the solution isn't fairly simple: quit giving stuff away for free, and make the users pay for what they use. Academics are freeloaders, and they should start bearing some of the cost for the tools they use.

Don't get me wrong: I'm not arguing that software should not be open source (it should), nor am I arguing that it shouldn't be free to download and use (it should, in most cases). But beyond that, if a community wants to prevent code from rotting, it should bear the cost of maintaining it.

My suggestion is this: once I've released software, I'm not obligated to keep spending my own time maintaining it unless I personally want to or can derive some other type of incentive from my efforts - and as I've already discussed, those incentives are in short supply. So I'll continue to develop software and milk it for as many publications/conference presentations as it's worth. Then I'll move on to other projects.

Beyond that, it's up to you: every time you donate my current hourly rate to a project, assuming I can make time, you've bought me for one hour to work on fixing bugs, developing new enhancements, or improving documentation for that specific project. (And if I can't make time, at some point I'll be able to hire someone who can.) If people are interested in my open source projects and don't want them to rot, they can either contribute their own time and effort to submit changes, or they can donate the monetary cost of getting me to do it. Fair enough, right?

This provides a decent way to gauge whether working on a particular project is actually a productive use of my time. It also seems like this is something that could be directly mentioned on my CV: I developed Open Source Project X, which has raised $Y in community funding, etc. Creating an open source community is a fuzzy concept, but dollar amounts are something that everyone can understand.

I don't personally expect the floodgates to open at this moment and the donations to start pouring in for anything I've built, but I'm looking to the future. I was raised as an undergraduate in several research groups that produced software and were entirely dependent on federal money. I plan to continue doing software development, and need to think about sustainability.

In 2010 I created a prototype of a programming language called Scotch (the dynamic, functional lovechild of Python and Haskell) largely as an educational exercise while I worked through Krishnamurthi's Programming Languages book. After a few blog posts took off on Reddit/Hacker News, it seemed like there was a decent amount of interest. (Matz, the creator of Ruby, even tweeted about it! Major nerd moment for me. Oh, and there's a mention of Scotch on the Haskell Wikipedia page, just waiting for some very justified moderator to remove it.) It's a pretty complex language and would've required a lot of work to fully develop myself. Reality set in - after the birth of my son, while working on other projects and taking 18 credits of courses, I just didn't have time to freely donate to this project anymore, and while people seemed to think it was cool, no one was helping me to write it.

I created a Pledgie campaign so that people could support the continued development of the language, and, like any good street performer, seeded it with $50 to make it look like people were donating. In the years since I opened the Pledgie, some anonymous saint has generously donated $1.

No happy ending to this story: at some point, I broke the interpreter, and I really don't have time to get back into the code base and figure out where I went wrong. The lack of actual support for the project reveals the true level of value people place on it: at best, it's of niche interest. I'll probably continue to work on it every now and then, but it's a low priority, and I'm not going to invest too much time in it. I have other, more valuable ways to spend my time that are more likely to result in direct short-term benefits.

Are there any successful examples of projects that are funded by community contributions instead of grants in academia?

Friday, February 15, 2013

Hour challenge: NCBI taxonomy tree

Today I'm planning a more science-y and useful hour challenge. Over the course of one hour, I'm going to transform the NCBI taxonomy from a tabulated text dump into a Newick tree which can be manipulated by phylogenetic tools. While other people have done this and Newick strings created from the NCBI taxonomy can be downloaded on external sites, the taxonomy is constantly updated, so it would be nice to have a reproducible process to update the tree whenever necessary.
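For the curious, here is a rough sketch of what such a conversion can look like. It is not the code from this challenge. It assumes the standard taxdump layout, where `nodes.dmp` starts each row with the taxon ID and parent ID, `names.dmp` rows hold the taxon ID, name, unique name and name class, and fields are separated by `\t|\t`. The function names `parse_dmp` and `build_newick` are made up for illustration, and labels are handled naively (spaces become underscores; names containing parentheses, commas or quotes would need proper quoting).

```python
from collections import defaultdict


def parse_dmp(lines):
    """Yield the list of fields for each row of a taxdump-style .dmp file."""
    for line in lines:
        line = line.rstrip("\n")
        if line.endswith("\t|"):
            line = line[:-2]  # drop the row terminator
        if line:
            yield line.split("\t|\t")


def build_newick(nodes_lines, names_lines):
    """Build a Newick string from nodes.dmp and names.dmp rows (hypothetical helper)."""
    # Keep only the scientific name for each taxon ID.
    names = {}
    for fields in parse_dmp(names_lines):
        tax_id, name, name_class = fields[0], fields[1], fields[3]
        if name_class == "scientific name":
            names[tax_id] = name

    # Map each parent ID to its children; the root is its own parent.
    children = defaultdict(list)
    root = None
    for fields in parse_dmp(nodes_lines):
        tax_id, parent_id = fields[0], fields[1]
        if tax_id == parent_id:
            root = tax_id
        else:
            children[parent_id].append(tax_id)

    def render(tax_id):
        # Naive label handling: fall back to the ID, replace spaces only.
        label = names.get(tax_id, tax_id).replace(" ", "_")
        kids = children.get(tax_id)
        if kids:
            return "(" + ",".join(render(k) for k in kids) + ")" + label
        return label

    # Recursion is used for brevity; an explicit stack may be safer
    # if the tree turns out to be very deep.
    return render(root) + ";"


if __name__ == "__main__":
    nodes = ["1\t|\t1\t|\tno rank\t|\n", "2\t|\t1\t|\tsuperkingdom\t|\n"]
    names = [
        "1\t|\troot\t|\t\t|\tscientific name\t|\n",
        "2\t|\tBacteria\t|\tBacteria <bacteria>\t|\tscientific name\t|\n",
    ]
    print(build_newick(nodes, names))
```

Because the function takes any iterable of lines, it can be pointed at open file handles for a freshly downloaded dump whenever the taxonomy changes.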

I have other things going on and I haven't decided exactly when I'm going to do this today, but it'll happen. I think there's a good chance that this is the first one I actually finish in an hour, too. When I decide on the timing, and when I complete the project, I'll update this post with links. As always, everything will be done on GitHub so you can watch me tackle this live if you have nothing better to do.

While we're on the subject of the NCBI taxonomy...

Update 1: Busy day. The plan is to get started at 7 PM Eastern. So, theoretically, I should be finished by 8.

Update 2: Started at 7:15, and finished promptly at 8:15, so this was a success. This actually required fixing a bug in the BioPython Newick writer, as node labels in Newick trees weren't being quoted when they contained invalid characters such as spaces or parentheses. So, in addition to the 989,621-node NCBI Newick tree, I also generated a bug fix for BioPython.
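To illustrate the kind of quoting involved, here is a minimal sketch. This is not the actual BioPython patch; `quote_label` is a hypothetical helper that follows the usual Newick convention of wrapping a label in single quotes when it contains whitespace or structural characters, and doubling any single quote inside it.

```python
import re

# Characters that cannot appear in an unquoted Newick label:
# whitespace, parentheses, square brackets, single quote, colon,
# semicolon and comma.
_NEEDS_QUOTING = re.compile(r"[\s()\[\]':;,]")


def quote_label(label):
    """Return a label that is safe to embed in a Newick string (illustrative)."""
    if _NEEDS_QUOTING.search(label):
        # Inside a quoted label, a single quote is written as two.
        return "'" + label.replace("'", "''") + "'"
    return label


if __name__ == "__main__":
    for name in ["Bacteria", "Homo sapiens", "Bacillus (ex Cohn)"]:
        print(quote_label(name))
```

Labels without any special characters pass through unchanged, so the tree stays readable wherever quoting isn't needed.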

The code is available at: