Saturday, December 29, 2012

What incentives are there to maintain software in academia?

Just read an article in PLoS Comp. Bio. called "Ten Simple Rules for the Open Development of Scientific Software" by Andreas Prlić which was linked by Karthik Ram. A Twitter discussion followed, in which 140 characters was not enough to be sufficiently expressive. Let me start off by saying that I think this was a fantastic article. I'm 100% in agreement and think that these are some important points to make. I start with this caveat because I'm about to dwell on one suggestion that I had a negative reaction to.

From rule 10, "Science counts":

As scientists, the software we write is primarily a means to advance our research and, ultimately, achieve our scientific goals. Whilst the development of software for the consumption of others aligns well with other processes of scientific advancement, it is the science that ultimately counts. Scientific software development fulfils an immediate need, but maintenance of code that is no longer relevant to your own research is a serious time sink, and will rarely lead to your next paper, or secure your next grant or position.

The author hits on the unfortunate practical reality that time spent on software development that doesn't result in widely-recognized deliverables such as publications or grants is essentially time wasted, and will be inversely correlated with your chances of success as an academic.

The troubling part is that this is an extraordinarily short-sighted view of the value of software. Outside of academia, large communities of developers frequently and happily contribute to open source projects for which they receive no tangible benefit. The rewards developers receive vary from education and experience to networking and recognition to simply having fun. Sometimes extrinsic rewards eventually present themselves, and beyond a certain level of growth money becomes increasingly necessary to keep a large project going (see the "Money" chapter from Producing Open Source Software). Still, popular open source projects such as Linux and Python have value that far outweighs the modest amounts of money that have been funneled into them, and they're still developed largely by unpaid (sometimes anonymous) volunteers.

Scientific software is important, and even very specialized software should be more widely available and used more often. Replication is one of the cornerstones of the scientific method. I envision a future where results and figures from papers are easily replicable upon publication and where people (reviewers especially) are in the habit of checking each other's work. This is already being done on small scales - see Weecology on GitHub for some excellent examples. The problem is this: a scientist who develops code for a single analysis and makes that code publicly available is doing it to benefit the broader scientific community. But code rots over time. Inevitably, when code makes the jump from a single user to many, problems will be discovered. Thus, the benefit provided by open source software is directly related to the effort spent responding to users and maintaining code. And for most projects, this effort has a very low probability of providing the author of the code with an additional grant or publication, so there's little incentive to do it. (There are notable counterexamples - massive projects such as DataONE, for which there's already funding for long-term development and maintenance and which tend to result in multiple publications and presentations for those involved.)

So, my question is this: what can be done to provide incentives for the development and maintenance of important scientific code?


  1. I think this is largely a question of changing attitudes in academia to value software as much as (or more than) publications. I think this is beginning to happen (e.g., NSF's recent biosketch change, Titus Brown's recent grant reviews, etc.). It will take some time, but it is happening. Those of us who really value this stuff are gradually reaching the phase where we hire postdocs and serve on search and tenure committees.

    The reason that I have confidence that this is coming is that at its core science and academia are all about reputation, and reputation is driven by what people see and use on a regular basis. In the long run I think that widely used software will actually produce a more direct route to this kind of reputation building than writing papers.

  2. I agree with E. White. I would add that slowing down science and *taking time* to develop *software* instead of just programs would allow this software to be reused for further experiments. It's easier to maintain something you use than something you no longer use and never planned a future for.
    (I make a distinction between a program and software: a program has no future, it's just written to do one thing; the development of software is driven by ideas of what it should become, which gives it the potential to evolve.)
    So, once software development is recognized as being as valuable as publications, that could be an incentive to invest in software rather than just write programs.

    - zoggy

  3. I really like this distinction between 'program' and 'software'.

    1. If you can read French, I developed this idea here:
      (Google's automatic translation may help you)

      - z