Who owns research data and the rights to publish?

Yesterday, I asked a very simple question on Twitter:


Let’s assume for the moment that we are ignoring moral and ethical considerations, and focusing only on the question of legality. Simple, right? Except that it’s not. Immediately, I received several responses, the first from Mike Taylor (@MikeTaylor):


Again, let’s ignore for the moment the question of whether it’s wise (we’ll get to that later). Playing devil’s advocate, I responded that the advisor’s claim is that any data collected in their lab are their property, and that by extension they control the publication of any work based on those data. Mike then tweeted:


Ok, so now we get to the heart of it. Are data subject to copyright and, if so, who owns the copyright? Mike’s argument, and that of others I have spoken to, is that data collections are sets of facts and since facts are not subject to copyright, then neither are data. But is this how copyright law is written? (Let me note that I am currently focusing on U.S. copyright law, but I would love to do a comparison of the laws in different countries in future.) This is where the law, at least to someone like myself not educated in this area, gets tricky. Here is an excerpt I pulled from Bitlaw:

Although databases may be protected as compilations under U.S. copyright law, the underlying data is not automatically granted protection. The Copyright Act specifically states that the copyright in a compilation extends only to the compilation itself, and not to the underlying materials or data. 17 U.S.C. § 103(b). As a result, compilation copyrights cannot be used to extend copyright protection to ideas or facts that are otherwise unprotectable (it is a basic premise of copyright law that there is no copyright protection for ideas and basic facts…Thus, a database of unprotectable works (such as basic facts) is protected only as a compilation. Since the underlying data is not protected, U.S. copyright law does not prevent the extraction of unprotected data from an otherwise protectable database.

So, as Mike and others argued, the data themselves are considered facts and thus not subject to copyright. This principle arises from the case of Feist v. Rural, which ruled that “information alone without a minimum of original creativity cannot be protected by copyright” (Wikipedia). Thus, you can only claim copyright on a data compilation, and only when you can show that you have organized the data or provided some infrastructure that is unique. Unless I am reading this wrong, this appears to conflict with statements by many universities regarding the ownership of data. For example, take this one from Columbia University:

Although graduate students, postdoctoral fellows, or even some faculty in academia performing research may believe that they own the data collected, they are wrong. As employees of a university, they are working for hire for the university, which, in most cases, owns the rights to the data. In federally sponsored research, the university owns the data but allows the principal investigator on the grant to be the steward of the data. …With industry-funded or privately funded research, data can belong to the sponsor, although the right to publish the data may or may not be extended to the investigator.

I am far from an expert on copyright, but this reads to me as if universities and funding agencies are trying to claim ownership, not of an original database or collection, but of the underlying facts themselves, something specifically prohibited by copyright law. Have I misinterpreted something here? How do these institutions get around this?

Getting back to the original question, it gets even more complicated. Even if we assume that the advisor owns the rights to the data, which is questionable, the copyright to the written work in the form of the dissertation is sometimes owned by the student. (I say sometimes because I am not sure in what percentage of cases this is true. I, for example, as the author of my dissertation hold copyright, which is stated in the online repository record.) When the student then wants to publish parts of their dissertation, which copyright takes precedence: (1) the copyright (assuming there is one) on the data collection on which the publication is based, or (2) the copyright on the written work itself? Perhaps this is obvious to others, but it’s not to me and I’m guessing it’s not to at least a few other researchers out there. More to come on this soon, including additional discussion of whether it is wise for students to publish without their advisor’s consent. In the meantime, I’d really appreciate comments from anyone who can help me understand this mess of who owns research data and the rights to publish it.

Update 10/29/2012: I have removed the previous note saying this was a draft. I think this post stands as a good introduction to the questions that inspired what will be a series of posts on data ownership and copyright. Please see future posts for answers to some of the questions posed above. I still welcome input from anyone with either personal or professional experience in this area. I would also like to thank Ian Holmes (@ianholmes), @MnkyMnd, Casey Bergman (@caseybergman), and Dan Stowell (@mclduk) who, though not quoted here, also participated in the original discussion on Twitter.



  1. Just to note (as I clarified in subsequent tweets) that I am very far from being an expert on this stuff. What I said in the tweets that Erin quotes here is just hazily half-remembered bits and pieces. I am very far from being an expert on copyright and absolutely nothing I ever say should be construed as legal advice, not even by the Italian government.

    1. Good point! I should have mentioned that nothing said here by either myself, or anyone I quote, should be construed as legal advice. Mike, sorry to bring you into this mess :).

  2. Really informative article, Erin, on a very important issue. I noticed that a huge lawsuit at U of Penn was recently settled. The Abramson Institute was seeking up to $1 billion against Craig Thompson for hiding some of his research from Penn to help start a company, Agios.

    1. Thanks, John! I was not aware of this case. For other interested readers, here is an article on the lawsuit and settlement http://bitly.com/NcZXNa. Many details about the case were not disclosed due to the legal proceedings, but it seems to me that this case is more clear-cut than, for example, a student’s right to publish their dissertation. One obvious difference is that the student would not receive direct financial benefit from publishing in an academic journal, while Thompson was using research to benefit financially through his company. In addition, although I am not a tenure-track professor nor PI of my own grant, I imagine professors sign paperwork at the time of hiring and receiving funds that outlines clearly what they can and cannot do with data obtained during the period of the grant or employment. Students, on the other hand, more rarely receive such specific instructions, especially when it comes to what they can or cannot do with their dissertation and what rights they are afforded if they have copyright. Therefore, whether or not Thompson was in the wrong (I am not making judgement on the case), I can see to a certain extent why the institution felt it had the right to bring a legal claim for financial damages. In the case of advisors and student dissertations, however, the waters look murkier to me.

  3. Copyright law is not the only area of law implicated in questions of data ownership (or “ownership” if you wish). Many of the university policies you cite derive their force not from copyright, but from the necessity to honor contracts signed with grant funders — especially with respect to potential fraud or fund-misuse allegations.

    1. Thanks for your comment. You’re quite right to point out that copyright law isn’t the only consideration in these situations. I tried to cover this a little more in the subsequent post, where I mentioned that students may be prevented from sharing data (even if no copyright applies) due to non-disclosure agreements or clauses built into their contracts with the university. Still, I wonder if, in some cases, universities and funders may be overstepping their bounds and simply relying on an element of fear to buffer their position. Do you know of any specific cases (besides Abramson v. Thompson discussed above) in which universities/funders have argued data ownership in court and won, or lost? Thanks in advance for any info.

