Who owns research data and the rights to publish? Part II

Disclaimer: Nothing contained in this post should be construed as legal advice.

OK, now that we’ve gotten that out of the way, let’s get to the interesting part. The first post in this series arose from a simple question: Is it legal for a student to publish their dissertation without the consent of their advisor? As I mentioned previously, I am ignoring for the moment any ethical issues (important as they may be), and just focusing on the legal ones. This question boils down to an issue of data ownership. Are the data on which the dissertation is based subject to copyright and, if so, who holds the copyright? I am not an expert on copyright, but fortunately, there are others who are. Charles Oppenheim (@CharlesOppenh) is a former professor of Information Science at Loughborough University. He has published on copyright and intellectual property rights, and advises companies and government organizations. A huge thanks to Dr. Oppenheim, who has patiently and generously answered all my questions.* With his permission, I have selectively reproduced portions of his answers below.

Let’s start with the issue of data ownership. Suppose the following scenario: A student has collected a set of microscope images, or a set of recordings in the form of binary files. The student places all the files on a hard drive, but only organizes them with respect to the date on which they were obtained. Are these data subject to copyright? In response to my query, Dr. Oppenheim answered:

…there is no copyright in individual facts, though there is copyright (in the USA and EU) in a COLLECTION of facts if there has been creativity in the selection and arrangement of the facts.

For those interested in the case law, this comes from the U.S. Supreme Court ruling in the case of Feist v. Rural. Let’s examine this further, because it seems to me there is a lot of confusion regarding how this applies to research data. I have heard people say that research data incur copyright just by being put in ‘fixed form’. In fact, this is not true, since putting data in fixed form does not by itself meet the minimal creativity requirement established by Feist. The Copyright Office’s Guidelines for Registration of Fact-Based Compilations state that the arrangement of facts within a compilation must “go beyond the mere mechanical grouping of data as such, for example, the alphabetical, chronological, or sequential listings of data” (quoted in Patry, 1990).** In the scenario above, the files are organized only by date, which is a chronological listing, so the collection would not appear to meet this standard. There is an issue of physical ownership of the hard drive; the student (and the PI, in many cases) cannot take the hard drive from the laboratory without permission. However, there is nothing a priori preventing the student from making a DVD copy of the raw data files and sharing those with others, because there is no copyright to infringe upon. There is one potential caveat. As Ian Holmes (@ianholmes) pointed out on Twitter, there may be cases in which students have signed a non-disclosure agreement, or their contract has a specific clause saying they cannot take copies of data and share them outside the laboratory. If this is the case, then the student must legally abide by the terms of the contract or NDA.

So, now we know some creativity in the compilation or presentation of the data must exist in order for the data to incur copyright. With specific reference to a student dissertation and the data compilation on which it is based, Dr. Oppenheim goes on to say:

I assume there has been such creativity, since no doubt the student and/or advisor decided which facts to present and which were uninteresting/irrelevant from a range of facts collected.

In other words, at the moment the student or advisor chooses a set of criteria by which to filter, rank, or otherwise selectively present the data, the resulting collection becomes subject to copyright. For most dissertations, this means that the underlying data compilation is copyrightable, since it is rarely the case that every piece of data is presented in the final written document without some element of creativity applied to the original data set.

Having established that the data comprising the dissertation are subject to copyright, who owns the rights? Dr. Oppenheim writes:

There are two possibilities – either the creator (i.e., the student – I am assuming the advisor did not prepare the collection), or the employer if the work was “work for hire”. … “Work for hire” (or “employee-created works” in EU law) are those works created by an employee who was paid to create those works. Unless the student had a contract of employment with the University, or the advisor, which stated “we will pay you so much, to do this work”, it is not a work for hire. It may well be that such a contract is embedded in the University regulations which the student signed up to when they started their research. Was there such a contract, or was it simply a grant without any strings attached? If a contract, then this is indeed a work for hire, and the University/advisor owns copyright in the outputs; if it was a grant, the student owns it.

Therefore, to determine who owns the rights to data in any particular case, we must know the terms under which the collection was created. If the conditions do not meet the requirements of ‘work for hire’, then the student owns the rights to the data and there would be no legal problem in publishing the data without the permission of the advisor. Let’s look at the more complicated case. In many universities, students sign a notice of appointment (NOA), which serves as an employment contract between the student and the university. I am not sure whether these NOAs typically state explicitly that the student is being paid to produce a data collection, but I imagine that in some cases the contract can be interpreted in these terms. If the data were collected under these terms, then the advisor/university would have the rights to the data, and the student could be liable for copyright infringement in publishing the data without permission.

But is this where the story ends? I mentioned in my last post that there are cases in which the student holds copyright on the written dissertation. For example, some universities post student dissertations in their online repositories. These records show the student as the sole author, and may indicate that the author holds copyright. How do we reconcile this with what we have learned about data ownership? Dr. Oppenheim writes:

Whoever the original owner [of the data] was, that owner can choose to assign or licence the copyright in the work to anyone else – either by contract, or by custom and practice. …irrespective of whether the University/advisor did indeed initially own the copyright, they have assigned the student the copyright in the work by placing the material on the repository with that notice. In effect, the University has shot itself in the foot by doing that. To sum the position up in a nutshell: it is likely the University/advisor did own the copyright in the first place. But even if it did, it chose to grant the student that copyright. Ergo, the student is entitled to use any or all of the materials, including the data, in their thesis in any way they like, including writing it up for a journal article.

There you have it. The moment a university posts a dissertation with a note saying the student author has copyright, it forfeits any future claims to ownership and cannot legally prevent the student from publishing the work.

So far, we have only considered the legality of publishing without permission. For the next post in the series, I’d like to discuss ethical considerations and non-legal ramifications. We’ll look at particular case studies with very different outcomes. Coming soon…

1. Feist Publications, Inc. v. Rural Telephone Service Co., 499 U.S. 340 (1991).
2. Patry, W. (1990). Copyright in compilations of facts (or why the white pages are not copyrightable). Communications and the Law, 12: 37.

*My thanks to Mike Taylor (@MikeTaylor) who suggested I contact Dr. Oppenheim and put us in touch.
**I cannot find the original document, Guidelines for Registration of Fact-Based Compilations. If anyone knows where I can download this, please let me know in the comments. Thanks in advance!


  1. Glad to have helped, even if indirectly!

    One clarification:

    There may be cases in which students have signed a non-disclosure agreement, or their contract has a specific clause saying they cannot take copies of data and share them outside the laboratory. If this is the case, then the student must legally abide by the terms of the contract or NDA.

    This is true, but it’s nothing to do with copyright. It’s a point of contract law, which is quite separate.

    1. Thanks, Mike. Quite right. I didn’t mean to imply that this example was related to copyright; I only wanted to point out that there may be circumstances under which the student cannot legally share the data, even if there is no copyright in place.

      1. If you’re going to question the farce that is copyright you may as well question NDAs, whether people really can alienate themselves from that which is inalienable. Law may abridge your freedom of speech, but contract is not law, it is merely an exchange of that which is alienable.

        As to copyright, bear in mind that what it protects precisely (from what, and for whom) is academic. Copyright is a weapon, not a means of restoring equity. The minutiae of copyright law only come into play in the rare case of peers, e.g. Sony & Samsung. If it’s Stanford vs struggling student, it’s settle, starve, or Sing Sing. See Swartz.

        1. Thanks for your comment. I should clarify that my goal with this and the last post was not to question copyright law (or NDAs), but simply to understand how the law applies specifically to research data and dissertations. The question of whether these laws are appropriate, or whether there should be any restrictions on the exchange and reuse of information, is separate and really deserves a post (or series) in its own right.

          I will say that, in general, I am not a huge fan of copyright; I strongly believe that information does more public good when it can be freely shared and built upon. However, I think that in the case of student dissertations, copyright, while not “restoring equity” as you say, does at least give the student a leg up in a system which is often stacked heavily against them. The student who created the work – not the faculty advisor or the university – should have the right to decide how that work is shared and reused. Of course, I am hoping that most students will use that right for good, and decide to post their dissertations in open access repositories, or even place Creative Commons licenses on them. I think that is far more likely if the decision is in the hands of the student, rather than the advisor or university.

          I agree with you that students who exercise their rights against advisors or universities face a serious challenge. However, there are a few cases in which students have won this fight. See, for example, Seshadri v. Kasraian, where the student published and the advisor’s claim of copyright infringement against the student was dismissed. Of course, this won’t always be the outcome, but I think it’s important that students know what their rights are.

      2. I meant ‘question’ in the broad sense – including your questioning of copyright’s operation concerning research, data, researchers, universities, etc.

        When you say “should have the right to decide how that work is shared and reused” you are claiming power. Who from? The state. The state takes the power from the people and gives it to those few who lobby the most persuasively.

        Rights are not determined by ‘should’ or ‘we’ll donate funds’, but by nature. Governments are created to secure the rights we already have – not to decide who should be given what right. Rights precede government, precede law. Try http://culturalliberty.org/blog/index.php?id=291 and http://culturalliberty.org/blog/index.php?id=289

        It is certainly important that students know what their rights are, but before that, they need to know what rights are. This knowledge has been omitted from the educational system – in the interests of the state (to inform people what their rights are).

  2. Next important point:

    This means that the underlying data compilation is copyrightable, since it is rarely the case that every piece of data is presented in the final written document without some element of creativity applied to the original data set.

    Yet it seems to leave the original, complete data set (from which the subset used was drawn) not subject to copyright, so that the entire set of data could be published unencumbered; it’s only the selected subset that bears this encumbrance.

    Is that right, Charles? Strange if so, but I guess it would reflect that notion that what’s being copyrighted here is not the data, but the authorial act of choosing and arranging a subset.

    1. Strange if so, but I guess it would reflect that notion that what’s being copyrighted here is not the data, but the authorial act of choosing and arranging a subset.

      I realize the question was directed at Charles, but I thought I’d chime in, too. 🙂 I think that’s exactly what it reflects, though interestingly, you’re not the first to note the apparent conflict here. In a related case, Key Publications v. Chinatown Today Publishing, judges from the United States Court of Appeals wrote:

      Facts, without more, are not copyrightable…Factual compilations, however, may be copyrighted…Some view these two principles as a paradox in that a compilation comprised of facts not entitled by themselves to copyright protection may support a valid copyright. See Feist, 111 S.Ct. at 1287 (observing that “[t]here is an undeniable tension between these two propositions”); Financial Info., Inc. v. Moody’s Investors Serv., Inc., 751 F.2d 501, 505 (2d Cir.1984) (“the law of copyrights defies the laws of logic…since it ‘affords to the summation of one hundred or one million [individual facts and their unadorned expression] a significant measure of protection while affording none to the facts themselves’”) (quoting Robert C. Denicola, Copyright in Collections of Facts: A Theory for the Protection of Nonfiction Literary Works, 81 Colum.L.Rev. 516, 527 (1981)).

      The judges go on to say:

      Paradox or no, Section 101 of the Copyright Act of 1976, Pub.L. No. 94-553, 90 Stat. 2544, defines a copyrightable compilation as “a work formed by the collection and assembling of preexisting materials or of data that are selected, coordinated, or arranged in such a way that the resulting work as a whole constitutes an original work of authorship.” 17 U.S.C. § 101 (1988). This language suggests three requirements for a compilation to qualify for copyright protection: (1) the collection and assembly of preexisting data; (2) the selection, coordination, or arrangement of that data; and (3) a resulting work that is original, by virtue of the selection, coordination, or arrangement of the data contained in the work. See Feist, 111 S.Ct. at 1293. There is thus more to a copyrightable compilation than the simple collection of uncopyrightable facts. Such a compilation must “feature[] an original selection or arrangement of [those] facts.” Id. at 1290.

