Becoming a more open scientist

Over the past few months, I became increasingly aware that I wasn’t being as open with my research as I could be. Sure, all my articles are openly available, including preprints of some of my work. But I also advocate for sharing code and data, and until now, I hadn’t done either. That changes today.

Want my data? You can have it. Want my code? You can have that, too. Want step-by-step instructions on how to analyze my data and reproduce the tables and figures in my manuscripts? You got it.

I’ve set up a GitHub repository (my first ever!), which includes all the data and code associated with my latest preprint. I’ll be updating that preprint with a new version soon. But in the meantime, I would love to get feedback on the repository contents and especially the IPython notebook (also my first!) detailing how to view the electrophysiological recordings provided and how to analyze the bursting data extracted from those recordings.

When I have more time, I hope to blog about all that went into obtaining these data and creating the repository. Since this is the first time I’ve done this, I learned a lot about how to make my data and code more reusable, and even optimized my code in the process. (Special thanks go to Ross Mounce for his great advice on best practices to follow for sharing data, and to Marco Herrera Valdez for excellent feedback on an earlier version of the notebook.) Issues surrounding data preparation aren’t trivial, and I think we have to consider the time and skill investment involved if we want to see more widespread adoption of data sharing. More on that soon…

For now, please download the data. Play around with it. Tell me what you think. I see tools like GitHub and IPython notebooks as hugely powerful and an essential part of opening up the scientific process. And I’m very excited to be taking these next steps in opening up my own work.


16 thoughts on “Becoming a more open scientist”


  1. I think this is a pretty amazing and radical step. It’s one thing to support Open Source Journals and to publish there, but it’s a much larger step to be so open with data and code. Kudos!

  2. Nice step! Do you know about Zenodo (https://zenodo.org/faq)? It is the open data repository from CERN and has the great advantage of assigning a DOI to each data set you upload, thus making it easy to cite. It also integrates perfectly with GitHub. You can link both services, and once you create a release on GitHub it gets automatically pushed to Zenodo and gets a DOI. This way you obtain a fixed snapshot of the data related to your publication and can cite it from within your publication.

    1. Thanks! Yes, I’ve heard of Zenodo but never used it. Figshare (which I do use regularly and love) also has github integration. Are there any relevant differences between the two I should be aware of? Do you (or any other readers) prefer one over the other for github integration?

        1. I just archived the repository with figshare. It appears that figshare scrapes from your license file (if there is one). I had two licenses specified in that file – the MIT license for code and the CC0 license for data. For some reason, I’m guessing based on my last commit, figshare posted the MIT license on the archive. I’m told that if there isn’t a license file in the repository, there will be a dropdown menu for selecting the license. In any case, it looks like you can select more than just CC0, depending on what’s in your license file or the menu options.

  3. Hello Erin!
    I have been following your posts for quite a while. Please let me congratulate you on your efforts. I would like to call your attention to the project CopIt-arXives based at UNAM in Mexico City (http://scifunam.fisica.unam.mx/mir/copit/). This is a local initiative (but we have nodes in some other countries) to promote publication in open access. We have been up since 2007, publishing peer-reviewed academic texts. Please let me know if you, perhaps, could be interested in collaborating with us in this project. Thank you very much for your reply.

    Sincerely Octavio Miramontes

    1. Hi Octavio! Thanks so much for your comment. I’m always excited to hear about open access projects going on in Mexico. Congratulations, it looks like an excellent initiative! I’d love to learn more. Could you please email me at emck31[at]gmail[dot]com? I look forward to talking more.

  4. Hey Erin! Curious as to what’s prompted your change of heart on this. I know in the past, you’ve talked about the drawbacks to sharing code/data openly as a scientist working outside of the US. What has changed your mind? Do the benefits to science overall now outweigh your concerns?

    1. Hey Stacy! I should clarify that for my part, there has really been no change of heart – I’ve believed for a long time that sharing code and data is the right thing to do. The only things holding me back from sharing mine before were: (1) figuring out whether I had permission to share these data (I know now that I do), and (2) figuring out how to do it properly to optimize reusability (I didn’t just want to dump spreadsheets onto the web).

      My goal with my previous post on the PLOS open data policy was more to outline concerns I’ve heard from other researchers, especially those working outside the U.S. (e.g. Mexico). Although I see huge benefits for science and for individual researchers in sharing, many researchers have legitimate concerns. I think advocates have to consider these concerns and look for possible solutions if we want to see more people sharing. While sharing in this way was the right choice for me, I recognize not all researchers may be comfortable with this level of sharing yet. I hope we can move towards making this the norm soon, but for now I’d be happy with some just taking small steps in the right direction.

      On a side note, I’ve been thinking about writing an update to that PLOS post. While I think some of the concerns I outlined there still hold, I have to praise PLOS for leading the way on open data.

  5. Looks pretty good. There are a few tiny things I would nitpick but nothing major.
    You should probably be aware that many of the functions in the string module are deprecated. These functions exist on the builtin str type now (join, split, etc.).
    https://docs.python.org/2/library/string.html#deprecated-string-functions
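
For readers who have not seen the difference, here is a minimal sketch of the str-method style. The sample line and variable names are hypothetical and are not taken from the notebook; the old `string.split` and `string.join` calls appear only in comments for comparison.

```python
# Hypothetical tab-separated line, such as one row read from a data file.
line = "0.01\t0.52\t1.37"

# Old style (deprecated): string.split(line, "\t")
fields = line.split("\t")

# Old style (deprecated): string.join(fields, ", ")
joined = ", ".join(fields)

print(fields)
print(joined)
```

The methods live on the string itself, so no `import string` is needed for these operations.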

    You also seem to use loop counters a lot when enumerate would be indicated or when you aren’t actually interested in the index at all. Generally you should loop directly over the contents of a collection if you aren’t specifically concerned with the index.
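
As a rough illustration of both cases, the sketch below loops over a hypothetical list of burst durations. The list and names are made up for the example and are not the notebook's actual code.

```python
# Hypothetical burst durations in seconds.
burst_durations = [0.8, 1.2, 0.5]

# Index not needed: loop directly over the values
# instead of "for i in range(len(burst_durations))".
total = 0.0
for duration in burst_durations:
    total += duration

# Index needed: enumerate yields (index, value) pairs,
# so no manual counter has to be kept.
labeled = []
for i, duration in enumerate(burst_durations):
    labeled.append("burst %d: %.1f s" % (i, duration))

print(total)
print(labeled)
```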

    This is a great short article on effective iteration:
    http://nedbatchelder.com/text/iter.html
