Most investigators think of data sharing plans as yet another annoying administrative requirement. Here comes some clueless funding agency with this wacky idea that another researcher might actually use your data! Now you have to jump through a bunch of hoops to convince the agency that your data will be distributable. But you know, in your heart of hearts, that no matter how religiously you adhere to data sharing mandates, most of your data can’t actually be used by anyone else, at least not without a lot of additional work on your part. You’d have to do a lot of organizing, cleaning, and annotating before there is any real chance that an outside lab can make sense of what you’ve done. And that brings us to the real problem with data sharing: poor data management practices, specifically poor data curation.
Here’s a simple test to determine whether you’re performing adequate data curation. Ask your staff to produce a new, clean data set that combines several sources, and record how long it takes them to complete the task. Have they disappeared for over a week? If so, your staff is likely re-cleaning the data before every analysis, an inefficient use of their time and of yours. What you’re seeing is a symptom of a backwards process: pre-analysis “data cleaning” is an attempt to make up for poor up-front data curation.
You may be asking yourself, “So what if it takes a little more time to produce a data set? It’s not as though I plan to use this after I publish my work.” Consider the alternative: suppose your data management practices kept your data clean, organized, and self-documenting throughout your research process. Your data curation happened systematically and up-front, not retroactively or ad hoc every time you needed a clean data set. Wouldn’t such an approach allow a new potential collaborator to make use of the data with minimal direct input from you? Should this not also be the goal of your research?
Is your personal investment in this approach worth the value it brings to your collaborators and, equally important, to yourself? I would argue that it is: the same practices also help to accelerate your own data analyses, internal collaborations, and data-reuse efforts. How much more productive would your research program be if it took you hours (instead of months) to produce a data set that combines data across many data types, studies, and/or sites? How many more questions could you ask, or how many more participants could you engage, if you were efficiently managing your data?
And that’s all a data sharing plan is, and all it should be: a description of the practices you will use to organize, curate, and document your data so that you and others can make better use of it. Oh, and those “others” include your own team, your close collaborators, and anyone else you choose to share your data with. So, instead of an annoying requirement, preparing for data sharing can be a way to accelerate your own research and enable effective reuse of data across multiple studies and different data types.