In the last installment of our Tips for Getting Control of Your Data series, we suggested using a revision control system to help keep track of your data files as they change over time. As it turns out, the types of files that lend themselves well to revision control also lend themselves to storing and manipulating data in ways that highlight the kinks in your data workflows, so you can straighten them out and process your data with less repeated work.
With that in mind, here are our next two tips:
- Learn to separate your data nouns from your data verbs, and
- Store each in the simplest format possible
Tip #2: Separate your data nouns from your data verbs
Ever find yourself with data in an Excel file, all your calculations nicely set up, and then you have to update them all and possibly copy / paste / move some columns around just because you got a few new rows of data? And the worst part is, it could be two rows or hundreds of rows – you still have to repeat all the same steps to update your calculations each time you get new data.
Not to pick on one program too much, but part of what’s happening is that Excel (like many other tools) encourages you to conflate the nouns you have in your data with the verbs that you enact upon your data.
By contrast, reimagine for a moment what your nouns are (e.g., observations) and what your verbs are (e.g., count, average, correlate). If you store your nouns in one file and your verbs somewhere else, it doesn’t matter by how much or how often your list of nouns grows – your verbs are safely watching from a distance, waiting to act when you call upon them. And if you need to change your verbs (maybe you want to change a calculation, or you want to add a statistic you weren’t measuring before), you can do that without the fear of accidentally introducing errors into your nouns. Which leads to…
Tip #3: Store your nouns and verbs in the simplest format you can
Let’s say you’ve taken out your calculations, analyses, graphs, and so forth from your Excel file, and all you have left are your nouns (your observations). How are you going to save this file? Back to Excel, right? Well, why? Would a .csv or .txt file work instead?
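To make the idea concrete, here is a minimal sketch in Python (our choice for illustration only; any tool that reads plain text would do). The column names, the sample values, and the helper names `load_counts` and `summarize` are all hypothetical. The nouns sit in CSV text, as they might in a file called observations.csv, and the verbs are ordinary functions kept apart from them.

```python
import csv
import io
from statistics import mean

# Nouns: plain observations, as they might appear in a hypothetical
# observations.csv file. Nothing here but data.
OBSERVATIONS_CSV = """site,count
A,4
B,7
C,10
"""


def load_counts(csv_text):
    """Read the nouns: return the 'count' column as a list of integers."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [int(row["count"]) for row in reader]


def summarize(counts):
    """Apply the verbs: count, total, and average the observations."""
    return {"n": len(counts), "total": sum(counts), "average": mean(counts)}


summary = summarize(load_counts(OBSERVATIONS_CSV))

# A new row arrives: the nouns grow, but the verbs are reused unchanged.
more_observations = OBSERVATIONS_CSV + "D,3\n"
updated_summary = summarize(load_counts(more_observations))

print(summary)
print(updated_summary)
```

Adding one row or a hundred changes only the data; the calculations never need to be copied, pasted, or re-pointed at new cells.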
There are many benefits to using simpler file formats, such as:
- Simpler file formats store less of the stuff you don’t need, like whether your text is black, or very dark grey. Does the color of the text really matter?
- Simpler formats work with lots of tools, whereas more complicated formats necessarily have fewer tools that work on them. (Though there is a large number of programs that can interact with .xlsx files, there is an even larger number that you can use with .csv files.)
- Simpler formats are easier to track with revision control. Because a .csv file is plain text, your revision control system can easily show you the line-by-line differences between two versions of it, but it will probably struggle with an .xlsx file, which is stored as a compressed archive rather than as plain text.
- Simpler formats help keep you from conflating nouns and verbs, even when you’re tempted to. You just can’t save your calculations alongside your data in a .csv file, so you’ll have to put them somewhere else.
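The revision control point in the list above can be illustrated with a small sketch. It uses `difflib` from Python's standard library as a stand-in for what a revision control system does when it compares two versions of a text file; the two versions of the data and the file labels are made up for the example.

```python
import difflib

# Two hypothetical versions of the same small CSV file.
old_version = "site,count\nA,4\nB,7\n"
new_version = "site,count\nA,4\nB,8\nC,10\n"

# Compare the versions line by line, much as a revision control
# system would for any plain-text file.
diff_lines = list(
    difflib.unified_diff(
        old_version.splitlines(),
        new_version.splitlines(),
        fromfile="observations.csv (old)",
        tofile="observations.csv (new)",
        lineterm="",
    )
)

# Keep only the added and removed data lines, skipping the file headers.
changes = [
    line
    for line in diff_lines
    if line[0] in "+-" and not line.startswith(("+++", "---"))
]

print("\n".join(diff_lines))
```

Each changed row shows up as its own line in the diff, so it is easy to see that one observation was corrected and one was added.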
What kinds of files are good for verbs, you ask? Well, there has to be something for next time. Now that you know how to incorporate these two tips into your workflows, you’ll be well on your way to transforming jagged datasets into sparkling, well-cut gems.