Health Care Data Manipulation: How to Reshape Long-Format Excel Data into Wide-Format SPSS Data
As a provider of integrated data management services, we often answer a wide variety of questions relating to data manipulation. One of the most common questions that we encounter is “How do you reshape long-format excel data into wide-format SPSS Data?” In this blog post, we’re happy to answer this (surprisingly!) popular question and share some quick solutions that don’t involve laborious copying-and-pasting.
Let’s say you have some study data related to sleep habits in an Excel file in long format. In long format, each row in the spreadsheet represents data on a single participant session. We can call this file the source file. This file might contain the following columns:
participant_id session_id var_A var_B
In this situation, participant_id and session_id identify the participant and the session, respectively, and var_A and var_B represent different variables that are measured during each session.
For statistical analysis, this data need to be organized in SPSS in wide format so that each row contains all session data for a single participant. We can call this file the target file. If there are three sessions, this dataset would have the following columns in wide-format:
participant_id var_A1 var_A2 var_A3 var_B1 var_B2 var_B3
where each row of var_A1 contains the value of var_A from the source file during session 1 for a single participant, var_A2 contains the value of var_A from the source file during session 2 for that participant, and so forth.
Of course, you can convert your data from long-format to wide-format by manually copying and pasting data from the source file into the target file in the necessary format. However, this procedure is time-consuming, error-prone, and exhausting. We need a solution that’s faster, more reliable, and less cumbersome than manual copy-and-paste.
One Possible Solution
There are many possible solutions, but one of the simplest and easiest is the following two-step procedure:
- Import the long-format Excel data into SPSS.
- Use the SPSS Restructure wizard to reshape the data from long- to wide-format.
Step 1: Import the long-format Excel data into SPSS
Clean Data in Excel
- Open the Excel file containing the data you need to convert.
- Make sure that variable names all appear in the first row, and data start in the second row. This is where SPSS will expect them to be.
- Make sure that the variable names don’t start with special characters (such as @ or #) and don’t have spaces in them (we recommend replacing spaces with an underscore).
- From the File menu, select Save As and provide a new name for the file. Note where you’re saving the file. Click Save.
- Close the Excel file.
Import Data into SPSS
- Open SPSS.
- Select File → Open → Data.
- Select Excel (*.xls, *.xlsx, *.xlsm)under Files of type.
- Find the file you saved in Step 4 above and and click on it. Click Open.
- A window called Opening Excel Data Source should pop up. Make sure the box called Read variable names from the first row of data is checked. Click OK.
- Select File → Save to save the file as an SPSS (*.sav) file.
Step 2: Use the SPSS Restructure wizard to reshape the data from long- to wide-format
- Select Restructure from the Data tab in the SPSS menu. The restructure wizard will pop up.
- Select Restructure selected cases into variables. Click Next.
- For identifier variable enter participant_id (or whatever the variable that identifies participants is called). For index variable enter session_id (or whatever the variable that identifies sessions is called). Click Next.
- Under sorting data select Yes. Click Finish.
For more information about this process, visit the IBM Knowledge Center’s page on the SPSS Restructure Data Wizard.
The RexDB Solution
Moving between long-format data and wide-format data is even simpler with RexDB, Prometheus Research’s open source data management platform. In RexDB, you can quickly convert your data by running queries that retrieve wide- or long-format data.
To share data in either format, you would simply send them the URL to your query. Collaborators who have the appropriate permissions can access the dataset you’ve provided and can even re-run your queries at a later day.
We’ve created a short video demonstration of this process in RexDB, called “Wide- and Long-Data Format Conversion,” which can be found in our Media Library. Enjoy!
What other data manipulation techniques do you struggle with?