Wednesday, March 2, 2016

Reshaping and preserving data in Stata


Preserving, Reshaping and Transferring Data in Stata

Andrew P. Davis

3/2/2016

 

This blog-post will focus on preserving, reshaping and transferring data in Stata – skills that are quite common and make data manipulation more efficient and less prone to human error.

 

I will begin with reviewing the logic behind reshaping data and the Stata commands to achieve this.  I will then move to provide information on transferring data using StatTransfer.  Finally, as with my previous blogpost http://apdsociology.blogspot.com/2016/02/regular-expressions-in-stata.html I will conclude with several exercises to help you get the hang of preserving and reshaping data.

 
Preserve and Restore


In Stata, a “preserve” command saves a pre-manipulated version of the data that will remain unadulterated after you conclude your programming.

-For instance if you write a do-file program that includes data transformations, you could use a “preserve” command that would return you data to its original version post-manipulation.

-Intuitively, “restore” commands call back the previously “preserved” data
 

These commands are called as follows

Syntax:

preserve
(syntax that changes the data in some way)
restore

And are essential to reshaping datasets.

 
Reshaping Datasets

 Data exist in “long” or “wide” formats.  An example of a “wide” dataset would be the following matrix:

 

         countryid   gdp90     gdp91      gdp92

  1.         3      95000      96000      77000 

  2.         1      60000      50500      51000 

  3.         2      75000      45400      65800

 

 

These data are considered to be “wide” as each variable-year is given its own column.  It actually is “wide.”

 When we transfer these data to “long” or “narrow” format they look like this:

 

 

         countryid     year     gdp

  1.         1         90      60000 

  2.         1         91      50500   

  3.         1         92      51000 

  4.         2         90      75000 

  5.         2         91      45400 

  6.         2         92      65800 

  7.         3         90      95000

  8.         3         91      96000 

  9.         3         92      77000

 

So that there is variable-year, long format data that is produced in which the collated measures from the wide format are broken down by year.

-Year is simply a convenient example, you could still theoretically transform wide data into long data and vice-versa if the data are not listed by year

 

*Okay.  But how do you do reshaping in Stata?

 

Examples using reshape wide and reshape long:

*These examples will use fabricated data

 

Syntax:

List



*As you can see, the data are already in “long” format, that is, data are in country-year format

 

Syntax:

Ex.: reshape long var1, i(id) j(year)

reshape wide GDP, i(Country) j (Year)

 




list 

 

*As you can see, our reshape has worked, data are in “wide” format.
*But what if we wanted to move our data back into “long” format?

 

Syntax:

List




Ex.: reshape long var1, i(id) j(year)
reshape long GDP, i(Country) j(Year)



list





 



 

Transferring data:

 Using StatTransfer products


*At times you will work with colleagues who work with data in a different format that you do (and different than you would like to work with)

*Notably, moving data between two of the more popular statistical packages used in the social sciences, SPSS and Stata is difficult if the data is in SPSS format or in Stata format

 
StatTransfer allows for the easy movement of data between formats commonly used in major statistical packages

 

StatTransfer version 13 will move data among the following programs

 

1-2-3
Microsoft Access (Versions 2.0 through Office XP version)
dBASE (all versions)
Delimited ASCII
Delimited ASCII with a Stat/Transfer SCHEMA file
Data Documentation Initiative (DDI) Schemas
Epi Info
EViews
New
Excel worksheets (all versions, including Excel 2013)
Fixed format ASCII
FoxPro
GAUSS (Windows and Unix)
Genstat
New
gretl
JMP 10
LIMDEP
MATLAB
MATLAB Seven Datasets
Mineset
Minitab 14 (read only)
MPLUS (write only)
NLOGIT
ODBC data sources (Oracle, Sybase, Informix, etc.)
Open Document Spreadsheets
OSIRIS (read-only)
http://www.stata.com/includes/contimages/spacer.gif
Paradox
Quattro Pro for DOS and Windows
R
RATS
S-PLUS (now supported through version 7)
SAS CPORT datasets and catalogs (read only)
SAS for Unix—HP, IBM, Sun
SAS for Unix—DEC Alpha
SAS for Windows and OS/2
SAS PC/DOS 6.04 (read only)
SAS Transport
SAS Value Labels
SAS version 7–9
SPSS through version 21
SPSS Datafiles (Windows and Unix)
SPSS Portable Files
Stata (all versions, including 14)
Statistica versions 7–8 (Windows only)
SYSTAT 13
Triple-S Survey Interchange Format


 

Stat/Transfer has a very user-friendly interface, details on transferring data can be found here:

 


 

 

Exercises

 
Using the “City Temperature” dataset (sysuse citytemp) please complete the following tasks and provide evidence using commands that you have reshaped the data.

1.)    “List” the data.  Is it in long or wide format?

2.)    Preserve the data

3.)    Reshape the data from long to wide format

4.)    Re-reshape the data from wide to long format

 

Open your own dataset, play around with this a bit but be sure to use the “preserve” command before reshaping data.

            -Is your data in long or wide format? How do you know?

            -Reshape the data in the direction you’d like
            -What would be the benefit or reshaping the data?