Skip to Main Content

SAS Tutorials: Transposing Data using PROC TRANSPOSE

This SAS software tutorial shows how to use PROC TRANSPOSE to transpose a dataset's rows and columns, and discusses when to use a transpose.

What is a transpose?

Ideally, datasets are structured so that each row corresponds to one unique subject or object, and each column corresponds to a single variable. However, data can be recorded or collected in many different arrangements, depending on what is convenient or cost-effective for the data collector. Furthermore, we may need our data to be arranged in a particular format in order to use a specific analysis or procedure. This is where transposing or reshaping a dataset comes in to play.


Simple transposes

The simplest possible case of transposing switches the rows and columns of a matrix. If applied to a traditional dataset, this would make it so that there was one row per variable, and one column per subject.

For example, consider the following 2x3 matrix (2 rows, 3 columns):

1     2     3
4     5     6

Transposing this matrix would turn it into a 3x2 matrix (3 rows, 2 columns):

1     4
2     5
3     6

Notice that the first row of matrix 1 becomes the first column of matrix 2.


Long datasets

A "long" dataset contains more than one row per subject, and uses a unique ID to identify each subject. Panel (or longitudinal) data is often recorded in this "long" format.

Consider a clinic where patients come in for appointments. As patients come into the clinic, each visit is recorded in the clinic’s records. That is, each row of the "appointments" dataset corresponds to visit. A single visit record might contain information about the patient's name, the date of the visit, and the weight of the patient during that visit.

Patient1 Visit1 Weight
Patient1 Visit2 Weight
Patient2 Visit1 Weight
Patient2 Visit2 Weight
. . .
. . .
. . .
Patientn Visitn Weight
Patientn Visitn Weight

If you want to compare the patient’s weight at visit 1 to their weight at visit 2, you might want to transpose the data so that each patient has one line of data that includes both weight values (i.e., a wide dataset), like below:

Patient1 Weight1 Weight2
Patient2 Weight1 Weight2
. . .
. . .
. . .
Patientn Weight1 Weight2

When to use "long" form data

Statistical software packages typically require data to be in "long" format for procedures like:

  • Time series
  • Mixed and multilevel models

Wide datasets

You can also imagine the reverse situation where you have one line of data per observation (a wide dataset) but you want multiple lines of data with unique values of a certain variable (i.e., a long dataset).

When to use "wide" data

Statistical software packages typically require data to be in "wide" format for procedures like:

  • Paired t-tests
  • Repeated measures ANOVA
  • Correlation

The TRANSPOSE Procedure

In SAS, the TRANSPOSE procedure can perform simple transposes, wide-to-long, and long-to-wide restructuring of datasets.

The general format of PROC TRANSPOSE is:

PROC TRANSPOSE DATA=Dataset-name OUT=New-dataset-name;
    BY variable(s);
    COPY variable(s);
    ID variable;
    VAR variable(s);
RUN;

In the SAS code above:

  • The PROC TRANSPOSE statement tells SAS to execute the transpose procedure on an existing dataset called Dataset-name.
    • The OUT keyword says that the transposed dataset should be created as a new dataset called New-dataset-name.
  • The BY statement is used to determine the row structure of the transposed dataset. You can include more than one variable in the BY statement. Your data must be sorted on your BY variables before running PROC TRANSPOSE.
    • For long-to-wide transposes, the BY variable(s) should uniquely identify each row.
    • For wide-to-long transposes, the BY variable(s) determine the row structure of the long data; that is, it determines the repetition of the rows.
  • The ID statement assigns names to the transposed value columns that match the values in the variable listed in the ID statement.
    • For long-to-wide transposes, the ID variable(s) determine the structure of the columns in the transposed dataset. There will be one column for each unique value of the ID variable (or if multiple ID variables are present, one column for each unique combination of values).
    • For wide-to-long transposes, you typically do not need an ID variable. However, if you do supply an ID variable, it will determine the column structure.
  • The VAR statement is where you actually tell SAS what variables you want transposed. These are the values that will appear in the cells of the transposed variables.
    • For long-to-wide datasets, there is usually one variable in the VAR statement.
    • For wide-to-long datasets, there are usually multiple variables in the VAR statement. The resulting dataset will have one row for each variable identified in the VAR statement.

Other options available in the PROC TRANSPOSE statement that can be found in the SAS Help Guide.