Our tutorials reference a dataset called "sample" in many examples. If you'd like to work through the examples, download the sample dataset first.
Recall from the Informats and Formats tutorial that a format in SAS controls how the values of a variable should "look" when printed or displayed. For example, if you have a numeric variable containing yearly income, you could use a format so that the values of that variable are displayed with a dollar sign (without actually modifying the data itself).
There will be some situations where the built-in SAS formats do not fit your needs, for example, when labeling nominal and ordinal categorical variables. In this case, you can create your own formats using PROC FORMAT.
The general form of PROC FORMAT is:
PROC FORMAT;
VALUE format-name
Data-value-1 = 'Label 1'
Data-value-2 = 'Label 2';
VALUE format-name-2
Data-value-3 = 'Label 3'
Data-value-4 = 'Label 4';
.....;
RUN;
The first line is the start of the proc step. The procedure we want to execute is PROC FORMAT. The next line starts with the VALUE keyword, followed by the name of the format you want to create. You can name the format whatever makes sense to you, but the name must begin with a letter or underscore, must not end in a number, and must not exceed 32 characters.
Note that there is no semi-colon after the format name. The next set of lines assigns labels to the values of the variable in your dataset. You can create as many labels as you want; when you are finished, don’t forget the semi-colon after the last label. End PROC FORMAT with a RUN statement and a semi-colon.
Typically, you will assign a unique value label to each unique data value, but it's also possible to assign the same label to a range of data values.
The most common way of labeling data is to simply assign each unique code its own label. Here, the format LIKERT_SEVEN assigns distinct labels to the values 1, 2, 3, 4, 5, 6, 7.
PROC FORMAT;
VALUE LIKERT_SEVEN
1 = "Strongly Disagree"
2 = "Disagree"
3 = "Slightly Disagree"
4 = "Neither Agree nor Disagree"
5 = "Slightly Agree"
6 = "Agree"
7 = "Strongly Agree";
RUN;
We may want to use the same label for more than one numeric code. We can do this by listing all of the values (separated by commas) that should receive a given label. Format LIKERT7_A assigns the label "Disagree" to values 1, 2, 3, and assigns the label "Agree" to values 5, 6, 7.
PROC FORMAT;
VALUE LIKERT7_A
1,2,3 = "Disagree"
4 = "Neither Agree nor Disagree"
5,6,7 = "Agree";
RUN;
If a numeric variable represents ordinal codes (or has some discernible order), you can assign the same label to multiple codes in a range. Format LIKERT7_B assigns the label "Disagree" to values 1 through 3, and assigns the label "Agree" to values 5 through 7.
PROC FORMAT;
VALUE LIKERT7_B
1-3 = "Disagree"
4 = "Neither Agree nor Disagree"
5-7 = "Agree";
RUN;
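You can also use the keywords LOW and HIGH when assigning labels to a variable with continuous values. The range operator -< excludes the upper endpoint of a range. Here, format INCOME assigns the label "Low" to values below 20000, "Middle" to values from 20000 up to (but not including) 60000, and "High" to values of 60000 and above: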
PROC FORMAT;
VALUE INCOME
LOW -< 20000 = "Low"
20000 -< 60000 = "Middle"
60000 - HIGH = "High";
RUN;
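One way to check how the range boundaries behave is to push a few test values through the format with the PUT function. The following is only a sketch: it assumes the INCOME format above has already been run in the current session, and the variable names and test amounts are made up for illustration.

```sas
/* Assumes the INCOME format defined above has been run in this session.
   The variable names and test amounts are hypothetical. */
DATA _NULL_;
    DO amount = 5000, 19999.99, 20000, 59999.99, 60000;
        income_group = PUT(amount, INCOME.);   /* look up the label */
        PUT amount= income_group=;             /* write the pair to the Log window */
    END;
RUN;
```

The values 20000 and 60000 are the ones to watch: because -< excludes the upper endpoint, each should fall into the next group up.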
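The keyword OTHER assigns a single label to every value that is not listed explicitly. Here, format RACE labels 1 as "White", 2 as "Black", and all remaining values as "Other":
PROC FORMAT;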
VALUE RACE
1 = "White"
2 = "Black"
OTHER = "Other";
RUN;
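Value labels can also be applied to character/string data values. The most important differences are that the format name must start with a dollar sign ($), and the data values must be enclosed in quotation marks: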
PROC FORMAT;
VALUE $GENDERLABEL
"M" = "Male"
"F" = "Female";
RUN;
After the formats have been created using PROC FORMAT, they must still be applied to the data. This can either be done temporarily, by adding the labels during a PROC step, or be done permanently, by applying the labels in a data step.
Our sample dataset has several categorical variables that, without formats, are hard to look at and know what the value represents. These include the variables Gender (which has values 0=male, 1=female), Athlete (which has values 0=non-athlete, 1=athlete), and Smoking (which has values 0=nonsmoker, 1=past smoker, 2=current smoker). Let's create formats for each of these sets of labels.
PROC FORMAT;
VALUE GENDERCODE
0 = 'Male'
1 = 'Female';
VALUE ATHLETECODE
0 = 'Non-athlete'
1 = 'Athlete';
VALUE SMOKINGCODE
0 = 'Nonsmoker'
1 = 'Past smoker'
2 = 'Current smoker';
RUN;
Again, note the placement of the semi-colons: one after PROC FORMAT, one after the last label in each VALUE statement, and one after the RUN statement.
Once you’ve created the format, you still have to assign it to your variable. Let’s assign the new formats to their respective variables, so that when we look at the data or output, we see the labels instead of the codes.
DATA sample_formatted2;
SET sample;
FORMAT gender GENDERCODE. athlete ATHLETECODE. smoking SMOKINGCODE.;
RUN;
The syntax above creates a new dataset called sample_formatted2 that is a copy of sample. We used the FORMAT statement to assign the previously created format called GENDERCODE. to the variable gender. Note that the format name ends with a period. The period after the format name tells SAS that GENDERCODE. is a format and not a variable.
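For comparison, a format can also be applied temporarily inside a PROC step. This is a minimal sketch, assuming the formats defined above have been run in the current session and that the dataset sample is available; PROC FREQ is chosen only for illustration.

```sas
/* Apply SMOKINGCODE. only for the duration of this procedure.
   The dataset itself is not changed. */
PROC FREQ DATA=sample;
    TABLES smoking;
    FORMAT smoking SMOKINGCODE.;
RUN;
```

Because the FORMAT statement appears inside the PROC step, the association lasts only for that step; other procedures run on sample would still show the numeric codes.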
If you assign a format to a variable in a data step, the format will stay with the variable for the rest of the SAS session. However, SAS will not permanently store the definitions for user-defined formats between sessions. For example, SAS will know that the variable gender in our sample dataset is assigned to the format GENDERCODE., but it won’t know what the GENDERCODE. format is until you tell it. If you try to use a dataset whose variables have user-defined formats permanently assigned, SAS won’t be able to execute statements on that dataset until you define those formats.
There are four ways to deal with this:
Each time you launch SAS, manually run your PROC FORMAT code before running any data steps or proc steps that reference your user-defined formats.
This approach is simple, but can be tedious if you have many user-defined formats, or want to reuse format definitions between projects.
Just as SAS datasets can be permanently saved in a SAS library and re-used later, you can permanently save user-defined formats in a SAS library for later reuse. If you find yourself working in a dataset with many user-defined formats, or if you want to reuse your user-defined formats on different datasets, this option will be the easiest for you.
The first step is to save the format definitions to your SAS library. In the PROC FORMAT statement, the LIBRARY keyword lets you specify the name of a library in which to save the definitions:
LIBNAME tutorial "C:/Documents/tutorial";
PROC FORMAT LIBRARY=tutorial;
VALUE GENDER
1 = "Male"
2 = "Female";
VALUE YN
1 = "Yes"
2 = "No";
RUN;
In this program, we first define a SAS library called tutorial that's mapped to the file folder C:/Documents/tutorial. This is where our formats will ultimately be saved. The addition of the LIBRARY keyword to the PROC FORMAT statement tells SAS to store the formats in the library called tutorial. After running the above code, you can check that your formats have been saved by looking in the Explorer window and navigating to the library where you saved your formats. You should see a new catalog named Formats in that library's contents.
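If you prefer to check from code rather than the Explorer window, the FMTLIB option of PROC FORMAT prints the contents of a format catalog. A short sketch, assuming the LIBNAME statement and the PROC FORMAT LIBRARY=tutorial step above have already been run:

```sas
/* List the stored definitions of the GENDER and YN formats.
   Assumes the tutorial library and its formats were created as shown above. */
PROC FORMAT LIBRARY=tutorial FMTLIB;
    SELECT GENDER YN;   /* restrict the listing to these two formats */
RUN;
```

Omitting the SELECT statement lists every format stored in the library's catalog.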
This will save the formats to your library. However, in order to actually make use of these formats during a SAS session, there's an extra step you'll need to do at the start of each SAS session.
By default, SAS looks for user-defined formats only in the WORK library (i.e., the temporary library of the current SAS session) and in a library named LIBRARY, if one is defined, regardless of whether or not you've executed the LIBNAME statement that points to the library containing your formats. In order to access the format definitions stored in other SAS libraries, you must run OPTIONS FMTSEARCH at the start of your SAS session:
LIBNAME tutorial "C:/Documents/tutorial";
OPTIONS FMTSEARCH=(tutorial);
The FMTSEARCH system option tells SAS where to look for format definitions. Note that if you have format definitions stored in multiple SAS libraries, you can specify the names of multiple SAS libraries in the parentheses after FMTSEARCH (separate the names with spaces). SAS will look for format definitions in the libraries you specify, and it will search the libraries in the order you list them. For example:
LIBNAME tutorial "C:/Documents/tutorial";
LIBNAME school "C:/Documents/school";
OPTIONS FMTSEARCH=(tutorial school work);
In this case, SAS will first look for format definitions in the library called tutorial. If it can't find a matching definition in the library tutorial, it will then look in the library called school; and if it can't find a matching format in the library called school, it will look in the WORK (i.e., temporary) library.
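To confirm which libraries SAS is currently searching, you can ask it to report the value of the option. A small sketch using PROC OPTIONS, which writes its result to the Log window:

```sas
/* Report the current setting of the FMTSEARCH system option in the Log. */
PROC OPTIONS OPTION=FMTSEARCH;
RUN;
```

Running this before and after the OPTIONS statement above is a quick way to see that the search list has changed.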
As an alternative to saving formats in a SAS library, you can save your PROC FORMAT code in its own SAS script file, separate from the SAS script(s) where you will do your "main" analysis. Then at the start of each SAS session, you will load those definitions by using an %INCLUDE statement to execute the format definition script.
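Let's suppose that we have two SAS script files: one named my_format_definitions.sas, which contains only the PROC FORMAT code, and a second that contains the main analysis. Suppose my_format_definitions.sas is saved in the folder C:/Documents/tutorial and contains the following: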
PROC FORMAT;
VALUE agreement
1 = "Disagree"
2 = "Neither agree nor disagree"
3 = "Agree";
RUN;
The main analysis script then begins by loading those definitions:
%INCLUDE 'C:/Documents/tutorial/my_format_definitions.sas';
DATA test;
INPUT id $ v1;
FORMAT v1 agreement.;
....
RUN;
Here the SAS keyword %INCLUDE tells SAS to execute the code found in the file my_format_definitions.sas. It behaves exactly as if you had manually executed the PROC FORMAT code. If the execution was successful, then the Log window should say something like
NOTE: Format AGREEMENT has been output.