Informats tell SAS how to read a variable. This is important when you read in data from an external file with the INPUT statement in a DATA step, and also when you create a new variable in a dataset. Every variable in any SAS dataset will have an informat, and it’s always a good idea to check your SAS data to see what the informat is for each variable. This will help you ensure that the imported data were read in properly. It is also just good practice to look at the variable informats so that you understand the dataset better.
There are three main classes of informats: character, numeric, and date. Here are some common examples:
|Type|Informat Name|What it Does|
|Character|$w.|Reads in character data of length w|
|Numeric|w.d|Reads in numeric data of length w with d decimal places|
|Date|MMDDYYw.|Reads in date data in the form of mm-dd-yy|
Notice that all informat names contain a period; this helps SAS recognize that it is an informat name rather than a variable name. SAS will not recognize the informat name without the period.
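To check which informat is stored for each variable, you can look at the variable table that PROC CONTENTS prints. The following is a minimal sketch rather than part of the sample data: the dataset visits and its variables patient and visit_date are hypothetical. It uses an INFORMAT statement because an informat written only in the INPUT statement is applied while reading but is not saved with the variable.

```sas
/* Hypothetical dataset, invented for illustration */
DATA visits;
    /* INFORMAT stores the informat with the variable */
    INFORMAT visit_date MMDDYY10.;
    /* List input: visit_date is read with its stored informat */
    INPUT patient $ visit_date;
    DATALINES;
Smith 09/14/2012
Jones 06/30/2012
;
RUN;

/* The variable table in the output includes an Informat column */
PROC CONTENTS DATA = visits;
RUN;
```

In the PROC CONTENTS output, visit_date should be listed with the informat MMDDYY10., while patient, which was read with plain list input, has none.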
Formats tell SAS how to display a variable when it is printed to the output window or viewed in a Viewtable window. The format for a variable does not have to be the same as the informat for the variable. While an informat is declared when you are reading in data or creating a new variable in a data step, the format statement can be used in either a data step or a proc step.
Formats can be grouped into the same three classes as informats (character, numeric, and date), and their names also always contain a period.
Every variable will have a format – whether you assign one or you let SAS assign one automatically. It is to your advantage to assign formats that make sense to you and that you can easily interpret when you see the values displayed in the dataset or in your output.
The general form of a format statement is:
FORMAT variable-name FORMAT-NAME.;
Here the first word (FORMAT) is the SAS keyword that tells SAS to assign a format to a variable. The second word is the name of the variable you want to format. Finally, type the name of the format, followed by a period and a semicolon.
A format statement used in a data step will permanently store the variable’s format assignment with the dataset.
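As an illustration of the general form, here is a short sketch that assigns a numeric format in a data step. The dataset prices and its variables item and price are hypothetical; DOLLAR8.2 is a built-in SAS format that displays a number with a dollar sign and two decimal places in a field eight characters wide.

```sas
/* Hypothetical dataset, invented for illustration */
DATA prices;
    INPUT item $ price;
    /* Stored with the dataset: price is displayed as currency */
    FORMAT price DOLLAR8.2;
    DATALINES;
Pen 1.5
Lamp 24
;
RUN;

/* price prints as $1.50 and $24.00; the stored values are still 1.5 and 24 */
PROC PRINT DATA = prices;
RUN;
```

Because the format statement is in the data step, any later proc step that uses prices will display price this way unless it supplies its own format.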
Regardless of the informat or format, date values in SAS are stored as the number of days since January 1, 1960. This means that stored date values can be negative (if the date is before January 1, 1960) or positive (if the date is after January 1, 1960). For example, the date June 30, 1999 will be stored in SAS as the number 14425 because June 30, 1999 was 14,425 days after January 1, 1960.
This storage method is convenient for date arithmetic, but not convenient for human readers. Unless you assign a date format, SAS displays a date variable as this raw number of days. To view date variables as recognizable dates, you must apply a date format to the variable.
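The storage rule can be checked directly in the SAS log. The sketch below is an illustration only (the variable names d and gap are made up); it uses date literals of the form 'ddMMMyyyy'D and writes the same stored value with and without a date format.

```sas
/* DATA _NULL_ runs the step without creating a dataset */
DATA _NULL_;
    d = '30JUN1999'D;            /* date literal, stored as a number */
    PUT d=;                      /* writes d=14425 to the log */
    PUT d= MMDDYY10.;            /* writes d=06/30/1999 */
    PUT d= DATE9.;               /* writes d=30JUN1999 */

    /* Date arithmetic is ordinary subtraction of day counts */
    gap = '14SEP2012'D - '30JUN2012'D;
    PUT gap=;                    /* writes gap=76 */
RUN;
```

The value of d never changes; only the format used to write it does.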
The following syntax reads in a small dataset using the DATALINES statement. Notice that the INPUT statement is where we tell SAS what informats to use. In particular, we specify that variables company and type are character variables (with no specific length requirement); score is a numeric variable of length 3; and date is a date variable in the form MM/DD/YYYY.
DATA WineRanking;
    /* Informats: $ = character, 3. = numeric of width 3, MMDDYY10. = date */
    INPUT company $ type $ score 3. date MMDDYY10.;
    DATALINES;
Helmes Pinot 56 09/14/2012
Helmes Reisling 38 09/14/2012
Vacca Merlot 91 09/15/2012
Sterling Pinot 65 06/30/2012
Sterling Prosecco 72 06/30/2012
;
RUN;
Now if you look at the data in the Viewtable window, you can see that the values for the variable date look like 19250, 19251, and 19174, rather than 9/14/2012, 9/15/2012, and 6/30/2012. This is because we only told SAS the informat to use. Because we did not explicitly tell SAS what format to use, it displays the stored number of days (which is not convenient for human readers).
We can revise the above block of code so that it reads the dates with one informat but displays them with a different format:
DATA WineRanking;
    INPUT company $ type $ score 3. date MMDDYY10.;
    /* Display format: may differ from the informat used to read the value */
    FORMAT date MMDDYY8.;
    DATALINES;
Helmes Pinot 56 09/14/2012
Helmes Reisling 38 09/14/2012
Vacca Merlot 91 09/15/2012
Sterling Pinot 65 06/30/2012
Sterling Prosecco 72 06/30/2012
;
RUN;
Executing this program in SAS will assign the format MMDDYY8. to the variable date, which will display it as MM/DD/YY.
Using the sample dataset, let’s change the format of the date of birth variable (bday). If you used the Import Wizard to import the data, the SAS default was to assign the variable a DATE9. format, which looks like DDMMMYYYY (that is, a two-digit day of the month, followed by a three-letter abbreviation for the month, followed by a four-digit year).
Let’s permanently change the format to MMDDYY10., which will make the date values appear as MM/DD/YYYY:
DATA students_formatted;
    SET sample;
    FORMAT bday MMDDYY10.;
RUN;
Now when you view the values of variable bday, you can see that they use a two-digit value for the month, followed by a two-digit day, followed by a four-digit year.
Using our sample dataset, let’s change the format of the date of birth variable (bday) so that it appears a different way when the dataset is printed. To print the data, we will use a proc step called PROC PRINT. We will cover this and other proc steps later on, but for now just note that you can put a format statement in a proc step so that the variable has a different format for the output you produce in the proc step. This will not change the format of the variable in the dataset.
PROC PRINT DATA = students_formatted;
    VAR ids bday;
    FORMAT bday WORDDATE20.;
RUN;
Note that although bday prints in the output (upper panel) with the format WORDDATE20., the format of the bday variable itself is unchanged in the dataset (lower panel).
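One way to confirm this for yourself is to run PROC CONTENTS after the PROC PRINT step. The sketch below is self-contained so it can be run without the sample file: the dataset students_demo and its two rows are invented stand-ins for the sample data, with the same variable names ids and bday.

```sas
/* Hypothetical stand-in for the sample data (two invented rows) */
DATA students_demo;
    INPUT ids bday MMDDYY10.;
    FORMAT bday MMDDYY10.;       /* stored with the dataset */
    DATALINES;
101 03/15/1995
102 11/02/1994
;
RUN;

/* Temporary format: affects only this printed output */
PROC PRINT DATA = students_demo;
    VAR ids bday;
    FORMAT bday WORDDATE20.;
RUN;

/* The variable table still lists MMDDYY10. as the format of bday */
PROC CONTENTS DATA = students_demo;
RUN;
```

The PROC PRINT output should show dates such as March 15, 1995, while PROC CONTENTS continues to report the format assigned in the data step.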
You can find an extensive list of built-in formats and informats, listed alphabetically and by category, in the SAS Help Manual.