Our tutorials reference a dataset called "sample" in many examples. If you'd like to work through the examples yourself, you can download the sample dataset.
Sometimes you want to reorder the observations in your dataset according to the values of one or more variables; this is called sorting. In SAS, data is easily sorted by one or more variables with a procedure called PROC SORT. You can sort data by both numeric and character variables. The general format of the sort procedure is:
PROC SORT <options>;
  BY var;
RUN;
In the syntax above, PROC is the keyword that starts the proc step, and SORT is the name of the procedure. Immediately following PROC SORT is where you put any options you want to include. Let's review some of the more common options:
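DATA= specifies the dataset to be sorted.
NODUPKEY removes observations that have the same values for the BY variable(s) as an earlier observation, keeping only the first one.
NODUPLICATES removes observations that are exact duplicates. If NODUPLICATES is used without NODUPKEY, then records are only considered duplicates if they have identical values for every variable.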
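As a minimal sketch, assuming the sample dataset used throughout this tutorial, the step below applies NODUPKEY to keep only the first observation for each value of gender. It also uses OUT=, another PROC SORT option, to write the result to a new dataset (given the hypothetical name sample_bygender) so that sample itself is not overwritten.

```sas
/* Keep one observation per distinct value of gender.                 */
/* sample_bygender is a hypothetical output dataset name; because of   */
/* OUT=, the original sample dataset is left unchanged.                */
PROC SORT DATA=sample OUT=sample_bygender NODUPKEY;
  BY gender;
RUN;

/* Display the de-duplicated result. */
PROC PRINT DATA=sample_bygender;
RUN;
```

Observations with a missing gender value form their own BY group, so one of them is kept as well.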
On the next line is the BY statement, where you tell SAS which variable(s) to sort the data on and in what order to sort them. By default, SAS sorts in ascending order; to sort in descending order, add the keyword DESCENDING before the variable name that you want the dataset to be sorted on. If you sort on more than one variable, the variables listed after the BY keyword should be separated by a space and listed in the order you want SAS to sort by.
The RUN statement is placed at the end of the block and tells SAS to execute the code.
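Note that DESCENDING applies only to the single variable that immediately follows it. A short sketch, assuming the sample dataset's gender and bday variables and a hypothetical output dataset named sample_desc:

```sas
/* gender is sorted in descending order; bday has no DESCENDING keyword */
/* of its own, so it is sorted in ascending order within each gender.   */
PROC SORT DATA=sample OUT=sample_desc;
  BY DESCENDING gender bday;
RUN;

/* To sort both variables in descending order, repeat the keyword: */
/*   BY DESCENDING gender DESCENDING bday;                        */
```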
In the sample dataset, we have information about gender and birthday (but not necessarily age) for the subjects in the sample. How can we sort the subjects within each gender from youngest (most recent birthdate) to oldest (least recent birthdate)?
The birthday variable (bday) in the sample dataset is a date variable. Recall that date variables are a special type of numeric variable; therefore, date variables will follow the same sorting rules as regular numeric variables. This means that when date variables are sorted in ascending order, missing values will come first, and then the dates will be sorted from oldest to newest. Conversely, if a date variable is sorted in descending order, the newest (most recent) dates will come first.
(Why is this? Recall that SAS stores dates internally as the number of days that have elapsed since the reference date, January 1, 1960. This means that more recent dates are stored as larger numbers.)
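The sketch below makes this concrete using SAS date literals (a quoted date followed by the letter d). The variable names are hypothetical, and the dates are borrowed from the sorted example discussed below. The PUT statement writes the stored numeric values to the SAS log, where ref should appear as 0 and is_later as 1.

```sas
DATA _NULL_;
  ref   = '01JAN1960'd;   /* the reference date itself is stored as 0   */
  older = '07APR1994'd;   /* stored as a positive number of days        */
  newer = '29NOV1995'd;   /* a later date, so a larger number of days   */
  is_later = (newer > older);   /* 1 (true), since newer is larger   */
  PUT ref= older= newer= is_later=;
RUN;
```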
PROC SORT data=sample; BY gender descending bday; RUN;
After sorting, your data will look similar to this:
The data is sorted first by gender, in ascending order. Notice how missing gender values appear first, then 0 (coded for male). Within each gender, the data is then sorted in descending order by birth date. Among the cases with missing values for gender, we can see that SAS recognizes that November 29, 1995 > July 28, 1995 > April 7, 1994. In rows 10 through 12, we see the 3 "largest" (most recent) birthdates for males: January 2, 1996 > December 25, 1995 > December 12, 1995.
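To inspect the sorted data yourself, one option is a PROC PRINT step like the sketch below. It assumes the PROC SORT step above has already been run on sample, because the BY statement in PROC PRINT requires the data to be sorted by gender. The MMDDYY10. format is an illustrative choice that displays bday as a readable date instead of a day count.

```sas
/* Print the sorted data, one gender group at a time. */
PROC PRINT DATA=sample;
  BY gender;
  VAR bday;
  FORMAT bday MMDDYY10.;   /* affects display only; stored values are unchanged */
RUN;
```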