Our tutorials reference a dataset called "sample" in many examples.
When summarizing a categorical or qualitative variable (nominal/ordinal), we are typically interested in questions like:
All of these questions can be answered using a frequency table. The most basic type of frequency table contains at least the following:
In SPSS Statistics, the Frequencies procedure can produce frequency tables, which contain tallies and proportions, as well as two types of graphs appropriate for categorical data: bar charts and pie charts.
Your data must contain at least one categorical variable that meets all of the following requirements:
Each row can represent one subject, or can represent an observation from a subject. Each column should represent one variable. Variables that will be tabulated using frequency tables should ideally have the following variable properties defined:
Variable Type: The categorical variables in your SPSS dataset can be numeric or string. By default, the rows of the table are arranged in ascending order (for numeric codes) or alphabetically (for string variables).
Value Labels: If you have entered data using numeric codes that represent specific named categories (especially nominal/unordered categories), you should apply value labels to your variables. Under the default output settings, the rows of the frequency table then display the value labels instead of the numeric codes.
Missing Value Handling: The frequency table will include sections for Valid (non-missing) and Missing responses. Any values recognized as system-missing or user-missing will appear in the Missing section. If you have more than one user-defined missing value code that appears in the data, those codes are tallied separately in the Missing section of the table. (For example, if you have defined the numeric code -99 as "Refused response" and -88 to represent "Not asked", you will be able to see how many "Refused" and "Not asked" values there were.)
For numeric variables: Blank values will appear in the section of the table labeled "Missing". If your dataset also uses special numeric codes to represent missing values, and those codes have been defined as user-missing, there will be multiple rows in the "Missing" section of the table.
For string variables: Blank strings will NOT automatically be recognized as missing values. If present, they will appear as one of the categories in the "Valid" section; see example below. This can be resolved by using the Automatic Recode procedure to convert the original string variable to a coded numeric variable prior to creating the frequency table.
Variable Measurement Levels: The variables' measurement levels should be defined as nominal or ordinal. The Frequencies procedure will still work on variables whose measurement level is set to scale; however, frequency tables should only be used when there are a limited number of response categories.
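The variable properties described above can also be set with syntax instead of the Variable View window. The following is a minimal sketch under stated assumptions: the variable name Q1 and its "Yes"/"No" categories are hypothetical and are not part of the sample dataset, and the codes -99 and -88 are borrowed from the missing-value example above.

```spss
* Q1 is a hypothetical numeric survey item. Replace it with your own variable.
* Attach value labels to the numeric codes, including the missing-value codes.
VALUE LABELS Q1
  1 'Yes'
  2 'No'
  -99 'Refused response'
  -88 'Not asked'.

* Declare -99 and -88 as user-missing so they are tallied in the Missing section.
MISSING VALUES Q1 (-99, -88).

* Set the measurement level to nominal.
VARIABLE LEVEL Q1 (NOMINAL).

* Produce the frequency table.
FREQUENCIES VARIABLES=Q1
  /ORDER=ANALYSIS.
```

With these definitions in place, the two user-missing codes should appear as separate labeled rows in the Missing section rather than among the Valid categories.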
In SPSS, the Frequencies procedure can produce summary measures for categorical variables in the form of frequency tables, bar charts, or pie charts.
To run the Frequencies procedure, click Analyze > Descriptive Statistics > Frequencies.
A Variable(s): The variables to produce Frequencies output for. To include a variable for analysis, double-click on its name to move it to the Variables box. Moving several variables to this box will create several frequency tables at once.
B Statistics: Opens the Frequencies: Statistics window, which contains various descriptive statistics.
The vast majority of the descriptive statistics available in the Frequencies: Statistics window are never appropriate for nominal variables and are rarely appropriate for ordinal variables. There are two exceptions to this:
If your categorical variables are coded numerically, it is very easy to misuse measures like the mean and standard deviation. SPSS will compute those statistics if they are requested, regardless of whether or not they are meaningful. It is up to the researcher to determine if these measures are appropriate for their data. In general, you should never use any of these statistics for dichotomous variables or nominal variables, and should only use these statistics with caution for ordinal variables.
C Charts: Opens the Frequencies: Charts window, which contains various graphical options. Options include bar charts, pie charts, and histograms. For categorical variables, bar charts and pie charts are appropriate. Histograms should only be used for continuous variables; they should not be used for ordinal variables, and should never be used with nominal variables.
Note that the options in the Chart Values area apply only to bar charts and pie charts. In particular, these options affect whether the labeling for the pie slices or the y-axis of the bar chart uses counts or percentages. This setting will be greyed out if Histograms is selected.
D Format: Opens the Frequencies: Format window, which contains options for how to sort and organize the table output.
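The Order by options affect only categorical variables: Ascending values and Descending values sort the rows of the table by category value (or code), while Ascending counts and Descending counts sort the rows by category frequency.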
When working with two or more categorical variables, the Multiple Variables options affect only the order of the output. If Compare variables is selected, then the frequency tables for all of the variables will appear first, and all of the graphs for the variables will appear after. If Organize output by variables is selected, then the frequency table and graph for the first variable will appear together; then the frequency table and graph for the second variable will appear together; etc.
E Display frequency tables: When checked, frequency tables will be printed. (This box is checked by default.) If this check box is not checked, no frequency tables will be produced, and the only output will come from supplementary options from Statistics or Charts. For categorical variables, you will usually want to leave this box checked.
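The dialog options above have counterparts in FREQUENCIES syntax. The sketch below is illustrative rather than a pasted result from the dialog: it assumes the sample dataset's Rank and HowCommute variables, and the particular combination of options was chosen only to show the subcommands.

```spss
* Tables plus pie charts labeled with percentages.
* ORDER=VARIABLE keeps each variable's table and chart together.
FREQUENCIES VARIABLES=Rank HowCommute
  /PIECHART PERCENT
  /ORDER=VARIABLE.

* Charts only. FORMAT=NOTABLE suppresses the frequency tables.
* ORDER=ANALYSIS corresponds to the Compare variables option.
FREQUENCIES VARIABLES=Rank HowCommute
  /FORMAT=NOTABLE
  /BARCHART FREQ
  /ORDER=ANALYSIS.
```

Pasting from the dialog window (the Paste button) is a reliable way to confirm the exact syntax your version of SPSS generates for a given set of choices.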
Using the sample dataset, let's create a frequency table and a corresponding bar chart for the class rank (variable Rank), and let's also request the Mode statistic for this variable.
FREQUENCIES VARIABLES=Rank
  /STATISTICS=MODE
  /BARCHART FREQ
  /ORDER=ANALYSIS.
Two tables appear in the output: Statistics, which reports the number of missing and nonmissing observations in the dataset, plus any requested statistics; and the frequency table for variable Rank. The table title for the frequency table is determined by the variable's label (or the variable name, if a label is not assigned).
Here, the Statistics table shows that there are 406 valid and 29 missing values. It also shows the Mode statistic: here, the mode value is "1", which is the numeric code for the category Freshman. Notice that the Mode statistic isn't displaying the value labels, even though they have been assigned. (For this reason, we recommend not requesting the mode statistic; instead, determine the mode from the frequency table.)
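One way to read the mode directly from the frequency table is to sort the rows by descending count, so that the most frequent category is listed first. This is a small sketch of that approach using the same Rank variable. It is a suggestion, not the syntax used to produce the output discussed here.

```spss
* Sort the table rows from most to least frequent.
* The first row in the Valid section is then the modal category.
FREQUENCIES VARIABLES=Rank
  /FORMAT=DFREQ
  /ORDER=ANALYSIS.
```

Because the rows show value labels, the modal category can be read by name rather than by its numeric code.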
Notice how the rows are grouped into "Valid" and "Missing" sections. This grouping allows for easy comparison of missing versus nonmissing observations. Note that "System" missing responses are observations that use SPSS's default symbol -- a period (.) -- for indicating missing values. If a user has assigned special codes for missing values in the Variable View window, those codes would appear here.
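The frequency table contains four columns of summary measures: Frequency (the count of cases in each category), Percent (the count as a percentage of all cases, valid and missing), Valid Percent (the count as a percentage of the valid cases only), and Cumulative Percent (the running total of the valid percents).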
The bar chart appears after the tables.
Here, we can see that:
What should I do if I create a frequency table in SPSS and one of the rows is blank?
If you are creating a frequency table and notice that the first row has a blank category label, similar to this example:
This issue should not be ignored! It affects frequency tables created from string variables that use blanks to denote missing values. SPSS does not automatically recognize blank (i.e., empty) strings as missing values, so the blank values appear as one of the "Valid" (i.e., non-missing) categories.
When missing values are treated as valid values, the "Valid Percent" and "Cumulative Percent" columns are calculated incorrectly. If the blank values were correctly treated as missing values, the valid, non-missing sample size for this table would be 314 + 94 = 408 -- not 435! -- and the valid percent values would change to 314/408 = 77.0% and 94/408 = 23.0%. Depending on the number of missing values in your sample, the differences could be even more dramatic.
To fix this problem: To get SPSS to recognize blank strings as missing values, you'll need to run the variable through the Automatic Recode procedure. This procedure takes a string variable and converts it to a new, coded numeric variable with value labels attached. During this process, blank string values are recoded to a special missing value code. To see a worked example, see the Automatic Recode tutorial.
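As a rough illustration of that fix, the sketch below recodes a string variable and then tabulates the new numeric variable. The names StringVar and StringVar_coded are hypothetical placeholders; substitute the string variable from your own dataset.

```spss
* StringVar is a hypothetical string variable that uses blanks for missing data.
* BLANK=MISSING recodes blank strings to a user-missing value.
* PRINT lists the old and new values in the output.
AUTORECODE VARIABLES=StringVar
  /INTO StringVar_coded
  /BLANK=MISSING
  /PRINT.

* Tabulate the new numeric variable instead of the original string.
FREQUENCIES VARIABLES=StringVar_coded
  /ORDER=ANALYSIS.
```

In the resulting table, the formerly blank cases should appear in the Missing section, so the Valid Percent column is based only on the non-blank responses.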
In the sample dataset, variable HowCommute has the following value labels defined: 1=Walk, 2=Bike, 3=Car, 4=Public Transit, and 5=Other. However, if you use the Frequencies procedure to create a frequency table for this variable, you will only see four of the five categories:
Why is one of the categories missing despite it being defined as a value label?
The Frequencies procedure is designed to drop unobserved categories from the frequency table: that is, it will not include categories with counts of 0. Although this can be desirable in some cases, it may be actively problematic or misleading in others. For example, if you create a frequency table of a 5-point Likert item or multiple choice question, readers may interpret the omission of categories as the categories not being included in the design of the survey -- which is very different from the categories being present on the survey but not selected by any respondents.
If you wish to create a frequency table that will include all categories with a defined value label even if they have counts of 0, you must use the Custom Tables procedure.
The Custom Tables procedure is included with SPSS Statistics Standard and SPSS Statistics Premium, but is not included in SPSS Statistics Base. If you do not see the Custom Tables procedure in the Analyze menu (Analyze > Tables > Custom Tables), it is possible your license did not include the Custom Tables module.
You can check which modules are available to you by opening a new syntax window (File > New > Syntax) and executing the following command:
SHOW LIC.
If the resulting output does not include "IBM SPSS Statistics Custom Tables", then you will not have access to the procedure.
* Counts only.
CTABLES
  /VLABELS VARIABLES=HowCommute DISPLAY=LABEL
  /TABLE HowCommute [COUNT F40.0]
  /CATEGORIES VARIABLES=HowCommute ORDER=A KEY=VALUE EMPTY=INCLUDE
  /CRITERIA CILEVEL=95.

* Counts with a Total row.
CTABLES
  /VLABELS VARIABLES=HowCommute DISPLAY=LABEL
  /TABLE HowCommute [COUNT F40.0]
  /CATEGORIES VARIABLES=HowCommute ORDER=A KEY=VALUE EMPTY=INCLUDE TOTAL=YES POSITION=AFTER
  /CRITERIA CILEVEL=95.

* Counts and percentages.
CTABLES
  /VLABELS VARIABLES=HowCommute DISPLAY=LABEL
  /TABLE HowCommute [COUNT F40.0, TABLEPCT.COUNT PCT40.1, TABLEPCT.VALIDN PCT40.1, TABLEPCT.TOTALN PCT40.1]
  /CATEGORIES VARIABLES=HowCommute ORDER=A KEY=VALUE EMPTY=INCLUDE
  /CRITERIA CILEVEL=95.

* Counts and percentages with a Total row.
CTABLES
  /VLABELS VARIABLES=HowCommute DISPLAY=LABEL
  /TABLE HowCommute [COUNT F40.0, TABLEPCT.COUNT PCT40.1, TABLEPCT.VALIDN PCT40.1, TABLEPCT.TOTALN PCT40.1]
  /CATEGORIES VARIABLES=HowCommute ORDER=A KEY=VALUE EMPTY=INCLUDE TOTAL=YES POSITION=AFTER
  /CRITERIA CILEVEL=95.
The default output of Custom Tables includes only the counts. It does not include a total row, nor the number of missing values, nor percentages. However, we see that all 5 categories are present, including the Other category, which is shown with a count of 0:
This table is identical to the previous table, but with the addition of a Total row that includes the sum of the counts in the table. We can verify the total shown in the table by performing the sum manually: 26+15+193+13+0=247.
This table is identical to the previous table but with the addition of extra columns containing the percentages we requested.
In this example, Table N% and Table Valid N% are identical, but this will not always be the case. If your variable includes user-missing values and you have enabled the Missing option, then Table N% will be different than Table Valid N%. (In general, "Table Valid N%" in Custom Tables has the same meaning as the "Valid Percent" column of the Frequencies output: it is the proportion based on the number of valid, nonmissing cases.)
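For reference, the Missing option mentioned above corresponds to the MISSING=INCLUDE keyword on the CATEGORIES subcommand. The sketch below is a hedged variation on the earlier syntax, not output from the sample dataset. It only changes the table if the variable has user-missing codes defined, and nothing in this tutorial indicates whether HowCommute has any.

```spss
* Include user-missing categories in the table.
* With user-missing values present, Table N % and Table Valid N % can differ.
CTABLES
  /VLABELS VARIABLES=HowCommute DISPLAY=LABEL
  /TABLE HowCommute [COUNT F40.0, TABLEPCT.COUNT PCT40.1, TABLEPCT.VALIDN PCT40.1]
  /CATEGORIES VARIABLES=HowCommute ORDER=A KEY=VALUE EMPTY=INCLUDE MISSING=INCLUDE.
```

System-missing values are not affected by this keyword, since they do not form a category that can be displayed.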
The Table Total N% values are based on the total number of cases in the dataset (valid + missing). Recall that the sample dataset has 435 rows, and we know from the Frequencies procedure that variable HowCommute has 247 valid/observed values and 188 missing values (247 valid + 188 missing = 435 total). We can verify that the Table Total N% values are based on the number of rows by performing the divisions ourselves and rounding to one decimal place: 26/435 = 6.0%, 15/435 = 3.4%, 193/435 = 44.4%, 13/435 = 3.0%, and 0/435 = 0.0%.