Crosstabs
To summarize a single categorical variable, we use frequency tables. To summarize the relationship between two categorical variables, we use a crosstabulation (also called a contingency table). A crosstabulation (or crosstab for short) is a table that depicts the number of times each of the possible category combinations occurred in the sample data.
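Outside of SPSS, the counting that a crosstab performs can be illustrated with a minimal Python sketch. The observations below are hypothetical and are not taken from the sample data; each one is a (row category, column category) pair.

```python
from collections import Counter

# Hypothetical sample: one (row category, column category) pair per observation.
observations = [
    ("male", "yes"), ("female", "no"), ("male", "yes"),
    ("female", "yes"), ("male", "no"), ("female", "yes"),
]

# A crosstab cell is the number of times a category combination occurs.
cell_counts = Counter(observations)

for (row_cat, col_cat), count in sorted(cell_counts.items()):
    print(f"{row_cat:<8}{col_cat:<5}{count}")
```

Each key of `cell_counts` corresponds to one cell of the crosstab, and the counts sum to the number of observations.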
To create a crosstab, click Analyze > Descriptive Statistics > Crosstabs. The Crosstabs window appears.
You must supply at least one categorical variable in each of the Row(s) and Column(s) fields. These fields determine which variables compose the rows and columns of the table. (In most cases, the row and column variables can be treated interchangeably. The choice of row/column variable is usually dictated by space requirements or interpretation of the results.)
The optional Layer area uses a categorical variable to stratify the data. When a layer variable is specified, the crosstab between the Row and Column variable(s) will be created at each level of the layer variable.
The Crosstabs: Statistics window contains fifteen different statistics for comparing two categorical variables. (These statistics will be covered in detail in a later tutorial.)
The Crosstabs: Cell Display window specifies which output is displayed in each cell of the crosstab.
The Crosstabs: Table Format option specifies how the rows of the table are organized.
Special Considerations for Crosstabs
Data Requirements
The crosstabs procedure can use numeric or string variables defined as nominal, ordinal, or scale. However, crosstabs should only be used when the variables have a limited number of categories.
Describing a Crosstab
The dimensions of the crosstab refer to the number of rows and columns in the table. The table dimensions are reported as RxC, where R is the number of categories for the row variable, and C is the number of categories for the column variable.
Additionally, a "square" crosstab is one in which the row and column variables have an equal number of categories. Tables of dimensions 2x2, 3x3, 4x4, etc. are all square crosstabs.
Example 1
 Row variable: Class Rank (4 categories: freshman, sophomore, junior, senior)
 Column variable: Gender (2 categories: male, female)
 Table dimension: 4x2
Example 2
 Row variable: Gender (2 categories: male, female)
 Column variable: Smoking (3 categories: never smoked, past smoker, current smoker)
 Table dimension: 2x3
Example 3
 Row variable: Gender (2 categories: male, female)
 Column variable: Alcohol (2 categories: no, yes)
 Table dimension: 2x2 (square)
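The RxC rule used in the three examples above can be sketched in Python. The function name `describe_dimensions` is hypothetical and chosen only for illustration; the category lists are the ones given in the examples.

```python
def describe_dimensions(row_categories, column_categories):
    """Report a crosstab's dimensions as RxC, flagging square tables."""
    r = len(set(row_categories))      # R: number of row categories
    c = len(set(column_categories))   # C: number of column categories
    label = f"{r}x{c}"
    return label + " (square)" if r == c else label


class_rank = ["freshman", "sophomore", "junior", "senior"]
gender = ["male", "female"]
smoking = ["never smoked", "past smoker", "current smoker"]
alcohol = ["no", "yes"]

print(describe_dimensions(class_rank, gender))  # Example 1
print(describe_dimensions(gender, smoking))     # Example 2
print(describe_dimensions(gender, alcohol))     # Example 3
```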
Understanding Row, Column, and Total Percents
Construction of a crosstab
A typical 2x2 crosstab has the following construction:
                 Column 1    Column 2    Row totals
Row 1            a           b           a + b
Row 2            c           d           c + d
Column totals    a + c       b + d       a + b + c + d
The letters a, b, c, and d represent what are called cell counts.
 a is the number of observations corresponding to Row 1 AND Column 1.
 b is the number of observations corresponding to Row 1 AND Column 2.
 c is the number of observations corresponding to Row 2 AND Column 1.
 d is the number of observations corresponding to Row 2 AND Column 2.
By adding a, b, c, and d, we can determine the total number of observations in each category, and in the table overall.
 Row sum of row 1 (i.e., total number of observations in Row 1): a + b
 Row sum of row 2 (i.e., total number of observations in Row 2): c + d
 Column sum of column 1 (i.e., total number of observations in Column 1): a + c
 Column sum of column 2 (i.e., total number of observations in Column 2): b + d
 Total sum (i.e., total number of observations in the table): n = a + b + c + d
Note that if you were to make frequency tables for your row variable and your column variable, the frequencies should match the row totals and column totals, respectively.
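The row, column, and total sums listed above can be sketched in Python as follows. The cell counts are hypothetical placeholders for a, b, c, and d, and the helper name `margins` is an assumption made for this illustration.

```python
def margins(table):
    """Return (row totals, column totals, n) for a table given as a list of rows."""
    row_totals = [sum(row) for row in table]
    column_totals = [sum(col) for col in zip(*table)]  # zip(*table) transposes the rows
    n = sum(row_totals)
    return row_totals, column_totals, n


# Hypothetical cell counts: a = 10, b = 20, c = 30, d = 40.
table = [
    [10, 20],  # Row 1: a, b
    [30, 40],  # Row 2: c, d
]

row_totals, column_totals, n = margins(table)
print(row_totals)     # a + b, c + d
print(column_totals)  # a + c, b + d
print(n)              # a + b + c + d
```

The same function works for any RxC table, since it only sums across rows and down columns.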
When you are describing the composition of your sample, it is often useful to refer to the proportion of the row or column that fell within a particular category. This can be achieved by computing the row percentages or column percentages.
                 Column 1                     Column 2                     Row totals
Row 1            a                            b                            a + b
  % of row       a / (a + b)                  b / (a + b)                  (a + b) / (a + b) = 100%
Row 2            c                            d                            c + d
  % of row       c / (c + d)                  d / (c + d)                  (c + d) / (c + d) = 100%
Column totals    a + c                        b + d                        a + b + c + d
  % of total     (a + c) / (a + b + c + d)    (b + d) / (a + b + c + d)    (a + b + c + d) / (a + b + c + d) = 100%
Notice that when computing row percentages, the denominators for cells a, b, c, d are determined by the row sums (here, a + b and c + d). This implies that the percentages in the "row totals" column must equal 100%.
                 Column 1                     Column 2                     Row totals
Row 1            a                            b                            a + b
  % of column    a / (a + c)                  b / (b + d)                  (a + b) / (a + b + c + d)
Row 2            c                            d                            c + d
  % of column    c / (a + c)                  d / (b + d)                  (c + d) / (a + b + c + d)
Column totals    a + c                        b + d                        a + b + c + d
  % of column    (a + c) / (a + c) = 100%     (b + d) / (b + d) = 100%     (a + b + c + d) / (a + b + c + d) = 100%
Notice that when computing column percentages, the denominators for cells a, b, c, d are determined by the column sums (here, a + c and b + d). This implies that the percentages in the "column totals" row must equal 100%.
                 Column 1                     Column 2                     Row totals
Row 1            a                            b                            a + b
  % of total     a / (a + b + c + d)          b / (a + b + c + d)          (a + b) / (a + b + c + d)
Row 2            c                            d                            c + d
  % of total     c / (a + b + c + d)          d / (a + b + c + d)          (c + d) / (a + b + c + d)
Column totals    a + c                        b + d                        a + b + c + d
  % of total     (a + c) / (a + b + c + d)    (b + d) / (a + b + c + d)    (a + b + c + d) / (a + b + c + d) = 100%
Notice that when total percentages are computed, the denominators for all of the computations are equal to the total number of observations in the table, i.e. a + b + c + d.
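The three kinds of percentages differ only in their denominators, which a short Python sketch can make explicit. The function name `percentages` and the cell counts are hypothetical; this is an illustration of the arithmetic described above, not of how SPSS computes its output.

```python
def percentages(table, kind):
    """Return cell percentages for kind = 'row', 'column', or 'total'."""
    row_totals = [sum(row) for row in table]
    column_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)

    result = []
    for i, row in enumerate(table):
        out = []
        for j, count in enumerate(row):
            if kind == "row":
                denominator = row_totals[i]      # e.g. a + b for Row 1
            elif kind == "column":
                denominator = column_totals[j]   # e.g. a + c for Column 1
            elif kind == "total":
                denominator = n                  # a + b + c + d
            else:
                raise ValueError(f"unknown kind: {kind!r}")
            out.append(100.0 * count / denominator)
        result.append(out)
    return result


# Hypothetical cell counts: a = 10, b = 30, c = 20, d = 40.
table = [[10, 30], [20, 40]]

row_pct = percentages(table, "row")        # each row sums to 100
column_pct = percentages(table, "column")  # each column sums to 100
total_pct = percentages(table, "total")    # all cells together sum to 100
```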
Connection to conditional probabilities
Recall the notation for the conditional probability of event A, given event B: P(A | B).
 Row percentages can be thought of as the conditional probabilities of the column, given the row.
 Column percentages can be thought of as the conditional probabilities of the row, given the column.
(Tip: The vertical bar ( | ) is equivalent to the word "given".)
                 Column 1              Column 2              Row totals
Row 1            a                     b                     a + b
  Probability    P(Col 1 | Row 1)      P(Col 2 | Row 1)
Row 2            c                     d                     c + d
  Probability    P(Col 1 | Row 2)      P(Col 2 | Row 2)
Column totals    a + c                 b + d                 a + b + c + d
Connection to intersection probabilities
Recall the notation for probability of intersection between events A and B: P(A and B).
 Total percentages can be thought of as the probability of the row and column intersecting.
                 Column 1              Column 2              Row totals
Row 1            a                     b                     a + b
  Probability    P(Row 1 AND Col 1)    P(Row 1 AND Col 2)
Row 2            c                     d                     c + d
  Probability    P(Row 2 AND Col 1)    P(Row 2 AND Col 2)
Column totals    a + c                 b + d                 a + b + c + d
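The conditional and intersection interpretations are linked by the standard identity P(A | B) = P(A AND B) / P(B). The following Python sketch checks that identity against the crosstab formulas, using hypothetical cell counts and exact fractions to avoid rounding.

```python
from fractions import Fraction

# Hypothetical cell counts for a 2x2 crosstab.
a, b, c, d = 10, 30, 20, 40
n = a + b + c + d

# Total percentage of cell a, read as an intersection probability.
p_row1_and_col1 = Fraction(a, n)

# Marginal probability of Row 1, from the row total.
p_row1 = Fraction(a + b, n)

# Conditional probability via the identity P(A | B) = P(A AND B) / P(B).
p_col1_given_row1 = p_row1_and_col1 / p_row1

# The row percentage of cell a gives the same value directly.
row_percentage_a = Fraction(a, a + b)

print(p_col1_given_row1, row_percentage_a)
```

In other words, dividing a total percentage by the matching row (or column) total percentage recovers the row (or column) percentage.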
Example 1: Simple crosstab
Using the sample data, let's make a crosstab of the variables Gender and Alcohol. Let the row variable be Gender, and the column variable be Alcohol.
 Open the Crosstabs window (Analyze > Descriptive Statistics > Crosstabs).
 Select Gender as the row variable, and Alcohol as the column variable.
 Click OK.
The Case Processing Summary tells us what proportion of the observations had nonmissing values for both Gender and Alcohol. In this sample, there were 9 cases that had missing values for Gender, Alcohol, or for both Gender and Alcohol.
The second table (here, Gender * Drinks alcohol? Crosstabulation) contains the crosstab. We can quickly observe information about the interaction of these two variables:
 Thirteen (13) males and thirteen (13) females did not drink alcohol
 Thirty-three (33) males did drink alcohol
 Thirty-two (32) females did drink alcohol
Note that we can also observe from the crosstab the same information that we would get from the frequency tables of Gender alone and Alcohol alone:
 The sample had 46 males and 45 females
 There were 26 individuals who did not drink alcohol, and 65 individuals who did drink alcohol
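As a quick check, the marginal totals can be recomputed from the four cell counts reported in the crosstab. This Python sketch uses only those counts; the dictionary layout is an assumption made for illustration.

```python
# Cell counts reported in the crosstab (rows: Gender, columns: Alcohol).
crosstab = {
    "Male": {"No": 13, "Yes": 33},
    "Female": {"No": 13, "Yes": 32},
}

# Row totals: what a frequency table of Gender alone would show.
row_totals = {gender: sum(cells.values()) for gender, cells in crosstab.items()}

# Column totals: what a frequency table of Alcohol alone would show.
column_totals = {
    answer: sum(cells[answer] for cells in crosstab.values())
    for answer in ("No", "Yes")
}

# Number of cases with nonmissing values for both variables.
n = sum(row_totals.values())

print(row_totals, column_totals, n)
```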
Example 2: Row, column, and total percentages
Let's build on the table shown in Example 1 by adding row, column, and total percentages.
Row, column, and total percentage settings can be accessed from the Crosstabs window by clicking Cells and selecting the Row percent or Column percent or Total percent check box. Here, we will look at the row, column, and total percents individually.
Row percents
If the row variable is Gender and the column variable is Alcohol, then the row percentages will tell us what percentage of the males or what percentage of the females drink alcohol. That is, variable Gender will determine the denominator of the percentage computations.
 The percentage of males who do not drink is 28.3%, or 13/46. P(No | Male)
 The percentage of females who do not drink is 28.9%, or 13/45. P(No | Female)
 The percentage of males who drink alcohol is 71.7%, or 33/46. P(Yes | Male)
 The percentage of females who drink alcohol is 71.1%, or 32/45. P(Yes | Female)
Column percents
If the row variable is Gender and the column variable is Alcohol, then the column percentages will tell us what percentage of the individuals who do or do not drink are male or female. That is, variable Alcohol will determine the denominator of the percentage computations.
 The percentage of individuals who drink that are male is 50.8%, or 33/65. P(Male | Yes)
 The percentage of individuals who drink that are female is 49.2%, or 32/65. P(Female | Yes)
 The percentage of individuals who do not drink that are male is 50.0%, or 13/26. P(Male | No)
 The percentage of individuals who do not drink that are female is 50.0%, or 13/26. P(Female | No)
Total percents
If the row variable is Gender and the column variable is Alcohol, then the total percentage tells us what proportion of the total is within each combination of Gender and Alcohol. That is, the overall table size determines the denominator of the percentage computations.
 The percentage of the sample that is male and does not drink alcohol is 14.3%, or 13/91. P(Male AND No)
 The percentage of the sample that is female and does not drink alcohol is 14.3%, or 13/91. P(Female AND No)
 The percentage of the sample that is male and drinks alcohol is 36.3%, or 33/91. P(Male AND Yes)
 The percentage of the sample that is female and drinks alcohol is 35.2%, or 32/91. P(Female AND Yes)
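The row, column, and total percents in this example can be reproduced from the four cell counts. The Python sketch below is an illustration of the arithmetic only; the helper name `pct` is hypothetical, and rounding to one decimal place is assumed to match the figures quoted above.

```python
# Cell counts from Example 1 (rows: Gender, columns: Alcohol).
crosstab = {
    "Male": {"No": 13, "Yes": 33},
    "Female": {"No": 13, "Yes": 32},
}

row_totals = {g: sum(cells.values()) for g, cells in crosstab.items()}
column_totals = {
    ans: sum(cells[ans] for cells in crosstab.values()) for ans in ("No", "Yes")
}
n = sum(row_totals.values())


def pct(count, denominator):
    """Percentage rounded to one decimal place."""
    return round(100 * count / denominator, 1)


cells = [(g, ans, count) for g, row in crosstab.items() for ans, count in row.items()]

# Same numerators throughout; only the denominator changes.
row_pct = {(g, ans): pct(count, row_totals[g]) for g, ans, count in cells}
column_pct = {(g, ans): pct(count, column_totals[ans]) for g, ans, count in cells}
total_pct = {(g, ans): pct(count, n) for g, ans, count in cells}

for g, ans, count in cells:
    key = (g, ans)
    print(g, ans, count, row_pct[key], column_pct[key], total_pct[key])
```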
Example 3: Crosstab with a layer variable
Let's modify our analysis slightly by looking at the differences between men and women with respect to alcohol use and binge drinking. Here, we will be working with three categorical variables: Gender, Alcohol, and Binge.
In this example, we want to create a crosstab of Alcohol by Binge, with the variable Gender acting as a stratifying (grouping) variable.
 Open the crosstabs dialog (Analyze > Descriptive Statistics > Crosstabs).
 Select Alcohol as the row variable, and Binge as the column variable.
 Select Gender as the layer variable.
 Click OK.
This will produce a table that looks like the following:
From this table, we can see that the males' and females' behaviors with respect to alcohol use and binge drinking were nearly identical.
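The effect of a layer variable, producing one Row-by-Column crosstab per level of the layer, can be sketched in Python. The records below are made up for illustration and do not reproduce the sample data or the table above; the function name `layered_crosstab` is likewise hypothetical.

```python
from collections import Counter, defaultdict

# Hypothetical records as (Gender, Alcohol, Binge); NOT the tutorial's sample data.
records = [
    ("male", "yes", "yes"), ("male", "yes", "no"), ("male", "no", "no"),
    ("female", "yes", "yes"), ("female", "yes", "yes"), ("female", "no", "no"),
]


def layered_crosstab(rows):
    """Return {layer level: Counter of (row category, column category) pairs}."""
    tables = defaultdict(Counter)
    for layer, row_cat, col_cat in rows:
        # Each layer level gets its own, separate set of cell counts.
        tables[layer][(row_cat, col_cat)] += 1
    return dict(tables)


tables = layered_crosstab(records)

for layer, cell_counts in tables.items():
    print(layer)
    for (alcohol, binge), count in sorted(cell_counts.items()):
        print(f"  Alcohol={alcohol:<4} Binge={binge:<4} {count}")
```

Summing the per-layer counts for a given cell gives the count that the unlayered Alcohol-by-Binge crosstab would show.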