Crosstabs
To summarize a single categorical variable, we use frequency tables. To summarize the relationship between two categorical variables, we use a crosstabulation (also called a contingency table). A crosstabulation (or crosstab for short) is a table that depicts the number of times each of the possible category combinations occurred in the sample data.
To create a crosstab, click Analyze > Descriptive Statistics > Crosstabs. The Crosstabs window appears.
To create a crosstab, you must supply at least one categorical variable in both the Row(s) and Column(s) fields. These fields determine which variables compose the rows or columns of the tables. (In most cases, the row or column variables can be treated interchangeably. The choice of row/column variable is usually dictated by space requirements or interpretation of the results.)
The optional Layer area uses a categorical variable to stratify the data. When a layer variable is specified, the crosstab between the Row and Column variable(s) will be created at each level of the layer variable.
The Crosstabs: Statistics window contains fifteen different statistics for comparing two categorical variables. (These statistics will be covered in detail in a later tutorial.)
The Crosstabs: Cell Display window specifies which output is displayed in each cell of the crosstab.
The Crosstabs: Table Format option specifies how the rows of the table are organized.
The crosstabs procedure can use numeric or string variables defined as nominal, ordinal, or scale. However, crosstabs should only be used when there are a limited number of categories.
The dimensions of the crosstab refer to the number of rows and columns in the table. The table dimensions are reported as RxC, where R is the number of categories of the row variable and C is the number of categories of the column variable.
Additionally, a "square" crosstab is one in which the row and column variables have an equal number of categories. Tables of dimensions 2x2, 3x3, 4x4, etc. are all square crosstabs.
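The RxC definition can be expressed as a small Python helper. This is a sketch, not anything SPSS exposes: the function name `crosstab_dimensions` and the example variables are hypothetical. It counts only the distinct categories that actually appear in the data and treats `None` as a missing value.

```python
def crosstab_dimensions(row_values, col_values):
    """Return (R, C, is_square) for a crosstab of two categorical variables.

    R and C are the numbers of distinct nonmissing categories observed for
    the row and column variables; None is treated as a missing value.
    """
    r = len({v for v in row_values if v is not None})
    c = len({v for v in col_values if v is not None})
    return r, c, r == c


# Hypothetical data: a 2-category row variable and a 3-category column variable
gender = ["Male", "Female", "Female", "Male"]
rank = ["Low", "Medium", "High", "Low"]

r, c, square = crosstab_dimensions(gender, rank)
print(f"{r}x{c}", "square" if square else "not square")  # 2x3 not square
```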
A typical 2x2 crosstab has the following construction:
               Column 1   Column 2   Row totals
Row 1          a          b          a + b
Row 2          c          d          c + d
Column totals  a + c      b + d      a + b + c + d
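The letters a, b, c, and d represent the cell counts: the number of observations that fall into each combination of row and column categories.
By adding the cell counts across each row, down each column, or over the whole table, we can determine the total number of observations in each category and in the table overall.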
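As a minimal sketch of that arithmetic (plain Python, not SPSS output), the row totals, column totals, and grand total of a 2x2 table can be computed from the four cell counts. The values assigned to a, b, c, and d are made up for illustration.

```python
# Cell counts for a hypothetical 2x2 crosstab, laid out as in the table above:
#            Column 1  Column 2
# Row 1         a         b
# Row 2         c         d
a, b, c, d = 10, 20, 30, 40  # illustrative values only

table = [[a, b],
         [c, d]]

# Row totals: sum across each row (a + b, c + d)
row_totals = [sum(row) for row in table]

# Column totals: sum down each column (a + c, b + d)
col_totals = [sum(col) for col in zip(*table)]

# Grand total: a + b + c + d
grand_total = sum(row_totals)

print(row_totals)   # [30, 70]
print(col_totals)   # [40, 60]
print(grand_total)  # 100
```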
Note that if you were to make frequency tables for your row variable and your column variable (using the same cases), their counts should match the row and column totals of the crosstab.
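One way to check this claim is to build both the crosstab margins and the one-way frequency tables from the same observations and compare them. The Python sketch below uses `collections.Counter` on hypothetical (row, column) pairs with no missing values; it illustrates the idea rather than reproducing what SPSS does internally.

```python
from collections import Counter

# Hypothetical (row category, column category) pairs, one per observation,
# with no missing values
pairs = [("Row 1", "Col 1"), ("Row 1", "Col 2"), ("Row 2", "Col 1"),
         ("Row 2", "Col 1"), ("Row 2", "Col 2")]

# Cell counts of the crosstab
cells = Counter(pairs)

# Marginal totals obtained by summing the cell counts
row_totals = Counter()
col_totals = Counter()
for (row, col), count in cells.items():
    row_totals[row] += count
    col_totals[col] += count

# One-way frequency tables built directly from each variable
row_freq = Counter(row for row, _ in pairs)
col_freq = Counter(col for _, col in pairs)

print(row_totals == row_freq, col_totals == col_freq)  # True True
```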
When you are describing the composition of your sample, it is often useful to refer to the proportion of the row or column that fell within a particular category. This can be achieved by computing the row percentages or column percentages.
                Column 1                    Column 2                    Row totals
Row 1           a                           b                           a + b
  Row 1 %       a / (a + b)                 b / (a + b)                 (a + b) / (a + b) = 100%
Row 2           c                           d                           c + d
  Row 2 %       c / (c + d)                 d / (c + d)                 (c + d) / (c + d) = 100%
Column totals   a + c                       b + d                       a + b + c + d
  % of total    (a + c) / (a + b + c + d)   (b + d) / (a + b + c + d)   (a + b + c + d) / (a + b + c + d) = 100%
Notice that when computing row percentages, the denominators for cells a, b, c, d are determined by the row sums (here, a + b and c + d). This implies that the percentages in the "row totals" column must equal 100%.
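The row-percentage rule can be sketched in a few lines of Python. The function name `row_percentages` and the cell counts are illustrative assumptions; the function assumes every row total is nonzero.

```python
def row_percentages(table):
    """Divide every cell by its row total, so each row's percentages sum to 100.

    Assumes every row total is nonzero.
    """
    result = []
    for row in table:
        row_total = sum(row)  # a + b for Row 1, c + d for Row 2
        result.append([100 * cell / row_total for cell in row])
    return result


# Hypothetical 2x2 cell counts laid out as [[a, b], [c, d]]
table = [[10, 20],
         [30, 40]]

row_pct = row_percentages(table)
for pct in row_pct:
    print([round(p, 1) for p in pct])
```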
                Column 1                    Column 2                    Row totals
Row 1           a                           b                           a + b
  Column %      a / (a + c)                 b / (b + d)                 (a + b) / (a + b + c + d)
Row 2           c                           d                           c + d
  Column %      c / (a + c)                 d / (b + d)                 (c + d) / (a + b + c + d)
Column totals   a + c                       b + d                       a + b + c + d
  Column %      (a + c) / (a + c) = 100%    (b + d) / (b + d) = 100%    (a + b + c + d) / (a + b + c + d) = 100%
Notice that when computing column percentages, the denominators for cells a, b, c, d are determined by the column sums (here, a + c and b + d). This implies that the percentages in the "column totals" row must equal 100%.
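The same idea with column totals as denominators looks like this in a Python sketch. The function name and the cell counts are hypothetical, and every column total is assumed to be nonzero.

```python
def column_percentages(table):
    """Divide every cell by its column total, so each column's percentages sum to 100.

    Assumes every column total is nonzero.
    """
    col_totals = [sum(col) for col in zip(*table)]  # a + c, b + d
    return [[100 * cell / col_totals[j] for j, cell in enumerate(row)]
            for row in table]


# Hypothetical 2x2 cell counts laid out as [[a, b], [c, d]]
table = [[10, 20],
         [30, 40]]

col_pct = column_percentages(table)
print(col_pct[0][0], col_pct[1][0])  # 25.0 75.0
```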
                Column 1                    Column 2                    Row totals
Row 1           a                           b                           a + b
  % of total    a / (a + b + c + d)         b / (a + b + c + d)         (a + b) / (a + b + c + d)
Row 2           c                           d                           c + d
  % of total    c / (a + b + c + d)         d / (a + b + c + d)         (c + d) / (a + b + c + d)
Column totals   a + c                       b + d                       a + b + c + d
  % of total    (a + c) / (a + b + c + d)   (b + d) / (a + b + c + d)   (a + b + c + d) / (a + b + c + d) = 100%
Notice that when total percentages are computed, the denominators for all of the computations are equal to the total number of observations in the table, i.e. a + b + c + d.
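For total percentages, every cell shares one denominator. A minimal Python sketch, with a hypothetical function name and made-up counts:

```python
def total_percentages(table):
    """Divide every cell by the grand total (a + b + c + d for a 2x2 table)."""
    grand_total = sum(sum(row) for row in table)
    return [[100 * cell / grand_total for cell in row] for row in table]


# Hypothetical 2x2 cell counts laid out as [[a, b], [c, d]]
table = [[10, 20],
         [30, 40]]

tot_pct = total_percentages(table)
print(tot_pct)  # [[10.0, 20.0], [30.0, 40.0]]
```

Because all cells share the same denominator, the percentages across the entire table (not within any single row or column) sum to 100%.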
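Recall the notation for the conditional probability of event A, given event B: P(A | B).
(Tip: The vertical bar ( | ) is equivalent to the word "given".)
Row percentages can be read as conditional probabilities: each cell count divided by its row total is the probability of that column category, given that row category. For example, a / (a + b) = P(Col 1 | Row 1).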
                Column 1                    Column 2                    Row totals
Row 1           a                           b                           a + b
  Probability   P(Col 1 | Row 1)            P(Col 2 | Row 1)
Row 2           c                           d                           c + d
  Probability   P(Col 1 | Row 2)            P(Col 2 | Row 2)
Column totals   a + c                       b + d                       a + b + c + d
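The correspondence between row percentages and conditional probabilities can be checked with exact fractions. In this Python sketch the counts and function names are hypothetical. The second function applies the same idea to column totals, which gives P(Row | Col), the quantity a column percentage represents.

```python
from fractions import Fraction

# Hypothetical cell counts keyed by (row, column): a=10, b=20, c=30, d=40
counts = {("Row 1", "Col 1"): 10, ("Row 1", "Col 2"): 20,
          ("Row 2", "Col 1"): 30, ("Row 2", "Col 2"): 40}


def p_col_given_row(counts, col, row):
    """P(col | row): cell count / row total, i.e. a row proportion."""
    row_total = sum(n for (r, _), n in counts.items() if r == row)
    return Fraction(counts[(row, col)], row_total)


def p_row_given_col(counts, row, col):
    """P(row | col): cell count / column total, i.e. a column proportion."""
    col_total = sum(n for (_, c), n in counts.items() if c == col)
    return Fraction(counts[(row, col)], col_total)


print(p_col_given_row(counts, "Col 1", "Row 2"))  # 3/7, i.e. c / (c + d)
print(p_row_given_col(counts, "Row 2", "Col 1"))  # 3/4, i.e. c / (a + c)
```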
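Recall the notation for the probability of the intersection of events A and B: P(A and B). Total percentages can be read as these joint probabilities: each cell count divided by the grand total is the probability that a given row category and a given column category occur together. For example, a / (a + b + c + d) = P(Row 1 AND Col 1).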
                Column 1                    Column 2                    Row totals
Row 1           a                           b                           a + b
  Probability   P(Row 1 AND Col 1)          P(Row 1 AND Col 2)
Row 2           c                           d                           c + d
  Probability   P(Row 2 AND Col 1)          P(Row 2 AND Col 2)
Column totals   a + c                       b + d                       a + b + c + d
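The three kinds of percentages are linked by the multiplication rule P(Row AND Col) = P(Col | Row) × P(Row). The Python sketch below, using hypothetical counts and function names, verifies this identity exactly for one cell.

```python
from fractions import Fraction

# Hypothetical cell counts keyed by (row, column): a=10, b=20, c=30, d=40
counts = {("Row 1", "Col 1"): 10, ("Row 1", "Col 2"): 20,
          ("Row 2", "Col 1"): 30, ("Row 2", "Col 2"): 40}


def p_joint(counts, row, col):
    """P(row AND col): cell count / grand total, i.e. a total proportion."""
    return Fraction(counts[(row, col)], sum(counts.values()))


def p_row(counts, row):
    """Marginal P(row): row total / grand total."""
    row_total = sum(n for (r, _), n in counts.items() if r == row)
    return Fraction(row_total, sum(counts.values()))


def p_col_given_row(counts, col, row):
    """P(col | row): cell count / row total, i.e. a row proportion."""
    row_total = sum(n for (r, _), n in counts.items() if r == row)
    return Fraction(counts[(row, col)], row_total)


# Multiplication rule: P(Row 2 AND Col 1) = P(Col 1 | Row 2) * P(Row 2)
joint = p_joint(counts, "Row 2", "Col 1")
product = p_col_given_row(counts, "Col 1", "Row 2") * p_row(counts, "Row 2")
print(joint, product, joint == product)  # 3/10 3/10 True
```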
Using the sample data, let's make a crosstab of the variables Gender and Alcohol. Let the row variable be Gender, and the column variable be Alcohol.
The Case Processing Summary tells us how many (and what percentage) of the observations had nonmissing values for both Gender and Alcohol. In this sample, there were 9 cases that had missing values for Gender, Alcohol, or both.
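The valid/missing logic behind a case processing summary can be sketched in Python. This is an illustration only: the records are hypothetical (not the tutorial's sample), `None` stands in for a missing value, and the function name is made up. A case counts as valid only if it is nonmissing on both variables.

```python
def case_processing_summary(records, row_var, col_var):
    """Count cases usable in a row_var x col_var crosstab.

    A case is valid only if it is nonmissing (not None) on BOTH variables;
    otherwise it is counted as missing. Returns (N, percent) pairs.
    """
    total = len(records)
    valid = sum(1 for rec in records
                if rec[row_var] is not None and rec[col_var] is not None)
    missing = total - valid
    return {
        "Valid": (valid, 100 * valid / total),
        "Missing": (missing, 100 * missing / total),
        "Total": (total, 100.0),
    }


# Hypothetical records for illustration only
records = [
    {"Gender": "Male", "Alcohol": "Yes"},
    {"Gender": "Female", "Alcohol": None},   # missing Alcohol
    {"Gender": None, "Alcohol": "No"},       # missing Gender
    {"Gender": None, "Alcohol": None},       # missing both
    {"Gender": "Female", "Alcohol": "No"},
]

summary = case_processing_summary(records, "Gender", "Alcohol")
print(summary)
```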
The second table (here, Gender * Drinks alcohol? Crosstabulation) contains the crosstab itself, from which we can quickly read off how these two variables are related.
Note that the row and column totals of the crosstab also give us the same information that we would get from the frequency tables of Gender alone and Alcohol alone.
Let's build on the table shown in Example 1 by adding row, column, and total percentages.
Row, column, and total percentages can be requested from the Crosstabs window by clicking Cells and selecting the row, column, or total percentage check box. Here, we will look at the row, column, and total percentages individually.
If the row variable is Gender and the column variable is Alcohol, then the row percentages tell us what percentage of males, and what percentage of females, drink alcohol. That is, the variable Gender determines the denominator of the percentage computations.
If the row variable is Gender and the column variable is Alcohol, then the column percentages tell us what percentage of drinkers, and what percentage of non-drinkers, are male or female. That is, the variable Alcohol determines the denominator of the percentage computations.
If the row variable is Gender and the column variable is Alcohol, then the total percentages tell us what percentage of all cases falls within each combination of Gender and Alcohol. That is, the total number of cases in the table determines the denominator of the percentage computations.
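The three readings differ only in which denominator is used for the same cell count. In the Python sketch below, the Gender/Alcohol observations are hypothetical and are not the tutorial's sample data. The `by` parameter of the hypothetical `percent` function selects the denominator.

```python
from collections import Counter

# Hypothetical (Gender, Alcohol) observations for illustration only;
# these are NOT the tutorial's sample data.
pairs = ([("Male", "Yes")] * 3 + [("Male", "No")] * 1 +
         [("Female", "Yes")] * 2 + [("Female", "No")] * 4)

cells = Counter(pairs)
genders = sorted({g for g, _ in pairs})
drinks = sorted({a for _, a in pairs})


def percent(gender, alcohol, by):
    """Cell percentage for (gender, alcohol); `by` picks the denominator."""
    count = cells[(gender, alcohol)]
    if by == "row":          # row percent: all cases of this Gender
        denom = sum(cells[(gender, a)] for a in drinks)
    elif by == "column":     # column percent: all cases with this Alcohol value
        denom = sum(cells[(g, alcohol)] for g in genders)
    elif by == "total":      # total percent: every case in the table
        denom = sum(cells.values())
    else:
        raise ValueError("by must be 'row', 'column', or 'total'")
    return 100 * count / denom


for by in ("row", "column", "total"):
    print(by, percent("Male", "Yes", by))
```

For the Male/Yes cell (3 cases), this prints 75.0, 60.0, and 30.0: 3 out of 4 males, 3 out of 5 drinkers, and 3 out of 10 cases overall.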
Let's modify our analysis slightly by looking at the differences between men and women with respect to alcohol use and binge drinking. Here, we will be working with three categorical variables: Gender, Alcohol, and Binge.
In this example, we want to create a crosstab of Alcohol by Binge, with Gender acting as a stratifying (layer) variable.
This produces a layered table containing a separate Alcohol by Binge crosstab for each level of Gender.
From this table, we can see that the males' and females' behaviors with respect to alcohol use and binge drinking were nearly identical.
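SPSS handles the Layer field internally. The Python sketch below illustrates the underlying idea by grouping cases by the layer variable and then tabulating Alcohol by Binge separately within each group. The records are hypothetical (not the tutorial's data), and the function name is made up.

```python
from collections import Counter, defaultdict

# Hypothetical (Gender, Alcohol, Binge) records for illustration only
records = [("Male", "Yes", "Yes"), ("Male", "Yes", "No"), ("Male", "No", "No"),
           ("Female", "Yes", "Yes"), ("Female", "Yes", "No"),
           ("Female", "No", "No"), ("Female", "No", "No")]


def layered_crosstab(records):
    """Build one Alcohol x Binge table of cell counts per level of Gender (the layer)."""
    layers = defaultdict(Counter)
    for gender, alcohol, binge in records:
        layers[gender][(alcohol, binge)] += 1
    return layers


tables = layered_crosstab(records)
for gender, cells in sorted(tables.items()):
    # Keys are (Alcohol, Binge) combinations; values are cell counts
    print(gender, dict(sorted(cells.items())))
```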