Our tutorials reference a dataset called "sample" in many examples.
To describe a single categorical variable, we use frequency tables. To describe the relationship between two categorical variables, we use a special type of table called a cross-tabulation (or "crosstab" for short). In a cross-tabulation, the categories of one variable determine the rows of the table, and the categories of the other variable determine the columns. The cells of the table contain the number of times that a particular combination of categories occurred. The "edges" (or "margins") of the table typically contain the total number of observations for that category.
This type of table is also known as a:
The dimensions of the crosstab refer to the number of rows and columns in the table. (The "total" row and column are not included.) The table dimensions are reported as RxC, where R is the number of categories for the row variable, and C is the number of categories for the column variable.
Additionally, a "square" crosstab is one in which the row and column variables have the same number of categories. Tables of dimensions 2x2, 3x3, 4x4, etc. are all square crosstabs.
A typical 2x2 crosstab has the following construction:
| | Column 1 | Column 2 | Row totals |
|---|---|---|---|
| Row 1 | a | b | a + b |
| Row 2 | c | d | c + d |
| Column totals | a + c | b + d | a + b + c + d |
The letters a, b, c, and d represent what are called cell counts.
By adding a, b, c, and d, we can determine the total number of observations in each category, and in the table overall.
The row sums and column sums are sometimes referred to as marginal frequencies. Note that if you were to make frequency tables for your row variable and your column variable, those frequency tables should match the row totals and column totals, respectively.
When you are describing the composition of your sample, it is often useful to refer to the proportion of the row or column that fell within a particular category. This can be achieved by computing the row percentages or column percentages.
| | Column 1 | Column 2 | Row totals |
|---|---|---|---|
| Row 1 (count) | a | b | a + b |
| Row 1 (row %) | a / (a + b) | b / (a + b) | (a + b) / (a + b) = 100% |
| Row 2 (count) | c | d | c + d |
| Row 2 (row %) | c / (c + d) | d / (c + d) | (c + d) / (c + d) = 100% |
| Column totals (count) | a + c | b + d | a + b + c + d |
| Column totals (% of total) | (a + c) / (a + b + c + d) | (b + d) / (a + b + c + d) | (a + b + c + d) / (a + b + c + d) = 100% |
Notice that when computing row percentages, the denominators for cells a, b, c, d are determined by the row sums (here, a + b and c + d). This implies that the percentages in the "row totals" column must equal 100%.
| | Column 1 | Column 2 | Row totals |
|---|---|---|---|
| Row 1 (count) | a | b | a + b |
| Row 1 (column %) | a / (a + c) | b / (b + d) | (a + b) / (a + b + c + d) |
| Row 2 (count) | c | d | c + d |
| Row 2 (column %) | c / (a + c) | d / (b + d) | (c + d) / (a + b + c + d) |
| Column totals (count) | a + c | b + d | a + b + c + d |
| Column totals (column %) | (a + c) / (a + c) = 100% | (b + d) / (b + d) = 100% | (a + b + c + d) / (a + b + c + d) = 100% |
Notice that when computing column percentages, the denominators for cells a, b, c, d are determined by the column sums (here, a + c and b + d). This implies that the percentages in the "column totals" row must equal 100%.
| | Column 1 | Column 2 | Row totals |
|---|---|---|---|
| Row 1 (count) | a | b | a + b |
| Row 1 (% of total) | a / (a + b + c + d) | b / (a + b + c + d) | (a + b) / (a + b + c + d) |
| Row 2 (count) | c | d | c + d |
| Row 2 (% of total) | c / (a + b + c + d) | d / (a + b + c + d) | (c + d) / (a + b + c + d) |
| Column totals (count) | a + c | b + d | a + b + c + d |
| Column totals (% of total) | (a + c) / (a + b + c + d) | (b + d) / (a + b + c + d) | (a + b + c + d) / (a + b + c + d) = 100% |
Notice that when total percentages are computed, the denominators for all of the computations are equal to the total number of observations in the table, i.e. a + b + c + d.
Your data must meet the following requirements:
The categorical variables in your SPSS dataset can be numeric or string, and their measurement level can be defined as nominal, ordinal, or scale. However, crosstabs should only be used when there are a limited number of categories.
Note that in most cases, the row and column variables in a crosstab can be used interchangeably. The choice of row/column variable is usually dictated by space requirements or interpretation of the results.
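As a minimal sketch of that interchangeability, the syntax below swaps the roles of two variables from the sample dataset that are used in the examples later in this tutorial (Rank and LiveOnCampus). The cell counts should be the same either way; only the orientation of the table changes.

```spss
* Rank in the rows and LiveOnCampus in the columns.
CROSSTABS
  /TABLES=Rank BY LiveOnCampus
  /CELLS=COUNT.

* Same counts, transposed, with LiveOnCampus in the rows and Rank in the columns.
CROSSTABS
  /TABLES=LiveOnCampus BY Rank
  /CELLS=COUNT.
```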
To create a crosstab, click Analyze > Descriptive Statistics > Crosstabs.
A Row(s): One or more variables to use in the rows of the crosstab(s). You must enter at least one Row variable.
B Column(s): One or more variables to use in the columns of the crosstab(s). You must enter at least one Column variable.
Also note that if you specify one row variable and two or more column variables, SPSS will print crosstabs for each pairing of the row variable with the column variables. The same is true if you have one column variable and two or more row variables, or if you have multiple row and column variables.
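In syntax, this pairing behavior corresponds to listing more than one variable on either side of the first BY keyword. The sketch below assumes the sample dataset variables used in the examples later in this tutorial; with one row variable and two column variables, it should produce two separate crosstabs.

```spss
* One row variable crossed with two column variables.
* Expect two tables, Rank by LiveOnCampus and Rank by State_Residency.
CROSSTABS
  /TABLES=Rank BY LiveOnCampus State_Residency
  /CELLS=COUNT.
```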
C Layer: An optional "stratification" variable. When a layer variable is specified, the crosstab between the Row and Column variable(s) will be created at each level of the layer variable. You can have multiple layers of variables by specifying the first layer variable and then clicking Next to specify the second layer variable. Alternatively, you can use several variables as separate single layers by putting them all in the Layer 1 of 1 box; SPSS then produces one layered crosstab for each layer variable.
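The two ways of using layer variables map onto the TABLES subcommand roughly as sketched below. Each additional BY keyword adds a further layer, while several variables listed after the same BY act as alternative single layers. Gender is a hypothetical variable name used only for illustration; substitute any categorical variable in your own data.

```spss
* Nested layers, State_Residency as layer 1 and Gender (hypothetical) as layer 2.
CROSSTABS
  /TABLES=RankUpperUnder BY LiveOnCampus BY State_Residency BY Gender
  /CELLS=COUNT.

* Separate single layers, giving one layered table per layer variable.
CROSSTABS
  /TABLES=RankUpperUnder BY LiveOnCampus BY State_Residency Gender
  /CELLS=COUNT.
```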
D Statistics: Opens the Crosstabs: Statistics window, which contains fifteen different inferential statistics for comparing categorical variables. (These statistics will be covered in detail in a later tutorial.)
E Cells: Opens the Crosstabs: Cell Display window, which controls which output is displayed in each cell of the crosstab.
F Format: Opens the Crosstabs: Table Format window, which specifies how the rows of the table are sorted.
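In syntax, the row sort order is set on the FORMAT subcommand. The examples in this tutorial use AVALUE (ascending order of the row variable's values); the sketch below assumes you want descending order instead and uses DVALUE.

```spss
* Sort the row categories in descending order of their values.
CROSSTABS
  /TABLES=Rank BY LiveOnCampus
  /FORMAT=DVALUE TABLES
  /CELLS=COUNT.
```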
Some universities in the United States require that freshmen live in the on-campus dormitories during their first year, with exceptions for students whose families live within a certain radius of campus. That is, certain freshmen whose families live close enough to campus are permitted to live off-campus. After completing their first or second year of school, students living in the dorms may choose to move into an off-campus apartment. How prevalent is this pattern?
In the sample dataset, there are several variables relating to this question:
Let's use different aspects of the Crosstabs procedure to investigate the relationship between class rank and living on campus.
Using the sample data, let's make a crosstab of the variables Rank and LiveOnCampus. Let the row variable be Rank, and the column variable be LiveOnCampus.
CROSSTABS
/TABLES=Rank BY LiveOnCampus
/FORMAT=AVALUE TABLES
/CELLS=COUNT
/COUNT ROUND CELL.
The Case Processing Summary tells us what proportion of the observations had nonmissing values for both Rank and LiveOnCampus. In this sample, there were 47 cases that had a missing value for Rank, LiveOnCampus, or for both Rank and LiveOnCampus.
The second table (here, Class Rank * Do you live on campus? Crosstabulation) contains the crosstab. We can quickly observe information about the interaction of these two variables:
Note that the margins of the crosstab (i.e., the "total" row and column) give us the same information that we would get from frequency tables of Rank and LiveOnCampus, respectively:
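To produce those frequency tables for comparison, a minimal syntax sketch is shown here. One caveat: a frequency table counts every case with a valid value on that single variable, whereas the crosstab only counts cases with valid values on both variables, so the totals can differ by the number of cases missing on the other variable.

```spss
* Frequency tables for the row and column variables.
FREQUENCIES VARIABLES=Rank LiveOnCampus
  /ORDER=ANALYSIS.
```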
Let's build on the table shown in Example 1 by adding row, column, and total percentages. For simplicity's sake, let's switch out the variable Rank (which has four categories) with the variable RankUpperUnder (which has two categories).
CROSSTABS
/TABLES=RankUpperUnder BY LiveOnCampus
/FORMAT=AVALUE TABLES
/CELLS=COUNT ROW COLUMN TOTAL
/COUNT ROUND CELL.
If the row variable is RankUpperUnder and the column variable is LiveOnCampus, then the row percentages will tell us what percentage of the upperclassmen or what percentage of the underclassmen live on campus. That is, variable RankUpperUnder will determine the denominator of the percentage computations.
If the row variable is RankUpperUnder and the column variable is LiveOnCampus, then the column percentages will tell us what percentage of the individuals who live on campus are upperclassmen or underclassmen. That is, variable LiveOnCampus will determine the denominator of the percentage computations.
If the row variable is RankUpperUnder and the column variable is LiveOnCampus, then the total percentage tells us what proportion of the total is within each combination of RankUpperUnder and LiveOnCampus. That is, the overall table size determines the denominator of the percentage computations.
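Requesting all three percentages at once makes each cell fairly crowded. If only one type of percentage is needed, the CELLS subcommand can list just that keyword. The sketch below assumes you only want counts and row percentages; replace ROW with COLUMN or TOTAL for the other two.

```spss
* Counts plus row percentages only.
CROSSTABS
  /TABLES=RankUpperUnder BY LiveOnCampus
  /FORMAT=AVALUE TABLES
  /CELLS=COUNT ROW
  /COUNT ROUND CELL.
```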
Let's modify our analysis slightly by taking into account the students' state of residence (in-state or out-of-state). Here, we will be working with three categorical variables: RankUpperUnder, LiveOnCampus, and State_Residency.
In this example, we want to create a crosstab of RankUpperUnder by LiveOnCampus, with variable State_Residency acting as a stratification, or layer, variable.
CROSSTABS
/TABLES=RankUpperUnder BY LiveOnCampus BY State_Residency
/FORMAT=AVALUE TABLES
/CELLS=COUNT
/COUNT ROUND CELL.
Again, the Crosstabs output includes the Case Processing Summary table and the crosstabulation itself.
Notice that after including the layer variable State Residency, the number of valid cases we have to work with has dropped from 388 to 367. This is because the crosstab requires nonmissing values for all three variables: row, column, and layer.
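One way to see which variable contributes the additional missing cases is to check the valid and missing counts for each variable separately. This is only a sketch of one possible check: it suppresses the frequency tables themselves and keeps the Statistics table, which reports the valid and missing N per variable.

```spss
* Report valid and missing N for each variable without printing frequency tables.
FREQUENCIES VARIABLES=RankUpperUnder LiveOnCampus State_Residency
  /FORMAT=NOTABLE.
```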
The layered crosstab shows the individual Rank by Campus tables within each level of State Residency. Some observations we can draw from this table include: