This is the "Crosstabs" page of the "SPSS Tutorials" guide.

SPSS Tutorials

This LibGuide contains written and illustrated tutorials for the statistical software SPSS.
Last Updated: Aug 29, 2014. URL: http://libguides.library.kent.edu/SPSS


Crosstabs

To summarize a single categorical variable, we use frequency tables. To summarize the relationship between two categorical variables, we use a cross-tabulation (also called a contingency table). A cross-tabulation (or crosstab for short) is a table that depicts the number of times each of the possible category combinations occurred in the sample data.
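
To make the definition concrete, the minimal Python sketch below counts how often each combination of row and column categories occurs. It only illustrates what a crosstab contains and does not show how SPSS computes one. The function name `crosstab` and the five-case data set are hypothetical.

```python
from collections import Counter


def crosstab(rows, cols):
    """Count each (row category, column category) combination.

    rows[i] and cols[i] are the two categorical values recorded for case i.
    Returns a nested dict: table[row_category][column_category] -> count,
    including combinations that never occurred (count 0).
    """
    if len(rows) != len(cols):
        raise ValueError("both variables must have one value per case")
    counts = Counter(zip(rows, cols))
    row_levels = sorted(set(rows))
    col_levels = sorted(set(cols))
    # Counter returns 0 for pairs it has never seen, so empty cells appear as 0.
    return {r: {c: counts[(r, c)] for c in col_levels} for r in row_levels}


# Hypothetical data for five cases (not the tutorial's sample file).
gender = ["male", "female", "female", "male", "male"]
alcohol = ["yes", "no", "yes", "yes", "no"]
table = crosstab(gender, alcohol)
print(table)  # {'female': {'no': 1, 'yes': 1}, 'male': {'no': 1, 'yes': 2}}
```

Sorting the category labels is an arbitrary choice of this sketch. It only fixes the order in which rows and columns appear.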

To create a crosstab, click Analyze > Descriptive Statistics > Crosstabs. The Crosstabs window appears.

To create a crosstab, you must supply at least one categorical variable in both the Row(s) and Column(s) fields. These fields determine which variables compose the rows or columns of the tables. (In most cases, the row or column variables can be treated interchangeably. The choice of row/column variable is usually dictated by space requirements or interpretation of the results.)

The optional Layer area uses a categorical variable to stratify the data. When a layer variable is specified, the crosstab between the Row and Column variable(s) will be created at each level of the layer variable.
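
As a rough illustration of layering, the sketch below builds one Row x Column table for each level of the layer variable. For each level, it first keeps only the cases that belong to that level. It is plain Python, and the function `layered_crosstab` and the six-case data set are hypothetical. Giving every layer the same row and column categories is a design choice of this sketch, so the tables can be compared side by side.

```python
from collections import Counter


def layered_crosstab(rows, cols, layers):
    """Build one Row x Column crosstab for each level of a layer variable.

    Returns {layer_level: {row_category: {column_category: count}}}.
    Every layer uses the same row/column categories so the tables line up.
    """
    row_levels = sorted(set(rows))
    col_levels = sorted(set(cols))
    tables = {}
    for level in sorted(set(layers)):
        # Keep only the cases that fall in this level of the layer variable.
        counts = Counter(
            (r, c) for r, c, layer in zip(rows, cols, layers) if layer == level
        )
        tables[level] = {r: {c: counts[(r, c)] for c in col_levels} for r in row_levels}
    return tables


# Hypothetical data for six cases.
alcohol = ["yes", "yes", "no", "yes", "no", "yes"]
binge = ["yes", "no", "no", "yes", "no", "yes"]
gender = ["male", "male", "male", "female", "female", "female"]

for level, table in layered_crosstab(alcohol, binge, gender).items():
    print(level, table)
```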

The Crosstabs: Statistics window contains fifteen different statistics for comparing two categorical variables. (These statistics will be covered in detail in a later tutorial.)

The Crosstabs: Cell Display window specifies which output is displayed in each cell of the crosstab.

The Crosstabs: Table Format option specifies how the rows of the table are organized.

 

Special Considerations for Crosstabs

Data Requirements

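The crosstabs procedure can use numeric or string variables defined as nominal, ordinal, or scale. However, crosstabs should only be used when each variable has a limited number of categories. A variable with many distinct values produces a large table with many small or empty cells.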

 

Describing a Crosstab

The dimensions of the crosstab refer to the number of rows and columns in the table. The table dimensions are reported as RxC, where R is the number of categories for the row variable and C is the number of categories for the column variable.

Additionally, a "square" crosstab is one in which the row and column variables have an equal number of categories. Tables of dimensions 2x2, 3x3, 4x4, etc. are all square crosstabs.

 

Example 1

  • Row variable: Class Rank (4 categories: freshman, sophomore, junior, senior)
  • Column variable: Gender (2 categories: male, female)
  • Table dimension: 4x2

 

Example 2

  • Row variable: Gender (2 categories: male, female)
  • Column variable: Smoking (3 categories: never smoked, past smoker, current smoker)
  • Table dimension: 2x3

 

Example 3

  • Row variable: Gender (2 categories: male, female)
  • Column variable: Alcohol (2 categories: no, yes)
  • Table dimension: 2x2 (square)
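
The three examples above can be checked with a small helper. The function `describe_dimensions` is hypothetical and is not an SPSS feature. R and C are simply the numbers of categories of the row and column variables, and a table is square when they are equal.

```python
def describe_dimensions(row_categories, col_categories):
    """Return the crosstab dimension as an "RxC" string and whether it is square."""
    r = len(row_categories)
    c = len(col_categories)
    return f"{r}x{c}", r == c


class_rank = ["freshman", "sophomore", "junior", "senior"]
gender = ["male", "female"]
smoking = ["never smoked", "past smoker", "current smoker"]
alcohol = ["no", "yes"]

print(describe_dimensions(class_rank, gender))  # ('4x2', False)  Example 1
print(describe_dimensions(gender, smoking))     # ('2x3', False)  Example 2
print(describe_dimensions(gender, alcohol))     # ('2x2', True)   Example 3
```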

 

Understanding Row, Column, and Total Percents

Construction of a crosstab

A typical 2x2 crosstab has the following construction:

                Column 1  Column 2  Row totals
Row 1           a         b         a + b
Row 2           c         d         c + d
Column totals   a + c     b + d     a + b + c + d

 

The letters a, b, c, and d represent what are called cell counts.

  • a is the number of observations corresponding to Row 1 AND Column 1.
  • b is the number of observations corresponding to Row 1 AND Column 2.
  • c is the number of observations corresponding to Row 2 AND Column 1.
  • d is the number of observations corresponding to Row 2 AND Column 2.

By adding a, b, c, and d, we can determine the total number of observations in each category, and in the table overall.

  • Row sum of row 1 (i.e., total number of observations in Row 1): a + b
  • Row sum of row 2 (i.e., total number of observations in Row 2): c + d
  • Column sum of column 1 (i.e., total number of observations in Column 1): a + c
  • Column sum of column 2 (i.e., total number of observations in Column 2): b + d
  • Total sum (i.e., total number of observations in the table): n = a + b + c + d

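Note that if you were to make frequency tables for your row variable and your column variable, their counts should match the row and column totals of the crosstab. This holds as long as no cases are dropped from the crosstab because of a missing value on the other variable.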
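
The row and column totals are just sums over the cell counts. The minimal Python sketch below uses the cell counts reported later in "Example 1: Simple crosstab" (13, 33, 13, 32). Placing males in Row 1 and "no" in Column 1 is an assumption about the table layout, and the function name `marginal_totals` is hypothetical.

```python
def marginal_totals(table):
    """Row totals, column totals, and grand total n for a table of cell counts.

    table is a list of rows, each a list of counts (all rows the same length).
    """
    row_totals = [sum(row) for row in table]                 # a + b, c + d
    col_totals = [sum(column) for column in zip(*table)]     # a + c, b + d
    n = sum(row_totals)                                      # a + b + c + d
    return row_totals, col_totals, n


# [[a, b], [c, d]] with an assumed layout:
# Row 1 = male, Row 2 = female, Column 1 = no, Column 2 = yes.
counts = [[13, 33],
          [13, 32]]
row_totals, col_totals, n = marginal_totals(counts)
print(row_totals, col_totals, n)  # [46, 45] [26, 65] 91
```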

When you are describing the composition of your sample, it is often useful to refer to the proportion of the row or column that fell within a particular category. This can be achieved by computing the row percentages or column percentages.

Formulas for computing row percentages

                Column 1                   Column 2                   Row totals
Row 1           a                          b                          a + b
Row 1 %         a / (a + b)                b / (a + b)                (a + b) / (a + b) = 100%
Row 2           c                          d                          c + d
Row 2 %         c / (c + d)                d / (c + d)                (c + d) / (c + d) = 100%
Column totals   a + c                      b + d                      a + b + c + d
% of total      (a + c) / (a + b + c + d)  (b + d) / (a + b + c + d)  (a + b + c + d) / (a + b + c + d) = 100%

 

Notice that when computing row percentages, the denominators for cells a, b, c, d are determined by the row sums (here, a + b and c + d). This implies that the percentages in the "row totals" column must equal 100%.
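
Here is a short Python sketch of the row-percentage rule. It assumes the table is given as a list of rows of counts and that no row total is zero. The counts are those from "Example 1: Simple crosstab" in an assumed layout (Row 1 = male, Row 2 = female, Column 1 = no, Column 2 = yes), and the function name is hypothetical.

```python
def row_percentages(table):
    """Divide each cell by its row total and multiply by 100.

    Assumes every row total is nonzero.
    """
    result = []
    for row in table:
        total = sum(row)  # row total, e.g. a + b for Row 1
        result.append([100 * cell / total for cell in row])
    return result


counts = [[13, 33],
          [13, 32]]
pcts = row_percentages(counts)
print([[round(p, 1) for p in row] for row in pcts])  # [[28.3, 71.7], [28.9, 71.1]]
```

Each inner list sums to 100, which matches the "row totals" column of the formula table.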

 

Formulas for computing column percentages

                Column 1                   Column 2                   Row totals
Row 1           a                          b                          a + b
Column %        a / (a + c)                b / (b + d)                (a + b) / (a + b + c + d)
Row 2           c                          d                          c + d
Column %        c / (a + c)                d / (b + d)                (c + d) / (a + b + c + d)
Column totals   a + c                      b + d                      a + b + c + d
Column %        (a + c) / (a + c) = 100%   (b + d) / (b + d) = 100%   (a + b + c + d) / (a + b + c + d) = 100%

 

Notice that when computing column percentages, the denominators for cells a, b, c, d are determined by the column sums (here, a + c and b + d). This implies that the percentages in the "column totals" row must equal 100%.
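
The column-percentage rule can be sketched the same way, assuming no column total is zero. The counts are again those from "Example 1: Simple crosstab" in the assumed layout (Row 1 = male, Row 2 = female, Column 1 = no, Column 2 = yes). The function name is hypothetical.

```python
def column_percentages(table):
    """Divide each cell by its column total and multiply by 100.

    Assumes every column total is nonzero.
    """
    col_totals = [sum(column) for column in zip(*table)]  # e.g. a + c, b + d
    return [[100 * cell / total for cell, total in zip(row, col_totals)]
            for row in table]


counts = [[13, 33],
          [13, 32]]
pcts = column_percentages(counts)
print([[round(p, 1) for p in row] for row in pcts])  # [[50.0, 50.8], [50.0, 49.2]]
```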

 

Formulas for computing total percentages

                Column 1                   Column 2                   Row totals
Row 1           a                          b                          a + b
% of total      a / (a + b + c + d)        b / (a + b + c + d)        (a + b) / (a + b + c + d)
Row 2           c                          d                          c + d
% of total      c / (a + b + c + d)        d / (a + b + c + d)        (c + d) / (a + b + c + d)
Column totals   a + c                      b + d                      a + b + c + d
% of total      (a + c) / (a + b + c + d)  (b + d) / (a + b + c + d)  (a + b + c + d) / (a + b + c + d) = 100%

 

Notice that when total percentages are computed, the denominators for all of the computations are equal to the total number of observations in the table, i.e. a + b + c + d.
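
Below is a hypothetical Python helper for total percentages. It works for any RxC table. It is shown on made-up 2x3 counts that sum to 100, so each percentage equals its count and the result can be checked by eye.

```python
def total_percentages(table):
    """Divide each cell by the grand total n and multiply by 100."""
    n = sum(sum(row) for row in table)  # a + b + c + d for a 2x2 table
    return [[100 * cell / n for cell in row] for row in table]


# Made-up 2x3 counts (n = 100), not from the tutorial's sample data.
counts = [[10, 20, 10],
          [5, 25, 30]]
pcts = total_percentages(counts)
print(pcts)  # [[10.0, 20.0, 10.0], [5.0, 25.0, 30.0]]
```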

 

Connection to conditional probabilities

Recall the notation for conditional probability of event A, given B: P(A | B).

  • Row percentages can be thought of as the conditional probabilities of the column, given the row.
  • Column percentages can be thought of as the conditional probabilities of the row, given the column.

(Tip: The vertical bar ( | ) is equivalent to the word "given".)

Conditional probability notation (row percentages)

                Column 1                   Column 2                   Row totals
Row 1           a                          b                          a + b
Probability     P(Col 1 | Row 1)           P(Col 2 | Row 1)
Row 2           c                          d                          c + d
Probability     P(Col 1 | Row 2)           P(Col 2 | Row 2)
Column totals   a + c                      b + d                      a + b + c + d

 

 

Conditional probability notation (column percentages)

                Column 1                   Column 2                   Row totals
Row 1           a                          b                          a + b
Probability     P(Row 1 | Col 1)           P(Row 1 | Col 2)
Row 2           c                          d                          c + d
Probability     P(Row 2 | Col 1)           P(Row 2 | Col 2)
Column totals   a + c                      b + d                      a + b + c + d

 

Connection to intersection probabilities

Recall the notation for probability of intersection between events A and B: P(A and B).

  • Total percentages can be thought of as the probabilities of a row category and a column category occurring together (the intersection of the row and the column).

Intersection probabilities (total percentages)

                Column 1                   Column 2                   Row totals
Row 1           a                          b                          a + b
Probability     P(Row 1 AND Col 1)         P(Row 1 AND Col 2)
Row 2           c                          d                          c + d
Probability     P(Row 2 AND Col 1)         P(Row 2 AND Col 2)
Column totals   a + c                      b + d                      a + b + c + d
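
The conditional and intersection probabilities are linked by the definition P(A | B) = P(A and B) / P(B). Dividing a total percentage by the matching row-total percentage gives a row percentage. Dividing it by the matching column-total percentage gives a column percentage. The sketch below checks this with exact fractions in Python. It uses the cell counts from "Example 1: Simple crosstab" in an assumed layout (Row 1 = male, Row 2 = female, Column 1 = no, Column 2 = yes).

```python
from fractions import Fraction

# Example 1 cell counts in the assumed layout described above.
a, b, c, d = 13, 33, 13, 32
n = a + b + c + d

p_row1_and_col1 = Fraction(a, n)   # intersection: total percentage of cell a
p_row1 = Fraction(a + b, n)        # marginal probability of Row 1
p_col1 = Fraction(a + c, n)        # marginal probability of Column 1

# Conditional probability = intersection / probability of the condition.
p_col1_given_row1 = p_row1_and_col1 / p_row1   # equals the row percentage a / (a + b)
p_row1_given_col1 = p_row1_and_col1 / p_col1   # equals the column percentage a / (a + c)

print(p_col1_given_row1, p_row1_given_col1)  # 13/46 1/2
```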

 

Example 1: Simple crosstab

Using the sample data, let's make a crosstab of the variables Gender and Alcohol. Let the row variable be Gender, and the column variable be Alcohol.

  1. Open the Crosstabs window (Analyze > Descriptive Statistics > Crosstabs).
  2. Select Gender as the row variable, and Alcohol as the column variable.
  3. Click OK.

The Case Processing Summary tells us what proportion of the observations had nonmissing values for both Gender and Alcohol. In this sample, there were 9 cases that had missing values for Gender, Alcohol, or for both Gender and Alcohol.
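
The counting rule behind the Case Processing Summary can be sketched in Python: a case counts toward the crosstab only if both variables are nonmissing. Representing missing values with `None`, the function name `case_processing_summary` and the five-case data set are all assumptions of this illustration.

```python
def case_processing_summary(rows, cols):
    """Count valid cases (nonmissing on both variables) and missing cases.

    Missing values are represented by None in this sketch.
    """
    if len(rows) != len(cols):
        raise ValueError("both variables must have one value per case")
    valid = sum(1 for r, c in zip(rows, cols) if r is not None and c is not None)
    total = len(rows)
    return {"valid": valid, "missing": total - valid, "total": total}


# Hypothetical data: the second case is missing Alcohol,
# and the fourth case is missing both variables.
gender = ["male", "female", "female", None, "male"]
alcohol = ["yes", None, "no", None, "no"]
print(case_processing_summary(gender, alcohol))  # {'valid': 3, 'missing': 2, 'total': 5}
```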

The second table (here, Gender * Drinks alcohol? Crosstabulation) contains the crosstab. We can quickly observe how these two variables relate:

  • Thirteen (13) males and thirteen (13) females did not drink alcohol
  • Thirty-three (33) males did drink alcohol
  • Thirty-two (32) females did drink alcohol

Note that, for the 91 cases with nonmissing values on both variables, the crosstab also shows the same information that the frequency tables of Gender alone and Alcohol alone would give for those cases:

  • These 91 cases included 46 males and 45 females
  • There were 26 individuals who did not drink alcohol, and 65 individuals who did drink alcohol
 

Example 2: Row, column, and total percentages

Let's build on the table shown in Example 1 by adding row, column, and total percentages.

Row, column, and total percentage settings can be accessed from the Crosstabs window by clicking Cells and selecting the Row percent or Column percent or Total percent check box. Here, we will look at the row, column, and total percents individually.


Row percents

If the row variable is Gender and the column variable is Alcohol, then the row percentages will tell us what percentage of the males or what percentage of the females drink alcohol. That is, variable Gender will determine the denominator of the percentage computations.

  • The percentage of males who do not drink is 28.3%, or 13/46. P(No | Male)
  • The percentage of females who do not drink is 28.9%, or 13/45. P(No | Female)
  • The percentage of males who drink alcohol is 71.7%, or 33/46. P(Yes | Male)
  • The percentage of females who drink alcohol is 71.1%, or 32/45. P(Yes | Female)

Column percents

If the row variable is Gender and the column variable is Alcohol, then the column percentages will tell us what percentage of the individuals who do or do not drink are male or female. That is, variable Alcohol will determine the denominator of the percentage computations.

  • The percentage of individuals who drink that are male is 50.8%, or 33/65. P(Male | Yes)
  • The percentage of individuals who drink that are female is 49.2%, or 32/65. P(Female | Yes)
  • The percentage of individuals who do not drink that are male is 50.0%, or 13/26. P(Male | No)
  • The percentage of individuals who do not drink that are female is 50.0%, or 13/26. P(Female | No)

Total percents

If the row variable is Gender and the column variable is Alcohol, then the total percentage tells us what proportion of the total is within each combination of Gender and Alcohol. That is, the overall table size determines the denominator of the percentage computations.

  • The percentage of the sample that is male and does not drink alcohol is 14.3%, or 13/91. P(Male AND No)
  • The percentage of the sample that is female and does not drink alcohol is 14.3%, or 13/91. P(Female AND No)
  • The percentage of the sample that is male and drinks alcohol is 36.3%, or 33/91. P(Male AND Yes)
  • The percentage of the sample that is female and drinks alcohol is 35.2%, or 32/91. P(Female AND Yes)
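
The four total percentages above are rounded to one decimal place, so they add up to 100.1% rather than exactly 100%. A small Python check makes the rounding effect visible. The dictionary keys are hypothetical labels for the four cells.

```python
# Example 1 cell counts, keyed by hypothetical cell labels.
counts = {"male, no": 13, "female, no": 13, "male, yes": 33, "female, yes": 32}
n = sum(counts.values())  # 91

exact = {cell: 100 * k / n for cell, k in counts.items()}   # unrounded total percentages
rounded = {cell: round(p, 1) for cell, p in exact.items()}  # as displayed, one decimal

print(rounded)
print(round(sum(exact.values()), 6))    # 100.0
print(round(sum(rounded.values()), 1))  # 100.1
```

Small discrepancies like this in a percentage column are therefore expected and do not indicate an error in the table.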
 

Example 3: Crosstab with a layer variable

Let's modify our analysis slightly by looking at the differences between men and women with respect to alcohol use and binge drinking. Here, we will be working with three categorical variables: Gender, Alcohol, and Binge.

In this example, we want to create a crosstab of Alcohol by Binge, with the variable Gender acting as the layer (grouping) variable.

  1. Open the crosstabs dialog (Analyze > Descriptive Statistics > Crosstabs).
  2. Select Alcohol as the row variable, and Binge as the column variable.
  3. Select Gender as the layer variable.
  4. Click OK.

This will produce a table that looks like the following:

From this table, we can see that the males' and females' behaviors with respect to alcohol use and binge drinking were nearly identical.
