## Pearson Correlation

The bivariate Pearson Correlation produces a sample correlation coefficient, *r*, which measures the strength and direction of linear relationships between pairs of continuous variables. By extension, the Pearson Correlation evaluates whether there is statistical evidence for a linear relationship among the same pairs of variables in the population, represented by a population correlation coefficient, ρ (“rho”). The Pearson Correlation is a parametric measure.

This measure is also known as:

- Pearson’s correlation
- Pearson product-moment correlation (PPMC)

## Common Uses

The bivariate Pearson Correlation is commonly used to measure the following:

- Correlations among pairs of variables
- Correlations within and between sets of variables

The bivariate Pearson correlation indicates the following:

- Whether a statistically significant linear relationship exists between two continuous variables
- The strength of a linear relationship (i.e., how close the relationship is to being a perfectly straight line)
- The direction of a linear relationship (increasing or decreasing)

**Note:** The bivariate Pearson Correlation cannot address non-linear relationships or relationships among categorical variables. If you wish to understand relationships that involve categorical variables and/or non-linear relationships, you will need to choose another measure of association.
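To illustrate why non-linear relationships are out of scope, the following minimal Python sketch (outside SPSS) computes *r* for made-up data in which one variable is completely determined by the other, but along a curve. The helper `pearson_r` and the data are assumptions introduced here for illustration only.

```python
import math

def pearson_r(x, y):
    """Sample Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-products of deviations and sums of squared deviations
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical data: y depends entirely on x, but the relationship is a parabola
x = [-3, -2, -1, 0, 1, 2, 3]
y = [value ** 2 for value in x]

r_curved = pearson_r(x, y)
print(r_curved)  # essentially zero, despite the perfect (non-linear) dependence
```

A coefficient near zero therefore means "no linear relationship," not "no relationship."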

**Note:** The bivariate Pearson Correlation only reveals *associations* among continuous variables. The bivariate Pearson Correlation does not provide any inferences about causation, no matter how large the correlation coefficient is.

## Data Requirements

Your data must meet the following requirements:

- Two or more continuous variables (i.e., interval or ratio level)
- Cases that have values on both variables
- Linear relationship between the variables
- Independent cases (i.e., independence of observations)
    - There is no relationship between the values of variables between cases. This means that:
        - the values for all variables across cases are unrelated
        - for any case, the value for any variable cannot influence the value of any variable for other cases
        - no case can influence another case on any variable
    - The bivariate Pearson correlation coefficient and corresponding significance test are not robust when independence is violated.
- Bivariate normality
    - Each pair of variables is bivariately normally distributed
    - Each pair of variables is bivariately normally distributed at all levels of the other variable(s)
    - This assumption ensures that the variables are linearly related; violations of this assumption may indicate that non-linear relationships among variables exist. Linearity can be assessed visually using a scatterplot of the data.
- Random sample of data from the population
- No outliers

## Hypotheses

The hypotheses can be expressed in the following ways, depending on whether a one-tailed or two-tailed (SPSS default) hypothesis test is selected:

*Two-tailed significance test:*

- Null hypothesis, *H*₀: ρ = 0 (the population correlation coefficient is 0; there is no association)
- Alternative hypothesis, *H*₁: ρ ≠ 0 (the population correlation coefficient is not 0)

*One-tailed significance test:*

- Null hypothesis, *H*₀: ρ = 0 (the population correlation coefficient is 0; there is no association)
- Alternative hypothesis, *H*₁: ρ > 0 (the population correlation coefficient is greater than 0)

OR

- Alternative hypothesis, *H*₁: ρ < 0 (the population correlation coefficient is less than 0)

where ρ is the population correlation coefficient.
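The text above does not state the formula behind the significance test. For reference, the test of *H*₀: ρ = 0 is conventionally based on a *t* statistic computed from the sample coefficient *r*. The symbol *n* (the number of cases with values on both variables) is notation introduced here, not taken from the original text.

$$
t = \frac{r\sqrt{n-2}}{\sqrt{1-r^{2}}}, \qquad df = n - 2
$$

Under the null hypothesis and the data requirements listed earlier, this statistic follows a *t* distribution with *n* − 2 degrees of freedom.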

## Test Statistic

The sample correlation coefficient for a bivariate Pearson Correlation is denoted as *r*, and may also be known as:

- Pearson’s *r*
- Pearson product-moment correlation coefficient (PPMCC)
- Pearson correlation coefficient (PCC)
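For reference, the standard definition of the sample coefficient is given below. The notation is an assumption introduced here: *x* and *y* are the two variables, *n* is the number of cases with values on both, and the bars denote sample means.

$$
r_{xy} = \frac{\sum_{i=1}^{n}(x_i-\bar{x})(y_i-\bar{y})}{\sqrt{\sum_{i=1}^{n}(x_i-\bar{x})^{2}}\,\sqrt{\sum_{i=1}^{n}(y_i-\bar{y})^{2}}}
$$

The numerator is the sum of cross-product deviations; the denominator rescales it so that *r* does not depend on the units of either variable.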

The sign of the coefficient represents the direction of the relationship between the variables. The coefficient ranges from -1 to +1:

- -1 (perfectly negative linear relationship)
- 0 (no linear relationship)
- +1 (perfectly positive linear relationship)
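The two endpoints can be checked with a small Python sketch (outside SPSS). The helper `pearson_r` and the data are hypothetical; the points in each case lie exactly on a straight line.

```python
import math

def pearson_r(x, y):
    """Sample Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

x = [1, 2, 3, 4, 5]

# Points on a rising line: the coefficient is +1 regardless of the slope's size
r_increasing = pearson_r(x, [2 * v + 1 for v in x])

# Points on a falling line: the coefficient is -1
r_decreasing = pearson_r(x, [10 - 3 * v for v in x])

print(r_increasing, r_decreasing)
```

Note that the steepness of the line does not affect *r*; only how tightly the points follow a line, and in which direction, matters.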

The strength of the correlation, or effect size, is the magnitude of the association. It can be assessed using these general guidelines¹ (which may vary by discipline):

- .1 < |*r*| < .3 … small / weak correlation
- .3 < |*r*| < .5 … medium / moderate correlation
- .5 < |*r*| ……… large / strong correlation
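These guidelines can be expressed as a small Python helper. This is only a sketch: the function name `describe_strength` is hypothetical, and two choices are assumptions made here because the guidelines leave them open. Values exactly at a cutoff (.1, .3, .5) are assigned to the higher category, and values below .1 get their own label.

```python
def describe_strength(r):
    """Label the strength of a correlation using Cohen's general guidelines."""
    size = abs(r)  # strength ignores the sign (direction) of the coefficient
    if size >= 0.5:
        return "large / strong"
    if size >= 0.3:
        return "medium / moderate"
    if size >= 0.1:
        return "small / weak"
    return "below the small threshold"  # assumption: not labeled in the guidelines

for value in (0.80, -0.80, 0.35, -0.12):
    print(value, describe_strength(value))
```

Because the function works on the absolute value, *r* = +0.80 and *r* = -0.80 receive the same label.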

**Note:** The direction and strength of a correlation are two distinct properties. The scatterplots below show correlations that are *r* = +0.80 and *r* = -0.80, respectively. The strength of both correlations is the same: 0.80. But the direction of the correlations is different: a positive correlation (left) and a negative correlation (right).

*Figure: scatterplots of a positive correlation (left) and a negative correlation (right).*

¹ Cohen, J. (1988). *Statistical power analysis for the behavioral sciences* (2nd ed.). Hillsdale, NJ: Lawrence Erlbaum Associates.

## Data Set-Up

Your data should include two continuous variables, each defined as scale, which will be used in the analysis.

## Run a bivariate Pearson Correlation

To run a bivariate Pearson Correlation in SPSS, click **Analyze** > **Correlate** > **Bivariate.**

The Bivariate Correlations window opens, where you will specify the variables to be used in the analysis. All of the variables in your dataset appear in the list on the left side. To select variables for the analysis, select the variables in the list on the left and click the blue arrow button to move them to the right, in the **Variables** field.

**A.** **Variables:** The variables to be used in the bivariate Pearson Correlation. You must select at least two continuous variables, but may select more than two. The test will produce correlation coefficients for each pairing of variables in your list.

**B.** **Correlation Coefficients:** There are multiple types of correlation coefficients. By default, **Pearson** is selected. Selecting Pearson will produce the test statistics for a bivariate Pearson Correlation.

**C.** **Test of Significance:** Click **Two-tailed** or **One-tailed**, depending on your desired significance test. If you do not hypothesize a specific directional relationship (i.e., negative or positive) between your variables, you will want to select a two-tailed significance test.

**D.** **Flag significant correlations:** Checking this option will place asterisks next to statistically significant correlations in the output. By default, SPSS marks statistical significance at the alpha = 0.05 and alpha = 0.01 levels, but not at the alpha = 0.001 level (which is treated as alpha = 0.01).
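The flagging rule can be sketched in Python as follows. The function name `significance_flag` is hypothetical, and the use of strict inequalities at the cutoffs is an assumption; this is an illustration of the rule as described, not SPSS's own code.

```python
def significance_flag(p_value):
    """Return the asterisk flag for a p-value under the rule described above."""
    if p_value < 0.01:
        return "**"  # covers p < .001 as well: there is no separate three-asterisk level
    if p_value < 0.05:
        return "*"
    return ""        # not flagged

for p in (0.0005, 0.03, 0.20):
    print(p, repr(significance_flag(p)))
```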

**E.** **Options:** Clicking **Options** will open a window where you can specify which **Statistics** to include (i.e., **Means and standard deviations**, **Cross-product deviations and covariances**) and how to address **Missing Values** (i.e., **Exclude cases pairwise** or **Exclude cases listwise**). Click **Continue** when you are finished making specifications.
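The difference between the two missing-value options matters only when more than two variables are selected. The Python sketch below (outside SPSS) uses made-up records and hypothetical helper names to show the idea. Pairwise exclusion uses every case that is complete on the two variables being correlated; listwise exclusion uses only cases that are complete on all selected variables.

```python
import math

def pearson_r(x, y):
    """Sample Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical records; None marks a missing value on variable "c"
rows = [
    {"a": 1.0, "b": 2.0, "c": 5.0},
    {"a": 2.0, "b": 1.0, "c": None},
    {"a": 3.0, "b": 4.0, "c": 4.0},
    {"a": 4.0, "b": 3.0, "c": 2.0},
    {"a": 5.0, "b": 5.0, "c": 1.0},
]

def complete_cases(records, names):
    """Keep only the records with no missing value on the named variables."""
    return [rec for rec in records if all(rec[name] is not None for name in names)]

def correlate(records, first, second):
    """Return (r, number of cases used) for two variables."""
    r = pearson_r([rec[first] for rec in records], [rec[second] for rec in records])
    return r, len(records)

# Pairwise: only "a" and "b" need to be present for the a-b correlation
pairwise_r, pairwise_n = correlate(complete_cases(rows, ["a", "b"]), "a", "b")

# Listwise: a case missing "c" is dropped even from the a-b correlation
listwise_r, listwise_n = correlate(complete_cases(rows, ["a", "b", "c"]), "a", "b")

print(pairwise_n, listwise_n)  # 5 cases versus 4 cases
print(pairwise_r, listwise_r)  # the coefficients differ as a result
```

Pairwise exclusion keeps more data per coefficient, but different cells of the correlation table may then be based on different sets of cases.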

Click **OK** to run the bivariate Pearson Correlation.

## Example: Understanding the linear association between weight and height

Perhaps you would like to test whether there is a statistically significant linear relationship between two continuous variables, weight and height (and by extension, infer whether the association is significant in the population). You can use a bivariate Pearson Correlation to test whether there is a statistically significant linear relationship between height and weight, and to determine the strength and direction of the association.

In the sample data, we will use two variables: “Height” and “Weight.” The variable “Height” is a continuous measure of height in inches and exhibits a range of values from 58.44 to 80.45 (**Analyze** > **Descriptive Statistics** > **Descriptives**). The variable “Weight” is a continuous measure of weight in pounds and exhibits a range of values from 108.90 to 226.20. In SPSS, the data look like this:

To run the bivariate Pearson Correlation, click **Analyze** > **Correlate** > **Bivariate**. Select the variables Height and Weight and move them to the right.

In the **Correlation Coefficients** area, select **Pearson**. In the **Test of Significance** area, select your desired significance test, two-tailed or one-tailed. We will select a two-tailed significance test in this example. Check the box next to **Flag significant correlations**.

Click **OK** to run the bivariate Pearson Correlation. Output for the analysis will display in the Output Viewer.

The results will display the correlations in a table, labeled **Correlations**.

Since the results are displayed in a matrix, each result is repeated. Cells A and D indicate perfect correlations (“1”) because these cells reflect the correlation of a variable with itself. Only cells B and C are of interest since they reflect the relationship between “Height” and “Weight.” Cells B and C will display identical results since they include information for the same pairing of variables.
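The structure of the matrix can be reproduced with a short Python sketch (outside SPSS). The measurements below are made up for illustration and are not the sample data used in this example; `pearson_r` is a hypothetical helper.

```python
import math

def pearson_r(x, y):
    """Sample Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical measurements, NOT the sample data from the example
data = {
    "Height": [60.0, 64.0, 68.0, 70.0, 74.0],
    "Weight": [115.0, 140.0, 135.0, 170.0, 180.0],
}

names = list(data)
# Every variable is correlated with every variable, including itself
matrix = {row: {col: pearson_r(data[row], data[col]) for col in names} for row in names}

for row in names:
    print(row, [round(matrix[row][col], 3) for col in names])
```

The diagonal entries (cells A and D) are 1 because each variable is perfectly correlated with itself, and the two off-diagonal entries (cells B and C) are equal because the coefficient does not depend on the order of the variables.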

If you have opted to flag significant correlations, SPSS will mark a 0.05 significance level with one asterisk (*) and a 0.01 significance level with two asterisks (**). In cell B (repeated in cell C), we can see that the Pearson correlation coefficient for height and weight is .479, which is significant (*p* < .001 for a two-tailed test), based on a sample of 93 cases.

Based on the results, we can state the following:

- Weight and height have a statistically significant linear relationship (*p* < .001).
- The direction of the relationship is positive (i.e., height and weight are positively correlated), meaning that these variables tend to increase together (i.e., greater height is associated with greater weight).
- The magnitude, or strength, of the association is moderate (.3 < |*r*| < .5).
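As a rough cross-check of the reported significance, the conventional *t* statistic for testing ρ = 0 can be computed from the two numbers in the output, *r* = .479 and a sample of 93 cases. The formula *t* = *r*·√(*n* − 2) / √(1 − *r*²) with *n* − 2 degrees of freedom is the standard one, but it is not stated in the SPSS output; the function name below is hypothetical.

```python
import math

def t_statistic(r, n):
    """t statistic for testing H0: rho = 0; it has n - 2 degrees of freedom."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)

r, n = 0.479, 93  # values reported in the example output
t = t_statistic(r, n)
df = n - 2

print(round(t, 2), df)  # t is a little above 5, with 91 degrees of freedom
```

A *t* value of about 5.2 on 91 degrees of freedom lies well beyond the two-tailed critical value at alpha = .001 (roughly 3.4), which is consistent with the reported *p* < .001.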

## Syntax

The corresponding syntax for the bivariate Pearson Correlation example is:

    CORRELATIONS
      /VARIABLES=Weight Height
      /PRINT=TWOTAIL NOSIG
      /MISSING=PAIRWISE.

Running this syntax in the Syntax Editor will produce the same output.