Our tutorials reference a dataset called "sample" in many examples. If you'd like to download the sample dataset to work through the examples, choose one of the files below:
The One-Way ANOVA ("analysis of variance") compares the means of two or more independent groups in order to determine whether there is statistical evidence that the associated population means are significantly different. One-Way ANOVA is a parametric test.
This test is also known as:
The variables used in this test are known as:
The One-Way ANOVA is often used to analyze data from the following types of studies:
The One-Way ANOVA is commonly used to test the following:
Note: Both the One-Way ANOVA and the Independent Samples t Test can compare the means for two groups. However, only the One-Way ANOVA can compare the means across three or more groups.
Note: If the grouping variable has only two groups, then the results of a one-way ANOVA and the independent samples t test will be equivalent. In fact, if you run both an independent samples t test and a one-way ANOVA in this situation, you should be able to confirm that t^{2}=F.
Your data must meet the following requirements:
Note: When the normality, homogeneity of variances, or outliers assumptions for One-Way ANOVA are not met, you may want to run the nonparametric Kruskal-Wallis test instead.
Researchers often follow several rules of thumb for one-way ANOVA:
The null and alternative hypotheses of one-way ANOVA can be expressed as:
H_{0}: µ_{1} = µ_{2} = µ_{3} = ... = µ_{k} ("all k population means are equal")
H_{1}: At least one µ_{i} different ("at least one of the k population means is not equal to the others")
where
Note: The One-Way ANOVA is considered an omnibus (Latin for “all”) test because the F test indicates whether the model is significant overall—i.e., whether or not there are any significant differences in the means between any of the groups. (Stated another way, this says that at least one of the means is different from the others.) However, it does not indicate which mean is different. Determining which specific pairs of means are significantly different requires either contrasts or post hoc (Latin for “after this”) tests.
The test statistic for a One-Way ANOVA is denoted as F. For an independent variable with k groups, the F statistic evaluates whether the group means are significantly different. Because the computation of the F statistic is slightly more involved than computing the paired or independent samples t test statistics, it's extremely common for all of the F statistic components to be depicted in a table like the following:
Sum of Squares | df | Mean Square | F | |
---|---|---|---|---|
Treatment | SSR | df_{r} | MSR | MSR/MSE |
Error | SSE | df_{e} | MSE | |
Total | SST | df_{T} |
where
SSR = the regression sum of squares
SSE = the error sum of squares
SST = the total sum of squares (SST = SSR + SSE)
df_{r} = the model degrees of freedom (equal to df_{r} = k - 1)
df_{e} = the error degrees of freedom (equal to df_{e} = n - k - 1)
k = the total number of groups (levels of the independent variable)
n = the total number of valid observations
df_{T} = the total degrees of freedom (equal to_{ }df_{T} = df_{r} + df_{e} = n - 1)
MSR = SSR/df_{r} = the regression mean square
MSE = SSE/df_{e} = the mean square error
Then the F statistic itself is computed as
$$ F = \frac{\mathrm{MSR}}{\mathrm{MSE}} $$
Note: In some texts you may see the notation df_{1} or ν_{1} for the regression degrees of freedom, and df_{2} or ν_{2} for the error degrees of freedom. The latter notation uses the Greek letter nu (ν) for the degrees of freedom.
Some texts may use "SSTr" (Tr = "treatment") instead of SSR (R = "regression"), and may use SSTo (To = "total") instead of SST.
The terms Treatment (or Model) and Error are the terms most commonly used in natural sciences and in traditional experimental design texts. In the social sciences, it is more common to see the terms Between groups instead of "Treatment", and Within groups instead of "Error". The between/within terminology is what SPSS uses in the one-way ANOVA procedure.
Your data should include at least two variables (represented in columns) that will be used in the analysis. The independent variable should be categorical (nominal or ordinal) and include at least two groups, and the dependent variable should be continuous (i.e., interval or ratio). Each row of the dataset should represent a unique subject or experimental unit.
Note: SPSS restricts categorical indicators to numeric or short string values only.
The following steps reflect SPSS’s dedicated One-Way ANOVA procedure. However, since the One-Way ANOVA is also part of the General Linear Model (GLM) family of statistical tests, it can also be conducted via the Univariate GLM procedure (“univariate” refers to one dependent variable). This latter method may be beneficial if your analysis goes beyond the simple One-Way ANOVA and involves multiple independent variables, fixed and random factors, and/or weighting variables and covariates (e.g., One-Way ANCOVA). We proceed by explaining how to run a One-Way ANOVA using SPSS’s dedicated procedure.
To run a One-Way ANOVA in SPSS, click Analyze > Compare Means > One-Way ANOVA.
The One-Way ANOVA window opens, where you will specify the variables to be used in the analysis. All of the variables in your dataset appear in the list on the left side. Move variables to the right by selecting them in the list and clicking the blue arrow buttons. You can move a variable(s) to either of two areas: Dependent List or Factor.
A Dependent List: The dependent variable(s). This is the variable whose means will be compared between the samples (groups). You may run multiple means comparisons simultaneously by selecting more than one dependent variable.
B Factor: The independent variable. The categories (or groups) of the independent variable will define which samples will be compared. The independent variable must have at least two categories (groups), but usually has three or more groups when used in a One-Way ANOVA.
C Contrasts: (Optional) Specify contrasts, or planned comparisons, to be conducted after the overall ANOVA test.
When the initial F test indicates that significant differences exist between group means, contrasts are useful for determining which specific means are significantly different when you have specific hypotheses that you wish to test. Contrasts are decided before analyzing the data (i.e., a priori). Contrasts break down the variance into component parts. They may involve using weights, non-orthogonal comparisons, standard contrasts, and polynomial contrasts (trend analysis).
Many online and print resources detail the distinctions among these options and will help users select appropriate contrasts. Please see the IBM SPSS guide for detailed information on Contrasts by clicking the ? button at the bottom of the dialog box.
D Post Hoc: (Optional) Request post hoc (also known as multiple comparisons) tests. Specific post hoc tests can be selected by checking the associated boxes.
1 Equal Variances Assumed: Multiple comparisons options that assume homogeneity of variance (each group has equal variance). For detailed information about the specific comparison methods, click the Help button in this window.
2 Test: By default, a 2-sided hypothesis test is selected. Alternatively, a directional, one-sided hypothesis test can be specified if you choose to use a Dunnett post hoc test. Click the box next to Dunnett and then specify whether the Control Category is the Last or First group, numerically, of your grouping variable. In the Test area, click either < Control or > Control. The one-tailed options require that you specify whether you predict that the mean for the specified control group will be less than (> Control) or greater than (< Control) another group.
3 Equal Variances Not Assumed: Multiple comparisons options that do not assume equal variances. For detailed information about the specific comparison methods, click the Help button in this window.
4 Significance level: The desired cutoff for statistical significance. By default, significance is set to 0.05.
When the initial F test indicates that significant differences exist between group means, post hoc tests are useful for determining which specific means are significantly different when you do not have specific hypotheses that you wish to test. Post hoc tests compare each pair of means (like t-tests), but unlike t-tests, they correct the significance estimate to account for the multiple comparisons.
E Options: Clicking Options will produce a window where you can specify which Statistics to include in the output (Descriptive, Fixed and random effects, Homogeneity of variance test, Brown-Forsythe, Welch), whether to include a Means plot, and how the analysis will address Missing Values (i.e., Exclude cases analysis by analysis or Exclude cases listwise). Click Continue when you are finished making specifications.
Click OK to run the One-Way ANOVA.
To introduce one-way ANOVA, let's use an example with a relatively obvious conclusion. The goal here is to show the thought process behind a one-way ANOVA.
In the sample dataset, the variable Sprint is the respondent's time (in seconds) to sprint a given distance, and Smoking is an indicator about whether or not the respondent smokes (0 = Nonsmoker, 1 = Past smoker, 2 = Current smoker). Let's use ANOVA to test if there is a statistically significant difference in sprint time with respect to smoking status. Sprint time will serve as the dependent variable, and smoking status will act as the independent variable.
Just like we did with the paired t test and the independent samples t test, we'll want to look at descriptive statistics and graphs to get picture of the data before we run any inferential statistics.
The sprint times are a continuous measure of time to sprint a given distance in seconds. From the Descriptives procedure (Analyze > Descriptive Statistics > Descriptives), and exhibit a range of 4.5 to 9.6 seconds, with a mean of 6.6 seconds (based on n=374 valid cases). From the Compare Means procedure (Analyze > Compare Means > Means), we see these statistics with respect to the groups of interest:
N | Mean | Std. Deviation | |
---|---|---|---|
Nonsmoker | 261 | 6.411 | 1.252 |
Past smoker | 33 | 6.835 | 1.024 |
Current smoker | 59 | 7.121 | 1.084 |
Total | 353 | 6.569 | 1.234 |
Notice that, according to the Compare Means procedure, the valid sample size is actually n=353. This is because Compare Means (and additionally, the one-way ANOVA procedure itself) requires there to be nonmissing values for both the sprint time and the smoking indicator.
Lastly, we'll also want to look at a comparative boxplot to get an idea of the distribution of the data with respect to the groups:
From the boxplots, we see that there are no outliers; that the distributions are roughly symmetric; and that the center of the distributions don't appear to be hugely different. The median sprint time for the nonsmokers is slightly faster than the median sprint time of the past and current smokers.
Output for the analysis will display in the Output Viewer window.
ONEWAY Sprint BY Smoking
/PLOT MEANS
/MISSING ANALYSIS.
The output displays a table entitled ANOVA.
Sum of Squares | df | Mean Square | F | Sig. | |
---|---|---|---|---|---|
Between Groups | 26.788 | 2 | 13.394 | 9.209 | .000 |
Within Groups | 509.082 | 350 | 1.455 | ||
Total | 535.870 | 352 |
After any table output, the Means plot is displayed.
The Means plot is a visual representation of what we saw in the Compare Means output. The points on the chart are the average of each group. It's much easier to see from this graph that the current smokers had the slowest mean sprint time, while the nonsmokers had the fastest mean sprint time.
We conclude that the mean sprint time is significantly different for at least one of the smoking groups (F_{2, 350} = 9.209, p < 0.001). Note that the ANOVA alone does not tell us specifically which means were different from one another. To determine that, we would need to follow up with multiple comparisons (or post-hoc) tests.