Our tutorials reference a dataset called "sample" in many examples. If you'd like to download the sample dataset to work through the examples, choose one of the files below:
The Compare Means procedure is useful when you want to summarize and compare differences in descriptive statistics across one or more factors, or categorical variables.
To open the Compare Means procedure, click Analyze > Compare Means > Means.
A Dependent List: The continuous numeric variables to be analyzed. You must enter at least one variable in this box before you can run the Compare Means procedure.
B Independent List: The categorical variable(s) that will be used to subset the dependent variables. Specifying multiple variables in the "Layer 1 of 1" box will produce several tables, each with one layer variable. You can specify several layers for a single table by clicking Next and then entering other categorical variables; this will produce a table that looks like a hybrid of a crosstab and the Descriptives procedure.
C Options: Opens the Means: Options window, where you can specify which summary statistics to produce and the order in which they are listed.
The Statistics column on the left shows which statistics are available: mean, number of cases, standard deviation, median, grouped median, standard error of mean, sum, minimum, maximum, range, first, last, variance, kurtosis, standard error of kurtosis, skewness, standard error of skewness, harmonic mean, geometric mean, percent of total sum, and percent of total N. The Cell Statistics column on the right lists the statistics that will be produced in the output. By default, the mean, number of cases, and standard deviation will be computed. You can add statistics by clicking and dragging them from the Statistics column to the Cell Statistics column. You can also click and drag items within the Cell Statistics column to change the order in which they appear in the output.
The Statistics for First Layer area includes two options. "Anova table and eta" performs a one-way ANOVA and reports Eta and Eta Squared; "Test for linearity" reports R and R Squared along with tests of the linear and nonlinear components.
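To make Eta Squared concrete, here is a minimal Python sketch of the usual definition: the between-group sum of squares divided by the total sum of squares. It is an illustration rather than SPSS's own implementation; the function name `eta_squared` and the mile times are made up for this example.

```python
from statistics import mean

def eta_squared(values, groups):
    """Between-group sum of squares divided by total sum of squares."""
    grand_mean = mean(values)
    ss_total = sum((v - grand_mean) ** 2 for v in values)

    # Collect the dependent values belonging to each factor level
    by_group = {}
    for v, g in zip(values, groups):
        by_group.setdefault(g, []).append(v)

    # Each group contributes n * (group mean - grand mean)^2
    ss_between = sum(
        len(vs) * (mean(vs) - grand_mean) ** 2 for vs in by_group.values()
    )
    return ss_between / ss_total

# Hypothetical mile times in minutes (not from the sample dataset)
times = [6.0, 7.0, 8.0, 9.0, 10.0, 11.0]
status = ["Athlete"] * 3 + ["Non-athlete"] * 3

eta_sq = eta_squared(times, status)
eta = eta_sq ** 0.5
print(f"Eta = {eta:.3f}, Eta Squared = {eta_sq:.3f}")
```

Eta Squared can be read as the proportion of variance in the dependent variable that is accounted for by differences between the groups.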
Running speed and ability are known to be correlated with both physical sex and a person's general level of athleticism.
In the sample dataset, there are several variables relating to this question:
Let's use the Compare Means procedure to summarize the relationship between running ability, athletics, and gender.
First, we will summarize the mile times without the grouping variables using the mean, standard deviation, sample size, minimum, and maximum.
MEANS TABLES=MileMinDur
/CELLS=MEAN COUNT STDDEV MIN MAX.
The Compare Means procedure will report two tables: the Case Processing Summary, which contains information about the number of valid cases that the statistics are based on, and the Report table, which contains the descriptive statistics themselves.
The average mile time overall was 8 minutes, 9 seconds, with a standard deviation of about 2 minutes. The fastest mile time was about 5 minutes; the slowest was about 14 minutes.
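For readers who want to see what these five cell statistics amount to, the following Python sketch computes them over a short list of made-up mile times. It assumes times are stored as decimal minutes and that missing values are represented by `None`; the helper names `summarize` and `as_min_sec` are hypothetical. The standard deviation uses the n - 1 denominator.

```python
from statistics import mean, stdev

def summarize(values):
    """Mean, count, standard deviation, minimum, and maximum of valid values."""
    valid = [v for v in values if v is not None]  # drop missing cases
    return {
        "mean": mean(valid),
        "n": len(valid),
        "std": stdev(valid),  # sample standard deviation (n - 1)
        "min": min(valid),
        "max": max(valid),
    }

def as_min_sec(minutes):
    """Convert decimal minutes to a (minutes, seconds) pair."""
    return divmod(round(minutes * 60), 60)

# Hypothetical mile times in decimal minutes; None marks a missing value
mile_times = [5.0, 7.5, None, 8.0, 9.5, 14.0]

stats = summarize(mile_times)
m, s = as_min_sec(stats["mean"])
print(f"N = {stats['n']}, mean = {m} min {s} s, SD = {stats['std']:.2f} min")
```

With this representation, a mean of 8.15 decimal minutes corresponds to 8 minutes, 9 seconds.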
Now let's look at how the mile times vary with respect to whether or not someone is an athlete.
Note that Compare Means with one layer produces results that are similar to using the Split File technique with the Descriptives procedure. The major difference between using Compare Means and viewing the Descriptives with Split File enabled is that Compare Means does not treat missing values as an additional category -- it simply drops those cases from the analysis. Compare Means is limited to listwise exclusion: there must be valid values on each of the dependent and independent variables for a given table.
If you are continuing the example from the first section, you will only need to do step 3.
MEANS TABLES=MileMinDur BY Athlete
/CELLS=MEAN COUNT STDDEV MIN MAX.
The Case Processing Summary table shows how many cases had nonmissing values for both the mile time and the athlete indicator variable. The Report table has the descriptive statistics with respect to each group.
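The logic behind a one-layer report can be sketched in Python: drop every case that is missing either the dependent variable or the factor (listwise exclusion), then summarize the dependent variable within each remaining group. This is only an illustration of the idea, with made-up data and hypothetical names (`means_by_group`, `time`, `athlete`), not a reproduction of SPSS output.

```python
from statistics import mean, stdev

def means_by_group(cases, dependent, factor):
    """Return the valid case count and per-group statistics."""
    # Listwise exclusion: a case needs valid values on both variables
    valid = [
        c for c in cases
        if c[dependent] is not None and c[factor] is not None
    ]

    groups = {}
    for c in valid:
        groups.setdefault(c[factor], []).append(c[dependent])

    report = {
        level: {
            "mean": mean(vals),
            "n": len(vals),
            "std": stdev(vals) if len(vals) > 1 else None,
            "min": min(vals),
            "max": max(vals),
        }
        for level, vals in groups.items()
    }
    return len(valid), report

# Hypothetical cases; None marks a missing value
cases = [
    {"time": 6.0, "athlete": "Athlete"},
    {"time": 7.0, "athlete": "Athlete"},
    {"time": 9.0, "athlete": "Non-athlete"},
    {"time": 11.0, "athlete": "Non-athlete"},
    {"time": None, "athlete": "Athlete"},  # missing mile time: excluded
    {"time": 8.0, "athlete": None},        # missing factor: excluded
]

n_valid, report = means_by_group(cases, "time", "athlete")
print(n_valid, report)
```

Note that the case with a missing factor value does not form its own group; it is simply left out, which mirrors the behavior described above.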
From this table, there are several observations we can make about the relationship between mile time and athletics in the sample:
Let's modify the one-layer analysis to report mile times by athlete status within each gender. Recall that there are two levels for Gender (Male and Female), and two levels for Athlete (Non-athlete and Athlete). This means that there are four possible factor level combinations: male non-athletes, female non-athletes, male athletes, and female athletes.
When we run Compare Means with two layers, we will be able to simultaneously view the averages with respect to each possible factor combination. As mentioned before, Compare Means is limited to listwise exclusion, so a two-layer analysis requires that cases not have missing values for the dependent variable and all independent variables.
If you are continuing the example from the previous section, you will only need to do step 4.
Note: Be careful that you put each factor on its own separate layer. It is easy to accidentally list two factor variables in the Independent List area for the first layer. (If more than one factor is listed on the first layer, it will produce multiple single-layer reports.) Your Independent List area should look like this:
MEANS TABLES=MileMinDur BY Athlete BY Gender
/CELLS=MEAN COUNT STDDEV MIN MAX.
The Case Processing Summary table shows how many cases had nonmissing values for mile time, the athlete indicator, and gender. The Report table has the descriptive statistics with respect to each combination of the factors. Notice that because of listwise exclusion, there are now only 383 valid cases, whereas the single-layer report of mile time by athlete included 392 cases.
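The drop in valid cases when a layer is added can be illustrated with a small Python sketch. It assumes missing values are stored as `None`; the function `layered_means` and the five cases are made up for this example. Each added factor is one more variable that must be nonmissing, so the valid count can only stay the same or shrink.

```python
from statistics import mean

def layered_means(cases, dependent, factors):
    """Return the valid case count and (mean, n) for each factor combination."""
    needed = [dependent, *factors]
    # Listwise exclusion across the dependent variable and every factor
    valid = [c for c in cases if all(c[k] is not None for k in needed)]

    cells = {}
    for c in valid:
        key = tuple(c[f] for f in factors)  # e.g. ("Athlete", "Male")
        cells.setdefault(key, []).append(c[dependent])

    return len(valid), {k: (mean(v), len(v)) for k, v in cells.items()}

# Hypothetical cases; the last one is missing gender
cases = [
    {"time": 6.0, "athlete": "Athlete", "gender": "Male"},
    {"time": 7.0, "athlete": "Athlete", "gender": "Female"},
    {"time": 9.0, "athlete": "Non-athlete", "gender": "Male"},
    {"time": 10.0, "athlete": "Non-athlete", "gender": "Female"},
    {"time": 8.0, "athlete": "Athlete", "gender": None},
]

n_one, one_layer = layered_means(cases, "time", ["athlete"])
n_two, two_layer = layered_means(cases, "time", ["athlete", "gender"])
print(n_one, n_two)  # the two-layer analysis has fewer valid cases
```

In this toy data, the case with missing gender contributes to the one-layer athlete mean but is excluded from every cell of the two-layer table, which is the same effect that reduces the tutorial's count from 392 to 383.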
Using this table, we can expand upon several observations we made from the single-layer table: