Our tutorials reference a dataset called "sample" in many examples. If you'd like to download the sample dataset to work through the examples, choose one of the files below:
Mean centering refers to a type of variable transformation wherein the average of a variable is subtracted from every observation of that variable. The result of this transformation is a new variable that retains its original units, but whose mean value is 0. Mean centering is often used when fitting linear regression models because of how it affects the interpretation of the intercept term. With an uncentered predictor variable, the intercept of a linear regression represents the expected value of y when the x variable is zero. With a centered predictor variable, the intercept of that linear regression model represents the expected value of y when x is at its average.
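As a worked illustration of the definition above, the transformation and its effect on a simple linear regression can be written out. The symbols $x_i$, $\bar{x}$, $b_0$, and $b_1$ are notation introduced here for illustration and do not appear in the tutorial itself.

```latex
\begin{align*}
x_i^{c} &= x_i - \bar{x}, \qquad \bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i \\
\frac{1}{n}\sum_{i=1}^{n} x_i^{c} &= \bar{x} - \bar{x} = 0 \\
\hat{y} &= b_0 + b_1 x = (b_0 + b_1\bar{x}) + b_1 (x - \bar{x}) = b_0^{c} + b_1 x^{c}
\end{align*}
```

The slope $b_1$ is the same in both forms. Only the intercept changes: $b_0$ is the predicted value at $x = 0$, while $b_0^{c} = b_0 + b_1\bar{x}$ is the predicted value at $x = \bar{x}$.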
In this tutorial, we will show how to mean-center a variable in SPSS Statistics by combining the Aggregate procedure and the Compute Variables procedure. We recommend this approach because it is reproducible: the value of the grand mean used for centering is saved in your dataset, making it easier to audit. Additionally, the Aggregate + Compute Variables approach adapts well if your sample (and therefore the mean value you will use to center) changes due to new observations and/or new filtering criteria.
In SPSS, the Aggregate procedure can be used to compute new variables based on summary statistics, or to compute new variables by group. This process can be used to add variables to an existing dataset, or it can be used to create a new dataset containing only the compressed, aggregated information.
This may look like:
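In syntax, the two uses described above might be sketched as follows. This is only an illustration: Gender is a hypothetical grouping variable and agg_by_gender is a made-up dataset name, so substitute names from your own data.

```spss
/*Add each group's mean height to every row of the active dataset.*/
AGGREGATE
  /OUTFILE=* MODE=ADDVARIABLES
  /BREAK=Gender
  /Group_Mean_Height=MEAN(Height).

/*Alternatively, write one row per group to a separate dataset.*/
DATASET DECLARE agg_by_gender.
AGGREGATE
  /OUTFILE='agg_by_gender'
  /BREAK=Gender
  /Group_Mean_Height=MEAN(Height).
```

The first command keeps the original number of rows and repeats the group mean on each row. The second produces a compressed dataset with one row per value of the break variable.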
To launch the Aggregate Data procedure, click Data > Aggregate.
The Aggregate Data window opens.
A Break Variable: The variable(s) that will determine the grouping structure for the aggregation. The output of the Aggregate procedure will have summary statistics reported for each of the groups identified by this variable/these variables. These are typically categorical variables, or in the case of longitudinal data, could be ID variables.
B Aggregated Variables/Summaries of Variables: These represent the variables that will be summarized within each group. These will typically be numeric variables.
C Function: Options to change the function used to compute the new aggregate variables. This button will only be clickable if you have added one or more variables to the "Summaries of Variables" box and have one of those variables selected.
D Name & Label: Options to change the variable names and variable labels for the new variables being created. This button will only be clickable if you have added one or more variables to the "Summaries of Variables" box and have one of those variable formulas selected.
E Number of Cases/Name: Option to create an additional aggregate variable that counts the number of rows for a given combination of break variables. The name entered in the "Name" field will become the name of the new variable.
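F Save: How to structure the output of the Aggregate procedure. You must select one of three options: add the aggregated variables to the active dataset, create a new dataset containing only the aggregated variables, or write a new data file containing only the aggregated variables.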
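For readers who prefer syntax, the dialog elements above correspond roughly to pieces of the AGGREGATE command. The sketch below is illustrative only: Gender, the label text, and N_Cases are hypothetical names chosen for the example.

```spss
/*Group mean plus a row count per group, added to the active dataset.*/
AGGREGATE
  /OUTFILE=* MODE=ADDVARIABLES
  /BREAK=Gender
  /Mean_Height 'Mean height within group'=MEAN(Height)
  /N_Cases=N.
```

Here /BREAK plays the role of the Break Variable(s) box (A), MEAN(Height) is the summarized variable and its function (B and C), the name and quoted label before the equals sign correspond to Name & Label (D), N corresponds to Number of Cases (E), and /OUTFILE with MODE corresponds to the Save choice (F).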
Suppose we plan to fit a linear regression model to predict students' weight based on their height. We wish to center the independent variable, Height, about its mean so that the intercept of the model is easier to interpret.
Our starting dataset should contain at least one continuous numeric variable, and each row should represent an independent observation.
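If you do not have the sample dataset at hand, a small stand-in can be typed directly into a syntax window. The five rows below are invented values for illustration only, not records from the sample dataset, and toy_heights is a made-up dataset name. The variable names Height and Weight are assumed to match the ones used in this example.

```spss
/*Create a small illustrative dataset (values are invented).*/
DATA LIST FREE / Height Weight.
BEGIN DATA
66 150
70 180
64 130
72 190
68 160
END DATA.
DATASET NAME toy_heights.
```

With these values the mean of Height is 68, so the centered values would be -2, 2, -4, 4, and 0, which sum to zero.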
Add the variable you want to center (in this case, Height) to the Aggregated Variables/Summaries of Variables box.
While the variable is selected, the Function and Name & Label buttons will be clickable. Use Function to confirm that the summary statistic is the mean, and Name & Label to name the new variable Mean_Height.
Running the Aggregate procedure calculates the mean of the variable and adds it to the dataset as a new variable:
Now that we have a column containing the grand average, we can perform the mean centering by using the Compute Variables procedure:
Target Variable: Centered_Height
Numeric Expression: Height-Mean_Height
After this step, we will finally have our mean-centered variable:
/*Use the Aggregate procedure to compute the grand mean as a new variable and add it to the existing dataset.*/
AGGREGATE
/OUTFILE=* MODE=ADDVARIABLES
/Mean_Height=MEAN(Height).
/*Compute the mean-centered values.*/
COMPUTE Centered_Height = Height - Mean_Height.
EXECUTE.
/*Check results.*/
DESCRIPTIVES Height Centered_Height.
While not a requirement, we can check the descriptive statistics for the original variable and the mean-centered version of the variable using the Descriptives command.
The box below shows the mean, standard deviation, minimum, and maximum for the Centered_Height variable and the original, uncentered variable Height:
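Notice that the original variable has a mean of 68.03, which matches the value computed by the Aggregate procedure, and the centered variable has a mean of zero. The number of observations and the standard deviation are the same for both variables, because subtracting a constant shifts every value without changing the spread.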
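To connect this back to the problem statement, the regression could then be fit with both versions of the predictor and the results compared. This is a sketch that assumes the dependent variable is named Weight; the tutorial itself does not show this step.

```spss
/*Fit the model with the uncentered predictor.*/
REGRESSION
  /DEPENDENT Weight
  /METHOD=ENTER Height.

/*Fit the same model with the mean-centered predictor.*/
REGRESSION
  /DEPENDENT Weight
  /METHOD=ENTER Centered_Height.
```

The slope should be identical in the two coefficient tables. The intercept of the second model is the predicted weight at the average height rather than at a height of zero.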