
The "Compute Variable" command allows you to create new variables from existing variables by applying formulas. This tutorial shows how the "Compute Variable" command can compute a variable using an equation, a built-in function, or conditional logic.

Computing Variables

Sometimes you may need to compute a new variable based on existing information (from other variables) in your data. For example, you may want to convert the units of a variable from feet to meters, or use a subject's height and weight to compute their BMI. You may also want to apply a computation conditionally, so that a new variable is only computed for cases where certain conditions are met. In this tutorial, we'll discuss how to compute variables in SPSS using numeric expressions, built-in functions, and conditional logic.
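
As a rough sketch of the two computations mentioned above, the syntax below converts a height from feet to meters and then computes BMI. The variable names HeightFt, WeightKg, HeightM, and BMI are assumptions made for illustration and are not part of the sample dataset; substitute the names used in your own data.

```spss
* Hypothetical variable names: HeightFt (feet) and WeightKg (kilograms).
* Convert height from feet to meters (1 foot = 0.3048 meters).
COMPUTE HeightM = HeightFt * 0.3048.
* BMI is weight in kilograms divided by the square of height in meters.
COMPUTE BMI = WeightKg / (HeightM ** 2).
EXECUTE.
```

Each COMPUTE command creates one new variable, and a later command can use a variable created by an earlier one, as BMI does with HeightM here.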

To compute a new variable, click Transform > Compute Variable.

The Compute Variable window will open where you will specify how to calculate your new variable.

Compute Variable dialog window (SPSS version 23).

A Target Variable: The name of the new variable that will be created during the computation. Simply type a name for the new variable in the text field. Once a variable is entered here, you can click on “Type & Label” to assign a variable type and give it a label. The default type for new variables is numeric.

B The left column lists all of the variables in your dataset. You can use this menu to add variables into a computation: either double-click on a variable to add it to the Numeric Expression field, or select the variable(s) that will be used in your computation and click the arrow to move them to the Numeric Expression text field (C).

C Numeric Expression: Specify how to compute the new variable by writing a numeric expression.

D The center of the window includes a collection of arithmetic operators, Boolean operators, and numeric characters, which you can use to specify how your new variable will be calculated. There are many kinds of calculations you can specify by selecting a variable (or multiple variables) from the left column, moving them to the center text field, and using the blue buttons to specify values (e.g., “1”) and operations (e.g., +, *, /).

E If: The If option allows you to specify the conditions under which your computation will be applied.

F Function group: You can also use the built-in functions in the Function group list on the right-hand side of the window. The function group contains many useful, common functions that may be used for calculating values for new variables (e.g., mean, logarithm). To find a specific function, simply click one of the function groups in the Function Group list. You will now see a list of functions that belong to that function group in the Functions and Special Variables area. If you click on a specific function, a description of that function will appear in the text field to the left.


Click If (indicated by letter E in the above image) to open the Compute Variable: If Cases window.

Compute Variable If Cases dialog window (SPSS version 23).

1 The left column displays all of the variables in your dataset. You will use one or more variables to define the conditions under which your computation should be applied to the data.

2 The default specification is to Include all cases. To restrict the computation to certain cases, click Include if case satisfies condition, then enter the condition in the text field.

3 The center of the dialog box includes a collection of arithmetic operators, Boolean operators, and numeric characters, which you can use to specify the conditions under which your computation will be applied to the data. There are many kinds of conditions you can specify by selecting a variable (or multiple variables) from the left column, moving them to the center text field, and using the blue buttons to specify values (e.g., “1”) and operations (e.g., <, >=, &). You can also use the built-in functions in the Function Group list under the right column.

After you are finished defining the conditions under which your computation will be applied to the data, click Continue. Note that when you specify a condition in the Compute Variable: If Cases window, the computation will only be performed on the cases meeting the specified condition. If a case does not meet that condition, it will be assigned a missing value for the new variable.
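
When a condition is specified, the dialog generates an IF command rather than a COMPUTE command. The sketch below shows the general shape of that syntax. The target variable name AvgPassedMath and the cutoff of 60 on Math are hypothetical choices made only for illustration.

```spss
* The cutoff of 60 on Math is a hypothetical condition chosen for illustration.
IF (Math >= 60) AvgPassedMath = MEAN(English, Reading, Math, Writing).
EXECUTE.
```

Cases with a Math score below 60 (or with a missing Math score) do not satisfy the condition, so they receive a missing value for the new variable. If the target variable already existed before the command ran, those cases would instead keep their existing values.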

Example: Computing a New Variable Using Arithmetic

Now we will use what we have learned throughout this tutorial to demonstrate how to compute a new variable. In this example, we wish to compute a new variable called AverageScore that is the average of four test scores—variables English, Reading, Math, and Writing.

  1. Click Transform > Compute Variable.
  2. In the Target Variable field, type a name for the new variable that will be computed. Let's call our new variable AverageScore.
  3. Highlight each variable (English, Reading, Math, and Writing) in the list on the left and click the arrow to move it to the Numeric Expression field. (Alternatively, you can double-click on the variable name to move it to the Numeric Expression field.) Make sure you press the spacebar to create a space between each variable.
  4. Now your four variables will appear in the Numeric Expression field. Move your cursor between each set of variables and click the “+” sign to add the symbol for addition to the numeric expression. Now your expression should appear as English + Reading + Math + Writing.
  5. Now insert parentheses around the expression so that it appears as (English + Reading + Math + Writing).
  6. At the end of the expression, add the “/” sign and the number “4.” Now your expression should appear as (English + Reading + Math + Writing) / 4.
  7. The final expression indicates that the new variable, AverageScore, will be calculated as the average of the four test scores.
  8. Click OK to complete the computation and apply the changes to the data.
  9. Finally, let’s make sure that a new variable called AverageScore was successfully created.
    • We can find the new variable in the last column in Data View or in the last row of Variable View. If you do not see the new variable, the computation was unsuccessful.
    • We can check the syntax that was executed by looking at the log in the Output Viewer window. After running Compute Variable, the syntax that should have appeared in the output window is:
      COMPUTE AverageScore=(English + Reading + Math + Writing) / 4.
      EXECUTE.
      If there was an error in how the computation was specified, the log in the Output Viewer will often show an error message.
    • It is also useful to check whether the computation was applied correctly to the data. In the Data View tab, manually calculate the average for a few cases and compare your results with the values of the new variable.
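
As an alternative to scrolling through Data View, the spot-check can be sketched in syntax using the LIST command, which prints the values of the named variables in the Output Viewer. Limiting the output to the first five cases is an arbitrary choice for illustration.

```spss
* List the four test scores and the computed average for the first five cases.
LIST VARIABLES=English Reading Math Writing AverageScore
  /CASES=FROM 1 TO 5.
```

You can then add up each row's four scores by hand, divide by 4, and compare the result with the listed value of AverageScore.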

Example: Computing a New Variable Using a Built-In Function

Let's instead try computing the average test score using the built-in mean function.

  1. Click Transform > Compute Variable.
  2. In the Target Variable area, type a name for the new variable that will be computed; let's call the new variable AverageScore2.
  3. In the Function group list, click All.
  4. In the Functions and Special Variables list, scroll down until you find “Mean”, then click on it. A description of this function will appear in the text box to the left. In this example, the description reads:

    "MEAN(numexpr,numexpr[,..]). Numeric. Returns the arithmetic mean of its arguments that have valid, nonmissing values. This function requires two or more arguments, which must be numeric. You can specify a minimum number of valid arguments for this function to be evaluated."

  5. Double-click “Mean” in the Functions and Special Variables list. When you do this, the syntax MEAN(?,?) should appear in the Numeric Expression field.
  6. Now add each of the variables (i.e., English, Reading, Math, Writing) to the numeric expression by double-clicking on the variable name in the left list. The variable names should be separated by commas, and all of the variable names should remain inside the parentheses.
  7. Your final numeric expression should appear as MEAN(English,Reading,Math,Writing). The final expression indicates that the new variable, AverageScore2, will be calculated as the average of the four test scores.
  8. Click OK to complete the computation and apply the changes to the data.
  9. Finally, let’s make sure that a new variable called AverageScore2 was successfully created.
    • We can find the new variable in the last column in Data View or in the last row of Variable View. If you do not see the new variable in the Variable View, the computation was unsuccessful. Additionally, if you see the new column in the Data View but every row has a missing value, there was an issue with your computation.
    • We can check the syntax that was executed by looking at the log in the Output Viewer window. After running Compute Variable, the syntax that should have appeared in the output window is:
      COMPUTE AverageScore2=MEAN(English,Reading,Math,Writing). 
      EXECUTE.
      
      If there was an error in how the computation was specified, the log in the Output Viewer will often show an error message.
    • It is also useful to check whether the computation was applied correctly to the data. In the Data View tab, manually calculate the average for a few cases and compare your results with the values of the new variable.

Example: Referring to a Range of Variables in a Function

Notice that in the sample dataset, the test score variables are all next to each other. In the previous example, we explicitly specified all four test score variables in the MEAN function. But what if there had been ten or twenty test score variables? It would take much longer to manually enter all twenty variable names.

What if we wanted to refer to the entire range of test score variables, beginning with English and ending with Writing, without having to type out each variable's name?

When using SPSS's built-in functions, you can refer to a range of variables by using the keyword TO. Let's repeat the previous example and show how the TO keyword is used to refer to a range of variables inside a function.

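WARNING: This method depends on the positions of the variables in the dataset. TO selects every variable located between the two named variables in file order, so if the test score variables are not adjacent, any unrelated variables that sit between them will be included in the computation and some intended variables may be left out.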
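
One way to confirm the variable order before relying on TO is to print the variable names in file order. This is a minimal sketch using the DISPLAY command; you can get the same information by reading down the rows of Variable View.

```spss
* Print the names of all variables in the order they appear in the dataset.
DISPLAY NAMES.
```

If English, Reading, Math, and Writing appear consecutively in the output, then English TO Writing refers to exactly those four variables.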

  1. Click Transform > Compute Variable.
  2. In the Target Variable area, type a name for the new variable that will be computed; let's call the new variable AverageScore3.
  3. In the Function group list, click All.
  4. In the Functions and Special Variables list, scroll down until you find “Mean”, then click on it.
  5. Double-click “Mean” in the Functions and Special Variables list. The basic setup for using this function will now appear in the Numeric Expression field.
  6. Inside the MEAN function, change the arguments to English TO Writing. Your final numeric expression should appear as MEAN(English TO Writing). The final expression indicates that the new variable, AverageScore3, will be calculated as the average of all the variables between English and Writing in the dataset.
  7. Click OK to complete the computation.
  8. Finally, let’s make sure that a new variable called AverageScore3 was successfully created.
    • We can find the new variable in the last column in Data View or in the last row of Variable View. If you do not see the new variable, the computation was unsuccessful.
    • We can check the syntax that was executed by looking at the log in the Output Viewer window. After running Compute Variable, the syntax that should have appeared in the output window is:
      COMPUTE AverageScore3=MEAN(English TO Writing).
      EXECUTE.
      If there was an error in how the computation was specified, the log in the Output Viewer will often show an error message.
    • It is also useful to check whether the computation was applied correctly to the data. In the Data View tab, manually calculate the average for a few cases and compare your results with the values of the new variable.

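If you've already verified the computation for AverageScore or AverageScore2, then you should be able to verify that AverageScore, AverageScore2, and AverageScore3 are equal for every case that has valid values on all four test scores. (For a case with a missing test score, AverageScore will be missing, while the two MEAN-based variables will be the average of the scores that are present.)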
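
The comparison can also be sketched in syntax by computing the differences between the three averages and summarizing them. Diff12 and Diff13 are hypothetical helper variable names introduced only for this check.

```spss
* Diff12 and Diff13 are hypothetical helper variables used only for checking.
COMPUTE Diff12 = AverageScore - AverageScore2.
COMPUTE Diff13 = AverageScore - AverageScore3.
EXECUTE.
* If the three averages agree, the minimum and maximum of each difference are 0.
DESCRIPTIVES VARIABLES=Diff12 Diff13
  /STATISTICS=MIN MAX.
```

A minimum and maximum of zero (or values negligibly close to zero, which can arise from rounding) indicate that the three variables match. Cases where AverageScore is missing produce a missing difference and are left out of this summary.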

Example: Computing Subscales When Some Values Are Missing

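In the previous examples, we did not talk about what happens when one or more of the variables has missing values for a given case. With an arithmetic expression such as (English + Reading + Math + Writing) / 4, a missing value in any one of the input variables makes the new variable missing for that case. That is, there must be valid values for each input variable in order for the computation to work. This is called listwise exclusion. The MEAN function behaves differently: by default, it averages whatever valid values a case has, even if only one of the arguments is nonmissing.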

Listwise exclusion can end up throwing out a lot of data, especially if you are computing a subscale from many variables.

In SPSS, you can modify statistical functions that take a list of variables as arguments (such as MEAN, SUM, and SD) using the .n suffix, where n is an integer indicating the minimum number of nonmissing values a given case must have. As long as a case has at least n valid values, the computation will be carried out using just the valid values; otherwise, the result is missing.

In the previous example, we used the built-in MEAN() function to compute the average of the four placement test scores. If we change the syntax to:

COMPUTE AverageScore=MEAN.3(English TO Writing).
EXECUTE.

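then any case with three or more nonmissing values will have a successful, nonmissing value for AverageScore. (Stated another way, a given case could have at most one missing test score and still be OK.) For example, a case with scores of 80, 90, and 70 and one missing score would receive an AverageScore of (80 + 90 + 70) / 3 = 80.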

Alternatively, using the syntax

COMPUTE AverageScore=MEAN.2(English TO Writing).
EXECUTE.

would require that two or more of the test score variables have valid values (i.e., a given case could have at most two missing test scores).
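
When deciding on a value for n, it can help to see how many valid test scores each case actually has. The sketch below uses the NVALID function to count them. NValidScores is a hypothetical variable name, and the syntax assumes the four test score variables are adjacent in the dataset, as in the earlier TO example.

```spss
* NValidScores is a hypothetical variable name.
* Count how many of the test score variables have valid values for each case.
COMPUTE NValidScores = NVALID(English TO Writing).
EXECUTE.
* Tabulate the counts to see how many cases each choice of n would keep.
FREQUENCIES VARIABLES=NValidScores.
```

In the resulting frequency table, the cases with a count of 3 or 4 are the ones that would receive a value under MEAN.3, while MEAN.2 would also keep the cases with a count of 2.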