SPSS Tutorials: Computing Variables

This tutorial shows how to compute new variables in SPSS using formulas and built-in functions.

Computing Variables

Sometimes you may need to compute a new variable based on existing information (from other variables) in your data. For example, you may want to:

  • Convert the units of a variable from feet to meters
  • Use a subject's height and weight to compute their BMI
  • Compute a subscale score from items on a survey
  • Apply a computation conditionally, so that a new variable is only computed for cases where certain conditions are met

In this tutorial, we'll discuss how to compute variables in SPSS using numeric expressions, built-in functions, and conditional logic.

To compute a new variable, click Transform > Compute Variable.

The Compute Variable window will open where you will specify how to calculate your new variable.

Compute Variable dialog window (SPSS version 23) with labels. A = Target Variable box, B = variable list, C = Numeric Expression box, D = numeric keypad

A Target Variable: The name of the new variable that will be created during the computation. Simply type a name for the new variable in the text field. Once a variable is entered here, you can click on “Type & Label” to assign a variable type and give it a label. The default type for new variables is numeric.

B The left column lists all of the variables in your dataset. You can use this menu to add variables into a computation: either double-click on a variable to add it to the Numeric Expression field, or select the variable(s) that will be used in your computation and click the arrow to move them to the Numeric Expression text field (C).

C Numeric Expression: Specify how to compute the new variable by writing a numeric expression. This expression must include one or more variables from your dataset, and can use arithmetic or functions.

When writing an expression in the Compute Variables dialog window:

  • SPSS is not case-sensitive with respect to variable names.
  • When specifying the formula for a new variable, you have the option to include or omit spaces after the commas that separate arguments in a function.
  • Do not put a period at the end of the expression you enter into the Numeric Expression box.

D The center of the window includes a collection of arithmetic operators, Boolean operators, and numeric characters, which you can use to specify how your new variable will be calculated. There are many kinds of calculations you can specify by selecting a variable (or multiple variables) from the left column, moving them to the center text field, and using the blue buttons to specify values (e.g., “1”) and operations (e.g., +, *, /).

E If: The If option allows you to specify the conditions under which your computation will be applied.

F Function group: You can also use the built-in functions in the Function group list on the right-hand side of the window. The function group contains many useful, common functions that may be used for calculating values for new variables (e.g., mean, logarithm). To find a specific function, simply click one of the function groups in the Function Group list. You will now see a list of functions that belong to that function group in the Functions and Special Variables area. If you click on a specific function, a description of that function will appear in the text field to the left.


Click If (indicated by letter E in the above image) to open the Compute Variable: If Cases window.

Compute Variable If Cases dialog window (SPSS version 23).

1 The left column displays all of the variables in your dataset. You will use one or more variables to define the conditions under which your computation should be applied to the data.

2 The default specification is to Include all cases. To restrict the computation to certain cases, click Include if case satisfies condition. This enables the text field where you enter the condition.

3 The center of the dialog box includes a collection of arithmetic operators, Boolean operators, and numeric characters, which you can use to specify the conditions under which your computation will be applied to the data. There are many kinds of conditions you can specify by selecting a variable (or multiple variables) from the left column, moving them to the center text field, and using the blue buttons to specify values (e.g., “1”) and operations (e.g., =, <, >). You can also use the built-in functions in the Function Group list in the right column.

After you are finished defining the conditions under which your computation will be applied to the data, click Continue. Note that when you specify a condition in the Compute Variable: If Cases window, the computation will only be performed on the cases meeting the specified condition. If a case does not meet that condition, it will be assigned a missing value for the new variable.
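In syntax, a conditional computation of this kind is typically written with the IF command rather than COMPUTE. The following is a minimal sketch, assuming hypothetical variables Height and Weight and an illustrative condition; pasting from the dialog window should produce a command of the same general shape.

```spss
* Hypothetical example: compute BMI only for cases with a positive, nonmissing height.
* Cases that do not meet the condition are left missing on the new variable BMI.
IF (Height > 0) BMI = (Weight*703)/(Height**2).
EXECUTE.
```

If Height is missing for a case, the condition itself evaluates to missing, so the computation is skipped for that case as well.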

Computing Variables using Syntax

You do not need to use the Compute Variables dialog window to compute variables or generate syntax. You can write your own syntax to compute variables (and it is often faster and more convenient to do so!).

The general form of the syntax for computing a new variable is:

COMPUTE NewVariableName = <formula>.
EXECUTE.

The first line gives the COMPUTE command, which specifies the name of the new variable on the left side of the equals sign, and its formula on the right side of the equals sign. The formula on the right side of the equals sign corresponds to what you would enter in the Numeric Expression field in the Compute Variables dialog window.

The EXECUTE command on the second line is what actually carries out the computation and adds the variable to the active dataset. (If you have tried to run COMPUTE syntax but do not see variables added to your dataset and do not also see error or warning messages in the Output Viewer, you may have forgotten to include the EXECUTE statement.)

Notice that each command ends in a period.

When writing the expression or formula using COMPUTE syntax:

  • SPSS is not case-sensitive with respect to variable names.
  • When specifying the formula for a new variable, you have the option to include or omit spaces around the equals sign and/or after the commas between arguments in a function.
  • A period goes at the end of the COMPUTE statement, after the end of the formula.
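As an illustration of these rules, the sketch below converts units with two COMPUTE commands followed by a single EXECUTE. The input variables Height (in inches) and Weight (in pounds) and the new variable names are assumptions for illustration; the conversion factors are the standard ones (1 inch = 0.0254 m, 1 lb = 0.45359237 kg).

```spss
* Several transformations can be queued and then carried out by a single EXECUTE.
COMPUTE HeightMeters = Height * 0.0254.
* Lowercase names and omitted spaces are accepted as well.
compute WeightKg=weight*0.45359237.
EXECUTE.
```

Both styles produce the same kind of result; the first is usually easier to read.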

Example: Computing a New Variable Using Arithmetic

Now we will use what we have learned throughout this tutorial to demonstrate how to compute a new variable. In this example, we wish to compute BMI for the respondents in our sample. The height (in inches) and weight (in pounds) of the respondents were observed; so to compute BMI, we want to plug those values into the formula

$$ \mathrm{BMI} = \frac{\mathrm{Weight}*703}{\mathrm{Height}^{2}} $$
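
As a quick worked check of the formula, consider a hypothetical respondent (not taken from the sample data) who is 70 inches tall and weighs 154 pounds:

$$ \mathrm{BMI} = \frac{154*703}{70^{2}} = \frac{108262}{4900} \approx 22.09 $$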

Using the Compute Variables Dialog Window

  1. Click Transform > Compute Variable.
  2. In the Target Variable field, type a name for the new variable that will be computed. Let's call our new variable BMI.
  3. In the Numeric Expression field, type the following expression:

    (Weight*703)/(Height**2)
    

    (Alternatively, you can double-click on the variable names in the left column to move them to the Numeric Expression field, and then write the expression around them.) This expression indicates that the new variable, BMI, will be calculated as weight multiplied by 703, divided by the square of height.

  4. Click OK to complete the computation.
  5. Finally, let’s make sure that a new variable called BMI was successfully created.
    • We can find the new variable in the last column in Data View or in the last row of Variable View. If you do not see the new variable, the computation was unsuccessful.
    • We can check the syntax that was executed by looking at the log in the Output Viewer window. If there was an error in how the computation was specified, the log in the Output Viewer will often show an error message.
    • It is also useful to check whether the computation was applied correctly to the data. You can spot-check it in the Data View tab by manually calculating the BMI for a few cases and comparing your results with the values SPSS computed.

Using Syntax

Alternatively, you can produce the same result by opening a syntax window (File > New > Syntax) and executing the following code:

COMPUTE BMI=(Weight*703)/(Height**2). 
EXECUTE.

This syntax can be generated automatically by following the dialog window steps above and clicking Paste instead of OK.
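
To support the spot-check described above, one option is to print the inputs and the result side by side for a handful of cases. This is a sketch using the LIST command; the choice of the first five cases is arbitrary.

```spss
* Print height, weight, and the computed BMI for the first five cases.
LIST VARIABLES=Height Weight BMI
  /CASES=FROM 1 TO 5.
```

The listed values can then be compared against BMI values calculated by hand from the formula.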

Example: Computing a New Variable Using a Built-In Function

In this example, we will compute the average test score using the built-in MEAN function.

Using the Compute Variables Dialog Window

  1. Click Transform > Compute Variable.
  2. In the Target Variable area, type a name for the new variable that will be computed; let's call the new variable AverageScore2.
  3. In the Function group list, click All.
  4. In the Functions and Special Variables list, scroll down until you find “Mean”, then click on it. A description of this function will appear in the text box to the left. In this example, the description reads:

    "MEAN(numexpr,numexpr[,..]). Numeric. Returns the arithmetic mean of its arguments that have valid, nonmissing values. This function requires two or more arguments, which must be numeric. You can specify a minimum number of valid arguments for this function to be evaluated."

  5. Double-click “Mean” in the Functions and Special Variables list. When you do this, the text MEAN(?,?) should appear in the Numeric Expression field.

  6. Now add each of the variables (i.e., English, Reading, Math, Writing) to the numeric expression by double-clicking on the variable names in the left list. The variable names should be separated by commas, and all of the variable names should remain inside the parentheses.
  7. Your final numeric expression should appear as MEAN(English, Reading, Math, Writing). This says that the new variable, AverageScore2, will be calculated as the mean of the four test scores. (Using spaces after the commas is optional, but recommended, since it is easier to read.)

  8. Click OK to complete the computation and apply the changes to the data.
  9. Finally, let’s make sure that a new variable called AverageScore2 was successfully created.
    • We can find the new variable in the last column in Data View or in the last row of Variable View. If you do not see the new variable in the Variable View, the computation was unsuccessful. Additionally, if you see the new column in the Data View but every row has a missing value, there was an issue with your computation.
    • We can check the syntax that was executed by looking at the log in the Output Viewer window. If there was an error in how the computation was specified, the log in the Output Viewer will often show an error message.
    • It is also useful to check whether the computation was applied correctly to the data. You can spot-check it in the Data View tab by manually calculating the average for a few cases and comparing your results with the values SPSS computed.

Using Syntax

Alternatively, you can produce the same result by opening a syntax window (File > New > Syntax) and executing the following code:

COMPUTE AverageScore2=MEAN(English, Reading, Math, Writing). 
EXECUTE.

This syntax can be generated automatically by following the dialog window steps above and clicking Paste instead of OK.

Example: Referring to a Range of Variables in a Function

Notice that in the sample dataset, the test score variables are all next to each other. In the previous example, we explicitly specified all four test score variables in the MEAN function. But what if there had been ten or twenty test score variables? It would take much longer to manually enter all twenty variable names.

What if we wanted to refer to the entire range of test score variables, beginning with English and ending with Writing, without having to type out each variable's name?

When using SPSS's built-in functions, you can refer to a range of variables by using the keyword TO. Let's repeat the previous example and show how TO is used to refer to a range of variables inside a function.

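This method depends on the positions of the variables in the dataset: TO includes every variable located between the two named variables in file order, whatever its name. If the variables you want are not adjacent, or if unrelated variables sit between them, the result will not be what you intended.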
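
Before relying on TO, it can help to confirm which variables actually fall inside the range. One way to do this, sketched below, is to ask SPSS to display the dictionary entries for the range. The output should list only the variables you intend to include (here, presumably English, Reading, Math, and Writing).

```spss
* List the variables that English TO Writing refers to, in file order.
DISPLAY DICTIONARY
  /VARIABLES=English TO Writing.
```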

Using the Compute Variables Dialog Window

  1. Click Transform > Compute Variable.
  2. In the Target Variable area, type a name for the new variable that will be computed; let's call the new variable AverageScore3.
  3. In the Function group list, click All.
  4. In the Functions and Special Variables list, scroll down until you find “Mean”, then click on it.
  5. Double-click “Mean” in the Functions and Special Variables list. The basic setup for using this function will now appear in the Numeric Expression field.
  6. Inside the MEAN function, change the arguments to English TO Writing. Your final numeric expression should appear as

    MEAN(English TO Writing)

    The final expression indicates that the new variable, AverageScore3, will be calculated as the average of all the variables from English through Writing, inclusive, in the dataset.

  7. Click OK to complete the computation.
  8. Finally, let’s make sure that a new variable called AverageScore3 was successfully created.
    • We can find the new variable in the last column in Data View or in the last row of Variable View. If you do not see the new variable, the computation was unsuccessful.
    • We can check the syntax that was executed by looking at the log in the Output Viewer window. If there was an error in how the computation was specified, the log in the Output Viewer will often show an error message.
    • It is also useful to check whether the computation was applied correctly to the data. You can spot-check it in the Data View tab by manually calculating the average for a few cases and comparing your results with the values SPSS computed.

If you've already verified the computation for AverageScore2, then you should be able to verify that AverageScore2 and AverageScore3 are identical.

Using Syntax

Alternatively, you can produce the same result by opening a syntax window (File > New > Syntax) and executing the following code:

COMPUTE AverageScore3=MEAN(English TO Writing).
EXECUTE.

This syntax can be generated automatically by following the dialog window steps above and clicking Paste instead of OK.

Example: Computing Subscales when Some Values Are Missing

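In the previous examples, we did not talk about what happens when one or more of the variables has missing values for a given case. The answer depends on how the formula is written. Statistical functions such as MEAN use whichever arguments have valid values (as the function description quoted earlier states), so by default a case with only one valid test score still receives a mean. In an arithmetic expression, however (for example, the BMI formula, or adding the scores and dividing by four), a missing value in any input variable causes SPSS to assign the new variable a missing value. That is, every input variable must have a valid value for the computation to work. This is called listwise exclusion.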

Listwise exclusion can end up throwing out a lot of data, especially if you are computing a subscale from many variables.
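
One way to gauge how much data would be affected is to count the missing inputs for each case before computing the subscale. The sketch below assumes the four test score variables from the previous example; the variable name NMissingScores is a hypothetical choice.

```spss
* Count how many of the four test scores are missing for each case.
COMPUTE NMissingScores = NMISS(English TO Writing).
* FREQUENCIES reads the data, so the pending COMPUTE is carried out without a separate EXECUTE.
FREQUENCIES VARIABLES=NMissingScores.
```

The frequency table shows how many cases have zero, one, two, or more missing scores, which can inform how strict a rule to apply.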

In SPSS, you can control how many valid values are required by adding the .n suffix to a statistical function that takes a list of arguments (such as MEAN, SUM, SD, MIN, or MAX), where n is an integer indicating the minimum number of nonmissing values a given case must have. As long as a case has at least n valid values, the computation will be carried out using just the valid values; otherwise, the result is missing.

In the previous example, we used the built-in MEAN() function to compute the average of the four placement test scores. If we change the formula for AverageScore3 to MEAN.3(English TO Writing), then any case with three or more nonmissing values will have a successful, nonmissing value for AverageScore3. (Stated another way, a given case could have at most one missing test score and still be OK.)

Alternatively, using the formula MEAN.2(English TO Writing) would require that two or more of the test score variables have valid values (i.e., a given case could have at most two missing test scores).

Syntax

If you click Paste after revising the formula, the following syntax will be written to the syntax editor window:

COMPUTE AverageScore3=MEAN.3(English TO Writing).
EXECUTE.

Example: Computing a New Indicator from Several Existing Indicators

A common scenario on health questionnaires is to have multiple questions about risk factors for a certain disease. These questions may originally be coded as 0 (absent) and 1 (present); or 0 (no) and 1 (yes). For example, on a questionnaire about ADHD, we may ask three questions about whether an individual's biological parents or siblings have been diagnosed with ADHD:

  • Has your biological mother been diagnosed with ADHD?
  • Has your biological father been diagnosed with ADHD?
  • If you have siblings or half-siblings, has at least one of them been diagnosed with ADHD?

Suppose we want to only have a single indicator variable, where 0 = does not have any risk factors, and 1 = has one or more risk factors. The function ANY() is a convenient way to compute this indicator. The ANY function is designed to return the following:

  • ANY(value, var1, var2, var3, ...) = 1 if at least one of var1, var2, var3, ... equals value
  • ANY(value, var1, var2, var3, ...) = 0 if none of the nonmissing values of var1, var2, var3, ... equals value
  • ANY(value, var1, var2, var3, ...) = missing if all of var1, var2, var3, ... are missing

The application we will demonstrate is intended to be used when you want to check for one specific value across many variables.

For this example, we will use this tiny dataset. Each variable represents a "yes/no" question, with 1=No, 2=Yes.

You can copy, paste, and execute the following syntax to generate this dataset in SPSS, or you can download the linked SPSS datafile below.

DATA LIST FREE (",") / q1 to q3.
BEGIN DATA.
1,2,2,
2,1,,
1,1,1,
2,,1,
1,,2,
1,1,,
1,2,1,
2,,2,
1,1,2,
,,,
1,,,
,,2,
2,2,2,
END DATA.
VALUE LABELS q1 to q3 1 'No' 2 'Yes'.

Using the Compute Variables Dialog Window

  1. Click Transform > Compute Variable.
  2. In the Target Variable area, type a name for the new variable that will be computed; let's call the new variable any_yes.
  3. In the Numeric Expression box, enter the expression
    ANY(2, q1 TO q3)
    Do not put a period at the end of the expression.
    This expression tells SPSS to look for instances of the value 2 (Yes) across variables q1, q2, and q3.
  4. Click OK to complete the computation.
  5. Finally, let’s make sure that a new variable called any_yes was successfully created.
    • We can find the new variable in the last column in Data View or in the last row of Variable View. If you do not see the new variable, the computation was unsuccessful.
    • We can check the syntax that was executed by looking at the log in the Output Viewer window. If there was an error in how the computation was specified, the log in the Output Viewer will often show an error message.
    • It is also useful to explore whether the computation you specified was applied correctly to the data. You can spot-check the computation by viewing your data in the Data View tab.

Using Syntax

Alternatively, you can produce the same result by opening a syntax window (File > New > Syntax) and executing the following code:

COMPUTE any_yes=ANY(2, q1, q2, q3).
EXECUTE.

/*Optional: add labels to the new indicator variable*/
VALUE LABELS any_yes 0 'No' 1 'Yes'.

Equivalent syntax (with the variables written as q1 TO q3, and without the VALUE LABELS line) can be generated automatically by following the dialog window steps above and clicking Paste instead of OK.

Result

Let's check that the ANY() function produced the results that we expected. If you run the above code, you should get results that look like the following:

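| Row | q1  | q2  | q3  | any_yes |
|-----|-----|-----|-----|---------|
| 1   | No  | Yes | Yes | 1       |
| 2   | Yes | No  | .   | 1       |
| 3   | No  | No  | No  | 0       |
| 4   | Yes | .   | No  | 1       |
| 5   | No  | .   | Yes | 1       |
| 6   | No  | No  | .   | 0       |
| 7   | No  | Yes | No  | 1       |
| 8   | Yes | .   | Yes | 1       |
| 9   | No  | No  | Yes | 1       |
| 10  | .   | .   | .   | .       |
| 11  | No  | .   | .   | 0       |
| 12  | .   | .   | Yes | 1       |
| 13  | Yes | Yes | Yes | 1       |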

You should see that as long as a particular row has a value of Yes for at least one of q1, q2, or q3, it will have a value of 1 for any_yes. Notice that in rows 6 and 11, nonmissing values are all equal to No, so the resulting value of any_yes is 0. Also notice that the only case with a missing value for any_yes is row 10, which has missing values for all three of q1, q2, and q3.

What does this mean? If we go back to the ADHD example used at the start of this section, it implies that anyone whose mother, father, or biological sibling has been diagnosed with ADHD is themselves considered to have a risk factor for ADHD. It does not assign "extra risk" if someone has two or more relatives who have been diagnosed.
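
If you did want to capture how many relatives have been diagnosed, rather than whether any have, one possible approach is the COUNT command. The sketch below applies it to the tiny q1 to q3 dataset from this example; the variable name n_yes is a hypothetical choice.

```spss
* Count how many of q1, q2, and q3 equal 2 (Yes) for each case.
COUNT n_yes = q1 TO q3 (2).
EXECUTE.
```

Unlike ANY, COUNT returns 0 rather than a missing value for a case whose inputs are all missing (such as row 10), so the two variables are best interpreted together.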