SPSS Tutorials: Data Creation in SPSS

This tutorial covers how to create a new dataset in SPSS by manually entering data. Also covered is the difference between row numbers (which are a part of the spreadsheet) and ID variables (which are a part of the dataset and act as case identifiers).

Data Creation in SPSS

When you open the SPSS program, you will see a blank spreadsheet in Data View. If you already have another dataset open but want to create a new one, click File > New > Data to open a blank spreadsheet.

You will notice that each of the columns is labeled “var.” The column names will represent the variables that you enter in your dataset. You will also notice that each row is labeled with a number (“1,” “2,” and so on). The rows will represent cases that will be a part of your dataset. When you enter values for your data in the spreadsheet cells, each value will correspond to a specific variable (column) and a specific case (row).

Follow these steps to enter data:

  1. Click the Variable View tab. Under the Name column, type a name for each variable that you plan to include in your dataset. You can also enter other information about each variable, such as the type (the default is “numeric”), width, decimals, and label. In this example, I will type “School_Class” since I plan to include a variable for the class level of each student (1 = first year, 2 = second year, 3 = third year, and 4 = fourth year). I will also specify 0 decimals since my variable values will only include whole numbers. (The default is two decimals.)

  2. Click the Data View tab. Any variable names that you entered in Variable View will now be included in the columns (one variable name per column). You can see that School_Class appears in the first column in this example.

  3. Now you can enter values for each case. In this example, cases represent students. For each student, enter a value for their class level in the cell that corresponds to the appropriate row and column. For example, the first person’s information should appear in the first row, under the variable column School_Class. In this example, the first person’s class level is “2,” the second person’s is “1,” the third person’s is “1,” the fourth person’s is “3,” and so on.

  4. Repeat these steps for each variable that you will include in your dataset. Don't forget to periodically save your progress as you enter data.
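To make the row-and-column structure concrete, here is a minimal sketch in plain Python (not SPSS syntax) of what the two views hold after the steps above. The dictionary layout and the `value_labels` key are assumptions made for illustration; only the variable name School_Class, its 0 decimals, and the four example values come from this tutorial.

```python
# Illustrative sketch only: a plain-Python picture of the two SPSS views.
# The structure and key names here are hypothetical, not an SPSS API.

# Variable View: one entry per variable, holding its attributes.
variables = [
    {
        "name": "School_Class",
        "type": "numeric",   # the default type
        "decimals": 0,       # whole numbers only
        "value_labels": {1: "first year", 2: "second year",
                         3: "third year", 4: "fourth year"},
    },
]

# Data View: one row per case (student), one value per variable.
cases = [
    {"School_Class": 2},
    {"School_Class": 1},
    {"School_Class": 1},
    {"School_Class": 3},
]

# Each value sits at the intersection of one case (row) and one variable (column).
labels = variables[0]["value_labels"]
described = [labels[case["School_Class"]] for case in cases]
print(described)
```

Each dictionary in `cases` plays the role of one spreadsheet row, and each key plays the role of one column.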

ID Variables versus Row Numbers

Now that you know how to enter data, it is important to discuss a special type of variable called an ID variable. When data are collected, each piece of information is tied to a particular case. For example, perhaps you distributed a survey as part of your data collection, and each survey was labeled with a number (“1,” “2,” etc.). In this example, the survey numbers essentially represent ID numbers: numbers that help you identify which pieces of information go with which respondents in your sample. Without these ID numbers, you would have no way of tracking which information goes with which respondent, and it would be impossible to enter the data accurately into SPSS.

When you enter data into SPSS, you will need to make sure that you are entering values for each variable that correspond to the correct person or object in your sample. It might seem like a simple solution to use the conveniently labeled rows in SPSS as ID numbers; you could enter your first respondent’s information in the row that is already labeled “1,” the second respondent’s information in the row labeled “2,” etc. However, you should never rely on these pre-numbered rows for keeping track of the specific respondents in your sample. This is because the numbers for each row are visual guides only: they are not attached to specific lines of data, and thus cannot be used to identify specific cases in your data. If your data become rearranged (e.g., after sorting), the row numbers will no longer be associated with the same cases as when you first entered the data. Instead, you should create a variable in your dataset that is used to identify each case, for example, a variable called StudentID.

Here is an example that illustrates why using the row numbers in SPSS as case identifiers is flawed:

Let’s say that you have entered values for each person for the School_Class variable. You relied on the row numbers in SPSS to correspond to your survey ID numbers. Thus, for survey #1, you entered the first respondent’s information in row 1, for survey #2 you entered the second person’s information in row 2, and so on. Now you have entered all of your data.

But suppose the data get rearranged in the spreadsheet view. A common way of rearranging data is by sorting, and you may very well need to do this as you explore and analyze your data. Sorting will rearrange the rows of data so that the values appear in ascending or descending order. If you right-click on any variable name, you can select “Sort Ascending” or “Sort Descending.” In this example, the data are sorted in ascending order on the values for the variable School_Class.

But what happens if you need to view a specific respondent’s information? Or perhaps you need to double-check your entry of the data by comparing the original survey to the values you entered in SPSS. Now that the data have been rearranged, the row numbers give you no way to identify which row corresponds to which participant/survey number.
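The problem can be reproduced outside SPSS. The following is a small Python sketch, not SPSS syntax, that treats row numbers the way the spreadsheet does: as positions that are renumbered whenever the rows are displayed. The four values are the example class levels from earlier in this tutorial; everything else is an assumption made for illustration.

```python
# Illustrative sketch only: row numbers behave like list positions.

# Values entered in survey order: survey #1 in row 1, survey #2 in row 2, ...
school_class = [2, 1, 1, 3]

# Row labels are generated from position; they are not stored with the data.
before = list(enumerate(school_class, start=1))
print(before)   # [(1, 2), (2, 1), (3, 1), (4, 3)]

# Sorting ascending rearranges the values, and the rows are renumbered 1..4.
school_class_sorted = sorted(school_class)
after = list(enumerate(school_class_sorted, start=1))
print(after)    # [(1, 1), (2, 1), (3, 2), (4, 3)]
```

After the sort, row 1 holds a value of 1, but nothing in the data says whether it came from survey #2 or survey #3, and survey #1 is now in row 3 with no record of the move.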

The main point is that you should not rely on the row numbers in SPSS since they are merely visual guides and not part of your data. Instead, you should create a specific variable that will serve as an ID for each case so that you can always identify certain cases in your data, no matter how much you rearrange the data. In the sample data file, the variable StudentID acts as the ID variable.
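As a rough illustration of why an ID variable solves this, here is a short Python sketch (again, not SPSS syntax). It assumes that StudentID values 1 through 4 match the survey numbers, which is a hypothetical choice for this example, and `find_case` is a helper name invented here.

```python
# Illustrative sketch only: an ID stored with each case survives sorting.

# Hypothetical data: StudentID matches the number written on each survey.
cases = [
    {"StudentID": 1, "School_Class": 2},
    {"StudentID": 2, "School_Class": 1},
    {"StudentID": 3, "School_Class": 1},
    {"StudentID": 4, "School_Class": 3},
]

# Sort ascending on School_Class, as in the example above.
sorted_cases = sorted(cases, key=lambda case: case["School_Class"])


def find_case(rows, student_id):
    """Return the case with the given StudentID, or None if it is absent."""
    for row in rows:
        if row["StudentID"] == student_id:
            return row
    return None


# Survey #1 can still be found, even though it is no longer in the first row.
survey_one = find_case(sorted_cases, 1)
print(survey_one)
```

Because the ID travels with the rest of the case, the lookup gives the same answer no matter how the rows are ordered.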