Our tutorials reference a dataset called "sample" in many examples. If you'd like to download the sample dataset to work through the examples, choose one of the files below:
When summarizing quantitative (continuous/interval/ratio) variables, we are typically interested in questions like:
In SPSS, the Frequencies procedure is typically used on categorical variables, but it also has special settings that can be applied for continuous numeric variables. In particular, the Frequencies procedure can compute percentiles that are not otherwise included in the Descriptives, Compare Means, or Explore procedures. In all, the Frequencies procedure can compute the following statistics for one or more continuous variables:
The Frequencies procedure can also produce histograms with or without a normal distribution overlaid on the graph.
To call the Frequencies procedure, click Analyze > Descriptive Statistics > Frequencies.
A Variable(s): The variables to analyze with the Frequencies procedure. To include a variable for analysis, double-click on its name to move it to the Variables box. You can add several variables to this box to obtain statistics for each variable.
B Statistics: Opens the Frequencies: Statistics window, which contains various descriptive statistics, most of which are suitable for continuous numeric variables.
Most of the statistics in the Central Tendency, Dispersion, and Distribution groups are valid for continuous variables; the only exception is the Mode, which very rarely has a useful interpretation for continuous variables. Most of these statistics are identical to the ones that can be obtained with Descriptives, Compare Means, or Explore, so they will not be covered again here. One notable exception is the Percentile Values group, which is unique to the Frequencies procedure.
You can select more than one option in the Percentile Values group. If your selections request overlapping information, that information will not be printed twice.
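In syntax, the Percentile Values options correspond to the NTILES and PERCENTILES subcommands of FREQUENCIES. The following is a minimal sketch, assuming the English variable from the sample dataset; the choice of quartiles, ten equal groups, and the 5th and 95th percentiles is purely illustrative.

```spss
* Quartiles, cut points for 10 equal groups, and two specific percentiles.
FREQUENCIES VARIABLES=English
  /FORMAT=NOTABLE
  /NTILES=4
  /NTILES=10
  /PERCENTILES=5 95
  /ORDER=ANALYSIS.
```

Both NTILES requests include the 50th percentile, so this is a case where overlapping information is requested; it should appear only once in the Statistics table.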
Note: The Values are group midpoints check box should only be selected when your data values represent the midpoint of a range. For example, this would be the case if you had coded anyone between the ages of 30 and 39 as 35 (source: IBM SPSS Statistics Information Center). This situation is more often associated with ordinal categorical variables.
C Charts: Opens the Frequencies: Charts window, which contains various graphical options. Options include bar charts, pie charts, and histograms. Histograms are the only appropriate option for continuous variables; bar charts and pie charts should never be used with continuous variables. If requesting a histogram, the optional Show normal curve on histogram option will overlay a normal curve on top of your histogram, which can be useful when assessing the normality of a variable.
Note that the options in the Chart Values area apply only to bar charts. These buttons will be greyed out if the radio button for Histograms is selected.
D Format: Opens the Frequencies: Format window, which contains options for how to sort and organize the table output.
The Order by options are not relevant to continuous variables, but the Multiple Variables options allow for customization of output when two or more continuous variables are specified.
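In syntax, the Multiple Variables options correspond to the ORDER subcommand. The sketch below assumes the English and Math variables from the sample dataset and requests the mean and median only to keep the output small; it runs the same analysis under each setting so the layouts can be compared.

```spss
* Compare variables: statistics for both variables appear in one table.
FREQUENCIES VARIABLES=English Math
  /FORMAT=NOTABLE
  /STATISTICS=MEAN MEDIAN
  /ORDER=ANALYSIS.

* Organize output by variables: each variable gets its own statistics table.
FREQUENCIES VARIABLES=English Math
  /FORMAT=NOTABLE
  /STATISTICS=MEAN MEDIAN
  /ORDER=VARIABLE.
```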
E Display frequency tables: When checked, frequency tables will be printed. (This box is checked by default.) If this check box is not checked, no frequency tables will be produced, and the only output will come from supplementary options from Statistics or Charts. You will want to uncheck this box if using the Frequencies procedure on a continuous numeric variable. (If this box is left checked, a frequency table will be produced where each unique number is treated as its own category. This could lead to a table with 100+ categories, depending on the number of observations in your dataset.)
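Unchecking Display frequency tables corresponds to FORMAT=NOTABLE in syntax. As a minimal sketch, assuming the English variable from the sample dataset and an illustrative choice of statistics:

```spss
* NOTABLE suppresses the frequency table, so only the Statistics table prints.
FREQUENCIES VARIABLES=English
  /FORMAT=NOTABLE
  /STATISTICS=MEAN MEDIAN STDDEV MINIMUM MAXIMUM.
```

Removing the FORMAT=NOTABLE line restores the default behavior, in which every unique score is listed as its own row in a frequency table.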
For variables with skewed distributions, percentiles are often more informative than means. Means are sensitive to outliers: a single strongly outlying value can "pull" the mean up or down from where it would otherwise be. By comparison, percentiles (including the median) are relatively robust to outliers; that is, they generally change very little when outliers are added to or removed from the data.
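One way to check this on real data is to request the mean alongside the median, quartiles, and skewness. The sketch below uses the Weight variable from the sample dataset; whether Weight is actually skewed is an assumption here, so treat the output as something to inspect rather than a foregone conclusion.

```spss
* Compare the mean with the median and quartiles, and check the skewness.
FREQUENCIES VARIABLES=Weight
  /FORMAT=NOTABLE
  /STATISTICS=MEAN MEDIAN SKEWNESS SESKEW
  /NTILES=4.
```

If the mean is noticeably larger than the median, that would suggest the distribution is skewed to the right, with a few large values pulling the mean upward.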
When reporting placement or achievement test scores, it's often more useful (and more descriptive) to report the percentiles than it is to report the means. For example, we may want to know the 80th percentile: the score that 80% of students scored below. The sample dataset has placement test scores (out of 100 points) for four subject areas: English, Reading, Math, and Writing. Let's use the Frequencies procedure to obtain the quintiles (i.e., the 20th, 40th, 60th, and 80th percentiles) of the scores.
FREQUENCIES VARIABLES=English Reading Math Writing
  /FORMAT=NOTABLE
  /NTILES=5
  /ORDER=ANALYSIS.
Only one table, Statistics, will print to the Output window. It contains the number of valid and missing values for each variable, as well as any additional statistics we requested (in this case, the quintiles).
Note that, by default, SPSS will determine how many decimal places to use for the percentiles based on the variable's number of decimal places. For this screenshot, we have shortened the output to one decimal place for readability.
The "Compare variables" option we selected (under Multiple Variables in the Format window; ORDER=ANALYSIS in the syntax) told SPSS to put the results for all four variables in a single table, side-by-side. From this, we can quickly make several observations about the data:
Suppose we are interested in getting a rough estimate of whether or not a variable is normally distributed.
In this example, we will demonstrate what a histogram with a normal overlay looks like using the variable English from the sample dataset. This variable represents the subjects' score (out of 100 points) on an English placement test.
FREQUENCIES VARIABLES=English
  /FORMAT=NOTABLE
  /HISTOGRAM NORMAL
  /ORDER=ANALYSIS.
You should see a histogram that looks like this:
If the data were perfectly normally distributed, we would expect the heights of the bars to match the normal overlay. While this graph alone is not enough to decide whether the English scores are normally distributed, it does show that the data are reasonably symmetric about the mean, with few large deviations from the normal curve.
How might this curve look for a variable that is not normally distributed? Try re-running this example using the variable Weight. Pay close attention to the tails. A normally distributed variable should have near-equal numbers of observations in the right and the left tails. Also look for any bars of the histogram that are much taller or much shorter than the normal overlay.
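The re-run suggested above might look like the following sketch. The HISTOGRAM subcommand mirrors the English example; the STATISTICS line is an addition that is not part of the original example, included only to give numeric skewness and kurtosis values to read alongside the graph.

```spss
* Histogram with a normal curve overlay for Weight, plus shape statistics.
FREQUENCIES VARIABLES=Weight
  /FORMAT=NOTABLE
  /STATISTICS=SKEWNESS SESKEW KURTOSIS SEKURT
  /HISTOGRAM NORMAL
  /ORDER=ANALYSIS.
```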