SPSS Tutorials: Exploring Data

View our tutorials for exploring data using descriptive statistical and graphical methods in SPSS.

Introduction

Before doing any kind of statistical testing or model building, you should always examine your data using summary statistics and graphs. This process is called exploratory data analysis, and it's a crucial part of every research project. Exploratory data analysis is about "getting to know" your data: which values are typical and which are unusual, where the data are centered, how spread out they are, and what their extremes are. More importantly, it's an opportunity to identify and correct any problems in your data that would affect the conclusions you draw from your analysis.

How do we "get to know" our data? The answer is different depending on whether our variables are numeric or categorical. In this section, we'll demonstrate which statistics and SPSS procedures to use for both types of data.

Part 1: Descriptive Statistics for Continuous Variables

When summarizing a quantitative (continuous/interval/ratio) variable, we are typically interested in things like:

  • How many observations were there? How many cases had missing values? (N valid; N missing)
  • Where is the "center" of the data? (Mean, median)
  • Where are the "benchmarks" of the data? (Quartiles, percentiles)
  • How spread out is the data? (Standard deviation/variance)
  • What are the extremes of the data? (Minimum, maximum; Outliers)
  • What is the "shape" of the distribution? Is it symmetric or asymmetric? Are the values mostly clustered about the mean, or are there many values in the "tails" of the distribution? (Skewness, kurtosis)
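To make the quantities in this list concrete, here is a minimal sketch that computes each of them outside SPSS, using only Python's standard library. The `heights` values are invented for illustration and are not taken from the sample dataset. The skewness and kurtosis lines use simple moment-based formulas, and quartile conventions vary between programs, so SPSS may report somewhat different values for those statistics.

```python
import statistics

# Hypothetical heights in inches (made up for illustration); None marks a missing value.
heights = [62.0, 64.5, 65.0, 66.0, 67.5, 68.0, 70.0, 71.5, None, 78.0]

# N valid and N missing
valid = [x for x in heights if x is not None]
n_valid = len(valid)
n_missing = len(heights) - n_valid

# Center
mean = statistics.mean(valid)
median = statistics.median(valid)

# Benchmarks: the three quartile cut points (requires Python 3.8+)
q1, q2, q3 = statistics.quantiles(valid, n=4)

# Spread: sample standard deviation and variance (n - 1 in the denominator)
sd = statistics.stdev(valid)
variance = statistics.variance(valid)

# Extremes
minimum, maximum = min(valid), max(valid)

# Shape: moment-based skewness and excess kurtosis (no small-sample adjustment)
m2 = sum((x - mean) ** 2 for x in valid) / n_valid
m3 = sum((x - mean) ** 3 for x in valid) / n_valid
m4 = sum((x - mean) ** 4 for x in valid) / n_valid
skewness = m3 / m2 ** 1.5
excess_kurtosis = m4 / m2 ** 2 - 3

print(f"N valid = {n_valid}, N missing = {n_missing}")
print(f"Mean = {mean:.2f}, Median = {median:.2f}")
print(f"Quartiles = {q1:.2f}, {q2:.2f}, {q3:.2f}")
print(f"SD = {sd:.2f}, Variance = {variance:.2f}")
print(f"Min = {minimum}, Max = {maximum}")
print(f"Skewness = {skewness:.2f}, Excess kurtosis = {excess_kurtosis:.2f}")
```

In this invented sample the single large value (78.0) pulls the mean above the median and produces a positive skewness, which is the kind of pattern these summaries are meant to reveal.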

In Part 1, we discuss how to explore quantitative data using the Descriptives, Compare Means, Explore, and Frequencies procedures. Each procedure has different strengths for summarizing continuous variables: Descriptives and Frequencies provide summary statistics for the entire sample, while Explore and Compare Means can also produce descriptive statistics for subsets of the sample.
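The difference between summarizing an entire sample and summarizing subsets of it can be sketched in a few lines of Python. This is only an illustration of the idea, not of how SPSS computes its output. The `records` data and the group labels are invented for the example.

```python
import statistics
from collections import defaultdict

# Hypothetical (group, height) pairs, made up for illustration.
records = [
    ("Male", 70.0), ("Female", 64.0), ("Male", 72.0),
    ("Female", 66.0), ("Female", 65.0), ("Male", 68.0),
]

# Whole-sample summary: one mean and standard deviation for all cases.
all_values = [value for _, value in records]
overall_mean = statistics.mean(all_values)
overall_sd = statistics.stdev(all_values)

# Subset summaries: split the cases by group, then summarize each group.
by_group = defaultdict(list)
for group, value in records:
    by_group[group].append(value)

group_summary = {
    group: (len(values), statistics.mean(values), statistics.stdev(values))
    for group, values in by_group.items()
}

print(f"Overall: N = {len(all_values)}, Mean = {overall_mean:.2f}, SD = {overall_sd:.2f}")
for group, (n, mean, sd) in group_summary.items():
    print(f"{group}: N = {n}, Mean = {mean:.2f}, SD = {sd:.2f}")
```

A single overall mean can hide differences between groups, which is why a by-group summary is often worth producing alongside it.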

Part 2: Descriptive Statistics for Categorical Variables

When summarizing qualitative (nominal or ordinal) variables, we are typically interested in things like:

  • How many cases were in each category? (Counts)
  • What proportion of the cases were in each category? (Percentage, valid percent, cumulative percent)
  • What was the most frequently occurring category (i.e., the category with the most observations)? (Mode)
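The quantities in this list can be illustrated with a small frequency table built in Python. This is a minimal sketch under stated assumptions: the `ranks` responses and the category order are invented for the example. Percent is taken over all cases, including missing ones, while valid percent and cumulative percent are taken over non-missing cases only.

```python
from collections import Counter

# Hypothetical class-rank responses (made up for illustration); None marks a missing value.
ranks = ["Freshman", "Sophomore", "Freshman", None, "Junior",
         "Freshman", "Senior", "Sophomore", None, "Junior"]

valid = [r for r in ranks if r is not None]
counts = Counter(valid)
n_total = len(ranks)   # all cases, including missing
n_valid = len(valid)   # non-missing cases only

# Assumed display order for the ordinal categories.
order = ["Freshman", "Sophomore", "Junior", "Senior"]

rows = []
cumulative = 0.0
for category in order:
    count = counts[category]
    percent = 100 * count / n_total        # share of all cases
    valid_percent = 100 * count / n_valid  # share of non-missing cases
    cumulative += valid_percent            # running total of valid percent
    rows.append((category, count, percent, valid_percent, cumulative))

# Mode: the category with the most observations.
mode = counts.most_common(1)[0][0]

print(f"{'Category':<10} {'Count':>5} {'Percent':>8} {'Valid %':>8} {'Cum. %':>8}")
for category, count, percent, valid_percent, cum in rows:
    print(f"{category:<10} {count:>5} {percent:>8.1f} {valid_percent:>8.1f} {cum:>8.1f}")
print(f"Missing: {n_total - n_valid}, Mode: {mode}")
```

When there are no missing values, percent and valid percent are identical; they diverge only when some cases are missing, as in this invented example.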

In Part 2, we describe how to obtain descriptive statistics for categorical variables using the Frequencies and Crosstabs procedures.

Sample Data Files

Our tutorials reference a dataset called "sample" in many examples. If you'd like to download the sample dataset to work through the examples, choose one of the files below: