Our tutorials reference a dataset called "sample" in many examples. If you'd like to download the sample dataset to work through the examples, choose one of the files below:
The Paired Samples t Test compares the means of two measurements taken from the same individual, object, or related units. These "paired" measurements can represent things like:
The purpose of the test is to determine whether there is statistical evidence that the mean difference between paired observations is significantly different from zero. The Paired Samples t Test is a parametric test.
This test is also known as:
The variable used in this test is known as:
The Paired Samples t Test is commonly used to test the following:
Note: The Paired Samples t Test can only compare the means for two (and only two) related (paired) units on a continuous outcome that is normally distributed. The Paired Samples t Test is not appropriate for analyses involving the following: 1) unpaired data; 2) comparisons between more than two units/groups; 3) a continuous outcome that is not normally distributed; and 4) an ordinal/ranked outcome.
Your data must meet the following requirements:
Note: When testing assumptions related to normality and outliers, you must use a variable that represents the difference between the paired values - not the original variables themselves.
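As a minimal sketch of this check, the difference variable can be created in a DATA step and then inspected with PROC UNIVARIATE. The dataset name `mydata` and the variable names `V1` and `V2` are placeholders (assumptions, not names from the sample dataset); substitute your own.

```sas
/* Create the difference variable.
   mydata, V1, and V2 are placeholder names used for illustration. */
DATA mydata_diff;
    SET mydata;
    diff = V1 - V2;
RUN;

/* Inspect the differences (not the original variables) for normality and outliers */
PROC UNIVARIATE DATA=mydata_diff NORMAL;
    VAR diff;
    HISTOGRAM diff / NORMAL;                 /* histogram with fitted normal curve */
    QQPLOT diff / NORMAL(MU=EST SIGMA=EST);  /* Q-Q plot against a normal distribution */
RUN;
```

The NORMAL option on the PROC UNIVARIATE statement requests tests for normality, and the default output lists the extreme observations, which can help when looking for outliers.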
Note: When one or more of the assumptions for the Paired Samples t Test are not met, you may want to run the nonparametric Wilcoxon Signed-Ranks Test instead.
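One way to obtain the signed-rank test in SAS is to run PROC UNIVARIATE on the difference variable, since its "Tests for Location" table reports a signed-rank test alongside the t test. The sketch below again assumes placeholder names (`mydata`, `V1`, `V2`).

```sas
/* mydata, V1, and V2 are placeholder names used for illustration */
DATA mydata_diff;
    SET mydata;
    diff = V1 - V2;
RUN;

/* Show only the table containing the Student's t, Sign, and Signed Rank tests */
ODS SELECT TestsForLocation;
PROC UNIVARIATE DATA=mydata_diff MU0=0;
    VAR diff;
RUN;
```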
The hypotheses can be expressed in two different ways that express the same idea and are mathematically equivalent:
H_{0}: µ_{1} = µ_{2} ("the paired population means are equal")
H_{1}: µ_{1} ≠ µ_{2} ("the paired population means are not equal")
OR
H_{0}: µ_{1} - µ_{2} = 0 ("the difference between the paired population means is equal to 0")
H_{1}: µ_{1} - µ_{2} ≠ 0 ("the difference between the paired population means is not 0")
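where µ_{1} is the population mean of the first measurement and µ_{2} is the population mean of the second measurement.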
The test statistic for the Paired Samples t Test, denoted t, follows the same formula as the one sample t test.
$$ t = \frac{\overline{x}_{\mathrm{diff}}-0}{s_{\overline{x}}} $$
where
$$ s_{\overline{x}} = \frac{s_{\mathrm{diff}}}{\sqrt{n}} $$
where
\(\bar{x}_{\mathrm{diff}}\) = Sample mean of the differences
\(n\) = Sample size (i.e., number of observations)
\(s_{\mathrm{diff}}\)= Sample standard deviation of the differences
\(s_{\bar{x}}\) = Estimated standard error of the mean difference (\(s_{\mathrm{diff}}/\sqrt{n}\))
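As a worked illustration with made-up numbers (not taken from the sample dataset), suppose \(n = 25\) pairs give \(\bar{x}_{\mathrm{diff}} = 2.5\) and \(s_{\mathrm{diff}} = 5\). Then:

$$ s_{\overline{x}} = \frac{5}{\sqrt{25}} = 1, \qquad t = \frac{2.5 - 0}{1} = 2.5, \qquad df = 25 - 1 = 24 $$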
The calculated t value is then compared to the critical t value with df = n - 1 from the t distribution table for the chosen significance level. For the two-sided hypotheses above, if the absolute value of the calculated t is greater than the critical t value, then we reject the null hypothesis (and conclude that the means are significantly different).
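Instead of a printed table, the critical value can be looked up in SAS itself. The sketch below uses hypothetical values (25 pairs and a calculated t of 2.5) purely for illustration; it writes the two-sided critical value, the two-sided p-value, and the reject decision (1 = reject) to the SAS log.

```sas
DATA _NULL_;
    alpha  = 0.05;   /* chosen significance level */
    n      = 25;     /* hypothetical number of pairs */
    t_calc = 2.5;    /* hypothetical calculated t value */
    df     = n - 1;

    t_crit = QUANTILE('T', 1 - alpha/2, df);    /* two-sided critical t value */
    p_val  = 2 * (1 - PROBT(ABS(t_calc), df));  /* two-sided p-value */
    reject = (ABS(t_calc) > t_crit);            /* 1 if we reject H0, 0 otherwise */

    PUT t_crit= p_val= reject=;
RUN;
```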
Your data should include two continuous numeric variables (represented in columns) that will be used in the analysis. The two variables should represent the paired variables for each subject (row). If your data are arranged differently (e.g., cases represent repeated units/subjects), simply restructure the data to reflect this format.
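If your data are in "long" format (one row per subject per measurement), PROC TRANSPOSE is one way to restructure them. The sketch below assumes a hypothetical dataset `long_scores` with variables `id` (subject), `time` (coded 1 or 2), and `score`; none of these names come from the sample dataset.

```sas
/* long_scores, id, time, and score are hypothetical names used for illustration */
PROC SORT DATA=long_scores;
    BY id;
RUN;

/* One row per subject, with columns score_1 and score_2 */
PROC TRANSPOSE DATA=long_scores OUT=wide_scores(DROP=_NAME_) PREFIX=score_;
    BY id;
    ID time;
    VAR score;
RUN;
```

The resulting `wide_scores` dataset has one row per subject, so `score_1` and `score_2` can be used as the paired variables.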
When conducting a Paired Samples t Test, the general form of PROC TTEST is:
PROC TTEST DATA=dataset-name ALPHA=.05;
PAIRED V1*V2;
RUN;
In the PROC TTEST statement, the DATA option specifies the name of your dataset. The optional ALPHA option specifies the desired significance level. By default, PROC TTEST uses ALPHA=.05 (i.e., 5% significance), but you can set it to ALPHA=.01 for 1% significance, or ALPHA=.10 for 10% significance, etc.
The PAIRED statement is where you specify the pair(s) of variables you will test, using an asterisk between the variable names to denote a pair. You can specify multiple pairings by separating each pairing with a space.
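For instance, a statement with two pairings might look like the following, where `V3` and `V4` are additional placeholder variable names in the same style as the general form:

```sas
PROC TTEST DATA=dataset-name ALPHA=.05;
    PAIRED V1*V2 V3*V4;   /* two separate paired tests: V1 vs. V2, and V3 vs. V4 */
RUN;
```

Each pairing is analyzed as its own paired test and gets its own set of output.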
The sample dataset has placement test scores (out of 100 points) for four subject areas: English, Reading, Math, and Writing. Each student took all four placement tests. Suppose we are particularly interested in the English and Math sections, and want to determine whether English or Math had higher test scores on average. We could use a paired t test to test if there was a significant difference in the average of the two tests.
The hypotheses for this example can be expressed as:
H_{0}: µ_{English} - µ_{Math} = 0 ("the difference between the average English and Math scores is equal to 0")
H_{1}: µ_{English} - µ_{Math} ≠ 0 ("the difference between the average English and Math scores is not 0")
Before we perform our hypothesis tests, we should decide on a significance level (denoted α). The significance level is the threshold we will use to decide whether a test result is significant. For this example, let's use α = 0.05, or 5%.
In the sample dataset, each student's responses are recorded on one row. Their English and Math scores are represented in the variables English and Math. This format is already appropriate for the paired samples t-test, so no further restructuring is needed.
PROC TTEST DATA=sample ALPHA=.05;
PAIRED English*Math;
RUN;
After executing the SAS program above, SAS produces the following set of tables:
The heading "Difference: English - Math" tells us the order of the subtraction used for these numbers. This is important, since it determines how we interpret positive and negative numbers. Because Math is subtracted from English, positive numbers correspond to higher English scores, and negative numbers correspond to higher Math scores.
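The order of subtraction follows the order of the variables in the PAIRED statement. As a sketch, reversing the pair computes Math - English instead:

```sas
PROC TTEST DATA=sample ALPHA=.05;
    PAIRED Math*English;   /* difference is now Math - English */
RUN;
```

This flips the sign of the mean difference, the t statistic, and the confidence limits for the mean, but the two-sided p-value and the conclusion are unchanged.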
In the first table, we have descriptive statistics for the difference scores:
In the second table, we have the 95% confidence intervals for the mean difference and the standard deviation of the differences.
In the third table, we have the actual paired t test results. The p-value is very small (p < .0001), so we reject the null hypothesis that the average English and Math scores were the same, and conclude that the English scores had a significantly different average than the Math scores.
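Because the paired test uses the same formula as a one sample t test on the differences, the result can be cross-checked by computing the difference yourself. The sketch below assumes a new dataset name, `sample_diff`, and a new variable, `diff`, both introduced here for illustration.

```sas
/* Compute the English - Math difference for each student */
DATA sample_diff;
    SET sample;
    diff = English - Math;
RUN;

/* One sample t test of H0: mean difference = 0 */
PROC TTEST DATA=sample_diff H0=0 ALPHA=.05;
    VAR diff;
RUN;
```

The t value, degrees of freedom, and p-value should agree with those from the PAIRED statement, since a student missing either score has a missing difference and is excluded in both approaches.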
The first graph depicts the distribution of the difference scores, using both a histogram (top panel) and a boxplot (bottom panel).
If this histogram were centered about 0, it would correspond to no difference between the two test scores; however, the highlighted region in the boxplot shows that the center of the distribution is between 17 and 18.
The second graph is a paired profile plot. Profile plots depict the "trajectory" of individuals. Each line represents one subject or case. On the left side is the person's English score; on the right side is their Math score. (Note that the axes on both sides have the same range; this is an important feature of profile plots.) By looking at the slope of these lines, we can get a feel for whether the scores are approximately equal (horizontal lines), or if one score was higher than the other (sloping lines).
Although some of the lines are roughly horizontal, most of the lines tend to have a downward slope. Since English is on the left and Math is on the right, this corresponds to most students scoring higher on the English placement test than the Math placement test. The red line showing the average trend confirms this.
The third graph shows the "agreement" of the two scores. The plot itself is a variation on a simple scatterplot. The diagonal reference line represents identical English and Math scores. Datapoints that fall on this line (or near this line) represent students who scored the same on their English and Math tests. Datapoints above the line represent students who scored higher on the Math test than the English test. Points below the line represent students who scored higher on English than Math. In this graph, we see many more points below the line than above the line, which means that most students scored higher on English than on Math.
The fourth graph is a Q-Q plot, or quantile-quantile plot, of the difference scores. Q-Q Plots are used to inspect whether an observed variable (represented as points) matches what we would expect that variable to look like if it were truly normally distributed (represented as a solid line). To read a Q-Q plot, we look to see if the dots (the observed values) match up with the expected values for a normal distribution (the diagonal line). If the points fall along the line, then the values are consistent with what we would expect them to be if the data were truly normally distributed. Here, we see that the majority of the difference scores fall on the diagonal line, so we can say that the data appear to be approximately normally distributed.
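If you only need some of these graphs, the PLOTS option on the PROC TTEST statement can restrict the output. The following is a sketch that assumes ODS Graphics is available in your SAS installation; check the plot keywords against the PROC TTEST documentation for your SAS version.

```sas
ODS GRAPHICS ON;

/* Request only the histogram/boxplot summary panel and the Q-Q plot */
PROC TTEST DATA=sample ALPHA=.05 PLOTS(ONLY)=(SUMMARY QQPLOT);
    PAIRED English*Math;
RUN;
```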
Since the p-value (p < .0001) is less than our chosen significance level α = 0.05, we can reject the null hypothesis, and conclude that the English and Math scores were significantly different from each other.
Based on the results, we can state the following: