Our tutorials reference a dataset called "sample" in many examples. If you'd like to download the sample dataset to work through the examples, choose one of the files below:
The Independent Samples t Test compares the means of two independent groups in order to determine whether there is statistical evidence that the associated population means are significantly different. The Independent Samples t Test is a parametric test.
This test is also known as:
The variables used in this test are known as:
The Independent Samples t Test is commonly used to test the following:
Note: The Independent Samples t Test can only compare the means for two (and only two) groups. It cannot make comparisons among more than two groups. If you wish to compare the means across more than two groups, you will likely want to run an ANOVA.
Your data must meet the following requirements:
Note: When one or more of the assumptions for the Independent Samples t Test are not met, you may want to run the nonparametric Mann-Whitney U Test instead.
Researchers often follow several rules of thumb:
^{1} Welch, B. L. (1947). The generalization of "Student's" problem when several different population variances are involved. Biometrika, 34(1–2), 28–35.
The null hypothesis (H_{0}) and alternative hypothesis (H_{1}) of the Independent Samples t Test can be expressed in two different but equivalent ways:
H_{0}: µ_{1} = µ_{2} ("the two population means are equal")
H_{1}: µ_{1} ≠ µ_{2} ("the two population means are not equal")
OR
H_{0}: µ_{1} - µ_{2} = 0 ("the difference between the two population means is equal to 0")
H_{1}: µ_{1} - µ_{2} ≠ 0 ("the difference between the two population means is not 0")
where µ_{1} and µ_{2} are the population means for group 1 and group 2, respectively. Notice that the second set of hypotheses can be derived from the first set by simply subtracting µ_{2} from both sides of the equation.
The test statistic for an Independent Samples t Test is denoted t. However, there are different formulas for the test statistic and degrees of freedom, based on whether or not we assume that the two groups have equal variances.
SAS produces both forms of the test, so both forms of the test are described here. Note that the null and alternative hypotheses are identical for both forms of the test statistic.
There are two differences between the Pooled and Satterthwaite t tests: how the test statistic is calculated, and the degrees of freedom used to determine the significance of the test statistic. The test statistic for the Pooled t test uses pooled variances, and the degrees of freedom are n_{1}+n_{2}-2. The test statistic for the Satterthwaite t test utilizes un-pooled variances, and the degrees of freedom use a special correction formula called the Satterthwaite equation. Why does it matter? When the groups have unequal variances, that introduces uncertainty into the results; the way we take this into account is by using a more conservative estimate for the degrees of freedom and test statistic.
When the two independent samples are assumed to be drawn from populations with identical population variances (i.e., σ_{1}^{2} = σ_{2}^{2}), the test statistic t is computed as:
$$ t = \frac{\overline{x}_{1} - \overline{x}_{2}}{s_{p}\sqrt{\frac{1}{n_{1}} + \frac{1}{n_{2}}}} $$
with
$$ s_{p} = \sqrt{\frac{(n_{1} - 1)s_{1}^{2} + (n_{2} - 1)s_{2}^{2}}{n_{1} + n_{2} - 2}} $$
where
\(\bar{x}_{1}\) = Mean of first sample
\(\bar{x}_{2}\) = Mean of second sample
\(n_{1}\) = Sample size (i.e., number of observations) of first sample
\(n_{2}\) = Sample size (i.e., number of observations) of second sample
\(s_{1}\) = Standard deviation of first sample
\(s_{2}\) = Standard deviation of second sample
\(s_{p}\) = Pooled standard deviation
The calculated t value is then compared to the critical t value from the t distribution table with degrees of freedom df = n_{1} + n_{2} - 2 and the chosen confidence level. Because the alternative hypothesis is two-sided, we reject the null hypothesis if the absolute value of the calculated t value is greater than the critical t value.
Note that this form of the independent samples t test statistic assumes equal variances.
Because we assume equal population variances, it is OK to "pool" the sample variances (s_{p}). However, if this assumption is violated, the pooled variance estimate may not be accurate, which would affect the accuracy of our test statistic (and hence, the p-value).
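As a rough illustration of the pooled formulas, the following DATA step computes the pooled standard deviation, the test statistic, the two-sided critical value, and a two-sided p-value from summary statistics. The dataset name `pooled_demo` and every numeric value are hypothetical and chosen only for illustration; `TINV` and `PROBT` are the SAS functions for the quantile and the cumulative probability of the t distribution.

```sas
/* Hypothetical summary statistics; replace with your own values */
data pooled_demo;
    n1 = 20; xbar1 = 52.0; s1 = 6.0;   /* group 1: size, mean, std dev */
    n2 = 25; xbar2 = 48.0; s2 = 7.0;   /* group 2: size, mean, std dev */

    /* Pooled standard deviation */
    sp = sqrt(((n1 - 1)*s1**2 + (n2 - 1)*s2**2) / (n1 + n2 - 2));

    /* Test statistic and degrees of freedom */
    t  = (xbar1 - xbar2) / (sp * sqrt(1/n1 + 1/n2));
    df = n1 + n2 - 2;

    /* Two-sided critical value at alpha = .05 and two-sided p-value */
    t_crit  = tinv(1 - 0.05/2, df);
    p_value = 2 * (1 - probt(abs(t), df));
run;

proc print data=pooled_demo noobs;
run;
```

Rejecting the null hypothesis when `abs(t) > t_crit` is equivalent to rejecting it when `p_value < 0.05`.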
When the two independent samples are assumed to be drawn from populations with unequal variances (i.e., σ_{1}^{2} ≠ σ_{2}^{2}), the test statistic t is computed as:
$$ t = \frac{\overline{x}_{1} - \overline{x}_{2}}{\sqrt{\frac{s_{1}^{2}}{n_{1}} + \frac{s_{2}^{2}}{n_{2}}}} $$
where
\(\bar{x}_{1}\) = Mean of first sample
\(\bar{x}_{2}\) = Mean of second sample
\(n_{1}\) = Sample size (i.e., number of observations) of first sample
\(n_{2}\) = Sample size (i.e., number of observations) of second sample
\(s_{1}\) = Standard deviation of first sample
\(s_{2}\) = Standard deviation of second sample
The calculated t value is then compared to the critical t value from the t distribution table with degrees of freedom
$$ df = \frac{ \left ( \frac{s_{1}^2}{n_{1}} + \frac{s_{2}^2}{n_{2}} \right ) ^{2} }{ \frac{1}{n_{1}-1} \left ( \frac{s_{1}^2}{n_{1}} \right ) ^{2} + \frac{1}{n_{2}-1} \left ( \frac{s_{2}^2}{n_{2}} \right ) ^{2}} $$
and the chosen confidence level. If the absolute value of the calculated t value is greater than the critical t value, then we reject the null hypothesis.
Note that this form of the independent samples t test statistic does not assume equal variances. This is why both the denominator of the test statistic and the degrees of freedom of the critical value of t are different from the equal variances form of the test statistic.
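The unequal-variances formulas can be sketched the same way. In the DATA step below, the dataset name `satterthwaite_demo` and all numeric values are hypothetical; the code simply evaluates the test statistic and the Satterthwaite degrees of freedom from summary statistics. The resulting degrees of freedom are generally not a whole number, which `TINV` and `PROBT` accept.

```sas
/* Hypothetical summary statistics with visibly different spreads */
data satterthwaite_demo;
    n1 = 20; xbar1 = 52.0; s1 = 4.0;   /* group 1: size, mean, std dev */
    n2 = 25; xbar2 = 48.0; s2 = 9.0;   /* group 2: size, mean, std dev */

    /* Squared standard error of each group mean */
    v1 = s1**2 / n1;
    v2 = s2**2 / n2;

    /* Test statistic using un-pooled variances */
    t = (xbar1 - xbar2) / sqrt(v1 + v2);

    /* Satterthwaite degrees of freedom */
    df = (v1 + v2)**2 / (v1**2/(n1 - 1) + v2**2/(n2 - 1));

    /* Two-sided critical value at alpha = .05 and two-sided p-value */
    t_crit  = tinv(1 - 0.05/2, df);
    p_value = 2 * (1 - probt(abs(t), df));
run;

proc print data=satterthwaite_demo noobs;
run;
```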
Recall that the Pooled form of the Independent Samples t Test requires the assumption of homogeneity of variance -- i.e., both groups have the same population variance. SAS includes a test for the homogeneity of variance, called the Folded F Test, whenever you run an independent samples t test.
The hypotheses for the folded F test are:
H_{0}: σ_{1}^{2} = σ_{2}^{2} ("the population variances of group 1 and 2 are equal")
H_{1}: σ_{1}^{2} ≠ σ_{2}^{2} ("the population variances of group 1 and 2 are not equal")
Rejecting the null hypothesis of the Folded F Test therefore suggests that the variances of the two groups are not equal; i.e., that the homogeneity of variances assumption is violated. (Source: SAS 9.2 User's Guide, Second Edition)
You will use the results of the Folded F test to determine which output from the Independent Samples t test to rely on: Pooled or Satterthwaite. If the test indicates that the variances are equal across the two groups (i.e., p-value large), you will rely on the Pooled output when you look at the results for the Independent Samples t Test. If the test indicates that the variances are not equal across the two groups (i.e., p-value small), you will need to rely on the Satterthwaite output when you look at the results of the Independent Samples t Test.
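To make the Folded F test less of a black box, here is a minimal sketch of how the statistic could be computed by hand. It assumes the usual definition (the larger sample variance divided by the smaller one) and assumes that the two-sided p-value is twice the upper-tail probability of the F distribution. The dataset name `folded_f_demo` and the numeric values are hypothetical.

```sas
/* Hypothetical sample sizes and standard deviations */
data folded_f_demo;
    n1 = 20; s1 = 4.0;
    n2 = 25; s2 = 9.0;

    /* Put the larger sample variance in the numerator */
    if s1**2 >= s2**2 then do;
        f = s1**2 / s2**2;
        num_df = n1 - 1;
        den_df = n2 - 1;
    end;
    else do;
        f = s2**2 / s1**2;
        num_df = n2 - 1;
        den_df = n1 - 1;
    end;

    /* Assumed two-sided p-value: twice the upper tail, capped at 1 */
    p_value = min(1, 2 * (1 - probf(f, num_df, den_df)));
run;

proc print data=folded_f_demo noobs;
run;
```

A small `p_value` here points toward the Satterthwaite output; a large one points toward the Pooled output.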
Your data should include two variables (represented in columns) that will be used in the analysis. The independent variable should be categorical, and should have exactly two groups. The independent variable's type can be numeric or string, as long as there are only two categories. (Missing values do not count as a category.) The dependent variable should be continuous (i.e., interval or ratio), and must therefore be numeric. Each row of the dataset should represent a unique subject or case.
The following screenshot shows a selection of variables (not exhaustive) from the sample dataset that could be used in an Independent Samples t Test:
In this example, the variables Gender, Athlete, and State would be acceptable for use as independent variables in the Independent Samples t Test. Gender and Athlete are numeric, with data values 0 and 1. State is a string variable, with data values "In state" and "Out of state".
The variables Height, MileMinDur, and English would be acceptable for use as dependent variables in the Independent Samples t Test. Variables Height and English are both numeric. Variable MileMinDur is a duration variable, which is a special type of numeric variable in SAS. (However, if MileMinDur was read into SAS as a character variable, it will need to be converted to a duration variable before using it in the t test.)
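If the duration had been read in as text, a conversion along the following lines might work. This is only a sketch: the character variable name `MileMinDur_chr`, the new variable name `MileMinDur_num`, and the output dataset name `sample_converted` are hypothetical, and it assumes the text values look like `0:07:30`.

```sas
/* Assumes a hypothetical character variable MileMinDur_chr such as "0:07:30" */
data sample_converted;
    set sample;

    /* The TIME informat reads h:mm:ss text into a numeric count of seconds */
    MileMinDur_num = input(MileMinDur_chr, time8.);

    /* Display the numeric value as h:mm:ss */
    format MileMinDur_num time8.;
run;
```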
SAS can only make use of cases that have nonmissing values for both the independent and the dependent variables, so if a case has a missing value for either variable, it cannot be included in the test. Additionally, if you try to use a variable with more than two categories as the independent variable, SAS will return an error.
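Before running the test, it can help to check both conditions directly. The sketch below uses the variables Gender and Height from the sample dataset as an example: `PROC FREQ` with the `MISSING` option shows how many categories the grouping variable has (and how many values are missing), and `PROC MEANS` with `N NMISS` counts nonmissing and missing values of the dependent variable.

```sas
/* How many categories does the grouping variable have, and how many are missing? */
proc freq data=sample;
    tables Gender / missing;
run;

/* How many nonmissing and missing values does the dependent variable have? */
proc means data=sample n nmiss;
    var Height;
run;
```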
When conducting an Independent Samples t Test, the general form of PROC TTEST is:
PROC TTEST DATA=dataset-name ALPHA=.05;
VAR dependent-variable-name(s);
CLASS independent-variable-name(s);
RUN;
In the PROC TTEST statement, the DATA option specifies the name of your dataset. The optional ALPHA option specifies the desired significance level. By default, PROC TTEST uses ALPHA=.05 (i.e., 5% significance), but you can set it to ALPHA=.01 for 1% significance, or ALPHA=.10 for 10% significance, etc.
The VAR statement is where you specify the dependent variable(s) -- that is, the continuous numeric variable -- to use in the test. If you are specifying more than one dependent variable, simply separate the names of the variables using spaces.
The CLASS statement is where you specify the independent variable -- that is, the categorical variable -- to use in the test. You may only specify one CLASS variable at a time; if you try to specify more than one CLASS variable, the procedure will not run.
If you specify more than one VAR variable, you will get back more than one t test result. Specifically, SAS will produce t tests comparing the means of each VAR variable between the groups of the CLASS variable.
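For instance, using two of the dependent variables from the sample dataset, a call like the following would produce one set of t test results for Height and another for English, each comparing the two Gender groups:

```sas
/* Two dependent variables, one grouping variable */
proc ttest data=sample alpha=.05;
    var Height English;
    class Gender;
run;
```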
When using PROC TTEST for an independent samples t-test, your independent variable must be specified using the CLASS statement, not the BY statement. Using the BY statement will partition your data into subsets based on the BY variable and run One Sample t Tests on those subsets. Using the CLASS statement will compare the means of the CLASS variable's groups using an Independent Samples t Test.
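The two statements can also be combined when you want a separate Independent Samples t Test within each level of a third variable. The sketch below is one possible illustration using variables from the sample dataset; the output dataset name `sample_sorted` is hypothetical. Note that BY-group processing expects the data to be sorted by the BY variable first.

```sas
/* BY-group processing requires sorted data */
proc sort data=sample out=sample_sorted;
    by State;
run;

/* One independent samples t test (Gender groups) per value of State */
proc ttest data=sample_sorted alpha=.05;
    by State;
    var Height;
    class Gender;
run;
```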
In our sample dataset, students reported their writing placement test scores and their gender (male or female). Suppose we want to know if the average writing score is different for males versus females. This involves testing whether the sample means for writing scores among males and females in the sample are statistically different (and by extension, inferring whether the mean writing scores in the population differ between these two groups). You can use an Independent Samples t Test to compare the mean writing scores for males and females.
The hypotheses for this example can be expressed as:
H_{0}: µ_{males} = µ_{females} ("the mean writing score in the population of males is identical to the mean writing score in the population of females")
H_{1}: µ_{males} ≠ µ_{females} ("the two population means are not equal")
where µ_{males} and µ_{females} are the population means for males and females, respectively.
Before we perform our hypothesis tests, we should decide on a significance level (denoted α). The significance level is the threshold we will use to decide whether a test result is significant. For this example, let's use α = 0.05, or 5%.
In the sample data, we will use two variables: Gender and Writing. The variable Gender has values of either “1” or “0” which correspond to females and males, respectively. It will function as the independent variable in this t test. The variable Writing is a numeric variable, and it will function as the dependent variable. In SAS, the first few rows of data look like this (if variable and value labels have been applied):
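To view these rows yourself, something like the following could be used. It is a sketch only: the format name `genderfmt` is hypothetical, and the value labels follow the coding described above (0 = male, 1 = female).

```sas
/* Hypothetical value labels for the 0/1 Gender codes */
proc format;
    value genderfmt 0 = 'Male'
                    1 = 'Female';
run;

/* Print the first five rows with the labels applied */
proc print data=sample (obs=5);
    var Gender Writing;
    format Gender genderfmt.;
run;
```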
Recall that the Independent Samples t Test has several assumptions that we must take into account:
So before we jump into the Independent Samples t Test, it is a good idea to look at descriptive statistics and graphs to get an idea of what to expect, and to see if the assumptions of the test have been reasonably met. To do this, we'll want to look at the means and standard deviations of Writing for males and females, as well as graphs that compare the distribution of Writing for males versus females. PROC TTEST automatically runs descriptive statistics and graphs for us, but we can also use PROC MEANS to produce descriptive statistics by group:
PROC MEANS DATA=sample;
VAR Writing;
CLASS Gender;
RUN;
PROC MEANS tells us several important things. First, there were 204 males and 222 females in the dataset, but only 191 males and 204 females reported a writing score. (This is important to know, because PROC TTEST can only use cases with nonmissing values for both gender and writing score. So our effective sample size for this test is 191+204 = 395, which is less than the total number of rows in the sample dataset (435).) Second, the mean writing score for males is 77.14 points, while the mean writing score for females is 81.73 points. This is a difference of more than four points. Third, the standard deviations for males’ and females’ writing scores are very similar: 4.88 for males, and 5.09 for females.
For graphs, we can use the two graphs that PROC TTEST produces for an independent samples t test:
The first graph contains histograms (top 2 panels) and boxplots (bottom panel) comparing the distributions of males' writing scores and females' writing scores. From the histograms, we can see that the distributions of writing scores for both the males and the females are roughly symmetric, but the distribution of females' writing scores is "shifted" slightly to the right of the males'. From the boxplots, we can see that the total length of the boxplots and the inter-quartile range (distance between the 1st and 3rd quartiles, i.e. the edges of the boxes) are similar for males and females. This is what we would expect to see if the two groups had the same variance. By contrast, when we look at the center lines in the boxplots (which represent the median score), we see that they do not line up: the center line for the females' boxplot is to the right of the center line for the males' boxplot. Additionally, the diamond shape in each boxplot represents the mean score; we see that the mean score for the females is to the right of the mean score for the males. If the two groups had the same mean, we would expect these center lines and/or diamonds to "line up" vertically.
The second graph contains Q-Q plots of the writing scores for males (left panel) versus females (right panel). The Q-Q plots produced by PROC TTEST can be used to check if a variable's observed values are consistent with what we would expect them to be if the variable were truly normally distributed. To read a Q-Q plot, we look to see if the dots (the observed values) match up with the expected values for a normal distribution (the diagonal line). If the points fall along the line, then the values are consistent with a normal distribution. In this case, we see that the values in the middle of the range are consistent with a normal distribution, for both males and females. Both groups deviate from normality only slightly, and only in the tails, so the normality assumption required for the independent samples t test appears to be reasonably satisfied.
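If you would like a second look at normality outside of PROC TTEST, one option is PROC UNIVARIATE. The sketch below is a supplementary check that the analysis described here does not itself use: the `NORMAL` option requests formal tests of normality for each group, and the `QQPLOT` statement draws a normal Q-Q plot per group with a reference line based on the estimated mean and standard deviation.

```sas
/* Normality tests and Q-Q plots of Writing, separately for each Gender group */
proc univariate data=sample normal;
    class Gender;
    var Writing;
    qqplot Writing / normal(mu=est sigma=est);
run;
```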
PROC TTEST DATA=work.sample ALPHA=.05;
VAR Writing;
CLASS Gender;
RUN;
Four tables appear in the PROC TTEST output; let's go through each one in the order they appear.
The first table contains descriptive statistics for both groups, including the valid sample size (n), mean, standard deviation, standard error (s/sqrt(n)), minimum, and maximum. Much of this we already saw in the PROC MEANS output, but this table also contains the computed difference between the two means. In this case, the first mean (males) was 4.5961 points lower than the second mean (females). In plain English, this means that, on average, females scored over 4 points higher on their writing placement test than males. Keep in mind that the independent samples t test is testing whether or not this difference is statistically different from zero.
The second table contains confidence limits for the group means, confidence limits for the group standard deviations, and confidence limits for the difference in the means (which is what we're interested in). Notice that there are two different confidence interval formulas for the difference. The first, Pooled, assumes that both groups have the same variance in writing scores. The second, Satterthwaite, does not make this assumption (i.e., it takes into account that one group has a different variance in writing scores than the other). We know from our exploratory data analysis that males and females have similar standard deviations, so we should look at the Pooled confidence interval. The 95% confidence interval for the difference in the writing scores is (-5.5846, -3.6076).
The third table contains the actual t-test results, and the fourth table contains the "Equality of Variances" test results:
Previously, we had used informal methods (descriptive statistics and graphs) to check if the two groups had the same variance in writing scores. However, we can do a "formal" hypothesis test to check if the two variances are equal, using the Folded F test in the “Equality of Variances” table. This can help us decide whether we should use the Pooled or Satterthwaite result. The null hypothesis of the Folded F test is that the variances are equal; the alternative is that the variances are not equal. Because the p-value is greater than alpha (.05), we fail to reject the null hypothesis; that is, we have no evidence that the variance of writing scores differs between these two groups. Therefore, we will use the Pooled version of the independent samples t test.
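If you want the t test and Folded F results as datasets rather than only as printed tables, ODS can capture them. The sketch below assumes the ODS table names `TTests` and `Equality` for these two PROC TTEST tables; the output dataset names `tt_results` and `eq_results` are hypothetical.

```sas
/* Capture the t test table and the Equality of Variances table as datasets */
ods output TTests=work.tt_results Equality=work.eq_results;

proc ttest data=sample alpha=.05;
    var Writing;
    class Gender;
run;

/* Inspect the captured tables */
proc print data=work.eq_results noobs;
run;

proc print data=work.tt_results noobs;
run;
```

Having both tables as datasets makes it easier to compare the Folded F p-value with alpha and then read off the matching Pooled or Satterthwaite row.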
Going back to the third table, we see that there are two versions of the t test: Pooled (which assumes equal variances) and Satterthwaite (which does not assume equal variances). The columns of the table, from left to right, are:
Based on the Folded F test, we decided to use the Pooled version of the test statistic. To determine if the result is significant or not, we compare the Pooled p-value (p < .0001) against our chosen significance level α = 0.05. Since the p-value is smaller than α, we reject the null hypothesis, and conclude that males and females had a statistically significant difference in their average writing scores.
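As a sanity check, the Pooled result can be approximately reproduced from the rounded descriptive statistics reported earlier (group sizes 191 and 204, standard deviations 4.88 and 5.09, and a mean difference of -4.5961). The DATA step below is a sketch with a hypothetical dataset name, `writing_check`. Because the inputs are rounded, the computed values should be close to, but not exactly the same as, the PROC TTEST output.

```sas
/* Recompute the Pooled t test from rounded summary statistics */
data writing_check;
    n_m = 191; s_m = 4.88;   /* males: valid n and std dev */
    n_f = 204; s_f = 5.09;   /* females: valid n and std dev */
    diff = -4.5961;          /* male mean minus female mean */

    /* Pooled standard deviation and standard error of the difference */
    sp = sqrt(((n_m - 1)*s_m**2 + (n_f - 1)*s_f**2) / (n_m + n_f - 2));
    se = sp * sqrt(1/n_m + 1/n_f);

    /* Test statistic, degrees of freedom, and two-sided p-value */
    df = n_m + n_f - 2;
    t  = diff / se;
    p_value = 2 * (1 - probt(abs(t), df));

    /* 95% confidence limits for the difference in means */
    lower = diff - tinv(0.975, df) * se;
    upper = diff + tinv(0.975, df) * se;
run;

proc print data=writing_check noobs;
run;
```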
Based on the results, we can state the following:
In our sample dataset, students reported their typical time to run a mile, and whether or not they were an athlete. Suppose we want to know if the average time to run a mile is different for athletes versus non-athletes. This involves testing whether the sample means for mile time among athletes and non-athletes in your sample are statistically different (and by extension, inferring whether the means for mile times in the population are significantly different between these two groups). You can use an Independent Samples t Test to compare the mean mile time for athletes and non-athletes.
The hypotheses for this example can be expressed as:
H_{0}: µ_{athlete} = µ_{non-athlete} ("the mean mile time in the population of athletes is identical to the mean mile time in the population of non-athletes")
H_{1}: µ_{athlete} ≠ µ_{non-athlete} ("the two population means are not equal")
where µ_{athlete} and µ_{non-athlete} are the population means for athletes and non-athletes, respectively.
Additionally, we should decide on a significance level (typically denoted using the Greek letter alpha, α) before we perform our hypothesis tests. The significance level is the threshold we use to decide whether a test result is significant. For this example, let's use α = 0.05.
In the sample data, we will use two variables: Athlete and MileMinDur. The variable Athlete has values of either “1” or “0” which correspond to athletes and non-athletes, respectively. It will function as the independent variable in this t test. The variable MileMinDur is a numeric duration variable (h:mm:ss), and it will function as the dependent variable. In SAS, the first few rows of data look like this (if variable and value labels have been applied):
As before, we will look at descriptive statistics and graphs to get an idea of the differences in the groups' distributions, means, and variances. This time, we will compare the means and standard deviations of MileMinDur for the athletes and non-athletes, as well as graphs that compare the distribution of MileMinDur for athletes versus non-athletes.
PROC MEANS DATA=sample MAXDEC=1;
VAR MileMinDur;
CLASS Athlete;
RUN;
From this table, we can see that:
Looking at the graphs from the PROC TTEST output:
The top two panels in the histogram show the distribution of the mile run times for athletes and non-athletes, respectively. We can see that the distributions of mile times for both the athletes and the non-athletes are roughly symmetric, but the data range for the non-athletes is much larger. In the bottom panel, we see comparative boxplots of the same data. From the boxplots, we can see that the total length of the boxplots and the inter-quartile range (distance between the 1st and 3rd quartiles, i.e. the edges of the boxes) are much larger for non-athletes than for athletes. If the variances of these two groups were indeed equal, we would expect the total length of the boxplots to be about the same for both groups. However, from this boxplot, it is clear that the spread of observations for non-athletes is much greater than the spread of observations for athletes. Already, we can estimate that the variances for these two groups are quite different. (We can confirm this later using the Folded F test in the PROC TTEST output.)
The second graph contains Q-Q plots of the mile run times for non-athletes and athletes. As before, we check to see if the data values (the dots) match up to the values expected from a normal distribution (the line). In this case, we see that the values in the middle of the range are consistent with a normal distribution, for both athletes and non-athletes. Both groups have slight deviations from normality in the tails.
PROC TTEST DATA=work.sample ALPHA=.05;
VAR MileMinDur;
CLASS Athlete;
RUN;
Four tables appear in the PROC TTEST output.
The first table contains descriptive statistics for both groups, including the valid sample size (n), mean, standard deviation, standard error (s/sqrt(n)), minimum, and maximum. Much of this we already saw in the PROC MEANS output, but this table also contains the computed difference between the two means. In this case, the first mean (non-athletes) was 134.8 seconds larger than the second mean (athletes). In plain English, this means that, on average, it took the non-athletes about 2 minutes and 14.8 seconds longer than the athletes to complete their mile run. Keep in mind that the independent samples t test is testing whether or not this difference is statistically different from zero.
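The conversion from seconds to minutes and seconds can be done in a short DATA step. The sketch below uses the difference reported above; the dataset and variable names are hypothetical. Because SAS stores durations as seconds, applying a TIME format to the same number is an alternative way to display it.

```sas
/* Express a difference in seconds as minutes plus seconds */
data diff_demo;
    diff_seconds = 134.8;

    minutes = floor(diff_seconds / 60);   /* whole minutes */
    seconds = mod(diff_seconds, 60);      /* remaining seconds */

    /* Same value displayed with a TIME format (h:mm:ss.s) */
    diff_time = diff_seconds;
    format diff_time time10.1;
run;

proc print data=diff_demo noobs;
run;
```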
The second table contains confidence limits for the group means, confidence limits for the group standard deviations, and confidence limits for the difference in the means. We specifically want to focus on the confidence intervals for the difference in the means. Notice that there are two different confidence interval formulas for the difference. The first, Pooled, assumes that both groups have the same variance in mile run time. The second, Satterthwaite, does not make this assumption (i.e., it takes into account that one group has a different variance in mile run time than the other). We know from our exploratory data analysis that the athletes and non-athletes have different variances, so we should look at the Satterthwaite confidence interval. The 95% confidence interval for the difference in the mile run times is (117.2, 152.4).
Tables 3 and 4 contain the independent samples t test and Folded F test, respectively. This time, we had ample graphical evidence of unequal variances between the groups, so we can use the Folded F test to see if the difference in the variances is significant. Recall that the null hypothesis of this test is that the variances are equal; the alternative is that the variances are not equal. Because the p-value is less than alpha (.05), we reject the null hypothesis, and conclude that the variance of the mile run times is different for these two groups. Because of this, we will use the Satterthwaite version of the test.
Going back to table 3, we now compare the Satterthwaite t-test's p-value (p < .0001) against our chosen significance level α = 0.05. Since the p-value is smaller than α, we reject the null hypothesis, and conclude that the mean mile time for athletes and non-athletes is significantly different.
Based on the results, we can state the following: