First, a little history about this curious name. William Gosset (1876-1937) was a Guinness Brewery employee who needed a distribution that could be used with small samples. Since the Irish brewery did not allow publication of research results, he published under the pseudonym Student. We know that the means of large samples are approximately normally distributed. What Gosset showed was that the means of small samples taken from an essentially normal population have a distribution characterized by the sample size. The population does not have to be exactly normal, only unimodal and basically symmetric. This is often characterized as heap-shaped or mound-shaped.
Following are the important properties of the Student t distribution.
To use the Student t distribution, which is often referred to simply as the t distribution, the first step is to calculate a t-score. This is much like finding the z-score. The formula is:
| $t = \dfrac{\bar{x} - \mu}{s / \sqrt{n}}$ |
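As a rough illustration of the t-score calculation, here is a minimal Python sketch. It assumes the usual one-sample form, (sample mean minus hypothesized population mean) divided by (sample standard deviation over the square root of n). The function name `t_score` and the numbers in the example are hypothetical, chosen only to make the arithmetic easy to follow.

```python
import math

def t_score(xbar, mu, s, n):
    """One-sample t-score: (xbar - mu) / (s / sqrt(n)).

    xbar: sample mean, mu: hypothesized population mean,
    s: sample standard deviation, n: sample size.
    """
    standard_error = s / math.sqrt(n)
    return (xbar - mu) / standard_error

# Hypothetical numbers: mean 53 from 9 observations, s = 4.5, claimed mu = 50.
t = t_score(53, 50, 4.5, 9)
print(t)  # 2.0, with n - 1 = 8 degrees of freedom
```

The only difference from a z-score is that s, the sample standard deviation, stands in for the unknown population standard deviation.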
Actually, since the population mean is likely also unknown, the t-score is more often looked up from the desired level of confidence and the degrees of freedom, and then used to estimate the population mean. Degrees of freedom is a fairly technical term which permeates all of inferential statistics. In this case, it is n-1.
| In general, the degrees of freedom is the number of values that can vary after certain restrictions have been imposed on all values. |
Where does the term degrees of freedom come from? Suppose, for example, that you have a phone bill from Ameritech that says your household owes $100. Your mother and father state that $70 of it is theirs and that your younger sibling owes only $5. How much does that leave you? It must be $25. Here, n=3 (parents, sibling, you), but once you have the total (or mean) and two more pieces of information, the last data element is constrained. The same is true with the degrees of freedom: you can arbitrarily choose any n-1 data points, but the last one will be determined for a given mean. Another example is 10 tests that averaged 55. If you assign nine people random grades, the last test score is not random, but constrained by the overall mean. Thus for 10 tests and a mean, there are nine degrees of freedom.
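The two examples above can be sketched in a few lines of Python. The helper name `last_value` and the nine test scores are made up for illustration; the point is only that once the total (or mean) is fixed, the final value is not free to vary.

```python
def last_value(total, known_values):
    """Return the one value that is forced once the total and the
    other n - 1 values are known."""
    return total - sum(known_values)

# Phone bill: $100 total, parents owe $70, sibling owes $5.
your_share = last_value(100, [70, 5])
print(your_share)  # 25

# Ten tests averaging 55: nine scores chosen freely (hypothetical values),
# so the tenth is determined by the required total of 10 * 55 = 550.
nine_scores = [60, 45, 70, 52, 38, 66, 59, 41, 73]
tenth = last_value(10 * 55, nine_scores)
print(tenth)  # 46
```

Changing any of the nine free scores changes the tenth, but the tenth can never be chosen independently. That is what n-1 degrees of freedom means here.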
If the interval calls for a 90% confidence level, then alpha = 0.10 and alpha/2 = 0.05 (the area in each of the two tails). Tables of t values typically have a column for degrees of freedom and then columns of t values corresponding to various tail areas. An abbreviated table is given below. For a complete set of values consult a larger table or your TI-83+ graphing calculator. DISTR 5 gives tcdf. tcdf expects three arguments: lower t value, upper t value, and degrees of freedom. Since no inverse t function is given on the calculator, some guessing may be involved. Note how tcdf(9.9,9E99,2) returns about 0.005, which indicates a t value of about 9.9 for a one-tailed area of 0.005 with two degrees of freedom. Please locate the corresponding value of 9.925 in the table.
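The guessing can be automated. The following is a minimal sketch in plain Python (standard library only) that mimics tcdf(t, 9E99, df) by numerically integrating the t density, then searches for the t value by bisection. The function names `t_pdf`, `upper_tail`, and `inverse_t` are illustrative, not calculator or library functions, and the search range of 0 to 100 is an assumption that covers the table in this section.

```python
import math

def t_pdf(t, df):
    """Density of the Student t distribution with df degrees of freedom."""
    coef = math.exp(math.lgamma((df + 1) / 2) - math.lgamma(df / 2))
    coef /= math.sqrt(df * math.pi)
    return coef * (1 + t * t / df) ** (-(df + 1) / 2)

def upper_tail(t, df, steps=2000):
    """Area to the right of t (for t >= 0), like tcdf(t, 9E99, df)."""
    # Simpson's rule on [0, t]; by symmetry the area right of 0 is 0.5.
    h = t / steps
    total = t_pdf(0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 - total * h / 3

def inverse_t(area, df, lo=0.0, hi=100.0):
    """Bisection search for the t value whose one-tailed area is `area`.

    Assumes the answer lies between lo and hi.
    """
    for _ in range(50):
        mid = (lo + hi) / 2
        if upper_tail(mid, df) > area:
            lo = mid  # tail area still too large, so t must be larger
        else:
            hi = mid
    return (lo + hi) / 2

print(round(upper_tail(9.9, 2), 4))    # about 0.005, as on the calculator
print(round(inverse_t(0.005, 2), 3))   # compare with 9.925 in the table
print(round(inverse_t(0.025, 10), 3))  # compare with 2.228 in the table
```

Bisection works here because the tail area shrinks steadily as t grows, so each guess tells you which half of the range to keep.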
As with other confidence intervals, we use the t-score to obtain the margin of error term, which is added to and subtracted from the statistic of interest (in this case, the sample mean) to obtain a confidence interval for the parameter of interest (in this case, the population mean). Since you do not have the population standard deviation, you use the sample standard deviation s, and the margin of error is defined as:
| $ME = t_{\alpha/2} \cdot \dfrac{s}{\sqrt{n}}$ |
Your confidence interval should look like:
$\bar{x} - ME < \mu < \bar{x} + ME$.
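Here is a short worked sketch of the interval in Python. The eleven data values are hypothetical, and the critical value 2.228 is taken from the abbreviated table in this section (10 degrees of freedom, .025 in one tail, i.e. 95% confidence). The function name `t_interval` is illustrative.

```python
import math
import statistics

def t_interval(sample, t_crit):
    """Confidence interval for the population mean: xbar +/- ME,
    where ME = t_crit * s / sqrt(n)."""
    n = len(sample)
    xbar = statistics.mean(sample)
    s = statistics.stdev(sample)  # sample standard deviation (divides by n - 1)
    me = t_crit * s / math.sqrt(n)
    return xbar - me, xbar + me

# Hypothetical sample of n = 11 values, so d.f. = 10.
sample = [52, 48, 55, 61, 47, 50, 58, 53, 49, 56, 54]
lower, upper = t_interval(sample, t_crit=2.228)
print(f"{lower:.2f} < mu < {upper:.2f}")
```

Note that `statistics.stdev` already uses the n-1 denominator, which is the sample standard deviation the margin of error calls for.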
| Degrees of freedom (columns: area in one tail / two tails) | .005 / .01 | .01 / .02 | .025 / .05 | .05 / .10 | .10 / .20 |
|---|---|---|---|---|---|
| 1 | 63.66 | 31.82 | 12.71 | 6.314 | 3.078 |
| 2 | 9.925 | 6.965 | 4.303 | 2.920 | 1.886 |
| 3 | 5.841 | 4.541 | 3.182 | 2.353 | 1.638 |
| 4 | 4.604 | 3.747 | 2.776 | 2.132 | 1.533 |
| 5 | 4.032 | 3.365 | 2.571 | 2.015 | 1.476 |
| 10 | 3.169 | 2.764 | 2.228 | 1.812 | 1.372 |
| 15 | 2.947 | 2.602 | 2.131 | 1.753 | 1.341 |
| 20 | 2.845 | 2.528 | 2.086 | 1.725 | 1.325 |
| 25 | 2.787 | 2.485 | 2.060 | 1.708 | 1.316 |
| z | 2.575 | 2.326 | 1.960 | 1.645 | 1.282 |
Although the t procedure is fairly robust, that is, its results do not change very much when the assumptions of the procedure are violated, you should always plot the data to check for skewness and outliers before using it on small samples. Here small can be interpreted as n < 15. If your sample is small and the data are clearly nonnormal or outliers are present, do not use the t. If your sample is not small, but n < 40, and there are outliers or strong skewness, do not use the t. Since the assumption that the samples are random is more important than the normality of the population distribution, the t statistic can be safely used even when the sample indicates the population is clearly skewed, if n ≥ 40.
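These rules of thumb can be written down as a small decision function. This is only a sketch of the guidelines as stated above, not a formal test: the function name and flags are hypothetical, the flags are judgments you make from a plot of the data, and n = 40 is treated as large by assumption.

```python
def t_procedure_ok(n, clearly_nonnormal=False, outliers=False, strong_skew=False):
    """Rule-of-thumb check on whether the t procedure is reasonable.

    The flags come from looking at a plot of the data.
    """
    if n < 15:
        # Small sample: data must look roughly normal with no outliers.
        return not (clearly_nonnormal or outliers or strong_skew)
    if n < 40:
        # Moderate sample: avoid only outliers or strong skewness.
        return not (outliers or strong_skew)
    # Large sample: usable even when the data are clearly skewed.
    return True

print(t_procedure_ok(10, outliers=True))     # False
print(t_procedure_ok(25, strong_skew=True))  # False
print(t_procedure_ok(50, strong_skew=True))  # True
```

None of this replaces the requirement that the sample be random, which matters more than the shape of the distribution.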
| $t = \dfrac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}}$ |
The expression in the denominator reflects the way variances sum (standard deviations do not sum). There are two options for obtaining a value for the degrees of freedom: calculate a fractional degrees of freedom as given below, or use the smaller of n1-1 and n2-1. The latter value always gives conservative results. As sample size increases, the latter procedure also becomes more accurate. The two-sample t procedures are more robust than the one-sample methods, especially when the distributions are not symmetric. If the sizes of the two samples are equal and the two distributions have similar shapes, the two-sample procedure can be accurate down to sample sizes as small as n1 = n2 = 5. The two-sample t procedure is most robust against nonnormality when the two samples are of equal size. Thus when planning such a study, you should make the sample sizes equal.
The fractional degrees of freedom formula is as follows:
| $\text{d.f.} = \dfrac{\left(\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}\right)^2}{\dfrac{(s_1^2/n_1)^2}{n_1 - 1} + \dfrac{(s_2^2/n_2)^2}{n_2 - 1}}$ |
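Since this calculation is tedious by hand, a minimal Python sketch may help. It assumes the standard form of the fractional degrees of freedom (often called the Welch-Satterthwaite approximation), in which the whole numerator is squared, together with the two-sample t statistic whose denominator is the square root of the summed variances of the two sample means. The function names and the example numbers are illustrative only.

```python
import math

def fractional_df(s1, n1, s2, n2):
    """Fractional degrees of freedom for the two-sample t procedure."""
    v1 = s1 ** 2 / n1  # variance of the first sample mean
    v2 = s2 ** 2 / n2  # variance of the second sample mean
    return (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))

def conservative_df(n1, n2):
    """The simpler, conservative choice: the smaller of n1 - 1 and n2 - 1."""
    return min(n1, n2) - 1

def two_sample_t(xbar1, s1, n1, xbar2, s2, n2, diff0=0.0):
    """Two-sample t statistic; diff0 is the hypothesized mu1 - mu2."""
    return ((xbar1 - xbar2) - diff0) / math.sqrt(s1 ** 2 / n1 + s2 ** 2 / n2)

# Hypothetical summary statistics for two groups.
print(two_sample_t(10, 2, 5, 8, 2, 5))  # t statistic
print(fractional_df(2, 5, 2, 5))        # 8 when both s and n are equal
print(conservative_df(5, 5))            # 4
```

The fractional value is never smaller than the conservative one, which is why the simpler choice errs on the side of wider intervals.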
An example might be SAT scores before and after a high-priced course of study, or the typical freshman practice EXPO project in which peas, corn, or other seeds are grown with and without (control) a treatment. Some Biology instructors and EXPO judges have expected our freshmen to perform this calculation!