An Introduction to Statistics

Review of Statistics Lessons 5 and 6

Dispersion describes how spread out the values in a data set are.
Common measures of dispersion are range, standard deviation, and variance.
Range is the difference between the highest and lowest data element.
Range is easily distorted by outliers, because it uses only two elements.
Standard deviation is by far the most important measure of dispersion.
Standard deviation is, roughly, the typical distance of a data element from the mean (strictly, the root-mean-square distance).
The formula for standard deviation varies depending on whether it is for a sample or a population.
Sample standard deviation is denoted by s, whereas population standard deviation is denoted by σ.
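This use of Roman characters for sample and Greek characters for population is standard.
The sample standard deviation is slightly larger because its formula divides by n - 1 rather than n, which compensates for measuring deviations from the sample mean instead of the population mean.
Degrees of freedom (n - 1 for the sample standard deviation) is an important quantity in any statistical study.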
Standard deviation is the square root of the variance.
Standard deviation has the same units as the data, so it can be easier to understand.
In general, the range of a sample is about four times its standard deviation (range rule of thumb).
Three is the smallest sample size where standard deviation is meaningful.
Variance is the primary statistic and standard deviation is derived from it, so be careful with precision and accuracy.
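
The measures above can be tried out with a short Python sketch. It relies only on the standard library's statistics module; the data values are hypothetical and were picked so the arithmetic is easy to check by hand. It is meant as an illustration of the sample versus population distinction, not as a prescribed method.

```python
import math
import statistics

# Hypothetical sample, chosen only so the arithmetic is easy to follow.
data = [2, 4, 4, 4, 5, 5, 7, 9]

# Range uses just the two extreme elements.
data_range = max(data) - min(data)

# Population formulas divide by n; sample formulas divide by n - 1.
pop_var = statistics.pvariance(data)
sample_var = statistics.variance(data)
pop_sd = statistics.pstdev(data)
sample_sd = statistics.stdev(data)

# Standard deviation is derived from variance by taking the square root.
sd_from_var = math.sqrt(sample_var)

# Range rule of thumb: standard deviation is roughly range / 4.
rule_of_thumb_sd = data_range / 4

print(f"range                 = {data_range}")
print(f"population variance   = {pop_var}")
print(f"sample variance       = {sample_var:.4f}")
print(f"population std. dev.  = {pop_sd:.4f}")
print(f"sample std. dev.      = {sample_sd:.4f}")
print(f"sqrt(sample variance) = {sd_from_var:.4f}")
print(f"range / 4             = {rule_of_thumb_sd:.4f}")
```

For this data the sample standard deviation comes out slightly larger than the population value, and range / 4 lands in the same neighborhood without matching either one, which is all a rule of thumb promises.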

The normal distribution has two other names: the Gaussian distribution and the bell-shaped curve.
Error distributions and many other phenomena tend toward a normal distribution.
The normal distribution is symmetric.
A standard normal distribution has an area of 1, mean of 0, and standard deviation of 1.
The empirical rule (68%-95%-99.7%) is based on the normal distribution: about 68%, 95%, and 99.7% of a data set lie within 1, 2, and 3 standard deviations of the mean, respectively.
IQ scores, with a mean of 100 and a standard deviation of 15, are a common example of a nonstandard normal distribution.
The thin parts of a distribution are called tails.
A statistical study can be interested in one tail (the left or the right) or in both.
The Math & Science Center draws students from the upper tail of the IQ curve, whereas Blossomland draws students from the lower tail.
In theory, the tails are of infinite extent.
In practice, the tails are especially difficult to measure.
Chebyshev's Theorem applies to any distribution.
Chebyshev's Theorem guarantees that at least 1-1/K² of the data lie within K standard deviations of the mean, for K > 1.
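
The empirical rule and Chebyshev's Theorem can be compared with a small Python sketch. It assumes Python 3.8 or later, where the standard library provides statistics.NormalDist, and it models IQ scores as exactly normal with mean 100 and standard deviation 15. The helper names share_within and chebyshev_bound are hypothetical, introduced only for this illustration.

```python
from statistics import NormalDist

# IQ scores modeled as a nonstandard normal distribution (an idealization).
iq = NormalDist(mu=100, sigma=15)


def share_within(dist, k):
    """Fraction of a normal distribution within k standard deviations of its mean."""
    upper = dist.mean + k * dist.stdev
    lower = dist.mean - k * dist.stdev
    return dist.cdf(upper) - dist.cdf(lower)


def chebyshev_bound(k):
    """Minimum fraction within k standard deviations for any distribution (k > 1)."""
    if k <= 1:
        raise ValueError("Chebyshev's bound is only informative for k > 1")
    return 1 - 1 / k**2


for k in (1, 2, 3):
    line = f"k={k}: normal share {share_within(iq, k):.4f}"
    if k > 1:
        line += f", Chebyshev minimum {chebyshev_bound(k):.4f}"
    print(line)
```

The normal shares reproduce the 68%-95%-99.7% figures, while the Chebyshev minimums are noticeably lower. That gap is the price of a guarantee that holds for any distribution rather than only the normal one.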

e-mail: calkins@andrews.edu
voice/mail: 269 471-6629/ BCM&S Smith Hall 106; Andrews University;
classroom/FAX: 269 471-6646; Smith Hall 100/ 269 471-3713; Berrien Springs, MI, 49104-0140
home: 269 473-2572; 610 N. Main St.; Berrien Springs, MI 49103-1013
URL: http://www.andrews.edu/~calkins/math/webtexts/stat56rv.htm