An Introduction to Statistics - Lesson 6

The Bell-shaped, Normal, Gaussian Distribution

It can be shown under very general assumptions that the distribution of independent random errors of observation approaches a normal distribution as the number of observations becomes large. Although others were involved, Gauss was one of the first to characterize this distribution, and hence it is often named after him. It is also shaped like a bell, hence yet another name. The term used in the title above is rather redundant, but serves to emphasize that the three are identical. You can graph this curve on your calculator as seen below by entering the following function: $y = e^{-x^2/2} / \sqrt{2\pi}$, where e is the transcendental number 2.71828... and π is the more familiar, but also transcendental, number 3.14159.... The $\sqrt{2\pi}$ in the formula serves only to normalize the total area under the curve. When we normalize something, we make it equal to some norm or standard, usually one (1). The word normal has several other meanings, including perpendicular and the usual or status quo.

[graph of bell-shaped curve]           The standard normal distribution
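For readers without a graphing calculator, the same function can be evaluated in a few lines of Python. This is only a minimal sketch: the function names `standard_normal_pdf` and `area_under_curve` are made up for illustration, and the area is approximated with a simple midpoint rule over the interval from -10 to 10, which is assumed to be wide enough to capture essentially all of the curve.

```python
import math

def standard_normal_pdf(x):
    """Height of the standard normal curve at x: e^(-x^2/2) / sqrt(2*pi)."""
    return math.exp(-x * x / 2) / math.sqrt(2 * math.pi)

def area_under_curve(f, a, b, steps=100_000):
    """Approximate the area under f between a and b with the midpoint rule."""
    width = (b - a) / steps
    return sum(f(a + (i + 0.5) * width) for i in range(steps)) * width

peak = standard_normal_pdf(0.0)                               # height at the mean, about 0.3989
total_area = area_under_curve(standard_normal_pdf, -10, 10)   # very close to 1

print(round(peak, 4), round(total_area, 6))
```

The area comes out at one (to within rounding), which is exactly what dividing by the square root of 2π is meant to achieve.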

The height of the curve represents the relative likelihood (the probability density) of a measurement at that distance from the mean. The total area under the curve being one represents the fact that we are 100% certain (probability = 1.00) the measurement is somewhere. Technically, this is the standard normal curve, which has µ=0.0 and σ=1.0. Other applications of the normal curve do not have this restriction. For example, intelligence has often been cast, albeit controversially, as normally distributed with µ=100.0 and σ=15.0. This is represented below. Our function has been modified to $y = e^{-(x-\mu)^2/(2\sigma^2)} / (\sigma\sqrt{2\pi})$.

Normally distributed IQs           [graph of IQ distribution]
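The general form of the curve, with a mean and standard deviation other than 0 and 1, can be sketched the same way. The function name `normal_pdf` and its default arguments are illustrative choices, not part of the lesson; the IQ parameters (µ=100, σ=15) are the ones given above.

```python
import math

def normal_pdf(x, mu=0.0, sigma=1.0):
    """Height of the normal curve with mean mu and standard deviation sigma at x."""
    z = (x - mu) / sigma  # distance from the mean in standard deviations
    return math.exp(-z * z / 2) / (sigma * math.sqrt(2 * math.pi))

height_at_mean = normal_pdf(100, mu=100, sigma=15)  # the peak, about 0.0266
height_at_85 = normal_pdf(85, mu=100, sigma=15)     # one standard deviation below
height_at_115 = normal_pdf(115, mu=100, sigma=15)   # one standard deviation above

print(round(height_at_mean, 4), round(height_at_85, 4), round(height_at_115, 4))
```

The heights at 85 and 115 agree, which reflects the symmetry of the curve about its mean. With the defaults left in place, the function reduces to the standard normal curve.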

Other things which may take on a normal or quasi-normal distribution include body temperature, shoe sizes, diameters of trees, etc. It is also important to note the symmetry of the normal curve. Some curves may be slightly distorted or truncated beyond certain limits, but still primarily conform to a "heap" or "mound" shape (see below). This is often an important consideration when analyzing data or samples taken from some unknown population.

The Empirical Rule

For a normally distributed data set, the empirical rule states that 68% of the data elements are within one standard deviation of the mean, 95% are within two standard deviations, and 99.7% are within three standard deviations. Graphically, this corresponds to the area under the curve as shown below for 1 and 2 standard deviations. The empirical rule is often stated simply as 68-95-99.7. Note how this ties in with the range rule of thumb, by stating that 95% of the data usually falls within two standard deviations of the mean.

[+/- 1 std shaded under normal curve]           Data within 1 (left) and 2 (right)           [+/- 2 std shaded under normal curve]
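The 68-95-99.7 figures are rounded values of areas under the normal curve, and they can be checked with the error function in Python's standard `math` module. In this sketch the helper name `fraction_within` is hypothetical; it relies on the standard identity that the area within k standard deviations of the mean equals erf(k/√2).

```python
import math

def fraction_within(k):
    """Fraction of a normal population within k standard deviations of the mean."""
    return math.erf(k / math.sqrt(2))

for k in (1, 2, 3):
    print(k, f"{fraction_within(k):.4%}")
```

The three values come out near 68.27%, 95.45%, and 99.73%, which round to the figures quoted in the rule.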

The author usually claims an IQ of at least 145. We can see from the above information that this would put him at least three standard deviations above the population mean (100 + 3·15 = 145). Hence, if we accept the hypothesis that IQs are normally distributed, at least 99.85% of the population would have a lower IQ and less than 0.15% a higher one. Note especially that if 99.7% of the population is within three standard deviations of the mean, the remaining 0.3% is split evenly: half lies more than three standard deviations below the mean and the other half more than three standard deviations above it. This is a result of the symmetry of the curve (because x is squared, it does not matter whether it is positive or negative). In practical terms, in a population of 250,000,000, about 249,625,000 would have an IQ lower than 145 and 375,000 would have an IQ higher. Because of the small area of these regions, they are often referred to as tails. Depending on the circumstances, we may be interested in one tail or two tails.
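The tail arithmetic above can be reproduced directly. The sketch below first uses the rounded 99.7% figure from the empirical rule, as the text does, and then, for comparison, the unrounded tail area from the normal cumulative distribution. The helper `normal_cdf` and the variable names are illustrative assumptions.

```python
import math

def normal_cdf(x, mu=0.0, sigma=1.0):
    """Fraction of a normal population with values below x."""
    return 0.5 * (1 + math.erf((x - mu) / (sigma * math.sqrt(2))))

population = 250_000_000

# One tail according to the rounded empirical rule: half of the remaining 0.3%.
rule_tail = (1 - 0.997) / 2
# One tail from the unrounded normal curve with mean 100 and standard deviation 15.
exact_tail = 1 - normal_cdf(145, mu=100, sigma=15)

print(round(rule_tail * population))   # count above 145 using the rounded rule
print(round(exact_tail * population))  # count above 145 using the unrounded area
```

The rounded rule gives the 375,000 quoted in the text. The unrounded tail is slightly smaller (about 0.135% rather than 0.15%), so it gives a somewhat lower count, roughly 337,000.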

Several societies exist which cater to individuals with high IQs. Some specific examples would be MENSA, Triple Nine, Mega, etc.

Another important characteristic of this distribution is that it is of infinite extent. In practical terms, however, IQs below 0 (z = -6.67) or above 210 (z = 7.33) do not occur (ceiling scores such as Marilyn vos Savant's are difficult to interpret). A recently popularized manufacturing goal has been termed Six Sigma. Interpreted as ±6σ, one would think this corresponds to about 2 defects per billion, but their web site implies it is 200 per million. A typically good company operates at less than plus or minus four sigma, or 99.994% perfect, which corresponds to about 63 defects per million. If your family has ever purchased a "lemon" (a colloquialism for a bad car, perhaps one built on a Monday), you can appreciate such striving for perfection. (Another source implies six sigma refers to ±3σ.) Other similar examples would be the large increase in errors related to prescription drugs being dispensed or the case of the Florida patient who had the wrong leg amputated.
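The z-scores and defect rates quoted above follow from the normal curve and can be checked with a short sketch. The helper name `fraction_outside` is hypothetical; it uses the complementary error function from Python's `math` module to get the two-tailed area beyond ±k standard deviations.

```python
import math

def fraction_outside(k):
    """Two-tailed fraction of a normal population beyond +/- k standard deviations."""
    return math.erfc(k / math.sqrt(2))

# z-scores for IQs of 0 and 210, assuming a mean of 100 and standard deviation of 15.
z_low = (0 - 100) / 15
z_high = (210 - 100) / 15

per_million_at_4 = fraction_outside(4) * 1e6  # defects per million at +/- 4 sigma
per_billion_at_6 = fraction_outside(6) * 1e9  # defects per billion at +/- 6 sigma

print(round(z_low, 2), round(z_high, 2))
print(round(per_million_at_4, 1), round(per_billion_at_6, 2))
```

Under this two-tailed reading, ±4σ leaves about 63 defects per million and ±6σ about 2 per billion, in line with the figures in the paragraph.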

Chebyshev's Theorem

Chebyshev (1821–1894) was a preeminent Russian mathematician who primarily worked on the theory of prime numbers, although his writings covered a wide range of subjects. One of those subjects was probability, and his theorem applies to any data set, not only normally distributed data sets. His theorem states that the portion of any set of data within K standard deviations of the mean is always at least $1 - 1/K^2$, where K may be any number greater than 1.

For K=2, we see that $1 - 1/2^2 = 1 - 1/4 = 3/4$, so at least 75% of the data must always be within two standard deviations of the mean.

For K=3, we see that $1 - 1/3^2 = 1 - 1/9 = 8/9$, so at least about 89% of the data must always be within three standard deviations of the mean.
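The bound is simple enough to tabulate for any K. The following is a minimal sketch with a hypothetical helper, `chebyshev_bound`, which uses exact fractions so that the results match the hand calculations for K=2 and K=3.

```python
from fractions import Fraction

def chebyshev_bound(k):
    """Minimum fraction of any data set within k standard deviations of the mean."""
    if k <= 1:
        raise ValueError("Chebyshev's theorem only applies for k greater than 1")
    return 1 - 1 / Fraction(k) ** 2

for k in (2, 3, 4):
    bound = chebyshev_bound(k)
    print(k, bound, f"{float(bound):.1%}")
```

For K=2 and K=3 this gives 3/4 and 8/9. Both are well below the 95% and 99.7% of the empirical rule, because Chebyshev's theorem makes no assumption about the shape of the distribution.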

If we consider the data set 50, 50, 50, and 100, we will discover that the sample standard deviation (s) is 25, and the upper score falls exactly 2s above the rest. However, since the mean is 62.5, it is well within 2s of the mean. Adding five more scores of 50, we find the mean is now 55.6 and the standard deviation is now 16.7. We see that two standard deviations above the mean now extends to 88.9, and we have one data point outside that, but within three standard deviations. The general concept of being able to find the mean of a data set and determine how much of it is within a certain distance (number of standard deviations) of the mean is an important one which we will continue in the next lesson.
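The figures in this example can be checked with Python's standard `statistics` module. In the sketch below, `share_within` is a hypothetical helper that counts the share of scores lying within k sample standard deviations of the mean.

```python
import statistics

def share_within(data, k):
    """Share of the data lying within k sample standard deviations of the mean."""
    mean = statistics.mean(data)
    s = statistics.stdev(data)  # sample standard deviation (divides by n - 1)
    return sum(1 for x in data if abs(x - mean) <= k * s) / len(data)

small = [50, 50, 50, 100]
large = small + [50] * 5  # five more scores of 50

for data in (small, large):
    mean = statistics.mean(data)
    s = statistics.stdev(data)
    print(round(mean, 1), round(s, 1), round(mean + 2 * s, 1), share_within(data, 2))
```

For the larger set, eight of the nine scores (about 89%) lie within two standard deviations, comfortably above the 75% that Chebyshev's theorem guarantees.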

Note: here is an example of a distribution with k=2 and only 75% of the data within the prescribed limits. It comes to us from Hogg and Craig (1978, p. 70) in "Introduction to Mathematical Statistics" (5th ed.) via the AP STAT list server on May 31, 2000. Let the discrete random variable x have probabilities 1/8, 6/8, 1/8 at the points x = -1, 0, 1 respectively. Then µ=0 and $\sigma^2 = 1/4$, so σ=1/2 and the points x = ±1 lie exactly kσ = 1 from the mean. If k=2, then $1/k^2 = 1/4$, which equals the probability at those two points, and we thus attain the bound given by Chebyshev's inequality.
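The Hogg and Craig example can be verified with exact arithmetic. This is a sketch only: the dictionary `points` and the other variable names are illustrative, and distances are compared as squares so that no square root is needed.

```python
from fractions import Fraction

# Probabilities at the points x = -1, 0, 1, as given in the example.
points = {-1: Fraction(1, 8), 0: Fraction(6, 8), 1: Fraction(1, 8)}

mu = sum(x * p for x, p in points.items())
variance = sum((x - mu) ** 2 * p for x, p in points.items())

k = 2
# Probability of being at least k standard deviations from the mean,
# tested as (x - mu)^2 >= k^2 * sigma^2 to stay in exact fractions.
outside = sum(p for x, p in points.items() if (x - mu) ** 2 >= k ** 2 * variance)

print(mu, variance, outside, Fraction(1, k ** 2))
```

The probability at or beyond two standard deviations is exactly 1/4, the same as 1/k², so this distribution sits right at the edge of what Chebyshev's inequality allows.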

Meanings of Normal

The word normal is used extensively in math and science and generally has a very context-specific meaning which differs from that "normally" encountered. We list several here.
  1. Normal in mathematics and physics often means perpendicular to. A normal vector is perpendicular to a given line or plane.
  2. Normal in statistics generally refers to the Gaussian distribution or the "normal" way we would expect errors to be distributed.
  3. Normal in chemistry refers to normality, a measure of the concentration of an acid or base. Specifically, normality equals the molarity times the number of equivalents, where equivalents refers to the number of reactive units (H+ or OH- ions) each formula unit can supply. For acids like HCl and bases like NaOH the numerical values of normality and molarity are equal. However, H2SO4 has twice the normality for a given molarity.
  4. Normal can refer to the fact that the area has been made equal to one (to normalize) so that area and probability are equivalent.

Other Distributions

There are many ways a data set may be distributed. The study of these ways will take up a fair section of our statistical studies next year. Of particular importance are the following: uniform distribution, binomial distribution, hypergeometric distribution, Poisson distribution, Lorentzian distribution, Student t distribution, Chi-square distribution, and F distribution.
