), where e is the
transcendental number 2.71828... and
π is the more familiar,
but also transcendental, number 3.14159....
The √(2π)
in the formula only serves to
normalize the total area under the curve.
When we normalize something, we make it equal
to some norm or standard, usually one (1).
The word normal has several other meanings, including
perpendicular and the usual/status quo.
Figure: The standard normal distribution
The height of the curve represents the relative likelihood (strictly, the
probability density) of a measurement at that given distance away from the mean.
The total area under the curve being one represents the fact
that we are 100% certain (probability = 1.00) the measurement
is somewhere. Technically, this is the standard normal
curve, which has
µ=0.0 and
σ=1.0.
Other applications of the normal curve do not have this restriction.
For example, intelligence has often been cast, albeit controversially,
as normally distributed with
µ=100.0 and
σ=15.0.
This is represented below. Our function has been modified to
y = e^(-(x-µ)²/(2σ²)) / (σ√(2π)).
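As a rough check on the formula above, the following Python sketch evaluates the curve and approximates the area under it with a simple midpoint sum. The function names `normal_pdf` and `area_under_curve`, the integration range of ±10 standard deviations, and the step count are illustrative assumptions, not part of the original lesson.

```python
import math

def normal_pdf(x, mu=0.0, sigma=1.0):
    """Height of the normal curve at x for mean mu and standard deviation sigma."""
    exponent = -((x - mu) ** 2) / (2 * sigma ** 2)
    return math.exp(exponent) / (sigma * math.sqrt(2 * math.pi))

def area_under_curve(mu=0.0, sigma=1.0, width=10.0, steps=100_000):
    """Midpoint-rule approximation of the area from mu - width*sigma to mu + width*sigma."""
    lo = mu - width * sigma
    hi = mu + width * sigma
    dx = (hi - lo) / steps
    return sum(normal_pdf(lo + (i + 0.5) * dx, mu, sigma) for i in range(steps)) * dx

if __name__ == "__main__":
    print(normal_pdf(0.0))                 # peak of the standard normal curve
    print(normal_pdf(100.0, 100.0, 15.0))  # peak of the IQ curve
    print(area_under_curve())              # should be very close to 1
    print(area_under_curve(100.0, 15.0))   # should be very close to 1
```

Dividing by σ√(2π) is what keeps the area at one when σ changes: a wider curve must be proportionally lower.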
Figure: Normally distributed IQs
Other things which may take on a normal or quasi-normal distribution include body temperature, shoe sizes, diameters of trees, etc. It is also important to note the symmetry of the normal curve. Some curves may be slightly distorted or truncated beyond certain limits, but still primarily conform to a "heap" or "mound" shape (see below). This is often an important consideration when analyzing data or samples taken from some unknown population.
Figure: Data within 1σ (left) and 2σ (right) of the mean
The author usually claims an IQ of at least 145. We can see from the above information that this would put him at least three standard deviations above the population mean (100+3×15=145). Hence, if we accept the hypothesis that IQs are normally distributed, at least 99.85% of the population would have a lower IQ and less than 0.15% a higher one. Please note especially that if 99.7% of the population is within three standard deviations of the mean, the remaining 0.3% is split evenly: half lies more than three standard deviations below the mean and the other half more than three standard deviations above it. This is a result of the symmetry of the curve (because the deviation from the mean is squared, it does not matter whether it is positive or negative). In practical terms, in a population of 250,000,000, about 249,625,000 would have an IQ lower than 145 and about 375,000 would have an IQ higher. Because of the small area of these regions, they are often referred to as tails. Depending on the circumstances, we may be interested in one tail or two tails.
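The tail areas discussed above can be sketched numerically. This is a minimal illustration, assuming IQs are exactly normal with µ=100 and σ=15; the helper names `normal_cdf` and `fraction_within` are hypothetical, and the cumulative area is computed from the standard library's error function.

```python
import math

def normal_cdf(z):
    """Fraction of a standard normal population lying below z standard deviations."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def fraction_within(k):
    """Fraction of the population within k standard deviations of the mean (two tails excluded)."""
    return normal_cdf(k) - normal_cdf(-k)

if __name__ == "__main__":
    mu, sigma, iq, population = 100.0, 15.0, 145.0, 250_000_000
    z = (iq - mu) / sigma                 # distance from the mean in standard deviations
    upper_tail = 1.0 - normal_cdf(z)      # one-tailed area above the given IQ
    print(f"z = {z:.2f}")
    for k in (1, 2, 3):
        print(f"within {k} sd: {fraction_within(k):.4%}")
    print(f"above IQ {iq:.0f}: {upper_tail:.4%}, about {upper_tail * population:,.0f} people")
```

Because the sketch does not round the three-standard-deviation area to 99.7%, its upper tail comes out slightly below 0.15%, which is consistent with the "less than 0.15%" stated above.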
Several societies exist which cater to individuals with high IQs. Some specific examples would be MENSA, Triple Nine, Mega, etc.
Another important characteristic of this distribution is that it is
of infinite extent. In practical terms, however, IQs below 0 (-6.67σ)
or above 210 (7.33σ) do not occur
(ceiling scores such as
Marilyn vos Savant's
are difficult to interpret).
A recently popularized manufacturing goal has been termed
Six Sigma.
Interpreted as ±6σ,
one would think this would correspond to about 2 defects per billion,
but their web site implies it is 200 per million. A typically
good company operates at less than plus or minus four sigma, or 99.994% perfect.
This corresponds more closely to 63 defects per million. If your family has
ever purchased a "lemon"
(a colloquialism for a bad car, perhaps one built on a Monday),
you can appreciate such striving for perfection.
(Another source implies Six Sigma refers to ±3σ.)
Other similar examples would be the large increase in
errors related to prescription drugs being dispensed
or the case of the Florida patient who had the wrong leg amputated.
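The sigma figures quoted above can be checked with a short sketch. It assumes a process whose output is exactly normally distributed and centered on its target, and it counts both tails; the function name `defects_per_million` is hypothetical.

```python
import math

def defects_per_million(k):
    """Expected items per million falling outside ±k sigma of a normal process."""
    outside = math.erfc(k / math.sqrt(2.0))  # two-tailed area beyond ±k sigma
    return outside * 1_000_000

if __name__ == "__main__":
    for k in (3, 4, 6):
        print(f"±{k} sigma: {defects_per_million(k):.6f} defects per million")
    # Distances of IQ 0 and IQ 210 from the mean of 100, in units of sigma = 15
    print((0 - 100) / 15, (210 - 100) / 15)
```

Using the complementary error function rather than subtracting a cumulative area from one avoids losing precision in the very small ±6σ tail.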
For K=2, we see that 1-1/2²=1-1/4=3/4, which means at least 75% of the data must always be within two standard deviations of the mean.
For K=3, we see that 1-1/3²=1-1/9=8/9, which means at least about 89% of the data must always be within three standard deviations of the mean.
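The arithmetic for K=2 and K=3 generalizes to any K greater than one. A minimal sketch, with the hypothetical function name `chebyshev_lower_bound` and exact fractions so the results match the hand calculation:

```python
from fractions import Fraction

def chebyshev_lower_bound(k):
    """Minimum fraction of any data set within k standard deviations of the mean."""
    if k <= 1:
        raise ValueError("the bound is only informative for k > 1")
    return 1 - 1 / Fraction(k) ** 2

if __name__ == "__main__":
    for k in (2, 3, 4):
        bound = chebyshev_lower_bound(k)
        print(f"K={k}: at least {bound} = {float(bound):.1%}")
```

Unlike the normal-curve percentages, this bound makes no assumption about the shape of the distribution.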
If we consider the data set 50, 50, 50, and 100, we will discover that the sample standard deviation (s) is 25, so the upper score falls exactly 2s above the rest. However, since the mean is 62.5, that score is well within 2s of the mean. Adding five more scores of 50, we find that the mean is now 55.6 and the standard deviation is now 16.7. Two standard deviations above the mean now extends only to 88.9, so we have one data point outside that limit, though still within three standard deviations. The general concept of being able to find the mean of a data set and determine how much of it is within a certain distance (number of standard deviations) of the mean is an important one which we will continue in the next lesson.
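The two data sets above can be checked with Python's standard `statistics` module. The helper name `summarize` is hypothetical; `statistics.stdev` uses the n-1 divisor, matching the sample standard deviation s used in the text.

```python
import statistics

def summarize(data, k=2):
    """Return the mean, sample standard deviation, upper k*s limit, and points outside ±k*s."""
    mean = statistics.mean(data)
    s = statistics.stdev(data)  # sample standard deviation (n - 1 divisor)
    lower, upper = mean - k * s, mean + k * s
    outside = [x for x in data if x < lower or x > upper]
    return mean, s, upper, outside

if __name__ == "__main__":
    small = [50, 50, 50, 100]
    large = [50] * 8 + [100]  # the original four scores plus five more scores of 50
    for data in (small, large):
        mean, s, upper, outside = summarize(data)
        print(f"mean={mean:.1f} s={s:.1f} mean+2s={upper:.1f} outside 2s: {outside}")
```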
Note: here is an example of a distribution with k=2 and only 75%
of the data within the prescribed limits. It comes to us from
Hogg and Craig (1978, p. 70), "Introduction to Mathematical Statistics"
(5th ed.), via the AP STAT list server on May 31, 2000.
Let the discrete random variable x have probabilities 1/8, 6/8, 1/8
at the points x = -1, 0, 1 respectively.
Then µ=0 and
σ²=1/4, so σ=1/2.
If k=2, then kσ=1 and 1/k²=1/4. The probability that x lies at least kσ from the mean is 1/8+1/8=1/4, and
we thus attain the bound given by Chebyshev's inequality.
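The example can be verified with exact arithmetic. The sketch below uses hypothetical helper names `moments` and `tail_probability`, and compares squared distances so that no square root is needed.

```python
from fractions import Fraction

points = [-1, 0, 1]
probs = [Fraction(1, 8), Fraction(6, 8), Fraction(1, 8)]

def moments(points, probs):
    """Mean and variance of a discrete random variable."""
    mu = sum(p * x for x, p in zip(points, probs))
    var = sum(p * (x - mu) ** 2 for x, p in zip(points, probs))
    return mu, var

def tail_probability(points, probs, k):
    """P(|x - mu| >= k*sigma), tested as (x - mu)^2 >= k^2 * variance."""
    mu, var = moments(points, probs)
    return sum(p for x, p in zip(points, probs) if (x - mu) ** 2 >= k * k * var)

if __name__ == "__main__":
    mu, var = moments(points, probs)
    print(mu, var)                             # mean and variance
    print(tail_probability(points, probs, 2))  # compare with 1/k^2 for k = 2
```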