In general, a distribution has an overall shape, a center, and a spread. It may or may not contain outliers, its tails (wings) may be thick or thin, and it may be skewed to the right or to the left. The purpose of descriptive statistics and exploratory data analysis is to quantify, or at least get a feel for, these features of a distribution's shape. The normal distribution is often the "gold standard" against which data sets are compared.
Whether or not an observation is an outlier is, to some extent, a matter of judgment. An outlier is an individual observation that deviates from, or falls outside, the overall pattern. Like the old Supreme Court standard for pornography ("I know it when I see it"), outliers can be hard to define. Distributions are commonly symmetric; that is, the right and left sides are approximately mirror images of each other. Even uniform or multimodal distributions can be symmetric. Distributions that are not symmetric are typically still heap-shaped or mound-shaped, but with one side extending further than the other. We call a distribution skewed to the right if the right side extends much further out than the left side (the mean is then usually to the right of the median), and skewed to the left if the left side extends much further out than the right side (the mean is then usually to the left of the median). We will not get into the technical definition of skewness in terms of the third moment, or catalog exceptions to the mean/median heuristic above; instead, we refer you to this site for details on both.
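To illustrate the mean/median heuristic, here is a minimal Python sketch using the standard `statistics` module. The data set is made up purely for illustration: a handful of large values give it a long right tail, and negating every value mirrors it into a left-skewed set.

```python
from statistics import mean, median

# Hypothetical, made-up data with a long right tail (a few large values).
right_skewed = [1, 2, 2, 3, 3, 3, 4, 5, 9, 15]

# Mirror image of the same data: the long tail now points left.
left_skewed = [-x for x in right_skewed]

for name, data in [("right-skewed", right_skewed), ("left-skewed", left_skewed)]:
    m, md = mean(data), median(data)
    side = "right" if m > md else "left" if m < md else "equal"
    print(f"{name}: mean={m}, median={md} -> mean is {side} of the median")
```

The long tail drags the mean toward it while the median, which depends only on the middle values, barely moves. As noted above, this is a heuristic, and exceptions exist.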
In Statistics lesson 1 we also noted that data can be discrete or continuous. Again, it can be hard to tell the two apart because of quantum mechanics and uncertainties about measurement accuracy. Hence discrete distributions are what we commonly encounter in practice, while continuous distributions are at least possible mathematically. The normal distribution is the most important continuous distribution, and whenever n is sufficiently large (generally over 30), we often approximate a discrete distribution with the normal distribution; this approximation has been shown to be accurate enough.
We note two fundamental rules regarding distributions. First, all probabilities are between 0 and 1 (0 ≤ P(x) ≤ 1). Second, all probabilities in a distribution sum to 1 (ΣP(x) = 1), i.e., it is certain that the outcome is in the sample space.
Example: Test the following function to determine whether or not it is a
probability distribution.
P(x) = (5 - x)/10 when x = 1, 2, 3, 4.
Solution:
| x | P(x) |
|---|---|
| 1 | 2/5 = 0.40 |
| 2 | 3/10 = 0.30 |
| 3 | 1/5 = 0.20 |
| 4 | 1/10 = 0.10 |
It works! All probabilities are between zero and one, and the last column sums to 10/10 = 1.00.
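The same test can be automated. The following Python sketch checks both rules for any list of probabilities; the function name `is_probability_distribution` is our own, hypothetical choice. It uses exact fractions from the standard `fractions` module, since floating-point sums can pick up tiny rounding errors and make an exact comparison with 1 unreliable.

```python
from fractions import Fraction

def is_probability_distribution(probs):
    """Check both rules: every P(x) lies in [0, 1] and the P(x) sum to exactly 1."""
    probs = list(probs)
    return all(0 <= p <= 1 for p in probs) and sum(probs) == 1

# P(x) = (5 - x)/10 for x = 1, 2, 3, 4, kept as exact fractions.
table = {x: Fraction(5 - x, 10) for x in range(1, 5)}

for x, p in table.items():
    print(x, p, float(p))  # value, reduced fraction, decimal
print(is_probability_distribution(table.values()))
```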
Consider also the number of heads, x, when tossing two fair coins:
| x | P(x) |
|---|---|
| 0 | ¼ |
| 1 | ½ |
| 2 | ¼ |
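One way to see where the two-coin table comes from is to list every equally likely outcome and count heads. A minimal Python sketch, assuming fair coins and taking x to be the number of heads:

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Every equally likely outcome of tossing two fair coins: HH, HT, TH, TT.
outcomes = list(product("HT", repeat=2))

# x = number of heads in each outcome; count how often each x occurs.
counts = Counter(outcome.count("H") for outcome in outcomes)

# Convert counts to probabilities: P(x) = count / number of outcomes.
dist = {x: Fraction(c, len(outcomes)) for x, c in sorted(counts.items())}
print(dist)
```

The same enumeration approach works for any small experiment whose outcomes are equally likely.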
Consider further the pips displayed on a (fair) die:
| x | P(x) |
|---|---|
| 1 | 1/6 |
| 2 | 1/6 |
| 3 | 1/6 |
| 4 | 1/6 |
| 5 | 1/6 |
| 6 | 1/6 |
The table below gives the area under the standard normal curve between z = 0 and a given positive z. To look up a z score, read down the left-hand column to the row containing its units digit and first decimal place, then move across to the column whose heading gives the second decimal place (equivalently, add the column heading's hundredths to the row value). For example, the entry for z = 1.67 lies in row 1.6x, column x.x7. Such tables may help explain why z scores are so commonly reported to two decimal places! Warning: Although every effort has been made to verify these numbers (on a TI-83 graphing calculator), errors may still be present. Also, the table is somewhat incomplete due to lack of space: rows 2.1x through 2.9x are omitted.
| z | x.x0 | x.x1 | x.x2 | x.x3 | x.x4 | x.x5 | x.x6 | x.x7 | x.x8 | x.x9 |
|---|---|---|---|---|---|---|---|---|---|---|
| 0.0x | .0000 | .0040 | .0080 | .0120 | .0160 | .0199 | .0239 | .0279 | .0319 | .0359 |
| 0.1x | .0398 | .0438 | .0478 | .0517 | .0557 | .0596 | .0636 | .0675 | .0714 | .0753 |
| 0.2x | .0793 | .0832 | .0871 | .0910 | .0948 | .0987 | .1026 | .1064 | .1103 | .1141 |
| 0.3x | .1179 | .1217 | .1255 | .1293 | .1331 | .1368 | .1406 | .1443 | .1480 | .1517 |
| 0.4x | .1554 | .1591 | .1628 | .1664 | .1700 | .1736 | .1772 | .1808 | .1844 | .1879 |
| 0.5x | .1915 | .1950 | .1985 | .2019 | .2054 | .2088 | .2123 | .2157 | .2190 | .2224 |
| 0.6x | .2257 | .2291 | .2324 | .2357 | .2389 | .2422 | .2454 | .2486 | .2517 | .2549 |
| 0.7x | .2580 | .2611 | .2642 | .2673 | .2704 | .2734 | .2764 | .2794 | .2823 | .2852 |
| 0.8x | .2881 | .2910 | .2939 | .2967 | .2995 | .3023 | .3051 | .3078 | .3106 | .3133 |
| 0.9x | .3159 | .3186 | .3212 | .3238 | .3264 | .3289 | .3315 | .3340 | .3365 | .3389 |
| 1.0x | .3413 | .3438 | .3461 | .3485 | .3508 | .3531 | .3554 | .3577 | .3599 | .3621 |
| 1.1x | .3643 | .3665 | .3686 | .3708 | .3729 | .3749 | .3770 | .3790 | .3810 | .3830 |
| 1.2x | .3849 | .3869 | .3888 | .3907 | .3925 | .3944 | .3962 | .3980 | .3997 | .4015 |
| 1.3x | .4032 | .4049 | .4066 | .4082 | .4099 | .4115 | .4131 | .4147 | .4162 | .4177 |
| 1.4x | .4192 | .4207 | .4222 | .4236 | .4251 | .4265 | .4279 | .4292 | .4306 | .4319 |
| 1.5x | .4332 | .4345 | .4357 | .4370 | .4382 | .4394 | .4406 | .4418 | .4429 | .4441 |
| 1.6x | .4452 | .4463 | .4474 | .4484 | .4495 | .4505 | .4515 | .4525 | .4535 | .4545 |
| 1.7x | .4554 | .4564 | .4573 | .4582 | .4591 | .4599 | .4608 | .4616 | .4625 | .4633 |
| 1.8x | .4641 | .4649 | .4656 | .4664 | .4671 | .4678 | .4686 | .4693 | .4699 | .4706 |
| 1.9x | .4713 | .4719 | .4726 | .4732 | .4738 | .4744 | .4750 | .4756 | .4761 | .4767 |
| 2.0x | .4772 | .4778 | .4783 | .4788 | .4793 | .4798 | .4803 | .4808 | .4812 | .4817 |
| 3.0x | .4987 | .4987 | .4987 | .4988 | .4988 | .4989 | .4989 | .4989 | .4990 | .4990 |
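If you want to double-check an entry, or fill in the omitted rows, the area can be computed directly. The standard normal cumulative distribution is Φ(z) = (1 + erf(z/√2))/2, so the area between 0 and z is Φ(z) − ½ = erf(z/√2)/2. A minimal Python sketch using the standard library's `math.erf` (the helper name `area_0_to_z` is our own):

```python
import math

def area_0_to_z(z):
    """Area under the standard normal curve between 0 and z.

    Phi(z) - 0.5 = erf(z / sqrt(2)) / 2. For negative z the result is
    negative; by symmetry its absolute value equals the area for -z.
    """
    return 0.5 * math.erf(z / math.sqrt(2))

# Rebuild one row of the table (z = 1.00 ... 1.09), rounded to four places.
row = [round(area_0_to_z(1.0 + k / 100), 4) for k in range(10)]
print(row)

# A missing row from the printed table, e.g. z = 2.50.
print(round(area_0_to_z(2.50), 4))
```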
Example: Find the probability for a data value to fall
between the mean (z=0.00) and one standard deviation (z=1.00)
above the mean, assuming the population is normally distributed.
Solution: The table above gives the value 0.3413 or 34.13%.
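This agrees with the empirical rule, which places about 68% of the data within one standard deviation of the mean, and therefore about 68% ÷ 2 = 34% between the mean and one standard deviation above it.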
Example: Find the probability for IQ values between 75 and 130,
assuming a normal distribution, mean = 100 and std = 15.
Solution: An IQ of 75 corresponds to a z score of (75 - 100)/15 ≈ -1.67,
and an IQ of 130 corresponds to a z score of (130 - 100)/15 = 2.00.
The table lists only positive z values, but because the normal
distribution is symmetric, the area between z = -1.67 and 0 equals the area
between 0 and 1.67, which the table gives as .4525.
For 2.00 we find .4772. The probability of an IQ between 75 and 130
is the probability of an IQ between 75 and 100 (by symmetry, the same as
between 100 and 125) plus the probability of an IQ between 100 and 130,
or .4525 + .4772 = .9297.
Including a sketch like in Statistics
lesson 6 would be appropriate.
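The same probability can be computed without a table by evaluating the normal cumulative distribution function with the error function. A minimal Python sketch (the helper name `normal_cdf` is our own; it uses only the standard library's `math.erf`):

```python
import math

def normal_cdf(x, mu, sigma):
    """P(X <= x) for X normally distributed with mean mu and std sigma."""
    return 0.5 * (1 + math.erf((x - mu) / (sigma * math.sqrt(2))))

mu, sigma = 100, 15  # IQ parameters from the example

# Uses the unrounded z scores: (75 - 100)/15 = -1.666..., (130 - 100)/15 = 2.0
p = normal_cdf(130, mu, sigma) - normal_cdf(75, mu, sigma)
print(round(p, 4))
```

The result, about .9295, is slightly smaller than the table answer of .9297. The hand calculation rounds z = -1.666... to -1.67, which slightly enlarges that area, and it also rounds each area to four places before adding.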
We can calculate the expected value of the total number of pips when throwing two dice by summing the product of each value with its probability, E = Σx·P(x). Thus 2·1/36 + 3·2/36 + ... + 12·1/36 = 252/36 = 7.00. The value we obtain is the expected value; in this case, it is also the mode.
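The two-dice calculation can be checked by enumerating all 36 equally likely rolls. A minimal Python sketch, assuming two fair six-sided dice:

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# All 36 equally likely (die1, die2) outcomes for two fair dice.
totals = [a + b for a, b in product(range(1, 7), repeat=2)]

# Distribution of the total: P(t) = (number of ways to roll t) / 36.
dist = {t: Fraction(c, 36) for t, c in Counter(totals).items()}

# Expected value: sum of each value times its probability.
expected = sum(t * p for t, p in dist.items())

# Mode: the total with the largest probability.
mode = max(dist, key=dist.get)
print(expected, mode)
```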
Example: Find the expected value given the two coin distribution
discussed above.
Solution:
x takes on the values 0, 1, or 2 with probabilities ¼,
½, and ¼.
E = 0·¼ + 1·½ + 2·¼ = 0 + ½ + ½ = 1.00.
Thus we expect one head when throwing two coins.
Example: Find the expected value given the one die distribution
discussed above.
Solution:
x takes on the values one through six with equal probability
of 1/6.
(1+2+3+4+5+6)·1/6 = 21/6 = 3.5.
Thus we expect 3.5 pips when throwing a fair, six-sided die.
Obviously, since pips are discrete, we can't expect 3.5 pips
on any one roll!
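Both examples apply the same rule, E = Σx·P(x), so a single helper covers them. A minimal Python sketch, assuming each distribution is stored as a dictionary mapping each value x to P(x) (the name `expected_value` is our own):

```python
from fractions import Fraction

def expected_value(dist):
    """E(x) = sum of x * P(x) over a dict mapping each value x to P(x)."""
    return sum(x * p for x, p in dist.items())

two_coins = {0: Fraction(1, 4), 1: Fraction(1, 2), 2: Fraction(1, 4)}  # number of heads
one_die = {x: Fraction(1, 6) for x in range(1, 7)}                       # pips on a fair die

print(expected_value(two_coins))  # 1
print(expected_value(one_die))    # 7/2, i.e. 3.5
```

Using exact fractions keeps the answers in the same form as the hand calculations above.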