Back to the Table of Contents

Applied Statistics - Lesson 4

The Binomial and [Standard] Normal, Bell-shaped, Gaussian Distributions

Lesson Overview:

Probability distributions may be either discrete or continuous. The normal (Gaussian) and Lorentzian distributions are good examples of continuous distributions—the random variable can take on any value. Examples of discrete distributions include the binomial, the hypergeometric, and the Poisson. We will introduce the binomial today and then focus on the normal distribution. The hypergeometric and Poisson will not be important for this course. However, other distributions will be important to this course due to their relationship to inferential statistics. We have already referenced uniform distributions and will become further acquainted with the Student t distribution and the chi-square distribution.

The Binomial Distribution

The prefix bi- has the usual meaning of two in this context, just like bicycle, bifocal, and bigamist. This distribution is related to what happens when you study the expansion of the binomial (1 + x)^n. Here it means there are two and only two distinct categories. For instance, students either pass or they fail a test. In dining out at fast food restaurants, people either have or haven't eaten at McDonald's.

Some notation has become very standard when working with binomial distributions. S (success) and F (failure) denote possible categories for all outcomes; whereas, p and q=1-p denote the probabilities P(S) and P(F), respectively. The term success may not necessarily be what you would call a desirable result. For example, you may want to find the probability of finding a defective chip, given the probability 0.2 that a chip is defective. Here the term success might actually represent the process of selecting a defective chip. The important thing here is to correlate P(S) with p. Some authors avoid q, but the formulae seem clearer using it rather than the awkward expression 1-p.

What Makes a Binomial Experiment?

The requirements for a binomial experiment are as follows:
  1. There must be a fixed number of trials.
  2. Trials must be independent. One trial's outcome cannot affect the probabilities of other trials.
  3. All outcomes of trials must be in one of two categories.
  4. Probabilities must remain constant for each trial.

Requirement 2 specifically implies sampling with replacement if we are selecting something, unless the change in the probabilities caused by not replacing it is slight.

  • P(S) = p.
  • P(F) = q = 1-p.
  • n indicates the fixed number of trials.
  • x indicates the number of successes (any whole number [0,n]).
  • p indicates the probability of success for any one trial.
  • q indicates the probability of failure (not success) for any one trial.
  • P(x) indicates the probability of getting exactly x successes in n trials.

The Binomial Formula.

The formula for calculating P(x) is as follows:
P(x) = nCx · p^x · q^(n−x), where x = 0, 1, 2, ..., n

Here nCx has the usual definition as entries from Pascal's Triangle and can be defined as n! divided by (x! · (n−x)!). The symbol !, the factorial symbol, is shorthand for the product of all the natural numbers up to that number. Thus, 4! = 4 · 3 · 2 · 1 = 24. By definition and convention, 0! = 1. Note that if p = q = ½, the distribution will be symmetric due to the symmetry in Pascal's Triangle. In chapter 7 we will examine some cases where p does not equal q, i.e. p ≠ ½.
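The formula can be checked with a short Python sketch (the function name is ours, not from the lesson); math.comb supplies nCx directly:

```python
from math import comb  # comb(n, x) = n! / (x! * (n - x)!), an entry of Pascal's Triangle

def binomial_pmf(x, n, p):
    """P(x) = nCx * p^x * q^(n-x): probability of exactly x successes in n trials."""
    q = 1 - p
    return comb(n, x) * p**x * q**(n - x)

# The defective-chip example from above: probability of exactly 2 defective
# chips in 5 independent selections when p = 0.2.
print(round(binomial_pmf(2, 5, 0.2), 4))  # 0.2048
```

Note that summing P(x) over x = 0, 1, ..., n always gives 1, since some number of successes must occur.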

Example: 10 coins are flipped and each coin has a probability of 50% of coming up heads. What is the distribution of expected number of heads up?
Solution: From Pascal's Triangle we find row 10 gives us the following: 1, 10, 45, 120, 210, 252, 210, 120, 45, 10, 1. This tells us how many different arrangements there are that have 0, 1, 2, 3, etc. heads. There are 2^10 = 1024 different arrangements total, so the corresponding probabilities are these counts divided by 1024 (e.g., P(exactly 5 heads) = 252/1024 ≈ 0.246).
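The solution above can be reproduced with a brief Python sketch (variable names are ours):

```python
from math import comb

n = 10
counts = [comb(n, x) for x in range(n + 1)]  # row 10 of Pascal's Triangle
total = 2 ** n                               # 1024 equally likely arrangements
probs = [c / total for c in counts]          # probability of 0, 1, ..., 10 heads

print(counts)      # [1, 10, 45, 120, 210, 252, 210, 120, 45, 10, 1]
print(probs[5])    # P(exactly 5 heads) = 252/1024 = 0.24609375
print(sum(probs))  # total probability is 1.0
```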

Since this distribution is symmetric, the mean is clearly 5.0. We will give formulae for calculating the mean and standard deviation for general binomial distributions in lesson 7.

As the number of coin flips increases, the binomial distribution, although discrete, looks more and more like the normal distribution.

The Bell-shaped, Normal, Gaussian Distribution

It can be shown under very general assumptions that the distribution of independent random errors of observation takes on a normal distribution as the number of observations becomes large. This is a general natural phenomenon. The French mathematician De Moivre (1667-1754) developed the general equation from observations of games of chance. Later Gauss characterized this distribution, and hence it is often named after him. It is also shaped like a bell, hence yet another name. The terms used in the title above are rather redundant, but serve to emphasize that the three are identical. You can graph this curve on a calculator as seen below by entering the following function: y = e^(−x²/2) / √(2π), where e is the transcendental number 2.71828... and π is the more familiar, but also transcendental, number 3.14159.... The √(2π) in the formula only serves to normalize the total area under the curve. When we normalize something, we make it equal to some norm or standard, usually one (1). The word normal has several other meanings, including perpendicular and the usual/status quo.

[graph of bell-shaped curve]           The standard normal distribution

The height of the curve represents the probability of the measurement at that given distance away from the mean. The total area under the curve being one represents the fact that we are 100% certain (probability = 1.00) the measurement is somewhere. Technically, this is the standard normal curve, which has µ = 0.0 and σ = 1.0. Other applications of the normal curve do not have this restriction. For example, intelligence has often been cast, albeit controversially, as normally distributed with µ = 100.0 and σ = 15.0. This is represented below. Our function has been modified to y = e^(−(x−µ)²/(2σ²)) / (σ√(2π)).

Normally distributed IQs           [graph of iq distribution]
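Both density formulas above can be evaluated directly in Python (a sketch; the helper name is ours). The general formula reduces to the standard normal when µ = 0 and σ = 1:

```python
from math import exp, sqrt, pi

def normal_pdf(x, mu=0.0, sigma=1.0):
    """Height of the normal curve y = e^(-(x-mu)^2/(2 sigma^2)) / (sigma * sqrt(2 pi))."""
    return exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / (sigma * sqrt(2 * pi))

print(round(normal_pdf(0), 4))             # peak of the standard normal, about 0.3989
print(round(normal_pdf(100, 100, 15), 4))  # peak of the IQ curve, about 0.0266
```

The σ in the denominator is why the wider IQ curve has a much lower peak: the area under each curve must still be one.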

Other things which may take on a normal distribution include body temperature, shoe sizes, diameters of trees, etc. It is also important to note the symmetry of the normal curve. Some curves may be slightly distorted or truncated beyond certain limits, but still primarily conform to a "heap" or "mound" shape. This is often an important consideration when analyzing data or samples taken from some unknown population.

We need to differentiate between a set of data which is normally distributed and THE normal distribution. THE normal distribution is a gold standard to which other distributions are compared, whereas various sets of data may follow, to a good approximation, the normal distribution and hence be termed normally distributed. Many procedures of inferential statistics depend on the underlying data being somewhat normally distributed and/or the various samples possible having a high probability of being normal as well.

The Standard Normal Distribution

As noted above when we specified the standard normal distribution, there is a vast family of different normal distributions, each member of which has a different mean and a different standard deviation. These can be termed non-standard normal distributions. The normal distributions are all related and can be referenced back to the standard normal distribution by use of the standard scores (z-scores) introduced in lesson 3. All normal distributions are symmetric, unimodal, bell-shaped, and have their maximum at the mean=mode=median. All normal distributions are continuous and have asymptotic tails---never touching the x-axis. The standard normal distribution is sometimes called the unit normal distribution. By converting normally distributed scores with an arbitrary mean and standard deviation into z-scores, we transform the data into a standard normal distribution.
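The conversion to z-scores described above is one line of arithmetic per score; a sketch with made-up IQ data (values chosen for illustration):

```python
mu, sigma = 100, 15            # assumed IQ mean and standard deviation
iqs = [70, 85, 100, 115, 130]  # hypothetical scores

z_scores = [(x - mu) / sigma for x in iqs]  # z = (x - mu) / sigma
print(z_scores)  # [-2.0, -1.0, 0.0, 1.0, 2.0]
```

After this transformation the scores have mean 0 and standard deviation 1, so the standard normal table applies to them directly.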

The Empirical Rule

For a normally distributed data set, the empirical rule states that 68% of the data elements are within one standard deviation of the mean, 95% are within two standard deviations, and 99.7% are within three standard deviations. Graphically, this corresponds to the area under the curve as shown below for 1 and 2 standard deviations. The empirical rule is often stated simply as 68-95-99.7. Note how this ties in with the range rule of thumb, by stating that 95% of the data usually falls within two standard deviations of the mean.

[+/- 1 std shaded under normal curve]           Data within 1 (left) and 2 (right)           [+/- 2 std shaded under normal curve]
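The 68-95-99.7 figures follow from the normal cumulative distribution; in Python they can be recovered from the error function (the identity P(|Z| ≤ k) = erf(k/√2) is standard, though not derived in this lesson):

```python
from math import erf, sqrt

def within_k_sigma(k):
    """Proportion of a normal population within k standard deviations of the mean."""
    return erf(k / sqrt(2))

for k in (1, 2, 3):
    print(k, round(within_k_sigma(k), 4))  # 0.6827, 0.9545, 0.9973
```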

The author usually claims an IQ of at least 145. We can see from the above information that this would put him at least three standard deviations above the population mean (100+3•15=145). Hence, if we accept the hypothesis that IQs are normally distributed, at least 99.85% of the population would have a lower IQ and less than 0.15% a higher one. Please especially note that if 99.7% of the population is within three standard deviations of the mean, the remaining 0.3% is distributed with half beyond three standard deviations below the mean and the other half beyond three standard deviations above the mean. This is a result of the symmetry (due to the fact that x is squared, it matters not if it is positive or negative) of the curve. In practical terms, in a population of 250,000,000; 249,625,000 would have an IQ lower than 145 and 375,000 would have an IQ higher. Because of the small area of these regions, they are often referred to as tails. Depending on the circumstances, we may be interested in one tail or two tails.

Several societies exist which cater to individuals with high IQs. Some specific examples would be MENSA, Triple Nine, Mega, etc.

Another important characteristic of this distribution is that it is of infinite extent. In practical terms, IQs below 0 (z = −6.67) or above 210 (z = 7.33) (ceiling scores such as Marilyn vos Savant's are difficult to interpret) do not occur. A recently popularized manufacturing goal has been termed Six Sigma. One would think this would correspond with about 3.4 defects per billion, but their web site implies it is 200 per million. A typically good company operates at less than four sigma, or 99.997% perfect, which corresponds more closely to 32 defects per million. If you have ever purchased a "lemon" (a colloquialism for a bad car, perhaps one built on a Monday) you can appreciate such striving for perfection. Other similar examples would be the large increase in errors related to prescription drugs being dispensed or the case of the Florida patient who had the wrong leg amputated.

Chebyshev's Theorem

Chebyshev's theorem applies to any data set, not only normally distributed data sets. His theorem states that the portion of any set of data within K standard deviations of the mean is always at least 1 − 1/K², where K may be any number greater than 1.

For K = 2, we see that 1 − 1/2² = 1 − 1/4 = 3/4, so at least 75% of the data must always be within two standard deviations of the mean.

For K = 3, we see that 1 − 1/3² = 1 − 1/9 = 8/9, so at least about 89% of the data must always be within three standard deviations of the mean.

If we consider the data set 50, 50, 50, and 100, we will discover that the sample standard deviation (s) is 25, and the upper score falls exactly 2s above the rest. However, since the mean is 62.5, it is well within 2s of the mean. Adding 5 more scores of 50, we find the mean is now 55.6 and the standard deviation 16.7. We see that two standard deviations above the mean now extends to 88.9 and we have one data point outside that, but within three standard deviations. The general concept of being able to find the mean of a data set and determine how much of it is within a certain distance (number of standard deviations) of the mean is an important one which will carry over into inferential statistics.
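Chebyshev's guarantee can be checked on the two small data sets above (a Python sketch, using the sample standard deviation as in the text):

```python
from statistics import mean, stdev

def fraction_within(data, k):
    """Fraction of the data within k sample standard deviations of the mean."""
    m, s = mean(data), stdev(data)
    return sum(abs(x - m) <= k * s for x in data) / len(data)

print(fraction_within([50, 50, 50, 100], 2))  # 1.0 -- all four points within 2s
print(fraction_within([50] * 8 + [100], 2))   # 8/9, still at least 1 - 1/4 = 0.75
print(fraction_within([50] * 8 + [100], 3))   # 1.0, at least 1 - 1/9
```

In both cases the observed fraction meets or exceeds the Chebyshev lower bound, as it must for any data set.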

Meanings of Normal

The word normal is used extensively in math and science and generally has a very context specific meaning which differs from that "normally" encountered. We list here several.
  1. Normal in mathematics and physics often means perpendicular to. A normal vector is perpendicular to a given line or plane.
  2. Normal in statistics generally refers to the gaussian distribution or the "normal" way we would expect errors to be distributed.
  3. Normal in chemistry refers to the molarity or concentration of an acid or base. Specifically, these are related by normality equals the molarity times the number of equivalents, where equivalents refers to the number of reactive units (such as H⁺ ions) per formula unit. For acids like HCl and bases like NaOH the numerical values of normality and molarity are equal. However, H2SO4 has twice the normality for a given molarity.

Using the Standard Normal Distribution

The area under the normal curve is one and the probability of an event under the normal curve occurring is one. In fact, there is a direct relationship between the probability of an event and the area under the curve which corresponds to that event. One of the main branches of calculus, integral calculus, was invented just to be able to find the area under an arbitrary curve. Tables of the area under a normal curve are commonly available, and the ability to read and interpret them is important as well, since the technique will apply to other distributions later.

The table below gives values for the area between z = 0 and a given z, where the first two digits of the z value are read down the left and the second decimal digit is read across the top of the column. Such tables may clarify why z scores are so typically reported to two decimal places! Warning: Although every effort has been made to verify these numbers (on a TI-83 graphing calculator), errors may still be present. Also, the table is somewhat incomplete due to lack of space.

[table of areas under the standard normal curve]


Example: Find the probability for a data value to fall between the mean (z=0.00) and one standard deviation (z=1.00) above the mean, assuming the population is normally distributed.
Solution: The table above gives the value 0.3413 or 34.13%. This is the same as what the empirical rule gives (68÷2).

Example: Find the probability for IQ values between 75 and 130, assuming a normal distribution, mean = 100 and std = 15.
Solution: An IQ of 75 corresponds with a z score of −1.67 and an IQ of 130 corresponds with a z score of 2.00. We can read the value for −1.67 by remembering that the normal distribution is symmetric and then reading the value 0.4525 off the table. For 2.00 we find 0.4772. The probability of an IQ between 75 and 130 is the probability of an IQ between 75 and 100 plus the probability of an IQ between 100 and 130, or 0.4525 + 0.4772 = 0.9297. Including a sketch like those given above is always appropriate.
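These table lookups can be reproduced with Python's statistics.NormalDist, using a library cumulative distribution function in place of the printed table (the helper name is ours):

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal: mu = 0, sigma = 1

def area_mean_to_z(z):
    """Table value: area under the curve between z = 0 and the given z."""
    return abs(Z.cdf(z) - 0.5)  # symmetry handles negative z

z_low = (75 - 100) / 15    # -1.67 to two decimal places
z_high = (130 - 100) / 15  # 2.00

a1 = round(area_mean_to_z(-1.67), 4)  # 0.4525, matching the table
a2 = round(area_mean_to_z(2.00), 4)   # 0.4772
print(a1, a2, round(a1 + a2, 4))      # 0.4525 0.4772 0.9297
```

The same helper reproduces the earlier empirical-rule example: area_mean_to_z(1.00) gives 0.3413.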

In addition to being able to find the percentile rank of a score by finding its z-score, reading the area from the table, adding 0.5, and multiplying by 100, we can use the z-score table in reverse to find the score corresponding to a given percentile rank. Since z-score tables are typically abbreviated, there are some tricks of the trade. Also, the algebra to transform the z-score equation (see below) often slows students down.

Normalized Standard Scores

Percentiles and percentile ranks are limited in their use by their ordinal nature. This limitation can be overcome by generating normalized standard scores. These scores have the advantage of an equal-interval scale, unlike percentiles. A common normalized standard score is the normal curve equivalent score or NCE score. These scores range from 1 to 99 with a mean of 50 and a standard deviation of 21.06. Although NCE scores of 1, 50, and 99 correspond with the same percentiles, the other scores do not. Computing NCE scores is a three step process: 1) convert the raw score to a percentile rank; 2) convert the percentile rank into a z-score using the z-score table inside out; and 3) convert the z-score to the NCE score by using the transformed z-score formula: x' = s·z + x̄ (here x' = 21.06·z + 50).
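Steps 2 and 3 can be sketched in Python; statistics.NormalDist supplies the inverse CDF that using a printed table "inside out" provides, and 21.06 is the commonly cited NCE scale factor (an assumption, since the lesson's table is not reproduced here):

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal

def nce(percentile_rank):
    """Normal curve equivalent: percentile rank -> z -> score with mean 50, SD 21.06."""
    z = Z.inv_cdf(percentile_rank / 100)  # step 2: z-score table "inside out"
    return 21.06 * z + 50                 # step 3: x' = s*z + mean

for pr in (1, 50, 99):
    print(pr, round(nce(pr), 1))  # percentile ranks 1, 50, 99 map to about 1.0, 50.0, 99.0
```

The scale factor 21.06 is exactly what makes percentile ranks 1 and 99 land on NCE scores of 1 and 99; intermediate ranks do not coincide with their NCE scores.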