
An Introduction to Statistics - Lesson 3

Averages:
Mean, Mode, Median, or Midrange?

Lesson Overview:

Averages

Average most often refers to the arithmetic mean, but the term is ambiguous
and may also be used to refer to the mode, median, or midrange.

You should always clarify which average is being used, preferably by using a more specific term. Averages give us information about a typical element of a data set. They are measures of central tendency.

Mean most often refers to the arithmetic mean, but is also ambiguous.
Unless specified otherwise, we will assume arithmetic mean whenever the term mean is used.

The Arithmetic Mean is obtained by summing all elements of the data set
and dividing by the number of elements.

A host of other means and their methods of computation will be discussed in lesson 4.

Symbolically, the arithmetic mean is expressed as $\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}$, where $\bar{x}$ (pronounced "x-bar") is the arithmetic mean for a sample and $\Sigma$ is the capital Greek letter sigma, which indicates summation. $x_i$ refers to each element of the data set as $i$ ranges from 1 to $n$, and $n$ is the number of elements in the data set. The equation is essentially the same for finding a population mean; however, the symbol for the population mean is the small Greek letter µ (mu). As we will also see in lesson 5, Roman letters usually represent sample statistics, whereas Greek letters usually represent population parameters.
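
As a minimal sketch, the definition of the arithmetic mean can be written in Python as follows. The function name arithmetic_mean is chosen here for illustration only and is not part of the lesson.

```python
def arithmetic_mean(data):
    """Return the sum of the elements divided by the number of elements (n)."""
    n = len(data)
    if n == 0:
        raise ValueError("the mean of an empty data set is undefined")
    return sum(data) / n


print(arithmetic_mean([2, 4, 6, 8]))  # 20 / 4 = 5.0
```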

Sample Size is the number of elements in a sample. It is referred to by the symbol n.

Be sure to use a lower case n for sample size. An upper case N refers to Population Size, unless being used in the context of a normally distributed population.

Mode is the data element which occurs most frequently.

A useful mnemonic is to alliterate the words mode and most. Alliterations are words that start with the same sound, as in "seven slippery slimy snakes...".

Some data sets contain no repeated elements. In this case, there is no mode (or the mode is the empty set). It is also possible for two or more elements to be repeated with the same frequency. In these cases, there are two or more modes and the data set is said to be bimodal or multimodal. In the rare instance of a uniform or nearly uniform distribution, one where each element is repeated the same or nearly the same number of times, one could term it multimodal, but some authors invoke subjectivity by specifying multimodality only when separate, distinct, and fairly high peaks (ignoring fluctuations due to randomness) occur.
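
The cases above (no mode, one mode, several modes) can be sketched in Python. The function name modes is hypothetical, and the sketch assumes the convention stated in the text that a data set with no repeated elements has no mode, which is returned here as an empty list.

```python
from collections import Counter


def modes(data):
    """Return a sorted list of the most frequent elements.

    Assumed convention (following the text): if no element is
    repeated, the data set has no mode and the list is empty.
    """
    counts = Counter(data)
    if not counts:
        return []
    highest = max(counts.values())
    if highest == 1:
        return []
    return sorted(value for value, count in counts.items() if count == highest)


print(modes([1, 1, 2, 4, 7]))   # [1]
print(modes([1, 1, 2, 2, 3]))   # [1, 2]  (bimodal)
print(modes([1, 7, 8, 9, 10]))  # []      (no mode)
```

Under this sketch, a uniform data set such as 1, 1, 2, 2 is reported as multimodal, which is one of the two readings the text allows.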

The Median is the middle element when the data set is arranged in order of magnitude.

A useful mnemonic is to remember that the median is the grassy strip (in the rural area of the midwest where I come from) that divides opposing lanes in a highway. It is in the middle.

If there are an odd number of data elements, the median is a member of the data set. If there are an even number of data elements, the median is computed as the arithmetic mean of the middle two.
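
The odd and even cases can be sketched in Python as follows. The function name median is chosen for illustration; the data are sorted first because the definition requires order of magnitude.

```python
def median(data):
    """Return the middle element of the sorted data, or the arithmetic
    mean of the middle two when the number of elements is even."""
    ordered = sorted(data)
    n = len(ordered)
    if n == 0:
        raise ValueError("the median of an empty data set is undefined")
    middle = n // 2
    if n % 2 == 1:
        return ordered[middle]  # odd n: a member of the data set
    return (ordered[middle - 1] + ordered[middle]) / 2  # even n


print(median([9, 1, 8, 10, 7]))  # 8
print(median([4, 1, 3, 2]))      # 2.5
```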

The median has other names which will be studied in lesson 7. The symbol $\tilde{x}$ (pronounced "x-tilde") is sometimes used for the median, but will not be used here.

The Midrange is the arithmetic mean of the highest and lowest data elements.

Midrange is a type of average. Range is a measure of dispersion and will be studied in lesson 5. A common mistake is to confuse the two.

Symbolically, midrange is computed as $(x_{max} + x_{min})/2$.
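
To keep midrange and range apart, here is a small Python sketch of both. The function names midrange and data_range are hypothetical; the range is computed as the highest element minus the lowest, which is the usual definition.

```python
def midrange(data):
    """Average: the arithmetic mean of the highest and lowest elements."""
    if not data:
        raise ValueError("the midrange of an empty data set is undefined")
    return (max(data) + min(data)) / 2


def data_range(data):
    """Dispersion: the highest element minus the lowest element."""
    if not data:
        raise ValueError("the range of an empty data set is undefined")
    return max(data) - min(data)


scores = [1, 7, 8, 9, 10]
print(midrange(scores))    # (10 + 1) / 2 = 5.5
print(data_range(scores))  # 10 - 1 = 9
```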

The Best Average

The ambiguity of the term average lends itself to deception, and statisticians are often cast as liars as a result. Note how advertisers may distort statistics to pursue their goals.

Some basic facts regarding averages are as follows.

  1. Mean, median, and midrange always exist and are unique.
  2. Mode may not be unique or may not even exist.
  3. Mean and median are very common and familiar.
  4. Mode is used less frequently; midrange is rarely used.
  5. Only the mean is "reliable" in that it utilizes every data element.
  6. The midrange, and to a lesser extent the mean, can be distorted by extreme data elements (see lesson 8).
  7. The mode is the only appropriate average for nominal data.
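
Fact 6 can be illustrated with a short Python sketch. The two data sets below are made up for illustration and differ only in their largest element; the helper names are hypothetical.

```python
def mean(data):
    return sum(data) / len(data)


def median(data):
    ordered = sorted(data)
    middle = len(ordered) // 2
    if len(ordered) % 2 == 1:
        return ordered[middle]
    return (ordered[middle - 1] + ordered[middle]) / 2


def midrange(data):
    return (max(data) + min(data)) / 2


typical = [7, 8, 9, 10, 11]        # made-up data, no extreme element
with_outlier = [7, 8, 9, 10, 100]  # same data with one extreme element

for label, data in (("typical", typical), ("with outlier", with_outlier)):
    print(label, mean(data), median(data), midrange(data))
```

In this sketch the median stays at 9 in both cases, the mean moves from 9.0 to 26.8, and the midrange moves from 9.0 to 53.5.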

Round-off Rules

The mode, if it exists, and possibly the median are elements of the data set. As such, they should be specified no more accurately than the original data set elements.

The midrange and possibly the median are the arithmetic mean of two data set elements. One additional significant digit may be necessary to accurately convey this information.

The number of significant digits for the mean should conform to one of the following rules.

  1. The significant digits should be no more than the number of significant digits in the sum of the data elements. Since the sample size (n) is an exact value, it has no effect on the number of significant digits obtained from the division. Those rules were outlined in Numbers lesson 9. This is sometimes simplified as a rule of thumb by stating that the mean should be given to one more decimal place than the original data. However, this assumes the data set is small (n < 100) and that the data was recorded to a consistent precision. On a historical note, the term rule of thumb apparently does not come from any old English law to limit the size of stick which a husband could use to beat his wife, as previously stated. However, abusive relationships still remain an often hidden societal problem.
  2. The number of significant digits should be consistent with the precision obtained for the standard deviation. This concept is expanded upon in lesson 5 after measures of dispersion are discussed.
  3. It is not uncommon in science for results to be left in, and interim calculations sometimes rounded to, three significant digits, which is about all you could get out of a slide rule. Hence, this was commonly termed slide rule accuracy. In pre-calculator days, this also made hand calculations easier.

The important thing to remember is not to write down twelve decimal places without good reason, even though your calculator will often display that many.

Presenting more than five significant digits is probably a joke and points will be deducted!
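
The rule of thumb from item 1 (one more decimal place than the original data) can be sketched in Python. The function name format_mean and the parameter data_decimals are hypothetical, and the sketch assumes a small data set recorded to a consistent number of decimal places.

```python
def format_mean(data, data_decimals):
    """Return the mean as text with one more decimal place than the data.

    data_decimals is the number of decimal places in the original data
    (0 for whole numbers). A string is returned so that a trailing zero
    such as the one in 7.0 is preserved.
    """
    mean = sum(data) / len(data)
    return f"{mean:.{data_decimals + 1}f}"


print(format_mean([1, 7, 8, 9, 10], 0))  # 7.0
print(format_mean([2.5, 3.5, 4.0], 1))   # 3.33
```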

In 1894 the physicist Michelson, apparently quoting Kelvin, said: "it seems probable that most of the grand underlying principles have now been firmly established and that further advances are to be sought chiefly in the rigorous application of these principles to all the phenomena which come under our notice....future truths of physical science are to be looked for in the sixth place of decimals." Relativity and quantum mechanics soon revolutionized physics, and we were soon looking at details in the ninth place! My dissertation reported results of the cesium D1 transition centroid frequency as: 335 116 048 748.2(2.4) kHz.

Examples

Near its end, the homework for statistics lesson 2 had the question:

17. What is the average of: 1, 1, 2, 4, 7?

As we have seen in this lecture, this is a rather ambiguous question and the answers 1 (mode), 2 (median), 3.0 (mean), and 4.0 (midrange) are all possible and correct!
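
The four answers can be checked with Python's standard statistics module, as a sketch. The module provides mode, median, and mean but no midrange function, so the midrange is computed directly from max and min.

```python
import statistics

data = [1, 1, 2, 4, 7]

mode = statistics.mode(data)            # most frequent element
median = statistics.median(data)        # middle element of the sorted data
mean = statistics.mean(data)            # sum divided by n
midrange = (max(data) + min(data)) / 2  # not provided by the module

# Formatting follows the round-off rules: mode and median as recorded,
# mean and midrange with one extra decimal place.
print(f"mode={mode} median={median} mean={mean:.1f} midrange={midrange:.1f}")
# mode=1 median=2 mean=3.0 midrange=4.0
```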


Example: A sample of size 5 (n=5) is taken of student quiz scores with the following results: 1, 7, 8, 9, 10.

Answer: The mean is (1+7+8+9+10)/5 = 35/5 = 7.0 (note one more decimal place is given).

All scores occur only once, hence there is no mode. The median score is 8 (not 8.0). The midrange is (10+1)/2 = 5.5 (note the extra decimal place is required).

An extreme score (1) distorts the mean, so perhaps the median is a better measure of central tendency. For a larger data set, this could be further defined in terms of the symmetry and skewness of the data set: the median, and generally the mean, lie to the left of the mode (negatively skewed), to the right of the mode (positively skewed), or at the mode (zero skewness). It is more common for data to be positively skewed, since many quantities have a lower limit but no upper limit, so exceptionally large values are easier to obtain. A case in point would be annual earnings. Our left tail is cut off by zero, whereas our right tail is extremely skewed by the likes of Bill Gates and Warren Buffett.
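
As a rough sketch only, the direction of skew can be guessed by comparing the mean with the median. This is a simplifying heuristic assumed here for illustration, not the lesson's definition (which is stated relative to the mode), and it is crude for data sets this small. The function name skew_direction and the earnings figures are made up.

```python
import statistics


def skew_direction(data):
    """Guess the skew direction from the sign of (mean - median)."""
    mean = statistics.mean(data)
    median = statistics.median(data)
    if mean < median:
        return "negative"
    if mean > median:
        return "positive"
    return "zero"


quiz_scores = [1, 7, 8, 9, 10]    # mean 7.0 is left of median 8
earnings = [20, 25, 30, 35, 400]  # made-up figures, mean 102 is right of median 30

print(skew_direction(quiz_scores))  # negative
print(skew_direction(earnings))     # positive
```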


Further examples involving the TI-83+ graphing calculator will be given with the data presented as Stem-and-Leaf Diagrams and Frequency Distribution Tables.
