EDRM611 - Applied Statistics in Education and Psychology I

Objectives for Unit Three
Percentiles, Percentile Ranks and Central Tendency

1. Know the meaning of, use for, and recognize examples of a percentage, percentile, and percentile rank.
A percentage is a percent number compared to a standard. The standard is either "perfect" or "everyone." Examples would include "scoring 84% on a test (84% of perfect)" and "64% passed the test (64% of the students)." Percentages are used to describe individual cases (84% on a test) or the distribution as a whole (64% of students).

A percentile is a point that divides a distribution. For example, the 50th percentile divides a distribution into the top and bottom halves. Percentiles are used to describe characteristics of distributions. For example, the 50th percentile of a distribution (the median) is a measure of its central tendency.

A percentile rank is a percent number that indicates the percentage of cases in a distribution below a given variable value. Percentile ranks are used to describe individual cases. For example, standardized test scores are usually reported as percentile ranks. When you take the Graduate Record Examination, you get a percentile rank for each part of the test. A percentile rank of 99 indicates that 99% of the persons in the reference group (norm group) scored below the score that you received.
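
As a rough illustration of the definition above, the following Python sketch computes a percentile rank as the percentage of scores in a reference group that fall below a given score. The function name `percentile_rank` and the norm-group scores are made up for illustration, and published tests may use slightly different conventions (for example, counting half of the tied scores).

```python
def percentile_rank(score, reference_scores):
    """Percentage of reference scores strictly below `score`."""
    below = sum(1 for s in reference_scores if s < score)
    return 100 * below / len(reference_scores)

# Hypothetical norm group of 10 scores
norm_group = [52, 55, 60, 61, 65, 70, 72, 78, 85, 90]

pr = percentile_rank(78, norm_group)
print(pr)  # 70.0 (7 of the 10 scores are below 78)
```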

2. Know the reference points needed to interpret a percentage, percentile, and percentile rank.
To interpret a percentage, it is frequently helpful to know the possible or perfect value or the number of cases (subjects) that was used. For example, if you get heads 100% of the time when you flip a coin, it is helpful to know whether you flipped 1 time or 100 times. If you scored 100% on a test, it is helpful to know whether there was 1 question or 100 questions.

To interpret a percentile, it is helpful to know the maximum possible score. For example, if the 50th percentile on a test is 48, it would be helpful to know whether the maximum possible score is 50 or 100.

To interpret a percentile rank it is essential to know the characteristics of the reference group to which the score is being compared. For example, if you were a senior math major in college, receiving a percentile rank of 99 (top of the group) on a standardized college math test would be good if the group to which you were compared were other senior math majors, but if the group was college seniors in general it would be not nearly as good.

3. Know the measurement scale used for a percentage, percentile, and percentile rank and the types of statistical analysis appropriate for each.
Percentages are measured on a ratio scale no matter what the scale is on the original variable. If you said that 30% of a group is male and 70% female, even though gender is nominal, the percentages are ratio (for example, 60% is twice as much as 30%). Percentages have a meaningful zero, which means no cases. There are no limitations in using percentages in statistical analyses.

Percentiles are measured on the same scale as the original variable. They are not normally used in computing other statistics.

Percentile ranks are reported on an ordinal scale, which means that they are not appropriate for most statistical analyses. You should not compute the mean percentile rank nor compute a correlation between two sets of percentile ranks. Percentile ranks are normally just used to describe individual cases, not group characteristics.

4. Know the values that are appropriate for percentile ranks.
Percentile ranks from 1 to 99 are always reported as integers (no decimals). Occasionally values below 1 and above 99 are reported with decimal points. Values of 0 and 100 are not used.

5. Know the meaning of a quartile.
There are three quartiles (1st, 2nd, and 3rd) that divide a distribution into quarters (top 25% of cases, next 25%, etc.). The 1st quartile is the 25th percentile, the 2nd quartile is the 50th percentile (the median), and the 3rd quartile is the 75th percentile.
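
A minimal sketch of finding the three quartiles, assuming Python's standard `statistics` module and a made-up list of scores. Software packages use several slightly different interpolation rules for quartiles, so the 1st and 3rd quartiles may differ a little from one tool to another; the 2nd quartile should match the median.

```python
import statistics

scores = [2, 4, 4, 5, 6, 7, 8, 9, 10, 12, 15]  # hypothetical scores

# n=4 asks for the cut points that split the data into four parts
q1, q2, q3 = statistics.quantiles(scores, n=4)
print(q1, q2, q3)

# The 2nd quartile is the same point as the median
print(q2 == statistics.median(scores))  # True
```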

6. Know the meaning of deviation score.
A deviation score is the distance of a score from some point. If no indication is given of what the reference point is, it is assumed to be the mean of the distribution.

7. Know the characteristics of measures of the mean, median, and mode.
The mean is the point at the "mathematical center" or "balance point" of the distribution. It usually does not correspond to an actual score. The sum of the deviations around the mean equals zero. The sum of the squared deviations around the mean is a minimum (less than around any other point).

The median is the point dividing the distribution in half. It frequently does not correspond to an actual score in the distribution. The median is the point around which the sum of the absolute values of the deviations is a minimum (less than any other point).
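
The properties of the mean and median described above can be checked on a small example. This is only a sketch with made-up scores; the helper names `sum_dev`, `sum_sq_dev`, and `sum_abs_dev` are hypothetical.

```python
import statistics

scores = [1, 2, 3, 4, 10]  # hypothetical scores
mean = statistics.mean(scores)      # 4
median = statistics.median(scores)  # 3

def sum_dev(point):
    """Sum of the deviation scores around `point`."""
    return sum(x - point for x in scores)

def sum_sq_dev(point):
    """Sum of the squared deviations around `point`."""
    return sum((x - point) ** 2 for x in scores)

def sum_abs_dev(point):
    """Sum of the absolute deviations around `point`."""
    return sum(abs(x - point) for x in scores)

print(sum_dev(mean))                           # 0
print(sum_sq_dev(mean), sum_sq_dev(median))    # 50 55 (smaller around the mean)
print(sum_abs_dev(median), sum_abs_dev(mean))  # 11 12 (smaller around the median)
```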

The mode is the score which is the most common (highest frequency) in the distribution. If there are two or more scores with the same frequency, there would be two or more modes (bimodal, trimodal, etc. distributions).

8. Know how to compute the mean from raw scores.
The mean is computed by summing the scores and dividing by N.
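
For example, with five made-up raw scores the computation could be sketched as follows (the variable names are illustrative only):

```python
scores = [4, 8, 6, 5, 7]  # hypothetical raw scores

n = len(scores)       # N = 5
total = sum(scores)   # sum of the scores = 30
mean = total / n
print(mean)  # 6.0
```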

9. Know how to compute the median from raw scores and from a frequency distribution.
If there is an odd number of scores the median is the middle score when the scores are ranked from highest to lowest. If there is an even number of scores the median is halfway between the middle two scores. For example, if there are 11 scores, the median is the 6th score from the bottom or top of the distribution; if there are 10 scores, the median is halfway between the 5th and 6th scores from the bottom or top of the distribution. In the following frequency distribution which has 17 scores, the median would be the 9th score from the bottom which would be a 2 since both the 9th and 10th scores are 2.

X    f
4    3
3    4
2    2
1    8
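
The same result can be reproduced with a short Python sketch. It expands the frequency distribution above into raw scores and then picks the middle score; the function name `median_of` is hypothetical, and this simple approach does not interpolate within tied scores.

```python
def median_of(scores):
    """Middle score (odd N) or halfway between the middle two (even N)."""
    ordered = sorted(scores)
    n = len(ordered)
    middle = n // 2
    if n % 2 == 1:
        return ordered[middle]
    return (ordered[middle - 1] + ordered[middle]) / 2

# Frequency distribution from the example: score -> frequency
freq = {4: 3, 3: 4, 2: 2, 1: 8}

# Expand into raw scores: each score repeated f times
raw_scores = [x for x, f in freq.items() for _ in range(f)]

print(len(raw_scores))        # 17
print(median_of(raw_scores))  # 2 (the 9th score from the bottom)
```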

10. Know how to compute the mode from raw scores and from a frequency distribution.
The mode is the score that occurs the most frequently. In a frequency distribution it would be the score with the largest frequency. In the example in the previous objective, the mode would be 1 since there are eight 1's, which makes 1 the most common score.
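
A minimal sketch of both cases, using the frequency distribution from the previous objective and a made-up list of raw scores. `statistics.multimode` (Python 3.8 or later) returns every score tied for the highest frequency.

```python
import statistics

# From a frequency distribution: score -> frequency
freq = {4: 3, 3: 4, 2: 2, 1: 8}
highest = max(freq.values())
modes = [x for x, f in freq.items() if f == highest]
print(modes)  # [1]

# From raw scores (hypothetical data with two modes)
raw_scores = [5, 7, 7, 8, 9, 9]
raw_modes = statistics.multimode(raw_scores)
print(raw_modes)  # [7, 9], a bimodal distribution
```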

11. Know situations when the mean, median, and mode are preferred.
The mean is preferred when a precise measure of the group is desired and every score is important. It is usually the best description of the total group for research purposes. Another reason the mean is preferred for research is that in most populations the mean, median, and mode are similar (and in a normal distribution they are identical). It is not appropriate to use the mean when there are extreme cases in the distribution that should not be used in the description.

The median is preferred in this case (when there are extreme scores to be ignored). The median is not as sensitive to changes in scores as the mean and therefore is not as good for a precise description of the group. The median is a better indicator of the "typical" person in the group since half of the scores are above the median and half are below it.

The mode is only preferred when the most common score is desired, when there is more than one mode, or with nominal data when the most common category is desired.

12. Know the effect changes in scores or extreme scores have on the mean, median and mode.
If one score is added to a distribution, the mean will change unless the new score is equal to the mean. If the score is an extreme score, the mean may change a great deal. The median may change, but only very slightly, no matter what the new score is. The mode will probably not change unless the frequency of the new score's value is equal to or one less than the frequency of the existing mode.
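
The effect of one extreme score can be seen in a small example. The scores below are made up for illustration.

```python
import statistics

scores = [10, 12, 13, 15, 20]   # hypothetical scores
mean_before = statistics.mean(scores)      # 14
median_before = statistics.median(scores)  # 13

with_extreme = scores + [100]   # add one extreme score
mean_after = statistics.mean(with_extreme)
median_after = statistics.median(with_extreme)

print(mean_before, round(mean_after, 2))  # 14 28.33 (large change)
print(median_before, median_after)        # 13 14.0  (small change)
```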

13. Know the stability of the mean, median and mode.
The mean is the most stable of the three sample measures in terms of estimating the population parameter. If means, medians, and modes were computed repeatedly from small samples drawn from a larger population of scores, the means would vary less from sample to sample than the medians or the modes.
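
This can be illustrated with a small simulation. The sketch below assumes a made-up population of roughly normal scores (mean 50, standard deviation 10) and compares only the mean and the median; the mode is left out because it is not very meaningful for continuous simulated scores. The population, sample size, and number of samples are all arbitrary choices, and the result may not hold for every population shape.

```python
import random
import statistics

random.seed(0)  # fixed seed so the run is repeatable

# Hypothetical population: 10,000 roughly normal scores
population = [random.gauss(50, 10) for _ in range(10_000)]

sample_means = []
sample_medians = []
for _ in range(2000):
    sample = random.sample(population, 10)  # small sample, N = 10
    sample_means.append(statistics.mean(sample))
    sample_medians.append(statistics.median(sample))

# Variability of each statistic across the repeated samples
var_of_means = statistics.pvariance(sample_means)
var_of_medians = statistics.pvariance(sample_medians)
print(var_of_means, var_of_medians)
print(var_of_means < var_of_medians)
```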

14. Know the symbols for population and sample means.
The sample mean is called "bar-X" and is written as a capital X with a bar over it. The population mean is called "mu" and is written as the lowercase Greek letter µ (the Greek m).

15. Know the value of the term "average."
The term "average" is not a term used in statistics or in research. It is a non-professional term used to indicate central tendency. Since the mean, median, or mode may each be the best indicator of central tendency or "average", and each of these terms means a different thing and each has disadvantages, the type of statistic used must be indicated in research communication and the word average should not be used.

16. Know how to use the mean and median to estimate the shape of a distribution.
If the mean is much higher or lower than the median, it suggests either a few extreme scores or a skewed distribution. The mean is above the median in a positively skewed distribution and below the median in a negatively skewed distribution. The extreme scores in the tail of the distribution pull the mean in that direction. The length of the tail has no effect on the median.
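
The comparison can be sketched as a simple rule of thumb. The function name `skew_hint` and the scores are hypothetical, and the result is only a hint about the direction of skew, not a formal measure of it.

```python
import statistics

def skew_hint(scores):
    """Rough guess at the direction of skew from the mean and median only."""
    mean = statistics.mean(scores)
    median = statistics.median(scores)
    if mean > median:
        return "positive"
    if mean < median:
        return "negative"
    return "none apparent"

high_tail = [1, 2, 2, 3, 3, 3, 20]        # one extreme high score
low_tail = [1, 18, 18, 19, 19, 20, 20]    # one extreme low score

print(skew_hint(high_tail))  # positive (mean above median)
print(skew_hint(low_tail))   # negative (mean below median)
```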

17. Be able to estimate the mean and median from a general description of scores.
If no information is known about the shape of the distribution, the median is best estimated by taking a point halfway between the highest and lowest point, excluding all extreme points.

The mean is then adjusted up or down from this point based on whether the predominance of scores is above or below the estimated median and on where the extreme scores are located.

18. Know the meaning and uses for unweighted and weighted means.
Unweighted and weighted means are means of groups rather than means of cases. An unweighted mean of two or more groups is the mean of the group means, ignoring the number of subjects in each group. The weighted mean of two or more groups is equivalent to the mean that would be computed if group membership were ignored and the mean of all the cases were computed. In computing the weighted mean, the number of cases in each group weights the mean of the group when the group means are combined.

The unweighted mean is useful when each group is equally important (the number of cases in the groups is not related to the importance of the groups). The weighted mean is useful when each case is equally important and each group is not equally important.
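
The difference between the two can be sketched with two made-up groups of unequal size. The group names and scores are hypothetical.

```python
# Hypothetical groups of unequal size
group_a = [78, 82]                  # N = 2, mean = 80
group_b = [55, 60, 65, 58, 62, 60]  # N = 6, mean = 60

mean_a = sum(group_a) / len(group_a)
mean_b = sum(group_b) / len(group_b)

# Unweighted mean: mean of the group means, ignoring group sizes
unweighted = (mean_a + mean_b) / 2

# Weighted mean: each group mean is weighted by its number of cases
total_n = len(group_a) + len(group_b)
weighted = (mean_a * len(group_a) + mean_b * len(group_b)) / total_n

# Mean of all cases with group membership ignored
pooled = sum(group_a + group_b) / total_n

print(unweighted)        # 70.0
print(weighted, pooled)  # 65.0 65.0 (closer to the larger group's mean)
```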

19. Know how to estimate unweighted and weighted means.
The unweighted mean is estimated in the same way as the mean is estimated. It is simply the mean of the group means. The sample size of each group is ignored.

To estimate the weighted mean, first the unweighted mean is estimated and then an adjustment is made, taking into consideration the size of the groups. The means of the larger groups are more important (the weighted mean will be closer to the mean of the larger groups than the smaller groups).