EDRM611 - Applied Statistics in Education and Psychology I

Objectives for Unit Two
Frequency Distributions and Graphs

1. Know the advantages and disadvantages of frequency distributions and graphs compared with statistics for describing distributions.
Tables and graphs are good for a quick, rough overview of a distribution and for comparing distributions. They are especially useful for evaluating the shape of a distribution. They are not as good as statistics for comparing specific characteristics of distributions, such as central tendency (means) and variability (standard deviations).

2. Know the characteristics of a frequency distribution.
A frequency distribution has at least two columns. The leftmost column lists each variable value found in the data, and the column next to it gives the frequency for that value (the number of cases in the distribution with that value). All of the values belong to a single variable, and the values must be mutually exclusive. Other columns may also be present (percentage, cumulative frequency, cumulative percent, etc.).
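A minimal Python sketch of the columns described above, assuming numeric scores that can be sorted from lowest to highest. The function name frequency_table and the sample scores are hypothetical. Each value found in the data appears once, followed by its frequency, percentage, cumulative frequency, and cumulative percent.

```python
from collections import Counter

def frequency_table(values):
    """Rows of (value, frequency, percent, cumulative frequency, cumulative percent)."""
    counts = Counter(values)          # frequency of each distinct value
    n = len(values)
    rows = []
    cum_f = 0
    for value in sorted(counts):      # each value listed once, lowest first
        f = counts[value]
        cum_f += f
        rows.append((value, f, 100 * f / n, cum_f, 100 * cum_f / n))
    return rows

scores = [3, 5, 4, 3, 5, 5, 2, 4, 5, 3]   # hypothetical data
print("value   f      %  cum f  cum %")
for value, f, pct, cum_f, cum_pct in frequency_table(scores):
    print(f"{value:>5} {f:>3} {pct:>6.1f} {cum_f:>6} {cum_pct:>6.1f}")
```

Because every case has exactly one value (the values are mutually exclusive), the frequencies add up to the number of cases and the cumulative percent ends at 100.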

3. Know the characteristics of a bar chart, line chart (frequency polygon) and histogram, and situations when it is appropriate to use each.
A bar chart is a series of rectangles (bars) with the heights of the rectangles corresponding to the frequency at each variable value. There are spaces between the rectangles to emphasize the qualitative nature of the variable being used. A bar chart is appropriate for nominal and ordinal data.

A histogram is a series of rectangles (bars) with the heights of the rectangles corresponding to the frequency at each variable value. There are no spaces between the rectangles because of the quantitative nature of the variable being used. Histograms are appropriate for discrete data measured on either an interval or ratio scale.

A frequency polygon or line chart is a series of lines joining points corresponding to the frequency at each variable value or interval. If intervals are used, the points are the midpoints of the intervals. Line charts are appropriate for continuous data measured on either an interval or ratio scale. When smoothed, a line chart takes the shape of a curve.
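The following is a rough sketch of how the plotted points of a grouped line chart could be located. It assumes integer scores grouped into intervals given as (lower limit, upper limit, frequency). The names polygon_points and grouped, and the data, are hypothetical. Each point sits at the midpoint of its interval.

```python
def polygon_points(intervals):
    """Return (midpoint, frequency) pairs for intervals given as (lower, upper, f)."""
    # With integer scores the real limits are lower - 0.5 and upper + 0.5,
    # so the interval midpoint is simply (lower + upper) / 2.
    return [((lower + upper) / 2, f) for lower, upper, f in intervals]

grouped = [(60, 69, 4), (70, 79, 4), (80, 89, 1), (90, 99, 1)]  # hypothetical
print(polygon_points(grouped))
# [(64.5, 4), (74.5, 4), (84.5, 1), (94.5, 1)]
```

Joining these points with straight lines produces the frequency polygon.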

4. Know principles for proper construction of bar charts, histograms, and line charts.
Usually the horizontal axis is the variable and the vertical axis is frequency (sometimes converted to percentage).

For histograms and line charts, each variable value from the lowest found to the highest found must be included (this includes those with zero frequencies).
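Here is a minimal sketch of listing every value from the lowest to the highest found, so that values with zero frequency are not silently dropped from a histogram or line chart. It assumes integer-valued scores. The function complete_frequencies and the data are hypothetical.

```python
from collections import Counter

def complete_frequencies(scores):
    """(value, frequency) for every integer from the lowest to the highest score."""
    counts = Counter(scores)  # a Counter returns 0 for values that never occur
    return [(value, counts[value]) for value in range(min(scores), max(scores) + 1)]

scores = [4, 7, 7, 8, 10, 10, 10]  # hypothetical data
print(complete_frequencies(scores))
# [(4, 1), (5, 0), (6, 0), (7, 2), (8, 1), (9, 0), (10, 3)]
```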

It is usually best to have the vertical axis be 3/4 the length of the horizontal axis.

The location of the zero value for the vertical axis must be made clear (by a broken axis if necessary).

5. Know how to compare the shapes of two distributions with different sample sizes on the same axes.
To compare the shapes of two distributions with different numbers of cases on the same axes, it is necessary to use percentage as the vertical axis. Otherwise, the larger sample's higher raw frequencies would distort the comparison.
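A small illustration uses two hypothetical groups, group_a with 10 cases and group_b with 20. The helper percent_distribution is also hypothetical. In raw frequencies, group B (2, 4, 8, 4, 2) is at least as high as group A (1, 2, 3, 2, 2) at every value, simply because it has twice as many cases. Converting each group to percentages puts both on a common scale, so their shapes can be compared directly.

```python
from collections import Counter

def percent_distribution(scores, values):
    """Percentage of cases at each listed value (zero-frequency values included)."""
    counts = Counter(scores)
    n = len(scores)
    return [100 * counts[v] / n for v in values]

values = range(1, 6)
group_a = [1, 2, 2, 3, 3, 3, 4, 4, 5, 5]                  # 10 cases
group_b = [1] * 2 + [2] * 4 + [3] * 8 + [4] * 4 + [5] * 2  # 20 cases

print("A:", percent_distribution(group_a, values))  # [10.0, 20.0, 30.0, 20.0, 20.0]
print("B:", percent_distribution(group_b, values))  # [10.0, 20.0, 40.0, 20.0, 10.0]
```

On the percentage scale, group B turns out to be more sharply peaked at 3, while group A is flatter and heavier at the high end. Raw frequencies would hide this difference.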

6. Know the shape of the following types of distributions, the circumstances in which each occurs, and recognize examples of variables that would result in each shape: symmetrical, positively skewed, negatively skewed, J-shaped, bimodal, U-shaped, normal, and rectangular.

Positively skewed
   Shape: non-symmetrical, with a longer right (high) tail and scores bunched to the left (low)
   Circumstances: a limiting boundary at the low end
   Examples: income, difficult tests

Negatively skewed
   Shape: non-symmetrical, with a longer left (low) tail and scores bunched to the right (high)
   Circumstances: a limiting boundary at the high end
   Examples: easy tests

J-shaped
   Shape: an extremely skewed distribution in which the highest point is at the left or right extreme
   Circumstances: a severe limiting boundary at the low or high end
   Examples: age, extremely easy or hard tests

Bimodal
   Shape: two high points (the points may not be equally high)
   Circumstances: two homogeneous subgroups or two very popular points of view
   Examples: height

U-shaped
   Shape: an extreme bimodal distribution in which the two modes are at the lowest and highest values
   Circumstances: two severe limiting boundaries or an extremely polarized group
   Examples: attitudes toward women's ordination

Normal
   Shape: bell-shaped curve
   Circumstances: many random events influence each value
   Examples: intelligence, height for men

Rectangular
   Shape: equal number of scores at each point
   Circumstances: each value randomly determined
   Examples: ID number
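The "Circumstances" entries for the normal and rectangular shapes can be illustrated with a small simulation. This is only a sketch built on stated assumptions. Twenty fair coin flips per score stand in for "many random events influence each value", and a single draw from 1 to 20 stands in for "each value randomly determined". The seed 611 and the variable names are arbitrary choices.

```python
import random
from collections import Counter

rng = random.Random(611)  # arbitrary fixed seed so the run is reproducible

# Normal-like: each score is the number of heads in 20 fair coin flips,
# i.e. the combined result of many small random events.
bell = Counter(sum(rng.random() < 0.5 for _ in range(20)) for _ in range(10_000))

# Rectangular: each score is drawn directly, with equal probability, from 1..20.
flat = Counter(rng.randint(1, 20) for _ in range(20_000))

for value in range(21):
    print(f"{value:>2} {'*' * (bell[value] // 50)}")  # text histogram of the bell shape
print("rectangular counts range from", min(flat.values()), "to", max(flat.values()))
```

The star chart should peak near 10 and thin out toward 0 and 20. The rectangular counts should all stay close to 1,000 per value.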

7. Know the advantages and disadvantages of grouping data, and when grouping is helpful.
Grouping data makes the characteristics of the data easier to interpret than examining the raw data would. Grouping also loses less detail than describing a group with a single statistic such as a mean or standard deviation.

Grouping does lose precision: any graphs, tables, or statistics generated from grouped data will be less exact than those generated from the raw data. Because computers handle raw data easily, grouped data should not be used for computing statistics. The primary use for grouped data is making graphs and tables.
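To show the loss of precision, the sketch below compares the mean of some hypothetical raw scores with a mean recomputed after grouping. It assumes integer scores and intervals of width 10. It also assumes every score in an interval is replaced by the interval midpoint, a common way of approximating statistics from grouped data. The helper grouped_mean is hypothetical.

```python
from collections import Counter

def grouped_mean(scores, width):
    """Approximate the mean after grouping integer scores into intervals of `width`."""
    # Lower limits are multiples of width (e.g. 60, 70, 80, ...).
    counts = Counter((s // width) * width for s in scores)
    # Every score in an interval is treated as if it fell at the interval midpoint;
    # for integer scores, lower..lower+width-1 has midpoint lower + (width - 1) / 2.
    total = sum(n * (lower + (width - 1) / 2) for lower, n in counts.items())
    return total / len(scores)

scores = [61, 64, 68, 69, 72, 75, 75, 79, 83, 98]  # hypothetical data
raw_mean = sum(scores) / len(scores)    # mean of the raw scores
approx_mean = grouped_mean(scores, 10)  # mean recomputed from grouped data
print(raw_mean, approx_mean)
```

The raw mean is 74.4, while the grouped version gives 73.5. The difference comes entirely from discarding where each score fell inside its interval.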

Grouping is especially helpful when a researcher wants to make a frequency distribution and/or a graph and there are a large number of variable values. The number of subjects is not a factor in deciding whether to group or not.

8. Know the principles involved in and recommendations for selecting the number of intervals, interval sizes, the limits of the intervals, and when to use open-ended intervals for grouping data.
There should be approximately 8-15 intervals. You want neither too many intervals (difficult to interpret) nor too few intervals (lose details concerning the shape of the distribution).

Interval sizes should be multiples of 2, 3, or 5. The ideal interval size is a power of 10 (10, 100, 1000, etc.).

Either the lower limits or the upper limits of the intervals should all be multiples of the interval size.
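The recommendations above can be combined into a rough procedure for choosing an interval size. This is a hypothetical sketch with several assumptions:
- Scores are integers.
- Lower limits are the multiples of the interval size.
- Candidate sizes are limited to powers of 10 first, then 2, 3, or 5 times a power of 10.
- The smallest size that yields 8 to 15 intervals wins.

The function names are illustrative only.

```python
def interval_count(lo, hi, width):
    """Number of intervals needed when every lower limit is a multiple of width."""
    start = (lo // width) * width  # first lower limit at or below the lowest score
    return (hi - start) // width + 1

def interval_limits(lo, hi, width):
    """(lower, upper) limits for integer scores, e.g. (20, 29), (30, 39), ..."""
    start = (lo // width) * width
    return [(lower, lower + width - 1) for lower in range(start, hi + 1, width)]

def choose_width(lo, hi, min_intervals=8, max_intervals=15):
    """Smallest candidate width giving 8-15 intervals, trying powers of 10 first."""
    powers = [10 ** e for e in range(7)]  # 1, 10, ..., 1,000,000
    for multipliers in ((1,), (2, 3, 5)):  # ideal sizes first, then 2, 3, or 5 x 10^k
        for p in powers:
            for m in multipliers:
                width = m * p
                if min_intervals <= interval_count(lo, hi, width) <= max_intervals:
                    return width
    return None  # no candidate fits; widen the candidate list or relax the limits

width = choose_width(23, 97)
print(width, interval_limits(23, 97, width))
```

For scores from 23 to 97, a size of 10 gives 8 intervals (20-29 through 90-99), so the ideal power of 10 is accepted. For scores from 0 to 200, no power of 10 gives 8 to 15 intervals, and the sketch falls back to 20, which gives 11.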

Open-ended intervals should be used when extreme scores would otherwise produce many intervals with zero frequencies. However, when an open-ended interval is used, a graph cannot accurately reflect the distribution of the complete data set, because the open-ended interval has no correct location on the horizontal axis.
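The sketch below groups some hypothetical incomes (in thousands of dollars) into intervals of width 10, with one open-ended top interval. It assumes integer scores and a top lower limit that is a multiple of the interval size. The function group_with_open_top is hypothetical. Without the open-ended interval, the two extreme incomes (250 and 900) would require 85 intervals from 60-69 up to 900-909, of which 83 would be empty.

```python
from collections import Counter

def group_with_open_top(scores, width, top_lower):
    """Closed intervals below top_lower, plus one open-ended 'top_lower+' interval."""
    closed = Counter((s // width) * width for s in scores if s < top_lower)
    rows = [(f"{lo}-{lo + width - 1}", closed[lo])
            for lo in range((min(scores) // width) * width, top_lower, width)]
    rows.append((f"{top_lower}+", sum(1 for s in scores if s >= top_lower)))
    return rows

incomes = [22, 25, 31, 34, 38, 41, 47, 52, 250, 900]  # hypothetical data
for label, f in group_with_open_top(incomes, 10, 60):
    print(f"{label:>6} {f}")
```

The "60+" row works in a table, but it has no definite position or width on a graph's horizontal axis.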