An Introduction to Statistics

Review of Statistics Lessons 7 and 8

Lesson 7: Measurements of Position

z-scores indicate in units of standard deviation how far an element is from the mean.
Positive z-scores are above the mean; negative scores are below the mean.
Thus the formula is z = (element - mean) / standard deviation.
Traditionally, z-scores are rounded to two decimal places and are also known as standard scores.
z-scores make it easier to compare scores with differing means/standard deviations.
An example might be test scores (70, 15), IQs (100, 15), ACT (21, 4.7), and SAT (1020, 157), each listed as (mean, standard deviation).
Data elements more than 2 standard deviations from the mean are unusual; those within 2 standard deviations are ordinary.
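
The z-score formula and the 2-standard-deviation rule can be sketched in a few lines of Python. The function names and the two individual scores (an ACT of 28, an SAT of 1250) are hypothetical; the means and standard deviations are the ones listed above.

```python
def z_score(element, mean, std_dev):
    """Return the z-score, rounded to two decimal places by tradition."""
    return round((element - mean) / std_dev, 2)


def is_unusual(z):
    """Rule of thumb: more than 2 standard deviations from the mean."""
    return abs(z) > 2


# Hypothetical scores compared on a common scale.
act_z = z_score(28, 21, 4.7)       # ACT: mean 21, standard deviation 4.7
sat_z = z_score(1250, 1020, 157)   # SAT: mean 1020, standard deviation 157

print("ACT z-score:", act_z, "unusual:", is_unusual(act_z))
print("SAT z-score:", sat_z, "unusual:", is_unusual(sat_z))
```

Because both scores are expressed in standard deviations from their own mean, the larger z-score marks the relatively better result even though the raw scales differ.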
Data are ranked when arranged in [numeric] order.
The median divides a data set into a bottom half and a top half.
Similarly, the three quartiles, Q₁, Q₂, and Q₃ divide a data set into four quarters.
The left and right hinges correspond to Q₁ and Q₃ respectively, but definition nuances exist.
Outliers are extreme values in a data set and are often classified as mild or extreme.
An outlier is hard to define, but should be easy to recognize.
The interquartile range Q₃-Q₁ is not sensitive to outliers.
The semi-interquartile range, (Q₃-Q₁)/2, is another measure of dispersion.
The midquartile, (Q₁+Q₃)/2, is another measure of central tendency.
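
A minimal sketch of the quartile-based measures follows. Since quartile definitions vary, it assumes one common convention: Q₁ and Q₃ are the medians of the bottom and top halves, with the overall median left out of both halves when the count is odd. The data values and function name are hypothetical.

```python
from statistics import median


def quartiles(data):
    """Return (Q1, Q2, Q3) using the median-of-halves convention (an assumption)."""
    ranked = sorted(data)
    n = len(ranked)
    half = n // 2
    lower = ranked[:half]
    # For an odd count, skip the middle element so it belongs to neither half.
    upper = ranked[half + 1:] if n % 2 else ranked[half:]
    return median(lower), median(ranked), median(upper)


data = [1, 3, 4, 6, 7, 8, 10, 13, 14, 18, 50]   # hypothetical data; 50 is extreme
q1, q2, q3 = quartiles(data)

iqr = q3 - q1                 # interquartile range
semi_iqr = (q3 - q1) / 2      # semi-interquartile range
midquartile = (q1 + q3) / 2   # midquartile

print(q1, q2, q3, iqr, semi_iqr, midquartile)
```

Replacing the 50 with 500 leaves Q₁ and Q₃ unchanged under this convention, which illustrates why the interquartile range is not sensitive to outliers.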
The 9 deciles: D₁, D₂,... D₉ divide a data set into 10 parts.
The 99 percentiles: P₁, P₂,... P₉₉ divide a data set into 100 parts.
Q₂, D₅, and P₅₀ are synonyms for the median; there is no 100th percentile.
In the percentile locator formula L = k•n/100, where k is the percentile and n is the number of data values, L must be rounded UP.
The 10-90 percentile range, P₉₀ - P₁₀, is another measure of dispersion.
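
The locator formula can be sketched as follows. This version applies only the round-up rule stated above; some textbooks instead average two adjacent values when L comes out as a whole number, and that refinement is not shown. The function name and the data are hypothetical.

```python
import math


def percentile(data, k):
    """Return the k-th percentile using the locator L = k*n/100, rounded up."""
    ranked = sorted(data)
    n = len(ranked)
    locator = math.ceil(k * n / 100)   # round UP to a position in the ranked data
    return ranked[locator - 1]         # positions are 1-based, list indexes 0-based


data = [3 * i for i in range(1, 26)]   # 25 hypothetical values: 3, 6, ..., 75

p10 = percentile(data, 10)
p90 = percentile(data, 90)
print("P10:", p10, "P90:", p90, "10-90 percentile range:", p90 - p10)
```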
The 5-number summary is: minimum, Q₁, median, Q₃, maximum.
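
A short sketch of the 5-number summary, with a check for mild and extreme outliers. Two assumptions are made here that the notes above do not state: the quartiles come from the Python standard library's "inclusive" interpolation method (one of several definitions), and outliers are classified with the commonly used fences at 1.5 and 3 interquartile ranges beyond the quartiles. The function names and data are hypothetical.

```python
from statistics import quantiles


def five_number_summary(data):
    """Return (minimum, Q1, median, Q3, maximum)."""
    q1, q2, q3 = quantiles(data, n=4, method="inclusive")
    return min(data), q1, q2, q3, max(data)


def classify(value, q1, q3):
    """Label a value using 1.5*IQR and 3*IQR fences (an assumed convention)."""
    iqr = q3 - q1
    if value < q1 - 3 * iqr or value > q3 + 3 * iqr:
        return "extreme outlier"
    if value < q1 - 1.5 * iqr or value > q3 + 1.5 * iqr:
        return "mild outlier"
    return "not an outlier"


data = [1, 3, 4, 6, 7, 8, 10, 13, 14, 18, 50]   # hypothetical data
minimum, q1, med, q3, maximum = five_number_summary(data)
print(minimum, q1, med, q3, maximum)
print("Largest value is:", classify(maximum, q1, q3))
```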

Lesson 8: Summarizing and Displaying Data

Frequency tables list data categories/classes in one column and frequencies in another.
Class limits are the largest or smallest numbers which can actually belong to each class.
Class boundaries are the numbers which separate classes; each lies halfway between the upper limit of one class and the lower limit of the next.
Class marks are the midpoints of the classes.
Class width is the difference between the upper and lower boundaries of a class.
Relative frequency tables use percentages or decimal fractions instead of counts.
Cumulative frequency tables include all occurrences less than the given value.
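
The class vocabulary above can be tied together in one small sketch. It assumes whole-number data and equal-width classes, so that boundaries fall half a unit outside the limits; the scores, class limits, and function name are hypothetical.

```python
def frequency_table(data, lower_limits, width):
    """Build one row per class for integer data with equal-width classes."""
    rows = []
    cumulative = 0
    for lower in lower_limits:
        upper = lower + width - 1                 # upper class limit
        freq = sum(lower <= x <= upper for x in data)
        cumulative += freq                        # running total of frequencies
        rows.append({
            "limits": (lower, upper),
            "boundaries": (lower - 0.5, upper + 0.5),
            "mark": (lower + upper) / 2,          # class midpoint
            "frequency": freq,
            "relative": freq / len(data),
            "cumulative": cumulative,
        })
    return rows


# Hypothetical test scores grouped into classes 50-59, 60-69, ..., 90-99.
scores = [52, 55, 61, 64, 67, 68, 70, 73, 75, 76, 78, 81, 84, 88, 93]
for row in frequency_table(scores, [50, 60, 70, 80, 90], 10):
    print(row["limits"], row["boundaries"], row["mark"],
          row["frequency"], round(row["relative"], 2), row["cumulative"])
```

Each cumulative entry counts the occurrences below that class's upper boundary, so the last entry equals the total number of scores.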
A histogram or bar graph/chart uses the vertical axis for frequency and the horizontal axis for classes.
The skewness of a sample/population should become apparent.
A relative frequency histogram uses relative frequency on the vertical scale.
An ogive is a cumulative frequency polygon, formed by joining the points where the tops of the bars would be.
A Pareto chart is a bar graph for qualitative data, with the bars arranged in decreasing order of frequency.
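
A Pareto chart orders its bars from the most frequent category to the least. A rough text-only sketch of that ordering follows; the survey responses and the function name are hypothetical.

```python
from collections import Counter


def pareto_order(responses):
    """Return (category, frequency) pairs from most to least frequent."""
    return Counter(responses).most_common()


# Hypothetical qualitative data: how students travel to class.
responses = ["car", "bus", "car", "walk", "car", "bus",
             "walk", "car", "bus", "bike", "car"]

for category, freq in pareto_order(responses):
    print(f"{category:<5} {'#' * freq} ({freq})")
```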
Pie charts are yet another way to display relative proportions of a data set.
Stem-and-leaf diagrams are part of exploratory data analysis.
Please omit commas and horizontal lines, and put the leaves in order.
Rules for split or combined stems and multidigit leaves will not be covered here.
Plan to have between 5 and 20 stems.
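
A minimal sketch of a stem-and-leaf diagram, assuming non-negative whole numbers with the tens as stems and the units as single-digit leaves. The function name and scores are hypothetical; leaves are written in order with no commas.

```python
from collections import defaultdict


def stem_and_leaf(data):
    """Return the diagram as a list of text lines, one per stem."""
    stems = defaultdict(list)
    for value in sorted(data):              # sorting puts the leaves in order
        stems[value // 10].append(value % 10)
    lines = []
    # Include every stem in the range, even one with no leaves.
    for stem in range(min(stems), max(stems) + 1):
        leaves = "".join(str(leaf) for leaf in stems.get(stem, []))
        lines.append(f"{stem:>2} | {leaves}")
    return lines


scores = [52, 55, 61, 64, 67, 68, 70, 73, 75, 76, 78, 81, 84, 88, 93]
for line in stem_and_leaf(scores):
    print(line)
```

These scores produce 5 stems, the low end of the suggested range.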
A boxplot, or box-and-whisker plot, visually displays the 5-number summary.

e-mail: calkins@andrews.edu
voice/mail: 269 471-6629/ BCM&S Smith Hall 106; Andrews University;
classroom/FAX: 269 471-6646; Smith Hall 100/ 269 471-3713; Berrien Springs, MI, 49104-0140
home: 269 473-2572; 610 N. Main St.; Berrien Springs, MI 49103-1013
URL: http://www.andrews.edu/~calkins/math/webtexts/stat78rv.htm