An Introduction to Statistics
Review of Statistics Lessons 7 and 8
Lesson 7: Measurements of Position
- z-scores indicate in units of standard deviation how far an element is from the mean.
- Positive z-scores are above the mean; negative scores are below the mean.
- Thus the formula is z = (element - mean) / standard deviation.
- Traditionally, z-scores are rounded to two decimal places and are also known as standard scores.
- z-scores make it easier to compare scores with differing means/standard deviations.
- An example might be test scores (70, 15), IQs (100, 15), ACT (21, 4.7), and SAT (1020, 157).
- Data elements more than 2 standard deviations from the mean are unusual; those within 2 standard deviations are ordinary.
- Data are ranked when arranged in [numeric] order.
- The median divides a data set into a bottom half and a top half.
- Similarly, the three quartiles, Q1, Q2, and Q3, divide a data set into four quarters.
- The left and right hinges correspond with Q1 and Q3, respectively, but definition nuances exist.
- Outliers are extreme values in a data set and are often classified as mild or extreme.
- An outlier is hard to define, but should be easy to recognize.
- The interquartile range Q3-Q1 is not sensitive to outliers.
- The semi-interquartile range: (Q3-Q1)/2 is another measure of dispersion.
- The midquartile (Q1+Q3)/2 is another measure of central tendency.
- The 9 deciles: D1, D2,... D9 divide a data set into 10 parts.
- The 99 percentiles: P1, P2,... P99 divide a data set into 100 parts.
- Q2, D5, and P50 are synonyms for median; there is no 100th percentile.
- In the percentile locator formula: L=kn/100 (k is the percentile, n is the number of data elements), L must be rounded UP when it is not a whole number.
- The 10-90 percentile range is another measure of dispersion: P90 - P10.
- The 5-number summary is: minimum, Q1, median, Q3, maximum.
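The Lesson 7 formulas can be tried out with a short Python sketch. It is only an illustration: the sample data and the two scores being compared (an ACT of 27 and an SAT of 1300) are made up, and the function names are hypothetical. The percentile function assumes one common textbook convention: a fractional locator L is rounded up, and a whole-number L averages the L-th and (L+1)-th ranked values. Other conventions exist and give slightly different quartiles.

```python
import math
import statistics


def z_score(element, mean, std_dev):
    """Standard score: (element - mean) / standard deviation, to 2 places."""
    return round((element - mean) / std_dev, 2)


def is_unusual(z):
    """More than 2 standard deviations from the mean counts as unusual."""
    return abs(z) > 2


def percentile(ranked, k):
    """k-th percentile (k = 1..99) of ranked data via the locator L = k*n/100.

    Assumed convention: fractional L is rounded up; whole-number L averages
    the L-th and (L+1)-th values (positions count from 1).
    """
    n = len(ranked)
    if (k * n) % 100 != 0:                  # L is not a whole number
        position = math.ceil(k * n / 100)   # round UP
        return ranked[position - 1]
    position = k * n // 100
    return (ranked[position - 1] + ranked[position]) / 2


def five_number_summary(data):
    """Return (minimum, Q1, median, Q3, maximum)."""
    ranked = sorted(data)
    return (ranked[0], percentile(ranked, 25), percentile(ranked, 50),
            percentile(ranked, 75), ranked[-1])


# Comparing scores from scales with different means/standard deviations.
act_z = z_score(27, 21, 4.7)        # 1.28
sat_z = z_score(1300, 1020, 157)    # 1.78, the better relative score

# Made-up sample data, n = 12.
sample = [12, 4, 40, 2, 9, 7, 15, 5, 8, 18, 4, 10]
minimum, q1, median, q3, maximum = five_number_summary(sample)

iqr = q3 - q1                       # interquartile range
semi_iqr = iqr / 2                  # semi-interquartile range
midquartile = (q1 + q3) / 2         # a measure of central tendency
ranked_sample = sorted(sample)
range_10_90 = percentile(ranked_sample, 90) - percentile(ranked_sample, 10)

# z-score of the largest element, using the sample mean and standard deviation.
max_z = z_score(maximum, statistics.mean(sample), statistics.stdev(sample))

print("5-number summary:", (minimum, q1, median, q3, maximum))
print("IQR:", iqr, "semi-IQR:", semi_iqr, "midquartile:", midquartile)
print("10-90 percentile range:", range_10_90)
print("z-score of maximum:", max_z, "unusual:", is_unusual(max_z))
```

With this sample the median of 8.5 is barely affected by the extreme value 40, which is the same reason the interquartile range is not sensitive to outliers.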
Lesson 8: Summarizing and Displaying Data
- Frequency tables list data categories/classes in one column and frequencies in another.
- Class limits are the largest or smallest numbers which can actually belong to each class.
- Class boundaries are the numbers which separate classes--halfway between the limits.
- Class marks are the midpoints of the classes.
- Class width is the difference between two class boundaries.
- Relative frequency tables use percentages or decimal fractions instead of counts.
- Cumulative frequency tables include all occurrences less than the given value.
- A histogram or bar graph/chart uses the vertical axis for frequency and the horizontal axis for classes.
- The skewness of a sample/population should become apparent.
- A relative frequency histogram uses relative frequency on the vertical scale.
- An ogive is a cumulative frequency polygon: the points where the tops of the bars would be are joined.
- A Pareto chart is a bar graph for qualitative data, with the bars arranged in descending order of frequency.
- Pie charts are yet another way to display relative proportions of a data set.
- Stem-and-leaf diagrams are part of exploratory data analysis.
- Please omit commas and horizontal lines, and put your data in order.
- Rules for split/combined stems and multidigit leaves will not be covered here.
- Plan to have between 5 and 20 stems.
- A boxplot or box and whiskers plot visually displays the 5-number summary.
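The Lesson 8 vocabulary can be illustrated with a small Python sketch. It is a sketch under stated assumptions, not a definitive method: the scores are made up, the function names are hypothetical, the data are assumed to be non-negative two-digit integers, and the classes are assumed to have integer limits 50-59, 60-69, and so on, so each boundary sits 0.5 beyond the limits.

```python
def frequency_table(data, first_lower_limit, class_width, num_classes):
    """Build one row per class: limits, boundaries, mark, and frequencies.

    Assumes integer data, so the upper limit of a class is one less than
    the lower limit of the next class.
    """
    rows = []
    cumulative = 0
    n = len(data)
    for i in range(num_classes):
        lower = first_lower_limit + i * class_width
        upper = lower + class_width - 1
        count = sum(1 for x in data if lower <= x <= upper)
        cumulative += count
        rows.append({
            "limits": (lower, upper),                   # can belong to the class
            "boundaries": (lower - 0.5, upper + 0.5),   # separate the classes
            "mark": (lower + upper) / 2,                # class midpoint
            "frequency": count,
            "relative": count / n,                      # decimal fraction
            "cumulative": cumulative,                   # running total
        })
    return rows


def stem_and_leaf(data):
    """Map each tens digit (stem) to its ordered units digits (leaves)."""
    stems = {}
    for value in sorted(data):
        stems.setdefault(value // 10, []).append(value % 10)
    return stems


# Made-up test scores, n = 20.
scores = [52, 58, 61, 64, 67, 70, 71, 73, 75, 78,
          79, 82, 84, 85, 88, 91, 93, 96, 67, 74]

table = frequency_table(scores, first_lower_limit=50, class_width=10,
                        num_classes=5)
for row in table:
    low, high = row["limits"]
    print(f"{low}-{high}  mark={row['mark']}  f={row['frequency']}  "
          f"rel={row['relative']:.2f}  cum={row['cumulative']}")

# Class width as the difference between the two boundaries of a class.
width = table[0]["boundaries"][1] - table[0]["boundaries"][0]   # 10.0

# Stem-and-leaf display: no commas, no horizontal lines, leaves in order.
plot = stem_and_leaf(scores)
for stem, leaves in plot.items():
    print(stem, "|", " ".join(str(leaf) for leaf in leaves))
```

Plotting the "cumulative" column against the upper class boundaries would give the points of an ogive, and the "frequency" column gives the bar heights of the histogram.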