EDRM611 - Applied Statistics in Education and Psychology I

Objectives for Unit Six
Confidence Intervals and Hypothesis Testing

1. Know the meaning of random, biased and representative as applied to samples and sampling procedures and recognize examples of each.
The terms random and biased refer to the process by which elements are selected from a population. A random method is one in which each element in the population has an equal chance of being selected. A biased method is one in which some elements in the population have a greater chance of being selected than others. Neither random nor biased is concerned with the characteristics of the sample. Representative is the term used to indicate whether the sample is similar to the population on specific characteristics. A representative sample is one in which sample characteristics are the same as population characteristics. A representative sample can result from either a random or biased sampling procedure. A random sampling procedure is the procedure most likely to result in a representative sample.

2. Know the procedures for determining whether a sample is representative.
With very large samples selected by a truly random process, representativeness is usually taken as a given. It is assumed that randomness works when it is done over and over again many times. With small samples, randomness cannot be trusted as much to give a representative sample, so there is usually a report on the similarity between the sample and the population on characteristics for which the population characteristics are known or can be estimated fairly accurately. For example, demographic characteristics of the population are usually available and can be reported for both the population and the sample. If you find that a geographical area, school, gender group, age group, etc. is over- or under-represented, you would question the representativeness of the sample on the actual variables being studied.

3. Know the procedures used for selecting a random sample.
The normal way to select a sample is to assign each element in the population a number and then randomly pick a set of numbers to use in selecting the elements for the sample. Prior to computers, tables of random numbers were the usual way of selecting random numbers. The current procedure is to have a computer select the random numbers. Other methods such as writing numbers on balls (used in lotteries) and writing numbers on slips of paper are seldom used in research.
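The computer-based procedure described above can be sketched in a few lines of Python. This is only an illustration: the population size of 500, the sample size of 25, the seed, and the function name `select_random_sample` are assumptions made for the example, not values from the course.

```python
import random

def select_random_sample(population_size, sample_size, seed=None):
    """Pick element numbers (1..population_size) at random, without replacement."""
    rng = random.Random(seed)  # a seed makes the selection repeatable
    # Every numbered element has an equal chance of being chosen.
    return sorted(rng.sample(range(1, population_size + 1), sample_size))

# Hypothetical population of 500 numbered elements, sample of 25.
chosen = select_random_sample(population_size=500, sample_size=25, seed=611)
print(chosen)
```

The numbers returned are then matched to the numbered list of population elements to identify who or what is in the sample.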

4. Know the meaning of sampling error and its implication.
Whenever a sample is selected in which some elements of the population are not selected, it is assumed (and is almost always true) that the sample is not identical to the population (not truly representative). Therefore any research done with sample data must take into consideration the fact that the results will not be exactly true for the population. This sampling error must be considered in the conclusions.

5. Know the meaning of sampling distribution.
A sampling distribution is a type of frequency distribution in which each element is a statistic based on a sample. In a regular frequency distribution the elements are characteristics of persons or things. In a sampling distribution the elements are characteristics of samples with each sample having the same sample size (N).

6. Know the types of sampling distributions.
An "empirical" sampling distribution is based on statistics from "real" samples. Actual data is collected from many samples and a sampling distribution is constructed. In real life this is never done except by simulation using computers. A "theoretical" sampling distribution is one generated by a formula which gives the shape of a curve that would result if statistics were computed based on an infinite number of samples. Computers are used to generate large numbers of samples to form an empirical distribution to validate or verify that the theoretical sampling distributions are accurate.

7. Know the meaning of the expected value as it relates to a sampling distribution and how this relates to biased and unbiased statistics.
If statistics are computed on an infinite number of samples, the mean of the statistics is called the expected value. If the expected value of the statistic is equal to the population parameter, the statistic is called an unbiased statistic. If the expected value is not equal to the population parameter, it is a biased statistic. For example, the formula for the variance (the squared standard deviation) using N in the denominator is a biased statistic, since using it to compute statistics for an infinite number of samples would not give the population variance on average. N-1 must be used in the denominator to give an unbiased estimate of the variance.
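The bias of the N-denominator formula can be checked by simulation. The sketch below works with the variance (the squared standard deviation) and uses a large but finite number of samples in place of an infinite number. The population (normal, mean 50, standard deviation 10), the sample size of 5, and the seed are all assumptions chosen for illustration.

```python
import random

rng = random.Random(1)          # fixed seed so the run is repeatable
POP_MEAN, POP_SD = 50.0, 10.0   # assumed population; its variance is 100
N = 5                           # size of each sample
REPS = 20000                    # stand-in for "an infinite number of samples"

sum_var_n = 0.0
sum_var_n_minus_1 = 0.0
for _ in range(REPS):
    sample = [rng.gauss(POP_MEAN, POP_SD) for _ in range(N)]
    mean = sum(sample) / N
    sum_sq = sum((x - mean) ** 2 for x in sample)
    sum_var_n += sum_sq / N               # N in the denominator
    sum_var_n_minus_1 += sum_sq / (N - 1)  # N-1 in the denominator

# Averages over all samples approximate the expected values.
mean_var_n = sum_var_n / REPS
mean_var_n_minus_1 = sum_var_n_minus_1 / REPS
print(round(mean_var_n, 1), round(mean_var_n_minus_1, 1))
```

Under these assumptions the N-1 average should settle near the population variance of 100, while the N average should settle near 80, which is too small by the factor (N-1)/N.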

8. Know the meaning and symbol for standard error.
The standard error is the standard deviation of the sampling distribution. With an empirical distribution it can be computed just like a regular standard deviation. With a theoretical distribution, there are other formulas that can be used. These formulas can be validated or verified by generating an empirical distribution of many samples using a computer.

The symbol for standard error is the standard deviation symbol (s or σ) with a subscript appropriate for the sample statistic used for the sampling distribution (p or X̄).

Standard errors can be interpreted based on the normal curve in the same way as the standard deviation. When the sampling distribution is normal, about 68% of the statistics in it will be within one standard error of the mean of the sampling distribution.
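As a rough check of these ideas, the sketch below builds an empirical sampling distribution of the mean and compares its standard deviation with the usual theoretical formula for the standard error of the mean, the population standard deviation divided by the square root of N. The population values (mean 100, standard deviation 15), N = 25, the number of samples, and the seed are assumptions for the example.

```python
import math
import random
import statistics

rng = random.Random(2)
POP_MEAN, POP_SD = 100.0, 15.0   # assumed normal population
N = 25                           # size of each sample
REPS = 10000                     # number of simulated samples

# Empirical sampling distribution: one mean per simulated sample.
sample_means = [
    statistics.fmean([rng.gauss(POP_MEAN, POP_SD) for _ in range(N)])
    for _ in range(REPS)
]

# Standard error computed like a regular standard deviation.
empirical_se = statistics.pstdev(sample_means)
# Standard error from the theoretical formula.
theoretical_se = POP_SD / math.sqrt(N)

# Share of sample means within one standard error of the population mean.
share_within_one_se = sum(
    abs(m - POP_MEAN) <= theoretical_se for m in sample_means
) / REPS

print(round(empirical_se, 2), theoretical_se, round(share_within_one_se, 3))
```

The two standard errors should agree closely, and the share of means within one standard error should be close to 68%.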

9. Know how various factors affect the size of the standard error and the expected value.
The size of the standard error is dependent on the type of statistic used. For example, estimating the population median will have more error than estimating the population mean. The larger the variability of a characteristic in a population, the larger the error in estimating that population characteristic. The larger the sample size, the smaller the error in estimating any characteristic of the population from the sample.

10. Know the upper and lower limits for the standard error of the mean.
With a sample size of 1, the standard error of the mean will be at its maximum size, which is equal to the standard deviation of the population. The smallest the standard error can be is 0, which would occur if the sample size equalled the population size.
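Both limits can be seen in a formula. The sketch below assumes simple random sampling without replacement and uses the standard finite population correction, which is not stated in these notes; the function name and the example numbers are hypothetical.

```python
import math

def standard_error_of_mean(pop_sd, sample_size, population_size=None):
    """Standard error of the mean: pop_sd / sqrt(sample_size).

    If population_size is given, the finite population correction
    sqrt((population_size - sample_size) / (population_size - 1)) is applied.
    pop_sd is the population standard deviation (N in the denominator).
    """
    se = pop_sd / math.sqrt(sample_size)
    if population_size is not None:
        se *= math.sqrt((population_size - sample_size) / (population_size - 1))
    return se

# Hypothetical population of 1000 with a standard deviation of 10.
print(standard_error_of_mean(10, 1, 1000))     # sample size of 1
print(standard_error_of_mean(10, 100, 1000))   # in between
print(standard_error_of_mean(10, 1000, 1000))  # sample is the whole population
```

With a sample size of 1 the function returns the population standard deviation, and when the sample size equals the population size it returns 0.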

11. Know how various factors affect the shape of the sampling distribution of the mean.
If the distribution of a characteristic of a population is normally distributed, the sampling distribution of the mean of that characteristic will also be normally distributed no matter what the sample size. If the distribution of the characteristic in the population is not normally distributed the extent to which the sampling distribution is normally distributed will depend on sample size. The greater the sample size, the closer the sampling distribution is to being normally distributed. This characteristic is called the Central Limit Theorem.

12. Know the sample size needed to assume that the sampling distribution of the mean is normal.
With sample sizes above 30 it can be assumed that the sampling distribution of the mean will be close to normally distributed no matter what the shape of the distribution of the characteristic in the population is.
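The effect of sample size on the shape of the sampling distribution can be illustrated by simulation. The sketch below draws from a strongly skewed population (an exponential distribution, chosen here only as an example of a non-normal shape) and measures the skewness of the sample means for three sample sizes. The sample sizes, number of samples, seed, and the `skewness` helper are assumptions for the illustration.

```python
import random
import statistics

def skewness(values):
    """Average cubed z-score; 0 for a symmetric distribution."""
    m = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return statistics.fmean([((v - m) / sd) ** 3 for v in values])

rng = random.Random(3)
REPS = 5000  # number of simulated samples per sample size

skew_by_n = {}
for n in (1, 5, 30):
    # Each entry is the mean of one sample of size n from a skewed population.
    means = [
        statistics.fmean([rng.expovariate(1.0) for _ in range(n)])
        for _ in range(REPS)
    ]
    skew_by_n[n] = skewness(means)

for n, skew in skew_by_n.items():
    print(n, round(skew, 2))
```

The skewness of the sample means should shrink toward 0 (the value for a normal curve) as the sample size goes from 1 to 5 to 30.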

13. Know the meaning of statistical inference.
Inference is the process of taking data from a sample selected from a larger population and drawing conclusions about the population from results derived from analysis of the sample data.

14. Know the meaning of hypothesis testing and the form used in stating conclusions to a hypothesis test.
Hypothesis testing has two components: stating a hypothesis about a population parameter or many population parameters, and drawing a yes/no conclusion as to whether the hypothesized statement is likely to be true or not. The conclusion will be expressed in terms of level of significance which indicates a probability associated with the likelihood that the conclusion is false.

As used in this course, the hypotheses deal with testing whether a population parameter is equal to a certain value or whether there is a difference between two population parameters. For example, we could test whether the percentage of people passing a stat test on the first attempt is really 80% or whether the mean score on the first stat test is different from the mean on the second test.

The conclusion to a hypothesis test is normally stated by saying that there is a significant difference between the values being compared at a specific level of significance. For example, "there is a significant difference between the means of the experimental and control groups at the .05 level."

15. Know the meaning of a confidence interval and the form with which it is stated.
A confidence interval is a statement giving a range of values within which the researcher is confident that a population parameter lies.

A confidence interval includes an interval in the form of either a range of numbers (10 - 20) or a number plus/minus another number (15 ± 5) and a percentage number indicating the degree of confidence that the population parameter is in the interval. For example you could say that the 95% confidence interval for a mean is 10-20.

A common erroneous statement is to say that there is a 95% probability that the population parameter is within this interval. This is incorrect since the population parameter is either in the interval (100% probability) or not in the interval (0% probability). The probability is only in your certainty, not in the actual fact.

We can say before the confidence interval is computed that there will be a 95% probability that the interval to be formed will include the population parameter. In other words, for every 100 intervals we form, about 95 of them on average will include the population parameter.
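This long-run interpretation can be checked by simulation. The sketch below forms many 95% intervals for a mean from repeated samples and counts how often the interval includes the population mean. The population (normal, mean 50, standard deviation 10, with the standard deviation treated as known), the sample size of 40, the number of intervals, and the seed are assumptions for the illustration.

```python
import math
import random
import statistics
from statistics import NormalDist

rng = random.Random(4)
POP_MEAN, POP_SD = 50.0, 10.0   # assumed population values
N = 40                          # size of each sample
REPS = 2000                     # number of intervals formed

z = NormalDist().inv_cdf(0.975)  # two-tailed z for 95% confidence (about 1.96)
se = POP_SD / math.sqrt(N)       # standard error of the mean

hits = 0
for _ in range(REPS):
    sample_mean = statistics.fmean(
        [rng.gauss(POP_MEAN, POP_SD) for _ in range(N)]
    )
    lower, upper = sample_mean - z * se, sample_mean + z * se
    if lower <= POP_MEAN <= upper:   # did this interval catch the parameter?
        hits += 1

coverage = hits / REPS
print(coverage)
```

The proportion of intervals that include the population mean should be close to .95, although any single interval either includes it or does not.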

16. Know advantages and disadvantages of point estimates, confidence intervals, and hypothesis testing.
Both point estimates and hypothesis testing have a similar advantage in that they allow you to make conclusions about a single number. A point estimate gives you a "best guess" of what the parameter value is, and a hypothesis test allows you to test whether a hypothesized value is reasonable to assume as the parameter value. This advantage of both point estimates and hypothesis tests of dealing with a single value is also a disadvantage of both procedures. Since you know that the point estimate is almost surely inaccurate, you have no idea of how sure to be of that estimate. With hypothesis testing all you usually know is that the hypothesized value is not a reasonable value for estimating the population parameter, but then you do not know what it is likely to be. A confidence interval is frequently better than both of these since it gives a range of values within which you can be confident to a certain degree that the parameter resides. A point estimate gives you only one number, a hypothesis test gives you one number and a probability, while a confidence interval gives you a range of numbers (an error factor) and a probability number.

In the most common research situation a difference between the means of two different situations is being studied. A point estimate might say that there is a 10 point difference between the groups. A hypothesis test would say that there is a significant difference between the groups at the .05 level and the sample difference is 10 points, while a confidence interval would say that the researcher is 95% confident that the difference between the two groups is somewhere between 4-16 points.

17. Know how hypothesis testing can be done using confidence intervals.
If a confidence interval does not include zero, then it can be said that the parameter is different from zero at the level of significance corresponding to the confidence level. For example, if a 95% confidence interval of a mean is 4-10 (which does not include zero), the mean is significantly different from zero at the .05 level.

18. Know appropriate ways to act when dealing with statements of uncertainty concerning inferential research results.
Since almost all research dealing with people gives results in which error is present (point estimates are not without error), there are appropriate and inappropriate ways to deal with the conclusions.

While the results of research do specify that error is present and attempt to estimate the degree of error and the probability of error of a given size to occur, the conclusions are frequently stated in a confident manner that does not suggest that error is present. For example, a conclusion might state "Method A works better than Method B in teaching spelling." The result of the research might have stated that the researcher is 99% confident that Method A is between 4 and 8 points better than Method B. Researchers know that their conclusions might be wrong while they are stating them as being true. The consumer needs to know this assumption.

The two inappropriate ways to deal with research results are the overly confident approach which does not allow any consideration that the conclusions might be incorrect and the overly timid approach which does not allow for any action or belief unless there is no error or absolute certainty. One extreme results in dogmatism and inflexibility while the other results in perpetual doubt, uncertainty, inertia and immobility.

19. Know where to get the values needed for computing a confidence interval.
The center point for a confidence interval is the point estimate computed from the sample data. The z value (later we will also use t) is the two-tailed value found in a normal curve table. The standard error value is computed from the sample data.

20. Know how various factors affect the size of confidence intervals.
Confidence intervals are narrower (less error) with larger samples and if less confidence is required. A 95% confidence interval is not as wide as a 99% confidence interval. To be 100% confident you need the maximum width confidence interval (the possible range of values). The width of the confidence interval also increases as the variability of the characteristic in the population increases.
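The ingredients of a confidence interval (point estimate, two-tailed z value, standard error) and the factors that change its width can be sketched as follows. The function name and all example numbers are hypothetical, and the standard error of a mean is taken to be the sample standard deviation divided by the square root of N.

```python
import math
from statistics import NormalDist

def z_confidence_interval(point_estimate, sample_sd, n, confidence=0.95):
    """Interval of the form: point estimate +/- z * standard error."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # two-tailed z value
    standard_error = sample_sd / math.sqrt(n)
    margin = z * standard_error
    return point_estimate - margin, point_estimate + margin

def width(interval):
    return interval[1] - interval[0]

# Hypothetical sample: mean 15, standard deviation 5, N = 100.
base = z_confidence_interval(15.0, 5.0, 100, confidence=0.95)
more_confident = z_confidence_interval(15.0, 5.0, 100, confidence=0.99)
larger_sample = z_confidence_interval(15.0, 5.0, 400, confidence=0.95)
more_variable = z_confidence_interval(15.0, 10.0, 100, confidence=0.95)

for label, ci in [("95%, N=100", base), ("99%, N=100", more_confident),
                  ("95%, N=400", larger_sample), ("95%, SD=10", more_variable)]:
    print(label, round(ci[0], 2), round(ci[1], 2), "width", round(width(ci), 2))
```

Comparing the widths shows each effect separately: more confidence or more variability widens the interval, and a larger sample narrows it.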

21. Know the meaning of the null hypothesis and the form in which it is expressed.
A null hypothesis states there is no difference between two or more things (or that two or more things are equal). This difference may be between a parameter and a specified value, between a difference between parameters and a specified value, or other things not considered in this course. For example a null might state that a population mean is equal to 50 (not different from 50), or that two population percentages are not different (are equal).

22. Know the purpose of stating a null hypothesis.
The null hypothesis is stated as a specific value that can be rejected. Most research studies do not have a specific value other than zero that can be defended as a logical choice of a specific value to use in stating the null hypothesis. The most common null hypothesis is to state that there is no difference between parameters for two or more groups. The specific value tested in this case is zero (zero difference). Usually the researcher believes that the null hypothesis is false and hopes to reject it. The degree to which the null hypothesis is false is not known and thus cannot be stated with certainty. When comparing two things, the most common question is "Are they different?" rather than "How different are they?"

23. Know the meaning of the alternative hypothesis and the form in which it is expressed.
An alternative hypothesis is the conclusion that will be accepted if the null hypothesis is rejected. It can be expressed in either "a difference exists" form or "greater than or less than" form. For example, the alternative hypothesis could be "the mean is not 50" or "the mean is greater than 50." The alternative hypothesis is usually the only logical alternative to the null hypothesis.

24. Know the meaning of one and two tailed tests of significance and situations when each is appropriate.
A one-tailed test is a test of the null hypothesis in which the alternative hypothesis is a "greater than or less than" alternative. In this case rejection of the null hypothesis can only result in conclusions in the direction specified in the alternative hypothesis. If a null hypothesis states that Methods A and B are equal and the alternative hypothesis states that Method A is greater than Method B, then if the sample results indicate that Method B is greater than Method A, the null hypothesis could not be rejected, since the alternative hypothesis does not allow that option to be considered. A two-tailed test has an alternative hypothesis of "a difference exists." A conclusion can be either "greater than" or "less than."

A one-tailed test is only appropriate when the researcher feels that a real difference in the population parameters can only exist in one direction (Method A cannot be less than Method B, or the population mean cannot be less than 50). In most cases this is not appropriate. The few cases in which it is appropriate would be when studying a gain from one time to another when a loss is not possible. For example, you could study whether eating spinach makes you grow taller. You could assume that eating spinach might have no effect, but it would never make you shorter. Most researchers use two-tailed tests since in most things we test there could be a difference in either direction.

25. Know the meaning of alpha (level of significance).
Alpha stands for the level of significance which is a percentage number indicating how willing the researcher is to reject a true null hypothesis. For example, if the level of significance is set at .05 it indicates that in situations when the null hypothesis is true (no difference between population characteristics), we will reject the null hypothesis erroneously 5% of the time. This is not the same as saying that we will reject the null hypothesis erroneously 5% of the time when we use the 5% level.

26. Know how to report significant results.
There are two common ways to report significant results. The most common way used to be placing an asterisk (*) by the side of the test. This allowed a yes/no conclusion indicating whether the probability of a Type I error was larger or smaller than the indicated level of significance. Since computers can give exact probabilities with no effort, a growing number of studies now report the exact probability in addition to or as a replacement for the asterisk.

27. Know the meaning of and symbols for Type I and Type II error.
A Type I error is made when the null hypothesis is erroneously rejected: the sample results indicate a significant difference while in truth, there is no difference in the population. A Type II error is made when the null hypothesis is erroneously retained: the sample results indicate that there is not a significant difference while there is actually a difference in the population. The probability of a Type I error is α (alpha) and the probability of a Type II error is β (beta).

28. Know how various factors affect Type I and Type II error and how to reduce both.
The researcher has complete control over Type I error since it is only affected by the level of significance selected. If the level of significance chosen is .05, there is a 5% probability of Type I error no matter what conditions exist.

A researcher can influence the probability of a Type II error by varying sample size and the level of significance. Larger samples will increase the probability of finding a significant difference and reduce the probability of a Type II error (saying there is no difference). Raising the level of significance (e.g., .05 rather than .01) reduces the probability of a Type II error (while increasing the probability of a Type I error). The probability of a Type II error is also affected by the size of the difference existing in the population (effect size). The larger the difference, the less likely a Type II error will occur. Type II error can also be reduced by using research procedures that reduce error (use the same subjects in both groups, use more reliable tests, etc.).

The recommended way to reduce the risk of both Type I and Type II error is to lower the level of significance (reducing the risk of Type I while unfortunately increasing the risk of Type II) while also increasing sample size, which reduces the risk of Type II while leaving the risk of Type I the same.
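The trade-offs described in this objective can be made concrete with a small calculation. The sketch below assumes the simplest case, a two-tailed z test of a single mean with a known population standard deviation, and computes the Type II error probability directly from the normal curve. The function name and the example numbers (a true difference of 5 points, a standard deviation of 10) are hypothetical.

```python
import math
from statistics import NormalDist

def type_ii_error(true_difference, pop_sd, n, alpha):
    """Probability of retaining a false null (two-tailed one-sample z test)."""
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha / 2)
    # How many standard errors the true mean lies from the hypothesized value.
    shift = true_difference * math.sqrt(n) / pop_sd
    # Power: chance the computed z lands in either region of rejection.
    power = (1 - std_normal.cdf(z_crit - shift)) + std_normal.cdf(-z_crit - shift)
    return 1 - power

beta_start = type_ii_error(5, 10, n=25, alpha=0.05)
beta_lower_alpha = type_ii_error(5, 10, n=25, alpha=0.01)
beta_lower_alpha_big_n = type_ii_error(5, 10, n=100, alpha=0.01)
beta_bigger_effect = type_ii_error(8, 10, n=25, alpha=0.05)

print(round(beta_start, 3), round(beta_lower_alpha, 3),
      round(beta_lower_alpha_big_n, 3), round(beta_bigger_effect, 3))
```

Under these assumptions, lowering alpha alone raises the Type II probability, while lowering alpha and enlarging the sample together brings it below its starting value, so both risks are reduced.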

29. Know commonly used levels of significance, situations when each is appropriate, and when they should be chosen.
The most common level of significance is .05, followed closely by .01 and somewhat farther behind, .001.

The level of significance to select depends largely on the importance attached to Type I and Type II error. If Type I error is the most important, then .01 or .001 should be used. If Type II is more important, then .05 and possibly even .10 should be used. In most research .05 and .01 are seen as compromises that give acceptable values of both Type I and Type II errors. In exploratory research you do not want to miss promising leads for future research (Type II error is more important). If costly decisions (time, money, effort) result from finding a significant difference, you do not want to declare something to be different if it really is not (Type I error is more important).

30. Know the meaning of and how to form a region of rejection for one- and two-tailed tests given a table value.
A region of rejection for a one-tailed test is all z values equal to or larger than the table value only in the direction of the alternative hypothesis. For a two-tailed test the region is equal to or larger than the table value for both positive and negative values.

31. Know the meaning of reject, retain, and accept as they relate to hypothesis testing and how to use them with one- and two-tailed tests.
If the computed z value is in the region of rejection the null hypothesis is rejected. This means that the alternative hypothesis is accepted as being true. The result is considered to be a significant result at the level of significance used. A difference that is significant at the .05 level means that this result (rejection) would only occur 5% of the time if the null hypothesis were really true (no difference existed in the population).

If the computed z value is not in the region of rejection the null hypothesis is retained. It is not accepted since in most cases a difference was found--it was just not large enough to reject the null hypothesis. Finding a difference is not evidence for "accepting" that there is no difference. When the null hypothesis is retained it simply means that the evidence was insufficient for rejecting it so it is retained.

If a z value is found that is larger than the table value but in the opposite direction from that specified in the alternative hypothesis (a one-tailed test), the null hypothesis is retained.
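The reject/retain rules for one- and two-tailed tests can be sketched as a small decision function. The function name and the `tail` labels are hypothetical, and the table values are taken from the normal curve rather than from a printed table.

```python
from statistics import NormalDist

def decide(z, alpha=0.05, tail="two"):
    """Return 'reject' or 'retain' for the null hypothesis.

    tail: 'two' (a difference exists), 'greater', or 'less'.
    """
    std_normal = NormalDist()
    if tail == "two":
        table_value = std_normal.inv_cdf(1 - alpha / 2)
        in_region = abs(z) >= table_value      # both tails
    elif tail == "greater":
        table_value = std_normal.inv_cdf(1 - alpha)
        in_region = z >= table_value           # upper tail only
    elif tail == "less":
        table_value = std_normal.inv_cdf(1 - alpha)
        in_region = z <= -table_value          # lower tail only
    else:
        raise ValueError("tail must be 'two', 'greater', or 'less'")
    return "reject" if in_region else "retain"

print(decide(2.0))                    # two-tailed, beyond 1.96
print(decide(1.8))                    # two-tailed, short of 1.96
print(decide(1.8, tail="greater"))    # one-tailed, beyond 1.645
print(decide(-2.5, tail="greater"))   # large z, but in the wrong direction
```

The last call illustrates the case just described: a large z in the direction opposite to the alternative hypothesis leads to retaining the null hypothesis.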

32. Know the form of the formula for computing a z for a hypothesis test.
In hypothesis testing, the formula for the computed z is the difference between the statistics to be tested divided by the standard error for the test to be made.
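As one instance of this general form, the sketch below tests whether a pass rate differs from a hypothesized 80%. The counts (150 of 200 passing) are made up for the example, and the standard error used is the usual one for a proportion under the null hypothesis, the square root of p(1-p)/N, which is an assumption since these notes do not give that formula.

```python
import math

def z_for_proportion(sample_proportion, hypothesized_proportion, n):
    """(sample statistic - hypothesized value) / standard error."""
    standard_error = math.sqrt(
        hypothesized_proportion * (1 - hypothesized_proportion) / n
    )
    return (sample_proportion - hypothesized_proportion) / standard_error

# Hypothetical data: 150 of 200 students passed on the first attempt.
z = z_for_proportion(150 / 200, 0.80, 200)
print(round(z, 2))
```

The computed z is then compared with the table value to decide whether it falls in the region of rejection.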

33. Know the difference between an inference and a generalization.
An inference is a specific conclusion drawn from a sample to a population as a result of a specific test and which includes a probability statement related to the error involved in the conclusion. A generalization is a non-specific subjective statement made to extend the meaning of the results to groups other than the population and for which no probability statements can be attached.

34. Know how various factors affect the probability of finding a significant difference.
A researcher is more likely to find a significant difference if a large difference exists in the population and the researcher uses a large sample, a high level of significance, and a one-tailed test.

35. Know how to state conclusions to a test of a null hypothesis.
If there is no significant difference, the usual statement is one such as "No significant difference was found at the .05 level." The statement must include the level of significance. If a significant difference was found, a proper statement includes at least two and possibly three parts. An appropriate statement is one such as "A significant difference was found between men and women at the .05 level. Men scored higher than women." In addition to the level of significance, the statement must include the direction of the difference. It is also often helpful to include the size of the difference. For example, "Men scored higher than women" could be modified to say that "Men scored 1.5 points higher than women" or "Men scored slightly higher than women."

36. Know the meaning of a significant difference and an important difference and reasons why this distinction is important.
A significant difference is one which is unlikely to have occurred by chance. An important difference is one that is large. Many significant differences are not large and thus not important. This is why it is important, when stating conclusions in research reports, to make the importance attached to the difference reflect the size of the difference and not the level of significance at which the difference was tested. A difference that is significant at the .05 level may be large or small. If it was significant at the .001 level it still could be large or small. The lower level of significance simply means you are more sure, not that the difference is larger.

37. Know situations that cause large differences to be not significant and small differences to be significant and the implication of these situations.
With small samples you have large error and thus large differences found in the data will probably be declared to be not significant. Small samples result in a high probability of Type II errors. When you do not find significant differences with small samples, it suggests that you need to do the research over again. Your research is probably worthless (the probability of a Type II error was too high).

With large samples you have small error and thus small differences found in the data will probably be declared to be significant. Large samples result in a low probability of a Type II error. When you find significant differences with large samples, be careful to note the size of the differences before you state the importance of the conclusions. It is easy to overemphasize small differences when you have large samples.
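Both situations can be shown with one formula. The sketch below assumes two independent groups of equal size with a known common standard deviation, so that the standard error of the difference between means is the standard deviation times the square root of 2/N. The differences, standard deviation, and group sizes are made-up numbers for illustration.

```python
import math

def z_for_difference(mean_difference, pop_sd, n_per_group):
    """z for a difference between two independent means (equal group sizes)."""
    standard_error = pop_sd * math.sqrt(2 / n_per_group)
    return mean_difference / standard_error

TABLE_VALUE = 1.96  # two-tailed z at the .05 level

# A large 8-point difference with only 10 subjects per group.
z_small_sample = z_for_difference(8, 15, 10)
# A small 1-point difference with 2000 subjects per group.
z_large_sample = z_for_difference(1, 15, 2000)

print(round(z_small_sample, 2), abs(z_small_sample) >= TABLE_VALUE)
print(round(z_large_sample, 2), abs(z_large_sample) >= TABLE_VALUE)
```

In this example the 8-point difference is not significant while the 1-point difference is, which is why the size of a difference must be reported alongside its significance.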

38. Know when the sampling distribution of means is distributed as a t distribution and when the t-test is to be used rather than the z (normal) distribution.
When a null hypothesis concerning means is being tested and the population standard deviation is unknown (it is estimated from the sample data), the sampling distribution of the means will not be normally distributed but distributed as a t distribution. In these situations (which are the most common ones) a t-test is used rather than the z test.

39. Know the meaning of degrees of freedom, how it is related to sample size, and how it is used.
Degrees of freedom (df) is a term referring to how many of the data elements are free to vary. It is generally a number that is slightly smaller than the sample size. For most of the purposes in this course it is N-1 or N-2. The df number is needed to find the appropriate sampling distribution (table values) to use.

40. Know the shape of the t distribution with varying sample sizes and how it is related to the normal distribution.
The t distribution is very similar in shape to the normal distribution except that it has more cases in the tails. It becomes closer to the normal distribution as sample sizes increase.

41. Know how cutoff values for forming regions of rejection and p values differ for t and z distributions.
Since t distributions have more cases in the tails, for equivalent p values the table values for the t distribution are larger. For example for a two-tailed test at the .05 level the z value is 1.96 while t values are 1.98, 1.99, and 2.00 for common sample sizes.

42. Know the meaning of independent and dependent measures and recognize examples of each.
Independent measures are those in which scores for individuals in one group are not predictable from scores in the other group. Dependent measures are those in which scores in one group can be predicted from scores in the other group. Common situations with dependent measures include testing the same group of students with pre- and post-tests or matching students in experimental and control groups. You can predict post-test scores from the pre-test scores and you can predict scores of matched students from each other.

43. Know what statistical procedures to use with dependent measures.
In order to take advantage of reduced error with using the same or matched subjects, you must use an error term in your statistical test that reflects this reduced error. The smaller error term will result in a larger z or t which will increase the likelihood that a significant difference will be found.

44. Know the advantages and disadvantages of using tests of independent and dependent means.
Using dependent measures (same or matched subjects) reduces the error of the study if the scores are correlated. This will reduce the risk of a Type II error if the appropriate formula is used that takes this reduced error into consideration. The smaller error term will increase the probability of finding a significant result. If there is only a small correlation between the two measures, a dependent means test can actually be worse, since the degrees of freedom of a dependent means test are smaller because it uses N as the number of pairs rather than the number of scores. Having a smaller N will result in a larger table value, which will slightly reduce the probability of finding a significant difference.
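The difference between the two error terms can be seen by computing both t statistics on the same scores. The pre- and post-test scores below are invented for illustration, and the function names are hypothetical. The dependent test uses the standard deviation of the paired differences; the independent test uses the pooled variance of the two groups.

```python
import math
import statistics

def dependent_t(first, second):
    """t for paired scores: mean difference / standard error of differences."""
    diffs = [b - a for a, b in zip(first, second)]
    n_pairs = len(diffs)
    se = statistics.stdev(diffs) / math.sqrt(n_pairs)
    return statistics.fmean(diffs) / se, n_pairs - 1     # (t, df)

def independent_t(first, second):
    """t for two independent groups using the pooled variance."""
    n1, n2 = len(first), len(second)
    pooled_var = ((n1 - 1) * statistics.variance(first)
                  + (n2 - 1) * statistics.variance(second)) / (n1 + n2 - 2)
    se = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    diff = statistics.fmean(second) - statistics.fmean(first)
    return diff / se, n1 + n2 - 2                        # (t, df)

# Made-up scores for the same eight students tested twice.
pre = [10, 12, 9, 15, 11, 14, 8, 13]
post = [12, 13, 11, 16, 14, 15, 9, 15]

t_dep, df_dep = dependent_t(pre, post)
t_ind, df_ind = independent_t(pre, post)
print(round(t_dep, 2), df_dep)
print(round(t_ind, 2), df_ind)
```

Because these pre and post scores are highly correlated, the dependent test gives a much larger t even though it has fewer degrees of freedom. With weakly correlated scores the gain in t could be too small to offset the larger table value.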

45. Know assumptions for t-tests of differences between means and what to do if the assumptions are violated.
In order to satisfy the assumptions of a t-test for difference between means, the populations from which the two samples were selected must be normally distributed and have equal variance.

In most cases, these assumptions cannot be met. However, as sample size increases, the normality assumption is not important due to the Central Limit Theorem. Unless the variances are markedly different the probabilities associated with the t-test are not going to be off very much and it is not much to be worried about. If the variances are radically different (especially with quite different sample sizes) then a non-parametric test should probably be done rather than the t-test which is a parametric test.

An additional requirement is that the data be at least interval data since means will be computed.