Objectives for Unit Six
Confidence Intervals and Hypothesis Testing
1. Know the meaning
of random, biased and representative as applied to samples and sampling
procedures and recognize examples of each.
The terms random and
biased refer to the process by which elements are selected from a population.
A random method is one in which each element in the population has an equal
chance of being selected. A biased method is one in which some elements in the
population have a greater chance of being selected than others. Neither random
nor biased is concerned with the characteristics of the sample. Representative
is the term used to indicate whether the sample is similar to the population
on specific characteristics. A representative sample is one in which sample
characteristics are the same as population characteristics. A representative
sample can result from either a random or biased sampling procedure. A
random sampling procedure is the procedure most likely to result in a representative
sample.
2. Know the procedures
for determining whether a sample is representative.
With very large samples
selected by a truly random process, representativeness is usually taken
as a given. It is assumed that randomness works when it is done over and
over again many times. With small samples randomness cannot be trusted
as much to give a representative sample so there is usually a report on
the similarity between the sample and the population on characteristics
for which the population characteristics are known or can be estimated
fairly accurately. For example demographic characteristics of the population
are usually available and can be reported for both the population and the
sample. If you find that a geographical area, school, gender group, age
group, etc. is over- or under-represented, you would question the representativeness
of the sample on the actual variables being studied.
3. Know the procedures
used for selecting a random sample.
The normal way to
select a sample is to assign each element in the population a number and
then randomly pick a set of numbers to use in selecting the elements for
the sample. Prior to computers, tables of random numbers were the usual
way of selecting random numbers. The current procedure is to have a computer
select the random numbers. Other methods such as writing numbers on balls
(used in lotteries) and writing numbers on slips of paper are seldom used
in research.
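As a minimal sketch of the computer-based procedure described above, the Python example below numbers the elements 1 through N and lets the standard library pick the numbers. The function name `select_random_sample`, the population size of 500, the sample size of 10, and the seed are illustrative assumptions, not part of the course material.

```python
import random

def select_random_sample(population_size, sample_size, seed=None):
    """Return sorted element numbers (1..population_size) chosen without replacement."""
    rng = random.Random(seed)  # a fixed seed makes the selection reproducible
    return sorted(rng.sample(range(1, population_size + 1), sample_size))

# Hypothetical population of 500 numbered elements; select 10 of them.
chosen = select_random_sample(population_size=500, sample_size=10, seed=1)
print(chosen)
```

Every element number has the same chance of being picked, which is what makes the method random rather than biased.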
4. Know the meaning
of sampling error and its implication.
Whenever a sample
is selected in which some elements of the population are not selected,
it is assumed (and is almost always true) that the sample is not identical
to the population (not truly representative). Therefore any research done
with sample data must take into consideration the fact that the results
will not be exactly true for the population. This sampling error must be considered in the
conclusions.
5. Know the meaning
of sampling distribution.
A sampling distribution
is a type of frequency distribution in which each element is a statistic
based on a sample. In a regular frequency distribution the elements are
characteristics of persons or things. In a sampling distribution the elements
are characteristics of samples with each sample having the same sample
size (N).
6. Know the types
of sampling distributions.
An "empirical" sampling
distribution is based on statistics from "real" samples. Actual data is
collected from many samples and a sampling distribution is constructed.
In real life this is never done except by simulation using computers. A
"theoretical" sampling distribution is one generated by a formula which
gives the shape of a curve that would result if statistics were computed
based on an infinite number of samples. Computers are used to generate
large numbers of samples to form an empirical distribution to validate
or verify that the theoretical sampling distributions are accurate.
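The simulation idea can be sketched in a few lines of Python. This is only an illustration under assumed values: the population of 10,000 normally distributed scores (mean 100, standard deviation 15), the sample size of 25, and the function name `empirical_sampling_distribution` are all hypothetical.

```python
import random
import statistics

def empirical_sampling_distribution(population, sample_size, num_samples, seed=0):
    """Draw many random samples of the same size and return the mean of each one."""
    rng = random.Random(seed)
    return [
        statistics.fmean(rng.sample(population, sample_size))
        for _ in range(num_samples)
    ]

# Hypothetical population: 10,000 scores with mean near 100 and SD near 15.
population_rng = random.Random(42)
population = [population_rng.gauss(100, 15) for _ in range(10_000)]

# Each element of `means` is a statistic based on one sample of size 25.
means = empirical_sampling_distribution(population, sample_size=25, num_samples=2_000)

print("mean of the sample means:", round(statistics.fmean(means), 2))
print("standard deviation of the sample means:", round(statistics.stdev(means), 2))
print("theoretical value, sigma / sqrt(N):",
      round(statistics.pstdev(population) / 25 ** 0.5, 2))
```

The last two printed values should be close to each other, which is the kind of check used to verify a theoretical sampling distribution against an empirical one.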
7. Know the meaning
of the expected value as it relates to a sampling distribution and how
this relates to biased and unbiased statistics.
If statistics are
computed on an infinite number of samples, the mean of the statistics is
called the expected value. If the expected value of the statistic is equal
to the population parameter, the statistic is called an unbiased statistic.
If the expected value is not equal to the population parameter, it is a
biased statistic. For example, the formula for the variance (the square of the
standard deviation) using N in the denominator is a biased statistic, since using
it to compute statistics for an infinite number of samples would not give the
population variance. N-1 must be used in the denominator to give an unbiased statistic.
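The difference between the two denominators can be checked by simulation. The bias is easiest to see for the variance (the square of the standard deviation), so the sketch below averages variance estimates over many samples. The population of the integers 1 to 10, the sample size of 5, and the function name are assumptions made for illustration.

```python
import random
import statistics

def average_variance_estimates(population, sample_size, num_samples, seed=0):
    """Average the N-denominator and (N-1)-denominator variances over many samples."""
    rng = random.Random(seed)
    total_n = 0.0
    total_n_minus_1 = 0.0
    for _ in range(num_samples):
        sample = rng.choices(population, k=sample_size)  # sampling with replacement
        total_n += statistics.pvariance(sample)          # N in the denominator
        total_n_minus_1 += statistics.variance(sample)   # N-1 in the denominator
    return total_n / num_samples, total_n_minus_1 / num_samples

population = list(range(1, 11))                  # hypothetical population: 1, 2, ..., 10
true_variance = statistics.pvariance(population)

avg_n, avg_n_minus_1 = average_variance_estimates(
    population, sample_size=5, num_samples=10_000
)
print("population variance:", true_variance)
print("average estimate using N:", round(avg_n, 2))
print("average estimate using N-1:", round(avg_n_minus_1, 2))
```

With enough samples, the N-1 average settles near the population variance while the N average stays too small, by a factor of about (N-1)/N.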
8. Know the meaning
and symbol for standard error.
The standard error
is the standard deviation of the sampling distribution. With an empirical
distribution it can be computed just like a regular standard deviation. With
a theoretical distribution, there are other formulas that can be used.
These formulas can be validated or verified by generating an empirical
distribution of many samples using a computer.
The symbol for standard error is the standard deviation symbol (s or σ) with a subscript appropriate for the sample statistic used for the sampling distribution (p or X̄).
Standard errors can be interpreted based on the normal curve in the same way as the standard deviation. When the sampling distribution is approximately normal, about 68% of the statistics in a sampling distribution will be within one standard error of the mean of the sampling distribution.
9. Know how various
factors affect the size of the standard error and the expected value.
The size of the standard
error is dependent on the type of statistic used. For example, estimating
the population median will have more error than estimating the population
mean. The larger the variability of a characteristic in a population, the
larger the error in estimating that population characteristic. The larger
the sample size, the smaller the error in estimating any characteristic
of the population from the sample.
10. Know the upper
and lower limits for the standard error of the mean.
With a sample size
of 1, the standard error of the mean will be at its maximum size, which will be
equal to the standard deviation of the population. The smallest the standard
error can be is 0 which would occur if the sample size equalled the population
size.
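Both limits can be seen with the usual textbook formula for the standard error of the mean, the population standard deviation divided by the square root of the sample size, combined with the finite population correction that applies when sampling without replacement. The formula is standard but is not given in these objectives, and the function name and numbers below are hypothetical.

```python
import math

def standard_error_of_mean(sigma, n, population_size=None):
    """Standard error of the mean: sigma / sqrt(n).

    If population_size (greater than 1) is given, the finite population
    correction sqrt((population_size - n) / (population_size - 1)) is applied.
    """
    standard_error = sigma / math.sqrt(n)
    if population_size is not None:
        standard_error *= math.sqrt((population_size - n) / (population_size - 1))
    return standard_error

sigma = 15  # assumed population standard deviation
print(standard_error_of_mean(sigma, n=1))                          # upper limit: equals sigma
print(standard_error_of_mean(sigma, n=25))                         # shrinks as n grows
print(standard_error_of_mean(sigma, n=100, population_size=100))   # lower limit: 0
```

The same function also illustrates the previous objective: doubling `sigma` doubles the standard error, and increasing `n` reduces it.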
11. Know how various
factors affect the shape of the sampling distribution of the mean.
If the distribution
of a characteristic of a population is normally distributed, the sampling
distribution of the mean of that characteristic will also be normally distributed
no matter what the sample size. If the distribution of the characteristic
in the population is not normally distributed the extent to which the sampling
distribution is normally distributed will depend on sample size. The greater
the sample size, the closer the sampling distribution is to being normally
distributed. This result is known as the Central Limit Theorem.
12. Know the sample
size needed to assume that the sampling distribution of the mean is normal.
With sample sizes
above 30 it can be assumed that the sampling distribution of the mean will
be close to normally distributed no matter what the shape of the distribution
of the characteristic in the population is.
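A rough way to watch this happen is to sample from a clearly non-normal population and measure how lopsided the distribution of sample means is. The sketch below uses an exponential population (strongly skewed to the right) and a simple skewness index; the choice of population, the sample sizes, and the helper names are assumptions made for illustration.

```python
import random
import statistics

def skewness(values):
    """Simple skewness index: near 0 for a symmetric shape, positive for a long right tail."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return sum(((v - mean) / sd) ** 3 for v in values) / len(values)

def sample_means(sample_size, num_samples, seed=0):
    """Means of many samples drawn from a skewed (exponential) population."""
    rng = random.Random(seed)
    return [
        statistics.fmean([rng.expovariate(1.0) for _ in range(sample_size)])
        for _ in range(num_samples)
    ]

for n in (2, 10, 36):
    print("sample size", n, "skewness of sample means:",
          round(skewness(sample_means(n, 5_000)), 2))
```

The skewness of the sample means should move toward 0 as the sample size grows, even though the population itself stays skewed.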
13. Know the meaning
of statistical inference.
Inference is the process
of taking data from a sample selected from a larger population and drawing
conclusions about the population on the basis of results derived from analysis
of the sample data.
14. Know the meaning
of hypothesis testing and the form used in stating conclusions to a hypothesis
test.
Hypothesis testing
has two components: stating a hypothesis about a population parameter or
many population parameters, and drawing a yes/no conclusion as to whether
the hypothesized statement is likely to be true or not. The conclusion
will be expressed in terms of level of significance which indicates a probability
associated with the likelihood that the conclusion is false.
As used in this course, the hypotheses deal with testing whether a population parameter is equal to a certain value or whether there is a difference between two population parameters. For example, we could test whether the percentage of people passing a stat test on the first attempt is really 80% or whether the mean score on the first stat test is different from the mean on the second test.
The conclusion to a hypothesis test is normally stated by saying that there is a significant difference between the values being compared at a specific level of significance. For example, "there is a significant difference between the means of the experimental and control groups at the .05 level."
15. Know the meaning
of a confidence interval and the form with which it is stated.
A confidence interval
is a statement giving a range of values within which the researcher is
confident that a population parameter lies.
A confidence interval includes an interval in the form of either a range of numbers (10 - 20) or a number plus/minus another number (15 ± 5) and a percentage number indicating the degree of confidence that the population parameter is in the interval. For example you could say that the 95% confidence interval for a mean is 10-20.
A common erroneous statement is to say that there is a 95% probability that the population parameter is within this interval. This is incorrect since the population parameter is either in the interval (100% probability) or not in the interval (0% probability). The probability describes only your certainty, not the actual fact.
We can say before the confidence interval is computed that there will be a 95% probability that the interval to be formed will include the population parameter. In other words, for every 100 intervals we form, about 95 of them on average will include the population parameter.
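The "95 out of 100 intervals" interpretation can be checked by simulation. The sketch below assumes a normal population with a known mean of 50 and standard deviation of 10 (hypothetical values), forms many intervals from samples of 30, and counts how often the interval contains the true mean.

```python
import math
import random
import statistics

def coverage_rate(mu, sigma, sample_size, num_intervals, confidence=0.95, seed=0):
    """Fraction of simulated confidence intervals that contain the true mean mu."""
    rng = random.Random(seed)
    z = statistics.NormalDist().inv_cdf(0.5 + confidence / 2)  # two-tailed z value
    margin = z * sigma / math.sqrt(sample_size)
    hits = 0
    for _ in range(num_intervals):
        sample_mean = statistics.fmean(
            [rng.gauss(mu, sigma) for _ in range(sample_size)]
        )
        if sample_mean - margin <= mu <= sample_mean + margin:
            hits += 1
    return hits / num_intervals

rate = coverage_rate(mu=50, sigma=10, sample_size=30, num_intervals=4_000)
print("proportion of intervals containing the population mean:", rate)
```

Each individual interval either contains the mean or does not; the 95% describes the long-run proportion of intervals that do.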
16. Know advantages
and disadvantages of point estimates, confidence intervals, and hypothesis
testing.
Both point estimates
and hypothesis testing have a similar advantage in that they allow you
to make conclusions about a single number. A point estimate gives you
a "best guess" of what the parameter value is and a hypothesis test allows
you to test whether a hypothesized value is reasonable to assume as
the parameter value. This advantage of both point estimates and hypothesis
tests of dealing with a single value is also a disadvantage of both procedures.
Since you know that the point estimate is almost surely inaccurate you
have no idea of how sure to be of that estimate. With hypothesis testing
all you usually know is that the hypothesized value is not a reasonable
value for estimating the population parameter but then you do not know
what it is likely to be. A confidence interval is frequently better than
both of these since it gives a range of values within which you can be
confident to a certain degree that the parameter resides. A point estimate
gives you only one number, a hypothesis test gives you one number and a
probability, while a confidence interval gives you a range of numbers (an
error factor) and a probability number.
In the most common research situation a difference between the means of two different situations is being studied. A point estimate might say that there is a 10 point difference between the groups. A hypothesis test would say that there is a significant difference between the groups at the .05 level and the sample difference is 10 points, while a confidence interval would say that the researcher is 95% confident that the difference between the two groups is somewhere between 4-16 points.
17. Know how hypothesis
testing can be done using confidence intervals.
If a confidence interval
does not include zero then it can be said that the parameter is different
from zero at the level of significance corresponding to the confidence
level. For example, if a 95% confidence interval of a mean is 4-10 (which
does not include zero), the mean is significantly different from zero at
the .05 level.
18. Know appropriate
ways to act when dealing with statements of uncertainty concerning inferential
research results.
Since almost all research
dealing with people gives results in which error is present (point estimates
are not without error), there are appropriate and inappropriate ways to
deal with the conclusions.
While the results of research do specify that error is present and attempt to estimate the degree of error and the probability of error of a given size to occur, the conclusions are frequently stated in a confident manner that does not suggest that error is present. For example, a conclusion might state "Method A works better than Method B in teaching spelling." The result of the research might have stated that the researcher is 99% confident that Method A is between 4 and 8 points better than Method B. Researchers know that their conclusions might be wrong even while they state them as being true. The consumer needs to know this assumption.
The two inappropriate ways to deal with research results are the overly confident approach which does not allow any consideration that the conclusions might be incorrect and the overly timid approach which does not allow for any action or belief unless there is no error or absolute certainty. One extreme results in dogmatism and inflexibility while the other results in perpetual doubt, uncertainty, inertia and immobility.
19. Know where to
get the values needed for computing a confidence interval.
The center point for
a confidence interval is the point estimate computed from the sample data.
The z value (later we will also use t) is the two-tailed value found in
a normal curve table. The standard error value is computed from the sample
data.
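Putting the three pieces together (point estimate, z value, standard error), a confidence interval for a mean could be computed as sketched below. The scores are made-up data, the function name is hypothetical, and z is used for simplicity even though a t value would normally be preferred for a sample this small.

```python
import math
import statistics

def z_confidence_interval(sample, confidence=0.95):
    """Confidence interval for a mean: point estimate +/- z * standard error."""
    n = len(sample)
    point_estimate = statistics.fmean(sample)                  # center of the interval
    standard_error = statistics.stdev(sample) / math.sqrt(n)   # computed from the sample
    z = statistics.NormalDist().inv_cdf(0.5 + confidence / 2)  # two-tailed z value
    margin = z * standard_error
    return point_estimate - margin, point_estimate + margin

# Hypothetical test scores.
scores = [72, 85, 90, 66, 78, 81, 94, 70, 88, 76, 83, 79, 91, 68, 74, 86]

lower, upper = z_confidence_interval(scores, confidence=0.95)
print("95% confidence interval:", round(lower, 1), "to", round(upper, 1))
```

`NormalDist().inv_cdf` plays the role of the normal curve table here: for 95% confidence it returns the two-tailed value of about 1.96.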
20. Know how various
factors affect the size of confidence intervals.
Confidence intervals
are narrower (less error) with larger samples and if less confidence is
required. A 95% confidence interval is not as wide as a 99% confidence
interval. To be 100% confident you need the maximum width confidence interval
(the possible range of values). The width of the confidence interval also
increases as the variability of the characteristic in the population increases.
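These three effects follow directly from the width of a z-based interval, which is twice the z value times the standard error. The sketch below assumes a known population standard deviation; the function name and the numbers are illustrative only.

```python
import math
from statistics import NormalDist

def interval_width(sigma, n, confidence):
    """Full width of a z-based confidence interval for a mean."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    return 2 * z * sigma / math.sqrt(n)

print("baseline (sigma=10, n=25, 95%):", round(interval_width(10, 25, 0.95), 2))
print("larger sample (n=100):         ", round(interval_width(10, 100, 0.95), 2))
print("more confidence (99%):         ", round(interval_width(10, 25, 0.99), 2))
print("more variability (sigma=20):   ", round(interval_width(20, 25, 0.95), 2))
```

Quadrupling the sample size halves the width, while asking for more confidence or studying a more variable characteristic widens the interval.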
21. Know the meaning
of the null hypothesis and the form in which it is expressed.
A null hypothesis
states there is no difference between two or more things (or that two or
more things are equal). This difference may be between a parameter and
a specified value, between a difference between parameters and a specified
value, or other things not considered in this course. For example a null
might state that a population mean is equal to 50 (not different from 50),
or that two population percentages are not different (are equal).
22. Know the purpose
of stating a null hypothesis.
The null hypothesis
is stated as a specific value that can be rejected. Most research studies
do not have a specific value other than zero that can be defended as a
logical choice of a specific value to use in stating the null hypothesis.
The most common null hypothesis is to state that there is no difference
between statistics for two or more groups. The specific value tested in
this case is zero (zero difference). Usually the researcher believes that
the null hypothesis is false and hopes to reject it. The degree to which
the null hypothesis is false is not known and thus cannot be stated with
certainty. When comparing two things, the most common question is "Are
they different", rather than "How different are they?"
23. Know the meaning
of the alternative hypothesis and the form in which it is expressed.
An alternative hypothesis
is the conclusion that will be accepted if the null hypothesis is rejected.
It can be expressed in either a "a difference exists" form or a "greater
than or less than" form. For example, the alternative hypothesis could
be "the mean is not 50" or "the mean is greater than 50." The alternative
hypothesis is usually the only logical alternative to the null hypothesis.
24. Know the meaning
of one and two tailed tests of significance and situations when each is
appropriate.
A one-tailed test
is a test of the null hypothesis in which the alternative hypothesis is
a "greater than or less than" alternative. In this case rejection of the
null hypothesis can only result in conclusions in the direction specified
in the alternative hypothesis. If a null hypothesis states that Methods A and
B are equal and the alternative hypothesis states that Method A is greater
than Method B, then if the sample results indicate that Method B is greater
than Method A, the null hypothesis could not be rejected, since the alternative
hypothesis does not allow that option to be considered. A two-tailed test
has an alternative hypothesis of "a difference exists." A conclusion can
be either "greater than" or "less than."
A one-tailed test is only appropriate when the researcher feels that a real difference in the population parameters can only exist in one direction (Method A cannot be less than Method B, or the population mean cannot be less than 50). In most cases this is not appropriate. The few cases in which it is appropriate would be when studying a gain from one time to another when a loss is not possible. For example, you could study whether eating spinach makes you grow taller. You could assume that eating spinach might have no effect, but it would never make you shorter. Most researchers use two-tailed tests since in most things we test there could be a difference in either direction.
25. Know the meaning
of alpha (level of significance).
Alpha stands for the
level of significance which is a percentage number indicating how willing
the researcher is to reject a true null hypothesis. For example, if the
level of significance is set at .05 it indicates that in situations when
the null hypothesis is true (no difference between population characteristics),
we will reject the null hypothesis erroneously 5% of the time. This is
not the same as saying that we will reject the null hypothesis erroneously
5% of the time when we use the 5% level.
26. Know how to
report significant results.
There are two common
ways to report significant results. The most common way used to be placing
an asterisk (*) by the side of the test. This allowed a yes/no conclusion
indicating whether the probability of a Type I error was larger or smaller
than the indicated level of significance. Since computers can give exact
probabilities with no effort, a growing number of studies now report the
exact probability in addition to or as a replacement for the asterisk.
27. Know the meaning
of and symbols for Type I and Type II error.
A Type I error is
made when the null hypothesis is erroneously rejected: the sample results
indicate a significant difference while in truth, there is no difference
in the population. A Type II error is made when the null hypothesis is
erroneously retained: the sample results indicate that there is not a significant
difference while there is actually a difference in the population. The
probability of a Type I error is α (alpha) and the probability of a Type II error
is β (beta).
28. Know how various
factors affect Type I and Type II error and how to reduce both.
The researcher has
complete control over Type I error since it is only affected by the level
of significance selected. If the level of significance chosen is .05, there
is a 5% probability of Type I error no matter what conditions exist.
A researcher can influence the probability of a Type II error by varying sample size and the level of significance. Larger samples will increase the probability of finding a significant difference and reduce the probability of a Type II error (saying there is no difference). Raising the level of significance (e.g., .05 rather than .01) reduces the probability of a Type II error (while increasing the probability of a Type I error). The probability of a Type II error is also affected by the size of the difference existing in the population (effect size). The larger the difference, the less likely a Type II error will occur. Type II error can also be reduced by using research procedures that reduce error (use the same subjects in both groups, use more reliable tests, etc.).
The recommended way to reduce the risk of both Type I and Type II error is to lower the level of significance (reducing the risk of Type I while unfortunately increasing the risk of Type II) while also increasing sample size, which reduces the risk of Type II while leaving the risk of Type I the same.
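These relationships can be explored with a small simulation. The sketch below repeats a two-tailed one-sample z test many times, once when the null hypothesis is true and several times when it is false, and counts how often the null is rejected. The population values (null mean 50, true mean 53, standard deviation 10), the sample sizes, and the function name are all assumptions chosen for illustration.

```python
import math
import random
from statistics import NormalDist, fmean

def rejection_rate(true_mean, null_mean, sigma, n, alpha, num_trials, seed=0):
    """Proportion of simulated two-tailed z tests that reject the null hypothesis."""
    rng = random.Random(seed)
    cutoff = NormalDist().inv_cdf(1 - alpha / 2)   # two-tailed table value
    standard_error = sigma / math.sqrt(n)
    rejections = 0
    for _ in range(num_trials):
        sample_mean = fmean([rng.gauss(true_mean, sigma) for _ in range(n)])
        z = (sample_mean - null_mean) / standard_error
        if abs(z) >= cutoff:
            rejections += 1
    return rejections / num_trials

# Null hypothesis true: every rejection is a Type I error.
type_1_rate = rejection_rate(50, 50, sigma=10, n=25, alpha=0.05, num_trials=3_000)

# Null hypothesis false (true mean is 53): every retention is a Type II error.
type_2_small_n = 1 - rejection_rate(53, 50, sigma=10, n=25, alpha=0.05, num_trials=3_000)
type_2_large_n = 1 - rejection_rate(53, 50, sigma=10, n=100, alpha=0.05, num_trials=3_000)
type_2_strict_alpha = 1 - rejection_rate(53, 50, sigma=10, n=25, alpha=0.01, num_trials=3_000)

print("Type I rate at alpha = .05:", type_1_rate)
print("Type II rate, n = 25, alpha = .05:", type_2_small_n)
print("Type II rate, n = 100, alpha = .05:", type_2_large_n)
print("Type II rate, n = 25, alpha = .01:", type_2_strict_alpha)
```

The Type I rate should stay near the chosen level of significance regardless of sample size, while the Type II rate should fall with the larger sample and rise with the stricter level of significance.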
29. Know commonly
used levels of significance, situations when each is appropriate, and when
they should be chosen.
The most common level
of significance is .05, followed closely by .01 and somewhat farther behind,
.001.
The level of significance to select depends largely on the importance attached to Type I and Type II error. If Type I error is the most important, then .01 or .001 should be used. If Type II is more important, then .05 and possibly even .10 should be used. In most research .05 and .01 are seen as compromises that give acceptable values of both Type I and Type II errors. In exploratory research you do not want to miss promising leads for future research (Type II error is more important). If costly decisions (time, money, effort) result from finding a significant difference, you do not want to declare something to be different if it really is not (Type I error is more important).
30. Know the meaning
of and how to form a region of rejection for one- and two-tailed tests given
a table value.
A region of rejection
for a one-tailed test is all z values equal to or larger than the table
value only in the direction of the alternative hypothesis. For a two-tailed
test the region is equal to or larger than the table value for both positive
and negative values.
31. Know the meaning
of reject, retain, and accept as they relate to hypothesis testing and
how to use them with one- and two-tailed tests.
If the computed z
value is in the region of rejection the null hypothesis is rejected. This
means that the alternative hypothesis is accepted as being true. The result
is considered to be a significant result at the level of significance
used. A difference that is significant at the .05 level means that this
result (rejection) would only occur 5% of the time if the null hypothesis
were really true (no difference existed in the population).
If the computed z value is not in the region of rejection the null hypothesis is retained. It is not accepted since in most cases a difference was found--it was just not large enough to reject the null hypothesis. Finding a difference is not evidence for "accepting" that there is no difference. When the null hypothesis is retained it simply means that the evidence was insufficient for rejecting it so it is retained.
If a z value is found that is larger than the table value but in the opposite direction from that specified in the alternative hypothesis (a one-tailed test), the null hypothesis is retained.
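The reject/retain rules for one- and two-tailed tests can be written out as a short decision function. This is a sketch only: the function names, the `direction` argument, and the example z values are hypothetical, and `NormalDist().inv_cdf` stands in for looking up the table value.

```python
from statistics import NormalDist

def table_value(alpha, tails):
    """Cutoff z for the region of rejection (tails must be 1 or 2)."""
    if tails not in (1, 2):
        raise ValueError("tails must be 1 or 2")
    return NormalDist().inv_cdf(1 - alpha / tails)

def decide(z, alpha=0.05, tails=2, direction="greater"):
    """Return 'reject' or 'retain' for the null hypothesis given a computed z."""
    cutoff = table_value(alpha, tails)
    if tails == 2:
        in_region = abs(z) >= cutoff    # both tails count
    elif direction == "greater":
        in_region = z >= cutoff         # only the upper tail counts
    else:
        in_region = z <= -cutoff        # only the lower tail counts
    return "reject" if in_region else "retain"

print(decide(2.10, tails=2))                        # beyond the two-tailed cutoff
print(decide(1.70, tails=2))                        # short of the two-tailed cutoff
print(decide(1.70, tails=1, direction="greater"))   # beyond the one-tailed cutoff
print(decide(-2.50, tails=1, direction="greater"))  # large, but in the wrong direction
```

The last call shows the case described above: a large z in the direction opposite to the alternative hypothesis leaves the null hypothesis retained.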
32. Know the form
of the formula for computing a z for a hypothesis test.
In hypothesis testing,
the formula for the computed z is the difference between the statistics
to be tested divided by the standard error for the test to be made.
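As an illustration of this general form, the sketch below computes z for two common cases: a sample mean compared with a hypothesized value, and two sample means compared with each other. The standard error formulas are the usual textbook ones and are an assumption here, as are the function names and all numbers.

```python
import math

def computed_z(difference, standard_error):
    """General form: the difference being tested divided by its standard error."""
    return difference / standard_error

def standard_error_of_mean(sd, n):
    return sd / math.sqrt(n)

def standard_error_of_difference(sd_1, n_1, sd_2, n_2):
    return math.sqrt(sd_1 ** 2 / n_1 + sd_2 ** 2 / n_2)

# One mean against a hypothesized value: sample mean 53, hypothesized mean 50.
z_one_mean = computed_z(53 - 50, standard_error_of_mean(sd=10, n=25))

# Two means against each other: group means 78 and 74.
z_two_means = computed_z(78 - 74, standard_error_of_difference(10, 50, 10, 50))

print(z_one_mean, z_two_means)
```

Only the standard error changes from one kind of test to another; the difference-over-standard-error form stays the same.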
33. Know the difference
between an inference and a generalization.
An inference is a
specific conclusion drawn from a sample to a population as a result of
a specific test and which includes a probability statement related to the
error involved in the conclusion. A generalization is a non-specific subjective
statement made to extend the meaning of the results to groups other than
the population and for which no probability statements can be attached.
34. Know how various
factors affect the probability of finding a significant difference.
A researcher is more
likely to find a significant difference if a large difference exists in
the population and the researcher uses a large sample, a high level of
significance (e.g., .05 rather than .01), and a one-tailed test.
35. Know how to
state conclusions to a test of a null hypothesis.
If there is no significant
difference, the usual statement is one such as "No significant difference
was found at the .05 level." The statement must include the level of significance.
If a significant difference was found a proper statement includes at least
two and possibly three parts. An appropriate statement is one such as "A
significant difference was found between men and women at the .05 level.
Men scored higher than women." In addition to the level of significance,
the statement must include the direction of the difference. In addition
it is often helpful to include the size of the difference. For example
the "Men scored higher than women" could be modified to say that "Men scored
1.5 points higher than women" or "Men scored slightly higher than women."
36. Know the meaning
of a significant difference and an important difference and reasons why
this distinction is important.
A significant difference
is one which is unlikely to have occurred by chance. An important difference
is one that is large. Many significant differences are not large and thus
not important. This is why it is important, when stating conclusions in
research reports, to make the importance attached to the difference reflect
the size of the difference and not the level of significance at which the
difference was tested. A difference that is significant at the .05 level
may be large or small. If it was significant at the .001 level it still
could be large or small. The lower level of significance simply means you
are more sure, not that the difference is larger.
37. Know situations
that cause large differences to be not significant and small differences
to be significant and the implication of these situations.
With small samples
you have large error and thus large differences found in the data will
probably be declared to be not significant. Small samples result in a high
probability of Type II errors. When you do not find significant differences
with small samples, it suggests that you need to do the research over again.
Your research is probably worthless (the probability of a Type II error
was too high).
With large samples you have small error and thus small differences found in the data will probably be declared to be significant. Large samples result in a low probability of a Type II error. When you find significant differences with large samples, be careful to note the size of the differences before you state the importance of the conclusions. It is easy to overemphasize small differences when you have large samples.
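The two situations can be compared numerically. The sketch below computes a two-tailed probability for a difference between two group means using a z test, treating the standard deviation as known and equal in both groups (a simplifying assumption). The differences, standard deviation, and group sizes are made-up values.

```python
import math
from statistics import NormalDist

def two_tailed_p(difference, sd, n_per_group):
    """Two-tailed p value for a difference between two means (z test, equal group sizes)."""
    standard_error = sd * math.sqrt(2 / n_per_group)
    z = difference / standard_error
    return 2 * (1 - NormalDist().cdf(abs(z)))

# Large difference, small samples: 5 points with 10 people per group.
p_large_diff_small_n = two_tailed_p(difference=5, sd=10, n_per_group=10)

# Small difference, large samples: 1 point with 1,000 people per group.
p_small_diff_large_n = two_tailed_p(difference=1, sd=10, n_per_group=1000)

print("5-point difference, n = 10 per group:   p =", round(p_large_diff_small_n, 3))
print("1-point difference, n = 1000 per group: p =", round(p_small_diff_large_n, 3))
```

At the .05 level the 5-point difference is not significant while the 1-point difference is, which is why the size of a difference has to be judged separately from its significance.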
38. Know when the
sampling distribution of means is distributed as a t distribution and when
the t-test is to be used rather than the z (normal) distribution.
When a null hypothesis
concerning means is being tested and the population standard deviation
is unknown (it is estimated from the sample data), the sampling distribution
of the means will not be normally distributed but distributed as a t distribution.
In these situations (which are the most common ones) a t-test is used rather
than the z test.
39. Know the meaning
of degrees of freedom, how it is related to sample size, and how it is
used.
Degrees of freedom
(df) is a term referring to how many of the data elements are free to vary.
It is generally a number that is slightly smaller than the sample size.
For most of the purposes in this course it is N-1 or N-2. The df number
is needed to find the appropriate sampling distribution (table values)
to use.
40. Know the shape
of the t distribution with varying sample sizes and how it is related to
the normal distribution.
The t distribution
is very similar in shape to the normal distribution except that it has
more cases in the tails. It becomes closer to the normal distribution as
sample sizes increase.
41. Know how cutoff
values for forming regions of rejection and p values differ for t and z
distributions.
Since t distributions
have more cases in the tails, for equivalent p values the table values
for the t distribution are larger. For example for a two-tailed test at
the .05 level the z value is 1.96 while t values are 1.98, 1.99, and 2.00
for common sample sizes.
42. Know the meaning
of independent and dependent measures and recognize examples of each.
Independent measures
are those in which scores for individuals in one group are not predictable
from scores in the other group. Dependent measures are those in which scores in one group can be
predicted from scores in the other group. Common situations with dependent
measures include testing the same group of students with pre- and post-tests
or by matching students in experimental and control groups. You can predict
post-test scores from the pre-test scores and you can predict scores of
matched students from each other.
43. Know what statistical
procedures to use with dependent measures.
In order to take advantage
of reduced error with using the same or matched subjects, you must use
an error term in your statistical test that reflects this reduced error.
The smaller error term will result in a larger z or t which will increase
the likelihood that a significant difference will be found.
44. Know the advantages
and disadvantages of using tests of independent and dependent means.
Using dependent measures
(same or matched subjects) reduces the error of the study if the scores
are correlated. This will reduce the risk of a Type II error if the appropriate
formula is used that takes this error into consideration. The smaller error
term will increase the probability of finding a significant result. If
there is only a small correlation between the two measures a dependent
means test can actually be worse, because the degrees of freedom of a dependent
means test are smaller: it uses N as the number of pairs rather than
the number of scores. Having a smaller N will result in a larger table
value which will slightly reduce the probability of finding a significant
difference.
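The trade-off can be seen by running both kinds of test on the same paired data. The sketch below uses the usual textbook formulas for the independent (pooled variance) and dependent (difference score) t statistics, which are not given in these objectives, along with made-up pre- and post-test scores that are highly correlated.

```python
import math
import statistics

def independent_t(group_a, group_b):
    """t for independent means using a pooled variance; returns (t, df)."""
    n_a, n_b = len(group_a), len(group_b)
    pooled_variance = (
        (n_a - 1) * statistics.variance(group_a)
        + (n_b - 1) * statistics.variance(group_b)
    ) / (n_a + n_b - 2)
    standard_error = math.sqrt(pooled_variance * (1 / n_a + 1 / n_b))
    t = (statistics.fmean(group_a) - statistics.fmean(group_b)) / standard_error
    return t, n_a + n_b - 2

def dependent_t(after, before):
    """t for dependent means based on pair differences; returns (t, df)."""
    differences = [a - b for a, b in zip(after, before)]
    n_pairs = len(differences)
    standard_error = statistics.stdev(differences) / math.sqrt(n_pairs)
    return statistics.fmean(differences) / standard_error, n_pairs - 1

# Hypothetical pre- and post-test scores for the same eight students.
pre = [10, 12, 14, 16, 18, 20, 22, 24]
post = [11, 14, 15, 18, 19, 22, 23, 26]

t_ind, df_ind = independent_t(post, pre)
t_dep, df_dep = dependent_t(post, pre)
print("independent means: t =", round(t_ind, 2), "df =", df_ind)
print("dependent means:   t =", round(t_dep, 2), "df =", df_dep)
```

Because the scores are strongly correlated, the dependent test has a much smaller error term and a much larger t, at the cost of fewer degrees of freedom (7 rather than 14).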
45. Know assumptions
for t-tests of differences between means and what to do if the assumptions
are violated.
In order to satisfy
the assumptions of a t-test for difference between means, the populations
from which the two samples were selected must be normally distributed and
have equal variance.
In most cases, these assumptions cannot be met. However, as sample size increases, the normality assumption is not important due to the Central Limit Theorem. Unless the variances are markedly different the probabilities associated with the t-test are not going to be off very much and it is not much to be worried about. If the variances are radically different (especially with quite different sample sizes) then a non-parametric test should probably be done rather than the t-test which is a parametric test.
An additional requirement
is that the data be at least interval data since means will be computed.