Teaching students the concept of power in tests of significance can be daunting. Happily, the AP Statistics curriculum requires students to understand only the concept of power and what affects it; they are not expected to compute the power of a test of significance against a particular alternate hypothesis.
What Does Power Mean?

The easiest definition for students to understand is: power is the probability of correctly rejecting the null hypothesis. We're typically only interested in the power of a test when the null is in fact false. This definition also makes it clear that power is a conditional probability: the null hypothesis makes a statement about parameter values, but the power of the test is conditional on what the values of those parameters really are.

To make that even clearer: a hypothesis test begins with a null hypothesis, which usually proposes a very particular value for a parameter or the difference between two parameters (for example, "μ = 3" or "p₁ − p₂ = 0").1 Then it includes "an" alternate hypothesis, which is usually in fact a collection of possible parameter values competing with the one proposed in the null hypothesis (for example, "μ > 3," which is really a collection of possible values of μ, and "p₁ − p₂ ≠ 0," which allows for many possible values of p₁ − p₂). The power of a hypothesis test is the probability of rejecting the null, but this implicitly depends upon what the value of the parameter or the difference in parameter values really is.

The following tree diagram may help students appreciate the fact that α, β, and power are all conditional probabilities.

Figure 1: Reality to Decision

Power may be expressed in several different ways, and it might be worthwhile sharing more than one of them with your students, as one definition may "click" with a student where another does not. Here are a few different ways to describe what power is:

- Power is the probability of rejecting the null hypothesis when the null hypothesis is in fact false.
- Power is the probability of making a correct decision (rejecting the null) when the null is false.
- Power is the probability that a test of significance will detect a deviation from the null hypothesis, should such a deviation exist.
- Power is the probability of avoiding a Type II error; that is, power = 1 − β.
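For teachers who are comfortable with a little programming, the conditional nature of power can be made concrete with a short simulation: run the very same test of H₀: p = 0.5 over and over while changing what the true proportion actually is. The sketch below (the function names are my own; the test is the standard two-sided one-proportion z-test at α = 0.10) estimates the rejection probability under several "realities":

```python
import random
from math import sqrt

Z_STAR = 1.6449  # two-sided critical value for alpha = 0.10

def reject_null(true_p, n, rng):
    """Draw one sample of size n and run the z-test of H0: p = 0.5."""
    count = sum(rng.random() < true_p for _ in range(n))
    z = (count / n - 0.5) / sqrt(0.5 * 0.5 / n)
    return abs(z) > Z_STAR

def estimated_power(true_p, n=20, trials=4000, seed=1):
    """Fraction of simulated tests that reject -- an estimate of power."""
    rng = random.Random(seed)
    return sum(reject_null(true_p, n, rng) for _ in range(trials)) / trials

# Same test, different realities: the probability of rejecting depends
# on what the true proportion actually is.
for p in (0.5, 0.6, 0.8):
    print(p, estimated_power(p))
```

When the true proportion is 0.5 the null is true, so the rejection fraction is near α; the farther the truth sits from 0.5, the larger the rejection fraction becomes.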
To help students better grasp the concept, I continually restate what power means with different language each time. For example, if we are doing a test of significance at level α = 0.1, I might say, "That's a pretty big alpha level. This test is ready to reject the null at the drop of a hat. Is this a very powerful test?" (Yes, it is. Or at least, it's more powerful than it would be with a smaller alpha value.) Another example: If a student says that the consequences of a Type II error are very severe, then I may follow up with "So you really want to avoid Type II errors, huh? What does that say about what we require of our test of significance?" (We want a very powerful test.)

What Affects Power?

There are four things that primarily affect the power of a test of significance. They are:

1. The significance level α. A larger α makes it easier to reject the null hypothesis, which increases power (at the cost of more frequent Type I errors).
2. The magnitude of the effect: the discrepancy between the hypothesized parameter value and its true value. Larger discrepancies are easier to detect.
3. The variability in the population. Less variability means less sampling variability in the statistic, which increases power.
4. The sample size. Larger samples reduce the standard error of the statistic, which increases power.
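Although AP students are not expected to compute power, teachers may find it helpful to see the factors at work numerically. A sketch using the exact binomial distribution for the one-proportion z-test (the function name is my own; for a proportion, population variability is determined by p itself, so the remaining three factors are varied below):

```python
from math import comb, sqrt

def exact_power(true_p, n, p0=0.5, z_star=1.6449):
    """P(reject H0: p = p0 | true proportion), two-sided z-test.

    The rejection region is found from the null hypothesis; the chance
    of landing in it is then computed exactly under the true proportion.
    """
    se = sqrt(p0 * (1 - p0) / n)
    region = [k for k in range(n + 1) if abs(k / n - p0) / se > z_star]
    return sum(comb(n, k) * true_p**k * (1 - true_p)**(n - k) for k in region)

# Larger alpha -> more power (z* = 1.6449 for alpha = .10, 1.96 for .05)
print(exact_power(0.65, 20), exact_power(0.65, 20, z_star=1.96))
# Larger effect -> more power
print(exact_power(0.60, 20), exact_power(0.70, 20))
# Larger sample -> more power
print(exact_power(0.65, 20), exact_power(0.65, 120))
```

Each comparison changes exactly one factor while holding the others fixed, which is a useful habit to model for students even when the arithmetic itself stays behind the curtain.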
Two Classroom Activities

The two activities described below are similar in nature. The first one relates power to the "magnitude of the effect," by which I mean here the discrepancy between the (null) hypothesized value of a parameter and its actual value.2 The second one relates power to sample size. Both are described for classes of about 20 students, but you can modify them as needed for smaller or larger classes or for classes in which you have fewer resources available. Both of these activities involve tests of significance on a single population proportion, but the principles are true for nearly all tests of significance.

Activity 1: Relating Power to the Magnitude of the Effect

In advance of the class, you should prepare 21 bags of poker chips or some other token that comes in more than one color. Each of the bags should have a different number of blue chips in it, ranging from 0 out of 200 to 200 out of 200, by 10s. These bags represent populations with different proportions; label them by the proportion of blue chips in the bag: 0 percent, 5 percent, 10 percent, ..., 95 percent, 100 percent. Distribute one bag to each student. Then instruct them to shake their bags well and draw 20 chips at random. Have them count the number of blue chips out of the 20 that they observe in their sample and then perform a test of significance whose null hypothesis is that the bag contains 50 percent blue chips and whose alternate hypothesis is that it does not. They should use a significance level of α = 0.10. It's fine if they use technology to do the computations in the test. They are to record whether they rejected the null hypothesis or not, then replace the tokens, shake the bag, and repeat the simulation a total of 25 times. When they are done, they should compute what proportion of their simulations resulted in a rejection of the null hypothesis. Meanwhile, draw on the board a pair of axes.
Label the horizontal axis "Actual Population Proportion" and the vertical axis "Fraction of Tests That Rejected." When they and you are done, students should come to the board and draw a point on the graph corresponding to the proportion of blue tokens in their bag and the proportion of their simulations that resulted in a rejection. The resulting graph is an approximation of a "power curve," for power is precisely the probability of rejecting the null hypothesis. Figure 2 is an example of what the plot might look like. The lesson from this activity is that power is affected by the magnitude of the difference between the hypothesized parameter value and its true value. Bigger discrepancies are easier to detect than smaller ones.

Figure 2: Power Curve

Activity 2: Relating Power to Sample Size

For this activity, prepare 11 paper bags, each containing 780 blue chips (65 percent) and 420 nonblue chips (35 percent).3 This activity requires 8,580 blue chips and 4,620 nonblue chips. Pair up the students. Assign each student pair a sample size from 20 to 120, in increments of 10. The activity proceeds as did the last one. Students are to take 25 samples of their assigned sample size, recording what proportion of those samples lead to a rejection of the null hypothesis p = 0.5 compared to a two-sided alternative, at a significance level of 0.10. While they're sampling, you make axes on the board labeled "Sample Size" and "Fraction of Tests That Rejected." The students put points on the board as they complete their simulations. The resulting graph is a "power curve" relating power to sample size. Below is an example of what the plot might look like. It should show clearly that when p = 0.65, the null hypothesis of p = 0.50 is rejected with a higher probability when the sample size is larger.
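Teachers who want to preview what the class plot should look like can translate Activity 2 directly into a simulation. In this sketch (function names are my own invention) each "student pair" takes 25 samples at its assigned sample size from a population with p = 0.65 and reports the fraction of tests that rejected H₀: p = 0.5 at α = 0.10:

```python
import random
from math import sqrt

Z_STAR = 1.6449  # two-sided critical value for alpha = 0.10
TRUE_P = 0.65    # actual proportion of blue chips in every bag

def one_test_rejects(n, rng):
    """One sample of n chips, one z-test of H0: p = 0.5."""
    blues = sum(rng.random() < TRUE_P for _ in range(n))
    z = (blues / n - 0.5) / sqrt(0.5 * 0.5 / n)
    return abs(z) > Z_STAR

def pair_result(n, reps=25, seed=None):
    """What one student pair reports: the fraction of reps that rejected."""
    rng = random.Random(seed)
    return sum(one_test_rejects(n, rng) for _ in range(reps)) / reps

# One point on the board per student pair, sample sizes 20 through 120
for n in range(20, 121, 10):
    print(n, pair_result(n, seed=n))
```

With only 25 repetitions per pair the points will be noisy, just as they are in class; increasing `reps` smooths the simulated curve the same way pooling more student results would.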
(If you do both of these activities with students, it might be worth pointing out to them that the point on the first graph corresponding to the population proportion p = 0.65 was estimating the same power as the point on the second graph corresponding to the sample size n = 20.)

Conclusion

The AP Statistics curriculum is designed primarily to help students understand statistical concepts and become critical consumers of information. Being able to perform statistical computations is of, at most, secondary importance, and for some topics, such as power, is not expected of students at all. Students should know what power means and what affects the power of a test of significance. The activities described above can help students understand power better. If you teach a 50-minute class, you should spend one or at most two class days teaching power to your students. Don't get bogged down with calculations. They're important for statisticians, but they're best left for a later course.