Yet more perspective on the worshipped and/or abhorred p-value.
Biostatistics, clinical trial design, critical thinking about drugs and healthcare, skepticism, the scientific process.
Thursday, March 18, 2010
Love p-values for what they are, don't try to make them what they're not : Applied Statistics
Monday, March 15, 2010
Odds are, you need to read this
With my recent attacks on p-values and many common statistical practices, it's good to know at least someone agrees with me.
Odds are, it's wrong, ScienceNews
(via American Statistical Association Facebook page)
Saturday, March 13, 2010
Observation about recent regulatory backlash against group sequential trials
Maybe it's just me, but I'm noticing an increased backlash against group sequential trials from regulatory authorities in the last couple of years. The argument against these designs seems to be twofold:
- Group sequential trials that stop early for efficacy tend to overstate the evidence for efficacy. This is true, but the overstatement can and should be corrected; standard texts on group sequential trials and standard software make the correction easy to apply (see the simulation sketch after this list).
- Trials that stop early tend to have too little evidence for safety.
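To see the first point concretely, here is a small simulation sketch in Python. The true effect of 0.3, the per-arm sample size, and the z = 2.8 interim boundary are made-up illustrative values, and the code only demonstrates the bias; it is not the stage-wise adjusted estimator described in the standard texts.

```python
# Minimal simulation sketch: the naive estimate from trials that stop early for
# efficacy overstates the true effect. Effect size, sample size, and the z = 2.8
# boundary are illustrative assumptions, not values from any particular design.
import numpy as np

rng = np.random.default_rng(42)
true_delta = 0.3        # assumed true standardized treatment effect
n_interim = 100         # per-arm sample size at the interim look
boundary = 2.8          # illustrative O'Brien-Fleming-style interim z boundary

early_estimates = []
for _ in range(20_000):
    treat = rng.normal(true_delta, 1.0, n_interim)
    control = rng.normal(0.0, 1.0, n_interim)
    diff = treat.mean() - control.mean()
    z = diff / np.sqrt(2.0 / n_interim)    # z statistic for a difference in means, unit SD
    if z > boundary:                       # trial stops early for efficacy
        early_estimates.append(diff)       # record the naive, unadjusted estimate

print(f"true effect: {true_delta}")
print(f"mean naive estimate among early stoppers: {np.mean(early_estimates):.3f}")
# The average estimate among early stoppers comes out well above 0.3; this is the
# bias that stage-wise adjusted estimators and confidence intervals correct.
```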
The second point, about safety, is a major one, and one where the industry would do better to keep up with the methodology. Safety analysis is usually descriptive because hypothesis testing doesn't work so well there: a Type I error (claiming a safety problem where there is none) is not as serious as a Type II error (claiming no safety problem where there is one). Because safety issues can take many different forms (does the drug hurt the liver? the heart? the kidneys?), there is a massive multiple testing problem, and the usual efforts to control Type I error are no longer conservative from a safety standpoint. There is a general notion that more evidence is better (and, to an extent, I agree), but I think it is better to tackle the hard problem and attempt to characterize how much evidence we have about the safety of a drug. We have started to do this with adverse events; for example, Berry and Berry have implemented a Bayesian analysis that I allude to in a previous blog post. Other efforts include False Discovery Rates and other Bayesian models.
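To give one concrete flavor of the False Discovery Rate idea, here is a minimal sketch that applies the Benjamini-Hochberg procedure to a handful of hypothetical adverse-event p-values. The event names and p-values are invented for illustration, and this simple flagging rule is not the Berry and Berry hierarchical model. The point is only that the multiplicity adjustment targets the discovery rate rather than the familywise error rate, which better matches the asymmetry between Type I and Type II errors described above.

```python
# Minimal sketch: Benjamini-Hochberg FDR applied to hypothetical adverse-event
# p-values. The event names and p-values are invented for illustration; this is
# a simple flagging rule, not the Berry and Berry hierarchical model.
from statsmodels.stats.multitest import multipletests

# Hypothetical per-event p-values (e.g., treatment vs. control comparisons)
adverse_events = {
    "elevated ALT": 0.004,
    "nausea": 0.03,
    "headache": 0.20,
    "QT prolongation": 0.01,
    "rash": 0.45,
    "dizziness": 0.08,
}

names = list(adverse_events)
pvals = [adverse_events[name] for name in names]

# Flag events while controlling the false discovery rate at 10%
reject, p_adj, _, _ = multipletests(pvals, alpha=0.10, method="fdr_bh")

for name, p, padj, flag in zip(names, pvals, p_adj, reject):
    print(f"{name:16s} raw p = {p:.3f}  BH-adjusted p = {padj:.3f}  flagged: {flag}")
```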
We are left with another difficult problem: how much of a safety issue are we willing to tolerate for the efficacy of a drug? Of course, it would be lovely if we could make a pill that cured our diseases and left everything else alone, but that's not going to happen. The fact of the matter is that during the review cycle regulatory agencies have to decide whether the safety risk is worth the efficacy, and I think it would be better to have that discussion up front. Having that hard discussion before the application is submitted would help inform the design of Phase 3 clinical trials and reduce uncertainty in both Phase 3 and the application and review process. Then we could talk, with a better understanding, about the role of sequential designs in Phase 3.
Saturday, March 6, 2010
When a t-test hides what a sign test exposes
John Cook recently posted a case where a statistical test "confirmed" a hypothesis but failed to confirm a more general hypothesis. Along those same lines, I had a recent case where I was comparing a cheap and a more expensive way of determining the potency of an assay. If they were equivalent, the sponsor could get by with the cheaper method. A t-test was not powerful enough to show a difference, but I noticed that one method showed consistently lower potency than the other. I did a sign test (comparing the number of lower results against the expected number under a binomial distribution) and got a significant result. I could not recommend the cheaper method.
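Here is a minimal sketch of that kind of comparison with made-up numbers (not the actual assay data): all twelve paired differences are negative, but one large-magnitude value inflates the variance enough that the t-test misses what the sign test catches.

```python
# Illustrative sketch only: made-up potency differences (cheap minus expensive),
# not the actual assay data. Every difference is negative, but one large value
# inflates the variance, so a paired t-test misses what a sign test catches.
from scipy import stats

diffs = [-0.4, -0.7, -0.2, -0.9, -0.5, -0.3,
         -0.6, -0.8, -0.4, -0.5, -0.3, -35.0]

# Paired t-test on the differences (equivalent to a one-sample t-test against 0)
t_res = stats.ttest_1samp(diffs, popmean=0.0)

# Sign test: count of negative differences against a Binomial(n, 0.5) reference
n_lower = sum(d < 0 for d in diffs)
sign_res = stats.binomtest(n_lower, n=len(diffs), p=0.5)

print(f"paired t-test p-value: {t_res.pvalue:.3f}")  # around 0.26, not significant
print(f"sign test p-value:     {sign_res.pvalue:.2g} ({n_lower}/{len(diffs)} lower)")  # around 0.0005
```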
Lessons learned:
- t-test is not necessarily more powerful than a sign test
- a t-test can "throw away" information
- dichotomizing data can be worthwhile; it gives up quantitative detail in exchange for a simple qualitative signal (here, the direction of each difference)
Friday, March 5, 2010
Another strike against p-values
Though I'm asked to produce the darn things every day, I have grown to detest p-values, mostly for the way people want to engineer and overinterpret them. The fact that they do not follow the likelihood principle serves as additional impetus to shove them overboard while no one is looking. Now, John Cook has brought up another reason: p-values are inconsistent, in the sense that they do not provide evidence for a set of hypotheses in the way you would expect. (I suspect that if they were statistically inconsistent in the sense that no unbiased test could exist, they would have been abandoned a while back.)
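For the likelihood-principle point, the classic stopping-rule example is easy to reproduce. This is a textbook illustration with hypothetical data (9 successes and 3 failures), not John Cook's inconsistency example: the same data give different p-values depending on whether the sample size was fixed in advance or sampling stopped at the third failure.

```python
# Classic stopping-rule illustration of how p-values violate the likelihood
# principle. The data are hypothetical: 9 successes and 3 failures, testing
# H0: p = 0.5 against p > 0.5.
from scipy.stats import binom, nbinom

successes, failures = 9, 3

# Design 1: n = 12 trials fixed in advance; p-value = P(X >= 9 | n = 12, p = 0.5)
p_fixed_n = binom.sf(successes - 1, successes + failures, 0.5)

# Design 2: sampling continued until the 3rd failure; p-value = P(successes >= 9)
# under a negative binomial with 3 required failures
p_stop_rule = nbinom.sf(successes - 1, failures, 0.5)

print(f"fixed-n design p-value:     {p_fixed_n:.4f}")    # about 0.073
print(f"stop-at-3-failures p-value: {p_stop_rule:.4f}")  # about 0.033
# Same 9-and-3 data and proportional likelihoods, yet one design crosses the 0.05
# line and the other does not: the experimenter's stopping intent changes the answer.
```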