## Tuesday, January 29, 2008

### The hammer, or the sledgehammer? A small study in simulation

RKN over at Epidemiology blog had a small problem he solved using simulation:

I have been interested in the following question: if there are, let's say, 5 genes involved in the liability of fx risk, and each gene has two alleles with one of them conferring a greater risk, what is the chance that an individual will have 0, 1, 2, 3, 4, or 5 risk alleles?

Obviously, the answer depends on the allelic frequency. I am too lazy to work out by algebraic probability, so I have conducted a simple simulation (using SAS) as follows:

There are 5 genes with allelic frequencies being 1%, 5%, 10%, 15% and 20%
Assuming that they are independent (zero linkage disequilibrium)

He then programmed a short simulation in SAS using 100,000 replicates. However, remembering the advice at this site, I wondered if SAS's random number generator was up to the task. The period of this generator (a linear congruential generator) is $2^{32}-1$, and the quality of its output decreases once it has produced about the square root of that many pseudorandom numbers (around $2^{16}$, or 65,536). With 5 genes and 100,000 replicates, the simulation draws 500,000 numbers, well past that point, so the quality of the random numbers could be a concern.
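To make "period" concrete, the sketch below measures the cycle length of a toy linear congruential generator, $x_{n+1} = (a x_n + c) \bmod m$, by brute force. The parameters are illustrative textbook-style values with $m = 2^{16}$ that I picked for the example; they are assumptions, not the constants SAS actually uses. The last line checks the square-root arithmetic and the number of draws from the paragraph above.

```python
import math

def lcg_period(a: int, c: int, m: int, seed: int = 0) -> int:
    """Length of the cycle the LCG x -> (a*x + c) mod m eventually enters from `seed`."""
    first_seen = {}              # value -> step at which it first appeared
    x, step = seed, 0
    while x not in first_seen:
        first_seen[x] = step
        x = (a * x + c) % m
        step += 1
    return step - first_seen[x]

# Hypothetical parameters meeting the Hull-Dobell full-period conditions
# for m = 2**16: c is odd, and a - 1 is divisible by 4.
print(lcg_period(25173, 13849, 2**16))       # 65536: every value is visited
# Dropping the increment (c = 0) shortens the cycle to a quarter of m.
print(lcg_period(25173, 0, 2**16, seed=1))   # 16384

# Square-root rule of thumb from the text, and the draws the SAS run needed.
print(math.isqrt(2**32), 5 * 100_000)        # 65536 500000
```

A full-period generator still repeats itself eventually; the rule of thumb in the text is about how much of that cycle you can use before the numbers stop behaving like independent draws.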

I didn't run the battery of tests on the numbers, but I did replicate the experiment using R, which uses the Mersenne Twister as its default generator. I got the following result:

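| Number of risk alleles | 0 | 1 | 2 | 3 | 4 |
|---|---|---|---|---|---|
| Proportion of replicates | 0.57481 | 0.34678 | 0.07177 | 0.00633 | 0.00031 |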

This isn't far off from what RKN got. So maybe the sledgehammer wasn't necessary here.
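RKN's question can also be answered exactly. Assuming, as his setup implies, one risk-allele draw per gene (so counts run from 0 to 5) and independent genes, the count is a sum of independent Bernoulli variables with different probabilities (a Poisson-binomial distribution), whose distribution can be built up one gene at a time. The sketch below does that alongside a Monte Carlo version; Python's `random` module, like R, uses the Mersenne Twister. The function names and seed are my own choices.

```python
import random

FREQS = [0.01, 0.05, 0.10, 0.15, 0.20]  # risk-allele frequencies from RKN's setup

def simulate(freqs, n_reps=100_000, seed=2008):
    """Monte Carlo: one Bernoulli draw per gene, tally risk alleles per replicate."""
    rng = random.Random(seed)  # CPython's random module uses the Mersenne Twister
    counts = [0] * (len(freqs) + 1)
    for _ in range(n_reps):
        k = sum(rng.random() < p for p in freqs)
        counts[k] += 1
    return [c / n_reps for c in counts]

def exact(freqs):
    """Exact distribution of the number of risk alleles (Poisson-binomial),
    built by adding one gene at a time."""
    dist = [1.0]                       # with no genes, P(0 risk alleles) = 1
    for p in freqs:
        new = [0.0] * (len(dist) + 1)
        for k, prob in enumerate(dist):
            new[k] += prob * (1 - p)   # this gene contributes no risk allele
            new[k + 1] += prob * p     # this gene contributes one risk allele
        dist = new
    return dist

if __name__ == "__main__":
    for k, (sim, ex) in enumerate(zip(simulate(FREQS), exact(FREQS))):
        print(f"{k}  simulated {sim:.5f}  exact {ex:.7f}")
```

The exact probabilities come out to about 0.5756, 0.3455, 0.0724, 0.0063, 0.0002, and 0.0000015 for 0 through 5 risk alleles, within about 0.002 of the R results above. The last value also explains why the table stops at 4: with 100,000 replicates, we would expect only 0.15 replicates carrying all five risk alleles.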

## Sunday, January 27, 2008

### Is it ever OK to change the primary endpoint?

One of the points of discussion around the recent ENHANCE trial was whether it was ethical to change the trial's primary endpoint. Ed Silverman writes in his Pharmalot blog:

We’ve written this before, and we will write this again: changing a primary endpoint - and doing so without consulting the lead investigator - is inappropriate. And failing to provide more meaningful updates to a widely anticipated trial much sooner in the process only caused skepticism and suspicion. And naming a so-called independent panel without releasing names compounds the matter, especially when some panel members aren’t independent. If Merck and Schering-Plough execs are upset over “mischaracterization,” they have only themselves to blame.

I want to focus in on one statement: "changing a primary endpoint ... is inappropriate."

There are a few points here I want to quickly discuss. In the opinion of this statistician, it is sometimes appropriate to change primary endpoints. The conditions under which this radical change is appropriate may or may not all have been met in the ENHANCE trial. However, while a change in primary endpoint ought to be enough to raise suspicions (and such a change is not to be done lightly), it should not be, by itself, enough to sink a study.

So, without further ado, circumstances where it may be appropriate to change primary endpoints:

| Circumstance and discussion | Met in ENHANCE? |
|---|---|
| The change must be decided before the study is unblinded. Making the change after the study is unblinded is enough to sink a study, even if an independent board is the only group that is unblinded. | Yes. The study was unblinded on Dec 31, 2007 (if we believe the press release; the FDA should be able to audit this). |
| It would be very useful for the principal investigator (PI) to be involved in the decision. While this is not statistically necessary, from an operational point of view the PI has been entrusted with the scientific responsibility for the study, and so should have input on the primary endpoint. | No, and, as Silverman points out, this casts further doubt on an already suspicious act. The composition and independence of the board that made the decision are unclear, and this may become an issue in the coming weeks. |
| There should be a good scientific reason for preferring the new endpoint over the old. Sometimes the reason is statistical (for example, an old endpoint may be much less powerful than a new one) or operational (e.g., recruitment projections were far off target), but in any case the scientific justification for the new endpoint needs to be well established. | This is unclear. The claim is that there were difficulties in interpreting the old endpoint, the intima-media thickness (IMT), which is essentially the thickness of artery walls and must be determined from ultrasound images. Deriving measurements for clinical trials from imaging is a difficult task, even in areas such as arthritis where the measures are now standard. |
| Sometimes there may be a plan to select a primary endpoint based on the data, but the algorithm for doing so needs to be specified in advance, and the operating characteristics of the procedure, such as Type I error and power, need to be understood and documented. In that case, the primary endpoint can be chosen after unblinding, but the algorithm should be programmed before unblinding and followed exactly. This is a tricky situation, and such a plan should be used only in extenuating circumstances. | I don't think so. If ENHANCE had an adaptive hypothesis, I think we would know that by now (though this is not guaranteed, and I don't want to place too much weight on an appeal to consequence). At any rate, this is auditable, since the plan has to be written and signed. |
| The study is a pilot trial and the sponsor is prepared to deal with selection bias. | No, ENHANCE was not a pilot trial. As the news coverage and the stock price show, this trial had major financial and medical consequences. |
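The table notes that a data-driven choice of endpoint is acceptable only if its operating characteristics, such as Type I error, are worked out in advance. A small simulation shows why. It rests on assumptions of my own, not on anything known about ENHANCE: two endpoints whose test statistics are standard normal with correlation `rho` when the drug does nothing, two-sided tests at 5%, and a Bonferroni split as a stand-in for a pre-specified rule (real adaptive designs use more refined procedures).

```python
import math
import random
from statistics import NormalDist

def endpoint_selection_error(rho: float, n_sims: int = 100_000,
                             alpha: float = 0.05, seed: int = 27):
    """Under the null (no effect on either endpoint), estimate the Type I error of
    declaring success when the better-looking endpoint is significant, both with
    no correction and with a pre-specified Bonferroni split of alpha."""
    rng = random.Random(seed)
    crit_naive = NormalDist().inv_cdf(1 - alpha / 2)   # two-sided test at alpha
    crit_bonf = NormalDist().inv_cdf(1 - alpha / 4)    # two-sided test at alpha / 2
    naive_hits = bonf_hits = 0
    for _ in range(n_sims):
        # Two standard-normal test statistics with correlation rho
        z1 = rng.gauss(0.0, 1.0)
        z2 = rho * z1 + math.sqrt(1 - rho ** 2) * rng.gauss(0.0, 1.0)
        best = max(abs(z1), abs(z2))   # "pick the endpoint that looks better"
        naive_hits += best > crit_naive
        bonf_hits += best > crit_bonf
    return naive_hits / n_sims, bonf_hits / n_sims

for rho in (0.0, 0.5, 0.9):
    naive, bonf = endpoint_selection_error(rho)
    print(f"rho={rho}: pick-the-winner {naive:.3f}, Bonferroni {bonf:.3f}")
```

With independent endpoints, picking the winner nearly doubles the error rate (to about $1 - 0.95^2 \approx 0.0975$). The inflation shrinks as the endpoints become more correlated but does not vanish, while the pre-specified split stays at or below 5%.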

Personally, I'm not ready to point my finger and yell "J'accuse!" at Schering-Plough and Merck quite yet, at least not over their attempt to change the primary endpoint in ENHANCE. I certainly will follow the facts that bubble up with great interest, though.

## Saturday, January 19, 2008

### The NNT returns

I've discussed the number needed to treat (NNT) from a statistical point of view before. To review, the NNT is interpreted as the expected number of people who need to be treated (with a specified drug, dose, duration of treatment, and so forth) for one of them to receive a specified benefit (recovering from a disease or, in the case of prophylactic treatment such as statins, avoiding a bad event).
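Because the NNT is the reciprocal of the absolute risk reduction (the difference in event rates between untreated and treated people over the same period), it is a one-line calculation. The sketch below uses hypothetical 5-year event rates of 3% untreated and 2% treated; these are not figures from any trial.

```python
def number_needed_to_treat(control_risk: float, treated_risk: float) -> float:
    """NNT = 1 / absolute risk reduction, for one outcome over one follow-up period."""
    arr = control_risk - treated_risk
    if arr <= 0:
        raise ValueError("no absolute risk reduction, so the NNT is undefined")
    return 1 / arr

# Hypothetical 5-year event rates: 3% without treatment, 2% with treatment.
nnt = number_needed_to_treat(0.03, 0.02)
print(round(nnt))  # 100: treat about 100 people for one of them to avoid the event
```

Change the outcome, the follow-up period, or the baseline risk and the NNT changes with it, which is why an NNT means little without that context.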

John Mack of Pharma Marketing blog discusses a BusinessWeek article entitled "Do Cholesterol Drugs Do Any Good?" In these discussions, they reprint a table giving the estimated NNTs for various drugs, including atorvastatin (Lipitor™, Pfizer). I've reprinted it below (without permission, but hey, it's circulating all over the globe):

Mack, like any good marketing guy, sensationalizes the findings in this table by calling it "the statin lottery." We are treated to an attempt at an explanation of the NNT:

250 people are recruited to participate in the contest. Each person gives me \$1,000 and after 1 year one person in the group--selected at random--will receive \$250,000 (250 people x \$1,000). I keep the interest earned.

I'm not clear on what this analogy has to do with the NNT, so call me skeptical of the lottery analogy, and even of the following statement by Dr. Nortin Hadler:

Anything over an NNT of 50 is worse than a lottery ticket; there may be no winners

Both the lottery imagery and the statement basically claim that there is no benefit to taking statins. But I argue differently. These arguments against statins rely on bamboozling readers with big numbers, ignore the payoff of taking statins relative to the costs, and ignore the fact that the use of statins is a very personal decision to be made carefully.

So, on to big numbers. I don't know why Dr. Hadler picked 50 as a cutoff for NNT. He may have given a reason in the interview that wasn't reported (or maybe I missed it), or maybe he just picked a number out of the rectal database. Given that the NNT is tied to a specific clinical outcome, course of treatment (including dose and frequency), and other specific medical events, the analogy with lotteries breaks down. Never mind the fact that lotteries typically have winning chances of less than 1 in a million. So the 1 in 50 number just seems arbitrary. After the quote, we are shown higher NNTs for statins (70-250 and 500+), and the upper end of that range is singled out for discussion. Why not discuss 70? Why not discuss the harmonic mean of 70 and 250, $2/(1/70 + 1/250) \approx 109$? Assuming that 70-250 is a confidence interval computed symmetrically on the risk-reduction scale, that is probably the right NNT estimate: the NNT is the reciprocal of the absolute risk reduction, and inverting the midpoint of the risk reductions 1/250 and 1/70 gives exactly the harmonic mean. Not impressive enough, I guess.
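The harmonic-mean claim can be checked in a few lines. This assumes, as the text does, that 70-250 is a confidence interval, and additionally that it was computed symmetrically on the absolute-risk-reduction scale, as a Wald interval would be.

```python
from statistics import harmonic_mean

low, high = 70, 250   # ends of the statin NNT range quoted in the article

# NNT = 1/ARR. If the interval is symmetric on the ARR scale, its midpoint is the
# point estimate of the ARR, and inverting that midpoint gives the NNT estimate.
arr_midpoint = (1 / low + 1 / high) / 2
nnt_estimate = 1 / arr_midpoint

print(round(nnt_estimate, 1), round(harmonic_mean([low, high]), 1))  # 109.4 109.4
```

The two numbers agree because averaging reciprocals and then inverting is the definition of the harmonic mean.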

In light of the payoffs, I wonder if 1 in 70-250 even looks so bad. What is the balance between avoiding a freakin' myocardial infarction and taking 5 years of statins (assuming you have high blood pressure)? Most cardiovascular events carry a high cost in terms of money, healthcare resources, stress, and lifestyle modifications. What is the tradeoff between taking 5 years of statins (including the chance of adverse events and the money spent) and having a cardiovascular event? For each individual person, I don't know. How about 1 in 500 to avoid death or other "serious medical events" (presumably more serious than a myocardial infarction)? That's something to decide with a doctor. What is the NNT for avoiding MIs in people who have a family history of heart disease or risk factors beyond hypertension?

And the NNT can be a very useful statistic in that decision, as long as it is considered in context. Notice that the table gives 3 NNTs for statins, depending on the risk factors and the events to avoid. There are more NNTs not listed in the table. And, for context, we are given antibiotics for H. pylori ulcers and Avandia. The reason these are singled out for the table is not given, and it would have been very easy to give a simple chart of NNTs for many common medications. Statins might have looked as bad, worse, or better. It all depends on the context. The risk factors that are intimately tied to NNTs are not discussed beyond people who have had a heart attack and people who merely have high blood pressure; a history of heart disease, for example, is not discussed. Finally, the uncertainty in calculating an NNT needs to be acknowledged by showing a confidence interval.
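As a sketch of what such an interval could look like, the code below inverts a Wald confidence interval for the absolute risk reduction (ARR). This is one simple approach among several; score-based intervals such as Newcombe's behave better with small counts. The trial counts are hypothetical. Because NNT = 1/ARR, the upper ARR limit becomes the lower NNT limit, and if the ARR interval includes zero the NNT interval is unbounded, which is why the function refuses that case.

```python
from statistics import NormalDist

def nnt_with_ci(events_ctrl, n_ctrl, events_trt, n_trt, level=0.95):
    """Point estimate and confidence interval for the NNT, obtained by inverting
    a Wald interval for the absolute risk reduction (ARR)."""
    p_c = events_ctrl / n_ctrl
    p_t = events_trt / n_trt
    arr = p_c - p_t
    se = (p_c * (1 - p_c) / n_ctrl + p_t * (1 - p_t) / n_trt) ** 0.5
    z = NormalDist().inv_cdf(0.5 + level / 2)
    arr_lo, arr_hi = arr - z * se, arr + z * se
    if arr_lo <= 0:
        raise ValueError("ARR interval includes zero; the NNT interval is unbounded")
    # Inverting reverses the order: the largest ARR gives the smallest NNT.
    return 1 / arr, (1 / arr_hi, 1 / arr_lo)

# Hypothetical trial: 60/2000 events untreated, 40/2000 events treated.
est, (lo, hi) = nnt_with_ci(60, 2000, 40, 2000)
print(f"NNT {est:.0f}, 95% CI {lo:.0f} to {hi:.0f}")
```

With these made-up counts the point estimate is an NNT of 100, but the 95% interval runs from about 51 to roughly 3,000, exactly the kind of uncertainty a single headline NNT hides.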

In short, I really appreciate that BusinessWeek discussed the NNT statistic. It is definitely a useful and easily interpretable figure that can be used in medical decision making. However, the simplified explanations leave out useful and necessary information on exactly how to use that statistic to make medical decisions at both the personal and the policy level. I do understand that we are overmedicated, but I also believe it is better to understand the phenomenon and base our course of action on reality than to sensationalize it and feed the counterproductive pharma-bashing frenzy.

## Wednesday, January 16, 2008

### Notable addition to the blogroll

I just found a new biostatistics blog called Epidemiological Methods. The posts I've looked at so far are high quality, and the blogroll is high quality as well.

Oh, and via that, I found a Bayesian Stats blog, too.

## Monday, January 14, 2008

### Not "ENHANCE"-ing the public image of science, but it does show the power of the scientific method

The recent brouhaha over the Merck/Schering-Plough ENHANCE study can be pointed to as a triumph of science. Schering-Plough clearly tried to pull a fast one by convening a "panel of experts" to determine the endpoint of the ENHANCE trial long after the data were collected. Statistically, of course, this amounts to data dredging and is rather dishonest and unethical, not to mention unscientific. The power of science, after all, lies in our ability to predict outcomes based on our present data and current understanding of how things work, not in picking outcomes on the basis of past data.

Our current safeguards have turned back a major assault on our sensibilities. I do feel bad for the Vytorin team because, well, they've gone this far into the development process and had a major setback. But as for the geniuses who decided to try to break the rules, I hope they find a different line of work. Applied research can help the public only if there is a certain degree of trust, and shenanigans like this only serve to erode that trust.

As for me, I'm glad we've come far enough in our ethical development as scientists to at least catch these blatant cases. But Congress is already trying to bang down our doors, calling for more controls. I'm afraid there's more of our act to clean up.

As a note, Derek Lowe has a nice analysis of the clinical and statistical issues.

## Sunday, January 6, 2008

### A level-headed skeptical assessment of echinacea

It seems that almost everyone either loves or hates echinacea. Or, at least, there's a vocal core group of people that espouses its miraculous wonders as a cold remedy, and another equally vocal crowd that wants the first to shut up.

At any rate, the research on echinacea has been mixed and confusing, which is par for the course for an extract of a natural product.

Stats.org just released a commentary on some shoddy reporting of the research, and a resource at the Mayo Clinic shows just how confusing the research on echinacea in particular has been.

One point in the Stats.org commentary is worthy of a "lying with statistics" article. Articles from Bloomberg, the NYT, and the LA Times all reported that prophylactic use of echinacea reduced the rate of getting colds by about 65 percent. However, this number is the reduction in the odds of getting a cold, not the reduction in the probability of getting a cold (about 30 percent). Statistically, odds and probability are quite different concepts: the odds of an event with probability $p$ are $p/(1-p)$, so unless the event is rare, a large cut in the odds corresponds to a much smaller cut in the probability. Reporting the odds in this case made echinacea look much better than the reference study would indicate.
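The gap between odds and probability is easy to quantify. The sketch below assumes an odds ratio of 0.35 (the 65 percent reduction in odds from the news reports) and, since the baseline risk of catching a cold is not given here, tries a few hypothetical baseline risks.

```python
def treated_risk(baseline_risk: float, odds_ratio: float) -> float:
    """Probability of the outcome after applying an odds ratio to a baseline probability."""
    treated_odds = odds_ratio * baseline_risk / (1 - baseline_risk)  # odds = p / (1 - p)
    return treated_odds / (1 + treated_odds)                         # p = odds / (1 + odds)

def relative_risk_reduction(baseline_risk: float, odds_ratio: float) -> float:
    """Proportional drop in the probability (not the odds) of the outcome."""
    return 1 - treated_risk(baseline_risk, odds_ratio) / baseline_risk

ODDS_RATIO = 0.35                  # a 65% reduction in the odds
for p0 in (0.01, 0.2, 0.5, 0.8):   # hypothetical baseline risks
    rrr = relative_risk_reduction(p0, ODDS_RATIO)
    print(f"baseline {p0:.0%}: risk reduced by {rrr:.0%}")
```

Only when the outcome is rare does a 65 percent cut in the odds translate into a cut of nearly 65 percent in the probability; at a baseline risk of 50 percent it is about 48 percent, and the gap keeps widening as the outcome becomes more common.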

That's not the only issue, but I refer you to the Stats.org article for more ways of inoculating yourself against questionable reporting of statistics.

One last comment: the author doesn't seem to be against echinacea or to think that it is ineffective, but is simply evaluating the quality of the evidence and of its reporting.