Friday, August 27, 2010
Wednesday, August 18, 2010
Monday, August 16, 2010
Friday, August 13, 2010
Wednesday, August 11, 2010
Missing data is taken for granted now in clinical trials. This issue colors protocol design, case report form (CRF) development, monitoring, and statistical analysis. Statistical analysis plans (SAPs) must have a section covering missing data, or they are incomplete.
Of course, the handling of missing data is a hard question. Though Little and Rubin came out with the first thorough treatise in 1976 (link is to second edition), the methods to deal with missing data are hard enough that only statisticians understand them (and not very well at that). Of particular interest is the case when missingness depends on what the value would have been had it been observed, even after conditioning on values you have observed (this is called "Missing not at random" [MNAR] or "nonignorably missing data"). Methods for dealing with MNAR data are notoriously difficult and depend on unverifiable assumptions, so historically we biostatisticians have relied on simple, but misleading, methods such as complete case analysis, last observation carried forward, or conditional mean imputation (i.e. replace with some adjusted mean or regression prediction).
The FDA has typically balked at these poor methods, but in the last few years has started to focus on the issue. They empaneled a group of statisticians a few years ago to research the issue and make recommendations, and the panel has now issued its report (link when I can find it). This report will likely find its way into a guidance, which will help sponsors deal more intelligently with this issue.
For now, the report carries few specific recommendations for methods and strategies for use, but the following principles apply:
- everything should be prespecified and then executed according to plan
- distinction should be made between dropouts and randomly missed visits
- single imputations such as LOCF should be avoided in favor of methods that adjust the standard error correctly for the missing data
- any missing data analysis should include a sensitivity analysis, where alternate methods are used in the analysis to make sure that the missing data are not driving the result (this still leaves open a huge can of worms, and it is hoped that further research will help here).
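To make the point about single imputation concrete, here is a minimal sketch in Python. The outcome values are made up for illustration, and the helper names (`standard_error`, `mean_impute`) are my own, not from any package. It fills the gaps with the observed mean and compares the naive standard error with the one from the observed values alone.

```python
import math
import statistics


def standard_error(values):
    """Standard error of the mean: sample SD divided by sqrt(n)."""
    return statistics.stdev(values) / math.sqrt(len(values))


def mean_impute(values):
    """Single imputation: replace each missing value (None) with the observed mean."""
    observed = [v for v in values if v is not None]
    fill = statistics.mean(observed)
    return [fill if v is None else v for v in values]


# Hypothetical outcomes for 10 subjects; None marks a missing value.
outcomes = [4.1, None, 5.3, 6.0, None, 3.8, 5.1, None, 4.7, 6.2]

observed = [v for v in outcomes if v is not None]
imputed = mean_impute(outcomes)

se_complete = standard_error(observed)  # based on the 7 observed values
se_imputed = standard_error(imputed)    # treats 3 filled-in values as real data

print(f"mean (observed only): {statistics.mean(observed):.3f}")
print(f"mean (mean-imputed):  {statistics.mean(imputed):.3f}")
print(f"SE   (observed only): {se_complete:.3f}")
print(f"SE   (mean-imputed):  {se_imputed:.3f}")
```

The point estimate does not move, but the imputed analysis reports a smaller standard error because it pretends the filled-in values carry information. That false precision is what the principle about single imputation warns against.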
It's time to start thinking harder about this issue, and stop using last observation carried forward blindly. Pretty soon, those days will be over for good.
From my JSM 2010 notes on the topic.
Friday, August 6, 2010
Ok, so I am going to leave R, SAS, big data, and so forth aside for a bit (mostly) and focus on trends in biostatistics.
Adaptive trials (group sequential trials, sample size re-estimation, O'Brien-Fleming designs, triangular boundary trials) have a fairly mature literature, at least as far as the classical group sequential boundaries go. However, these designs leave a lot to be desired, as they do not take advantage of full information at interim analyses, especially partial information on the disease trajectory from enrolled subjects who have not completed follow up. On the upside, they are easy to communicate to regulators, and software is available to design them, whether you use R, SAS, or EaST. The main challenge is finding project teams who are experienced in implementing adaptive trials, as not all data managers understand what is required for interim analyses, not all clinical teams are aware of their important roles, not all sites understand, and not all drug supply teams are aware of what they need to do.
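As a small illustration of how conservative the classical boundaries are early on, here is a sketch of the O'Brien-Fleming-type alpha spending function of Lan and DeMets. This is a standard formula, but it is my choice of example rather than something from the sessions. The function name, the two-sided alpha of 0.05, and the four equally spaced looks are assumptions for illustration.

```python
from statistics import NormalDist


def obf_alpha_spent(info_fraction, alpha=0.05):
    """Cumulative two-sided alpha spent at a given information fraction (0 < t <= 1).

    O'Brien-Fleming-type spending function: 2 * (1 - Phi(z_{1 - alpha/2} / sqrt(t))).
    """
    std_normal = NormalDist()
    z = std_normal.inv_cdf(1 - alpha / 2)
    return 2 * (1 - std_normal.cdf(z / info_fraction ** 0.5))


# Hypothetical design: four equally spaced analyses.
for t in (0.25, 0.50, 0.75, 1.00):
    print(f"information fraction {t:.2f}: cumulative alpha spent = {obf_alpha_spent(t):.5f}")
```

Almost none of the type I error is spent at the first look, and the full alpha is reached only at the final analysis. That is part of why these designs are easy to explain to regulators. In practice you would use dedicated design software for the actual boundaries.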
Bayesian methods have a lot of promise both in the incorporation of partial information in making adaptation decisions and in making drug supply decisions. I think it will take a few years for developers to find this out, but I'm happy to evangelize. With the new FDA draft guidance on adaptive trials, I think more people are going to be bold and use adaptive trials. The danger, of course, is that they have to be done well to be successful, and I'm afraid that more people are going to use them because they are the in thing and they promise to save money, without a good strategy in place to actually realize those savings.
Patient segmentation (essentially, the analysis of subgroups from a population point of view) seems to be an emerging topic. This is because personalized medicine, which is a logical conclusion of segmentation, is a long way off (despite the hype). We have the methods to do segmentation today (perhaps with a little more development of methodology), and many of the promises of personalized medicine can be realized with an effective segmentation strategy. For example, if we can identify characteristics of subgroups who can benefit more from one class of drugs, that will be valuable information for physicians when they decide first line treatment after diagnosis.
Missing data has always been a thorn in the side, and the methodology has finally developed to the point where the FDA believes they can start drafting a guidance. A few years ago they empaneled a committee to study the problem of missing data and provide input into a draft guidance on the matter. The committee has put together a report (will link when I find the report), which is hot off the press and thought-provoking. Like the adaptive design and noninferiority guidances, the guidance will probably leave it up to the sponsor to justify the missing data method, but there are a few strong suggestions:
- don't use single imputation methods, as they underestimate the standard error of the treatment effect
- specify one method as primary, but do other methods as sensitivity analyses. Show that the result of the trial is robust to different methods.
- Distinguish between dropouts and "randomly" missing data such as missed visits.
- Try not to have missing data, and institute followup procedures that decrease missing data rates. For dropouts, try to follow up anyway.
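One common alternative to single imputation is multiple imputation, where the analysis is run on several imputed datasets and the results are combined with Rubin's rules. The report does not prescribe this, so treat the following as a sketch of the pooling step only. The function name and the three treatment-effect estimates are hypothetical.

```python
import math
import statistics


def pool_rubin(estimates, variances):
    """Combine results from m imputed datasets using Rubin's rules.

    estimates: treatment-effect estimate from each imputed dataset
    variances: squared standard error of each estimate
    Returns (pooled estimate, pooled standard error).
    """
    m = len(estimates)
    pooled = statistics.mean(estimates)
    within = statistics.mean(variances)       # average within-imputation variance
    between = statistics.variance(estimates)  # between-imputation variance
    total = within + (1 + 1 / m) * between    # total variance
    return pooled, math.sqrt(total)


# Hypothetical results from three imputed datasets.
estimates = [1.0, 1.2, 0.8]
variances = [0.04, 0.04, 0.04]

pooled_estimate, pooled_se = pool_rubin(estimates, variances)
print(f"pooled estimate: {pooled_estimate:.3f}")
print(f"pooled SE:       {pooled_se:.3f}")
print(f"naive SE from one imputed dataset: {math.sqrt(variances[0]):.3f}")
```

The between-imputation term is what single imputation leaves out. It inflates the standard error to reflect the uncertainty about the values that were never observed.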
The use of graphics in clinical trial reporting has increased, and that's a good thing. A new site is being created to show the different kinds of graphs that can be used in clinical reports, and is designed to increase the usage of graphs. One FDA reviewer noted that graphics that are well done can decrease review times, and, in response to a burning question, noted that the FDA will accept graphs that are created in R.
Finally, I will mention one field of study that we do not apply in our field. Yes, it's a fairly complex method, but I believe the concept can be explained, and even if it is used in an exploratory manner it can yield a lot of great insights. It's functional data analysis, which is the study of curves as data rather than just points. My thought is that we can study whole disease trajectories (i.e. the changes in disease over time) rather than just endpoints. Using functional data analysis methods, we can start to characterize patients as, for example, "quick responders," "slow responders," "high morbidity," and so forth depending on what their disease trajectory looks like. Then we can make comparisons in these disease trajectories between treatment groups. Come to think of it, it might be useful in patient segment analysis.
At any rate I think biostatisticians are going to feel the pinch in the coming years as drug developers rely more heavily on us to reduce costs and development time, yet keep the scientific integrity of a drug development program intact. I am confident we will step up to the challenge, but I think we need to be more courageous in applying our full knowledge to the problems and be assertive in those cases where inappropriate methods will lead to delays, higher costs, and/or ethical issues.
The second half of JSM was just as eventful as the first half. Jim Goodnight addressed the new practical problems requiring analytics. Perhaps telling, though, is his almost begrudging admission that R is big. The reality is that SAS seems to think they are going to have to work with R in the future. There is already some integration in SAS/IML Studio, and I think that is going to get tighter.
The evening brought a couple of reunions and business meetings, including the UNC reunion (where it sounds like my alma mater had a pretty good year in terms of faculty and student awards and contributions) and the statistical computing and graphics sections, where I met some of my fellow tweeters.
On Tuesday, I went a little out of my normal route and attended a session on functional data analysis. This is one area where I think we biostatisticians could use more ideas. Ramsay (who helped create and advance the field) discussed software needs for the field (with a few interesting critiques of R), and two others talked about two interesting applications to biostatistics: a study of cell apoptosis and a brain imaging study of lead exposure. On Wednesday afternoon, we discussed patient population segmentation and tailored therapeutics, which is, I guess, an intermediate step between marketing a drug to everybody and personalized medicine. I think everybody agreed that personalized medicine is the direction we are going, but we are going to take a long time to get there. Patient segmentation is happening today. Tuesday night brought Revolution Analytics's big announcement about their commercial big data package for R, where you can analyze 100 million row datasets in less than a minute on a relatively cheap laptop. I saw a demo of the system, and they even tried to respect many of the conventions in R, including the use of generic functions. Thanks to them for the beeR, as well. Later in the evening came more business meetings. I ended up volunteering for some work for next year, and I begin next week.
On Wednesday, I attended talks on missing data, vaccine trials, and practical issues in implementing adaptive trials. By then, I was conferenced out, having attended probably 10 sessions over 4 days, for a total of 20 hours absorbing ideas. And that didn't include the business part.
I will present some reflections on the conference, including issues that will either emerge or continue to be important in statistical analysis of clinical trials.
Tuesday, August 3, 2010
The first part of the Joint Statistical Meetings for 2010 has come and gone, and so here are a few random thoughts on the conference. Keep in mind that I have a bias toward biostatistics, so your mileage may vary.
Vancouver is a beautiful city with great weather. I enjoy watching the sea planes take off and land, and the mountainous backdrop of the city is just gorgeous. Technology has been heavily embraced at least where the conference is located, and the diversity of the people served by the city is astounding. The convention center is certainly the swankiest I've ever been in.
The quality of the conference is certainly a lot higher than previous conferences I've been to, or perhaps I'm just better about selecting what to attend.
- The ASA's session on strategic partnerships in academia, industry, and government (SPAIG) was not well enough attended. I think these partnerships are going to be essential to conducting scientific research well, and to the data analysis coming out of and going into that research. Presentations included reflections on the ASA's strategic plan from a past president, and efforts for the future coming from the incoming president-elect Bob Rodriguez. I wish everybody could have heard it.
- The 4pm session on adaptive designs was very good. This important area (for which I enthusiastically evangelize to my company and clients) keeps advancing, and it is good to see some of the latest updates.
- Another session I attended had a Matrix theme, in which we were encouraged to break out of a mind prison by reaching out to those in other disciplines and making our work more accessible. The session was packed, and it did not disappoint. It may seem like an obvious point, but it does not seem to be emphasized in education or even on the job.
- Another session focused on segmenting patient populations for tailoring therapeutics. A lot of good work is going on in this area. We are not able to do personalized medicine yet despite the hype, but tailored therapeutics (i.e. tailoring for a subgroup) is an intermediate step that is happening today.
- At the business meeting on statistical computing and graphics I met some of my fellow tweeters. I am very pleased to make their acquaintance.
There are other themes too. R is still huge, and SAS is getting huger. This all came together in Jim Goodnight's talk on the importance of analytics and how the education system needs to support it. His tone seemed to exhibit a begrudging acceptance of R. (I'll get into philosophizing about SAS and R another time.) Revolution Analytics is addressing some of the serious issues with R, including high performance computing and big data, and this will certainly be something to follow.
Hopefully the second half will be as good as the first half.