Wednesday, August 11, 2010

The curse of missing data, and potential regulatory action

Missing data is taken for granted now in clinical trials. This issue colors protocol design, case report form (CRF) development, monitoring, and statistical analysis. Statistical analysis plans (SAPs) must have a section covering missing data, or they are incomplete.

Of course, the handling of missing data is a hard question. Though Little and Rubin came out with the first through treatise in 1976 (link is to second edition), the methods to deal with missing data are hard enough that only statisticians understand them (and not very well at that). Of particular interest is the case when missing data depends on the what the value would have been had it been observed, even after conditioning on values you have observed (this is called "Missing not at random" [MNAR] or "nonignorably missing data"). Methods for dealing with MNAR data are notoriously difficult and depend on unverifiable assumptions, so historically we biostatisticians have relied on simple, but misleading, methods such as complete case analysis, last observation carried forward, or conditional mean imputation (i.e. replace with some adjusted mean or regression prediction).

The FDA has typically balked at these poor methods, but in the last few years has started to focus on the issue. They empaneled a group of statisticians a few years ago to research the issue and make recommendations, and the panel has now issued its report (link when I can find it). This report will likely find its way into a guidance, which will help sponsors deal more intelligently with this issue.

For now, the report carries few specific recommendations for methods and strategies for use, but the following principles apply:

  • everything should be prespecified and then executed according to plan
  • distinction should be made between dropouts and randomly missed visits
  • single imputations such as LOCF should be avoided in favor of methods that adjust the standard error correctly for the missing data
  • any missing data analysis should include a sensitivity analysis, where alternate methods are used in the analysis to make sure that the missing data are not driving the result (this still leaves open a huge can of worms, and it is hoped that further research will help here).

It's time to start thinking harder about this issue, and stop using last observation carried forward blindly. Pretty soon, those days will be over for good.

From my JSM 2010 notes on the topic

Posted via email from Randomjohn's posterous