Sunday, November 7, 2010

Having data is not the same as using data--what a statistician reviews on case report forms

I have seen three different philosophies of statistical review of case report forms:
  • None at all, either because the statistician never sees it or doesn't know what to look for
  • Some review, where the statistician sees the case report form and the developer ignores the comments
  • Full review
Of course, I believe the last approach is the best, but I have seen the first two far more often. In those two scenarios, the lack of attention to these details endangers the trial's ability to fulfill protocol objectives. For example, I have seen cases where the sponsor had to forgo some very important analyses because the data were not collected in a way that allowed them to be analyzed appropriately.
When I review a case report form, I have in mind how the data is going to be displayed in the final analysis, perhaps even the SAS or R code to perform the analysis (assuming the trial is in the therapeutic areas I am most used to). I can do this because I have been involved in a large number of trials now, and I have seen the common threads over a wide range of trials and also within a few therapeutic areas. I know all trials will need an enrollment and disposition table, a background and demography table, efficacy tables, adverse event tables, and laboratory tables, for instance. I also know if the trial requires electrocardiogram measurements (because I've fully reviewed the protocol!). These are tables I've produced over and over, and about 90% of the elements are the same from table to table. Therefore, I can tell whether a case report form and its validation specifications are capable of producing the required analysis.
The following are some examples of what I review in case report forms:
  1. The ability to calculate complex composite endpoints
    All components need to be present and accessible for any composite endpoints we need to calculate and analyze. For example, disease progression in oncology trials is often complex and difficult to calculate using a SAS program, whether it is based on RECIST or some other working group criteria. To complicate matters, the time to disease progression may not be observed, or may not be observable. For example, if a subject completes the full course on study without progressing, the disease progression is not observed. If the subject discontinues from the study and begins a new treatment, the disease progression may be considered unobservable in some studies, while in other studies the subject continues to be followed for progression.
  2. Whether endpoints for analysis are captured in a structured way
    With the exception of adverse events and concomitant medications, any data that is going to be summarized or analyzed should be collected in a structured field. To see why, let's look at the exceptions. For adverse events, the Medical Dictionary for Regulatory Activities (MedDRA) coding dictionary gives us the best of both worlds: investigators can write the adverse event any which way, and the adverse events can still be summarized in a meaningful way. However, it usually requires two subscriptions to the MSSO (one for the clinical research organization performing the coding and one for the sponsor), each at over $10,000 per year, in addition to the labor cost of an MD. Thus, we spend a lot of money and effort to be able to analyze free text. (There are other advantages to MedDRA, as well.) For specialized endpoints, it is better to use planning and creativity to collect the data in a usable form than to cut corners on data collection.
  3. Whether collection of laboratory and other external data is reconcilable and analyzable
    Sometimes lab data is recorded on the case report form, in which case everything is OK as long as the data is structured. Sometimes, however, data is sent directly from the laboratory to the data management or statistics group, in which case it is preferable to reconcile the collection dates and times on the case report form with the dates and times in the database. The best way to do this is to record requisition or accession numbers on the case report form.
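As a rough illustration of the reconciliation described in the third item, the sketch below matches case report form records to lab transfer records by accession number and flags anything that does not line up. It is written in Python for brevity rather than SAS or R. The field names (`accession`, `subject`, `collected`) and the `reconcile` function are hypothetical and not taken from any particular data management system.

```python
from datetime import datetime


def reconcile(crf_rows, lab_rows):
    """Match CRF and lab records on accession number and list discrepancies.

    Each row is a dict with hypothetical keys: 'accession', 'subject',
    and 'collected' (a datetime). Accession numbers are assumed unique
    within each source.
    """
    crf = {row["accession"]: row for row in crf_rows}
    lab = {row["accession"]: row for row in lab_rows}

    issues = {
        # Recorded on the CRF but never received from the lab
        "crf_only": sorted(crf.keys() - lab.keys()),
        # Received from the lab but not recorded on the CRF
        "lab_only": sorted(lab.keys() - crf.keys()),
        # Present in both, but subject or collection date/time disagree
        "mismatch": [],
    }
    for acc in sorted(crf.keys() & lab.keys()):
        crf_key = (crf[acc]["subject"], crf[acc]["collected"])
        lab_key = (lab[acc]["subject"], lab[acc]["collected"])
        if crf_key != lab_key:
            issues["mismatch"].append(acc)
    return issues


# Made-up records for illustration only
crf_rows = [
    {"accession": "A001", "subject": "101", "collected": datetime(2010, 11, 1, 8, 30)},
    {"accession": "A002", "subject": "102", "collected": datetime(2010, 11, 1, 9, 15)},
    {"accession": "A003", "subject": "103", "collected": datetime(2010, 11, 2, 10, 0)},
]
lab_rows = [
    {"accession": "A001", "subject": "101", "collected": datetime(2010, 11, 1, 8, 30)},
    {"accession": "A002", "subject": "102", "collected": datetime(2010, 11, 1, 9, 45)},
    {"accession": "A004", "subject": "104", "collected": datetime(2010, 11, 3, 7, 50)},
]

issues = reconcile(crf_rows, lab_rows)
print(issues)
```

Without the accession number on the case report form, the only available key would be subject plus collection date and time, and a single transcription error in either field would make the match ambiguous.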
This is not an exhaustive list, but it should give a flavor of the kinds of issues occurring in the case report form that can ruin an analysis.
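To make the first item more concrete, here is a minimal Python sketch of deriving time to disease progression with censoring. It assumes the case report form captures four structured dates: first dose, documented progression, last disease assessment, and the start of any new treatment. The function name, its arguments, and the specific censoring rule are illustrative assumptions; the actual rule must come from the protocol and the statistical analysis plan.

```python
from datetime import date
from typing import Optional, Tuple


def time_to_progression(
    start: date,
    progression: Optional[date],
    last_assessment: date,
    new_therapy: Optional[date] = None,
    censor_at_new_therapy: bool = True,
) -> Tuple[int, bool]:
    """Return (days from start, event flag) for one subject.

    event flag True  -> progression was observed
    event flag False -> subject is censored

    censor_at_new_therapy is a hypothetical switch between the two study
    conventions: treat progression after a new treatment as unobservable,
    or keep following the subject for progression.
    """
    # Under the "unobservable" convention, ignore progression that is
    # documented only after the subject started a new treatment.
    if (
        censor_at_new_therapy
        and new_therapy is not None
        and progression is not None
        and progression > new_therapy
    ):
        progression = None

    if progression is not None:
        return (progression - start).days, True

    # No usable progression date: censor at the last assessment, or at the
    # start of new treatment if that came first and the rule applies.
    censor_date = last_assessment
    if censor_at_new_therapy and new_therapy is not None:
        censor_date = min(censor_date, new_therapy)
    return (censor_date - start).days, False


start = date(2010, 1, 1)

# Progression observed on study
observed = time_to_progression(start, date(2010, 4, 1), date(2010, 4, 1))

# Completed the full course without progressing: not observed, so censored
completed = time_to_progression(start, None, date(2010, 7, 1))

# Started a new treatment before progression, under each convention
censored = time_to_progression(
    start, date(2010, 4, 1), date(2010, 4, 1), new_therapy=date(2010, 3, 1)
)
followed = time_to_progression(
    start, date(2010, 4, 1), date(2010, 4, 1),
    new_therapy=date(2010, 3, 1), censor_at_new_therapy=False,
)
print(observed, completed, censored, followed)
```

If any one of these dates is missing from the case report form, or is buried in a free-text comment, a derivation like this cannot be programmed reliably.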