STATISTICAL ANALYSIS OF SAFETY DATA IN LONG-TERM CLINICAL TRIALS

Tailiang Xie, Ping Zhao and Joel Waksman, Wyeth Consumer Healthcare
Five Giralda Farms, Madison, NJ 794

KEY WORDS: Safety Data, Adverse Event, Statistical Analysis, Multivariate Survival Analysis, Clinical Trial.

1. Introduction

In the past 4 years, approximately 15 drugs worldwide have been withdrawn from the market for safety-related reasons since the withdrawal of thalidomide in [...]. A recent article claimed that serious adverse events (AEs) are now between the fourth and sixth leading cause of death in the United States [2]. ICH E9 clearly states that in all clinical trials, the evaluation of the safety and tolerability of a drug constitutes an important element of the overall benefit/risk assessment [3].

Over the past decades, much research has been done and many statistical methods have been applied to efficacy data, but few of them have been applied directly to safety data. One reason might be the complexity of safety data, particularly data from long-term trials. Unlike in efficacy analysis, the Type II error is usually of more concern in safety analysis, especially for serious AEs. Therefore, more powerful statistical methods should be used.

To date, most analyses of safety data are routine and exploratory in nature. Listings of AEs and summaries of the crude rate are among the most commonly used analyses in regulatory submissions and publications. The crude rate is defined as the number of subjects with the event occurring at any time during exposure, divided by the total number of subjects exposed, regardless of duration of exposure; it is usually analyzed via Fisher's exact test. The crude rate method may be efficient and sufficient for acute drug use in single-dose trials because the duration of drug exposure is usually short and equal among subjects [4].
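The crude rate comparison described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' code: the function names and the counts in the usage line are hypothetical, and the two-sided p-value uses the common convention of summing the probabilities of all tables no more likely than the observed one.

```python
from math import comb


def crude_rate(n_with_event, n_exposed):
    """Subjects with at least one AE divided by all exposed subjects."""
    return n_with_event / n_exposed


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact p-value for the 2x2 table [[a, b], [c, d]].

    Rows are treatment groups; columns are (with AE, without AE).
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x subjects with the AE in row 1
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum over all tables that are no more likely than the observed one
    return sum(prob(x) for x in range(lo, hi + 1)
               if prob(x) <= p_obs * (1 + 1e-9))


# Hypothetical counts: 12 of 100 subjects with the AE on the test drug,
# 5 of 100 on the reference drug.
rate_test = crude_rate(12, 100)
rate_ref = crude_rate(5, 100)
p_value = fisher_exact_two_sided(12, 88, 5, 95)
print(rate_test, rate_ref, p_value)
```

Note that the table only records whether a subject ever had the AE, so a subject with one incidence after a week and a subject with ten incidences over a year contribute identically.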
However, for long-term clinical trials with chronic and variable exposures and multiple incidences of AEs, the crude rate method can be invalid and misleading. It has at least three drawbacks in long-term trials. First, because it counts multiple incidences of an AE only once per subject over the entire trial, it ignores both the duration of exposure to the medication and the recurrence of the same AE. Second, it does not take into account important covariates that may affect AEs, such as demographic and baseline characteristics. Finally, it is mostly underpowered for evaluating AE data.

The purpose of this paper is to compare and discuss alternative methods for AE analysis. The methods discussed are the Cochran-Mantel-Haenszel (CMH) test, Poisson regression and two multivariate survival analysis models: the Andersen-Gill multiplicative hazards model (AG model) [5] and the proportional mean model (PM model) [6]. The relative performance of these procedures in terms of power and Type I error rate is evaluated using simulation studies.

2. Aspects of Safety Data in Long-term Trials

Long-term clinical trials have a number of unique aspects important to AE assessment. Compared to single-dose trials, the dropout rate is usually higher in long-term trials, which results in variable duration of exposure. Consider the long-term clinical trial illustrated in Figure 1, where 1, subjects were enrolled in a one-year clinical trial. Most of the subjects remained exposed during the first three months. Approximately half of the subjects had dropped out after six months, and only a quarter of the subjects remained active at the end of the trial.
[Figure 1. Exposure Pattern. Axes: Month of Exposure; Subjects Followed-up.]

A second aspect is that the time pattern of an AE might be different for different drugs. Figure 2 illustrates an example in which the AE rate was approximately the same among three treatment groups. For Drug A, most of the AE incidences occurred at the beginning of the treatment, and the number of AEs gradually declined until Month 6 and eventually disappeared. This may indicate a gradual buildup of tolerance to Drug A. For Drug B, most of the AE incidences were distributed evenly during Months 1-6. For Drug C, there was no AE incidence until Month 3. The AE rate then peaked around Months 6-8. This may indicate a cumulative toxicity of Drug C. Although the crude rates were the same for all three drugs, the time course of AE occurrence was completely different.

[Figure 2. Pattern of AEs for Three Drugs. Panels: Drug A, Drug B, Drug C. Axes: Month of Exposure; Number of Events.]

A third aspect of AE data in long-term trials is that a subject may experience multiple incidences of the same AE. Consider a 1day clinical trial, in which there were a total of 85 incidences of an AE reported by 98 subjects. A useful graphical method to show multiple AE incidences within subjects is depicted in Figure 3, where each horizontal line represents the time course of a subject who was in the trial and the dots on the line represent the onsets of AE incidences for that subject.

[Figure 3. Example Data with Multiple Incidences of AE. Axes: Study Days; Subjects.]

Finally, a fourth aspect of long-term AE data is the dependence among AEs within a subject, in the sense that a subject who had an AE is likely to have the same AE again. For instance, multiple incidences of somnolence can be induced by certain antihistamine drugs. These aspects undoubtedly bring challenges in assessing the safety and tolerability of a drug for long-term use.
Clearly, the conventional crude rate method is subject to bias in the analysis of long-term safety data because it throws out a great deal of information. Methods capable of capturing these aspects should be used. In the next section, the crude rate method, CMH test, Poisson regression, AG model and PM model will be compared.

3. Comparison of Statistical Methods

Among the methods compared, the AG model [5] and PM model [6] are relatively recent advances in multivariate survival analysis, and will be summarized briefly.
Let $N(t)$ be a counting process representing the number of events occurring over the time interval $[0, t]$ ($t \ge 0$). Assuming that $N(t)$ is a nonhomogeneous Poisson process and letting $\Lambda_Z(t)$ be the cumulative intensity function of $N(t)$, conditional on a $p$-dimensional covariate process $Z$, the AG model takes the form

$$\Lambda_Z(t) = e^{\beta^T Z(t)} \Lambda_0(t),$$

where $\Lambda_0$ is an unspecified continuous intensity function and $\beta$ is a $p$-vector of regression parameters. By contrast, the PM model does not impose the Poisson assumption on $N(t)$ and takes the form

$$m_Z(t) = e^{\beta^T Z(t)} m_0(t),$$

where $m_Z(t) = E\{N(t) \mid Z\}$ is the mean function of $N(t)$. If $N(t)$ is a nonhomogeneous Poisson process, then the mean function is the cumulative intensity function. Therefore, the PM model is an extension of the AG model.

By assuming a nonhomogeneous Poisson structure for $N(\cdot)$, the AG model essentially assumes independence among events, and the naïve variance-covariance estimator (i.e., the one based on the Fisher information matrix) is used. The PM model, however, allows dependence among events, and the robust variance-covariance estimator (i.e., the sandwich estimator) is used.

The crude rate method and CMH test are nonparametric, while Poisson regression and the AG model are parametric in that they assume the number of events follows a Poisson distribution. The PM model is a semiparametric method: it does not assume a specific distribution for $N(t)$, but it does assume a proportional structure of the mean functions between treatment groups. Among the compared methods, the PM model is the only one capable of capturing all of the aspects of safety data in long-term trials mentioned above. A comparison of the methods is summarized in Table 1.

Table 1. Comparison of Statistical Methods

            Aspect Consideration
Method      PA      MT      TE      DP      CE      DE
Crude       No      No      No      No      No      No
CMH         No      Yes     No      No      No      No
Poisson     Yes     Yes     No      No      Yes     No
AG          Yes     Yes     Yes     No      Yes     Yes
PM          Yes*    Yes     Yes     Yes     Yes     Yes

*PM model is only semiparametric. PA: Parametric assumption.
MT: Multiplicity. TE: Time to event. DP: Dependence. CE: Covariate effect. DE: Duration of Exposure.

4. Simulation Results

A number of simulation studies were conducted to assess the performance of these methods. We focused on the following aspects of performance: power, Type I error and robustness. The number of events was first simulated via a Poisson distribution for power and Type I error assessment. To assess robustness, the number of events was then simulated via a contaminated Poisson distribution.

All simulations were performed using SAS Version 6.12 (SAS Institute, Cary, NC). For Poisson regression, the SAS GENMOD procedure was used. For both the AG and PM models, the SAS PHREG procedure was used with counting process data input. For the PM model, the SAS IML procedure was also used to compute the robust variance-covariance estimator.

Data with Poisson Distribution

To assess power, AE data sets were generated in which the number of AEs followed Poisson distributions with rates of 1 and 1.5, and the time to AE had marginal exponential distributions with medians of 1 and 1.5 days, for the reference and test treatment groups, respectively. The correlation coefficients ρ among times to AE were ., .4, .6 and .9. There were 1 subjects in each treatment group.

For the Type I error assessment, the AE data sets were generated in the same manner, except that the number of AEs followed a Poisson distribution with a rate of 1, and the time to AE had a marginal exponential distribution with a median of 1 day, for both the reference and test treatment groups.
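The count-generation part of this design can be sketched as a small Monte Carlo loop. This is a minimal illustration only, under several assumptions that are not from the paper: the group size (100), the number of replicates (500), the seed, and the two-sided 5% critical value are chosen for the example, and the simple normal-approximation test on total counts is a stand-in rather than one of the five methods compared. Correlated times to AE are not simulated.

```python
import math
import random


def poisson_draw(rng, rate):
    """Draw one Poisson variate (Knuth's method; fine for small rates)."""
    limit = math.exp(-rate)
    k, prod = 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k


def simulate_counts(rng, n_subjects, rate):
    """Number of AEs for each subject in one treatment group."""
    return [poisson_draw(rng, rate) for _ in range(n_subjects)]


def rejects_equal_rates(ref, test, z_crit=1.959964):
    """Stand-in test: compare total counts, assuming equal exposure."""
    total_ref, total_test = sum(ref), sum(test)
    if total_ref + total_test == 0:
        return False
    z = (total_test - total_ref) / math.sqrt(total_ref + total_test)
    return abs(z) > z_crit


def rejection_rate(rate_ref, rate_test, n_subjects=100, n_reps=500, seed=2024):
    """Proportion of replicates in which equal AE rates are rejected."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_reps):
        ref = simulate_counts(rng, n_subjects, rate_ref)
        test = simulate_counts(rng, n_subjects, rate_test)
        hits += rejects_equal_rates(ref, test)
    return hits / n_reps


power = rejection_rate(1.0, 1.5)   # rates differ: estimates power
type1 = rejection_rate(1.0, 1.0)   # rates equal: estimates Type I error
print(power, type1)
```

The same loop structure applies to any of the methods: only `rejects_equal_rates` would be swapped for the test under study.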
The simulated data were then analyzed using each of the five methods. The simulation was repeated 1, times for each combination of correlation and method, and the proportion of times the null hypothesis of equal treatment AE rates was rejected at the .5 level was recorded. The results are shown in Table 2.

[Table 2. Comparison of the Simulated Power and Type I Error. Columns: ρ, Method, Power, Type I Error. Rows: Crude, CMH, Poisson, AG and PM for each of the four values of ρ. Numerical entries not preserved.]

Unsurprisingly, the crude rate method had the lowest power among all methods compared. All other methods had reasonably high power, especially the AG model, which benefited from having all its model assumptions met and from using almost all the information in the data. The PM model was the second most powerful method. For the methods that do not take into account the dependence of AEs within a subject, the power was stable. For the PM model, however, the power decreased as the within-subject correlation increased, as expected.

As shown in Table 2, all methods had a reasonable Type I error rate, except for the AG model, which had a Type I error rate in a range of [...]. To further investigate why the AG model inflates the Type I error, we performed additional simulations, as shown in Table 3.

[Table 3. Type I error for AG model. Columns: ρ, N=2, N=3, N=ranpoi(). Numerical entries not preserved.]

N: the number of events per subject. N=ranpoi() means that the number of events follows a Poisson distribution.

For the cases with a fixed number of events, the Type I error inflation may be due to the increasing within-subject correlation in the simulated data. For the case with a random number of events, the inflation may be due not only to the increasing within-subject correlation in the simulated data, but also to the violation of the proportionality property caused by the randomness of the number of events.
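The effect of ignoring within-subject dependence can be illustrated with a simplified analogue. The sketch below is not the AG or PM partial-likelihood fit: it assumes a two-group Poisson rate model with equal follow-up for every subject, and compares the model-based (naïve) standard error of the log rate ratio with a subject-level sandwich standard error. The function name and the example counts are hypothetical.

```python
import math


def log_rate_ratio_se(ref_counts, test_counts):
    """Return (log rate ratio, naive SE, robust SE).

    Assumes equal follow-up per subject and at least one event per group.
    """
    naive_var, robust_var = 0.0, 0.0
    means = []
    for counts in (ref_counts, test_counts):
        total = sum(counts)
        mean = total / len(counts)
        means.append(mean)
        # Model-based Poisson variance of log(mean): 1 / total events
        naive_var += 1.0 / total
        # Sandwich variance: squared subject-level residuals / total^2
        robust_var += sum((y - mean) ** 2 for y in counts) / total ** 2
    log_rr = math.log(means[1] / means[0])
    return log_rr, math.sqrt(naive_var), math.sqrt(robust_var)


# Hypothetical clustered data: a few subjects account for all events.
ref = [0, 0, 0, 0, 5, 0, 0, 5, 0, 0]
test = [0, 0, 6, 0, 0, 0, 9, 0, 0, 0]
log_rr, se_naive, se_robust = log_rate_ratio_se(ref, test)
print(log_rr, se_naive, se_robust)
```

When events cluster within subjects, the naïve standard error is too small, so a test built on it rejects too often. This is the same mechanism by which a variance estimator that assumes independent events can inflate the Type I error.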
Non-Poisson Distribution Data

In this simulation study, we assessed the robustness of these methods in the non-Poisson situation, especially for Poisson regression and the AG model. The AE data sets were generated in a similar manner as above, with the time to AE simulated from the same distribution. The number of AEs, however, was contaminated by random noise added to the Poisson distribution in order to create overdispersion in the data. The simulation was run 1, times for each combination of correlation and method. Because the results were consistent across the different correlation coefficients, only the results for a correlation of .6 are displayed.

Table 4. Comparison of Power for Non-Poisson Data (ρ = .6)

Method                Simulated Power
Poisson Regression    .28
AG                    .898
PM                    [...]
As shown in Table 4, the power of both Poisson regression and the AG model decreased. The impact on Poisson regression was severe. For the AG model, although the Poisson condition did not hold, the difference between the median times to event helped to lessen the power loss. Since the PM model is distribution free, its power remained essentially unchanged.

5. Discussion and Conclusion

Since failure to detect elevated AE rates related to a study drug could have severe consequences, the Type II error is usually of more concern than the Type I error in safety analysis, especially for serious AEs. The usual crude rate method is severely underpowered for long-term clinical trials. Thus it should not be used routinely for confirmatory analysis, especially for analyzing serious AEs. In addition, the crude rate method cannot factor in the effects of covariates that may affect AEs.

The CMH test and Poisson regression are relatively powerful. However, neither method can take into account the time course of AEs or the dependence of recurrent AEs within subjects. In addition, Poisson regression can be severely affected by violation of its parametric assumption.

The AG model has the highest power when all of its model assumptions are met. However, it does not take the dependence into account. As Table 3 indicates, it should be used with caution when the number of events per subject varies widely or when the event times are clustered.

The PM model has a reasonable Type I error rate, while its power is only slightly lower than that of the AG model. It is capable of capturing all aspects of long-term safety data. In addition, it is built under a regression framework and is thus able to take into account the effects of covariates such as baseline and demographic characteristics. Unlike the AG model and Poisson regression, it is robust to violation of the underlying distribution assumption. Additionally, because it models the mean function rather than the intensity function used in the AG model, it is intuitive and easily interpreted. Finally, it is fairly easy to implement in major software, such as SAS (PHREG and IML) and S-Plus (COXPH). Thus, the PM model has the best overall performance in terms of power, Type I error rate, robustness and capacity to capture all aspects of safety data in long-term clinical trials.

References

1. Spriet-Pourra C, Auriche M. (1994). Drug withdrawal from sale. 2nd edition. Scrip Reports. Richmond, England: PJB Publications Ltd.
2. Lazarou J, Pomeranz BH, Corey PN. (1998). Incidence of adverse drug reactions in hospitalized patients. JAMA; 279:
3. ICH E9.
4. O'Neill RT. (1987). Statistical analysis of adverse event data from clinical trials. Special emphasis on serious events. Drug Inf. J.; 21:
5. Andersen, P.K. and Gill, R.D. (1982). Cox's Regression Model for counting processes: a large sample study. The Annals of Statistics, 1,
6. Lin, D.Y., Wei, L.J., Yang, I. and Ying, Z. (2). Semiparametric regression for the mean and rate functions of recurrent events. J. R. Statist. Soc. B; 62, Part 4, pp