The role of data mining in pharmacovigilance


Review

Manfred Hauben, David Madigan, Charles M Gerrits, Louisa Walsh & Eugene P Van Puijenbroek
Takeda Global Research and Development, Inc., Department of Pharmacoepidemiology and Outcomes Research, Lincolnshire, Illinois, USA

Expert Opin. Drug Saf. (2005) 4(5). Ashley Publications.

Contents: 1. Introduction; 2. Mining spontaneous reporting system data: theory; 3. Mining spontaneous reporting system data: practice; 4. Clinical versus computational approaches; 5. Conclusion; 6. Expert opinion

A principal concern of pharmacovigilance is the timely detection of adverse drug reactions that are novel by virtue of their clinical nature, severity and/or frequency. The cornerstone of this process is the scientific acumen of the pharmacovigilance domain expert. There is understandably an interest in developing database screening tools to assist human reviewers in identifying associations worthy of further investigation (i.e., signals) embedded within a database consisting largely of background noise containing reports of no substantial public health significance. Data mining algorithms are, therefore, being developed, tested and/or used by health authorities, pharmaceutical companies and academic researchers. After a focused review of postapproval drug safety signal detection, the authors explain how the currently used algorithms work and address key questions related to their validation, comparative performance, deployment in naturalistic pharmacovigilance settings, limitations and potential for misuse. Suggestions for further research and development are offered.

Keywords: data mining, disproportionality, drug safety, pharmacovigilance

1. Introduction

Increasing scientific, regulatory and public scrutiny is focused on the obligation of the medical community, pharmaceutical industry and health authorities to ensure that marketed drugs have acceptable benefit-risk profiles.
This is an intricate and ongoing process that begins with careful preapproval studies, but continues after regulatory market authorisation when the drug is in widespread clinical use. In the latter environment, surveillance schemes based on spontaneous reporting system (SRS) databases are a cornerstone for the early detection of drug hazards that are novel by virtue of their clinical nature, severity and/or frequency. Pharmacovigilance is often used to describe the aforementioned surveillance activities. Early hints that suggest the possibility of novel adverse events (AEs) are often referred to as signals, although considerable ambiguity surrounds the use of this term. The following definition of a signal will be employed: "A set of data constituting a hypothesis that is relevant to the rational and safe use of a medicine. Such data are usually clinical, pharmacological, pathological or epidemiological in nature. A signal consists of a hypothesis together with data and arguments" [1].

Computational signal detection algorithms, hereafter referred to as data mining algorithms (DMAs), may assist pharmacovigilance domain experts to discover potentially relevant drug-event associations (DEAs). Data mining is the process of seeking interesting or valuable information within large data sets [2]. Myriad data mining applications exist in healthcare areas as diverse as medical imaging, gene expression analysis and nursing [3-5]. In addition, data mining is also being explored in databases that exist principally for purposes other than knowledge discovery (e.g., claims databases or [electronic] medical records). By contrast, SRS databases exist precisely to facilitate early discovery of potentially dangerous associations. In recent years, data mining in pharmacovigilance has attracted significant attention and has the potential to discover complex interactions that defy human recognition. The marriage of computer-intensive data mining algorithms with pharmacovigilance domain expertise represents a promising alliance.

The authors' objective is to critically assess DMAs that are being used increasingly by pharmacovigilance specialists to explore postapproval safety databases. First, the authors provide a concentrated overview of the nature of SRS databases, including their strengths, limitations and interpretive nuances for purposes of signal detection in general. The authors then develop an intuitive theoretical framework by which pharmacovigilance specialists may better appreciate the mechanics of DMAs and feel comfortable with interpreting their output. Finally, the authors delve into a number of issues that the budding data miner in pharmacovigilance needs to consider when deploying DMAs in real-life pharmacovigilance settings and make recommendations for use and further research.

As a signal is usually considered to be more than just a statistical association [1], the authors hereafter use the term signal of disproportionate reporting (SDR) when discussing statistical disproportionalities in SRS databases without clinical, pharmacological and/or (pharmaco)epidemiological context [6]. The intention of this terminology is to emphasise that DEAs highlighted with DMAs indicate differential reporting of possible reactions, which is not necessarily indicative of differential occurrence.

1.1 Spontaneous reporting systems and signal detection

Pharmaceutical companies, health authorities and drug monitoring centres use SRS databases for global screening for unforeseen AEs after regulatory authorisation for use in clinical practice. The precise details of each SRS differ in terms of size and scope, statutory reporting mandates, surveillance selectivity or intensity and organisational structure. Prominent SRSs include the Adverse Event Reporting System (AERS) of the US FDA [101], the Yellow Card Scheme of the Medicines and Healthcare Products Regulatory Agency (MHRA) [102], and the international pharmacovigilance programme of the World Health Organization (the WHO Uppsala Monitoring Centre) [103]. These, and other systems, were created to provide early warnings of possible safety problems that would be difficult to detect during clinical drug development because of the power limitations, constricted range of demographics, exclusion of patients with extensive co-morbid illnesses and co-medications, and limited duration of follow-up, characteristic of clinical trials.

The first step in signal detection is the submission of case reports of suspected adverse events to pharmaceutical companies and health authorities by healthcare professionals and patients. Although legally required in some countries, there is de facto voluntary reporting for all but pharmaceutical manufacturers; this introduces differential reporting of AEs. The literature surveying the factors that influence reporting behavior provides potential opportunities for process improvements that are beyond the scope of this article [7-13]. The next step is review of these reports by a professional capable of accurately classifying and coding AE terms and recognising potentially serious events in reports that do not contain the usual regulatory flags for seriousness (i.e., death, immediately life threatening, hospitalisation/prolongation of hospitalisation etc.).
This facilitates signal detection at the case level and reduces data corruption (e.g., inaccurate coding of reported AEs) at the level of individual records that would compromise statistical approaches based on aggregate data. The initial intake assessment is most useful in detecting signals of so-called designated medical events (DMEs). These are AEs considered rare, serious and associated with a high drug-attributable risk, and they constitute an alarm with as few as 1-3 reports [14]. Typical examples include Stevens-Johnson syndrome, toxic epidermal necrolysis, hepatic failure, anaphylaxis, agranulocytosis, aplastic anaemia and torsade de pointes. Other events of special interest, sometimes also called targeted medical events (TMEs), associated with particular drugs and/or patient populations may be monitored in a similar fashion.

From a public health and a regulatory perspective, the majority of reports in SRS databases represent noise, because the reports are associated with treatment indications (i.e., confounding by indication), co-morbid illnesses, protopathic bias, channeling bias and/or other reporting artifacts, or the reported adverse events are already labelled or are medically trivial. Signals from non-DME reports may not become cogent until cumulative reports generate a pattern that stands out from the background noise. Thus, the next step in traditional signal detection is manual review of lists of reported AE frequencies, and/or comparison of reporting rates to background incidence rates deduced from external longitudinal databases. DEAs of intermediate frequency or seriousness might be discounted, based on manual review of AE lists, especially if a pharmacologically plausible explanation is not readily apparent.
More complicated reports, such as those involving drug-drug interactions and drug-induced syndromes, may be especially resistant to detection, because the reviewer would have to cognitively link multiple, separately listed drugs and/or adverse events.

Historically, discerning factors suggestive of a possible adverse drug reaction (ADR) for non-DME reports has drawn on the clinical and pharmacoepidemiological acumen of the prepared mind [15]. However, as the number of reports submitted to postlicensing safety databases continues to grow, statistical systems have been developed as adjunctive tools to assist the prepared mind in identifying possible ADRs. Different statistical approaches in pharmacovigilance include disproportionality analyses, sequential probability ratio tests, correlation analyses and multivariate regression. Most of the published experience, to date, has been with so-called disproportionality analyses, a major focus of this article. Although the precise operational details of each disproportionality algorithm vary, they all calculate surrogate observed-to-expected ratios in which the reporting experience of each reported drug-event combination (DEC) is compared to the background reporting experience across all/most drugs and events using an independence model [16-20]. In the appropriate clinical context, DECs that stand out statistically against the background reporting experience may reflect credible signals warranting additional investigation. If there is sufficient correlation between these statistical metrics and novel causal associations, these tools could improve drug safety monitoring. However, as discussed in detail below, many current disproportionality analysis methods have the potential to perform poorly in real-world databases and, therefore, the authors encourage the development of new methods to address some of the existing limitations. To fully understand these issues, the authors first delve into the theory behind these tools by looking under the hood of the commonly used DMAs.

2. Mining spontaneous reporting system data: theory

SRSs receive reports that consist of one or more drugs, one or more AEs, and possibly some basic demographic information (in addition to text data). Over time, SRS databases emerge that contain thousands or even millions of these reports. Notwithstanding several well-documented data limitations (see Section 1), SRS databases represent a primary data source for evaluating drug safety. Certainly, these databases may play a role in the investigation of specific associations. Here, however, the authors are concerned with harnessing the databases to detect previously undiscovered associations.
The field of data mining focuses on problems of this nature and the last few years have seen some useful interplay between the data mining and drug safety communities. Beyond the data quality issues already alluded to, analysis of SRS databases presents some immediate challenges. The MedDRA adverse event coding system includes > 16,000 distinct preferred terms (PTs). The number of licensed drugs is of the same order of magnitude. Thus, SRS databases resemble spreadsheets with one row per report and roughly 30,000 columns. Table 1 shows a conceptual representation of a typical entry. Multivariate statistical analysis of high-dimensional data of this sort can present significant difficulties. Nonetheless, progress in domains such as gene expression analysis and text categorisation is directly relevant and is discussed in Section 2.3.

2.1 Contingency tables and disproportionality measures

A number of approaches have emerged in recent years that search SRS databases for interesting associations. Most such algorithms (e.g., gamma-Poisson shrinker [GPS], multi-item gamma-Poisson shrinker [MGPS], proportional reporting ratios [PRR], reporting odds ratios [ROR], Bayesian Confidence Propagation Neural Network [BCPNN]) focus on low-dimensional projections of the data, typically two-dimensional contingency tables. Table 2 shows a typical (fictitious) table. The meaning of the above names will become clear below. The number of such tables in a 15,000 drug name x 16,000 AE SRS database is 240 million, so enumeration of all possible tables is tedious, yet still feasible on standard hardware. The basic data-mining task then is to rank the tables in order of interestingness and report some subset of the DECs as worthy of further human investigation. Most authors use some statistical measure of association as their measure of interestingness. Many such measures exist and their statistical properties for hypothesis testing vary.
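
The projection from report-level data to a 2 x 2 table can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the report structure, the drug and AE names and the function name `two_by_two` are hypothetical and do not come from any real SRS schema. The cell labels a, b, c and d follow the convention of Table 2.

```python
# Hypothetical toy reports: each lists one or more drugs and one or more AEs.
reports = [
    {"drugs": {"drug_A"}, "aes": {"nausea"}},
    {"drugs": {"drug_A", "drug_B"}, "aes": {"nausea", "rash"}},
    {"drugs": {"drug_B"}, "aes": {"rash"}},
    {"drugs": {"drug_C"}, "aes": {"headache"}},
    {"drugs": {"drug_A"}, "aes": {"headache"}},
]


def two_by_two(reports, drug, ae):
    """Collapse the report list into the cells (a, b, c, d) for one DEC."""
    a = b = c = d = 0
    for report in reports:
        has_drug = drug in report["drugs"]
        has_ae = ae in report["aes"]
        if has_drug and has_ae:
            a += 1  # drug listed, AE listed
        elif has_drug:
            b += 1  # drug listed, AE not listed
        elif has_ae:
            c += 1  # drug not listed, AE listed
        else:
            d += 1  # neither listed
    return a, b, c, d


table = two_by_two(reports, "drug_A", "nausea")

# Number of candidate tables for the database dimensions quoted in the text.
n_tables = 15_000 * 16_000
```

Every report contributes to exactly one cell of each table, so all information about co-reported drugs and AEs is discarded in the projection.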
GPS focuses on the relative risk (RR). The authors note that this is an unfortunate term in the data mining literature, given that SRS data were never intended for, and cannot be used for, calculating incidences and, consequently, relative risks; alternative terminology, such as relative reporting ratio, is preferred. The RR for the drug i-adverse event j combination (RR_ij) is the observed number of occurrences of the combination (20 in Table 2) divided by the expected number of occurrences. GPS computes the expected value under a model of independence. Specifically, in the example above, AE j occurs overall in 10% of the reports (120 out of 1200). Thus, if drug i and adverse event j are stochastically independent, 10% of the 120 reports containing drug i should include AE j, that is, 12 reports in this case. Thus, the RR for this example is 20/12, or about 1.67; this combination occurred 67% more often than expected. Some analysts use 2 as a threshold of interestingness and, hence, would not report this combination as a candidate for further human investigation.

Natural (though not necessarily unbiased) estimates of various probabilities emerge from tables such as Table 2. For example, one might estimate the conditional probability of AE j given drug i by a/(a + b) (i.e., 20/120 in the example of Table 2), that is, the observed fraction of drug i reports that listed AE j. Table 3 lists the formulae for the various measures of association in common use, along with their probabilistic interpretation. Here, ¬drug, for example, denotes the reports that did not list the target drug. PRR is the proportional reporting ratio, ROR is the reporting odds ratio, and IC is the information component defined by the WHO Uppsala Monitoring Centre [16,18,21]. All four of these measures make sense; in each case, a particular drug that is more likely to cause a particular AE than some other drug will typically receive a higher score.
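
The arithmetic of the Table 2 example can be sketched as follows. The cell counts b = 100, c = 100 and d = 980 are inferred here from the totals quoted in the text (120 drug i reports, 120 AE j reports, 1200 reports overall) rather than read from the table itself. The function name `measures` is hypothetical, and the base-2 logarithm for the information component is an assumption, as Table 3 writes only "Log".

```python
import math


def measures(a, b, c, d):
    """Crude disproportionality measures for one 2 x 2 table."""
    n = a + b + c + d
    expected = (a + b) * (a + c) / n  # expected count under independence
    rr = a / expected                 # relative (reporting) ratio
    prr = (a / (a + b)) / (c / (c + d))
    ror = (a / c) / (b / d)
    ic = math.log2(rr)                # assumption: log taken to base 2
    return {"expected": expected, "RR": rr, "PRR": prr, "ROR": ror, "IC": ic}


# Counts consistent with the totals quoted for Table 2 (inferred, see above).
m = measures(a=20, b=100, c=100, d=980)

# An analyst using 2 as the RR threshold would not flag this combination.
flagged = m["RR"] >= 2
```

With these counts the expected value is 12 and the RR is 20/12, matching the worked example in the text; the PRR and ROR come out slightly larger than the RR but also below 2.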
Similarly, if an AE and a drug are stochastically independent, all measures will return a null value. However, all four methods are subject to sampling variability (i.e., a different set of AE reports from the same population will not give exactly the same value of the measure of association). This may particularly be the case with large, sparse databases. Due to the law of large numbers, this statistical variability diminishes as the sample size increases. In the SRS context, however, the count in the a cell is often small, leading to substantial variability (and hence uncertainty about the true value of the measure of association) despite the often large numbers of reports overall.

Table 1. A conceptual representation of a typical entry in an SRS database.

  Age        Sex    Drug 1   Drug 2   ...   Drug 15,000   AE 1   AE 2   ...   AE 16,000
  [missing]  Male   No       Yes      ...   No            Yes    No     ...   Yes

  AE: Adverse event; SRS: Spontaneous reporting system.

Table 2. A fictitious 2-dimensional projection of an SRS database.

                  AE j = yes   AE j = no   Total
  Drug i = yes    a = 20       b = 100     120
  Drug i = no     c = 100      d = 980     1080
  Total           120          1080        1200

  AE: Adverse event; SRS: Spontaneous reporting system.

Table 3. Common measures of association for 2 x 2 tables in SRS analyses.

  Measure of association         Formula                                    Probabilistic interpretation
  Relative risk*                 a(a + b + c + d) / [(a + c)(a + b)]        Pr(AE | drug) / Pr(AE)
  Proportional reporting ratio   [a/(a + b)] / [c/(c + d)]                  Pr(AE | drug) / Pr(AE | ¬drug)
  Reporting odds ratio           (a/c) / (b/d)                              [Pr(AE | drug)/Pr(¬AE | drug)] / [Pr(AE | ¬drug)/Pr(¬AE | ¬drug)]
  Information component†         log{a(a + b + c + d) / [(a + c)(a + b)]}   log[Pr(AE | drug) / Pr(AE)]

  * The preferred terminology would be relative reporting ratio (see text).
  † Information component is used in the frequentist sense in this table (as a crude measure), but is formulated as a Bayesian metric in the BCPNN.
  BCPNN: Bayesian confidence propagation neural network; SRS: Spontaneous reporting system.

Consider, for example, Tables 4 and 5, showing 2 x 2 tables that differ merely by a single extra report. Table 6 shows the various measures for Tables 4 and 5. Notwithstanding the somewhat large sample size of almost 1200, the addition of one extra report doubles, or almost doubles, all the measures. Table 7 shows a table involving an uncommon drug and a rare AE where a single report results in an RR of > 1000! The essential problem in each case is that the standard error of the measure of association is large. Large standard errors mean that the measure of association is unreliable; even though the observed measure of association might be large, it could be that the true measure is actually small (or vice versa).

Table 4. 2 x 2 table of counts for drug D1 and adverse event AE.

              AE = yes   AE = no
  D1 = yes    a = 1      b = 100
  D1 = no     c = 5      d = 1080

  AE: Adverse event.

Table 5. 2 x 2 table of counts for drug D2 and adverse event AE.

              AE = yes   AE = no
  D2 = yes    a = 2      b = 100
  D2 = no     c = 5      d = 1080

  AE: Adverse event.

Table 6. Measures of association for Tables 4 and 5. Rows: proportional reporting ratio, reporting odds ratio, information component, relative risk; columns: drug D1, drug D2. [Numerical values missing in the source.]

Table 7. 2 x 2 table of counts for drug D3 and adverse event AE.

              AE = yes   AE = no
  D3 = yes    a = 1      b = 7
  D3 = no     c = 3      d = [missing]

  AE: Adverse event.

The pharmacovigilance literature describes different approaches to deal with this difficulty. The most straightforward approach is to estimate a standard error for the measure of association. Roughly speaking, one expects the observed measure of association to be within two or three standard errors of the corresponding true measure of association. Therefore, for example, a PRR of 4 with a standard error of 3 is not as interesting as a PRR of 4 with a standard error of 0.2. Three difficulties arise with this general approach. First, formulae for standard errors justify themselves via asymptotic arguments that may not apply in tables with small counts. Second, the interpretation of the standard error relies on the notion of a true population measure of association and a repeatable random sampling mechanism; these may not make any sense in an SRS context. Third, standard errors shrink with increasing sample size, so this approach tends to overemphasise common DECs.

A second related approach conducts statistical hypothesis tests in the 2 x 2 contingency table. Common tests include chi-square tests with or without small cell count adjustments and Fisher's exact test. These run afoul of the same difficulties that are associated with the standard error-based approach.

A third approach involves Bayesian shrinkage (thus the word shrinker in the names of certain algorithms). Both (M)GPS and BCPNN use this approach. GPS, for example, places a prior distribution on RRs that encapsulates a prior belief that most RRs are close to a value of one. Only in the face of substantial evidence from the data does (M)GPS return an RR estimate that is substantially larger than one.
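
The sensitivity to a single report seen in Tables 4 and 5, and the standard-error approach described above, can be sketched as follows. The function names are hypothetical, the base-2 logarithm for the information component is an assumption, and the interval uses the usual large-sample standard error of the log risk ratio, which is exactly the kind of asymptotic formula the text cautions may not apply with counts this small.

```python
import math


def measures(a, b, c, d):
    """Crude measures of association from the formulae of Table 3."""
    n = a + b + c + d
    rr = a * n / ((a + c) * (a + b))
    return {
        "RR": rr,
        "PRR": (a / (a + b)) / (c / (c + d)),
        "ROR": (a / c) / (b / d),
        "IC": math.log2(rr),  # assumption: log taken to base 2
    }


def prr_interval(a, b, c, d, z=1.96):
    """Approximate 95% interval for the PRR (large-sample formula)."""
    prr = (a / (a + b)) / (c / (c + d))
    se_log = math.sqrt(1 / a - 1 / (a + b) + 1 / c - 1 / (c + d))
    return prr * math.exp(-z * se_log), prr * math.exp(z * se_log)


d1 = measures(1, 100, 5, 1080)  # Table 4
d2 = measures(2, 100, 5, 1080)  # Table 5

# Factor by which one extra report inflates each measure.
ratios = {name: d2[name] / d1[name] for name in d1}

interval_d1 = prr_interval(1, 100, 5, 1080)
interval_d2 = prr_interval(2, 100, 5, 1080)
```

Under these formulae the single extra report multiplies each measure by a factor between roughly 1.7 and 2, and both PRR intervals are wide enough to include the null value of 1.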
For example, a crude RR of 1000 that derives from an observed count of a = 1 might result in an (M)GPS RR estimate (Empirical Bayesian Geometric Mean or EBGM) of 1.5 (i.e., the crude RR is shrunk towards a value of 1), whereas an RR of 1000 that derives from an observed count of a = 100 might result in an EBGM RR estimate of close to [value missing in the source]. For the specific Bayesian setup that (M)GPS uses, observed counts in excess of 10 result in RR estimates that typically receive essentially no shrinkage, although in practice larger differentials have been observed depending on the thresholds used [20,22,23].

All Bayesian statistical analyses begin with a prior distribution for all the unknowns. In the case of (M)GPS, the true RRs are the unknowns. A standard Bayesian analysis specifies the prior distribution before looking at the data. Via Bayes' theorem, the data then transform the prior distribution into a posterior distribution. This posterior distribution, in a precise sense, combines prior knowledge (that the prior distribution encapsulates) with the evidence from the data. As a matter of convenience, prior distributions usually come from some parametric family, such as the normal distribution or the gamma distribution. Encapsulating prior knowledge then amounts to choosing the parameters of these distributions (e.g., the mean and the variance in the case of a normal distribution). One particular Bayesian approach actually uses the data to choose the parameters of the prior. This is the so-called empirical Bayes approach, and although it appears to double-dip in the data, it does enjoy some theoretical support. GPS and MGPS adopt the empirical Bayes approach. Madigan showed that a standard Bayesian analysis (i.e., not the empirical approach) yields similar estimates to (M)GPS and proved somewhat more satisfactory in one small example [24]. Regardless of how the Bayesian analysis handles the prior, the approach produces a posterior distribution for each RR.
EBGM is the geometric mean of the posterior distribution. Other summaries are possible. For example, DuMouchel mentions EB05 [25]. This is the fifth percentile of the posterior distribution, meaning that there is a 95% probability that the true RR exceeds the EB05. As EB05 is always smaller than EBGM, this, in a sense, adds extra shrinkage and represents a more restrictive choice than EBGM. A number of studies have provided examples where EB05 might be too conservative in the sense that it could result in delayed detection of relevant signals that other disproportionality methods detected much earlier [22,23,26].

The authors note that Bayesian approaches do not receive immunity from concerns about non-random sampling. Furthermore, whereas BCPNN, GPS and its later variant MGPS provide elegant solutions to an important practical problem, other Bayesian and non-Bayesian shrinkage approaches are possible and will, in general, lead to different RR estimates. For instance, one could craft a Bayesian model that leads to the following shrinkage scheme: if the observed count (i.e., the cell a) is equal to 1, divide the RR by 10; if the observed count is between 2 and 4, divide the RR by 5; if the observed count is 5 or more, return the observed RR. The operational characteristics of various possible shrinkage schemes are essentially unknown.
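
The idea of shrinkage can be sketched with a deliberately simplified model. This is not the (M)GPS algorithm: it assumes a single gamma prior with fixed, hypothetical parameters (alpha = beta = 1, giving a prior mean of 1) instead of a gamma mixture prior whose parameters are estimated from the data, and it reports the posterior mean rather than the geometric mean. The second function implements the illustrative stepwise scheme from the text, reading the final rule as applying to counts of 5 or more, which is an assumption.

```python
def shrunk_rr_gamma(observed, expected, alpha=1.0, beta=1.0):
    """Posterior mean of the RR under a simplified gamma-Poisson model.

    Assumes observed ~ Poisson(RR * expected) and RR ~ Gamma(alpha, rate=beta),
    so the posterior is Gamma(alpha + observed, rate=beta + expected).
    The prior parameters are hypothetical and are not estimated from data.
    """
    return (alpha + observed) / (beta + expected)


def shrunk_rr_stepwise(observed, crude_rr):
    """The stepwise shrinkage scheme described in the text."""
    if observed < 1:
        raise ValueError("scheme is defined for observed counts of 1 or more")
    if observed == 1:
        return crude_rr / 10
    if observed <= 4:
        return crude_rr / 5
    return crude_rr  # assumption: applies to counts of 5 or more


# Two combinations with the same crude RR of 1000 but different evidence.
rare = shrunk_rr_gamma(observed=1, expected=0.001)
common = shrunk_rr_gamma(observed=100, expected=0.1)

stepwise = [shrunk_rr_stepwise(count, 1000.0) for count in (1, 3, 100)]
```

In this sketch the combination supported by a single report is pulled almost all the way back towards the prior mean, whereas the combination supported by 100 reports retains a large estimate. The two schemes give very different answers for the same inputs, which illustrates the point that different shrinkage approaches lead to different RR estimates.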

In summary, all of the approaches mentioned thus far consider low-dimensional contingency tables that aggregate over very high-dimensional data. The authors contend that the specific measure of association used and the specific statistical method to deal with sampling variability are of secondary importance in the face of some of the issues described below.

2.2 The problem with contingency tables

Thus far, the authors have focused on measurement of associations between individual drugs and AEs. Disproportionality analysis can also be used to examine higher-order associations such as between two drugs and one AE, two drugs and two AEs and so on [27-29]. Contemporary DMAs can also incorporate a limited set of demographic variables via stratification, or when logistic regression analysis is applied, RORs can be calculated that are corrected for these demographic variables. In any event, the general approach measures associations in low-dimensional projections of high-dimensional data. Some well-known difficulties with this general approach exist and provide an important focus for the epidemiology literature in general.

The authors illustrate one basic difficulty with a simple example. Consider a fictitious drug, Rosinex, that causes nausea. Suppose that 90% of the individuals taking Rosinex experience nausea, whereas 10% of the individuals not taking Rosinex experience nausea. Further, suppose that Rosinex makes one susceptible to eye infections. Consequently, due to standard practice guidelines, 90% of the Rosinex users also take a prophylactic antibiotic called Ganclex, whereas 1% of the non-Rosinex users take Ganclex. Ganclex does not cause nausea. Figure 1 shows a causal model that describes the situation. Table 8 shows data that are consistent with this description. Considering only Ganclex and nausea, the observed count is 82 as compared with an expected value of 18, leading to an RR of > 4.
The EBGM score would be similar. Therefore, even though Ganclex has no causal relationship with nausea, the data mining approach based on 2 x 2 tables would generate a Ganclex-nausea SDR. Fram et al. refer to Ganclex as an innocent bystander [19]. The statistical literature sometimes refers to a related phenomenon as Simpson's paradox. This is a simple example of a more general phenomenon. In general, particular patterns of association between observed and unobserved variables can lead to essentially arbitrary measures of association involving the observed variables. These measures can contradict the true unknown underlying causal model that generated the data.

2.3 Multiple regression

A multiple regression modelling approach can deal with some of these concerns. In the Rosinex example above, a logistic regression model could be considered:

  log[Pr(Nausea)/Pr(Not Nausea)] = β0 + β1 x Rosinex + β2 x Ganclex

Here, Rosinex and Ganclex are binary predictor variables, taking values zero (did not take the drug) and one (did take the drug). Maximum likelihood provides one way to estimate the three regression coefficients and, in this case, yields β1 = 4.4 and β2 = 0. β1 represents the expected change in the log-odds of nausea as Rosinex goes from 0 to 1, holding Ganclex constant. Similarly, β2 represents the expected change in the log-odds of nausea as Ganclex goes from 0 to 1, holding Rosinex constant. In the latter case, this adjusting for Rosinex is key; indeed, within the Rosinex-taking group, Ganclex is not associated with nausea and within the non-Rosinex-taking group, Ganclex is not associated with nausea. Thus, logistic regression provides a satisfactory answer here, showing a positive and statistically significant (p < 10^-8) effect of Rosinex on the reporting structure and no effect at all due to Ganclex. As mentioned above, SRS databases, such as FDA-AERS, can include > 15,000 different drug names (including many redundant drug names).
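
The Rosinex and Ganclex example can be worked through numerically. The cell counts below are hypothetical: they were chosen for this sketch to match the probabilities stated in the text and the quoted observed and expected counts of 82 and 18, and they should not be read as the published Table 8. The logistic regression is fitted with a plain Newton-Raphson iteration written out by hand, so no external library is assumed.

```python
import math

# Hypothetical counts: (rosinex, ganclex, nausea, no_nausea).
cells = [
    (1, 1, 81, 9),
    (1, 0, 9, 1),
    (0, 1, 1, 9),
    (0, 0, 90, 810),
]


def crude_ganclex_rr(cells):
    """Observed, expected and RR for Ganclex-nausea, ignoring Rosinex."""
    total = sum(yes + no for _, _, yes, no in cells)
    nausea = sum(yes for _, _, yes, _ in cells)
    ganclex = sum(yes + no for _, g, yes, no in cells if g == 1)
    observed = sum(yes for _, g, yes, _ in cells if g == 1)
    expected = ganclex * nausea / total
    return observed, expected, observed / expected


def solve(matrix, rhs):
    """Solve a small linear system by Gauss-Jordan elimination."""
    m = [row[:] + [value] for row, value in zip(matrix, rhs)]
    size = len(m)
    for col in range(size):
        pivot = max(range(col, size), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(size):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][size] / m[i][i] for i in range(size)]


def fit_logistic(cells, iterations=25):
    """Maximum likelihood fit of log-odds = b0 + b1*Rosinex + b2*Ganclex."""
    beta = [0.0, 0.0, 0.0]
    for _ in range(iterations):
        grad = [0.0] * 3
        info = [[0.0] * 3 for _ in range(3)]
        for rosinex, ganclex, yes, no in cells:
            x = (1.0, rosinex, ganclex)
            n = yes + no
            eta = sum(b * xi for b, xi in zip(beta, x))
            p = 1.0 / (1.0 + math.exp(-eta))
            for i in range(3):
                grad[i] += (yes - n * p) * x[i]
                for j in range(3):
                    info[i][j] += n * p * (1 - p) * x[i] * x[j]
        step = solve(info, grad)
        beta = [b + s for b, s in zip(beta, step)]
    return beta


observed, expected, rr = crude_ganclex_rr(cells)
b0, b1, b2 = fit_logistic(cells)
```

With these counts the crude table gives 82 observed against about 18 expected, while the fitted Ganclex coefficient is essentially zero and the Rosinex coefficient is log(81), about 4.4, in line with the values quoted in the text.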
A regression of a particular AE on all of the drugs in such a database therefore involves simultaneous estimation of some 15,000 regression coefficients. This represented an insurmountable barrier until relatively recently. However, approaches based on Bayesian shrinkage have proven successful in applications to gene expression data and text categorisation, where several published applications estimated in excess of 100,000 parameters. In their own work, the authors have applied the Bayesian Binary Regression software to AERS and VAERS analyses (the software is available at [104]).

The regression approach to data mining in SRS databases does not, however, provide a totally satisfactory solution. Key limitations include the following: (i) the regression approach just described builds a separate regression model for each AE, presenting a significant multiple comparison challenge and ignoring dependencies between AEs; (ii) regression models adjust merely for measured and recorded factors, such as drugs and demographic covariates, but fail to take account of unmeasured or unrecorded factors, such as health status or unreported drugs (this issue is discussed below); (iii) a model-based approach requires modelling assumptions; the model above assumes linearity, which may or may not be appropriate. Nonetheless, for routine data mining in SRS databases, the authors contend that regression approaches have distinct advantages over algorithms that analyse low-dimensional contingency tables, although all are credible additions to the pharmacovigilance toolkit.

2.4 Confounding, causality and propensity scoring

The discovery of a drug-induced disorder is a dynamic and often lengthy process that begins with signal detection. Although SRS data can never be used to definitively establish cause-and-effect relationships, they are often used for adjudicating associations with sufficient certainty to make decisions in naturalistic pharmacovigilance settings. In any case, making causal inferences is a fundamental downstream goal of SRS data mining. Ultimately, one wants to know if taking a drug (or a combination of drugs) increases the chance of experiencing some AE (or set of AEs).

Table 8. 2 x 2 x 2 contingency table from a spontaneous reporting system database that is consistent with these probabilities and with the causal model. Rows: Rosinex and Ganclex; Rosinex, no Ganclex; no Rosinex, Ganclex; no Rosinex, no Ganclex. Columns: nausea, no nausea, total. [Counts missing in the source.]

Figure 1. Graphical causal model. Rosinex causes nausea and also causes individuals to take Ganclex. Taking Ganclex has no effect on the probability of experiencing nausea.

Confounding is a causal concept closely related to Simpson's paradox and its generalisations. When computing the Ganclex-nausea RR, the authors implicitly compared the individuals who took Ganclex with those who did not, and tried to use this comparison to make inference about the causal effect of Ganclex on nausea. However, the true individual-level causal effect would compare the nausea status of individuals who took Ganclex with those same individuals had they not taken Ganclex. Population-level causal effects then average over the individual causal effects. This potential outcomes view of causality dates back at least to Neyman [30] (for a recent review see [31]). Because the authors do not have access to individuals who both took and did not take Ganclex (and even if they did, they would have done so at different times and in different circumstances), the individuals who did not take Ganclex are used as surrogates for the Ganclex users had they not taken Ganclex. This is the root of the phenomenon described above: the individuals that did not take Ganclex differ from those who did in ways that matter. In particular, Ganclex users are much more likely to have taken Rosinex than non-Ganclex users, and, as has been seen, Rosinex causes nausea.
This phenomenon is called confounding. Rosinex confounds the relationship between Ganclex and nausea. In the case of Rosinex, a regression model provides one way to deal with the confounding issue. In the case of an unmeasured/unrecorded confounder, such as health status, however, the situation is much less satisfactory. In general, absent assumptions about potential unmeasured confounders, no association can reliably yield a causal interpretation. To quote Cartwright: "no causes in, no causes out" [32]. However, the authors again emphasise that SRS data by themselves often cannot provide more than an index of suspicion of novel safety phenomena. As a rule, a further process needs to be initiated using additional data sets that are more reliable from the perspective of causal inference.

The randomised clinical trial (RCT) represents the exception to the above rule and plays a central role in the drug licensing process. The key feature of an RCT is that an objective randomised mechanism controls the probability that a subject receives any particular treatment. For example, one could imagine an RCT where a coin flip decides whether a subject receives Ganclex or a placebo. Because of the coin flip, one would expect, especially with large sample sizes, the fraction of Rosinex users to be similar in the Ganclex and placebo groups. This contrasts sharply with the above-mentioned example, where the Ganclex group contained a preponderance of Rosinex users. In fact, the beauty of the coin flip mechanism is that all potential confounders are expected to be roughly equally balanced between the Ganclex and placebo groups, whether they were measured/recorded or not. SRS databases are different from, and complementary to, randomised clinical trials and are exposed to confounding from both observed and unobserved factors.
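The Rosinex and Ganclex example can be made concrete with a short calculation. The sketch below is illustrative only: the probabilities are hypothetical values chosen for this example (they are not the values behind Table 8), and the function names are invented here. It encodes the causal structure of Figure 1, in which nausea depends on Rosinex alone, and compares the crude Ganclex-nausea RR with the RR within each Rosinex stratum.

```python
# Hypothetical probabilities, chosen only for illustration (not taken from Table 8).
P_ROSINEX = 0.10                        # P(takes Rosinex)
P_GANCLEX = {True: 0.90, False: 0.01}   # P(takes Ganclex | Rosinex status)
P_NAUSEA = {True: 0.90, False: 0.10}    # P(nausea | Rosinex status); Ganclex plays no role


def joint(rosinex: bool, ganclex: bool, nausea: bool) -> float:
    """Joint probability implied by the causal structure of Figure 1."""
    p_r = P_ROSINEX if rosinex else 1 - P_ROSINEX
    p_g = P_GANCLEX[rosinex] if ganclex else 1 - P_GANCLEX[rosinex]
    p_n = P_NAUSEA[rosinex] if nausea else 1 - P_NAUSEA[rosinex]
    return p_r * p_g * p_n


def nausea_risk(ganclex: bool, rosinex_values=(True, False)) -> float:
    """P(nausea | Ganclex status), optionally restricted to one Rosinex stratum."""
    with_nausea = sum(joint(r, ganclex, True) for r in rosinex_values)
    total = sum(joint(r, ganclex, n) for r in rosinex_values for n in (True, False))
    return with_nausea / total


# Crude comparison: Ganclex users versus non-users, ignoring Rosinex.
crude_rr = nausea_risk(True) / nausea_risk(False)

# Stratified comparison: the same ratio within each Rosinex stratum.
rr_rosinex = nausea_risk(True, (True,)) / nausea_risk(False, (True,))
rr_no_rosinex = nausea_risk(True, (False,)) / nausea_risk(False, (False,))

print(f"crude RR = {crude_rr:.2f}")
print(f"RR among Rosinex users = {rr_rosinex:.2f}")
print(f"RR among non-users of Rosinex = {rr_no_rosinex:.2f}")
```

Under these assumed probabilities the crude RR is well above 1 even though Ganclex has no effect on nausea, whereas the RR within each Rosinex stratum equals 1. Stratifying in this way is possible only because Rosinex use is recorded; it offers no protection against an unrecorded confounder.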
Propensity scoring is one framework for formulating causal hypotheses in the face of confounding and examining their consequences. Although propensity scoring has attracted considerable attention in the medical and social sciences [33], the authors are unaware of any prior applications in the SRS context and believe this represents a potentially important future research direction. Here, the authors sketch the basic ideas in the context of drug safety. The propensity-scoring framework comprises two components: one dealing with observed confounders and the other dealing with unobserved confounders. For the first component, the authors begin by assuming there are no unobserved confounders. Suppose, for example, one wants to estimate the potential causal effect of Rosinex on nausea. Because there are no unmeasured confounders, the Rosinex and non-Rosinex groups differ systematically only in ways that have been measured. The idea is to construct a

model that predicts whether someone will take Rosinex or not, using all the recorded variables as predictors. The propensity score for an individual is the predicted probability that that individual takes Rosinex. Now, if two individuals have identical propensity scores, one a Rosinex taker and the other a non-taker, they can be compared as if they had been assigned according to an RCT. Many other details are being ignored here, concerning predictive model construction, optimal matching and multiple testing, but the essential idea is to estimate a causal effect under the (unrealistic) assumption of no unobserved confounders. The second component of the propensity-scoring framework assesses the sensitivity of the causal effect that the first component estimated. The idea is to assess how strong an effect any unmeasured confounder would need to have in order to explain the causal effect. Cornfield et al.'s classic work on the smoking-lung cancer connection exemplifies this type of analysis [34]. Specifically, they write: "If an agent A with no causal effect upon the risk of a disease, nevertheless because of a positive correlation with some other causal agent B shows an apparent risk, r, for those exposed to A relative to those not so exposed, then the prevalence of B among those exposed to A relative to the prevalence of those not so exposed, must be greater than r" [34]. Thus, for some mystery hormone X to explain the nine-fold increase in lung cancer amongst smokers compared to non-smokers, the proportion of hormone X producers among smokers must be at least nine times greater than that among non-smokers. More recent work extends this line of thinking to binary outcomes and also, crucially, deals with sampling variability. The authors believe that the propensity scoring approach presents exciting possibilities for SRS analysis.
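The first component of the framework can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' method: the counts are hypothetical, the single recorded covariate ("elderly" versus "young") is invented for the example, and the propensity model is simply the observed proportion of Rosinex takers within each covariate cell. Reports are then grouped by propensity score and the Rosinex effect on nausea is estimated within each group.

```python
from collections import defaultdict

# Hypothetical counts: (covariate cell, took Rosinex, had nausea, number of individuals).
COUNTS = [
    ("elderly", True, True, 40), ("elderly", True, False, 40),
    ("elderly", False, True, 6), ("elderly", False, False, 14),
    ("young", True, True, 6), ("young", True, False, 14),
    ("young", False, True, 8), ("young", False, False, 72),
]


def risk(rows, treated):
    """Proportion with nausea among rows with the given treatment status."""
    cases = sum(n for _, t, y, n in rows if t == treated and y)
    total = sum(n for _, t, _, n in rows if t == treated)
    return cases / total


def propensity_scores(rows):
    """Saturated propensity model: P(took Rosinex | covariate cell)."""
    treated, total = defaultdict(int), defaultdict(int)
    for cell, t, _, n in rows:
        total[cell] += n
        treated[cell] += n if t else 0
    return {cell: treated[cell] / total[cell] for cell in total}


def stratified_risk_difference(rows):
    """Average the within-stratum risk differences, weighted by stratum size.

    Assumes every propensity stratum contains both takers and non-takers.
    """
    scores = propensity_scores(rows)
    strata = defaultdict(list)
    for row in rows:
        strata[round(scores[row[0]], 6)].append(row)
    grand_total = sum(n for *_, n in rows)
    estimate = 0.0
    for members in strata.values():
        weight = sum(n for *_, n in members) / grand_total
        estimate += weight * (risk(members, True) - risk(members, False))
    return estimate


crude = risk(COUNTS, True) - risk(COUNTS, False)
adjusted = stratified_risk_difference(COUNTS)
print(f"crude risk difference = {crude:.2f}")
print(f"propensity-stratified risk difference = {adjusted:.2f}")
```

In this toy data set the crude comparison overstates the effect because elderly individuals are both more likely to take Rosinex and more likely to report nausea; comparing individuals with equal propensity scores removes that imbalance. The estimate is only as good as the assumption that no unrecorded confounder remains.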
Causal effects under the no-unobserved-confounder assumption would represent the primary signalling mechanism, with a sensitivity analysis providing second-order uncertainty information. However, the usefulness of this approach is contingent upon the completeness and consistency of reporting of all the relevant variables. As discussed, this is often an issue.

2.5 Experience with contingency tables

Although the fundamental limitations involved in projecting high-dimensional data onto 2 x 2 contingency tables should be an inspiration for research into new approaches, the authors note that the cumulative experience with disproportionality analysis shows it to be a promising adjunct for safety reviewers confronted with data sets that are difficult to monitor by virtue of their size. This experience has been gained across a variety of organisational settings (health authorities, drug monitoring centres, academia and pharmaceutical companies; see Table 9) and scientific settings (e.g., general and specialised pharmacovigilance settings). Due to space limitations, the authors' goal is not to exhaustively list or critically analyse each organisation's experience, but merely to present an informational synopsis of the major published findings in the form of a table. The authors therefore emphasise that they do not necessarily agree with the scientific claims made in the cited literature, nor do they believe that all assertions made are always supported by the data. The authors encourage readers to obtain the index articles for critical study using the principles and concepts discussed in this article (Table 9).

3. Mining spontaneous reporting system data: practice

3.1 Fundamental considerations

Understanding the theoretical basis of commonly used DMAs facilitates thoughtful consideration of crucial operational questions, such as: should data mining be added to routine pharmacovigilance operations? Is there a preferred DMA?
What is the optimum positioning of DMAs within a comprehensive pharmacovigilance system that utilises multiple approaches and data sets for signal detection? A sensible starting point is to clearly delineate the improvements that one hopes to obtain with DMAs. It is important to be clear about this because it is not unusual to hear vague statements that DMAs are "useful". If DMAs have value, it is because they achieve one or more of the following pharmacovigilance process improvements:

1) Detection of AEs that would otherwise have gone undetected (this is especially pertinent to higher-order associations because they are difficult for the human mind to capture; examples are drug-drug interactions, drug-food associations, multiple risk factors for developing an adverse drug reaction, etc.);
2) Earlier identification of AEs that would have been detected with standard approaches;
3) Detection of the same AEs at the same time, but with greater scientific efficiency (i.e., decreased person-time expended per AE detected);
4) Provision of a safety net against human cognitive lapses.

The following questions provide a framework for further assessing the potential need for, or added value of, DMAs for individual organisations:

What is the size and scope of existing surveillance activities (numbers of drugs, events, reports) relative to human resources?
How rigorous and comprehensive is the current suite of signal detection tools and strategies?
Are there significant numbers of medically important DECs that do not meet traditional signalling thresholds for cumulative review despite continuing accumulation of reports?
Is there a significant history of delayed recognition of AEs? For example, has a pharmaceutical company been repeatedly alerted to medically important associations involving its drugs by health authorities rather than by its own internal pharmacovigilance systems (or vice versa)?

The authors emphasise "repeatedly" because all parties are working towards the same goal, and it is not unusual for this to happen on occasion.

Table 9. Disproportionality analysis: major findings from the published literature.
Columns: Organisation (i.e., primary author's affiliation); Algorithms; Key findings/conclusions.

Drug monitoring centres: WHO Uppsala Drug Monitoring Centre (Sweden); Netherlands Pharmacovigilance Centre Lareb (The Netherlands). Algorithms: BCPNN, ROR, PRR.
Regulatory authorities: Medicines and Healthcare products Regulatory Agency (MHRA, UK); Food and Drug Administration (FDA, US). Algorithms: PRR, MGPS, PROFILE.

Key findings/conclusions:

Retrospective analysis demonstrated early SDRs for known drug-event associations (e.g., cough-captopril) and avoidance of false positives (e.g., lack of an SDR with digoxin-rash). Positive predictive value (44%) and negative predictive value (85%) were calculated [35].
Retrospective analysis of reports of hepatic injury reported with SSRIs showed an SDR for hepatic injury and nefazodone, and no SDR for other SSRIs [36].
Retrospective analysis showed that SDRs could have been observed for the reported DEC SSRI-neonatal withdrawal syndrome, especially for paroxetine [37].
In the WHO database, an association between clozapine and cardiomyopathy and myocarditis has been established. Related antipsychotics showed a similar relationship with these ADRs [38].
Relevant DECs were highlighted by BCPNN (practolol-peritonitis, captopril-cough, terfenadine-heart rate rhythm disorders, clozapine-myocarditis). BCPNN also demonstrated the ability to highlight possible drug-specific and group effects [39].
Anaphylactic reactions associated with the use of naproxen, ibuprofen and diclofenac were reported disproportionately compared with other drugs [40].
The inter-relationship between the reported ADRs urticaria, fever and arthralgia, and terbinafine, was examined by logistic regression modelling and pointed towards the existence of an immunological syndrome [29].
Logistic regression analysis confirmed the influence of concomitant use of diuretics and NSAIDs on reports of decreased efficacy of the diuretics involved [27].
A cross-sectional analysis of the data set of the Netherlands Pharmacovigilance Centre showed that different frequentist and Bayesian measures of disproportionality are broadly comparable when four or more cases per combination have been collected; empirical Bayesian measures were not included [41].
Retrospective analysis of 15 newly marketed drugs in the UK showed that 70% of signals referred to known ADRs, 13% were related to the underlying disease and 17% required further follow-up [18].
Preliminary comparison using commonly cited thresholds showed that standard PRR generated more SDRs, and also that PRR highlighted SDRs earlier than stratified MGPS. Performance gradients were dependent upon threshold selection [26].
Retrospective analysis showed an early SDR for the labelled AE bronchospasm with rapacuronium bromide prior to withdrawal for this reason [42].
Retrospective analysis showed an early SDR for liver enzyme abnormalities with pemoline prior to withdrawal in the UK for this reason [42].
Retrospective analysis showed an early SDR for the labelled AE rhabdomyolysis with cerivastatin prior to withdrawal for this reason [21].
ROC curves were shown for four EB05 cut-off scores for labelled events originally identified in large clinical trials [21].
MGPS signals reportedly prompted further analyses that led to a black box warning for life-threatening pancreatitis with valproic acid and valproate [43].
Four data mining metrics (PRR, screened PRR (sPRR), EBGM and EB05) were applied to the Vaccine Adverse Event Reporting System to examine agreement between metrics and other performance characteristics. Unscreened PRR was not studied in full because of an overabundance of signals with singleton associations. The most highly ranked vaccine-event pairs varied between the metrics. Few known associations were in the top 100 scores of any of the methods. The number of vaccine-event pairs in the top 100 rankings of any two methods ranged from [...]. sPRR was generally comparable with EBGM. Each method has strengths and limitations [44].

ADR: Adverse drug reaction; BCPNN: Bayesian confidence propagation neural network; DEC: Drug-event combination; DMA: Data mining algorithm; EB: Empirical Bayes; EBGM: Empirical Bayesian geometric mean; IC: Information component; MGPS: Multi-item gamma-Poisson shrinker; NSAID: Nonsteroidal anti-inflammatory drug; PRR: Proportional reporting ratio; ROC: Receiver operating characteristic; ROR: Reporting odds ratio; SDR: Signal of disproportionate reporting; SPRT: Sequential probability ratio test; SSRI: Selective serotonin re-uptake inhibitor.

Table 9. Disproportionality analysis: major findings from the published literature (continued).

Regulatory authorities (continued): Therapeutic Goods Administration (TGA, Australia).
Pharmaceutical industry: various pharmaceutical companies. Algorithms: PRR, MGPS, BCPNN.

Key findings/conclusions:

An iterative probability-filtering algorithm ("PROFILE") including Fisher's exact test was applied to the Australian voluntary reporting system database. PROFILE is based on 2 x 2 contingency tables, but the output is the number of reports surviving the filter, rather than disproportionality scores. Causality guidelines in Australia's voluntary reporting scheme frequently result in multiple suspect drugs per report, raising the issue of "innocent bystander" drugs. The crude data comprised almost 2000 drugs and 8000 associations. PROFILE analysis identified 17% of the associations as noise due to co-suspected drugs. If a signal was defined as three or more reports surviving the probability filter, then over 80% of the reports could be attributed to 25% of the drugs (i.e., 75% of drugs are "noise" or "evolving signals"). PROFILE analyses of seven specific reaction terms are described. Retrospective analysis demonstrated potential to identify quality control problems with medicines [45].
Contemporary Bayesian methods highlight similar DECs, especially for frequently reported events. Three DECs were compared in detail: fluoxetine and headache, akathisia and polyneuritis. For one drug, AEs that were well recognised after six years of use would have been highlighted by empirical Bayesian approaches within the first postapproval year [46].
In a series of retrospective data mining exercises involving diverse drugs, events and pharmacovigilance scenarios, both standard PRR and stratified MGPS showed potential value in highlighting relevant DECs in a timely manner, although sensitivity gradients were observed. Standard PRR was found to be more sensitive than stratified MGPS in detecting adverse events of demonstrated interest in pharmacovigilance when commonly cited thresholds were used. This greater sensitivity was demonstrated both for the number of relevant DECs highlighted and for the time-to-signal for DECs highlighted by both methods. In some instances, the time lag between the first disproportional PRR and the first disproportional EB05 was substantial [22,23,47-53].
For a set of potentially serious/life-threatening DECs that were the subject of black-box warnings, a retrospective analysis did not demonstrate an obvious benefit of DMAs over traditional signalling approaches. Some of these DECs were associated with SDRs after detection by traditional approaches [54].
Using commonly cited thresholds, stratified MGPS failed to detect an SDR for antipsychotic-associated pancreatitis that had been detected by traditional clinical approaches [55].
Using commonly cited thresholds, stratified MGPS failed to highlight relevant DECs involving thalidomide reported during the first 18 months of marketing that were selected for further review by traditional approaches [56].
The ability of MGPS to identify drug-drug interactions between verapamil and other cardiovascular drugs resulting in impaired cardiac conduction was studied. Empirical Bayesian data mining has potential value in investigating the clinical relevance of potential drug-drug interactions [57].
Disproportionality analysis was used to investigate the associations between different asthma polypharmacy regimens and the spontaneous reporting of Churg-Strauss syndrome. Different degrees of association were observed. Disproportionality analysis was able to reveal the differential contribution of each class of drugs to the reports of Churg-Strauss syndrome [57,58].
Preliminary data mining results from spontaneous reports of movement disorders associated with diverse pharmacological/therapeutic drug classes suggest that data mining may identify drugs for further molecular structure-toxicity modelling [49].

Table 9. Disproportionality analysis: major findings from the published literature (continued).

Academia/national research institutes: INSERM (France); University of Tokyo (Japan); University of Utrecht (The Netherlands); University of Ballarat (Australia). Algorithms: PRR, MGPS, BCPNN, ROR.

Key findings/conclusions:

Using commonly cited thresholds, Monte Carlo simulations were performed comparing PRR, ROR, SPRT, IC and EB05. All methods showed low sensitivity. IC was most sensitive, followed by PRR and ROR, EB05 and SPRT [59,60].
In what is primarily an exercise in SRS modelling, two Bayesian signal detection methods were compared: the IC and the EB method. SRS modelling was used to simulate realistic data on 60 drugs and 40 effects using qualitative knowledge of pharmacovigilance experts and literature, as expressed by a fuzzy representation of knowledge and a fuzzy inference system. Simulation parameters were assigned based on characteristics of the French pharmacovigilance database. EB was superior to IC for low report counts/rare adverse events. With larger numbers of reports, both methods were comparable [105].
Five data mining methodologies were applied to Japanese spontaneous reports. The DECs highlighted as possible signals varied between methodologies, particularly for DECs with report counts of 1 or 2. Using commonly cited thresholds, unstratified PRR was more sensitive and less specific than stratified GPS as measured against BCPNN criteria. Metrics based on the lower bound of the confidence interval of PRR and ROR showed high sensitivity but lower specificity. GPS demonstrated the lowest negative predictive values and highest positive predictive values. Stratified versus unstratified analyses (by reporting year) accounted for very small but statistically significant differences in average scores for the relevant metrics (EB[...] vs. 1.29, EBGM 4.70 vs. 4.49) and small differences in the fraction of combinations highlighted as possible signals (6.9 vs. 7.1%) [61].
ROR was studied using a case-non-case analysis with reports from the WHO-UMC database. RORs were calculated using all 284,426 case reports of suspected AEs of drugs with known anti-hERG activity. Cases were defined as reports of cardiac arrest, sudden death, torsade de pointes, ventricular fibrillation and ventricular tachycardia (n = 5591). RORs correlated with various indices related to anti-hERG activity [62].
Eight methods/metrics (including both frequentist and Bayesian methods and PROFILE) were applied to one reaction term ("hepatitis cholestatic") in the Australian adverse reaction database. Sensitivity, specificity, predictive values and corrected Kappa statistics were calculated to compare statistical methods with each other and with the Australian Adverse Drug Reactions Advisory Committee (ADRAC) bulletin [63].

In short, if an organisation receives extremely large numbers of reports with numerous drugs and there are significant subsets of the database that do not meet current signalling criteria, then it would be prudent to strengthen existing signalling criteria and/or use supplementary tools, such as DMAs. Organisations planning to use DMAs are presented with a daunting space of available choices (Table 10). This presents a challenge to researchers developing and testing these tools, as well as to students of the published data mining literature. To discuss each possible choice or configuration would be beyond the scope of this article, so the authors focus on some basic aspects. The question of which, if any, DMA outperforms the others has been vigorously debated, yet may be overly broad. The authors have already touched on issues of comparative performance from a purely statistical perspective. To recap, given the rate-limiting data distortions and corruption in SRS data, the limitations of projecting higher dimensional data into 2 x 2 contingency tables, the arbitrary and adjustable nature of commonly cited thresholds, as well as the ability to create ad hoc shrinkage rules with any form of disproportionality analysis, performance differentials between the commonly used DMAs in naturalistic pharmacovigilance settings are likely to be of questionable significance. Although empirical Bayesian metrics were not included in the analysis, Van Puijenbroek and co-workers demonstrated concordance between Bayesian and non-Bayesian methodologies when there are ≥ 4 cases for a particular DEC [44]. Although theoretical statistical considerations and published performance evaluations of DMAs (which usually study the DMA in isolation) provide informative guidance for

adjudicating utility, it is important to remember that DMAs are used to assist the "prepared mind" [15]. The use of DMAs in naturalistic pharmacovigilance settings involves cognitive processes and interactions that defy explicit characterisation. Consequently, the authors must turn to extra-statistical considerations to complete their assessment of the practical significance, as opposed to the statistical significance, of reported performance gradients between DMAs. These parallel clinical considerations lead to similar conclusions. In general, the comparative performance of these methods used as binary classifiers with commonly cited thresholds can be summarised as follows [17,18]:

In general, frequentist forms of DMAs (e.g., PRRs, RORs) seem to highlight a greater number and variety of DEAs than Bayesian DMAs (e.g., BCPNN, (M)GPS) [22,23,26];
For DEAs that are highlighted by both frequentist and Bayesian methodologies, frequentist DMAs tend to do so earlier [22,23,26];
Performance gradients in sensitivity and specificity may show a progression from frequentist to Bayesian to empirical Bayesian approaches;
Some of the additional DEAs obtained with frequentist DMAs are due to confounding, reporting artifact or statistical noise (especially at low reporting frequency), and require additional triage criteria for practical implementation;
Some of the additional DEAs represent coding variants encountered with hypergranular dictionaries (e.g., MedDRA).

Characteristics of different forms of disproportionality analysis are shown in Table 11.

Table 10. Modifiable parameters and configurations for data mining investigations.

Algorithm: Proportional reporting ratios; Reporting odds ratios; Bayesian confidence propagation neural network; Multi-item gamma-Poisson shrinker; Sequential probability ratio tests
Type of database: Public; proprietary (company)
Report source: Spontaneous versus spontaneous plus clinical trials; inclusion/exclusion by report source (e.g., consumer reports)
Size of database: ~35,000 to 3,000,000 reports; global versus subset of database
Dictionary: MedDRA; COSTART; WHO-ART; local; company developed
Dictionary hierarchy used: LLT, PT, HLT, HLGT, SOC
Case definitions: Special search categories (e.g., Standardised MedDRA Queries); ad hoc case definitions
Drugs: (suspect) versus (suspect versus concomitant); certain subsets of drugs removed (to eliminate masking); grouping by pharmacological/therapeutic class
Study design: Real data versus database simulation; prospective versus retrospective
Methodology: Stratified versus unstratified (currently age, gender, time period); reports versus events; binary classifier versus ranking classifier; cross-sectional analysis; time trend analysis; deployment in series versus in parallel with other signal detection activities
Performance measures: Sensitivity, specificity, positive predictive value, negative predictive value, ROC curves
Threshold selection/threshold metrics: Disproportionality threshold; discrete thresholds: point estimates versus lower CI boundaries; confidence intervals; case count threshold

CI: Confidence interval; COSTART: Coding symbols for a thesaurus of adverse reaction terms; HLGT: Higher level group term; HLT: Higher level term; LLT: Lower level term; MedDRA: Medical dictionary for regulatory activities; PT: Preferred term; ROC: Receiver operating characteristic; SOC: System organ class; WHO-ART: The WHO adverse reaction terminology. With courtesy of Drug Safety [64].
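The performance measures used to compare DMAs as binary classifiers follow directly from a 2 x 2 cross-classification of highlighted versus reference associations. The sketch below is illustrative only: the function name, the DEC labels and the two "methods" are hypothetical, and in practice the choice of reference set is itself a matter of judgement.

```python
def classifier_performance(highlighted, reference_positive, all_decs):
    """Sensitivity, specificity, PPV and NPV of a DMA used as a binary classifier.

    highlighted: DECs flagged by the method (hypothetical labels here)
    reference_positive: DECs regarded as true associations by the reference standard
    all_decs: every DEC that was screened
    """
    highlighted, reference_positive, all_decs = (
        set(highlighted), set(reference_positive), set(all_decs)
    )
    tp = len(highlighted & reference_positive)   # flagged and true
    fp = len(highlighted - reference_positive)   # flagged but not true
    fn = len(reference_positive - highlighted)   # true but missed
    tn = len(all_decs - highlighted - reference_positive)

    def ratio(num, den):
        return num / den if den else None        # undefined when the denominator is zero

    return {
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "ppv": ratio(tp, tp + fp),
        "npv": ratio(tn, tn + fn),
    }


# Hypothetical screening of ten DECs, four of which are reference associations.
ALL_DECS = [f"dec{i}" for i in range(1, 11)]
REFERENCE = ["dec1", "dec2", "dec3", "dec4"]

method_a = classifier_performance(["dec1", "dec2", "dec3", "dec5", "dec6"], REFERENCE, ALL_DECS)
method_b = classifier_performance(["dec1", "dec2"], REFERENCE, ALL_DECS)
print("method A:", method_a)
print("method B:", method_b)
```

In this made-up example, method A flags more associations and is more sensitive, while method B is more specific; which trade-off is preferable depends on the organisation's workload and resources.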
Practical reality dictates that the search for truth in pharmacovigilance requires judicious limitations on the number of associations that we investigate. Excessive time and effort expended on associations of no significance will be adverse to public safety as focus is diverted from more significant

hazards, and should therefore be avoided. In response to an overabundance of SDRs generated by DMAs, some health authorities (MHRA) and drug monitoring centres (WHO) apply additional triage procedures based on public health principles to filter the associations initially highlighted by data mining. The objective of the triage is to limit the number of potential signals to a more manageable level by pinpointing associations that seem most significant from a public health perspective. For example, the MHRA has utilised a triage logic known as SNIP (strength of the association, new event, clinically important, and preventable) and an impact analysis to help select a subset of disproportionately represented associations warranting further review [18,65]. The WHO triage algorithm selects disproportionately represented associations that are also characterised by such factors as a rapid reporting increase, serious AEs with new drugs, reactions of special interest and positive rechallenge, to filter initial data mining output [18,65]. Triage criteria are not governed by specific regulations and may be decided by each institution. Although both frequentist and Bayesian methodologies are associated with false-positive findings, as mentioned above, Bayesian DMAs, in general, highlight fewer SDRs than frequentist methodologies. This is not surprising because, as discussed earlier, Bayesian methodologies use a mathematical process that factors in the overall reporting experience across drugs and events, in part to achieve a statistical shrinkage of SDRs associated with low reporting frequency. The extent to which true causal DEAs are shrunk along with noise is unknown, but such shrinkage is likely to occur, as the mathematical model homogenises individual reports and contains no explicit clinical criteria. However, by integrating clinical judgment and prior knowledge of drugs, events, patient population and diseases, safety assessors can often rapidly filter out associations reflecting the influence of patient-specific factors, treatment indication, and/or co-medications and co-morbid illnesses. This "clinical shrinkage" may allow the safety reviewer using frequentist forms of DMAs to review a wider net of SDRs, rapidly discarding uninteresting DEAs while filtering out DEAs of interest for further evaluation. The overabundance of SDRs highlighted with frequentist DMAs may not be prohibitive, and the opportunity cost due to lower specificity may be acceptable, and even desirable, if there is a resultant gain in sensitivity, although this is presumably highly situation-dependent.

Table 11. Characteristics of commonly used data mining algorithms based on disproportionality analysis.

Simple/frequentist
Algorithms: Proportional reporting ratios; Reporting odds ratios
Some users: Health authorities outside the US; Pharmaceutical companies; Drug Safety Research Unit
Advantages: More sensitive*; Clear, easy to use and to understand; Identifies virtually all DEAs identified by Bayesian methods; Natural metric for logistic regression analyses
Disadvantages: Lower specificity leading to overabundance of SDRs that may require additional triage criteria for practical implementation

Bayesian
Algorithms: Bayesian Confidence Propagation Neural Network; (Multi-item) gamma-Poisson shrinker
Some users: US FDA; WHO Drug Monitoring Centre; Pharmaceutical companies
Advantages: More specific*; Numerous data mining settings and configurations maximise exploratory capacity; Configured to perform higher order analysis (e.g., drug-drug interactions, complex medical syndromes)
Disadvantages: Lower sensitivity; Numerous data mining settings and configurations raise issues of confirmation bias and multiple comparisons

* When commonly cited thresholds are used as binary classifiers. DEA: Drug-event association; SDR: Signal of disproportionate reporting.
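To make the frequentist measures concrete, the following sketch computes the PRR, the ROR and a chi-square statistic from a single 2 x 2 table and applies a PRR-based screening rule (PRR ≥ 2, χ² ≥ 4 and at least three reports). It is a minimal illustration, not a validated implementation: the counts are hypothetical, the function names are invented here, and the use of the Pearson chi-square without continuity correction is an assumption.

```python
import math


def disproportionality(a, b, c, d):
    """PRR, ROR and chi-square for one drug-event 2 x 2 table.

    a: reports with the drug and the event     b: the drug, other events
    c: other drugs with the event              d: other drugs, other events
    """
    n = a + b + c + d
    prr = (a / (a + b)) / (c / (c + d))
    ror = (a * d) / (b * c)
    # Pearson chi-square, 1 degree of freedom, no continuity correction (an assumption).
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # Lower 95% confidence bound of the PRR on the log scale.
    se_log_prr = math.sqrt(1 / a - 1 / (a + b) + 1 / c - 1 / (c + d))
    prr_lower = math.exp(math.log(prr) - 1.96 * se_log_prr)
    return {"n_cases": a, "prr": prr, "prr_lower95": prr_lower, "ror": ror, "chi2": chi2}


def is_sdr(stats, min_prr=2.0, min_chi2=4.0, min_cases=3):
    """Binary classification with adjustable thresholds (defaults are one common choice)."""
    return (stats["prr"] >= min_prr
            and stats["chi2"] >= min_chi2
            and stats["n_cases"] >= min_cases)


# Hypothetical counts for two drug-event combinations.
common = disproportionality(a=12, b=388, c=150, d=19450)
singleton = disproportionality(a=1, b=9, c=150, d=19840)

print(f"PRR = {common['prr']:.2f}, ROR = {common['ror']:.2f}, chi2 = {common['chi2']:.1f}")
print("highlighted:", is_sdr(common))
print(f"singleton PRR = {singleton['prr']:.1f}, highlighted: {is_sdr(singleton)}")
```

The second combination has a large PRR based on a single report, which is why a case-count threshold (or Bayesian shrinkage) is applied before an association is highlighted. Changing any of the three thresholds changes which associations are flagged, which is the sense in which such rules are adjustable rather than validated.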
There is currently no theoretical basis or firm empirical support establishing universal thresholds defining a potential signal, although some have been recommended (e.g., PRR: PRR ≥ 2 and χ² ≥ 4, N > 2 [18]; MGPS: EB05 ≥ 2, N > 0 [21]; BCPNN: IC - 2SD > 0 [35] [with or without a time-trend analysis]). As thresholds utilised for highlighting SDRs are unvalidated, ad hoc and adjustable, the deployment of DMAs is inherently subjective in nature and can be optimised by judicious selection of data mining parameters, such as statistical thresholds and background for comparison (see Section 3.2 for discussion of backgrounds). Consequently, mathematically more complex and sophisticated Bayesian DMAs should not automatically be assumed to result in superior outcomes in all situations. Finally, the quality and usefulness of the results are strongly influenced by the knowledge and experience of the data miner. Given all the aforementioned considerations, no single method seems superior. Advocacy of a given DMA could partly relate to intellectual or commercial conflicts of interest, but can also reflect practical challenges imposed by the unique structure and function of a given pharmacovigilance organisation. All such organisations have the same overall

objective, although each presents a unique combination of workload and resource capacity. The choice of a DMA could reflect this balance (or imbalance) between workload and resource capacity, as well as comfort with varying levels of process automation. Pharmacovigilance systems are somewhat analogous to queuing systems, in which arrivals (the 'signals') place demands on finite (human) resource capacity. Queuing theory quantifies the common-sense notion that loading an overloaded system can result in disproportionately increased waiting time, resulting in decompensation of the system's efficiency. An organisation massively overloaded with data relative to resource capacity may put a higher premium on specificity over sensitivity in data mining. Organisations with more balanced data and resource capacity might appropriately put a higher premium on sensitivity when choosing a DMA, metric and/or threshold and the manner in which it is applied. In the former case, reducing the number of signals ('unloading the system') would translate into choosing more specific and less sensitive algorithms, metrics, thresholds and configurations (e.g., using the DMA in series with the prepared mind). In the latter situation, however, more sensitive and less specific algorithms, metrics, thresholds and configurations (i.e., parallel deployment in which DMAs assist in detecting additional signals) may be an appropriate choice. In the former situation, an organisation might find it most expedient to use the computerised algorithm to provide all the initial filters, whereas in the latter situation the parallel use of additional filters based on human assessments may be chosen. In short, because pharmacovigilance organisations are not structurally or functionally homogeneous, the formula for achieving maximum scientific efficiency may differ across organisations.
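The queuing analogy above can be made concrete with a minimal sketch. It assumes the simplest textbook model, a single-server M/M/1 queue (random arrivals, one reviewer, exponentially distributed review times), which is an illustrative assumption and not a model proposed in the source. The arrival and service rates are hypothetical.

```python
# Mean time a signal spends in an M/M/1 queue (waiting plus review).
# Assumption: single reviewer, Poisson arrivals, exponential review times.
# Rates are hypothetical and expressed in signals per week.

def mean_time_in_system(arrival_rate, service_rate):
    """Return the mean time in system, 1 / (service_rate - arrival_rate)."""
    if arrival_rate >= service_rate:
        raise ValueError("unstable queue: arrivals meet or exceed capacity")
    return 1.0 / (service_rate - arrival_rate)


SERVICE_RATE = 10.0  # hypothetical review capacity, signals per week

for arrival_rate in (5.0, 8.0, 9.0, 9.5):
    weeks = mean_time_in_system(arrival_rate, SERVICE_RATE)
    print(f"arrivals={arrival_rate}/week -> mean time in system={weeks:.2f} weeks")
```

Under these assumptions, raising the arrival rate from 9 to 9.5 signals per week doubles the mean time in system, whereas the much larger increase from 5 to 8 adds comparatively little. This is the disproportionate effect of loading a nearly saturated system.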
However, DMAs may also be useful in the common scenario in which a signal is initially detected without using these tools. Clinical epidemiological principles of screening can help illuminate this process. Screening tests are most informative when the pretest probability of the disease under surveillance is neither very high nor very low. The corresponding uninformative situations in pharmacovigilance arise when a detailed case review prompted by a potential signal makes a compelling clinical argument that the association is likely to be causal (high pretest probability), or shows that it is obviously due to confounding factors, reporting artifacts, etc. (low pretest probability). In such cases, statistical calculations on SRS data are unlikely to substantially illuminate the phenomena under investigation. However, in naturalistic pharmacovigilance settings, the clinical data and arguments are often highly ambiguous from the perspective of causality assessment. In these situations, safety reviewers seek multiple convergent lines of data and evidence to formulate an assessment, and statistical calculations on SRS data can be one piece of the puzzle. David Finney pioneered numerical approaches to SRS data and said: 'the essence is to collect facts that individually tell little, but collectively form a clue to drug dangers' [66]. Higher-order phenomena, such as complex drug-drug interactions and drug-induced syndromes, require making cognitive links between multiple drugs and/or events, and so may be inherently less amenable to detection by manual review of lists and, hence, as stated above, represent an attractive opportunity for DMAs. In a drug-drug interaction, a pharmacokinetic interaction, a pharmacodynamic interaction, or both may be involved, each of which may lead to either an increase or decrease in the effect of one or both of the interacting drugs.
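The screening principle invoked earlier in this passage can be illustrated with the standard odds form of Bayes' theorem. The sketch below is only an analogy: the likelihood ratio of 5 is a hypothetical value, and nothing here implies that a disproportionality statistic has a known likelihood ratio for causality.

```python
# Post-test probability from pretest probability and a likelihood ratio,
# using the odds form of Bayes' theorem. The likelihood ratio is hypothetical.

def post_test_probability(pretest_probability, likelihood_ratio):
    """Convert to odds, multiply by the likelihood ratio, convert back."""
    pretest_odds = pretest_probability / (1.0 - pretest_probability)
    post_odds = pretest_odds * likelihood_ratio
    return post_odds / (1.0 + post_odds)


def absolute_shift(pretest_probability, likelihood_ratio):
    """How far a positive result moves the probability."""
    return post_test_probability(pretest_probability, likelihood_ratio) - pretest_probability


for pretest in (0.01, 0.50, 0.99):
    shift = absolute_shift(pretest, likelihood_ratio=5.0)
    print(f"pretest={pretest:.2f} -> shift={shift:.3f}")
```

With these illustrative numbers, the same result moves an intermediate pretest probability far more than a very low or very high one, which mirrors the argument that statistical calculations add most when the clinical picture is ambiguous.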
Although the mechanism is important for understanding the nature of the interaction, it is of less significance for the detection of ADRs in a spontaneous reporting system. The net effect of the possible interaction, however, is essential, because this will direct the focus of the safety reviewer to SDRs that are most likely in need of further investigation. Examples include the possible interaction between oral contraceptives and itraconazole, in which a delayed withdrawal bleeding is taken to be indicative of a possible interaction [28], as well as the influence of concomitant use of diuretics and NSAIDs on symptoms indicating a decreased efficacy of diuretics [27]. In addition, SRS data sets can be used in the detection and analysis of possible drug-related syndromes. A syndrome can be seen as a complex of signs and symptoms that, together, constitute the picture of a disease. Only a small part of the ADR clusters present in the database actually represent distinct clinical syndromes. Rather than being part of a certain clinical syndrome, an apparent clustering of symptoms may occur due to the fact that the symptoms themselves are related. This is the case with nausea and vomiting or abdominal pain and diarrhoea, symptoms which, although closely interrelated, do not represent a particular clinical entity. The extent to which the symptoms urticaria, fever and arthralgia were interrelated in an SRS data set was examined by logistic regression modelling. Case series as well as the results of the statistical analyses showed a clustering of symptoms among reports of patients using terbinafine. These findings might point towards a clustering of these symptoms in patients using terbinafine [29]. The regression approach described above provides one way to deal with complex multi-drug effects, although disproportionality analysis, despite being particularly prone to unreliable results in this setting, may also be used.
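As a rough sketch of how a possible drug-drug interaction might be screened for in report counts, the example below compares reporting odds across four exposure strata. For two binary exposures and no other covariates, the final ratio corresponds to the exponentiated interaction term of a logistic model containing both drugs and their product term. The counts are hypothetical, the stratum names are illustrative, and the cited studies may have adjusted for further covariates that this sketch ignores.

```python
# Reporting odds ratios by exposure stratum and a multiplicative
# interaction ratio. All counts are hypothetical; each tuple is
# (reports with the event, reports without the event).

def odds(cases, noncases):
    """Reporting odds of the event within one stratum."""
    return cases / noncases


def interaction_summary(strata):
    """Return ORs versus the unexposed stratum and the interaction ratio."""
    reference = odds(*strata["neither"])
    or_a = odds(*strata["a_only"]) / reference
    or_b = odds(*strata["b_only"]) / reference
    or_both = odds(*strata["both"]) / reference
    # Values well above 1 suggest more reporting with concomitant use
    # than the two separate drug effects would predict multiplicatively.
    return or_a, or_b, or_both, or_both / (or_a * or_b)


hypothetical_strata = {
    "neither": (100, 10000),
    "a_only": (20, 1000),
    "b_only": (15, 1000),
    "both": (12, 100),
}

or_a, or_b, or_both, ratio = interaction_summary(hypothetical_strata)
print(f"OR A only={or_a:.2f}, OR B only={or_b:.2f}, "
      f"OR both={or_both:.2f}, interaction ratio={ratio:.2f}")
```

A ratio of this kind inherits all the limitations of SRS data discussed in the text and is at most a prompt for clinical review.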
The use of DMAs may provide additional ancillary gains. As an illustration, for several reasons, AEs may be reported more than once (e.g., directly by the physician and indirectly by the company). Observations of particular interest (e.g., regarding unexpected serious AEs) may be more prone to duplicate reporting. Often, similarities (e.g., age and sex, comedication, administration dates) can yield a high level of suspicion of duplication. Automated detection of suspect duplicate reporting is often routinely incorporated as part of

cleaning the data prior to data mining [67]. This can lead to improved data quality.

3.2 Mining SRS data: pitfalls

Although results, to date, are promising, it is important to be mindful of numerous limitations and pitfalls in the use of DMAs and interpretation of the published data mining literature. The authors briefly discuss some prominent examples of biases that may creep into the picture. Advocates of both the frequentist and the Bayesian approaches note that each of the DMAs has been able to identify known signals. The initial peer-reviewed literature reported predominantly positive results, raising the question of publication bias. Subsequently, unpublished and published examples of known safety issues not identified by DMAs have appeared [50,56,59]. The failure of a causal association to generate an SDR can be related to the overall reported safety profile of the drug ('foreground') as well as the composition of the background in the database used for comparison [68]. Consider the reported association between cisplatin and bradycardia, which includes multiple positive rechallenges [69]. Despite the presence of up to 41 reports in FDA-AERS through the third quarter of 2003, no SDR was generated either by the more sensitive PRRs or the more specific MGPS. The authors can conjecture about several reasonable explanations for this observation (aside from the obvious one, the lack of causality). Cisplatin has a very complex safety profile that is dominated by traditional patterns of cytotoxicity. Also, there are many drugs that may cause bradycardia, possibly leading to a high background rate of this event in the database. Therefore, bradycardia may not be associated with an SDR because it is masked by both the foreground and background. Sometimes the background is the predominant factor. One can create a more sensitive background by excluding heavy contributors of certain AEs of interest.
For example, when all the other drugs in the FDA-AERS database are used as background to evaluate the DEA between drug X (e.g., a drug for which a 'Dear Healthcare Professional' letter was issued due to hyperglycaemia) and hyperglycaemia NOS, the signalling score is far below the threshold. However, when insulin formulations and their reported events are deleted from the background, the signalling score for this DEC rises far above commonly used thresholds. This situation can occur with any DMA. The phenomenon in which the SDR of a relatively weaker DEA is diluted by stronger DEAs is referred to in the literature as cloaking, or masking, and in-house company safety databases might be especially prone to this because of the relative lack of diversity of events or drugs [18,47,56,70]. The space of available choices (as displayed in Table 10) in data mining maximises the exploratory capacity, but also makes these exercises highly susceptible to confirmation bias. Given the freedom to choose so many user-adjustable configurations, use of DMAs is prone to self-deception bias, in which a data miner with a strong incentive to believe in a particular outcome, consciously or unconsciously, tries to avoid results that contradict pre-existing expectations by retrofitting an analysis around the data, using sequential trials of nonspecific case definitions of uncertain clinical relevance, different subsets of the database, thresholds and/or other configuration parameters until the desired output is achieved. Retrospective and post hoc exercises might be especially susceptible to this form of confirmation bias [71].
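The masking effect described above can be sketched with a PRR computed from a two-by-two table of report counts. All counts are hypothetical. The flagging rule is taken here to be PRR of at least 2, chi-squared of at least 4 and more than two reports, and the chi-squared statistic is computed without a continuity correction, which is an assumption of this sketch.

```python
# PRR and masking: the same drug-event pair evaluated against two
# backgrounds. All counts are hypothetical.
#   a = drug X reports with the event,     b = drug X reports without it
#   c = background reports with the event, d = background reports without it

def prr(a, b, c, d):
    """Proportional reporting ratio."""
    return (a / (a + b)) / (c / (c + d))


def chi_squared(a, b, c, d):
    """Pearson chi-squared for a 2x2 table, no continuity correction."""
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))


def is_sdr(a, b, c, d, min_prr=2.0, min_chi2=4.0, min_reports=3):
    """Apply the illustrative threshold rule described in the lead-in."""
    return (a >= min_reports
            and prr(a, b, c, d) >= min_prr
            and chi_squared(a, b, c, d) >= min_chi2)


drug_x = (30, 970)                 # event, no event
background_all = (4000, 196000)    # every other drug in the database
masking_drug = (3000, 7000)        # one heavy contributor of the event
background_reduced = (background_all[0] - masking_drug[0],
                      background_all[1] - masking_drug[1])

for label, background in (("full", background_all),
                          ("reduced", background_reduced)):
    table = drug_x + background
    print(label, round(prr(*table), 2), round(chi_squared(*table), 1),
          is_sdr(*table))
```

With these illustrative counts the pair is not flagged against the full background but is flagged once the heavy contributor is removed. The same freedom to redefine the background is what makes such analyses vulnerable to the confirmation bias discussed in the text.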
The finding that, in general, frequentist methodologies identify DEAs earlier than Bayesian methodologies [22,23,26] highlights an important lesson when studying the published literature on data mining validation: the significant limitations of cross-sectional analyses and the importance of repeatedly analysing the developmental anatomy of the database over time, so that the nature and number of the SDRs produced by each DMA, as well as the relative timing of SDRs generated by more than one method, are examined. Finally, the authors would be remiss if they did not mention the considerable impact of dictionary architecture on data mining. The numerous coding redundancies and multiplicities of hypergranular dictionaries such as the Medical Dictionary for Regulatory Activities (MedDRA) can profoundly affect the results of data mining computations. The most obvious example is what has been referred to as signal fragmentation [72]. In addition, such effects could also be introduced by switching to another dictionary, or by changes in mapping structure within a given dictionary (e.g., version upgrading) [50]. Therefore, it is important to see if data mining performance can be optimised by mining particular levels of the dictionary hierarchy or by using Boolean logic to combine clinically redundant or overlapping event terms. Excellent discussions of this issue can be found in articles by Brown [73-75].

4. Clinical versus computational approaches

The computational approach of DMAs differs fundamentally from the classical case-by-case approach in which every incoming report is reviewed by a qualified assessor. The place of the additional available clinical information is less well-defined when using DMAs, despite the fact that this information makes an essential contribution to the signal detection process. In the case-by-case approach, the intrinsic value of the case report itself is a crucial factor.
An individual case report not only provides information about a certain combination of a drug and an adverse event, but also places this information in a specific context; for instance, with regard to the pattern of the clinical events, the course of the reaction, specific time-related information and experiences of the patient and doctor with related products. These aspects are currently not taken into account by DMAs. The frame of reference for signal detection in the case-by-case approach is the experience and interpretation of the data by the individual assessor. This differs fundamentally from the statistical approaches in which the frame of reference is the other DECs in the data set. Another major difference is that in the classical case-by-case approach each case report has its own value and may have a

different contribution in building the evidence of the signal involved. With the current DMAs, all reports have an equal contribution to the signal, irrespective of the level of documentation and quality and quantity of additional clinical information available in the case reports. The basic differences between the two approaches make clear that neither will ever be able to replace the other.

5. Conclusion

The development, testing and deployment of DMAs represent a quantum jump in pharmacovigilance. Although there is currently no scientific or regulatory basis to claim that DMAs are a required element of good pharmacovigilance practice, they are an intuitively appealing solution to the operational challenges of screening steadily enlarging safety databases. Higher-order phenomena, such as complex drug-drug interactions or drug-induced syndromes, may be especially difficult to identify through manual review of AE lists, and it is this type of phenomenon that might be most amenable to detection through data mining. Retrospective applications indicate that DMAs can highlight some medically significant associations in a timely manner, often in advance of the published literature and traditional signalling strategies. This experience includes both general as well as more specialised pharmacovigilance settings [56]. DMAs have been incorporated into routine pharmacovigilance operations of major national and transnational drug monitoring centres, such as the MHRA (PRRs), the Netherlands Pharmacovigilance Centre Lareb (RORs), the WHO drug monitoring centre (BCPNN) and the US FDA (MGPS). Key regulatory guidance documents include discussions about the potential role of data mining in the pharmacovigilance and risk management framework [106].
However, DMAs may fail to highlight legitimate associations for various reasons, have an unclear opportunity cost associated with false alarms, and have yet to prospectively detect new drug hazards. The latter consideration is especially pertinent in light of questions that have been raised about classifier performance in general. As Hand stated: 'improvements attributed to the more advanced and recent developments are small, and that aspects of real, practical problems often render such small differences irrelevant, or unreal, so that the gains reported on theoretical grounds, or on empirical comparisons from simulated or even real data sets, do not translate into real advantages in practice. That is, the progress is far less than it appears' [76]. The authors cannot say with certainty the degree to which these considerations apply to data mining in naturalistic pharmacovigilance settings, but it seems reasonable to ponder the degree to which global retrospective data mining exercises involving large databases populated with many old drugs and labelled events inform us about good prospective pharmacovigilance practice in such settings, especially given that key parameters and uncertainties in the target environment may defy explicit characterisation. However, in fairness, it is important to remember that pharmacovigilance practice has historically depended on semiquantitative and non-computational approaches that are not formally validated (though there is obviously more prospective experience with these approaches). There are formidable challenges to validating data mining algorithms beyond those already mentioned, such as the choice of appropriate reference AEs (true and false positive and negative signals) for assessing DMA performance in the absence of perfect gold standards for adjudicating causality.
Accordingly, DMAs should be considered as one of many potentially performance-enhancing options in the pharmacovigilance toolkit that need to be assessed by each institution on an individual basis. They should only be considered potential supplements to, and not substitutes for, a comprehensive signal detection programme based on multiple approaches and data sets. The authors encourage all stakeholders to participate in testing these methodologies. Data mining research is important, interesting and fun. The authors believe the greater the number of perspectives and the more vigorous the discourse, the better for patient safety. This presents an important opportunity for multi-disciplinary knowledge sharing between regulatory authorities, drug monitoring centres, pharmaceutical companies and academia to improve patient safety. There is significant potential for misapplication and misuse of DMAs. A great danger is that DMAs, especially those with an extensive mathematical veneer and dazzling visualisation tools, will seduce users into believing that the enormous limitations and defects in SRS data have been neutralised and thereby promote overinterpretation and overconfidence in data mining output. As some have noted [77], indications of this have already appeared in the published literature and the courtroom, where these methods have been described as potentially useful for testing hypotheses [78].

6. Expert opinion

Two rate-limiting factors loom large in mining SRS data: the qualitative and quantitative data distortions and corruption inherent to voluntary reporting schemes, and the projection of high-dimensional data onto two-dimensional contingency tables. One approach to defective data will be the development and application of DMAs to very large, quality-audited pharmacoepidemiological databases. The latter ordinarily contain anonymous longitudinal medical records, including diagnoses, prescribed drugs, hospital admissions and laboratory results.
It is, therefore, tempting to use these databases to identify signs and symptoms, drug-drug interactions, as well as long-latency adverse reactions. Unlike SRS databases, pharmacoepidemiological databases do not contain reporter-defined drug-event/symptom pairs. This might be a challenge but, on the other hand, the routine recording of signs and symptoms regardless of the index of suspicion provides less biased data. However,

voluntary reporting systems will still be needed, especially for the detection of extremely rare events for which even large longitudinal pharmacoepidemiological databases are, in principle, too small. Progress beyond the second rate-limiting step will be facilitated by exploring and developing techniques, such as multivariate regression and propensity scoring, that incorporate the full dimensionality of the data. Just as there is a danger of overinterpreting data mining results, there is also a danger of overattention to data mining research at the expense of other important areas for growth. With increasing focus on statistical approaches, the authors are, once again, and perhaps more than ever, reminded about the profound rate-limiting defects in SRS data. Opportunities for improving data quality are crucial. Similarly, the study and optimisation of clinical cognition in the process of signal detection and evaluation should not be neglected. From a technological perspective, knowledge-sharing collaborative efforts involving the pharmacovigilance, pharmacoepidemiological, statistics, computer science and artificial intelligence communities should strive for database tools that will allow the expert safety reviewer to efficiently access and integrate extra-statistical scientific knowledge pertinent to the aetiology of ADRs with the statistical calculations, because associations for which cogent post hoc scientific explanations are found may be more likely to be causal in nature [2].

Bibliography

Papers of special note have been highlighted as either of interest or of considerable interest to readers.

1. MEYBOOM RH, EGBERTS AC, EDWARDS IR et al.: Principles of signal detection in pharmacovigilance. Drug Saf. (1997) 16(6).
   Eloquent discussion of the nuances of signal detection and evaluation.
2. HAND D, BLUNT G, KELLY M, ADAMS N: Data mining for fun and profit. Stat. Science (2000) 15(2).
3. GOODWIN L, VANDYNE M, LIN S, TALBERT S: Data mining issues and opportunities for building nursing knowledge. J. Biomed Inform. (2003) 36(4-5).
4. LI X, RAO S, WANG Y, GONG B: Gene mining: a novel and powerful ensemble decision approach to hunting for disease genes using microarray expression profiling. Nucleic Acids Res. (2004) 32(9).
5. PERNER P: Image mining: issues, framework, a generic tool and its application to medical image diagnosis. Engineering Applications of Artificial Intelligence (2002) 15(3).
6. HAUBEN M, REICH J: Communication of findings in pharmacovigilance: use of term signal and the need for precision in its use. Eur. J. Clin. Pharmacol. (2005) 61(5-6).
7. BATEMAN DN, SANDERS GL, RAWLINS MD: Attitudes to adverse drug reaction reporting in the Northern Region. Br. J. Clin. Pharmacol. (1992) 34(5).
8. BELTON KJ: Attitude survey of adverse drug-reaction reporting by health care professionals across the European Union. The European Pharmacovigilance Research Group. Eur. J. Clin. Pharmacol. (1997) 52(6).
9. BELTON KJ, LEWIS SC, PAYNE S, RAWLINS MD, WOOD SM: Attitudinal survey of adverse drug reaction reporting by medical practitioners in the United Kingdom. Br. J. Clin. Pharmacol. (1995) 39(3).
10. COSENTINO M, LEONI O, BANFI F, LECCHINI S, FRIGO G: Attitudes to adverse drug reaction reporting by medical practitioners in a Northern Italian district. Pharmacol. Res. (1997) 35(2).
11. DE BRUIN ML, VAN PUIJENBROEK EP, EGBERTS AC, HOES AW, LEUFKENS HG: Non-sedating antihistamine drugs and cardiac arrhythmias - biased risk estimates from spontaneous reporting systems? Br. J. Clin. Pharmacol. (2002) 53(4).
12. ELAND IA, BELTON KJ, VAN GROOTHEEST AC et al.: Attitudinal survey of voluntary reporting of adverse drug reactions. Br. J. Clin. Pharmacol. (1999) 48(4).
13. WILLIAMS D, FEELY J: Underreporting of adverse drug reactions: attitudes of Irish doctors. Ir. J. Med. Sci. (1999) 168(4).
14. BEGAUD B, MORIDE Y, TUBERT-BITTER P, CHASLERIE A, HARAMBURU F: False-positives in spontaneous reporting: should we worry about them? Br. J. Clin. Pharmacol. (1994) 38(5).
   Describes a method based on the Poisson distribution for computing the maximum number of reports of an ADR that could be expected to be reported coincidentally, and relates this to the concept of DMEs.
15. TRONTELL A: Expecting the unexpected - drug safety, pharmacovigilance, and the prepared mind. N. Engl. J. Med. (2004) 351(14).
   An excellent description, by an expert at the FDA, of signal detection as a comprehensive process using multiple approaches and techniques.
16. BATE A, LINDQUIST M, EDWARDS IR et al.: A Bayesian neural network method for adverse drug reaction signal generation. Eur. J. Clin. Pharmacol. (1998) 54(4).
   A seminal paper describing the BCPNN.
17. EGBERTS AC, MEYBOOM RH, VAN PUIJENBROEK EP: Use of measures of disproportionality in pharmacovigilance: three Dutch examples. Drug Saf. (2002) 25(6).
18. EVANS SJ, WALLER PC, DAVIS S: Use of proportional reporting ratios (PRRs) for signal generation from spontaneous adverse drug reaction reports. Pharmacoepidemiol. Drug Saf. (2001) 10(6).
   Describes the application of PRR at the MHRA.
19. FRAM D, ALMENOFF J, DUMOUCHEL W: Empirical Bayesian data mining for discovering patterns in postmarketing drug safety. 9th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, August 24-27, 2003, Washington (2003).
20. HAUBEN M: A brief primer on automated signal detection Quantitative methods in pharmacovigilance: focus on signal detection. Ann. Pharmacother. (2003) 37(7-8).
21. SZARFMAN A, MACHADO SG, O'NEILL RT: Use of screening algorithms and computer systems to efficiently signal higher-than-expected combinations of drugs and events in the US FDA's spontaneous reports database. Drug Saf. (2002) 25(6).
22. HAUBEN M, REICH L: Safety related drug-labelling changes: findings from two data mining algorithms. Drug Saf. (2004) 27(10).
23. HAUBEN M: Trimethoprim-induced hyperkalaemia - lessons in data mining. Br. J. Clin. Pharmacol. (2004) 58(3).
24. MADIGAN D: Discussion of Bayesian data mining in large frequency tables by Bill DuMouchel. Am. Stat. (1999) 53.
25. DUMOUCHEL W: Bayesian data mining in large frequency tables, with an application to the FDA spontaneous reporting system. Am. Stat. (1999) 53(3).
   A seminal paper describing the empirical Bayesian approach to signal detection in pharmacovigilance.
26. MOSELEY J, HEELEY E, EKINS-DAUKES S, EVANS S: Preliminary comparison of 2 signal detection methodologies in the UK regulatory spontaneous ADR data base. [Abstract]. Drug Saf. (2004) 27(12).
27. VAN PUIJENBROEK EP, EGBERTS AC, HEERDINK ER, LEUFKENS HG: Detecting drug-drug interactions using a database for spontaneous adverse drug reactions: an example with diuretics and non-steroidal anti-inflammatory drugs. Eur. J. Clin. Pharmacol. (2000) 56(9-10).
28. VAN PUIJENBROEK EP, EGBERTS AC, MEYBOOM RH, LEUFKENS HG: Signalling possible drug-drug interactions in a spontaneous reporting system: delay of withdrawal bleeding during concomitant use of oral contraceptives and itraconazole. Br. J. Clin. Pharmacol. (1999) 47(6).
29. VAN PUIJENBROEK EP, EGBERTS AC, MEYBOOM RH, LEUFKENS HG: Association between terbinafine and arthralgia, fever and urticaria: symptoms or syndrome? Pharmacoepidemiol. Drug Saf. (2001) 10(2).
30. NEYMAN J: On the application of probability theory to agricultural experiments. Essay on principles. Section 9, translated in Statistical Science (with discussion). Stat. Science (1990) 5(4).
31. RUBIN D: Causal inference using potential outcomes: design, modeling, decisions. JASA (2005) 100(469).
32. CARTWRIGHT NE: Nature's Capacities and Their Measurements. Cartwright N (ed.) Oxford University Press, Oxford (1994).
33. ROSENBAUM P, RUBIN D: The central role of the propensity score in observational studies for causal effects. Biometrika (1983) 70(1).
34. CORNFIELD J, HAENSZEL W, HAMMOND EEA: Smoking and lung cancer: recent evidence and a discussion of some questions. J. Natl. Cancer Inst. (1959) 22(1).
35. LINDQUIST M, STAHL M, BATE A et al.: A retrospective evaluation of a data mining approach to aid finding new adverse drug reaction signals in the WHO international database. Drug Saf. (2000) 23(6).
36. SPIGSET O, HAGG S, BATE A: Hepatic injury and pancreatitis during treatment with serotonin reuptake inhibitors: data from the World Health Organization (WHO) database of adverse drug reactions. Int. Clin. Psychopharmacol. (2003) 18(3).
37. SANZ EJ, DE-LAS-CUEVAS C, KIURU A, BATE A, EDWARDS R: Selective serotonin reuptake inhibitors in pregnant women and neonatal withdrawal syndrome: a database analysis. Lancet (2005) 365(9458).
38. COULTER DM, BATE A, MEYBOOM RH, LINDQUIST M, EDWARDS IR: Antipsychotic drugs and heart muscle disorder in international pharmacovigilance: data mining study. BMJ (2001) 322(7296).
39. BATE A, LINDQUIST M, ORRE R et al.: Data-mining analyses of pharmacovigilance signals in relation to relevant comparison drugs. Eur. J. Clin. Pharmacol. (2002) 58(7).
   Provides cogent and relevant demonstrations and discussion, including graphical data visualisation tools, of the BCPNN, using real-world examples.
40. VAN PUIJENBROEK EP, EGBERTS AC, MEYBOOM RH, LEUFKENS HG: Different risks for NSAID-induced anaphylaxis. Ann. Pharmacother. (2002) 36(1).
41. VAN PUIJENBROEK EP, BATE A, LEUFKENS HG et al.: A comparison of measures of disproportionality for signal detection in spontaneous reporting systems for adverse drug reactions. Pharmacoepidemiol. Drug Saf. (2002) 11(1).
42. O'NEILL RT, SZARFMAN A: Some US Food and Drug Administration perspectives on data mining for pediatric safety assessment. Curr. Ther. Res. (2001) 62(9).
43. SZARFMAN A, TONNING JM, DORAISWAMY PM, MACHADO SG, O'NEILL RT: Pharmacovigilance in the 21st century: new systematic tools for an old problem. Pharmacotherapy (2004) 24(9).
44. BANKS D, WOO E, BURWEN D et al.: Comparing data mining methods on the VAERS database. Pharmacoepidemiol. Drug Saf. (2005) Published online: 13 Jun.
45. PURCELL P, BARTY S: Statistical techniques for signal generation: the Australian experience. Drug Saf. (2002) 25(6).
46. GOULD AL: Practical pharmacovigilance analysis strategies. Pharmacoepidemiol. Drug Saf. (2003) 12(7).
   A comprehensive, well-written and insightful discussion of both theoretical concepts and issues related to the practical deployment of DMAs.
47. HAUBEN M, REICH L: Drug-induced pancreatitis: lessons in data mining. Br. J. Clin. Pharmacol. (2004) 58(5).
   Describes points to consider when using DMAs.
48. HAUBEN M, REICH L: A case report of rhabdomyolysis with pentamidine that prompted a retrospective evaluation of a pharmacovigilance tool under investigation. Br. J. Clin. Pharmacol. (2004) 58(6).
49. HAUBEN M, REICH L: Data mining, drug safety, and molecular pharmacology: potential for collaboration. Ann. Pharmacother. (2004) 38(12).
50. HAUBEN M, REICH L: Valproate-induced parkinsonism: use of a newer pharmacovigilance tool to investigate the reporting of an unanticipated adverse event with an old drug. Mov. Disord. (2005) 20(3).
51. HAUBEN M, REICH L: Case reports of dobutamine-induced myoclonia in severe renal failure: potential of emerging pharmacovigilance technologies. Nephrol. Dial. Transplant (2005) 20(2).
52. HAUBEN M, REICH L: Endotoxin-like reactions with intravenous gentamicin: results from pharmacovigilance tools under investigation. Infect. Control Hosp. Epidemiol. (2005) 26(4).
53. HAUBEN M, REICH L, CHUNG S: Postmarketing surveillance of potentially fatal reactions to oncology drugs: potential utility of two signal-detection algorithms. Eur. J. Clin. Pharmacol. (2004) 60(10).
54. HAUBEN M, REICH L: Potential utility of data-mining algorithms for early detection of potentially fatal/disabling adverse drug reactions: a retrospective evaluation. J. Clin. Pharmacol. (2005) 45(4).
55. HAUBEN M: Application of an empiric Bayesian data mining algorithm to reports of pancreatitis associated with atypical antipsychotics. Pharmacother. (2004) 24(9).
56. HAUBEN M: Early postmarketing drug safety surveillance: data mining points to consider. Ann. Pharmacother. (2004) 38(10).
57. ALMENOFF JS, DUMOUCHEL W, KINDMAN LA, YANG X, FRAM D: Disproportionality analysis using empirical Bayes data mining: a tool for the evaluation of drug interactions in the post-marketing setting. Pharmacoepidemiol. Drug Saf. (2003) 12(6).
58. ALMENOFF J, DUMOUCHEL W, KINDMAN L, YANG X, FRAM D: Letter to the Editor Re: Almenoff et al., Disproportionality analysis using empirical Bayes data mining: a tool for the evaluation of drug interactions in the post-marketing setting. Pharmacoepidemiol. Drug Saf. (2004) 13(3).
59. ROUX E, TUBERT-BITTER P, THIESSARD F, FOURRIER A, BEGAUD A, EVANS S: Automatic signal detection methods evaluation on a simulated data base. Workshop, 20th International Conference on Pharmacoepidemiology & Therapeutic Risk Management, Bordeaux, France.
60. ROUX E, TUBERT-BITTER P, THIESSARD F: Evaluation of data mining methods in pharmacovigilance using simulated datasets. 20th International Conference on Pharmacoepidemiology & Therapeutic Risk Management, Bordeaux, France.
61. KUBOTA K, KOIDE D, HIRAI T: Comparison of data mining methodologies using Japanese spontaneous reports. Pharmacoepidemiol. Drug Saf. (2004) 13(6).
62. DE BRUIN ML, PETTERSSON M, MEYBOOM RH, HOES AW, LEUFKENS HG: Anti-HERG activity and the risk of drug-induced arrhythmias and sudden death. Eur. Heart J. (2005) 26(6).
63. HARVEY JT, TURVILLE C, BARTY SM: Data mining of Australian adverse drug reactions database: a comparison of Bayesian and other statistical indicators. Intl. Tran. Op. Research (2004) 11.
64. HAUBEN M, PATADIA V, GERRITS C, WALSH L, REICH L: Data mining in pharmacovigilance: the need for a balanced perspective. Drug Saf. (2005) 28 (in press).
65. STAHL M, LINDQUIST M, EDWARDS IR et al.: Introducing triage logic as a new strategy for the detection of signals in the WHO Drug Monitoring Database. Pharmacoepidemiol. Drug Saf. (2004) 13(6).
66. FINNEY DJ: The detection of adverse reactions to therapeutic drugs. Stat. Med. (1982) 1(2).
67. NOREN G, ORRE R, BATE A: A hit-miss model for duplicate detection in the WHO drug safety database. Proceedings of the 11th ACM SIGKDD international conference on knowledge discovery and data mining, August 2005, Chicago, USA.
68. GOGOLAK VV: The effect of backgrounds in safety analysis: the impact of comparison cases on what you see. Pharmacoepidemiol. Drug Saf. (2003) 12(3).
69. ALTUNDAG O, CELIK I, KARS A: Recurrent asymptomatic bradycardia episodes after cisplatin infusion. Ann. Pharmacother. (2001) 35(5).
70. YEE C, KLINCEWICS S, KNIGHT J, THOMAS A, WILSON R: Practical consideration in developing an automated signaling program within a pharmacovigilance department. Drug Inf. J. (2004) 38(3):293.
   Describes how one pharmaceutical company incorporates DMAs in their current pharmacovigilance practices.
71. HAUBEN M, REICH L: Application of an empiric Bayesian data mining algorithm to reports of pancreatitis associated with atypical antipsychotics. Pharmacother. (2004) 24(9).
72. PURCELL PM: Data mining in pharmacovigilance. Int. J. Pharm. Med. (2003) 17(2).
73. BROWN EG: Effects of coding dictionary on signal generation: a consideration of use of MedDRA compared with WHO-ART. Drug Saf. (2002) 25(6).
74. BROWN EG: Methods and pitfalls in searching drug safety databases utilising the Medical Dictionary for Regulatory Activities (MedDRA). Drug Saf. (2003) 26(3).
75. BROWN EG: Using MedDRA: implications for risk management. Drug Saf. (2004) 27(8).
76. HAND D: Technical report, Department of Mathematics, Imperial College London.
77. STROM BL: Evaluation of suspected adverse drug reactions. JAMA (2005) 293(11).
78. ALMENOFF JS, DUMOUCHEL W, KINDMAN LA et al.: Disproportionality analysis using empirical Bayes data mining: a tool for the evaluation of drug interactions in the post-marketing setting. Pharmacoepidemiol. Drug Saf. (2003) 12(6).

Websites

101. FDA website. Adverse event reporting system (2005).
102. monitorsafequalmed/yellowcard/yellow cardscheme.htm MHRA website. Yellow card scheme (2005).
103. The Uppsala Monitoring Centre (2005).
104. madigan/bbr Bayesian Binary Regression software (2005).
105. Roux.pdf Spontaneous reporting system modelling for data mining methods evaluation in pharmacovigilance. Conference on intelligent data analysis in medicine and pharmacology. October 2003, Protaras, Cyprus (2005).
106. images/pharmacovig3_05.pdf Guidance for industry: Good pharmacovigilance practices and pharmacoepidemiologic assessment (2005).

Affiliation
Manfred Hauben 1,2,3 MD, MPH, David Madigan 4 PhD, Charles M Gerrits 5 PharmD, PhD, Louisa Walsh 6 MD & Eugene P Van Puijenbroek 7 MD, PhD
Author for correspondence
1 Pfizer, Inc., Risk Management Strategy, New York, NY, USA
2 New York University School of Medicine, Department of Medicine, New York, NY, USA
3 New York Medical College, Departments of Pharmacology and Community and Preventive Medicine, Valhalla, NY, USA
4 Rutgers University, Department of Statistics, Piscataway, New Jersey, USA
5 Takeda Global Research and Development, Inc., Department of Pharmacoepidemiology and Outcomes Research, Lincolnshire, Illinois, USA. E-mail: cgerrits@tgrd.com
6 AstraZeneca LP, Clinical Drug Safety, Wilmington, Delaware, USA
7 Netherlands Pharmacovigilance Centre Lareb, 's-Hertogenbosch, The Netherlands