Division of Bioinformatics and Biostatistics
Weida Tong, Ph.D.
The views presented do not necessarily reflect the views of the FDA.

Division Overview
Established on May 20th, 2012.
Three Branches:
  Bioinformatics branch is centered around research.
  Biostatistics branch focuses on research and service (National Toxicology Program and NCTR protocols).
  Scientific Computing branch is service-oriented.
Current Staff: Approximately 50 full-time employees, including postdoctoral fellows.

Division Mission and Vision
Research: To conduct integrative bioinformatics and biostatistics research to support FDA's mission of improving the safety and efficacy of FDA-regulated products.
Service: To provide research and regulatory support to NCTR and FDA scientists in bioinformatics, biostatistics, and scientific computing.
Focused on FDA Relevance: To ensure that the division's activities relate to FDA's mission, that our linkages with product centers are strengthened, and that our capabilities continue to evolve to meet the current and future needs of FDA.

Division Research - Overview
Drug and Emerging Technology
  Standards development (e.g., MicroArray Quality Control project).
  Biomarker and predictive toxicology (e.g., genomic and in vitro biomarkers).
  Drug safety and drug repositioning (e.g., pharmacovigilance).
  Personalized medicine (e.g., patient stratification) and risk assessment.
Converting Research Findings to Software Applications
  ArrayTrack: a genomic tool to support FDA review and research.
  FDALabel: a web-based database for FDA-approved drug labels.
  Knowledgebase development includes:
    Liver Toxicity Knowledge Base (LTKB)
    Endocrine Disruptor Knowledge Base (EDKB)
    Foodborne Pathogens Salmonella PFGE Knowledge Base

Accomplishment: 1. MicroArray Quality Control (MAQC)
MAQC: An FDA-led consortium effort to assess the technical performance and application of emerging technologies (microarrays, genome-wide association studies, and next-generation sequencing (NGS)) for safety evaluation and clinical application.
MAQC-1 (2005-2006): Technical Reliability of Microarrays
  137 participants from 51 organizations; 6 publications (2006).
  Conclusion: microarray technologies are reproducible. This provided the scientific basis for the document Guidance for Industry: Pharmacogenomics Data Submission (2007).
MAQC-2 (2006-2010): Microarray-Based Biomarkers
  202 participants from 97 organizations; 13 publications (2010).
  Conclusion: the accuracy of microarray-based biomarkers depends more on the endpoint studied than on the choice of bioinformatics methods.

MAQC-3: Sequencing Quality Control (SEQC)
Project was completed in 2014.
More than 180 participants from 73 organizations.
Focused on next-generation sequencing (RNA-Sequencing):
  Technical performance of RNA-Seq (quality control, accuracy, cross-lab and cross-platform reproducibility, etc.).
  Comparison of RNA-Seq with the mature microarray technology.
  The effect of bioinformatics approaches on downstream biology.
  Clinical utility of RNA-Seq as a biomarker.
  Toxicogenomics and safety evaluation with RNA-Seq.
SEQC Manuscripts:
  3 in Nature Biotechnology
  2 in Nature Communications
  2 in Genome Biology (1 accepted and 1 in review)
  3 in Scientific Data

2. Liver Toxicity Knowledge Base (LTKB)
LTKB: A knowledgebase system for predicting drug-induced liver injury (DILI) in humans.
Accomplishments:
  Compiled a benchmark drug list for community use, against which data generated by different labs can be compared and integrated.
  Constructed a database containing diverse datasets (e.g., genomics, in vitro assays) that is freely available for public use.
  Developed four predictive models for DILI in humans (one model is being implemented for review use).
LTKB Publications: Hepatology, Clinical Pharmacology and Therapeutics, Drug Discovery Today, American Journal of Pathology, and Toxicological Sciences

3. Food and Product Safety
Food Safety: A foodborne pathogens Salmonella PFGE knowledgebase system for rapid identification of Salmonella outbreak strains.
  Accomplishment: Constructed a database of over 46,000 Salmonella PFGE (pulsed-field gel electrophoresis) fingerprints and developed a prediction system for identifying and characterizing Salmonella serotypes based on the PFGE patterns.
  Three Publications: J Clinical Microbiology (2), PLoS One
Pharmacovigilance: Data mining of the FAERS database for signal detection in postmarketing safety surveillance.
  Accomplishment: Developed a biclustering technique for identifying drug families and their associated adverse event (AE) groups.
  Two Publications: PLoS One, J Biopharmaceutical Statistics

4. Biomarkers and Personalized Medicine
Statistical modeling and data mining techniques to identify personalized medicine (prognostic and predictive) biomarkers for patient treatment decisions.
Accomplishments:
  Developed statistical models for drug-induced organ toxicity biomarkers and classification algorithms for treatment selection.
  Identified cancer biomarkers and evaluated their reproducibility across studies.
  Identified the low-expression region for (RNA) NGS data analysis.
  Developed software for gene set enrichment analysis.
Publications: Pharmacogenomics, PLoS One, BMC Bioinformatics, BMC Med Res Methodology, Translational Cancer Research, International J Molecular Sciences, Briefings in Bioinformatics, J Biopharmaceutical Statistics, BioMed Research International.

Research and Collaboration to Support Product Centers
FDALabel™: A web-based database developed from the information in FDA-approved drug labels to access, utilize, and analyze FDA drug labeling data.
Drug labeling: Information about product indications, target populations, and adverse drug reactions collected during clinical trials and post-marketing surveillance.
Version 1 is publicly available:
  Approximately 600 unique user accesses per month for the past 6 months.
  Working with an agency-wide working group and CDER reviewers to apply the database in the review process.
  Advanced query mechanisms for easy retrieval of relevant information.
  Links to other FDA databases: FDA Adverse Event Reporting System (FAERS); Substance Registration System (SRS); Drugs@FDA

LTKB: Support Drug Review
Case 1 (an IND drug with NDA pending): The drug showed clinical signs of DRESS* and liver toxicity in Japanese patients. The reviewers were concerned that it could cause harm similar to that of trovafloxacin (withdrawn due to DILI and DRESS).
Case 2 (an IND drug): This drug was found to inhibit mitochondrial ATPase in the non-clinical studies. Its DILI risk in humans was assessed with LTKB by the reviewer.
Current and Future Plans:
  Implementation of software to support FDA review.
  Expansion of the LTKB data and improvement of the predictive models.
  Further development of methodologies for integrated analysis of LTKB data.
  A MAQC-like approach to developing analysis standards and practices for using in vitro models for DILI, based on the benchmark drug list.
* DRESS: Drug Reaction (or Rash) with Eosinophilia and Systemic Symptoms

On-Going CDER Collaborations
QT interval analysis of cardiotoxicity: identifying risk factors for safety assessment of drug-induced QT prolongation.
Blood pressure threshold evaluation: assessment of a potential sex-based criterion for cardiovascular disease risk (OWH-funded project, 2014-2016).
Predicting patient-specific treatment outcomes: identification and validation of molecular biomarkers for patient selection using in silico tools.
Improving the CDER DASH (Dashboard) system: an NDA data management system with multiple users that is experiencing slowness and crashes and needs an immediate fix.

CTP Projects in DBB
Two IT-centric projects:
  o Enclave: Develop an IT infrastructure for enhanced communication between FDA and non-FDA institutions
  o TCKB: Tobacco Constituents Knowledge Base for managing data related to tobacco constituents (e.g., chemical name and structure, physico-chemical properties, toxicity, references, tobacco brand, composition, etc.)
Two text mining-centric projects:
  o Topic modeling of tobacco documents to facilitate review
  o Assessing tobacco constituents by computational means for 5 major health endpoints (cancer, cardiovascular disease, reproductive toxicity and addiction)
Three newly funded projects (2014):
  o High-throughput Screening Tobacco Constituents for Addiction Potential Using Docking of Nicotinic Acetylcholine Receptors
  o CTP Bioinformatics Tobacco Constituents Knowledge Base and HPHC Toxicology
  o CTP Bioinformatics for Text and Topic Modeling

NCTR Statistical and Scientific Computing Collaboration/Support
Biostatistics Branch:
  Biochemical Toxicology (12%/FTE): E0218401, E0752901
  Systems Biology (45%/FTE): E0740411, E0733101, E0745501, E0750301, E0745301, E0752601, E0752201, E0743001
  Neurotoxicology (50%/FTE): E0751901, E0751201, E0745201
Scientific Computing Branch:
  Develop and modify software tools for data acquisition, standardization, and management.
  Develop and optimize databases.
  Coordinate technical integrity of the NCTR system and liaise between NCTR and FDA OIM.

Strategic Positioning #1: NGS
Issues: data standards, quality, storage, transformation, analysis, and applications
Leverage our experience from MAQC-3/SEQC
Work with our FDA product centers
  Biomarkers and personalized medicine
  Food safety and pathogen detection
Engage in global collaboration via the Global Coalition on Regulatory Science Research (GCRSR)

Strategic Positioning #2: Big Data
Challenges in Biomedical Big Data: accessing, managing, analyzing, and integrating collections of datasets so large and complex that they exceed the ability of traditional approaches to manage and analyze them effectively.
Align our strategy with the agency-wide effort.
Leverage regional resources: Arkansas Research and Education Optical Network (ARE-ON).
Establish the minimum in-house capability.
Initial effort is to develop novel data science (statistical) methodologies to deal with FDA big data:
  The FAERS database contains over 20 million reports; the drug label database contains approximately 60,000 labels
  Access, management, processing, modeling, analysis

THANK YOU!