Big Data and Text Mining

Dr. Ian Lewin, Senior NLP Resource Specialist
Ian.lewin@linguamatics.com
www.linguamatics.com

About Linguamatics
Boston, USA and Cambridge, UK
Software, consulting, hosted content
Agile, scalable, real-time NLP-based text mining
Fact extraction and knowledge synthesis
Pharma/Biotech, including 17 of the top 20
Healthcare, including Kaiser Permanente
Government, including FDA
Linguamatics 2015

Solutions & Applications in Life Sciences
Advanced text analytics delivers value along the pipeline:
  Gene-disease mapping
  Target ID/selection
  Trial site selection and study design
  Regulatory submission QC
  HEOR
  Toxicity analysis and prediction
  Safety
  Pharmacovigilance
  Mutation/expression analysis
  SAR
  Biomarker discovery
  Competitive intelligence
  Comparative effectiveness
  Drug repurposing
  Patent analysis
  KOL identification
  Opportunity scouting
  Social media analysis

Solutions & Applications in Healthcare
[Diagram. Labels: structured data; Electronic Health Record; Enterprise Data Warehouse; pathology, radiology, initial assessment, discharge and check-up reports; patient characteristics; FDA drug labels; potential adverse drug reactions; scientific literature; clinical case histories and/or genomic interpretation; care gap models; patient lists; ClinicalTrials.gov; matching clinical trials]

Structured Data & its Evidential Basis...
I2E can mine and extract with precision at scale:
  Scientific literature
  Patents
  News feeds
  EHRs
  Internal reports
  Drug labels
  Clinical trials
  Social media
  ...

Text Mining: a precursor to Big Data?
Unstructured data is just huge
We can't wait for those human db curators...
Besides, those curators ignore my parameter...
And all that text is just out there! (see Google for details)
Only it isn't

Multisource data and Big data
Lots of different types of data:
  Scientific literature
  Medical records
  Patents
  Regulatory publications (clinical trials, drug labels, adverse event reporting)
  Internal reports
Lots of different types of text
In lots of different silos & lots of different licences

Connected Data Technology
Single query across multiple data sources and network locations
Query across multiple data sources simultaneously
Unified results for fast review and discovery of relationships across multiple data sources
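The idea of one query fanned out to several sources and merged into a single result set can be sketched as follows. This is only an illustration under stated assumptions: the in-memory `SOURCES` dictionary, `search_source` and `query_all` are hypothetical names invented here, and nothing in it reflects the actual Connected Data Technology or I2E interfaces.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical in-memory stand-ins for separately hosted indexes.
SOURCES = {
    "literature": ["Aspirin reduces fever.", "Ibuprofen may cause nausea."],
    "drug_labels": ["Aspirin: risk of bleeding.", "Store below 25 C."],
    "patents": ["A formulation comprising aspirin and caffeine."],
}


def search_source(name, docs, term):
    """Return (source, doc_index, text) rows where the term occurs, ignoring case."""
    term = term.lower()
    return [(name, i, doc) for i, doc in enumerate(docs) if term in doc.lower()]


def query_all(term, sources=SOURCES):
    """Send the same query to every source concurrently and merge the rows."""
    with ThreadPoolExecutor() as pool:
        futures = [
            pool.submit(search_source, name, docs, term)
            for name, docs in sources.items()
        ]
        rows = [row for future in futures for row in future.result()]
    # Sorting gives one unified, source-tagged result list for review.
    return sorted(rows)


results = query_all("aspirin")
for source, index, text in results:
    print(source, index, text)
```

Tagging every row with its source keeps provenance visible after the merge, which is what makes relationships across sources reviewable in one place.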

Huge (Textual) Data and Big Data
We (i.e. "text-miners") are often joining data:
  Unstructured and structured
  Across silos
  Before the tabular results go to analysis
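A minimal sketch of joining at extraction time might look like the following. The notes, the demographics table, the ejection-fraction pattern and all function names are assumptions made up for illustration; the slide does not name a particular data set or method.

```python
import re

# Hypothetical unstructured silo: free-text notes keyed by patient id.
NOTES = {
    "p1": "Patient reports headaches. Ejection fraction 45%.",
    "p2": "No complaints. Ejection fraction 60%.",
    "p3": "Follow-up visit, no measurements recorded.",
}

# Hypothetical structured silo: a table keyed by the same patient id.
DEMOGRAPHICS = {"p1": {"age": 67}, "p2": {"age": 54}, "p3": {"age": 71}}

EF_PATTERN = re.compile(r"ejection fraction\s+(\d+)\s*%", re.IGNORECASE)


def extract_ef(notes):
    """Pull one numeric value per note where the pattern is present."""
    extracted = {}
    for patient_id, text in notes.items():
        match = EF_PATTERN.search(text)
        if match:
            extracted[patient_id] = int(match.group(1))
    return extracted


def join_rows(extracted, structured):
    """Join extracted values with structured fields into tabular rows."""
    return [
        {"patient": patient_id, "ef_percent": value, **structured[patient_id]}
        for patient_id, value in sorted(extracted.items())
        if patient_id in structured
    ]


table = join_rows(extract_ef(NOTES), DEMOGRAPHICS)
for row in table:
    print(row)
```

The output is already tabular, so downstream analysis receives rows rather than raw text.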

The How of Text Mining
Text mining isn't completely shrink-wrapped
There is, usually, some customization:
  To find the parameter value that you're interested in
  To find the value that everyone's interested in, but only in circumstances c
  To find it in datasource X
  To find it in X but only in circumstances c
  To map to ontology A rather than B
It often makes sense to express these constraints at time of text mining (not analysis)
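One way to picture constraints expressed at mining time is a small specification object that carries the parameter, the circumstance, the data source and the mapping, and is applied during extraction. The sketch below is hypothetical throughout: `ExtractionSpec`, `Doc`, `extract`, the half-life example and the unit mapping are invented for illustration and are not part of any Linguamatics product.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class Doc:
    source: str
    text: str


@dataclass
class ExtractionSpec:
    parameter: str               # the parameter whose value is wanted
    circumstance: Optional[str]  # phrase that must occur in the same sentence
    source: Optional[str]        # restrict to one data source, or None for all
    mapping: dict                # surface form -> preferred form ("ontology A")


def extract(docs, spec):
    """Apply every constraint in the spec while extracting, not afterwards."""
    pattern = re.compile(
        rf"{re.escape(spec.parameter)}\s+(?:of\s+)?(\d+(?:\.\d+)?)\s*(\w+)",
        re.IGNORECASE,
    )
    rows = []
    for doc in docs:
        if spec.source and doc.source != spec.source:
            continue
        for sentence in re.split(r"(?<=[.!?])\s+", doc.text):
            if spec.circumstance and spec.circumstance.lower() not in sentence.lower():
                continue
            for value, unit in pattern.findall(sentence):
                rows.append((doc.source, float(value), spec.mapping.get(unit.lower(), unit)))
    return rows


# Hypothetical documents and unit mapping.
DOCS = [
    Doc("labels", "Half-life of 6 hours in adults. Half-life of 9 hrs in renal impairment."),
    Doc("literature", "Half-life 7 h in renal impairment."),
]
UNITS_A = {"hours": "h", "hrs": "h", "h": "h"}

spec = ExtractionSpec("half-life", "renal impairment", "labels", UNITS_A)
rows = extract(DOCS, spec)
print(rows)
```

Changing only the spec (for example setting `source` to `None`) reruns the same extraction under different constraints, without touching the analysis step.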

Toolbox of Methods for Powerful Querying
NLP
  Precise linguistic relationships, sentence co-occurrence
  Precise negation, e.g. "pressure" but not "blood pressure"
Terminologies
  Search for concepts and classes, not just keywords
  e.g. search for "cancer" and get synonyms and children: malignant neoplasms, malignant tumor
Regular Expressions
  Rule-based pattern matching for e.g. measurements, lab codes, mutations
  e.g. microRNA: let-?\d+.* and mirn?a?-?\d+.*
Chemistry
Fielded Search
  Restrict within particular regions of a document, including nested regions
  e.g. table cell in table in Description
High Throughput
  Simultaneous processing of large numbers of items
  e.g. 500 compounds, 500 genes from a microarray experiment, etc.
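Three of the methods above can be approximated with Python's standard `re` module. The two microRNA patterns are taken from the slide; everything else is an assumption for illustration: the lookbehind is only a regex stand-in for linguistic negation, and `TERMINOLOGY` is a hypothetical three-entry dictionary, not a real ontology.

```python
import re

# The two microRNA patterns from the slide, combined and matched per token.
MICRORNA = re.compile(r"let-?\d+.*|mirn?a?-?\d+.*", re.IGNORECASE)

# Regex approximation of '"pressure" but not "blood pressure"'.
PRESSURE_NOT_BLOOD = re.compile(r"(?<!blood )\bpressure\b", re.IGNORECASE)

# Hypothetical mini-terminology: a concept with its synonyms and children.
TERMINOLOGY = {"cancer": ["cancer", "malignant neoplasm", "malignant tumor"]}


def find_micrornas(text):
    """Return tokens that fully match either microRNA pattern."""
    tokens = [token.strip(".,;()") for token in text.split()]
    return [token for token in tokens if MICRORNA.fullmatch(token)]


def count_non_blood_pressure(text):
    """Count mentions of 'pressure' not directly preceded by 'blood '."""
    return len(PRESSURE_NOT_BLOOD.findall(text))


def find_concept(text, concept):
    """Return the terminology entries for a concept that occur in the text."""
    lowered = text.lower()
    return [term for term in TERMINOLOGY[concept] if term in lowered]


mirnas = find_micrornas(
    "Expression of let-7a, miR-21 and MIRN155 was altered; letter codes were ignored."
)
pressure_hits = count_non_blood_pressure(
    "Blood pressure was normal but intraocular pressure was raised."
)
cancer_hits = find_concept("Biopsy confirmed malignant neoplasms of the colon.", "cancer")
print(mirnas, pressure_hits, cancer_hits)
```

Matching per token with `fullmatch` keeps the trailing `.*` in the slide's patterns from swallowing the rest of a sentence.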

Linguistic Processing Using NLP
Interprets meaning of the text
Groups words into meaningful units
Search for different forms of words
  Sentences
  Noun groups: match entities
  Verb groups: match actions
  Morphology: match different forms
Examples:
  We find that p42mapk phosphorylates c-myb on serine and threonine.
  Purified recombinant p42 MAPK was found to phosphorylate Wee1.
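The two example sentences show the same relation written with different word forms and different spellings of the entity. A rough sketch of normalising both into one fact is given below; it uses a hand-written regular expression as a stand-in for real noun-group, verb-group and morphological analysis, and the `RELATION` pattern and `extract_phosphorylation` function are hypothetical.

```python
import re

# Hand-written stand-in for linguistic analysis:
#   agent  - two spellings of the same entity ("p42mapk", "p42 MAPK")
#   verb   - several inflected forms of "phosphorylate"
#   target - the word following the verb
RELATION = re.compile(
    r"(?P<agent>p42\s?mapk)\b.*?\bphosphorylat(?:es|ed|ing|e)\s+(?P<target>[\w-]+)",
    re.IGNORECASE,
)


def extract_phosphorylation(sentence):
    """Return (agent, action, target) with agent and verb normalised, or None."""
    match = RELATION.search(sentence)
    if match is None:
        return None
    agent = re.sub(r"\s+", "", match.group("agent")).upper()
    return (agent, "phosphorylate", match.group("target"))


SENTENCES = [
    "We find that p42mapk phosphorylates c-myb on serine and threonine.",
    "Purified recombinant p42 MAPK was found to phosphorylate Wee1.",
]

facts = [extract_phosphorylation(sentence) for sentence in SENTENCES]
for fact in facts:
    print(fact)
```

Both sentences reduce to the same agent and action, so the extracted rows can be grouped or counted even though the surface text differs.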

Discovering extraction patterns
We often need to look at the data first (the "huge data") to find the extraction patterns
Linguistic patterns of expression vary:
  Over data sets
  Over time
This pre-extraction exploration is itself something that needs informing:
  By the ontologies and KBs that are already out there
  By the re-use of generally successful strategies
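One simple form of pre-extraction exploration is to use an existing entity lexicon to locate entity pairs and then count the phrases that connect them, so that frequent connecting phrases suggest candidate extraction patterns. The sketch below assumes this approach for illustration only; the lexicon, the sentences and `connecting_phrases` are hypothetical and are not the method used in the project described next.

```python
import re
from collections import Counter

# Hypothetical lexicon, standing in for an existing ontology or knowledge base.
ENTITIES = ["p42 MAPK", "p42mapk", "c-myb", "Wee1", "RAF1", "MEK1"]

# Hypothetical sample of the data being explored.
SENTENCES = [
    "We find that p42mapk phosphorylates c-myb on serine and threonine.",
    "Purified recombinant p42 MAPK was found to phosphorylate Wee1.",
    "RAF1 phosphorylates MEK1 in vitro.",
    "RAF1 binds to MEK1.",
]

# Longest names first so that longer entity names win over shorter ones.
ENTITY_RE = re.compile(
    "|".join(re.escape(name) for name in sorted(ENTITIES, key=len, reverse=True)),
    re.IGNORECASE,
)


def connecting_phrases(sentences):
    """Count the text found between consecutive entity mentions in a sentence."""
    counts = Counter()
    for sentence in sentences:
        mentions = list(ENTITY_RE.finditer(sentence))
        for left, right in zip(mentions, mentions[1:]):
            phrase = sentence[left.end():right.start()].strip().lower()
            if phrase:
                counts[phrase] += 1
    return counts


phrase_counts = connecting_phrases(SENTENCES)
for phrase, count in phrase_counts.most_common():
    print(count, phrase)
```

Rerunning the count on a different data set or time period shows how the patterns of expression shift, which is the variation the slide points to.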

Innovative tools to enable exploration of complex and specialised data sets
Grant funded by InnovateUK (Dept of BIS and EPSRC)
Sponsored Partners: Univ. of Essex & Linguamatics
Project End-date: mid 2016
Easier discovery and extraction of key facts:
  By sharing search strategies rather than sharing just search results
  By using novel algorithms for semantic information extraction
Linking information from multiple resources to help users find similar and relevant information

Summary
Text mining: the extraction of structured information from unstructured text
It's a natural precursor to large-scale analytics
It's also a big data task itself:
  Voluminous source data
  Distributed over many silos
  Expressed in different ways
It's not just a precursor:
  We're (already) joining data at extraction time
  We're researching exploiting and joining more data at the earliest phases of data exploration, prior to extraction

Thank You
For more information
Visit: www.linguamatics.com
Contact: Ian Lewin, ian.lewin@linguamatics.com