Applying Big Data approaches to Competitive Intelligence challenges
THOMSON REUTERS IP & SCIENCE
PHARMA CI EUROPE CONFERENCE & EXHIBITION
TIM MILLER
19 FEBRUARY 2014
BIG DATA, NOT JUST ABOUT VOLUMES
Volume: Patient Data, NGS
Variety: RDBMS, XML, API, RSS, RDF, Office Documents, Web Crawled, EHR, Pipeline, Patent, Journal, Conference, Trial, Press, Investor calls, Chemical Structure, Sequence, Expression, Gene Variant, Pathway, Social, Approval documents
Veracity: But can you trust it?
Velocity: Laboratory Automation, Social Media
THE OLD APPROACHES DON'T WORK
LEARNING FROM BIG DATA TECHNIQUES
1. Start with the (visual) analytic
2. Fetch & enhance the data to fit the analytic
3. Make it interactive
4. Automate it
EXAMPLE ANALYTICS
It's all about the visual metaphor
DEVELOPMENT FUNNEL
The funnel is a familiar metaphor for showing competing drug programs.
Data from Analytics API
Using Tiled Markers
X axis is development status
Y axis is a development status rank, offset by unique drug ID, minus half the total of drugs in earlier phases
SHOTS ON GOAL
A planetary view is useful for showing closeness to a goal, here showing the similarity of a set of in-licensing candidates to an ideal drug.
Data from Opportunity Finder API
X is a random value based on score
Y is score^2 - x^2 * rand(+/-)
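The placement rule on this slide can be sketched in a few lines of Python. The Y formula appears to have lost some notation in extraction, so this sketch assumes it means y = sqrt(score^2 - x^2) with a random sign, which puts each candidate on an orbit of radius `score` around the goal. That reading, the function name `planetary_position` and the sample scores are all assumptions made for illustration, not details taken from the presentation.

```python
import math
import random


def planetary_position(score, rng=random):
    """Return (x, y) on a circle of radius `score` centred on the goal at (0, 0).

    Assumption: the slide's Y formula is read as sqrt(score^2 - x^2) with a
    random sign, which keeps each point exactly `score` away from the centre.
    """
    if score < 0:
        raise ValueError("score must be non-negative")
    x = rng.uniform(-score, score)  # random X, bounded by the score
    sign = rng.choice((-1.0, 1.0))  # rand(+/-): upper or lower half of the orbit
    y = sign * math.sqrt(max(0.0, score * score - x * x))
    return x, y


# Hypothetical candidate scores, for illustration only.
candidates = {"candidate_a": 0.2, "candidate_b": 0.5, "candidate_c": 0.9}
positions = {name: planetary_position(s) for name, s in candidates.items()}
```

Under this reading the score acts as a distance from the goal, and the random X only spreads the candidates around their orbit so that markers with similar scores do not overlap.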
TIMELINE VIEWS
Timeline views are useful for showing events and durations, e.g. trial timelines or, in this example, drug development history.
Data from Investigational Drugs API
Events plotted using a JavaScript library
HEAT MAPS
Heat maps are good for comparing aggregated data, as in this comparison of drug safety profiles.
Data from Clinical API
Adverse events by % affected in trial, aggregated across multiple trials
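One way to build the matrix behind such a heat map is sketched below. The slide does not say how the percentages were combined across trials, so the sketch assumes pooling: affected patients are summed and divided by summed enrolment for each drug and event. The row layout, the sample numbers and the name `pooled_percent_affected` are hypothetical.

```python
from collections import defaultdict

# Hypothetical per-trial rows: (drug, adverse_event, patients_affected, patients_enrolled).
rows = [
    ("drug_a", "nausea", 12, 100),
    ("drug_a", "nausea", 30, 300),
    ("drug_a", "headache", 5, 100),
    ("drug_b", "nausea", 8, 200),
]


def pooled_percent_affected(rows):
    """Aggregate rows into {drug: {event: % of enrolled patients affected}}."""
    affected = defaultdict(int)
    enrolled = defaultdict(int)
    for drug, event, n_affected, n_enrolled in rows:
        affected[(drug, event)] += n_affected
        enrolled[(drug, event)] += n_enrolled
    matrix = defaultdict(dict)
    for (drug, event), n_affected in affected.items():
        total = enrolled[(drug, event)]
        if total > 0:  # skip cells with no enrolment to avoid dividing by zero
            matrix[drug][event] = 100.0 * n_affected / total
    return {drug: dict(events) for drug, events in matrix.items()}


heat_map = pooled_percent_affected(rows)
```

Each cell of `heat_map` then maps to one coloured square. Pooling weights large trials more heavily than a plain average of per-trial percentages would, and trials that never report an event add nothing to that event's denominator, so the choice of aggregation is worth stating next to the chart.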
NETWORKS
Networks are a useful tool for showing relationships, here identifying Key Opinion Leaders.
Data from Web of Science API
Author disambiguation using text fingerprints
Visualized in Cytoscape
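A minimal sketch of the network step follows, assuming author names have already been disambiguated. It builds a co-authorship edge list from paper records and ranks authors by their number of distinct co-authors. Using that count as a proxy for Key Opinion Leader status is an assumption made here, not a method stated on the slide, and the paper records and function names are hypothetical.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical records: each paper is a list of already-disambiguated author names.
papers = [
    ["author_a", "author_b", "author_c"],
    ["author_a", "author_d"],
    ["author_a", "author_b"],
    ["author_e", "author_d"],
]


def coauthor_network(papers):
    """Return {(author1, author2): number of shared papers} with sorted pairs."""
    edges = defaultdict(int)
    for authors in papers:
        for pair in combinations(sorted(set(authors)), 2):
            edges[pair] += 1
    return dict(edges)


def rank_by_collaborators(edges):
    """Rank authors by number of distinct co-authors (degree), highest first."""
    neighbours = defaultdict(set)
    for a, b in edges:
        neighbours[a].add(b)
        neighbours[b].add(a)
    return sorted(neighbours, key=lambda name: (-len(neighbours[name]), name))


edges = coauthor_network(papers)
ranking = rank_by_collaborators(edges)
```

The `edges` dictionary could be written out as a source, target, weight table for a network visualization tool, with the shared-paper count used as edge weight.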
GEOGRAPHIC
Maps can be used to show geographic trends, e.g. epidemiology, but also for navigating to street level, as in this example for trial site selection.
Data from Clinical API
Right-hand panel is a map file with interactive shapes
Left-hand panel is an embedded web page view from Google Maps, passing the address in the URL
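Passing an address in the URL mostly comes down to encoding it correctly. The sketch below builds a Google Maps search link with Python's standard library. The slide does not show which URL form was used, so the `maps/search/?api=1&query=` form from Google's Maps URLs documentation is an assumption, as are the function name and the sample address.

```python
from urllib.parse import urlencode

# Assumption: Google's documented Maps URLs search endpoint; the slide does not
# say which URL form was used.
MAPS_SEARCH_URL = "https://www.google.com/maps/search/"


def maps_url_for_address(address):
    """Return a Google Maps search URL with the address URL-encoded."""
    return MAPS_SEARCH_URL + "?" + urlencode({"api": 1, "query": address})


# Hypothetical trial site address, for illustration only.
url = maps_url_for_address("1 Example Street, London")
```

Encoding matters because site addresses contain spaces, commas and non-ASCII characters that would otherwise break the link.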
INFOGRAPHIC
Dashboard infographics can present a lot of data in a single view.
Data from Targets API & Omics API
Infographic built using the D3 JavaScript library
IMAGE OVERLAYS
Image overlays enable you to decorate familiar visuals with additional insight; here we show commercial activity over a pathway.
Data from Opportunity Finder API
Image & object coordinates from Omics API
FETCHING AND ENHANCING DATA
AGGREGATING & CLEANING DATA WITH DESKTOP TOOLS
Analytics programs like Spotfire have tools for joining data from multiple sources and for enhancing the data. There are also specialist tools like BizInt and general tools like R.
AGGREGATING & CLEANING DATA WITH PIPELINE TOOLS
Tools like Pipeline Pilot and KNIME greatly increase the flexibility to build data pipelines that aggregate and enhance the raw data.
EXTRACTING MORE INFORMATION BY TEXT MINING & CLUSTERING
Unstructured sources like social media can be brought into the analysis using text mining and clustering.
DATA IN HARMONY: THE POWER OF ONTOLOGIES
The content needed can be unstructured text or databases indexed using the vocabulary that made sense to their creators. Joining that together requires ways to mine content and map vocabularies.
Office documents in MS SharePoint -> Text mining -> Vocabulary Mapping -> Data Pool
Healthcare database indexed to ICD9 -> ETL -> Vocabulary Mapping -> Data Pool
Toxicity database indexed to MedDRA -> ETL -> Vocabulary Mapping -> Data Pool
Internal database indexed to MeSH -> ETL -> Vocabulary Mapping -> Data Pool
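The vocabulary-mapping step can be illustrated with a small lookup-based sketch. It assumes a mapping table from each source vocabulary's codes to a shared concept ID, and it keeps unmapped records aside for review rather than dropping them. Every code, concept ID and record below is a placeholder invented for illustration, not a real ICD9, MedDRA or MeSH entry, and `to_data_pool` is a hypothetical name.

```python
# Hypothetical mapping table: (source vocabulary, source code) -> shared concept ID.
# The codes and concept IDs are placeholders, not real ICD9, MedDRA or MeSH entries.
VOCABULARY_MAP = {
    ("ICD9", "icd9-code-1"): "concept:condition-x",
    ("MedDRA", "meddra-term-7"): "concept:condition-x",
    ("MeSH", "mesh-heading-3"): "concept:condition-x",
    ("MeSH", "mesh-heading-4"): "concept:condition-y",
}


def to_data_pool(records):
    """Map (source, vocabulary, code, payload) records onto shared concept IDs.

    Returns (pool, unmapped): pool is {concept_id: [(source, payload), ...]},
    unmapped lists records whose code has no entry in VOCABULARY_MAP.
    """
    pool = {}
    unmapped = []
    for source, vocabulary, code, payload in records:
        concept = VOCABULARY_MAP.get((vocabulary, code))
        if concept is None:
            unmapped.append((source, vocabulary, code, payload))
            continue
        pool.setdefault(concept, []).append((source, payload))
    return pool, unmapped


records = [
    ("healthcare_db", "ICD9", "icd9-code-1", "patient count row"),
    ("toxicity_db", "MedDRA", "meddra-term-7", "adverse event row"),
    ("internal_db", "MeSH", "mesh-heading-4", "assay result row"),
    ("internal_db", "MeSH", "mesh-heading-9", "unmapped row"),
]
pool, unmapped = to_data_pool(records)
```

Once records from differently indexed databases share a concept ID they can be joined in the data pool. In practice the mapping table is the hard part and would come from a curated ontology rather than a hand-written dictionary.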
BIG DATA TECHNOLOGIES FOR DATA PERSISTENCE
Linked Data & NoSQL databases provide powerful frameworks for joining data from multiple sources.
PUBLISHING THE RESULTS
Users aren't interested in data for its own sake: they want information, they want to tailor the view, and they need it in their daily workflows.
Entering data into an ELN: "Is this a new structure?"
Designing a clinical trial: "How many patients can I expect from this site?"
Presenting to the Therapy Area team: "Has the competitive landscape changed since I did these slides?"
MAKING IT HAPPEN
Let the Vendor(s) do it
Pick a (set of) tool(s) and roll your own
Do both
Cortellis for Information Integration
Cortellis Data Fusion
Cortellis Web Portal
Cortellis for Informatics APIs