Big Data Challenges: Data Management, Analytics & Security




Ivo D. Dinov
Statistics Online Computational Resource
University of Michigan
www.socr.umich.edu

Big Data Challenges
  Availability, Sharing, Aggregation and Services
  Classical Data Science vs. Innovative Big Data Science
  Amateur Scientists vs. Experts
  Data Scientists vs. Practitioners
  Domain-specific vs. Trans-disciplinary knowledge
  Commercial vs. Open-source Resourceome
  Rapid Big Data Evolution
  Big Data IT proliferation
  Big Data Security risks
  Centralization won't work in Big Data Space
  Big Data is incredibly time, space, protocol, context dependent!

Big Data Characteristics
  * Mixture of quantitative & qualitative estimates (Dinov et al., 2013)

Availability, Sharing, Aggregation & Services
  Cisco: "By the end of 2012, the number of mobile-connected devices [exceeded] the number of people on Earth. There will be over 10 billion mobile-connected devices in 2016"; i.e., there will be 1.3 mobile devices per capita.
  [Figure: bubble chart by industry sector. Axis labels: Percent Growth; Big Data Value Potential Index. Bubble size ~ relative size of GDP. Sources: U.S. Bureau of Labor Statistics; McKinsey Global Institute.]
  Industry sectors shown: Computer & Electronic Products; Information Services; Manufacturing; Admin, Support & Waste Management; Transportation & Warehousing; Wholesale Trade; Professional Services; Healthcare Providers; Real Estate and Rental; Finance and Insurance; Utilities; Retail Trade; Government; Accommodation & Food; Arts & Entertainment; Corporate Management; Other Services; Construction; Education Services; Natural Resources.

Amateur Scientists vs. Experts
  Democratization of Big Data Science
  Doctorate studies/certification is not mandatory, nor does it guarantee appropriate Big Data expertise
  Lower barriers of entry
  Demand for constant Continuing Education and self-training
  Dichotomy between theoretical and empirical sciences
  Differences between fundamental knowledge and experimental skills (big data properties closely approximate core scientific principles)

Domain-specific vs. Trans-disciplinary knowledge
  [Diagram: Big Data Science shown together with the following groups of disciplines]
  Math/Stats, Physics, Biology, Chemistry, ...
  Medical Sciences, Social Sciences, Environmental Sciences, ...
  Engineering, Computer Science, Bioinformatics, Biomath/Biostats, ...

Commercial vs. Open-source Resourceome
  There is an explosion of open-data-science resources
    www.data.gov
    www.ncbi.nlm.nih.gov/gap
  Spawning of a number of industries and enterprises blending proprietary and open-source data, code, documentation, expert-support, infrastructure and services
    Big Data to Knowledge: www.bd2k.org
    Google Cloud Platform (GCP)
    Amazon Web Services (AWS)


Rapid Big Data Evolution
  Millions of Grass-Roots initiatives addressing Big Data Challenges
  Big Data complexities require truly innovative, collaborative, trans-disciplinary solutions
  Increase of Data complexity
    Sources
    Heterogeneity
    Datum-elements
    Incongruent sampling

Data Scientists vs. Practitioners
  Modelers, Engineers, (Applied) Users
  No one user completely understands the entire pipeline of data provenance, processing protocols, analytic strategies, or results interpretation
  Black-boxes
  Accuracy
  Privacy concerns
  Consistency
  Infrastructure

Big Data Security Risks
  Big Data Fusion provides enormous opportunities and presents significant challenges
  Privacy, security and legal concerns, authenticity, accuracy, consistency, reliability, availability
  Healthcare
    The cloud services enable sharing big data
    Significant security and privacy concerns exist
    Health Insurance Portability and Accountability Act (HIPAA)
    EMR/EHR
    Federal, state and local regulations/policies (IRBMED)
  Genetics
    Viral - Dual-use research of concern (DURC), 10.1126/science.1223995
    de novo synthesis of polio virus, the Australian mousepox experiment, the Penn State aerosolization study

Kryder's law: Exponential Growth of Data
  [Figure: Increase of Imaging Resolution. Resolution axis: 1 µm, 10 µm, 100 µm, 1 mm, 1 cm. Series: Cryo_Byte, Cryo_Short, Cryo_Color.]
  [Figure: Data volume vs. computational power, in 5-year periods from 1985-1989 to 2015-2019 (estimated). Series: Neuroimaging (GB), Genomics_BP (GB), Moore's Law (1000's Trans/CPU).]
  Data volume increases faster than computational power
  Dinov et al., 2013
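The gap between two exponential trends, such as data volume and computational power, can be made concrete with a little arithmetic. The Python sketch below is only an illustration: the function names are hypothetical and the doubling times in the usage lines are placeholder assumptions, not values read from the slide's charts.

```python
import math


def growth_factor(years, doubling_time_years):
    """Multiplicative growth after `years` for a quantity that doubles
    every `doubling_time_years` (simple exponential model)."""
    if doubling_time_years <= 0:
        raise ValueError("doubling time must be positive")
    return 2.0 ** (years / doubling_time_years)


def years_until_gap(target_ratio, fast_doubling, slow_doubling):
    """Years until the faster-growing quantity exceeds the slower one by
    `target_ratio`, assuming both start at the same relative level.

    ratio(t) = 2 ** (t / fast_doubling - t / slow_doubling)
    """
    rate_gap = 1.0 / fast_doubling - 1.0 / slow_doubling
    if rate_gap <= 0:
        raise ValueError("fast_doubling must be shorter than slow_doubling")
    return math.log2(target_ratio) / rate_gap


if __name__ == "__main__":
    # ASSUMPTION: illustrative doubling times only (data: 1 year, compute: 2 years).
    data_doubling, compute_doubling = 1.0, 2.0
    for years in (5, 10):
        ratio = growth_factor(years, data_doubling) / growth_factor(years, compute_doubling)
        print(f"after {years} years, data/compute ratio = {ratio:.1f}")
    print("years until a 1000x gap:", round(years_until_gap(1000, data_doubling, compute_doubling), 1))
```

Under any pair of doubling times where data doubles faster than compute, the ratio itself grows exponentially, which is the point the slide makes.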

Alzheimer's Case Study: Stable-MCI vs. MCI-Converters
  Goals
    Assess the predictive power of combinations of biomarkers and imaging derivative measures to provide reliable predictors of conversion from MCI to Alzheimer's disease
  Data
    MCI converters to AD (24-month period) and stable non-converters; matched for age, gender, handedness, education level
    Imaging (sMRI), Behavioral, Clinical, Neuropsychiatric, Biological data
  Approach
    Qualitative Exploratory Data Analysis and Quantitative Statistical Analysis (morphometric imaging correlates with clinical and genetics markers)
  MCI = Mild Cognitive Impairment (prelude to dementia of Alzheimer's type)

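Alzheimer's Case Study: Stable-MCI vs. MCI-Converters
  Data layout (one column per subject; variables grouped by type)

  Group          Variable                         Subject 1   Subject 2   ...   Subject N
  Subject        Index                            1           2           ...   N
  Demographics   Age                              65          73          ...   64
  Demographics   Kg                               59          93          ...   63
  Demographics   Sex                              F           M           ...   F
  Genetics       APOE A1                          3           3           ...   3
  Genetics       APOE A2                          4           3           ...   3
  Clinical       NPI SCORE                        0           7           ...   3
  Clinical       MMSE                             23          19          ...   29
  Clinical       GD TOTAL                         1           1           ...   6
  Clinical       CDR                              0.5         1           ...   0.5
  Clinical       FAQ TOTAL                        7           8           ...   2
  Neuroimaging   L Gyrus Rectus BL                1695        1333        ...   2237
  Neuroimaging   L Superior Occipital Gyrus BL    3976        6016        ...   6887
  Neuroimaging   R Fusiform Gyrus BL              8363        13290       ...   16109
  Neuroimaging   L Caudate BL                     1296        835         ...   1223
  Neuroimaging   R Caudate BL                     1992        2137        ...   2222
  Neuroimaging   L Putamen BL                     1749        2290        ...   2525
  Neuroimaging   R Putamen BL                     2776        4327        ...   4110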

Alzheimer's Case Study: Stable-MCI vs. MCI-Converters
  Classification Results Using Baseline Data

  Confusion matrix
  Rows: Hierarchical Clustering Prediction Ana (7 Regions)
  Columns: True State (Dx at 24 month follow up)

  Prediction     Converter   Stable   Total
  Converter      TP          FP       TP+FP
  Stable         FN          TN       FN+TN
  Total          TP+FN       FP+TN    N

  Metric                        Value (Top 7 Regions)   Value (Top 20 Regions)
  Sensitivity                   0.81                    1.0
  Specificity                   0.61                    0.87
  Power to detect Converters    0.91                    1.0
  Accuracy                      0.70                    0.93
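Sensitivity, specificity and accuracy follow directly from the four confusion-matrix cells (TP, FP, FN, TN). The Python sketch below uses the standard textbook definitions, which are assumed rather than confirmed to be the ones used in the study. The function name and the counts in the usage line are hypothetical and are not the study's data. "Power to detect Converters" is left out because the slides do not define it.

```python
def classification_metrics(tp, fp, fn, tn):
    """Return sensitivity, specificity and accuracy for a 2x2 confusion matrix.

    tp: predicted Converter, truly Converter
    fp: predicted Converter, truly Stable
    fn: predicted Stable,    truly Converter
    tn: predicted Stable,    truly Stable
    """
    if min(tp, fp, fn, tn) < 0:
        raise ValueError("cell counts must be non-negative")
    true_converters = tp + fn   # column total for Converter
    true_stable = fp + tn       # column total for Stable
    if true_converters == 0 or true_stable == 0:
        raise ValueError("each true class needs at least one subject")
    n = true_converters + true_stable
    return {
        "sensitivity": tp / true_converters,  # TP / (TP + FN)
        "specificity": tn / true_stable,      # TN / (FP + TN)
        "accuracy": (tp + tn) / n,            # (TP + TN) / N
    }


if __name__ == "__main__":
    # HYPOTHETICAL counts, chosen only to exercise the formulas.
    for name, value in classification_metrics(tp=8, fp=3, fn=2, tn=7).items():
        print(f"{name}: {value:.2f}")
```

Because the inputs are the matrix cells, the same function can be applied to results from either region set (top 7 or top 20) once the actual counts are available.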