A.I. in health informatics lecture 1 introduction & stuff kevin small & byron wallace

Similar documents
Introduction to Machine Learning Lecture 1. Mehryar Mohri Courant Institute and Google Research

An Introduction to Health Informatics for a Global Information Based Society

Healthcare Measurement Analysis Using Data mining Techniques

Web Mining. Margherita Berardi LACAM. Dipartimento di Informatica Università degli Studi di Bari

Identifying At-Risk Students Using Machine Learning Techniques: A Case Study with IS 100

DATA MINING TECHNOLOGY. Keywords: data mining, data warehouse, knowledge discovery, OLAP, OLAM.

Graduate School of Informatics

FACULTY OF ALLIED HEALTH SCIENCES

BIOINF 585 Fall 2015 Machine Learning for Systems Biology & Clinical Informatics

Putting IBM Watson to Work In Healthcare

Big Data Text Mining and Visualization. Anton Heijs

Big Data, Analytics, Intelligence: Potenziale und Nutzen

A Systemic Artificial Intelligence (AI) Approach to Difficult Text Analytics Tasks

APPENDIX to CAHIIM 2012 Curriculum Requirements Health Informatics Master s Degree

Machine Learning. Chapter 18, 21. Some material adopted from notes by Chuck Dyer

Web Data Mining: A Case Study. Abstract. Introduction

Machine Learning.

Introduction. A. Bellaachia Page: 1

Bing Liu. Web Data Mining. Exploring Hyperlinks, Contents, and Usage Data. With 177 Figures. ~ Spring~r

Exploration and Visualization of Post-Market Data

Hexaware E-book on Predictive Analytics

Graduate Co-op Students Information Manual. Department of Computer Science. Faculty of Science. University of Regina

Health Informatics CS580C1/EL Course Format (On Campus/Blended)

Big Health Data the challenges and connections

Big Data Analytics for Healthcare

CPSC 340: Machine Learning and Data Mining. Mark Schmidt University of British Columbia Fall 2015

Knowledge-based systems and the need for learning

Prediction of Heart Disease Using Naïve Bayes Algorithm

Basic academic skills (1) (2) (4) Specialized knowledge and literacy (3) Ability to continually improve own strengths Problem setting (4) Hypothesis

Syllabus. HMI 7437: Data Warehousing and Data/Text Mining for Healthcare

Index Contents Page No. Introduction . Data Mining & Knowledge Discovery

Master of Science in Artificial Intelligence

Data Mining. 1 Introduction 2 Data Mining methods. Alfred Holl Data Mining 1

Course Description This course will change the way you think about data and its role in business.

Automated Problem List Generation from Electronic Medical Records in IBM Watson

STA 4273H: Statistical Machine Learning

Electronic health records to study population health: opportunities and challenges

Module 223 Major A: Concepts, methods and design in Epidemiology

PREDICTIVE ANALYTICS: PROVIDING NOVEL APPROACHES TO ENHANCE OUTCOMES RESEARCH LEVERAGING BIG AND COMPLEX DATA

MA2823: Foundations of Machine Learning

Data Mining Solutions for the Business Environment

Course 395: Machine Learning

What is Artificial Intelligence?

Machine Learning and Statistics: What s the Connection?

Text Mining for Health Care and Medicine. Sophia Ananiadou Director National Centre for Text Mining

An Overview of Knowledge Discovery Database and Data mining Techniques

Financial Trading System using Combination of Textual and Numerical Data

Office: LSK 5045 Begin subject: [ISOM3360]...

Vanderbilt University Biomedical Informatics Graduate Program (VU-BMIP) Proposal Executive Summary

An Introduction to Data Mining. Big Data World. Related Fields and Disciplines. What is Data Mining? 2/12/2015

Search Engines. Stephen Shaw 18th of February, Netsoc

An Introduction to Data Mining

DATA MINING FOR BUSINESS INTELLIGENCE. Data Mining For Business Intelligence: MIS 382N.9/MKT 382 Professor Maytal Saar-Tsechansky

Introduction to Information and Computer Science: Information Systems

Professor, D.Sc. (Tech.) Eugene Kovshov MSTU «STANKIN», Moscow, Russia

life science data mining

Master of Artificial Intelligence

Data Privacy and Biomedicine Syllabus - Page 1 of 6

Health Informatics for Medical Librarians. Ana D. Cleveland and Donald B. Cleveland. Table of Contents

Government of Russian Federation. Faculty of Computer Science School of Data Analysis and Artificial Intelligence

W. Heath Rushing Adsurgo LLC. Harness the Power of Text Analytics: Unstructured Data Analysis for Healthcare. Session H-1 JTCC: October 23, 2015

Data Mining - Evaluation of Classifiers

An Introduction to Machine Learning and Natural Language Processing Tools

Chapter 11. Managing Knowledge

TDA and Machine Learning: Better Together

Data Mining on Social Networks. Dionysios Sotiropoulos Ph.D.

COURSE PROFILE. Business Intelligence MIS531 Fall

MEDICAL DATA MINING. Timothy Hays, PhD. Health IT Strategy Executive Dynamics Research Corporation (DRC) December 13, 2012

Scalable Machine Learning - or what to do with all that Big Data infrastructure

How To Use Data Mining For Knowledge Management In Technology Enhanced Learning

Lecture: Mon 13:30 14:50 Fri 9:00-10:20 ( LTH, Lift 27-28) Lab: Fri 12:00-12:50 (Rm. 4116)

Getting to Know Big Data

A STUDY OF DATA MINING ACTIVITIES FOR MARKET RESEARCH

: Introduction to Machine Learning Dr. Rita Osadchy

The Prolog Interface to the Unstructured Information Management Architecture

Big Data and Text Mining

Data Mining Analytics for Business Intelligence and Decision Support

A Bayesian Network Model for Diagnosis of Liver Disorders Agnieszka Onisko, M.S., 1,2 Marek J. Druzdzel, Ph.D., 1 and Hanna Wasyluk, M.D.,Ph.D.

DICON: Visual Cluster Analysis in Support of Clinical Decision Intelligence

Towards SoMEST Combining Social Media Monitoring with Event Extraction and Timeline Analysis

Master's projects at ITMO University. Daniil Chivilikhin PhD ITMO University

Uncovering Value in Healthcare Data with Cognitive Analytics. Christine Livingston, Perficient Ken Dugan, IBM

IT services for analyses of various data samples

Collaborative Filtering. Radek Pelánek

II. RELATED WORK. Sentiment Mining

Text Mining in JMP with R Andrew T. Karl, Senior Management Consultant, Adsurgo LLC Heath Rushing, Principal Consultant and Co-Founder, Adsurgo LLC

INTRODUCTION TO MACHINE LEARNING 3RD EDITION

SPATIAL DATA CLASSIFICATION AND DATA MINING

Transcription:

A.I. in health informatics lecture 1 introduction & stuff kevin small & byron wallace

what is this class about? health informatics managing and making sense of biomedical information but mostly from an artificial intelligence/machine learning/nlp view accomplishing the above with learning systems

what is this class about? by way of example

can search queries predict flu outbreaks? model probability of flu, given search terms. [Ginsberg et al., Nature, 09]

Google flu trends (movie time).

computer-aided diagnosis

clinical decision support for $200 IBM s Watson is moving into the area of clinical decision support long history of AI in this area aim: assist physicians naturally, exploiting huge database of stored knowledge uses natural language processing, machine learning methods

medical question answering

detection of cardiovascular events can we detect cardiac events?

medical informatics

a (very) little history 1920s Hollerith punch cards for public health surveys / epidemiological studies 1950s Data processing for billing 1960s Clinical Support Systems 1970s Hospital Information Systems 1980s Management Information Systems, Computer Diagnostic Imaging 1990s Unified Health Records, Clinical Decision Support Systems

rise of medical informatics increased reliance on evidence-based practice guidelines too much information not enough time to analyze uncertainty abounds lots of patients / patient-centered movement

a brief illustrative task: abstract screening or, a shameless instance of rampant self-promotion, or, our day job

abstract screening Systematic review: an exhaustive assessment of existing published evidence regarding a precise clinical question Review Specification Search (PubMed) Abstract Screening Data Extraction and Synthesis Goal is to have doctors screen a small number of abstracts (e.g. 100s) and have a classifier do the remainder automatically

is a lot of work

predictive models World Knowledge Hypothesis

machine learning World Knowledge - Learning algorithm - Feature Space Specification - Model Selection - Tunable Parameters - Et Cetera

machine learning Hypothesis

abstract screening, redux need to derive a suitable representation for the input data (text) need to select an appropriate learning algorithm

bag-of-words representation classification algorithms operate on vectors feature space: an n-dimensional representation of things - but how to vectorize text? bag-of-words: map documents to indicator vectors

a bag-of-words example let s say we want to encode two sentences S 1 = Boston drivers are frequently aggressive S 2 = The Boston Red Sox frequently hit line drives

eliminate stopwords S 1 = Boston drivers are frequently aggressive S 2 = The Boston Red Sox frequently hit line drives

remove case information S 1 = boston drivers are frequently aggressive S 2 = The boston red sox frequently hit line drives

stemming S 1 = boston drivers are frequently aggressive S 2 = The boston red sox frequently hit line drives

feature vectors hit red sox line boston frequent drive aggressive x 1 = 0 0 0 0 1 1 1 1 x 2 = 1 1 1 1 1 1 1 0 a new sentence, S 3, comes along it reads: I hate the red sox. to which sentence is it most similar? x 3 = 0 1 1 0 0 0 0 0

support vector machines min w, ε 1 w T w 2 inversely related to margin between support vectors l + C ε i i=1 cost of misclassifications

computer-aided diagnosis

pipeline model decomposes complex task into sequential stages of simpler tasks Preprocessing Region of Interest Detection Segmentation Classification Hypothesis drawbacks?

inference actionable intelligence may require multiple classifiers and domain knowledge important for structured information how do we effectively assemble this information? how do we get system users to trust the results?

unique issues low prevalence, asymmetric loss value of engineering tons of available data analytic frameworks & formal reasoning systems already exist

course goals, expectations & logistics

what are our goals? a survey course on the application of ai and ml to health informatics a competence level of such that you will understand research papers and implement ideas ideally at a level at which you can conduct your own research this is *not* a bioinformatics course

useful textbook

expectations & logistics read class material before class ask questions grading 25% homework (4-5 written/programming) 10% reaction papers (6-8 one page) 25% midterm 40% final project (collaborative, per approval)

coordinates http://www.cs.tufts.edu/comp/150aih/ ksmall@cs.tufts.edu byron.wallace@gmail.com