Big Data in medical image processing




Konstantin Bychenkov, CEO, Aligned Research Group LLC
bychenkov@alignedresearch.com

Big data in medicine
- Genomic research
- Population health
- Images
- M-Health

https://cloud.google.com/genomics/v1beta2/reference/

By 2015, the average hospital will hold 665 TB of patient data, 80% of which will be unstructured image data such as CT scans and X-rays.
Medical imaging archives are increasing by 20-40%.

New possibilities
- Personalized medicine has become a reality
- It is possible to set up a secure, anonymous, large-scale collection of images
- High precision of the new generation of diagnostic devices
- The lack of objectification is a considerable risk

What do we have
https://biometry.nci.nih.gov/cdas/

What do we have

What do we have
https://openfmri.org/dataset/ds000002
http://www.brainmap.org/
http://www.neurosynth.org/
Neurosynth:
- 413429 activations reported in 11406 studies
- Interactive, downloadable meta-analyses of 3107 terms
- Functional connectivity and coactivation maps for over 150,000 brain locations

What can we do with fMRI

Patterns in neuron activity
- Which brain areas are active at which times?
- Which brain areas are activated by different directions of the visual stimulus?
- 100 000 zebrafish neurons in a 200 sec video
- Map the images into lines of pixel intensities
- Reduce the dimension and find patterns in time
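The "map images into lines of pixel intensities" step can be sketched as below. This is a minimal illustration, not the talk's own code: the function name `frames_to_pixel_series` and the nested-list frame layout are assumptions made for the example.

```python
def frames_to_pixel_series(frames):
    """Turn a video into one intensity time series per pixel.

    frames: list of T frames, each a list of H rows holding W intensities
            (hypothetical layout chosen for this sketch).
    Returns a list of H*W rows; row r*W + c is the length-T series of
    pixel (r, c).
    """
    height, width = len(frames[0]), len(frames[0][0])
    series = []
    for r in range(height):
        for c in range(width):
            # Collect the intensity of this pixel across all frames.
            series.append([frame[r][c] for frame in frames])
    return series


# Three tiny 2x2 frames stand in for the video.
frames = [
    [[1, 2], [3, 4]],
    [[5, 6], [7, 8]],
    [[9, 10], [11, 12]],
]
pixel_series = frames_to_pixel_series(frames)
```

Each row of `pixel_series` is then one observation for the dimension-reduction step, which is why the dataset is later described as pixels by seconds.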

Method
- PCA to reduce the dimension and reveal patterns in the data
- Spark to process the data in reasonable time and in different configurations

Simplifies Big Data processing
- Fast
- Scalable
- General

Spark core
- Task management
- Memory management
- Sustainability
- Data source management
- API for data collections (RDD)

Cluster mode overview
A Spark program consists of two programs: a driver program and a worker program. The driver program runs on the driver machine. The worker programs run on cluster nodes or in local threads. RDDs are distributed across the workers.

What is PCA
- Linear regression predicts y from x; the accuracy of the predictions is evaluated by the vertical distances between the points and the line
- PCA reconstructs 2D data via 2D data with a single degree of freedom; the reconstructions are evaluated by Euclidean distances
- The PCA solution finds the direction of maximal variance
- Can be done in parallel mode
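A minimal sketch of why this parallelizes, assuming 2D points split into partitions: each partition only has to report a few sums, and the merged sums give both the regression slope and the direction of maximal variance. The names `partition_stats`, `merge` and `fit` are hypothetical, and plain Python `map`/`reduce` stands in for a distributed engine.

```python
import math
from functools import reduce


def partition_stats(points):
    """'Map' step: sums needed for the mean and covariance of one partition."""
    return (
        len(points),
        sum(x for x, _ in points),
        sum(y for _, y in points),
        sum(x * x for x, _ in points),
        sum(y * y for _, y in points),
        sum(x * y for x, y in points),
    )


def merge(a, b):
    """'Reduce' step: partition sums simply add up."""
    return tuple(u + v for u, v in zip(a, b))


def fit(partitions):
    n, sx, sy, sxx, syy, sxy = reduce(merge, map(partition_stats, partitions))
    mx, my = sx / n, sy / n
    cxx = sxx / n - mx * mx
    cyy = syy / n - my * my
    cxy = sxy / n - mx * my
    # Regression slope: minimizes vertical distances to the line.
    slope = cxy / cxx
    # Principal axis of the 2x2 covariance: direction of maximal variance,
    # which minimizes Euclidean (perpendicular) distances.
    theta = 0.5 * math.atan2(2 * cxy, cxx - cyy)
    return (mx, my), slope, (math.cos(theta), math.sin(theta))


partitions = [[(0, 0), (1, 0)], [(1, 1), (2, 1)]]
mean, slope, direction = fit(partitions)
pca_slope = direction[1] / direction[0]
```

On noisy data the two fitted lines differ, because they minimize different distances; they coincide only when the points lie exactly on a line.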

PCA formulation
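The formulas on this slide are not in the transcript. For orientation, the standard textbook formulation is sketched below; the symbols (X, C, w, Z) are notation chosen here, and the 1/n normalization is an assumption (some texts use 1/(n-1)).

```latex
% X \in \mathbb{R}^{n \times d}: n observations, d features, columns centered
\begin{align}
C &= \frac{1}{n} X^{\top} X
  && \text{(sample covariance)} \\
w_1 &= \arg\max_{\|w\|_2 = 1} \; w^{\top} C \, w
  && \text{(direction of maximal variance)} \\
C \, w_j &= \lambda_j w_j, \quad \lambda_1 \ge \lambda_2 \ge \dots \ge \lambda_d \ge 0
  && \text{(components are eigenvectors of } C\text{)} \\
Z &= X W_k, \quad W_k = [\, w_1, \dots, w_k \,]
  && \text{(} k\text{-dimensional scores)}
\end{align}
```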

PCA and Spark from scratch
- Dataset with n=46460 pixels and d=240 seconds of time series data for each pixel
- We now have a preprocessed dataset

    # Run pca using scaleddata
    componentsscaled, scaledscores, eigenvaluesscaled = pca(scaleddata.values(), 3)
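The `pca` helper called on the slide is not shown in the transcript. A minimal local sketch of what such a helper might do is given below, assuming NumPy and an in-memory array rather than an RDD; the signature and return order mirror the call on the slide, but the body is an illustration, not the author's implementation.

```python
import numpy as np


def pca(data, k=2):
    """Hypothetical stand-in for the slide's pca helper (local, not distributed).

    data: array-like of shape (n, d).
    Returns (d x k components, n x k scores, all d eigenvalues descending).
    """
    X = np.asarray(data, dtype=float)
    X = X - X.mean(axis=0)                  # center every feature
    cov = X.T @ X / X.shape[0]              # d x d sample covariance (1/n)
    eigenvalues, eigenvectors = np.linalg.eigh(cov)  # ascending order
    order = np.argsort(eigenvalues)[::-1]   # re-sort descending
    components = eigenvectors[:, order[:k]]
    scores = X @ components                 # project data onto components
    return components, scores, eigenvalues[order]


# Toy data lying exactly on the line y = 2x.
toy = [[0, 0], [1, 2], [2, 4], [3, 6]]
components, scores, eigenvalues = pca(toy, 1)
```

In a Spark setting the covariance would be accumulated across workers while the eigendecomposition of the small d x d matrix runs on the driver; that split is an assumption about the design, not something stated on the slide.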

[Figures: top principal component; second principal component; top two components as one image]

Dynamic causal modeling
Aim: the aim of dynamic causal modeling (DCM) is to infer the causal architecture of coupled or distributed dynamical systems (how our brain works).
Model: DCM is formulated in terms of ordinary differential equations that model the dynamics of hidden states in the nodes of a probabilistic graphical model, where conditional dependencies are parameterised in terms of directed effective connectivity (a graph of n interacting brain regions).
Choosing the best model: a Bayesian model comparison procedure that rests on comparing models of how the data were generated.
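The slide does not say which DCM variant is meant. As an illustration only, the classic bilinear form often used for fMRI, together with the quantities used in Bayesian model comparison, can be written as follows; all symbols are notation chosen here.

```latex
% z(t) \in \mathbb{R}^{n}: hidden states of the n interacting regions
% u(t) \in \mathbb{R}^{m}: experimental inputs
% A: fixed connectivity, B^{(j)}: modulation by input j, C: direct input effects
\begin{align}
\dot{z}(t) &= \Big( A + \sum_{j=1}^{m} u_j(t) \, B^{(j)} \Big) z(t) + C \, u(t) \\
p(y \mid m) &= \int p(y \mid \theta, m) \, p(\theta \mid m) \, d\theta
  && \text{(evidence of model } m\text{)} \\
\mathrm{BF}_{12} &= \frac{p(y \mid m_1)}{p(y \mid m_2)}
  && \text{(Bayes factor comparing } m_1 \text{ and } m_2\text{)}
\end{align}
```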

CNN
- Previous supervised learning approach: recognition = art of feature creation + learning technique
- Convolutional neural network: feature-based supervised learning without the need for feature creation
- CNN-based recognition = technique of feature extraction from the data set + learning technique
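The feature-extraction half of a CNN can be sketched in a few lines: convolution, a nonlinearity, and pooling. This is a toy illustration with a hand-set kernel; in a real network the kernel values are learned from the training set, which is what removes the manual feature-creation step. The function names are hypothetical.

```python
def conv2d(image, kernel):
    """Valid 2D cross-correlation of an image with a kernel (lists of lists)."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[r + i][c + j] * kernel[i][j]
                for i in range(kh) for j in range(kw))
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]


def relu(fmap):
    """Keep positive responses, zero out the rest."""
    return [[max(0, v) for v in row] for row in fmap]


def max_pool(fmap, size=2):
    """Non-overlapping max pooling; incomplete border windows are dropped."""
    return [
        [
            max(fmap[r + i][c + j] for i in range(size) for j in range(size))
            for c in range(0, len(fmap[0]) - size + 1, size)
        ]
        for r in range(0, len(fmap) - size + 1, size)
    ]


# A 5x5 image with a vertical edge between columns 1 and 2.
image = [[0, 0, 1, 1, 1] for _ in range(5)]
# Hand-set vertical-edge kernel (a trained CNN would learn such weights).
kernel = [[-1, 1], [-1, 1]]
features = max_pool(relu(conv2d(image, kernel)))
```

The pooled feature map responds only where the edge is, and a classifier would then be trained on such maps.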

CNN: feature extraction + learning classifier

Typical and semi-typical apps
- Object and text recognition
- Detection

Combined nets for complex tasks
- Scene segmentation

Deep Nets = Big Data + GPUs
A huge training set is better than a cute net architecture!

Houston, we've got a problem
Huge data is a solution for a lot of questions, but:
1. Why are CNNs a good architecture?
2. How many layers do we need?
3. How many free parameters do we need?
4. What can we do about local minima?
5. What about spatial relations?
6. We have neurons, layers, and the net itself: is it enough?

CNN + CRF = Spatial correlations
Learning a graphical model over the CNN

Implementing memory
Bag-of-words: a simple associative NN-like memory for billions of words.
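The slide gives no detail on how the memory works. One way to picture a bag-of-words associative memory is sketched below: sentences are stored as word counts and recalled by the overlap with a query. The class `BagOfWordsMemory` and its methods are hypothetical, and the example is toy-sized; nothing here is meant to scale to billions of words.

```python
from collections import Counter


def bag_of_words(text):
    """Represent a text as word counts, ignoring word order."""
    return Counter(text.lower().split())


class BagOfWordsMemory:
    """Toy associative memory: recall the stored sentence closest to a query."""

    def __init__(self):
        self.slots = []  # list of (bag, original sentence)

    def write(self, sentence):
        self.slots.append((bag_of_words(sentence), sentence))

    def read(self, query):
        if not self.slots:
            return None
        q = bag_of_words(query)

        def overlap(slot):
            bag, _ = slot
            # Dot product of the two count vectors over the query's words.
            return sum(count * bag[word] for word, count in q.items())

        return max(self.slots, key=overlap)[1]


memory = BagOfWordsMemory()
memory.write("the scan shows a lesion in the left lung")
memory.write("the patient has a normal heart rate")
answer = memory.read("where is the lesion")
```

The dot product plays the role of a single linear layer matching the query against every memory slot, which is the "NN-like" part of the idea.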

Caffe
Key architecture idea: a config file with the network's configuration and training strategy
Pro
- You do not need to write a program in order to start
- GPU out of the box
- You are in good company (Berkeley, MIT, Google)
- Growing fast
Contra
- You need to know how to write a program (if you want a bit more than prototyping)
- Growing too fast (even without backward compatibility!)
- Better for contests than for production