Machine Learning/Data Mining for Cancer Genomics




Bernard Manderick, Vrije Universiteit Brussel
Henry Nyongesa, University of the Western Cape
Collaboration:
Artificial Intelligence Laboratory, VUB
Intelligent Systems Laboratory, UWC
South African National Bioinformatics Institute (SANBI)
Interuniversity Institute of Bioinformatics in Brussels, (IB)²

Project Outline
Machine Learning (ML) is a rapidly growing field of research, both in terms of new techniques and of applications. The most exciting aspect of ML is the No Free Lunch principle, which states that no single ML algorithm is optimal on all types of problems; hence one cannot know a priori whether any one technique is the most suitable for a particular problem. In this research we focus on using ML to mine large genomic data sets in order to classify human tumours.
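
As a loose illustration of the No Free Lunch idea (not a proof of the theorem), the sketch below compares two simple classifiers on two synthetic problems. The classifiers (nearest centroid and 1-nearest-neighbour), the two toy problems, and all function names are assumptions chosen for this example; none of them come from the project itself. The intent is only to show that which method does better can depend on the problem.

```python
import math
import random

def nearest_centroid(train, test_x):
    """Predict the label whose class mean is closest to each test point."""
    sums, counts = {}, {}
    for x, y in train:
        acc = sums.setdefault(y, [0.0] * len(x))
        for i, v in enumerate(x):
            acc[i] += v
        counts[y] = counts.get(y, 0) + 1
    centroids = {y: [s / counts[y] for s in acc] for y, acc in sums.items()}
    return [min(centroids, key=lambda y: math.dist(x, centroids[y])) for x in test_x]

def one_nn(train, test_x):
    """Predict the label of the single closest training point."""
    return [min(train, key=lambda pair: math.dist(x, pair[0]))[1] for x in test_x]

def accuracy(predict, train, test):
    """Fraction of test points whose predicted label matches the true label."""
    preds = predict(train, [x for x, _ in test])
    return sum(p == y for p, (_, y) in zip(preds, test)) / len(test)

def xor_problem(n, rng):
    """Hypothetical problem: the label is the sign of x*y, so class means nearly coincide."""
    data = []
    for _ in range(n):
        x, y = rng.uniform(-1, 1), rng.uniform(-1, 1)
        data.append(([x, y], int(x * y > 0)))
    return data

def noisy_problem(n, rng, n_noise=50):
    """Hypothetical problem: one informative feature plus many irrelevant ones."""
    data = []
    for _ in range(n):
        label = rng.randint(0, 1)
        informative = rng.gauss(1.0 if label else -1.0, 1.0)
        data.append(([informative] + [rng.gauss(0, 1) for _ in range(n_noise)], label))
    return data

rng = random.Random(0)
results = {}
for name, make in [("xor", xor_problem), ("noisy", noisy_problem)]:
    train, test = make(200, rng), make(400, rng)
    results[name] = {
        "centroid": accuracy(nearest_centroid, train, test),
        "1-nn": accuracy(one_nn, train, test),
    }
    print(name, results[name])
```

On the XOR-style problem the nearest-neighbour rule should do much better, while on the problem dominated by irrelevant features the nearest-centroid rule should come out ahead, so neither would be a safe default without trying both.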

Big Data and the Curse of Dimensionality
The number of different types of datasets in the public domain continues to grow exponentially. Academic computing research is currently addressing so-called "big data" solutions to make sense of the vast datasets coming out of research in other disciplines, including genomics and bioinformatics. Such data sets are large-scale and highly multi-dimensional, and are not easily amenable to traditional data analysis tools. ML techniques are suited to automated knowledge discovery from large, complex data sets.
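
One well-known face of the curse of dimensionality is that distances between random points become nearly indistinguishable as the number of dimensions grows, which weakens distance-based analysis tools. The sketch below is a minimal illustration of this effect on uniformly random points; the function name `distance_contrast` and the chosen sizes are assumptions made for this example, and real genomic data will not be uniformly distributed.

```python
import math
import random

def distance_contrast(n_points, n_dims, seed=0):
    """Relative spread (max - min) / min of distances from one random query point."""
    rng = random.Random(seed)
    query = [rng.random() for _ in range(n_dims)]
    dists = []
    for _ in range(n_points):
        point = [rng.random() for _ in range(n_dims)]
        dists.append(math.dist(query, point))
    return (max(dists) - min(dists)) / min(dists)

# Low-dimensional data: nearest and farthest points differ greatly.
contrast_low = distance_contrast(200, 2)
# High-dimensional data (e.g. thousands of genes): distances concentrate.
contrast_high = distance_contrast(200, 2000)

print(f"contrast in 2 dimensions:    {contrast_low:.2f}")
print(f"contrast in 2000 dimensions: {contrast_high:.2f}")
```

A small contrast means the nearest and farthest neighbours are almost equally far away, which is one reason feature selection matters for data sets with thousands of measured genes.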

Data Mining and Cancer Genomics
Data mining addresses the problem of discovering patterns, regularities and structure within data collections. For this reason, the field is also referred to as knowledge discovery from databases (KDD). Such discovered knowledge can then be applied to make predictions on similar datasets, suggest explanations of dependencies between variables, or generally improve decision making. Cancer is becoming increasingly common in African populations. Gene expression profiling can be used to distinguish between known cancer sub-types, and to discover new types that may have remained unknown to pathologists.
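
To make the idea of discovering sub-types from expression profiles concrete, the following is a minimal sketch using k-means clustering on a tiny, made-up data set. The expression values, the choice of k-means, and the farthest-point initialisation are all assumptions for illustration; the slides do not say which clustering method the project uses.

```python
import math

def kmeans(profiles, k, n_iter=20):
    """Cluster expression profiles with Lloyd's algorithm; returns one label per profile."""
    # Farthest-point initialisation: deterministic and spreads the centroids out.
    centroids = [profiles[0]]
    while len(centroids) < k:
        centroids.append(
            max(profiles, key=lambda p: min(math.dist(p, c) for c in centroids))
        )
    for _ in range(n_iter):
        # Assignment step: each profile joins its closest centroid.
        labels = [min(range(k), key=lambda j: math.dist(p, centroids[j])) for p in profiles]
        # Update step: each centroid moves to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(profiles, labels) if lab == j]
            if members:
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels

# Hypothetical expression levels: six tumour samples (rows) by four genes (columns).
profiles = [
    [8.1, 7.9, 1.2, 0.9],
    [7.8, 8.3, 1.0, 1.1],
    [8.4, 8.0, 0.8, 1.3],
    [1.1, 0.9, 7.7, 8.2],
    [0.8, 1.2, 8.1, 7.9],
    [1.3, 1.0, 8.0, 8.4],
]

labels = kmeans(profiles, k=2)
print(labels)  # [0, 0, 0, 1, 1, 1]
```

The two groups are found without any labels being supplied, which is the sense in which clustering could surface sub-types not yet named by pathologists; real data would need normalisation, feature selection and a principled choice of k.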

Research Collaboration between VUB and UWC
Capacity building in competences for advanced research and scholarship: VUB investigators will offer support, expertise and training to UWC staff and students.
Collaborative research into novel machine learning and data mining techniques: next-generation data mining techniques will require new machine learning algorithms and new methods for information storage and retrieval, feature selection and selection optimization, and optimization of decision making.
International cooperation through staff and student exchange visits: research and educational material developed by either group will be made available and accessible to the collaborating groups.

Long-Term Objectives
Establishment of a Centre for Machine Learning and Data Mining Applications at UWC.
Human capacity development in the field of machine learning and data mining.
Recognition of South African science and technology through international cooperation and collaboration.
Dissemination of South African research output in international workshops and conferences.

Significance of Research
This research aims to answer basic questions in cancer genomics research:
What is the optimal and most relevant representation of genomic data?
How can a patient be diagnosed based on a gene expression profile and on the knowledge gained from previously assayed patients?
How can genome-wide analytic tools best be integrated with the large and rapidly increasing number of genome-wide datasets?

Mode of Collaboration
Collaborate through staff and student exchange visits and co-supervision of research students.
Participate in joint authorship, publication and dissemination of research papers.
Host a research conference/workshop at UWC in each year of the project, jointly organised by the partners, to which identified international experts and other scientists in the field will be invited.
Make project decisions jointly in a democratic fashion, and with the maximum amount of information available, after discussions at face-to-face meetings or by email.
Develop and establish an online collaboration platform within the project to document data and code development, and other information related to the project.

Work Packages and Timelines
WP1: Use Distributed/High Performance Computing for large-scale genome analysis.
WP2: Visits to enhance collaboration between partners.
WP3: Africa Workshops on Artificial Intelligence, Machine Learning, Data Mining and Bioinformatics.
Establishment of UWC Centre for Artificial Intelligence and Data Mining.

Collaboration
Professor Alan Christoffels, Director, SANBI, UWC
Professor Ann Nowe, VUB/(IB)² (Machine Learning/Bioinformatics)
Professor Tom Lenaerts, VUB/(IB)² (Bioinformatics)