Microarray Data Analysis. Prof. Anastasios Bezerianos Andrei Dragomir Ioannis Maraziotis

Size: px
Start display at page:

Download "Microarray Data Analysis. Prof. Anastasios Bezerianos Andrei Dragomir Ioannis Maraziotis"

Transcription

1 Microarray Data Analysis Prof. Anastasios Bezerianos Andrei Dragomir Ioannis Maraziotis

2 Process overview 1- sample 2- reference sample excitation scanning laser 1 laser 2 Cy3: ~550 nm Cy5: ~650 nm cdna probes cdna cy5 cdna cy3 emission printing Hybridization combination normalization of images Microarray Image analysis

3 Image analysis -Software needs to extract red and green intensities from each spot. -Also estimates background intensity in each wavelength band. - Easy for some spots, harder for others.

4 Image Analysis - Usually the company producing the microarrays offer image analysis software suits. - Some preprocessing and normalization are performed in order to make data acquisition reliable. - Normalization = transformations applied to data sets from different experiments to make them comparable, in terms of equal intensity for spots.

5 From array scans to data matrix

6 Gene expression data matrices Experiments (several to 100s) T1 T2 T3 T4 T5 T6 Genes (1000s of rows) At At At At At At At At At Cells hold expression values in arbitrary units

7 Microarray Data Analysis Level 1 Which genes are expressed/suppressed? Biology Level 2 Which genes are co-expressed/suppressed? Discovery of gene function Level 3 How are genes regulated? Discovery of Gene Regulation Network

8 Example of an expression profile for each gene : [mrna] ~ Cy5/Cy3 = r 5 _ 1 _ up-regulation induction down-regulation repression 0 Start of experiment time / h

9 Expression profiles Expression level Five experiments e.g. Red and black genes show similar expression profiles Blue gene is different T1 T2 T3 T4 T5 Gene Gene Gene Experiment

10 First level analysis Question : - is gene i significantly different between states A and B? - significant means statistical significance. - define sets of statistical parameters from experiment data: mean, variances, standard deviation data distributions Fig. Shows changes in expression greater than 2 and 10 fold

11 First level analysis - Answers the question : when, where and to what extent are genes expressed. - Techniques employed : 2-fold rule, statistical tests

12 Second level analysis More complicated questions : - which genes show related expression over a set of experiments? Co-regulation inference of function (genes belonging to same pathway exhibit same regulatory patterns) - which genes are differentially expressed? (e.g. disease / healthy tissue)

13 Similarity: blue e 2 - Distance between profiles of red and blue genes = d(a,b) red e 2 - most common: Euclidian distance but other measures can be used also (correlation, Manhattan distance etc) d( red, blue) = i red ( e e i blue i 2 )

14 Similarity? Large correlation Small correlation Large Correlation

15 Clustering Know-how to measure similarity between profiles Clustering is the process of gathering together similar profiles into clusters Each cluster is a set of genes with related expression patterns in the experiments concerned - Genes may be imagined as points in an n dim space E x p e r i m e n t 3 Experiment 2 Experiment 1

16 Clustering - supervised methods methods make use of a priori knowledge on data while unsupervised methods don t, they are exploratory techniques. Unsupervised clustering Supervised classification

17 Unsupervised Clustering - K means clustering 1. Assign genes randomly to K clusters and compute cluster centroids 2. Evaluate sum of squared distances to cluster centres 3. While (not finished) 4. Make a random change to the cluster assignments 5. Evaluate sum of squared distances to new cluster centres 6. If clustering improved then accept, else reject 7. End while

18 K-means clustering

19 Hierarchical clustering

20 Hierarchical clustering 1. Assign each gene to its own cluster (N genes gives N clusters). 2. Find the two clusters with the most similar expression profiles and cluster them together (reduces the number of clusters by 1). 3. Calculate distances between the new cluster and all existing clusters. 4. Repeat 2 and 3 until there is only one cluster containing all the genes.

21 Hierarchical clustering Variants: Single linkage distance between two clusters is the shortest possible pair-wise distance Complete linkage distance between two clusters is the longest possible pair-wise distance Average linkage (not shown) distance is the average pair-wise distance

22 Self Organizing Map - Well known unsupervised neural network algorithm. - used for analysis and organization of large data sets. - it offers improved visualizations by performing a dimensional reduction projecting data from the n dim input data space into a 2D node grid. - nodes represent clusters, node weights initialized with values from data set. - for each gene pattern the closest node is computed; gene assigned to that node. - update the node weights, perform the whole process for several weight until algorithm performance is satisfactory.

23 Self Organizing Map

24 Supervised data analysis - makes use of available information on genes, such as labels describing gene functions etc. - used in classification, diagnosis or prognosis. - Goal obtain a set of gene patterns on basis of which : - Make diagnosis - Predict future outcome - Predict future response to pharmacological intervention - Categorize tissue as part of a class (e.g malignant or healthy tissue)

25 Supervised Techniques - typically classifiers trained on a subset of data with a priori given classification and tested on other subsets with known classification and then applied to data of unknown class. (see J.Khan paper Nature 7, 2001 p ) - Support vector machines (SVM) (Brown paper PNAS 97, 2000 p ) - SVMs find a linear classifier for data that are impossible to separate by a hyper plane by mapping data into a higher dimension data space. - over-fitting is avoided by defining a maximum margin hyper plane, typically represented by a linear combination of training points.

26 Support Vector Machines - employ distance functions that operate in high dimensional space. - make advantage of prior knowledge in making distinction between one type of gene and another.

27 - Gene regulatory networks : Third level analysis εδοµένα Γονιδιακής Έκφρασης Οµαδοποίηση Κοινής Έκφρασης Εξαγωγή συνρύθµισης από συνέκφραση Βιολογική Γνώση Ανακάλυψη Λειτουργίας γονιδίων -priori ιολογική νώση εδοµένα αλληλούχησης DNA Εκτίµηση των DNA σηµείων δέσµευσης µε επιβλεπόµενη µάθηση Βιολογική Γνώση Ανακάλυψη Σηµάτων Ελέγχου Αρχική Σχεδίαση του ΓΡ Εκπαίδευση του ΓΡ από δεδοµένα έκφρασης Βιολογική Γνώση Ανακάλυψη ΓΡ

28 Genetic Networks - genetic network = a sophisticated regulatory web of cause and effect which describes the way genes interact with one another. - at different stages of development (or response to extra-cellular perturbation), different genes of a cell are switched on at different intensity levels. - parameters that characterize the genetic networks : -intracellular (mrna concentration of another gene) -intercellular (hormones secreted by an organ gives a signal to other organ) -extracellular (environmental conditions e.g. heat shock)

Comparison of Non-linear Dimensionality Reduction Techniques for Classification with Gene Expression Microarray Data

Comparison of Non-linear Dimensionality Reduction Techniques for Classification with Gene Expression Microarray Data CMPE 59H Comparison of Non-linear Dimensionality Reduction Techniques for Classification with Gene Expression Microarray Data Term Project Report Fatma Güney, Kübra Kalkan 1/15/2013 Keywords: Non-linear

More information

Final Project Report

Final Project Report CPSC545 by Introduction to Data Mining Prof. Martin Schultz & Prof. Mark Gerstein Student Name: Yu Kor Hugo Lam Student ID : 904907866 Due Date : May 7, 2007 Introduction Final Project Report Pseudogenes

More information

UNSUPERVISED MACHINE LEARNING TECHNIQUES IN GENOMICS

UNSUPERVISED MACHINE LEARNING TECHNIQUES IN GENOMICS UNSUPERVISED MACHINE LEARNING TECHNIQUES IN GENOMICS Dwijesh C. Mishra I.A.S.R.I., Library Avenue, New Delhi-110 012 dcmishra@iasri.res.in What is Learning? "Learning denotes changes in a system that enable

More information

ARTIFICIAL INTELLIGENCE (CSCU9YE) LECTURE 6: MACHINE LEARNING 2: UNSUPERVISED LEARNING (CLUSTERING)

ARTIFICIAL INTELLIGENCE (CSCU9YE) LECTURE 6: MACHINE LEARNING 2: UNSUPERVISED LEARNING (CLUSTERING) ARTIFICIAL INTELLIGENCE (CSCU9YE) LECTURE 6: MACHINE LEARNING 2: UNSUPERVISED LEARNING (CLUSTERING) Gabriela Ochoa http://www.cs.stir.ac.uk/~goc/ OUTLINE Preliminaries Classification and Clustering Applications

More information

Social Media Mining. Data Mining Essentials

Social Media Mining. Data Mining Essentials Introduction Data production rate has been increased dramatically (Big Data) and we are able store much more data than before E.g., purchase data, social media data, mobile phone data Businesses and customers

More information

How To Cluster

How To Cluster Data Clustering Dec 2nd, 2013 Kyrylo Bessonov Talk outline Introduction to clustering Types of clustering Supervised Unsupervised Similarity measures Main clustering algorithms k-means Hierarchical Main

More information

Facebook Friend Suggestion Eytan Daniyalzade and Tim Lipus

Facebook Friend Suggestion Eytan Daniyalzade and Tim Lipus Facebook Friend Suggestion Eytan Daniyalzade and Tim Lipus 1. Introduction Facebook is a social networking website with an open platform that enables developers to extract and utilize user information

More information

Data Mining Part 5. Prediction

Data Mining Part 5. Prediction Data Mining Part 5. Prediction 5.1 Spring 2010 Instructor: Dr. Masoud Yaghini Outline Classification vs. Numeric Prediction Prediction Process Data Preparation Comparing Prediction Methods References Classification

More information

Unsupervised and supervised dimension reduction: Algorithms and connections

Unsupervised and supervised dimension reduction: Algorithms and connections Unsupervised and supervised dimension reduction: Algorithms and connections Jieping Ye Department of Computer Science and Engineering Evolutionary Functional Genomics Center The Biodesign Institute Arizona

More information

Using Data Mining for Mobile Communication Clustering and Characterization

Using Data Mining for Mobile Communication Clustering and Characterization Using Data Mining for Mobile Communication Clustering and Characterization A. Bascacov *, C. Cernazanu ** and M. Marcu ** * Lasting Software, Timisoara, Romania ** Politehnica University of Timisoara/Computer

More information

Analysis of gene expression data. Ulf Leser and Philippe Thomas

Analysis of gene expression data. Ulf Leser and Philippe Thomas Analysis of gene expression data Ulf Leser and Philippe Thomas This Lecture Protein synthesis Microarray Idea Technologies Applications Problems Quality control Normalization Analysis next week! Ulf Leser:

More information

A Partially Supervised Metric Multidimensional Scaling Algorithm for Textual Data Visualization

A Partially Supervised Metric Multidimensional Scaling Algorithm for Textual Data Visualization A Partially Supervised Metric Multidimensional Scaling Algorithm for Textual Data Visualization Ángela Blanco Universidad Pontificia de Salamanca ablancogo@upsa.es Spain Manuel Martín-Merino Universidad

More information

Data, Measurements, Features

Data, Measurements, Features Data, Measurements, Features Middle East Technical University Dep. of Computer Engineering 2009 compiled by V. Atalay What do you think of when someone says Data? We might abstract the idea that data are

More information

Data Mining Cluster Analysis: Basic Concepts and Algorithms. Lecture Notes for Chapter 8. Introduction to Data Mining

Data Mining Cluster Analysis: Basic Concepts and Algorithms. Lecture Notes for Chapter 8. Introduction to Data Mining Data Mining Cluster Analysis: Basic Concepts and Algorithms Lecture Notes for Chapter 8 Introduction to Data Mining by Tan, Steinbach, Kumar Tan,Steinbach, Kumar Introduction to Data Mining 4/8/2004 Hierarchical

More information

Machine Learning using MapReduce

Machine Learning using MapReduce Machine Learning using MapReduce What is Machine Learning Machine learning is a subfield of artificial intelligence concerned with techniques that allow computers to improve their outputs based on previous

More information

Modelling, Extraction and Description of Intrinsic Cues of High Resolution Satellite Images: Independent Component Analysis based approaches

Modelling, Extraction and Description of Intrinsic Cues of High Resolution Satellite Images: Independent Component Analysis based approaches Modelling, Extraction and Description of Intrinsic Cues of High Resolution Satellite Images: Independent Component Analysis based approaches PhD Thesis by Payam Birjandi Director: Prof. Mihai Datcu Problematic

More information

Data Mining and Knowledge Discovery in Databases (KDD) State of the Art. Prof. Dr. T. Nouri Computer Science Department FHNW Switzerland

Data Mining and Knowledge Discovery in Databases (KDD) State of the Art. Prof. Dr. T. Nouri Computer Science Department FHNW Switzerland Data Mining and Knowledge Discovery in Databases (KDD) State of the Art Prof. Dr. T. Nouri Computer Science Department FHNW Switzerland 1 Conference overview 1. Overview of KDD and data mining 2. Data

More information

Learning Example. Machine learning and our focus. Another Example. An example: data (loan application) The data and the goal

Learning Example. Machine learning and our focus. Another Example. An example: data (loan application) The data and the goal Learning Example Chapter 18: Learning from Examples 22c:145 An emergency room in a hospital measures 17 variables (e.g., blood pressure, age, etc) of newly admitted patients. A decision is needed: whether

More information

Supervised and unsupervised learning - 1

Supervised and unsupervised learning - 1 Chapter 3 Supervised and unsupervised learning - 1 3.1 Introduction The science of learning plays a key role in the field of statistics, data mining, artificial intelligence, intersecting with areas in

More information

International Journal of Computer Science Trends and Technology (IJCST) Volume 2 Issue 3, May-Jun 2014

International Journal of Computer Science Trends and Technology (IJCST) Volume 2 Issue 3, May-Jun 2014 RESEARCH ARTICLE OPEN ACCESS A Survey of Data Mining: Concepts with Applications and its Future Scope Dr. Zubair Khan 1, Ashish Kumar 2, Sunny Kumar 3 M.Tech Research Scholar 2. Department of Computer

More information

Environmental Remote Sensing GEOG 2021

Environmental Remote Sensing GEOG 2021 Environmental Remote Sensing GEOG 2021 Lecture 4 Image classification 2 Purpose categorising data data abstraction / simplification data interpretation mapping for land cover mapping use land cover class

More information

Feature Selection using Integer and Binary coded Genetic Algorithm to improve the performance of SVM Classifier

Feature Selection using Integer and Binary coded Genetic Algorithm to improve the performance of SVM Classifier Feature Selection using Integer and Binary coded Genetic Algorithm to improve the performance of SVM Classifier D.Nithya a, *, V.Suganya b,1, R.Saranya Irudaya Mary c,1 Abstract - This paper presents,

More information

SPECIAL PERTURBATIONS UNCORRELATED TRACK PROCESSING

SPECIAL PERTURBATIONS UNCORRELATED TRACK PROCESSING AAS 07-228 SPECIAL PERTURBATIONS UNCORRELATED TRACK PROCESSING INTRODUCTION James G. Miller * Two historical uncorrelated track (UCT) processing approaches have been employed using general perturbations

More information

Exploratory data analysis for microarray data

Exploratory data analysis for microarray data Eploratory data analysis for microarray data Anja von Heydebreck Ma Planck Institute for Molecular Genetics, Dept. Computational Molecular Biology, Berlin, Germany heydebre@molgen.mpg.de Visualization

More information

K-Means Clustering Tutorial

K-Means Clustering Tutorial K-Means Clustering Tutorial By Kardi Teknomo,PhD Preferable reference for this tutorial is Teknomo, Kardi. K-Means Clustering Tutorials. http:\\people.revoledu.com\kardi\ tutorial\kmean\ Last Update: July

More information

Hierarchical Clustering Analysis

Hierarchical Clustering Analysis Hierarchical Clustering Analysis What is Hierarchical Clustering? Hierarchical clustering is used to group similar objects into clusters. In the beginning, each row and/or column is considered a cluster.

More information

Big Data: Rethinking Text Visualization

Big Data: Rethinking Text Visualization Big Data: Rethinking Text Visualization Dr. Anton Heijs anton.heijs@treparel.com Treparel April 8, 2013 Abstract In this white paper we discuss text visualization approaches and how these are important

More information

Classification algorithm in Data mining: An Overview

Classification algorithm in Data mining: An Overview Classification algorithm in Data mining: An Overview S.Neelamegam #1, Dr.E.Ramaraj *2 #1 M.phil Scholar, Department of Computer Science and Engineering, Alagappa University, Karaikudi. *2 Professor, Department

More information

Molecular Genetics: Challenges for Statistical Practice. J.K. Lindsey

Molecular Genetics: Challenges for Statistical Practice. J.K. Lindsey Molecular Genetics: Challenges for Statistical Practice J.K. Lindsey 1. What is a Microarray? 2. Design Questions 3. Modelling Questions 4. Longitudinal Data 5. Conclusions 1. What is a microarray? A microarray

More information

Clustering & Visualization

Clustering & Visualization Chapter 5 Clustering & Visualization Clustering in high-dimensional databases is an important problem and there are a number of different clustering paradigms which are applicable to high-dimensional data.

More information

Microarray Technology

Microarray Technology Microarrays And Functional Genomics CPSC265 Matt Hudson Microarray Technology Relatively young technology Usually used like a Northern blot can determine the amount of mrna for a particular gene Except

More information

Data Mining Cluster Analysis: Basic Concepts and Algorithms. Clustering Algorithms. Lecture Notes for Chapter 8. Introduction to Data Mining

Data Mining Cluster Analysis: Basic Concepts and Algorithms. Clustering Algorithms. Lecture Notes for Chapter 8. Introduction to Data Mining Data Mining Cluster Analsis: Basic Concepts and Algorithms Lecture Notes for Chapter 8 Introduction to Data Mining b Tan, Steinbach, Kumar Clustering Algorithms K-means and its variants Hierarchical clustering

More information

Visualization of textual data: unfolding the Kohonen maps.

Visualization of textual data: unfolding the Kohonen maps. Visualization of textual data: unfolding the Kohonen maps. CNRS - GET - ENST 46 rue Barrault, 75013, Paris, France (e-mail: ludovic.lebart@enst.fr) Ludovic Lebart Abstract. The Kohonen self organizing

More information

Introduction to Pattern Recognition

Introduction to Pattern Recognition Introduction to Pattern Recognition Selim Aksoy Department of Computer Engineering Bilkent University saksoy@cs.bilkent.edu.tr CS 551, Spring 2009 CS 551, Spring 2009 c 2009, Selim Aksoy (Bilkent University)

More information

Multivariate Analysis of Ecological Data

Multivariate Analysis of Ecological Data Multivariate Analysis of Ecological Data MICHAEL GREENACRE Professor of Statistics at the Pompeu Fabra University in Barcelona, Spain RAUL PRIMICERIO Associate Professor of Ecology, Evolutionary Biology

More information

Supervised Feature Selection & Unsupervised Dimensionality Reduction

Supervised Feature Selection & Unsupervised Dimensionality Reduction Supervised Feature Selection & Unsupervised Dimensionality Reduction Feature Subset Selection Supervised: class labels are given Select a subset of the problem features Why? Redundant features much or

More information

Gene expression analysis. Ulf Leser and Karin Zimmermann

Gene expression analysis. Ulf Leser and Karin Zimmermann Gene expression analysis Ulf Leser and Karin Zimmermann Ulf Leser: Bioinformatics, Wintersemester 2010/2011 1 Last lecture What are microarrays? - Biomolecular devices measuring the transcriptome of a

More information

Foundations of Artificial Intelligence. Introduction to Data Mining

Foundations of Artificial Intelligence. Introduction to Data Mining Foundations of Artificial Intelligence Introduction to Data Mining Objectives Data Mining Introduce a range of data mining techniques used in AI systems including : Neural networks Decision trees Present

More information

Clustering Connectionist and Statistical Language Processing

Clustering Connectionist and Statistical Language Processing Clustering Connectionist and Statistical Language Processing Frank Keller keller@coli.uni-sb.de Computerlinguistik Universität des Saarlandes Clustering p.1/21 Overview clustering vs. classification supervised

More information

DATA MINING CLUSTER ANALYSIS: BASIC CONCEPTS

DATA MINING CLUSTER ANALYSIS: BASIC CONCEPTS DATA MINING CLUSTER ANALYSIS: BASIC CONCEPTS 1 AND ALGORITHMS Chiara Renso KDD-LAB ISTI- CNR, Pisa, Italy WHAT IS CLUSTER ANALYSIS? Finding groups of objects such that the objects in a group will be similar

More information

Introduction to Data Mining and Machine Learning Techniques. Iza Moise, Evangelos Pournaras, Dirk Helbing

Introduction to Data Mining and Machine Learning Techniques. Iza Moise, Evangelos Pournaras, Dirk Helbing Introduction to Data Mining and Machine Learning Techniques Iza Moise, Evangelos Pournaras, Dirk Helbing Iza Moise, Evangelos Pournaras, Dirk Helbing 1 Overview Main principles of data mining Definition

More information

DeCyder Extended Data Analysis (EDA) Software

DeCyder Extended Data Analysis (EDA) Software Part of GE Healthcare Data File 28-4015-41 AA DeCyder Extended Data Analysis (EDA) Software DeCyder EDA DeCyder Extended Data Analysis Software (DeCyder EDA) is high-performance informatics software for

More information

Neural Networks Lesson 5 - Cluster Analysis

Neural Networks Lesson 5 - Cluster Analysis Neural Networks Lesson 5 - Cluster Analysis Prof. Michele Scarpiniti INFOCOM Dpt. - Sapienza University of Rome http://ispac.ing.uniroma1.it/scarpiniti/index.htm michele.scarpiniti@uniroma1.it Rome, 29

More information

Performance Metrics for Graph Mining Tasks

Performance Metrics for Graph Mining Tasks Performance Metrics for Graph Mining Tasks 1 Outline Introduction to Performance Metrics Supervised Learning Performance Metrics Unsupervised Learning Performance Metrics Optimizing Metrics Statistical

More information

A Systemic Artificial Intelligence (AI) Approach to Difficult Text Analytics Tasks

A Systemic Artificial Intelligence (AI) Approach to Difficult Text Analytics Tasks A Systemic Artificial Intelligence (AI) Approach to Difficult Text Analytics Tasks Text Analytics World, Boston, 2013 Lars Hard, CTO Agenda Difficult text analytics tasks Feature extraction Bio-inspired

More information

Clustering. Danilo Croce Web Mining & Retrieval a.a. 2015/201 16/03/2016

Clustering. Danilo Croce Web Mining & Retrieval a.a. 2015/201 16/03/2016 Clustering Danilo Croce Web Mining & Retrieval a.a. 2015/201 16/03/2016 1 Supervised learning vs. unsupervised learning Supervised learning: discover patterns in the data that relate data attributes with

More information

Unsupervised Data Mining (Clustering)

Unsupervised Data Mining (Clustering) Unsupervised Data Mining (Clustering) Javier Béjar KEMLG December 01 Javier Béjar (KEMLG) Unsupervised Data Mining (Clustering) December 01 1 / 51 Introduction Clustering in KDD One of the main tasks in

More information

1. Introduction Gene regulation Genomics and genome analyses Hidden markov model (HMM)

1. Introduction Gene regulation Genomics and genome analyses Hidden markov model (HMM) 1. Introduction Gene regulation Genomics and genome analyses Hidden markov model (HMM) 2. Gene regulation tools and methods Regulatory sequences and motif discovery TF binding sites, microrna target prediction

More information

Support Vector Machines with Clustering for Training with Very Large Datasets

Support Vector Machines with Clustering for Training with Very Large Datasets Support Vector Machines with Clustering for Training with Very Large Datasets Theodoros Evgeniou Technology Management INSEAD Bd de Constance, Fontainebleau 77300, France theodoros.evgeniou@insead.fr Massimiliano

More information

6.2.8 Neural networks for data mining

6.2.8 Neural networks for data mining 6.2.8 Neural networks for data mining Walter Kosters 1 In many application areas neural networks are known to be valuable tools. This also holds for data mining. In this chapter we discuss the use of neural

More information

Data Mining and Machine Learning in Bioinformatics

Data Mining and Machine Learning in Bioinformatics Data Mining and Machine Learning in Bioinformatics PRINCIPAL METHODS AND SUCCESSFUL APPLICATIONS Ruben Armañanzas http://mason.gmu.edu/~rarmanan Adapted from Iñaki Inza slides http://www.sc.ehu.es/isg

More information

The Data Mining Process

The Data Mining Process Sequence for Determining Necessary Data. Wrong: Catalog everything you have, and decide what data is important. Right: Work backward from the solution, define the problem explicitly, and map out the data

More information

They can be obtained in HQJHQH format directly from the home page at: http://www.engene.cnb.uam.es/downloads/kobayashi.dat

They can be obtained in HQJHQH format directly from the home page at: http://www.engene.cnb.uam.es/downloads/kobayashi.dat HQJHQH70 *XLGHG7RXU This document contains a Guided Tour through the HQJHQH platform and it was created for training purposes with respect to the system options and analysis possibilities. It is not intended

More information

BEHAVIOR BASED CREDIT CARD FRAUD DETECTION USING SUPPORT VECTOR MACHINES

BEHAVIOR BASED CREDIT CARD FRAUD DETECTION USING SUPPORT VECTOR MACHINES BEHAVIOR BASED CREDIT CARD FRAUD DETECTION USING SUPPORT VECTOR MACHINES 123 CHAPTER 7 BEHAVIOR BASED CREDIT CARD FRAUD DETECTION USING SUPPORT VECTOR MACHINES 7.1 Introduction Even though using SVM presents

More information

Anomaly Detection in Predictive Maintenance

Anomaly Detection in Predictive Maintenance Anomaly Detection in Predictive Maintenance Anomaly Detection with Time Series Analysis Phil Winters Iris Adae Rosaria Silipo Phil.Winters@knime.com Iris.Adae@uni-konstanz.de Rosaria.Silipo@knime.com Copyright

More information

Bayesian Machine Learning (ML): Modeling And Inference in Big Data. Zhuhua Cai Google, Rice University caizhua@gmail.com

Bayesian Machine Learning (ML): Modeling And Inference in Big Data. Zhuhua Cai Google, Rice University caizhua@gmail.com Bayesian Machine Learning (ML): Modeling And Inference in Big Data Zhuhua Cai Google Rice University caizhua@gmail.com 1 Syllabus Bayesian ML Concepts (Today) Bayesian ML on MapReduce (Next morning) Bayesian

More information

COPYRIGHTED MATERIAL. Contents. List of Figures. Acknowledgments

COPYRIGHTED MATERIAL. Contents. List of Figures. Acknowledgments Contents List of Figures Foreword Preface xxv xxiii xv Acknowledgments xxix Chapter 1 Fraud: Detection, Prevention, and Analytics! 1 Introduction 2 Fraud! 2 Fraud Detection and Prevention 10 Big Data for

More information

Data Mining for Customer Service Support. Senioritis Seminar Presentation Megan Boice Jay Carter Nick Linke KC Tobin

Data Mining for Customer Service Support. Senioritis Seminar Presentation Megan Boice Jay Carter Nick Linke KC Tobin Data Mining for Customer Service Support Senioritis Seminar Presentation Megan Boice Jay Carter Nick Linke KC Tobin Traditional Hotline Services Problem Traditional Customer Service Support (manufacturing)

More information

Distances, Clustering, and Classification. Heatmaps

Distances, Clustering, and Classification. Heatmaps Distances, Clustering, and Classification Heatmaps 1 Distance Clustering organizes things that are close into groups What does it mean for two genes to be close? What does it mean for two samples to be

More information

Music Mood Classification

Music Mood Classification Music Mood Classification CS 229 Project Report Jose Padial Ashish Goel Introduction The aim of the project was to develop a music mood classifier. There are many categories of mood into which songs may

More information

PRACTICAL DATA MINING IN A LARGE UTILITY COMPANY

PRACTICAL DATA MINING IN A LARGE UTILITY COMPANY QÜESTIIÓ, vol. 25, 3, p. 509-520, 2001 PRACTICAL DATA MINING IN A LARGE UTILITY COMPANY GEORGES HÉBRAIL We present in this paper the main applications of data mining techniques at Electricité de France,

More information

PERFORMANCE ANALYSIS OF CLUSTERING ALGORITHMS IN DATA MINING IN WEKA

PERFORMANCE ANALYSIS OF CLUSTERING ALGORITHMS IN DATA MINING IN WEKA PERFORMANCE ANALYSIS OF CLUSTERING ALGORITHMS IN DATA MINING IN WEKA Prakash Singh 1, Aarohi Surya 2 1 Department of Finance, IIM Lucknow, Lucknow, India 2 Department of Computer Science, LNMIIT, Jaipur,

More information

Self Organizing Maps: Fundamentals

Self Organizing Maps: Fundamentals Self Organizing Maps: Fundamentals Introduction to Neural Networks : Lecture 16 John A. Bullinaria, 2004 1. What is a Self Organizing Map? 2. Topographic Maps 3. Setting up a Self Organizing Map 4. Kohonen

More information

Visualization of Breast Cancer Data by SOM Component Planes

Visualization of Breast Cancer Data by SOM Component Planes International Journal of Science and Technology Volume 3 No. 2, February, 2014 Visualization of Breast Cancer Data by SOM Component Planes P.Venkatesan. 1, M.Mullai 2 1 Department of Statistics,NIRT(Indian

More information

Adaptive Anomaly Detection for Network Security

Adaptive Anomaly Detection for Network Security International Journal of Computer and Internet Security. ISSN 0974-2247 Volume 5, Number 1 (2013), pp. 1-9 International Research Publication House http://www.irphouse.com Adaptive Anomaly Detection for

More information

Example: Document Clustering. Clustering: Definition. Notion of a Cluster can be Ambiguous. Types of Clusterings. Hierarchical Clustering

Example: Document Clustering. Clustering: Definition. Notion of a Cluster can be Ambiguous. Types of Clusterings. Hierarchical Clustering Overview Prognostic Models and Data Mining in Medicine, part I Cluster Analsis What is Cluster Analsis? K-Means Clustering Hierarchical Clustering Cluster Validit Eample: Microarra data analsis 6 Summar

More information

Cluster Analysis: Basic Concepts and Algorithms

Cluster Analysis: Basic Concepts and Algorithms Cluster Analsis: Basic Concepts and Algorithms What does it mean clustering? Applications Tpes of clustering K-means Intuition Algorithm Choosing initial centroids Bisecting K-means Post-processing Strengths

More information

Recognition. Sanja Fidler CSC420: Intro to Image Understanding 1 / 28

Recognition. Sanja Fidler CSC420: Intro to Image Understanding 1 / 28 Recognition Topics that we will try to cover: Indexing for fast retrieval (we still owe this one) History of recognition techniques Object classification Bag-of-words Spatial pyramids Neural Networks Object

More information

Summary Data Mining & Process Mining (1BM46) Content. Made by S.P.T. Ariesen

Summary Data Mining & Process Mining (1BM46) Content. Made by S.P.T. Ariesen Summary Data Mining & Process Mining (1BM46) Made by S.P.T. Ariesen Content Data Mining part... 2 Lecture 1... 2 Lecture 2:... 4 Lecture 3... 7 Lecture 4... 9 Process mining part... 13 Lecture 5... 13

More information

Machine Learning and Data Mining. Regression Problem. (adapted from) Prof. Alexander Ihler

Machine Learning and Data Mining. Regression Problem. (adapted from) Prof. Alexander Ihler Machine Learning and Data Mining Regression Problem (adapted from) Prof. Alexander Ihler Overview Regression Problem Definition and define parameters ϴ. Prediction using ϴ as parameters Measure the error

More information

Knowledge Discovery from patents using KMX Text Analytics

Knowledge Discovery from patents using KMX Text Analytics Knowledge Discovery from patents using KMX Text Analytics Dr. Anton Heijs anton.heijs@treparel.com Treparel Abstract In this white paper we discuss how the KMX technology of Treparel can help searchers

More information

Standardization and Its Effects on K-Means Clustering Algorithm

Standardization and Its Effects on K-Means Clustering Algorithm Research Journal of Applied Sciences, Engineering and Technology 6(7): 399-3303, 03 ISSN: 040-7459; e-issn: 040-7467 Maxwell Scientific Organization, 03 Submitted: January 3, 03 Accepted: February 5, 03

More information

203.4770: Introduction to Machine Learning Dr. Rita Osadchy

203.4770: Introduction to Machine Learning Dr. Rita Osadchy 203.4770: Introduction to Machine Learning Dr. Rita Osadchy 1 Outline 1. About the Course 2. What is Machine Learning? 3. Types of problems and Situations 4. ML Example 2 About the course Course Homepage:

More information

Learning is a very general term denoting the way in which agents:

Learning is a very general term denoting the way in which agents: What is learning? Learning is a very general term denoting the way in which agents: Acquire and organize knowledge (by building, modifying and organizing internal representations of some external reality);

More information

Information Management course

Information Management course Università degli Studi di Milano Master Degree in Computer Science Information Management course Teacher: Alberto Ceselli Lecture 01 : 06/10/2015 Practical informations: Teacher: Alberto Ceselli (alberto.ceselli@unimi.it)

More information

Self Organizing Maps for Visualization of Categories

Self Organizing Maps for Visualization of Categories Self Organizing Maps for Visualization of Categories Julian Szymański 1 and Włodzisław Duch 2,3 1 Department of Computer Systems Architecture, Gdańsk University of Technology, Poland, julian.szymanski@eti.pg.gda.pl

More information

Introduction to Data Mining

Introduction to Data Mining Introduction to Data Mining Jay Urbain Credits: Nazli Goharian & David Grossman @ IIT Outline Introduction Data Pre-processing Data Mining Algorithms Naïve Bayes Decision Tree Neural Network Association

More information

Machine Learning CS 6830. Lecture 01. Razvan C. Bunescu School of Electrical Engineering and Computer Science bunescu@ohio.edu

Machine Learning CS 6830. Lecture 01. Razvan C. Bunescu School of Electrical Engineering and Computer Science bunescu@ohio.edu Machine Learning CS 6830 Razvan C. Bunescu School of Electrical Engineering and Computer Science bunescu@ohio.edu What is Learning? Merriam-Webster: learn = to acquire knowledge, understanding, or skill

More information

Mathematical Models of Supervised Learning and their Application to Medical Diagnosis

Mathematical Models of Supervised Learning and their Application to Medical Diagnosis Genomic, Proteomic and Transcriptomic Lab High Performance Computing and Networking Institute National Research Council, Italy Mathematical Models of Supervised Learning and their Application to Medical

More information

For supervised classification we have a variety of measures to evaluate how good our model is Accuracy, precision, recall

For supervised classification we have a variety of measures to evaluate how good our model is Accuracy, precision, recall Cluster Validation Cluster Validit For supervised classification we have a variet of measures to evaluate how good our model is Accurac, precision, recall For cluster analsis, the analogous question is

More information

Monitoring of Complex Industrial Processes based on Self-Organizing Maps and Watershed Transformations

Monitoring of Complex Industrial Processes based on Self-Organizing Maps and Watershed Transformations Monitoring of Complex Industrial Processes based on Self-Organizing Maps and Watershed Transformations Christian W. Frey 2012 Monitoring of Complex Industrial Processes based on Self-Organizing Maps and

More information

Analytics on Big Data

Analytics on Big Data Analytics on Big Data Riccardo Torlone Università Roma Tre Credits: Mohamed Eltabakh (WPI) Analytics The discovery and communication of meaningful patterns in data (Wikipedia) It relies on data analysis

More information

Classifying Large Data Sets Using SVMs with Hierarchical Clusters. Presented by :Limou Wang

Classifying Large Data Sets Using SVMs with Hierarchical Clusters. Presented by :Limou Wang Classifying Large Data Sets Using SVMs with Hierarchical Clusters Presented by :Limou Wang Overview SVM Overview Motivation Hierarchical micro-clustering algorithm Clustering-Based SVM (CB-SVM) Experimental

More information

TIETS34 Seminar: Data Mining on Biometric identification

TIETS34 Seminar: Data Mining on Biometric identification TIETS34 Seminar: Data Mining on Biometric identification Youming Zhang Computer Science, School of Information Sciences, 33014 University of Tampere, Finland Youming.Zhang@uta.fi Course Description Content

More information

A Survey on Intrusion Detection System with Data Mining Techniques

A Survey on Intrusion Detection System with Data Mining Techniques A Survey on Intrusion Detection System with Data Mining Techniques Ms. Ruth D 1, Mrs. Lovelin Ponn Felciah M 2 1 M.Phil Scholar, Department of Computer Science, Bishop Heber College (Autonomous), Trichirappalli,

More information

Data Mining and Exploration. Data Mining and Exploration: Introduction. Relationships between courses. Overview. Course Introduction

Data Mining and Exploration. Data Mining and Exploration: Introduction. Relationships between courses. Overview. Course Introduction Data Mining and Exploration Data Mining and Exploration: Introduction Amos Storkey, School of Informatics January 10, 2006 http://www.inf.ed.ac.uk/teaching/courses/dme/ Course Introduction Welcome Administration

More information

Non-negative Matrix Factorization (NMF) in Semi-supervised Learning Reducing Dimension and Maintaining Meaning

Non-negative Matrix Factorization (NMF) in Semi-supervised Learning Reducing Dimension and Maintaining Meaning Non-negative Matrix Factorization (NMF) in Semi-supervised Learning Reducing Dimension and Maintaining Meaning SAMSI 10 May 2013 Outline Introduction to NMF Applications Motivations NMF as a middle step

More information

Decision Support System Methodology Using a Visual Approach for Cluster Analysis Problems

Decision Support System Methodology Using a Visual Approach for Cluster Analysis Problems Decision Support System Methodology Using a Visual Approach for Cluster Analysis Problems Ran M. Bittmann School of Business Administration Ph.D. Thesis Submitted to the Senate of Bar-Ilan University Ramat-Gan,

More information

REAL TIME PCR USING SYBR GREEN

REAL TIME PCR USING SYBR GREEN REAL TIME PCR USING SYBR GREEN 1 THE PROBLEM NEED TO QUANTITATE DIFFERENCES IN mrna EXPRESSION SMALL AMOUNTS OF mrna LASER CAPTURE SMALL AMOUNTS OF TISSUE PRIMARY CELLS PRECIOUS REAGENTS 2 THE PROBLEM

More information

An Introduction to Microarray Data Analysis

An Introduction to Microarray Data Analysis Chapter An Introduction to Microarray Data Analysis M. Madan Babu Abstract This chapter aims to provide an introduction to the analysis of gene expression data obtained using microarray experiments. It

More information

diagnosis through Random

diagnosis through Random Convegno Calcolo ad Alte Prestazioni "Biocomputing" Bio-molecular diagnosis through Random Subspace Ensembles of Learning Machines Alberto Bertoni, Raffaella Folgieri, Giorgio Valentini DSI Dipartimento

More information

Statistical Data Mining. Practical Assignment 3 Discriminant Analysis and Decision Trees

Statistical Data Mining. Practical Assignment 3 Discriminant Analysis and Decision Trees Statistical Data Mining Practical Assignment 3 Discriminant Analysis and Decision Trees In this practical we discuss linear and quadratic discriminant analysis and tree-based classification techniques.

More information

AN EXPERT SYSTEM TO ANALYZE HOMOGENEITY IN FUEL ELEMENT PLATES FOR RESEARCH REACTORS

AN EXPERT SYSTEM TO ANALYZE HOMOGENEITY IN FUEL ELEMENT PLATES FOR RESEARCH REACTORS AN EXPERT SYSTEM TO ANALYZE HOMOGENEITY IN FUEL ELEMENT PLATES FOR RESEARCH REACTORS Cativa Tolosa, S. and Marajofsky, A. Comisión Nacional de Energía Atómica Abstract In the manufacturing control of Fuel

More information

A comparison of various clustering methods and algorithms in data mining

A comparison of various clustering methods and algorithms in data mining Volume :2, Issue :5, 32-36 May 2015 www.allsubjectjournal.com e-issn: 2349-4182 p-issn: 2349-5979 Impact Factor: 3.762 R.Tamilselvi B.Sivasakthi R.Kavitha Assistant Professor A comparison of various clustering

More information

Comparison of K-means and Backpropagation Data Mining Algorithms

Comparison of K-means and Backpropagation Data Mining Algorithms Comparison of K-means and Backpropagation Data Mining Algorithms Nitu Mathuriya, Dr. Ashish Bansal Abstract Data mining has got more and more mature as a field of basic research in computer science and

More information

Cluster software and Java TreeView

Cluster software and Java TreeView Cluster software and Java TreeView To download the software: http://bonsai.hgc.jp/~mdehoon/software/cluster/software.htm http://bonsai.hgc.jp/~mdehoon/software/cluster/manual/treeview.html Cluster 3.0

More information

Data Mining. 1 Introduction 2 Data Mining methods. Alfred Holl Data Mining 1

Data Mining. 1 Introduction 2 Data Mining methods. Alfred Holl Data Mining 1 Data Mining 1 Introduction 2 Data Mining methods Alfred Holl Data Mining 1 1 Introduction 1.1 Motivation 1.2 Goals and problems 1.3 Definitions 1.4 Roots 1.5 Data Mining process 1.6 Epistemological constraints

More information

Chapter 19. General Matrices. An n m matrix is an array. a 11 a 12 a 1m a 21 a 22 a 2m A = a n1 a n2 a nm. The matrix A has n row vectors

Chapter 19. General Matrices. An n m matrix is an array. a 11 a 12 a 1m a 21 a 22 a 2m A = a n1 a n2 a nm. The matrix A has n row vectors Chapter 9. General Matrices An n m matrix is an array a a a m a a a m... = [a ij]. a n a n a nm The matrix A has n row vectors and m column vectors row i (A) = [a i, a i,..., a im ] R m a j a j a nj col

More information

Maschinelles Lernen mit MATLAB

Maschinelles Lernen mit MATLAB Maschinelles Lernen mit MATLAB Jérémy Huard Applikationsingenieur The MathWorks GmbH 2015 The MathWorks, Inc. 1 Machine Learning is Everywhere Image Recognition Speech Recognition Stock Prediction Medical

More information

Statistical Databases and Registers with some datamining

Statistical Databases and Registers with some datamining Unsupervised learning - Statistical Databases and Registers with some datamining a course in Survey Methodology and O cial Statistics Pages in the book: 501-528 Department of Statistics Stockholm University

More information