The data. Introducción al análisis de datos en microarrays ... Characteristics of the data: Universidad Complutense de Madrid ESCUELA DE VERANO 2007

Size: px
Start display at page:

Download "The data. Introducción al análisis de datos en microarrays ... Characteristics of the data: Universidad Complutense de Madrid ESCUELA DE VERANO 2007"

Transcription

1 Universidad Complutense de Madrid ESCUELA DE VERANO 2007 Universidad Complutense de Madrid ESCUELA DE VERANO 2007 Introducción al análisis de datos en microarrays 1. Introducción 2. Microarrays (tipos, tratamiento de las muestras e hibridación, bases de datos, aplicaciones ) 3. Normalizacion y análisis exploratorio (JC Oliveros) 4. Análisis de datos (Análisis supervisado y no supervisado, algoritmos ) 5. Práctica Gonzalo Gómez López INB-CNIO ggomez@cnio.es Madrid, 26 de Julio, 2007 Gonzalo Gómez López ggomez@cnio.es Genes (thousands)... A B C Experimental conditions (from tens up to no more than a few hundreds) The data Different classes of experimental conditions, e.g. Cancer types, tissues, drug treatments, time survival, etc. Expression profile of all the genes for a experimental condition (1 array) Expression profile of a gene across the experimental conditions Characteristics of the data: Low signal to noise ratio High redundancy and intra-gene correlations Most of the genes are not informative with respect to the trait we are studying (physiological conditions, etc.) Many genes have no annotations!! Source: J. Dopazo Unsupervised methods Brief history in microarray data analysis: Supervised methods (predictors) 1. Clustering methods Neural networks (Khan et al. 2001) UPGMA (Sneath and Sokal, 1973) SOMs (Kohonen et al. 1984) kmeans (Hartigan and Wong 1979) knn (Ripley 1996; Hastie et al 2001) kmedians (Hartigan and Wong 1979) PAM (Tibshirani et al. 2002) SOTA (Herrero et al. 2001) DLDA (Dudoit et al. 2002) SOM (Kohonen 1979, Tamayo et al 1999) SVMs (Furey et al, 2000) Gene Shaving (Hastie et al, 2000) Fuzzy methods (Dougherty ET AL. 2002) Probabilistic (Bhattacharjee et al. 2001) Metagenes (Pittman et al, 2004) 2. Exploratory analysis parametric: ttest, SAM (Tusher et al, 2001), ANOVA non parametric: Welch ttest, Wilcoxon, Kruskal Wallis Sumarizing datasets: PCA 3. Blocks of genes Kolmogorov-Smirnof: GSEA (Subramanian et al. 2005) FatiScan (Al-Shahrour et al. 2005) SAM 3.0 (Jan 07) Gonzalo Gómez GeneTrail López (Backes et al. 2007)

2 Overexpression EJEMPLO or No Change Underexpression Image analysis comparison (normalization and filtering) Data analysis (supervised, unsupervised ) Ratios (or not ) Log 2 transform Underexpression Overexpression x1/2 x2 Why Log 2 transform?? QUESTIONS & METHODOLOGICAL APPROACH Ratio scale Log scale (lineal) Can we find groups of experiments with similar gene expression profiles? Different phenotypes... Unsupervised analysis Supervised analysis Reverse engineering Molecular classification of samples What genes are responsible for? Co-expressing genes... What do they have in common? No Log transform After Log transform What profile(s) do they display? and... Genes interacting in a network (A,B,C..)... Genes of a class Are there more genes? How is the network? A B E C D Source: J. Dopazo

3 Algorithms & Reminding Examples Algorithms Molecular classification of cancer Known class prediction (Supervised data analysis).? Hierarchical Neural networks PCA SVM A set of instructions for solving a problem. When the instructions are followed, it must eventually stop with an answer. Computer Science Computational geometry Image processing Design Real-time simulations Economy Ecomomy models Prediction & simulation Applications A.I. Discovering SOTA class SOMs (Unsupervised data analysis).? k-means UPGMA Engineering Brief history in microarray data analysis: Unsupervised methods Supervised methods (predictors) Neural networks (Khan et al. 2001) 1. Clustering methods SOMs (Kohonen et al. 1984) UPGMA (Sneath and Sokal, 1973) knn (Ripley 1996; Hastie et al 2001) kmeans (Hartigan and Wong 1979) PAM (Tibshirani et al. 2002) kmedians (Hartigan and Wong 1979) DLDA (Dudoit et al. 2002) SOTA (Herrero et al. 2001) SVMs (Furey et al, 2000) SOM (Kohonen 1979, Tamayo et al 1999) Gene Shaving (Hastie et al, 2000) Fuzzy methods (Dougherty ET AL. 2002) Probabilistic (Bhattacharjee et al. 2001) Metagenes (Pittman et al, 2004) 2. Exploratory analysis parametric: ttest, SAM (Tusher et al, 2001), ANOVA non parametric: Welch ttest, Wilcoxon, Kruskal Wallis Sumarizing datasets: PCA 3. Blocks of genes Kolmogorov-Smirnof: GSEA (Subramanian et al. 2005) FatiScan (Al-Shahrour et al. 2005) SAM 3.0 (Jan 07) GeneTrail Gonzalo Gómez López (Backes et al. 2007) Science Cosmology & radio astronomy Condensed matter physics 3D proteins design Mathematics N items Cryptography Cipher & decipher codes Patterns recognition Artificial vision Applications and tasks Stochastic systems evolution Proteins folding prediction Biological evolution Atomistic resolution Nanomaterials Communications Optimizations Meteorology Simulation & prediction CLUSTERING Clustering??? K groups

4 Algorithms & Algorithms & Agglomerative Hierarchical algorithms Different methods UPGMA Divisive SOTA Non-hierarchical algorithms Clustering algorithm 26 items 4 groups Why different algorithms? Y C1 C C4 K-means SOMs Data analysis in microarrays Hierarchical & dendrograms I) Cartesian algorithm (x, y):...c3... a) (X>0,Y>0)! C3, 1/4 C2? b) (X<0,Y>0)! C1, 1/4 C2? c) (X<0,Y<0)! C4?, 1/4 C2? d) (X>0,Y<0)! C4?, 1/4 C2? aubucud! C1, C2, C3, C4. x II) Polar algorithm (", m): There s no a perfect method e) ("=cte, m)! C1... f) (", m=cte)! C2, C3?... g) (", m)! C1, C2, C3, C4. Different methods Hierarchical algorithms Non-hierarchical algorithms K-means SOMs Etc Others Agglomerative UPGMA Divisive SOTA Artificial Neural Networks (ANNs) SVM PCA

5 UPGMA Unweighted Pair Group Method with Arithmetic mean Distances between genes?? Metrics Example Pearson Correlation Coefficient Gene 1 1) B y C son los dos genes más próximos (en sus medias) 2) B y C se unen y se recalcula la matriz de distancias empleando el nuevo cluster en vez de B y C. A y D son ahora los mas próximos. 3) A y D tambien se unen. Se reordenan los elementos para ajustar la topología del árbol. Se recalcula la matriz de distancias entre los grupos existentes 4) El proceso termina cuando el dendrograma esta construído. Exp 1 Exp 2 Exp 3 Euclidean distance Pearson correlation coefficient Spearman! correlation coefficient RNA express Gene RNA express Gene RNA express Gene Gene 2 Gene Gene 1 Gene 2 Gene 3 DISTANCIA B C A Dendrograms A EUCLIDEANA Linkage methods Niveles de Expresión B C t CORRELACIÓN CITY BLOCK (Manhattan or taxi cab) OTRAS (Canberra, Bray-Curtis ) PEARSON B A C CLUSTER 1 A C B D E F G CLUSTER 3 CLUSTER 2 Average-linkage method Single-linkage method Complete-linkage method A B C D E F G E D F G A B C G F E D C B A Others SPEARMAN (Weighted pair-group average, Within-groups Ward s method )

6 Ejemplos Hierarchical - UPGMA En nuestra práctica: GEPAS? Garber M. E. et al. PNAS, vol.98, nº24; SOTA Data analysis in microarrays Self-Organizing Tree Algorithm Hierarchical & dendrograms Different methods Hierarchical algorithms Non-hierarchical algorithms K-means SOMs Etc Others - Artificial Neural Network - Growth from the root of the tree, toward the leaves (from lower to higher resolution ) -Threshold of resource value should be fixed -Generation of a hierarchical cluster structure at the desired level of resolution Agglomerative UPGMA Divisive SOTA Artificial Neural Networks (ANNs) SVM PCA Herrero et al. Bioinformatics. 17:

7 SOTA Self-Organizing Tree Algorithm SOTA Self-Organizing Tree Algorithm Number of genes Average profiles Un ejemplo En nuestra práctica: GEPAS K-means Data analysis in microarrays K-means & Self organizing-maps (SOMs) Gene expression Time Gene expression 1 2 Time3 Gene expression Different methods Hierarchical algorithms Non-hierarchical algorithms K-means SOMs Etc Others Agglomerative UPGMA SOTA Divisive Artificial Neural Networks (ANNs) SVM PCA 4 Time5 6

8 K-means K-means Example More information: K-means Example 2 K= 12 d te s a l Re ene g Self-organizing maps (SOMs) or Kohonen Maps SOMs Geometric dependence, i.e. 4 x 3 A B C D exp t Kohonen T. Proc IEEE 78(9): ,1990 Tamayo P. et al. PNAS, 96: ; 1999.

9 Self-organizing maps (SOMs) Unsupervised methods Brief history in microarray data analysis: Supervised methods (predictors) EJEMPLO Indices de bienestar mundiales 1. Clustering methods Neural networks (Khan et al. 2001) UPGMA (Sneath and Sokal, 1973) SOMs (Kohonen et al. 1984) kmeans (Hartigan and Wong 1979) knn (Ripley 1996; Hastie et al 2001) kmedians (Hartigan and Wong 1979) PAM (Tibshirani et al. 2002) SOTA (Herrero et al. 2001) DLDA (Dudoit et al. 2002) SOM (Kohonen 1979, Tamayo et al 1999) SVMs (Furey et al, 2000) Gene Shaving (Hastie et al, 2000) Fuzzy methods (Dougherty ET AL. 2002) Probabilistic (Bhattacharjee et al. 2001) Metagenes (Pittman et al, 2004) 2. Exploratory analysis parametric: ttest, SAM (Tusher et al, 2001), ANOVA non parametric: Welch ttest, Wilcoxon, Kruskal Wallis Sumarizing datasets: PCA 3. Blocks of genes Kolmogorov-Smirnof: GSEA (Subramanian et al. 2005) FatiScan (Al-Shahrour et al. 2005) SAM 3.0 (Jan 07) Gonzalo Gómez GeneTrail López (Backes et al. 2007) Class1 Class2 Exploratory analysis Currently testing block of genes ~ biological pathways FDR< testing genes independently... GSEA (Subramanian et al. PNAS ) Gene Set Enrichment Analysis Broad Institute v 2.0 available since Jan 2007 New version includes Biocarta, Broad Institute, GeneMAPP, KEGG annotations and more... Platforms: Affymetrix, Agilent, CodeLink, 2-color... ttest cut-off GSEA applies Kolmogorov-Smirnof test to find assymmetrical distributions for defined blocks of genes in datasets whole distribution. FDR<0.05 Biological meaning?

10 - Class1 Class2 Gene Set 1 Gene Set 2 Gene Set 3 Now analizing block of genes ~ biological pathways FatiScan Gene set 3 enriched in Class 2 Al-Shahrour et al. Bioinformatics Jul 1;21(13): Available in Babelomics suite: ES/NES statistic ttest cut-off Segmentation test for searching asymmetries More statistically sensitive than GSEA Few annotations implemented En nuestra practica: Gene set 2 enriched in Class 1 + Datamining Mathematics methods for biology A di sc us si on Tengo mis datos... He analizado su estructura y relevancia estadística Qué es, que hace y que sabemos de este gen? Que hay en este cluster?? Cell cycle... DBs Information Current math s methods are useful to explain real and meaningful processes from a biological point of view? mmm actually we don t know sometimes they are sometimes not New integrative methods should be developed Clustering,estadística, etc Source: J. Dopazo Datamining Links

11 Datamining Para profundizar Data Analysis Tools for DNA Microarrays by Sorin Draghic DNA Microarrays and Gene Expression : From Experiments to Data Analysis and Modeling by Pierre Baldi, G. Wesley Hatfield, Wesley G. Hatfield Exploration and Analysis of DNA Microarray and Protein Array Data (Wiley Series in Probability and Statistics) by Dhammika Amaratunga, Javier Cabrera And good luck NetAffx ANSWER Your genes, hypothesis, questions Statistics for Microarrays : Design, Analysis and Inference by Ernst Wit, John McClure Analyzing Microarray Gene Expression Data (Wiley Series in Probability and Statistics) by Geoffrey J. McLachlan, Kim-Anh Do, Christophe Ambroise En nuestra práctica: Bioinformatics and Computational Biology Solutions Using R and Bioconductor (Statistics for Biology and Health) by Robert Gentleman, Vincent Carey, Wolfgang Huber, Rafael Irizarry, Sandrine Dudoit Microarray Gene Expression Data Analysis: A Beginner's Guide by Helen Causton Windows Mac OS X LINUX/UNIX

12 Gracias!! Universidad Complutense de Madrid ESCUELA DE VERANO Introducción Microarrays (tipos, tratamiento de las muestras e hibridación, bases de datos, aplicaciones ) Normalización y análisis exploratorio (JC Oliveros) Análisis de datos (Análisis supervisado y no supervisado, algoritmos ) Práctica Gonzalo Gómez López ggomez@cnio.es Gonzalo Gómez López ggomez@cnio.es Otras herramientas gratuitas: -BASE BRBTools -TM4(MeV) -Otras SNOMAD, DAVID, MIDAW

13 INPUT FILE FORMAT (preprocessing)

14

15 Inf.cluster/genes Expandir arbol

16 SOTA Inf.cluster/genes SOMs FatiScan Inf.cluster/genes

17 FDR!!! FatiScan Output FatiScan output Enriched in A Enriched in B % Genes with the specific GO annotation for each partition

Course on Functional Analysis. ::: Gene Set Enrichment Analysis - GSEA -

Course on Functional Analysis. ::: Gene Set Enrichment Analysis - GSEA - Course on Functional Analysis ::: Madrid, June 31st, 2007. Gonzalo Gómez, PhD. ggomez@cnio.es Bioinformatics Unit CNIO ::: Contents. 1. Introduction. 2. GSEA Software 3. Data Formats 4. Using GSEA 5. GSEA

More information

COMPUTATIONAL ANALYSIS OF MICROARRAY DATA

COMPUTATIONAL ANALYSIS OF MICROARRAY DATA COMPUTATIONAL ANALYSIS OF MICROARRAY DATA John Quackenbush Microarray experiments are providing unprecedented quantities of genome-wide data on gene-expression patterns. Although this technique has been

More information

Statistical issues in the analysis of microarray data

Statistical issues in the analysis of microarray data Statistical issues in the analysis of microarray data Daniel Gerhard Institute of Biostatistics Leibniz University of Hannover ESNATS Summerschool, Zermatt D. Gerhard (LUH) Analysis of microarray data

More information

AIMS ESSAY 2006 Overview of Tools for Microarray Data Analysis and Comparison Analysis

AIMS ESSAY 2006 Overview of Tools for Microarray Data Analysis and Comparison Analysis AIMS ESSAY 2006 Overview of Tools for Microarray Data Analysis and Comparison Analysis By Chodziwadziwa Whiteson Kabudula Supervisors: Bajic, Vladimir B. (Prof) and Hofmann, Oliver (Dr) (South African

More information

Distances, Clustering, and Classification. Heatmaps

Distances, Clustering, and Classification. Heatmaps Distances, Clustering, and Classification Heatmaps 1 Distance Clustering organizes things that are close into groups What does it mean for two genes to be close? What does it mean for two samples to be

More information

Gene expression analysis. Ulf Leser and Karin Zimmermann

Gene expression analysis. Ulf Leser and Karin Zimmermann Gene expression analysis Ulf Leser and Karin Zimmermann Ulf Leser: Bioinformatics, Wintersemester 2010/2011 1 Last lecture What are microarrays? - Biomolecular devices measuring the transcriptome of a

More information

Software and Methods for the Analysis of Affymetrix GeneChip Data. Rafael A Irizarry Department of Biostatistics Johns Hopkins University

Software and Methods for the Analysis of Affymetrix GeneChip Data. Rafael A Irizarry Department of Biostatistics Johns Hopkins University Software and Methods for the Analysis of Affymetrix GeneChip Data Rafael A Irizarry Department of Biostatistics Johns Hopkins University Outline Overview Bioconductor Project Examples 1: Gene Annotation

More information

Data Mining Techniques for DNA Microarray Data

Data Mining Techniques for DNA Microarray Data Data Mining Techniques for DNA Microarray Data Miguel Rocha Isabel Rocha CCTC / CEB Universidade do Minho PRESENTATION STRUCTURE SCOPE Transcription analysis is the most mature highthroughput analytical

More information

AGILENT S BIOINFORMATICS ANALYSIS SOFTWARE

AGILENT S BIOINFORMATICS ANALYSIS SOFTWARE ACCELERATING PROGRESS IS IN OUR GENES AGILENT S BIOINFORMATICS ANALYSIS SOFTWARE GENESPRING GENE EXPRESSION (GX) MASS PROFILER PROFESSIONAL (MPP) PATHWAY ARCHITECT (PA) See Deeper. Reach Further. BIOINFORMATICS

More information

They can be obtained in HQJHQH format directly from the home page at: http://www.engene.cnb.uam.es/downloads/kobayashi.dat

They can be obtained in HQJHQH format directly from the home page at: http://www.engene.cnb.uam.es/downloads/kobayashi.dat HQJHQH70 *XLGHG7RXU This document contains a Guided Tour through the HQJHQH platform and it was created for training purposes with respect to the system options and analysis possibilities. It is not intended

More information

SELDI-TOF Mass Spectrometry Protein Data By Huong Thi Dieu La

SELDI-TOF Mass Spectrometry Protein Data By Huong Thi Dieu La SELDI-TOF Mass Spectrometry Protein Data By Huong Thi Dieu La References Alejandro Cruz-Marcelo, Rudy Guerra, Marina Vannucci, Yiting Li, Ching C. Lau, and Tsz-Kwong Man. Comparison of algorithms for pre-processing

More information

Microarray Data Analysis and Mining Tools

Microarray Data Analysis and Mining Tools www.bioinformation.net Hypothesis Volume 6(3) Microarray Data Analysis and Mining Tools Saravanakumar Selvaraj, Jeyakumar Natarajan* Data Mining and Text Mining Laboratory, Department of Bioinformatics,

More information

Exploratory data analysis approaches unsupervised approaches. Steven Kiddle With thanks to Richard Dobson and Emanuele de Rinaldis

Exploratory data analysis approaches unsupervised approaches. Steven Kiddle With thanks to Richard Dobson and Emanuele de Rinaldis Exploratory data analysis approaches unsupervised approaches Steven Kiddle With thanks to Richard Dobson and Emanuele de Rinaldis Lecture overview Page 1 Ø Background Ø Revision Ø Other clustering methods

More information

Software reviews. Expression Pro ler: A suite of web-based tools for the analysis of microarray gene expression data

Software reviews. Expression Pro ler: A suite of web-based tools for the analysis of microarray gene expression data Expression Pro ler: A suite of web-based tools for the analysis of microarray gene expression data DNA microarray analysis 1±3 has become one of the most widely used tools for the analysis of gene expression

More information

Pattern Recognition Techniques in Microarray Data Analysis: A Survey

Pattern Recognition Techniques in Microarray Data Analysis: A Survey Pattern Recognition Techniques in Microarray Data Analysis: A Survey Faramarz Valafar Department of Computer Science, San Diego State University 5500 Campanile Drive, San Diego, CA 92182, USA faramarz@sciences.sdsu.edu

More information

Tutorial for proteome data analysis using the Perseus software platform

Tutorial for proteome data analysis using the Perseus software platform Tutorial for proteome data analysis using the Perseus software platform Laboratory of Mass Spectrometry, LNBio, CNPEM Tutorial version 1.0, January 2014. Note: This tutorial was written based on the information

More information

MultiExperiment Viewer Quickstart Guide

MultiExperiment Viewer Quickstart Guide MultiExperiment Viewer Quickstart Guide Table of Contents: I. Preface - 2 II. Installing MeV - 2 III. Opening a Data Set - 2 IV. Filtering - 6 V. Clustering a. HCL - 8 b. K-means - 11 VI. Modules a. T-test

More information

Neural Networks Lesson 5 - Cluster Analysis

Neural Networks Lesson 5 - Cluster Analysis Neural Networks Lesson 5 - Cluster Analysis Prof. Michele Scarpiniti INFOCOM Dpt. - Sapienza University of Rome http://ispac.ing.uniroma1.it/scarpiniti/index.htm michele.scarpiniti@uniroma1.it Rome, 29

More information

A Study of Web Log Analysis Using Clustering Techniques

A Study of Web Log Analysis Using Clustering Techniques A Study of Web Log Analysis Using Clustering Techniques Hemanshu Rana 1, Mayank Patel 2 Assistant Professor, Dept of CSE, M.G Institute of Technical Education, Gujarat India 1 Assistant Professor, Dept

More information

Data Mining in Bioinformatics Day 8: Clustering in Bioinformatics Clustering Gene Expression Data

Data Mining in Bioinformatics Day 8: Clustering in Bioinformatics Clustering Gene Expression Data Data Mining in Bioinformatics Day 8: Clustering in Bioinformatics Clustering Gene Expression Data Chloé-Agathe Azencott & Karsten Borgwardt February 10 to February 21, 2014 Machine Learning & Computational

More information

Cluster 3.0 Manual. for Windows, Mac OS X, Linux, Unix. Michael Eisen; updated by Michiel de Hoon

Cluster 3.0 Manual. for Windows, Mac OS X, Linux, Unix. Michael Eisen; updated by Michiel de Hoon Cluster 3.0 Manual for Windows, Mac OS X, Linux, Unix Michael Eisen; updated by Michiel de Hoon Software copyright c Stanford University 1998-99 This manual was originally written by Michael Eisen. It

More information

Using Data Mining for Mobile Communication Clustering and Characterization

Using Data Mining for Mobile Communication Clustering and Characterization Using Data Mining for Mobile Communication Clustering and Characterization A. Bascacov *, C. Cernazanu ** and M. Marcu ** * Lasting Software, Timisoara, Romania ** Politehnica University of Timisoara/Computer

More information

Advanced Web Usage Mining Algorithm using Neural Network and Principal Component Analysis

Advanced Web Usage Mining Algorithm using Neural Network and Principal Component Analysis Advanced Web Usage Mining Algorithm using Neural Network and Principal Component Analysis Arumugam, P. and Christy, V Department of Statistics, Manonmaniam Sundaranar University, Tirunelveli, Tamilnadu,

More information

diagnosis through Random

diagnosis through Random Convegno Calcolo ad Alte Prestazioni "Biocomputing" Bio-molecular diagnosis through Random Subspace Ensembles of Learning Machines Alberto Bertoni, Raffaella Folgieri, Giorgio Valentini DSI Dipartimento

More information

Unsupervised learning: Clustering

Unsupervised learning: Clustering Unsupervised learning: Clustering Salissou Moutari Centre for Statistical Science and Operational Research CenSSOR 17 th September 2013 Unsupervised learning: Clustering 1/52 Outline 1 Introduction What

More information

An Introduction to Microarray Data Analysis

An Introduction to Microarray Data Analysis Chapter An Introduction to Microarray Data Analysis M. Madan Babu Abstract This chapter aims to provide an introduction to the analysis of gene expression data obtained using microarray experiments. It

More information

Lecture/Recitation Topic SMA 5303 L1 Sampling and statistical distributions

Lecture/Recitation Topic SMA 5303 L1 Sampling and statistical distributions SMA 50: Statistical Learning and Data Mining in Bioinformatics (also listed as 5.077: Statistical Learning and Data Mining ()) Spring Term (Feb May 200) Faculty: Professor Roy Welsch Wed 0 Feb 7:00-8:0

More information

How To Cluster

How To Cluster Data Clustering Dec 2nd, 2013 Kyrylo Bessonov Talk outline Introduction to clustering Types of clustering Supervised Unsupervised Similarity measures Main clustering algorithms k-means Hierarchical Main

More information

Methods for assessing reproducibility of clustering patterns

Methods for assessing reproducibility of clustering patterns Methods for assessing reproducibility of clustering patterns observed in analyses of microarray data Lisa M. McShane 1,*, Michael D. Radmacher 1, Boris Freidlin 1, Ren Yu 2, 3, Ming-Chung Li 2 and Richard

More information

Data Mining in Bioinformatics - A Review

Data Mining in Bioinformatics - A Review Data Mining in Bioinformatics Day 7: Clustering in Bioinformatics Clustering Gene Expression Data Chloé-Agathe Azencott & Karsten Borgwardt February 18 to March 1, 2013 Machine Learning & Computational

More information

A Review of Anomaly Detection Techniques in Network Intrusion Detection System

A Review of Anomaly Detection Techniques in Network Intrusion Detection System A Review of Anomaly Detection Techniques in Network Intrusion Detection System Dr.D.V.S.S.Subrahmanyam Professor, Dept. of CSE, Sreyas Institute of Engineering & Technology, Hyderabad, India ABSTRACT:In

More information

Example: Document Clustering. Clustering: Definition. Notion of a Cluster can be Ambiguous. Types of Clusterings. Hierarchical Clustering

Example: Document Clustering. Clustering: Definition. Notion of a Cluster can be Ambiguous. Types of Clusterings. Hierarchical Clustering Overview Prognostic Models and Data Mining in Medicine, part I Cluster Analsis What is Cluster Analsis? K-Means Clustering Hierarchical Clustering Cluster Validit Eample: Microarra data analsis 6 Summar

More information

Analysis of the colorectal tumor microenvironment using integrative bioinformatic tools

Analysis of the colorectal tumor microenvironment using integrative bioinformatic tools MLECNIK Bernhard & BINDEA Gabriela Analysis of the colorectal tumor microenvironment using integrative bioinformatic tools INSERM U872, Jérôme Galon Team15: Integrative Cancer Immunology Cordeliers Research

More information

UNSUPERVISED MACHINE LEARNING TECHNIQUES IN GENOMICS

UNSUPERVISED MACHINE LEARNING TECHNIQUES IN GENOMICS UNSUPERVISED MACHINE LEARNING TECHNIQUES IN GENOMICS Dwijesh C. Mishra I.A.S.R.I., Library Avenue, New Delhi-110 012 dcmishra@iasri.res.in What is Learning? "Learning denotes changes in a system that enable

More information

BIOINF 525 Winter 2016 Foundations of Bioinformatics and Systems Biology http://tinyurl.com/bioinf525-w16

BIOINF 525 Winter 2016 Foundations of Bioinformatics and Systems Biology http://tinyurl.com/bioinf525-w16 Course Director: Dr. Barry Grant (DCM&B, bjgrant@med.umich.edu) Description: This is a three module course covering (1) Foundations of Bioinformatics, (2) Statistics in Bioinformatics, and (3) Systems

More information

An unsupervised fuzzy ensemble algorithmic scheme for gene expression data analysis

An unsupervised fuzzy ensemble algorithmic scheme for gene expression data analysis An unsupervised fuzzy ensemble algorithmic scheme for gene expression data analysis Roberto Avogadri 1, Giorgio Valentini 1 1 DSI, Dipartimento di Scienze dell Informazione, Università degli Studi di Milano,Via

More information

ARTIFICIAL INTELLIGENCE (CSCU9YE) LECTURE 6: MACHINE LEARNING 2: UNSUPERVISED LEARNING (CLUSTERING)

ARTIFICIAL INTELLIGENCE (CSCU9YE) LECTURE 6: MACHINE LEARNING 2: UNSUPERVISED LEARNING (CLUSTERING) ARTIFICIAL INTELLIGENCE (CSCU9YE) LECTURE 6: MACHINE LEARNING 2: UNSUPERVISED LEARNING (CLUSTERING) Gabriela Ochoa http://www.cs.stir.ac.uk/~goc/ OUTLINE Preliminaries Classification and Clustering Applications

More information

Fecha: 04/12/2010. New standard treatment for breast cancer at early stages established

Fecha: 04/12/2010. New standard treatment for breast cancer at early stages established New standard treatment for breast cancer at early stages established London December 04, 2010 12:01:13 AM IST Spanish Oncology has established a new standard treatment for breast cancer at early stages,

More information

Clustering. 15-381 Artificial Intelligence Henry Lin. Organizing data into clusters such that there is

Clustering. 15-381 Artificial Intelligence Henry Lin. Organizing data into clusters such that there is Clustering 15-381 Artificial Intelligence Henry Lin Modified from excellent slides of Eamonn Keogh, Ziv Bar-Joseph, and Andrew Moore What is Clustering? Organizing data into clusters such that there is

More information

Clustering. Adrian Groza. Department of Computer Science Technical University of Cluj-Napoca

Clustering. Adrian Groza. Department of Computer Science Technical University of Cluj-Napoca Clustering Adrian Groza Department of Computer Science Technical University of Cluj-Napoca Outline 1 Cluster Analysis What is Datamining? Cluster Analysis 2 K-means 3 Hierarchical Clustering What is Datamining?

More information

A truly robust Expression analyzer

A truly robust Expression analyzer Genowiz A truly robust Expression analyzer Abstract Gene expression profiles of 10,000 tumor samples, disease classification, novel gene finding, linkage analysis, clinical profiling of diseases, finding

More information

Machine Learning using MapReduce

Machine Learning using MapReduce Machine Learning using MapReduce What is Machine Learning Machine learning is a subfield of artificial intelligence concerned with techniques that allow computers to improve their outputs based on previous

More information

Comparison of K-means and Backpropagation Data Mining Algorithms

Comparison of K-means and Backpropagation Data Mining Algorithms Comparison of K-means and Backpropagation Data Mining Algorithms Nitu Mathuriya, Dr. Ashish Bansal Abstract Data mining has got more and more mature as a field of basic research in computer science and

More information

Gene Expression Analysis

Gene Expression Analysis Gene Expression Analysis Jie Peng Department of Statistics University of California, Davis May 2012 RNA expression technologies High-throughput technologies to measure the expression levels of thousands

More information

Computer program review

Computer program review Journal of Vegetation Science 16: 355-359, 2005 IAVS; Opulus Press Uppsala. - Ginkgo, a multivariate analysis package - 355 Computer program review Ginkgo, a multivariate analysis package Bouxin, Guy Haute

More information

Method of Combining the Degrees of Similarity in Handwritten Signature Authentication Using Neural Networks

Method of Combining the Degrees of Similarity in Handwritten Signature Authentication Using Neural Networks Method of Combining the Degrees of Similarity in Handwritten Signature Authentication Using Neural Networks Ph. D. Student, Eng. Eusebiu Marcu Abstract This paper introduces a new method of combining the

More information

An Introduction to Data Mining. Big Data World. Related Fields and Disciplines. What is Data Mining? 2/12/2015

An Introduction to Data Mining. Big Data World. Related Fields and Disciplines. What is Data Mining? 2/12/2015 An Introduction to Data Mining for Wind Power Management Spring 2015 Big Data World Every minute: Google receives over 4 million search queries Facebook users share almost 2.5 million pieces of content

More information

Hierarchical Clustering Analysis

Hierarchical Clustering Analysis Hierarchical Clustering Analysis What is Hierarchical Clustering? Hierarchical clustering is used to group similar objects into clusters. In the beginning, each row and/or column is considered a cluster.

More information

Comparison of Non-linear Dimensionality Reduction Techniques for Classification with Gene Expression Microarray Data

Comparison of Non-linear Dimensionality Reduction Techniques for Classification with Gene Expression Microarray Data CMPE 59H Comparison of Non-linear Dimensionality Reduction Techniques for Classification with Gene Expression Microarray Data Term Project Report Fatma Güney, Kübra Kalkan 1/15/2013 Keywords: Non-linear

More information

International Journal of Computer Science Trends and Technology (IJCST) Volume 2 Issue 3, May-Jun 2014

International Journal of Computer Science Trends and Technology (IJCST) Volume 2 Issue 3, May-Jun 2014 RESEARCH ARTICLE OPEN ACCESS A Survey of Data Mining: Concepts with Applications and its Future Scope Dr. Zubair Khan 1, Ashish Kumar 2, Sunny Kumar 3 M.Tech Research Scholar 2. Department of Computer

More information

Chapter ML:XI (continued)

Chapter ML:XI (continued) Chapter ML:XI (continued) XI. Cluster Analysis Data Mining Overview Cluster Analysis Basics Hierarchical Cluster Analysis Iterative Cluster Analysis Density-Based Cluster Analysis Cluster Evaluation Constrained

More information

Mobile Phone APP Software Browsing Behavior using Clustering Analysis

Mobile Phone APP Software Browsing Behavior using Clustering Analysis Proceedings of the 2014 International Conference on Industrial Engineering and Operations Management Bali, Indonesia, January 7 9, 2014 Mobile Phone APP Software Browsing Behavior using Clustering Analysis

More information

Molecular Genetics: Challenges for Statistical Practice. J.K. Lindsey

Molecular Genetics: Challenges for Statistical Practice. J.K. Lindsey Molecular Genetics: Challenges for Statistical Practice J.K. Lindsey 1. What is a Microarray? 2. Design Questions 3. Modelling Questions 4. Longitudinal Data 5. Conclusions 1. What is a microarray? A microarray

More information

Time series experiments

Time series experiments Time series experiments Time series experiments Why is this a separate lecture: The price of microarrays are decreasing more time series experiments are coming Often a more complex experimental design

More information

Capturing Best Practice for Microarray Gene Expression Data Analysis

Capturing Best Practice for Microarray Gene Expression Data Analysis Capturing Best Practice for Microarray Gene Expression Data Analysis Gregory Piatetsky-Shapiro KDnuggets gps@kdnuggets.com Tom Khabaza SPSS tkhabaza@spss.com Sridhar Ramaswamy MIT / Whitehead Institute

More information

MIC - Detecting Novel Associations in Large Data Sets. by Nico Güttler, Andreas Ströhlein and Matt Huska

MIC - Detecting Novel Associations in Large Data Sets. by Nico Güttler, Andreas Ströhlein and Matt Huska MIC - Detecting Novel Associations in Large Data Sets by Nico Güttler, Andreas Ströhlein and Matt Huska Outline Motivation Method Results Criticism Conclusions Motivation - Goal Determine important undiscovered

More information

CITY UNIVERSITY OF HONG KONG 香 港 城 市 大 學. Self-Organizing Map: Visualization and Data Handling 自 組 織 神 經 網 絡 : 可 視 化 和 數 據 處 理

CITY UNIVERSITY OF HONG KONG 香 港 城 市 大 學. Self-Organizing Map: Visualization and Data Handling 自 組 織 神 經 網 絡 : 可 視 化 和 數 據 處 理 CITY UNIVERSITY OF HONG KONG 香 港 城 市 大 學 Self-Organizing Map: Visualization and Data Handling 自 組 織 神 經 網 絡 : 可 視 化 和 數 據 處 理 Submitted to Department of Electronic Engineering 電 子 工 程 學 系 in Partial Fulfillment

More information

STATISTICA. Clustering Techniques. Case Study: Defining Clusters of Shopping Center Patrons. and

STATISTICA. Clustering Techniques. Case Study: Defining Clusters of Shopping Center Patrons. and Clustering Techniques and STATISTICA Case Study: Defining Clusters of Shopping Center Patrons STATISTICA Solutions for Business Intelligence, Data Mining, Quality Control, and Web-based Analytics Table

More information

Quality Assessment in Spatial Clustering of Data Mining

Quality Assessment in Spatial Clustering of Data Mining Quality Assessment in Spatial Clustering of Data Mining Azimi, A. and M.R. Delavar Centre of Excellence in Geomatics Engineering and Disaster Management, Dept. of Surveying and Geomatics Engineering, Engineering

More information

Data Mining and Machine Learning in Bioinformatics

Data Mining and Machine Learning in Bioinformatics Data Mining and Machine Learning in Bioinformatics PRINCIPAL METHODS AND SUCCESSFUL APPLICATIONS Ruben Armañanzas http://mason.gmu.edu/~rarmanan Adapted from Iñaki Inza slides http://www.sc.ehu.es/isg

More information

Analysis of gene expression data. Ulf Leser and Philippe Thomas

Analysis of gene expression data. Ulf Leser and Philippe Thomas Analysis of gene expression data Ulf Leser and Philippe Thomas This Lecture Protein synthesis Microarray Idea Technologies Applications Problems Quality control Normalization Analysis next week! Ulf Leser:

More information

SoSe 2014: M-TANI: Big Data Analytics

SoSe 2014: M-TANI: Big Data Analytics SoSe 2014: M-TANI: Big Data Analytics Lecture 4 21/05/2014 Sead Izberovic Dr. Nikolaos Korfiatis Agenda Recap from the previous session Clustering Introduction Distance mesures Hierarchical Clustering

More information

270107 - MD - Data Mining

270107 - MD - Data Mining Coordinating unit: Teaching unit: Academic year: Degree: ECTS credits: 015 70 - FIB - Barcelona School of Informatics 715 - EIO - Department of Statistics and Operations Research 73 - CS - Department of

More information

An Overview of Knowledge Discovery Database and Data mining Techniques

An Overview of Knowledge Discovery Database and Data mining Techniques An Overview of Knowledge Discovery Database and Data mining Techniques Priyadharsini.C 1, Dr. Antony Selvadoss Thanamani 2 M.Phil, Department of Computer Science, NGM College, Pollachi, Coimbatore, Tamilnadu,

More information

Statistics Graduate Courses

Statistics Graduate Courses Statistics Graduate Courses STAT 7002--Topics in Statistics-Biological/Physical/Mathematics (cr.arr.).organized study of selected topics. Subjects and earnable credit may vary from semester to semester.

More information

A Partially Supervised Metric Multidimensional Scaling Algorithm for Textual Data Visualization

A Partially Supervised Metric Multidimensional Scaling Algorithm for Textual Data Visualization A Partially Supervised Metric Multidimensional Scaling Algorithm for Textual Data Visualization Ángela Blanco Universidad Pontificia de Salamanca ablancogo@upsa.es Spain Manuel Martín-Merino Universidad

More information

Data Mining and Neural Networks in Stata

Data Mining and Neural Networks in Stata Data Mining and Neural Networks in Stata 2 nd Italian Stata Users Group Meeting Milano, 10 October 2005 Mario Lucchini e Maurizo Pisati Università di Milano-Bicocca mario.lucchini@unimib.it maurizio.pisati@unimib.it

More information

Protein Protein Interaction Networks

Protein Protein Interaction Networks Functional Pattern Mining from Genome Scale Protein Protein Interaction Networks Young-Rae Cho, Ph.D. Assistant Professor Department of Computer Science Baylor University it My Definition of Bioinformatics

More information

DATA MINING CLUSTER ANALYSIS: BASIC CONCEPTS

DATA MINING CLUSTER ANALYSIS: BASIC CONCEPTS DATA MINING CLUSTER ANALYSIS: BASIC CONCEPTS 1 AND ALGORITHMS Chiara Renso KDD-LAB ISTI- CNR, Pisa, Italy WHAT IS CLUSTER ANALYSIS? Finding groups of objects such that the objects in a group will be similar

More information

Quality Assessment of Exon and Gene Arrays

Quality Assessment of Exon and Gene Arrays Quality Assessment of Exon and Gene Arrays I. Introduction In this white paper we describe some quality assessment procedures that are computed from CEL files from Whole Transcript (WT) based arrays such

More information

SPECIAL PERTURBATIONS UNCORRELATED TRACK PROCESSING

SPECIAL PERTURBATIONS UNCORRELATED TRACK PROCESSING AAS 07-228 SPECIAL PERTURBATIONS UNCORRELATED TRACK PROCESSING INTRODUCTION James G. Miller * Two historical uncorrelated track (UCT) processing approaches have been employed using general perturbations

More information

Manejo Basico del Servidor de Aplicaciones WebSphere Application Server 6.0

Manejo Basico del Servidor de Aplicaciones WebSphere Application Server 6.0 Manejo Basico del Servidor de Aplicaciones WebSphere Application Server 6.0 Ing. Juan Alfonso Salvia Arquitecto de Aplicaciones IBM Uruguay Slide 2 of 45 Slide 3 of 45 Instalacion Basica del Server La

More information

Analyzing microrna Data and Integrating mirna with Gene Expression Data in Partek Genomics Suite 6.6

Analyzing microrna Data and Integrating mirna with Gene Expression Data in Partek Genomics Suite 6.6 Analyzing microrna Data and Integrating mirna with Gene Expression Data in Partek Genomics Suite 6.6 Overview This tutorial outlines how microrna data can be analyzed within Partek Genomics Suite. Additionally,

More information

Data Mining Project Report. Document Clustering. Meryem Uzun-Per

Data Mining Project Report. Document Clustering. Meryem Uzun-Per Data Mining Project Report Document Clustering Meryem Uzun-Per 504112506 Table of Content Table of Content... 2 1. Project Definition... 3 2. Literature Survey... 3 3. Methods... 4 3.1. K-means algorithm...

More information

Cluster Analysis. Isabel M. Rodrigues. Lisboa, 2014. Instituto Superior Técnico

Cluster Analysis. Isabel M. Rodrigues. Lisboa, 2014. Instituto Superior Técnico Instituto Superior Técnico Lisboa, 2014 Introduction: Cluster analysis What is? Finding groups of objects such that the objects in a group will be similar (or related) to one another and different from

More information

IT services for analyses of various data samples

IT services for analyses of various data samples IT services for analyses of various data samples Ján Paralič, František Babič, Martin Sarnovský, Peter Butka, Cecília Havrilová, Miroslava Muchová, Michal Puheim, Martin Mikula, Gabriel Tutoky Technical

More information

INTELIGENCIA DE NEGOCIO CON SQL SERVER

INTELIGENCIA DE NEGOCIO CON SQL SERVER INTELIGENCIA DE NEGOCIO CON SQL SERVER Este curso de Microsoft e-learning está orientado a preparar a los alumnos en el desarrollo de soluciones de Business Intelligence con SQL Server. El curso consta

More information

Data Mining Analysis of HIV-1 Protease Crystal Structures

Data Mining Analysis of HIV-1 Protease Crystal Structures Data Mining Analysis of HIV-1 Protease Crystal Structures Gene M. Ko, A. Srinivas Reddy, Sunil Kumar, and Rajni Garg AP0907 09 Data Mining Analysis of HIV-1 Protease Crystal Structures Gene M. Ko 1, A.

More information

life science data mining

life science data mining life science data mining - '.)'-. < } ti» (>.:>,u» c ~'editors Stephen Wong Harvard Medical School, USA Chung-Sheng Li /BM Thomas J Watson Research Center World Scientific NEW JERSEY LONDON SINGAPORE.

More information

DeCyder Extended Data Analysis module Version 1.0

DeCyder Extended Data Analysis module Version 1.0 GE Healthcare DeCyder Extended Data Analysis module Version 1.0 Module for DeCyder 2D version 6.5 User Manual Contents 1 Introduction 1.1 Introduction... 7 1.2 The DeCyder EDA User Manual... 9 1.3 Getting

More information

Visualization and Clustering with High-dimensional Genomic Data

Visualization and Clustering with High-dimensional Genomic Data Visualization and Clustering with High-dimensional Genomic Data Zhenqiu Liu, PhD 10/14/2014 Agenda Data Visualization Big data and their visualization PCA and MDS for data visualization Clustering and

More information

Introduction to Pattern Recognition

Introduction to Pattern Recognition Introduction to Pattern Recognition Selim Aksoy Department of Computer Engineering Bilkent University saksoy@cs.bilkent.edu.tr CS 551, Spring 2009 CS 551, Spring 2009 c 2009, Selim Aksoy (Bilkent University)

More information

Cluster Analysis: Basic Concepts and Algorithms

Cluster Analysis: Basic Concepts and Algorithms Cluster Analsis: Basic Concepts and Algorithms What does it mean clustering? Applications Tpes of clustering K-means Intuition Algorithm Choosing initial centroids Bisecting K-means Post-processing Strengths

More information

Applications of Neural Network and Genetic Algorithm Data Mining Techniques in Bioinformatics Knowledge Discovery - A Preliminary Study

Applications of Neural Network and Genetic Algorithm Data Mining Techniques in Bioinformatics Knowledge Discovery - A Preliminary Study Applications of Neural Network and Genetic Algorithm Data Mining Techniques in Bioinformatics Knowledge Discovery - A Preliminary Study Richard S. Segall Arkansas State University Department of Computer

More information

HIGH DIMENSIONAL UNSUPERVISED CLUSTERING BASED FEATURE SELECTION ALGORITHM

HIGH DIMENSIONAL UNSUPERVISED CLUSTERING BASED FEATURE SELECTION ALGORITHM HIGH DIMENSIONAL UNSUPERVISED CLUSTERING BASED FEATURE SELECTION ALGORITHM Ms.Barkha Malay Joshi M.E. Computer Science and Engineering, Parul Institute Of Engineering & Technology, Waghodia. India Email:

More information

Machine Learning. 01 - Introduction

Machine Learning. 01 - Introduction Machine Learning 01 - Introduction Machine learning course One lecture (Wednesday, 9:30, 346) and one exercise (Monday, 17:15, 203). Oral exam, 20 minutes, 5 credit points. Some basic mathematical knowledge

More information

Microarray Data Mining: Puce a ADN

Microarray Data Mining: Puce a ADN Microarray Data Mining: Puce a ADN Recent Developments Gregory Piatetsky-Shapiro KDnuggets EGC 2005, Paris 2005 KDnuggets EGC 2005 Role of Gene Expression Cell Nucleus Chromosome Gene expression Protein

More information

CONTENTS PREFACE 1 INTRODUCTION 1 2 DATA VISUALIZATION 19

CONTENTS PREFACE 1 INTRODUCTION 1 2 DATA VISUALIZATION 19 PREFACE xi 1 INTRODUCTION 1 1.1 Overview 1 1.2 Definition 1 1.3 Preparation 2 1.3.1 Overview 2 1.3.2 Accessing Tabular Data 3 1.3.3 Accessing Unstructured Data 3 1.3.4 Understanding the Variables and Observations

More information

A comparison of various clustering methods and algorithms in data mining

A comparison of various clustering methods and algorithms in data mining Volume :2, Issue :5, 32-36 May 2015 www.allsubjectjournal.com e-issn: 2349-4182 p-issn: 2349-5979 Impact Factor: 3.762 R.Tamilselvi B.Sivasakthi R.Kavitha Assistant Professor A comparison of various clustering

More information

Cluster analysis Cosmin Lazar. COMO Lab VUB

Cluster analysis Cosmin Lazar. COMO Lab VUB Cluster analysis Cosmin Lazar COMO Lab VUB Introduction Cluster analysis foundations rely on one of the most fundamental, simple and very often unnoticed ways (or methods) of understanding and learning,

More information

Clustering. Data Mining. Abraham Otero. Data Mining. Agenda

Clustering. Data Mining. Abraham Otero. Data Mining. Agenda Clustering 1/46 Agenda Introduction Distance K-nearest neighbors Hierarchical clustering Quick reference 2/46 1 Introduction It seems logical that in a new situation we should act in a similar way as in

More information

Data Acquisition. DNA microarrays. The functional genomics pipeline. Experimental design affects outcome data analysis

Data Acquisition. DNA microarrays. The functional genomics pipeline. Experimental design affects outcome data analysis Data Acquisition DNA microarrays The functional genomics pipeline Experimental design affects outcome data analysis Data acquisition microarray processing Data preprocessing scaling/normalization/filtering

More information

6.2.8 Neural networks for data mining

6.2.8 Neural networks for data mining 6.2.8 Neural networks for data mining Walter Kosters 1 In many application areas neural networks are known to be valuable tools. This also holds for data mining. In this chapter we discuss the use of neural

More information

Tables of significant values of Jaccard's index of similarity

Tables of significant values of Jaccard's index of similarity Miscel.lania Zooloqica 22.1 (1999) Tables of significant values of Jaccard's index of similarity R. Real Real, R., 1999. Tables of significant values of Jaccard's index of similarity. Misc. Zool., 22.1:

More information

PERFORMANCE ANALYSIS OF CLUSTERING ALGORITHMS IN DATA MINING IN WEKA

PERFORMANCE ANALYSIS OF CLUSTERING ALGORITHMS IN DATA MINING IN WEKA PERFORMANCE ANALYSIS OF CLUSTERING ALGORITHMS IN DATA MINING IN WEKA Prakash Singh 1, Aarohi Surya 2 1 Department of Finance, IIM Lucknow, Lucknow, India 2 Department of Computer Science, LNMIIT, Jaipur,

More information

LINIO COLOMBIA. Starting-Up & Leading E-Commerce. www.linio.com.co. Luca Ranaldi, CEO. Pedro Freire, VP Marketing and Business Development

LINIO COLOMBIA. Starting-Up & Leading E-Commerce. www.linio.com.co. Luca Ranaldi, CEO. Pedro Freire, VP Marketing and Business Development LINIO COLOMBIA Starting-Up & Leading E-Commerce Luca Ranaldi, CEO Pedro Freire, VP Marketing and Business Development 22 de Agosto 2013 www.linio.com.co QUÉ ES LINIO? Linio es la tienda online #1 en Colombia

More information

Biomedicine The background. The main interest. The tools

Biomedicine The background. The main interest. The tools 1 Biomedicine The background The main interest? Bioinformatics Clinical informatics The tools 2 Outline 3 Outline 4 Working on Network Data Analysis HH RR Infrastructure Training BIOCOMPUTATION & STRUCTURAL

More information

Segmentation of stock trading customers according to potential value

Segmentation of stock trading customers according to potential value Expert Systems with Applications 27 (2004) 27 33 www.elsevier.com/locate/eswa Segmentation of stock trading customers according to potential value H.W. Shin a, *, S.Y. Sohn b a Samsung Economy Research

More information

Introduction to Clustering

Introduction to Clustering Introduction to Clustering Yumi Kondo Student Seminar LSK301 Sep 25, 2010 Yumi Kondo (University of British Columbia) Introduction to Clustering Sep 25, 2010 1 / 36 Microarray Example N=65 P=1756 Yumi

More information

Clustering and Visualising Microarray Data Patterns

Clustering and Visualising Microarray Data Patterns Francisco Azuaje is a reader at the University of Ulster. His research involves data mining and integration applications in biosciences. He is on the editorial board of the IEEE Transactions on Nanobioscience

More information