The data. Introducción al análisis de datos en microarrays ... Characteristics of the data: Universidad Complutense de Madrid ESCUELA DE VERANO 2007
|
|
- Damian Marshall
- 8 years ago
- Views:
Transcription
1 Universidad Complutense de Madrid ESCUELA DE VERANO 2007 Universidad Complutense de Madrid ESCUELA DE VERANO 2007 Introducción al análisis de datos en microarrays 1. Introducción 2. Microarrays (tipos, tratamiento de las muestras e hibridación, bases de datos, aplicaciones ) 3. Normalizacion y análisis exploratorio (JC Oliveros) 4. Análisis de datos (Análisis supervisado y no supervisado, algoritmos ) 5. Práctica Gonzalo Gómez López INB-CNIO ggomez@cnio.es Madrid, 26 de Julio, 2007 Gonzalo Gómez López ggomez@cnio.es Genes (thousands)... A B C Experimental conditions (from tens up to no more than a few hundreds) The data Different classes of experimental conditions, e.g. Cancer types, tissues, drug treatments, time survival, etc. Expression profile of all the genes for a experimental condition (1 array) Expression profile of a gene across the experimental conditions Characteristics of the data: Low signal to noise ratio High redundancy and intra-gene correlations Most of the genes are not informative with respect to the trait we are studying (physiological conditions, etc.) Many genes have no annotations!! Source: J. Dopazo Unsupervised methods Brief history in microarray data analysis: Supervised methods (predictors) 1. Clustering methods Neural networks (Khan et al. 2001) UPGMA (Sneath and Sokal, 1973) SOMs (Kohonen et al. 1984) kmeans (Hartigan and Wong 1979) knn (Ripley 1996; Hastie et al 2001) kmedians (Hartigan and Wong 1979) PAM (Tibshirani et al. 2002) SOTA (Herrero et al. 2001) DLDA (Dudoit et al. 2002) SOM (Kohonen 1979, Tamayo et al 1999) SVMs (Furey et al, 2000) Gene Shaving (Hastie et al, 2000) Fuzzy methods (Dougherty ET AL. 2002) Probabilistic (Bhattacharjee et al. 2001) Metagenes (Pittman et al, 2004) 2. Exploratory analysis parametric: ttest, SAM (Tusher et al, 2001), ANOVA non parametric: Welch ttest, Wilcoxon, Kruskal Wallis Sumarizing datasets: PCA 3. Blocks of genes Kolmogorov-Smirnof: GSEA (Subramanian et al. 2005) FatiScan (Al-Shahrour et al. 2005) SAM 3.0 (Jan 07) Gonzalo Gómez GeneTrail López (Backes et al. 2007)
2 Overexpression EJEMPLO or No Change Underexpression Image analysis comparison (normalization and filtering) Data analysis (supervised, unsupervised ) Ratios (or not ) Log 2 transform Underexpression Overexpression x1/2 x2 Why Log 2 transform?? QUESTIONS & METHODOLOGICAL APPROACH Ratio scale Log scale (lineal) Can we find groups of experiments with similar gene expression profiles? Different phenotypes... Unsupervised analysis Supervised analysis Reverse engineering Molecular classification of samples What genes are responsible for? Co-expressing genes... What do they have in common? No Log transform After Log transform What profile(s) do they display? and... Genes interacting in a network (A,B,C..)... Genes of a class Are there more genes? How is the network? A B E C D Source: J. Dopazo
3 Algorithms & Reminding Examples Algorithms Molecular classification of cancer Known class prediction (Supervised data analysis).? Hierarchical Neural networks PCA SVM A set of instructions for solving a problem. When the instructions are followed, it must eventually stop with an answer. Computer Science Computational geometry Image processing Design Real-time simulations Economy Ecomomy models Prediction & simulation Applications A.I. Discovering SOTA class SOMs (Unsupervised data analysis).? k-means UPGMA Engineering Brief history in microarray data analysis: Unsupervised methods Supervised methods (predictors) Neural networks (Khan et al. 2001) 1. Clustering methods SOMs (Kohonen et al. 1984) UPGMA (Sneath and Sokal, 1973) knn (Ripley 1996; Hastie et al 2001) kmeans (Hartigan and Wong 1979) PAM (Tibshirani et al. 2002) kmedians (Hartigan and Wong 1979) DLDA (Dudoit et al. 2002) SOTA (Herrero et al. 2001) SVMs (Furey et al, 2000) SOM (Kohonen 1979, Tamayo et al 1999) Gene Shaving (Hastie et al, 2000) Fuzzy methods (Dougherty ET AL. 2002) Probabilistic (Bhattacharjee et al. 2001) Metagenes (Pittman et al, 2004) 2. Exploratory analysis parametric: ttest, SAM (Tusher et al, 2001), ANOVA non parametric: Welch ttest, Wilcoxon, Kruskal Wallis Sumarizing datasets: PCA 3. Blocks of genes Kolmogorov-Smirnof: GSEA (Subramanian et al. 2005) FatiScan (Al-Shahrour et al. 2005) SAM 3.0 (Jan 07) GeneTrail Gonzalo Gómez López (Backes et al. 2007) Science Cosmology & radio astronomy Condensed matter physics 3D proteins design Mathematics N items Cryptography Cipher & decipher codes Patterns recognition Artificial vision Applications and tasks Stochastic systems evolution Proteins folding prediction Biological evolution Atomistic resolution Nanomaterials Communications Optimizations Meteorology Simulation & prediction CLUSTERING Clustering??? K groups
4 Algorithms & Algorithms & Agglomerative Hierarchical algorithms Different methods UPGMA Divisive SOTA Non-hierarchical algorithms Clustering algorithm 26 items 4 groups Why different algorithms? Y C1 C C4 K-means SOMs Data analysis in microarrays Hierarchical & dendrograms I) Cartesian algorithm (x, y):...c3... a) (X>0,Y>0)! C3, 1/4 C2? b) (X<0,Y>0)! C1, 1/4 C2? c) (X<0,Y<0)! C4?, 1/4 C2? d) (X>0,Y<0)! C4?, 1/4 C2? aubucud! C1, C2, C3, C4. x II) Polar algorithm (", m): There s no a perfect method e) ("=cte, m)! C1... f) (", m=cte)! C2, C3?... g) (", m)! C1, C2, C3, C4. Different methods Hierarchical algorithms Non-hierarchical algorithms K-means SOMs Etc Others Agglomerative UPGMA Divisive SOTA Artificial Neural Networks (ANNs) SVM PCA
5 UPGMA Unweighted Pair Group Method with Arithmetic mean Distances between genes?? Metrics Example Pearson Correlation Coefficient Gene 1 1) B y C son los dos genes más próximos (en sus medias) 2) B y C se unen y se recalcula la matriz de distancias empleando el nuevo cluster en vez de B y C. A y D son ahora los mas próximos. 3) A y D tambien se unen. Se reordenan los elementos para ajustar la topología del árbol. Se recalcula la matriz de distancias entre los grupos existentes 4) El proceso termina cuando el dendrograma esta construído. Exp 1 Exp 2 Exp 3 Euclidean distance Pearson correlation coefficient Spearman! correlation coefficient RNA express Gene RNA express Gene RNA express Gene Gene 2 Gene Gene 1 Gene 2 Gene 3 DISTANCIA B C A Dendrograms A EUCLIDEANA Linkage methods Niveles de Expresión B C t CORRELACIÓN CITY BLOCK (Manhattan or taxi cab) OTRAS (Canberra, Bray-Curtis ) PEARSON B A C CLUSTER 1 A C B D E F G CLUSTER 3 CLUSTER 2 Average-linkage method Single-linkage method Complete-linkage method A B C D E F G E D F G A B C G F E D C B A Others SPEARMAN (Weighted pair-group average, Within-groups Ward s method )
6 Ejemplos Hierarchical - UPGMA En nuestra práctica: GEPAS? Garber M. E. et al. PNAS, vol.98, nº24; SOTA Data analysis in microarrays Self-Organizing Tree Algorithm Hierarchical & dendrograms Different methods Hierarchical algorithms Non-hierarchical algorithms K-means SOMs Etc Others - Artificial Neural Network - Growth from the root of the tree, toward the leaves (from lower to higher resolution ) -Threshold of resource value should be fixed -Generation of a hierarchical cluster structure at the desired level of resolution Agglomerative UPGMA Divisive SOTA Artificial Neural Networks (ANNs) SVM PCA Herrero et al. Bioinformatics. 17:
7 SOTA Self-Organizing Tree Algorithm SOTA Self-Organizing Tree Algorithm Number of genes Average profiles Un ejemplo En nuestra práctica: GEPAS K-means Data analysis in microarrays K-means & Self organizing-maps (SOMs) Gene expression Time Gene expression 1 2 Time3 Gene expression Different methods Hierarchical algorithms Non-hierarchical algorithms K-means SOMs Etc Others Agglomerative UPGMA SOTA Divisive Artificial Neural Networks (ANNs) SVM PCA 4 Time5 6
8 K-means K-means Example More information: K-means Example 2 K= 12 d te s a l Re ene g Self-organizing maps (SOMs) or Kohonen Maps SOMs Geometric dependence, i.e. 4 x 3 A B C D exp t Kohonen T. Proc IEEE 78(9): ,1990 Tamayo P. et al. PNAS, 96: ; 1999.
9 Self-organizing maps (SOMs) Unsupervised methods Brief history in microarray data analysis: Supervised methods (predictors) EJEMPLO Indices de bienestar mundiales 1. Clustering methods Neural networks (Khan et al. 2001) UPGMA (Sneath and Sokal, 1973) SOMs (Kohonen et al. 1984) kmeans (Hartigan and Wong 1979) knn (Ripley 1996; Hastie et al 2001) kmedians (Hartigan and Wong 1979) PAM (Tibshirani et al. 2002) SOTA (Herrero et al. 2001) DLDA (Dudoit et al. 2002) SOM (Kohonen 1979, Tamayo et al 1999) SVMs (Furey et al, 2000) Gene Shaving (Hastie et al, 2000) Fuzzy methods (Dougherty ET AL. 2002) Probabilistic (Bhattacharjee et al. 2001) Metagenes (Pittman et al, 2004) 2. Exploratory analysis parametric: ttest, SAM (Tusher et al, 2001), ANOVA non parametric: Welch ttest, Wilcoxon, Kruskal Wallis Sumarizing datasets: PCA 3. Blocks of genes Kolmogorov-Smirnof: GSEA (Subramanian et al. 2005) FatiScan (Al-Shahrour et al. 2005) SAM 3.0 (Jan 07) Gonzalo Gómez GeneTrail López (Backes et al. 2007) Class1 Class2 Exploratory analysis Currently testing block of genes ~ biological pathways FDR< testing genes independently... GSEA (Subramanian et al. PNAS ) Gene Set Enrichment Analysis Broad Institute v 2.0 available since Jan 2007 New version includes Biocarta, Broad Institute, GeneMAPP, KEGG annotations and more... Platforms: Affymetrix, Agilent, CodeLink, 2-color... ttest cut-off GSEA applies Kolmogorov-Smirnof test to find assymmetrical distributions for defined blocks of genes in datasets whole distribution. FDR<0.05 Biological meaning?
10 - Class1 Class2 Gene Set 1 Gene Set 2 Gene Set 3 Now analizing block of genes ~ biological pathways FatiScan Gene set 3 enriched in Class 2 Al-Shahrour et al. Bioinformatics Jul 1;21(13): Available in Babelomics suite: ES/NES statistic ttest cut-off Segmentation test for searching asymmetries More statistically sensitive than GSEA Few annotations implemented En nuestra practica: Gene set 2 enriched in Class 1 + Datamining Mathematics methods for biology A di sc us si on Tengo mis datos... He analizado su estructura y relevancia estadística Qué es, que hace y que sabemos de este gen? Que hay en este cluster?? Cell cycle... DBs Information Current math s methods are useful to explain real and meaningful processes from a biological point of view? mmm actually we don t know sometimes they are sometimes not New integrative methods should be developed Clustering,estadística, etc Source: J. Dopazo Datamining Links
11 Datamining Para profundizar Data Analysis Tools for DNA Microarrays by Sorin Draghic DNA Microarrays and Gene Expression : From Experiments to Data Analysis and Modeling by Pierre Baldi, G. Wesley Hatfield, Wesley G. Hatfield Exploration and Analysis of DNA Microarray and Protein Array Data (Wiley Series in Probability and Statistics) by Dhammika Amaratunga, Javier Cabrera And good luck NetAffx ANSWER Your genes, hypothesis, questions Statistics for Microarrays : Design, Analysis and Inference by Ernst Wit, John McClure Analyzing Microarray Gene Expression Data (Wiley Series in Probability and Statistics) by Geoffrey J. McLachlan, Kim-Anh Do, Christophe Ambroise En nuestra práctica: Bioinformatics and Computational Biology Solutions Using R and Bioconductor (Statistics for Biology and Health) by Robert Gentleman, Vincent Carey, Wolfgang Huber, Rafael Irizarry, Sandrine Dudoit Microarray Gene Expression Data Analysis: A Beginner's Guide by Helen Causton Windows Mac OS X LINUX/UNIX
12 Gracias!! Universidad Complutense de Madrid ESCUELA DE VERANO Introducción Microarrays (tipos, tratamiento de las muestras e hibridación, bases de datos, aplicaciones ) Normalización y análisis exploratorio (JC Oliveros) Análisis de datos (Análisis supervisado y no supervisado, algoritmos ) Práctica Gonzalo Gómez López ggomez@cnio.es Gonzalo Gómez López ggomez@cnio.es Otras herramientas gratuitas: -BASE BRBTools -TM4(MeV) -Otras SNOMAD, DAVID, MIDAW
13 INPUT FILE FORMAT (preprocessing)
14
15 Inf.cluster/genes Expandir arbol
16 SOTA Inf.cluster/genes SOMs FatiScan Inf.cluster/genes
17 FDR!!! FatiScan Output FatiScan output Enriched in A Enriched in B % Genes with the specific GO annotation for each partition
Course on Functional Analysis. ::: Gene Set Enrichment Analysis - GSEA -
Course on Functional Analysis ::: Madrid, June 31st, 2007. Gonzalo Gómez, PhD. ggomez@cnio.es Bioinformatics Unit CNIO ::: Contents. 1. Introduction. 2. GSEA Software 3. Data Formats 4. Using GSEA 5. GSEA
More informationCOMPUTATIONAL ANALYSIS OF MICROARRAY DATA
COMPUTATIONAL ANALYSIS OF MICROARRAY DATA John Quackenbush Microarray experiments are providing unprecedented quantities of genome-wide data on gene-expression patterns. Although this technique has been
More informationStatistical issues in the analysis of microarray data
Statistical issues in the analysis of microarray data Daniel Gerhard Institute of Biostatistics Leibniz University of Hannover ESNATS Summerschool, Zermatt D. Gerhard (LUH) Analysis of microarray data
More informationAIMS ESSAY 2006 Overview of Tools for Microarray Data Analysis and Comparison Analysis
AIMS ESSAY 2006 Overview of Tools for Microarray Data Analysis and Comparison Analysis By Chodziwadziwa Whiteson Kabudula Supervisors: Bajic, Vladimir B. (Prof) and Hofmann, Oliver (Dr) (South African
More informationDistances, Clustering, and Classification. Heatmaps
Distances, Clustering, and Classification Heatmaps 1 Distance Clustering organizes things that are close into groups What does it mean for two genes to be close? What does it mean for two samples to be
More informationGene expression analysis. Ulf Leser and Karin Zimmermann
Gene expression analysis Ulf Leser and Karin Zimmermann Ulf Leser: Bioinformatics, Wintersemester 2010/2011 1 Last lecture What are microarrays? - Biomolecular devices measuring the transcriptome of a
More informationSoftware and Methods for the Analysis of Affymetrix GeneChip Data. Rafael A Irizarry Department of Biostatistics Johns Hopkins University
Software and Methods for the Analysis of Affymetrix GeneChip Data Rafael A Irizarry Department of Biostatistics Johns Hopkins University Outline Overview Bioconductor Project Examples 1: Gene Annotation
More informationData Mining Techniques for DNA Microarray Data
Data Mining Techniques for DNA Microarray Data Miguel Rocha Isabel Rocha CCTC / CEB Universidade do Minho PRESENTATION STRUCTURE SCOPE Transcription analysis is the most mature highthroughput analytical
More informationAGILENT S BIOINFORMATICS ANALYSIS SOFTWARE
ACCELERATING PROGRESS IS IN OUR GENES AGILENT S BIOINFORMATICS ANALYSIS SOFTWARE GENESPRING GENE EXPRESSION (GX) MASS PROFILER PROFESSIONAL (MPP) PATHWAY ARCHITECT (PA) See Deeper. Reach Further. BIOINFORMATICS
More informationThey can be obtained in HQJHQH format directly from the home page at: http://www.engene.cnb.uam.es/downloads/kobayashi.dat
HQJHQH70 *XLGHG7RXU This document contains a Guided Tour through the HQJHQH platform and it was created for training purposes with respect to the system options and analysis possibilities. It is not intended
More informationSELDI-TOF Mass Spectrometry Protein Data By Huong Thi Dieu La
SELDI-TOF Mass Spectrometry Protein Data By Huong Thi Dieu La References Alejandro Cruz-Marcelo, Rudy Guerra, Marina Vannucci, Yiting Li, Ching C. Lau, and Tsz-Kwong Man. Comparison of algorithms for pre-processing
More informationMicroarray Data Analysis and Mining Tools
www.bioinformation.net Hypothesis Volume 6(3) Microarray Data Analysis and Mining Tools Saravanakumar Selvaraj, Jeyakumar Natarajan* Data Mining and Text Mining Laboratory, Department of Bioinformatics,
More informationExploratory data analysis approaches unsupervised approaches. Steven Kiddle With thanks to Richard Dobson and Emanuele de Rinaldis
Exploratory data analysis approaches unsupervised approaches Steven Kiddle With thanks to Richard Dobson and Emanuele de Rinaldis Lecture overview Page 1 Ø Background Ø Revision Ø Other clustering methods
More informationSoftware reviews. Expression Pro ler: A suite of web-based tools for the analysis of microarray gene expression data
Expression Pro ler: A suite of web-based tools for the analysis of microarray gene expression data DNA microarray analysis 1±3 has become one of the most widely used tools for the analysis of gene expression
More informationPattern Recognition Techniques in Microarray Data Analysis: A Survey
Pattern Recognition Techniques in Microarray Data Analysis: A Survey Faramarz Valafar Department of Computer Science, San Diego State University 5500 Campanile Drive, San Diego, CA 92182, USA faramarz@sciences.sdsu.edu
More informationTutorial for proteome data analysis using the Perseus software platform
Tutorial for proteome data analysis using the Perseus software platform Laboratory of Mass Spectrometry, LNBio, CNPEM Tutorial version 1.0, January 2014. Note: This tutorial was written based on the information
More informationMultiExperiment Viewer Quickstart Guide
MultiExperiment Viewer Quickstart Guide Table of Contents: I. Preface - 2 II. Installing MeV - 2 III. Opening a Data Set - 2 IV. Filtering - 6 V. Clustering a. HCL - 8 b. K-means - 11 VI. Modules a. T-test
More informationNeural Networks Lesson 5 - Cluster Analysis
Neural Networks Lesson 5 - Cluster Analysis Prof. Michele Scarpiniti INFOCOM Dpt. - Sapienza University of Rome http://ispac.ing.uniroma1.it/scarpiniti/index.htm michele.scarpiniti@uniroma1.it Rome, 29
More informationA Study of Web Log Analysis Using Clustering Techniques
A Study of Web Log Analysis Using Clustering Techniques Hemanshu Rana 1, Mayank Patel 2 Assistant Professor, Dept of CSE, M.G Institute of Technical Education, Gujarat India 1 Assistant Professor, Dept
More informationData Mining in Bioinformatics Day 8: Clustering in Bioinformatics Clustering Gene Expression Data
Data Mining in Bioinformatics Day 8: Clustering in Bioinformatics Clustering Gene Expression Data Chloé-Agathe Azencott & Karsten Borgwardt February 10 to February 21, 2014 Machine Learning & Computational
More informationCluster 3.0 Manual. for Windows, Mac OS X, Linux, Unix. Michael Eisen; updated by Michiel de Hoon
Cluster 3.0 Manual for Windows, Mac OS X, Linux, Unix Michael Eisen; updated by Michiel de Hoon Software copyright c Stanford University 1998-99 This manual was originally written by Michael Eisen. It
More informationUsing Data Mining for Mobile Communication Clustering and Characterization
Using Data Mining for Mobile Communication Clustering and Characterization A. Bascacov *, C. Cernazanu ** and M. Marcu ** * Lasting Software, Timisoara, Romania ** Politehnica University of Timisoara/Computer
More informationAdvanced Web Usage Mining Algorithm using Neural Network and Principal Component Analysis
Advanced Web Usage Mining Algorithm using Neural Network and Principal Component Analysis Arumugam, P. and Christy, V Department of Statistics, Manonmaniam Sundaranar University, Tirunelveli, Tamilnadu,
More informationdiagnosis through Random
Convegno Calcolo ad Alte Prestazioni "Biocomputing" Bio-molecular diagnosis through Random Subspace Ensembles of Learning Machines Alberto Bertoni, Raffaella Folgieri, Giorgio Valentini DSI Dipartimento
More informationUnsupervised learning: Clustering
Unsupervised learning: Clustering Salissou Moutari Centre for Statistical Science and Operational Research CenSSOR 17 th September 2013 Unsupervised learning: Clustering 1/52 Outline 1 Introduction What
More informationAn Introduction to Microarray Data Analysis
Chapter An Introduction to Microarray Data Analysis M. Madan Babu Abstract This chapter aims to provide an introduction to the analysis of gene expression data obtained using microarray experiments. It
More informationLecture/Recitation Topic SMA 5303 L1 Sampling and statistical distributions
SMA 50: Statistical Learning and Data Mining in Bioinformatics (also listed as 5.077: Statistical Learning and Data Mining ()) Spring Term (Feb May 200) Faculty: Professor Roy Welsch Wed 0 Feb 7:00-8:0
More informationHow To Cluster
Data Clustering Dec 2nd, 2013 Kyrylo Bessonov Talk outline Introduction to clustering Types of clustering Supervised Unsupervised Similarity measures Main clustering algorithms k-means Hierarchical Main
More informationMethods for assessing reproducibility of clustering patterns
Methods for assessing reproducibility of clustering patterns observed in analyses of microarray data Lisa M. McShane 1,*, Michael D. Radmacher 1, Boris Freidlin 1, Ren Yu 2, 3, Ming-Chung Li 2 and Richard
More informationData Mining in Bioinformatics - A Review
Data Mining in Bioinformatics Day 7: Clustering in Bioinformatics Clustering Gene Expression Data Chloé-Agathe Azencott & Karsten Borgwardt February 18 to March 1, 2013 Machine Learning & Computational
More informationA Review of Anomaly Detection Techniques in Network Intrusion Detection System
A Review of Anomaly Detection Techniques in Network Intrusion Detection System Dr.D.V.S.S.Subrahmanyam Professor, Dept. of CSE, Sreyas Institute of Engineering & Technology, Hyderabad, India ABSTRACT:In
More informationExample: Document Clustering. Clustering: Definition. Notion of a Cluster can be Ambiguous. Types of Clusterings. Hierarchical Clustering
Overview Prognostic Models and Data Mining in Medicine, part I Cluster Analsis What is Cluster Analsis? K-Means Clustering Hierarchical Clustering Cluster Validit Eample: Microarra data analsis 6 Summar
More informationAnalysis of the colorectal tumor microenvironment using integrative bioinformatic tools
MLECNIK Bernhard & BINDEA Gabriela Analysis of the colorectal tumor microenvironment using integrative bioinformatic tools INSERM U872, Jérôme Galon Team15: Integrative Cancer Immunology Cordeliers Research
More informationUNSUPERVISED MACHINE LEARNING TECHNIQUES IN GENOMICS
UNSUPERVISED MACHINE LEARNING TECHNIQUES IN GENOMICS Dwijesh C. Mishra I.A.S.R.I., Library Avenue, New Delhi-110 012 dcmishra@iasri.res.in What is Learning? "Learning denotes changes in a system that enable
More informationBIOINF 525 Winter 2016 Foundations of Bioinformatics and Systems Biology http://tinyurl.com/bioinf525-w16
Course Director: Dr. Barry Grant (DCM&B, bjgrant@med.umich.edu) Description: This is a three module course covering (1) Foundations of Bioinformatics, (2) Statistics in Bioinformatics, and (3) Systems
More informationAn unsupervised fuzzy ensemble algorithmic scheme for gene expression data analysis
An unsupervised fuzzy ensemble algorithmic scheme for gene expression data analysis Roberto Avogadri 1, Giorgio Valentini 1 1 DSI, Dipartimento di Scienze dell Informazione, Università degli Studi di Milano,Via
More informationARTIFICIAL INTELLIGENCE (CSCU9YE) LECTURE 6: MACHINE LEARNING 2: UNSUPERVISED LEARNING (CLUSTERING)
ARTIFICIAL INTELLIGENCE (CSCU9YE) LECTURE 6: MACHINE LEARNING 2: UNSUPERVISED LEARNING (CLUSTERING) Gabriela Ochoa http://www.cs.stir.ac.uk/~goc/ OUTLINE Preliminaries Classification and Clustering Applications
More informationFecha: 04/12/2010. New standard treatment for breast cancer at early stages established
New standard treatment for breast cancer at early stages established London December 04, 2010 12:01:13 AM IST Spanish Oncology has established a new standard treatment for breast cancer at early stages,
More informationClustering. 15-381 Artificial Intelligence Henry Lin. Organizing data into clusters such that there is
Clustering 15-381 Artificial Intelligence Henry Lin Modified from excellent slides of Eamonn Keogh, Ziv Bar-Joseph, and Andrew Moore What is Clustering? Organizing data into clusters such that there is
More informationClustering. Adrian Groza. Department of Computer Science Technical University of Cluj-Napoca
Clustering Adrian Groza Department of Computer Science Technical University of Cluj-Napoca Outline 1 Cluster Analysis What is Datamining? Cluster Analysis 2 K-means 3 Hierarchical Clustering What is Datamining?
More informationA truly robust Expression analyzer
Genowiz A truly robust Expression analyzer Abstract Gene expression profiles of 10,000 tumor samples, disease classification, novel gene finding, linkage analysis, clinical profiling of diseases, finding
More informationMachine Learning using MapReduce
Machine Learning using MapReduce What is Machine Learning Machine learning is a subfield of artificial intelligence concerned with techniques that allow computers to improve their outputs based on previous
More informationComparison of K-means and Backpropagation Data Mining Algorithms
Comparison of K-means and Backpropagation Data Mining Algorithms Nitu Mathuriya, Dr. Ashish Bansal Abstract Data mining has got more and more mature as a field of basic research in computer science and
More informationGene Expression Analysis
Gene Expression Analysis Jie Peng Department of Statistics University of California, Davis May 2012 RNA expression technologies High-throughput technologies to measure the expression levels of thousands
More informationComputer program review
Journal of Vegetation Science 16: 355-359, 2005 IAVS; Opulus Press Uppsala. - Ginkgo, a multivariate analysis package - 355 Computer program review Ginkgo, a multivariate analysis package Bouxin, Guy Haute
More informationMethod of Combining the Degrees of Similarity in Handwritten Signature Authentication Using Neural Networks
Method of Combining the Degrees of Similarity in Handwritten Signature Authentication Using Neural Networks Ph. D. Student, Eng. Eusebiu Marcu Abstract This paper introduces a new method of combining the
More informationAn Introduction to Data Mining. Big Data World. Related Fields and Disciplines. What is Data Mining? 2/12/2015
An Introduction to Data Mining for Wind Power Management Spring 2015 Big Data World Every minute: Google receives over 4 million search queries Facebook users share almost 2.5 million pieces of content
More informationHierarchical Clustering Analysis
Hierarchical Clustering Analysis What is Hierarchical Clustering? Hierarchical clustering is used to group similar objects into clusters. In the beginning, each row and/or column is considered a cluster.
More informationComparison of Non-linear Dimensionality Reduction Techniques for Classification with Gene Expression Microarray Data
CMPE 59H Comparison of Non-linear Dimensionality Reduction Techniques for Classification with Gene Expression Microarray Data Term Project Report Fatma Güney, Kübra Kalkan 1/15/2013 Keywords: Non-linear
More informationInternational Journal of Computer Science Trends and Technology (IJCST) Volume 2 Issue 3, May-Jun 2014
RESEARCH ARTICLE OPEN ACCESS A Survey of Data Mining: Concepts with Applications and its Future Scope Dr. Zubair Khan 1, Ashish Kumar 2, Sunny Kumar 3 M.Tech Research Scholar 2. Department of Computer
More informationChapter ML:XI (continued)
Chapter ML:XI (continued) XI. Cluster Analysis Data Mining Overview Cluster Analysis Basics Hierarchical Cluster Analysis Iterative Cluster Analysis Density-Based Cluster Analysis Cluster Evaluation Constrained
More informationMobile Phone APP Software Browsing Behavior using Clustering Analysis
Proceedings of the 2014 International Conference on Industrial Engineering and Operations Management Bali, Indonesia, January 7 9, 2014 Mobile Phone APP Software Browsing Behavior using Clustering Analysis
More informationMolecular Genetics: Challenges for Statistical Practice. J.K. Lindsey
Molecular Genetics: Challenges for Statistical Practice J.K. Lindsey 1. What is a Microarray? 2. Design Questions 3. Modelling Questions 4. Longitudinal Data 5. Conclusions 1. What is a microarray? A microarray
More informationTime series experiments
Time series experiments Time series experiments Why is this a separate lecture: The price of microarrays are decreasing more time series experiments are coming Often a more complex experimental design
More informationCapturing Best Practice for Microarray Gene Expression Data Analysis
Capturing Best Practice for Microarray Gene Expression Data Analysis Gregory Piatetsky-Shapiro KDnuggets gps@kdnuggets.com Tom Khabaza SPSS tkhabaza@spss.com Sridhar Ramaswamy MIT / Whitehead Institute
More informationMIC - Detecting Novel Associations in Large Data Sets. by Nico Güttler, Andreas Ströhlein and Matt Huska
MIC - Detecting Novel Associations in Large Data Sets by Nico Güttler, Andreas Ströhlein and Matt Huska Outline Motivation Method Results Criticism Conclusions Motivation - Goal Determine important undiscovered
More informationCITY UNIVERSITY OF HONG KONG 香 港 城 市 大 學. Self-Organizing Map: Visualization and Data Handling 自 組 織 神 經 網 絡 : 可 視 化 和 數 據 處 理
CITY UNIVERSITY OF HONG KONG 香 港 城 市 大 學 Self-Organizing Map: Visualization and Data Handling 自 組 織 神 經 網 絡 : 可 視 化 和 數 據 處 理 Submitted to Department of Electronic Engineering 電 子 工 程 學 系 in Partial Fulfillment
More informationSTATISTICA. Clustering Techniques. Case Study: Defining Clusters of Shopping Center Patrons. and
Clustering Techniques and STATISTICA Case Study: Defining Clusters of Shopping Center Patrons STATISTICA Solutions for Business Intelligence, Data Mining, Quality Control, and Web-based Analytics Table
More informationQuality Assessment in Spatial Clustering of Data Mining
Quality Assessment in Spatial Clustering of Data Mining Azimi, A. and M.R. Delavar Centre of Excellence in Geomatics Engineering and Disaster Management, Dept. of Surveying and Geomatics Engineering, Engineering
More informationData Mining and Machine Learning in Bioinformatics
Data Mining and Machine Learning in Bioinformatics PRINCIPAL METHODS AND SUCCESSFUL APPLICATIONS Ruben Armañanzas http://mason.gmu.edu/~rarmanan Adapted from Iñaki Inza slides http://www.sc.ehu.es/isg
More informationAnalysis of gene expression data. Ulf Leser and Philippe Thomas
Analysis of gene expression data Ulf Leser and Philippe Thomas This Lecture Protein synthesis Microarray Idea Technologies Applications Problems Quality control Normalization Analysis next week! Ulf Leser:
More informationSoSe 2014: M-TANI: Big Data Analytics
SoSe 2014: M-TANI: Big Data Analytics Lecture 4 21/05/2014 Sead Izberovic Dr. Nikolaos Korfiatis Agenda Recap from the previous session Clustering Introduction Distance mesures Hierarchical Clustering
More information270107 - MD - Data Mining
Coordinating unit: Teaching unit: Academic year: Degree: ECTS credits: 015 70 - FIB - Barcelona School of Informatics 715 - EIO - Department of Statistics and Operations Research 73 - CS - Department of
More informationAn Overview of Knowledge Discovery Database and Data mining Techniques
An Overview of Knowledge Discovery Database and Data mining Techniques Priyadharsini.C 1, Dr. Antony Selvadoss Thanamani 2 M.Phil, Department of Computer Science, NGM College, Pollachi, Coimbatore, Tamilnadu,
More informationStatistics Graduate Courses
Statistics Graduate Courses STAT 7002--Topics in Statistics-Biological/Physical/Mathematics (cr.arr.).organized study of selected topics. Subjects and earnable credit may vary from semester to semester.
More informationA Partially Supervised Metric Multidimensional Scaling Algorithm for Textual Data Visualization
A Partially Supervised Metric Multidimensional Scaling Algorithm for Textual Data Visualization Ángela Blanco Universidad Pontificia de Salamanca ablancogo@upsa.es Spain Manuel Martín-Merino Universidad
More informationData Mining and Neural Networks in Stata
Data Mining and Neural Networks in Stata 2 nd Italian Stata Users Group Meeting Milano, 10 October 2005 Mario Lucchini e Maurizo Pisati Università di Milano-Bicocca mario.lucchini@unimib.it maurizio.pisati@unimib.it
More informationProtein Protein Interaction Networks
Functional Pattern Mining from Genome Scale Protein Protein Interaction Networks Young-Rae Cho, Ph.D. Assistant Professor Department of Computer Science Baylor University it My Definition of Bioinformatics
More informationDATA MINING CLUSTER ANALYSIS: BASIC CONCEPTS
DATA MINING CLUSTER ANALYSIS: BASIC CONCEPTS 1 AND ALGORITHMS Chiara Renso KDD-LAB ISTI- CNR, Pisa, Italy WHAT IS CLUSTER ANALYSIS? Finding groups of objects such that the objects in a group will be similar
More informationQuality Assessment of Exon and Gene Arrays
Quality Assessment of Exon and Gene Arrays I. Introduction In this white paper we describe some quality assessment procedures that are computed from CEL files from Whole Transcript (WT) based arrays such
More informationSPECIAL PERTURBATIONS UNCORRELATED TRACK PROCESSING
AAS 07-228 SPECIAL PERTURBATIONS UNCORRELATED TRACK PROCESSING INTRODUCTION James G. Miller * Two historical uncorrelated track (UCT) processing approaches have been employed using general perturbations
More informationManejo Basico del Servidor de Aplicaciones WebSphere Application Server 6.0
Manejo Basico del Servidor de Aplicaciones WebSphere Application Server 6.0 Ing. Juan Alfonso Salvia Arquitecto de Aplicaciones IBM Uruguay Slide 2 of 45 Slide 3 of 45 Instalacion Basica del Server La
More informationAnalyzing microrna Data and Integrating mirna with Gene Expression Data in Partek Genomics Suite 6.6
Analyzing microrna Data and Integrating mirna with Gene Expression Data in Partek Genomics Suite 6.6 Overview This tutorial outlines how microrna data can be analyzed within Partek Genomics Suite. Additionally,
More informationData Mining Project Report. Document Clustering. Meryem Uzun-Per
Data Mining Project Report Document Clustering Meryem Uzun-Per 504112506 Table of Content Table of Content... 2 1. Project Definition... 3 2. Literature Survey... 3 3. Methods... 4 3.1. K-means algorithm...
More informationCluster Analysis. Isabel M. Rodrigues. Lisboa, 2014. Instituto Superior Técnico
Instituto Superior Técnico Lisboa, 2014 Introduction: Cluster analysis What is? Finding groups of objects such that the objects in a group will be similar (or related) to one another and different from
More informationIT services for analyses of various data samples
IT services for analyses of various data samples Ján Paralič, František Babič, Martin Sarnovský, Peter Butka, Cecília Havrilová, Miroslava Muchová, Michal Puheim, Martin Mikula, Gabriel Tutoky Technical
More informationINTELIGENCIA DE NEGOCIO CON SQL SERVER
INTELIGENCIA DE NEGOCIO CON SQL SERVER Este curso de Microsoft e-learning está orientado a preparar a los alumnos en el desarrollo de soluciones de Business Intelligence con SQL Server. El curso consta
More informationData Mining Analysis of HIV-1 Protease Crystal Structures
Data Mining Analysis of HIV-1 Protease Crystal Structures Gene M. Ko, A. Srinivas Reddy, Sunil Kumar, and Rajni Garg AP0907 09 Data Mining Analysis of HIV-1 Protease Crystal Structures Gene M. Ko 1, A.
More informationlife science data mining
life science data mining - '.)'-. < } ti» (>.:>,u» c ~'editors Stephen Wong Harvard Medical School, USA Chung-Sheng Li /BM Thomas J Watson Research Center World Scientific NEW JERSEY LONDON SINGAPORE.
More informationDeCyder Extended Data Analysis module Version 1.0
GE Healthcare DeCyder Extended Data Analysis module Version 1.0 Module for DeCyder 2D version 6.5 User Manual Contents 1 Introduction 1.1 Introduction... 7 1.2 The DeCyder EDA User Manual... 9 1.3 Getting
More informationVisualization and Clustering with High-dimensional Genomic Data
Visualization and Clustering with High-dimensional Genomic Data Zhenqiu Liu, PhD 10/14/2014 Agenda Data Visualization Big data and their visualization PCA and MDS for data visualization Clustering and
More informationIntroduction to Pattern Recognition
Introduction to Pattern Recognition Selim Aksoy Department of Computer Engineering Bilkent University saksoy@cs.bilkent.edu.tr CS 551, Spring 2009 CS 551, Spring 2009 c 2009, Selim Aksoy (Bilkent University)
More informationCluster Analysis: Basic Concepts and Algorithms
Cluster Analsis: Basic Concepts and Algorithms What does it mean clustering? Applications Tpes of clustering K-means Intuition Algorithm Choosing initial centroids Bisecting K-means Post-processing Strengths
More informationApplications of Neural Network and Genetic Algorithm Data Mining Techniques in Bioinformatics Knowledge Discovery - A Preliminary Study
Applications of Neural Network and Genetic Algorithm Data Mining Techniques in Bioinformatics Knowledge Discovery - A Preliminary Study Richard S. Segall Arkansas State University Department of Computer
More informationHIGH DIMENSIONAL UNSUPERVISED CLUSTERING BASED FEATURE SELECTION ALGORITHM
HIGH DIMENSIONAL UNSUPERVISED CLUSTERING BASED FEATURE SELECTION ALGORITHM Ms.Barkha Malay Joshi M.E. Computer Science and Engineering, Parul Institute Of Engineering & Technology, Waghodia. India Email:
More informationMachine Learning. 01 - Introduction
Machine Learning 01 - Introduction Machine learning course One lecture (Wednesday, 9:30, 346) and one exercise (Monday, 17:15, 203). Oral exam, 20 minutes, 5 credit points. Some basic mathematical knowledge
More informationMicroarray Data Mining: Puce a ADN
Microarray Data Mining: Puce a ADN Recent Developments Gregory Piatetsky-Shapiro KDnuggets EGC 2005, Paris 2005 KDnuggets EGC 2005 Role of Gene Expression Cell Nucleus Chromosome Gene expression Protein
More informationCONTENTS PREFACE 1 INTRODUCTION 1 2 DATA VISUALIZATION 19
PREFACE xi 1 INTRODUCTION 1 1.1 Overview 1 1.2 Definition 1 1.3 Preparation 2 1.3.1 Overview 2 1.3.2 Accessing Tabular Data 3 1.3.3 Accessing Unstructured Data 3 1.3.4 Understanding the Variables and Observations
More informationA comparison of various clustering methods and algorithms in data mining
Volume :2, Issue :5, 32-36 May 2015 www.allsubjectjournal.com e-issn: 2349-4182 p-issn: 2349-5979 Impact Factor: 3.762 R.Tamilselvi B.Sivasakthi R.Kavitha Assistant Professor A comparison of various clustering
More informationCluster analysis Cosmin Lazar. COMO Lab VUB
Cluster analysis Cosmin Lazar COMO Lab VUB Introduction Cluster analysis foundations rely on one of the most fundamental, simple and very often unnoticed ways (or methods) of understanding and learning,
More informationClustering. Data Mining. Abraham Otero. Data Mining. Agenda
Clustering 1/46 Agenda Introduction Distance K-nearest neighbors Hierarchical clustering Quick reference 2/46 1 Introduction It seems logical that in a new situation we should act in a similar way as in
More informationData Acquisition. DNA microarrays. The functional genomics pipeline. Experimental design affects outcome data analysis
Data Acquisition DNA microarrays The functional genomics pipeline Experimental design affects outcome data analysis Data acquisition microarray processing Data preprocessing scaling/normalization/filtering
More information6.2.8 Neural networks for data mining
6.2.8 Neural networks for data mining Walter Kosters 1 In many application areas neural networks are known to be valuable tools. This also holds for data mining. In this chapter we discuss the use of neural
More informationTables of significant values of Jaccard's index of similarity
Miscel.lania Zooloqica 22.1 (1999) Tables of significant values of Jaccard's index of similarity R. Real Real, R., 1999. Tables of significant values of Jaccard's index of similarity. Misc. Zool., 22.1:
More informationPERFORMANCE ANALYSIS OF CLUSTERING ALGORITHMS IN DATA MINING IN WEKA
PERFORMANCE ANALYSIS OF CLUSTERING ALGORITHMS IN DATA MINING IN WEKA Prakash Singh 1, Aarohi Surya 2 1 Department of Finance, IIM Lucknow, Lucknow, India 2 Department of Computer Science, LNMIIT, Jaipur,
More informationLINIO COLOMBIA. Starting-Up & Leading E-Commerce. www.linio.com.co. Luca Ranaldi, CEO. Pedro Freire, VP Marketing and Business Development
LINIO COLOMBIA Starting-Up & Leading E-Commerce Luca Ranaldi, CEO Pedro Freire, VP Marketing and Business Development 22 de Agosto 2013 www.linio.com.co QUÉ ES LINIO? Linio es la tienda online #1 en Colombia
More informationBiomedicine The background. The main interest. The tools
1 Biomedicine The background The main interest? Bioinformatics Clinical informatics The tools 2 Outline 3 Outline 4 Working on Network Data Analysis HH RR Infrastructure Training BIOCOMPUTATION & STRUCTURAL
More informationSegmentation of stock trading customers according to potential value
Expert Systems with Applications 27 (2004) 27 33 www.elsevier.com/locate/eswa Segmentation of stock trading customers according to potential value H.W. Shin a, *, S.Y. Sohn b a Samsung Economy Research
More informationIntroduction to Clustering
Introduction to Clustering Yumi Kondo Student Seminar LSK301 Sep 25, 2010 Yumi Kondo (University of British Columbia) Introduction to Clustering Sep 25, 2010 1 / 36 Microarray Example N=65 P=1756 Yumi
More informationClustering and Visualising Microarray Data Patterns
Francisco Azuaje is a reader at the University of Ulster. His research involves data mining and integration applications in biosciences. He is on the editorial board of the IEEE Transactions on Nanobioscience
More information