type The annotations of the 62 samples with respect to the cancer types FL, CLL, DLBCL-A, DLBCL-G.
|
|
- Willis Leonard
- 8 years ago
- Views:
Transcription
1 alizadeh Samle a from a lymhoma/leukemia gene exression study Samle a for the ISIS method Format x A 2000 x 62 gene exression a matrix of log-ratio values. 2,000 genes with the highest variance across the samles have been selected from the set of 4,026 genes considered by the original authors. tye The annotations of the 62 samles with resect to the cancer tyes FL, CLL, DLBCL-A, DLBCL-G. Source The a are from the lymhoma/leukemia study of A. Alizadeh et al., Nature 403: (2000), htt://llm.nih.gov/lymhoma/a/figure1/figure1.cdt gotomax Local search function Finds local maxima of the score function tscore by greedy hill climbing. gotomax(, slit, = 50,.offs = 0, min.slit.size = ceiling(0.1*ncol())) slit The gene exression a matrix. Rows corresond to genes, and columns corresond to samles. Either a binary vector encoding a biartition of the set of samles, or a binary matrix with every row encoding a biartition of the set of samles. The number of best discriminating genes selected for comuting the score of a biartition. 1
2 .offs min.slit.size The number of very best discriminating genes that are ignored when comuting the score of a artition. Thus only the genes with ranks.offs+1,..., are used to comute the score. The minimal size either class of a biartition must have to be acceted as a local maximum. A binary matrix where each column (excet the last one) reresents a samle (in the original order). The rows encode the found local maximum biartitions. The last entry in each row gives the score of the artition. slit <- re(0, ncol(alizadeh$x)) slit[which(alizadeh$tye=="fl")] <- 1 y <- gotomax(alizadeh$x, slit) isis Class discovery for microarray a Finds biartitions of a set of tissue samles that show clear searation regarding the exression of a subset of genes. isis(x, = 50,.offs = 0, max.nr.cand.slits = 500) x The gene exression a matrix. Rows corresond to genes, and columns corresond to samles. The values in x may be normalized logarithms of intensities or log-ratios. The number of best discriminating genes selected for comuting the score of a biartition..offs The number of very best discriminating genes that are ignored when comuting the score of a artition. Thus only the genes with ranks.offs+1,..., are used to comute the score. A value >0 may imrove the robustness of the score, sorting out ossibly surious artitions with very few genes that discriminate very well between the two grous. max.nr.cand.slits After the generation of candie biartitions, only the max.nr.cand.slits artitions with highest scores are further investigated. This is used in order to control the running time of the rogram. 2
3 Details It may be useful to first discard genes from the a matrix that dislay consistently low variance across the samles or consistently low exression intensity. For the generation of candie biartitions, the a matrix is enlarged by adding average exression rofiles of all clusters of genes resulting from a centroid linkage hierarchical clustering based on the correlation coefficient as similarity measure. In order to reduce comutation time, similar candie artitions are merged through hierarchical clustering before alying the local search to find local maximum artitions (see gotomax ). A matrix where each column (excet the last one) reresents a samle (in the original order). The rows reresent the found biartitions with highest scores, coded as 0/1-vectors. The last entry in each row gives the score of the artition. Only artitions with each grou containing at least 10 Author(s) Anja von Heydebreck and Wolfgang Huber htt:// References Identifying Slits with Clear Searation: A New Class Discovery Method for Gene Exression Data. Anja von Heydebreck, Wolfgang Huber, Annemarie Poustka, and Martin Vingron, ISMB 2001, Bioinformatics 17, Sul. 1 (2001) cutime <- system.time( y <- isis(alizadeh$x) ) cat("time used:", cutime[1], "sec on CPU,", cutime[3], "sec on wallclock.\n") thresh Simulated quantiles of t-statistic for ordered normal random vectors. For samles of at most 150 i.i.d standard normal random variables, the simulated quantiles of the two-samle t-statistic comaring the a values below and above any cutoint is given. a(thresh) 3
4 Format A 150 x 150 matrix with the row index indicating the samle size and the column index indicating the cutoint. tscore Score function for biartitions Comutes a score for a given biartition of the set of samles that measures how well the two classes are searated by the exression levels of a suitable subset of genes. tscore(, slit, = 50,.offs = 0) slit.offs The gene exression a matrix. Rows corresond to genes, and columns corresond to samles. The values in x may be normalized logarithms of intensities or log-ratios. Either a binary vector encoding a biartition of the set of samles, or a binary matrix with every row encoding a biartition of the set of samles. The number of best discriminating genes selected for comuting the score of a biartition. The number of very best discriminating genes that are ignored when comuting the score of a artition. Thus only the genes with ranks.offs+1,..., are used to comute the score. Details First, the genes with ranks.offs+1,..., according to the two-samle t-statistic are selected. Then, a discriminant axis for a naive Bayes classifier (ML classification rule for two normal distributions with identical, diagonal covariance matrix) based on these genes is comuted, and the score is comuted as the t-statistic for the coordinates of the samle vectors rojected onto this axis. A vector with the scores of the biartitions encoded in slit. slit <- re(0, ncol(alizadeh$x)) slit[which(alizadeh$tye=="fl")] <- 1 z <- tscore(alizadeh$x, slit) 4
5 ttesttwo two-samle t-statistic Comutes the two-samle t-statistic for each row of a a matrix, according to a biartition of the set of columns. ttesttwo(, slit) slit A numeric a matrix. A binary vector encoding a biartition of the set of columns of. A vector with the t-statistic values for each row of, according to the biartition. slit <- re(0, ncol(alizadeh$x)) slit[which(alizadeh$tye=="fl")] <- 1 t <- ttesttwo(alizadeh$x, slit) 5
Exploratory data analysis for microarray data
Eploratory data analysis for microarray data Anja von Heydebreck Ma Planck Institute for Molecular Genetics, Dept. Computational Molecular Biology, Berlin, Germany heydebre@molgen.mpg.de Visualization
More informationSynopsys RURAL ELECTRICATION PLANNING SOFTWARE (LAPER) Rainer Fronius Marc Gratton Electricité de France Research and Development FRANCE
RURAL ELECTRICATION PLANNING SOFTWARE (LAPER) Rainer Fronius Marc Gratton Electricité de France Research and Develoment FRANCE Synosys There is no doubt left about the benefit of electrication and subsequently
More informationExample: Credit card default, we may be more interested in predicting the probabilty of a default than classifying individuals as default or not.
Statistical Learning: Chapter 4 Classification 4.1 Introduction Supervised learning with a categorical (Qualitative) response Notation: - Feature vector X, - qualitative response Y, taking values in C
More informationEffect Sizes Based on Means
CHAPTER 4 Effect Sizes Based on Means Introduction Raw (unstardized) mean difference D Stardized mean difference, d g Resonse ratios INTRODUCTION When the studies reort means stard deviations, the referred
More informationCONTENTS PREFACE 1 INTRODUCTION 1 2 DATA VISUALIZATION 19
PREFACE xi 1 INTRODUCTION 1 1.1 Overview 1 1.2 Definition 1 1.3 Preparation 2 1.3.1 Overview 2 1.3.2 Accessing Tabular Data 3 1.3.3 Accessing Unstructured Data 3 1.3.4 Understanding the Variables and Observations
More informationA Multivariate Statistical Analysis of Stock Trends. Abstract
A Multivariate Statistical Analysis of Stock Trends Aril Kerby Alma College Alma, MI James Lawrence Miami University Oxford, OH Abstract Is there a method to redict the stock market? What factors determine
More informationStatistical Data Mining. Practical Assignment 3 Discriminant Analysis and Decision Trees
Statistical Data Mining Practical Assignment 3 Discriminant Analysis and Decision Trees In this practical we discuss linear and quadratic discriminant analysis and tree-based classification techniques.
More informationHow To Cluster
Data Clustering Dec 2nd, 2013 Kyrylo Bessonov Talk outline Introduction to clustering Types of clustering Supervised Unsupervised Similarity measures Main clustering algorithms k-means Hierarchical Main
More informationSQUARE GRID POINTS COVERAGED BY CONNECTED SOURCES WITH COVERAGE RADIUS OF ONE ON A TWO-DIMENSIONAL GRID
International Journal of Comuter Science & Information Technology (IJCSIT) Vol 6, No 4, August 014 SQUARE GRID POINTS COVERAGED BY CONNECTED SOURCES WITH COVERAGE RADIUS OF ONE ON A TWO-DIMENSIONAL GRID
More informationMonitoring Frequency of Change By Li Qin
Monitoring Frequency of Change By Li Qin Abstract Control charts are widely used in rocess monitoring roblems. This aer gives a brief review of control charts for monitoring a roortion and some initial
More informationSTATISTICA Formula Guide: Logistic Regression. Table of Contents
: Table of Contents... 1 Overview of Model... 1 Dispersion... 2 Parameterization... 3 Sigma-Restricted Model... 3 Overparameterized Model... 4 Reference Coding... 4 Model Summary (Summary Tab)... 5 Summary
More information1 Gambler s Ruin Problem
Coyright c 2009 by Karl Sigman 1 Gambler s Ruin Problem Let N 2 be an integer and let 1 i N 1. Consider a gambler who starts with an initial fortune of $i and then on each successive gamble either wins
More informationLecture 3: Linear methods for classification
Lecture 3: Linear methods for classification Rafael A. Irizarry and Hector Corrada Bravo February, 2010 Today we describe four specific algorithms useful for classification problems: linear regression,
More informationStatistical Machine Learning
Statistical Machine Learning UoC Stats 37700, Winter quarter Lecture 4: classical linear and quadratic discriminants. 1 / 25 Linear separation For two classes in R d : simple idea: separate the classes
More informationPackage COSINE. February 19, 2015
Type Package Title COndition SpecIfic sub-network Version 2.1 Date 2014-07-09 Author Package COSINE February 19, 2015 Maintainer Depends R (>= 3.1.0), MASS,genalg To identify
More informationLecture 9: Introduction to Pattern Analysis
Lecture 9: Introduction to Pattern Analysis g Features, patterns and classifiers g Components of a PR system g An example g Probability definitions g Bayes Theorem g Gaussian densities Features, patterns
More informationCHAPTER 8 FACTOR EXTRACTION BY MATRIX FACTORING TECHNIQUES. From Exploratory Factor Analysis Ledyard R Tucker and Robert C.
CHAPTER 8 FACTOR EXTRACTION BY MATRIX FACTORING TECHNIQUES From Exploratory Factor Analysis Ledyard R Tucker and Robert C MacCallum 1997 180 CHAPTER 8 FACTOR EXTRACTION BY MATRIX FACTORING TECHNIQUES In
More informationComparison of Non-linear Dimensionality Reduction Techniques for Classification with Gene Expression Microarray Data
CMPE 59H Comparison of Non-linear Dimensionality Reduction Techniques for Classification with Gene Expression Microarray Data Term Project Report Fatma Güney, Kübra Kalkan 1/15/2013 Keywords: Non-linear
More informationLinear Classification. Volker Tresp Summer 2015
Linear Classification Volker Tresp Summer 2015 1 Classification Classification is the central task of pattern recognition Sensors supply information about an object: to which class do the object belong
More informationEnvironmental Remote Sensing GEOG 2021
Environmental Remote Sensing GEOG 2021 Lecture 4 Image classification 2 Purpose categorising data data abstraction / simplification data interpretation mapping for land cover mapping use land cover class
More informationExpression Quantification (I)
Expression Quantification (I) Mario Fasold, LIFE, IZBI Sequencing Technology One Illumina HiSeq 2000 run produces 2 times (paired-end) ca. 1,2 Billion reads ca. 120 GB FASTQ file RNA-seq protocol Task
More informationManaging specific risk in property portfolios
Managing secific risk in roerty ortfolios Andrew Baum, PhD University of Reading, UK Peter Struemell OPC, London, UK Contact author: Andrew Baum Deartment of Real Estate and Planning University of Reading
More informationForensic Science International
Forensic Science International 214 (2012) 33 43 Contents lists available at ScienceDirect Forensic Science International jou r nal h o me age: w ww.els evier.co m/lo c ate/fo r sc iin t A robust detection
More informationBig Data Visualization for Genomics. Luca Vezzadini Kairos3D
Big Data Visualization for Genomics Luca Vezzadini Kairos3D Why GenomeCruzer? The amount of data for DNA sequencing is growing Modern hardware produces billions of values per sample Scientists need to
More informationMULTIPLE REGRESSION AND ISSUES IN REGRESSION ANALYSIS
MULTIPLE REGRESSION AND ISSUES IN REGRESSION ANALYSIS MSR = Mean Regression Sum of Squares MSE = Mean Squared Error RSS = Regression Sum of Squares SSE = Sum of Squared Errors/Residuals α = Level of Significance
More informationStatic and Dynamic Properties of Small-world Connection Topologies Based on Transit-stub Networks
Static and Dynamic Proerties of Small-world Connection Toologies Based on Transit-stub Networks Carlos Aguirre Fernando Corbacho Ramón Huerta Comuter Engineering Deartment, Universidad Autónoma de Madrid,
More informationThe risk of using the Q heterogeneity estimator for software engineering experiments
Dieste, O., Fernández, E., García-Martínez, R., Juristo, N. 11. The risk of using the Q heterogeneity estimator for software engineering exeriments. The risk of using the Q heterogeneity estimator for
More informationDidacticiel - Études de cas
1 Topic Linear Discriminant Analysis Data Mining Tools Comparison (Tanagra, R, SAS and SPSS). Linear discriminant analysis is a popular method in domains of statistics, machine learning and pattern recognition.
More informationCategorical Data Visualization and Clustering Using Subjective Factors
Categorical Data Visualization and Clustering Using Subjective Factors Chia-Hui Chang and Zhi-Kai Ding Department of Computer Science and Information Engineering, National Central University, Chung-Li,
More informationClustering in the Linear Model
Short Guides to Microeconometrics Fall 2014 Kurt Schmidheiny Universität Basel Clustering in the Linear Model 2 1 Introduction Clustering in the Linear Model This handout extends the handout on The Multiple
More informationA Logistic Regression Approach to Ad Click Prediction
A Logistic Regression Approach to Ad Click Prediction Gouthami Kondakindi kondakin@usc.edu Satakshi Rana satakshr@usc.edu Aswin Rajkumar aswinraj@usc.edu Sai Kaushik Ponnekanti ponnekan@usc.edu Vinit Parakh
More informationSocial Media Mining. Data Mining Essentials
Introduction Data production rate has been increased dramatically (Big Data) and we are able store much more data than before E.g., purchase data, social media data, mobile phone data Businesses and customers
More informationPrice Elasticity of Demand MATH 104 and MATH 184 Mark Mac Lean (with assistance from Patrick Chan) 2011W
Price Elasticity of Demand MATH 104 and MATH 184 Mark Mac Lean (with assistance from Patrick Chan) 2011W The rice elasticity of demand (which is often shortened to demand elasticity) is defined to be the
More informationThe Portfolio Characteristics and Investment Performance of Bank Loan Mutual Funds
The Portfolio Characteristics and Investment Performance of Bank Loan Mutual Funds Zakri Bello, Ph.. Central Connecticut State University 1615 Stanley St. New Britain, CT 06050 belloz@ccsu.edu 860-832-3262
More informationMicrosoft Azure Machine learning Algorithms
Microsoft Azure Machine learning Algorithms Tomaž KAŠTRUN @tomaz_tsql Tomaz.kastrun@gmail.com http://tomaztsql.wordpress.com Our Sponsors Speaker info https://tomaztsql.wordpress.com Agenda Focus on explanation
More informationThey can be obtained in HQJHQH format directly from the home page at: http://www.engene.cnb.uam.es/downloads/kobayashi.dat
HQJHQH70 *XLGHG7RXU This document contains a Guided Tour through the HQJHQH platform and it was created for training purposes with respect to the system options and analysis possibilities. It is not intended
More informationClassification Problems
Classification Read Chapter 4 in the text by Bishop, except omit Sections 4.1.6, 4.1.7, 4.2.4, 4.3.3, 4.3.5, 4.3.6, 4.4, and 4.5. Also, review sections 1.5.1, 1.5.2, 1.5.3, and 1.5.4. Classification Problems
More informationDimensionality Reduction: Principal Components Analysis
Dimensionality Reduction: Principal Components Analysis In data mining one often encounters situations where there are a large number of variables in the database. In such situations it is very likely
More informationChapter 20: Data Analysis
Chapter 20: Data Analysis Database System Concepts, 6 th Ed. See www.db-book.com for conditions on re-use Chapter 20: Data Analysis Decision Support Systems Data Warehousing Data Mining Classification
More informationBayesian Machine Learning (ML): Modeling And Inference in Big Data. Zhuhua Cai Google, Rice University caizhua@gmail.com
Bayesian Machine Learning (ML): Modeling And Inference in Big Data Zhuhua Cai Google Rice University caizhua@gmail.com 1 Syllabus Bayesian ML Concepts (Today) Bayesian ML on MapReduce (Next morning) Bayesian
More informationMS1b Statistical Data Mining
MS1b Statistical Data Mining Yee Whye Teh Department of Statistics Oxford http://www.stats.ox.ac.uk/~teh/datamining.html Outline Administrivia and Introduction Course Structure Syllabus Introduction to
More information10-601. Machine Learning. http://www.cs.cmu.edu/afs/cs/academic/class/10601-f10/index.html
10-601 Machine Learning http://www.cs.cmu.edu/afs/cs/academic/class/10601-f10/index.html Course data All up-to-date info is on the course web page: http://www.cs.cmu.edu/afs/cs/academic/class/10601-f10/index.html
More informationTutorial for proteome data analysis using the Perseus software platform
Tutorial for proteome data analysis using the Perseus software platform Laboratory of Mass Spectrometry, LNBio, CNPEM Tutorial version 1.0, January 2014. Note: This tutorial was written based on the information
More informationOn the predictive content of the PPI on CPI inflation: the case of Mexico
On the redictive content of the PPI on inflation: the case of Mexico José Sidaoui, Carlos Caistrán, Daniel Chiquiar and Manuel Ramos-Francia 1 1. Introduction It would be natural to exect that shocks to
More informationChapter ML:XI (continued)
Chapter ML:XI (continued) XI. Cluster Analysis Data Mining Overview Cluster Analysis Basics Hierarchical Cluster Analysis Iterative Cluster Analysis Density-Based Cluster Analysis Cluster Evaluation Constrained
More informationSTA 4273H: Statistical Machine Learning
STA 4273H: Statistical Machine Learning Russ Salakhutdinov Department of Statistics! rsalakhu@utstat.toronto.edu! http://www.cs.toronto.edu/~rsalakhu/ Lecture 6 Three Approaches to Classification Construct
More informationPerformance Analysis of Naive Bayes and J48 Classification Algorithm for Data Classification
Performance Analysis of Naive Bayes and J48 Classification Algorithm for Data Classification Tina R. Patil, Mrs. S. S. Sherekar Sant Gadgebaba Amravati University, Amravati tnpatil2@gmail.com, ss_sherekar@rediffmail.com
More informationData Mining Cluster Analysis: Basic Concepts and Algorithms. Lecture Notes for Chapter 8. Introduction to Data Mining
Data Mining Cluster Analysis: Basic Concepts and Algorithms Lecture Notes for Chapter 8 Introduction to Data Mining by Tan, Steinbach, Kumar Tan,Steinbach, Kumar Introduction to Data Mining 4/8/2004 Hierarchical
More informationModeling and Simulation of an Incremental Encoder Used in Electrical Drives
10 th International Symosium of Hungarian Researchers on Comutational Intelligence and Informatics Modeling and Simulation of an Incremental Encoder Used in Electrical Drives János Jób Incze, Csaba Szabó,
More informationThe Scientific Data Mining Process
Chapter 4 The Scientific Data Mining Process When I use a word, Humpty Dumpty said, in rather a scornful tone, it means just what I choose it to mean neither more nor less. Lewis Carroll [87, p. 214] In
More informationAn Introduction to Data Mining
An Introduction to Intel Beijing wei.heng@intel.com January 17, 2014 Outline 1 DW Overview What is Notable Application of Conference, Software and Applications Major Process in 2 Major Tasks in Detail
More informationConcurrent Program Synthesis Based on Supervisory Control
010 American Control Conference Marriott Waterfront, Baltimore, MD, USA June 30-July 0, 010 ThB07.5 Concurrent Program Synthesis Based on Suervisory Control Marian V. Iordache and Panos J. Antsaklis Abstract
More informationThe impact of metadata implementation on webpage visibility in search engine results (Part II) q
Information Processing and Management 41 (2005) 691 715 www.elsevier.com/locate/inforoman The imact of metadata imlementation on webage visibility in search engine results (Part II) q Jin Zhang *, Alexandra
More informationIdentifying SPAM with Predictive Models
Identifying SPAM with Predictive Models Dan Steinberg and Mikhaylo Golovnya Salford Systems 1 Introduction The ECML-PKDD 2006 Discovery Challenge posed a topical problem for predictive modelers: how to
More informationThe Changing Wage Return to an Undergraduate Education
DISCUSSION PAPER SERIES IZA DP No. 1549 The Changing Wage Return to an Undergraduate Education Nigel C. O'Leary Peter J. Sloane March 2005 Forschungsinstitut zur Zukunft der Arbeit Institute for the Study
More informationPrinciples of Hydrology. Hydrograph components include rising limb, recession limb, peak, direct runoff, and baseflow.
Princiles of Hydrology Unit Hydrograh Runoff hydrograh usually consists of a fairly regular lower ortion that changes slowly throughout the year and a raidly fluctuating comonent that reresents the immediate
More informationRisk and Return. Sample chapter. e r t u i o p a s d f CHAPTER CONTENTS LEARNING OBJECTIVES. Chapter 7
Chater 7 Risk and Return LEARNING OBJECTIVES After studying this chater you should be able to: e r t u i o a s d f understand how return and risk are defined and measured understand the concet of risk
More informationD-optimal plans in observational studies
D-optimal plans in observational studies Constanze Pumplün Stefan Rüping Katharina Morik Claus Weihs October 11, 2005 Abstract This paper investigates the use of Design of Experiments in observational
More informationEffective Linear Discriminant Analysis for High Dimensional, Low Sample Size Data
Effective Linear Discriant Analysis for High Dimensional, Low Sample Size Data Zhihua Qiao, Lan Zhou and Jianhua Z. Huang Abstract In the so-called high dimensional, low sample size (HDLSS) settings, LDA
More informationIntroduction to Principal Components and FactorAnalysis
Introduction to Principal Components and FactorAnalysis Multivariate Analysis often starts out with data involving a substantial number of correlated variables. Principal Component Analysis (PCA) is a
More informationIndex Numbers OPTIONAL - II Mathematics for Commerce, Economics and Business INDEX NUMBERS
Index Numbers OPTIONAL - II 38 INDEX NUMBERS Of the imortant statistical devices and techniques, Index Numbers have today become one of the most widely used for judging the ulse of economy, although in
More informationSupervised Feature Selection & Unsupervised Dimensionality Reduction
Supervised Feature Selection & Unsupervised Dimensionality Reduction Feature Subset Selection Supervised: class labels are given Select a subset of the problem features Why? Redundant features much or
More informationStatistics Graduate Courses
Statistics Graduate Courses STAT 7002--Topics in Statistics-Biological/Physical/Mathematics (cr.arr.).organized study of selected topics. Subjects and earnable credit may vary from semester to semester.
More informationAnalysis of gene expression data. Ulf Leser and Philippe Thomas
Analysis of gene expression data Ulf Leser and Philippe Thomas This Lecture Protein synthesis Microarray Idea Technologies Applications Problems Quality control Normalization Analysis next week! Ulf Leser:
More informationFinal Project Report
CPSC545 by Introduction to Data Mining Prof. Martin Schultz & Prof. Mark Gerstein Student Name: Yu Kor Hugo Lam Student ID : 904907866 Due Date : May 7, 2007 Introduction Final Project Report Pseudogenes
More informationClassifying Large Data Sets Using SVMs with Hierarchical Clusters. Presented by :Limou Wang
Classifying Large Data Sets Using SVMs with Hierarchical Clusters Presented by :Limou Wang Overview SVM Overview Motivation Hierarchical micro-clustering algorithm Clustering-Based SVM (CB-SVM) Experimental
More informationLeast Squares Estimation
Least Squares Estimation SARA A VAN DE GEER Volume 2, pp 1041 1045 in Encyclopedia of Statistics in Behavioral Science ISBN-13: 978-0-470-86080-9 ISBN-10: 0-470-86080-4 Editors Brian S Everitt & David
More informationClass #6: Non-linear classification. ML4Bio 2012 February 17 th, 2012 Quaid Morris
Class #6: Non-linear classification ML4Bio 2012 February 17 th, 2012 Quaid Morris 1 Module #: Title of Module 2 Review Overview Linear separability Non-linear classification Linear Support Vector Machines
More informationDiscovering objects and their location in images
Discovering objects and their location in images Josef Sivic Bryan C. Russell Alexei A. Efros Andrew Zisserman William T. Freeman Dept. of Engineering Science CS and AI Laboratory School of Computer Science
More informationComparing Dissimilarity Measures for Symbolic Data Analysis
Comaring Dissimilarity Measures for Symbolic Data Analysis Donato MALERBA, Floriana ESPOSITO, Vincenzo GIOVIALE and Valentina TAMMA Diartimento di Informatica, University of Bari Via Orabona 4 76 Bari,
More informationUnited Arab Emirates University College of Sciences Department of Mathematical Sciences HOMEWORK 1 SOLUTION. Section 10.1 Vectors in the Plane
United Arab Emirates University College of Sciences Deartment of Mathematical Sciences HOMEWORK 1 SOLUTION Section 10.1 Vectors in the Plane Calculus II for Engineering MATH 110 SECTION 0 CRN 510 :00 :00
More informationNormally Distributed Data. A mean with a normal value Test of Hypothesis Sign Test Paired observations within a single patient group
ANALYSIS OF CONTINUOUS VARIABLES / 31 CHAPTER SIX ANALYSIS OF CONTINUOUS VARIABLES: COMPARING MEANS In the last chater, we addressed the analysis of discrete variables. Much of the statistical analysis
More informationSummary of R software commands used to generate bootstrap and permutation test output and figures in Chapter 16
Summary of R software commands used to generate bootstrap and permutation test output and figures in Chapter 16 Since R is command line driven and the primary software of Chapter 16, this document details
More informationFree Software Development. 2. Chemical Database Management
Leonardo Electronic Journal of Practices and echnologies ISSN 1583-1078 Issue 1, July-December 2002. 69-76 Free Software Develoment. 2. Chemical Database Management Monica ŞEFU 1, Mihaela Ligia UNGUREŞAN
More informationHIGH DIMENSIONAL UNSUPERVISED CLUSTERING BASED FEATURE SELECTION ALGORITHM
HIGH DIMENSIONAL UNSUPERVISED CLUSTERING BASED FEATURE SELECTION ALGORITHM Ms.Barkha Malay Joshi M.E. Computer Science and Engineering, Parul Institute Of Engineering & Technology, Waghodia. India Email:
More informationCluster Analysis. Isabel M. Rodrigues. Lisboa, 2014. Instituto Superior Técnico
Instituto Superior Técnico Lisboa, 2014 Introduction: Cluster analysis What is? Finding groups of objects such that the objects in a group will be similar (or related) to one another and different from
More informationComponent Ordering in Independent Component Analysis Based on Data Power
Component Ordering in Independent Component Analysis Based on Data Power Anne Hendrikse Raymond Veldhuis University of Twente University of Twente Fac. EEMCS, Signals and Systems Group Fac. EEMCS, Signals
More information1 2 3 1 1 2 x = + x 2 + x 4 1 0 1
(d) If the vector b is the sum of the four columns of A, write down the complete solution to Ax = b. 1 2 3 1 1 2 x = + x 2 + x 4 1 0 0 1 0 1 2. (11 points) This problem finds the curve y = C + D 2 t which
More informationThe Fundamental Incompatibility of Scalable Hamiltonian Monte Carlo and Naive Data Subsampling
The Fundamental Incomatibility of Scalable Hamiltonian Monte Carlo and Naive Data Subsamling Michael Betancourt Deartment of Statistics, University of Warwick, Coventry, UK CV4 7A BETANAPHA@GMAI.COM Abstract
More informationMachine Learning using MapReduce
Machine Learning using MapReduce What is Machine Learning Machine learning is a subfield of artificial intelligence concerned with techniques that allow computers to improve their outputs based on previous
More informationDesign of A Knowledge Based Trouble Call System with Colored Petri Net Models
2005 IEEE/PES Transmission and Distribution Conference & Exhibition: Asia and Pacific Dalian, China Design of A Knowledge Based Trouble Call System with Colored Petri Net Models Hui-Jen Chuang, Chia-Hung
More informationDistances, Clustering, and Classification. Heatmaps
Distances, Clustering, and Classification Heatmaps 1 Distance Clustering organizes things that are close into groups What does it mean for two genes to be close? What does it mean for two samples to be
More informationUNSUPERVISED MACHINE LEARNING TECHNIQUES IN GENOMICS
UNSUPERVISED MACHINE LEARNING TECHNIQUES IN GENOMICS Dwijesh C. Mishra I.A.S.R.I., Library Avenue, New Delhi-110 012 dcmishra@iasri.res.in What is Learning? "Learning denotes changes in a system that enable
More informationLavastorm Analytic Library Predictive and Statistical Analytics Node Pack FAQs
1.1 Introduction Lavastorm Analytic Library Predictive and Statistical Analytics Node Pack FAQs For brevity, the Lavastorm Analytics Library (LAL) Predictive and Statistical Analytics Node Pack will be
More informationSimulink Implementation of a CDMA Smart Antenna System
Simulink Imlementation of a CDMA Smart Antenna System MOSTAFA HEFNAWI Deartment of Electrical and Comuter Engineering Royal Military College of Canada Kingston, Ontario, K7K 7B4 CANADA Abstract: - The
More informationMultivariate Analysis of Ecological Data
Multivariate Analysis of Ecological Data MICHAEL GREENACRE Professor of Statistics at the Pompeu Fabra University in Barcelona, Spain RAUL PRIMICERIO Associate Professor of Ecology, Evolutionary Biology
More informationAutomatic Search for Correlated Alarms
Automatic Search for Correlated Alarms Klaus-Dieter Tuchs, Peter Tondl, Markus Radimirsch, Klaus Jobmann Institut für Allgemeine Nachrichtentechnik, Universität Hannover Aelstraße 9a, 0167 Hanover, Germany
More informationPrincipal Component Analysis
Principal Component Analysis ERS70D George Fernandez INTRODUCTION Analysis of multivariate data plays a key role in data analysis. Multivariate data consists of many different attributes or variables recorded
More informationJohns Hopkins University
Johns Hopkins University Johns Hopkins University, Dept. of Biostatistics Working Papers Year 2004 Paper 65 Cross-study Validation and Combined Analysis of Gene Expression Microarray Data Elizabeth Garrett-Mayer
More informationPredictive Gene Signature Selection for Adjuvant Chemotherapy in Non-Small Cell Lung Cancer Patients
Predictive Gene Signature Selection for Adjuvant Chemotherapy in Non-Small Cell Lung Cancer Patients by Li Liu A practicum report submitted to the Department of Public Health Sciences in conformity with
More informationDecision Trees from large Databases: SLIQ
Decision Trees from large Databases: SLIQ C4.5 often iterates over the training set How often? If the training set does not fit into main memory, swapping makes C4.5 unpractical! SLIQ: Sort the values
More informationStatistical Models in Data Mining
Statistical Models in Data Mining Sargur N. Srihari University at Buffalo The State University of New York Department of Computer Science and Engineering Department of Biostatistics 1 Srihari Flood of
More informationLecture 5 Principal Minors and the Hessian
Lecture 5 Principal Minors and the Hessian Eivind Eriksen BI Norwegian School of Management Department of Economics October 01, 2010 Eivind Eriksen (BI Dept of Economics) Lecture 5 Principal Minors and
More informationCS 591.03 Introduction to Data Mining Instructor: Abdullah Mueen
CS 591.03 Introduction to Data Mining Instructor: Abdullah Mueen LECTURE 3: DATA TRANSFORMATION AND DIMENSIONALITY REDUCTION Chapter 3: Data Preprocessing Data Preprocessing: An Overview Data Quality Major
More informationExperiments in Web Page Classification for Semantic Web
Experiments in Web Page Classification for Semantic Web Asad Satti, Nick Cercone, Vlado Kešelj Faculty of Computer Science, Dalhousie University E-mail: {rashid,nick,vlado}@cs.dal.ca Abstract We address
More informationPredict Influencers in the Social Network
Predict Influencers in the Social Network Ruishan Liu, Yang Zhao and Liuyu Zhou Email: rliu2, yzhao2, lyzhou@stanford.edu Department of Electrical Engineering, Stanford University Abstract Given two persons
More informationData Mining Practical Machine Learning Tools and Techniques
Credibility: Evaluating what s been learned Data Mining Practical Machine Learning Tools and Techniques Slides for Chapter 5 of Data Mining by I. H. Witten, E. Frank and M. A. Hall Issues: training, testing,
More informationGraphs. Exploratory data analysis. Graphs. Standard forms. A graph is a suitable way of representing data if:
Graphs Exploratory data analysis Dr. David Lucy d.lucy@lancaster.ac.uk Lancaster University A graph is a suitable way of representing data if: A line or area can represent the quantities in the data in
More informationW6.B.1. FAQs CS535 BIG DATA W6.B.3. 4. If the distance of the point is additionally less than the tight distance T 2, remove it from the original set
http://wwwcscolostateedu/~cs535 W6B W6B2 CS535 BIG DAA FAQs Please prepare for the last minute rush Store your output files safely Partial score will be given for the output from less than 50GB input Computer
More information