# Classify then Summarize or Summarize then Classify

Save this PDF as:

Size: px
Start display at page:

## Transcription

1 Classify then Summarize or Summarize then Classify DIMACS, Rutgers University Piscataway, NJ Workshop Honoring Edwin Diday held on September 4, 2007

2 What is Cluster Analysis? Software package? Collection of Computer Algorithms? Type of multivariate statistical analysis? Branch of discrete mathematics? Should not be recognized as a separate discipline. Want this to be a discipline. So need a mathematical model. Two views: 1. Input data has a true hierarchical structure. Data you are given has possible errors. Clustering estimates the true structure. 2. Cluster analysis suggests possible internal structure for data. The suggestions may or may not be valid.

3 Background From Jardine and Sibson Book: Mathematical Taxonomy by N. Jardine and R. Sibson, Wiley, New York, This got me started. Underlying finite set to be classified: E Σ(E) the reflexive symmetric relations on E. Dissimilarity coefficient (DC) d : E E R + 0 d(a, b) = d(b, a) d(a, a) = 0 d is an ultrametric if also d(a, b) max{d(a, c), d(b, c)} for all a, b, c E. Numerically stratified clustering (NSC) Td : R + 0 Σ(E) a residual mapping (lattice theoretic idea) in that There is an h such that Td(h) = E E. Td( h i ) = Td(h i ). NSCs and DCs are in one-one correspondence. In the book a cluster method is viewed as a transformation of a DC to an ultrametric. Careful (but limited) mathematical model.

4 Connection with Symbolic Data Analysis If every object in E has a specified collection of attributes, it is straightforward to compute a DC. What if there is some variation or uncertainty involving the values of the attributes within an object? One could take a mean, or a median, or some other statistic summarizing the data belonging to each object. This involves a summary before one classifies anything. Might be wise to defer any summary as long as possible. One might view the attributes as taking values in an interval. or one might view them as belonging to a distribution. This places us into the framework of symbolic data analysis, but it also puts us into a discipline called Percentile Clustering (Janowitz and Schweizer, Math. Social Sciences 18, pp ). Included here would be dissimilarities taking values in a confidence interval. In these cases, we need to be able to have DCs taking values in a poset with smallest member 0.

5 What is a dissimilarity coefficient? Mapping d from ordered pairs of objects to some partially ordered set (Often the non-negative reals). Higher values of d(x, y) make (x, y) more dissimilar (less similar). So d(x, y) measures the dissimilarity. Another view: d(x, y) represents the levels at which (x, y) is a candidate for clustering. Basic property: If (x, y) is a candidate for clustering at level h, and h k, then (x, y) is a candidate at level k. k provides a less strict criterion. Cluster method: At each level h, decide which cluster candidates actually get clustered. View clusters as possible classifications.

6 Clustering Based on a Poset L is poset with smallest element 0 where dissimilarities measured. F(L) = order filters of L ordered by F G G F. F, x F, x y implies y F. Principal filter: F h = {y L : y h}. F(L) is a complete distributive lattice. DC: D : E E F(L) such that D(a, b) = D(b, a) D(a, a) D(a, b). Might want D(a, a) = F 0. Can take D(a, b) to be principal filters. Ultrametric if also D(a, b) D(a, c) D(b, c) for all a, b, c E. SD : L Σ(E) (Symmetric relations on E). SD gives cluster candidates at level h. SD(h) = {(a, b) : h D(a, b)}. h k implies SD(h) SD(k). If L has a largest member 1, then SD(1) = E E. h D(a, b) (a, b) SD(h).

7 Detour into theory If we take DCs as mappings d : E E L where L is poset with 0, single linkage clustering not valid. Single linkage operation: For h L, let R h = {(a, b) : d(a, b) h}. Output has at h the transitive relation E h = γ(r h ) generated by R h. For this to work γ(r h R k ) must equal γ(r h ) γ(r k ). Not true unless L is a chain. One solution: For L a chain, the map Td : L Σ(E) has the property that the preimage of every principal filter is a principal filter. Can relax this to assume that each pre-image is the finite union of principal filters. (Applications of the theory of partially ordered sets to cluster analysis, Banach Center Publications 9, 1982, pp ) Another solution (Present view): Assume d takes values in F(L).

8 An example involving numerical data Water Protein Fat Lactose Ash 1. Bison Buffalo Camel Cat Deer Dog Dolphin Donkey Composition of Mammal Milk (Clustan) Original data has 25 mammal species. Just wanted short example. Used DC taking values in R where R + 0 denotes non-negative reals. Here is construction. Used squared Euclidean distance on each attribute to construct five separate DCs, then represent them as columns in a single dissimilarity matrix having 28 rows and 5 columns and denoted P. Use vector ordering inherited from R

9 A Version of Complete Linkage Clustering We illustrate complete linkage clustering with the data at hand. The clusters implied by the minimal members M(P) of P are all formed. We then remove them from P to form P 1. members of P 1. If k is such a level, we look at the clusters implied by the members of M(P) strictly under k. These are all formed. To get the clusters at level k, we merge any clusters for which all links have been made (including any links at level k), and continue the process. We illustrate this numerically. Here is the list of edges of P. level edge level edge level edge level edge

10 Sample Calculations The minimal members of M(P) (height 0) are levels {1, 2, 7, 8, 9, 11, 19, 20, 21, 23, 24}. The next layer (height 1) of levels is {3, 5, 12, 14, 18, 26}. Let s examine the clusters for these levels. level 3 lies only above level 9. Thus we must cluster 24. level 3 has as a cluster candidate 14. In single linkage clustering we would have the cluster 124. But complete linkage would not merge 14 with 24 since there is no link between 1 and 2. Thus at level 3, 24 is the only non-singleton cluster. Similar reasoning shows that at level 5, we have the cluster 12346, while complete linkage only has At level 12, we have SL: 12347, 56 and CL: 1234, 56. level Single Complete , , 456

11 The nontrivial clusters 12347, , , , , 34 Figure: The nontrivial clusters (ignoring levels at which they occur) Remember: Clusters just suggest structure! 1. Bison 5. Deer 2. Buffalo 6. Dog 3. Camel 7. Dolphin 4. Cat 8. Donkey

12 Figure: Complete linkage using standard clustering

13 Figure: Single linkage using standard clustering

14 Vietnam casualties Here is a second example. It relates to US and South Vietnamese combat deaths during the Viet Nam war over a period of 6 years. The data was taken from Hartigan, Clustering Algorithms, (Wiley, New York, 1975), p Will not repeat data here. Example was discussed in Janowitz and Schweizer, Ordinal and Percentile Clustering, Math. Social Sciences, 18 (1989), Description: 72 monthly totals, one for US and one for South Viet Nam. Period was from January, 1966 to December, Will label the data with the letters a, b, c, d, e, f, g, h, i, j, k, l in chronological order. Description of technique: We used Squared Euclidean distance to create a 72 by 72 dissimilarity matrix ˆd. The dissimilarity we are after is then based on the 12 groups of data (6 per group). Thus to get the dissimilarity D(a, b), we use the distribution formed by the 36 entries {ˆd(i, j) : 1 i 6, 7 j 12}.

15 We take the 30th, 50th and 70th percentiles of this distribution. Thus the DC D is a 66 by 3 array of numbers. We order this array with the vector space ordering x y row for x row for y. We label the groups with the letters from a to l, and these represent in chronological order the months JAN-JUN, 1966, JUL-DEC, 1966,..., JUL-DEC, This poset has one minimal member: the triple for ab, and two maximal members, the triples for be, and el. Let s see how the complete linkage algorithms works. We begin by clustering ab at the level (340.3, 451.7, ). This level is not only minimal, it is the smallest member of D. There is only one entry at height 2, and it corresponds to cd. Thus at that level, we have the clusters ab and cd.

16 Height 3: the pairs bc, hj, jl. For bc: bc > cd > ab, so we have the clusters ab, cd. ab and cd do not merge to form abcd because no links at bd, ad, ac. For hj: hj > cd > ab, so clusters ab, cd, hj are formed. For jl: jl > cd > ab so clusters ab, cd, jl. Groups e, i and k seem to not cluster with other entries. The clusters that want to form involve ab, cd, fg, and hjl. The next slide gives a graphic view of the data, and following that a slide that looks at the various clusterings.

17 A Graphic View of the Data Figure: Vietnam Casualty Data Note: Blue is US casualties, red is SVN.

18 A display of the clusters abcdhijkl, efg abcdhjkl, fg abhjkl, cd, efg abhijkl, cd, fg abcdhjl, fg abhjkl, cd, fg ab, cd, fg, hijkl abhjl, cd, fg ab, cd, fg, hjkl ab, cd, fg, hjl ab, cd, hjl ab, cd, hj ab, cd, jl ab, cd ab Figure: The Nontrivial Clusters

19 Figure: Clustering Based on Medians of each Group

### 15.062 Data Mining: Algorithms and Applications Matrix Math Review

.6 Data Mining: Algorithms and Applications Matrix Math Review The purpose of this document is to give a brief review of selected linear algebra concepts that will be useful for the course and to develop

### Steven M. Ho!and. Department of Geology, University of Georgia, Athens, GA 30602-2501

CLUSTER ANALYSIS Steven M. Ho!and Department of Geology, University of Georgia, Athens, GA 30602-2501 January 2006 Introduction Cluster analysis includes a broad suite of techniques designed to find groups

### Chapter 7. Hierarchical cluster analysis. Contents 7-1

7-1 Chapter 7 Hierarchical cluster analysis In Part 2 (Chapters 4 to 6) we defined several different ways of measuring distance (or dissimilarity as the case may be) between the rows or between the columns

### Cluster Analysis. Isabel M. Rodrigues. Lisboa, 2014. Instituto Superior Técnico

Instituto Superior Técnico Lisboa, 2014 Introduction: Cluster analysis What is? Finding groups of objects such that the objects in a group will be similar (or related) to one another and different from

### Identification of noisy variables for nonmetric and symbolic data in cluster analysis

Identification of noisy variables for nonmetric and symbolic data in cluster analysis Marek Walesiak and Andrzej Dudek Wroclaw University of Economics, Department of Econometrics and Computer Science,

### SPECIAL PERTURBATIONS UNCORRELATED TRACK PROCESSING

AAS 07-228 SPECIAL PERTURBATIONS UNCORRELATED TRACK PROCESSING INTRODUCTION James G. Miller * Two historical uncorrelated track (UCT) processing approaches have been employed using general perturbations

### Clustering and Data Mining in R

Clustering and Data Mining in R Workshop Supplement Thomas Girke December 10, 2011 Introduction Data Preprocessing Data Transformations Distance Methods Cluster Linkage Hierarchical Clustering Approaches

### Data Mining 5. Cluster Analysis

Data Mining 5. Cluster Analysis 5.2 Fall 2009 Instructor: Dr. Masoud Yaghini Outline Data Structures Interval-Valued (Numeric) Variables Binary Variables Categorical Variables Ordinal Variables Variables

### . 0 1 10 2 100 11 1000 3 20 1 2 3 4 5 6 7 8 9

Introduction The purpose of this note is to find and study a method for determining and counting all the positive integer divisors of a positive integer Let N be a given positive integer We say d is a

### Data Mining and Clustering Techniques

DRTC Workshop on Semantic Web 8 th 10 th December, 2003 DRTC, Bangalore Paper: K Data Mining and Clustering Techniques I. K. Ravichandra Rao Professor and Head Documentation Research and Training Center

### Exploratory data analysis (Chapter 2) Fall 2011

Exploratory data analysis (Chapter 2) Fall 2011 Data Examples Example 1: Survey Data 1 Data collected from a Stat 371 class in Fall 2005 2 They answered questions about their: gender, major, year in school,

### FUZZY CLUSTERING ANALYSIS OF DATA MINING: APPLICATION TO AN ACCIDENT MINING SYSTEM

International Journal of Innovative Computing, Information and Control ICIC International c 0 ISSN 34-48 Volume 8, Number 8, August 0 pp. 4 FUZZY CLUSTERING ANALYSIS OF DATA MINING: APPLICATION TO AN ACCIDENT

### Medical Information Management & Mining. You Chen Jan,15, 2013 You.chen@vanderbilt.edu

Medical Information Management & Mining You Chen Jan,15, 2013 You.chen@vanderbilt.edu 1 Trees Building Materials Trees cannot be used to build a house directly. How can we transform trees to building materials?

### Continued Fractions and the Euclidean Algorithm

Continued Fractions and the Euclidean Algorithm Lecture notes prepared for MATH 326, Spring 997 Department of Mathematics and Statistics University at Albany William F Hammond Table of Contents Introduction

### There are a number of different methods that can be used to carry out a cluster analysis; these methods can be classified as follows:

Statistics: Rosie Cornish. 2007. 3.1 Cluster Analysis 1 Introduction This handout is designed to provide only a brief introduction to cluster analysis and how it is done. Books giving further details are

### Introduction to Statistical Machine Learning

CHAPTER Introduction to Statistical Machine Learning We start with a gentle introduction to statistical machine learning. Readers familiar with machine learning may wish to skip directly to Section 2,

### Data Mining Cluster Analysis: Basic Concepts and Algorithms. Lecture Notes for Chapter 8. Introduction to Data Mining

Data Mining Cluster Analysis: Basic Concepts and Algorithms Lecture Notes for Chapter 8 Introduction to Data Mining by Tan, Steinbach, Kumar Tan,Steinbach, Kumar Introduction to Data Mining 4/8/2004 Hierarchical

### Magic Squares and Modular Arithmetic

Magic Squares and Modular Arithmetic Jim Carlson November 7, 2001 1 Introduction Recall that a magic square is a square array of consecutive distinct numbers such that all row and column sums and are the

### Mathematics of Cryptography Modular Arithmetic, Congruence, and Matrices. A Biswas, IT, BESU SHIBPUR

Mathematics of Cryptography Modular Arithmetic, Congruence, and Matrices A Biswas, IT, BESU SHIBPUR McGraw-Hill The McGraw-Hill Companies, Inc., 2000 Set of Integers The set of integers, denoted by Z,

### Cluster analysis with SPSS: K-Means Cluster Analysis

analysis with SPSS: K-Means Analysis analysis is a type of data classification carried out by separating the data into groups. The aim of cluster analysis is to categorize n objects in k (k>1) groups,

### Gerry Hobbs, Department of Statistics, West Virginia University

Decision Trees as a Predictive Modeling Method Gerry Hobbs, Department of Statistics, West Virginia University Abstract Predictive modeling has become an important area of interest in tasks such as credit

### Just the Factors, Ma am

1 Introduction Just the Factors, Ma am The purpose of this note is to find and study a method for determining and counting all the positive integer divisors of a positive integer Let N be a given positive

### December 4, 2013 MATH 171 BASIC LINEAR ALGEBRA B. KITCHENS

December 4, 2013 MATH 171 BASIC LINEAR ALGEBRA B KITCHENS The equation 1 Lines in two-dimensional space (1) 2x y = 3 describes a line in two-dimensional space The coefficients of x and y in the equation

### Mathematics of Cryptography

CHAPTER 2 Mathematics of Cryptography Part I: Modular Arithmetic, Congruence, and Matrices Objectives This chapter is intended to prepare the reader for the next few chapters in cryptography. The chapter

### Scatter Plots with Error Bars

Chapter 165 Scatter Plots with Error Bars Introduction The procedure extends the capability of the basic scatter plot by allowing you to plot the variability in Y and X corresponding to each point. Each

### Chapter 2: Exploring Data with Graphs and Numerical Summaries. Graphical Measures- Graphs are used to describe the shape of a data set.

Page 1 of 16 Chapter 2: Exploring Data with Graphs and Numerical Summaries Graphical Measures- Graphs are used to describe the shape of a data set. Section 1: Types of Variables In general, variable can

### Cluster Analysis using R

Cluster analysis or clustering is the task of assigning a set of objects into groups (called clusters) so that the objects in the same cluster are more similar (in some sense or another) to each other

### Multivariate Analysis of Ecological Data

Multivariate Analysis of Ecological Data MICHAEL GREENACRE Professor of Statistics at the Pompeu Fabra University in Barcelona, Spain RAUL PRIMICERIO Associate Professor of Ecology, Evolutionary Biology

### STATISTICA. Clustering Techniques. Case Study: Defining Clusters of Shopping Center Patrons. and

Clustering Techniques and STATISTICA Case Study: Defining Clusters of Shopping Center Patrons STATISTICA Solutions for Business Intelligence, Data Mining, Quality Control, and Web-based Analytics Table

### Copy in your notebook: Add an example of each term with the symbols used in algebra 2 if there are any.

Algebra 2 - Chapter Prerequisites Vocabulary Copy in your notebook: Add an example of each term with the symbols used in algebra 2 if there are any. P1 p. 1 1. counting(natural) numbers - {1,2,3,4,...}

### Hierarchical Cluster Analysis Some Basics and Algorithms

Hierarchical Cluster Analysis Some Basics and Algorithms Nethra Sambamoorthi CRMportals Inc., 11 Bartram Road, Englishtown, NJ 07726 (NOTE: Please use always the latest copy of the document. Click on this

### Clustering. Danilo Croce Web Mining & Retrieval a.a. 2015/201 16/03/2016

Clustering Danilo Croce Web Mining & Retrieval a.a. 2015/201 16/03/2016 1 Supervised learning vs. unsupervised learning Supervised learning: discover patterns in the data that relate data attributes with

### MEASURES OF VARIATION

NORMAL DISTRIBTIONS MEASURES OF VARIATION In statistics, it is important to measure the spread of data. A simple way to measure spread is to find the range. But statisticians want to know if the data are

### STATISTICS AND DATA ANALYSIS IN GEOLOGY, 3rd ed. Clarificationof zonationprocedure described onpp. 238-239

STATISTICS AND DATA ANALYSIS IN GEOLOGY, 3rd ed. by John C. Davis Clarificationof zonationprocedure described onpp. 38-39 Because the notation used in this section (Eqs. 4.8 through 4.84) is inconsistent

### Equivalence Relations

Equivalence Relations Definition An equivalence relation on a set S, is a relation on S which is reflexive, symmetric and transitive. Examples: Let S = Z and define R = {(x,y) x and y have the same parity}

### Mathematics Course 111: Algebra I Part IV: Vector Spaces

Mathematics Course 111: Algebra I Part IV: Vector Spaces D. R. Wilkins Academic Year 1996-7 9 Vector Spaces A vector space over some field K is an algebraic structure consisting of a set V on which are

### Northumberland Knowledge

Northumberland Knowledge Know Guide How to Analyse Data - November 2012 - This page has been left blank 2 About this guide The Know Guides are a suite of documents that provide useful information about

### 0.1 What is Cluster Analysis?

Cluster Analysis 1 2 0.1 What is Cluster Analysis? Cluster analysis is concerned with forming groups of similar objects based on several measurements of different kinds made on the objects. The key idea

### Data Mining Project Report. Document Clustering. Meryem Uzun-Per

Data Mining Project Report Document Clustering Meryem Uzun-Per 504112506 Table of Content Table of Content... 2 1. Project Definition... 3 2. Literature Survey... 3 3. Methods... 4 3.1. K-means algorithm...

### Knowledge Discovery and Data Mining. Structured vs. Non-Structured Data

Knowledge Discovery and Data Mining Unit # 2 1 Structured vs. Non-Structured Data Most business databases contain structured data consisting of well-defined fields with numeric or alphanumeric values.

### Neural Networks Lesson 5 - Cluster Analysis

Neural Networks Lesson 5 - Cluster Analysis Prof. Michele Scarpiniti INFOCOM Dpt. - Sapienza University of Rome http://ispac.ing.uniroma1.it/scarpiniti/index.htm michele.scarpiniti@uniroma1.it Rome, 29

### Tutorial for proteome data analysis using the Perseus software platform

Tutorial for proteome data analysis using the Perseus software platform Laboratory of Mass Spectrometry, LNBio, CNPEM Tutorial version 1.0, January 2014. Note: This tutorial was written based on the information

### Risk Analysis of Development Programs

P. P. PANDOLFINI A Risk Analysis of Development Programs Peter P. Pandolfini set-theory methodology for risk analysis was first proposed by Benjamin Blanchard. The methodology is explained here in terms

### Introduction to Multivariate Analysis

Introduction to Multivariate Analysis Lecture 1 August 24, 2005 Multivariate Analysis Lecture #1-8/24/2005 Slide 1 of 30 Today s Lecture Today s Lecture Syllabus and course overview Chapter 1 (a brief

### MATH 240 Fall, Chapter 1: Linear Equations and Matrices

MATH 240 Fall, 2007 Chapter Summaries for Kolman / Hill, Elementary Linear Algebra, 9th Ed. written by Prof. J. Beachy Sections 1.1 1.5, 2.1 2.3, 4.2 4.9, 3.1 3.5, 5.3 5.5, 6.1 6.3, 6.5, 7.1 7.3 DEFINITIONS

### Self-Organizing g Maps (SOM) COMP61021 Modelling and Visualization of High Dimensional Data

Self-Organizing g Maps (SOM) Ke Chen Outline Introduction ti Biological Motivation Kohonen SOM Learning Algorithm Visualization Method Examples Relevant Issues Conclusions 2 Introduction Self-organizing

### ARE211, Fall2012. Contents. 2. Linear Algebra Preliminary: Level Sets, upper and lower contour sets and Gradient vectors 1

ARE11, Fall1 LINALGEBRA1: THU, SEP 13, 1 PRINTED: SEPTEMBER 19, 1 (LEC# 7) Contents. Linear Algebra 1.1. Preliminary: Level Sets, upper and lower contour sets and Gradient vectors 1.. Vectors as arrows.

### How to interpret scientific & statistical graphs

How to interpret scientific & statistical graphs Theresa A Scott, MS Department of Biostatistics theresa.scott@vanderbilt.edu http://biostat.mc.vanderbilt.edu/theresascott 1 A brief introduction Graphics:

### 1. Determine all real numbers a, b, c, d that satisfy the following system of equations.

altic Way 1999 Reykjavík, November 6, 1999 Problems 1. etermine all real numbers a, b, c, d that satisfy the following system of equations. abc + ab + bc + ca + a + b + c = 1 bcd + bc + cd + db + b + c

### Lecture 20: Clustering

Lecture 20: Clustering Wrap-up of neural nets (from last lecture Introduction to unsupervised learning K-means clustering COMP-424, Lecture 20 - April 3, 2013 1 Unsupervised learning In supervised learning,

### LECTURE 5: DUALITY AND SENSITIVITY ANALYSIS. 1. Dual linear program 2. Duality theory 3. Sensitivity analysis 4. Dual simplex method

LECTURE 5: DUALITY AND SENSITIVITY ANALYSIS 1. Dual linear program 2. Duality theory 3. Sensitivity analysis 4. Dual simplex method Introduction to dual linear program Given a constraint matrix A, right

### DATA MINING CLUSTER ANALYSIS: BASIC CONCEPTS

DATA MINING CLUSTER ANALYSIS: BASIC CONCEPTS 1 AND ALGORITHMS Chiara Renso KDD-LAB ISTI- CNR, Pisa, Italy WHAT IS CLUSTER ANALYSIS? Finding groups of objects such that the objects in a group will be similar

### A matrix over a field F is a rectangular array of elements from F. The symbol

Chapter MATRICES Matrix arithmetic A matrix over a field F is a rectangular array of elements from F The symbol M m n (F) denotes the collection of all m n matrices over F Matrices will usually be denoted

### Mathematics of Cryptography Part I

CHAPTER 2 Mathematics of Cryptography Part I (Solution to Odd-Numbered Problems) Review Questions 1. The set of integers is Z. It contains all integral numbers from negative infinity to positive infinity.

### BASIC STATISTICAL METHODS FOR GENOMIC DATA ANALYSIS

BASIC STATISTICAL METHODS FOR GENOMIC DATA ANALYSIS SEEMA JAGGI Indian Agricultural Statistics Research Institute Library Avenue, New Delhi-110 012 seema@iasri.res.in Genomics A genome is an organism s

### Some Essential Statistics The Lure of Statistics

Some Essential Statistics The Lure of Statistics Data Mining Techniques, by M.J.A. Berry and G.S Linoff, 2004 Statistics vs. Data Mining..lie, damn lie, and statistics mining data to support preconceived

### Time series clustering and the analysis of film style

Time series clustering and the analysis of film style Nick Redfern Introduction Time series clustering provides a simple solution to the problem of searching a database containing time series data such

### Intro to Data Analysis, Economic Statistics and Econometrics

Intro to Data Analysis, Economic Statistics and Econometrics Statistics deals with the techniques for collecting and analyzing data that arise in many different contexts. Econometrics involves the development

### UNIVERSITY OF CAMBRIDGE INTERNATIONAL EXAMINATIONS General Certificate of Education Ordinary Level

UNIVERSITY OF CAMBRIDGE INTERNATIONAL EXAMINATIONS General Certificate of Education Ordinary Level *4274470357* MATHEMATICS (SYLLABUS D) 4024/11 Paper 1 October/November 2012 2 hours Candidates answer

### Data Analysis Tools. Tools for Summarizing Data

Data Analysis Tools This section of the notes is meant to introduce you to many of the tools that are provided by Excel under the Tools/Data Analysis menu item. If your computer does not have that tool

### The Four Numbers Game

University of Nebraska - Lincoln DigitalCommons@University of Nebraska - Lincoln MAT Exam Expository Papers Math in the Middle Institute Partnership 7-1-2007 The Four Numbers Game Tina Thompson University

### Teaching & Learning Plans. Plan 6: Planes and Points. Junior Certificate Syllabus

Teaching & Learning Plans Plan 6: Planes and Points Junior Certificate Syllabus The Teaching & Learning Plans are structured as follows: Aims outline what the lesson, or series of lessons, hopes to achieve.

### Solving Sets of Equations. 150 B.C.E., 九章算術 Carl Friedrich Gauss,

Solving Sets of Equations 5 B.C.E., 九章算術 Carl Friedrich Gauss, 777-855 Gaussian-Jordan Elimination In Gauss-Jordan elimination, matrix is reduced to diagonal rather than triangular form Row combinations

### MBA 611 STATISTICS AND QUANTITATIVE METHODS

MBA 611 STATISTICS AND QUANTITATIVE METHODS Part I. Review of Basic Statistics (Chapters 1-11) A. Introduction (Chapter 1) Uncertainty: Decisions are often based on incomplete information from uncertain

### Data visualization and clustering. Genomics is to no small extend a data science

Data visualization and clustering Genomics is to no small extend a data science [www.data2discovery.org] Data visualization and clustering Genomics is to no small extend a data science [Andersson et al.,

### ROUTH S STABILITY CRITERION

ECE 680 Modern Automatic Control Routh s Stability Criterion June 13, 2007 1 ROUTH S STABILITY CRITERION Consider a closed-loop transfer function H(s) = b 0s m + b 1 s m 1 + + b m 1 s + b m a 0 s n + s

: Table of Contents... 1 Overview of Model... 1 Dispersion... 2 Parameterization... 3 Sigma-Restricted Model... 3 Overparameterized Model... 4 Reference Coding... 4 Model Summary (Summary Tab)... 5 Summary

### CLASSIFYING SERVICES USING A BINARY VECTOR CLUSTERING ALGORITHM: PRELIMINARY RESULTS

CLASSIFYING SERVICES USING A BINARY VECTOR CLUSTERING ALGORITHM: PRELIMINARY RESULTS Venkat Venkateswaran Department of Engineering and Science Rensselaer Polytechnic Institute 275 Windsor Street Hartford,

### Categorical Data Visualization and Clustering Using Subjective Factors

Categorical Data Visualization and Clustering Using Subjective Factors Chia-Hui Chang and Zhi-Kai Ding Department of Computer Science and Information Engineering, National Central University, Chung-Li,

### A Introduction to Matrix Algebra and Principal Components Analysis

A Introduction to Matrix Algebra and Principal Components Analysis Multivariate Methods in Education ERSH 8350 Lecture #2 August 24, 2011 ERSH 8350: Lecture 2 Today s Class An introduction to matrix algebra

### The Role of Size Normalization on the Recognition Rate of Handwritten Numerals

The Role of Size Normalization on the Recognition Rate of Handwritten Numerals Chun Lei He, Ping Zhang, Jianxiong Dong, Ching Y. Suen, Tien D. Bui Centre for Pattern Recognition and Machine Intelligence,

### 4. How many integers between 2004 and 4002 are perfect squares?

5 is 0% of what number? What is the value of + 3 4 + 99 00? (alternating signs) 3 A frog is at the bottom of a well 0 feet deep It climbs up 3 feet every day, but slides back feet each night If it started

### A set is an unordered collection of objects.

Section 2.1 Sets A set is an unordered collection of objects. the students in this class the chairs in this room The objects in a set are called the elements, or members of the set. A set is said to contain

### Chapter 14: Analyzing Relationships Between Variables

Chapter Outlines for: Frey, L., Botan, C., & Kreps, G. (1999). Investigating communication: An introduction to research methods. (2nd ed.) Boston: Allyn & Bacon. Chapter 14: Analyzing Relationships Between

### Intermediate Math Circles November 18, 2015 SOLUTIONS

Intermediate Math Circles November 18, 015 SOLUTIONS Here are the warmup problems to try as everyone arrives. 1. There are two different right angle isosceles triangles that have a side with length. Draw

### Grade 6 Math Circles. Multiplication

Faculty of Mathematics Waterloo, Ontario N2L 3G1 Introduction Grade 6 Math Circles November 5/6, 2013 Multiplication At this point in your schooling you should all be very comfortable with multiplication.

### UNSUPERVISED MACHINE LEARNING TECHNIQUES IN GENOMICS

UNSUPERVISED MACHINE LEARNING TECHNIQUES IN GENOMICS Dwijesh C. Mishra I.A.S.R.I., Library Avenue, New Delhi-110 012 dcmishra@iasri.res.in What is Learning? "Learning denotes changes in a system that enable

### Lecture - 32 Regression Modelling Using SPSS

Applied Multivariate Statistical Modelling Prof. J. Maiti Department of Industrial Engineering and Management Indian Institute of Technology, Kharagpur Lecture - 32 Regression Modelling Using SPSS (Refer

### Pythagorean vectors and their companions. Lattice Cubes

Lattice Cubes Richard Parris Richard Parris (rparris@exeter.edu) received his mathematics degrees from Tufts University (B.A.) and Princeton University (Ph.D.). For more than three decades, he has lived

### Curriculum Map Statistics and Probability Honors (348) Saugus High School Saugus Public Schools 2009-2010

Curriculum Map Statistics and Probability Honors (348) Saugus High School Saugus Public Schools 2009-2010 Week 1 Week 2 14.0 Students organize and describe distributions of data by using a number of different

### D-optimal plans in observational studies

D-optimal plans in observational studies Constanze Pumplün Stefan Rüping Katharina Morik Claus Weihs October 11, 2005 Abstract This paper investigates the use of Design of Experiments in observational

### Functional Dependencies: Part 2

Yufei Tao Department of Computer Science and Engineering Chinese University of Hong Kong In designing a database, for the purpose of minimizing redundancy, we need to collect a set F of functional dependencies

### Social Media Mining. Data Mining Essentials

Introduction Data production rate has been increased dramatically (Big Data) and we are able store much more data than before E.g., purchase data, social media data, mobile phone data Businesses and customers

### The Application of Association Rule Mining in CRM Based on Formal Concept Analysis

The Application of Association Rule Mining in CRM Based on Formal Concept Analysis HongSheng Xu * and Lan Wang College of Information Technology, Luoyang Normal University, Luoyang, 471022, China xhs_ls@sina.com

### Sections 2.1, 2.2 and 2.4

SETS Sections 2.1, 2.2 and 2.4 Chapter Summary Sets The Language of Sets Set Operations Set Identities Introduction Sets are one of the basic building blocks for the types of objects considered in discrete

### Machine Learning using MapReduce

Machine Learning using MapReduce What is Machine Learning Machine learning is a subfield of artificial intelligence concerned with techniques that allow computers to improve their outputs based on previous

### ON FIBER DIAMETERS OF CONTINUOUS MAPS

ON FIBER DIAMETERS OF CONTINUOUS MAPS PETER S. LANDWEBER, EMANUEL A. LAZAR, AND NEEL PATEL Abstract. We present a surprisingly short proof that for any continuous map f : R n R m, if n > m, then there

### Variables. Exploratory Data Analysis

Exploratory Data Analysis Exploratory Data Analysis involves both graphical displays of data and numerical summaries of data. A common situation is for a data set to be represented as a matrix. There is

### 2 Descriptive statistics with R

Biological data analysis, Tartu 2006/2007 1 2 Descriptive statistics with R Before starting with basic concepts of data analysis, one should be aware of different types of data and ways to organize data

### ECEN 5682 Theory and Practice of Error Control Codes

ECEN 5682 Theory and Practice of Error Control Codes Convolutional Codes University of Colorado Spring 2007 Linear (n, k) block codes take k data symbols at a time and encode them into n code symbols.

### = 2 + 1 2 2 = 3 4, Now assume that P (k) is true for some fixed k 2. This means that

Instructions. Answer each of the questions on your own paper, and be sure to show your work so that partial credit can be adequately assessed. Credit will not be given for answers (even correct ones) without

### Data Clustering. Dec 2nd, 2013 Kyrylo Bessonov

Data Clustering Dec 2nd, 2013 Kyrylo Bessonov Talk outline Introduction to clustering Types of clustering Supervised Unsupervised Similarity measures Main clustering algorithms k-means Hierarchical Main

### 1 Introduction. Dr. T. Srinivas Department of Mathematics Kakatiya University Warangal 506009, AP, INDIA tsrinivasku@gmail.com

A New Allgoriitthm for Miiniimum Costt Liinkiing M. Sreenivas Alluri Institute of Management Sciences Hanamkonda 506001, AP, INDIA allurimaster@gmail.com Dr. T. Srinivas Department of Mathematics Kakatiya

### Tensor Methods for Machine Learning, Computer Vision, and Computer Graphics

Tensor Methods for Machine Learning, Computer Vision, and Computer Graphics Part I: Factorizations and Statistical Modeling/Inference Amnon Shashua School of Computer Science & Eng. The Hebrew University

### Supplementary PROCESS Documentation

Supplementary PROCESS Documentation This document is an addendum to Appendix A of Introduction to Mediation, Moderation, and Conditional Process Analysis that describes options and output added to PROCESS

### Exercise 1.12 (Pg. 22-23)

Individuals: The objects that are described by a set of data. They may be people, animals, things, etc. (Also referred to as Cases or Records) Variables: The characteristics recorded about each individual.