The influence of teacher support on national standardized student assessment.
1 The influence of teacher support on national standardized student assessment. A fuzzy clustering approach to improve the accuracy of Italian students' data. Claudio Quintano, Rosalia Castellano, Sergio Longobardi (sergio.longobardi@uniparthenope.it). Department of Statistics and Mathematics for Economic Research, University of Naples "Parthenope".
2 "To err is human, to forgive divine, but to include errors in your design is statistical." [Leslie Kish, 1978]
3 TOTAL SURVEY ERROR: sampling error and non-sampling error. Non-sampling error comprises specification error, processing error, frame error, nonresponse error and measurement error (Biemer and Lyberg, 2003). Excessive teacher support can be regarded as similar to an interviewer effect (Biemer, Groves, Lyberg, Mathiowetz and Sudman, 1991), so we may suppose that the student's score is affected by an error component that inflates the measurement error (Braverman, 1996).
4 This work considers data on students' performance assessments collected by the Italian National Evaluation Institute of the Ministry of Education (INVALSI) in the school year (s.y.) 2004/05. A multistage method, combining factor analysis with a fuzzy clustering approach, is implemented to evaluate and correct for cheating in primary school classes.
5 Teacher cheating, especially in extreme cases, is likely to leave tell-tale signs (Jacob and Levitt, 2003). [Figure: mathematics class mean score, s.y. 2004/05. Panels: IV class primary school; II class primary school.]
6 Teacher cheating, especially in extreme cases, is likely to leave tell-tale signs (Jacob and Levitt, 2003). [Figure: mathematics class mean score, s.y. 2004/05. Panels: III class upper secondary school; I class upper secondary school; I class lower secondary school.]
7 II CLASS - PRIMARY SCHOOL. [Figure: six histograms (x-axis variable: avergita; y-axis: frequency) for Reading, Mathematics and Science in s.y. 2004/05 and s.y. 2005/06. Surviving panel statistics, in source order: mean = 87.81, std. dev. = 8.14; mean = 74.71, std. dev. = 14.133; mean = 77.72, std. dev. = 13.029; mean = 81.5, std. dev. = 11.439. N values not preserved.]
8 The procedure is aimed at handling cheating at class level. A class is considered suspicious if: (i) the within-class variability of the final score is close to zero; (ii) the answer behaviour is homogeneous; (iii) the percentage of missing data is very low.
9 For each class, four indexes are computed: class mean score $\bar{p}_j$; class standard deviation of scores $\sigma_j$; class index of answer homogeneity $\bar{E}_j$; class non-response rate $MC_j$.
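10 Class mean score and class standard deviation of scores:
$$\bar{p}_j = \frac{1}{N_j}\sum_{k=1}^{N_j} p_{kj}, \qquad \sigma_j = \sqrt{\frac{\sum_{k=1}^{N_j}\left(p_{kj}-\bar{p}_j\right)^2}{N_j}}$$
where $p_{kj}$ denotes the score of the $k$-th student of the $j$-th class and $N_j$ denotes the number of respondent students of the $j$-th class.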
11 Index of answer homogeneity (mean of the per-question Gini indexes):
$$\bar{E}_j = \frac{1}{Q_j}\sum_{s=1}^{Q_j} E_{sj}$$
Gini's index of heterogeneity for the $s$-th question:
$$E_{sj} = 1 - \sum_{t=1}^{h}\left(\frac{n_{tj}}{N_j}\right)^2$$
where $n_{tj}/N_j$ denotes the relative frequency of students of the $j$-th class who gave the $t$-th answer to the $s$-th question, and $h$ is the number of possible answers. The index of answer homogeneity equals zero when all students of the $j$-th class have given the same answers to all test questions, and it reaches the value $(h-1)/h$ when the answer heterogeneity in the $j$-th class is maximum.
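The homogeneity index can be illustrated with a minimal Python sketch. The function names and the toy answer sheets are hypothetical, and the sketch assumes every student answered every question (the slides do not say how missing answers enter this index).

```python
from collections import Counter

def gini_heterogeneity(answers):
    """Gini heterogeneity 1 - sum_t (n_t / N)^2 for one question."""
    n = len(answers)
    counts = Counter(answers)
    return 1.0 - sum((count / n) ** 2 for count in counts.values())

def answer_homogeneity_index(class_answers):
    """Mean of the per-question Gini indexes for one class.

    class_answers[k][s] is the answer of student k to question s.
    """
    n_questions = len(class_answers[0])
    per_question = [
        gini_heterogeneity([student[s] for student in class_answers])
        for s in range(n_questions)
    ]
    return sum(per_question) / n_questions

# Hypothetical toy classes: 4 students, 3 questions, options "A".."D".
identical = [["A", "B", "C"] for _ in range(4)]
mixed = [["A", "B", "C"], ["B", "B", "D"], ["C", "A", "C"], ["D", "B", "A"]]

print(answer_homogeneity_index(identical))  # all answers equal, index is 0
print(answer_homogeneity_index(mixed))
```

A class whose students all give the same answers scores zero, which is the pattern the procedure treats as suspicious; a question with four options answered uniformly reaches the maximum (h-1)/h = 0.75.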
12 Class non-response rate:
$$MC_j = \frac{1}{N_j}\sum_{k=1}^{N_j}\frac{M_{kj}}{Q_j}$$
$MC_j = 0$ when there are no missing or invalid responses for the $j$-th class; $MC_j = 1$ when all students of the $j$-th class have given only missing or invalid answers. $M_{kj}$ denotes the number of item non-responses and invalid responses for the $k$-th student of the $j$-th class. $Q_j$ denotes the number of items administered to the $j$-th class; it is constant for each assessment area (reading, mathematics and science) and for each school level.
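As a small worked illustration, the class mean, the class standard deviation (dividing by the number of respondents, as on the slide) and the non-response rate can be computed as follows. The function names and the toy numbers are hypothetical.

```python
from math import sqrt

def class_mean(scores):
    """Mean score of the respondent students of one class."""
    return sum(scores) / len(scores)

def class_std(scores):
    """Standard deviation with divisor N_j (population form)."""
    mean = class_mean(scores)
    return sqrt(sum((p - mean) ** 2 for p in scores) / len(scores))

def non_response_rate(missing_counts, n_items):
    """MC_j: missing or invalid items over all administered items.

    missing_counts[k] is M_kj for student k; n_items is Q_j.
    """
    return sum(missing_counts) / (len(missing_counts) * n_items)

# Hypothetical class of three students taking a 20-item test.
scores = [60, 80, 100]
missing = [2, 0, 4]

print(class_mean(scores))              # 80.0
print(class_std(scores))
print(non_response_rate(missing, 20))  # 0.1
```

A class with identical scores and no missing items would give a standard deviation of 0 and a non-response rate of 0, matching two of the three warning signs listed for suspicious classes.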
13 At the second step, the data matrix composed of the four class-level indexes is reduced to two components by Factor Analysis with principal component extraction (Jolliffe, 2002). [Table: eigenvalues of the correlation matrix R, with simple and cumulative percentage of variability explained by the principal component analysis. Columns: Component; Initial eigenvalues (Total, % of Variance, Cumulative %). Values not preserved.]
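The reduction step can be sketched with NumPy as a plain principal component analysis of the correlation matrix. This is an assumption-laden illustration: the function name and the random input are hypothetical, no factor rotation is applied, and the authors' software and settings are not stated in the slides.

```python
import numpy as np

def pca_two_components(X):
    """PCA on the correlation matrix of X (rows: classes, columns: indexes).

    Returns eigenvalues (descending), % of variance, cumulative %,
    and the scores of each class on the first two components.
    """
    Z = (X - X.mean(axis=0)) / X.std(axis=0)   # standardized indexes
    R = np.corrcoef(X, rowvar=False)           # correlation matrix R
    eigvals, eigvecs = np.linalg.eigh(R)       # eigh returns ascending order
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    explained = 100.0 * eigvals / eigvals.sum()
    scores = Z @ eigvecs[:, :2]
    return eigvals, explained, np.cumsum(explained), scores

# Hypothetical data: 50 classes described by four indexes.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 4))
eigvals, explained, cumulative, scores = pca_two_components(X)
print(np.round(explained, 1), np.round(cumulative, 1))
```

Because the analysis uses the correlation matrix, the eigenvalues sum to the number of indexes (four), so each percentage of variance is simply the eigenvalue divided by four.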
14 PCA describes the answer behaviour of each class of students through two variables. [Figure labels, in source order: second component; class non-response rate; first component; index of class collaboration to survey; cheating identification axis.]
15 Projection on the plane of the first two factorial axes, second class of primary school. SUSPICIOUS CLASSES: high score, high homogeneity, low variability, missing responses close to zero.
16 Fuzzy k-means (FKM): overcoming hard clustering. Fuzzy k-means, developed by Dunn (1974) and Bezdek (1981), is a fuzzy version of the non-overlapping partition model (hard k-means).
17 Steps of the correction procedure: (1) clustering the classes with the fuzzy k-means algorithm; (2) projection of the cluster centroids on the factorial axes; (3) detection of the suspicious cluster centroid; (4) computation of a cheating index based on the degree of belonging to the suspicious cluster.
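18 Fuzzy k-means objective function:
$$J_{FKM} = \sum_{i=1}^{c}\sum_{j=1}^{n} u_{ij}^{m}\,\lVert x_j - v_i\rVert^{2}$$
where $u_{ij}$ is the degree of membership of unit $j$ in cluster $i$, $x_j$ is the $j$-th unit and $v_i$ is the centroid of cluster $i$. The results of FKM are affected by two parameters: the number of clusters ($c$) and the fuzziness index ($m$).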
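To make the objective concrete, the sketch below evaluates it with NumPy for a given data matrix, set of centroids and membership matrix. The function name, the array layout (memberships stored as clusters by units) and the toy values are assumptions made for illustration.

```python
import numpy as np

def fkm_objective(X, V, U, m):
    """Sum over clusters i and units j of u_ij^m * ||x_j - v_i||^2.

    X: (n, p) units, V: (c, p) centroids, U: (c, n) memberships.
    """
    # Squared Euclidean distance between every centroid and every unit.
    d2 = ((X[None, :, :] - V[:, None, :]) ** 2).sum(axis=2)  # shape (c, n)
    return float(((U ** m) * d2).sum())

# Hypothetical toy example: three units on a line, two clusters.
X = np.array([[0.0, 0.0], [1.0, 0.0], [10.0, 0.0]])
V = np.array([[0.5, 0.0], [10.0, 0.0]])
U_hard = np.array([[1.0, 1.0, 0.0], [0.0, 0.0, 1.0]])

print(fkm_objective(X, V, U_hard, m=2.0))  # 0.25 + 0.25 + 0 = 0.5
```

With 0/1 memberships the value reduces to the within-cluster sum of squares of hard k-means, whatever the value of m; the fuzziness index only matters once memberships are fractional.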
19 Calibration strategy. A two-step approach: (1) computation of validity measures to choose the number of clusters c; (2) analysis of the sensitivity of FKM results to the level of fuzziness (fuzziness parameter m).
20 The validity measures used to assess the goodness of the fuzzy partitions and to obtain the optimal number of clusters c are: the Fuzziness Performance Index (FPI), the Normalized Classification Entropy (NCE) and the Separation index (S). These indices are calculated for several values of the fuzziness parameter m (1.5, 2.0, 2.5).
21 Fuzziness Performance Index (Roubens, 1982):
$$FPI = 1 - \frac{c \cdot PC - 1}{c - 1}$$
Partition coefficient (Bezdek, 1974):
$$PC = \frac{1}{n}\sum_{i=1}^{c}\sum_{j=1}^{n} u_{ij}^{2}$$
Normalized Classification Entropy (Roubens, 1982):
$$NCE = \frac{PE}{\log c}$$
Partition entropy (Bezdek, 1981):
$$PE = -\frac{1}{n}\sum_{i=1}^{c}\sum_{j=1}^{n} u_{ij}\log u_{ij}$$
The FPI and NCE are used to evaluate the fuzziness of the solutions. The lower the FPI and NCE values, the more suitable the corresponding solution (McBratney & Moore, 1985).
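The two fuzziness measures can be checked on small membership matrices with the following sketch. The function names and the toy matrices are hypothetical; memberships are stored as a list of rows, one row per cluster, and terms with zero membership are skipped in the entropy (the usual convention that 0 log 0 = 0, assumed here).

```python
from math import log

def partition_coefficient(U):
    """PC = (1/n) * sum of squared memberships; U has c rows and n columns."""
    n = len(U[0])
    return sum(u * u for row in U for u in row) / n

def partition_entropy(U):
    """PE = -(1/n) * sum of u * log(u), skipping zero memberships."""
    n = len(U[0])
    return -sum(u * log(u) for row in U for u in row if u > 0) / n

def fpi(U):
    """Fuzziness Performance Index: 0 for a hard partition, 1 for a uniform one."""
    c = len(U)
    return 1.0 - (c * partition_coefficient(U) - 1.0) / (c - 1.0)

def nce(U):
    """Normalized Classification Entropy: PE divided by log(c)."""
    return partition_entropy(U) / log(len(U))

# Hypothetical memberships for two clusters and two units.
hard = [[1.0, 0.0], [0.0, 1.0]]
uniform = [[0.5, 0.5], [0.5, 0.5]]

print(fpi(hard), nce(hard))        # both 0: no fuzziness
print(fpi(uniform), nce(uniform))  # both 1 (up to rounding): maximum fuzziness
```

Both indexes therefore run from 0 (crisp, well-defined clusters) to 1 (every unit belongs equally to every cluster), which is why lower values indicate a more suitable number of clusters.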
22 Separation index S (Xie and Beni, 1991):
$$S = \frac{\sum_{i=1}^{c}\sum_{j=1}^{n} u_{ij}^{2}\,\lVert x_j - v_i\rVert^{2}}{n \,\min_{i \neq k}\lVert v_i - v_k\rVert^{2}}$$
The numerator measures compactness as the sum of squared distances within clusters, while the denominator measures separation as the minimal distance between cluster centroids. The smaller the value of S, the better the compactness and separation of the clusters.
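A minimal NumPy sketch of the separation index follows. The function name, the array layout (memberships as clusters by units) and the toy data are assumptions for illustration only.

```python
import numpy as np

def xie_beni(X, V, U):
    """Separation index S: fuzzy compactness over n times the minimal
    squared distance between two distinct centroids.

    X: (n, p) units, V: (c, p) centroids, U: (c, n) memberships.
    """
    d2 = ((X[None, :, :] - V[:, None, :]) ** 2).sum(axis=2)  # shape (c, n)
    compactness = ((U ** 2) * d2).sum()
    c = V.shape[0]
    separation = min(
        ((V[i] - V[k]) ** 2).sum() for i in range(c) for k in range(i + 1, c)
    )
    return float(compactness / (X.shape[0] * separation))

# Hypothetical toy example: three units on a line, two clusters.
X = np.array([[0.0, 0.0], [1.0, 0.0], [10.0, 0.0]])
V = np.array([[0.5, 0.0], [10.0, 0.0]])
U = np.array([[1.0, 1.0, 0.0], [0.0, 0.0, 1.0]])

print(xie_beni(X, V, U))  # 0.5 / (3 * 90.25)
```

Moving the two centroids closer together shrinks the denominator and raises S, so partitions with poorly separated clusters are penalized.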
23 [Figure: Fuzziness Performance Index (FPI), Normalized Classification Entropy (NCE) and Separation index (S) for m = 1.5 and c ∈ [2, 20]. x-axis: clusters (c); series: FPI, NCE, S.]
24 [Figure: Fuzziness Performance Index (FPI), Normalized Classification Entropy (NCE) and Separation index (S) for m = 2.0 and c ∈ [2, 20]. x-axis: clusters (c); series: FPI, NCE, S.]
25 [Figure: Fuzziness Performance Index (FPI), Normalized Classification Entropy (NCE) and Separation index (S) for m = 2.5 and c ∈ [2, 20]. x-axis: clusters (c); series: FPI, NCE, S.]
26 [Figure: number of classes with a weight close to zero (lower than 0.001) for increasing values of the fuzziness level, 1.1 ≤ m ≤ 2.5, and c = 3. x-axis: fuzziness; y-axis: units with weight lower than 0.001 (0% to 50%); series: North West, North East, Centre, South, South and Islands.]
27 [Figure: weighted central tendency measures (mean, median, mode) of the class mean score for increasing values of the fuzziness level, 1.1 ≤ m ≤ 2.5, and c = 3. Area: South and South and Islands. x-axis: fuzziness; y-axis: class score.] The correction procedure tends to equalize the central tendency measures for m in the range [values not preserved].
28 Cluster 2 gathers the classes that present a cheating profile: high class average scores, minimum within-class variability, low presence of missing items. [Figure: projection on the factorial plane of the centroids computed by the fuzzy k-means algorithm, c = 3, m = 1.7.]
29 Denoting the cheating cluster by a, the degree of belonging of the j-th class to this cluster is $\mu_{aj}$. This measure can be read as the cheating probability of the j-th class or, alternatively, as the cheating level of each class.
30 On the basis of $\mu_{aj}$, a weighting factor is developed: $W_j = 1 - \mu_{aj}$. A class with a high probability of belonging to the outlier cluster receives a low weight, while a class very far from this cluster receives a weight close to 1.
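The effect of the weighting factor on a summary statistic can be seen in a short sketch. The function names, the three class means and the membership degrees are hypothetical; the weighted mean is used here only as one example of the weighted central tendency measures mentioned in the slides.

```python
def cheating_weights(mu_a):
    """Weight of each class: one minus its membership in the cheating cluster."""
    return [1.0 - mu for mu in mu_a]

def weighted_mean(values, weights):
    """Weighted average of class-level values."""
    return sum(w * v for v, w in zip(values, weights)) / sum(weights)

# Hypothetical class mean scores and memberships in the cheating cluster.
class_means = [95.0, 70.0, 60.0]
mu_a = [0.9, 0.1, 0.0]

weights = cheating_weights(mu_a)
print(sum(class_means) / len(class_means))  # unweighted mean: 75.0
print(weighted_mean(class_means, weights))  # close to 66.25
```

The first class, which belongs almost entirely to the suspicious cluster, contributes only a tenth of its score, so the adjusted mean moves toward the scores of the classes far from that cluster.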
31 Some results. [Figure: box plots of the distributions of the index $\mu_{aj}$ (degree of belonging of the j-th unit to the cheating cluster).]
32 Some results. [Figure: original distribution vs. adjusted distribution.]
33 Some results. [Figure: original distribution vs. adjusted distribution for each macro-area: North-West, North-East, Centre, South, South and Islands.]
34 References
Bezdek J.C. (1974). Cluster validity with fuzzy sets. Journal of Cybernetics, 3.
Bezdek J.C. (1981). Pattern Recognition with Fuzzy Objective Function Algorithms. Plenum Press.
Biemer P.P., Groves R.M., Lyberg L.E., Mathiowetz N.A., Sudman S. (1991). Measurement Errors in Surveys. Wiley, New York.
Braverman M. (1996). Sources of Survey Error: Implications for Evaluation Studies. New Directions for Evaluation: Advances in Survey Research, 70.
Dunn J.C. (1974). A Fuzzy Relative of the ISODATA Process and its Use in Detecting Compact Well Separated Clusters. Journal of Cybernetics, 3.
Jacob B.A., Levitt S.D. (2003). Rotten Apples: An Investigation of the Prevalence and Predictors of Teacher Cheating. The Quarterly Journal of Economics, 118(3).
Jolliffe I.T. (2002). Principal Component Analysis. Springer, New York.
McBratney A.B., Moore A.W. (1985). Application of fuzzy sets to climatic classification. Agricultural and Forest Meteorology, 35.
Quintano C., Castellano R., Longobardi S. (2009). A Fuzzy Clustering Approach to Improve the Accuracy of Italian Student Data. An Experimental Procedure to Correct the Impact of Outliers on Assessment Test Scores. Statistica & Applicazioni, 7.
Roubens M. (1982). Fuzzy clustering algorithms and their cluster validity. European Journal of Operational Research, 10(3).
35 Supplementary Slides. Comparison between the unweighted average score per class and the weighted score per class according to the factor $W_j = 1 - \mu_{aj}$. [Table: original distribution vs. adjusted distribution. Rows: mean, mode, I quartile, median, III quartile. Values not preserved.]
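36 Workshop INVALSI. Supplementary Slides. The cluster centroids and the respective membership functions that solve the minimization problem of the $J_{FKM}$ function are:
$$v_i = \frac{\sum_{j=1}^{n} u_{ij}^{m}\, x_j}{\sum_{j=1}^{n} u_{ij}^{m}}, \qquad i = 1, \dots, c$$
$$u_{ij} = \left[\sum_{k=1}^{c}\left(\frac{\lVert x_j - v_i\rVert^{2}}{\lVert x_j - v_k\rVert^{2}}\right)^{1/(m-1)}\right]^{-1}, \qquad i = 1, \dots, c;\; j = 1, \dots, n$$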
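The two update rules are usually applied in alternation until the memberships stop changing. The NumPy sketch below illustrates that loop under stated assumptions: the function names, the random initialization, the stopping tolerance, the small constant added to distances to avoid division by zero, and the toy data are all hypothetical choices, not details taken from the slides.

```python
import numpy as np

def update_centroids(X, U, m):
    """v_i: mean of the units weighted by u_ij^m. X: (n, p), U: (c, n)."""
    W = U ** m
    return (W @ X) / W.sum(axis=1, keepdims=True)

def update_memberships(X, V, m, eps=1e-12):
    """u_ij: normalized inverse squared distance raised to 1/(m-1)."""
    d2 = ((X[None, :, :] - V[:, None, :]) ** 2).sum(axis=2) + eps  # (c, n)
    inv = d2 ** (-1.0 / (m - 1.0))
    return inv / inv.sum(axis=0, keepdims=True)

def fuzzy_k_means(X, c, m=2.0, n_iter=200, tol=1e-6, seed=0):
    """Alternate the two updates from a random membership matrix."""
    rng = np.random.default_rng(seed)
    U = rng.random((c, X.shape[0]))
    U /= U.sum(axis=0, keepdims=True)  # memberships of each unit sum to 1
    for _ in range(n_iter):
        V = update_centroids(X, U, m)
        U_new = update_memberships(X, V, m)
        converged = np.abs(U_new - U).max() < tol
        U = U_new
        if converged:
            break
    return U, update_centroids(X, U, m)

# Hypothetical toy data: two well-separated groups of three units.
X = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0],
              [10.0, 10.0], [10.0, 11.0], [11.0, 10.0]])
U, V = fuzzy_k_means(X, c=2, m=2.0)
print(np.round(U, 3))
print(np.round(V, 2))
```

Each column of U holds the degrees of belonging of one unit to every cluster; in the procedure described in the slides, the row of the suspicious cluster would supply the values of $\mu_{aj}$.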
37 Supplementary Slides. INVALSI macro-areas (SNV 2004/ /06). North West = Valle d'Aosta, Piemonte, Liguria, Lombardia. North East = Trentino Alto Adige, Veneto, Friuli V.G., Emilia R. Centre = Toscana, Umbria, Marche, Lazio. South = Abruzzo, Molise, Campania, Puglia. South and Islands = Basilicata, Calabria, Sicilia, Sardegna.
More information3.2 Measures of Spread
3.2 Measures of Spread In some data sets the observations are close together, while in others they are more spread out. In addition to measures of the center, it's often important to measure the spread
More informationHow to Get More Value from Your Survey Data
Technical report How to Get More Value from Your Survey Data Discover four advanced analysis techniques that make survey research more effective Table of contents Introduction..............................................................2
More informationData Mining: Exploring Data. Lecture Notes for Chapter 3. Introduction to Data Mining
Data Mining: Exploring Data Lecture Notes for Chapter 3 Introduction to Data Mining by Tan, Steinbach, Kumar Tan,Steinbach, Kumar Introduction to Data Mining 8/05/2005 1 What is data exploration? A preliminary
More informationData Mining for Model Creation. Presentation by Paul Below, EDS 2500 NE Plunkett Lane Poulsbo, WA USA 98370 paul.below@eds.
Sept 03-23-05 22 2005 Data Mining for Model Creation Presentation by Paul Below, EDS 2500 NE Plunkett Lane Poulsbo, WA USA 98370 paul.below@eds.com page 1 Agenda Data Mining and Estimating Model Creation
More informationKEANSBURG SCHOOL DISTRICT KEANSBURG HIGH SCHOOL Mathematics Department. HSPA 10 Curriculum. September 2007
KEANSBURG HIGH SCHOOL Mathematics Department HSPA 10 Curriculum September 2007 Written by: Karen Egan Mathematics Supervisor: Ann Gagliardi 7 days Sample and Display Data (Chapter 1 pp. 4-47) Surveys and
More informationEngineering Problem Solving and Excel. EGN 1006 Introduction to Engineering
Engineering Problem Solving and Excel EGN 1006 Introduction to Engineering Mathematical Solution Procedures Commonly Used in Engineering Analysis Data Analysis Techniques (Statistics) Curve Fitting techniques
More informationModeling Lifetime Value in the Insurance Industry
Modeling Lifetime Value in the Insurance Industry C. Olivia Parr Rud, Executive Vice President, Data Square, LLC ABSTRACT Acquisition modeling for direct mail insurance has the unique challenge of targeting
More informationPerformance Metrics for Graph Mining Tasks
Performance Metrics for Graph Mining Tasks 1 Outline Introduction to Performance Metrics Supervised Learning Performance Metrics Unsupervised Learning Performance Metrics Optimizing Metrics Statistical
More information2. Simple Linear Regression
Research methods - II 3 2. Simple Linear Regression Simple linear regression is a technique in parametric statistics that is commonly used for analyzing mean response of a variable Y which changes according
More informationThree had equal scores for the Progressive and Humanistic. schools. The other four had the following combination of
Three had equal scores for the Progressive and Humanistic schools. The other four had the following combination of schools: Behaviorist-Humanistic, Liberal Education- Progressive, Behaviorist-Progressive,
More informationKNIME TUTORIAL. Anna Monreale KDD-Lab, University of Pisa Email: annam@di.unipi.it
KNIME TUTORIAL Anna Monreale KDD-Lab, University of Pisa Email: annam@di.unipi.it Outline Introduction on KNIME KNIME components Exercise: Market Basket Analysis Exercise: Customer Segmentation Exercise:
More informationAlignment and Preprocessing for Data Analysis
Alignment and Preprocessing for Data Analysis Preprocessing tools for chromatography Basics of alignment GC FID (D) data and issues PCA F Ratios GC MS (D) data and issues PCA F Ratios PARAFAC Piecewise
More informationSPSS Tutorial. AEB 37 / AE 802 Marketing Research Methods Week 7
SPSS Tutorial AEB 37 / AE 802 Marketing Research Methods Week 7 Cluster analysis Lecture / Tutorial outline Cluster analysis Example of cluster analysis Work on the assignment Cluster Analysis It is a
More informationHow To Write A Data Analysis
Mathematics Probability and Statistics Curriculum Guide Revised 2010 This page is intentionally left blank. Introduction The Mathematics Curriculum Guide serves as a guide for teachers when planning instruction
More informationAssessment of the National Water Quality Monitoring Program of Egypt
Assessment of the National Water Quality Monitoring Program of Egypt Rasha M.S. El Kholy 1, Bahaa M. Khalil & Shaden T. Abdel Gawad 3 1 Researcher, Assistant researcher, 3 Vice-chairperson, National Water
More informationSurvey Data Analysis. Qatar University. Dr. Kenneth M.Coleman (Ken.Coleman@marketstrategies.com) - University of Michigan
The following slides are the property of their authors and are provided on this website as a public service. Please do not copy or redistribute these slides without the written permission of all of the
More informationA fast, powerful data mining workbench designed for small to midsize organizations
FACT SHEET SAS Desktop Data Mining for Midsize Business A fast, powerful data mining workbench designed for small to midsize organizations What does SAS Desktop Data Mining for Midsize Business do? Business
More informationSubspace Analysis and Optimization for AAM Based Face Alignment
Subspace Analysis and Optimization for AAM Based Face Alignment Ming Zhao Chun Chen College of Computer Science Zhejiang University Hangzhou, 310027, P.R.China zhaoming1999@zju.edu.cn Stan Z. Li Microsoft
More informationHow To Identify A Churner
2012 45th Hawaii International Conference on System Sciences A New Ensemble Model for Efficient Churn Prediction in Mobile Telecommunication Namhyoung Kim, Jaewook Lee Department of Industrial and Management
More informationData Mining Cluster Analysis: Basic Concepts and Algorithms. Lecture Notes for Chapter 8. Introduction to Data Mining
Data Mining Cluster Analysis: Basic Concepts and Algorithms Lecture Notes for Chapter 8 Introduction to Data Mining by Tan, Steinbach, Kumar Tan,Steinbach, Kumar Introduction to Data Mining 4/8/2004 Hierarchical
More informationStandardization and Its Effects on K-Means Clustering Algorithm
Research Journal of Applied Sciences, Engineering and Technology 6(7): 399-3303, 03 ISSN: 040-7459; e-issn: 040-7467 Maxwell Scientific Organization, 03 Submitted: January 3, 03 Accepted: February 5, 03
More information