Using Data Mining Techniques for Analyzing Pottery Databases
|
|
|
- Patrick Simon
- 10 years ago
- Views:
Transcription
1 BAR-ILAN UNIVERSITY Using Data Mining Techniques for Analyzing Pottery Databases Zachi Zweig Submitted in partial fulfillment of the requirements for the Master s degree in the Martin (Szusz) Department of Land of Israel Studies & Archaeology, Bar-Ilan University. Ramat-Gan, Israel 2006
2 This thesis was written under the supervision of Prof. Aren Maeir ii
3 Abstract Archaeological research has always been data driven. Now-a-days, due to the rapid development of computing technology, the use of archaeological databases is starting to take a major role in archaeological studies. Although the amount of archaeological data has increased significantly, efficient methods for analyzing this data are still absent. The Data Mining process may resolve this problem by helping us transform the data into significant information. This thesis deals with testing the possibility of implementing Data Mining techniques on archaeological databases, while focusing on pottery data. Data Mining is the science of extracting useful information from large data sets. It was necessary to develop various unique methods for creating and preparing the data for this kind of process and analysis. Wide data tables with many variables were created. The methods by which this was done, and their implementation, added another aspect to this research. The typological issue was also dealt with from various different angles, while trying to find methods whereby real types would be revealed and identified. The purpose behind the various methods used, was not to accept nor to reject any existing hypotheses, but rather to help find certain patterns and relationships within the data, which could prove worthy of further investigation as to their archaeological meaning. The research was done on two case studies. The primary one was based on 280 whole vessels that were found at the site of Tell es-safi (the biblical Philistine town of Gath). In this case study, different methods for the detailed recording of the pottery were also examined. This data had sampling problems because of its relatively small size, and lack of spatial and chronological variability. The second case study focused on processing and analyzing data that had already been published. Here a large data table of about 8000 pottery shards that were found at the site of Tel Batash (the biblical town of Timnah), was used. In the first chapter the different Data Mining processes as they are used in the SAS Enterprise Miner 5.1 software, are described. This software makes use of the SEMMA (Sampling, Exploring, Modifying, Modeling and Assessing) process, and uses a process flow diagram for setting the flow of the various nodes. One of the primary nodes is the Metadata tool, which gives information about the variables of the data. Each variable is iii
4 defined by its Role (Input, Target, ID, etc.) and Level (Interval, Nominal, Ordinal and Binary). Another important node that is commonly used in this research is the Decision Trees tool. This method creates an empirical tree that represents segmentation of the data, by applying a series of simple rules. Each rule assigns an observation to a segment based on the value of one input. One rule is applied after another, resulting in an hierarchy of segments within segments. These rules can be verbally described, unlike the other kinds of models which use a black box approach. In addition, other nodes were used, such as: Impute for filling in missing values; SAS Code for incorporating code into the process flow and thereby extending the functionality of the process; Transform for creating new variables from existing ones; Cluster Analysis for clustering similar vessels and discovering types. Furthermore, the Data Mining Neural Networks node, which is based on the Neural Networks theory, was implemented to produce fuzzy variables, by creating prediction models for selected attributes. In addition to the Data Mining tools, a few simple Exploratory Data Analysis methods that were used for analyzing and displaying the Data Mining results, are also presented. These methods included: Contingency Tables (Chi-Square tests), Box-n-Whisker plots and Histograms. The second chapter covers the topic of the use of computerized databases in archaeology. This issue became one of the major research challenges with which I had to deal. In this chapter the accelerated development of the use of databases in archaeology in the last two decades is surveyed. Nevertheless, there is still an absence of formal protocols for recording archaeological data. The problems of objective recording, and that of using categorical-typological attributes, are introduced. Two questions arise: How detailed should this recording be? and: Which are the important traits that should be recorded? Emphasis was placed on the substantial potential of the Relational Database which enables enrichment of the data with contextual information. Furthermore, some new methods for recording the morphological characteristics of a vessel are introduced. These methods provide objective ways to record these kinds of attributes. The Karasik and Smilansky method is also included because of its use in the current research. With this method the shape of the vessel profile is represented as a mathematical curve iv
5 function. Using this function, vessel shapes can easily be compared, and various morphological attributes can be extracted. In the third chapter the different processes that were used for creating the data table of the Tell es-safi vessels are described. To start with, all the drawings of the vessels were scanned, and the images were processed in a way that enabled importation into MATLAB software. The Karasik and Smilansky programs were then run on this data. Following that, using Microsoft ACCESS, a special form was created for entering all the data relating to measurements that were directly and manually taken from the vessels. In the first phase the vessel drawing was incorporated within the form, and attributes that were seen in the drawing were recorded. Some of these attributes were defined typologically, such as the form of the rim or of the base. However, other attributes were easier to measure, such as rim direction or number of handles. A secondary form was used to record decoration attributes. In the next phase, attributes measured from the actual vessels were recorded. Such attributes were: hardness, color, slip preservation, erosion level, etc. In addition, some basic morphological attributes were measured, since some of the whole vessels had not been drawn. The next phase consisted of recording traits measured by macroscopic analysis of the clay fabric. A small shard sample was taken from most of the vessels for this analysis. By examining a freshly broken section with a magnifying glass, various attributes were measured, such as: types of inclusions, their frequency, size, shape, color and luster, or voids types, and their frequency, shape and size. Additional attributes were measured by other means, such as, differences in color between core and margins, core hardness, reaction to hydrochloric acid, reaction to a magnet, specific weight, etc. In these forms, there were instances where special text fields had to be created. These fields were necessary for describing complicated attributes that could not be described using lists of categories for nominal variables, such as: scraping marks, inclusions, changes in oxidization levels along the section, etc. Consequently a unique language code for describing each trait was created. These values were later extracted into simple variables using special programs which had to be specially written for this purpose.. The final phase of the manual recording included contextual information. The temporary stratum number for each locus that was used in the new vessels table had to be v
6 entered. For better spatial analysis, different architectural units in the architectural plan had to be defined and assigned to the loci in use in the table. The next stage involved computing many morphological attributes. These computations made use of the Karasik and Smilansky method. Using the MATLAB software, various programs were written for the purpose of automatically defining the borders of different parts, such as: lip, rim, shoulder and base. In addition, the profile was automatically subdivided into elemental sections using different methods. Two of them were based on curve peaks, one was based on inflection points, and another on curvature changes in relation to the horizontal axis. Many attributes were calculated for each section, such as: average curvature, average thickness, curvature skew, curvature kurtosis, section relative length, etc. This resulted in the creation of a total of 946 morphological variables. The final stage of preparing the data was done by programs written with the SAS Base language. The different data tables were merged into one table. The contextual data was also merged into this table, and in addition to the regular locus card fields, it included information on various artifacts that were found in the vessel s locus. This resulted in a data table which included a wealth of contextual information. For example, for each vessel there was information about the feature in which it was found, and other artifacts that were found near it, such as loom weights, bones, burnt seeds etc. In order to extend the functionality of analyzing categorical-typological attributes, the Decoding Table method was developed for breaking down these categorical definitions into shared elements. By doing this, categories with low frequencies could have a greater contribution to the analysis, and common traits of various categories could become meaningful. In addition, special programs were written to decipher the text variables of composite attributes. Many binary variables were created by these programs. Variables with low frequencies were dropped. The color attribute variables were converted to alternative color coding methods. Furthermore, various variables were created by calculating interactions amongst existing variables. These processes resulted in a very wide data table of 1920 variables. This table was imported into Enterprise Miner and was further processed. Using this software it is easy to set and modify the various processes using a flow diagram. It also supplies automatic vi
7 tools for common Data Mining processes. First the metadata was defined, to enable further control for processing and analyzing the data. Using the SAS Code node various programs were written for the purpose of adding more computed variables. Special formulas for defining the presumed function of the vessel were developed using this node, and special programs were also written for dropping variables that tend to have a unimodal distribution. In other nodes missing values were imputed, rare categorical values were grouped, and variables with unsuitable values were rejected. A group of Data Mining Neural Networks nodes were set to create prediction models for the various presumed functions and the temporary stratum number in which they were found. This actually created a set of fuzzy variables that defined the likeness of the vessel to the predicted trait. At the end of this process 1431 variables were left for use in the analysis. The fourth chapter deals with the ways data mining techniques could contribute to the various typological aspects. It begins with an introductory summary of the history of the use and research of typology in archaeology. It is suggested that the goal of the typological process is to compress a large number of items into representational categories in such a way that, on one hand, the loss of information will be minimal, and on the other hand, the level of detail will remain within the resources available for the publication. The partial success in creating an automatic typology of the Tell es-safi bowls is summarized. The conclusion is that with existing software programs and methods, it will not be possible to produce a satisfying typology that will meet the suggested typology goals, and, as such, new algorithms need to be developed for this task. The tests that were done were based on using Cluster Analysis methods, which have a major disadvantage for this kind of task, in that they assign the same weight to all the variables in use. Nevertheless some real types can be discovered using these methods. The methods used included creating different sets of types for different sets of variables (morphological, finishing, fabric and contextual), and then using Cluster Analysis to produce composite types using the four sets of typologies. An example of a successful real type that was discovered using this method is introduced. The importance of using vii
8 different types of attributes (not only morphological) in order to achieve an effective typology, is emphasized. The results of the automatic morphological typology are compared with the typology constructed by Shai, which was done using an intuitive approach. This research found that some attributes, such as, strength of carination, direction of curvature above carination, and curvature of base, showed greater significance within the automatic typology using Cluster Analysis, than in the intuitive typology. In addition, using Decision Trees, further typical characteristics for the intuitively defined types were searched for, and in most attempts, were successfully found. These successes may indicate that these are real types. Also found was that specific weight could be a good typological predictor. The fifth chapter reviews some tests that were conducted according to pre-defined questions. These tests were implemented mainly by predicting certain attributes with Decision Trees. The results show the substantial potential inherent in using a table with many variables. With certain tests there was some difficulty reaching reasonable results, because of the sampling problems of the Tell es-safi vessels. Using Decision Trees an attempt was made to characterize different classes of vessels, or traits, such as, vertical burnishing, reaction to hydrochloric acid, and reaction to a magnet. In addition, characterizations of the Late Bronze Age vessels from Area E were successfully done, resulting in a specific bowl type, with specific wall and rim angles, being revealed. The effectiveness of the fuzzy variables was also examined, by studying a spatial distribution map of vessels according to their likeness relation to serving vessels, and an interesting storage enclosure was found. The sixth chapter reviews experiments that made use of Association Discovery Analysis, for the purpose of discovering different traits that tend to appear together. In order to do this there was a need to reformat the vessel table to suit this kind of analysis, which is not intended for finding associations that are meaningful. This program is designed mainly for market analysis, where the purpose is to find associations that may increase profit. Because of this, it was necessary to develop different ways of preparing the data so that known in advance results would not show up. One of the solutions used viii
9 for solving this problem was to group the variables into classes of manufacturing phases, and to prevent the examination of association of traits of the same class. The main weakness of the Association Discovery Analysis is in the handling of interval variables. This problem was partly solved by binning the values, and changing the variables levels to ordinal. Eventually there were many results, each one of which had to be carefully examined as to its true meaning. The seventh chapter discusses the second case study with the data from Tel Batash. The purpose of this study was to examine methods for implementing Data Mining techniques on published data with a small number of variables. The main table contained 7809 observations of pottery rims that were found in Tel Batash. This table had a relatively small number of variables (about 31). For this reason the table was extended with contextual data and new variables which, with the help of Decoding Tables, were derived from the typological values. Each shard in this table was identified with a certain vessel type. It was therefore necessary to create a Decoding Table for vessel types, on the basis of the type description that appears in the published report. This process began by noting all the common typical attributes that are mentioned in this report, thus enabling formalization of a table with variables and values that matched the verbal descriptions. Furthermore, the basic morphological attributes from the representational drawings of each type, were measured. Another Decoding Table for fabric groups was created according to their verbal description. In addition, as was done with the Tell es-safi data, Decoding Tables were created for certain typological-categorical values, such as burnish type or rim type. In the next phase, the table was enriched with contextual data. Each locus had variables that indicated the existence of various finds within it. Variables for frequent words that appeared in the stratigraphic description were also created. In addition, using the feature definition, more calculated variables were created to indicate on which SFP (Site Formation Process) phase the locus is associated (building, occupation, destruction or abandonment). On the basis of the main shards table, new contextual variables that indicate which types of vessels were found in each locus or architectural unit, were created. ix
10 The results of the analysis of the Tel Batash data were much more meaningful than those obtained from the analysis of the whole vessels from Tell es-safi. The reason for this is the greater chronological and spatial variability of the Tel Batash data. The main key to success in these kinds of analyses is in choosing the right sample. When contextual issues were examined, there was a need to use a sample of one vessel from each locus. When associations among different attributes were examined, there was a need to use a sample of vessel types. When fabric attributes were examined there was a need to use a sample of vessel types that had non-missing fabric data. Decision Trees were used to characterize traits, such as, burnish type; or contextual values, such as, loci with bones. Chronological characterization of the vessels was done according to the stratum in which they were found. First the whole dataset was used, next, a sample of vessel types, and finally, a sample of types that also had fabric data. Surprisingly, the analyses of the sample of vessel types produced very interesting results, in spite of its small size. Many of these results did not appear at all using the whole dataset analysis. This kind of data reflects the potter s function and production style trends, rather than distribution of use. An example of an interesting result from the analysis of this kind of sample, was the tendency of the absolute direction of the rim to turn outwards during the later stages of Iron Age II. Association Discovery was also used with the Tel Batash data to discover traits that tend to appear together in the same vessel. The best results came from using the vessel types sample, because this kind of analysis, using the whole dataset, would result in artificial associations caused by the breaking down of types using the Decoding Tables. Another implementation of this analysis was done by examining vessel types that tend to appear in the same locus. It is interesting to note that the results of this kind of analysis demonstrated new aspects of the distribution of the examined types, aspects which were not mentioned in the published report. To summarize the work, according to the new archaeological results that came out of the different analyses that were done in this research (summarized in index 9), it appears that there is a real potential in using the approach utilized in the present study. In addition, the use of data tables with many variables has also turned out to be worthwhile, but only on the condition that the right analysis methods are used, such as those that were x
11 used in this research. It seems that in the near future, when inter-site databases will be created, the processing and analyzing of these databases, using Data Mining techniques, will be unavoidable. For this reason, a standard database structure should be decided upon and planned ahead to fit these needs. xi
In this presentation, you will be introduced to data mining and the relationship with meaningful use.
In this presentation, you will be introduced to data mining and the relationship with meaningful use. Data mining refers to the art and science of intelligent data analysis. It is the application of machine
Improving the Performance of Data Mining Models with Data Preparation Using SAS Enterprise Miner Ricardo Galante, SAS Institute Brasil, São Paulo, SP
Improving the Performance of Data Mining Models with Data Preparation Using SAS Enterprise Miner Ricardo Galante, SAS Institute Brasil, São Paulo, SP ABSTRACT In data mining modelling, data preparation
Diagrams and Graphs of Statistical Data
Diagrams and Graphs of Statistical Data One of the most effective and interesting alternative way in which a statistical data may be presented is through diagrams and graphs. There are several ways in
Index Contents Page No. Introduction . Data Mining & Knowledge Discovery
Index Contents Page No. 1. Introduction 1 1.1 Related Research 2 1.2 Objective of Research Work 3 1.3 Why Data Mining is Important 3 1.4 Research Methodology 4 1.5 Research Hypothesis 4 1.6 Scope 5 2.
Practical Applications of DATA MINING. Sang C Suh Texas A&M University Commerce JONES & BARTLETT LEARNING
Practical Applications of DATA MINING Sang C Suh Texas A&M University Commerce r 3 JONES & BARTLETT LEARNING Contents Preface xi Foreword by Murat M.Tanik xvii Foreword by John Kocur xix Chapter 1 Introduction
CONTENTS PREFACE 1 INTRODUCTION 1 2 DATA VISUALIZATION 19
PREFACE xi 1 INTRODUCTION 1 1.1 Overview 1 1.2 Definition 1 1.3 Preparation 2 1.3.1 Overview 2 1.3.2 Accessing Tabular Data 3 1.3.3 Accessing Unstructured Data 3 1.3.4 Understanding the Variables and Observations
DATA MINING TECHNOLOGY. Keywords: data mining, data warehouse, knowledge discovery, OLAP, OLAM.
DATA MINING TECHNOLOGY Georgiana Marin 1 Abstract In terms of data processing, classical statistical models are restrictive; it requires hypotheses, the knowledge and experience of specialists, equations,
Customer and Business Analytic
Customer and Business Analytic Applied Data Mining for Business Decision Making Using R Daniel S. Putler Robert E. Krider CRC Press Taylor &. Francis Group Boca Raton London New York CRC Press is an imprint
A New Approach for Evaluation of Data Mining Techniques
181 A New Approach for Evaluation of Data Mining s Moawia Elfaki Yahia 1, Murtada El-mukashfi El-taher 2 1 College of Computer Science and IT King Faisal University Saudi Arabia, Alhasa 31982 2 Faculty
What is Data Mining? MS4424 Data Mining & Modelling. MS4424 Data Mining & Modelling. MS4424 Data Mining & Modelling. MS4424 Data Mining & Modelling
MS4424 Data Mining & Modelling MS4424 Data Mining & Modelling Lecturer : Dr Iris Yeung Room No : P7509 Tel No : 2788 8566 Email : [email protected] 1 Aims To introduce the basic concepts of data mining
Clustering & Visualization
Chapter 5 Clustering & Visualization Clustering in high-dimensional databases is an important problem and there are a number of different clustering paradigms which are applicable to high-dimensional data.
Course Syllabus. Purposes of Course:
Course Syllabus Eco 5385.701 Predictive Analytics for Economists Summer 2014 TTh 6:00 8:50 pm and Sat. 12:00 2:50 pm First Day of Class: Tuesday, June 3 Last Day of Class: Tuesday, July 1 251 Maguire Building
CHAPTER 1 INTRODUCTION
1 CHAPTER 1 INTRODUCTION Exploration is a process of discovery. In the database exploration process, an analyst executes a sequence of transformations over a collection of data structures to discover useful
International Journal of Computer Science Trends and Technology (IJCST) Volume 2 Issue 3, May-Jun 2014
RESEARCH ARTICLE OPEN ACCESS A Survey of Data Mining: Concepts with Applications and its Future Scope Dr. Zubair Khan 1, Ashish Kumar 2, Sunny Kumar 3 M.Tech Research Scholar 2. Department of Computer
Text Analytics using High Performance SAS Text Miner
Text Analytics using High Performance SAS Text Miner Edward R. Jones, Ph.D. Exec. Vice Pres.; Texas A&M Statistical Services Abstract: The latest release of SAS Enterprise Miner, version 13.1, contains
Lecture 2: Descriptive Statistics and Exploratory Data Analysis
Lecture 2: Descriptive Statistics and Exploratory Data Analysis Further Thoughts on Experimental Design 16 Individuals (8 each from two populations) with replicates Pop 1 Pop 2 Randomly sample 4 individuals
Data Exploration Data Visualization
Data Exploration Data Visualization What is data exploration? A preliminary exploration of the data to better understand its characteristics. Key motivations of data exploration include Helping to select
Schneps, Leila; Colmez, Coralie. Math on Trial : How Numbers Get Used and Abused in the Courtroom. New York, NY, USA: Basic Books, 2013. p i.
New York, NY, USA: Basic Books, 2013. p i. http://site.ebrary.com/lib/mcgill/doc?id=10665296&ppg=2 New York, NY, USA: Basic Books, 2013. p ii. http://site.ebrary.com/lib/mcgill/doc?id=10665296&ppg=3 New
Data Mining with SAS. Mathias Lanner [email protected]. Copyright 2010 SAS Institute Inc. All rights reserved.
Data Mining with SAS Mathias Lanner [email protected] Copyright 2010 SAS Institute Inc. All rights reserved. Agenda Data mining Introduction Data mining applications Data mining techniques SEMMA
IBM SPSS Direct Marketing 23
IBM SPSS Direct Marketing 23 Note Before using this information and the product it supports, read the information in Notices on page 25. Product Information This edition applies to version 23, release
IBM SPSS Direct Marketing 22
IBM SPSS Direct Marketing 22 Note Before using this information and the product it supports, read the information in Notices on page 25. Product Information This edition applies to version 22, release
STATISTICA. Clustering Techniques. Case Study: Defining Clusters of Shopping Center Patrons. and
Clustering Techniques and STATISTICA Case Study: Defining Clusters of Shopping Center Patrons STATISTICA Solutions for Business Intelligence, Data Mining, Quality Control, and Web-based Analytics Table
BOR 6335 Data Mining. Course Description. Course Bibliography and Required Readings. Prerequisites
BOR 6335 Data Mining Course Description This course provides an overview of data mining and fundamentals of using RapidMiner and OpenOffice open access software packages to develop data mining models.
2015 Workshops for Professors
SAS Education Grow with us Offered by the SAS Global Academic Program Supporting teaching, learning and research in higher education 2015 Workshops for Professors 1 Workshops for Professors As the market
Nagarjuna College Of
Nagarjuna College Of Information Technology (Bachelor in Information Management) TRIBHUVAN UNIVERSITY Project Report on World s successful data mining and data warehousing projects Submitted By: Submitted
GEOENGINE MSc in Geomatics Engineering (Master Thesis) Anamelechi, Falasy Ebere
Master s Thesis: ANAMELECHI, FALASY EBERE Analysis of a Raster DEM Creation for a Farm Management Information System based on GNSS and Total Station Coordinates Duration of the Thesis: 6 Months Completion
SAS Rule-Based Codebook Generation for Exploratory Data Analysis Ross Bettinger, Senior Analytical Consultant, Seattle, WA
SAS Rule-Based Codebook Generation for Exploratory Data Analysis Ross Bettinger, Senior Analytical Consultant, Seattle, WA ABSTRACT A codebook is a summary of a collection of data that reports significant
How To Write A Data Analysis
Mathematics Probability and Statistics Curriculum Guide Revised 2010 This page is intentionally left blank. Introduction The Mathematics Curriculum Guide serves as a guide for teachers when planning instruction
Applying Data Mining Technique to Sales Forecast
Applying Data Mining Technique to Sales Forecast 1 Erkin Guler, 2 Taner Ersoz and 1 Filiz Ersoz 1 Karabuk University, Department of Industrial Engineering, Karabuk, Turkey [email protected], [email protected]
IBM SPSS Direct Marketing 19
IBM SPSS Direct Marketing 19 Note: Before using this information and the product it supports, read the general information under Notices on p. 105. This document contains proprietary information of SPSS
DYNAMIC FUZZY PATTERN RECOGNITION WITH APPLICATIONS TO FINANCE AND ENGINEERING LARISA ANGSTENBERGER
DYNAMIC FUZZY PATTERN RECOGNITION WITH APPLICATIONS TO FINANCE AND ENGINEERING LARISA ANGSTENBERGER Kluwer Academic Publishers Boston/Dordrecht/London TABLE OF CONTENTS FOREWORD ACKNOWLEDGEMENTS XIX XXI
Descriptive Statistics and Measurement Scales
Descriptive Statistics 1 Descriptive Statistics and Measurement Scales Descriptive statistics are used to describe the basic features of the data in a study. They provide simple summaries about the sample
IBM SPSS Direct Marketing 20
IBM SPSS Direct Marketing 20 Note: Before using this information and the product it supports, read the general information under Notices on p. 105. This edition applies to IBM SPSS Statistics 20 and to
Demographics of Atlanta, Georgia:
Demographics of Atlanta, Georgia: A Visual Analysis of the 2000 and 2010 Census Data 36-315 Final Project Rachel Cohen, Kathryn McKeough, Minnar Xie & David Zimmerman Ethnicities of Atlanta Figure 1: From
Using Data Mining for Mobile Communication Clustering and Characterization
Using Data Mining for Mobile Communication Clustering and Characterization A. Bascacov *, C. Cernazanu ** and M. Marcu ** * Lasting Software, Timisoara, Romania ** Politehnica University of Timisoara/Computer
DESCRIPTIVE STATISTICS. The purpose of statistics is to condense raw data to make it easier to answer specific questions; test hypotheses.
DESCRIPTIVE STATISTICS The purpose of statistics is to condense raw data to make it easier to answer specific questions; test hypotheses. DESCRIPTIVE VS. INFERENTIAL STATISTICS Descriptive To organize,
Testing Research and Statistical Hypotheses
Testing Research and Statistical Hypotheses Introduction In the last lab we analyzed metric artifact attributes such as thickness or width/thickness ratio. Those were continuous variables, which as you
The Science and Art of Market Segmentation Using PROC FASTCLUS Mark E. Thompson, Forefront Economics Inc, Beaverton, Oregon
The Science and Art of Market Segmentation Using PROC FASTCLUS Mark E. Thompson, Forefront Economics Inc, Beaverton, Oregon ABSTRACT Effective business development strategies often begin with market segmentation,
IMPROVING DATA INTEGRATION FOR DATA WAREHOUSE: A DATA MINING APPROACH
IMPROVING DATA INTEGRATION FOR DATA WAREHOUSE: A DATA MINING APPROACH Kalinka Mihaylova Kaloyanova St. Kliment Ohridski University of Sofia, Faculty of Mathematics and Informatics Sofia 1164, Bulgaria
DATA MINING WITH DIFFERENT TYPES OF X-RAY DATA
315 DATA MINING WITH DIFFERENT TYPES OF X-RAY DATA C. K. Lowe-Ma, A. E. Chen, D. Scholl Physical & Environmental Sciences, Research and Advanced Engineering Ford Motor Company, Dearborn, Michigan, USA
Title. Introduction to Data Mining. Dr Arulsivanathan Naidoo Statistics South Africa. OECD Conference Cape Town 8-10 December 2010.
Title Introduction to Data Mining Dr Arulsivanathan Naidoo Statistics South Africa OECD Conference Cape Town 8-10 December 2010 1 Outline Introduction Statistics vs Knowledge Discovery Predictive Modeling
Silvermine House Steenberg Office Park, Tokai 7945 Cape Town, South Africa Telephone: +27 21 702 4666 www.spss-sa.com
SPSS-SA Silvermine House Steenberg Office Park, Tokai 7945 Cape Town, South Africa Telephone: +27 21 702 4666 www.spss-sa.com SPSS-SA Training Brochure 2009 TABLE OF CONTENTS 1 SPSS TRAINING COURSES FOCUSING
Fairfield Public Schools
Mathematics Fairfield Public Schools AP Statistics AP Statistics BOE Approved 04/08/2014 1 AP STATISTICS Critical Areas of Focus AP Statistics is a rigorous course that offers advanced students an opportunity
Data Mining Solutions for the Business Environment
Database Systems Journal vol. IV, no. 4/2013 21 Data Mining Solutions for the Business Environment Ruxandra PETRE University of Economic Studies, Bucharest, Romania [email protected] Over
Data Mart/Warehouse: Progress and Vision
Data Mart/Warehouse: Progress and Vision Institutional Research and Planning University Information Systems What is data warehousing? A data warehouse: is a single place that contains complete, accurate
II. DISTRIBUTIONS distribution normal distribution. standard scores
Appendix D Basic Measurement And Statistics The following information was developed by Steven Rothke, PhD, Department of Psychology, Rehabilitation Institute of Chicago (RIC) and expanded by Mary F. Schmidt,
Predicting Students Final GPA Using Decision Trees: A Case Study
Predicting Students Final GPA Using Decision Trees: A Case Study Mashael A. Al-Barrak and Muna Al-Razgan Abstract Educational data mining is the process of applying data mining tools and techniques to
Statistics. Measurement. Scales of Measurement 7/18/2012
Statistics Measurement Measurement is defined as a set of rules for assigning numbers to represent objects, traits, attributes, or behaviors A variableis something that varies (eye color), a constant does
Lotto Master Formula (v1.3) The Formula Used By Lottery Winners
Lotto Master Formula (v.) The Formula Used By Lottery Winners I. Introduction This book is designed to provide you with all of the knowledge that you will need to be a consistent winner in your local lottery
A Review of Anomaly Detection Techniques in Network Intrusion Detection System
A Review of Anomaly Detection Techniques in Network Intrusion Detection System Dr.D.V.S.S.Subrahmanyam Professor, Dept. of CSE, Sreyas Institute of Engineering & Technology, Hyderabad, India ABSTRACT:In
Modeling Guidelines Manual
Modeling Guidelines Manual [Insert company name here] July 2014 Author: John Doe [email protected] Page 1 of 22 Table of Contents 1. Introduction... 3 2. Business Process Management (BPM)... 4 2.1.
Marketing Research Core Body Knowledge (MRCBOK ) Learning Objectives
Fulfilling the core market research educational needs of individuals and companies worldwide Presented through a unique partnership between How to Contact Us: Phone: +1-706-542-3537 or 1-800-811-6640 (USA
Skewness and Kurtosis in Function of Selection of Network Traffic Distribution
Acta Polytechnica Hungarica Vol. 7, No., Skewness and Kurtosis in Function of Selection of Network Traffic Distribution Petar Čisar Telekom Srbija, Subotica, Serbia, [email protected] Sanja Maravić Čisar
A Demonstration of Hierarchical Clustering
Recitation Supplement: Hierarchical Clustering and Principal Component Analysis in SAS November 18, 2002 The Methods In addition to K-means clustering, SAS provides several other types of unsupervised
Measurement & Data Analysis. On the importance of math & measurement. Steps Involved in Doing Scientific Research. Measurement
Measurement & Data Analysis Overview of Measurement. Variability & Measurement Error.. Descriptive vs. Inferential Statistics. Descriptive Statistics. Distributions. Standardized Scores. Graphing Data.
The right edge of the box is the third quartile, Q 3, which is the median of the data values above the median. Maximum Median
CONDENSED LESSON 2.1 Box Plots In this lesson you will create and interpret box plots for sets of data use the interquartile range (IQR) to identify potential outliers and graph them on a modified box
Master s Program in Information Systems
The University of Jordan King Abdullah II School for Information Technology Department of Information Systems Master s Program in Information Systems 2006/2007 Study Plan Master Degree in Information Systems
Healthcare Measurement Analysis Using Data mining Techniques
www.ijecs.in International Journal Of Engineering And Computer Science ISSN:2319-7242 Volume 03 Issue 07 July, 2014 Page No. 7058-7064 Healthcare Measurement Analysis Using Data mining Techniques 1 Dr.A.Shaik
an introduction to VISUALIZING DATA by joel laumans
an introduction to VISUALIZING DATA by joel laumans an introduction to VISUALIZING DATA iii AN INTRODUCTION TO VISUALIZING DATA by Joel Laumans Table of Contents 1 Introduction 1 Definition Purpose 2 Data
Why Taking This Course? Course Introduction, Descriptive Statistics and Data Visualization. Learning Goals. GENOME 560, Spring 2012
Why Taking This Course? Course Introduction, Descriptive Statistics and Data Visualization GENOME 560, Spring 2012 Data are interesting because they help us understand the world Genomics: Massive Amounts
Reevaluating Policy and Claims Analytics: a Case of Non-Fleet Customers In Automobile Insurance Industry
Paper 1808-2014 Reevaluating Policy and Claims Analytics: a Case of Non-Fleet Customers In Automobile Insurance Industry Kittipong Trongsawad and Jongsawas Chongwatpol NIDA Business School, National Institute
Modelling, Extraction and Description of Intrinsic Cues of High Resolution Satellite Images: Independent Component Analysis based approaches
Modelling, Extraction and Description of Intrinsic Cues of High Resolution Satellite Images: Independent Component Analysis based approaches PhD Thesis by Payam Birjandi Director: Prof. Mihai Datcu Problematic
Visualizing e-government Portal and Its Performance in WEBVS
Visualizing e-government Portal and Its Performance in WEBVS Ho Si Meng, Simon Fong Department of Computer and Information Science University of Macau, Macau SAR [email protected] Abstract An e-government
AMS 5 CHANCE VARIABILITY
AMS 5 CHANCE VARIABILITY The Law of Averages When tossing a fair coin the chances of tails and heads are the same: 50% and 50%. So if the coin is tossed a large number of times, the number of heads and
Time series clustering and the analysis of film style
Time series clustering and the analysis of film style Nick Redfern Introduction Time series clustering provides a simple solution to the problem of searching a database containing time series data such
RUTHERFORD HIGH SCHOOL Rutherford, New Jersey COURSE OUTLINE STATISTICS AND PROBABILITY
RUTHERFORD HIGH SCHOOL Rutherford, New Jersey COURSE OUTLINE STATISTICS AND PROBABILITY I. INTRODUCTION According to the Common Core Standards (2010), Decisions or predictions are often based on data numbers
Statistics I for QBIC. Contents and Objectives. Chapters 1 7. Revised: August 2013
Statistics I for QBIC Text Book: Biostatistics, 10 th edition, by Daniel & Cross Contents and Objectives Chapters 1 7 Revised: August 2013 Chapter 1: Nature of Statistics (sections 1.1-1.6) Objectives
Categorical Data Visualization and Clustering Using Subjective Factors
Categorical Data Visualization and Clustering Using Subjective Factors Chia-Hui Chang and Zhi-Kai Ding Department of Computer Science and Information Engineering, National Central University, Chung-Li,
SECTION 2-1: OVERVIEW SECTION 2-2: FREQUENCY DISTRIBUTIONS
SECTION 2-1: OVERVIEW Chapter 2 Describing, Exploring and Comparing Data 19 In this chapter, we will use the capabilities of Excel to help us look more carefully at sets of data. We can do this by re-organizing
Visualization Techniques in Data Mining
Tecniche di Apprendimento Automatico per Applicazioni di Data Mining Visualization Techniques in Data Mining Prof. Pier Luca Lanzi Laurea in Ingegneria Informatica Politecnico di Milano Polo di Milano
Easily Identify the Right Customers
PASW Direct Marketing 18 Specifications Easily Identify the Right Customers You want your marketing programs to be as profitable as possible, and gaining insight into the information contained in your
A DATA ANALYSIS TOOL THAT ORGANIZES ANALYSIS BY VARIABLE TYPES. Rodney Carr Deakin University Australia
A DATA ANALYSIS TOOL THAT ORGANIZES ANALYSIS BY VARIABLE TYPES Rodney Carr Deakin University Australia XLStatistics is a set of Excel workbooks for analysis of data that has the various analysis tools
Visualization of missing values using the R-package VIM
Institut f. Statistik u. Wahrscheinlichkeitstheorie 040 Wien, Wiedner Hauptstr. 8-0/07 AUSTRIA http://www.statistik.tuwien.ac.at Visualization of missing values using the R-package VIM M. Templ and P.
LVQ Plug-In Algorithm for SQL Server
LVQ Plug-In Algorithm for SQL Server Licínia Pedro Monteiro Instituto Superior Técnico [email protected] I. Executive Summary In this Resume we describe a new functionality implemented
Customer Analytics. Turn Big Data into Big Value
Turn Big Data into Big Value All Your Data Integrated in Just One Place BIRT Analytics lets you capture the value of Big Data that speeds right by most enterprises. It analyzes massive volumes of data
Course Descriptions: Undergraduate/Graduate Certificate Program in Data Visualization and Analysis
9/3/2013 Course Descriptions: Undergraduate/Graduate Certificate Program in Data Visualization and Analysis Seton Hall University, South Orange, New Jersey http://www.shu.edu/go/dava Visualization and
Visual Structure Analysis of Flow Charts in Patent Images
Visual Structure Analysis of Flow Charts in Patent Images Roland Mörzinger, René Schuster, András Horti, and Georg Thallinger JOANNEUM RESEARCH Forschungsgesellschaft mbh DIGITAL - Institute for Information
Data Preprocessing. Week 2
Data Preprocessing Week 2 Topics Data Types Data Repositories Data Preprocessing Present homework assignment #1 Team Homework Assignment #2 Read pp. 227 240, pp. 250 250, and pp. 259 263 the text book.
LIST OF CONTENTS CHAPTER CONTENT PAGE DECLARATION DEDICATION ACKNOWLEDGEMENTS ABSTRACT ABSTRAK
vii LIST OF CONTENTS CHAPTER CONTENT PAGE DECLARATION DEDICATION ACKNOWLEDGEMENTS ABSTRACT ABSTRAK LIST OF CONTENTS LIST OF TABLES LIST OF FIGURES LIST OF NOTATIONS LIST OF ABBREVIATIONS LIST OF APPENDICES
What is Visualization? Information Visualization An Overview. Information Visualization. Definitions
What is Visualization? Information Visualization An Overview Jonathan I. Maletic, Ph.D. Computer Science Kent State University Visualize/Visualization: To form a mental image or vision of [some
Jacket crown. Advantage : Crown and Bridge
Crown and Bridge Lecture 1 Dr.Nibras AL-Kuraine Jacket crown It is a type of crown that is formed by a tooth colored material. It is mainly used as a single unit in the anterior quadrant of the mouth.
Data Warehousing and Data Mining in Business Applications
133 Data Warehousing and Data Mining in Business Applications Eesha Goel CSE Deptt. GZS-PTU Campus, Bathinda. Abstract Information technology is now required in all aspect of our lives that helps in business
DECISION TREE ANALYSIS: PREDICTION OF SERIOUS TRAFFIC OFFENDING
DECISION TREE ANALYSIS: PREDICTION OF SERIOUS TRAFFIC OFFENDING ABSTRACT The objective was to predict whether an offender would commit a traffic offence involving death, using decision tree analysis. Four
Data Mining Using SAS Enterprise Miner : A Case Study Approach, Second Edition
Data Mining Using SAS Enterprise Miner : A Case Study Approach, Second Edition The correct bibliographic citation for this manual is as follows: SAS Institute Inc. 2003. Data Mining Using SAS Enterprise
Effecting Data Quality Improvement through Data Virtualization
Effecting Data Quality Improvement through Data Virtualization Prepared for Composite Software by: David Loshin Knowledge Integrity, Inc. June, 2010 2010 Knowledge Integrity, Inc. Page 1 Introduction The
A fast, powerful data mining workbench designed for small to midsize organizations
FACT SHEET SAS Desktop Data Mining for Midsize Business A fast, powerful data mining workbench designed for small to midsize organizations What does SAS Desktop Data Mining for Midsize Business do? Business
INTRODUCING THE NORMAL DISTRIBUTION IN A DATA ANALYSIS COURSE: SPECIFIC MEANING CONTRIBUTED BY THE USE OF COMPUTERS
INTRODUCING THE NORMAL DISTRIBUTION IN A DATA ANALYSIS COURSE: SPECIFIC MEANING CONTRIBUTED BY THE USE OF COMPUTERS Liliana Tauber Universidad Nacional del Litoral Argentina Victoria Sánchez Universidad
THE HYBRID CART-LOGIT MODEL IN CLASSIFICATION AND DATA MINING. Dan Steinberg and N. Scott Cardell
THE HYBID CAT-LOGIT MODEL IN CLASSIFICATION AND DATA MINING Introduction Dan Steinberg and N. Scott Cardell Most data-mining projects involve classification problems assigning objects to classes whether
Exploratory Data Analysis. Psychology 3256
Exploratory Data Analysis Psychology 3256 1 Introduction If you are going to find out anything about a data set you must first understand the data Basically getting a feel for you numbers Easier to find
CSU, Fresno - Institutional Research, Assessment and Planning - Dmitri Rogulkin
My presentation is about data visualization. How to use visual graphs and charts in order to explore data, discover meaning and report findings. The goal is to show that visual displays can be very effective
PolyLens: Software for Map-based Visualization and Analysis of Genome-scale Polymorphism Data
PolyLens: Software for Map-based Visualization and Analysis of Genome-scale Polymorphism Data Ryhan Pathan Department of Electrical Engineering and Computer Science University of Tennessee Knoxville Knoxville,
Chapter 5 Business Intelligence: Data Warehousing, Data Acquisition, Data Mining, Business Analytics, and Visualization
Turban, Aronson, and Liang Decision Support Systems and Intelligent Systems, Seventh Edition Chapter 5 Business Intelligence: Data Warehousing, Data Acquisition, Data Mining, Business Analytics, and Visualization
Foundation of Quantitative Data Analysis
Foundation of Quantitative Data Analysis Part 1: Data manipulation and descriptive statistics with SPSS/Excel HSRS #10 - October 17, 2013 Reference : A. Aczel, Complete Business Statistics. Chapters 1
6.4 Normal Distribution
Contents 6.4 Normal Distribution....................... 381 6.4.1 Characteristics of the Normal Distribution....... 381 6.4.2 The Standardized Normal Distribution......... 385 6.4.3 Meaning of Areas under
Data Mining in Telecommunication
Data Mining in Telecommunication Mohsin Nadaf & Vidya Kadam Department of IT, Trinity College of Engineering & Research, Pune, India E-mail : [email protected] Abstract Telecommunication is one of
Data Mining Applications in Higher Education
Executive report Data Mining Applications in Higher Education Jing Luan, PhD Chief Planning and Research Officer, Cabrillo College Founder, Knowledge Discovery Laboratories Table of contents Introduction..............................................................2
