IS463 Introduction to Data Mining Semester 1, Academic year Tutorial # 2


Activity 1:

Classify the following attributes as qualitative (nominal or ordinal) or quantitative (interval or ratio). Also classify each as binary, discrete, or continuous.

- Age in years.
- Outlook for weather data (sunny, overcast, rainy).
- Temperature in weather data (hot, mild, cool).
- Angles as measured in degrees between 0 and 360.
- Bronze, silver, and gold medals as awarded in the Olympics.
- Height above sea level.
- Number of patients in a hospital.
- Calendar dates.
- Ability to pass light in terms of the following values: opaque, translucent, and transparent.
- Distance from the center of campus.
- Density of a substance in grams per cubic centimeter.
- Coat check number.
- Street numbers.

The answers are given in brackets:

- Age in years. [Discrete, quantitative, ratio]
- Outlook for weather data (sunny, overcast, rainy). [Discrete, qualitative, nominal]
- Temperature in weather data (hot, mild, cool). [Discrete, qualitative, ordinal]
- Angles as measured in degrees between 0 and 360. [Continuous, quantitative, interval]
- Bronze, silver, and gold medals as awarded in the Olympics. [Discrete, qualitative, ordinal]
- Height above sea level. [Continuous, quantitative, interval]
- Number of patients in a hospital. [Discrete, quantitative, ratio]
- Calendar dates. [Discrete, quantitative, interval]
- Ability to pass light in terms of the following values: opaque, translucent, and transparent. [Discrete, qualitative, ordinal]

- Distance from the center of campus. [Continuous, quantitative, ratio]
- Density of a substance in grams per cubic centimeter. [Continuous, quantitative, ratio]
- Coat check number. [Discrete, qualitative, nominal]
- Street numbers. [Discrete, qualitative, nominal; if they can be used for ordering, they are ordinal]

Activity 2:

From the two tables below, give an example of each type of attribute with a brief description. Show how the same attribute may have different types depending on how it is represented.

Table 1:

Outlook    Temperature  Humidity  Windy  Play
Sunny      Hot          High      False  No
Sunny      Hot          High      True   No
Overcast   Hot          High      False  Yes
Rainy      Mild         Normal    True   Yes

Table 2:

Outlook    Temperature (F)  Humidity  Windy  Play
Sunny                                 False  No
Sunny                                 True   No
Overcast                              False  Yes
Rainy                                 True   Yes

- Nominal: the Outlook and Windy columns in both tables.
- Ordinal: Temperature and Humidity in Table 1.
- Interval: Temperature and Humidity in Table 2.
- Ratio: none.

Activity 3:

Give three examples of data quality problems, each with a brief description and an example. Then mention some methods for handling missing data.

Noise and outliers:
- Noise refers to modification of original values. Examples: distortion of a person's voice when talking on a poor phone, and "snow" on a television screen.
- Outliers are data objects with characteristics that are considerably different from most of the other data objects in the data set.

Missing values:
- Information is not collected (e.g., people decline to give their age and weight).
- Attributes may not be applicable to all cases (e.g., annual income is not applicable to children).

Duplicate data:
- A data set may include data objects that are duplicates, or almost duplicates, of one another. Example: the same person with multiple addresses.

Some methods for handling missing data:
- Eliminate data objects.
- Estimate missing values.
- Ignore the missing value during analysis.
- Replace with all possible values (weighted by their probabilities).

Activity 4:

Given this scale for a candy bar: {poor, fair, OK, good, wonderful}. Find d(good, wonderful) and d(fair, wonderful), and the similarity s for each pair.

The attribute is ordinal, so its values are mapped to successive integers, {poor = 0, fair = 1, OK = 2, good = 3, wonderful = 4}. The distance is the absolute difference of the mapped values divided by n - 1 = 4, where n = 5 is the number of values, and the similarity is s = 1 - d. Then:

d(good, wonderful) = (4 - 3) / 4 = 0.25
s = 1 - d = 1 - 0.25 = 0.75

d(fair, wonderful) = (4 - 1) / 4 = 0.75
s = 1 - d = 1 - 0.75 = 0.25
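The ordinal mapping in Activity 4 can be checked with a few lines of Python. This is only a minimal sketch: the names SCALE, ordinal_distance, and ordinal_similarity are hypothetical, and it assumes the distance is the rank difference divided by n - 1, as in the worked answer.

```python
# Scale from Activity 4, listed from lowest to highest rank.
# SCALE and both function names are illustrative, not part of the tutorial.
SCALE = ["poor", "fair", "OK", "good", "wonderful"]


def ordinal_distance(a, b, scale=SCALE):
    """Absolute rank difference, normalised by the largest rank (n - 1)."""
    rank = {value: i for i, value in enumerate(scale)}  # poor=0 ... wonderful=4
    return abs(rank[a] - rank[b]) / (len(scale) - 1)


def ordinal_similarity(a, b, scale=SCALE):
    """Similarity defined as s = 1 - d."""
    return 1 - ordinal_distance(a, b, scale)


# Print d and s for the two pairs asked about in the activity.
for pair in [("good", "wonderful"), ("fair", "wonderful")]:
    print(pair, ordinal_distance(*pair), ordinal_similarity(*pair))
```

Dividing by n - 1 rather than n keeps the distance between the two ends of the scale (poor, wonderful) equal to exactly 1.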

Activity 5:

Given the two points p (0, 2) and q (5, 1), calculate the Euclidean distance between them. What about d(q, p)?

$dist = \sqrt{\sum_{k=1}^{n} (p_k - q_k)^2}$

$d(p, q) = \sqrt{(0 - 5)^2 + (2 - 1)^2} = \sqrt{26} \approx 5.10$

$d(q, p) = \sqrt{(5 - 0)^2 + (1 - 2)^2} = \sqrt{26} \approx 5.10$

The distance is symmetric, so d(q, p) = d(p, q).

Activity 6:

Consider the two binary vectors:

X = (1, 1, 0, 0, 0, 0, 0, 0, 0, 0)
Y = (0, 1, 0, 0, 0, 0, 1, 0, 0, 1)

Calculate SMC and J.

f00 = the number of attributes where x is 0 and y is 0
f01 = the number of attributes where x is 0 and y is 1
f10 = the number of attributes where x is 1 and y is 0
f11 = the number of attributes where x is 1 and y is 1

In this example, the values of those quantities are: f00 = 6, f01 = 2, f10 = 1, f11 = 1.

SMC = number of matching attribute values / number of attributes
    = (f11 + f00) / (f00 + f01 + f10 + f11) = (1 + 6) / (6 + 2 + 1 + 1) = 0.7

J = number of matching presences / number of attributes not involved in 00 matches

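J = f11 / (f01 + f10 + f11) = 1 / (2 + 1 + 1) = 0.25

Activity 7:

Consider the two document vectors:

X = (3, 2, 0, 5, 0, 0, 0, 2, 0, 0)
Y = (1, 0, 0, 0, 0, 0, 0, 1, 0, 2)

Calculate the cosine similarity.

$\cos(x, y) = \frac{x \cdot y}{\|x\| \, \|y\|}$, where $\cdot$ indicates the vector dot product, $x \cdot y = \sum_{k=1}^{n} x_k y_k$, and $\|x\|$ is the length of vector x, $\|x\| = \sqrt{\sum_{k=1}^{n} x_k^2} = \sqrt{x \cdot x}$.

$x \cdot y = 3 \cdot 1 + 2 \cdot 0 + 0 \cdot 0 + 5 \cdot 0 + 0 \cdot 0 + 0 \cdot 0 + 0 \cdot 0 + 2 \cdot 1 + 0 \cdot 0 + 0 \cdot 2 = 5$

$\|x\| = \sqrt{3 \cdot 3 + 2 \cdot 2 + 0 \cdot 0 + 5 \cdot 5 + 0 \cdot 0 + 0 \cdot 0 + 0 \cdot 0 + 2 \cdot 2 + 0 \cdot 0 + 0 \cdot 0} = \sqrt{42} \approx 6.48$

$\|y\| = \sqrt{1 \cdot 1 + 0 \cdot 0 + 0 \cdot 0 + 0 \cdot 0 + 0 \cdot 0 + 0 \cdot 0 + 0 \cdot 0 + 1 \cdot 1 + 0 \cdot 0 + 2 \cdot 2} = \sqrt{6} \approx 2.45$

$\cos(x, y) = \frac{5}{6.48 \cdot 2.45} \approx 0.31$

Activity 8: Correlation

For the following two vectors, x and y, find the correlation.

X = (4, 4, 3, 1, 0, 5, 3, 4, 4, 3)
Y = (3, 4, 3, 3, 2, 4, 4, 4, 5, 4)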

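r = correlation(x, y)
n = number of values
S.D.(x) = standard deviation of x
S.D.(y) = standard deviation of y

n     x    y    xy   x^2  y^2
1     4    3    12   16   9
2     4    4    16   16   16
3     3    3    9    9    9
4     1    3    3    1    9
5     0    2    0    0    4
6     5    4    20   25   16
7     3    4    12   9    16
8     4    4    16   16   16
9     4    5    20   16   25
10    3    4    12   9    16
Sum   31   36   120  117  136

Using the computational form $r = \frac{n \sum xy - \sum x \sum y}{\sqrt{[n \sum x^2 - (\sum x)^2] \, [n \sum y^2 - (\sum y)^2]}}$:

$corr(x, y) = \frac{(10 \cdot 120) - (31 \cdot 36)}{\sqrt{[(10 \cdot 117) - 31^2] \, [(10 \cdot 136) - 36^2]}} = \frac{84}{\sqrt{209 \cdot 64}} \approx 0.73$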
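The hand calculations in Activities 5 to 8 can be re-run with a short Python sketch. It uses only the standard library, and the function names (euclidean, binary_counts, smc, jaccard, cosine, pearson) are hypothetical helpers chosen for illustration. The Pearson function assumes the same sums-based formula as the worked answer above.

```python
import math


def euclidean(p, q):
    """Euclidean distance between two points of equal dimension."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def binary_counts(x, y):
    """Return (f00, f01, f10, f11) for two equal-length 0/1 vectors."""
    counts = {(0, 0): 0, (0, 1): 0, (1, 0): 0, (1, 1): 0}
    for a, b in zip(x, y):
        counts[(a, b)] += 1
    return counts[(0, 0)], counts[(0, 1)], counts[(1, 0)], counts[(1, 1)]


def smc(x, y):
    """Simple matching coefficient: matches / all attributes."""
    f00, f01, f10, f11 = binary_counts(x, y)
    return (f11 + f00) / (f00 + f01 + f10 + f11)


def jaccard(x, y):
    """Jaccard coefficient: 1-1 matches / attributes not in 0-0 matches.

    Raises ZeroDivisionError if both vectors are all zeros.
    """
    _, f01, f10, f11 = binary_counts(x, y)
    return f11 / (f01 + f10 + f11)


def cosine(x, y):
    """Cosine similarity: dot product divided by the product of lengths."""
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    return dot / (norm_x * norm_y)


def pearson(x, y):
    """Pearson correlation from the sums n, Sx, Sy, Sxy, Sx^2, Sy^2."""
    n = len(x)
    sx, sy = sum(x), sum(y)
    sxy = sum(a * b for a, b in zip(x, y))
    sxx = sum(a * a for a in x)
    syy = sum(b * b for b in y)
    return (n * sxy - sx * sy) / math.sqrt(
        (n * sxx - sx ** 2) * (n * syy - sy ** 2)
    )


# Inputs copied from Activities 5 to 8.
print(euclidean((0, 2), (5, 1)))
bx = (1, 1, 0, 0, 0, 0, 0, 0, 0, 0)
by = (0, 1, 0, 0, 0, 0, 1, 0, 0, 1)
print(binary_counts(bx, by), smc(bx, by), jaccard(bx, by))
print(cosine((3, 2, 0, 5, 0, 0, 0, 2, 0, 0), (1, 0, 0, 0, 0, 0, 0, 1, 0, 2)))
print(pearson((4, 4, 3, 1, 0, 5, 3, 4, 4, 3), (3, 4, 3, 3, 2, 4, 4, 4, 5, 4)))
```

Keeping binary_counts separate from smc and jaccard mirrors the tutorial's approach of first tabulating f00, f01, f10, and f11 and then plugging them into each formula.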

Social Media Mining. Data Mining Essentials

Social Media Mining. Data Mining Essentials Introduction Data production rate has been increased dramatically (Big Data) and we are able store much more data than before E.g., purchase data, social media data, mobile phone data Businesses and customers

More information

Data Mining with Weka

Data Mining with Weka Data Mining with Weka Class 1 Lesson 1 Introduction Ian H. Witten Department of Computer Science University of Waikato New Zealand weka.waikato.ac.nz Data Mining with Weka a practical course on how to

More information

MULTIPLE CHOICE. Choose the one alternative that best completes the statement or answers the question.

MULTIPLE CHOICE. Choose the one alternative that best completes the statement or answers the question. Exam Name 1) A recent report stated ʺBased on a sample of 90 truck drivers, there is evidence to indicate that, on average, independent truck drivers earn more than company -hired truck drivers.ʺ Does

More information

Monday Morning Data Mining

Monday Morning Data Mining Monday Morning Data Mining Tim Ruhe Statistische Methoden der Datenanalyse Outline: - data mining - IceCube - Data mining in IceCube Computer Scientists are different... Fakultät Physik Fakultät Physik

More information

Lecture 6 - Data Mining Processes

Lecture 6 - Data Mining Processes Lecture 6 - Data Mining Processes Dr. Songsri Tangsripairoj Dr.Benjarath Pupacdi Faculty of ICT, Mahidol University 1 Cross-Industry Standard Process for Data Mining (CRISP-DM) Example Application: Telephone

More information

Chapter 12 Discovering New Knowledge Data Mining

Chapter 12 Discovering New Knowledge Data Mining Chapter 12 Discovering New Knowledge Data Mining Becerra-Fernandez, et al. -- Knowledge Management 1/e -- 2004 Prentice Hall Additional material 2007 Dekai Wu Chapter Objectives Introduce the student to

More information

Knowledge-based systems and the need for learning

Knowledge-based systems and the need for learning Knowledge-based systems and the need for learning The implementation of a knowledge-based system can be quite difficult. Furthermore, the process of reasoning with that knowledge can be quite slow. This

More information

Medical Information Management & Mining. You Chen Jan,15, 2013 You.chen@vanderbilt.edu

Medical Information Management & Mining. You Chen Jan,15, 2013 You.chen@vanderbilt.edu Medical Information Management & Mining You Chen Jan,15, 2013 You.chen@vanderbilt.edu 1 Trees Building Materials Trees cannot be used to build a house directly. How can we transform trees to building materials?

More information

Data Mining 5. Cluster Analysis

Data Mining 5. Cluster Analysis Data Mining 5. Cluster Analysis 5.2 Fall 2009 Instructor: Dr. Masoud Yaghini Outline Data Structures Interval-Valued (Numeric) Variables Binary Variables Categorical Variables Ordinal Variables Variables

More information

Introduction to Data Mining

Introduction to Data Mining Introduction to Data Mining Instructor s Solution Manual Pang-Ning Tan Michael Steinbach Vipin Kumar Copyright c 2006 Pearson Addison-Wesley. All rights reserved. Contents 1 Introduction 1 2 Data 5 3

More information

Knowledge Discovery and Data Mining. Structured vs. Non-Structured Data

Knowledge Discovery and Data Mining. Structured vs. Non-Structured Data Knowledge Discovery and Data Mining Unit # 2 1 Structured vs. Non-Structured Data Most business databases contain structured data consisting of well-defined fields with numeric or alphanumeric values.

More information

Extend Table Lens for High-Dimensional Data Visualization and Classification Mining

Extend Table Lens for High-Dimensional Data Visualization and Classification Mining Extend Table Lens for High-Dimensional Data Visualization and Classification Mining CPSC 533c, Information Visualization Course Project, Term 2 2003 Fengdong Du fdu@cs.ubc.ca University of British Columbia

More information

Intro to GIS Winter 2011. Data Visualization Part I

Intro to GIS Winter 2011. Data Visualization Part I Intro to GIS Winter 2011 Data Visualization Part I Cartographer Code of Ethics Always have a straightforward agenda and have a defining purpose or goal for each map Always strive to know your audience

More information

Valor Christian High School Mrs. Bogar Biology Graphing Fun with a Paper Towel Lab

Valor Christian High School Mrs. Bogar Biology Graphing Fun with a Paper Towel Lab 1 Valor Christian High School Mrs. Bogar Biology Graphing Fun with a Paper Towel Lab I m sure you ve wondered about the absorbency of paper towel brands as you ve quickly tried to mop up spilled soda from

More information

Decision-Tree Learning

Decision-Tree Learning Decision-Tree Learning Introduction ID3 Attribute selection Entropy, Information, Information Gain Gain Ratio C4.5 Decision Trees TDIDT: Top-Down Induction of Decision Trees Numeric Values Missing Values

More information

Using SPSS, Chapter 2: Descriptive Statistics

Using SPSS, Chapter 2: Descriptive Statistics 1 Using SPSS, Chapter 2: Descriptive Statistics Chapters 2.1 & 2.2 Descriptive Statistics 2 Mean, Standard Deviation, Variance, Range, Minimum, Maximum 2 Mean, Median, Mode, Standard Deviation, Variance,

More information

FEGYVERNEKI SÁNDOR, PROBABILITY THEORY AND MATHEmATICAL

FEGYVERNEKI SÁNDOR, PROBABILITY THEORY AND MATHEmATICAL FEGYVERNEKI SÁNDOR, PROBABILITY THEORY AND MATHEmATICAL STATIsTICs 4 IV. RANDOm VECTORs 1. JOINTLY DIsTRIBUTED RANDOm VARIABLEs If are two rom variables defined on the same sample space we define the joint

More information

Section 3 Part 1. Relationships between two numerical variables

Section 3 Part 1. Relationships between two numerical variables Section 3 Part 1 Relationships between two numerical variables 1 Relationship between two variables The summary statistics covered in the previous lessons are appropriate for describing a single variable.

More information

Concepts of Variables. Levels of Measurement. The Four Levels of Measurement. Nominal Scale. Greg C Elvers, Ph.D.

Concepts of Variables. Levels of Measurement. The Four Levels of Measurement. Nominal Scale. Greg C Elvers, Ph.D. Concepts of Variables Greg C Elvers, Ph.D. 1 Levels of Measurement When we observe and record a variable, it has characteristics that influence the type of statistical analysis that we can perform on it

More information

Introduction to Statistics for Psychology. Quantitative Methods for Human Sciences

Introduction to Statistics for Psychology. Quantitative Methods for Human Sciences Introduction to Statistics for Psychology and Quantitative Methods for Human Sciences Jonathan Marchini Course Information There is website devoted to the course at http://www.stats.ox.ac.uk/ marchini/phs.html

More information

Data Mining on Streams

Data Mining on Streams Data Mining on Streams Using Decision Trees CS 536: Machine Learning Instructor: Michael Littman TA: Yihua Wu Outline Introduction to data streams Overview of traditional DT learning ALG DT learning ALGs

More information

Name: Date: Use the following to answer questions 2-3:

Name: Date: Use the following to answer questions 2-3: Name: Date: 1. A study is conducted on students taking a statistics class. Several variables are recorded in the survey. Identify each variable as categorical or quantitative. A) Type of car the student

More information

II. DISTRIBUTIONS distribution normal distribution. standard scores

II. DISTRIBUTIONS distribution normal distribution. standard scores Appendix D Basic Measurement And Statistics The following information was developed by Steven Rothke, PhD, Department of Psychology, Rehabilitation Institute of Chicago (RIC) and expanded by Mary F. Schmidt,

More information

Creating Pivot Tables Using Excel 2003

Creating Pivot Tables Using Excel 2003 1 Creating Pivot Tables Using Excel 2003 Creating Six Kinds of Tables Milo Schield Member: International Statistical Institute US Rep: International Statistical Literacy Project Director, W M Keck Statistical

More information

Correlation Coefficient The correlation coefficient is a summary statistic that describes the linear relationship between two numerical variables 2

Correlation Coefficient The correlation coefficient is a summary statistic that describes the linear relationship between two numerical variables 2 Lesson 4 Part 1 Relationships between two numerical variables 1 Correlation Coefficient The correlation coefficient is a summary statistic that describes the linear relationship between two numerical variables

More information

Pie Charts. proportion of ice-cream flavors sold annually by a given brand. AMS-5: Statistics. Cherry. Cherry. Blueberry. Blueberry. Apple.

Pie Charts. proportion of ice-cream flavors sold annually by a given brand. AMS-5: Statistics. Cherry. Cherry. Blueberry. Blueberry. Apple. Graphical Representations of Data, Mean, Median and Standard Deviation In this class we will consider graphical representations of the distribution of a set of data. The goal is to identify the range of

More information

Statistics I for QBIC. Contents and Objectives. Chapters 1 7. Revised: August 2013

Statistics I for QBIC. Contents and Objectives. Chapters 1 7. Revised: August 2013 Statistics I for QBIC Text Book: Biostatistics, 10 th edition, by Daniel & Cross Contents and Objectives Chapters 1 7 Revised: August 2013 Chapter 1: Nature of Statistics (sections 1.1-1.6) Objectives

More information

Lesson 15 - Fill Cells Plugin

Lesson 15 - Fill Cells Plugin 15.1 Lesson 15 - Fill Cells Plugin This lesson presents the functionalities of the Fill Cells plugin. Fill Cells plugin allows the calculation of attribute values of tables associated with cell type layers.

More information

More Data Mining with Weka

More Data Mining with Weka More Data Mining with Weka Class 5 Lesson 1 Simple neural networks Ian H. Witten Department of Computer Science University of Waikato New Zealand weka.waikato.ac.nz Lesson 5.1: Simple neural networks Class

More information

Knowledge Discovery and Data Mining

Knowledge Discovery and Data Mining Knowledge Discovery and Data Mining Unit # 6 Sajjad Haider Fall 2014 1 Evaluating the Accuracy of a Classifier Holdout, random subsampling, crossvalidation, and the bootstrap are common techniques for

More information

Figure 1.1 Vector A and Vector F

Figure 1.1 Vector A and Vector F CHAPTER I VECTOR QUANTITIES Quantities are anything which can be measured, and stated with number. Quantities in physics are divided into two types; scalar and vector quantities. Scalar quantities have

More information

DoIT User Survey Data Analysis: 2013-2015

DoIT User Survey Data Analysis: 2013-2015 DoIT User Survey Data Analysis: 2013- Introduction For the past three years (2013-), DoIT has deployed a user survey to assess the importance of different technologies for campus users. This analysis explores

More information

Metric Conversion: Stair-Step Method

Metric Conversion: Stair-Step Method ntroduction to Conceptual Physics Metric Conversion: Stair-Step Method Kilo- 1000 Hecto- 100 Deka- 10 Base Unit grams liters meters The Metric System of measurement is based on multiples of 10. Prefixes

More information

AIMS Education Foundation

AIMS Education Foundation Developed and Published by AIMS Education Foundation This book contains materials developed by the AIMS Education Foundation. AIMS (Activities Integrating Mathematics and Science) began in 1981 with a

More information

GEOGRAPHIC INFORMATION SYSTEMS CERTIFICATION

GEOGRAPHIC INFORMATION SYSTEMS CERTIFICATION GEOGRAPHIC INFORMATION SYSTEMS CERTIFICATION GIS Syllabus - Version 1.2 January 2007 Copyright AICA-CEPIS 2009 1 Version 1 January 2007 GIS Certification Programme 1. Target The GIS certification is aimed

More information

Diagrams and Graphs of Statistical Data

Diagrams and Graphs of Statistical Data Diagrams and Graphs of Statistical Data One of the most effective and interesting alternative way in which a statistical data may be presented is through diagrams and graphs. There are several ways in

More information

A Quick Guide to Constructing an SPSS Code Book Prepared by Amina Jabbar, Centre for Research on Inner City Health

A Quick Guide to Constructing an SPSS Code Book Prepared by Amina Jabbar, Centre for Research on Inner City Health A Quick Guide to Constructing an SPSS Code Book Prepared by Amina Jabbar, Centre for Research on Inner City Health 1. To begin, double click on SPSS icon. The icon will probably look something like this

More information

DATA MINING CLUSTER ANALYSIS: BASIC CONCEPTS

DATA MINING CLUSTER ANALYSIS: BASIC CONCEPTS DATA MINING CLUSTER ANALYSIS: BASIC CONCEPTS 1 AND ALGORITHMS Chiara Renso KDD-LAB ISTI- CNR, Pisa, Italy WHAT IS CLUSTER ANALYSIS? Finding groups of objects such that the objects in a group will be similar

More information

Business Statistics: Intorduction

Business Statistics: Intorduction Business Statistics: Intorduction Donglei Du (ddu@unb.edu) Faculty of Business Administration, University of New Brunswick, NB Canada Fredericton E3B 9Y2 September 23, 2015 Donglei Du (UNB) AlgoTrading

More information

Data Mining of Web Access Logs

Data Mining of Web Access Logs Data Mining of Web Access Logs A minor thesis submitted in partial fulfilment of the requirements for the degree of Master of Applied Science in Information Technology Anand S. Lalani School of Computer

More information

Chapter 1: The Nature of Probability and Statistics

Chapter 1: The Nature of Probability and Statistics Chapter 1: The Nature of Probability and Statistics Learning Objectives Upon successful completion of Chapter 1, you will have applicable knowledge of the following concepts: Statistics: An Overview and

More information

Data Mining Essentials

Data Mining Essentials This chapter is from Social Media Mining: An Introduction. By Reza Zafarani, Mohammad Ali Abbasi, and Huan Liu. Cambridge University Press, 2014. Draft version: April 20, 2014. Complete Draft and Slides

More information

8. Machine Learning Applied Artificial Intelligence

8. Machine Learning Applied Artificial Intelligence 8. Machine Learning Applied Artificial Intelligence Prof. Dr. Bernhard Humm Faculty of Computer Science Hochschule Darmstadt University of Applied Sciences 1 Retrospective Natural Language Processing Name

More information

Lecture 2: Types of Variables

Lecture 2: Types of Variables 2typesofvariables.pdf Michael Hallstone, Ph.D. hallston@hawaii.edu Lecture 2: Types of Variables Recap what we talked about last time Recall how we study social world using populations and samples. Recall

More information

Chapter 1: Chemistry: Measurements and Methods

Chapter 1: Chemistry: Measurements and Methods Chapter 1: Chemistry: Measurements and Methods 1.1 The Discovery Process o Chemistry - The study of matter o Matter - Anything that has mass and occupies space, the stuff that things are made of. This

More information

MARKETING RESEARCH AND MARKET INTELLIGENCE (MRM711S) FEEDBACK TUTORIAL LETTER SEMESTER `1 OF 2016. Dear Student

MARKETING RESEARCH AND MARKET INTELLIGENCE (MRM711S) FEEDBACK TUTORIAL LETTER SEMESTER `1 OF 2016. Dear Student MARKETING RESEARCH AND MARKET INTELLIGENCE (MRM711S) FEEDBACK TUTORIAL LETTER SEMESTER `1 OF 2016 Dear Student Assignment 1 has been marked and this serves as feedback on the assignment. I have included

More information

Advanced GMAT Math Questions

Advanced GMAT Math Questions Advanced GMAT Math Questions Version Quantitative Fractions and Ratios 1. The current ratio of boys to girls at a certain school is to 5. If 1 additional boys were added to the school, the new ratio of

More information

Outer Diameter 23 φ mm Face side Dimension 20.1 φ mm. Baffle Opening. Normal 0.5 Watts Maximum 1.0 Watts Sine Wave.

Outer Diameter 23 φ mm Face side Dimension 20.1 φ mm. Baffle Opening. Normal 0.5 Watts Maximum 1.0 Watts Sine Wave. 1. MODEL: 23CR08FH-50ND 2 Dimension & Weight Outer Diameter 23 φ mm Face side Dimension 20.1 φ mm Baffle Opening 20.1 φ mm Height Refer to drawing Weight 4.0Grams 3 Magnet Materials Rare Earth Size φ 9.5

More information

Decision Trees. JERZY STEFANOWSKI Institute of Computing Science Poznań University of Technology. Doctoral School, Catania-Troina, April, 2008

Decision Trees. JERZY STEFANOWSKI Institute of Computing Science Poznań University of Technology. Doctoral School, Catania-Troina, April, 2008 Decision Trees JERZY STEFANOWSKI Institute of Computing Science Poznań University of Technology Doctoral School, Catania-Troina, April, 2008 Aims of this module The decision tree representation. The basic

More information

Chapter 10. Key Ideas Correlation, Correlation Coefficient (r),

Chapter 10. Key Ideas Correlation, Correlation Coefficient (r), Chapter 0 Key Ideas Correlation, Correlation Coefficient (r), Section 0-: Overview We have already explored the basics of describing single variable data sets. However, when two quantitative variables

More information

The University of the State of New York REGENTS HIGH SCHOOL EXAMINATION INTEGRATED ALGEBRA. Thursday, August 16, 2012 8:30 to 11:30 a.m.

The University of the State of New York REGENTS HIGH SCHOOL EXAMINATION INTEGRATED ALGEBRA. Thursday, August 16, 2012 8:30 to 11:30 a.m. INTEGRATED ALGEBRA The University of the State of New York REGENTS HIGH SCHOOL EXAMINATION INTEGRATED ALGEBRA Thursday, August 16, 2012 8:30 to 11:30 a.m., only Student Name: School Name: Print your name

More information

Analyzing Research Data Using Excel

Analyzing Research Data Using Excel Analyzing Research Data Using Excel Fraser Health Authority, 2012 The Fraser Health Authority ( FH ) authorizes the use, reproduction and/or modification of this publication for purposes other than commercial

More information

6. Decide which method of data collection you would use to collect data for the study (observational study, experiment, simulation, or survey):

6. Decide which method of data collection you would use to collect data for the study (observational study, experiment, simulation, or survey): MATH 1040 REVIEW (EXAM I) Chapter 1 1. For the studies described, identify the population, sample, population parameters, and sample statistics: a) The Gallup Organization conducted a poll of 1003 Americans

More information

AP Physics 1 and 2 Lab Investigations

AP Physics 1 and 2 Lab Investigations AP Physics 1 and 2 Lab Investigations Student Guide to Data Analysis New York, NY. College Board, Advanced Placement, Advanced Placement Program, AP, AP Central, and the acorn logo are registered trademarks

More information

STA-201-TE. 5. Measures of relationship: correlation (5%) Correlation coefficient; Pearson r; correlation and causation; proportion of common variance

STA-201-TE. 5. Measures of relationship: correlation (5%) Correlation coefficient; Pearson r; correlation and causation; proportion of common variance Principles of Statistics STA-201-TE This TECEP is an introduction to descriptive and inferential statistics. Topics include: measures of central tendency, variability, correlation, regression, hypothesis

More information

Vector Spaces; the Space R n

Vector Spaces; the Space R n Vector Spaces; the Space R n Vector Spaces A vector space (over the real numbers) is a set V of mathematical entities, called vectors, U, V, W, etc, in which an addition operation + is defined and in which

More information

Best Practices in Data Visualizations. Vihao Pham January 29, 2014

Best Practices in Data Visualizations. Vihao Pham January 29, 2014 Best Practices in Data Visualizations Vihao Pham January 29, 2014 Agenda Best Practices in Data Visualizations Why We Visualize Understanding Data Visualizations Enhancing Visualizations Visualization

More information

Best Practices in Data Visualizations. Vihao Pham 2014

Best Practices in Data Visualizations. Vihao Pham 2014 Best Practices in Data Visualizations Vihao Pham 2014 Agenda Best Practices in Data Visualizations Why We Visualize Understanding Data Visualizations Enhancing Visualizations Visualization Considerations

More information

Microsoft Azure Machine learning Algorithms

Microsoft Azure Machine learning Algorithms Microsoft Azure Machine learning Algorithms Tomaž KAŠTRUN @tomaz_tsql Tomaz.kastrun@gmail.com http://tomaztsql.wordpress.com Our Sponsors Speaker info https://tomaztsql.wordpress.com Agenda Focus on explanation

More information

Northumberland Knowledge

Northumberland Knowledge Northumberland Knowledge Know Guide How to Analyse Data - November 2012 - This page has been left blank 2 About this guide The Know Guides are a suite of documents that provide useful information about

More information

Summary of Formulas and Concepts. Descriptive Statistics (Ch. 1-4)

Summary of Formulas and Concepts. Descriptive Statistics (Ch. 1-4) Summary of Formulas and Concepts Descriptive Statistics (Ch. 1-4) Definitions Population: The complete set of numerical information on a particular quantity in which an investigator is interested. We assume

More information

Framing Business Problems as Data Mining Problems

Framing Business Problems as Data Mining Problems Framing Business Problems as Data Mining Problems Asoka Diggs Data Scientist, Intel IT January 21, 2016 Legal Notices This presentation is for informational purposes only. INTEL MAKES NO WARRANTIES, EXPRESS

More information

Foundation of Quantitative Data Analysis

Foundation of Quantitative Data Analysis Foundation of Quantitative Data Analysis Part 1: Data manipulation and descriptive statistics with SPSS/Excel HSRS #10 - October 17, 2013 Reference : A. Aczel, Complete Business Statistics. Chapters 1

More information

SOST 201 September 18-20, 2006. Measurement of Variables 2

SOST 201 September 18-20, 2006. Measurement of Variables 2 1 Social Studies 201 September 18-20, 2006 Measurement of variables See text, chapter 3, pp. 61-86. These notes and Chapter 3 of the text examine ways of measuring variables in order to describe members

More information

Université de Montpellier 2 Hugo Alatrista-Salas : hugo.alatrista-salas@teledetection.fr

Université de Montpellier 2 Hugo Alatrista-Salas : hugo.alatrista-salas@teledetection.fr Université de Montpellier 2 Hugo Alatrista-Salas : hugo.alatrista-salas@teledetection.fr WEKA Gallirallus Zeland) australis : Endemic bird (New Characteristics Waikato university Weka is a collection

More information

Elementary Statistics

Elementary Statistics Elementary Statistics Chapter 1 Dr. Ghamsary Page 1 Elementary Statistics M. Ghamsary, Ph.D. Chap 01 1 Elementary Statistics Chapter 1 Dr. Ghamsary Page 2 Statistics: Statistics is the science of collecting,

More information

IBM SPSS Statistics for Beginners for Windows

IBM SPSS Statistics for Beginners for Windows ISS, NEWCASTLE UNIVERSITY IBM SPSS Statistics for Beginners for Windows A Training Manual for Beginners Dr. S. T. Kometa A Training Manual for Beginners Contents 1 Aims and Objectives... 3 1.1 Learning

More information

The Dot and Cross Products

The Dot and Cross Products The Dot and Cross Products Two common operations involving vectors are the dot product and the cross product. Let two vectors =,, and =,, be given. The Dot Product The dot product of and is written and

More information

Key. Name: OBJECTIVES

Key. Name: OBJECTIVES Name: Key OBJECTIVES Correctly define: observation, inference, classification, percent deviation, density, rate of change, cyclic change, dynamic equilibrium, interface, mass, volume GRAPHICAL RELATIONSHIPS

More information

13.5. Click here for answers. Click here for solutions. CURL AND DIVERGENCE. 17. F x, y, z x i y j x k. 2. F x, y, z x 2z i x y z j x 2y k

13.5. Click here for answers. Click here for solutions. CURL AND DIVERGENCE. 17. F x, y, z x i y j x k. 2. F x, y, z x 2z i x y z j x 2y k SECTION CURL AND DIVERGENCE 1 CURL AND DIVERGENCE A Click here for answers. S Click here for solutions. 1 15 Find (a the curl and (b the divergence of the vector field. 1. F x, y, xy i y j x k. F x, y,

More information

Measurement and Measurement Scales

Measurement and Measurement Scales Measurement and Measurement Scales Measurement is the foundation of any scientific investigation Everything we do begins with the measurement of whatever it is we want to study Definition: measurement

More information

Analysing Questionnaires using Minitab (for SPSS queries contact -) Graham.Currell@uwe.ac.uk

Analysing Questionnaires using Minitab (for SPSS queries contact -) Graham.Currell@uwe.ac.uk Analysing Questionnaires using Minitab (for SPSS queries contact -) Graham.Currell@uwe.ac.uk Structure As a starting point it is useful to consider a basic questionnaire as containing three main sections:

More information

Modifying Colors and Symbols in ArcMap

Modifying Colors and Symbols in ArcMap Modifying Colors and Symbols in ArcMap Contents Introduction... 1 Displaying Categorical Data... 3 Creating New Categories... 5 Displaying Numeric Data... 6 Graduated Colors... 6 Graduated Symbols... 9

More information

Mathematics Pre-Test Sample Questions A. { 11, 7} B. { 7,0,7} C. { 7, 7} D. { 11, 11}

Mathematics Pre-Test Sample Questions A. { 11, 7} B. { 7,0,7} C. { 7, 7} D. { 11, 11} Mathematics Pre-Test Sample Questions 1. Which of the following sets is closed under division? I. {½, 1,, 4} II. {-1, 1} III. {-1, 0, 1} A. I only B. II only C. III only D. I and II. Which of the following

More information

4. How many integers between 2004 and 4002 are perfect squares?

4. How many integers between 2004 and 4002 are perfect squares? 5 is 0% of what number? What is the value of + 3 4 + 99 00? (alternating signs) 3 A frog is at the bottom of a well 0 feet deep It climbs up 3 feet every day, but slides back feet each night If it started

More information

Data Exploration and Preprocessing. Data Mining and Text Mining (UIC 583 @ Politecnico di Milano)

Data Exploration and Preprocessing. Data Mining and Text Mining (UIC 583 @ Politecnico di Milano) Data Exploration and Preprocessing Data Mining and Text Mining (UIC 583 @ Politecnico di Milano) References Jiawei Han and Micheline Kamber, "Data Mining: Concepts and Techniques", The Morgan Kaufmann

More information
