Kenneth Benoit Trinity College Dublin, Ireland
530 Data Analysis, Exploratory

The ultimate quantitative extreme in textual data analysis uses scaling procedures borrowed from item response theory methods developed originally in psychometrics. Both Jon Slapin and Sven-Oliver Proksch's Poisson scaling model and Burt Monroe and Ko Maeda's similar scaling method assume that word frequencies are generated by a probabilistic function driven by the author's position on some latent scale of interest and can be used to estimate those latent positions relative to the positions of other texts. Such methods may be applied to word frequency matrices constructed from texts with no human decision making of any kind. The disadvantage is that while the scaled estimates resulting from the procedure represent relative differences between texts, they must be interpreted if a researcher is to understand what politically significant differences the scaled results represent. This interpretation is not always self-evident.

Recent textual data analysis methods used in political science have also focused on classification: determining which category a given text belongs to. Recent examples include methods to categorize the topics debated in the U.S. Congress as a means of measuring political agendas. Variants on classification include recently developed methods designed to estimate accurately the proportions of categories of opinions about the U.S. presidency from blog postings, even though the classifier on which they are based performs poorly for individual texts. New methodologies for drawing more information from political texts continue to be developed, using clustering methodologies, more advanced item response theory models, support vector machines, and semisupervised and unsupervised machine learning techniques.

Kenneth Benoit
Trinity College Dublin, Ireland

See also Data, Archival; Discourse Analysis; Interviews, Expert; Party Manifesto

Further Readings

Baumgartner, F. R., DeBoef, S. L., & Boydstun, A. E. (2008). The decline of the death penalty and the discovery of innocence. Cambridge, UK: Cambridge University Press.
Klingemann, H.-D., Volkens, A., Bara, J., Budge, I., & McDonald, M. (2006). Mapping policy preferences II: Estimates for parties, electors, and governments in Eastern Europe, European Union and OECD. Oxford, UK: Oxford University Press.
Laver, M., Benoit, K., & Garry, J. (2003). Extracting policy positions from political texts using words as data. American Political Science Review, 97(2).
Leites, N., Bernaut, E., & Garthoff, R. L. (1951). Politburo images of Stalin. World Politics, 3.
Monroe, B., & Maeda, K. (2004). Talk's cheap: Text-based estimation of rhetorical ideal-points (Working Paper). Lansing: Michigan State University.
Slapin, J. B., & Proksch, S.-O. (2008). A scaling model for estimating time-series party positions from texts. American Journal of Political Science, 52(3).

Data Analysis, Exploratory

John W. Tukey, who coined the phrase exploratory data analysis (EDA), made remarkable contributions to the physical and social sciences. In the matter of data analysis, his groundbreaking contributions included the fast Fourier transform algorithm and EDA. He reenergized descriptive statistics through EDA and changed the language and paradigm of statistics in doing so. Interestingly, it is hard, if not impossible, to find a precise definition of EDA in Tukey's writings. This is no great surprise, because he liked to work with vague concepts, things that could be made precise in several ways. It seems that he introduced EDA by describing its characteristics and creating novel tools. His descriptions include the following:

1. Three of the main strategies of data analysis are: (1) graphical presentation, (2) provision of flexibility in viewpoint and in facilities, (3) intensive search for parsimony and simplicity. (Jones, 1986, Vol. IV, p. 8)
2. In exploratory data analysis there can be no substitute for flexibility; for adapting what is calculated and what we hope plotted both to the needs of the situation and the clues that the data have already provided. (p. 736)
3. I would like to convince you that the histogram is old-fashioned.... (p. 741)
4. Exploratory data analysis... does not need probability, significance or confidence. (p. 794)
5. I hope that I have shown that exploratory data analysis is actively incisive rather than passively descriptive, with real emphasis on the discovery of the unexpected. (p. lxii)
6. Exploratory data analysis is an attitude, a state of flexibility, a willingness to look for those things that we believe are not there, as well as those we believe to be there. (p. 806)
7. Exploratory data analysis isolates patterns and features of the data and reveals these forcefully to the analyst. (Hoaglin, Mosteller, & Tukey, 1983, p. 1)
8. If we need a short suggestion of what exploratory data analysis is, I would suggest that: (1) it is an attitude, AND (2) a flexibility, AND (3) some graph paper (or transparencies, or both). (Jones, 1986, Vol. IV, p. 815)

This entry presents a selection of EDA techniques including tables, five-number summaries, stem-and-leaf displays, scatterplot matrices, box plots, residual plots, outliers, bag plots, smoothers, reexpressions, and median polishing. Graphics are a common theme. These are tools for looking in the data for structure, or for the lack of it.

Some of these tools of EDA will be illustrated here employing U.S. presidential elections data from 1952 through 2008. Specifically, Table 1 displays the percentage of the vote that the Democrats received in the states of California, Oregon, and Washington in those years. The percentages for the Republican and third-party candidates are not a present concern. In EDA, one seeks displays and quantities that provide insights, understanding, and surprises.

Table

A table is the simplest EDA object. It simply arranges the data in a convenient form. Table 1 is a two-way table.

Five-Number Summary

Given a batch of numbers, the five-number summary consists of the largest, smallest, median, and upper and lower quartiles.
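The definition of the five-number summary can be sketched in a few lines of Python. This is a minimal illustration, not the entry's own computation: the function name `five_number_summary` is hypothetical, the batch is invented (it is not the Table 1 data), and the quartiles are taken as medians of the lower and upper halves of the sorted batch, which is only one of several conventions in use.

```python
from statistics import median

def five_number_summary(batch):
    """Return (minimum, lower quartile, median, upper quartile, maximum).

    Assumption of this sketch: the quartiles are the medians of the lower
    and upper halves of the sorted batch, with the middle value shared by
    both halves when the batch size is odd. Other conventions exist.
    """
    values = sorted(batch)
    n = len(values)
    if n == 0:
        raise ValueError("batch must not be empty")
    half = (n + 1) // 2              # size of each half
    lower_half = values[:half]
    upper_half = values[n - half:]
    return (values[0], median(lower_half), median(values),
            median(upper_half), values[-1])

# Hypothetical percentages, invented for illustration only.
batch = [42.0, 55.5, 47.0, 51.0, 39.5, 44.0, 60.0, 48.5, 46.0]
low, q1, med, q3, high = five_number_summary(batch)
iqr = q3 - q1                        # interquartile range, a measure of spread
print(low, q1, med, q3, high, iqr)
```

The median gives the center of the batch and the interquartile range its spread, which are the two quantities the entry reports for the California data.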
The five numbers are useful for auditing a data set and for getting a feel for the data. More complex EDA tools may be based on them. For the California data, the five-number summary in percents is shown in Figure 1. These data are centered at 47.6% and have a spread measured by the interquartile range of 8.75%. Tukey actually employed related quantities in a hope to avoid confusion.

Table 1 Percentages of the Votes Cast for the Democratic Candidate in the Presidential Years (columns: Year; State: California, Oregon, Washington)
Source: Statistical Abstracts of the U.S. Census Bureau.

Figure 1 A Five-Number Summary for the California Democrat Percentages (Minimum, Lower Quartile, Median, Upper Quartile, Maximum)
Note: The minimum,.9%, occurred in and the maximum, 61%, in 2008.

Stem-and-Leaf Display

The numbers of Table 1 provide all the information, yet condensations can prove better. Figure 2 provides a stem-and-leaf display for the data of the table. There are stems and leaves. The stem is a line with a value; see the numbers to the left of the divider. The leaves are numbers on a stem, the right-hand parts of the values displayed. Using this exhibit, one can read off various quartiles and the five-number summary approximately, see indications of skewness, and infer multiple modes.

Figure 2 Stem-and-Leaf Displays, With Scales of 1 and 2, for the California Democratic Data

Figure 3 Scatterplots of Percentages for the States Versus Percentages for the States in Pairs
Notes: A least squares line has been added as a reference. CA = California; OR = Oregon; WA = Washington.
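Returning to the stem-and-leaf display described above, the construction of stems and leaves can be sketched as follows. This is an illustrative sketch under stated assumptions: the function name `stem_and_leaf` is hypothetical, the values are invented (not the Table 1 percentages), values are assumed non-negative and are rounded to whole numbers, and only one line per stem is produced (the display with a second scale in Figure 2 is not reproduced).

```python
from collections import defaultdict

def stem_and_leaf(batch):
    """Return the lines of a simple stem-and-leaf display.

    Assumptions of this sketch: values are non-negative, are rounded to
    whole numbers, the stem is everything but the last digit, and the
    leaf is the last digit.
    """
    leaves = defaultdict(list)
    for value in sorted(round(v) for v in batch):
        stem, leaf = divmod(value, 10)
        leaves[stem].append(leaf)
    lines = []
    # Walk every stem between the smallest and largest so gaps stay visible.
    for stem in range(min(leaves), max(leaves) + 1):
        digits = "".join(str(d) for d in leaves.get(stem, []))
        lines.append(f"{stem:>2} | {digits}")
    return lines

# Hypothetical percentages, invented for illustration only.
data = [42.3, 55.1, 47.2, 51.0, 39.6, 44.4, 60.8, 48.7, 46.1]
for line in stem_and_leaf(data):
    print(line)
```

Because the leaves on each stem are sorted, the median and approximate quartiles can be found by counting leaves from either end of the display.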
Scatterplot Matrix

Figure 3 displays individual scatterplots for the state pairs (CA, OR), (CA, WA), and (OR, WA). A least squares line has been added in each display to provide a reference. One sees the x and y values staying together. An advantage of the figure over three individual scatterplots is that one sees the plots simultaneously.

Outliers

An outlier is an observation strikingly far from some central value. It is an unusual value relative to the bulk of the data. Commonly computed quantities such as averages and least squares lines can be drastically affected by such values. Methods to detect outliers and to moderate their effects are needed. So far, the tools discussed in this entry have not found any clear outliers.

Box Plots

A box plot consists of a rectangle with top and bottom sides at the levels of the quartiles, a horizontal line added at the level of the median, and whiskers, of length at most 1.5 times the interquartile range, added at the top and bottom. It is based on numerical values. Points outside these limits are plotted and are possible outliers. When more than one box plot is present in a figure, they are referred to as parallel box plots. Figure 4 presents a parallel box plot display for the presidential data, with one box plot for each state. The California values tend to be higher than those of Oregon and Washington. The latter two each show a single outlier and a skewing toward higher values. Both the outliers are for the 1964 election.

Residual Plots

A residual plot is another tool for detecting outliers and noticing unusual patterns. Suppose there is a fit to the data, say, a least squares line. The residuals are then the differences between the data and their corresponding fitted values. Consider the percentages in the table as depending on the year of the election; that is, consider the data as a time series (Figure 5).
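The residuals from a least squares line can be computed directly from their definition. The sketch below is illustrative only: the function names are hypothetical, and the series is invented, with one deliberately high value planted so that the residuals have something to reveal; it is not the Table 1 data.

```python
def least_squares_line(x, y):
    """Return (intercept, slope) of the ordinary least squares line."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

def residuals(x, y):
    """Data minus fitted values from the least squares line."""
    intercept, slope = least_squares_line(x, y)
    return [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]

# Hypothetical series, invented for illustration: a steady rise with one
# unusually high value planted at the fourth position.
years = list(range(1952, 1984, 4))
percents = [44.0, 45.0, 46.0, 58.0, 47.0, 48.0, 49.0, 50.0]

res = residuals(years, percents)
largest = max(range(len(res)), key=lambda i: res[i])
print(years[largest], round(res[largest], 2))
```

Plotting `res` against `years` with a horizontal line at zero gives a residual plot of the kind the entry describes; the planted value stands out as the largest positive residual.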
The time series of these three states track each other very well, and there is a suggestion of an outlier in each plot. Figure 6 shows the residuals for the three states. Each display in Figure 6 shows an outlier near the top. They all correspond to year 1964. This was the first year after John Kennedy was assassinated, and Lyndon Johnson received a substantial sympathy vote. There also is a suggestion of temporal dependence.

Figure 4 Parallel Box Plots for the Percentages, One for Each State

Figure 5 Graphs of the Individual State Democrat Percentages Versus the Election Years
Notes: A least squares line has been added as a reference. CA = California; OR = Oregon; WA = Washington.

Figure 6 Residuals From Least Squares Line Versus Year With 0-Line Added

With today's large data sets, one wishes for automatic ways to identify and handle outliers and other unusual values. One speaks of resistant/robust methods, resistant methods being those not overly sensitive to the presence of outliers and robust ones being those not affected strongly by long tails in the distribution. In the case of bivariate data, one can consider the bag plot.

Bag Plots

The bag plot is a generalization of the box plots of Figure 4. It is often a convenient way to study the scatter of bivariate data. In the construction of a bag plot, one needs a bivariate median, analogs of the quartiles, and whiskers. Tukey and his collaborators developed these. The center of the bag plot is the Tukey median. The bag surrounds the center and contains the 50% of the observations with the greatest depth. The fence separates inliers from outliers. Lines called whiskers mark observations between the bag and the fence. The fence is obtained by inflating the bag, from the center, by a factor of 3.

Figure 7 provides bag plots for each of the pairs (CA, OR), (CA, WA), and (OR, WA). One sees an apparent outlier in both the California versus Oregon and the California versus Washington cases. Interestingly, there is not one for the Oregon versus Washington case. On inspection, it is seen that the outliers correspond to the 1964 election. One also sees that the points vaguely surround a line. Because of the bag plot's resistance to outliers, the unusual point does not affect its location and shape.

Figure 7 Bag Plots for the State Pairs, Percentages Versus Percentages
Note: The bivariate median is the black spot within the bag (darker shading), and the fence is the outer boundary.

Smoother

Smoothers have as a goal the replacement of a scatter of points by a smooth curve. Sometimes the effect of smoothing is dramatic and a signal appears. The curve resulting from smoothing might be a straight line. More usefully, a local least squares fit might be employed with the local curves, y = f(x), quadratics. The local character is often introduced by employing a kernel. A second kernel might be introduced to make the operation robust/resistant. It will have the effect of reducing the impact of points with large residuals. Figure 8 shows the result of local smoothing of the Democrat percentages as a function of election year. The loess procedure was employed.

Figure 8 Percents Versus Year With a Loess Curve Superposed
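The idea of a local least squares fit with quadratic local curves and a kernel can be sketched as below. This is a simplified illustration and not the loess procedure used for Figure 8: the tricube kernel, the nearest-neighbor bandwidth, the default of 7 neighbors, and all function names are assumptions of the sketch, the second (robustness) kernel is omitted, and the data are invented.

```python
def tricube(u):
    """Tricube kernel weight for a scaled distance u (zero when |u| >= 1)."""
    u = abs(u)
    return (1 - u ** 3) ** 3 if u < 1 else 0.0

def solve_3x3(a, b):
    """Solve a 3x3 linear system by Gaussian elimination with pivoting."""
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, 3):
            factor = m[r][col] / m[col][col]
            for c in range(col, 4):
                m[r][c] -= factor * m[col][c]
    solution = [0.0] * 3
    for r in (2, 1, 0):
        tail = sum(m[r][c] * solution[c] for c in range(r + 1, 3))
        solution[r] = (m[r][3] - tail) / m[r][r]
    return solution

def local_quadratic_smooth(x, y, neighbors=7):
    """Smoothed value at each x from a kernel-weighted local quadratic fit.

    Assumes distinct x values and len(x) >= neighbors.
    """
    fitted = []
    for x0 in x:
        # Bandwidth: distance from x0 to its neighbors-th nearest point.
        h = sorted(abs(xi - x0) for xi in x)[neighbors - 1]
        u = [(xi - x0) / h for xi in x]      # scaled, centered regressor
        w = [tricube(ui) for ui in u]        # kernel weights
        # Weighted normal equations for y ~ b0 + b1*u + b2*u^2.
        s = [sum(wi * ui ** k for wi, ui in zip(w, u)) for k in range(5)]
        t = [sum(wi * yi * ui ** k for wi, yi, ui in zip(w, y, u))
             for k in range(3)]
        a = [[s[0], s[1], s[2]], [s[1], s[2], s[3]], [s[2], s[3], s[4]]]
        fitted.append(solve_3x3(a, t)[0])    # local curve evaluated at x0
    return fitted

# Hypothetical percentages, invented for illustration only.
years = list(range(1952, 2012, 4))
percents = [45, 44, 49, 59, 45, 42, 48, 36, 41, 48, 46, 51, 53, 54, 61]
smooth = local_quadratic_smooth(years, percents)
print([round(v, 1) for v in smooth])
```

Centering and scaling the regressor at each target point keeps the small linear system well conditioned, and the fitted value at the target is then simply the constant term of the local quadratic.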
The curves in Figure 8 have a similar general appearance. They are pulled up by the outlier at the 1964 point.

Robust Variant

The behavior in 1964 being understood to a degree, one would like an automatic way to obtain a curve not so strongly affected by this outlier. The loess procedure has a robust/resistant variant. The results follow in Figure 9. Having understood that 1964 was an unusual year, one can use a robust curve to understand the other values better. The plots have similar shapes. One sees a general growth in the Democrat percentages starting around. In this two-step procedure, it is important to study both the outlier and the robust/resistant curve.

Figure 9 The Curves Plotted Are Now Resistant to Outliers (panels: robust smooth for each state)

Reexpression

This term refers to expressing the same information by different numbers, for example, using the logit, log(p/(1 - p)), instead of the proportion p. The purpose may be additivity, obtaining straightness or symmetry, or making variability more nearly uniform.

The final method is a tool for working with two-way tables.

Median Polish

This is a process of alternately finding and subtracting medians from rows and then columns and perhaps continuing to do this until the results do not change much. One purpose is to seek an additive model for a two-way table, in the presence of outliers in the data. The state percentages in Table 1 form a 3 × 15 table and a candidate for median polish. The resulting row (year) effects are shown in Figure 10. These effects are not meant to be strongly affected by outliers. Figure 10 shows the same general curve as in Figure 5.

Figure 10 The Year Effect Obtained for the Election Data via Median Polishing (median polish results for CA, OR, and WA)
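The alternating row and column sweeps of median polish can be sketched as follows. This is a minimal illustration under stated assumptions: the function name `median_polish`, the cap on the number of sweeps, and the stopping tolerance are choices of the sketch, and the small table is invented (an exactly additive table with one planted outlier), not the Table 1 data.

```python
from statistics import median

def median_polish(table, max_sweeps=10, tol=1e-9):
    """Alternately sweep row and column medians out of a two-way table.

    Returns (overall, row_effects, col_effects, residuals) with
    table[i][j] == overall + row_effects[i] + col_effects[j] + residuals[i][j].
    """
    resid = [list(row) for row in table]
    n_rows, n_cols = len(resid), len(resid[0])
    overall = 0.0
    row_eff = [0.0] * n_rows
    col_eff = [0.0] * n_cols
    for _ in range(max_sweeps):
        # Row sweep: move each row's median into that row's effect.
        row_med = [median(row) for row in resid]
        for i in range(n_rows):
            row_eff[i] += row_med[i]
            resid[i] = [v - row_med[i] for v in resid[i]]
        shift = median(col_eff)              # recenter the column effects
        overall += shift
        col_eff = [c - shift for c in col_eff]
        # Column sweep: move each column's median into that column's effect.
        col_med = [median(resid[i][j] for i in range(n_rows))
                   for j in range(n_cols)]
        for j in range(n_cols):
            col_eff[j] += col_med[j]
            for i in range(n_rows):
                resid[i][j] -= col_med[j]
        shift = median(row_eff)              # recenter the row effects
        overall += shift
        row_eff = [r - shift for r in row_eff]
        if max(abs(v) for v in row_med + col_med) < tol:
            break                            # results no longer change
    return overall, row_eff, col_eff, resid

# Hypothetical table, invented for illustration: rows are years, columns are
# states, values are additive except for +10 planted in row 1, column 0.
table = [[49, 48, 47],
         [61, 50, 49],
         [52, 51, 50],
         [54, 53, 52]]
overall, row_eff, col_eff, resid = median_polish(table)
print(overall, row_eff, col_eff)
print(resid)
```

In this example the planted value ends up entirely in the residuals while the row and column effects recover the additive structure, which is the resistance to outliers that the entry attributes to the method.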
This entry ends with Tukey's 1973 rejoinder: Undoubtedly, the swing to exploratory data analysis will go somewhat too far (cited in Jones, 1986, Vol. III, p. lxii).

David R. Brillinger
University of California, Berkeley
Berkeley, California, United States

See also Cross-Tabular Analysis; Data Visualization; Graphics, Statistical; Statistics: Overview

Further Readings

Basford, K. E., & Tukey, J. W. (1999). Graphical analysis of multivariate data. London: Chapman & Hall.
Brillinger, D. R. (2002). John W. Tukey: The life and professional contributions. Annals of Statistics, 30.
Hoaglin, D. C., Mosteller, F., & Tukey, J. W. (1983). Understanding robust and exploratory data analysis. New York: Wiley.
Jones, L. V. (Ed.). (1986). The collected works of John W. Tukey: Philosophy and principles of data analysis (Vols. III & IV). London: Chapman & Hall.
McNeil, D. R. (1977). Interactive data analysis. New York: Wiley.
Tukey, J. W. (1962). The future of data analysis. Annals of Mathematical Statistics, 33.
Tukey, J. W. (1977). Exploratory data analysis. Reading, MA: Addison-Wesley.
Velleman, P. F., & Hoaglin, D. C. (1981). Applications, basics, and computing of exploratory data analysis. Boston: Duxbury.
Venables, W. N., & Ripley, B. D. (2002). Modern applied statistics with S (4th ed.). New York: Springer.

Data Visualization

The basic objective of data visualization is to provide an efficient graphical display for summarizing and reasoning about quantitative information. Data visualization should be distinguished from other types of visualization used in political science (more general information and knowledge visualization, concept visualization, strategy and work flow visualization, metaphor visualization, etc.) as it is more specific to the representation of quantitative data existing in the form of numerical tables.
In the following sections, the different types and methods of data visualization and their application in political science are presented. Chart Types and Methods During the past decades, political science has accumulated a large corpus of various kinds of data such as comprehensive fact books and atlases, characterizing all or most of existing states by multiple and objectively assessed numerical indicators within certain time periods (e.g., OECD Factbook and Political Atlas of the Modern World). As a consequence, there exists a tendency for political science to gradually become a more quantitative scientific field and to use quantitative information in analysis and reasoning. Any analysis in political science must be multidimensional and combine various sources of information; however, human capabilities for perception of large amounts of numerical information are limited. Hence, methods and approaches for the visualization of quantitative and qualitative data (especially multivariate data) are an extremely important topic in political science. Data visualization approaches can be classified into several groups, starting from creating informative charts and diagrams (statistical graphics and infographics) and ending with advanced statistical methods for visualizing multidimensional tables containing both quantitative and qualitative information. Data visualization in political science takes advantage of recent developments in computer science and computer graphics, statistical methods, methods of information visualization, visual design, and psychology. Data visualization in political science has certain special features such as the frequent use of geographical maps, which creates a link with the well-developed field of geographic information systems (GIS). Furthermore, numerical tables in political science are often incomplete, which makes important the use of methods dealing with missing or uncertainly measured data entries. 
There are two main types of numerical tables that can be the subject of data visualization. The first one is called an object feature table, where each row represents an observation or an object and each column corresponds to a numerical feature or indicator commonly measured for the whole set of objects. An example of such an object feature table
Diagrams and Graphs of Statistical Data
Diagrams and Graphs of Statistical Data One of the most effective and interesting alternative way in which a statistical data may be presented is through diagrams and graphs. There are several ways in
Exploratory data analysis (Chapter 2) Fall 2011
Exploratory data analysis (Chapter 2) Fall 2011 Data Examples Example 1: Survey Data 1 Data collected from a Stat 371 class in Fall 2005 2 They answered questions about their: gender, major, year in school,
Summarizing and Displaying Categorical Data
Summarizing and Displaying Categorical Data Categorical data can be summarized in a frequency distribution which counts the number of cases, or frequency, that fall into each category, or a relative frequency
COMMON CORE STATE STANDARDS FOR
COMMON CORE STATE STANDARDS FOR Mathematics (CCSSM) High School Statistics and Probability Mathematics High School Statistics and Probability Decisions or predictions are often based on data numbers in
Variables. Exploratory Data Analysis
Exploratory Data Analysis Exploratory Data Analysis involves both graphical displays of data and numerical summaries of data. A common situation is for a data set to be represented as a matrix. There is
How To Write A Data Analysis
Mathematics Probability and Statistics Curriculum Guide Revised 2010 This page is intentionally left blank. Introduction The Mathematics Curriculum Guide serves as a guide for teachers when planning instruction
The Comparisons. Grade Levels Comparisons. Focal PSSM K-8. Points PSSM CCSS 9-12 PSSM CCSS. Color Coding Legend. Not Identified in the Grade Band
Comparison of NCTM to Dr. Jim Bohan, Ed.D Intelligent Education, LLC [email protected] The Comparisons Grade Levels Comparisons Focal K-8 Points 9-12 pre-k through 12 Instructional programs from prekindergarten
Week 1. Exploratory Data Analysis
Week 1 Exploratory Data Analysis Practicalities This course ST903 has students from both the MSc in Financial Mathematics and the MSc in Statistics. Two lectures and one seminar/tutorial per week. Exam
3. Data Analysis, Statistics, and Probability
3. Data Analysis, Statistics, and Probability Data and probability sense provides students with tools to understand information and uncertainty. Students ask questions and gather and use data to answer
MATH 4470/5470 EXPLORATORY DATA ANALYSIS ONLINE COURSE SYLLABUS
MATH 4470/5470 EXPLORATORY DATA ANALYSIS ONLINE COURSE SYLLABUS COURSE DESCRIPTION Introduction to modern techniques in data analysis, including stem-and-leafs, box plots, resistant lines, smoothing and
Teaching Multivariate Analysis to Business-Major Students
Teaching Multivariate Analysis to Business-Major Students Wing-Keung Wong and Teck-Wong Soon - Kent Ridge, Singapore 1. Introduction During the last two or three decades, multivariate statistical analysis
Lecture 1: Review and Exploratory Data Analysis (EDA)
Lecture 1: Review and Exploratory Data Analysis (EDA) Sandy Eckel [email protected] Department of Biostatistics, The Johns Hopkins University, Baltimore USA 21 April 2008 1 / 40 Course Information I Course
STAT355 - Probability & Statistics
STAT355 - Probability & Statistics Instructor: Kofi Placid Adragni Fall 2011 Chap 1 - Overview and Descriptive Statistics 1.1 Populations, Samples, and Processes 1.2 Pictorial and Tabular Methods in Descriptive
BNG 202 Biomechanics Lab. Descriptive statistics and probability distributions I
BNG 202 Biomechanics Lab Descriptive statistics and probability distributions I Overview The overall goal of this short course in statistics is to provide an introduction to descriptive and inferential
STATS8: Introduction to Biostatistics. Data Exploration. Babak Shahbaba Department of Statistics, UCI
STATS8: Introduction to Biostatistics Data Exploration Babak Shahbaba Department of Statistics, UCI Introduction After clearly defining the scientific problem, selecting a set of representative members
Demographics of Atlanta, Georgia:
Demographics of Atlanta, Georgia: A Visual Analysis of the 2000 and 2010 Census Data 36-315 Final Project Rachel Cohen, Kathryn McKeough, Minnar Xie & David Zimmerman Ethnicities of Atlanta Figure 1: From
Module 2: Introduction to Quantitative Data Analysis
Module 2: Introduction to Quantitative Data Analysis Contents Antony Fielding 1 University of Birmingham & Centre for Multilevel Modelling Rebecca Pillinger Centre for Multilevel Modelling Introduction...
Data Exploration Data Visualization
Data Exploration Data Visualization What is data exploration? A preliminary exploration of the data to better understand its characteristics. Key motivations of data exploration include Helping to select
A Correlation of. to the. South Carolina Data Analysis and Probability Standards
A Correlation of to the South Carolina Data Analysis and Probability Standards INTRODUCTION This document demonstrates how Stats in Your World 2012 meets the indicators of the South Carolina Academic Standards
What is the purpose of this document? What is in the document? How do I send Feedback?
This document is designed to help North Carolina educators teach the Common Core (Standard Course of Study). NCDPI staff are continually updating and improving these tools to better serve teachers. Statistics
Statistics Chapter 2
Statistics Chapter 2 Frequency Tables A frequency table organizes quantitative data. partitions data into classes (intervals). shows how many data values are in each class. Test Score Number of Students
Exploratory Data Analysis
Exploratory Data Analysis Johannes Schauer [email protected] Institute of Statistics Graz University of Technology Steyrergasse 17/IV, 8010 Graz www.statistics.tugraz.at February 12, 2008 Introduction
Vertical Alignment Colorado Academic Standards 6 th - 7 th - 8 th
Vertical Alignment Colorado Academic Standards 6 th - 7 th - 8 th Standard 3: Data Analysis, Statistics, and Probability 6 th Prepared Graduates: 1. Solve problems and make decisions that depend on un
South Carolina College- and Career-Ready (SCCCR) Probability and Statistics
South Carolina College- and Career-Ready (SCCCR) Probability and Statistics South Carolina College- and Career-Ready Mathematical Process Standards The South Carolina College- and Career-Ready (SCCCR)
Center: Finding the Median. Median. Spread: Home on the Range. Center: Finding the Median (cont.)
Center: Finding the Median When we think of a typical value, we usually look for the center of the distribution. For a unimodal, symmetric distribution, it s easy to find the center it s just the center
Fairfield Public Schools
Mathematics Fairfield Public Schools AP Statistics AP Statistics BOE Approved 04/08/2014 1 AP STATISTICS Critical Areas of Focus AP Statistics is a rigorous course that offers advanced students an opportunity
Intro to Statistics 8 Curriculum
Intro to Statistics 8 Curriculum Unit 1 Bar, Line and Circle Graphs Estimated time frame for unit Big Ideas 8 Days... Essential Question Concepts Competencies Lesson Plans and Suggested Resources Bar graphs
Descriptive Statistics
Y520 Robert S Michael Goal: Learn to calculate indicators and construct graphs that summarize and describe a large quantity of values. Using the textbook readings and other resources listed on the web
Visualizing Data. Contents. 1 Visualizing Data. Anthony Tanbakuchi Department of Mathematics Pima Community College. Introductory Statistics Lectures
Introductory Statistics Lectures Visualizing Data Descriptive Statistics I Department of Mathematics Pima Community College Redistribution of this material is prohibited without written permission of the
Common Tools for Displaying and Communicating Data for Process Improvement
Common Tools for Displaying and Communicating Data for Process Improvement Packet includes: Tool Use Page # Box and Whisker Plot Check Sheet Control Chart Histogram Pareto Diagram Run Chart Scatter Plot
Algebra 1 Course Information
Course Information Course Description: Students will study patterns, relations, and functions, and focus on the use of mathematical models to understand and analyze quantitative relationships. Through
business statistics using Excel OXFORD UNIVERSITY PRESS Glyn Davis & Branko Pecar
business statistics using Excel Glyn Davis & Branko Pecar OXFORD UNIVERSITY PRESS Detailed contents Introduction to Microsoft Excel 2003 Overview Learning Objectives 1.1 Introduction to Microsoft Excel
DESCRIPTIVE STATISTICS AND EXPLORATORY DATA ANALYSIS
DESCRIPTIVE STATISTICS AND EXPLORATORY DATA ANALYSIS SEEMA JAGGI Indian Agricultural Statistics Research Institute Library Avenue, New Delhi - 110 012 [email protected] 1. Descriptive Statistics Statistics
Lecture 2. Summarizing the Sample
Lecture 2 Summarizing the Sample WARNING: Today s lecture may bore some of you It s (sort of) not my fault I m required to teach you about what we re going to cover today. I ll try to make it as exciting
EXPLORING SPATIAL PATTERNS IN YOUR DATA
EXPLORING SPATIAL PATTERNS IN YOUR DATA OBJECTIVES Learn how to examine your data using the Geostatistical Analysis tools in ArcMap. Learn how to use descriptive statistics in ArcMap and Geoda to analyze
Foundation of Quantitative Data Analysis
Foundation of Quantitative Data Analysis Part 1: Data manipulation and descriptive statistics with SPSS/Excel HSRS #10 - October 17, 2013 Reference : A. Aczel, Complete Business Statistics. Chapters 1
MTH 140 Statistics Videos
MTH 140 Statistics Videos Chapter 1 Picturing Distributions with Graphs Individuals and Variables Categorical Variables: Pie Charts and Bar Graphs Categorical Variables: Pie Charts and Bar Graphs Quantitative
Exercise 1.12 (Pg. 22-23)
Individuals: The objects that are described by a set of data. They may be people, animals, things, etc. (Also referred to as Cases or Records) Variables: The characteristics recorded about each individual.
RUTHERFORD HIGH SCHOOL Rutherford, New Jersey COURSE OUTLINE STATISTICS AND PROBABILITY
RUTHERFORD HIGH SCHOOL Rutherford, New Jersey COURSE OUTLINE STATISTICS AND PROBABILITY I. INTRODUCTION According to the Common Core Standards (2010), Decisions or predictions are often based on data numbers
Manhattan Center for Science and Math High School Mathematics Department Curriculum
Content/Discipline Algebra 1 Semester 2: Marking Period 1 - Unit 8 Polynomials and Factoring Topic and Essential Question How do perform operations on polynomial functions How to factor different types
Lecture 2: Descriptive Statistics and Exploratory Data Analysis
Lecture 2: Descriptive Statistics and Exploratory Data Analysis Further Thoughts on Experimental Design 16 Individuals (8 each from two populations) with replicates Pop 1 Pop 2 Randomly sample 4 individuals
Tutorial 3: Graphics and Exploratory Data Analysis in R Jason Pienaar and Tom Miller
Tutorial 3: Graphics and Exploratory Data Analysis in R Jason Pienaar and Tom Miller Getting to know the data An important first step before performing any kind of statistical analysis is to familiarize
Glencoe. correlated to SOUTH CAROLINA MATH CURRICULUM STANDARDS GRADE 6 3-3, 5-8 8-4, 8-7 1-6, 4-9
Glencoe correlated to SOUTH CAROLINA MATH CURRICULUM STANDARDS GRADE 6 STANDARDS 6-8 Number and Operations (NO) Standard I. Understand numbers, ways of representing numbers, relationships among numbers,
Problem Solving and Data Analysis
Chapter 20 Problem Solving and Data Analysis The Problem Solving and Data Analysis section of the SAT Math Test assesses your ability to use your math understanding and skills to solve problems set in
Exploratory Data Analysis with MATLAB
Computer Science and Data Analysis Series Exploratory Data Analysis with MATLAB Second Edition Wendy L Martinez Angel R. Martinez Jeffrey L. Solka ( r ec) CRC Press VV J Taylor & Francis Group Boca Raton
Institute of Actuaries of India Subject CT3 Probability and Mathematical Statistics
Institute of Actuaries of India Subject CT3 Probability and Mathematical Statistics For 2015 Examinations Aim The aim of the Probability and Mathematical Statistics subject is to provide a grounding in
Grade 6 Mathematics Assessment. Eligible Texas Essential Knowledge and Skills
Grade 6 Mathematics Assessment Eligible Texas Essential Knowledge and Skills STAAR Grade 6 Mathematics Assessment Mathematical Process Standards These student expectations will not be listed under a separate
MBA 611 STATISTICS AND QUANTITATIVE METHODS
MBA 611 STATISTICS AND QUANTITATIVE METHODS Part I. Review of Basic Statistics (Chapters 1-11) A. Introduction (Chapter 1) Uncertainty: Decisions are often based on incomplete information from uncertain
Exploratory Spatial Data Analysis
Exploratory Spatial Data Analysis Part II Dynamically Linked Views 1 Contents Introduction: why to use non-cartographic data displays Display linking by object highlighting Dynamic Query Object classification
CRLS Mathematics Department Algebra I Curriculum Map/Pacing Guide
Curriculum Map/Pacing Guide page 1 of 14 Quarter I start (CP & HN) 170 96 Unit 1: Number Sense and Operations 24 11 Totals Always Include 2 blocks for Review & Test Operating with Real Numbers: How are
Exploratory Data Analysis. Psychology 3256
Exploratory Data Analysis Psychology 3256 1 Introduction If you are going to find out anything about a data set, you must first understand the data. Basically, getting a feel for your numbers. Easier to find
Visualization Quick Guide
Visualization Quick Guide A best practice guide to help you find the right visualization for your data WHAT IS DOMO? Domo is a new form of business intelligence (BI) unlike anything before an executive
How To Check For Differences In The One Way Anova
MINITAB ASSISTANT WHITE PAPER This paper explains the research conducted by Minitab statisticians to develop the methods and data checks used in the Assistant in Minitab 17 Statistical Software. One-Way
What Does the Normal Distribution Sound Like?
What Does the Normal Distribution Sound Like? Ananda Jayawardhana Pittsburg State University Published: June 2013 Overview of Lesson In this activity, students conduct an investigation
BASIC STATISTICAL METHODS FOR GENOMIC DATA ANALYSIS
BASIC STATISTICAL METHODS FOR GENOMIC DATA ANALYSIS SEEMA JAGGI Indian Agricultural Statistics Research Institute Library Avenue, New Delhi-110 012 Genomics A genome is an organism's
Algebra 1 2008. Academic Content Standards Grade Eight and Grade Nine Ohio. Grade Eight. Number, Number Sense and Operations Standard
Academic Content Standards Grade Eight and Grade Nine Ohio Algebra 1 2008 Grade Eight STANDARDS Number, Number Sense and Operations Standard Number and Number Systems 1. Use scientific notation to express
Regression III: Advanced Methods
Lecture 4: Transformations Regression III: Advanced Methods William G. Jacoby Michigan State University Goals of the lecture The Ladder of Roots and Powers Changing the shape of distributions Transforming
Why Taking This Course? Course Introduction, Descriptive Statistics and Data Visualization. Learning Goals. GENOME 560, Spring 2012
Why Taking This Course? Course Introduction, Descriptive Statistics and Data Visualization GENOME 560, Spring 2012 Data are interesting because they help us understand the world Genomics: Massive Amounts
Big Ideas in Mathematics
Big Ideas in Mathematics which are important to all mathematics learning. (Adapted from the NCTM Curriculum Focal Points, 2006) The Mathematics Big Ideas are organized using the PA Mathematics Standards
Curriculum Map Statistics and Probability Honors (348) Saugus High School Saugus Public Schools 2009-2010
Curriculum Map Statistics and Probability Honors (348) Saugus High School Saugus Public Schools 2009-2010 Week 1 Week 2 14.0 Students organize and describe distributions of data by using a number of different
Northumberland Knowledge
Northumberland Knowledge Know Guide How to Analyse Data - November 2012 - This page has been left blank 2 About this guide The Know Guides are a suite of documents that provide useful information about
Common Core Unit Summary Grades 6 to 8
Common Core Unit Summary Grades 6 to 8 Grade 8: Unit 1: Congruence and Similarity- 8G1-8G5 rotations reflections and translations,( RRT=congruence) understand congruence of 2 d figures after RRT Dilations
Geostatistics Exploratory Analysis
Instituto Superior de Estatística e Gestão de Informação Universidade Nova de Lisboa Master of Science in Geospatial Technologies Geostatistics Exploratory Analysis Carlos Alberto Felgueiras
Data Analysis, Statistics, and Probability
Chapter 6 Data Analysis, Statistics, and Probability Content Strand Description Questions in this content strand assessed students' skills in collecting, organizing, reading, representing, and interpreting
Descriptive statistics Statistical inference statistical inference, statistical induction and inferential statistics
Descriptive statistics is the discipline of quantitatively describing the main features of a collection of data. Descriptive statistics are distinguished from inferential statistics (or inductive statistics),
What is Data Analysis. Kerala School of Mathematics Course in Statistics for Scientists. Introduction to Data Analysis. Steps in a Statistical Study
Kerala School of Mathematics Course in Statistics for Scientists Introduction to Data Analysis T.Krishnan Strand Life Sciences, Bangalore What is Data Analysis Statistics is a body of methods how to use
Business Statistics. Successful completion of Introductory and/or Intermediate Algebra courses is recommended before taking Business Statistics.
Business Statistics Course Text Bowerman, Bruce L., Richard T. O'Connell, J. B. Orris, and Dawn C. Porter. Essentials of Business Statistics, 2nd edition, McGraw-Hill/Irwin, 2008, ISBN: 978-0-07-331988-9. Required Computing
Data Visualization Techniques
Data Visualization Techniques From Basics to Big Data with SAS Visual Analytics WHITE PAPER SAS White Paper Table of Contents Introduction.... 1 Generating the Best Visualizations for Your Data... 2 The
Probability and Statistics Vocabulary List (Definitions for Middle School Teachers)
Probability and Statistics Vocabulary List (Definitions for Middle School Teachers) B Bar graph a diagram representing the frequency distribution for nominal or discrete data. It consists of a sequence
In mathematics, there are four attainment targets: using and applying mathematics; number and algebra; shape, space and measures, and handling data.
MATHEMATICS: THE LEVEL DESCRIPTIONS In mathematics, there are four attainment targets: using and applying mathematics; number and algebra; shape, space and measures, and handling data. Attainment target
NEW YORK STATE TEACHER CERTIFICATION EXAMINATIONS
NEW YORK STATE TEACHER CERTIFICATION EXAMINATIONS TEST DESIGN AND FRAMEWORK September 2014 Authorized for Distribution by the New York State Education Department This test design and framework document
Descriptive Statistics and Exploratory Data Analysis
Descriptive Statistics and Exploratory Data Analysis Dean's Faculty and Resident Development Series UT College of Medicine Chattanooga Probasco Auditorium at Erlanger January 14, 2008 Marc Loizeaux,
Introduction to Exploratory Data Analysis
Introduction to Exploratory Data Analysis A SpaceStat Software Tutorial Copyright 2013, BioMedware, Inc. (www.biomedware.com). All rights reserved. SpaceStat and BioMedware are trademarks of BioMedware,
Course Text. Required Computing Software. Course Description. Course Objectives. StraighterLine. Business Statistics
Course Text Business Statistics Lind, Douglas A., Marchal, William A. and Samuel A. Wathen. Basic Statistics for Business and Economics, 7th edition, McGraw-Hill/Irwin, 2010, ISBN: 9780077384470 [This
List of Examples. Examples 319
Examples 319 List of Examples DiMaggio and Mantle. 6 Weed seeds. 6, 23, 37, 38 Vole reproduction. 7, 24, 37 Wooly bear caterpillar cocoons. 7 Homophone confusion and Alzheimer's disease. 8 Gear tooth strength.
Data Visualisation and Its Application in Official Statistics. Olivia Or Census and Statistics Department, Hong Kong, China.
Data Visualisation and Its Application in Official Statistics Olivia Or Census and Statistics Department, Hong Kong, China Abstract Data visualisation has been a growing topic of
TOPIC 12 Techniques and tools for data analysis Preview Introduction In chapter 3 of Statistics In A Day different combinations of numbers and types of variables are presented. We go through these
This unit will lay the groundwork for later units where the students will extend this knowledge to quadratic and exponential functions.
Algebra I Overview View unit yearlong overview here Many of the concepts presented in Algebra I are progressions of concepts that were introduced in grades 6 through 8. The content presented in this course
Interactive Data Mining and Visualization
Interactive Data Mining and Visualization Zhitao Qiu Abstract: Interactive analysis introduces dynamic changes in Visualization. On another hand, advanced visualization can provide different perspectives
Street Address: 1111 Franklin Street Oakland, CA 94607. Mailing Address: 1111 Franklin Street Oakland, CA 94607
Contacts University of California Curriculum Integration (UCCI) Institute Sarah Fidelibus, UCCI Program Manager Street Address: 1111 Franklin Street Oakland, CA 94607 1. Program Information Mailing Address:
Problem of the Month Through the Grapevine
The Problems of the Month (POM) are used in a variety of ways to promote problem solving and to foster the first standard of mathematical practice from the Common Core State Standards: Make sense of problems
MULTIPLE CHOICE. Choose the one alternative that best completes the statement or answers the question.
Exam Name MULTIPLE CHOICE. Choose the one alternative that best completes the statement or answers the question. 1) The government of a town needs to determine if the city's residents will support the
3: Summary Statistics
3: Summary Statistics Notation Let's start by introducing some notation. Consider the following small data set: 4 5 30 50 8 7 4 5 The symbol n represents the sample size (n = 0). The capital letter X denotes
Iris Sample Data Set. Basic Visualization Techniques: Charts, Graphs and Maps. Summary Statistics. Frequency and Mode
Iris Sample Data Set Basic Visualization Techniques: Charts, Graphs and Maps CS598 Information Visualization Spring 2010 Many of the exploratory data techniques are illustrated with the Iris Plant data
Statistics I for QBIC. Contents and Objectives. Chapters 1 7. Revised: August 2013
Statistics I for QBIC Text Book: Biostatistics, 10 th edition, by Daniel & Cross Contents and Objectives Chapters 1 7 Revised: August 2013 Chapter 1: Nature of Statistics (sections 1.1-1.6) Objectives
Bar Graphs and Dot Plots
CONDENSED LESSON 1.1 Bar Graphs and Dot Plots In this lesson you will interpret and create a variety of graphs; find some summary values for a data set; draw conclusions about a data set based on graphs
1.3 Measuring Center & Spread, The Five Number Summary & Boxplots. Describing Quantitative Data with Numbers
1.3 Measuring Center & Spread, The Five Number Summary & Boxplots Describing Quantitative Data with Numbers 1.3 I can: Calculate and interpret measures of center (mean, median) in context. Calculate
Introduction to Environmental Statistics. The Big Picture. Populations and Samples. Sample Data. Examples of sample data
A Few Sources for Data Examples Used Introduction to Environmental Statistics Professor Jessica Utts University of California, Irvine 1. Statistical Methods in Water Resources by D.R. Helsel
Chapter 111. Texas Essential Knowledge and Skills for Mathematics. Subchapter B. Middle School
Middle School 111.B. Chapter 111. Texas Essential Knowledge and Skills for Mathematics Subchapter B. Middle School Statutory Authority: The provisions of this Subchapter B issued under the Texas Education
The Big Picture. Describing Data: Categorical and Quantitative Variables Population. Descriptive Statistics. Community Coalitions (n = 175)
Describing Data: Categorical and Quantitative Variables Population The Big Picture Sampling Statistical Inference Sample Exploratory Data Analysis Descriptive Statistics In order to make sense of data,
Visualization methods for patent data
Visualization methods for patent data Treparel 2013 Dr. Anton Heijs (CTO & Founder) Delft, The Netherlands Introduction Treparel can provide advanced visualizations for patent data. This document describes
Box-and-Whisker Plots
Mathematics Box-and-Whisker Plots About this Lesson This is a foundational lesson for box-and-whisker plots (boxplots), a graphical tool used throughout statistics for displaying data. During the lesson,
Describing, Exploring, and Comparing Data
Chapter 2. Describing, Exploring, and Comparing Data There are many tools used in Statistics to visualize, summarize, and describe data. This chapter
Correlation and Regression
Correlation and Regression Scatterplots Correlation Explanatory and response variables Simple linear regression General Principles of Data Analysis First plot the data, then add numerical summaries Look
2: Frequency Distributions
2: Frequency Distributions Stem-and-Leaf Plots (Stemplots) The stem-and-leaf plot (stemplot) is an excellent way to begin an analysis. Consider this small data set: 218 426 53 116 309 504 281 270 246 523
For example, estimate the population of the United States as 3 times 10⁸ and the
CCSS: Mathematics The Number System CCSS: Grade 8 8.NS.A. Know that there are numbers that are not rational, and approximate them by rational numbers. 8.NS.A.1. Understand informally that every number
EXPLORATORY DATA ANALYSIS: GETTING TO KNOW YOUR DATA
EXPLORATORY DATA ANALYSIS: GETTING TO KNOW YOUR DATA Michael A. Walega Covance, Inc. INTRODUCTION In broad terms, Exploratory Data Analysis (EDA) can be defined as the numerical and graphical examination
A DATA ANALYSIS TOOL THAT ORGANIZES ANALYSIS BY VARIABLE TYPES. Rodney Carr Deakin University Australia
A DATA ANALYSIS TOOL THAT ORGANIZES ANALYSIS BY VARIABLE TYPES Rodney Carr Deakin University Australia XLStatistics is a set of Excel workbooks for analysis of data that has the various analysis tools
