USING THE R STATISTICAL SOFTWARE IN INITIAL TERMS OF CONTROL ENGINEERING COURSE



Similar documents
International College of Economics and Finance Syllabus Probability Theory and Introductory Statistics

Business Statistics. Successful completion of Introductory and/or Intermediate Algebra courses is recommended before taking Business Statistics.

Overview Accounting for Business (MCD1010) Introductory Mathematics for Business (MCD1550) Introductory Economics (MCD1690)...

Course Text. Required Computing Software. Course Description. Course Objectives. StraighterLine. Business Statistics

Multiple Linear Regression

Institute of Actuaries of India Subject CT3 Probability and Mathematical Statistics

LAGUARDIA COMMUNITY COLLEGE CITY UNIVERSITY OF NEW YORK DEPARTMENT OF MATHEMATICS, ENGINEERING, AND COMPUTER SCIENCE

ROCHESTER INSTITUTE OF TECHNOLOGY COURSE OUTLINE FORM COLLEGE OF SCIENCE. School of Mathematical Sciences

MASTER COURSE SYLLABUS-PROTOTYPE PSYCHOLOGY 2317 STATISTICAL METHODS FOR THE BEHAVIORAL SCIENCES

Economic Statistics (ECON2006), Statistics and Research Design in Psychology (PSYC2010), Survey Design and Analysis (SOCI2007)

Curriculum Map Statistics and Probability Honors (348) Saugus High School Saugus Public Schools

business statistics using Excel OXFORD UNIVERSITY PRESS Glyn Davis & Branko Pecar

UNIT 1: COLLECTING DATA

Description. Textbook. Grading. Objective

I ~ 14J... <r ku...6l &J&!J--=-O--

430 Statistics and Financial Mathematics for Business

training programme in pharmaceutical medicine Clinical Data Management and Analysis

How To Write A Data Analysis

CHAPTER 13 SIMPLE LINEAR REGRESSION. Opening Example. Simple Regression. Linear Regression

Diploma of Business Unit Guide 2015

PROBABILITY AND STATISTICS. Ma To teach a knowledge of combinatorial reasoning.

Section Format Day Begin End Building Rm# Instructor. 001 Lecture Tue 6:45 PM 8:40 PM Silver 401 Ballerini

MEU. INSTITUTE OF HEALTH SCIENCES COURSE SYLLABUS. Biostatistics

Factors affecting online sales

A Correlation of. to the. South Carolina Data Analysis and Probability Standards

Chapter 13 Introduction to Linear Regression and Correlation Analysis

DAIC - Advanced Experimental Design in Clinical Research

Name of the module: Multivariate biostatistics and SPSS Number of module:

User Behaviour on Google Search Engine

Street Address: 1111 Franklin Street Oakland, CA Mailing Address: 1111 Franklin Street Oakland, CA 94607

Generalized Linear Models

UNIVERSITY of MASSACHUSETTS DARTMOUTH Charlton College of Business Decision and Information Sciences Fall 2010

Teaching Multivariate Analysis to Business-Major Students

Service courses for graduate students in degree programs other than the MS or PhD programs in Biostatistics.

Title: Lending Club Interest Rates are closely linked with FICO scores and Loan Length

The importance of graphing the data: Anscombe s regression examples

Univariate Regression

SAS R IML (Introduction at the Master s Level)

MULTIPLE REGRESSION EXAMPLE

Categorical Data Analysis

COURSE PLAN BDA: Biomedical Data Analysis Master in Bioinformatics for Health Sciences Academic Year Qualification.

Using the Delta Method to Construct Confidence Intervals for Predicted Probabilities, Rates, and Discrete Changes

South Carolina College- and Career-Ready (SCCCR) Probability and Statistics

MTH 140 Statistics Videos

240ST014 - Data Analysis of Transport and Logistics

Introduction to Regression and Data Analysis

Handling attrition and non-response in longitudinal data

Business Statistics MATH 222. Eagle Vision Home. Course Syllabus

A Brief Introduction to Factor Analysis

CLARKSON SECONDARY SCHOOL. Course Name: Mathematics of Data Management Grade 12 University

2015 TUHH Online Summer School: Overview of Statistical and Path Modeling Analyses

School of Mathematics and Science MATH 153 Introduction to Statistical Methods Section: WE1 & WE2

Mathematics within the Psychology Curriculum

Categorical Data Analysis

RUTHERFORD HIGH SCHOOL Rutherford, New Jersey COURSE OUTLINE STATISTICS AND PROBABILITY

MASTER COURSE SYLLABUS

CORRELATED TO THE SOUTH CAROLINA COLLEGE AND CAREER-READY FOUNDATIONS IN ALGEBRA

Data Mining Part 5. Prediction

NEW YORK CITY COLLEGE OF TECHNOLOGY The City University of New York

Bowerman, O'Connell, Aitken Schermer, & Adcock, Business Statistics in Practice, Canadian edition

Scatter Plot, Correlation, and Regression on the TI-83/84

Correlation. What Is Correlation? Perfect Correlation. Perfect Correlation. Greg C Elvers

Diablo Valley College Catalog

Simple Predictive Analytics Curtis Seare

STATISTICA Formula Guide: Logistic Regression. Table of Contents

table to see that the probability is (b) What is the probability that x is between 16 and 60? The z-scores for 16 and 60 are: = 1.

MAT 120 Probability & Statistics Mathematics

Teaching Biostatistics to Postgraduate Students in Public Health

South Carolina College- and Career-Ready (SCCCR) Algebra 1

PCHS ALGEBRA PLACEMENT TEST

Predictor Coef StDev T P Constant X S = R-Sq = 0.0% R-Sq(adj) = 0.

Correlation and Simple Linear Regression

Diagrams and Graphs of Statistical Data

SAS Software to Fit the Generalized Linear Model

Extent of work (hours)

Simple Linear Regression Inference

A Prototype System for Educational Data Warehousing and Mining 1

ANNUITY LAPSE RATE MODELING: TOBIT OR NOT TOBIT? 1. INTRODUCTION

Bachelor's Degree in Business Administration and Master's Degree course description

QUANTITATIVE METHODS. for Decision Makers. Mik Wisniewski. Fifth Edition. FT Prentice Hall

Electronic Thesis and Dissertations UCLA

Truman College-Mathematics Department Math 125-CD: Introductory Statistics Course Syllabus Fall 2012

Course Descriptions: Undergraduate/Graduate Certificate Program in Data Visualization and Analysis

Market Size Forecasting Using the Monte Carlo Method. Multivariate Solutions

TIME SERIES ANALYSIS

Cleveland State University NAL/PAD/PDD/UST 504 Section 51 Levin College of Urban Affairs Fall, 2009 W 6 to 9:50 pm UR 108

ATV - Lifetime Data Analysis

Implications of Big Data for Statistics Instruction 17 Nov 2013

MBA Legal Management. A bespoke MBA for lawyers.

AP Statistics: Syllabus 1

DEPARTMENT OF PSYCHOLOGY UNIVERSITY OF LANCASTER MSC IN PSYCHOLOGICAL RESEARCH METHODS ANALYSING AND INTERPRETING DATA 2 PART 1 WEEK 9

Chapter 7: Simple linear regression Learning Objectives

Data exploration with Microsoft Excel: analysing more than one variable

AC - Clinical Trials

Finite Mathematics Using Microsoft Excel

Using R for Linear Regression

THE ROLE OF THE COMPUTER IN THE TEACHING OF STATISTICS. Tarmo Pukkila and Simo Puntanen University of Tampere, Finland

MATHEMATICAL METHODS OF STATISTICS

MATH 140 HYBRID INTRODUCTORY STATISTICS COURSE SYLLABUS

Transcription:

USING THE R STATISTICAL SOFTWARE IN INITIAL TERMS OF CONTROL ENGINEERING COURSE Marcos Antonio Cruz Moreira macruz@iff.edu.br IF Fluminense Campus Macaé Curso de Engenharia de Automação e Controle Rodovia Amaral Peixoto km 164 27.973-030 Macaé - RJ Natália Souto nati.souto@yahoo.com.br Abstract: This paper describes the experience of using the R package as a tool of Statistics learning in the initial terms of an engineering graduation course. The proceedings adopted during a one semester course corresponding to the third term are presented. Some excerpts from the assignments are commented. The work aimed the acquaintance of the students with an environment for statistical computing and graphics, as well as to facilitate the learning of that syllabus subject through graphical facilities for data analysis. The students opinions about the proceeding, taken one year after the closing of the term are also presented. Palavras-chave: Statistics, R 1. INTRODUCTION The option of R environment use in engineering education in IF Fluminense (RJ) have aroused from coexistence on campus Macaé of the Control Engineering graduation course and a MSc course in Environmental Engineering. Due to the widespread use of R in environmental studies (REIMANN, C. et al., 2008) it seems straightforward to extend its application to the basics matters of graduation course. The syllabus of the Control Engineering course in IFF / Campus Macaé comprises two occasions of Probability and Statistics studies. The first one at the second term of the course, includes sampling, randomization and basic concepts of probability. The second group of classes, at the third term, comprises Discrete and Continuous Probability Distributions, Confidence Intervals and Linear Regression. In this second group of subjects the use of R as a learning tool was tested and this experience is related here. Next section shows the methodology applied during the course development, as well as examples of assignments proposed to be done by the students out of the classes. It follows some excerpts from the assignments, and the assessment of the experience performed by the students. These are now attending the 5th term of their Engineering course, half way to conclude it.

2. METODOLOGY At all the classes the theoretical concepts were presented and furthermore, the R commands available to perform the required statistical analysis. When similar functions were also available at usual commercial software, these were presented too, though students were strongly recommended to try the R package. The exploration of graphical resources and looping possibilities of the R environment were left as assignments to be solved along with statistical calculations and delivered in a one week period. Some examples are presented ahead. The survey of students' views made one year after the closing of the discipline was a research intentional decision to avoid bias in the response of those who were still studying the subject and concerned about possible implications of their responses. 2.1. Binomial Distribution Concerning Binomial Distribution, the proposed assignment to be in R environment developed was to calculate and plot binomial distributions from 0 to 20 successful events on a set of 20 Bernoulli experiments. This should be done for given success probabilities associated with the experiment, in the case 5 %, 10 % and 25 %. 2.2. Normal Distribution The assignment asks the students to identify and plot using the R package the z values such that a gaussian distribution contains a given probability distribution between µ ± zσ. 2.3. Linear Regression Using data from Statistical Abstracts of the United States (AGRESTI & FINLAY, 2009) students were requested to obtain pair wise plots using pairs function as a useful high-level plotting function. It works as a tool for displaying and exploring multivariate data, producing a scatter plot between all possible pairs of variables in a dataset, and using the same scale. Once achieved these dispersion graphs, identify, among them, those that seemed to show closer linear association between two variables. Then, based on any reasonable concept obtained from a reference literature, choose among them one as explanatory and the other as response variable, and find out the linear regression equation that could be used to estimate the relationship between them. 3. RESULTS Among the responses to the assignments, the clearer ones were selected (from the second author) and are shown in this section. The results obtained using R environment code, show a straightforward association with theoretical concepts and also illustrate the usefulness of R graphical methods to introduce students in statistical literacy, reasoning and thinking.

3.1. Binomial Distribution R Script and Graph (Figure 1) shown as follows Script: y1 <- array(0:0, dim=c(1,21)) y2 <- array(0:0, dim=c(1,21)) y3 <- array(0:0, dim=c(1,21)) for (j in 1:21) {y1[j]<-(dbinom(j-1,20,0.05));y2[j] <-(dbinom(j-1,20,0.10)); y3[j]<-(dbinom(j-1,20,0.25))} i <- array(0:20, dim=c(1,21)) plot(i,y1,pch=16,col="red",type="l",lwd=4,xlab="number of events",ylab="probability") points(i,y2,pch=16,col="green",type="l",lwd=4) points(i,y3,pch=16,col="blue",type="l",lwd=4,lty=2) title(main="binomial Distribution") legend(locator(1),c("p=0.05","p=0.10","p=0.25"), col=c("red","green","blue"),pch=16,bty="o") Figure 1 Binomial Distribution Plot

3.2. Normal Distribution R Script and Graph (Figure 2) shown below Script: band2<-function(prob) {x=seq(-6,6,length=200); y=dnorm(x); plot(x,y,type="l", lwd=2, col="blue"); tails<-(1-prob)/2; lim1<-qnorm(tails,0,1) x=seq(lim1,-lim1,length=200) y=dnorm(x) polygon(c(lim1,x,-lim1),c(0,y,0),col="green")} band2(0.90); p<-0.90; z = qnorm(0.05,0,1); mtext(bquote(p ==.(p*100)), line= 3) mtext(bquote(z ==.(z)), line=.25) Figure 2 Normal Distribution Plot

3.3. Linear Regression Table 1 Statewide Data Used to Illustrate Regression Analyses R Script and Graphs (Figure 3) shown below Script: VC <- read.table("g:/rl.txt", header=true) # data acquisition from a txt file cor(vc) # correlation between variables Violent_Crime Murder_rate Poverty_rate Single_Parent Violent_Crime 1.0000000 0.7103507 0.3432887 0.5061586 Murder_rate 0.7103507 1.0000000 0.6979947 0.7574187 Poverty_rate 0.3432887 0.6979947 1.0000000 0.4930857 Single_Parent 0.5061586 0.7574187 0.4930857 1.0000000 pairs(vc) # dispersion plots of variables pairs lm(formula = Murder_rate ~ Single_Parent, data = VC) # linear regression coefficients taken between the variables that appeared to show more # association in dispersion plots Call: lm(formula = Murder_rate ~ Single_Parent, data = VC)

Coefficients: (Intercept) Single_Parent -15.058 2.044 # results Figure 3 Paired Observations 4. ASSESSMENT The evaluation questionnaires were presented to 10 students taking this course, one year after the completion of the period. We sought to identify through six questions using Likert scale, the perception of students in two respects: the R package's usefulness as a tool for statistical calculations and its usefulness as a tool to support the learning process. Counting the results, it was found predominantly positive feedback, which can be categorized as good or regular as regards the use of R as a tool for calculating and a predominantly negative evaluation categorized as fair or poor with regard to the use in the support learning process. Of course these results are subject to inaccuracies because the

sampling was limited to 1/4 of the original class and has yet a bias due to have been answered by those who made it voluntarily. 5. CONCLUSION The software has an interface heavily based on the command line and a poor graphical user interface (GUI). Although there is extensive documentation available online in various sites, could hardly be called user friendly. These features make the R environment more familiar and interesting to students who are biased to scholarly works than to students who cares pragmatically in joining the labor market soon. Nevertheless, it constitutes a powerful tool to be used in introductory courses in statistics in engineering as it easily associates graphical representation to basic concepts of probability and statistics. Of course this does not preclude the use and applicability of the software on more advanced topics of the engineering course, given its ability to treat advanced statistics. Thanks To students who have completed their tasks in the course and those who responded to the survey assessment. 6. REFERENCES AGRESTI, Alan; FINLAY, Barbara. Statistical Methods for the Social Sciences. 4. ed. New Jersey: Pearson Prentice Hall, 2009. 256 p. REIMANN, Clemens; FILZMOSER, Peter; GARRET, Robert G.; DUTTER, Rudolf. Statistical Data Analysis Explained: Applied Environmental Statistics with R. 1. ed. West Sussex: John Wiley & Sons, Ltd., 2008. KUHNERT, Petra; VENABLES, Bill. An Introduction to R: Software for Statistical Modelling & Computing. Available at <http://www.csiro.au/organisation- Structure/Divisions/Mathematics-Informatics-and-Statistics/Rcoursenotes.aspx> Access on 08 may 2011. R Documentation. Available at < http://www.r-project.org/> Access on 01 June 2012.