Systat: Statistical Visualization Software



Hilary R. Hafner, Jennifer L. DeWinter, Steven G. Brown, and Theresa E. O'Brien
Sonoma Technology, Inc., Petaluma, CA
Presented in Toledo, OH, October 28, 2011
STI-910019-3946

Topics to Cover
- Systat basics
  - What is Systat?
  - Why use Systat?
  - Overview of the user interface
  - Resources
  - Command language vs. menus
- Importing data
  - Accepted file types
  - Formatting
  - Limitations
  - Tips and tricks
- Analysis tools
  - Graphs and analyses
  - Statistics
- Data manipulation
  - Command language vs. menus
  - Creating variables, appends/merges, transformations, selections, and grouping
  - Saving output
  - Graph customizations
- Advanced graphs and analyses
  - Regression, significance tests, nonparametric tests, factor analysis, cluster analysis, and analysis of variance (ANOVA)

What Is Systat?
Systat is statistical and graphical analysis software that lets you explore your data using both menus and a batch command language (similar to macros).
[Figure: example Systat graphs of TNMOC and BENZW by HOUR, with Count histograms, for YEAR 1994 and 1995]

Why Use Systat?
- In data analysis, we nearly always need to investigate central tendencies, correlations, trends, and other statistical descriptions of data.
- Systat's graphical interface allows the analyst to see the data immediately and to rapidly generate and regenerate graphs for review.
- Systat contains statistical functions not found in Excel or Access.

Systat Basics: Graphical User Interface
The interface is divided into the Viewspace, the Workspace, and the Commandspace.

Systat Basics: File Types
- Data: filename.syd, filename.syz
- Output: filename.syo
- Command: filename.syc

Systat Basics: Resources
- Help is a click away: Index, Search, mouse-overs, the F1 key, and the ? button
- Command line
- Manuals
- Examples
- Training videos at http://www.systat.com/downloads/ (useful topics: interface, data, graph, help)


Command Language vs. Menus
- Systat is a menu-driven Windows package, but everything available in the menus is also available in the command language.
- Commands are useful for repetitive analyses (and we almost never do anything just once!).
- Commands help the analyst document which analyses have been performed and where the output is stored.
- Commands can be reused in future analyses.
- The Log window in Systat records most actions.
- Commands are faster!

Importing Data into Systat
- Accepted file formats
- Limitations
- Data formatting
- Tips and tricks

- A $ signifies a text field.
- A period (.) signifies missing data.
- Variable names of more than one word require an underscore character.
- Text fields are left-justified.
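The formatting rules above can be applied to spreadsheet headers and cells before import. The Python sketch below is illustrative only. The helper names are hypothetical. Uppercasing names and placing the $ at the end of a text column's name are assumptions, not statements from the slide.

```python
import re


def systat_header(name: str, is_text: bool) -> str:
    """Turn a spreadsheet column header into a Systat-style variable name.

    Assumptions (not from the slides): names are uppercased, every run of
    characters other than letters, digits, or underscores becomes a single
    underscore, and text columns get a trailing '$'.
    """
    clean = re.sub(r"[^A-Za-z0-9_]+", "_", name.strip()).strip("_").upper()
    return clean + "$" if is_text else clean


def systat_cell(value: str) -> str:
    """Return '.' (Systat's missing-value code) for blank cells."""
    return value if value.strip() else "."
```

For example, a header "Site Name" on a text column becomes SITE_NAME$, and a blank cell becomes a period.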

Tricks and Tips with Excel
Data sets can be processed in Excel before bringing them into Systat:
- Make date/time conversions and calculations in Excel (e.g., convert date/time into separate fields for day of week, month, day, and year).
- Prepare sums and other calculations that are easily performed in Excel.
- Copy/paste values to remove all formulas.
- Check that records are continuous.
- Replace missing values (e.g., -999) with a period (.), Systat's missing-value code.
- Save as an Excel file (designate it NAME_sys.xls).
Note that only one worksheet of a workbook can be selected per import.
Hot tip: Systat doesn't like the variable name temp.
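The date/time splitting and missing-value recoding suggested above can also be scripted outside Excel. Below is a minimal Python sketch. It assumes a hypothetical timestamp format (YYYY-MM-DD HH:MM), a hypothetical ozone column named O3, and -999 as the missing-value sentinel.

```python
from datetime import datetime

MISSING_SENTINEL = "-999"  # example sentinel from the slide
SYSTAT_MISSING = "."       # Systat's missing-value code


def prep_row(timestamp: str, value: str) -> dict:
    """Split a timestamp into separate fields and recode missing values.

    Assumes timestamps look like '2008-07-04 07:00' (hypothetical format)
    and that the value column is ozone (hypothetical name O3).
    """
    ts = datetime.strptime(timestamp, "%Y-%m-%d %H:%M")
    return {
        "YEAR": ts.year,
        "MONTH": ts.month,
        "DAY": ts.day,
        "HOUR": ts.hour,
        "DOW": ts.isoweekday(),  # 1 = Monday ... 7 = Sunday
        "O3": SYSTAT_MISSING if value.strip() == MISSING_SENTINEL else value,
    }
```

The output field names avoid TEMP, in line with the hot tip above.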

Exploring Your Variables
Ozone data: right-click on a variable to view its statistics.

Common Graphs and Analyses
Systat can create numerous types of graphs and plots and perform many statistical functions. The analyst must determine the appropriate plot(s) to answer different types of questions.
[Figure: example plots of O3 by YEAR (1993-2002) and O3 vs. TEMP, grouped by WDWE]

Commonly Used Plots and Statistical Functions
- Summary statistics: quantify data characteristics
- Histograms: understand data distribution
- Bar charts: compare quantities (counts or means)
- Scatter plots: understand relationships
- Box plots: compare distributions and central tendencies
- Scatter plot matrices: compare many relationships
- Correlation analysis: quantify relationships
- Linear regression: identify predictive variables
Open WY_Site0123_data_ct.syz.
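For the correlation-analysis item, the Pearson correlation coefficient is a common way to quantify a linear relationship. The Python sketch below computes it from first principles for illustration only. It is not Systat's implementation, and Systat's correlation procedures offer more options.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient r for paired samples x and y."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length samples with n >= 2")
    mx, my = sum(x) / n, sum(y) / n
    # Sums of cross-products and squared deviations about the means
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)
```

A value near +1 or -1 indicates a strong linear relationship. A value near 0 indicates little linear association, though a nonlinear relationship may still exist, which is one reason to look at the scatter plot first.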

Summary Statistics Used for Trends Plots
[Figure: "Average Ozone," concentration (ppb) vs. hour of day (0-24) for 2005, 2006, 2007, and 2008]
Diurnal trends in median ozone concentrations for a Wyoming site from 2005 to 2008. An overall increase in average ozone concentrations is observed; is this due to less titration?
The plot was created in Excel from Systat summary statistics by year and hour.
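The trends plot above was built from summary statistics by year and hour. The Python sketch below shows that aggregation step, computing the median to match the caption. It assumes records arrive as hypothetical (year, hour, concentration) tuples, with None marking missing values.

```python
from collections import defaultdict
from statistics import median


def diurnal_medians(records):
    """Median concentration for each (year, hour) pair.

    records: iterable of (year, hour, concentration); None marks missing data.
    """
    groups = defaultdict(list)
    for year, hour, conc in records:
        if conc is not None:  # skip missing values
            groups[(year, hour)].append(conc)
    # Sorted keys give a table ordered by year, then hour
    return {key: median(vals) for key, vals in sorted(groups.items())}
```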

Scatter Plots
Scatter plots are useful for determining relationships between variables. These plots are useful for both data validation and analysis:
- Are there outliers, and if so, how do they affect comparisons?
- What are the similarities and differences between parameters?
[Figures: "Sulfur vs. Sulfate" (S25LC_7 vs. SO425LC_7) and "NO2 vs. Ozone" (NO2 vs. O3)]

Example of Scatter Plot
Do we see the expected relationships?
    REM The following command (PLOT)
    REM creates a scatter plot of NO2
    REM concentrations by wind direction
    REM and year.
    PLOT NO2*RD / OVERLAY GROUP = {YEAR}
[Figure: NO2 vs. RD (0-360 degrees), grouped by YEAR, 2005-2008]
This graphic explores NO2 concentrations and resultant wind direction (RD) as a function of year. Is there a change in the direction of high concentrations over this time period?
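One way to put numbers on the question of whether the direction of high concentrations changed is to tabulate mean NO2 by wind-direction sector and year. This is our suggestion, not a Systat feature shown in the slides. In the Python sketch below, the 45-degree sector width and the (year, direction, NO2) tuple layout are illustrative assumptions.

```python
from collections import defaultdict


def mean_by_sector(records, sector_width=45):
    """Mean NO2 for each (year, wind-direction sector).

    records: iterable of (year, direction_deg, no2).
    Sector 0 covers [0, 45) degrees, sector 1 covers [45, 90), and so on;
    a direction of 360 wraps around to sector 0.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for year, direction, no2 in records:
        sector = int(direction % 360 // sector_width)
        sums[(year, sector)] += no2
        counts[(year, sector)] += 1
    return {key: sums[key] / counts[key] for key in sums}
```

Comparing the sector with the highest mean from year to year gives a rough numerical counterpart to reading the overlaid scatter plot.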

Box-Whisker Plots
[Figure: sample box-whisker plot and notched box-whisker plot, as defined by Systat]
Always define this plot, because different software packages use different definitions.
A confidence interval (CI) for a population parameter is an interval, with an associated probability p, that is generated from a random sample of an underlying population. If the sampling were repeated numerous times and the confidence interval recalculated from each sample by the same method, a proportion p of the confidence intervals would contain the population parameter in question.
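The confidence-interval definition above describes a repeated-sampling property. The Python simulation below illustrates it under assumed conditions: normal data, n = 30, and a hypothetical mean and standard deviation. It draws many samples, builds an interval from each, and reports the fraction of intervals that contain the true mean.

```python
import random
import statistics


def ci_coverage(true_mean=50.0, sd=10.0, n=30, trials=2000, z=1.96, seed=1):
    """Fraction of nominal 95% confidence intervals that contain true_mean.

    Each trial draws a fresh normal sample and builds mean +/- z * s / sqrt(n).
    Using z = 1.96 rather than a t critical value makes the intervals slightly
    too narrow for n = 30, so coverage lands a little below 0.95.
    """
    rng = random.Random(seed)  # fixed seed for reproducibility
    hits = 0
    for _ in range(trials):
        sample = [rng.gauss(true_mean, sd) for _ in range(n)]
        m = statistics.fmean(sample)
        half = z * statistics.stdev(sample) / n ** 0.5
        hits += (m - half) <= true_mean <= (m + half)
    return hits / trials
```

With the defaults, the result falls close to, but somewhat below, 0.95. A wider multiplier such as z = 2.576 raises the coverage.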

Example of a Notched Box-Whisker Plot
Notched box-whisker plots are useful for showing the central tendency of the data (i.e., the median) while also showing variability (i.e., the box and whiskers).
    REM The following command (DENSITY)
    REM creates a notched box plot of ozone
    REM concentrations by year.
    DENSITY O3 * YEAR / BOX NOTCH COLOR=BLACK
[Figure: notched box plots of O3 (ozone, ppb) by YEAR]
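Because box-plot definitions vary between packages, it helps to state the rules explicitly. The Python sketch below computes one common convention:
- Tukey hinges.
- Whiskers reaching the most extreme values within 1.5 H-spreads of the hinges.
- A notch of median ± 1.58 × H-spread / √n, after McGill, Tukey, and Larsen. R's boxplot.stats, for example, uses this convention.

Systat's exact definitions may differ, so treat this as a sketch of the idea rather than a reproduction of Systat's plot.

```python
import math
from statistics import median


def notched_box_stats(data):
    """Box-plot summary using Tukey hinges and a McGill-Tukey-Larsen notch.

    One common convention; Systat's exact definitions may differ.
    """
    xs = sorted(data)
    n = len(xs)
    half = (n + 1) // 2  # each half includes the median when n is odd
    lower_hinge = median(xs[:half])
    upper_hinge = median(xs[n - half:])
    spread = upper_hinge - lower_hinge  # H-spread (close to the IQR)
    lo_fence = lower_hinge - 1.5 * spread
    hi_fence = upper_hinge + 1.5 * spread
    med = median(xs)
    notch = 1.58 * spread / math.sqrt(n)
    return {
        "median": med,
        "hinges": (lower_hinge, upper_hinge),
        # Whiskers end at the most extreme points inside the fences
        "whiskers": (min(x for x in xs if x >= lo_fence),
                     max(x for x in xs if x <= hi_fence)),
        "outside": [x for x in xs if x < lo_fence or x > hi_fence],
        "notch": (med - notch, med + notch),
    }
```

If the notches of two boxes do not overlap, that is commonly read as rough evidence that the medians differ.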

Linear and Nonlinear Regression
Regression analyses identify and quantify predictive relationships between variables. Options include:
- Multiple linear regression
- Stepwise regression
- Automatic outlier and influential point detection
- Plots of residuals vs. predicted values
- Many nonlinear regression forms

Example Linear Regression Analysis
- Before performing linear regression, it is vital to examine a scatter plot of the data!
- Outliers at the ends of the data set strongly influence a linear regression.
- Total nonmethane organic compounds (TNMOC) and NOx at 7 a.m. in an urban setting should be relatively well correlated.
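The warning about end-point outliers is easy to demonstrate. The Python sketch below fits ordinary least squares to made-up, perfectly linear data. Refitting after adding one extreme point at the end of the x range changes the slope dramatically. The sketch also computes adjusted R², so the value can be checked against the Systat example results that follow. This is a from-scratch illustration, not Systat's regression routine.

```python
def adjusted_r2(r2, n, p=1):
    """Adjusted R^2 for n cases and p predictors (plus an intercept)."""
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)


def ols_fit(x, y):
    """Ordinary least squares for y = slope * x + intercept.

    Returns (slope, intercept, R^2, adjusted R^2) for one predictor.
    """
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    sse = sum((b - (slope * a + intercept)) ** 2 for a, b in zip(x, y))
    sst = sum((b - my) ** 2 for b in y)
    r2 = 1 - sse / sst
    return slope, intercept, r2, adjusted_r2(r2, n)
```

Fitting x = [0, 1, 2, 3, 4] and y = [1, 3, 5, 7, 9] gives a slope of 2. Adding the single point (10, 0) makes the slope negative. Separately, adjusted_r2(0.498, 502) gives about 0.497, consistent with the Adjusted Squared Multiple R in the example output.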

Example Results
Effect      Coefficient  Standard Error  Std. Coefficient  Tolerance  t       p-value
CONSTANT    28.134       2.953           0.000             .          9.527   0.000
NOX         2.485        0.112           0.706             1.000      22.283  0.000

Final equation: TNMOC = 2.5(NOx) + 28.1

Dependent Variable: TNMOC
N: 502
Multiple R: 0.706
Squared Multiple R: 0.498
Adjusted Squared Multiple R: 0.497
Standard Error of Estimate: 41.259

Case 344 is an Outlier (Studentized Residual: 11.168)
Case 2,360 has large Leverage (Leverage: 0.053)
Case 2,576 has large Leverage (Leverage: 0.047)
Case 2,648 has large Leverage (Leverage: 0.038)
Case 2,936 has large Leverage (Leverage: 0.036)
Case 5,408 has large Leverage (Leverage: 0.036)
Case 8,028 is an Outlier (Studentized Residual: 5.155)
Case 11,490 has large Leverage (Leverage: 0.047)
Case 14,536 has large Leverage (Leverage: 0.060)
Case 16,240 has large Leverage (Leverage: 0.040)
Case 17,488 has large Leverage (Leverage: 0.045)
Case 18,256 has large Leverage (Leverage: 0.047)
Case 19,432 is an Outlier (Studentized Residual: -4.275)

[Figure: residual plot; random scatter desired]
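The case-level flags in this output are based on studentized residuals and leverage. For a one-predictor model, leverage is h_i = 1/n + (x_i - x̄)² / Σ(x_j - x̄)². The Python sketch below computes leverage and internally studentized residuals from scratch. The slides do not state whether Systat reports internally or externally studentized residuals, or which cutoffs it uses to flag cases, so no thresholds are hard-coded here.

```python
import math


def regression_diagnostics(x, y):
    """Leverage and internally studentized residuals for y = b1 * x + b0.

    Returns a list of (leverage, studentized_residual), one per case.
    """
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    b1 = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sxx
    b0 = my - b1 * mx
    resid = [b - (b0 + b1 * a) for a, b in zip(x, y)]
    s = math.sqrt(sum(e * e for e in resid) / (n - 2))  # standard error of estimate
    out = []
    for a, e in zip(x, resid):
        h = 1 / n + (a - mx) ** 2 / sxx  # leverage (hat value)
        out.append((h, e / (s * math.sqrt(1 - h))))
    return out
```

The leverages always sum to the number of fitted coefficients (2 here). Points far from the mean of x, at the ends of the data set, get the largest leverage, which is why end-point outliers are so influential.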

Summary
- Systat is a powerful graphical statistical tool.
- Explore options and learn statistics through the Help facility and examples.
- Share your command files, tips, and tricks with other users.

Appendix: Key Systat Commands
Box plot in black and white:
    DENSITY benz*year / BOX NOTCH COLOR=BLACK
Save output and graphs:
    OSAVE file path and name /rtf (best for multiple graphs, such as with the BY command)
    GSAVE file path and name /wmf (also saves .bmp, .emf, .pct, .eps, .pg, and .cgm formats)
Save data:
    SAVE file path and name
Export to Excel:
    EXPORT file path and name.xls /type=excel
    Note that this saves only 16,000 lines (Excel 3.0).
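If a data set exceeds the 16,000-line export limit noted above, one workaround is to split it into pieces that each fit. This workaround is our assumption, not something from the slides. A minimal Python sketch with a hypothetical helper:

```python
def chunk_rows(rows, max_rows=16000, header=None):
    """Split rows into chunks that each fit under an export row limit.

    If a header row is given, it is repeated at the top of every chunk and
    counts toward max_rows.
    """
    body = max_rows - (1 if header is not None else 0)
    if body < 1:
        raise ValueError("max_rows too small to hold any data rows")
    chunks = []
    for start in range(0, len(rows), body):
        part = rows[start:start + body]
        chunks.append(([header] if header is not None else []) + part)
    return chunks
```

Each chunk can then be exported as its own file and later re-imported or appended.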

Select a range of data:
    SELECT QCS=0 AND month>5 AND Month<10
Scatter plot matrix:
    SPLOM var1 var2 etc. / half color=black
Setting coordinates:
    DENSITY benz*hour / BOX NOTCH COLOR=BLACK xmin=0 xmax=24 xtick=6
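For readers who pre-process data outside Systat, the SELECT condition above translates directly into a filter. This Python sketch assumes rows are dictionaries with hypothetical keys QCS and MONTH. It keeps rows with QCS equal to 0 and months 6 through 9.

```python
def select_rows(rows):
    """Python analog of: SELECT QCS=0 AND month>5 AND Month<10.

    rows: list of dicts with hypothetical keys 'QCS' and 'MONTH'.
    """
    return [r for r in rows if r["QCS"] == 0 and 5 < r["MONTH"] < 10]
```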

Appendix: Troubleshooting Ideas
Importing:
- Remove any formulas or formatting in the file.
- Make sure there are no gaps (empty lines).
- Make sure each column is uniquely named.
- Save as Excel 3.0 or tab-delimited .txt.
Scripts:
- Go via the menu and compare the log with the script.
- Move the stats line one line up or down.
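The import checks above can be automated before the data reach Systat. The Python sketch below does three checks:
- It flags duplicate column names.
- It flags empty lines.
- It flags leftover formulas, assumed here to be any cell whose text starts with "=".

It then writes tab-delimited text with the standard csv module. The function names are hypothetical.

```python
import csv
import io


def check_table(header, rows):
    """Return a list of problems that commonly break an import.

    Follows the troubleshooting list: unique column names, no empty lines,
    and no leftover formulas (cells whose text starts with '=').
    """
    problems = []
    seen = set()
    for name in header:
        if name in seen:
            problems.append(f"duplicate column name: {name}")
        seen.add(name)
    for i, row in enumerate(rows, start=2):  # row 1 is the header
        if all(str(cell).strip() == "" for cell in row):
            problems.append(f"empty line at row {i}")
        elif any(str(cell).startswith("=") for cell in row):
            problems.append(f"formula left in row {i}")
    return problems


def to_tab_delimited(header, rows):
    """Serialize the table as tab-delimited text."""
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter="\t", lineterminator="\n")
    writer.writerow(header)
    writer.writerows(rows)
    return buf.getvalue()
```

An empty problem list means the table passed these checks; the resulting text can be saved with a .txt extension.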