BRINGING NEW OPPORTUNITIES TO DEVELOP STATISTICAL SOFTWARE AND DATA ANALYSIS TOOLS IN ROMANIA



Similar documents
R a Global Sensation in Data Science

MANIPULATION OF LARGE DATABASES WITH "R"

Statistical Data Analysis via R and PHP: A Case Study Of the Relationship Between GDP and Foreign Direct Investments for The Republic Of Moldova

R-evolution in Time Series Analysis Software Applied on R-omanian Capital Market

GETTING STARTED WITH R AND DATA ANALYSIS

R Tools Evaluation. A review by Global BI / Local & Regional Capabilities. Telefónica CCDO May 2015

Business Analytics: A Knowledge Community and Repository Infrastructure for R Models. Master Teamproject Prof. Dr. Alexander Mädche, Martin Kretzer

The power of IBM SPSS Statistics and R together

Better decision making under uncertain conditions using Monte Carlo Simulation

What is R? R s Advantages R s Disadvantages Installing and Maintaining R Ways of Running R An Example Program Where to Learn More


MySQL databases as part of the Online Business, using a platform based on Linux

Welcome to the second half ofour orientation on Spotfire Administration.

A new ranking of the world s most innovative countries: Notes on methodology. An Economist Intelligence Unit report Sponsored by Cisco

Module 3: Correlation and Covariance

UNDERSTANDING INTERNAL AND EXTERNAL ENVIRONMENT FOR A FLEXIBLE PRODUCTION APPROACH

R: A Free Software Project in Statistical Computing

HYPOTHESIS TESTING: CONFIDENCE INTERVALS, T-TESTS, ANOVAS, AND REGRESSION

Automated Data Ingestion. Bernhard Disselhoff Enterprise Sales Engineer

This chapter will demonstrate how to perform multiple linear regression with IBM SPSS

New Work Item for ISO Predictive Analytics (Initial Notes and Thoughts) Introduction

Multiple Linear Regression

How To Perform Predictive Analysis On Your Web Analytics Data In R 2.5

R language in data mining techniques and statistics

Strengths and Limitations of Nagios as a Network Monitoring Solution

2014 New Jersey Core Curriculum Content Standards - Technology

Simple Business Dashboard Design Strategies

RevoScaleR Speed and Scalability

SPSS Guide: Regression Analysis

MULTIPLE CHOICE FREE RESPONSE QUESTIONS

STATA and Medical Data Science - An Indian Perspective

BIG DATA: PROMISE, POWER AND PITFALLS NISHANT MEHTA

A Primer on Mathematical Statistics and Univariate Distributions; The Normal Distribution; The GLM with the Normal Distribution

Data Warehousing and Data Mining in Business Applications

4 G: Identify, analyze, and synthesize relevant external resources to pose or solve problems. 4 D: Interpret results in the context of a situation.


A QuestionPro Publication

Education & Training Plan. Entrepreneurship Certificate Program with Externship. Columbia Southern University (CSU)

Predictive Analytics Certificate Program

SWOT ANALYSIS WORKSHOP GUIDE

ANOTHER GENERATION OF GENERAL EDUCATION

Statistical Computing / Computational Statistics

Revised Modularity Index to Measure Modularity of OSS Projects with Case Study of Freemind

Automated reporting does not have to mean compromising on quality!

Data Doesn t Communicate Itself Using Visualization to Tell Better Stories

Notes on computer programmes for statistical analysis

The Influence of Media Channel on the Booking Behavior of Hotel Guests

Session 12. Aggregate Supply: The Phillips curve. Credibility

Tutorial: Get Running with Amos Graphics

MISSING DATA TECHNIQUES WITH SAS. IDRE Statistical Consulting Group

Binary Logistic Regression

Direct test of Harville's multi-entry competitions model on race-track betting data

CORRELATION ANALYSIS

Tactical Routing. The leading solution for optimizing your transport on a tactical level

IN THE CITY OF NEW YORK Decision Risk and Operations. Advanced Business Analytics Fall 2015

Unit Testing webmethods Integrations using JUnit Practicing TDD for EAI projects

LearnIT project PL/08/LLP-LdV/TOI/140001

DEVELOP ROBOTS DEVELOPROBOTS. We Innovate Your Business

Interactive Data Mining and Visualization

Aspects in Development of Statistic Data Analysis in Romanian Sanitary System

Multiple regression - Matrices

SYSTEM DEVELOPMENT AND IMPLEMENTATION

Tutorial: Get Running with Amos Graphics

Silvermine House Steenberg Office Park, Tokai 7945 Cape Town, South Africa Telephone:

Overview Classes Logistic regression (5) 19-3 Building and applying logistic regression (6) 26-3 Generalizations of logistic regression (7)

Solar energy e-learning laboratory - Remote experimentation over the Internet

Case Study 1: Paid Search Wars by John Sharp and Des Laffey

ESS event: Big Data in Official Statistics

Using R for Linear Regression

The Future of Loyalty. Rupert Duchesne President & C.E.O. Groupe Aeroplan Inc.

ETPL Extract, Transform, Predict and Load

How To Test The Performance Of An Ass 9.4 And Sas 7.4 On A Test On A Powerpoint Powerpoint 9.2 (Powerpoint) On A Microsoft Powerpoint 8.4 (Powerprobe) (

Conference "Competencies and Capabilities in Education" Oradea 2009 EDUCAŢIA ANTREPRENORIALĂ - EDUCAŢIE PENTRU VIITOR

YO WE U ENVISION DEVELOP

whitepaper Predictive Analytics with TIBCO Spotfire and TIBCO Enterprise Runtime for R

Psychology 205: Research Methods in Psychology

CONSTRUCTING AN EFFECTIVE ACTION PLAN

District 72 Toastmasters

N/A. Yes. Students are expected to review and understand all areas of the course outline.

The importance of using marketing information systems in five stars hotels working in Jordan: An empirical study

DATA MINING USING PENTAHO / WEKA

Multiple-Comparison Procedures

MOBILE ENTERPRISE APP STORES: THE SAP GAME CHANGER

DEPARTMENT OF PSYCHOLOGY UNIVERSITY OF LANCASTER MSC IN PSYCHOLOGICAL RESEARCH METHODS ANALYSING AND INTERPRETING DATA 2 PART 1 WEEK 9

Questions for a NEW year

ING International Survey. Strong demand across Europe for financial education in schools

Global Economic Briefing: Global Inflation

Coefficient of Determination

Data analysis and regression in Stata

R and Hadoop: Architectural Options. Bill Jacobs VP Product Marketing & Field CTO, Revolution

Our Philosophy. Authentic Contexts. Provide relevant and meaningful courseware to promote deeper understanding

Empowering the Masses with Analytics

Doctoral Research Scientific Report

Unit 31 A Hypothesis Test about Correlation and Slope in a Simple Linear Regression

Manage Your Leads Well to Boost Sales Volumes Anupam Bhattacharjee Shine Gangadharan

PREDICTIVE INSIGHT ON BATCH ANALYTICS A NEW APPROACH

Advanced Big Data Analytics with R and Hadoop

10. Comparing Means Using Repeated Measures ANOVA

Overseas Investment in Oil Industry and the Risk Management System

Better planning and forecasting with IBM Predictive Analytics

Transcription:

Globalization and Higher Education in Economics and Business Administration Iasi, GEBA 18-20 October 2012 BRINGING NEW OPPORTUNITIES TO DEVELOP STATISTICAL SOFTWARE AND DATA ANALYSIS TOOLS IN ROMANIA Nicoleta CARAGEA Senior Expert, National Institute of Statistics, Romania/Lecturer at Ecological University of Bucharest Ciprian Antoniade ALEXANDRU Senior Lecturer at Ecological University of Bucharest Ana-Maria DOBRE Expert, National Institute of Statistics

R - a new challenge at academic level Why R? Data analysis tool - high inertia to change: - level of knowledge that young people receive in the academic level - tradition of teachers in keeping poor updated topics Re-shape the thinking at university level - present to students the opportunity to use the state-ofthe-art programming technology for data analysis

R - overview Data analysis tool developed by statisticians for statisticians Statistical programming language created by two academicians in 1993 in New Zealand and released in 1996 Free and open-source software User-friendly Graphical User Interfaces: RStudio, Deducer, Revolution Analytics, Red-R, JGR (Java GUI for R), SciViews-R In about three years the R s users will exceed the number of users of SAS and SPSS

R Packages Extended through packages, user-created add-on programs Every package is a research project that is reviewed at academic level Sources: http://r4stats.com/articles/popularity/ and http://journal.r-project.org/archive/2009-2/rjournal_2009-2_fox.pdf

Strengths Freeware, open-source Easy to install and configure on various operating systems Linked with the way statisticians think and work Various procedures not available in SPSS and SAS The freedom to teach with real-world examples from outside organizations The flexibility to mix-and-match models, scripts and packages for the best results Opportunities Share new techniques with other R users around the world via online community Re-use and reproduce new discovered techniques on analytic operations that the user is going to perform this is difficult in SAS or SPSS Very large area of use - statistics, journalism, mapping, finance, forecasting, social networking, drug development, computational biology, life sciences and many more SWOT Weaknesses Data collection should be available from other tools; MySQL or PostgreSQL are popular among R users for this purpose Guided Analytics not available The help files and the vignettes for packages are written for relatively advanced users R is not very user friendly and it needs basic knowledge of programming language Threats Harder to learn than other similar software due to the fact that it has more types of data structures than the data set It is necessary for the user to carry out the macro language of R and to control the management of the output; SPSS and SAS allow user to skip those issues until he needs them

STATISTICAL ANALYSIS WITH R An example: The simple linear regression model y i =b 0 +b 1 x i +e i y = response variable x = covariate Linear models are fit using R s model formulas.

Estimating the parameters in simple linear regression The basic format for a formula is the ~ (tilde) is read is modeled by and is used to separate the response from the predictor(s). One goal when modeling is to fit the model by estimating the parameters based on the sample. For the regression model is used the method of least squares. To find the estimates, it is used the lm() function. The basic usage of lm function is of the form: lm(formula, data=, subset= )

R output sample lm function

R global brainstorming Source: http://www.r-bloggers.com/where-in-the-world-is-r-andrstudio/?utm_source=feedburner&utm_medium=email&utm_campaign=feed%3a+rbloggers+%28r+bloggers%29

Conclusion R is the most powerful and flexible statistical programming language in the world. - Norman Nie, cofounder of SPSS in the late 1960 s, currently, CEO and president of Revolution Analytics, a company that provides commercialized versions of R programs Official statistical systems using R: Italy, Austria, Australia, Canada Companies using R: Pfizer, Shell, Facebook, Google, Mozilla, Times, The New York Times, The Economist, NewScientist, Lloyd's, Bing, Johnson&Johnson

Romanian new users Unfortunately, Romania is not on the current available users list worldwide, but it is not late to make this happen. Our country has very good and competitive computer scientists, which could become users. For the moment, there is a small group in the official statistic involved in small area estimation based on R technique.

Thank you! If you would like to join Romanian team, please contact us at: romanian.r.team@gmail.com omanian Team