R-Academy | Knowledge that matters

About the R-Academy
The R-Academy of eoda is a modular course program for the statistical language R, with regular events and training sessions. Our course instructors have been working in data analysis for more than 10 years. The course concept is designed to train you to become an R expert. Depending on your needs and interests, you can choose from a variety of course modules. There is no strictly hierarchical structure, and the modules can be combined individually. Our R training courses, held at universities and graduate centers as well as for companies, are evaluated regularly and consistently receive very good ratings.

About R
R, an object-oriented programming language for statistical data analysis, is the best alternative for analyzing and visualizing data, data mining and business intelligence. Compared with most of the big commercial software packages for data analysis, R is extremely powerful and very flexible. Moreover, R is open source and is constantly being developed by a global scientific community. As a result, R sets an unprecedented standard of functionality, quality and timeliness. The heavy engagement of the scientific community and of large companies such as IBM, SAS and Revolution Analytics in R gives R users strong investment security. The R language provides a wide range of functions that reaches far beyond traditional statistics. R is on its way to becoming the multi-platform lingua franca of data analysis: today more than 6,000 R extension packages are available on CRAN, supporting data analysis in every way imaginable.

R-Academy: Program
Find your course: R-Expert, Text Mining, Big Data and Hadoop, R in Live Systems, Creating Packages, Data Mining, Interactive Graphics, Advertising Effectiveness, Survival Analysis, Time Series Analysis, Quality Management with R, Reproducible Research, Programming with R, Graphics, Multivariate Statistics I, Multivariate Statistics II, Data Management, Introduction to R

Introduction to R | 2 days
First steps in R: structure of R, CRAN mirrors, different environments and editors for R, use of the built-in help functions, internet-based help resources
Basic concept and philosophy of R: programming language, object orientation in R, functions
Types of variables: vectors, data frames, lists
Importing data: .txt, .csv, .xls and .sav files, internet sources
Data management: assigning variable attributes, creating variables, conditional transformations, selecting and filtering cases or variables
Basic data analysis: first descriptive statistics such as means, deviations and other parameters; simple tables and graphics

Time Series Analysis | 2 days
Foundations: seasonality, creating time series objects
Visualization of time series
Decomposition: trend, seasonal and random effects; calculation of seasonally adjusted values
Test methods: stationarity and autocorrelation
Exponential smoothing: modeling with Holt-Winters, ETS and STL
ARIMA models: achieving stationarity through differencing; definition of AR and MA terms; modeling
Forecasting: seasonal and non-seasonal models; outlier treatment
Introduction to event history analysis: basics, creating objects
Survival: Kaplan-Meier model, cumulative hazard curves, log-rank test
Cox regression: modeling, model checking, interpretation of the coefficients
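The following short base R sketch illustrates three of the topics above: decomposition, differencing and Holt-Winters smoothing. It uses the AirPassengers data set that ships with R. The data set, the log transform and the lag values are choices made for this illustration only; they are not taken from the course material.

```r
# AirPassengers: monthly airline passengers 1949-1960 (datasets package, loaded by default)
y <- log(AirPassengers)  # log scale turns the multiplicative seasonality into additive

# Decomposition into trend, seasonal and random components
parts <- decompose(y, type = "additive")
seasonally_adjusted <- y - parts$seasonal

# Differencing: lag 12 removes the seasonal pattern, lag 1 the remaining trend
y_diff <- diff(diff(y, lag = 12), lag = 1)
acf(y_diff, plot = FALSE)  # autocorrelation of the differenced series

# Holt-Winters exponential smoothing and a 12-month forecast
fit <- HoltWinters(y)
fc <- predict(fit, n.ahead = 12)
round(exp(fc))  # back-transform to passenger counts
```

Each seasonal difference removes 12 observations and each ordinary difference removes one. The differenced series is therefore 13 values shorter than the original 144.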

Survival Analysis | 1 day
Survival models are used to estimate the time until a specific event occurs. Possible applications include predicting machine breakdowns or studying the causes and course of diseases. The use of survival analysis is taught using practical examples. By the end of the course, every participant should be able to apply the content to their own work. For best results, we recommend attending the Time Series Analysis course first. The course covers the following methods:
Introduction to the fundamental terms of survival analysis: episodes and censoring, survivor functions, hazard rate
Introduction to survival analysis in R: the survival package
Kaplan-Meier estimator: basic concept, visualization, tabulation, group comparison, significance test
Cox proportional hazards model: requirements and assumptions, model specification, the function coxph(), the ties argument, interpretation of the results
Time-varying variables and splitting of episodes: the function survSplit()
Cox regression: implementation in R, comparison of models, likelihood-ratio test, information criteria (AIC/BIC), estimated values
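For orientation, the standard textbook form of the Kaplan-Meier (product-limit) estimator of the survivor function is given below. The notation is introduced here for illustration:
- t_i are the distinct event times.
- d_i is the number of events at t_i.
- n_i is the number of units still at risk just before t_i, meaning units that have neither failed nor been censored.

```latex
\hat{S}(t) = \prod_{i \,:\, t_i \le t} \left( 1 - \frac{d_i}{n_i} \right)
```

Censored observations leave the risk set without counting as events, which is how censoring enters the estimate. In the survival package, survfit(Surv(time, status) ~ group) computes this estimate for each group.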

Graphics with R | 2 days
Overview of graphics packages: base, grid, ggplot2, lattice
plot, ggplot: data, mapping
High-level graphic elements: bar charts, dot charts, pie charts, histograms, density plots, scatterplots
Low-level graphic elements: arrows, axes, layout grids, headings
Layer components: geoms, stats, coord, facet, opts

Interactive Graphics with R | 2 days
Interactive graphics are a flexible and efficient way to analyze data and to present analysis results. Interactive graphics applications offer queries, selections, highlighting or the modification of graphics parameters. Within the R environment, there are various approaches for creating interactive graphics and applications directly from R. The course gives an overview of creating interactive graphics with R and provides the tools to implement interactive visualizations in R independently. Course content: ggvis, rCharts, shiny

Data Mining with R | 2 days
Data mining denotes a set of methods for extracting knowledge from data sets without prior assumptions about the data structure. Statistical and mathematical techniques are applied to the data to expose inherent patterns. In general, the methods do not require a high level of measurement (they work with categorical, ordinal or metric scales), and they are able to uncover complex non-linear relationships in the data. Typical applications of data mining methods include forecasting models, market basket analysis, target group analysis and more. Methods covered in the course: regression and classification trees, random forests, artificial neural networks, support vector machines, k-means clustering

Multivariate Statistics with R I | 2 days
Cluster analysis: starting point and theory, different distance measures, interpretation, visualization
Factor analysis: starting point, suitability, number of factors, number of dimensions to extract
Regression analysis: model, interpretation, possible problems
Multivariate Statistics with R II
Confirmatory factor analysis, multidimensional scaling, Shapley value regression, discriminant analysis, bootstrapping

Big Data and Hadoop with R | 1 day
Various initiatives have developed different concepts for coping with Big Data. For example, different parsers and packages have been developed to facilitate the handling of Big Data in R. Data in distributed systems requires different methods of analysis than non-distributed data. The principle of MapReduce is to divide a problem into small tasks, each of which can be solved on a small part of the data. A typical application for data stored in a Hadoop system is counting the words in text files. Conventional techniques work through the whole text en bloc, which can be very time-consuming. MapReduce splits the text into small blocks that are processed on individual nodes (the Map part); the Reduce part then combines the partial results. Even complex search, comparison and analysis operations can be parallelized in this way and therefore computed faster. The course teaches how to develop scripts for MapReduce jobs using concrete examples. It introduces the following aspects:
Connecting to data sources such as databases or file systems like Hadoop
Linking to cloud environments such as Windows Azure or Amazon Web Services
Chunking: partitioning data into subsets
Parallelizing computation jobs
Overview of different parser concepts (Revolution Analytics, Oracle R Enterprise, Renjin, ...)
Visualization of Big Data
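To make the word-count example concrete, here is a minimal single-machine sketch of the Map and Reduce steps in base R. It is not the Hadoop or RHadoop API. The three short text "chunks" are assumptions that stand in for data blocks Hadoop would store on separate nodes. The helper names map_chunk and reduce_counts are hypothetical.

```r
# Sketch of the MapReduce idea on a single machine (base R only).
# Assumption: each element of `chunks` stands in for a data block that
# Hadoop would store and process on a separate node.
chunks <- c("the quick brown fox",
            "the lazy dog",
            "the fox jumps over the dog")

# Map (hypothetical helper): turn one chunk into partial word counts,
# independently of all other chunks
map_chunk <- function(text) {
  words <- strsplit(tolower(text), "[^a-z]+")[[1]]
  words <- words[nzchar(words)]
  table(words)
}

# Reduce (hypothetical helper): merge two partial results by summing counts per word
reduce_counts <- function(a, b) {
  keys <- union(names(a), names(b))
  out <- setNames(integer(length(keys)), keys)
  out[names(a)] <- out[names(a)] + as.integer(a)
  out[names(b)] <- out[names(b)] + as.integer(b)
  out
}

# The map calls are independent, so they could run in parallel
# (e.g. parallel::mclapply on Unix-like systems); lapply keeps it simple here.
partials <- lapply(chunks, map_chunk)
word_counts <- Reduce(reduce_counts, partials)
word_counts[order(-word_counts, names(word_counts))]
```

Only the small partial tables have to be moved and merged, never the full text. This is why the approach scales once the chunks live on different machines.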

Text Mining with R | 2 days
As a discipline of data mining, text mining comprises algorithm-based analysis methods for detecting structures and information in texts using statistical and linguistic analysis tools. One application is web mining, which can identify trends and customer requirements on websites and social media platforms. Text mining is also used to forecast price trends and stock prices on the basis of news reports. The course focuses on the packages tm, RTextTools and openNLP and covers the following aspects:
Overview of text mining
Import of unstructured data, web scraping
Structuring of texts: pruning, tokenization, sentence splitting, normalization, stemming, n-grams
Simple content analysis and association analysis
Classification of documents with different methods: support vector machines, generalized linear models, maximum entropy, supervised latent Dirichlet allocation, boosting, bootstrap aggregating, random forests, neural networks, regression trees
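The text structuring steps named above (normalization, tokenization, n-grams) can be sketched in base R without add-on packages, ending with a simple document-term matrix. This is a simplified stand-in, not the tm or RTextTools API. The two documents are made up, stop-word removal and stemming are left out, and the regular expression only keeps the letters a to z.

```r
# Two toy documents (illustrative only)
docs <- c(d1 = "R makes text mining easy.",
          d2 = "Text mining turns text into data.")

# Normalization: lower case, replace everything except letters and spaces
normalize <- function(x) gsub("[^a-z ]", " ", tolower(x))

# Tokenization: split on runs of whitespace and drop empty strings
tokenize <- function(x) {
  tokens <- strsplit(x, "\\s+")[[1]]
  tokens[nzchar(tokens)]
}

tokens <- lapply(docs, function(d) tokenize(normalize(d)))

# N-grams with n = 2: each token paired with its successor
bigrams <- function(tok) {
  if (length(tok) < 2) return(character(0))
  paste(head(tok, -1), tail(tok, -1))
}
lapply(tokens, bigrams)

# Document-term matrix: one row per document, one column per term, cells = counts
terms <- sort(unique(unlist(tokens, use.names = FALSE)))
dtm <- t(vapply(tokens,
                function(tok) as.integer(table(factor(tok, levels = terms))),
                integer(length(terms))))
colnames(dtm) <- terms
dtm
```

A matrix of this kind is the usual input for the classification methods listed above.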

Advertising Effectiveness Measurement with R | 1 day
Assessing the advertising material used and its effectiveness remains one of the major challenges in marketing. The course focuses on analyzing web-tracking data.

Applied Statistics in Quality Management with R | 3 days
Statistical control of incoming goods, production and outgoing goods generates the key figures needed to assess the quality of goods and products. Carrying out quality controls systematically requires methodological knowledge of statistics as well as the right software. The open-source statistical language R is an interesting alternative. The course conveys basic knowledge of R that can be used to manage previously processed statistical data. The concepts of statistical testing are first introduced in theory and then applied in practice with R. Furthermore, AQL values according to ISO 2859 and DIN ISO 3951 are discussed, together with how they work and how they are used in practical applications. The application of the methods in R covers the most important functions for statistical testing and for developing quality control plans. Essential topics from inferential statistics include: How can the optimal size of a random sample be determined? How can a decision for a specific testing method be made? How should key figures be interpreted? How much certainty does the result of a random sample provide? How can the risks of suppliers and customers be balanced? Which discrepancies are acceptable?
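The question of balancing supplier and customer risks can be illustrated with the textbook single sampling plan (n, c): inspect n items and accept the lot if at most c of them are defective. The sketch below assumes a binomial model (lot much larger than the sample). The plan values n = 50 and c_max = 1 and the two quality levels are hypothetical. They are not taken from the ISO 2859 tables.

```r
# Hypothetical single sampling plan (illustrative values, not taken from
# the ISO 2859 tables): draw n items from a lot and accept the lot if at
# most c_max defective items are found.
n <- 50
c_max <- 1

# Operating characteristic: probability of acceptance as a function of the
# lot's true defect rate p (binomial model, i.e. lot much larger than n)
p <- c(0.005, 0.01, 0.02, 0.05, 0.10)
p_accept <- pbinom(c_max, size = n, prob = p)
print(data.frame(defect_rate = p, p_accept = round(p_accept, 3)))

# Supplier's (producer's) risk: a lot with 1 % defects is rejected anyway
producer_risk <- 1 - pbinom(c_max, size = n, prob = 0.01)

# Customer's (consumer's) risk: a lot with 10 % defects is accepted anyway
consumer_risk <- pbinom(c_max, size = n, prob = 0.10)
round(c(producer = producer_risk, consumer = consumer_risk), 3)
```

Changing n or c_max trades these two risks against each other. Choosing a plan means finding values for which both risks are acceptable.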

Programming with R | 2 days
Loops and control structures
Vectorized programming
Split-apply-combine approach
Defining your own functions
Environments and scoping
Object-oriented programming / R class systems
Exceptions / error handling
Profiling and debugging

Data Management with R | 1 day
Recoding variables
Data aggregation
Forming and analyzing subsets of data and groups
Groupwise data operations (split-apply-combine)
Merging and sorting data
Data transformations (wide vs. long format)
Comparing data
Identifying and removing duplicates
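The split-apply-combine approach that appears in both modules can be shown in a few lines of base R. The sales data frame and its values are hypothetical, and aggregate() is shown only as one common base R shortcut.

```r
# Hypothetical example data (values are illustrative only)
sales <- data.frame(
  region = c("North", "South", "North", "South", "West"),
  amount = c(120, 80, 100, 60, 90)
)

# Split: one data frame per region
pieces <- split(sales, sales$region)

# Apply: summarise each piece independently
summaries <- lapply(pieces, function(d) {
  data.frame(region = as.character(d$region[1]),
             total = sum(d$amount),
             n = nrow(d))
})

# Combine: bind the per-group results into one data frame
result <- do.call(rbind, summaries)
rownames(result) <- NULL
result

# Base R shortcut for the same aggregation
aggregate(amount ~ region, data = sales, FUN = sum)
```

Writing the three steps out separately makes it easy to swap in any per-group function. The one-line shortcut covers the common case of a single summary statistic.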

Creating Packages with R | 1 day
The course explains how to turn a loose collection of functions into a publishable package.
Package structure
Releasing packages
Package documentation
Namespaces and package dependencies
Testing

R in Live Systems | 2 days
The course teaches the key aspects of using R in a business environment.
Updating packages and R
Working in a closed environment
Testing
Versioning and collaboration
Documentation and package creation
R in server/client architectures

Reproducible Research with R | 1 day
The analysis of statistical data generates reports with various elements such as text, data, formulas, tables and graphics. Interfaces between R and LaTeX/HTML bring these different contents together in R and create clear output that is ready for presentation. In addition, R can adapt the reports dynamically on the basis of new data. With the approach known as reproducible research, report items are updated without any manual adjustments. After completing the course, participants should be able to create customized and automated reports. Course contents:
The RStudio user interface
Sweave and knitr
Short introduction to LaTeX, Markdown and HTML
Formatting R output with chunk options
Creating static report templates in various output formats such as PDF and HTML
Dynamic reports and automated adjustments
The combination of theoretical introductions, specific cases and practical exercises ensures successful learning.

We offer our R-Academy on site at your premises as well as via web conferencing. These in-house training modules can be assembled individually and aligned completely with your data and analysis needs. Feel free to contact us.

eoda
We at eoda have a passion for data and analysis. We are data scientists, software developers, management consultants and trainers all in one. We generate strategic advantages from your data, based on extensive experience in data mining and predictive analytics. Our team derives recommendations for action and solutions that help you adjust to upcoming trends and future market changes. We are happy to share this knowledge with you: we offer coaching in applying statistical methods and in handling evolving data in your company appropriately. In addition, we offer SaaS solutions specially tailored to your unique needs. We do not shy away from challenges and individual requests. We are always ready for new tasks, which we tackle with our hands-on mentality and proven methods and technologies.

eoda GmbH
Universitätsplatz 12
34127 Kassel - Germany
Tel. +49 (0)561 202 724 40
Fax. +49 (0)561 202 724 30
info@eoda.de