Session 85 IF, Predictive Analytics for Actuaries: Free Tools for Life and Health Care Analytics--R and Python: A New Paradigm!


1 Session 85 IF, Predictive Analytics for Actuaries: Free Tools for Life and Health Care Analytics--R and Python: A New Paradigm! Moderator: David L. Snell, ASA, MAAA. Presenters: Brian D. Holland, FSA, MAAA; Dihui Lai, Ph.D.; Sheamus Kee Parkes, FSA, MAAA.

2 Python for Actuaries. Brian Holland, FSA, MAAA. 2015 SOA Annual Meeting, Austin, TX.

3 Disclaimer: Any views or opinions discussed or shown in this presentation are solely those of the author and do not represent those of AIG or any of its subsidiaries or employees.

4 Why learn Python? We hear a ton about machine learning, data science, and big data. To actually do these things personally, you have to have the technical skills, programming / hacking skills included. Python has a lot of traction in data science applications and is now quite popular. You don't have to look long before seeing it. Some data science companies are Python shops. Why not learn or learn about Python: You don't program or manage programmers or programming. You can get by in a spreadsheet or with VBA. You have no interest in doing or trying advanced analytics. Fair warning: this is a presentation about a programming language.

5 Purpose today: shake hands with Python. See what you might want to dig into. What is Python? An object-oriented language with extensive scientific and numeric libraries; with many special-purpose libraries; with an expanding user base; that is designed for readability (forced tabbing, i.e. indentation; many places to comment work in accessible ways); around since 1991; in two active versions, 2 and 3. For new work there is not much case for sticking with 2 now: the big libraries are ported to 3. Named after Monty Python, not the snake.
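
As a small illustration of the readability point on this slide, here is a sketch in which indentation alone marks where each block starts and ends. The function name and the sample figures are invented for illustration and do not come from the presentation.

```python
def lapse_rate(lapses, exposure):
    """Return lapses per unit of exposure, or None when there is no exposure."""
    # Everything indented under "def" is the body of the function.
    if exposure <= 0:
        # One level deeper: the body of the if-statement.
        return None
    return lapses / exposure


# Hypothetical sample figures, for illustration only.
for lapses, exposure in [(12, 400.0), (0, 0.0)]:
    print(lapses, exposure, lapse_rate(lapses, exposure))
```

There are no braces or "end" keywords: dedenting a line is what closes a block, which is why consistent indentation is enforced by the language rather than left to style.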

6 Applications for actuaries. A general-purpose master tool, with libraries for special purposes. Can manipulate R, MS Office, and other Windows objects. Data munging: easily read spreadsheets, text files, databases; scrape the web (with library BeautifulSoup). Process automation and documentation. Data visualization. Statistical modeling / machine learning / data science / predictive modeling. Presentations.

7 Ways to use Python. System command: for scripts. Command line environment.

8 Ways to use Python: IPython notebooks. Edit browser-based documents saved in JSON. Mix formatted text and computation: typeset math; section headings, HTML, markdown; graphics inline with the flow of text, computed as you go. Run remote servers through the web, also grids. Convert the notebooks easily to slides, HTML, plain Python files; on to MS Word. Note: IPython notebooks recently folded into the Jupyter project, a front-end for many other back-end computations, including R and Julia.
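
To make "documents saved in JSON" concrete, the sketch below builds a minimal notebook-like document with the standard library and round-trips it through JSON text. The field names assume version 4 of the notebook format; the cell contents are invented for illustration, and real notebooks written by Jupyter carry more metadata than this.

```python
import json

# A minimal notebook document. Field names assume notebook format version 4.
notebook = {
    "nbformat": 4,
    "nbformat_minor": 0,
    "metadata": {},
    "cells": [
        {
            # Formatted text, including typeset math, lives in markdown cells.
            "cell_type": "markdown",
            "metadata": {},
            "source": ["# Lapse study\n", "Mortality rate: $q_x = d_x / l_x$"],
        },
        {
            # Computation lives in code cells; outputs are stored alongside.
            "cell_type": "code",
            "execution_count": None,
            "metadata": {},
            "outputs": [],
            "source": ["print(12 / 400.0)"],
        },
    ],
}

# Serialize to JSON text, as a .ipynb file would hold it, then read it back.
text = json.dumps(notebook, indent=1)
loaded = json.loads(text)
cell_types = [cell["cell_type"] for cell in loaded["cells"]]
print(cell_types)
```

Because the file is plain JSON, it can be inspected, diffed, or converted by any tool that reads JSON, which is what makes the conversions listed on the slide practical.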

9 Ways to use Python: IPython notebooks. Could you do that in a spreadsheet? I could not, not reasonably.

10 What is knowing Python? Language: syntax, and the Python standard library (The Python Standard Library by Example, Doug Hellmann, 2011). Libraries to do what you need: BeautifulSoup, to read and manipulate HTML/XML and scrape the web; PyODBC, to talk to databases; NumPy, Pandas, Scikit-Learn, essential for machine learning and computation generally.

11 Graphics libraries: Death by choice. Bokeh: for interactive plots in the browser. Seaborn. GGPLOT: port for R fans and experts. VisPy: bleeding edge, GPU, interactive, 2D, 3D, wow. Matplotlib: the main one. Tip: come to the afternoon session to see what these LTC exhibits are.

12 Data I/O with Pandas. The Pandas library can import many document types directly into a DataFrame object (similar to R's): fixed-width text; delimited text; spreadsheets; HTML, JSON; SQL queries, using an open connection to the DB.
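
The sketch below shows three of the import routes listed on this slide, using in-memory text and an in-memory SQLite database so that it runs without any files. All values and the table name `lapse` are invented for illustration; the column names are borrowed from the lapse-study example later in the session.

```python
import io
import sqlite3

import pandas as pd

# Delimited text (made-up values).
csv_text = (
    "STUDY_YEAR,ISSUE_AGE,EXPOSURE,LAPSE_CNT\n"
    "2010,35,120.5,3\n"
    "2011,40,98.0,1\n"
)
delimited = pd.read_csv(io.StringIO(csv_text))

# Fixed-width text: fields are 4, 3 and 6 characters wide, with no header row.
fixed_text = "2010 35 120.5\n2011 40  98.0\n"
fixed = pd.read_fwf(
    io.StringIO(fixed_text),
    widths=[4, 3, 6],
    header=None,
    names=["STUDY_YEAR", "ISSUE_AGE", "EXPOSURE"],
)

# SQL query over an open connection (an in-memory SQLite database here).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE lapse (STUDY_YEAR INTEGER, LAPSE_CNT INTEGER)")
conn.executemany("INSERT INTO lapse VALUES (?, ?)", [(2010, 3), (2011, 1)])
from_sql = pd.read_sql_query(
    "SELECT STUDY_YEAR, LAPSE_CNT FROM lapse ORDER BY STUDY_YEAR", conn
)
conn.close()

print(delimited)
print(fixed)
print(from_sql)
```

Each call returns a DataFrame, so the downstream analysis code does not need to care which kind of source the data came from.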

13 Machine learning: scikit-learn, the killer app? Many examples at A very small sample from the page:

14 Cooperation with other software: RPy2 in a Notebook. R Magic (there are many magic functions in IPython or Jupyter notebooks): allows commands to other tools directly in the notebook.

15 More on RPy2: accessing R objects

16 PypeR: another way to talk to R. PypeR uses pipes to communicate with R.

17 Good luck, have fun! Thanks for your interest. Brian Holland, FSA, MAAA

19 R for Actuarial Science. Dihui Lai, PhD, Data Scientist, Reinsurance Group of America, Incorporated.

20 Outline: R, Whats and Whys? Use R for Actuarial Science. R Demo. Conquer Big Data with R.

21 R, Whats and Whys? Powerful data manipulation, statistical modeling, and charting tools of modern data science. Open source project since 1995. Active community (>2 million users and developers). Incorporates features of object-oriented and functional programming.

22 R, Whats and Whys? Statistical toolkits. Easy data manipulation (example table columns: STUDY_YEAR, ISSUE_AGE, POLICY_YEAR, EXPOSURE, LAPSE_CNT). Cutting-edge analytics. Database. Integrate advanced data tech. Visualization tools.

23 Use R for Actuarial Science Example: Term Tail Lapse Study
load("lapsedata.rdata")
head(LapseData)
## STUDY_YEAR ISSUE_AGE POLICY_YEAR EXPOSURE LAPSE_CNT FA_BAND
## B. 100k-249k
## B. 100k-249k
## C. 250k-999k
## B. 100k-249k
## C. 250k-999k
## B. 100k-249k
summary(LapseData)
## STUDY_YEAR ISSUE_AGE POLICY_YEAR EXPOSURE
## : :92930 Min. :10.00 Min. :
## : : 1st Qu.: 1st Qu.:
## : :76142 Median :10.00 Median :
## : :69777 Mean :10.87 Mean :
## : : 3rd Qu.: 3rd Qu.:
## : :41278 Max. :19.00 Max. :
## (Other) :64476 (Other):83483
## LAPSE_CNT FA_BAND
## Min. : A. < 100k :
## 1st Qu.: B. 100k-249k :
## Median : C. 250k-999k :
## Mean : D. 1M M:
## 3rd Qu.: E. 2M+ : 7232
## Max. : D. 1M-1.99M : 1830

24 Use R for Actuarial Science Example: Term Tail Lapse Study

25 Use R for Actuarial Science Example: Term Tail Lapse Study
model1 <- glm(LAPSE_CNT ~ offset(log(EXPOSURE)) + FA_BAND, family = poisson(), data = LapseData)
summary(model1)
##
## Call:
## glm(formula = LAPSE_CNT ~ offset(log(EXPOSURE)) + FA_BAND, family = poisson(),
##     data = LapseData)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
##
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) <2e-16 ***
## FA_BANDB. 100k-249k <2e-16 ***
## FA_BANDC. 250k-999k <2e-16 ***
## FA_BANDD. 1M M <2e-16 ***
## FA_BANDE. 2M+ <2e-16 ***
## FA_BANDD. 1M-1.99M <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for poisson family taken to be 1)
##
## Null deviance: on degrees of freedom
## Residual deviance: on degrees of freedom
## AIC:
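
To show what the coefficients of such a model mean, here is a hedged sketch in plain Python. It relies on a special case: when a Poisson model with a log link and a log(exposure) offset has a single categorical predictor, the maximum-likelihood fitted rate for each level is simply total lapses divided by total exposure in that level, so the coefficients can be written down without an iterative fit. This is not how R's glm computes them (it uses iteratively reweighted least squares), and the records below are invented, not taken from the study data. The function name is hypothetical.

```python
import math
from collections import defaultdict

# Made-up records: (FA_BAND, EXPOSURE, LAPSE_CNT). Not the study data.
records = [
    ("A. < 100k", 100.0, 10),
    ("A. < 100k", 300.0, 30),
    ("B. 100k-249k", 200.0, 10),
    ("C. 250k-999k", 400.0, 10),
]


def poisson_band_coefficients(rows, base_band):
    """Closed-form coefficients for LAPSE_CNT ~ offset(log(EXPOSURE)) + FA_BAND.

    Valid only for a single categorical predictor, and only when every band
    has a positive lapse count (otherwise the log rate is undefined).
    """
    exposure = defaultdict(float)
    lapses = defaultdict(float)
    for band, expo, count in rows:
        exposure[band] += expo
        lapses[band] += count

    # Fitted log lapse rate per band: log(total lapses / total exposure).
    log_rate = {band: math.log(lapses[band] / exposure[band]) for band in exposure}

    # Intercept is the base band's log rate; the others are differences from it.
    coefs = {"(Intercept)": log_rate[base_band]}
    for band in sorted(log_rate):
        if band != base_band:
            coefs["FA_BAND" + band] = log_rate[band] - log_rate[base_band]
    return coefs


coefficients = poisson_band_coefficients(records, base_band="A. < 100k")
for name, value in coefficients.items():
    # exp(coefficient) is the lapse rate relative to the base band.
    print(f"{name:22s} {value: .4f}  exp = {math.exp(value):.4f}")
```

Read this way, the exponential of the intercept is the base band's lapse rate per unit of exposure, and the exponential of each FA_BAND coefficient is that band's rate as a multiple of the base band's.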

26 Use R for Actuarial Science Example: Hierarchical Clustering

27 Use R for Actuarial Science Examples: Other Potentials. SVM. Text Mining. Map. Have Fun.

28 R Demo Use R for Twitter Streaming

29 Conquer Big Data with R. R packages for big data. Memory allocation: ff, bigmemory. Integrate R with clusters: RHadoop, SparkR. Parallel computing packages: snowfall, multicore. Commercial distribution: Revolution R.

30 Summary - Do You Want the Toolbox? Easy data manipulation (example table columns: STUDY_YEAR, ISSUE_AGE, POLICY_YEAR, EXPOSURE, LAPSE_CNT). Statistical toolkits. Cutting-edge analytics. Database. Integrate advanced data tech. Visualization tools.

31 Questions?

32 R vs Python. SOA Annual Meeting, October 2015. Presented by Shea Parkes, FSA, MAAA.

33 Limitations: The views expressed in this presentation are those of the presenter, and not those of Milliman. Nothing in this presentation is intended to represent a professional opinion or be an interpretation of actuarial standards of practice.

34 Data Science: A Useful Perspective. /the-data-sciencevenn-diagram June 27, 2011

35 Data Science: A Useful Perspective. /the-data-sciencevenn-diagram = Actuarial Student/Analyst Self-Assessment

37 Bending your brain. The more you use Python, the better you are able to think about programming. The more you use R, the better you are able to think about data analysis.

38 Both are multi-paradigm, but... Python: functions are first-class objects, but lambdas are constrained and an awkward nonlocal statement was only recently introduced. R: 3+ ways to do Object Oriented Programming, but none of them are simple and easy to use.
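
The Python half of this slide can be sketched in a few lines. The example below shows a function being passed around as an ordinary value, a lambda (which is limited to a single expression), and the `nonlocal` statement that Python 3 requires before a nested function can rebind a variable of its enclosing function. The function names are invented for illustration.

```python
def make_counter():
    """Return a function that counts how many times it has been called."""
    count = 0

    def increment():
        # Without "nonlocal", the assignment below would create a new local
        # variable instead of updating the one in make_counter.
        nonlocal count
        count += 1
        return count

    return increment


# A lambda can hold only a single expression: no statements, no assignments.
square = lambda x: x * x

# Functions are first-class objects: they can be stored and passed as arguments.
counter = make_counter()
first, second = counter(), counter()
squares = list(map(square, [1, 2, 3]))

print(first, second)
print(squares)
```

Anything longer than one expression has to be written as a named `def`, which is the constraint on lambdas the slide refers to.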

39 Both could use a little help

40 Recent growth coming together. Data Science stack: Pandas + scikit-learn + statsmodels + IPython. Cutting-edge modeling: Theano and PyStan. RStudio + devtools + more: encouraging best software development practices. dplyr + magrittr = more readable code = faster development.

41 But what should I use? Will you need to integrate with other systems at all? Is analyzing data 80%+ of what you will be doing? Whichever your colleagues have experience in!

Unlocking the True Value of Hadoop with Open Data Science

Unlocking the True Value of Hadoop with Open Data Science Unlocking the True Value of Hadoop with Open Data Science Kristopher Overholt Solution Architect Big Data Tech 2016 MinneAnalytics June 7, 2016 Overview Overview of Open Data Science Python and the Big

More information

R Tools Evaluation. A review by Analytics @ Global BI / Local & Regional Capabilities. Telefónica CCDO May 2015

R Tools Evaluation. A review by Analytics @ Global BI / Local & Regional Capabilities. Telefónica CCDO May 2015 R Tools Evaluation A review by Analytics @ Global BI / Local & Regional Capabilities Telefónica CCDO May 2015 R Features What is? Most widely used data analysis software Used by 2M+ data scientists, statisticians

More information

DATA SCIENCE CURRICULUM WEEK 1 ONLINE PRE-WORK INSTALLING PACKAGES COMMAND LINE CODE EDITOR PYTHON STATISTICS PROJECT O5 PROJECT O3 PROJECT O2

DATA SCIENCE CURRICULUM WEEK 1 ONLINE PRE-WORK INSTALLING PACKAGES COMMAND LINE CODE EDITOR PYTHON STATISTICS PROJECT O5 PROJECT O3 PROJECT O2 DATA SCIENCE CURRICULUM Before class even begins, students start an at-home pre-work phase. When they convene in class, students spend the first eight weeks doing iterative, project-centered skill acquisition.

More information

Session 15 OF, Unpacking the Actuary's Technical Toolkit. Moderator: Albert Jeffrey Moore, ASA, MAAA

Session 15 OF, Unpacking the Actuary's Technical Toolkit. Moderator: Albert Jeffrey Moore, ASA, MAAA Session 15 OF, Unpacking the Actuary's Technical Toolkit Moderator: Albert Jeffrey Moore, ASA, MAAA Presenters: Melissa Boudreau, FCAS Albert Jeffrey Moore, ASA, MAAA Christopher Kenneth Peek Yonasan Schwartz,

More information

ANACONDA. Open Source Modern Analytics Platform Powered by Python ANACONDA DELIVERS OPEN ENTERPRISE PYTHON KEY FEATURES WHY YOU LL LOVE ANACONDA

ANACONDA. Open Source Modern Analytics Platform Powered by Python ANACONDA DELIVERS OPEN ENTERPRISE PYTHON KEY FEATURES WHY YOU LL LOVE ANACONDA 1 Open Source Modern Analytics Platform Powered by Python KEY FEATURES 100% Open Source Modern Analytics Platform Powered by Python Single click installation Package management Works with Windows, OS X,

More information

Microsoft Research Windows Azure for Research Training

Microsoft Research Windows Azure for Research Training Copyright 2013 Microsoft Corporation. All rights reserved. Except where otherwise noted, these materials are licensed under the terms of the Apache License, Version 2.0. You may use it according to the

More information

Microsoft Research Microsoft Azure for Research Training

Microsoft Research Microsoft Azure for Research Training Copyright 2014 Microsoft Corporation. All rights reserved. Except where otherwise noted, these materials are licensed under the terms of the Apache License, Version 2.0. You may use it according to the

More information

Usability of Visualization Libraries for Web Browsers for Use in Scientific Analysis

Usability of Visualization Libraries for Web Browsers for Use in Scientific Analysis Usability of Visualization Libraries for Web Browsers for Use in Scientific Analysis Luke Barnard Technical Student CERN, Route de Meyrin 385 1217 Meyrin, Switzerland Matej Mertik Scientific Associate

More information

Introduction to Python

Introduction to Python 1 Daniel Lucio March 2016 Creator of Python https://en.wikipedia.org/wiki/guido_van_rossum 2 Python Timeline Implementation Started v1.0 v1.6 v2.1 v2.3 v2.5 v3.0 v3.1 v3.2 v3.4 1980 1991 1997 2004 2010

More information

1. The orange button 2. Audio Type 3. Close apps 4. Enlarge my screen 5. Headphones 6. Questions Pane. SparkR 2

1. The orange button 2. Audio Type 3. Close apps 4. Enlarge my screen 5. Headphones 6. Questions Pane. SparkR 2 SparkR 1. The orange button 2. Audio Type 3. Close apps 4. Enlarge my screen 5. Headphones 6. Questions Pane SparkR 2 Lecture slides and/or video will be made available within one week Live Demonstration

More information

Getting more out of Matplotlib with GR

Getting more out of Matplotlib with GR Member of the Helmholtz Association Getting more out of Matplotlib with GR July 20 th 26 th, 2015 Bilbao EuroPython 2015 Josef Heinen @josef_heinen Visualization needs visualize and analyzing two- and

More information

How To Create A Data Visualization With Apache Spark And Zeppelin 2.5.3.5

How To Create A Data Visualization With Apache Spark And Zeppelin 2.5.3.5 Big Data Visualization using Apache Spark and Zeppelin Prajod Vettiyattil, Software Architect, Wipro Agenda Big Data and Ecosystem tools Apache Spark Apache Zeppelin Data Visualization Combining Spark

More information

CSE 6040 Computing for Data Analytics: Methods and Tools. Lecture 1 Course Overview

CSE 6040 Computing for Data Analytics: Methods and Tools. Lecture 1 Course Overview CSE 6040 Computing for Data Analytics: Methods and Tools Lecture 1 Course Overview DA KUANG, POLO CHAU GEORGIA TECH FALL 2014 Fall 2014 CSE 6040 COMPUTING FOR DATA ANALYSIS 1 Course Staff Instructor Da

More information

GR.jl Plotting for Julia based on GR

GR.jl Plotting for Julia based on GR Member of the Helmholtz Association GR.jl Plotting for Julia based on GR June 24 th 28 th, 2015 Massachusetts Institute of Technology, Cambridge, Massachusetts JuliaCon 2015 Josef Heinen @josef_heinen

More information

Big Data Paradigms in Python

Big Data Paradigms in Python Big Data Paradigms in Python San Diego Data Science and R Users Group January 2014 Kevin Davenport! http://kldavenport.com kldavenportjr@gmail.com @KevinLDavenport Thank you to our sponsors: Setting up

More information

Monitis Project Proposals for AUA. September 2014, Yerevan, Armenia

Monitis Project Proposals for AUA. September 2014, Yerevan, Armenia Monitis Project Proposals for AUA September 2014, Yerevan, Armenia Distributed Log Collecting and Analysing Platform Project Specifications Category: Big Data and NoSQL Software Requirements: Apache Hadoop

More information

Big Data at Spotify. Anders Arpteg, Ph D Analytics Machine Learning, Spotify

Big Data at Spotify. Anders Arpteg, Ph D Analytics Machine Learning, Spotify Big Data at Spotify Anders Arpteg, Ph D Analytics Machine Learning, Spotify Quickly about me Quickly about Spotify What is all the data used for? Quickly about Spark Hadoop MR vs Spark Need for (distributed)

More information

Data Science with Hadoop at Opower

Data Science with Hadoop at Opower Data Science with Hadoop at Opower Erik Shilts Advanced Analytics erik.shilts@opower.com What is Opower? A study: $$$ Turn off AC & Turn on Fan Environment Turn off AC & Turn on Fan Citizenship Turn off

More information

Scientific Programming, Analysis, and Visualization with Python. Mteor 227 Fall 2015

Scientific Programming, Analysis, and Visualization with Python. Mteor 227 Fall 2015 Scientific Programming, Analysis, and Visualization with Python Mteor 227 Fall 2015 Python The Big Picture Interpreted General purpose, high-level Dynamically type Multi-paradigm Object-oriented Functional

More information

MSwM examples. Jose A. Sanchez-Espigares, Alberto Lopez-Moreno Dept. of Statistics and Operations Research UPC-BarcelonaTech.

MSwM examples. Jose A. Sanchez-Espigares, Alberto Lopez-Moreno Dept. of Statistics and Operations Research UPC-BarcelonaTech. MSwM examples Jose A. Sanchez-Espigares, Alberto Lopez-Moreno Dept. of Statistics and Operations Research UPC-BarcelonaTech February 24, 2014 Abstract Two examples are described to illustrate the use of

More information

MEng, BSc Computer Science with Artificial Intelligence

MEng, BSc Computer Science with Artificial Intelligence School of Computing FACULTY OF ENGINEERING MEng, BSc Computer Science with Artificial Intelligence Year 1 COMP1212 Computer Processor Effective programming depends on understanding not only how to give

More information

Régression logistique : introduction

Régression logistique : introduction Chapitre 16 Introduction à la statistique avec R Régression logistique : introduction Une variable à expliquer binaire Expliquer un risque suicidaire élevé en prison par La durée de la peine L existence

More information

MEng, BSc Applied Computer Science

MEng, BSc Applied Computer Science School of Computing FACULTY OF ENGINEERING MEng, BSc Applied Computer Science Year 1 COMP1212 Computer Processor Effective programming depends on understanding not only how to give a machine instructions

More information

A Handbook of Statistical Analyses Using R. Brian S. Everitt and Torsten Hothorn

A Handbook of Statistical Analyses Using R. Brian S. Everitt and Torsten Hothorn A Handbook of Statistical Analyses Using R Brian S. Everitt and Torsten Hothorn CHAPTER 6 Logistic Regression and Generalised Linear Models: Blood Screening, Women s Role in Society, and Colonic Polyps

More information

The most powerful open source data science technologies in your browser.!! Yves Hilpisch

The most powerful open source data science technologies in your browser.!! Yves Hilpisch The most powerful open source data science technologies in your browser.!! Yves Hilpisch I. The Market and The Problem II. How We Solve The Problem III. Market Size and Facts IV. Strategic Opportunities

More information

Generalized Linear Models

Generalized Linear Models Generalized Linear Models We have previously worked with regression models where the response variable is quantitative and normally distributed. Now we turn our attention to two types of models where the

More information

Visualization of Semantic Windows with SciDB Integration

Visualization of Semantic Windows with SciDB Integration Visualization of Semantic Windows with SciDB Integration Hasan Tuna Icingir Department of Computer Science Brown University Providence, RI 02912 hti@cs.brown.edu February 6, 2013 Abstract Interactive Data

More information

Databricks. A Primer

Databricks. A Primer Databricks A Primer Who is Databricks? Databricks was founded by the team behind Apache Spark, the most active open source project in the big data ecosystem today. Our mission at Databricks is to dramatically

More information

MACHINE LEARNING. Meetup Tutorial - 22 January 2015 LADISPE - Politecnico di Torino - Italy. bit.ly/ml-italy

MACHINE LEARNING. Meetup Tutorial - 22 January 2015 LADISPE - Politecnico di Torino - Italy. bit.ly/ml-italy MACHINE LEARNING Meetup Tutorial - 22 January 2015 LADISPE - Politecnico di Torino - Italy bit.ly/ml-italy Why we are committed in growing a local MACHINE LEARNING and DEEP LEARNING community? Because

More information

Welcome to the second half ofour orientation on Spotfire Administration.

Welcome to the second half ofour orientation on Spotfire Administration. Welcome to the second half ofour orientation on Spotfire Administration. In this presentation, I ll give a quick overview of the products that can be used to enhance a Spotfire environment: TIBCO Metrics,

More information

Data Science at U of U

Data Science at U of U Data Science at U of U Je M. Phillips Assistant Professor, School of Computing Center for Extreme Data Management, Analysis, and Visualization Director, Data Management and Analysis Track University of

More information

Today's Topics. COMP 388/441: Human-Computer Interaction. simple 2D plotting. 1D techniques. Ancient plotting techniques. Data Visualization:

Today's Topics. COMP 388/441: Human-Computer Interaction. simple 2D plotting. 1D techniques. Ancient plotting techniques. Data Visualization: COMP 388/441: Human-Computer Interaction Today's Topics Overview of visualization techniques 1D charts, 2D plots, 3D+ techniques, maps A few guidelines for scientific visualization methods, guidelines,

More information

The data explosion is transforming science

The data explosion is transforming science Talk Outline The data tsunami and the 4 th paradigm of science The challenges for the long tail of science Where is the cloud being used now? The app marketplace SMEs Analytics as a service. What are the

More information

Sisense. Product Highlights. www.sisense.com

Sisense. Product Highlights. www.sisense.com Sisense Product Highlights Introduction Sisense is a business intelligence solution that simplifies analytics for complex data by offering an end-to-end platform that lets users easily prepare and analyze

More information

Intro to scientific programming (with Python) Pietro Berkes, Brandeis University

Intro to scientific programming (with Python) Pietro Berkes, Brandeis University Intro to scientific programming (with Python) Pietro Berkes, Brandeis University Next 4 lessons: Outline Scientific programming: best practices Classical learning (Hoepfield network) Probabilistic learning

More information

Databricks. A Primer

Databricks. A Primer Databricks A Primer Who is Databricks? Databricks vision is to empower anyone to easily build and deploy advanced analytics solutions. The company was founded by the team who created Apache Spark, a powerful

More information

Ethar Ibrahim Elsaka

Ethar Ibrahim Elsaka ethar.elsaka@gmail.com 9348 Cherry Hill Rd., Apt 621 College Park MD, 20740 USA Tel: 240 581 2664 Ethar Ibrahim Elsaka Education PhD Student, Department of Computer Science, University of Maryland at College

More information

Multivariate Logistic Regression

Multivariate Logistic Regression 1 Multivariate Logistic Regression As in univariate logistic regression, let π(x) represent the probability of an event that depends on p covariates or independent variables. Then, using an inv.logit formulation

More information

Data structures for statistical computing in Python Wes McKinney SciPy 2010 McKinney () Statistical Data Structures in Python SciPy 2010 1 / 31 Environments for statistics and data analysis The usual suspects:

More information

R / TERR. Ana Costa e SIlva, PhD Senior Data Scientist TIBCO. Copyright 2000-2013 TIBCO Software Inc.

R / TERR. Ana Costa e SIlva, PhD Senior Data Scientist TIBCO. Copyright 2000-2013 TIBCO Software Inc. R / TERR Ana Costa e SIlva, PhD Senior Data Scientist TIBCO Copyright 2000-2013 TIBCO Software Inc. Tower of Big and Fast Data Visual Data Discovery Hundreds of Records Millions of Records Key peformance

More information

Data-Intensive Applications on HPC Using Hadoop, Spark and RADICAL-Cybertools

Data-Intensive Applications on HPC Using Hadoop, Spark and RADICAL-Cybertools Data-Intensive Applications on HPC Using Hadoop, Spark and RADICAL-Cybertools Shantenu Jha, Andre Luckow, Ioannis Paraskevakos RADICAL, Rutgers, http://radical.rutgers.edu Agenda 1. Motivation and Background

More information

Analytic Modeling in Python

Analytic Modeling in Python Analytic Modeling in Python Why Choose Python for Analytic Modeling A White Paper by Visual Numerics August 2009 www.vni.com Analytic Modeling in Python Why Choose Python for Analytic Modeling by Visual

More information

Fast and Expressive Big Data Analytics with Python. Matei Zaharia UC BERKELEY

Fast and Expressive Big Data Analytics with Python. Matei Zaharia UC BERKELEY Fast and Expressive Big Data Analytics with Python Matei Zaharia UC Berkeley / MIT UC BERKELEY spark-project.org What is Spark? Fast and expressive cluster computing system interoperable with Apache Hadoop

More information

R YOU READY FOR PYTHON? Sunday 19th April, 2015

R YOU READY FOR PYTHON? Sunday 19th April, 2015 R YOU READY FOR PYTHON? Sunday 19th April, 2015 THIS IS NOT A PYTHON VS R TALK credits - https://meetmrholland.wordpress.com/2013/02/03/creative-5-tips-to-make-all-your-meetings-exactly-the-same/ WHO ARE

More information

Computer Information Systems (CIS)

Computer Information Systems (CIS) Computer Information Systems (CIS) CIS 113 Spreadsheet Software Applications Prerequisite: CIS 146 or spreadsheet experience This course provides students with hands-on experience using spreadsheet software.

More information

CS555: Distributed Systems [Fall 2015] Dept. Of Computer Science, Colorado State University

CS555: Distributed Systems [Fall 2015] Dept. Of Computer Science, Colorado State University CS 555: DISTRIBUTED SYSTEMS [SPARK] Shrideep Pallickara Computer Science Colorado State University Frequently asked questions from the previous class survey Streaming Significance of minimum delays? Interleaving

More information

Big Data and Data Science: Behind the Buzz Words

Big Data and Data Science: Behind the Buzz Words Big Data and Data Science: Behind the Buzz Words Peggy Brinkmann, FCAS, MAAA Actuary Milliman, Inc. April 1, 2014 Contents Big data: from hype to value Deconstructing data science Managing big data Analyzing

More information

Data Science Certificate Program

Data Science Certificate Program Information Technologies Programs Data Science Certificate Program Accelerate Your Career extension.uci.edu/datascience Offered in partnership with University of California, Irvine Extension s professional

More information

ANALYTICS CENTER LEARNING PROGRAM

ANALYTICS CENTER LEARNING PROGRAM Overview of Curriculum ANALYTICS CENTER LEARNING PROGRAM The following courses are offered by Analytics Center as part of its learning program: Course Duration Prerequisites 1- Math and Theory 101 - Fundamentals

More information

DB2 Web Query Interfaces

DB2 Web Query Interfaces DB2 Web Query Interfaces There are several different access methods within DB2 Web Query and their related products. Here is a brief summary of the various interface and access methods. Method: DB2 Web

More information

Hadoop & SAS Data Loader for Hadoop

Hadoop & SAS Data Loader for Hadoop Turning Data into Value Hadoop & SAS Data Loader for Hadoop Sebastiaan Schaap Frederik Vandenberghe Agenda What s Hadoop SAS Data management: Traditional In-Database In-Memory The Hadoop analytics lifecycle

More information

Chapter 13: Program Development and Programming Languages

Chapter 13: Program Development and Programming Languages Understanding Computers Today and Tomorrow 12 th Edition Chapter 13: Program Development and Programming Languages Learning Objectives Understand the differences between structured programming, object-oriented

More information

AcademyR Course Catalog

AcademyR Course Catalog AcademyR Course Catalog Table of Contents Our Philosophy...3 Courses Listed by Role Data Analyst...4 Data Scientist...6 R Programmer...9 Statistician.... 10 BI Developer... 11 System Administrator... 12

More information

Data Analytics at NERSC. Joaquin Correa JoaquinCorrea@lbl.gov NERSC Data and Analytics Services

Data Analytics at NERSC. Joaquin Correa JoaquinCorrea@lbl.gov NERSC Data and Analytics Services Data Analytics at NERSC Joaquin Correa JoaquinCorrea@lbl.gov NERSC Data and Analytics Services NERSC User Meeting August, 2015 Data analytics at NERSC Science Applications Climate, Cosmology, Kbase, Materials,

More information

Introducing open source statistical and data science tools to business analytics students and professionals

Introducing open source statistical and data science tools to business analytics students and professionals Detroit ASA January 2015 Introducing open source statistical and data science tools to business analytics students and professionals Mark Isken Assoc. Prof. of MIS School of Business Administration Oakland

More information

Operationalise Predictive Analytics

Operationalise Predictive Analytics Operationalise Predictive Analytics Publish SPSS, Excel and R reports online Predict online using SPSS and R models Access models and reports via Android app Organise people and content into projects Monitor

More information

Interactive Applications for Modeling and Analysis with Shiny

Interactive Applications for Modeling and Analysis with Shiny Interactive Applications for Modeling and Analysis with Shiny Presenter: Nicole Bishop Cindy Fryer, Paul Guill NASA GSFC Code 405 August 26, 2015 Introduction RStudio/Shiny offers quick and easy ways to

More information

TIBCO Spotfire Metrics Modeler User s Guide. Software Release 6.0 November 2013

TIBCO Spotfire Metrics Modeler User s Guide. Software Release 6.0 November 2013 TIBCO Spotfire Metrics Modeler User s Guide Software Release 6.0 November 2013 Important Information SOME TIBCO SOFTWARE EMBEDS OR BUNDLES OTHER TIBCO SOFTWARE. USE OF SUCH EMBEDDED OR BUNDLED TIBCO SOFTWARE

More information

Federated SQL on Hadoop and Beyond: Leveraging Apache Geode to Build a Poor Man's SAP HANA. by Christian Tzolov @christzolov

Federated SQL on Hadoop and Beyond: Leveraging Apache Geode to Build a Poor Man's SAP HANA. by Christian Tzolov @christzolov Federated SQL on Hadoop and Beyond: Leveraging Apache Geode to Build a Poor Man's SAP HANA by Christian Tzolov @christzolov Whoami Christian Tzolov Technical Architect at Pivotal, BigData, Hadoop, SpringXD,

More information

Interactive Visualization

Interactive Visualization 7th China R Conf (Beijing), 2014-05-25 Interactive Visualization with R 王亮博 (亮亮) shared under CC 4.0 BY Esc to overview to navigate Online slide on http://ccwang002.gitcafe.com/chinarconf-interactive-vis/

More information
