Data Science Will computer science and informatics eat our lunch?

Size: px
Start display at page:

Download "Data Science Will computer science and informatics eat our lunch?"

Transcription

1 Data Science Will computer science and informatics eat our lunch? Thomas Lumley University of Auckland (g)tslumley statschat.org.nz notstat schat.tumblr.com

2 In the 1920s, the computing labs helped establish statistics on the American continent. Without them, even a modest study was beyond the ability of an individual statistician. At the same time, statistics labs often had the most powerful computing machines within their larger institution. They showed how organized computing could benefit science and provided a place for the earliest of computer scientists to test their ideas. -- Grier The origins of statistical computing, Amstat Online

3 Fig. 2. The Hollerith Electric Tabulating System

4 Iowa State Statistical Computing Service

5 CSIRAC

6 Iowa State Statistical Computing Service

7 Iowa State statistics PhD prelim exam Two eight-hour written-on-paper exams covering : Theory of Probability and Statistics I. Theory of Probability and Statistics II. Statistical Methods I. Statistical Methods II. Advanced Statistical Methods. Advanced Probability Theory. Advanced Theory of Statistical Inference. They do require a stat computing course: 1 credit/30

8 What is data science? and where can we get some?

9 Data Science is just a fancy name for statistics. Fitting simple models to messy and sometimes large data sets Combination of standard black-box fitting tools and good graphics. Doesn t require any fundamental knowledge our students don t have. Needs good computing skills, which our students can learn

10 Need to avoid going overboard with computing Data Wrangling isn t statistics Cleaning, tidying, querying, reformatting, transforming, getting in and out of databases,

11 Data Science is just a fancy name for statistics. Data Wrangling isn t statistics If you value self-consistency, you can hold at most one of these opinions. A/Prof Jenny Bryan, UBC (less than one is good)

12 Data science is statistics in the same way that epidemiology is statistics opinion polling is statistics ag. field trials are statistics

13 I did think, however, that many well-known applied statisticians attacked problems without the necessary mathematical knowledge and manipulative skill. Moreover, I believed that a principal cause of failure among medical research scientists was the lack of basic scientific knowledge in their special chosen field. H. O. Lancaster

14 Computing is easier to steal Define and explain the relevance to applied statistics of: Suffix trees Supernodal Cholesky factorization Column-store database Translation look-aside buffer

15 Computing is easier to steal Need to teach our data science students: A bit about databases and SQL A statistical programming language (eg R) Abstractions such as tidy data, sparse, map/reduce Reproducible data analysis (eg rmarkdown)?collaborative version control (eg git/github) Force students to work with a wild-caught data set and I'm still pretty sure some of the data is Permit interested students to learn the high-tech data structures missing, and butalgorithms could still stuff. be here, in this ONE HUNDRED SHEET excel file a PhD student on Twitter

16 But we don t know this stuff! let mego glethat for you Google Search I'm Feeling Lucky The computing folks are way better at dissemination than us Unlike statistics, the computer can tell you if you get it wrong.

17 Free online courses Books Related Courses M Exploratory Data Analysis Reproducible Research Statistical Inference /osljjÿp o D Dynamic Documents with R and knitr Yihui Xie Pract cal Dat Scienc * Nina/ml John Hooni Doing Data Science STRAIGHT TALK FROM THE FRONTLINE Getting and Cleaning Data Regression Models Developing Data Products d«n» «- dcns<ty(dot>i. n - npts) IIMIMINt Cathy O'Neil & Rachel Schutt dy2 <- M» - JfCIO KqtwlM «- rtrfyel.). length(dx)) lf(flu T> confshade(dx2. s«qb«lo». dy? S' I - 5>l The Data Scientist's Toolbox Data Analysis and Statistical Inference People who make their notes available ÿ 5b5 Home FAQ Syllabus Topics People J Data wrangling, exploration, and analysis with R UBC STAT 545A and 547M Software tools Open source environment for deep analysis of largecomplex data The Power of R with Big Data Get Started inminutes Resources to Learn & Join Learn how to explore, groom, visualize, and analyze dab make all of that reproducible, reusable, ar using R software carpentry

18 What do we have to offer? Popularity? Romance? Excitement?

19 Big Complex Messy Badly Sampled Creepy Vital to ask the right questions

20 Big Data Computer folks are better at this than us, but statistical insights important eg: Noel Cressie: fast computation for spatial models Bill Cleveland: optimising the divide/recombine strategy

21 Big Data Computer folks are better at this than us, Big doesn t mean gigabytes.

22 Complex Data Models for complex data Summaries (parameters, estimators) that answer the real questions Robustness of meaning, not just of power and level.

23 Complex Data: networks F(x)µ1- x -a Power laws: come from network, queue, Matthew effect process blog links page views long tail sales data citations to papers word frequencies earthquake sizes

24 Complex Data: networks F(x)µ1- x -a Power laws: come from network, queue, Matthew effect process blog links page views long tail sales data citations to papers word frequencies earthquake sizes All fit lognormal better, some much better Clauset et al, SIAM Rev. 2009

25 Complex Data: networks Random graph models for connections Erdös-Renyi graphs Exponential Random Graph Models (ERGMs) meaningful parameters, nice likelihood ERGMs are not consistent under sampling. [Shalizi et al, Ann Stat]

26 Complex Data Robustness of meaning can be hard: Suppose a Wilcoxon test shows X > Y, Y>Z What does this tell us about Means of X and Z? Medians of X and Z? Wilcoxon test of X and Z?

27 i i Messy Data Good applied statisticians know from messy data. o CM - X O and I'm still pretty sure some of the data is missing, but it could still be here, in this ONE HUNDRED SHEET excel file blooc Diastolic NnT i o r o a PhD student on Twitter 0 ao o c Age (years)

28 Badly Sampled Whom the Gods Would Destroy, They First Give Real-time Analytics [Dan McKinley, Etsy] This line of thinking is a trap. It's important to divorce the concepts of operational metrics and product analytics. Confusing how we do things with how we decide which things to do is a fatal mistake. Because non-representativeness of short time slices

29 Badly Sampled Statisticians know about sampling design weighting matching selection models

30 Creepy What questions should data answer? income Mount Eden atistics NZ Chris McDowall Based on census meshblock: not actual household data

31 Creepy (and Evil) What questions should data answer? Familiar issues: Bioethics Statistical disclosure/confidentiality New, but statistical issues: algorithm audit/accountability We also talk to social scientists more. (not enough)

32 Creepy (and Evil) How do we learn more? let me LjOOQie that for you Googlo Search I'm Fooling Lucky Cathy O Neil (mathbabe.org) Ed Felten danah boyd

33 Summary The hard problems in data science are hard. Many of the computational ones are solved (ish) Many of the unsolved ones are closer to statistics

34 Data Science Will computer science and informatics eat our lunch? Only if we let them, and it would be bad for data science, too

What is Data Science? Girl Develop It! Meetup Renée M. P. Teate, March 2015

What is Data Science? Girl Develop It! Meetup Renée M. P. Teate, March 2015 What is Data Science? { Girl Develop It! Meetup Renée M. P. Teate, March 2015 Let s start with: What is Data? http://upload.wikimedia.org/wikipedia/commons/f/f0/darpa _Big_Data.jpg https://encryptedtbn2.gstatic.com/images?q=tbn:and9gcs9dku3_tzi-swwyaqee5y0ehuvoiznsya_raknubbd0jyxpx7pw

More information

What is Data Science? Data, Databases, and the Extraction of Knowledge Renée T., @becomingdatasci, November 2014

What is Data Science? Data, Databases, and the Extraction of Knowledge Renée T., @becomingdatasci, November 2014 What is Data Science? { Data, Databases, and the Extraction of Knowledge Renée T., @becomingdatasci, November 2014 Let s start with: What is Data? http://upload.wikimedia.org/wikipedia/commons/f/f0/darpa

More information

Healthcare data analytics. Da-Wei Wang Institute of Information Science wdw@iis.sinica.edu.tw

Healthcare data analytics. Da-Wei Wang Institute of Information Science wdw@iis.sinica.edu.tw Healthcare data analytics Da-Wei Wang Institute of Information Science wdw@iis.sinica.edu.tw Outline Data Science Enabling technologies Grand goals Issues Google flu trend Privacy Conclusion Analytics

More information

Why Taking This Course? Course Introduction, Descriptive Statistics and Data Visualization. Learning Goals. GENOME 560, Spring 2012

Why Taking This Course? Course Introduction, Descriptive Statistics and Data Visualization. Learning Goals. GENOME 560, Spring 2012 Why Taking This Course? Course Introduction, Descriptive Statistics and Data Visualization GENOME 560, Spring 2012 Data are interesting because they help us understand the world Genomics: Massive Amounts

More information

ESS event: Big Data in Official Statistics. Antonino Virgillito, Istat

ESS event: Big Data in Official Statistics. Antonino Virgillito, Istat ESS event: Big Data in Official Statistics Antonino Virgillito, Istat v erbi v is 1 About me Head of Unit Web and BI Technologies, IT Directorate of Istat Project manager and technical coordinator of Web

More information

Data Mining Methods: Applications for Institutional Research

Data Mining Methods: Applications for Institutional Research Data Mining Methods: Applications for Institutional Research Nora Galambos, PhD Office of Institutional Research, Planning & Effectiveness Stony Brook University NEAIR Annual Conference Philadelphia 2014

More information

In this presentation, you will be introduced to data mining and the relationship with meaningful use.

In this presentation, you will be introduced to data mining and the relationship with meaningful use. In this presentation, you will be introduced to data mining and the relationship with meaningful use. Data mining refers to the art and science of intelligent data analysis. It is the application of machine

More information

How to Make Money with Google Adwords. For Cleaning Companies. H i tm a n. Advertising

How to Make Money with Google Adwords. For Cleaning Companies. H i tm a n. Advertising How to Make Money with Google Adwords For Cleaning Companies. H i tm a n Advertising Target Clients Profitably Google Adwords can be one of the best returns for your advertising dollar. Or, it could be

More information

Introduction to Data Science: CptS 483-06 Syllabus First Offering: Fall 2015

Introduction to Data Science: CptS 483-06 Syllabus First Offering: Fall 2015 Course Information Introduction to Data Science: CptS 483-06 Syllabus First Offering: Fall 2015 Credit Hours: 3 Semester: Fall 2015 Meeting times and location: MWF, 12:10 13:00, Sloan 163 Course website:

More information

Why Big Data is not Big Hype in Economics and Finance?

Why Big Data is not Big Hype in Economics and Finance? Why Big Data is not Big Hype in Economics and Finance? Ariel M. Viale Marshall E. Rinker School of Business Palm Beach Atlantic University West Palm Beach, April 2015 1 The Big Data Hype 2 Big Data as

More information

Computer Programming for the Social Sciences

Computer Programming for the Social Sciences Department of Social and Political Sciences Computer Programming for the Social Sciences This two day workshop will teach beginner level, practical computer programming skills for use in social science

More information

ANALYTICS A FUTURE IN ANALYTICS

ANALYTICS A FUTURE IN ANALYTICS ANALYTICS A FUTURE IN ANALYTICS WHAT IS ANALYTICS? In the information age in which we live, almost all of us consume and produce digital data, either for business, community or private uses. We access

More information

The 3 questions to ask yourself about BIG DATA

The 3 questions to ask yourself about BIG DATA The 3 questions to ask yourself about BIG DATA Do you have a big data problem? Companies looking to tackle big data problems are embarking on a journey that is full of hype, buzz, confusion, and misinformation.

More information

U N D E R S TA N D I N G T H E D N A O F DATA SCIENCE. 2014 Persontyle Ltd. All rights reserved.

U N D E R S TA N D I N G T H E D N A O F DATA SCIENCE. 2014 Persontyle Ltd. All rights reserved. U N D E R S TA N D I N G T H E D N A O F DATA SCIENCE 010100101010011110100101010 101010101010101010101001010 101010100101010101010010101 WHAT IS DATA SCIENCE? One day course to understand the concepts

More information

GETTING AHEAD OF THE COMPETITION WITH DATA MINING

GETTING AHEAD OF THE COMPETITION WITH DATA MINING WHITE PAPER GETTING AHEAD OF THE COMPETITION WITH DATA MINING Ultimately, data mining boils down to continually finding new ways to be more profitable which in today s competitive world means making better

More information

Francois Ajenstat, Tableau Stephanie McReynolds, Aster Data Steve e Wooledge, Aster Data

Francois Ajenstat, Tableau Stephanie McReynolds, Aster Data Steve e Wooledge, Aster Data Deep Data Exploration: Find Patterns in Your Data Faster & Easier Curt Monash, Founder and President, Monash Research Francois Ajenstat, Tableau Stephanie McReynolds, Aster Data Steve e Wooledge, Aster

More information

Practical Data Science with R

Practical Data Science with R Practical Data Science with R Instructor Matthew Renze Twitter: @matthewrenze Email: matthew@matthewrenze.com Web: http://www.matthewrenze.com Course Description Data science is the practice of transforming

More information

Customer Case Study. Automatic Labs

Customer Case Study. Automatic Labs Customer Case Study Automatic Labs Customer Case Study Automatic Labs Benefits Validated product in days Completed complex queries in minutes Freed up 1 full-time data scientist Infrastructure savings

More information

Computational Science and Informatics (Data Science) Programs at GMU

Computational Science and Informatics (Data Science) Programs at GMU Computational Science and Informatics (Data Science) Programs at GMU Kirk Borne George Mason University School of Physics, Astronomy, & Computational Sciences http://spacs.gmu.edu/ Outline Graduate Program

More information

Big Data Big Knowledge?

Big Data Big Knowledge? EBPI Epidemiology, Biostatistics and Prevention Institute Big Data Big Knowledge? Torsten Hothorn 2015-03-06 The end of theory The End of Theory: The Data Deluge Makes the Scientific Method Obsolete (Chris

More information

POL 204b: Research and Methodology

POL 204b: Research and Methodology POL 204b: Research and Methodology Winter 2010 T 9:00-12:00 SSB104 & 139 Professor Scott Desposato Office: 325 Social Sciences Building Office Hours: W 1:00-3:00 phone: 858-534-0198 email: swd@ucsd.edu

More information

Streamline your supply chain with data. How visual analysis helps eliminate operational waste

Streamline your supply chain with data. How visual analysis helps eliminate operational waste Streamline your supply chain with data How visual analysis helps eliminate operational waste emagazine October 2011 contents 3 Create a data-driven supply chain: 4 paths to insight 4 National Motor Club

More information

The Edge Editions of SAP InfiniteInsight Overview

The Edge Editions of SAP InfiniteInsight Overview Analytics Solutions from SAP The Edge Editions of SAP InfiniteInsight Overview Enabling Predictive Insights with Mouse Clicks, Not Computer Code Table of Contents 3 The Case for Predictive Analysis 5 Fast

More information

Data Science with Hadoop at Opower

Data Science with Hadoop at Opower Data Science with Hadoop at Opower Erik Shilts Advanced Analytics erik.shilts@opower.com What is Opower? A study: $$$ Turn off AC & Turn on Fan Environment Turn off AC & Turn on Fan Citizenship Turn off

More information

Data Analytics in Organisations and Business

Data Analytics in Organisations and Business Data Analytics in Organisations and Business Dr. Isabelle E-mail: isabelle.flueckiger@math.ethz.ch 1 Data Analytics in Organisations and Business Some organisational information: Tutorship: Gian Thanei:

More information

FIVE STEPS FOR DELIVERING SELF-SERVICE BUSINESS INTELLIGENCE TO EVERYONE CONTENTS

FIVE STEPS FOR DELIVERING SELF-SERVICE BUSINESS INTELLIGENCE TO EVERYONE CONTENTS FIVE STEPS FOR DELIVERING SELF-SERVICE BUSINESS INTELLIGENCE TO EVERYONE Wayne Eckerson CONTENTS Know Your Business Users Create a Taxonomy of Information Requirements Map Users to Requirements Map User

More information

What is Data Analysis. Kerala School of MathematicsCourse in Statistics for Scientis. Introduction to Data Analysis. Steps in a Statistical Study

What is Data Analysis. Kerala School of MathematicsCourse in Statistics for Scientis. Introduction to Data Analysis. Steps in a Statistical Study Kerala School of Mathematics Course in Statistics for Scientists Introduction to Data Analysis T.Krishnan Strand Life Sciences, Bangalore What is Data Analysis Statistics is a body of methods how to use

More information

A future career in analytics

A future career in analytics A future career in analytics What is a career in analytics about? In the information age in which we live, almost all of us consume and produce digital data, either for business, community or private uses.

More information

UNIFY YOUR (BIG) DATA

UNIFY YOUR (BIG) DATA UNIFY YOUR (BIG) DATA ANALYTIC STRATEGY GIVE ANY USER ANY ANALYTIC ON ANY DATA Scott Gnau President, Teradata Labs scott.gnau@teradata.com t Unify Your (Big) Data Analytic Strategy Technology excitement:

More information

» Dealing with Big Data: David Hakken Weighs In blog.castac.org file:///users/dhakken/documents/» Dealing with Big Data_ Dav...

» Dealing with Big Data: David Hakken Weighs In blog.castac.org file:///users/dhakken/documents/» Dealing with Big Data_ Dav... Search: Go blog.castac.org From the Committee on the Anthropology of Science, Technology, and Computing (CASTAC) About Adventures in Pedagogy Beyond the Academy Member Sound-Off News, Links, and Pointers

More information

Challenges, Tools and Examples for Big Data Inference

Challenges, Tools and Examples for Big Data Inference Challenges, Tools and Examples for Big Data Inference Jean-François Plante, HEC Montréal Closing Conference: Statistical and Computational Analytics for Big Data June 12 th, 2015 What is Big Data? Dan

More information

Interoperability and Analytics February 29, 2016

Interoperability and Analytics February 29, 2016 Interoperability and Analytics February 29, 2016 Matthew Hoffman MD, CMIO Utah Health Information Network Conflict of Interest Matthew Hoffman, MD Has no real or apparent conflicts of interest to report.

More information

Making data predictive why reactive just isn t enough

Making data predictive why reactive just isn t enough Making data predictive why reactive just isn t enough Andrew Peterson, Ph.D. Principal Data Scientist Soltius NZ, Ltd. New Zealand 2014 Big Data and Analytics Forum 18 August, 2014 Caveats and disclaimer:

More information

Tips to ensuring the success of big data analytics initiatives

Tips to ensuring the success of big data analytics initiatives Tips to ensuring the success of big data Big data analytics is hot. Read any IT publication or website and you ll see business intelligence (BI) vendors and their systems integration partners pitching

More information

Engineering Problem Solving and Excel. EGN 1006 Introduction to Engineering

Engineering Problem Solving and Excel. EGN 1006 Introduction to Engineering Engineering Problem Solving and Excel EGN 1006 Introduction to Engineering Mathematical Solution Procedures Commonly Used in Engineering Analysis Data Analysis Techniques (Statistics) Curve Fitting techniques

More information

Master of Science in Healthcare Informatics and Analytics Program Overview

Master of Science in Healthcare Informatics and Analytics Program Overview Master of Science in Healthcare Informatics and Analytics Program Overview The program is a 60 credit, 100 week course of study that is designed to graduate students who: Understand and can apply the appropriate

More information

HR STILL GETTING IT WRONG BIG DATA & PREDICTIVE ANALYTICS THE RIGHT WAY

HR STILL GETTING IT WRONG BIG DATA & PREDICTIVE ANALYTICS THE RIGHT WAY HR STILL GETTING IT WRONG BIG DATA & PREDICTIVE ANALYTICS THE RIGHT WAY OVERVIEW Research cited by Forbes estimates that more than half of companies sampled (over 60%) are investing in big data and predictive

More information

Big Data to Knowledge (BD2K)

Big Data to Knowledge (BD2K) Big Data to Knowledge () potential funding agency synergies Jennie Larkin, PhD Office of the Associate Director of Data Science National Institutes of Health idash-pscanner meeting UCSD September 16, 2014

More information

A Robust Method for Solving Transcendental Equations

A Robust Method for Solving Transcendental Equations www.ijcsi.org 413 A Robust Method for Solving Transcendental Equations Md. Golam Moazzam, Amita Chakraborty and Md. Al-Amin Bhuiyan Department of Computer Science and Engineering, Jahangirnagar University,

More information

Six Signs. you are ready for BI WHITE PAPER

Six Signs. you are ready for BI WHITE PAPER Six Signs you are ready for BI WHITE PAPER LET S TAKE A LOOK AT THE WAY YOU MIGHT BE MONITORING AND MEASURING YOUR COMPANY About the auther You re managing information from a number of different data sources.

More information

Statistics for BIG data

Statistics for BIG data Statistics for BIG data Statistics for Big Data: Are Statisticians Ready? Dennis Lin Department of Statistics The Pennsylvania State University John Jordan and Dennis K.J. Lin (ICSA-Bulletine 2014) Before

More information

Course Title: Advanced Topics in Quantitative Methods: Educational Data Science Practicum

Course Title: Advanced Topics in Quantitative Methods: Educational Data Science Practicum COURSE NUMBER: APSTA- GE.2017 Course Title: Advanced Topics in Quantitative Methods: Educational Data Science Practicum Number of Credits: 2 Meeting Pattern: 3 hours per week, 7 weeks; first class meets

More information

INDEX. Introduction Page 3. Methodology Page 4. Findings. Conclusion. Page 5. Page 10

INDEX. Introduction Page 3. Methodology Page 4. Findings. Conclusion. Page 5. Page 10 FINDINGS 1 INDEX 1 2 3 4 Introduction Page 3 Methodology Page 4 Findings Page 5 Conclusion Page 10 INTRODUCTION Our 2016 Data Scientist report is a follow up to last year s effort. Our aim was to survey

More information

Data Analytics at NICTA. Stephen Hardy National ICT Australia (NICTA) shardy@nicta.com.au

Data Analytics at NICTA. Stephen Hardy National ICT Australia (NICTA) shardy@nicta.com.au Data Analytics at NICTA Stephen Hardy National ICT Australia (NICTA) shardy@nicta.com.au NICTA Copyright 2013 Outline Big data = science! Data analytics at NICTA Discrete Finite Infinite Machine Learning

More information

POSTGRADUATE PROGRAMS IN APPLIED DATA ANALYTICS

POSTGRADUATE PROGRAMS IN APPLIED DATA ANALYTICS POSTGRADUATE PROGRAMS IN APPLIED DATA ANALYTICS ANU College of Engineering & Computer Science Postgraduate Programs in Applied Data and Analytics 1 ANU is pleased to offer new postgraduate study opportunities

More information

Service courses for graduate students in degree programs other than the MS or PhD programs in Biostatistics.

Service courses for graduate students in degree programs other than the MS or PhD programs in Biostatistics. Course Catalog In order to be assured that all prerequisites are met, students must acquire a permission number from the education coordinator prior to enrolling in any Biostatistics course. Courses are

More information

III Big Data Technologies

III Big Data Technologies III Big Data Technologies Today, new technologies make it possible to realize value from Big Data. Big data technologies can replace highly customized, expensive legacy systems with a standard solution

More information

Predictive Analytics Enters the Mainstream

Predictive Analytics Enters the Mainstream Ventana Research: Predictive Analytics Enters the Mainstream Predictive Analytics Enters the Mainstream Taking Advantage of Trends to Gain Competitive Advantage White Paper Sponsored by 1 Ventana Research

More information

Five Reasons Spotfire Is Better than Excel for Business Data Analytics

Five Reasons Spotfire Is Better than Excel for Business Data Analytics Five Reasons Spotfire Is Better than Excel for Business Data Analytics A hugely versatile application, Microsoft Excel is the Swiss Army Knife of IT, able to cope with all kinds of jobs from managing personal

More information

A Review of "Free" Massive Open Online Content (MOOC) for SAS Learners

A Review of Free Massive Open Online Content (MOOC) for SAS Learners PharmaSUG 2015 Paper A Review of "Free" Massive Open Online Content (MOOC) for SAS Learners Kirk Paul Lafler, Software Intelligence Corporation Abstract Leading online providers are now offering SAS users

More information

A Visualization is Worth a Thousand Tables: How IBM Business Analytics Lets Users See Big Data

A Visualization is Worth a Thousand Tables: How IBM Business Analytics Lets Users See Big Data White Paper A Visualization is Worth a Thousand Tables: How IBM Business Analytics Lets Users See Big Data Contents Executive Summary....2 Introduction....3 Too much data, not enough information....3 Only

More information

Training for Big Data

Training for Big Data Training for Big Data Learnings from the CATS Workshop Raghu Ramakrishnan Technical Fellow, Microsoft Head, Big Data Engineering Head, Cloud Information Services Lab Store any kind of data What is Big

More information

Azure Machine Learning, SQL Data Mining and R

Azure Machine Learning, SQL Data Mining and R Azure Machine Learning, SQL Data Mining and R Day-by-day Agenda Prerequisites No formal prerequisites. Basic knowledge of SQL Server Data Tools, Excel and any analytical experience helps. Best of all:

More information

Introduction to predictive modeling and data mining

Introduction to predictive modeling and data mining Introduction to predictive modeling and data mining Rebecca C. Steorts Predictive Modeling and Data Mining: STA 521 August 25 2015 1 Today s Menu 1. Brief history of data science (from slides of Bin Yu)

More information

INTRODUCING AZURE MACHINE LEARNING

INTRODUCING AZURE MACHINE LEARNING David Chappell INTRODUCING AZURE MACHINE LEARNING A GUIDE FOR TECHNICAL PROFESSIONALS Sponsored by Microsoft Corporation Copyright 2015 Chappell & Associates Contents What is Machine Learning?... 3 The

More information

BIOINF 585 Fall 2015 Machine Learning for Systems Biology & Clinical Informatics http://www.ccmb.med.umich.edu/node/1376

BIOINF 585 Fall 2015 Machine Learning for Systems Biology & Clinical Informatics http://www.ccmb.med.umich.edu/node/1376 Course Director: Dr. Kayvan Najarian (DCM&B, kayvan@umich.edu) Lectures: Labs: Mondays and Wednesdays 9:00 AM -10:30 AM Rm. 2065 Palmer Commons Bldg. Wednesdays 10:30 AM 11:30 AM (alternate weeks) Rm.

More information

Jay Buckingham Dynamic Signal jbuckingham@dynamicsignal.com

Jay Buckingham Dynamic Signal jbuckingham@dynamicsignal.com Jay Buckingham Dynamic Signal jbuckingham@dynamicsignal.com Financial Times PeHub.com Wall Street Journal Harvard Business Review Making use of vast amounts of data to: Discover what we don t know Obtain

More information

Use advanced techniques for summary and visualization of complex data for exploratory analysis and presentation.

Use advanced techniques for summary and visualization of complex data for exploratory analysis and presentation. MS Biostatistics MS Biostatistics Competencies Study Development: Work collaboratively with biomedical or public health researchers and PhD biostatisticians, as necessary, to provide biostatistical expertise

More information

Practical Data Science with Azure Machine Learning, SQL Data Mining, and R

Practical Data Science with Azure Machine Learning, SQL Data Mining, and R Practical Data Science with Azure Machine Learning, SQL Data Mining, and R Overview This 4-day class is the first of the two data science courses taught by Rafal Lukawiecki. Some of the topics will be

More information

5 Point Social Media Action Plan.

5 Point Social Media Action Plan. 5 Point Social Media Action Plan. Workshop delivered by Ian Gibbins, IG Media Marketing Ltd (ian@igmediamarketing.com, tel: 01733 241537) On behalf of the Chambers Communications Sector Introduction: There

More information

Actuary vs Data Scientist

Actuary vs Data Scientist Actuary vs Data Scientist Richard Pugh Chief Data Scientist, Mango Solutions President @ R Consortium Chris Reynolds Head of Life Solutions Actuarial, PartnerRe 10 November 2015 Disclaimer The following

More information

SURVEY REPORT DATA SCIENCE SOCIETY 2014

SURVEY REPORT DATA SCIENCE SOCIETY 2014 SURVEY REPORT DATA SCIENCE SOCIETY 2014 TABLE OF CONTENTS Contents About the Initiative 1 Report Summary 2 Participants Info 3 Participants Expertise 6 Suggested Discussion Topics 7 Selected Responses

More information

Five Tips for Presenting Data Analyses: Telling a Good Story with Data

Five Tips for Presenting Data Analyses: Telling a Good Story with Data Five Tips for Presenting Data Analyses: Telling a Good Story with Data As a professional business or data analyst you have both the tools and the knowledge needed to analyze and understand data collected

More information

Confidence intervals, t tests, P values

Confidence intervals, t tests, P values Confidence intervals, t tests, P values Joe Felsenstein Department of Genome Sciences and Department of Biology Confidence intervals, t tests, P values p.1/31 Normality Everybody believes in the normal

More information

Statistics, Big Data and Data Science!?

Statistics, Big Data and Data Science!? Statistics, Big Data and Data Science!? Prof. Dr. Göran Kauermann Ludwig-Maximilians-Universität Munich, Germany Statistics, Big Data and Data Science Statistics Founded around 1900 with the seminal work

More information

Big data in R EPIC 2015

Big data in R EPIC 2015 Big data in R EPIC 2015 Big Data: the new 'The Future' In which Forbes magazine finds common ground with Nancy Krieger (for the first time ever?), by arguing the need for theory-driven analysis This future

More information

Sunnie Chung. Cleveland State University

Sunnie Chung. Cleveland State University Sunnie Chung Cleveland State University Data Scientist Big Data Processing Data Mining 2 INTERSECT of Computer Scientists and Statisticians with Knowledge of Data Mining AND Big data Processing Skills:

More information

Big Data and Privacy. Fritz Henglein Dept. of Computer Science, University of Copenhagen. Finance IT Day Riga, 2015-03-26

Big Data and Privacy. Fritz Henglein Dept. of Computer Science, University of Copenhagen. Finance IT Day Riga, 2015-03-26 Big Data and Privacy Fritz Henglein Dept. of Computer Science, University of Copenhagen Finance IT Day Riga, 2015-03-26 About me Professor, Programming Languages and Systems, University of Copenhagen Director,

More information

Introduction to Big Data Analytics p. 1 Big Data Overview p. 2 Data Structures p. 5 Analyst Perspective on Data Repositories p.

Introduction to Big Data Analytics p. 1 Big Data Overview p. 2 Data Structures p. 5 Analyst Perspective on Data Repositories p. Introduction p. xvii Introduction to Big Data Analytics p. 1 Big Data Overview p. 2 Data Structures p. 5 Analyst Perspective on Data Repositories p. 9 State of the Practice in Analytics p. 11 BI Versus

More information

DATA SCIENCE CURRICULUM WEEK 1 ONLINE PRE-WORK INSTALLING PACKAGES COMMAND LINE CODE EDITOR PYTHON STATISTICS PROJECT O5 PROJECT O3 PROJECT O2

DATA SCIENCE CURRICULUM WEEK 1 ONLINE PRE-WORK INSTALLING PACKAGES COMMAND LINE CODE EDITOR PYTHON STATISTICS PROJECT O5 PROJECT O3 PROJECT O2 DATA SCIENCE CURRICULUM Before class even begins, students start an at-home pre-work phase. When they convene in class, students spend the first eight weeks doing iterative, project-centered skill acquisition.

More information

www.pwc.com/oracle Next presentation starting soon Business Analytics using Big Data to gain competitive advantage

www.pwc.com/oracle Next presentation starting soon Business Analytics using Big Data to gain competitive advantage www.pwc.com/oracle Next presentation starting soon Business Analytics using Big Data to gain competitive advantage If every image made and every word written from the earliest stirring of civilization

More information

5 - Low Cost Ways to Increase Your

5 - Low Cost Ways to Increase Your - 5 - Low Cost Ways to Increase Your DIGITAL MARKETING Presence Contents Introduction Social Media Email Marketing Blogging Video Marketing Website Optimization Final Note 3 4 7 9 11 12 14 2 Taking a Digital

More information

Correlational Research

Correlational Research Correlational Research Chapter Fifteen Correlational Research Chapter Fifteen Bring folder of readings The Nature of Correlational Research Correlational Research is also known as Associational Research.

More information

Big Data Analytics. Lucas Rego Drumond

Big Data Analytics. Lucas Rego Drumond Big Data Analytics Lucas Rego Drumond Information Systems and Machine Learning Lab (ISMLL) Institute of Computer Science University of Hildesheim, Germany Big Data Analytics Big Data Analytics 1 / 36 Outline

More information

WebFOCUS RStat. RStat. Predict the Future and Make Effective Decisions Today. WebFOCUS RStat

WebFOCUS RStat. RStat. Predict the Future and Make Effective Decisions Today. WebFOCUS RStat Information Builders enables agile information solutions with business intelligence (BI) and integration technologies. WebFOCUS the most widely utilized business intelligence platform connects to any enterprise

More information

Everything you wanted to know about using Hexadecimal and Octal Numbers in Visual Basic 6

Everything you wanted to know about using Hexadecimal and Octal Numbers in Visual Basic 6 Everything you wanted to know about using Hexadecimal and Octal Numbers in Visual Basic 6 Number Systems No course on programming would be complete without a discussion of the Hexadecimal (Hex) number

More information

SAS Certificate Applied Statistics and SAS Programming

SAS Certificate Applied Statistics and SAS Programming SAS Certificate Applied Statistics and SAS Programming SAS Certificate Applied Statistics and Advanced SAS Programming Brigham Young University Department of Statistics offers an Applied Statistics and

More information

Big Data Analytics. Genoveva Vargas-Solar http://www.vargas-solar.com/big-data-analytics French Council of Scientific Research, LIG & LAFMIA Labs

Big Data Analytics. Genoveva Vargas-Solar http://www.vargas-solar.com/big-data-analytics French Council of Scientific Research, LIG & LAFMIA Labs 1 Big Data Analytics Genoveva Vargas-Solar http://www.vargas-solar.com/big-data-analytics French Council of Scientific Research, LIG & LAFMIA Labs Montevideo, 22 nd November 4 th December, 2015 INFORMATIQUE

More information

Top 5 best practices for creating effective dashboards. and the 7 mistakes you don t want to make

Top 5 best practices for creating effective dashboards. and the 7 mistakes you don t want to make Top 5 best practices for creating effective dashboards and the 7 mistakes you don t want to make p2 Financial services professionals are buried in data that measure and track: relationships and processes,

More information

Statistics in Applications III. Distribution Theory and Inference

Statistics in Applications III. Distribution Theory and Inference 2.2 Master of Science Degrees The Department of Statistics at FSU offers three different options for an MS degree. 1. The applied statistics degree is for a student preparing for a career as an applied

More information

Using Excel (Microsoft Office 2007 Version) for Graphical Analysis of Data

Using Excel (Microsoft Office 2007 Version) for Graphical Analysis of Data Using Excel (Microsoft Office 2007 Version) for Graphical Analysis of Data Introduction In several upcoming labs, a primary goal will be to determine the mathematical relationship between two variable

More information

Congrats to Game Winners. How can computation use data to solve problems? What topics have we covered in CS 202? Part 1: Completed!

Congrats to Game Winners. How can computation use data to solve problems? What topics have we covered in CS 202? Part 1: Completed! CS 202: Introduction to Computation " UNIVERSITY of WISCONSIN-MADISON Computer Sciences Department Professor Andrea Arpaci-Dusseau How can computation use data to solve problems? Congrats to Game Winners

More information

Hurwitz ValuePoint: Predixion

Hurwitz ValuePoint: Predixion Predixion VICTORY INDEX CHALLENGER Marcia Kaufman COO and Principal Analyst Daniel Kirsch Principal Analyst The Hurwitz Victory Index Report Predixion is one of 10 advanced analytics vendors included in

More information

8 WAYS TO BUILD YOUR BRAND USING SOCIAL MEDIA

8 WAYS TO BUILD YOUR BRAND USING SOCIAL MEDIA TIP SHEET 8 WAYS TO BUILD YOUR BRAND USING SOCIAL MEDIA Social media has changed the way our entire world works. Everyone has an equal voice and immediate access to vast networks of friends and followers.

More information

A Pharmacometrician s Perspective for Utilization of Big Data

A Pharmacometrician s Perspective for Utilization of Big Data Is There a Role of Big Data in Drug Development Decisions? ACoP6 Oct. 5, 2015 Crystal City, VA A Pharmacometrician s Perspective for Utilization of Big Data Marc R. Gastonguay, Ph.D. President & CEO Metrum

More information

APPLIED MATHEMATICS A FUTURE IN

APPLIED MATHEMATICS A FUTURE IN APPLIED MATHEMATICS A FUTURE IN APPLIED MATHEMATICS WHAT IS APPLIED MATHEMATICS? Whether or not we are good at mathematics, most of us would agree that maths is important. It underpins so many aspects

More information

Six Questions to Ask About Your Market Research

Six Questions to Ask About Your Market Research Six Questions to Ask About Your Market Research Don t roll the dice ISR s tagline is Act with confidence because we believe that s what you re buying when you buy quality market research products and services,

More information

Predictive Analytics Techniques: What to Use For Your Big Data. March 26, 2014 Fern Halper, PhD

Predictive Analytics Techniques: What to Use For Your Big Data. March 26, 2014 Fern Halper, PhD Predictive Analytics Techniques: What to Use For Your Big Data March 26, 2014 Fern Halper, PhD Presenter Proven Performance Since 1995 TDWI helps business and IT professionals gain insight about data warehousing,

More information

Network Security. Mobin Javed. October 5, 2011

Network Security. Mobin Javed. October 5, 2011 Network Security Mobin Javed October 5, 2011 In this class, we mainly had discussion on threat models w.r.t the class reading, BGP security and defenses against TCP connection hijacking attacks. 1 Takeaways

More information

A STATISTICS COURSE FOR ELEMENTARY AND MIDDLE SCHOOL TEACHERS. Gary Kader and Mike Perry Appalachian State University USA

A STATISTICS COURSE FOR ELEMENTARY AND MIDDLE SCHOOL TEACHERS. Gary Kader and Mike Perry Appalachian State University USA A STATISTICS COURSE FOR ELEMENTARY AND MIDDLE SCHOOL TEACHERS Gary Kader and Mike Perry Appalachian State University USA This paper will describe a content-pedagogy course designed to prepare elementary

More information

Downloading, Configuring, and Using the Free SAS University Edition Software

Downloading, Configuring, and Using the Free SAS University Edition Software PharmaSUG 2015 Paper CP08 Downloading, Configuring, and Using the Free SAS University Edition Software Kirk Paul Lafler, Software Intelligence Corporation, Spring Valley, California Charles Edwin Shipp,

More information

INTRODUCTORY STATISTICS

INTRODUCTORY STATISTICS INTRODUCTORY STATISTICS FIFTH EDITION Thomas H. Wonnacott University of Western Ontario Ronald J. Wonnacott University of Western Ontario WILEY JOHN WILEY & SONS New York Chichester Brisbane Toronto Singapore

More information

The Data Engineer. Mike Tamir Chief Science Officer Galvanize. Steven Miller Global Leader Academic Programs IBM Analytics

The Data Engineer. Mike Tamir Chief Science Officer Galvanize. Steven Miller Global Leader Academic Programs IBM Analytics The Data Engineer Mike Tamir Chief Science Officer Galvanize Steven Miller Global Leader Academic Programs IBM Analytics Alessandro Gagliardi Lead Faculty Galvanize Businesses are quickly realizing that

More information

Role Description. Position of a Data Scientist Machine Learning at Fractal Analytics

Role Description. Position of a Data Scientist Machine Learning at Fractal Analytics Opportunity to work with leading analytics firm that creates Insights, Impact and Innovation. Role Description Position of a Data Scientist Machine Learning at Fractal Analytics March 2014 About the Company

More information