Data Science Will computer science and informatics eat our lunch?


1 Data Science Will computer science and informatics eat our lunch? Thomas Lumley University of Auckland @tslumley statschat.org.nz notstatschat.tumblr.com

2 "In the 1920s, the computing labs helped establish statistics on the American continent. Without them, even a modest study was beyond the ability of an individual statistician. At the same time, statistics labs often had the most powerful computing machines within their larger institution. They showed how organized computing could benefit science and provided a place for the earliest of computer scientists to test their ideas." -- Grier, "The origins of statistical computing", Amstat Online

3 Fig. 2. The Hollerith Electric Tabulating System

4 Iowa State Statistical Computing Service

5 CSIRAC

6 Iowa State Statistical Computing Service

7 Iowa State statistics PhD prelim exam Two eight-hour written-on-paper exams covering: Theory of Probability and Statistics I. Theory of Probability and Statistics II. Statistical Methods I. Statistical Methods II. Advanced Statistical Methods. Advanced Probability Theory. Advanced Theory of Statistical Inference. They do require a stat computing course: 1 credit/30

8 What is data science? and where can we get some?

9 Data Science is just a fancy name for statistics. Fitting simple models to messy and sometimes large data sets. Combination of standard black-box fitting tools and good graphics. Doesn't require any fundamental knowledge our students don't have. Needs good computing skills, which our students can learn.

10 Need to avoid going overboard with computing. Data Wrangling isn't statistics: cleaning, tidying, querying, reformatting, transforming, getting in and out of databases,

11 "Data Science is just a fancy name for statistics." "Data Wrangling isn't statistics." If you value self-consistency, you can hold at most one of these opinions. A/Prof Jenny Bryan, UBC (less than one is good)

12 Data science is statistics in the same way that: epidemiology is statistics; opinion polling is statistics; ag. field trials are statistics

13 I did think, however, that many well-known applied statisticians attacked problems without the necessary mathematical knowledge and manipulative skill. Moreover, I believed that a principal cause of failure among medical research scientists was the lack of basic scientific knowledge in their special chosen field. H. O. Lancaster

14 Computing is easier to steal Define and explain the relevance to applied statistics of: Suffix trees; Supernodal Cholesky factorization; Column-store database; Translation look-aside buffer

15 Computing is easier to steal Need to teach our data science students: A bit about databases and SQL; A statistical programming language (eg R); Abstractions such as tidy data, sparse, map/reduce; Reproducible data analysis (eg rmarkdown); Collaborative version control (eg git/github). Force students to work with a wild-caught data set. Permit interested students to learn the high-tech data structures and algorithms stuff. "and I'm still pretty sure some of the data is missing, but it could still be here, in this ONE HUNDRED SHEET excel file" (a PhD student on Twitter)

16 But we don't know this stuff! [Screenshot: "Let me Google that for you"] The computing folks are way better at dissemination than us. Unlike statistics, the computer can tell you if you get it wrong.

17 Free online courses: Exploratory Data Analysis; Reproducible Research; Statistical Inference; Getting and Cleaning Data; Regression Models; Developing Data Products; The Data Scientist's Toolbox; Data Analysis and Statistical Inference. Books: Dynamic Documents with R and knitr (Yihui Xie); Practical Data Science with R (Nina Zumel and John Mount); Doing Data Science: Straight Talk from the Frontline (Cathy O'Neil & Rachel Schutt). People who make their notes available: Data wrangling, exploration, and analysis with R (UBC STAT 545A and 547M): "Learn how to explore, groom, visualize, and analyze data; make all of that reproducible, reusable, ... using R". Software tools: open source environment for deep analysis of large complex data; The Power of R with Big Data; Software Carpentry.

18 What do we have to offer? Popularity? Romance? Excitement?

19 Big Complex Messy Badly Sampled Creepy Vital to ask the right questions

20 Big Data Computer folks are better at this than us, but statistical insights important eg: Noel Cressie: fast computation for spatial models Bill Cleveland: optimising the divide/recombine strategy

21 Big Data Computer folks are better at this than us, Big doesn't mean gigabytes.

22 Complex Data Models for complex data Summaries (parameters, estimators) that answer the real questions Robustness of meaning, not just of power and level.

23 Complex Data: networks $F(x) \propto 1 - x^{-a}$ Power laws: come from network, queue, Matthew effect process. Blog links, page views, long tail sales data, citations to papers, word frequencies, earthquake sizes.

24 Complex Data: networks $F(x) \propto 1 - x^{-a}$ Power laws: come from network, queue, Matthew effect process. Blog links, page views, long tail sales data, citations to papers, word frequencies, earthquake sizes. All fit lognormal better, some much better. Clauset et al, SIAM Rev. 2009

25 Complex Data: networks Random graph models for connections: Erdős-Rényi graphs; Exponential Random Graph Models (ERGMs): meaningful parameters, nice likelihood. ERGMs are not consistent under sampling. [Shalizi et al, Ann Stat]

26 Complex Data Robustness of meaning can be hard: Suppose a Wilcoxon test shows X > Y, Y > Z. What does this tell us about: Means of X and Z? Medians of X and Z? Wilcoxon test of X and Z?
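
A small sketch of why these questions are hard (an illustration under stated assumptions, not taken from the talk): the Wilcoxon rank-sum test compares two groups through P(X > Y), and that ordering need not be transitive. The three populations below follow the standard "nontransitive dice" construction; the values and the helper name `prob_greater` are chosen only for this example.

```python
from fractions import Fraction
from itertools import product
from statistics import mean, median

# Three hypothetical populations (a nontransitive-dice construction,
# not data from the talk). Each listed value is equally likely.
X = [2, 2, 4, 4, 9, 9]
Y = [1, 1, 6, 6, 8, 8]
Z = [3, 3, 5, 5, 7, 7]


def prob_greater(a, b):
    """Exact P(A > B) + P(A = B) / 2 for independent uniform draws from a and b.

    This is the population quantity that the Wilcoxon rank-sum
    (Mann-Whitney) statistic estimates; a value above 1/2 means
    "A tends to be larger than B".
    """
    score = Fraction(0)
    for x, y in product(a, b):
        if x > y:
            score += 1
        elif x == y:
            score += Fraction(1, 2)
    return score / (len(a) * len(b))


p_xy = prob_greater(X, Y)  # 5/9: X tends to exceed Y
p_yz = prob_greater(Y, Z)  # 5/9: Y tends to exceed Z
p_zx = prob_greater(Z, X)  # 5/9: yet Z tends to exceed X

print("P(X>Y) =", p_xy, " P(Y>Z) =", p_yz, " P(Z>X) =", p_zx)
print("means:  ", mean(X), mean(Y), mean(Z))        # all equal to 5
print("medians:", median(X), median(Y), median(Z))  # 4.0, 6.0, 5.0
```

In this constructed case X beats Y and Y beats Z in the Wilcoxon sense, yet the three means are equal, the median of X is below the median of Z, and a Wilcoxon comparison of X with Z would favour Z. Other constructions can give different answers, which is the point: the two test results alone do not determine any of the three.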

27 Messy Data Good applied statisticians know from messy data. "and I'm still pretty sure some of the data is missing, but it could still be here, in this ONE HUNDRED SHEET excel file" (a PhD student on Twitter) [Figure: diastolic blood pressure against age (years)]

28 Badly Sampled "Whom the Gods Would Destroy, They First Give Real-time Analytics" [Dan McKinley, Etsy] "This line of thinking is a trap. It's important to divorce the concepts of operational metrics and product analytics. Confusing how we do things with how we decide which things to do is a fatal mistake." Because: non-representativeness of short time slices

29 Badly Sampled Statisticians know about sampling design weighting matching selection models

30 Creepy What questions should data answer? [Map: income, Mount Eden; Statistics NZ, Chris McDowall] Based on census meshblock: not actual household data

31 Creepy (and Evil) What questions should data answer? Familiar issues: Bioethics Statistical disclosure/confidentiality New, but statistical issues: algorithm audit/accountability We also talk to social scientists more. (not enough)

32 Creepy (and Evil) How do we learn more? [Screenshot: "Let me Google that for you"] Cathy O'Neil (mathbabe.org); Ed Felten; danah boyd

33 Summary The hard problems in data science are hard. Many of the computational ones are solved (ish) Many of the unsolved ones are closer to statistics

34 Data Science Will computer science and informatics eat our lunch? Only if we let them, and it would be bad for data science, too


More information

The Data Engineer. Mike Tamir Chief Science Officer Galvanize. Steven Miller Global Leader Academic Programs IBM Analytics

The Data Engineer. Mike Tamir Chief Science Officer Galvanize. Steven Miller Global Leader Academic Programs IBM Analytics The Data Engineer Mike Tamir Chief Science Officer Galvanize Steven Miller Global Leader Academic Programs IBM Analytics Alessandro Gagliardi Lead Faculty Galvanize Businesses are quickly realizing that

More information

Six Questions to Ask About Your Market Research

Six Questions to Ask About Your Market Research Six Questions to Ask About Your Market Research Don t roll the dice ISR s tagline is Act with confidence because we believe that s what you re buying when you buy quality market research products and services,

More information

What is Data Mining, and How is it Useful for Power Plant Optimization? (and How is it Different from DOE, CFD, Statistical Modeling)

What is Data Mining, and How is it Useful for Power Plant Optimization? (and How is it Different from DOE, CFD, Statistical Modeling) data analysis data mining quality control web-based analytics What is Data Mining, and How is it Useful for Power Plant Optimization? (and How is it Different from DOE, CFD, Statistical Modeling) StatSoft

More information

5 Point Social Media Action Plan.

5 Point Social Media Action Plan. 5 Point Social Media Action Plan. Workshop delivered by Ian Gibbins, IG Media Marketing Ltd (ian@igmediamarketing.com, tel: 01733 241537) On behalf of the Chambers Communications Sector Introduction: There

More information

EBPI Epidemiology, Biostatistics and Prevention Institute. Big Data Science. Torsten Hothorn 2014-03-31

EBPI Epidemiology, Biostatistics and Prevention Institute. Big Data Science. Torsten Hothorn 2014-03-31 EBPI Epidemiology, Biostatistics and Prevention Institute Big Data Science Torsten Hothorn 2014-03-31 The end of theory The End of Theory: The Data Deluge Makes the Scientific Method Obsolete (Chris Anderson,

More information

Five Tips for Presenting Data Analyses: Telling a Good Story with Data

Five Tips for Presenting Data Analyses: Telling a Good Story with Data Five Tips for Presenting Data Analyses: Telling a Good Story with Data As a professional business or data analyst you have both the tools and the knowledge needed to analyze and understand data collected

More information

Actuary vs Data Scientist

Actuary vs Data Scientist Actuary vs Data Scientist Richard Pugh Chief Data Scientist, Mango Solutions President @ R Consortium Chris Reynolds Head of Life Solutions Actuarial, PartnerRe 10 November 2015 Disclaimer The following

More information

T he complete guide to SaaS metrics

T he complete guide to SaaS metrics T he complete guide to SaaS metrics What are the must have metrics each SaaS company should measure? And how to calculate them? World s Simplest Analytics Tool INDEX Introduction 4-5 Acquisition Dashboard

More information

Introduction to Regression. Dr. Tom Pierce Radford University

Introduction to Regression. Dr. Tom Pierce Radford University Introduction to Regression Dr. Tom Pierce Radford University In the chapter on correlational techniques we focused on the Pearson R as a tool for learning about the relationship between two variables.

More information

Differential privacy in health care analytics and medical research An interactive tutorial

Differential privacy in health care analytics and medical research An interactive tutorial Differential privacy in health care analytics and medical research An interactive tutorial Speaker: Moritz Hardt Theory Group, IBM Almaden February 21, 2012 Overview 1. Releasing medical data: What could

More information

Experiment #1, Analyze Data using Excel, Calculator and Graphs.

Experiment #1, Analyze Data using Excel, Calculator and Graphs. Physics 182 - Fall 2014 - Experiment #1 1 Experiment #1, Analyze Data using Excel, Calculator and Graphs. 1 Purpose (5 Points, Including Title. Points apply to your lab report.) Before we start measuring

More information

Big Data Analytics. Genoveva Vargas-Solar http://www.vargas-solar.com/big-data-analytics French Council of Scientific Research, LIG & LAFMIA Labs

Big Data Analytics. Genoveva Vargas-Solar http://www.vargas-solar.com/big-data-analytics French Council of Scientific Research, LIG & LAFMIA Labs 1 Big Data Analytics Genoveva Vargas-Solar http://www.vargas-solar.com/big-data-analytics French Council of Scientific Research, LIG & LAFMIA Labs Montevideo, 22 nd November 4 th December, 2015 INFORMATIQUE

More information

Sunnie Chung. Cleveland State University

Sunnie Chung. Cleveland State University Sunnie Chung Cleveland State University Data Scientist Big Data Processing Data Mining 2 INTERSECT of Computer Scientists and Statisticians with Knowledge of Data Mining AND Big data Processing Skills:

More information

A Changing Standard for SEO Spam:

A Changing Standard for SEO Spam: A Changing Standard for SEO Spam: Google Penguin, Link Penalties & Declining Leniency Overview If you own a small or medium-sized business, you ve likely hired an outside vendor to build external links

More information

Statistical Challenges with Big Data in Management Science

Statistical Challenges with Big Data in Management Science Statistical Challenges with Big Data in Management Science Arnab Kumar Laha Indian Institute of Management Ahmedabad Analytics vs Reporting Competitive Advantage Reporting Prescriptive Analytics (Decision

More information

Top 5 Mistakes Made with Inventory Management for Online Stores

Top 5 Mistakes Made with Inventory Management for Online Stores Top 5 Mistakes Made with Inventory Management for Online Stores For any product you sell, you have an inventory. And whether that inventory fills dozens of warehouses across the country, or is simply stacked

More information

DATA SCIENCE CURRICULUM WEEK 1 ONLINE PRE-WORK INSTALLING PACKAGES COMMAND LINE CODE EDITOR PYTHON STATISTICS PROJECT O5 PROJECT O3 PROJECT O2

DATA SCIENCE CURRICULUM WEEK 1 ONLINE PRE-WORK INSTALLING PACKAGES COMMAND LINE CODE EDITOR PYTHON STATISTICS PROJECT O5 PROJECT O3 PROJECT O2 DATA SCIENCE CURRICULUM Before class even begins, students start an at-home pre-work phase. When they convene in class, students spend the first eight weeks doing iterative, project-centered skill acquisition.

More information

Using Excel (Microsoft Office 2007 Version) for Graphical Analysis of Data

Using Excel (Microsoft Office 2007 Version) for Graphical Analysis of Data Using Excel (Microsoft Office 2007 Version) for Graphical Analysis of Data Introduction In several upcoming labs, a primary goal will be to determine the mathematical relationship between two variable

More information

SURVEY REPORT DATA SCIENCE SOCIETY 2014

SURVEY REPORT DATA SCIENCE SOCIETY 2014 SURVEY REPORT DATA SCIENCE SOCIETY 2014 TABLE OF CONTENTS Contents About the Initiative 1 Report Summary 2 Participants Info 3 Participants Expertise 6 Suggested Discussion Topics 7 Selected Responses

More information

Use advanced techniques for summary and visualization of complex data for exploratory analysis and presentation.

Use advanced techniques for summary and visualization of complex data for exploratory analysis and presentation. MS Biostatistics MS Biostatistics Competencies Study Development: Work collaboratively with biomedical or public health researchers and PhD biostatisticians, as necessary, to provide biostatistical expertise

More information

Everything you wanted to know about using Hexadecimal and Octal Numbers in Visual Basic 6

Everything you wanted to know about using Hexadecimal and Octal Numbers in Visual Basic 6 Everything you wanted to know about using Hexadecimal and Octal Numbers in Visual Basic 6 Number Systems No course on programming would be complete without a discussion of the Hexadecimal (Hex) number

More information

Monday 28 January 2013 Morning

Monday 28 January 2013 Morning Monday 28 January 2013 Morning AS GCE MATHEMATICS 4732/01 Probability and Statistics 1 QUESTION PAPER * 4 7 3 3 8 5 0 1 1 3 * Candidates answer on the Printed Answer Book. OCR supplied materials: Printed

More information

Classroom Demonstrations of Big Data

Classroom Demonstrations of Big Data Classroom Demonstrations of Big Data Eric A. Suess Abstract We present examples of accessing and analyzing large data sets for use in a classroom at the first year graduate level or senior undergraduate

More information