Appendix III: Ten (10) Specialty Areas Data Sciences

Curriculum Mapping to Knowledge Units: Data Sciences Specialty Area

IX. Data Sciences Specialty Area

1. Knowledge Unit title: Research Design and Application for Data and Analysis

A. Knowledge Unit description and objective: Framing the analytic problem for a solution using critical thinking along with various statistical, mathematical, or algorithmic tools and software.

B. Requirement satisfaction: This KU is satisfied when seven (7) Topics and all Learning Objectives are met.

Topics:
IX.1C1   Fundamentals and historical context of data analytics and the data science pipeline
IX.1C2   Components of data sets
IX.1C3   Different data structures
IX.1C4   Common data-representation schemes and structures
IX.1C5   Scope the resources required for a data science project
IX.1C6   Know what analyses are possible given a particular data set, including both the state of the art of the field and inherent limitations
IX.1C7   Making research and processes reproducible
IX.1C8   Basic statistical understanding, including probability distributions, hypothesis testing, linear regression, and causality
IX.1C9   Types of data science questions (e.g., descriptive, exploratory, inferential)
IX.1C10  Design of experiment
IX.1C11  Sampling
IX.1C12  Critical thinking and logic

Learning Objectives:
IX.1D1   Discuss what data represents
IX.1D2   Describe the components of data
IX.1D3   Identify common data structures used for collection for analytic problems
IX.1D4   Discuss common data-representation schemes and structures, including unstructured and semi-structured data (text, web logs, and HTML)
IX.1D5   Explain the resources required to develop and complete a data science project with a timeline and cost estimate
IX.1D6   Describe best practices of reproducible data analysis
IX.1D7   Identify various experimental designs and describe the benefits and constraints of each
IX.1D8   Explain various sampling schemes
IX.1D9   Describe common critical thinking techniques

2. Knowledge Unit title: Data Storage and Preparation

A. Knowledge Unit description and objective: Understand and be familiar with obtaining and cleaning data for analysis.

B. Requirement satisfaction: This KU is satisfied when all Topics and all Learning Objectives are met.

Topics:
IX.2C1   Data acquisition
IX.2C2   Dealing with Big Data sets: ETL, SQL, NoSQL, data nodes, data fusion/integration, data transformation
IX.2C3   Data cleaning
IX.2C4   Data recoding
IX.2C5   Understand specialized systems and algorithms that have been developed to work with data at scale, including MapReduce and other software; core techniques in distributed systems; characteristics of HPC and cloud platforms; and important scalable algorithms for graphs, streams, and text
IX.2C6   Data munging/mining: PCA, feature extraction, binding, unbiased estimators, handling missing variables and outliers, normalization, dimensionality reduction, denoising, sampling
IX.2C7   Tidy data
IX.2C8   CRISP-DM
IX.2C9   Database structures and trade-offs

Learning Objectives:
IX.2D1   Describe how to access data from a variety of sources, including relational databases, NoSQL data stores, and web-based APIs
IX.2D2   Demonstrate programming skills in R, Hadoop, and other languages to mine massive amounts of information
IX.2D3   Prepare clean data
IX.2D4   Show how to reformat/recode data for analysis
IX.2D5   Apply dimensionality reduction techniques to big data sets
IX.2D6   Explain the CRISP-DM data mining construct
IX.2D7   Explain different database structures and the benefits and drawbacks of each
IX.2D8   Describe the tidy data concept and employ it to produce a clean data set
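As an illustration of the tidy data objective listed above, the following is a minimal sketch in Python (one of the languages this appendix names). It assumes the usual reading of tidy data, in which each row holds one observation and each column holds one variable. The sample table, the `to_tidy` helper, and the choice to drop missing measurements are hypothetical and are not part of the curriculum.

```python
# Hypothetical wide-format table: one row per patient, one column per visit.
wide_rows = [
    {"patient": "A", "visit_1": 5.1, "visit_2": 4.8},
    {"patient": "B", "visit_1": 6.0, "visit_2": None},  # missing measurement
]


def to_tidy(rows, id_key, variable_name, value_name):
    """Reshape wide rows into long form: one observation per output row."""
    tidy = []
    for row in rows:
        for key, value in row.items():
            if key == id_key:
                continue  # the identifier is repeated on every output row
            if value is None:
                continue  # assumed cleaning policy: drop missing measurements
            tidy.append({id_key: row[id_key], variable_name: key, value_name: value})
    return tidy


tidy_rows = to_tidy(wide_rows, id_key="patient", variable_name="visit", value_name="score")
for r in tidy_rows:
    print(r)
```

Dropping missing values is only one possible policy; a course exercise might instead keep them as explicit gaps or impute them, depending on the analysis.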

3. Knowledge Unit title: Exploring and Analyzing Data

A. Knowledge Unit description and objective: Understand and be familiar with applying analytic techniques and algorithms (including statistical and data mining approaches) to large data sets to extract meaningful insights.

B. Requirement satisfaction: This KU is satisfied when all Topics and all Learning Objectives are met.

Topics:
IX.3C1   Exploratory analysis and inferential hypothesis testing through the basics of statistical analysis
IX.3C2   Data analyses using comparisons between batches, analysis of variance, and linear and logistic regression; evaluation of assumptions; data transformation; reliability of statistical measures; resampling methods; validation of assumptions; interpretation; causation versus correlation
IX.3C3   Principles of Bayesian statistics
IX.3C4   Spatial statistics
IX.3C5   Time-series analysis
IX.3C6   Programming for data analysis (e.g., SAS, R, or Python), to include data frames, vectors, matrices, reading and writing data, sub-setting, regular expressions, functions, and factor analysis
IX.3C7   Text mining/NLP: corpus, text analysis, TF-IDF, SVM, feature extraction, sentiment analysis

Learning Objectives:
IX.3D1   Apply statistical methods and regression techniques to make sense of data sets both large and small
IX.3D2   Demonstrate how to apply Bayesian statistics to solve problems
IX.3D3   Employ time-series analysis on temporal and spatio-temporal data
IX.3D4   Employ spatial statistics on spatial and spatio-temporal data
IX.3D5   Use various statistical packages or programs to conduct data analysis
IX.3D6   Apply text mining techniques to unstructured textual data

4. Knowledge Unit title: Machine Learning and Statistical Models

A. Knowledge Unit description: Understand and be familiar with building appropriate machine learning applications for tasks.

B. Requirement satisfaction: This KU is satisfied when at least seven (7) Topics and all Learning Objectives are met.

Topics:
IX.4C1   Introduction to the theory and application of statistical machine learning; topics include supervised versus unsupervised learning, regression, classification, clustering, and dimensionality reduction
IX.4C2   Deep learning techniques, especially CNNs and computer vision
IX.4C3   Collaborative filtering/recommendation engines
IX.4C4   Model evaluation
IX.4C5   Machine learning applications
IX.4C6   Open-source programming tools and techniques available for implementing machine learning

Learning Objectives:
IX.4D1   Identify potential applications of machine learning
IX.4D2   Describe the differences in the types of analyses enabled by regression, classification, clustering, and dimensionality reduction
IX.4D3   Select the appropriate machine learning technique
IX.4D4   Explain the difference between machine learning and deep learning, and describe the structure of deep learning techniques
IX.4D5   Apply regression, classification, clustering, retrieval, recommender systems, and deep learning
IX.4D6   Assess model quality with relevant error metrics
IX.4D7   Use a fitted model to analyze new data
IX.4D8   Build an end-to-end application that uses machine learning at its core
IX.4D9   Implement these techniques in Python or R (or in the language of your choice, though Python or R is highly recommended)
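The objective of assessing model quality with relevant error metrics can be illustrated with a short Python sketch. The appendix does not say which metrics are intended; mean absolute error, root mean squared error, and classification accuracy are used here as common examples, and all data values and function names below are hypothetical.

```python
import math


def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between observed and predicted values."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def root_mean_squared_error(y_true, y_pred):
    """Square root of the average squared error; penalizes large misses more."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def accuracy(labels_true, labels_pred):
    """Fraction of predicted class labels that match the observed labels."""
    return sum(t == p for t, p in zip(labels_true, labels_pred)) / len(labels_true)


# Hypothetical held-out regression targets and model predictions.
y_true = [3.0, 5.0, 2.0, 7.0]
y_pred = [2.0, 5.0, 4.0, 8.0]
mae = mean_absolute_error(y_true, y_pred)
rmse = root_mean_squared_error(y_true, y_pred)

# Hypothetical held-out class labels and classifier predictions.
labels_true = ["spam", "ham", "ham", "spam"]
labels_pred = ["spam", "ham", "spam", "spam"]
acc = accuracy(labels_true, labels_pred)

print(f"MAE={mae:.3f} RMSE={rmse:.3f} accuracy={acc:.2f}")
```

Regression metrics apply to numeric predictions and accuracy to class labels, so the choice of metric follows from the technique selected; in practice these would be computed on data the model did not see during fitting.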

5. Knowledge Unit title: Data Visualization and Communication

A. Knowledge Unit description and objective: Be able to model and communicate the results of analysis effectively (visually and verbally) to a broad audience.

B. Requirement satisfaction: This KU is satisfied when all Topics and all Learning Objectives are met.

Topics:
IX.5C1   Types of infographics: decision trees, neural networks, survey plots, timelines, bubble charts, scatterplots, tree maps, histograms, boxplots, etc.
IX.5C2   Communicating quantitative information through storytelling to impact the organization
IX.5C3   Understand the design and presentation of digital information using modern visualization software (e.g., Tableau, ggplot2, D3.js, matplotlib, QlikView)
IX.5C4   Identify common design principles for visualizations (e.g., Edward Tufte's The Visual Display of Quantitative Information)
IX.5C5   Presenting appropriate data visualizations for specific customers

Learning Objectives:
IX.5D1   Design and critique visualizations
IX.5D2   Prepare infographics and dashboards in at least one program (e.g., MATLAB, Tableau) and one programming language (e.g., R, Python)
IX.5D3   Construct streamlined analyses and highlight their implications efficiently using visualizations
IX.5D4   Produce effective visualizations that harness the human brain's innate perceptual and cognitive tendencies
IX.5D5   Explore methods of presenting complex information to enhance comprehension and analysis, and the incorporation of visualization techniques into human-computer interfaces
IX.5D6   Explain the state of the art in privacy, ethics, and governance around big data and data science

INDUSTRIAL TRAINING IN BIG DATA &DATA ANALYTICS

INDUSTRIAL TRAINING IN BIG DATA &DATA ANALYTICS INDUSTRIAL TRAINING IN BIG DATA &DATA ANALYTICS Course Name Duration Prerequisites Who Should Attend? What to expect? Course Focus Big Data & Data Analytics 6 Months (4 hours/day) Basic familiarity of

More information

This workshop is intended for individuals who are interested in learning data science, or who want to begin their career as a data scientist.

This workshop is intended for individuals who are interested in learning data science, or who want to begin their career as a data scientist. 5 Days (Instructor-Led) Introduction Our lives are flooded by large amount of information, but not all of them is useful data. Therefore, it is essential for us to learn how to applying data science to

More information

Data Mining. SPSS Clementine Clementine Overview. Fall 2009 Instructor: Dr. Masoud Yaghini. Clementine

Data Mining. SPSS Clementine Clementine Overview. Fall 2009 Instructor: Dr. Masoud Yaghini. Clementine Data Mining SPSS 12.0 1. Overview Fall 2009 Instructor: Dr. Masoud Yaghini Introduction Types of Models Interface References Outline Introduction Introduction Three of the common data mining tools SPSS

More information

Matt Edmonds Predictive Analytics

Matt Edmonds Predictive Analytics Matt Edmonds Predictive Analytics Agenda Introduction to PA Guidewire s Approach to PA Case Studies Software Conclusion Page 2 Introduction to Predictive Analytics What is Predictive Analytics Predictive

More information

Azure Machine Learning, SQL Data Mining and R

Azure Machine Learning, SQL Data Mining and R Azure Machine Learning, SQL Data Mining and R Day-by-day Agenda Prerequisites No formal prerequisites. Basic knowledge of SQL Server Data Tools, Excel and any analytical experience helps. Best of all:

More information

DATA SCIENCE CURRICULUM WEEK 1 ONLINE PRE-WORK INSTALLING PACKAGES COMMAND LINE CODE EDITOR PYTHON STATISTICS PROJECT O5 PROJECT O3 PROJECT O2

DATA SCIENCE CURRICULUM WEEK 1 ONLINE PRE-WORK INSTALLING PACKAGES COMMAND LINE CODE EDITOR PYTHON STATISTICS PROJECT O5 PROJECT O3 PROJECT O2 DATA SCIENCE CURRICULUM Before class even begins, students start an at-home pre-work phase. When they convene in class, students spend the first eight weeks doing iterative, project-centered skill acquisition.

More information

ANALYTICS CENTER LEARNING PROGRAM

ANALYTICS CENTER LEARNING PROGRAM Overview of Curriculum ANALYTICS CENTER LEARNING PROGRAM The following courses are offered by Analytics Center as part of its learning program: Course Duration Prerequisites 1- Math and Theory 101 - Fundamentals

More information

How to use Big Data in Industry 4.0 implementations. LAURI ILISON, PhD Head of Big Data and Machine Learning

How to use Big Data in Industry 4.0 implementations. LAURI ILISON, PhD Head of Big Data and Machine Learning How to use Big Data in Industry 4.0 implementations LAURI ILISON, PhD Head of Big Data and Machine Learning Big Data definition? Big Data is about structured vs unstructured data Big Data is about Volume

More information

DATA ANALYTICS USING R

DATA ANALYTICS USING R DATA ANALYTICS USING R Duration: 90 Hours Intended audience and scope: The course is targeted at fresh engineers, practicing engineers and scientists who are interested in learning and understanding data

More information

Integrating a Big Data Platform into Government:

Integrating a Big Data Platform into Government: Integrating a Big Data Platform into Government: Drive Better Decisions for Policy and Program Outcomes John Haddad, Senior Director Product Marketing, Informatica Digital Government Institute s Government

More information

K2 Data Science WELCOME TO K2. Learning the Fundamental Skills. Creating Data Driven Applications. Portfolio and Career Support

K2 Data Science WELCOME TO K2. Learning the Fundamental Skills. Creating Data Driven Applications. Portfolio and Career Support K2 Data Science Become a data scientist with our mentor-led program. WELCOME TO K2 Learning the Fundamental Skills Spend the first half of the program learning fundamental skills through recorded lectures,

More information

Knowledge Discovery in Databases

Knowledge Discovery in Databases Knowledge Discovery in Databases Javier Béjar cbea CS - MIA AMLT - 2016/2017 Javier Béjar cbea (CS - MIA) Knowledge Discovery in Databases AMLT - 2016/2017 1 / 32 Outline 1 Knowledge Discovery in Databases

More information

Practical Data Science with Azure Machine Learning, SQL Data Mining, and R

Practical Data Science with Azure Machine Learning, SQL Data Mining, and R Practical Data Science with Azure Machine Learning, SQL Data Mining, and R Overview This 4-day class is the first of the two data science courses taught by Rafal Lukawiecki. Some of the topics will be

More information

Master of Science in Data Science and Analytics

Master of Science in Data Science and Analytics Master of Science in Data Science and Analytics Societal Need Hiring Demand by Metro Area for Big Data Experience in Canada Data Scientist We need teams Statistics/ Math Software Engineering Machine learning

More information

The 4 Pillars of Technosoft s Big Data Practice

The 4 Pillars of Technosoft s Big Data Practice beyond possible Big Use End-user applications Big Analytics Visualisation tools Big Analytical tools Big management systems The 4 Pillars of Technosoft s Big Practice Overview Businesses have long managed

More information

BIOINF 585 Fall 2015 Machine Learning for Systems Biology & Clinical Informatics http://www.ccmb.med.umich.edu/node/1376

BIOINF 585 Fall 2015 Machine Learning for Systems Biology & Clinical Informatics http://www.ccmb.med.umich.edu/node/1376 Course Director: Dr. Kayvan Najarian (DCM&B, kayvan@umich.edu) Lectures: Labs: Mondays and Wednesdays 9:00 AM -10:30 AM Rm. 2065 Palmer Commons Bldg. Wednesdays 10:30 AM 11:30 AM (alternate weeks) Rm.

More information

HOW TO TURBOCHARGE YOUR MODELING, GO FROM ZERO TO HERO DR MARK CHIA PRACTICE LEAD, ADVANCED ANALYTICS

HOW TO TURBOCHARGE YOUR MODELING, GO FROM ZERO TO HERO DR MARK CHIA PRACTICE LEAD, ADVANCED ANALYTICS HOW TO TURBOCHARGE YOUR MODELING, GO FROM ZERO TO HERO DR MARK CHIA PRACTICE LEAD, ADVANCED ANALYTICS OVERVIEW Introduction Factory Miner Summary Q & A INTRODUCTION DATA SIZE ARE YOUR DECISIONS KEEPING

More information

Data Science and Business Analytics Certificate Data Science and Business Intelligence Certificate

Data Science and Business Analytics Certificate Data Science and Business Intelligence Certificate Data Science and Business Analytics Certificate Data Science and Business Intelligence Certificate Description The Helzberg School of Management has launched two graduate-level certificates: one in Data

More information

From Data to next best action, using Predictive Analytics SPSS MODELER

From Data to next best action, using Predictive Analytics SPSS MODELER From Data to next best action, using Predictive Analytics SPSS MODELER Agenda Introduction to Predictive Analytics and Data Mining IBM SPSS Modeler Work Bench Data Preparation and Data Understanding Automated

More information

Products SAP HANA Vora SAP HANA platform

Products SAP HANA Vora SAP HANA platform Products SAP HANA Vora SAP HANA platform Description Key capabilities An in-memory, distributed computing solution that runs on Hadoop and builds upon Apache Spark to deliver enriched, interactive analytics

More information

Master of Science in Health Information Technology Degree Curriculum

Master of Science in Health Information Technology Degree Curriculum Master of Science in Health Information Technology Degree Curriculum Core courses: 8 courses Total Credit from Core Courses = 24 Core Courses Course Name HRS Pre-Req Choose MIS 525 or CIS 564: 1 MIS 525

More information

Introduction to Data Mining and Machine Learning Techniques. Iza Moise, Evangelos Pournaras, Dirk Helbing

Introduction to Data Mining and Machine Learning Techniques. Iza Moise, Evangelos Pournaras, Dirk Helbing Introduction to Data Mining and Machine Learning Techniques Iza Moise, Evangelos Pournaras, Dirk Helbing Iza Moise, Evangelos Pournaras, Dirk Helbing 1 Overview Main principles of data mining Definition

More information

STA 200 Statistics: A Force in Human Judgment. STA 200 Course Competencies. General Education Competencies

STA 200 Statistics: A Force in Human Judgment. STA 200 Course Competencies. General Education Competencies STA 200 Statistics: A Force in Human Judgment STA 200 Course Competencies General Education Competencies A. Knowledge of human cultures and the physical and natural worlds through study in the sciences

More information

Introduction to Big Data Analytics p. 1 Big Data Overview p. 2 Data Structures p. 5 Analyst Perspective on Data Repositories p.

Introduction to Big Data Analytics p. 1 Big Data Overview p. 2 Data Structures p. 5 Analyst Perspective on Data Repositories p. Introduction p. xvii Introduction to Big Data Analytics p. 1 Big Data Overview p. 2 Data Structures p. 5 Analyst Perspective on Data Repositories p. 9 State of the Practice in Analytics p. 11 BI Versus

More information

COURSE BROCHURE. Big Data Foundation Training & Certification

COURSE BROCHURE. Big Data Foundation Training & Certification COURSE BROCHURE Big Data Foundation Training & Certification What is Big Data Foundation? The Big Data Foundation certification is designed to provide candidates with a well-rounded understanding of big

More information

INTERNATIONAL MASTER IN BUSINESS ANALYTICS AND BIG DATA

INTERNATIONAL MASTER IN BUSINESS ANALYTICS AND BIG DATA POLITECNICO DI MILANO GRADUATE SCHOOL OF BUSINESS BABD INTERNATIONAL MASTER IN BUSINESS ANALYTICS AND BIG DATA Courses Description A JOINT PROGRAM WITH POLITECNICO DI MILANO SCHOOL OF MANAGEMENT PRE-COURSES

More information

Data Foundations. Data Attributes. Data Attributes and Features Data Pre-processing Data Storage Data Analysis

Data Foundations. Data Attributes. Data Attributes and Features Data Pre-processing Data Storage Data Analysis Data Foundations Data Attributes and Features Data Pre-processing Data Storage Data Analysis 1 Data Attributes Describing data content and characteristics Representing data dimensions Set of all attributes:

More information

MSCA 31000 Introduction to Statistical Concepts

MSCA 31000 Introduction to Statistical Concepts MSCA 31000 Introduction to Statistical Concepts This course provides general exposure to basic statistical concepts that are necessary for students to understand the content presented in more advanced

More information

KnowledgeSTUDIO HIGH-PERFORMANCE PREDICTIVE ANALYTICS USING ADVANCED MODELING TECHNIQUES

KnowledgeSTUDIO HIGH-PERFORMANCE PREDICTIVE ANALYTICS USING ADVANCED MODELING TECHNIQUES HIGH-PERFORMANCE PREDICTIVE ANALYTICS USING ADVANCED MODELING TECHNIQUES Translating data into business value requires the right data mining and modeling techniques which uncover important patterns within

More information

Data Science with Hadoop Using Chorus to Operationalize Data Science in the Age of Big Data

Data Science with Hadoop Using Chorus to Operationalize Data Science in the Age of Big Data Data Science with Hadoop Using Chorus to Operationalize Data Science in the Age of Big Data THE RISE OF BIG DATA.... 2 A COMPLETE DATA SCIENCE ENVIRONMENT.... 4 CONCLUSION.... 7 SYSTEM REQUIREMENTS & SELECTED

More information

Big Data Analytics and Optimization

Big Data Analytics and Optimization Big Data Analytics and Optimization C e r t i f i c a t e P r o g r a m i n E n g i n e e r i n g E x c e l l e n c e C e r t i f i c a t e P r o g r a m s i n A c c e l e r a t e d E n g i n e e r i n

More information

Role Description. Position of a Data Scientist Machine Learning at Fractal Analytics

Role Description. Position of a Data Scientist Machine Learning at Fractal Analytics Opportunity to work with leading analytics firm that creates Insights, Impact and Innovation. Role Description Position of a Data Scientist Machine Learning at Fractal Analytics March 2014 About the Company

More information

CONTENTS PREFACE 1 INTRODUCTION 1 2 DATA VISUALIZATION 19

CONTENTS PREFACE 1 INTRODUCTION 1 2 DATA VISUALIZATION 19 PREFACE xi 1 INTRODUCTION 1 1.1 Overview 1 1.2 Definition 1 1.3 Preparation 2 1.3.1 Overview 2 1.3.2 Accessing Tabular Data 3 1.3.3 Accessing Unstructured Data 3 1.3.4 Understanding the Variables and Observations

More information

Machine Learning Approaches in Bioinformatics and Computational Biology. Byron Olson Center for Computational Intelligence, Learning, and Discovery

Machine Learning Approaches in Bioinformatics and Computational Biology. Byron Olson Center for Computational Intelligence, Learning, and Discovery Machine Learning Approaches in Bioinformatics and Computational Biology Byron Olson Center for Computational Intelligence, Learning, and Discovery Machine Learning Background and Motivation What is learning?

More information

Learning outcomes. Knowledge and understanding. Competence and skills

Learning outcomes. Knowledge and understanding. Competence and skills Syllabus Master s Programme in Statistics and Data Mining 120 ECTS Credits Aim The rapid growth of databases provides scientists and business people with vast new resources. This programme meets the challenges

More information

Information Visualization. Texas Advanced Computing Center

Information Visualization. Texas Advanced Computing Center Information Visualization Texas Advanced Computing Center Data Analysis vs. information visualization Data analysis: process data to extract knowledge. What information visualization does? Human Data Information

More information

A Professional Big Data Master s Program to train Computational Specialists

A Professional Big Data Master s Program to train Computational Specialists A Professional Big Data Master s Program to train Computational Specialists Anoop Sarkar, Fred Popowich, Alexandra Fedorova! School of Computing Science! Education for Employable Graduates: Critical Questions

More information

Dealing with Data Especially Big Data

Dealing with Data Especially Big Data Dealing with Data Especially Big Data INFO-GB-2346.01 Fall 2016 Professor Norman White nwhite@stern.nyu.edu normwhite@twitter TA: TBA Assistant: Sharon Kim skim2@stern.nyu.edu Background: Most courses

More information

MSCA 31000 Introduction to Statistical Concepts

MSCA 31000 Introduction to Statistical Concepts MSCA 31000 Introduction to Statistical Concepts This course provides general exposure to basic statistical concepts that are necessary for students to understand the content presented in more advanced

More information

Recitation. Mladen Kolar

Recitation. Mladen Kolar 10-601 Recitation Mladen Kolar Topics covered after the midterm Learning Theory Hidden Markov Models Neural Networks Dimensionality reduction Nonparametric methods Support vector machines Boosting Learning

More information

Big Data Analytics and Optimization

Big Data Analytics and Optimization Big Data Analytics and Optimization C e r t i f i c a t e P r o g r a m i n E n g i n e e r i n g E x c e l l e n c e e.edu.in http://www.insof LIST OF COURSES Essential Business Skills for a Data Scientist...

More information

An interdisciplinary model for analytics education

An interdisciplinary model for analytics education An interdisciplinary model for analytics education Raffaella Settimi, PhD School of Computing, DePaul University Drew Conway s Data Science Venn Diagram http://drewconway.com/zia/2013/3/26/the-data-science-venn-diagram

More information

2015 Workshops for Professors

2015 Workshops for Professors SAS Education Grow with us Offered by the SAS Global Academic Program Supporting teaching, learning and research in higher education 2015 Workshops for Professors 1 Workshops for Professors As the market

More information

WROX Certified Big Data Analyst Program by AnalytixLabs and Wiley

WROX Certified Big Data Analyst Program by AnalytixLabs and Wiley WROX Certified Big Data Analyst Program by AnalytixLabs and Wiley Disclaimer: This material is protected under copyright act AnalytixLabs, 2011. Unauthorized use and/ or duplication of this material or

More information

The? Data: Introduction and Future

The? Data: Introduction and Future The? Data: Introduction and Future Husnu Sensoy Global Maksimum Data & Information Technologies Global Maksimum Data & Information Technologies The Data Company Massive Data Unstructured Data Insight Information

More information

COLLEGE OF SCIENCE. John D. Hromi Center for Quality and Applied Statistics

COLLEGE OF SCIENCE. John D. Hromi Center for Quality and Applied Statistics ROCHESTER INSTITUTE OF TECHNOLOGY COURSE OUTLINE FORM COLLEGE OF SCIENCE John D. Hromi Center for Quality and Applied Statistics NEW (or REVISED) COURSE: COS-STAT-747 Principles of Statistical Data Mining

More information

DATA EXPERTS MINE ANALYZE VISUALIZE. We accelerate research and transform data to help you create actionable insights

DATA EXPERTS MINE ANALYZE VISUALIZE. We accelerate research and transform data to help you create actionable insights DATA EXPERTS We accelerate research and transform data to help you create actionable insights WE MINE WE ANALYZE WE VISUALIZE Domains Data Mining Mining longitudinal and linked datasets from web and other

More information

Dealing with Data Especially Big Data

Dealing with Data Especially Big Data Dealing with Data Especially Big Data INFO-GB-2346.30 Spring 2016 Very Rough Draft Subject to Change Professor Norman White Background: Most courses spend their time on the concepts and techniques of analyzing

More information

BIG DATA What it is and how to use?

BIG DATA What it is and how to use? BIG DATA What it is and how to use? Lauri Ilison, PhD Data Scientist 21.11.2014 Big Data definition? There is no clear definition for BIG DATA BIG DATA is more of a concept than precise term 1 21.11.14

More information

Information Extraction, Content Curation, and Machine Learning

Information Extraction, Content Curation, and Machine Learning Information Extraction, Content Curation, and Machine Learning Raghu Ramakrishnan Yahoo! Fellow Chief Scientist, Audience and Cloud Computing (Many slides courtesy of others at Yahoo!) 1 Web of Pages

More information

Data Mining. SPSS Clementine 12.0. 1. Clementine Overview. Spring 2010 Instructor: Dr. Masoud Yaghini. Clementine

Data Mining. SPSS Clementine 12.0. 1. Clementine Overview. Spring 2010 Instructor: Dr. Masoud Yaghini. Clementine Data Mining SPSS 12.0 1. Overview Spring 2010 Instructor: Dr. Masoud Yaghini Introduction Types of Models Interface Projects References Outline Introduction Introduction Three of the common data mining

More information

From JMP Essentials, Second Edition. Full book available for purchase here.

From JMP Essentials, Second Edition. Full book available for purchase here. From JMP Essentials, Second Edition. Full book available for purchase here. Contents Preface... ix About the Authors...xv Acknowledgements... xvii Chapter 1 Getting Started... 1 1.1 Using JMP Essentials...

More information

Data mining knowledge representation

Data mining knowledge representation Data mining knowledge representation 1 What Defines a Data Mining Task? Task relevant data: where and how to retrieve the data to be used for mining Background knowledge: Concept hierarchies Interestingness

More information

Diploma in Strategic Finance& Analytics How to be a Finance Business Partner! Course Outline

Diploma in Strategic Finance& Analytics How to be a Finance Business Partner! Course Outline Diploma in Strategic Finance& Analytics How to be a Finance Business Partner! Course Outline Page 1 of 4 Introduction Background As part of its mandate to provide opportunities for continuing education

More information

Agile Data Science. Dr. Ahmet Bulut Istanbul Sehir University, Istanbul, Turkey

Agile Data Science. Dr. Ahmet Bulut Istanbul Sehir University, Istanbul, Turkey Agile Data Science Dr. Ahmet Bulut (ahmetbulut@sehir.edu.tr) Istanbul Sehir University, Istanbul, Turkey Web... In the nineties, the Web served lots of static HTML pages created by a small set of people

More information

Data Mining + Business Intelligence. Integration, Design and Implementation

Data Mining + Business Intelligence. Integration, Design and Implementation Data Mining + Business Intelligence Integration, Design and Implementation ABOUT ME Vijay Kotu Data, Business, Technology, Statistics BUSINESS INTELLIGENCE - Result Making data accessible Wider distribution

More information

Introduction to Data Mining

Introduction to Data Mining Introduction to Data Mining Jay Urbain Credits: Nazli Goharian & David Grossman @ IIT Outline Introduction Data Pre-processing Data Mining Algorithms Naïve Bayes Decision Tree Neural Network Association

More information

Client Overview. Engagement Situation. Key Requirements

Client Overview. Engagement Situation. Key Requirements Client Overview Our client is one of the leading providers of business intelligence systems for customers especially in BFSI space that needs intensive data analysis of huge amounts of data for their decision

More information

Data Science in Action

Data Science in Action + Data Science in Action Peerapon Vateekul, Ph.D. Department of Computer Engineering, Faculty of Engineering, Chulalongkorn University + Outlines 2 Data Science & Data Scientist Data Mining Analytics with

More information

Object Detection and Recognition

Object Detection and Recognition Object Detection and Recognition Object detection and recognition are two important computer vision tasks. Object detection determines the presence of an object and/or its scope, and locations in the image.

More information

CS Master Level Courses and Areas COURSE DESCRIPTIONS. CSCI 521 Real-Time Systems. CSCI 522 High Performance Computing

CS Master Level Courses and Areas COURSE DESCRIPTIONS. CSCI 521 Real-Time Systems. CSCI 522 High Performance Computing CS Master Level Courses and Areas The graduate courses offered may change over time, in response to new developments in computer science and the interests of faculty and students; the list of graduate

More information

2017 CATALOG ADDENDUM

2017 CATALOG ADDENDUM 2017 CATALOG ADDENDUM FACULTY Trent Hauck has served as a Data Scientist at Zymergen, Zulily and Alight Analytics. He received his Master s of Science, Finance from the University of Kansas. 1 P a g e

More information

KATE GLEASON COLLEGE OF ENGINEERING. John D. Hromi Center for Quality and Applied Statistics

KATE GLEASON COLLEGE OF ENGINEERING. John D. Hromi Center for Quality and Applied Statistics ROCHESTER INSTITUTE OF TECHNOLOGY COURSE OUTLINE FORM KATE GLEASON COLLEGE OF ENGINEERING John D. Hromi Center for Quality and Applied Statistics NEW (or REVISED) COURSE (KGCOE- CQAS- 747- Principles of

More information

The fastest, distributed, in-memory database accelerated by GPUs for analyzing large and streaming data

Charles Sutton, Managing Director Europe. October 27, 2016. Evolution of Data Processing 1990-2000

Introduction to Machine Learning

Third Edition. Ethem Alpaydın. The MIT Press, Cambridge, Massachusetts; London, England. 2014 Massachusetts Institute of Technology. All rights reserved. No part of this book

Object Detection and Recognition. Face Recognition. Face Detection. Object Detection

Object detection and recognition are two important computer vision tasks. Object detection determines the presence of an object and/or its scope,

BaiQing DIAO 2014.9.15

Intelligent Operation Analysis and Application of Power Big Data. Contents: Ⅰ Introduction of Power Big Data; Ⅱ Application of Technology Framework; Ⅲ Power Big Data Quality Management

Predictive Analytics Techniques: What to Use For Your Big Data. March 26, 2014 Fern Halper, PhD

Presenter. Proven Performance Since 1995. TDWI helps business and IT professionals gain insight about data warehousing,

MIT Information Technology Code :

MIT Information Technology. Code: 12254014. Duration of study: 2 years. Total credits: 180. Programme Information: This degree programme is presented in English only. Also consult G Regulations G.30 to G.54

Statistics is the science of collecting, organizing, presenting, analyzing, and interpreting data to assist in making effective decisions

Readings: OpenStax Textbook, Chapters 1-5 (online); Appendix D & E (online); Plous, Chapters 1, 5, 6, 13 (online). Introductory comments: Describe how familiarity with statistical methods can be associated

Fairfield Public Schools

Mathematics: AP Statistics. BOE Approved 04/08/2014. Critical Areas of Focus: AP Statistics is a rigorous course that offers advanced students an opportunity

Interactive Visual Data Exploration with Spark in Databricks Cloud. Hossein

Hossein Falaki (@mhfalaki). About Databricks: founded by creators of Apache Spark; offers Spark as a service in the cloud; dedicated to open

Big Data Analytics. An Introduction. Oliver Fuchsberger University of Paderborn 2014

Table of Contents: I. Introduction & Motivation (What is Big Data Analytics? Why is it so important?) II. Techniques & Solutions

Audit Analytics. --An innovative course at Rutgers. Qi Liu. Roman Chinchila

A new certificate in Analytic Auditing. Tentative courses: Audit Analytics, Special Topics in Audit Analytics, Forensic Accounting

DATA SCIENCE CONSULTING GIVE YOUR DATA MEANING

GIVE YOUR DATA MEANING: WITH DATA SCIENCE CONSULTING! Comma Data Science Consulting supports in optimizing business challenges with state-of-the-art methods

Semester 2 Statistics Short courses

Course: STAA0001 - Basic Statistics. Blackboard Site: STAA0001. Dates: Sat 10th Sept and 22 Oct 2016 (9 am to 5 pm). Room: EN409. Assumed Knowledge: None. Day 1: Exploratory

WHAT MOTIVATED DATA MINING? WHY IS IT IMPORTANT?

Data mining is mainly used for decision making in business. The abundance of data, coupled with the need for powerful data analysis tools, has been described

Data Analytics at NERSC

Rollin Thomas (rcthomas@lbl.gov), NERSC Data and Analytics Services. March 21, 2016, NERSC User Group Meeting. Introduction. Data Analytics: The key to unlocking insight from massive and

Deploying Data Mining Models with 4Sight

A White Paper. Copyright 2015. Redistribution NOT permitted. All trademarks are the property of their respective owners. For the latest information, please visit

Course Syllabus For Operations Management. Management Information Systems

For Operations Management and Management Information Systems Department School Year First Year First Year First Year Second year Second year Second year Third year Third year Third year Third year Third

BIG DATA & DATA SCIENCE

ACADEMY PROGRAMS, IN-COMPANY TRAINING PORTFOLIO. TRAINING PORTFOLIO 2016, Synergic Academy Solutions. BIG DATA FOR LEADING BUSINESS: Big data promises a significant shift in the way

Silvermine House Steenberg Office Park, Tokai 7945 Cape Town, South Africa Telephone: +27 21 702 4666 www.spss-sa.com

SPSS-SA Training Brochure 2009. TABLE OF CONTENTS: 1 SPSS TRAINING COURSES FOCUSING

Knowledge Discovery in Databases. Process Model for KDD

Characteristics of KDD: interactive; iterative; procedure to extract knowledge from data; knowledge being searched for is implicit, previously unknown

Dimensionality Reduction with PCA

Ke Tran, May 24, 2011. Introduction; Dimensionality Reduction; PCA - Principal Components Analysis; PCA Experiment; The Dataset; Discussion; Conclusion. Why dimensionality reduction?

Professional Organization Checklist for the Computer Science Curriculum Updates. Association of Computing Machinery Computing Curricula 2008

The curriculum guidelines can be found in Appendix C of the report

From Raw Data to. Actionable Insights with. MATLAB Analytics. Learn more. Develop predictive models. 1Access and explore data

1. Access and Explore Data: For scientists, the problem is not a lack of available data but a deluge.

Name: Srinivasan Govindaraj Title: Big Data Predictive Analytics

Please note the following: IBM's statements regarding its plans, directions, and intent are subject to change or withdrawal without notice

Beginner's Guide to Data Science by

Turkish Women in Computing: Latife Genc, Groupon; Gokcen Cilingir, Intel; Rabia Nuray-Turan, Moodwire Inc; Umit Yalcinalp, myappellation.com; Gulustan Dogan, Yildiz Technical

Predictive Analytics, Data Mining and Big Data

Myths, Misconceptions and Methods. Steven Finlay. Steven Finlay 2014 All rights

MTH 140 Statistics Videos

Chapter 1, Picturing Distributions with Graphs: Individuals and Variables; Categorical Variables: Pie Charts and Bar Graphs; Categorical Variables: Pie Charts and Bar Graphs; Quantitative

Prerequisites. Course Outline

MS-55040: Data Mining, Predictive Analytics with Microsoft Analysis Services and Excel PowerPivot. Description: This three-day instructor-led course will introduce the students to the concepts of data mining,

Service courses for graduate students in degree programs other than the MS or PhD programs in Biostatistics.

Course Catalog: In order to be assured that all prerequisites are met, students must acquire a permission number from the education coordinator prior to enrolling in any Biostatistics course. Courses are

Preparation and analysis of industrial data

Swedish Institute of Computer Science. Outline: 1. Data Analysis of Industrial Data: Analysing Industrial and Commercial Data, The

COURSE SYLLABUS COURSE TITLE:

COURSE TITLE: 55040 Data Mining: Predictive Analytics with Microsoft SQL Server Analysis Services and Excel Using PowerPivot and the Data Mining Add-Ins. FORMAT: Instructor-Led. CERTIFICATION EXAMS:

BIG DATA IN THE CLOUD : CHALLENGES AND OPPORTUNITIES MARY- JANE SULE & PROF. MAOZHEN LI BRUNEL UNIVERSITY, LONDON

Overview: Introduction; Multiple faces of Big Data; Challenges of Big Data; Cloud Computing

Big Data. Frank Takes. LIACS, Universiteit Leiden AISSR, Universiteit van Amsterdam. AcceptEmail, 28 August 2015

Introduction: Frank Takes. BSc: Informatica & Bedrijfswetenschappen (2008). MSc: Computer Science

Web mining and knowledge discovery of usage patterns - A survey. CS748 Yan Wang

Introduction; Web data mining; Usage mining on the Web; WebSIFT: a usage mining system; Personalization vs. User navigation pattern

Foundations of Text Mining and Natural Language Processing

Course code: ANA/TXT. Course title: Foundations of Text Mining and Natural Language Processing. Days: 3. Description: Course intended for Text Mining constitute at least 70% of all data generated in IT systems.

Q1 Define the following: Data Mining, ETL, Transaction coordinator, Local Autonomy, Workload distribution

Q2 What are Data Mining Activities? Q3 What are the basic ideas that guide the creation of a data warehouse?

Managing Big Data with Hadoop & Vertica. A look at integration between the Cloudera distribution for Hadoop and the Vertica Analytic Database

Copyright Vertica Systems, Inc. October 2009. Cloudera and Vertica