DATA SCIENCE CURRICULUM

Before class even begins, students start an at-home pre-work phase. When they convene in class, students spend the first eight weeks doing iterative, project-centered skill acquisition. Over the course of five data science projects, they develop skills across key aspects of data science, and results from each project are added to the students' portfolios. In the last four weeks, students build out and complete their individual final projects, culminating in a presentation of their work to representatives from the Metis Hiring Network.

ONLINE PRE-WORK

Students work through a curated collection of tutorials that cover the basics so they can hit the ground running. First, they're guided through initial software setup. Introductory materials then start with productivity at the command line, using an editor effectively, and becoming familiar with Python basics. Students reinforce their statistics knowledge through a set of readings with exercises that start to blend the statistical and computational. Metis teaching assistants review these preparatory exercises and provide feedback online.

INSTALLING PACKAGES, COMMAND LINE, CODE EDITOR, PYTHON, STATISTICS

WEEK 1
UNIT ONE: Introduction to the Data Science Toolkit

Students complete an entire bite-sized data science project from start to finish. They start using Git for version control and the IPython environment with the pandas and matplotlib packages to perform exploratory statistical analyses and visualizations.

- Review probability and statistics, including distributions, bootstrapping, hypothesis testing, maximum likelihood estimation, and Bayes' theorem (this review spans the first three weeks)
- Use UNIX, Git, and IPython to organize data science project resources
- Load and manipulate data with the pandas Python package
- Visualize results using the matplotlib Python package
- Communicate data science results

[Project descriptions: the original page sets several project summaries side by side under code names, and the extracted text is interleaved. Recoverable details include a first-week group project that uses MTA turnstile data; a guided regression project in which students scrape movie box office data with Python tools such as Requests, Beautiful Soup, and Selenium; and a project named McNulty, during which students perform a deep dive into the visualization package D3 and create their own APIs on the Python Flask micro framework.]
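The Week 1 statistics review lists bootstrapping among its topics. As a rough illustration, here is a minimal sketch of a percentile bootstrap confidence interval for a sample mean, using only the Python standard library. The function name `bootstrap_mean_ci`, the sample values, and the seed are assumptions made for this example and do not come from the course materials.

```python
import random
import statistics


def bootstrap_mean_ci(data, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for the mean of `data`."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    n = len(data)
    means = []
    for _ in range(n_resamples):
        # Resample with replacement, same size as the original sample.
        resample = [rng.choice(data) for _ in range(n)]
        means.append(statistics.fmean(resample))
    means.sort()
    # Take the alpha/2 and 1 - alpha/2 percentiles of the resampled means.
    lo_idx = int((alpha / 2) * n_resamples)
    hi_idx = int((1 - alpha / 2) * n_resamples) - 1
    return means[lo_idx], means[hi_idx]


# Hypothetical sample (made-up numbers, for illustration only).
sample = [12.1, 9.8, 11.4, 10.7, 13.2, 9.5, 10.9, 12.6, 11.1, 10.3]
low, high = bootstrap_mean_ci(sample)
print(f"sample mean = {statistics.fmean(sample):.2f}")
print(f"95% bootstrap CI for the mean: ({low:.2f}, {high:.2f})")
```

The percentile method is only one of several bootstrap interval constructions; it is used here because it is the shortest to write down.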

WEEK 2
UNIT TWO: PART 1: Design Process and Web Scraping

In preparation for Project 2, students start to learn one of the most important tools a data scientist uses: the iterative design process. They learn tools for web scraping and start fitting simple models to data. They are also introduced to cloud computing and work on remote servers.

- Use the design process to iteratively explore the possible ways that a problem can be solved
- Create and work in a virtual environment on a cloud computing service
- Use Python's Requests and Selenium packages to obtain data from web pages
- Use Python's Beautiful Soup package to parse the content of a web page to find useful data for subsequent analysis
- Use the design process to iterate the concept for the Unit 2 projects
- Complete a primer on web fundamentals including HTML, CSS, and JavaScript

WEEK 3
UNIT TWO: PART 2: Regression and Communicating Results

Students go in-depth on regression using scikit-learn and matplotlib. Choosing among the analysis methods and approaches to reporting their results, students finish the second project and present their findings.

- Apply regression modeling with Python packages scikit-learn and statsmodels
- Load, clean, and explore data using Python packages pandas, numpy, and matplotlib/pyplot
- Experience how the design process influences analysis and results
- Complete second project and communicate results to each other
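The regression modeling named in the Week 3 objectives can be illustrated without any third-party package. The sketch below fits a one-variable ordinary least squares line in plain Python. It is a stand-in for the idea only, not the scikit-learn or statsmodels workflow the unit teaches, and the function names and data points are made up for illustration.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # slope = (sum of cross-deviations) / (sum of squared x-deviations)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def r_squared(xs, ys, intercept, slope):
    """Share of the variance in ys explained by the fitted line."""
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1 - ss_res / ss_tot


# Hypothetical data lying exactly on y = 1 + 2x, so the fit should recover it.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]
intercept, slope = fit_line(xs, ys)
print(intercept, slope, r_squared(xs, ys, intercept, slope))
```

With real, noisy data the recovered coefficients would only approximate the underlying relationship, and the R-squared value would fall below 1.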

WEEK 4
UNIT THREE: PART 1: Databases and Introduction to Machine Learning Concepts

Students cover relational (SQL) databases and more ways of obtaining, cleaning, and maintaining data. They are introduced to the concepts of machine learning and exposed to classification and supervised learning with a few examples such as logistic regression and KNN. They also discuss different types of feasibility related to data science questions and projects.

- Use SQL databases to store and organize data
- Access stored data with MySQL queries
- Complete a deep applied survey of classification (supervised learning) techniques, such as logistic regression, k-nearest neighbors, etc.
- Design and evaluate the computational feasibility of a third data project

WEEK 5
UNIT THREE: PART 2: Machine Learning, Supervised Learning Techniques, Naive Bayes Algorithm

Students dig into more details and more algorithms for supervised learning, including SVM, decision trees, and random forests; techniques for feature selection and feature extraction; and concepts and applications for deep learning. Students choose to apply one or more of these algorithms as part of this Unit's project.

- Explore supervised learning techniques including decision trees and random forests
- Connect regression modeling to the broader family of machine learning techniques
- Use supervised learning on Project 3; work in groups simulating in-house data science teams
- Refine models with feature selection and feature extraction
- Evaluate the efficacy and computational feasibility of various ML algorithms in different contexts

WEEK 6
UNIT THREE: PART 3: JavaScript and D3

Students visualize projects using D3, a favorite tool for flexible and attractive presentations of data and relationships. Since D3 is a JavaScript library, students learn JavaScript essentials and the incorporation of other JavaScript libraries (jQuery, Bootstrap, etc.) that make the job much easier.

- Learn the fundamentals of JavaScript
- Explore basic principles of good visual design and communication
- Use D3 to create interactive visualizations that are functional in any browser
- Create novel data visualizations with D3 to illustrate Unit 3 project results in blog post format
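k-nearest neighbors, one of the classification techniques named in the Week 4 survey, is simple enough to sketch in plain Python. The example below is illustrative only: the function name `knn_predict`, the toy points, and the labels are assumptions, and a course project would more likely rely on a library implementation.

```python
from collections import Counter
import math


def knn_predict(train_points, train_labels, query, k=3):
    """Classify `query` by majority vote among its k nearest training points."""
    # Pair each training point's Euclidean distance to the query with its label.
    distances = [
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    ]
    distances.sort(key=lambda pair: pair[0])
    nearest_labels = [label for _, label in distances[:k]]
    # The most common label among the k neighbors wins.
    return Counter(nearest_labels).most_common(1)[0][0]


# Hypothetical two-class toy data (made up for illustration).
points = [(1.0, 1.0), (1.5, 2.0), (2.0, 1.5), (6.0, 6.0), (6.5, 7.0), (7.0, 6.5)]
labels = ["a", "a", "a", "b", "b", "b"]

print(knn_predict(points, labels, (1.2, 1.4)))
print(knn_predict(points, labels, (6.4, 6.2)))
```

An odd `k` avoids tied votes in a two-class problem; the brute-force distance scan shown here is fine for small data but scales linearly with the training set.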

WEEK 7
UNIT FOUR: PART 1: APIs, Data Collection Methods, NoSQL Storage, Web Apps with Flask

The project for the fourth unit involves text data. Students round out data acquisition methods with APIs and online database servers. Students also learn about NoSQL databases and start using MongoDB.

- Use Python to download data from an API
- Use NoSQL databases; parse and store unstructured data in MongoDB
- Review database selection: non-relational (NoSQL) databases vs. relational (SQL) databases vs. no database (flat files)
- Merge disparate data sets to practice data munging
- Design and propose initial data collection for Unit 4 project

WEEK 8
UNIT FOUR: PART 2: Natural Language Processing (NLP)

Students analyze the text data collected in the previous week and learn about NLP algorithms. They dive deeper into unsupervised learning, covering K-means, hierarchical clustering, mixture models, and topic models. They also learn how large amounts of data are handled, discussing parallel computing and Hadoop MapReduce. Project 4 results are presented as lightning talks.

- Use Python's Natural Language Toolkit (NLTK) and the TextBlob library to perform natural language analyses on text data
- Apply deep learning/neural networks, DBSCAN, and dimensionality reduction (with principal component analysis); learn algorithms including KD-trees and locality-sensitive hashing
- Survey K-means, hierarchical clustering, and other unsupervised learning algorithms, with applications on real data
- Reflect on the strengths and weaknesses of each algorithm and its appropriate use
- Outline the data science stack and design choices in data engineering for fault-tolerant systems
- Set up a Hadoop environment on cloud servers
- Use Hadoop via Python bindings to write customized MapReduce jobs from scratch and run them in a Hadoop cloud environment
- Discuss Hadoop: history & ecosystem, when & why, hype & reality
- Complete Project 4 and present findings to class in lightning talk format
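The MapReduce jobs mentioned in the Week 8 objectives follow a map, shuffle, reduce pattern that can be sketched in a single Python process. The word count below is only a conceptual illustration: it does not use Hadoop or any of its Python bindings, and the function names and input documents are hypothetical.

```python
from collections import defaultdict


def map_phase(document):
    """Map step: emit a (word, 1) pair for every word in one document."""
    for word in document.lower().split():
        yield word, 1


def shuffle_phase(pairs):
    """Shuffle step: group all emitted values by key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    """Reduce step: combine each key's values into a single count."""
    return {key: sum(values) for key, values in grouped.items()}


# Hypothetical input documents (made up for illustration).
documents = ["big data is big", "data science is fun"]
pairs = [pair for doc in documents for pair in map_phase(doc)]
counts = reduce_phase(shuffle_phase(pairs))
print(counts)
```

In a real cluster the map and reduce steps would run in parallel on different machines and the framework would handle the shuffle; keeping the three phases as separate functions here mirrors that division of labor.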

WEEKS 9-12
UNIT FIVE: Final Project

Students work full time on their Final Projects, which they have been gradually designing through the first eight weeks. They also learn more about cloud computing, system architectures, and feasibility evaluations.

- Use the design process to isolate an appropriate problem to solve
- Evaluate the computational feasibility of the problem
- Choose data sources that can be used to address the problem
- Design and implement an appropriate computational architecture
- Design and implement an appropriate set of analysis steps
- Design and develop a data visualization to clearly convey the results of the analysis to a layperson
- Assemble final portfolio and present project at Career Day

MORE ABOUT PROJECTS

Data science projects can be divided into useful dimensions. A dimension can be thought of as a facet along which a decision must be made to specify a project implementation. The bootcamp considers the dimensions of domain, design, data, algorithms, tools, and communication. Each Unit covers certain content from several domains, which are reinforced in that Unit's project.

The rigor with which we attack the topics covered in the bootcamp allows us to sleep soundly at night. We feel confident in saying that our graduates haven't simply learned about the tools that data scientists use. By the time they leave our classroom, our graduates are data scientists. They are ready to approach the problem space in their new careers and assemble the suite of tools and methods to answer insightful questions and communicate comprehensible results. They are competent, capable, and confident. And they are ready to work.


Introduction to Data Mining and Machine Learning Techniques. Iza Moise, Evangelos Pournaras, Dirk Helbing Introduction to Data Mining and Machine Learning Techniques Iza Moise, Evangelos Pournaras, Dirk Helbing Iza Moise, Evangelos Pournaras, Dirk Helbing 1 Overview Main principles of data mining Definition

More information

Big Data on Microsoft Platform

Big Data on Microsoft Platform Big Data on Microsoft Platform Prepared by GJ Srinivas Corporate TEG - Microsoft Page 1 Contents 1. What is Big Data?...3 2. Characteristics of Big Data...3 3. Enter Hadoop...3 4. Microsoft Big Data Solutions...4

More information

MACHINE LEARNING IN HIGH ENERGY PHYSICS

MACHINE LEARNING IN HIGH ENERGY PHYSICS MACHINE LEARNING IN HIGH ENERGY PHYSICS LECTURE #1 Alex Rogozhnikov, 2015 INTRO NOTES 4 days two lectures, two practice seminars every day this is introductory track to machine learning kaggle competition!

More information

Up Your R Game. James Taylor, Decision Management Solutions Bill Franks, Teradata

Up Your R Game. James Taylor, Decision Management Solutions Bill Franks, Teradata Up Your R Game James Taylor, Decision Management Solutions Bill Franks, Teradata Today s Speakers James Taylor Bill Franks CEO Chief Analytics Officer Decision Management Solutions Teradata 7/28/14 3 Polling

More information

DATA ANALYTICS USING R

DATA ANALYTICS USING R DATA ANALYTICS USING R Duration: 90 Hours Intended audience and scope: The course is targeted at fresh engineers, practicing engineers and scientists who are interested in learning and understanding data

More information

Ironfan Your Foundation for Flexible Big Data Infrastructure

Ironfan Your Foundation for Flexible Big Data Infrastructure Ironfan Your Foundation for Flexible Big Data Infrastructure Benefits With Ironfan, you can expect: Reduced cycle time. Provision servers in minutes not days. Improved visibility. Increased transparency

More information

Bayesian networks - Time-series models - Apache Spark & Scala

Bayesian networks - Time-series models - Apache Spark & Scala Bayesian networks - Time-series models - Apache Spark & Scala Dr John Sandiford, CTO Bayes Server Data Science London Meetup - November 2014 1 Contents Introduction Bayesian networks Latent variables Anomaly

More information

Big Data. Lyle Ungar, University of Pennsylvania

Big Data. Lyle Ungar, University of Pennsylvania Big Data Big data will become a key basis of competition, underpinning new waves of productivity growth, innovation, and consumer surplus. McKinsey Data Scientist: The Sexiest Job of the 21st Century -

More information

SAV2013: The Great SharePoint 2013 App Venture

SAV2013: The Great SharePoint 2013 App Venture SHAREPOINT 2013 FOR DEVELOPERS 5 DAYS SAV2013: The Great SharePoint 2013 App Venture AUDIENCE FORMAT COURSE DESCRIPTION Professional Developers Instructor-led training with hands-on labs This 5-day course

More information

BIG DATA IN THE CLOUD : CHALLENGES AND OPPORTUNITIES MARY- JANE SULE & PROF. MAOZHEN LI BRUNEL UNIVERSITY, LONDON

BIG DATA IN THE CLOUD : CHALLENGES AND OPPORTUNITIES MARY- JANE SULE & PROF. MAOZHEN LI BRUNEL UNIVERSITY, LONDON BIG DATA IN THE CLOUD : CHALLENGES AND OPPORTUNITIES MARY- JANE SULE & PROF. MAOZHEN LI BRUNEL UNIVERSITY, LONDON Overview * Introduction * Multiple faces of Big Data * Challenges of Big Data * Cloud Computing

More information

This Symposium brought to you by www.ttcus.com

This Symposium brought to you by www.ttcus.com This Symposium brought to you by www.ttcus.com Linkedin/Group: Technology Training Corporation @Techtrain Technology Training Corporation www.ttcus.com Big Data Analytics as a Service (BDAaaS) Big Data

More information

Challenges and Lessons from NIST Data Science Pre-pilot Evaluation in Introduction to Data Science Course Fall 2015

Challenges and Lessons from NIST Data Science Pre-pilot Evaluation in Introduction to Data Science Course Fall 2015 Challenges and Lessons from NIST Data Science Pre-pilot Evaluation in Introduction to Data Science Course Fall 2015 Dr. Daisy Zhe Wang Director of Data Science Research Lab University of Florida, CISE

More information

MS1b Statistical Data Mining

MS1b Statistical Data Mining MS1b Statistical Data Mining Yee Whye Teh Department of Statistics Oxford http://www.stats.ox.ac.uk/~teh/datamining.html Outline Administrivia and Introduction Course Structure Syllabus Introduction to

More information

BIDM Project. Predicting the contract type for IT/ITES outsourcing contracts

BIDM Project. Predicting the contract type for IT/ITES outsourcing contracts BIDM Project Predicting the contract type for IT/ITES outsourcing contracts N a n d i n i G o v i n d a r a j a n ( 6 1 2 1 0 5 5 6 ) The authors believe that data modelling can be used to predict if an

More information

Car Insurance. Prvák, Tomi, Havri

Car Insurance. Prvák, Tomi, Havri Car Insurance Prvák, Tomi, Havri Sumo report - expectations Sumo report - reality Bc. Jan Tomášek Deeper look into data set Column approach Reminder What the hell is this competition about??? Attributes

More information

Hunk & Elas=c MapReduce: Big Data Analy=cs on AWS

Hunk & Elas=c MapReduce: Big Data Analy=cs on AWS Copyright 2014 Splunk Inc. Hunk & Elas=c MapReduce: Big Data Analy=cs on AWS Dritan Bi=ncka BD Solu=ons Architecture Disclaimer During the course of this presenta=on, we may make forward looking statements

More information

Towards a Thriving Data Economy: Open Data, Big Data, and Data Ecosystems

Towards a Thriving Data Economy: Open Data, Big Data, and Data Ecosystems Towards a Thriving Data Economy: Open Data, Big Data, and Data Ecosystems Volker Markl volker.markl@tu-berlin.de dima.tu-berlin.de dfki.de/web/research/iam/ bbdc.berlin Based on my 2014 Vision Paper On

More information

INTERNATIONAL MASTER IN BUSINESS ANALYTICS AND BIG DATA

INTERNATIONAL MASTER IN BUSINESS ANALYTICS AND BIG DATA POLITECNICO DI MILANO GRADUATE SCHOOL OF BUSINESS BABD INTERNATIONAL MASTER IN BUSINESS ANALYTICS AND BIG DATA Courses Description A JOINT PROGRAM WITH POLITECNICO DI MILANO SCHOOL OF MANAGEMENT PRE-COURSES

More information

BIOINF 585 Fall 2015 Machine Learning for Systems Biology & Clinical Informatics http://www.ccmb.med.umich.edu/node/1376

BIOINF 585 Fall 2015 Machine Learning for Systems Biology & Clinical Informatics http://www.ccmb.med.umich.edu/node/1376 Course Director: Dr. Kayvan Najarian (DCM&B, kayvan@umich.edu) Lectures: Labs: Mondays and Wednesdays 9:00 AM -10:30 AM Rm. 2065 Palmer Commons Bldg. Wednesdays 10:30 AM 11:30 AM (alternate weeks) Rm.

More information

Apache Spark : Fast and Easy Data Processing Sujee Maniyam Elephant Scale LLC sujee@elephantscale.com http://elephantscale.com

Apache Spark : Fast and Easy Data Processing Sujee Maniyam Elephant Scale LLC sujee@elephantscale.com http://elephantscale.com Apache Spark : Fast and Easy Data Processing Sujee Maniyam Elephant Scale LLC sujee@elephantscale.com http://elephantscale.com Spark Fast & Expressive Cluster computing engine Compatible with Hadoop Came

More information

Big Data Specialized Studies

Big Data Specialized Studies Information Technologies Programs Big Data Specialized Studies Accelerate Your Career extension.uci.edu/bigdata Offered in partnership with University of California, Irvine Extension s professional certificate

More information

Course 803401 DSS. Business Intelligence: Data Warehousing, Data Acquisition, Data Mining, Business Analytics, and Visualization

Course 803401 DSS. Business Intelligence: Data Warehousing, Data Acquisition, Data Mining, Business Analytics, and Visualization Oman College of Management and Technology Course 803401 DSS Business Intelligence: Data Warehousing, Data Acquisition, Data Mining, Business Analytics, and Visualization CS/MIS Department Information Sharing

More information

Big Data Analytics and Optimization

Big Data Analytics and Optimization Big Data Analytics and Optimization C e r t i f i c a t e P r o g r a m i n E n g i n e e r i n g E x c e l l e n c e e.edu.in http://www.insof LIST OF COURSES Essential Business Skills for a Data Scientist...

More information

Learning Web App Development

Learning Web App Development Learning Web App Development Semmy Purewal Beijing Cambridge Farnham Kbln Sebastopol Tokyo O'REILLY Table of Contents Preface xi 1. The Workflow 1 Text Editors 1 Installing Sublime Text 2 Sublime Text

More information

Chapter 2: Data Analytics Life Cycle

Chapter 2: Data Analytics Life Cycle Section 1. Data Analytics Lifecycle Overview The Data Analytics Lifecycle is designed specifically for Big Data problems and data science projects. The lifecycle has six phases, and project work can occur

More information

COMP 598 Applied Machine Learning Lecture 21: Parallelization methods for large-scale machine learning! Big Data by the numbers

COMP 598 Applied Machine Learning Lecture 21: Parallelization methods for large-scale machine learning! Big Data by the numbers COMP 598 Applied Machine Learning Lecture 21: Parallelization methods for large-scale machine learning! Instructor: (jpineau@cs.mcgill.ca) TAs: Pierre-Luc Bacon (pbacon@cs.mcgill.ca) Ryan Lowe (ryan.lowe@mail.mcgill.ca)

More information

An interdisciplinary model for analytics education

An interdisciplinary model for analytics education An interdisciplinary model for analytics education Raffaella Settimi, PhD School of Computing, DePaul University Drew Conway s Data Science Venn Diagram http://drewconway.com/zia/2013/3/26/the-data-science-venn-diagram

More information

III Big Data Technologies

III Big Data Technologies III Big Data Technologies Today, new technologies make it possible to realize value from Big Data. Big data technologies can replace highly customized, expensive legacy systems with a standard solution

More information

Bachelor of Games and Virtual Worlds (Programming) Subject and Course Summaries

Bachelor of Games and Virtual Worlds (Programming) Subject and Course Summaries First Semester Development 1A On completion of this subject students will be able to apply basic programming and problem solving skills in a 3 rd generation object-oriented programming language (such as

More information

Data Analytics at NERSC

Data Analytics at NERSC Data Analytics at NERSC Rollin Thomas rcthomas@lbl.gov NERSC Data and Analytics Services March 21, 2016 NERSC User Group Meeting Introduction Data Analytics: The key to unlocking insight from massive and

More information

Data Integration Checklist

Data Integration Checklist The need for data integration tools exists in every company, small to large. Whether it is extracting data that exists in spreadsheets, packaged applications, databases, sensor networks or social media

More information

Trainer Preparation Guide for Course 20488B: Developing Microsoft SharePoint Server 2013 Core Solutions Design of the Course

Trainer Preparation Guide for Course 20488B: Developing Microsoft SharePoint Server 2013 Core Solutions Design of the Course Trainer Preparation Guide for Course 20488B: Developing Microsoft SharePoint Server 2013 Core Solutions 1 Trainer Preparation Guide for Course 20488B: Developing Microsoft SharePoint Server 2013 Core Solutions

More information

The evolution of database technology (II) Huibert Aalbers Senior Certified Executive IT Architect

The evolution of database technology (II) Huibert Aalbers Senior Certified Executive IT Architect The evolution of database technology (II) Huibert Aalbers Senior Certified Executive IT Architect IT Insight podcast This podcast belongs to the IT Insight series You can subscribe to the podcast through

More information

Introduction to Data Science: CptS 483-06 Syllabus First Offering: Fall 2015

Introduction to Data Science: CptS 483-06 Syllabus First Offering: Fall 2015 Course Information Introduction to Data Science: CptS 483-06 Syllabus First Offering: Fall 2015 Credit Hours: 3 Semester: Fall 2015 Meeting times and location: MWF, 12:10 13:00, Sloan 163 Course website:

More information

Chapter 5 Business Intelligence: Data Warehousing, Data Acquisition, Data Mining, Business Analytics, and Visualization

Chapter 5 Business Intelligence: Data Warehousing, Data Acquisition, Data Mining, Business Analytics, and Visualization Turban, Aronson, and Liang Decision Support Systems and Intelligent Systems, Seventh Edition Chapter 5 Business Intelligence: Data Warehousing, Data Acquisition, Data Mining, Business Analytics, and Visualization

More information

Buyer s Guide to Big Data Integration

Buyer s Guide to Big Data Integration SEPTEMBER 2013 Buyer s Guide to Big Data Integration Sponsored by Contents Introduction 1 Challenges of Big Data Integration: New and Old 1 What You Need for Big Data Integration 3 Preferred Technology

More information

Statistics for BIG data

Statistics for BIG data Statistics for BIG data Statistics for Big Data: Are Statisticians Ready? Dennis Lin Department of Statistics The Pennsylvania State University John Jordan and Dennis K.J. Lin (ICSA-Bulletine 2014) Before

More information

MyCloudLab: An Interactive Web-based Management System for Cloud Computing Administration

MyCloudLab: An Interactive Web-based Management System for Cloud Computing Administration MyCloudLab: An Interactive Web-based Management System for Cloud Computing Administration Hoi-Wan Chan 1, Min Xu 2, Chung-Pan Tang 1, Patrick P. C. Lee 1 & Tsz-Yeung Wong 1, 1 Department of Computer Science

More information

You ll need to have: It d be great if you have:

You ll need to have: It d be great if you have: DevOps We re looking for a Development Operations Developer with a passion for experimentation. If you re interested in helping us build the future of mobile healthcare, this job is for you. A strong background

More information

BIG DATA TRENDS AND TECHNOLOGIES

BIG DATA TRENDS AND TECHNOLOGIES BIG DATA TRENDS AND TECHNOLOGIES THE WORLD OF DATA IS CHANGING Cloud WHAT IS BIG DATA? Big data are datasets that grow so large that they become awkward to work with using onhand database management tools.

More information

Name: Srinivasan Govindaraj Title: Big Data Predictive Analytics

Name: Srinivasan Govindaraj Title: Big Data Predictive Analytics Name: Srinivasan Govindaraj Title: Big Data Predictive Analytics Please note the following IBM s statements regarding its plans, directions, and intent are subject to change or withdrawal without notice

More information

An Overview of Predictive Analytics for Practitioners. Dean Abbott, Abbott Analytics

An Overview of Predictive Analytics for Practitioners. Dean Abbott, Abbott Analytics An Overview of Predictive Analytics for Practitioners Dean Abbott, Abbott Analytics Thank You Sponsors Empower users with new insights through familiar tools while balancing the need for IT to monitor

More information

Dealing with Data Especially Big Data

Dealing with Data Especially Big Data Dealing with Data Especially Big Data INFO-GB-2346.30 Spring 2016 Very Rough Draft Subject to Change Professor Norman White Background: Most courses spend their time on the concepts and techniques of analyzing

More information

lop Building Machine Learning Systems with Python en source

lop Building Machine Learning Systems with Python en source Building Machine Learning Systems with Python Master the art of machine learning with Python and build effective machine learning systems with this intensive handson guide Willi Richert Luis Pedro Coelho

More information

April 2016 JPoint Moscow, Russia. How to Apply Big Data Analytics and Machine Learning to Real Time Processing. Kai Wähner. kwaehner@tibco.

April 2016 JPoint Moscow, Russia. How to Apply Big Data Analytics and Machine Learning to Real Time Processing. Kai Wähner. kwaehner@tibco. April 2016 JPoint Moscow, Russia How to Apply Big Data Analytics and Machine Learning to Real Time Processing Kai Wähner kwaehner@tibco.com @KaiWaehner www.kai-waehner.de LinkedIn / Xing Please connect!

More information

Spark and the Big Data Library

Spark and the Big Data Library Spark and the Big Data Library Reza Zadeh Thanks to Matei Zaharia Problem Data growing faster than processing speeds Only solution is to parallelize on large clusters» Wide use in both enterprises and

More information

R YOU READY FOR PYTHON? Sunday 19th April, 2015

R YOU READY FOR PYTHON? Sunday 19th April, 2015 R YOU READY FOR PYTHON? Sunday 19th April, 2015 THIS IS NOT A PYTHON VS R TALK credits - https://meetmrholland.wordpress.com/2013/02/03/creative-5-tips-to-make-all-your-meetings-exactly-the-same/ WHO ARE

More information

Reference Architecture, Requirements, Gaps, Roles

Reference Architecture, Requirements, Gaps, Roles Reference Architecture, Requirements, Gaps, Roles The contents of this document are an excerpt from the brainstorming document M0014. The purpose is to show how a detailed Big Data Reference Architecture

More information

Advanced Big Data Analytics with R and Hadoop

Advanced Big Data Analytics with R and Hadoop REVOLUTION ANALYTICS WHITE PAPER Advanced Big Data Analytics with R and Hadoop 'Big Data' Analytics as a Competitive Advantage Big Analytics delivers competitive advantage in two ways compared to the traditional

More information

Master of Science in Healthcare Informatics and Analytics Program Overview

Master of Science in Healthcare Informatics and Analytics Program Overview Master of Science in Healthcare Informatics and Analytics Program Overview The program is a 60 credit, 100 week course of study that is designed to graduate students who: Understand and can apply the appropriate

More information