DATA SCIENCE CURRICULUM
- Leslie Barnett
- 8 years ago
Before class even begins, students start an at-home pre-work phase. When they convene in class, students spend the first eight weeks doing iterative, project-centered skill acquisition. Over the course of five data science projects, they develop skills across key aspects of data science, and results from each project are added to the students' portfolios. In the last four weeks, students build out and complete their individual final projects, culminating in a presentation of their work to representatives from the Metis Hiring Network.

ONLINE PRE-WORK

Students work through a curated collection of tutorials that cover the basics so they can hit the ground running. First, they're guided through initial software setup. Introductory materials then start with productivity at the command line, using an editor effectively, and becoming familiar with Python basics. Students reinforce their statistics knowledge through a set of readings with exercises that start to blend the statistical and computational. Metis teaching assistants review these preparatory exercises and provide feedback online.

- Installing packages
- Command line
- Code editor
- Python
- Statistics

WEEK 1 UNIT ONE: Introduction to the Data Science Toolkit

Students complete an entire bite-sized data science project from start to finish. They start using Git for version control and the IPython environment with the pandas and matplotlib packages to perform exploratory statistical analyses and visualizations.

- Review probability and statistics, including distributions, bootstrapping, hypothesis testing, maximum likelihood estimation, and Bayes' theorem (this review spans the first three weeks)
- Use UNIX, Git, and IPython to organize data science project resources
- Load and manipulate data with the pandas Python package
- Visualize results using the matplotlib Python package
- Communicate data science results

PROJECTS

Project 1: In the first week, students form small groups that work with MTA turnstile data to estimate the volume of people on the street, so that (theoretical) nonprofits and companies can deploy street teams efficiently. The students are provided with the data and guided through exploratory analysis and plotting so they can focus on new tools, brainstorming, and communication.

Project 2: Students dive into prediction with regression models. They experience the beauty of flat files and learn to scrape information from websites using tools like Python Requests, Beautiful Soup, and Selenium. After scraping data on movie box office, students find and scrape more resources on their own and present their movie industry regression to the class.

Project 3 (McNulty): Students work as an internal data science team at a fictional company in the insurance industry. Supervised learning algorithms and relational databases have been covered in class. Students work together on classification models that fit within the overall goals of the company and the team. During McNulty, students perform a deep dive into the visualization package D3 and create their own APIs on the Python Flask micro framework.

Project 4: Students work with unsupervised learning and NLP algorithms, NoSQL databases, and API data collection. Students are left to work individually and will have very few constraints at the design stage of this project.

Project 5: Students are free to use anything covered in class or learn something new to answer the questions they want to address. Some students embark on entirely new turf. Every student works intensely on his or her own challenges to create something cool, interesting, useful, or worthwhile.
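As one concrete instance of the Week 1 statistics review, bootstrapping can be sketched in a few lines. The example below is only an illustration under stated assumptions: it uses Python's standard library rather than pandas, the function name `bootstrap_mean_ci` is hypothetical, and the hourly counts are made up to resemble the kind of turnstile data used in the first project.

```python
import random
import statistics


def bootstrap_mean_ci(sample, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for the mean of `sample`."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    n = len(sample)
    means = []
    for _ in range(n_resamples):
        # Resample n observations with replacement and record the mean.
        resample = [rng.choice(sample) for _ in range(n)]
        means.append(statistics.fmean(resample))
    means.sort()
    # Take the alpha/2 and 1 - alpha/2 percentiles of the resampled means.
    lower = means[int((alpha / 2) * n_resamples)]
    upper = means[int((1 - alpha / 2) * n_resamples) - 1]
    return lower, upper


# Hypothetical hourly entry counts, invented for illustration only.
counts = [112, 98, 130, 101, 87, 145, 120, 93, 110, 105]
low, high = bootstrap_mean_ci(counts)
print(f"sample mean = {statistics.fmean(counts):.1f}")
print(f"95% bootstrap CI for the mean: ({low:.1f}, {high:.1f})")
```

The percentile method is the simplest bootstrap interval; other variants exist and may behave better on small or skewed samples.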
WEEK 2 UNIT TWO: PART 1: Design Process and Web Scraping

In preparation for Project 2, students start to learn one of the most important tools a data scientist uses: the iterative design process. They learn tools for web scraping and start fitting simple models to data. Also, they are introduced to cloud computing and work on remote servers.

- Use the design process to iteratively explore the possible ways that a problem can be solved
- Create and work in a virtual environment on a cloud computing service
- Use Python's Requests and Selenium packages to obtain data from web pages
- Use Python's Beautiful Soup package to parse the content of a web page to find useful data for subsequent analysis
- Use the design process to iterate the concept for the Unit 2 projects
- Complete a primer on web fundamentals including HTML, CSS, and JavaScript

WEEK 3 UNIT TWO: PART 2: Regression and Communicating Results

Students go in-depth on regression using scikit-learn and matplotlib. Choosing among the analysis methods and approaches to reporting their results, students finish the second project and present their findings.

- Apply regression modeling with Python packages scikit-learn and statsmodels
- Load, clean, and explore data using Python packages pandas, numpy, and matplotlib/pyplot
- Experience how the design process influences analysis and results
- Complete second project and communicate results to each other
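The regression modeling named in Week 3 can be illustrated with the closed-form least-squares fit for a single predictor. This is a minimal sketch, not the course's method: the curriculum uses scikit-learn and statsmodels, whereas this version uses only plain Python so the arithmetic is visible. The helper names `fit_line` and `r_squared` and the budget and gross figures are hypothetical.

```python
def fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line through the points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # slope = sum of cross-deviations / sum of squared x-deviations
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def r_squared(xs, ys, slope, intercept):
    """Fraction of the variance in ys explained by the fitted line."""
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1 - ss_res / ss_tot


# Hypothetical data: production budget vs. box office gross (both in millions).
budgets = [10, 20, 30, 40, 50]
grosses = [27, 43, 68, 82, 105]

slope, intercept = fit_line(budgets, grosses)
r2 = r_squared(budgets, grosses, slope, intercept)
print(f"gross = {slope:.2f} * budget + {intercept:.2f} (R^2 = {r2:.3f})")
```

With more than one predictor the same idea generalizes to solving the normal equations, which is what a library regression routine handles.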
WEEK 4 UNIT THREE: PART 1: Databases and Introduction to Machine Learning Concepts

Students cover relational (SQL) databases and more ways of obtaining, cleaning, and maintaining data. They are introduced to the concepts of machine learning and exposed to classification and supervised learning with a few examples such as logistic regression and KNN. They also discuss different types of feasibility related to data science questions and projects.

- Use SQL databases to store and organize data
- Explore supervised learning techniques including decision trees and random forests
- Access stored data using SQL queries against MySQL
- Complete a deep applied survey of classification (supervised learning) techniques, such as logistic regression, k-nearest neighbors, etc.
- Design and evaluate the computational feasibility of a third data project

WEEK 5 UNIT THREE: PART 2: Machine Learning, Supervised Learning Techniques, Naive Bayes Algorithm

Students dig into more details and more algorithms for supervised learning, including SVM, decision trees, and random forests; techniques for feature selection and feature extraction; and concepts and applications for deep learning. Students choose to apply one or more of these algorithms as part of this Unit's project.

- Connect regression modeling to the broader family of machine learning techniques
- Use supervised learning on Project 3; work in groups simulating in-house data science teams
- Refine models with feature selection and feature extraction
- Evaluate the efficacy and computational feasibility of various ML algorithms in different contexts

WEEK 6 UNIT THREE: PART 3: JavaScript and D3

Students visualize projects using D3, a favorite tool for flexible and attractive presentations of data and relationships. Since D3 is a JavaScript library, students learn JavaScript essentials and the incorporation of other JavaScript libraries (jQuery, Bootstrap, etc.) that make the job much easier.

- Learn the fundamentals of JavaScript
- Explore basic principles of good visual design and communication
- Use D3 to create interactive visualizations that are functional in any browser
- Create novel data visualizations with D3 to illustrate Unit 3 project results in blog post format
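Of the classification techniques surveyed in Unit Three, k-nearest neighbors is the easiest to write out by hand. The sketch below assumes Euclidean distance and a simple majority vote; the function name `knn_predict`, the points, and the labels are all hypothetical, and a library implementation would normally be used in practice.

```python
import math
from collections import Counter


def knn_predict(train_points, train_labels, query, k=3):
    """Classify `query` by majority vote among its k nearest training points."""
    # Pair each training point's Euclidean distance to the query with its label,
    # then sort so the closest points come first.
    distances = sorted(
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    )
    nearest_labels = [label for _, label in distances[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]


# Hypothetical two-feature training set with two well-separated classes.
points = [(1.0, 1.0), (1.5, 2.0), (2.0, 1.5), (6.0, 6.5), (7.0, 6.0), (6.5, 7.0)]
labels = ["low", "low", "low", "high", "high", "high"]

print(knn_predict(points, labels, (1.8, 1.7)))
print(knn_predict(points, labels, (6.2, 6.4)))
```

The choice of `k` and of the distance measure both affect the result, which is one reason the curriculum pairs the technique with feature selection and evaluation.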
WEEK 7 UNIT FOUR: PART 1: APIs, Data Collection Methods, NoSQL Storage, WebApps with Flask

The project for the fourth unit involves text data. Students round out data acquisition methods with APIs and online database servers. Students also learn about NoSQL databases and start using MongoDB.

- Use Python to download data from an API
- Use NoSQL databases; parse and store unstructured data in MongoDB
- Review database selection: non-relational (NoSQL) databases vs. relational (SQL) databases vs. no database (flat files)
- Merge disparate data sets to practice data munging
- Design and propose initial data collection for Unit 4 project

WEEK 8 UNIT FOUR: PART 2: Natural Language Processing (NLP)

Students analyze the text data collected in the previous week and learn about NLP algorithms. They dive deeper into unsupervised learning, covering K-means, hierarchical clustering, mixture models, and topic models. They also learn how large amounts of data are handled, discussing parallel computing and Hadoop MapReduce. Project 4 results are presented as lightning talks.

- Use Python's Natural Language Toolkit (NLTK) and the TextBlob library to perform natural language analyses on text data
- Apply deep learning/neural networks, DBSCAN, and dimensionality reduction (with principal component analysis)
- Learn algorithms including k-d trees and locality-sensitive hashing
- Survey K-means, hierarchical clustering, and other unsupervised learning algorithms; applications on real data
- Reflect on the strengths and weaknesses of each algorithm and its appropriate use
- Outline the data science stack and design choices in data engineering for fault-tolerant systems
- Set up a Hadoop environment on cloud servers
- Use Hadoop via Python bindings to write customized map-reduce jobs from scratch and run them in a Hadoop cloud environment
- Discuss Hadoop: history & ecosystem, when & why, hype & reality
- Complete Project 4 and present findings to class in lightning talk format
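The map-reduce pattern behind the Hadoop jobs mentioned in Week 8 can be sketched without a cluster. The following is a single-process simulation in plain Python, not Hadoop code: it calls no Hadoop API, and the names `map_phase`, `shuffle`, `reduce_phase`, and `word_count`, as well as the sample documents, are hypothetical. It only shows how a word count splits into a map step, a grouping step, and a reduce step.

```python
from collections import defaultdict


def map_phase(document):
    """Emit a (word, 1) pair for every word in one document."""
    for word in document.lower().split():
        # Strip surrounding punctuation so "data!" and "data" count together.
        yield word.strip(".,!?;:\"'"), 1


def shuffle(pairs):
    """Group emitted values by key, as a framework would between the phases."""
    grouped = defaultdict(list)
    for key, value in pairs:
        if key:  # skip tokens that were only punctuation
            grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Combine all values emitted for one key into a single count."""
    return key, sum(values)


def word_count(documents):
    pairs = (pair for doc in documents for pair in map_phase(doc))
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())


# Hypothetical input documents.
docs = ["Data science is fun.", "Science needs data!"]
counts_by_word = word_count(docs)
print(counts_by_word)
```

On a real cluster the map and reduce functions run in parallel on different machines and the framework performs the grouping; the per-record logic stays about this small.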
WEEKS 9-12 UNIT FIVE: Final Project

Students work full time on their Final Projects, which they have been slowly designing through the first eight weeks. They also learn more about cloud computing, system architectures, and feasibility evaluations.

- Use the design process to isolate an appropriate problem to solve
- Evaluate the computational feasibility of the problem
- Choose data sources that can be used to address the problem
- Design and implement an appropriate computational architecture
- Design and implement an appropriate set of analysis steps
- Design and develop a data visualization to clearly convey the results of the analysis to a layperson
- Assemble final portfolio and present project at Career Day

MORE ABOUT PROJECTS

Data science projects can be divided into useful dimensions. A dimension can be thought of as a facet along which a decision must be made to specify a project implementation. The bootcamp considers the dimensions of domain, design, data, algorithms, tools, and communication. Each Unit covers certain content from several domains, which are reinforced in that Unit's project.

The rigor with which we attack the topics covered in the bootcamp allows us to sleep soundly at night. We feel confident in saying that our graduates haven't simply learned about the tools that data scientists use. By the time they leave our classroom, our graduates are data scientists. They are ready to approach the problem space in their new careers and assemble the suite of tools and methods to answer insightful questions and communicate comprehensible results. They are competent, capable, and confident. And they are ready to work.
More informationBig Data Analytics and Optimization
Big Data Analytics and Optimization C e r t i f i c a t e P r o g r a m i n E n g i n e e r i n g E x c e l l e n c e e.edu.in http://www.insof LIST OF COURSES Essential Business Skills for a Data Scientist...
More informationAn interdisciplinary model for analytics education
An interdisciplinary model for analytics education Raffaella Settimi, PhD School of Computing, DePaul University Drew Conway s Data Science Venn Diagram http://drewconway.com/zia/2013/3/26/the-data-science-venn-diagram
More informationCOMP 598 Applied Machine Learning Lecture 21: Parallelization methods for large-scale machine learning! Big Data by the numbers
COMP 598 Applied Machine Learning Lecture 21: Parallelization methods for large-scale machine learning! Instructor: (jpineau@cs.mcgill.ca) TAs: Pierre-Luc Bacon (pbacon@cs.mcgill.ca) Ryan Lowe (ryan.lowe@mail.mcgill.ca)
More informationCar Insurance. Prvák, Tomi, Havri
Car Insurance Prvák, Tomi, Havri Sumo report - expectations Sumo report - reality Bc. Jan Tomášek Deeper look into data set Column approach Reminder What the hell is this competition about??? Attributes
More informationHow To Write A Data Analysis Project
Section 1. Data Analytics Lifecycle Overview The Data Analytics Lifecycle is designed specifically for Big Data problems and data science projects. The lifecycle has six phases, and project work can occur
More informationIntroduction to Data Science: CptS 483-06 Syllabus First Offering: Fall 2015
Course Information Introduction to Data Science: CptS 483-06 Syllabus First Offering: Fall 2015 Credit Hours: 3 Semester: Fall 2015 Meeting times and location: MWF, 12:10 13:00, Sloan 163 Course website:
More informationName: Srinivasan Govindaraj Title: Big Data Predictive Analytics
Name: Srinivasan Govindaraj Title: Big Data Predictive Analytics Please note the following IBM s statements regarding its plans, directions, and intent are subject to change or withdrawal without notice
More informationHunk & Elas=c MapReduce: Big Data Analy=cs on AWS
Copyright 2014 Splunk Inc. Hunk & Elas=c MapReduce: Big Data Analy=cs on AWS Dritan Bi=ncka BD Solu=ons Architecture Disclaimer During the course of this presenta=on, we may make forward looking statements
More informationThe evolution of database technology (II) Huibert Aalbers Senior Certified Executive IT Architect
The evolution of database technology (II) Huibert Aalbers Senior Certified Executive IT Architect IT Insight podcast This podcast belongs to the IT Insight series You can subscribe to the podcast through
More informationBIG DATA TRENDS AND TECHNOLOGIES
BIG DATA TRENDS AND TECHNOLOGIES THE WORLD OF DATA IS CHANGING Cloud WHAT IS BIG DATA? Big data are datasets that grow so large that they become awkward to work with using onhand database management tools.
More informationR YOU READY FOR PYTHON? Sunday 19th April, 2015
R YOU READY FOR PYTHON? Sunday 19th April, 2015 THIS IS NOT A PYTHON VS R TALK credits - https://meetmrholland.wordpress.com/2013/02/03/creative-5-tips-to-make-all-your-meetings-exactly-the-same/ WHO ARE
More informationBig Data Integration: A Buyer's Guide
SEPTEMBER 2013 Buyer s Guide to Big Data Integration Sponsored by Contents Introduction 1 Challenges of Big Data Integration: New and Old 1 What You Need for Big Data Integration 3 Preferred Technology
More informationHow To Make Sense Of Data With Altilia
HOW TO MAKE SENSE OF BIG DATA TO BETTER DRIVE BUSINESS PROCESSES, IMPROVE DECISION-MAKING, AND SUCCESSFULLY COMPETE IN TODAY S MARKETS. ALTILIA turns Big Data into Smart Data and enables businesses to
More informationLearning Web App Development
Learning Web App Development Semmy Purewal Beijing Cambridge Farnham Kbln Sebastopol Tokyo O'REILLY Table of Contents Preface xi 1. The Workflow 1 Text Editors 1 Installing Sublime Text 2 Sublime Text
More informationCourse 803401 DSS. Business Intelligence: Data Warehousing, Data Acquisition, Data Mining, Business Analytics, and Visualization
Oman College of Management and Technology Course 803401 DSS Business Intelligence: Data Warehousing, Data Acquisition, Data Mining, Business Analytics, and Visualization CS/MIS Department Information Sharing
More informationAdvanced Big Data Analytics with R and Hadoop
REVOLUTION ANALYTICS WHITE PAPER Advanced Big Data Analytics with R and Hadoop 'Big Data' Analytics as a Competitive Advantage Big Analytics delivers competitive advantage in two ways compared to the traditional
More informationPredictive Analytics Powered by SAP HANA. Cary Bourgeois Principal Solution Advisor Platform and Analytics
Predictive Analytics Powered by SAP HANA Cary Bourgeois Principal Solution Advisor Platform and Analytics Agenda Introduction to Predictive Analytics Key capabilities of SAP HANA for in-memory predictive
More informationTrainer Preparation Guide for Course 20488B: Developing Microsoft SharePoint Server 2013 Core Solutions Design of the Course
Trainer Preparation Guide for Course 20488B: Developing Microsoft SharePoint Server 2013 Core Solutions 1 Trainer Preparation Guide for Course 20488B: Developing Microsoft SharePoint Server 2013 Core Solutions
More informationBachelor of Games and Virtual Worlds (Programming) Subject and Course Summaries
First Semester Development 1A On completion of this subject students will be able to apply basic programming and problem solving skills in a 3 rd generation object-oriented programming language (such as
More informationHow To Handle Big Data With A Data Scientist
III Big Data Technologies Today, new technologies make it possible to realize value from Big Data. Big data technologies can replace highly customized, expensive legacy systems with a standard solution
More informationAn Introduction to Data Mining
An Introduction to Intel Beijing wei.heng@intel.com January 17, 2014 Outline 1 DW Overview What is Notable Application of Conference, Software and Applications Major Process in 2 Major Tasks in Detail
More informationBig Data Analytics with Spark and Oscar BAO. Tamas Jambor, Lead Data Scientist at Massive Analytic
Big Data Analytics with Spark and Oscar BAO Tamas Jambor, Lead Data Scientist at Massive Analytic About me Building a scalable Machine Learning platform at MA Worked in Big Data and Data Science in the
More informationAdvanced In-Database Analytics
Advanced In-Database Analytics Tallinn, Sept. 25th, 2012 Mikko-Pekka Bertling, BDM Greenplum EMEA 1 That sounds complicated? 2 Who can tell me how best to solve this 3 What are the main mathematical functions??
More informationFrom Raw Data to. Actionable Insights with. MATLAB Analytics. Learn more. Develop predictive models. 1Access and explore data
100 001 010 111 From Raw Data to 10011100 Actionable Insights with 00100111 MATLAB Analytics 01011100 11100001 1 Access and Explore Data For scientists the problem is not a lack of available but a deluge.
More informationClient Overview. Engagement Situation. Key Requirements
Client Overview Our client is one of the leading providers of business intelligence systems for customers especially in BFSI space that needs intensive data analysis of huge amounts of data for their decision
More informationData Integration Checklist
The need for data integration tools exists in every company, small to large. Whether it is extracting data that exists in spreadsheets, packaged applications, databases, sensor networks or social media
More informationCSci 538 Articial Intelligence (Machine Learning and Data Analysis)
CSci 538 Articial Intelligence (Machine Learning and Data Analysis) Course Syllabus Fall 2015 Instructor Derek Harter, Ph.D., Associate Professor Department of Computer Science Texas A&M University - Commerce
More informationStatistics for BIG data
Statistics for BIG data Statistics for Big Data: Are Statisticians Ready? Dennis Lin Department of Statistics The Pennsylvania State University John Jordan and Dennis K.J. Lin (ICSA-Bulletine 2014) Before
More informationMyCloudLab: An Interactive Web-based Management System for Cloud Computing Administration
MyCloudLab: An Interactive Web-based Management System for Cloud Computing Administration Hoi-Wan Chan 1, Min Xu 2, Chung-Pan Tang 1, Patrick P. C. Lee 1 & Tsz-Yeung Wong 1, 1 Department of Computer Science
More informationUsability of Visualization Libraries for Web Browsers for Use in Scientific Analysis
Usability of Visualization Libraries for Web Browsers for Use in Scientific Analysis Luke Barnard Technical Student CERN, Route de Meyrin 385 1217 Meyrin, Switzerland Matej Mertik Scientific Associate
More informationYou should have a working knowledge of the Microsoft Windows platform. A basic knowledge of programming is helpful but not required.
What is this course about? This course is an overview of Big Data tools and technologies. It establishes a strong working knowledge of the concepts, techniques, and products associated with Big Data. Attendees
More informationYou ll need to have: It d be great if you have:
DevOps We re looking for a Development Operations Developer with a passion for experimentation. If you re interested in helping us build the future of mobile healthcare, this job is for you. A strong background
More informationSpark and the Big Data Library
Spark and the Big Data Library Reza Zadeh Thanks to Matei Zaharia Problem Data growing faster than processing speeds Only solution is to parallelize on large clusters» Wide use in both enterprises and
More informationAn Overview of Predictive Analytics for Practitioners. Dean Abbott, Abbott Analytics
An Overview of Predictive Analytics for Practitioners Dean Abbott, Abbott Analytics Thank You Sponsors Empower users with new insights through familiar tools while balancing the need for IT to monitor
More informationPredictive Modeling Techniques in Insurance
Predictive Modeling Techniques in Insurance Tuesday May 5, 2015 JF. Breton Application Engineer 2014 The MathWorks, Inc. 1 Opening Presenter: JF. Breton: 13 years of experience in predictive analytics
More information