Date : July 28, 2015

Awesome Team
Who are we?
Menish Gupta: Founder & CEO; 9+ years @ Amex; 5 years @ Startups in NYC; B.S. / M.S. Comp Sci. NJIT
Lukas Osborne: Data Science; 7 Publications; 5 years @ CISMM Labs; PhD. Physics UNC
Jose Escalano: Education / Training
Lei Xia: Engineering
28 Publications; 11 years industry experience; 7 years Teaching; PhD. Electrical, Univ. of Valencia; 4 years industry experience; MS Comp Sci, Stevens

What we do
Founded Fall 2013, with a spark
Data Science Trainings
Develop cutting edge algorithms

We train the next generation of data scientists
Students: 7-1 student ratio, hands on practical data science training
Practical hands on in-person classroom trainings
Customize use cases based on customer data for training
93; 90+
Corporate Trainings
3
Training Materials: Develop hands on practical cook books, and data sets
Research: Keep tab on latest research in academia & open source

Our Training Offers
Skills you need
Core, Hadoop, Algorithms
Engineering, Big Data, The Brains
Introduction to Data Science
Data Munging & Fusion
Text Mining: Naïve Bayes
Recommendation Engines
Principal Component Analysis
Classification: Decision Trees, Random Forest, Gradient Boosting Machines
Generalized Linear Models
Clustering: KNN, K-Means
Frequent Pattern Mining
Stable Marriage
Graph Analysis

Trainings Overview
Two Tracks for Next Generation of Data Scientist
Track 1: Big Data
Track 2: Machine Learning, Big Data

Track 1: Big Data

Big Data Training, Track 1
4 Week Big Data Training
Week 1; Week 2; For Data Science; Week 3; Week 4; Self Study
Certifications: Study & ace one of the industry standard certification

Big Data Training, Week 1: Master the basics
Introductions (Monday, 6:30 PM): Motivation for Big Data; Unix for Data Science; Pushing and Pulling data from remote servers; Columnar Compressions; Extended Data Dictionary
Pulling and Processing Data (Wednesday, 6:30 PM): SQL overview; SQL design patterns for data analytics
  o Pivot Tables
  o Aggregation
  o Network Analysis
Data Set Used: Google N-Gram, 100 Million Records
Unix Assignments: Process data in parallel; Working with remote Machines
SQL Assignments: Five key design patterns; Joins, Aggregation, Temp Tables, Indexes, Functions
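The pivot-table and aggregation patterns listed for Week 1 can be illustrated with a small sketch. It is not taken from the course materials: the table name ngrams, its columns (ngram, year, match_count) and the two hard-coded years are assumptions made for the example, and SQLite stands in for whatever database the training actually uses.

```python
import sqlite3


def pivot_counts_by_year(rows):
    """Pivot (ngram, year, match_count) rows into one row per ngram.

    The schema and the years 2000/2001 are hypothetical, chosen only
    to show the SUM(CASE WHEN ...) pivot pattern.
    """
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE ngrams (ngram TEXT, year INTEGER, match_count INTEGER)"
        )
        conn.executemany("INSERT INTO ngrams VALUES (?, ?, ?)", rows)
        query = """
            SELECT ngram,
                   SUM(CASE WHEN year = 2000 THEN match_count ELSE 0 END) AS y2000,
                   SUM(CASE WHEN year = 2001 THEN match_count ELSE 0 END) AS y2001
            FROM ngrams
            GROUP BY ngram
            ORDER BY ngram
        """
        return conn.execute(query).fetchall()
    finally:
        conn.close()


sample_rows = [
    ("big", 2000, 3),
    ("big", 2001, 5),
    ("data", 2000, 7),
    ("big", 2000, 1),
]
pivoted = pivot_counts_by_year(sample_rows)
```

Each year becomes its own column, and GROUP BY collapses the repeated rows for an ngram into a single aggregated row.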

Big Data Training, Week 2: Spin up the cluster
Cluster Setup (Monday, 6:30 PM): Introduction to Big Data Ecosystem; Acquire 5 machines in AWS or DO; Prepare machines for Hadoop; Setup 5 - 10 Node Cluster; Say Hello to Hadoop
Introduction to Hadoop (Wednesday, 6:30 PM): Motivation for Hadoop; HDFS; ETL in Hadoop with large dataset; SQOOP; OOZIE; Hadoop Streaming
Data Set Used: Google N-Gram, 100 Million Records
Cluster Setup Assignment: Setup Cluster in cloud; Develop automation scripts
ETL In Hadoop: N Gram data in Hadoop; Develop ETL jobs in cluster

Big Data Training, Week 3: Wrangle millions of records in Hadoop
Hive (Monday, 6:30 PM): Motivation for Hive; Hive architecture; Aggregation and data selection; Hive and Python Integration
Advanced Hive (Wednesday, 6:30 PM): Hive Jobs and Variables; Custom Functions; Custom data types; Indexing and Performance issues
Data Set Used: Google N-Gram, 100 Million Records
Hive Assignment: Data aggregation
Hive Assignment 2: N Gram data in Hadoop; Develop ETL jobs in cluster

Big Data Training, Week 4: Hadoop under the hood with Map Reduce
Hadoop Map Reduce (Monday, 6:30 PM): Motivation for Map Reduce; Map Reduce in action; Map Reduce API; Splitter and Combiners; Custom data format
Advanced Map Reduce (Wednesday, 6:30 PM): Distributed Joins; Data Compression in Map Reduce; Optimizations; Debugging and Tracing
Data Set Used: Google N-Gram, 100 Million Records
M/R Assignment: Data aggregation
M/R Assignment 2: N Gram data in Hadoop; Develop ETL jobs in cluster
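As a rough illustration of the data aggregation assignment, the map, shuffle and reduce steps can be imitated in plain Python on a single machine. This is only a sketch of the idea, not the Hadoop Map Reduce API: the tab-separated record layout (ngram, year, count) is a simplified assumption, and the function names are made up for the example.

```python
from itertools import groupby
from operator import itemgetter


def map_phase(line):
    """Emit one (ngram, count) pair from an assumed 'ngram<TAB>year<TAB>count' record."""
    ngram, _year, count = line.split("\t")
    yield ngram, int(count)


def reduce_phase(key, values):
    """Sum all counts that were emitted for one key."""
    return key, sum(values)


def run_job(lines):
    # Map: turn every input record into key/value pairs.
    pairs = [pair for line in lines for pair in map_phase(line)]
    # Shuffle: sort by key so that equal keys end up next to each other.
    pairs.sort(key=itemgetter(0))
    # Reduce: aggregate the values of each key group.
    return dict(
        reduce_phase(key, (value for _key, value in group))
        for key, group in groupby(pairs, key=itemgetter(0))
    )


totals = run_job(["data\t2001\t3", "big\t2001\t2", "data\t2002\t4"])
```

On a real cluster the framework performs the shuffle between machines; the sort plus groupby above only mimics that step locally.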

Pricing Model: Priced to Win
Track 1: Big Data
Schedule: 4 Weeks; Mon 6:30 PM - 9:30 PM; Wed 6:30 PM - 9:30 PM
Price: $1500

Track 2: Machine Learning, Big Data

Machine Learning Training, Track 2
6 Week Data Science Training
Week 1: For Data Science
Week 2: Introduction to Machine Learning; Data Fusion and fuzzy matching
Week 3: Generalized Linear Models (Linear Regression, Regularization, Logistic Regression); Recommendation Engine (Collaborative Filtering); Text Mining (Naive Bayes)
Week 4: Clustering (Knn, K-Means); Frequent Pattern Mining (Apriori Algorithm); PCA
Week 5: Ensemble Techniques; Decision Trees; Random Forests; Gradient Boosting Machines
Week 6: Stable Marriage; Graph Analysis
3 Weeks of optional Independent Projects

Machine Learning, Week 1: Master the basics
Introductions (Tuesday, 6:30 PM): Motivation for Big Data; Unix for Data Science; Pushing and Pulling data from remote servers; Columnar Compressions; Extended Data Dictionary
Python for Data Science (Thursday, 6:30 PM): Thinking in Python; Python design patterns for data analytics; Pandas; Data Frames; Aggregations; Python with Parallel powers
Data Set Used: Google N-Gram, 100 Million Records
1. Unix Assignments: Process data in parallel; Working with remote Machines
2. Python Assignments: Data Processing in Python; Python scripts and automation

Machine Learning, Week 2: Gearing up
Introduction to Machine Learning (Tuesday, 6:30 PM): Motivation for Machine Learning (ML); Decipher mathematical notations; Back to basics with statistical concepts; Geometric, Probabilistic and Logical Models; Standardized ML Model lifecycle; Accuracy and Prediction Error; Precision and Recall; ROC Curve & AUC
Data Fusion and Fuzzy Matching (Thursday, 6:30 PM): Merging data sets from multiple sources; Probabilistic and Deterministic Matching; String Fuzzy Matching (Levenshtein Distance, Jaro Winkler Distance); Fuzzy Address Matching; Swap-in / Swap-out analysis; Industry Use Cases
Data Set Used: Yelp and Y-Pages Data sets on businesses
Reading Materials: Classical Papers in Machine Learning
3. Swap-in / Swap-out Analysis: Firmographic data from Yelp and YPages
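Of the string fuzzy matching measures named for Week 2, the Levenshtein distance is the simplest to write down. The following is a minimal sketch of the standard dynamic-programming formulation; the function name and the sample business names are illustrative and not taken from the course.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions or
    substitutions needed to turn string a into string b."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            substitution_cost = 0 if char_a == char_b else 1
            current.append(
                min(
                    previous[j] + 1,  # deletion
                    current[j - 1] + 1,  # insertion
                    previous[j - 1] + substitution_cost,  # substitution or match
                )
            )
        previous = current
    return previous[-1]


# Two spellings of the same (made-up) business name differ by one edit.
distance = levenshtein("Joe's Pizza", "Joes Pizza")
```

A small distance relative to the string length suggests that two records from different sources may describe the same business, which is the kind of decision fuzzy matching has to make.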

Machine Learning, Week 3: Classical Topics
Generalized Linear Models (Tuesday, 6:30 PM): Linear Regression; Regularization (Ridge, Lasso); Logistic Regression; Feature Selections; Industry Use Case
Recommendation Engine / Text Mining (Thursday, 6:30 PM): Motivation for recommendation Engines; Sparse Matrices operations; Manhattan Distance, Euclidean Distance, Cosine Distance; Similarity Matrices and results; Motivation for Text Mining; Naïve Bayes; Applications and Results
Data Set Used: Linear Models: TBD; Recommendation / Naïve Bayes: Project Gutenberg / Wikipedia
4. Logistic Regression Assignment: Data Munging; Develop regression models; Validate the model
5. Collaborative Filter: Classify books in Gutenberg project; Classify articles in Wikipedia
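The three distance measures listed for the recommendation engine session can be written in a few lines each. This is a minimal sketch on dense Python lists; the function names are illustrative, and a real recommendation engine would typically work on sparse matrices instead.

```python
import math


def manhattan(u, v):
    """Sum of absolute coordinate differences."""
    return sum(abs(a - b) for a, b in zip(u, v))


def euclidean(u, v):
    """Straight-line distance between two points."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def cosine_distance(u, v):
    """One minus the cosine of the angle between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("cosine distance is undefined for a zero vector")
    return 1.0 - dot / (norm_u * norm_v)


# Hypothetical rating vectors of two users over the same four items.
user_a = [5, 3, 0, 1]
user_b = [4, 0, 0, 1]
distances = (
    manhattan(user_a, user_b),
    euclidean(user_a, user_b),
    cosine_distance(user_a, user_b),
)
```

Cosine distance ignores vector length, so two users with the same taste but different rating scales still come out close, whereas Manhattan and Euclidean distance penalize the difference in scale.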

Machine Learning, Week 4: Classification and Mining
Clustering: Knn & K-means (Tuesday, 6:30 PM): Motivation for Unsupervised learning methods; Intuition behind Knn and Applications; Intuition behind K-Means and Applications; From Kernels to distances; Multi class classification; Hierarchical Clustering
Frequent Pattern Mining / PCA (Thursday, 6:30 PM): Motivation for pattern mining; Intuition for Apriori Algorithm; Cluster analysis in patterns; Industry Use Case; Principal Component Analysis; Curse of dimensionality
Data Set Used: Project Gutenberg / Wikipedia
6. Clustering Assignment: Cluster similar Wikipedia pages; Classify a new page
7. Pattern Mining: Identify common language expressions in the corpus
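The intuition behind K-Means can be sketched as two alternating steps: assign every point to its nearest centroid, then move each centroid to the mean of its points. The code below is a simplified illustration (Lloyd's algorithm with caller-supplied starting centroids), not the course's implementation; the function names and the toy points are assumptions.

```python
def squared_distance(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))


def mean_point(points):
    """Coordinate-wise mean of a non-empty list of points."""
    return tuple(sum(coords) / len(points) for coords in zip(*points))


def kmeans(points, centroids, max_iterations=100):
    """Return (centroids, clusters) after running Lloyd's algorithm."""
    centroids = [tuple(c) for c in centroids]
    clusters = [[] for _ in centroids]
    for _ in range(max_iterations):
        # Assignment step: each point joins the cluster of its nearest centroid.
        clusters = [[] for _ in centroids]
        for point in points:
            nearest = min(
                range(len(centroids)),
                key=lambda i: squared_distance(point, centroids[i]),
            )
            clusters[nearest].append(point)
        # Update step: move each centroid to the mean of its cluster;
        # an empty cluster keeps its previous centroid.
        updated = [
            mean_point(cluster) if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
        if updated == centroids:
            break  # assignments are stable, so the algorithm has converged
        centroids = updated
    return centroids, clusters


toy_points = [(0, 0), (0, 2), (10, 0), (10, 2)]
final_centroids, final_clusters = kmeans(toy_points, [(0, 0), (10, 0)])
```

The result depends on the starting centroids, which is why practical implementations usually try several random initializations.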

Machine Learning, Week 5: See the trees and the forest
Decision Trees and Random Forest (Tuesday, 6:30 PM): Motivation for Decision Trees; ID3, C4.5 and CART; Entropy, Information Gain, Pruning and Purging; Trees in Action; Motivation for Random Forest; Vote by democracy / Variable Importance; Random Forest in Action
Gradient Boosting Machines (GBM) (Thursday, 6:30 PM): Motivation for GBM; Boosting vs. Bagging; Residual error and tree generations; Metrics Search for best GBM Trees; GBM in action; Industry Use cases
Data Set Used: MNIST Hand Digit Data Set
8. Tree and Random Forest Assignments: Develop Classification Trees; Use MNIST Data Set
9. GBM Model Development: GBM Model in MNIST Data set; Compare Random Forest / GBM
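Entropy and information gain, listed among the decision tree topics, are the quantities ID3-style trees use to choose a split. A minimal sketch, assuming non-empty label lists and a split that is already given as groups of labels (the function names are illustrative):

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy, in bits, of a non-empty list of class labels."""
    total = len(labels)
    return -sum(
        (count / total) * math.log2(count / total)
        for count in Counter(labels).values()
    )


def information_gain(labels, groups):
    """Entropy of the parent node minus the size-weighted entropy of
    the child groups produced by a candidate split."""
    total = len(labels)
    weighted_child_entropy = sum(
        (len(group) / total) * entropy(group) for group in groups
    )
    return entropy(labels) - weighted_child_entropy


parent = ["spam", "spam", "ham", "ham"]
perfect_split = [["spam", "spam"], ["ham", "ham"]]
useless_split = [["spam", "ham"], ["spam", "ham"]]
gain_perfect = information_gain(parent, perfect_split)
gain_useless = information_gain(parent, useless_split)
```

A split that separates the classes completely removes all uncertainty, while a split that leaves every child as mixed as the parent gains nothing; a tree learner picks the candidate with the highest gain.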

Machine Learning, Week 6: Hadoop under the hood with Map Reduce
Stable Marriage (Tuesday, 6:30 PM): Motivations for matching algorithms with preferences; Bi-partite graphs; Definition of Stable Matching; Preferences with both parties; Incomplete List and Ties; Industry Use cases
Graph Analysis (Thursday, 6:30 PM): Motivation for Network Analysis; Standard metrics in Graph analysis (Centrality, Nearest Neighbor..); Directed vs. Un-Directed Graphs; Network visualization in Gephi; Graphs in the real world; Cluster Analysis in Graphs; Closing Remarks
Data Set Used: Residents / Hospital Matching
10. Stable Marriage: Create stable marriages between Hospitals and Residents
11. Graph Analysis: Develop your LinkedIn Social Graph; Develop ETL jobs in cluster
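A stable matching between residents and hospitals can be computed with the Gale-Shapley deferred-acceptance procedure. The sketch below is a simplified one-to-one version that assumes equal numbers on both sides, complete preference lists and no ties, so it does not cover the incomplete lists and ties mentioned above; the names and preferences are invented for the example.

```python
def stable_matching(proposer_prefs, receiver_prefs):
    """Gale-Shapley deferred acceptance, one-to-one.

    Both arguments map a name to a complete, tie-free preference list
    over the other side. Returns a dict proposer -> receiver.
    """
    # rank[r][p] is the position of proposer p in receiver r's list.
    rank = {
        receiver: {proposer: i for i, proposer in enumerate(prefs)}
        for receiver, prefs in receiver_prefs.items()
    }
    next_choice = {proposer: 0 for proposer in proposer_prefs}
    engaged = {}  # receiver -> proposer currently held
    free = list(proposer_prefs)
    while free:
        proposer = free.pop()
        receiver = proposer_prefs[proposer][next_choice[proposer]]
        next_choice[proposer] += 1
        current = engaged.get(receiver)
        if current is None:
            engaged[receiver] = proposer
        elif rank[receiver][proposer] < rank[receiver][current]:
            # The receiver prefers the new proposer and releases the old one.
            engaged[receiver] = proposer
            free.append(current)
        else:
            free.append(proposer)  # rejected; will try the next choice
    return {proposer: receiver for receiver, proposer in engaged.items()}


# Hypothetical residents proposing to hypothetical hospitals.
residents = {"ann": ["city", "mercy"], "bob": ["city", "mercy"]}
hospitals = {"city": ["bob", "ann"], "mercy": ["ann", "bob"]}
matching = stable_matching(residents, hospitals)
```

The matching is stable because no resident and hospital both prefer each other over their assigned partners; with residents proposing, the outcome is the best stable matching from the residents' side.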

Pricing Model: Priced to win
Track 2: Machine Learning, Big Data
Schedule: 6 Weeks; Tue 6:30 PM - 9:30 PM; Thur 6:30 PM - 9:30 PM
Price: $6,000

Contact Us
Made in NYC
25 Broadway, Suite 5055, New York, NY
Enroll@bitbootcamp.com
917-819-0106
201-314-5838
www.bitbootcamp.com