Date : July 28, 2015
Awesome Team: Who are we?

Menish Gupta, Founder & CEO
- 9+ years @ Amex
- 5 years @ Startups in NYC
- B.S. / M.S. Comp Sci. NJIT

Lukas Osborne, Data Science
- 7 Publications
- 5 years @ CISMM Labs
- PhD. Physics UNC

Jose Escalano and Lei Xia, Education / Training and Engineering
- 28 Publications
- 11 years industry experience
- 7 years Teaching
- PhD. Electrical, Univ. of Valencia
- 4 years industry experience
- MS Comp Sci, Stevens

What we do: Founded Fall 2013, with a spark
- Data Science Trainings
- Develop cutting edge algorithms

We train the next generation of data scientists

Students
- 7-1 student ratio, hands on practical data science training
- Practical hands on in-person classroom trainings
- Customize use cases based on customer data for training
93   90+
Corporate Trainings
3
Training Materials
- Develop hands on practical cook books, and data sets
Research
- Keep tab on latest research in academia & open source

Our Training Offers: Skills you need

Core (Engineering)
Hadoop (Big Data)
Algorithms (The Brains)
- Introduction to Data Science
- Data Munging & Fusion
- Text Mining
  o Naïve Bayes
- Recommendation Engines
- Principal Component Analysis
- Classification
  o Decision Trees
  o Random Forest
  o Gradient Boosting Machines
- Generalized Linear Models
- Clustering
  o KNN
  o K-Means
- Frequent Pattern Mining
- Stable Marriage
- Graph Analysis

Trainings Overview: Two Tracks for Next Generation of Data Scientist
- Track 1: Big Data
- Track 2: Machine Learning, Big Data

Big Data: Track 1

Big Data Training For Data Science, Track 1: 4 Week Big Data Training
- Week 1
- Week 2
- Week 3
- Week 4
- Self Study
Certifications: Study & ace one of the industry standard certification

Big Data Training, Week 1: Master the basics

Monday - 6:30 PM: Introductions
- Motivation for Big Data
- Unix for Data Science
- Pushing and Pulling data from remote servers
- Columnar Compressions
- Extended Data Dictionary

Wednesday - 6:30 PM: Pulling and Processing Data
- SQL overview
- SQL design patterns for data analytics
  o Pivot Tables
  o Aggregation
  o Network Analysis

Data Set Used: Google N-Gram, 100 Million Records

Unix Assignments
- Process data in parallel
- Working with remote Machines
SQL Assignments
- Five key design patterns
- Joins, Aggregation, Temp Tables, Indexes, Functions

Big Data Training, Week 2: Spin up the cluster

Monday - 6:30 PM: Cluster Setup
- Introduction to Big Data Ecosystem
- Acquire 5 machines in AWS or DO
- Prepare machines for Hadoop
- Setup 5 - 10 Node Cluster
- Say Hello to Hadoop

Wednesday - 6:30 PM: Introduction Hadoop
- Motivation for Hadoop
- HDFS
- ETL in Hadoop with large dataset
- SQOOP
- OOZIE
- Hadoop Streaming

Data Set Used: Google N-Gram, 100 Million Records

Cluster Setup Assignment
- Setup Cluster in cloud
- Develop automation scripts
ETL In Hadoop
- N Gram data in Hadoop
- Develop ETL jobs in cluster

Big Data Training, Week 3: Wrangle millions of records in Hadoop

Monday - 6:30 PM: Hive
- Motivation for hive
- Hive architecture
- Aggregation and data selection
- Hive and Python Integration

Wednesday - 6:30 PM: Advanced Hive
- Hive Jobs and Variables
- Custom Functions
- Custom data types
- Indexing and Performance issues

Data Set Used: Google N-Gram, 100 Million Records

Hive Assignment
- Data aggregation
Hive Assignment 2
- N Gram data in Hadoop
- Develop ETL jobs in cluster

Big Data Training, Week 4: Hadoop under the hood with Map Reduce

Monday - 6:30 PM: Hadoop Map Reduce
- Motivation for Map Reduce
- Map Reduce in action
- Map Reduce API
- Splitter and Combiners
- Custom data format

Wednesday - 6:30 PM: Advanced Map Reduce
- Distributed Joins
- Data Compression in Map Reduce
- Optimizations
- Debugging and Tracing

Data Set Used: Google N-Gram, 100 Million Records

M/R Assignment
- Data aggregation
M/R Assignment 2
- N Gram data in Hadoop
- Develop ETL jobs in cluster

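To make the map, combiner, and reduce stages named in Week 4 concrete, here is a minimal sketch that simulates them in plain Python, without Hadoop. The record layout (`ngram<TAB>year<TAB>count`) and the sample rows are assumptions made for illustration, not the actual Google N-Gram file format, and every function name is hypothetical.

```python
from collections import defaultdict
from itertools import chain


def map_phase(line):
    """Map: turn one assumed 'ngram<TAB>year<TAB>count' record into (ngram, count)."""
    ngram, _year, count = line.split("\t")
    yield ngram, int(count)


def combine(pairs):
    """Combiner: pre-aggregate the pairs of one input split before the shuffle."""
    local = defaultdict(int)
    for key, value in pairs:
        local[key] += value
    return list(local.items())


def shuffle(pairs):
    """Shuffle: group all values by key, as the framework would do between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: sum the partial counts for one key."""
    return key, sum(values)


def run_job(splits):
    """Run map and combine per split, then shuffle and reduce across all splits."""
    combined = []
    for split in splits:
        mapped = chain.from_iterable(map_phase(line) for line in split)
        combined.extend(combine(mapped))
    return dict(reduce_phase(k, v) for k, v in shuffle(combined).items())


# Two hypothetical input splits (made-up rows).
splits = [
    ["big data\t2007\t3", "big data\t2008\t5", "data science\t2008\t2"],
    ["big data\t2008\t4", "data science\t2009\t7"],
]
totals = run_job(splits)
print(totals)
```

The combiner only shrinks the data that crosses the shuffle; the totals would be the same without it, which is why summation (associative and commutative) is a safe operation to combine.
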
Pricing Model: Priced to Win

Track 1: Big Data
- Schedule: 4 Weeks, Mon 6:30 PM - 9:30 PM, Wed 6:30 PM - 9:30 PM
- Price: $1500

Machine Learning, Big Data: Track 2

Machine Learning Training For Data Science, Track 2: 6 Week Data Science Training

Week 1
Week 2
- Introduction to Machine Learning
- Data Fusion and fuzzy matching
Week 3
- Generalized Linear Models
  o Linear Regression
  o Regularization
  o Logistic Regression
- Recommendation Engine
  o Collaborative Filtering
- Text Mining
  o Naive Bayes
Week 4
- Clustering
  o Knn
  o K-Means
- Frequent Pattern Mining
  o Apriori Algorithm
- PCA
Week 5
- Ensemble Techniques
- Decision Trees
- Random Forests
- Gradient Boosting Machines
Week 6
- Stable Marriage
- Graph Analysis
3 Weeks of optional Independent Projects

Machine Learning, Week 1: Master the basics

Tuesday - 6:30 PM: Introductions
- Motivation for Big Data
- Unix for Data Science
- Pushing and Pulling data from remote servers
- Columnar Compressions
- Extended Data Dictionary

Thursday - 6:30 PM: Python for Data Science
- Thinking in Python
- Python design patterns for data analytics
- Pandas
- Data Frames
- Aggregations
- Python with Parallel powers

Data Set Used: Google N-Gram, 100 Million Records

1. Unix Assignments
- Process data in parallel
- Working with remote Machines
2. Python Assignments
- Data Processing in Python
- Python scripts and automation

Machine Learning, Week 2: Gearing up

Tuesday - 6:30 PM: Introduction to Machine Learning
- Motivation for Machine Learning (ML)
- Decipher mathematical notations
- Back to basics with statistical concepts
- Geometric, Probabilistic and Logical Models
- Standardized ML Model lifecycle
- Accuracy and Prediction Error
- Precision and Recall
- ROC Curve & AUC

Thursday - 6:30 PM: Data Fusion and Fuzzy Matching
- Merging data sets from multiple sources
- Probabilistic and Deterministic Matching
- String Fuzzy Matching
  - Levenshtein Distance, Jaro-Winkler Distance
- Fuzzy Address Matching
- Swap-in / Swap-out analysis
- Industry Use Cases

Data Set Used: Yelp and Y-Pages Data sets on businesses

Reading Materials
- Classical Papers in Machine Learning
3. Swap-in / Swap-out Analysis
- Firmographic data from Yelp and YPages

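The string fuzzy matching topic in Week 2 names Levenshtein distance. As a minimal sketch of the idea, the function below computes the edit distance with the standard dynamic-programming recurrence. The `similarity` helper, its length normalization, and the business names are assumptions made for illustration; they are not taken from the course material.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions, or
    substitutions needed to turn string a into string b."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))  # distances from "" to prefixes of b
    for i, ch_a in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to the empty prefix of b
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # delete ch_a
                current[j - 1] + 1,      # insert ch_b
                previous[j - 1] + cost,  # substitute, or keep on a match
            ))
        previous = current
    return previous[-1]


def similarity(a: str, b: str) -> float:
    """Hypothetical helper: 1.0 for identical strings after lowercasing and
    trimming, approaching 0.0 as the strings diverge."""
    a, b = a.lower().strip(), b.lower().strip()
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - levenshtein(a, b) / longest


# Made-up listings for the same business from two sources.
print(levenshtein("kitten", "sitting"))
print(round(similarity("Joe's Pizza", "Joes Pizza"), 3))
```

In a data fusion setting, a score like this would typically be compared against a threshold to decide whether two records are candidates for the same business; choosing that threshold is a judgment call that depends on the data.
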
Machine Learning, Week 3: Classical Topics

Tuesday - 6:30 PM: Generalized Linear Models
- Linear Regression
- Regularization (Ridge, Lasso)
- Logistic Regression
- Feature Selections
- Industry Use Case

Thursday - 6:30 PM: Recommendation Engine / Text Mining
- Motivation for recommendation Engines
- Sparse Matrices operations
- Manhattan Distance, Euclidean Distance, Cosine Distance
- Similarity Matrices and results
- Motivation for Text Mining
- Naïve Bayes
- Applications and Results

Data Set Used
- Linear Models: TBD
- Recommendation / Naïve Bayes: Project Gutenberg / Wikipedia

4. Logistic Regression Assignment
- Data Munging
- Develop regression models
- Validate the model
5. Collaborative Filter
- Classify books in Gutenberg project
- Classify articles in Wikipedia

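The recommendation engine session in Week 3 lists Manhattan, Euclidean, and cosine distance. A minimal sketch of the three measures on dense rating vectors follows; the function names and the sample ratings are hypothetical, and a real recommender would more likely work on sparse matrices as the session outline suggests.

```python
import math


def manhattan(u, v):
    """Sum of absolute coordinate differences (L1 distance)."""
    return sum(abs(a - b) for a, b in zip(u, v))


def euclidean(u, v):
    """Straight-line distance (L2 distance)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def cosine_distance(u, v):
    """One minus the cosine of the angle between u and v.
    It ignores vector length, so it compares rating patterns, not scale."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("cosine distance is undefined for a zero vector")
    return 1.0 - dot / (norm_u * norm_v)


# Made-up ratings by two users for the same four items (0 = not rated).
alice = [5, 3, 0, 1]
bob = [4, 0, 0, 1]
print(manhattan(alice, bob))
print(euclidean(alice, bob))
print(cosine_distance(alice, bob))
```

Computing one of these measures for every pair of users (or items) yields the kind of similarity matrix mentioned in the session topics.
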
Machine Learning, Week 4: Classification and Mining

Tuesday - 6:30 PM: Clustering: Knn & K-means
- Motivation for Un-supervised learning methods
- Intuition behind Knn and Applications
- Intuition behind K-Means and Applications
- From Kernels to distances
- Multi class classification
- Hierarchical Clustering

Thursday - 6:30 PM: Frequent Pattern Mining / PCA
- Motivation for pattern mining
- Intuition for Apriori Algorithm
- Cluster analysis in patterns
- Industry Use Case
- Principal Component Analysis
- Curse of dimensionality

Data Set Used: Project Gutenberg / Wikipedia

6. Clustering Assignment
- Cluster similar Wikipedia pages
- Classify a new page
7. Pattern Mining
- Identify common language expressions in the corpus

Machine Learning, Week 5: See the trees and the forest

Tuesday - 6:30 PM: Decision Trees and Random Forest
- Motivation for Decision Trees
- ID3, C4.5 and CART
- Entropy, Information Gain, Pruning and Purging
- Trees in Actions
- Motivation for Random Forest
- Vote by democracy / Variable Importance
- Random Forest in Action

Thursday - 6:30 PM: Gradient Boosting Machines (GBM)
- Motivation for GBM
- Boosting vs. Bagging
- Residual error and tree generations
- Metrics Search for best GBM Trees
- GBM in action
- Industry Use cases

Data Set Used: MNIST Hand Digit Data Set

8. Tree and Random Forest Assignments
- Develop Classification Trees
- Use MNIST Data Set
9. GBM Model Development
- GBM Model in MNIST Data set
- Compare Random Forest / GBM

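Entropy and information gain, listed in the Week 5 decision tree session, are the quantities that ID3-style trees use to pick a split. The sketch below computes both for a list of class labels; the function names and the toy labels are hypothetical and only illustrate the arithmetic.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy, in bits, of the class distribution in labels."""
    total = len(labels)
    return -sum(
        (count / total) * math.log2(count / total)
        for count in Counter(labels).values()
    )


def information_gain(labels, groups):
    """Entropy of the parent node minus the size-weighted entropy of the
    child groups produced by a candidate split."""
    total = len(labels)
    weighted = sum(len(group) / total * entropy(group) for group in groups)
    return entropy(labels) - weighted


# Toy example: four samples, two classes.
parent = ["yes", "yes", "no", "no"]
perfect_split = [["yes", "yes"], ["no", "no"]]  # each child is pure
useless_split = [["yes", "no"], ["yes", "no"]]  # children mirror the parent

print(entropy(parent))
print(information_gain(parent, perfect_split))
print(information_gain(parent, useless_split))
```

A tree builder would evaluate the gain of every candidate split at a node and keep the largest one; the perfect split above removes all uncertainty, while the useless split removes none.
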
Machine Learning, Week 6: Hadoop under the hood with Map Reduce

Tuesday - 6:30 PM: Stable Marriage
- Motivations for matching algorithms with preferences
- Bi-partite graphs
- Definition of Stable Matching
- Preferences with both parties
- Incomplete List and Ties
- Industry Use cases

Thursday - 6:30 PM: Graph Analysis
- Motivation for Network Analysis
- Standard metrics in Graph analysis (Centrality, Nearest Neighbor..)
- Directed vs. Un-Directed Graphs
- Network visualization in Gephi
- Graphs in the real world
- Cluster Analysis in Graphs
- Closing Remarks

Data Set Used: Residents / Hospital Matching

10. Stable Marriage
- Create stable marriages between Hospitals and Residents
11. Graph Analysis
- Develop your LinkedIn Social Graph
- Develop ETL jobs in cluster

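The Week 6 stable marriage assignment matches hospitals and residents. The slides do not say which algorithm is taught, so the sketch below assumes the classic Gale-Shapley deferred-acceptance procedure in its simplest form: one position per hospital, equal numbers on both sides, and complete preference lists without ties. The names and preference lists are made up, and the incomplete-list and tie cases from the session outline are not handled.

```python
def stable_matching(proposer_prefs, receiver_prefs):
    """Gale-Shapley: proposers propose in preference order; each receiver
    keeps the best proposal seen so far. Returns {proposer: receiver}."""
    # rank[r][p] is the position of proposer p in receiver r's list.
    rank = {
        r: {p: i for i, p in enumerate(prefs)}
        for r, prefs in receiver_prefs.items()
    }
    next_choice = {p: 0 for p in proposer_prefs}  # next list index to try
    engaged_to = {}  # receiver -> proposer currently held
    free = list(proposer_prefs)
    while free:
        p = free.pop()
        r = proposer_prefs[p][next_choice[p]]
        next_choice[p] += 1
        current = engaged_to.get(r)
        if current is None:
            engaged_to[r] = p  # r was unmatched and accepts
        elif rank[r][p] < rank[r][current]:
            engaged_to[r] = p  # r trades up and releases current
            free.append(current)
        else:
            free.append(p)  # r rejects p, who will try the next choice
    return {p: r for r, p in engaged_to.items()}


def is_stable(matching, proposer_prefs, receiver_prefs):
    """True if no proposer and receiver both prefer each other to their match."""
    partner_of = {r: p for p, r in matching.items()}
    for p, prefs in proposer_prefs.items():
        for r in prefs:
            if r == matching[p]:
                break  # every receiver after this one is less preferred by p
            r_prefs = receiver_prefs[r]
            if r_prefs.index(p) < r_prefs.index(partner_of[r]):
                return False  # (p, r) is a blocking pair
    return True


# Hypothetical residents and hospitals.
residents = {
    "ana": ["city", "mercy", "general"],
    "ben": ["city", "general", "mercy"],
    "cy": ["mercy", "city", "general"],
}
hospitals = {
    "city": ["ben", "ana", "cy"],
    "mercy": ["ana", "cy", "ben"],
    "general": ["cy", "ana", "ben"],
}
matching = stable_matching(residents, hospitals)
print(matching, is_stable(matching, residents, hospitals))
```

With residents proposing, the result is the stable matching that is best for residents among all stable matchings; swapping the two arguments would favor hospitals instead.
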
Pricing Model: Priced to win

Track 2: Machine Learning
- Schedule: 6 Weeks, Tue 6:30 PM - 9:30 PM, Thur 6:30 PM - 9:30 PM
- Price: $6,000

Contact Us: Made in NYC

25 Broadway, Suite 5055
New York, NY
Enroll@bitbootcamp.com
917-819-0106
201-314-5838
www.bitbootcamp.com