Big Data Analytics and Optimization




Certificate Program in Engineering Excellence
http://www.insofe.edu.in

LIST OF COURSES
- Essential Business Skills for a Data Scientist
- Planning and Thinking Skills for Architecting Data Science Solutions
- Essential Engineering Skills in Big Data Analytics
- Statistical Modeling for Predictive Analytics in Engineering and Business
- Engineering Big Data with R and Hadoop Ecosystem
- Text Mining, Social Network Analysis and Natural Language Processing
- Methods and Algorithms in Machine Learning
- Optimization and Decision Analysis
- Communication, Ethical and IP challenges for Analytics Professionals

CSE 7110c Essential Business Skills for a Data Scientist

This module is offered independently to several CXOs and senior managers across the globe, and is highly appreciated as one of the most hands-on managerial introductions to data science. You learn to become a consumer of analytics, a role for which McKinsey predicted unprecedented demand.

- Why should we build models or use data to run a business: The edge of evidence over intuition
- What kinds of models data scientists build, and where they do not work
- When you want a prediction
  o How do you estimate how much to pay and how long to wait
  o How do you precisely define for the teams what to deliver
  o How do you evaluate how good their predictions are
- When does big unstructured data become really important
- When you want to build an analytics group
  o What software or hardware should you invest in
  o Several engagement models and the ideal teams for each

Business plan: Each team develops and presents a complete business plan for setting up an analytics organization.

Case analysis: Participants are divided into separate teams and given several high-level business problems. They have to identify the prediction problems with high ROI and provide concise requirements.

CSE 7111c Planning and Thinking Skills for Architecting Data Science Solutions

This module equips data scientists with the skills to design and architect practical, workable solutions. They also learn the skills needed to coordinate between business and technical teams.

- Thinking tools
  o Approximations and estimations
  o Geometric visualization of data and models
  o Probabilistic analysis of data and models
  o Analyzing networks and graphs
  o Analyzing transitions, Markov chains and unstructured data
  o Estimating complexity of algorithms
- Choosing the right models and architecting a solution
  o Structure and anatomy of models
  o Problematic data and choosing the best experimentation
- Sources of errors in predictive models and techniques to minimize them
- Interacting with technical and business teams
  o Translating typical business problems into technical specifications
  o Brainstorming and analyzing data and designing transformations
  o Manual analysis of the models
- Case study: Participants will be given business problems. They need to:
  o Translate each problem into a specific technical solution
  o Brainstorm for data and design transformations
  o Architect a complete solution plan
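As a rough illustration of the "analyzing transitions, Markov chains" item above, the following Python sketch propagates a probability distribution over states through a transition table. It is not taken from the course material: the two customer states and their transition probabilities are invented for illustration, and the function names are hypothetical.

```python
def step(dist, transitions):
    """Advance a state distribution by one Markov chain step.

    dist maps state -> probability; transitions maps state -> {next_state: probability}.
    """
    return {
        target: sum(dist[source] * transitions[source].get(target, 0.0) for source in dist)
        for target in dist
    }


def after_n_steps(dist, transitions, n):
    """Apply `step` n times and return the resulting distribution."""
    for _ in range(n):
        dist = step(dist, transitions)
    return dist


# Hypothetical monthly customer behaviour: 10% of active customers churn,
# and churned customers never return (an absorbing state).
transitions = {
    "active": {"active": 0.9, "churned": 0.1},
    "churned": {"active": 0.0, "churned": 1.0},
}
start = {"active": 1.0, "churned": 0.0}
two_months = after_n_steps(start, transitions, 2)
print(two_months)
```

Under these assumed numbers, the share of active customers after two months is 0.9 × 0.9 = 0.81, which is the kind of back-of-the-envelope estimate the thinking tools are meant to support.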

CSE 7212c Essential Engineering Skills in Big Data Analytics

This module trains engineers in hands-on Big Data and analytics tools like R, Hadoop, Hive and Pig. The students work on several real-world data sets.

- Reading from Excel, CSV and other forms
- Data exploration (histograms, bar chart, box plot, line graph, scatter plot)
- Data storytelling - The science, ggplot, bubble charts with multiple dimensions, gauge charts, treemap, heat map and motion charts
- Data preprocessing of structured data - Handling missing values, Binning, Standardization, Outlier/Noise, PCA, Type Conversion, etc.
- Visualization
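Three of the preprocessing steps named above (handling missing values, standardization and binning) can be sketched in a few lines. The module itself works in R; this Python version uses only the standard library, and the function names and the sample `ages` column are hypothetical. Mean imputation and equal-width binning are just one possible choice for each step.

```python
from statistics import mean, pstdev


def impute_mean(values):
    """Replace missing entries (None) with the mean of the observed values."""
    observed = [v for v in values if v is not None]
    fill = mean(observed)
    return [fill if v is None else v for v in values]


def standardize(values):
    """Rescale values to zero mean and unit (population) standard deviation."""
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / sigma for v in values]


def equal_width_bins(values, n_bins):
    """Assign each value a 0-based index into n_bins equal-width intervals."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins
    # The maximum sits on the upper edge, so clamp it into the last bin.
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


ages = [23, None, 31, 45, None, 52]   # made-up column with missing values
filled = impute_mean(ages)            # missing entries become 37.75
scaled = standardize(filled)
bins = equal_width_bins(filled, 3)
print(filled, bins)
```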

CSE 7302c Statistical Modeling for Predictive Analytics in Engineering and Business

This module is aimed at teaching how to think like a statistician. "Statistical thinking will one day be as necessary for efficient citizenship as the ability to read and write," wrote H. G. Wells in the year 1895. That day and age has arrived with Data Analytics going mainstream ("For Today's Graduate, Just One Word: Statistics" - http://www.nytimes.com/2009/08/06/technology/06stats.html). This course thoroughly trains candidates on the following and uses Excel and R to explain concepts:

- Computing the properties of an attribute: Central tendency and dispersion (Mean, Median, Mode, Range, Variance, Standard Deviation); Expectations of a Variable; Moment Generating Functions
- Describing an attribute: Probability distributions (Discrete and Continuous) - Bernoulli, Geometric, Binomial and Poisson distributions
- Describing the relationship between attributes: Covariance; Correlation; Chi-Square
- Describing a single variable continued: Exponential distribution; Special emphasis on Normal distribution; Central Limit Theorem
- Inferential statistics: How to learn about the population from a sample and vice versa; Sampling distributions; Confidence Intervals, Hypothesis Testing
- ANOVA
- Regression (Linear, Multivariate Regression) in forecasting
- Analyzing and interpreting regression results
- Logistic Regression
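Two of the topics above, confidence intervals and simple linear regression, can be illustrated with a short sketch. The course uses Excel and R; this Python version is only an illustration, with hypothetical function names and made-up data. The interval uses the normal approximation, which is an assumption suited to larger samples; for small samples a t-distribution would normally be used instead.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev


def mean_confidence_interval(sample, level=0.95):
    """Normal-approximation confidence interval for the population mean."""
    m = mean(sample)
    standard_error = stdev(sample) / sqrt(len(sample))
    z = NormalDist().inv_cdf(0.5 + level / 2)   # about 1.96 for a 95% level
    return m - z * standard_error, m + z * standard_error


def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, my - slope * mx


sample = [10, 12, 9, 11, 13, 10, 12, 11]        # made-up measurements
low, high = mean_confidence_interval(sample)
slope, intercept = fit_line([1, 2, 3, 4], [3, 5, 7, 9])
print(low, high, slope, intercept)
```

The regression data lie exactly on the line y = 2x + 1, so the fit recovers a slope of 2 and an intercept of 1.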

CSE 7304c Engineering Big Data with R and Hadoop Ecosystem

Companies collect and store large amounts of data during daily transactions. This data is both structured and unstructured. The volume of data being collected has grown from MB to TB in the past few years and continues to grow at an exponential pace. Its very large size, lack of structure and pace of growth characterize Big Data. To analyze long-term trends and patterns in the data and provide actionable intelligence to managers, this data needs to be consolidated and processed with specialized techniques; those techniques form the core of the module. From a tools perspective, this course introduces you to Hadoop. You will learn one of the most powerful combinations for Big Data, viz., R and Hadoop.

- Introduction to Big Data
  o World uses more Big Apps than you realize - A taxonomy and demonstrations of apps
- Data center as a computer
  o From Cells and Grids to Master-Slave Clouds - Evolution of clusters
  o Design Considerations: Cost, failure
  o What's so special about Hadoop?
- Storing big bytes
  o GFS, HDFS, Next Generation HDFS
- Rapidly ingesting & organizing unstructured data
  o Chukwa, Flume, Avro
  o NoSQL: Big Table, HBase, Document stores, Graph stores, Key-Value stores
- Your key tool: Split and Merge
  o Sequential and Concurrent algorithms design, metrics
  o Two S&M Paradigms - Map Reduce versus BSP
  o Yarn, MR2, ZooKeeper
- Querying big data
  o SQL, Sqoop, Hive, Hive variants like Impala, Spark and Storm
- Processing big data
  o R-Hadoop, Hadoop Streaming with Python/C++
  o PIG programming, Oozie
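The "split and merge" idea behind Map Reduce can be sketched in plain Python. This is a single-process toy that only mimics the map, shuffle and reduce phases in memory; it does not use any Hadoop API, and the function names are hypothetical. Word counting is the customary first example.

```python
from collections import defaultdict


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Shuffle: group all emitted values by key, as the framework would."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: merge the values collected for one key into a single count."""
    return key, sum(values)


def word_count(lines):
    """Run the three phases in sequence over an in-memory list of lines."""
    pairs = [pair for line in lines for pair in map_phase(line)]
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())


counts = word_count(["big data", "Big Data tools"])
print(counts)   # {'big': 2, 'data': 2, 'tools': 1}
```

In a real cluster the lines would be split across machines, each running the map step on its own share, with the grouped pairs routed to reducers over the network; the per-key logic stays the same.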

CSE 7206c Text Mining, Social Network Analysis and Natural Language Processing

This module is tightly integrated with the CSE 7304c module, and the topics in the two modules are interwoven.

Text mining: Unstructured data comprises more than 80% of the stored business information (primarily as text). This helped text mining emerge as a leading-edge technology. This module describes practical techniques for text mining, including pre-processing (tokenization, part-of-speech tagging), document clustering and classification, information retrieval, search and sentiment extraction in a business context.

Predictive modeling with social network data: Social network mining is extremely useful in targeted marketing, on-line advertising and fraud detection. The course teaches how incorporating social media analysis can help improve the performance of predictive models.

Natural Language Processing: By the end of the course, you will be able to answer questions such as how to classify or tag a document into a category, or how to rank some people in a network as more likely customers than others.

- Taming big text
  o Text ingestion using crawlers; Preprocessing - making text into data vectors
- Handling big graphs
  o Why graphs? How to represent, measure and query them? NoSQL graph stores
  o Implementing Graph processing in Map Reduce, Hama & Giraph
- The purpose of it all: Finding patterns in data
- Finding patterns in text
  o Mahout, text mining, text as a graph
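The step described above as "making text into data vectors" can be sketched as follows: tokenize each document, count terms against a shared vocabulary, and compare the resulting vectors. This is a minimal bag-of-words illustration in standard-library Python with hypothetical function names and made-up documents; practical pipelines usually add stop-word removal, stemming and term weighting.

```python
import re
from collections import Counter
from math import sqrt


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def to_vector(text, vocabulary):
    """Term-frequency vector of the text over a fixed, ordered vocabulary."""
    counts = Counter(tokenize(text))
    return [counts[term] for term in vocabulary]


def cosine(u, v):
    """Cosine similarity between two vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = sqrt(sum(a * a for a in u))
    norm_v = sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


docs = ["Great product, great price", "Terrible product"]   # made-up reviews
vocabulary = sorted({token for doc in docs for token in tokenize(doc)})
vectors = [to_vector(doc, vocabulary) for doc in docs]
similarity = cosine(vectors[0], vectors[1])
print(vocabulary, vectors, similarity)
```

Once documents are vectors, the clustering and classification methods from the machine learning module apply to them directly.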

CSE 7305c Methods and Algorithms in Machine Learning

This module discusses the principles and ideas underlying the current practice of data mining and introduces a powerful set of useful data analytics tools (such as K-Nearest Neighbors, Neural Networks, etc.). Real-world business problems are used for practice.

- Rule based knowledge: Logic of rules, evaluating rules, Rule induction and association rules
- Construction of Decision Trees through simplified examples; Choosing the "best" attribute at each non-leaf node; Entropy; Information Gain
- Generalizing Decision Trees; Information Content and Gain Ratio; Dealing with numerical variables; Other measures of randomness
- Pruning a Decision Tree; Cost as a consideration; Unwrapping Trees as rules
- Specialized decision trees (oblique trees), Ensemble and Hybrid models
- AdaBoost, Random Forests and Gradient boosting machines
- K-Nearest Neighbor method
- Wilson editing and triangulations
- K-nearest neighbors in collaborative filtering, digit recognition
- Motivation for Neural Networks and their applications
- Perceptron and Single Layer Neural Network, and hand calculations
- Learning in a Neural Net: Back propagation and conjugate gradient techniques
- Application of Neural Net in Face and Digit Recognition
- Linear learning machines and Kernel methods in learning
- VC (Vapnik-Chervonenkis) dimension; Shattering power of models
- Algorithm of Support Vector Machines (SVM)
- Connectivity models (hierarchical clustering)
- Centroid models (k-means algorithm)
- Distribution models (expectation maximization)
- Trend analysis and Time Series
- Cyclical and Seasonal analysis; Box-Jenkins method
- Smoothing; Moving averages; Auto-correlation; ARIMA
- Holt-Winters method
- Bayesian analysis and Naïve Bayes classifier
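Choosing the "best" attribute at a decision tree node, as listed above, comes down to computing entropy and information gain. The sketch below shows that calculation in standard-library Python; the function names and the tiny weather-style data set are invented for illustration and are not from the course.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((count / total) * log2(count / total)
                for count in Counter(labels).values())


def information_gain(rows, labels, attribute):
    """Reduction in entropy obtained by splitting the rows on one attribute."""
    partitions = {}
    for row, label in zip(rows, labels):
        partitions.setdefault(row[attribute], []).append(label)
    remainder = sum(len(part) / len(labels) * entropy(part)
                    for part in partitions.values())
    return entropy(labels) - remainder


# Made-up data: "outlook" separates the classes perfectly, "windy" not at all.
rows = [
    {"outlook": "sunny", "windy": False},
    {"outlook": "sunny", "windy": True},
    {"outlook": "rain", "windy": False},
    {"outlook": "rain", "windy": True},
]
labels = ["no", "no", "yes", "yes"]

gains = {attribute: information_gain(rows, labels, attribute)
         for attribute in ("outlook", "windy")}
best_attribute = max(gains, key=gains.get)
print(gains, best_attribute)
```

A tree-building procedure would split on the attribute with the highest gain and then repeat the same calculation within each resulting partition.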

CSE 7213c Optimization and Decision Analysis

This module teaches linear and non-linear optimization models, namely Genetic Algorithms, Linear programming and Goal programming. The application areas originate from problems in finance and operations.

- Genetic Algorithms: The algorithm and the process
- Representing data for a Genetic Algorithm
- Why and how do Genetic Algorithms work?
- Linear programming: Graphical analysis
- Sensitivity and Duality analyses
- Integer, binary programming; Applications, problem formulation and solving through R
- Goal programming
- Data envelopment analysis
- Quadratic programming
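The genetic algorithm process named above (represent candidates, select, recombine, mutate, repeat) can be sketched on a toy problem. The example below is an illustration only, not the course's formulation: candidates are bit strings, the objective is the textbook "OneMax" count of 1s, and the parameter values, function names and the use of tournament selection with elitism are all assumptions chosen for brevity.

```python
import random


def fitness(bits):
    """Toy OneMax objective: the number of 1s in the bit string."""
    return sum(bits)


def tournament(population, rng, k=3):
    """Selection: pick the fittest of k randomly drawn individuals."""
    return max(rng.sample(population, k), key=fitness)


def crossover(a, b, rng):
    """Single-point crossover: head of one parent, tail of the other."""
    point = rng.randrange(1, len(a))
    return a[:point] + b[point:]


def mutate(bits, rate, rng):
    """Flip each bit independently with probability `rate`."""
    return [1 - bit if rng.random() < rate else bit for bit in bits]


def evolve(n_bits=20, pop_size=30, generations=40, rate=0.02, seed=0):
    """Run the GA; return the best individual and the best fitness per generation."""
    rng = random.Random(seed)
    population = [[rng.randint(0, 1) for _ in range(n_bits)] for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        best = max(population, key=fitness)
        history.append(fitness(best))
        children = [best]                      # elitism: carry the best forward unchanged
        while len(children) < pop_size:
            child = crossover(tournament(population, rng), tournament(population, rng), rng)
            children.append(mutate(child, rate, rng))
        population = children
    best = max(population, key=fitness)
    history.append(fitness(best))
    return best, history


best, history = evolve()
print(fitness(best), history[0], history[-1])
```

Because the best individual is always copied into the next generation, the recorded best fitness can never decrease from one generation to the next.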

CSV 1103 Communication, Ethical and IP challenges for Analytics Professionals

This module emphasizes the importance of communication for Analytics professionals, especially since they are expected to deal with technical and non-technical users more closely than in any other discipline. Students also learn to appreciate the importance of ethical, legal and IP issues, given that regulations are still sketchy in this field where adoption is increasing rapidly. Students learn how to avoid ethical and legal pitfalls and what issues to be aware of when dealing with data.

- Why is Communication important?
- How to communicate effectively: Telling stories
- Communication issues from daily life with examples using audio, video, blogs, charts, email, etc.
- Seeing the big picture; Paying attention to details; Seeing things from multiple perspectives
- Challenges: Mix of stakeholders, Explicability of results, Visualization
- Guiding Principles: Clarity, Transparency, Integrity, Humility
- Framework for Effective Presentations; Examples of bad and good presentations
- Writing effective technical reports
- Difference between Legal and Ethical issues
- Challenges in current laws, regulations and fair information practices: Data protection, Intellectual property rights, Confidentiality, Contractual liability, Competition law, Licensing of Open Source software and Open Data
- How to handle legal, ethical and IP issues at an organizational and an individual level
- The Ethics Check questions