Big Data Analytics and Optimization
Certificate Program in Engineering Excellence
e.edu.in
LIST OF COURSES
Essential Business Skills for a Data Scientist
Planning and Thinking Skills for Architecting Data Science Solutions
Essential Engineering Skills in Big Data Analytics
Statistical Modeling for Predictive Analytics in Engineering and Business
Engineering Big Data with R and Hadoop Ecosystem
Text Mining, Social Network Analysis and Natural Language Processing
Methods and Algorithms in Machine Learning
Optimization and Decision Analysis
Communication, Ethical and IP challenges for Analytics Professionals
CSE 7110c Essential Business Skills for a Data Scientist

This module is being independently offered to several CXOs and senior management across the globe and is highly appreciated as one of the most hands-on managerial introductions to data science. You learn to become a consumer of analytics, for which McKinsey predicted there is unprecedented demand.

Why should we build models or use data to run a business: The edge of evidence over intuition
What kind of models do data scientists build and where they do not work
When you want a prediction
  o How do you estimate how much to pay and how long to wait
  o How do you precisely define for the teams what to deliver
  o How do you evaluate how good their predictions are
When does big unstructured data become really important
When you want to build an analytics group
  o What software or hardware should you invest in
  o Several engagement models and the ideal teams for each
Business plan: Each team develops and presents a complete business plan for setting up an analytics organization.
Case analysis: Participants are divided into separate teams and given several high-level business problems. They have to identify the prediction problems with high ROI and provide concise requirements.
CSE 7111c Planning and Thinking Skills for Architecting Data Science Solutions

This module equips data scientists with the skills to design and architect practical and workable solutions. They also learn the skills needed to coordinate between business and technical teams.

Thinking tools
  o Approximations and estimations
  o Geometric visualization of data and models
  o Probabilistic analysis of data and models
  o Analyzing networks and graphs
  o Analyzing transitions, Markov chains and unstructured data
  o Estimating complexity of algorithms
Choosing the right models and architecting a solution
  o Structure and anatomy of models
  o Problematic data and choosing the best experimentation
Sources of errors in predictive models and techniques to minimize them
Interacting with technical and business teams
  o Translating typical business problems into technical specifications
  o Brainstorming and analyzing data and designing transformations
  o Manual analysis of the models
Case study: Participants will be given business problems. They need to:
  o Translate each problem into a specific technical solution
  o Brainstorm for data and design transformations
  o Architect a complete solution plan
CSE 7212c Essential Engineering Skills in Big Data Analytics

This module trains engineers in hands-on Big Data and analytics tools like R, Hadoop, Hive and Pig. The students work on several real-world data sets.

Reading from Excel, CSV and other formats
Data exploration (histograms, bar chart, box plot, line graph, scatter plot)
Data storytelling - The science, ggplot, bubble charts with multiple dimensions, gauge charts, treemap, heat map and motion charts
Data preprocessing of structured data - Handling missing values, Binning, Standardization, Outlier/Noise, PCA, Type Conversion, etc.
Visualization
CSE 7302c Statistical Modeling for Predictive Analytics in Engineering and Business

This module is aimed at teaching how to think like a statistician. "Statistical thinking will one day be as necessary for efficient citizenship as the ability to read and write," wrote H. G. Wells. That day and age has arrived with Data Analytics going mainstream ("For Today's Graduate, Just One Word: Statistics"). This course thoroughly trains candidates on the following and uses Excel and R to explain concepts:

Computing the properties of an attribute: Central tendencies and dispersion (Mean, Median, Mode, Range, Variance, Standard Deviation); Expectations of a Variable; Moment Generating Functions
Describing an attribute: Probability distributions (Discrete and Continuous) - Bernoulli, Geometric, Binomial and Poisson distributions
Describing the relationship between attributes: Covariance; Correlation; Chi-Square
Describing a single variable continued: Exponential distribution; Special emphasis on Normal distribution; Central Limit Theorem
Inferential statistics: How to learn about the population from a sample and vice versa; Sampling distributions; Confidence Intervals, Hypothesis Testing
ANOVA
Regression (Linear, Multivariate Regression) in forecasting
Analyzing and interpreting regression results
Logistic Regression
CSE 7304c Engineering Big Data with R and Hadoop Ecosystem

Companies collect and store large amounts of data during daily transactions. This data is both structured and unstructured. The volume of the data being collected has grown from MB to TB in the past few years and is continuing to grow at an exponential pace. The very large size, the lack of structure and the pace at which it is growing characterize Big Data. To analyze long-term trends and patterns in the data and provide actionable intelligence to managers, this data needs to be consolidated and processed using specialized techniques; those techniques form the core of the module. From a tools perspective, this course introduces you to Hadoop. You will learn one of the most powerful combinations for Big Data, viz., R and Hadoop.

Introduction to Big Data
  o The world uses more Big Apps than you realize - A taxonomy and demonstrations of apps
Data center as a computer
  o From Cells and Grids to Master-Slave Clouds - Evolution of clusters
  o Design Considerations: Cost, failure
  o What's so special about Hadoop?
Storing big bytes
  o GFS, HDFS, Next Generation HDFS
Rapidly ingesting & organizing unstructured data
  o Chukwa, Flume, Avro
  o NoSQL: Big Table, HBase, Document stores, Graph stores, Key-Value stores
Your key tool: Split and Merge
  o Sequential and Concurrent algorithms design, metrics
  o Two S&M Paradigms - Map Reduce versus BSP
  o Yarn, MR2, ZooKeeper
Querying big data
  o SQL, Sqoop, Hive, Hive variants like Impala, Spark and Storm
Processing big data
  o R-Hadoop, Hadoop Streaming with Python/C++
  o PIG programming, Oozie
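As an illustration of the "Split and Merge" topic above, the Map Reduce paradigm can be sketched on a single machine in plain Python. This is a minimal sketch under stated assumptions, not Hadoop code and not course material: the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) and the sample documents are hypothetical, chosen only to show how a split step emits key-value pairs and a merge step combines them.

```python
from collections import defaultdict


def map_phase(document):
    """Split step: emit a (word, 1) pair for every word in one document."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle(pairs):
    """Group emitted values by key, as a framework would between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    """Merge step: combine the values collected for each key into one count."""
    return {key: sum(values) for key, values in grouped.items()}


def word_count(documents):
    """Run map, shuffle and reduce in sequence over a list of documents."""
    pairs = []
    for doc in documents:
        # On a cluster, each document (or block) would be mapped on a different node.
        pairs.extend(map_phase(doc))
    return reduce_phase(shuffle(pairs))


if __name__ == "__main__":
    # Hypothetical sample input.
    docs = ["big data with Hadoop", "big data with R"]
    print(word_count(docs))
```

Because each call to `map_phase` depends only on its own document, the map work could in principle be distributed across machines, with only the grouped pairs needing to be brought together for the reduce step.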
CSE 7206c Text Mining, Social Network Analysis and Natural Language Processing

This module is tightly integrated with the CSE 7304c module, and the topics in the two modules are interwoven.

Text mining: Unstructured data comprises more than 80% of the stored business information (primarily as text). This helped text mining emerge as a leading-edge technology. This module describes practical techniques for text mining, including pre-processing (tokenization, part-of-speech tagging), document clustering and classification, information retrieval, search and sentiment extraction in a business context.

Predictive modeling with social network data: Social network mining is extremely useful in targeted marketing, on-line advertising and fraud detection. The course teaches how incorporating social media analysis can help improve the performance of predictive models.

Natural Language Processing: By the end of the course, you will be able to answer questions like how to classify or tag a document into a category, how to rank some people in a network as more likely customers than others, etc.

Taming big text
  o Text ingestion using crawlers; Preprocessing - making text into data vectors
Handling big graphs
  o Why graphs? How to represent, measure and query them? NoSQL graph stores
  o Implementing Graph processing in Map Reduce, Hama & Giraph
The purpose of it all: Finding patterns in data
Finding patterns in text
  o Mahout, text mining, text as a graph
CSE 7305c Methods and Algorithms in Machine Learning

This module discusses the principles and ideas underlying the current practice of data mining and introduces a powerful set of useful data analytics tools (such as K-Nearest Neighbors, Neural Networks, etc.). Real-world business problems are used for practice.

Rule based knowledge: Logic of rules, evaluating rules, Rule induction and association rules
Construction of Decision Trees through simplified examples; Choosing the "best" attribute at each non-leaf node; Entropy; Information Gain
Generalizing Decision Trees; Information Content and Gain Ratio; Dealing with numerical variables; Other measures of randomness
Pruning a Decision Tree; Cost as a consideration; Unwrapping Trees as rules
Specialized decision trees (oblique trees), Ensemble and Hybrid models
AdaBoost, Random Forests and Gradient boosting machines
K-Nearest Neighbor method
Wilson editing and triangulations
K-nearest neighbors in collaborative filtering, digit recognition
Motivation for Neural Networks and their applications
Perceptron and Single Layer Neural Network, and hand calculations
Learning in a Neural Net: Back propagation and conjugate gradient techniques
Application of Neural Net in Face and Digit Recognition
Linear learning machines and Kernel methods in learning
VC (Vapnik-Chervonenkis) dimension; Shattering power of models
Algorithm of Support Vector Machines (SVM)
Connectivity models (hierarchical clustering)
Centroid models (k-means algorithm)
Distribution models (expectation maximization)
Trend analysis and Time Series
Cyclical and Seasonal analysis; Box-Jenkins method
Smoothing; Moving averages; Auto-correlation; ARIMA
Holt-Winters method
Bayesian analysis and Naïve Bayes classifier
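To make the Entropy and Information Gain topics above concrete, the following is a minimal sketch of how the "best" attribute at a decision tree node might be chosen. It is an illustration under stated assumptions, not the course's own material: the helper names (`entropy`, `information_gain`) and the tiny weather-style data set are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    counts = Counter(labels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def information_gain(rows, labels, attribute):
    """Entropy of the labels minus the weighted entropy after splitting on `attribute`."""
    total = len(labels)
    partitions = {}
    for row, label in zip(rows, labels):
        # Collect the labels that fall under each value of the attribute.
        partitions.setdefault(row[attribute], []).append(label)
    remainder = sum(len(part) / total * entropy(part) for part in partitions.values())
    return entropy(labels) - remainder


if __name__ == "__main__":
    # Hypothetical toy data: does play happen, given outlook and wind?
    rows = [
        {"outlook": "sunny", "windy": False},
        {"outlook": "sunny", "windy": True},
        {"outlook": "rain", "windy": False},
        {"outlook": "rain", "windy": True},
    ]
    labels = ["no", "no", "yes", "yes"]
    best = max(["outlook", "windy"], key=lambda a: information_gain(rows, labels, a))
    print(best)
```

In this toy data, splitting on `outlook` separates the classes completely while splitting on `windy` leaves each branch evenly mixed, so the attribute with the higher gain is the one a tree-building procedure would place at the node.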
CSE 7213c Optimization and Decision Analysis

This module is designed to teach linear and non-linear optimization models, namely Genetic Algorithms, Linear programming and Goal programming. The application areas originate from problems in finance and operations.

Genetic Algorithms: The algorithm and the process
Representing data for a Genetic Algorithm
Why and how do Genetic Algorithms work?
Linear programming: Graphical analysis
Sensitivity and Duality analyses
Integer, binary programming; Applications, problem formulation and solving through R
Goal programming
Data envelopment analysis
Quadratic programming
CSV 1103 Communication, Ethical and IP challenges for Analytics Professionals

This module emphasizes the importance of communication for Analytics professionals, especially since they are expected to deal with technical and non-technical users more closely than in any other discipline. Students also learn to appreciate the importance of ethical, legal and IP issues, given that regulations are still sketchy in this field where adoption is increasing rapidly. Students learn how to avoid ethical and legal pitfalls and what issues to be aware of when dealing with data.

Why is Communication important?
How to communicate effectively: Telling stories
Communication issues from daily life with examples using audio, video, blogs, charts, etc.
Seeing the big picture; Paying attention to details; Seeing things from multiple perspectives
Challenges: Mix of stakeholders, Explicability of results, Visualization
Guiding Principles: Clarity, Transparency, Integrity, Humility
Framework for Effective Presentations; Examples of bad and good presentations
Writing effective technical reports
Difference between Legal and Ethical issues
Challenges in current laws, regulations and fair information practices: Data protection, Intellectual property rights, Confidentiality, Contractual liability, Competition law, Licensing of Open Source software and Open Data
How to handle legal, ethical and IP issues at an organization and an individual level
The Ethics Check questions