School of Natural Sciences
Postgraduate Diploma in Data & Business Analytics
Master of Science Degree in Data Analytics
Prospectus 2014
www.snu.edu.in
Table of Contents
Overview
Program Objectives
Course Design
Course Details
Administration
Advisory Committee
Faculty
Career Prospects
Minimum Eligibility Criteria for Applicants
Application Process
Admission Process
Important Dates
Venue for Admission Test and Interview
Admission Test (M.S. in Data Analytics) Details
Interview (M.S. in Data Analytics and P.G. Diploma in Business & Data Analytics) Details
Fees and Scholarship
Frequently Asked Questions
Big Data Analytic Center Page 2
Overview
The Big Data Analytics Center (BDAC) is an interdisciplinary center set up in 2014 under the aegis of the School of Natural Sciences (SoNS), Shiv Nadar University (SNU). Big Data refers to collections of data sets whose scale, diversity and complexity require new architectures, techniques, algorithms and analysis to manage them and to extract value and hidden knowledge from them. The use of information has become central to the survival and development of the human race. Today we experience a true deluge of data that records and shapes our lives, ranging from large global issues such as climate change to the smallest local problems such as controlling a thermostat. The critical screening and processing of Big Data has become a worldwide effort, requiring academic attention from diverse disciplines. The challenge is to develop theoretical and innovative scientific and technological solutions that cater to the needs of industry, society and the environment.

Given the wide gap between the demand for and supply of scientists, technologists and key experts in the domain of Data Analytics today, BDAC has initiated graduate programs (a postgraduate diploma and a Master's degree) to prepare interested young minds for the academic analysis of Big Data and its applications in society, from business concerns to social practices and cultural change. The integrated model of BDAC provides a unique opportunity for young aspirants from different academic disciplines, and for executives from the corporate world, to explore next-generation solutions in the emerging discipline of Big Data Analytics.
[Figure: Integrated Model of Big Data Analytics Center @ SNU]
In view of ever more rapid technological developments in the digitalized world, with new solutions becoming obsolete every few years, our programs emphasize honing the ability to recognize, define and solve fundamental problems analytically, so that students can take a leading role in shaping the future society.
Program Objectives
The Big Data Analytics programs mix theory and practice with the following objectives:
- Develop in-depth knowledge and understanding of the big data analytics domain.
- Analyze and solve problems, conceptually and practically, from diverse industries such as manufacturing, retail, software, banking/finance and pharmaceuticals.
- Undertake consulting projects with a significant data analysis component for a better understanding of the theoretical concepts from statistics, economics and related disciplines.
- Undertake industrial research projects to develop future solutions in the domain of data analytics and make an impact on technological advancement.
- Use advanced analytical tools, decision-making tools and operations research techniques to analyze complex problems, and prepare to develop new techniques of this kind.
- Allow a flexible option, especially for job holders, to complete a P.G. Diploma in Business & Data Analytics (1-year program) and continue to a Master of Science Degree in Data Analytics (2 years in total).

Course Design
Postgraduate Diploma in Data and Business Analytics (One Year)
The program consists of ten courses and a project. The courses are divided across two semesters: eight are core courses and two are electives. A wide range of electives will be offered to suit the choice and interest of each student. Training in and use of SPSS Statistics and SAS Enterprise Miner will be part of the course contents and structure. The customer-based industrial project will broaden and deepen understanding of the theoretical concepts by applying a practical approach to a complex real-life problem. In certain courses, case-based teaching will be adopted.
Master of Science Degree in Data Analytics (Two Years)
The program consists of ten courses in the first and second semesters. Of the ten courses, eight are core courses and two are elective courses.
The elective courses will cover a wide range of topics in big data analytics to suit the research interest of the student. Training in and analytical use of SPSS Statistics and SAS Enterprise Miner will enhance and update the student's knowledge in the relevant domains. The summer internship lasts two months and is based on available choices and the interest of the student. The third and fourth semesters will involve the Master's thesis. Collaborative industrial research projects will provide a suitable base for the Master's thesis and for quality research work.

Course Details
Core Courses
1. Data Collection and Management Principles, Tools and Platforms / (Database Management Systems): Database concepts, basic components of DBMS, sources of data,
logging, cleaning data, data representation, data models (hierarchical, network, XML), and Stores, NoSQL database, design for performance / quality parameters, documents and information retrieval, related tools (Postgres, OLTP, OLAP, Hadoop, MapReduce).
2. Data Visualization / (Visualization and Reporting): Purpose of visualization, multidimensional visualization, tree visualization, graph visualization and time series data visualization techniques, visual perception, cognitive issues, evaluation as well as other theory and design principles behind information visualization, understanding analytics output and their usage, basic interaction techniques such as selection and distortion, evaluation, examples of information visualization applications and systems, user tasks and analysis.
3. Mathematics for Data Analytics: Basic probability theory, distributions and their properties, simple and multiple regression analysis, hypothesis testing and sampling, estimation theory, least squares methods, SVD, transformations, stochastic models, compression techniques, Markov models, Markov decision processes and their application in sequential decision making, Poisson, Cumulative Poisson Process and its generalization, applications in different business domains, ARMA and ARIMA, Monte Carlo simulations, application of data analytics in different domains.
4. Business Statistics: Descriptive statistics (univariate and bivariate), residual analysis, confidence and prediction intervals, regression, associations, sequencing, introduction to forecasting, design of experiments and basic statistical analysis of experimental data (both field and laboratory) to investigate business issues, tools for conducting basic statistics (for example, SPSS and SAS), conducting analytics on laboratory and/or field data using tools (for example, SAS, JMP, KNIME).
5. Systems / Business Analysis: Introduction to information system components, types of information systems, roles of the business analyst, evolution and definition, industry needs and applications, process and methodologies, tools and technologies, roles and responsibilities, impact of digital marketing and unstructured data. Systems planning: objectives, preliminary investigation, other fact-finding techniques, recording facts. Analyzing requirements: data flow diagrams, data dictionary, process description, evaluating alternatives. Data analytics life cycle: discovery, data preparation, model planning, model building, communicating results and operationalization. Implementation: quality assurance, documentation, management approval, installation / implementation, acceptance.
6. Data Mining: Clustering, association rules, factor analysis, scale development, survival analysis, data reduction using PCA, scoring new data and model implementation, improving predictive models, association and market basket analysis, advanced regression models: concepts and applications, conjoint and discrete choice analysis, design and analysis of experiments.
7. Operations Research: Introduction to optimization, gradient descent method, convex optimization, linear programming and its generalizations (goal programming and multi-criteria decision analysis), integer programming, dynamic programming, the assignment problem, the transportation problem and their applications.
8. Big Data Technologies: Big data definition, enterprise / structured data, social / unstructured data, unstructured data needs for analytics, big data programming (Hadoop / HDFS, MapReduce, event stream processing, complex event processing), evolution, purpose and use, application data stores (NoSQL databases, in-memory databases), data computing appliance (DCA) and OLAP, massively parallel processing, in-memory computing / analytics, data science, enterprise / external search, HDFS overview and concepts, data flow (read and write), interfaces to HDFS (HTTP, CLI and Java API), high availability and NameNode federation, MapReduce: developing and deploying programs, optimization techniques, MapReduce anatomy, data flow framework programming, MapReduce best practices and debugging.

Electives
9. Understanding Enterprise Processes and Analytics: Overview of the domain, understanding of business pain points, understanding different types of analytics applications: financial services (claims, renewal, sales force, collections, fraud, compliance, risk, pricing, customer loyalty, pricing and promotion effectiveness, etc.); healthcare (evidence-based medicine, comparative effectiveness research, clinical analytics, fraud/waste/abuse management, etc.); telecom (network optimization, subscriber profiling, churn management, collection management, etc.); manufacturing (demand forecasting and SKU rationalization, plant analytics, route and distribution optimization, vendor performance, etc.). Overview of the analytics view chain: data source, ETL, data integration, data migration, MDM, modeling, reporting and visualization, etc.; process of scoping an analytics project / use case, steps in hypothesis creation, establishing critical success factors, identifying reports and deliverables, data privacy and security.
10. Machine Learning and Knowledge Discovery: Supervised learning, decision trees, linear discriminant functions (SVM), neural networks, deep belief networks, density estimation methods, Bayes decision theory, expectation and minimization, ensemble methods, feature engineering, association rule mining, clustering techniques. Practical: evaluation of ML techniques (cross-validation, ROC, precision, recall, F-value), introduction to the use of ML and KD tools such as Weka, Octave, SciLab or equivalent libraries/tools.
11. Time Series and Forecasting: A survey of the theory and application of time series methods in different domains, with special emphasis on econometrics. Univariate stationary and nonstationary models, vector auto-regressions, frequency domain methods, models for estimation and inference in persistent time series, and structural breaks; different methods of estimation and inference for modern dynamic stochastic general equilibrium (DSGE) models: simulated method of moments, maximum likelihood and the Bayesian approach. The empirical applications will be drawn primarily from macroeconomics and other domains.
12. Evolutionary Programming: Introduction to evolutionary and heuristic techniques: principles and historical perspectives; application potential in optimization, dimensionality reduction, data mining and analytics; Genetic Algorithms, Evolutionary Strategies, Evolutionary Programming; introduction to representations: binary strings, real-valued vectors; various selection strategies; introduction to search operators: crossover and mutation; Ant Colony Optimization: pheromone-mediated search and exploration and exploitation strategies; Particle Swarm Optimization: basic PSO strategies and variants, different neighborhood
topologies; Biogeography-Based Optimization: immigration and emigration strategies; Monte Carlo methods: simulated annealing and advanced annealing strategies; Differential Evolution, Group Search Optimization, Glowworm Optimization, Firefly and other novel heuristic algorithms; applications of evolutionary and heuristic techniques in large-scale optimization, combinatorial and function optimization; multi-objective optimization: Pareto front and non-dominated solutions, NSGA and related solution strategies; applications to large-scale clustering, classification, rule mining and data-driven modeling; variable selection, informative data reduction and parameter optimization in predictive data analytics with evolutionary and heuristic techniques; evolutionary computing in discovering the structure and modularity of large-scale networks.
13. Multi-core Programming: Fundamental aspects of shared-memory and accelerator-based parallel programming, such as shared-memory parallel architecture concepts, programming models, performance models, parallel algorithmic paradigms, parallelization techniques and strategies, scheduling algorithms, optimization, composition of parallel programs, and concepts of modern parallel programming languages and systems. Practical exercises help to apply the theoretical concepts of the course to solve concrete problems on a real-world multicore system.
14. Game Theory: Rigorous investigation of the evolutionary and epistemic foundations of solution concepts, such as rationalizability and Nash equilibrium. Classical topics on repeated games, bargaining, and super-modular games, games, heterogeneous priors, psychological games, and games without expected utility maximization. Applications and case studies from different domains.
15. Text Analytics: Introduction to text mining, text representation and turning text into features, exploratory analysis (frequency and co-occurrence), clustering, categorization, bag of features, predictive analysis for categorization, predictive analysis for sentiment analysis, analyzing text extracted from the web, such as social media and tweets, and developing prototypes for identifying the entities mentioned in text, the relations between them, and the opinions expressed about these entities.

Industrial Collaboration
Our industrial partner will be SAS, which will help the Big Data Analytics Center set up the lab for SAS Enterprise Miner. Under the SNU and SAS partnership, a Management Development Program (MDP) will be launched in Applied Analytics and Predictive Analytics.

Administration
Dr. Rupamanjari Ghosh, Director, School of Natural Sciences, SNU
Dr. Santosh Singh, Head, Big Data Analytics Center, SNU

Advisory Committee
Dr. Harish Karnick, Professor, Department of Computer Science and Engineering, IIT Kanpur
Dr. Debashish Kundu, Arun Kumar Chair Professor and Head, Department of Mathematics and Statistics, IIT Kanpur
Dr. S. K. Neogy, Professor, Department of Statistical Quality Control and Operations Research, ISI - Delhi
Dr. Manik Varma, Microsoft Research (MSR) Bangalore

Faculty
Visiting Professors
Professor S. K. Neogy, Department of Statistical Quality Control & Operations Research, ISI - Delhi
Professor Neeraj Misra, Department of Mathematics and Statistics, IIT - Kanpur
Adjunct Faculty
Professor Niladri Chatterjee, Department of Mathematics, IIT - Delhi
Mr. Ljubisa Goganovic, Siemens Information Systems - Bangalore
Dr. Abhijit Kulkarni, SAS - Pune
Mr. Narendra Dureja, HCL - Noida
Dr. Santosh Singh, Department of Mathematics, SoNS, SNU
Dr. Krishnan Rajkumar, Department of Mathematics, SoNS, SNU
Dr. Abhishek Ranjan, Department of Mathematics, SoNS, SNU
Dr. V. K. Jayaraman, Center for Informatics, SoNS, SNU
Dr. Partha Chatterjee, Department of Economics, SoHSS, SNU
Dr. Saptarshi P. Ghosh, Department of Economics, SoHSS, SNU

Career Prospects
The P.G. Diploma and M.S. Degree programs will educate aspirants who want to make an impact in the corporate and academic world in the domain of data analytics as data scientists and researchers, big data leads/administrators/managers, business analysts and data visualization specialists. The programs are also suitable for those already working in analytics who wish to strengthen their theoretical and conceptual knowledge, and for those with analytical aptitude who would like to start a career in big data analytics in different business sectors. Collaboration with multinational companies on mutual research interests and customer-related projects will ease the path for campus recruitment.

Minimum Eligibility Criteria for Applicants
Applicants must hold a four-year undergraduate degree in engineering/mathematics/statistics/physics/economics/commerce, a Master's degree in mathematics/physics/statistics/economics/commerce, or a three-year undergraduate degree in mathematics/statistics/physics/economics/commerce plus two or more years of relevant industrial experience. Applicants for executive education are expected to have at least five years of work experience. The minimum eligibility criteria can be waived for exceptionally qualified profiles.

Application Process
All interested candidates should apply using the prescribed form, available at http://snu.edu.in/pdf/bdac_diploma-ms_applicationform_2014.pdf. The duly filled form, along with supporting documents and a non-refundable demand draft of Rs. 1,000/- (in favor of Shiv Nadar University, payable at Delhi), should be sent by Speed Post, with "Application for BDAC-SNS-SNU 2014-15" written on the envelope, to:
Ms. Rupa Goswami
EA to the Director
School of Natural Sciences
Shiv Nadar University
Post Office Shiv Nadar University
Gautam Buddha Nagar, UP 201 314, India
Email: rupa.goswami@snu.edu.in
Telephone: +91 120 266 3841

Admission Process
Each candidate will be evaluated holistically to assess his/her potential for becoming a good data scientist, data analyst or data manager. For M.S. applicants, a written test will be conducted on the announced date in the NCR. The written test will consist of multiple-choice questions. The M.S. candidates shortlisted after the written test, and all PG Diploma applicants, will be called for technical interviews. Candidates will be evaluated on the basis of their in-depth scientific knowledge, analytical skills and computational knowledge.

Important Dates
- Last date for receipt of applications:
  o M.S. in Data Analytics: 30 June 2014
  o P.G. Diploma in Business and Data Analytics: 6 July 2014
- Admission test for M.S. in Data Analytics: 13 July 2014 (9 a.m.)
- Interview for short-listed P.G. Diploma and M.S. candidates: 13 July (starting at 2 p.m.) to 14 July 2014
- Announcement of entrance results: 16 July 2014
- Fee payment due date: 25 July 2014
- Start of the session: 9 August 2014

Venue for Admission Test and Interview
National Capital Region (exact address will be announced in due course)

Admission Test (M.S. in Data Analytics) Details
The total duration of the written test will be 3 hours (with a 15-minute break in the middle). All questions will be of multiple-choice type. There will be 3 sections:
Section 1: One hour for verbal reasoning and quantitative reasoning (refer to the GRE syllabus).
Section 2: Half an hour for basic programming skills, with a choice of programming language (C/C++ or Java).
Break for 15 minutes.
Section 3: One and a half hours for mathematics / probability and statistics.
Section 3 will cover the following topics at the undergraduate level:
Algebra: Elementary combinatorics: permutations and combinations, binomial theorem, vectors and matrices, determinant, rank, inverse of a matrix, solution of linear equations, eigenvalues and eigenvectors, projection matrix, vector spaces, inner product and the least squares method.
Numerical Analysis: Root-finding methods, finite differences, interpolation, numerical integration.
Calculus: Sequences and series, power series, Taylor series, limits and continuity, differentiation and integration, definite integrals, maxima and minima, functions of several variables, double integration and ordinary differential equations.
Probability and Statistics: Combinatorial probability, conditional probability, random variables, Bayes' theorem, binomial and Poisson distributions, Gaussian distribution, statistical estimation and testing, confidence intervals, introduction to linear regression.

Interview (M.S. in Data Analytics and P.G. Diploma in Business & Data Analytics) Details
The interview will focus mainly on standard undergraduate-level mathematics/statistics and basic reasoning/analytical aptitude.

Fees and Scholarship
The total fee (Admission fee + Tuition fee + contribution to the Student Activity Fund) for the two-year M.S. program is INR 772,000, and that for the one-year PG Diploma is INR 396,000. The breakup is shown in the table below.
The amount covers academic expenses such as basic program material, royalty for copyrighted material, and library and network charges. Students will have the option to live on campus. This fee does not cover accommodation, mess charges, laundry charges or other living expenses.

Fee Structure
                                        PG Diploma / M.S.    M.S. Second    Two-Year Total
                                        First Year (INR)     Year (INR)     for M.S. (INR)
Admission Fee                           20,000               --             20,000
Tuition Fee                             375,000              375,000        750,000
Contribution to Student Activity Fund   1,000                1,000          2,000
Total Fees                              396,000              376,000        772,000
Refundable Security Deposit             25,000               --             25,000
Total Payable                           421,000              376,000        797,000

Financial Aid and Scholarships for 2014-15
P.G. Diploma in Business and Data Analytics: Top 20 selected candidates
                                        Year 1 (INR)
Admission Fee                           20,000
Tuition Fee                             187,000
Contribution to Student Activity Fund   1,000
Total Fees                              208,000
Refundable Security Deposit             25,000
Total Payable                           233,000

M.S. in Data Analytics: Top 5 selected candidates
                                        Year 1 (INR)         Year 2 (INR)
Admission Fee                           20,000               --
Tuition Fee                             100,000              100,000
Contribution to Student Activity Fund   1,000                1,000
Total Fees                              121,000              101,000
Refundable Security Deposit             25,000               --
Total Payable                           146,000              101,000

The top 5 selected candidates enrolled in the M.S. program in Data Analytics will be eligible for a teaching assistantship of INR 12,000 per month.
Note:
1. For students who opt for campus accommodation, the hostel fee will be charged according to the SNU rules.
2. For the P.G. Diploma in Business and Data Analytics, classes will be held on weekends; for students who want to stay on campus, the hostel fee will be charged according to the SNU rules.

Frequently Asked Questions
1. Why should I choose to take a program in data analytics?
Recent years have seen an exponential growth of digital data. In the next decade the total digital data is expected to cross 35 zettabytes (1 zettabyte = 10^21 bytes). There will be a huge requirement for qualified data scientists, data analysts and data managers to manage the data and provide real-time solutions to deal with the technological, social and
business challenges. Many surveys suggest that more than 4.4 million jobs in the area of Big Data Analytics will be created in the coming years. There is a huge opportunity for young graduates and budding professionals to prepare themselves to make a difference in the domain of Big Data by handling the technological and social challenges from different perspectives. SNU, an emerging premier university in India, has designed two different programs to bridge the gap between demand and supply in the domain of business and data analytics. These programs educate and train students in the latest software tools and give them an in-depth technological understanding of critical data processing, enabling them to take up various positions in the domain of business and data analytics.
2. How do I decide between the M.S. in Data Analytics and the P.G. Diploma in Data and Business Analytics?
Both programs have their unique value and have been designed to cover a wide range of interested candidates. For students and working professionals who are interested in understanding the technological concepts and fundamentals, and in being trained in the latest software tools and techniques behind Big Data, the one-year PG Diploma in Business and Data Analytics is highly suitable. For students who would like, in addition, to explore and develop new algorithms and techniques, the two-year Master of Science in Data Analytics will be the ideal choice. The M.S. program will also open up a path to pursuing a Ph.D. in this domain.
3. Are the programs distance-education, online or full-time programs?
The PG Diploma and M.S. are full-time programs. Apart from classroom teaching, the curriculum includes a significant portion of practical work, group projects and assignments. The teaching schedule will be adjusted (to mostly weekends) to allow professionals already in jobs to attend the P.G. Diploma course.
4. Will the students get to work on live projects with companies?
The project is an integral part of the curriculum and reinforces classroom learning. Students will work in groups on industry-assigned projects.
5. What kind of companies will come to campus for placement and what will be the profile?
We are going to start the first session in August 2014. The Career Development Center at SNU is dedicated to engaging and negotiating with companies in the relevant domain for both summer internships and final placement of the students.
6. Are both the programs approved?
Shiv Nadar University has been set up through an Act of the State of Uttar Pradesh and is also recognized by the UGC. Since the PG Diploma and M.S. programs offered by the Big Data Analytics Center in the School of Natural Sciences are programs of Shiv Nadar University, they do not need to be approved by any other organization or body. We follow all UGC guidelines.
7. Who will teach the courses?
The Big Data Analytics Center (BDAC) at SNU is a research center. The center has a good pool of renowned faculty and researchers from within SNU, as well as IIT professors and industrial experts serving as adjunct / visiting faculty, who will teach the core and elective courses.
8. Whom should I contact for further queries?
Dr. Santosh Singh, Head, Big Data Analytics Center, School of Natural Sciences, SNU, Shiv Nadar University P.O., Gautam Buddha Nagar, UP 201 314, India.
Email: Santosh.singh@snu.edu.in