How To Write A Project Report On Statistical Analysis Of Big Data Sets


Statistical Analysis of Big Data Sets

Seemant Ujjain
Statistics and Informatics, Department of Mathematics
Indian Institute of Technology (IIT), Kharagpur

Project guide: Dr. Jitendra Kumar, Assistant Professor
Institute of Development and Research in Banking Technology (IDRBT)
Road No. 1, Castle Hills, Masab Tank, Hyderabad

July 5,

CONTENTS

Certificate
Declaration
Abstract
1. Introduction
Statistical analysis and big data
Methodology
Numerical Illustration
Empirical Example
Conclusion

CERTIFICATE

This is to certify that the project report titled Statistical Analysis of Big Data, submitted by Seemant Ujjain of Integrated M.Sc. 5th year, Dept. of Mathematics, IIT Kharagpur, is a record of bona fide work carried out by him under my guidance during the period 8th May 2012 to 6th July 2012 at the Institute of Development and Research in Banking Technology, Hyderabad. The project work is a research study, which has been successfully completed as per the set objectives.

Dr. Jitendra Kumar
Assistant Professor
IDRBT, Hyderabad

DECLARATION

I declare that the summer internship project report titled Statistical Analysis of Big Data is my own work, conducted under the supervision of Dr. Jitendra Kumar at the Institute of Development and Research in Banking Technology, Hyderabad. I have put in 60 days of attendance with my supervisor at IDRBT and have been awarded a project fellowship. I further declare that, to the best of my knowledge, the report does not contain any part of any work which has been submitted for the award of any degree, either in this institute or in any other institute, without proper citation.

Seemant Ujjain
Int. M.Sc. 5th year
Dept. of Mathematics
IIT Kharagpur

ABSTRACT

Big data is generated by multiple known and unknown sources, grows continuously, and cannot be held by ordinary storage tools. These features create hurdles for statisticians and data scientists, because statistical analysis normally requires a fixed set of data and computing tools struggle with such volume. The present project deals with the statistical analysis of data that keeps growing with some velocity. We manage the velocity using the concept of realization of a time series: the accelerating nature of the data is controlled by estimating parameter values on successive portions of the data and fitting a suitable model to those estimates. Modelling the realized parameters gives a better estimate and takes less time than working with the voluminous data directly. A simulation study is carried out, and the same approach is applied to the analysis of daily ATM withdrawals.

INTRODUCTION

Big data often represent multiple, non-random samples of unknown populations whose composition shifts within the short term. Big data are an outgrowth of today's digital environment, which generates data flowing continuously from all directions at unprecedented speed and volume, and they almost always require cleansing. Big data is a term applied to data sets whose size is beyond the ability of commonly used software tools to capture, manage, and process within a tolerable elapsed time. Difficulties include capture, storage, search, sharing, analysis, and visualization. The term was coined by IT professionals and market researchers, who typically approach the problem with huge analytic and storage capacity, such as terabyte or petabyte solutions. This study aims to carry out the analysis on data sets that keep accelerating in size. The vast amount of information stored by most multinational companies and government agencies falls into the category of big data. Any technique that helps extract valuable information from a big chunk of data can become a valuable tool.

STATISTICAL ANALYSIS AND BIG DATA

This project is an attempt to handle such conditions when the data are numeric. The first assumption is that the whole data cannot be stored at one time, so different data sets are considered or generated, and the storage capacity of the system is assumed to be equal to the size of each such data set. Many such data sets together are considered to be the whole big data. The second assumption concerns the nature of the data: the data are numeric and random, and their distribution is taken to be unknown. In this project we have also tried to measure basic statistical properties of big data. The approach is justified first on data simulated under controlled parameter values and multiple conditions, and then extended to a real data set, for which the daily withdrawals from several ATMs of a bank are taken as sample data. The ATM data are processed one set at a time, in the same way as the simulated data, and compiled into an analysis of the whole and the partitioned data. Basic statistical measures such as mean, median, variance and mode are obtained with reference to big data.

METHODOLOGY

The present study aims to analyse big data that continuously adds information to itself. For this we partition the data into groups, for example with respect to time or size. Each partition is treated as a realization of a huge source of information with respect to some specific basis. We justify our approach first by simulation and then apply it to study the ATM withdrawals of a bank.

Case 1: Simulated Data

In the simulation study we generated 100,000 observations, partitioned into 100 groups, where the parameters of the different groups follow an autoregressive model with zero mean. The 100 data sets were generated in an accelerated manner, each consisting of 1000 observations. It is assumed that the capacity of the computing tool allows us to analyse up to 1000 observations only. A suitable statistical model, here linear regression, is fitted to the first data set. The parameters of the model are stored, the next data set is analysed, and so on. After 100 such data sets, a suitable stochastic model is fitted to the stored parameters. The fitted stochastic model allows future values of the parameters to be forecast, which helps in anticipating the future trend of the data. Basic statistics such as mean, median, variance and mode are also calculated for the whole data set. The flowchart is given below.
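The chunk-by-chunk fitting step described above can be sketched as follows. This is a minimal illustration in Python, not the program used in the report: the function names `fit_line`, `fit_chunks` and `demo_chunks` are hypothetical, and the demo data source and its parameter values are made up purely so that the example runs.

```python
import random

def fit_line(xs, ys):
    """Ordinary least squares for y = alpha + beta * x; returns (alpha, beta)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    beta = sxy / sxx
    alpha = mean_y - beta * mean_x
    return alpha, beta

def fit_chunks(chunks):
    """Fit one regression per chunk and keep only the fitted parameters."""
    params = []
    for xs, ys in chunks:
        # Only one chunk is held in memory at a time; it is discarded after fitting.
        params.append(fit_line(xs, ys))
    return params

def demo_chunks(n_chunks=5, chunk_size=1000, seed=0):
    """Hypothetical data source (illustrative values): yields (xs, ys) per chunk."""
    rng = random.Random(seed)
    for i in range(n_chunks):
        alpha, beta = 1.0 + i, 0.5
        xs = [rng.gauss(10, 2.5) for _ in range(chunk_size)]
        ys = [alpha + beta * x + rng.gauss(0, 0.1) for x in xs]
        yield xs, ys

params = fit_chunks(demo_chunks())
```

The list `params` holds one (alpha, beta) pair per chunk, which is the sequence a stochastic model would later be fitted to.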

Case 2: ATM Data

We applied the technique to real data from ATMs chosen at random. The whole data is considered as big data. As the whole data cannot be realized at one time, we divide it into smaller data sets, each of the same size. A suitable statistical model is fitted to each such data set and the parameters are stored. A suitable stochastic model is then fitted to the stored parameters. This stochastic model helps in forecasting as well as in analysing the data within the storage capacity of the computing tool. Basic statistics such as mean, median, variance and mode are also calculated for the whole data set. The flowchart is given below.
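The report does not say how the mean and variance of the whole data set are assembled from the parts. One possible way, shown here only as an assumption, is to keep a small summary per data set and merge the summaries with the standard pairwise update for means and sums of squared deviations. The function names are hypothetical.

```python
def chunk_summary(values):
    """Return (count, mean, M2), where M2 is the sum of squared deviations from the mean."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values)
    return n, mean, m2

def merge(a, b):
    """Combine two summaries without revisiting the underlying observations."""
    n_a, mean_a, m2_a = a
    n_b, mean_b, m2_b = b
    n = n_a + n_b
    delta = mean_b - mean_a
    mean = mean_a + delta * n_b / n
    m2 = m2_a + m2_b + delta * delta * n_a * n_b / n
    return n, mean, m2

def overall_mean_variance(chunks):
    """Mean and sample variance of all observations, processed one chunk at a time."""
    iterator = iter(chunks)
    total = chunk_summary(next(iterator))
    for chunk in iterator:
        total = merge(total, chunk_summary(chunk))
    n, mean, m2 = total
    return mean, m2 / (n - 1)
```

For example, `overall_mean_variance([[1, 2, 3], [4, 5, 6, 7]])` gives the same mean and sample variance as computing them on the seven values at once, while only one data set needs to be in memory at any time.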

NUMERICAL ILLUSTRATION

Simulation: The data are generated under controlled parameter values. Initial values of alpha and beta are provided by the user. These values are used to generate the first data set, with alpha acting as the intercept and beta as the slope. A suitable regression model is fitted to the data set so obtained and its parameters are stored. The next pair of alpha and beta is generated using the equations given in the algorithm below. Every subsequent pair is generated in the same way (the course of alpha and beta is shown in the diagram below), and every data set is obtained using the corresponding pair as parameter values. Each data set is again modelled with a suitable regression model and the parameters are stored. Over these stored parameters we have fitted a stochastic model as well as a regression model. The parameters of the regression models are shown as betahat1 and betahat2. The stochastic model is also shown.

We have also computed a few basic statistical measures of the whole data, e.g. mean, variance, median and mode. The median and mode are computed by partitioning the data into classes and measuring the class frequencies. The smaller the bin size, the greater the precision in measuring the median and mode; a smaller bin size, however, also increases the computation.

Algorithm:

Generation of 100 sets (i refers to the set number):
alpha(1,i) = k1*alpha(1,i-1) + norm1(μ1,σ1)
beta(1,i) = k2*beta(1,i-1) + norm2(μ2,σ2)

Within every i-th set, 1000 numbers are generated. The functions used are:
y1(i,j) = norm3(μ3,σ3)
y2(i,j) = alpha(1,i) + beta(1,i)*y1(i,j) + norm4(μ4,σ4)

Constants used:
alpha(1,1) = 1.32; beta(1,1) = 0.213
k1 = 0.912; k2 = 0.9631
norm1(μ1,σ1) = normal(0,2.5)
norm2(μ2,σ2) = normal(0,1.5)
norm3(μ3,σ3) = normal(10,2.5)
norm4(μ4,σ4) = normal(0,2.5)
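The generation algorithm above can be written out as a short Python sketch. It assumes that the second argument of normal(μ,σ) is a standard deviation, as the σ notation suggests, and that the recursion for alpha and beta starts from the stated initial values. The function name `simulate` and the `seed` argument are illustrative additions, not part of the report.

```python
import random

def simulate(n_sets=100, set_size=1000, seed=0):
    """Yield (alpha, beta, y1, y2) for each data set, one set at a time."""
    rng = random.Random(seed)
    k1, k2 = 0.912, 0.9631       # autoregressive coefficients from the report
    alpha, beta = 1.32, 0.213    # initial values alpha(1,1) and beta(1,1)
    for i in range(n_sets):
        if i > 0:
            # Zero-mean autoregressive update of the parameters.
            alpha = k1 * alpha + rng.gauss(0, 2.5)
            beta = k2 * beta + rng.gauss(0, 1.5)
        # y1 is the regressor; y2 is the response with alpha as intercept, beta as slope.
        y1 = [rng.gauss(10, 2.5) for _ in range(set_size)]
        y2 = [alpha + beta * x + rng.gauss(0, 2.5) for x in y1]
        yield alpha, beta, y1, y2
```

Because `simulate` is a generator, each data set of 1000 observations can be fitted and then dropped before the next one is produced, which mirrors the assumption that only one data set fits in the computing tool at a time.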

[Figure: the course of alpha and beta, shown for a bin size of 10.]

The parametric values of the models are:

Betahat1 = [ ]
stats1 = [ ]
(In order: the R2 statistic, the F statistic and an estimate of the error variance.)

Betahat2 = [ ,2.3588]
stats2 = [ ]

Stochastic model: A(q)y(t) = C(q)e(t)
On the alphas: A(q) = q^-1, C(q) = q^-1
On the betas: A(q) = q^-1, C(q) = q^-1

With the help of the stochastic models on the alphas and betas, the whole data can be analysed. The basic statistics for the simulated data are shown below.

Real_median  Median  Mean  Mean  Variance  Mode  Mode frequency  Bin size
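The binned computation of the median and mode described in the numerical illustration can be sketched as follows. Class frequencies are accumulated one data set at a time, so the raw observations never need to be held together. Reporting the mode as the midpoint of the most frequent class, and interpolating the median linearly within its class (the usual grouped-data formula), are assumptions; the report does not state which conventions it used. All function names are hypothetical.

```python
from collections import Counter

def update_counts(counts, values, bin_size):
    """Add one data set's observations to the running class frequencies."""
    for v in values:
        counts[int(v // bin_size)] += 1   # floor division also handles negative values
    return counts

def binned_mode(counts, bin_size):
    """Return (midpoint of the most frequent class, its frequency)."""
    index, freq = max(counts.items(), key=lambda item: item[1])
    return (index + 0.5) * bin_size, freq

def binned_median(counts, bin_size):
    """Grouped-data median: interpolate inside the class containing the middle rank."""
    half = sum(counts.values()) / 2
    cumulative = 0
    for index in sorted(counts):
        if cumulative + counts[index] >= half:
            lower = index * bin_size
            return lower + (half - cumulative) / counts[index] * bin_size
        cumulative += counts[index]

counts = Counter()
for data_set in ([1, 2, 3, 11], [12, 13, 14, 21]):
    update_counts(counts, data_set, bin_size=10)
```

With a bin size of 10 these eight values give a binned median of 12.5 against a true median of 11.5, which illustrates the precision lost to a coarse bin size.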

EMPIRICAL EXAMPLE

Real Data, Method 1: The data are divided randomly into 30 sets, each of the same size. A suitable regression model is fitted to each set, and the parameters (alphas and betas) so obtained are stored. The course of the parameters is shown below. A suitable stochastic model is fitted to the alphas and betas; the model is shown below. The basic statistics for the whole data are also obtained, following the same procedure as in the simulated data section.

[Figure: the course of alpha and beta.]

The parametric values of the models are:

Stochastic model: A(q)y(t) = C(q)e(t)
On the alphas: A(q) = q^-1, C(q) = 1 + q^-1
On the betas: A(q) = q^-1, C(q) = q^-1

The basic statistics for the real data (method 1) are shown below.

Real_median  Median  Mean  Mean  Variance  Mode  Mode frequency  Bin size

Note: The data are divided by 10^-3 and the results are obtained by the first method.
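To illustrate how a model fitted to the stored parameters can be used for forecasting, here is a minimal sketch that swaps in a plain zero-mean AR(1) model, y(t) = phi*y(t-1) + e(t), for the A(q)y(t) = C(q)e(t) models reported above, where q^-1 denotes the backward shift operator. This is a simplification chosen for illustration, not the model structure or estimation routine used in the report, and the function names are hypothetical.

```python
def fit_ar1(series):
    """Least-squares estimate of phi in y[t] = phi * y[t-1] + e[t] (zero mean)."""
    numerator = sum(series[t] * series[t - 1] for t in range(1, len(series)))
    denominator = sum(series[t - 1] ** 2 for t in range(1, len(series)))
    return numerator / denominator

def forecast(series, phi, steps=1):
    """Iterate the fitted recursion forward from the last stored value."""
    predictions = []
    last = series[-1]
    for _ in range(steps):
        last = phi * last
        predictions.append(last)
    return predictions
```

Applied to the stored alphas (or betas), `fit_ar1` yields one coefficient and `forecast` extends the parameter path. A forecast intercept and slope then describe the expected regression line for a data set that has not yet arrived.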

Method 2: The data are divided ATM-wise, giving a total of 19 sets of different sizes. The same technique as discussed above is applied; the only difference is that the data sets are of different sizes.

[Figure: the course of alpha and beta.]

The parametric values of the models are:

Stochastic model: A(q)y(t) = C(q)e(t)
On the alphas: A(q) = q^-1, C(q) = 1 - q^-1; loss function 1.119e+009 and FPE e+009
On the betas: A(q) = q^-1, C(q) = q^-1; loss function e+006 and FPE e+006

The basic statistics for the real data (method 2) are shown below.

Real_median  Median  Mean  Mean  Variance  Mode  Mode frequency  Bin size

CONCLUSION

The size of big data is countably infinite, and this nature can be handled by analysing the data until we have sufficient information about the parameters. The parameters are realized by proper modelling of the parameter estimates. We have used a sample program in this study; it can be extended to larger data sizes as far as the available computing tools allow. The main advantage of this approach is that the parameters of big data are obtained with a certain level of confidence using simple computing machines. The technique is successful in modelling the real data and in computing a few of its important basic statistical measures. The simulation part is not fully completed and the study is still in progress.

Analysis of Bayesian Dynamic Linear Models

Analysis of Bayesian Dynamic Linear Models Analysis of Bayesian Dynamic Linear Models Emily M. Casleton December 17, 2010 1 Introduction The main purpose of this project is to explore the Bayesian analysis of Dynamic Linear Models (DLMs). The main

More information

Paper No 19. FINALTERM EXAMINATION Fall 2009 MTH302- Business Mathematics & Statistics (Session - 2) Ref No: Time: 120 min Marks: 80

Paper No 19. FINALTERM EXAMINATION Fall 2009 MTH302- Business Mathematics & Statistics (Session - 2) Ref No: Time: 120 min Marks: 80 Paper No 19 FINALTERM EXAMINATION Fall 2009 MTH302- Business Mathematics & Statistics (Session - 2) Ref No: Time: 120 min Marks: 80 Question No: 1 ( Marks: 1 ) - Please choose one Scatterplots are used

More information

Institute of Actuaries of India Subject CT3 Probability and Mathematical Statistics

Institute of Actuaries of India Subject CT3 Probability and Mathematical Statistics Institute of Actuaries of India Subject CT3 Probability and Mathematical Statistics For 2015 Examinations Aim The aim of the Probability and Mathematical Statistics subject is to provide a grounding in

More information

Big Data in Finance. Alexander Grigoriev. School of Business and Economics Sharing Success

Big Data in Finance. Alexander Grigoriev. School of Business and Economics Sharing Success Big Data in Finance Alexander Grigoriev Definitions Wiki: Big Data Gartner s 3V-definition [2012]: Big data is high volume, high velocity, and/or high variety information assets that require new forms

More information

Week TSX Index 1 8480 2 8470 3 8475 4 8510 5 8500 6 8480

Week TSX Index 1 8480 2 8470 3 8475 4 8510 5 8500 6 8480 1) The S & P/TSX Composite Index is based on common stock prices of a group of Canadian stocks. The weekly close level of the TSX for 6 weeks are shown: Week TSX Index 1 8480 2 8470 3 8475 4 8510 5 8500

More information

Curriculum Map Statistics and Probability Honors (348) Saugus High School Saugus Public Schools 2009-2010

Curriculum Map Statistics and Probability Honors (348) Saugus High School Saugus Public Schools 2009-2010 Curriculum Map Statistics and Probability Honors (348) Saugus High School Saugus Public Schools 2009-2010 Week 1 Week 2 14.0 Students organize and describe distributions of data by using a number of different

More information

Course Supply Chain Management: Inventory Management. Inventories cost money: Reasons for inventory. Types of inventory

Course Supply Chain Management: Inventory Management. Inventories cost money: Reasons for inventory. Types of inventory Inventories cost money: Inventories are to be avoided at all cost? Course Supply Chain Management: Or Inventory Management Inventories can be useful? Chapter 10 Marjan van den Akker What are reasons for

More information

A Divided Regression Analysis for Big Data

A Divided Regression Analysis for Big Data Vol., No. (0), pp. - http://dx.doi.org/0./ijseia.0...0 A Divided Regression Analysis for Big Data Sunghae Jun, Seung-Joo Lee and Jea-Bok Ryu Department of Statistics, Cheongju University, 0-, Korea shjun@cju.ac.kr,

More information

Big Data Analytics. An Introduction. Oliver Fuchsberger University of Paderborn 2014

Big Data Analytics. An Introduction. Oliver Fuchsberger University of Paderborn 2014 Big Data Analytics An Introduction Oliver Fuchsberger University of Paderborn 2014 Table of Contents I. Introduction & Motivation What is Big Data Analytics? Why is it so important? II. Techniques & Solutions

More information

Reduced echelon form: Add the following conditions to conditions 1, 2, and 3 above:

Reduced echelon form: Add the following conditions to conditions 1, 2, and 3 above: Section 1.2: Row Reduction and Echelon Forms Echelon form (or row echelon form): 1. All nonzero rows are above any rows of all zeros. 2. Each leading entry (i.e. left most nonzero entry) of a row is in

More information

Statistics 215b 11/20/03 D.R. Brillinger. A field in search of a definition a vague concept

Statistics 215b 11/20/03 D.R. Brillinger. A field in search of a definition a vague concept Statistics 215b 11/20/03 D.R. Brillinger Data mining A field in search of a definition a vague concept D. Hand, H. Mannila and P. Smyth (2001). Principles of Data Mining. MIT Press, Cambridge. Some definitions/descriptions

More information

Factors affecting online sales

Factors affecting online sales Factors affecting online sales Table of contents Summary... 1 Research questions... 1 The dataset... 2 Descriptive statistics: The exploratory stage... 3 Confidence intervals... 4 Hypothesis tests... 4

More information

Online Tuning of Artificial Neural Networks for Induction Motor Control

Online Tuning of Artificial Neural Networks for Induction Motor Control Online Tuning of Artificial Neural Networks for Induction Motor Control A THESIS Submitted by RAMA KRISHNA MAYIRI (M060156EE) In partial fulfillment of the requirements for the award of the Degree of MASTER

More information

1. What is the critical value for this 95% confidence interval? CV = z.025 = invnorm(0.025) = 1.96

1. What is the critical value for this 95% confidence interval? CV = z.025 = invnorm(0.025) = 1.96 1 Final Review 2 Review 2.1 CI 1-propZint Scenario 1 A TV manufacturer claims in its warranty brochure that in the past not more than 10 percent of its TV sets needed any repair during the first two years

More information

Alabama Department of Postsecondary Education

Alabama Department of Postsecondary Education Date Adopted 1998 Dates reviewed 2007, 2011, 2013 Dates revised 2004, 2008, 2011, 2013, 2015 Alabama Department of Postsecondary Education Representing Alabama s Public Two-Year College System Jefferson

More information

Financial Trading System using Combination of Textual and Numerical Data

Financial Trading System using Combination of Textual and Numerical Data Financial Trading System using Combination of Textual and Numerical Data Shital N. Dange Computer Science Department, Walchand Institute of Rajesh V. Argiddi Assistant Prof. Computer Science Department,

More information

2. Simple Linear Regression

2. Simple Linear Regression Research methods - II 3 2. Simple Linear Regression Simple linear regression is a technique in parametric statistics that is commonly used for analyzing mean response of a variable Y which changes according

More information

Introduction to Engineering System Dynamics

Introduction to Engineering System Dynamics CHAPTER 0 Introduction to Engineering System Dynamics 0.1 INTRODUCTION The objective of an engineering analysis of a dynamic system is prediction of its behaviour or performance. Real dynamic systems are

More information

A Robust Method for Solving Transcendental Equations

A Robust Method for Solving Transcendental Equations www.ijcsi.org 413 A Robust Method for Solving Transcendental Equations Md. Golam Moazzam, Amita Chakraborty and Md. Al-Amin Bhuiyan Department of Computer Science and Engineering, Jahangirnagar University,

More information

Fairfield Public Schools

Fairfield Public Schools Mathematics Fairfield Public Schools AP Statistics AP Statistics BOE Approved 04/08/2014 1 AP STATISTICS Critical Areas of Focus AP Statistics is a rigorous course that offers advanced students an opportunity

More information

South Carolina College- and Career-Ready (SCCCR) Probability and Statistics

South Carolina College- and Career-Ready (SCCCR) Probability and Statistics South Carolina College- and Career-Ready (SCCCR) Probability and Statistics South Carolina College- and Career-Ready Mathematical Process Standards The South Carolina College- and Career-Ready (SCCCR)

More information

CoolaData Predictive Analytics

CoolaData Predictive Analytics CoolaData Predictive Analytics 9 3 6 About CoolaData CoolaData empowers online companies to become proactive and predictive without having to develop, store, manage or monitor data themselves. It is an

More information

MASTER OF MANAGEMENT (2011-2013)

MASTER OF MANAGEMENT (2011-2013) MASTER OF MANAGEMENT (2011-2013) 1. Course Title: 2. Distinctive Focus: 3. Eligibility: 4. Mode of Selection: 5. No. of seats: 6. Duration: 7. Objectives: 8. Course Structure: 9. Course Details: 10. Summer

More information

Chapter 13 Introduction to Linear Regression and Correlation Analysis

Chapter 13 Introduction to Linear Regression and Correlation Analysis Chapter 3 Student Lecture Notes 3- Chapter 3 Introduction to Linear Regression and Correlation Analsis Fall 2006 Fundamentals of Business Statistics Chapter Goals To understand the methods for displaing

More information

Part 2: Analysis of Relationship Between Two Variables

Part 2: Analysis of Relationship Between Two Variables Part 2: Analysis of Relationship Between Two Variables Linear Regression Linear correlation Significance Tests Multiple regression Linear Regression Y = a X + b Dependent Variable Independent Variable

More information

Machine Learning in Statistical Arbitrage

Machine Learning in Statistical Arbitrage Machine Learning in Statistical Arbitrage Xing Fu, Avinash Patra December 11, 2009 Abstract We apply machine learning methods to obtain an index arbitrage strategy. In particular, we employ linear regression

More information

Teaching Business Statistics through Problem Solving

Teaching Business Statistics through Problem Solving Teaching Business Statistics through Problem Solving David M. Levine, Baruch College, CUNY with David F. Stephan, Two Bridges Instructional Technology CONTACT: davidlevine@davidlevinestatistics.com Typical

More information

Chapter 25 Cost-Volume-Profit Analysis Questions

Chapter 25 Cost-Volume-Profit Analysis Questions Chapter 25 Cost-Volume-Profit Analysis Questions 1. Cost-volume-profit analysis is used to accomplish the first step in the planning phase for a business, which involves predicting the volume of activity,

More information

17. SIMPLE LINEAR REGRESSION II

17. SIMPLE LINEAR REGRESSION II 17. SIMPLE LINEAR REGRESSION II The Model In linear regression analysis, we assume that the relationship between X and Y is linear. This does not mean, however, that Y can be perfectly predicted from X.

More information

5. Linear Regression

5. Linear Regression 5. Linear Regression Outline.................................................................... 2 Simple linear regression 3 Linear model............................................................. 4

More information

Industrial and Systems Engineering Master of Science Program Data Analytics and Optimization

Industrial and Systems Engineering Master of Science Program Data Analytics and Optimization Industrial and Systems Engineering Master of Science Program Data Analytics and Optimization Department of Integrated Systems Engineering The Ohio State University (Expected Duration: Semesters) Our society

More information

Session 9 Case 3: Utilizing Available Software Statistical Analysis

Session 9 Case 3: Utilizing Available Software Statistical Analysis Session 9 Case 3: Utilizing Available Software Statistical Analysis Michelle Phillips Economist, PURC michelle.phillips@warrington.ufl.edu With material from Ted Kury Session Overview With Data from Cases

More information

CUSTOMER EDUCATION ON MOBILE BANKING

CUSTOMER EDUCATION ON MOBILE BANKING CUSTOMER EDUCATION ON MOBILE BANKING Project Trainee: Purushottam Vishnu Bhandare MBA-Banking Technology Pondicherry University Guide: Dr. V. N. Sastry Professor IDRBT, Hyderabad Institute of Development

More information

Statistics for BIG data

Statistics for BIG data Statistics for BIG data Statistics for Big Data: Are Statisticians Ready? Dennis Lin Department of Statistics The Pennsylvania State University John Jordan and Dennis K.J. Lin (ICSA-Bulletine 2014) Before

More information

business statistics using Excel OXFORD UNIVERSITY PRESS Glyn Davis & Branko Pecar

business statistics using Excel OXFORD UNIVERSITY PRESS Glyn Davis & Branko Pecar business statistics using Excel Glyn Davis & Branko Pecar OXFORD UNIVERSITY PRESS Detailed contents Introduction to Microsoft Excel 2003 Overview Learning Objectives 1.1 Introduction to Microsoft Excel

More information

Section Format Day Begin End Building Rm# Instructor. 001 Lecture Tue 6:45 PM 8:40 PM Silver 401 Ballerini

Section Format Day Begin End Building Rm# Instructor. 001 Lecture Tue 6:45 PM 8:40 PM Silver 401 Ballerini NEW YORK UNIVERSITY ROBERT F. WAGNER GRADUATE SCHOOL OF PUBLIC SERVICE Course Syllabus Spring 2016 Statistical Methods for Public, Nonprofit, and Health Management Section Format Day Begin End Building

More information

A Fuel Cost Comparison of Electric and Gas-Powered Vehicles

A Fuel Cost Comparison of Electric and Gas-Powered Vehicles $ / gl $ / kwh A Fuel Cost Comparison of Electric and Gas-Powered Vehicles Lawrence V. Fulton, McCoy College of Business Administration, Texas State University, lf25@txstate.edu Nathaniel D. Bastian, University

More information

Time series Forecasting using Holt-Winters Exponential Smoothing

Time series Forecasting using Holt-Winters Exponential Smoothing Time series Forecasting using Holt-Winters Exponential Smoothing Prajakta S. Kalekar(04329008) Kanwal Rekhi School of Information Technology Under the guidance of Prof. Bernard December 6, 2004 Abstract

More information

Predictive Analytics

Predictive Analytics Predictive Analytics How many of you used predictive today? 2015 SAP SE. All rights reserved. 2 2015 SAP SE. All rights reserved. 3 How can you apply predictive to your business? Predictive Analytics is

More information

The following are the measurable objectives for graduated computer science students (ABET Standards):

The following are the measurable objectives for graduated computer science students (ABET Standards): Computer Science A Bachelor of Science degree (B.S.) in Computer Science prepares students for careers in virtually any industry or to continue on with graduate study in Computer Science and many other

More information

On Parametric Model Estimation

On Parametric Model Estimation Proceedings of the 11th WSEAS International Conference on COMPUTERS, Agios Nikolaos, Crete Island, Greece, July 26-28, 2007 608 On Parametric Model Estimation LUMINITA GIURGIU, MIRCEA POPA Technical Sciences

More information

BIG DATA IN HEALTHCARE THE NEXT FRONTIER

BIG DATA IN HEALTHCARE THE NEXT FRONTIER BIG DATA IN HEALTHCARE THE NEXT FRONTIER Divyaa Krishna Sonnad 1, Dr. Jharna Majumdar 2 2 Dean R&D, Prof. and Head, 1,2 Dept of CSE (PG), Nitte Meenakshi Institute of Technology Abstract: The world of

More information

Big Data Challenges. technology basics for data scientists. Spring - 2014. Jordi Torres, UPC - BSC www.jorditorres.

Big Data Challenges. technology basics for data scientists. Spring - 2014. Jordi Torres, UPC - BSC www.jorditorres. Big Data Challenges technology basics for data scientists Spring - 2014 Jordi Torres, UPC - BSC www.jorditorres.eu @JordiTorresBCN Data Deluge: Due to the changes in big data generation Example: Biomedicine

More information

Predictive Modeling Techniques in Insurance

Predictive Modeling Techniques in Insurance Predictive Modeling Techniques in Insurance Tuesday May 5, 2015 JF. Breton Application Engineer 2014 The MathWorks, Inc. 1 Opening Presenter: JF. Breton: 13 years of experience in predictive analytics

More information

School of Mathematics

School of Mathematics School of Mathematics Programmes Operational Masters Research and Programmes Applied Statistics Operational Research, Research Applied Statistics and Applied and Financial Statistics Risk Data Science

More information

Gamma Distribution Fitting

Gamma Distribution Fitting Chapter 552 Gamma Distribution Fitting Introduction This module fits the gamma probability distributions to a complete or censored set of individual or grouped data values. It outputs various statistics

More information

Geostatistics Exploratory Analysis

Geostatistics Exploratory Analysis Instituto Superior de Estatística e Gestão de Informação Universidade Nova de Lisboa Master of Science in Geospatial Technologies Geostatistics Exploratory Analysis Carlos Alberto Felgueiras cfelgueiras@isegi.unl.pt

More information

How To Run Statistical Tests in Excel

How To Run Statistical Tests in Excel How To Run Statistical Tests in Excel Microsoft Excel is your best tool for storing and manipulating data, calculating basic descriptive statistics such as means and standard deviations, and conducting

More information

ANALYTICS IN BIG DATA ERA

ANALYTICS IN BIG DATA ERA ANALYTICS IN BIG DATA ERA ANALYTICS TECHNOLOGY AND ARCHITECTURE TO MANAGE VELOCITY AND VARIETY, DISCOVER RELATIONSHIPS AND CLASSIFY HUGE AMOUNT OF DATA MAURIZIO SALUSTI SAS Copyr i g ht 2012, SAS Ins titut

More information

Operations Research and Financial Engineering. Courses

Operations Research and Financial Engineering. Courses Operations Research and Financial Engineering Courses ORF 504/FIN 504 Financial Econometrics Professor Jianqing Fan This course covers econometric and statistical methods as applied to finance. Topics

More information

Practical Calculation of Expected and Unexpected Losses in Operational Risk by Simulation Methods

Practical Calculation of Expected and Unexpected Losses in Operational Risk by Simulation Methods Practical Calculation of Expected and Unexpected Losses in Operational Risk by Simulation Methods Enrique Navarrete 1 Abstract: This paper surveys the main difficulties involved with the quantitative measurement

More information

APPLICATION OF LINEAR REGRESSION MODEL FOR POISSON DISTRIBUTION IN FORECASTING

APPLICATION OF LINEAR REGRESSION MODEL FOR POISSON DISTRIBUTION IN FORECASTING APPLICATION OF LINEAR REGRESSION MODEL FOR POISSON DISTRIBUTION IN FORECASTING Sulaimon Mutiu O. Department of Statistics & Mathematics Moshood Abiola Polytechnic, Abeokuta, Ogun State, Nigeria. Abstract

More information

Georgia Department of Education Kathy Cox, State Superintendent of Schools 7/19/2005 All Rights Reserved 1

Georgia Department of Education Kathy Cox, State Superintendent of Schools 7/19/2005 All Rights Reserved 1 Accelerated Mathematics 3 This is a course in precalculus and statistics, designed to prepare students to take AB or BC Advanced Placement Calculus. It includes rational, circular trigonometric, and inverse

More information
