How To Write A Project Report On Statistical Analysis Of Big Data Sets
- Albert Simmons
- 3 years ago
Statistical Analysis of Big Data Sets

Seemant Ujjain
Statistics and Informatics, Department of Mathematics
Indian Institute of Technology (IIT), Kharagpur

Project guide: Dr. Jitendra Kumar, Assistant Professor
Institute of Development and Research in Banking Technology (IDRBT)
Road No. 1, Castle Hills, Masab Tank, Hyderabad

July 5,
CONTENTS

Certificate
Declaration
Abstract
1. Introduction
Statistical analysis and big data
Methodology
Numerical Illustration
Empirical Example
Conclusion
CERTIFICATE

This is to certify that the project report titled Statistical Analysis of Big Data, submitted by Seemant Ujjain of Integrated M.Sc. 5th year, Dept. of Mathematics, IIT Kharagpur, is a record of bona fide work carried out by him under my guidance during the period 8th May 2012 to 6th July 2012 at the Institute of Development and Research in Banking Technology, Hyderabad. The project work is a research study, which has been successfully completed as per the set objectives.

Dr. Jitendra Kumar
Assistant Professor
IDRBT, Hyderabad
DECLARATION

I declare that the summer internship project report titled Statistical Analysis of Big Data is my own work, conducted under the supervision of Dr. Jitendra Kumar at the Institute of Development and Research in Banking Technology, Hyderabad. I have put in 60 days of attendance with my supervisor at IDRBT and have been awarded a project fellowship. I further declare that, to the best of my knowledge, the report does not contain any part of any work which has been submitted for the award of any degree, either in this institute or in any other institute, without proper citation.

Seemant Ujjain
Int. M.Sc. 5th year
Dept. of Mathematics
IIT Kharagpur
ABSTRACT

Big data are generated by multiple known and unknown sources, cannot be held by ordinary storage tools, and grow continuously. These features create hurdles for statisticians and data scientists, because conventional analysis requires a fixed data set and computing tools struggle with the volume. The present project deals with the statistical analysis of data that grow continuously at some velocity. We manage the velocity using the concept of a realization of a time series: the accelerating nature of the data is controlled by realizing the parametric values and fitting a suitable model to them. Modelling the realized parameters gives a better estimate and takes less time than analysing the voluminous data directly. A simulation study is carried out, and the same approach is applied to the analysis of daily ATM withdrawals.
INTRODUCTION

Big data often represent multiple, non-random samples of an unknown population whose composition shifts within the short term. Big data are an outgrowth of today's digital environment, which generates data flowing continuously from all directions at unprecedented speed and volume, and which almost always require cleansing. Big data is a term applied to data sets whose size is beyond the ability of commonly used software tools to capture, manage, and process within a tolerable elapsed time. Difficulties include capture, storage, search, sharing, analysis, and visualization. Big data is a term coined by IT professionals and market researchers, but their discussion centres on huge analytic and storage capacity, such as terabyte- or petabyte-scale solutions. This study aims instead at the analysis of accelerated data sets.

The vast amount of information stored by most MNCs or government agencies typically falls into the category of big data. Any technique that helps extract valuable information from such a large body of data can become a valuable tool.
STATISTICAL ANALYSIS AND BIG DATA

This project is an attempt to handle such a situation when the data are numeric. The first assumption is that the whole data cannot be stored at one time, so several data sets are considered or generated, and the storage capacity of the system is assumed to equal the size of one such data set. Many such data sets together are treated as the whole big data. The second assumption concerns the nature of the data: they are numeric and random, and their distribution is unknown. In this project we also measure basic statistical properties of big data.

The approach is first justified on data simulated under controlled parameter values and multiple conditions, and then extended to a real data set: the daily withdrawals from several ATMs of a bank. The ATM data are processed one by one in the same way as the simulated data and compiled into an analysis of the whole and the partitioned data, and basic statistical measures such as the mean, median, variance and mode are obtained for the big data.

METHODOLOGY

The present study aims to analyse big data that continuously add information. For this we partition the data into groups, for example by time or by size. Each partition is treated as a realization of a huge source of information with respect to some specific basis. We justify the approach first by simulation and then apply it to the ATM withdrawals of a bank.
Case 1: Simulated Data

In the simulation study we generate 1,00,000 (100,000) observations, partitioned into 100 groups whose parameters follow an autoregressive model with zero mean. The 100 data sets are generated in an accelerated manner, each consisting of 1000 observations, and it is assumed that the computing tool can analyse at most 1000 observations at a time. A suitable statistical model, here linear regression, is fitted to the first data set and its parameters are stored; then the next data set is analysed, and so on. After 100 such data sets, a suitable stochastic model is fitted to the stored parameters. This stochastic model allows the parameters to be forecast, which helps in anticipating the future trend of the data. Basic statistics such as the mean, median, variance and mode are also calculated for the whole data set.

[Figure: flowchart of the procedure for simulated data]
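The chunk-by-chunk procedure described above can be sketched in a few lines. This is a minimal illustration rather than the report's own program: the function names `fit_line` and `process_stream` are hypothetical, the regression is ordinary least squares with one predictor, and the whole-data variance is the population variance computed from running sums (an assumption; the report does not say which estimator it used).

```python
def fit_line(x, y):
    """Ordinary least squares for y = alpha + beta * x; returns (alpha, beta)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    beta = sxy / sxx
    alpha = mean_y - beta * mean_x
    return alpha, beta


def process_stream(chunks):
    """Fit one regression per chunk and keep only the fitted parameters,
    plus running sums that give the mean and variance of all responses."""
    params = []      # one (alpha, beta) pair per chunk
    n = 0            # number of responses seen so far
    total = 0.0      # running sum of responses
    total_sq = 0.0   # running sum of squared responses
    for x, y in chunks:
        params.append(fit_line(x, y))
        n += len(y)
        total += sum(y)
        total_sq += sum(v * v for v in y)
    mean = total / n
    variance = total_sq / n - mean ** 2  # population variance (assumption)
    return params, mean, variance


# Two tiny chunks stand in for the 100 data sets of 1000 observations.
demo_chunks = [([0, 1, 2, 3], [2, 5, 8, 11]), ([0, 1, 2, 3], [1, 2, 3, 4])]
demo_params, demo_mean, demo_variance = process_stream(demo_chunks)
```

Only the parameter pairs and three running totals are retained between chunks, which is what lets the analysis stay within the assumed capacity of one data set at a time. The sum-of-squares variance formula is simple but can lose precision on very large values.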
Case 2: ATM Data

We applied the technique to real ATM withdrawal data, with the whole data treated as big data. As the whole data cannot be realized at one time, we divide them into smaller data sets of equal size. A suitable statistical model is fitted to each such data set and its parameters are stored. A suitable stochastic model is then fitted to the stored parameters. This stochastic model helps in forecasting as well as in analysing the data within the storage capacity of the computing tool. Basic statistics such as the mean, median, variance and mode are also calculated for the whole data set.

[Figure: flowchart of the procedure for ATM data]
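To illustrate how a stochastic model over the stored parameters supports forecasting, the sketch below fits a first-order autoregression to a parameter series and projects it forward. Note the substitution: the report fits models of the form A(q)y(t) = C(q)e(t), whereas this example uses a plain AR(1) with intercept, estimated by least squares, because it is short and self-contained. The names `fit_ar1` and `forecast` are hypothetical.

```python
def fit_ar1(series):
    """Least-squares fit of p[t] = c + phi * p[t-1] + e[t]; returns (c, phi)."""
    prev = series[:-1]
    curr = series[1:]
    n = len(prev)
    mean_prev = sum(prev) / n
    mean_curr = sum(curr) / n
    sxx = sum((p - mean_prev) ** 2 for p in prev)
    sxy = sum((p - mean_prev) * (q - mean_curr) for p, q in zip(prev, curr))
    phi = sxy / sxx
    c = mean_curr - phi * mean_prev
    return c, phi


def forecast(series, steps):
    """Iterate the fitted AR(1) recursion forward from the last stored value."""
    c, phi = fit_ar1(series)
    out = []
    last = series[-1]
    for _ in range(steps):
        last = c + phi * last
        out.append(last)
    return out


# A noise-free parameter path, used only to show the calls.
demo_series = [1.0]
for _ in range(20):
    demo_series.append(0.5 + 0.8 * demo_series[-1])
demo_next = forecast(demo_series, steps=3)
```

In the setting of the report, `series` would be the stored alphas (or betas), one value per data set, so the forecast describes where the regression parameters of future data sets are expected to lie.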
NUMERICAL ILLUSTRATION

Simulation: The data are generated under controlled parametric values. Initial values of alpha and beta are provided by the user. These values are used to generate the first data set, with alpha acting as the intercept and beta as the slope. Each subsequent pair of alpha and beta is generated from the previous pair using the equations in the algorithm below, and each data set is generated using the corresponding pair as parametric values. The course of alpha and beta is shown in the diagram below. Every data set so obtained is fitted with a suitable regression model and the parameters are stored. Over these parameters we have fitted a stochastic model as well as a regression model. The parameters of the regression models are shown as betahat1 and betahat2. The stochastic model is also shown.

We have also computed a few basic statistical measures of the whole data, e.g. mean, variance, median and mode. The median and mode are computed by partitioning the data into classes and measuring their class frequencies. The smaller the bin size, the more precisely the median and mode are measured, but a smaller bin size also increases computation.

Algorithm:

Generation of 100 sets (i refers to the set number):
    alpha(1,i) = k1*alpha(1,i-1) + norm1(μ1,σ1)
    beta(1,i) = k2*beta(1,i-1) + norm2(μ2,σ2)

Within every ith set, 1000 numbers are generated (j refers to the observation number):
    Y1(i,j) = norm3(μ3,σ3)
    Y2(i,j) = alpha(1,i) + beta(1,i)*Y1(i,j) + norm4(μ4,σ4)

Constants used:
    alpha(1,1) = 1.32; beta(1,1) = 0.213
    k1 = 0.912; k2 = 0.9631
    norm1(μ1,σ1) = normal(0,2.5)
    norm2(μ2,σ2) = normal(0,1.5)
    norm3(μ3,σ3) = normal(10,2.5)
    norm4(μ4,σ4) = normal(0,2.5)
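The generation algorithm above can be written as a short generator. This is a sketch under stated assumptions: the second argument of each normal(·,·) is read as a standard deviation (the report does not say whether it is a standard deviation or a variance), and the function name `generate_sets`, the `seed` argument and the use of the standard library `random` module are choices made for this example only.

```python
import random

# Constants taken from the algorithm above.
K1, K2 = 0.912, 0.9631
ALPHA_1, BETA_1 = 1.32, 0.213


def generate_sets(n_sets=100, n_obs=1000, seed=0):
    """Yield (alpha, beta, y1, y2) for each data set in turn.

    alpha and beta follow the autoregressive recursions of the algorithm;
    y2 is a noisy linear function of y1 with intercept alpha and slope beta.
    The second argument of each normal is treated as a standard deviation
    (assumption).
    """
    rng = random.Random(seed)
    alpha, beta = ALPHA_1, BETA_1
    for i in range(n_sets):
        if i > 0:
            # Parameters of set i depend on those of set i-1.
            alpha = K1 * alpha + rng.gauss(0.0, 2.5)
            beta = K2 * beta + rng.gauss(0.0, 1.5)
        y1 = [rng.gauss(10.0, 2.5) for _ in range(n_obs)]
        y2 = [alpha + beta * v + rng.gauss(0.0, 2.5) for v in y1]
        yield alpha, beta, y1, y2


# Small run for illustration; the report uses 100 sets of 1000 observations.
demo_sets = list(generate_sets(n_sets=3, n_obs=50, seed=1))
```

Because the function is a generator, only one data set exists in memory at a time, which mirrors the assumption that the computing tool can hold a single set of 1000 observations.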
[Figure: the course of alpha and beta, shown for a bin size of 10]
The parametric values of the models are:

Betahat1 = [ ]
stats1 = [ ] (in order: the R2 statistic, the F statistic and an estimate of the error variance)
Betahat2 = [ , 2.3588]
stats2 = [ ]

Stochastic model: A(q)y(t) = C(q)e(t)
On the alphas: A(q) = q^-1, C(q) = q^-1
On the betas: A(q) = q^-1, C(q) = q^-1

With the help of the stochastic models on the alphas and betas, the whole data can be analysed. The basic statistics for the simulated data are shown below.

[Table: Real_median, Median, Mean, Mean, Variance, Mode, Mode frequency, Bin size]
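The report obtains the median and mode by partitioning the data into classes and using the class frequencies, with the bin size controlling precision. A minimal sketch of that idea follows. The function names are hypothetical, the median uses the standard grouped-data interpolation within the median class, and the mode is reported as the midpoint of the most frequent class; the report does not state which exact formulas it used, so these are assumptions.

```python
import math
from collections import Counter


def update_counts(counts, chunk, bin_size):
    """Add one chunk of observations to the running class frequencies."""
    for v in chunk:
        counts[math.floor(v / bin_size)] += 1


def binned_mode(counts, bin_size):
    """Return (midpoint of the most frequent class, its frequency)."""
    k, freq = max(counts.items(), key=lambda kv: kv[1])
    return (k + 0.5) * bin_size, freq


def binned_median(counts, bin_size):
    """Grouped-data median: interpolate linearly inside the median class."""
    n = sum(counts.values())
    half = n / 2
    cum = 0  # cumulative frequency below the current class
    for k in sorted(counts):
        f = counts[k]
        if cum + f >= half:
            return k * bin_size + (half - cum) / f * bin_size
        cum += f


# Class frequencies are accumulated chunk by chunk; raw data are not kept.
demo_counts = Counter()
update_counts(demo_counts, [1, 2, 3], bin_size=10)
update_counts(demo_counts, [12, 13, 14, 15, 25], bin_size=10)
demo_median = binned_median(demo_counts, bin_size=10)
demo_mode, demo_mode_freq = binned_mode(demo_counts, bin_size=10)
```

Only the frequency table has to be stored, so memory use depends on the number of classes rather than on the number of observations. A smaller `bin_size` gives more classes and a finer estimate, at the cost of a larger table and more work.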
EMPIRICAL EXAMPLE

Real Data:

Method 1: The data have been divided randomly into 30 sets of equal size. A suitable regression model is fitted to each set, and the parameters (alphas and betas) so obtained are stored. The course of the parameters is shown below. A suitable stochastic model is fitted to the alphas and betas; the model is shown below. The basic statistics for the whole data are obtained following the same procedure as in the simulated data section.

[Figure: the course of alpha and beta (method 1)]
The parametric values of the models are:

Stochastic model: A(q)y(t) = C(q)e(t)
On the alphas: A(q) = q^-1, C(q) = 1 + q^-1
On the betas: A(q) = q^-1, C(q) = q^-1

The basic statistics for the real data (method 1) are shown below.

[Table: Real_median, Median, Mean, Mean, Variance, Mode, Mode frequency, Bin size]

Note: In the first method the data are divided by 10^-3 before the results are obtained.
Method 2: The data have been divided ATM-wise, giving a total of 19 sets of different sizes. The same technique as discussed above is applied; the only difference is that the data sets differ in size.

[Figure: the course of alpha and beta (method 2)]
The parametric values of the models are:

Stochastic model: A(q)y(t) = C(q)e(t)
On the alphas: A(q) = q^-1, C(q) = 1 - q^-1
Loss function 1.119e+009 and FPE e+009
On the betas: A(q) = q^-1, C(q) = q^-1
Loss function e+006 and FPE e+006

The basic statistics for the real data (method 2) are shown below.

[Table: Real_median, Median, Mean, Mean, Variance, Mode, Mode frequency, Bin size]
CONCLUSION

The size of big data is countably infinite, and this nature can be handled by analysing the data until we have sufficient information about the parameters. The parameters are realized through proper modelling of the fitted parameter values. We have used a sample program in this study; it can be extended to larger data sizes as far as the available computing tools can manage. The main advantage of this study is that the parameters of big data are obtained with a certain level of confidence and can be analysed on simple computing machines. The technique is successful in modelling the real data and in computing a few of its important basic statistical measures. The simulation part is not fully completed and the study is still in progress.
Application of Predictive Model for Elementary Students with Special Needs in New Era University Jannelle ds. Ligao, Calvin Jon A. Lingat, Kristine Nicole P. Chiu, Cym Quiambao, Laurice Anne A. Iglesia
More informationCRLS Mathematics Department Algebra I Curriculum Map/Pacing Guide
Curriculum Map/Pacing Guide page 1 of 14 Quarter I start (CP & HN) 170 96 Unit 1: Number Sense and Operations 24 11 Totals Always Include 2 blocks for Review & Test Operating with Real Numbers: How are
More informationANALYTICS CENTER LEARNING PROGRAM
Overview of Curriculum ANALYTICS CENTER LEARNING PROGRAM The following courses are offered by Analytics Center as part of its learning program: Course Duration Prerequisites 1- Math and Theory 101 - Fundamentals
More informationSensex Realized Volatility Index
Sensex Realized Volatility Index Introduction: Volatility modelling has traditionally relied on complex econometric procedures in order to accommodate the inherent latent character of volatility. Realized
More informationInformation Management course
Università degli Studi di Milano Master Degree in Computer Science Information Management course Teacher: Alberto Ceselli Lecture 01 : 06/10/2015 Practical informations: Teacher: Alberto Ceselli (alberto.ceselli@unimi.it)
More informationSimple linear regression
Simple linear regression Introduction Simple linear regression is a statistical method for obtaining a formula to predict values of one variable from another where there is a causal relationship between
More informationA Statistical Text Mining Method for Patent Analysis
A Statistical Text Mining Method for Patent Analysis Department of Statistics Cheongju University, shjun@cju.ac.kr Abstract Most text data from diverse document databases are unsuitable for analytical
More informationPHD PROGRAM IN FINANCE COURSE PROGRAMME AND COURSE CONTENTS
PHD PROGRAM IN FINANCE COURSE PROGRAMME AND COURSE CONTENTS I. Semester II. Semester FINC 601 Corporate Finance 8 FINC 602 Asset Pricing 8 FINC 603 Quantitative Methods in Finance 8 FINC 670 Seminar 4
More informationHow To Get A Masters Degree In Logistics And Supply Chain Management
Industrial and Systems Engineering Master of Science Program Logistics and Supply Chain Management Department of Integrated Systems Engineering The Ohio State University Logistics is the science of design,
More informationRolling Advertisement No. 04 / 2014 15 Dated February 27, 2015 INVITATION FOR TOP TALENT IN RESEARCH & ACADEMICS FOR FACULTY POSITIONS
INSTITUTE FOR DEVELOPMENT & RESEARCH IN BANKING TECHNOLOGY (Established by Reserve Bank of India) Castle Hills, Road No. 1, Masab Tank, Hyderabad 500 057 Rolling Advertisement No. 04 / 2014 15 Dated February
More informationCHAPTER 13 SIMPLE LINEAR REGRESSION. Opening Example. Simple Regression. Linear Regression
Opening Example CHAPTER 13 SIMPLE LINEAR REGREION SIMPLE LINEAR REGREION! Simple Regression! Linear Regression Simple Regression Definition A regression model is a mathematical equation that descries the
More informationA Correlation of. to the. South Carolina Data Analysis and Probability Standards
A Correlation of to the South Carolina Data Analysis and Probability Standards INTRODUCTION This document demonstrates how Stats in Your World 2012 meets the indicators of the South Carolina Academic Standards
More informationPart 1 : 07/27/10 21:30:31
Question 1 - CIA 593 III-64 - Forecasting Techniques What coefficient of correlation results from the following data? X Y 1 10 2 8 3 6 4 4 5 2 A. 0 B. 1 C. Cannot be determined from the data given. D.
More informationSome Essential Statistics The Lure of Statistics
Some Essential Statistics The Lure of Statistics Data Mining Techniques, by M.J.A. Berry and G.S Linoff, 2004 Statistics vs. Data Mining..lie, damn lie, and statistics mining data to support preconceived
More informationMaster s Thesis. A Study on Active Queue Management Mechanisms for. Internet Routers: Design, Performance Analysis, and.
Master s Thesis Title A Study on Active Queue Management Mechanisms for Internet Routers: Design, Performance Analysis, and Parameter Tuning Supervisor Prof. Masayuki Murata Author Tomoya Eguchi February
More informationNumerical Algorithms Group. Embedded Analytics. A cure for the common code. www.nag.com. Results Matter. Trust NAG.
Embedded Analytics A cure for the common code www.nag.com Results Matter. Trust NAG. Executive Summary How much information is there in your data? How much is hidden from you, because you don t have access
More informationExploratory Data Analysis
Exploratory Data Analysis 4 March 2009 Research Methods for Empirical Computer Science CMPSCI 691DD Edwin Hubble What did Hubble see? What did Hubble see? Hubble s Law V = H 0 r Where: V = recessional
More informationLCs for Binary Classification
Linear Classifiers A linear classifier is a classifier such that classification is performed by a dot product beteen the to vectors representing the document and the category, respectively. Therefore it
More information2. Linear regression with multiple regressors
2. Linear regression with multiple regressors Aim of this section: Introduction of the multiple regression model OLS estimation in multiple regression Measures-of-fit in multiple regression Assumptions
More informationIntroduction to Statistics and Quantitative Research Methods
Introduction to Statistics and Quantitative Research Methods Purpose of Presentation To aid in the understanding of basic statistics, including terminology, common terms, and common statistical methods.
More informationA New Quantitative Behavioral Model for Financial Prediction
2011 3rd International Conference on Information and Financial Engineering IPEDR vol.12 (2011) (2011) IACSIT Press, Singapore A New Quantitative Behavioral Model for Financial Prediction Thimmaraya Ramesh
More informationData Visualization Techniques
Data Visualization Techniques From Basics to Big Data with SAS Visual Analytics WHITE PAPER SAS White Paper Table of Contents Introduction.... 1 Generating the Best Visualizations for Your Data... 2 The
More information