Emerging Technology Forecasting Using New Patent Information Analysis


Sunghae Jun* and Seung-Joo Lee
Department of Statistics, Cheongju University, Chungbuk, Korea
shjun@cju.ac.kr, access@cju.ac.kr
*Corresponding author: Sunghae Jun (shjun@cju.ac.kr)

Abstract

Emerging technology drives technological development and innovation in diverse fields. Emerging technology forecasting predicts the areas in which emerging technology is likely to appear. Such forecasting is difficult, however, because most technology forecasting tasks depend on the subjective experience of experts. Patent analysis is an objective method for recognizing trends in technological development. Many patent analysis methods have been studied; they apply text mining techniques to the text data of patent documents, such as the title and abstract. This approach has limitations, namely the computing cost and the information loss associated with the preprocessing step of text mining. We therefore propose a new patent information analysis to overcome these problems. Using the International Patent Classification codes from the patent documents of a target technology, we construct an emerging technology forecasting model that combines statistical inference and neural networks. We perform a case study, with nanotechnology as the target technology, to verify how our research can be applied in practice. The results can contribute to R&D planning.

Keywords: Emerging technology, patent information analysis, technology forecasting

1. Introduction

Technology forecasting (TF) anticipates technological trends such as the direction and the rate of technological development [1-3]. R&D policy and new product development can be planned on the basis of TF results. A TF model can also inform a company's strategic decisions concerning technology licensing and patent management [4].
TF is thus an important issue for a company.

Emerging technology drives technological development and innovation in diverse fields [5]. The development of a technology depends on the emergence of related technologies [6, 7], so extracting the relationships between a target technology and its related technologies is important to technological development. In this paper, this extraction is called emerging technology forecasting (ETF). ETF predicts the possible areas of emerging technology. It is difficult, however, because most technology forecasting tasks depend on the subjective experience of experts. Such methods are not stable, and more objective ETF methods are required [8].

Patent analysis is an objective method for recognizing trends in technological development. Many patent analysis methods have been studied [4, 9, 10]; they apply text mining techniques to the text data of patent documents, such as the title and abstract, and they are more quantitative TF approaches than earlier methods such as Delphi [11, 12]. However, most patent analysis approaches have limitations, namely the computing cost and the information loss associated with the preprocessing step of text mining [9, 13]. One way to avoid this problem is to analyze International Patent Classification (IPC) code data [14]. We therefore propose a new patent information analysis method for ETF. Using the IPC codes from the patent documents of a target technology, we construct an ETF model that combines statistical inference and neural networks. We perform a case study, with nanotechnology as the target technology, to verify how our research can be applied in practice.

2. Emerging Technology Forecasting

An emerging technology is a technology that affects and explains other technologies for the coming year [7]. In other words, if the development of a target technology relies on the development of a related technology, then the related technology is an emerging technology for that target. Determining the relationship between a target technology and its emerging technology is very important to ETF. The faster the technology develops, the more important the emerging technology is. In previous research, the emerging technologies for a target technology were determined by Delphi studies [6, 7]. In this paper, we instead determine them with a quantitative and objective approach. We use statistical inference, regression, and neural networks as quantitative methods, and we analyze patent documents as objective data, because patent data are appropriate historical data for ETF [7].

3. New Patent Information Analysis for Emerging Technology Forecasting

In this paper, we construct an (n × p) patent-IPC code matrix (PICM) from the retrieved patent documents using text mining techniques; this matrix is then used for our ETF modeling. Figure 1 shows the PICM structure.

Figure 1. PICM Structure

The rows and columns of this matrix are the patent numbers and the IPC codes, respectively; each element is the number of times a given IPC code occurs in a given patent. Our quantitative analysis is performed on this matrix. First, we determine the dependent and independent variables for the ETF model. The dependent variable is the IPC code representing the target technology. The IPC codes that explain (affect) the target technology are the independent variables. Not every remaining IPC code is a useful independent variable, so we must select the independent variables that significantly explain the target variable. In this paper, we use statistical inference to select these significant IPC codes.

To construct our ETF model, we combine statistical inference and neural network methods. In the statistical inference step, we select the IPC codes to be used in the ETF model. The selected IPC codes are the top-ranked codes and are divided into dependent and independent codes. Each code is a variable in our model. The dependent code is the response variable; this is our ETF target technology. All the remaining codes are candidate independent (predictive) variables: the target technology (dependent code) is explained by the technologies of the predictive variables (independent codes). To select the variables that meaningfully explain the target variable, we test the significance of each independent variable.

For this test we use the probability value (p-value): the probability, computed under the assumption that the null hypothesis ($H_0$) is true (i.e., the variable is not significant), of obtaining a test statistic at least as extreme as the one observed. Formally, the p-value $p(x)$ is a test statistic computed from the sample $x$ that takes a value between 0 and 1 [15]. A small $p(x)$ suggests that the alternative hypothesis is true (i.e., the variable is significant). When the p-value is less than the significance level $\alpha$ (0.05, corresponding to a 95% confidence level), we conclude that the variable is significant. This research applies the p-value to two statistical methods, multiple regression and correlation analysis. Our regression model is as follows:

$$IPC_{Ti} = \beta_0 + \beta_1 IPC_{1i} + \beta_2 IPC_{2i} + \cdots + \beta_k IPC_{ki} + \varepsilon_i$$

where $IPC_{Ti}$ is the IPC code of the target technology in the $i$th patent ($i = 1, 2, \ldots, n$), $\beta_0$ is the intercept, $(\beta_1, \ldots, \beta_k)$ are the slopes [16], and $\varepsilon_i$ is the error term. $\beta_j$ denotes the amount of increase or decrease in the value of $IPC_T$ (target technology, dependent variable) associated with a one-unit increase in $IPC_j$ (explanatory technology, independent variable). In other words, the regression parameter $\beta_j$ describes the relationship between $IPC_T$ and $IPC_j$. To select the necessary IPC codes for the ETF model of $IPC_T$, we test the significance of $\beta_j$ as follows:

$$H_0: \beta_j = 0 \quad \text{vs.} \quad H_1: \beta_j \neq 0$$

The null hypothesis ($H_0$) states that the variable is not significant, and the alternative hypothesis ($H_1$) states that the variable is significant.
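
The slope test above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the frequencies are toy values, the function name `ols_p_values` is hypothetical, and the p-values use a large-sample normal approximation to the t distribution, which the paper does not specify.

```python
from statistics import NormalDist


def transpose(m):
    return [list(col) for col in zip(*m)]


def matmul(a, b):
    bt = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in bt] for row in a]


def invert(m):
    """Invert a small square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(m)
    aug = [list(map(float, row)) + [float(i == j) for j in range(n)]
           for i, row in enumerate(m)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [v - f * w for v, w in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def ols_p_values(y, columns):
    """Fit y = b0 + b1*x1 + ... + bk*xk and test H0: bj = 0 for every coefficient.

    Assumption: two-sided p-values come from the standard normal distribution,
    a large-sample stand-in for the t distribution with n - k - 1 degrees of freedom.
    """
    n = len(y)
    X = [[1.0] + [col[i] for col in columns] for i in range(n)]
    Xt = transpose(X)
    xtx_inv = invert(matmul(Xt, X))
    xty = [sum(a * b for a, b in zip(row, y)) for row in Xt]
    beta = [sum(a * b for a, b in zip(row, xty)) for row in xtx_inv]
    resid = [yi - sum(b * x for b, x in zip(beta, xi)) for yi, xi in zip(y, X)]
    sigma2 = sum(r * r for r in resid) / (n - len(beta))
    z = NormalDist()
    p_values = []
    for j, b in enumerate(beta):
        se = (sigma2 * xtx_inv[j][j]) ** 0.5
        p_values.append(2 * (1 - z.cdf(abs(b / se))))
    return beta, p_values


# Toy frequencies (not from the paper): the target code depends on code_a only.
target = [11, 9, 9, 11, 29, 31, 31, 29]
code_a = [0, 0, 0, 0, 1, 1, 1, 1]
code_b = [0, 0, 1, 1, 0, 0, 1, 1]
beta, p = ols_p_values(target, [code_a, code_b])
for name, b, pv in zip(["intercept", "code_a", "code_b"], beta, p):
    print(f"{name}: beta={b:.3f}, p={pv:.4f}, keep={pv < 0.05}")
```

With real data and a small number of patents, the t distribution would be the more appropriate reference distribution; the normal approximation is used here only to keep the sketch within the standard library.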
If the p-value of the regression result is less than 0.05 (95% confidence coefficient), we reject $H_0$ and select the corresponding IPC code for the ETF model. We also apply Pearson correlation analysis as a method of statistical inference for extracting the significant variables. The correlation coefficient is defined as follows:

$$r = \frac{\sum_{i=1}^{n} (IPC_{ji} - \overline{IPC}_j)(IPC_{Ti} - \overline{IPC}_T)}{\sqrt{\sum_{i=1}^{n} (IPC_{ji} - \overline{IPC}_j)^2 \sum_{i=1}^{n} (IPC_{Ti} - \overline{IPC}_T)^2}}$$

where $\overline{IPC}_j$ and $\overline{IPC}_T$ are the mean values of $IPC_{ji}$ and $IPC_{Ti}$ ($i = 1, 2, \ldots, n$), respectively. This coefficient represents the correlation between $IPC_{ji}$ and $IPC_{Ti}$. As in the multiple regression analysis, $H_0$ states that the independent variable ($IPC_j$) is not significant with respect to the dependent variable ($IPC_T$), and $H_1$ states that $IPC_j$ is significant to $IPC_T$. Using the p-value of $IPC_j$, we decide whether the IPC code is significant. We thus select the IPC codes needed to explain the development of $IPC_T$ from the combined results of the multiple regression and the Pearson correlation. The selected IPC codes are then used in a neural network model to construct the ETF model.

Neural networks have been used for diverse analyses, such as forecast modeling [17]. These networks are mathematical models that imitate the information processing of the human brain. The neural network model is a powerful tool for predicting future events from previous (historical) data [17]. The model consists of neurons and the connecting weights between neurons. The neurons are arranged in input, hidden, and output layers, as shown in Figure 2.

Figure 2. Neural Network Model for ETF

In our neural network model for ETF, the input layer is composed of the independent variables (explanatory IPC codes), and the dependent variable (target IPC code) is in the output layer. The model has one hidden layer. We use the sigmoid function as the activation function [18] and gradient descent as the optimization strategy for fitting the model [18]. Our proposed process of patent information analysis is shown in Figure 3.

Figure 3. Patent Analysis Process for ETF

Figure 3 represents the ETF process from the retrieved patent data to the emerging technology path diagram for identifying the future technology. To obtain the final path analysis result from the neural networks, we use the mean squared error (MSE) [19] between the predicted and actual values of each patent $i$. We take the model with the smallest MSE as the best model. From the best model, we construct the emerging technology path for ETF and forecast the emerging technology via the path diagram.

4. Experimental Results

To verify the performance of this research, we conducted an experiment using the patent data related to nanotechnology from the Korea Intellectual Property Rights Information Service (KIPRIS) [20]. We retrieved a total of 2,482 patent documents concerning nanotechnology. These were all the relevant patents submitted before April 14, [year not preserved]. From the patent data, we extracted all the IPC codes for our experiment. Figure 4 shows the PICM of nanotechnology.

Figure 4. PICM of Nanotechnology

The dimension of this matrix is (2,482 × 253). In other words, the total number of IPC codes in all patent documents is 253. Each element of this matrix is the number of times a given IPC code occurred in a given patent document. For example, the IPC code A01N occurred once in patent [number not preserved]. Figure 5 shows the frequencies of the top-ranked IPC codes.

Figure 5. Top Ranked IPC Codes

We used the 13 IPC codes with frequencies larger than 100. We designated the IPC code with the largest frequency (H01L) as the dependent variable because this code occurred in the most nanotechnology patents. Thus, we constructed the following ETF model: H01L = f(A61K, B01J, ..., G01N, H01M). Our nanotechnology ETF model predicts the trend of H01L using the predictive model f(·) based on the twelve codes A61K, B01J, B05D, B29C, B32B, B82Y, C01B, C01G, C08K, C23C, G01N, and H01M.

Table 1. P-values of Correlation and Regression Analysis
Columns: Independent code | Corr. coefficient | Reg. parameter
Rows: A61K, B01J, B05D, B29C, B32B, B82Y, C01B, C01G, C08K, C23C, G01N, H01M (p-values not preserved)
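
How a matrix like the PICM and correlation p-values like those reported in Table 1 might be produced is sketched below. The patent identifiers, IPC code lists, and the threshold `min_total` are toy assumptions, the function names are hypothetical, and the p-value again uses a normal approximation rather than the exact t distribution.

```python
from collections import Counter
from math import sqrt
from statistics import NormalDist


def build_picm(patent_codes, min_total=2):
    """Return (patent_ids, codes, matrix); matrix[i][j] counts code j in patent i.

    Only codes whose total frequency is at least min_total are kept,
    ordered from most to least frequent (ties broken alphabetically).
    """
    totals = Counter(c for codes in patent_codes.values() for c in codes)
    codes = sorted((c for c, t in totals.items() if t >= min_total),
                   key=lambda c: (-totals[c], c))
    patent_ids = sorted(patent_codes)
    matrix = [[patent_codes[p].count(c) for c in codes] for p in patent_ids]
    return patent_ids, codes, matrix


def pearson_with_p(x, y):
    """Pearson r and an approximate two-sided p-value for H0: no correlation."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    r = sxy / sqrt(sxx * syy)
    if abs(r) >= 1.0:
        return r, 0.0
    t = r * sqrt((n - 2) / (1 - r * r))
    # Assumption: the normal distribution stands in for t with n - 2 d.o.f.
    return r, 2 * (1 - NormalDist().cdf(abs(t)))


# Toy input (not from the paper): IPC subclass codes listed per patent.
patents = {
    "P1": ["H01L", "H01L", "C01B"],
    "P2": ["H01L", "C01B", "B32B"],
    "P3": ["B32B"],
    "P4": ["H01L", "H01L", "C01B", "C01B"],
    "P5": ["A61K"],
    "P6": ["H01L", "C01B"],
}
ids, codes, picm = build_picm(patents)
target = codes[0]  # the most frequent code plays the role of the dependent variable
columns = {c: [row[j] for row in picm] for j, c in enumerate(codes)}
for code in codes[1:]:
    r, p = pearson_with_p(columns[code], columns[target])
    print(f"{code} vs {target}: r={r:.3f}, p={p:.4f}, keep={p < 0.05}")
```

In the paper's setting the threshold would correspond to the frequency cut-off of 100 and the target to H01L; the toy threshold of 2 is chosen only so that the example stays small.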

We found that the IPC codes B82Y, C01G, and C23C were not significant because their p-values for the correlation (Corr.) coefficient and the regression (Reg.) parameter were not less than 0.05. We therefore removed these variables (IPC codes) when constructing our ETF model. Using the nine selected IPC codes, we performed a neural network analysis. Figure 6 shows the importance of each of the nine selected IPC codes according to the neural network model.

Figure 6. Importance Results of Explanatory IPC Codes

The IPC code C01B was the most important for explaining the IPC code H01L. Table 2 shows the resulting hierarchical models used to construct the ETF model.

Table 2. Hierarchical Models for the ETF Model
Model   Used IPC codes
M1      C01B
M2      M1 + B32B
M3      M2 + B29C
M4      M3 + G01N
M5      M4 + B01J
M6      M5 + C08K
M7      M6 + A61K
M8      M7 + H01M
M9      M8 + B05D

First, we constructed the ETF model for H01L using one IPC code, C01B. Next, we added the second most important IPC code, B32B, to the first model (M1) to construct the second model (M2). In this manner, we constructed nine candidate models (M1 to M9) to determine the optimal ETF model. Table 3 shows the neural network results of the candidate models.

Table 3. Neural Networks Results using Nine IPC Codes
Columns: Model | MSE (Training) | MSE (Test) | Diff. | Lift
Rows: M1 to M9 (values not preserved)
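
The construction of nested candidate models with training and test MSEs can be sketched as follows. This is a rough illustration, not the authors' implementation: the data are toy values, the hidden layer size, learning rate, epoch count, linear output unit, and train/test split are all assumptions, and the function names are hypothetical.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def init_net(n_in, n_hidden, seed=0):
    """One hidden layer; the last weight in every row is the bias."""
    rng = random.Random(seed)
    w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in + 1)] for _ in range(n_hidden)]
    w2 = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden + 1)]
    return w1, w2


def predict(net, x):
    w1, w2 = net
    hidden = [sigmoid(sum(w * v for w, v in zip(row, x + [1.0]))) for row in w1]
    out = sum(w * h for w, h in zip(w2, hidden + [1.0]))  # linear output (assumption)
    return out, hidden


def mse(net, X, y):
    return sum((predict(net, x)[0] - t) ** 2 for x, t in zip(X, y)) / len(y)


def train(net, X, y, lr=0.05, epochs=1000):
    """Full-batch gradient descent on the MSE; returns new weights."""
    w1, w2 = net
    n = len(y)
    for _ in range(epochs):
        g1 = [[0.0] * len(row) for row in w1]
        g2 = [0.0] * len(w2)
        for x, t in zip(X, y):
            out, hidden = predict((w1, w2), x)
            err = 2.0 * (out - t) / n
            for j, h in enumerate(hidden + [1.0]):
                g2[j] += err * h
            for j, h in enumerate(hidden):
                delta = err * w2[j] * h * (1.0 - h)  # sigmoid derivative
                for k, v in enumerate(x + [1.0]):
                    g1[j][k] += delta * v
        w2 = [w - lr * g for w, g in zip(w2, g2)]
        w1 = [[w - lr * g for w, g in zip(row, grow)] for row, grow in zip(w1, g1)]
    return w1, w2


def nested_models(columns, order, y, train_idx, test_idx, n_hidden=2):
    """Fit M1, M2, ... by adding codes in the given order of importance."""
    results = []
    for k in range(1, len(order) + 1):
        used = order[:k]
        X = [[float(columns[c][i]) for c in used] for i in range(len(y))]
        Xtr, ytr = [X[i] for i in train_idx], [y[i] for i in train_idx]
        Xte, yte = [X[i] for i in test_idx], [y[i] for i in test_idx]
        net = train(init_net(k, n_hidden), Xtr, ytr)
        m_tr, m_te = mse(net, Xtr, ytr), mse(net, Xte, yte)
        results.append((used, m_tr, m_te, abs(m_tr - m_te)))
    return results


# Toy data (not from the paper): three explanatory codes and one target code.
columns = {
    "C01B": [0, 1, 0, 1, 1, 0, 1, 0, 1, 1, 0, 0],
    "B32B": [0, 0, 1, 1, 0, 1, 1, 0, 0, 1, 1, 0],
    "B29C": [1, 0, 0, 1, 0, 0, 1, 1, 0, 0, 1, 1],
}
h01l = [0, 1, 1, 2, 1, 1, 2, 0, 1, 2, 1, 0]
results = nested_models(columns, ["C01B", "B32B", "B29C"], h01l,
                        list(range(8)), list(range(8, 12)))
for used, m_tr, m_te, diff in results:
    print(f"{'+'.join(used)}: train={m_tr:.4f} test={m_te:.4f} diff={diff:.4f}")
best = min(results, key=lambda r: r[3])
print("smallest train/test difference:", "+".join(best[0]))
```

Selecting by the smallest train/test difference mirrors the criterion described in the text; in practice one would usually also look at the absolute test MSE, since a model can have a small difference while fitting poorly on both sets.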

We divided the IPC code data into two sets, a training set and a test set. Using the training data, we constructed our neural network model, and we validated the model on the test data. The model with the smallest MSE difference (Diff.) between the training and test sets was deemed the best. By this criterion, M4 and M9 were the candidates for the best model. We then used the lift value as another measure to determine the best model. Lift measures how much a constructed model improves on random selection; for example, a model with a lift value of 3 performs three times better than random selection without the model. There was no difference between the MSEs of M4 and M9. Therefore, we selected M4, which uses only four IPC codes (C01B, B32B, B29C, and G01N), as our best ETF model of nanotechnology. This research used these IPC codes to forecast the target technology (H01L). The other IPC codes (B01J, C08K, A61K, H01M, and B05D) were then considered as possible explanations of the IPC codes of M4. Table 4 shows the path analysis result for these IPC codes.

Table 4. Path Analysis Result (p-values)
Columns: C01B, B32B, B29C, G01N
Rows: B01J, C08K, A61K, H01M, B05D (p-values not preserved)

B01J was significant for C01B because its p-value was less than 0.05. B05D significantly affected both C01B and B32B. Using these results, we constructed our path diagram for the nanotechnology ETF model, as shown in Figure 7.

Figure 7. Emerging Technology Path for Nanotechnology

Nanotechnology (H01L) develops through the development of four technologies, those based on C01B, B32B, B29C, and G01N. The technology of C01B in turn depends on the technological developments of B01J and B05D, and the development of B05D also affects the technology of B32B.

5. Conclusions

In this paper, we proposed an ETF method using statistical inference and neural networks. This research used the IPC code data of patent documents for the ETF model.
As a case study to verify our ETF model, we selected nanotechnology as the target technology. Using the p-values of the regression and correlation analyses, we extracted the meaningful IPC codes from the PICM and constructed an ETF model. The proposed model used multiple regression and neural network models to determine the relationships between IPC codes. Our target technology was defined as the dependent variable in our model, and the other codes were treated as independent variables. From the results of the regression and the neural networks, we constructed a technological path diagram for ETF. We concluded that the ETF of nanotechnology requires the development of the C01B, B32B, B29C, and G01N technologies directly and the development of B01J and B05D indirectly. The contribution of our research is an objective attempt to forecast emerging technology with quantitative methods (regression and neural networks) and objective data (IPC codes of patent documents). Our model can be applied to the ETF of diverse technologies. This research has limitations, such as the dependence on interpretation of the constructed path diagram: nanotechnology experts are still needed to determine how the nanotechnology path diagram can be used effectively. In future work, we will study more objective ETF approaches to address this problem.

References

[1] A. K. Firat, W. L. Woon and S. Madnick, Technological Forecasting-A Review, Working paper CISL# , (2008).
[2] S. Jun, A Forecasting Model for Technological Trend Using Unsupervised Learning, Communications in Computer and Information Science, vol. 258, (2011), pp.
[3] A. T. Roper, S. W. Cunningham, A. L. Porter, T. W. Mason, F. A. Rossini and J. Banks, Forecasting and Management of Technology, Wiley, (2011).
[4] S. Jun, S. Park and D. Jang, Technology Forecasting using Matrix Map and Patent Clustering, Industrial Management & Data Systems, vol. 112, Issue 5, (2012).
[5] Y. G. Kim, J. H. Suh and S. C. Park, Visualization of patent analysis for emerging technology, Expert Systems with Applications, vol. 34, (2008), pp.
[6] T. U. Daim, G. Rueda, H. Martin and P. Gerdsri, Forecasting emerging technologies: Use of bibliometrics and patent analysis, Technological Forecasting & Social Change, vol. 73, (2006), pp.
[7] M. Bengisu and R. Nekhili, Forecasting emerging technologies with the aid of science and technology databases, Technological Forecasting & Social Change, vol. 73, (2006), pp.
[8] S. Park and S. Jun, New Technology Management Using Time Series Regression and Clustering, International Journal of Software Engineering and Its Applications, vol. 6, no. 2, (2012), pp.
[9] M. Fattori, G. Pedrazzi and R. Turra, Text mining applied to patent mapping: a practical business case, World Patent Information, vol. 25, (2003), pp.
[10] S. Lee, B. Yoon and Y. Park, An approach to discovering new technology opportunities: Keyword-based patent map approach, Technovation, vol. 29, (2009), pp.
[11] V. W. Mitchell, Using Delphi to Forecast in New Technology Industries, Marketing Intelligence & Planning, vol. 10, Issue 2, (1992), pp.
[12] Y. C. Yun, G. H. Jeong and S. H. Kim, A Delphi technology forecasting approach using a semi-Markov concept, Technological Forecasting and Social Change, vol. 40, (1991), pp.
[13] Y. H. Tseng, C. J. Lin and Y. I. Lin, Text mining techniques for patent analysis, Information Processing & Management, vol. 43, Issue 5, (2007), pp.
[14] S. Jun, IPC code Analysis of Patent Documents Using Association Rules and Maps-Patent Analysis of Database Technology, Communications in Computer and Information Science, vol. 258, (2011), pp.
[15] G. Casella and R. L. Berger, Statistical Inference, Duxbury, (2002).
[16] R. H. Myers, Classical and Modern Regression with Applications, Duxbury, (1990).
[17] P. Giudici, Applied Data Mining: Statistical Methods for Business and Industry, Wiley, (2003).
[18] T. Hastie, R. Tibshirani and J. Friedman, The Elements of Statistical Learning: Data Mining, Inference, and Prediction, Springer, (2001).
[19] B. L. Bowerman, R. T. O'Connell and A. B. Koehler, Forecasting, Time Series, and Regression: An Applied Approach, Brooks/Cole, (2005).
[20] Korea Intellectual Property Rights Information Service (KIPRIS),

Authors

Sunghae Jun
He received the BS, MS, and PhD degrees from the Department of Statistics, Inha University, Korea, in 1993, 1996, and [year not preserved]. He also received a PhD degree from the Department of Computer Science, Sogang University, in [year not preserved]. He was a visiting scholar in the Department of Statistics, Oklahoma State University, in the United States from 2009 to [year not preserved]. He is currently an associate professor in the Department of Statistics, Cheongju University.

Seung-Joo Lee
He received the BS degree from the Department of Applied Statistics, Cheongju University, Korea, in [year not preserved]. He also received the MS and PhD degrees from the Department of Statistics, Dongkuk University, Korea, in 1987 and [year not preserved]. He is currently a professor in the Department of Statistics, Cheongju University. He has researched Bayesian and multivariate statistics.


More information
