Emerging Technology Forecasting Using New Patent Information Analysis
Sunghae Jun* and Seung-Joo Lee
Department of Statistics, Cheongju University, Chungbuk, Korea
shjun@cju.ac.kr, access@cju.ac.kr
*Corresponding Author: Sunghae Jun (shjun@cju.ac.kr)

Abstract

Emerging technology drives technological development and innovation in diverse fields of technology. Emerging technology forecasting can predict the possible areas of emerging technology. However, it is difficult to forecast emerging technology because most technology forecasting tasks depend on the subjective experience of experts. Patent analysis is an objective method to recognize the trends in technological development. Many patent analysis methods have been researched; these methods apply text mining techniques to analyze the text data of patent documents, such as the title and abstract. This approach has some limitations, namely the computing cost and the information loss associated with the preprocessing step of text mining. Therefore, we propose a new patent information analysis to overcome these problems. Using the International Patent Classification codes from the patent documents of a target technology, we construct an emerging technology forecasting model. This research combines statistical inference and neural networks to construct our model for new patent information analysis. We perform a case study to verify how our research can be practically applied, using nanotechnology as the target technology. This research can therefore contribute to R&D planning.

Keywords: Emerging technology, patent information analysis, technology forecasting

1. Introduction

Technology forecasting (TF) anticipates technological trends such as the direction and the rate of technological development [1-3]. We can plan R&D policy and new product development based on TF results. Also, a TF model can inform a company's strategic decisions concerning technology licensing and patent management [4].
TF is thus an important issue for a company.

Emerging technology drives technological development and innovation in diverse fields of technology [5]. The development of a technology depends on the emergence of related technologies [6, 7]. Thus, extracting the relationships between a target technology and its related technologies is important to technological development. This extraction is called emerging technology forecasting (ETF) in this paper. ETF predicts possible areas of emerging technology. However, it is difficult to forecast emerging technology because most technology forecasting tasks depend on the subjective experience of experts. Such methods are not stable, and so more objective ETF methods are required [8].

Patent analysis is an objective method to recognize the trends in technological development. Many patent analysis methods have been researched [4, 9, 10]; these methods apply text mining techniques to analyze the text data of patent documents, such as the title and abstract, and they are more quantitative TF approaches than earlier methods such as Delphi [11, 12]. However, most patent analysis approaches have some limitations, namely the computing cost and the information loss associated with the preprocessing step of text mining [9, 13]. One approach to avoid this problem is to analyze International Patent Classification (IPC) code data [14]. Therefore, we propose a new patent information analysis method for the purpose of ETF. Using the IPC codes from the patent documents of a target technology, we construct an emerging technology forecasting model. This research combines statistical inference and neural networks to construct our model for new patent information analysis. We perform a case study to verify how our research can be practically applied, using nanotechnology as the target technology.

2. Emerging Technology Forecasting

An emerging technology is a technology that affects and explains other technologies in the coming year [7]. In other words, if the development of a technology relies on the development of a related technology, then that related technology is an emerging technology. Determining the relationship between a target technology and its emerging technology is very important to ETF. The faster the technology develops, the more important the emerging technology is. In previous research, determining emerging technologies for a target technology depended on the Delphi study [6, 7]. In this paper, however, we determine emerging technologies via a quantitative and objective approach. Our research uses statistical inference, regression, and neural networks as quantitative methods and analyzes patent documents as objective data, because patent data are appropriate historical data for ETF [7].

3. New Patent Information Analysis for Emerging Technology Forecasting

In this paper, we construct an (n × p) patent-IPC code matrix (PICM) from the retrieved patent documents using text mining techniques; this matrix is then used for our ETF modeling. Figure 1 shows our PICM structure.

Figure 1. PICM Structure

The rows and columns of this matrix are the patent numbers and IPC codes, respectively; a matrix element represents the number of times a given IPC code occurs in a patent. Our quantitative analysis is performed using this matrix. First, we have to determine the dependent and independent variables for the construction of the ETF model. The dependent variable is the IPC code representing the target technology. The IPC codes explaining (affecting) the target technology are the independent variables. Not all of the remaining IPC codes (those other than the dependent variable) are necessarily independent variables. Thus, we have to select the significant independent variables to explain the target variable. In this paper, we use statistical inference to select the significant IPC codes (independent variables).

To construct our ETF model, we combine statistical inference and neural network methods. In the statistical inference step, we select the IPC codes to be used in the ETF model. The selected IPC codes are the top-ranked codes and are divided into dependent and independent codes. Each code is a variable in our model. The dependent code is a response variable; this is our ETF target technology. All of the remaining codes are candidate independent (predictive) variables: the target technology (dependent code) is explained by the technologies of the predictive variables (independent codes).

To select a meaningful variable to explain the target variable, we test the significance of each independent variable. In this paper, we use the probability value (p-value), which is the probability, computed under the assumption that the null hypothesis ($H_0$) is true (i.e., the variable is not significant), of obtaining a test statistic at least as extreme as the one based on the observed data. When the p-value is less than the significance level (0.05, corresponding to a 95% confidence level), we determine that the variable is significant. The p-value, $p(x)$, is a test statistic computed from the sample $X$ [15]. A small $p(x)$ value suggests that the alternative hypothesis is true (i.e., the variable is significant). Here $\alpha$ denotes the level of the test based on $p(x)$ and takes a value between 0 and 1. This research applies the p-value to two statistical methods, multiple regression and correlation analysis. In this paper, our regression model is as follows:

$$\mathrm{IPC}_{Ti} = \beta_0 + \beta_1 \mathrm{IPC}_{1i} + \beta_2 \mathrm{IPC}_{2i} + \cdots + \beta_k \mathrm{IPC}_{ki} + \varepsilon_i$$

where $\mathrm{IPC}_{Ti}$ is the IPC code of the target technology in the $i$th patent ($i = 1, 2, \ldots, n$), $\beta_0$ is the intercept, $(\beta_1, \ldots, \beta_k)$ are the slopes [16], and $\varepsilon_i$ is the error term. $\beta_j$ denotes the amount of increase or decrease in the value of $\mathrm{IPC}_T$ (target technology, dependent variable) associated with a one-unit increase in $\mathrm{IPC}_j$ (explanatory technology, independent variable). In other words, the regression parameter $\beta_j$ describes the relationship between $\mathrm{IPC}_T$ and $\mathrm{IPC}_j$. To select the necessary IPC codes for the ETF model of $\mathrm{IPC}_T$, we test the significance of $\beta_j$ as follows:

$$H_0: \beta_j = 0 \quad \text{vs.} \quad H_1: \beta_j \neq 0$$

The null hypothesis ($H_0$) states that the variable is not significant and the alternative hypothesis ($H_1$) states that the variable is significant.
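For reference, the usual test behind this p-value can be written as below. This is the standard ordinary least squares t-test under normally distributed errors, not a formula printed in the paper; the symbols $\hat{\beta}_j$, $\mathrm{SE}(\cdot)$, $t_j$, $p_j$, and $T_{n-k-1}$ are introduced here only for illustration.

```latex
% Test statistic for H_0: \beta_j = 0 in a regression with k predictors and n patents
t_j = \frac{\hat{\beta}_j}{\mathrm{SE}(\hat{\beta}_j)},
\qquad
p_j = 2\,P\!\left(T_{n-k-1} \ge |t_j|\right)
% T_{n-k-1}: Student's t random variable with n-k-1 degrees of freedom
```

Under this reading, $\mathrm{IPC}_j$ is kept as a candidate explanatory code when $p_j$ falls below the chosen significance level.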
If the p-value of the regression result is less than 0.05 (95% confidence coefficient), we reject $H_0$ in favor of $H_1$, and $\mathrm{IPC}_j$ is selected for the ETF model. We also apply Pearson correlation analysis as a method of statistical inference for extracting the significant variables. The correlation coefficient is as follows:

$$r = \frac{\sum_{i=1}^{n} (\mathrm{IPC}_{ji} - \overline{\mathrm{IPC}}_j)(\mathrm{IPC}_{Ti} - \overline{\mathrm{IPC}}_T)}{\sqrt{\sum_{i=1}^{n} (\mathrm{IPC}_{ji} - \overline{\mathrm{IPC}}_j)^2 \, \sum_{i=1}^{n} (\mathrm{IPC}_{Ti} - \overline{\mathrm{IPC}}_T)^2}}$$

where $\overline{\mathrm{IPC}}_j$ and $\overline{\mathrm{IPC}}_T$ are the mean values of $\mathrm{IPC}_{ji}$ and $\mathrm{IPC}_{Ti}$ ($i = 1, 2, \ldots, n$), respectively. This coefficient represents the correlation between $\mathrm{IPC}_{ji}$ and $\mathrm{IPC}_{Ti}$. As in the multiple regression analysis, $H_0$ states that the independent variable ($\mathrm{IPC}_j$) is not significant with respect to the dependent variable ($\mathrm{IPC}_T$), and $H_1$ states the significance of $\mathrm{IPC}_j$ to $\mathrm{IPC}_T$. Using the p-value of $\mathrm{IPC}_j$, we decide whether the IPC code is significant. Therefore, we select the necessary IPC codes to explain the development of $\mathrm{IPC}_T$ by the combined results of the multiple regression and the Pearson correlation. The selected IPC codes are used in the following neural network model to construct an ETF model.

Neural networks have been used for diverse analyses, such as forecast modeling [17]. These networks are mathematical models that imitate the information processing of the human brain. The neural network model is a powerful tool for predicting future events using previous (historical) data [17]. This model consists of neurons and the connecting weights between neurons. The neurons are contained in input, hidden, and output layers, as shown in Figure 2.

Figure 2. Neural Network Model for ETF

In our neural network model for ETF, the input layer is composed of the independent variables (explanatory IPC codes), and the dependent variable (target IPC code) is in the output layer. Also, this model has one hidden layer. We use the sigmoid function as the activation function of the neural network model [18]. This research uses gradient descent as an optimization strategy for constructing the ETF model based on neural networks [18]. Our proposed process of patent information analysis is shown in Figure 3.

Figure 3. Patent Analysis Process for ETF

Figure 3 represents the ETF process from the retrieved patent data to the emerging technology path diagram for identifying the future technology. To obtain a final path analysis result from the neural networks, we use the mean squared error (MSE) [19] between the predicted and actual values. Thus, we determine the best model as the one with the smallest MSE. From the best model, we can construct the emerging technology path for ETF. Therefore, we forecast the emerging technology via the path diagram.

4. Experimental Results

To verify the performance of this research, we conduct an experiment using the patent data related to nanotechnology from the Korea Intellectual Property Rights Information Service (KIPRIS) [20]. We retrieved a total of 2,482 patent documents concerning nanotechnology. These were all the relevant patents submitted before April 14. From the patent data, we extracted all the IPC codes for our experiment. Figure 4 shows the PICM of nanotechnology.

Figure 4. PICM of Nanotechnology

The dimension of this matrix is (2,482 × 253). In other words, 253 distinct IPC codes appear across all patent documents. An element of this matrix is the number of times a given IPC code occurred in each patent document. For example, the IPC code A01N occurred once in one patent document. Figure 5 shows the frequency of the top-ranked IPC codes.

Figure 5. Top Ranked IPC Codes

We used the 13 IPC codes with frequencies larger than 100. Also, we designated the IPC code with the largest frequency (H01L) as the dependent variable because this code occurred in the most nanotechnology patents. Thus, we constructed the following ETF model: H01L = f(A61K, B01J, ..., G01N, H01M). Our nanotechnology ETF model predicts the trend of H01L using the predictive model f( ) based on A61K, B01J, B05D, B29C, B32B, B82Y, C01B, C01G, C08K, C23C, G01N, and H01M.

Table 1. P-values of Correlation and Regression Analysis
Columns: Independent code | Corr. coefficient | Reg. parameter
Rows (independent codes): A61K, B01J, B05D, B29C, B32B, B82Y, C01B, C01G, C08K, C23C, G01N, H01M
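The construction of the PICM and the frequency-based choice of variables described in this section can be sketched in Python as follows. This is a minimal illustration rather than the authors' implementation: the toy `patents` dictionary, the function names `build_picm` and `select_variables`, and the threshold of 1 (the case study uses 100) are assumptions made for the example, as is the reading of "frequency" as the total count of a code over all patents.

```python
from collections import Counter

# Hypothetical toy input: patent identifier -> IPC codes found in that patent.
patents = {
    "P1": ["H01L", "C01B", "H01L"],
    "P2": ["C01B", "B32B"],
    "P3": ["H01L", "G01N"],
    "P4": ["A01N"],
}


def build_picm(patents):
    """Return (row labels, column labels, matrix) of IPC code counts per patent."""
    codes = sorted({code for ipcs in patents.values() for code in ipcs})
    rows = sorted(patents)
    matrix = []
    for pid in rows:
        counts = Counter(patents[pid])
        # A Counter returns 0 for codes that do not occur in this patent.
        matrix.append([counts[code] for code in codes])
    return rows, codes, matrix


def select_variables(codes, matrix, min_total):
    """Keep codes whose total frequency exceeds min_total.

    The most frequent kept code becomes the target (dependent variable);
    the other kept codes are candidate predictors (independent variables).
    """
    totals = {code: sum(row[j] for row in matrix) for j, code in enumerate(codes)}
    kept = [code for code in codes if totals[code] > min_total]
    target = max(kept, key=lambda code: totals[code])
    predictors = [code for code in kept if code != target]
    return target, predictors, totals


rows, codes, picm = build_picm(patents)
target, predictors, totals = select_variables(codes, picm, min_total=1)
print(target, predictors)
```

In the toy data only H01L and C01B exceed the threshold, so H01L plays the role of the target and C01B is the single candidate predictor; the significance tests of Section 3 would then be applied to the candidates.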
We found that the IPC codes B82Y, C01G, and C23C were not significant because their p-values for the correlation (Corr.) coefficient and the regression (Reg.) parameter did not fall below the 0.05 significance level. Thus, we removed these variables (IPC codes) when constructing our ETF model. Using the nine selected IPC codes, we performed a neural network analysis. Figure 6 shows the importance results that the neural network model gave for the nine selected IPC codes.

Figure 6. Importance Results of Explanatory IPC Codes

We found that the IPC code C01B was the most important for explaining the IPC code H01L. Table 2 shows the resulting hierarchical models used to construct the ETF model.

Table 2. Hierarchical Models for the ETF Model
Model   Used IPC codes
M1      C01B
M2      M1 + B32B
M3      M2 + B29C
M4      M3 + G01N
M5      M4 + B01J
M6      M5 + C08K
M7      M6 + A61K
M8      M7 + H01M
M9      M8 + B05D

First, we constructed the ETF model for H01L using one IPC code, C01B. Next, we added the second most important IPC code, B32B, to the first model (M1) to construct the second model (M2). In this manner, we constructed nine candidate models (M1 to M9) to determine the optimal ETF model. Table 3 shows the neural network results of the candidate models.

Table 3. Neural Networks Results using Nine IPC Codes
Columns: Model | MSE (Training) | MSE (Test) | Diff. | Lift
Rows: M1 to M9
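The nested construction of the candidate models in Table 2 can be sketched as follows. The code ordering is taken from Table 2; the helper name `nested_models` is hypothetical, and the sketch only enumerates the input sets of M1 to M9 without training any neural network.

```python
# Importance order of the nine explanatory IPC codes, as listed in Table 2.
ranked_codes = ["C01B", "B32B", "B29C", "G01N", "B01J", "C08K", "A61K", "H01M", "B05D"]


def nested_models(ranked):
    """Return {"M1": [top code], "M2": [top two codes], ...} for a ranked code list."""
    return {f"M{i}": ranked[:i] for i in range(1, len(ranked) + 1)}


models = nested_models(ranked_codes)
for name, inputs in models.items():
    print(name, inputs)
```

Each candidate would then be fitted and scored separately, so that the input sets differ only by the one code added at each step.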
We divided the IPC code data into two data sets, a training set and a test set. Using the training data, we constructed our neural network model. This model was validated with the test data. The model with the smallest MSE difference (Diff.) between the training and the test data sets was deemed the best. We can thus consider the M4 and M9 models the candidates for the best model. Then, we used the lift value as another measure to determine the best model. The lift value measures the improved performance of a constructed model. For example, if a model has a lift value of 3, then the model is 3 times more accurate than random selection without the model. There was no difference between the MSEs of M4 and M9. Therefore, we selected the four IPC codes of M4 (C01B, B32B, B29C, and G01N) as our best ETF model of nanotechnology. This research used these IPC codes to forecast the target technology (H01L). Also, the other IPC codes (B01J, C08K, A61K, H01M, and B05D) were considered to explain the IPC codes of M4. Table 4 shows the path analysis result for these IPC codes.

Table 4. Path Analysis Result (p-values)
Columns (IPC codes of M4): C01B | B32B | B29C | G01N
Rows (other IPC codes): B01J, C08K, A61K, H01M, B05D

B01J was significant to C01B because its p-value was less than 0.05. Also, we found that B05D affected C01B and B32B significantly. Using these results, we could construct our path diagram for the nanotechnology ETF model, as shown in Figure 7.

Figure 7. Emerging Technology Path for Nanotechnology

Nanotechnology (H01L) develops through the development of four technologies based on C01B, B32B, B29C, and G01N. Also, the technology of C01B depends on the technological developments of B01J and B05D. The development of B05D affects the technology of B32B.

5. Conclusions

In this paper, we proposed an ETF method using statistical inference and neural networks. This research used the IPC code data of patent documents for the ETF model.
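The path relationships stated in the experimental results can be encoded as a small directed graph, which makes the direct and indirect technologies easy to read off. The edges below are taken from the text; the dictionary layout and the function names `direct_sources` and `indirect_sources` are illustrative assumptions, not part of the original study.

```python
# Each key is an explanatory IPC code; its values are the codes it helps explain.
influences = {
    "C01B": ["H01L"],
    "B32B": ["H01L"],
    "B29C": ["H01L"],
    "G01N": ["H01L"],
    "B01J": ["C01B"],
    "B05D": ["C01B", "B32B"],
}


def direct_sources(target):
    """Codes with an edge pointing straight at the target."""
    return sorted(src for src, dsts in influences.items() if target in dsts)


def indirect_sources(target):
    """Codes that reach the target only through at least one intermediate code."""
    direct = set(direct_sources(target))
    seen = set()
    frontier = list(direct)
    while frontier:
        node = frontier.pop()
        for src in direct_sources(node):
            if src not in seen:
                seen.add(src)
                frontier.append(src)
    return sorted(seen - direct)


print(direct_sources("H01L"))
print(indirect_sources("H01L"))
```

Walking the graph backwards from H01L separates the four codes that feed the target directly from B01J and B05D, which act only through C01B and B32B.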
As a case study to verify our ETF model, we selected nanotechnology as the target technology. Using the p-values of the regression and correlation analyses, we extracted the meaningful IPC codes from the PICM and constructed an ETF model. The proposed model used multiple regression and neural network models to determine the relationships between IPC codes. Our target technology was defined as the dependent variable in our model, and the other codes were treated as independent variables. From the results of the regression and neural networks, we constructed a technological path diagram for ETF. We concluded that the ETF of nanotechnology requires the development of the C01B, B32B, B29C, and G01N technologies directly and the development of B01J and B05D indirectly.

The contribution of our research is an objective attempt to forecast emerging technology by quantitative methods (regression and neural networks) and objective data (IPC codes of patent documents). Our model can be applied to the ETF of diverse technologies. This research has some limitations, such as the dependence of the constructed path diagram on interpretation. In other words, we need nanotechnology experts to determine the effective usage of the nanotechnology path diagram. In future work, we will research more objective ETF approaches to address this problem.

References

[1] A. K. Firat, W. L. Woon and S. Madnick, Technological Forecasting - A Review, Working paper CISL#, (2008).
[2] S. Jun, A Forecasting Model for Technological Trend Using Unsupervised Learning, Communications in Computer and Information Science, vol. 258, (2011).
[3] A. T. Roper, S. W. Cunningham, A. L. Porter, T. W. Mason, F. A. Rossini and J. Banks, Forecasting and Management of Technology, Wiley, (2011).
[4] S. Jun, S. Park and D. Jang, Technology Forecasting using Matrix Map and Patent Clustering, Industrial Management & Data Systems, vol. 112, Issue 5, (2012).
[5] Y. G. Kim, J. H. Suh and S. C. Park, Visualization of patent analysis for emerging technology, Expert Systems with Applications, vol. 34, (2008).
[6] T. U. Daim, G. Rueda, H. Martin and P. Gerdsri, Forecasting emerging technologies: Use of bibliometrics and patent analysis, Technological Forecasting & Social Change, vol. 73, (2006).
[7] M. Bengisu and R. Nekhili, Forecasting emerging technologies with the aid of science and technology databases, Technological Forecasting & Social Change, vol. 73, (2006).
[8] S. Park and S. Jun, New Technology Management Using Time Series Regression and Clustering, International Journal of Software Engineering and Its Applications, vol. 6, no. 2, (2012).
[9] M. Fattori, G. Pedrazzi and R. Turra, Text mining applied to patent mapping: a practical business case, World Patent Information, vol. 25, (2003).
[10] S. Lee, B. Yoon and Y. Park, An approach to discovering new technology opportunities: Keywords-based patent map approach, Technovation, vol. 29, (2009).
[11] V. W. Mitchell, Using Delphi to Forecast in New Technology Industries, Marketing Intelligence & Planning, vol. 10, Issue 2, (1992).
[12] Y. C. Yun, G. H. Jeong and S. H. Kim, A Delphi technology forecasting approach using a semi-Markov concept, Technological Forecasting and Social Change, vol. 40, (1991).
[13] Y. H. Tseng, C. J. Lin and Y. I. Lin, Text mining techniques for patent analysis, Information Processing & Management, vol. 43, Issue 5, (2007).
[14] S. Jun, IPC code Analysis of Patent Documents Using Association Rules and Maps - Patent Analysis of Database Technology, Communications in Computer and Information Science, vol. 258, (2011).
[15] G. Casella and R. L. Berger, Statistical Inference, Duxbury, (2002).
[16] R. H. Myers, Classical and Modern Regression with Applications, Duxbury, (1990).
[17] P. Giudici, Applied Data Mining: Statistical Methods for Business and Industry, Wiley, (2003).
[18] T. Hastie, R. Tibshirani and J. Friedman, The Elements of Statistical Learning: Data Mining, Inference, and Prediction, Springer, (2001).
[19] B. L. Bowerman, R. T. O'Connell and A. B. Koehler, Forecasting, Time Series, and Regression: An Applied Approach, Brooks/Cole, (2005).
[20] Korea Intellectual Property Rights Information Service (KIPRIS).
Authors

Sunghae Jun
He received his BS and MS degrees in 1993 and 1996, and later his PhD degree, all from the Department of Statistics, Inha University, Korea. He also received a PhD degree from the Department of Computer Science, Sogang University. He was a visiting scholar in the Department of Statistics, Oklahoma State University, United States, beginning in 2009. He is currently an associate professor in the Department of Statistics, Cheongju University.

Seung-Joo Lee
He received his BS degree from the Department of Applied Statistics, Cheongju University, Korea. He also received his MS degree (1987) and PhD degree from the Department of Statistics, Dongkuk University, Korea. He is currently a professor in the Department of Statistics, Cheongju University. His research interests include Bayesian and multivariate statistics.
More informationA Comparative Study of the Pickup Method and its Variations Using a Simulated Hotel Reservation Data
A Comparative Study of the Pickup Method and its Variations Using a Simulated Hotel Reservation Data Athanasius Zakhary, Neamat El Gayar Faculty of Computers and Information Cairo University, Giza, Egypt
More informationSimple linear regression
Simple linear regression Introduction Simple linear regression is a statistical method for obtaining a formula to predict values of one variable from another where there is a causal relationship between
More informationChapter 23. Inferences for Regression
Chapter 23. Inferences for Regression Topics covered in this chapter: Simple Linear Regression Simple Linear Regression Example 23.1: Crying and IQ The Problem: Infants who cry easily may be more easily
More informationFinal Exam Practice Problem Answers
Final Exam Practice Problem Answers The following data set consists of data gathered from 77 popular breakfast cereals. The variables in the data set are as follows: Brand: The brand name of the cereal
More informationTowards better understanding Cybersecurity: or are "Cyberspace" and "Cyber Space" the same?
Towards better understanding Cybersecurity: or are "Cyberspace" and "Cyber Space" the same? Stuart Madnick Nazli Choucri Steven Camiña Wei Lee Woon Working Paper CISL# 2012-09 November 2012 Composite Information
More informationSOCIOLOGY 7702 FALL, 2014 INTRODUCTION TO STATISTICS AND DATA ANALYSIS
SOCIOLOGY 7702 FALL, 2014 INTRODUCTION TO STATISTICS AND DATA ANALYSIS Professor Michael A. Malec Mailbox is in McGuinn 426 Office: McGuinn 427 Phone: 617-552-4131 Office Hours: TBA E-mail: malec@bc.edu
More information2013 MBA Jump Start Program. Statistics Module Part 3
2013 MBA Jump Start Program Module 1: Statistics Thomas Gilbert Part 3 Statistics Module Part 3 Hypothesis Testing (Inference) Regressions 2 1 Making an Investment Decision A researcher in your firm just
More informationPredict Influencers in the Social Network
Predict Influencers in the Social Network Ruishan Liu, Yang Zhao and Liuyu Zhou Email: rliu2, yzhao2, lyzhou@stanford.edu Department of Electrical Engineering, Stanford University Abstract Given two persons
More informationAnalecta Vol. 8, No. 2 ISSN 2064-7964
EXPERIMENTAL APPLICATIONS OF ARTIFICIAL NEURAL NETWORKS IN ENGINEERING PROCESSING SYSTEM S. Dadvandipour Institute of Information Engineering, University of Miskolc, Egyetemváros, 3515, Miskolc, Hungary,
More informationInternational Journal of Computer Trends and Technology (IJCTT) volume 4 Issue 8 August 2013
A Short-Term Traffic Prediction On A Distributed Network Using Multiple Regression Equation Ms.Sharmi.S 1 Research Scholar, MS University,Thirunelvelli Dr.M.Punithavalli Director, SREC,Coimbatore. Abstract:
More informationCorrelation and Simple Linear Regression
Correlation and Simple Linear Regression We are often interested in studying the relationship among variables to determine whether they are associated with one another. When we think that changes in a
More informationDATA MINING TECHNIQUES AND APPLICATIONS
DATA MINING TECHNIQUES AND APPLICATIONS Mrs. Bharati M. Ramageri, Lecturer Modern Institute of Information Technology and Research, Department of Computer Application, Yamunanagar, Nigdi Pune, Maharashtra,
More informationCalculating the Probability of Returning a Loan with Binary Probability Models
Calculating the Probability of Returning a Loan with Binary Probability Models Associate Professor PhD Julian VASILEV (e-mail: vasilev@ue-varna.bg) Varna University of Economics, Bulgaria ABSTRACT The
More informationClass 19: Two Way Tables, Conditional Distributions, Chi-Square (Text: Sections 2.5; 9.1)
Spring 204 Class 9: Two Way Tables, Conditional Distributions, Chi-Square (Text: Sections 2.5; 9.) Big Picture: More than Two Samples In Chapter 7: We looked at quantitative variables and compared the
More informationNew Matrix Approach to Improve Apriori Algorithm
New Matrix Approach to Improve Apriori Algorithm A. Rehab H. Alwa, B. Anasuya V Patil Associate Prof., IT Faculty, Majan College-University College Muscat, Oman, rehab.alwan@majancolleg.edu.om Associate
More information270107 - MD - Data Mining
Coordinating unit: Teaching unit: Academic year: Degree: ECTS credits: 015 70 - FIB - Barcelona School of Informatics 715 - EIO - Department of Statistics and Operations Research 73 - CS - Department of
More informationChapter 7: Simple linear regression Learning Objectives
Chapter 7: Simple linear regression Learning Objectives Reading: Section 7.1 of OpenIntro Statistics Video: Correlation vs. causation, YouTube (2:19) Video: Intro to Linear Regression, YouTube (5:18) -
More informationNam-gu, Incheon, Korea 2 Division of Industrial Engineering and Management, Sungkyul University,
Vol.87 (Art, Culture, Game, Graphics, Broadcasting and Digital Contents 2015), pp.6-11 http://dx.doi.org/10.14257/astl.2015.87.02 Differences in the Environmental Management and Ethical Management Practices
More informationA FUZZY BASED APPROACH TO TEXT MINING AND DOCUMENT CLUSTERING
A FUZZY BASED APPROACH TO TEXT MINING AND DOCUMENT CLUSTERING Sumit Goswami 1 and Mayank Singh Shishodia 2 1 Indian Institute of Technology-Kharagpur, Kharagpur, India sumit_13@yahoo.com 2 School of Computer
More informationModule 5: Multiple Regression Analysis
Using Statistical Data Using to Make Statistical Decisions: Data Multiple to Make Regression Decisions Analysis Page 1 Module 5: Multiple Regression Analysis Tom Ilvento, University of Delaware, College
More informationAudit Analytics. --An innovative course at Rutgers. Qi Liu. Roman Chinchila
Audit Analytics --An innovative course at Rutgers Qi Liu Roman Chinchila A new certificate in Analytic Auditing Tentative courses: Audit Analytics Special Topics in Audit Analytics Forensic Accounting
More informationMobile Phone APP Software Browsing Behavior using Clustering Analysis
Proceedings of the 2014 International Conference on Industrial Engineering and Operations Management Bali, Indonesia, January 7 9, 2014 Mobile Phone APP Software Browsing Behavior using Clustering Analysis
More informationAUTO CLAIM FRAUD DETECTION USING MULTI CLASSIFIER SYSTEM
AUTO CLAIM FRAUD DETECTION USING MULTI CLASSIFIER SYSTEM ABSTRACT Luis Alexandre Rodrigues and Nizam Omar Department of Electrical Engineering, Mackenzie Presbiterian University, Brazil, São Paulo 71251911@mackenzie.br,nizam.omar@mackenzie.br
More informationPlanning Workforce Management for Bank Operation Centers with Neural Networks
Plaing Workforce Management for Bank Operation Centers with Neural Networks SEFIK ILKIN SERENGIL Research and Development Center SoftTech A.S. Tuzla Teknoloji ve Operasyon Merkezi, Tuzla 34947, Istanbul
More informationLOGISTIC REGRESSION. Nitin R Patel. where the dependent variable, y, is binary (for convenience we often code these values as
LOGISTIC REGRESSION Nitin R Patel Logistic regression extends the ideas of multiple linear regression to the situation where the dependent variable, y, is binary (for convenience we often code these values
More informationHow To Understand The Theory Of Probability
Graduate Programs in Statistics Course Titles STAT 100 CALCULUS AND MATR IX ALGEBRA FOR STATISTICS. Differential and integral calculus; infinite series; matrix algebra STAT 195 INTRODUCTION TO MATHEMATICAL
More informationNon-negative Matrix Factorization (NMF) in Semi-supervised Learning Reducing Dimension and Maintaining Meaning
Non-negative Matrix Factorization (NMF) in Semi-supervised Learning Reducing Dimension and Maintaining Meaning SAMSI 10 May 2013 Outline Introduction to NMF Applications Motivations NMF as a middle step
More informationComparison of K-means and Backpropagation Data Mining Algorithms
Comparison of K-means and Backpropagation Data Mining Algorithms Nitu Mathuriya, Dr. Ashish Bansal Abstract Data mining has got more and more mature as a field of basic research in computer science and
More informationData Analysis Tools. Tools for Summarizing Data
Data Analysis Tools This section of the notes is meant to introduce you to many of the tools that are provided by Excel under the Tools/Data Analysis menu item. If your computer does not have that tool
More informationCategorical Data Analysis
Richard L. Scheaffer University of Florida The reference material and many examples for this section are based on Chapter 8, Analyzing Association Between Categorical Variables, from Statistical Methods
More informationJoseph Twagilimana, University of Louisville, Louisville, KY
ST14 Comparing Time series, Generalized Linear Models and Artificial Neural Network Models for Transactional Data analysis Joseph Twagilimana, University of Louisville, Louisville, KY ABSTRACT The aim
More informationNeural Networks for Sentiment Detection in Financial Text
Neural Networks for Sentiment Detection in Financial Text Caslav Bozic* and Detlef Seese* With a rise of algorithmic trading volume in recent years, the need for automatic analysis of financial news emerged.
More informationSingle Level Drill Down Interactive Visualization Technique for Descriptive Data Mining Results
, pp.33-40 http://dx.doi.org/10.14257/ijgdc.2014.7.4.04 Single Level Drill Down Interactive Visualization Technique for Descriptive Data Mining Results Muzammil Khan, Fida Hussain and Imran Khan Department
More informationSTATISTICA Formula Guide: Logistic Regression. Table of Contents
: Table of Contents... 1 Overview of Model... 1 Dispersion... 2 Parameterization... 3 Sigma-Restricted Model... 3 Overparameterized Model... 4 Reference Coding... 4 Model Summary (Summary Tab)... 5 Summary
More informationData Mining Solutions for the Business Environment
Database Systems Journal vol. IV, no. 4/2013 21 Data Mining Solutions for the Business Environment Ruxandra PETRE University of Economic Studies, Bucharest, Romania ruxandra_stefania.petre@yahoo.com Over
More informationAn Overview of Knowledge Discovery Database and Data mining Techniques
An Overview of Knowledge Discovery Database and Data mining Techniques Priyadharsini.C 1, Dr. Antony Selvadoss Thanamani 2 M.Phil, Department of Computer Science, NGM College, Pollachi, Coimbatore, Tamilnadu,
More information11. Analysis of Case-control Studies Logistic Regression
Research methods II 113 11. Analysis of Case-control Studies Logistic Regression This chapter builds upon and further develops the concepts and strategies described in Ch.6 of Mother and Child Health:
More informationComparison of Supervised and Unsupervised Learning Classifiers for Travel Recommendations
Volume 3, No. 8, August 2012 Journal of Global Research in Computer Science REVIEW ARTICLE Available Online at www.jgrcs.info Comparison of Supervised and Unsupervised Learning Classifiers for Travel Recommendations
More informationIntroduction to Machine Learning Using Python. Vikram Kamath
Introduction to Machine Learning Using Python Vikram Kamath Contents: 1. 2. 3. 4. 5. 6. 7. 8. 9. 10. Introduction/Definition Where and Why ML is used Types of Learning Supervised Learning Linear Regression
More informationPower Prediction Analysis using Artificial Neural Network in MS Excel
Power Prediction Analysis using Artificial Neural Network in MS Excel NURHASHINMAH MAHAMAD, MUHAMAD KAMAL B. MOHAMMED AMIN Electronic System Engineering Department Malaysia Japan International Institute
More informationChapter 6: Multivariate Cointegration Analysis
Chapter 6: Multivariate Cointegration Analysis 1 Contents: Lehrstuhl für Department Empirische of Wirtschaftsforschung Empirical Research and und Econometrics Ökonometrie VI. Multivariate Cointegration
More informationIT services for analyses of various data samples
IT services for analyses of various data samples Ján Paralič, František Babič, Martin Sarnovský, Peter Butka, Cecília Havrilová, Miroslava Muchová, Michal Puheim, Martin Mikula, Gabriel Tutoky Technical
More informationCurriculum Map Statistics and Probability Honors (348) Saugus High School Saugus Public Schools 2009-2010
Curriculum Map Statistics and Probability Honors (348) Saugus High School Saugus Public Schools 2009-2010 Week 1 Week 2 14.0 Students organize and describe distributions of data by using a number of different
More informationPredictive Analytics Tools and Techniques
Global Journal of Finance and Management. ISSN 0975-6477 Volume 6, Number 1 (2014), pp. 59-66 Research India Publications http://www.ripublication.com Predictive Analytics Tools and Techniques Mr. Chandrashekar
More informationNine Common Types of Data Mining Techniques Used in Predictive Analytics
1 Nine Common Types of Data Mining Techniques Used in Predictive Analytics By Laura Patterson, President, VisionEdge Marketing Predictive analytics enable you to develop mathematical models to help better
More informationLearning outcomes. Knowledge and understanding. Competence and skills
Syllabus Master s Programme in Statistics and Data Mining 120 ECTS Credits Aim The rapid growth of databases provides scientists and business people with vast new resources. This programme meets the challenges
More informationUnivariate Regression
Univariate Regression Correlation and Regression The regression line summarizes the linear relationship between 2 variables Correlation coefficient, r, measures strength of relationship: the closer r is
More informationIntroduction to Data Mining and Machine Learning Techniques. Iza Moise, Evangelos Pournaras, Dirk Helbing
Introduction to Data Mining and Machine Learning Techniques Iza Moise, Evangelos Pournaras, Dirk Helbing Iza Moise, Evangelos Pournaras, Dirk Helbing 1 Overview Main principles of data mining Definition
More informationAdvanced analytics at your hands
2.3 Advanced analytics at your hands Neural Designer is the most powerful predictive analytics software. It uses innovative neural networks techniques to provide data scientists with results in a way previously
More informationStatistical Models in Data Mining
Statistical Models in Data Mining Sargur N. Srihari University at Buffalo The State University of New York Department of Computer Science and Engineering Department of Biostatistics 1 Srihari Flood of
More informationStatistical Functions in Excel
Statistical Functions in Excel There are many statistical functions in Excel. Moreover, there are other functions that are not specified as statistical functions that are helpful in some statistical analyses.
More informationPremaster Statistics Tutorial 4 Full solutions
Premaster Statistics Tutorial 4 Full solutions Regression analysis Q1 (based on Doane & Seward, 4/E, 12.7) a. Interpret the slope of the fitted regression = 125,000 + 150. b. What is the prediction for
More informationCourse Syllabus. Purposes of Course:
Course Syllabus Eco 5385.701 Predictive Analytics for Economists Summer 2014 TTh 6:00 8:50 pm and Sat. 12:00 2:50 pm First Day of Class: Tuesday, June 3 Last Day of Class: Tuesday, July 1 251 Maguire Building
More information