Pattern-Aided Regression Modelling and Prediction Model Analysis
San Jose State University
SJSU ScholarWorks
Master's Projects, Master's Theses and Graduate Research
Fall 2015
Pattern-Aided Regression Modelling and Prediction Model Analysis
Naresh Avva
Recommended Citation: Avva, Naresh, "Pattern-Aided Regression Modelling and Prediction Model Analysis" (2015). Master's Projects. Paper 441.
This Master's Project is brought to you for free and open access by the Master's Theses and Graduate Research at SJSU ScholarWorks. It has been accepted for inclusion in Master's Projects by an authorized administrator of SJSU ScholarWorks.
Pattern-Aided Regression Modelling and Prediction Model Analysis
A Project Presented to The Faculty of the Department of Computer Science, San Jose State University
In Partial Fulfillment of the Requirements for the Degree Master of Science
by Naresh Avva
Dec 2015
© 2015 Naresh Avva ALL RIGHTS RESERVED
The Designated Project Committee Approves the Project Titled
Pattern-Aided Regression Modelling and Prediction Model Analysis
by Naresh Avva
APPROVED FOR THE DEPARTMENT OF COMPUTER SCIENCE
SAN JOSE STATE UNIVERSITY
Dec 2015
Dr. Tsau Young Lin, Department of Computer Science
Dr. Robert Chun, Department of Computer Science
Mr. Prasad Kopanati, Advisor at ManyShip Inc.
ABSTRACT
Pattern-Aided Regression Modelling and Prediction Model Analysis
by Naresh Avva
In this research, we develop an application for generating a pattern-aided regression (PXR) model, a new type of regression model designed to be both accurate and interpretable. Our goal is to generate a PXR model using the Contrast Pattern Aided Regression (CPXR) method and compare it with multiple linear regression. The PXR models built by CPXR are generally very accurate, often outperforming state-of-the-art regression methods by large margins. CPXR is especially effective for high-dimensional data. We use pruning to improve classification accuracy and to remove outliers from the dataset. We provide implementation details and give experimental results. Finally, we show that the system is practical and compares favorably with other available methods.
ACKNOWLEDGEMENTS
First and foremost, I would like to take this opportunity to thank my project advisor, Dr. Tsau Young Lin, for his constant guidance and his trust in me. This project would not have been possible without his contribution throughout. I would also like to thank my committee members, Dr. Robert Chun and Mr. Prasad Kopanati, for their invaluable advice and crucial comments during project development. Furthermore, I would like to thank my parents for always being there during my master's program. Finally, I would like to thank all my friends for their support throughout the completion of this project.
Table of Contents
1 INTRODUCTION
2 BACKGROUND
2.1 PATTERN RECOGNITION
2.2 PATTERN MATCHING
2.3 DATA MINING
2.4 MACHINE LEARNING
3 PATTERN-AIDED REGRESSION MODELLING AND PREDICTIVE MODELLING ANALYSIS
3.1 ALGORITHM
3.2 FLOW DIAGRAM
3.3 USE CASE DIAGRAM
3.4 CLASS DIAGRAM
3.5 SEQUENCE DIAGRAM
3.6 ENTITY-RELATIONSHIP DIAGRAM
3.7 MODULES
3.7.1 Data loading and preprocessing
3.7.2 Predict rating
3.7.3 Root mean square
3.7.4 Compare accuracy
3.7.5 Performance analysis
3.7.6 Support Values
4 SYSTEM REQUIREMENTS
4.1 SOFTWARE REQUIREMENTS
4.2 HARDWARE REQUIREMENTS
5 SOFTWARE DESCRIPTION
5.1 JAVA
5.2 NET BEANS
5.3 WAMP SERVER
5.4 MYSQL
5.5 PLATFORMS AND INTERFACES
6 EXPERIMENTS AND RESULTS
7 CONCLUSION
8 FUTURE WORK
LIST OF REFERENCES
APPENDIX
ADDITIONAL SCREEN-SHOTS
LIST OF TABLES
TABLE 1. ACRONYMS [1]
TABLE 2. RESULTS

LIST OF FIGURES
FIGURE 1. CPXR ALGORITHM [1]
FIGURE 2. THE ITERATIMP(CPS, PS) FUNCTION [1]
FIGURE 3. FLOW DIAGRAM
FIGURE 4. USE CASE DIAGRAM
FIGURE 5. CLASS DIAGRAM
FIGURE 6. SEQUENCE DIAGRAM
FIGURE 7. ENTITY RELATIONSHIP DIAGRAM
FIGURE 8. DATA LOADING & PREPROCESSING
FIGURE 9. PREDICT RATING
FIGURE 10. ROOT MEAN SQUARE
FIGURE 11. COMPARE ACCURACY
FIGURE 12. PERFORMANCE ANALYSIS
FIGURE 13. SUPPORT VALUES
FIGURE 14. APPLICATION START FRAME
FIGURE 15. DATA SELECTION FRAME 1
FIGURE 16. DATA SELECTION FRAME 2
FIGURE 17. FILE CONTENT VIEW FRAME
FIGURE 18. FILE PREPROCESS FRAME 1
FIGURE 19. FILE PREPROCESS FRAME 2
FIGURE 20. FILE UPLOAD FRAME 1
FIGURE 21. FILE UPLOAD FRAME 2
FIGURE 22. FIND WEIGHT VALUES FRAME
FIGURE 23. FIND PREDICT RATING FRAME
FIGURE 24. FIND MEAN SQUARE ERROR FRAME
FIGURE 25. FIND ROOT MEAN SQUARE ERROR AND MEAN VALUE FRAME
FIGURE 26. FIND CLASSIFICATION FRAME
FIGURE 27. MAXIMUM PROBABILITY FRAME
FIGURE 28. MAXIMUM PROBABILITY DETAILS FRAME
FIGURE 29. FIND PREDICT RATING FOR MAXIMUM VALUES FRAME
FIGURE 30. FIND MEAN SQUARE ERROR FOR MAXIMUM VALUES FRAME
FIGURE 31. FIND ROOT MEAN SQUARE FOR MAXIMUM VALUES FRAME
FIGURE 32. FIND MEAN VALUE IN RMSE FOR MAXIMUM VALUES FRAME
FIGURE 33. FIND CLASSIFICATION FOR MAXIMUM VALUES FRAME
FIGURE 34. FIND DROP IN ACCURACY FOR ATTRIBUTES IN RMSE AND RMSE WITH SUPPORT COUNT VALUES FRAME 1
FIGURE 35. FIND DROP IN ACCURACY FOR ATTRIBUTES IN RMSE AND RMSE WITH SUPPORT COUNT VALUES FRAME 2
FIGURE 36. FIND COMPARISON BETWEEN RMSE AND RMSE WITH SUPPORT COUNT VALUES FRAME
FIGURE 37. COMPARISON GRAPH BETWEEN RMSE & RMSE WITH SUPPORT COUNT VALUES FRAME
ACRONYMS
Table 1: Acronyms [1]
CHAPTER 1
Introduction
The contrast pattern aided regression (CPXR) method is a novel, robust, and powerful regression-based method for building prediction models [1]. CPXR provides high accuracy and handles diverse predictor-response relationships [1]. Prediction models generated by CPXR are also easier to interpret than artificial neural network models [1]. The results of many experiments suggest that models built by CPXR are more accurate than those built by other methods [1]. CPXR also outperformed other classifiers when applied in other areas of research [1].

The key idea of CPXR is to use a pattern, a conjunction of conditions on a small number of predictor variables, as a logical characterization of a subgroup of the data, and a local regression model (corresponding to the pattern) as a behavioral characterization of the predictor-response relationship for the data instances in that subgroup [1]. By pairing a pattern with a local regression model, CPXR can represent a distinct predictor-response relationship for each subgroup of data, which is what makes it a powerful method [1]. CPXR's ability to select a highly collaborative set of a small number of patterns that maximizes the overall combined prediction accuracy is another main reason it outperforms many other methods [1]. The CPXR algorithm generates a pattern-aided logistic regression model represented by a set of patterns and their associated logistic regression models [1]. Prediction models generated by CPXR are clear and simple to understand. CPXR can be used in a variety of areas, such as clinical applications, because of its ability to handle data with diverse predictor-response relationships efficiently [1].

The application designed for this research implements the functionality of CPXR using statistical formulas such as root mean square error and mean calculation. The application divides the patterns into two groups, Large Errors and Small Errors, and then mines the Large Errors group to search for contrast patterns.
CHAPTER 2
Background
This chapter introduces pattern recognition, pattern matching, data mining, and machine learning.

2.1 Pattern Recognition
Pattern recognition is a branch of machine learning that focuses on recognizing patterns and regularities in data [2]. There are two ways to train a pattern recognition system. The first is to provide labeled training data (supervised learning). The second applies when no labeled data is available: different algorithms are then used to discover previously unknown patterns (unsupervised learning) [2].

2.2 Pattern Matching
Pattern matching is the act of checking a given sequence of tokens for the presence of the constituents of some pattern [3]. Unlike pattern recognition, pattern matching requires the match to be exact [3].

2.3 Data Mining
Data mining is the process of finding anomalies, patterns, and correlations within large datasets, which may contain millions of instances, in order to predict outcomes [4]. Using a range of techniques, one can apply this information to boost revenues, cut costs, enhance customer relationships, lower risks, and more [4].

2.4 Machine Learning
Machine learning is a branch of computer science that emerged from the study of pattern recognition and computational learning theory in artificial intelligence [5]. It is concerned with developing algorithms that can learn from data and make predictions on it [5]. The two main areas of machine learning are supervised learning and unsupervised learning [5].
CHAPTER 3
Pattern-Aided Regression Modelling and Predictive Modelling Analysis
CPXR begins by building a baseline regression model on the training dataset DT using multiple linear regression [1]. The instances are then ranked by the errors of this baseline model, and the aggregate error is computed [1]. A cutting point, set at 45 percent of the aggregate error, partitions DT into two collections: LE (Large Errors) and SE (Small Errors) [1]. LE consists of the instances whose errors fall above the cutting point, and SE consists of the rest [1]. Note that 45 percent is an empirical value established by evaluating more than 50 datasets from diverse research fields [1]. An entropy-based binning method is used to discretize the input variables and define items [1]. CPXR then mines the contrast patterns of LE; because these patterns occur more frequently in LE than in SE, they are likely to capture subgroups of data where the baseline regression model produces large prediction errors [1]. Filters are applied to discard patterns that are highly similar to others [1]. A local multiple linear regression model is then built for each remaining contrast pattern [1]. At this step, patterns and local multiple linear regression models that fail to improve prediction accuracy are discarded [1]. CPXR then employs a double (nested) loop to search for an optimal pattern set [1]. In each iteration, one pattern in the pattern set is replaced by another in order to lower the errors [1].
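The LE/SE partition described above can be sketched in Java as follows. This is a minimal illustration under one reading of the text, not the application's actual code: it assumes the instances are sorted by absolute residual, largest first, and assigned to LE until LE accounts for the chosen fraction (0.45) of the aggregate error. The class name `ErrorSplit`, the method `split`, and the parameter `rho` are hypothetical.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Hypothetical helper: partitions training instances into LE and SE by residual size. */
final class ErrorSplit {
    /** Indices of instances assigned to the Large Errors group. */
    final List<Integer> largeErrors = new ArrayList<>();
    /** Indices of instances assigned to the Small Errors group. */
    final List<Integer> smallErrors = new ArrayList<>();

    /**
     * Assumed rule: walk the instances from largest to smallest absolute residual
     * and keep adding them to LE until LE holds at least rho of the aggregate error.
     */
    static ErrorSplit split(double[] absResiduals, double rho) {
        double total = 0.0;
        for (double r : absResiduals) {
            total += r;
        }

        // Instance indices ordered by descending absolute residual.
        List<Integer> order = new ArrayList<>();
        for (int i = 0; i < absResiduals.length; i++) {
            order.add(i);
        }
        order.sort(Comparator.comparingDouble((Integer i) -> absResiduals[i]).reversed());

        ErrorSplit result = new ErrorSplit();
        double cumulative = 0.0;
        for (int idx : order) {
            if (cumulative < rho * total) {
                result.largeErrors.add(idx);
                cumulative += absResiduals[idx];
            } else {
                result.smallErrors.add(idx);
            }
        }
        return result;
    }
}
```

For example, `ErrorSplit.split(residuals, 0.45)` returns the two index lists; contrast patterns would then be mined from the instances listed in `largeErrors`.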
3.1 Algorithm
Figure 1. CPXR Algorithm [1]
Figure 2. The IteratImp(CPS, PS) Function [1]

3.2 Flow Diagram
Figure 3. Flow Diagram

3.3 Use Case Diagram
Figure 4. Use Case Diagram

3.4 Class Diagram
Figure 5. Class Diagram

3.5 Sequence Diagram
Figure 6. Sequence Diagram

3.6 Entity-Relationship Diagram
Figure 7. Entity Relationship Diagram
3.7 Modules
3.7.1 Data loading and preprocessing
This module consists of 4 different phases. The data loading phase lets the user select the data and load it into the application, and the preprocessing phase converts the loaded data into a proper machine-readable format by removing extra spaces from the data [1]. An equivalence class (EC) is a set of contrast patterns partitioned from the total set to avoid redundant pattern processing [1]. It is adequate to deal with just one pattern per EC, because multiple patterns with exactly the same behavior can be counted as one [1]. Furthermore, it can be shown that every EC can be represented by a closed pattern (the longest in the EC) and a set of minimal-generator (MG) patterns (minimal with respect to set inclusion); an EC thus consists of all patterns Q such that Q is a superset of some MG and Q is a subset of the closed pattern of the EC [1].
Figure 8. Data Loading & Preprocessing
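The membership condition for an equivalence class can be written as a short Java check. This is a sketch only, assuming a pattern is represented as a set of item strings; the class `EquivalenceClass` and its methods are hypothetical names, not taken from the application.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical representation of an EC by its closed pattern and minimal generators. */
final class EquivalenceClass {
    private final Set<String> closedPattern;
    private final List<Set<String>> minimalGenerators;

    EquivalenceClass(Set<String> closedPattern, List<Set<String>> minimalGenerators) {
        this.closedPattern = closedPattern;
        this.minimalGenerators = minimalGenerators;
    }

    /** Convenience factory for building an item set from item names. */
    static Set<String> items(String... names) {
        return new HashSet<>(Arrays.asList(names));
    }

    /** Q belongs to the EC if some MG is a subset of Q and Q is a subset of the closed pattern. */
    boolean contains(Set<String> q) {
        if (!closedPattern.containsAll(q)) {
            return false; // Q is not a subset of the closed pattern
        }
        for (Set<String> mg : minimalGenerators) {
            if (q.containsAll(mg)) {
                return true; // Q is a superset of this minimal generator
            }
        }
        return false;
    }
}
```

Because every pattern in an EC behaves identically on the data, a check like this lets the mining step process one representative per EC and skip the rest.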
3.7.2 Predict rating
In the CPXR method, the first step is to divide the dataset D into two collections, LE (Large Error) and SE (Small Error), containing the instances of D on which f (the prediction model) makes large and small prediction errors, respectively [1]. CPXR then searches for a small set of contrast patterns of LE that maximizes the trr (total residual reduction) measure, and uses that set to build a PXR model [1].
Figure 9. Predict Rating
3.7.3 Root mean square
The quality of a prediction model f is generally measured in terms of its prediction residuals [1]. The residual of f on an instance x, written $r_x(f)$, is the difference between the predicted and observed values of the response variable [1]. The most widely used quality measure is RMSE (Root Mean Square Error), where $\mathrm{RMSE}(f) = \sqrt{\frac{1}{|D|}\sum_{x \in D} r_x(f)^2}$ over the instances x of the dataset D [1].
Figure 10. Root Mean Square
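As a small illustration of the RMSE measure, the following Java sketch computes it from arrays of predicted and observed values. The class and method names (`ErrorMetrics`, `rmse`) are hypothetical and are not taken from the application.

```java
/** Hypothetical utility for the RMSE quality measure. */
final class ErrorMetrics {
    /** Root mean square error: square root of the mean squared residual. */
    static double rmse(double[] predicted, double[] observed) {
        if (predicted.length != observed.length || predicted.length == 0) {
            throw new IllegalArgumentException("arrays must be non-empty and of equal length");
        }
        double sumOfSquares = 0.0;
        for (int i = 0; i < predicted.length; i++) {
            double residual = predicted[i] - observed[i];
            sumOfSquares += residual * residual;
        }
        return Math.sqrt(sumOfSquares / predicted.length);
    }
}
```

Squaring the residuals means the sign of the difference does not matter, and large errors weigh more heavily than small ones.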
3.7.4 Compare accuracy
In the Compare Accuracy module, we compare the RMSE reduction achieved by two techniques, CPXR(LP) and CPXR(LL) [1].
- CPXR(LP) is CPXR using LR (Linear Regression) to build baseline models and PL (Piecewise Linear Regression) to build local regression models [1].
- CPXR(LL) is CPXR using LR (Linear Regression) to build baseline models and LR (Linear Regression) to build local regression models [1].
CPXR(LP) attained the highest average RMSE reduction (42.89%), with the reduction exceeding 80% several times during the experiments, and CPXR(LL) was found to be the less accurate of the two [1].
Figure 11. Compare Accuracy
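The text does not spell out how RMSE reduction is computed. The sketch below assumes the usual relative definition, (baseline RMSE minus PXR RMSE) divided by baseline RMSE; that formula and the names `AccuracyComparison` and `rmseReduction` are assumptions made for illustration.

```java
/** Hypothetical helper for comparing a PXR model against its baseline model. */
final class AccuracyComparison {
    /**
     * Relative RMSE reduction of the PXR model over the baseline (assumed definition).
     * A result of 0.4289 corresponds to a reduction of 42.89%.
     */
    static double rmseReduction(double baselineRmse, double pxrRmse) {
        if (baselineRmse <= 0.0) {
            throw new IllegalArgumentException("baseline RMSE must be positive");
        }
        return (baselineRmse - pxrRmse) / baselineRmse;
    }
}
```

Computing this value once for CPXR(LP) and once for CPXR(LL) against the same baseline gives two directly comparable numbers per dataset.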
3.7.5 Performance analysis
CPXR is compared with other regression techniques on several datasets with respect to prediction accuracy, overfitting, and sensitivity to noise [1]. The outcomes of the comparison show that CPXR consistently performs better than the other techniques, and by large margins [1]. We test the effect of parameters and of the baseline regression methods on CPXR's accuracy, computation time, and memory usage [1]. Finally, we review the benefits of using contrast patterns instead of frequent patterns in building PXR models [1].
Figure 12. Performance Analysis
3.7.6 Support Values
Double pruning is a technique that uses support values to improve classification accuracy and discard outliers from the data [1]. In machine learning, pruning reduces the size of the data by discarding the parts that contribute little power to classify instances [1].
Figure 13. Support Values
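The text does not define how support values drive the pruning, so the following Java sketch shows only the basic building block under stated assumptions: the support of a pattern is taken to be the fraction of instances that contain all of its items, and patterns below a minimum support are discarded. The names `SupportPruning`, `support`, `prune`, and `minSupport` are hypothetical, and a single threshold filter is shown rather than the full double-pruning procedure.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical support-based filter over item-set patterns. */
final class SupportPruning {
    /** Convenience factory for building an item set from item names. */
    static Set<String> items(String... names) {
        return new HashSet<>(Arrays.asList(names));
    }

    /** Fraction of instances that contain every item of the pattern. */
    static double support(Set<String> pattern, List<Set<String>> instances) {
        if (instances.isEmpty()) {
            return 0.0;
        }
        int matches = 0;
        for (Set<String> instance : instances) {
            if (instance.containsAll(pattern)) {
                matches++;
            }
        }
        return (double) matches / instances.size();
    }

    /** Keeps only the patterns whose support is at least minSupport. */
    static List<Set<String>> prune(List<Set<String>> patterns,
                                   List<Set<String>> instances,
                                   double minSupport) {
        List<Set<String>> kept = new ArrayList<>();
        for (Set<String> pattern : patterns) {
            if (support(pattern, instances) >= minSupport) {
                kept.add(pattern);
            }
        }
        return kept;
    }
}
```

Patterns matched by very few instances are more likely to reflect outliers, which is one plausible reason a support threshold would help here.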
CHAPTER 4
System Requirements

4.1 Software Requirements
Operating System : Windows
Programming Language : Java
IDE : NetBeans
Database : MySQL

4.2 Hardware Requirements
CPU : Minimum 2.4 GHz
Hard Disk : Minimum 160 GB
RAM : Minimum 2 GB
CHAPTER 5
Software Description

5.1 Java
Java is an object-oriented programming language designed for cross-platform use [6]. Java applications are generally compiled to bytecode (class files) that can run on any Java Virtual Machine (JVM) regardless of computer architecture [6]. Java goes by the phrase "Write once, run anywhere" [6]. Java is widely used for client-server web applications [6].

5.2 NetBeans
NetBeans is a framework for building Java Swing desktop applications [7]. It is designed to promote reusability and to simplify Java application development [7]. The NetBeans IDE (Integrated Development Environment) bundle for Java SE contains everything needed to start developing NetBeans plug-ins and NetBeans Platform based applications; no additional SDK is required [7].

5.3 WAMP Server
WAMP (Windows/Apache/MySQL/PHP) is a web development environment [8]. In WAMP, the operating system is Windows, the web server is Apache, the database is MySQL, and the scripting language is PHP [8]. WampServer comes with phpMyAdmin, a graphical manager for MySQL databases [9].

5.4 MySQL
MySQL is an open-source relational database management system that uses Structured Query Language (SQL) [10]. It is written in C and C++ [10]. It is one of the world's most widely used database systems, valued for its simplicity and rich functionality [10]. Popular applications that use MySQL include Drupal, WordPress, and TYPO3 [10].

5.5 Platforms and interfaces
Many programming languages provide language-specific APIs and libraries for accessing databases [11]. For Java, this includes the Java Database Connectivity (JDBC) API and its drivers [11]. JDBC provides methods to query and update data in a database [11]. A JDBC-to-ODBC bridge enables connections from the Java Virtual Machine (JVM) to any ODBC-accessible data source [11].
CHAPTER 6
Experiments and Results
The application has been trained and tested with cancer and student datasets. The objective of the research behind this application is to analyze the contrast patterns in a dataset and produce an accurate and interpretable PXR model [1]. For the cancer dataset, the aim is to find the patterns that indicate a high probability of cancer; for the student dataset, the goal is to predict students' performance in secondary education. The cancer dataset consists of 16 attributes and 1660 instances. The student dataset consists of 33 attributes, with 649 instances for Student-por (students who took the Portuguese language class) and 395 instances for Student-mat (students who took the Math class). The application uses the statistical functions and formulas proposed in the research paper to achieve high performance. The Results table shows the application's performance in terms of dataset size, execution time, and memory usage.

Table 2. Results
Dataset | Number of Instances | Number of attributes | Execution Time (minutes) | Memory (MB)
Cancer | 1660 | 16 | |
Student-por | 649 | 33 | |
Student-mat | 395 | 33 | |
CHAPTER 7
Conclusion
This research allowed us to successfully implement the novel CPXR method in an application, using statistical functions and formulas coupled with pruning, to build a PXR model that is more accurate and interpretable than those of other methods. The experimental results suggest that the CPXR method predicted output variables better than traditional PXR generation techniques in both the training and testing phases of the application. In general, CPXR can efficiently deal with data having diverse predictor-response relationships.
CHAPTER 8
Future Work
The application can be extended and tested with other classification and regression datasets. It can be adapted to work with medical data. It can also be extended to extract patterns from images and then proceed to a pattern recognition phase. In the future, the application could be combined with a retina or fingerprint scanner, storing the scans as patterns and then running the algorithm to recognize them, which would help in biometric security. Larger datasets can be used to test the application and its performance.
LIST OF REFERENCES
[1] G. Dong and V. Taslimitehrani, 'Pattern-Aided Regression Modeling and Prediction Model Analysis', IEEE Trans. Knowl. Data Eng., vol. 27, no. 9, pp ,
[2] Wikipedia, 'Pattern recognition', [Online]. Available: [Accessed: 23-Nov-2015].
[3] Wikipedia, 'Pattern matching', [Online]. Available: [Accessed: 23-Nov-2015].
[4] Sas.com, 'What is data mining?', [Online]. Available: N15dnwFqjJ7Y-WgAaE6Bjq_rnDuxW1NOQJhoCmDjw_wcB. [Accessed: 23-Nov-2015].
[5] Wikipedia, 'Machine learning', [Online]. Available: [Accessed: 23-Nov-2015].
[6] Wikipedia, 'Java (programming language)', [Online]. Available: [Accessed: 23-Nov-2015].
[7] Wikipedia, 'NetBeans', [Online]. Available: [Accessed: 23-Nov-2015].
[8] Webopedia.com, 'What is WAMP? Webopedia', [Online]. Available: [Accessed: 23-Nov-2015].
[9] Softonic, 'WampServer', [Online]. Available: [Accessed: 23-Nov-2015].
[10] Wikipedia, 'MySQL', [Online]. Available: [Accessed: 23-Nov-2015].
[11] Wikipedia, 'Java Database Connectivity', [Online]. Available: [Accessed: 23-Nov-2015].
APPENDIX
Additional Screen-shots
Figure 14. Application Start Frame
Figure 15. Data Selection Frame 1
Figure 16. Data Selection Frame 2
Figure 17. File Content View Frame
Figure 18. File Preprocess Frame 1
Figure 19. File Preprocess Frame 2
Figure 20. File Upload Frame 1
Figure 21. File Upload Frame 2
Figure 22. Find Weight Values Frame
Figure 23. Find Predict Rating Frame
Figure 24. Find Mean Square Error Frame
Figure 25. Find Root Mean Square Error and Mean Value Frame
Figure 26. Find Classification Frame
Figure 27. Maximum Probability Frame
Figure 28. Maximum Probability Details Frame
Figure 29. Find Predict Rating for Maximum Values Frame
Figure 30. Find Mean Square Error for Maximum Values Frame
Figure 31. Find Root Mean Square for Maximum Values Frame
Figure 32. Find Mean Value in RMSE for Maximum Values Frame
Figure 33. Find Classification for Maximum Values Frame
Figure 34. Find Drop in Accuracy for Attributes in RMSE and RMSE with Support Count Values Frame 1
Figure 35. Find Drop in Accuracy for Attributes in RMSE and RMSE with Support Count Values Frame 2
Figure 36. Find Comparison between RMSE and RMSE with Support Count Values Frame
Figure 37. Comparison Graph between RMSE & RMSE with Support Count Values Frame
International Journal of Computer Science Trends and Technology (IJCST) Volume 2 Issue 3, May-Jun 2014
RESEARCH ARTICLE OPEN ACCESS A Survey of Data Mining: Concepts with Applications and its Future Scope Dr. Zubair Khan 1, Ashish Kumar 2, Sunny Kumar 3 M.Tech Research Scholar 2. Department of Computer
CiteSeer x in the Cloud
Published in the 2nd USENIX Workshop on Hot Topics in Cloud Computing 2010 CiteSeer x in the Cloud Pradeep B. Teregowda Pennsylvania State University C. Lee Giles Pennsylvania State University Bhuvan Urgaonkar
International Journal of Computer Trends and Technology (IJCTT) volume 4 Issue 8 August 2013
A Short-Term Traffic Prediction On A Distributed Network Using Multiple Regression Equation Ms.Sharmi.S 1 Research Scholar, MS University,Thirunelvelli Dr.M.Punithavalli Director, SREC,Coimbatore. Abstract:
Knowledge Discovery from patents using KMX Text Analytics
Knowledge Discovery from patents using KMX Text Analytics Dr. Anton Heijs [email protected] Treparel Abstract In this white paper we discuss how the KMX technology of Treparel can help searchers
CREDIT CARD FRAUD DETECTION SYSTEM USING GENETIC ALGORITHM
CREDIT CARD FRAUD DETECTION SYSTEM USING GENETIC ALGORITHM ABSTRACT: Due to the rise and rapid growth of E-Commerce, use of credit cards for online purchases has dramatically increased and it caused an
Data Mining. Nonlinear Classification
Data Mining Unit # 6 Sajjad Haider Fall 2014 1 Nonlinear Classification Classes may not be separable by a linear boundary Suppose we randomly generate a data set as follows: X has range between 0 to 15
Using Data Mining for Mobile Communication Clustering and Characterization
Using Data Mining for Mobile Communication Clustering and Characterization A. Bascacov *, C. Cernazanu ** and M. Marcu ** * Lasting Software, Timisoara, Romania ** Politehnica University of Timisoara/Computer
Advanced analytics at your hands
2.3 Advanced analytics at your hands Neural Designer is the most powerful predictive analytics software. It uses innovative neural networks techniques to provide data scientists with results in a way previously
DATA MINING TOOL FOR INTEGRATED COMPLAINT MANAGEMENT SYSTEM WEKA 3.6.7
DATA MINING TOOL FOR INTEGRATED COMPLAINT MANAGEMENT SYSTEM WEKA 3.6.7 UNDER THE GUIDANCE Dr. N.P. DHAVALE, DGM, INFINET Department SUBMITTED TO INSTITUTE FOR DEVELOPMENT AND RESEARCH IN BANKING TECHNOLOGY
Machine Learning using MapReduce
Machine Learning using MapReduce What is Machine Learning Machine learning is a subfield of artificial intelligence concerned with techniques that allow computers to improve their outputs based on previous
FRAUD DETECTION IN ELECTRIC POWER DISTRIBUTION NETWORKS USING AN ANN-BASED KNOWLEDGE-DISCOVERY PROCESS
FRAUD DETECTION IN ELECTRIC POWER DISTRIBUTION NETWORKS USING AN ANN-BASED KNOWLEDGE-DISCOVERY PROCESS Breno C. Costa, Bruno. L. A. Alberto, André M. Portela, W. Maduro, Esdras O. Eler PDITec, Belo Horizonte,
Social Media Mining. Data Mining Essentials
Introduction Data production rate has been increased dramatically (Big Data) and we are able store much more data than before E.g., purchase data, social media data, mobile phone data Businesses and customers
BOOSTING - A METHOD FOR IMPROVING THE ACCURACY OF PREDICTIVE MODEL
The Fifth International Conference on e-learning (elearning-2014), 22-23 September 2014, Belgrade, Serbia BOOSTING - A METHOD FOR IMPROVING THE ACCURACY OF PREDICTIVE MODEL SNJEŽANA MILINKOVIĆ University
Is a Data Scientist the New Quant? Stuart Kozola MathWorks
Is a Data Scientist the New Quant? Stuart Kozola MathWorks 2015 The MathWorks, Inc. 1 Facts or information used usually to calculate, analyze, or plan something Information that is produced or stored by
Data quality in Accounting Information Systems
Data quality in Accounting Information Systems Comparing Several Data Mining Techniques Erjon Zoto Department of Statistics and Applied Informatics Faculty of Economy, University of Tirana Tirana, Albania
Azure Machine Learning, SQL Data Mining and R
Azure Machine Learning, SQL Data Mining and R Day-by-day Agenda Prerequisites No formal prerequisites. Basic knowledge of SQL Server Data Tools, Excel and any analytical experience helps. Best of all:
Ensemble Methods. Knowledge Discovery and Data Mining 2 (VU) (707.004) Roman Kern. KTI, TU Graz 2015-03-05
Ensemble Methods Knowledge Discovery and Data Mining 2 (VU) (707004) Roman Kern KTI, TU Graz 2015-03-05 Roman Kern (KTI, TU Graz) Ensemble Methods 2015-03-05 1 / 38 Outline 1 Introduction 2 Classification
Data Mining Techniques Chapter 6: Decision Trees
Data Mining Techniques Chapter 6: Decision Trees What is a classification decision tree?.......................................... 2 Visualizing decision trees...................................................
COURSE RECOMMENDER SYSTEM IN E-LEARNING
International Journal of Computer Science and Communication Vol. 3, No. 1, January-June 2012, pp. 159-164 COURSE RECOMMENDER SYSTEM IN E-LEARNING Sunita B Aher 1, Lobo L.M.R.J. 2 1 M.E. (CSE)-II, Walchand
Distributed Framework for Data Mining As a Service on Private Cloud
RESEARCH ARTICLE OPEN ACCESS Distributed Framework for Data Mining As a Service on Private Cloud Shraddha Masih *, Sanjay Tanwani** *Research Scholar & Associate Professor, School of Computer Science &
Data Mining Part 5. Prediction
Data Mining Part 5. Prediction 5.1 Spring 2010 Instructor: Dr. Masoud Yaghini Outline Classification vs. Numeric Prediction Prediction Process Data Preparation Comparing Prediction Methods References Classification
Classification and Prediction
Classification and Prediction Slides for Data Mining: Concepts and Techniques Chapter 7 Jiawei Han and Micheline Kamber Intelligent Database Systems Research Lab School of Computing Science Simon Fraser
Neural network software tool development: exploring programming language options
INEB- PSI Technical Report 2006-1 Neural network software tool development: exploring programming language options Alexandra Oliveira [email protected] Supervisor: Professor Joaquim Marques de Sá June 2006
Distributed Computing and Big Data: Hadoop and MapReduce
Distributed Computing and Big Data: Hadoop and MapReduce Bill Keenan, Director Terry Heinze, Architect Thomson Reuters Research & Development Agenda R&D Overview Hadoop and MapReduce Overview Use Case:
A Novel Cloud Based Elastic Framework for Big Data Preprocessing
School of Systems Engineering A Novel Cloud Based Elastic Framework for Big Data Preprocessing Omer Dawelbeit and Rachel McCrindle October 21, 2014 University of Reading 2008 www.reading.ac.uk Overview
ON INTEGRATING UNSUPERVISED AND SUPERVISED CLASSIFICATION FOR CREDIT RISK EVALUATION
ISSN 9 X INFORMATION TECHNOLOGY AND CONTROL, 00, Vol., No.A ON INTEGRATING UNSUPERVISED AND SUPERVISED CLASSIFICATION FOR CREDIT RISK EVALUATION Danuta Zakrzewska Institute of Computer Science, Technical
Business Application Services Testing
Business Application Services Testing Curriculum Structure Course name Duration(days) Express 2 Testing Concept and methodologies 3 Introduction to Performance Testing 3 Web Testing 2 QTP 5 SQL 5 Load
Bringing Big Data Modelling into the Hands of Domain Experts
Bringing Big Data Modelling into the Hands of Domain Experts David Willingham Senior Application Engineer MathWorks [email protected] 2015 The MathWorks, Inc. 1 Data is the sword of the
Chapter 12 Discovering New Knowledge Data Mining
Chapter 12 Discovering New Knowledge Data Mining Becerra-Fernandez, et al. -- Knowledge Management 1/e -- 2004 Prentice Hall Additional material 2007 Dekai Wu Chapter Objectives Introduce the student to
A Review of Anomaly Detection Techniques in Network Intrusion Detection System
A Review of Anomaly Detection Techniques in Network Intrusion Detection System Dr.D.V.S.S.Subrahmanyam Professor, Dept. of CSE, Sreyas Institute of Engineering & Technology, Hyderabad, India ABSTRACT:In
Fast Analytics on Big Data with H20
Fast Analytics on Big Data with H20 0xdata.com, h2o.ai Tomas Nykodym, Petr Maj Team About H2O and 0xdata H2O is a platform for distributed in memory predictive analytics and machine learning Pure Java,
How To Identify A Churner
2012 45th Hawaii International Conference on System Sciences A New Ensemble Model for Efficient Churn Prediction in Mobile Telecommunication Namhyoung Kim, Jaewook Lee Department of Industrial and Management
How To Use Neural Networks In Data Mining
International Journal of Electronics and Computer Science Engineering 1449 Available Online at www.ijecse.org ISSN- 2277-1956 Neural Networks in Data Mining Priyanka Gaur Department of Information and
DATA MINING TECHNIQUES AND APPLICATIONS
DATA MINING TECHNIQUES AND APPLICATIONS Mrs. Bharati M. Ramageri, Lecturer Modern Institute of Information Technology and Research, Department of Computer Application, Yamunanagar, Nigdi Pune, Maharashtra,
Research Article Traffic Analyzing and Controlling using Supervised Parametric Clustering in Heterogeneous Network. Coimbatore, Tamil Nadu, India
Research Journal of Applied Sciences, Engineering and Technology 11(5): 473-479, 215 ISSN: 24-7459; e-issn: 24-7467 215, Maxwell Scientific Publication Corp. Submitted: March 14, 215 Accepted: April 1,
Data Mining - Evaluation of Classifiers
Data Mining - Evaluation of Classifiers Lecturer: JERZY STEFANOWSKI Institute of Computing Sciences Poznan University of Technology Poznan, Poland Lecture 4 SE Master Course 2008/2009 revised for 2010
A Review of Data Mining Techniques
Available Online at www.ijcsmc.com International Journal of Computer Science and Mobile Computing A Monthly Journal of Computer Science and Information Technology IJCSMC, Vol. 3, Issue. 4, April 2014,
Knowledge Discovery and Data Mining
Knowledge Discovery and Data Mining Unit # 11 Sajjad Haider Fall 2013 1 Supervised Learning Process Data Collection/Preparation Data Cleaning Discretization Supervised/Unuspervised Identification of right
An Overview of Knowledge Discovery Database and Data mining Techniques
An Overview of Knowledge Discovery Database and Data mining Techniques Priyadharsini.C 1, Dr. Antony Selvadoss Thanamani 2 M.Phil, Department of Computer Science, NGM College, Pollachi, Coimbatore, Tamilnadu,
Analysis Tools and Libraries for BigData
+ Analysis Tools and Libraries for BigData Lecture 02 Abhijit Bendale + Office Hours 2 n Terry Boult (Waiting to Confirm) n Abhijit Bendale (Tue 2:45 to 4:45 pm). Best if you email me in advance, but I
Making Sense of the Mayhem: Machine Learning and March Madness
Making Sense of the Mayhem: Machine Learning and March Madness Alex Tran and Adam Ginzberg Stanford University [email protected] [email protected] I. Introduction III. Model The goal of our research
Foundations of Business Intelligence: Databases and Information Management
Foundations of Business Intelligence: Databases and Information Management Problem: HP's numerous systems unable to deliver the information needed for a complete picture of business operations, lack of
Spark: Cluster Computing with Working Sets
Spark: Cluster Computing with Working Sets Outline Why? Mesos Resilient Distributed Dataset Spark & Scala Examples Uses Why? MapReduce deficiencies: Standard Dataflows are Acyclic Prevents Iterative Jobs
An Introduction to Data Mining. Big Data World. Related Fields and Disciplines. What is Data Mining? 2/12/2015
An Introduction to Data Mining for Wind Power Management Spring 2015 Big Data World Every minute: Google receives over 4 million search queries Facebook users share almost 2.5 million pieces of content
Prof. Pietro Ducange Students Tutor and Practical Classes Course of Business Intelligence 2014 http://www.iet.unipi.it/p.ducange/esercitazionibi/
Prof. Pietro Ducange Students Tutor and Practical Classes Course of Business Intelligence 2014 http://www.iet.unipi.it/p.ducange/esercitazionibi/ Email: [email protected] Office: Dipartimento di Ingegneria
Practical Data Science with Azure Machine Learning, SQL Data Mining, and R
Practical Data Science with Azure Machine Learning, SQL Data Mining, and R Overview This 4-day class is the first of the two data science courses taught by Rafal Lukawiecki. Some of the topics will be
Dynamic Resource allocation in Cloud
Dynamic Resource allocation in Cloud ABSTRACT: Cloud computing allows business customers to scale up and down their resource usage based on needs. Many of the touted gains in the cloud model come from
Advanced Ensemble Strategies for Polynomial Models
Advanced Ensemble Strategies for Polynomial Models Pavel Kordík 1, Jan Černý 2 1 Dept. of Computer Science, Faculty of Information Technology, Czech Technical University in Prague, 2 Dept. of Computer
SPATIAL DATA CLASSIFICATION AND DATA MINING
, pp.-40-44. Available online at http://www.bioinfo.in/contents.php?id=42 SPATIAL DATA CLASSIFICATION AND DATA MINING RATHI J.B. * AND PATIL A.D. Department of Computer Science & Engineering, Jawaharlal
Prerequisites. Course Outline
MS-55040: Data Mining, Predictive Analytics with Microsoft Analysis Services and Excel PowerPivot Description This three-day instructor-led course will introduce the students to the concepts of data mining,
How To Predict Web Site Visits
Web Site Visit Forecasting Using Data Mining Techniques Chandana Napagoda Abstract: Data mining is a technique which is used for identifying relationships between various large amounts of data in many
Predicting the Risk of Heart Attacks using Neural Network and Decision Tree
Predicting the Risk of Heart Attacks using Neural Network and Decision Tree S.Florence 1, N.G.Bhuvaneswari Amma 2, G.Annapoorani 3, K.Malathi 4 PG Scholar, Indian Institute of Information Technology, Srirangam,
Knowledge Discovery and Data Mining
Knowledge Discovery and Data Mining Unit # 6 Sajjad Haider Fall 2014 1 Evaluating the Accuracy of a Classifier Holdout, random subsampling, crossvalidation, and the bootstrap are common techniques for
Customer Classification And Prediction Based On Data Mining Technique
Customer Classification And Prediction Based On Data Mining Technique Ms. Neethu Baby 1, Mrs. Priyanka L.T 2 1 M.E CSE, Sri Shakthi Institute of Engineering and Technology, Coimbatore 2 Assistant Professor
Big Data Analytics. An Introduction. Oliver Fuchsberger University of Paderborn 2014
Big Data Analytics An Introduction Oliver Fuchsberger University of Paderborn 2014 Table of Contents I. Introduction & Motivation What is Big Data Analytics? Why is it so important? II. Techniques & Solutions
BIDM Project. Predicting the contract type for IT/ITES outsourcing contracts
BIDM Project Predicting the contract type for IT/ITES outsourcing contracts Nandini Govindarajan (61210556) The authors believe that data modelling can be used to predict if an
Welcome. Data Mining: Updates in Technologies. Xindong Wu. Colorado School of Mines Golden, Colorado 80401, USA
Welcome Xindong Wu Data Mining: Updates in Technologies Dept of Math and Computer Science Colorado School of Mines Golden, Colorado 80401, USA Email: xwu@mines.edu Home Page: http://kais.mines.edu/~xwu/
Application of Predictive Analytics for Better Alignment of Business and IT
Application of Predictive Analytics for Better Alignment of Business and IT Boris Zibitsker, PhD [email protected] July 25, 2014 Big Data Summit - Riga, Latvia About the Presenter Boris Zibitsker
Data Mining Practical Machine Learning Tools and Techniques
Ensemble learning Data Mining Practical Machine Learning Tools and Techniques Slides for Chapter 8 of Data Mining by I. H. Witten, E. Frank and M. A. Hall Combining multiple models Bagging The basic idea
ASSOCIATION RULE MINING ON WEB LOGS FOR EXTRACTING INTERESTING PATTERNS THROUGH WEKA TOOL
International Journal Of Advanced Technology In Engineering And Science Www.Ijates.Com Volume No 03, Special Issue No. 01, February 2015 ISSN (Online): 2348-7550 ASSOCIATION RULE MINING ON WEB LOGS FOR
Server Load Prediction
Server Load Prediction Suthee Chaidaroon ([email protected]) Joon Yeong Kim ([email protected]) Jonghan Seo ([email protected]) Abstract Estimating server load average is one of the methods that
Web Document Clustering
Web Document Clustering Lab Project based on the MDL clustering suite http://www.cs.ccsu.edu/~markov/mdlclustering/ Zdravko Markov Computer Science Department Central Connecticut State University New Britain,
131-1. Adding New Level in KDD to Make the Web Usage Mining More Efficient. Abstract. 1. Introduction [1]. 1/10
1/10 131-1 Adding New Level in KDD to Make the Web Usage Mining More Efficient Mohammad Ala'a AL_Hamami PHD Student, Lecturer m_ah_1@yahoo.com Soukaena Hassan Hashem PHD Student, Lecturer soukaena_hassan@yahoo.com
Predicting the Stock Market with News Articles
Predicting the Stock Market with News Articles Kari Lee and Ryan Timmons CS224N Final Project Introduction Stock market prediction is an area of extreme importance to an entire industry. Stock price is
Extend Table Lens for High-Dimensional Data Visualization and Classification Mining
Extend Table Lens for High-Dimensional Data Visualization and Classification Mining CPSC 533c, Information Visualization Course Project, Term 2 2003 Fengdong Du [email protected] University of British Columbia
Machine Learning Capacity and Performance Analysis and R
Machine Learning and R May 3, 11
EFFICIENT DATA PRE-PROCESSING FOR DATA MINING
EFFICIENT DATA PRE-PROCESSING FOR DATA MINING USING NEURAL NETWORKS JothiKumar.R 1, Sivabalan.R.V 2 1 Research scholar, Noorul Islam University, Nagercoil, India Assistant Professor, Adhiparasakthi College
Performance Analysis of Web based Applications on Single and Multi Core Servers
Performance Analysis of Web based Applications on Single and Multi Core Servers Gitika Khare, Diptikant Pathy, Alpana Rajan, Alok Jain, Anil Rawat Raja Ramanna Centre for Advanced Technology Department
Student Attendance Through Mobile Devices
Student Attendance Through Mobile Devices Anurag Rastogi Kirti Gupta Department of Computer Science and Engineering National Institute of Technology Rourkela Rourkela-769 008, Odisha, India Student Attendance
Database Marketing, Business Intelligence and Knowledge Discovery
Database Marketing, Business Intelligence and Knowledge Discovery Note: Using material from Tan / Steinbach / Kumar (2005) Introduction to Data Mining,, Addison Wesley; and Cios / Pedrycz / Swiniarski
CMS Query Suite. CS4440 Project Proposal. Chris Baker Michael Cook Soumo Gorai
CMS Query Suite CS4440 Project Proposal Chris Baker Michael Cook Soumo Gorai I) Motivation Relational databases are great places to efficiently store large amounts of information. However, information
8. Machine Learning Applied Artificial Intelligence
8. Machine Learning Applied Artificial Intelligence Prof. Dr. Bernhard Humm Faculty of Computer Science Hochschule Darmstadt University of Applied Sciences 1 Retrospective Natural Language Processing Name
IT services for analyses of various data samples
IT services for analyses of various data samples Ján Paralič, František Babič, Martin Sarnovský, Peter Butka, Cecília Havrilová, Miroslava Muchová, Michal Puheim, Martin Mikula, Gabriel Tutoky Technical
A Web-based Interactive Data Visualization System for Outlier Subspace Analysis
A Web-based Interactive Data Visualization System for Outlier Subspace Analysis Dong Liu, Qigang Gao Computer Science Dalhousie University Halifax, NS, B3H 1W5 Canada [email protected] [email protected] Hai
A NEW DECISION TREE METHOD FOR DATA MINING IN MEDICINE
A NEW DECISION TREE METHOD FOR DATA MINING IN MEDICINE Kasra Madadipouya 1 1 Department of Computing and Science, Asia Pacific University of Technology & Innovation ABSTRACT Today, enormous amount of data
Analysis of WEKA Data Mining Algorithm REPTree, Simple Cart and RandomTree for Classification of Indian News
Analysis of WEKA Data Mining Algorithm REPTree, Simple Cart and RandomTree for Classification of Indian News Sushilkumar Kalmegh Associate Professor, Department of Computer Science, Sant Gadge Baba Amravati
Classification of Titanic Passenger Data and Chances of Surviving the Disaster Data Mining with Weka and Kaggle Competition Data
Proceedings of Student-Faculty Research Day, CSIS, Pace University, May 2nd, 2014 Classification of Titanic Passenger Data and Chances of Surviving the Disaster Data Mining with Weka and Kaggle Competition
SOA, case Google. Faculty of technology management 07.12.2009 Information Technology Service Oriented Communications CT30A8901.
Faculty of technology management 07.12.2009 Information Technology Service Oriented Communications CT30A8901 SOA, case Google Written by: Sampo Syrjäläinen, 0337918 Jukka Hilvonen, 0337840 1 Contents 1.
ORGANIZATIONAL KNOWLEDGE MAPPING BASED ON LIBRARY INFORMATION SYSTEM
ORGANIZATIONAL KNOWLEDGE MAPPING BASED ON LIBRARY INFORMATION SYSTEM IRANDOC CASE STUDY Ammar Jalalimanesh a,*, Elaheh Homayounvala a a Information engineering department, Iranian Research Institute for
Configuring Apache Derby for Performance and Durability Olav Sandstå
Configuring Apache Derby for Performance and Durability Olav Sandstå Database Technology Group Sun Microsystems Trondheim, Norway Overview Background > Transactions, Failure Classes, Derby Architecture
Volume 4, Issue 1, January 2016 International Journal of Advance Research in Computer Science and Management Studies
Volume 4, Issue 1, January 2016 International Journal of Advance Research in Computer Science and Management Studies Research Article / Survey Paper / Case Study Available online at: www.ijarcsms.com Spam
Tackling Big Data with MATLAB Adam Filion Application Engineer MathWorks, Inc.
Tackling Big Data with MATLAB Adam Filion Application Engineer MathWorks, Inc. 2015 The MathWorks, Inc. 1 Challenges of Big Data Any collection of data sets so large and complex that it becomes difficult
Apache Hama Design Document v0.6
Apache Hama Design Document v0.6 Introduction Hama Architecture BSPMaster GroomServer Zookeeper BSP Task Execution Job Submission Job and Task Scheduling Task Execution Lifecycle Synchronization Fault
1 File Processing Systems
COMP 378 Database Systems Notes for Chapter 1 of Database System Concepts Introduction A database management system (DBMS) is a collection of data and an integrated set of programs that access that data.
Data Mining for Business Intelligence. Concepts, Techniques, and Applications in Microsoft Office Excel with XLMiner. 2nd Edition
Brochure More information from http://www.researchandmarkets.com/reports/2170926/ Data Mining for Business Intelligence. Concepts, Techniques, and Applications in Microsoft Office Excel with XLMiner. 2nd
How To Make A Credit Risk Model For A Bank Account
TRANSACTIONAL DATA MINING AT LLOYDS BANKING GROUP Csaba Főző [email protected] 15 October 2015 CONTENTS Introduction 04 Random Forest Methodology 06 Transactional Data Mining Project 17 Conclusions
Enhanced Boosted Trees Technique for Customer Churn Prediction Model
IOSR Journal of Engineering (IOSRJEN) ISSN (e): 2250-3021, ISSN (p): 2278-8719 Vol. 04, Issue 03 (March. 2014), V5 PP 41-45 www.iosrjen.org Enhanced Boosted Trees Technique for Customer Churn Prediction
Understanding the Benefits of IBM SPSS Statistics Server
IBM SPSS Statistics Server Understanding the Benefits of IBM SPSS Statistics Server Contents: 1 Introduction 2 Performance 101: Understanding the drivers of better performance 3 Why performance is faster
How To Cluster
Data Clustering Dec 2nd, 2013 Kyrylo Bessonov Talk outline Introduction to clustering Types of clustering Supervised Unsupervised Similarity measures Main clustering algorithms k-means Hierarchical Main
How can we discover stocks that will
Algorithmic Trading Strategy Based On Massive Data Mining Haoming Li, Zhijun Yang and Tianlun Li Stanford University Abstract We believe that there is useful information hiding behind the noisy and massive
Unit 2 Research Project. Eddie S. Jackson. Kaplan University. IT530: Computer Networks. Dr. Thomas Watts, PhD, CISSP
Running head: UNIT 2 RESEARCH PROJECT 1 Unit 2 Research Project Eddie S. Jackson Kaplan University IT530: Computer Networks Dr. Thomas Watts, PhD, CISSP 08/19/2014 UNIT 2 RESEARCH PROJECT 2 Abstract Application
How To Perform Predictive Analysis On Your Web Analytics Data In R 2.5
How to perform predictive analysis on your web analytics tool data June 19th, 2013 FREE Webinar by Before we start... www Q & A? Our speakers Carolina Araripe Inbound Marketing Strategist @Tatvic http://linkd.in/yazvvn
FileMaker 11. ODBC and JDBC Guide
FileMaker 11 ODBC and JDBC Guide 2004-2010 FileMaker, Inc. All Rights Reserved. FileMaker, Inc. 5201 Patrick Henry Drive Santa Clara, California 95054 FileMaker is a trademark of FileMaker, Inc. registered
Comparing the Results of Support Vector Machines with Traditional Data Mining Algorithms
Comparing the Results of Support Vector Machines with Traditional Data Mining Algorithms Scott Pion and Lutz Hamel Abstract This paper presents the results of a series of analyses performed on direct mail
Summer Internship 2013 Group No.4-Enhancement of JMeter Week 1-Report-1 27/5/2013 Naman Choudhary
Summer Internship 2013 Group No.4-Enhancement of JMeter Week 1-Report-1 27/5/2013 Naman Choudhary For the first week I was given two papers to study. The first one was Web Service Testing Tools: A Comparative
Knowledge Discovery and Data Mining. Bootstrap review. Bagging Important Concepts. Notes. Lecture 19 - Bagging. Tom Kelsey. Notes
Knowledge Discovery and Data Mining Lecture 19 - Bagging Tom Kelsey School of Computer Science University of St Andrews http://tom.host.cs.st-andrews.ac.uk [email protected] Tom Kelsey ID5059-19-B &
