DEA implementation and clustering analysis using the K-Means algorithm
Data Mining VI 321

C. A. A. Lemos, M. P. E. Lins & N. F. F. Ebecken
COPPE/Universidade Federal do Rio de Janeiro, Brazil

Abstract
Nowadays, problems that involve efficiency analysis and decision support systems inside a company need special attention, and a number of tools have been developed to support managers. DEA (Data Envelopment Analysis) is one of these tools, and its use is increasing in research and in new developments. The problems are:
- how to improve the quality of a DEA analysis when the DMU (decision-making unit) being analyzed is considered efficient;
- how to guarantee a sound analysis when the input and output parameters contain many zeros;
- how to visualize the inputs and outputs in n-dimensional space.

This paper proposes combining DEA with a data mining tool, clustering based on the K-Means algorithm. The clustering is used to evaluate the efficiency analyses made with DEA tools and to visualize groups that contain inefficient DMUs. The combination is applied to a telecommunications database containing an efficiency indicator for telephone installation in the Brazilian market.

Keywords: Data Envelopment Analysis, clustering, data mining, telecommunication quality indicator, decision support system.

1 Introduction
Problems that involve efficiency analysis inside a company need special attention, and tools are being developed to support managers. Some companies use complex formulations based on traditional statistical methods. Others use new environments based on computational intelligence and other tools. DEA [2] is one of these tools: it obtains the relative efficiency of two or more companies, departments or groups. The problem in DEA is how to improve the quality of the analysis when the DMU under analysis is considered efficient. In this paper we present and discuss one way to improve DEA analysis: pre-processing the data with an intelligent computational tool based on clustering.

2 DEA: Data Envelopment Analysis
DEA uses a linear programming approach to identify the efficient DMUs (decision-making units): those units that make the most efficient use of inputs to produce outputs. The efficient units form a frontier that envelops all DMUs, and the efficiency of each DMU is measured by projecting it onto this frontier. In its original form, the DEA model represents the efficiency of a DMU as the ratio of weighted outputs to weighted inputs [3]. The DEA literature has since developed numerous models, and detailed discussions can be found in [2,3]. Essentially, the various DEA models seek to establish which subset of DMUs determines the envelopment surface, and how to characterize each DMU by an efficiency score. There are two basic models, CRS (constant returns to scale) and VRS (variable returns to scale), both presented below [2].

2.1 The constant returns to scale (CRS) DEA model
This model was proposed by Charnes, Cooper and Rhodes (the CCR model), in the work where the term data envelopment analysis (DEA) was first used. This first approach uses input orientation and assumes constant returns to scale. Later papers considered alternative sets of assumptions.
Suppose N data points (DMUs) are to be evaluated, and assume there are data on K inputs and M outputs for each DMU. For the i-th DMU these are represented by the column vectors $x_i$ and $y_i$, respectively. The $K \times N$ input matrix $X$ and the $M \times N$ output matrix $Y$ represent the data for all DMUs. An intuitive way to introduce DEA is via the ratio form. For each DMU we would like to obtain a measure of the ratio of all outputs over all inputs, such as $u'y_i / v'x_i$, where $u$ is an $M \times 1$ vector of output weights and $v$ is a $K \times 1$ vector of input weights.
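The ratio form can be made concrete in a few lines of Python. This is a minimal sketch in which the weights and the DMU data are hypothetical values chosen for illustration. In DEA the weights are not fixed in advance: the optimization in equation (1) chooses them separately for each DMU.

```python
def ratio_efficiency(u, v, y_i, x_i):
    """Ratio-form efficiency u'y_i / v'x_i of one DMU.

    u: output weights (length M), v: input weights (length K),
    y_i: outputs of the DMU (length M), x_i: inputs of the DMU (length K).
    """
    if len(u) != len(y_i) or len(v) != len(x_i):
        raise ValueError("weight and data vectors must have matching lengths")
    weighted_outputs = sum(uk * yk for uk, yk in zip(u, y_i))
    weighted_inputs = sum(vk * xk for vk, xk in zip(v, x_i))
    return weighted_outputs / weighted_inputs


# Hypothetical DMU with K = 2 inputs and M = 1 output.
x_i = [4.0, 2.0]
y_i = [6.0]
u = [0.5]        # output weights (M x 1), illustrative only
v = [1.0, 0.5]   # input weights (K x 1), illustrative only
print(ratio_efficiency(u, v, y_i, x_i))  # 3.0 / 5.0 -> 0.6
```

Different weights give different ratios for the same DMU. For DMU i, equation (1) picks the weights most favourable to that DMU, subject to every DMU's ratio staying at or below one.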
The optimal weights are obtained by solving the mathematical programming problem

$$
\max_{u,v} \; \frac{u'y_i}{v'x_i}
\quad \text{s.t.} \quad
\frac{u'y_j}{v'x_j} \le 1, \; j = 1, 2, \ldots, N; \qquad u, v \ge 0. \tag{1}
$$

This involves finding values for $u$ and $v$ such that the efficiency measure of the i-th DMU is maximized. The constraint is that all efficiency measures must be less than or equal to one.

The problem of slacks
The piece-wise linear form of the non-parametric frontier in DEA can cause a few difficulties in efficiency measurement. The problem arises because some sections of the piece-wise linear frontier run parallel to the axes. A DMU projected onto such a section lies on the frontier but is not Pareto efficient, because one input could still be reduced (or one output increased) without changing the others. Relying on the radial score alone can therefore lead to an incorrect analysis.
The CRS assumption is only appropriate when all firms are operating at an optimal scale. Imperfect competition, financial constraints and similar factors may cause a DMU not to operate at optimal scale.

2.2 The variable returns to scale (VRS) DEA model
Banker, Charnes and Cooper (1984) (the BCC model [1]) suggested an extension of the CRS DEA model to account for variable returns to scale. The CRS linear programming problem, in its envelopment form, can easily be modified to account for VRS by adding the convexity constraint $N1'\lambda = 1$:

$$
\min_{\theta,\lambda} \; \theta
\quad \text{s.t.} \quad
-y_i + Y\lambda \ge 0, \quad
\theta x_i - X\lambda \ge 0, \quad
N1'\lambda = 1, \quad
\lambda \ge 0, \tag{2}
$$

where $N1$ is an $N \times 1$ vector of ones, $\theta$ is a scalar and $\lambda$ is an $N \times 1$ vector of constants. Dropping the convexity constraint gives the CRS envelopment problem. This approach forms a convex hull of intersecting planes that envelops the data points more tightly than the CRS conical hull. It therefore provides technical efficiency scores that are greater than or equal to those obtained with the CRS model. The VRS specification was the most commonly used specification in the 1990s [2].

3 Clustering: K-means algorithm
Clustering is a data mining tool used to group things that have similar characteristics. Its output takes the form of a diagram that shows how the instances fall into clusters. In the simplest case, each instance is associated with a cluster number. This can be depicted by laying the instances out in two dimensions and partitioning the space to show each cluster.

Some clustering algorithms allow one instance to belong to more than one cluster. In that case the diagram might lay the instances out in two dimensions and draw overlapping subsets representing each cluster. Other algorithms associate instances with clusters probabilistically rather than categorically. For every instance there is then a probability, or degree of membership, with which it belongs to each cluster (fuzzy clustering).
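The difference between categorical and probabilistic (fuzzy) membership can be illustrated with a small sketch. It assumes Euclidean distance and uses the fuzzy c-means membership formula with fuzzifier m = 2. This is one common choice, not something the paper specifies, and the centroids and the point are hypothetical.

```python
import math


def hard_membership(point, centroids):
    """Categorical assignment: index of the nearest centroid (Euclidean)."""
    d = [math.dist(point, c) for c in centroids]
    return d.index(min(d))


def fuzzy_membership(point, centroids, m=2.0):
    """Degrees of membership as in fuzzy c-means; they sum to 1.

    u_k = 1 / sum_j (d_k / d_j) ** (2 / (m - 1)), with fuzzifier m > 1.
    """
    d = [math.dist(point, c) for c in centroids]
    zeros = [k for k, dk in enumerate(d) if dk == 0.0]
    if zeros:
        # The point sits on one or more centroids: share full membership among them.
        return [1.0 / len(zeros) if k in zeros else 0.0 for k in range(len(d))]
    p = 2.0 / (m - 1.0)
    return [1.0 / sum((dk / dj) ** p for dj in d) for dk in d]


centroids = [(0.0, 0.0), (4.0, 0.0)]   # hypothetical cluster centres
point = (1.0, 0.0)
print(hard_membership(point, centroids))                          # 0
print([round(u, 3) for u in fuzzy_membership(point, centroids)])  # [0.9, 0.1]
```

The hard assignment only says that the point belongs to cluster 0. The fuzzy version also records that it is partly (10%) associated with cluster 1.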
Some algorithms produce a hierarchical structure of clusters [6].
K-means clustering has many applications, including unsupervised learning in neural networks, pattern recognition, classification analysis, artificial intelligence and image processing. In principle, if you have several objects, each described by attributes, and you want to classify the objects based on those attributes, you can apply this algorithm.

3.1 K-means algorithm
How K-means clustering works
If the number of data points is smaller than the number of clusters, each data point is assigned as the centroid of a cluster, and each centroid is given a cluster number. If the number of data points is larger than the number of clusters, we calculate the distance from each data point to every centroid and find the minimum. The data point is said to belong to the cluster whose centroid is nearest.

Since we are not sure about the location of the centroids, we then adjust each centroid's location based on the current assignment, moving it to the mean of the data points assigned to it. All the data are then reassigned to the new centroids. This process is repeated until no data point moves to another cluster. It can be proved mathematically that this loop converges, although only to a local optimum. Ref. [8] gives an example implementation of the K-means algorithm in Visual Basic.

Weaknesses of K-means clustering
Like other algorithms, K-means clustering has several weaknesses:
- when there are few data points, the initial grouping largely determines the clusters;
- the number of clusters, K, must be determined beforehand;
- we never know the real clusters: with few data points, the same data presented in a different order may produce different clusters;
- we never know which attribute contributes more to the grouping process, since each attribute is assumed to have the same weight.

4 The databases
DEA needs a database containing the inputs and outputs of each DMU. For our research on the telecom management indicator, we created a specific database to test and compare the methodologies proposed in this paper. Table 1 shows this database, compiled from data in refs. [9] and [10]:
DMUs (decision-making units): 34 telecommunications operating companies in Brazil providing fixed telephony service.
INPUTS:
- POP: population, number of inhabitants per region or state
- CNU: number of cities in the state or region
- TAR: total area of the state or region (km²)
- IUC: index of urban concentration
OUTPUT:
- NFT: number of fixed telephones per state or region
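The K-means loop described in Section 3.1 can be sketched in plain Python as follows. This is a minimal sketch that assumes Euclidean distance, with initial centroids drawn as k data points by a seeded random generator. It is not WEKA's implementation, and the function name `kmeans`, its parameters and the sample points are hypothetical.

```python
import math
import random


def kmeans(points, k, max_iter=100, seed=0):
    """Plain K-means (Lloyd-style iterations) with Euclidean distance.

    Returns (labels, centroids). The initial centroids are k distinct data
    points drawn with a seeded random generator, so the result can depend on
    the seed, one of the weaknesses listed in Section 3.1.
    """
    if k >= len(points):
        # No more data points than clusters: each point becomes its own centroid.
        return list(range(len(points))), [list(p) for p in points]
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point joins the cluster of its nearest centroid.
        new_labels = [min(range(k), key=lambda c: math.dist(p, centroids[c]))
                      for p in points]
        if new_labels == labels:
            break  # no point moved to another cluster: converged
        labels = new_labels
        # Update step: move each centroid to the mean of the points assigned to it.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = [sum(coord) / len(members) for coord in zip(*members)]
    return labels, centroids


points = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1),   # hypothetical group A
          (8.0, 8.0), (8.2, 7.9), (7.8, 8.1)]   # hypothetical group B
labels, centroids = kmeans(points, k=2)
print(labels)  # the first three points share one label, the last three the other
```

The label numbers themselves are arbitrary and can change with the seed. What matters is which points share a label.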
Table 1: Database: DEA efficiency.
DMU | Ref | Region | State | INPUT: POP, CNU, TAR, IUC; OUTPUT: NFT
1 | RJ | Region I | RJ | ,05 0,
2 | MG | Region I | MG | ,18 0,
3 | MG | Region I | MG | ,11 0,
4 | ES | Region I | ES | ,52 0,
5 | BA | Region I | BA | ,67 0,
6 | SE | Region I | SE | ,35 0,
7 | AL | Region I | AL | ,66 0,
8 | PE | Region I | PE | ,62 0,
9 | PB | Region I | PB | ,84 0,
10 | RN | Region I | RN | ,79 0,
11 | CE | Region I | CE | ,60 0,
12 | PI | Region I | PI | ,19 0,
13 | MA | Region I | MA | ,29 0,
14 | PA | Region I | PA | ,52 0,
15 | AP | Region I | AP | ,59 0,
16 | AM | Region I | AM | ,68 0,
17 | RR | Region I | RR | ,98 0,
18 | SC | Region II | SC | ,18 0,
19 | PR | Region II | PR | ,01 0,
20 | PR | Region II | PR | ,84 0,
21 | MS | Region II | MS | ,65 0,
22 | MS | Region II | MS | ,31 0,
23 | MT | Region II | MT | ,91 0,
24 | GO | Region II | TO | ,97 0,
25 | GO | Region II | GO | ,73 0,
26 | DF | Region II | DF | ,94 0,
27 | RO | Region II | RO | ,17 0,
28 | AC | Region II | AC | ,39 0,
29 | RS | Region II | RS | ,34 0,
30 | RS | Region II | RS | ,20 0,
31 | SP | Region III | SP | ,87 0,
32 | SP | Region III | SP | ,85 0,
33 | SP | Region III | SP | ,03 0,
34 | SP | Region III | SP | ,68 0,
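Section 5 converts this database to WEKA's ARFF format before clustering. A minimal sketch of such a conversion follows. It assumes one NUMERIC attribute per input and output column of Table 1 and leaves out the DMU identifiers, so rows are identified only by their order. The relation name, the function `to_arff` and the values are hypothetical placeholders, not data from Table 1.

```python
ATTRIBUTES = ["POP", "CNU", "TAR", "IUC", "NFT"]  # Table 1 inputs, then the output


def to_arff(relation, rows, attributes=ATTRIBUTES):
    """Serialise numeric rows (one per DMU, in table order) as ARFF text."""
    lines = [f"@RELATION {relation}", ""]
    lines += [f"@ATTRIBUTE {name} NUMERIC" for name in attributes]
    lines += ["", "@DATA"]
    for row in rows:
        if len(row) != len(attributes):
            raise ValueError("each row needs exactly one value per attribute")
        lines.append(",".join(str(float(value)) for value in row))
    return "\n".join(lines) + "\n"


# Placeholder values, NOT taken from Table 1.
rows = [[0.5, 0.2, 0.1, 0.9, 0.4],
        [1.0, 1.0, 0.3, 0.7, 1.0]]
print(to_arff("dea_telecom", rows))
```

Keeping only numeric attributes avoids passing an identifier column to the distance computation. The DMU number can be recovered from the row position after clustering.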
5 Experiments and results
The numbers in Table 1 show a large variation between the lowest and the highest values, so the first step is to normalize the database. We then load the data into the EMS software [4] and calculate the efficiency scores using the basic DEA models. Next, we convert the database to an ARFF file and cluster it using the WEKA software [5]. The experiment follows the flowchart in Figure 1.

Figure 1: DEA x clustering. (Flowchart steps: normalized database (Table 1); ARFF format for clustering (WEKA software); basic DEA models (EMS software: CRS/RAD/IN and VRS/RAD/IN); graph generation and results analysis (Tables 2 and 3).)

5.1 Clustering database
Figures 2 and 3 show the results of the cluster analysis performed with WEKA. Because the software output identifies the clusters by color, Table 2 lists the DMUs and the clusters to which they belong.

Figure 2: Clusters: population (x axis) x number of fixed phones (y axis).
Figure 3: Clusters: number of cities (x axis) x number of fixed phones (y axis).
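The paper uses EMS [4] for the DEA step. As a rough open-source illustration of the two basic models in Figure 1 (CRS/RAD/IN and VRS/RAD/IN, read here as radial, input-oriented models), the envelopment problem (2) can be solved with SciPy's `linprog`. Dropping the convexity constraint gives the CRS score. The paper does not say how the data were normalized, so dividing each variable by its maximum is an assumption. The function names and the three-DMU data set are hypothetical. This is a sketch, not a reproduction of EMS.

```python
import numpy as np
from scipy.optimize import linprog


def normalize_by_max(data):
    """Divide each row (one variable across all DMUs) by its maximum (assumed scheme)."""
    data = np.asarray(data, dtype=float)
    return data / data.max(axis=1, keepdims=True)


def input_efficiency(X, Y, i, vrs=False):
    """Radial input-oriented DEA score of DMU i in envelopment form.

    X is K x N (inputs), Y is M x N (outputs); columns are DMUs. Solves
        min theta  s.t.  -y_i + Y lam >= 0,  theta x_i - X lam >= 0,  lam >= 0,
    plus sum(lam) = 1 when vrs=True (the convexity constraint of equation (2)).
    """
    X = np.asarray(X, dtype=float)
    Y = np.asarray(Y, dtype=float)
    K, N = X.shape
    M = Y.shape[0]
    c = np.zeros(N + 1)   # decision vector z = [theta, lam_1, ..., lam_N]
    c[0] = 1.0            # objective: minimise theta
    # linprog expects A_ub @ z <= b_ub, so each ">= 0" constraint is negated:
    A_out = np.hstack([np.zeros((M, 1)), -Y])   # y_i - Y lam <= 0
    A_in = np.hstack([-X[:, [i]], X])           # -theta x_i + X lam <= 0
    A_ub = np.vstack([A_out, A_in])
    b_ub = np.concatenate([-Y[:, i], np.zeros(K)])
    A_eq, b_eq = None, None
    if vrs:
        A_eq = np.concatenate([[0.0], np.ones(N)]).reshape(1, -1)
        b_eq = np.array([1.0])
    bounds = [(None, None)] + [(0.0, None)] * N
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq,
                  bounds=bounds, method="highs")
    if not res.success:
        raise RuntimeError(res.message)
    return float(res.x[0])


X = [[2.0, 4.0, 3.0]]   # one input, three hypothetical DMUs
Y = [[1.0, 2.0, 3.0]]   # one output
Xn, Yn = normalize_by_max(X), normalize_by_max(Y)
crs = [round(input_efficiency(Xn, Yn, i), 3) for i in range(3)]
vrs = [round(input_efficiency(Xn, Yn, i, vrs=True), 3) for i in range(3)]
print(crs, vrs)  # [0.5, 0.5, 1.0] [1.0, 0.625, 1.0]
```

As expected, the VRS scores are never below the CRS scores. In this sketch, rescaling a variable by a positive constant leaves the radial scores unchanged. The normalization therefore matters mainly for the distance-based clustering step rather than for DEA itself.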
Table 2: DEA efficiency and clustering.
DMU | Ref | Region | State | DEA CRS efficiency | DEA VRS efficiency | Cluster (Figure 2) | Cluster (Figure 3)
1 | RJ | Region I | RJ | 100,00% | 100,00% | II | II
2 | MG | Region I | MG | 61,90% | 87,30% | II | III
3 | MG | Region I | MG | 54,90% | 87,60% | V | VI
4 | ES | Region I | ES | 68,20% | 95,00% | V | VI
5 | BA | Region I | BA | 49,20% | 94,70% | III | IV
6 | SE | Region I | SE | 37,80% | 100,00% | V | VI
7 | AL | Region I | AL | 30,00% | 100,00% | V | VI
8 | PE | Region I | PE | 44,50% | 89,30% | IV | V
9 | PB | Region I | PB | 34,20% | 94,00% | V | V
10 | RN | Region I | RN | 37,90% | 93,60% | V | V
11 | CE | Region I | CE | 37,80% | 93,70% | IV | V
12 | PI | Region I | PI | 29,10% | 97,80% | V | V
13 | MA | Region I | MA | 25,80% | 100,00% | IV | V
14 | PA | Region I | PA | 32,80% | 96,00% | IV | V
15 | AP | Region I | AP | 40,20% | 87,10% | V | VI
16 | AM | Region I | AM | 39,50% | 88,40% | V | VI
17 | RR | Region I | RR | 45,40% | 100,00% | V | VI
18 | SC | Region II | SC | 81,60% | 94,90% | IV | V
19 | PR | Region II | PR | 77,50% | 86,60% | III | IV
20 | PR | Region II | PR | 64,00% | 100,00% | III | VI
21 | MS | Region II | MS | 61,10% | 82,80% | V | VI
22 | MS | Region II | MS | 60,20% | 100,00% | V | VI
23 | MT | Region II | MT | 52,00% | 82,60% | V | V
24 | GO | Region II | TO | 63,00% | 89,40% | IV | V
25 | GO | Region II | GO | 3,80% | 100,00% | IV | VI
26 | DF | Region II | DF | 100,00% | 100,00% | V | VI
27 | RO | Region II | RO | 43,20% | 100,00% | V | VI
28 | AC | Region II | AC | 40,50% | 100,00% | V | VI
29 | RS | Region II | RS | 76,50% | 85,60% | V | IV
30 | RS | Region II | RS | 62,40% | 99,30% | V | VI
31 | SP | Region III | SP | 100,00% | 100,00% | I | I
32 | SP | Region III | SP | 75,60% | 99,40% | V | VI
33 | SP | Region III | SP | 75,70% | 100,00% | V | VI
34 | SP | Region III | SP | 82,60% | 92,60% | V | VI
5.2 Analysis
Table 2 shows the results of the EMS software (DEA efficiency) and of the WEKA software (clustering). Efficiencies of 100% in both basic DEA models, CRS and VRS, are shown in bold. Table 3 groups the DMUs by the cluster in which each was placed.

Table 3: DMUs clustering.
Cluster | Figure 2 | Figure 3
I | DMU-31(*) | DMU-31(*)
II | DMU-1(*), DMU-2 | DMU-1(*)
III | DMU-5, DMU-19, DMU-20 | DMU-2
IV | DMU-8, DMU-11, DMU-13, DMU-14, DMU-18, DMU-24, DMU-25 | DMU-5, DMU-19, DMU-29
V | DMU-3, DMU-4, DMU-6, DMU-7, DMU-9, DMU-10, DMU-12, DMU-15, DMU-16, DMU-17, DMU-21, DMU-22, DMU-23, DMU-26(*), DMU-27, DMU-28, DMU-29, DMU-30, DMU-32, DMU-33, DMU-34 | DMU-8, DMU-9, DMU-10, DMU-11, DMU-12, DMU-13, DMU-14, DMU-18, DMU-23, DMU-24
VI | - | DMU-3, DMU-4, DMU-6, DMU-7, DMU-15, DMU-16, DMU-17, DMU-20, DMU-21, DMU-22, DMU-25, DMU-26(*), DMU-27, DMU-28, DMU-30, DMU-32, DMU-33, DMU-34
(*) DMUs with 100% efficiency in both basic DEA models, CRS and VRS.
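Table 3 regroups the cluster columns of Table 2 and flags the fully efficient DMUs. A minimal sketch of that bookkeeping follows, using five rows of Table 2 (Figure 2 column). The function names are hypothetical. The exact comparison with 100.0 assumes the scores are stored as the rounded percentages printed in the table.

```python
from collections import defaultdict


def fully_efficient(scores):
    """DMUs with 100% efficiency in both CRS and VRS; scores: dmu -> (crs, vrs) in %."""
    return {dmu for dmu, (crs, vrs) in scores.items() if crs == 100.0 and vrs == 100.0}


def group_by_cluster(assignments, efficient):
    """Map each cluster label to its DMUs, tagging fully efficient ones with (*)."""
    groups = defaultdict(list)
    for dmu in sorted(assignments):
        tag = f"DMU-{dmu}" + ("(*)" if dmu in efficient else "")
        groups[assignments[dmu]].append(tag)
    return dict(groups)


# Five rows of Table 2: DMU -> (CRS %, VRS %) and DMU -> cluster in Figure 2.
scores = {1: (100.0, 100.0), 2: (61.9, 87.3), 5: (49.2, 94.7),
          26: (100.0, 100.0), 31: (100.0, 100.0)}
figure2 = {1: "II", 2: "II", 5: "III", 26: "V", 31: "I"}
print(group_by_cluster(figure2, fully_efficient(scores)))
# {'II': ['DMU-1(*)', 'DMU-2'], 'III': ['DMU-5'], 'V': ['DMU-26(*)'], 'I': ['DMU-31(*)']}
```

Running the same grouping on the Figure 3 column gives the third column of Table 3. A DMU that sits alone in a cluster in both runs, as DMU 31 does, stands out immediately.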
Conclusion
In DEA analyses with more than one input and one or two outputs, it is difficult to visualize the behavior of the data set. The analysis improves when a clustering algorithm is added. With the information obtained by clustering, we can return to the DEA software and perform the analysis within a more homogeneous group. This mitigates the problem of slacks mentioned before. Using clustering software, we can examine the problem for different parameters and plot graphs to assist the analysis.

Looking at DMU 31, we can identify an outlying DMU, probably a benchmark DMU: it forms a cluster of its own in both Figure 2 and Figure 3. This DMU needs a specific analysis, and including it in another cluster would cause problems. The same analysis can be done for all groups and graphs to improve the DEA analysis.

Clustering combined with DEA is a very interesting tool. It reduces the number of variables that determine whether a DMU is efficient, improves the visualization of the variables, and allows a coherent and homogeneous comparison.

References
[1] Banker, R.D., Charnes, A. & Cooper, W.W., Some models for estimating technical and scale inefficiencies in data envelopment analysis. Management Science.
[2] Coelli, T., Prasada Rao, D.S. & Battese, G.E., An Introduction to Efficiency and Productivity Analysis. Kluwer Academic Publishers: Boston.
[3] Cooper, W.W., Seiford, L.M. & Tone, K., Data Envelopment Analysis: A Comprehensive Text with Models, Applications, References and DEA-Solver Software. Kluwer Academic Publishers: Dordrecht, Netherlands.
[4] Scheel, H., A Guide for EMS Version 1.3: A Data Envelopment Analysis (Computer Program). University of Dortmund, Germany.
[5] Witten, I.H., A Guide for WEKA: Waikato Environment for Knowledge Analysis (Computer Program). University of Waikato, New Zealand.
[6] Witten, I.H. & Frank, E., Data Mining: Practical Machine Learning Tools and Techniques with Java Implementations.
[7] Dulá, J.H., Computations in DEA. School of Business Administration, University of Mississippi.
[8] Teknomo, K., K-Means Clustering Tutorial.
[9] ANATEL, Brazilian Bureau of Telecommunications.
[10] IBGE, Brazilian Institute of Geography and Statistics.
A Regression Approach for Forecasting Vendor Revenue in Telecommunication Industries
A Regression Approach for Forecasting Vendor Revenue in Telecommunication Industries Aida Mustapha *1, Farhana M. Fadzil #2 * Faculty of Computer Science and Information Technology, Universiti Tun Hussein
Technical Efficiency Accounting for Environmental Influence in the Japanese Gas Market
Technical Efficiency Accounting for Environmental Influence in the Japanese Gas Market Sumiko Asai Otsuma Women s University 2-7-1, Karakida, Tama City, Tokyo, 26-854, Japan [email protected] Abstract:
Support Vector Machines with Clustering for Training with Very Large Datasets
Support Vector Machines with Clustering for Training with Very Large Datasets Theodoros Evgeniou Technology Management INSEAD Bd de Constance, Fontainebleau 77300, France [email protected] Massimiliano
degrees of freedom and are able to adapt to the task they are supposed to do [Gupta].
1.3 Neural Networks 19 Neural Networks are large structured systems of equations. These systems have many degrees of freedom and are able to adapt to the task they are supposed to do [Gupta]. Two very
Classification of Learners Using Linear Regression
Proceedings of the Federated Conference on Computer Science and Information Systems pp. 717 721 ISBN 978-83-60810-22-4 Classification of Learners Using Linear Regression Marian Cristian Mihăescu Software
CONTENTS PREFACE 1 INTRODUCTION 1 2 DATA VISUALIZATION 19
PREFACE xi 1 INTRODUCTION 1 1.1 Overview 1 1.2 Definition 1 1.3 Preparation 2 1.3.1 Overview 2 1.3.2 Accessing Tabular Data 3 1.3.3 Accessing Unstructured Data 3 1.3.4 Understanding the Variables and Observations
Classification of Titanic Passenger Data and Chances of Surviving the Disaster Data Mining with Weka and Kaggle Competition Data
Proceedings of Student-Faculty Research Day, CSIS, Pace University, May 2 nd, 2014 Classification of Titanic Passenger Data and Chances of Surviving the Disaster Data Mining with Weka and Kaggle Competition
An Algorithm for Automatic Base Station Placement in Cellular Network Deployment
An Algorithm for Automatic Base Station Placement in Cellular Network Deployment István Törős and Péter Fazekas High Speed Networks Laboratory Dept. of Telecommunications, Budapest University of Technology
USING SELF-ORGANIZING MAPS FOR INFORMATION VISUALIZATION AND KNOWLEDGE DISCOVERY IN COMPLEX GEOSPATIAL DATASETS
USING SELF-ORGANIZING MAPS FOR INFORMATION VISUALIZATION AND KNOWLEDGE DISCOVERY IN COMPLEX GEOSPATIAL DATASETS Koua, E.L. International Institute for Geo-Information Science and Earth Observation (ITC).
The Use of Super-Efficiency Analysis for strategy Ranking
The Use of Super-Efficiency Analysis for strategy Ranking 1 Reza Farzipoor Saen0F Department of Industrial Management, Faculty of Management and Accounting, Islamic Azad University - Karaj Branch, Karaj,
STATISTICA. Clustering Techniques. Case Study: Defining Clusters of Shopping Center Patrons. and
Clustering Techniques and STATISTICA Case Study: Defining Clusters of Shopping Center Patrons STATISTICA Solutions for Business Intelligence, Data Mining, Quality Control, and Web-based Analytics Table
The current issue and full text archive of this journal is available at www.emeraldinsight.com/1741-0398.htm
The current issue and full text archive of this journal is available at www.emeraldinsight.com/1741-0398.htm JEIM 662 A data envelopment analysis approach based on total cost of ownership for supplier
Prof. Pietro Ducange Students Tutor and Practical Classes Course of Business Intelligence 2014 http://www.iet.unipi.it/p.ducange/esercitazionibi/
Prof. Pietro Ducange Students Tutor and Practical Classes Course of Business Intelligence 2014 http://www.iet.unipi.it/p.ducange/esercitazionibi/ Email: [email protected] Office: Dipartimento di Ingegneria
Measuring the Relative Efficiency of European MBA Programs:A Comparative analysis of DEA, SBM, and FDH Model
Measuring the Relative Efficiency of European MBA Programs:A Comparative analysis of DEA, SBM, and FDH Model Wei-Kang Wang a1, Hao-Chen Huang b2 a College of Management, Yuan-Ze University, [email protected]
EFFECTS OF BENCHMARKING OF ELECTRICITY DISTRIBUTION COMPANIES IN NORDIC COUNTRIES COMPARISON BETWEEN DIFFERENT BENCHMARKING METHODS
EFFECTS OF BENCHMARKING OF ELECTRICITY DISTRIBUTION COMPANIES IN NORDIC COUNTRIES COMPARISON BETWEEN DIFFERENT BENCHMARKING METHODS Honkapuro Samuli 1, Lassila Jukka, Viljainen Satu, Tahvanainen Kaisa,
Data Mining Cluster Analysis: Basic Concepts and Algorithms. Lecture Notes for Chapter 8. Introduction to Data Mining
Data Mining Cluster Analysis: Basic Concepts and Algorithms Lecture Notes for Chapter 8 Introduction to Data Mining by Tan, Steinbach, Kumar Tan,Steinbach, Kumar Introduction to Data Mining 4/8/2004 Hierarchical
DATA MINING WITH DIFFERENT TYPES OF X-RAY DATA
315 DATA MINING WITH DIFFERENT TYPES OF X-RAY DATA C. K. Lowe-Ma, A. E. Chen, D. Scholl Physical & Environmental Sciences, Research and Advanced Engineering Ford Motor Company, Dearborn, Michigan, USA
Chapter ML:XI (continued)
Chapter ML:XI (continued) XI. Cluster Analysis Data Mining Overview Cluster Analysis Basics Hierarchical Cluster Analysis Iterative Cluster Analysis Density-Based Cluster Analysis Cluster Evaluation Constrained
