SIGNIFICANCE OF CLASSIFICATION TECHNIQUES IN PREDICTION OF LEARNING DISABILITIES
Julie M. David 1 and Kannan Balakrishnan 2
1 MES College, Aluva, Cochin, India, [email protected]
2 Cochin University of Science & Technology, Cochin, India, [email protected]

ABSTRACT
The aim of this study is to show the importance of two classification techniques, viz. decision tree and clustering, in the prediction of learning disabilities (LD) of school-age children. LDs affect about 10 percent of all children enrolled in schools, and the problems of children with specific learning disabilities have long been a cause of concern to parents and teachers. Decision trees and clustering are powerful and popular tools used for classification and prediction in data mining. Rules extracted from the decision tree are used for the prediction of learning disabilities, while clustering, the assignment of a set of observations into subsets called clusters, is useful in finding the different signs and symptoms (attributes) present in an LD-affected child. In this paper, the J48 algorithm is used for constructing the decision tree and the K-means algorithm is used for creating the clusters. By applying these classification techniques, LD in any child can be identified.

KEYWORDS
Clustering, Data Mining, Decision Tree, K-means, Learning Disability (LD).

1. INTRODUCTION
Data mining is a collection of techniques for efficient automated discovery of previously unknown, valid, novel, useful and understandable patterns in large databases. Conventionally, the information that is mined is denoted as a model of the semantic structure of the datasets, and the model may be used for prediction and categorization of new data [1]. In recent years the sizes of databases have increased rapidly. This has led to a growing interest in the development of tools capable of automatically extracting knowledge from data. The term data mining, or knowledge discovery in databases, has been adopted for a field of research dealing with the automatic discovery of implicit information or knowledge within databases [16]. Diverse fields such as marketing, customer relationship management, engineering, medicine, crime analysis, expert prediction, web mining and mobile computing, among others, utilize data mining [7].

Databases are rich with hidden information, which can be used for intelligent decision making. Classification and prediction are two forms of data analysis that can be used to extract models describing important data classes or to predict future data trends [8]. Classification is a data mining (machine learning) technique used to predict group membership for data instances. Machine learning refers to a system that has the capability to learn knowledge automatically from experience and other sources [4]. Classification predicts categorical labels, whereas prediction models continuous-valued functions. Classification is the task of generalizing a known structure to apply to new data, while clustering is the task of discovering groups and structures in the data that are in some way similar, without using known structures in the data.
Decision trees are supervised algorithms which recursively partition the data based on its attributes until some stopping condition is reached [8]. This recursive partitioning gives rise to a tree-like structure. Decision trees are white boxes, as the classification rules learned by them can easily be obtained by tracing the path from the root node to each leaf node in the tree. Decision trees are very efficient even with large volumes of data. This is due to the partitioning nature of the algorithm, each time working on smaller and smaller pieces of the dataset, and to the fact that they usually work only with simple attribute-value data, which is easy to manipulate. The Decision Tree Classifier (DTC) is one of the possible approaches to multistage decision making. The most important feature of DTCs is their capability to break down a complex decision-making process into a collection of simpler decisions, thus providing a solution which is often easier to interpret [17].

Clustering is one of the major data mining tasks and aims at grouping the data objects into meaningful classes or clusters such that the similarity of objects within clusters is maximized and the similarity of objects from different clusters is minimized [10]. Clustering separates data into groups whose members belong together; each object is assigned to the group it is most similar to. Cluster analysis is a good way to review data quickly, especially if the objects are classified into many groups. Clustering does not require prior knowledge of the groups that are formed or of the members who must belong to them; it is an unsupervised technique [6]. Clustering is often confused with classification, but there is a difference between the two: in classification the objects are assigned to predefined classes, whereas in clustering the classes themselves are also to be defined [11].

2. LEARNING DISABILITY
LD is a neurological condition that affects a child's brain and impairs the ability to carry out one or many specific tasks. These children are neither slow nor mentally retarded; an affected child can have normal or above-average intelligence. This is why a child with a learning disability is often wrongly labeled as smart but lazy. LDs affect about 10 percent of all children enrolled in schools. The problems of children with specific learning disabilities have been a cause of concern to parents and teachers for some time. Pediatricians are often called on to diagnose specific learning disabilities in school-age children. Learning disabilities affect children both academically and socially, and may be detected only after a child begins school and faces difficulties in acquiring basic academic skills [11].

Learning disability is a general term that describes specific kinds of learning problems. Specific learning disabilities have been recognized in some countries for much of the 20th century, in other countries only in the latter half of the century, and not at all in yet other places [11]. A learning disability can cause a person to have trouble learning and using certain skills. The skills most often affected are reading, writing, listening, speaking, reasoning, and doing math. If a child has unexpected problems or struggles to do any one of these skills, then teachers and parents may want to investigate further.
The child may need to be evaluated to see if he or she has a learning disability. Learning disabilities are formally defined in many ways in many countries. However, the definitions usually contain three essential elements: a discrepancy clause, an exclusion clause and an etiologic clause. The discrepancy clause states that there is a significant disparity between aspects of specific functioning and general ability; the exclusion clause states that the disparity is not primarily due to intellectual, physical, emotional, or environmental problems; and the etiologic clause speaks to causation involving genetic, biochemical, or neurological factors. The clause most frequently used in determining whether a child has a learning disability is the difference between areas of functioning.
When a child shows a great disparity between those areas of functioning in which he or she does well and those in which considerable difficulty is experienced, the child is described as having a learning disability [12].

Learning disabilities vary from child to child. One child with LD may not have the same kind of learning problems as another child with LD. There is no "cure" for learning disabilities [14]; they are life-long. However, children with LD can be high achievers and can be taught ways to work around the learning disability. With the right help, children with LD can and do learn successfully. There is no single sign that shows a child has a learning disability. Experts look for a noticeable difference between how well a child does in school and how well he or she could do, given his or her intelligence or ability. There are also certain clues, most of which relate to elementary school tasks because learning disabilities tend to be identified in elementary school, which may mean a child has a learning disability. A child probably will not show all of these signs, or even most of them.

When an LD is suspected based on parent and/or teacher observations, a formal evaluation of the child is necessary. A parent can request this evaluation, or the school might advise it, and parental consent is needed before a child can be tested [12]. Many types of assessment tests are available; the child's age and the type of problem determine which tests the child needs. Just as there are many different types of LDs, there are a variety of tests that may be done to pinpoint the problem. A complete evaluation often begins with a physical examination and testing to rule out any visual or hearing impairment [3]. Many other professionals can be involved in the testing process. The purpose of any evaluation for LDs is to determine the child's strengths and weaknesses and to understand how he or she best learns and where he or she has difficulty [12]. The information gained from an evaluation is crucial for finding out how the parents and the school authorities can provide the best possible learning environment for the child.

3. PROPOSED APPROACH
This study consists of two parts. In the former part, LD is classified and predicted using a decision tree, and in the latter part using clustering. The J48 algorithm is used for constructing the decision tree and the K-means algorithm is used for creating the clusters of LD. A decision tree is a flowchart-like structure, where each internal node denotes a test on an attribute, each branch of the tree represents an outcome of the test and each leaf node holds a class label [8]. The topmost node in a tree is the root node. A decision tree is thus a classifier in the form of a tree structure where each node is either a leaf node, which indicates the value of the target attribute for the examples, or a decision node, which specifies some test to be carried out on a single attribute, with one branch and subtree for each possible outcome of the test [9]. Decision trees can handle high-dimensional data, and their learning and classification steps are simple and fast. A decision tree can be used to classify an example by starting at the root of the tree and moving through it until a leaf node is reached, which provides the classification of the instance [17]. In this work we use the well-known and frequently used algorithm J48 for the classification of LD.
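As an illustration of how such a J48 model can be trained and evaluated with the WEKA Java API, the following minimal sketch loads the checklist data, builds the tree and runs the two-fold cross-validation described in the methodology of Section 3.1 below. The ARFF file name and the position of the LD class attribute are assumptions made for the example, not details taken from this paper.

import java.util.Random;
import weka.classifiers.Evaluation;
import weka.classifiers.trees.J48;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

public class LDTreeDemo {
    public static void main(String[] args) throws Exception {
        // Load the checklist data; "ld_checklist.arff" is a placeholder file name.
        Instances data = new DataSource("ld_checklist.arff").getDataSet();
        // The LD class label is assumed to be stored as the last attribute.
        data.setClassIndex(data.numAttributes() - 1);

        J48 tree = new J48();          // WEKA's implementation of the C4.5 learner
        tree.buildClassifier(data);    // grow the tree from the training tuples

        // Two-fold cross-validation: each record is used exactly once for testing,
        // mirroring the subset-swapping procedure described in Section 3.1.
        Evaluation eval = new Evaluation(data);
        eval.crossValidateModel(new J48(), data, 2, new Random(1));

        System.out.println(tree);                   // human-readable tree and rules
        System.out.println(eval.toSummaryString()); // correctly/incorrectly classified counts
    }
}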
To classify an unknown instance, it is routed down the tree according to the values of the attributes tested at successive nodes, and when a leaf is reached the instance is classified according to the class assigned to that leaf [17].
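A minimal sketch of this classification step with the WEKA API is shown below. It reuses the same placeholder ARFF file as the earlier example, assumes the 16 checklist attributes and the LD class are declared as nominal {0, 1} values with LD last, and routes one purely hypothetical unseen record down the trained tree.

import weka.classifiers.trees.J48;
import weka.core.DenseInstance;
import weka.core.Instance;
import weka.core.Instances;
import weka.core.Utils;
import weka.core.converters.ConverterUtils.DataSource;

public class LDPredictDemo {
    public static void main(String[] args) throws Exception {
        Instances data = new DataSource("ld_checklist.arff").getDataSet();
        data.setClassIndex(data.numAttributes() - 1);

        J48 tree = new J48();
        tree.buildClassifier(data);

        // A hypothetical unseen child: 16 checklist answers, LD value left missing.
        double[] answers = {1, 1, 0, 0, 1, 0, 1, 1, 1, 0, 1, 0, 0, 1, 1, 0,
                            Utils.missingValue()};
        Instance newCase = new DenseInstance(1.0, answers);
        newCase.setDataset(data);   // attach the attribute definitions

        double predicted = tree.classifyInstance(newCase);  // index of the predicted class
        System.out.println("Predicted LD: " + data.classAttribute().value((int) predicted));
    }
}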
Clustering is a tool for data analysis which solves classification problems. Its object is to distribute cases into groups so that the degree of association is strong between members of the same cluster and weak between members of different clusters. In this way each cluster describes, in terms of the data collected, the class to which its members belong. Clustering is a discovery tool: it may reveal associations and structure in data which were not previously evident, and the results of cluster analysis may contribute to the definition of a formal classification scheme. Clustering helps us to find natural groups of components based on some similarity, and is the assignment of a set of observations into subsets so that observations in the same cluster are similar in some sense. It is a method of unsupervised learning and a common technique for statistical data analysis used in many fields, including machine learning, data mining, pattern recognition, image analysis and bioinformatics.

3.1 Classification by Decision Tree
Data mining techniques are useful for predicting and understanding the frequent signs and symptoms of LD. There are different types of learning disabilities, and by studying the signs and symptoms (attributes) of LD we can easily predict which attributes in the data sets are most related to learning disability. The first task in handling learning disability is to construct a database consisting of the signs, characteristics and level of difficulties faced by these children. Data mining can be used as a tool for analyzing the complex decision tables associated with learning disabilities. Our goal is to provide a concise and accurate set of diagnostic attributes which can be implemented in a user-friendly and automated fashion. After identifying the dependencies between these diagnostic attributes, rules are generated, and these rules are then used to predict learning disability. In this paper, we use a checklist containing the 16 most frequent signs and symptoms (attributes) generally used for the assessment of LD [13] to investigate the presence of learning disability. This checklist is a series of questions that are general indicators of learning disabilities. It is not a screening activity or an assessment, but a checklist to focus our understanding of learning disability. The list of 16 attributes used in LD prediction is shown in Table 1 below.

Table 1. List of Attributes

Sl. No.  Attribute  Signs & Symptoms of LD
1        DR         Difficulty with Reading
2        DS         Difficulty with Spelling
3        DH         Difficulty with Handwriting
4        DWE        Difficulty with Written Expression
5        DBA        Difficulty with Basic Arithmetic skills
6        DHA        Difficulty with Higher Arithmetic skills
7        DA         Difficulty with Attention
8        ED         Easily Distracted
9        DM         Difficulty with Memory
10       LM         Lack of Motivation
11       DSS        Difficulty with Study Skills
12       DNS        Does Not like School
13       DLL        Difficulty Learning a Language
14       DLS        Difficulty Learning a Subject
15       STL        Slow To Learn
16       RG         Repeated a Grade

Based on the information obtained from the checklist, a data set is generated. This data set is in the form of an information system containing cases, attributes and a class. A complete information system expresses all the knowledge available about the objects being studied.
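As a sketch of what such an information system looks like programmatically, the following fragment builds an empty WEKA data set with the 16 checklist attributes of Table 1 plus the LD class, each coded with the nominal values 0 and 1, and adds one purely hypothetical record. The WEKA 3.8 API is assumed; nothing here is taken from the authors' actual data.

import java.util.ArrayList;
import java.util.Arrays;
import weka.core.Attribute;
import weka.core.DenseInstance;
import weka.core.Instances;

public class ChecklistDataset {
    public static Instances emptyDataset() {
        // Each checklist attribute and the LD class take the nominal values {0, 1}.
        ArrayList<String> yesNo = new ArrayList<>(Arrays.asList("0", "1"));

        String[] names = {"DR", "DS", "DH", "DWE", "DBA", "DHA", "DA", "ED",
                          "DM", "LM", "DSS", "DNS", "DLL", "DLS", "STL", "RG", "LD"};
        ArrayList<Attribute> attrs = new ArrayList<>();
        for (String name : names) {
            attrs.add(new Attribute(name, new ArrayList<>(yesNo)));
        }

        Instances data = new Instances("ld_checklist", attrs, 0);
        data.setClassIndex(data.numAttributes() - 1);   // LD is the class label
        return data;
    }

    public static void main(String[] args) {
        Instances data = emptyDataset();
        // One hypothetical case: the child shows reading, spelling and memory
        // difficulties, among others, and is labelled LD = 1 (Yes).
        double[] row = {1, 1, 0, 0, 1, 0, 1, 1, 1, 0, 1, 0, 0, 1, 1, 0, 1};
        data.add(new DenseInstance(1.0, row));
        System.out.println(data);
    }
}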
Decision tree induction is the learning of decisions from class-labelled training tuples. Given a data set D = {t_1, t_2, ..., t_n}, where t_i = <t_i1, ..., t_ih>, each tuple in our study is represented by the 16 attributes and the class is LD. The decision or classification tree is then a tree associated with D such that each internal node is labelled with an attribute (DR, DS, DH, DWE, etc.), each arc is labelled with a predicate which can be applied to the attribute at the parent node, and each leaf node is labelled with the class LD. The basic steps in using a decision tree are building the tree from the training data sets and applying the tree to new data sets. Decision tree induction is thus the process of learning the classification using an inductive approach [8]: during this process we create a new decision tree from the training data, and this decision tree can then be used for making classifications.

Here we use the J48 algorithm, a greedy approach in which decision trees are constructed in a top-down, recursive, divide-and-conquer manner; most decision tree algorithms follow such a top-down approach. It starts with a training set of tuples and their associated class labels, and the training set is recursively partitioned into smaller subsets as the tree is being built. The algorithm takes three parameters: the attribute list, the attribute selection method and the classification. The attribute list is a list of attributes describing the tuples. The attribute selection method specifies a heuristic procedure for selecting the attribute that best discriminates the given tuples according to the class; it employs an attribute selection measure, such as information gain, that allows multi-way splits, and it determines the splitting criterion. The splitting criterion tells us which attribute to test at a node by determining the best way to separate or partition the tuples into individual classes. Here we use the data mining tool WEKA for attribute selection and classification. Classification is a data mining (machine learning) technique used to predict group membership from data instances [15].

Methodology used
The J48 algorithm is used for classifying the learning disability. The procedure consists of three steps, viz. (i) data partition based on a cross-validation test, (ii) the attribute list and (iii) the attribute selection method based on information gain. The cross-validation approach is used for sub-sampling the datasets. In this approach, each record is used the same number of times for training and exactly once for testing. To illustrate this method, we first partition the dataset into two subsets and choose one of the subsets for training and the other for testing; we then swap the roles of the subsets so that the previous training set becomes the test set and vice versa. The information gain ratio for a test is defined as IGR(Ex, a) = IG(Ex, a) / IV(Ex, a), where IG is the information gain and IV is the intrinsic value (split information) [13]. The information gain ratio biases the decision tree against attributes with a large number of distinct values, and thus overcomes this drawback of plain information gain.
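Written out in full, these quantities are the standard C4.5 attribute selection measures, given here for reference rather than quoted from the paper; H denotes entropy, Ex the set of examples reaching a node, and Ex_v the subset of Ex taking value v for attribute a:

\[ IG(Ex, a) = H(Ex) - \sum_{v \in \mathrm{values}(a)} \frac{|Ex_v|}{|Ex|}\, H(Ex_v) \]

\[ IV(Ex, a) = - \sum_{v \in \mathrm{values}(a)} \frac{|Ex_v|}{|Ex|} \log_2 \frac{|Ex_v|}{|Ex|} \]

\[ IGR(Ex, a) = \frac{IG(Ex, a)}{IV(Ex, a)} \]

For the binary checklist attributes used here, values(a) = {0, 1}, so every split is two-way.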
The classification results obtained are as shown below:

Correctly Classified Instances     97   (77.6 %)
Incorrectly Classified Instances   28   (22.4 %)

The accuracy of the decision tree is given in Table 2 below.

Table 2. Accuracy of Decision Tree (per-class TP Rate, FP Rate, Precision, Recall, F-Measure and ROC Area for the classes N and Y)

The first two columns in the table denote the TP Rate (True Positive Rate) and the FP Rate (False Positive Rate). The TP Rate is the ratio of positive cases predicted correctly to the total number of positive cases. A decision tree formed on the basis of the methodology adopted in this paper is shown in Figure 1 below.

Figure 1. Decision tree

It is easy to read a set of rules directly off a decision tree: one rule is generated for each leaf. The antecedent of the rule includes a condition for every node on the path from the root to that leaf, and the consequent of the rule is the class assigned by the leaf [17]. This procedure produces rules that are unambiguous, in that the order in which they are executed is irrelevant. In general, however, rules read directly off a decision tree are far more complex than necessary, and rules derived from trees are usually pruned to remove redundant tests. Rules are popular because each rule represents an independent piece of knowledge: new rules can be added to an existing rule set without disturbing it, whereas adding to a tree structure may require reshaping the whole tree.
In this section we present a method for generating a rule set from a decision tree. In principle, every path from the root node to a leaf node of a decision tree can be expressed as a classification rule: the test conditions encountered along the path form the conjuncts of the rule antecedent, while the class label at the leaf node is assigned to the rule consequent. The expressiveness of a rule set is almost equivalent to that of a decision tree, because a decision tree can be expressed by a set of mutually exclusive and exhaustive rules.

3.2 Classification by Clustering
We use the data mining tool WEKA for clustering, and the clustering algorithm K-means is used for classifying LD. In this algorithm, K initial points are chosen to represent the initial cluster centres, all data points are assigned to the nearest one, the mean value of the points in each cluster is computed to form its new cluster centre, and iteration continues until there are no changes in the clusters. The K-means algorithm iterates over the whole dataset until convergence is reached.

Methodology used
The K-means algorithm is one of the most well-known and commonly used partitioning methods. It takes an input parameter, K, and partitions a set of N objects into K clusters so that the resulting intra-cluster similarity is high but the inter-cluster similarity is low. Cluster similarity is measured with regard to the mean value of the objects in a cluster [8]. The algorithm works by randomly selecting K objects, each of which initially represents a cluster mean or centre. Each of the remaining objects is assigned to the cluster to which it is most similar, based on the distance between the object and the cluster mean. The algorithm then computes the new mean for each cluster, and this process iterates until the criterion function converges. An important step in most clustering is selecting a distance measure, which determines how the similarity of two elements is calculated. This will influence the shape of the clusters, as some elements may be close to one another according to one distance and farther away according to another. Another important distinction is whether the clustering uses symmetric or asymmetric distances [8]; many distance functions have the property that distances are symmetric. Here we use binary variables: a binary variable has two states, 0 or 1, where 0 means that the variable is absent and 1 means that it is present. In this study, we use the partitioning K-means algorithm, where each cluster is represented by the mean value of the objects in the cluster. In this partitioning method, for a database of N objects or data tuples the algorithm constructs K partitions of the data, where each partition represents a cluster, and it classifies the data into K groups; each group contains at least one object and each object must belong to exactly one group.
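The corresponding WEKA step can be sketched as follows. This is a minimal illustration: the ARFF file name is a placeholder, the LD class is assumed to be the last attribute and is removed so that the two clusters are formed from the 16 symptom attributes alone, and the random seed is arbitrary.

import weka.clusterers.SimpleKMeans;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

public class LDClusterDemo {
    public static void main(String[] args) throws Exception {
        // Load the checklist data; "ld_checklist.arff" is a placeholder file name.
        Instances data = new DataSource("ld_checklist.arff").getDataSet();
        // Drop the LD label (assumed to be the last attribute) so that the
        // clusters are built from the 16 symptom attributes only.
        data.deleteAttributeAt(data.numAttributes() - 1);

        SimpleKMeans kMeans = new SimpleKMeans();
        kMeans.setNumClusters(2);     // two clusters, intended to match LD = Yes / No
        kMeans.setSeed(10);           // arbitrary seed for the initial centroids
        kMeans.buildClusterer(data);  // iterate until the centroids stop changing

        // Prints the centroids, the number of iterations and the
        // within-cluster sum of squared errors, as summarized in Table 3.
        System.out.println(kMeans);
    }
}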
The clustering results obtained are shown below:

Clustered Instances: LD = 0 (No)  - 94 instances (75.2 %)
Clustered Instances: LD = 1 (Yes) - 31 instances (24.8 %)

The clustering history and the cluster visualizer, indicating LD = Y and LD = N, are shown in Table 3 and Figure 2 respectively below.

Table 3. Clustering history (centroid values of the 16 attributes DR through RG for the full data of 125 instances and for the clusters LD = 0 (No) and LD = 1 (Yes); number of iterations: 2; within-cluster sum of squared errors; missing values globally replaced with mean/mode)

Figure 2. Cluster visualizer

4. RESULT ANALYSIS
In this study, we used 125 real data records with 16 attributes, most of which take binary values, for the LD classification. The J48 algorithm is found to be very suitable for handling missing values, and the key symptoms of LD can easily be predicted. The decision tree has a very user-friendly architecture compared to other classification methods, and the J48 decision tree is better in terms of efficiency and complexity. From this study, we found that the decision tree correctly classified 77.6 % of the instances.
The key symptoms of LD are determined by using the attribute selection method of the decision tree. Using the decision tree, simple and very effective rules can be formed for LD prediction. It is also found that, in the case of inconsistent data, the decision tree provides no solution. The accuracy of decision making can also be improved by applying the rules formulated from the tree. Compared with our other recent studies focused on RST, SVM and MLP, the decision tree is found to be the best in terms of efficiency and complexity. From the study it is also found that clustering, as one of the first steps in a data mining analysis, identifies groups of related records that can be used as starting points for exploring further relationships. This technique supports the development of classification models of LD, such as LD-Yes or LD-No, and also forms the attribute clusters present in LD-Yes and LD-No. From the results obtained from the clustering classification, we found the importance of the attributes in predicting LD. In the clustering experiments we also used the same 125 real data records with 16 attributes.

5. COMPARISON OF RESULTS
In this study, we used the algorithms J48 and K-means for the prediction of LD in children. The results obtained from this study are compared with the output of a similar study conducted by us using Rough Set Theory (RST) with the LEM1 algorithm. From these, we have seen that the rules generated from the decision tree are more powerful than those of rough set theory. From the comparison of results, we have also noticed that the decision tree algorithm J48 has a number of advantages over RST with LEM1 for solving problems of a similar nature. In large data sets there may be some incomplete data or attributes, and in data mining it is difficult to mine rules from such incomplete data sets; in the decision tree, the rules formulated are not influenced by such incomplete datasets or attributes. Hence, LD can easily be predicted by using the methods adopted by us. Another benefit of the decision tree concept is that it leads to significant advantages in many areas, including knowledge discovery, machine learning and expert systems. It may also act as a knowledge discovery tool in uncovering rules for the diagnosis of LD-affected children. The importance of this study is that, using a decision tree, we can easily identify the key attributes (signs and symptoms) of LD and predict whether a child has LD or not. For very large data sets, the number of clusters can easily be identified using the clustering method. Obviously, as the school class strength is around 40, the manpower and time needed for the assessment of LD in children is very high; but using the techniques adopted by us, we can easily predict the learning disability of any child. The decision tree approach shows its capability in discovering the knowledge behind the LD identification procedure. The main contribution of this study is the selection of the best attributes that have the capability to predict LD. To the best of our knowledge, none of the rules discovered in this type of study so far has as small a number of attributes as we obtained for the prediction of LD. The discovered rules also prove their potential in the correct identification of children with learning disabilities.

6. CONCLUSION AND FUTURE RESEARCH
In this paper, we consider an approach to handling a learning disability database in order to predict the frequent signs and symptoms of learning disability in school-age children.
This study mainly focuses on two classification techniques, decision tree and clustering, because the accuracy of decision making can be improved by applying these methods. The study has been carried out on 125 real data records in which most of the attributes take binary values, and more work needs to be carried out on quantitative data, as that is an important part of any data set. In future, more research is required to apply the same approach to large data sets consisting of all relevant attributes.
A true test of the proposed approach will be to apply it to such large datasets and to analyze the completeness and effectiveness of the generated rules. The application of the J48 decision tree on discrete data with a two-fold test shows that it is better than RST in terms of efficiency and complexity. The J48 decision tree still has to be applied on continuous or categorical data, and noise effects and their elimination have to be studied. The results from the experiments on these small datasets suggest that the J48 decision tree can serve as a model for classification, as it generates simpler rules and removes irrelevant attributes at a stage prior to tree induction. By using the clustering method, the number of clusters can easily be identified in the case of very large data sets. In this paper, we have considered an approach to handle a learning disability database and to predict learning disability in school-age children. Our future research work focuses on fuzzy sets to predict the percentage of LD in each child, and thus to explore the possibilities of getting more accurate and effective results in the prediction of LD.

REFERENCES
[1] A. Kothari, A. Keskar (2009) Rough Set Approach for Overall Performance Improvement of an Unsupervised ANN-Based Pattern Classifier, Journal on Advanced Computational Intelligence and Intelligent Information, Vol. 13, No. 4
[2] Blackwell Synergy (2006) Learning Disabilities Research Practices, Volume 22
[3] C. Carol, K. Doreen (1993) Children and Young People with Specific Learning Disabilities: Guides for Special Education, Vol. 9, UNESCO
[4] D.K. Roy, L.K. Sharma (2010) Genetic k-means clustering algorithm for mixed numeric and categorical data sets, International Journal of Artificial Intelligence & Applications, Vol. 1, No. 2
[5] Frawley, Piatetsky-Shapiro (1996) Knowledge Discovery in Databases: An Overview. The AAAI/MIT Press, Menlo Park
[6] G. Palubinskas, X. Descombes, Kruggel (1998) An unsupervised clustering method using the entropy minimization. In: Proceedings of the Fourteenth International Conference on Pattern Recognition
[7] H. Chen, S.S. Fuller, C. Friedman, W. Hersh (2005) Knowledge Discovery in Data Mining and Text Mining in Medical Informatics
[8] H. Jiawei, K. Micheline (2008) Data Mining: Concepts and Techniques, Second Edition, Morgan Kaufmann/Elsevier Publishers
[9] I.H. Witten, E. Frank (2005) Data Mining: Practical Machine Learning Tools and Techniques, 2nd edn., Morgan Kaufmann/Elsevier Publishers
[10] I. Xu, D. Ho, L.A. Capretz (2010) Empirical study on the procedure to derive software quality estimation models, International Journal of Computer Science & Information Technology, Vol. 2, No. 4, pp 1-16
[11] Julie M. David, Pramod K.V (2008) Prediction of Learning Disabilities in School Age Children using Data Mining Techniques. In: Proceedings of the AICTE Sponsored National Conference on Recent Developments and Applications of Probability Theory, Random Process and Random Variables in Computer Science, T. Thrivikram, P. Nagabhushan, M.S. Samuel (eds)
[12] Julie M. David, Kannan Balakrishnan (2009) Prediction of Frequent Signs of Learning Disabilities in School Age Children using Association Rules. In: Proceedings of the International Conference on Advanced Computing, ICAC 2009, MacMillan Publishers India Ltd., NYC
[13] Julie M. David, Kannan Balakrishnan (2010) Prediction of Learning Disabilities in School Age Children using Decision Tree. In: Proceedings of the International Conference on Recent Trends in Network Communications, Communications in Computer and Information Science, Vol. 90, Part 3, N. Meghanathan, Selma Boumerdassi, Nabendu Chaki, Dhinaharan Nagamalai (eds), Springer-Verlag Berlin Heidelberg
[14] M. Chapple, About.com Guide
[15] R. Paige (Secretary) (2002) US Department of Education. In: Twenty-fourth Annual Report to Congress on the Implementation of the Individuals with Disabilities Education Act: To Assure the Free Appropriate Public Education of all Children with Disabilities
[16] S.J. Cunningham, G. Holmes (1999) Developing innovative applications in agriculture using data mining. In: Proceedings of the Southeast Asia Regional Computer Confederation Conference
[17] T. Pang-Ning, S. Michael, K. Vipin (2008) Introduction to Data Mining, Low Price edn., Pearson Education, Inc., London

Julie M. David, born in 1976, received the Master's degree in Computer Applications (MCA) from Bharathiyar University, Coimbatore, India, and the M.Phil degree in Computer Science from Vinayaka Missions University, Salem, India, in 2000 and 2008 respectively. She is currently pursuing a Ph.D. in the area of Data Mining at Cochin University of Science and Technology, Cochin, India. She was previously with Mahatma Gandhi University, Kottayam, India, as a Lecturer in the Department of Computer Applications, and is now with MES College, Aluva, Cochin, India, as an Asst. Professor in the Department of Computer Applications. She has published several papers in international and national conference proceedings. Her research interests include Data Mining, Artificial Intelligence and Machine Learning. She is a member of the International Association of Engineers and a reviewer for Elsevier Knowledge-Based Systems.

Dr. Kannan Balakrishnan, born in 1960, received the M.Sc. and M.Phil. degrees in Mathematics from the University of Kerala, India, the M.Tech. degree in Computer and Information Science from Cochin University of Science & Technology, Cochin, India, and a Ph.D. in Futures Studies from the University of Kerala, India, in 1982, 1983, 1988 and 2006 respectively. He is currently working with Cochin University of Science & Technology, Cochin, India, as an Associate Professor (Reader) in the Department of Computer Applications. He has visited the Netherlands as part of an MHRD project on Computer Networks, and visited Slovenia as the co-investigator of an Indo-Slovenian joint research project of the Department of Science and Technology, Government of India. He has published several papers in international journals and national and international conference proceedings. His present areas of interest are Graph Algorithms, Intelligent Systems, Image Processing, CBIR and Machine Translation. He is a reviewer for American Mathematical Reviews. He is a recognized Research Guide in the Faculties of Technology and Science of Cochin University of Science and Technology, Cochin, India, and has served on many academic bodies of various universities in Kerala, India. He is currently a member of the Board of Studies of Cochin, Calicut and Kannur Universities in India, and a member of MIR Labs India.