Application of Data Mining Techniques for Improving Software Engineering
|
|
|
- Vanessa Briana Miller
- 10 years ago
- Views:
Transcription
1 Application of Data Mining Techniques for Improving Software Engineering Wahidah Husain 1, Pey Ven Low 2, Lee Koon Ng 3, Zhen Li Ong 4, School of Computer Sciences, Universiti Sains Malaysia USM, Penang 1, {lpv , nlk1001 3, ozl }@student.usm.my Abstract Computer software plays an important role in business, government, societies, and the sciences. To solve realworld problems, it is important to improve the quality and reliability of computer software. Software Engineering is the computing field concern with designing, developing, implementing, maintaining, and modifying software. Software Engineering data consists of sequences, graphs, and text.. In this paper, we study how data mining techniques can be applied in solving Software Engineering problems. These techniques are applied to detect problems such as bugs, to aid in pattern discovery, and to help developers deal with the complexity of existing software, in order to create more failure-free software. At the end of this paper, we suggest specific data mining techniques for each type of Software Engineering data. There are seven steps in the process: data integration, data cleaning, data selection, data transformation, data mining, pattern evaluation and knowledge presentation. Data mining techniques that can be applied in improving SE include generalization, characterization, classification, clustering, associative tree, decision tree or rule induction, frequent pattern mining, and etc. [5]. Keywords Software Engineering, Data Mining, Text Mining, Sequence Mining, Graph Mining T I. INTRODUCTION he economies of all developed countries are dependent on software, because software controls systems that affect our daily needs. Thus, Software Engineering (SE) has become more and more important. SE concerns computer-based system development; this includes system specification, architectural design, integration, and deployment. With the increasing importance of SE, the difficulty of maintaining, creating and developing software has also risen. Challenges in SE include requirement gathering, systems integration and evolution, maintainability, pattern discovery, fault detection, reliability, and complexity of software development [1, 2]. SE data can be categorized into three types: sequences, graphs, and text [3]. Data mining is a process that employs various analytic tools to extract patterns and information from large datasets. Today, large numbers of datasets are collected and stored. Human are much better at storing data than extracting knowledge from it, especially the accurate and valuable information needed to create good software. Large datasets are hard to understand, and traditional techniques are infeasible for finding information from those raw data. Data mining helps scientists in hypothesis formation in biology, physics, chemistry, medicine, and engineering. The data mining process is shown in Fig. 1 below. Figure 1. Data Mining Process [4] The purpose of this study is to explore how data mining techniques can be applied to improve SE. Objectives of this study are: To review the concept of SE and data mining To determine the problems in SE To identify data mining techniques that can be applied to solve SE problems. In this paper, we focus on the issues particular to each type of SE data, and the specific data mining techniques that can solve those problems. We suggest the best data mining technique(s) for each SE data type, which are sequences, graphs, and text. II. BACKGROUND STUDY SE has become increasingly important these days, and research on its problems has proposed various methods for improving SE. The study of SE problems has followed several approaches. Early research of Thayer et al. [6] was concerned with SE project management. They introduced planning 1
2 problems, organizing problems, staffing problems, and controlling problems as major challenges in this area. Ramamoorthy et al. [7] stated that as more complex software applications are required, programmers will fall further behind the demand. This causes the development of poor quality software and higher maintenance costs. The problems stated by Thayer are more closely tied to the processes of SE project management, while those introduced by Ramamoorthy are more closely tied to the limitations of human beings. Later work by Clarke [8] identified challenges in SE associated with the complexity of the software development process. Similar to our study, he stated that the complexity of software development causes the software to become harder to maintain. This leads to other problems such as software integrity and difficulty in detecting application bugs or flaws. A bug is a flaw in a computer program that can ultimately cause glitches, program failure or software destruction. With the increasing importance of SE, data mining has become an important tool for solving SE problems. Data perturbation techniques for preserving privacy in data mining were proposed by Islam and Brankovic [9]. Aouf proposed the clustering technique to identify patterns in the underlying data [10]. Later work by Ma and Chan [11] suggested iterative mining for mining overlapping patterns in noisy data. The three data mining approaches discussed above are all clustering techniques. The data perturbation technique described by Islam and Brankovic [9] involves adding noisy data into some part of the dataset in order to preserve privacy, while Ma and Chan [11] were concerned with the elimination of noisy data to enable extracting valuable information. Different types of data require different data mining techniques. In this paper, we present specific problems that exist in SE and apply a specific data mining technique to solve each of them. III. RESEARCH METHODOLOGY In this research, we decided to use a descriptive approach as our research methodology. The descriptive approach is primarily used for gathering data, with attempting to generate rich descriptions and explanations of the object of study. The project may also include gathering opinions about the desirability of the present state of things. It relies on both qualitative and quantitative data. The research methodology of our study is shown in Fig. 2 below. First, we collect data from past literature: journals, magazine, written documents, sources from the internet, books and published papers. After identifying the problem in SE (reliability of software and complexity of development processes), we analyze the data we collected about the various data mining techniques, and make a decision as to which technique is best suited to solve our problem. Finally, we propose the best data mining technique to solve SE problem. Figure 2. Research Methodology Phase IV. ANALYSIS AND FINDINGS SE data can be categorized into three categories as follows [3]: Sequences, such as execution traces collected at runtime, and static traces extracted from source code Graphs, such as dynamic call graphs extracted from source code Text, such as code comments, documentation, and bug reports. When problems such as bugs or flaws arise in systems or software, it is difficult for developers or programmers to determine the cause(s). Data mining is a valuable tool for solving SE problems. Data mining techniques can be applied to solve problems in all three categories of SE data. The following sub-topics review and describe data mining techniques that can solve problems in each type of SE data. A. Sequence Data Mining Examples of SE sequence data include method-call data that is dynamically collected during program execution, or statically extracted from source code. The challenge inherent in SE sequence data is the difficulty of extracting sequential patterns and information from the source code [3] or program during program execution [12] for software bug or flaw detection. Data mining techniques that can be applied to solve these problems are frequent itemset mining (FIM) and frequent sequence mining (FSM). FIM and FSM are types of frequent pattern mining that use alternative support counting technique [3]. Fig. 3 shows the process of frequent pattern mining on program source code. A program source code is composed of elements written in a particular programming language. To mine sequential data from source code, we need to divide it into different units, such as tags, blocks, function, and classes, for further mining. The information extracted from different units of source code can be used to remove the wrong source code [13]. In this case, FIM can be used to characterize the importance of program elements. FIM is used to mine the 2
3 frequency of program elements, and to extract the procedural rules of fault and bug detection. FIM is only suitable for mining static traces or paths extracted from source code as it does not reflect the sequential order of information in the mined pattern [3, 14]. FIM uses a bottom-up approach and follows the principle wherein an itemset X of variables can only be frequent if all its subsets are also frequent [14]. Thus, it is not suitable for application to run-time data [13]. flaws in the source code of an application helps in system maintenance, and produces more reliable software. FSM saves developers time in searching for and fixing software bugs or faults during program execution. This improves software reliability and maintenance. Besides this, system performance can be enhanced as systems require less time for retrieving data from databases. B. Graph Data Mining Most SE data can be represented as graphs. Dynamic call graphs generated during execution of a program and static call graphs generated from program source code are instances of SE data represented in graphs [3]. A graph is an expressive representation for SE data and is useful for modeling complicated structures and their relationships [15]. Mining graph data has great potential to help in software development. Thus, graph data mining is often an active research area in SE. As graph data are usually large and complex, it is hard for the human eye to uncover patterns in them and classify the patterns to ease program bug analysis [16]. Application of graph classification can solve this problem, since the technique automates the identification of subgraph patterns from graph data and constructs a graph classification model for automating the classification process. Figure 3. The Procedure of Frequent Pattern Mining on Source Code [13] Run-time paths capture the ordered sequence of components or events that take place in an application [14]. The information can be used to analyze the overall software design. However, a run-time path can be very long, and numerous different run-time paths may exist in a system [14]. FSM can identify the frequent sequences that occur across different run-time paths [14]. FSM can be applied to the runtime paths [14] during program execution to identify the repeating sequences, patterns, and design flaws that lead to poor system performance. The common problem in systems, especially enterprise systems, is performance degradation when systems retrieve data from databases. Run-time paths contain performance information of the application, corresponding to users' actions. In solving performance issues of an application, it is more accurate to identify resource intensive method sequences than frequent sequences [14]. Applying FSM to mine the run-time paths not only can identify the resource intensive methods, but also give the sequences in which they occurred. The sequential information extracted by FSM allows developers to quickly locate and correct the flaws in an application. FIM helps developers determine the important elements of a particular source code. This enables them in future to create and develop more effective and failure-free software that has similar requirements. Besides this, the detection of bugs and Software is unlikely to ever be failure-free. The more failures and bugs encountered in the software, the more unreliable the software is. By detecting bugs and failures and pinpointing their origin software programmers or developers can fix them and thus increase the software reliability. There are two types of software bugs: crashing and non-crashing [16]. Crashing bugs are those that cause program execution to halt under non-ordinary conditions. Examples of crashing bugs include inputting characters into a field of integer data type, or referring to a null pointer. Non-crashing bugs do not terminate the program [16] but may result in errors in execution or output. An example of a non-crashing bug would be logical errors such as the final result generated by the program is not the result that it should be. Non-crashing bugs are more difficult to be detected as they have no crashing point -- that is, the program continues running past the point of error [16]. In order to disclose traces of non-crashing bugs, graph classification technique is used. Software behavior graphs from program execution comprise the data to be mined. Fig. 4 shows an example of behavior graphs. These graphs represent one correct run and one incorrect run of a program. The node of the graph represents the function of the program while the edge represents the function calls. However, the number of frequent subgraphs mined from behavior graphs is often large which may result in low performance of graph mining and classification [16] due to the difficulties and time consuming nature of mining huge numbers of frequent subgraphs. Closed frequent subgraphs are a legitimate substitute for frequent graphs for classification purposes, as the number of closed 3
4 frequent subgraphs generated is lower. The mined closed frequent subgraphs are used for classifier construction [15]. A classifier is built at each checkpoint for every function in a program to detect the location of the bug [16]. exercise is time consuming and often either not feasible or not regular practice [17]. To solve this problem, we can apply a technique known as text mining on natural-language description. Text data mining refers to the discovery of unknown information and potentially useful knowledge from a collection of texts, by automatically extracting and analyzing information [18]. A key factor is to extract the appropriate information and link it together to form new facts to be explored further. The Natural Language Description Technique combines linguistics and computer science to enhance the interactions between computers and natural languages [19]. Figure 4. Example of behavior graphs [16] Bug detection is achieved by monitoring for classification accuracy boost. The classification accuracy of each classifier will remain low before the bug is triggered [3], because the classifier lacks information about the behavior of the bug. When the bug is triggered, the classification accuracy of the classifier at that checkpoint increases, and therefore the location of that classifier is probably the location of the bug [16]. This eases the process of debugging for non-crashing bugs, since programmers thus can deal with the bugs directly without going through the difficulty and time to search for the bugs on their own. C. Text Data Mining Currently, approximately 80% of information is stored in computers as text [17]. Example of SE text data includes bug reports, project reports, s and code comments [3]. The majority of software used every day generates bug reports (BR). BRs are compiled from various sources such as the research and development sector and also from end user feedback. When a bug report is sent in, it is labeled as either a security bug report (SBR) or non-security bug report (NSBR). SBRs have a higher priority than NSBR. Gegick et al [17] stated that correctly labeling SBRs and NSBRs is critical to good security practice since a delay in identifying and fixing security bugs has the potential to cause serious damage to software-system stakeholders. In reality, during bug report classification SBRs are often mislabeled as NSBRs. This is partially caused by the lack of human knowledge of security domain issues. Such errors can cause serious damage to the software system due to delays in ascertaining authenticity and in rectifying the bugs. Though people on the security team are able to manually inspect NSBR submitted to identify potentially mislabeled SBR, this Figure 5. Bug Reports Classification using Text Data Mining [17] Fig. 5 shows the process of bug report classification using text data mining's Natural Language Description Technique. The initial step is to obtain a set of labeled BR data that contains textual descriptions of bugs correctly labeled to indicate whether they should be classified SBR or NSBR. Labeling of the BR data set is required for creating and evaluating a Natural Language Predictive Model. The next step is to create three configuration files used in text mining: start list, stop list, and synonym list. A stop list contains terms such as articles, prepositions, and conjunctions that are not relevant in text mining. If a term in the document is found in the stop list it is not entered into matrix. Terms in the start list, on the other hand, are highly relevant to SBRs, therefore their appearance in a BR increases the probability that the BR is an SBR [17]. The synonym list contains terms with the same meanings (e.g., buffer overrun and buffer overflow have the same meaning). Terms in a synonym list are treated equally in text mining. Thus, for example, a less frequently used term that is associated with SBRs may be assigned more weight in the prediction model than a morefrequent term that is not associated with SBRs, or a term that is not itself frequently associated with SBRs may receive more priority in the prediction model if it is a synonym with a term that is frequently used in SBRs. The third step is to train, validate, and test the predictive model that estimates the probability that a BR is an SBR. [17]. 4
5 Text mining on natural-language descriptions can be applied to BRs to determine whether those reports are SBR or NSBR. Software engineers can use this technique to automate the classification process of BRs from huge bug databases to save time and reduce dependency on human efforts, as well as to improve the priority ranking of each SBR and ensure that the described security bug receives the appropriate effort and a timel fix, thereby improving the software. D. Discussion of the Three Data Mining Techniques In sections A, B, and C above, we discussed the data mining techniques that can be used to solve problems in each type of SE data (sequences, graphs, and text) Hence, we have shown that the different nature of data demands different data mining techniques in solving SE problems. Data mining techniques that can be applied to solve certain SE data problems might not be able to solve other SE data problems. The data mining techniques appropriate for solving sequence data problems are frequent itemset mining (FIM) and frequent sequence mining (FSM), while the data mining technique best suited for solving SE graph data problems is classification, and text mining on natural language technique is the most useful tool for solving text data problems. V. CONCLUSION In this paper, we have discussed the application of data mining for solving of a number of Software Engineering problems. A number of problems encountered in the SE field, such as bug occurrence, high cost of software maintenance; unclear requirements and so on can reduce software quality and productivity. We outlined the data mining techniques that can be applied to various types of SE data in order to solve the challenges posed by SE tasks such as programming, bug detection, debugging, and maintenance. Our studies show that data mining techniques have proven to be effective for improving SE by increasing software reliability and quality. As to future direction, new mining techniques or algorithms are needed in order to solve unclear software requirements in mining SE data. Another challenge is the ability to mine combination types of SE data (e.g., text and graphs together) for more informative patterns. Further research needs to be done to explore the potential of mining combination data to solve numerous real-world problems in SE. Computer and Automation Engineering (ICCAE) 2010, Vol. 1, pp , Apr [5] R. W. DePree, Pattern recognition in software engineering, IEEE Computer 1983, pp , [6] R. H. Thayer, A. Pyster, and R. C. Wood, Validating solutions to major problems in software engineering project management, IEEE Computer Society, pp , [7] C. V. Ramamoorthy, A. Prakash, W. T. Tsai, and Y. Usuda, Software engineering: problems and perspectives, IEEE Computer Society, pp , Oct [8] J. Clarke et al., Refomulating software engineer as a search problem, IEEE Proceeding Software., Vol. 150, No. 3, pp , June [9] M. Z. Islam and L. Brankovic, Detective: a decision tree based categorical value clustering and perturbation technique for preserving privacy in data mining, Third IEEE Conference on Industrial Informatics (INDIN), pp , [10] M. Aouf, L. Lyanage, and S. Hansen, Critical review of data mining techniques for gene expression analysis, International Conference on Information and Automation for Sustainability (ICIAFS) 2008, pp , [11] P. C. H. Ma and K. C. C. Chan, An iterative data mining approach for mining overlapping coexpression patterns in noisy gene expression data, IEEE Trans. NanoBioscience, Vol. 8 No. 3, pp , Sept [12] B. Bhasker, An algorithm for mining large sequence in databases, Communications of the IBIMA, Vol. 6, pp , [13] C. M. Zong and Z. L. Li, Applying data mining techniques in software development, 2nd IEEE Int. Conf., pp Apr [14] T. Parsons, J. Murphy, and P. O Sullivan, Applying frequent sequence mining to identify design flaws in enterprise software systems, Machine Learning and Data Mining in Pattern Recognition, 5 th Int. Conf., pp , MLDM [15] J. Han, and M. Kamber, Data mining concepts and techniques, 2nd Edition, The Morgon Kaufman Series in Data Management Systems, [16] C. Liu et al., Mining behavior graphs for backtrace of noncrashing bugs, Proc. of 2005 SIAM Int. Conf. on Data Mining (SDM'05), pp , [17] M. Gegick, P. Rotella and T. Xie, Identifying security bug reports via text mining: an industrial case study, 7th IEEE Working Conf. Mining Software Repositories (MSR), Cape Town, [18] G. Vishal and S. L Gurpreet, A survey of text mining techniques and applications, Journal of Emerging Technologies in Web Intelligence, Vol. 1, No. 1, August [19] X.Y. Wang, Z. Lu, T. Xie, A. John and S. Jiasu, An approach to detecting duplicate bug reports using natural language and execution information, ACM/IEEE 30th Int. Conf. on Software Engineering, ICSE 08, 2008 ACKNOWLEDGMENT The authors wish to thank Universiti Sains Malaysia (USM) for supporting this research. REFERENCES [1] J. Clarke, Refomulating software as s search problem, IEEE Proc.: Software Vol. 150, No.3, pp , June [2] X. L. Fern, C. Komireddy, V. Gregoreanu, and M. Burnett, Mining problem-solving strategies from HCL data, ACM Trans. Computer- Human Interaction, Vol. 17, No. 1, Article 3 pp. 1-22, March [3] T. Xie, S. Thummalapenta, D. Lo, and C. Liu, Data mining for software engineering, IEEE Computer Society August 2009, pp , [4] Y. Chen, X. H. Shen, P. Du, and B. Ge, Research on software defect prediction based on data mining, 2 nd International Conference on 5
131-1. Adding New Level in KDD to Make the Web Usage Mining More Efficient. Abstract. 1. Introduction [1]. 1/10
1/10 131-1 Adding New Level in KDD to Make the Web Usage Mining More Efficient Mohammad Ala a AL_Hamami PHD Student, Lecturer m_ah_1@yahoocom Soukaena Hassan Hashem PHD Student, Lecturer soukaena_hassan@yahoocom
EFFICIENT DATA PRE-PROCESSING FOR DATA MINING
EFFICIENT DATA PRE-PROCESSING FOR DATA MINING USING NEURAL NETWORKS JothiKumar.R 1, Sivabalan.R.V 2 1 Research scholar, Noorul Islam University, Nagercoil, India Assistant Professor, Adhiparasakthi College
How To Use Neural Networks In Data Mining
International Journal of Electronics and Computer Science Engineering 1449 Available Online at www.ijecse.org ISSN- 2277-1956 Neural Networks in Data Mining Priyanka Gaur Department of Information and
DATA MINING TECHNIQUES AND APPLICATIONS
DATA MINING TECHNIQUES AND APPLICATIONS Mrs. Bharati M. Ramageri, Lecturer Modern Institute of Information Technology and Research, Department of Computer Application, Yamunanagar, Nigdi Pune, Maharashtra,
EMPIRICAL STUDY ON SELECTION OF TEAM MEMBERS FOR SOFTWARE PROJECTS DATA MINING APPROACH
EMPIRICAL STUDY ON SELECTION OF TEAM MEMBERS FOR SOFTWARE PROJECTS DATA MINING APPROACH SANGITA GUPTA 1, SUMA. V. 2 1 Jain University, Bangalore 2 Dayanada Sagar Institute, Bangalore, India Abstract- One
IMPROVING BUSINESS PROCESS MODELING USING RECOMMENDATION METHOD
Journal homepage: www.mjret.in ISSN:2348-6953 IMPROVING BUSINESS PROCESS MODELING USING RECOMMENDATION METHOD Deepak Ramchandara Lad 1, Soumitra S. Das 2 Computer Dept. 12 Dr. D. Y. Patil School of Engineering,(Affiliated
Healthcare Measurement Analysis Using Data mining Techniques
www.ijecs.in International Journal Of Engineering And Computer Science ISSN:2319-7242 Volume 03 Issue 07 July, 2014 Page No. 7058-7064 Healthcare Measurement Analysis Using Data mining Techniques 1 Dr.A.Shaik
SPATIAL DATA CLASSIFICATION AND DATA MINING
, pp.-40-44. Available online at http://www. bioinfo. in/contents. php?id=42 SPATIAL DATA CLASSIFICATION AND DATA MINING RATHI J.B. * AND PATIL A.D. Department of Computer Science & Engineering, Jawaharlal
Dynamic Data in terms of Data Mining Streams
International Journal of Computer Science and Software Engineering Volume 2, Number 1 (2015), pp. 1-6 International Research Publication House http://www.irphouse.com Dynamic Data in terms of Data Mining
Fault Localization in a Software Project using Back- Tracking Principles of Matrix Dependency
Fault Localization in a Software Project using Back- Tracking Principles of Matrix Dependency ABSTRACT Fault identification and testing has always been the most specific concern in the field of software
Enhanced Boosted Trees Technique for Customer Churn Prediction Model
IOSR Journal of Engineering (IOSRJEN) ISSN (e): 2250-3021, ISSN (p): 2278-8719 Vol. 04, Issue 03 (March. 2014), V5 PP 41-45 www.iosrjen.org Enhanced Boosted Trees Technique for Customer Churn Prediction
ASSOCIATION RULE MINING ON WEB LOGS FOR EXTRACTING INTERESTING PATTERNS THROUGH WEKA TOOL
International Journal Of Advanced Technology In Engineering And Science Www.Ijates.Com Volume No 03, Special Issue No. 01, February 2015 ISSN (Online): 2348 7550 ASSOCIATION RULE MINING ON WEB LOGS FOR
Financial Trading System using Combination of Textual and Numerical Data
Financial Trading System using Combination of Textual and Numerical Data Shital N. Dange Computer Science Department, Walchand Institute of Rajesh V. Argiddi Assistant Prof. Computer Science Department,
A Framework for Data Warehouse Using Data Mining and Knowledge Discovery for a Network of Hospitals in Pakistan
, pp.217-222 http://dx.doi.org/10.14257/ijbsbt.2015.7.3.23 A Framework for Data Warehouse Using Data Mining and Knowledge Discovery for a Network of Hospitals in Pakistan Muhammad Arif 1,2, Asad Khatak
A Serial Partitioning Approach to Scaling Graph-Based Knowledge Discovery
A Serial Partitioning Approach to Scaling Graph-Based Knowledge Discovery Runu Rathi, Diane J. Cook, Lawrence B. Holder Department of Computer Science and Engineering The University of Texas at Arlington
MALLET-Privacy Preserving Influencer Mining in Social Media Networks via Hypergraph
MALLET-Privacy Preserving Influencer Mining in Social Media Networks via Hypergraph Janani K 1, Narmatha S 2 Assistant Professor, Department of Computer Science and Engineering, Sri Shakthi Institute of
Association Technique on Prediction of Chronic Diseases Using Apriori Algorithm
Association Technique on Prediction of Chronic Diseases Using Apriori Algorithm R.Karthiyayini 1, J.Jayaprakash 2 Assistant Professor, Department of Computer Applications, Anna University (BIT Campus),
Data Mining Governance for Service Oriented Architecture
Data Mining Governance for Service Oriented Architecture Ali Beklen Software Group IBM Turkey Istanbul, TURKEY [email protected] Turgay Tugay Bilgin Dept. of Computer Engineering Maltepe University Istanbul,
International Journal of Engineering Research ISSN: 2348-4039 & Management Technology November-2015 Volume 2, Issue-6
International Journal of Engineering Research ISSN: 2348-4039 & Management Technology Email: [email protected] November-2015 Volume 2, Issue-6 www.ijermt.org Modeling Big Data Characteristics for Discovering
Enhancing Quality of Data using Data Mining Method
JOURNAL OF COMPUTING, VOLUME 2, ISSUE 9, SEPTEMBER 2, ISSN 25-967 WWW.JOURNALOFCOMPUTING.ORG 9 Enhancing Quality of Data using Data Mining Method Fatemeh Ghorbanpour A., Mir M. Pedram, Kambiz Badie, Mohammad
An Analysis of Missing Data Treatment Methods and Their Application to Health Care Dataset
P P P Health An Analysis of Missing Data Treatment Methods and Their Application to Health Care Dataset Peng Liu 1, Elia El-Darzi 2, Lei Lei 1, Christos Vasilakis 2, Panagiotis Chountas 2, and Wei Huang
International Journal of Computer Science Trends and Technology (IJCST) Volume 2 Issue 3, May-Jun 2014
RESEARCH ARTICLE OPEN ACCESS A Survey of Data Mining: Concepts with Applications and its Future Scope Dr. Zubair Khan 1, Ashish Kumar 2, Sunny Kumar 3 M.Tech Research Scholar 2. Department of Computer
Data Mining in Web Search Engine Optimization and User Assisted Rank Results
Data Mining in Web Search Engine Optimization and User Assisted Rank Results Minky Jindal Institute of Technology and Management Gurgaon 122017, Haryana, India Nisha kharb Institute of Technology and Management
A Statistical Text Mining Method for Patent Analysis
A Statistical Text Mining Method for Patent Analysis Department of Statistics Cheongju University, [email protected] Abstract Most text data from diverse document databases are unsuitable for analytical
Homomorphic Encryption Schema for Privacy Preserving Mining of Association Rules
Homomorphic Encryption Schema for Privacy Preserving Mining of Association Rules M.Sangeetha 1, P. Anishprabu 2, S. Shanmathi 3 Department of Computer Science and Engineering SriGuru Institute of Technology
A UPS Framework for Providing Privacy Protection in Personalized Web Search
A UPS Framework for Providing Privacy Protection in Personalized Web Search V. Sai kumar 1, P.N.V.S. Pavan Kumar 2 PG Scholar, Dept. of CSE, G Pulla Reddy Engineering College, Kurnool, Andhra Pradesh,
Understanding Web personalization with Web Usage Mining and its Application: Recommender System
Understanding Web personalization with Web Usage Mining and its Application: Recommender System Manoj Swami 1, Prof. Manasi Kulkarni 2 1 M.Tech (Computer-NIMS), VJTI, Mumbai. 2 Department of Computer Technology,
International Journal of World Research, Vol: I Issue XIII, December 2008, Print ISSN: 2347-937X DATA MINING TECHNIQUES AND STOCK MARKET
DATA MINING TECHNIQUES AND STOCK MARKET Mr. Rahul Thakkar, Lecturer and HOD, Naran Lala College of Professional & Applied Sciences, Navsari ABSTRACT Without trading in a stock market we can t understand
Clustering Technique in Data Mining for Text Documents
Clustering Technique in Data Mining for Text Documents Ms.J.Sathya Priya Assistant Professor Dept Of Information Technology. Velammal Engineering College. Chennai. Ms.S.Priyadharshini Assistant Professor
Network Machine Learning Research Group. Intended status: Informational October 19, 2015 Expires: April 21, 2016
Network Machine Learning Research Group S. Jiang Internet-Draft Huawei Technologies Co., Ltd Intended status: Informational October 19, 2015 Expires: April 21, 2016 Abstract Network Machine Learning draft-jiang-nmlrg-network-machine-learning-00
Web Application Regression Testing: A Session Based Test Case Prioritization Approach
Web Application Regression Testing: A Session Based Test Case Prioritization Approach Mojtaba Raeisi Nejad Dobuneh 1, Dayang Norhayati Abang Jawawi 2, Mohammad V. Malakooti 3 Faculty and Head of Department
Semantic Concept Based Retrieval of Software Bug Report with Feedback
Semantic Concept Based Retrieval of Software Bug Report with Feedback Tao Zhang, Byungjeong Lee, Hanjoon Kim, Jaeho Lee, Sooyong Kang, and Ilhoon Shin Abstract Mining software bugs provides a way to develop
Sustaining Privacy Protection in Personalized Web Search with Temporal Behavior
Sustaining Privacy Protection in Personalized Web Search with Temporal Behavior N.Jagatheshwaran 1 R.Menaka 2 1 Final B.Tech (IT), [email protected], Velalar College of Engineering and Technology,
Component visualization methods for large legacy software in C/C++
Annales Mathematicae et Informaticae 44 (2015) pp. 23 33 http://ami.ektf.hu Component visualization methods for large legacy software in C/C++ Máté Cserép a, Dániel Krupp b a Eötvös Loránd University [email protected]
Mobile Phone APP Software Browsing Behavior using Clustering Analysis
Proceedings of the 2014 International Conference on Industrial Engineering and Operations Management Bali, Indonesia, January 7 9, 2014 Mobile Phone APP Software Browsing Behavior using Clustering Analysis
An Overview of Knowledge Discovery Database and Data mining Techniques
An Overview of Knowledge Discovery Database and Data mining Techniques Priyadharsini.C 1, Dr. Antony Selvadoss Thanamani 2 M.Phil, Department of Computer Science, NGM College, Pollachi, Coimbatore, Tamilnadu,
Improving Apriori Algorithm to get better performance with Cloud Computing
Improving Apriori Algorithm to get better performance with Cloud Computing Zeba Qureshi 1 ; Sanjay Bansal 2 Affiliation: A.I.T.R, RGPV, India 1, A.I.T.R, RGPV, India 2 ABSTRACT Cloud computing has become
Distributed Framework for Data Mining As a Service on Private Cloud
RESEARCH ARTICLE OPEN ACCESS Distributed Framework for Data Mining As a Service on Private Cloud Shraddha Masih *, Sanjay Tanwani** *Research Scholar & Associate Professor, School of Computer Science &
An Empirical Study of Application of Data Mining Techniques in Library System
An Empirical Study of Application of Data Mining Techniques in Library System Veepu Uppal Department of Computer Science and Engineering, Manav Rachna College of Engineering, Faridabad, India Gunjan Chindwani
Customer Classification And Prediction Based On Data Mining Technique
Customer Classification And Prediction Based On Data Mining Technique Ms. Neethu Baby 1, Mrs. Priyanka L.T 2 1 M.E CSE, Sri Shakthi Institute of Engineering and Technology, Coimbatore 2 Assistant Professor
Filtering Noisy Contents in Online Social Network by using Rule Based Filtering System
Filtering Noisy Contents in Online Social Network by using Rule Based Filtering System Bala Kumari P 1, Bercelin Rose Mary W 2 and Devi Mareeswari M 3 1, 2, 3 M.TECH / IT, Dr.Sivanthi Aditanar College
Facilitating Business Process Discovery using Email Analysis
Facilitating Business Process Discovery using Email Analysis Matin Mavaddat [email protected] Stewart Green Stewart.Green Ian Beeson Ian.Beeson Jin Sa Jin.Sa Abstract Extracting business process
Graph Mining and Social Network Analysis
Graph Mining and Social Network Analysis Data Mining and Text Mining (UIC 583 @ Politecnico di Milano) References Jiawei Han and Micheline Kamber, "Data Mining: Concepts and Techniques", The Morgan Kaufmann
DATA PREPARATION FOR DATA MINING
Applied Artificial Intelligence, 17:375 381, 2003 Copyright # 2003 Taylor & Francis 0883-9514/03 $12.00 +.00 DOI: 10.1080/08839510390219264 u DATA PREPARATION FOR DATA MINING SHICHAO ZHANG and CHENGQI
PSG College of Technology, Coimbatore-641 004 Department of Computer & Information Sciences BSc (CT) G1 & G2 Sixth Semester PROJECT DETAILS.
PSG College of Technology, Coimbatore-641 004 Department of Computer & Information Sciences BSc (CT) G1 & G2 Sixth Semester PROJECT DETAILS Project Project Title Area of Abstract No Specialization 1. Software
Effective Data Retrieval Mechanism Using AML within the Web Based Join Framework
Effective Data Retrieval Mechanism Using AML within the Web Based Join Framework Usha Nandini D 1, Anish Gracias J 2 1 [email protected] 2 [email protected] Abstract A vast amount of assorted
Natural Language to Relational Query by Using Parsing Compiler
Available Online at www.ijcsmc.com International Journal of Computer Science and Mobile Computing A Monthly Journal of Computer Science and Information Technology IJCSMC, Vol. 4, Issue. 3, March 2015,
Selection of Optimal Discount of Retail Assortments with Data Mining Approach
Available online at www.interscience.in Selection of Optimal Discount of Retail Assortments with Data Mining Approach Padmalatha Eddla, Ravinder Reddy, Mamatha Computer Science Department,CBIT, Gandipet,Hyderabad,A.P,India.
Finding Execution Faults in Dynamic Web Application
International Journal of Information and Computation Technology. ISSN 0974-2239 Volume 4, Number 5 (2014), pp. 445-452 International Research Publications House http://www. irphouse.com /ijict.htm Finding
A Data Mining Study of Weld Quality Models Constructed with MLP Neural Networks from Stratified Sampled Data
A Data Mining Study of Weld Quality Models Constructed with MLP Neural Networks from Stratified Sampled Data T. W. Liao, G. Wang, and E. Triantaphyllou Department of Industrial and Manufacturing Systems
Email Spam Detection Using Customized SimHash Function
International Journal of Research Studies in Computer Science and Engineering (IJRSCSE) Volume 1, Issue 8, December 2014, PP 35-40 ISSN 2349-4840 (Print) & ISSN 2349-4859 (Online) www.arcjournals.org Email
Distributed forests for MapReduce-based machine learning
Distributed forests for MapReduce-based machine learning Ryoji Wakayama, Ryuei Murata, Akisato Kimura, Takayoshi Yamashita, Yuji Yamauchi, Hironobu Fujiyoshi Chubu University, Japan. NTT Communication
(Refer Slide Time: 01:52)
Software Engineering Prof. N. L. Sarda Computer Science & Engineering Indian Institute of Technology, Bombay Lecture - 2 Introduction to Software Engineering Challenges, Process Models etc (Part 2) This
INVESTIGATING THE DEFECT PATTERNS DURING THE SOFTWARE DEVELOPMENT PROJECTS
INVESTIGATING THE DEFECT PATTERNS DURING THE SOFTWARE DEVELOPMENT PROJECTS A Paper Submitted to the Graduate Faculty of the North Dakota State University of Agriculture and Applied Science By Abhaya Nath
2.1. The Notion of Customer Relationship Management (CRM)
Int. J. Innovative Ideas (IJII) www.publishtopublic.com A Review on CRM and CIS: A Service Oriented Approach A Review on CRM and CIS: A Service Oriented Approach Shadi Hajibagheri 1, *, Babak Shirazi 2,
Process Modelling from Insurance Event Log
Process Modelling from Insurance Event Log P.V. Kumaraguru Research scholar, Dr.M.G.R Educational and Research Institute University Chennai- 600 095 India Dr. S.P. Rajagopalan Professor Emeritus, Dr. M.G.R
Profile Based Personalized Web Search and Download Blocker
Profile Based Personalized Web Search and Download Blocker 1 K.Sheeba, 2 G.Kalaiarasi Dhanalakshmi Srinivasan College of Engineering and Technology, Mamallapuram, Chennai, Tamil nadu, India Email: 1 [email protected],
Predicting the Risk of Heart Attacks using Neural Network and Decision Tree
Predicting the Risk of Heart Attacks using Neural Network and Decision Tree S.Florence 1, N.G.Bhuvaneswari Amma 2, G.Annapoorani 3, K.Malathi 4 PG Scholar, Indian Institute of Information Technology, Srirangam,
A Review of Data Mining Techniques
Available Online at www.ijcsmc.com International Journal of Computer Science and Mobile Computing A Monthly Journal of Computer Science and Information Technology IJCSMC, Vol. 3, Issue. 4, April 2014,
Horizontal Aggregations in SQL to Prepare Data Sets for Data Mining Analysis
IOSR Journal of Computer Engineering (IOSRJCE) ISSN: 2278-0661, ISBN: 2278-8727 Volume 6, Issue 5 (Nov. - Dec. 2012), PP 36-41 Horizontal Aggregations in SQL to Prepare Data Sets for Data Mining Analysis
Web Usage Mining: Identification of Trends Followed by the user through Neural Network
International Journal of Information and Computation Technology. ISSN 0974-2239 Volume 3, Number 7 (2013), pp. 617-624 International Research Publications House http://www. irphouse.com /ijict.htm Web
Mining Signatures in Healthcare Data Based on Event Sequences and its Applications
Mining Signatures in Healthcare Data Based on Event Sequences and its Applications Siddhanth Gokarapu 1, J. Laxmi Narayana 2 1 Student, Computer Science & Engineering-Department, JNTU Hyderabad India 1
Mining for Web Engineering
Mining for Engineering A. Venkata Krishna Prasad 1, Prof. S.Ramakrishna 2 1 Associate Professor, Department of Computer Science, MIPGS, Hyderabad 2 Professor, Department of Computer Science, Sri Venkateswara
Automatic Annotation Wrapper Generation and Mining Web Database Search Result
Automatic Annotation Wrapper Generation and Mining Web Database Search Result V.Yogam 1, K.Umamaheswari 2 1 PG student, ME Software Engineering, Anna University (BIT campus), Trichy, Tamil nadu, India
Fault Analysis in Software with the Data Interaction of Classes
, pp.189-196 http://dx.doi.org/10.14257/ijsia.2015.9.9.17 Fault Analysis in Software with the Data Interaction of Classes Yan Xiaobo 1 and Wang Yichen 2 1 Science & Technology on Reliability & Environmental
DECISION TREE INDUCTION FOR FINANCIAL FRAUD DETECTION USING ENSEMBLE LEARNING TECHNIQUES
DECISION TREE INDUCTION FOR FINANCIAL FRAUD DETECTION USING ENSEMBLE LEARNING TECHNIQUES Vijayalakshmi Mahanra Rao 1, Yashwant Prasad Singh 2 Multimedia University, Cyberjaya, MALAYSIA 1 [email protected]
Towards better accuracy for Spam predictions
Towards better accuracy for Spam predictions Chengyan Zhao Department of Computer Science University of Toronto Toronto, Ontario, Canada M5S 2E4 [email protected] Abstract Spam identification is crucial
A SYSTEM FOR DENIAL OF SERVICE ATTACK DETECTION BASED ON MULTIVARIATE CORRELATION ANALYSIS
Journal homepage: www.mjret.in ISSN:2348-6953 A SYSTEM FOR DENIAL OF SERVICE ATTACK DETECTION BASED ON MULTIVARIATE CORRELATION ANALYSIS P.V.Sawant 1, M.P.Sable 2, P.V.Kore 3, S.R.Bhosale 4 Department
K Thangadurai P.G. and Research Department of Computer Science, Government Arts College (Autonomous), Karur, India. ktramprasad04@yahoo.
Enhanced Binary Small World Optimization Algorithm for High Dimensional Datasets K Thangadurai P.G. and Research Department of Computer Science, Government Arts College (Autonomous), Karur, India. [email protected]
Information Management course
Università degli Studi di Milano Master Degree in Computer Science Information Management course Teacher: Alberto Ceselli Lecture 01 : 06/10/2015 Practical informations: Teacher: Alberto Ceselli ([email protected])
Neural Networks in Data Mining
IOSR Journal of Engineering (IOSRJEN) ISSN (e): 2250-3021, ISSN (p): 2278-8719 Vol. 04, Issue 03 (March. 2014), V6 PP 01-06 www.iosrjen.org Neural Networks in Data Mining Ripundeep Singh Gill, Ashima Department
Data Mining Solutions for the Business Environment
Database Systems Journal vol. IV, no. 4/2013 21 Data Mining Solutions for the Business Environment Ruxandra PETRE University of Economic Studies, Bucharest, Romania [email protected] Over
Big Data with Rough Set Using Map- Reduce
Big Data with Rough Set Using Map- Reduce Mr.G.Lenin 1, Mr. A. Raj Ganesh 2, Mr. S. Vanarasan 3 Assistant Professor, Department of CSE, Podhigai College of Engineering & Technology, Tirupattur, Tamilnadu,
A SECURE DECISION SUPPORT ESTIMATION USING GAUSSIAN BAYES CLASSIFICATION IN HEALTH CARE SERVICES
A SECURE DECISION SUPPORT ESTIMATION USING GAUSSIAN BAYES CLASSIFICATION IN HEALTH CARE SERVICES K.M.Ruba Malini #1 and R.Lakshmi *2 # P.G.Scholar, Computer Science and Engineering, K. L. N College Of
An Evaluation of Machine Learning Method for Intrusion Detection System Using LOF on Jubatus
An Evaluation of Machine Learning Method for Intrusion Detection System Using LOF on Jubatus Tadashi Ogino* Okinawa National College of Technology, Okinawa, Japan. * Corresponding author. Email: [email protected]
Study of Lightning Damage Risk Assessment Method for Power Grid
Energy and Power Engineering, 2013, 5, 1478-1483 doi:10.4236/epe.2013.54b280 Published Online July 2013 (http://www.scirp.org/journal/epe) Study of Lightning Damage Risk Assessment Method for Power Grid
Comparision of k-means and k-medoids Clustering Algorithms for Big Data Using MapReduce Techniques
Comparision of k-means and k-medoids Clustering Algorithms for Big Data Using MapReduce Techniques Subhashree K 1, Prakash P S 2 1 Student, Kongu Engineering College, Perundurai, Erode 2 Assistant Professor,
Comparing Methods to Identify Defect Reports in a Change Management Database
Comparing Methods to Identify Defect Reports in a Change Management Database Elaine J. Weyuker, Thomas J. Ostrand AT&T Labs - Research 180 Park Avenue Florham Park, NJ 07932 (weyuker,ostrand)@research.att.com
A STUDY OF DATA MINING ACTIVITIES FOR MARKET RESEARCH
205 A STUDY OF DATA MINING ACTIVITIES FOR MARKET RESEARCH ABSTRACT MR. HEMANT KUMAR*; DR. SARMISTHA SARMA** *Assistant Professor, Department of Information Technology (IT), Institute of Innovation in Technology
Future Trend Prediction of Indian IT Stock Market using Association Rule Mining of Transaction data
Volume 39 No10, February 2012 Future Trend Prediction of Indian IT Stock Market using Association Rule Mining of Transaction data Rajesh V Argiddi Assit Prof Department Of Computer Science and Engineering,
Intinno: A Web Integrated Digital Library and Learning Content Management System
Intinno: A Web Integrated Digital Library and Learning Content Management System Synopsis of the Thesis to be submitted in Partial Fulfillment of the Requirements for the Award of the Degree of Master
Software Engineering Introduction & Background. Complaints. General Problems. Department of Computer Science Kent State University
Software Engineering Introduction & Background Department of Computer Science Kent State University Complaints Software production is often done by amateurs Software development is done by tinkering or
2.1. Data Mining for Biomedical and DNA data analysis
Applications of Data Mining Simmi Bagga Assistant Professor Sant Hira Dass Kanya Maha Vidyalaya, Kala Sanghian, Distt Kpt, India (Email: [email protected]) Dr. G.N. Singh Department of Physics and
A Knowledge Management Framework Using Business Intelligence Solutions
www.ijcsi.org 102 A Knowledge Management Framework Using Business Intelligence Solutions Marwa Gadu 1 and Prof. Dr. Nashaat El-Khameesy 2 1 Computer and Information Systems Department, Sadat Academy For
A Framework for Personalized Healthcare Service Recommendation
A Framework for Personalized Healthcare Service Recommendation Choon-oh Lee, Minkyu Lee, Dongsoo Han School of Engineering Information and Communications University (ICU) Daejeon, Korea {lcol, niklaus,
Manjeet Kaur Bhullar, Kiranbir Kaur Department of CSE, GNDU, Amritsar, Punjab, India
Volume 5, Issue 6, June 2015 ISSN: 2277 128X International Journal of Advanced Research in Computer Science and Software Engineering Research Paper Available online at: www.ijarcsse.com Multiple Pheromone
Detecting Multiple Selfish Attack Nodes Using Replica Allocation in Cognitive Radio Ad-Hoc Networks
Detecting Multiple Selfish Attack Nodes Using Replica Allocation in Cognitive Radio Ad-Hoc Networks Kiruthiga S PG student, Coimbatore Institute of Engineering and Technology Anna University, Chennai,
A Review of Anomaly Detection Techniques in Network Intrusion Detection System
A Review of Anomaly Detection Techniques in Network Intrusion Detection System Dr.D.V.S.S.Subrahmanyam Professor, Dept. of CSE, Sreyas Institute of Engineering & Technology, Hyderabad, India ABSTRACT:In
Database Marketing, Business Intelligence and Knowledge Discovery
Database Marketing, Business Intelligence and Knowledge Discovery Note: Using material from Tan / Steinbach / Kumar (2005) Introduction to Data Mining,, Addison Wesley; and Cios / Pedrycz / Swiniarski
DMDSS: Data Mining Based Decision Support System to Integrate Data Mining and Decision Support
DMDSS: Data Mining Based Decision Support System to Integrate Data Mining and Decision Support Rok Rupnik, Matjaž Kukar, Marko Bajec, Marjan Krisper University of Ljubljana, Faculty of Computer and Information
