Association Rules Mining: Application to Large-Scale Satisfiability
|
|
|
- Marian Lang
- 10 years ago
- Views:
Transcription
1 Association Rules Mining: Application to Large-Scale Satisfiability Habiba Drias, Youcef Djenouri LRIA, USTHB Abstract Big data mining is a promising technology for industrial applications such as business intelligence and marketing. In this paper, we attempt to transfer this background to a scientific domain, which is problem solving. An enormous research investment has been done over the past years for solving NP-complete problems. The exact methods yield the optimal solution if it exists but they are not able to tackle problems of large instances because they would generate a combinatorial explosion and a timeout calculation. Approximate approaches are mostly stochastic and do not guarantee to find a solution even if one exists. For both alternatives, and in order to reach more closely any optimal solution, the search space should be explored in a judicious manner. If we consider the solutions collection as a data set, data mining techniques may help acquiring some knowledge on the landscape fitness function and then exploiting the achieved information in solving the problem. We are especially interested in locating those regions that are promising in the sense they contain solutions of good quality. We aim especially at developing a two-phase approach to address the satisfiability problem in order to cope with its complexity. In the first step, a preprocessing of clauses is undertaken using association rules mining. The achieved associative clauses classes are then exploited to solve the instance. Extensive experiments are performed on BMC benchmarks and numerical results show the benefit of our proposal. 1. Introduction Frequent pattern mining has become widely used in many areas such as marketing and DNA sequence analysis. It consists in finding regularities that occur frequently in data sets. Association rules are a means to represent frequent patterns under some conditions. Several investigations have been devoted to this research area and a plethora of algorithms have been designed to compute such rules. Most of the proposed approaches consider reasonable data sizes and are confronted now to treat very large data inputs. This paper is a first attempt to explore big data mining for a scientific application such as problem solving. Solving NPcomplete problems and the satisfiability problem especially is one of the trickier issues studied for more than half a century. Tremendous efforts have been spent around the world to shed the light on these complex problems and significant and various results have been achieved. However, investments are still needed for acquiring a better knowledge on these hard problems leading for a better resolution. One possible research direction may consist in exploring in a judicious way the search space especially for large problem instances before launching the resolution process. The idea is to develop two steps for solving the problem, the first one consists in preprocessing the instance by extracting knowledge on the solutions space nature and the second is to solve the problem using the acquired knowledge. Our work considers data mining techniques and especially association rules mining to explore the search space for the satisfiability problem. The aim is to reduce the complexity of the problem by clustering clauses and hence variables and afterwards solving the clusters with a smaller number of variables. From the technical part, association rules mining have been commonly used in relational databases where data are represented in a table. An entry of the table represents a transaction and the columns its attributes. Several algorithms are proposed in the literature to address the issue. Three major methods have dominated associated rules mining: Apriori [2], FPgrowth [8] and Charm [13]. Apriori can be seen as a horizontal search method whereas Charm is another vertical version. Apriori has the disadvantage to be slow as it crosses the database at each iteration of the algorithm. FPgrowth aimed at palliating this issue by using a complex data structure namely the FPtree. For our concern, we focus on exploiting association rules mining for preprocessing clauses of SAT instances. For this aim, the clauses represent the transactions and the variables the attributes. The variables are grouped into clusters that are solved using a SAT solver. Methods other than the classical association rules mining are proposed to cope with the scalability dimension. They are based on hybrid meta-heuristics and more precisely on a bio-inspired approach and a taboo search. The rest of the paper is organized as follows. In section 2, the problem SAT is recalled and the related works are presented in section 3. The proposal and its implementation are exposed and discussed in sections 4, 5 and 6. The conducted experiments and the obtained results are exhibited in section 7. Finally, in the last section, we conclude our study and give some perspectives. 2. The Satisfiability Problem Satisfiability (SAT) arouses more interest from the communities of artificial intelligence and computational complexity. Indeed, all eyes turn to this problem in recent years thinking that solving SAT could make significant progress in these two major disciplines. Of course, these advancements will generate a significant impact in the field of computer science in general. 2.1 DefinitionsTapez une équation ici. Given a set of Boolean variables V = {v1, v2,..., vn} and an assignment function t: V {T, F} that associates a truth value to each variable vi, the general notions that define SAT are: ASE 2014 ISBN:
2 - a literal is a variable that appears with or without the negation operator. - a clause is a disjunction of literals. - an instance of SAT is a conjunction of clauses. - the problem SAT is defined by the following (instance, question) pair Instance: m clauses over n variables { Question: is there an instanciation of variables for which all the clauses are true? A clause is True (T) if and only if it includes at least one positive literal with the value T or one negative literal with the value False (F). Example: Let consider the following set of variables V = {v1, v2, v3, v4} and the following set of clauses C = {C1, C2, C3} defined as: C1 = v1, v2, -v4 C2 = v2, -v3, v4 C3 = -v1, -v2, v3 The - sign denotes the negation operator and, the disjunctive operator. One possible solution is the instanciation {T, T, T, T} respectively assigned to {v1, v2, v3, v4}. 2.2 SAT Solvers There are several approaches for solving SAT and a large number of SAT solvers have been developed since the 60s. They are used in the academic world as well as in the industry to solve complex problems. The first designed solvers are called "complete" because they are able to compute the optimal solution if it exists or prove that there is none. The most known SAT solvers are: The Davis-Putnam Procedure (DPP) DPP [4] is one of the complete algorithm that has been widely used for solving SAT. The idea of the algorithm is the construction of a binary tree using recursion and backtracking. By assigning values to variables and then eliminating these variables from the instance, the size of the search space is considerably reduced relatively to the methods that existed previously. The process repeats selecting a variable v and assigning to it the possible Boolean values and then eliminating all clauses that contain v or v. The process stops when the instance becomes empty. The Davis-Putnam-Logemann-Loveland (DPLL) DPLL [5] is an improved version of DPP w.r.t. the space complexity. Loveland and Logemann introduced the splitting rule, which recursively assigns Boolean values to a variable and solves the resulting sub-problems. When a clause is satisfied, it is suppressed and this operation is called simplification rule. Other more modern and powerful SAT solvers [10] based on DPLL have been developed and are more widely used these days. They are able to solve instances of hundreds of thousands of clauses over thousands of variables. As we know, when the size of the instance to solve is very large, the exact algorithms can confront a combinatorial explosion during their execution and any high degree of performance of the actual machines cannot avoid this problem. So "incomplete" approaches have been designed as an alternative to yield nearoptimal solutions. Their main drawback is that they never guarantee the optimal solutions. There is nowadays a plethora of techniques, including incomplete heuristics and metaheuristics. These methods are called incomplete because they explore only a part of the search space when the latter is prohibitive. Several interesting incomplete SAT solvers have been developed and in some cases, they are even able to outperform the complete solvers. Among them, GSAT [11] is a randomized local search. It starts drawing randomly a valuation for the variables and then making a certain number of flips on variables that reduce the number of unsatisfiable clauses. This process is repeated until getting the optimal solution or reaching a limit on the number of tries. Walksat [12] is an extended version of GSAT. A noise represented by a probabilistic instruction, is introduced in the procedure to realize the random walk move. With a probability p a variable drawn at random is considered and with (1-p) the variable that yields the maximum satisfied clause is selected. On the other hand, several meta-heuristics like Scatter search, Taboo search and Bio-inspired approaches such as GA, ACO and BSO were adapted for SAT. 3. Association Rules Mining Association Rules Mining (ARM) is one of the most important and well studied tasks in Data Mining. It aims at extracting frequent patterns, associations or causal structures among sets of items from a given transactional database. Formally, the association rule problem is defined as follows: let T be a set of n transactions {t1, t2,..., tn} representing a transactional database, and I be a set of m different items or attributes {i1, i2,..., im}, an association rule is an implication of the form X Y where X I, Y I, and X Y =. The itemset X is called antecedent while the itemset Y is called consequent and the rule is read X implies Y. It is interpreted as: whenever the items of X appear in a transaction, the items of Y appear in the same transaction. ARM consists in discovering a set of association rules that cover a large percentage of data transactions. The rules should ASE 2014 ISBN:
3 not be redundant and hence their size should be kept as small as possible. However, since the databases are increasingly large, the user no longer looks for all the rules but only for a subset of useful rules. Two basic parameters are commonly used for measuring the usefulness of association rules, namely the support and the confidence of a rule. The support of an itemset I I is the number of transactions containing I. The support of a rule X Y is the support of X Y and the confidence of a rule is calculated as: support(x Y) support(x) The confidence is a measure of strength of the association rule. An association rule X Y with a confidence of 80% means that 80% of the transactions that contain X also contain Y together. So, association rules mining consists in extracting from a given database, all rules with a support greater or equal to MinSup and a confidence greater or equal to MinConf where MinSup and MinConf are two thresholds predefined by the users [7]. 4. Related Works For our best knowledge, no studies have tackled the problem of association rules for satisfiability, both paradigms have been explored separately. Pattern recognition such as association rules mining have been widely investigated especially these last years. On the other hand, SAT has been studied for more than half a century. The most recent works that are close to the main concern of the paper are described in the following. Many algorithms for generating association rules for databases have been proposed in the literature. Some well known algorithms are AIS [1], Apriori [2], Eclat [13] and FP-Growth [8]. AIS is very space consuming and requires too many passes over the whole database. Apriori is the most used algorithm for association rules mining because of its simplicity. It consists of a breadth first search strategy for counting the supports of itemsets and uses a candidate generation function to exploit the downward closure property of support. It presents however a major drawback, which is the great number of access to the database, which yields at its turn a slow computation response. FP-growth uses an efficient but complex structure called FPtree to compress the database and a divide-and-conquer approach, to decompose the mining tasks and the database as well. In [7] the authors present an interesting survey about different exact and polynomial ARM algorithms. However, because of the fast growth of data, the existing methods have become very quickly inefficient. Indeed, even if these polynomial algorithms can still calculate the association rules in a very short time, they remain limited according to our goal, which is to extract all the rules from large instances of SAT. Motivated by the diversification aspect of the bio-inspired approach namely the Bee Swarm Optimization (BSO) and the intensification strategy of Taboo Search (TS), a new algorithm called HBSO-TS ( for Hybrid Bees Swarm Optimization and Taboo Search) for association rule mining is proposed. It is applied for generating association rules for SAT clauses and described in the next section. We propose an approach that starts exploring judiciously the search space before seeking for solutions and hence reduces the complexity of the problem large instances. 5. Hybrid BSO and TS for ARM In order to achieve scalability for the association rules mining, we designed a new algorithm called HBSO-TS for Hybrid BSO and TS for ARM. We first give an overview of the bio-inspired approach BSO and its corresponding algorithm. Taboo search algorithm is described afterwards. At last the hybrid method is presented. 5.1 BSO Bee algorithms have been well studied and a good survey on the subject can be found in [9]. They are inspired from the foraging behavior of real bees swarm, which is a cooperative process. The bees start leaving the hive in order to search for a food source. When they found a rich source of nectar, they communicate through a dance, its direction and distance to their congeners. This form of communication between the bees is qualified as stygmergic, it allows the bees to share the detected food between themselves. BSO is based on a population of artificial bees cooperating together to solve an instance of an optimization problem. The general algorithm of BSO is illustrated in Figure 1. In line 2, an initial solution generated randomly or via a heuristic is assigned to the variable representing the reference solution. This solution represent somehow the scout bee, which determines the search regions. The block of instructions between line 3 and line 12 translates the whole bees generations. During one iteration, the determination of the search region to assign to the swarm is performed in line 5. It consists in generating from the reference solution a set of solutions according to some strategy. Each solution in the search region is assigned to one bee of the swarm as starting point of its local search, and the result is stored in a table called Dance that the artificial bees use as a means to communicate to their fellows the best solution they found at the end of each iteration. The selection of the reference solution is performed in line 11. It is chosen as the best solution found by the bees in the previous iteration in order to simulate the most rigorous dance performed by a bee. 5.2 Tabu Search The tabu search we use, is described in Figure 2. It consists in exploring a region intensively. Indeed, in case the algorithm determines a solution s that outperforms the best solution already found, the best solution will be updated with s. If, on the other hand, the best solution was not improved during ASE 2014 ISBN:
4 several iterations, the reference solution will be the one that presents the greatest diversification degree, that is, the one that is the most distant from the solutions of the previous iterations that are stored in the taboo list. This allows the algorithm to escape local optima and ensures a good compromise between exploitation and exploration of the solution space. Algorithm 1: BSO 1. Begin 2. Generate an initial solution and assign it to refsol; 3. While not stopping criterion do 4. Determine searchpoints from refsol; 5. Assign a solution to each bee; 6. For each Bee do 7. Perform a local search; 8. Store the result in the table Dance; 9. EndFor; 10. Select the new reference solution RefSol; 11. EndWhile; 12. Return the best solution found; 13. End. Figure 1. Basic BSO Algorithm Algorithm 2: Taboo search 1. Begin 2. S = some initial candidate solution; 3. Best = S; 4. L = {}; a tabu list 5. I = 1; 6. while I < Max-Iter and not stop do 7. Enqueue S into L; 8. S = Best neighbors(s); 9. if Quality(Best) < Quality(S) then Alter Best; 10. EndWhile 11. End Figure 2. Taboo Search Algorithm To adapt BSO and TS to ARM we define the following components: the encoding s solution, the determination of SearchPoints, the fitness function and the neighborhood search. 5.3 Solution Encoding In Association Rule Mining, two famous representations, namely binary encoding and integer encoding can be found in the literature. In binary encoding, a rule is represented by a vector S of n elements where n is the items number. Furthermore, S[i] = 1 if the item i is in the rule and 0 otherwise. In the integer encoding the solution is represented by a vector S of k+1 elements where k is the rule size. The first element is the separator index between antecedent and consequent parts of the solution. For all the other elements i in S, S[i] = j where the item j appears in the ith position of the rule. In HBSO-TS, a solution is a rule and we propose a combination of both previous representations for it. Such structure is more readable and hence facilitates the design of BSO SearchPoints and the neighborhood search. Each solution S is a vector of n elements where: 1) S[i] = 0 if the item i is not in the solution S. 2) S[i] = 1 if the item i belongs to the antecedent part of the solution S. 3) S[i] = 2 if the item i belongs to the consequent part. 5.4 Fitness Function As mentioned above, the ARM problem consists in finding all rules satisfying the constraints of MinSup and MinConf respectively. This is a bi-criteria problem and we can solve it using a multi-objective optimization method. This requires more background and is not the core of our concern. A simple and sufficient solution would be to consider an aggregation technique with two empirical parameters α and β. The fitness function F of the solution S that we propose is expressed as follows: F(S) = α confidence(s) + β support(s) α and β are to be set according to the importance we give to the confidence and support respectively. 5.5 Search Regions Determination Given a reference solution Sref and a colony of k bees, SearchPoints is determined in order to assign a point of the search space to a bee. It is built by changing successively in the solution Sref the values of entries k+i flip by random ones drawn from (0,1,2) as defined in subsection 5.3. In order to get several points, we vary i from 0 to n 1 and flip is a given parameter. If the distance between solutions is the number of different entries, then the distance between the bees and the n solution reference is equal to. The generated points are flip designed in such a way they are dispersed in the search space in order to browse the latter in an efficient way. 5.6 Neighborhood Search The neighborhood search concerns BSO for the local search as well as taboo search. A neighbor is computed by changing randomly the value of one element of the current solution. In BSO, the first neighbor that is of better quality than the current solution is considered as the result of the local search. However for the taboo search, all neighbors are generated and evaluated. If the best neighbor does not belong to the taboo list then it will become the new solution for the next iteration. This process is ASE 2014 ISBN:
5 repeated until the maximum number of taboo search iterations is reached. 5.7 HBSO-TS algorithm HBSO-TS starts calculating Sref, then it computes k regions to assign to the bees respectively. Then, each bee explores its region using Taboo Search instead of a simple local search and the result is put in Dance table. At this time, the communication between bees takes place in order to find the best solution simulating the most vigorous dance. The latter becomes Sref for the next pass. This process is repeated until a maximum number of iterations is reached. 6. SAT Clauses Preprocessing HBSO-TS is applied for mining association rules for an instance of SAT. We consider a clause as a transaction and a variable belonging to a clause as its attribute. 6.1 Associative Clusters of Clauses From the generated association rules, clusters of variables are built. All the clauses appearing in a rule constitute a cluster. The clusters are then formed by the variables belonging to its clauses. Example: Let consider the following clauses: c180 = v10712, v2600, v2510, v9320, v3503, v6178 c551 = v3005, v1718, v9789 c125 = v3572, v4740, v5002, v10585 and the following computed association rule: C180, c551 c125 then the variables belonging to these clauses form the following cluster: (v10712, v2600, v2510, v9320, v3503, v6178, v3005, v1718, v9789, v3572, v4740, v5002, v10585) 6.2 Overall Solution The resulting clusters are considered as new SAT sub-problems with a smaller number of clauses and variables and hence less difficult to solve. Each cluster is treated separately and the set of clusters may also be solved in parallel. The variables take the value of the solution of the cluster in which they appear. For the resolution of the sub-problems, we used the minisat solver [10]. The overall algorithm is called HBSO-TS-SAT. This approach would be effective if the clusters were totally disjoints. Nevertheless, this is not the case in general and it happens that several variables are shared by clusters. All the difficulty is then to handle the intersections between clusters. We cope with this issue by adopting a simple strategy, which consists in yielding disjoints clusters by assigning the common variables to only one cluster. Of course, in order to get more convincing outcomes, we should investigate for more efficient strategy for treating the shared variables. 7. Experimental Results The experiments were conducted on all BMC (Bounded Model Checking) benchmarks [3] presented in Table 1. These instances are hard to solve and used by the scientific community. Extensive experiments were undertaken in order to test the gain brought by the integration of association rules techniques in solving SAT. Our results were compared to those provided by the SAT solver called minisat. The results concerning the satisfaction rate are rather almost as good as those of minisat though the simple way we handle the common variables between clusters. Figure 3 exhibits the results concerning the satisfiability rate of the 13 instances of the tested benchmark. In order to yield better performance, the strategy of solving the shared clauses by clusters should be designed in a more subtle way. Our efforts focus now on this issue. In terms of computational time, the outcomes are much more interesting. Table 2 shows the gap existing between our approach and the classical version of minisat expressed by the speedup of the last column. Table 1. BMC benchmarks Benchmarks characteristics benchmark Number of Number of variables clauses ibm ibm ibm ibm ibm ibm ibm galileo galileo ibm ibm ibm ibm ASE 2014 ISBN:
6 bmc1 bmc2 bmc3 BMc4 BMc5 BMC6 BMC7 BMC8 BMc9 BMC10 BMC11 BMC12 BMC ASE BIGDATA/SOCIALCOM/CYBERSECURITY Conference, Stanford University, May 27-31, satisfiability rate Satisfaction rate of classical MiniSat Satisfaction rate of the proposed approach results show the impact of clauses clustering gave the best results in terms of processing time. The computation time is reduced drastically because the clusters to solve are of smaller size. The performance of the solution quality is interesting but needs to be improved and this is what we are working on right now. Also, for the near future, we are interested in exploring several perspectives, among them: - We intent to deepen our study by handling more experiments on other public benchmarks. - We are using GPU computing for parallelizing the main algorithm to achieve more enhancements in terms of computation time. - Another idea is to compute the correlation between the variables in order to reduce their number and integrate this process in the pretreatment phase. Figure 3. Satisfiability rate for the new approach Table 2. Running time for SAT Instance Number of rules CPU Time Our approach MiniSat Speed up bmc ,049 0,065 24,62 bmc ,0009 0,002 55,00 bmc ,21 0,25 16,00 bmc ,195 0,212 8,02 bmc ,385 0,36-6,94 bmc ,18 0,25 28,00 bmc ,69 0,75 8,00 bmc ,12 0,17 29,41 bmc ,32 0,26-23,08 bmc ,65 1,25-32,00 bmc ,3 0,45 33,33 bmc ,65 4,52 19,25 bmc ,2 4,44 27,93 8. Conclusion In this paper, we proposed an approach for solving problems by clustering the data using an association rules algorithm before launching the resolution method. The satisfiability problem was a case study and algorithms were developed for the problem to illustrate the idea. A new algorithm based on the hybridization of BSO and Taboo search was first designed to address the largescale association rules mining. Then a novel two-phase approach was designed to solve SAT. The clauses were first grouped using the ARM algorithm and then the clusters were solved separately using the minisat solver. A simple strategy was implemented to handle the shared variables between the clusters. Experimental tests were performed on a collection of 13 BMC benchmarks to test satisfiability with these novel proposals. The References [1] R. Agrawal, T. Imielinski, and A. Swami. Mining association rules between sets of items in large databases. In Proc ACM-SIGMOD Int. Conf. Management of Data (SIGMOD 93), pages ,Washington, DC, May [2] R. Agrawal, S. Ramakrishan. Fast algorithms for association rules in large databases. Proc of the 20th International Conference on very large Data bases - VLDB), Santiago, Chile PP , Sept [3] A. Biere, A. Cimatti, E.M. Clarke, Y. Zhu. Symbolic Model checking without BDDs. in proc of TACAS 99, LNCS 1579, Springer, , [4] M. Davis, H. Putnam. A computing procedure for quantification theory. CACM, 7, , [5] M. Davis, G. Logemann, D. Loveland. A Machine Program for Theorem Proving. Communications of the ACM 5 (7), ,1962. [6] C. P. Gomes, H. Kautz, A. Sabharwal, B. Selman. Satisfiability Solvers. in Handbook of Knowledge Representation edited by F. van Harmelen, V. Lifschitz and B. Porter, Elsevier, , [7] J. Han et al. Data mining Concepts and Techniques, Morgan Kauffman Series, [8] J. Han, J. Pei, Y. Yin, R. Mao. Mining frequent patterns without candidate generation: A frequent-pattern tree approach. Data mining and knowledge discovery, 8(1), 53-87, [9] D. Karaboga, B. Akay. A survey: algorithms simulating bee swarm intelligence. Artificial Intelligence Review, Vol 31, No. 1-4, pp 61-85, [10] SAT competition, [11] B. Selman, H.J. Levesque, D.G. Mitchell. A new method for solving hard satisfiability problems. In 10th AAAI, San Jose, CA, , [12] B. Selman, H. Kautz, B. Cohen. Local search strategies for satisfiability testing. DIMACS Series in Discrete Mathematics and Theoretical Computer Science, volume 26, American Mathematical Society, , [13] M. J. Zaki. Scalable algorithms for association mining. Knowledge and Data Engineering, IEEE Transactions on, 12(3), , ASE 2014 ISBN:
Finding Frequent Patterns Based On Quantitative Binary Attributes Using FP-Growth Algorithm
R. Sridevi et al Int. Journal of Engineering Research and Applications RESEARCH ARTICLE OPEN ACCESS Finding Frequent Patterns Based On Quantitative Binary Attributes Using FP-Growth Algorithm R. Sridevi,*
Static Data Mining Algorithm with Progressive Approach for Mining Knowledge
Global Journal of Business Management and Information Technology. Volume 1, Number 2 (2011), pp. 85-93 Research India Publications http://www.ripublication.com Static Data Mining Algorithm with Progressive
Mining Association Rules: A Database Perspective
IJCSNS International Journal of Computer Science and Network Security, VOL.8 No.12, December 2008 69 Mining Association Rules: A Database Perspective Dr. Abdallah Alashqur Faculty of Information Technology
npsolver A SAT Based Solver for Optimization Problems
npsolver A SAT Based Solver for Optimization Problems Norbert Manthey and Peter Steinke Knowledge Representation and Reasoning Group Technische Universität Dresden, 01062 Dresden, Germany [email protected]
Performance Test of Local Search Algorithms Using New Types of
Performance Test of Local Search Algorithms Using New Types of Random CNF Formulas Byungki Cha and Kazuo Iwama Department of Computer Science and Communication Engineering Kyushu University, Fukuoka 812,
An Overview of Knowledge Discovery Database and Data mining Techniques
An Overview of Knowledge Discovery Database and Data mining Techniques Priyadharsini.C 1, Dr. Antony Selvadoss Thanamani 2 M.Phil, Department of Computer Science, NGM College, Pollachi, Coimbatore, Tamilnadu,
Research Statement Immanuel Trummer www.itrummer.org
Research Statement Immanuel Trummer www.itrummer.org We are collecting data at unprecedented rates. This data contains valuable insights, but we need complex analytics to extract them. My research focuses
Selection of Optimal Discount of Retail Assortments with Data Mining Approach
Available online at www.interscience.in Selection of Optimal Discount of Retail Assortments with Data Mining Approach Padmalatha Eddla, Ravinder Reddy, Mamatha Computer Science Department,CBIT, Gandipet,Hyderabad,A.P,India.
College information system research based on data mining
2009 International Conference on Machine Learning and Computing IPCSIT vol.3 (2011) (2011) IACSIT Press, Singapore College information system research based on data mining An-yi Lan 1, Jie Li 2 1 Hebei
CNFSAT: Predictive Models, Dimensional Reduction, and Phase Transition
CNFSAT: Predictive Models, Dimensional Reduction, and Phase Transition Neil P. Slagle College of Computing Georgia Institute of Technology Atlanta, GA [email protected] Abstract CNFSAT embodies the P
Building A Smart Academic Advising System Using Association Rule Mining
Building A Smart Academic Advising System Using Association Rule Mining Raed Shatnawi +962795285056 [email protected] Qutaibah Althebyan +962796536277 [email protected] Baraq Ghalib & Mohammed
MAXIMAL FREQUENT ITEMSET GENERATION USING SEGMENTATION APPROACH
MAXIMAL FREQUENT ITEMSET GENERATION USING SEGMENTATION APPROACH M.Rajalakshmi 1, Dr.T.Purusothaman 2, Dr.R.Nedunchezhian 3 1 Assistant Professor (SG), Coimbatore Institute of Technology, India, [email protected]
DEVELOPMENT OF HASH TABLE BASED WEB-READY DATA MINING ENGINE
DEVELOPMENT OF HASH TABLE BASED WEB-READY DATA MINING ENGINE SK MD OBAIDULLAH Department of Computer Science & Engineering, Aliah University, Saltlake, Sector-V, Kol-900091, West Bengal, India [email protected]
Integrating Benders decomposition within Constraint Programming
Integrating Benders decomposition within Constraint Programming Hadrien Cambazard, Narendra Jussien email: {hcambaza,jussien}@emn.fr École des Mines de Nantes, LINA CNRS FRE 2729 4 rue Alfred Kastler BP
Binary Coded Web Access Pattern Tree in Education Domain
Binary Coded Web Access Pattern Tree in Education Domain C. Gomathi P.G. Department of Computer Science Kongu Arts and Science College Erode-638-107, Tamil Nadu, India E-mail: [email protected] M. Moorthi
Sudoku as a SAT Problem
Sudoku as a SAT Problem Inês Lynce IST/INESC-ID, Technical University of Lisbon, Portugal [email protected] Joël Ouaknine Oxford University Computing Laboratory, UK [email protected] Abstract Sudoku
Why? A central concept in Computer Science. Algorithms are ubiquitous.
Analysis of Algorithms: A Brief Introduction Why? A central concept in Computer Science. Algorithms are ubiquitous. Using the Internet (sending email, transferring files, use of search engines, online
An Efficient Frequent Item Mining using Various Hybrid Data Mining Techniques in Super Market Dataset
An Efficient Frequent Item Mining using Various Hybrid Data Mining Techniques in Super Market Dataset P.Abinaya 1, Dr. (Mrs) D.Suganyadevi 2 M.Phil. Scholar 1, Department of Computer Science,STC,Pollachi
MINING THE DATA FROM DISTRIBUTED DATABASE USING AN IMPROVED MINING ALGORITHM
MINING THE DATA FROM DISTRIBUTED DATABASE USING AN IMPROVED MINING ALGORITHM J. Arokia Renjit Asst. Professor/ CSE Department, Jeppiaar Engineering College, Chennai, TamilNadu,India 600119. Dr.K.L.Shunmuganathan
Integrating Pattern Mining in Relational Databases
Integrating Pattern Mining in Relational Databases Toon Calders, Bart Goethals, and Adriana Prado University of Antwerp, Belgium {toon.calders, bart.goethals, adriana.prado}@ua.ac.be Abstract. Almost a
Association Rule Mining
Association Rule Mining Association Rules and Frequent Patterns Frequent Pattern Mining Algorithms Apriori FP-growth Correlation Analysis Constraint-based Mining Using Frequent Patterns for Classification
Comparision of k-means and k-medoids Clustering Algorithms for Big Data Using MapReduce Techniques
Comparision of k-means and k-medoids Clustering Algorithms for Big Data Using MapReduce Techniques Subhashree K 1, Prakash P S 2 1 Student, Kongu Engineering College, Perundurai, Erode 2 Assistant Professor,
A Survey on Association Rule Mining in Market Basket Analysis
International Journal of Information and Computation Technology. ISSN 0974-2239 Volume 4, Number 4 (2014), pp. 409-414 International Research Publications House http://www. irphouse.com /ijict.htm A Survey
International Journal of Advance Research in Computer Science and Management Studies
Volume 2, Issue 12, December 2014 ISSN: 2321 7782 (Online) International Journal of Advance Research in Computer Science and Management Studies Research Article / Survey Paper / Case Study Available online
Horizontal Aggregations in SQL to Prepare Data Sets for Data Mining Analysis
IOSR Journal of Computer Engineering (IOSRJCE) ISSN: 2278-0661, ISBN: 2278-8727 Volume 6, Issue 5 (Nov. - Dec. 2012), PP 36-41 Horizontal Aggregations in SQL to Prepare Data Sets for Data Mining Analysis
InvGen: An Efficient Invariant Generator
InvGen: An Efficient Invariant Generator Ashutosh Gupta and Andrey Rybalchenko Max Planck Institute for Software Systems (MPI-SWS) Abstract. In this paper we present InvGen, an automatic linear arithmetic
SPMF: a Java Open-Source Pattern Mining Library
Journal of Machine Learning Research 1 (2014) 1-5 Submitted 4/12; Published 10/14 SPMF: a Java Open-Source Pattern Mining Library Philippe Fournier-Viger [email protected] Department
International Journal of World Research, Vol: I Issue XIII, December 2008, Print ISSN: 2347-937X DATA MINING TECHNIQUES AND STOCK MARKET
DATA MINING TECHNIQUES AND STOCK MARKET Mr. Rahul Thakkar, Lecturer and HOD, Naran Lala College of Professional & Applied Sciences, Navsari ABSTRACT Without trading in a stock market we can t understand
How To Use Neural Networks In Data Mining
International Journal of Electronics and Computer Science Engineering 1449 Available Online at www.ijecse.org ISSN- 2277-1956 Neural Networks in Data Mining Priyanka Gaur Department of Information and
Data Mining Analytics for Business Intelligence and Decision Support
Data Mining Analytics for Business Intelligence and Decision Support Chid Apte, T.J. Watson Research Center, IBM Research Division Knowledge Discovery and Data Mining (KDD) techniques are used for analyzing
EM Clustering Approach for Multi-Dimensional Analysis of Big Data Set
EM Clustering Approach for Multi-Dimensional Analysis of Big Data Set Amhmed A. Bhih School of Electrical and Electronic Engineering Princy Johnson School of Electrical and Electronic Engineering Martin
Social Media Mining. Data Mining Essentials
Introduction Data production rate has been increased dramatically (Big Data) and we are able store much more data than before E.g., purchase data, social media data, mobile phone data Businesses and customers
An application for clickstream analysis
An application for clickstream analysis C. E. Dinucă Abstract In the Internet age there are stored enormous amounts of data daily. Nowadays, using data mining techniques to extract knowledge from web log
ASSOCIATION RULE MINING ON WEB LOGS FOR EXTRACTING INTERESTING PATTERNS THROUGH WEKA TOOL
International Journal Of Advanced Technology In Engineering And Science Www.Ijates.Com Volume No 03, Special Issue No. 01, February 2015 ISSN (Online): 2348 7550 ASSOCIATION RULE MINING ON WEB LOGS FOR
Laboratory Module 8 Mining Frequent Itemsets Apriori Algorithm
Laboratory Module 8 Mining Frequent Itemsets Apriori Algorithm Purpose: key concepts in mining frequent itemsets understand the Apriori algorithm run Apriori in Weka GUI and in programatic way 1 Theoretical
Ant Colony Optimization and Constraint Programming
Ant Colony Optimization and Constraint Programming Christine Solnon Series Editor Narendra Jussien WILEY Table of Contents Foreword Acknowledgements xi xiii Chapter 1. Introduction 1 1.1. Overview of the
PREDICTIVE MODELING OF INTER-TRANSACTION ASSOCIATION RULES A BUSINESS PERSPECTIVE
International Journal of Computer Science and Applications, Vol. 5, No. 4, pp 57-69, 2008 Technomathematics Research Foundation PREDICTIVE MODELING OF INTER-TRANSACTION ASSOCIATION RULES A BUSINESS PERSPECTIVE
NEW TECHNIQUE TO DEAL WITH DYNAMIC DATA MINING IN THE DATABASE
www.arpapress.com/volumes/vol13issue3/ijrras_13_3_18.pdf NEW TECHNIQUE TO DEAL WITH DYNAMIC DATA MINING IN THE DATABASE Hebah H. O. Nasereddin Middle East University, P.O. Box: 144378, Code 11814, Amman-Jordan
Comparison of Data Mining Techniques for Money Laundering Detection System
Comparison of Data Mining Techniques for Money Laundering Detection System Rafał Dreżewski, Grzegorz Dziuban, Łukasz Hernik, Michał Pączek AGH University of Science and Technology, Department of Computer
A Review of Data Mining Techniques
Available Online at www.ijcsmc.com International Journal of Computer Science and Mobile Computing A Monthly Journal of Computer Science and Information Technology IJCSMC, Vol. 3, Issue. 4, April 2014,
RDB-MINER: A SQL-Based Algorithm for Mining True Relational Databases
998 JOURNAL OF SOFTWARE, VOL. 5, NO. 9, SEPTEMBER 2010 RDB-MINER: A SQL-Based Algorithm for Mining True Relational Databases Abdallah Alashqur Faculty of Information Technology Applied Science University
Fuzzy Logic -based Pre-processing for Fuzzy Association Rule Mining
Fuzzy Logic -based Pre-processing for Fuzzy Association Rule Mining by Ashish Mangalampalli, Vikram Pudi Report No: IIIT/TR/2008/127 Centre for Data Engineering International Institute of Information Technology
A Way to Understand Various Patterns of Data Mining Techniques for Selected Domains
A Way to Understand Various Patterns of Data Mining Techniques for Selected Domains Dr. Kanak Saxena Professor & Head, Computer Application SATI, Vidisha, [email protected] D.S. Rajpoot Registrar,
Random vs. Structure-Based Testing of Answer-Set Programs: An Experimental Comparison
Random vs. Structure-Based Testing of Answer-Set Programs: An Experimental Comparison Tomi Janhunen 1, Ilkka Niemelä 1, Johannes Oetsch 2, Jörg Pührer 2, and Hans Tompits 2 1 Aalto University, Department
Web Mining using Artificial Ant Colonies : A Survey
Web Mining using Artificial Ant Colonies : A Survey Richa Gupta Department of Computer Science University of Delhi ABSTRACT : Web mining has been very crucial to any organization as it provides useful
Private Record Linkage with Bloom Filters
To appear in: Proceedings of Statistics Canada Symposium 2010 Social Statistics: The Interplay among Censuses, Surveys and Administrative Data Private Record Linkage with Bloom Filters Rainer Schnell,
Data Mining and Knowledge Discovery in Databases (KDD) State of the Art. Prof. Dr. T. Nouri Computer Science Department FHNW Switzerland
Data Mining and Knowledge Discovery in Databases (KDD) State of the Art Prof. Dr. T. Nouri Computer Science Department FHNW Switzerland 1 Conference overview 1. Overview of KDD and data mining 2. Data
ONTOLOGY IN ASSOCIATION RULES PRE-PROCESSING AND POST-PROCESSING
ONTOLOGY IN ASSOCIATION RULES PRE-PROCESSING AND POST-PROCESSING Inhauma Neves Ferraz, Ana Cristina Bicharra Garcia Computer Science Department Universidade Federal Fluminense Rua Passo da Pátria 156 Bloco
XOR-based artificial bee colony algorithm for binary optimization
Turkish Journal of Electrical Engineering & Computer Sciences http:// journals. tubitak. gov. tr/ elektrik/ Research Article Turk J Elec Eng & Comp Sci (2013) 21: 2307 2328 c TÜBİTAK doi:10.3906/elk-1203-104
On the Relationship between Classes P and NP
Journal of Computer Science 8 (7): 1036-1040, 2012 ISSN 1549-3636 2012 Science Publications On the Relationship between Classes P and NP Anatoly D. Plotnikov Department of Computer Systems and Networks,
University of Potsdam Faculty of Computer Science. Clause Learning in SAT Seminar Automatic Problem Solving WS 2005/06
University of Potsdam Faculty of Computer Science Clause Learning in SAT Seminar Automatic Problem Solving WS 2005/06 Authors: Richard Tichy, Thomas Glase Date: 25th April 2006 Contents 1 Introduction
Generating models of a matched formula with a polynomial delay
Generating models of a matched formula with a polynomial delay Petr Savicky Institute of Computer Science, Academy of Sciences of Czech Republic, Pod Vodárenskou Věží 2, 182 07 Praha 8, Czech Republic
Discuss the size of the instance for the minimum spanning tree problem.
3.1 Algorithm complexity The algorithms A, B are given. The former has complexity O(n 2 ), the latter O(2 n ), where n is the size of the instance. Let n A 0 be the size of the largest instance that can
Finding Liveness Errors with ACO
Hong Kong, June 1-6, 2008 1 / 24 Finding Liveness Errors with ACO Francisco Chicano and Enrique Alba Motivation Motivation Nowadays software is very complex An error in a software system can imply the
An innovative application of a constrained-syntax genetic programming system to the problem of predicting survival of patients
An innovative application of a constrained-syntax genetic programming system to the problem of predicting survival of patients Celia C. Bojarczuk 1, Heitor S. Lopes 2 and Alex A. Freitas 3 1 Departamento
Mining an Online Auctions Data Warehouse
Proceedings of MASPLAS'02 The Mid-Atlantic Student Workshop on Programming Languages and Systems Pace University, April 19, 2002 Mining an Online Auctions Data Warehouse David Ulmer Under the guidance
Big Data with Rough Set Using Map- Reduce
Big Data with Rough Set Using Map- Reduce Mr.G.Lenin 1, Mr. A. Raj Ganesh 2, Mr. S. Vanarasan 3 Assistant Professor, Department of CSE, Podhigai College of Engineering & Technology, Tirupattur, Tamilnadu,
Experiments in Web Page Classification for Semantic Web
Experiments in Web Page Classification for Semantic Web Asad Satti, Nick Cercone, Vlado Kešelj Faculty of Computer Science, Dalhousie University E-mail: {rashid,nick,vlado}@cs.dal.ca Abstract We address
Compact Representations and Approximations for Compuation in Games
Compact Representations and Approximations for Compuation in Games Kevin Swersky April 23, 2008 Abstract Compact representations have recently been developed as a way of both encoding the strategic interactions
Comparative Study in Building of Associations Rules from Commercial Transactions through Data Mining Techniques
Third International Conference Modelling and Development of Intelligent Systems October 10-12, 2013 Lucian Blaga University Sibiu - Romania Comparative Study in Building of Associations Rules from Commercial
Implementation of hybrid software architecture for Artificial Intelligence System
IJCSNS International Journal of Computer Science and Network Security, VOL.7 No.1, January 2007 35 Implementation of hybrid software architecture for Artificial Intelligence System B.Vinayagasundaram and
II. OLAP(ONLINE ANALYTICAL PROCESSING)
Association Rule Mining Method On OLAP Cube Jigna J. Jadav*, Mahesh Panchal** *( PG-CSE Student, Department of Computer Engineering, Kalol Institute of Technology & Research Centre, Gujarat, India) **
TECHNIQUES FOR OPTIMIZING THE RELATIONSHIP BETWEEN DATA STORAGE SPACE AND DATA RETRIEVAL TIME FOR LARGE DATABASES
Techniques For Optimizing The Relationship Between Data Storage Space And Data Retrieval Time For Large Databases TECHNIQUES FOR OPTIMIZING THE RELATIONSHIP BETWEEN DATA STORAGE SPACE AND DATA RETRIEVAL
The Scientific Data Mining Process
Chapter 4 The Scientific Data Mining Process When I use a word, Humpty Dumpty said, in rather a scornful tone, it means just what I choose it to mean neither more nor less. Lewis Carroll [87, p. 214] In
Mining for Web Engineering
Mining for Engineering A. Venkata Krishna Prasad 1, Prof. S.Ramakrishna 2 1 Associate Professor, Department of Computer Science, MIPGS, Hyderabad 2 Professor, Department of Computer Science, Sri Venkateswara
Chapter 6: Episode discovery process
Chapter 6: Episode discovery process Algorithmic Methods of Data Mining, Fall 2005, Chapter 6: Episode discovery process 1 6. Episode discovery process The knowledge discovery process KDD process of analyzing
A Serial Partitioning Approach to Scaling Graph-Based Knowledge Discovery
A Serial Partitioning Approach to Scaling Graph-Based Knowledge Discovery Runu Rathi, Diane J. Cook, Lawrence B. Holder Department of Computer Science and Engineering The University of Texas at Arlington
OHJ-2306 Introduction to Theoretical Computer Science, Fall 2012 8.11.2012
276 The P vs. NP problem is a major unsolved problem in computer science It is one of the seven Millennium Prize Problems selected by the Clay Mathematics Institute to carry a $ 1,000,000 prize for the
Data Mining and Database Systems: Where is the Intersection?
Data Mining and Database Systems: Where is the Intersection? Surajit Chaudhuri Microsoft Research Email: [email protected] 1 Introduction The promise of decision support systems is to exploit enterprise
Bounded-width QBF is PSPACE-complete
Bounded-width QBF is PSPACE-complete Albert Atserias 1 and Sergi Oliva 2 1 Universitat Politècnica de Catalunya Barcelona, Spain [email protected] 2 Universitat Politècnica de Catalunya Barcelona, Spain
Breaking Generalized Diffie-Hellman Modulo a Composite is no Easier than Factoring
Breaking Generalized Diffie-Hellman Modulo a Composite is no Easier than Factoring Eli Biham Dan Boneh Omer Reingold Abstract The Diffie-Hellman key-exchange protocol may naturally be extended to k > 2
WEBLOG RECOMMENDATION USING ASSOCIATION RULES
IADIS International Conference on Web Based Communities 2006 WEBLOG RECOMMENDATION USING ASSOCIATION RULES Juan Julián Merelo Guervós, Pedro A. Castillo, Beatriz Prieto Campos Departamento de Arquitectura
Web Mining Patterns Discovery and Analysis Using Custom-Built Apriori Algorithm
International Journal of Engineering Inventions e-issn: 2278-7461, p-issn: 2319-6491 Volume 2, Issue 5 (March 2013) PP: 16-21 Web Mining Patterns Discovery and Analysis Using Custom-Built Apriori Algorithm
HYBRID ACO-IWD OPTIMIZATION ALGORITHM FOR MINIMIZING WEIGHTED FLOWTIME IN CLOUD-BASED PARAMETER SWEEP EXPERIMENTS
HYBRID ACO-IWD OPTIMIZATION ALGORITHM FOR MINIMIZING WEIGHTED FLOWTIME IN CLOUD-BASED PARAMETER SWEEP EXPERIMENTS R. Angel Preethima 1, Margret Johnson 2 1 Student, Computer Science and Engineering, Karunya
Disjunction of Non-Binary and Numeric Constraint Satisfaction Problems
Disjunction of Non-Binary and Numeric Constraint Satisfaction Problems Miguel A. Salido, Federico Barber Departamento de Sistemas Informáticos y Computación, Universidad Politécnica de Valencia Camino
MPBO A Distributed Pseudo-Boolean Optimization Solver
MPBO A Distributed Pseudo-Boolean Optimization Solver Nuno Miguel Coelho Santos Thesis to obtain the Master of Science Degree in Information Systems and Computer Engineering Examination Committee Chairperson:
EFFECTIVE USE OF THE KDD PROCESS AND DATA MINING FOR COMPUTER PERFORMANCE PROFESSIONALS
EFFECTIVE USE OF THE KDD PROCESS AND DATA MINING FOR COMPUTER PERFORMANCE PROFESSIONALS Susan P. Imberman Ph.D. College of Staten Island, City University of New York [email protected] Abstract
Branch-and-Price Approach to the Vehicle Routing Problem with Time Windows
TECHNISCHE UNIVERSITEIT EINDHOVEN Branch-and-Price Approach to the Vehicle Routing Problem with Time Windows Lloyd A. Fasting May 2014 Supervisors: dr. M. Firat dr.ir. M.A.A. Boon J. van Twist MSc. Contents
A NEW DECISION TREE METHOD FOR DATA MINING IN MEDICINE
A NEW DECISION TREE METHOD FOR DATA MINING IN MEDICINE Kasra Madadipouya 1 1 Department of Computing and Science, Asia Pacific University of Technology & Innovation ABSTRACT Today, enormous amount of data
INTERNATIONAL JOURNAL OF COMPUTER ENGINEERING & TECHNOLOGY (IJCET)
INTERNATIONAL JOURNAL OF COMPUTER ENGINEERING & TECHNOLOGY (IJCET) International Journal of Computer Engineering and Technology (IJCET), ISSN 0976 6367(Print), ISSN 0976 6367(Print) ISSN 0976 6375(Online)
Effect of Using Neural Networks in GA-Based School Timetabling
Effect of Using Neural Networks in GA-Based School Timetabling JANIS ZUTERS Department of Computer Science University of Latvia Raina bulv. 19, Riga, LV-1050 LATVIA [email protected] Abstract: - The school
131-1. Adding New Level in KDD to Make the Web Usage Mining More Efficient. Abstract. 1. Introduction [1]. 1/10
1/10 131-1 Adding New Level in KDD to Make the Web Usage Mining More Efficient Mohammad Ala a AL_Hamami PHD Student, Lecturer m_ah_1@yahoocom Soukaena Hassan Hashem PHD Student, Lecturer soukaena_hassan@yahoocom
New Matrix Approach to Improve Apriori Algorithm
New Matrix Approach to Improve Apriori Algorithm A. Rehab H. Alwa, B. Anasuya V Patil Associate Prof., IT Faculty, Majan College-University College Muscat, Oman, [email protected] Associate
A Note on Maximum Independent Sets in Rectangle Intersection Graphs
A Note on Maximum Independent Sets in Rectangle Intersection Graphs Timothy M. Chan School of Computer Science University of Waterloo Waterloo, Ontario N2L 3G1, Canada [email protected] September 12,
Protein Protein Interaction Networks
Functional Pattern Mining from Genome Scale Protein Protein Interaction Networks Young-Rae Cho, Ph.D. Assistant Professor Department of Computer Science Baylor University it My Definition of Bioinformatics
International journal of Engineering Research-Online A Peer Reviewed International Journal Articles available online http://www.ijoer.
RESEARCH ARTICLE ISSN: 2321-7758 GLOBAL LOAD DISTRIBUTION USING SKIP GRAPH, BATON AND CHORD J.K.JEEVITHA, B.KARTHIKA* Information Technology,PSNA College of Engineering & Technology, Dindigul, India Article
Optimizing Description Logic Subsumption
Topics in Knowledge Representation and Reasoning Optimizing Description Logic Subsumption Maryam Fazel-Zarandi Company Department of Computer Science University of Toronto Outline Introduction Optimization
Towards a Benchmark Suite for Modelica Compilers: Large Models
Towards a Benchmark Suite for Modelica Compilers: Large Models Jens Frenkel +, Christian Schubert +, Günter Kunze +, Peter Fritzson *, Martin Sjölund *, Adrian Pop* + Dresden University of Technology,
Data Mining in the Application of Criminal Cases Based on Decision Tree
8 Journal of Computer Science and Information Technology, Vol. 1 No. 2, December 2013 Data Mining in the Application of Criminal Cases Based on Decision Tree Ruijuan Hu 1 Abstract A briefing on data mining
A General Approach to Incorporate Data Quality Matrices into Data Mining Algorithms
A General Approach to Incorporate Data Quality Matrices into Data Mining Algorithms Ian Davidson 1st author's affiliation 1st line of address 2nd line of address Telephone number, incl country code 1st
How To Find Influence Between Two Concepts In A Network
2014 UKSim-AMSS 16th International Conference on Computer Modelling and Simulation Influence Discovery in Semantic Networks: An Initial Approach Marcello Trovati and Ovidiu Bagdasar School of Computing
