Chapter 7: Association Rule Mining in Learning Management Systems

Transcription

1 1 Chapter 7: Association Rule Mining in Learning Management Systems Enrique García 1, Cristóbal Romero 1, Sebastián Ventura 1, Carlos de Castro 1, Toon Calders 2 1 Department of Computer Science and Numerical Analysis, University of Cordoba, Spain 2 Eindhoven University of Technology (TU/e), PO Box 513, Eindhoven, The Netherlands Introduction Learning Management Systems (LMSs) can offer a great variety of channels and workspaces to facilitate information sharing and communication among participants in a course. They let educators distribute information to students, produce content material, prepare assignments and tests, engage in discussions, manage distance classes and enable collaborative learning with forums, chats, file storage areas, news services, etc. Some examples of commercial systems are Blackboard [1], WebCT [2] and Top-Class [3] while some examples of free systems are Moodle [4], Ilias [5] and Claroline [6]. One of the most commonly used is Moodle (modular object oriented developmental learning environment), a free learning management system enabling the creation of powerful, flexible and engaging online courses and experiences (Rice, 2006). These e-learning systems accumulate a vast amount of information which is very valuable for analyzing students behaviour and could create a gold mine of educational data [7]. They can record any student activities involved, such as reading, writing, taking tests, performing various tasks, and even communicating with peers. They normally also provide a database that stores all the system s information: personal information about the users (profile), academic results and users interaction data. However, due to the vast quantities of data these systems can generate daily, it is very difficult to manage data analysis manually. Instructors and

2 2 course authors demand tools to assist them in this task, preferably on a continual basis. Although some platforms offer some reporting tools, it becomes hard for a tutor to extract useful information when there are a great number of students [8]. The current LMSs do not provide specific tools allowing educators to thoroughly track and assess all learners activities while evaluating the structure and contents of the course and its effectiveness for the learning process [9]. A very promising area for attaining this objective is the use of data mining. Data mining or knowledge discovery in databases (KDD) is the automatic extraction of implicit and interesting patterns from large data collections. Next to statistics and data visualization, there are many data mining techniques for analyzing the data. Some of the most useful data mining tasks and methods are clustering, classification and association rule mining. These methods uncover new, interesting and useful knowledge based on users usage data. In the last few years, researchers have begun to apply data mining methods to help instructors and administrators to improve e- learning systems [10]. Association rule mining has been applied to e-learning systems for traditionally association analysis (finding correlations between items in a dataset), including, e.g., the following tasks: building recommender agents for on-line learning activities or shortcuts [11], automatically guiding the learner s activities and intelligently generate and recommend learning materials [12], identifying attributes characterizing patterns of performance disparity between various groups of students [13], discovering interesting relationships from a student s usage information in order to provide feedback to the course author [14], finding out the relationships between each pattern of a learner s behavior [15], finding student mistakes often occurring together [16], guiding the search for the best fitting transfer model of student learning [17], optimizing the content of an e-learning portal by determining the content of most interest to the

3 3 user [18], extracting useful patterns to help educators and web masters evaluating and interpreting on-line course activities [11], and personalizing e-learning based on aggregate usage profiles and a domain ontology [19]. Association rules mining is one of the most well studied data mining tasks. It discovers relationships among attributes in databases, producing if-then statements concerning attributevalues [20]. Association rule mining has been applied to web-based education systems from two points of view: 1) help professors to obtain detailed feedback of the e-learning process: e.g., finding out how the students learn on the web, to evaluate the students based on their navigation patterns, to classify the students into groups, to restructure the contents of the web site to personalize the courses; and 2) help students in their interaction with the e-learning system: e.g., adaptation of the course according to the apprentice's progress, e.g., by recommending to them personalized learning paths based on the previous experiences other similar students. This paper is organized in the following way: First, we describe the background of association rule mining in general and more specifically its application to e-learning. Then, we describe the main drawbacks of and some solutions for applying association rule algorithms in LMSs. Next, we show a practical tutorial of using an association rule mining algorithm over data generated from a Moodle system. Finally, the conclusions and further research are outlined. Background IF-THEN rules are one of the most popular ways of knowledge representation, due to their simplicity and comprehensibility [21]. There are different types of rules according to the data mining technique used, for example: classification, association, sequential pattern analysis, prediction, causality induction, optimization, etc. In the area of knowledge discovery in databases the most studied ones are association rules, classifiers and predictors. An example of a

4 4 generic rule format IF-THEN in Extend Backus Naur Form (EBNF) notation it is shown in the Table 1. Table 1. IF-THEN rule format <rule> ::= IF <antecedent> THEN <consequent> <antecedent> ::= <condition> + <consequent> ::= <condition> + <condition> ::= <attribute> <operator> <value> <attribute> ::= Each one of the possible attributes set <value> ::= Each one of the possible values of the attributes set <operator> ::= = > < Before beginning the study in depth of the main association rule mining algorithms, we first formally define what is an association rule [22]. Let I = {i 1, i 2,..., i m } be a set of literals, called items. Let D be a set of transactions, where each transaction T is a set of items such that T I. Each transaction is associated with a unique identifier, called its TID. We say that a transaction T contains X, a set of some items in I, if X I. An association rule is an implication of the form X Y, where X I, Y I, and X Y=. The rule X Y holds in the transaction set D with confidence c if c% of transactions in D that contain X also contain Y. The rule X Y has support s in the transaction set D if s% of transactions in D contain X Y. An example association rule is: "IF diapers in a transaction, THEN beer is in the transaction as well in 30% of the cases. Diapers and beer are bought together in 11 % of the rows in the database". In this example, support and confidence of the rule are: s = P(X Y)=11%

5 5 c = P(Y/X) = P(X Y) / P(X) = s (X Y) / s (X) = 30% Given a set of transactions D, and user-specified thresholds minimum support (called minsup) and minimum confidence (called minconf ), the problem of mining association rules is to generate all association rules that have support and confidence greater than minsup and minconf respectively. Our discussion is neutral with respect to the representation of D. For example, D could be a data file, a relational table, or the result of a relational expression. The previous problem is solved by the Apriori algorithm [20] that is the first and simplest association rule mining algorithm. For an overview of different association rule mining algorithms and implementations, see the Frequent Itemset Mining Implementations (FIMI) repository ( Figure 1 shows the Apriori algorithm. The first pass of the algorithm simply counts item occurrences to determine the frequent itemsets of size (cardinality) 1. A frequent itemset of size k is in literature also called a large k-itemset. All subsequent passes, consists of two phases. First, the large itemsets L k found in the (k)th pass are used to generate the candidate itemsets C k+1 as it is shown in Figure 2. Next, the database is scanned and the support of candidates in C k+1 is counted. C k : Candidate itemsets of size k; L k : Frequent itemsets of size k L 1 = {frequent items}; for (k = 1; L k!= ; k++) do begin C k+1 = {candidates generated from L k }; for each transaction t in the database do increment the count of all candidates in C k+1 that are contained in t; L k+1 = {candidates in C k+1 with minsup} end return k L k ; Figure 1. Pseudo-code of Apriori Algorithm

6 6 Let us consider a set L 3 ={abc, abd, acd, ace, bcd} The join step: - C k+1 is generated by joining L k with itself Self-joining: L 3 *L 3 (Items must be in lexicographic order.) - abcd from abc and abd - acde from acd and ace The prune step: - Any k-itemset that is not frequent cannot be a subset of a frequent (k+1)-itemset - Pruning C 4 : acde is removed because ade is not in L 3 The final candidate set: C 4 ={abcd} Figure 2. Apriori candidate generation There are a lot of different association rule algorithms. A comparative study between the main algorithms that are currently used to discover association rules can be found in [23]: Apriori [22], FP-Growth [24], MagnumOpus [25], Closet [26]. Most of these algorithms require the user to set two thresholds, the minimal support and the minimal confidence, and find all the rules that exceed the thresholds specified by the user. Therefore, the user must possess a certain amount of expertise in order to find the right settings for support and confidence to obtain the best rules. Drawbacks of applying association rule in e-learning In the association rule mining area, most of the research efforts went in the first place to improving the algorithmic performance [27], and in the second place into reducing the output set by allowing the possibility to express constraints on the desired results. Over the past decade a variety of algorithms that address these issues through the refinement of search strategies, pruning techniques and data structures have been developed. While most algorithms focus on the explicit discovery of all rules that satisfy minimal support and confidence constraints for a given dataset, increasing consideration is being given to specialized algorithms that attempt to improve processing time or facilitate user interpretation by reducing the result set size and by

7 7 incorporating domain knowledge [28]. Most of the current data mining tools are too complex for educators to use and their features go well beyond the scope of what an educator might require. As a result, the course administrator is more likely to apply data mining techniques in order to produce reports for instructors who then use these reports to make decisions about how to improve the student s learning and the online courses. Nowadays, normally, data mining tools are designed more for power and flexibility than for simplicity. There are also other specific problems related to the application of association rule mining to e-learning data. Next, we are going to describe some of the main drawbacks of association rule algorithms in e-learning. Finding the appropriate parameter settings of the mining algorithm Association rule mining algorithms need to be configured before they are executed. So, the user has to give appropriate values for the parameters in advance (often leading to too many or too few rules) in order to obtain a good number of rules. Most of these algorithms require the user to set two thresholds, the minimal support and the minimal confidence, and then find all the rules that exceed the thresholds specified by the user. Therefore, the user must possess a certain amount of expertise in order to find the right settings for support and confidence to obtain the best rules. One possible solution to this problem can be to use a parameter-free algorithm. For example, the Weka [29] package implements an Apriori-type algorithm that solves this problem partially. This algorithm reduces iteratively the minimum support, by a factor delta support (Δs) introduced by the user, until a minimum support is reached or a required number of rules has been generated.

8 8 Another improved version of the Apriori algorithm is the Predictive Apriori algorithm [30], which automatically resolves the problem of balance between these two parameters, maximizing the probability of making an accurate prediction for the data set. In order to achieve this, a parameter called the exact expected predictive accuracy is defined and calculated using the Bayesian method, which provides information about the accuracy of the rule found. In [31] experimental tests were performed on a Moodle course by comparing the two previous algorithms. The final results demonstrated better performance for Predictive Apriori than the Apriori-type algorithm using the Δs factor. Discovering too many rules Educational data sets are normally very small if we compare them with databases used in other data mining fields typical sizes are the size of one class, which are often only exemplars. In very few cases, we get data from students. Therefore, the application of traditional association algorithms will be simple and efficient. However, association rule mining algorithms normally discover a huge quantity of rules and do not guarantee that all the rules found are relevant. Event though support and confidence already allow for pruning many associations, often it is desirable to apply other constraints as well; for example, on the attributes that must or cannot be present in the antecedent or consequent of the discovered rules. Another solution is to evaluate, and post-prune the obtained rules in order to find the most interesting rules for a specific problem. Traditionally, the use of objective interestingness measures has been suggested [32], such as support and confidence, mentioned previously, as well as purely statistical measures such as the chi-square statistic and the correlation coefficient in order to measure the dependency inference between data variables. Subjective measures are

9 9 becoming increasingly important [33], in other words measures that are based on subjective factors controlled by the user. Most of the subjective approaches involve user participation in order to express, in accordance with his or her previous knowledge, which rules are of interest. Some suggested subjective measures [34] are: unexpectedness (rules are interesting if they are unknown to the user or contradict the user s knowledge) and actionability (rules are interesting if users can do something with them to their advantage). Liu et al. [34] proposed an Interestingness Analysis System (IAS) which compares rules discovered with the user's knowledge about the area of interest. Let U be the set of a user s specification representing his/her knowledge space, A be the set of discovered association rules, this algorithm implements a pruning technique for removing redundant or insignificant rules by ranking and classifying them into four categories: conforming rules, unexpected consequent rules, unexpected condition rules and both-side unexpected rules. The degrees of membership into each of these four categories are used for ranking the rules. Using their own specification language, they indicate their knowledge about the matter in question, through relationships among the fields or items in the database. Finally, another approximation is that we can see the knowledge database as a rule repository on the basis of which subjective analysis of the rules discovered can be performed [35]. Before running the association rule mining algorithm, the teacher could download the relevant knowledge database, in accordance with his/her profile. The personalization of the rules returned is based on filtering parameters, associated with the type of the course to be analyzed such as: the area of knowledge; the level of education; the difficulty of the course, etc. The rules repository is created on the server in a collaborative way where the experts can vote for each rule

10 10 in the repository, based on the educational considerations and their experience gained in other similar e-learning courses. Discovery of poorly understandable rules A factor that is of major importance in determining the quality of the extracted rules is their comprehensibility. Although the main motivation for rule extraction is to obtain a comprehensible description of the underlying model's hypothesis, this aspect of rule quality is often overlooked due to the subjective nature of comprehensibility, which can not be measured independently of the person using the system [36]. Prior experience and domain knowledge of this person play an important role in assessing the comprehensibility. This contrasts with accuracy that can be considered as a property of the rules and which can be evaluated independently of the users. There are some traditional techniques that have been used in order to improve the comprehensibility of discovered rules, such as, constraining the number of items in the antecedent or consequent of the rule, or performing a discretization of numerical values. Discretization [37] divides the numerical data into categorical classes that are easier to understand for the instructor (categorical values are more user-friendly for the instructor than precise magnitudes and ranges). Another way to improve the comprehensibility of the rules is to incorporate domain knowledge and semantics, and to use a common and well-know vocabulary for the teacher. In the context of web-based educational systems, we can identify some common attributes to a variety of e-learning systems such as LMS and adaptive hypermedia courses, such as visited, time, score, difficulty level, knowledge level, attempts, number of messages, etc. In this context the use of standard metadata for e-learning [38] allows the creation and maintenance of a

11 11 common knowledge base with a common vocabulary susceptible of sharing among different communities of instructors or creators of e-learning courses. The SCORM [39] standard describes a content aggregation model and a tracking model for reusable learning objects to support adaptive instruction based on the learner s objectives, preferences, performance and other factors like instructional techniques. One of the most important features of SCORM is that allows the instructional content designer to specify sequencing rules and navigation behavior while maintaining the possibility of reusing learning resources within multiple and different aggregation contexts. While SCORM provide a framework for the representation and processing of the metadata, it falls short in including the needed support for more specific pedagogical tracking such as the use of collaborative resources. We thus argue that the use of a more specific pedagogical ontology provides a higher level of decision support analysis and mining, based in qualitative issues like: the collaborative degree of activities. Lastly, we consider very important to mention another aspect that can facilitate the comprehensibility of discovered rules, the visualization. The goal of visualization is to help analysts in inspecting the data after to applying the mining task [40] by means of some visual representation of the corpus of rules extracted. A range of visual representations is used, such as tables, two-dimensional matrices, graphs, bar charts, grids, mosaic plots, parallel coordinates, etc. Association rule mining can be integrated with visualization techniques [41] in order to allow users to drive the association rule finding process, giving them control and visual cues to ease understanding of both the process and its results. However, the visualization methods are still difficult to understand for a non-expert in data mining, such as a teacher. Therefore, we consider this question need to be addressed in a near future, and the challenge it will be to apply

12 12 these techniques in a more intuitive way identifying within these data structures the rules that are relevant and meaningful in the context of the e-learning analysis, and representing them in a simple way such as icons, colors, etc, using interactive bi-dimensional and three-dimensional representations. Statistical significance of discovered rules One aspect that is often overlooked when applying data mining techniques on small datasets is that of overfitting. When there is only limited data available, and the hypothesis space (i.e., the total number of possible rules) is huge, it is inevitable in a statistical sense, to get false discoveries because every rule has a small chance of being true in the given data, and the number of rules is huge. The smaller the dataset, the larger the danger for these so-called type-i errors: false rules that accidentally hold in the small dataset. To address this problem it is more than advisable to restrict the hypothesis space by inserting as much as possible background knowledge; i.e.: restricting the body and heads of the rules as much as possible, and maybe, in worst case, even remove numerous attributes from the dataset. When evaluating the outcome of an association rule mining operation, one also has to take very well into account that without further controlled experiments the found associations are, at least, open to debate. An introduction to association rule mining with Weka in a Moodle LMS We are going to use Weka over Moodle data in order to show a practical example of applying association rule mining in an educational environment. Moodle [42] is a well-known, freeware learning management system and open-source software package designed using sound pedagogical principles, to help educators create effective online learning communities. Moodle has a flexible array of module activities and resources to create five kinds of static course material (a text page, a web page, a link to anything on the Web, a

13 13 view into one of the course s directories and a label that displays any text or image), six types of interactive course material (assignments, choice, journal, lesson, quiz and survey) and five kinds of activities where students interact with each other (chat, forum, glossary, wiki and workshop). On the other hand, Weka [29] is an open-source software platform that provides a collection of machine learning and data mining algorithms for data pre-processing, classification, regression, clustering, association rules, and visualization. The process of applying association rule mining over the Moodle data consists of the same four steps as the general data mining process (see Figure 3): Collect data. The LMS system is used by students and the usage and interaction information is stored in the database. We are going to use the students usage data of the Moodle system. Preprocess the data. The data are cleaned and transformed into a mineable format. In order to preprocess the Moodle data we used the MySQL System Tray Monitor and Administrator tools [43] and the Open DB Preprocess task in the Weka Explorer [29]. Apply association rule mining. The data mining algorithms are applied to discover and summarize knowledge of interest to the teacher. Interpret, evaluate and deploy the results. The obtained results or model are interpreted and used by the teacher for further actions. The teacher can use the discovered information for making decision about the students and the Moodle activities of the course in order to improve the students learning.

14 14 Figure 3. Mining Moodle data. Our objective is to use the students usage data of the Moodle system. Moodle has a lot of detailed information about content, users, usage, etc. that it is stored in a relational database. The data is stored in a single database: MySQL and PostgreSQL are best supported, but it can also be used with Oracle, Access, Interbase, any database supporting ODBC connections and others. We have used MySQL because is the world's most popular open source database. Moodle has more than 150 tables, some of the most important are: mdl_log, mdl_assignement, mdl_chat, mdl_forum, mdl_message, mdl_quiz. Data preprocessing allows for transforming the original data into a suitable shape to be used by a particular data mining algorithm or framework. So, before applying a data mining algorithm, a number of general data preprocessing tasks has to be addressed (data cleaning, user identification, session identification, path completion, transaction identification, data transformation and enrichment, data integration, data reduction). Data preprocessing of LMSgenerated data has some specific issues: Moodle and most of the LMS use user authentication (password protection) in which logs have entries identified by users since the users have to log-in, and sessions are already

15 15 identified since users may also have to log-out. So, we can remove the typical user and session identification tasks of preprocessing data of web-based systems. Moodle and most of the LMSs record the students usage information not only in log files but also directly in relational databases. The tracking module can store user interactions at a higher level than simple page access. Databases are more powerful, flexible and less bug-prone that typical log text files for gather detailed access and usage information of the services available (forums, chats, etc.) by the LMS. Therefore, the data gathered by a LMS may require less cleaning and preprocessing than data collected in other systems based on log files. In our case, we are going to do the following preprocessing tasks: Selecting data. The first task is to choose in what specific Moodle courses we are interested to use for mining. Creating summary tables. Starting from the selected course we create summarization tables that aggregate the information at the required level (e.g. student). Student and interaction data are spread over several tables. We have created a new summary table which integrates the most important information for our objective. This table has a summary by row about all the activities done by each student in the course and the final obtained mark by the student in the same course. In order to create this table it is necessary to do several queries to the database in order to obtain the information of the desired students (userid value from mdl_user_students table) and courses (id value of the mdl_course table). Table 2 shows the main attributes used to create summarization tables.

16 16 Table 2. Attributes used for each student. Name Description course Identification number of the course n_assigment Number of realized assignments. n_quiz Number of realized quizzes. n_quiz_a Number of passed quizzes. n_quiz_s Number of failed quizzes. n_messages Number of send messages to the chat. n_messages_ap Number of send messages to the teacher. n_posts Number of send messages to the forum. n_read Number or read messages of the forum. total_time_assignment Total time used in assignment. total_time_quiz Total time used in quiz. total_time_forum Total time used in forum. mark Final mark the student obtained in the course. Discretizing data. Next, we perform a discretization of numerical values in order to increase the interpretation and comprehensibility. Discretization divides the data in categorical classes that are easier to understand for the teacher (categorical values are more familiar to the teacher than precise magnitudes and ranges). In our experiments we discretized all the numerical values of the summarization table (except for course identification value). There are several unsupervised global methods [37] for transforming continuous attributes into discrete Attributes, such as the equal-width method, equal-frequency method or the manual method (in which you have to specify the cut-off points). In this case, we have use the manual method for the mark attribute and the equal-width method for the others attributes. The labels that we have used in the mark attribute are: FAIL, PASS, GOOD and EXCELLENT, and in all the other attributes: LOW, MEDIUM and HIGH. Transforming the data. Finally, we transform the data to the required format used by the data mining algorithm or framework in order to be mineable. In this case, we have

17 17 exported the summary table to a text file with ARFF format used by Weka. An ARFF (Attribute-Relation File Format) file is an ASCII text file that describes a list of instances sharing a set of attributes [44]. Once the data is prepared, we can apply the association rule mining algorithm. In order to do so, it is necessary: 1) to choose the specific association rule mining algorithm; 2) to configure the parameters of the algorithm; 3) to identify which table will be used for the mining; 4) and to specify some other restrictions, such as the maximum number of items and what specific attributes can be present in the antecedent or consequent of the discovered rules. In this case, we have used the Apriori algorithm [20] for finding association rules over the discretized summarization table of the course 110 (Projects), executing this algorithm with a minimum support of 0.3 and a minimum confidence of 0.9 as parameters. Projects is a 3º course subjects of computer science studies oriented to design, develop, document and maintain a full computer science project. Students have to use Moodle to read/download documents/notes, do activities, do quizzes and send/read message to forums in order to complement traditional (face-to-face) classrooms. The instructor can use all the collected data by Moodle for doing data mining; in our case association rule mining. Association rules can discover for example interesting relationships between the Moodle usage and the student's final marks. It can show what type of students actions predict good final marks (if activities done then marks good) or bad final marks (if not message send/read to forum then marks bad) and can be used to detect on time student with problem in order to help them and motivate them.

18 18 Figure 4. Results of Apriori algorithm Finally, we do a data post-processing step in which the obtained results and models are interpreted, evaluated and used by the teacher for further actions. The teacher can use the discovered information for making decisions about the students and the LMS activities of the course in order to improve the students learning. Next, we are going to explain the meaning of some obtained rules. Figure 4 shows how a huge number of association rules can be discovered. There are also a lot of uninteresting rules, like a great number of redundant rules (rules with a generalization of relationships of several rules, like rule 3 with rules 1 and 2, and rule 6 with rule 7 and 8). There are some similar rules (rules with the same element in antecedent and consequent but interchanged, such as rules 15, 16, 17, and rules 18, 19, 20). And there are some random relationships (rules with random relations between variables, such as rules 1 and 5). But there are also rules that show relevant information for educational purposes, like those that show expected

19 19 or conforming relationships (if a students does not send messages, it is logical that he/she does not read them either, such as rule 2, and, similarly, rules 10, 11 and 13). There are also rules that show unexpected relationships (such as rules 4, 12, 14 and 9) which can be very useful for the instructor in decision making about the activities and detecting students with learning problems, like rules 4, 12 and 9 that show that if the number of messages read and messages sent in the forum is very low, and the total time and number of passed quizzes is very low, then the final mark obtained is a failing grade. Starting from this information, the instructor can pay more attention to these students because they are prone to failure. As a result, the instructor can motivate them in time to pass the course. Conclusions and future trends It is still the early days for the total integration of association rule mining in e-learning systems and not many real and fully operative implementations are available. Nevertheless, a good deal of academic research in this area has been published over the last few years. In this paper, we have outlined some of the main drawbacks of the application of association rule mining to data generated by learning management systems. We have proposed some possible solutions for each problem. Furthermore, we have given general outlines, and explained how to carry out a particular KDD process for association rule mining in a LMS system. We believe that some future research lines will focus on: developing an association rule mining tool that can more easily be used by educators; proposing new specific measures of interest with the inclusion of domain knowledge and semantic information; integrating the mining tool into the LMS systems in order to give direct feedback; developing iterative and interactive or guided mining to help educators to apply KDD processes, or even developing an automatic mining system that can perform the mining automatically in an unattended way, such

20 20 that the teacher only has to use the proposed recommendations in order to improve the students learning. Acknowledgments The authors gratefully acknowledge the financial support provided by the Spanish department of Research under TIN C06-03 and P08-TIC-3720 Projects. References [1] BlackBoard, available at (accessed March 16), [2] WebCT, available at (accessed March 16), [3] TopClass, available at (accessed March 16), [4] Moodle, available at (accessed March 16), [5] Ilias, available at (accessed March 16), [6] Claroline, available at (accessed March 16), [7] Mostow, J., and Beck, J., Some useful tactics to modify, map and mine data from intelligent tutors. Natural Language Engineering. 12(2), , [8] Dringus, L. and Ellis, T., Using data mining as a strategy for assessing asynchronous discussion forums. Computer & Education Journal. 45, , [9] Zorrilla, M. E., Menasalvas, E., Marin, D., Mora, E. and Segovia, J., Web usage mining project for improving web-based learning sites. In Web mining workshop, Cataluña, 1 22, [10] Romero, C. and Ventura, S., Data mining in e-learning. Southampton, UK: Wit Press [11] Zaïane, O., Building a Recommender Agent for e-learning Systems. Proceedings of the International Conference in Education (ICCE), Auckland, Nueva Zelanda, 55-59, 2002.

21 21 [12] Lu, J., Personalized e-learning material recommender system. International conference on information technology for application , [13] Minaei-Bidgoli, B., Tan, P. and Punch, W., Mining interesting contrast rules for a webbased educational system. International Conference on Machine Learning Applications [14] Romero, C., Ventura, S., and De Bra, P., Knowledge discovery with genetic programming for providing feedback to courseware author. User Modeling and User- Adapted Interaction: The Journal of Personalization Research. 14(5), , [15] Yu, P., Own, C. and Lin, L., On learning behavior analysis of web based interactive environment. In: Proc. ICCEE, Oslo, Norway, [16] Merceron, A., and Yacef, K, Mining student data captured from a web-based tutoring tool: Initial exploration and results. Journal of Interactive Learning Research. 15:4, , [17] Freyberger, J., Heffernan, N., Ruiz, C., Using association rules to guide a search for best fitting transfer models of student learning. Workshop on Analyzing Student-Tutor Interactions Logs to Improve Educational Outcomes at ITS Conference, [18] Ramli, A.A., Web usage mining using apriori algorithm: UUM learning care portal case. International Conference on Knowledge Management. Malaysia. 1-19, [19] Markellou, P., Mousourouli, I., Spiros, S. and Tsakalidis, A., Using Semantic Web Mining Technologies for Personalized e-learning Experiences. The Fourth IASTED Conference on Web-based Education, WBE Grindelwald, Switzerland, [20] Agrawal, R., Imielinski, T. and Swami, A.N., Mining Association Rules between Sets of Items in Large Databases. In Proceedings of SIGMOD, , [21] Klosgen, W., & Zytkow, J., Handbook of data mining and knowledge discovery. New York: Oxford University Press, [22] Agrawal, R. and Srikant, R., 1994, Fast algorithms for mining association rules. Proceedings of Conf. Very Large Data Bases, Santiago de Chile, [23] Zheng, Z., R. Kohavi and Mason, L., Real world performance of association rules. Sixth ACM SIGKDD International Conference on Knowledge Discovery & Data Mining. 2(2), 86-98, 2001.

22 22 [24] Han, J., Pei, J., and Yin, Y., Mining frequent patterns without candidate generation. In Proceedings of ACM-SIGMOD International Conference on Management of Data [25] Webb, G.I. OPUS: An efficient admissible algorithm for unordered search. Journal of Artificial Intelligence Research, , [26] Pei, J., Han, J., and Mao, R. CLOSET: An efficient algorithm for mining frequent closed itemsets. In Proceedings of ACM_SIGMOD International DMKD'00, Dallas, TX, [27] Ceglar, A. and Roddick, J.F., Association mining. ACM Computing Surveys, 38(2), 1-42, [28] Calders, T. and Goethals, B. Non-derivable Itemset Mining. Data Mining and Knowledge Discovery. 14, , [29] Weka, available at (accessed March 16), [30] Scheffer, T., Finding Association Rules That Trade Support Optimally against Confidence. Lecture Notes in Computer Science. 2168, 424+, [31] García, E., Romero, C., Ventura, S. and De Castro, C., Using Rules Discovery for the Continuous Improvement of e-learning Courses. Proc. of the 7th International Conference on Intelligent Data Engineering and Automated Learning- IDEAL LNCS 4224, , [32] Tan, P. and Kumar, V., Interesting Measures for Association Patterns: A Perspectiva, Technical Report TR Department of Computer Science. University of Minnnesota, [33] Silberschatz, A. and Tuzhilin, A., What makes pattterns interesting in Knoledge discovery systems. IEEE Trans. on Knowledge and Data Engineering. 8(6), , [34] Liu B., Wynne H., Shu C. and Yiming M., Analyzing the Subjective Interestingness of Association Rules. IEEE Intelligent Systems and their Applications. 15:5, 47-55, [35] García, E., Romero, C., Ventura, S. and De Castro C., An architecture for making recommendations to courseware authors through association rule mining and collaborative filtering. UMUAI: User Modelling and User Adapted Interaction. 19:99, , 2009.

23 23 [36] Huysmans J., Baesens B. and Vanthienen J., Using Rule Extraction to Improve the Comprehensibility of Predictive Models. (accessed January 20, 2008), [37] Dougherty, J. Kohavi, M. and Sahami, M., Supervised and unsupervised discretization of continuous features. Int. Conf. Machine Learning, Tahoe City, CA, , [38] Brase, J. and Nejdl, W., Ontologies and Metadata for elearning. Springer Verlag, , [39] SCORM, Advanced Distributed Learning. Shareable content object reference model: The SCORM overview. Available at (accessed March 16, 2009), [40] De Oliveira, M. C. F. and Levkowitz, H., From visual data exploration to visual data mining: A survey. IEEE Trans. on Visualization and Computer Graphics. 9(3), , [41] Yamamoto C. H. and De Oliveira, M. C. F., Visualization to assist users in association rule mining tasks. PhD diss. Arizona State Mathematics and Computer Sciences Institute. Sao Paulo University. (accessed March 16, 2009), [42] Rice, W. H., Moodle e-learning course development. A complete guide to successful learning using Moodle. Packt Publishing [43] Ullman, L., Guía de aprendizaje MySQL. Pearson Prentice Hall [44] Witten, I. H. and Frank, E., Data mining: Practical machine learning tools and techniques. Morgan Kaufman