Use of group discussion and learning portfolio to build knowledge for managing web group learning


Gwo-Dong Chen, Kuo-Liang Ou, Chin-Yeh Wang
Department of Computer Science and Information Engineering, National Central University, Chung-Li, Taiwan

Abstract

To monitor and enhance the learning performance of learning groups in a web learning system, teachers need to know the learning status of each group and determine the key influences affecting group learning outcomes. Teachers can achieve this goal by observing group discussions and learning behavior in web logs and analyzing the web log data to obtain the relevant information. However, web logs are not systematically organized and the discussions are extensive. Consequently, teachers must struggle to extract information from logs and intuitively apply teaching rules based on experience when managing the groups. Rather than using statistics packages to evaluate hypotheses, this work presents a methodology that applies existing data and text mining tools to automatically gather learning status and predict the performance of learning groups from the contents of discussions and from log records of learning behavior. Meanwhile, the methodology infers the causal network that exists between learning features and learning performance. Knowledge is inferred based on statistical and probabilistic reasoning and on social interdependence theory. The causal network can suggest to teachers means of enhancing learning performance. At the same time, teachers can use the knowledge obtained about learning groups to manage the group learning process on the web. Experimental results of applying the novel methodology to manage a group learning class organized over the web and containing 706 students are also presented.

1. Introduction

In existing web learning systems, students may need to learn independently, without the opportunity to interact with learning companions.
At the same time, numerous studies have indicated that students learn better in a group [1] [2] [3]. Student learning motivation and performance may be enhanced by peer support and competitive pressure from the group [4]. Thus, a group learning mechanism can be adopted in web learning systems to enhance learning performance. However, to manage learning groups in a web-based system, teachers need to expend significant effort to obtain the learning status needed for monitoring and guiding the groups. To obtain this status, teachers need to collect the relevant information from the web logs, which record the history of group members' actions in carrying out their assigned projects. The web logs include the group portfolio and intra-group discussions. Because the logs are huge and are not organized to facilitate pedagogical research, seeking meaningful and useful information in them is a great burden for teachers trying to manage groups. The situation is worsened when hundreds of students are involved in group learning and thousands of web and discussion logs must be analyzed manually.

To facilitate obtaining information from web logs, webmasters have developed numerous tools to enhance web performance [5]. However, most of these tools merely summarize the access log of a website, including information such as access times, access frequencies, and IP addresses. This statistical information is insufficient for teachers to capture the status of groups in a group learning system. What teachers really need is the information that can be inferred from access logs and group portfolios. For instance, teachers are interested in which groups are discussing the group goal [6], which groups have difficulty learning a specific chapter, which group members are unwilling to help others [7], and so on. One way to derive information about group learning status is to track and analyze group discussions and the behavior of individual group members in carrying out group projects.
Teachers can observe the relationships among group members, and can capture the status of the group by analyzing the contents of discussions within the group and the group portfolio. After accumulating observational experience from the discussion contents of classes with the same curriculum and a similar learning model, the teacher can induce heuristic rules for predicting final group performance from initial group behavior. Teachers can thus use past experience to predict the learning status of similar groups in the future, and can then intervene in time to enhance performance. However, in accomplishing the above, teachers face the following problems.

First, the sheer bulk of material involved makes it hard to analyze the contents of group discussions. The group's discussion board will reflect the individual learning statuses of the group's members. The contents of the discussions include information about learning performance in reading the curriculum, discussion of group tasks, group atmosphere, knowledge sharing, and so on. Teachers can thus capture the learning status of the group by tracking group discussions, and can then help groups learn. However, the contents of the discussions are not only large in quantity but also unorganized, making analysis difficult. Consequently, filtering out uninteresting discussion contents and extracting interesting information are key issues in tracking group discussions.

Second, discovering the key influences on group learning status, so that teachers can monitor and thus manage group learning, is difficult. Teachers can estimate and infer group learning status and many other factors by tracking the group discussion and portfolio. However, the discussion board and group portfolio include numerous elements, such as resource sharing, effective leadership, and discussion quantity. Teachers should first determine which elements significantly influence learning performance, and then focus on these factors. To locate groups needing help and effectively promote performance, teachers should determine the key influences on group learning performance. However, it is difficult to obtain the causal relationships between factors and performance from learning logs and group discussions. Therefore, how to obtain the causal relationships between learning status and factors is another key issue for teachers wishing to monitor and guide group learning in a collaborative web learning system.

Third, predicting which groups may suffer low performance and then intervening to prevent learning failure is difficult.
After determining the dominant influences on the success of a learning group, teachers require a map of causal relationships within the group as a basis for deciding which teaching strategies to apply and how to guide group learning. By constructing and refining the map over several semesters, teachers can use this record of past experience to prevent a group from failing. At the same time, teachers can infer information on unobservable factors from relevant data and the network of relationships within the group. A mechanism is needed to integrate individual experiences so that other teachers or teaching assistants can share these experiences and use them to predict student performance. Consequently, teachers can intervene in group learning in time to prevent failure.

To overcome the above problems, this investigation employed data mining techniques [8] to extract useful information from web logs and group portfolios and allow teachers to manage collaborative learning on the web. The novel methodology employs three tools to extract information that assists teachers in managing the learning groups.

1) Capturing learning status by extracting topics and summaries from the intra-group discussion contents: Text Miner, from the IBM DB2 database management system [9], is employed to identify discussion topics and extract summaries of articles. This approach allows teachers to track group discussions simply by reading the discussion topics and summaries, rather than reading through the actual articles on the discussion board.

2) Determining key influences on learning performance by analyzing the causal dependence of learning status indicators: The Bayesian Knowledge Discoverer (BKD) [10], based on conditional probability and the Bayesian belief network (BBN) [34], is employed to derive the causal relationships among factors and learning performance. In the derived causal network [35], BKD also extracts the conditional probability distribution of each node from the data provided. Teachers can then monitor and predict learning status based on the BBN and can manipulate the key influences on learning performance.

3) Predicting group learning performance to prevent learning failure: The causal network built on the Bayesian belief network can be used in the Robust Classifier (RoC) [11] to predict group learning performance from data on discussions within the group and on the makeup of the group. Teachers can then use the tool to predict and locate the groups that are likely to experience poor learning performance.
Teachers can then intervene and provide guidance to enhance the learning performance of these groups.

2. Methodological Overview

Figure 1 illustrates the three main processes and work flows that teachers use to extract learning information and guide group learning. The three processes are (1) capturing learning status by tracking group discussion, (2) discovering the key influences and causal relationships by analyzing the causal relationships between learning status indicators and learning performance, and (3) intervening and preventing group failure according to predicted learning performance.

[Figure 1: The process for monitoring and predicting the learning situation. Input data: Group Discussion; Online Web Logs and Group Works. Processes: Discussion Tracking and Analyzing Tools (IBM Text Miner); Learning Status Analyzing and Predicting Tool (BKD & RoC). Temporal data: The Learning Status (Training Data); The Learning Status (Testing Data). Outputs/results: Causal Network of Learning Status and Learning Performance; Predicted Learning Performance.]

To allow group discussion to be monitored, the contents of the discussions were input into IBM Intelligent Miner for Text, and discussion topics and summaries were derived. IBM Intelligent Miner for Text [9] is a set of information mining [12] tools for retrieving interesting patterns and gathering information from large quantities of articles. Monitoring the discussion topics and abstracts derived by IBM Intelligent Miner for Text allows teachers to determine learning status without excessive reading. Moreover, to fully represent learning status, teachers collected web access logs and group portfolios, including questionnaire answers and the status of the discussion. These data are treated as training data and input into the Bayesian Knowledge Discoverer (BKD) [10] system. BKD is knowledge discovery software developed by the Knowledge Media Institute of the UK's Open University. BKD is an automated modeling tool that can transform a database into a Bayesian network by searching for the model most likely to be responsible for the observed data. BKD does not use conditional-independence tests but instead uses Bayesian methods [10].
BKD can establish the causal dependency network of the learning status indicators and represent it graphically according to the data in a table, with each field/attribute of the table corresponding to a learning status indicator or to learning performance. The result of applying BKD can help teachers identify the key influences on learning performance. Meanwhile, each node in the network contains a joint probability distribution for all possible values, thus allowing teachers to obtain information on still unobservable learning status indicators by Bayesian inference from the network and from data on other available learning status indicators. The Microsoft Belief Network tool MSBN [15] is used herein as the inference tool.

Teachers can employ the Robust Classifier (RoC) [11] system to predict group learning performance. The RoC system is Bayesian classification software developed by the developers of BKD. RoC can use known learning status indicator values and performance to produce a mechanism for predicting performance. Teachers can thus predict the learning performance of groups based on existing records of learning status indicators. In this way, RoC gives teachers the information necessary to intervene in group learning and prevent poor group performance.

Bayesian net learning is a machine learning method based on Bayes' theorem [33], which was proposed in the 18th century by Rev. Thomas Bayes. The Bayesian learning method calculates an explicit probability distribution over hypotheses based on the given training data [13] and in some cases can be competitive with other learning algorithms, including decision tree and neural network algorithms [14]. In a collaborative learning environment, each grade/category of group learning performance/status is treated as a hypothesis in the Bayesian net learning method. The Bayesian belief net builder, called BKD herein, constructs the causal net and the probability distributions of the learning status indicators/performance from the training data provided, namely records of learning behavior from previous classes.
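As a minimal sketch of treating each grade as a hypothesis, the following Python fragment applies Bayes' theorem to a single piece of evidence. The prior and likelihood values, and the names `posterior`, `grade_prior`, and `p_evidence_given_grade`, are illustrative assumptions; they are not taken from the experiment or from BKD.

```python
from typing import Dict


def posterior(prior: Dict[str, float], likelihood: Dict[str, float]) -> Dict[str, float]:
    """Bayes' theorem: P(h | e) is proportional to P(e | h) * P(h)."""
    unnormalized = {h: prior[h] * likelihood[h] for h in prior}
    total = sum(unnormalized.values())
    if total == 0:
        raise ValueError("evidence has zero probability under every hypothesis")
    return {h: p / total for h, p in unnormalized.items()}


# Hypothetical prior over final grades, e.g. estimated from a previous class.
grade_prior = {"A": 0.25, "B": 0.50, "C": 0.20, "D": 0.05}
# Hypothetical likelihood of the observed evidence (say, "Working on Project = HIGH")
# under each grade hypothesis.
p_evidence_given_grade = {"A": 0.60, "B": 0.40, "C": 0.10, "D": 0.05}

grade_posterior = posterior(grade_prior, p_evidence_given_grade)
# The hypothesis with the largest posterior probability.
most_probable_grade = max(grade_posterior, key=grade_posterior.get)
```

A full Bayesian belief network combines many such conditional distributions; this fragment only shows the single-evidence case.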
With the derived Bayesian net and current student learning behavior data, Bayesian net reasoning modules such as Microsoft MSBN [15], or Bayesian classifiers such as RoC, calculate the probability distribution over the hypotheses. The reasoning module then selects the most probable hypothesis, called the Maximum a Posteriori hypothesis (H_MAP) or the Maximum Likelihood hypothesis (H_ML), using the evidence of currently available attributes and the derived BBN. For example, if the module estimates the final grade probability distribution of a student, given the other available learning statuses, as (A, 20%), (B, 78%), (C, 2%), (D, 0%), then the module predicts that the final grade of this student will be B. That is, the hypothesis "final grade = B" is H_ML or H_MAP among the hypotheses "final grade = A", "B", "C", and "D". The causal network is easy for teachers without a statistics background to understand and apply to the management of learning groups. Furthermore, the following features [13] of Bayesian learning methods make them suitable for teachers wishing to manage a classroom:

1. Each training example can incrementally increase or decrease the estimated probability that a hypothesis is correct. This feature gives teachers additional flexibility in strengthening or discarding a hypothesis.
2. Prior knowledge can be combined with observed data to determine the final probability of a hypothesis, allowing teachers to accumulate their teaching experience when managing the group.
3. Bayesian methods can accommodate hypotheses that make probabilistic predictions, allowing teachers to predict the present learning status of the group based on past experience.

The experimental test of the novel method involved 7 teachers, 5 teaching assistants and 706 students. The curriculum materials were distributed to students on video compact disc (VCD), and the class adopted a group learning strategy with a web interface for performing and recording group projects and inter-group and intra-group discussions. The novel method captured 52 attributes. Each attribute represents part of a group's learning status; thus, we call them learning status indicators. Among the 52 attributes, 44 are derived from group discussions and web logs, and the other 8 are from questionnaires. These learning status indicators are defined using Johnson and Johnson's positive social interdependence, which is essential to effective group learning [20]. The analytical result reveals that the novel tools let teachers monitor and guide collaborative web-based learning, aided by knowledge and information derived from group discussions and portfolio logs.

3. Monitoring and Analyzing the Group Discussion

The contents of a group's discussion board cover various subjects, such as problems in the curriculum, group goals, the group task schedule, resource sharing, members' responsibilities, and so on.
This information is important in helping teachers to determine the learning status of groups and to help them learn. However, a single discussion board often includes discussion of many subjects, some of which are trivial and unrelated to learning activities. Thus, to capture the status of groups, teachers need to expend significant effort sifting the vast and disorganized discussion logs for useful information. Teachers therefore need tools to assist them in tracking the group discussion and extracting important information from the discussion board. This section illustrates the tools that teachers use to extract this information from the discussion board and thus monitor the learning status of the group.

3.1 Extracting the discussion topics

One sentence of a discussion article, the topic sentence, generally contains the most important information in the article. The topic sentence may be identified as the sentence containing the most keywords. Thus, the first step in tracking the discussion is to classify the articles into topics by analyzing the occurrence, position and frequency of keywords. IBM Intelligent Miner for Text provides several tools that help users extract information from text data. Herein, the Text Analysis tools [9] of IBM Intelligent Miner for Text are employed to help teachers analyze intra-group discussion articles and thus determine group learning status. There are five types of Text Analysis tools, outlined briefly below:

1) Language Identification: Automatically identifies the language in which a document is written.
2) Topic Categorization: Automatically assigns documents to predefined categories, topics, or themes. Teachers can then capture the learning status of the group by reading articles in specific categories. For example, an increasing number of articles in the group conflict [16] category indicates a high possibility that group conflict exists.
3) Feature Extraction: Automatically recognizes significant vocabulary items in a text, and classifies the documents into appropriate categories without the need for a predefined vocabulary. Teachers can extract the interesting information from discussion topics.
4) Clustering: Automatically groups collections of similar documents, allowing teachers to find groups that may reflect a similar learning status and then help these groups together.
5) Summarizer: Analyzes the sentences in a document as a basis for producing a summary. Teachers can then capture the learning status of the group by reading summaries of discussions instead of reading the entire articles.

This investigation uses the Topic Categorization tool to extract the discussion topics of groups, and then uses the Summarizer tool to summarize group discussions. For example, the following sample is extracted from the discussions of a group member, Michael Chen:

Dear teammates:
I am sorry to be late for the on-line conference of our group this morning. I have a question and need a favor from you. In the chapter 4, page 45, teachers have illustrated last week. Can anybody kindly tell me the purpose of a member function in an object in the object oriented programming language?
Michael Chen

To categorize discussions by topic, teachers must first define a set of interesting discussion categories. Example categories include inquiries about system usage, gossip among members, questions about the content of a specific chapter, and so on. The categories listed in the Topic Categorization tool represent the topics of interest to the teachers. For each discussion article, the Topic Categorization tool assigns a score for each of the categories, and will usually classify the article under the category with the highest score. This approach helps teachers to classify the discussion into predefined discussion categories. To define the categories, teachers must first train the Topic Categorization tool with several sample discussion articles, after which the tool generates a category schema. The tool can then be used to categorize the discussion articles.

Because IBM Intelligent Miner for Text does not support Chinese, a similar text classification tool using TFIDF [17] is employed herein [18]. All the examples in this paper are translated from Chinese to English for illustration. Teachers could reproduce the experimental results reported herein with the support of tools such as IBM Intelligent Miner for Text. Michael Chen's discussion can be taken as an example. The teachers first used several discussions in the category Question for Chapter 4 as training samples, using the keywords: what, source code, member function, object, and object oriented programming language.
Most of the keywords are curriculum related, and are intended to let teachers know which curriculum areas the groups are focused on. The topic categorization tool then classifies all the discussion articles according to keyword type, frequency, and position. The example discussion receives its highest score in the category Question for Chapter 4. Table 1 lists the categorization and output file obtained from applying the Text Analysis Tool to the above example.
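The score-and-rank step described above can be sketched in Python with a plain TF-IDF weighting and cosine similarity. This is only an illustration under stated assumptions: it is not the tool used in the study, it uses keyword frequency only (not keyword type or position), the training sentences are invented English examples, and the function names `tokenize`, `train`, `cosine`, and `categorize` are hypothetical.

```python
import math
import re
from collections import Counter, defaultdict
from typing import Dict, List, Tuple


def tokenize(text: str) -> List[str]:
    """Lowercase the text and keep alphabetic tokens only."""
    return re.findall(r"[a-z]+", text.lower())


def train(samples: Dict[str, List[str]]):
    """Learn IDF weights and one summed TF-IDF vector per category."""
    docs = [(cat, Counter(tokenize(text))) for cat, texts in samples.items() for text in texts]
    doc_freq: Counter = Counter()
    for _, counts in docs:
        doc_freq.update(counts.keys())
    # Smoothed inverse document frequency.
    idf = {t: math.log((1 + len(docs)) / (1 + df)) + 1.0 for t, df in doc_freq.items()}
    vectors = defaultdict(lambda: defaultdict(float))
    for cat, counts in docs:
        for term, tf in counts.items():
            vectors[cat][term] += tf * idf[term]
    return idf, vectors


def cosine(a: Dict[str, float], b: Dict[str, float]) -> float:
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def categorize(text: str, idf, vectors) -> List[Tuple[str, float]]:
    """Return (category, score) pairs, highest score first."""
    counts = Counter(tokenize(text))
    query = {t: tf * idf[t] for t, tf in counts.items() if t in idf}
    scores = {cat: cosine(query, vec) for cat, vec in vectors.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Invented training samples for three of the categories named in the text.
samples = {
    "Question for Chapter 4": [
        "What does a member function do in an object?",
        "I do not understand the source code for the object oriented programming language example.",
    ],
    "Inquiry for system and environment": [
        "How do I log in to the system from home?",
        "The web page does not load in my browser.",
    ],
    "Gossips discussion": [
        "Shall we have lunch together after the exam?",
        "Did anyone watch the basketball game last night?",
    ],
}
idf, vectors = train(samples)
ranking = categorize(
    "Can anybody kindly tell me the purpose of a member function in an object "
    "in the object oriented programming language?",
    idf,
    vectors,
)
top_category = ranking[0][0]
```

With these invented samples, the question about member functions ranks highest under Question for Chapter 4, mirroring the highest-score assignment described in the text.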

[Table 1: The category result and output file of Michael Chen's discussion. Columns: Category List, Ranking, Score. Rows: four Question for Chapter categories, Inquiry for system and environment, and Gossips discussion. The chapter numbers, rankings and scores are missing in the source.]

Table 1 lists the scores in the defined categories generated by the Text Analysis Tool for the example discussion. In Table 1, the example discussion was categorized into Question for Chapter 4. The Text Analysis Tool helps teachers to partially capture the learning status of a group, for example by finding whether a group has problems in understanding chapter 4. The following figure illustrates the steps involved in training the discussion topic categories and categorizing the discussions in IBM Intelligent Miner for Text:

[Figure 2: Training and Filter in the Categorizing Tool. Training: Training Discussion in Category 1 and Training Discussion in Category 2 yield Vocabulary Category 1 and Vocabulary Category 2, which form the Full Category Schema. Filter: All the Discussion Documents pass through the Categorization Filter to give Categorized Documents.]

3.2 Summarizing the discussion of a group

After classifying the discussions, teachers may need to read the discussions and their contents to monitor the learning status of a group. However, the discussions can be very long and much of their content irrelevant. Consequently, teachers summarize each discussion by locating the important sentences in each article, and then combining these sentences into a summary of the article. Teachers employ Text Analysis tools such as IBM Intelligent Miner for Text to summarize the discussion and capture the learning status of a group. For example, the Text Analysis tools extract the following sentence as the summary of Michael Chen's discussion:

What is the purpose of a member function in an object in object oriented programming language?

Other sentences in the example discussion are filtered out, allowing teachers to capture the learning status of the group more easily: a member of the group has difficulty understanding the term member function in an object oriented programming language. The categorization and summaries extracted from the discussion board assist teachers in capturing the current learning status of the group. These learning statuses help teachers to intervene in group learning and enhance learning performance. Furthermore, they are important information that helps teachers to discover the causal network of influences on learning performance and to predict learning performance, as described in the following two sections.

4. Discover the Causal Dependency Net of Group Social Interdependency Factors and Learning Status Indicators

The aims of discovering the causal dependency net are (1) to find the key influences on final group performance, (2) to find a way for teachers to locate the groups most likely to suffer low learning performance, (3) to suggest ways for teachers to enhance the performance of a learning group by manipulating important learning status indicators, and (4) to provide a way to determine how unobservable factors influence learning performance, based on the available observed factors. Thus, teachers can know how to apply appropriate strategies to enhance the learning performance of a group by monitoring and manipulating the key influencing factors, that is, learning status indicators such as the status of working on the group project.
At the same time, teachers can learn from the joint probabilities among these factors and learning performance, which guide them in applying strategies to promote the factors and thus enhance group learning.

The previous section describes how learning status can be obtained from discussion contents. Additional learning statuses can be derived from the group activity behavior that is recorded in the web logs and the group portfolio. The web logs are the records of users' access histories, including login time, login frequency, reading path, page reading count, and so on. Meanwhile, the group portfolio is a history of the group's actions and outcomes during the period in which the group was performing the tasks assigned to it. The portfolio includes the outcomes of tasks assigned to the group, the amount of resources shared among group members, and the role assigned to each member. To enhance group learning performance, teachers need a tool to determine, from the causal network of the learning status indicators, the dominant influences that can improve group performance. The Bayesian Belief Network (BBN) [19] is a directed graph comprising nodes and arcs that represents the causal relationships [35] between learning status indicator attributes. This section demonstrates the BBN analysis tool that helps teachers determine learning status based on group activity status.

4.1 Group learning status represented in a Bayesian Belief Network

A BBN is a directed graph comprising several nodes and arcs. The nodes represent learning status indicators, while each arc represents a causal dependency between two learning statuses [13]. For example, the status working on group project will influence the status group final grade. BBNs are widely used in many fields for representing causal relationships among variables. Each node in a BBN also stores the probability distribution of its possible values for every combination of values of its parent nodes, that is, the source nodes of its incoming arcs. In an educational environment, the nodes represent the indicators of the group's learning status, while the arcs represent the causal relationships among two or more learning status indicators/outcomes. Our novel methodology uses the BKD tool to create a BBN from the learning status indicators and outcomes that are derived from the discussion board and the group portfolio.
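To make the node-and-arc description concrete, here is a minimal Python sketch of a two-node network with a single arc from Working on Project to Grade. The probability values, the three-grade scale, and the names `p_work`, `p_grade_given_work`, `grade_marginal`, and `work_given_grade` are assumptions chosen for illustration; they are not results from BKD or from the study.

```python
from typing import Dict

# Root node: P(Working on Project). Values are hypothetical.
p_work: Dict[str, float] = {"HIGH": 0.3, "MID": 0.5, "LOW": 0.2}

# Child node: P(Grade | Working on Project), one distribution per parent value.
p_grade_given_work: Dict[str, Dict[str, float]] = {
    "HIGH": {"A": 0.50, "B": 0.40, "C": 0.10},
    "MID": {"A": 0.20, "B": 0.60, "C": 0.20},
    "LOW": {"A": 0.05, "B": 0.35, "C": 0.60},
}


def grade_marginal() -> Dict[str, float]:
    """P(Grade), obtained by summing over the parent's values."""
    grades = next(iter(p_grade_given_work.values())).keys()
    return {g: sum(p_work[w] * p_grade_given_work[w][g] for w in p_work) for g in grades}


def work_given_grade(grade: str) -> Dict[str, float]:
    """P(Working on Project | Grade = grade), by Bayes' theorem."""
    joint = {w: p_work[w] * p_grade_given_work[w][grade] for w in p_work}
    total = sum(joint.values())
    return {w: p / total for w, p in joint.items()}


marginal = grade_marginal()
diagnosis = work_given_grade("C")
```

Reasoning along the arc (`grade_marginal`) corresponds to prediction, while reasoning against it (`work_given_grade`) corresponds to inferring an unobserved indicator from an observed outcome.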
The BBN assists teachers in deriving the causal relationships between the learning status indicators. Teachers can then use the BBN to enhance learning performance and prevent group learning failure. Table 2 lists some of the learning status indicators that teachers use to monitor and analyze relationships among the groups. The learning status indicators are defined according to positive social interdependence, as proposed in [20]. Here, teachers try to determine the relationships between positive social interdependence and group learning achievement. The categories defined by Johnson's Interdependence Typology [20] are used herein, namely outcome interdependence (goal and reward interdependence) and means interdependence (task, resource and role interdependence). This study defines a list of group learning status indicators that can be extracted from the group discussion board and group portfolio. The group portfolio includes answers to online questionnaires.

The group learning status indicators include Compromise Group Goal, Online Conference, Conflict, Online Message and Competition. Compromise Group Goal, Online Conference and Conflict belong to goal interdependence. Compromise Group Goal is the degree to which the group compromises on its group goals. Conflict is the frequency of conflicts during discussions aimed at reaching a compromise on the group goal. Online Conference is the frequency with which group members use real-time discussion boards for group discussions. Online Message and Competition belong to reward interdependence. Online Message is the frequency with which group members help other teammates to increase the rewards of the group. Competition is the frequency of competition with other groups. Working on Project belongs to task interdependence, and is represented by the time group members spend working on group tasks. Reading Resource, Uploading Resource and Updating Resource belong to resource interdependence, and are represented by the frequency with which members read, share, and update material in the group resource space. Leader Success belongs to role interdependence and is represented by the group leader's capability in fulfilling her/his duties. Grade represents the grades of the group.

Learning Status: Related Positive Interdependence
Compromise Group Goal, Conflict, Online Conference: Goal Interdependence
Online Message, Competition: Reward Interdependence
Working on Project: Task Interdependence
Reading Resource, Uploading Resource, Updating Resource: Resource Interdependence
Leader Success: Role Interdependence

Table 2. The partial learning statuses related to positive interdependences

The Conflict node represents the frequency of conflict in each group discussion board. The Working on Project, Reading Resource, Uploading Resource, Updating Resource, Online Message and Competition nodes represent the frequency with which these actions occur in the web logs and group portfolio. The articles in the discussion board are first categorized using the text categorization tool, and information of interest is then extracted from the discussion board on the basis of the classification. For example, the frequency of conflict is measured as the number of articles categorized as dealing with conflict that occur within a defined period. The Leader Success and Compromise Group Goal nodes are derived from the relevant answers obtained from questionnaires. Grade is based on the examination results. All of the learning status indicators are input into BKD to construct a causal network.

4.2 Observing group learning status and discovering the key influence on learning performance by BKD

BKD is a knowledge discovery tool that can extract usable knowledge from databases without requiring users to have a background in statistics and programming. BKD uses a BBN to graphically represent the causal network of learning status indicators and outcomes. Once the BBN is constructed from the database, the network can be used in a BBN reasoning system to provide information for teachers, including observations, predictions and support for decision-making. To construct a BBN, BKD needs a text file exported from a database; this input data can be numeric or discrete [21]. Table 3 shows an extract of the data for the 70 groups in each class. All the values are derived from the discussion contents, group portfolio, web logs, and questionnaires. Various quantitative attributes/learning status indicators that are extracted from the web logs are converted into three levels (HIGH-MID-LOW, and so on) so that they can be used to predict other attributes, learning status indicators, and learning outcomes.
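The counting and three-level conversion described above might look like the following Python sketch. The paper does not state how the cut points between levels were chosen, so equal-frequency (tertile) cut points are assumed here; the group identifiers, article labels, and function names `count_by_group` and `to_levels` are likewise hypothetical.

```python
from collections import Counter
from typing import Dict, List, Tuple


def count_by_group(articles: List[Tuple[str, str]], category: str) -> Counter:
    """Count, per group, the articles assigned to the given category.

    Each article is a (group_id, assigned_category) pair, e.g. the output of
    a topic categorization step restricted to a defined period.
    """
    return Counter(group for group, cat in articles if cat == category)


def to_levels(values: Dict[str, int]) -> Dict[str, str]:
    """Map raw frequencies to LOW / MID / HIGH using tertile cut points (an assumption)."""
    if not values:
        return {}
    ordered = sorted(values.values())
    n = len(ordered)
    low_cut, high_cut = ordered[n // 3], ordered[(2 * n) // 3]
    levels = {}
    for group, value in values.items():
        if value < low_cut:
            levels[group] = "LOW"
        elif value < high_cut:
            levels[group] = "MID"
        else:
            levels[group] = "HIGH"
    return levels


# Hypothetical categorized articles for three groups.
articles = [
    ("G1", "Conflict"),
    ("G1", "Gossips discussion"),
    ("G2", "Conflict"),
    ("G2", "Conflict"),
    ("G3", "Question for Chapter 4"),
]
conflict_counts = count_by_group(articles, "Conflict")

# Hypothetical conflict frequencies for six groups, converted to levels.
levels = to_levels({"G1": 0, "G2": 1, "G3": 2, "G4": 4, "G5": 5, "G6": 9})
```

Other binning schemes, such as fixed thresholds set by the teacher, would fit the description equally well.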
Columns: Group Id, Online Discussion, Working on Task, Competition, Conflict, Online Message, Reading Resource, Uploading Resource, Updating Resource, Compromise Group Goal, Leader Success, Grade
Legible entries (Compromise Group Goal, Leader Success, Grade), one row per group:
MID EXCELLENT B
LOW EXCELLENT B
LOW FAIL C
MID EXCELLENT B
MID FAIR B
HIGH FAIR B
LOW FAIR B
MID EXCELLENT B
MID FAIL B
MID FAIR C
LOW FAIR D
Table 3: The data file extracted from the learning statuses

BKD derives the causal relationships between learning status indicators, but becomes inefficient as the number of learning status indicators grows. The major reason for this phenomenon is that BKD needs to find the relationships among learning status indicators by trying all possible pairings. In this research, analyzing the causal relationships among all 52 learning status indicators took over an hour. For the sake of efficiency and readability, the methodology uses the factor analysis of SPSS [22] to reduce the number of variables. Following Kaiser's [23] varimax factor analysis, the 52 attributes can be reduced to 11 rotated components (λ > 1, cumulative variance = 77.737%), simplifying the causal relationship construction for BKD. The teachers can use the factor analysis tool of SPSS to cluster all learning status indicators into components. To estimate the group final grade, the teachers can select the set of learning status indicators that fall in the same component as the group final grade. They can then use BKD to build the causal relationships between the selected indicators and the group final grade. After the table is imported into BKD, a BBN of the causal relationships and the associated probability distributions is constructed.

Figure 3 illustrates one of the BKD results for the Introduction of Computer Network and Application course. In this figure, the nodes represent the learning status indicators, while the arcs represent the causal relationships among the nodes. The figure supports teachers in observing the learning status of the group and making instructional decisions to assist and enhance group learning performance. Teachers observed two kinds of BKD results:

1) Enhance a set of learning status indicators by determining the learning status indicator that affects most of them. For example, the BBN in Figure 3 illustrates that the Working on Project learning status may influence three learning statuses: Reading Resource, Uploading Resource, and Grade. Accordingly, if group members frequently work on the group project, they tend to read the group resources and to share their resources with the other members.
Meanwhile, the group grade is also affected by this frequency with high probability. Thus, if the teachers wish to increase the frequency of resource sharing and raise the group grade simultaneously, they should encourage group members to engage frequently in group tasks.

2) Enhance a learning status by determining the learning statuses that influence it. For example, the BBN in Fig. 3 illustrates that the frequency of Online Discussion is influenced by the frequency of Uploading Resource, while the frequency of Uploading Resource is in turn affected by the frequency of Working on Project. Thus, teachers can probably promote discussion within a group (Online Discussion) by encouraging members of the group to work on the group project (Working on Project).

Figure 3: The example of Bayesian Belief Network

In the BBN derived by BKD, presented in Fig. 3, each node contains a probability distribution table that gives the probability of each possible value of the node for every combination of values of the nodes it depends on. Meanwhile, Table 4 lists the conditional distribution of the Grade node, which depends on the Working on Project node. The frequency of Working on Project is divided into levels by BKD, the first of which is 11-75, and the table thus contains four rows, one corresponding to each level. The group grade can take four possible values, A, B, C, and D, ranging from excellent to fail. Grade A represents group grades that exceed the average by more than one standard deviation, grade B represents group grades that exceed the average by less than one standard deviation, grade C represents group grades that are below the average by less than one standard deviation, and grade D represents group grades that are below the average by more than one standard deviation. The first column of Table 4 holds the values of the Working on Project attribute. The other columns show the probability of each group final grade (the value indicated in the first row) for the corresponding Working on Project level (the value indicated in the first column). For example, for one of the Working on Project levels, the group has a 93.1% probability of receiving a B grade. Meanwhile, if the frequency of Working on Project is 11-75, the group has a 69.9% probability of receiving a D grade.

Table 4: The probability table of Grade learning status

The BKD system can assist teachers in deriving the causal relationships among learning status indicators and outcome. Teachers can thus help a group to improve its learning performance by enhancing the statuses that influence group performance. For example, encouraging group members to work on the group project will improve the group performance. After several semesters, the results of applying BKD could be used to predict learning status for different teachers, as described in the following section.

5. Predicting group performance from initial group discussions and portfolio

The causal network and probability distribution constructed by BKD capture important teaching experience for teachers. Moreover, these relationships are presented in a graphical form that is easy for different teachers to understand and exchange. However, this teaching experience alone does not support predictions of group learning performance. To help teachers predict learning performance, tools that help teachers accumulate past teaching experience are necessary, along with tools for predicting group learning performance early in a class. Predicting learning performance is intended to locate groups destined to fail. For example, if the BKD causal net of past experience infers that a group is going to fail, the teacher should intervene in the group to enhance the associated learning statuses. A Bayesian classifier, RoC, is adopted herein to predict the eventual group performance by using data taken early on in a class. The prediction involves two phases: training and prediction.
First, RoC is used to produce a prediction scheme from the learning statuses and the target value (the group final grade) of a previous class. RoC then uses this prediction scheme to predict the final grades of groups in the current class based on the available learning statuses, before the final grades of the groups are known. Four steps are involved in applying RoC to predict learning performance:

1) Define the Bayesian classifier (prediction scheme) from a database: The input of RoC is the set of attributes that are indicators of learning statuses derived from discussion, web log, and group portfolio. Table 3 illustrates an input file to RoC. The file format is the same as in the BKD system, with the first row showing the attribute names, while the other rows list the values of the corresponding attributes. The values can be numerical or categorical.
2) Class selection: Select one of the learning status indicators as the class, that is, the target attribute that the teachers want to predict. Before predicting evaluation data, the RoC system generates a classifier from the learning data for the selected class.
3) Learning the BBN: RoC automatically learns a prediction model from the learning data, and then determines the dependence and probability distribution among the attributes (learning status indicators). The output of this step is called the classifier, and it supports teachers' predictions of learning performance for the evaluation data in the following step.
4) Predicting learning performance: Once the classifier has been trained, current learning groups can be classified into a class. The classifier generated by RoC classifies current groups and predicts their grades.

The current experiment involves 706 students and 70 groups. Each group collaboratively learns the same curriculum on the web. The teachers collected 52 learning status indicators, defined on the basis of Johnson's positive social interdependence [20]. Herein, teachers collect these learning statuses from just one course each semester.
Thus, the leave-one-out cross-validation method [32] is employed to demonstrate the prediction capability of the RoC system. The following steps illustrate the leave-one-out method of RoC prediction:

Step 1. Teachers collect all the learning statuses as input files to RoC. Herein, Table 5 lists the input data for constructing the classifier. All attributes are clustered into components by the factor analysis tool of SPSS. In our experiment, we select only the attributes that fall in the same component as the group final grade.
Step 2. Select one of the learning status indicators of the input data as the class. For example, in Table 5, the teachers attempt to determine which groups will have lower grades at the end of the semester, and thus Grade was selected as the class.
Step 3. Select some of the groups as the evaluation groups, and define the others as training groups. For example, in Table 5, the evaluation groups are groups 1 to 10, and their grade values are marked as ? for RoC to predict. The evaluation groups are used to evaluate the accuracy of the classifier.
Step 4. Discover the rules of the training data and predict the class value of the evaluation groups. Table 6 lists the predicted results for groups 1 to 10.
Step 5. Repeat steps 1 to 4, using a different set of ten groups as the evaluation data each time, until the class value of all of the groups has been predicted. Table 7 lists the predicted results following 7 predictions.

Columns: Group Id, Online Discussion, Working on Task, Competition, Reading Resource, Uploading Resource, Updating Resource, Grade
Legible entries (Grade column): ? ? ? ? ? ? ? ? ? ? D
Table 5: The input file for prediction of RoC
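Steps 1 to 5 can be sketched in code as follows. RoC itself is not reproduced here: the sketch substitutes a plain naive Bayes classifier over categorical levels with Laplace smoothing, which is an assumption, and the function names (`train`, `predict`, `blockwise_accuracy`) and the toy data are hypothetical. Holding out ten groups at a time over 70 groups yields the seven rounds summarized in Table 7.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Tuple

Row = Dict[str, str]  # attribute name -> categorical level, e.g. "HIGH"


def train(rows: List[Row], target: str):
    """Count class frequencies and per-class attribute-value frequencies."""
    class_counts = Counter(r[target] for r in rows)
    value_counts = defaultdict(Counter)  # (attribute, class) -> value counts
    values_seen = defaultdict(set)       # attribute -> distinct values
    for r in rows:
        for attr, val in r.items():
            if attr == target:
                continue
            value_counts[(attr, r[target])][val] += 1
            values_seen[attr].add(val)
    return class_counts, value_counts, values_seen


def predict(model, row: Row, target: str) -> Tuple[str, Dict[str, float]]:
    """Return the most probable class and the normalized class probabilities."""
    class_counts, value_counts, values_seen = model
    total = sum(class_counts.values())
    scores = {}
    for cls, n_cls in class_counts.items():
        p = n_cls / total  # class prior
        for attr, val in row.items():
            if attr == target or attr not in values_seen:
                continue
            k = len(values_seen[attr] | {val})
            # Laplace smoothing keeps unseen values from zeroing the product.
            p *= (value_counts[(attr, cls)][val] + 1) / (n_cls + k)
        scores[cls] = p
    norm = sum(scores.values())
    probs = {c: s / norm for c, s in scores.items()}
    return max(probs, key=probs.get), probs


def blockwise_accuracy(rows: List[Row], target: str, block: int = 10) -> List[float]:
    """Hold out consecutive blocks of groups, train on the rest, score each block."""
    accuracies = []
    for start in range(0, len(rows), block):
        held_out = rows[start:start + block]
        training = rows[:start] + rows[start + block:]
        model = train(training, target)
        hits = sum(predict(model, r, target)[0] == r[target] for r in held_out)
        accuracies.append(hits / len(held_out))
    return accuracies


# Hypothetical toy data; the study used 70 groups and more indicators.
groups = (
    [{"Working on Task": "HIGH", "Reading Resource": "HIGH", "Grade": "B"}] * 4
    + [{"Working on Task": "LOW", "Reading Resource": "LOW", "Grade": "D"}] * 4
)
print(blockwise_accuracy(groups, "Grade", block=2))
```

Because ten groups are held out per round rather than a single group, the procedure behaves like a 7-fold cross-validation; averaging the per-block accuracies gives the kind of summary figure reported for RoC.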

Columns: Group id, Predicted Grade, Grade, Probability of Grade A, Probability of Grade B, Probability of Grade C, Probability of Grade D
Legible entries: 1 B B B B D C C B B B B B C B B B B B C C
Correct: 8 Incorrect: 2 Accuracy: 80 % Coverage %
Table 6: The output file of predicted result

In Table 6, the coverage shows that all cases of the evaluation data are predictable in RoC. The column Grade represents the original grade values of the evaluation data, while the column Predicted Grade is predicted by RoC using the input data from Table 5. The system predicted that group 1 would obtain a grade of B at the end of the semester, and this prediction proved correct. However, the predictions for groups 3 and 7 were incorrect. The Probability columns give the probability of a particular group obtaining each grade value. In this experiment, the predictions were 80% accurate (8 correct, 2 incorrect). Meanwhile, Table 7 shows that over the 7 prediction runs, one for each set of testing data, the average accuracy was 74.28%, which is sufficient for teachers to use in predicting the group learning status.

Columns: Testing data, Accuracy
Rows: Group 1 to group 10, Group 11 to group 20, Group 21 to group 30, Group 31 to group 40, Group 41 to group 50, Group 51 to group 60, Group 61 to group 70, Average Accuracy
Table 7. The 7 times of prediction for Grade value and the accuracy

RoC can be used to predict not only Grade but also all other group learning statuses.

6. Experience and Discussion

Our experiment was performed on the Introduction of Computer Network and Applications course, and took place in a web learning environment. The curriculum included some fundamental concepts of computer networks, and the programming languages and applications for constructing WWW pages, such as TCP/IP, network security, HTML, JavaScript, FrontPage, and so on. Meanwhile, data mining tools (the Text Analysis Tool, BKD, and RoC) were employed to extract information that helps teachers track group discussions, observe group learning status, and illustrate causal relationships between learning statuses. This information helps teachers to prevent groups from failing. The following table shows the issues investigated herein, along with the associated methods or tools adopted for solving them.

Task: Capture the learning status by extracting a topic and abstract from group discussions
  Input data: Discussion contents
  Tools: Text Analysis Tool
  Output data: Group status, discussion status

Task: Discover important influences on learning performance by analyzing causal dependence of the learning status
  Input data: Discussion status, web logs, questionnaire answers
  Tools: The Bayesian Knowledge Discoverer (BKD)
  Output data: BBN of learning status indicators

Task: Predict learning performance to prevent groups from suffering low performance
  Input data: Learning statuses and learning performance of previous classes, learning statuses of current class
  Tools: The Robust Bayesian Classifier (RoC)
  Output data: Final prediction of group learning performance

Table 8: The input/output data and tools of three primary tasks when teachers extract learning information and guide group learning

6.1 The participants and the grouping on web

The participants included 7 teachers, 5 teaching assistants and 706 students at National Central University, Taiwan. Of the students, 459 (65.0%) were male and 247 (35.0%) were female. After spending a month familiarizing themselves with the learning environment, the students were systematically grouped into several heterogeneous groups [25]. The grouping criteria included personal profile and thinking style, with the personal profiles including gender, age, area of residence, education and so on. The teachers selected the student's thinking style as the major concern for grouping.
Herein, teachers categorized students according to the 13 thinking styles proposed by Dr. Sternberg [26], namely: legislative, executive, external, judicial, monarchic, hierarchic, oligarchic, anarchic, global, local, internal, liberal, and conservative. Each group is formed according to the specification given by the teacher. There are 70 groups, each containing 10 to 11 students. The students read the curriculum on video compact discs (VCD). After reading the curriculum, the students must register with the NCUVC web collaborative learning system [28] [29], which supports a peer interaction space [29][30], a collaborative project space, a resource sharing space, and so on. The first group task is to elect a group leader, co-leader and reporter. The group private working space includes online and offline discussion rooms, a resource sharing space, a portfolio space [31], a project scheduler, and a window for querying member working status.

6.2 The results of monitoring group discussions

The teachers defined 25 interesting discussion categories, and selected 25% of the articles as the training data. The teachers immediately assigned one of three grades, Good, Acceptable, or Incorrect, to each analyzed result. The analytical results reveal that 96.6% of the derived topics and 73.3% of the abstracts received a grade of Good or Acceptable (the sums of the Good and Acceptable rows in Table 9). Furthermore, the longer an article is, the better the grade it receives. Table 9 illustrates the derived results and teacher feedback:

Feedback, Topics, Abstract
Good, 73.0 %, 55.0 %
Acceptable, 23.6 %, 18.3 %
Mistake 7.3 % %
Table 9: The results of deriving attributes and abstract of an article

Some of the reasons for incorrect categorization and summarization are listed as follows:
1) Adages: Some adages were not considered by the teachers and are excluded from the keywords database. These adages could not be derived from articles but are important in the discussion articles.
2) Mistyped keywords: If keywords in articles were mistyped, the system was unable to locate and properly classify them.
3) Program source code: The program structure could not be derived from keyword retrieval.
4) Short discussion articles: Short discussions contained few representative keywords and sentences.

6.3 Predicted results for the flunked groups

The classifier produced by the RoC system can assist teachers in predicting the final performance of groups. In our experiment, the accuracy achieved in predicting performance was 74.28%.
In practice, the teachers are more interested in finding the groups that may have poor learning performance or fail, so that they can help these groups to improve their learning. Table 10 shows the results of predicting the flunked groups.

Columns: Group Id, Predicted Result, Probability for Predicted Result
Legible entries (Predicted Result, first Group Id 11): D D D D D D B D D D D D D
Accuracy for flunk %
Table 10. The predicted results and accuracy for flunked groups

Table 10 shows that 12 of the 13 failed groups were predicted correctly. The learning statuses and final grades of the flunked groups are shown in Table 11.

Columns: Group Id, Online Discussion, Working on Task, Competition, Reading Resource, Uploading Resource, Updating Resource, Grade
Legible entries (Grade column): D D D D D D D D D D D D D
Table 11. The extract of learning status of flunked groups

Thus, teachers could focus on the groups in Table 11 to determine why these groups failed.

7. Conclusion

To assist teachers in monitoring and guiding group learning, this work presented a methodology for deriving knowledge for observing group status, discovering the key influences on group performance, and deriving rules to predict learning performance. The Bayesian method is an efficient way to achieve these goals. Without the proposed mechanisms, teachers must spend considerable time trying to infer group status from vast, unorganized web logs, and the causal relationships of group learning situations are hard to track. In traditional approaches, the group learning situation has to be determined based on the teacher's individual experience, which is imprecise and cannot be reused by other teachers. This investigation (1)


DATA MINING TECHNOLOGY. Keywords: data mining, data warehouse, knowledge discovery, OLAP, OLAM. DATA MINING TECHNOLOGY Georgiana Marin 1 Abstract In terms of data processing, classical statistical models are restrictive; it requires hypotheses, the knowledge and experience of specialists, equations,

More information

Data Mining Application in Higher Learning Institutions

Data Mining Application in Higher Learning Institutions Informatics in Education, 2008, Vol. 7, No. 1, 31 54 31 2008 Institute of Mathematics and Informatics, Vilnius Data Mining Application in Higher Learning Institutions Naeimeh DELAVARI, Somnuk PHON-AMNUAISUK

More information
