Use of group discussion and learning portfolio to build knowledge for managing web group learning

Transcription

1 Use of group discussion and learning portfolio to build knowledge for managing web group learning Gwo-Dong Chen,Kuo-Liang Ou, Chin-Yeh Wang Department of Computer Science and Information Engineering National Central University Chung-Li TAIWAN Abstract To monitor and enhance the learning performance of learning groups in a web learning system, teachers need to know the learning status of the group and determine the key influences affecting group learning outcomes. Teachers can achieve this goal by observing the group discussions and learning behavior from web logs and analyzing the web log data to obtain the relevant information. However, web logs are not systematically organized and the discussions are extensive. Consequently, teachers must struggle to extract information from logs and intuitively apply teaching rules based on experience when managing the groups. Rather than using statistics packages to evaluate hypotheses, this work presents a methodology of applying existing data and text mining tools to automatically gather learning status and predict performance of learning groups from the contents of discussions, and from log records of learning behaviors. Meanwhile, the methodology infers a causal network exists between learning features and learning performance. Knowledge is inferred based on statistics and probability reasoning and social interdependency theory. The causal network can suggest means of enhancing learning performance to teachers. Simultaneously, teachers can use the knowledge of learning groups obtained to manage group learning process on the web. Experimental results of applying the novel methodology to manage a group learning class organized over the web and containing 706 students are also presented. 1) Introduction In existing web learning systems, students may need to learn independently, without the opportunity to interact with other learning companions. Simultaneously, numerous researches have indicated that students learn better in a group [1] [2] [3]. Student learning motivation and performance may be enhanced by peer support and competitive pressure from the group [4]. Thus, the group learning mechanism can be

2 adopted into web learning systems to enhance learning performance. However, to manage learning groups in a web-based system, the teachers need to expend significant effort to obtain the learning status for monitoring and guiding the learning groups. To obtain the status of the learning groups, the teachers need to collect the relevant information from the web logs that represent the history of group members actions in carrying out their assigned projects. The web logs include group portfolio and inner group discussion. Because the amount of logs is huge and is not organized to facilitate pedagogical research, teachers will find trying to manage groups by seeking meaningful and useful information from the logs a great burden. The situation is worsened by the involvement of hundreds of students in group learning and by the need to manually analyze thousands of web and discussion logs. To facilitate the obtaining of information from the web log, web masters have developed numerous tools to enhance web performance [5]. However, these tools mostly merely summarize the access log of a website, including information such as access time, access frequencies, and the IP addresses. This statistical information is insufficient to let teachers capture the status of groups in a group learning system. What teachers really need to know is the information that can be inferred from access logs and group portfolios. For instance, teachers are interested in information such as which groups are discussing the topic of group goal [6], which groups have difficulties in learning a specific chapter, which group members are unwilling to help others [7], and so on. One way to derive information about group learning status is to track and analyze group discussions and the behavior of individual group members in carrying out group projects. Teachers can observe the relationships among group members, and can capture the status of the group by analyzing the contents of discussions within the group and the group portfolio. After accumulating observational experience from the discussion contents of classes with the same curriculum and similar learning model, the teacher can induce heuristic rules on how to predict final group performance according to initial group behavior. Thus, the teachers can thus use past experiences to predict learning status conditions in similar groups in the future, and can then use timely intervention to enhance performance. However, in accomplishing the above, teachers face the following problems: First, the simple bulk of material involved makes it hard to discuss the contents of group discussions:

3 The group s discusion board wil reflect the individual learning statuses of the group s members. The contents of the discussions include information about learning performance in reading curriculums, discussion of group tasks, group atmosphere, knowledge sharing, and so on. Teachers can thus capture the learning status of the group by tracking group discussions and helping groups learn. However, the contents of the discussion are always not only large in quantity but are also unorganized, making analysis difficult. Consequently, filtering out uninteresting discussion contents and extracting interesting information are key issues in tracking group discussions. Second, discovering the important impacts of group learning status for teachers to monitor and thus manage group learning is difficult. Teachers can estimate and infer group learning status and many other factors by tracking group discussion and portfolio. However, the discussion board and group portfolio include numerous elements, with examples including resource sharing, effective leadership, and discussion quantities. The teachers should first determine which elements significantly influence learning performance, and then focus on these factors. To locate groups needing help effectively promote performance, teachers should determine the key influences on group learning performance. However, it is difficult to obtain the causal relationship between factors and performance based on learning logs and group discussions. Therefore, how to obtain the causal relationship between learning status and factors is another key issue for teachers wishing to monitor and guide group learning in a web collaborative web learning system. Third, predicting which groups may suffer low performance and then intervening to prevent learning failure is difficult. After finding determining the dominant influences on the success of a learning group, teachers require a map of causal relationships within the group as a base for making decisions on teaching strategies to apply and how to guide group learning. By constructing and refining the map based on several semesters of experience, teachers can use this map of past experience to prevent a group from failing. Simultaneously, the teachers can infer information on unobservable factors from relevant data and the network of relationships within the group. A mechanism is needed to integrate individual experiences so that other

4 teachers or teaching assistants can share these experiences and use them to predict student performance. Consequently, the teachers can intervene in group learning in time to prevent failure. To overcome the above problems, this investigation employed data mining techniques [8] to extract useful information from web logs and group portfolios and allow teachers to manage collaborative learning on the web. The novel methodology employs three tools to extract information to assist teachers in managing the learning groups. 1) Capture learning status by extracting the topics and summaries from the inner group discussion contents: Text Miner, from the IBM DB2 database management system [9], is employed to identify discussion topics and extract summaries of articles. This approach allows teachers to track group discussions simply by reading the discussion topics and summaries, rather than reading through the actual articles on the discussion board. 2. Determining key influences on learning performance by analyzing the causal dependence of learning status indicators: The Bayesian Knowledge Discoverer (BKD) [10], based on conditional probability and Bayesian belief network (BBN) [34], is employed to derive the causal relationships among factors and learning performance. In the derived causal network [35], BKD also extracts the conditional probability distributions of each node based on the data provided. Teachers can then monitor and predict learning status based on the BBN and can manipulate the key influence of learning performance. 3. Predict group learning performance to prevent learning failure: The causal network built based on Bayesian belief network can be used in the Robust Classifier (RoC) [11] to predict group learning performance based on data from discussions within the group and from the makeup of the group. The teachers then can use the tool to predict and locate the groups that are likely to experience poor learning performance. Teachers can then intervene and provide guidance to enhance the learning performance of these groups. 2. Methodological Overview Figure 1 illustrates the three main processes and working flows teachers use to extract learning

5 information and guide group learning. The three processes are (1) capturing learning status by tracking group discussion, (2) discovering the key influence and causal relationships by analyzing the causal relationship between learning status indicators and learning performance, and (3) intervening and preventing group failure according to predicted learning performance. Group Discussion Online Web Logs and Group Works Discussion Tracking and Analyzing Tools (IBM Text Miner) The Learning Status (Training Data) The Learning Status (Testing Data) Learning Status Analyzing and Predicting Tool (BKD & RoC) Causal Network of Learning Status and Learning Performance Predicted Learning Performance Legend Input Data Process Temporal Data Output/Result Figure 1: The process for monitoring and predicting the learning situation To allow group discussion to be monitored, the contents of discussion were inputted into IBM Intelligent Miner for Text, and discussion topics and summaries were derived. The IBM Intelligent Miner for Text [9] is a set of information mining [12] tools for retrieving interesting patterns and gathering information from large quantities of articles. Monitoring the discussion topics and abstracts derived by IBM Intelligent Miner for Text allows teachers to determine learning status without excessive reading. Moreover, fully represent learning status, teachers collected web access logs and group portfolios including answers of questionnaires, and the status of the discussion. These data are treated as training data and inputted into the Bayesian Knowledge Discoverer (BKD) [10] system. BKD is a knowledge discovery software developed by Knowledge Media Institute of the UK s Open University. BKD is an automated modeling tool that can transform a database into a Bayesian network by seeking for the model most likely responsible for the observed data. BKD does not use conditional-independence tests but instead uses Bayesian methods [10]. BKD could establish the casual dependence network of the learning status indicators and represent the network in a graphically network according to the data in a table, with each field/attribute represented having a corresponding learning status indicator or learning performance. The

6 result of applying BKD can help teachers to realize the key influences on learning performance. Meanwhile, each node in the network contains a joint probability distribution for all possible values, thus allowing teachers to obtain information on still unobservable learning status indicators by Bayesian inference from the network and data from other available learning status indicators. Microsoft Belief Network tool MSBN [15] is used herein as the inference tool. Teachers can employ the Robust Classifier (RoC) [11] system to predict group learning performance. The RoC system is a Bayesian based classification software developed by the developer of BKD. RoC can use known learning status indicator values and performance to produce a mechanism for predicting performance. Teachers can thus predict the learning performance of groups based on existing records of learning status indicators. Thus, RoC supports teachers by giving them the information necessary to intervene in group learning and prevent poor group performance. Bayesian net learning is a machine learning method based on the Bayes theorem [33], and was proposed in the 18th century by Rev. Thomas Bayes. The Bayesian learning method calculates explicit probability distribution for hypotheses based on given training data [13] and in some cases can competitive with other learning algorithms, including decision tree and neural network algorithms [14]. In a collaborative learning environment, each grade/category for group learning performance/status is treated as a hypothesis in the Bayesian net learning method. The Bayesian belief net builder, called BKD herein, constructs the causal net and probability distribution of the learning status indicators/performance based on the training data provided, namely records of previous class learning behavior data. With the derived Bayesian net and current student learning behavior data, Bayesian net reasoning modules such as Microsoft MSBN [15], or Bayesian classifier such as RoC, calculate the probability distribution of each hypothesis. The reasoning module then select the maximum probable hypothesis, called the Maximum a Posteriori Hypothesis (H MAP ), or the Maximum Likelihood Hypothesis (H ML ), using evidence of currently available attributes and the derived BBN. For example, if the module estimates the final grade probability distribution of a student under other available learning statuses as (A, 20%), (B, 78%),(C, 2%),(D, 0%), then the module predicts that the final grade of this student will be B. That is, hypothesis final grade = B is H ML or H MAP among all hypotheses final grade = A or B or C or D. The causal network is easy for teachers without a statistics background to understand and apply to the management of learning groups. Furthermore,

7 the following features [13] of Bayesian learning methods make it adaptable for teachers wishing to manage a classroom: 1. The training examples can influence the estimated probability of a hypothesis being correct. This feature provides additional flexibility for teachers wishing to increase or eliminate a hypothesis. 2. Prior knowledge can be combined with observed data to determine the final probability of a hypothesis, allowing teachers to accumulate their teaching experience when managing the group. 3. Bayesian methods can accommodate hypotheses that make probabilistic predictions, allowing teachers to predict the present learning status of the group based on past experience. The experimental testing of the novel method includes 7 teachers, 5 teaching assistants and 706 students. The curriculum materials were distributed to students via video compacted disc (VCD), and the class adopted a group learning strategy with a web interface for performing and recording group projects and inter and intra group discussions. The novel method captured 52 attributes. Each attribute represents a part of a group s learning statuses. Thus, we called them learning status indicators. Among the 52 attributes, 44 attributes s are derived from group discussion and web logs, and other 8 attributes are from questionnaires. These learning statuses indicators are defined using Johnson and Johnson s positive social interdependence, which is essential to effective group learning [20]. The analytical result reveals that the novel tools let teachers monitor and guide the collaborative web-based learning, aided by knowledge and information derived from group discussions and portfolio logs. 3. Monitoring and Analyzing the Group Discussion The contents of the group discussion board of a group include various subjects, such as problems in the curriculum, group goals, group task schedule, resource sharing, and members responsibilities, and so on. This information is important in helping teachers to determine the learning status of groups and helping their learning. However, a single discussion board often includes discussion of many subjects, some of which are trivial and unrelated to learning activities. Thus, to capture the status of groups, teachers need to expend significant effort on locating relevant on sifting the vast and disorganized discussion logs for useful information.. To capture the learning status of the group, the teachers thus need tools to assist them

8 in tracking the group discussion and extracting important information from the discussion board. This section illustrates the tools that teachers use to extract this information from the discussion board and thus monitor the learning status of the group. 3.1 Extracting the discussion topics One sentence of a discussion, the topic sentence, generally contains the most important information in the article. The topic sentence may be represented by the most keywords. Thus, the first step of tracking the discussion is to classify them into topics by analyzing the occurrence, position and frequency of keywords. The IBM Intelligent Miner for Text supports several tools that help users extract information from text-mode data. Herein, the Text Analysis tools [9] of the IBM Intelligent Miner for Text are employed to help teachers analyze inner group discussion articles and thus determine group learning status. Five types of Text Analysis tools exist, outlined briefly below: 1) Language Identification: Automatically identifies the language in which a document is written. 2) Topic Categorization: Automatically assigns documents to predefined categories, topics, or themes. Teachers can then capture the learning status of the group by reading articles in specific categories. For example, an increasing number of articles in the group conflict [16] category indicates a high possibility of group conflict existing. 3) Feature Extraction: Automatically recognizes significant vocabulary items in a text, and classifies the documents into appropriate categories without the need of using a predefined vocabulary. Teachers could extract out the interesting information from discussion topics. 4) Clustering: Automatically groups collections of similar documents, allowing teachers to seek groups that may reflect similar learning status and then help these groups together. 5) Summarizer: Analyzes the sentences in a document as a basis for producing a summary. Teachers can then capture the learning status of the group by reading summaries of discussion instead of reading the entire articles. This investigation uses the topic categorization tool to extract the discussion topics of groups, and then uses the summarizer tool summarizes group discussions. For example, the following sample sentence is extracted from a group member Michael Chen s discusions:

9 Dear teammates: I am sorry to be late for the on-line conference of our group this morning. I have a question and need a favor from you. In the chapter 4, page 45, teachers have illustrated last week. Can anybody kindly tell me the purpose of a member function in an object in the object oriented programming language? Michael Chen To categorize discussions by topic, teachers must first define a set of interesting discussion categories. Example categories include inquiries about system usage, gossip among members, questions about the context of a specific chapter, and so on. The categories listed in the Topic Categorization tool represent the topics of interest to the teachers. For each discussion article, the Topic Categorization tool assigns a score for each of the categories, and will usually classify the article under the category for which the highest score is obtained. This approach helps teachers to classify the discussion into predefined discussion categories. To define the categories, teachers must first train the Topic Categorization tool with several sample discussion articles, after which the Topic Categorization tool will generate a category schema. The tool can then be used to categorize the discussion articles Because IBM Intelligent Miner for Text does not support Chinese, a similar text classification tool using TFIDF [17] is employed herein [18]. All the examples in this paper are translated from Chinese to English for illustration. Teachers could recreate the experimental results reported herein under the support of tools such as IBM Intelligent Miner for Text. Michael Chen s discusions can be taken as an example. The teachers first used several discusion in the category Question for Chapter 4 as training samples using the keywords: what, source code, member function, object, and object oriented programming language. Most of the keywords are curriculum related, and are aimed to let the teachers know which curriculum areas the groups are focused on. The topic categorization tool then classifies all the discussion articles according to keyword type, frequency, and position. The example discusion is asigned the highest score to the category Question for Chapter 4. Table 1 lists the categorization and output file obtained from applying Text Analysis Tool to the above example.

10 Category List Ranking Score Question for Chapter Question for Chapter Question for Chapter Question for Chapter Inquiry for system and environment Gossips discussion Table 1. The category result and output file of Michael Chen s discusion Table 1 lists the scores in defined categories generated by Text Analysis Tool for the example discusion. In table 1, the example discusion was categorized into Question for Chapter 4. Text Analysis Tool helps teachers to partially capture the learning status of a group, for example finding whether a group has problems in understanding chapter 4. The following figure illustrates the steps involved in training the discussion topic categories and categorizing the discussions in the IBM Intelligent Miner for Text: Categorizing Tool of Text Analysis Training Discussion in Category 1 Training Discussion in Category 2 Vocabulary Category 1 Vocabulary Category 2 Full Category Schema All the Discussion Documents Categorization Filter Categorized Documents Figure 2. Training and Filter in the Categorizing Tool 3.2 Summarizing the discussion of a group After classifying the discussions, the teachers may need to read the discussions and their contents to monitor the learning statuses of a group. However, the discussions can be very long and their contents irrelevant. Consequently, teachers summarize each discussion by locating the important sentences in each article, and then combining these sentences to summarize the article. The teachers employ Text Analysis tools such as IBM Intelligent Miner for Text to summarize the discussion and capture the learning status of

11 a group. For example, the Text Analysis tools extract the folowing sentence as the summary of Michael Chen s discussion: What is the purpose of a member function in an object in object oriented programming language? Other sentences in the example discussion are filtered out, allowing teachers to more easily capture the learning status of the group. That is a member of the group has difficulty in understanding the term member function in an object oriented programming language. The categorization and summaries extracted from the discussion board assist teachers in capturing the current learning status of the group. These learning statuses help teachers to intervene in group learning and enhance learning performance. Furthermore, these learning statuses are important information that helps teachers to discover the causal network of influence on learning performance and predict the influence on learning performance, which will be described in the following two sections. 4. Discover the Causal Dependence net of Group social interdependency factors and learning status indicators The aims of discovering the causal dependency net include (1) to find the key influences on final group performance, (2) to find a way for teachers to locate the group most likely to suffer low learning performance, (3) to suggest ways for teachers to enhance the performance of a learning group by manipulating important learning status indicators, (4) to provide a way to determine unobservable factors influence the learning performance based on available observed factors. Thus, the teachers can know how to apply appropriate strategies to enhance learning performance of a group by monitoring and manipulating the key influence factors, learning status indicators, such as the status of working on group project. Simultaneously, teachers can learn from the joint probability among these factors and learning performance, guiding them in applying strategies to promote the factors and thus enhance group learning. The previous section describes how learning status can be obtained from discussion contents. Additional learning statuses can be derived from group activity behavior that is recorded in the web logs and the group portfolio. The web logs are the records of users accesing histories, including login time,

12 login frequency, reading path, page reading count, and so on. Meanwhile, the group portfolio is a history of the group s actions and outcomes during the period in which the group was performing the tasks asigned to it. The portfolio includes the outcomes of tasks assigned to the group, amount of resources shared among group members, and the role assigned for each member. To enhance group learning performance, teachers need a tool to determine the major dominant influences that can improve group performance from the causal network of the learning status indicators. The Bayesian Belief Network (BBN) [19] is a directed graph comprising nodes and arcs that represents the causal relationships [35] between learning status indicator attributes. This section demonstrates the BBN analyzing tool that helps teachers determine learning status based on the group activity status. 4.1 Group learning status represented in a Bayesian Belief Network BBN is a directed graph comprising several nodes and arcs. The nodes represent learning status indicators, while the arcs represent that there is a causal dependency between two learning statuses [13]. For example, the status working on group project will influence the status group final grade. BBN is widespread in many fields for representing causal relationships among variables. Each node in BBN also stores the probability distribution of possible values of the node over all the possible value combinations of the variables in the source nodes of the incoming edges of the node. In an educational environment, the nodes represent the indicators of the group s learning status, while the arcs represent the causal relationships among two or more learning status indicators/outcomes. Our novel methodology uses the BKD tool to create BBN from learning status indicators and outcomes that are derived from the discussion board and the group portfolio. BBN assists teachers in deriving the causal relationships between the learning status indicators. Teachers can then use BBN to enhance learning performance and prevent group learning failure. Table 2 lists some of the learning status indicators that teachers use to monitor and analyze relationships among the groups. The learning status indicators are defined according positive social interdependence, as proposed in [20]. In the above example, teachers try to determine the relationships among positive social interdependence and the group learning achievement. The categories defined by Johnson s Interdependence Typology [20] are used herein, namely: outcome interdependence (goal and

13 reward interdependence), and mean interdependence (task, resource and role interdependence). This study defines a list of group learning status indicators that can be extracted from the group discussion board and group portfolio. The group portfolio includes answers to online questionnaires. The group learning status indicators include Compromise Group Goal, Online Conference, Conflict, Online Message and Competition. Compromise Group Goal, Online Conference and Conflict are of the goal interdependence. Compromise Group Goal is the degree to which the group compromises on its group goals. Conflict is the frequency of conflicts during discussions on reaching a compromise reaching the group goal. Finally, Online Conference is the frequency with which group members use real time discussion boards for group discussions. Online Message and Competition are of the reward interdependence. Online Message is the frequency with which group members help other teammates to increase the rewards of the group. Finally, Competition is the frequency of competition with other groups. Working on Project is the task interdependence, and is represented by the time group members spend working on group tasks. Reading Resource, Uploading Resource and Updating Resource are of source interdependence, and are represented by the reading, sharing, and updating frequency of members on the group resource space. Leader Success is of the role interdependence and is represented by the group leader s capability in fulfilling her/his duties. Grade represents the grades of the group. Learning Status Related Positive Interdependence Compromise Group Goal, Conflict, Online Conference Goal Interdependence Online Message, Competition Reward Interdependence Working on Project Task Interdependence Reading Resource, Uploading Resource, Updating Resource Resource Interdependence Leader Success Role Interdependence Table 2. The partial learning statuses related to positive interdependences The Conflict node represents the frequency of conflict in each group discussion board. Working on Project, Reading Resource, Uploading Resource, Updating Resource, Online Message and Competition nodes represent the frequency with which these actions occur in the web logs and group portfolio. The articles in the discussion board are first categorized using the text categorization tool, and information of interest is then extracted from the discussion board on the basis of the classification. For

14 example, the frequency of conflict is measured as the number of articles categorized as dealing with conflict that occurring within a defined period. The Leader Success and Compromise Group Goal nodes are derived based on the relevant answers obtained from questionnaires. Grade was based on the examination results. All of the learning statuses indicators are inputted into BKD to construct a causal network. 4.2 Observing group learning status and discovering the key influence on learning performance by BKD BKD is a knowledge discovery tool that can extract usable knowledge from databases without requiring users to have a background in statistics and programming. BKD uses BBN to graphically represent the causal network of learning status indicators and outcome. Once the BBN is constructed from the database, the network can be used in a BBN reasoning system to provide information for teachers, including observations, predictions and support for decision-making. BKD needs a text file to be exported from a database to construct BBN, and this input data can be numeric or discrete data [21]. Table 3 lists 70 groups in each class. All the values are derived from the discussion contents, group portfolio, web logs, and questionnaires. Various quantity attributes/learning status indicators that are extracted from the web logs are converted into three levels (HIGH-MID-LOW, and so on) so that they can be used to predict other attributes, learning status indicators, and learning outcome. Group Online Working Competition Conflict Online Reading Uploading Updating Compromise Leader Grade Id Discussion on Task Message Resource Resource Resource Group Goal Success MID EXCELLENT B LOW EXCELLENT B LOW FAIL C MID EXCELLENT B MID FAIR B HIGH FAIR B LOW FAIR B MID EXCELLENT B MID FAIL B MID FAIR C LOW FAIR D Table 3: The data file extract from the learning statues BKD derives the causal relationships between learning status indicators, but becomes inefficient if the learning status indicators continue to increase. The major reason for this phenomenon is that BKD

15 needs to find relationship among learning status indicators trying all possible pairings. This research spent over an hour analyzing the causal relationships between all 52 of these learning status indicators. For the sake of efficiency and readability, the novel methodology uses the factor analysis of SPSS [22] to reduce the number of variables. Following the Kaiser [23] varimax factor analysis, the 52 attributes can be reduced to 11 rotated components (λ 1,cumulative variance=77.737%), simplifying the causal relationship construction for BKD. The teachers can use the factor analysis tool of SPSS to cluster all learning status indicators into components. In order to estimate group final grade, the teachers could select the set of learning status indicators that are of the same a component with group final grade. Then, they can use BKD to build the causal relationship between the selected indicators f and group final grade. After importing the table into BKD, a BBN of the causal relationships and associate probability distributions wil be constructed. Figure 3 ilustrates one of the BKD results of the Introduction of Computer Network and Application course. In this figure, the nodes represent the learning status indicators, while the arcs represent the causal relationships among the nodes. This figure will support teachers in observing the learning status of the group and make instructional decisions to assist and enhance group learning performance. Teachers observed two kinds of BKD results: 1) Enhance a set of learning status indicators by determining the learning status indicator that affects most of them For example, the BBN in Figure 3 ilustrates that the Working on Project learning status may influence the three learning status: Reading Resource, Uploading Resource, and Grade. Accordingly, if group members frequently work on the group project, they are prefer reading the group resources, to share their resource with the members. Meanwhile, the group grade also has high probability affected by this frequency. Thus, if the teachers wish to enhance the frequency of sharing resource and enhance the group grade simultaneously, they should encourage group members to frequently engage in group tasks. 2. Enhance learning status by determine the learning statuses that influence it For example, the BBN in Fig. 3 ilustrates that the frequency of Online Discusion is influenced by the frequency of Uploading Resource, while the frequency of Uploading

16 Resource is in turn afected by the frequency of Working on Project. Thus, teachers can thus probably promote discussion within a group (Online Discussion) by encouraging members of the group to work on the group project (Working on Project) Figure 3: The example of Bayesian Belief Network In the BBN derived by BKD, presented in Fig. 3, each node contains a probability distribution table that represents the probability distribution of all value combinations for each possible value of the node. Meanwhile, Table 4 lists the conditional distribution of Grade nodes that depend on the Working on Project node. The frequency of Working on Project is divided into levels by BKD: 11-75, , and , and the table thus contains four rows, one corresponding to each level. The group grade can have 4 possible values, A, B, C, and D, ranging from excellent to a fail. Grade A represent group grades that exceed the average by one standard deviation, grade B represents group grades that exceed the average by less than one standard deviation, grade C represents the group grades are below the average grade by less than one standard deviation, and grade D represents group grades that are below the average grade by more than one standard deviation. The first column of table 4 is the values of working on project attribute. The other columns show the probability of group final grade (for the value indicated in the first row) for the corresponding working on project ( for the vlue indicated in the first column). For example,

17 if the degree of Working on Project is , there is a probability (93.1%) that the group will receive a B grade. Meanwhile, if the frequency of Working on Project is 11-75, the group has probability (69.9%) of receiving D grade. Table 4: The probability table of Grade learning status The BKD system can assist teachers in deriving the causal relationships among learning status indicators and outcome. Teachers can thus help a group to improve group learning performance by enhancing the status that will influence group performance. For example, encourage group members to work on the group project will improve the group performance.. After several semesters, the results of applying BKD could be used to predict learning status for different teachers, as will be described in the following section. 5. Predicting group performance from initial group discussions and portfolio The causal network and probability distribution constructed by BKD are important teaching experiences for teachers. Moreover, these relationships are presented in a graphical form that is easy for different teachers to understand and exchange. However, these teaching experiences do not support predictions of group learning performance. The teaching experience and help teachers predict learning performance, tools to assist teachers to accumulate past teaching experience are necessary, along with tools for predicting group learning performance early in a class. Predicting learning performance is intended to locate groups destined to fail. For example, if the BKD causal net of past experience infers that a group is going fail, the teacher should intervene in the group to enhance associate learning statuses. A Bayesian classifier RoC is adopted herein to predict the eventual group performance by using data taken early on in a class. The prediction involves two phases: training and prediction. First, RoC is used to produce a prediction scheme from learning statuses and the target value group final grade of a previous

18 class. RoC then uses this prediction scheme to predict the final grades of groups in the current class based on available learning statuses before the final grades of the groups are known. Four steps are involved in applying RoC to predict learning performance: 1) Define the Bayesian classifier (prediction scheme) from a database: The input of the RoC is the set of attributes that are indicators of learning statuses derived from discussion, web log, and group portfolio. Table 3 illustrates an input file to RoC. The file format is as in BKD system, with the first row showing the attribute names, while the other rows list the values of corresponding attributes. The values can be numerical or categorical. 2) Class selection: Select one of the learning status indicators as the class, that is, the target attribute being predicted. The class in RoC represents the attribute that the teachers want to predict. Before predicting evaluation data, the RoC system generates a classifier from learning data for the selected class. 3) Learning the BBN: RoC automatically learns a prediction model from the learning data, and then determines the dependence and probability distribution among the attributes (learning status indicators). The output of this step is called the classifier, and supports teachers predictions of learning performance evaluation data in the following step. 4) Predicting learning performance: Once the classifier has been trained, current learning groups can be classified into a class. The classifier generated by RoC classifies current groups and predicts their grades. The current experiment involves 706 students and 70 groups. Each group collaboratively learns the same curriculum on web. The teachers collected 52 learning status indicators, defined based on the basis of Johnson s positive social interdependence [20]. Herein, teachers collect these learning statuses based on just one course from each semester. Thus, the leave-one-out cros validation method [32] is employed to demonstrate the prediction capability of the RoC system. The following paragraph illustrates the leave-one-out method of RoC prediction: Step1. Teachers collect all the learning statuses as input files to RoC. Herein, Table 5 lists the input

19 data for constructing the classifier. All attributes are clustered into components by factor analysis tool of SPSS. In our experiment, we only select the attributes that are of the same component with group final grade. Step2. Select one of the learning status indicators of input data as the class. For example in Table 5, the teachers attempt to determine which group will have lower grades at the end of the semester, and thus the Grade was selected as the class. Step3. Select some of the groups as the evaluation group, and define the others as training groups. For example in Table 5, the evaluation group is groups 1 to 10, and the grade value of the evaluation group is marked as? for predicting RoC. The evaluation groups are used to evaluate the accuracy of the classifier. Step 4. Discover the rules of the training data and predict the class value of the evaluation group. Table 6 lists predicted results for groups 1 to 10. Step 5. Repeat steps 1 to 4, using different sets of ten groups as the evaluation data until the class value of all of the groups has been predicted. Table 7 lists the predicted results following 7 predictions. Group Online Working on Competition Reading Uploading Updating Grade Id Discussion Task Resource Resource Resource ? ? ? ? ? ? ? ? ? ? D Table 5: The input file for prediction of RoC

20 Group id Predicted Grade Grade Probability of Grade A Probability of Grade B Probability of Grade C Probability of Grade D 1 B B B B D C C B B B B B C B B B B B C C Correct: 8 Incorrect: 2 Accuracy: 80 % Coverage % Table 6: The output file of predicted result In Table 6, the coverage shows that all cases of evaluation data are predictable in RoC. The column Grades represents the original grade values of evaluation data, while the column Predicted Grade is predicted by RoC using the input data from Table 5. The system predicted that group 1 would obtain a Grade B at the end of the semester, and this prediction was proved correct. However, the predictions of groups 3 and 7 were in corect. The column Probability represents the probability of a particular group obtaining each grade value. In this experiment, the predictions were found to be 80% accurate (8 correct, 2 incorrect). Meanwhile, Table 7 displays that after running 7 predictions for each testing data, the average accuracy was 74.28%, sufficient for teachers to use to predict the group learning status. Testing data Accuracy Group 1 to group % Group 11 to group % Group 21 to group % Group 31 to group % Group 41 to group % Group 51 to group % Group 61 to group % Average Accuracy % Table 7. The 7 times of prediction for Grade value and the accuracy RoC can be used to predict not only Grades but also all other group learning statuses. 6. Experience and Discussion Our experiment was performed on the Introduction of Computer Network and Applications course, and took place in a web learning environment. The curriculum included some fundamental concepts of computer networks, and the programming language/application for constructing WWW pages such as

21 TCP/IP, Network security, HTML, JavaScript, FrontPage, and so on. Meanwhile, data mining tools, Text Analysis Tool, BKD, and RoC, were employed to extract information to assist teachers to track group discussions, observe group learning status, and illustrate causal relationships between leaning statuses. This information helps teachers to prevent groups from failing The following table shows the issues investigated herein, along with the associated methods, or tools adopted for solving the issues. Tasks Input Data Tools Output Data Capture the learning status by extracting a topic and abstract from group discussions Discussion contents Text Analysis Tool Group status Discussion Status Discover important influences on learning performance by analyzing causal dependence of the learning status Predict learning performance to prevent groups from suffering low performance Discussion status, web logs, questionnaire answers Learning statuses and learning performance of previous classes, learning statuses of current class The Bayesian Knowledge Discoverer (BKD) The Robust Bayesian Classifier (RoC) BBN of learning statuses indicators Final prediction of group learning performance Table 8: The input/output data and tools of three primary tasks when teachers extract learning information and guide group learning 6.1 The participants and the grouping on web The participants included 7 teachers, 5 teaching assistants and 706 students at National Central University, Taiwan. 459 (65.0 %) of the students were male, and 247 (35.0 %) were female. The average age was After spending a month familiarizing themselves with the learning environment, the students were systematically grouped into several heterogeneous groups [25]. The grouping criterion included personal profile and thinking style, with the personal profiles including gender, age, area of residence,education and so on. The teachers selected the student s thinking style as the major concern for grouping. Herein, teachers categorized students according to 13 thinking styles, as proposed by Dr. Sternber, namely [26]: legislative, executive, external, judicial, monarchic, hierarchic, oligarchic, anarchic, global, local, internal, liberal, and conservative. Herein, each group is grouped according to the specification given by the teacher. There are 70 groups. Each group containing 10 to 11 students. The students read the curriculums on video compact discs (VCD). After reading the curriculums, the students must register with the NCUVC web collaborative learning system [28] [29], which supports a peer

22 interaction space [29][30], collaborative project space, resource sharing space, and so on. The first group task is to elect a group leader, co-leader and reporter. The group private working space included online and offline discussion rooms, a resource sharing space, a portfolio space [31], a project scheduler, and a window for querying member working status. 6.2 The results of Monitoring group discussions The total number of discussion herein is The teachers defined 25 interesting discussion categories, and selected 25% of the articles as the training data. The teachers immediately assigned one of three grades, Good, Acceptable, and incorrect, to the analyzed result. The analytical results reveal that 96.6%, and 73.3% of the abstracts received grades of Good and Acceptable, respectively. Furthermore, the longer of an article is, the better the grade it receives. Table 9 illustrates the derived results and teacher feedbacks: Feedback Topics Abstract Good 73.0 % 55.0 % Acceptable 23.6 % 18.3 % Mistake 7.3 % % Table 9: The results of deriving attributes and abstract of an article Part of the reasons that cause the incorrect categorization and summarization are listed as follows: : 1) Adages: Some adages are not considered by the teachers and are excluded from the keywords database. These adages could not be derived from articles but are important in the discussion articles. 2) Mistyped keywords: If keywords in articles were mistyped, the system was unable to locate and properly classify them. 3) Program source codes: The program structure could not be derived from keyword retrieval. 4) Short discussion articles: Short discussions contained few representative keywords and sentences. 6.3 Predicted results for the flunked groups The classifier produced by RoC system can assist teachers in predicting the final performance of groups. In our experiment, the prediction accuracy achieved herein for predicting performance was 74.28%. Actually, the teachers are more interested in finding the groups that may have poor learning performance or

23 be flunked. Thus, they can help these groups to improve their learning. Table 10 shows the results of predicting the flunked groups. Group Id Predicted Result Probability for Predicted Result 11 D D D D D D B D D D D D D Accuracy for flunk % Table 10. The predicting results and accuracy for flunk groups Table 10 showed that 12 of the failed groups out of 13 groups were predicted correctly. The learning statuses and final grades of the flunked groups are shown in Table 11. Group Online Working on Competition Reading Uploading Updating Grade Id Discussion Task Resource Resource Resource D D D D D D D D D D D D D Table 11. The extract of learning status of flunked groups Thus, teachers could focus on this group in Table 11 to determine why these groups failed. 7. Conclusion To assist teachers in monitoring and guiding group learning, this work presented the methodology for deriving knowledge for observing group status, discovering the key influence of group performance, and the rule to predict learning performance. The Bayesian method is an efficient way to achieve the above goals. Without the proposed mechanisms, teachers must spend considerable time in trying to infer group status from vast unorganized web logs. The causal relationships of group learning situations are hard to track. In traditional approaches, group learning situation have to be determined based on the teacher s individual experience that is imprecise and could not be reused for other teachers. This investigation (1)