ConTag: Conceptual Tag Clouds Video Browsing in e-learning

1 Ahmad Nurzid Rosli, 2 Kee-Sung Lee, 3 Ivan A. Supandi, *4 Geun-Sik Jo

1, First Author: Department of Information Technology, Inha University, South Korea, nurzid@eslab.inha.ac.kr
*4, Corresponding Author: School of Computer and Information Engineering, South Korea, gsjo@inha.ac.kr
2, 3: Department of Information Technology, Inha University, South Korea, {lks,ivanaries}@eslab.inha.ac.kr

Abstract

This paper presents a method for automatically generating topical summaries of video content. We exploit video transcripts to generate a Conceptual graph and Tag clouds visualization, known as ConTag or Conceptual Tag. The system provides users with contextual information that follows the video content. To achieve this, we employ a popular text mining model, TF-IDF, and a phrase association algorithm, Apriori. We also address the passive learning problem by giving users an interactive conceptual visualization and a communication mechanism over a Social Networking Service (SNS), accompanied by contextual and temporal information that corresponds to a particular time and scene in the video.

Keywords: Video Summarization, Tag Clouds, Conceptual Graph, E-Learning

1. Introduction

In recent years, audiovisual content has grown in popularity as a teaching and learning resource on the net (e.g., Khan Academy, Yovisto, the TED (Technology, Entertainment and Design) Conference, YouTube EDU, and videolectures.net). A report from the Nielsen Company indicates that Americans spend more than 41 hours each week engaging with content across all screens (i.e., television, the Internet on a computer, smartphones, and tablets) [1]. This tremendous increase in the number of videos demands an efficient way to support the exploration and navigation of multimedia data [2].
There is therefore a need to derive an understanding of the content and context of a video, in other words, video summarization. Significant studies have been devoted to providing a better interpretation of video content [3][4]. In general, we found two issues in academic video representation:

1) How can we provide a context of understanding, which may vary across videos, so that users can easily jump to a particular context at a particular scene and, if they are interested, discuss the corresponding video further?
2) What is a suitable form for representing that context, so that users can easily grasp which elements are available to represent the video?

To address these issues, we propose a novel system that represents video content through a conceptual hierarchical ontology visualization, known as the Conceptual graph, together with Tag clouds. To achieve this, we apply text mining methods to the transcripts available for the videos in an e-learning system called ConTag, or Conceptual Tag. In this work, we present a text mining approach that uses video transcripts to help users decide whether it is worthwhile to watch a whole video as part of the e-learning process. We also give users a practical way to access video content or scenes through an intuitive interface. In other words, we propose automated video summarization through text mining, employing TF-IDF, a widely used document retrieval model [5][6].

Research Notes in Information Science (RNIS), Volume 14, June 2013, doi:10.4156/rnis.vol14.20

2. Related Work

This section gives a short overview of tag clouds and video summarization. A word cloud or tag cloud is generally known as a visual representation of the frequency of the words in any written material, such as lecture notes or a textbook chapter [7]. Font size indicates word frequency, so larger fonts give a quick impression of the relevant concepts available in a transcript. Gottron claims that a user otherwise has to read the text to understand which words are important; in his work, the idea of enhancing the perception of documents with visualization techniques is borrowed from tag clouds [8]. Similarly, Cui et al. use word clouds to depict representative keywords [9]. Miley and Read introduce word clouds as a tool that assists student learning and enhances students' motivation and engagement [10].

Haubold introduces a technique for extracting meaningful textual information from low-accuracy lecture transcripts using an external corpus of index phrases [11]. As in our work, the Open Directory Project (ODP) is used to extract the domain for the purposes of indexing, summarization, and cross-referencing. The ODP is a human-edited index of web sites that is used to list and categorize sites. Haubold also argues that transcripts include theme and topic phrases that describe the topic of a given lecture.

3. System Architecture

This section describes how our proposed system, called ConTag, works (see Figure 1). For the implementation, we selected a video from the TED conference, which promotes free knowledge and talks by inspired thinkers. For demonstration and explanation, we use the talk by Tim Berners-Lee titled "The next web" (16 min 20 sec long). We extract the readily segmented transcript, with its associated temporal information, from the video. Each transcript segment is treated as a document in a collection and is assigned a transcript ID. The video consists of 159 documents with 3083 words.
Note that we use the terms document and transcript interchangeably.

Figure 1. ConTag System Architecture

3.1. Text Extraction Method

The transcripts vary in topic and duration. It is important to note that they carry the complexities of spoken language: they are informal, have weak grammatical structure, use common argumentative words, and sometimes contain incomplete sentences. Therefore, in this work we:

(i) help users decide whether it is worthwhile to watch the whole video by providing an interactive conceptual hierarchical ontology, obtained from the ODP, that summarizes the video content;
(ii) create tag clouds while maintaining the semantic content;
(iii) provide a communication mechanism that engages users in active learning by integrating the SNS into the system.

In the first phase, to generate keywords or terms from the transcript, we first filter each document and eliminate unimportant terms using a stop-word filter. We then compute the importance of each term in each document with the traditional term weighting of Information Retrieval (IR) systems, based on the TF-IDF model. We first count how often each term appears in a particular document. For each term t we determine its document frequency df(t). For a given document d we then determine the term frequency tf(t, d). With N denoting the total number of documents, the TF-IDF weight of term t in document d is defined as:

$$w(t, d) = tf(t, d) \cdot \log\frac{N}{df(t)} \qquad (1)$$

Eq. 1 describes a weighting scheme for terms in a vector space IR model. If a query term matches an index term with a high TF-IDF value, the corresponding documents obtain a higher relevance score. The terms with the highest TF-IDF scores represent a document well compared with the other documents. The scores of the index terms form the feature vector of the document. This weighting suppresses unimportant words, but it treats words as occurring independently and therefore makes no use of semantic relations between words (e.g., it treats world, wide, and web as separate terms rather than as the single term world wide web).

In the second phase, we therefore employ the traditional phrase association algorithm known as Apriori [12]. Rather than scoring isolated words, it generates keywords or terms that are relevant to the topic based on words that tend to occur together (e.g., World Wide Web, Semantic Web). The Apriori algorithm efficiently finds all frequent unordered and ordered keywords or terms in the collection of documents from a transcript (see Figure 2). The algorithm runs over the transcript D, treated as a database of transactions, to compute the set of frequent k-phrase patterns for every k = 1, ..., d. Step 1 finds the frequent 1-itemsets, L_1. In steps 2 to 10, L_{k-1} is used to generate the candidates C_k in order to find L_k for k ≥ 2. The apriori-gen procedure generates the candidates and then uses the Apriori property to eliminate those that have a subset that is not frequent (step 3). Once all candidates have been generated, the documents are scanned (step 4). It is important to note that in step 5 we set the bound distance between candidates to 1 for each transaction.
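As an illustration only, the level-wise search with a distance bound of 1 can be sketched in Python as follows. The sketch assumes that the bound means a phrase is built only from adjacent words, that support is the number of transcript segments containing the phrase, and that segments are already tokenized. The function name `frequent_phrases` and the parameters `min_sup` and `max_len` are hypothetical and are not taken from the ConTag implementation.

```python
from collections import Counter


def frequent_phrases(documents, min_sup=2, max_len=3):
    """Level-wise (Apriori-style) mining of frequent adjacent word sequences.

    documents: list of token lists, one per transcript segment.
    min_sup:   minimum number of segments a phrase must occur in (assumed).
    max_len:   longest phrase length to try (assumed cap, not from the paper).
    """
    # Level 1: count in how many segments each single word occurs.
    counts = Counter(w for doc in documents for w in set(doc))
    frequent = {(w,): c for w, c in counts.items() if c >= min_sup}
    result = dict(frequent)

    k = 2
    while frequent and k <= max_len:
        counts = Counter()
        for doc in documents:
            seen = set()
            # Distance bound of 1: only consecutive tokens form a candidate.
            for i in range(len(doc) - k + 1):
                cand = tuple(doc[i:i + k])
                # Apriori property: both (k-1)-word sub-phrases must be frequent.
                if cand[:-1] in frequent and cand[1:] in frequent:
                    seen.add(cand)
            counts.update(seen)  # count each candidate once per segment
        # Keep only the candidates that reach the minimum support.
        frequent = {c: n for c, n in counts.items() if n >= min_sup}
        result.update(frequent)
        k += 1
    return result
```

Under these assumptions, a phrase such as world wide web is kept only if it occurs in at least `min_sup` segments and both world wide and wide web were already frequent at the previous level.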
Next, a subset function finds all subsets of the transaction that are candidates, and the count of each such candidate is accumulated (steps 6 and 7). Finally, the candidates that satisfy the minimum support (step 9) form the set of frequent itemsets, L (step 11). As a result, relevant keywords are generated from the transcript. The features associated with the original keywords are then assigned to the new keywords. These new keywords, with their feature vectors, are used in the next phase to compute the similarity measure for cross-referencing.

     1) L_1 = {large 1-itemsets};
     2) for (k = 2; L_{k-1} ≠ ∅; k++) do begin
     3)     C_k = apriori-gen(L_{k-1});   // New candidates
     4)     forall transactions t ∈ D do begin
     5)         C_t = subset(C_k, t);     // Candidates contained in t
     6)         forall candidates c ∈ C_t do
     7)             c.count++;
     8)     end
     9)     L_k = {c ∈ C_k | c.count ≥ min_sup}
    10) end
    11) Answer = ∪_k L_k;

Figure 2. Modified Apriori Algorithm

As shown in the ConTag system architecture, we also apply TF-IDF to the hierarchical ontology schema known as the ODP to generate feature vectors analogous to those extracted from the transcript. The ODP is important because it provides a cross-reference for the terms extracted from the corresponding transcript. To achieve this, in the third phase we compute the similarity between the keywords or terms from the transcript and those from the ODP using the cosine similarity measure (see Eq. 2). Relevancy is measured by the traditional cosine similarity between the feature vector of the keywords from the transcript and that of a topic from the ODP. Given a document vector D and a category vector C, their cosine similarity is:

$$\cos(D, C) = \frac{D \cdot C}{\lVert D \rVert \, \lVert C \rVert} \qquad (2)$$

In Eq. 2, D (the document) and C (the category) are m-dimensional vectors over the term set $\{t_1, \dots, t_m\}$. Each dimension represents a term with its weight in the document, which is non-negative. As a result, the cosine similarity is non-negative and bounded in [0, 1]. For example, Computer Science is a subclass of Computers and has a feature vector that represents the topic, and similarly every keyword from the transcript has a feature vector. A term is therefore assigned to a topic T according to its degree of relevancy. This process is important for clustering the transcripts into their relevant domains and sub-domains (see Table 1). In general, the ODP consists of 16 main categories, including Arts, Business, and Science. The clustered transcripts are then mapped to the conceptual hierarchical ontology visualization in phase 4. This allows quick scanning and helps users make a relevance decision and jump into the visualized conceptual hierarchy. This is one of the main contributions of ConTag: it gives an impression of whether it is worthwhile to watch the whole video or only a selected segment.

Table 1. Domain and sub-domain of Computers in ODP

Domain      Sub-domain
Computers   Computer Science: Artificial Intelligence / Distributed Computing / Computer Graphics / Parallel Computing, etc.
            Internet: On the Web / Searching / Web Design and Development, etc.
            etc.
etc.        etc.

In phase 5, word clouds or tag clouds visualize the topics of the corresponding transcript. Unlike phase 4, which visually summarizes the document as a whole and provides a hierarchical visualization of the transcript, phase 5 unfolds the keywords or terms that are present in a selected sub-concept. The details are described in Section 3.2.
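The weighting of Eq. 1 and the matching of Eq. 2 can be illustrated with a short Python sketch. It is not the ConTag implementation: the function names are hypothetical, the logarithm is taken to be the natural logarithm, and transcript segments and category descriptions are weighted together so that they share one IDF table, which is an assumption the paper does not state.

```python
import math
from collections import Counter


def tfidf_vectors(documents):
    """Return one {term: weight} dict per document, with w = tf * log(N / df)."""
    n_docs = len(documents)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in documents for term in set(doc))
    vectors = []
    for doc in documents:
        tf = Counter(doc)
        vectors.append({t: tf[t] * math.log(n_docs / df[t]) for t in tf})
    return vectors


def cosine(u, v):
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # an all-zero vector is treated as similar to nothing
    return dot / (norm_u * norm_v)


def assign_categories(segments, categories):
    """Return, for each segment, the name of its most similar category.

    segments:   list of token lists (transcript segments).
    categories: {name: token list} standing in for ODP topic descriptions.
    Assumption: both collections are weighted together (shared IDF).
    """
    names = list(categories)
    vectors = tfidf_vectors(list(segments) + [categories[n] for n in names])
    seg_vecs = vectors[:len(segments)]
    cat_vecs = dict(zip(names, vectors[len(segments):]))
    return [max(names, key=lambda n: cosine(sv, cat_vecs[n])) for sv in seg_vecs]
```

Because all weights are non-negative, `cosine` returns values in [0, 1], in line with the bound stated for Eq. 2. A term that occurs in every document receives weight 0 under this form of Eq. 1 and so does not influence the match.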
To promote collaborative learning, we equip our e-learning ecosystem with a communication mechanism among users (e.g., student-to-student, student-to-lecturer, and vice versa) by integrating ConTag with the Twitter API. This engages active learning while watching videos: it harnesses diversity and opens the opportunity for students to teach each other by interacting through comment or message threads and hashtag (#) messages related to the video.

3.2. Conceptualization Visualization Interface

To visualize the content of the transcript, an efficient user interface design is a must. We therefore distinguish five interface parts in our system (see Figure 3). The video is displayed in (a), the Video display part. The transcript is visualized in two forms: (i) a conceptual hierarchical ontology and (ii) tag clouds. Both are displayed in (b) as the Conceptual graph and Tag clouds. The Conceptual graph is a hierarchical representation of the ontology schema obtained from the ODP. Its main purpose is to portray the domain to which a particular video belongs, and it changes dynamically according to the video that is loaded. The main idea is to lead users into the context domain of the video, which may help them grasp what is what in the video transcript. Users can interactively use the Conceptual graph to navigate the related concepts shown in this part. This is a significant feature of ConTag, as it reveals the conceptual hierarchical ontology that represents the video content. Once a concept is selected, the system unfolds, or displays, the related sub-domains (sub-topics). The tag cloud is organized according to the weight value, which reflects the frequency of the terms in the document. The tag clouds are unfolded after one of the concepts in the conceptual hierarchical ontology is selected.
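One simple way to turn term weights into a tag cloud is to scale font size linearly with weight. The Python sketch below assumes this linear mapping and hypothetical size bounds `min_pt` and `max_pt`; the paper does not state which scaling ConTag uses.

```python
def font_sizes(weights, min_pt=10, max_pt=36):
    """Linearly map term weights to font sizes in points.

    weights: {term: weight}; min_pt and max_pt are assumed display bounds.
    """
    if not weights:
        return {}
    lo, hi = min(weights.values()), max(weights.values())
    if hi == lo:
        # All terms equally weighted: use the midpoint size for every term.
        return {term: (min_pt + max_pt) / 2 for term in weights}
    scale = (max_pt - min_pt) / (hi - lo)
    return {term: min_pt + (w - lo) * scale for term, w in weights.items()}
```

The heaviest term is drawn at `max_pt` and the lightest at `min_pt`, so the relative prominence of terms in the cloud follows their weights.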
Each tag cloud comprises highlighted keywords or terms that reflect the particular sub-domain in the conceptual hierarchical ontology. Our objective is to highlight the terms that exist in the particular transcript. These words are then linked (mapped) to a list of relevant transcripts, with associated temporal information, in (c), the Transcript box part. The transcript box gives users a glance at the particular transcript reflected in the word cloud. Users can jump to a particular scene by clicking the icon next to the transcript. Meanwhile, (d), the ConTag Tweeting message part, provides a communication mechanism for users (i.e., student-to-student, student-to-lecturer, and vice versa) to engage in active discussion about a particular video. This mechanism and its scenarios are detailed in the next section.

Figure 3. ConTag interface design

Figure 4. ConTag message format

3.3. Integrating SNS with ConTag

This section describes how we facilitate communication between users about the topics in a particular video. The main idea is to promote active participation between users in the ConTag ecosystem and to engage learning through an SNS platform [13, 14]. Initially, the segmented transcripts are represented as a conceptual graph and tag clouds. Each carries temporal information that lets users jump to a particular scene in the video. However, consider these scenarios:

1) What if users want to pose a question at a particular time or scene? Text alone is too abstract to express such context in a discussion about the video.
2) What if someone has left a comment or posed a question at a particular time? How can other users respond and grasp the context of the question?
3) What if users want to share an interesting part or a particular scene with others on an SNS (i.e., Twitter)?

For these cases, we provide a communication mechanism called the ConTag Tweeting message, built on the Twitter API. As shown in Figure 3, the ConTag Tweeting message facilitates communication among users in the ConTag ecosystem. The ConTag Tweet message format is shown in Figure 4.
It is important to note that the time and hierarchy information significantly helps users grasp instantly the context of a message or question that refers to a particular time or scene in the video. Users may write any comment or message in the comment box. We can call this the Twitter of ConTag, or the Twitter of an e-learning ecosystem. It also enables users to label or annotate a particular video with comments. Users can interact with a message left by other users by simply clicking the respond button, and a message window pops up at the top of the page. Any message left by a user unfolds (as an overlay) on top of the video, at the time where the message was left, while other users are watching the video (see Figure 3 (e)). Users can also expand all the messages left by other users without having to wait for them to unfold at their particular times.

4. Conclusion and Future Works

We have proposed a design for summarizing and accessing video content that maintains the semantic context based on the provided transcript. Our main contribution is providing contextual information through an interactive interface design, the Conceptual graph and Tag clouds, that helps users decide whether it is worthwhile to watch the whole video. This allows us to address the missing context in a one-way learning process (video to user). The Conceptual graph provides a conceptual hierarchical ontology as a visualization interface that helps users scan quickly and make a relevance decision to jump to a particular scene. It also supplies contextual information that provides the context of a discussion related to a particular scene or time in the video, which is usually ignored. Usually, users simply share the plain video, sometimes accompanied by a long description explaining where and which particular part of the video they are referring to. The temporal and hierarchy information in the ConTag Tweeting message addresses this problem. Another key point is that we promote active and collaborative learning over particular video content, without losing the context of the discussion, through the communication mechanism. We facilitate the sharing of context and understanding that is absent in a one-way learning process (video to student).

For future work, the encouraging number of videos for educational purposes motivates us to expand our system to other domains available in the ODP. We are also considering other text mining methods to improve the relevancy of the generated keywords or terms.

5. Acknowledgement

This research was supported by the Ministry of Knowledge Economy (MKE), Korea and Microsoft Research under the IT/SW Creative Research Program supervised by the National IT Industry Promotion Agency (NIPA) (NIPA-2012-(H0503-12-1024)).

6. References

[1] Free to move between screens: The Cross Platform Report 2013, retrieved from http://www.nielsen.com/us/en/reports/2013/the-nielsen-march-2013-cross-platform-report--freeto-move-betwe.html
[2] Xu, C., Zhang, Y. F., Zhu, G., Rui, Y., Lu, H., & Huang, Q., Using webcast text for semantic event detection in broadcast sports video, IEEE Transactions on Multimedia, vol. 10, no. 7, pp. 1342-1355, 2008.
[3] Liao, C. W., Chan, K. H., Cheng, B. Y., Tsai, C. H., Chang, W. T., & Chuang, Y. L., An open framework for video content analysis, In Signal & Information Processing Association Annual Summit and Conference (APSIPA ASC), 2012 Asia-Pacific, pp. 1-8, 2012.
[4] Li Xiang-Wei, Zhang Ming-Xin, Zhao Shuang-Ping & Zhu Ya Lin, A Novel Dynamic Video Summarization Approach Based on Rough Sets in Compressed Domain, Information Technology Journal, vol. 8, pp. 388-392, 2009.
[5] Salton, G., Developments in automatic text retrieval, In Science (New York, NY), vol. 253, no. 5023, pp. 974-980, 1991.
[6] Tan, S., Tan, H. K., & Ngo, C. W., Topical summarization of web videos by visual-text time-dependent alignment, In Proceedings of the International Conference on Multimedia, ACM, pp. 1095-1098, 2010.
[7] Bateman, S., Gutwin, C., & Nacenta, M., Seeing things in the clouds: the effect of visual features on tag cloud selections, In Proceedings of the Nineteenth ACM Conference on Hypertext and Hypermedia, pp. 193-202, 2008.
[8] Gottron, T., Document word clouds: Visualising web documents as tag clouds to aid users in relevance decisions, Research and Advanced Technology for Digital Libraries, pp. 94-105, 2009.
[9] Cui, W., Wu, Y., Liu, S., Wei, F., Zhou, M. X., & Qu, H., Context preserving dynamic word cloud visualization, In IEEE Pacific Visualization Symposium, pp. 121-128, 2010.
[10] Miley, F., & Read, A., Using Word Clouds to Develop Proactive Learners, Journal of the Scholarship of Teaching and Learning, vol. 11, no. 2, pp. 91-110, 2011.
[11] Haubold, A., Analysis and visualization of index words from audio transcripts of instructional videos, In Multimedia Software Engineering, IEEE Sixth International Symposium proceedings, pp. 570-573, 2004.
[12] Agrawal, R., Imieliński, T., & Swami, A., Mining association rules between sets of items in large databases, In ACM SIGMOD Record, vol. 22, no. 2, pp. 207-216, 1993.
[13] Shamma, D., Kennedy, L., & Churchill, E., Tweetgeist: Can the twitter timeline reveal the structure of broadcast events, In CSCW Horizons, 2010.
[14] Sack, H., & Waitelonis, J., Integrating social tagging and document annotation for content-based search in multimedia data, In Semantic Authoring and Annotation Workshop (SAAW), 2006.