Enriching Subtitled YouTube Media Fragments via Utilization of the Web-Based Natural Language Processors and Efficient Semantic Video Annotations



Babak Farhadi
Department of Computer Engineering, University of Tehran, International Campus, Kish Island, Iran
babakfarhadi@ut.ac.ir

Abstract: - Semantic video annotation is an active research area within the field of multimedia content understanding. With the constant increase of videos published on popular video-sharing platforms such as YouTube, more effort is being spent on annotating these videos automatically. In this paper, we propose a novel framework for annotating subtitled YouTube media fragments using both textual features, such as the main portions extracted from web-based natural language processors applied to the subtitles, and temporal features, such as the duration of the media fragments in which particular entities are spotted. We implement SYMF-SE (Subtitled YouTube Media Fragment Search Engine) as an effective framework for browsing subtitled YouTube videos residing in the Linked Open Data (LOD) cloud. To this end, we propose the Unifier Module of Outcomes of Web-Based Natural Language Processors (UM-OWNLP), which extracts the essential portions of 10 web-based NLP tools from the subtitles associated with YouTube videos in order to generate media fragments annotated with resources from the LOD cloud. We then propose the Unifier Module of Outcomes of Web-Based Named Entity Booster Processors (UM-OWNEBP), which contains six popular web APIs used to boost the Named Entities (NEs) obtained from UM-OWNLP. We use dotnetrdf as a powerful and flexible API for working with the Resource Description Framework (RDF) and SPARQL in .NET environments.

Key-Words: - Subtitled YouTube video, NLP web-based tools, textual metadata, semantic web, video annotation, video search, web-based natural language processor.

1 Introduction
To make the discussion in this introduction easier to follow, we have divided it into six subsections, summarized as follows.

1.1 Subtitled YouTube videos and their obvious defects
Nowadays, online video-sharing platforms, and YouTube in particular, show that video has become the medium of choice for many people communicating over the net. At the same time, the rapid growth of online video content, especially subtitled YouTube videos, confronts the consuming user with an indefinite amount of data, which can only be accessed with advanced semantic multimedia search and dedicated management technologies in order to retrieve the few needles from the giant haystack. Most online video search engines provide keyword-based search, where the lexical ambiguity of natural language often leads to imprecise and incomplete results. For instance, YouTube supports keyword-based search within the textual metadata provided by video owners and users, accepting all the shortcomings caused by, e.g., homonyms (see Figure 1). In our opinion, however, richer and more meaningful results are possible by analyzing the available textual and timed-text (caption) metadata with web-based natural language processors cooperating with Named Entity (NE) booster processors and efficient semantic video annotations, especially given the availability of subtitle metadata on YouTube videos.
1.2 Semantic web technologies and linking open data
The role of semantic web technologies is to make the implicit meaning of content explicit by providing adequate metadata annotations based on formal knowledge representations. In this regard, Linked Data (LD) is a means to expose, share, and connect fragments of data, information, and knowledge on the semantic web.

LD uses Uniform Resource Identifiers (URIs) for the identification of resources and RDF as a structured data format, creating relationships from the data to other sources on the Web. These data sets are not only accessible to human beings but also readable by machines. LOD aims to publish and connect open but heterogeneous databases by applying the LD principles; the aggregation of all LOD data sets is denoted as the LOD Cloud. In this context, the term media fragment refers to the inner content of multimedia objects, such as a certain region within an image or a 9.498-second segment within a 34-minute video. Most research on media fragments focuses on exposing the closed data, such as tracks and video segments, within the multimedia file or on the online host server using semantic web and LD technologies. Here, we introduce every media fragment as a visual display of the selected subtitle text located on the YouTube host server. Each media fragment consists of a media fragment number, caption text, start time, duration, and end time.

1.3 Achieving indexable semantic data in online video search engines
To make use of semantic web information, search engines need to index semantic data. Generally, this can be achieved by storing the data in triple stores. Triple stores are data warehouses for structured data modeled using RDF: they store RDF triples and are queried through a SPARQL endpoint. To convert a non-semantic video search engine into a semantic one, the already existing keyword-based textual metadata has to be mapped to semantic web entities. The most challenging problem in mapping data to semantic web entities is the existence of ambiguous names, which results in a set of entities that have to be disambiguated. One of our important goals in this paper is mapping the textual and temporal features of subtitled YouTube videos to LOD entities.

1.4 Influence of video transcripts in semantic video indexing
One of the most challenging approaches to semantic video indexing is based on the extraction of semantics from a video's subtitles, which carry such information through natural language sentences. YouTube video content is described both by metadata referring to the entire video and by information assigned to a distinct time position within the video. Our intention is entity mapping of both kinds of metadata: the subtitled YouTube video metadata (e.g., title, video ID, description, rating, publishing information, thumbnail view, category, tags, etc.) and the transcripts attached to it. In addition, since textual features carry higher-level semantic concepts and are easier to understand, they are very useful in video classification.

Fig.1 Structure of subtitled YouTube keyword-based search.
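As a minimal illustration of the media fragment structure described in Section 1.2, the sketch below models one fragment in C#. The class name, property names, and the use of the W3C Media Fragments temporal syntax (#t=start,end) on a YouTube watch URL are our own illustrative assumptions and are not part of SYMF-SE itself; the YouTube player does not necessarily honor the paired temporal selector.

```csharp
using System.Globalization;

// Hypothetical container for one subtitled YouTube media fragment:
// fragment number, caption text, start time, duration, and derived end time.
public class MediaFragment
{
    public int Number { get; set; }
    public string CaptionText { get; set; }
    public double StartSeconds { get; set; }
    public double DurationSeconds { get; set; }
    public double EndSeconds { get { return StartSeconds + DurationSeconds; } }

    // Builds a temporal fragment URI in the W3C Media Fragments style
    // (illustrative only), e.g. ...watch?v=VIDEO_ID#t=120,129.498
    public string ToFragmentUri(string videoId)
    {
        string start = StartSeconds.ToString("0.###", CultureInfo.InvariantCulture);
        string end = EndSeconds.ToString("0.###", CultureInfo.InvariantCulture);
        return string.Format("https://www.youtube.com/watch?v={0}#t={1},{2}", videoId, start, end);
    }
}
```

For example, a fragment with StartSeconds = 120 and DurationSeconds = 9.498 would yield the selector #t=120,129.498 when ToFragmentUri("VIDEO_ID") is called.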

1.5 Web-based natural language processors and video transcripts

Fig.2 SYMF-SE (proposed) framework.

We must evaluate and analyze the subtitle text in order to calculate the relatedness between textual inputs. For this purpose, we propose using web-based natural language processors to discover relatedness between words that possibly represent the same concepts. Processes for analyzing words and sentences within Natural Language Processing (NLP) include part-of-speech tagging (POST) and word sense disambiguation (WSD); these processes locate and discover the sense and the concept that a word represents. NLP is not only about comprehension; it also covers the ability to manipulate data and, in some cases, produce answers. It is likewise strongly connected with Information Retrieval (IR), which uses some of these concepts for searching and retrieving information. A word can be classified into several linguistic word types, for example synonyms, hyponyms, and hypernyms: synonyms are equivalent words, a hyponym is a more specific instance of the given word, and a hypernym is a less specific term than the original word. In addition, to maximize the serviceability of NLP, we need a specific ontology.

1.6 Overview of the main targets of our research
In this paper, by utilizing a rich data model based on skilled web-based natural language processors, NE booster processors, ontologies, and RDF, we have developed a web application called SYMF-SE (Proposed). Previous works close to ours have not yet used all the main portions and ideas realized in this paper; they have used some web-based natural language processors only to detect NEs (with very limited main types). For instance, AlchemyAPI provides advanced cloud-based and on-premise text analysis infrastructure that removes the cost and difficulty of integrating natural language processors. Some of the AlchemyAPI main portions include concept tagging, entity extraction, keyword extraction, relation extraction, sentiment analysis, text categorization, etc. The behavior of previous works towards OpenCalais and the other web-based natural language processors is the same. By using NE booster processors such as Amazon Web Services, Google Maps, Wolfram Alpha, Google Books, the Facebook Graph API, and the Internet Movie Database (IMDb), we can substantially boost the semantic meaning of NEs. However, none of the preceding works have used NE booster processors in their approaches. Furthermore, in their web applications the user cannot use a SPARQL endpoint to query RDF datasets: there is no SPARQL GUI for running user queries against triple stores, and there are no RDF syntax outputs (e.g., RDF/XML, NTriples, Notation 3, Turtle, NQuads, TriG, TriX, RDFa, etc.) that could be stored on the user's disk storage, analyzed by the user, or queried through a SPARQL GUI. The SYMF-SE (Proposed) framework eliminates all these difficulties of the previous web applications (see Figure 2).
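As a rough sketch of what such a SPARQL facility involves, the following snippet loads a previously saved annotation graph into a dotnetrdf in-memory triple store and runs a query over it. The file name, graph contents, and query are illustrative assumptions only, and the ExecuteQuery call reflects the older dotnetrdf API (newer versions route queries through a LeviathanQueryProcessor instead).

```csharp
using System;
using VDS.RDF;
using VDS.RDF.Parsing;
using VDS.RDF.Query;

class SparqlSketch
{
    static void Main()
    {
        // Load a previously saved annotation graph (illustrative file name).
        IGraph g = new Graph();
        FileLoader.Load(g, "fragment-annotations.ttl");

        // Put the graph into an in-memory triple store.
        TripleStore store = new TripleStore();
        store.Add(g);

        // Illustrative query: list annotated resources and their labels.
        string query = @"SELECT ?s ?label WHERE {
                           ?s <http://www.w3.org/2000/01/rdf-schema#label> ?label .
                         } LIMIT 10";

        SparqlResultSet results = store.ExecuteQuery(query) as SparqlResultSet;
        foreach (SparqlResult r in results)
        {
            Console.WriteLine("{0}  {1}", r["s"], r["label"]);
        }
    }
}
```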

Here, we propose UM-OWNLP for extracting and unifying the outcomes of the main portions of 10 web-based natural language processors from the subtitles associated with YouTube videos, in order to generate media fragments annotated with resources from the LOD cloud. We then propose UM-OWNEBP, containing the six favorite NE booster processors, to boost the NEs obtained from UM-OWNLP. We use dotnetrdf as a powerful and flexible library for working with RDF, triple stores, graphs, and SPARQL in .NET environments. Our contributions include offering a new, integrated, and comprehensive method to: 1) extract NEs and the other principal portions from the web-based natural language processors embedded in UM-OWNLP, in both text/URI mode, and link media fragments to the LOD cloud; 2) semantically boost the resulting NEs with UM-OWNEBP; 3) design a user-friendly and robust user interface for browsing the enriched subtitled YouTube videos; 4) use the UM-OWNLP ontology so that the end client can interact with the ontological mappings produced by UM-OWNLP; 5) enable the user to interact with the RDF Generator Module (RDF-GM), the in-memory triple store, and the SPARQL GUI; and 6) map the YouTube keyword-based textual metadata and the caption metadata attached to it to semantic web entities.

2 Related Work
2.1 Overview of previous works related to our proposed approach
Many approaches have already presented multimedia and annotations as LD, which offers useful experience for representing video resources. The LEMO framework provides an integrated model to annotate media fragments, while the annotations are enriched with textually related information from the LOD cloud. In LEMO, media fragments are represented using MPEG-21 terminology; LEMO must convert the available video files to a suitable MPEG version and serve them from the LEMO server. In addition, LEMO derives a core annotation schema from the Annotea Annotation Schema in order to link annotations to media fragment identifiers [1]. The website yovisto.com handles a large number of recordings of academic lectures and conferences, which end clients can search in a content-based manner. Yovisto presents an academic video search platform by publishing its database of videos and annotations as LD [2], [9]. It uses the Virtuoso server [3] to publish the videos and annotations in the database, and MPEG-7 together with the Core Ontology of Multimedia (COMM) [4] to describe the multimedia data. Yovisto provides both automatic video annotations based on video analysis and participatory user-generated annotations, which are moreover linked to entities in the LOD cloud with the goal of improving video searchability. SemWebVid automatically generates RDF video descriptions from transcripts. The transcripts are analyzed by three web-based NLP tools (AlchemyAPI, OpenCalais, and Zemanta) but are sliced into blocks, which weakens the context available to the web-based natural language processors [5]. The authors of [6] have shown how events can be detected on-the-fly, at scale, through crowdsourced textual, visual, and behavioral analysis of YouTube videos. They defined three types of events: 1) visual events, in the sense of shot changes; 2) occurrence events, in the sense of the appearance of an NE; and 3) interest-based events, in the sense of purposeful in-video navigation by end clients. In the occurrence event detection process, they analyzed the available video metadata using NLP techniques, as outlined in [5].
The detected NEs are presented to the user in a list and, upon a click, a timeline-like user interface allows jumping to one of the shots where the NE occurs. In addition, in [10] we used unifier and generator modules for creating a novel semantic video search engine by enriching textual and temporal features of subtitled YouTube media fragments.

2.2 Analysis of the closest work related to our proposed approach

Fig.3 Example of the whole resulted page and architecture in [7].

The approach proposed in [7] uses the 10 web-based natural language processors through the NERD module of [8]. They only used the NE extraction portion of these processors, and with very restricted types. An overview of their demo shows that the end client cannot take advantage of RDF syntax outputs, a SPARQL GUI, or a triple store module (see Figure 3). We found that most of the YouTube videos returned in their results do not have subtitles, and that they have not, so far, utilized the main keyword-based textual metadata of the YouTube Data API. The NERD module grabs the outcomes of the 10 web-based natural language processors separately, and it uses these processors only for NE recognition over about 10 main types. In NE extraction it does not grab any NE subtypes or other derivatives of the entity extraction portion. Hence, NERD produces truncated answers restricted to a limited set of NE types (see Figure 4). It ignores the most important portions of the web-based natural language processors, such as concept tagging, entity extraction, keyword extraction, relation extraction, sentiment analysis, text categorization, fact detection, topic extraction, meaning detection, dependency parse graphs, etc. As mentioned above, the most challenging problem in mapping textual and temporal features of subtitled YouTube videos to LOD entities is the existence of ambiguous names, which results in a set of entities that have to be disambiguated. Using all the main portions of the web-based natural language processors together with efficient ontological mappings, boosting the resulting NEs, and utilizing the outcomes through RDF-GM and the modules attached to it can result in an enriched entity set. The NLP web-based tools used in [7] are AlchemyAPI, DBpedia Spotlight, Evri, Extractiv, OpenCalais, Saplo, Wikimeta, Yahoo! Content Extraction, and Zemanta. In this paper, we use all the main portions of the NLP web-based tools listed, with the difference that we utilize the TextRazor NLP web-based tool instead of Evri. TextRazor can identify and disambiguate millions of NEs covering thousands of types. It has a particular ability to extract subject-action-object relations, properties, and typed dependencies between entities. TextRazor returns contextual synonyms or entailments: words that are semantically implied by the given context even though they are not explicitly mentioned. Generally, in comparison with the Evri NLP web-based tool, which is no longer accessible, TextRazor is a complete and integrated textual analysis solution. It uses state-of-the-art machine learning and NLP techniques together with a comprehensive knowledge base of real-life facts to help parse and disambiguate captions with industry-leading accuracy and speed.

Fig.4 The NERD module used in [8].

Fig.5 The basic data flow process in the proposed SYMF-SE.

Compared with [7], and according to our experiments, we are properly able to use all the portions of the NLP web-based tools, both in text/URI mode and through UM-OWNLP. In addition, UM-OWNEBP is an expert module for boosting the resulting NEs. Furthermore, RDF-GM, the triple store, and the SPARQL GUI play an important role in our paper in representing enriched results so that users can cruise the LOD cloud. On the other hand, the UM-OWNLP ontology has a special ability to represent the ontological mappings obtained from UM-OWNLP.

3 Framework Overview
Figure 5 shows the modular implementation of our proposed framework. In a nutshell, the framework enables users to pick interesting subtitled YouTube video metadata, retrieve the related caption metadata from the YouTube Data API, extract NEs and the other main portions through the integrator modules, and then shift the results of the integrator modules towards RDF-GM and its affiliated portions so that users can cruise the LOD cloud. Below we describe the data flow process, our proposed modules, and the SYMF-SE (Proposed) user interface in detail.

3.1 Data Flow Process
Our data flow process is shown in Figure 5. It can be summarized as follows: 1) The user enters the desired keyword or video ID to receive the keyword-based textual metadata available from the YouTube Data API. 2) The content of the requested keyword or video ID is sent by a standard REST protocol, using HTTP GET requests, to the YouTube Data API. 3) The popular keyword-based textual metadata is redirected by the YouTube Data API to the end client. All the returned videos have subtitles (captions).

The returned textual metadata contains the video title, video ID, description, publishing date, updating date, author information, recording location, rating average, five snapshots, user comments, related tags, views, likes, caption language, etc. 4) Here, the user has a particular ability to pick interesting metadata; therefore, the keyword-based textual metadata placed on the LOD cloud is based on the user's choice. The selected video metadata, together with its video ID, is then redirected to semantic annotation. 5) The selected video metadata is sent to RDF-GM to be saved on the user's disk storage. 6) To get the related caption metadata of the subtitled YouTube media fragments, the SYMF-SE (Proposed) back end sends an automatic REST request to the YouTube Data API. 7) The YouTube Data API returns the related video caption information, containing the subtitle text, the start time of each media fragment, and its duration. We implemented optimized JavaScript code so that, by clicking on a media fragment number, the user jumps to the related start time; the fragment is then highlighted from its start time to its end time. 8) In this step, the user can perform an advanced search on the caption metadata. Simultaneously, he/she can select a related media fragment and jump to it. In addition, we use the Google Translate API to translate the selected subtitle text; our default language is Persian. 9) The chosen caption metadata is sent to RDF-GM to be saved on the user's disk storage. 10) To apply the advantages of the web-based natural language processors to a media fragment, the selected timed text is sent to UM-OWNLP. Here we use 10 web-based NLP tools to accomplish an impressive outcome; apart from NE extraction and disambiguation, UM-OWNLP uses the other main portions of the 10 web-based NLP tools. 11) The unified and enriched output of UM-OWNLP is sent to the user-friendly interface. In addition, we have implemented the UM-OWNLP Ontology as an efficient module that maintains the ontological mappings generated by the 10 web-based NLP tools. For instance, after detecting a person entity type, DBpedia Spotlight returns the URI http://dbpedia.org/ontology/Person to the UM-OWNLP Ontology. Similarly, we have manually collected classes, instances, properties, relations, etc. from the ontological portions used within the natural language processors. 12) In this step, the optimized result of UM-OWNLP is sent to RDF-GM. We propose RDF-GM as a novel RDF generator module for integrating the chosen main keyword-based textual metadata, the selected caption metadata, and the unified results of UM-OWNLP and UM-OWNEBP; it could be reused for various online media. Generally, according to the UM-OWNLP output, RDF-GM generates suitable RDF syntax outputs (e.g., RDF/XML, NTriples, Turtle, TriG, TriX, RDFa, etc.) and saves them on the user's disk storage. By default, for every resulting portion in UM-OWNLP, RDF-GM generates correct RDF syntax outputs in NTriples, Turtle, and RDF/XML; however, this is based on the user's choice (a minimal illustrative sketch of this step with dotnetrdf is given after this data flow description). We used the dotnetrdf library for this purpose, and it supports many RDF syntax outputs. The desired output formats are saved on the user's disk storage (with the portion name as the folder name and the media-fragment-number/video-ID as the file name). 13) For every portion generated by UM-OWNLP, RDF-GM generates the related TriG format to store or load into the in-memory triple store. In this way, the user can pose special SPARQL queries. 14) The end client has a particular capability to boost NEs through UM-OWNEBP.
It contains the six NE booster processors: Amazon Web Services, Google Maps, Wolfram Alpha, Google Books, the Facebook Graph API, and the Internet Movie Database (IMDb). For example, the Bill Gates NE can be sent to UM-OWNEBP, and an enriched list describing the relationships of Bill Gates with the content of these processors is returned. There is a collection of appealing NE booster processors; for instance, the Wolfram Alpha API shows the input interpretation, basic information, image, timeline, notable facts, physical characteristics, estimated net worth, and Wikipedia page-hit history for the Bill Gates NE. 15) The UM-OWNEBP outcome is sent to the user-friendly interface. 16) The boosted outcome of UM-OWNEBP is sent to RDF-GM; the predefined format in which it is stored is RDF/XML. 17) SPARQL is the standard query language for the semantic web and can be used to query large volumes of RDF data. dotnetrdf provides support for querying local in-memory data using its own SPARQL implementation. We used the SPARQL GUI for testing SPARQL queries on arbitrary data sets that the user created by loading them in RDF-GM (or the triple store) and on remote URIs. A user can easily query files that have been stored in the triple store (in TriG format) and the other RDF syntaxes produced by RDF-GM. Based on the retrieved subtitled YouTube videos, the user can cruise the LOD cloud. Simultaneously, the user can generate descriptions of the objects found in a media fragment. In addition, the end client can send his/her own annotations and descriptions, derived from the outcomes of the integrator modules (text/URI), to RDF-GM to complete the process of generating an enriched RDF/XML file. Finally, we obtain an enriched RDF/XML file connected with the video metadata, caption metadata, boosted NEs, and the LD related to the subtitled YouTube video media fragment.
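To make step 12 of the data flow concrete, the sketch below builds a small graph for one selected media fragment with dotnetrdf and serializes it in the three default output syntaxes (NTriples, Turtle, and RDF/XML). It is a minimal illustration only: the example.org namespace, property names, caption text, and file names are assumptions and do not reflect the actual RDF-GM vocabulary.

```csharp
using VDS.RDF;
using VDS.RDF.Writing;

class RdfGmSketch
{
    static void Main()
    {
        IGraph g = new Graph();
        // Illustrative namespace for fragment annotations (not the real RDF-GM vocabulary).
        g.NamespaceMap.AddNamespace("ex", UriFactory.Create("http://example.org/symfse#"));

        // Subject: the selected media fragment (hypothetical video ID and fragment number).
        INode fragment = g.CreateUriNode(UriFactory.Create("http://example.org/symfse/VIDEO_ID/fragment/3"));

        // Caption text, start time, and duration as literal properties.
        g.Assert(new Triple(fragment, g.CreateUriNode("ex:captionText"),
                            g.CreateLiteralNode("Bill Gates founded Microsoft in 1975.")));
        g.Assert(new Triple(fragment, g.CreateUriNode("ex:startSeconds"), g.CreateLiteralNode("120")));
        g.Assert(new Triple(fragment, g.CreateUriNode("ex:durationSeconds"), g.CreateLiteralNode("9.498")));

        // Link a detected named entity to a LOD resource.
        g.Assert(new Triple(fragment, g.CreateUriNode("ex:mentionsEntity"),
                            g.CreateUriNode(UriFactory.Create("http://dbpedia.org/resource/Bill_Gates"))));

        // Serialize the graph in the three default output syntaxes.
        new NTriplesWriter().Save(g, "fragment-3.nt");
        new CompressingTurtleWriter().Save(g, "fragment-3.ttl");
        new RdfXmlWriter().Save(g, "fragment-3.rdf");
    }
}
```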

3.2 SYMF-SE (Proposed) Modules Description

Fig.6 Some overviews of the SYMF-SE user interface and sample results.

1) UM-OWNLP: We propose UM-OWNLP for extracting NEs and the other principal portions of the 10 web-based NLP tools from the subtitles associated with YouTube videos, in order to generate media fragments annotated and made meaningful with resources from the LOD cloud. These tools are AlchemyAPI, DBpedia Spotlight, Yahoo! Content Extraction, Extractiv, OpenCalais, Saplo, Wikimeta, TextRazor, and Zemanta. Recently, research and commercial communities have spent considerable effort publishing NLP services on the web. Besides the common tasks of WSD and POS tagging, they provide further disambiguation facilities with URIs that describe web resources, leveraging the web of real-world objects. The most popular web-based NLP tools use various portions to provide maximum efficiency, such as entity extraction, text categorization, sentiment analysis, concept tagging, and other similar items; however, most research on online video annotation is limited to entity extraction and sometimes topic extraction. In this paper, we have implemented UM-OWNLP as the integrator module of the outcomes of the NLP web-based tool portions. NLP tools represent a clear opportunity for the web community to increase the volume of interconnected data, and therefore they can make a huge contribution to revealing the truly invisible web. This paper presents UM-OWNLP as a framework that unifies the output of 10 different web-based NLP tools. The architecture of UM-OWNLP follows the REST principles and provides a suitable API for machines to exchange content in ideal output modes. It accepts any text/URI of an NLP web-based tool portion or web document, which is analyzed in order to extract its semantic and enriched textual content. GET, POST, and PUT methods manage the requests coming from end clients to retrieve the list of NEs and the main portions of the NLP tools, the classification types, and the text/URIs, for a specific tool or for a combination of them. The output sent back to the user can be serialized in JSON, XML, or RDF, depending on the requested content type. Although the NLP web-based tools share the same target, they use different algorithms and their own classification taxonomies, which makes direct comparison infeasible.

In UM-OWNLP, we use disambiguation information for the detected entities. In addition, the UM-OWNLP Ontology exposes additional ontological mappings for the detected entities. For example, we used the DBpedia Ontology, which currently contains about 2,350,000 instances, as one of the important UM-OWNLP ontology modules, and we utilized the TextRazor ontology as a module that can automatically place entities into an ontology of thousands of categories derived from LD sources.

2) UM-OWNEBP: After the NEs and the other semantic portions of UM-OWNLP are displayed on the user interface, the client can send interesting NEs to UM-OWNEBP. NLP web-based tools such as OpenCalais and Zemanta can report the geographical latitude and longitude of an NE, but they cannot show this geographical information on the Google Maps API. NLP web-based tools can show an NE with its types and subtypes well, but they cannot show boosted information related to it. For example, none of these NLP web-based tools can display the physical characteristics of "Bill Gates"; Wolfram Alpha, however, shows such information as well as the relationships of NEs to one another. We can preview books related to Bill Gates through Google Books, and we can see prices and bibliographic information on the Amazon Web Services API. "Email to Bill Gates" is a movie about the Bill Gates NE, and IMDb presents competent information about this movie to end clients. As seen above, by migrating an NE to a higher level, we can achieve ideal boosting of such NEs. We used JavaScript code to implement some APIs such as Google Maps and Google Books. The framework of UM-OWNEBP follows REST principles, similar to the UM-OWNLP architecture. The boosted outcomes of UM-OWNEBP can be shown in an appropriate graphical/textual form, similar to the UM-OWNLP form. As a result, we need a higher level of representation to obtain boosted NE enrichment, and UM-OWNEBP implements such a level. In addition, UM-OWNEBP is further optimized by collaborating with UM-OWNLP to exploit the number of named entities extracted from the subtitle metadata of a YouTube video, their types, and their appearance on the timeline as features for classifying videos into different categories.

3) RDF-GM: We implemented the RDF generator module (RDF-GM) based on the dotnetrdf library. dotnetrdf provides two main products: a programmer API for working with RDF and SPARQL in .NET code, and a toolbox providing a set of GUI and command-line tools for working with RDF and SPARQL. RDF-GM is a very important module in our work. It has special capabilities for reading RDF, writing RDF, and working with graphs. RDF-GM receives semantic and unified outcomes from UM-OWNLP and produces meaningful RDF syntax outputs on the end client's disk storage; simultaneously, it creates the related TriG format for the in-memory triple store. In addition, RDF-GM is able to produce RDF/XML files corresponding to the selected textual video metadata, caption metadata, and boosted NEs. dotnetrdf currently supports reading RDF files in all the major RDF syntaxes, including NTriples, Turtle, Notation 3, RDF/XML, RDF/JSON, RDFa 1.0, TriG (Turtle with named graphs), TriX (named graphs in XML), and NQuads (NTriples plus context). Through the SPARQL GUI, the end user can send queries to the in-memory triple store and to the RDF syntaxes stored on his/her disk storage. In addition, the end user can write an RDF file to RDF-GM from his/her own annotations of the semantic results of UM-OWNLP. In YouTube, the visualization of media resources is embedded within a web page.
This offers opportunities to embed RDF about media fragments and annotations into the originally displayed multimedia resources and their annotations. With this method, the client side does not need to change anything unless visualization of the media fragments is required, while the server side has to add RDFa to the web pages. Regarding RDFa extraction, dotnetrdf has a parser which extracts RDF embedded as RDFa in HTML and XHTML documents.

3.3 SYMF-SE User Interface
An overview of our web application is shown in Figure 6. SYMF-SE (Proposed) is a web application built with Microsoft Visual C# 2012. A user must first log in to SYMF-SE (Proposed) and is then redirected to the initial level of search. At this level, after the user enters the desired keywords, he/she can pick the main keyword-based textual metadata from the resulting subtitled YouTube videos. For example, by picking the category metadata, we can efficiently compare the YouTube category title with the category found per media fragment (using the text categorization portion). Once the main textual metadata is picked, the user is redirected to the second level of search. At this level, the user can interact meaningfully with the video semantic annotation components: the video player, the subtitle information presenter, and the exhibitor of the unifier modules' outputs. We also implemented navigator subcomponents that make it easy to send desired URIs/text to the unifier modules, send written RDF syntaxes to RDF-GM, send the resulting NEs to UM-OWNEBP, and send user queries to the SPARQL GUI. At the second level, the user also has a valuable ability to seek within the media fragments of the selected subtitled YouTube video, so that when the user selects a desired media fragment, he/she jumps to its specific start time. For this purpose, we implemented JavaScript code supported by the YouTube feed API. The selected media fragment is highlighted from its start time to its end time, and the timed-text information is sent to the subtitle information presenter component.

Here, subtitle information such as the closed-caption text, media fragment number, start time, duration, and end time is sent to UM-OWNLP. The unified outcomes of UM-OWNLP are then sent to the exhibitor of the unifier modules' outputs, where the user can see them. The end client can then make further use of the navigator subcomponents and the other main unifier/generator modules. For instance, the user has ideal facilities to send a desired query to the SPARQL endpoint, write RDF syntaxes and send them to RDF-GM, send interesting resulting NEs to UM-OWNEBP, enter a desired URI/text for the unifier modules, and send his/her metadata annotations (e.g., ratings, comments, etc.) to the YouTube platform. Finally, the end client obtains user-friendly and enriched cruising on the LOD cloud.

4 Evaluation
4.1 Precision and Recall of SYMF-SE (at the initial level of search)
In this section, we analyze the retrieval effectiveness of SYMF-SE (Proposed) at the initial (video metadata retrieval) level of search. Both precision and recall were considered for measuring the effectiveness of the search engine. Since the YouTube Data API returns the results related to the requested subtitled YouTube videos, the SYMF-SE search strategy depends on the Google search strategies. Here, recall is the ratio of the number of relevant subtitled YouTube videos retrieved to the total number of relevant subtitled YouTube videos in the database. Since we send our search queries to the YouTube database, we have no information about the overall number of relevant subtitled videos located in it. For precision, we divided the tested search queries into: 1) one-word queries, 2) simple multi-word queries, and 3) complex multi-word queries. For the more relevant, less relevant, and irrelevant categories, the assigned grades are 2, 1, and 0, respectively. The precision is calculated as the sum of the grades of the subtitled YouTube videos retrieved by SYMF-SE (Proposed), divided by the total number of subtitled YouTube videos selected for evaluation.

1) Precision of SYMF-SE for Simple One-word Queries: Table 1 shows that 61% of the subtitled YouTube videos retrieved by SYMF-SE are less relevant. It can also be seen that 28% of these videos are more relevant and only a small percentage (11%) are irrelevant. The overall precision of SYMF-SE was 1.17. Among the search queries, the lowest precision was for the query "Music" (0.96). We assigned number 1 to the "Computer", number 2 to the "Multimedia", number 3 to the "Internet", and number 4 to the "Music" search queries.

Table 1. Precision of SYMF-SE for simple one-word queries.
Search query     Videos evaluated   More relevant   Less relevant   Irrelevant   Precision
1 (Computer)     25                 9               15              1            1.32
2 (Multimedia)   25                 6               16              3            1.12
3 (Internet)     25                 8               16              1            1.28
4 (Music)        25                 5               14              6            0.96
Total            100                28%             61%             11%          1.17

2) Precision of SYMF-SE for Simple Multi-word Queries: Table 2 shows the search results of SYMF-SE for simple multi-word queries. It can be seen from the table that 28% of the subtitled YouTube videos are less relevant while 54% are more relevant; only 18% are irrelevant. Overall, the total precision for simple multi-word queries is the highest in comparison with the simple one-word queries. Regarding per-query precision, "Bill Gates" has the highest grade among the search queries, while "Cloud computing" has the lowest precision grade.
We assigned number 1 to the "Cloud Computing", number 2 to the "Bill Gates", number 3 to the "Comedy movie", and number 4 to the "Rock music" search queries.

Table 2. Precision of SYMF-SE for simple multi-word queries.
Search query          Videos evaluated   More relevant   Less relevant   Irrelevant   Precision
1 (Cloud Computing)   25                 12              5               8            1.16
2 (Bill Gates)        25                 22              2               1            1.84
3 (Comedy movie)      25                 9               13              3            1.24
4 (Rock music)        25                 11              8               6            1.20
Total                 100                54%             28%             18%          1.36
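As a worked instance of the precision formula defined above (grades 2, 1, and 0 for more relevant, less relevant, and irrelevant), the first row of Table 1 gives precision = (9 x 2 + 15 x 1 + 1 x 0) / 25 = 33 / 25 = 1.32, and the overall one-word figure is (28 x 2 + 61 x 1 + 11 x 0) / 100 = 1.17, matching the totals row of Table 1.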

3) Precision of SYMF-SE for Complex Multi-word Queries: As shown in Table 3, 46% of the subtitled YouTube videos were less relevant, 20% were irrelevant, and 34% fell into the more relevant category. The precision for complex multi-word queries was found to be 1.14. We assigned number 1 to the "Rock music of European", number 2 to the "Bill Gates in Harvard", number 3 to the "SaaS in cloud computing", and number 4 to the "Movies of Jerry Lewis" search queries.

Table 3. Precision of SYMF-SE for complex multi-word queries.
Search query                  Videos evaluated   More relevant   Less relevant   Irrelevant   Precision
1 (Rock music of European)    25                 2               18              5            0.88
2 (Bill Gates in Harvard)     25                 13              8               4            1.36
3 (SaaS in cloud computing)   25                 13              6               6            1.28
4 (Movies of Jerry Lewis)     25                 6               14              5            1.04
Total                         100                34%             46%             20%          1.14

Figure 7 shows that the average precision of SYMF-SE at the video metadata retrieval level, over the simple one-word, simple multi-word, and complex multi-word queries, is 1.22.

Fig.7 Precision average of SYMF-SE at the video metadata retrieval level.

4.2 Precision of SYMF-SE (at the semantic level of search)
We carried out an evaluation to show that we are effectively able to retrieve the subtitles (captions) of YouTube videos and run the main portions located in UM-OWNLP on them. We collected 100 subtitled YouTube videos in total and evaluated them across four different YouTube channels: Tech, Science and Education, Sports, and News. In this section, the precision evaluation consisted of two steps: 1) getting all the subtitles of the YouTube videos, and 2) producing relevant and useful outcomes through UM-OWNLP. We aggregated the categories of portions of the web-based natural language processors supported by UM-OWNLP and divided them into 11 NLP portions: entity extraction, keyword extraction, topic extraction, word detection, sentiment analysis, phrase detection, text categorization, meaning detection, concept tagging, relation extraction, and dependency parse graph. Regarding the first step, since we send the request for the subtitled YouTube videos via the REST protocol to the YouTube Data API, our experimental results showed that YouTube returns each of the 100 YouTube videos with its subtitles located in an XML file. For the second step, disregarding environmental sounds in the subtitles (e.g., train passings, "<tututu>") and subtitles whose caption text needed refinement (to eliminate extra characters such as "#"), we present our experimental results below. We define precision as the ratio of the number of more relevant outcomes of the portions retrieved to the sum of the less relevant and more relevant outcomes of the portions retrieved from UM-OWNLP. We extracted 25 subtitled YouTube videos per YouTube channel, so 100 subtitled YouTube videos were analyzed in total. We then evaluated 25 caption texts of media fragments per subtitled YouTube video in each YouTube channel category (in total, 2,500 media fragments were analyzed). In the Tech channel, we retrieved 2,957 media fragments from the 25 extracted subtitled YouTube videos. In the Science and Education channel, we retrieved 4,531 media fragments from the 25 extracted subtitled YouTube videos. In the Sports channel, we retrieved 3,866 media fragments from the 25 subtitled videos extracted from YouTube.
Finally, in the News channel, we retrieved 3,722 media fragments from the 25 extracted subtitled YouTube videos (see Table 4).

Table 4. Outcomes of SYMF-SE precision at the semantic level of search.
UM-OWNLP portion         More relevant media fragments   Less relevant media fragments   Precision                        Precision
                         Tech  Science  Sports  News     Tech  Science  Sports  News     Tech   Science  Sports  News    average
Entity extraction        23    23       21      24       2     2        4       1        0.92   0.92     0.84    0.96    91%
Keyword extraction       24    25       23      25       1     0        2       0        0.96   1        0.92    1       97%
Topic extraction         22    23       24      24       3     2        1       1        0.88   0.92     0.96    0.96    93%
Word detection           25    25       25      25       0     0        0       0        1      1        1       1       100%
Sentiment analysis       21    24       20      24       4     1        5       1        0.84   0.96     0.8     0.96    89%
Phrases detection        25    25       25      25       0     0        0       0        1      1        1       1       100%
Text categorization      19    23       21      24       6     2        4       1        0.76   0.92     0.84    0.96    87%
Meaning detection        23    24       22      23       2     1        3       2        0.92   0.96     0.88    0.92    92%
Concept tagging          16    19       18      17       9     6        7       8        0.64   0.76     0.72    0.68    70%
Relation extraction      24    23       25      25       1     2        0       0        0.96   0.92     1       1       97%
Dependency parse graph   25    25       25      25       0     0        0       0        1      1        1       1       100%
Average                  88%   96%      92%     96%      12%   4%       8%      4%       89%    94%      90%     95%     92%

We achieved good precision on the media fragments within the YouTube channels. For example (see Figure 8), across all channels the entity extraction portion has an average precision of 91%, which is an acceptable result; that is, the average ratio of more relevant outcomes to the sum of more relevant and less relevant outcomes, over the four evaluated channels, is 91% (e.g., (0.92 + 0.92 + 0.84 + 0.96) / 4 from Table 4). Regarding the precision average, the concept tagging portion has the lowest percentage.

On the other hand, the word detection, phrase detection, and dependency parse graph portions have the highest percentages. In total, we obtain an average precision of 92% over all 11 portions. Regarding per-channel precision (see Figure 11), the News channel, at 95%, has the highest precision among the channels, while the Tech channel, at 89%, has the lowest.

Fig.8 Average precision of the UM-OWNLP portion categories over all the evaluated YouTube channels.

In terms of the average share of relevant media fragments, the News channel, with 96% more relevant (see Figure 9) and 4% less relevant (see Figure 10), has the highest average, while the Tech channel, with 88% more relevant and 12% less relevant, has the lowest.

Fig.9 More relevant media fragment average of the four YouTube channels.
Fig.10 Less relevant media fragment average of the four YouTube channels.
Fig.11 Precision average of the four YouTube channels.

In [7], the authors evaluated only the average number of NEs and entities extracted across three YouTube channels (on 60 YouTube videos). Their evaluation covered the entity extraction portion using the nine limited entity types of the NERD module: thing, person, function, organization, location, product, time, amount, and event. In contrast, we evaluated our experimental results on the 11 main portions of the web-based NLP tools. Furthermore, we could utilize the thousands of entity types located in the 10 web-based NLP tools in our entity extraction portion. In addition, apart from the entity extraction portion, we evaluated the outcomes of the 10 other main portions located in UM-OWNLP. Our experimental results on 100 subtitled YouTube videos showed that these 10 portions can have a very important effect on enriching and making meaningful the textual and temporal features of subtitled YouTube media fragments.

4.3 Evaluation of our approach against previous work
We divided our comparisons into two parts. In the first part, we compare UM-OWNLP against the NE extraction module of [8] (called NERD). In the second part, we compare our RDF generator module with the RDF API of [7]. In comparison with the NERD module [8], UM-OWNLP has several advantages: 1) it presents a framework that supports all the features of the 10 web-based NLP tools, such as concept tagging, entity extraction, keyword extraction, text categorization, sentiment analysis, relation extraction, event identification, fact identification, entailment extraction, and many others, and represents them in a unified format; 2) it presents the UM-OWNLP ontology to support all the ontological features of the 10 web-based NLP tools and represent unified ontological outputs;

3) it uses TextRazor, an impressive and capable web-based NLP tool, instead of the Evri tool used in [8]; 4) it uses all the NE types and subtypes of the 10 web-based NLP tools in its outcome; 5) it allows user tests by URI/text (according to all the features of the web-based NLP tools); and 6) it represents unified outcomes with the highest average precision in comparison with the NERD output (in the entity extraction portion) and in many other cases. Finally, in comparison with the RDF API of [7], our RDF-GM has particular advantages: 1) it provides facilities for reading RDF, writing RDF, working with graphs, working with triple stores, and querying with SPARQL by using the dotnetrdf library; 2) it offers all the RDF syntaxes supported by dotnetrdf for reading RDF, including NTriples, Turtle, Notation 3, RDF/XML, RDF/JSON, RDFa 1.0, TriG, TriX, and NQuads; 3) it offers all the RDF syntaxes supported by dotnetrdf for writing graphs, including NTriples, Turtle, Notation 3, RDF/XML, RDF/JSON, and XHTML + RDFa; 4) it loads and saves triple stores into an in-memory triple store module; 5) it embeds user annotations, boosted NEs, caption metadata, and picked video metadata into an enriched RDF/XML format; 6) it exports every UM-OWNLP outcome into an enriched RDF/XML format, according to the video ID and media fragment number; 7) it stores all the resulting RDF syntaxes on the user's disk storage according to user preferences; 8) it saves an in-memory triple store as a file in an RDF dataset format such as TriG, TriX, or NQuads according to user preferences; and 9) it uses the SPARQL GUI as a dedicated tool to send user queries to the enriched semantic files generated by the in-memory triple store and RDF-GM.

5 Conclusion and Future Work
The enormous growth of subtitled YouTube video data repositories has increased the need for semantic video indexing techniques. In this paper, we discussed a new way of efficiently and semantically indexing subtitled YouTube media fragment content by extracting the main portions from the captions with web-based natural language processors. We introduced integrator modules whose outcomes are associated with subtitled YouTube media fragments. LD provides a suitable way to expose, index, and search media fragments and annotations on the semantic web, using URIs for the identification of resources and RDF as a structured data format. Here, LOD aims to publish and connect open but heterogeneous databases by applying the LD principles; the aggregation of all LOD data sets is denoted as the LOD Cloud. Finally, by implementing important semantic web components such as the SPARQL GUI and RDF-GM, the end client can cruise the LOD cloud effectively. On the other hand, with tools such as CaptionTube, a user can effortlessly create suitable captions for his/her own YouTube videos. Therefore, as owners can distribute hundreds of thousands of YouTube videos in a moment and easily convert them to subtitled YouTube videos, we can achieve efficient semantic indexing and annotation of subtitled YouTube content with SYMF-SE (Proposed). For future work, we plan to apply our proposed approaches to object-featured video summarization and the video categorization field. In addition, we are developing a query-builder interface that creates the SPARQL queries, so that an end user without knowledge of SPARQL can easily interact with the enriched semantic files located on the user's disk storage.
6 Acknowledgments
We would like to acknowledge the contributors to this project, including the service providers of the well-known web-based natural language processors, such as AlchemyAPI, DBpedia Spotlight, TextRazor, Zemanta, OpenCalais, and many others, who helped us in implementing this project.

References:
[1] B. Haslhofer, W. Jochum, R. King, C. Sadilek, and K. Schellner, "The LEMO annotation framework: weaving multimedia annotations with the web," International Journal on Digital Libraries, vol. 10, pp. 15-32, 2009.
[2] J. Waitelonis and H. Sack, "Augmenting video search with linked open data," in Proc. of the Int. Conf. on Semantic Systems, 2009, pp. 1-9.
[3] O. Erling and I. Mikhailov, "RDF support in the Virtuoso DBMS," in Networked Knowledge - Networked Media, Springer, 2009, pp. 7-24.
[4] R. Arndt, R. Troncy, S. Staab, L. Hardman, and M. Vacura, "COMM: designing a well-founded multimedia ontology for the web," in The Semantic Web, Springer, 2007, pp. 30-43.
[5] T. Steiner and M. Hausenblas, "SemWebVid - Making video a first class semantic web citizen and a first class web bourgeois," in ISWC Posters & Demos, 2010, pp. 1-8.
[6] T. Steiner, R. Verborgh, R. Van de Walle, M. Hausenblas, and J. Gabarró Vallès, "Crowdsourcing event detection in YouTube videos," 2012, pp. 58-67.
[7] Y. Li, G. Rizzo, R. Troncy, M. Wald, and G. Wills, "Creating enriched YouTube media fragments with NERD using timed-text," 2012, pp. 1-4.
[8] G. Rizzo and R. Troncy, "NERD: A framework for unifying named entity recognition and disambiguation extraction tools," in Proceedings of the Demonstrations at the 13th Conference of the European Chapter of the Association for Computational Linguistics, 2012, pp. 73-76.
[9] J. Waitelonis, N. Ludwig, and H. Sack, "Use what you have: Yovisto video search engine takes a semantic turn," in Semantic Multimedia, Springer, 2011, pp. 173-185.
[10] B. Farhadi and M. B. Ghaznavi Ghoushchi, "Creating a novel semantic video search engine through enrichment textual and temporal features of subtitled YouTube media fragments," in 3rd International Conference on Computer and Knowledge Engineering, 2013.