Bringing Named Entity Recognition on Drupal Content Management System
José Ferrnandes 1 and Anália Lourenço 1,2

1 ESEI - Escuela Superior de Ingeniería Informática, University of Vigo, Edificio Politécnico, Campus Universitario As Lagoas s/n, Ourense, Spain
2 IBB - Institute for Biotechnology and Bioengineering, Centre of Biological Engineering, University of Minho, Campus de Gualtar, Braga, Portugal
[email protected], analia@{ceb.uminho.pt,uvigo.es}

Abstract. Content management systems and frameworks (CMS/F) play a key role in Web development. They support common Web operations and provide for a number of optional modules to implement customized functionalities. Given the increasing demand for text mining (TM) applications, it seems logical that CMS/F extend their offer of TM modules. In this regard, this work contributes to Drupal CMS/F with modules that support customized named entity recognition and enable the construction of domain-specific document search engines. Implementation relies on well-recognized Apache Information Retrieval and TM initiatives, namely Apache Lucene, Apache Solr and Apache Unstructured Information Management Architecture (UIMA). As proof of concept, we present here the development of a Drupal CMS/F that retrieves biomedical articles and performs automatic recognition of organism names to enable further organism-driven document screening.

Keywords: Drupal, text mining, named entity recognition, Apache Lucene, Apache Solr, Apache UIMA.

1 Introduction

The number of generic Text Mining (TM) software tools available now is considerable [1, 2], and almost every computer language has some module or package dedicated to natural language processing [3]. Notably, Biomedical TM (BioTM), i.e. the area of TM dedicated to applications in the Biomedical domain, has grown considerably [4, 5].
One of the main challenges in BioTM is achieving a good integration of TM tools with tools that are already part of the user workbench, in particular data curation pipelines [4, 6]. Many TM products (especially commercial products) are built in a monolithic way; often, their interfaces are not disclosed and open standards are not fully supported [7]. It is also important to note that biomedical users have grown dependent on Web resources and tools, such as online data repositories, online and downloadable data analysis tools, and scientific literature catalogues [8]. Therefore, it is desirable to integrate TM tools with these resources and tools, and it seems logical to equip Web development frameworks, such as Content Management Systems and Frameworks (CMS/F), with highly customizable TM modules.

J. Sáez-Rodríguez et al. (eds.), 8th International Conference on Practical Appl. of Comput. Biol. & Bioinform. (PACBB 2014), Advances in Intelligent Systems and Computing 294, DOI: / _31, Springer International Publishing Switzerland

Drupal, which is one of the most common open source CMS/Fs ( builtwith.com/cms), already offers some contributions to Bioinformatics and TM: the GMOD Drupal Bioinformatic Server Framework [9], which aims to speed up the development of Drupal modules for bioinformatics applications; the OpenCalais project, which integrates Drupal with the Thomson Reuters' Calais Web service, a service for annotating texts with URIs from the Linked Open Data cloud; and RDF/RDFa support, so as to enable the use of this ontology language in Web knowledge exchange, facilitate the development of document-driven applications, and promote the availability of knowledge resources [10].

The aim of this work was to further extend Drupal TM capabilities, notably to enable the incorporation of third-party specialized software and the development of customized applications. The proof of concept addressed Named Entity Recognition (NER), i.e. the identification of textual references to entities of interest, which is an essential step in automatic text processing pipelines [11, 12]. There are many open-source and free NER tools available, covering a wide range of bio-entities and approaches. So, our efforts were focused on implementing a Drupal module that would support customised NER and, in particular, on equipping Drupal with the necessary means to construct domain-specific document search engines. For this purpose, we relied on Apache Information Retrieval (IR) and TM initiatives, namely Apache Lucene, Apache Solr and Apache Unstructured Information Management Architecture (UIMA). The next sections describe the technologies used and their integration in the new Drupal module.
The recognition of species names in scientific papers using the state-of-the-art and open source Linnaeus tool [13] is presented as an example of application.

2 Apache Software Foundation Information Retrieval and Extraction Initiatives

The Apache organization supports some of the most important open source projects for the Web [14]. The Web server recommended to run Drupal CMS/F is the Apache HTTP Server [15]. Now, we want to take advantage of Apache Lucene, Apache Solr and Apache UIMA to incorporate IR and TM capabilities in Drupal CMS/F.

Apache Lucene and Apache Solr are two distinct Java projects that have joined forces to provide a powerful, effective, and fully featured search tool. Solr is a standalone enterprise search server with a REST-like API [16] and Lucene is a high-performance and scalable IR library [17]. Due to its scalability and performance, Lucene is one of the most popular free IR libraries [17, 18]. Besides the inverted index for efficient document retrieval, Lucene provides search enhancing features, namely: a rich set of chainable text analysis components, such as tokenizers and language-specific stemmers; a query syntax with a parser and a variety of query types that range from simple term lookup to fuzzy term matching; a scoring algorithm, with flexible means to affect the scoring; and utilities such as the highlighter, the query spell-checker, and "more like this" [16].

Apache Solr can be seen as an enabling layer for Apache Lucene that extends its capabilities in order to support, among others: external configuration via XML; advanced full-text search capabilities; standards-based open interfaces (e.g. XML, JSON and HTTP); extensions to the Lucene Query Language; and Apache UIMA integration for configurable metadata extraction ( features.html).

Apache UIMA started as an IBM Research project with the aim of delivering a powerful infrastructure to store, transport, and retrieve documents and annotation knowledge accumulated in NLP pipeline systems [19]. Currently, Apache UIMA supports further types of unstructured information besides text, such as audio, video and images, and is a de facto industry standard and software framework for content analysis [20]. Its main focus is ensuring interoperability between the processing components, thus allowing a stable data transfer through the use of common data representations and interfaces.

3 New Supporting Drupal NER Module

We looked for a tight integration of the aforementioned Apache technologies in order to provide the basic means to deploy any NER task, namely those regarding basic natural language processing and text annotation (Fig. 1). Using XML as the common document interchange format, the new module allows the incorporation of third-party NER tools through pre-existing or newly developed Apache UIMA wrappers.

Fig. 1. Interoperation of Apache technology and third-party NER tools in the Drupal NER module

The proof of concept was the development of a Drupal CMS/F that retrieves scientific articles from the PubMed Central Open Access subset (PMC-OA) [21] and performs automatic recognition of organism mentions to enable further organism-driven document screening. The next subsections detail the interoperation of the different technologies and the integration of the Linnaeus UIMA NER wrapper as a means to deploy such a CMS/F.
3.1 Document Retrieval and Indexing

Documents are retrieved from the PMC-OA through the FTP service. An XSLT stylesheet is used to specify the set of rules that guide document format transformation. Notably, the XML Path (XPath) language is used to identify matching nodes and navigate through the elements and attributes in the PMC-OA XML documents. After that, the Apache Solr engine is able to execute the UIMA-based NER pipeline to identify textual references of interest (in this case, organism names) and produce a list of named entities. The textual references are included in the metadata of the documents, and the entities recognized are added to the Apache Lucene index as a means to enable further organism-specific document retrieval by the Drupal application.

3.2 Document Processing and Annotation

Apache UIMA supports the creation of highly customized document processing pipelines [22]. At the beginning of any processing pipeline is the Collection Reader component (Fig. 2), which is responsible for document input and interaction. Whenever a document is processed by the pipeline, a new object-based data structure, named Common Analysis Structure (CAS), is created. UIMA associates a Type System (TS), like an object schema for the CAS, which defines the various types of objects that may be discovered in documents. The TS can be extended by the developer, permitting the creation of very rich type systems.

Fig. 2. High-Level UIMA Component Architecture

This CAS is processed throughout the pipeline and information can be added to the object by the Analysis Engine (AE) at different stages. The UIMA framework treats AEs as pluggable, compatible, discoverable, managed objects that analyze documents as needed. An AE consists of two components: Java classes, typically packaged as one or more JAR files, and AE descriptors, consisting of one or more XML files.
The simplest type of AE is the primitive type, which contains a single annotator at its core (e.g. a tokenizer), but AEs can be combined into an Aggregate Analysis Engine (AAE). The basic building block of any AE is the Annotator, which comprises the analysis algorithms responsible for the discovery of the desired types and the CAS update for downstream processing. At the end of the processing pipeline are the CAS Consumers, which receive the CAS objects after they have been analyzed by the AE/AAE and conduct the final CAS processing.
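The flow described in this subsection (a reader creates a CAS, annotators enrich it in sequence, a consumer receives the result) can be pictured with a small model. The sketch below is only an illustration and deliberately does not use the UIMA API: `ToyCas`, `ToyAnnotator`, `PipelineSketch` and the sample sentence are hypothetical names introduced here, and annotations are reduced to plain strings.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for a CAS: the document text plus the annotations added so far. */
class ToyCas {
    final String text;
    final List<String> annotations = new ArrayList<>();

    ToyCas(String text) {
        this.text = text;
    }
}

/** Hypothetical stand-in for an annotator: reads the CAS and may add annotations to it. */
interface ToyAnnotator {
    void process(ToyCas cas);
}

class PipelineSketch {
    /** Runs the annotators in order on one shared CAS, as an aggregate engine would. */
    static ToyCas run(String document, List<ToyAnnotator> annotators) {
        ToyCas cas = new ToyCas(document);      // role played by the Collection Reader
        for (ToyAnnotator annotator : annotators) {
            annotator.process(cas);             // each step sees earlier annotations
        }
        return cas;                             // what a CAS Consumer would receive
    }

    public static void main(String[] args) {
        // A primitive "engine" wrapping a whitespace tokenizer.
        ToyAnnotator tokenizer = cas -> {
            for (String token : cas.text.split("\\s+")) {
                cas.annotations.add("Token:" + token);
            }
        };
        // A second step that tags one hard-coded organism name.
        ToyAnnotator organismTagger = cas -> {
            if (cas.text.contains("Escherichia coli")) {
                cas.annotations.add("Organism:Escherichia coli");
            }
        };

        List<ToyAnnotator> steps = new ArrayList<>();
        steps.add(tokenizer);
        steps.add(organismTagger);

        ToyCas result = run("Escherichia coli grows fast", steps);
        for (String annotation : result.annotations) {
            System.out.println(annotation);
        }
    }
}
```

In real UIMA the annotations are typed feature structures defined by the Type System rather than strings, and the ordering of the steps is declared in the aggregate descriptor rather than in code.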
3.3 Integration of Third-Party NER Tools

Apache UIMA supports seamless integration of third-party TM tools such as NER tools. Indeed, there already exist UIMA wrappers for several state-of-the-art NER tools, such as the organism tagger Linnaeus [23]. The first step to create a UIMA annotator wrapper is to define the AE Descriptor, i.e. the XML file that contains the information about the annotator, such as the configuration parameters, data structures, annotator input and output data types, and the resources that the annotator uses. The UIMA Eclipse plug-ins help in this creation by auto-generating this file based on the options configured in a point-and-click window (Fig. 3 - A).

Fig. 3. UIMA Eclipse plug-in windows

The AE is then able to load the annotator in the UIMA pipeline, and the next step is to define the TS, namely the output types produced by the annotator, as described in the AE descriptor file (Fig. 3 - B).

    public void process(JCas cas) throws AnalysisEngineProcessException {
        // Run the Linnaeus matcher over the full document text.
        String text = cas.getDocumentText();
        List<Mention> mentions = matcher.match(text);
        for (Mention mention : mentions) {
            String mostProbableId = mention.getMostProbableID();
            String idsToString = mention.getIdsToString();
            // Create one LinnaeusSpecies annotation per mention.
            LinnaeusSpecies species = new LinnaeusSpecies(cas);
            species.setBegin(mention.getStart());
            species.setEnd(mention.getEnd());
            species.setMostProbableSpeciesId(mostProbableId);
            species.setAllIdsString(idsToString);
            species.setAmbigous(mention.isAmbigous());
            // Register the annotation in the CAS indexes.
            species.addToIndexes();
        }
    }

Fig. 4. process() method of the LinnaeusWrapper.java class
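To make the begin and end offsets copied in Fig. 4 more concrete, the following sketch shows how a dictionary lookup can yield character spans for each mention. It is a toy exact-substring scan written for illustration, not the matching algorithm that Linnaeus implements; `ToyMention`, `DictionaryMatcherSketch` and the two dictionary entries are assumptions made here (the identifiers 562 and 9606 are the NCBI Taxonomy IDs of Escherichia coli and Homo sapiens).

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical mention record: character span [start, end) plus an identifier. */
class ToyMention {
    final int start;
    final int end;
    final String id;

    ToyMention(int start, int end, String id) {
        this.start = start;
        this.end = end;
        this.id = id;
    }

    @Override
    public String toString() {
        return id + "[" + start + "," + end + ")";
    }
}

class DictionaryMatcherSketch {
    /** Finds every exact, case-sensitive occurrence of each dictionary name in the text. */
    static List<ToyMention> match(String text, Map<String, String> dictionary) {
        List<ToyMention> mentions = new ArrayList<>();
        for (Map.Entry<String, String> entry : dictionary.entrySet()) {
            String name = entry.getKey();
            if (name.isEmpty()) {
                continue;                        // an empty name would match everywhere
            }
            int from = 0;
            int hit;
            while ((hit = text.indexOf(name, from)) >= 0) {
                mentions.add(new ToyMention(hit, hit + name.length(), entry.getValue()));
                from = hit + name.length();      // continue after this occurrence
            }
        }
        return mentions;
    }

    public static void main(String[] args) {
        Map<String, String> dictionary = new LinkedHashMap<>();
        dictionary.put("Escherichia coli", "562");
        dictionary.put("Homo sapiens", "9606");

        String text = "Escherichia coli colonises the gut of Homo sapiens.";
        for (ToyMention mention : match(text, dictionary)) {
            System.out.println(mention + " -> " + text.substring(mention.start, mention.end));
        }
    }
}
```

The spans follow the same convention as UIMA annotations, where the end offset is exclusive, so `text.substring(start, end)` recovers the covered text.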
The implementation of the AE's Annotator is based on the standard interface AnalysisComponent. Basically, the wrapping of third-party tools implies the implementation of the annotator process() method, i.e. the desired Annotator logic (Fig. 4). The CAS Visual Debugger is useful while implementing and testing the UIMA wrappers, in particular the annotators (Fig. 5). Once the wrappers are fully functional, the pipeline is ready to be used.

Fig. 5. Linnaeus UIMA wrapper running on a PMC Open Access article

3.4 Integration between Drupal and Apache Solr

Drupal allows developers to alter and customize the functionality of almost all of its components. Here, we developed a new Drupal module, named Views Solr Backend, to allow the easy and flexible querying of the Apache Solr index. The module is written in PHP and uses the APIs available for Drupal Views and Solarium, an Apache Solr client library for PHP applications [24]. This Drupal module can be easily configured and is even able to support different configurations for multiple Solr hosts (Fig. 6). Notably, it enables the administration of network parameters, the path to connect to the Solr host, and other parameters regarding the presentation of the search results in Drupal. After configuration, it is possible to query any Solr schema, i.e. all indexed documents and the corresponding annotations. Therefore, our Drupal Views module simplifies custom query display while increasing the interoperability with the CMS.
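Whatever client library is used, a search of this kind ultimately reaches Solr as an HTTP request carrying standard query parameters. The sketch below assembles such a request URL; it is written in Java to match the wrapper listing rather than in the module's PHP, and it is not code from the Views Solr Backend module. The host, the core name `articles` and the field name `organism_id` are hypothetical, while `q`, `fq`, `rows` and `wt` are standard Solr request parameters.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

class SolrQuerySketch {
    /** URL-encodes one parameter value as UTF-8. */
    static String enc(String value) {
        try {
            return URLEncoder.encode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is always available on a Java platform.
            throw new IllegalStateException(e);
        }
    }

    /**
     * Builds a select URL: free-text query in q, organism restriction in fq.
     * The field name organism_id is an assumption about the index schema.
     */
    static String selectUrl(String baseUrl, String text, String organismId, int rows) {
        return baseUrl + "/select"
                + "?q=" + enc(text)
                + "&fq=" + enc("organism_id:" + organismId)
                + "&rows=" + rows
                + "&wt=json";
    }

    public static void main(String[] args) {
        // Hypothetical host and core name.
        String url = selectUrl("http://localhost:8983/solr/articles",
                "biofilm formation", "562", 10);
        System.out.println(url);
    }
}
```

Placing the organism restriction in a filter query (`fq`) rather than in `q` keeps it out of relevance scoring and lets Solr cache the filter independently of the free-text part.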
Fig. 6. Drupal module setup and presentation

4 Conclusions and Future Work

Drupal is a powerful and agile CMS/F that suits a number of development efforts in the Biomedical domains. Given the increasing demand for automatic document processing in support of the population of biomedical knowledge systems, BioTM has become almost a required module in such a framework. This work addressed this need through the exploitation of major open source IR and IE initiatives. The new Drupal Views Solr Backend module, available at the Drupal.org website, integrates Apache Solr with Drupal and thus enables the implementation of customized search engines in Drupal applications. Moreover, the Apache Solr UIMA plug-in developed here for the Linnaeus NER tool exemplifies the integration of third-party NER tools in the document analysis processes of Apache Solr, granting a powerful and seamless means of specialized document annotation and indexing.

Acknowledgements. This work was supported by the IBB-CEB, the Fundação para a Ciência e Tecnologia (FCT) and the European Community fund FEDER, through Program COMPETE [FCT Project number PTDC/SAU-SAP/113196/2009/FCOMP FEDER ], and the Agrupamento INBIOMED from DXPCTSUG-FEDER unha maneira de facer Europa (2012/273). The research leading to these results has received funding from the European Union's Seventh Framework Programme FP7/REGPOT under grant agreement n , BIOCAPS. This document reflects only the author's views and the European Union is not liable for any use that may be made of the information contained herein.
References

1. Kano, Y., Baumgartner, W.A., McCrohon, L., et al.: U-Compare: share and compare text mining tools with UIMA. Bioinformatics 25, (2009), doi: /bioinformatics/btp
2. Fan, W., Wallace, L., Rich, S., Zhang, Z.: Tapping the power of text mining. Commun. ACM 49, (2006), doi: /
3. van Gemert, J.: Text Mining Tools on the Internet: An overview. Univ. Amsterdam 25, 1-75 (2000)
4. Lourenço, A., Carreira, R., Carneiro, S., et al.: A workbench for biomedical text mining. J. Biomed. Inform. 42, (2009), doi: /j.jbi
5. Hucka, M., Finney, A., Sauro, H.: A medium for representation and exchange of biochemical network models (2003)
6. Lu, Z., Hirschman, L.: Biocuration workflows and text mining: overview of the BioCreative, Workshop Track II. Database (Oxford) 2012:bas043 (2012), doi: /database/bas
7. Feinerer, I., Hornik, K., Meyer, D.: Text Mining Infrastructure in R. J. Stat. Softw. 25, 1-54 (2008), doi:citeulike-article-id:
8. Fernández-Suárez, X.M., Rigden, D.J., Galperin, M.Y.: The 2014 Nucleic Acids Research Database Issue and an updated NAR online Molecular Biology Database Collection. Nucleic Acids Res. 42, 1-6 (2014), doi: /nar/gkt
9. Papanicolaou, A., Heckel, D.G.: The GMOD Drupal bioinformatic server framework. Bioinformatics 26, (2010), doi: bioinformatics/btq
10. Decker, S., Melnik, S., van Harmelen, F., et al.: The Semantic Web: the roles of XML and RDF. IEEE Internet Comput. 4, (2000), doi: /
11. Rebholz-Schuhmann, D., Kafkas, S., Kim, J.-H., et al.: Monitoring named entity recognition: The League Table. J. Biomed Semantics 4, 19 (2013), doi: /
12. Rzhetsky, A., Seringhaus, M., Gerstein, M.B.: Getting started in text mining: Part two. PLoS Comput. Biol. 5, e (2009), doi: /journal.pcbi
13. Gerner, M., Nenadic, G., Bergman, C.M.: LINNAEUS: A species name identification system for biomedical literature. BMC Bioinformatics 11, 85 (2010), doi: /
14. Fielding, R.T., Kaiser, G.: The Apache HTTP Server Project. IEEE Internet Comput. (1997), doi: /
15. Web server Drupal.org.,
16. Smiley, D., Pugh, E.: Apache Solr 3 Enterprise Search Server, p. 418 (2011)
17. McCandless, M., Hatcher, E., Gospodnetic, O.: Lucene in Action, Second Edition: Covers Apache Lucene 3.0, p. 475 (2010)
18. Konchady, M.: Building Search Applications: Lucene, LingPipe, and Gate, p. 448 (2008)
19. Ferrucci, D., Lally, A.: UIMA: An architectural approach to unstructured information processing in the corporate research environment. Nat. Lang. Eng. (2004)
20. Rak, R., Rowley, A., Ananiadou, S.: Collaborative Development and Evaluation of Text-processing Workflows in a UIMA-supported Web-based Workbench. In: LREC (2012)
21. Lin, J.: Is searching full text more effective than searching abstracts? BMC Bioinformatics 10, 46 (2009), doi: /
22. Baumgartner, W.A., Cohen, K.B., Hunter, L.: An open-source framework for large-scale, flexible evaluation of biomedical text mining systems. J. Biomed. Discov. Collab. 3, 1 (2008), doi: /
23. Móra, G.: Concept identification by machine learning aided dictionary-based named entity recognition and rule-based entity normalisation. Second CALBC Work
24. Kumar, J.: Apache Solr PHP Integration, p. 118 (2013)
A Digital Library Feasibility Study
A Digital Library Feasibility Study C. Henshaw, D. Thompson, M. Savage-Jones Wellcome Library London, UK LIBER Annual Conference Aarhus, Denmark June 2010 Introduction 1. Who we are 2. Vision and strategy
Service Oriented Architecture
Service Oriented Architecture Charlie Abela Department of Artificial Intelligence [email protected] Last Lecture Web Ontology Language Problems? CSA 3210 Service Oriented Architecture 2 Lecture Outline
How To Write A Drupal 5.5.2.2 Rdf Plugin For A Site Administrator To Write An Html Oracle Website In A Blog Post In A Flashdrupal.Org Blog Post
RDFa in Drupal: Bringing Cheese to the Web of Data Stéphane Corlosquet, Richard Cyganiak, Axel Polleres and Stefan Decker Digital Enterprise Research Institute National University of Ireland, Galway Galway,
Implementing Ontology-based Information Sharing in Product Lifecycle Management
Implementing Ontology-based Information Sharing in Product Lifecycle Management Dillon McKenzie-Veal, Nathan W. Hartman, and John Springer College of Technology, Purdue University, West Lafayette, Indiana
THE SEMANTIC WEB AND IT`S APPLICATIONS
15-16 September 2011, BULGARIA 1 Proceedings of the International Conference on Information Technologies (InfoTech-2011) 15-16 September 2011, Bulgaria THE SEMANTIC WEB AND IT`S APPLICATIONS Dimitar Vuldzhev
General principles and architecture of Adlib and Adlib API. Petra Otten Manager Customer Support
General principles and architecture of Adlib and Adlib API Petra Otten Manager Customer Support Adlib Database management program, mainly for libraries, museums and archives 1600 customers in app. 30 countries
Semantic Web Services for e-learning: Engineering and Technology Domain
Web s for e-learning: Engineering and Technology Domain Krupali Shah and Jayant Gadge Abstract E learning has gained its importance over the traditional classroom learning techniques in past few decades.
Technical. Overview. ~ a ~ irods version 4.x
Technical Overview ~ a ~ irods version 4.x The integrated Ru e-oriented DATA System irods is open-source, data management software that lets users: access, manage, and share data across any type or number
Functional Requirements for Digital Asset Management Project version 3.0 11/30/2006
/30/2006 2 3 4 5 6 7 8 9 0 2 3 4 5 6 7 8 9 20 2 22 23 24 25 26 27 28 29 30 3 32 33 34 35 36 37 38 39 = required; 2 = optional; 3 = not required functional requirements Discovery tools available to end-users:
Flattening Enterprise Knowledge
Flattening Enterprise Knowledge Do you Control Your Content or Does Your Content Control You? 1 Executive Summary: Enterprise Content Management (ECM) is a common buzz term and every IT manager knows it
Semaphore Overview. A Smartlogic White Paper. Executive Summary
Semaphore Overview A Smartlogic White Paper Executive Summary Enterprises no longer face an acute information access challenge. This is mainly because the information search market has matured immensely
Semantic Knowledge Management System. Paripati Lohith Kumar. School of Information Technology
Semantic Knowledge Management System Paripati Lohith Kumar School of Information Technology Vellore Institute of Technology University, Vellore, India. [email protected] Abstract The scholarly activities
Associate Professor, Department of CSE, Shri Vishnu Engineering College for Women, Andhra Pradesh, India 2
Volume 6, Issue 3, March 2016 ISSN: 2277 128X International Journal of Advanced Research in Computer Science and Software Engineering Research Paper Available online at: www.ijarcsse.com Special Issue
Inside the Digital Commerce Engine. The architecture and deployment of the Elastic Path Digital Commerce Engine
Inside the Digital Commerce Engine The architecture and deployment of the Elastic Path Digital Commerce Engine Contents Executive Summary... 3 Introduction... 4 What is the Digital Commerce Engine?...
Perspectives of Semantic Web in E- Commerce
Perspectives of Semantic Web in E- Commerce B. VijayaLakshmi M.Tech (CSE), KIET, A.GauthamiLatha Dept. of CSE, VIIT, Dr. Y. Srinivas Dept. of IT, GITAM University, Mr. K.Rajesh Dept. of MCA, KIET, ABSTRACT
Adding Robust Digital Asset Management to Oracle s Storage Archive Manager (SAM)
Adding Robust Digital Asset Management to Oracle s Storage Archive Manager (SAM) Oracle's Sun Storage Archive Manager (SAM) self-protecting file system software reduces operating costs by providing data
How To Evaluate Web Applications
A Framework for Exploiting Conceptual Modeling in the Evaluation of Web Application Quality Pier Luca Lanzi, Maristella Matera, Andrea Maurino Dipartimento di Elettronica e Informazione, Politecnico di
DataDirect XQuery Technical Overview
DataDirect XQuery Technical Overview Table of Contents 1. Feature Overview... 2 2. Relational Database Support... 3 3. Performance and Scalability for Relational Data... 3 4. XML Input and Output... 4
data.bris: collecting and organising repository metadata, an institutional case study
Describe, disseminate, discover: metadata for effective data citation. DataCite workshop, no.2.. data.bris: collecting and organising repository metadata, an institutional case study David Boyd data.bris
San Jose State University
San Jose State University Fall 2011 CMPE 272: Enterprise Software Overview Project: Date: 5/9/2011 Under guidance of Professor, Rakesh Ranjan Submitted by, Team Titans Jaydeep Patel (007521007) Zankhana
SEMANTIC WEB BASED INFERENCE MODEL FOR LARGE SCALE ONTOLOGIES FROM BIG DATA
SEMANTIC WEB BASED INFERENCE MODEL FOR LARGE SCALE ONTOLOGIES FROM BIG DATA J.RAVI RAJESH PG Scholar Rajalakshmi engineering college Thandalam, Chennai. [email protected] Mrs.
Data Modeling for Big Data
Data Modeling for Big Data by Jinbao Zhu, Principal Software Engineer, and Allen Wang, Manager, Software Engineering, CA Technologies In the Internet era, the volume of data we deal with has grown to terabytes
How To Build A Cloud Based Intelligence System
Semantic Technology and Cloud Computing Applied to Tactical Intelligence Domain Steve Hamby Chief Technology Officer Orbis Technologies, Inc. [email protected] 678.346.6386 1 Abstract The tactical
A CLOUD-BASED FRAMEWORK FOR ONLINE MANAGEMENT OF MASSIVE BIMS USING HADOOP AND WEBGL
A CLOUD-BASED FRAMEWORK FOR ONLINE MANAGEMENT OF MASSIVE BIMS USING HADOOP AND WEBGL *Hung-Ming Chen, Chuan-Chien Hou, and Tsung-Hsi Lin Department of Construction Engineering National Taiwan University
2012 LABVANTAGE Solutions, Inc. All Rights Reserved.
LABVANTAGE Architecture 2012 LABVANTAGE Solutions, Inc. All Rights Reserved. DOCUMENT PURPOSE AND SCOPE This document provides an overview of the LABVANTAGE hardware and software architecture. It is written
Increase Agility and Reduce Costs with a Logical Data Warehouse. February 2014
Increase Agility and Reduce Costs with a Logical Data Warehouse February 2014 Table of Contents Summary... 3 Data Virtualization & the Logical Data Warehouse... 4 What is a Logical Data Warehouse?... 4
Applying MDA in Developing Intermediary Service for Data Retrieval
Applying MDA in Developing Intermediary Service for Data Retrieval Danijela Boberić Krstićev University of Novi Sad Faculty of Sciences Trg Dositeja Obradovića 4, Novi Sad Serbia +381214852873 [email protected]
