Annotated Corpora in the Cloud: Free Storage and Free Delivery
Graham Wilcock
University of Helsinki

Abstract

The paper describes a technical strategy for implementing natural language processing applications in the cloud. Annotated corpora can be stored in the cloud and queried in normal web browsers via user interfaces implemented in the described framework. A key aim of the strategy is to exploit the free storage and processing that is available in the cloud, while avoiding lock-in to proprietary infrastructure. A half-million-word annotated corpus application is described as a working example of the strategy.

1. Introduction

The paper describes a technical strategy for designing and implementing natural language processing applications in the cloud in such a way that annotated corpora can be queried and displayed in ordinary web browsers. There are many different strategies for cloud computing, but rather than giving a superficial review of a variety of alternatives, the paper focusses on describing one specific approach. This approach can be summarized as "open source front-end, proprietary (but free) back-end". The paper focusses exclusively on approaches that offer free storage of the corpora in the cloud, and free delivery of the corpora contents and annotations to the web browser. The example corpus application that demonstrates these approaches does not currently support collaborative development of the annotations.

Like many other applications, corpus applications can be regarded as having three main parts. The front-end is the user interface, typically consisting of a set of web pages and ways to navigate between them. The back-end is where the data is stored, typically in a database. The application processing takes place somewhere in the middle. This division into three parts is well known in computer science as the model-view-controller (MVC) design pattern.
Here, the back-end database is the model, the front-end user interface is the view, and the application processing in the middle is the controller. In the case of a cloud computing application, the data is stored in some special kind of cloud data store and the processing is done in a special cloud run-time environment, but it is important that the user interface works in an ordinary web browser.

The component parts of the technical strategy are described in the next section. Section 3 then reviews related work. Section 4 describes an implemented example application, in which an annotated corpus is stored in the cloud and is queried from ordinary web browsers. Problems and solutions from this implementation are discussed in Section 5, and Section 6 presents conclusions.

2. A Technical Strategy

This section sets out a technical strategy for design and implementation of cloud-based applications. A key aim of the strategy is to take advantage of the free storage and processing quotas that are available in the cloud, while avoiding lock-in to one specific proprietary infrastructure. We believe that this can be achieved by appropriate choices of the front-end and back-end components. The choices proposed in this technical strategy are Django, an open source web framework, and Google App Engine, a proprietary cloud computing platform. The strategy of "open source front-end, proprietary (but free) back-end" is therefore more specifically implemented as "Django front-end, Google App Engine back-end".

2.1. The cloud computing framework

Google App Engine (appengine) is a platform for running web apps in the cloud on Google's infrastructure. One of the motivations for choosing App Engine as the preferred cloud computing framework is that Google currently allows applications to be run entirely free of charge, as long as they stay within certain quotas. The quotas apply to several dimensions: processing power, overall storage capacity, individual file sizes, and response times.
Significant applications can be implemented within the free quotas, and can be hosted on Google's infrastructure with zero running costs. As with other cloud frameworks, there are no maintenance costs for server hardware or server software.

The Google architectures are massively scalable. If the quotas are exceeded, App Engine is no longer free, but this will only occur if the applications are massively successful, which is a very desirable problem. Even in this case, there is no obligation to pay for the additional resources required to meet the higher demand. The application can simply be restricted to the free quotas. The users will experience this as longer response times or reduced service availability at times of high demand, but there will be no charges unless billing has been authorized.

When selecting a framework that is currently free of charge, the danger of lock-in to the specific technology must be considered, in case charging is introduced at some time in the future. This important question is addressed in Section 5.3.
Figure 1: Example application: tokenized text.

2.2. The web app front-end

App Engine includes its own simple web app framework, but other standards-compliant front-end frameworks can be imported. Our strategy uses Django (djangoproject.com), a successful and widely-used open source Python web app framework (Holovaty and Kaplan-Moss, 2009).

Django provides a wide range of components that speed up web app development. One of the most important is the Django template engine, which supports dynamic generation of HTML web pages. The template slots are filled in with the relevant information from the specific context, using appropriate filters, conditionals and loops. Collections of templates can be managed by organizing them into template hierarchies, where more specific templates inherit information from base templates. Inheritance can take place at several different levels.

Django also provides a clean way to manage the mapping between the application URLs and the processing code that handles the HTTP requests, and an object-relational mapping (ORM) between the object-oriented Python processing code and the back-end relational database models.

2.3. The database back-end

Django is normally used with an SQL database. This can be a full-scale database system such as MySQL or a light database such as SQLite3. By contrast, App Engine is normally used with its own non-relational datastore, which is based on Google's BigTable technology.

The advantages of using the App Engine datastore are that its use is free within the quotas, while being massively scalable if required. However, there are two main disadvantages. First, the non-relational NoSQL architecture is less familiar to most developers than standard SQL databases. Second, there could be a danger of lock-in to Google's proprietary technology.

The example application described in Section 4 originally used the App Engine datastore back-end together with the App Engine web app front-end.
This version can be viewed online. The prototype has subsequently been re-implemented to make it portable, so that either a MySQL relational database or an App Engine non-relational datastore can be used.

It is possible to combine a Django front-end with an App Engine datastore back-end. This version of our example application can be seen at appspot.com. It has recently become possible to use a MySQL database with App Engine in the Google Cloud SQL service (http://code.google.com/p/googlecloudsql). Another version of our example application, combining Django and MySQL with App Engine, can also be viewed online.

2.4. Application processing

In our strategy the application processing that connects the front-end user interface and the back-end database is written in Python. We use the NLTK Natural Language Toolkit (Bird et al., 2009) for the language processing tasks where possible, while organizing the user interaction within the Django framework. NLTK provides a set of tools and resources for natural language processing. Like Django, NLTK is a successful and widely-used open source
Python toolkit. The ready-made NLTK tools include a sentence boundary detector nltk.sent_tokenize(), a word tokenizer nltk.word_tokenize(), a part-of-speech tagger nltk.pos_tag() and a classifier-based named entity recognizer nltk.ne_chunk(). In addition, NLTK includes useful wordlists, such as lists of stopwords. NLTK also includes a complete version of WordNet, and a convenient Python-WordNet interface. However, there are some technical issues in using these tools with Google App Engine, which are discussed further in Section 5.2.

Figure 2: Example application: part-of-speech tags and a tooltip explanation.

2.5. Annotation format

The most widely-used markup language for linguistic annotation of texts is XML. While it is generally agreed that XML should be used for external interchange of linguistic annotations, as it is the global standard for data interchange, it is not necessarily the best choice for internal representation of annotations. When working in Python it is more convenient to use JSON as an internal representation. Python objects can be serialized easily and quickly to JSON strings, and JSON strings can be deserialized easily and quickly to Python objects.

Our strategy therefore recommends storing linguistic annotations in JSON format in the back-end database. Typically, complete chapters of novels can be stored as long text strings in the database, even when expanded by adding linguistic annotations.

3. Related Work

Corpus linguistics is usually done with corpus tools such as WordSmith and AntConc. WordSmith (Scott, 2008) is a proprietary concordancing tool for Windows. AntConc (Antony, 2005) is a freeware concordancing tool for Windows, Mac or Linux (waseda.ac.jp/software.html). In both cases these tools are typically used on a PC with the corpus and the corpus tool locally installed. Their strong point is that users can easily collect their own corpora and process them with these tools.
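To make the annotation format of Section 2.5 concrete, the following minimal sketch shows one way a sentence's annotations might be serialized to a JSON string for storage and deserialized again. The field names (word, pos, chunk) and the list-of-dictionaries layout are assumptions made for illustration; the paper does not specify the schema used in the example application.

```python
import json

# Hypothetical annotation layout: one dictionary per token.
# The field names are illustrative assumptions, not the paper's schema.
sentence = [
    {"word": "Richard", "pos": "NNP", "chunk": "B-NP"},
    {"word": "--", "pos": ":", "chunk": "O"},
    {"word": "and", "pos": "CC", "chunk": "O"},
]

# Serialize the Python objects to a JSON string, which can be stored
# as a text field in an SQL table or in a non-relational datastore.
stored = json.dumps(sentence)

# Deserialize the stored string back into Python objects.
restored = json.loads(stored)
```

Because the stored value is an ordinary string, the same representation can be written to either kind of back-end discussed in Section 2.3.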
A radically different approach enables corpus queries from ordinary web browsers. This has two major advantages: the user does not need to install special software, and the user does not need to store local copies of the corpora. A good example of a web-based interface to an annotated corpus is BNCweb (Hoffmann et al., 2008), a web interface for the British National Corpus. In BNCweb the front-end user interface runs in an ordinary web browser and provides extensive facilities for querying the corpus, viewing concordances, and other services. The back-end MySQL database contains the British National Corpus, converted from its original XML format and indexed for fast processing with MySQL. However, BNCweb runs on conventional web servers, not in the cloud.

In earlier work (Wilcock, 2010) we described a prototype that demonstrated the use of language technology in a cloud computing environment. This version can be viewed online. It runs on Google App Engine and presents a web browser interface to an annotated corpus of Jane Austen novels. The browser displays different types of annotations, including part-of-speech tagging, phrase chunks, and word sense definitions from WordNet. However, Wilcock (2010) did not address the problem of how to avoid lock-in to a proprietary framework. This is an important question that we discuss in Section 5.3.

Figure 3: Example application: NP, PP, VP phrase chunks.

4. An Example Application

Screenshots from the example application with the half-million-word annotated corpus of Jane Austen texts are shown in Figures 1 to 6.

Although we use NLTK tools for language processing as much as possible, the example application does not use the NLTK tokenizer nltk.word_tokenize() because there are specific problems in tokenizing the Gutenberg texts of the Jane Austen novels. One problem is the use of a double hyphen (--) to represent a dash. Wilcock (2010) gives an example from the third sentence in Northanger Abbey which includes the string Richard--and. This is tokenized as a single token by the standard NLTK tokenizer. Our example application therefore uses a regular expression tokenizer that splits this string correctly into three tokens. This can be seen in Figure 1.

The example application also does not use the NLTK part-of-speech tagger nltk.pos_tag(), for the reasons given in Section 5. The application uses an alternative pure Python tagger trained on the NLTK Treebank corpus, a subset of the full Penn Treebank corpus. The tagger is uploaded into App Engine as a pickle file. An example of text with part-of-speech tags can be seen in Figure 2.

Phrase chunks for NPs, PPs, and VPs are identified using NLTK's regular expression parser over POS tag sequences, and are annotated with IOB chunk labels. Phrase chunking is displayed with colour-coded highlighting as shown in Figure 3.

Simple word frequencies and concordances can also be displayed, as shown in Figure 4 and Figure 5.
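The two regular expression steps described in this section, tokenization that separates the double hyphen and chunking over POS tag sequences, can be sketched in plain Python. This is an illustrative sketch only: the token pattern, the NP pattern and the function names are assumptions, not the expressions used in the example application, which relies on NLTK's regular expression parser for chunking. Only NP chunks are shown.

```python
import re

# Assumed token pattern: a double hyphen is tried first, then words
# (allowing internal apostrophes), then any other single non-space character.
TOKEN_PATTERN = re.compile(r"--|\w+(?:'\w+)*|[^\w\s]")

# Assumed NP pattern over a string of bracketed tags:
# optional determiner, any adjectives, one or more nouns.
NP_PATTERN = re.compile(r"(?:<DT>)?(?:<JJ[RS]?>)*(?:<NN[SP]*>)+")


def tokenize(text):
    """Split text into tokens, keeping '--' as a separate token."""
    return TOKEN_PATTERN.findall(text)


def iob_np_labels(tagged):
    """Return one IOB label per (word, tag) pair, marking NP chunks only."""
    tag_string = "".join("<%s>" % tag for _, tag in tagged)
    labels = ["O"] * len(tagged)
    for match in NP_PATTERN.finditer(tag_string):
        # Token index of the chunk start = number of tags before the match.
        first = tag_string.count("<", 0, match.start())
        length = match.group().count("<")
        labels[first] = "B-NP"
        for i in range(first + 1, first + length):
            labels[i] = "I-NP"
    return labels


tokens = tokenize("his name was Richard--and he had never been handsome.")
tagged = [("the", "DT"), ("respectable", "JJ"), ("man", "NN"),
          ("walked", "VBD"), ("to", "TO"), ("Bath", "NNP")]
labels = iob_np_labels(tagged)
```

Matching over a string of tags rather than over the words themselves is the same general idea as chunking over POS tag sequences; PP and VP patterns could be added in the same way.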
The word frequency and concordance displays are both rather basic, and certainly do not match the sophistication of dedicated concordance tools such as WordSmith, AntConc or BNCweb. The concordances are created using NLTK's ConcordanceIndex class, and show all occurrences of a word in a novel, not chapter by chapter. The offsets for the whole novel are calculated off-line and uploaded to the datastore in a serialized JSON format.

Words are also annotated with word sense definitions using NLTK's Python-WordNet interface. Words that have WordNet definitions are highlighted, and the definition pops up in a tooltip when the mouse hovers over the word, as shown in Figure 6. The range of possible definitions for each word is restricted by the part-of-speech tag already decided by the POS tagger. A simple form of word sense disambiguation is used to select one definition to be displayed. This is based on the simplified Lesk algorithm, with the most frequent WordNet sense as back-off.

5. Technical Issues

This section discusses some potential problems relevant to our strategy and describes solutions. First, there are restrictions imposed by Google App Engine in order to support scalability. Next, there are some technical issues in using
specific NLTK tools with App Engine. Finally, there is the danger of lock-in to Google's proprietary framework.

Figure 4: Example application: simple word frequencies.

5.1. Scalability and restrictions

When Google App Engine was designed, one of the key requirements was that it must allow massive scalability. As a result, small applications must be designed for scalability in the same way as large applications. To ensure scalability, various restrictions are imposed on all App Engine applications. There are different types of restrictions: on the programming language, the maximum number of files, the maximum file size, and so on.

A major programming language restriction is that the code must be pure Python, not depending on modules implemented in other languages such as C. This means that you cannot upload code that uses numpy, which is written in C. You cannot use cPickle, but you can use the pure Python pickle module.

Up to now, the maximum file count in an App Engine application has been 3,000. If you bundle large packages (such as Django or NLTK) with your app, you could hit this limit. However, this problem can be avoided by using zipimport (Sanderson, 2008). In fact, recent versions of Django are included in recent versions of App Engine, so you do not need to bundle Django with your app, as Sanderson (2008) points out.

Up to now, the maximum file size allowed in App Engine has been 10 megabytes. In the NLTK version of WordNet, the file containing all the nouns is just over 15 megabytes, so the WordNet data cannot itself be uploaded into App Engine. Files can be annotated with WordNet definitions off-line, and the annotated files can be uploaded so long as they are less than 10 megabytes.

For the Jane Austen novels each chapter text fits easily within the maximum, and when annotations are added for part-of-speech tags and other small features, the file size is still less than the limit.
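The size constraint described above can be checked off-line before uploading. The following is a minimal sketch under stated assumptions: the 10 megabyte limit is read as 10 * 1024 * 1024 bytes, annotations are held as a list of dictionaries serialized to JSON, and the function and field names are hypothetical.

```python
import json

# Assumed reading of the 10 megabyte limit quoted in the text.
MAX_FILE_BYTES = 10 * 1024 * 1024


def serialized_size(annotated_chapter):
    """Return the size in bytes of the chapter serialized as UTF-8 JSON."""
    return len(json.dumps(annotated_chapter).encode("utf-8"))


def fits_upload_limit(annotated_chapter, limit=MAX_FILE_BYTES):
    """Return True if the serialized chapter is within the size limit."""
    return serialized_size(annotated_chapter) <= limit


# Hypothetical annotated chapter: one dictionary per token.
chapter = [
    {"word": "Catherine", "pos": "NNP"},
    {"word": "walked", "pos": "VBD"},
]
ok_to_upload = fits_upload_limit(chapter)
```

Running such a check as part of the off-line annotation step shows which chapters need to be reduced before they are uploaded.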
When WordNet definitions are added, however, the file size increases drastically because the definition strings are quite long and many words have multiple definitions, so some chapters can exceed the limit. This problem is solved by doing word sense disambiguation, so that only one definition is used.

5.2. NLTK and App Engine

NLTK includes a wide range of components implemented by different people in different ways, and some of them use numpy or other C modules. This means that you cannot simply do import nltk in App Engine. As Wilcock (2010) points out, there are two ways to use NLTK with App Engine.

One way is to use NLTK off-line to create the required annotations. If the annotations are saved, for example as JSON text files, these files can be included in the folders uploaded to the cloud as part of your App Engine app. This approach has the advantage that you can use all the NLTK components with no restrictions, even if they use C or numpy.

The other way is to make a stripped-down version of NLTK in a new folder, only including specific components that use pure Python. Then you can include this new NLTK folder in your app, and you can do import nltk. In this approach, annotations are created by tools running inside the App Engine framework.

As noted above, tools written in pure Python can be used in App Engine, but tools written in C cannot be used. Some of the NLTK tools are pure Python so they can be imported into App Engine successfully, but some cannot. For those that cannot, alternative pure Python tools should be used. Further details of which NLTK tools can and cannot be used in App Engine are discussed in Wilcock (2010).

Figure 5: Example application: a simple word concordance.

5.3. Avoiding lock-in

There has recently been controversy about changes in the pricing scheme for commercial applications in App Engine, but free quotas are still available and in some cases the quotas have even been increased. While it is very attractive to run natural language processing applications and linguistic corpora free of charge on Google's infrastructure, there is always the possibility that charging might be introduced in the future. It is therefore advisable to beware of the danger of lock-in to one proprietary system, and even to have an exit strategy in case of need.

The danger of lock-in to Google's framework can be largely avoided by taking two steps. The first step concerns the web app front-end. By using a well-designed and widely-used open source web framework like Django, it will be much easier to move the application away from Google infrastructure to a more traditional server if that is desired in future, because standard servers can run standard Django web apps.

The second step concerns the back-end datastore. Although Django is normally used with standard SQL databases, Django's ORM (object-relational mapping) maps Python objects (logical models) to relations (database tables). This allows an SQL database to be used from Python code without actually writing SQL statements. The open source django-nonrel project (Kornewald and Wanschik, 2011) is an extension of standard Django that maps Python objects at a higher level of abstraction, allowing either SQL databases or NoSQL databases to be used with the same models, provided the data models have not been designed around specific SQL-only or specific NoSQL-only features.
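The idea of keeping application code independent of the storage back-end can be illustrated without Django. The sketch below is only an analogy to what a mapping layer provides, not the django-nonrel API: the class names, the put/get interface and the word_count function are all hypothetical, with an in-memory SQLite database standing in for an SQL back-end and a dictionary standing in for a non-relational datastore.

```python
import json
import sqlite3


class SqlChapterStore:
    """Stand-in for an SQL back-end, using an in-memory SQLite database."""

    def __init__(self):
        self.conn = sqlite3.connect(":memory:")
        self.conn.execute(
            "CREATE TABLE chapter (chapter_id TEXT PRIMARY KEY, annotations TEXT)")

    def put(self, chapter_id, annotations):
        self.conn.execute(
            "INSERT OR REPLACE INTO chapter VALUES (?, ?)",
            (chapter_id, json.dumps(annotations)))

    def get(self, chapter_id):
        row = self.conn.execute(
            "SELECT annotations FROM chapter WHERE chapter_id = ?",
            (chapter_id,)).fetchone()
        return json.loads(row[0]) if row else None


class KeyValueChapterStore:
    """Stand-in for a non-relational datastore, using a plain dictionary."""

    def __init__(self):
        self.items = {}

    def put(self, chapter_id, annotations):
        self.items[chapter_id] = json.dumps(annotations)

    def get(self, chapter_id):
        value = self.items.get(chapter_id)
        return json.loads(value) if value is not None else None


def word_count(store, chapter_id):
    """Application code that relies only on the shared put/get interface."""
    return len(store.get(chapter_id))


chapter = [{"word": "Richard", "pos": "NNP"}, {"word": "and", "pos": "CC"}]
counts = []
for store in (SqlChapterStore(), KeyValueChapterStore()):
    store.put("ch1", chapter)
    counts.append(word_count(store, "ch1"))
```

The application function never mentions SQL or the dictionary, so the back-end can be swapped without changing it, provided the stored data avoids features specific to one kind of store.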
Django-nonrel thus makes Django web apps portable between SQL databases and the App Engine datastore, thereby avoiding the danger of lock-in (Wanschik et al., 2010). However, django-nonrel is not included in standard Django and is not supported by the Django Software Foundation.

If using django-nonrel is considered unsuitable, there are two main alternatives. One option is to keep the Django front-end and re-write the database back-end to use the App Engine datastore explicitly. We did this for our example application and the conversion was very easy, as the ORM mappings for Django and App Engine are very similar. This version runs at django-appeng.appspot.com.

The other alternative is to use Django with a MySQL database and not with the App Engine datastore. This avoids re-writing any code, and there are possibilities for running the application in the cloud. One option is to use the Google Cloud SQL service, which combines App Engine with a MySQL database. The pricing for Google Cloud SQL is not yet known, but the preview service is free. A version of our example application with Django and MySQL runs on Google Cloud SQL at appspot.com.

6. Conclusions and Future Work

The working example application shows that free storage and free delivery of annotated corpora can be achieved by
the approach described. Care must be taken to avoid lock-in to one proprietary infrastructure, but this risk can be minimized by adopting open source web frameworks like Django as basic components of the application.

Figure 6: Example application: a word sense definition in a tooltip.

Section 5.3 discussed approaches to avoiding lock-in. One option is to use Django with MySQL and not the App Engine datastore, because MySQL can be used in a wide range of environments, either on cloud services or on conventional web servers. Using MySQL with the Google Cloud SQL service is currently free, but charging is expected later. There are other Platform-as-a-Service providers, such as Red Hat Cloud, offering free cloud services including Django and MySQL. We are currently setting up another MySQL-based instance of our example corpus application on Red Hat Cloud at rhcloud.com.

As we use JSON format rather than XML for the annotations (as mentioned in Section 2.5), we are currently investigating document-oriented databases that use JSON format directly. These include CouchDB and MongoDB (which uses binary JSON: BSON). We are setting up a MongoDB-based instance of our example corpus application on Red Hat Cloud at rhcloud.com.

Future work will develop better methods for handling word frequency analysis and more sophisticated concordance queries, at least including multi-word phrases and part-of-speech tags. Several further corpora will be made available in the cloud, starting with the Brown Corpus, which is nicely divided into small files ready for uploading and offers scope for genre-based concordance querying.

For this workshop, the most interesting future work would be to combine cloud delivery with crowd sourcing. App Engine has facilities for individual user authentication and for maintaining user-specific records in the datastore.
If standoff markup is used, updated annotations input by individual users could be stored as alternatives without damage to the existing annotations. Crowd sourcing algorithms could be deployed to decide which alternatives should be applied as updates to the displayed corpora. These possibilities await further work.

7. References

Laurence Antony. 2005. AntConc: Design and development of a freeware corpus analysis toolkit for the technical writing classroom. In Proceedings of International Professional Communication Conference.

Steven Bird, Ewan Klein, and Edward Loper. 2009. Natural Language Processing with Python. O'Reilly.

Sebastian Hoffmann, Stefan Evert, Nicholas Smith, David Lee, and Ylva Berglund Prytz. 2008. Corpus Linguistics with BNCweb - a Practical Guide. Peter Lang, Frankfurt am Main.

Adrian Holovaty and Jacob Kaplan-Moss. 2009. The Definitive Guide to Django (second edition). Apress.

Waldemar Kornewald and Thomas Wanschik. 2011. Django-nonrel - NoSQL support for Django. projects/django-nonrel.

Dan Sanderson. 2008. Using Django 1.0 on App Engine with ZipImport. com/appengine/articles/django10_zipimport.html.

Mike Scott. 2008. WordSmith Tools version 5. Liverpool: Lexical Analysis Software.

Thomas Wanschik, Waldemar Kornewald, and Wesley Chun. 2010. Running Pure Django Projects on Google App Engine. appengine/articles/django-nonrel.html.

Graham Wilcock. 2010. Cloud computing for the humanities: Two approaches for language technology. In Human Language Technologies - The Baltic Perspective: Proceedings of the Fourth International Conference Baltic HLT 2010, Riga.
An NLP Curator (or: How I Learned to Stop Worrying and Love NLP Pipelines)
An NLP Curator (or: How I Learned to Stop Worrying and Love NLP Pipelines) James Clarke, Vivek Srikumar, Mark Sammons, Dan Roth Department of Computer Science, University of Illinois, Urbana-Champaign.
Collecting Polish German Parallel Corpora in the Internet
Proceedings of the International Multiconference on ISSN 1896 7094 Computer Science and Information Technology, pp. 285 292 2007 PIPS Collecting Polish German Parallel Corpora in the Internet Monika Rosińska
Real-Time Analytics on Large Datasets: Predictive Models for Online Targeted Advertising
Real-Time Analytics on Large Datasets: Predictive Models for Online Targeted Advertising Open Data Partners and AdReady April 2012 1 Executive Summary AdReady is working to develop and deploy sophisticated
Ngram Search Engine with Patterns Combining Token, POS, Chunk and NE Information
Ngram Search Engine with Patterns Combining Token, POS, Chunk and NE Information Satoshi Sekine Computer Science Department New York University [email protected] Kapil Dalwani Computer Science Department
XML Processing and Web Services. Chapter 17
XML Processing and Web Services Chapter 17 Textbook to be published by Pearson Ed 2015 in early Pearson 2014 Fundamentals of http://www.funwebdev.com Web Development Objectives 1 XML Overview 2 XML Processing
Cloud Powered Mobile Apps with Microsoft Azure
Cloud Powered Mobile Apps with Microsoft Azure Malte Lantin Technical Evanglist Microsoft Azure Malte Lantin Technical Evangelist, Microsoft Deutschland Fokus auf Microsoft Azure, App-Entwicklung Student
Syllabus INFO-UB-3322. Design and Development of Web and Mobile Applications (Especially for Start Ups)
Syllabus INFO-UB-3322 Design and Development of Web and Mobile Applications (Especially for Start Ups) Fall 2014 Stern School of Business Norman White, KMEC 8-88 Email: [email protected] Phone: 212-998
Developing ASP.NET MVC 4 Web Applications
Course M20486 5 Day(s) 30:00 Hours Developing ASP.NET MVC 4 Web Applications Introduction In this course, students will learn to develop advanced ASP.NET MVC applications using.net Framework 4.5 tools
Budget Event Management Design Document
Budget Event Management Design Document Team 4 Yifan Yin(TL), Jiangnan Shangguan, Yuan Xia, Di Xu, Xuan Xu, Long Zhen 1 Purpose Summary List of Functional Requirements General Priorities Usability Accessibility
PyCantonese: Cantonese linguistic research in the age of big data
PyCantonese: Cantonese linguistic research in the age of big data Jackson L. Lee University of Chicago http://jacksonllee.com Childhood Bilingualism Research Center, CUHK September 15, 2015 Grammar versus
Team Members: Christopher Copper Philip Eittreim Jeremiah Jekich Andrew Reisdorph. Client: Brian Krzys
Team Members: Christopher Copper Philip Eittreim Jeremiah Jekich Andrew Reisdorph Client: Brian Krzys June 17, 2014 Introduction Newmont Mining is a resource extraction company with a research and development
Natural Language Processing
Natural Language Processing 2 Open NLP (http://opennlp.apache.org/) Java library for processing natural language text Based on Machine Learning tools maximum entropy, perceptron Includes pre-built models
31 Case Studies: Java Natural Language Tools Available on the Web
31 Case Studies: Java Natural Language Tools Available on the Web Chapter Objectives Chapter Contents This chapter provides a number of sources for open source and free atural language understanding software
Introducing Apache Pivot. Greg Brown, Todd Volkert 6/10/2010
Introducing Apache Pivot Greg Brown, Todd Volkert 6/10/2010 Speaker Bios Greg Brown Senior Software Architect 15 years experience developing client and server applications in both services and R&D Apache
AUTOMATED CONFERENCE CD-ROM BUILDER AN OPEN SOURCE APPROACH Stefan Karastanev
International Journal "Information Technologies & Knowledge" Vol.5 / 2011 319 AUTOMATED CONFERENCE CD-ROM BUILDER AN OPEN SOURCE APPROACH Stefan Karastanev Abstract: This paper presents a new approach
A stream computing approach towards scalable NLP
A stream computing approach towards scalable NLP Xabier Artola, Zuhaitz Beloki, Aitor Soroa IXA group. University of the Basque Country. LREC, Reykjavík 2014 Table of contents 1
Cloudy with a chance of 0-day
Cloudy with a chance of 0-day November 12, 2009 Jon Rose Trustwave [email protected] The Foundation http://www.owasp.org Jon Rose Trustwave SpiderLabs Phoenix DC AppSec 09! Tom Leavey Trustwave SpiderLabs
Virtual Credit Card Processing System
The ITB Journal Volume 3 Issue 2 Article 2 2002 Virtual Credit Card Processing System Geraldine Gray Karen Church Tony Ayres Follow this and additional works at: http://arrow.dit.ie/itbj Part of the E-Commerce
Sisense. Product Highlights. www.sisense.com
Sisense Product Highlights Introduction Sisense is a business intelligence solution that simplifies analytics for complex data by offering an end-to-end platform that lets users easily prepare and analyze
MongoDB. The Definitive Guide to. The NoSQL Database for Cloud and Desktop Computing. Apress8. Eelco Plugge, Peter Membrey and Tim Hawkins
The Definitive Guide to MongoDB The NoSQL Database for Cloud and Desktop Computing 11 111 TECHNISCHE INFORMATIONSBIBLIO 1 HEK UNIVERSITATSBIBLIOTHEK HANNOVER Eelco Plugge, Peter Membrey and Tim Hawkins
Chunk Parsing. Steven Bird Ewan Klein Edward Loper. University of Melbourne, AUSTRALIA. University of Edinburgh, UK. University of Pennsylvania, USA
Chunk Parsing Steven Bird Ewan Klein Edward Loper University of Melbourne, AUSTRALIA University of Edinburgh, UK University of Pennsylvania, USA March 1, 2012 chunk parsing: efficient and robust approach
Composite Data Virtualization Composite Data Virtualization And NOSQL Data Stores
Composite Data Virtualization Composite Data Virtualization And NOSQL Data Stores Composite Software October 2010 TABLE OF CONTENTS INTRODUCTION... 3 BUSINESS AND IT DRIVERS... 4 NOSQL DATA STORES LANDSCAPE...
Application of NoSQL Database in Web Crawling
Application of NoSQL Database in Web Crawling College of Computer and Software, Nanjing University of Information Science and Technology, Nanjing, 210044, China doi:10.4156/jdcta.vol5.issue6.31 Abstract
Pro<DOC/> e-commerce Technology An Introduction
Pro e-commerce Technology An Introduction From Rightangle Technologies Private Limited (www.rigthangle.co.in) 1 P a g e R i g h t a n g l e T e c h n o l o g i e s P v t. L t d. 1 Problem Statement
owncloud Architecture Overview
owncloud Architecture Overview Time to get control back Employees are using cloud-based services to share sensitive company data with vendors, customers, partners and each other. They are syncing data
HTML5. Turn this page to see Quick Guide of CTTC
Programming SharePoint 2013 Development Courses ASP.NET SQL TECHNOLGY TRAINING GUIDE Visual Studio PHP Programming Android App Programming HTML5 Jquery Your Training Partner in Cutting Edge Technologies
Big Data Database Revenue and Market Forecast, 2012-2017
Wikibon.com - http://wikibon.com Big Data Database Revenue and Market Forecast, 2012-2017 by David Floyer - 13 February 2013 http://wikibon.com/big-data-database-revenue-and-market-forecast-2012-2017/
NoSQL web apps. w/ MongoDB, Node.js, AngularJS. Dr. Gerd Jungbluth, NoSQL UG Cologne, 4.9.2013
NoSQL web apps w/ MongoDB, Node.js, AngularJS Dr. Gerd Jungbluth, NoSQL UG Cologne, 4.9.2013 About us Passionate (web) dev. since fallen in love with Sinclair ZX Spectrum Academic background in natural
2013 Ruby on Rails Exploits. CS 558 Allan Wirth
2013 Ruby on Rails Exploits CS 558 Allan Wirth Background: Ruby on Rails Ruby: Dynamic general purpose scripting language similar to Python Ruby on Rails: Popular Web app framework using Ruby Designed
Responsive, resilient, elastic and message driven system
Responsive, resilient, elastic and message driven system solving scalability problems of course registrations Janina Mincer-Daszkiewicz, University of Warsaw [email protected] Dundee, 2015-06-14 Agenda
Open-Source Daycare Management System Project Proposal
Open-Source Daycare Management System Project Proposal Jason Butz University of Evansville December 3, 2009 Contents 1 Introduction 2 2 Technical Approach 2 2.1 Background..............................................
Rotorcraft Health Management System (RHMS)
AIAC-11 Eleventh Australian International Aerospace Congress Rotorcraft Health Management System (RHMS) Robab Safa-Bakhsh 1, Dmitry Cherkassky 2 1 The Boeing Company, Phantom Works Philadelphia Center
Mining Opinion Features in Customer Reviews
Mining Opinion Features in Customer Reviews Minqing Hu and Bing Liu Department of Computer Science University of Illinois at Chicago 851 South Morgan Street Chicago, IL 60607-7053 {mhu1, liub}@cs.uic.edu
owncloud Architecture Overview
owncloud Architecture Overview owncloud, Inc. 57 Bedford Street, Suite 102 Lexington, MA 02420 United States phone: +1 (877) 394-2030 www.owncloud.com/contact owncloud GmbH Schloßäckerstraße 26a 90443
Building native mobile apps for Digital Factory
DIGITAL FACTORY 7.0 Building native mobile apps for Digital Factory Rooted in Open Source CMS, Jahia s Digital Industrialization paradigm is about streamlining Enterprise digital projects across channels
What is a database? COSC 304 Introduction to Database Systems. Database Introduction. Example Problem. Databases in the Real-World
COSC 304 Introduction to Systems Introduction Dr. Ramon Lawrence University of British Columbia Okanagan [email protected] What is a database? A database is a collection of logically related data for
Course Scheduling Support System
Course Scheduling Support System Roy Levow, Jawad Khan, and Sam Hsu Department of Computer Science and Engineering, Florida Atlantic University Boca Raton, FL 33431 {levow, jkhan, samh}@fau.edu Abstract
Visualization of Semantic Windows with SciDB Integration
Visualization of Semantic Windows with SciDB Integration Hasan Tuna Icingir Department of Computer Science Brown University Providence, RI 02912 [email protected] February 6, 2013 Abstract Interactive Data
Natural Language to Relational Query by Using Parsing Compiler
Available Online at www.ijcsmc.com International Journal of Computer Science and Mobile Computing A Monthly Journal of Computer Science and Information Technology IJCSMC, Vol. 4, Issue. 3, March 2015,
MEAN/Full Stack Web Development - Training Course Package
Brochure More information from http://www.researchandmarkets.com/reports/3301786/ MEAN/Full Stack Web Development - Training Course Package Description: This course pack features a detailed exploration
Foundations for your. portable cloud
Foundations for your portable cloud Start Today Red Hat s cloud vision is unlike that of any other IT vendor. We recognize that IT infrastructure is and will continue to be composed of pieces from many
Big Data Visualization and Dashboards
Big Data Visualization and Dashboards Boney Pandya Marketing Manager Greg Harris Systems Engineer Follow us @Jinfonet #BigDataWebinar JReport Highlights Advanced, Embedded Data Visualization Platform:
WESTERNACHER OUTLOOK E-MAIL-MANAGER OPERATING MANUAL
TABLE OF CONTENTS 1 Summary 3 2 Software requirements 3 3 Installing the Outlook E-Mail Manager Client 3 3.1 Requirements 3 3.1.1 Installation for trial customers for cloud-based testing 3 3.1.2 Installing
Ad Hoc Analysis of Big Data Visualization
Ad Hoc Analysis of Big Data Visualization Dean Yao Director of Marketing Greg Harris Systems Engineer Follow us @Jinfonet #BigDataWebinar JReport Highlights Advanced, Embedded Data Visualization Platform:
A Quick Introduction to Google's Cloud Technologies
A Quick Introduction to Google's Cloud Technologies Chris Schalk Developer Advocate @cschalk Anatoli Babenia Agenda Introduction Introduction to Google's Cloud Technologies App Engine Review Google's new
Big Data and Market Surveillance. April 28, 2014
Big Data and Market Surveillance April 28, 2014 Copyright 2014 Scila AB. All rights reserved. Scila AB reserves the right to make changes to the information contained herein without prior notice. No part
An Enterprise Approach to Mobile File Access and Sharing
White Paper File and Networking Services An Enterprise Approach to Mobile File Access and Sharing Table of Contents page Anywhere, Any Device File Access with IT in Control...2 Novell Filr Competitive
Introduction to Directory Services
Introduction to Directory Services Overview This document explains how AirWatch integrates with your organization's existing directory service such as Active Directory, Lotus Domino and Novell e-directory
Why NoSQL? Your database options in the new non- relational world. 2015 IBM Cloudant 1
Why NoSQL? Your database options in the new non- relational world 2015 IBM Cloudant 1 Table of Contents New types of apps are generating new types of data... 3 A brief history on NoSQL... 3 NoSQL s roots
NoSQL replacement for SQLite (for Beatstream) Antti-Jussi Kovalainen Seminar OHJ-1860: NoSQL databases
NoSQL replacement for SQLite (for Beatstream) Antti-Jussi Kovalainen Seminar OHJ-1860: NoSQL databases Background Inspiration: postgresapp.com demo.beatstream.fi (modern desktop browsers without
A CLOUD-BASED FRAMEWORK FOR ONLINE MANAGEMENT OF MASSIVE BIMS USING HADOOP AND WEBGL
A CLOUD-BASED FRAMEWORK FOR ONLINE MANAGEMENT OF MASSIVE BIMS USING HADOOP AND WEBGL *Hung-Ming Chen, Chuan-Chien Hou, and Tsung-Hsi Lin Department of Construction Engineering National Taiwan University
How To Handle Big Data With A Data Scientist
III Big Data Technologies Today, new technologies make it possible to realize value from Big Data. Big data technologies can replace highly customized, expensive legacy systems with a standard solution
Concepts of Database Management Seventh Edition. Chapter 9 Database Management Approaches
Concepts of Database Management Seventh Edition Chapter 9 Database Management Approaches Objectives Describe distributed database management systems (DDBMSs) Discuss client/server systems Examine the ways
Heterogeneous Tools for Heterogeneous Network Management with WBEM
Heterogeneous Tools for Heterogeneous Network Management with WBEM Kenneth Carey & Fergus O Reilly Adaptive Wireless Systems Group Department of Electronic Engineering Cork Institute of Technology, Cork,
