SHEBANQ: System for HEBrew Text: ANnotations for Queries and Markup. Dirk Roorda, researcher @ DANS, TLA


Data Archiving and Networked Services (DANS). TEI pre-conference workshop: Query, Roma, 2013-10-01

Overview
1. Context: text, data, research in Hebrew Bible
2. MdF database model, MQL query language
3. Sharing the research process
4. CLARIN-NL project: SHEBANQ
5. Towards new tools

1 (of 5) Context: text, data and research in the Hebrew Bible

VU Amsterdam, Eep Talstra Centre for Bible and Computer: text + linguistic features => database; database + research questions => publications

2 (of 5) MdF and MQL: the MdF database model and the MQL query language

Monad Object Feature
- 1977-now: Eep Talstra et al., ECA, WIVU. Print reference (Google Books)
- 1988-1994: Crist-Jan Doedens, Text Databases: One Database Model and Several Retrieval Languages (Google Books reference)
- 2004: Ulrik Petersen, Emdros: a text database engine for analyzed or annotated text. COLING

[Figure: the Monad-Object-Feature model applied to Genesis 1:1. The standard edition text is divided into monads 1 to 11 (atomic chunks of text). Objects of several types (word, subphrase, phrase_atom, phrase, clause_atom, sentence) each occupy a set of monads: the sentence and clause_atom objects span monads 1..11, a phrase spans monads 5..11, with subphrases over 5..7 and 9..11. Objects carry features, e.g. a clause_atom with clause_atom_number=1, clause_atom_relation=0, clause_atom_relation_kind=no_relation, clause_atom_type=xqtl, indentation=0, and a word with lexeme_utf8, old_lexeme_utf8, vocalized_lexeme_utf8, surface_consonants_utf8 and graphical_lexeme_utf8 (here the lexeme ראשית).]
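A minimal sketch of the three MdF notions, assuming contiguous Python sets of integers for monads: an object is a type plus a set of monads plus a feature dictionary, embedding is a subset test, and sequence is an ordering test on monads. The class name `MdfObject`, the helper `span` and the English glosses are hypothetical illustrations, not Emdros API.

```python
from dataclasses import dataclass, field


@dataclass
class MdfObject:
    """One MdF object: a type, the set of monads it occupies, and its features."""
    otype: str
    monads: frozenset
    features: dict = field(default_factory=dict)

    def embeds(self, other: "MdfObject") -> bool:
        """Embedding: `other` lies inside this object if its monads are a subset."""
        return other.monads <= self.monads

    def precedes(self, other: "MdfObject") -> bool:
        """Sequence: this object ends before `other` starts."""
        return max(self.monads) < min(other.monads)


def span(first: int, last: int) -> frozenset:
    """Contiguous monad range first..last (MdF itself also allows gapped objects)."""
    return frozenset(range(first, last + 1))


# Toy Genesis 1:1: hypothetical English glosses, one word per monad 1..11.
glosses = ["in", "beginning", "create", "God", "OBJ", "the", "heavens",
           "and", "OBJ", "the", "earth"]
words = [MdfObject("word", span(m, m), {"gloss": g})
         for m, g in enumerate(glosses, start=1)]
clause_atom = MdfObject("clause_atom", span(1, 11),
                        {"clause_atom_number": 1, "clause_atom_type": "xqtl"})
object_phrase = MdfObject("phrase", span(5, 11))

print(clause_atom.embeds(object_phrase))   # True: 5..11 lies within 1..11
print(object_phrase.embeds(clause_atom))   # False
print(words[2].precedes(object_phrase))    # True: monad 3 comes before 5..11
```

Because every object is just a set of monads, the same two tests serve for any pair of object types, which is what lets a query language reason about "inside" and "after" uniformly.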

MQL query language. MQL is topographic: the structure of a query expression mirrors the structure of its results with respect to sequence and embedding.

Example: clauses in which the verb FJM[ (שׂים, "put") is followed by two object phrases

    SELECT ALL OBJECTS
    WHERE
    [Clause
        [Phrase
            [Word FOCUS
                part_of_speech = verb AND
                lexeme = "FJM["
            ]
        ]
        ..
        [Phrase FOCUS
            phrase_function = Objc OR
            phrase_function = IrpO
        ]
        ..
        [Phrase FOCUS
            phrase_function = Objc OR
            phrase_function = IrpO
        ]
    ]
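To make the topographic reading of this example concrete, here is a minimal sketch, not how Emdros evaluates MQL: nesting in the query becomes an embedding test, `..` becomes "ends before, gap allowed", and the feature conditions become dictionary lookups. Objects are assumed contiguous (first and last monad), and the toy clause and the helper names (`Obj`, `double_object_matches`) are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Obj:
    otype: str
    first: int          # first monad (objects assumed contiguous in this sketch)
    last: int           # last monad
    features: dict = field(default_factory=dict)


def inside(outer: Obj, inner: Obj) -> bool:
    """Embedding: inner's monads lie within outer's monads."""
    return outer.first <= inner.first and inner.last <= outer.last


def before(a: Obj, b: Obj) -> bool:
    """Sequence with a gap allowed (MQL '..'): a ends before b starts."""
    return a.last < b.first


def is_object_phrase(p: Obj) -> bool:
    return p.features.get("phrase_function") in {"Objc", "IrpO"}


def double_object_matches(objects, lexeme="FJM["):
    """Return (clause, verb, obj1, obj2) tuples mirroring the query shape
    [Clause [Phrase [Word verb lexeme]] .. [Phrase obj] .. [Phrase obj]].
    MQL would report only the FOCUS objects; here the whole match is kept."""
    def of_type(t):
        return [o for o in objects if o.otype == t]

    hits = []
    for clause in of_type("clause"):
        phrases = [p for p in of_type("phrase") if inside(clause, p)]
        for p in phrases:
            verbs = [w for w in of_type("word") if inside(p, w)
                     and w.features.get("part_of_speech") == "verb"
                     and w.features.get("lexeme") == lexeme]
            for verb in verbs:
                for o1 in phrases:
                    if not (before(p, o1) and is_object_phrase(o1)):
                        continue
                    for o2 in phrases:
                        if before(o1, o2) and is_object_phrase(o2):
                            hits.append((clause, verb, o1, o2))
    return hits


# Hypothetical toy clause: verb phrase, then two object phrases.
toy = [
    Obj("clause", 1, 6),
    Obj("phrase", 1, 1, {"phrase_function": "Pred"}),
    Obj("word", 1, 1, {"part_of_speech": "verb", "lexeme": "FJM["}),
    Obj("phrase", 2, 3, {"phrase_function": "Objc"}),
    Obj("phrase", 4, 6, {"phrase_function": "Objc"}),
]
for clause, verb, o1, o2 in double_object_matches(toy):
    print(verb.features["lexeme"], (o1.first, o1.last), (o2.first, o2.last))
# prints: FJM[ (2, 3) (4, 6)
```

The nested loops follow the nesting of the query one level at a time, which is the point of calling MQL topographic: the shape of the search follows the shape of the expression.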

3 (of 5) Sharing
Problem: how to share (intermediate) results of analysis.
Solution: saving queries as annotations.

Lock-in: the Stuttgart Electronic Study Bible achieves massive dissemination, but it does not have the right dynamics for tool development (scholarly-bibles.com).

A short history: 2012, Leiden: international workshop on biblical scholarship (Lorentz Center, Leiden). Desiderata: new tool development for text transmission (variants) and for linguistic analysis (features), even combined.

Hebrew Text in the Archive: urn:nbn:nl:ui:13-ikjj-ek. How can the people annotate our work?

Research Data Cycle

[Figure: the research data cycle, built up over two slides. Original cycle: text transmission, tradition and editorial processes; the Free University (theology faculty, server department, WIVU project); theological scholars; NWO projects; religious communities and enlightened lay people (scholarly-bibles.com). Extended cycle: linguists; CLARIN and SHEBANQ; research data archiving at DANS; digital humanities and computational humanities; the wider public, with annotation and query saving via Linked Data.]

3 (of 5) Sharing (ctd.)
Solution: Queries As Annotations

Queries-as-annotations model, with an example:
- annotation: a published query, identified by qu123 (just an identifier)
- body: the query instruction, SELECT ALL OBJECTS WHERE [Word FOCUS part_of_speech = verb AND lexeme = "שים"]
- targets: the query results in context, e.g. וישכם יעקב בבקר ויקח את האבן אשר שם מראשתיו וישם אתה מצבה ויצק שמן על ראשה (Genesis 28:18)
- metadata: researcher, date created, date last run, research question; here Janet Dyk, 2004-02-16, 2012-01-27, "Can the verb שים have a double object?" (article in Foundations for Syriac Lexicography)
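The model above can be sketched as a small record type, assuming Python dataclasses: the query is the body, the results are the targets, and provenance travels with them. The class `QueryAnnotation` and its `record_run` helper are hypothetical illustrations of the idea, not SHEBANQ code; the example values come from the slide, and the single target reference is illustrative only.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class QueryAnnotation:
    """Queries-as-annotations: the query is the body, its results are the targets."""
    identifier: str            # just an identifier, e.g. "qu123"
    query: str                 # body: the query instruction (MQL)
    researcher: str
    research_question: str
    date_created: date
    date_last_run: date
    targets: list = field(default_factory=list)   # query results in context

    def record_run(self, results: list, run_date: date) -> None:
        """Hypothetical helper: re-running the query replaces the targets and
        updates the date last run; body and provenance stay unchanged."""
        self.targets = list(results)
        self.date_last_run = run_date


qu123 = QueryAnnotation(
    identifier="qu123",
    query='SELECT ALL OBJECTS WHERE [Word FOCUS part_of_speech = verb AND lexeme = "שים"]',
    researcher="Janet Dyk",
    research_question="Can the verb שים have a double object?",
    date_created=date(2004, 2, 16),
    date_last_run=date(2012, 1, 27),
    targets=["Gen 28:18"],     # illustrative target only
)
```

Keeping the date last run separate from the date created reflects that a saved query is re-executable: the body is stable while its targets may change as the underlying data is updated.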

OpenAnnotation (openannotation.org)

provenance

motivation
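A sketch of how a saved query might be expressed in the OpenAnnotation vocabulary, with provenance recorded as `oa:annotatedBy` / `oa:annotatedAt` and the purpose as `oa:motivatedBy`. It assumes the 2013 Open Annotation community draft terms and a plain JSON-LD dictionary; the function name, the `example.org` URIs and the choice of `oa:questioning` as motivation are hypothetical.

```python
import json


def to_open_annotation(ann_id, query, targets, researcher, created,
                       motivation="oa:questioning"):
    """Sketch of an OpenAnnotation-style JSON-LD record for a saved query.
    Body: the query text; targets: result URIs; provenance: who and when;
    motivation: why the annotation was made (the default is an assumption)."""
    return {
        "@context": {
            "oa": "http://www.w3.org/ns/oa#",
            "cnt": "http://www.w3.org/2011/content#",
            "foaf": "http://xmlns.com/foaf/0.1/",
        },
        "@id": ann_id,
        "@type": "oa:Annotation",
        "oa:hasBody": {"@type": "cnt:ContentAsText", "cnt:chars": query},
        "oa:hasTarget": [{"@id": t} for t in targets],
        "oa:annotatedBy": {"@type": "foaf:Person", "foaf:name": researcher},
        "oa:annotatedAt": created,
        "oa:motivatedBy": {"@id": motivation},
    }


record = to_open_annotation(
    ann_id="http://example.org/qaa/qu123",                # hypothetical URI
    query='SELECT ALL OBJECTS WHERE [Word FOCUS part_of_speech = verb AND lexeme = "שים"]',
    targets=["http://example.org/text/Gen/28/18"],        # hypothetical URI
    researcher="Janet Dyk",
    created="2004-02-16T00:00:00Z",
)
print(json.dumps(record, ensure_ascii=False, indent=2))
```

Using Linked Data vocabulary rather than a private format is what would let query annotations be harvested and reused outside the tool that created them.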

Demonstrator: datanetworkservice.nl/qaa


Demonstrator, still missing: saving queries; semantic-web enablement; sustainability.

4 (of 5) Project CLARIN-NL: SHEBANQ: (A) Curation (B) Demonstrator

SHEBANQ: System for HEBrew Text: ANnotations for Queries. CLARIN-NL project. Data curation: LAF. Demonstrator: query saver. s/g$/q/ #!/etc bc

Linguistic Annotation Framework (LAF), ISO 24612:2012; Nancy Ide, Laurent Romary
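A minimal sketch of the LAF idea of annotations as a graph over primary data: regions anchor into the text, word nodes link to regions, higher nodes link to lower nodes by edges, and annotations (a label plus a feature structure) hang on nodes. Finding the text of any node is then a generic graph traversal. The toy text, the node names and the phrase_function values are hypothetical; this is not the GrAF serialization itself.

```python
# Hypothetical primary data and regions (character offsets into the text).
primary = "in beginning created God"
regions = {"r1": (0, 2), "r2": (3, 12), "r3": (13, 20), "r4": (21, 24)}

# Word nodes link to regions; higher nodes link to other nodes via edges.
node_regions = {"w1": ["r1"], "w2": ["r2"], "w3": ["r3"], "w4": ["r4"]}
edges = {
    "p1": ["w1", "w2"],          # phrase node over words 1-2
    "p2": ["w3"],
    "p3": ["w4"],
    "c1": ["p1", "p2", "p3"],    # clause node over the phrases
}

# Annotations: a label plus a feature structure per node.
annotations = {
    "p1": ("phrase", {"phrase_function": "Time"}),
    "p2": ("phrase", {"phrase_function": "Pred"}),
    "p3": ("phrase", {"phrase_function": "Subj"}),
}


def covered_regions(node):
    """Generic graph traversal: collect every region reachable from `node`."""
    seen, stack, found = set(), [node], []
    while stack:
        n = stack.pop()
        if n in seen:
            continue
        seen.add(n)
        found.extend(node_regions.get(n, []))
        stack.extend(edges.get(n, []))
    return sorted(found, key=lambda r: regions[r][0])


def text_of(node):
    """Concatenate the primary-data slices of all regions under `node`."""
    return " ".join(primary[a:b] for a, b in (regions[r] for r in covered_regions(node)))


for n in ["p1", "p2", "p3"]:
    label, feats = annotations[n]
    print(feats["phrase_function"], text_of(n))
# prints:
# Time in beginning
# Pred created
# Subj God
```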

feature definitions

TEI ISO-FS schema

dcr:datcat on <fDecl> versus on <f>: with 26,225,966 <f> elements, repeating the attribute on every <f> costs 2.5 GB of redundant attribute material.
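The arithmetic behind the 2.5 GB figure: 2.5 × 10^9 bytes / 26,225,966 <f> elements ≈ 95 bytes of attribute material per <f>. The sketch below shows where such a cost comes from, using the standard-library ElementTree: it serializes the same feature structures twice, once with the data category on every <f> and once on a single <fDecl>. The namespace URI, the data category URI and the element layout are assumptions, schematic rather than schema-valid TEI (the TEI namespace is omitted for brevity).

```python
import xml.etree.ElementTree as ET

DCR = "http://www.isocat.org/ns/dcr"               # assumed ISOcat namespace URI
DATCAT = "http://www.isocat.org/datcat/DC-0000"    # hypothetical data category
ET.register_namespace("dcr", DCR)
DATCAT_ATTR = f"{{{DCR}}}datcat"


def build(n: int, datcat_on_each_f: bool) -> ET.Element:
    """n word feature structures; the data category goes either on every <f>
    or once on an <fDecl> inside an <fsDecl>."""
    root = ET.Element("div")
    if not datcat_on_each_f:
        fs_decl = ET.SubElement(root, "fsDecl", type="word")
        f_decl = ET.SubElement(fs_decl, "fDecl", name="part_of_speech")
        f_decl.set(DATCAT_ATTR, DATCAT)
    for _ in range(n):
        fs = ET.SubElement(root, "fs", type="word")
        f = ET.SubElement(fs, "f", name="part_of_speech")
        if datcat_on_each_f:
            f.set(DATCAT_ATTR, DATCAT)
        ET.SubElement(f, "symbol", value="verb")
    return root


def size(root: ET.Element) -> int:
    """Serialized size in bytes."""
    return len(ET.tostring(root, encoding="utf-8"))


n = 1000
inline, declared = build(n, True), build(n, False)
print("bytes per <f> saved:", (size(inline) - size(declared)) / n)
```

With a short data category URI the saving per <f> is around the length of the attribute itself; longer URIs or several annotated features per <f> push it toward the per-element figure computed above.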

4 (of 5) Project CLARIN-NL: SHEBANQ: (B) Demonstrator

[Screenshot: the query saver demonstrator. The Edit Query pane holds the MQL query
select all objects where [clause [phrase phrase_function = Objc [word FOCUS tense = infinitive_absolute] ] ]
After "Execute Query" the query returns 313 results, listed in pages with a "view in context" link per result (visible references include Gen 1:1, Ex 23:2, 1Sam 12:4, 2Chron 3:4); a passage pane shows the text of Gen 1:1. The "Save this query" form records: Name: valency; Researcher: Oliver Glanz; Date created and Date last run: 2013-08-25; Project: Data and Tradition; Institute: VU/Eep Talstra Centre for Bible and Computing; Reason: irregular valency of ברא; Comments: needs to be combined with query on אלהים. Buttons: Cancel, Save, Publish.]

[Screenshot: the page of a saved query. Query Info shows the MQL query text, select all objects where [clause [phrase phrase_function = Objc [word FOCUS tense = infinitive_absolute] ] ], and its persistent identifier urn:nbn:nl:ui:13-scpm-ji with a resolver link http://www.persistent-identifier.nl/?identifier=urn... The page also shows the Saved Query Results (313 results) and the information on this query: valency, Oliver Glanz, 2013-08-25, Data and Tradition, VU/Eep Talstra Centre for Bible and Computing, reason "irregular valency of ברא", comment "needs to be combined with query on אלהים".]

datanetworkservice.nl/qaa

SHEBANQ: implementing Queries-as-Annotations (Q-a-A)

5 (of 5) Towards new tools
- LAF tools, or generic graph algorithms
- Emdros tools, or generic database technology
- Linked Data tools, or generic SPARQL queries
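As one illustration of the "generic database technology" option, a sketch using Python's built-in sqlite3: MdF objects become rows with a first and last monad, features become rows in a separate table, and embedding becomes a range join. It assumes contiguous objects (gapped MdF objects would need a table of monad ranges instead), and the schema and toy data are hypothetical.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE obj  (id INTEGER PRIMARY KEY, otype TEXT,
                       first_m INTEGER, last_m INTEGER);
    CREATE TABLE feat (obj_id INTEGER REFERENCES obj(id), name TEXT, value TEXT);
""")

# Hypothetical toy data: one clause (monads 1..6) with three phrases.
con.executemany("INSERT INTO obj VALUES (?, ?, ?, ?)", [
    (1, "clause", 1, 6),
    (2, "phrase", 1, 1),
    (3, "phrase", 2, 3),
    (4, "phrase", 4, 6),
])
con.executemany("INSERT INTO feat VALUES (?, ?, ?)", [
    (2, "phrase_function", "Pred"),
    (3, "phrase_function", "Objc"),
    (4, "phrase_function", "Objc"),
])

# Embedding becomes a range join; feature conditions become a join on feat.
OBJECT_PHRASES_IN_CLAUSES = """
    SELECT c.id, p.id
    FROM obj AS c
    JOIN obj AS p ON p.first_m >= c.first_m AND p.last_m <= c.last_m
    JOIN feat AS f ON f.obj_id = p.id
    WHERE c.otype = 'clause' AND p.otype = 'phrase'
      AND f.name = 'phrase_function' AND f.value IN ('Objc', 'IrpO')
    ORDER BY c.id, p.first_m
"""
rows = con.execute(OBJECT_PHRASES_IN_CLAUSES).fetchall()
print(rows)   # [(1, 3), (1, 4)]
```

The trade-off is visible in the query: a generic engine needs an explicit join for each level of embedding, which a dedicated text database like Emdros expresses directly in the nesting of the MQL query.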

Side conditions
- development close to the researchers, preferably in their own institutions
- decent performance within the scale of a laptop
- usable to researchers, that is: non-programmers
- persistence in mind: new results will be archived and re-enter the data cycle

s/g$/q/ #!/etc bc. Eep Talstra Centre for Bible and Computer. Thank you! dirk.roorda@dans.knaw.nl, slideshare.net/dirkroorda/