Data Archiving and Networked Services
SHEBANQ
System for HEBrew Text: ANnotations for Queries and Markup
Dirk Roorda - researcher @ DANS, TLA
TEI pre-conference workshop: Query
Roma 2013-10-01

Overview
1. Context: text, data, research in Hebrew Bible
2. MdF database model, MQL query language
3. Sharing the research process
4. CLARIN-NL project: SHEBANQ
5. Towards new tools
1 (of 5) Context Text, data and research in the Hebrew Bible
VU Amsterdam, Eep Talstra Centre for Bible and Computer
text + linguistic features => database
database + research questions => publications
2 (of 5) MdF and MQL MdF database model MQL query language
Monad Object Feature
1977-now: Eep Talstra et al. ECA, WIVU. Print reference (Google Books)
1988-1994: Crist-Jan Doedens: Text Databases. One Database Model and Several Retrieval Languages (Google Books reference)
2004: Ulrik Petersen. Emdros - a text database engine for analyzed or annotated text. COLING
Monad-Object-Feature (diagram)
monads (atomic chunks of text): 1 .. 11, aligned with the words of the standard edition text
word objects: one per monad, with features lexeme_utf8, old_lexeme_utf8, vocalized_lexeme_utf8, surface_consonants_utf8, graphical_lexeme_utf8 (shown for the lexeme ראשית)
subphrase objects: 77637 (monads 5..7), 77638 (monads 9..11)
phrase_atom objects: 40770 (monads 5..11)
phrase objects: 59559 (monads 5..11)
clause_atom objects: 34680 (monads 1..11), with features clause_atom_number=1, clause_atom_relation=0, clause_atom_relation_daughter_tense=unknown, clause_atom_relation_kind=no_relation, clause_atom_relation_mother_tense=unknown, clause_atom_relation_preposition_class=none, clause_atom_type=xqtl, indentation=0
sentence objects: 84383 (monads 1..11)
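The monad-object-feature idea in the diagram can be sketched in a few lines of Python. This is only an illustration, not Emdros code: the class `TextObject` and the helper `span` are hypothetical names, and the identifiers and monad ranges are an assumed reading of the diagram.

```python
from dataclasses import dataclass, field


@dataclass
class TextObject:
    """An object in the Monad-Object-Feature sense (illustrative only)."""
    object_type: str            # e.g. "sentence", "phrase", "word"
    object_id: int
    monads: frozenset           # the atomic text positions the object occupies
    features: dict = field(default_factory=dict)

    def embeds(self, other: "TextObject") -> bool:
        """True if every monad of `other` also belongs to this object."""
        return other.monads <= self.monads


def span(first: int, last: int) -> frozenset:
    """Monads first..last, inclusive."""
    return frozenset(range(first, last + 1))


# Identifiers and ranges as read from the diagram (assumed reading).
sentence = TextObject("sentence", 84383, span(1, 11))
clause_atom = TextObject(
    "clause_atom", 34680, span(1, 11),
    {"clause_atom_number": 1, "clause_atom_type": "xqtl", "indentation": 0},
)
phrase = TextObject("phrase", 59559, span(5, 11))
subphrases = [
    TextObject("subphrase", 77637, span(5, 7)),
    TextObject("subphrase", 77638, span(9, 11)),
]

print(sentence.embeds(phrase))                        # True
print(all(phrase.embeds(s) for s in subphrases))      # True
print(subphrases[0].embeds(subphrases[1]))            # False
```

Embedding is nothing more than set inclusion on monads, which is why objects of different types can overlap freely without a single fixed hierarchy.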
MQL query language
topographic, i.e. the structure of the query expression mirrors the structure of the query results with respect to sequence and embedding
Example
SELECT ALL OBJECTS
WHERE
[Clause
  [Phrase
    [Word FOCUS
      part_of_speech = verb AND
      lexeme = "FJM["
    ]
  ]
  ..
  [Phrase FOCUS
    phrase_function = Objc OR
    phrase_function = IrpO
  ]
  ..
  [Phrase FOCUS
    phrase_function = Objc OR
    phrase_function = IrpO
  ]
]
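To make "topographic" concrete, here is a minimal Python sketch of how a nested query like the example can be matched: nesting in the query means embedding in the text, and order in the query means order in the text. It is not the Emdros implementation. `Obj`, `Block`, `match_block` and `match_sequence` are hypothetical names, the toy corpus is invented for illustration, every pair of sibling blocks is treated as if separated by `..` (any gap allowed), and FOCUS is ignored.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class Obj:
    otype: str
    first: int                  # first monad
    last: int                   # last monad
    features: dict = field(default_factory=dict)


@dataclass
class Block:
    otype: str
    test: Optional[Callable[[dict], bool]] = None   # feature condition, if any
    children: list = field(default_factory=list)    # nested blocks, in order


def match_block(block, objects, lo, hi):
    """Yield matches for `block` among objects lying inside monads lo..hi."""
    for obj in objects:
        if obj.otype != block.otype or obj.first < lo or obj.last > hi:
            continue
        if block.test is not None and not block.test(obj.features):
            continue
        # Embedding: children must be found inside this object's monads.
        for inner in match_sequence(block.children, objects, obj.first, obj.last):
            yield [obj] + inner


def match_sequence(blocks, objects, lo, hi):
    """Match sibling blocks left to right; gaps between them are allowed."""
    if not blocks:
        yield []
        return
    for found in match_block(blocks[0], objects, lo, hi):
        # Sequence: the next sibling must start after this one ends.
        for rest in match_sequence(blocks[1:], objects, found[0].last + 1, hi):
            yield found + rest


def is_object(f):
    return f.get("phrase_function") in {"Objc", "IrpO"}


def is_fjm_verb(f):
    return f.get("part_of_speech") == "verb" and f.get("lexeme") == "FJM["


query = Block("clause", children=[
    Block("phrase", children=[Block("word", is_fjm_verb)]),
    Block("phrase", is_object),
    Block("phrase", is_object),
])

# Invented toy data: clause 1..6 has two object phrases, clause 7..9 has one.
corpus = [
    Obj("clause", 1, 6),
    Obj("phrase", 1, 1, {"phrase_function": "Pred"}),
    Obj("word", 1, 1, {"part_of_speech": "verb", "lexeme": "FJM["}),
    Obj("phrase", 2, 3, {"phrase_function": "Objc"}),
    Obj("phrase", 4, 6, {"phrase_function": "Objc"}),
    Obj("clause", 7, 9),
    Obj("phrase", 7, 7, {"phrase_function": "Pred"}),
    Obj("word", 7, 7, {"part_of_speech": "verb", "lexeme": "FJM["}),
    Obj("phrase", 8, 9, {"phrase_function": "Objc"}),
]

matches = list(match_block(query, corpus, 1, 9))
for m in matches:
    print([(o.otype, o.first, o.last) for o in m])
```

Only the first clause matches, because only there is the verb followed by two object phrases inside the same clause.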
3 (of 5) Sharing
Problem: how to share (intermediate) results of analysis
Solution: saving queries as annotations
Lock-in
Stuttgart Electronic Study Bible: massive dissemination
But not the right dynamics for tool development
scholarly-bibles.com
A short history
2012, Leiden: international workshop on biblical scholarship
Desiderata: new tool development for
- text transmission (variants)
- linguistic analysis (features)
- even combined
leiden lorentz
Hebrew Text in the Archive
urn:nbn:nl:ui:13-ikjj-ek
How can people annotate our work?
Research Data Cycle (diagram)
Parties: religious communities, enlightened lay people, theol. scholars, linguists, dig. hum, comp. hum, NWO projects
Text transmission, tradition, editorial processes
Free University, theology faculty, server department, WIVU project
scholarly-bibles.com
CLARIN SHEBANQ
Research Data Archiving: DANS
Wider public: Annotation, Query Saving, via Linked Data
3 (of 5) Sharing (ctd.)
Solution: Queries As Annotations
queries-as-annotations model
body: the query instruction
targets: the query results in context
annotation: the published query, e.g. qu123 (just an identifier)
metadata: researcher, date created, date last run, research question

example
body: SELECT ALL OBJECTS WHERE [Word FOCUS part_of_speech = verb AND lexeme = "שים"]
targets: וישכם יעקב בבקר ויקח את האבן אשר שם מראשתיו וישם אתה מצבה ויצק שמן על ראשה
metadata: Janet Dyk; 2004-02-16; 2012-01-27; Can the verb שים have a double object? - article in Foundations for Syriac Lexicography
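The model above can be written down as a plain data structure. The following Python sketch only illustrates the body/targets/metadata split; the function `query_annotation`, the key names, and the target identifiers (`word:...`) are assumptions made for illustration and do not come from the demonstrator or from any annotation standard.

```python
import json


def query_annotation(identifier, mql, targets, metadata):
    """Package a saved query as an annotation: body = query, targets = results."""
    return {
        "id": identifier,
        "body": {"language": "MQL", "value": mql},
        "targets": list(targets),
        "metadata": dict(metadata),
    }


annotation = query_annotation(
    "qu123",  # just an identifier
    'SELECT ALL OBJECTS WHERE '
    '[Word FOCUS part_of_speech = verb AND lexeme = "שים"]',
    # Hypothetical identifiers standing in for the words the query found.
    ["word:1001", "word:1014"],
    {
        "researcher": "Janet Dyk",
        "date_created": "2004-02-16",
        "date_last_run": "2012-01-27",
        "research_question": "Can the verb שים have a double object?",
    },
)

# ensure_ascii=False keeps the Hebrew readable in the serialized form.
serialized = json.dumps(annotation, ensure_ascii=False, indent=2)
print(serialized)
```

Keeping the targets as identifiers rather than copies of the text means the annotation stays small and keeps pointing at the archived version of the data.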
OpenAnnotation
openannotation.org
provenance
motivation
demonstrator
datanetworkservice.nl/qaa
demonstrator
still missing:
- saving queries
- semantic-web enablement
- sustainability
4 (of 5) Project CLARIN-NL: SHEBANQ: (A) Curation (B) Demonstrator
SHEBANQ
System for HEBrew Text: ANnotations for Queries
CLARIN-NL project
data curation: LAF
demonstrator: query saver
s/g$/q/ #!/etc bc
Linguistic Annotation Framework ISO 24612:2012 Nancy Ide, Laurent Romary
feature definitions
TEI ISO-FS schema
dcr:datcat on <fDecl> versus on <f>
26,225,966 <f>s: putting dcr:datcat on every <f> means 2.5 GB of redundant attribute material
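A quick back-of-the-envelope check of those two numbers, as a Python sketch. It assumes that "GB" means 10^9 bytes and that the redundant material is spread evenly over the <f> elements; both are assumptions, not statements from the slide.

```python
F_COUNT = 26_225_966        # number of <f> elements, from the slide
REDUNDANT_BYTES = 2.5e9     # 2.5 GB, assuming decimal gigabytes

# Average redundant attribute material per <f> element.
bytes_per_f = REDUNDANT_BYTES / F_COUNT
print(f"{bytes_per_f:.1f} bytes per <f>")   # 95.3 bytes per <f>
```

Roughly 95 bytes per element is about the length of one attribute carrying a full data category URI, which is consistent with declaring the category once on the feature declaration instead.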
4 (of 5) Project CLARIN-NL: SHEBANQ: (B) Demonstrator
Demonstrator screen: edit, execute and save a query
Edit Query:
  select all objects where
  [clause
    [phrase phrase_function = Objc
      [word FOCUS tense = infinitive_absolute]
    ]
  ]
Execute: "Executing query...", then "Query executed"
Query results: 313 results, listed by passage and text (Gen 1:1, Ex 23:2, 1Sam 12:4, 2Chron 3:4), each with "view in context"; paging Prev 1 2 3 4 5 6 ... 21 22 Next
Save this query:
  Name: valency
  Researcher: Oliver Glanz
  Date created: 2013-08-25
  Date last run: 2013-08-25
  Project: Data and Tradition
  Institute: VU/Eep Talstra Centre for Bible and Computing
  Reason: irregular valency of ברא
  Comments: needs to be combined with query on אלהים
  Buttons: Cancel, Save, Publish
Demonstrator screen: a saved query
Query Info:
  Persistent Identifier: urn:nbn:nl:ui:13-scpm-ji
  http://www.persistent-identifier.nl/?identifier=urn...
  MQL query text:
    select all objects where
    [clause
      [phrase phrase_function = Objc
        [word FOCUS tense = infinitive_absolute]
      ]
    ]
Saved Query Results: 313 results, listed by passage and text (Gen 1:1, Ex 23:2, 1Sam 12:4, 2Chron 3:4), each with "view in context"; paging Prev 1 2 3 4 5 6 ... 21 22 Next
Information on this query: Name valency; Researcher Oliver Glanz; Date created 2013-08-25; Date last run 2013-08-25; Project Data and Tradition; Institute VU/Eep Talstra Centre for Bible and Computing; Reason: irregular valency of ברא; Comments: needs to be combined with query on אלהים
datanetworkservice.nl/qaa
SHEBANQ: implementing Q-a-A
5 (of 5) Towards new tools
- LAF tools or generic graph algorithms
- Emdros tools or generic database technology
- Linked Data tools or generic SPARQL queries

Side conditions
- development close to the researchers: preferably in their own institutions
- decent performance: within the scale of a laptop
- usable to researchers: that is, non-programmers
- persistence in mind: new results will be archived and reenter the data cycle
s/g$/q/ #!/etc bc
Eep Talstra Centre for Bible and Computer
thank you
dirk.roorda@dans.knaw.nl
slideshare.net/dirkroorda/