LAMUS & LAT Archiving software




Daan Broeder
The Language Archive, Max Planck Institute for Psycholinguistics
Nijmegen, The Netherlands

The Language Archive - 2011
- MPI for Psycholinguistics research corpora: child language, bilingualism, gesture, sign language, Corpus Spoken Dutch, second language learner corpora, etc.
- Archive for the DOBES project
- Hosting (and inviting) corpora for other projects in need (UNESCO study: 80% of all material is endangered)
- DBD, NGT, Leiden Univ. language documentation corpora
- Donated endangered language corpora
- Eibl-Eibesfeldt human ethology collection
- Maintain a metadata catalog for properly described resources from other institutes
- BAS, C-ORAL-ROM (Univ. Florence), LR from Lund Univ., INL, other archive partners
- Copy of CHILDES and Talkbank corpora from CMU
- Mainly annotated audio/video recordings
- 50 TB: 200k MD records, 250k AV resources, 200k annotation files, lexicons, sketch grammars, etc.

History
- Started in 2000 to try to solve the mounting data chaos at the MPI for Psycholinguistics
- First needed proper data descriptions
- Archive software development linked to the IMDI metadata set for Language Resources
- First archive was basically a file system with metadata descriptions and resource files
- Tools operating directly on the files
- A researcher's notebook disk was just as sophisticated

IMDI
- ISLE Metadata Initiative
- Metadata schema for Language Resources
- Developed from 2000, also in several EU projects: ISLE, ECHO, INTERA
- Especially multi-media/multi-modal recordings
- 3 XML metadata schemas + special profiles for specific communities: Sign-Language, SL-acquisition

TLA Archive Organization
- Archiving formats only
- Metadata in XML files
- Relations represented by links
- DBs only as helpers
- Data safety through HSM, pushing data to TLs
[Diagram: TLA ARCHIVE corpus tree. IMDI metadata nodes (language, expedition, age group, genre, session) link to resources (media file, annotation file).]

Archive Access
[Diagram labels: Browsing/Search/Visualization in a WWW browser (TROVE, LARI, IMDI-Browser); local tools (ARBIL, ELAN); HTTP server for resource download; ARCHIVE (metadata, annotations, media files); LAMUS, AMS, PID service; upload of LOCAL DATA]
- All resources accessible by HTTP if authorized
- All web-apps can be configured to use either Shibboleth or a local LDAP for authentication

Archive Administration
[Diagram labels: APIs for IMDI search, IMDI browser, content search and AMS; databases and indexes: amsdb, IMDI lucene idx, imdidb, corpus structure, annexdb, lamusdb; crawler; archive manager; archive; LAMUS API]

Why user managed deposition?
- Increasing costs: new, cheaper technologies for recording, digitization and storage cause a huge increase in data quantities.
- Using depositor knowledge: the researcher/depositor knows where to put the data in the logical structure (catalogue) of the archive. Communication with archive managers is overhead.
- Offer remote archiving services: support distributed projects.
- Stricter checking: make checks explicit. Archive managers have short contracts, so knowledge seems to get lost.
- Maximizing deposition: 80 percent of all recordings are in danger (UNESCO report). We want to open our archive for external depositors, but cannot afford extra workload for archive managers.

LAMUS
LAMUS is a web-application that allows:
- Uploading and naming individual resources (media, annotations, information files)
- Specifying limited metadata and mutual relations for and between resources
- Creating relevant linguistic groupings for the data (subcorpora)
LAMUS will:
- Carry out checks for consistency and coherence: check for accepted formats etc. (configurable list)
- Update databases and indexes
- Issue PIDs for the new resources and metadata records
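The format check against a configurable list could look roughly like the sketch below. This is an illustration only, not LAMUS code: the extension list, the function name `check_upload` and the decision to check file extensions rather than file contents are all assumptions.

```python
from pathlib import PurePosixPath

# Hypothetical allow-list. The slides only say the list is configurable;
# the actual accepted formats are not given.
ACCEPTED_EXTENSIONS = {".imdi", ".eaf", ".wav", ".mpg", ".txt", ".pdf"}


def check_upload(filenames, accepted=ACCEPTED_EXTENSIONS):
    """Split uploaded file names into (accepted, rejected) by extension."""
    accepted_files, rejected_files = [], []
    for name in filenames:
        # Compare extensions case-insensitively, so "a.WAV" matches ".wav".
        ext = PurePosixPath(name).suffix.lower()
        if ext in accepted:
            accepted_files.append(name)
        else:
            rejected_files.append(name)
    return accepted_files, rejected_files
```

A caller would pass the names of the files in a workspace and refuse the check-in if the rejected list is non-empty.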

[Diagram labels: local disk, WORKSPACE, ARCHIVE]

Corpus check-out check-in cycle
[Diagram labels: the archive; check out (using Arbil); local tools: Arbil, ELAN, Shoebox, ...; modify/add/...; check in to workspace using LAMUS; add to original after consistency check; versioning]

TLA Versioning of resources
TLA versioning policy:
- Nothing actually gets deleted
- Users can delete resources; these are removed from the visible collection (corpus tree) but remain in the archive
- Users can update (replace) existing resources
- The new version will get a new PID
- Old versions will be shelved but keep their PIDs
- Access to old versions is managed by the owner
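The versioning policy can be modelled in a few lines. The sketch below is a toy in-memory model under stated assumptions: the class names, the `hypothetical-pid/N` identifier scheme and the `visible` flag are invented for illustration and say nothing about how the archive actually stores versions or mints PIDs.

```python
import itertools
from dataclasses import dataclass, field


@dataclass
class Version:
    pid: str
    content: bytes
    shelved: bool = False  # True once a newer version has replaced it


@dataclass
class Resource:
    name: str
    versions: list = field(default_factory=list)
    visible: bool = True  # False after a user "deletes" the resource

    @property
    def current(self):
        return self.versions[-1]


class Archive:
    def __init__(self):
        self._counter = itertools.count(1)
        self.resources = {}

    def _new_pid(self):
        # Placeholder scheme; a real archive would call a PID service.
        return f"hypothetical-pid/{next(self._counter)}"

    def add(self, name, content):
        res = Resource(name, [Version(self._new_pid(), content)])
        self.resources[name] = res
        return res.current.pid

    def update(self, name, content):
        # The old version is shelved and keeps its PID; the new one gets a new PID.
        res = self.resources[name]
        res.current.shelved = True
        res.versions.append(Version(self._new_pid(), content))
        return res.current.pid

    def delete(self, name):
        # Nothing is removed from storage, only from the visible collection.
        self.resources[name].visible = False

    def visible_names(self):
        return sorted(n for n, r in self.resources.items() if r.visible)
```

Owner-managed access to shelved versions is left out of this sketch.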

AMS Access Management System
[Diagram: corpus tree with Rule 1, Rule 2 and Rule 3 attached to different nodes, and a "Sign academic license" requirement]
- User role administration: archive manager, domain curator, domain manager, domain editor
- Set a required license
- Set access rules per media type: annotations, images, audio, video, info
- A rule sets access/denial to user/group for type of data
- Special groups: all, registered user
- Rules have priority
- Inheritance of rules by descendant nodes
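How priorities and inheritance might combine is sketched below. The slides do not specify the resolution algorithm, so this is one plausible reading, with assumptions labelled: rules from a node and all its ancestors are pooled, the highest-priority matching rule wins, ties go to the rule nearest the node, and access is denied when no rule matches. All class, function and group names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Rule:
    group: str        # e.g. "all" or "registered" (names assumed)
    media_type: str   # e.g. "audio", "video", "annotations"
    allow: bool
    priority: int     # assumption: a larger number means higher priority


@dataclass
class Node:
    name: str
    parent: Optional["Node"] = None
    rules: list = field(default_factory=list)


def effective_rules(node):
    """Collect the node's own rules, then those inherited from each ancestor."""
    rules = []
    while node is not None:
        rules.extend(node.rules)
        node = node.parent
    return rules


def may_access(node, user_groups, media_type):
    # Every user implicitly belongs to the special group "all".
    groups = set(user_groups) | {"all"}
    matching = [
        r for r in effective_rules(node)
        if r.group in groups and r.media_type == media_type
    ]
    if not matching:
        return False  # assumption: deny by default
    # max() keeps the first of equal-priority rules, i.e. the one nearest the node.
    return max(matching, key=lambda r: r.priority).allow
```

For example, a low-priority "deny audio to all" rule on a corpus root can be overridden for registered users by a higher-priority "allow" rule on one session node.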

IMDI-Browser & Metadata Search
- Browse the hierarchy of corpora
- Inspect metadata records
- Create bookmarks
- Show PIDs, URLs for resources and metadata
- Make resource access requests
- Search the metadata: simple keyword, complex queries
[Figure: IMDI-Browser showing resources]

IMDI-Browser as a jump board

http://corpus1.mpi.nl/ds/imdi_browser?openpath=mpi541199%23
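A link like the one above can be generated programmatically so that the `#` in the parameter value is percent-encoded as `%23`. The sketch assumes that `openpath` takes a node identifier such as `mpi541199#`; only the URL itself comes from the slide, and the function name is hypothetical.

```python
from urllib.parse import urlencode

BASE_URL = "http://corpus1.mpi.nl/ds/imdi_browser"


def browser_link(node_id):
    """Build an IMDI-Browser link that opens the tree at the given node."""
    # urlencode percent-encodes reserved characters, so "#" becomes "%23".
    return f"{BASE_URL}?{urlencode({'openpath': node_id})}"
```

Calling `browser_link("mpi541199#")` reproduces the URL shown on the slide.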

Publishing resources

Regional Archives Initiative
- Cooperation of TLA/MPI-PL with other organizations interested in EL archiving
- They use TLA LAT archiving software
- Encourage local resource collecting & archiving
- Network of South American archives has been established and contacts with LARA were made

Data Synchronization I
[Diagram: logical synchronization]

Data Synchronization II
[Diagram labels: HTTP server, OSIX, archive, API, LAMUS]
- OSIX: complex logic to compare corpus trees and determine what is new, what to replace, what to add, what to delete
- In a cooperation with CMU, OSIX is used to copy CHILDES and Talkbank corpora into our archive
- CMU is generating IMDI records on the fly from their DBs
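The core of such a tree comparison can be illustrated with a much simplified sketch. It is not the OSIX logic: each corpus tree is assumed to be flattened into a mapping from resource path to checksum, and the function name and result layout are hypothetical.

```python
def compare_trees(source, target):
    """Compare two corpus trees given as {path: checksum} mappings.

    Returns the paths to add to, replace in, and delete from the target
    so that it mirrors the source.
    """
    src_paths, tgt_paths = set(source), set(target)
    return {
        # Present in the source only: new material to copy over.
        "add": sorted(src_paths - tgt_paths),
        # Present in both, but the content differs.
        "replace": sorted(
            p for p in src_paths & tgt_paths if source[p] != target[p]
        ),
        # Present in the target only: no longer in the source.
        "delete": sorted(tgt_paths - src_paths),
    }
```

Given the archive's versioning policy, a "delete" here would mean removal from the visible corpus tree, and a "replace" would create a new version with a new PID.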

Technical Info
- Java web-applications running inside Tomcat servlet container
- PostgreSQL DBMS
- Platform: Linux
- Web-app frameworks: JSP, Applets, JSF, FLEX, Wicket, ...
- Works with most web browsers (Explorer, Firefox, Opera, Safari)

LAMUS & LAT Future
- TLA is part of CLARIN and is promoting CMDI, so we are planning the transition from LAMUS IMDI to LAMUS CMDI
- We analyzed our set-up and still like the LAT fundamentals, e.g. file-based, modularity, ...
- But we will also alleviate some current problems and inconveniences:
  - Limited metadata editing in LAMUS
  - Insufficient provenance tracking of resources
  - Better handling of download/modify/upload cycle
  - Better integration with other (LAT) archives and infrastructures

THANK YOU FOR YOUR ATTENTION
