
MAPPA Open Data Metadata. The importance of archaeological background.

Francesca Anichini, f.anichini@arch.unipi.it
Gabriele Gattiglia, g.gattiglia@arch.unipi.it
MAPPA Lab - Dipartimento di Civiltà e Forme del Sapere - University of Pisa (MAPPA), Via Trieste 38, 56126 Pisa - Italy

Abstract

The MOD (MAPPA Open Data), the first Italian repository of open archaeological data, was conceived by the MAPPA project to make archaeological data easily accessible to everyone and for any need. The MOD allows users to download the raw documentation and the grey literature of archaeological interventions. The MOD Metadata schema is based on the Dublin Core and ISO 19115 schemas. Each archaeological intervention is described according to a schema that defines the history of the archaeological intervention, the sources used for creating the dataset, the method and the structure of the data, and the physical data relations. Particular relevance in the schema is given to the description of the methodological background. This part of the metadata schema is fundamental for transmitting the subjective part of the archaeological record to future generations of archaeologists, because only an understanding of the methodological background permits real semantic interoperability.

1. Introduction - Opening up Italian archaeological data: a cultural problem

The MAPPA project set itself a highly innovative aim (Anichini et al., 2012; Anichini et al., 2013), quite revolutionary for Italian archaeology: lifting out of the archives the documents containing the data of archaeological investigations, both the raw documentation and the grey literature, and making this information easily accessible to everyone and for any need: from research to protection, from urban planning to quality tourism, or even just for simple learned curiosity.
This is how the MOD (MAPPA Open Data, http://www.mappaproject.org/mod), the first Italian repository of open archaeological data, was conceived, in keeping with European directives regarding easy access to Public Sector data and to Research data. Originally conceived to collect the documentation of the excavations carried out in Pisa, the MOD grew considerably as the months went by (Anichini and Gattiglia, 2012; Anichini, Ciurcina and Noti, 2013). A survey promoted by the MAPPA project on «Open data and Italian archaeology» (Anichini, 2013) showed that the need to share data is strongly felt by the majority of the archaeological community. Also in consideration of these results, the MOD has gradually become the open data repository of Italian archaeology, where archaeologists can publish excavation data, and in 2013 it was added to the list of recommended repositories of the Journal of Open Archaeological Data. In fact, entering archaeological documentation in the MOD is to all intents and purposes a publication, whose authorship is protected by a DOI (Digital Object Identifier) code and a CC BY or CC BY SA licence.

At first it was decided to create a 1-star repository and to transform it step by step into a 5-star one. The rationale behind this choice was that data, and the possibility to circulate and spread them, are the key infrastructure of the world of archaeology. Well aware that part of the (interpretative) information underlying data is often connected to the know-how of each single researcher, the MOD stemmed exactly from this precondition, as well as from the certainty that data are the structure upon which historical-archaeological interpretations are based, and from the essential need for data to circulate freely and rapidly in the archaeological community. The guiding philosophy was based on the ecological cycle of the data produced every day, regardless of their ultimate use.
We are talking about the large amount of raw data that make up the documentation of every archaeological intervention: context sheets, catalogues, maps, photographs, inventories, reports, etc., which are produced and deposited in ministerial archives, regardless of whether or not the investigation is subsequently the subject of a publication. Many data, therefore, are never published or reused for other research. In the ecological cycle of data, instead, it is good practice to use the data as much as possible. Alongside scientific publications, where the interpreted results of an investigation are reported, many new and often unexpected applications may be created, using the same data for different purposes or to reach different interpretative hypotheses.

The turning point is essentially of a cultural rather than technological nature. Data sharing is one of the few paths that can be taken today to allow knowledge to progress without having to sustain huge costs. On the contrary, it makes it possible to optimise the large amount of data that are produced every day and are underused. Opening also means protecting, so that widespread sharing, in a community that points towards the idea of historical archaeological heritage as a common value, can gradually ensure control, quality and preservation. To make this happen, another fundamental cultural step needs to be taken: recognising the intellectual authorship and the related copyrights of the person who produced the data. At the same time, it is also of extreme importance to overcome the concept of ownership, because archaeological data are not owned (as in the case of private property) by the archaeologist who produces them but are part of our collective heritage. Of course, we are speaking about the raw data, not the subjective interpretations of those data, which obviously belong to their author. Without this awareness, it would be very difficult to overcome the mistrust and fear that archaeologists have of being deprived of the result of their work.

2. Basic choices

Defining open data is quite simple: their essential features are coded in Tim Berners-Lee's five stars and in the indications provided by the Open Knowledge Foundation. Open data must be:
- complete, so that they can be exported and used both online and offline, reporting the specifications adopted;
- primary, i.e. in a raw processing state, so that they can be integrated and aggregated with other digital resources;
- timely and accessible, so that users can access the data quickly, using Internet protocols without the need for subscriptions, payments or registrations, and can transmit and exchange them directly via the web;
- machine-readable, i.e. automatically processable by a computer, so that users are not forced to use proprietary programmes, applications or interfaces to carry out these operations;
- searchable, fully reusable and capable of being integrated to create new resources, applications, programmes and services, also for commercial purposes.

All these features must be permanent for the entire lifecycle on the web (Anichini and Gattiglia, 2012). The data must also be issued with licences that respect the above features and do not limit their reuse in any way. They can be associated with licences that require recognition of authorship, but this must be free of charge.

The first decision we took was to enter the data in the repository exactly as they were presented to us and so place them on the web quickly. Our decision was based on the fact that the real battle to be fought in Italy was clearly to change the mentality of archaeologists, who are usually reluctant to share data. It was important, therefore, to make primary data begin to circulate as quickly as possible, regardless of their type. We decided to arrange the repository for any format of data, as long as they could be entirely downloaded. We did not impose a guideline for publication standards in the MOD but, on the contrary, left it to the authors to decide what to publish and how: a lot, a little or everything, in a more or less open format. Every archaeologist, therefore, could decide how to take part in the transformation process towards open Archaeology.
Detailed indications were provided instead on citation rules, the types of licences used and the legal requirements for compliance with privacy and copyright laws (Ciurcina, 2013). Specific attention was devoted to providing user-friendly access to and use of the repository.

3. Inside the MOD

The repository was built taking the Archaeology Data Service of the University of York as an example. Starting from this model, we chose a very simple user interface which makes it easy to search, view and download the documentation available. Another point deserves careful consideration. The documentation of an archaeological excavation represents the only trace of an operation that, given its nature, destroys the stratification it is investigating: the MOD guarantees that those documents and, therefore, the data contained in them have a potentially endless digital life, i.e. the life of the web.

The MOD, hosted on GNU/Linux at the Centro Interdipartimentale di Servizi Informatici per l'Area Umanistica (CISIAU) of Pisa University, was designed by Valerio Noti on an Open Source LAMP technological platform using an Apache HTTP Server, the PHP 5.x scripting language and the MySQL Open Source relational database. This platform, fully proven worldwide for the development of IT repositories, guarantees adequate stability, security and performance for the purposes of the project. The database structure focuses on the single archaeological investigation, defined by a set of information (title, author, DOI, region, location, year, main contact, introduction, overview) and by connections to diverse data sources such as text documents, images, multimedia objects, database files, geographical data, etc. A repository of files, downloadable by users, is associated with the database; authorised operators add files to it via upload functions.

From an operating viewpoint, the application, which can be consulted using any browser compatible with W3C standards, is made up of two distinct segments: an administrative section with restricted access and a public section. The administrative section is used by operators to enter and change the database contents. Categories (e.g. Chronology) and sub-categories (e.g. Medieval, Roman, etc.) can be managed, and an advanced HTML editor can be used for editing the Introduction and Overview sections and for uploading the files in the repository. The public segment can be accessed directly from the MAPPA project website. It can be used to consult the repository, view the record of each single dataset, carry out free full-text or categorised searches and download the files from the repository. An advanced query section is also available where users can query different fields of the database (chronology, topics, year, author, title, type of file in the repository) and carry out searches by geographical area (at the moment by Region).
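The record structure and the advanced query described above can be illustrated with a short sketch. The MOD itself runs on PHP and MySQL; the Python below is only an illustration of the data model, not the project's code. The field names follow the list given in the text, while the class names, the `kind` labels, the `advanced_query` function and all sample values (including the DOIs) are hypothetical placeholders.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class DataSource:
    """One downloadable file attached to an investigation (hypothetical shape)."""
    filename: str
    kind: str  # assumed labels, e.g. "text", "image", "database", "geodata"


@dataclass
class Investigation:
    """A single archaeological investigation, using the fields named in the text."""
    title: str
    author: str
    doi: str
    region: str
    location: str
    year: int
    main_contact: str
    introduction: str = ""
    overview: str = ""
    chronology: List[str] = field(default_factory=list)
    sources: List[DataSource] = field(default_factory=list)


def advanced_query(records, region=None, chronology=None, file_kind=None):
    """Return the records that satisfy every criterion that is not None."""
    hits = []
    for rec in records:
        if region is not None and rec.region != region:
            continue
        if chronology is not None and chronology not in rec.chronology:
            continue
        # A record matches file_kind only if at least one attached file has it.
        if file_kind is not None and all(s.kind != file_kind for s in rec.sources):
            continue
        hits.append(rec)
    return hits


# Placeholder records, invented for this example only.
records = [
    Investigation("Excavation A", "A. Author", "10.0000/placeholder-a",
                  "Tuscany", "Pisa", 2011, "a@example.org",
                  chronology=["Medieval"],
                  sources=[DataSource("report.pdf", "text"),
                           DataSource("plan.shp", "geodata")]),
    Investigation("Excavation B", "B. Author", "10.0000/placeholder-b",
                  "Tuscany", "Pisa", 2012, "b@example.org",
                  chronology=["Roman", "Medieval"],
                  sources=[DataSource("photos.zip", "image")]),
    Investigation("Excavation C", "C. Author", "10.0000/placeholder-c",
                  "Lazio", "Rome", 2012, "c@example.org",
                  chronology=["Roman"]),
]

print([r.title for r in advanced_query(records, region="Tuscany", chronology="Roman")])
# prints ['Excavation B']
```

Keeping the files as a list attached to each investigation mirrors the text's description of a record "defined by a set of information" plus connections to diverse data sources; in a relational database the same link would normally be a separate table keyed by the investigation.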
Particular importance was given to the scalability of the application, in view of its growth in size and of the issues relating to the repository's digital lifecycle. The MOD was designed to ensure easy migration to other hardware or software platforms and to greatly reduce future technological adjustment activities (Anichini, Ciurcina and Noti, 2013).

4. Metadata

The next step was to provide the MOD with a standardised management of the metadata referring to the single dataset, defining a minimum set of information so as to guarantee correct use of the data. Every archaeological intervention is associated, through a metadata schema, with all the information regarding the intervention itself, the archaeographic production, and the structure and format of the digital data, following a pattern that describes the history of the intervention, the sources used, the method and the relationship with the physical data. Archiving entire archaeographic and archaeological datasets and making them readily available on the web is a novelty for Italian archaeology, which until now has dealt almost exclusively with the metadating of summary archaeological data (e.g. the repositories of CulturaItalia or SigecWeb).

The MOD Metadata schema is based on the Dublin Core and ISO 19115 schemas. It uses both thesauri created by the MAPPA project itself and thesauri from the ICCD (Istituto Centrale per il Catalogo e la Documentazione, National Institute for Catalogue and Documentation). Particular relevance in the schema is given to the description of the methodological background of the archaeological intervention: who directed the intervention, in which year, with what kind of method and so on. This part of the metadata schema is fundamental for transmitting the subjective part of the archaeological record to future generations of archaeologists, because only an understanding of the methodological background permits real semantic interoperability.
The current version of the MOD proposes the schema below. Elements taken from the ISO 19115 core are flagged (M = Mandatory), (C = Conditional) or (O = Optional); upper-case codes in parentheses, such as PVCR, give the equivalent in the ICCD thesaurus.

1. History of the investigation:
   1.1 Dataset title (M) (title of investigation/dataset, free text)
   1.2 Purpose of investigation (brief description of purpose and main results of the investigation)
   1.3 Method (MAPPA thesaurus)
   1.4 Type of documentation: drawn, photographic, written, video, multimedia
   1.5 Geographical location:
       1.5.1 Region (PVCR)
       1.5.2 Province (PVCP)
       1.5.3 Municipality (PVCC)
       1.5.4 Address (PVCI)
   1.6 Additional extent information for the dataset (vertical and temporal) (O)
       1.6.1 Chronological range (MAPPA thesaurus for all identified chronologies)
       1.6.2 Chronological period
   1.7 Principal Investigator/team
   1.8 Year (interval or single year of investigation)
2. Sources used to create the data:
   2.1 Archives queried
   2.2 Cartography used for georeferencing
   2.3 Previous investigations in the investigated area
3. Method and structure of data:
   3.1 Dataset reference date (M) (date of creation of dataset)
   3.2 Dataset topic category (M)
   3.3 Data georeferencing (GAT):
       3.3.1 Geographic location of the dataset (by four coordinates or by geographic identifier) (C)
       3.3.2 Spatial resolution of the dataset (O)
       3.3.3 Reference system (O)
   3.4 Abstract describing the dataset (M)
   3.5 List of files present and their content (name of file with extension, distribution format (O), software used for creating the file, version, description, relations)
   3.6 List of assigned identifiers
   3.7 List of codes used
   3.8 Thesauri
   3.9 Description of any conversion to other formats
   3.10 Staff (all staff components with their tasks in the production of the paper or digital dataset)
   3.11 Dataset authorship (curatorship of the DOI)
       3.11.1 Licence
   3.13 Dataset language (M)
4. Reports:
   4.1 Bibliography
       4.1.1 From
       4.1.2 To
   4.2 Place of preservation of archaeographic documentation (ICCD S+ region number; C+ municipality number)
   4.3 Place of preservation of finds (ICCD S+ region number; C+ municipality number)
   4.4 On-line resource (O) (URL Dataset)
   4.5 Metadata:
       4.5.1 Metadata language (C)
       4.5.2 Metadata character set (C)
       4.5.3 Metadata point of contact (M)
       4.5.4 Metadata date stamp (M)

5. Conclusions: the importance of archaeological background

We must be aware that data collection by archaeologists is partly subjective (although limited by the use of standardised procedures) and that only a full and accurate account of the methodological and scientific procedures used by archaeologists when constructing their data representation system will allow easier data integration and reuse. This means that the data collected by various researchers can be compared only by taking into account their intellectual history and individual background (Terrenato, 2006, p. 19), and that it would be better to make them available in a timely manner, without seeking perfection, when the scientific community is in greater methodological harmony with whoever has produced the data (Gattiglia, 2009, p. 56). From a semantic viewpoint, this means that the concept of absolute objectivity needs to be abandoned in an attempt to make subjectivity objective. Nevertheless, this requires the codification of data through scientifically shared procedures that ensure their future use.
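Part of this codification can be checked mechanically. As a minimal sketch, assuming that the metadata of a dataset are held as a dictionary keyed by the element numbers of the schema in Section 4 (a representation chosen here for illustration, not one described for the MOD), the Python function below reports which of the elements flagged (M) in that schema are missing or blank. The function name and the sample values are hypothetical.

```python
# Elements flagged (M) in the schema of Section 4, keyed by schema number.
MANDATORY = {
    "1.1": "Dataset title",
    "3.1": "Dataset reference date",
    "3.2": "Dataset topic category",
    "3.4": "Abstract describing the dataset",
    "3.13": "Dataset language",
    "4.5.3": "Metadata point of contact",
    "4.5.4": "Metadata date stamp",
}


def missing_mandatory(record):
    """Return the schema numbers of mandatory elements that are absent or blank."""
    missing = []
    for number in MANDATORY:
        value = record.get(number)
        if value is None or not str(value).strip():
            missing.append(number)
    return missing


# Placeholder metadata, invented for this example only.
sample = {
    "1.1": "Placeholder excavation",
    "3.1": "2013-01-01",
    "3.2": "",                 # present but blank, so it counts as missing
    "3.4": "Short abstract.",
    "4.5.4": "2013-01-02",
}

for number in missing_mandatory(sample):
    print(number, MANDATORY[number])
```

A check of this kind covers only the presence of the mandatory elements; the conditional (C) elements and the free-text description of method, which the paper treats as the most important part, still need human judgement.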

Objectivising subjectivity means explaining the intellectual and methodological background and the expertise used for digitally codifying the data. This process, called semantic interoperability, neither provides a basis for the technical grouping of data nor creates illusory superstandards that group existing standards (D'Andrea, 2006, p. 120). On the contrary, since it does not alter the formalisation of data adopted by each single researcher, it ensures the codification of information on more abstract and general formal models, capable of capturing the semantics inherent in the stored data. To allow semantic interoperability, it is necessary to record the motivations and circumstances regarding the creation of a digital source, the details of its origin, content and structure, and the terms and conditions applicable to its use, both for a complex source (an entire dataset) and for a digital object (a single file). These aspects are recorded (recording allows extensive and continuous use of data by the scientific community) by creating metadata, which are used for recording how the data were formed, thus making information freely and correctly accessible, even across time and space, and simplifying search, localisation, selection and semantic interoperability operations.

References

ANICHINI, F. (2013) MAPPA Survey: gli Open Data nell'archeologia italiana. In: Anichini, F., Dubbini, N., Fabiani, F., Gattiglia, G. and Gualandi, M.L. Mappa. Metodologie Applicate alla Predittività del Potenziale Archeologico. 2. Roma: Edizioni Nuova Cultura.
ANICHINI, F., CIURCINA, M. and NOTI, V. (2013) Il MOD: l'archivio Open Data dell'archeologia italiana. In: Anichini, F., Dubbini, N., Fabiani, F., Gattiglia, G. and Gualandi, M.L. Mappa. Metodologie Applicate alla Predittività del Potenziale Archeologico. 2. Roma: Edizioni Nuova Cultura.
ANICHINI, F., DUBBINI, N., FABIANI, F., GATTIGLIA, G. and GUALANDI, M.L. (2013) Mappa. Metodologie Applicate alla Predittività del Potenziale Archeologico. 2. Roma: Edizioni Nuova Cultura.
ANICHINI, F., FABIANI, F., GATTIGLIA, G. and GUALANDI, M.L. (2012) Mappa. Methodology Applied to Archaeological Potential Predictivity. 1. Roma: Edizioni Nuova Cultura.
ANICHINI, F. and GATTIGLIA, G. (2012) #MappaOpenData. From web to society. Archaeological open data testing. MapPapers. 2. pp. 54-56.
CIURCINA, M. (2013) Parere legale sul portale MAPPA Open Data. MapPapers. 4. pp. 87-106.
D'ANDREA, A. (2006) Documentazione archeologica, standard e trattamento informatico. Budapest: Archaeolingua.
GATTIGLIA, G. (2009) Open digital archives in archaeology. A good practice. Archeologia e Calcolatori. 20 (2). pp. 49-63.
TERRENATO, N. (2006) Le misure del campione contano! Il paradosso dei fenomeni globali e delle ricognizioni locali. In: Mancassola, N. and Saggioro, F. Medioevo, Paesaggi e Metodi. Mantova: SAP.