Data quality vision at SiBBr
Danny Vélez
4th workshop SiBBr: data quality and ecological data, 25-29 August 2014
SiBBr: national level
Community: universities, NGOs, government agencies, research centers, citizens
SiBBr secretariat
Current focus
Species fact sheets
Species occurrence records
Metadata
Ecological data
Genomic data
Publication workflow at SiBBr
Structuring and publication steps: data structuring with Darwin Core
Harvest, aggregate and data-store steps
Visualization through portals
Robertson et al. (2014) The GBIF Integrated Publishing Toolkit: Facilitating the Efficient Publishing of Biodiversity Data on the Internet. PLoS ONE 9(8): e102623.
Considerations in addressing data issues at SiBBr
1. Data quality, cleaning and correction are the responsibility of the whole community and cannot be assigned to any one agent in the process.
2. Museum records, and even observation occurrences, are all aggregations of records taken at different times and by different collectors. In the digital world, biological observations can flow from observer to end user through multiple digital aggregators. At any node in that flow, errors can be detected, introduced or addressed. (Belbin et al., 2013)
3. Addressing data errors will involve the aggregators improving their ability to detect errors and to help other agents in the chain correct them. These organizations have a responsibility, in collaboration with the community, to provide training in data management and to deliver automated mechanisms wherever possible, facilitating new processes and tools that support several aspects of data quality. (Belbin et al., 2013)
4. We need an effective way to support and stimulate feedback from data users and experts.
5. We need to move from historical approaches that managed only paper-based information to one where all relevant information is generated, managed and curated in a fully interlinked form.
Data paper
What it is: a scholarly publication of a searchable metadata document describing a dataset or a group of datasets.
Provides a mechanism for data quality control
Promotes and publicizes the existence of the data
Provides scholarly credit to data publishers through citable journal publications
Describes the data in a structured, human-readable form
Vishwas, 2011
Data quality activity of the SiBBr secretariat: where and how?
Where:
Structuring and publication steps
Harvest, aggregate and data-store steps
Visualization through portals
How:
Giving and receiving training
Creating, using and adapting data quality and cleaning mechanisms and tools for publishers, users and the SiBBr secretariat
National and international cooperation with other agents of the publication workflow
The SiBBr secretariat is not going to clean the data itself.
Advances
Training and manuals:
Translation of Principles of Data Quality into Portuguese
Translation of Darwin Core into Portuguese
Start of the training program on data structuring and publication
Workshops on data quality
Tools and mechanisms:
Tool for visualizing dataset completeness and quality
Data portal allowing data visualization and feedback
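To make the idea of dataset completeness concrete, here is a minimal sketch of how a completeness score per Darwin Core term could be computed. It is not the SiBBr tool: the list of terms checked, the function names and the sample records are all assumptions invented for illustration.

```python
import csv
import io

# Assumption: a small, hypothetical subset of Darwin Core terms to check.
# A real tool would choose its own list of terms.
TERMS = [
    "scientificName",
    "eventDate",
    "decimalLatitude",
    "decimalLongitude",
    "basisOfRecord",
]

# Invented sample occurrence records, for illustration only.
SAMPLE = """scientificName,eventDate,decimalLatitude,decimalLongitude,basisOfRecord
Panthera onca,2014-08-25,-22.45,-43.10,HumanObservation
Tapirus terrestris,,,,PreservedSpecimen
,2013-01-15,,,PreservedSpecimen
Ara macao,2012-06-01,-9.97,-67.81,
"""


def load(text):
    """Parse CSV text into a list of dicts keyed by column name."""
    return list(csv.DictReader(io.StringIO(text)))


def completeness(rows, terms=TERMS):
    """Return, for each term, the fraction of rows with a non-empty value."""
    rows = list(rows)
    if not rows:
        return {term: 0.0 for term in terms}
    return {
        term: sum(1 for row in rows if (row.get(term) or "").strip()) / len(rows)
        for term in terms
    }


if __name__ == "__main__":
    for term, fraction in completeness(load(SAMPLE)).items():
        print(f"{term}: {fraction:.0%}")
```

A score like this only says whether a field is filled in, not whether the value is correct, so it would complement rather than replace feedback from data users and experts.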
For the near future
Consolidate channels and mechanisms for national and international cooperation
Consolidate the SiBBr training programs, including workshops on data structuring, data papers and data users
Develop and adapt data quality tools and manuals according to the needs of the Brazilian community
Data quality vision at SiBBr
As part of the national and international community interested in biodiversity data, SiBBr will contribute to the structuring, integration, publication and use of quality biodiversity data. In the coming years SiBBr will become a reference in the implementation of mechanisms, tools and training processes that help other agents of the publication workflow address data quality issues at the national and international levels.
Thanks! Danny Vélez dvelez@lncc.br dannyvelezv@gmail.com