Data Quality Resources in Species Occurrence Digitization


Allan Koch Veiga, Etienne Americo Cartolano Jr, Antonio Mauro Saraiva
Agricultural Automation Laboratory (LAA), Computing Engineering Dept., Engineering School, Universidade de São Paulo, Brazil

Outline
- Background
- Biodiversity Data Digitizer (BDD) & IABIN
- Data Quality Methodology
- Data Quality Tools: BDD Geo Tool, BDD Taxon Tool
- Conclusion

Background
- Importance of species occurrence data (GBIF Portal, IABIN Portal)
- Data quality affects how the data can be used
- Georeferencing (location domain) and identification (taxonomic data domain) are two major causes of error in species occurrence data
- Hence the need to improve data quality (DQ)

Data Quality & IABIN-PTN
- Inter-American Biodiversity Information Network (IABIN) Pollinators Thematic Network (PTN)
- GEF-funded project (2006-2011), ~$180k
- 11 countries in Latin America, ~400,000 records
- Responsibilities:
  - Development of tools for data digitization and integration
  - Data digitization training and support
  - Reviewing proposals, reports, and data
  - Close contact with data owners and providers

Data Quality & IABIN-PTN: Opportunities & Needs
- Discuss digitization issues with the grantees
- Standards: their importance and role (TDWG)
- Data quality concepts
- Improve data quality
- Provide DQ mechanisms integrated into digitization tools rather than as isolated tools

Biodiversity Data Digitizer (BDD)
- Designed for easy digitization, manipulation, and publication of data
- Demo: Thursday
- Rich data content (FAO-GEF pollinator project)

DQ Assessment Methodology
What is data quality?
- Location
- Data domain

DQ Management Methodology
How can DQ be improved?
- Error prevention is considered superior to error detection.
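To make the prevention/detection distinction concrete, here is a minimal sketch for a single field. It assumes a controlled vocabulary (a subset of Darwin Core `basisOfRecord` values chosen for illustration); the function names are hypothetical and do not come from BDD.

```python
# Hypothetical sketch: prevention vs. detection for one field.
# The vocabulary is a subset of Darwin Core basisOfRecord values, chosen for illustration.
ALLOWED_BASIS = {"PreservedSpecimen", "HumanObservation", "MachineObservation"}


def enter_basis_of_record(value: str) -> str:
    """Prevention: reject an invalid value at data-entry time, before it is stored."""
    if value not in ALLOWED_BASIS:
        raise ValueError(
            f"unknown basisOfRecord {value!r}; choose one of {sorted(ALLOWED_BASIS)}"
        )
    return value


def detect_invalid_basis(records: list[dict]) -> list[int]:
    """Detection: scan records already stored and return the indices of invalid ones."""
    return [i for i, r in enumerate(records) if r.get("basisOfRecord") not in ALLOWED_BASIS]
```

For example, `detect_invalid_basis([{"basisOfRecord": "PreservedSpecimen"}, {"basisOfRecord": "specimen"}])` returns `[1]`, but only after the bad value is already in the database, whereas `enter_basis_of_record("specimen")` raises `ValueError` while the digitizer still has the original label at hand.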

Resources to Improve DQ in BDD
- Tools to prevent errors during occurrence data digitization, integrated into the BDD species occurrence data-entry interface
- BDD Geo Tool: prevents location data digitization errors
- BDD Taxon Tool: prevents taxonomic data digitization errors

BDD Geo Tool: Step 1 of 3, Primary Data

BDD Geo Tool: Step 2 of 3, Data Source

BDD Geo Tool: Step 3 of 3, Uncertainty

BDD Geo Tool: Location data form is filled
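The last Geo Tool step attaches an uncertainty to the georeference. The slides do not say how BDD computes it; one widely used convention is the point-radius method, where a location is a centre coordinate plus the distance to the farthest point of the place's extent. The sketch below illustrates that convention under a stated simplification: the centre is the arithmetic mean of the region's vertices, not a true centre of mass.

```python
import math

EARTH_RADIUS_M = 6_371_008.8  # mean Earth radius in metres


def haversine_m(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in metres between two points given in decimal degrees."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = p2 - p1
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def point_radius(vertices: list[tuple[float, float]]) -> tuple[float, float, float]:
    """Return (lat, lon, radius_m) for a region given as (lat, lon) vertices.

    Simplification (assumption): the centre is the mean of the vertices, and the
    radius is the distance from that centre to the farthest vertex.
    """
    lat = sum(v[0] for v in vertices) / len(vertices)
    lon = sum(v[1] for v in vertices) / len(vertices)
    radius = max(haversine_m(lat, lon, v[0], v[1]) for v in vertices)
    return lat, lon, radius
```

Reporting the radius alongside the coordinates is what lets a user judge whether a record is precise enough for a given analysis, instead of treating a region's centre as an exact point.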

BDD Geo Tool: Improved DQ Dimensions
- Completeness: adds data that were not available before (e.g., lat/long, municipality)
- Consistency: data come from a consistent source, avoiding errors such as lat: 0, long: 0 recorded with municipality: New Orleans
- Credibility: data are associated with a credible source (BioGeomancer, Google, GeoNames)
- Accuracy: better than using the centre of mass of a region
- Precision: an uncertainty indicator increases the data's fitness for use
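The consistency example above (0, 0 coordinates recorded with a municipality far from them) can be caught with a few simple checks. The sketch below is illustrative only: the bounding box for New Orleans is a rough, hypothetical value, and a real tool would obtain reference extents from a gazetteer such as GeoNames rather than a hard-coded dictionary.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BBox:
    """Axis-aligned bounding box in decimal degrees (ignores the antimeridian)."""
    min_lat: float
    max_lat: float
    min_lon: float
    max_lon: float

    def contains(self, lat: float, lon: float) -> bool:
        return self.min_lat <= lat <= self.max_lat and self.min_lon <= lon <= self.max_lon


# Hypothetical reference data: a rough, illustrative box, not authoritative.
MUNICIPALITY_BBOXES = {"New Orleans": BBox(29.86, 30.20, -90.14, -89.62)}


def location_issues(lat: float, lon: float, municipality: str,
                    bboxes: dict[str, BBox] = MUNICIPALITY_BBOXES) -> list[str]:
    """Return the consistency problems found in one location record."""
    if not (-90 <= lat <= 90 and -180 <= lon <= 180):
        return ["coordinates out of range"]
    issues = []
    if lat == 0 and lon == 0:
        # 0,0 is a common placeholder left when coordinates were never entered
        issues.append("suspicious 0,0 coordinates")
    box = bboxes.get(municipality)
    if box is None:
        issues.append("municipality not in reference list")
    elif not box.contains(lat, lon):
        issues.append("coordinates fall outside the stated municipality")
    return issues
```

Run at entry time, `location_issues(0, 0, "New Orleans")` reports both the placeholder coordinates and the mismatch with the municipality, which is the kind of error the Geo Tool aims to prevent by filling both fields from the same source.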

BDD Taxon Tool: Step 1 of 2, Taxonomic Name Selection

BDD Taxon Tool: Step 2 of 2, Taxonomic Hierarchy Selection

BDD Taxon Tool: Taxonomic data form is filled
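The two Taxon Tool steps (select a name, then take its hierarchy) can be sketched as a lookup that copies every rank from one reference record. This is a hypothetical illustration: the in-memory `CHECKLIST` stands in for a service such as Catalogue of Life, and the rank list and field names are assumptions, not BDD's actual form.

```python
RANKS = ("kingdom", "phylum", "class", "order", "family", "genus")

# Hypothetical in-memory stand-in for a checklist service such as Catalogue of Life.
CHECKLIST = {
    "Apis mellifera": ("Animalia", "Arthropoda", "Insecta", "Hymenoptera", "Apidae", "Apis"),
    "Bombus terrestris": ("Animalia", "Arthropoda", "Insecta", "Hymenoptera", "Apidae", "Bombus"),
}


def fill_taxon_form(scientific_name: str) -> dict[str, str]:
    """Step 1: the user selects a name; step 2: all ranks are copied from the checklist."""
    hierarchy = CHECKLIST.get(scientific_name)
    if hierarchy is None:
        # Refusing unknown names keeps a misspelled or invented name out of the record.
        raise KeyError(f"{scientific_name!r} is not in the checklist; select a suggested name")
    form = dict(zip(RANKS, hierarchy))
    form["scientificName"] = scientific_name
    return form
```

Because every rank comes from the same reference record, the user never types the hierarchy by hand, so it cannot be partially filled or internally inconsistent.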

BDD Taxon Tool: Improved DQ Dimensions
- Completeness: the taxonomic hierarchy is filled in from a taxon name
- Consistency: data are obtained from a consistent source (Catalogue of Life)
- Credibility: data are associated with a credible source (Catalogue of Life)
- Accuracy: avoids spelling mistakes and incorrect taxonomic hierarchies
- Precision: complete scientific names are suggested
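The accuracy and precision points (avoid misspellings, suggest complete names) can be approximated by combining prefix autocompletion with fuzzy matching. This sketch uses Python's standard `difflib.get_close_matches`; the name list, the 0.8 cutoff, and the `suggest_names` helper are hypothetical choices for illustration, not how BDD is known to work.

```python
import difflib

# Hypothetical list of accepted names; a real tool would query a checklist service.
KNOWN_NAMES = [
    "Apis mellifera",
    "Apis cerana",
    "Bombus terrestris",
    "Bombus impatiens",
    "Melipona quadrifasciata",
]


def suggest_names(typed: str, known: list[str] = KNOWN_NAMES, n: int = 3) -> list[str]:
    """Suggest up to n complete scientific names for what the user has typed."""
    typed = typed.strip()
    # Autocompletion: names that start with the typed text (case-insensitive).
    prefix_hits = [name for name in known if name.lower().startswith(typed.lower())]
    # Spelling tolerance: names similar to the whole typed string.
    fuzzy_hits = difflib.get_close_matches(typed, known, n=n, cutoff=0.8)
    merged: list[str] = []
    for name in prefix_hits + fuzzy_hits:
        if name not in merged:  # keep order, drop duplicates
            merged.append(name)
    return merged[:n]
```

With this sketch, `suggest_names("Apis")` offers `["Apis mellifera", "Apis cerana"]`, and the misspelling `"Apis melifera"` still yields `["Apis mellifera"]`, so the user selects a correctly spelled, complete name instead of saving the typo.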

Conclusion
- Integrated existing techniques, tools, and credible data sources into a species occurrence data-entry tool
- Improved the completeness, consistency, accuracy, and precision of species occurrence data
- Error prevention for taxonomic and location data
- Tools usable by an audience with little background in data digitization and DQ

Conclusion: Next Steps
- Other tools, techniques, dimensions, error patterns, and domains of data quality in biodiversity are yet to be explored and added
- Work on error correction in existing data: spreadsheet-based data correction
- Suggestions and collaboration are welcome!

Acknowledgements
- IABIN PTN: Laurie Adams (P2), Mike Ruggiero (ITIS), Mike Frame, Liz Sellers and Ben Wheeler (USGS)
- Pedro Correa (Universidade de São Paulo)
- All data grantees
- FAO-UNEP-GEF Pollinator project in Brazil: Barbara Gemmill-Herren (FAO)
- Ministry of the Environment, Brazil

Thank you
- Allan Koch Veiga: allan.kv@gmail.com
- Etienne Americo Cartolano Jr: etienne.cartolano@gmail.com
- Antonio Mauro Saraiva: saraiva@usp.br
Agricultural Automation Laboratory (LAA), Computing Engineering Dept., Engineering School, Universidade de São Paulo, Brazil