REACCH PNA Data Management Plan

Similar documents
An Esri White Paper June 2011 ArcGIS for INSPIRE

Geospatial Data Stewardship at an Interdisciplinary Data Center

Nevada NSF EPSCoR Track 1 Data Management Plan

An ESRI White Paper October 2009 ESRI Geoportal Technology

GIS Initiative: Developing an atmospheric data model for GIS. Olga Wilhelmi (ESIG), Jennifer Boehnert (RAP/ESIG) and Terri Betancourt (RAP)

Virginia Commonwealth University Rice Rivers Center Data Management Plan

The ORIENTGATE data platform

EXPLORING AND SHARING GEOSPATIAL INFORMATION THROUGH MYGDI EXPLORER

GIS Data Models for INSPIRE and ELF

EEOS Spatial Databases and GIS Applications

NATIONAL CLIMATE CHANGE & WILDLIFE SCIENCE CENTER & CLIMATE SCIENCE CENTERS DATA MANAGEMENT PLAN GUIDANCE

ArcGIS Online School Locator

Environment Canada Data Management Program. Paul Paciorek Corporate Services Branch May 7, 2014

Interagency Science Working Group. National Archives and Records Administration

Evolving Infrastructure: Growth and Evolution of Spatial Portals

How To Write An Nccwsc/Csc Data Management Plan

Enterprise GIS Solutions to GIS Data Dissemination

How To Use The Alabama Data Portal

ArcGIS Data Models Practical Templates for Implementing GIS Projects

CDI/THREDDS Interoperability: the SeaDataNet developments. P. Mazzetti 1,2, S. Nativi 1,2, 1. CNR-IMAA; 2. PIN-UNIFI

ArcGIS. Server. A Complete and Integrated Server GIS

Portal for ArcGIS. Satish Sankaran Robert Kircher

Standard 5: Use a consistent data management framework in accordance with internal and partner organization data standards.

Data Integration Strategies

NetCDF and HDF Data in ArcGIS

GEOG 482/582 : GIS Data Management. Lesson 10: Enterprise GIS Data Management Strategies GEOG 482/582 / My Course / University of Washington

Final Report - HydrometDB Belize s Climatic Database Management System. Executive Summary

The Arctic Observing Network and its Data Management Challenges Florence Fetterer (NSIDC/CIRES/CU), James A. Moore (NCAR/EOL), and the CADIS team

Summary of Responses to the Request for Information (RFI): Input on Development of a NIH Data Catalog (NOT-HG )

GIS Databases With focused on ArcSDE

Integrated Information Management System, Development of Web Interface, a.k.a. Online Data Portal (ODP)

Spatial Data Infrastructure. A Collaborative Network

ICCD/COP(12)/CST/INF.5

The Data Grid: Towards an Architecture for Distributed Management and Analysis of Large Scientific Datasets

Metadata for Data Discovery: The NERC Data Catalogue Service. Steve Donegan

Research Data Archival Guidelines

CDI SSF Category 1: Management, Policy and Standards

Data Management Plan Template Guidelines

A Web services solution for Work Management Operations. Venu Kanaparthy Dr. Charles O Hara, Ph. D. Abstract

CIP s Open Data & Data Management Guidelines and Procedures

Project Title: Project PI(s) (who is doing the work; contact Project Coordinator (contact information): information):

Standardised reporting for Natural Resource Management (NRM) Investment in NSW

Data Management Best Practices for Landscape Conservation Cooperatives Part 1: LCC Funded Science

OpenAIRE Research Data Management Briefing paper

Basics on Geodatabases

Exploring the roles and responsibilities of data centres and institutions in curating research data a preliminary briefing.

A Capability Maturity Model for Scientific Data Management

GeoNetwork, The Open Source Solution for the interoperable management of geospatial metadata

Lecture 8. Online GIS

NSF Data Management Plan Template Duke University Libraries Data and GIS Services

Census Data: Access, Mapping and Visualization

Amazon Hosted ESRI GeoPortal Server. GeoCloud Project Report

GeoManitoba Spatial Data Infrastructure Update. Presented by: Jim Aberdeen Shawn Cruise

Geodatabase Programming with SQL

The ORIENTGATE data platform

How To Govern Data Governance At The Australian Primary Health Care Research Institute

The EcoTrends Web Portal: An Architecture for Data Discovery and Exploration

Lesson 3: Data Management Planning

National Geothermal Data System and Global Geosciences Data Integration

Texas State University University Library Strategic Plan

AN INTEGRATED SOLUTION FOR MANAGING EXPLORATION DATA

HydroDesktop Overview

An Introduction to Managing Research Data

Creating an EAD Finding Aid. Nicole Wilkins. SJSU School of Library and Information Science. Libr 281. Professor Mary Bolin.

Data Management Implementation Plan

UNITED STATES DEPARTMENT OF THE INTERIOR BUREAU OF LAND MANAGEMENT MANUAL TRANSMITTAL SHEET Data Administration and Management (Public)

data.bris: collecting and organising repository metadata, an institutional case study

Introducing the Open Source CUAHSI Hydrologic Information System Desktop Application (HIS Desktop)

There are various ways to find data using the Hennepin County GIS Open Data site:

National Data Sharing and Accessibility Policy (NDSAP)

GIS and Mapping Solutions for Developers. ESRI Developer Network (EDN SM)

VGIS HANDBOOK PART 2 - STANDARDS SECTION D METADATA STANDARD

Local Loading. The OCUL, Scholars Portal, and Publisher Relationship

Symantec Enterprise Vault.cloud Overview

Unified Resource Management: The Ex Libris Framework for Next- Generation Library Services

Building and Sustaining a Strong Organization Amid Challenge And Change KPMG LLP

Business Plan for the. The Geospatial Platform. September 20, 2012 REDACTED

Transcription:

REACCH PNA Data Management Plan Regional Approaches to Climate Change (REACCH) For Pacific Northwest Agriculture 875 Perimeter Drive MS 2339 Moscow, ID 83844-2339 http://www.reacchpna.org reacch@uidaho.edu University of Idaho Regional Approaches to Climate Change for Pacific Northwest Agriculture (REACCH PNA) Updated as of: January 2013 Version 6.0 Contents EXECUTIVE SUMMARY... 2 INTRODUCTION... 2 WHAT IS DATA MANAGEMENT?... 2 REACCH DATA MANAGEMENT VISION AND STRATEGY... 4 REACCH DATA MANAGEMENT GOALS... 4 GOALS:... 5 REACCH OPERATIONAL DATA MANAGEMENT GUIDELINES... 5 INTRODUCTION... 5 DATA INCLUSION... 5 DATA RETENTION... 6 DATA FORMATS AND DISSEMINATION... 7 INPUT FORMATS... 7 Output formats... 7 Dissemination... 8 DATA STORAGE AND PRESERVATION OF ACCESS... 8 REACCH DATA MANAGEMENT ORGANIZATIONAL FRAMEWORK... 8 REACCH DATA MANAGEMENT ORGANIZATIONAL MODEL... 8 REACCH DATA MANAGEMENT TEAM STRUCTURE... 10 DATA MANAGEMENT DATA FLOW... 10 REACCH DATA MANAGEMENT ROLES AND POLICY... 11 Data Management Policy and Access Standards... 12 METADATA POLICY STANDARDS... 13 WHAT IS METADATA?... 13 WHY IS METADATA IMPORTANT?... 13 REACCH DATA MANAGEMENT PROPOSED DESIGN AND ARCHITECTURE... 14 DESIGN OVERVIEW... 14 CONCEPTUAL/SEMANTIC DESIGN... 15 LOGICAL/PHYSICAL DESIGN... 16 http://www.reacchpna.org ~ 1 ~ http://www.fb.com/reacchpna

Executive Summary The following REACCH data management plan is a comprehensive strategic and planning document that outlines the steps for building and maintaining an environmentally-focused data analysis system. REACCH, which stands for Regional Approaches to Climate Change is a research team composed of over 40 scientists, dedicated to understanding how climate alterations are affecting our Pacific Northwest agriculture. The REACCH Data Management Plan has four key goals that are To develop an extensible and decentralized data management framework for climate based research; To promote the decentralization of data storage by instantiating all data access and rules in a programmatic application layer environment; Allowing researchers and stakeholders to analyze and examine climate based scientific research thru the use of easily accessible open source compiled and web based data tools Provide a library of climate based tools that allow the user to analyze and examine data from multiple sources with no need to download any data. To develop a data policy framework for organizing and structuring how data is managed. Introduction What is Data Management? "Data Resource Management is the development and execution of architectures, policies, practices and procedures that properly manage the full data lifecycle needs of an enterprise. http://www.reacchpna.org ~ 2 ~ http://www.fb.com/reacchpna

Data management is a changing and dynamic discipline that has evolved extensively over time. With the development of personal computers and the advancement in the use of database technologies, web development, and data communication protocols (XML) data management has become an essential component to any large, diverse, data intensive project (citation, citation, citation) Data management in scientific research has had considerable growth with many funding agencies recognizing the importance of data management as part of an overall research strategy (citation). For example, both the National Science Foundation (NSF) and the National Institute for Health (NIH) have data management plan templates and encourage each research proposal to have a data management plan. The REACCH program is a prime example of a diverse project that requires data management structure, guidelines, and policy. With over 50 scientists, staff, policymakers, and hundreds of stakeholders clearly describing and understanding our data management vision, plan, and policy is essential for success. Figure 1. REACCH Data Management flow http://www.reacchpna.org ~ 3 ~ http://www.fb.com/reacchpna

REACCH Data Management Vision and Strategy REACCH s data management vision is focused on three aspects: 1. Modularity 2. Flexibility 3. Extensibility Modularity. REACCH s scientific efforts are diverse. With scientists from areas including cropping systems, soils, biotics, education, biological engineering, atmospheric science, climatology, geography, and the like a modular data approach indicates an effort that is somewhat standardized and can be used interchangeably amongst groups. Metadata cataloging is an example area where modularity can be helpful in reducing time and effort by developing a modular structure that can be stamped out again and again with alterations. Flexibility. This is a changing, moving project. Scientists are exploring ideas and coming up with new and innovative ways to better understand our climate in the northwest. From a data management perspective, we want to be flexible in terms of how we develop our processes for data uploading, how we expose data, what applications we create, etc. The data management plan s purpose is to lay down a foundation for data management development but to additionally be flexible enough to change as necessary when information or circumstances change. Extensibility. To be extensible means take into consideration the fact that REACCH will change and grow over time. We want our data management effort, our data structure, our metadata, to be formulated in a way that ensures that it remains extensible. REACCH Data Management Goals The REACCH data management goals and associated objectives are a high level view of our strategic data management direction. Our hope is that there is a direct association between: REACCH overall program goals; REACCH data management goals; REACCH data management objectives; and REACCH data management tasks as part of implementation and sustainability. http://www.reacchpna.org ~ 4 ~ http://www.fb.com/reacchpna

Goals: 1. To develop an extensible and decentralized data management framework for climate based research. 2. To promote the decentralization of data storage by instantiating all data access and rules in a programmatic application layer environment. 3. Allowing researchers and stakeholders to analyze and examine climate based scientific research thru the use of easily accessible open source compiled and web based data tools. 4. Provide a library of climate based tools and information that allow the user to analyze and examine data from multiple sources with no need to download data. 5. To develop a data policy framework for organizing and structuring how data is managed, that is clear to stakeholders, researchers, and the public. REACCH Operational Data Management Guidelines Introduction Development of a robust data management effort requires guidelines on how data will be used and managed over time. Particularly in an environment where data is being continuously added, analyzed, removed, and changed there needs to be a set of processes and rules that define how this information will be managed not just during the grant time period, but after the conclusion of the grant. These operational guidelines describe steps necessary for operational management, and refer to additional informational collection documents that need to be in place to record said information. Data Inclusion Data inclusion guidelines define which datasets will be incorporated into the REACCH repositories. From a broad perspective, REACCH datasets can be grouped into the following areas: http://www.reacchpna.org ~ 5 ~ http://www.fb.com/reacchpna

1. Raw unprocessed data 2. Processed data 3. Analyzed data 4. Published papers and presentations These functional groupings of data show how the variety of datasets and information being collected are related in several meaningful ways (Figure 2). Literature Derived and recombined data REACCH Objective Teams Raw Data Figure 2: REACCH Semantic overview Data retention The retention of REACCH data involves the grouping of datasets into retention levels, as described below: http://www.reacchpna.org ~ 6 ~ http://www.fb.com/reacchpna

Retained indefinitely Retained til end of project (YR5) Retained yearly Retained monthly Level1 Level2 Level3 Level4 The determination of retention level per dataset will be set by the owner of the data, and will be included in the appropriate metadata for the dataset. This information will then be used to segregate datasets when backup and redundancy components are put in place. Data formats and dissemination REACCH has a very diverse scientific grouping, with researchers in cropping systems, biotics, modeling, economics, and sociology. Several data input and output formats will be utilized, including: Input formats Netcdf ESRI ArcGIS shapefile ESRI ArcGIS geodatabase Excel spreadsheet Output formats ESRI based map service ESRI ArcGIS geodatabase CSV XML WMS/WFS map service THREDDS/OPENDAP protocol service http://www.reacchpna.org ~ 7 ~ http://www.fb.com/reacchpna

Dissemination Dissemination will focus on the dynamic distribution and analysis of datasets thru web services rather than data download and individual analysis. Given the complexity and size of datasets, the actual acquisition of a full dataset for analysis of an individual data type or component would not be efficient or appropriate. As such, REACCH s dissemination model will focus on the use of web server technology to allow researchers, stakeholders, and the public regardless of the technology device (computer, mobile phone) to refine data output thru information query and then allow said user to view the results of this query or then download just the information that is being requested. Data storage and preservation of access REACCH data will be stored within a redundant geospatial database structure (PostgresQL /Linux), and managed thru web service protocols (using ESRI s geoportal server for data discovery and searching). Data will be preserved thru primary and incremental backups as well as replicated to other mirrored servers attached to the University of Idaho network. REACCH Data Management Organizational Framework REACCH Data Management Organizational Model The REACCH data management organizational model is important to describe the flow of decisionmaking and for clarity in terms of how decisions regarding data management should be made. Evaluation of organization s goals and objectives, and how they fit with the proposed initiative; Description of the decision-making work flow for the program Defining of roles and responsibilities for the initiative within the organization; Clear methods that describe how management can modify/re-evaluate approaches that are not working; and Well-defined mechanisms for communicating between participating groups. http://www.reacchpna.org ~ 8 ~ http://www.fb.com/reacchpna

Figure 3: REACCH overall organizational model (2012) http://www.reacchpna.org ~ 9 ~ http://www.fb.com/reacchpna

REACCH Data Management Team Structure The REACCH Objective 8 Infrastructure and Cyberinfrastructure team (Figure 3) will serve as the lead group coordinating data management, with direction from the Environmental Data Manager. Each of the REACCH objective teams will work in coordination with Objective team 8 as each team will have assigned tasks that are part of the overall REACCH Data Management Project Plan and those tasks will be assigned in REACCH s intranet collaboration portal system - Central Desktop. Data Management Data Flow The REACCH approach (Figure 4) to the flow and management of data will utilize several key components: A internal collaboration web portal for informal information sharing, task and milestone management, and other documentation sharing; An external web site (www.reacchpna.org) for public information sharing and secure access to the REACCH data portal; A database repository (Linux operating system, utilizing a PostgresQL enterprise geospatial database) for data storage; An application web server model for data analysis, discovery, and upload/download; http://www.reacchpna.org ~ 10 ~ http://www.fb.com/reacchpna

Figure 4: REACCH Physical Architecture (2013) REACCH Data Management Roles and Policy From a REACCH overarching perspective, an overall organizational framework is needed to identify each objective team s role in the data management effort. Some common precepts that help to outline this framework: 1. Data ownership determines decision-making regarding distribution 2. Datasets can have multiple roles (Data Creator, Data Owner) 3. Objective team roles determine their REACCH data management procedures over time. http://www.reacchpna.org ~ 11 ~ http://www.fb.com/reacchpna

Data Owner Data Creator Data Owner Data Owner USD A REACCH Data Integrator NKN Inside Data Owner Data Owner Data Creator Data Creator Figure 5: REACCH Data Organizational Framework Roles Data Management Policy and Access Standards REACCH has developed a Data Management Access Policy, that outlines guidelines for external data distribution, data inclusion, and the delineation of data based on issues related to privacy and confidentiality. In addition, the REACCH Data Management Access Policy provides guidelines to REACCH researchers on: How data will be segregated based on data type; How such information will be stored and managed; Guidelines for REACCH researchers on the methods of data uploading; and Potential barriers to data inclusion and how to resolve. In addition to the REACCH Data Management Policy, REACCH data agreements have been developed for both general data access, and restricted data access that will be presented to all users of REACCH data via a web form to ensure that essential information regarding the potential use and distribution of REACCH data is communicated. All users of REACCH data will be required to acknowledge review of these data agreements, based on the type of data that is to be reviewed/analyzed/downloaded. http://www.reacchpna.org ~ 12 ~ http://www.fb.com/reacchpna

Metadata policy standards What is metadata? Metadata is descriptive information for your research data. Metadata describes said information, so we can search for this data, and find it easily using technology tools. Metadata is a collection descriptive fields, or columns of data, that define core information about a dataset that helps someone who is unfamiliar with the information to know: Who created it; Where it came from; How it was created; When it was created; and What particular components of the dataset mean. Why is metadata important? Metadata not only helps to describe the data but helps the person examining it to know where it might be found - which is often times referred to as 'discovery'. Discovery metadata is a core aspect of your dataset. REACCH is using a customized version of the ISO 19115 metadata standard. "ISO is a network of the national standards institutes of 157 countries, on the basis of one member per country, with a Central Secretariat in Geneva, Switzerland, that coordinates the system. ISO is a non-governmental organization: its members are not, as is the case in the United Nations system, delegations of national governments. Nevertheless, ISO occupies a special position between the public and private sectors. This is because, on the one hand, many of its member institutes are part of the governmental structure of their countries, or are mandated by their government. On the other hand, other members have their roots uniquely in the private sector, having been set up by national partnerships of industry associations. http://www.reacchpna.org ~ 13 ~ http://www.fb.com/reacchpna

Therefore, ISO is able to act as a bridging organization in which a consensus can be reached on solutions that meet both the requirements of business and the broader needs of society, such as the needs of stakeholder groups like consumers and users." 1 ISO 19115 and 19115-2 define the schema required for describing geographic information and services. It provides information about the identification, the extent, the quality, the spatial and temporal schema, spatial reference, and distribution of digital geographic data Standard metadata frameworks supported by REACCH As noted above, REACCH will utilize the ISO-19115 metadata standard as the core standard for data storage. Given that the REACCH effort has a strong geographic data focus the ISO19115 standard provides the most broad and extensive metadata element structure for storage of a wide grouping of data types. In addition, the ISO standard provides strong crosswalks to other metadata standards including EML, Dublin Core, Darwin Core, and other widely used metadata standards. REACCH Data Management Proposed Design and Architecture Design Overview The design of the REACCH data management effort is focused on three core areas: Development of a strong database repository for storage of data; Use of a data discovery server for the uploading and tagging of data as well as for searching, or discovering REACCH data, and The use of web application server technology (ArcGIS Server, THREDDS) to organize and display data in a web services protocol for analysis, download, and exposure of said data to other systems and repositories. The above design is an approach that is used for the collection and distribution of large, diverse datasets, particularly those that may be very large, cumbersome, and detailed in nature. To simply provide said datasets for download is, to say the least, inefficient. 1 International Standardization of Security - http://www.iso.org/iso/iss_home http://www.reacchpna.org ~ 14 ~ http://www.fb.com/reacchpna

The REACCH model is an attempt to organize and distribute data in a way that permits the user to organize how they wish to see the data, and allow server-side technology to process and select the requested information and present it for analysis, and specific download, if that is needed. Conceptual/Semantic Design The conceptual design of the REACCH data management model is seen in Figure 6 below. Figure 6: REACCH Conceptual Data Design (2013) http://www.reacchpna.org ~ 15 ~ http://www.fb.com/reacchpna

REACCH data providers will enter the system thru the REACCH web site where, based on their need, they will be directed to the REACH data discovery server. The REACCH data discovery server will be the location where users will upload their datasets, and associated metadata. This data will be deposited into the REACCH repository (Linux/PostgresQL), where at that point, a set of transformative processes will be run and placing the dataset into the appropriate location within the REACCH geospatial database. After review, approval, and publishing of loaded data - REACCH application servers with specific applications developed for functional information distribution (biotics, cropping systems, economics, education, modeling, etc) will be developed and will access the REACCH geospatial database displaying data to users for search and discovery thru the REACCH data discovery server. Logical/Physical Design REACCH is leveraging the University of Idaho s Northwest Knowledge Network (NKN - www.northwestknowledge.net) as a source for infrastructure and networking support. NKN is providing server and networking systems that allow REACCH to implement the conceptual design indicated in Figure 6. REACCH s physical design utilizes virtual server technology to build out our database repository, application servers, web server, and our www.reacchpna.org web site. All systems will additionally be redundantly mirrored at other locations utilizing the Idaho IRON high speed network. 2 2 Idaho Regional Optical Network, http://www.ironforidaho.net/ http://www.reacchpna.org ~ 16 ~ http://www.fb.com/reacchpna