CDI SSF Category 1: Management, Policy and Standards

Developing a Data Management Plan for Implementation: Best Practices for the Collection, Management, Storing, and Sharing of Geospatial and Non-geospatial Environmental Change Data
Phase One

Applicants/Principal Investigator(s):
Lori Baer, USGS, Geosciences and Environmental Change Science Center, Box 25046, MS 980, Denver Ph: (303)236-1328 Email: labaer@usgs.gov
Darren Van Sistine, USGS, Geosciences and Environmental Change Science Center, Box 25046, MS 980, Denver Ph: (303)236-5452 Email: dvansistine@usgs.gov

Abstract: On October 1, 2012, the Rocky Mountain Geographic Science Center and the Geology and Environmental Change Science Center were reorganized into a new science center known as the Geosciences and Environmental Change Science Center (GECSC). This new center brings together scientists from different disciplines, funded by many programs, who collect, analyze, and archive data from the fields of geology, climate change, ecosystems, and geography, among others. The data collected are extremely diverse, and data management practices currently vary from project to project. The inception of the new science center is an ideal time to inventory the breadth of our data types and current data management practices and to develop an integrated plan. We propose a phased approach to developing a best practices plan to accomplish this work. Phase one will inventory the data holdings, both geospatial and non-geospatial (i.e., tabular, statistical, graphical), to better understand and organize the current data management practices in this new center, and will be completed in 2013. Phase 2 will follow with the development of tools and services to help scientists by defining a data management workflow, including the archiving and dissemination of data.
We will focus on data developed in the Environmental Change Research part of our center, which receives funding from three programs under the Climate and Land Use Change Mission Area.

Total funding amount requested: $30,607
Total in-kind funding: $51,477

Datasets: An inventory of mission-specific and core science center datasets will be generated by this project. Examples of datasets include the following:
- Cryospheric Studies - Climate and Borehole Temperature Data.
- Surficial geologic map of Mesa Verde National Park, Montezuma County, Colorado: U.S. Geological Survey Scientific Investigations Map 3224, 22 p. pamphlet, 1 sheet, scale 1:24,000.
- Geospatial Multi-Agency Coordination (GeoMAC) Wildland Fire Perimeters, 2008, U.S. Geological Survey Data Series 612.
- Fire Perimeters in Colorado from Landsat Imagery (2000-2009), USGS.
Summary: The new Geosciences and Environmental Change Science Center (GECSC) was created through the reorganization of two former science centers, the Geology and Environmental Change Science Center and the Rocky Mountain Geographic Science Center. While this presents a great opportunity for collaboration and for leveraging our science dollars, it also creates many challenges. Among those challenges are overcoming cultural and procedural differences in the ways we collect, analyze, serve, and archive our datasets, and understanding what datasets we actually maintain in the new center. This proposal will focus on the Environmental Change Research portion of the GECSC, which comprises ecologists, geologists, geographers, and physical scientists working on a broad array of projects that collect geospatial and non-geospatial datasets. These datasets include climate data, geochemical data, remotely sensed imagery, GIS spatial data, and modeling results, to name but a few.

The inception of the new science center is an ideal time to inventory the breadth of our data types and current data management practices and to develop an integrated plan for moving forward. We propose a phased approach to developing a best practices plan for inventorying and managing our data holdings, both geospatial and non-geospatial. Phase one will inventory existing data and data management practices and will be completed by August 31, 2013. Phase 2 will follow with the development of tools and services to help scientists by defining a data management workflow, including the archiving and dissemination of data. We will focus on data developed in the Environmental Change Research part of our center, which receives funding from three programs under the Climate and Land Use Change Mission Area. The environmental change research undertaken in the GECSC represents, or has the potential to represent, integrative research in the USGS.
The results of our project will prototype improved data management best practices for interdisciplinary research teams throughout the USGS.

CDI (Community for Data Integration) SSF (Science Support Framework) Category 1: Management, Policy and Standards

Developing a Data Management Plan for Implementation: Best Practices for the Collection, Management, Storing, and Sharing of Geospatial and Non-geospatial Environmental Change Data
Phase One

Applicants/Principal Investigator(s):
Lori Baer, USGS, Geosciences and Environmental Change Science Center, Box 25046, MS 980, Denver Ph: (303)236-1328 Email: labaer@usgs.gov
Darren Van Sistine, USGS, Geosciences and Environmental Change Science Center, Box 25046, MS 980, Denver Ph: (303)236-5452 Email: dvansistine@usgs.gov

Other Personnel Involved:
Grace Beal, USGS, Geosciences and Environmental Change Science Center, Box 25046, MS 980, Denver Ph: (303)236-1802 Email: ygbeal@usgs.gov
Bryan Herrera, USGS, Geosciences and Environmental Change Science Center, Box 25046, MS 980, Denver Ph: (303)236-5480 Email: bherrara@usgs.gov

Scope: The outcome of this proposal will help move the new Geosciences and Environmental Change Science Center toward integrated collection, management, storing, and sharing of data, which will contribute to USGS-level data integration efforts. By performing a data inventory of existing geospatial and non-geospatial datasets, and implementing existing metadata collection tools for new datasets, we will set the stage for organizing future assets and sharing informational and scientific products. Following the CDI Scientific Data Life Cycle Processes approach (https://my.usgs.gov/confluence/display/cdi/scientific+data+life+cycle+model+for+the+usgs), our plan for Phase one covers all the variables that go into the Planning step of this process. In consultation with other groups, such as the Central Energy Resources Science Center (CERSC), we will consider what has already been done and put in place in terms of infrastructure and existing hardware and software requirements; seek examples of inventory lists of geospatial and non-geospatial data; and implement a searchable catalog system, initially to benefit internal science needs, with a specific directory structure and naming convention for all products. Because of our recent Science Center merger, data is being migrated and hardware is being consolidated right now. There is no better time for this effort.
Planned tasks include:
- Conduct a data inventory survey, considering data types, naming conventions, and how data is currently being stored, by engaging the data owners with questionnaires and face-to-face interviews; utilize BASIS+ to identify core tasks of each program and the output from each task
- Evaluate existing infrastructure and hardware, and analyze the capabilities for data management, storage, distribution, and access
- Verify what metadata collection standards are in place and being utilized; consider metadata training for those who need it; a small data team will act as liaisons to scientists to help gather, organize, and archive existing datasets and documentation; create appropriate metadata templates where none exist
- Consider the use of the Data Exit Survey for USGS Scientists leaving the workforce in 2013 to make sure no scientific data or product goes undocumented
- Explore existing science data management web applications and web mapping tools for solving data management and data distribution problems, including SharePoint, the Geo Data Portal (GDP), and ScienceBase; reference and implement known data management plans, i.e., Burley/Smith (https://my.usgs.gov/confluence/download/attachments/59408387/cdi_datamgmtplanframework_proposal.pdf?version=1&modificationdate=1318536764774)
Technical Approach: Our technical approach will include:
1) Investigate and document existing infrastructure and services that resulted from the consolidation of two science centers
2) Identify, via BASIS+ and interviews, the core datasets that currently exist
3) Build a data inventory list with consideration to data type, size, access, format, owner, and organization
4) Consider library catalog structure after inventorying all data (spatial vs. tabular vs. graphical), and how to organize, manage, and store the different types of data
5) Create a master directory structure, with subdirectories for different programs and projects
6) Create an authoritative source or central library for datasets; work with Center IT personnel on how to create backups of the most current version of datasets, as well as weekly backups
7) Investigate existing applications, such as SharePoint document services and other science center approaches; consider security issues with data storage and acquisition and conform to the USGS Data Release policy in progress (http://www.usgs.gov/laws/info_policies.html)
8) Document an overall assessment of Phase one procedures, covering what worked and what didn't work

Project Experience: We plan on consulting with members of the CDI and CERSC groups, as well as other Southwest Region science centers that have implemented some aspects of the data management planning phase, which calls for knowledge of databases, IT support and implementation, and GIS. With the merging of two science centers, we have access to the existing knowledge and experience needed to accomplish our goals for Phase one of the data management process. Existing metadata templates will be used, and new templates will be created for data that does not fit existing template requirements. Existing web services will be utilized and customized in Phase 2 for querying and distributing our managed spatial and non-spatial datasets.
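Steps 3 and 5 of the technical approach (the data inventory list and the master directory structure) can be sketched in a few lines of Python. This is only an illustrative sketch, not the project's actual design: the record fields follow the attributes listed in step 3, while the class name `InventoryRecord`, the `library_path` helper, the `/data_library` root, the access vocabulary, and the program/project/data-type folder order are all assumptions introduced for the example.

```python
from dataclasses import dataclass
from pathlib import PurePosixPath

# Data categories named in step 4 of the technical approach.
DATA_TYPES = ("spatial", "tabular", "graphical")


@dataclass(frozen=True)
class InventoryRecord:
    """One row of the data inventory list (fields from step 3)."""
    name: str
    data_type: str      # one of DATA_TYPES
    size_mb: float
    access: str         # e.g. "internal" or "public" (assumed vocabulary)
    file_format: str
    owner: str
    organization: str
    program: str        # used for the directory layout in step 5
    project: str

    def __post_init__(self):
        # Reject categories outside the three named in the proposal.
        if self.data_type not in DATA_TYPES:
            raise ValueError(f"unknown data type: {self.data_type!r}")


def slug(text: str) -> str:
    """Lowercase a label and join its words with underscores."""
    return "_".join(text.lower().split())


def library_path(record: InventoryRecord, root: str = "/data_library") -> PurePosixPath:
    """Build root/program/project/data_type for a record (hypothetical layout)."""
    return PurePosixPath(root) / slug(record.program) / slug(record.project) / record.data_type


# Placeholder values for illustration only; not real inventory data.
example = InventoryRecord(
    name="Example fire perimeters",
    data_type="spatial",
    size_mb=1.0,
    access="internal",
    file_format="shapefile",
    owner="example_owner",
    organization="GECSC",
    program="Example Program",
    project="Fire Perimeters",
)
print(library_path(example))  # /data_library/example_program/fire_perimeters/spatial
```

Deriving the directory from the record, rather than typing paths by hand, is one way to keep the naming convention consistent across programs and projects; the actual convention would be set during Phase one.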
Commitment to Effort: Development of a protocol for the collection, organization, and management of scientific data within two merging Science Centers with a broad science portfolio will benefit scientists in other integrated centers through increased awareness of opportunities for collaboration, the ability to cite published work, contributions to scientific decision making, and the cross-checking of duplication of effort among USGS scientists. We will explore data repositories such as ScienceBase and the National Digital Catalog (http://datapreservation.usgs.gov/catalog.shtml). This effort is strongly supported by our science center chief and management staff. Subsequent phases for implementing data management best practices will be considered with the understanding that funding for them may not be derived from the CDI.
Budget: GROSS AMOUNTS @ 21.815 OH RATE

Budget Category                   Federal Funding Requested   Matching Funds 'Proposed'
1. SALARIES
   Lori Baer                      2 PP   $11,880.62           3 PP   $17,821.53
   Darren Van Sistine             2 PP   $10,262.91           3 PP   $15,394.98
   IT support - Herrera           1 PP    $4,727.64           1 PP    $4,727.64
   Programming support - Beal     1 PP    $1,689.57           1 PP    $1,689.57
   CERSC Personnel                                            1 wk    $3,315.80
   Subtotal Salaries                     $28,560.74                  $42,949.53
2. FRINGE BENEFITS                n/a                         n/a
3. FIELD EXPENSES (one person traveling for CDI meeting, Denver to Reston, 4 days)
   Airfare                                  $602.98
   Lodging                                  $979.39
   Per Diem                                 $317.94
   Vehicle Cost                             $146.18
   Subtotal Field Expenses                $2,046.49
4. OTHER DIRECT COSTS
   Publications OFR                                                   $6,090.75
   Develop data survey                                                $2,436.30
   Subtotal                                                           $8,527.05
GRAND TOTAL                              $30,607.24                  $51,476.58

Timeline: Three deliverables for completion of this proposal:
1) Data questionnaire and data inventory complete, 12 weeks from time of award
2) Evaluation of data to be archived in local repository, including metadata verification, 16 weeks from time of award
3) Open File Report of findings and recommendations for other science centers to follow, and a baseline implementation strategy for follow-up phases of data management practices, 20 weeks from time of award (August 31, 2013)
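The budget figures can be cross-checked by summing the line items. The sketch below does so with exact decimal arithmetic. It rests on one assumption that the flattened budget does not state explicitly: that the CERSC personnel and "Other Direct Costs" items belong to the matching-funds column and the field expenses to the federal column. That assignment is inferred only because it reproduces the stated grand totals.

```python
from decimal import Decimal


def total(amounts):
    """Sum dollar amounts given as strings, using exact decimal arithmetic."""
    return sum((Decimal(a) for a in amounts), Decimal("0"))


# Line items as printed in the budget; the column assignment is an assumption.
federal_salaries = ["11880.62", "10262.91", "4727.64", "1689.57"]
matching_salaries = ["17821.53", "15394.98", "4727.64", "1689.57", "3315.80"]
field_expenses = ["602.98", "979.39", "317.94", "146.18"]   # assumed federal
other_direct = ["6090.75", "2436.30"]                       # assumed matching

federal_total = total(federal_salaries) + total(field_expenses)
matching_total = total(matching_salaries) + total(other_direct)

# Grand totals as stated in the budget.
stated = {"federal": Decimal("30607.24"), "matching": Decimal("51476.58")}

for label, computed in (("federal", federal_total), ("matching", matching_total)):
    diff = stated[label] - computed
    print(f"{label}: computed {computed}, stated {stated[label]}, difference {diff}")
```

Under this column assignment each computed total falls within one cent of the stated grand total, a gap that is consistent with rounding of the individual line items.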