Data Curation for the Long Tail of Science: The Case of Environmental Sciences
Carole L. Palmer, Melissa H. Cragin, P. Bryan Heidorn, Linda C. Smith
Graduate School of Library and Information Science
University of Illinois at Urbana-Champaign
{clpalmer, cragin, heidorn,

Abstract. Universities and consortial groups need to rationalize how they invest in and manage their growing data assets. While some sciences have organized sharing and deposit activities around standardized disciplinary repositories, the data curation and stewardship needs of sciences that rely on smaller, research-level data collections are less well understood. This paper outlines a research agenda, and a foundational study of data practices and needs of a loosely organized group of environmental scientists, to advance our understanding of the potential for inter-institutional data coordination to promote and preserve the long tail of science. Preliminary findings of the survey component of the study will be reported.

Keywords: data curation, data management, research collections, environmental sciences, ecoinformatics

1 Introduction

The success and promise of national data centers, such as those for Arabidopsis and other model organisms, have demonstrated the critical role of reference-level data initiatives. At the same time, resource collections are maturing in some fields, as evidenced by efforts such as the Biomedical Informatics Research Network (BIRN) and the Global Biodiversity Information Facility (GBIF) [7]. But we do not yet have a good model for coordinating the large proportion of small-scale scientific projects producing research-level data collections (see footnote 1). We theorize that if we were to plot data collections by size we would see a moderate number of very large collections and then a long tail of smaller collections. We can view this as the long tail of science data. These long tail collections in aggregate are highly heterogeneous and tend to be isolated in scientists' offices and laboratories, yet they account for a substantial portion of the data assets at any given research university. Anderson [1] coined the term "long tail" to describe the power of collective small business markets, and later Dempsey [3] aptly applied the concept to libraries, noting that success of this model requires that consumers (or readers) have access to and be aware of long tail products (materials), and that services to match supply with demand must also be in place. We believe this model also applies to scientific data. Long tail sciences generate small but numerous data collections. The questions are how to unleash the market potential of these data collections and how to lower the barriers to access and reuse. This requires much more than putting the information on the web. Data curation and stewardship are needed to manage and add value to the data collections produced by long tail science, and to facilitate their integration with other coordinated data collections.

Footnote 1: For an explanation of the distinction among reference, resource, and research level collections, see the National Science Board's (2005) report, Long-lived digital data collections: Enabling research and education in the 21st century.

Karasti, Baker, & Halkola make a valuable distinction between data curation and data stewardship, stating that they have different views about the nature of data, their life cycles and relations with their environments of science conduct [5, p. 352]. They argue that curation activities characterized in the e-Science literature focus on ingest, archive and delivery, whereas stewardship activities span data planning to sampling, from data archive to use and reuse including both data care and information infrastructure work [5, p. 352]. While we hold that curation includes a much broader spectrum of activities, both data curation and stewardship are necessary to maintain the long-term usefulness of data. Importantly, while large-volume and more homogeneous data collections are now relatively well curated (such as in astronomy and seismology), the more specialized and heterogeneous, but small and numerous, collections are not being well served.
We believe there is a need for a mix of data management solutions to address the range of collections, from large, disciplinary collections with relatively homogeneous data to cross-institutional, centralized or distributed heterogeneous collections. Institutional repository (IR) efforts at a number of universities are beginning to explore ways to support local researchers and laboratories with data curation and management. Just as collaborative models will be necessary for collection development activities [2], we believe that coordinated or consortial IR initiatives can provide the economy of scale needed for data curation and stewardship services to improve access, preservation, and use of research-level, long-tail data collections.

2 Research Questions

To better understand the potential of cross-institutional curation for research-level data collections, the Data Curation Education Program (DCEP) at the Graduate School of Library and Information Science (GSLIS) is undertaking a series of studies. The initiative includes a study of IR development models and a project to develop data curation profiles across scientific research domains. Over the long term our studies aim to answer the following research questions:

- How long is the tail?
- Which disciplines or research areas are tail-dependent, requiring high levels of data integration?
- What is it about the data or the science that makes them less likely candidates for resource or reference federations?
- How can data collections best be represented, in terms of collection granularity and data description, to support integration and reuse?
- How can research-level collections best share and exchange data with existing resource- and reference-level collections?
- And, ultimately, what is the impact of accessibility to the long tail of data for the conduct of science?

In this paper we focus on one project the DCEP is conducting in cooperation with the Environmental Council (EC) at the University of Illinois at Urbana-Champaign (UIUC) to investigate the data curation needs among the Faculty of the Environment, a campus-level organization that works to build the University's capacity for leadership in environmental discovery, learning, and public engagement and to promote interdisciplinary discovery and learning. The Faculty of the Environment consists of approximately 400 members from all colleges and most departments at the University; they have an impressive variety of research interests and are dedicated to environmental excellence. Many EC researchers participate in long tail of science projects, collecting and analyzing data that are essential to solving critical problems related to climate change, energy, and ecology. The DCEP and the EC are working together to understand how current data sets or collections are being used across UIUC, and the nature and extent of associated data management practices and problems. This study will serve as the foundation for further investigations across regional institutions to address a more specific set of research questions related to environmental science curation and stewardship:

- What facets of data are of particular value to environmental and ecoinformatics research questions and projects?
- How can curation of data sets be extended and refined to encourage integration and reuse for ongoing ecological and environmental research?
- What are the collecting and reuse relationships among local research collections and disciplinary and national collections?
- How can curation encourage better data exchange among research, reference, and resource collections?

3 Research Approach

The study uses complementary techniques to gather data from research scientists affiliated with the EC. A campus-based survey is currently underway and is being pre-tested with representative earth scientists and sociologists. This will be followed by a pilot test of the revised survey on a larger sample of 8-12 researchers to allow for fine-tuning of the survey questions for web delivery. The remaining (approximately) 390 Faculty of the Environment will be invited to participate in the survey. One question on the survey will ask for volunteers to participate in follow-up interviews. Semi-structured interviews will be conducted with this subsample. The two techniques will support gathering first broad and then more focused details on the specifics of respondents' data management activities and needs.
Preliminary results will be reported on data types, current in-house data management activities and outsourcing, how and when data are used and shared, data archiving plans, as well as how data curation professionals can contribute to research operations and the management of valued data for long-term use.

4 Environmental Sciences as Exemplar Case

Because environmental sciences are multidisciplinary and rely on data collected in various ways to answer a broad range of research questions, they can profit greatly from more prudent management of the data resulting from long-tail science. Long Term Ecological Research (LTER) sites and the National Ecological Observatory Network (NEON) are projects where long tail data management is a critical issue. Moreover, environmental data are often used by constituencies beyond academic researchers and data managers. Audiences or stakeholders in environmental research include citizen scientists, politicians and policy makers, businesses and the general public [6].

Problem-oriented research domains, such as biofuels and land use research, require a large amount of data integration and interoperability to address questions that span multiple disciplines. These areas of research stretch across scales from the molecular to the ecosystem and across landscape scales from greenhouse growing trays to large multiuse land tracts, from agricultural to urban settings. Researchers from many fields of science need to work with and understand each other's data to design intelligent experimentation and inform land use planning. Examples of research questions include: What happens to land, water and atmosphere if we replace one land use with another? What crops can produce maximum sustainable energy yield for a given environment?

5 Conclusion

The results from this study will be used most immediately by the EC to develop centralized indexing, data storage, and support services to lower the barriers to retention and access.
In addition, the DCEP project will use the findings as a basis for curriculum planning and course development for training data professionals. Most importantly, this preliminary study will provide insights into the data needs across the long tail of science and related data curation and stewardship requirements. Data integration and reuse in the environmental sciences will require effective data curation and stewardship [4, 5]. The EC case will serve as an exemplar of research-level data collection practices, problems, and potentials, advancing the longer-term research agenda on inter-institutional data coordination discussed above.

Acknowledgments. We acknowledge our co-author Bryan Heidorn for sharing his ongoing ideas about long tail science. This work was supported in part by a grant from the Institute of Museum and Library Services RE and a University of Illinois Environmental Council, Earth and Society grant.
References

1. Anderson, C.: The Long Tail. Wired, Issue 12.10, Oct. (2004).
2. Day, M., Pennock, M., Allinson, J.: Cooperation for Digital Preservation and Curation: Collaboration for Collection Development in Institutional Repository Networks. DigCCurr2007: An International Symposium on Digital Curation, April 18-20, 2007, Chapel Hill, NC. (2007).
3. Dempsey, L.: Libraries and the Long Tail. D-Lib Magazine, 12(4), April (2006).
4. Heidorn, P.B., Palmer, C.L., Cragin, M.H., Smith, L.C.: Data Curation Education and Biological Information Specialists. DigCCurr2007: An International Symposium on Digital Curation, April 18-20, 2007, Chapel Hill, NC. (2007).
5. Karasti, H., Baker, K.S., Halkola, E.: Enriching the notion of data curation in e-science: Data managing and information infrastructuring in the Long Term Ecological Research (LTER) network. CSCW, 15(4) (2006).
6. Van House, N. A., Butler, M., Schiff, L.: Cooperative knowledge work and practices of trust: Sharing environmental planning data sets. CSCW 98: Proceedings of the 1998 ACM Conference on Computer Supported Cooperative Work (1998).
7. Wooley, J.C., Lin, H. (Eds.): Chapter 4. Catalyzing Inquiry at the Interface of Computing and Biology. Washington, D.C.: National Academies Press (2005).
