Stewarding Big Data: Perspectives on Public Access to Federally Funded Scientific Research Data
Big Data and Big Challenges for Law and Legal Information
Georgetown Law Library
January 30, 2013
William G. LeFurgy
Library of Congress
@blefurgy
My Perspective on Big Data Stewardship
Realizing the full potential of big data depends on keeping it accessible over time
Accessibility depends on life cycle management, most especially preservation
Advocate for a collaborative, distributed model
Understand that stewardship has a different meaning for many data creators
White House RFI Input Is Instructive
Request for Information on Public Access to Federally Funded Scientific Research Data, Nov. 2011
Invited interested individuals and organizations to provide recommendations on approaches for ensuring long-term stewardship and encouraging broad public access
Input provided to inform the development of agency policies and standards for managing big data
Summary of Responses
118 individual responses
50% from academic research departments and professional organizations
35% from libraries, repositories and allied organizations
10% from publishers and commercial organizations
5% other
An excellent (unstructured!) data set for analyzing current thinking on big data stewardship
Top-Level Policy Recommendations
Remarkable degree of congruence among comments
Broadly allocate adequate resources for data stewardship
Extend a collaborative national digital stewardship infrastructure
Institute and enforce a data preservation mandate
Strongly encourage policies that support secondary use of, and respect for, data
But respondents were conflicted about IP, copyright and privacy
Need: Resources
Funders should include money in awards for data stewardship
Need cost models and other guidance for estimating data life cycle costs
Allocate expanded resources to support national data repositories
Need: National Digital Stewardship Infrastructure
Leverage current institutional efforts to define best practices, tools and services
Extend the community of practice for data stewardship through collaborative action across disciplines
Develop a skilled workforce with data stewardship expertise
Need: A Data Preservation Mandate
Incentivize grant applicants to make realistic plans for their data
Stronger data management requirements in the application process
Tie future awards to demonstrated success with data stewardship
Enable direct support of PIs by data stewardship specialists
Support: Secondary Use, Respect for Data
Broadly apply a citation mechanism for data sets (e.g., DataCite, DOIs)
Tie the criteria for evaluating grant applications to secondary use of data
Give equal credit for publishing articles and data sets
Develop robust metrics to track data publication and use
Muddled Picture for IP
Opinions diverge about the role of copyright, patents, etc., with regard to research data
Commercial interests see IP as critical
Many data users favor a Creative Commons or public domain approach
Data creators fall between these positions
Significant concern raised regarding privacy in connection with IRBs and personal data
Next Steps
Two interagency working groups within the National Science and Technology Council are reviewing the recommendations
The groups will develop science agency policies for data dissemination and stewardship
Potential for major change, as the policies may be tied to funding from the Federal science agencies
Websites
Request for Information: Public Access to Digital Data Resulting From Federally Funded Scientific Research, http://ow.ly/epb93
Your Comments on Access to Federally Funded Scientific Research Results, http://ow.ly/epbb9
National Science and Technology Council, http://ow.ly/h87li