UVA LIBRARY SCIENTIFIC DATA CONSULTING GROUP (SCIDAC): NEW PARTNERSHIPS AND SERVICES TO SUPPORT SCIENTIFIC DATA IN THE LIBRARY
Andrew Sallans, Head of Strategic Data Initiatives
Sherry Lake, Senior Scientific Data Consultant
IASSIST 2011, 1 June 2011
OUTLINE
- Phase 1: Research Computing Lab
- Phase 2: Scientific Data Consulting Group
  1. Data assessment interviews
  2. Data management planning
  3. Integration of processes with IR
- Partnerships
  - Internal
  - External
- Challenges
- Future opportunities
BACKGROUND ON THE UNIVERSITY OF VIRGINIA
- "Mr. Jefferson's University"
- Size:
  - About 14,000 undergraduate students
  - About 6,000 graduate students
  - About 2,000 faculty
- Annual research dollars, FY10: $375 million, including:
  - Dept. of Education: $10 million
  - DOE: $10 million
  - DOD: $15 million
  - NSF: $29 million
  - DHHS: $197 million
[Photo: Men's Lacrosse NCAA champions (by Matt Riley), 5/31/2011 photo gallery. Source: http://www.virginiasports.com/]
PHASE 1: RESEARCH COMPUTING LAB
- Began planning in 2005.
- Central IT: seeking greater visibility.
- Library: seeking new ways to support scientific research.
- Collocation provided mutual benefits.
- Staff combined in 2006, moved to Library locations (Research Computing Lab and Scholars' Lab), and set up new service points and services.
RESEARCH COMPUTING LAB RESPONSE
- Aiming to provide support across the entire scientific research data lifecycle
- Staff with expertise in:
  - Data
  - Quantitative data, statistics
  - Modeling, visualization
  - Scientific publishing
- Emphasis on consulting, not drop-off services
- Partnership with traditional librarians to help ease the transition to new support models
SAMPLE RCL CONSULTATIONS
- STS undergraduate, Environmental Justice (2008)
  - Development of technology solutions for empowering the citizen scientist
  - Web 2.0 tools, data collection/management
  - Data analysis
- Economics graduate student (2008/2009)
  - Airline flight price modeling
  - Screen scraping, data collection/management
  - Data analysis
- Mountain Lake Beetle Project (2009)
  - Mobile data acquisition/collection solution
  - Database development/management, programming
  - Data analysis
- Archiving of dissertation data (2009)
  - EVSC student: ModelMaker 4.0 data
  - Biology student: IDL, MATLAB, and R code
TAKE-AWAYS
- This is the future.
- It is a rapidly growing space, with lots of opportunity.
- It requires a big investment and commitment, the biggest being training and priority alignment.
- Libraries and institutions need to decide what to do and what not to do.
- It's a culture change for libraries, institutions, and researchers alike.
PHASE 2: SCIENTIFIC DATA CONSULTING GROUP
- December 2009/January 2010: rethinking the model
  - Budgetary pressures
  - Changes in organizational priorities
  - Emerging demands in the research community
- Spring 2010: decision to focus on data
- May 2010: close of RCL, start of SciDaC
WHAT'S HOT IN 2010?
- Open data: growing governmental interest in making publicly funded research more transparent and more available (NIH, NSF)
- Broader critical review: greater interest in evaluating original research data (Nature)
- Technological advances: sharing research results is easier and faster (repositories, Web 2.0)
- Reuse/preservation of research data: increased consideration of the cost and value of research data, and of the need to ensure its longevity
SCIENTISTS SEEKING NSF FUNDING WILL SOON BE REQUIRED TO SUBMIT DATA MANAGEMENT PLANS
Press Release 10-077, May 5, 2010
- Current policy:
  - Advance science by encouraging data sharing among researchers
  - Data obtained with federal funds should be accessible to the general public
  - Grantees must develop and submit specific plans to share materials collected with NSF support, except where this is inappropriate or impossible
- On or around October 2010:
  - All new NSF proposals will be required to include a data management plan in the form of a two-page supplementary document (peer reviewed)
  - The new policy is meant to be a first step toward a more comprehensive approach to data management
  - Exact requirements are vague
THE CHALLENGE FOR INSTITUTIONS
- Data is expensive: time, instrumentation, inability to reproduce
- Increasing regulation: granting agencies and journals require submission
- Inadequate training: no formal data management curriculum
- Preservation is not a priority: for most researchers, preservation takes time away from the work that is rewarded (publication, teaching)
SO WHO'S GOING TO TAKE THIS ON?
Researchers? VPR? CIO? OSP? UL?
WHY THE LIBRARY?
- Neutral: works across the entire institution
- Strong in relationship building: has experience fostering discussion and relationships, and cultivates an existing support network
- Intellectual property experts: has dealt with copyright and can translate that experience to data
- Service-oriented: uniquely positioned as an intellectual service unit within the institution
GETTING STARTED
Take what we learned in the RCL experience and apply it to the focused demands around data.
Steps:
- Conduct a stakeholder analysis
- Make a short-term plan (12 months)
- Develop clear priorities
- Refine and standardize consulting methods
- Communicate heavily
STAKEHOLDER ANALYSIS (ABBREVIATED)
- Internal: researchers, graduate students, grant administrators, deans, VP/CIO, VPR, OSP, UL
- External: funding agencies, the broader research community, the public
SHORT-TERM PLAN
- Survey OSP to match grant holders with regulations
- Educate/engage subject librarians
- Build political awareness/support
- Build partnerships with local, national, and international groups
- Resource requests:
  - Staffing commitment
  - Travel/partnership support
  - Promotion of the initiative to the institution
CLEAR PRIORITIES
1. Data interviews/assessments
2. Response to the NSF Data Management Plan (DMP) mandate
3. Leadership on data for the Institutional Repository (IR)
CONSULTING ACTIVITIES
- Interviews/assessments
- Data management planning templates
- LOTS of documentation
- Constant and continuous refinement of process
- Focus on helping researchers improve their processes
COMMUNICATE HEAVILY
- Internal
  - Inform staff of processes, priorities, and progress
  - Keep stakeholders engaged
  - Reach consumers from many angles
- External
  - Discuss and share experiences with colleagues at other institutions
  - Create partnerships to share and build upon resources and experiences, and to collaborate on tools
  - Networking (Twitter, LinkedIn, listservs, conference calls, conference presentations)
Bottom line: this is a big culture shift, and you do have to say the same thing many times in different ways.
PRIORITY 1: DATA ASSESSMENT INTERVIEWS
- Initially a means of raising awareness of the consulting service and of doing assessment; now a means of establishing a baseline of research data management practices with any new client
- Protocol involves:
  - 60-minute interview discussion (researcher, SciDaC consultants, subject librarian)
  - Development of a report
  - SciDaC consultants give researchers recommendations to improve data management
  - SciDaC consultants work with researchers to implement the recommended solutions
- The approach has proven very effective thus far
PRIORITY 2: DATA MANAGEMENT PLANNING
- Highest priority: responding to and addressing support needs arising from funding agency requirements (e.g., NSF and others)
- Getting a handle on data management as a means of institutional risk management
- Coordination of effort across the institution
NSF DATA MANAGEMENT PLAN MANDATE
- Official mandate became active Jan. 18, 2011
- NSF directorates/divisions continue to release and specify new guidelines (examples below):
  - Education and Human Resources (EHR)
  - Engineering (ENG)
  - Geosciences (GEO)
  - Mathematical and Physical Sciences (MPS)
  - Social, Behavioral, and Economic Sciences (SBE)
- Researchers continue to be mostly unaware of the mandate and of how to prepare a DMP
UVA SCIDAC NSF DMP RESPONSE
- UVa Library's original request: develop boilerplate for researchers to use in proposals
- SciDaC Group's response: no boilerplate; successful proposals need customized plans
- Our approach involves:
  - Knowledge across many communities (translational opportunities)
  - Leadership on policy/infrastructure development
  - Development of a template that simplifies writing the plan
- Principles:
  - Must be easy for the researcher
  - Must be supportable by available UVa resources/infrastructure
  - Must be possible to follow through on if the grant is awarded
PRIORITY 3: INTEGRATION WITH IR
- Institutional repository: Libra (http://libra.virginia.edu)
  - Built upon the Hydra architecture
  - Three components: open access publications, data, and electronic theses/dissertations
- Working out storage and cost models to support management of big and small data from across the institution's research community
- Will provide preservation assurance for data in the form of blobs or packages (bit preservation, no format migration)
- Currently developing a user interface/ingestion prototype that addresses the needs of small data, for release in late July 2011
COLLABORATIONS
- Internal
  - Library / VPR / CIO / OSP
  - Institutional Repository team
  - Kuali Coeus team
- External
  - DMPTool
  - DataONE
  - Conference/professional networks
CHALLENGES
- Involving subject librarians?
- Gaining institutional buy-in?
- Meeting demand?
HOW TO INVOLVE SUBJECT LIBRARIANS?
- UVa Library staff model:
  - Scientific data consultants
  - Subject librarians
- Current training model:
  - Brown-bag data curation discussions
  - Data interviews
- Goals and objectives:
  - Build data literacy
  - Create collaborative opportunities
  - Establish the Library for data preservation
HOW TO GAIN INSTITUTIONAL BUY-IN?
- Regulations are helpful
- Partnerships between key stakeholders:
  - University libraries (UL)
  - Central IT (CIO)
  - Research office (VP for Research)
  - Sponsored programs/research
- Strategic investment: take ownership, allocate resources, and demonstrate capability
HOW TO MEET DEMAND?
- Time: how best to manage staff time
  - NSF research support alone will be very time-consuming (UVa had about 140 proposals over the past year, 44 in November alone)
- Funding: work with leaders to find money
  - Redirection/reallocation of grant overhead dollars
  - Write-in of library staff on grants
- Strategy: decide how to invest
  - How might units be reorganized?
  - How do we expand to other disciplines?
  - How could staff resources and expertise be refocused?
  - What additional partnerships would add value?
FUTURE DIRECTIONS
- Addressing the data management needs of other disciplines across the institution
- Integration into the formal research proposal process
- Broader data management education
- Increased consulting on funded research projects
- Technology consulting
- Expansion of virtual organization partners and creation of a research advisory board
- Guiding policy revision to address new interests in data management and preservation
THANK YOU!
Andrew Sallans
Head of Strategic Data Initiatives, SciDaC Group
University of Virginia Library
Email: als9q@virginia.edu
Twitter: asallans
http://www.lib.virginia.edu/brown/data