New Developments in Data Sharing, Remote Access, Secure Data, and Documentation at the Cornell Institute for Social and Economic Research (CISER)
1 New Developments in Data Sharing, Remote Access, Secure Data, and Documentation at the Cornell Institute for Social and Economic Research (CISER)
William C. Block and Lars Vilhuber
4th Workshop on Data Access, 26 March 2012, Luxembourg
2 Outline of Today's Presentation
- CISER Secure Data Services
- Cornell's NSF-Census Research Network (NCRN) Project
- Cornell's Research Data Management Service Group (RDMSG)
- Improvements to CISER Data Archive
- Latest Developments at CISER and Cornell
3 Before we begin: what is CISER?
CISER's Mission: ...anticipate and support the evolving computational and data needs of Cornell social scientists and economists throughout the entire research process and data life cycle.
4 CISER's Broad Range of Services Includes:
- Data Programming Services
- Consulting and Workshops
- Data Archive
- Data Management Plans
- Restricted Data Services
- Research Computing
- Administrative Support
5 Lifecycle of Research Data
- Idea: Research study is conceived and planned, methodologies selected, funding sources explored. For some projects, existing data sources may be sought and explored.
- Collection: Measurement instruments are designed, developed, acquired; data are collected through appropriate methodologies.
- Analysis & Processing: Collected data are merged, cleaned, analyzed, subsetted, coded, harmonized, linked, etc.
- Archiving: Final datasets are deposited for long-term preservation, e.g., into an institutional or domain repository.
- Publication: Final datasets are made publicly accessible, e.g., via the researcher's and/or department's and/or journal publisher's web site.
- Search & Discovery: Through search tools utilizing metadata from data stores, new research data become available for researchers to find and explore.
- Data management: Ideally begins early in the data lifecycle to assure long-term preservation of and access to data.
Based on slide from:
6 Data Archive - Collection and Services
Search & Discovery, Publication, and Archiving of social science research data
- Established 30 years ago
- Collection of datasets to support quantitative research
- Consulting services to match user needs with appropriate data
- Provides Cornell social science researchers a repository for providing others access to, and long-term preservation of, their numeric/statistical research data
7 CISER Research Computing
- Dell R810 servers, diskless boot using Dell's Advanced Infrastructure Management (AIM)
- Windows 2008 R2
- 40-core (80-core when hyper-threaded)
- 512 GB RAM
- 10 Gb internal network utilizing Force10 equipment with failover capability
- 96 TB of raw disk on each of the CISERRSCH and CRADC domains
- 144 TB of raw backup disk for CRADC domain (disk-to-disk backup)
- Ez-backup service for CISERRSCH domain (daily backup)
- Currently serves ~2,200 accounts
- Virtual servers available upon request for specialized projects
- Secured by Cisco ASA 5520 (CRADC) and 5540 (CISERRSCH)
8 CISER Secure Data Services
- Newly named umbrella term for a comprehensive suite of services for researchers using restricted data
- Resolves confusion between CRADC and CRDC
- CISER is Cornell's official custodian of restricted-access data sets
- Customized environment to meet data security standards from individual data providers
- Support across entire Project Lifecycle
9 CISER Secure Data Services: Multiple Modes of Access
- Secure Remote Access (to CISER servers from your computer*)
- Dedicated stand-alone computers/secure rooms at CISER
- Access to remote servers from secure rooms at CISER (pending IAB agreement)
- Cornell Census Research Data Center (U.S. Government statistical data)
- Cornell Virtual RDC
*Most commonly understood CRADC feature
10 CISER Secure Data Services: Examples of Secure Data and Providers
- U.S. Census Bureau
- Agency for Healthcare Research and Quality (AHRQ)
- Adolescent Health (Add Health)
- New York State Department of Health
- Health and Retirement Study (HRS)
- German Institute for Employment Research (IAB)
- Many other government agencies, research organizations, private companies
11 CISER Secure Data Services: Support Across the Data Lifecycle
- Identification of appropriate data resources
- Writing security plans/working with secure data providers
- Coordination with Office of Sponsored Programs and Institutional Review Board
- Data Management Plans
- Complete range of software applications/secure high-capacity computing
12 Cornell's Node of the NSF-Census Research Network (NCRN)
- Cornell NSF-Census Research Node: Integrated Research, Support, Training, and Data Documentation
- 1 of 8 nodes funded in the Network ($1.2M to $3M each)
- Investigators: John Abowd, William Block, Lars Vilhuber, and Ping Li
- The Comprehensive Census Bureau Metadata Repository (CCBMR)
- Socio-economic/official statistics often require confidentiality restrictions/privacy protection.
13 The Death Knell for Public-use Data
- Sounded by young scholars pursuing research programs that mandate inherently identifiable data: geospatial relations, exact genome data, networks of all sorts, linking administrative records.
- These researchers acquire authorized restricted access to the confidential identifiable data and perform their analyses in secure environments.
- But they don't leave behind the scientific trail that has made public-use files so important.
14 The Comprehensive Census Bureau Metadata Repository (CCBMR)
- Facilitates access to detailed metadata for:
- Restricted-access data from outside an RDC, while enabling fine-grained control over confidential information (Longitudinal Business Database (LBD), American Community Survey (ACS), American Housing Survey (AHS), Longitudinal Employer-Household Dynamics (LEHD))
- Public-use datasets inside restricted-access areas (IPUMS, CPS)
- Expands the notion of metadata to include user-generated components (notes, programs, etc.)
15 NCRN Research Program
- Develop and implement state-of-the-art statistical learning algorithms for the U2W imputation task
- Explore boosting, logistic regression, support vector machines, and conditional random fields
- Boosting is potentially promising
- In addition to the distance attribute, explore other available data such as demographic information (gender, occupation, race, education, etc.)
- Use statistical learning to improve the integration of the establishment demography
16 NCRN Training Component
- CCBMR
- VirtualRDC and Synthetic Data Server
- Instructional material converted to online, self-paced course (see INFO 7470)
- All tools developed in the NCRN nodes accessible (in cooperation with all other nodes)
17 Cornell's Research Data Management Service Group (RDMSG)
- Comprehensive data management planning and services
- CISER is a founding member
- Presents a coherent, university-wide set of services available to researchers
- Unified web presence for specialized services
- Source of standard language/guidance on data management plans
18 Cornell RDMSG Service Areas
- Data Management Planning Overview
- Data Storage and Backup
- Metadata
- Data Analysis
- Collaboration Tools
- High performance computing
- Privacy and confidentiality
- Intellectual property and copyright
- Data publication
- Data Archiving and Curation
19 Some recent enhancements made to the CISER Data Archive
20 Future Plans for the CISER Data Archive
- Assignment of persistent identifiers (e.g., DOIs) to datasets to make them more easily citable in the scholarly literature, and reliably linkable
- Web-based ingest function for CU researchers to submit datasets and metadata for archiving and access
- Variable-level search functionality to allow end users to find datasets that contain specific variables of interest to their research
- Export and exposure of standards-conformant (DDI, Dublin Core, ...) metadata
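The variable-level search idea on this slide can be illustrated with a minimal sketch. This is not CISER's implementation: the `Dataset` record, the `example:` identifiers, and the function names are all hypothetical, and a real archive would draw variable names and labels from DDI metadata rather than hand-written dictionaries.

```python
from dataclasses import dataclass, field


@dataclass
class Dataset:
    """Hypothetical catalog record; not an actual CISER or DDI structure."""
    identifier: str  # stand-in for a persistent identifier such as a DOI
    title: str
    variables: dict[str, str] = field(default_factory=dict)  # name -> label


def build_variable_index(datasets):
    """Map each variable name and each label word to the datasets using it."""
    index = {}
    for ds in datasets:
        for name, label in ds.variables.items():
            for token in {name.lower(), *label.lower().split()}:
                index.setdefault(token, set()).add(ds.identifier)
    return index


def find_datasets(index, *terms):
    """Return identifiers of datasets whose variables match every term."""
    hits = [index.get(term.lower(), set()) for term in terms]
    return sorted(set.intersection(*hits)) if hits else []


# Made-up catalog entries, for illustration only.
CATALOG = [
    Dataset("example:0001", "Household survey (made up)",
            {"hhinc": "household income", "age": "age of respondent"}),
    Dataset("example:0002", "Employer panel (made up)",
            {"wage": "hourly wage", "age": "firm age"}),
]

INDEX = build_variable_index(CATALOG)
```

Calling `find_datasets(INDEX, "age", "wage")` narrows the result to datasets that contain both terms, which is the kind of query the planned search function would let end users run against the archive.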
21 Latest Developments at Cornell and CISER
- Labor Dynamics Institute founded in the ILR School
- Mission: To create and make accessible novel data on the dynamics of the labor markets. We work with research networks and statistical agencies, developing appropriate statistics to inform policy makers, researchers, and simply people seeking knowledge. We emphasize and meet the requirements of stakeholders, users as well as providers, balancing the utility of the data with the confidentiality of the people and businesses whose activities the data describe.
- National Registry of Catastrophic Youth Sports Injuries (pilot project proposed at CISER)
22 Latest Developments at Cornell and CISER (Continued)
- Improvements planned to the Cornell Virtual RDC: planned upgrade of internode network (10x faster); easier transfers to XSEDE (web-based interface)
- IASSIST 2012 (June, Washington D.C.): combined session on "Expanding Data Access and Protecting Confidentiality in a Shrinking World" and "Secure Data Services: Unlocking the Power of Restricted Data"
23 Latest Developments at Cornell and CISER (Continued)
- Secure Data Agreement with Institut für Arbeitsmarkt- und Berufsforschung (IAB; Institute for Employment Research)
- Access to remote servers from secure rooms at CISER (pending IAB agreement)
- Secure Remote Access (to CISER servers from your computer; German scientific use files)
24 Thank you for your time & attention! Questions? ciser.cornell.edu
