Deep Carbon Observatory Data Science Interim Report, Dec
- Abner Lambert
Progress on the main issue, problem or subject and why it is important.
The Deep Carbon Observatory (DCO) Data Science (DCO-DS) team has spent approximately the first year of this phase implementing key data science infrastructure and developing data management plans and policies. The team has also been building relationships with, and assessing the data science requirements of, the four Science Communities (Deep Energy, Deep Life, Extreme Physics and Chemistry, and Reservoirs and Fluxes), DCO Engagement and the DCO Secretariat. The result has been substantial progress on all six key elements of DCO-DS identified in the original proposal, progress on several additional sub-tasks, and a refinement of priorities. We have in place the software infrastructure for ~95% of the proposed initial data science platform, which coexists with, and enhances, key community activities. As such, we have implemented the initial version of the Deep Carbon Virtual Observatory (DCVO): a collaborative, scalable education and research environment for searching, accessing, integrating, and analyzing distributed observational, experimental, and model databases. We have also made progress in facilitating the definition of each of the following: A Virtual Mineral Laboratory, The Global Census of Deep Life, The Global Census of Deep Fluids, The Global Volcano Observatory, The Global State of High Pressure and Temperature Carbon (and related) Materials, and A Global Inventory of Diamonds with Inclusions. A key leveraging step has been the identification and endorsement (by the Mineralogical Society of America) of an Earth Materials Data Infrastructure (EMDI) spanning all DCO Science Communities.
EMDI grew out of the data infrastructure elements of DCO-DS: the Deep Carbon Virtual Observatory, Leveraging Community Activities, the Science Network, and Visualization and Discovery. Together these comprised ~90% of the DCO-DS effort (personnel and funds) in this project period. The DCO-DS platform (Figs. 1 and 3) is facilitating collaboration, data and object registration, a materials repository, and much more.
Figure 1. Schematic for the DCO Data Science Platform. The Web-based portal (built on Drupal; top of figure) is underpinned by three key data framework components:
- the DCO repository (left center), based on the Comprehensive Knowledge Archive Network (CKAN);
- the DCO research network (center), based on VIVO;
- the identifier infrastructure (lower), based on the Global Handle System (GHS).
To the upper right and center right are visual renderings of the DCO science network, based on the initial 63 registrants on the DCO portal (i.e. ~March 2013).
Key Elements of the Data Science Effort
Given DCO's intensive data and computational needs, each of the activities embedded in the Science Directorates and instrument development initiatives of the Deep Carbon Observatory has adopted or adapted data science and data management solutions. These serve both decadal strategic objectives and day-to-day tasks. As noted earlier, the Data Science effort for DCO will continually assess in detail the data science and data management needs of each DCO activity and of DCO as a whole. It does so using a combination of informatics methods: use case development, requirements analysis, inventories and interviews. Detailed records of progress and activities can be found at:
Priorities for DCO-DS have been set periodically by the Data Science Advisory Committee as input to the DCO-DS PI and the Project/Technical leads. These priorities ensure a balanced allocation of DCO-DS resources and advance DCO decadal goals. We report on the six key elements of a DCO-DS platform, with the actual percentage of data science personnel effort for the project period in parentheses: i) the Deep Carbon Virtual Observatory (40%), ii) leverage and enhancement of existing community data resources (25%), iii) the Deep Carbon Observatory Science Network, a virtual organization (20%), iv) visualization and scientific exploration (5%), v) data as a first-class object, and vi) data science education and working activities. Elements v and vi together comprise the remaining 10%.
i. The Deep Carbon Virtual Observatory
Figure 2. Conceptual Information Model for the DCO research/data ecosystem.
Using our use-case-driven methodology, we have formalized the information models applicable to DCO (see Fig. 2). This information model has guided our development of the underlying schema (ontology) for the data infrastructure. It has also allowed us to leverage and extend existing schemas from VIVO, CKAN, etc. All schemas are open source and are being fed back to the respective communities, as are our software integrations and modifications.
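To illustrate what "extending existing schema" can mean in practice, the following is a minimal sketch that emits RDF (as N-Triples) declaring a DCO-specific class as a subclass of a class defined elsewhere. RDF is the Linked Data form mentioned later in this report. The `http://example.org/dco#` namespace and the class name are hypothetical. The DCMI Type `Dataset` class is only a stand-in for whichever VIVO or CKAN term the actual DCO ontology extends, which the report does not specify.

```python
# Hypothetical namespace; the report does not publish the DCO ontology IRIs.
DCO_NS = "http://example.org/dco#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
RDFS_SUBCLASS_OF = "http://www.w3.org/2000/01/rdf-schema#subClassOf"
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"
OWL_CLASS = "http://www.w3.org/2002/07/owl#Class"
# Stand-in for the existing VIVO/CKAN class being extended.
DCMI_DATASET = "http://purl.org/dc/dcmitype/Dataset"


def iri(value):
    """Wrap an IRI in angle brackets, as N-Triples requires."""
    return f"<{value}>"


def literal(text):
    """Serialize a plain string literal with N-Triples escaping."""
    escaped = (
        text.replace("\\", "\\\\")  # escape backslashes first
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )
    return f'"{escaped}"'


def triple(subject, predicate, obj):
    """Format one N-Triples statement."""
    return f"{subject} {predicate} {obj} ."


def extend_class(local_name, label, parent_iri):
    """Declare a DCO class that specializes an existing class from another schema."""
    cls = iri(DCO_NS + local_name)
    return [
        triple(cls, iri(RDF_TYPE), iri(OWL_CLASS)),
        triple(cls, iri(RDFS_SUBCLASS_OF), iri(parent_iri)),
        triple(cls, iri(RDFS_LABEL), literal(label)),
    ]


if __name__ == "__main__":
    # Hypothetical subclass for isotope collections such as the Noble Gas dataset.
    for line in extend_class("IsotopeDataset", "Isotope dataset", DCMI_DATASET):
        print(line)
```

Because the new class only adds an `rdfs:subClassOf` link, tools that understand the parent class can still treat DCO records as ordinary datasets. This is the main reason to extend rather than replace an existing schema.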
Figure 3. Data Science Platform elements and relations. The elements are presented schematically in Fig. 1; their relations are indicated here. Of note is that the primary support for the DCO infrastructure comes from the VIVO framework, and that CKAN and GHS are further removed (i.e. not explicitly visible).
ii. Leverage and Enhancement of Community Data Resources
Generating, assembling and analyzing the libraries of new and complex data created by DCO requires managing their inherent complexity. This allows information and knowledge to be integrated across multiple scales spanning traditional disciplinary boundaries. Curation of these data is also a critical element of such assembly.
Figure 4. Screenshot of the live DCO data portal for the Noble Gas Isotope dataset. Shown are the DCO-ID, a QR code for easy recognition and (lower) the dataset metadata fields.
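The DCO repository behind the portal is based on CKAN (Fig. 1). As a hedged illustration of how an application might retrieve a dataset record like the one in Fig. 4, here is a minimal sketch using CKAN's standard action API (`package_show`). The base URL is a hypothetical placeholder. The sketch also assumes, without support from the report, that a DCO-ID would be stored as a custom "extras" entry under a hypothetical key `dco_id`.

```python
import json
import urllib.parse
import urllib.request

# Hypothetical repository address; the report does not give the CKAN base URL.
CKAN_BASE = "https://ckan.example.org"


def fetch_package(package_id, base=CKAN_BASE):
    """Call CKAN's package_show action and return the decoded JSON response."""
    query = urllib.parse.urlencode({"id": package_id})
    url = f"{base}/api/3/action/package_show?{query}"
    with urllib.request.urlopen(url, timeout=30) as resp:
        return json.load(resp)


def summarize_package(response):
    """Pull the title, custom extras and resource URLs out of a package_show response."""
    if not response.get("success"):
        raise ValueError(f"CKAN request failed: {response.get('error')}")
    pkg = response["result"]
    # CKAN keeps free-form metadata as a list of {"key": ..., "value": ...} pairs.
    extras = {e["key"]: e["value"] for e in pkg.get("extras", [])}
    return {
        "title": pkg.get("title"),
        "extras": extras,
        "resources": [r.get("url") for r in pkg.get("resources", [])],
    }


if __name__ == "__main__":
    # Canned response in package_show's shape; "dco_id" is a hypothetical extras key.
    canned = {
        "success": True,
        "result": {
            "title": "Noble Gas Isotope dataset",
            "extras": [{"key": "dco_id", "value": "PREFIX/SUFFIX"}],
            "resources": [{"url": "https://ckan.example.org/files/data.csv"}],
        },
    }
    print(summarize_package(canned))
```

Keeping the network call (`fetch_package`) separate from the parsing (`summarize_package`) lets the parsing logic be exercised against saved responses, without a live repository.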
Figure 5. Screenshot of the dataset detail for the collection in Fig. 4.
To date, we have worked very closely with the Deep Energy Community, particularly on the Noble Gas Isotope dataset (see Figs. 4 and 5). This effort has been an exemplar for DCO Data Science data deposition and has benefited from very strong support from Deep Energy leadership (Cole/Sheets; OSU).

We have also developed strong relationships via Boundary Activities with:
- EarthChem, PetDB (geochemical data) and SESAR (sample registry) at Lamont-Doherty Earth Observatory/Columbia University, for the Reservoirs and Fluxes Community;
- MetPetDB at RPI;
- for Deep Life, genomic data (taxonomic counts and sequences) and Visualization and Analysis of Microbial Population Structures (VAMPS) at the Marine Biological Laboratory.

We have begun, and will build on, integrating and further populating thermochemical databases such as the Library of Experimental Phase Relations (LEPR), working with Mark Ghiorso as identified in the EPC Community proposal. Another undertaking to which DCO-DS dedicated resources is the volcano gas emission database. This began with the Global Volcanism Program at the Smithsonian but has now shifted to LDEO. We remain in close contact with these groups and are ready to provide links to the DCO data infrastructure. In preparation for these efforts, and for larger DCO sample activities, RPI (DCO-DS) became a full member of the International Geo Sample Network registry (approved at AGU, Dec. 2013). It is now licensed to issue IGSN identifiers for DCO samples.
We plan to actively continue these community data activities in the current and next phases. A welcome opportunity is provided by the DCO summer school. Its conveners (Jones, Baross) have indicated a strong desire for DCO datasets to be made available to school participants, and for data science material to be embedded in the curriculum. We have committed to participating in the summer school as part of this effort.
iii. The DCO Science Network
Figure 6. Regional map of northeastern North America showing DCO science network participants (~Nov. 2013). This map illustrates one view of the network as it will appear once more participants are entered, either by registering for logins or by having their data added.
As part of the progression toward a global community of carbon scientists, DCO has the opportunity to form a virtual science network. The Data Science Team worked extensively with the Engagement Team to develop and refine the key functions of the DCO web portal that support collaboration. In doing so, we needed to characterize roles and relationships within the DCO Science Network and to identify key liaisons within each project team. All such
functions are now implemented. What remains is populating the content, a task that extends well beyond the purview of the Data Science Team. Special effort will be required over the next 2-3 months to substantially increase the number of participants formally represented in the science network. Many participants already exist and are part of the network but are not yet visible through the portal. That effort involves the DCO Secretariat, Engagement and the larger DCO community.
iv. Visualization and Scientific Exploration
As planned, the Data Science Team began contributing to visualization and data exploration in the first year. It did so by working directly with several DCO scientists (specifically Hazen, Downs and Sverjensky), both to determine suitable visual approaches and to lower the cost of generating visualizations, so that visualization can be integrated more closely into the scientific process.
Figure 7. Klee diagram: a 72 × 72 matrix of the essential mineral-forming elements. Each matrix element is the percentage of minerals with essential element E1 (X axis) that also have element E2 (Y axis) (from Hazen, Downs, Fox et al., in preparation).
Fig. 7 is an example of an early Klee diagram produced in the exploration of essential mineral-forming elements. The diagram was produced by graduate student Yu Chen (RPI) working with Hazen and a graduate student of Downs. The results were well received and
indicate strong potential for extending such visualization efforts across more of DCO. Exploratory work with Sogin on accessing Census of Deep Life records via VAMPS and generating Linked Data (RDF) has also been encouraging.
v. Data as a First-Class Science Object
An early contribution of the Data Infrastructure Team was the identification infrastructure for DCO scientific data objects (DCO-ID, based on the Global Handle System; Fig. 1). These identifiers facilitate the long-term management, discovery and exploitation of a range of artifacts beyond data, including reports, papers, visualizations, tables, presentations, data products and interpretative analyses produced by DCO researchers. Integrated with this naming infrastructure is a rich metadata management system (using VIVO; see Fig. 1). It enables data integration, federation and the construction of higher-level applications and visualizations in the DCVO. Web services allow users or applications to query by identifier for key metadata, means of access, derivative products, provenance records and related artifacts of all types. They also make it possible to distinguish citations attributable to DCO (- e.g / CC = Noble Gas Dataset, or directly).
vi. Data Science Education and Data Management Training
Attention to Data Science in DCO increased over the first year. Marshall Ma will attend the 2014 early career scientist workshop to present and promote data science. Data and data science have been explicitly included in the plans for the 2014 DCO Summer School. The Data Science Team presented elements of Data Management and Data Science at the Deep Energy workshop in Manchester, UK, early in … The Team also developed the DCO data policy guidelines to be synergistic with the publication and sharing guidelines. These are to be online at: Further, detailed data management and planning documentation was developed early in the first year
(this document can be found at: and will be migrated to the main DCO web site in …).
Metrics/Goals
Overall Goal and Metrics provided by the Sloan Foundation to RPI: To develop methods and tools to handle DCO data integral with efforts of other institutions, programs and disciplines, as shown within 18 months by posted protocols, packages of code, and agreements with other organizations such as NSF.
1. To capture data and encode and store modeling and simulation results from the DCO, with at least 6 early examples within 18 months. We will complete this goal by May …
To innovate with visualization of large data sets, as shown by at least 6 examples in publications or used widely in DCO presentations. We will complete this goal by May …
To provide underlying services for the DCO that facilitate routine collaboration, as shown by use by most of the core DCO researchers within 18 months. We have completed the goal of providing the services and collaboration functions. The DCO Data Science Team does not have primary (or secondary) responsibility for adding DCO researchers (via registered logins) to the web site. As such, DCO relies upon DCO Engagement, the DCO Secretariat and the DCO Communities themselves to promote use of the DCO Science Network. The Data Science Team will remain very responsive to requests about the look and feel of the collaboration functions.
2. Additional Results from Partnering with the DCO Secretariat, Science Directorates, DCO Engagement and Broader Community
Community Reporting: In the first year, a Community Reporting function for DCO was identified. Owing to the flexible architecture and the interfaces provided to DCO content (based on the information model in Fig. 2), the Data Science Team was easily able to put in place the technical infrastructure for such reports. An example is shown in Fig. 8. A fully working
capability will appear in 2014 as interface and query requirements are finalized.
Figure 8. Early rendering of the Community Reporting function for DCO. This requirement was suggested/identified during the first year of DCO. The entire technical infrastructure is in place, but both the formatting of the reports and the specific queries used to generate them are still being defined. The DCO Secretariat, Engagement and Communities will have responsibility for adding content to the DCO repositories. The Data Science Team will continue to facilitate this activity.
Boundary Activities: In the first year, the Data Science Team initiated 7 major boundary activities, outlined below. See Appendix A for their timing and level of activity compared with those proposed. Boundary activities in DCO Data Science are tasks that establish data infrastructure interfaces (data, metadata and services). These interfaces allow all community resources to be part of the DCVO, and allow DCO data to flow into these locations such that it is still
known and attributed to DCO and discoverable. Details on each of these, including status (and on additional activities as they are added), can be found at: DS/WorkingGroups/BoundaryActivities.
1 Reservoirs and Fluxes Community (DCO-RF)
1.1 Global Volcanism Program (GVP) Gas Emission Extension
1.2 Integrated Earth Data Applications (IEDA) and International Geo Sample Number (IGSN)
1.3 Diamonds and Mantle Geodynamics of Carbon (DMGC) and DiamondDB
1.4 Bibliography for volcanic gases and fluxes
2 Deep Energy Community (DCO-DE)
2.1 Igor's Noble Gas Dataset
3 Extreme Physics and Chemistry Community (DCO-EPC)
3.1 EPC Computer Cluster
3.2 Populate a New Data Source for Properties
4 Deep Life Community (DCO-DL)
4.1 Visualization and Analysis of Microbial Population Structures (VAMPS)
5 Geoscience Data Journal (GDJ)
5.1 Application to be an approved data center of Geoscience Data Journal
6 AGU Index List
6.1 Utilization of AGU Index List in the DCO Data Portal
7 DCO Summer School 2014 (spans all Communities)
7.1 Datasets and Data Science for Day One of Summer School
Bibliographic Infrastructure:
The Data Science Data Infrastructure Team worked with the DCO Engagement Team to implement a DCO-wide Bibliographic Infrastructure (VIVO). It incorporates the digital object/name resolution infrastructure, an extensible metadata model, efficient ingest workflows, and semantic discovery capabilities. Together these track and manage an ever-growing list of DCO publications and their associated digital objects (data, with provenance; see Fig. 9).
Figure 9. Object registration workflow for DCO. This workflow allows content to be hosted in other locations, to carry other identifiers, and to be updated.
File and Information Sharing: The Data Science Data Infrastructure Team has implemented a Web-based file- and information-sharing approach using a combination of the Drupal file repository and the CKAN repository (see Fig. 1). Their relationships are shown in Fig. 3. These resources are available to all DCO researchers (with login). They are being used to manage information, resources, and digital objects associated with research needs in data science and data management, and also for communications and management functions. New semantic discovery tools will be available in 2014 to help researchers explore the holdings more effectively.
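DCO-IDs are handles in the Global Handle System (Fig. 1), and the registration workflow in Fig. 9 lets a registered object point to content hosted elsewhere. As a hedged illustration of resolving such an identifier, the sketch below queries the public Handle.net proxy's REST interface. It then extracts the URL-type values from the JSON record that interface returns. The report does not describe the interface of the DCO web services, which may return richer metadata than a bare handle record. The handles used here are hypothetical placeholders. The proxy may also signal an unknown handle through an HTTP error status, which `urlopen` raises before any parsing happens.

```python
import json
import urllib.parse
import urllib.request

# Public Handle.net proxy REST endpoint; a DCO-specific resolver may differ.
HANDLE_API = "https://hdl.handle.net/api/handles/"


def resolve_handle(handle):
    """Fetch the JSON record for a handle of the form 'prefix/suffix'."""
    # Leave "/" unescaped: it separates the handle prefix from its local name.
    url = HANDLE_API + urllib.parse.quote(handle, safe="/")
    with urllib.request.urlopen(url, timeout=30) as resp:
        return json.load(resp)


def url_values(record):
    """Return the URL-type values of a handle record, ordered by index."""
    # In this API a responseCode of 1 signals success.
    if record.get("responseCode") != 1:
        raise LookupError(f"handle lookup failed (responseCode={record.get('responseCode')})")
    ordered = sorted(record.get("values", []), key=lambda v: v["index"])
    return [v["data"]["value"] for v in ordered if v.get("type") == "URL"]


if __name__ == "__main__":
    # Canned record in the proxy's JSON shape; the handle is a hypothetical placeholder.
    canned = {
        "responseCode": 1,
        "handle": "PREFIX/SUFFIX",
        "values": [
            {"index": 1, "type": "URL",
             "data": {"format": "string", "value": "https://example.org/landing"}},
        ],
    }
    print(url_values(canned))
```

Because a handle record may hold several typed values, filtering on `type == "URL"` is what lets a single DCO-ID keep pointing at content even when that content moves or is hosted by a partner repository.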
Coordination: DCO-DS has worked extensively with the Engagement Team to enhance communications among DCO researchers. This work also ensures that their scientific data and other artifacts are accessible and public under appropriate data-access policies.
4. Who has worked on this project in the first year (Nov. to Nov. 2013)
- PI Peter Fox (Professor; project lead, architecture and design, resource management, executive committee member)
- Jim Hendler (Professor; architecture and design, resource management)
- John Erickson (Operations Director; operations, project management)
- Marshall Ma (Post-doc; data management, boundary activity lead, visualization)
- Patrick West (Senior Software Engineer; technical lead, primary developer)
- Yanning Chen (PhD candidate; software developer and visualization technical lead)
- Han Wang (PhD candidate; software developer and visualization technical lead)
- Katie Dunn (Technical Services and Metadata Librarian; preliminary effort, which will increase for the first 3 months of 2014)
- Dan Molik (student systems administrator, Nov 2012-Jan 2013)
- Anusha Akkiraju (MS student, summer 2013)
- Chengcong Du (MS student, summer 2013 and fall 2013)
- Mengyu Yin (MS student, summer 2013)
Appendix A. DCO Data Science primary activities and initial schedule
The primary activities during the first period of DCO-DS (their approaches are discussed above) are: Data Science needs analysis; Data Science outreach and education; Community engagement; Science network; Visualization; File and information sharing; Community database; Bibliographic infrastructure; Visual materials repository; Coordination; and Boundary activities (GVP, VAMPS, EarthChem, DE#1, EPC#1). DE#1 and EPC#1 are to-be-determined boundary activities from the Deep Energy and Extreme Physics and Chemistry Communities. The approximate schedules of all these activities are presented in Table J. These activities were phased according to our present understanding of DCO data needs and will need to adapt to changing requirements.
Table J. As-performed activity schedule for the first three-year period of the project. Activities and any sub-tasks are introduced in Section 2. … = major effort, x = minor effort. Project start: Nov. Green = activity successful, Orange = activity behind or delayed, Red = activity cancelled or not achieved.
Activity / Phase: Year 0.5 | Year 1 | Year 1.5 | Year 2 | Year 2.5 | Year 3
Data Science needs analysis: x
Data Science outreach and education: x x
Community engagement/coordination: x x
Science network: x
Visualization products
File and information sharing: x
Community database: x x
Bibliographic infrastructure: x
Visual materials repository: x x
Boundary Activities:
GVP/emission
VAMPS: x
EarthChem: x x x
EPC#1: x
DE#1: x x
Workshop
More informationICSTI 2014 General Assembly October 18-19, 2014
ICSTI 2014 General Assembly October 18-19, 2014 TACC Workshop Sunday, October 19 th, 2014 Enhancing Discoverability and Accessibility of Scientific and Technical Research Information and Data The TACC
More informationScalable End-User Access to Big Data http://www.optique-project.eu/ HELLENIC REPUBLIC National and Kapodistrian University of Athens
Scalable End-User Access to Big Data http://www.optique-project.eu/ HELLENIC REPUBLIC National and Kapodistrian University of Athens 1 Optique: Improving the competitiveness of European industry For many
More informationDSpace: An Institutional Repository from the MIT Libraries and Hewlett Packard Laboratories
DSpace: An Institutional Repository from the MIT Libraries and Hewlett Packard Laboratories MacKenzie Smith, Associate Director for Technology Massachusetts Institute of Technology Libraries, Cambridge,
More informationThe Preservation and Sustainability of Research Data
The Preservation and Sustainability of Research Data Dr Markus Buchhorn, Director, ICT Environments Australian National University; Formerly: Head, ANU Internet Futures Grid Services Architect, APAC Grid
More informationCLARIN-NL Third Call: Closed Call
CLARIN-NL Third Call: Closed Call CLARIN-NL launches in its third call a Closed Call for project proposals. This called is only open for researchers who have been explicitly invited to submit a project
More informationRESPONSE FROM GBIF TO QUESTIONS FOR FURTHER CONSIDERATION
RESPONSE FROM GBIF TO QUESTIONS FOR FURTHER CONSIDERATION A. Policy support tools and methodologies developed or used under the Convention and their adequacy, impact and obstacles to their uptake, as well
More informationInstitutes for Data Science: New York University University of Washington University of California, Berkeley
Advancing scientific discovery through collaboration across research domains Institutes for Data Science: New York University University of Washington University of California, Berkeley Data Science growing
More informationDeposition and use of raw diffraction images
Deposition and use of raw diffraction images By John.R. Helliwell Crystallographic Information and Data Management A Satellite Symposium to the 28th European Crystallographic Meeting University of Warwick,
More informationBrown University Libraries Technology Plan, 2015-2017
Brown University Libraries Technology Plan, 2015-2017 Technology Vision Brown University Library creates, develops, promotes, and uses technology to further the Library s mission and strategic directions
More informationDigital Content Management Workflow Task Force
Digital Content Management Workflow Task Force Contents Digital Content Management Workflow Task Force... 1 Background... 1 An Abstract Workflow... 3 Content Types:... 5 Preservation by Content Type...
More informationCYBERINFRASTRUCTURE FRAMEWORK $143,060,000 FOR 21 ST CENTURY SCIENCE, ENGINEERING, +$14,100,000 / 10.9% AND EDUCATION (CIF21)
CYBERINFRASTRUCTURE FRAMEWORK $143,060,000 FOR 21 ST CENTURY SCIENCE, ENGINEERING, +$14,100,000 / 10.9% AND EDUCATION (CIF21) Overview The Cyberinfrastructure Framework for 21 st Century Science, Engineering,
More informationTHE UNIVERSITY OF LEEDS. Vice Chancellor s Executive Group Funding for Research Data Management: Interim
THE UNIVERSITY OF LEEDS VCEG/12/274 Vice Chancellor s Executive Group Funding for Research Data Management: Interim SOME CONTENT HAS BEEN REMOVED FROM THIS PAPER TO MAKE IT SUITABLE FOR PUBLIC DISSEMINATION
More informationRenewing Our Value: The Library s Role with Online Faculty Evaluations
Renewing Our Value: The Library s Role with Online Faculty Evaluations Maliaca Oxnam and Kimberly Chapman Annual evaluation processes for faculty are not new to most college and university campuses; however,
More informationData dissemination best practice and STAR experience
Data dissemination best practice and STAR experience Jacky Chaplow Informatics liaison image courtesy of strategicphilanthropyinc.com Why disseminate data? Comply with current legislation and government
More informationInformation and Communications Technology Strategy 2014-2017
Contents 1 Background ICT in Geoscience Australia... 2 1.1 Introduction... 2 1.2 Purpose... 2 1.3 Geoscience Australia and the Role of ICT... 2 1.4 Stakeholders... 4 2 Strategic drivers, vision and principles...
More informationCheck Your Data Freedom: A Taxonomy to Assess Life Science Database Openness
Check Your Data Freedom: A Taxonomy to Assess Life Science Database Openness Melanie Dulong de Rosnay Fellow, Science Commons and Berkman Center for Internet & Society at Harvard University This article
More informationRFP for Documentation and Development of Governance Process for Services Oriented Architecture
RFP for Documentation and Development of Governance Process for Services Oriented Architecture Section I RFP Process Introduction The University of Texas M. D. Anderson Cancer Center (M. D. Anderson) in
More informationSCAR report. SCAR Data Policy ISSN 1755-9030. International Council for Science. No 39 June 2011. Scientific Committee on Antarctic Research
International Council for Science ISSN 1755-9030 SCAR report No 39 June 2011 SCAR Data Policy Scientific Committee on Antarctic Research at the Scott Polar Research Institute, Cambridge, United Kingdom
More informationSchool of Earth and Environmental Sciences
Contact person: Karl W. Flessa Head, Department of Geosciences kflessa@email.arizona.edu 621-6000 School of Earth and Environmental Sciences Development Team Eric Betterton Head, Dept Atmospheric Sciences
More informationN O T E S. Environmental Forensics. Identification of Natural Gas Sources using Geochemical Forensic Tools. Dispute Scenarios
Environmental Forensics N O T E S V o l u m e 2 9 Identification of Natural Gas Sources using Geochemical Forensic Tools By Paul Boehm, Ph.D. and Tarek Saba, Ph.D. F o r m o r e i n f o r m a t i o n o
More informationH-Net: Preserving and Improving Access to Specialized Electronic Mailing List Archives
H-Net: Preserving and Improving Access to Specialized Electronic Mailing List Archives Interim Narrative Progress Report, August 1, 2008 January 31, 2009 Project Activities Undertaken Of the project activities
More informationSOOS Data Management
SOOS Data Management Kim Finney Chair, SOOS Data Management Sub Committee POGO 15, 23 January 2014 SOOS Goals Design and implement a comprehensive and multidisciplinary observing system for the Southern
More informationTHE M.SC. PROGRAMS OF THE FACULTY OF SCIENCE GENERAL INFORMATION THE SCHOOL OF M.SC. STUDIES
THE M.SC. PROGRAMS OF THE FACULTY OF SCIENCE GENERAL INFORMATION THE SCHOOL OF M.SC. STUDIES The Faculty of Science at the Hebrew University of Jerusalem invites outstanding Bachelor s-degree-level graduates
More informationThe EcoTrends Web Portal: An Architecture for Data Discovery and Exploration
The EcoTrends Web Portal: An Architecture for Data Discovery and Exploration Mark Servilla 1, Duane Costa 1, Christine Laney 2, Inigo San Gil 1, and James Brunt 1 1 LTER Network Office, Department of Biology,
More informationUNH Strategic Technology Plan
UNH Strategic Technology Plan Joanna Young, UNH Chief Information Officer - April 2010 People increasingly experience or interact with an organization through a technology lens. Accessible, engaging, responsive,
More informationStrategic Plan 2013 2017
Plan 0 07 Mapping the Library for the Global Network University NYU DIVISION OF LIBRARIES Our Mission New York University Libraries is a global organization that advances learning, research, and scholarly
More informationDigital Preservation Lifecycle Management
Digital Preservation Lifecycle Management Building a demonstration prototype for the preservation of large-scale multi-media collections Arcot Rajasekar San Diego Supercomputer Center, University of California,
More informationWorkprogramme 2014-15
Workprogramme 2014-15 e-infrastructures DCH-RP final conference 22 September 2014 Wim Jansen einfrastructure DG CONNECT European Commission DEVELOPMENT AND DEPLOYMENT OF E-INFRASTRUCTURES AND SERVICES
More informationBig Data to Knowledge (BD2K)
Big Data to Knowledge () potential funding agency synergies Jennie Larkin, PhD Office of the Associate Director of Data Science National Institutes of Health idash-pscanner meeting UCSD September 16, 2014
More informationHow To Write A Blog Post On Globus
Globus Software as a Service data publication and discovery Kyle Chard, University of Chicago Computation Institute, chard@uchicago.edu Jim Pruyne, University of Chicago Computation Institute, pruyne@uchicago.edu
More informationIntegration of Polish National Bibliography within the repository platform for science and humanities
Marcin Roszkowski Integration of Polish National Bibliography within the repository platform for science and humanities The best thing to do to your data will be thought of by somebody else W3C LLD Agenda
More informationHarnessing the Potential of Data Scientists and Big Data for Scientific Discovery
Harnessing the Potential of Data Scientists and Big Data for Scientific Discovery Ed Lazowska, University of Washington Saul Perlmu=er, UC Berkeley Yann LeCun, New York University Josh Greenberg, Alfred
More informationCriteria for Accrediting Computer Science Programs Effective for Evaluations during the 2004-2005 Accreditation Cycle
Criteria for Accrediting Computer Science Programs Effective for Evaluations during the 2004-2005 Accreditation Cycle I. Objectives and Assessments The program has documented, measurable objectives, including
More informationHR STRATEGY FOR RESEARCHERS
HR STRATEGY FOR RESEARCHERS ACTION PLAN 2014-2017 December 2014 (Updated March 2015) Centre for Research in Agricultural Genomics CSIC- IRTA- UAB- UB Plan for the Implementation of Human Resources Policies
More informationCanadian National Research Data Repository Service. CC and CARL Partnership for a national platform for Research Data Management
Research Data Management Canadian National Research Data Repository Service Progress Report, June 2016 As their digital datasets grow, researchers across all fields of inquiry are struggling to manage
More informationAn Introduction to Managing Research Data
An Introduction to Managing Research Data Author University of Bristol Research Data Service Date 1 August 2013 Version 3 Notes URI IPR data.bris.ac.uk Copyright 2013 University of Bristol Within the Research
More informationDATA STEWARDSHIP from a geoscience and academic perspective
DATA STEWARDSHIP from a geoscience and academic perspective Margaret Leinen Vice Chancellor for Marine Science, UC San Diego Director, Scripps Institution of Oceanography Research Data Alliance - 5 San
More informationCASRAI, eurocris, Lattes, and VIVO: Four Perspectives on Research Information Standards
CASRAI, eurocris, Lattes, and VIVO: Four Perspectives on Research Information Standards David Baker, Keith Jeffery, José Salm, and Jon Corson-Rikert Laure Haak, Moderator August 24, 2012 1 Format A round
More informationSOA Enabled Workflow Modernization
Abstract Vitaly Khusidman Workflow Modernization is a case of Architecture Driven Modernization (ADM) and follows ADM Horseshoe Lifecycle. This paper explains how workflow modernization fits into the ADM
More informationExploitation of ISS scientific data
Cooperative ISS Research data Conservation and Exploitation Exploitation of ISS scientific data Luigi Carotenuto Telespazio s.p.a. Copernicus Big Data Workshop March 13-14 2014 European Commission Brussels
More information11-12 June 2015, Bari-Italy. Stefano Nativi CNR-IIA
11-12 June 2015, Bari-Italy Stefano Nativi CNR-IIA Coordinating an Observation Network of Networks EnCompassing satellite and IN-situ to fill the Gaps in European Observations GEOSS Information System
More informationData Management Best Practices for Landscape Conservation Cooperatives Part 1: LCC Funded Science
Data Management Best Practices for Landscape Conservation Cooperatives Part 1: LCC Funded Science Version 3.4, November 2012 LCC Network Data Management Working Group Sean Finn, Josh Bradley, Emily Fort,
More informationSemantically-enabled (large-scale) Scientific Data Integration (SESDI)
Semantically-enabled (large-scale) Scientific Data Integration (SESDI) Peter Fox * Deborah McGuinness $# Robert Raskin % Krishna Sinha @ *HAO/ESSL/NCAR $ McGuinness Associates # Knowledge Systems and AI
More informationHeating, Refrigeration and Air Conditioning. Techniques Program Standard. The approved program standard for the
Heating, Refrigeration and Air Conditioning Techniques Program Standard The approved program standard for the Heating, Refrigeration and Air Conditioning Techniques program of instruction leading to an
More informationA structured task-centered framework for online collaboration
Master Thesis - Final Presentation A structured task-centered framework for online collaboration Ph.D. Yolanda Gil University of Southern California Information Sciences Institute Supervisor January 19th,
More informationA Characterization Taxonomy for Integrated Management of Modeling and Simulation Tools
A Characterization Taxonomy for Integrated Management of Modeling and Simulation Tools Bobby Hartway AEgis Technologies Group 631 Discovery Drive Huntsville, AL 35806 256-922-0802 bhartway@aegistg.com
More informationCambridge University Library. Working together: a strategic framework 2010 2013
1 Cambridge University Library Working together: a strategic framework 2010 2013 2 W o r k i n g to g e t h e r : a s t r at e g i c f r a m e w o r k 2010 2013 Vision Cambridge University Library will
More informationWorking with the British Library and DataCite Institutional Case Studies
Working with the British Library and DataCite Institutional Case Studies Contents The Archaeology Data Service Working with the British Library and DataCite: Institutional Case Studies The following case
More informationCoastal Waters Consortium (CWC) Data Management Plan
I. General Description Coastal Waters Consortium (CWC) Data Management Plan The Coastal Waters Consortium The Effects of the Macondo Oil Spill on Coastal Ecosystems will address the fundamental objective
More informationUsing the Grid for the interactive workflow management in biomedicine. Andrea Schenone BIOLAB DIST University of Genova
Using the Grid for the interactive workflow management in biomedicine Andrea Schenone BIOLAB DIST University of Genova overview background requirements solution case study results background A multilevel
More informationSoftware Description Technology
Software applications using NCB Technology. Software Description Technology LEX Provide learning management system that is a central resource for online medical education content and computer-based learning
More information