Community of Science: Strategies for Coordinating Integration of Data USGS Community for Data Integration Kevin T. Gallagher USGS Core Science Systems January 11, 2013 U.S. Department of the Interior U.S. Geological Survey
United Nations & World Bank Global Issues 2
The USGS Science Strategy Reiterates the Need to Be Interdisciplinary The inherent complexity of nature The need to explore questions and problems that are not confined to a single discipline The need to solve societal problems The power of new technologies (Committee on Facilitating Interdisciplinary Research, National Academy of Sciences, 2004) 3
The Specialization or Balkanization of Science Microbiology Geology/Geochemistry/Hy drology Landscape Ecology Community Ecology Epidemiology Ornithology Mammalogy Entomology Botany Sc cale Population Ecology Behavior Physiology Cell Biology Molecular Biology Agriculture Health Natural Resources Application
The President s Science Policy Quotes from the Office of Science and Technology: Data-Enabled Science ( Big Data ) Improved approaches are needed to derive science and social value from the vast amount of data we are now acquiring. Research and development in such approaches as algorithms, data mining, analytics, and visualization tools should be priorities. 5
U.S. Department of the Interior U.S. Geological Survey Data Integration
Data Integration: What is it? It appears intangible. Is it a concept? A picture? An Architecture? APortal? Data Standards? A Data Warehouse? Will we know it when we see it? How will we measure progress towards it? 7
Data Integration: What is it? Access & Discovery Tools Data Extraction, Transformation, Load Interoperable Data Semantic Standards, Vocabulary, Portals, Registries, Catalogs Taxonomy, Schema or Dictionary Applications Geospatial Analysis, Modeling, Visualization Standards for Exchange Protocols, Metadata, Standards 8
Data Integration: Examples Access & Discovery Search Tools Indexed d Science Content t (by geographic location, science topic, or time) Comprehensive Science Catalog (Data, Publications, Specimens, s, etc.) Web Services Portals Applications Geospatial Viewers Quantitative and Analytical Models 3-D and 4-D Visualization High Performance Computing Standards d & Tools XML, and evolving international standards (GeoSciML) Data Loading, Metadata, and Data Exchange Interoperable Data Semantics, Integrated Data Modeling, Archiving and Retrieval Data Hosting Capability Extraction, Load, and Query 9
What Does Data Integration Look Like? Eye Test Data -or - Data What do you see? 10
After Many s, What do you See? -or - Data 11
The Integrated Data View Many participate in the data resource The Data is: Visible Indexed and Accessible Developed According to Science Planning Commonly defined Quality Controlled Designed and Standardized Yet Continually Evolving How do we Get There? Data 12
The Community for Data Integration Today: Di Driving i the Agenda, Defining i Scope, Priority, Pi it and dtasks Volunteers! Chartered with over 500 members Sharing our stories once a month * Unified behind a long-term vision * Identifying and prioritizing projects that can deliver value or savings in the short term Sharing best practices Leveraging investments Serving on working groups (e.g., semantics, citizen science, data management, technology) Evolving solutions through pilot development & testing 13
Community for Data Integration Selection Guidelines Focus on targeted efforts that solve a short-term science challenge Leverage existing capabilities ( bottom up ) Develop solutions or methodology that can be replicated or repeated as well as scaled Create sustainable infrastructure Seek efficiencies or substantial return on investment Expose corporate data Organize science models and outputs t Preserve and access project data 14
The John Wesley Powell Center http://powellcenter.usgs.gov/ Cross-disciplinary scientific collaboration through Working Groups, now partnering with NSF, EPA and others. JWP Center and Community for Data Integration Use JWP Center as an incubator to test the viability of CDI tools and guidance Use Science Working Groups to identify new data management, modeling and tool requirements Apply CDI tools to Science Working Groups, ensuring models and data are preserved and available using standards 15
Working with Partners Earth Science Information ESRI Partnership (ESIP) Fish and Wildlife Service DataONE Lamont-Doherty Earth Earthcube Observatory Geoscience Information University of Auckland, New Network (GIN) Zealand OneGeology University of Idaho CUASI University of Wisconsin Arizona Geological Survey Rensselaer Polytechnic Colorado State University Institute (RPI) DiscoverLife EPA Tufts University Department of Agriculture
USGS Community for Data Integration 17 https://my.usgs.gov/confluence/display/cdi/home
Community for Data Integration Science Support Framework 18
Big Picture Computational Tools & Services Management, Policy & Standards Data & Information Assets Communities of Practice 19
Upper Level Categories Data & Information Assets Data Persistent archives, data registries, catalogs, data, metadata, derived information products, knowledge bases, vocabularies/ontologies. Technology Applications, web services, data discovery tools, models, semantic services & tools, infrastructure, data brokers and visualization tools Computational Tools & Services and visualization tools. Management, Policy & Standards Communities of Practice Business Data stewardship, implementation ti of the Data Management Life Cycle, knowledge management, data standards, governance and policy. People Scientists, CDI as a whole, CDI Working Groups, external partners, and the human network of scientific domain collaborators. 20
CDI Science Support Framework The foundation for sharing science 21
CDI s Mapped to the Science Support Framework 22
National Water Information System
CDI Data Upload, Registry and Access
Mobile Applications
Semantic Technologies for Integrating USGS Data
Data Management Website
GeoDataPortal
To Join the Community for Data Integration Contact: cdi@usgs.govgov Thank You! For more information, contact: Kevin T. Gallagher (kgallagher@usgs.gov) Cheryl Morris (cmorris@usgs.gov) gov) Jennifer Carlino (jcarlino@usgs.gov)
Geo Data Portal (GDP) A simple Web application and set of processing services that t access downscaled d climate projections and other data resources (such as geospatial datasets) that are otherwise difficult to access and manipulate Currently, historic weather and downscaled climate projections are available (CMIP3, RegCM3, PRISM, and others), more climate, weather and geographic data resources will be added 30