Searching biomedical data sets. Hua Xu, PhD The University of Texas Health Science Center at Houston


Transcription

1 Searching biomedical data sets Hua Xu, PhD The University of Texas Health Science Center at Houston

2 Motivations for biomedical data re-use
- Improve reproducibility
- Minimize duplicated effort in creating similar data sets
- Enable Big Data analysis

3 NIH BD2K Data Discovery Index Coordination Consortium (U24)
- Data and Informatics Working Group (DIWG) report: "Promote Data Sharing Through Central and Federated Catalogues."
- Development of an NIH BD2K Data Discovery Index Coordination Consortium (U24) HL html
- An NIH Data Discovery Index (DDI) to allow discovery, access, and citation of biomedical data

4 BioCADDIE - Biomedical and healthcare Data Discovery Indexing Engine (PI: Dr. Lucila Ohno-Machado)
Goal: a sustainable ecosystem for biomedical data sharing, engaging a broad community of stakeholders
- Discoverability
- Access
- Citation

5 Searching biomedical data sets
Technical challenges
- The complexity of biomedical data
- The free-text description of biomedical data sets
Approaches
- Standardizing representations of descriptions of biomedical data: the pfindr project
- NLP-enabled Elasticsearch
- Semantic vector-based search

6 The pfindr (phenotype Finding IN Data Repositories) Project
- Aim: to improve the search of dbGaP, a database of genotypes and phenotypes, based on phenotype variables
- Funded by NHLBI/NIH
- PI: Dr. Ohno-Machado
- Research team: Drs. Kim, Doan, and others

7 NCBI's database of Genotypes and Phenotypes (dbGaP)
dbGaP was developed to archive and distribute the results of studies that have investigated the interaction of genotype and phenotype.
As of 11/14/2013:
- top-level studies (count not preserved)
- 139,238 phenotype variables
- 2,816 datasets
- 3,895 analyses


9 Phenotype Variable Standardization Pipeline
Phenotype variables -> Tagger -> Categorizer
- Tagger: identify topic and subject of information
- Categorizer: identify semantic category of phenotypes
- Topic: main theme of phenotype variables
- Subject of information: bearer of the variable

Variable                   | Topic            | Subject         | Category
Gender of the participant  | Gender           | Study subject   | Demographics
CIGARETTES/DAY, EXAM 1     | Smoking          | study subject   | Smoking History
Weight in kg. at baseline  | weighing patient | study subject   | Clinical Attributes
AGE OF LIVING MOTHER       | Age              | mother (person) | Demographics

10 Phenotype Variable Standardization Pipeline
Variable Descriptions -> Normalization -> MetaMap Processing -> Semantic Role Assignment -> Topic Filtering -> Variable Categorization
- Normalization: spell out abbreviations and shorthand expressions; drop question numbers and other unimportant characters
- MetaMap Processing: generate CUIs, concept names, semantic types
- Semantic Role Assignment: semantic types and keyword-based role identification
- Topic Filtering: keep concepts that match SNOMED-CT clinical findings; remove problematic concepts
- Variable Categorization: semantic types and keyword-based categorization
Components: Tagger, Categorizer
15 semantic categories are selected based on semantic types from MetaMap and from two domain experts, including Demographics, Medical History, Clinical Attribute, Medication, and Lab Tests
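The normalization step above can be illustrated with a minimal sketch. The abbreviation table, the question-number pattern, and the function name `normalize` are assumptions made for illustration; they are not taken from the pfindr pipeline itself.

```python
import re

# Hypothetical abbreviation table; a real pipeline would use a curated list.
ABBREVIATIONS = {
    "wt": "weight",
    "ht": "height",
    "bp": "blood pressure",
    "cig": "cigarettes",
}

# Assumed shape of leading question identifiers, e.g. "Q12." or "12a)".
QUESTION_PREFIX = re.compile(r"^\s*(?:q\s*)?\d+[a-z]?[.):]\s*", re.IGNORECASE)


def normalize(description):
    """Drop a leading question number, strip punctuation, expand abbreviations."""
    text = QUESTION_PREFIX.sub("", description)
    # Replace characters other than letters, digits, slashes, and spaces.
    text = re.sub(r"[^A-Za-z0-9/ ]+", " ", text)
    # Lowercase each word and spell out known abbreviations.
    words = [ABBREVIATIONS.get(w.lower(), w.lower()) for w in text.split()]
    return " ".join(words)


normalized = normalize("Q12. WT in kg. at baseline")
```

In this sketch the normalized text would then be handed to a concept mapper such as MetaMap, which is outside the scope of the example.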

11 Phenotype Variable Standardization Pipeline
Variable Descriptions -> Normalization -> MetaMap Processing -> Semantic Role Assignment -> Topic Filtering -> Variable Categorization
- 135,608 variables
- 116,957 phenotypes mapped to Topic
- 104,172 phenotypes mapped to Category
Evaluation:
- Random sample of 500 unique phenotypes
- Reviewed by 3 domain experts
- 73% accuracy for topic
- 71% accuracy for category
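The counts on this slide imply mapping coverage rates, which can be worked out as follows. The sketch assumes that the 135,608 variables are the denominator for both mapped counts; the slide does not state this explicitly.

```python
# Counts taken from the slide; treating TOTAL_VARIABLES as the denominator
# for both mappings is an assumption.
TOTAL_VARIABLES = 135_608
MAPPED_TO_TOPIC = 116_957
MAPPED_TO_CATEGORY = 104_172


def coverage(mapped, total):
    """Return the share of variables that received a mapping, in percent."""
    return 100.0 * mapped / total


topic_coverage = coverage(MAPPED_TO_TOPIC, TOTAL_VARIABLES)
category_coverage = coverage(MAPPED_TO_CATEGORY, TOTAL_VARIABLES)

print(f"Topic coverage: {topic_coverage:.1f}%")        # about 86.2%
print(f"Category coverage: {category_coverage:.1f}%")  # about 76.8%
```

Coverage (how many variables were mapped at all) is a separate measure from the 73% and 71% accuracy figures, which come from expert review of the 500-phenotype sample.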

12 PhenDisco system
Search
o Term auto-complete
o Synonym expansion
o Search by titles, platform, study
Export to Excel
o Selected study metadata
o Selected phenotype variables
Display
o Keyword highlighting
o Ranking by relevance
o Filter by study metadata
o Cross-link related studies
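Synonym expansion and ranking by relevance, two of the features listed above, can be sketched in a few lines. This is not PhenDisco's implementation: the synonym table, the scoring rule (count of matching tokens), and the function names are hypothetical.

```python
import re

# Hypothetical synonym table; a real system would draw on a curated vocabulary.
SYNONYMS = {
    "smoking": {"cigarettes", "tobacco"},
    "gender": {"sex"},
}


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def expand_query(query):
    """Return the query terms plus any synonyms listed for them."""
    terms = set(tokenize(query))
    expanded = set(terms)
    for term in terms:
        expanded |= SYNONYMS.get(term, set())
    return expanded


def rank(query, descriptions):
    """Order descriptions by how many of their tokens match the expanded query."""
    terms = expand_query(query)
    scored = []
    for text in descriptions:
        score = sum(1 for tok in tokenize(text) if tok in terms)
        if score:
            scored.append((score, text))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [text for _, text in scored]


VARIABLES = [
    "CIGARETTES/DAY, EXAM 1",
    "Gender of the participant",
    "Weight in kg. at baseline",
]
results = rank("smoking", VARIABLES)
```

Without expansion, the query "smoking" would match none of the three variable descriptions; with it, the cigarettes variable is retrieved.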

13 NLP-enabled Elasticsearch
A supplement project led by Drs. Hongfang Liu, Serguei Pakhomov, and Hua Xu
Elasticsearch for Big Data
- Distributed
- Real-time
NLP-TAB
- Visualization
- Multiple NLP systems

14 Semantic vector-based search
A supplement project led by Dr. Trevor Cohen
Semantic vector representation for concepts
- Robustness
- Scalable and incremental indexing
- In-memory retrieval and inference
How it works for biomedical data search
- Represent each data set as a semantic vector
- Find similar data sets based on the similarity between the semantic vectors of two data sets
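The two steps above (represent each data set as a vector, then compare vectors) can be sketched as follows. As a deliberate simplification, plain term-count vectors stand in for learned semantic vectors, and cosine similarity is assumed as the similarity measure; the data set names and descriptions are made up for illustration.

```python
import math
import re
from collections import Counter


def embed(text):
    """Stand-in for a semantic vector: a sparse term-count vector."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(u, v):
    """Cosine similarity between two sparse vectors."""
    dot = sum(u[t] * v[t] for t in u if t in v)
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def most_similar(name, vectors):
    """Rank all other data sets by similarity to the named one."""
    target = vectors[name]
    scores = [
        (other, cosine(target, vec))
        for other, vec in vectors.items()
        if other != name
    ]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


# Hypothetical data set descriptions.
DATASETS = {
    "study_a": "smoking history and lung function in adults",
    "study_b": "lung cancer risk and smoking exposure",
    "study_c": "blood pressure measurements in children",
}
VECTORS = {name: embed(text) for name, text in DATASETS.items()}
ranked = most_similar("study_a", VECTORS)
```

Here `study_b` ranks above `study_c` for `study_a` because it shares more terms. A learned semantic representation would also be able to relate data sets that use different words for related concepts, which term counts cannot do.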

Summary of Responses to the Request for Information (RFI): Input on Development of a NIH Data Catalog (NOT-HG-13-011)

Summary of Responses to the Request for Information (RFI): Input on Development of a NIH Data Catalog (NOT-HG-13-011) Summary of Responses to the Request for Information (RFI): Input on Development of a NIH Data Catalog (NOT-HG-13-011) Key Dates Release Date: June 6, 2013 Response Date: June 25, 2013 Purpose This Request

More information

Enabling the Big Data Commons through indexing of data and their interactions

Enabling the Big Data Commons through indexing of data and their interactions biomedical and healthcare Data Discovery Index Ecosystem Enabling the Big Data Commons through indexing of and their interactions 2 nd BD2K all-hands meeting Bethesda 11/12/15 Aims 1. Help users find accessible

More information

Big Data to Knowledge (BD2K)

Big Data to Knowledge (BD2K) Big Data to Knowledge () potential funding agency synergies Jennie Larkin, PhD Office of the Associate Director of Data Science National Institutes of Health idash-pscanner meeting UCSD September 16, 2014

More information

biomedical and healthcare Data Discovery Index Ecosystem

biomedical and healthcare Data Discovery Index Ecosystem November 2-4, 2014 kickoff meeting biomedical and healthcare Data Discovery Index Ecosystem Table of Contents 1. Project Overview and Timelines.... 1 2. Community Engagement..... 3 3. Pilot Projects.......

More information

What s Next for Data Sharing: Insight from the NIH Experience

What s Next for Data Sharing: Insight from the NIH Experience What s Next for Data Sharing: Insight from the NIH Experience Jerry Sheehan Assistant Director for Policy Development National Library of Medicine National Institutes of Health SHARE In-Person Meeting

More information

SHARPn SUMMIT SECONDARY USE

SHARPn SUMMIT SECONDARY USE SHARPn SUMMIT SECONDARY USE 3rd Annual Face-to-Face University of Minnesota Rochester Center, 111 South Broadway Rochester, MN 55904 June 11-12, 2012 Join us to discuss: Standards, data integration & semantic

More information

Big Data. The Advisory Committee to the Director (ACD) Data and Informatics Working Group

Big Data. The Advisory Committee to the Director (ACD) Data and Informatics Working Group Big Data The Advisory Committee to the Director (ACD) Data and Informatics Working Group Trans-NIH Big Data Working Groups Interagency Big Data Group under Networking and IT Research and Development Program

More information

RFI Summary: Executive Summary

RFI Summary: Executive Summary RFI Summary: Executive Summary On February 20, 2013, the NIH issued a Request for Information titled Training Needs In Response to Big Data to Knowledge (BD2K) Initiative. The response was large, with

More information

Data and Informatics Implementation

Data and Informatics Implementation Data and Informatics Implementation Advisory Committee to the Director Meeting December 7, 2012 Lawrence A. Tabak, DDS, PhD Deputy Director, NIH Department of Health and Human Services Charge to the Working

More information

Achilles a platform for exploring and visualizing clinical data summary statistics

Achilles a platform for exploring and visualizing clinical data summary statistics Biomedical Informatics discovery and impact Achilles a platform for exploring and visualizing clinical data summary statistics Mark Velez, MA Ning "Sunny" Shang, PhD Department of Biomedical Informatics,

More information

Secondary Use of EMR Data View from SHARPn AMIA Health Policy, 12 Dec 2012

Secondary Use of EMR Data View from SHARPn AMIA Health Policy, 12 Dec 2012 Secondary Use of EMR Data View from SHARPn AMIA Health Policy, 12 Dec 2012 Christopher G. Chute, MD DrPH, Professor, Biomedical Informatics, Mayo Clinic Chair, ISO TC215 on Health Informatics Chair, International

More information

NIH As A Digital Enterprise Philip E. Bourne Ph.D. Associate Director for Data Science National Institutes of Health

NIH As A Digital Enterprise Philip E. Bourne Ph.D. Associate Director for Data Science National Institutes of Health NIH As A Digital Enterprise Philip E. Bourne Ph.D. Associate Director for Data Science National Institutes of Health Data Science Timeline 6/12 Findings: Sharing data & software through catalogs Support

More information

BIG DATA: DATA EVERYWHERE

BIG DATA: DATA EVERYWHERE Line Pouchard, PhD Purdue Libraries, Research Data 03/10/2015 BIG DATA INTEREST GROUP Issues in Big Data Cura/on BIG DATA: DATA EVERYWHERE DEFINITIONS OF DATA CURATION Data curation is a term used to indicate

More information

NIH s Genomic Data Sharing Policy

NIH s Genomic Data Sharing Policy NIH s Genomic Data Sharing Policy 2 Benefits of Data Sharing Enables data generated from one study to be used to explore a wide range of additional research questions Increases statistical power and scientific

More information

Diagnosis Code Assignment Support Using Random Indexing of Patient Records A Qualitative Feasibility Study

Diagnosis Code Assignment Support Using Random Indexing of Patient Records A Qualitative Feasibility Study Diagnosis Code Assignment Support Using Random Indexing of Patient Records A Qualitative Feasibility Study Aron Henriksson 1, Martin Hassel 1, and Maria Kvist 1,2 1 Department of Computer and System Sciences

More information

ICSTI 2014 General Assembly October 18-19, 2014

ICSTI 2014 General Assembly October 18-19, 2014 ICSTI 2014 General Assembly October 18-19, 2014 TACC Workshop Sunday, October 19 th, 2014 Enhancing Discoverability and Accessibility of Scientific and Technical Research Information and Data The TACC

More information

Report of the DTL focus meeting on Life Science Data Repositories

Report of the DTL focus meeting on Life Science Data Repositories Report of the DTL focus meeting on Life Science Data Repositories Goal The goal of the meeting was to inform and discuss research data repositories for life sciences. The big data era adds to the complexity

More information

CYBERINFRASTRUCTURE FRAMEWORK FOR 21 ST CENTURY SCIENCE, ENGINEERING, AND EDUCATION (CIF21) $100,070,000 -$32,350,000 / -24.43%

CYBERINFRASTRUCTURE FRAMEWORK FOR 21 ST CENTURY SCIENCE, ENGINEERING, AND EDUCATION (CIF21) $100,070,000 -$32,350,000 / -24.43% CYBERINFRASTRUCTURE FRAMEWORK FOR 21 ST CENTURY SCIENCE, ENGINEERING, AND EDUCATION (CIF21) $100,070,000 -$32,350,000 / -24.43% Overview The Cyberinfrastructure Framework for 21 st Century Science, Engineering,

More information

Strategic Health IT Advanced Research Projects (SHARP) Area 4: Secondary Use of EHR Data. SHARPfest Washington June 2-3, 2010

Strategic Health IT Advanced Research Projects (SHARP) Area 4: Secondary Use of EHR Data. SHARPfest Washington June 2-3, 2010 Strategic Health IT Advanced Research Projects (SHARP) Area 4: Secondary Use of EHR Data SHARPfest Washington June 2-3, 2010 PI: Christopher G Chute, MD DrPH Collaborations Agilex Technologies CDISC (Clinical

More information

BD2K Update. Philip Bourne, PhD, FACMI Associate Director for Data Science

BD2K Update. Philip Bourne, PhD, FACMI Associate Director for Data Science BD2K Update Philip Bourne, PhD, FACMI Associate Director for Data Science Advisory Committee to the NIH Director December 11, 2015 http://datascience.nih.gov Slides: http://www.slideshare.net/pebourne

More information

From Research to Practice: New Models for Data-sharing and Collaboration to Improve Health and Healthcare

From Research to Practice: New Models for Data-sharing and Collaboration to Improve Health and Healthcare From Research to Practice: New Models for Data-sharing and Collaboration to Improve Health and Healthcare Joe Selby, MD, MPH, Executive Director, PCORI Francis Collins, MD, PhD, Director, National Institutes

More information

Key Pain Points Addressed

Key Pain Points Addressed Xerox Image Search 6 th International Photo Metadata Conference, London, May 17, 2012 Mathieu Chuat Director Licensing & Business Development Manager Xerox Corporation Key Pain Points Addressed Explosion

More information

European Data Infrastructure - EUDAT Data Services & Tools

European Data Infrastructure - EUDAT Data Services & Tools European Data Infrastructure - EUDAT Data Services & Tools Dr. Ing. Morris Riedel Research Group Leader, Juelich Supercomputing Centre Adjunct Associated Professor, University of iceland BDEC2015, 2015-01-28

More information

NIH Commons Overview, Framework & Pilots - Version 1. The NIH Commons

NIH Commons Overview, Framework & Pilots - Version 1. The NIH Commons The NIH Commons Summary The Commons is a shared virtual space where scientists can work with the digital objects of biomedical research, i.e. it is a system that will allow investigators to find, manage,

More information

Databases & Data Infrastructure. Kerstin Lehnert

Databases & Data Infrastructure. Kerstin Lehnert + Databases & Data Infrastructure Kerstin Lehnert + Access to Data is Needed 2 to allow verification of research results to allow re-use of data + The road to reuse is perilous (1) 3 Accessibility Discovery,

More information

Collaboration in Data Documentation: Developing STARDAT - The Data Archiving Suite

Collaboration in Data Documentation: Developing STARDAT - The Data Archiving Suite Collaboration in Data Documentation: Developing STARDAT - The Data Archiving Suite Wolfgang Zenk-Möltgen IASSIST 2011 - Data Science Professionals: a Global Community of Sharing May 30 June 3, 2011, Vancouver,

More information

Alison Yao, Ph.D. July 2014

Alison Yao, Ph.D. July 2014 * Alison Yao, Ph.D. Program Officer, Office of Genomics and Advanced Technologies Division of Microbiology and Infectious Diseases National Institute of Allergy and Infectious Diseases National Institutes

More information

Library Requirements

Library Requirements The Open Group Future Airborne Capability Environment (FACE ) Library Requirements Version 2.2 April 2015 Prepared by The Open Group FACE Consortium Business Working Group Library Subcommittee AMRDEC PR1201

More information

MEDICAL DATA MINING. Timothy Hays, PhD. Health IT Strategy Executive Dynamics Research Corporation (DRC) December 13, 2012

MEDICAL DATA MINING. Timothy Hays, PhD. Health IT Strategy Executive Dynamics Research Corporation (DRC) December 13, 2012 MEDICAL DATA MINING Timothy Hays, PhD Health IT Strategy Executive Dynamics Research Corporation (DRC) December 13, 2012 2 Healthcare in America Is a VERY Large Domain with Enormous Opportunities for Data

More information

Data Management Plan. Name of Contractor. Name of project. Project Duration Start date : End: DMP Version. Date Amended, if any

Data Management Plan. Name of Contractor. Name of project. Project Duration Start date : End: DMP Version. Date Amended, if any Data Management Plan Name of Contractor Name of project Project Duration Start date : End: DMP Version Date Amended, if any Name of all authors, and ORCID number for each author WYDOT Project Number Any

More information

The Importance of Bioinformatics and Information Management

The Importance of Bioinformatics and Information Management A Graduate Program for Biological Information Specialists 1 Bryan Heidorn, Carole Palmer, and Dan Wright Graduate School of Library and Information Science University of Illinois at Urbana-Champaign UIUC

More information

Data Science at the NIH Philip E. Bourne Ph.D. Associate Director for Data Science National Institutes of Health

Data Science at the NIH Philip E. Bourne Ph.D. Associate Director for Data Science National Institutes of Health Data Science at the NIH Philip E. Bourne Ph.D. Associate Director for Data Science National Institutes of Health Data Science Timeline 6/12 Findings: Sharing data & software through catalogs Support methods

More information

Disributed Query Processing KGRAM - Search Engine TOP 10

Disributed Query Processing KGRAM - Search Engine TOP 10 fédération de données et de ConnaissancEs Distribuées en Imagerie BiomédicaLE Data fusion, semantic alignment, distributed queries Johan Montagnat CNRS, I3S lab, Modalis team on behalf of the CrEDIBLE

More information

Flattening Enterprise Knowledge

Flattening Enterprise Knowledge Flattening Enterprise Knowledge Do you Control Your Content or Does Your Content Control You? 1 Executive Summary: Enterprise Content Management (ECM) is a common buzz term and every IT manager knows it

More information

Artificial Intelligence and Transactional Law: Automated M&A Due Diligence. By Ben Klaber

Artificial Intelligence and Transactional Law: Automated M&A Due Diligence. By Ben Klaber Artificial Intelligence and Transactional Law: Automated M&A Due Diligence By Ben Klaber Introduction Largely due to the pervasiveness of electronically stored information (ESI) and search and retrieval

More information

Getting started with a data quality program

Getting started with a data quality program IBM Software White Paper Information Management Getting started with a data quality program 2 Getting started with a data quality program The data quality challenge Organizations depend on quality data

More information

Pipeliner CRM Phaenomena Guide Sales Target Tracking. 2015 Pipelinersales Inc. www.pipelinersales.com

Pipeliner CRM Phaenomena Guide Sales Target Tracking. 2015 Pipelinersales Inc. www.pipelinersales.com Sales Target Tracking 05 Pipelinersales Inc. www.pipelinersales.com Sales Target Tracking Learn how to set up Sales Target with Pipeliner Sales CRM Application. CONTENT. Setting up Sales Dynamic Target

More information

Informatics Domain Task Force (idtf) CTSA PI Meeting 02/04/2015

Informatics Domain Task Force (idtf) CTSA PI Meeting 02/04/2015 Informatics Domain Task Force (idtf) CTSA PI Meeting 02/04/2015 Informatics Domain Task Force (idtf) Lead Team Paul Harris, Vanderbilt University Medical Center, (co-chair) Steven Reis, University of Pittsburgh

More information

Data Mining Governance for Service Oriented Architecture

Data Mining Governance for Service Oriented Architecture Data Mining Governance for Service Oriented Architecture Ali Beklen Software Group IBM Turkey Istanbul, TURKEY alibek@tr.ibm.com Turgay Tugay Bilgin Dept. of Computer Engineering Maltepe University Istanbul,

More information

Open Access to Manuscripts, Open Science, and Big Data

Open Access to Manuscripts, Open Science, and Big Data Open Access to Manuscripts, Open Science, and Big Data Progress, and the Elsevier Perspective in 2013 Presented by: Dan Morgan Title: Senior Manager Access Relations, Global Academic Relations Company

More information

Ernestina Menasalvas Universidad Politécnica de Madrid

Ernestina Menasalvas Universidad Politécnica de Madrid Ernestina Menasalvas Universidad Politécnica de Madrid EECA Cluster networking event RITA 12th november 2014, Baku Sectors/Domains Big Data Value Source Public administration EUR 150 billion to EUR 300

More information

Find the signal in the noise

Find the signal in the noise Find the signal in the noise Electronic Health Records: The challenge The adoption of Electronic Health Records (EHRs) in the USA is rapidly increasing, due to the Health Information Technology and Clinical

More information

Serendipity a platform to discover and visualize Open OER Data from OpenCourseWare repositories Abstract Keywords Introduction

Serendipity a platform to discover and visualize Open OER Data from OpenCourseWare repositories Abstract Keywords Introduction Serendipity a platform to discover and visualize Open OER Data from OpenCourseWare repositories Nelson Piedra, Jorge López, Janneth Chicaiza, Universidad Técnica Particular de Loja, Ecuador nopiedra@utpl.edu.ec,

More information

Data Wrangling: From the Wild to the Lake

Data Wrangling: From the Wild to the Lake Data Wrangling: From the Wild to the Lake Ignacio Terrizzano Peter Schwarz Mary Roth John Colino IBM Research - Almaden 48 hours of video is uploaded to YouTube every minute Walmart processes million transactions

More information

In 2014, the Research Data group @ Purdue University

In 2014, the Research Data group @ Purdue University EDITOR S SUMMARY At the 2015 ASIS&T Research Data Access and Preservation (RDAP) Summit, panelists from Research Data @ Purdue University Libraries discussed the organizational structure intended to promote

More information

Research Data Networks: Privacy- Preserving Sharing of Protected Health Informa>on

Research Data Networks: Privacy- Preserving Sharing of Protected Health Informa>on Research Data Networks: Privacy- Preserving Sharing of Protected Health Informa>on Lucila Ohno-Machado, MD, PhD Division of Biomedical Informatics University of California San Diego PCORI Workshop 7/2/12

More information

Genomics and Health Data Standards: Lessons from the Past and Present for a Genome-enabled Future

Genomics and Health Data Standards: Lessons from the Past and Present for a Genome-enabled Future Genomics and Health Data Standards: Lessons from the Past and Present for a Genome-enabled Future Daniel Masys, MD Professor and Chair Department of Biomedical Informatics Professor of Medicine Vanderbilt

More information

Best Practices for Data Management. RMACC HPC Symposium, 8/13/2014

Best Practices for Data Management. RMACC HPC Symposium, 8/13/2014 Best Practices for Data Management RMACC HPC Symposium, 8/13/2014 Presenters Andrew Johnson Research Data Librarian CU-Boulder Libraries Shelley Knuth Research Data Specialist CU-Boulder Research Computing

More information

The Risks and Promises of Cloud Computing for Genomics

The Risks and Promises of Cloud Computing for Genomics The Risks and Promises of Cloud Computing for Genomics Laura Lyman Rodriguez, Ph.D. National Human Genome Research Institute P3G Privacy Summit: Data Sharing and Cloud Computing May 3, 2013 Key Elements

More information

K-means Clustering Technique on Search Engine Dataset using Data Mining Tool

K-means Clustering Technique on Search Engine Dataset using Data Mining Tool International Journal of Information and Computation Technology. ISSN 0974-2239 Volume 3, Number 6 (2013), pp. 505-510 International Research Publications House http://www. irphouse.com /ijict.htm K-means

More information

OpenAIRE Research Data Management Briefing paper

OpenAIRE Research Data Management Briefing paper OpenAIRE Research Data Management Briefing paper Understanding Research Data Management February 2016 H2020-EINFRA-2014-1 Topic: e-infrastructure for Open Access Research & Innovation action Grant Agreement

More information

WHAT SHOULD NSF DATA MANAGEMENT PLANS LOOK LIKE

WHAT SHOULD NSF DATA MANAGEMENT PLANS LOOK LIKE WHAT SHOULD NSF DATA MANAGEMENT PLANS LOOK LIKE Bin Ye, College of Agricultural and Life Sciences University of Wisconsin Diane Winter, Inter-university Consortium for Political and Social Research (ICPSR),

More information

Governance in Digital Asset Management

Governance in Digital Asset Management Governance in Digital Asset Management When was the last time you spent longer than it should have taken trying to find a specific file? Did you have to ask someone to help you? Or, has someone asked you

More information

Data platforms to support research, evaluation & practice. David V Ford Professor of Health Informatics School of Medicine, Swansea University

Data platforms to support research, evaluation & practice. David V Ford Professor of Health Informatics School of Medicine, Swansea University Data platforms to support research, evaluation & practice David V Ford Professor of Health Informatics School of Medicine, Swansea University Outline 1. Swift overview of SAIL Databank as used in Wales

More information

Environment Canada Data Management Program. Paul Paciorek Corporate Services Branch May 7, 2014

Environment Canada Data Management Program. Paul Paciorek Corporate Services Branch May 7, 2014 Environment Canada Data Management Program Paul Paciorek Corporate Services Branch May 7, 2014 EC Data Management Program (ECDMP) consists of 5 foundational, incremental projects which will implement

More information

Data advertising and managin system for Biobanks A use case for the egenvar data management system.

Data advertising and managin system for Biobanks A use case for the egenvar data management system. Data advertising and managin system for Biobanks A use case for the egenvar data management system. Sabry Razick (24 October 2014, ESBB) Department of Cancer Research and Molecular Medicine Norwegian University

More information

European Genome-phenome Archive database of human data consented for use in biomedical research at the European Bioinformatics Institute

European Genome-phenome Archive database of human data consented for use in biomedical research at the European Bioinformatics Institute European Genome-phenome Archive database of human data consented for use in biomedical research at the European Bioinformatics Institute Justin Paschall Team Leader Genetic Variation / EGA ! European Genome-phenome

More information

PPInterFinder A Web Server for Mining Human Protein Protein Interaction

PPInterFinder A Web Server for Mining Human Protein Protein Interaction PPInterFinder A Web Server for Mining Human Protein Protein Interaction Kalpana Raja, Suresh Subramani, Jeyakumar Natarajan Data Mining and Text Mining Laboratory, Department of Bioinformatics, Bharathiar

More information

A Simplified Framework for Data Cleaning and Information Retrieval in Multiple Data Source Problems

A Simplified Framework for Data Cleaning and Information Retrieval in Multiple Data Source Problems A Simplified Framework for Data Cleaning and Information Retrieval in Multiple Data Source Problems Agusthiyar.R, 1, Dr. K. Narashiman 2 Assistant Professor (Sr.G), Department of Computer Applications,

More information

SEVENTH FRAMEWORK PROGRAMME THEME ICT -1-4.1 Digital libraries and technology-enhanced learning

SEVENTH FRAMEWORK PROGRAMME THEME ICT -1-4.1 Digital libraries and technology-enhanced learning Briefing paper: Value of software agents in digital preservation Ver 1.0 Dissemination Level: Public Lead Editor: NAE 2010-08-10 Status: Draft SEVENTH FRAMEWORK PROGRAMME THEME ICT -1-4.1 Digital libraries

More information

How to stop looking in the wrong place? Use PubMed!

How to stop looking in the wrong place? Use PubMed! How to stop looking in the wrong place? Use PubMed! 1 Why not just use? Plus s Fast! Easy to remember web address Its huge - you always find something It includes PubMed citations Downside Is simply finding

More information

Introduction to Research Data Management. Tom Melvin, Anita Schwartz, and Jessica Cote April 13, 2016

Introduction to Research Data Management. Tom Melvin, Anita Schwartz, and Jessica Cote April 13, 2016 Introduction to Research Data Management Tom Melvin, Anita Schwartz, and Jessica Cote April 13, 2016 What Will We Cover? Why is managing data important? Organizing and storing research data Sharing and

More information

The NIHMS System User s Guide to Submitting a Manuscript

The NIHMS System User s Guide to Submitting a Manuscript On May 2, 2005, The National Institutes of Health (NIH) Public Access Policy went into effect for NIH-funded researchers to submit their peer-reviewed manuscripts only those that have been accepted for

More information

Implementing Ontology-based Information Sharing in Product Lifecycle Management

Implementing Ontology-based Information Sharing in Product Lifecycle Management Implementing Ontology-based Information Sharing in Product Lifecycle Management Dillon McKenzie-Veal, Nathan W. Hartman, and John Springer College of Technology, Purdue University, West Lafayette, Indiana

More information

72. Ontology Driven Knowledge Discovery Process: a proposal to integrate Ontology Engineering and KDD

72. Ontology Driven Knowledge Discovery Process: a proposal to integrate Ontology Engineering and KDD 72. Ontology Driven Knowledge Discovery Process: a proposal to integrate Ontology Engineering and KDD Paulo Gottgtroy Auckland University of Technology Paulo.gottgtroy@aut.ac.nz Abstract This paper is

More information

Connecting Basic Research and Healthcare Big Data

Connecting Basic Research and Healthcare Big Data Elsevier Health Analytics WHS 2015 Big Data in Health Connecting Basic Research and Healthcare Big Data Olaf Lodbrok Managing Director Elsevier Health Analytics o.lodbrok@elsevier.com t +49 89 5383 600

More information

Workshop on Establishing a Central Resource of Data from Genome Sequencing Projects

Workshop on Establishing a Central Resource of Data from Genome Sequencing Projects Report on the Workshop on Establishing a Central Resource of Data from Genome Sequencing Projects Background and Goals of the Workshop June 5 6, 2012 The use of genome sequencing in human research is growing

More information

The National Cancer Informatics Program (NCIP) Hub

The National Cancer Informatics Program (NCIP) Hub The National Cancer Informatics Program (NCIP) Hub A platform for collaboration and sharing of data, tools, and standards amongst the cancer research community Ishwar Chandramouliswaran September 29 2014

More information

HydroDesktop Overview

HydroDesktop Overview HydroDesktop Overview 1. Initial Objectives HydroDesktop (formerly referred to as HIS Desktop) is a new component of the HIS project intended to address the problem of how to obtain, organize and manage

More information

Election of Diagnosis Codes: Words as Responsible Citizens

Election of Diagnosis Codes: Words as Responsible Citizens Election of Diagnosis Codes: Words as Responsible Citizens Aron Henriksson and Martin Hassel Department of Computer & System Sciences (DSV), Stockholm University Forum 100, 164 40 Kista, Sweden {aronhen,xmartin}@dsv.su.se

More information

Re: Public Access to Peer-Reviewed Scholarly Publications Resulting from Federally Funded Research Request for Information

Re: Public Access to Peer-Reviewed Scholarly Publications Resulting from Federally Funded Research Request for Information December 19, 2011 Office of Science and Technology Policy National Science and Technology Council s Task Force on Public Access to Scholarly Publications 725 17 th Street Washington DC 20502 Via Email

More information

Enhancing Document Review Efficiency with OmniX

Enhancing Document Review Efficiency with OmniX Xerox Litigation Services OmniX Platform Review Technical Brief Enhancing Document Review Efficiency with OmniX Xerox Litigation Services delivers a flexible suite of end-to-end technology-driven services,

More information

Environmental Health Science. Brian S. Schwartz, MD, MS

Environmental Health Science. Brian S. Schwartz, MD, MS Environmental Health Science Data Streams Health Data Brian S. Schwartz, MD, MS January 10, 2013 When is a data stream not a data stream? When it is health data. EHR data = PHI of health system Data stream

More information

Joint Research Centre

Joint Research Centre Joint Research Centre Open Source Monitoring Tools and Applications emm.newsbrief.eu Serving society Stimulating innovation Supporting legislation Open Source Monitoring - Overview EMM Introduction Custom

More information

<no narration for this slide>

<no narration for this slide> 1 2 The standard narration text is : After completing this lesson, you will be able to: < > SAP Visual Intelligence is our latest innovation

More information

AHCCCS Search Engine. Conceptual Design. Anthony Christianson Author Position Date 11/28/07. Version: 1.0

AHCCCS Search Conceptual Design. Author: Anthony Christianson. Author Position. Date: 11/28/07. Version: 1.0. 11/28/2007 Revision & Sign-off Sheet. Change Record (Date, Author, Version, Change Reference): 12/4/07 Anthony

Natural Language to Relational Query by Using Parsing Compiler

Available online at www.ijcsmc.com. International Journal of Computer Science and Mobile Computing: A Monthly Journal of Computer Science and Information Technology. IJCSMC, Vol. 4, Issue 3, March 2015,

EXCOM 2015 Tromsø, Norway 27-28 August 2015 SCADM Report (Standing Committee on Antarctic Data Management)

Agenda Item: 3.1. Person Responsible: A Van de Putte. 1 Executive Summary

Data Management at UT

Maria Esteva, TACC, maria@tacc.utexas.edu; Colleen Lyon, UT Libraries, c.lyon@austin.utexas.edu; Angela Newell, ITS, anewell@austin.utexas.edu. What is data management? systematic organization

Susanna-Assunta Sansone, PhD. Metadata WG3 chair. 3-workgroup@biocaddie.org

http://dx.doi.org/10.6084/m9.figshare.1362572 WG3 Metadata - Goals: Define a set of metadata specifications that support intended

Bench to Bedside Clinical Decision Support:

The Role of Semantic Web Technologies in Clinical and Translational Medicine. Tonya Hongsermeier, MD, MBA, Corporate Manager, Clinical Knowledge Management and

Image Data, RDA and Practical Policies

Rainer Stotzka and many others. KIT, University of the State of Baden-Württemberg and National Laboratory of the Helmholtz Association. www.kit.edu. Data Life Cycle Lab

Semantic Concept Based Retrieval of Software Bug Report with Feedback

Tao Zhang, Byungjeong Lee, Hanjoon Kim, Jaeho Lee, Sooyong Kang, and Ilhoon Shin. Abstract: Mining software bugs provides a way to develop

Canadian National Research Data Repository Service. CC and CARL Partnership for a national platform for Research Data Management

Research Data Management. Progress Report, June 2016. As their digital datasets grow, researchers across all fields of inquiry are struggling to manage

Curriculum Vitae. Mahesh Joshi. Education. Research Experience. Publications

E-Mail: maheshj@cmu.edu. Web: http://www.d.umn.edu/~joshi031/. Education: August 2006 to present: Masters in Language Technologies, Carnegie Mellon University; September 2004 to August

INSPIRE Dashboard. Technical scenario

Technical scenarios: #1: GeoNetwork catalogue (include CSW harvester) + custom dashboard; #2: SOLR + Banana dashboard + CSW harvester; #3: EU GeoPortal + ?; #4: ? + EEA

Long Term Preservation of Earth Observation Space Data. Preservation Workflow

CEOS-WGISS. Doc. Ref.: CEOS/WGISS/DSIG/PW. Data Stewardship Interest Group. Date: March 2015. Issue: Version 1.0. Preservation Workflow

Truck Activity Visualizations In The Cloud

Presented by Dr. Catherine Lawson. NATMEC: Improving Traffic Data Collection, Analysis, and Use, June 4-7, 2012, Dallas, Texas. What Question Are You Trying to Answer?

Website Usage Monitoring and Evaluation

11. Better Practice Checklist. Practical guides for effective use of new technologies in Government. www.agimo.gov.au/checklists

Social Media Analytics. Authors: Ganesh Srinivasan and Sandeep Wagh

Contents: Abstract; Need of Social Content Analytics; Social Media Content Analytics; Inferences

Using LSI for Implementing Document Management Systems Turning unstructured data from a liability to an asset.

White Paper. By Mike Harrison, Director,

INTRODUCTION TO DATA MANAGEMENT

By Michelle Lloyd, Kate Crosby, Peter Lawton, Data Management Team, Canadian Healthy Oceans Network. November 2013. Approved and endorsed by Canadian Healthy Oceans Network

1 Executive Summary. 2 Document Structure. 3 Business Context

Contents: 1 Executive Summary; 2 Document Structure; 3 Business Context; 4 Strategic Response; 4.1 Exploiting SharePoint; 4.2 Improving Business Effectiveness; 4.3 Improving Governance

Linked Science as a producer and consumer of big data in the Earth Sciences

Line C. Pouchard,* Robert B. Cook,* Jim Green,* Natasha Noy,** Giri Palanisamy* Oak Ridge National Laboratory* Stanford Center

Workforce and Research Needs: Biomedical Big Data Science

Valerie Florance, PhD, Associate Director for Extramural Programs, National Library of Medicine, NIH/DHHS. florancev@mail.nih.gov. Topics for Today

A Statistical Text Mining Method for Patent Analysis

Department of Statistics, Cheongju University, shjun@cju.ac.kr. Abstract: Most text data from diverse document databases are unsuitable for analytical

What you can accomplish with IBM Content Analytics

An Enterprise Content Management solution. What is IBM Content Analytics? Alex On February 14-16, IBM's Watson computing system made its television debut

Chapter 6 Basics of Data Integration. Fundamentals of Business Analytics RN Prasad and Seema Acharya

Learning Objectives and Learning Outcomes. Learning Objectives: 1. Concepts of data integration 2. Needs and advantages of using data

Application of a Medical Text Indexer to an Online Dermatology Atlas

GR Kim, MD 1, AR Aronson, PhD 2, JG Mork, MS 2, BA Cohen, MD 3, CU Lehmann, MD 1. 1 Division of Health Sciences Informatics, Johns Hopkins
