Core Technology Development Team Meeting


Agenda
- Presentation: User Needs Survey/Analysis (Todd/Vidya)
- PP integration
- Brief updates from others

Supported by the NIH grant 1U24 AI117966-01 to the University of California, San Diego

bioCADDIE Interview Summary
Vidya Narayana

Total interviews conducted until 06/22/2015: 10

User profiles:
- 1 Radiation Oncologist
- 1 public health Ph.D. student
- 1 anesthesiologist
- 1 Assistant Professor, Molecular Diagnostic laboratory
- 1 Associate Professor, Department of Chemistry
- 1 Graduate student, Molecular biology
- 1 researcher, National Cancer Institute of NIH
- 1 Professor of Physiology, Medicine and Cardiology
- 1 Project scientist, Department of Physiology, Medicine and Anesthesiology, NHLBI proteomics center
- 1 Assistant research scientist, Comparative genomics

Themes of questions asked:
1) Datasets involved/working on
2) Searching strategies
3) Preferred data formats, the problems associated with them, and the most suitable format in their field
4) Preferred data visualization strategies, problems and opinions
5) Access rights
6) Current trending problems in their respective field
7) Opinion on how to make data more discoverable

Datasets involved/working on
All of the interviewees gave detailed descriptions of the datasets they are currently involved with (I can provide a detailed list). In one case, the interviewee mentioned the time gap involved in working with a particular dataset.

Searching strategies
All of the interviewees said that they knew exactly where they wanted to go and had no problem searching for data. However, one interviewee, although not having trouble finding data, expressed the need for a list of databases, with descriptions of what can be found in each and quick links to them.

Preferred data formats
All of the interviewees gave a list of formats they work with (I can provide a detailed list here too); however, most of them (7/10) preferred to download the raw txt file and convert it to a format of their choice. One person used a step-by-step protocol to curate and wrangle the data for class purposes; however, she wished the data were available in the form she required, because her students were spending more time curating the data than making sense of it.

Access rights
None of the interviewees faced any problems gaining access to their data. However, one person expressed the need to shorten the process of gaining access to clinical patient data and to reduce the number of people it involves.

Trending problems in their respective field
This question had to be rephrased in several ways (with its aim kept constant), keeping in mind the experience level of the interviewee. Answers varied from individual to individual. Some of them:
- One person had a lot of data available to her in the form of pictures, which had been stored in the database over many years. When she wanted to use the data, she had to go through the files individually to extract clinical details from them. She calls this a failure of the system. She wishes big data sets captured the nitty-gritty details of individual case files. In her opinion, big data is not just about accumulating a lot of data, but about being powerful enough to capture very minute information even from a small amount of data.
- The same person looked for a particular syndrome among children in the hospital's database for research, but could not search for it because the database did not offer an option to search by a certain factor. She then recruited a graduate student to look through 3000+ papers.
- One person described the lack of description of data at the individual level. According to her, each data element and its connection to the main purpose must be described.
- One person felt that a lot of useful data is collected by independent researchers. She personally knows many people who collect data but are not willing to share it. She feels there should be incentives of some kind (not necessarily monetary) to encourage people to release their datasets. She also expressed concern that a monetary incentive might be misused by many people.
- One person said that metadata is itself the biggest problem in big data. He noted that almost no one uses controlled vocabulary and that metadata is highly contextually variable. However, his biggest problem with databases is that most of them do not support querying into the data (metadata vs. content).
- One person said there is a learning/education gap between communities that have embraced Hadoop and communities that use pipelines as the means of constructing their big data. He feels it is difficult to transition from the pipeline world to the big data world, partly because not enough mature expertise is available at the moment (he meant that it is hard to get sufficient training for people). He summarized training and tools as the big problems.
- Two people said that discoverability of data is the biggest problem because of a lack of indexing. Although they have not faced these problems themselves, they are aware of people who do.

In general, the people I am interviewing from the BD2K centers are able to give a holistic view of the problems, and the people I interviewed at TMC are able to express problems at their individual level. This seems like a fair combination.

Opinion on making data more discoverable
This question was also modified (with its aim kept constant) according to the person being interviewed. Some opinions were:
- Referencing the dataset in the abstract of a paper so that it comes up faster in a search
- Collaborating with the popular data hosting websites

The rest of the opinions were all associated with some examples. I can provide an exhaustive summary after all the interviews.

Number of upcoming interviews: 8
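
Two of the problems reported in the interviews, the difference between querying metadata and querying content and the lack of indexing behind poor discoverability, can be illustrated with a small sketch. This is not a description of any system discussed in the meeting: the catalog, its field names (`metadata`, `content`), and the helper functions below are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict

# Hypothetical catalog: every record, field name, and value is invented for illustration.
CATALOG = [
    {
        "id": "ds1",
        "metadata": {"title": "Cardiac proteomics survey", "keywords": "heart proteomics"},
        "content": ["sample_a troponin 4.2", "sample_b myosin 1.1"],
    },
    {
        "id": "ds2",
        "metadata": {"title": "Pediatric imaging archive", "keywords": "radiology children"},
        "content": ["case_17 troponin elevated", "case_18 normal"],
    },
]


def tokenize(text):
    """Lowercase the text and split it on whitespace."""
    return text.lower().split()


def build_metadata_index(catalog):
    """Map each metadata token to the set of dataset ids that mention it."""
    index = defaultdict(set)
    for record in catalog:
        for value in record["metadata"].values():
            for token in tokenize(value):
                index[token].add(record["id"])
    return index


def search_metadata(index, term):
    """Look the term up in the prebuilt index without touching the data rows."""
    return sorted(index.get(term.lower(), set()))


def search_content(catalog, term):
    """Scan the data rows of every dataset for the term."""
    term = term.lower()
    return sorted(
        record["id"]
        for record in catalog
        if any(term in tokenize(row) for row in record["content"])
    )


index = build_metadata_index(CATALOG)
metadata_hits = search_metadata(index, "proteomics")   # found via descriptive metadata
content_hits = search_content(CATALOG, "troponin")     # found only inside the data rows
missed_by_metadata = search_metadata(index, "troponin")  # metadata never mentions it
```

In this toy setup a term that appears only inside the data rows is invisible to a metadata-only search, which is one way to read the interviewee's "metadata vs. content" complaint; a real catalog would need far richer tokenization and a controlled vocabulary.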

Other issues
- Any other issues?
- Thank You