Understanding Big Data Analytics for Research
1 Understanding Big Data Analytics for Research
Hye-Chung Kum
Texas A&M Health Science Center, Dept. of Health Policy & Management
University of North Carolina at Chapel Hill, Dept. of Computer Science
2 Agenda
- What is Big Data?
- What is Data Science?
- What is Population Informatics & the Social Genome?
3 Agenda
- What is Big Data?
- What is Data Science?
- What is Population Informatics & the Social Genome?
4 Properties of BIG DATA: 4V
- Volume: lots of data
- Velocity: constantly generating & changing
- Variety: expressed in many ways
- Veracity: lots of errors
- (Value)
EXAMPLE: the INTERNET! What do you do to find information/knowledge on the Internet?
5 Finding actionable information on the Internet
- Figure out your question (refine as you find out more)
  - Descriptive: what is X?
  - Hypothesis: Does X do Y?
- Ontology/Taxonomies: knowledge representation about the world (synonyms, relationships between concepts)
- Information integration
- Triangulation / validation
- Map: Zoom In / Zoom Out
6 The Big Data Problem Nutshelled
Michael Franklin (UC Berkeley)
Massive, Diverse and Growing Data
Something's gotta give:
- Time
- Money
- Quality
7 AMPLab: Integrating Three Key Resources
- Algorithms: Machine Learning, Statistical Methods; Prediction, Business Intelligence
- Machines: Clusters and Clouds; Warehouse Scale Computing
- People: Crowdsourcing, Human Computation; Data Scientists, Analysts
8 NIST Big Data Public Working Group (NBD-PWG)
NIST: National Institute of Standards and Technology (HIPAA security standard)
Leaders of activity:
- Wo Chang, NIST
- Robert Marcus, ET-Strategies
- Chaitanya Baru, UC San Diego
9 NBD-PWG Subgroups & Co-Chairs
- Requirements and Use Cases SG: Geoffrey Fox, Indiana U.; Joe Paiva, VA; Tsegereda Beyene, Cisco
- Definitions and Taxonomies SG: Nancy Grady, SAIC; Natasha Balac, SDSC; Eugene Luster, R2AD
- Reference Architecture SG: Orit Levin, Microsoft; James Ketner, AT&T; Don Krapohl, Augmented Intelligence
- Security and Privacy SG: Arnab Roy, CSA/Fujitsu; Nancy Landreville, U. MD; Akhil Manchanda, GE
- Technology Roadmap SG: Carl Buffington, Vistronix; Dan McClary, Oracle; David Boyd, Data Tactic
10
11 NIH: Big Data to Knowledge (BD2K)
- NIH Names Dr. Philip E. Bourne First Associate Director for Data Science (December 9, 2013)
- NIH commits $24 million annually for Big Data Centers of Excellence (July 22, 2013)
- Bioinformatics DNA/RNA data
HG html
12 NIH Definition
The term 'Big Data' is meant to capture the opportunities and challenges facing all biomedical researchers in accessing, managing, analyzing, and integrating datasets of diverse data types [e.g., imaging, phenotypic, molecular (including various 'omics'), exposure, health, behavioral, and the many other types of biological and biomedical and behavioral data] that are increasingly larger, more diverse, and more complex, and that exceed the abilities of currently used approaches to manage and analyze effectively.
Data Scientist: Development of a sufficient cadre of researchers skilled in the science of Big Data, in addition to elevating general competencies in data usage and analysis across the behavioral research workforce.
13 NIH: 4 Big Data Issues
- Data Compression/Reduction
  Data Compression refers to the algorithm-based conversion of large data sets into alternative representations that require less space in memory. Data Reduction refers to the reduction of data volume via the systematic removal of unnecessary data bulk.
- Data Visualization
  Data Visualization refers broadly to human-centric data representation that aids information presentation, exploration, and manipulation. This is typically performed via the use of visual and graphical techniques; however, these can be augmented with sound and other sensory cues to create deeper experiences.
  SEE the DATA: Zoom In / Zoom Out (mapquest)
- Data Provenance (replicable science, tractable processes)
  Data Provenance refers to the chronology or record of transfer, use, and alteration of data that documents the reverse path from a particular set of data back to the initial creation of a source dataset. Provenance of digital scientific data is useful for determining attribution, enabling data citation, identifying relationships between objects, tracking back differences in similar results, guaranteeing the reliability of the data, and to allow researchers to determine whether a particular dataset can be used in their research by providing lineage information about the data.
  Good programming practice
- Data Wrangling (data cleaning/integration)
  Data Wrangling is a term that is applied to activities that make data more usable by changing their form but not their meaning. Data wrangling may involve reformatting data, mapping data from one data model to another, and/or converting data into more consumable forms.
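The difference between data compression and data reduction in the definitions above can be illustrated with a minimal Python sketch. The sample records are hypothetical and are not taken from the slides; `zlib` stands in for any lossless compression algorithm, and duplicate removal is only one assumed example of "removal of unnecessary data bulk".

```python
import zlib

# Hypothetical sample data (an assumption for illustration, not from the slides):
# 1000 text records that contain many exact repeats.
records = [f"patient_{i % 5},visit,{i % 3}\n" for i in range(1000)]
raw = "".join(records).encode("utf-8")

# Data compression: convert the data into an alternative representation
# that needs less space; the original bytes are fully recoverable.
compressed = zlib.compress(raw)
restored = zlib.decompress(compressed)

# Data reduction: systematically remove unnecessary bulk (here, exact
# duplicate records). The result is smaller, but the original sequence
# of records can no longer be reconstructed from it.
reduced = sorted(set(records))

print(len(raw), len(compressed), len(reduced))
```

Compression keeps all of the information and only changes the representation, whereas reduction discards part of the data, so which records count as "unnecessary" is an analytic decision that should be recorded as part of the data's provenance.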
14 NIH and Biomedical Big Data
15 Thomas Davenport, Competing on Analytics
Skill set for good data scientists:
- IT & programming skills
- Statistical skills
- Business skills: understand pros/cons of decisions & actions
- Communication skills: Excel / PowerPoint
- Intense curiosity: the most important skill or trait. A desire to go beyond the surface of a problem, find the question at its heart, and distill it into a very clear set of hypotheses that can be tested
16 Data science teams need people with the skills and curiosity to ask the big questions (oreilly)
- Technical expertise: the best data scientists typically have deep expertise in some scientific discipline.
- Curiosity: a desire to go beneath the surface and discover and distill a problem down into a very clear set of hypotheses that can be tested.
- Storytelling: the ability to use data to tell a story and to be able to communicate it effectively.
- Cleverness: the ability to look at a problem in different, creative ways.
Health is a very important domain
Team lead: good questions, good interpretation & implications
17 Thank you! Questions? Population Informatics Research Group
