The InterNational Committee for Information Technology Standards INCITS Big Data



Similar documents
Standard Big Data Architecture and Infrastructure

Big Data Systems and Interoperability

Standardization Requirements Analysis on Big Data in Public Sector based on Potential Business Models

A Big Picture for Big Data

BIG DATA CHALLENGES AND PERSPECTIVES

Big Data Buzzwords From A to Z. By Rick Whiting, CRN 4:00 PM ET Wed. Nov. 28, 2012

Real-Time Data Access Using Restful Framework for Multi-Platform Data Warehouse Environment

The Concept of Big Data Reference Model

BIRT in the World of Big Data

Cloud and Big Data Standardisation

Big Data Standardisation in Industry and Research

Surfing the Data Tsunami: A New Paradigm for Big Data Processing and Analytics

Survey of Big Data Architecture and Framework from the Industry

What happens when Big Data and Master Data come together?

ISO/IEC JTC1 SC32. Next Generation Analytics Study Group

International Journal of Advanced Engineering Research and Applications (IJAERA) ISSN: Vol. 1, Issue 6, October Big Data and Hadoop

ca IT Leaders Forum Working in the Cloud using the new ISO/IEC/ITU-T Cloud Computing Standards Dr David Ross, Chief Information Security Officer,

Managing Cloud Server with Big Data for Small, Medium Enterprises: Issues and Challenges

You should have a working knowledge of the Microsoft Windows platform. A basic knowledge of programming is helpful but not required.

Architecting for Big Data Analytics and Beyond: A New Framework for Business Intelligence and Data Warehousing

Klarna Tech Talk: Mind the Data! Jeff Pollock InfoSphere Information Integration & Governance

Big Data Solutions. Portal Development with MongoDB and Liferay. Solutions

SQLSaturday #399 Sacramento 25 July, Big Data Analytics with Excel

White Paper: Datameer s User-Focused Big Data Solutions

Overview NIST Big Data Working Group Activities

W H I T E P A P E R. Deriving Intelligence from Large Data Using Hadoop and Applying Analytics. Abstract

Big Data Architectures. Tom Cahill, Vice President Worldwide Channels, Jaspersoft

Interoperable Cloud Storage with the CDMI Standard

BIG DATA SOLUTION DATA SHEET

INTERNATIONAL JOURNAL OF PURE AND APPLIED RESEARCH IN ENGINEERING AND TECHNOLOGY

International Organization for Standardization TC 215 Health Informatics. Audrey Dickerson, RN MS ISO/TC 215 Secretary

Tutorial: Big Data Algorithms and Applications Under Hadoop KUNPENG ZHANG SIDDHARTHA BHATTACHARYYA

Luncheon Webinar Series May 13, 2013

Ali Eghlima Ph.D Director of Bioinformatics. A Bioinformatics Research & Consulting Group

The 3 questions to ask yourself about BIG DATA

Introduction to Hadoop. New York Oracle User Group Vikas Sawhney

International Journal of Advancements in Research & Technology, Volume 3, Issue 5, May ISSN BIG DATA: A New Technology

Why NoSQL? Your database options in the new non- relational world IBM Cloudant 1

The 4 Pillars of Technosoft s Big Data Practice

Cloud Computing and Big Data What Technical Writers Need to Know

Big Data Analytics in Space Exploration and Entrepreneurship

Genesee Health System RFI-Business Intelligence & Analytics with Dashboard Reporting Questions and Answers

Big Data and Analytics (Fall 2015)

Big Data & Security. Aljosa Pasic 12/02/2015

Software Defined Infrastructure The Next Wave of Workload Portability Vinod Eswaraprasad Principal Architect, Wipro

DATA MINING AND WAREHOUSING CONCEPTS

Collaborations between Official Statistics and Academia in the Era of Big Data

Emerging Trends and The Role of Standards in Future Health Systems. Nation-wide Healthcare Standards Adoption: Working Groups and Localization

Integrating Big Data into the Computing Curricula

Trends and Research Opportunities in Spatial Big Data Analytics and Cloud Computing NCSU GeoSpatial Forum

Manifest for Big Data Pig, Hive & Jaql

Big Data and Data Science: Behind the Buzz Words

Big Data a threat or a chance?

No Data Governance, No Actionable Insights

NoSQL and Hadoop Technologies On Oracle Cloud

Headstrong: SAP Solution Helps Streamline and Accelerate Financial Services Application Development

The Next Wave of Data Management. Is Big Data The New Normal?

Role of Cloud Computing in Big Data Analytics Using MapReduce Component of Hadoop

Informix Product Strategy and Roadmap Data, Cloud, Analytics, Internet of Things

Data processing goes big

Software Engineering for Big Data. CS846 Paulo Alencar David R. Cheriton School of Computer Science University of Waterloo

Applications for Big Data Analytics

Big Data Executive Survey

IBM Big Data in Government

INTERNATIONAL ORGANISATION FOR STANDARDISATION ORGANISATION INTERNATIONALE DE NORMALISATION ISO/IEC JTC1/SC29/WG1 CODING OF STILL PICTURES

BIG DATA TRENDS AND TECHNOLOGIES

Hadoop. Sunday, November 25, 12

Big Data. Donald Kossmann & Nesime Tatbul Systems Group ETH Zurich

Data Refinery with Big Data Aspects

TECHNIQUES FOR OPTIMIZING THE RELATIONSHIP BETWEEN DATA STORAGE SPACE AND DATA RETRIEVAL TIME FOR LARGE DATABASES

INTERNATIONAL JOURNAL OF PURE AND APPLIED RESEARCH IN ENGINEERING AND TECHNOLOGY

INTRODUCTION TO CASSANDRA

BIG DATA TOOLS. Top 10 open source technologies for Big Data

YOU VS THE SENSORS. Six Requirements for Visualizing the Internet of Things. Dan Potter Chief Marketing Officer, Datawatch Corporation

Long Term Record Retention and XAM

Big Data and Analytics: Challenges and Opportunities

What is Big Data? BCS Aberdeen Branch 6 November 2014

BIG DATA Impact on DMOs. TTRA June 21, 2013

Impact of Big Data in Oil & Gas Industry. Pranaya Sangvai Reliance Industries Limited 04 Feb 15, DEJ, Mumbai, India.

USING BIG DATA FOR INTELLIGENT BUSINESSES

Department of Computer Science University of Cyprus EPL646 Advanced Topics in Databases. Lecture 12

Exploiting Data at Rest and Data in Motion with a Big Data Platform

COMP9321 Web Application Engineering

Cloud Data Management Interface (CDMI) The Cloud Storage Standard. Mark Carlson, SNIA TC and Oracle Chair, SNIA Cloud Storage TWG

Transcription:

The InterNational Committee for Information Technology Standards INCITS Big Data Keith W. Hare JCC Consulting, Inc. April 2, 2015

Who am I? Senior Consultant with JCC Consulting, Inc. since 1985 High performance database systems Replicating data between database systems SQL Standards committees since 1988 Interim chair of the INCITS Big Data Technical Committee Chair, INCITS Big Data Ad Hoc the USA TAG to the JTC1 Study Group on Big Data 2014 Convenor, ISO/IEC JTC1 SC32 WG3 since 2005 Vice Chair, ANSI INCITS DM32.2 since 2003 Education Muskingum College, 1980, BS in Biology and Computer Science Ohio State, 1985, Masters in Computer & Information Science Slide 2

Topics What is Big Data and why is it important? Where can standards help? What are the challenges? Big Data Efforts Slide 3

What is Big Data? Often described using 3, 4, or 5 V s Volume, Variety, Velocity, Variability, Veracity Imprecise definition because the problem space is imprecise Paradigm Shift Driving Forces How Big is Big? Slide 4

Why Big Data? Learn something from the data that is valuable Analytical and data visualization tools To be most effective, tools need a standard interface to the data sets don t spend a lot of time custom programming an interface per data set. Slide 5

Big Data: Driving Forces Inexpensive storage of large volumes of data Inexpensive compute power Next Generation Analytics Moving from Off-line to in-line embedded analytics Explaining what happened to Predicting what will happen Operating on Data at rest stored someplace Data in motion streaming Multiple disparate data sources Look at available data and wonder what answers are hidden there Slide 6

Big Data Soutions Big Data Solutions are those which address problems in data analytics where the Volume, Velocity, or Variability of the source data exclude delivery of analytic results within the required timeline using conventional solutions on current technology. Gregory Weidman L-3 Data Tactics Big Data Insights http://datatactics.blogspot.com/2013/10/what-is-big-data.html Slide 7

How Big is Big? Data volumes Data Distribution Slide 8

How Big is Big Data Volumes Terabytes 1000**4 Petabytes 1000**5 Exabyte 1000**6 Zettabyte 1000**7 Yottabyte 1000**8 Brontobyte 1000**9 Gegobyte 1000**10 Notes: Brontobyte and Gegobyte are not recognized by the International System of Units, still subject to change. Geobyte is also being used (see http://www.oxfordmathcenter.com/drupal7/node/410) but it is also the name of a US corporation. Slide 9

How Big is Big Data Distribution Server Cluster Datacenter Continent Planet Solar System Slide 10

Many Big Data Projects Rush to get something done without taking the time to do upfront design Meaning of the data documented In the custom coded interface On the whiteboard in someone's cubicle Intuitively obvious based on the abbreviated JSON or XML tags We are reinventing 1960s data technology but on a bigger scale Slide 11

Where can Standards help Big Data? Standards can assist in the Big Data arena, but we have to identify where they could be useful Horizontally Scaled Data Sources Variety of Data Sources Integrating Multiple Data Sources Slide 12

Horizontally Scaled Data Source Horizontally Scaled Storage & Analytics Computer 1 Computer 2 Computer N Horizontal Scaling is one solution to the data volume challenge. Slide 13

Horizontally Scaled Data Source Ease of Use Language for storing data Language for querying metadata Language for querying data Language for specifying distributed queries Potential for standardization! Performance Simple matter of engineering & programming Language for specifying distribution Likely to be product specific Little potential for standardization Slide 14

Variety of Data Sources Tabular data relations Designed, cleansed, curated Spatial data Images & Video Well defined structures Need additional domain information. aerial photos, faces, stars Etc. XML may have well defined DTD (Document Type Definition) Store everything now, figure it out later JSON Java Script Object Notation E.g. network packet logs Multiple storage models to handle the diversity Slide 15

Variety of Data Source Ownership Self Owned Publically Available Available with Restrictions Data for hire Derived Data Slide 16

Integrating Multiple Data Sources Data Source 1 Horizontally Scaled Storage & Analytics Computer 1 Computer 2 Computer N Data Source 2 Analytics Engine Horizontally Scaled Storage & Analytics Computer 1 Computer 2 Computer N Data Source N Data Data Source Registry 2 N Data Source Registry 1 Horizontally Scaled Storage & Analytics Computer 1 Computer 2 Computer N Disclaimer: This diagram assists in identifying standards requirements. It is not intended to be a full processing model. Slide 17

Data Source Registry Requirements Language/Interface for registering data source Support for discovering and identifying available data sources Content of the data source Semantics and Syntax of data Available analytic routines Security/Privacy restrictions Provenance of the data Information about connecting to data source Business agreement information Costs Use Restrictions Service Level Agreements Standards will support integration of multiple data sources Slide 18

Challenges Analytical and data visualization tools Integration of multiple data sources Discovery of data sources Description of data Slide 19

Challenges Analytical and Data Visualization Tools To be most effective, tools need a standard interface to the data sets Don t want to spend a lot of time custom programming an interface per data set. Slide 20

Challenges Integration of Multiple Data Sources It's neat that my data set has eleventy-trillion records Can I do an analysis that integrates my data set with your data set to learn something useful? Slide 21

Challenges Data Source Discovery Programmatically discover that you have an interesting data set available Understand what it is Understand how to access it To accomplish this, you need to register your data set someplace that I can programmatically query. Slide 22

Challenges Description of Data I need to understand what your data set contains Avoid accidently correlating Apple crop in New Zealand with Hedge Hog population in Wales Technical Challenge Need better support for describing data data set registries Social/Management Challenge If you do a good job of describing your data, I get the benefit Slide 23

Big Data Efforts ISO/IEC JTC1 WG9 Big Data INCITS/BigData Technical Committee NIST Big Data Public Working Group Slide 24

ISO/IEC JTC1 WG9 Big Data Established by the November 2014 JTC1 Plenary Two Work Items approved in March, 2015 Definitions Reference Architecture Meetings: April 7-9, Bremen Germany July 7-9, Seoul Korea (tentative) November 2015 Brasilia, Brazil, (tentative) Convenor: Wo Chang, from the US Slide 25

JTC1/WG9 April 2015 Meeting Representatives from 10 national bodies China, Finland, France, Germany, Ireland, Japan, South Korea, Singapore, UK, USA ISO/IEC 20546 Big Data Overview and Vocabulary Editors: Nancy Grady, Lili Yang FDIS March, 2017 ISO/IEC 20547 Big Data Reference Architecture Editors: Suwook Ha, David Boyd, Ray Walshe FDIS March, 2017 Liaison statements to several other groups Slide 26

INCITS/BigData Technical Committee INCITS Executive Board Established January 2015 TAG to JTC1 Big Data Efforts Meetings March 6, 2015 April 30, 2015 June 4, 2015 (Tentative) August 6, 2015 (Tentative) Current Membership, 14 voting, 3 prospective Users outnumber vendors Acting Chair: Keith Hare Slide 27

NIST Big Data Public Working Group Weekly web conferences since June, 2013 Mix of industry, academic, and users Participation is fluid 40 to 50 Output Documents NIST Special Publication 1500-1 through 1500-7 Chaired by Wo Chang of NIST Slide 28

NIST Big Data Interoperability Framework Draft Version 1 Documents NIST Special Publication 1500-1 Volume 1 - NIST Big Data Definitions http://bigdatawg.nist.gov/_uploadfiles/m0392_v1_3022325181.docx NIST Special Publication 1500-2 Volume 2 - NIST Big Data Taxonomies http://bigdatawg.nist.gov/_uploadfiles/m0393_v1_3613775223.docx NIST Special Publication 1500-3 Volume 3 - NIST Big Data Use Case and Requirements http://bigdatawg.nist.gov/_uploadfiles/m0394_v1_4746659136.docx NIST Special Publication 1500-4 Volume 4 - NIST Big Data Security and Privacy http://bigdatawg.nist.gov/_uploadfiles/m0395_v1_4717582962.docx NIST Special Publication 1500-5 Volume 5 - NIST Big Data Architectures White Paper Survey http://bigdatawg.nist.gov/_uploadfiles/m0396_v1_7656223932.docx NIST Special Publication 1500-6 Volume 6 - NIST Big Data Reference Architecture http://bigdatawg.nist.gov/_uploadfiles/m0397_v1_2395481670.docx NIST Special Publication 1500-7 Volume 7 - NIST Big Data Standards Roadmap http://bigdatawg.nist.gov/_uploadfiles/m0398_v1_1449826642.docx Slide 29

Overlapping Participation INCITS/BigData NIST BD PWG JTC1 WG9 BD Overlaps are indicative, but not to scale. Slide 30

Interactions With Other Groups SC32 (DM32) Data Management & Exchange WG2 Metadata WG3 Databases and Database Languages SC27 (CS1) Security SC38 (DAPS38) Cloud Computing ISO/TC 211 (L1) Geographic Information Systems Slide 31

Missing Links A number of Big Data related efforts are open source R Project for Statistical Computing Apache Software Foundation Hadoop and its eco-system Cassandra CouchDB MongoDB Missing Big Data Vendors include (but not limited to) IBM HP Microsoft Google Amazon Slide 32

Big Data in Perspective Space is big. Really big. You just won't believe how vastly, hugely, mind-bogglingly big it is. I mean, you may think it's a long way down the road to the chemist's, but that's just peanuts to space. Douglas Adams, The Hitchhiker's Guide to the Galaxy Don t solve problems that you don t have. Keith Gordon, in a talk to BCS Aberdeen, 2014 https://www.youtube.com/watch?v=wdu8oirk7ac Slide 33

Summary Big Data Lots of hype but potential and technology are real Standardization efforts are starting NIST documents provide a good base for discussion Overlap and coordination with other groups Slide 34

Thank you