@TECHTrain Big Data for Government Symposium http://www.ttcus.com Linkedin/Groups: Technology Training i Corporation
NIST Big Data NIST Bi D t Public Working Group p Wo Chang, NIST, wchang@nist.gov Ro ber t Marcus, E T S t r a t e g i e s Chaitanya B a r u, UC San u UC San Diego http://bigdatawg.nist.gov November, 20, 2013
AGENDA h h Why Big Data? Why NIST? NBD PWD Charter and Deliverables Overall Workplan O ll W k l Subgroup Charter and Deliverables Definitions & Taxonomies Subgroup Requirements & Use Case Subgroup Security & Privacy Subgroup Reference Architecture Subgroup Technology Roadmap Subgroup Next Steps/Future Activities g y p ISO/IEC JTC 1 Big Data Study Group 3
Why Big Data? Why NIST? Why Big Data? There is a broad agreement among commercial, academic, and government leaders about the remarkable potential of Big Data to spark innovation, fuel commerce, and drive progress. Why NIST? (a) Recommendation from January 15 h d f 17, 2013 Cloud/Big Data Forum l d and (b) A lack of consensus on some important, fundamental questions is confusing potential users and holding back progress. Questions such as: What are the attributes that define Big Data solutions? How is Big Data different from the traditional data environments and related applications that we have encountered thus far? What are the essential characteristics of Big Data environments? Wh h i l h i i f Bi D i? How do these environments integrate with currently deployed architectures? f, g, g What are the central scientific, technological, and standardization challenges that need to be addressed to accelerate the deployment of robust Big Data solutions? NBD PWG is being launched to address these questions and is charged to d l develop consensus definitions, taxonomies, secure reference architecture, and d fi iti t i f hit t d technology roadmap for Big Data that can be embraced by all sectors. 4
C H A R T E R ( M 0 0 0 1 ) T h e f o c u s o f t h e ( N B D P W G ) i s t o f o r m a c o m m u n i t y o f i n t e r e s t f r o m i n d u s t r y, a c a d e m i a, a n d g o v e r n m e n t, w i t h t h e g o a l o f d e v e l o p i n g a c o n s e n s u s d e f i n i t i o n s, t a xo n o m i e s, s e c u r e r e f e r e n c e a r c h i t e c t u r e s, a n d t e c h n o l o g y r o a d m a p. T h e a i m i s t o c r e a t e v e n d o r n e u t r a l, t e c h n o l o g y a n d i n f r a s t r u c t u r e a g n o s t i c d e l i v e r a b l e s t o e n a b l e b i g d a t a s t a ke h o l d e r s t o p i c k a n d c h o o s e b e s t a n a l y t i c s t o o l s f o r t h e i r p r o c e s s i n g a n d v i s u a l i z a t i o n r e q u i r e m e n t s o n t h e m o s t s u i t a b l e c o m p u t i n g p l a t f o r m s a n d c l u s t e r s w h i l e a l l o w i n g v a l u e a d d e d f r o m b i g d a t a s e r v i c e p r o v i d e r s a n d f l o w o f d a t a b e t w e e n t h e s t a ke h o l d e r s i n a c o h e s i v e a n d s e c u r e m a n n e r. DELIVERABLES: Working Draft for 1. B i g D a t a D e f i n i t i o n s 2. B i g D a t a Ta xo n o m i e s 3. B i g D a t a R e q u i r e m e n t s 4. B i g D a t a S e c u r i t y a n d P r i v a c y R e q u i r e m e n t s 5. B i g D a t a A r c h i t e c t u r e s S u r v e y 6. B i g D a t a R e f e r e n c e Architectures 7. B i g D a t a S e c u r i t y a n d P r i v a c y R e f e r e n c e Architectures 8. B i g D a t a Te c h n o l o g y Roadmap LAUNCHED DATE: June 26,, 20133 TARGET DATE: September 27, 2013 CHARTER AND DELIVERABLES NBD PWG 5
6
Requirement s and Use Cases Technology Roadmap NBD PWG Reference Architecture Definitions & Taxonomies Security and Privacy SUBGROUPS A ND THEIR SCOPES AN D DELIVERABLES 7
SUBGR S ROUP Requirements and Use Cases Geoffrey Fox, U. Indiana Joe Paiva, VA Tsegereda Beyene, Cisco Scope (M0020) The focus is to form a community of interest from industry, academia, and government, with the goal of developing a consensus list of Big Data requirements across all stakeholders. This includes gathering and understanding various use cases from diversified application domains. Tasks Gather input from all stakeholders regarding Big Data requirements. Technology Roadmap Definitions and Taxonomies NBD PWG Requiremen ts Analyze/prioritize a list of challenging general requirements that may delay or prevent adoption of Big Data deployment Develop a comprehensive list of Big Data requirements Reference Architecture Security and Privacy Gov. Big Data Symposium, NIST/ITL, Wo Chang, Nov. 20, 2013 8
Requirements and Use Case Subgroup Key documents: M0105 use cases, M0125, requirements, M0152 working draft Use Case Template 1. Goals, Description 2. 3 3. 4. 5. 6 6. 7. Data Characteristics, Data Types Data Anal tics Data Analytics Current Solutions Security & Privacy Lifecycle Management and Data Quality System Management and Other issues 9
Requirements and Use Case Subgroup 51 Use Cases Received (http://bigdatawg.nist.gov/usecases.php) 1. Government Operation (4): National Archives and Records Administration, Census Bureau 2. Commercial (8): Finance in Cloud, Cloud Backup, Mendeley (Citations), Netflix, Web Search, Digital Materials, Cargo shipping (as in UPS) 3. Defense (3): Sensors, Image surveillance, Situation Assessment 4. Healthcare and Life Sciences (10): Medical records, Graph and Probabilistic analysis, Pathology, Bioimaging, Genomics, Epidemiology, People Activity models, Biodiversity 5. Deep Learning and Social Media (6): Driving Car, Geolocate images, Twitter, Crowd Sourcing, Network Science, NIST benchmark datasets 6. The Ecosystem for Research (4): Metadata, Collaboration, Language Translation, Light source experiments 7. Astronomy and Physics (5): Sky Surveys compared to simulation, Large Hadron Collider at CERN, Belle Accelerator II in Japan 8. Earth, Environmental and Polar Science (10): Radar Scattering in Atmosphere, Earthquake, Ocean, Earth Observation, Ice sheet Radar scattering, Earth radar mapping, Climate simulation datasets, Atmospheric turbulence identification, Subsurface Biogeochemistry (microbes to watersheds), AmeriFlux and FLUXNET gas sensors 9. Energy (10): Smart grid 10
Requirements and Use Case Subgroup Two step process is used for requirement extraction: 1. Extract specific requirements and map to reference architecture based on each application s characteristics such as: a data sources (data size, file formats, rate of grow, at rest or in a. data sources (data size file formats rate of grow at rest or in motion, etc.) b. data lifecycle management (curation, conversion, quality check, pre analytic processing, etc.) c. data transformation (data fusion/mashup, analytics), d f i d f h l d. capability infrastructure (software tools, platform tools, hardware resources such as storage and networking), and e. data usage (processed results in text, table, visual, and other formats). f. all architecture components informed by Goals and use case description g. Security & Privacy has direct map S it & P i h di t 2. Aggregate all specific requirements into high level generalized requirements which are vendor neutral and technology agnostic. 437 requirements were extracted from 51 Use Cases 437 q 5 35 aggregated general requirements into 7 categories 11
Partt of Prop perty Su ummary y Table Requirements and Use Case Subgroup 12
SUBGR S ROUP Definitions and Taxonomies Nancy Grady, SAIC Natasha Balac, SDSC Eugene Luster, R2AD Scope (M0018) The focus is to gain a better understanding d of the principles of Big Data. It is important to develop a consensus based common language and vocabulary terms used in Big Data across stakeholders from industry, academia, and government. In addition, it is also critical to identify essential actors with roles and responsibility, and subdivide them into components and sub components on how they interact/ relate with each other according to their similarities and differences. Tasks For Definitions: Compile terms used from all stakeholders regarding the meaning of Big Data from various standard bodies, domain applications, and diversified operational environments. Technology Roadmap Definitions and Taxonomies NBD PWG Requiremen ts Reference Security and Architecture Privacy For Taxonomies: Identify key actors with their roles and responsibilities from all stakeholders, categorize them into components and subcomponents based on their similarities and differences Develop Big Data Definitions and taxonomies documents Gov. Big Data Symposium, NIST/ITL, Wo Chang, Nov. 20, 2013 13
Definitions and Taxonomies Subgroup (M0024, M Key documents: M0024 ongoing discussion, M0142 working draft Big Data Definitions, v1 Big Data Definitions v1 (Developed from Jan. 15 (Developed from Jan 15 17, 2013 NIST 17 2013 NIST Cloud/BigData Workshop) Big Data refers to digital data volume, velocity Big Data refers to digital data volume velocity and/or variety that: enable novel approaches to frontier questions previously inaccessible or impractical using l bl l current or conventional methods; and/or g p y y exceed the storage capacity or analysis capability of current or conventional methods and systems. differentiates by storing and analyzing population data and not sample sizes. 14
Definitions and Taxonomies Subgroup (M0024, M Data Science is the extraction of actionable knowledge directly from data through a process of discovery, hypothesis, and analytical hypothesis analysis. Data Scientists is a practitioner who has sufficient knowledge of the overlapping regimes of expertise in business needs, domain knowledge, analytical skills and programming expertise to manage the end to end scientific method process through each stage in the big data lifecycle (through action) to deliver value. 15
Definitions and Taxonomies Subgroup (M0024, M Big Data Taxonomies (M0202) Actors S t R l System Roles Sensors Data Provider makes available data internal and/or external to the system Applications Software agents Individuals Organizations Hardware resources Service abstractions Data Consumer uses the output of the system S System Orchestrator O h governance, requirements, monitoring Big Data Application Provider instantiates application Big Data Framework Provider provides resources 16
Definitions and Taxonomies Subgroup (M0024, M Big Data Taxonomies (M0202) 17
Definitions and Taxonomies Subgroup (M0024, M Big Data Taxonomies (M0202) 18
SUBGR S ROUP Reference Architecture Orit Levin, Microsoft James Ketner, AT&T Don Krapohl, Augmented Intelligence Scope (M0021) The focus is to form a community of interest t from industry, academia, and government, with the goal of developing a consensus based approach to orchestrate vendor neutral, technology and infrastructure agnostic for analytics tools and computing environments. The goal is to enable Big Data stakeholders to pick and choose technologyagnostic analytics tools for processing and visualization in any computing platform and cluster while allowing value added from Big Data service providers and the flow of the data between the stakeholders in a cohesive and secure manner. Tasks Gather and study available Big Data architectures representing various stakeholders, different data types, use cases, and document the architectures using the Big Data taxonomies model based upon the identified actors with their roles and responsibilities. Technology Roadmap Definitions and Taxonomies NBD PWG Requiremen ts Ensure that the developed Big Data reference architecture and the Security and Privacy Reference Architecture correspond and complement each other. Reference Architecture Security and Privacy Gov. Big Data Symposium, NIST/ITL, Wo Chang, Nov. 20, 2013 19
Reference Architecture Subgroup Key documents: M0100, M0151 white paper, M0123 working draft Data Processing Flow D t P i Fl M0039 Data Transformation Flow D t T f ti Fl M0017 IT Stackk IT St M0047 20
Reference Architecture Subgroup V d Bi D A hi Vendors Big Data Architectures 21
Reference Architecture Subgroup What the Baseline Big Data RA A superset of a traditional IS data system y A representation of a vendor neutral and technology agnostic system g y A functional architecture comprised of logical roles Applicable to a variety of business models o Tightly integrated enterprise systems o Loosely coupled vertical industries A business architecture IS NOT representing internal vs. external functional boundaries A deployment architecture A detailed IT RA of a specific system implementation All of the above will be f developed in the next stage in the context of specific use cases. p f 22
Reference Architecture Subgroup RA Diagram 23
SUBGR S ROUP Technology Roadmap Definitions and Taxonomies NBD PWG Security and Privacy Arnab Roy, CSA/Fujitsu Nancy Landreville, U. MD Akhil Manchanda, GE Requiremen ts Scope (M0019) The focus is to form a community of interest t from industry, academia, and government, with the goal of developing a consensus secure reference architecture to handle security and privacy issues across all stakeholders. This includes gaining an understanding of what standards are available or under development, as well as identifies which key organizations are working on these standards. Tasks Gather input from all stakeholders regarding security and privacy concerns in Big Data processing, storage, and services. Analyze/prioritize a list of challenging security and privacy requirements that may delay or prevent adoption of Big Data deployment Develop a Security and Privacy Reference Architecture that supplements the general Big Data Reference Architecture Reference Architecture Security and Privacy Gov. Big Data Symposium, NIST/ITL, Wo Chang, Nov. 20, 2013 24
Security and Privacy Subgroup Key documents: Google Doc ongoing discussion, M0110 requirements working draft, M0xxx architecture & taxonomies Requirements Scope 1. Infrastructure Security I f S i 2. Data Privacy 3. Data Management 4 4. Integrity and Reactive Security Requirements Use Cases Studied 1. Retail (consumer) 2 2. Healthcare 3. Media (social media and communications) 4. Government (military, justice systems, etc.) 55. Marketingg Architecture & Taxonomies 1. Privacy 2. Provenance 3. System Health 25
SUBGR S ROUP Technology Roadmap Carl Buffington, USDA/Vistronix Dan McClary, Oracle David Boyd, Data Tactic Scope (M0022) The focus is to form a community of interest from industry, academia, and government, with the goal of developing a consensus vision with recommendations on how Big Data should move forward by performing a good gap analysis through the materials gathered from all other NBD subgroups. This includes setting standardization and adoption priorities through an understanding of what standards are available or under development as part of the recommendations. Tasks Gather input from NBD subgroups and study the taxonomies for the actors roles and responsibility, use cases and requirements, and secure reference architecture. Gain understanding of what standards are available or under development for Big Data Perform a thorough gap analysis and document the findings Technology Roadmap Definitions and Taxonomies NBD PWG Requiremen ts Identify what possible barriers may delay or prevent adoption of Big Data Document vision and recommendations Reference Architecture Security and Privacy Gov. Big Data Symposium, NIST/ITL, Wo Chang, Nov. 20, 2013 26
Technology Roadmap Subgroup Key documents: M0087 working draft Inputs from other subgroups 11. 2. 3. 4 4. Definitions and Taxonomies Requirements and Use Cases Security and Privacy Reference Architecture Potential Standards Group with Big Data related activities (M0035) Capabilities and Technology Readiness Big Data Decision Framework Big Data Mapping and Gap Analysis Big Data Strategies 1. 2. 3. Adoption Implementation l Resourcing 27
Subgroups Working Draft Outline Contact: bigdatainfo@nist.gov Website: http://bigdatawg.nist.gov Join NBD PWG: http://bigdatawg nist gov/newuser php Join NBD PWG: http://bigdatawg.nist.gov/newuser.php Documents: http://bigdatawg.nist.gov/show_inputdoc.php Working Drafts Big Data Definitions and Taxonomies (M0142) f d Big Data Requirements (M0245) g y y q ( ) Big Data Security and Privacy Requirements (M0110) Big Data Architectures White Paper Survey M0151) Big Data Reference Architectures (M0226) Big Data Security and Privacy Reference Architectures (M0110) Big Data Technology Roadmap (M0087) NIST Big Data Workshop Slides: h http://bigdatawg.nist.gov/workshop.php b d k h h 28
http://bigdatawg.nist.gov 29
Next Steps/ Future Activities 30
ISO/IEC JTC 1 Big Data Study Group Terms and References 1. Survey the existing ICT landscape for key technologies and relevant standards/models/studies /use cases and scenarios for Big Data from JTC 1, ISO, IEC and other standards setting organizations, 2 Identify key terms and definitions commonly used in the area of Big Data, 2. Identify key terms and definitions commonly used in the area of Big Data 3. Assess the current status of Big Data standardization market requirements, identify standards gaps, and propose standardization priorities to serve as a basis for future JTC 1 work, and 4 Provide a report with recommendations and other potential deliverables to the 2014 JTC 4. 1 Plenary. Focus Areas: o Big Data Architecture/Infrastructure o Big Data Security & Privacy o Big Data Analytics o Big Data Applications & Tools o Big Data Management Meetings: three face to face meetings + telecon Meetings three face to face meetings + telecon o January US o April Europe o July Asia Format: 4 days (2 days workshop Format: 4 days (2 days workshop short papers, and 2 days meeting) short papers and 2 days meeting) Workshop papers will go into NIST Special Publication High quality papers may go into ACM Conference Proceeding (in process) Report due in September, 2014 31