ISO/IEC JTC1 SC32. Next Generation Analytics Study Group

November 13, 2013
ISO/IEC JTC1 SC32 Next Generation Analytics Study Group

Title: Big Data Efforts
Author: Keith W. Hare
Project:
Status: Discussion Paper
References:

1 NIST Big Data Public Working Group

In June 2013, the US National Institute of Standards and Technology (NIST) created a Big Data Public Working Group (NBD-PWG). The NBD-PWG effort is broken down into five subgroups:

    Definition and Taxonomy
    Use Case and Requirements
    Security and Privacy
    Reference Architecture
    Technology Roadmap

Each of the subgroups met weekly using web conference technology and also worked by e-mail. Participants in these efforts included people from government, private corporations, consultants, and education. I have not seen a definitive list of participants.

The NIST Big Data Working Group completed a great deal of work by the end of September. A large number of papers have been posted at: http://bigdatawg.nist.gov/show_inputdoc.php

The summary presentations and the more detailed reports generated by the five subgroups for the September 30, 2013 meeting were:

Subgroup                   Title                                                     NBD-WG Number  NGA Number
                           Big Data PWG Overview Presentation                        M0259v1        NGA-008
Definition and Taxonomy    Definition and Taxonomy Subgroup Presentation             M0254v1        NGA-009
Definition and Taxonomy    Big Data Definitions and Taxonomies Working Draft         M0142v4        NGA-010
Use Case and Requirements  Use Case and Requirements Subgroup Presentation           M0261v1        NGA-011
Use Case and Requirements  Big Data Requirements Working Draft                       M0245v3        NGA-012
Security and Privacy       Security and Privacy Subgroup Presentation                M0257v2        NGA-013
Security and Privacy       Big Data Security and Privacy Requirements Working Draft  M0110v10       NGA-014
Reference Architecture     Reference Architecture Subgroup Presentation              M0255v1        NGA-015
Reference Architecture     Big Data Architectures White Paper Survey                 M0151v3        NGA-016
Reference Architecture     Big Data Reference Architectures Working Draft            M0226v8        NGA-017
Technology Roadmap         Technology Roadmap Subgroup Presentation                  M0256v1        NGA-018
Technology Roadmap         Big Data Technology Roadmap Working Draft                 M0087v7        NGA-019

NIST hosted a face-to-face Big Data Working Group meeting on Monday, September 30, 2013.
I was unable to attend because of other travel commitments. On October 29, 2013, Wo Chang of NIST e-mailed the following summary:

September 30, 2013 NIST Big Data Workshop Summary
Oct. 29, 2013

NIST hosted a Big Data Workshop for the NIST Big Data Public Working Group (NBD-PWG) on September 30, 2013. Over 70 industry, academia, and government representatives attended the workshop. The program schedule and presentations are available at: http://bigdatawg.nist.gov/workshop.php. The key objectives of this workshop were to present the NBD-PWG working draft documents and to plan future activities for the next three to six months. The report below is combined with the subgroup co-chairs meeting held on Oct. 21 and 23.

1. Working Drafts V1.0 Publication

The working draft documents were the result of over 140 hours of intensive debate over 3 ½ months of Web teleconferencing, with over 240 submitted documents and 51 general use cases (in addition, there are 9 security and privacy use cases). These are detailed use cases ranging from government operation to healthcare and life sciences; from deep learning and social media to security and privacy; from the research ecosystem to astronomy and physics and many more. From these use cases, the NBD-PWG has extracted 437 specific requirements and 35 general requirements.

The working draft documents included:
1. Big Data Definitions
2. Big Data Taxonomies
3. Big Data Use Cases and Requirements
4. Big Data Security and Privacy Requirements
5. Big Data Reference Architectures Survey White Paper
6. Big Data Reference Architectures
7. Big Data Security and Privacy Reference Architectures
8. Big Data Technology Roadmap

The goal is to publish the above documents as V1.0 under the NIST Special Publication within the next six months, including technical editing, external comments, and internal review. Due to the government shutdown of over two weeks, we have now shifted the working drafts publication process by a month. The new schedule is (some documents may complete earlier than others):

Oct. 31: drafts completion
Nov.: technical writer editing
Dec. to Jan.: RFI
Feb. to Mar.: adjudicate comments
Apr.: NIST internal review
May: publication

2. Next Steps for NBD-PWG

At the September 30 workshop (and earlier), there was overwhelming interest in mapping use cases (or patterns of unique scenarios) to the NBD Reference Architecture (NBD RA); see Section 3 for Big Data Workshop Brainstorming on Mapping Use Cases to NBD RA. However, the details of the approach and how deep the mapping would be were not fully discussed, which is why the subgroup co-chairs meeting was needed.

Use Cases/Patterns Mapping

Thanks to Chaitan for being willing to contribute three patterns from BDBC for us to use; two of them are readily available. In addition, Geoffrey will identify a few other simple unique scenarios (real-time streaming, batch processing, etc.) from the 51 use cases received. The idea is to get the use case submitters as collaborators in the mapping and reference implementation (see below) of the NBD RA with any necessary tools and technologies (under another potential collaboration organization, see below).

Action Item:
a. Chaitan will send two readily available use cases to bigdatasgc@nist.gov by Oct. 31.
b. Geoffrey will identify a few other unique big data scenarios from the 51 submitted use cases by Oct. 31.

Use Cases/Patterns Implementation (duration: six months)

An approach to validate the NBD RA is to collaborate with another consortium or alliance, such as the Research Data Alliance, to implement selected use cases using the NBD RA. The specific responsibilities of the NBD-PWG and the potential collaboration organization are:

NBD-PWG
a. Identify appropriate unique use case scenarios to map use cases onto our NBD RA
b. Invite NBD-PWG use case submitters to work closely with us to specify workflow between use case and NBD RA key components
c. Work with the potential collaboration organization to provide guidance on how to map use case scenarios onto the NBD RA
d. Identify high-level interface between NBD RA key components

Potential Collaboration Organization
a. Work with NBD-PWG to implement the identified unique use case scenarios using NBD RA with any desired necessary vendor solutions or technologies
b. Create a best practices implementation guide based on the NBD RA

Action Item:
a. Wo will continue to seek a potential collaboration organization such as RDA by mid Nov.

Refresh NBD subgroups with new tasks: new call for volunteers (sometime in early Nov.)

Since the commitment for last-call volunteers for subgroup co-chairs was from July to the end of September, and the new tasks require a new duration of commitment (six months), the NBD-PWG will try to issue another call for volunteers as subgroup co-chairs once the new charters are defined (early Nov.).
I strongly hope and encourage all existing subgroup co-chairs to re-apply. Most likely there will be biweekly meetings for the new tasks. The frequency can be adjusted up or down depending on the need.

Action Item:
a. NBD-PWG Co-Chairs will issue a new call for volunteers in early Nov.

3. Big Data Workshop Brainstorming on Mapping Use Cases to NBD RA

Abstract use cases and then develop a concept. Develop a specific pattern.
It is hard to move various use cases into a few well-defined (but monolithic) use cases.
Conceptually, you distribute across clusters, but you do need to communicate between clusters. Can we identify how that inter-process communication occurs?
There is quite a bit of complexity in the reference architecture. Can we identify underlying patterns, as they will not change?
Come up with patterns and principles that you can bubble up but that may require drilling down.

Because BD is so diverse, and applications are so different and fast moving (dynamic), going forward and in order to get performance, the applications are going to use different interfaces (based on their needs). Standardizing single interfaces may not be practical.
A lot of the use cases may be hybrid (with MapReduce, with column restore), with different patterns. Can we boil these down to specific patterns (synchronous vs. asynchronous) based on the specific problem being solved?
So, do we stay at the conceptual level and define patterns, e.g., real-time processing and batch processing, so we know what the patterns are?
Create patterns and specific data sources in terms of the 3Vs (or 5Vs). What exactly does this mean for a pattern? For these patterns, and for these kinds of sources, we can specify recommendations (relatively high-level).
The definition of a pattern is to specify the data provider and life cycle (what commands are coming from the orchestrator) and then map those to the Roadmap.
Create a conceptual representation of the use cases and then distribute it to a wider audience for input on whether the conceptual representation of a use case matches or fits the polled audience's scenarios.
Develop patterns based on the requirements that generalize requirements into specific implementable patterns. You can have patterns addressing real-time, batch processing, and/or hybrid.
Two patterns have emerged: data warehouse (traditional relational database) and deep analytics pipeline (ingestion of real-time stream, or batch processing, or machine learning). There is a 3rd proposal from Intel on real-time analytics.
    o Transaction Processing Council, e.g., there is one dealing with machine learning, TPC Machine Learning

2 JTC/1 Big Data Study Group

The recently concluded JTC1 Plenary accepted a resolution establishing a JTC1 level study group on Big Data:

Resolution 27 - Establishment of a JTC 1 Study Group on Big Data

Recognizing that Big Data:
    Has been identified by SWG Planning as an important future area for JTC 1 focus,
    Is a topic of consideration within SC 32 as reported to the Plenary, and
    Continues to be of interest to other JTC 1 Subcommittees including SC 27, SC 34 and SC 38

JTC 1 establishes a Study Group on Big Data for consideration of Big Data activities across all of JTC 1 with the following terms of reference:

Terms of Reference
1. Survey the existing ICT landscape for key technologies and relevant standards/models/studies/use cases and scenarios for Big Data from JTC 1, ISO, IEC and other standards setting organizations,
2. Identify key terms and definitions commonly used in the area of Big Data,
3. Assess the current status of Big Data standardization market requirements, identify standards gaps, and propose standardization priorities to serve as a basis for future JTC 1 work, and
4. Provide a report with recommendations and other potential deliverables to the 2014 JTC 1 Plenary.

Membership in the SG on Big Data is open to:
1. JTC 1 National Bodies, JTC 1 Liaisons and approved JTC 1 PAS Submitters;
2. JTC 1/SCs, JTC 1/WGs, relevant ISO and IEC TCs;
3. Members of ISO and IEC central offices; and
4. Invited standards setting organizations that are engaged in Big Data standardization as approved by the SG on Big Data.

JTC 1 instructs its Secretariat to issue a call for participation in the Study Group.

JTC 1 accepts the offer from the US of Wo Chang to serve as Convenor for the JTC 1 Study Group on Big Data.

Unanimous

I have worked with Wo Chang on the USA NIST Big Data Working Group. I think he is an excellent choice for convenor of the JTC1 study group. I have not yet seen a call for participation, but it should be along soon. I expect that many of you will be interested in participating in this effort.

I've given Wo Chang the dates for the June 2014 SC32 Plenary. I do not know yet if he will be able to attend that meeting.

3 SC32 Next Generation Analytics Study Group

Where does the SC32 Next Generation Analytics Study Group fit into all of this?

If you read the NIST documents, in particular Big Data Technology Roadmap Working Draft, M0087v7, you will note that the standards discussion looks very similar to sections of SC32N2388b, the NGA Study Group paper from the 2013 SC32 plenary. This similarity exists because I wrote the standards section of the Big Data Technology Roadmap Working Draft and incorporated input from SC32N2388b.

I expect that when the JTC1 Big Data Study Group ramps up, it will make sense to incorporate the SC32 NGA Study Group efforts into the JTC1 study group. However, we can do useful work before that ramp-up.
I also expect that the output from the NIST Big Data Public Working Group will influence the JTC1 study group. I recommend looking carefully at Big Data Reference Architectures Working Draft, M0226v8. My opinion is that the reference architecture in M0226v8 is too general and at too high a level. I think we need a model more like the Korean model incorporated into SC32N2388b.