Big Data and the Internet of Things - Chris Greer
Big Data and the Internet of Things. Outline: Overview of NIST; Internet of Things; Big Data; Public Working Group
NIST Bird's-Eye View: The National Institute of Standards and Technology (NIST) is where Nobel Prize-winning science meets real-world engineering. With an extremely broad research portfolio, world-class facilities, national networks, and an international reach, NIST works to support industry innovation, our central mission. (Photo courtesy HDR Architecture, Inc./Steve Hall, Hedrich Blessing)
NIST: Basic Stats and Facts. Major assets: ~3,000 employees; ~2,800 associates and facility users; ~1,300 field staff in partner organizations; two main locations: Gaithersburg, Md., and Boulder, Colo. Nobel Prize winners: 1997, 2001, 2005, 2007, 2013. (Photo: R. Rathe)
Outline: Working with NIST on Cyber Physical Systems; Overview of NIST; Internet of Things; Big Data; Public Working Group
Internet of Things: "If we had computers that knew everything there was to know about things, using data they gathered without any help from us, we would be able to track and count everything, and greatly reduce waste, loss and cost." Kevin Ashton, "That 'Internet of Things' Thing," RFID Journal, July 22, 2009
Internet of Things: What are the defining characteristics of the Internet of Things? Scale, capability, and reach.
Internet of Things - Scale. Devices connected to the Web: 1970: 13; 1980: 188; 1990: 313,000; 2000: 93,000,000; 2010: 5,000,000,000; 2020 (projected): 31,000,000,000. Source: Intel
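As a rough worked example, the device counts above can be turned into growth rates. The sketch below assumes that a compound annual growth rate (CAGR) is a reasonable summary of each decade. The helper name `cagr` is hypothetical, and the 2020 figure is a projection, not a measurement.

```python
# Device counts from the slide (source: Intel): year -> devices connected to the Web.
# The 2020 value is a projection.
devices = {
    1970: 13,
    1980: 188,
    1990: 313_000,
    2000: 93_000_000,
    2010: 5_000_000_000,
    2020: 31_000_000_000,
}


def cagr(start: float, end: float, years: int) -> float:
    """Hypothetical helper: the constant yearly rate r with start * (1 + r)**years == end."""
    return (end / start) ** (1 / years) - 1


years = sorted(devices)
for a, b in zip(years, years[1:]):
    factor = devices[b] / devices[a]           # growth over the decade
    rate = cagr(devices[a], devices[b], b - a)  # equivalent yearly growth
    print(f"{a}-{b}: x{factor:,.1f} over the decade, about {rate:.1%} per year")
```

For the 2010-2020 step, a 6.2-fold increase over ten years works out to roughly 20% growth per year.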
Internet of Things - Capability. Intel Edison: "It's a full Pentium-class PC in the form factor of an SD card." (Intel CEO Brian Krzanich)
Internet of Things - Reach [Figure: the virtual and physical worlds; panels labeled "Internet" and "Internet of Things"]
Cisco Internet of Everything
IBM Smarter Planet: "For five years, IBMers have been working with companies, cities, and communities around the world to build a Smarter Planet. We've seen enormous advances, as leaders have begun using the vast supply of Big Data to transform their enterprises and institutions." (IBM) Source: http://www.ibm.com/smarterplanet/global/files/us en_us overview win_in_the_ era_of_smart_op_ad_03_2013.pdf
GE Industrial Internet: "New GE technology merges big iron with big data to create brilliant machines. This convergence of machine and intelligent data is known as the Industrial Internet, and it's changing the way we work." Source: http://www.ge.com/stories/industrial-internet
Big Data - Volume. IDC, April 2014, "The Digital Universe of Opportunities": From 2013 to 2020, the digital universe will grow by a factor of 10, from 4.4 trillion gigabytes to 44 trillion. It more than doubles every two years. In 2014, the digital universe will equal 1.7 megabytes a minute for every person on Earth. Data from embedded systems will grow from 2% of the digital universe in 2013 to 10% in 2020. In 2013, the available storage capacity could hold just 33% of the digital universe; by 2020, it will be able to store less than 15%. Source: IDC Corporation, http://idcdocserv.com/1678, sponsored by EMC
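The IDC figures can be cross-checked with a little arithmetic. The sketch below multiplies each year's total by the embedded-systems share. It also derives the doubling time implied by a tenfold increase over seven years. The constant and function names are hypothetical, and the inputs are only the figures quoted above.

```python
import math

# Figures quoted from IDC (April 2014); volumes are in trillions of gigabytes.
TOTAL_2013 = 4.4
TOTAL_2020 = 44.0
EMBEDDED_SHARE_2013 = 0.02
EMBEDDED_SHARE_2020 = 0.10


def embedded_volume(total: float, share: float) -> float:
    """Hypothetical helper: volume of embedded-system data given the total and its share."""
    return total * share


emb_2013 = embedded_volume(TOTAL_2013, EMBEDDED_SHARE_2013)  # about 0.088 trillion GB
emb_2020 = embedded_volume(TOTAL_2020, EMBEDDED_SHARE_2020)  # about 4.4 trillion GB
embedded_growth = emb_2020 / emb_2013                        # about 50x

# A constant growth rate that yields 10x over 7 years implies this doubling time (years).
years = 2020 - 2013
doubling_time = years * math.log(2) / math.log(TOTAL_2020 / TOTAL_2013)

print(f"Embedded data: {emb_2013:.3f} -> {emb_2020:.1f} trillion GB (x{embedded_growth:.0f})")
print(f"Implied doubling time of the digital universe: {doubling_time:.2f} years")
```

Under these figures, embedded-system data grows about 50-fold, from roughly 0.088 to 4.4 trillion gigabytes. A tenfold rise over seven years corresponds to a doubling time of roughly 2.1 years, which is close to IDC's two-year characterization.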
Big Data - Volume [Chart: Information created vs. available storage, petabytes worldwide, 2005-2010] Source: John Gantz, IDC Corporation, The Expanding Digital Universe
Big Data - Velocity. Sloan Digital Sky Survey: 140 terabytes, 2000 to present. LSST (Large Synoptic Survey Telescope): expect 140 terabytes every 5 days. Square Kilometer Array: expect 140 terabytes every 3 seconds. On LSST: "Suspended between its vast mirrors will be a three billion-pixel sensor array, which on a clear winter night will produce 30 terabytes of data. In less than a week this remarkable telescope will map the whole night sky. And then the next week it will do the same again, building up a database of billions of objects and millions of billions of bytes." Nature 440:383
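Converting the quoted volumes to average data rates puts the instruments on a common footing. This is a back-of-the-envelope sketch. It assumes decimal (SI) terabytes and uniform accumulation over each interval, and the function name is hypothetical.

```python
# The same 140 TB volume quoted on the slide, accumulated over very different intervals.
TB = 1e12               # assumption: decimal terabyte, in bytes
SECONDS_PER_DAY = 86_400


def rate_bytes_per_second(volume_bytes: float, seconds: float) -> float:
    """Hypothetical helper: average rate needed to accumulate volume_bytes in seconds."""
    return volume_bytes / seconds


lsst = rate_bytes_per_second(140 * TB, 5 * SECONDS_PER_DAY)  # 140 TB every 5 days
ska = rate_bytes_per_second(140 * TB, 3)                     # 140 TB every 3 seconds

print(f"LSST: {lsst * SECONDS_PER_DAY / TB:.0f} TB/day ({lsst / 1e9:.2f} GB/s)")
print(f"SKA:  {ska / TB:.1f} TB/s")
print(f"SKA / LSST rate ratio: {ska / lsst:,.0f}")
```

The LSST figure comes to about 28 TB per day, which is close to the 30 TB per night quoted from Nature. The SKA rate is 144,000 times higher.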
Big Data - Variety Combining Structured and Unstructured Data
Big Data - Potential Value: "Using detailed survey data on the business practices and information technology investments of 179 large publicly traded firms, we find that firms that adopt DDD [data-driven decision making] have output and productivity that is 5-6% higher than what would be expected given their other investments and information technology usage." Brynjolfsson, Erik, Lorin M. Hitt, and Heekyung Hellen Kim, "Strength in Numbers: How Does Data-Driven Decision making Affect Firm Performance?" (April 22, 2011). http://dx.doi.org/10.2139/ssrn.1819486
Big Data - Limitations: "Good Data Won't Guarantee Good Decisions," Shvetank Shah, Andrew Horne, and Jaime Capellá, Harvard Business Review, April 2012. "At this very moment, there's an odds-on chance that someone in your organization is making a poor decision on the basis of information that was enormously expensive to collect." Analytical skills are concentrated in too few employees. IT needs to spend more time on the "I" and less on the "T." Reliable information exists, but it's hard to locate.
Big Data - Questions What are the attributes that define Big Data solutions? How is Big Data different from traditional data environments and related applications? What are the essential characteristics of Big Data environments? How do these environments integrate with currently deployed architectures? What are the central scientific, technological, and standardization challenges needed to accelerate the deployment of robust Big Data solutions?
NIST Big Data Public Working Group & Standardization Activities Wo Chang, NIST Robert Marcus, ET-Strategies Chaitanya Baru, UC San Diego http://bigdatawg.nist.gov
Big Data PWG - Charter: The focus of the NIST Big Data Public Working Group (NBD-PWG) is to form a community of interest from industry, academia, and government, with the goal of developing consensus definitions, taxonomies, secure reference architectures, and a technology roadmap. The aim is to create vendor- and technology-neutral, infrastructure-agnostic deliverables that enable big data stakeholders to select the best analytics tools for their processing and visualization requirements, while enabling value-add from big data service providers and data flow among stakeholders in a cohesive and secure manner.
Big Data PWG - Deliverables 1. Definitions 2. Taxonomies 3. Requirements & Use Cases 4. Security & Privacy Requirements 5. Architectures Survey 6. Reference Architecture 7. Security & Privacy Architecture 8. Technology Roadmap
Subgroups: The NBD-PWG comprises five subgroups: Definitions & Taxonomies; Requirements and Use Cases; Reference Architecture; Security and Privacy; Technology Roadmap.
Requirements & Use Cases: Geoffrey Fox, U. Indiana; Joe Paiva, VA; Tsegereda Beyene, Cisco. Scope (M0020): The focus is to form a community of interest from industry, academia, and government, with the goal of developing a consensus list of Big Data requirements across all stakeholders. This includes gathering and understanding various use cases from diverse application domains. Tasks: Gather input from all stakeholders regarding Big Data requirements. Analyze and prioritize a list of challenging general requirements that may delay or prevent adoption of Big Data deployment. Develop a comprehensive list of Big Data requirements.
Requirements and Use Case Subgroup: 51 Use Cases Received (http://bigdatawg.nist.gov/usecases.php). 1. Government Operations (4): National Archives & Records Administration, Census Bureau. 2. Commercial (8): Finance in Cloud, Cloud Backup, Mendeley (Citations), Netflix, Web Search, Digital Materials, Cargo Shipping (e.g. UPS). 3. Defense (3): Sensors, Image Surveillance, Situation Assessment. 4. Healthcare & Life Sciences (10): Medical Records, Graph & Probabilistic Analysis, Pathology, Bio-imaging, Genomics, Epidemiology, People Activity Models, Biodiversity. 5. Deep Learning & Social Media (6): Driving Car, Geolocate Images, Twitter, Crowd Sourcing, Network Science, NIST Benchmark Datasets. 6. Astronomy & Physics (5): Sky Surveys, Large Hadron Collider at CERN, Belle Accelerator II (Japan). 7. Earth, Environmental & Polar Science (10): Ice Sheet Scattering, Earthquake, Ocean, Earth Radar Mapping, Climate Simulation, Atmospheric Turbulence, Subsurface Biogeochemistry, AmeriFlux & FLUXNET gas sensors. 8. Energy (10): Smart Grid
Definitions & Taxonomies: Nancy Grady, SAIC; Natasha Balac, SDSC; Eugene Luster, R2AD. Scope (M0018): It is important to develop a consensus-based common language and vocabulary for the terms used in Big Data across stakeholders from industry, academia, and government. In addition, it is critical to identify the essential actors and their roles and responsibilities. Tasks: For Definitions: compile the terms used by all stakeholders regarding the meaning of Big Data from various standards bodies, domain applications, and diverse operational environments. For Taxonomies: identify key actors and their roles and responsibilities across all stakeholders, and categorize them into components and subcomponents based on their similarities and differences. Develop Big Data definitions and taxonomies documents.
Definitions and Taxonomies Subgroup Big Data consists of extensive datasets, primarily in the characteristics of volume, velocity and/or variety, that require a scalable architecture for efficient storage, manipulation, and analysis. Data Scientist is a practitioner who has sufficient knowledge of the overlapping regimes of expertise in business needs, domain knowledge, analytical skills and programming expertise to manage the end-to-end scientific method process through each stage in the Big Data lifecycle.
Reference Architecture: Orit Levin, Microsoft; James Ketner, AT&T; Don Krapohl, Augmented Intelligence. Scope (M0021): The goal is to enable Big Data stakeholders to pick and choose technology-agnostic analytics tools for processing and visualization on any computing platform and cluster, while allowing added value from Big Data service providers and the flow of data between stakeholders in a cohesive and secure manner. Tasks: Gather and study available Big Data architectures representing various stakeholders, data types, and use cases. Document those architectures using the Big Data taxonomies model, based on the identified actors and their roles and responsibilities. Ensure that the Big Data Reference Architecture and the Security and Privacy Reference Architecture correspond to and complement each other.
Reference Architecture Subgroup: Key documents: M0151 White Paper; M0123 Working Draft; M0039 Data Processing Flow; M0017 Data Transformation Flow; M0047 IT Stack
Security & Privacy: Arnab Roy, CSA/Fujitsu; Nancy Landreville, U. MD; Akhil Manchanda, GE. Scope (M0019): The focus is to form a community of interest from industry, academia, and government, with the goal of developing a consensus secure reference architecture to handle security and privacy issues across all stakeholders. This includes gaining an understanding of what standards are available or under development, as well as identifying which key organizations are working on these standards. Tasks: Gather input from all stakeholders regarding security and privacy concerns in Big Data processing, storage, and services. Analyze and prioritize a list of challenging security and privacy requirements that may delay or prevent adoption of Big Data deployment. Develop a Security and Privacy Reference Architecture that supplements the general Big Data Reference Architecture.
Security and Privacy Subgroup. Requirements Scope: Infrastructure Security; Data Privacy; Data Management; Integrity & Reactive Security. Requirements Use Cases Studied: Retail (consumer); Healthcare; Media; Government; Marketing. Architecture & Taxonomies: Privacy; Provenance; System Health.
Technology Roadmap: Carl Buffington, USDA/Vistronix; Dan McClary, Oracle; David Boyd, Data Tactic. Scope (M0022): The goal is to develop a consensus vision, with recommendations on how Big Data should move forward, by performing a thorough gap analysis of the materials gathered from all other NBD subgroups. As part of the recommendations, this includes setting standardization and adoption priorities based on an understanding of what standards are available or under development. Tasks: Gather input from the NBD subgroups and study the taxonomies for the actors' roles and responsibilities, the use cases and requirements, and the secure reference architecture. Gain an understanding of what standards are available or under development for Big Data. Perform a thorough gap analysis and document the findings. Identify possible barriers that may delay or prevent adoption of Big Data. Document the vision and recommendations.
Technology Roadmap Subgroup: Key document: M0087 Working Draft. Inputs from other subgroups: Definitions & Taxonomies; Requirements & Use Cases; Security & Privacy; Reference Architecture. Potential Standards Group with Big Data-related activities (M0035). [Diagram components: Capabilities & Technology Readiness; Decision Framework; Mapping & Gap Analysis; Big Data Strategies; Adoption; Implementation; Resourcing]
Subgroups Working Draft Outline. Contact: bigdatainfo@nist.gov. Website: http://bigdatawg.nist.gov. Join NBD-PWG: http://bigdatawg.nist.gov/newuser.php. Documents: http://bigdatawg.nist.gov/show_inputdoc.php. Working Drafts (under editing): Big Data Definitions & Taxonomies (M0142); Big Data Requirements (M0245); Big Data Security & Privacy Requirements (M0110); Big Data Architectures White Paper Survey (M0151); Big Data Reference Architectures (M0226); Big Data Security & Privacy Reference Architecture (M0110); Big Data Technology Roadmap (M0087). NIST Big Data Workshop Slides: http://bigdatawg.nist.gov/workshop.php
The SmartAmerica Challenge: Build an integrated Cyber-Physical Systems framework that allows the interconnection of test beds and interoperation through shared data and associated data analytics, for easy integration and accelerated adoption of CPS applications. "The Arpanet for CPS Innovation." Sokwoo Rhee and Geoff Mulligan, Presidential Innovation Fellows
SmartAmerica Participants. Industry: GE, IBM, Qualcomm, Intel, Schneider Electric, Philips, AT&T, UTRC, Boeing. Research/Educational Institutions: MIT, Harvard, UC Berkeley, Vanderbilt, U Penn, UCLA, Internet2, US Ignite, Massachusetts General Hospital. Government: NIST, NSF, DoT, DoD, DHS, Montgomery County.
The SmartAmerica Summit June 11, 2014 Washington, DC See: http://www.nist.gov/el/smartamerica.cfm
Thank you! Web Sites: bigdatawg.nist.gov www.nist.gov/el/smartamerica.cfm Contact: chris.greer@nist.gov