CLOUDS + BIG DATA ANALYTICS IN INDUSTRY 4.0
Prof. Dr. (EMBA) Pauli Kuosmanen, CEO
DIGILE in a nutshell
DIGILE is the Strategic Center for Science, Technology and Innovation focusing on the Internet economy and related technologies and business.
Mission: DIGILE creates Internet economy competencies to enable new global business and job growth for DIGILE's stakeholders and partners.
Three main services:
- Research: cooperative national and international research programs to create new technological and business innovations
- Solutions: facilitation of business ecosystems and lead solution creation to explore new global business opportunities
- Digital service creation: FORGE Service Lab for fast digital service creation and competence scaling
Core enablers: international networking, operative excellence, co-creation leadership
Internet of Things research program
Billions of heterogeneous things will be seamlessly and securely integrated into the Internet, forming the basis of the IoT.
PROGRAM TARGETS:
- Formation of a sustainable IoT ecosystem in Finland and connecting it with the global ecosystem
- Producing IoT enablers and impact on standards
- Breakthrough in the global visibility of Finland's IoT research
BASIC FACTS:
- 2012-2016, 4-year program with an estimated budget of ~50M
- More than 350 experts involved from over 35 industry and research organizations (~50% SMEs)
- Leading company: Ericsson
WORK AREAS:
- Technologies for networking and communications
- IoT management (security, control, configuration)
- Services and applications development
- Human interactions
- Trials and demos
- Ecosystem development
Agile work method with 3 sprints per year
Internet of Things consortium
Data to Intelligence research program
Boost Finnish competitiveness through intelligent data processing technologies and services that add measurable value.
PROGRAM TARGETS:
- Develop new data-intensive services improving productivity, cost effectiveness, comfort or knowledge
- Design, implement and test user-centric automated services utilizing big data analytics
- Build re-usable, technology-independent service modules and technological components
BASIC FACTS:
- 2012-2016, estimated program budget (4 years): ~40M
- More than 60 consortium partners from industry and research organizations
- Leading company: Tieto
WHY:
- Need for common and scalable tools and applications for managing big data
- Forecasting the future with the help of big data analysis is essential for modern organizations
- Analyzing the data helps to optimize different functions and reduce costs
- Developing sensors, tools and services opens up business possibilities for small startups as well as bigger companies
Agile work method with 3 sprints per year
Cyber Trust research program
To restore privacy and trust in the digital world and to gain a global competitive edge in security-related business on the way towards 2019.
PROGRAM TARGETS:
- Proactive design for security: a new proactive model of information security driven by knowledge of vulnerabilities, threats, assets, and the motives and targets of potential adversaries
- Self-healing utilizing the toolbox: novel and effective tools and methods to cope with the challenges of a dynamic risk landscape through self-healing
- Public awareness increases trust: enable seamless integration of cyber security into everyday life
BASIC FACTS:
- 2015-2018, estimated program budget (3.5 years): ~50M
- 31 consortium partners from industry and research organizations
- Leading company: Bittium
RESEARCH THEMES:
- Security technology
- Security management and governance
- Situation awareness
- Resiliency
Agile work method with 3 sprints per year
China-Finland Strategic ICT Alliance - Enabling Joint International R&D&I
Launched in 2009 by the Ministry of Science and Technology (MOST), China, and the Ministry of Employment and the Economy (MEE), Finland.
Coordinators: DIGILE, the Finnish Strategic Centre for Science, Technology and Innovation in Internet Economy, appointed by Tekes, the Finnish Funding Agency for Technology and Innovation; and WiCO, the Shanghai Research Center for Wireless Communications, appointed by MOST.
April 2009: high-level workshops in Beijing and Shanghai to define a joint agenda (MOST, China: Minister Wan Gang; Finland: PM Matti Vanhanen).
Evolution roadmap:
- Phase I (2010-2012): project based, focusing on wireless network technology and services
- Phase II (2012-2015): wireless emphasis, but extended to service-enabling technologies and to application areas of joint interest; activities extended to thematic workshops, organizing ICT meetings for official delegations, etc.
- Phase III (2015- ): facilitating joint open platforms for international cooperation, including real-life testing and piloting; aiming at increasing industry, SME & start-up and multi-stakeholder involvement in R&D&I
Cooperation topics and participants in the 2012-2015 period
Topics:
- Everyday Sensing (cloud, IoT, data analysis, and social media)
- Sensing City Traffic
- Energy-Efficient Wireless Networks and Connectivity of Devices
- Finland's Enhanced Navigation using COMPASS/BeiDou Signals
- Green Campus - Finnish-Chinese Green ICT R&D&I Living Lab for Energy-Efficient, Clean and Safe Environments
- Green wireless access technology and research
- Vehicle-to-Vehicle communications: research into key technologies and demos
- Internet of Things and its application demonstrations
Participants:
- Finland: VTT, National Land Survey of Finland (Finnish Geodetic Institute), Aalto U., U. of Oulu, U. of Helsinki, Tampere U. of Technology.
- China: Chinese Academy of Sciences & SIMIT, WiCO, Shanghai Jiao Tong U., Tsinghua U., BUPT, Huazhong U. of Science and Technology (HUST), Southeast U. (SEU), U. of Electronic Science and Technology (UESTC), U. of Science and Technology of China (USTC), Harbin Institute of Technology (HIT), GNSS Research Center (GRC) at Wuhan U., Chinese Antarctic Center of Surveying and Mapping (CACSM), Tongji U., Dalian U. of Technology, Chongqing U., Xi'an U. of Architecture & Technology, and Shenyang Jianzhu U.
In addition to the universities, the projects involve several industry partners.
AND NOW TO THE ACTUAL CONTENT
Industry 4.0 - schematically (high level)
Sensors & Devices & Things <-> Connectivity & Transport & Protocols <-> Systems & Processes & People & Business
Bidirectional information flow: messaging and control
CLOUD AND FOG COMPUTING
Cloud computing has been the main trend for some time
And Big Data buddies say: "We don't know what we want to look at, so transmitting, storing & searching haystacks is the right thing to do"
Cloud computing limitations
- Connectivity to the cloud is a prerequisite of cloud computing. Many Industrial Internet of Things applications need to be able to work even when the connection is temporarily unavailable or degraded.
- Cloud computing assumes that there is enough bandwidth to collect the data. That can become too strong an assumption for Industrial Internet of Things applications.
- Cloud computing centralizes the analytics, thus defining a lower bound on the reaction time of the system. Many Industrial Internet of Things applications cannot wait for the data to travel to the cloud, be analyzed, and for the results to be returned.
- There are situations where sensor data cannot be transported across country boundaries for legal or regulatory reasons.
Some math
With 50 billion devices by 2020:
- Number of wireless devices (at 5%): 2.5 billion
- Data (assuming 4 GB / month): 10 000 000 TB / month
Even with more conservative estimates:
- Transmissions are enormous
- Data haystacks are enormous
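As a sanity check, the estimate above can be reproduced with a few lines of arithmetic. Note that the stated total of 10 000 000 TB/month follows from roughly 4 GB, not 4 MB, per device per month, so 4 GB is used here:

```python
# Back-of-envelope check of the device-data estimate (decimal units).
devices_total = 50e9             # projected connected devices by 2020
wireless_share = 0.05            # assume 5% are wireless data producers
gb_per_device_month = 4          # assumed traffic per device per month

wireless_devices = devices_total * wireless_share              # 2.5e9 devices
tb_per_month = wireless_devices * gb_per_device_month / 1000   # 1 TB = 1000 GB

print(f"wireless devices: {wireless_devices:.1e}")       # 2.5e+09
print(f"data volume: {tb_per_month:,.0f} TB/month")      # 10,000,000 TB/month
```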
Fog Computing
Fog computing means that the cloud is descending to be diffused among the client devices, often with mobility too: the cloud is becoming foggy.
Whereas the cloud is up there in the sky somewhere, distant and remote and deliberately abstracted, the fog is close to the ground, right where things are getting done.
Examples: Cisco Systems Fog Computing, Nokia Solutions and Networks Liquid Applications
Why Fog computing?
Traditional cloud architecture: Sensors / Devices / Endpoints connect directly to the data center / cloud, under the assumptions of almost 0 delay and almost infinite bandwidth.
Fog computing architecture: a Fog layer sits between the Sensors / Devices / Endpoints and the data center / cloud. The fog-to-cloud link is assumed to have variable delay and limited bandwidth, while the local link between the fog and the devices behaves as if it had 0 delay and infinite bandwidth.
2014 Internet of Things World Forum
Fog relies on scalable virtualization
Fog relies on technology components for scalable virtualization of the key resource classes:
- Computing: requires the selection of hypervisors in order to virtualize both the computing and I/O resources
- Storage: requires a virtual file system and a virtual block and/or object store
- Networking: requires the appropriate network virtualization infrastructure
Network virtualization technologies offer NaaS (Network as a Service):
- Separating functionality from capacity
- Virtualizing network appliances
- Providing a general interface to network resources
- E.g., Software Defined Networking technology decoupling the control and data planes
Similar to the cloud, fog leverages a policy-based orchestration and provisioning mechanism on top of the resource virtualization layer for scalable and automatic resource management.
The fog architecture exposes APIs for application development and deployment.
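As a rough illustration of policy-based orchestration on top of virtualized fog and cloud resources, the sketch below places a workload on the first node that satisfies a latency and capacity policy, preferring the fog tier. All node names, policy fields and thresholds are invented for illustration, not part of any real orchestrator API:

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class Node:
    name: str
    tier: str            # "fog" or "cloud"
    latency_ms: float    # round-trip latency to the data source
    free_cpu: int        # virtual CPUs available via the hypervisor layer

@dataclass
class Policy:
    max_latency_ms: float   # hard latency bound for the workload
    cpus_needed: int

def place(policy: Policy, nodes: List[Node]) -> Optional[Node]:
    """Pick a node that satisfies the policy, preferring the fog tier."""
    candidates = [n for n in nodes
                  if n.latency_ms <= policy.max_latency_ms
                  and n.free_cpu >= policy.cpus_needed]
    candidates.sort(key=lambda n: (n.tier != "fog", n.latency_ms))
    return candidates[0] if candidates else None

nodes = [Node("edge-gw-1", "fog", 5, 2),
         Node("dc-eu-1", "cloud", 80, 64)]
print(place(Policy(max_latency_ms=10, cpus_needed=1), nodes).name)    # edge-gw-1
print(place(Policy(max_latency_ms=200, cpus_needed=16), nodes).name)  # dc-eu-1
```

A latency-bound workload lands in the fog, while a compute-heavy one with a relaxed latency bound is provisioned in the cloud.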
Key questions
How to decide:
- Where to process the data
- Which data to forget/discard and which data to retain/remember
- How to aggregate the data to be sent to the cloud
- How to utilize the context (it may become more important than the data itself)
- How to make the system adaptive
- How to handle situations where the same data at the edge is used by many partners
The Analytics Platform Game:
- A closed product: 3rd parties do not add value
- Too open: no control, no money (Linux as an example)
- Your reward = (Your share) * (Industry value add)
One natural idea... use programmable gateways
- Hierarchically connecting sensors & devices
- Aggregating hybrid sources of data
- WAN transport over cellular, wireline & other
- A good spot in the chain for security best practices
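A minimal sketch of the gateway idea, assuming a Python-programmable gateway that buffers raw readings locally and ships only per-sensor aggregates over the WAN; the class and field names are illustrative:

```python
import statistics
from collections import defaultdict

class Gateway:
    """Toy programmable gateway: buffer locally, aggregate before WAN upload."""

    def __init__(self):
        self.buffer = defaultdict(list)

    def ingest(self, sensor_id: str, value: float) -> None:
        """Collect a raw reading locally (no WAN traffic yet)."""
        self.buffer[sensor_id].append(value)

    def flush(self) -> dict:
        """Aggregate per sensor and return one compact WAN payload."""
        payload = {sid: {"n": len(vals),
                         "mean": statistics.mean(vals),
                         "max": max(vals)}
                   for sid, vals in self.buffer.items()}
        self.buffer.clear()
        return payload

gw = Gateway()
for v in (20.1, 20.4, 21.0):
    gw.ingest("temp-1", v)
gw.ingest("vib-7", 0.9)
print(gw.flush())   # one small payload instead of four raw transmissions
```

Three temperature readings and one vibration reading become a single upstream message, which is exactly the bandwidth saving the fog argument relies on.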
Another natural idea... use peer-to-peer
In Fog computing, devices may communicate peer-to-peer to efficiently share/store data and take local decisions.
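A toy in-process sketch of this idea: peers push readings to their neighbours and each peer decides locally from the shared view, without any cloud round-trip. The threshold rule and all names are illustrative assumptions:

```python
class Peer:
    """Toy peer that shares readings with neighbours and decides locally."""

    def __init__(self, name: str):
        self.name = name
        self.readings = {}      # peer name -> last shared reading
        self.neighbours = []

    def connect(self, other: "Peer") -> None:
        self.neighbours.append(other)
        other.neighbours.append(self)

    def publish(self, value: float) -> None:
        """Store own reading and push it to direct neighbours."""
        self.readings[self.name] = value
        for peer in self.neighbours:
            peer.readings[self.name] = value

    def local_decision(self, limit: float) -> bool:
        """Decide locally: alarm if the shared average exceeds the limit."""
        avg = sum(self.readings.values()) / len(self.readings)
        return avg > limit

a, b = Peer("a"), Peer("b")
a.connect(b)
a.publish(80.0)
b.publish(60.0)
print(b.local_decision(limit=65.0))   # True: shared average is 70.0
```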
One more natural idea... use mobile agents
Mobile agents are computer programs that control their own execution and decide about their movement (i.e. migration) between hosting devices in networked systems.
A mobile agent includes its task (code or a reference to the code), some data that the task requires, and the current result of the task, which are transmitted as a single unit (i.e. a message).
Once a mobile agent has been injected into the system, it executes its task autonomously and asynchronously according to its application-specific migration policy.
When a device receives a mobile agent, it decodes the message into a runnable code unit, runs the code, updates the agent state, composes the agent back into a message and sends it onward according to its application-specific migration policy.
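The agent life cycle described above can be sketched as follows. This toy version packs the task code, its data and the running result into one JSON message and lets each hosting device decode, run and forward it; a real system would sandbox and authenticate agent code rather than exec it directly, and all names here are illustrative:

```python
import json

def make_agent(code: str, state: dict) -> bytes:
    """Compose task code + state into a single transmittable message."""
    return json.dumps({"code": code, "state": state}).encode()

def host_device(message: bytes, local_reading: float) -> bytes:
    """Decode the agent, run its task on local data, send it onward."""
    agent = json.loads(message.decode())
    scope = {"state": agent["state"], "reading": local_reading}
    exec(agent["code"], scope)            # runs the agent's task (unsandboxed!)
    agent["state"] = scope["state"]       # keep the updated result
    return json.dumps(agent).encode()

# Task: keep a running maximum of the readings seen while migrating.
task = "state['max_seen'] = max(state['max_seen'], reading)"
msg = make_agent(task, {"max_seen": 0.0})
for reading in (3.2, 7.9, 5.1):           # agent hops across three devices
    msg = host_device(msg, reading)
print(json.loads(msg)["state"]["max_seen"])   # 7.9
```

The code, the data it needs and the current result travel together as one unit, and each hop is one decode-run-recompose cycle, mirroring the description above.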
BIG DATA STORAGE AND PROCESSING
Industrial companies are Big Data companies
Not just Big Iron companies:
- The variety of data types is increasing
- Real-time demands are increasing
- IT and OT are converging fast
-> A specific data architecture is needed
Data Warehouse or Data Lake?
Data Warehouse: a central repository of integrated data from one or more disparate sources. It stores current and historical data and is used for creating trending reports for senior management, such as annual and quarterly comparisons. Properties:
- It represents an abstracted picture of the business, organized by subject area
- It is highly structured
- Data is not loaded into the data warehouse until the use for it has been defined
Data Lake: like a body of water in its natural state. Data flows from the streams (the source systems) to the lake. Users have access to the lake to examine it, take samples or dive in. Properties:
- All data is loaded from source systems; no data is turned away
- Data is stored at the leaf level in an untransformed or nearly untransformed state
- Data is transformed and a schema is applied to fulfill the needs of analysis
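The contrast can be sketched in a few lines: the warehouse path applies its schema on load and rejects data without a defined use, while the lake keeps every raw event and applies a schema only when an analysis needs it. The events and field names are invented for illustration:

```python
import json

raw_events = [
    '{"machine": "press-1", "temp_c": 71.3, "ts": "2015-06-01T10:00"}',
    '{"machine": "press-1", "vibration": 0.4, "ts": "2015-06-01T10:01"}',
]

# Data warehouse: only fields with a defined use are loaded (schema-on-write);
# the vibration event is turned away because it does not fit the schema.
warehouse_rows = []
for line in raw_events:
    event = json.loads(line)
    if "temp_c" in event:                          # schema defined up front
        warehouse_rows.append((event["machine"], event["temp_c"]))

# Data lake: everything is kept untransformed; a schema is applied later
# (schema-on-read), when a new analysis (here: vibration) is actually needed.
lake = [json.loads(line) for line in raw_events]
vibration_readings = [e["vibration"] for e in lake if "vibration" in e]

print(warehouse_rows)        # [('press-1', 71.3)]
print(vibration_readings)    # [0.4]
```

The vibration analysis is only possible on the lake side, because the warehouse discarded data whose use had not been defined at load time.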
Data Warehouse or Data Lake?
- Classical data analytics uses a datamart (a subset of a data warehouse) = a store of bottled water: cleansed, packaged and structured for easy consumption
- A Data Lake is a repository of unlimited amounts of data of any format, schema and type that is relatively inexpensive and massively scalable
Source: Pentaho CTO James Dixon (who has been credited with coining the term "data lake")
Data Warehouse or Data Lake?
- There's huge legacy value in relational databases for the purposes they are used for
- There's no relationship between the EDW and Hadoop right now; they are going to be complementary
- We're not going to get rid of RDBMS or MPP
- Use the right tool for the right job; that will very much be driven by price
SEMANTIC INTEROPERABILITY
Semantic interoperability
IIoT objects and different stakeholders need to be able to understand each other through semantic interoperability.
How to obtain it? The dynamic, heterogeneous and resource-constrained nature of the IIoT requires special design considerations to be taken into account to effectively apply semantic technologies in the IIoT.
Producing a generic solution on a global scale is a truly challenging task.
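As a toy illustration of the problem, the sketch below maps two stakeholders' local field names onto a small shared vocabulary, which is the essence of ontology-based mediation; all vocabularies and names here are invented for illustration:

```python
# Per-stakeholder mappings from local terms to a shared vocabulary
# (a real system would use ontologies and metadata models instead).
MAPPINGS = {
    "vendor_a": {"tmp": "temperature_c", "id": "device_id"},
    "vendor_b": {"temperatureCelsius": "temperature_c", "serial": "device_id"},
}

def to_shared(source: str, record: dict) -> dict:
    """Translate a stakeholder-specific record into shared terms."""
    mapping = MAPPINGS[source]
    return {mapping[k]: v for k, v in record.items() if k in mapping}

a = to_shared("vendor_a", {"tmp": 21.5, "id": "A-17"})
b = to_shared("vendor_b", {"temperatureCelsius": 21.5, "serial": "B-09"})
print(a)                                        # {'temperature_c': 21.5, 'device_id': 'A-17'}
print(b["temperature_c"] == a["temperature_c"]) # True: now directly comparable
```

Once both records speak the shared vocabulary, their readings are directly comparable; scaling this mediation to a dynamic, global IIoT is the hard part the slide refers to.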
Semantic web technologies or NLP technologies?
- Semantic web technologies: based on ontologies and metadata data models
- NLP technologies: based on natural language processing
SUMMARY
Schematically
Business - People - Processes
We need next-generation CDOs
Summary
- Fog and cloud computing are synergistic, not exclusive; the Industrial Internet of Things requires both
- Not all data needs to be stored
- The data lake is a concept with many promises, though hyped
- Enterprise Data Warehouses have huge legacy value; they should be used alongside data lakes with clear roles
- Semantic interoperability is a necessity for future innovations
- Platform economy effects can be achieved for analytics platforms