NTT DATA Big Data Reference Architecture Ver. 1.0




Big Data Reference Architecture is a joint work of NTT DATA and EVERIS SPAIN, S.L.U.

Table of Contents
Chap.1 Advance of Big Data Utilization
Chap.2 NTT DATA Big Data Reference Architecture
Chap.3 Use cases of Big Data Reference Architecture
  3.1. Forecast of variation of financial market index by using SNS data
  3.2. Automation Tool for System Development in Design Phase
  3.3. Real-time Bridge Monitoring System
  3.4. Traffic Congestion Control System
Chap.4 Challenges of Big Data utilization and the features of BDRA

Figures
Fig.1: Cases of Big Data use
Fig.2: NTT DATA Big Data Reference Architecture (BDRA)
Fig.3: Layers of NTT DATA Big Data Reference Architecture
Fig.4: Patterns of analysis scenario

Chap.1 Advance of Big Data Utilization

It has been said that the world will be filled with vast amounts of data, and that the utilization of Big Data will therefore drive the competitiveness of enterprises. In fact, according to estimates by the Ministry of Internal Affairs and Communications in Japan, the amount of data distributed in enterprises grew 8.7 times from 2005 to 2013. Use cases of data utilization include ad technology for Internet advertising and demand forecasting for individuals in the marketing domain, as well as accuracy improvement in design for the manufacturing industry and improvement of operational efficiency in the transportation industry in the business management and quality control domain (Figure 1).

The IoT (Internet of Things) is one of the most important subjects of Big Data utilization. When every product is connected to a network and equipped with sensors that capture its state, we can collect that information in real time from a remote location and control the product. New services that use this information in real time will soon follow. In response, many enterprises are working harder than before to build mechanisms for accumulating, analyzing, and utilizing Big Data.

Figure 1: Cases of Big Data use
Utilization domain | Use case
Marketing | DSP (Demand-Side Platform) for Internet advertising (Ad technology)
Marketing | Demand forecasting for individual produce management
Business management and quality control | Accuracy improvement in design/machining operations in the manufacturing industry
Business management and quality control | Forecasting and management of livestock growth conditions
Business management and quality control | Optimization of operation schedules using onboard GPS data and passenger counts
Source: Information and Communications in Japan, Ministry of Internal Affairs and Communications, Japan, 2014

Chap.2 NTT DATA Big Data Reference Architecture

Looking at the mechanisms of data utilization around the world, individual technologies have been provided, such as Hadoop, an infrastructure that supports distributed processing of large amounts of data, and CEP (Complex Event Processing), which supports real-time analysis. Furthermore, some of these technologies are distributed as open source software, so anyone can use them easily. However, the key to utilizing Big Data for business is not only gathering elemental technologies but also building a mechanism that fits the business purpose by promptly combining them, and then flexibly expanding and developing it. NTT DATA Group has therefore systematized the Big Data Reference Architecture (BDRA), drawing on its global experience of developing Big Data solutions (Figure 2). By using BDRA, we can define a policy for Big Data utilization that matches the purpose of each enterprise and the situation of its existing systems.

Figure 2: NTT DATA Big Data Reference Architecture (BDRA)

This section introduces the BDRA framework, which will help in understanding the use cases in the following chapter. The features of BDRA are discussed later. BDRA is composed of three platforms and seven layers. The first, the data platform, processes the various data for analysis and contains three layers: Information Gathering, Information Store, and Data Processing. The second, the analytics platform, provides the core functions for data utilization and contains two layers: Data Analytics and Information Utilization. The third, the management platform, handles overall management and contains two layers: Governance and Infrastructure (Figure 3).

Figure 3: Layers of NTT DATA Big Data Reference Architecture
Category | Layer | Overview
Data Platform | Information Gathering | This layer contains functions that gather data generated and stored in various data sources, such as web media, sensors, and databases, and convert it into a form that can be easily analyzed. It integrates different types of data through ETL, and improves reliability, availability, and accessibility through messaging/replication and through information sharing between different resources such as software and hardware.
Data Platform | Information Store | This layer contains database functions for flexibly storing and processing massive amounts of data. For example, it contains a distributed data store for processing massive amounts of data, an in-memory database for high-speed processing, and NoSQL databases for high scalability and flexibility.
Data Platform | Data Processing | This layer contains functions for high-speed processing of the massive amounts of data collected and for pre-processing data for analysis. For example, the core functions of a Big Data solution, such as distributed parallel processing for massive data volumes and complex event processing for high-speed processing, are in this layer.
Analytics Platform | Data Analytics | This layer contains functions for analyzing stored and collected data, such as correlation analysis, natural language analysis, and machine learning; text mining and data mining, for example, are in this layer. Moreover, BICLAVIS, an analytics method originally developed by NTT DATA, optimizes various analytical methods and applies them in multiple ways.
Analytics Platform | Information Utilization | This layer contains functions for decision support based on the results of analysis, such as data visualization, OLAP, and business process management.
Management Platform | Governance | This layer contains data management functions such as data quality control and data protection. It realizes data quality management through information lifecycle management, data profiling, master data management, and metadata management. For data protection, it contains security management and auditing.
Management Platform | Infrastructure | This layer contains functions for operation management and system management, in order to manage reliability, availability, performance, and scalability.
Note: Details about the BICLAVIS data analytics method are described in Chap.4, Challenges of Big Data utilization and the features of BDRA.
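To make the three-platform, seven-layer structure concrete, the following minimal Python sketch encodes the layer table as plain data and groups the functions a solution uses by layer. The platform and layer names come from the table above, and the function names are taken from the layer overviews and the use cases in Chap.3; the helper names `platform_of` and `map_solution` are hypothetical, and the sketch illustrates the layering rather than any part of BDRA itself.

```python
# Platforms and their layers, as listed in Figure 3.
BDRA = {
    "Data Platform": ["Information Gathering", "Information Store", "Data Processing"],
    "Analytics Platform": ["Data Analytics", "Information Utilization"],
    "Management Platform": ["Governance", "Infrastructure"],
}

# Functions named in the layer overviews and use cases, mapped to their layer.
FUNCTION_TO_LAYER = {
    "ETL": "Information Gathering",
    "Real Time Capture": "Information Gathering",
    "Distributed Data Store": "Information Store",
    "In-memory Database": "Information Store",
    "NoSQL": "Information Store",
    "Distributed Parallel Processing": "Data Processing",
    "Complex Event Processing": "Data Processing",
    "Rule Engine": "Data Processing",
    "Text Mining": "Data Analytics",
    "Data Mining": "Data Analytics",
    "Data Visualization": "Information Utilization",
    "OLAP": "Information Utilization",
    "Metadata Management": "Governance",
    "Data Profiling": "Governance",
}


def platform_of(layer: str) -> str:
    """Return the platform that owns a layer (hypothetical helper)."""
    for platform, layers in BDRA.items():
        if layer in layers:
            return platform
    raise KeyError(f"unknown layer: {layer}")


def map_solution(functions: list) -> dict:
    """Group the functions used by one solution by BDRA layer (hypothetical helper)."""
    by_layer = {}
    for fn in functions:
        by_layer.setdefault(FUNCTION_TO_LAYER[fn], []).append(fn)
    return by_layer


# The bridge monitoring case (3.3) combines functions from three layers.
print(map_solution(["Complex Event Processing", "Data Mining", "Data Visualization"]))
```

Keeping the model as data rather than classes makes it easy to check that every function a solution uses lands in exactly one layer.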

Chap.3 Use cases of Big Data Reference Architecture

This chapter introduces four main use cases of the BDRA described in the previous chapter.

3.1. Forecast of variation of financial market index by using SNS data

This section describes the Twitter sentiment index, which was developed through real-time analysis of a huge amount of Twitter data and revealed a relationship between a stock index and the sentiment expressed in tweets. Recently, the use of SNS data such as Twitter data has become popular in the financial sector in the United States, and demand for such utilization is also increasing in Japan. To meet this demand, NTT DATA and NTT DATA Mathematical Systems developed the Twitter sentiment index, a numerical indicator of the proportion of positive or negative sentiment expressed in stock-market-related tweets, obtained by extracting and analyzing Twitter data in real time. By extracting several million stock-related tweets posted over 35 months (from January 2011 to November 2013), we verified that there is a statistically significant correlation between the Twitter sentiment index and the Nikkei 225 volatility index.

The key points of Big Data utilization in this case are sustaining efficient real-time analysis and selecting analytical technologies. Real-time analysis requires a mechanism for quickly extracting the relevant data from a high volume of data. In addition, Japanese text takes longer to process than text in languages such as English: English separates words with spaces, whereas Japanese requires a process that identifies the smallest word units from context. Therefore, real-time tweet analysis is realized by Distributed Parallel Processing in the Data Processing layer and a Distributed Data Store in the Information Store layer (specifically, the Hadoop Distributed File System) of BDRA.
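The distributed parallel processing step can be pictured as a MapReduce-style job: each input split is mapped independently, the emitted pairs are grouped by key, and a reduce step sums them. The sketch below simulates those three phases sequentially in plain Python to count positive and negative tweets per day. The lexicon, the day labels, and the function names are hypothetical toy values, and the whitespace tokenization only works for English; Japanese text would need morphological analysis first, which is exactly the extra cost the text describes.

```python
from collections import defaultdict
from itertools import chain

# Hypothetical toy lexicon. Whitespace splitting only works for languages
# such as English; Japanese tweets would first need morphological analysis.
POSITIVE = {"rally", "gain", "bullish", "up"}
NEGATIVE = {"crash", "loss", "bearish", "down"}


def map_tweet(record):
    """Map phase: (day, text) -> ((day, label), 1) if the tweet leans one way."""
    day, text = record
    words = set(text.lower().split())
    pos, neg = len(words & POSITIVE), len(words & NEGATIVE)
    if pos > neg:
        yield (day, "pos"), 1
    elif neg > pos:
        yield (day, "neg"), 1


def shuffle(pairs):
    """Shuffle phase: group emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_counts(groups):
    """Reduce phase: sum the counts for each (day, label) key."""
    return {key: sum(values) for key, values in groups.items()}


def daily_counts(records, n_splits=2):
    # Each split stands in for a block of the input file that a separate
    # worker node would map in parallel; here the splits run sequentially.
    splits = [records[i::n_splits] for i in range(n_splits)]
    mapped = chain.from_iterable(map_tweet(r) for split in splits for r in split)
    return reduce_counts(shuffle(mapped))


tweets = [
    ("day1", "Nikkei crash today big loss"),
    ("day1", "bearish mood"),
    ("day2", "nice rally and gain"),
    ("day2", "flat day"),
]
print(daily_counts(tweets))  # {('day1', 'neg'): 2, ('day2', 'pos'): 1}
```

Because the map step looks at one tweet at a time and the reduce step only sums, the result does not depend on how the input is split, which is what lets a framework such as Hadoop spread the work across nodes.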
This case also integrates various technologies, such as Text Mining and Data Mining in the Data Analytics layer and a Rule Engine in the Data Processing layer. Furthermore, BICLAVIS, the data analytics method systematized by NTT DATA, assists in selecting efficient analysis methods. In this case, the Evaluation and Factor Analysis scenario pattern is used to evaluate the correlation between the Twitter sentiment index and the Nikkei 225 volatility index.
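The evaluation step can be sketched as computing a daily index from positive and negative counts and then measuring its Pearson correlation with a volatility series, together with the usual t statistic for testing "no correlation". The text does not give the exact index formula, so the net positive share (pos - neg) / (pos + neg) used here is an assumption, and the counts and volatility values are toy numbers, not the data behind the reported result.

```python
import math


def sentiment_index(pos: int, neg: int) -> float:
    """Assumed definition: net share of positive tweets, in [-1, 1]."""
    total = pos + neg
    return 0.0 if total == 0 else (pos - neg) / total


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length series."""
    n = len(xs)
    if n != len(ys) or n < 3:
        raise ValueError("need two series of equal length >= 3")
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("a constant series has no defined correlation")
    return sxy / math.sqrt(sxx * syy)


def t_statistic(r, n):
    """t statistic for H0 'no correlation'; compare against Student's t with n - 2 df."""
    if abs(r) >= 1.0:
        return math.copysign(math.inf, r)
    return r * math.sqrt((n - 2) / (1 - r * r))


# Toy daily (positive, negative) counts and volatility values, not real data.
counts = [(30, 10), (20, 20), (10, 30), (5, 35)]
volatility = [15.0, 18.0, 22.0, 25.0]
index = [sentiment_index(p, n) for p, n in counts]  # [0.5, 0.0, -0.5, -0.75]
r = pearson(index, volatility)
print(round(r, 3), round(t_statistic(r, len(index)), 2))
```

In practice the significance of r would be judged from the t statistic with n - 2 degrees of freedom over many more observations than this toy series.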

3.2. Automation Tool for System Development

This section describes a case in which we introduced an automation tool for system development, built on flexible data model construction and metadata management. NTT DATA provides TERASOLUNA, a total solution for open system development that delivers high-quality IT systems in a short time, as required by changes in the business environment such as the progress of globalization. As one of these solutions, we developed TERASOLUNA DS, which gathers information for system development, such as design information, and contributes to optimizing system development and quality assurance by checking the consistency of design documents and accumulating design know-how. TERASOLUNA DS provides various functions: automated checks of consistency and notation variability among design documents, accelerated full-text search of design documents and source code, analysis of the range of influence of specification changes, and support for entering design documents. It drastically improves productivity in the design phase by reducing reviews and helping to identify the range of influence of specification changes or bug occurrences.

The key points of Big Data utilization in this case are the complex schemas caused by document formats that differ from project to project, and the need to redesign the schema whenever a new document format is added. In this case, all documents are first converted into XML files by ETL processing in the Information Gathering layer and then stored in a NoSQL database. This yields a flexible, schema-independent data model. In addition, Metadata Management in the Governance layer makes it possible to automate consistency checks on design documents efficiently and accurately. The volume of design documents in large-scale system development is enormous, reaching 40,000 files and 400,000 pages. By managing metadata about these design documents, such as their structure, attributes, and recorded information, the system can check and analyze them more accurately than a manual review.
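The two steps of this case, normalizing differently formatted documents into one XML shape and then running a metadata-driven consistency check, can be sketched with Python's standard `xml.etree.ElementTree`. The project formats, field names, and the rule "every item on a screen design must be declared in a data definition" are hypothetical examples, and a plain list stands in for the NoSQL document store; this is not how TERASOLUNA DS is implemented.

```python
import xml.etree.ElementTree as ET

# Hypothetical field names: each project labels the same concepts differently.
FORMAT_MAPPINGS = {
    "project_a": {"kind": "doc_type", "name": "title", "items": "fields"},
    "project_b": {"kind": "Type", "name": "Name", "items": "Items"},
}


def to_xml(record: dict, fmt: str) -> ET.Element:
    """ETL step: normalize one design document into a common XML shape."""
    m = FORMAT_MAPPINGS[fmt]
    doc = ET.Element("document", kind=record[m["kind"]], name=record[m["name"]])
    for item in record[m["items"]]:
        ET.SubElement(doc, "item").text = item
    return doc


def check_consistency(docs: list) -> list:
    """Report items used on screen designs that no data definition declares."""
    defined = {
        item.text
        for d in docs if d.get("kind") == "data_definition"
        for item in d.iter("item")
    }
    problems = []
    for d in docs:
        if d.get("kind") != "screen":
            continue
        for item in d.iter("item"):
            if item.text not in defined:
                problems.append(f"{d.get('name')}: undefined item '{item.text}'")
    return problems


docs = [
    to_xml({"doc_type": "data_definition", "title": "Customer",
            "fields": ["customer_id", "customer_name"]}, "project_a"),
    to_xml({"Type": "screen", "Name": "CustomerEdit",
            "Items": ["customer_id", "customer_nmae"]}, "project_b"),
]
print(ET.tostring(docs[1], encoding="unicode"))
print(check_consistency(docs))
```

Adding a new document format only means adding an entry to the mapping table; the checking logic works on the common XML shape and its `kind`/`name` metadata, which is the schema independence the text describes.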

3.3. Real-time Bridge Monitoring System

This section describes a case of high-speed processing of massive amounts of data, based on the monitoring service for the Tokyo Gate Bridge in Japan and the Can Tho Bridge in Vietnam. Bridges and roads are social infrastructure that supports people's lives, and they are expected to be safe at all times. Road administrators are therefore required to detect defects or damage in bridges and, based on that, to decide how to manage road traffic or which routes remain available. NTT DATA continuously collects and analyzes various data in real time, such as the strain of bridge beams and piers, using sensors placed on the bridges.

The key point of Big Data utilization in this case is processing large amounts of sensor data in a short time. By using Complex Event Processing in the Data Processing layer of BDRA, the system can quickly analyze the sensor data of more than 100 bridges with just one server, which is needed in large-scale disasters when road administrators must monitor many bridges at once. The system combines technologies from several layers: Data Mining in the Data Analytics layer to extract abnormal patterns, and Data Visualization in the Information Utilization layer to present the detected anomalies clearly. Additionally, the BICLAVIS data analytics method developed by NTT DATA improves the accuracy of anomaly detection. The abnormal values detected in the sensor data include measurement failures caused by sensor malfunctions, as well as the effects of external forces such as high winds and earthquakes. To distinguish these abnormal values from actual defects in the bridges, we implemented pre-processing that removes low-frequency components, and decision logic that uses lag correlation based on the relative positions of the sensors.
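The pre-processing and lag-correlation logic can be sketched as follows, under assumptions the text does not spell out: low-frequency components are removed by subtracting a centered moving average (a simple high-pass filter chosen here for illustration), and the lag at which two neighboring sensors correlate best is found by scanning a small range of delays. The idea assumed here is that a real structural response reaches adjacent sensors with a consistent delay set by their positions, while a malfunction at one sensor does not show up at its neighbors. The signal, window, and lag range are toy values.

```python
import math


def remove_low_frequency(signal, window):
    """High-pass sketch: subtract a centered moving average (truncated at the ends)."""
    n, half = len(signal), window // 2
    out = []
    for i in range(n):
        lo, hi = max(0, i - half), min(n, i + half + 1)
        out.append(signal[i] - sum(signal[lo:hi]) / (hi - lo))
    return out


def correlation(xs, ys):
    """Pearson correlation; returns 0.0 if either series is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return 0.0 if sxx == 0 or syy == 0 else sxy / math.sqrt(sxx * syy)


def best_lag(a, b, max_lag):
    """Find the lag (in samples) at which b best follows a: maximize corr(a[t], b[t + lag])."""
    best = (0, -2.0)
    for lag in range(max_lag + 1):
        r = correlation(a[: len(a) - lag], b[lag:])
        if r > best[1]:
            best = (lag, r)
    return best


# Toy data: sensor B sees the same vibration as sensor A three samples later,
# and both readings carry a slow linear drift that the filter removes.
def f(i):
    return math.sin(0.7 * i) + 0.05 * i


a = [f(i) for i in range(200)]
b = [f(i - 3) for i in range(200)]
lag, r = best_lag(remove_low_frequency(a, 9), remove_low_frequency(b, 9), max_lag=5)
print(lag, round(r, 3))
```

Without the filtering step, the shared drift would inflate the correlation at every lag and hide the delay between the sensors.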
Anomaly detection uses BICLAVIS scenario patterns: Outlier Detection when the abnormal patterns can be defined, and False Detection when they are difficult to define.
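The two situations named above, abnormal patterns that can be written down versus ones that cannot, lead to two different kinds of detector. The sketch below contrasts a rule that matches a defined pattern (here a hypothetical rule: a run of consecutive readings above a limit) with a check for deviation from a baseline of normal behavior (a z-score test). The limits, baseline values, and threshold are illustrative assumptions, not the logic used in the bridge monitoring system.

```python
import statistics


def rule_alerts(values, limit, min_run):
    """Defined abnormal pattern (hypothetical rule): `min_run` consecutive readings above `limit`.

    Returns the index at which each run first reaches `min_run`.
    """
    alerts, run = [], 0
    for i, v in enumerate(values):
        run = run + 1 if v > limit else 0
        if run == min_run:
            alerts.append(i)
    return alerts


def deviation_alerts(values, baseline, z_threshold=3.0):
    """No pattern defined: flag readings more than z_threshold standard deviations from a normal baseline."""
    mu = statistics.mean(baseline)
    sigma = statistics.stdev(baseline)
    return [i for i, v in enumerate(values) if abs(v - mu) > z_threshold * sigma]


print(rule_alerts([1, 1, 5, 6, 7, 1, 1], limit=4, min_run=3))  # [4]
print(deviation_alerts([10.0, 10.2, 30.0, 9.9],
                       baseline=[10.0, 10.1, 9.9, 10.0, 10.2, 9.8]))  # [2]
```

The rule-based detector only finds what it was told to look for, while the deviation detector needs a trustworthy baseline of normal readings, which is one reason the pre-processing step matters.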

3.4. Traffic Congestion Control System

This section describes a case of easing traffic congestion by simulating massive amounts of data and using a prediction/control analytics model. Traffic congestion is one of the biggest problems in both developed and developing countries. Congestion causes environmental problems, such as fossil fuel consumption and CO2 emissions, as well as enormous losses of time and money. Many countries have a strong interest in reducing and easing traffic congestion; however, most measures are expensive, and the effectiveness of each measure is unclear. In addition, these measures tend to achieve only partial optimization rather than overall optimization. To solve this problem, NTT DATA developed a traffic simulation system that can evaluate the effectiveness of congestion-easing measures such as traffic light control and traffic restrictions. In a test in Jilin, China, the system ran a traffic simulation using GPS data collected from car navigation systems and smartphones in vehicles; applying the simulation results to ease congestion achieved a 27 percent improvement in bus service times. The technology is based on statistical traffic models of vehicles, roads, intersections, and traffic lights, and reproduces the traffic environment on a computer. It also makes it possible to control traffic lights with the green-light pattern that minimizes congestion: candidate patterns are produced by traffic simulation and evaluated by tuning the relevant parameters. Multi-agent simulation technology makes the many elements that compose the system operate in the computer and predicts future conditions. With this system, a traffic administrator can judge the effectiveness of traffic measures under various scenarios in advance. Traffic administrators can also identify the causes of current traffic conditions, such as roads and time slots that tend to cause congestion.

In this case, the key point of Big Data utilization is high-speed processing on the traffic simulation platform so that it can simulate a large volume of traffic. By utilizing Distributed Parallel Processing in the Data Processing layer, the system can handle over one million vehicles. In addition to Distributed Parallel Processing, the system combines functions from other layers of BDRA: Real Time Capture in the Information Gathering layer and Data Visualization in the Information Utilization layer. This enables the architecture to be developed efficiently and appropriately. Moreover, in this case we use the BICLAVIS analysis method for prediction and control, adopting the Risk Simulation scenario pattern.
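The idea of choosing a traffic light pattern by simulating candidates and keeping the best one can be illustrated on a much smaller scale. The sketch below is not a multi-agent simulation: it is a deliberately simple fluid-queue model of one two-phase intersection, swapped in to show the "generate candidate patterns, simulate each, compare" loop. The cycle length, arrival rates, discharge rate, and minimum green time are hypothetical parameters.

```python
def simulate(green_ns, cycle, rate_ns, rate_ew, duration, saturation=0.5):
    """Total vehicle-seconds spent queueing at one two-phase intersection.

    Fluid queue model in 1 s steps: vehicles arrive at rate_* per second, and
    the approach with green discharges up to `saturation` vehicles per second.
    North-south has green for the first `green_ns` seconds of every cycle.
    """
    q_ns = q_ew = 0.0
    delay = 0.0
    for t in range(duration):
        q_ns += rate_ns
        q_ew += rate_ew
        if t % cycle < green_ns:
            q_ns = max(0.0, q_ns - saturation)
        else:
            q_ew = max(0.0, q_ew - saturation)
        delay += q_ns + q_ew
    return delay


def best_split(cycle, rate_ns, rate_ew, duration=3600, min_green=10):
    """Evaluate every candidate green split by simulation and keep the best one."""
    candidates = range(min_green, cycle - min_green + 1)
    return min(candidates, key=lambda g: simulate(g, cycle, rate_ns, rate_ew, duration))


# Toy demand: north-south traffic is three times heavier than east-west.
print(best_split(cycle=60, rate_ns=0.3, rate_ew=0.1))
```

Each candidate is simulated independently, so at city scale the same search parallelizes naturally across workers, which is where distributed parallel processing comes in.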

Chap.4 Challenges of Big Data utilization and the features of BDRA

This chapter describes common issues of Big Data utilization found in the cases above, and the features of BDRA that address them.

Issues of Big Data utilization

(1) The combination of multiple IT infrastructure technologies
A single IT infrastructure technology is not enough when enterprises utilize Big Data; data gathering, data storage, data processing, and data analysis must be combined. In particular, it has become common to choose how to store data from among various technologies, including relational databases and NoSQL databases. It is therefore necessary to provide guidance on how to store data and how to use the stored data.

(2) Data analysis of various industries
Enterprises in various industries, such as finance, IT, and social infrastructure, utilize Big Data. Furthermore, data analysis methods are becoming more complicated as enterprises demand more advanced analysis results. To respond to these demands without slowing down the business, it is important to select the correct data analysis method.

(3) Assurance of the data quality
Data stored by enterprises was not originally intended for analysis, as mentioned in the case of the automation tool for system development. Therefore, whether the data is usable for analysis must be verified by data profiling. It is also important to manage the data lifecycle properly in order to get significant results from data analysis.

The features of BDRA

BDRA has the following features to solve the three problems above.

(1) Comprehensive framework to realize rapid and flexible technology integration
BDRA systematizes in-depth knowledge of Big Data utilization, together with knowledge of how to combine technologies. BDRA has verified combinations of vendor products and open source software (OSS), so it can support the selection of products from different vendors. Frequently used combinations of products are provided as sets, and it is also possible to select products that fit existing IT systems.

(2) BICLAVIS to realize a systematic and efficient approach to analysis
NTT DATA developed BICLAVIS, a cross-industry data analytics method, based on more than 200 data analysis cases. Data analysis work tends to depend on the individual analyst, while a wide range of industries and businesses seek data analysis. Therefore, NTT DATA built a mechanism to gather data analysis know-how and systematized analytics models so that this know-how can be used across industries. Specifically, we organize them as patterns of analysis scenarios based on the purpose of analysis, by categorizing and summarizing the purposes, procedures, and techniques of data analysis in scenario templates (Figure 4). This allows us to deliver results for requests from any industry.

Figure 4: Patterns of analysis scenario
Scenario pattern | Overview
Portent | Detect signs of structural change and situational change from Big Data.
Anomaly Detection | Automatically detect an abnormal pattern in real time and prompt an early crisis response through alerting.
False Detection | Detect an illegal or outlier situation that matches the definition of an abnormal pattern.
Outlier Detection | Detect a deviation from the standard or normal situation.
Prediction and Control | By clarifying the relation between cause and effect at work and estimating how the result changes when causes are manipulated, identify the appropriate standard for the cause.
Profit Simulation | Estimate the effect of work restructuring measures by simulation and prioritize them.
Risk Simulation | Assess risks by business modeling with uncertainties and prioritize them.
Optimization | Select the measures that maximize performance by an optimization method.
Risk Hedge | Support risk reduction with a risk diversification method.
Targeting | Extract the targets to approach, such as potential customers, in order to maximize cost-effectiveness.
Credit Control | Determine the default risk of individuals or the bankruptcy risk of enterprises.
Evaluation and Factor Analysis | Weigh up various objects and identify the factors.
Context Awareness | Recommend products and services in advance by analyzing behavior and preferences.
Process Trace | Extract the process of growth and development and identify its accelerators or inhibitors.
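Issue (3) in this chapter calls for verifying, by data profiling, whether stored data is usable for analysis before any cleansing is done. The following minimal sketch shows the kind of per-column summary a profiler produces: row count, missing values, distinct values, share of numeric values, and numeric range. The record layout and column names are hypothetical toy data; a value such as "n/a" in an otherwise numeric column is the kind of finding that would then motivate a cleansing rule.

```python
def profile(records, columns):
    """Summarize each column before any cleansing rule is written."""
    report = {}
    for col in columns:
        values = [r.get(col) for r in records]
        present = [v for v in values if v not in (None, "")]
        numeric = [v for v in present
                   if isinstance(v, (int, float)) and not isinstance(v, bool)]
        report[col] = {
            "rows": len(values),
            "missing": len(values) - len(present),
            "distinct": len(set(present)),
            "numeric_ratio": len(numeric) / len(present) if present else 0.0,
            "min": min(numeric) if numeric else None,
            "max": max(numeric) if numeric else None,
        }
    return report


# Hypothetical sensor records with a missing value, a text placeholder, and a duplicate id.
records = [
    {"id": 1, "temp": 20.5},
    {"id": 2, "temp": None},
    {"id": 3, "temp": "n/a"},
    {"id": 3, "temp": 21.0},
]
for column, stats in profile(records, ["id", "temp"]).items():
    print(column, stats)
```

Here the profile shows that `id` is not unique and that one third of the present `temp` values are not numeric, both of which should be resolved by explicit rules before analysis.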

(3) Governance layer to establish Big Data governance
BDRA has abundant governance functions required for the utilization of Big Data, such as improving data reliability and security. As the saying "garbage in, garbage out" suggests, inaccurate data produces only meaningless results. In particular, among the functions for improving data reliability, BDRA specifies that data profiling be performed before data cleansing and before rules are fixed. A system for managing master data has also been verified. Furthermore, various functions are provided from the viewpoint of security in order to protect data. Personal data has recently been the subject of much discussion, so security is necessary to utilize Big Data with peace of mind. BDRA provides security management as a series of methods with various audit points, such as IT audit, information security audit, and Data-Centric Audit and Protection (DCAP).

As described above, BDRA aggregates know-how about various architectures and technology integration for the utilization of Big Data. NTT DATA has so far provided many architectures based on BDRA, and will continue to improve it.

NTT DATA Corporation
Toyosu Center Building, 3-3, Toyosu 3-chome, Koto-ku, Tokyo 135-6033, Japan
http://www.nttdata.com
The display of the (TM) mark or the (R) mark may be omitted in this paper.