Hitachi Open Middleware for Big Data Processing



Jun Yoshida, Nobuo Kawamura, Kazunori Tamura, Kazuhiko Watanabe

OVERVIEW: The quantity of data being handled by corporations is growing rapidly, and the ability to utilize this big data effectively will be one of the keys to future corporate development. Examples include real-time processing, such as using sensor data to detect abnormalities in plant and machinery; batch-processing-style trend analysis, such as using sensor data collected over a long period to conduct failure analysis of plant and machinery; and improving the speed of daily batch processing, such as calculating sales and order totals or nightly batch processing of database updates. In response to these needs, Hitachi supplies platforms such as stream data processing middleware and parallel and distributed processing middleware to support the processing of big data. With data volumes expected to continue expanding rapidly, Hitachi is also working with The University of Tokyo and others on a research and development project for an ultra-high-speed database.

INTRODUCTION
We have entered an era of data explosion, in which the quantity of data handled by corporations is growing rapidly in response to factors such as advances in sensor technology and the widespread adoption of broadband and portable devices. Making effective use of large quantities of access log or sensor data in ways that can create new businesses is one of the keys to future corporate development. In existing systems, the time required to process batch jobs is becoming longer as the quantity of data increases, and this is starting to impinge on the time available for other services. Given this situation, new business value can be unlocked by quickly processing batch jobs that previously took several days to execute. Hitachi is working on research and development of open middleware technologies to facilitate the efficient processing of big data. This article gives an overview of big data processing and of Hitachi's open middleware, which is evolving to meet its requirements.

OPEN MIDDLEWARE FOR BIG DATA PROCESSING

Challenges to be Overcome by Big Data Processing
The technologies required for big data processing can be broadly divided into the following two categories, which represent challenges to be overcome:
(1) Real-time processing: technologies for processing an endless flow of big data, as in geospatial services or the use of sensor data to detect abnormalities in plant and machinery.
(2) Faster batch and tabulation processing: technologies for improving processing speed so that execution times do not lengthen as data volumes rise, when performing daily batch processing such as calculating sales and order totals or nightly batch processing of database updates.
To overcome these two challenges, progress is being made on software technologies that take advantage of developments in hardware. Stream data processing middleware is one example of a technology for achieving real-time processing (1). Taking advantage of the improved performance and decreasing cost of memory, this technology achieves high-speed, near-real-time performance by processing big data in memory. For faster batch and tabulation processing (2), meanwhile, a parallel and distributed processing technology that has attracted attention in recent years is the open source Hadoop*1 software. Hadoop takes advantage of the improved performance and lower cost of IA (Intel*2 architecture) servers to speed up batch processing by operating a large array of these servers in parallel.

*1 Hadoop is a trademark of the Apache Software Foundation.
*2 Intel is a trademark of Intel Corporation in the U.S. and/or other countries.
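To make the MapReduce style of parallel batch processing concrete, here is a minimal sketch of a Hadoop Streaming job that tallies sales totals per store from tab-separated transaction records. The record layout, file paths, and script name are illustrative assumptions, not part of any Hitachi product.

#!/usr/bin/env python3
# Hypothetical Hadoop Streaming sketch: tally sales per store from
# tab-separated records of the form "store_id<TAB>amount".
import sys

def mapper():
    # Emit "store_id<TAB>amount" for every input record.
    for line in sys.stdin:
        fields = line.rstrip("\n").split("\t")
        if len(fields) >= 2:
            print(f"{fields[0]}\t{fields[1]}")

def reducer():
    # Hadoop sorts mapper output by key, so all records for one store
    # arrive together; accumulate them and emit one total per store.
    current_store, total = None, 0.0
    for line in sys.stdin:
        store_id, amount = line.rstrip("\n").split("\t")
        if store_id != current_store:
            if current_store is not None:
                print(f"{current_store}\t{total}")
            current_store, total = store_id, 0.0
        total += float(amount)
    if current_store is not None:
        print(f"{current_store}\t{total}")

if __name__ == "__main__":
    mapper() if sys.argv[1] == "map" else reducer()

Such a script would be submitted with the standard Hadoop Streaming jar (for example, hadoop jar hadoop-streaming-*.jar -files sales.py -mapper "python3 sales.py map" -reducer "python3 sales.py reduce" -input /sales/daily -output /sales/totals); Hadoop itself splits the input, runs the mappers and reducers in parallel across the server array, and shuffles the intermediate keys.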
Prospects and Challenges for Hadoop
Hadoop allows high-speed batch processing to be performed with ease, without requiring operators to be concerned about such things as the complexities of parallel and distributed processing or how data is allocated. Hadoop is an open source program with good future prospects, and applications for it are being sought in corporate systems around the world. In one typical example, it is used to generate product recommendations from web access logs at a consumer web site to help motivate customer purchases. Beyond web access logs, other potential applications include the use of sensor data collected over a long period to conduct failure analyses of plant and machinery, or statistical analyses of geospatial data. However, while Hadoop simplifies the implementation of high-speed batch processing, its applications are limited. For example, it cannot run existing batch programs written in languages such as COBOL (Common Business Oriented Language); programs need to be rewritten to suit the Hadoop processing model. Other problems include a lack of flexibility in areas such as how data allocation is performed, and an inability to give a reliable indication of how long batch processing will take.

Supply of Open Middleware and Associated Services
Hitachi supplies a range of open middleware products for big data processing (see Fig. 1). For real-time processing, it offers stream data processing middleware. For faster batch and tabulation processing, Hitachi offers a support service for the open source Hadoop software. However, Hadoop does not suit all applications, and parallel and distributed processing middleware is available to cover tasks for which Hadoop is unsuited. Features of the parallel and distributed processing middleware include the ease with which existing batch processing can be migrated, flexibility in how data is allocated, and reliable batch processing completion times.

STREAM DATA PROCESSING MIDDLEWARE

Features
The stream data processing middleware is a middleware package for the real-time, in-memory processing of continuous flows of big data as they are generated. Tabulating and analyzing large quantities of data at high speed allows the timely detection of abnormal situations. The middleware uses scripts written in CQL (continuous query language), an extension of the SQL (structured query language) commonly used with databases, to provide an easy way to specify tabulation and analysis scenario definitions. This means that users familiar with SQL will have no trouble producing scenario definitions.

Fig. 1 Concepts Used in Big Data Processing Technologies for Business Systems. (POS: point of sale, DB: database, COBOL: Common Business Oriented Language)
Potential applications include using the stream data processing middleware for real-time processing such as the detection of abnormalities in plant and equipment, using the parallel and distributed processing middleware for highly reliable batch processing such as calculating sales and order totals, and using the open source Hadoop software for tasks such as trend analysis of web access logs.
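The CQL scenario definitions themselves are product-specific and are not reproduced here, but the kind of logic they express, a tabulation over a sliding time window with a monitoring condition attached, can be sketched in ordinary Python. The sensor name, window length, and threshold below are illustrative assumptions.

# Sketch of the windowed monitoring logic a CQL scenario definition
# expresses: keep the last 60 seconds of readings per sensor in memory
# and raise an alert when the window average exceeds a threshold.
# Sensor names, window length, and threshold are illustrative only.
from collections import defaultdict, deque

WINDOW_SECONDS = 60
THRESHOLD = 80.0

windows = defaultdict(deque)  # sensor_id -> deque of (timestamp, value)

def on_event(sensor_id, timestamp, value):
    # Process one incoming reading; called for every event in the stream.
    window = windows[sensor_id]
    window.append((timestamp, value))
    # Evict readings that have slid out of the time window.
    while window and window[0][0] <= timestamp - WINDOW_SECONDS:
        window.popleft()
    average = sum(v for _, v in window) / len(window)
    if average > THRESHOLD:
        print(f"ALERT {sensor_id}: average {average:.1f} over the last "
              f"{WINDOW_SECONDS} s exceeds {THRESHOLD}")

# Feed a few synthetic readings through the monitor.
for t, v in enumerate([70.0, 75.0, 85.0, 90.0, 95.0], start=1):
    on_event("pump-01", float(t), v)

In the actual middleware, the corresponding window and condition would be declared in a CQL script against a named stream, and the engine keeps the window entirely in memory, which is what gives the near-real-time performance described above.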

Fig. 2 Overview of Stream Data Processing (application examples shown: production monitoring, algorithmic trading, and geospatial services). The continuous flow of big data is processed in real time as it is generated, enabling applications such as geospatial services that perform real-time analysis of position data from moving objects. Monitoring conditions are easily defined as scenarios using CQL (continuous query language).

Example Applications
It is anticipated that the stream data processing middleware will be used for real-time processing in a wide range of different fields. In addition to the detection of abnormalities in plant and equipment and other maintenance services, potential examples include implementing corporate compliance requirements such as preventing unauthorized web access, algorithmic trading in which buy and sell orders are generated automatically based on an analysis of factors such as stock prices and turnover, and recommendation services that use position data from GPS (global positioning system) terminals (see Fig. 2). The stream data processing middleware has already been chosen by a Japanese exchange to provide a service for publishing market indices, where it delivers a high-speed distribution service with world-leading levels of performance. The system tracks fluctuations in the prices of the stocks that make up an index, and updates and distributes the index at intervals on the order of milliseconds (compared with seconds previously). Examples of applications that take advantage of the middleware's advanced capabilities for analyzing time-series data are also increasing. One such application is proactive preventive maintenance of IT systems, which are becoming larger and more complex due to advances such as virtualization and cloud computing. This involves analyzing the large quantities of log data collected by these systems to look for correlations and other trends so that potential faults can be identified and prevented before they occur.

PARALLEL AND DISTRIBUTED PROCESSING MIDDLEWARE

Features
Because existing batch jobs at corporations have become black boxes, there are risks associated with making any changes to them. The parallel and distributed processing middleware is a middleware package that can reuse existing batch jobs and distribute them across multiple servers for faster parallel execution. Because a number of servers are used, if a fault occurs on one of them, its processing can be shifted to another server. Minimizing the scope of faults in this way significantly reduces recovery time (see Fig. 3). In addition to preventing any impact on other jobs due to overruns (tabulation jobs run as nightly batch processing taking longer than scheduled), the middleware can also ensure that scheduled execution times are achieved even when data quantities increase in the future due to business growth.

Example Applications
Beyond preventing overruns, the parallel and distributed processing middleware can create new business opportunities by shortening execution times compared to the past. One example is the tabulation of data from POS (point of sale) systems, which is normally run as daily batch processing to calculate sales totals. Making it possible to perform tabulation and analysis at one-hour intervals instead of overnight speeds up decision making, such as product procurement and placement.
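As a rough sketch of the parallelization idea behind this middleware (the product's own job scheduling, data allocation, and failover are not shown), the following splits POS records into per-store partitions and runs an unchanged tabulation routine on each partition concurrently. The record layout and worker count are illustrative assumptions.

# Hypothetical sketch of partitioned parallel batch tabulation: allocate
# POS records to partitions by store, run the same per-partition job on
# several worker processes, then merge the results.
from collections import defaultdict
from concurrent.futures import ProcessPoolExecutor

def tabulate_partition(records):
    # Existing-style batch logic: sum sales per product within one partition.
    totals = defaultdict(float)
    for product_id, amount in records:
        totals[product_id] += amount
    return dict(totals)

def run_parallel(pos_records, workers=4):
    # Allocate records by store so each partition can be processed independently.
    partitions = defaultdict(list)
    for store_id, product_id, amount in pos_records:
        partitions[store_id].append((product_id, amount))
    # Execute the partitions in parallel and merge the partial totals.
    merged = defaultdict(float)
    with ProcessPoolExecutor(max_workers=workers) as pool:
        for partial in pool.map(tabulate_partition, partitions.values()):
            for product_id, amount in partial.items():
                merged[product_id] += amount
    return dict(merged)

if __name__ == "__main__":
    sample = [("s1", "p1", 100.0), ("s1", "p2", 50.0), ("s2", "p1", 30.0)]
    print(run_parallel(sample))

The middleware described in this article adds what the sketch leaves out: if one server fails, only its partitions need to be shifted to another server and rerun, which is the basis for the shorter recovery times and reliable completion times described above.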

Fig. 3 Overview of Parallel and Distributed Processing (showing integrated operation and management of job scheduling and execution monitoring, parallel execution of jobs in the application layer, and distributed data access in the data layer). The parallel and distributed processing middleware uses parallel and distributed processing across multiple computers to speed up batch processing. Features include the ease with which existing batch processing can be migrated, flexibility in how data is allocated, and reliable batch processing completion times.

Improving the speed and reliability of all aspects of batch jobs delivers major advantages in mission-critical fields, such as databases for systems that handle big data, situations in which it is desirable to complete tabulation processing within a designated time, and settlement and bank transfer applications in the finance industry that require strict exclusive control. The parallel and distributed processing middleware is also recognized as an ideal solution for speeding up batch jobs that use existing COBOL programs, many of which remain in use in core business systems. It has been demonstrated that the level of program modification required when migrating COBOL programs to the parallel and distributed processing middleware environment is only 1%.

FUTURE-ORIENTED RESEARCH AND DEVELOPMENT: ULTRA-HIGH-SPEED DATABASE
Large petabyte-class databases will be required if data quantities continue to increase rapidly in the future. Unfortunately, existing commercial databases take a long time to process such large amounts of data and are approaching the point where they can no longer keep up. Accordingly, Hitachi, Ltd. is working with The University of Tokyo and others on the research and development project entitled "Development of the Ultra-high-speed Database Engine for the Era of Super-large Databases and Demonstration & Evaluation of Strategic Social Services with This Database Engine at Their Heart," which is supported by the Funding Program for World-leading Innovative R&D on Science and Technology (FIRST). This project is developing an ultra-high-speed database engine that operates on a new principle, called the out-of-order execution principle, devised by The University of Tokyo. The aim is to enhance industrial competitiveness and help provide safety and peace of mind, with potential applications including the development of products tailored to specific needs based on an understanding of the customer's lifestyle and life stage, or assisting with quality assurance and more efficient inventory management through the use of traceability in production and distribution.

CONCLUSIONS
This article has given an overview of big data processing and of Hitachi's open middleware, which is evolving to meet its requirements. In addition to supplying maintenance support and products designed to use these technologies, including those for stream data processing and parallel and distributed processing, Hitachi is a one-stop supplier of a range of different solutions, one of which is the Big Data Distributed Processing Assessment Service.

To get the best out of big data in business, it is necessary to deepen one's understanding of each of these technologies, and to pay attention to effectiveness evaluations when making decisions on which technology to use. Accordingly, Hitachi also offers a consulting service on how to analyze and utilize big data, and uses the PaaS (platform as a service) products of Hitachi Cloud Computing Solutions, on which the stream data processing middleware, the parallel and distributed processing middleware, or Hadoop are already configured, to provide a testing support service that assists customers with testing on an actual system in preparation for installation. Hitachi also intends to continue collaborating with The University of Tokyo on the development of an ultra-high-speed database engine, and to continue developing open middleware with the aim of processing big data efficiently and applying it in business.

REFERENCES
(1) "Distributed Processing of Big Data," http://www.hitachi.co.jp/Prod/comp/soft1/big_data/ (in Japanese).
(2) A. Arasu et al., "STREAM: The Stanford Stream Data Manager," IEEE Data Engineering Bulletin 26, No. 1 (Mar. 2003).
(3) "Welcome to Apache Hadoop!," http://hadoop.apache.org/
(4) "Development of the Fastest Database Engine for the Era of Very Large Database and Experiment and Evaluation of Strategic Social Services Enabled by the Database Engine," http://www.tkl.iis.u-tokyo.ac.jp/first/index.html (in Japanese).
(5) M. Kitsuregawa et al., "Vision and Preliminary Experiments for Out-of-Order Database Engine (OoODE)," DBSJ Journal 8, No. 1 (Jun. 2009) (in Japanese).

ABOUT THE AUTHORS

Jun Yoshida
Joined Hitachi, Ltd. in 1998, and now works at the Business and Technology Emerging Business Laboratory, Software Division, Information & Telecommunication Systems Company. He is currently engaged in market development of the Big Data business.

Nobuo Kawamura
Joined Hitachi, Ltd. in 1981, and now works at the Innovative R&D Project Center, Software Division, Information & Telecommunication Systems Company. He is currently engaged in the development of the ultra-high-speed database engine for the era of super-large databases through The University of Tokyo FIRST Program. Mr. Kawamura is a member of the Information Processing Society of Japan (IPSJ).

Kazunori Tamura
Joined Hitachi, Ltd. in 1991, and now works at the Network & Transaction Department, Software Division, Information & Telecommunication Systems Company. He is currently engaged in market development of the Stream Data Platform.

Kazuhiko Watanabe
Joined Hitachi, Ltd. in 1988, and now works at the Operating System Department, Software Division, Information & Telecommunication Systems Company. He is currently engaged in the parallel batch job project.