Development and Runtime Platform and High-speed Processing Technology for Data Utilization
Hidetoshi Kurihara, Haruyasu Ueda, Yoshinori Sakamoto, Masazumi Matsubara

Dramatic increases in computing power and network speed, along with advances in sensing technology, have broadened the range of devices that can connect to the Internet. Large quantities of diverse, time-sequenced data are flowing in from the Web and from countless sensors, and there is strong demand to rapidly and efficiently extract valuable information from that data and to apply it in various types of navigation systems. Fujitsu Laboratories envisions a Human-Centric Intelligent Society and is building a cloud platform to support it. This article introduces an integrated development and runtime platform technology for utilizing large quantities of data, a technology for extracting parallelism from complex event processing as a base technology for high-speed processing, and a distributed-parallel complex event processing technology, and it describes the goals, features, and effects of each. It also gives an overview of directions for future technology initiatives.

1. Introduction

Fujitsu is focusing on the value of data arising from the activities of objects and of people. It is important to derive new value from the large amounts of diverse data that, until now, were either not collected or were collected but never used. There is increasing expectation that, with advances in ICT, such data can help address the challenges facing urban areas, such as easing traffic congestion, using energy resources effectively, and reducing medical costs. However, a combination of human intelligence, ICT infrastructure, and a rapid cycle of hypothesizing and testing is required to extract the value of data. The ICT infrastructure should have a simple interface for handling data and should enable flexible system scaling, from small start-up to full service deployment.
Fujitsu Laboratories envisions a Human-Centric Intelligent Society and, with such a society in mind, is conducting research with the goal of connecting and sharing information to expand ICT application fields and create new markets.1) This article first gives an overview of the development technologies, identifying inherent issues and how they can be addressed. It then describes three technologies, including potential applications and their characteristics: an integrated development/runtime platform, parallelism extraction from complex event processing, and distributed-parallel complex event processing. Finally, the effects of these technologies and the technical directions that need to be addressed in the future are discussed.

2. Overview of development technologies

Figure 1 illustrates the circular nature of data utilization. It starts with batch-processing analysis of the collected data. Features of the data are then detected in real time using knowledge and rules obtained from the analysis. These two steps are then repeated. An example use case is a service that analyzes tendencies in point-of-sale (POS) data and uses the results to issue coupons to particular customer segments when they approach a certain location. Since all the details of such a service cannot be established at service start-up, they must be refined in a cycle of hypothesizing and testing. Maintaining real-time performance as the service scale expands requires high-speed data processing using cloud or other computing resources that can be increased easily.

FUJITSU Sci. Tech. J., Vol. 50, No. 1 (January 2014)
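The cycle described above can be sketched in a few lines: a batch step derives a rule from accumulated POS data, and a real-time step applies that rule to incoming location events. This is a minimal illustrative sketch; the function names and the rule itself are assumptions made for this example, not part of the actual platform.

```python
# Minimal sketch of the data-utilization cycle (illustrative names only):
# batch analysis produces rules; real-time detection applies them.

def batch_analysis(pos_records):
    """Batch step: derive a simple rule from accumulated POS data.
    Here the (assumed) rule: customers who bought an item more than
    twice qualify for a coupon for that item."""
    counts = {}
    for customer_id, item in pos_records:
        counts[(customer_id, item)] = counts.get((customer_id, item), 0) + 1
    return {cid: item for (cid, item), n in counts.items() if n > 2}

def real_time_detect(rules, event):
    """Real-time step: apply the derived rule to a location event."""
    customer_id, near_shop = event
    if customer_id in rules and near_shop:
        return f"coupon:{rules[customer_id]}:{customer_id}"
    return None

rules = batch_analysis([("c1", "coffee"), ("c1", "coffee"),
                        ("c1", "coffee"), ("c2", "tea")])
decision = real_time_detect(rules, ("c1", True))
```

Refining the service then means re-running the batch step with new hypotheses and swapping the resulting rules into the real-time step, without stopping it.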
[Figure 1: Overview of development and runtime technology. It shows 1) integrated development and runtime platform technology, 2) CEP parallelism extraction technology, and 3) distributed-parallel CEP technology connecting a real-time CEP system with a Hadoop-based batch system (up to ~1000 machines) in a hypothesizing and testing cycle.]

Such a service would utilize large amounts of POS and customer location data, so an ICT infrastructure must be used that includes a batch-processing system such as Hadoop (an open-source parallel batch-processing infrastructure) and a real-time processing system such as a complex event processing (CEP) system. The issues inherent in meeting these requirements and our approaches to addressing them are discussed below.

2.1 Issues

1) Ease of service development

Although business users developing a new service may have a wealth of business knowledge, they are generally unfamiliar with individual ICT processing technologies. It is thus difficult for them to learn a processing language supported by Hadoop or CEP and to develop and/or customize their service. Moreover, parallel processing must be used to maintain high performance as the scale of the business and the amount of data processed increase. Hadoop has advanced parallel-processing technology and uses it automatically, but parallelization must be designed carefully to obtain adequate performance when processing complex events, and this is difficult for most business users.

2) Achieving both high performance and continuous operation

High-speed data processing is needed to maintain real-time performance of a service with CEP.
It is also important to maintain continuous operation as the service expands from a small start-up with a few servers to full deployment with many servers and as the scale of the business expands and contracts. It is difficult to achieve both high performance and continuous operation, so services have conventionally been stopped, reconfigured, and restarted when the upper limit of the system was reached.
2.2 Approaches

1) Ease of service development

A development and runtime platform that generates programs automatically and independently of the processing language was developed [1) in Figure 1]. It provides a data-flow-diagram interface that defines the inputs and outputs for data processing, enabling business users to easily customize the analysis process and the service-provision conditions, such as when coupons are issued. Another technology that was developed extracts parallelism from a description of complex event processing and automatically recommends combinations of parallel operations [2) in Figure 1]. By simply describing a data flow diagram, even business users can implement high-speed real-time services that use parallel processing.

2) Achieving both high performance and continuous operation

A function was developed that adjusts computing resources and rapidly distributes the load dynamically, without stopping the system, in accordance with the volume of input events [3) in Figure 1]. It maximizes performance by using the parallelism extraction technology. Business users can thus provide continuous, stable, real-time services independent of the volume of input events, and computing resources can be used more efficiently.

3. Integrated development and runtime platform technology

Systems are conventionally built in accordance with the type of processing to be done, such as using Hadoop for batch processing and CEP for real-time processing, and the development platforms are not unified. To enable even business users to develop and customize services easily, technology was developed to integrate the development and runtime platforms, enabling the cycle of hypothesizing and testing to be sped up. As shown in Figure 2, the business user defines the process details using a data flow diagram on the basis of the application properties (parameters are defined for each process) and determines the process type (batch or real-time).
An automatic program generation pattern is then used to automatically generate either a batch program (Hadoop) or a real-time program (CEP). Next, data-type conversion and other processes needed in accordance with the process details are added. The program and data are then distributed to the appropriate batch or real-time processing environment for execution.

The automatic program generation patterns, which are a core component of the development and runtime platform, are explained below using the example of analyzing trends in accumulated POS or other data. The business user defines processing parameters as properties for trend analysis, such as the input data types and the initial number of clusters (upper right in Figure 2). This information is combined with the set of automatic generation patterns pre-configured in the development and runtime platform (center right) to generate the program automatically. For example, a pattern to generate a trend-analysis program automatically could add pre-processing to select columns from the database of accumulated data and convert them into vector form, and add post-processing to summarize the analysis results in a table. In this way, only the minimum necessary processing parameters are specified; the pre-, post-, and main-processing programs are generated automatically by the pattern. The resulting program is distributed to the runtime platform and executed.2),4)

The data-flow-diagram editor and other components of the development and runtime platform are implemented using the Eclipse Remote Application Platform (RAP), so business users can use them as a Web service in the cloud and do not need to install them. Moreover, program transformations for the development and runtime platform use tools such as Eclipse Acceleo, a model-transformation engine conforming to the OMG MTL standard specifications, so it is easy to develop automatic generation patterns that conform to accepted standards.
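The pattern mechanism described above can be illustrated with a small sketch in which a generation pattern assembles pre-, main-, and post-processing from user-supplied parameters. All names here are hypothetical, and the clustering step is a trivial stand-in; the sketch only shows how a pattern turns a handful of parameters into a complete pipeline.

```python
# Hypothetical sketch of an automatic generation pattern: the business user
# supplies only parameters; pre-, main, and post-processing are assembled by
# the pattern (illustrative names, not the actual platform's API).

def make_trend_analysis(params):
    columns = params["columns"]          # columns to select and vectorize
    n_clusters = params["n_clusters"]    # initial number of clusters

    def pre(rows):
        # pre-processing: select columns and convert each row to a vector
        return [[row[c] for c in columns] for row in rows]

    def main(vectors):
        # main processing: trivial stand-in for clustering,
        # bucketing vectors by their first feature modulo n_clusters
        buckets = {}
        for v in vectors:
            buckets.setdefault(v[0] % n_clusters, []).append(v)
        return buckets

    def post(buckets):
        # post-processing: summarize results as a table (cluster -> size)
        return {k: len(vs) for k, vs in buckets.items()}

    def pipeline(rows):
        return post(main(pre(rows)))
    return pipeline

analyze = make_trend_analysis({"columns": ["qty", "price"], "n_clusters": 2})
table = analyze([{"qty": 1, "price": 300}, {"qty": 2, "price": 500},
                 {"qty": 3, "price": 100}])
```

The point of the design is that the user touches only the `params` dictionary; changing a parameter regenerates the whole pipeline without reprogramming.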
Functions that are essential for the hypothesizing and testing cycle, such as repository management and version control for the data flow diagram and data schema definitions, are also implemented.

Use of this technology enables the time from development of the analysis program to development of the complex event processing program to be reduced to approximately one-fifth (from eight weeks to 1.5 weeks). It also enables any of the parameters to be easily changed without reprogramming, making it easier to iterate the hypothesizing and testing cycle in the development platform. For example, knowledge gained from the analysis results can be readily applied to the event-detection conditions.

[Figure 2: Integrated development and runtime platform operation. The business user's data flow diagram and application properties determine the processing type; a set of automatic generation patterns generates batch (Hadoop) or real-time (CEP) programs, which are deployed to the respective runtime platforms.]

Work is currently underway to enhance the set of automatic generation patterns to be more general and extensible so that they can be used as a cloud platform (i.e., platform-as-a-service [PaaS]) for various types of data utilization.

4. Extracting parallelism from complex event processing

To maintain high performance even as the amount of data being processed increases, parallel processing has to be implemented in the real-time processing programs generated automatically by the integrated development and runtime platform. A technology was thus developed to extract parallelism from the programs and to recommend effective ways of using it (Figure 3). In a distributed-parallel CEP system developed earlier at Fujitsu Laboratories,5) a single process (query) is shared among several servers, and events are processed in parallel by distributing them among the servers, thereby enabling events arriving in real time to be processed rapidly. For example, in the coupon-issuing example, the ID of a pedestrian (phone number, etc.) or of a regional area can be used as the key to partition processing by person or region, respectively.
Automatic partitioning technology from the parallel-database field can be extended and used to extract this sort of parallelism automatically for event processing. An identifier (key) suitable for parallel processing is extracted from the description of a process, such as grouping (group by) or joining, and the value of the key in the data is used to determine which server will be used to process the data. This technology was extended to event-processing functions such as event-pattern detection and filtering, making it possible to extract keys suitable for parallel processing and to process events in parallel by using the key values.

[Figure 3: Technology to extract parallelism from CEP. Before execution, allocating events with the process as the key produces much communication between servers; during execution, allocating events with the query group as the key reduces communication between servers.]

This technology can also be used to obtain recommended optimal combinations of keys for multiple processes (a distribution strategy). It is generally easier to balance the load and achieve higher parallel performance by dividing a process into smaller units and selecting keys by sub-process, but with event processing, better performance can be achieved by reducing communication between processes. Thus, selecting key combinations that enable execution of as many processes as possible on the same server reduces the amount of communication required and increases the overall processing efficiency. Processes gathered together in this way are referred to as a query group.

The CEP parallelism extraction technology was used to extract parallelism automatically from all queries in the coupon-issuing program and to recommend query groups with the pedestrian ID or the regional-area ID as the key. Parallelization design is normally done by a parallel-processing specialist, so this technology eliminates the need for the high-level knowledge and tuning normally required, reducing development time.
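A minimal sketch of the key-based distribution described above: events are routed to servers by a hash of the query-group key, so all queries grouped on that key execute on one server for a given key value and need no cross-server communication. The hash function and all names here are illustrative assumptions, not the actual CEP implementation.

```python
# Illustrative sketch (not the actual CEP system): route events by a hash of
# the query-group key so that all processing for one key value, e.g. one
# pedestrian, lands on a single server with no cross-server traffic.
import zlib

N_SERVERS = 4
QUERY_GROUP_KEY = "pedestrian_id"   # key recommended for the query group

def server_for(key_value):
    # stable hash of the key value -> server index
    return zlib.crc32(str(key_value).encode()) % N_SERVERS

events = [{"pedestrian_id": "p1", "area": "A"},
          {"pedestrian_id": "p2", "area": "A"},
          {"pedestrian_id": "p1", "area": "B"}]

# Allocate each event to a server using the query-group key.
routed = {}
for ev in events:
    routed.setdefault(server_for(ev[QUERY_GROUP_KEY]), []).append(ev)

# Every event for pedestrian p1 is handled by the same server, so the
# selection and coupon-issuing queries in its group run locally.
p1_servers = {server_for(ev["pedestrian_id"])
              for ev in events if ev["pedestrian_id"] == "p1"}
assert len(p1_servers) == 1
```

Keying by process instead of by query group would let each step of the query land on a different server, which is exactly the inter-server communication the recommended distribution strategy avoids.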
The performance of sample programs, each allocating events differently between processes, was measured to evaluate the increase in performance gained by optimizing the combination of keys used to distribute a program across multiple processes. Allocating events using a per-query-group key resulted in a 60% reduction in the amount of communication compared to using a per-process key, increasing processing efficiency by a factor of 3.5. This corresponds to a 16-fold increase in processing efficiency compared to using no parallel processing.

A future direction is to develop technology to extract even more parallelism and achieve higher processing efficiency by using trends in the data being processed and not simply the description of the real-time processing program.

5. Distributed-parallel complex event processing technology

In the distributed-parallel CEP system, query processing that cannot be distributed is processed on a single server, and query processing that can be distributed is spread over multiple servers using the distribution strategy described in the previous section (Figure 4). As described above, the server to be used to process the data is determined on the basis of a query-group key when distributing processing. Thus, by moving the keys being handled by each server in suitably small units, accurate load balancing among multiple servers and continuous operation can be achieved. The number of keys is large compared to the number of servers, so the load can be moved with very fine granularity by determining which server will process which key values. However, if the load is moved in key units with a granularity that is too fine, communication and other overhead can increase and result in data-processing delays. Because of this, the keys are divided into approximately 1000 partitions (key partitions) using a specified distribution method such as a key hash function (a computation that generates a fixed-length value from arbitrary data), and the load is moved between servers in units of those partitions. If an event related to a key partition arrives while the partition is being moved, the event is forwarded to the new server after the migration has completed.

High performance is maintained by determining the number of servers needed to operate the system without interruption under the current operating conditions and by distributing the load equally among the servers. Thus, in this parallel CEP system, the execution manager periodically collects usage-state data from each CEP engine for resources such as CPU, memory, and network.
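The key-partition scheme described above can be sketched as follows: keys hash into a fixed number of partitions, each partition is assigned to a server, and rebalancing moves whole partitions between servers. The code is an illustrative assumption (the names, hash choice, and migration logic are not the actual system).

```python
# Illustrative sketch (not the actual system): keys are hashed into a fixed
# number of key partitions, each partition is assigned to a server, and load
# is rebalanced by moving whole partitions between servers.
import zlib

N_PARTITIONS = 1000   # the article divides keys into ~1000 partitions

def partition_of(key_value):
    # key hash: fixed-length value computed from arbitrary data
    return zlib.crc32(str(key_value).encode()) % N_PARTITIONS

# Initial partition -> server assignment across two servers.
assignment = {p: p % 2 for p in range(N_PARTITIONS)}

def server_for(key_value):
    return assignment[partition_of(key_value)]

def migrate(partition, new_server):
    # Moving one partition re-routes all of its keys at once; an event that
    # arrives mid-migration would be forwarded to new_server afterwards.
    assignment[partition] = new_server

key = "pedestrian-42"
before = server_for(key)          # 0 or 1 under the initial assignment
migrate(partition_of(key), 2)     # a third server is added; move partition
after = server_for(key)
assert after == 2 and before in (0, 1)
```

Because there are far more partitions than servers, moving a single partition shifts only a small slice of the load, which is what makes fine-grained rebalancing possible without stopping the system.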
It then determines whether load redistribution is needed, taking into consideration this resource-state data as well as the input-event load state and query characteristics. If it determines that redistribution is needed, it calculates a suitable range for the number of servers, adjusts the number if the current number is not in this range, and then balances the overall load across the servers (by migrating key partitions).

[Figure 4: Parallel CEP system structure. Query processing is distributed to multiple servers using the query group as the key; the runtime platform manager adjusts the number of servers on the basis of the load state and moves data in key-partition units.]

We evaluated the performance of a parallel CEP system with these functions by using it to implement the coupon-issuing application described above. It was used to match pedestrian location and preference data with product information from surrounding shops (up to shops) and to issue coupons appropriately. Up to eight servers were used, and pedestrian data events were input, gradually increasing in number. The results confirmed that, as the load increased, the number of servers increased as needed, and the load was dynamically distributed across them. The system operated without interruption even while the load was being redistributed. It processed queries with a latency of between 100 and 200 μs, maintaining stable, high-speed processing.

This technology enables dynamic configuration changes in accordance with the load state of service utilization while maintaining real-time performance. A prototype system built in 2011 required a long time and approximately double the resources when dynamically changing the configuration,6) but these problems are eliminated with the current method. Practical testing based on several examples is planned, and functions will be developed that enable easier detection of higher-level patterns.

6. Conclusion

These technologies simplify the development of services through use of the hypothesizing and testing cycle, reducing development time by approximately 80% in a scenario where coupons are issued on the basis of POS data analysis. They also enable real-time service with continuous operation to be maintained even as the processing load increases. We are currently integrating these technologies and targeting several projects to test them, such as services utilizing location data. The market for utilizing big data is maturing rapidly, and great value can be produced depending on how big data is used.
Means for producing even greater value must be nurtured by combining these technologies with practical knowledge and expertise. As the amount of sensor data increases and virtualization technology advances, it will become more important to optimize overall systems, including cloud infrastructure, networks, and terminals. We will continue to apply and extend the technologies described here and to work on resolving the technical issues standing in the way of such overall optimization.

The part of this research related to distributed-parallel complex event processing technology was conducted under contract as part of a Japanese Ministry of Economy, Trade and Industry project, Development and Verification of Next-Generation Reliable, Energy-saving IT Infrastructure Technology, in FY2010 and FY2011.

References
1) Y. Sakashita et al.: Cloud Fusion Concept. Fujitsu Sci. Tech. J., Vol. 48, No. 2 (2012).
2) Y. Nomura et al.: Massive Event Data Analysis and Processing Service Development Environment Using DFD. 2012 IEEE Eighth World Congress on Services (SERVICES), 2012.
3) K. Kimura: A Method of DFD to Software Components Transformation using Multi-query Integration. IPSJ Software Engineering Symposium 2012 Proceedings, pp. 1-8, 2012 (in Japanese).
4) K. Kimura et al.: Multi-Query Unification for Generating Efficient Big Data Processing Components from a DFD. 2013 IEEE International Conference on Cloud Computing (CLOUD), 2013.
5) Ministry of Economy, Trade and Industry: FY2011 Next-Generation High-reliability, Energy-saving IT Infrastructure Technology Development Report: R&D on Large-Scale Data Stream Processing Infrastructure (in Japanese). cloud/2011/02.pdf
6) S. Tsuchiya et al.: Big Data Processing in Cloud Environments. Fujitsu Sci. Tech. J., Vol. 48, No. 2 (2012).
Hidetoshi Kurihara
Fujitsu Laboratories Ltd.
Mr. Kurihara is engaged in research and development of application development and runtime platforms.

Haruyasu Ueda
Fujitsu Laboratories Ltd.
Mr. Ueda is engaged in research and development of distributed processing for data utilization.

Yoshinori Sakamoto
Fujitsu Laboratories Ltd.
Mr. Sakamoto is engaged in research and development of parallel complex event processing technology.

Masazumi Matsubara
Fujitsu Laboratories Ltd.
Mr. Matsubara is engaged in research and development of parallel complex event processing technology.
Virtual PC-Type Thin Client System
Thin Client Virtual PC-Type Thin Client System KAWASHIMA Hiroyuki, KOSHIBA Kunihiro, TUCHIMOCHI Kazuki, FUTAMURA Kazuhiro, ENOMOTO Masahiko, WATANABE Masahiro Abstract NEC has developed VirtualPCCenter,
Operating System for the K computer
Operating System for the K computer Jun Moroo Masahiko Yamada Takeharu Kato For the K computer to achieve the world s highest performance, Fujitsu has worked on the following three performance improvements
Integration of PRIMECLUSTER and Mission- Critical IA Server PRIMEQUEST
Integration of and Mission- Critical IA Server V Masaru Sakai (Manuscript received May 20, 2005) Information Technology (IT) systems for today s ubiquitous computing age must be able to flexibly accommodate
Big Data for the Rest of Us Technical White Paper
Big Data for the Rest of Us Technical White Paper Treasure Data - Big Data for the Rest of Us 1 Introduction The importance of data warehousing and analytics has increased as companies seek to gain competitive
Cloud Computing at Google. Architecture
Cloud Computing at Google Google File System Web Systems and Algorithms Google Chris Brooks Department of Computer Science University of San Francisco Google has developed a layered system to handle webscale
Middleware for Creating Private Clouds
Middleware for Creating Private Clouds Hiroshi Nagakura Akihiko Sakurai Cloud computing has been attracting a lot of attention recently. This is because it can meet demands for speedy system implementation
Policy-based Pre-Processing in Hadoop
Policy-based Pre-Processing in Hadoop Yi Cheng, Christian Schaefer Ericsson Research Stockholm, Sweden [email protected], [email protected] Abstract While big data analytics provides
Symfoware Server Reliable and Scalable Data Management
Symfoware Server Reliable and Scalable Data Management Kazunori Takata Masaki Nishigaki Katsumi Matsumoto In recent years, the scale of corporate information systems has continued to expand and the amount
Using an In-Memory Data Grid for Near Real-Time Data Analysis
SCALEOUT SOFTWARE Using an In-Memory Data Grid for Near Real-Time Data Analysis by Dr. William Bain, ScaleOut Software, Inc. 2012 ScaleOut Software, Inc. 12/27/2012 IN today s competitive world, businesses
BIG DATA IN THE CLOUD : CHALLENGES AND OPPORTUNITIES MARY- JANE SULE & PROF. MAOZHEN LI BRUNEL UNIVERSITY, LONDON
BIG DATA IN THE CLOUD : CHALLENGES AND OPPORTUNITIES MARY- JANE SULE & PROF. MAOZHEN LI BRUNEL UNIVERSITY, LONDON Overview * Introduction * Multiple faces of Big Data * Challenges of Big Data * Cloud Computing
SaaS and PaaS of Engineering Cloud
SaaS and PaaS of Engineering Cloud Yoshifumi Yoshida Yusuke Fujita Fujitsu will provide Desktop as a Service (DaaS) (offers remote access from a thin client) as a cloud service that is specialized for
Bringing Big Data Modelling into the Hands of Domain Experts
Bringing Big Data Modelling into the Hands of Domain Experts David Willingham Senior Application Engineer MathWorks [email protected] 2015 The MathWorks, Inc. 1 Data is the sword of the
2. Research and Development on the Autonomic Operation. Control Infrastructure Technologies in the Cloud Computing Environment
R&D supporting future cloud computing infrastructure technologies Research and Development on Autonomic Operation Control Infrastructure Technologies in the Cloud Computing Environment DEMPO Hiroshi, KAMI
Scala Storage Scale-Out Clustered Storage White Paper
White Paper Scala Storage Scale-Out Clustered Storage White Paper Chapter 1 Introduction... 3 Capacity - Explosive Growth of Unstructured Data... 3 Performance - Cluster Computing... 3 Chapter 2 Current
A REVIEW PAPER ON THE HADOOP DISTRIBUTED FILE SYSTEM
A REVIEW PAPER ON THE HADOOP DISTRIBUTED FILE SYSTEM Sneha D.Borkar 1, Prof.Chaitali S.Surtakar 2 Student of B.E., Information Technology, J.D.I.E.T, [email protected] Assistant Professor, Information
HPC technology and future architecture
HPC technology and future architecture Visual Analysis for Extremely Large-Scale Scientific Computing KGT2 Internal Meeting INRIA France Benoit Lange [email protected] Toàn Nguyên [email protected]
How To Make Data Streaming A Real Time Intelligence
REAL-TIME OPERATIONAL INTELLIGENCE Competitive advantage from unstructured, high-velocity log and machine Big Data 2 SQLstream: Our s-streaming products unlock the value of high-velocity unstructured log
High-Speed Information Utilization Technologies Adopted by Interstage Shunsaku Data Manager
High-Speed Information Utilization Technologies Adopted by Interstage Shunsaku Data Manager V Hiroya Hayashi (Manuscript received November 28, 2003) The current business environment is characterized by
ANALYTICS BUILT FOR INTERNET OF THINGS
ANALYTICS BUILT FOR INTERNET OF THINGS Big Data Reporting is Out, Actionable Insights are In In recent years, it has become clear that data in itself has little relevance, it is the analysis of it that
Research Article Hadoop-Based Distributed Sensor Node Management System
Distributed Networks, Article ID 61868, 7 pages http://dx.doi.org/1.1155/214/61868 Research Article Hadoop-Based Distributed Node Management System In-Yong Jung, Ki-Hyun Kim, Byong-John Han, and Chang-Sung
Informatica and the Vibe Virtual Data Machine
White Paper Informatica and the Vibe Virtual Data Machine Preparing for the Integrated Information Age This document contains Confidential, Proprietary and Trade Secret Information ( Confidential Information
GEDAE TM - A Graphical Programming and Autocode Generation Tool for Signal Processor Applications
GEDAE TM - A Graphical Programming and Autocode Generation Tool for Signal Processor Applications Harris Z. Zebrowitz Lockheed Martin Advanced Technology Laboratories 1 Federal Street Camden, NJ 08102
SharePlex for SQL Server
SharePlex for SQL Server Improving analytics and reporting with near real-time data replication Written by Susan Wong, principal solutions architect, Dell Software Abstract Many organizations today rely
Disaster Recovery Feature of Symfoware DBMS
Disaster Recovery Feature of Symfoware MS V Teruyuki Goto (Manuscript received December 26, 2006) The demands for stable operation of corporate information systems continue to grow, and in recent years
Design of Electric Energy Acquisition System on Hadoop
, pp.47-54 http://dx.doi.org/10.14257/ijgdc.2015.8.5.04 Design of Electric Energy Acquisition System on Hadoop Yi Wu 1 and Jianjun Zhou 2 1 School of Information Science and Technology, Heilongjiang University
Novel IT Platform for Social Innovation
Hitachi Review Vol. 59 (2010), No. 5 199 Novel IT Platform for Social Innovation Takeshi Ishizaki Chikara Nakae Nobutoshi Sagawa Yang Qiu Chen HITACHI s ROLE IN SUPPORTING SOCIAL CHANGE HITACHI in 2010
Introduction to Cloud Computing
Introduction to Cloud Computing Cloud Computing I (intro) 15 319, spring 2010 2 nd Lecture, Jan 14 th Majd F. Sakr Lecture Motivation General overview on cloud computing What is cloud computing Services
Knowledge Base Data Warehouse Methodology
Knowledge Base Data Warehouse Methodology Knowledge Base's data warehousing services can help the client with all phases of understanding, designing, implementing, and maintaining a data warehouse. This
MySQL. Leveraging. Features for Availability & Scalability ABSTRACT: By Srinivasa Krishna Mamillapalli
Leveraging MySQL Features for Availability & Scalability ABSTRACT: By Srinivasa Krishna Mamillapalli MySQL is a popular, open-source Relational Database Management System (RDBMS) designed to run on almost
SURVEY REPORT DATA SCIENCE SOCIETY 2014
SURVEY REPORT DATA SCIENCE SOCIETY 2014 TABLE OF CONTENTS Contents About the Initiative 1 Report Summary 2 Participants Info 3 Participants Expertise 6 Suggested Discussion Topics 7 Selected Responses
An Oracle White Paper October 2013. Oracle Data Integrator 12c New Features Overview
An Oracle White Paper October 2013 Oracle Data Integrator 12c Disclaimer This document is for informational purposes. It is not a commitment to deliver any material, code, or functionality, and should
Network Technology Supporting an Intelligent Society: WisReed
Network Technology Supporting an Intelligent Society: WisReed Yuji Takahashi Kazuya Kawashima Yuta Nakaya Tatsuya Ichikawa Fujitsu intends to help achieve an intelligent society by providing cloud-based
