ROCANA WHITEPAPER Rocana Ops Architecture


CONTENTS

INTRODUCTION
DESIGN PRINCIPLES
EVENTS: A COHESIVE AND FLEXIBLE DATA MODEL FOR OPERATIONAL DATA
DATA COLLECTION
    Syslog
    File Tailing
    Directory Spooling
    Java API
    REST API
    Log4J
PUBLISH-SUBSCRIBE MESSAGING
EVENT PROCESSING AND STORAGE
    Event Storage
    Search Indexing
    Metric Aggregation
    Advanced Analytics
    Data Lifecycle Management
DATA EXPLORATION, VISUALIZATION, AND ANALYSIS
EXTENDING ROCANA OPS
ABOUT ROCANA

INTRODUCTION

The infrastructure that runs the modern business looks fundamentally different today than it did twenty, ten, or even five years ago, and operations teams have struggled to keep up with the pace of change. Applications that ran on single large machines gave way to specialized servers numbering in the tens or hundreds, followed by the virtualization movement, which turned the hardware substrate into a general-purpose platform. Today, applications themselves look different: they're decomposed into fine-grained services, distributed across many machines, and highly interdependent. Organizations have begun shifting from discrete line-of-business applications to shared services built and maintained by platform groups. The modern application may be the amalgamation of tens of services, each of which operates and scales independently. While incredibly powerful, resilient, and scalable, these systems are also very complex, and are challenging to operate in mission-critical environments. Traditional IT operations management tools that worked well in the past can no longer provide the necessary insight or run at the scale required for modern IT infrastructure. A fundamentally new approach is required: one that treats IT Operations Analytics as the Big Data problem it has become.

Rocana Ops was built from the ground up to give system, network, and application administrators and developers true insight into modern infrastructure and applications. It was designed with the assumptions that there are hundreds of thousands of machines to be managed, and that the relationship between services, applications, and the underlying infrastructure is ephemeral and can change dynamically. The fundamental goal of Rocana Ops is to provide a single view of event and metric data for all services, while separating the signal from the noise using advanced analytics and machine learning techniques associated with Big Data challenges in other domains.
DESIGN PRINCIPLES

When building Rocana Ops, we held the belief that it would be used across a large organization as a shared service. As a result, it needed to operate at enterprise scale, handling tens of terabytes per day of event and metric data, and hundreds or even thousands of users engaging with the application concurrently. We anticipated that collected data would be used for more than the most obvious operational use cases: our customers would want to extend the platform and use the data in new and unexpected ways. We therefore needed to eliminate proprietary formats and inefficient batch exports, to allow customers to truly own their data assets. Also, in order to promote interoperability, extensibility, and scalability, we maximized the use of open source platforms in the underlying architecture. In order to support production operations, we dramatically reduced end-to-end latency from when data is initially produced to when it's available for query and analytics. An operational analytics platform, by definition, also needs to be more available than the systems it monitors; therefore, any failures in the system or the infrastructure on which it runs need to be anticipated and handled appropriately.

Pulling together operational data from disparate systems is inherently an exercise in data integration. It is infeasible to require modification of source systems; therefore, extensible data processing and transformation needs to be a first-class citizen. The system has many different kinds of users. The needs of application, network, and system administrators differ from those of developers, and DevOps staff have still different requirements. To be truly useful, the system needs to support all of these users and use cases.

[Figure: High-Level Logical View of Rocana Ops Architecture]

Analyzing complex operational data is not that different from analyzing security, customer behavior, or financial data, but IT operators rarely have tools as sophisticated as those of their peers in these other disciplines. There's typically a lot of noise in operational monitoring at scale, and users can easily be overwhelmed by it, lacking the time to separate out the signal. Further, the signal required to solve one problem may be the noise for another. Rocana Ops aims to address this by providing out-of-the-box machine-learning algorithms that guide the user through the analysis process. The solution's interaction model is based on visualizations that facilitate narrowing down the scope of analysis and pinpointing problem areas. Many years of operational experience, big data expertise, and advanced analytics practice have come together to build the next generation of operational analytics for today's modern infrastructure.

One of the primary goals of Rocana Ops is to eliminate data silos by combining the critical features of different operational systems into a single application. That said, much of the data collected by Rocana Ops is the same data used by specialized applications in security, marketing, e-commerce, and other such systems.
Rather than force developers to reinvent the kind of scale-out infrastructure that drives Rocana Ops and source this data a second time, we actively encourage direct integration with and extension of the Rocana Ops platform.

EVENTS: A COHESIVE AND FLEXIBLE DATA MODEL FOR OPERATIONAL DATA

To simplify collection, transformation, storage, exploration, and analysis, Rocana Ops uses an extensible event data model to represent all data within the system. This event data model provides a simple and flexible way of representing time-oriented discrete events that occur within the data center. An event can represent a miscellaneous log record, an HTTP request, a system or application authentication result, a SQL query audit, or even a group of application metrics at a particular point in time, designated with an event type. Application and device performance data is also captured as events, categorized under one of the system's built-in event types. Data sources generate events containing one or more metrics in the attributes of a metric event, and the system automatically builds and maintains a time series of each metric over multiple dimensions.

All events have a set of common fields, including a millisecond-accurate timestamp, event type, unique identifier, and event message body, as well as the originating host, location, and service. Since all events have these common fields, it's possible to correlate otherwise disparate data sources without requiring complex and bespoke queries. Additionally, each event contains a set of key-value pairs which can carry event type-specific data. These attributes act as a natural point of customization for user-defined event types. Moreover, because all parts of Rocana Ops understand these attributes, new or business-specific event data can be easily captured, processed, and queried without custom development.

Here's an example of a typical syslog (RFC 3164) event, represented as text. Within the system, events in transit are actually encoded in Apache Avro format, an efficient, compact, binary format designed for the transport and storage of data.

Example: A syslog (RFC 3164) event.

{
  // Common fields present in all events.
  id: JRHAIDMLCKLEAPMIQDHFLO3MXYXV7NVBEJNDKZGS2XVSEINGGBHA====,
  event_type_id: 100,
  ts: ,
  location: aws/us-west-2a,
  host: example01.rocana.com,
  service: dhclient,
  body: ...,
  attributes: {
    // Attributes are specific to the event type.
    syslog_timestamp: ,
    syslog_process: dhclient,
    syslog_pid: 865,
    syslog_facility: 3,
    syslog_severity: 6,
    syslog_hostname: example01,
    syslog_message: DHCPACK from (xid=0x5c64bdb0)
  }
}
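To illustrate how this common-plus-attributes model makes correlation straightforward, here is a minimal Python sketch. The `make_event` helper and its field handling are illustrative assumptions modeled on the example above; this is not the Rocana API.

```python
import time
import uuid

def make_event(event_type_id, location, host, service, body, **attributes):
    """Build an event with the common fields shared by all event types,
    plus type-specific key-value attributes (illustrative sketch only)."""
    return {
        "id": uuid.uuid4().hex,          # unique identifier
        "event_type_id": event_type_id,  # e.g. 100 = syslog, 103 = HTTP request
        "ts": int(time.time() * 1000),   # millisecond-accurate timestamp
        "location": location,
        "host": host,
        "service": service,
        "body": body,
        "attributes": attributes,        # event type-specific data
    }

syslog_ev = make_event(100, "aws/us-west-2a", "example01.rocana.com",
                       "dhclient", "DHCPACK ...", syslog_severity=6)
http_ev = make_event(103, "aws/us-west-2a", "example01.rocana.com",
                     "httpd", "GET /product/104", http_response_code=200)

# Because every event carries the same common fields, otherwise disparate
# sources can be correlated without bespoke queries -- here, by host:
same_host = [e for e in (syslog_ev, http_ev)
             if e["host"] == "example01.rocana.com"]
print(len(same_host))  # -> 2
```

The type-specific details stay in `attributes`, so a new event type needs no schema change to the common envelope.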

This example shows how HTTP request events from an application server can be easily represented as well.

Example: An application server-neutral HTTP request event.

{
  id: JRHAIDMLCKLEAPMIQDHFLO3MXYXV7NVBEJNDKZGS2XVSEINGGBHA====,
  event_type_id: 103,
  ts: ,
  location: nyc/fac2/rack17,
  host: example01.rocana.com,
  service: httpd,
  body: ...,
  attributes: {
    http_request_url: ,
    http_request_vhost: ,
    http_request_proto: HTTP/1.1,
    http_request_method: GET,
    http_request_path: /product/104,
    http_request_query: ref=search&uid=1007,
    http_request_scheme: http,
    http_response_code: 200,
    http_response_size: 2452,
    ...
  }
}

When building an operational analytics platform as a service to business units within a larger organization, it's common to onboard new event types constantly. This flexible event format allows the system to capture new event types without requiring configuration within the platform prior to receiving data. Additionally, both raw and extracted data can be retained within an event, allowing for easy reprocessing in case data processing errors are found.

DATA COLLECTION

As noted before, we see operational analytics as a first-class exercise in data integration: a solution needs to collect data from every system that makes up the larger organizational infrastructure, as well as handle the different ways in which those systems produce logs, metrics, and diagnostics. In simple cases, integration may be possible via configuration; elsewhere, custom plugins may do the trick. Specialized cases may require deeper integration involving direct access to traditionally internal formats and data. All of these methods are supported by Rocana Ops using two main mechanisms:

1. A native Rocana Agent plays three major roles on Linux and Windows operating systems. The Agent acts as a syslog server and provides file tailing and directory spooling data integration for locally written logs. Additionally, the Agent collects OS- and host-level metrics of the machines on which it runs.

2. A high-performance Java API is also available to collect data directly from applications. It can be used directly for system extension, or via the wrapper REST and Log4J APIs.

SYSLOG

Syslog is the primary source for Unix-variant operating system data, as well as most network devices. The Rocana Agent operates an RFC 3164-compliant syslog server, supporting both TCP and UDP protocols. Syslog messages are automatically parsed, becoming events within the system.

FILE TAILING

Text-based log files are a common and simple method of exposing application activity. The Rocana Agent supports reliable real-time tailing of log files with customizable regular expressions for extracting fields from records.

DIRECTORY SPOOLING

While file tailing is used to watch well-known log files that receive a continuous stream of small changes, directory spooling supports the use case of directories that receive larger data files that should be ingested once. If systems dump diagnostic trace or performance files, for example, the Rocana Agent can respond by processing each file as it arrives in a specified filesystem directory.

JAVA API

The Rocana Ops Java API is the highest performance, most direct, and most flexible method of producing or consuming event data. Those who wish to explicitly instrument applications or integrate with custom systems can use this API to produce data directly to, or consume data from, the publish/subscribe messaging system used by the rest of the platform.
This same API powers the REST API as well as many of the internal components of Rocana Ops and, as a result, is highly optimized and proven.
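The field-extraction step performed during file tailing can be sketched with a customizable regular expression. The pattern, field names, and sample log line below are illustrative assumptions (an Apache-style access log), not Rocana's configuration syntax.

```python
import re

# Illustrative pattern for an Apache-style access log line; in Rocana Ops
# the extraction expression is customizable per tailed file.
ACCESS_LOG = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<ts>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) (?P<proto>[^"]+)" '
    r'(?P<status>\d{3}) (?P<size>\d+)'
)

def extract_fields(line):
    """Turn one raw log record into event attributes, keeping the raw line
    as the body so the event can be reprocessed if the pattern is wrong."""
    m = ACCESS_LOG.match(line)
    attrs = m.groupdict() if m else {}
    return {"body": line, "attributes": attrs}

ev = extract_fields(
    '10.0.0.7 - - [10/Oct/2015:13:55:36 -0700] '
    '"GET /product/104 HTTP/1.1" 200 2452'
)
print(ev["attributes"]["status"])  # prints: 200
```

Keeping both the raw body and the extracted attributes mirrors the reprocessing-friendly design described in the event model section.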

REST API

A simple REST API is provided on top of the Java API for easy integration of systems where performance is less critical. This API can be used with any language or third-party system that supports HTTP/S and JSON parsing.

LOG4J

An appender plugin for the common Apache Log4J Java logging framework is provided for code-free integration with systems already instrumented with these APIs. Using the Rocana Ops Log4J appender obviates the need to write text logs to disk, sending data directly and reliably to the pub/sub messaging system.

PUBLISH-SUBSCRIBE MESSAGING

At its core, Rocana Ops uses the high-throughput, reliable, scale-out, open source publish-subscribe ("pub-sub") messaging system Apache Kafka for the transport of all data within the system. This pub-sub layer acts as the central nervous system, facilitating all of the real-time data delivery and processing performed by the rest of the system. All event data captured by the sources described in the Data Collection section is sent to a global event stream, which is consumed by the system in different ways. This firehose of event data provides a single, full-fidelity, real-time view of all activity within an entire organization, making it the perfect data integration point for both Rocana Ops and custom applications.

Kafka has a very familiar logical structure, similar to any other pub-sub broker, with a few notable deviations that allow it to function at this scale. Just as with traditional pub-sub messaging systems, Kafka employs the notions of producers, consumers, and topics to which producers send, or from which consumers receive, data. All data in Kafka is always persisted to disk in transaction logs; this is the equivalent of most reliable or durable modes in traditional messaging systems. Rather than assume all data fits on a single broker, however, Kafka partitions each topic, spreading the partitions across multiple brokers that work together as a cluster.
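How producers spread a topic's data across partitions can be sketched as a key hash. This is a generic illustration of Kafka's partitioning model, not Rocana's actual partitioner; the partition count and the choice of host as the key are assumptions.

```python
import hashlib

NUM_PARTITIONS = 6  # illustrative partition count for the event topic

def partition_for(key, num_partitions=NUM_PARTITIONS):
    """Map a partitioning key (e.g. the originating host) to a partition.
    Hashing keeps one key's events on one partition (preserving per-key
    order) while spreading distinct keys across all brokers."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % num_partitions

hosts = [f"web{i:02d}.example.com" for i in range(100)]
assignments = [partition_for(h) for h in hosts]

# A given host always lands on the same partition, and the hosts as a
# whole are spread over the available partitions:
print(len(set(assignments)), "partitions in use")
```

Adding brokers (and partitions) raises aggregate capacity because each partition can live on a different machine, which is the scale-out property the text describes.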
Each partition of a topic is also optionally replicated a configurable number of times so broker failures can be tolerated. The Rocana Ops data sources described earlier automatically distribute data across these partitions, taking full advantage of the aggregate capacity of the available brokers. When more capacity is required, additional brokers may be added to the cluster. For additional technical information, see the Apache Kafka documentation.

EVENT PROCESSING AND STORAGE

Event data is processed in real time by specialized services as it is received from the messaging layer. The following services exist within Rocana Ops to facilitate its different functionalities:

EVENT STORAGE

All event data is persisted to the Hadoop Distributed Filesystem (HDFS) for long-term storage and downstream processing. Data is stored in Apache Parquet, a highly optimized, PAX-structured format supporting common columnar storage features such as run-length encoding (RLE) and dictionary encoding, as well as traditional block compression. This dataset is partitioned by time to facilitate partition pruning when performing queries against known time ranges (the common case in operational analytics).

SEARCH INDEXING

Each event is indexed for full-text and faceted search within the Rocana Ops application. Just as with the event storage service, search indexes are partitioned by time, and served by a scale-out parallelized search engine. Indexing is performed in real time, as the data arrives.

METRIC AGGREGATION

In addition to discrete events, Rocana Ops also maintains time series datasets of metric data from device and application activity. Metric data can arrive as events containing host-related metrics collected by the Rocana Agent (described earlier), or it can be extracted from other kinds of events, such as logs. Examples include extracting the number of failed application logins from clickstream data, the number and types of various HTTP errors, or the query time of every SQL statement executed in a relational database. The metric aggregation service records and derives this kind of metric data, writing it to HDFS as a time-partitioned Parquet dataset. Rocana Ops uses this data to build charts and to detect patterns using advanced data analysis techniques.

ADVANCED ANALYTICS

Many of the advanced analytical functions of Rocana Ops operate by observing the data over time and learning about patterns that occur. Rocana Ops employs sophisticated anomaly detection, using machine learning to develop baseline models for the system as a whole as well as custom models of hosts, services, or locations. For example, Rocana Ops anomaly detection can establish an ever-improving model of disk IO on a particular machine, then flag any unexpected deviations from that norm. The analytics service controls the execution of these algorithms.
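A baseline-and-deviation model of the kind described for disk IO can be sketched with a rolling z-score. This is a generic illustration of the idea, not Rocana's actual algorithm; the window size and threshold are assumptions.

```python
from statistics import mean, stdev

def anomalies(series, window=20, threshold=3.0):
    """Flag points deviating from a rolling baseline by more than
    `threshold` standard deviations (illustrative sketch only)."""
    flagged = []
    for i in range(window, len(series)):
        baseline = series[i - window:i]   # the learned "normal" behavior
        mu, sigma = mean(baseline), stdev(baseline)
        if sigma > 0 and abs(series[i] - mu) / sigma > threshold:
            flagged.append(i)
    return flagged

# Steady disk-IO metric with one unexpected spike at index 30:
disk_io = [100 + (i % 5) for i in range(40)]
disk_io[30] = 400
print(anomalies(disk_io))  # -> [30]
```

Note how points after the spike are not flagged: the spike inflates the baseline's variance, which is one reason production systems use more robust, ever-improving models than this simple sketch.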
DATA LIFECYCLE MANAGEMENT

Over time, it becomes necessary to control the growth and characteristics of high-volume datasets in order to control resource consumption and cost. Rocana Ops includes a data lifecycle management (DLM) service that enforces policies on the collected data, including control over data retention and optimization. All services described above are highly available and may be scaled independently to accommodate the size and complexity of a deployment.
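Because the datasets are partitioned by time, a retention policy reduces to selecting whole partitions older than the window. The partition naming scheme and 30-day policy below are illustrative assumptions, not Rocana's DLM configuration.

```python
from datetime import date, timedelta

def expired_partitions(partitions, today, retention_days=30):
    """Given time-named partitions (e.g. 'year=2015/month=06/day=01'),
    return those older than the retention window. Partition pruning makes
    this cheap: no event data needs to be scanned (illustrative sketch)."""
    cutoff = today - timedelta(days=retention_days)
    expired = []
    for p in partitions:
        fields = dict(kv.split("=") for kv in p.split("/"))
        d = date(int(fields["year"]), int(fields["month"]), int(fields["day"]))
        if d < cutoff:
            expired.append(p)
    return expired

parts = [
    "year=2015/month=05/day=01",
    "year=2015/month=06/day=20",
    "year=2015/month=06/day=29",
]
print(expired_partitions(parts, today=date(2015, 6, 30)))
# -> ['year=2015/month=05/day=01']
```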

DATA EXPLORATION, VISUALIZATION, AND ANALYSIS

Rocana Ops includes an interactive user interface for exploring, visualizing, and analyzing operational data collected by the system. Specialized views exist for interactive data exploration and trend identification, comparing and correlating different kinds of events, full-text search, custom dashboard creation, and more. These views combine metadata extracted from event data, time series metric data, and discrete event data to provide a comprehensive understanding of what's happening within the infrastructure, without requiring operators to learn specialized skills or query syntax.

Rather than attempt to support these visualizations by shoehorning all their requests into a single general-purpose query engine or storage system, Rocana Ops uses different query engines and data representations to answer different kinds of questions. Interactive full-text search queries, for instance, are handled by a parallel search engine, while charts of time series data use a parallel SQL engine over partitioned Parquet datasets. All parts of the system use first-class parallelized query and storage systems for handling event and metric data, making Rocana Ops the first natively parallelized operational analytics system available.

EXTENDING ROCANA OPS

Rocana Ops is highly customizable, all the way from the user interfaces down to the data platform. Users can extend the system by defining their own event types for specific use cases. They can also develop custom producers to support custom sources, as well as custom consumers to execute specialized data processing. In addition, data transformations can be defined using a configuration-based approach to enable quick, simple transformation from and to multiple formats. Anomaly detection targets and thresholds, as well as system-wide data retention policies, are also fully customizable.
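The configuration-based transformations mentioned above can be sketched as a declarative field mapping applied to each record. The config format and field names here are purely illustrative assumptions, not Rocana's transformation syntax.

```python
# Illustrative mapping config: source field -> (target field, type cast).
TRANSFORM_CONFIG = {
    "status": ("http_response_code", int),
    "size": ("http_response_size", int),
    "path": ("http_request_path", str),
}

def transform(record, config=TRANSFORM_CONFIG):
    """Rename and cast fields per configuration; unmapped fields pass
    through untouched so no data is lost (illustrative sketch)."""
    out = {}
    for key, value in record.items():
        target, cast = config.get(key, (key, lambda v: v))
        out[target] = cast(value)
    return out

raw = {"status": "200", "size": "2452", "path": "/product/104", "vhost": "shop"}
print(transform(raw))
# -> {'http_response_code': 200, 'http_response_size': 2452,
#     'http_request_path': '/product/104', 'vhost': 'shop'}
```

Expressing the transformation as data rather than code is what lets new sources be onboarded without custom development.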

[Figure: Simplified Flow Diagram for Integrating External Data with Events]

CONCLUSION

Rocana Ops is all about bringing the power of Big Data analytics into the realm of IT operations. It provides several mechanisms to ingest all of your data, and utilizes the Kafka bus architecture to ensure that the system can scale to the required volumes. It also uses HDFS for long-term data retention in order to minimize the costs associated with storage infrastructure. The use of these, as well as other open source platforms and formats, also ensures that the data is always accessible and is never held hostage. To that end, robust integration mechanisms are also provided in and out of the system. In order to deliver sophisticated yet straightforward data analysis (typically associated with other domains such as finance), Rocana Ops employs anomaly detection algorithms based on machine learning, which drive visualizations to guide the user to the most important data points. These algorithms can be tweaked, but are meant to work right out of the box, without requiring any data science background. Our key goal is to provide a simple, easy-to-use application, while taking advantage of the most sophisticated technologies available today.

ABOUT ROCANA

Rocana is creating the next generation of IT operations analytics software in a world in which IT complexity is growing exponentially as a result of virtualization, containerization, and shared services. Rocana's mission is to provide guided root cause analysis of event-oriented machine data in order to streamline IT operations and boost profitability. Founded by veterans from Cloudera, Vertica, and Experian, the Rocana team has directly experienced the challenges of today's IT infrastructures, and has set out to address them using modern technology that leverages the Hadoop ecosystem.

Rocana, Inc. 548 Market St #22538, San Francisco, CA (877) ROCANA

2015 Rocana, Inc. All rights reserved. Rocana and the Rocana logo are trademarks or registered trademarks of Rocana, Inc. in the United States and/or other countries. WP-ARCH


INDUSTRY BRIEF DATA CONSOLIDATION AND MULTI-TENANCY IN FINANCIAL SERVICES INDUSTRY BRIEF DATA CONSOLIDATION AND MULTI-TENANCY IN FINANCIAL SERVICES Data Consolidation and Multi-Tenancy in Financial Services CLOUDERA INDUSTRY BRIEF 2 Table of Contents Introduction 3 Security

More information

Firebird meets NoSQL (Apache HBase) Case Study

Firebird meets NoSQL (Apache HBase) Case Study Firebird meets NoSQL (Apache HBase) Case Study Firebird Conference 2011 Luxembourg 25.11.2011 26.11.2011 Thomas Steinmaurer DI +43 7236 3343 896 thomas.steinmaurer@scch.at www.scch.at Michael Zwick DI

More information

Lambda Architecture for Batch and Real- Time Processing on AWS with Spark Streaming and Spark SQL. May 2015

Lambda Architecture for Batch and Real- Time Processing on AWS with Spark Streaming and Spark SQL. May 2015 Lambda Architecture for Batch and Real- Time Processing on AWS with Spark Streaming and Spark SQL May 2015 2015, Amazon Web Services, Inc. or its affiliates. All rights reserved. Notices This document

More information

What can DDS do for You? Learn how dynamic publish-subscribe messaging can improve the flexibility and scalability of your applications.

What can DDS do for You? Learn how dynamic publish-subscribe messaging can improve the flexibility and scalability of your applications. What can DDS do for You? Learn how dynamic publish-subscribe messaging can improve the flexibility and scalability of your applications. 2 Contents: Abstract 3 What does DDS do 3 The Strengths of DDS 4

More information

Simplified Management With Hitachi Command Suite. By Hitachi Data Systems

Simplified Management With Hitachi Command Suite. By Hitachi Data Systems Simplified Management With Hitachi Command Suite By Hitachi Data Systems April 2015 Contents Executive Summary... 2 Introduction... 3 Hitachi Command Suite v8: Key Highlights... 4 Global Storage Virtualization

More information

Oracle Identity Analytics Architecture. An Oracle White Paper July 2010

Oracle Identity Analytics Architecture. An Oracle White Paper July 2010 Oracle Identity Analytics Architecture An Oracle White Paper July 2010 Disclaimer The following is intended to outline our general product direction. It is intended for information purposes only, and may

More information

Agenda. Some Examples from Yahoo! Hadoop. Some Examples from Yahoo! Crawling. Cloud (data) management Ahmed Ali-Eldin. First part: Second part:

Agenda. Some Examples from Yahoo! Hadoop. Some Examples from Yahoo! Crawling. Cloud (data) management Ahmed Ali-Eldin. First part: Second part: Cloud (data) management Ahmed Ali-Eldin First part: ZooKeeper (Yahoo!) Agenda A highly available, scalable, distributed, configuration, consensus, group membership, leader election, naming, and coordination

More information

Enterprise Solution for Remote Desktop Services... 2. System Administration... 3. Server Management... 4. Server Management (Continued)...

Enterprise Solution for Remote Desktop Services... 2. System Administration... 3. Server Management... 4. Server Management (Continued)... CONTENTS Enterprise Solution for Remote Desktop Services... 2 System Administration... 3 Server Management... 4 Server Management (Continued)... 5 Application Management... 6 Application Management (Continued)...

More information

Building Scalable Big Data Infrastructure Using Open Source Software. Sam William sampd@stumbleupon.

Building Scalable Big Data Infrastructure Using Open Source Software. Sam William sampd@stumbleupon. Building Scalable Big Data Infrastructure Using Open Source Software Sam William sampd@stumbleupon. What is StumbleUpon? Help users find content they did not expect to find The best way to discover new

More information

Active Directory Comapatibility with ExtremeZ-IP A Technical Best Practices Whitepaper

Active Directory Comapatibility with ExtremeZ-IP A Technical Best Practices Whitepaper Active Directory Comapatibility with ExtremeZ-IP A Technical Best Practices Whitepaper About this Document The purpose of this technical paper is to discuss how ExtremeZ-IP supports Microsoft Active Directory.

More information

What's New in SAS Data Management

What's New in SAS Data Management Paper SAS034-2014 What's New in SAS Data Management Nancy Rausch, SAS Institute Inc., Cary, NC; Mike Frost, SAS Institute Inc., Cary, NC, Mike Ames, SAS Institute Inc., Cary ABSTRACT The latest releases

More information

PEPPERDATA IN MULTI-TENANT ENVIRONMENTS

PEPPERDATA IN MULTI-TENANT ENVIRONMENTS ..................................... PEPPERDATA IN MULTI-TENANT ENVIRONMENTS technical whitepaper June 2015 SUMMARY OF WHAT S WRITTEN IN THIS DOCUMENT If you are short on time and don t want to read the

More information

Oracle Big Data SQL Technical Update

Oracle Big Data SQL Technical Update Oracle Big Data SQL Technical Update Jean-Pierre Dijcks Oracle Redwood City, CA, USA Keywords: Big Data, Hadoop, NoSQL Databases, Relational Databases, SQL, Security, Performance Introduction This technical

More information

The Sumo Logic Solution: Security and Compliance

The Sumo Logic Solution: Security and Compliance The Sumo Logic Solution: Security and Compliance Introduction With the number of security threats on the rise and the sophistication of attacks evolving, the inability to analyze terabytes of logs using

More information

Knowledgent White Paper Series. Developing an MDM Strategy WHITE PAPER. Key Components for Success

Knowledgent White Paper Series. Developing an MDM Strategy WHITE PAPER. Key Components for Success Developing an MDM Strategy Key Components for Success WHITE PAPER Table of Contents Introduction... 2 Process Considerations... 3 Architecture Considerations... 5 Conclusion... 9 About Knowledgent... 10

More information

Why Big Data in the Cloud?

Why Big Data in the Cloud? Have 40 Why Big Data in the Cloud? Colin White, BI Research January 2014 Sponsored by Treasure Data TABLE OF CONTENTS Introduction The Importance of Big Data The Role of Cloud Computing Using Big Data

More information

Hadoop implementation of MapReduce computational model. Ján Vaňo

Hadoop implementation of MapReduce computational model. Ján Vaňo Hadoop implementation of MapReduce computational model Ján Vaňo What is MapReduce? A computational model published in a paper by Google in 2004 Based on distributed computation Complements Google s distributed

More information

Cloud Computing at Google. Architecture

Cloud Computing at Google. Architecture Cloud Computing at Google Google File System Web Systems and Algorithms Google Chris Brooks Department of Computer Science University of San Francisco Google has developed a layered system to handle webscale

More information

Cisco Data Preparation

Cisco Data Preparation Data Sheet Cisco Data Preparation Unleash your business analysts to develop the insights that drive better business outcomes, sooner, from all your data. As self-service business intelligence (BI) and

More information

A Vision for Operational Analytics as the Enabler for Business Focused Hybrid Cloud Operations

A Vision for Operational Analytics as the Enabler for Business Focused Hybrid Cloud Operations A Vision for Operational Analytics as the Enabler for Focused Hybrid Cloud Operations As infrastructure and applications have evolved from legacy to modern technologies with the evolution of Hybrid Cloud

More information

Hypertable Architecture Overview

Hypertable Architecture Overview WHITE PAPER - MARCH 2012 Hypertable Architecture Overview Hypertable is an open source, scalable NoSQL database modeled after Bigtable, Google s proprietary scalable database. It is written in C++ for

More information

Next-Generation Cloud Analytics with Amazon Redshift

Next-Generation Cloud Analytics with Amazon Redshift Next-Generation Cloud Analytics with Amazon Redshift What s inside Introduction Why Amazon Redshift is Great for Analytics Cloud Data Warehousing Strategies for Relational Databases Analyzing Fast, Transactional

More information

Chapter 7. Using Hadoop Cluster and MapReduce

Chapter 7. Using Hadoop Cluster and MapReduce Chapter 7 Using Hadoop Cluster and MapReduce Modeling and Prototyping of RMS for QoS Oriented Grid Page 152 7. Using Hadoop Cluster and MapReduce for Big Data Problems The size of the databases used in

More information

Messaging. High Performance Peer-to-Peer Messaging Middleware. brochure

Messaging. High Performance Peer-to-Peer Messaging Middleware. brochure Messaging High Performance Peer-to-Peer Messaging Middleware brochure Can You Grow Your Business Without Growing Your Infrastructure? The speed and efficiency of your messaging middleware is often a limiting

More information

Integration Maturity Model Capability #5: Infrastructure and Operations

Integration Maturity Model Capability #5: Infrastructure and Operations Integration Maturity Model Capability #5: Infrastructure and Operations How improving integration supplies greater agility, cost savings, and revenue opportunity TAKE THE INTEGRATION MATURITY SELFASSESSMENT

More information

IBM Tivoli Composite Application Manager for WebSphere

IBM Tivoli Composite Application Manager for WebSphere Meet the challenges of managing composite applications IBM Tivoli Composite Application Manager for WebSphere Highlights Simplify management throughout the life cycle of complex IBM WebSphere-based J2EE

More information

Cloud Computing and Advanced Relationship Analytics

Cloud Computing and Advanced Relationship Analytics Cloud Computing and Advanced Relationship Analytics Using Objectivity/DB to Discover the Relationships in your Data By Brian Clark Vice President, Product Management Objectivity, Inc. 408 992 7136 brian.clark@objectivity.com

More information

Modern IT Operations Management. Why a New Approach is Required, and How Boundary Delivers

Modern IT Operations Management. Why a New Approach is Required, and How Boundary Delivers Modern IT Operations Management Why a New Approach is Required, and How Boundary Delivers TABLE OF CONTENTS EXECUTIVE SUMMARY 3 INTRODUCTION: CHANGING NATURE OF IT 3 WHY TRADITIONAL APPROACHES ARE FAILING

More information

Identifying Fraud, Managing Risk and Improving Compliance in Financial Services

Identifying Fraud, Managing Risk and Improving Compliance in Financial Services SOLUTION BRIEF Identifying Fraud, Managing Risk and Improving Compliance in Financial Services DATAMEER CORPORATION WEBSITE www.datameer.com COMPANY OVERVIEW Datameer offers the first end-to-end big data

More information

BEA AquaLogic Integrator Agile integration for the Enterprise Build, Connect, Re-use

BEA AquaLogic Integrator Agile integration for the Enterprise Build, Connect, Re-use Product Data Sheet BEA AquaLogic Integrator Agile integration for the Enterprise Build, Connect, Re-use BEA AquaLogic Integrator delivers the best way for IT to integrate, deploy, connect and manage process-driven

More information

PLATFORA INTERACTIVE, IN-MEMORY BUSINESS INTELLIGENCE FOR HADOOP

PLATFORA INTERACTIVE, IN-MEMORY BUSINESS INTELLIGENCE FOR HADOOP PLATFORA INTERACTIVE, IN-MEMORY BUSINESS INTELLIGENCE FOR HADOOP Your business is swimming in data, and your business analysts want to use it to answer the questions of today and tomorrow. YOU LOOK TO

More information

Scalability in Log Management

Scalability in Log Management Whitepaper Scalability in Log Management Research 010-021609-02 ArcSight, Inc. 5 Results Way, Cupertino, CA 95014, USA www.arcsight.com info@arcsight.com Corporate Headquarters: 1-888-415-ARST EMEA Headquarters:

More information

Hadoop. http://hadoop.apache.org/ Sunday, November 25, 12

Hadoop. http://hadoop.apache.org/ Sunday, November 25, 12 Hadoop http://hadoop.apache.org/ What Is Apache Hadoop? The Apache Hadoop software library is a framework that allows for the distributed processing of large data sets across clusters of computers using

More information

Optimized for the Industrial Internet: GE s Industrial Data Lake Platform

Optimized for the Industrial Internet: GE s Industrial Data Lake Platform Optimized for the Industrial Internet: GE s Industrial Lake Platform Agenda The Opportunity The Solution The Challenges The Results Solutions for Industrial Internet, deep domain expertise 2 GESoftware.com

More information

Big Data With Hadoop

Big Data With Hadoop With Saurabh Singh singh.903@osu.edu The Ohio State University February 11, 2016 Overview 1 2 3 Requirements Ecosystem Resilient Distributed Datasets (RDDs) Example Code vs Mapreduce 4 5 Source: [Tutorials

More information

MySQL and Hadoop Big Data Integration

MySQL and Hadoop Big Data Integration MySQL and Hadoop Big Data Integration Unlocking New Insight A MySQL White Paper December 2012 Table of Contents Introduction... 3 The Lifecycle of Big Data... 4 MySQL in the Big Data Lifecycle... 4 Acquire:

More information

Ramping to a Big Data Visibility Architecture. White Paper. Copyright 2003 2014. VSS Monitoring Inc. All rights reserved.

Ramping to a Big Data Visibility Architecture. White Paper. Copyright 2003 2014. VSS Monitoring Inc. All rights reserved. Ramping to a Big Data Visibility Architecture White Paper Copyright 2003 2014. VSS Monitoring Inc. All rights reserved. Ramping to a Big Data Visibility Architecture White Paper Introduction Big data is

More information

White Paper: Cloud Identity is Different. World Leading Directory Technology. Three approaches to identity management for cloud services

White Paper: Cloud Identity is Different. World Leading Directory Technology. Three approaches to identity management for cloud services World Leading Directory Technology White Paper: Cloud Identity is Different Three approaches to identity management for cloud services Published: March 2015 ViewDS Identity Solutions A Changing Landscape

More information

Trusted Analytics Platform (TAP) TAP Technical Brief. October 2015. trustedanalytics.org

Trusted Analytics Platform (TAP) TAP Technical Brief. October 2015. trustedanalytics.org Trusted Analytics Platform (TAP) TAP Technical Brief October 2015 TAP Technical Brief Overview Trusted Analytics Platform (TAP) is open source software, optimized for performance and security, that accelerates

More information

Day with Development Master Class Big Data Management System DW & Big Data Global Leaders Program

Day with Development Master Class Big Data Management System DW & Big Data Global Leaders Program Day with Development Master Class Big Data Management System DW & Big Data Global Leaders Program Jean-Pierre Dijcks Big Data Product Management Server Technologies Part 1 Part 2 Foundation and Architecture

More information

Sisense. Product Highlights. www.sisense.com

Sisense. Product Highlights. www.sisense.com Sisense Product Highlights Introduction Sisense is a business intelligence solution that simplifies analytics for complex data by offering an end-to-end platform that lets users easily prepare and analyze

More information

HP Vertica OnDemand. Vertica OnDemand. Enterprise-class Big Data analytics in the cloud. Enterprise-class Big Data analytics for any size organization

HP Vertica OnDemand. Vertica OnDemand. Enterprise-class Big Data analytics in the cloud. Enterprise-class Big Data analytics for any size organization Data sheet HP Vertica OnDemand Enterprise-class Big Data analytics in the cloud Enterprise-class Big Data analytics for any size organization Vertica OnDemand Organizations today are experiencing a greater

More information

Business Transformation for Application Providers

Business Transformation for Application Providers E SB DE CIS IO N GUID E Business Transformation for Application Providers 10 Questions to Ask Before Selecting an Enterprise Service Bus 10 Questions to Ask Before Selecting an Enterprise Service Bus InterSystems

More information

THE GLOBAL EVENT MANAGER

THE GLOBAL EVENT MANAGER The Big Data Mining Company THE GLOBAL EVENT MANAGER When data is available and reachable, it has to be processed and decrypted using multiple heterogeneous tools, if these are available. Each of these

More information

How to Choose Between Hadoop, NoSQL and RDBMS

How to Choose Between Hadoop, NoSQL and RDBMS How to Choose Between Hadoop, NoSQL and RDBMS Keywords: Jean-Pierre Dijcks Oracle Redwood City, CA, USA Big Data, Hadoop, NoSQL Database, Relational Database, SQL, Security, Performance Introduction A

More information

Interactive data analytics drive insights

Interactive data analytics drive insights Big data Interactive data analytics drive insights Daniel Davis/Invodo/S&P. Screen images courtesy of Landmark Software and Services By Armando Acosta and Joey Jablonski The Apache Hadoop Big data has

More information

Automated Data Ingestion. Bernhard Disselhoff Enterprise Sales Engineer

Automated Data Ingestion. Bernhard Disselhoff Enterprise Sales Engineer Automated Data Ingestion Bernhard Disselhoff Enterprise Sales Engineer Agenda Pentaho Overview Templated dynamic ETL workflows Pentaho Data Integration (PDI) Use Cases Pentaho Overview Overview What we

More information

CA Service Desk Manager

CA Service Desk Manager PRODUCT BRIEF: CA SERVICE DESK MANAGER CA Service Desk Manager CA SERVICE DESK MANAGER IS A VERSATILE, COMPREHENSIVE IT SUPPORT SOLUTION THAT HELPS YOU BUILD SUPERIOR INCIDENT AND PROBLEM MANAGEMENT PROCESSES

More information

Integrating VoltDB with Hadoop

Integrating VoltDB with Hadoop The NewSQL database you ll never outgrow Integrating with Hadoop Hadoop is an open source framework for managing and manipulating massive volumes of data. is an database for handling high velocity data.

More information

White. Paper. EMC Isilon: A Scalable Storage Platform for Big Data. April 2014

White. Paper. EMC Isilon: A Scalable Storage Platform for Big Data. April 2014 White Paper EMC Isilon: A Scalable Storage Platform for Big Data By Nik Rouda, Senior Analyst and Terri McClure, Senior Analyst April 2014 This ESG White Paper was commissioned by EMC Isilon and is distributed

More information

Using Kafka to Optimize Data Movement and System Integration. Alex Holmes @

Using Kafka to Optimize Data Movement and System Integration. Alex Holmes @ Using Kafka to Optimize Data Movement and System Integration Alex Holmes @ https://www.flickr.com/photos/tom_bennett/7095600611 THIS SUCKS E T (circa 2560 B.C.E.) L a few years later... 2,014 C.E. i need

More information

ORACLE COHERENCE 12CR2

ORACLE COHERENCE 12CR2 ORACLE COHERENCE 12CR2 KEY FEATURES AND BENEFITS ORACLE COHERENCE IS THE #1 IN-MEMORY DATA GRID. KEY FEATURES Fault-tolerant in-memory distributed data caching and processing Persistence for fast recovery

More information

Introduction to Hadoop HDFS and Ecosystems. Slides credits: Cloudera Academic Partners Program & Prof. De Liu, MSBA 6330 Harvesting Big Data

Introduction to Hadoop HDFS and Ecosystems. Slides credits: Cloudera Academic Partners Program & Prof. De Liu, MSBA 6330 Harvesting Big Data Introduction to Hadoop HDFS and Ecosystems ANSHUL MITTAL Slides credits: Cloudera Academic Partners Program & Prof. De Liu, MSBA 6330 Harvesting Big Data Topics The goal of this presentation is to give

More information

Hadoop & Spark Using Amazon EMR

Hadoop & Spark Using Amazon EMR Hadoop & Spark Using Amazon EMR Michael Hanisch, AWS Solutions Architecture 2015, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Agenda Why did we build Amazon EMR? What is Amazon EMR?

More information

USING R.E.A.L. BIG DATA FOR IT OPERATIONS CONTENTS

USING R.E.A.L. BIG DATA FOR IT OPERATIONS CONTENTS CONTENTS Introduction... 3 Making Sense of the Options... 4 The R.E.A.L. Big Data Test... 5 Understanding R.E.A.L... 6 The Elements of RELIABILITY... 7 The Elements of EXTENSIBILITY... 9 The Elements of

More information

WHITE PAPER SPLUNK SOFTWARE AS A SIEM

WHITE PAPER SPLUNK SOFTWARE AS A SIEM SPLUNK SOFTWARE AS A SIEM Improve your security posture by using Splunk as your SIEM HIGHLIGHTS Splunk software can be used to operate security operations centers (SOC) of any size (large, med, small)

More information

Informatica and the Vibe Virtual Data Machine

Informatica and the Vibe Virtual Data Machine White Paper Informatica and the Vibe Virtual Data Machine Preparing for the Integrated Information Age This document contains Confidential, Proprietary and Trade Secret Information ( Confidential Information

More information

IBM Global Technology Services September 2007. NAS systems scale out to meet growing storage demand.

IBM Global Technology Services September 2007. NAS systems scale out to meet growing storage demand. IBM Global Technology Services September 2007 NAS systems scale out to meet Page 2 Contents 2 Introduction 2 Understanding the traditional NAS role 3 Gaining NAS benefits 4 NAS shortcomings in enterprise

More information

IBM InfoSphere BigInsights Enterprise Edition

IBM InfoSphere BigInsights Enterprise Edition IBM InfoSphere BigInsights Enterprise Edition Efficiently manage and mine big data for valuable insights Highlights Advanced analytics for structured, semi-structured and unstructured data Professional-grade

More information

Architecture Modernization

Architecture Modernization Architecture Modernization Pragmatic Data Engineering and Pipeline Creation 1 Trends in the Market Explosion of Unstructured Data Data Warehouse Limitations Increased Processing Demands 16 billion connected

More information

WINDOWS AZURE DATA MANAGEMENT AND BUSINESS ANALYTICS

WINDOWS AZURE DATA MANAGEMENT AND BUSINESS ANALYTICS WINDOWS AZURE DATA MANAGEMENT AND BUSINESS ANALYTICS Managing and analyzing data in the cloud is just as important as it is anywhere else. To let you do this, Windows Azure provides a range of technologies

More information

brief contents PART 1 BACKGROUND AND FUNDAMENTALS...1 PART 2 PART 3 BIG DATA PATTERNS...253 PART 4 BEYOND MAPREDUCE...385

brief contents PART 1 BACKGROUND AND FUNDAMENTALS...1 PART 2 PART 3 BIG DATA PATTERNS...253 PART 4 BEYOND MAPREDUCE...385 brief contents PART 1 BACKGROUND AND FUNDAMENTALS...1 1 Hadoop in a heartbeat 3 2 Introduction to YARN 22 PART 2 DATA LOGISTICS...59 3 Data serialization working with text and beyond 61 4 Organizing and

More information