ROCANA WHITEPAPER Rocana Ops Architecture
CONTENTS

INTRODUCTION
DESIGN PRINCIPLES
EVENTS: A COHESIVE AND FLEXIBLE DATA MODEL FOR OPERATIONAL DATA
DATA COLLECTION
    Syslog
    File tailing
    Directory spooling
    Java API
    REST API
    Log4J
PUBLISH-SUBSCRIBE MESSAGING
EVENT PROCESSING AND STORAGE
    Event Storage
    Search Indexing
    Metric Aggregation
    Advanced Analytics
    Data Lifecycle Management
DATA EXPLORATION, VISUALIZATION, AND ANALYSIS
EXTENDING ROCANA OPS
CONCLUSION
ABOUT ROCANA
INTRODUCTION

The infrastructure that runs the modern business looks fundamentally different today than it did twenty, ten, or even five years ago, and operations has been struggling to keep up with the rate of change. Applications that ran on single large machines gave way to specialized servers numbering in the tens or hundreds, followed by the virtualization movement, which turned the hardware substrate into a general-purpose platform. Today, applications themselves look different: they're decomposed into fine-grained services, distributed across many machines, and highly interdependent. Organizations have begun shifting from discrete line-of-business applications to shared services built and maintained by platform groups. The modern application may be the amalgamation of tens of services, each of which operates and scales independently. While incredibly powerful, resilient, and scalable, these systems are also very complex and challenging to operate in mission-critical environments.

Traditional IT operations management tools that worked well in the past can no longer provide the necessary insight, nor run at the scale required for modern IT infrastructure. A fundamentally new approach is required: one that treats IT Operations Analytics as the Big Data problem it has become.

Rocana Ops was built from the ground up to give system, network, and application administrators and developers true insight into modern infrastructure and applications. It was designed with the assumptions that there are hundreds of thousands of machines to be managed, and that the relationship between services, applications, and the underlying infrastructure is ephemeral and can change dynamically. The fundamental goal of Rocana Ops is to provide a single view of event and metric data for all services, while separating the signal from the noise using the advanced analytics and machine learning techniques associated with Big Data challenges in other domains.

DESIGN PRINCIPLES

When building Rocana Ops, we held the belief that it would be used across a large organization as a shared service. As a result, it needed to operate at enterprise scale, handling tens of terabytes per day of event and metric data, with hundreds or even thousands of users engaging with the application concurrently.

We anticipated that collected data would be used for more than the most obvious operational use cases: our customers would want to extend the platform and use the data in new and unexpected ways. We therefore needed to eliminate proprietary formats and inefficient batch exports to allow customers to truly own their data assets. Also, in order to promote interoperability, extensibility, and scalability, we maximized the use of open source platforms in the underlying architecture.

In order to support production operations, we dramatically reduced end-to-end latency from when data is initially produced to when it's available for query and analytics. An operational analytics platform, by definition, also needs to be more available than the systems it monitors; therefore, any failures in the system or the infrastructure on which it runs need to be anticipated and handled appropriately.
Pulling together operational data from disparate systems is inherently an exercise in data integration. It is infeasible to require modification of source systems; therefore, extensible data processing and transformation needs to be a first-class citizen.

The system has many different kinds of users. The needs of application, network, and system administrators differ from those of developers, and DevOps staff have still different requirements. To be truly useful, the system needs to support all of these users and use cases.

[Figure: High-Level Logical View of Rocana Ops Architecture]

Analyzing complex operational data is not that different from analyzing security, customer behavior, or financial data, yet IT operators rarely have tools as sophisticated as those of their peers in these other disciplines. There is typically a lot of noise in operational monitoring at scale, and users can be easily overwhelmed by it, lacking the time to separate the signal from the noise. Further, the signal required to solve one problem may be the noise for another. Rocana Ops aims to address this by providing out-of-the-box machine-learning algorithms that guide the user in the analysis process. The solution's interaction model is based on visualizations that facilitate narrowing down the scope of analysis and pinpointing problem areas. Many years of operational experience, big data expertise, and advanced analytics practice have come together to build the next generation of operational analytics for today's modern infrastructure.

One of the primary goals of Rocana Ops is to eliminate data silos by combining the critical features of different operational systems into a single application. That said, much of the data collected by Rocana Ops is the same data used by specialized applications in security, marketing, e-commerce, and other such systems. Rather than force developers to reinvent the kind of scale-out infrastructure that drives Rocana Ops and source this data a second time, we actively encourage direct integration with and extension of the Rocana Ops platform.
EVENTS: A COHESIVE AND FLEXIBLE DATA MODEL FOR OPERATIONAL DATA

To simplify collection, transformation, storage, exploration, and analysis, Rocana Ops uses an extensible event data model to represent all data within the system. This event data model provides a simple and flexible way of representing time-oriented discrete events that occur within the data center. An event can represent a miscellaneous log record, an HTTP request, a system or application authentication result, a SQL query audit, or even a group of application metrics at a particular point in time, each designated with an event type. Application and device performance data is also captured as events, categorized under one of the system's built-in event types. Data sources generate events containing one or more metrics in the attributes of a metric event, and the system automatically builds and maintains a time series of each metric over multiple dimensions.

All events have a set of common fields, including a millisecond-accurate timestamp, event type, unique identifier, and event message body, as well as the originating host, location, and service. Since all events have these common fields, it's possible to correlate otherwise disparate data sources without requiring complex and bespoke queries. Additionally, each event contains a set of key-value pairs which can carry event type-specific data. These attributes act as a natural point of customization for user-defined event types. Moreover, because all parts of Rocana Ops understand these attributes, new or business-specific event data can be easily captured, processed, and queried without custom development.

Here's an example of a typical syslog (RFC 3164) event, represented as text. Within the system, events in transit are actually encoded in Apache Avro format, an efficient, compact, binary format designed for the transport and storage of data.

Example: A syslog (RFC 3164) event.

{
  // Common fields present in all events.
  id: JRHAIDMLCKLEAPMIQDHFLO3MXYXV7NVBEJNDKZGS2XVSEINGGBHA====,
  event_type_id: 100,
  ts: ...,
  location: aws/us-west-2a,
  host: example01.rocana.com,
  service: dhclient,
  body: ...,
  attributes: {
    // Attributes are specific to the event type.
    syslog_timestamp: ...,
    syslog_process: dhclient,
    syslog_pid: 865,
    syslog_facility: 3,
    syslog_severity: 6,
    syslog_hostname: example01,
    syslog_message: DHCPACK from ... (xid=0x5c64bdb0)
  }
}
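Because events are ordinary Avro records, any Avro-capable tool can construct or decode them. The following is a minimal sketch, assuming a simplified schema derived from the common fields above; the actual Rocana schema is not reproduced in this paper and is certainly richer. It builds an event with Avro's standard Java API.

Example: Building an event as an Avro record (illustrative schema).

import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Map;
import org.apache.avro.Schema;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericRecord;

public class EventModelSketch {
    // Hypothetical schema covering only the common fields; attributes are
    // modeled as a string-to-string map for simplicity.
    private static final String SCHEMA_JSON =
        "{\"type\":\"record\",\"name\":\"Event\",\"fields\":["
      + "{\"name\":\"id\",\"type\":\"string\"},"
      + "{\"name\":\"event_type_id\",\"type\":\"int\"},"
      + "{\"name\":\"ts\",\"type\":\"long\"},"
      + "{\"name\":\"location\",\"type\":\"string\"},"
      + "{\"name\":\"host\",\"type\":\"string\"},"
      + "{\"name\":\"service\",\"type\":\"string\"},"
      + "{\"name\":\"body\",\"type\":\"bytes\"},"
      + "{\"name\":\"attributes\",\"type\":{\"type\":\"map\",\"values\":\"string\"}}]}";

    public static GenericRecord syslogEvent(String rawLine) {
        Schema schema = new Schema.Parser().parse(SCHEMA_JSON);
        GenericRecord event = new GenericData.Record(schema);
        // Common fields present in all events.
        event.put("id", "JRHAIDMLCKLEAPMIQDHFLO3MXYXV7NVBEJNDKZGS2XVSEINGGBHA====");
        event.put("event_type_id", 100);              // built-in syslog event type
        event.put("ts", System.currentTimeMillis());  // millisecond-accurate timestamp
        event.put("location", "aws/us-west-2a");
        event.put("host", "example01.rocana.com");
        event.put("service", "dhclient");
        event.put("body", ByteBuffer.wrap(rawLine.getBytes(StandardCharsets.UTF_8)));
        // Attributes are specific to the event type.
        event.put("attributes", Map.of(
            "syslog_process", "dhclient",
            "syslog_pid", "865",
            "syslog_severity", "6"));
        return event;
    }
}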
This example shows how HTTP request events from an application server can be easily represented as well.

Example: An application server-neutral HTTP request event.

{
  id: JRHAIDMLCKLEAPMIQDHFLO3MXYXV7NVBEJNDKZGS2XVSEINGGBHA====,
  event_type_id: 103,
  ts: ...,
  location: nyc/fac2/rack17,
  host: example01.rocana.com,
  service: httpd,
  body: ...,
  attributes: {
    http_request_url: ...,
    http_request_vhost: ...,
    http_request_proto: HTTP/1.1,
    http_request_method: GET,
    http_request_path: /product/104,
    http_request_query: ref=search&uid=1007,
    http_request_scheme: http,
    http_response_code: 200,
    http_response_size: 2452,
    ...
  }
}

When building an operational analytics platform as a service to business units within a larger organization, it's common to onboard new event types constantly. This flexible event format allows the system to capture new event types without requiring configuration within the platform prior to receiving data. Additionally, both raw and extracted data can be retained within an event, allowing for easy reprocessing in case data processing errors are found.

DATA COLLECTION

As noted before, we see operational analytics as first and foremost an exercise in data integration: a solution needs to collect data from every system that makes up the larger organizational infrastructure, as well as handle the different ways in which those systems produce logs, metrics, and diagnostics. In simple cases, integration may be possible via configuration; elsewhere, custom plugins may do the trick. Specialized cases may require deeper integration involving direct access to traditionally internal formats and data. All of these methods are supported by Rocana Ops using two main mechanisms:
1. A native Rocana Agent plays three major roles on Linux and Windows operating systems. The Agent acts as a syslog server and provides file tailing and directory spooling data integration for locally written logs. Additionally, the Agent collects OS- and host-level metrics from the machines on which it runs.

[Diagram: Rocana Ops Agent — syslog server, file tailing, directory spooling]

2. A high-performance Java API is also available to collect data directly from applications. It can be used directly for system extension, or through the wrapper REST and Log4J APIs.

[Diagram: Java API — Log4J and REST API wrappers]

SYSLOG

Syslog is the primary source of data for Unix-variant operating systems, as well as most network devices. The Rocana Agent operates a standards-compliant syslog server (RFC 3164, among others), supporting both TCP and UDP protocols. Syslog messages are automatically parsed, becoming events within the system.

FILE TAILING

Text-based log files are a common and simple method of exposing application activity. The Rocana Agent supports reliable real-time tailing of log files, with customizable regular expressions for extracting fields from records.

DIRECTORY SPOOLING

While file tailing is used to watch well-known log files that receive a continuous stream of small changes, directory spooling supports the use case of directories that receive larger data files that should be ingested once. If systems dump diagnostic trace or performance files, for example, the Rocana Agent can respond by processing each file as it arrives in a specified filesystem directory.

JAVA API

The Rocana Ops Java API is the highest performance, most direct, and most flexible method of producing or consuming event data. Those who wish to explicitly instrument applications or integrate with custom systems can use this API to produce data directly to, or consume data from, the publish/subscribe messaging system used by the rest of the platform. This same API powers the REST API as well as many of the internal components of Rocana Ops and, as a result, is highly optimized and proven.
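The exact signatures of the Rocana Java API are not reproduced in this paper, so the following is only a sketch of the mechanics the API wraps: producing a serialized event directly onto the platform's pub/sub layer. It uses the stock Apache Kafka producer; the broker address, topic name ("events"), and byte-array payload are assumptions made for the example.

Example: Publishing a serialized event to the pub/sub layer (illustrative).

import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class EventProducerSketch {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "broker01.rocana.com:9092"); // hypothetical broker
        props.put("key.serializer",
            "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer",
            "org.apache.kafka.common.serialization.ByteArraySerializer");

        // In practice this would be the Avro-encoded event from the earlier
        // sketch; an empty payload keeps the example compact.
        byte[] avroEncodedEvent = new byte[0];

        try (KafkaProducer<String, byte[]> producer = new KafkaProducer<>(props)) {
            // Keying by host spreads load across the topic's partitions
            // while keeping each host's events ordered.
            producer.send(new ProducerRecord<>("events",
                "example01.rocana.com", avroEncodedEvent));
        }
    }
}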
REST API

A simple REST API is provided on top of the Java API for easy integration of systems where performance is less critical. This API can be used from any language or third-party system that supports HTTP/S and JSON parsing.

LOG4J

An appender plugin for the common Apache Log4J Java logging framework is provided for code-free integration with systems already instrumented with these APIs. Using the Rocana Ops Log4J appender obviates the need to write text logs to disk, sending data directly and reliably to the pub/sub messaging system.

PUBLISH-SUBSCRIBE MESSAGING

At its core, Rocana Ops uses the high-throughput, reliable, scale-out, open source publish-subscribe ("pub-sub") messaging system Apache Kafka for the transport of all data within the system. This pub-sub layer acts as the central nervous system, facilitating all of the real-time data delivery and processing performed by the rest of the system. All event data captured by the sources described in the Data Collection section is sent to a global event stream, which is consumed by the system in different ways. This firehose of event data provides a single, full-fidelity, real-time view of all activity within an entire organization, making it the perfect data integration point for both Rocana Ops and custom applications.

Kafka has a very familiar logical structure, similar to any other pub-sub broker, with a few notable deviations that allow it to function at this scale. Just as with traditional pub-sub messaging systems, Kafka employs the notions of producers, consumers, and topics to which producers send, or from which consumers receive, data. All data in Kafka is always persisted to disk in transaction logs; this is the equivalent of the most reliable or durable modes in traditional messaging systems. Rather than assume all data fits on a single broker, however, Kafka partitions each topic, spreading the partitions across multiple brokers that work together as a cluster. Each partition of a topic is also optionally replicated a configurable number of times so broker failures can be tolerated. The Rocana Ops data sources described earlier automatically distribute data across these partitions, taking full advantage of the aggregate capacity of the available brokers. When more capacity is required, additional brokers may be added to the cluster. For additional technical information about Apache Kafka, see https://kafka.apache.org.

EVENT PROCESSING AND STORAGE

Event data is processed in real time by specialized services as it is received from the messaging layer. The following services exist within Rocana Ops to facilitate its different functionalities:

EVENT STORAGE

All event data is persisted to the Hadoop Distributed File System (HDFS) for long-term storage and downstream processing. Data is stored in Apache Parquet, a highly optimized, PAX-structured format supporting common columnar storage features such as run-length encoding (RLE) and dictionary encoding, as well as traditional block compression. This dataset is partitioned by time to facilitate partition pruning when performing queries against known time ranges (the common case in operational analytics).
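Because the event store is an open, time-partitioned Parquet dataset, it can be read back by any Parquet-capable tool without an export step. The sketch below is a hedged illustration only: the HDFS directory layout and partition naming are assumptions, not the product's actual layout. It reads a single hour of events with the standard parquet-avro reader, which is effectively what partition pruning amounts to.

Example: Scanning one time partition of the event dataset (illustrative layout).

import org.apache.avro.generic.GenericRecord;
import org.apache.hadoop.fs.Path;
import org.apache.parquet.avro.AvroParquetReader;
import org.apache.parquet.hadoop.ParquetReader;

public class EventScanSketch {
    public static void main(String[] args) throws Exception {
        // Hypothetical time-partitioned layout: one directory per hour.
        Path partition = new Path(
            "hdfs:///rocana/events/year=2015/month=06/day=01/hour=13/part-00000.parquet");

        try (ParquetReader<GenericRecord> reader =
                 AvroParquetReader.<GenericRecord>builder(partition).build()) {
            GenericRecord event;
            while ((event = reader.read()) != null) {
                // Common fields exist on every event, so generic processing
                // works across all event types.
                System.out.println(event.get("ts") + " " + event.get("host")
                    + " " + event.get("service"));
            }
        }
    }
}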
SEARCH INDEXING

Each event is indexed for full-text and faceted search within the Rocana Ops application. Just as with the event storage service, search indexes are partitioned by time, and served by a scale-out, parallelized search engine. Indexing is performed in real time, as the data arrives.

METRIC AGGREGATION

In addition to discrete events, Rocana Ops also maintains time series datasets of metric data from device and application activity. Metric data can arrive as events containing host-related metrics collected by the Rocana Agent (described earlier), or it can be extracted from other kinds of events, such as logs. Examples include extracting the number of failed application logins from clickstream data, the number and types of various HTTP errors, or the query time of every SQL statement executed in a relational database. The metric aggregation service records and derives this kind of metric data, writing it to HDFS as a time-partitioned Parquet dataset. Rocana Ops uses this data to build charts and detect patterns in the data using advanced data analysis techniques.

ADVANCED ANALYTICS

Many of the advanced analytical functions of Rocana Ops operate by observing the data over time and learning about patterns that occur. Rocana Ops employs sophisticated anomaly detection, using machine learning to develop baseline models for the system as a whole, as well as custom models of hosts, services, or locations. For example, Rocana Ops anomaly detection can establish an ever-improving model of disk I/O on a particular machine, then flag any unexpected deviations from that norm. The analytics service controls the execution of these algorithms.
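Rocana's production models are not described in detail here, but the flavor of baseline-and-deviation detection can be conveyed with a toy sketch: an online mean/variance estimate (Welford's algorithm) over a single disk I/O metric, flagging samples far from the learned baseline. Everything below, including the warm-up count and the four-sigma threshold, is an illustrative assumption rather than Rocana's actual algorithm.

Example: A toy ever-improving baseline with deviation flagging (illustrative).

import java.util.Random;

public class BaselineSketch {
    private long n = 0;
    private double mean = 0.0;
    private double m2 = 0.0; // running sum of squared deviations

    /** Feed one metric sample; returns true if it deviates from the baseline. */
    public boolean observe(double sample) {
        boolean anomalous = false;
        if (n > 30) { // flag only once a baseline has been learned
            double stddev = Math.sqrt(m2 / (n - 1));
            anomalous = stddev > 0 && Math.abs(sample - mean) > 4 * stddev;
        }
        // Welford's update: the model keeps improving as data arrives.
        n++;
        double delta = sample - mean;
        mean += delta / n;
        m2 += delta * (sample - mean);
        return anomalous;
    }

    public static void main(String[] args) {
        BaselineSketch diskIo = new BaselineSketch();
        Random rng = new Random(42);
        for (int i = 0; i < 1000; i++) {
            double sample = 100 + 10 * rng.nextGaussian(); // typical disk I/O
            if (i == 500) sample = 500;                    // injected spike
            if (diskIo.observe(sample)) {
                System.out.println("anomaly at sample " + i + ": " + sample);
            }
        }
    }
}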
DATA LIFECYCLE MANAGEMENT

Over time, it becomes necessary to control the growth and characteristics of high-volume datasets in order to control resource consumption and cost. Rocana Ops includes a data lifecycle management (DLM) service that enforces policies on the collected data, including control over data retention and optimization.

All services described above are highly available and may be scaled independently to accommodate the size and complexity of a deployment.

DATA EXPLORATION, VISUALIZATION, AND ANALYSIS

Rocana Ops includes an interactive user interface for exploring, visualizing, and analyzing the operational data collected by the system. Specialized views exist for interactive data exploration and trend identification, comparing and correlating different kinds of events, full-text search, custom dashboard creation, and more. These views combine metadata extracted from event data, time series metric data, and discrete event data to provide a comprehensive understanding of what's happening within the infrastructure, without requiring operators to learn specialized skills or query syntax.

Rather than attempt to support these visualizations by shoehorning all their requests into a single general-purpose query engine or storage system, Rocana Ops uses different query engines and data representations to answer different kinds of questions. Interactive full-text search queries, for instance, are handled by a parallel search engine, while charts of time series data use a parallel SQL engine over partitioned Parquet datasets. All parts of the system use first-class parallelized query and storage systems for handling event and metric data, making Rocana Ops the first natively parallelized operational analytics system available.

EXTENDING ROCANA OPS

Rocana Ops is highly customizable, all the way from the user interfaces down to the data platform. Users can extend the system by defining their own event types for specific use cases. They can also develop custom producers to support custom sources, as well as custom consumers to execute specialized data processing, as sketched below. In addition, data transformations can be defined using a configuration-based approach to enable quick, simple transformation to and from multiple formats. Anomaly detection targets and thresholds, as well as system-wide data retention policies, are also fully customizable.
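A custom consumer is, in essence, just another subscriber to the global event stream described under Publish-Subscribe Messaging. The hedged sketch below uses the stock Apache Kafka consumer; the topic name, consumer group id, and routing logic are assumptions made for the example.

Example: A custom consumer for specialized processing (illustrative).

import java.time.Duration;
import java.util.List;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class CustomConsumerSketch {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "broker01.rocana.com:9092"); // hypothetical broker
        props.put("group.id", "custom-processing-pipeline");        // hypothetical group
        props.put("key.deserializer",
            "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer",
            "org.apache.kafka.common.serialization.ByteArrayDeserializer");

        try (KafkaConsumer<String, byte[]> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(List.of("events")); // hypothetical global stream topic
            while (true) {
                ConsumerRecords<String, byte[]> records =
                    consumer.poll(Duration.ofSeconds(1));
                for (ConsumerRecord<String, byte[]> record : records) {
                    process(record.value());
                }
            }
        }
    }

    // Decode the Avro payload (see the earlier event sketch) and apply
    // event-type-specific processing here.
    private static void process(byte[] avroEncodedEvent) { /* ... */ }
}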
[Figure: Simplified Flow Diagram for Integrating External Data with Events]

CONCLUSION

Rocana Ops is all about bringing the power of Big Data analytics into the realm of IT operations. It provides several mechanisms to ingest all of your data, and utilizes the Kafka bus architecture to ensure that the system can scale to the required volumes. It also uses HDFS for long-term data retention in order to minimize the costs associated with storage infrastructure. The use of these, as well as other open source platforms and formats, also ensures that the data is always accessible and is never held hostage. To that end, robust integration mechanisms are also provided into and out of the system.

In order to deliver sophisticated yet straightforward data analysis (of the kind typically associated with other domains, such as finance), Rocana Ops employs anomaly detection algorithms based on machine learning, which drive visualizations that guide the user to the most important data points. These algorithms can be tweaked, but are meant to work right out of the box, without requiring any data science background. Our key goal is to provide a simple, easy-to-use application, while taking advantage of the most sophisticated technologies available today.
ABOUT ROCANA

Rocana is creating the next generation of IT operations analytics software in a world in which IT complexity is growing exponentially as a result of virtualization, containerization, and shared services. Rocana's mission is to provide guided root cause analysis of event-oriented machine data in order to streamline IT operations and boost profitability. Founded by veterans from Cloudera, Vertica, and Experian, the Rocana team has directly experienced the challenges of today's IT infrastructures, and has set out to address them using modern technology that leverages the Hadoop ecosystem.

Rocana, Inc., 548 Market St #22538, San Francisco, CA (877) ROCANA1 [email protected]

2015 Rocana, Inc. All rights reserved. Rocana and the Rocana logo are trademarks or registered trademarks of Rocana, Inc. in the United States and/or other countries. WP-ARCH