WHITE PAPER: Fast Data

FAST DATA APPLICATION REQUIREMENTS FOR CTOS AND ARCHITECTS

CTOs and Enterprise Architects recognize that the consumerization of IT is changing how software is developed, requiring an evolution in the way applications are designed and built. The world's data is doubling in size every two years, driven by the forces of mobile adoption, connected devices, sensors, social media, and the Internet of Things (IoT). These sources generate vast amounts of fast-moving data. Successfully handling this Fast Data is business's next big challenge.

The availability and abundance of fast, new data presents an enormous opportunity to extract intelligence, gain insight, and personalize interactions of all types. Companies of all sizes are examining how they build new applications and new analytics capabilities; in many cases, what were two separate functions, the application and the analytics, are beginning to merge. This evolution is leading enterprises to realize that they need a unifying architecture to support the development of data-heavy applications that rely on a fast-data, big-data workflow. The emergence of this unifying architecture, the new enterprise data architecture, will result in a platform capable of processing streams of data at new levels of complexity, scale and speed.

This white paper looks at the requirements of the Fast Data workflow and proposes solution patterns for the most common problems software development organizations must resolve to build applications capable of managing fast and big data.
Solving common, repeating problems

Software designers and architects build software stacks from reference architectures to solve common, repeating problems. For example, the LAMP stack provides a framework for building web applications. Each tier of the LAMP stack has viable substitutes that might be chosen simply by preference or perhaps for technical reasons. Additional layers or complementary tools can be mixed in to address specific needs. A lot of information can be conveyed succinctly to others familiar with the pattern by explaining the resulting architecture in the context of the generalized LAMP pattern.

Across industries, a software stack is emerging that enables the development of applications capable of processing high-velocity feeds of data (feeds that quickly accumulate into large volumes of data). Over the next year or two this emerging Fast and Big Data stack will gain prominence and serve as a starting point for developers writing applications against fast, large-scale data.

Though reaching a consensus on a functional fast data-big data stack through trial, testing, production and extended evaluation is the goal, it's important to step up a layer and consider the workflow that motivates the stack's requirements. We know the common user activities for a desktop application stack (a graphical UI, likely an MVC-style layering, a configuration database, undo capability, and so on). We know the common user activities that guide the design of a web application. In fact, we know these so well (many of us have written and designed tens or even hundreds of applications and websites over the course of our careers) that we internalize the knowledge and forget that it was once new to us. However, few of us have developed many large-scale data applications.
We haven't yet, as individuals or as a community, internalized the usage patterns and general requirements for applications that process high-speed inputs from sensors, M2M devices, or IoT platforms.

The core activities of the Fast-Data, Big-Data workflow

Large-scale data platforms must support four distinct activities to support fast-data, big-data workflows: collecting, exploring, analyzing and acting.

The large-scale data stack must allow collecting high-speed, event-oriented data that accumulates quickly. Simply collecting incoming data can be challenging. Tools like Flume and Kafka scale the retrieval and buffering of incoming event data for further application processing, solving one piece of the puzzle. However, data often needs to be enriched (augmented with metadata or transformed to common formats), validated, or correlated (re-assembled) before it can be used in Fast Data applications. The collection portions of the Fast Data stack must enable these computations and transformations in addition to the raw bucketing of incoming feeds.
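As a concrete illustration, the validate-then-enrich step of the collection stage might look like the following minimal Python sketch. The event fields, metadata table and helper names here are hypothetical, invented for illustration; they are not tied to Flume, Kafka, or any particular product.

```python
from datetime import datetime, timezone

# Hypothetical static metadata (dimension data) used to enrich events.
SENSOR_METADATA = {
    "s-001": {"site": "plant-a", "units": "celsius"},
    "s-002": {"site": "plant-b", "units": "celsius"},
}

def validate(event):
    """Reject events that are malformed or reference unknown sensors."""
    return (
        isinstance(event.get("value"), (int, float))
        and event.get("sensor_id") in SENSOR_METADATA
    )

def enrich(event):
    """Augment a raw event with metadata and normalize it to a common format."""
    meta = SENSOR_METADATA[event["sensor_id"]]
    return {
        "sensor_id": event["sensor_id"],
        "site": meta["site"],
        "units": meta["units"],
        "value": float(event["value"]),
        "ingested_at": datetime.now(timezone.utc).isoformat(),
    }

def collect(raw_events):
    """Collection stage: validate, then enrich, dropping bad input."""
    return [enrich(e) for e in raw_events if validate(e)]
```

In a real pipeline this logic would run against a buffered feed from a tool like Kafka rather than an in-memory list, but the validate/enrich/normalize shape is the same.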
The Fast Data stack must allow exploring the resulting data for patterns, trends and correlations. We commonly refer to this process as data science. Exploration is highly interactive and is often driven by questions and intuition. The demand for qualified, experienced data scientists is evidence of the value of exploring. The popularity of tools such as the R Project, and the ability of systems like Spark to run machine-learning algorithms and statistical tests against massive data sets, are democratizing access to exploration.

The Fast Data stack must allow analyzing data sets to derive seasonal patterns, statistical summaries, scoring models, recommendation models, search indexes and other artifacts for use in user-facing applications. We explore to find and test hypotheses and to identify exploitable trends. Analyzing is a distinct activity from exploring: analyzing is formal and repeated. Analytics may run once a year or once every few seconds, but are typically periodic. For example, we explore to understand if our business is affected by seasonal trends or if a regression model can predict an interesting event. If our exploration confirms our hypothesis, we analyze periodically to calculate seasonal factors or parameters for a confirmed model.

The Fast Data stack must allow acting in real time against new incoming data. Responding to users or driving downstream processes, based on the output of the analysis activity and the context and content of the present event, is a requirement of applications that improve revenue. These actions protect users from fraud, enhance customer experience, or optimize the usage of expensive resources. Analytics provides us with a path to wisdom.
Analogous to how the integrated capabilities of the LAMP stack informed the millions of web applications written over the last decade, a stack that reliably, simply and efficiently enables the core activities of the Fast Data-Big Data workflow will win and become the reference architecture for large-scale data applications.

The new enterprise data architecture

It is natural that the collect, explore and analyze phases precede the act phase as we innovate applications and businesses that leverage Big Data. Consequently, the Big Data portion of the enterprise data architecture emerged first and is more familiar. Practitioners have been using OLAP systems, statistical analysis packages, and data cubes for many years. There is consensus that a complete Big Data stack requires ubiquitous storage, often leveraging distributed file systems like HDFS; columnar analytics for OLAP; and frameworks that make data science, machine learning and statistical analytics feasible.

The requirements to enable real-time applications running against fast data streams (the streams of data that create business value) are less broadly understood. In part this is because these applications are often conceived and deployed in the later act phase of the Fast Data-Big Data workflow, and in part because the technologies that support applications at this data velocity are relatively new.
A discussion of the common requirements underlying Fast Data applications is needed.

The 5 common requirements for Fast Data solutions

Scott Jarr (VoltDB, CSO) lists five requirements of Fast Data applications (http://voltdb.com/blog/youd-better-do-fast-data-right/):

1. Ingest/interact with the data feed.
2. Make decisions on each event in the feed.
3. Provide visibility into fast-moving data with real-time analytics.
4. Integrate Fast Data systems with Big Data systems.
5. Serve analytic outputs from Big Data systems to users and applications.

These requirements are broadly shared characteristics of Fast Data applications, and appear with remarkable similarity across verticals as diverse as mobile subscriber management, smart grid, gaming, ad-tech, IoT and financial services. If you are designing an application on top of a high-speed data feed, you will eventually encounter, and need to implement, most (and often all) of these requirements.

Requirements are helpful; however, practitioners need meaningful guidance to structure a production solution. In order to understand how to build reliable, scalable and efficient Fast Data solutions, let's look at the problem from the perspective of the data that must be managed and the patterns of processing those data that are common in Fast Data applications.

Categorizing Data

Data management is harder to scale than computation. Computation, in the absence of data management requirements, is often easily parallelized, easily restarted in case of failure, and, consequently, easier to scale. Reading and writing state in a durable, fault-tolerant environment, while offering semantics such as the ACID transactions developers need to write reliable applications efficiently, is the more difficult problem. Scaling Fast Data applications necessitates organizing data first. What are the data that need to be considered?
There are several: the incoming data feed; metadata (dimension data) about the events in the feed; responses and outputs generated by processing the data feed; the post-processed output data feed; and analytic outputs from the Big Data store. Some of these data are streaming in nature (e.g., the data feed). Some are state-based (the metadata). Some are the results of transactions and algorithms (responses). Fast Data solutions must be capable of organizing and managing all of these types of data.
DATA SET                         TEMPORALITY   EXAMPLE
Input feed of events             Stream        Click stream, tick stream, sensor outputs, game play metrics
Event metadata                   State         Version data, location, user profiles, point-of-interest data
Big Data analytic outputs        State         Scoring models, seasonal usage, demographic trends
Event responses                  Stream        Authorizations, policy decisions, threshold alerts
Output feed                      Stream        Enriched, filtered, correlated transformation of input feed
Results of real-time analytics   State         Decisions, threshold alerts, authorizations

Two distinct types of data must be managed: streaming data and stateful data. Recognizing that the problem involves these different types of inputs is key to organizing a Fast Data solution.

Streaming data enters the Fast Data stack, is processed, possibly transformed, and then leaves. The objective of the Fast Data stack is not to capture and store these streaming inputs indefinitely; that is the Big Data responsibility. Rather, the Fast Data architecture must be able to ingest this stream and process discrete incoming events to allow applications that act in real time to make fast, high-value decisions.

Stateful data is the metadata, dimension data and analytic outputs that describe or enrich the input stream. Metadata might take the form of device locations, remote sensor versions, or authorization policies. Analytic outputs are reports, scoring models, or user segmentation values: information gleaned from processing historical data that informs the real-time processing of the input feed. The Fast Data architecture must support very fast lookup against this stateful data as incoming events are processed, and must support fast SQL processing to group, filter, combine and compute as part of the input feed processing.

As the Fast Data solution processes the incoming data feed, new events, event responses such as alerts, alarms, notifications and decisions, are created.
These events flow in two directions: responses flow back to the client or application that issued the incoming request; alerts, alarms and notifications are pushed downstream, often to a distributed queue, for processing by the next pipeline stages. The Fast Data stack must support the ability to respond in milliseconds to each incoming event; integrate with downstream queuing systems to enable pipelined processing of newly created events; and provide fast export of processed events via an output feed to the data lake.
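The two-directional flow just described can be sketched in a few lines of Python. Here an in-process deque stands in for a distributed queue such as Kafka, and the credit-limit rule, field names and threshold are invented for illustration:

```python
from collections import deque

# Stand-in for a distributed queue (e.g., Kafka) feeding downstream stages.
downstream_queue = deque()

CREDIT_LIMIT = 1000.0  # hypothetical policy threshold

def process_event(event):
    """Handle one incoming event: return a synchronous response to the
    caller, and push any derived alerts onto the downstream queue."""
    approved = event["amount"] <= CREDIT_LIMIT
    if not approved:
        # Alerts and notifications flow downstream for later pipeline stages.
        downstream_queue.append({"type": "limit_alert", "account": event["account"]})
    # The response flows back to the client that issued the request.
    return {"account": event["account"], "approved": approved}

resp = process_event({"account": "a-42", "amount": 1500.0})
```

The point of the sketch is the split: one result answers the caller within the latency budget, while derived events are handed off asynchronously rather than blocking the response path.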
Categorizing Processing

Fast Data applications present three distinct workloads to the Fast Data portion of the emerging large-scale data stack. These workloads are related but require different data management capabilities, and are best explained as separate usage patterns. Understanding how these patterns fit together, what they share and how they differ, is key to understanding the differences between fast and big, and key to making the management of Fast Data applications reliable, scalable and efficient.

MAKING REAL-TIME DECISIONS

The most traditional processing requirement for Fast Data applications is simply fast responses. As high-speed events are received, Fast Data enables the application to execute decisions: perform authorizations, evaluate policies, calculate personalized responses, refine recommendations, and offer responses at predictable millisecond-level latencies. These applications often need to deliver personalized responses in line with customer experience (driving the latency requirement). These applications are, very simply, modern OLTP.

These Fast Data applications are driven by machines, middleware, networks or high-concurrency interactions (e.g., ad-tech optimization or omni-channel location-based retail personalization). The data generated by these interactions and observations are often archived for subsequent data science. Otherwise, these patterns are classic transaction processing use cases. Meeting the latency and throughput requirements of modern OLTP requires leveraging the performance of in-memory databases in combination with ACID transaction support to create a processing environment capable of fast per-event decisions with latency budgets that meet user experience requirements. In order to process at the speed and latencies required, the database platform must support moving transaction processing closer to the data.
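The idea of moving the decision to the data, rather than shuttling rows between client and server, can be sketched with Python's built-in sqlite3 module standing in for an in-memory operational store. The table, account IDs and debit rule are hypothetical; in a platform built for this pattern the equivalent logic would live server-side (for example, as a stored procedure) so the whole read-decide-write runs in a single invocation:

```python
import sqlite3

# In-memory database standing in for an in-memory operational store.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE accounts (id TEXT PRIMARY KEY, balance REAL)")
db.execute("INSERT INTO accounts VALUES ('a-1', 100.0)")

def debit(account_id, amount):
    """One self-contained per-event decision: read, decide, and write in a
    single ACID transaction instead of several client round trips."""
    with db:  # the connection context manager commits or rolls back atomically
        row = db.execute(
            "SELECT balance FROM accounts WHERE id = ?", (account_id,)
        ).fetchone()
        if row is None or row[0] < amount:
            return "declined"
        db.execute(
            "UPDATE accounts SET balance = balance - ? WHERE id = ?",
            (amount, account_id),
        )
        return "approved"
```

Because the decision executes next to the state it reads, each event costs one call rather than a fetch, a client-side check, and a write-back, which is where the latency savings come from.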
Eliminating round trips between client and database is critical to achieving throughput and latency requirements. Moving transaction processing into memory and eliminating client round trips together reduce the running time of transactions in the database, further improving throughput.

ENRICHING WITHOUT BATCH ETL

Real-time data feeds often need to be filtered, correlated, or enriched before they can be frozen in the historical warehouse. Performing this processing in real time, in a streaming fashion against the incoming data feed, offers several benefits:

1. Unnecessary latency created by batch ETL processes is eliminated and time-to-analytics is minimized.
2. Unnecessary disk IO is eliminated from downstream Big Data systems (which are usually disk-based, not memory-based) when ETL is real-time rather than batch-oriented.
3. Application-appropriate data reduction at the ingest point eliminates operational expense downstream: less hardware is necessary.
4. Operational transparency is improved when real-time operational analytics can be run immediately, without intermediate batch processing or batch ETL.

"deltadna is a big data company. We're really interested in delivering fast real-time analytics on big data. We have huge numbers of players creating billions of data points. The games are generating event data as the player plays through that allows us to understand what the players are doing in the game, how they're interacting. The objective is to collect this data to understand users. Being able to interact, to customize the game experience to the individual player, is really important to keep those players playing. We needed to be able to ingest data at a very high velocity. We also knew we weren't simply going to be ingesting data and pushing it into a data warehouse or data lake to do interesting stuff; we have to be able to interact with our users in real time to make the experience personalized to the game. VoltDB gave us the front-end technology that allows us to do this kind of interaction."

The input data feed in Fast Data applications is a stream of information. Maintaining stream semantics while processing the events in the stream discretely creates a clean, composable processing model. Accomplishing this requires the ability to act on each input event, a capability distinct from building and processing windows as is done in traditional CEP systems. These event-wise actions need three capabilities: fast lookups to enrich each event with metadata; contextual filtering and sessionizing (re-assembly of discrete events into meaningful logical events is very common); and a stream-oriented connection to downstream pipeline systems (e.g., distributed queues like Kafka, OLAP storage, or Hadoop/HDFS clusters). Fundamentally, this requires a stateful system that is fast enough to transact event-wise against unlimited input streams and able to connect the results of that transaction processing to downstream components.

TRANSITIONING TO REAL-TIME

In some cases, backend systems built for batch processing are being deployed in support of sensor networks that are becoming more and more real-time. An example of this is validation, estimation and editing (VEE) platforms sitting behind real-time smart grid concentrators. There are real-time use cases (real-time consumption, pricing, grid management applications) that need to process incoming readings in real time. However, traditional billing and validation systems designed to process batched data may see less benefit from being rewritten as real-time applications. A platform that offers real-time capabilities to real-time applications, while supporting stateful buffering of the feed for downstream batch processing, meets both sets of requirements.
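Of the event-wise capabilities discussed above, sessionizing is the least obvious, so here is a minimal sketch of gap-based sessionization in Python. The 30-second inactivity gap, event shape and user IDs are assumptions made for illustration; real systems must also handle out-of-order events and eviction of idle sessions:

```python
from collections import defaultdict

SESSION_GAP = 30  # assumed: seconds of inactivity that closes a session

open_sessions = defaultdict(list)   # user -> events in the current session
closed_sessions = []                # complete logical events for downstream

def on_event(event):
    """Event-wise sessionizing: append each discrete event to the user's
    open session, and emit the session downstream once the gap is exceeded."""
    user, ts = event["user"], event["ts"]
    session = open_sessions[user]
    if session and ts - session[-1]["ts"] > SESSION_GAP:
        # Gap exceeded: the previous run of events forms one logical session.
        closed_sessions.append({"user": user, "events": session[:]})
        session.clear()
    session.append(event)

for e in [
    {"user": "u1", "ts": 0},
    {"user": "u1", "ts": 10},
    {"user": "u1", "ts": 100},  # 90s since last event: starts a new session
]:
    on_event(e)
```

In a production pipeline `closed_sessions` would be a connection to a downstream queue or store rather than a list, but the re-assembly of discrete events into a meaningful logical event is the same operation.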
Returning to the smart grid example, utility sensor readings become more informative and valuable when one can use real-time analytics to compare a reading from a single meter to ten others connected to the same transformer. This makes it possible to determine whether there is a problem with the transformer, rather than inferring that the problem lies with a single meter located at a home.

PRESENTING REAL-TIME ANALYTICS AND STREAMING AGGREGATIONS

Typically, organizations begin by designing solutions to collect and store real-time feeds as a test bed to explore and analyze the business opportunity of Fast Data before they deploy sophisticated real-time processing applications. Consequently, the on-ramp to Fast Data processing is making real-time data feeds operationally transparent and valuable: is collected data arriving? Is it accurate and complete? Without the ability to introspect real-time feeds, this important data asset is operationally opaque, pending delayed batch validation.
These real-time analytics often fall into four conceptually simple capabilities: counters, leaderboards, aggregates and time-series summaries. However, performing these at scale, in real time, is a challenge for many organizations that lack Fast Data infrastructure. Real-time analytics and monitoring enable organizations to capture a fast stream of events, sensor readings, market ticks, etc., and run real-time analytic queries, such as aggregated summaries by category, by specific entity, or by timeframe, providing meaningful summaries of vast operational data, updated second by second, to show what is currently happening. For example, digital advertisers can launch campaigns and then tweak campaign parameters in real time to see the immediate difference in performance.

Organizations are also deploying in-memory SQL analytics to scale high-frequency reports, or to support detailed slice, dice, sub-aggregation and grouping of reports for real-time end-user BI and dashboards. This problem is sometimes termed "SQL caching," meaning a cache of the output of another query, often from an OLAP system, that needs to support scale-out, high-frequency, high-concurrency SQL reads.

An example of scaling end-user visibility is provided by a cross-channel marketing attribution software provider which chose a Fast Data solution to support its real-time analysis and decisioning infrastructure. The company's Big Data systems run complex algorithms for marketing attribution and optimization. The company needed a high-performance, high-throughput database with real-time in-memory analytics to enable its clients to make key decisions on media spend. The selection of a scale-out Fast Data architecture supported the company's cloud-based delivery model, enabling fast response times for multiple concurrent users.
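The four capabilities named above, counters, leaderboards, aggregates and time-series summaries, are conceptually simple enough to sketch in a few lines of Python. The advertising-campaign event shape and field names are invented for illustration; the hard part a Fast Data platform solves is doing this consistently across a cluster at millions of events per second, not the arithmetic itself:

```python
from collections import Counter, defaultdict

counts = Counter()                     # running counters per campaign
volume_by_minute = defaultdict(float)  # time-series summary: spend per minute

def observe(event):
    """Fold one streamed event into the running counters and aggregates."""
    counts[event["campaign"]] += 1
    volume_by_minute[event["ts"] // 60] += event["spend"]

def leaderboard(n):
    """Top-n campaigns by event count, current as of the last event."""
    return counts.most_common(n)

for e in [
    {"campaign": "spring", "ts": 5, "spend": 1.0},
    {"campaign": "spring", "ts": 65, "spend": 2.0},
    {"campaign": "summer", "ts": 70, "spend": 0.5},
]:
    observe(e)
```

Each incoming event updates every summary incrementally, which is what lets dashboards show second-by-second state without re-scanning history.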
Combining the Data and Processing Discussions

A platform that enables Fast Data applications must account for both stateful and streaming data and provide high-speed, fault-tolerant connectors to upstream and downstream systems in the data pipeline. When first approaching Fast Data problems, many developers focus solely on the input event feed. This is a mistake. It is also important to consider enrichment data, interactive querying of real-time analytics using BI and dashboard tooling, and combining OLAP analytic outputs with current data to make real-time decisions. All of these functional components must be present in a Fast Data architecture to support applications that act.

VoltDB's in-memory, scale-out, relational database provides all of these functions. VoltDB offers high-speed, transactional ACID performance: the ability to process thousands to millions of discrete incoming events per second. VoltDB includes export connectors that parallelize streaming to downstream queues, databases and Hadoop clusters.

Shopzilla, Inc., a leading source for connecting buyers and sellers online, partnered with VoltDB to improve its high-speed, low-latency ingestion of merchant feeds as part of an inventory platform upgrade, with the goal of delivering a better customer experience. Shopzilla wanted to update the way it delivered cost comparisons of over 100 million products from tens of thousands of retailers a month (including constantly updated discounts) through its unique portfolio of engaging and informative shopping brands. Shopzilla chose VoltDB to narrow the data ingestion-to-decision gap. VoltDB's ability to run tens of thousands of writes and hundreds of thousands of reads per second, while performing real-time tracking, enabled Shopzilla to drive revenues by delivering up-to-date information to consumers, and by passing along more highly targeted leads to the thousands of retailers paying Shopzilla on a pay-per-click basis.
VoltDB's in-memory SQL support enables interactive queries against real-time analytics, streaming aggregations and operational summary data. VoltDB's durable, relational data model supports hosting enrichment data and OLAP analytic outputs directly in the Fast Data platform, obviating secondary databases, simplifying the overall architecture and reducing network latency.

Conclusion

Applications built to take advantage of fast and big data have different processing and system requirements. Implementing a component on the front end of the enterprise data architecture to handle fast ingestion of data, and to act on that data with real-time analytics, enables developers to create applications that make data-driven decisions on each event as it enters the data pipeline. Thus applications can take action on real-time, per-event data; processed data can then be exported to the long-term data warehouse or analytics store for reporting and analysis.

A unified enterprise data architecture provides the connection between fast and big, linking high-value, historical data from the data lake to fast-moving inbound streams of data from thousands to millions of endpoints. Providing this capability enables application developers to write code that solves business challenges, without worrying about how to persist data as it flows through the data pipeline to the data lake. An in-memory, scale-out operational system that can decide, analyze and serve results at Fast Data's speed is key to making Big Data work at enterprise scale.

A new approach to building enterprise data architectures to handle the demands of Fast Data gives enterprises the tools to process high-volume streams of data and perform millions of complex decisions in real time. With Fast Data, things that were not possible before become achievable: instant decisions can be made on real-time data to drive sales, connect with customers, inform business processes, and create value.
Architectural patterns for building real time applications with Apache HBase Andrew Purtell Committer and PMC, Apache HBase Who am I? Distributed systems engineer Principal Architect in the Big Data Platform
Detecting Anomalous Behavior with the Business Data Lake Reference Architecture and Enterprise Approaches. 2 Detecting Anomalous Behavior with the Business Data Lake Pivotal the way we see it Reference
Ron Bodkin Founder & CEO, Think Big June 2013 Real-time Big Data Analytics with Storm Leading Provider of Data Science and Engineering Services Accelerating Your Time to Value IMAGINE Strategy and Roadmap
Introduction to Apache Kafka And Real-Time ETL for Oracle DBAs and Data Analysts 1 About Myself Gwen Shapira System Architect @Confluent Committer @ Apache Kafka, Apache Sqoop Author of Hadoop Application
beyond possible Big Use End-user applications Big Analytics Visualisation tools Big Analytical tools Big management systems The 4 Pillars of Technosoft s Big Practice Overview Businesses have long managed
5 Keys to Unlocking the Big Data Analytics Puzzle Anurag Tandon Director, Product Marketing March 26, 2014 1 A Little About Us A global footprint. A proven innovator. A leader in enterprise analytics for
Hadoop & Spark Using Amazon EMR Michael Hanisch, AWS Solutions Architecture 2015, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Agenda Why did we build Amazon EMR? What is Amazon EMR?
Talend Real-Time Big Data Talend Real-Time Big Data Overview of Real-time Big Data Pre-requisites to run Setup & Talend License Talend Real-Time Big Data Big Data Setup & About this cookbook What is the
TECH NOTE Hadoop Alone Is Not Big Data Twenty-one years ago, a year before the first web browser appeared, Walmart s Teradata data warehouse exceeded a terabyte of data and kicked off a revolution in supply-chain
The Internet of Things and Big Data: Intro John Berns, Solutions Architect, APAC - MapR Technologies April 22 nd, 2014 1 What This Is; What This Is Not It s not specific to IoT It s not about any specific
Architecting an Industrial Sensor Data Platform for Big Data Analytics 1 Welcome For decades, organizations have been evolving best practices for IT (Information Technology) and OT (Operation Technology).
perspective Data Virtualization A Potential Antidote for Big Data Growing Pains Atul Shrivastava Abstract Enterprises are already facing challenges around data consolidation, heterogeneity, quality, and
: High-throughput and Scalable Storage Technology for Streaming Data Munenori Maeda Toshihiro Ozawa Real-time analytical processing (RTAP) of vast amounts of time-series data from sensors, server logs,
September 10-13, 2012 Orlando, Florida Demonstration of SAP Predictive Analysis 1.0, consumption from SAP BI clients and best practices Vishwanath Belur, Product Manager, SAP Predictive Analysis Learning
Building Scalable Big Data Infrastructure Using Open Source Software Sam William sampd@stumbleupon. What is StumbleUpon? Help users find content they did not expect to find The best way to discover new
Cray: Enabling Real-Time Discovery in Big Data Discovery is the process of gaining valuable insights into the world around us by recognizing previously unknown relationships between occurrences, objects
An Oracle White Paper November 2010 Leveraging Massively Parallel Processing in an Oracle Environment for Big Data Analytics 1 Introduction New applications such as web searches, recommendation engines,
2015 Analyst and Advisor Summit Advanced Data Analytics Dr. Rod Fontecilla Vice President, Application Services, Chief Data Scientist Agenda Key Facts Offerings and Capabilities Case Studies When to Engage
Developing Scalable Smart Grid Infrastructure to Enable Secure Transmission System Control EP/K006487/1 UK PI: Prof Gareth Taylor (BU) China PI: Prof Yong-Hua Song (THU) Consortium UK Members: Brunel University
A Near Real-Time Personalization for ecommerce Platform Amit Rustagi email@example.com Abstract. In today's competitive environment, you only have a few seconds to help site visitors understand that you
BIG DATA ANALYTICS For REAL TIME SYSTEM Where does big data come from? Big Data is often boiled down to three main varieties: Transactional data these include data from invoices, payment orders, storage
Using Tableau Software with Hortonworks Data Platform September 2013 2013 Hortonworks Inc. http:// Modern businesses need to manage vast amounts of data, and in many cases they have accumulated this data
Corralling Data for Business Insights The difference data relationship management can make Part of the Rolta Managed Services Series Data Relationship Management Data inconsistencies plague many organizations.
Best Practices for Hadoop Data Analysis with Tableau September 2013 2013 Hortonworks Inc. http:// Tableau 6.1.4 introduced the ability to visualize large, complex data stored in Apache Hadoop with Hortonworks
BIG DATA: FROM HYPE TO REALITY Leandro Ruiz Presales Partner for C&LA Teradata Evolution in The Use of Information Action s ACTIVATING MAKE it happen! Insights OPERATIONALIZING WHAT IS happening now? PREDICTING
FINANCIAL SERVICES: FRAUD MANAGEMENT A solution showcase TECHNOLOGY OVERVIEW FRAUD MANAGE- MENT REFERENCE ARCHITECTURE This technology overview describes a complete infrastructure and application re-architecture
The Next Big Thing in the Internet of Things: Real-Time Big Data Analytics" #IoTAnalytics" Mike Gualtieri Principal Analyst Forrester Research " " "" " " " " Dale Skeen CTO & Co-Founder Vitria Technology
ORACLE BUSINESS INTELLIGENCE, ORACLE DATABASE, AND EXADATA INTEGRATION EXECUTIVE SUMMARY Oracle business intelligence solutions are complete, open, and integrated. Key components of Oracle business intelligence
E SB DE CIS IO N GUID E Business Transformation for Application Providers 10 Questions to Ask Before Selecting an Enterprise Service Bus 10 Questions to Ask Before Selecting an Enterprise Service Bus InterSystems
CitusDB Architecture for Real-Time Big Data CitusDB Highlights Empowers real-time Big Data using PostgreSQL Scales out PostgreSQL to support up to hundreds of terabytes of data Fast parallel processing
Data Science & Big Data Practice INSIGHTS ANALYTICS INNOVATIONS Manufacturing IoT Amplify Serviceability and Productivity by integrating machine /sensor data with Data Science What is Internet of Things
Making big data simple with Databricks We are Databricks, the company behind Spark Founded by the creators of Apache Spark in 2013 Data 75% Share of Spark code contributed by Databricks in 2014 Value Created
September 10-13, 2012 Orlando, Florida Solving big data problems in real-time with CEP and Dashboards - patterns and tips Kevin Wilson Karl Kwong Learning Points Big data is a reality and organizations
How to use Big Data in Industry 4.0 implementations LAURI ILISON, PhD Head of Big Data and Machine Learning Big Data definition? Big Data is about structured vs unstructured data Big Data is about Volume
White Paper Unified Data Integration Across Big Data Platforms Contents Business Problem... 2 Unified Big Data Integration... 3 Diyotta Solution Overview... 4 Data Warehouse Project Implementation using
Unified Data Integration Across Big Data Platforms Contents Business Problem... 2 Unified Big Data Integration... 3 Diyotta Solution Overview... 4 Data Warehouse Project Implementation using ELT... 6 Diyotta