GigaSpaces Real-Time Analytics for Big Data

GigaSpaces makes it easy to build and deploy large-scale real-time analytics systems.

The rapidly increasing use of large-scale, location-aware social media and mobile applications is driving the need for scalable, real-time platforms that can handle streaming analysis and processing of massive amounts of data. Today, creating an analytics system for big data generally means collecting multiple technologies from various providers and building the system yourself. This presents challenges in terms of performance, cost, scalability, real-time responsiveness, and more. GigaSpaces resolves these issues.

You need to handle massive amounts of data in real time, without losing data and at minimum cost. Most analytics systems are not designed for real time: it can take hours or days before the impact of an event appears in reports, by which time it may be too late to take action. The challenge becomes even greater as events are gathered from more sources at significantly higher volumes.

One option: construct your own solution by combining various available technologies. This can be complex: in addition to messaging, data storage, and processing, you need management and orchestration to automate deployment and ensure the continuous availability of the assorted parts.

A simpler option: just plug in the GigaSpaces Real-Time Analytics solution. You can focus on your business logic and leave the rest to us. GigaSpaces makes building and deploying a large-scale real-time analytics system simple: you provide the event processing business logic, and we handle the scalability, performance, and database integration. Seamlessly.

GigaSpaces delivers middleware that provides enterprises and ISVs with end-to-end application scalability and cloud enablement for mission-critical applications, and serves hundreds of tier-1 organizations worldwide.

It's open: use any stack, avoid lock-in.
- Pick your own Big Data database (RDBMS or NoSQL).
- Plug in consistent management and monitoring across the stack without changing your code.
- Write event handlers using common languages.
- Access your data using standard SQL/JPA APIs.

All while minimizing costs: a unique combination of memory- and disk-based databases ensures the optimum cost/performance ratio, and leveraging automation and cloud-based deployment reduces operational costs.

The GigaSpaces Real-Time Analytics solution for Big Data applications eliminates the complexity.

[Figure: the XAP Real-Time Solution for Big Data, integrating with Cassandra, HBase, MongoDB, and Redis]
CURRENT TECHNOLOGIES OVERVIEW

There is no one-size-fits-all technology. Building an analytics application that addresses both real-time and batch requirements means combining several of the available technologies. The challenge then becomes integrating these various pieces, tuning the system to ensure consistent performance and scaling through the entire stack, and providing consistent management and monitoring across it.

Most analytics systems can be broken down into three stages of data flow:

1. Metrics (real time): various metrics are collected into counters, for example, the number of requests per day.
2. Correlation (near real time): metrics are correlated into a more aggregative system view, for example, analyzing which features hook users.
3. Research (batch map/reduce processing): the collected information is used to run research and trend analysis over a period of time.

Currently, you must integrate different products and technologies to provide the entire analytics functionality.
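The three stages above can be illustrated on a toy event stream. This is a minimal sketch in plain Python, not a GigaSpaces API; the event fields and the "conversion" metric are illustrative assumptions.

```python
from collections import Counter

# A toy stream of raw events (illustrative data).
events = [
    {"user": "alice", "action": "login"},
    {"user": "bob", "action": "login"},
    {"user": "alice", "action": "purchase"},
]

# Stage 1 - Metrics (real time): collect raw events into counters.
requests_per_action = Counter(e["action"] for e in events)

# Stage 2 - Correlation (near real time): combine metrics into a more
# aggregative view, e.g. which actions each user performs.
actions_by_user = {}
for e in events:
    actions_by_user.setdefault(e["user"], set()).add(e["action"])

# Stage 3 - Research (batch): analyze the accumulated data for trends,
# e.g. the conversion rate from login to purchase.
converted = sum(1 for a in actions_by_user.values() if "purchase" in a)
conversion_rate = converted / len(actions_by_user)

print(requests_per_action["login"])  # 2
print(conversion_rate)               # 0.5
```

In a real system each stage runs on a different substrate (in-memory counters, a correlation engine, and a batch map/reduce job), which is exactly the integration burden described above.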
This approach has many associated challenges:

Traditional database (RDBMS) - used to run many analytics systems.
- Performance: not designed for real time.
- Scaling: not designed to grow at the speed and volume of information required in a Big Data environment; a poor fit for data that is continuously evolving.
- Cost: most RDBMSs rely on expensive setup and hardware to maintain reliability and performance.

Complex Event Processing (CEP) - designed to correlate data in real time.
- Scaling: it is often necessary to aggregate events into a centralized source, which doesn't scale.
- Capacity: not designed to deal with historical data.

Hadoop - designed for batch analytics and complex correlation.
- Performance: not designed for real time.

In-Memory Data Grid - fast processing power for storing and processing data.
- Capacity: storing vast amounts of information purely in memory doesn't scale, in terms of both system scaling and cost.

NoSQL - designed to handle large data volumes at low cost.
- Processing capability: the sheer amount of data can make processing challenging.
THE SOLUTION

Google, Facebook, and Twitter have already shown us the way by moving many of their analytics systems to real time. The question now is how businesses can build their own Google/Facebook/Twitter-like analytics, but in a significantly simpler way that fits existing applications and skill sets.

Step 1: Collect and store. Enable collection of large volumes of data from multiple sources in real time. The process must be reliable, to ensure the accuracy of the analytics. The solution is an in-memory data grid:
- Memory enables throughput on the order of 100K messages per second.
- Reliability is achieved through redundancy and replication.
- The grid can be accessed through a large set of APIs (Document, JMS, Memcached, and more).

Step 2: Speed up processing through co-location of business logic with data. By co-locating your business logic and data, you can process events as they enter the system, eliminating multiple network hops and serialization/deserialization overhead. You also reduce the number of moving parts, making the entire system significantly simpler to scale and maintain.

Step 3: Integrate with the Big Data store to meet volume and cost demands. Integrate through a generic plug-in, compatible with your data store of choice, whether NoSQL or SQL:
- Avoid lock-in to a specific NoSQL API.
- Performance: reduced network hops and serialization overhead.
- Simplicity: fewer moving parts.
- Scalability without compromising consistency: strict consistency at the front, eventual consistency for the long-term data.
- Standard JPA/SQL APIs.
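The co-location idea in Step 2 can be sketched with a toy partitioned grid: events are routed by key to the partition that owns the matching data, so the business logic updates the record locally, with no remote lookup. This is a conceptual sketch, not the GigaSpaces API; the class and field names are illustrative.

```python
NUM_PARTITIONS = 4

class Partition:
    """One node of a toy in-memory data grid."""
    def __init__(self, pid):
        self.pid = pid
        self.store = {}      # in-memory data owned by this partition
        self.processed = []  # events handled locally

    def process(self, event):
        # Business logic runs next to the data: no network hop,
        # no serialization of the stored record over the wire.
        profile = self.store.setdefault(event["user"], {"events": 0})
        profile["events"] += 1
        self.processed.append(event)

partitions = [Partition(i) for i in range(NUM_PARTITIONS)]

def route(event):
    # Content-based routing: the user id determines the owning
    # partition, so all of a user's events are processed in place.
    owner = partitions[hash(event["user"]) % NUM_PARTITIONS]
    owner.process(event)

for e in [{"user": "alice"}, {"user": "alice"}, {"user": "bob"}]:
    route(e)
```

Because both of alice's events route to the same partition, her profile is enriched twice without any cross-node traffic; that locality is what removes the serialization and network overhead described in Step 2.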
PUTTING IT ALL TOGETHER

1. Store events in memory.
2. Co-locate business logic with data for real-time processing.
3. Integrate with the Big Data store for long-term data.

The architecture consists of a cluster of in-memory data grids (IMDG) at the front and a Big Data database at the back end. Feeds are stored directly into the IMDG, where they trigger a set of co-located processors. Processing can include validation and enrichment of the data, as well as creation of new data sets needed for further correlation and post-processing. Data is forwarded to the back-end Big Data store through the built-in write-behind feature of the IMDG.

The IMDG can be used as a processing buffer: after processing by the IMDG, data is stored in the Big Data storage. It can also be used to store recent data, such as the last day of information. Data sent to the NoSQL data store is written in batches to maximize write throughput. The analytics application reads long-term data directly from the NoSQL data store; when the app needs only the last day of activity, it can access the data grid directly through the built-in JPA/SQL interface.
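The write-behind step above can be sketched as follows: the grid acknowledges writes at memory speed, queues them, and flushes them to the backing store in batches to maximize write throughput. This is a minimal generic sketch, not the GigaSpaces implementation; the class names, batch size, and the fake NoSQL store are all illustrative assumptions.

```python
BATCH_SIZE = 3  # illustrative; real systems tune batch size and flush interval

class FakeNoSQLStore:
    """Stand-in for a back-end Big Data store (Cassandra, HBase, MongoDB...)."""
    def __init__(self):
        self.rows = {}
        self.batches = 0

    def write_batch(self, batch):
        self.batches += 1
        self.rows.update(batch)

class WriteBehindGrid:
    """Toy in-memory grid with write-behind persistence."""
    def __init__(self, backend):
        self.memory = {}        # hot data, served at memory speed
        self.pending = []       # writes not yet persisted
        self.backend = backend

    def write(self, key, value):
        self.memory[key] = value          # 1. acknowledged from memory
        self.pending.append((key, value)) # 2. queued for the back end
        if len(self.pending) >= BATCH_SIZE:
            self.flush()                  # 3. persisted in bulk

    def flush(self):
        self.backend.write_batch(self.pending)
        self.pending = []

store = FakeNoSQLStore()
grid = WriteBehindGrid(store)
for i in range(6):
    grid.write(f"event-{i}", {"seq": i})

print(store.batches)  # 2 -- six writes reached the store in only two batch calls
```

The front-end write path never waits on the disk-based store, which is how the architecture keeps strict consistency in memory while the long-term store is updated eventually, in bulk.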
COST BENEFITS

Economic data scaling: leverage commodity hardware and software-based storage to provide a large-scale data store at low cost. The solution combines memory for short-term data with disk for long-term data, for the optimum cost/performance ratio:
- For high data-access rates, memory is 10x to 100x lower in cost than disk (according to Stanford research).
- For high capacity at lower access rates, disk is the lower-cost option.

[Figure: cost vs. throughput for RAM and disk - use disk below the crossover throughput and memory above it for optimum cost.]

For example, processing 10K events per second and storing them for a one-hour window (until they are pulled to long-term storage), with a 500-byte message size, requires only ~16 GB of memory, at a cost of ~$32 per month per server.

Economic app scaling:
- Automation: reduce operational cost.
- Elastic scaling: reduce over-provisioning cost.
- Cloud portability: choose the right cloud for the job.
- Cloud bursting: scavenge extra capacity when needed.

Industry use cases that particularly need real-time insights from big data sets include:
- Social networking: measure the immediate impact on your site traffic from social media, whether a new blog post, a tweet, a Like, or even a comment. Knowing this information translates to better conversion and more effective online campaigns.
- SaaS: measuring user behavior and acting upon it is crucial for improving customer satisfaction and conversion rates, which represent immediate increases in revenue.
- Financial services: determining in real time whether your portfolio is losing money, or whether there is fraud in your system, means you can prevent disasters as they occur, not after the damage is done. Correlating multiple market sources in real time yields a more accurate view of the market and enables more accurate actions to maximize profit.
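The ~16 GB sizing figure in the cost example above can be checked with simple arithmetic, using the stated inputs (10K events/sec, 500-byte messages, one-hour window):

```python
# Back-of-the-envelope check of the in-memory window size.
events_per_sec = 10_000
message_bytes = 500
window_sec = 3600  # one hour until data is pulled to long-term storage

window_bytes = events_per_sec * message_bytes * window_sec  # 18,000,000,000 bytes
window_gib = window_bytes / 2**30

print(round(window_gib, 1))  # 16.8
```

18 GB of raw message data is about 16.8 GiB, consistent with the ~16 GB figure (real deployments would add headroom for indexes, replication, and object overhead).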
ABOUT GIGASPACES

GigaSpaces Technologies is the pioneer of a new generation of application virtualization platforms, and a leading provider of end-to-end scaling solutions for distributed, mission-critical application environments and cloud-enabling technologies. GigaSpaces is the only platform on the market that offers a truly silo-free architecture, along with operational agility and openness, delivering enhanced efficiency, extreme performance, and always-on availability. Our technology was designed from the ground up to support any cloud environment (private, public, or hybrid) and offers a pain-free, evolutionary path from today's data center to the technologies of tomorrow.

GIGASPACES OFFICES WORLDWIDE
US East Coast Office, New York
US West Coast Office, San Jose
International Office, Tel Aviv
Europe Office, London
Asia Pacific Office, Singapore