ANALYTICS BUILT FOR INTERNET OF THINGS


Big Data Reporting is Out, Actionable Insights are In

In recent years it has become clear that data in itself has little value; it is the analysis of that data that is critical. Big Data is a world of never-ending combinations of data in previously unimaginable quantities, up to zettabytes. Accounting for all the binary, text, audio, and video files, bank transaction records, sensor readings, cell-phone call records, social media musings, and so on, enterprises in virtually every industry are amassing vast quantities of useful data on their operations, products, customers, and competitors.

Gone are the days when reporting on all this data once per period was enough. We are now in an age where acting on real-time intelligence, correlated with historical data, to steer the course of business is critical. Transforming data into instant insights, repeatedly, empowers businesses to make customer-oriented marketing decisions, generate new revenue streams, and achieve operational efficiencies, and thereby gain definitive competitive advantages. Consider these two examples:

Retail Customer Profiling - E-commerce stores, operators, fleet managers, warehouse managers, and others want actionable insights into their customers' changing needs, interests, and behavior, so they can customize their services and offer promotional or upsell items at the precise point in time to the right customer.

Location-based Mobile Wallet Services - A telco service provider can offer its customers location-specific information, guided by historical preferences regarding cuisine, entertainment, and shopping, in real time: sending coupons for favorite products just as the customer nears a grocery store, for example.

The Challenge: Big Data is in Motion

The examples above touch upon the significant challenges of analyzing Big Data and making it actionable: 1) the sheer size (volume) of the data, 2) the variety of data types and sources, 3) the need for fast and yet complex analyses, 4) the velocity at which new data is generated, and 5) the need for operational flexibility, for example the ability to quickly create or modify views or filters on the data. Looked at from a different perspective, these industry examples also bring to light that enterprises have a Big Data problem both with data at rest (the historical data held in their repositories) and with data in motion (the data they generate continuously from their operations, interactions, and so on). Interestingly, both categories are growing persistently.
However, conventional database technology does not support this easily. Most enterprises therefore still run reports and analytics historically: they wait until the end of the hour, day, week, or month, thereby endangering their competitive advantage and operational efficiency. Companies that do not process historical and real-time data simultaneously run the risk of disenchanting their customers by offering dated and mismatched promotions and products. At the same time, such businesses do not have an accurate representation of their operations and costs. Enterprises need to mine data at rest and data in motion simultaneously to uncover problems and discover new opportunities for their business.

From OLTP to Hadoop: How Existing Technologies Fall Short of Big Data Demands

There have been consistent advances in data-management techniques in recent years. However, limitations in prevailing database technologies persist, restricting enterprises from fully harnessing the potential of their big data. Many of today's databases, while effective for some purposes, use architectures designed decades ago, with data and index structures that were not built for efficient, real-time analysis of Big Data. In addition, these databases use sequential algorithms that cannot fully exploit parallel hardware. One cannot sequentially search a billion rows of data and believe that is the fastest way to address big data; the solution simply will not scale. Simply put, today's new demands require new technologies to address them.
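The contrast between a sequential scan and a partitioned, parallel one can be illustrated with a small sketch in plain Python. This is hypothetical and not ParStream's engine or API: the data is split into shards and each shard is scanned by its own worker, so total scan time is bounded by the largest shard rather than the whole table.

```python
# Illustrative only: a shard-per-worker scan, standing in for the kind of
# parallel query execution a modern analytics engine performs internally.
from concurrent.futures import ThreadPoolExecutor

def scan_shard(shard, predicate):
    """Sequentially scan one shard; many shards are scanned concurrently."""
    return [row for row in shard if predicate(row)]

def parallel_scan(rows, predicate, n_shards=4):
    """Split rows into shards, scan them in parallel, then merge in order."""
    size = max(1, len(rows) // n_shards)
    shards = [rows[i:i + size] for i in range(0, len(rows), size)]
    with ThreadPoolExecutor(max_workers=n_shards) as pool:
        parts = pool.map(scan_shard, shards, [predicate] * len(shards))
    return [row for part in parts for row in part]
```

A production engine would go further, partitioning by column, compressing each partition, and pushing predicates into the storage layer, but the core idea is the same: independent partitions allow the hardware's parallelism to be used at all.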

The following chart shows various technologies that cover data analytics in terms of the size of data they address and the response times they achieve for typical queries. It is worth noting, however, that this chart addresses only two of the barriers, size and speed; it does not address the velocity, the variety, or the flexibility of the solution. The highlights of current database technologies are as follows:

Mature database technologies such as Online Transaction Processing (OLTP) perform reasonably well when used for reporting, but only at low operational data volumes. This is because OLTP architectures are optimized not for analytics performance but for a mixture of read- and write-intensive transactions. Today only very small enterprises use their OLTP platform for analytics.

The prevailing Online Analytical Processing (OLAP) cube approach to business intelligence is inflexible. Defining and creating cubes is time-consuming, requiring specialist skills at significant cost with every change in requirements. Data scientists working with line-of-business experts have to predict future reporting and analytical requirements, which prevents ad-hoc queries, and the complexity increases in any model with more than three dimensions.

Complex Event Processing (CEP) was developed for speedier analysis of a constant stream of data from multiple sources. It is well suited to real-time critical events, such as automobile crash-prevention sensor networks or facility-security sensor networks, viewed in the context of a very short interval of time. However, by its nature, data volumes with the CEP approach do not reach Big Data size.

In-Memory Databases exhibit increased query speed because there is little I/O data transfer, but the data size is limited, constrained by how much memory is available.
At Big Data volumes there is also a definite cost challenge with the in-memory approach: memory costs exceed raw storage costs by orders of magnitude, and the vast data sets required typically cannot be processed affordably in memory.

Batch Analytics approaches promise to tackle the high-volume side of the chart: the Hadoop open-source framework for data-intensive applications, the associated MapReduce programming model, and NoSQL databases. These use a distributed model on clusters of computers and are capable of large-volume analyses. However, due to their batch-oriented way of processing data, Batch Analytics approaches cannot perform mass calculations and analyses in anything close to real time. Further, they suffer from a lack of resiliency in the cluster: the failure of nodes within the cluster impacts the timeliness of the query.

With all of these current technologies demonstrating limitations with respect to growing data volume and increasing velocity, one could conclude that combining Big Data with real-time analysis is not feasible. However, emerging technologies categorized as Interactive Analytics have the potential to solve these challenges.
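The batch limitation described above can be seen in a minimal MapReduce-style job. This is a pure-Python sketch of the programming model, not Hadoop itself, and the record fields are invented for illustration: no result exists until the map, shuffle, and reduce phases have run over the entire dataset, which is exactly why batch analytics cannot answer in real time.

```python
# Illustrative MapReduce-style batch job: count events per country.
# The answer is only available after all three phases complete.
from collections import defaultdict

def map_phase(records):
    """Map: emit a (key, 1) pair for every input record."""
    for rec in records:
        yield rec["country"], 1

def shuffle(pairs):
    """Shuffle: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    """Reduce: aggregate each key's values into a final count."""
    return {key: sum(values) for key, values in groups.items()}

def batch_job(records):
    return reduce_phase(shuffle(map_phase(records)))
```

On a cluster, each phase additionally pays scheduling, disk, and network costs per job, so even a trivial aggregation like this carries minutes of latency at scale.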

Introducing ParStream: Real-time Analytics and Instant Insights

ParStream has developed one of the most comprehensive platforms in the Interactive Analytics category. The approach was to build a new computing architecture capable of massively parallel processing, configured and optimized for large amounts of data, and employing new indexing methods to achieve real-time query response times. The ParStream platform was built with the requirements of Big Data in mind, namely:

- Pure speed: the ability to process huge volumes of records with sub-second response times
- Accommodation of data in motion: support for continuous, fast data imports while concurrently analyzing the data without performance degradation
- Simultaneous analysis of historical and current data, without cubes: correlating the two as new information continues to come in at Big Data rates
- Flexible support for complex and ad-hoc queries: a data structure that can concurrently support multiple complex queries and easily generated ad-hoc queries
- Concurrency: the ability to serve thousands of concurrent users without loss of performance
- Minimal infrastructure: the ability to scale and perform on minimal, commodity hardware while still ensuring high levels of fault tolerance and availability
- Robust integration: the ability to integrate with existing data and server infrastructure and third-party software stacks via standard protocols

Organizations that need to turn Big Data into immediate, usable knowledge require a database architecture capable of analyzing historical data in conjunction with the new data constantly being generated by their operations. ParStream is specifically engineered to deliver both Big Data and fast data: the first real-time analytics platform designed and optimized for the speed of Big Data queries combined with the high velocity of incoming data.
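The "data in motion" requirement, importing continuously while queries see historical and fresh data together, can be sketched in a few lines. This is a toy illustration, not ParStream's implementation: a single lock guards a shared buffer, where a real engine would use partitioned, largely lock-free structures to avoid the contention this naive version would suffer under load.

```python
# Toy sketch of continuous ingest with concurrent querying over the
# combined historical + fresh view. Not a real engine's design.
import threading

class StreamingStore:
    def __init__(self, historical):
        self._rows = list(historical)   # data at rest
        self._lock = threading.Lock()

    def ingest(self, new_rows):
        """Continuous import: append data in motion as it arrives."""
        with self._lock:
            self._rows.extend(new_rows)

    def query(self, predicate):
        """Analyze historical and current data together."""
        with self._lock:
            snapshot = list(self._rows)   # consistent view for this query
        return [r for r in snapshot if predicate(r)]
```

The key property the sketch demonstrates is that a query issued after an ingest immediately reflects the new rows; there is no end-of-day load window separating "data in motion" from "data at rest."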
High-Level Architecture

ParStream's patented high-performance, compressed indexing technology enables efficient parallel query processing in a multi-server environment. ParStream also requires a fraction of the infrastructure of other solutions and can be up and running quickly. Hardware and energy costs are substantially reduced while overall performance is optimized. In addition, ParStream can be seamlessly added to a company's existing environment and processes without major architectural change. ParStream has also been specifically engineered to handle:

- Structured and semi-structured data
- De-normalized, very large fact tables
- Selective, multi-dimensional filtering and analytics
- Extremely short query response times
- Very high query throughput

Conclusion

Enterprises have been dealing with growing data demands for generations. While transient data, or data in motion, has historically been a fraction of the size of the large legacy data repositories, or data at rest, we are now at the point where the data in motion has itself reached Big Data scale. Real-time analytics and the resulting insights provide a definite competitive edge and must be an integral part of every enterprise's data strategy. To reap the full benefits of real-time analytics, enterprises will have to consider a technology platform that enables them to analyze both Big Data at rest and Big Data in motion, simultaneously. ParStream has built a real-time analytics platform with unique and innovative technology. In contrast to conventional database and analytical technologies, ParStream continuously imports new data while rendering updated analytical results, providing faster and more accurate insights to decision makers within seconds.

About ParStream

ParStream is the IoT analytics platform company. The ParStream Analytics Platform was purpose-built to scale to the massive volumes and high velocity of IoT data. Enabling a new breed of analytics for the enterprise, ParStream has earned accolades including CIO Magazine's #1 Big Data Startup, Gartner Cool Vendor, and Database Trends and Applications magazine's Trend-Setting Products in Data. ParStream is based in Silicon Valley, online at www.parstream.com and on Twitter @ParStream.

How Customers are Using ParStream

Many enterprises have begun to use ParStream as their real-time Big Data analytics platform, using it to deliver ultra-fast interactive analytical results. ParStream technology is best suited to use cases requiring ultra-fast response times, very high query throughput, and continuous data import. Some examples are listed below.

Faceted Search - At a leading provider of credit insurance and financial services, ParStream supports a large database of approximately 10 million data records with thousands of columns. ParStream also enables the customer's more than 100 concurrent users at any given time to navigate to their desired information easily through a faceted search that features multi-lingual text and multiple-choice numeric filters.

Online Marketing Analytics - This search and social analytics company provides search analytics tools and monitors over 75 million domains and 100 million keywords on the world's biggest search engines. Its customers use the service to monitor competing domains and to optimize their keywords to drive traffic. ParStream enables the company to manage its data, regularly importing several terabytes and querying more than ten billion data records. By switching to ParStream, the company greatly reduced its infrastructure requirements, which resulted in faster import and query execution times.

Ad / Campaign Conversion Records - ParStream's capacity to provide real-time responses to big data queries, while simultaneously absorbing on-the-fly clickstreams at rates of over 100,000 events per second, has allowed this customer, a web analytics and optimization company, to develop an innovative in-depth interactive analytics service including live segmentation, with a performance boost ranging from 500 to 12,000 times.
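The selective, multi-dimensional filtering that use cases like faceted search depend on is commonly built on bitmap-style indexes. ParStream's compressed index format is proprietary, so as a stand-in this sketch shows only the general idea: each distinct column value maps to a bit vector of matching rows, and a multi-column filter becomes a bitwise AND instead of a row-by-row scan. All names and data here are illustrative.

```python
# Illustrative bitmap index: one bit vector (a Python int) per distinct
# column value; bit i is set when row i holds that value.
def build_bitmap_index(column):
    """Build {value: bit_vector} for one column."""
    index = {}
    for row_id, value in enumerate(column):
        index[value] = index.get(value, 0) | (1 << row_id)
    return index

def filter_rows(index_a, value_a, index_b, value_b):
    """Conjunctive two-column filter: AND the bit vectors, list set bits."""
    bits = index_a.get(value_a, 0) & index_b.get(value_b, 0)
    return [i for i in range(bits.bit_length()) if bits >> i & 1]
```

Because each additional facet is just another AND, query cost grows with the number of filters rather than the number of rows, which is what makes drill-down over millions of records with many concurrent users feasible; production systems additionally compress the bit vectors to keep the index small.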