ANALYTICS BUILT FOR INTERNET OF THINGS

Big Data Reporting is Out, Actionable Insights are In

In recent years, it has become clear that data in itself has little relevance; it is the analysis of that data that is critical. Big Data is a world of never-ending combinations of data in previously unimaginable quantities, up to zettabytes. Accounting for all the binary, text, audio, and video files, bank transaction records, sensor readings, cell-phone call records, social media musings, and so on, enterprises in virtually every industry are amassing vast quantities of useful data on their operations, products, customers, and competitors.
Gone are the days when periodic reporting on all this data was enough. We are now in an age where acting on real-time intelligence, correlated with historical data, to steer the course of business is critical. Transforming data into instant insights empowers businesses to make decisions about customer-oriented strategies, generate new revenue streams, and achieve operational efficiencies, gaining definitive competitive advantages today. Consider these two examples:

Retail Customer Profiling - E-commerce stores, operators, fleet managers, warehouse managers and others want actionable insights into their customers' changing needs, interests and behavior, so they can customize their services and offer promotional or upsell items at the precise point in time to the right customer.

Location-based Mobile Wallet Services - A telco service provider can offer its customers location-specific information guided by historical preferences regarding cuisine, entertainment and shopping, in real time: sending coupons for favorite products just as the customer nears a grocery store, for example.

The Challenge: Big Data is in Motion

The examples above touch upon the significant challenges of analyzing Big Data and making it actionable: 1) the sheer size (volume) of the data, 2) the variety of data types and sources, 3) the need for fast yet complex analyses, 4) the velocity at which new data is generated, and 5) the need for operational flexibility, for example, the ability to quickly create or modify views or filters on this data. Looked at from a different perspective, the industry examples also bring to light that enterprises have a Big Data problem with both data at rest (the historical data they have accumulated in their repositories) and data in motion (the data they generate continuously from their operations, interactions, and so on). Interestingly, both categories are growing persistently.
However, conventional database technology does not support this easily. Most enterprises therefore still run reports and analytics retrospectively: they wait until the end of the hour, day, week, or month, thereby endangering their competitive advantage and operational efficiencies. Companies that do not simultaneously process historical and real-time data run the risk of disenchanting their customers by offering dated and mismatched promotions and products. At the same time, such businesses do not have an accurate representation of their operations and costs. Enterprises need to mine data at rest and data in motion simultaneously to uncover problems and discover new opportunities for their business.

From OLTP to Hadoop: How Existing Technologies Fall Short of Big Data Demands

There have been consistent advances in data-management techniques in recent years. However, limitations in prevailing database technologies persist, restricting enterprises from fully harnessing the potential of their big data. Many of today's databases, while effective for some purposes, use architectures that were designed decades ago, with data and index structures that were not constructed for efficient, real-time analyses of Big Data. In addition, these databases use sequential algorithms, which are not capable of fully exploiting the potential of parallel hardware. One cannot sequentially search a billion rows of data and believe that is going to be the fastest way to address big data; the solution simply will not scale. Simply put, today's new demands require new technologies to address them.
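The scaling argument above can be made concrete with a small sketch: instead of one sequential pass over all rows, the data is partitioned and each partition is scanned concurrently, with partial results merged at the end. This is a generic illustration of partition-parallel scanning, not ParStream's implementation; the function names are invented for the example.

```python
from concurrent.futures import ThreadPoolExecutor

def partition(rows, n_workers):
    """Split rows into roughly equal chunks, one per worker."""
    size = max(1, (len(rows) + n_workers - 1) // n_workers)
    return [rows[i:i + size] for i in range(0, len(rows), size)]

def parallel_count(rows, pred, n_workers=4):
    """Scan each partition concurrently, then merge partial counts.
    A sequential scan touches every row on one worker; partitioning
    lets each worker scan only its own slice independently."""
    chunks = partition(rows, n_workers)
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        partials = pool.map(lambda chunk: sum(1 for r in chunk if pred(r)),
                            chunks)
    return sum(partials)

# Example: count multiples of 10 among a million rows.
matches = parallel_count(list(range(1_000_000)), lambda r: r % 10 == 0)
```

Threads in CPython only illustrate the decomposition; a real analytics engine applies the same divide-and-merge pattern across CPU cores and servers, which is where the scaling comes from.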
The following chart shows various technologies that cover data analytics in terms of the size of data they address and the response times they achieve for typical queries. It is worth noting, however, that this chart addresses only two of the barriers, i.e., size and speed. It does not address the velocity, the variety or the flexibility of the solution. The highlights of current database technologies are as follows:

Mature database technologies such as Online Transaction Processing (OLTP) perform reasonably well when used for reporting, but only when handling low operational data volumes. This is because OLTP architectures are optimized not for analytics performance but for a mixture of read- and write-intensive transactions. Today only very small enterprises use their OLTP platform for analytics.

The prevailing Online Analytical Processing (OLAP) Cube approach to business intelligence is inflexible. Defining and creating cubes is time-consuming, requiring specialist skills at significant cost with every change in requirements. Data scientists working with line-of-business experts have to predict future reporting and analytical requirements, thereby preventing ad-hoc queries, and the complexity increases in any model with more than three dimensions.

Complex Event Processing (CEP) was developed to deliver speedier analyses of a constant stream of data from multiple sources. It is well suited for real-time critical events, such as in automobile crash-prevention sensor networks or facility security sensor networks, when viewed in the context of a very short interval of time. However, by its nature, data volumes with the CEP approach do not reach the size of Big Data.

In-Memory Databases exhibit increased query speed, as there is little I/O data transfer, but the data size is limited and constrained by how much memory is available.
At Big Data volumes, there is a definite cost challenge with this approach: memory costs exceed raw storage costs by orders of magnitude, and the vast data sets required typically cannot affordably be processed entirely in memory.

Batch Analytics promises to tackle the high-volume side of the chart, with technologies such as the Hadoop open-source framework for data-intensive applications, the associated MapReduce programming model, and NoSQL databases. These use a distributed model on clusters of computers and are capable of large-volume analyses. However, due to its batch-oriented methodology of processing data, Batch Analytics cannot perform mass calculations and analysis in anywhere close to real time. Further, Batch Analytics approaches suffer from a lack of resiliency in the cluster: the failure of nodes within the cluster will impact the timeliness of the query.

With all of these current technologies demonstrating limitations with respect to the growing data volume and increasing velocity, one could conclude that real-time analysis of Big Data is not feasible. However, emerging technologies categorized as Interactive Analytics have the potential to solve these challenges.
Introducing ParStream: Real-time Analytics and Instant Insights

ParStream has developed one of the most comprehensive platforms in the Interactive Analytics category. The approach was to build a new computing architecture capable of massively parallel processing, configured and optimized for large amounts of data, and employing new indexing methods to achieve real-time query response times. The ParStream platform was built with the requirements of Big Data in mind:

- Pure speed: the ability to process huge volumes of records with sub-second response times
- Accommodation of data in motion: support for continuous and fast data imports while concurrently analyzing the data without performance degradation
- Simultaneous analysis of historical and current data without cubes: correlating the two as new information continues to come in at Big Data rates
- Flexible support of complex and ad-hoc queries: a data structure that can concurrently support multiple complex queries and easily generated ad-hoc queries
- Concurrency: the ability to serve thousands of concurrent users without loss of performance
- Minimal infrastructure: the ability to scale and perform on minimal, commodity hardware while still ensuring high levels of fault tolerance and availability
- Robust integration: the ability to integrate with existing data and server infrastructure and third-party software stacks via standard protocols

Organizations that need to turn Big Data into immediate and usable knowledge require a database architecture capable of handling analyses of historical data in conjunction with new data that is constantly being generated from their operations. ParStream is specifically engineered to deliver both Big Data and fast data: it is the first real-time analytics platform designed and optimized for the speed of Big Data queries combined with the high velocity of incoming data.
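The idea of querying data at rest and data in motion together can be sketched in miniature: a store holds a frozen historical partition plus a live ingest buffer, and every query scans both, so freshly ingested rows appear in results immediately. This is an illustrative toy, not ParStream's architecture; the class and method names are hypothetical.

```python
import threading

class HybridStore:
    """Toy store that answers queries over a historical partition
    (data at rest) and a live ingest buffer (data in motion) together."""

    def __init__(self, historical):
        self.historical = list(historical)  # frozen data at rest
        self.live = []                      # continuously ingested rows
        self.lock = threading.Lock()

    def ingest(self, row):
        """Append a new row; concurrent queries see it on their next scan."""
        with self.lock:
            self.live.append(row)

    def query(self, pred):
        """Filter the union of historical and live data in one pass."""
        with self.lock:
            live_snapshot = list(self.live)  # stable view while scanning
        return ([r for r in self.historical if pred(r)] +
                [r for r in live_snapshot if pred(r)])

store = HybridStore([{"amount": 5}, {"amount": 50}])
store.ingest({"amount": 75})            # arrives while the system is live
hits = store.query(lambda r: r["amount"] > 10)
```

The design point the sketch makes is that no cube rebuild or batch window sits between ingest and query; the cost is that the query path must take a consistent snapshot of the live buffer.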
High-Level Architecture

ParStream's patented high-performance, compressed indexing technology enables efficient parallel query processing on parallel architectures in a multi-server environment. ParStream also requires a fraction of the infrastructure of other solutions and can be up and running quickly. Hardware and energy costs are substantially reduced while overall performance is optimized. In addition, ParStream can be seamlessly added to a company's existing environment and processes without major architectural change. Furthermore, ParStream has been specifically engineered to handle:
- Structured and semi-structured data
- De-normalized, very large fact tables
- Selective and multi-dimensional filtering and analytics
- Extremely short query response times
- Very high query throughput

Conclusion

Enterprises have been dealing with growing data demands for generations. While transient data, or data in motion, has historically been a fraction of the size of the large legacy data repositories, or data at rest, we are now at the point where the data in motion is itself Big Data in scale. Real-time analytics and the resulting insights provide a definite competitive edge and must be an integral part of the data strategy of every enterprise. To reap the full benefits of real-time analytics, enterprises will have to consider a technology platform that enables them to analyze both Big Data at rest and Big Data in motion, simultaneously. ParStream has built a real-time analytics platform with unique and innovative technology. In contrast to conventional database and analytical technologies, ParStream continuously imports new data while rendering updated analytical results, thus providing faster and more accurate insights to decision makers within seconds.

About ParStream

ParStream is the IoT analytics platform company. The ParStream Analytics Platform was purpose-built for scale to handle the massive volumes and high velocity of IoT data. Enabling a new breed of analytics for the enterprise, ParStream has earned accolades including CIO Magazine's #1 Big Data Startup, Gartner Cool Vendor, and Database Trends and Applications Magazine's Trend-Setting Products in Data. ParStream is based in Silicon Valley, online at www.parstream.com and on Twitter @ParStream.
How Customers are Using ParStream

Many enterprises have begun to use ParStream as their real-time Big Data analytics platform, using it to deliver ultra-fast interactive analytical results. ParStream technology is best suited for use cases requiring ultra-fast response times, very high query throughput and continuous data import. Some examples are listed below.

Faceted Search - At a leading provider of credit insurance and financial services, ParStream supports a large database of approximately 10 million data records with thousands of columns. In addition, ParStream enables the customer's more than 100 concurrent users to navigate easily to their desired information through a faceted search featuring multi-lingual text and multiple-choice numeric filters.

Online Marketing Analytics - This search and social analytics company provides search analytics tools and monitors over 75 million domains and 100 million keywords on the world's biggest search engines. Its own customers use the service to monitor competing domains and to optimize their keywords to drive traffic. ParStream enables the company to manage its data, regularly importing several terabytes and querying more than ten billion data records. By switching to ParStream, the company greatly reduced its infrastructure requirements while achieving faster import and query execution times.

Ad / Campaign Conversion Records - ParStream's capacity to provide real-time responses to Big Data queries while simultaneously absorbing on-the-fly clickstreams at rates of over 100,000 events per second has allowed this customer, a web analytics and optimization company, to develop an innovative, in-depth interactive analytics service including live segmentation, with a performance boost ranging from 500 to 12,000 times.
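The faceted-search pattern described above (multiple-choice filters, combined with OR within a facet and AND across facets) can be sketched with per-value posting sets, a simplified relative of the compressed bitmap indexes such engines use. This is an illustrative sketch with invented names, not ParStream's index format.

```python
class FacetIndex:
    """Toy facet index: one posting set of row ids per (column, value);
    a query intersects the sets for the chosen facets."""

    def __init__(self, rows):
        self.rows = rows
        self.postings = {}
        for i, row in enumerate(rows):
            for col, val in row.items():
                self.postings.setdefault((col, val), set()).add(i)

    def search(self, **choices):
        """choices maps column -> list of accepted values.
        OR within a facet, AND across facets."""
        hits = set(range(len(self.rows)))
        for col, values in choices.items():
            allowed = set()
            for v in values:                       # OR: union of values
                allowed |= self.postings.get((col, v), set())
            hits &= allowed                        # AND: intersect facets
        return [self.rows[i] for i in sorted(hits)]

rows = [
    {"cuisine": "thai", "city": "bonn"},
    {"cuisine": "thai", "city": "koeln"},
    {"cuisine": "pizza", "city": "bonn"},
]
idx = FacetIndex(rows)
matches = idx.search(cuisine=["thai"], city=["bonn"])
```

Because each filter is answered by set operations on precomputed postings rather than a fresh scan, adding or removing a facet choice stays cheap even with many concurrent users, which is the property the use case above depends on.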