Big Data and Transactional Databases
Exploding Data Volume is Creating New Stresses on Traditional Transactional Databases
Introduction

The world is awash in data, and turning that data into actionable information is creating massive new stress on traditional technology infrastructures across the board. New data collectors, such as sensors, geo-data and internet click tracking, have created a world where everything from drilling rigs to cars to mobile phones collects massive amounts of data about your every utterance or click. Whole industries are becoming increasingly aware of the value locked in these massive oceans of data.

The first phase of utilizing this data is to extract and organize the valuable information within it. Companies are adopting map-reduce solutions such as Hadoop and its nested-query complement, Dremel. Once the data is organized and queryable, extracting further insight drives us toward visualization of the data via technologies like D3 and similar tools. At the same time, massive amounts of data, including transactional data, click-through data and even intelligence extracted via Hadoop and Dremel, are finding their way into transactional databases. As the web becomes increasingly social, user-generated, associative and user-tracking data is flowing into transactional databases at unprecedented rates, overwhelming traditional architectures, tools and infrastructures.

Big Data is a major technology wave that is washing over every industry, and in the process it is forcing the industry to rethink how we handle these data volumes. This paper addresses the impact on online transaction processing (OLTP) databases.
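To make the map-reduce pattern mentioned above concrete, here is a minimal, self-contained Python sketch that counts clicks per page, the kind of extract-and-organize step Hadoop performs at cluster scale. The log format and function names are hypothetical; in a real Hadoop job, the map and reduce phases would run in parallel on the nodes holding each slice of the data.

```python
from collections import defaultdict

# Hypothetical raw click-stream records: "user_id page_url timestamp"
LOG_LINES = [
    "u1 /home 1001",
    "u2 /products 1002",
    "u1 /products 1003",
    "u3 /home 1004",
]

def map_phase(line):
    """Map: emit a (key, value) pair per record -- here, (page, 1)."""
    user_id, page, timestamp = line.split()
    yield (page, 1)

def reduce_phase(page, counts):
    """Reduce: aggregate all values emitted under one key."""
    return (page, sum(counts))

# Shuffle: group intermediate values by key; Hadoop does this between phases.
grouped = defaultdict(list)
for line in LOG_LINES:
    for key, value in map_phase(line):
        grouped[key].append(value)

for page, counts in sorted(grouped.items()):
    print(reduce_phase(page, counts))  # ('/home', 2), ('/products', 2)
```

Because each mapper touches only its own slice of the log and each reducer only its own keys, the same two functions scale from one machine to thousands without modification.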
What is Big Data?

Big Data describes the process of extracting actionable intelligence from disparate, and oftentimes non-traditional, data sources. These sources may include structured data, such as database, sensor, click-stream and location data, as well as unstructured data, like HTML, social data and images. The actionable intelligence may be represented visually (e.g. in a graph), but it is often distilled down to a structured format, which is then stored in a database for further manipulation. The sheer size of the data being collected is more than traditional compute infrastructures can handle, exceeding the capacities of databases, storage, networks and everything in between. Extracting actionable intelligence from Big Data requires handling large amounts of disparate data and processing it very quickly. Finally, the data inputs and the actionable intelligence must be correct; the data must be consistent and clean. As the saying goes: garbage in, garbage out.

All of these demands are overwhelming traditional computing infrastructure. IBM describes these new demands across four dimensions: Volume, Velocity, Variety and Veracity. I would add richly linked data to this list, since processing Big Data uncovers rich relationships within the data, though it lacks a convenient V word. To deal with the onslaught of Big Data, companies are turning to new tools and new business processes.

Big Data Tools

As Big Data overwhelms traditional databases, storage and more, companies are looking to exploit new tools like Hadoop, SSD, database virtualization, storage virtualization, network virtualization and more. The common thread is avoiding single-device bottlenecks, since they inhibit scaling. Hadoop uses map-reduce to spread analytical processing across armies of commodity servers. SSD, while expensive per GB of capacity, provides the performance necessary to keep up with the velocity of Big Data. Virtualization of the database, storage and networking provides the elasticity and agility needed to scale to Big Data demands, while delivering a consistent quality of service. These are just some of the tools being brought to bear on the Big Data challenge.

Big Data Business Processes

The benefits of Big Data are quite tantalizing: it can be used to improve efficiency and predictive capabilities in everything from health care to oil drilling. Once businesses get a taste of Big Data, their appetite becomes insatiable. This has spawned new business processes to meet the rising demand. Moving to the cloud is one such business process enabling Big Data: the cloud lets you process your Big Data using, say, 1,000 machines for just an hour, paying only for the time you use them. This makes Big Data processing cost-effective in terms of both operational expenses (OpEx) and capital expenses (CapEx). Another business process that is becoming popular is cloud-bursting: running the core workload on your own machines, but allowing overflow compute demand to spill onto a public cloud, typically for a short period of time. Creative companies will use these and other innovative business processes to deal with the growing demands of Big Data; the sketch below illustrates the cloud-bursting decision in code.
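As a rough illustration of cloud-bursting, the following minimal Python sketch fills a fixed pool of on-premise worker slots first and sends the overflow to short-lived cloud workers. The capacity figure and the submit_local/submit_cloud helpers are hypothetical stand-ins; a real system would call a cloud provider's provisioning API and release the burst capacity when demand subsides.

```python
LOCAL_CAPACITY = 4  # hypothetical number of on-premise worker slots

def submit_local(job):
    """Stand-in for dispatching a job to an on-premise worker."""
    print(f"local <- {job}")

def submit_cloud(job):
    """Stand-in for dispatching a job to a temporary public-cloud worker."""
    print(f"cloud <- {job}")

def dispatch(jobs):
    """Fill local capacity first; burst the overflow to the cloud.

    Simplified: slots are never freed, so this models a single burst of demand.
    """
    local_in_use = 0
    for job in jobs:
        if local_in_use < LOCAL_CAPACITY:
            local_in_use += 1
            submit_local(job)
        else:
            submit_cloud(job)  # pay for cloud time only while bursting

# Six jobs against four local slots: jobs 5 and 6 burst to the cloud.
dispatch([f"job-{i}" for i in range(1, 7)])
```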
What Role do Databases Play in Big Data?

Big Data begets Bigger Data. The more a company recognizes the transformative role of Big Data, the more data it seeks to capture and utilize. As a result, more companies are capturing more data, from web analytics and click-stream data to expanded database schemas that capture more transactional information. The more you utilize Big Data, the more data you seek to collect. Databases break into two classes: analytical and transactional. Transactional databases capture structured information and maintain the relationships within that information; transactional data is one feedstock for Big Data. Analytical databases then sift through the structured and unstructured data to extract actionable intelligence. Oftentimes, this actionable intelligence is then stored back in a transactional database.

How are Transactional Databases Handling Big Data?

Big Data requires decentralization. Because of the volume and velocity of the data being processed, centralization is anathema to Big Data: the networking, storage and compute must be decentralized or they will not scale. However, centralization is a core tenet of SQL databases. Traditional databases tightly link computation, caching and storage in a single machine in order to deliver optimal performance. There are two approaches to scaling SQL databases to handle Big Data: sharding and shared-data clustering.

Sharding

One approach to decentralizing transactional databases is sharding. Given an existing schema, sharding removes the relations between tables and stores those tables in separate databases, forcing the application layer to maintain, and in some cases reconstruct, those relationships. One common approach is to split customers across multiple databases: customers 1-10,000 in one database, customers 10,001-20,000 in another, and so on, as the sketch below shows. Sharding is one way to scale your data-handling needs, but it is very inflexible; it doesn't adhere to the Big Data principle of agility. A sharded database cannot add new data sources, and new ways of processing that data, on the fly. Sharding creates a rigid structure that necessitates a painful resharding each time you modify or expand the data or the relationships between the data.
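Here is a minimal sketch, under hypothetical assumptions, of the range-based customer sharding described above. The shard names and range size are illustrative; the key point is that routing lives in the application layer, not the database.

```python
SHARD_SIZE = 10_000  # customers 1-10,000 on shard 0, 10,001-20,000 on shard 1, ...
SHARDS = ["db-shard-0", "db-shard-1", "db-shard-2"]  # hypothetical connection names

def shard_for(customer_id: int) -> str:
    """Route a customer ID to its shard; the application owns this logic."""
    index = (customer_id - 1) // SHARD_SIZE
    if index >= len(SHARDS):
        # The rigidity described above: growth beyond the planned ranges
        # forces a manual, disruptive resharding.
        raise ValueError(f"customer {customer_id} has no shard; resharding needed")
    return SHARDS[index]

print(shard_for(9_500))   # db-shard-0
print(shard_for(10_001))  # db-shard-1
print(shard_for(25_000))  # db-shard-2
```

Every cross-customer query or relationship now has to be stitched together across these databases by application code, which is exactly the loss of agility noted above.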
Shared-Data Clustering

Shared-data database clusters, as provided by ScaleDB and Oracle RAC, deliver the agility required to handle Big Data. Unlike sharded databases, shared-data clusters support elastic scaling: if your database needs more compute, you add compute nodes; if it is I/O bound, you add storage nodes. In keeping with the Big Data principle of distributing the workload, shared-data clusters parallelize some processing across smart storage nodes, further eliminating bottlenecks and allowing you to scale to meet your Big Data needs. Unlike sharded databases, shared-data clusters retain the flexibility to add new tables and relationships on the fly. This flexibility is imperative in order to keep up with the ever-changing data sources and data relationships driven by Big Data. Shared-data clusters like ScaleDB can scale to accommodate thousands of storage nodes, enabling almost unlimited scaling capability. The ability to distribute processing to the storage nodes, each of which returns results for its specific piece of the data, is closely analogous to the MapReduce technique used by Hadoop, yet it is handled within the tenets of traditional ACID database constraints.

How Can You Prepare for Big Data?

The most important first step in preparing for Big Data is to consider scale, parallelization and agility. These concerns must inform your choice of computing tools and business processes. Maintain agility, or flexibility, because your data and your processing needs will change, and that change may be rapid and disruptive. Scale and parallelization go hand in hand: the only way to scale to handle Big Data is by leveraging parallelization, which means distributing processing, data and networking so as to avoid bottlenecks. The same principles apply to your business processes, which may involve exploiting elastic cloud computing either directly or through cloud-bursting. Consider, as you plan your infrastructure and your schemas today, that things will change relatively quickly. Big Data begets Bigger Data, so prepare for future scale and agility today.

Conclusion

As more and more industries recognize the value of information extracted from massive amounts of data, the trend of ever-increasing data collection and manipulation will continue to accelerate. This trend will compound upon itself: as more companies recognize the value of Big Data, they will demand more data and more manipulation and visualization of that data. This accelerating data volume and velocity will put increasing demands on all aspects of the computing infrastructure. It will lead to a shift away from centralized architectures, like traditional shared-nothing databases, and away from data silos, like sharded shared-nothing databases. Big Data will drive the adoption of architectures like Hadoop (for analytics) and ScaleDB (for transactions) that distribute data and processing over large clusters of machines. Big Data's benefits will entice businesses to collect and analyze even more data, resulting in ever-increasing strain on the computing infrastructure at all levels. This will force IT to adopt Big Data-friendly technologies, tools, architectures and business practices. It will be a very exciting time to be in IT.