EXECUTIVE REPORT

Big Data and the 3 Vs: Volume, Variety and Velocity
The three Vs are the defining properties of big data, and it is critical to understand what each element means. The point of the V-based characterization is to highlight big data's most serious challenges: the capturing, cleaning, curating, integrating, storing, processing, indexing, searching, sharing, transferring, mining, analyzing, and visualizing of large volumes of fast-moving, highly complex data. The good news is that there are many big data solutions to help, including the MapR M7 Enterprise Database Edition for Hadoop, which received the highest ranking among big data deployments according to a recent Forrester report.[1]

The first V stands for volume. Volume describes the sheer magnitude of the data being analyzed. A terabyte is becoming a relatively small amount, as the petabyte becomes the metric used to quantify data sets in industries across the board. It is easy to revel in the astronomical numbers of this new world, a world measured in zettabytes (1,180,591,620,717,411,303,424 bytes). Numbers like these, and the processing power needed to digest this much information, make it easy to capture the attention of early adopters.

The second V stands for variety. Variety in big data refers to the differing types of data, and this is where the discussion turns to social media. Buried in the monotony of social data, nuggets of advertising gold may be found; the problem is the digging. The information advertisers and B2C businesses want is buried in mountains of unstructured data. The information one can mine from social media is unlike traditional structured information and, as a result, does not fit well into a standard database where tried-and-true analytical tools can be run. It is found in the metadata of photos and videos, in statuses about forgotten anniversaries, and in the well-meant birthday wishes of individuals.
Footnotes: [1] The Forrester Wave™: Big Data Hadoop Solutions, Q1 2014. Forrester Research, Inc., February 2014. Copyright © 2014 Forrester Research, Inc.
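The zettabyte figure quoted above is exactly 2^70 bytes (strictly the binary-prefixed zebibyte, commonly rounded to a zettabyte). A quick sketch verifies the arithmetic and puts the byte prefixes in order:

```python
# Verify the byte count quoted above: 2**70 bytes.
ZETTABYTE_BINARY = 2 ** 70
print(f"{ZETTABYTE_BINARY:,}")  # 1,180,591,620,717,411,303,424

# Climbing the binary prefixes from terabyte to zettabyte,
# each step up is a factor of 1,024 (2**10).
prefixes = ["terabyte", "petabyte", "exabyte", "zettabyte"]
for i, name in enumerate(prefixes):
    print(f"1 {name} = 2**{40 + 10 * i} bytes = {2 ** (40 + 10 * i):,}")
```

Each step of three decimal orders of magnitude is why a data set that strained yesterday's petabyte-scale tools barely registers at zettabyte scale.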
Even if you limit your data tracking to text-based information alone, what you get is varied. For example, if one wants to track what people think of "orange" on Twitter, it's not as simple as tracking the word "orange." There's orange the fruit, the color, the French tech company, the city in California, the county in California, and that is just the first page of results from a Google search. Variety thus includes structured, database-ready information as well as unstructured data, from different forms of media to seemingly simple text-based information that requires context to be analyzed properly.

The final V is velocity. Velocity is the sheer speed of data. Consider these statistics: every minute, users watch over 138,000 hours of video on YouTube; 27,778 new blog posts go live on Tumblr; 100,000 Tweets are shared; and 208,000 pictures are posted on Facebook. Commentators sometimes gloss over industrial statistics that are equally striking, such as the 500 gigabytes of data a Virgin Atlantic Boeing 787 collects on every flight. This data is analyzed in real time, allowing preparations for repairs to begin before the plane ever lands at its destination. Incoming data arrives at high speed, and its useful life is short; to take advantage of it, the data must be analyzed in real time, which requires a huge amount of computing power.

These aspects of big data would give almost any adopter pause. However, to continue in the quest to know what big data is and the different ways one can use it, one must wrap one's head around Hadoop. The Apache Hadoop software library is a framework that allows for the distributed processing of large data sets across clusters of computers using simple programming models. It is designed to scale up from single servers to thousands of machines, each offering local computation and storage.
Rather than rely on hardware to deliver high availability, the library itself is designed to detect and handle failures at the application layer, thus delivering a highly available service on top of a cluster of computers, each of which may be prone to failure.
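The "simple programming models" mentioned above refer chiefly to MapReduce. As an illustration only, with plain Python standing in for Hadoop's actual Java API, the canonical word-count example reduces to a map step that emits (word, 1) pairs and a reduce step that sums them per key:

```python
from collections import defaultdict

# Illustrative stand-in for Hadoop's MapReduce model: the map step
# emits (key, value) pairs and the reduce step aggregates per key.
# On a real cluster, both steps run in parallel across many nodes.

def map_step(line):
    """Emit a (word, 1) pair for every word in one line of input."""
    return [(word.lower(), 1) for word in line.split()]

def reduce_step(pairs):
    """Sum the counts for each distinct word."""
    counts = defaultdict(int)
    for word, n in pairs:
        counts[word] += n
    return dict(counts)

lines = ["big data is big", "data moves fast"]
pairs = [pair for line in lines for pair in map_step(line)]
print(reduce_step(pairs))  # {'big': 2, 'data': 2, 'is': 1, 'moves': 1, 'fast': 1}
```

The value of the model is that neither function needs to know how many machines are involved; the framework handles splitting the input, shuffling pairs to reducers, and retrying failed tasks.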
Hadoop is open-source software licensed through the Apache Software Foundation. Among its features, it provides scalability for big data applications. Traditional architectures have grids that are storage-only, compute-only, and report-only; this segregated design makes big data applications inflexible. Data collection has reached the point where the volume, velocity, and variety of incoming data have increased the complexity of processing dramatically. Users with traditional architectures are finding their systems cannot finish processing today's load of data before tomorrow's load starts flowing in. With a traditional architecture, it is not possible to add a new server and have it help with the query that has been slowing the system down; the computation would have to start over. Nor is it possible to add new data to the set and continue the computation where it left off; again, the computation would have to start over.

Hadoop solves this dilemma by grouping servers together and having them act as one clustered system. Within that capability, Hadoop allows new data to be added to an ongoing query, and it allows the physical infrastructure to scale without interrupting current processing. Another key benefit of Hadoop is the systematic replication designed into its operation. Each data set is broken up into blocks, and those blocks are replicated on three different nodes (servers) in the cluster. That means if any one block goes down, there are two backup replicas of its data.

[Diagram: raw data is split into blocks and replicated across nodes while being analyzed and stored.]

Hadoop also has weaknesses, so the solutions ultimately being implemented are hybrid architectures, blending traditional approaches for structured, lasting historical data with evolving approaches for bursty, unstructured data that needs extremely fast processing and analysis.
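The block splitting and three-way replication described above can be sketched as follows. This is a simplified model, not HDFS's actual placement policy (which also accounts for rack topology); the block size is the common HDFS default, and the node names are invented for illustration:

```python
# Simplified sketch of HDFS-style block replication (replication factor 3).
# Node names are invented for illustration; real HDFS placement also
# considers rack topology when choosing where replicas live.
REPLICATION_FACTOR = 3
BLOCK_SIZE_MB = 128  # a common HDFS default block size

def split_into_blocks(file_size_mb):
    """Split a file into fixed-size blocks; the last block may be smaller."""
    blocks = []
    offset = 0
    while offset < file_size_mb:
        blocks.append(min(BLOCK_SIZE_MB, file_size_mb - offset))
        offset += BLOCK_SIZE_MB
    return blocks

def place_replicas(num_blocks, nodes):
    """Assign each block to REPLICATION_FACTOR distinct nodes, round-robin."""
    return {
        b: [nodes[(b + r) % len(nodes)] for r in range(REPLICATION_FACTOR)]
        for b in range(num_blocks)
    }

nodes = ["node1", "node2", "node3", "node4", "node5"]
blocks = split_into_blocks(300)  # a 300 MB file becomes [128, 128, 44]
print(place_replicas(len(blocks), nodes))
```

Because every block lives on three distinct nodes, losing any single server leaves two live copies of each of its blocks, which is what lets the cluster keep serving data while a replacement replica is rebuilt.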
An important component of big data, which sometimes goes unnoticed, is the physical data center facility, which is vital to the success of big data architectures. The data center is the backbone of big data applications. For some small companies there are cloud options available; for many companies, however, the security of the cloud becomes an issue, so they need to manage their own data and their own server clusters. The information big data solutions collect, crunch, and store can be extremely private in nature. The health care industry, for example, has many regulations dealing with the collection and use of data, including the Health Insurance Portability and Accountability Act (HIPAA), which establishes procedures for the exercise of individual health information privacy rights.

For some companies, the sheer size of their data makes cloud-based solutions uneconomical. Companies looking to experiment with clusters start with around 100 servers (around 3-4 racks). Adoption leaders have much bigger clusters; Yahoo, for example, was reported by InformationWeek to have 42,000 servers (around 1,400 racks) in its clusters. The space required to house a cluster like that is massive and can double very rapidly as the data expands. Keeping the clusters cool, powered, and connected to the internet is no easy feat. For many companies, the various aspects of managing a data center fall outside their core competencies. That is why more and more enterprise companies are choosing to outsource their data center needs to colocation providers.

Big data gives businesses a level of business intelligence that can set them apart from their competition and provide valuable insights into their market. Despite this, Gartner research estimates that 85% of Fortune 500 companies will be unable to exploit big data for competitive advantage.[3]
About

With over two dozen data centers across the globe, the company helps many of the world's largest global businesses, including 9 of the global Fortune 20 companies and over 140 of the Fortune 1000, as well as companies of all sizes, take advantage of the latest data center technology and realize top operational efficiencies through:

Flexible, Scalable Solutions - Receive flexible data center solutions that readily scale to match the needs of your growing business.

Proven, Innovative Technology - Benefit from the latest data center innovations that expert technicians can put to work for your IT environment.

Exceptional Service - Enjoy personalized, consultative service through all stages of the relationship: design, build, installation, management, and reporting.

National IX - Offers low-cost metro connectivity and city-to-city transport in an ever-growing number of cities across the US.

About the Author

Scott Brueggeman oversees the management of the company's global marketing, product development, inside sales, and corporate communications, including branding, demand creation, and public relations. His 20 years of marketing and sales experience spans Fortune 50 firms as well as smaller high-growth companies. Previously, he ran marketing at a data center hosting and managed services company and served as Chief Marketing Officer at PEAK6 Investments, an international financial services firm. Before that, he was VP of Marketing for CareerBuilder and held leadership positions at AT&T and PepsiCo. Brueggeman serves on several advisory boards.