Chapter 1. Contrasting traditional and visual analytics approaches




Chapter 1

Understanding Big Data Analytics

In This Chapter

Defining Big Data
Understanding Big Data Analytics
Contrasting traditional and visual analytics approaches

The era of Big Data is upon us. The race is on to extract insight and value from this abundant resource. The opportunities are enormous and so are the challenges. Organizations that master the emerging discipline of Big Data Analytics can reap significant rewards and separate themselves from their competitors; those that fail to do so will be left in the dust. Big Data is here to stay. And you're part of it!

In this chapter, I define Big Data and Big Data Analytics and explore the challenges of harvesting value from an ever-growing sea of digital information.

What Is Big Data?

Big Data is a term applied to data sets so large that common software tools aren't capable of capturing, managing, and processing their data within a tolerable period. Big Data is colossal, unstructured (or loosely structured), distributed, fluid, and often unconnected. The amount of Big Data varies by organization, but its volume (and variety) tends to increase astonishingly quickly and exponentially.

Big Data Analytics For Dummies, Centrifuge Special Edition

In the following sections, I give you some basic background on Big Data: exactly how big it is, where it comes from, how to evaluate it, and how to use it.

Putting the Big in Big Data

Analysts estimate that approximately 300 million terabytes (TB) of data exist in the world today. But what's staggering is that 90 percent of this data was created in the last two years! In a recent study titled The Digital Universe Decade: Are You Ready? market research company IDC projected that by 2020 the digital universe will encompass a staggering 35 zettabytes (ZB). Step aside, petabytes and exabytes! The word is now zettabytes! And with 1ZB being equivalent to 1 billion terabytes, that's a whole lot of data.

To put this into perspective, from its founding in April 1800 to April 2011, the U.S. Library of Congress had amassed about 235TB of data. Currently, it's adding about 5TB of new data each month. So if IDC is correct, by 2020, computers will collectively store roughly 150 million times as much data as the Library of Congress had archived by April 2011 (35 billion TB divided by 235TB is about 149 million)!

Seeing where the data comes from

You may wonder where all this data comes from. It comes from almost everywhere. Enterprises and government agencies aggregate data from myriad private and/or public data sources.

Private data is information that your organization specifically collects and that is available only to your organization, such as employee data, customer data, and machine data (such as user transactions, customer behavior, computer system health, and cybersecurity threats). Commercial-specific examples include credit-card, pharmacy, and mortgage transactions. Government-specific examples include Social Security data, Medicare transactions, and passport paperwork.

Public data is information that's generally available to the public for a fee or at no charge. Examples include stock prices, company and individual credit ratings, social media content (such as Facebook and Twitter), and computer IP blacklists (such as known hacking sites) along with all other content found on the public Internet.

When you stop and think about it, it's no wonder the world is drowning in data. If an organization can record something, it usually does: environmental data, financial data, medical data, surveillance data, and on and on.

Figuring out what to do with the data

The most significant challenges of Big Data no longer involve aggregation and storage but rather what to do with all the accumulated data. Today, common concerns for commercial enterprises and government agencies include the following:

Deriving actionable value from Big Data despite information overload.

Analyzing the connections between structured, semistructured, and unstructured data sets.

  Structured data is stored in relational databases in columns and rows.

  Semistructured data doesn't conform to the formal structure of tables and rows but contains tags or other markers to separate semantic elements and enforce hierarchies of records and fields. Examples include web pages and XML (Extensible Markup Language).

  Unstructured data refers to information that doesn't have a predefined data model and thus can't be stored in a relational database. Unstructured data can be textual or nontextual. Examples of textual unstructured data include e-mail messages, PowerPoint presentations, and Word documents. Examples of nontextual unstructured data include JPEG images, MP3 audio files, and Flash video files.

  According to the market-research firm IDC, semistructured and unstructured data account for more than 90 percent of the data in today's organizations.

Uncovering patterns of useful data when you don't know what questions to ask in the first place.

As you discover throughout this book, a Big Data Analytics solution can help your organization meet all these challenges.
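To make the distinction between structured, semistructured, and unstructured data concrete, here is a minimal Python sketch. It is an illustration only: the customer number, the amount, the field names, and the helper functions are all hypothetical and do not come from any particular product. The same fact (a customer was charged $250.00) is stored three ways, and each form needs a different technique to get the number back out.

```python
import csv
import io
import re
import xml.etree.ElementTree as ET

# Structured: fixed columns and rows, as in a relational table (shown here as CSV).
structured = "customer_id,amount\n1001,250.00\n"

# Semistructured: no fixed table layout, but tags mark the fields and hierarchy.
semistructured = "<order><customer id='1001'/><amount>250.00</amount></order>"

# Unstructured: free text with no predefined data model.
unstructured = "Hi, customer 1001 here. I was charged $250.00 twice last week."


def amount_from_structured(text):
    """Read the amount by column name; the schema tells us where it lives."""
    row = next(csv.DictReader(io.StringIO(text)))
    return float(row["amount"])


def amount_from_semistructured(text):
    """Read the amount by tag name; the markup tells us where it lives."""
    root = ET.fromstring(text)
    return float(root.findtext("amount"))


def amount_from_unstructured(text):
    """Guess the amount with a pattern; nothing in the data says where it lives."""
    match = re.search(r"\$(\d+(?:\.\d+)?)", text)
    return float(match.group(1)) if match else None


if __name__ == "__main__":
    print(amount_from_structured(structured))
    print(amount_from_semistructured(semistructured))
    print(amount_from_unstructured(unstructured))
```

Notice that the first two functions rely on structure supplied by the data itself, while the third relies on an assumption about how people write dollar amounts. That difference is a large part of why unstructured data is harder to analyze.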

Popular Big Data infrastructure solutions

The following is a list of popular Big Data infrastructure solutions that you're likely to encounter when using Big Data Analytics applications.

Apache Hadoop is an open-source software framework that supports data-intensive distributed applications working with thousands of computers and petabytes of data.

Cloudera offers Apache Hadoop-based software and services that make it easier to run Hadoop in a production environment.

EMC Greenplum is a commercial data warehouse based on the open-source database PostgreSQL and intended for large-scale enterprise and cloud deployments.

HP Vertica is a commercial database software platform that uses a column-oriented analytic database to process large amounts of data for quick analysis.

IBM Netezza is a commercial data-warehouse solution based on proprietary technology, scaling to more than 10 petabytes (PB) of data.

NetApp specializes in enterprise-class data-warehouse solutions and is a thought leader in Big Data. NetApp's highest-end platform can accommodate up to 4PB of raw data.

Oracle is one of the most successful enterprise database warehouse providers today, with all of the Fortune 100 as customers. Oracle offers a line of Big Data Appliances that can accommodate up to 648TB of raw storage in a single rack and up to 5PB within an eight-rack cluster.

Splunk is a software application that enables users to search, monitor, and analyze machine-generated data from applications, systems, and IT infrastructure via a web-based interface.

Sybase, an SAP company, is an enterprise software and services company offering software to manage, analyze, and mobilize information using relational databases, analytics and data-warehousing solutions, and mobile application development platforms.

Teradata offers commercial relational database management system (RDBMS) hardware and software. The company launched the Petabyte Power Players club to include customers with petabyte-plus data warehouses, including Dell (1PB), Bank of America (1.5PB), Wal-Mart Stores (2.5PB), and eBay (5PB).

Evaluating Big Data for analysis: The Four Vs

Diamonds are evaluated on what are commonly known as the Four Cs: color, cut, clarity, and carat weight. Similarly, Big Data is commonly evaluated on the Four Vs:

Volume describes the relative size of data, typically in terabytes or petabytes.

Velocity describes the frequency at which data is generated, captured, and shared.

Variety describes the types of data in a data set, such as transactional, social, content, geospatial, location-based, log, and radio-frequency identification (RFID).

Value describes the business benefits reaped by the organization, such as fraud detection, loan risk analysis, and customer-behavioral analytics.

All four of these Big Data characteristics are important to consider when you're evaluating solutions for Big Data Analytics, which I introduce next. See Chapter 2 for a lot more information about the Four Vs.

What Is Big Data Analytics?

Big Data Analytics is the process, whether manual or automated, of analyzing Big Data to extract meaning and actionable intelligence. Put another way, it makes Big Data useful.

Only a short time ago, companies used to spend considerable time and resources to identify and procure useful data. Today, most companies have the opposite problem. Aggregating useful data is relatively easy; analyzing that data is the challenge.

In the following sections, I explore two approaches to Big Data Analytics: the traditional approach and the visual analytics approach.

Traditional analytics approach

You may be surprised that many organizations still employ data analysts who use manual techniques to extract useful information from large data warehouses. Such techniques typically include ad hoc database queries followed by a series of univariate (analysis of single-variable distributions), bivariate, and, more often, multivariate analyses.

These analysts often have advanced degrees in mathematics and/or statistics and pride themselves on their ability to perform advanced regression analyses. They often view data in columns and rows and then periodically create charts and graphs manually, using spreadsheets or basic business intelligence reporting tools (see Figure 1-1).

Figure 1-1: Manual data analysis.

Even with automation, this type of analytical approach is limited in its ability to detect unknown or undiscovered patterns (link analysis). Assumptions are often hard-coded, leading to false outcomes. It's like trying to find a needle in a haystack! Ultimately, the results are too little and too late.
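For readers who haven't met the terms, the univariate and bivariate analyses mentioned above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the order values and item counts are made-up sample data, the function names are hypothetical, and the Pearson correlation coefficient stands in here as one common bivariate measure (the chapter doesn't name a specific one).

```python
import statistics

# Hypothetical sample data: the value of five orders and the items in each.
order_values = [20.0, 35.0, 50.0, 65.0, 80.0]
items_per_order = [1, 2, 3, 4, 5]


def univariate_summary(values):
    """Describe the distribution of a single variable."""
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "stdev": statistics.stdev(values),
    }


def pearson(xs, ys):
    """Measure the linear relationship between two variables (-1 to 1)."""
    mean_x = statistics.mean(xs)
    mean_y = statistics.mean(ys)
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sum((x - mean_x) ** 2 for x in xs)
    spread_y = sum((y - mean_y) ** 2 for y in ys)
    return covariance / (spread_x * spread_y) ** 0.5


if __name__ == "__main__":
    print(univariate_summary(order_values))
    print(pearson(items_per_order, order_values))
```

Both functions answer a question the analyst already decided to ask, about columns the analyst already chose. That is the limitation described above: nothing here surfaces a pattern nobody thought to look for.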

Visual analytics approach

Today's data analysts take an entirely different approach. They prefer to work smarter, not harder, to uncover hidden meanings in Big Data, leveraging visual analytics tools to integrate, visualize, and collaborate with data in ways that old-school data analysts have never seen.

Visual analytics applications extract value from Big Data through advanced analytics and interactive visualization.

Advanced analytics, such as link analysis, enable the integration of complex information in simple visualizations for pattern discovery (for example, seeing the forest through the trees).

Interactive visualization refers to the ability to do it yourself through prebuilt charts, graphs, and timelines that tell the complete story.

You've often heard that a picture is worth a thousand words. Would you rather try to extract useful information from the table of data shown in Figure 1-1 or through interactive visualizations displayed in Figure 1-2?

Figure 1-2: Data analysis with visual analytics software.

For centuries, visualization has been used to support the understanding of complex information. Better understanding of relationships and context is key to visual analytics.
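The idea behind link analysis can be shown with a small Python sketch. It is a rough illustration, not how any particular visual analytics product works: the account names, phone numbers, and addresses are invented, the function names are hypothetical, and a plain breadth-first search for connected groups stands in for whatever algorithms a commercial tool uses. Records that share an attribute are linked, and accounts that are linked directly or indirectly fall into the same group.

```python
from collections import defaultdict, deque

# Hypothetical records: each pair links an account to one of its attributes.
records = [
    ("acct-1", "phone:555-0100"),
    ("acct-2", "phone:555-0100"),
    ("acct-2", "addr:12 Elm St"),
    ("acct-3", "addr:12 Elm St"),
    ("acct-4", "phone:555-0199"),
]


def build_graph(pairs):
    """Connect every account to each attribute it uses, in both directions."""
    graph = defaultdict(set)
    for entity, attribute in pairs:
        graph[entity].add(attribute)
        graph[attribute].add(entity)
    return graph


def linked_groups(graph):
    """Find sets of nodes that are connected, using breadth-first search."""
    seen = set()
    groups = []
    for start in sorted(graph):
        if start in seen:
            continue
        seen.add(start)
        queue = deque([start])
        group = set()
        while queue:
            node = queue.popleft()
            group.add(node)
            for neighbor in graph[node]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    queue.append(neighbor)
        groups.append(group)
    return groups


def account_clusters(pairs):
    """Return sorted lists of accounts that are linked directly or indirectly."""
    clusters = []
    for group in linked_groups(build_graph(pairs)):
        clusters.append(sorted(n for n in group if n.startswith("acct-")))
    return sorted(clusters)


if __name__ == "__main__":
    for cluster in account_clusters(records):
        print(cluster)
```

In this sample, acct-1 and acct-3 never share an attribute directly, yet they land in the same group because acct-2 connects them. Nobody had to ask "are acct-1 and acct-3 related?" in advance, which is the kind of unprompted pattern discovery the text attributes to link analysis.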

Visual analytics software can improve time-to-discovery by more than 50 percent and make data analysts 10 to 20 times more productive than analysts who use traditional manual methods. Organizations typically recoup their investments in visual analytics tools in a matter of months. They also find it easier to fill data-analyst positions because advanced degrees in mathematics and statistics are no longer required.

Analysts who leverage visual analytics applications instantly become data scientists because they now have the ability to test new hypotheses and experiment with data in ways never before possible. Visual representation of the data sharpens focus on what's important (so you can see clearly).

If you're excited by the prospects of visual analytics, read on. Chapter 2 describes how to get started.