Generating the Business Value of Big Data:




Leveraging People, Processes, and Technology

Generating the Business Value of Big Data: Analyzing Data to Make Better Decisions

Authors:
Rajesh Ramasubramanian, MBA, PMP, Program Manager, Catapult Technology
Roberto Berezdivin, Ph.D., Systems Architect, Catapult Technology

11 Canal Center Plaza, Floor 2
Alexandria, VA 22314
240-482-2100
www.catapulttechnology.com

Introduction

Big Data refers to large data sets whose size and disparity make it difficult, if not impossible, for relational database software tools to capture, store, manage, and analyze the data. Relational databases, designed for structured data, cannot handle the scale and agility challenges that face modern applications, nor were they built to take advantage of the relatively inexpensive, cloud-based storage and processing power that is now available. The platforms, tools, and software available to store, process, and analyze the large datasets of unstructured data prevalent today are collectively known as Big Data technologies.

As more and more companies incorporate efficient and scalable technology, data management and data storage are no longer the issue. Organizations generate data constantly through the use of the Internet, mobile applications, social media, internal documents, content, and automated processes. The solutions that the big Internet players use to process and analyze this voluminous data have been made publicly available by open-source software communities. Meanwhile, the advent of cloud-based solutions has dramatically lowered the cost of storage and processing. Virtual file systems, either open source or vendor-specific, have helped organizations transition from a managed infrastructure to a service-based approach.

In addition, innovative designs for database management and cost-effective ways to support massively parallel processing have led to new products like NoSQL databases and the Apache Hadoop MapReduce platform. NoSQL databases were developed specifically to handle the massive data volumes of today and to address the shortcomings of relational databases. Apache Hadoop is an open-source software framework for storage and large-scale processing of data sets on commodity hardware.

According to a recent study, 80-85 percent of global data is unstructured, meaning that it has no pre-defined data model or is not organized in a pre-defined manner. It can come from such disparate sources as social media platforms (e.g., Facebook, Twitter), email, online purchases, online profiles, content management system footprints, and photos.

The large Internet players are already discovering great value in their data by identifying new customers, improving their products and service offerings, expanding their markets, and increasing profitability. The real questions for business now are: How do you put all this captured and stored data to valuable use? How do you analyze it to make better business decisions?

The 3 V's

The 3 V's that define Big Data are:

1. Volume

Data storage is currently growing exponentially, as data is not just textual but also comes in the form of videos, music, images, clickstreams, and blog content, often through social media channels. A recent projection by PayPal predicts that every individual will generate over 20 petabytes of data over the course of his or her lifetime. (For context, a terabyte is 10^12 bytes of digital information; a petabyte is 10^15 bytes.) According to International Data Corporation (IDC), the digital universe will grow to 35 zettabytes (i.e., 35 billion terabytes) globally by 2020. The point is, data is exploding.

The response to this data boom, as well as the ubiquity of the cloud, will be a significant decrease in IT capital expenditure, as many organizations invest in data virtualization. At the same time, there will be an increase in operating expenditure as organizations move towards the use and exploitation of that data using cloud-based storage and processing solutions.

2. Velocity

The explosion of data is happening almost in real time, as people turn to social media for updates about what is occurring in the world around them. No one waits for news anymore; information now reaches us within fractions of a second. An interesting example was the 2011 earthquake southwest of Washington, D.C.: the first news of it arrived via Twitter minutes before the tremor was felt.

As more and more data is produced, it must be collected in shorter timeframes. Therefore, organizations require tools and platforms for real-time processing of data in order to achieve, and maintain, a competitive advantage in the marketplace.

3. Variety

In the real world, data comes in different formats, from structured data, typically contained in relational databases and spreadsheets with specific classification, to unstructured data, which can't be as neatly classified (e.g., videos, images, SMS, social media content, PDFs).

Veracity (Value)

The accuracy, truthfulness, and quality of data are the most important aspects that fuel new insights into your organization and provide high value. The data that organizations collect is all about supporting the decisions that can have a major impact on the organization as a whole. Businesses are going to want as much quality information as possible to support the business case. Establishing trust in Big Data solutions probably presents the biggest challenge; but once overcome, it will introduce a solid foundation for successful decision-making within your organization.

There is more data than ever from which business decisions can be made. According to a study done by Avanade, Inc., 46 percent of companies report they have made an inaccurate business decision as a result of bad or outdated data. In many cases, the data needed to make business decisions is not collected, and well-meaning managers end up guessing. It is therefore critical for organizations to address this issue and position themselves to react quickly to fast-changing business conditions.

For example, a user posts something like "I am interested in buying a new smart phone for my wife on her birthday" on social media. A smart phone manufacturer's data engineers who analyze this unstructured data can infer information about the shopper's interest, such as:

1. He is married;
2. He is looking for a smart phone; and
3. The phone will be used by his wife.

In addition, if he is a previous or current customer, the phone manufacturer can pull his profile and target him with various options more effectively than its competitors can. Harnessing this kind of unstructured data will help increase the phone manufacturer's sales and revenue and target the customer with better products. Imagine that this kind of information is posted by users across various social media. The volume of information that is available for organizations to analyze and better target their customers helps companies increase their market reach.

While organizations can acquire a negative online reputation, data can be leveraged as a corrective. For example, a passenger is traveling from one city to another by bus. The bus breaks down on the way to its destination. The passenger takes pictures of the incapacitated bus and tweets those images with complaints about the bus breaking down. Smart data mining by the bus line's data analysis team could provide this information to the customer service department in the form of alerts. Customer service can then return a tweet that apologizes for the inconvenience, assures fast repair, and promises better service. By offering a free ticket back from the trip's original destination, or some other accommodation, the bus line can also rebuild goodwill and fortify its customer retention strategy. The bus line's Big Data solution has mined unstructured data to return an actionable solution.

Currently, the challenge that businesses face is to transform raw data into meaningful information and provide actionable insights for better business decision-making. Organizations that mine their data warehouses, transactional systems, and the social media footprints of their customers can benefit by discovering the preferences of their customers. They can establish a meaningful relationship between customer segments and product segments with a higher degree of correlation.

[Diagram: Big Data Management, the technology used, and the benefits for an enterprise]

Technology Implementation: A Case Study

A Catapult customer implemented a new web portal and wanted to answer basic questions, such as "How many people visited the portal?" and "On average, how much time did people spend on the portal?" Catapult leveraged Apache Hadoop, the open-source platform that is applied to Big Data. "The Apache Hadoop software library is a framework that allows for the distributed processing of large data sets across clusters of computers using simple programming models. It is designed to scale up from single servers to thousands of machines, each offering local computation and storage" (Apache Hadoop).

In order to answer the customer's business questions, Catapult leveraged web server access logs and an HTTP request log. Catapult used the Hadoop Distributed File System (HDFS), which splits data into smaller blocks for distributed processing, and wrote a MapReduce program to identify the unique visitors based on IP address. (In MapReduce, the map job breaks input data down into individual key/value pairs; the reduce job takes the output from a map as input and combines those pairs into a smaller data set.) The session ID plays a critical role in the mining of the web logs: the session information showed how long visitors spent on the portal. The MapReduce job computed this in the cluster's fully distributed mode.
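The map and reduce steps described above can be sketched in a few lines of plain Python. This is only an illustration of the pattern, not Catapult's actual program: the log format, the sample records, and the function names (`map_line`, `shuffle`, `reduce_group`, `run_job`) are all assumptions made for the example, and a real Hadoop job would run the same logic distributed across a cluster rather than in a single process.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical log format (an assumption, not taken from the case study):
# "<ip> <session_id> <ISO-8601 timestamp> <path>"
SAMPLE_LOG = [
    "10.0.0.1 s1 2014-09-02T10:00:00 /home",
    "10.0.0.1 s1 2014-09-02T10:05:00 /docs",
    "10.0.0.2 s2 2014-09-02T10:01:00 /home",
    "10.0.0.2 s2 2014-09-02T10:02:00 /faq",
    "10.0.0.1 s3 2014-09-02T11:00:00 /home",
]


def map_line(line):
    """Map step: emit (key, value) pairs for one log line."""
    ip, session_id, timestamp, _path = line.split()
    yield ("ip", ip), 1
    yield ("session", session_id), datetime.fromisoformat(timestamp)


def shuffle(pairs):
    """Group values by key, as the framework does between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_group(key, values):
    """Reduce step: collapse all values for one key into a single result."""
    kind, _ident = key
    if kind == "ip":
        return key, len(values)  # number of requests from this IP
    # Session length in seconds: last request time minus first request time.
    return key, (max(values) - min(values)).total_seconds()


def run_job(lines):
    """Run map, shuffle, and reduce, then answer the two business questions."""
    pairs = (pair for line in lines for pair in map_line(line))
    reduced = dict(reduce_group(k, v) for k, v in shuffle(pairs).items())
    unique_visitors = sum(1 for kind, _ in reduced if kind == "ip")
    durations = [v for (kind, _), v in reduced.items() if kind == "session"]
    avg_seconds = sum(durations) / len(durations) if durations else 0.0
    return unique_visitors, avg_seconds


if __name__ == "__main__":
    visitors, avg_seconds = run_job(SAMPLE_LOG)
    print(f"unique visitors: {visitors}, average session: {avg_seconds:.0f} s")
```

Keying the map output by IP address answers "how many people visited," while keying by session ID lets the reduce step measure each session from its first to its last request. Treating one IP address as one visitor is itself a simplification, since several users can share an address.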

Finally, the parsed log files were stored as text files in HDFS and loaded into a Hive data warehouse. (Built on top of Hadoop, Hive provides data summarization, query, and analysis.) By writing queries in HiveQL, Hive's SQL-like query language, Catapult answered the business questions above. With this information, the customer targeted the user base with appropriate application design and better user experience, which led to more quality information.

Security and Privacy

The privacy of data is another huge concern, and one that increases in the context of Big Data. Organizations should understand that managing data privacy is both a technical and sociological issue, and a solution should address both perspectives.

As enterprises become more and more dependent on data to drive business decisions (whether the data is available publicly or through internal collection processes), they face the risk of inaccurate, incomplete, and fraudulently manipulated data. In order to avoid these risks, organizations need to verify and validate all the data sources they analyze and use tools and processes to check for vulnerabilities. Enterprises should have a proper Big Data governance process in order to avoid misleading data and the unexpected costs associated with it. Implementing adequate controls through the governance process ensures that the information that businesses depend on is accurate, consistent, and of good quality. In addition, data governance must be measured at three distinct levels:

1. At the program level, at which the organization identifies and highlights the qualitative level and the impact the data governance process delivers;
2. At the operational level, at which the organization monitors how data is behaving against the company's set policy and baseline; and
3. At the quantitative level, at which the organization measures the effectiveness and efficiency of data management results, assessing quantitative business values like revenue growth, cost savings, risk reduction, internal processes, and customer retention.

For example, as part of a data analysis contract with the Department of Transportation (DOT) Pipeline and Hazardous Materials Safety Administration (PHMSA), Catapult provided data management activities aligned to the agency's data management policy. The policy identifies roles and responsibilities for data owners, stewards, and managers, as well as rulemaking impacts on collected data sets and data management procedures. As part of the agency's data governance effort, Catapult contractors developed comprehensive data policies, standards, and procedures and monitored and enforced conformance with those data policies, standards, and architecture. In addition, Catapult contractors manage and resolve data-related issues and communicate and promote the value of data assets within the agency.

Conclusion

Through Big Data analytics, the potential has never been greater to optimize business processes, to drive product and service innovation, and to enable enterprise controls. By leveraging Big Data analytics, Catapult Technology can help your organization:

- Measure the incremental cost of managing and analyzing unstructured data sets against the incremental benefits gained over and above what can be achieved using structured data sets.
- Develop a data culture in which the management, employees, and strategic partners are active participants in managing a meaningful data lifecycle.
- Harness new sources of information and take responsibility for accurate data creation, dissemination, governance, quality, and maintenance.
- Enable businesses to turn data from information into actionable insights.

Catapult's Big Data consultants are adept at:

- Collecting, cleaning, and integrating unstructured data from multiple sources, while creating a road map that helps organizations realize their business value by deriving greater insights from their data.
- Developing a migration strategy, creating prototypes, and engaging in full-fledged deployment of Big Data solutions.

- Accommodating privacy, security, and data governance aspects of Big Data.
- Translating Big Data analytical findings into appropriate risk management and marketing strategies that drive business value.
- Hadoop/HDFS, MapReduce, HBase, Pig, NoSQL data stores (Cassandra, MongoDB).

New businesses are emerging based on harvesting Big Data and by combining data and analytics services. Disruptive change is being implemented across industries both horizontally and vertically.

Contact Catapult Technology so we can help your organization take advantage of Big Data technologies and build a culture that infuses analytics everywhere! Call 240-482-2100 or email info@catapulttechnology.com

References:
McKinsey Global Institute: Big data: The next frontier for innovation, competition, and productivity
The White House Big Data and Privacy Review Report: http://www.whitehouse.gov/sites/default/files/docs/big_data_privacy_report_5.1.14_final_print.pdf
Daniel Austin, Principal Architect at PayPal: http://www.kdnuggets.com/2014/04/big-datainnovation-summit-2014-highlights-day1.html
Apache Hadoop: http://hadoop.apache.org/
Avanade Inc., Global Survey: The Business Impact of Big Data, 2010: http://www.avanade.com/en-us/approach/research/pages/big-data.aspx#
International Data Corporation (IDC): www.idc.com

09/02/14 QP1560-106