BIG DATA FUNDAMENTALS

Timeframe
Minimum of 30 hours

Learning outcomes
- Use the concepts of volume, velocity, variety, veracity and value to define big data
- Critically evaluate the need for big data management across the spheres of government, business, the environment, and society
- Evaluate the changing and expanding role of information technology in the organisation
- Discuss the concepts of correlation and prediction as they apply to big data

Recommended articles
- Cloud Security Alliance, 2014, 'Big Data Taxonomy', Big Data Working Group, https://cloudsecurityalliance.org/research/big-data/ (Accessed 06 October, 2014).
- Evans, P.C. and Annunziata, M. 2012, 'Industrial Internet: Pushing the Boundaries of Minds and Machines', GE Imagination at Work, http://www.ge.com/docs/chapters/industrial_internet.pdf (Accessed 28 August 2014).

Section overview
This opening section deals with some of the fundamentals of big data, including how it is defined. It also looks at the business need and where big data belongs in the organisation.

This section points to the importance of correlation in big data, whereas Section 7.3 (Big Data Analytics) will provide a more in-depth discussion on statistical analysis, and the role of quants vs the role of management.

Due to the nature of the subject, some terms may be unfamiliar to you. Refer to the glossary of terms at the end of this study guide for explanations that do not appear in the text.

What is Big Data: the Five Vs?
Companies and governments around the world are collecting vast quantities of digital information about us and our environments using information exchanges (eg e-mails, mobile phones, etc) and sensory devices (eg cameras, heat sensors, etc). The potential for every electronic device to be connected to the internet to produce data (the internet of things) is imminent.

This proliferation of data has exceeded many organisations' and governments' capacities to store, compute, and analyse the information, quite apart from the implications for security and privacy.

Big data is data that exceeds the processing capacity of conventional database systems. The data is too big, moves too fast, or doesn't fit the strictures [constraints of the structure] of your database architectures. (Dumbill, 2014)

Every day we create 2.5 quintillion bytes of data ... the issues of storing, computing, security and privacy, and analytics are all magnified by the velocity, volume, and variety of big data, such as large-scale cloud infrastructures, diversity of data sources and formats, streaming nature of data acquisition and high volume inter-cloud migration. (Cloud Security Alliance, 2014)

The following are types of data that contribute to the big data environment:
- Traditional enterprise data (eg from CRM systems, transactional ERP data, web store transactions, general ledger data, etc);
- Machine-generated or sensor data (eg call detail records, weblogs, smart meters, manufacturing sensors, equipment logs, and trading systems data); and
- Social data (eg customer feedback streams, microblogging sites such as Twitter, and social media platforms like Facebook). (Oracle, 2013a)

The authors cited above, and many other writers, emphasise the significance of big data, why big data is important for business, and where big data fits into organisational structures and decision-making processes.

We begin by describing the concept of big data using the characteristics of volume, velocity, variety, veracity and value (the 5Vs). In late 2001, the term 3Vs first appeared in a research note titled '3-D Data Management: Controlling Data Volume, Velocity and Variety' (Laney, 2001). Many authors have since added other distinguishing factors (ie veracity and value). While these two additional factors do not necessarily define big data, they do emphasise the importance of data trustworthiness and business value.

FIGURE 1: 5Vs OF BIG DATA
[Figure: Volume, Velocity, Variety, Veracity, Value] (Marr, 2014)

High volume
IBM (2014) states that every day we create 2.5 quintillion bytes of data, with 90% of the data in the world today created in the last two years alone (since 2011). Data comes from everywhere:
- Sensors, eg used to gather climate information;
- Posts to social media sites and social media advertising, eg Facebook, Qzone (China only), WhatsApp, Google+, etc (refer to Appendix 1 for largest social networks in the world);
- Digital pictures and videos, eg YouTube;
- Websites, eg purchase transaction records; and
- GPS signals, to name a few.
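To get a feel for the daily figure quoted from IBM (2014), it can help to convert it into more familiar units. The following is a minimal sketch, assuming decimal (SI) prefixes and an even spread of data creation over the day; the helper name `human_readable` is hypothetical and used only for illustration.

```python
# Decimal (SI) unit prefixes; a quintillion is 10**18, ie one exabyte.
SI_UNITS = ["B", "kB", "MB", "GB", "TB", "PB", "EB", "ZB"]


def human_readable(num_bytes: float) -> str:
    """Express a byte count in the largest SI unit that keeps the value below 1000."""
    value = float(num_bytes)
    for unit in SI_UNITS:
        if value < 1000 or unit == SI_UNITS[-1]:
            return f"{value:.1f} {unit}"
        value /= 1000


BYTES_PER_DAY = 2.5e18          # "2.5 quintillion bytes" per day (IBM, 2014)
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 seconds

# Assumption: data is created at a constant rate throughout the day.
bytes_per_second = BYTES_PER_DAY / SECONDS_PER_DAY

print("Created per day:   ", human_readable(BYTES_PER_DAY))
print("Created per second:", human_readable(bytes_per_second))
```

Under these assumptions, 2.5 quintillion bytes a day is 2.5 exabytes a day, or an average of roughly 29 terabytes every second.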

Consider the following example of the New York Stock Exchange's management of data volumes (Melnyk, 2014).

New York Stock Exchange
NYSE Euronext operates multiple securities exchanges, most notably the New York Stock Exchange (NYSE). NYSE ingests approximately 8 billion transactions per day, which can go up to as much as 15 billion during a crash or surge in the market. Analysts use this data to track, eg, the value of listed companies, performance trends, and fraudulent activity. This market surveillance and analysis includes every transaction from each trading day. Similar to most other large organisations, NYSE was moving data back and forth between the storage systems and their analytic engines, which could take over 26 hours to complete. NYSE also has global customers who require 24/7 access without any downtime. Now NYSE has reduced the time needed to access business-critical data from 26 hours to 2 minutes. They can also carry out ad-hoc searches over a petabyte of data (one thousand million million bytes, or 10^15 bytes) and they have opened up new analytical capabilities. (Melnyk, 2014)

It is possible to hold very large data sets due to the decreasing cost of different types of storage and the availability of cloud-based services. To appreciate the size of a petabyte: if you counted all the bits in one petabyte at one bit per second, it would take 285 million years, and if you counted one byte per second, it would take 35.7 million years (McKenna, 2014). Refer to Appendix 2 for more on bytes.

Digital mapping
Google first entered digital mapping in 2004 and launched Google Maps and Google Earth in 2005. Today, Google offers its users over 20 petabytes (21.5 billion megabytes) of imagery, from satellite images to aerial photos to 360-degree street view images. (McKenna, 2014)

High velocity
A second characteristic that implies 'big' is the high velocity (or frequency) of the data. Velocity, as defined by McKenna (2014), is the rate at which data arrives at the enterprise and is processed or well understood.
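As a brief aside on the volume figures quoted above, the petabyte arithmetic can be checked directly. The sketch below assumes a year of 365.25 days and compares two common readings of 'petabyte': the decimal one (10^15 bytes) and the binary one (2^50 bytes). The function name `years_to_count` is hypothetical. The quoted counting times and the megabyte conversion for the Google imagery appear to match the binary reading.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 60 * 60  # assumption: Julian year

PB_DECIMAL = 10**15  # one thousand million million bytes
PB_BINARY = 2**50    # 1,125,899,906,842,624 bytes
BITS_PER_BYTE = 8


def years_to_count(items: int) -> float:
    """Years needed to count `items` things at one item per second."""
    return items / SECONDS_PER_YEAR


for label, petabyte in (("decimal", PB_DECIMAL), ("binary", PB_BINARY)):
    bit_years = years_to_count(petabyte * BITS_PER_BYTE) / 1e6
    byte_years = years_to_count(petabyte) / 1e6
    print(f"{label:>7} petabyte: {bit_years:6.1f} million years per bit, "
          f"{byte_years:5.1f} million years per byte")

# 20 binary petabytes expressed in binary megabytes (2**20 bytes each).
megabytes_in_20_pb = 20 * PB_BINARY // 2**20
print(f"20 PB = {megabytes_in_20_pb / 1e9:.1f} billion megabytes")
```

With the binary reading, the script gives about 285 million years per bit and 35.7 million years per byte, in line with the quoted figures; the decimal reading gives somewhat smaller numbers.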

But there is a big difference between the speed of the information being received and the information being processed, as the example below shows.

The doubling of computing power every 18 months is nothing compared to a big algorithm: a set of rules that can be used to solve a problem a thousand times faster than conventional computational methods could. One colleague, faced with a mountain of data, figured out that he would need a $2-million computer to analyse it. Instead, they came up with an algorithm within two hours that would do the same thing in 20 minutes on a laptop: a simple example, but illustrative. (Shaw, 2014)

McKenna (2014) argues that, from a governance perspective, if powerful analytics engines can analyse data as it flows across the wire, so that you glean insight from that data without having to store it, you might not have to subject this data to retention policies, which can result in huge savings for your IT department. Consider that social media messages go viral in seconds, and technology allows organisations to analyse that data while it is being generated without ever putting it into their databases. Through cloud-based information exchange there is an opportunity for organisations to pull varying data sets into a single view.

Social media
Twitter users are estimated to generate nearly 100,000 tweets a minute. This is in addition to 700,000 Facebook posts and more than 100,000 million e-mails a minute. (State Tech, 2013)

Driverless cars
Big data projects include using surrounding (big) data to get a car from A to B; this requires high-velocity data in real time.

High variety
Structured data is what we typically find in our traditional databases, eg customer relationship management records and statistics relating to financial transactions. Big data brings together structured data and the unstructured data available from the multiple sources we described under data volume above: social media conversations, photos, sensor data, video or voice recordings, etc (Marr, 2014; State Tech, 2014).

Retailers
Retailers combine data from social media such as Twitter with their own in-house data collected from point-of-sale terminals and loyalty cards to produce rich and detailed information for marketing.
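To make the retailer example more concrete, here is a minimal sketch of how structured and unstructured data might be brought into a single view. Everything in it is an assumption for illustration: the product names, sales figures and posts are invented, and simple keyword matching stands in for the far more sophisticated text analytics a real retailer would use.

```python
from collections import Counter

# Structured data: hypothetical point-of-sale rows with a fixed schema.
pos_sales = [
    {"product": "kettle", "units": 120},
    {"product": "toaster", "units": 45},
    {"product": "blender", "units": 80},
]

# Unstructured data: hypothetical free-text social media posts.
posts = [
    "Love my new kettle, boils in seconds!",
    "The toaster burnt my bread again...",
    "Kettle stopped working after a week",
    "Thinking about buying a blender",
]


def count_mentions(posts, products):
    """Count how many posts mention each product name (case-insensitive)."""
    mentions = Counter()
    for post in posts:
        # Split into words and strip surrounding punctuation.
        words = {word.strip(".,!?").lower() for word in post.split()}
        for product in products:
            if product in words:
                mentions[product] += 1
    return mentions


def combined_view(pos_sales, posts):
    """Attach a social media mention count to each point-of-sale row."""
    mentions = count_mentions(posts, [row["product"] for row in pos_sales])
    return [{**row, "mentions": mentions[row["product"]]} for row in pos_sales]


for row in combined_view(pos_sales, posts):
    print(row)
```

The design point is that the structured rows already have named fields, whereas the posts must first be reduced to something countable before the two sources can be joined on a shared key (here, the product name).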