What is Big Data? BCS Aberdeen Branch 6 November 2014



Similar documents
How To Handle Big Data With A Data Scientist

Transforming the Telecoms Business using Big Data and Analytics

Turning Big Data into Big Decisions Delivering on the High Demand for Data

How To Scale Out Of A Nosql Database

The emergence of big data technology and analytics

International Journal of Advanced Engineering Research and Applications (IJAERA) ISSN: Vol. 1, Issue 6, October Big Data and Hadoop

Data Modeling for Big Data

The 3 questions to ask yourself about BIG DATA

Applications for Big Data Analytics

Big Systems, Big Data

SQL VS. NO-SQL. Adapted Slides from Dr. Jennifer Widom from Stanford

Evaluating NoSQL for Enterprise Applications. Dirk Bartels VP Strategy & Marketing

Are You Ready for Big Data?

NoSQL systems: introduction and data models. Riccardo Torlone Università Roma Tre

Architecting for Big Data Analytics and Beyond: A New Framework for Business Intelligence and Data Warehousing

Integrating a Big Data Platform into Government:

REAL-TIME BIG DATA ANALYTICS

Keywords Big Data; OODBMS; RDBMS; hadoop; EDM; learning analytics, data abundance.

Big Data: Tools and Technologies in Big Data

NoSQL Databases. Nikos Parlavantzas

Testing 3Vs (Volume, Variety and Velocity) of Big Data

Architecting for the Internet of Things & Big Data

CS 378 Big Data Programming

Are You Ready for Big Data?

Where is... How do I get to...

The Internet of Things and Big Data: Intro

So What s the Big Deal?

Information Builders Mission & Value Proposition

Introduction to NoSQL Databases. Tore Risch Information Technology Uppsala University

Big Data Management in the Clouds. Alexandru Costan IRISA / INSA Rennes (KerData team)

InfiniteGraph: The Distributed Graph Database

Industry Impact of Big Data in the Cloud: An IBM Perspective

Customized Report- Big Data

Big Data. What is Big Data? Over the past years. Big Data. Big Data: Introduction and Applications

Enterprise Operational SQL on Hadoop Trafodion Overview

ANALYTICS BUILT FOR INTERNET OF THINGS

NoSQL for SQL Professionals William McKnight

The InterNational Committee for Information Technology Standards INCITS Big Data

BIG DATA IN BUSINESS ENVIRONMENT

Big Data: Opportunities & Challenges, Myths & Truths 資 料 來 源 : 台 大 廖 世 偉 教 授 課 程 資 料

Lecture Data Warehouse Systems

BIG DATA What it is and how to use?

Big Data Analytics Nokia

AGENDA. What is BIG DATA? What is Hadoop? Why Microsoft? The Microsoft BIG DATA story. Our BIG DATA Roadmap. Hadoop PDW

Datenverwaltung im Wandel - Building an Enterprise Data Hub with

Managing Cloud Server with Big Data for Small, Medium Enterprises: Issues and Challenges

Exploiting Data at Rest and Data in Motion with a Big Data Platform

Chapter 7. Using Hadoop Cluster and MapReduce

How to use Big Data in Industry 4.0 implementations. LAURI ILISON, PhD Head of Big Data and Machine Learning

Big Data and Industrial Internet

BIG DATA Alignment of Supply & Demand Nuria de Lama Representative of Atos Research &

Surfing the Data Tsunami: A New Paradigm for Big Data Processing and Analytics

Oracle s Big Data solutions. Roger Wullschleger. <Insert Picture Here>

Hadoop implementation of MapReduce computational model. Ján Vaňo

Spatio-Temporal Networks:

Native Connectivity to Big Data Sources in MicroStrategy 10. Presented by: Raja Ganapathy

Data-intensive HPC: opportunities and challenges. Patrick Valduriez

How To Use Big Data For Telco (For A Telco)

BIRT in the World of Big Data

What happens when Big Data and Master Data come together?

Composite Data Virtualization Composite Data Virtualization And NOSQL Data Stores

Executive Summary... 2 Introduction Defining Big Data The Importance of Big Data... 4 Building a Big Data Platform...

Survey of Big Data Architecture and Framework from the Industry

Choosing The Right Big Data Tools For The Job A Polyglot Approach

Session 1: IT Infrastructure Security Vertica / Hadoop Integration and Analytic Capabilities for Federal Big Data Challenges

CISC 432/CMPE 432/CISC 832 Advanced Database Systems

Cloud Big Data Architectures

Hortonworks & SAS. Analytics everywhere. Page 1. Hortonworks Inc All Rights Reserved

Deploying Big Data to the Cloud: Roadmap for Success

Parallel Data Warehouse

Associate Professor, Department of CSE, Shri Vishnu Engineering College for Women, Andhra Pradesh, India 2

A Survey on Big Data Concepts and Tools

NoSQL Databases. Institute of Computer Science Databases and Information Systems (DBIS) DB 2, WS 2014/2015

How To Improve Performance In A Database

Scalable ecommerce with NoSQL. Dipali Trivedi

Infrastructures for big data

BIG DATA TECHNOLOGY. Hadoop Ecosystem

Keywords Big Data, NoSQL, Relational Databases, Decision Making using Big Data, Hadoop

Big Data and Analytics in Government

Big Data Technologies Compared June 2014

Big Data. White Paper. Big Data Executive Overview WP-BD Jafar Shunnar & Dan Raver. Page 1 Last Updated

An Oracle White Paper June Oracle: Big Data for the Enterprise

Large scale processing using Hadoop. Ján Vaňo

extensible record stores document stores key-value stores Rick Cattel s clustering from Scalable SQL and NoSQL Data Stores SIGMOD Record, 2010

Reference Architecture, Requirements, Gaps, Roles

Addressing Open Source Big Data, Hadoop, and MapReduce limitations

How To Use Big Data Effectively

Getting Real Real Time Data Integration Patterns and Architectures

5 Keys to Unlocking the Big Data Analytics Puzzle. Anurag Tandon Director, Product Marketing March 26, 2014

Big Data a threat or a chance?

Big Data, Why All the Buzz? (Abridged) Anita Luthra, February 20, 2014

Databases 2 (VU) ( )

Transcription:

What is Big Data? BCS Aberdeen Branch 6 November 2014

Keith Gordon Soldier Teacher Data Manager Engineer Information Systems Professional Standards Expert Big Data Sceptic

What they say The overeager adoption of big data is likely to result in catastrophes of analysis comparable to a national epidemic of collapsing bridges. Despite recent claims to the contrary, we are no further along with computer vision than we were with physics when Isaac Newton sat under his apple tree. Michael I. Jordan Pehong Chen Distinguished Professor University of California, Berkeley.

Agenda Defining Big Data Storing Big Data Using Big Data Final thoughts

Towards a definition It is only information 5

Towards a definition The IET Seminar on Big Data: Turning big challenges into big opportunities I am confused!!

Towards a definition The 3 Vs Semistructured & Unstructured Batch Structured Big Data Streaming Data Zettabytes Terabytes Volume

Towards a definition PC ERP E-business Volume Transac -tion Data Velocity GPS RFID Video surveillance Interacti -ve data Sensor data Variety Weibo Social network forum China Electronics Standardization Institute

Towards a definition Adding to the Vs Volume where the amount of data is sufficiently large so as to require special considerations Variety where the data consists of multiple types of data potentially from multiple sources; this variety can be combinations of structured data, semistructured data and unstructured data Velocity where the data is produced at high rates and operating on stale data is not valuable Value where the data has perceived or quantifiable benefit to the enterprise or organization using it Veracity where the correctness of the data can be assessed Variability where the change in velocity is important 9

We have a definition! Big Data is a data set(s) with characteristics (e.g. volume, velocity, variety, variability, veracity, etc.) that for a particular problem domain at a given point in time cannot be efficiently processed using current/existing/established/ traditional technologies and techniques in order to extract value. This definition distinguishes Big Data from business intelligence and traditional transactional processing while alluding to a broad spectrum of applications that includes them. The ultimate goal of processing Big Data is to derive differentiated value that can be trusted (because the underlying data can be trusted). ISO/IEC JTC1 Study Group on Big Data 10

Agenda Defining Big Data Storing Big Data Using Big Big Data Final thoughts

SQL databases Based on the relational model of data - Grounded in set theory - Mature and well understood - Concurrency support and transaction support - Fixed schema - Now object-relational But, there are problems - Impedance mismatch - Do not scale well - Poor at semi-structured and unstructured data - Cannot handle new datatypes 12

NoSQL databases Not only SQL or Never SQL Schema-less Scalable (work well in clusters) Four basic categories - Key-value stores - Document stores - Wide column stores - Graph databases 13

Key-value stores abcdef27 ghjk5l88 Name: Arvind Age: 20 Type: Student Likes: play cricket Studies: History Name: Nitin Age: 25 Type: Student Likes: play cricket, play cards

Document stores Customer Records Document First_Name: Andrew Last_Name: Brust Address: Street_Address: 123 Main St City: New York State: NY Zip: 10014 Orders: Most_Recent_Order: 252 Order Records Document Id: 252 Total_Price: 300 USD Item_1: 56432 Item_2: 98726 Document First_Name: Napoleon Last_Name: Bonaparte Address: Street_Address: 29, Rue de Rivoli City: Paris Postal_Code: 75007 Country: France Orders: Most_Recent_Order: 265 Document Id: 265 Total_Price: 2,500 EUR Item_1: 86413 Item_2: 77904

Wide-column stores 7b976c48 name: Bill Waterson state: DC birth_date: 1953 7c8f33e2 name: Howard Tayler state: UT birth_date: 1968 7dd2a3630 name: Randall Monroe state: PA 7da30d76 name: Dave Kellett state: CA

Graph databases Id: 2 Name: Bob Age: 22 Id: 1 Name: Alice Age: 18 Id: 3 Type: Group Name: Chess

Performance comparison Data Model Performance Scalability Flexibility Complexity Relational database variable variable low moderate Key-value store high high high none Document store high variable high low Wide-column store high high moderate low Graph database variable variable high high

Size and complexity Data size Key-value store Wide-column store Document store Graph database Data complexity

Agenda Defining Big Data Storing Big Data Using Big Data Final thoughts

Initial players 21

Following closely 22

Use cases key-value stores Suitable for: - Session information - User profiles, preferences - Shopping basket/trolley information Not suitable for: - Relationships among data - Multi-operation transactions - Query by data 23

Use cases document stores Suitable for: - Event logging - Content management systems - Real-time analytics - E-commerce applications Not suitable for: - Complex transactions - Queries against varying aggregates 24

Use cases wide-column stores Suitable for: - Event-logging - Content management systems - Counters (particularly in web applications) Not suitable for: - Where ACID transactions are needed 25

Use cases graph databases Suitable for: - Connected data (friends, employees, etc) - Routing, despatch, location-based services - Recommendations Not suitable for: - Multiple updates 26

Changes in the data life cycle The relational paradigm Collection Big Data Volume use case Collection Big Data Velocity use case Collection Preparation Storage Preparation Storage Preparation Analysis Analysis Analysis Action Action Action Storage

MapReduce Google s greatest lasting contribution to computer science 327 Reduce 154 44 129 Reduce Reduce Reduce 35 18 101 Map Map Map 32 0 12 Map Map Map 12 98 19 Map Map Map aaa: 35 bbb: 18 ccc: 101 ddd: 0 eee: 32 fff: 12 ggg: 12 hhh: 98 jjj: 19 Node A Node B Node C 28

Agenda Defining Big Data Storing Big Data Using Big Data Final thoughts

Final thought #1 From Redmond E., Wilson J.R. Seven Databases in Seven Weeks 30

Final thought #2

Final thought #3 Do not try to solve a problem you do not have! 32

Agenda Defining Big Data Storing Big Data Using Big Data Final thoughts

What is Big Data? BCS Aberdeen Branch 6 November 2014