NoSQL for SQL Professionals William McKnight




NoSQL for SQL Professionals William McKnight Session Code BD03

About your Speaker, William McKnight
- President, McKnight Consulting Group
- Frequent keynote speaker and trainer internationally
- Consulted to Pfizer, Scotiabank, Teva Pharmaceuticals, Verizon, and many other Global 1000 companies
- A prolific writer with hundreds of articles, blogs and white papers in publication
- Focused on delivering business value and solving business problems utilizing proven, streamlined approaches to information management
- Former Fortune 50 Information Technology executive

No More

The No-Reference Architecture
[Diagram elements: data stream processing, graph data, Hadoop, data warehouse, data marts, users/reports, MDBS, RDBMS, legacy sources, data warehouse appliance, NoSQL, columnar databases, elements in the cloud, in-memory databases, master data, data integration, syndicated data]

The Relational Database Data Page
Page Header
Records
1120 Aris Doug Johnson Practice Director 206-676-5636 doug.johnson@aris.com
1121 Stolt Offshore MS Craig Lennox Mr +66 1226 71269 craig.lennox@stoltoffshore.com
1122 Medtronic, Inc. Mark Kohls Principle Database Administrator 763.516.2557 mark.kohls@medtronic.com
Row IDs
Page Footer
McKnight Consulting Group, 2010

What Does Big Data Mean?
- Data in NoSQL: "No SQL allowed" or "Not Only SQL"?
- Sensor, social and web data?
- Data in a system that does not support SQL?
- A system with petabytes?
- Hadoop?

Why the Sudden Explosion of Interest?
- An increased number and variety of data sources that generate large quantities of data
  - Sensors (e.g., location, RFID)
  - Social (e.g., Twitter, wikis)
  - Web clicks
- Realization that data was too valuable to delete
  - Even when there is little signal and lots of noise
- Dramatic decline in the cost of hardware, especially storage
  - If storage were still $100/GB, there would be no big data revolution underway

Why NoSQL for Big Data
- More data model flexibility
  - JSON as a data model (think XML)
  - No schema-first requirement; load first
- Faster time to insight from data acquisition
- Relaxed ACID
  - Eventual consistency
  - Willing to trade consistency for availability
  - ACID would crush things like storing clicks on Google
- Low upfront software costs
- Utilizes Java
- Full scans
- Programmers love the freedoms

Hadoop, MapReduce and Big Data
- Parallel programming framework
- Hadoop is an open source distributed file system (HDFS) plus MapReduce
- Hadoop is used by those facing web-scale data challenges

Scale Up vs. Scale Out
- Scale up: single, fixed-resource controller; growth through adding shelves
- Scale out: multiple controllers; processing power in each unit of disk

Who Uses Hadoop
- 40,000+ nodes running Hadoop
- Research for ad systems and web search
- Product search indexes
- Analytics from user sessions
- Log analysis for reporting and analytics and machine learning
- Log analysis, data mining, and machine learning
- Large scale image conversion
- High energy physics, genomics, Digital Sky Survey

ACID
- Atomicity: full transactions pass or fail
- Consistency: database in valid state after each transaction
- Isolation: transactions do not interfere with one another
- Durability: transactions remain committed no matter what (e.g., crashes)
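As a rough illustration of the atomicity and consistency properties above, here is a minimal sketch using Python's built-in sqlite3 module. The accounts table, the transfer helper and the balances are hypothetical and exist only for this example; they are not from the presentation.

```python
import sqlite3

def transfer(conn, src, dst, amount):
    """Move `amount` from src to dst as one all-or-nothing transaction."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE id = ?",
                (amount, dst),
            )
            # The CHECK constraint rejects an overdraft; the credit above
            # is then rolled back together with this failed debit.
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE id = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        return False

def balances(conn):
    return dict(conn.execute("SELECT id, balance FROM accounts"))

# Hypothetical in-memory database with two accounts.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts ("
    "id TEXT PRIMARY KEY, balance INTEGER CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("a", 100), ("b", 0)])
conn.commit()

ok = transfer(conn, "a", "b", 30)       # fits within the balance
after_ok = balances(conn)
failed = transfer(conn, "a", "b", 500)  # would overdraw account "a"
after_failed = balances(conn)
```

The second transfer credits one account before the debit fails, so the unchanged balances afterwards show that the partial work was undone rather than left half applied.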

What Gives the CIO Heartburn About NoSQL
- Developer skills
- Lack of ACID compliance
- Tools lacking and projects flawed
- Fast nature of unburdened projects
- Different developers
- Schema-less/lite models
- Lack of payback methodology

DFS Block Placement Example: write affinity to minimize cross-rack network traffic to tolerate switch failures
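The placement idea on this slide can be sketched in a few lines of Python. The sketch assumes a policy along the lines of: first replica on the writing node (write affinity), second and third replicas on two different nodes of one other rack, so a single switch failure cannot take out every copy and only one copy crosses racks. The function name, node names and rack layout are hypothetical, and real distributed file systems make this choice with more inputs (free space, load, configuration).

```python
def place_replicas(writer, nodes_by_rack):
    """Pick three nodes for one block under the assumed policy.

    writer        -- node issuing the write (gets the first replica)
    nodes_by_rack -- dict mapping rack name to a list of node names
    """
    rack_of = {
        node: rack for rack, nodes in nodes_by_rack.items() for node in nodes
    }
    local_rack = rack_of[writer]
    # Choose a different rack that can hold two replicas; raises
    # StopIteration if the hypothetical cluster has no such rack.
    remote_rack = next(
        rack
        for rack in sorted(nodes_by_rack)
        if rack != local_rack and len(nodes_by_rack[rack]) >= 2
    )
    second, third = sorted(nodes_by_rack[remote_rack])[:2]
    return [writer, second, third]

# Hypothetical two-rack cluster.
cluster = {"rack1": ["n1", "n2"], "rack2": ["n3", "n4"]}
replicas = place_replicas("n1", cluster)
racks_used = {
    rack for rack, nodes in cluster.items() for n in replicas if n in nodes
}
```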

File System Summary
- Highly scalable: 1000s of nodes and massive (100s of TB) files
- Large block sizes to maximize sequential I/O performance
- No use of mirroring or RAID, which reduces cost
- Use one mechanism (triply replicated blocks) to deal with a wide variety of failure types rather than multiple different mechanisms
Negatives
- Lack of control over record placement
- Makes it impossible to employ many optimizations successfully employed by parallel DB systems

MapReduce
1. Take a large problem and divide it into sub-problems
2. Perform the same function on all sub-problems
3. Combine the output

MapReduce (MR)
- Programming framework (library and runtime) for analyzing data sets stored in HDFS
- MapReduce jobs are composed of two functions: Map and Reduce
- User only writes the Map and Reduce functions
- MR framework provides all the glue and coordinates the execution of the Map and Reduce jobs on the cluster
  - Fault tolerant
  - Scalable
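The division of labor described here (the user writes Map and Reduce, the framework supplies the glue) can be sketched in plain Python. This is a single-process illustration only, not Hadoop's API: run_job stands in for the framework, and the function names and sample lines are hypothetical.

```python
from collections import defaultdict

def map_fn(line):
    """User-written Map: emit (word, 1) for every word in one input line."""
    for word in line.lower().split():
        yield word, 1

def reduce_fn(word, counts):
    """User-written Reduce: combine all values emitted for one key."""
    return word, sum(counts)

def run_job(records, mapper, reducer):
    """Stand-in for the framework glue: map, group by key, then reduce."""
    groups = defaultdict(list)
    for record in records:              # on a cluster, records are split across nodes
        for key, value in mapper(record):
            groups[key].append(value)   # the "shuffle": group values by key
    return dict(reducer(key, values) for key, values in groups.items())

lines = ["big data big clusters", "data in hdfs"]
counts = run_job(lines, map_fn, reduce_fn)
```

In a real framework the map calls run in parallel next to the data and the grouping step moves data across the network; only map_fn and reduce_fn would be the user's responsibility.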

A Quick Summary
Data Model
- Parallel DB systems: structured data with known schema
- NoSQL: any data will fit in any format, (un)(semi)structured
Hardware Configuration
- Parallel DB systems: purchased as an appliance
- NoSQL: user assembled from commodity machines
Fault Tolerance
- Parallel DB systems: failures assumed to be rare; no query-level fault tolerance
- NoSQL: failures assumed to be common; simple, yet efficient, fault tolerance
Where to do big data analytics?

Key-Value Stores
NoSQL OLTP
- A record may look like: "Book: Of Mice and Men": "Author: Steinbeck"
- Great for unstructured data centered on a single object.
- Typically used as a cache for data frequently requested by web applications such as online shopping carts or social media sites.
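The caching use described here can be sketched with a toy in-memory store. The KeyValueStore class, the cart key format and the load_cart_from_database function are hypothetical stand-ins, not the API of any particular product.

```python
class KeyValueStore:
    """Toy key-value store: opaque values looked up by a single key."""

    def __init__(self):
        self._data = {}

    def put(self, key, value):
        self._data[key] = value

    def get(self, key, default=None):
        return self._data.get(key, default)

stats = {"db_calls": 0}

def load_cart_from_database(user_id):
    """Hypothetical slow lookup in the system of record."""
    stats["db_calls"] += 1
    return {"user": user_id, "items": ["Of Mice and Men"]}

def get_cart(cache, user_id):
    """Read through the cache: hit the database only on a miss."""
    key = f"cart:{user_id}"
    cart = cache.get(key)
    if cart is None:
        cart = load_cart_from_database(user_id)
        cache.put(key, cart)
    return cart

cache = KeyValueStore()
first = get_cart(cache, 42)   # miss: loads from the database
second = get_cart(cache, 42)  # hit: served from the key-value store
```

The store never inspects the value; everything is addressed by one key, which is what makes this model simple to partition and fast for repeated lookups.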

Document Stores
- A record may look like: id => 12345, name => Jane, age => 22, address => (number => 123, street => Main)
- Often deployed for web-traffic analysis, social gaming, content stores, user-behavior/action analysis, or log-file analysis in real time.
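The record on this slide maps naturally onto a nested dictionary or JSON document. The sketch below shows that mapping and a simple lookup on a nested field; the find helper, the dotted-path syntax and the second document are hypothetical and only illustrate that documents in one collection need not share a schema.

```python
import json

_MISSING = object()  # sentinel for "field not present in this document"

def find(collection, path, value):
    """Return documents whose field at a dotted path equals `value`."""
    matches = []
    for doc in collection:
        node = doc
        for part in path.split("."):
            if isinstance(node, dict) and part in node:
                node = node[part]
            else:
                node = _MISSING
                break
        if node is not _MISSING and node == value:
            matches.append(doc)
    return matches

# The slide's record as a document.
jane = {
    "id": 12345,
    "name": "Jane",
    "age": 22,
    "address": {"number": 123, "street": "Main"},
}
# Hypothetical second document with a different shape (no address).
other = {"id": 12346, "name": "Raj", "tags": ["gamer"]}

collection = [jane, other]
on_main = find(collection, "address.street", "Main")
round_trip = json.loads(json.dumps(jane))  # documents serialize to JSON as-is
```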

Graph Stores: Emphasizing Relationships as Primary Data
- Based on graph theory
- Vertices (nodes), edges (relations) and properties
- Navigating social networks, configurations and recommendations
  - e.g., get the cheapest flights from DFW to SYD leaving on 7/12/13 with a minimum number of stops and each stop less than 2 hours
  - e.g., social networks: churn and offer management
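A simplified version of the flight query can be sketched as a traversal over vertices and priced edges. The routes and prices below are invented for illustration, and the sketch ignores departure dates and layover times; a graph database would express this as a query rather than hand-written traversal code.

```python
def cheapest_route(graph, src, dst, max_stops):
    """Return (cost, path) of the cheapest route using at most max_stops
    intermediate airports, or None if no such route exists."""
    best = None
    stack = [(src, [src], 0)]
    while stack:
        node, path, cost = stack.pop()
        if node == dst:
            if best is None or cost < best[0]:
                best = (cost, path)
            continue
        # Flying on from here makes every airport after src on this path
        # an intermediate stop, i.e. len(path) - 1 stops.
        if len(path) - 1 > max_stops:
            continue
        for nxt, price in graph[node].items():
            if nxt not in path:  # never revisit an airport
                stack.append((nxt, path + [nxt], cost + price))
    return best

# Hypothetical property graph: airports as vertices, fares as edge properties.
flights = {
    "DFW": {"LAX": 150, "HNL": 400, "SYD": 1400},
    "LAX": {"SYD": 900},
    "HNL": {"SYD": 600},
    "SYD": {},
}

nonstop = cheapest_route(flights, "DFW", "SYD", max_stops=0)
one_stop = cheapest_route(flights, "DFW", "SYD", max_stops=1)
```

With these invented fares, allowing one stop finds a cheaper route than the nonstop flight, which is the kind of relationship-driven answer the slide has in mind.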

Operational Big Data Platform Selection
[Chart: Key-Value, Document, Column Store, SQL and Graph platforms positioned by Data Size and Workload Complexity]

The NoSQL Challenge

There's No Technology Silver Bullet
Source: eBay, "eBay Extreme Analytics in a Virtual World", Nov 10, 2010

Enablers for NoSQL

Data Integration
- Increasingly, data first lands in the unstructured universe
- NoSQL stores are big data "EL" tools
- The need for data integration with the enterprise

Data Virtualization
Enterprise Data Virtualization: enterprise-wide data fabric providing consistent and timely access to all structured and semi-structured data
[Diagram sources: Data Warehouses, Marts & Cubes, Operational Data Stores, Transactional Sources, File Systems, Big Data]
2011 Composite Software, Inc. / Composite Proprietary

Infrastructure Strategy, including Cloud
The benefits of cloud computing are:
- On-demand and self service
- Broad network access
- Resource pooling
- Rapid elasticity
- Measured service
Source: Cloud Security and Privacy: An Enterprise Perspective on Risks and Compliance (Mather, Kumaraswamy & Latif)

What Will Motivate IT to Adopt NoSQL?
- Continuation of big vendor legacy seen as too expensive
- Scaling: data > 1 machine
- Schema flexibility mandatory
- Requirements to keep multiple years of highly detailed data
- Tired of losing deals to more agile hybrid IT organizations
- NoSQL tool marketplace innovations
