Vertica Live Aggregate Projections

Similar documents

ENHANCEMENTS TO SQL SERVER COLUMN STORES. Anuhya Mallempati #

How To Scale Out Of A Nosql Database

Big Data Data-intensive Computing Methods, Tools, and Applications (CMSC 34900)

Session 1: IT Infrastructure Security Vertica / Hadoop Integration and Analytic Capabilities for Federal Big Data Challenges

AGENDA. What is BIG DATA? What is Hadoop? Why Microsoft? The Microsoft BIG DATA story. Our BIG DATA Roadmap. Hadoop PDW

Integrating Big Data into the Computing Curricula

Using distributed technologies to analyze Big Data

Hadoop and Relational Database The Best of Both Worlds for Analytics Greg Battas Hewlett Packard

How To Use Hp Vertica Ondemand

HP Vertica. Echtzeit-Analyse extremer Datenmengen und Einbindung von Hadoop. Helmut Schmitt Sales Manager DACH

HP Vertica and MicroStrategy 10: a functional overview including recommendations for performance optimization. Presented by: Ritika Rahate

Columnstore in SQL Server 2016

Managing Big Data with Hadoop & Vertica. A look at integration between the Cloudera distribution for Hadoop and the Vertica Analytic Database

Hadoop: A Framework for Data- Intensive Distributed Computing. CS561-Spring 2012 WPI, Mohamed Y. Eltabakh

SEIZE THE DATA SEIZE THE DATA. 2015

Architectures for Big Data Analytics A database perspective

INTRODUCING DRUID: FAST AD-HOC QUERIES ON BIG DATA MICHAEL DRISCOLL - CEO ERIC TSCHETTER - LEAD METAMARKETS

low-level storage structures e.g. partitions underpinning the warehouse logical table structures

Graph Mining on Big Data System. Presented by Hefu Chai, Rui Zhang, Jian Fang

MapReduce. MapReduce and SQL Injections. CS 3200 Final Lecture. Introduction. MapReduce. Programming Model. Example

CS54100: Database Systems

Apache Kylin Introduction Dec 8,

Innovative technology for big data analytics

In-Memory Data Management for Enterprise Applications

5 Signs You Might Be Outgrowing Your MySQL Data Warehouse*

Introduction to Hadoop HDFS and Ecosystems. Slides credits: Cloudera Academic Partners Program & Prof. De Liu, MSBA 6330 Harvesting Big Data

BIG DATA CAN DRIVE THE BUSINESS AND IT TO EVOLVE AND ADAPT RALPH KIMBALL BUSSUM 2014

Inge Os Sales Consulting Manager Oracle Norway

SAP HANA SAP s In-Memory Database. Dr. Martin Kittel, SAP HANA Development January 16, 2013

Lecture Data Warehouse Systems

A REVIEW PAPER ON THE HADOOP DISTRIBUTED FILE SYSTEM

SAP HANA PLATFORM Top Ten Questions for Choosing In-Memory Databases. Start Here

A Comparison of Approaches to Large-Scale Data Analysis

The Vertica Analytic Database Technical Overview White Paper. A DBMS Architecture Optimized for Next-Generation Data Warehousing

Optimizing Performance. Training Division New Delhi

HBase A Comprehensive Introduction. James Chin, Zikai Wang Monday, March 14, 2011 CS 227 (Topics in Database Management) CIT 367

CSE-E5430 Scalable Cloud Computing Lecture 2

Amazon Redshift & Amazon DynamoDB Michael Hanisch, Amazon Web Services Erez Hadas-Sonnenschein, clipkit GmbH Witali Stohler, clipkit GmbH

Preview of Oracle Database 12c In-Memory Option. Copyright 2013, Oracle and/or its affiliates. All rights reserved.

Trafodion Operational SQL-on-Hadoop

Unified Big Data Processing with Apache Spark. Matei

Architecting for Big Data Analytics and Beyond: A New Framework for Business Intelligence and Data Warehousing

NoSQL for SQL Professionals William McKnight

Cloud Computing at Google. Architecture

Lecture 10 - Functional programming: Hadoop and MapReduce

W H I T E P A P E R. Deriving Intelligence from Large Data Using Hadoop and Applying Analytics. Abstract

Parallel Databases. Parallel Architectures. Parallelism Terminology 1/4/2015. Increase performance by performing operations in parallel

WINDOWS AZURE DATA MANAGEMENT AND BUSINESS ANALYTICS

How To Handle Big Data With A Data Scientist

NoSQL. Thomas Neumann 1 / 22

Parquet. Columnar storage for the people

How to Choose Between Hadoop, NoSQL and RDBMS

CitusDB Architecture for Real-Time Big Data

One-Size-Fits-All: A DBMS Idea Whose Time has Come and Gone. Michael Stonebraker December, 2008

Can the Elephants Handle the NoSQL Onslaught?

Big Systems, Big Data

Hadoop for MySQL DBAs. Copyright 2011 Cloudera. All rights reserved. Not to be reproduced without prior written consent.

Hadoop IST 734 SS CHUNG

NoSQL and Hadoop Technologies On Oracle Cloud

Hadoop s Entry into the Traditional Analytical DBMS Market. Daniel Abadi Yale University August 3 rd, 2010

Navigating the Big Data infrastructure layer Helena Schwenk

Practical Cassandra. Vitalii

SQL Server 2012 Performance White Paper

Oracle Big Data, In-memory, and Exadata - One Database Engine to Rule Them All Dr.-Ing. Holger Friedrich

SQL Server Business Intelligence on HP ProLiant DL785 Server

Big Data and Its Impact on the Data Warehousing Architecture

Oracle Database In-Memory The Next Big Thing

Business Intelligence and Column-Oriented Databases

Conjugating data mood and tenses: Simple past, infinite present, fast continuous, simpler imperative, conditional future perfect

2015 Ironside Group, Inc. 2

A survey of big data architectures for handling massive data

Hadoop Ecosystem B Y R A H I M A.

Hadoop and Map-Reduce. Swati Gore

Splice Machine: SQL-on-Hadoop Evaluation Guide

Database Internals (Overview)

Oracle InMemory Database

Energy Efficient MapReduce

Big Data Analytics in LinkedIn. Danielle Aring & William Merritt

PostgreSQL Business Intelligence & Performance Simon Riggs CTO, 2ndQuadrant PostgreSQL Major Contributor

In Memory Accelerator for MongoDB

IBM Data Retrieval Technologies: RDBMS, BLU, IBM Netezza, and Hadoop

An Approach to Implement Map Reduce with NoSQL Databases

Petabyte Scale Data at Facebook. Dhruba Borthakur, Engineer at Facebook, SIGMOD, New York, June 2013

Optimizing the Performance of the Oracle BI Applications using Oracle Datawarehousing Features and Oracle DAC

Performance Verbesserung von SAP BW mit SQL Server Columnstore

Relational Database Basics Review

Who am I? Copyright 2014, Oracle and/or its affiliates. All rights reserved. 3

Big Data, Fast Data, Complex Data. Jans Aasman Franz Inc

I/O Considerations in Big Data Analytics

Spark in Action. Fast Big Data Analytics using Scala. Matei Zaharia. project.org. University of California, Berkeley UC BERKELEY

Unit Storage Structures 1. Storage Structures. Unit 4.3

Alternatives to HIVE SQL in Hadoop File Structure

Cost-Effective Business Intelligence with Red Hat and Open Source

Transcription:

Vertica Live Aggregate Projections Modern Materialized Views for Big Data Nga Tran - HPE Vertica - Nga.Tran@hpe.com November 2015

Outline What is Big Data? How Vertica provides Big Data Solutions? What are Materialized Views (MVs)? What are the Roles of MVs in Big Data? How does Vertica implement their MVs? Why Vertica names their MVs Live Aggregare Projections (LAPs)? What are the differences between LAPS & MVs?

What is Big Data? 3

What is Big Data? What you have and what you want? 4

What is Big Data? What you have and what you want? You have a lot of data. 5

What is Big Data? What you have and what you want? You have a lot of data. You want to mine your data for treasures 6

What is Big Data? What you have and what you want? You have a lot of data. You want to mine your data for treasures You want sub-second mining processes. Why? 7

What is Big Data? What you have and what you want? You have a lot of data. You want to mine your data for treasures You want sub-second mining processes. Why? Treasures no more after that time. 8

What is Big Data? What you have and what you want? You have a lot of data. You want to mine your data for treasures You want sub-second mining processes. Why? Treasures no more after that time. Treasures help with urgent decisions. 9

What is Big Data? Recap: What you have and what you want? 10

What is Big Data? Recap: What you have and what you want? HAVE: a lot of data. WANT: dig them for treasures in sub-second 11

What is Big Data? How you make it? 12

What is Big Data? How you make it? Places to store and access your data safely & efficiently. 13

What is Big Data? How you make it? Places to store and access your data safely & efficiently. Tools to analyze them quickly. 14

What is Big Data? How you make it? Places to store and access your data safely & efficiently. [Columnar] Database Tools to analyze them quickly. Analytic Tools 15

What is Big Data? How you make it? Places to store and access your data safely & efficiently. [Columnar] Database Tools to analyze them quickly. Analytic Tools [Vertica] Analytic Databases 16

Outline What is Big Data? How Vertica provides Big Data Solutions? What are Materialized Views (MVs)? What are the Roles of MVs in Big Data? How does Vertica implement their MVs? Why Vertica names their MVs Live Aggregare Projections (LAPs)? What are the differences between LAPS & MVs?

How Vertica provides Big Data Solutions? 18

How Vertica provides Big Data Solutions? Same answers of What is Vertica Analytic Database? 19

What is Vertica Analytic Database? Columnar and MPP RDBMS and More 20

Columnar and MPP RDBMS and More Column Store Conventional databases store a heap file with all records in row order Column stores store each column separately, so you only read what you need SELECT id, game2_score FROM scores; id name game1_score game2_score profile_txt id name game1_score game2_score profile_txt 1 Michael 1120 42 <long string> 1 Michael 1120 42 <big long string> 2 Christopher 9001 17 <long string> 2 Christopher 9001 17 <big long string> 3 Matthew 4041 17 <long string> 3 Matthew 4041 17 <big long string> 4 Jessica 7218 42 <long string> 4 Jessica 7218 42 <big long string> 5 Ashley 3364 42 <long string> 5 Ashley 3364 42 <big long string> 21 Names courtesy of US Social Security Administration. Top 5 US baby names in 1994, in descending order of popularity.

Columnar and MPP RDBMS and More Sorted Column Store Want the highest values? Trivial if it s sorted. SELECT id, game2_score FROM scores ORDER BY game2_score DESC LIMIT 3; id 1 4 5 2 3 name Michael Jessica Ashley Christopher Matthew game1_score 1120 7218 3364 9001 4041 game2_score 42 42 42 17 17 profile_txt <big long string> <big long string> <big long string> <big long string> <big long string> Joining two tables? If they re sorted on the join key, just line up the rows. SELECT scores.name, MAX(sessions.last_login_time) FROM scores, sessions WHERE scores.id = sessions.user_id GROUP BY scores.id, scores.name; name Michael Christopher Matthew Jessica Ashley id 1 2 3 4 5 user_id 1 1 1 2 2 last_login_time 7:32 PM 5:18 PM 4:59 PM 3:02 PM 1:03 PM 22 Multi-column sort for the win!

Columnar and MPP RDBMS and More Compressed Sorted Column Store Sorted fields often have lots of duplicate values. Let s de-duplicate! user_id user_id 1 1 1 x3 2 x2 - run_count of 3 - run_count of 2 Saves lots of disk space, disk IO Values don t need to be identical They will be similar; many compression schemes work well Can compute directly on compressed data for more performance COUNT(column) secretly rewritten as SUM(run_count) SUM(column) secretly rewritten as SUM(run_count * value) 1 2 2 23

Columnar and MPP RDBMS and More Distributed Compressed Sorted Column Store We re already lining up join keys Let s segment the data across machines Enforce that matching values are always on the same node Means we can JOIN in parallel Node_01 SELECT * FROM scores, sessions; Node_02 name id user_id last_login_time name id user_id last_login_time Michael 1 1 x3 7:32 PM Christopher 2 2 x2 3:02 PM 5:18 PM 1:03 PM 4:59 PM 24

Columnar and MPP RDBMS and More ACID-Compliant Distributed Compressed Sorted Column Store The world is moving away from straight Map/Reduce; back towards transactional systems Google ( F1: A Distributed SQL Database That Scales ; VLDB 2013) Yahoo ( Peta-Scale Data Warehousing at Yahoo! ; Sigmod 2009) Consistency is really hard! If you leave it to each application developer, rather than guaranteeing it always, you get bugs. name id user_id last_login_time name id user_id last_login_time Michael 1 1 x3 7:32 PM?? Christopher 2 2 x2 3:02 PM 5:18 PM 1:03 PM 4:59 PM 25

Columnar and MPP RDBMS and More ACID-Compliant Distributed Compressed Sorted Column Store The problem: Data being loaded and queried on all nodes by many users all at once Must provide a consistent view of the data Either you see the whole change (on all machines) or none of it Must scale linearly (more nodes + more data == same performance) Node_01 Node_02 name id user_id last_login_time name id user_id last_login_time Michael 1 1 x3 7:32 PM Christopher 2 2 x2 3:02 PM 5:18 PM 1:03 PM 4:59 PM 26

Columnar and MPP RDBMS and More ACID-Compliant Distributed Compressed Sorted Column Store The solution: Distributed Agreement Spread (open-source library) UPDATE SELECT INSERT Send carefully-designed control messages around a ring of nodes Ring enforces order Arbitrary/unordered processing between messages Node_01 Node_02 name Michael id 1 user_id 1 x3 last_login_time 7:32 PM 5:18 PM name Christopher id 2 user_id 2 x2 last_login_time 3:02 PM 1:03 PM 4:59 PM 27 Is the above diagram really accurate? No. If you want to know more, come interview with us!

Columnar and MPP RDBMS and More UDx TM DBD a = true WHEN b > 10 UPDATE SELECT INSERT? Query Optimizer Node_01 Node_02 name Michael id 1 user_id 1 x3 last_login_time 7:32 PM 5:18 PM name Christopher id 2 user_id 2 x2 last_login_time 3:02 PM 1:03 PM 4:59 PM 28

Columnar and MPP RDBMS and More Query Optimizer Generating Optimal Query Plan SQL Query Query Optimizer Query Plan Statistics & Histogram 29

Columnar and MPP RDBMS and More Database Designer Designing Optimal Projection Set for an available Row-Stored Schema Tables & Constraints Data Set Queries DBD Query Performance Storage Footprint Projection Set Fault Tolerance Recovery Scenario: Workload + Space Budget 30

Columnar and MPP RDBMS and More Tuple Mover Combining small storage containers into larger ones Storage Containers TM Fewer but larger Storage Containers 31

Columnar and MPP RDBMS and More Analytic Queries and Functions Built-in Analytic Queries and Functions Time Series: evaluate the values of a given set of variables over time and group those values into a window 32 SELECT symbol, slice_time, TS_FIRST_VALUE(bid1) AS first_bid FROM Tickstore WHERE symbol IN ('MSFT', 'IBM') TIMESERIES slice_time AS '5 seconds' OVER (PARTITION BY symbol ORDER BY ts) Event Series Pattern Matching SELECT uid, sid, ts, refurl, pageurl, action, event_name(), pattern_id(), match_id() FROM clickstream_log MATCH (PARTITION BY uid, sid ORDER BY ts DEFINE Entry AS RefURL NOT ILIKE '%website2.com%' AND PageURL ILIKE '%website2.com%', PATTERN Onsite AS PageURL ILIKE '%website2.com%' AND Action='V', Purchase AS PageURL ILIKE P AS (Entry Onsite* Purchase) ROWS MATCH FIRST EVENT); '%website2.com%' AND Action = 'P'

Columnar and MPP RDBMS and More Analytic Queries and Functions User-Defined Functions, Extensions, Transforms Vertica can run R, C++ and Java extensions internally. We ve even enabled parallel/distributed execution within R itself distributedr Extensive collaboration with HP Labs and various universities Presto: Distributed Machine Learning and Graph Processing with Sparse Matrices (Eurosys 2013) Can use Vertica as a data store and preprocessor 33

Columnar and MPP RDBMS and More TM R HDFS UDxDBD a = true WHEN b > 10 UPDATE SELECT INSERT? Query Optimizer LAPs Text Search Flex (Unstructured) BI Tools Web App Node_01 Node_02 name Michael id 1 user_id 1 x3 last_login_time 7:32 PM 5:18 PM name Christopher id 2 user_id 2 x2 last_login_time 3:02 PM 1:03 PM 4:59 PM 34

This talk is about LAPs implementation TM R HDFS Node_01 UDxDBD a = true WHEN b > 10 UPDATE SELECT INSERT? Query Optimizer LAPs Text Search Flex (Unstructured) Node_02 BI Tools Web App name Michael id 1 user_id 1 x3 last_login_time 7:32 PM 5:18 PM name Christopher id 2 user_id 2 x2 last_login_time 3:02 PM 1:03 PM 4:59 PM 35

What are LAPs? LAPs = Live Aggregate Projections ~ Materialized Views in other Databases 36

Outline What is Big Data? How Vertica provides Big Data Solutions? What are Materialized Views (MVs)? What are the Roles of MVs in Big Data? How does Vertica implement their MVs? Why Vertica names their MVs Live Aggregare Projections (LAPs)? What are the differences between LAPS & MVs?

What are Materialized Views (MVs)? Redundant Data Storages for Performance Purposes 38

What are Materialized Views (MVs)? Redundant Data Storages for Performance Purposes Example 1: Avoid Expensive Joins SalesData Emp DateOfSale Amount Nga 2013-12-12 15K Natalya 2013-12-30 7K SELECT Emp, DateOfSale, Amount, Location FROM SalesData, Employee WHERE Emp = EmpID; Jaimin 2014-01-10 3K Nga 2014-01-20 11K Jaimin 2014-01-30 10K Employee EmpID Location... Jaimin Boston... Natalya New York Nga Chicago... 39

What are Materialized Views (MVs)? Redundant Data Storages for Performance Purposes Example 1: Avoid Expensive Joins SalesData Emp DateOfSale Amount Nga 2013-12-12 15K Natalya 2013-12-30 7K Jaimin 2014-01-10 3K Nga 2014-01-20 11K SELECT Emp, DateOfSale, Amount, Location FROM SalesData, Employee WHERE Emp = EmpID; Normal Way: Run above query and get the results. Jaimin 2014-01-30 10K Employee EmpID Location... Jaimin Boston... Natalya New York Nga Chicago... 40

What are Materialized Views (MVs)? Redundant Data Storages for Performance Purposes Example 1: Avoid Expensive Joins SalesData Emp DateOfSale Amount Nga 2013-12-12 15K Natalya 2013-12-30 7K Jaimin 2014-01-10 3K Nga 2014-01-20 11K Jaimin 2014-01-30 10K SELECT Emp, DateOfSale, Amount, Location FROM SalesData, Employee WHERE Emp = EmpID; Normal Way: Run above query and get the results. If many joins and a lot of data, the query can be slow Employee EmpID Location... Jaimin Boston... Natalya New York Nga Chicago... 41

What are Materialized Views (MVs)? Redundant Data Storages for Performance Purposes Example 1: Avoid Expensive Joins SalesData Emp DateOfSale Amount Nga 2013-12-12 15K Natalya 2013-12-30 7K Jaimin 2014-01-10 3K Nga 2014-01-20 11K Jaimin 2014-01-30 10K Employee EmpID Location... SELECT Emp, DateOfSale, Amount, Location FROM SalesData, Employee WHERE Emp = EmpID; Normal Way: Run above query and get the results. If many joins and a lot of data, the query can be slow > For better performance, Pre-join data and insert them into its own table, Sales Jaimin Boston... Natalya New York Nga Chicago... 42

What are Materialized Views (MVs)? Redundant Data Storages for Performance Purposes Example 1: Avoid Expensive Joins SalesData Emp DateOfSale Amount Nga 2013-12-12 15K Natalya 2013-12-30 7K Jaimin 2014-01-10 3K Nga 2014-01-20 11K Jaimin 2014-01-30 10K Employee EmpID Location... Jaimin Boston... Sales Emp DateOfSale Amount Location Nga 2013-12-12 15K Chicago Natalya 2013-12-30 7K New York Jaimin 2014-01-10 3K Boston Nga 2014-01-20 11K Chicago Jaimin 2014-01-30 10K Boston The query before becomes simpler and run faster This simplified query is generated implicitly SELECT * from Sales; Natalya New York Nga Chicago... 43

What are Materialized Views (MVs)? Redundant Data Storages for Performance Purposes Example 1: Avoid Expensive Joins SalesData Emp DateOfSale Amount Nga 2013-12-12 15K Natalya 2013-12-30 7K Jaimin 2014-01-10 3K Nga 2014-01-20 11K Jaimin 2014-01-30 10K Employee EmpID Location... Jaimin Boston... Natalya New York Nga Chicago... Sales Emp DateOfSale Amount Location Nga 2013-12-12 15K Chicago Natalya 2013-12-30 7K New York Jaimin 2014-01-10 3K Boston Nga 2014-01-20 11K Chicago Jaimin 2014-01-30 10K Boston The query before becomes simpler and run faster This simplified query is generated implicitly SELECT * from Sales; Sales is a Materialized View 44

What are Materialized Views (MVs)? Redundant Data Storages for Performance Purposes Example 2: Pre-aggregate Data SalesData SELECT year(dateofsale) year, emp, Emp DateOfSale Amount sum(amount) sum, max(amount) max FROM SalesData Nga 2013-12-12 15K GROUP BY year, emp Natalya 2013-12-30 7K ORDER BY year, emp; Jaimin 2014-01-10 3K Nga 2014-01-20 11K Jaimin 2014-01-30 10K Jaimin 2014-02-15 18K Nga 2014-02-16 5K Nga 2014-02-25 9K Results Year Emp Sum Max 2013 Natalya 7K 7K 2013 Nga 15K 15K 2014 Jaimin 31K 18K 2014 Nga 25K 11K 45

What are Materialized Views (MVs)? Redundant Data Storages for Performance Purposes Example 2: Pre-aggregate Data SalesData Emp DateOfSale Amount Nga 2013-12-12 15K Natalya 2013-12-30 7K Jaimin 2014-01-10 3K Nga 2014-01-20 11K Jaimin 2014-01-30 10K Jaimin 2014-02-15 18K Nga 2014-02-16 5K Nga 2014-02-25 9K For better performance, data is pre-aggregated and inserted to SalesSummary Year Emp Sum Max 2013 Natalya 7K 7K 2013 Nga 15K 15K 2014 Jaimin 31K 18K 2014 Nga 25K 11K The query before becomes simpler and run faster This simplified query is generated implicitly SELECT * from SalesSummary; 46

What are Materialized Views (MVs)? Redundant Data Storages for Performance Purposes Example 2: Pre-aggregate Data SalesData Emp DateOfSale Amount Nga 2013-12-12 15K Natalya 2013-12-30 7K Jaimin 2014-01-10 3K Nga 2014-01-20 11K Jaimin 2014-01-30 10K Jaimin 2014-02-15 18K Nga 2014-02-16 5K Nga 2014-02-25 9K For better performance, data is pre-aggregated and inserted to SalesSummary Year Emp Sum Max 2013 Natalya 7K 7K 2013 Nga 15K 15K 2014 Jaimin 31K 18K 2014 Nga 25K 11K The query before becomes simpler and run faster This simplified query is generated implicitly SELECT * from SalesSummary; SalesSummary is a Materialized View 47

Outline What is Big Data? How Vertica provides Big Data Solutions? What are Materialized Views (MVs)? What are the Roles of MVs in Big Data? How does Vertica implement their MVs? Why Vertica names their MVs Live Aggregare Projections (LAPs)? What are the differences between LAPS & MVs?

What are the Roles of MVs in Big Data? 49

What are the Roles of MVs in Big Data? Remember this about Big Data? HAVE: a lot of data. WANT: dig them for treasures in sub-second 50

What are the Roles of MVs in Big Data? Remember this about Big Data? HAVE: a lot of data. WANT: dig them for treasures in sub-second MVs helps bring a lot of data to a small set of data Dig that small set of data can happen in sub-second 51

What are the Roles of MVs in Big Data? Remember this about Big Data? HAVE: a lot of data. WANT: dig them for treasures in sub-second MVs helps bring a lot of data to a small set of data Dig that small set of data can happen in sub-second Vertica is already fast. MVs will bring it to an even better level 52

Outline What is Big Data? How Vertica provides Big Data Solutions? What are Materialized Views (MVs)? What are the Roles of MVs in Big Data? How does Vertica implement their MVs? Why Vertica names their MVs Live Aggregare Projections (LAPs)? What are the differences between LAPS & MVs?

How Vertica Implements their MVs? Answer these 2 questions will answer the above question 1. Why Vertica names their MVs Live Aggregate Projections (LAPs)? 2. What are the differences between LAPs and Mvs? 54

How Vertica Implements their MVs? Answer these 2 questions will answer the above question 1. Why Vertica names their MVs Live Aggregate Projections (LAPs)? 2. What are the differences between LAPs and MVs? The answers of question #2 will light up the answer of question 1. 55

How Vertica Implements their MVs? Answer these 2 questions will answer the above question 1. Why Vertica names their MVs Live Aggregate Projections (LAPs)? 2. What are the differences between LAPs and MVs? The answers of question 2 will light up the answer of question 1. So What are the differences between LAPs and MVs? 56

Differences between LAPs and MVs? 57

Differences between LAPs and MVs? Fully vs Partially Aggregated MVs Fully aggregated OR out-of-date LAPs Partially aggregated per load on each partition. Always up-to-date. Year Emp Sum Max 2013 Natalya 7K 7K 2013 Nga 15K 15K 2014 Jaimin 31K 18K 2014 Nga 25K 11K Partitioned by MONTH(DateOfSale) a Year Emp Sum Max 2013-12 2013 Natalya 7K 7K 2013 Nga 15K 15K a Year Emp Sum Max 2014-01 2014 Jaimin 13K 10K 2014 Nga 11K 11K a Year Emp Sum Max 2014-02 2014 Jaimin 18K 18K a 2014 Nga 14K 9K 58

Differences between LAPs and MVs? Fully vs Partially Aggregated Different Insert Strategy MVs Compute/Recompute MVs data after INSERT LAPs Compute LAP's during INSERT Base Table MVs Base Table LAPs GroupBy Data Recompute/ Maintenance Data 59

Fully vs Partially Aggregated Different Maintenance Cost MVs LAPs Differences between LAPs and MVs? Compute Delta Recompute affected data for all related MVs to bring them up-to-date before they are used Do nothing 60

Differences between LAPs and MVs? Fully vs Partially Aggregated Different Currency Control MVs Locking needed during MV maintenance Heavy duty system LAPs Do nothing since data of new load will be in their own container 61

Differences between LAPs and MVs? Fully vs Partially Aggregated Different Currency Control MVs Locking needed during MV maintenance Heavy duty system LAPs Do nothing since data of new load will be in their own container Look back at Insert Plans Base Table MVs Base Table LAPs 62 Data Recompute/ Maintenance The recompute/maintenance step is not cheap and BaseTable has to be locked during that time Data GroupBy

Differences between LAPs and MVs? Fully vs Partially Aggregated Different Select Query Plan MVs Read MV LAPs Read and Combine partially aggregated data Data Data GroupBy Scan Scan MVs LAPs 63

Differences between LAPs and MVs? Fully vs Partially Aggregated Different Background Maintenance MVs Possible if MVs are not used right after INSERT LAPs Use Tuple Mover (TM) to combine partially aggregated data from different loads GroupBy Scan LAPs 64

Differences between LAPs and MVs? Fully vs Partially Aggregated Different Delete Handling MVs Recompute MVs for affected groups LAPs No delete single row but drop a whole partition Do nothing 65

Differences between LAPs and MVs? Summary MVs Fully aggregated but not always up-to-date Expensive maintenance process Heavy-duty Concurrency Control Select data from MVs is fast Background Maintenance not work since MVs must be up-to-date LAPs Partially aggregated but always up-to-date No maintenance needed No Concurrency Control needed on LAPs Select data from LAPs also fast but need to combine data from different partitions & loads. Background Maintenance by TM 66

Differences between LAPs and MVs? Summary MVs Fully aggregated but not always up-to-date Expensive maintenance process Heavy-duty Concurrency Control Select data from MVs is fast Background Maintenance not work since MVs must be up-to-date LAPs Partially aggregated but always up-to-date No maintenance needed No Concurrency Control needed on LAPs Select data from LAPs also fast but need to combine data from different partitions & loads. Background Maintenance by TM So why name Live Aggregate Projections? 67

Differences between LAPs and MVs? Summary MVs Fully aggregated but not always up-to-date Expensive maintenance process Heavy-duty Concurrency Control Select data from MVs is fast Background Maintenance not work since MVs must be up-to-date LAPs Partially aggregated but always up-to-date No maintenance needed No Concurrency Control needed on LAPs Select data from LAPs also fast but need to combine data from different partitions & loads. Background Maintenance by TM So why name Live Aggregate Projections? 68

Differences between LAPs and MVs? Summary MVs Fully aggregated but not always up-to-date Expensive maintenance process Heavy-duty Concurrency Control Select data from MVs is fast Background Maintenance not work since MVs must be up-to-date LAPs Partially aggregated but always up-to-date No maintenance needed No Concurrency Control needed on LAPs Select data from LAPs also fast but need to combine data from different partitions & loads. Background Maintenance by TM Vertica's physical storages are Projections So why name Live Aggregate Projections? 69

Differences between LAPs and MVs? Summary MVs Fully aggregated but not always up-to-date Expensive maintenance process Heavy-duty Concurrency Control Select data from MVs is fast Background Maintenance not work since MVs must be up-to-date LAPs Partially aggregated but always up-to-date No maintenance needed No Concurrency Control needed on LAPs Select data from LAPs also fast but need to combine data from different partitions & loads. Background Maintenance by TM Vertica's physical storages are Projections Data is pre-aggregated So why name Live Aggregate Projections? 70

Questions? 71