Changing the Face of Database Cloud Services with Personalized Service Level Agreements Jennifer Ortiz, Victor Teixeira de Almeida, Magdalena Balazinska University of Washington, Computer Science and Engineering PETROBRAS S.A., Rio de Janerio, RJ, Brazil CIDR 2015 1
2
Many Data Management & Analytics Systems Available 3
Many Systems are Available as Cloud Services 4
Which Hadoop Version? Cloud Services Today Amazon EMR Pig or Hive? How many instances of the service? 5
Which Hadoop Version? Cloud Services Today Amazon EMR Pig or Hive? How many instances of the service? 6
Cloud Services Today BigQuery How long will my query take? 7
Cloud Services Can Do Better! System Internals 8
Cloud Services Can Do Better! System Internals 9
Cloud Services Can Do Better! Query: Query Capabilities Time Money SELECT FROM WHERE 10
A new proposal Time to Re-think the interface Hide details of cluster deployment and resources Show users monetary costs and performance estimates on their data Let users pick the desired trade-off between options shown Personalized Service Level Agreements 11
Tier 1: $0.10/hour Within 20 seconds: SELECT <up to 10 attributes> FROM <Fact Dimension> WHERE <up to 100% of data> Within 1 minute: SELECT <up to 5 attributes> FROM <JOIN Fact + 4 Dimensions> WHERE <up to 10% of data> A PSLA Example Fixed, hourly price Expected performance Templates capture capabilities Within 10 minutes: SELECT <up to 10 attributes> FROM <JOIN Fact + 8 Dimensions> WHERE <up to 100% of data> 12
Tier 1: $0.10/hour Within 20 seconds: SELECT <up to 10 attributes> FROM <Fact Dimension> WHERE <up to 100% of data> Within 1 minute: SELECT <up to 5 attributes> FROM <JOIN Fact + 4 Dimensions> WHERE <up to 10% of data> A PSLA Example Fixed, hourly price Expected performance Templates capture capabilities Tier 2: $0.50/hour Within 1 second: SELECT <up to 10 attributes> FROM <Fact Dimension> WHERE <up to 100% of data> Different tiers of service Within 10 minutes: SELECT <up to 10 attributes> FROM <JOIN Fact + 8 Dimensions> WHERE <up to 100% of data> 13
Goals Cloud C Database D 14
Goals Cloud C PSLA P Database D Money, Time, Capabilities 15
Goals Cloud C PSLAManager PSLA P Database D Money, Time, Capabilities 16
Goals Cloud C PSLAManager PSLA P Database D Money, Time, Capabilities 17
Example of a Real PSLA 18
TPC-H Star Schema Benchmark Based on TPC-H 10GB 19
Myria is a data management service in the cloud that we built at UW. It has a parallel, shared-nothing back-end query execution engine called MyriaX 20
PSLA for Myria 21
PSLA for Myria 22
PSLA for Myria 23
PSLA for Myria 24
PSLA for Myria 25
PSLAManager Cloud C PSLAManager PSLA P Database D 26
PSLAManager Workflow Service Perf. Modeling Perform Offline Perform Online Data Workload Generation Runtime Prediction Tier Selection Workload Compression into PSLA (repeat for each tier) Query Clustering Template Generation Dropping Queries with Similar Times PSLA 27
PSLAManager Workflow Service Perf. Modeling Perform Offline Perform Online Data Workload Generation Runtime Prediction Tier Selection Workload Compression into PSLA (repeat for each tier) Query Clustering Template Generation Dropping Queries with Similar Times PSLA 28
PSLAManager Workflow Service Perf. Modeling Perform Offline Perform Online Data Workload Generation Runtime Prediction Tier Selection Workload Compression into PSLA (repeat for each tier) Query Clustering Template Generation Dropping Queries with Similar Times PSLA 29
PSLAManager Workflow Service Perf. Modeling Perform Offline Perform Online Data Workload Generation Runtime Prediction Tier Selection Workload Compression into PSLA (repeat for each tier) Query Clustering Template Generation Dropping Queries with Similar Times PSLA 30
PSLAManager Workflow Service Perf. Modeling Perform Offline Perform Online Data Workload Generation Runtime Prediction Tier Selection Workload Compression into PSLA (repeat for each tier) Query Clustering Template Generation Dropping Queries with Similar Times PSLA 31
PSLAManager Workflow Service Perf. Modeling Perform Offline Perform Online Data Workload Generation Runtime Prediction Tier Selection Workload Compression into PSLA (repeat for each tier) Query Clustering Template Generation Dropping Queries with Similar Times PSLA 32
PSLAManager Workflow Service Perf. Modeling Perform Offline Perform Online Data Workload Generation Runtime Prediction Tier Selection Workload Compression into PSLA (repeat for each tier) Query Clustering Template Generation Dropping Queries with Similar Times PSLA 33
Query Workload Generation Which queries to generate? Joins drive performance Think about possible combinations of joins Consider: All possible 2-way joins Tables in Order by Size: Lineitem, Part, Customer, Supplier, Date (Lineitem Part) (Customer Date), (Lineitem Supplier), etc. Only consider most expensive queries Build toward more complex queries, include selections and projections 34
PSLAManager Workflow Service Perf. Modeling Perform Offline Perform Online Data Workload Generation Runtime Prediction Tier Selection Workload Compression into PSLA (repeat for each tier) Query Clustering Template Generation Dropping Queries with Similar Times PSLA 35
PSLAManager Workflow Service Perf. Modeling Perform Offline Perform Online Data Workload Generation Runtime Prediction Tier Selection Workload Compression into PSLA (repeat for each tier) Query Clustering Template Generation Dropping Queries with Similar Times PSLA 36
PSLAManager Workflow Service Perf. Modeling Perform Offline Perform Online Data Workload Generation Runtime Prediction Tier Selection Workload Compression into PSLA (repeat for each tier) Query Clustering Template Generation Dropping Queries with Similar Times PSLA 37
Seconds Seconds Tier Selection Runtime Distributions of Query Workload Per Configuration in Myria 350 300 250 200 EMD 17.43 150 100 50 EMD 7.07 EMD 13.74 0 0 2 4 6 8 10 12 14 16 18 Workers Workers (Configurations) 38
Tier Selection 39
PSLAManager Workflow Service Perf. Modeling Perform Offline Perform Online Data Workload Generation Runtime Prediction Tier Selection Workload Compression into PSLA (repeat for each tier) Query Clustering Template Generation Dropping Queries with Similar Times PSLA 40
Workload Compression STEP 1: Query Clustering Threshold-based Density-based 0 2 4 6 8 10 12 14 16 18 Workers 41
Workload Compression STEP 1: Query Clustering th 0 2 4 6 8 10 12 14 16 18 Workers 42
Workload Compression STEP 1: Query Clustering 0 2 4 6 8 10 12 14 16 18 Workers 43
Workload Compression STEP 1: Query Clustering Tier 1: $0.XX/hour Within Cluster Max Threshold: Query Templates in Cluster Within Cluster Max Threshold: Query Templates in Cluster 44
Time(s) Workload Compression STEP 2: Template Generation Queries Query Query Templates Dominance SELECT (5 ATT) FROM (5 TABLES) WHERE 10% SELECT (4 ATT) FROM (4 TABLES) WHERE 1% Configuration 45
Time(s) Workload Compression STEP 2: Template Generation Attributes Projected Query Dominance Tables SELECT (5 ATT) FROM (5 TABLES) WHERE 10% SELECT (4 ATT) FROM (4 TABLES) WHERE 1% Selectivity Configuration 46
Time(s) Workload Compression STEP 2: Template Generation Attributes Projected Query Dominance Tables SELECT (5 ATT) FROM (5 TABLES) WHERE 10% SELECT (4 ATT) FROM (4 TABLES) WHERE 1% Selectivity Given: Configuration 47
Time(s) Workload Compression STEP 2: Template Generation Configuration 48
Time(s) Workload Compression STEP 2: Template Generation Root Query Template: We call a query template a root query template if no other query template in the same cluster dominates it. Configuration 49
Time(s) Workload Compression STEP 3: Dropping Queries with Similar Times Configuration 1 50
Time(s) Workload Compression STEP 3: Dropping Queries with Similar Times Root Query Templates Queries Query Templates Configuration 1 Configuration 2 51
Time(s) Workload Compression STEP 3: Dropping Queries with Similar Times Configuration 1 Configuration 2 52
Time(s) Workload Compression STEP 3: Dropping Queries with Similar Times Configuration 1 Configuration 2 53
Time(s) Workload Compression STEP 3: Dropping Queries with Similar Times Configuration 1 Configuration 2 54
PSLA Quality Assessment 55
PSLA Quality Metrics PSLA Query Capabilities PSLA Complexity PSLA Performance Error Metric 56
Time(s) Time(s) Quality Metric Trade-offs High Complexity Low Error Low Complexity High Error Found that a log interval is best without tuning 57
PSLA Evaluation on Predicted Runtimes 58
PSLAManager Workflow Service Perf. Modeling Perform Offline Perform Online Data Workload Generation Runtime Prediction Tier Selection Workload Compression into PSLA (repeat for each tier) Query Clustering Template Generation Dropping Queries with Similar Times PSLA 59
PSLAManager Workflow Service Perf. Modeling Perform Offline Perform Online Data Workload Generation Runtime Prediction Tier Selection Workload Compression into PSLA (repeat for each tier) Query Clustering Template Generation Dropping Queries with Similar Times PSLA 60
Performance Model (Offline) Train model offline on other data and queries Cloud Configuration Query Feature Vector Est. Rows Est. IO Avg. Row runtime Query workload Predict runtime from query features Query Feature Vector Cloud Configuration Est. Rows Est. IO Avg. Row runtime Based on Predicting Multiple Metrics for Queries: Better Decisions enabled by Machine Learning [Ganapathi et. al. 2009] 61
Training Dataset Synthetic Dataset 10GB 6 Tables 61 Attributes 62
PSLAManager Workflow Service Perf. Modeling Perform Offline Perform Online Data Workload Generation Runtime Prediction Tier Selection Workload Compression into PSLA (repeat for each tier) Query Clustering Template Generation Dropping Queries with Similar Times PSLA 63
Predicted Myria PSLA (Predicted Runtimes) 64
Looking Forward Direct extensions to the approach Add support for indexes Improve time predictions Longer-term future work Can we guarantee the runtimes? Can we update the PSLA as user queries data Goal is to show increasingly more complex queries Usability testing 65
Conclusion Many cloud DBMSs exist Require users to reason about resources We propose to re-think that interface Personalized Service Level Agreements Service Tiers Price/Capabilities/Performance Important direction for cloud DBMSs 66