adaptive loading adaptive indexing dbtouch Ideas for Big Data Exploration Stratos Idreos CWI, INS-, Amsterdam
data is everywhere
daily data years
daily data years Eric Schmidt: Every two days we create as much data as much we did from dawn of humanity to 00
- exabytes daily data years 0 Eric Schmidt: Every two days we create as much data as much we did from dawn of humanity to 00
not always sure what we are looking for (until we find it)
not always sure what we are looking for (until we find it) data exploration
database systems great... declarative processing, back-end to numerous apps
database systems great... declarative processing, back-end to numerous apps but databases are too heavy and blind!
database systems great... declarative processing, back-end to numerous apps but databases are too heavy and blind! timeline
database systems great... declarative processing, back-end to numerous apps but databases are too heavy and blind! load timeline
database systems great... declarative processing, back-end to numerous apps but databases are too heavy and blind! load index timeline
database systems great... declarative processing, back-end to numerous apps but databases are too heavy and blind! load index query timeline
expert users - idle time - workload knowledge database systems great... declarative processing, back-end to numerous apps but databases are too heavy and blind! load index query timeline
systems tailored for data exploration
no workload knowledge systems tailored for data exploration
no workload knowledge no installation steps systems tailored for data exploration
no workload knowledge no installation steps quick response times systems tailored for data exploration
no workload knowledge no installation steps quick response times minimize data-to-query time systems tailored for data exploration
ideas adaptive indexing ACM SIGMOD Jim Gray Diss. Award - Cor Baayen Award years, papers
ideas adaptive indexing ACM SIGMOD Jim Gray Diss. Award - Cor Baayen Award years, papers adaptive loading In use in Hadapt and Tableau years, papers
ideas adaptive indexing ACM SIGMOD Jim Gray Diss. Award - Cor Baayen Award years, papers adaptive loading In use in Hadapt and Tableau years, papers dbtouch VENI 0 year, paper
ideas adaptive indexing ACM SIGMOD Jim Gray Diss. Award - Cor Baayen Award years, papers adaptive loading In use in Hadapt and Tableau years, papers dbtouch VENI 0 year, paper Martin Kersten, Stefan Manegold, Felix Halim, Panagiotis Karras, Roland Yap, Goetz Graefe, Harumi Kuno, Eleni Petraki, Anastasia Ailamaki, Ioannis Alagiannis, Renata Borovica, Miguel Branco, Ryan Johnson, Erietta Liarou
indexing load index query tune= create proper indexes offline performance 0-00X
indexing load index query tune= create proper indexes offline performance 0-00X but it depends on workload! which indices to build? on which data parts? and when to build them?
load index query
load index query timeline
load index query sample workload timeline
load index query sample workload analyze timeline
load index query sample workload analyze create indices timeline
load index query sample workload analyze create indices query timeline
load index query sample workload analyze create indices query timeline complex and time consuming process
load index query human administrators + auto-tuning tools sample workload analyze create indices query timeline complex and time consuming process
what can go wrong?
what can go wrong? not enough idle time to finish proper tuning
what can go wrong? not enough idle time to finish proper tuning by the time we finish tuning, the workload changes
database cracking
database cracking remove all tuning steps and need for human input incremental, adaptive, partial indexing
initialization indexing database cracking remove all tuning steps and need for human input incremental, adaptive, partial indexing
initialization query indexing database cracking remove all tuning steps and need for human input incremental, adaptive, partial indexing
initialization query indexing database cracking remove all tuning steps and need for human input incremental, adaptive, partial indexing
initialization query indexing database cracking remove all tuning steps and need for human input incremental, adaptive, partial indexing every query is treated as an advice on how data should be stored
Database Cracking CIDR 00 cracking example Column A Cracker column of A Cracker column of A Q: select * from R where R.A > 0 and R.A < Q: select * from R where R.A > and R.A <= Q (copy) Piece : A <= 0 Q (in place) Piece : 0 < A < Piece : <= A Piece : A <= Piece : < A <= 0 Piece : 0 < A < Piece : <= A <= Piece 5: < A
Database Cracking CIDR 00 cracking example Column A Cracker column of A Cracker column of A Q: select * from R where R.A > 0 and R.A < Q: select * from R where R.A > and R.A <= Q (copy) Piece : A <= 0 Q (in place) Piece : 0 < A < Piece : <= A Piece : A <= Piece : < A <= 0 Piece : 0 < A < Piece : <= A <= Piece 5: < A
Database Cracking CIDR 00 cracking example Column A Cracker column of A Cracker column of A Q: select * from R where R.A > 0 and R.A < Q: select * from R where R.A > and R.A <= Q (copy) Piece : A <= 0 Q (in place) Piece : 0 < A < Piece : <= A Piece : A <= Piece : < A <= 0 Piece : 0 < A < Piece : <= A <= Piece 5: < A
Database Cracking CIDR 00 cracking example Column A Cracker column of A Cracker column of A Q: select * from R where R.A > 0 and R.A < Q: select * from R where R.A > and R.A <= Q (copy) Piece : A <= 0 Q (in place) Piece : 0 < A < Piece : <= A Piece : A <= Piece : < A <= 0 Piece : 0 < A < Piece : <= A <= Piece 5: < A
Database Cracking CIDR 00 cracking example Column A Cracker column of A Cracker column of A Q: select * from R where R.A > 0 and R.A < Q: select * from R where R.A > and R.A <= Q (copy) Piece : A <= 0 Q (in place) Piece : 0 < A < Piece : <= A Piece : A <= Piece : < A <= 0 Piece : 0 < A < Piece : <= A <= Piece 5: < A
Database Cracking CIDR 00 cracking example Column A Cracker column of A Cracker column of A Q: select * from R where R.A > 0 and R.A < Q: select * from R where R.A > and R.A <= Q (copy) Piece : A <= 0 Q (in place) Piece : 0 < A < Piece : <= A Piece : A <= Piece : < A <= 0 Piece : 0 < A < Piece : <= A <= Piece 5: < A
Database Cracking CIDR 00 cracking example Column A Cracker column of A Cracker column of A Q: select * from R where R.A > 0 and R.A < Q: select * from R where R.A > and R.A <= Q (copy) Piece : A <= 0 Q (in place) Piece : 0 < A < Piece : <= A Piece : A <= Piece : < A <= 0 Piece : 0 < A < Piece : <= A <= Piece 5: < A
Database Cracking CIDR 00 cracking example Column A Cracker column of A Cracker column of A Q: select * from R where R.A > 0 and R.A < Q: select * from R where R.A > and R.A <= Q (copy) Piece : A <= 0 Q (in place) Piece : 0 < A < Piece : <= A Piece : A <= Piece : < A <= 0 Piece : 0 < A < Piece : <= A <= Piece 5: < A
Database Cracking CIDR 00 cracking example Column A Cracker column of A Cracker column of A Q: select * from R where R.A > 0 and R.A < Q: select * from R where R.A > and R.A <= Q (copy) Piece : A <= 0 Q (in place) Piece : 0 < A < Piece : <= A Piece : A <= Piece : < A <= 0 Piece : 0 < A < Piece : <= A <= Piece 5: < A
Database Cracking CIDR 00 cracking example Column A Cracker column of A Cracker column of A Q: select * from R where R.A > 0 and R.A < Q: select * from R where R.A > and R.A <= Q (copy) Piece : A <= 0 Q (in place) Piece : 0 < A < Piece : <= A Result tuples Piece : A <= Piece : < A <= 0 Piece : 0 < A < Piece : <= A <= Piece 5: < A
Database Cracking CIDR 00 cracking example Column A Cracker column of A Cracker column of A Q: select * from R where R.A > 0 and R.A < Q: select * from R where R.A > and R.A <= Q (copy) Piece : A <= 0 Q (in place) Piece : 0 < A < Piece : <= A Result tuples Piece : A <= Piece : < A <= 0 Piece : 0 < A < Piece : <= A <= Piece 5: < A Dynamically/on-the-fly within the select-operator
Database Cracking CIDR 00 cracking example gain knowledge on how data is organized Column A Cracker column of A Cracker column of A Q: select * from R where R.A > 0 and R.A < Q: select * from R where R.A > and R.A <= Q (copy) Piece : A <= 0 Q (in place) Piece : 0 < A < Piece : <= A Result tuples Piece : A <= Piece : < A <= 0 Piece : 0 < A < Piece : <= A <= Piece 5: < A Dynamically/on-the-fly within the select-operator
Database Cracking CIDR 00 cracking example Column A Cracker column of A Cracker column of A Q: select * from R where R.A > 0 and R.A < Q: select * from R where R.A > and R.A <= Q (copy) Piece : A <= 0 Q (in place) Piece : 0 < A < Piece : <= A Piece : A <= Piece : < A <= 0 Piece : 0 < A < Piece : <= A <= Piece 5: < A Dynamically/on-the-fly within the select-operator
Database Cracking CIDR 00 cracking example Column A Cracker column of A Cracker column of A Q: select * from R where R.A > 0 and R.A < Q: select * from R where R.A > and R.A <= Q (copy) Piece : A <= 0 Q (in place) Piece : 0 < A < Piece : <= A Piece : A <= Piece : < A <= 0 Piece : 0 < A < Piece : <= A <= Piece 5: < A Dynamically/on-the-fly within the select-operator
Database Cracking CIDR 00 cracking example Column A Cracker column of A Cracker column of A Q: select * from R where R.A > 0 and R.A < Q: select * from R where R.A > and R.A <= Q (copy) Piece : A <= 0 Q (in place) Piece : 0 < A < Piece : <= A Piece : A <= Piece : < A <= 0 Piece : 0 < A < Piece : <= A <= Piece 5: < A Dynamically/on-the-fly within the select-operator
Database Cracking CIDR 00 cracking example Column A Cracker column of A Cracker column of A Q: select * from R where R.A > 0 and R.A < Q: select * from R where R.A > and R.A <= Q (copy) Piece : A <= 0 Q (in place) Piece : 0 < A < Piece : <= A Piece : A <= Piece : < A <= 0 Piece : 0 < A < Piece : <= A <= Piece 5: < A Dynamically/on-the-fly within the select-operator
Database Cracking CIDR 00 cracking example Column A Cracker column of A Cracker column of A Q: select * from R where R.A > 0 and R.A < Q: select * from R where R.A > and R.A <= Q (copy) Piece : A <= 0 Q (in place) Piece : 0 < A < Piece : <= A Piece : A <= Piece : < A <= 0 Piece : 0 < A < Piece : <= A <= Piece 5: < A Dynamically/on-the-fly within the select-operator
Database Cracking CIDR 00 cracking example Column A Cracker column of A Cracker column of A Q: select * from R where R.A > 0 and R.A < Q: select * from R where R.A > and R.A <= Q (copy) Piece : A <= 0 Q (in place) Piece : 0 < A < Piece : <= A Piece : A <= Piece : < A <= 0 Piece : 0 < A < Piece : <= A <= Piece 5: < A Dynamically/on-the-fly within the select-operator
Database Cracking CIDR 00 cracking example Column A Cracker column of A Cracker column of A Q: select * from R where R.A > 0 and R.A < Q: select * from R where R.A > and R.A <= Q (copy) Piece : A <= 0 Q (in place) Piece : 0 < A < Piece : <= A Piece : A <= Piece : < A <= 0 Piece : 0 < A < Piece : <= A <= Piece 5: < A Dynamically/on-the-fly within the select-operator
Database Cracking CIDR 00 cracking example Column A Cracker column of A Cracker column of A Q: select * from R where R.A > 0 and R.A < Q: select * from R where R.A > and R.A <= Q (copy) Piece : A <= 0 Q (in place) Piece : 0 < A < Piece : <= A Piece : A <= Piece : < A <= 0 Piece : 0 < A < Piece : <= A <= Piece 5: < A Dynamically/on-the-fly within the select-operator
Database Cracking CIDR 00 cracking example Column A Cracker column of A Cracker column of A Q: select * from R where R.A > 0 and R.A < Q: select * from R where R.A > and R.A <= Q (copy) Piece : A <= 0 Q (in place) Piece : 0 < A < Piece : <= A Piece : A <= Piece : < A <= 0 Piece : 0 < A < Piece : <= A <= Piece 5: < A Dynamically/on-the-fly within the select-operator
Database Cracking CIDR 00 cracking example Column A Cracker column of A Cracker column of A Q: select * from R where R.A > 0 and R.A < Q: select * from R where R.A > and R.A <= Q (copy) Piece : A <= 0 Q (in place) Piece : 0 < A < Piece : <= A Piece : A <= Piece : < A <= 0 Piece : 0 < A < Piece : <= A <= Piece 5: < A Result tuples Dynamically/on-the-fly within the select-operator
Database Cracking CIDR 00 cracking example the more we crack, the more we learn Column A Cracker column of A Cracker column of A Q: select * from R where R.A > 0 and R.A < Q: select * from R where R.A > and R.A <= Q (copy) Piece : A <= 0 Q (in place) Piece : 0 < A < Piece : <= A Piece : A <= Piece : < A <= 0 Piece : 0 < A < Piece : <= A <= Piece 5: < A Result tuples Dynamically/on-the-fly within the select-operator
Database Cracking CIDR 00 cracking example the more we crack, the more we learn Q: select * from R where R.A > 0 and R.A < Q: select * from R where R.A > and R.A <= Column A Q (copy) Cracker column of A Piece : A <= 0 Q (in place) Piece : 0 < A < Piece : <= A Cracker column of A every query is treated as an advice on how data should be stored Piece : A <= Piece : < A <= 0 Piece : 0 < A < Piece : <= A <= Piece 5: < A Result tuples Dynamically/on-the-fly within the select-operator
Sideways Cracking, SIGMOD 0 TPC-H 0000 MonetDB Presorted Sel. Crack Sid. Crack MySQL Presorted Response time (milli secs) 000 0 00 50 00 0 TPC-H Query 5 50 00 0 0 5 0 5 0 5 0 Query sequence
Sideways Cracking, SIGMOD 0 TPC-H 0000 MonetDB Presorted Sel. Crack Sid. Crack MySQL Presorted Response time (milli secs) 000 0 00 50 00 50 0 TPC-H Query 5 Normal MonetDB selection cracking 00 0 0 5 0 5 0 5 0 Query sequence
Sideways Cracking, SIGMOD 0 TPC-H 0000 MonetDB Presorted Sel. Crack Sid. Crack MySQL Presorted Response time (milli secs) 000 0 00 50 00 50 0 TPC-H Query 5 Normal MonetDB selection cracking Presorted MonetDB Preparation cost - minutes 00 0 0 5 0 5 0 5 0 Query sequence
Sideways Cracking, SIGMOD 0 TPC-H 0000 MonetDB Presorted Sel. Crack Sid. Crack MySQL Presorted Response time (milli secs) Presorted MonetDB Preparation cost - minutes 000 0 00 50 00 50 00 0 0 TPC-H Query 5 0 5 0 5 0 5 0 Query sequence Normal MonetDB selection cracking MonetDB with sideways cracking
Sideways Cracking, SIGMOD 0 TPC-H 0000 MonetDB Presorted Sel. Crack Sid. Crack MySQL Presorted Response time (milli secs) Presorted MonetDB Preparation cost - minutes 000 0 00 50 00 50 00 0 0 TPC-H Query 5 0 5 0 5 0 5 0 Query sequence Normal MonetDB selection cracking MonetDB with sideways cracking
Sideways Cracking, SIGMOD 0 TPC-H 0000 MonetDB Presorted Sel. Crack Sid. Crack MySQL Presorted Response time (milli secs) Presorted MonetDB Preparation cost - minutes 000 0 00 50 00 50 00 0 0 TPC-H Query 5 0 5 0 5 0 5 0 Query sequence Normal MonetDB selection cracking MonetDB with sideways cracking
Sideways Cracking, SIGMOD 0 TPC-H 0000 MonetDB Presorted Sel. Crack Sid. Crack MySQL Presorted Response time (milli secs) Presorted MonetDB Preparation cost - minutes 000 0 00 50 00 50 00 0 0 TPC-H Query 5 0 5 0 5 0 5 0 Query sequence Normal MonetDB selection cracking MonetDB with sideways cracking
Stochastic Cracking, PVLDB skyserver experiments cracking answers 0.000 queries while full indexing is still half way creating the index
cracking databases updates (SIGMOD0) multiple columns (SIGMOD0) storage restrictions (SIGMOD0) benchmarking (TPCTC0) algorithms (PVLDB) concurrency control (PVLDB) robustness (PVLDB)
cracking databases updates (SIGMOD0) multiple columns (SIGMOD0) storage restrictions (SIGMOD0) benchmarking (TPCTC0) joins disk based multi-dimensional holistic indexing optimization algorithms (PVLDB) row-stores multi-cores concurrency control (PVLDB) robustness (PVLDB) aggregations vectorwised
loading load index query
loading load index query copy data inside the database database now has full control
loading load index query copy data inside the database database now has full control slow process...not all data might be needed all the time
database vs. unix tools file, attributes, billion tuples DB Awk single query cost (secs) 0 550,00,50,00
database vs. unix tools file, attributes, billion tuples DB Awk single query cost (secs) 0 550,00,50,00 break down db cost Loading Query Processing % %
database vs. unix tools file, attributes, billion tuples DB Awk single query cost (secs) 0 550,00,50,00 break down db cost Loading Query Processing % loading is a major bottleneck %
database vs. unix tools file, attributes, billion tuples DB Awk single query cost (secs) 0 550,00,50,00 break down db cost Loading Query Processing % loading is a major bottleneck %... but writing/maintaining scripts is hard too
adaptive loading load/touch only what is needed and only when it is needed
but raw data access is expensive tokenizing - parsing - no indexing - no statistics
but raw data access is expensive tokenizing - parsing - no indexing - no statistics challenge: fast raw data access
query plan
query plan scan
query plan scan db
query plan scan db
query plan scan zero loading cost
query plan scan files access raw data adaptively on-the-fly zero loading cost
query plan scan files cache access raw data adaptively on-the-fly zero loading cost
selective parsing - file indexing file splitting - online statistics query plan scan files cache access raw data adaptively on-the-fly zero loading cost
NoDB SIGMOD 0 Execution Time (sec) 00 00 000 00 00 00 ~500 ~500 Q0 Q Q Q Q Q5 Q Q Q Q Q0 Q Q Q Q Q5 Q Q Q Q Load 00 0 MySQL CSV Engine MySQL DBMS X DBMS X w/ external files PostgreSQL PostgresRaw PM + C 0
NoDB SIGMOD 0 Execution Time (sec) 00 00 000 00 00 00 ~500 ~500 Q0 Q Q Q Q Q5 Q Q Q Q Q0 Q Q Q Q Q5 Q Q Q Q Load 00 0 MySQL CSV Engine MySQL DBMS X DBMS X w/ external files PostgreSQL PostgresRaw PM + C 0
NoDB SIGMOD 0 Execution Time (sec) 00 00 000 00 00 00 ~500 ~500 Q0 Q Q Q Q Q5 Q Q Q Q Q0 Q Q Q Q Q5 Q Q Q Q Load 00 0 MySQL CSV Engine MySQL DBMS X DBMS X w/ external files PostgreSQL PostgresRaw PM + C 0
NoDB SIGMOD 0 Execution Time (sec) 00 00 000 00 00 00 ~500 ~500 Q0 Q Q Q Q Q5 Q Q Q Q Q0 Q Q Q Q Q5 Q Q Q Q Load 00 0 MySQL CSV Engine MySQL DBMS X DBMS X w/ external files PostgreSQL PostgresRaw PM + C 0
NoDB SIGMOD 0 Execution Time (sec) 00 00 000 00 00 00 ~500 ~500 Q0 Q Q Q Q Q5 Q Q Q Q Q0 Q Q Q Q Q5 Q Q Q Q Load 00 0 MySQL CSV Engine MySQL DBMS X DBMS X w/ external files PostgreSQL PostgresRaw PM + C 0
NoDB SIGMOD 0 Execution Time (sec) 00 00 000 00 00 00 ~500 ~500 Q0 Q Q Q Q Q5 Q Q Q Q Q0 Q Q Q Q Q5 Q Q Q Q Load 00 0 MySQL CSV Engine MySQL DBMS X DBMS X w/ external files PostgreSQL PostgresRaw PM + C 0
NoDB SIGMOD 0 Execution Time (sec) 00 00 000 00 00 00 ~500 ~500 Q0 Q Q Q Q Q5 Q Q Q Q Q0 Q Q Q Q Q5 Q Q Q Q Load 00 0 MySQL CSV Engine MySQL DBMS X DBMS X w/ external files PostgreSQL PostgresRaw PM + C 0
NoDB SIGMOD 0 Execution Time (sec) 00 00 000 00 00 00 ~500 ~500 Q0 Q Q Q Q Q5 Q Q Q Q Q0 Q Q Q Q Q5 Q Q Q Q Load 00 0 MySQL CSV Engine MySQL DBMS X DBMS X w/ external files PostgreSQL PostgresRaw PM + C reducing data-to-query time 0
querying load index query
querying load index query declarative SQL interface
querying load index query declarative SQL interface correct and complete answers
querying load index query declarative SQL interface correct and complete answers
querying complex and slow - not fit for exploration load index query declarative SQL interface correct and complete answers
dbtouch dbtouch CIDR 0
dbtouch dbtouch CIDR 0 data
dbtouch dbtouch CIDR 0 data query
dbtouch dbtouch CIDR 0 no expert knowledge - no set-up costs interactive mode fit for exploration data query
dbtouch CIDR 0 challenge design database systems tailored for touch exploration
dbtouch CIDR 0 challenge design database systems tailored for touch exploration user drives query processing actions
adaptive loading adaptive indexing dbtouch Ideas for Big Data Exploration Stratos Idreos CWI, INS-, Amsterdam
adaptive loading adaptive indexing dbtouch Ideas for Big Data Exploration Stratos Idreos CWI, INS-, Amsterdam systems tailored for exploration
more in INS- data vaults sciborg cyclotron sciql holistic indexing datacell charles hardware-conscious
more in INS- data vaults sciborg cyclotron sciql holistic indexing Thank you! datacell charles hardware-conscious