Testing the In-Memory Column Store for in-database physics analysis. Dr. Maaike Limper


1 Testing the In-Memory Column Store for in-database physics analysis Dr. Maaike Limper

2 About CERN. CERN is the European Laboratory for Particle Physics. It supports the research activities of scientists of 110+ nationalities. It operates the largest machine in the world, the Large Hadron Collider (27 km, superconducting magnets). Four main experiments: ATLAS, ALICE, CMS, LHCb. 17/6/2014 Maaike Limper - CERN 2

3 Higgs boson discovery. 4 July 2012: scientists from ATLAS and CMS present the Higgs discovery result. Plots of the invariant mass of photon pairs produced at the LHC show a significant bump around 125 GeV. Operation of the Large Hadron Collider and its experiments relies on Oracle databases (conditions data, metadata, logging and monitoring data), but the data points in these plots did not come out of a database.

4 CERN openlab. CERN openlab is a unique public-private partnership between CERN and leading ICT companies. Its mission is to accelerate the development of cutting-edge solutions to be used by the worldwide LHC community. My project: test the possibility of using the Oracle database for physics analysis.

5 In-database physics analysis. Event display from the ATLAS experiment: a candidate Higgs decay to two photons.

6 In-database physics analysis. Physics analysis database: separate physics objects are stored in separate tables. Each physics object is described by hundreds of variables, which gives wide tables. Analysis queries: predicate filtering quickly applies object quality criteria, and each analysis-specific query uses a unique combination of columns. [Plot labels: J/ψ, ψ(3686)]
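To illustrate what such an analysis query could look like, here is a minimal sketch that uses Python's built-in sqlite3 module as a stand-in for the Oracle database. The electron table layout, the column names (event_id, pt, eta, phi, is_tight) and the cut values are hypothetical; they only show predicate filtering on a few columns of a wider table.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with a small, hypothetical electron table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE electron ("
        "event_id INTEGER, pt REAL, eta REAL, phi REAL, is_tight INTEGER)"
    )
    rows = [
        (1, 45.0, 0.3, 1.2, 1),
        (1, 12.0, -1.1, 0.4, 1),
        (2, 60.5, 2.9, -2.0, 1),
        (3, 33.0, -0.7, 0.1, 0),
        (3, 51.2, 1.8, 2.5, 1),
    ]
    conn.executemany("INSERT INTO electron VALUES (?, ?, ?, ?, ?)", rows)
    return conn


def select_good_electrons(conn, min_pt, max_abs_eta):
    """Return (event_id, pt) for electrons passing the hypothetical quality cuts."""
    query = """
        SELECT event_id, pt
        FROM electron
        WHERE pt > ? AND ABS(eta) < ? AND is_tight = 1
        ORDER BY event_id, pt
    """
    return conn.execute(query, (min_pt, max_abs_eta)).fetchall()


conn = build_demo_db()
good = select_good_electrons(conn, min_pt=25.0, max_abs_eta=2.5)
print(good)
```

The query touches only four of the table's columns; a real physics-object table would carry hundreds more that this particular analysis never reads.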

7 The problem. Analysis query performance is typically limited by I/O reads: queries run full table scans over tables with many columns, while only a few columns are used for each specific analysis. The combination of columns is unique for each query, so we can't index every column!
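A rough back-of-the-envelope sketch of why this hurts: the estimate below compares the bytes read by a full scan of every column with the bytes a query actually needs. The row count, column counts and the fixed 8 bytes per value are hypothetical, and the model ignores block overhead, compression and variable-width types.

```python
def scan_bytes(n_rows, n_columns_read, bytes_per_value=8):
    """Bytes read when scanning n_columns_read columns over n_rows rows."""
    return n_rows * n_columns_read * bytes_per_value


def full_scan_vs_needed(n_rows, n_columns_total, n_columns_used, bytes_per_value=8):
    """Return (bytes for a full table scan, bytes actually needed, ratio)."""
    full = scan_bytes(n_rows, n_columns_total, bytes_per_value)
    needed = scan_bytes(n_rows, n_columns_used, bytes_per_value)
    return full, needed, full / needed


# Hypothetical table: 10 million rows, 300 columns, 5 columns used by the query.
full, needed, factor = full_scan_vs_needed(10_000_000, 300, 5)
print(f"full scan: {full / 1e9:.1f} GB, needed: {needed / 1e9:.1f} GB, factor: {factor:.0f}x")
```

Under these assumptions the full scan reads 60 times more data than the query uses, which is the gap a column-oriented layout is meant to close.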

8 In-Memory Column Store. Oracle's In-Memory Column Store provides a solution to reduce I/O read time, especially for tables with many columns: profit from fast in-memory reads, and read only the columns relevant for the specific analysis query.
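The idea of reading only the relevant columns can be sketched with a toy column-oriented table in plain Python. This is not how Oracle implements its column store; the class ColumnTable, its select method and the column names are hypothetical and exist only to show that a columnar layout lets a query skip every column it does not mention.

```python
class ColumnTable:
    """Toy column-oriented table: one Python list per column."""

    def __init__(self, columns):
        lengths = {len(values) for values in columns.values()}
        if len(lengths) > 1:
            raise ValueError("all columns must have the same length")
        self.columns = columns
        self.n_rows = lengths.pop() if lengths else 0
        self.columns_touched = set()  # columns read by the last select()

    def select(self, output, predicates):
        """Return tuples of `output` columns for rows passing all predicates.

        predicates maps a column name to a one-argument test function.
        """
        self.columns_touched = set(output) | set(predicates)
        result = []
        for i in range(self.n_rows):
            if all(test(self.columns[name][i]) for name, test in predicates.items()):
                result.append(tuple(self.columns[name][i] for name in output))
        return result


table = ColumnTable({
    "pt": [45.0, 12.0, 60.5, 51.2],
    "eta": [0.3, -1.1, 2.9, 1.8],
    "phi": [1.2, 0.4, -2.0, 2.5],
    "charge": [1, -1, 1, -1],
    "n_hits": [12, 9, 14, 11],
})
selected = table.select(
    output=["pt"],
    predicates={"pt": lambda pt: pt > 25.0, "eta": lambda eta: abs(eta) < 2.5},
)
print(selected, sorted(table.columns_touched))
```

Only the pt and eta lists are ever indexed; phi, charge and n_hits stay untouched, whereas a row-oriented layout would have pulled them in with every row.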

9 Compression rates: COMPRESS FOR QUERY vs CAPACITY HIGH. electron: typical physics-object data, a mixture of int, float and double. Event Filter: only booleans (mostly false), best compression. Missing Energy: table with floats and doubles, worst compression. Table columns: table name, compression ratio with IMC capacity high, compression ratio with IMC query; rows: electron, Event Filter, Missing Energy (values not preserved in this transcript). The average compression rate of the dataset is 2.1 with query compression and 3.6 with capacity high: physics objects represent the bulk of the data.
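The ordering of these compression rates is what one would expect from the data types alone. The sketch below uses zlib from the Python standard library as a generic stand-in compressor (it is not the algorithm Oracle uses) and compares a mostly-false boolean column with a column of doubles. The 1% true fraction and the Gaussian-distributed values are hypothetical.

```python
import random
import struct
import zlib


def compression_ratio(raw: bytes) -> float:
    """Uncompressed size divided by zlib-compressed size."""
    return len(raw) / len(zlib.compress(raw))


rng = random.Random(42)  # fixed seed so the run is repeatable
n = 100_000

# Boolean column, one byte per value, about 1% true.
flags = bytes(1 if rng.random() < 0.01 else 0 for _ in range(n))

# Column of 64-bit doubles with essentially random mantissa bits.
doubles = struct.pack(f"<{n}d", *(rng.gauss(0.0, 50.0) for _ in range(n)))

bool_ratio = compression_ratio(flags)
double_ratio = compression_ratio(doubles)
print(f"booleans: {bool_ratio:.1f}x, doubles: {double_ratio:.2f}x")
```

Long runs of identical values compress very well, while floating-point measurements carry close to random low-order bits and barely compress at all.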

10 Simple query performance. Read from disk vs IMC: IMC is 1000x faster. Read from buffer cache vs IMC: IMC is 40x faster. Note that 2x more memory is needed to put the data in the buffer cache than to place it in the In-Memory Column Store!

11 Complex query performance. Read from disk vs IMC: IMC is 70x faster. Read from buffer cache vs IMC: IMC is 7x faster. With IMC it takes only 10 s to make this plot, allowing the analyst to quickly optimize results while trying different variable combinations.

12 Conclusion: IMC's STAR story. Situation: in-database physics analysis is limited by I/O. Task: remove the I/O bottleneck for any query using any combination of columns in a table. Action: use Oracle's In-Memory Column Store; take advantage of fast reads from cache; columnar compression increases the size of data that fits in memory; access only relevant columns and use predicate pruning to further reduce I/O. Result: I/O bottleneck removed, real-time in-database physics analysis is now possible*. *While the Oracle database is not currently used for physics analysis, this study shows promising results using the In-Memory Column Store for in-database physics analysis.
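One common way to realise predicate pruning in a column store is to keep the minimum and maximum value of each chunk of a column and skip chunks that cannot satisfy the predicate. The sketch below assumes that this is the kind of pruning meant here; it is a generic illustration, not Oracle's implementation, and the function names, chunk size and values are hypothetical.

```python
def build_chunks(values, chunk_size):
    """Split a column into chunks and record each chunk's min and max."""
    chunks = []
    for start in range(0, len(values), chunk_size):
        data = values[start:start + chunk_size]
        chunks.append({"min": min(data), "max": max(data), "data": data})
    return chunks


def scan_greater_than(chunks, threshold):
    """Return values > threshold and the number of chunks actually scanned."""
    matches = []
    scanned = 0
    for chunk in chunks:
        if chunk["max"] <= threshold:
            continue  # no value in this chunk can pass the predicate
        scanned += 1
        matches.extend(v for v in chunk["data"] if v > threshold)
    return matches, scanned


values = [3, 7, 5, 12, 18, 15, 2, 4, 1, 30, 25, 28]
chunks = build_chunks(values, chunk_size=3)
matches, scanned = scan_greater_than(chunks, threshold=10)
print(matches, f"scanned {scanned} of {len(chunks)} chunks")
```

Here two of the four chunks are skipped on their maximum alone, so their values are never read. The benefit depends on how well the values cluster: with uniformly shuffled data almost every chunk would overlap the predicate range.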