Data analysis in Particle Physics: from data taking to discovery
Lukasz Kreczko - Bristol IT MegaMeet
Tuesday, 13 August 2013
$ whoami
Lukasz (Luke) Kreczko, particle physicist
- Graduated in Physics from the University of Hamburg in 2009
- 2009-2013: PhD in Particle Physics at the University of Bristol
- Currently Computing Research Assistant at the University of Bristol
Outline
- Data taking at the Compact Muon Solenoid (CMS) experiment
- Data format (and distribution)
- Data analysis procedure
- Summary
What is CERN?
- Conseil Européen pour la Recherche Nucléaire, aka European Laboratory for Particle Physics
- Between Geneva and the Jura mountains, straddling the Swiss-French border
- Founded in 1954 with an international treaty
- Our business is fundamental particles and how our universe works
- What is the origin of mass? We are a step closer with the Higgs!
- What is 96% of the universe made of? We only see 4%!
- Why isn't there anti-matter in the universe?
- What is the state of matter just after the Big Bang?
Large Hadron Collider
- Mankind's biggest machine (27 km circumference)
- Hotter than the centre of the sun: collisions are 100,000 times hotter
- Colder than deep space: superfluid helium cooling at 1.9 K (-271 °C)
A complex of accelerators
The experiment: a big digital camera
- 40 million pictures per second
- Each picture is around 1 MB!
The data: a structured mess
- This is low intensity!
What do we do?
Experiment -> Local computing farm -> CERN data centre -> Globally distributed data centres -> My computer -> Paper
Today's focus
The experiment - CMS
- Input from LHC: 40 million collisions per second, 40 terabytes per second
- Hardware trigger (L1)
- Low resolution
- Makes a decision in 3 microseconds
- Reduces output to 100 kHz (100 GB/s)
High Level Trigger
- Input from experiment: 100,000 collisions per second
- Software trigger (HLT): "poor man's reconstruction"
- High resolution
- Writes around 700 Hz (700 MB/s) in ROOT data format
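The rates quoted for the two trigger levels can be cross-checked with a few lines of arithmetic. This is only a sketch: it uses the numbers from the slides (40 million collisions per second, about 1 MB per event, 100 kHz after L1, 700 Hz after the HLT) and assumes decimal units (1 TB = 10^6 MB). The function and variable names are made up for illustration.

```python
# Back-of-the-envelope data rates through the trigger chain,
# using only the numbers quoted on the slides.
EVENT_SIZE_MB = 1.0  # "each picture around 1 MB"

# (stage name, event rate in Hz)
stages = [
    ("LHC collisions", 40_000_000),
    ("after L1 hardware trigger", 100_000),
    ("after HLT software trigger", 700),
]


def bandwidth_mb_per_s(rate_hz, event_size_mb=EVENT_SIZE_MB):
    """Data rate in MB/s for a given event rate."""
    return rate_hz * event_size_mb


def rejection_factors(stages):
    """Reduction factor between each pair of consecutive stages."""
    return [prev[1] / cur[1] for prev, cur in zip(stages, stages[1:])]


for name, rate in stages:
    print(f"{name:28s} {rate:>12,d} Hz  {bandwidth_mb_per_s(rate):>14,.0f} MB/s")

l1_factor, hlt_factor = rejection_factors(stages)
print(f"L1 keeps about 1 in {l1_factor:,.0f}; HLT keeps about 1 in {hlt_factor:,.0f}")
```

The first row reproduces the 40 TB/s input rate, and the two factors show that most of the reduction happens in hardware before any software sees the event.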
(Event) Reconstruction
http://en.wikipedia.org/wiki/Event_reconstruction
- Reading the detector information and bundling it into particles
- Detector response from different detector regions helps to identify particles
- In addition, algorithms look for specific particle behaviour (e.g. a b-quark travels half a millimetre before decaying) and use it to identify particles
ROOT
- ROOT (http://root.cern.ch, http://root.cern.ch/git/root.git)
- Developed in 1995
- ROOT is a lot of things: http://root.cern.ch/drupal/content/about
- Most used features (subjective): data format, histograms, fitting
- Also has a C interpreter (CINT): blessing and curse, ask any student which one is more accurate
- 177 PB of LHC data stored in ROOT format
- "ROOT: The Next Generation": https://indico.cern.ch/conferencetimetable.py?confid=217511#20130311
ROOT data format
http://root.cern.ch/drupal/content/root-files-1
- Binary storage for C++ objects
- Serialisation via TObject class
- Supports partial reads (i.e. a subset of objects)
- Objects grouped by event (e.g. file.getevent(10).electron.at(0).energy())
- Supports read-ahead (tuneable parameter for analysis)
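To make "partial reads" and "objects grouped by event" concrete, here is a toy sketch in plain Python. It is not the ROOT API: ToyEventFile, its branch names and the stored values are all invented for illustration. The idea is that each kind of object is stored in its own buffer, so an analysis that only needs electrons never touches the bytes of the muons or jets.

```python
import struct


class ToyEventFile:
    """Toy stand-in for an event file: one packed byte buffer per branch.

    Not the ROOT API; it only illustrates reading a subset of branches.
    """

    def __init__(self, branches):
        # branches: dict of branch name -> list of floats, one per event
        lengths = {len(values) for values in branches.values()}
        if len(lengths) > 1:
            raise ValueError("all branches need one value per event")
        self.n_events = lengths.pop() if lengths else 0
        self._buffers = {
            name: struct.pack(f"<{len(values)}d", *values)
            for name, values in branches.items()
        }
        self.bytes_read = 0

    def read(self, names):
        """Decode only the requested branches and count the bytes touched."""
        out = {}
        for name in names:
            buf = self._buffers[name]
            self.bytes_read += len(buf)
            out[name] = list(struct.unpack(f"<{self.n_events}d", buf))
        return out

    def get_event(self, index, names):
        """Regroup the requested branches into a per-event record."""
        columns = self.read(names)
        return {name: columns[name][index] for name in names}


# Invented toy content: three events, three branches.
f = ToyEventFile({
    "electron_energy": [45.2, 30.1, 88.7],
    "muon_pt": [71.5, 12.0, 33.3],
    "jet_pt": [84.1, 89.0, 85.3],
})
event = f.get_event(1, ["electron_energy"])
print(event, f.bytes_read)
```

Only 24 of the 72 stored bytes are read here (three 8-byte doubles), which is the point of a partial read. A real file additionally compresses and chunks each buffer, which this sketch leaves out.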
CERN T0 data reconstruction
- Input: 300-350 collisions per second
- Rest is done when the machine is shut down
- Reconstruction: connecting the dots, removing noise, applying corrections
Analysing all data
- CMS records 10,000 terabytes of data every year (around 70 years of full HD movies), plus the same amount of simulation
- To analyse this on a single computer would take 64,000 years!
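A quick sanity check of these numbers, as a sketch. The inputs (10,000 TB, 70 years, 64,000 years) are from the slide. The assumptions are mine: decimal terabytes, a 365.25-day year, and perfect parallel scaling, which is optimistic but not unreasonable because collision events are independent of each other.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600

data_per_year_tb = 10_000        # from the slide
movie_years = 70                 # from the slide
single_computer_years = 64_000   # from the slide

# Average bitrate of a video stream that fills 10,000 TB in 70 years.
bytes_total = data_per_year_tb * 1e12
movie_bitrate_mbit = bytes_total * 8 / (movie_years * SECONDS_PER_YEAR) / 1e6


def cores_needed(target_years, serial_years=single_computer_years):
    """Computers needed to finish in target_years, assuming perfect scaling."""
    return serial_years / target_years


print(f"implied movie bitrate: {movie_bitrate_mbit:.0f} Mbit/s")
print(f"computers to finish in 1 year:  {cores_needed(1):,.0f}")
print(f"computers to finish in 1 month: {cores_needed(1 / 12):,.0f}")
```

The implied bitrate of roughly 36 Mbit/s is in the range of high-quality full HD video, so the movie comparison holds up. The second number shows why the work has to be spread over tens of thousands of machines.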
The LHC grid
- Distributing on a global scale
- This is where the analysis happens
The data: a much nicer picture
[Event display: Run 163583, Event 26579562. Four jets (pT = 84.1, 89.0, 85.3 and 90.5 GeV/c; η = 2.24, 2.14, 2.02 and 1.40), one muon (pT = 71.5 GeV/c, η = 0.82), missing ET = 22.3 GeV, m(f) = 1.2 TeV/c^2]
The goal: extend our knowledge
- Billions of events like this one (event display: Run 163583, Event 26579562) + simulation
- [Figure: CMS diphoton mass spectrum, S/(S+B) weighted events / 1.5 GeV versus m γγ (GeV), √s = 7 TeV, L = 5.1 fb^-1 and √s = 8 TeV, L = 5.3 fb^-1; data, S+B fit, B fit component with ±1σ and ±2σ bands; inset: unweighted events / 1.5 GeV]
- That's the famous Higgs boson
Analysis
Steps: Data preparation, Data reduction, Event selection, Histogramming
Data preparation
- Corrections: applying the newest knowledge about the experiment
- Simulation: newest knowledge of the theory
Analysis
Data reduction
- Filtering: we know more or less what we are looking for
- Ntuples: objects -> plain data structures
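The "objects -> plain data structures" step can be sketched as follows. The Particle and Event classes and the chosen columns are hypothetical and do not correspond to any CMS data format. The example event reuses the values quoted on the event display slide.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Particle:
    pt: float   # transverse momentum in GeV/c
    eta: float  # pseudorapidity


@dataclass
class Event:
    run: int
    event: int
    muons: List[Particle] = field(default_factory=list)
    jets: List[Particle] = field(default_factory=list)


def to_ntuple(events):
    """Flatten events into columns of plain numbers, one entry per event."""
    columns = {"run": [], "event": [], "n_jets": [],
               "lead_jet_pt": [], "lead_muon_pt": []}
    for ev in events:
        columns["run"].append(ev.run)
        columns["event"].append(ev.event)
        columns["n_jets"].append(len(ev.jets))
        # default=0.0 marks "no such object in this event"
        columns["lead_jet_pt"].append(max((j.pt for j in ev.jets), default=0.0))
        columns["lead_muon_pt"].append(max((m.pt for m in ev.muons), default=0.0))
    return columns


ev = Event(
    run=163583,
    event=26579562,
    muons=[Particle(71.5, 0.82)],
    jets=[Particle(84.1, 2.24), Particle(89.0, 2.14),
          Particle(85.3, 2.02), Particle(90.5, 1.40)],
)
ntuple = to_ntuple([ev])
print(ntuple)
```

The flat columns drop most of the object structure, which is what makes ntuples small and fast to loop over during the frequent later steps.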
Analysis
Event selection
- Selection: very refined selection to increase signal purity (the signal is usually a tiny effect compared to backgrounds)
[Event display: Run 163583, Event 26579562]
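A minimal sketch of what such a selection looks like in code. The cut values (muon pT above 20 GeV/c, jet pT above 30 GeV/c, |η| below 2.4, at least four jets) are invented for illustration and are not the cuts of any published analysis. The first event uses the values from the event display; the second is a made-up background-like event.

```python
def passes_selection(event, min_muon_pt=20.0, min_jet_pt=30.0,
                     max_abs_eta=2.4, min_jets=4):
    """Return True for events with exactly one good muon and enough good jets."""
    def good(obj, min_pt):
        return obj["pt"] > min_pt and abs(obj["eta"]) < max_abs_eta

    muons = [m for m in event["muons"] if good(m, min_muon_pt)]
    jets = [j for j in event["jets"] if good(j, min_jet_pt)]
    return len(muons) == 1 and len(jets) >= min_jets


events = [
    {   # values quoted on the event display slide
        "muons": [{"pt": 71.5, "eta": 0.82}],
        "jets": [{"pt": 84.1, "eta": 2.24}, {"pt": 89.0, "eta": 2.14},
                 {"pt": 85.3, "eta": 2.02}, {"pt": 90.5, "eta": 1.40}],
    },
    {   # invented background-like event: soft muon, only two jets
        "muons": [{"pt": 8.0, "eta": 1.1}],
        "jets": [{"pt": 42.0, "eta": 0.3}, {"pt": 35.0, "eta": -1.7}],
    },
]
selected = [ev for ev in events if passes_selection(ev)]
print(f"{len(selected)} of {len(events)} events pass")
```

In a real analysis the cuts are tuned on simulation so that they remove as much background as possible while keeping the signal.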
Analysis
Histogramming
- Analysis: apply algorithms (produce derived data)
- Histograms: data reduction
[Figure: CMS diphoton mass spectrum, events / 1.5 GeV versus m γγ (GeV)]
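Why a histogram is a data reduction: however many events go in, only a fixed number of bin counts come out. A sketch in plain Python, where the mass values are invented toy numbers (not data) and the binning (40 bins of 1.5 GeV between 100 and 160 GeV) is an assumption chosen to match the bin width in the plot.

```python
def fill_histogram(values, low, high, n_bins):
    """Count values into n_bins equal-width bins on [low, high); ignore the rest."""
    width = (high - low) / n_bins
    counts = [0] * n_bins
    for v in values:
        if low <= v < high:
            # min() guards against rounding right at the upper edge
            index = min(int((v - low) / width), n_bins - 1)
            counts[index] += 1
    return counts


# Invented toy masses in GeV; two of them fall outside the histogram range.
masses = [124.9, 125.3, 126.1, 118.0, 131.7, 125.6, 99.0, 161.2]
counts = fill_histogram(masses, low=100.0, high=160.0, n_bins=40)

for i, n in enumerate(counts):
    if n:
        print(f"{100.0 + 1.5 * i:6.1f} - {100.0 + 1.5 * (i + 1):6.1f} GeV: {n}")
```

The output stays at 40 integers whether the input holds eight values or eight billion.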
Analysis
Rinse & repeat: Data preparation -> Data reduction -> Event selection -> Histogramming
Analysis in Big data terms
- Data preparation: MAP
- Data reduction: REDUCE
- Event selection: MAP
- Histogramming: REDUCE
Where it runs: LHC Grid (heavy workflows), usually local site (frequent and fast work)
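The MAP and REDUCE labels can be illustrated with Python's built-in map and functools.reduce. This is a sketch of the idea only, not of how grid jobs are written: the function names, the 30 GeV/c threshold and the 10 GeV bin width are assumptions, and the pT values are a toy list. The event selection is a MAP because it looks at each event on its own; the histogramming is a REDUCE because it folds all events into one small result.

```python
from collections import Counter
from functools import reduce


def select_and_bin(pt, min_pt=30.0, bin_width=10.0):
    """MAP (event selection): per event, return a bin index, or None if rejected."""
    return int(pt // bin_width) if pt > min_pt else None


def fill(hist, bin_index):
    """REDUCE (histogramming): fold one mapped event into the histogram."""
    if bin_index is not None:
        hist[bin_index] += 1
    return hist


# Toy list of transverse momenta in GeV/c.
events = [12.0, 45.5, 71.5, 33.0, 84.1, 89.0, 25.0, 85.3, 90.5, 38.2]

hist = reduce(fill, map(select_and_bin, events), Counter())
print(sorted(hist.items()))
```

Because the MAP step has no shared state it can be split across as many machines as there are files, and only the small partial histograms need to be brought back together.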
Summary
- The data from the experiments are reduced before being stored to disk/tape
- All data are stored in ROOT format: either as classes or as basic data types
- Heavy workflows are performed on the LHC grid; frequent and fast work is usually done on local servers
- The final result is a histogram (or table), a huge reduction from the input (20 PB -> 100 MB)
Questions?
Thank you for listening. Do you have any questions?
Secret slides
ROOT and CMS
https://indico.cern.ch/getfile.py/access?contribid=16&resid=0&materialid=slides&confid=217511
https://indico.cern.ch/getfile.py/access?contribid=7&resid=0&materialid=slides&confid=217511
What do we do?
- Online: Experiment -> Local computing farm
- Offline: CERN data centre -> Globally distributed data centres -> My computer
- Paper