Data analysis in Particle Physics



Transcription:

Data analysis in Particle Physics
From data taking to discovery
Tuesday, 13 August 2013 Lukasz Kreczko - Bristol IT MegaMeet

$ whoami
Lukasz (Luke) Kreczko
Particle physicist
Graduated in Physics from the University of Hamburg in 2009
2009-2013: PhD in Particle Physics at the University of Bristol
Currently Computing Research Assistant at the University of Bristol

Outline
Data taking at the Compact Muon Solenoid (CMS) experiment
Data format (and distribution)
Data analysis procedure
Summary

What is CERN
Conseil Européen pour la Recherche Nucléaire, aka European Laboratory for Particle Physics
Between Geneva and the Jura mountains, straddling the Swiss-French border
Founded in 1954 with an international treaty
Our business is fundamental particle physics and how our universe works
What is the origin of mass? We are a step closer with the Higgs!
What is 96% of the universe made of? We only see 4%!
Why isn't there antimatter in the universe?
What is the state of matter just after the Big Bang?
Saturday, 1 June 13 Lukasz Kreczko - Bristol IT MegaMeet

Large Hadron Collider
Mankind's biggest machine (27 km circumference)
Hotter than the centre of the sun: collisions are 100,000 times hotter
Colder than deep space: (super) liquid helium cooling at 1.9 K (-271 °C)

A complex of accelerators

The experiment: a big digital camera
40 million pictures per second
Each picture around 1 MB!

The data: a structured mess
This is low intensity!

What do we do?
Experiment -> Local computing farm -> CERN data centre -> Globally distributed data centres -> My computer -> Paper
Today's focus

The experiment - CMS
Input from LHC
40 million collisions per second
40 terabytes per second
Hardware trigger (L1)
Low resolution
Makes decision in 3 microseconds
Reduces output to 100 kHz (100 GB/s)
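The rates on this slide follow from multiplying the event rate by the event size. A minimal sketch of that arithmetic, assuming the roughly 1 MB per collision quoted on the "digital camera" slide and decimal units (1 MB = 10^6 bytes); the variable and function names are illustrative only:

```python
# Numbers quoted on the slides; decimal units are an assumption.
COLLISION_RATE_HZ = 40_000_000   # 40 million collisions per second
EVENT_SIZE_BYTES = 1_000_000     # around 1 MB per collision
L1_OUTPUT_RATE_HZ = 100_000      # 100 kHz after the hardware trigger (L1)


def data_rate_bytes_per_s(rate_hz, event_size_bytes=EVENT_SIZE_BYTES):
    """Data rate in bytes per second for a given event rate."""
    return rate_hz * event_size_bytes


input_rate = data_rate_bytes_per_s(COLLISION_RATE_HZ)
l1_rate = data_rate_bytes_per_s(L1_OUTPUT_RATE_HZ)
l1_rejection = COLLISION_RATE_HZ // L1_OUTPUT_RATE_HZ

print(f"Detector input: {input_rate / 1e12:.0f} TB/s")
print(f"After L1:       {l1_rate / 1e9:.0f} GB/s")
print(f"L1 keeps 1 in {l1_rejection} collisions")
```

Under these assumptions the hardware trigger has to discard all but one in 400 collisions within its 3 microsecond decision window.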

High Level Trigger
Input from experiment
100,000 collisions per second
Software trigger (HLT)
"Poor man's reconstruction"
High resolution
Writes around 700 Hz (700 MB/s) in ROOT data format
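To put the HLT output in context, the sketch below works out the rejection factor of the software trigger and the volume written per day. It assumes about 1 MB per event and, purely for illustration, 24 hours of uninterrupted data taking, which is not a claim about how the LHC actually runs:

```python
# Rates from the slide; event size and continuous running are assumptions.
L1_OUTPUT_RATE_HZ = 100_000   # input to the software trigger
HLT_OUTPUT_RATE_HZ = 700      # events written per second
EVENT_SIZE_BYTES = 1_000_000  # around 1 MB per event
SECONDS_PER_DAY = 24 * 60 * 60

hlt_rate_bytes_per_s = HLT_OUTPUT_RATE_HZ * EVENT_SIZE_BYTES
hlt_rejection = L1_OUTPUT_RATE_HZ / HLT_OUTPUT_RATE_HZ
daily_volume_tb = hlt_rate_bytes_per_s * SECONDS_PER_DAY / 1e12

print(f"HLT output: {hlt_rate_bytes_per_s / 1e6:.0f} MB/s")
print(f"HLT keeps about 1 in {hlt_rejection:.0f} events")
print(f"Written per day of continuous running: {daily_volume_tb:.2f} TB")
```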

(Event) Reconstruction
http://en.wikipedia.org/wiki/Event_reconstruction
Reading the detector information and bundling it into particles
Detector response from different detector regions helps to identify particles
In addition, algorithms look for specific particle behaviour (e.g. b-quark: travels half a millimetre before decaying) and identify them
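The half-millimetre figure can be checked with the mean decay length L = (beta * gamma) * c * tau. The sketch below assumes a typical b-hadron lifetime of about 1.5 ps, a value not given on the slide; the function name and the boost values are illustrative:

```python
SPEED_OF_LIGHT_M_PER_S = 299_792_458
B_HADRON_LIFETIME_S = 1.5e-12  # assumed typical lifetime, about 1.5 ps


def mean_flight_distance_mm(boost, lifetime_s=B_HADRON_LIFETIME_S):
    """Mean decay length L = (beta * gamma) * c * tau, in millimetres."""
    return boost * SPEED_OF_LIGHT_M_PER_S * lifetime_s * 1e3


# With beta * gamma = 1 this is simply c * tau.
ctau_mm = mean_flight_distance_mm(boost=1.0)
print(f"c * tau = {ctau_mm:.2f} mm")

# A faster particle lives longer in the lab frame and travels further.
print(f"beta * gamma = 5: {mean_flight_distance_mm(boost=5.0):.2f} mm")
```

A displacement of this size between the collision point and the decay point is what lets an algorithm single out candidate b-quark decays.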

ROOT
ROOT (http://root.cern.ch, http://root.cern.ch/git/root.git)
Developed in 1995
ROOT is a lot of things: http://root.cern.ch/drupal/content/about
Most used features (subjective): data format, histograms, fitting

ROOT
Also has a C interpreter (CINT): blessing and curse, ask any student which one is more accurate
177 PB of LHC data stored in ROOT format
"ROOT: The Next Generation": https://indico.cern.ch/conferencetimetable.py?confid=217511#20130311

ROOT data format
http://root.cern.ch/drupal/content/root-files-1
Binary storage for C++ objects
Serialisation via TObject class
Supports partial reads (i.e. subset of objects)
Objects grouped by event (e.g. file.getevent(10).electron.at(0).energy())
Supports read-ahead (tuneable parameter for analysis)
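The idea behind partial reads can be illustrated with a toy column store in which each kind of object lives in its own branch, so reading one quantity does not touch the others. This is not the ROOT API: `ToyEventFile`, its methods and the branch names are hypothetical, and the sketch keeps everything in memory instead of in a binary file:

```python
class ToyEventFile:
    """Hypothetical in-memory stand-in for an event file (not ROOT)."""

    def __init__(self):
        self._branches = {}          # branch name -> one entry per event
        self.branches_read = set()   # which branches were actually touched

    def write_event(self, **values):
        # Assumes every event supplies the same set of branches,
        # so that list positions line up with event numbers.
        for name, value in values.items():
            self._branches.setdefault(name, []).append(value)

    def read(self, branch, event_index):
        # Only the requested branch is accessed: a partial read.
        self.branches_read.add(branch)
        return self._branches[branch][event_index]


f = ToyEventFile()
f.write_event(electron_energy=[45.2, 12.1], muon_pt=[71.5])
f.write_event(electron_energy=[30.0], muon_pt=[])

# Energy of the first electron in event 1; the muon branch is never read.
first_electron_energy = f.read("electron_energy", 1)[0]
print(first_electron_energy, f.branches_read)
```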

CERN T0 data reconstruction
Input: 300-350 collisions per second
Rest is done when machine is shut down
Reconstruction
Connecting the dots
Removing noise
Applying corrections

Analysing all data
CMS records 10,000 terabytes of data every year (around 70 years of full HD movies) + same amount of simulation
To analyse this on a single computer would take 64,000 years!
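Two quick consistency checks on these numbers, as a sketch: how many machines would be needed to finish in a given time, and what video bitrate the "70 years of full HD" comparison implies. Perfect parallel scaling, decimal terabytes and a 365.25-day year are all assumptions made here, not statements from the slide:

```python
import math

SINGLE_MACHINE_YEARS = 64_000   # from the slide
DATA_TB_PER_YEAR = 10_000       # from the slide
HD_MOVIE_YEARS = 70             # from the slide
SECONDS_PER_YEAR = 365.25 * 24 * 3600


def machines_needed(target_years, single_machine_years=SINGLE_MACHINE_YEARS):
    """Machines required to finish in target_years, assuming ideal scaling."""
    return math.ceil(single_machine_years / target_years)


machines_for_one_year = machines_needed(target_years=1)

# Bitrate implied by "10,000 TB is about 70 years of full HD movies".
implied_hd_mbit_per_s = (
    DATA_TB_PER_YEAR * 1e12 * 8 / (HD_MOVIE_YEARS * SECONDS_PER_YEAR) / 1e6
)

print(f"Machines to finish in one year: {machines_for_one_year}")
print(f"Implied HD bitrate: {implied_hd_mbit_per_s:.1f} Mbit/s")
```

The first number is the motivation for distributing the work: keeping up with one year of data needs tens of thousands of machines working in parallel.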

The LHC grid
Experiment -> Local computing farm -> CERN data centre -> Globally distributed data centres -> My computer -> Paper
Distributing on a global scale
This is where the analysis happens

The data: a much nicer picture
[Event display: Run 163583, Event 26579562]
Jets: pT = 84.1 GeV/c, η = 2.24; pT = 89.0 GeV/c, η = 2.14; pT = 85.3 GeV/c, η = 2.02; pT = 90.5 GeV/c, η = 1.40
Muon: pT = 71.5 GeV/c, η = 0.82
Missing ET: 22.3 GeV
m(f) = 1.2 TeV/c^2

The goal: extend our knowledge
Billions of [event display: Run 163583, Event 26579562] + simulation
[Figure: CMS m_γγ (GeV) spectrum, S/(S+B) weighted events / 1.5 GeV, showing data, S+B fit and B fit component with ±1σ and ±2σ bands; √s = 7 TeV, L = 5.1 fb^-1 and √s = 8 TeV, L = 5.3 fb^-1; inset: unweighted events / 1.5 GeV]
That's the famous Higgs boson

Analysis
Steps: data preparation, data reduction, event selection, histogramming
Data preparation
Corrections: applying the newest knowledge about the experiment
Simulation: newest knowledge of the theory

Analysis
Steps: data preparation, data reduction, event selection, histogramming
Data reduction
Filtering: we know more or less what we are looking for
Ntuples: objects -> plain data structures
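The "objects -> plain data structures" step can be sketched as flattening an event object into one row of plain numbers. The `Jet` and `Event` classes, the column names and the choice to keep only the two highest-pT jets are hypothetical; the jet values are the ones shown on the event display slide:

```python
from dataclasses import dataclass


@dataclass
class Jet:
    pt: float   # transverse momentum in GeV/c
    eta: float  # pseudorapidity


@dataclass
class Event:
    run: int
    event: int
    jets: list


def to_ntuple_row(ev, max_jets=2):
    """Flatten an Event into a plain dict of numbers (one ntuple row)."""
    row = {"run": ev.run, "event": ev.event, "n_jets": len(ev.jets)}
    # Keep only the highest-pT jets so every row has the same columns.
    leading = sorted(ev.jets, key=lambda j: j.pt, reverse=True)[:max_jets]
    for i, jet in enumerate(leading):
        row[f"jet{i}_pt"] = jet.pt
        row[f"jet{i}_eta"] = jet.eta
    return row


ev = Event(
    run=163583,
    event=26579562,
    jets=[Jet(84.1, 2.24), Jet(89.0, 2.14), Jet(85.3, 2.02), Jet(90.5, 1.40)],
)
row = to_ntuple_row(ev)
print(row)
```

A flat row like this is smaller than the full object and can be read without the class definitions, which is what makes later steps fast.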

Analysis
Steps: data preparation, data reduction, event selection, histogramming
Event selection
Selection: very refined selection to increase signal purity (usually a tiny effect compared to backgrounds)
[Event display: Run 163583, Event 26579562; four jets, one muon, missing ET; m(f) = 1.2 TeV/c^2]
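An event selection is, at its simplest, a set of cuts that each event either passes or fails. The sketch below applies made-up thresholds to the event from the display and to an invented low-energy event; the cut values and the dictionary layout are illustrative assumptions, not the selection used by CMS:

```python
def passes_selection(event, min_jets=4, min_jet_pt=30.0,
                     min_muon_pt=26.0, max_abs_eta=2.4):
    """Return True if the muon and at least min_jets jets pass the cuts."""
    good_jets = [
        (pt, eta) for pt, eta in event["jets"]
        if pt > min_jet_pt and abs(eta) < max_abs_eta
    ]
    muon_pt, muon_eta = event["muon"]
    muon_ok = muon_pt > min_muon_pt and abs(muon_eta) < max_abs_eta
    return muon_ok and len(good_jets) >= min_jets


# Values from the event display: (pT in GeV/c, eta).
displayed_event = {
    "run": 163583,
    "event": 26579562,
    "jets": [(84.1, 2.24), (89.0, 2.14), (85.3, 2.02), (90.5, 1.40)],
    "muon": (71.5, 0.82),
}
# Invented event with a soft muon and too few jets.
soft_event = {
    "run": 1,
    "event": 2,
    "jets": [(35.0, 0.5), (22.0, 1.1)],
    "muon": (12.0, 0.3),
}

selected = [ev for ev in (displayed_event, soft_event) if passes_selection(ev)]
print([ev["event"] for ev in selected])
```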

Analysis
Steps: data preparation, data reduction, event selection, histogramming
Histogramming
Analysis: apply algorithms (produce derived data)
Histograms: data reduction
[Figure: CMS m_γγ (GeV) spectrum, S/(S+B) weighted events / 1.5 GeV, showing data, S+B fit and B fit component with ±1σ and ±2σ bands; √s = 7 TeV, L = 5.1 fb^-1 and √s = 8 TeV, L = 5.3 fb^-1; inset: unweighted events / 1.5 GeV]
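Histogramming reduces a list of per-event values to a fixed number of bin counts. A minimal pure-Python sketch using the 1.5 GeV bin width from the plot; the mass values are invented for illustration and the range of 110 to 140 GeV is chosen so that it divides evenly into bins:

```python
def fill_histogram(values, low, high, bin_width):
    """Count values per bin; values outside [low, high) are ignored."""
    n_bins = int(round((high - low) / bin_width))
    counts = [0] * n_bins
    for v in values:
        if low <= v < high:
            # Clamp to guard against floating point edge effects.
            index = min(int((v - low) // bin_width), n_bins - 1)
            counts[index] += 1
    return counts


# Invented invariant masses in GeV, not real data.
masses = [112.0, 124.9, 125.3, 125.8, 126.1, 133.0, 141.0]
counts = fill_histogram(masses, low=110.0, high=140.0, bin_width=1.5)

for i, n in enumerate(counts):
    if n:
        print(f"{110.0 + i * 1.5:.1f}-{110.0 + (i + 1) * 1.5:.1f} GeV: {n}")
```

However many events go in, the output stays at 20 numbers, which is why the slide calls histograms a data reduction.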

Analysis
Rinse & repeat
Steps: data preparation, data reduction, event selection, histogramming

Analysis in Big data terms
Data preparation: MAP
Data reduction: REDUCE
Event selection: MAP
Histogramming: REDUCE
LHC Grid
Usually local site
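The map/reduce analogy can be sketched with the standard library: each chunk of events is selected and binned independently (map), and the partial histograms are then summed (reduce). The chunking, the mass values and the function names are illustrative assumptions; a real workflow would run the map step on many grid nodes:

```python
from collections import Counter
from functools import reduce

BIN_WIDTH_GEV = 1.5


def map_chunk(masses, min_mass=110.0, max_mass=150.0):
    """MAP: select in-range events and build a partial histogram."""
    partial = Counter()
    for m in masses:
        if min_mass <= m < max_mass:
            partial[int((m - min_mass) // BIN_WIDTH_GEV)] += 1
    return partial


def merge_histograms(a, b):
    """REDUCE: add two partial histograms bin by bin."""
    return a + b


# Three chunks of invented masses in GeV, e.g. one per file or site.
chunks = [[124.9, 125.3, 99.0], [125.8, 133.0], [126.1, 160.0]]

partials = map(map_chunk, chunks)
total = reduce(merge_histograms, partials, Counter())
print(dict(total))
```

Because the merge is a simple sum, the chunks can be processed in any order and on any machine, which is the property that lets the heavy steps run on the grid.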

Summary
The data from the experiments are reduced before being stored to disk/tape
All data are stored in ROOT format: either as classes or as basic data types
Heavy workflows are performed on the LHC grid; frequent and fast work is usually done on local servers
The final result is a histogram (or table) and is a huge reduction step from the input (20 PB -> 100 MB)
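The size of that last reduction step is easy to quantify. The sketch below assumes decimal units; reading the 20 PB as one year of recorded data plus the same amount of simulation (10,000 TB each, from the "Analysing all data" slide) is an interpretation rather than something the slides state:

```python
INPUT_BYTES = 20 * 10**15    # 20 PB
OUTPUT_BYTES = 100 * 10**6   # 100 MB

reduction_factor = INPUT_BYTES // OUTPUT_BYTES
print(f"Reduction factor: {reduction_factor:,}")

# Possible origin of the 20 PB: data plus an equal amount of simulation.
DATA_TB_PER_YEAR = 10_000
data_plus_simulation_bytes = 2 * DATA_TB_PER_YEAR * 10**12
print(data_plus_simulation_bytes == INPUT_BYTES)
```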

Questions?
Thank you for listening. Do you have any questions?

Secret slides

ROOT and CMS
https://indico.cern.ch/getfile.py/access?contribid=16&resid=0&materialid=slides&confid=217511

https://indico.cern.ch/getfile.py/access?contribid=7&resid=0&materialid=slides&confid=217511

What do we do?
Experiment -> Local computing farm -> CERN data centre -> Globally distributed data centres -> My computer -> Paper
online
offline