Optimisation of the ATLAS track reconstruction software for Run-2
A. Salzburger, CERN
ATLAS Inner Detector (ID) track reconstruction
Track reconstruction is the most challenging step in event reconstruction
- classical pattern recognition problem in steps:
  (1) track seeding using groups of 3D space points
  (2) track candidate building using a combinatorial filter
  (3) ambiguity solving of tracks in the silicon tracker
  (4) track extension into the transition radiation tracker (TRT)
The Run-1 data taking period
The LHC was performing outstandingly well, but this came with a price
- event pile-up, i.e. multiple simultaneous collisions per bunch crossing
[Figure: peak interactions per crossing versus month in 2010, 2011 and 2012 (ATLAS Online Luminosity, √s = 7 TeV and √s = 8 TeV), with the level "what we initially designed for" marked.]
Boundaries and projections - 境界と予測
Run-2 start-up brings the excitement of new challenges
- increase to 13 TeV (from 7 TeV): more particles per collision
- increase of HLT rate to 1 kHz (from 400 Hz): more collisions to process per unit time
- increase of pile-up to <μ> ~ 40 (from ~20): more collisions per bunch crossing
Funding profile is likely to stay flat at best
- extrapolation: a factor 3 speed-up of event reconstruction is needed
- ID track reconstruction is the dominant part
Achieved a factor 4
- reduced the relative fraction of ID to total reconstruction time
[Figure: reconstruction time per event [s] versus software release (17.2, 32bit; 19.0, 64bit; 19.1, 64bit; 20.1, 64bit), for full reconstruction and Inner Detector only. ATLAS Simulation Preliminary, RDO to ESD, √s = 14 TeV, <µ> = 40, 25 ns bunch spacing, Run 1 geometry, pp → tt̄, HS06 = 13.08.]
Changing the algebra and math libraries
Event Data Model (EDM) and algorithmic code were based on CLHEP
Track reconstruction makes heavy use of matrix manipulations
- usually N x M matrices (with N, M in [1,5]) and inversions
Identified that CLHEP was one of the bottlenecks in our software
- simple testbeds implemented to mimic Kalman filtering or Jacobian transport
- Eigen algebra library chosen (supports SIMD instructions)
Massive reworking of the entire ATLAS code
- more than 1000 packages changed
- Eigen/ATLAS interface via typedefs and plugins
http://eigen.tuxfamily.org/
[Figure: achieved speed-up w.r.t. CLHEP in the 5x5 matrix multiplication testbed, for CLHEP, MKL, SMatrix and Eigen.]
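A minimal sketch of the kind of operation such a testbed would exercise: transporting a 5x5 covariance matrix with a Jacobian, C' = J C J^T. This is an illustration only. It uses plain std::array rather than CLHEP or Eigen, and the names Matrix5, multiply, transpose and transportCovariance are hypothetical, not taken from the ATLAS code.

```cpp
#include <array>
#include <cstddef>

// Fixed-size 5x5 matrix, matching the five track parameters.
// All names in this sketch are hypothetical.
constexpr std::size_t kDim = 5;
using Matrix5 = std::array<std::array<double, kDim>, kDim>;

// Plain triple-loop product A * B for fixed-size matrices.
Matrix5 multiply(const Matrix5& a, const Matrix5& b) {
  Matrix5 c{};  // value-initialised, i.e. all zeros
  for (std::size_t i = 0; i < kDim; ++i)
    for (std::size_t k = 0; k < kDim; ++k)
      for (std::size_t j = 0; j < kDim; ++j)
        c[i][j] += a[i][k] * b[k][j];
  return c;
}

// Transpose of a fixed-size matrix.
Matrix5 transpose(const Matrix5& a) {
  Matrix5 t{};
  for (std::size_t i = 0; i < kDim; ++i)
    for (std::size_t j = 0; j < kDim; ++j)
      t[j][i] = a[i][j];
  return t;
}

// Identity matrix, convenient as a starting point for test inputs.
Matrix5 identity() {
  Matrix5 m{};
  for (std::size_t i = 0; i < kDim; ++i) m[i][i] = 1.0;
  return m;
}

// Covariance transport C' = J * C * J^T: two 5x5 multiplications per call.
Matrix5 transportCovariance(const Matrix5& jacobian, const Matrix5& covariance) {
  return multiply(multiply(jacobian, covariance), transpose(jacobian));
}
```

Because the dimensions are known at compile time, a fixed-size library can unroll and vectorise these loops, which is the property the comparison on this slide is about.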
Cleaning up the Event Data Model (EDM)
Flattening the structures of the track reconstruction EDM
Track parameters along the trajectory
- need to be expressed on different surfaces of the detector
- exist as charged / neutral representation
- may exist with covariance or without
- may be a 5-dimensional or 6-dimensional representation (when adding mass)
Surface: PlaneSurface, CylinderSurface, ConeSurface, DiskSurface, PerigeeSurface, StraightLineSurface
Run-1 EDM (x charge, x DIM):
- AtaPlane, AtaCylinder, AtaCone, AtaDisk, Perigee, AtaStraightLine
- MeasuredAtaPlane, MeasuredAtaCylinder, MeasuredAtaCone, MeasuredAtaDisk, MeasuredPerigee, MeasuredAtaStraightLine
Run-2 EDM:
template <class Surface, class Charge, size_t DIM> class AtaSurface;
A. Salzburger - ATLAS Track Reconstruction Optimisation during LS1 - CHEP April 14, 13, 2015
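A minimal sketch of how one class template can stand in for the Run-1 family of classes, under stated assumptions. The tag types, the isCharged flag, the hasCovariance() accessor and the two aliases are hypothetical illustrations. Only the template parameter list (Surface, Charge, DIM) follows the slide, and the actual ATLAS implementation may differ.

```cpp
#include <array>
#include <cstddef>

// Hypothetical tag types standing in for the surface and charge classes.
struct PlaneSurface {};
struct CylinderSurface {};
struct Charged { static constexpr bool isCharged = true; };
struct Neutral { static constexpr bool isCharged = false; };

// One template replaces the per-surface, per-charge, per-dimension classes.
template <class Surface, class Charge, std::size_t DIM>
class AtaSurface {
 public:
  using Vector = std::array<double, DIM>;
  using Covariance = std::array<std::array<double, DIM>, DIM>;

  // Parameters without covariance (the former "Ata..." classes).
  explicit AtaSurface(const Vector& parameters)
      : m_parameters(parameters), m_covariance{}, m_hasCovariance(false) {}

  // Parameters with covariance (the former "MeasuredAta..." classes).
  AtaSurface(const Vector& parameters, const Covariance& covariance)
      : m_parameters(parameters), m_covariance(covariance), m_hasCovariance(true) {}

  const Vector& parameters() const { return m_parameters; }
  bool hasCovariance() const { return m_hasCovariance; }
  static constexpr bool isCharged() { return Charge::isCharged; }
  static constexpr std::size_t dimension() { return DIM; }

 private:
  Vector m_parameters;
  Covariance m_covariance;  // stored by value: no dynamic allocation
  bool m_hasCovariance;
};

// Hypothetical aliases recovering two of the Run-1 style names.
using AtaPlane = AtaSurface<PlaneSurface, Charged, 5>;
using NeutralAtaCylinder = AtaSurface<CylinderSurface, Neutral, 5>;
```

In this sketch the "measured" variant is a runtime flag rather than a separate class, which is one possible way to halve the number of types. Whether the real EDM makes the same choice is not stated on the slide.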
Enormous reduction of code lines in tracking EDM
- while even extending the functionality

Package                        Run-1 C++   Run-1 C/C++ Header   Run-2 C++   Run-2 C/C++ Header
TrkParameterBase                      63                  561          11                  214
TrkParameters                       1715                  602           0                   52
TrkNeutralParameters                1425                  663           0                   48
ExtendedTrkParameterBase               0                  295           0                    0
ExtendedTrkParameters               1412                  514           0                    0
ExtendedTrkNeutralParameters        1416                  514           0                    0
Total                               6031                 3149          11                  266

- nice consequence: simplification of object persistency service (only one converter needed)
Additional cleanup campaigns
- removed lazy initialisation and dynamic memory allocation where possible (led to memory fragmentation)
- implemented type identifiers to avoid dynamic_cast testing
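A minimal sketch of the type-identifier idea from the last bullet, assuming a small hypothetical surface hierarchy. The enum SurfaceType, the type() accessor and cylinderRadius() are illustrative names, not the ATLAS interface.

```cpp
// Hypothetical type identifier carried by every surface.
enum class SurfaceType { Plane, Cylinder };

class Surface {
 public:
  virtual ~Surface() = default;
  // Each concrete class reports its own identifier.
  virtual SurfaceType type() const = 0;
};

class PlaneSurface final : public Surface {
 public:
  SurfaceType type() const override { return SurfaceType::Plane; }
};

class CylinderSurface final : public Surface {
 public:
  explicit CylinderSurface(double radius) : m_radius(radius) {}
  SurfaceType type() const override { return SurfaceType::Cylinder; }
  double radius() const { return m_radius; }

 private:
  double m_radius;
};

// Returns the cylinder radius, or -1 if the surface is not a cylinder.
// The identifier check replaces a dynamic_cast test; once the type is
// known, a static_cast is safe.
double cylinderRadius(const Surface& surface) {
  if (surface.type() != SurfaceType::Cylinder) return -1.0;
  return static_cast<const CylinderSurface&>(surface).radius();
}
```

The check costs one virtual call and an integer comparison instead of a run-time type lookup, which matters when it sits inside a hot loop.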
Optimising the software - ソフトウェアを最適化
Example: magnetic field access
- numerical (Runge-Kutta) field integration is one of the big CPU consumers
- ATLAS adaptive Runge-Kutta propagator has been highly optimised
  - dedicated version was back-ported into Geant4
- field access was not yet optimised
  - deep caller chain
  - field data needed conversion
  - was written in FORTRAN90
- new field service implemented
  - simplified caller chain
  - use native units
  - use cell caching to store value of field -> minimised cache misses
  - speed-up of 20% in simulation, few % in reconstruction
[Figure: magnetic field map in memory as 3D grid; field look-up in Runge-Kutta integration.]
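A minimal sketch of cell caching, reduced to one dimension for brevity (the field map on the slide is a 3D grid). The class CachedFieldMap and all of its members are hypothetical. The idea shown is that consecutive look-ups from one Runge-Kutta step usually fall into the same grid cell, so the cell's corner values are fetched once and reused.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical 1D field map: field values at equally spaced nodes along z.
class CachedFieldMap {
 public:
  CachedFieldMap(double zMin, double step, std::vector<double> values)
      : m_zMin(zMin), m_step(step), m_values(std::move(values)) {}

  // Linear interpolation inside the cell containing z.
  // Assumes z lies inside the grid; no bounds checking in this sketch.
  double field(double z) {
    if (!m_valid || z < m_cellLow || z >= m_cellLow + m_step) {
      fillCache(z);  // only reached when z leaves the cached cell
    }
    const double t = (z - m_cellLow) / m_step;
    return m_b0 + t * (m_b1 - m_b0);
  }

  // Number of times the cache had to be refilled from the full map.
  std::size_t cacheFills() const { return m_fills; }

 private:
  void fillCache(double z) {
    const std::size_t i = static_cast<std::size_t>((z - m_zMin) / m_step);
    m_cellLow = m_zMin + static_cast<double>(i) * m_step;
    m_b0 = m_values[i];
    m_b1 = m_values[i + 1];
    m_valid = true;
    ++m_fills;
  }

  double m_zMin;
  double m_step;
  std::vector<double> m_values;
  bool m_valid = false;
  double m_cellLow = 0.0;
  double m_b0 = 0.0;
  double m_b1 = 0.0;
  std::size_t m_fills = 0;
};
```

Three look-ups inside one cell trigger a single fill, and only a look-up in a neighbouring cell triggers the next one.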
Centralise tasks - タスクを一元化
Track to calorimeter cluster association is a frequent process in event reconstruction:
- clients throughout the combined reconstruction, e.g.: muon/tau/electron reconstruction, missing track Et, particle flow, photon reconstruction, etc.
- analysis showed that this was done up to six times per track in our factory design
- switched to a service design where all tracks are dressed with their calorimeter cell associations
- time saving from multiple calls allows freed CPU cycles to be invested into a more precise job
- neatly ties in with the new ATLAS analysis event data format (xAOD)
Track prediction through an example calorimeter cell
- Run-1/2: intersection with cell center
- Run-2: additionally entry/exit position and path length in cell
[Figure labels: parameters at vertex (defining); parameters at ID exit; parameters at Calorimeter exit.]
[Excerpt shown on the slide: "class TrackParticle that marks the analysis representation of tracking. The constructor of the TrackParticleBase shows the new philosophy that allows multiple representations of the underlying track within the detector, while keeping one ParametersBase object specifically outstanding to identify the track state where the four-momentum is defined." "Figure 6: The new ParticleBase object illustrated in an example based on the ATLANTIS [5] event display. The Track is hereby represented with one single TrackParticleBase object at three different stages in the detector: as a MeasuredPerigee expression close to the interaction point (defining parameters), through TrackParameters at the exit of the Inner Detector and the Calorimeter, respectively."]
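A minimal sketch of the service idea, assuming hypothetical types Track, CellAssociation and CaloAssociationService. The extrapolation itself is replaced by a placeholder. What the sketch shows is only that the expensive step runs once per track, however many clients ask for the result.

```cpp
#include <cstddef>
#include <unordered_map>
#include <vector>

// Hypothetical minimal track and association types.
struct Track {
  int id;
  double eta;
  double phi;
};

struct CellAssociation {
  std::vector<int> cellIds;
};

// Central service: every client asks here instead of extrapolating itself.
class CaloAssociationService {
 public:
  const CellAssociation& associate(const Track& track) {
    auto it = m_cache.find(track.id);
    if (it == m_cache.end()) {
      // First request for this track: do the expensive work once.
      it = m_cache.emplace(track.id, extrapolate(track)).first;
      ++m_extrapolations;
    }
    return it->second;
  }

  // Number of times the expensive extrapolation was actually run.
  std::size_t extrapolations() const { return m_extrapolations; }

 private:
  // Placeholder for the real track-to-calorimeter extrapolation.
  static CellAssociation extrapolate(const Track& track) {
    return CellAssociation{{track.id * 10, track.id * 10 + 1}};
  }

  std::unordered_map<int, CellAssociation> m_cache;
  std::size_t m_extrapolations = 0;
};
```

With six clients requesting the same track, the counter stays at one, which corresponds to the "up to six times per track" saving quoted on the slide.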
Being smarter - 賢く
Track reconstruction software for Run-1 was designed with a lot of redundancy and safety margin
Run-1 performance convinced us that our system was understood
- and extremely well modelled by MC
Re-investigation of track seeding
- taking high purity seeds from strip
- make optimal use of new innermost Pixel layer (IBL)
Greatly improved the seed purity
- managed to be more efficient in less time: ~25 % saving
Triple seeds can be built as:
- pixel space points only (PPP)
- strip space points only (SSS)
- a combination of both, e.g. (PSS)
[Figure labels: Step 1; Step 2: remove space points from Step 1.]

Efficiency of a seed with 3 space points resulting in a successful track candidate
<μ>    PPP    PPS    PSS    SSS
0      57%    26%    29%    66%
40     17%     6%     5%    35%

When requiring confirmation by another space point (Run-2 strategy)
<μ>    PPP+I  PPS+I  PSS+I  SSS+I
0      79%    53%    52%    86%
40     39%     8%    16%    70%
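A toy sketch of what "confirmation by another space point" could look like, under strong simplifying assumptions: the seed is treated as a straight line in the r-z plane (real seeds follow a helix in the magnetic field), and a seed counts as confirmed if any additional space point lies within a tolerance of that line. The types, the function isConfirmed and the tolerance are hypothetical and are not the ATLAS seeding code.

```cpp
#include <array>
#include <cmath>
#include <vector>

// Hypothetical space point in the r-z plane.
struct SpacePoint {
  double r;
  double z;
};

// Toy confirmation: extend the straight line through the first and last
// seed point and accept the seed if any candidate point (not part of the
// seed) lies within 'tolerance' in z of that line.
bool isConfirmed(const std::array<SpacePoint, 3>& seed,
                 const std::vector<SpacePoint>& candidates,
                 double tolerance) {
  const double slope = (seed[2].z - seed[0].z) / (seed[2].r - seed[0].r);
  for (const SpacePoint& sp : candidates) {
    const double predictedZ = seed[0].z + slope * (sp.r - seed[0].r);
    if (std::abs(sp.z - predictedZ) < tolerance) return true;
  }
  return false;
}
```

Random combinations of three points rarely have a fourth point on the same trajectory, which is the intuition behind the higher efficiencies in the second table.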
Doing better - より良いやって
During Run-1 we developed a neural network based cluster splitting
- aimed at identifying merged clusters stemming from multiple particles
- was run by default before the pattern recognition (ran over all clusters)
Second iteration
- update of the ambiguity solving method
- only clusters on track candidates are further tested for splitting (fewer hits into the pattern recognition)
- at the same time use this information to allow for more shared hits on track
- about 10% CPU gain
[Figure: algorithmic efficiency for τ versus τ pT [GeV] (0 to 1000), Baseline compared with TIDE. ATLAS Preliminary Simulation, τ → ν_τ 3π±, 2 Shared SCT Clusters, No Secondaries, ATL-PHYS-PUB-2015-006.]
Free lunch - フリーランチ
Some CPU saving came for free by updates
- new kernel in Scientific Linux 6 gave approximately 10% saving for total event reconstruction
- switching from 32bit to 64bit architecture did increase memory footprint slightly
- newer compiler version
- some vectorisation benefits that came into place via the Eigen library
  nota bene: track reconstruction is mainly operating on
    local coordinate systems: DIM 1,2
    global coordinate systems: DIM 3
    full track representation: DIM 5
  this is not optimal for a DIM 4 based vector register
  [we are currently revisiting a potential DIM 4 based Runge-Kutta method]
- switch to Intel math library (pre-loaded as a plug-in)
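A small worked example of why these dimensions fit poorly into a four-wide vector register, assuming a simple model in which a vector of DIM values is padded up to a whole number of registers. The helper names are hypothetical, and real compilers or Eigen may pack data differently.

```cpp
#include <cstddef>

// Number of registers of the given width needed to hold 'dim' values.
constexpr std::size_t registersNeeded(std::size_t dim, std::size_t width = 4) {
  return (dim + width - 1) / width;  // integer ceiling of dim / width
}

// Fraction of register lanes that carry useful data.
constexpr double laneUtilisation(std::size_t dim, std::size_t width = 4) {
  return static_cast<double>(dim) /
         static_cast<double>(registersNeeded(dim, width) * width);
}
```

In this model a DIM 5 track representation needs two registers and fills 5 of 8 lanes (62.5%), a DIM 3 global vector fills 3 of 4 lanes (75%), and only DIM 4 fills the register completely.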
The factor 4 - 因子4
[Figure: reconstruction time per event [s] versus software release (17.2, 32bit; 19.0, 64bit; 19.1, 64bit; 20.1, 64bit), for full reconstruction and Inner Detector only, with the development timeline overlaid. ATLAS Simulation Preliminary, RDO to ESD, √s = 14 TeV, <µ> = 40, 25 ns bunch spacing, Run 1 geometry, pp → tt̄, HS06 = 13.08.]
Timeline, from LHC Run-1 through LS 1 Planning & Deployment to LHC Run-2:
- Nov 2012: Tracking SW workshop, Run-2 planning
- Jan 2013: Eigen/algebra tests
- > 1 year w/o working head release
- Oct 2013: Tracking SW workshop, LS 1 Mid-term
- integration, mag field, pattern updates
- TIDE changes
- March/April 2015: Run-2 release frozen
Today and Tomorrow - 今日と明日
LS1 gave a unique opportunity to clean up the ATLAS track reconstruction software
- massive campaign with a rework of almost the entire repository
- mixture of technology improvements, algorithmic improvements and simple code cleanup
- disentangling the impact of the different projects is almost impossible: due to the time constraints of LS1 we had to develop and deploy in parallel
We achieved a factor 4 speed-up of the overall event reconstruction time
- at the same time improving physics performance on all ends
- mainly achieved by the Inner Detector track reconstruction
- ready for Run-2 data taking
Currently reviewing the tracking software for the future ATLAS framework
- anticipate extensive use of concurrency
ご静聴ありがとうございました (Thank you for your kind attention)