Novel, Highly-Parallel Software for the Online Storage System of the ATLAS Experiment at CERN: Design and Performance

Tommaso Colombo (a, b), Wainer Vandelli (b)
(a) Università degli Studi di Pavia, (b) CERN

IEEE Real-Time Conference, 12 June 2012

The ATLAS Trigger & DAQ System

Event rates, design (2012 peak):
- Level 1 trigger (custom hardware): 40 MHz (20 MHz) input, 2.5 µs latency, 75 kHz (~65 kHz) accept rate
- Level 2 trigger: ~40 ms (~45 ms) latency, 3 kHz (~5.5 kHz) accept rate
- Event Filter: ~4 s (~1 s) latency, ~200 Hz (~800 Hz) accept rate

Data rates, design (2012 peak):
- ATLAS event size: 1.5 MB per 25 ns bunch crossing (1.6 MB per 50 ns)
- Detector readout (~150 Readout System nodes): ~110 GB/s (~105 GB/s)
- Event building (~100 Event Builder nodes): ~4.5 GB/s (~9 GB/s)
- Data logging (5 Data Logger nodes) to CERN permanent storage: ~300 MB/s (~1100 MB/s)

The Level 2 and Event Filter farms comprise ~5000 processing units each, connected through the Data Collection and Event Filter networks.

The current Data Logging system: overview

Purpose: 5 PCs receive data from the Event Filter system and write it to local disks. Each event is:
- analyzed to determine the tags applied by the Event Filter trigger
- processed (e.g. compressed)
- written to the appropriate file(s) according to its tags

Details:
- The event tags are determined by the trigger algorithms based on the event content.
- To facilitate off-line data distribution, every event is written to multiple files, one per tag.
- The file checksum is calculated while writing: CPU-intensive!


The current Data Logging system: limitations

The current Data Logger implementation is essentially single-threaded:
- multiple network I/O threads receive the events from the EF and put them on an event queue
- a single thread gets the events and does all the processing and writing

This design is very unlikely to scale: the maximum processing throughput is ~500 MB/s, comparable with the I/O (network and disk) limits.

It is also a major blocker for the addition of new features requiring more CPU power than a single core can provide. Because of this, event-level data compression currently needs to be performed off-line, as an additional step.


New design: general considerations

The data processing workload is embarrassingly parallel: the incoming data are already divided into events.

Constraint: the raw data file format is strictly sequential, so it is impossible to do concurrent writes to the same file. This sequential format is necessary to calculate the overall file checksum as the file is written to disk, which aids in the detection of write errors, and it keeps the format complexity to a minimum.

Consequence: multiple events can be written to different raw data files concurrently, but no more than one event can be written to each data file at once.

New design: idea

Split the workload into tasks. For each event:
- one task does the processing
- multiple tasks do the writing (for each tag, a task writes the event to the corresponding file)

Use a single thread pool to execute the tasks, and schedule the tasks cleverly to avoid locking. At any given time:
- any number of processing tasks can run
- for each raw data file, only one task writing to it can run


New design: finally, a diagram!

[Diagram: incoming events spawn processing tasks (PT) on a shared execution queue, served by the thread pool. The raw file manager keeps one queue of writing tasks (WT) per file (eγ, μ, jets); for each file, only one writing task is scheduled at a time, and each completed task notifies the scheduler. Legend: PT nn = processing task for event nn; WT nn-aa = writing task for stream aa of event nn.]

New design: implementation with Threading Building Blocks

The new design was implemented using (and inspired by) the open-source C++ library Intel Threading Building Blocks (TBB):
- task-based multi-threading: the task execution queue and the thread pool are provided by the TBB task scheduler
- thread-safe containers optimized for concurrency: the execution queue is a concurrent queue, and the per-file writing-task queues are kept in a concurrent hash map

Performance evaluation: resource utilization

The new implementation was tested and compared with the old one:
- in the current production system
- in a testbed with older hardware

A single Data Logger machine was operated at saturation. The dataset consisted of actual event data, with 1 to 4 tags assigned to each event. Changing the number of tags per event changes the number of files the Data Logger has to write each event to, but does not change the required network bandwidth.

Testbed Data Logger PC: 2x dual-core Xeon 5130, 4 GB RAM, 3x 3ware RAID5 arrays, 2x GbE NICs.
Production Data Logger PC: 2x quad-core Xeon E5520, 24 GB RAM, 3x Adaptec RAID5 arrays, 2x GbE NICs.

Performance evaluation: resource utilization

The hard limit on the throughput of a single Data Logger is given by the network bandwidth: 2 Gb/s = 250 MB/s.

Old single-threaded implementation:
- can operate at network saturation only for a single tag per event
- above 2 tags per event, the load generated by its single thread exceeds what a single CPU core can sustain, and the throughput decreases accordingly

New multi-threaded implementation:
- the throughput is almost unaffected by the load
- its 4 threads spread the workload over the 4 CPU cores: none of them uses more than 60% of a core
- this leaves plenty of headroom for additional CPU-intensive processing, e.g. compression

Performance evaluation: scalability (in testbed)

On-line event compression (with zlib) radically changes the landscape. The time spent compressing events (~50 ms per MB) dominates the rest of the processing (~2 ms per MB):
- throughput is much lower
- all workloads saturate the CPU

One can therefore examine the throughput as a function of the number of CPU cores (threads) used: scaling is (almost) linear.


Performance evaluation: scalability (in production)

[Results shown as a figure in the original slides.]

Conclusions

A novel design for the ATLAS Data Logging application was implemented and thoroughly tested. The performance of the new software is very satisfactory; it:
- taps into the full power of modern CPUs
- future-proofs the Data Logger
- enables the addition of computationally-intensive features

It will be one of the essential components of the evolved system currently being developed to meet the challenges of LHC data-taking in 2014 and beyond.

Backup

Other design constraints

Operations are driven by the received event data: the Data Logger can only rely on the information it gathers by examining the received events.

No assumptions can be made about the data flow: the rate of received events cannot be assumed to be balanced across the spectrum of possible tags. The flow of events with a given tag can vary during a run, and may even stop completely.

Other possible designs: thread pool with locking

[Diagram: processing threads get events from the event queue and ask the raw file manager for exclusive access to the per-stream, per-luminosity-block file before saving each event.]

Other possible designs: chain of responsibility

[Diagram: processing threads arranged in a chain, each responsible for certain streams; events are passed along the chain until a responsible thread saves them.] Risk of starvation!

Other possible designs: one thread pool per file

[Diagram: a separate pool of processing threads for each per-stream, per-luminosity-block file, all fed from the event queue.] Too many threads!

zlib performance evaluation

[Results shown as a figure in the original slides.]