Linux and the Higgs Particle




Linux and the Higgs Particle. Dr. Bernd Panzer-Steindel, Computing Fabric Area Manager, CERN/IT. Linux World, Frankfurt, 27 October 2004

Outline: What is CERN; The Physics; The Physics Tools (The Accelerator, The Detectors); The Computing Tools (The Local Computing Fabric, The World Wide GRID)

The Institute CERN

CERN, Conseil Européen pour la Recherche Nucléaire (European Organisation for Particle Physics): a basic research laboratory and the world's largest particle physics centre. Founded in 1954, celebrating its 50th anniversary this year! Located astride the French-Swiss border near Geneva (Switzerland). 2700 staff members and fellows plus ~6500 visitors on-site; ~1000 MCHF (~700 MEuro) annual budget.

CERN has some 6,500 visiting scientists from more than 500 institutes and 80 countries around the world. Europe: 267 institutes, 4663 users; elsewhere: 238 institutes, 1832 users.

The Physics

Particle Physics Establish a periodic system of the fundamental building blocks and understand forces

The standard model of particle physics: the Standard Model, the unification of three out of the four fundamental interactions. A great success, verified with a precision of 0.1%, built on the constant interplay of theory and experiment; but it has too many free input parameters and makes nonsensical predictions at very high energies.

The Higgs Particle: the inclusion of the Higgs mechanism into the standard model fixes quite a few problems. The vacuum is not empty but filled with a Higgs particle condensate; all particles collide with the Higgs particles as they move through the vacuum. This acts like molasses, slowing the particles down and giving them mass. It is one of the key elements in completing the standard model.

Open Questions: Why do the parameters have the sizes we observe? What gives the particles their masses? How can gravity be integrated into a unified theory? Why is there only matter and no anti-matter in the universe? Are there more space-time dimensions than the 4 we know of? What are dark energy and dark matter, which make up 98% of the universe? Finding the Higgs and possible new physics with the LHC will give the answers!

The Physics Tools 1. The Accelerator

Methods of Particle Physics The most powerful microscope Creating conditions similar to the Big Bang

The principal accelerator machine components

The Large Hadron Collider LHC

View of the LHC Experiments

The LHC accelerator: the largest superconducting installation in the world. A 27-kilometre ring with two beam tubes; 15-metre-long dipole magnets operating at -271 °C; 1700 superconducting magnets; 7000 kilometres of superconducting cable (niobium-titanium in a copper matrix); 13000 amps; 8.3 Tesla magnetic field.

Precision: the 27 km circumference of the ring is sensitive to changes of <1 mm, caused by tides, rainfall and stray currents.

The Physics Tools 2. The Detectors

The ATLAS Experiment: diameter 25 m, barrel toroid length 26 m, end-wall chamber span 46 m, overall weight 7000 tons.

The ATLAS Cavern: 140000 m³ of rock removed, 53000 m³ of concrete, 6000 tons of steel reinforcement; 53 metres long, 30 metres wide, 35 metres high (a 10-storey building).

The CMS Magnet

The Dataflow of an Experiment

Data Rates: the on-line system uses a multi-level trigger to filter out background and reduce the data volume, running 24 x 7. Collision rate 40 MHz (~1000 TB/sec of detector data); Level 1, special hardware: 75 kHz (75 GB/sec); Level 2, embedded processors: 5 kHz (5 GB/sec); Level 3, a farm of commodity CPUs: 100 Hz (100 MB/sec) to data recording and offline analysis.
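The reduction factors behind these numbers can be recomputed directly. The following minimal Python sketch (mine, not from the talk) derives them from the rates and data volumes quoted above; the ~1 MB-per-event size after Level 1 is inferred from those numbers, not stated on the slide.

    # Reduction factors along the trigger chain, from the slide's numbers.
    levels = [
        # (name, event rate in Hz, data volume in MB/s)
        ("Collision rate (detector)",    40_000_000, 1_000_000_000),  # ~1000 TB/s
        ("Level 1: special hardware",        75_000,        75_000),  # ~75 GB/s
        ("Level 2: embedded processors",      5_000,         5_000),  # ~5 GB/s
        ("Level 3: commodity CPU farm",         100,           100),  # ~100 MB/s recorded
    ]

    for (name, rate, vol), (_, prev_rate, _) in zip(levels[1:], levels[:-1]):
        print(f"{name:32s} keeps ~1 event in {prev_rate // rate:,}; "
              f"{vol / 1000:.3g} GB/s, ~{vol / rate:.0f} MB per event")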

Particle physics data, from raw data to physics results (illustrated with an e+ e- -> Z0 -> f f-bar event): raw data are converted into physics quantities. Reconstruction: detector response, applying calibration and alignment, pattern recognition, particle identification. Simulation (Monte-Carlo): basic physics, fragmentation, decay, interaction with the detector material. Analysis: physics analysis of the reconstructed quantities, leading to the physics results.

A Photo of a proton-proton collision (Event)

LHC data: 40 million collisions per second. After filtering, 100-200 collisions of interest per second, of which 1-10 are "good" events. 1-10 Megabytes of data digitised for each collision gives a recording rate of 0.1-1 Gigabytes/sec; ~10^10 collisions recorded each year means ~15 Petabytes/year of data. For scale: 1 Megabyte (1 MB) is a digital photo; 1 Gigabyte (1 GB) = 1000 MB is a DVD movie; 1 Terabyte (1 TB) = 1000 GB is the world annual book production; 1 Petabyte (1 PB) = 1000 TB is the annual production of one LHC experiment; 1 Exabyte (1 EB) = 1000 PB is the world annual information production. (Experiments: CMS, LHCb, ATLAS, ALICE.)
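As a sanity check on these figures, a short back-of-the-envelope calculation (mine, not the speaker's) reproduces the recording rate and the order of the annual volume; the ~10^7 seconds of effective running time per year is an assumption, not a number from the slide.

    # Back-of-the-envelope check of the LHC data-rate figures above.
    collisions_per_sec = 150       # "100-200 collisions of interest per second"
    event_size_mb = 5              # "1-10 Megabytes of data digitised per collision"
    seconds_per_year = 1e7         # assumed effective accelerator running time

    rate_gb_s = collisions_per_sec * event_size_mb / 1000
    volume_pb = collisions_per_sec * seconds_per_year * event_size_mb / 1e9

    print(f"recording rate ~{rate_gb_s:.2f} GB/s")                    # within the quoted 0.1-1 GB/s
    print(f"annual volume  ~{volume_pb:.0f} PB/year per experiment")  # same order as the quoted ~15 PB/year overall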

The Computing Tools 1. The Local Computing Fabric

Challenge: a large, distributed community (ATLAS, CMS, LHCb, ...). Offline software effort: 1000 person-years per experiment; software life span: 20 years; ~5000 physicists around the world, around the clock.

Data Handling and Computation for Physics Analysis: the detector feeds the event filter (selection & reconstruction), producing raw data; reconstruction turns raw data into event summary data, and event reprocessing repeats this step; batch physics analysis of the processed data produces analysis objects (extracted by physics topic) for interactive physics analysis; event simulation feeds simulated data into the same chain.

Requirements and Boundaries (I): High Energy Physics applications require integer processor performance more than floating-point performance, which drives the choice of processor type and benchmark reference. A large amount of processing and storage is needed, but the optimisation is for aggregate performance rather than single tasks, and the events are independent units: many components, moderate demands on each single component, coarse-grain parallelism. Basic infrastructure and environment, i.e. availability of space, cooling and electricity, is a heavy investment: don't underestimate it.

Requirements and Boundaries (II): the major boundary condition is cost, staying within the budget envelope while obtaining the maximum amount of resources: commodity equipment with the best price/performance, not simply the cheapest! Reliability, functionality and performance must be taken into account together, i.e. total cost of ownership. The workload is chaotic: batch & interactive, a research environment where physics analysis proceeds by collective, iterative discovery, with unpredictable data access and no practical limit to the requirements.

View of the different fabric areas: automation, operation and control (installation, configuration + monitoring, fault tolerance); infrastructure (electricity, cooling, space); storage system (AFS, CASTOR, disk servers); benchmarks, R&D and architecture (prototypes, testbeds); batch system (LSF, CPU servers); network; GRID services!?; purchasing, hardware selection and resource planning; and the coupling of all components through hardware and software.

The current CERN fabric architecture is in general based on commodity components: dual Intel processor PC hardware for CPU, disk and tape servers; a hierarchical Ethernet (100, 1000, 10000 Mbit/s) network topology; NAS disk servers with ATA/SATA disk arrays; the RedHat Linux operating system; medium-end (linear) tape drive technology; open-source software for storage (CASTOR, OpenAFS) and cluster management (Quattor, Lemon, ELF); and commercial software packages (LSF, Oracle).

Level of complexity: the building blocks range from the single CPU PC and disk storage tray, NAS server or SAN element, through clusters and their couplings, up to the world-wide cluster. Hardware: motherboard, backplane, bus, integrated devices (memory, power supply, controller, ...). Physical and logical coupling: network (Ethernet, Fibre Channel, Myrinet, ...), hubs, switches, routers. Software: operating system (Linux), drivers, applications, batch system (LSF), mass storage (CASTOR), filesystems (AFS), control software. At the top, Grid-fabric interfaces, the wide area network (WAN), Grid middleware, monitoring and firewalls (services) couple fabrics into the world-wide cluster.

Building the farm: desktop-class processors packaged as a node == CPU server; CPU server + larger case + 6*2 disks == disk server; CPU server + Fibre Channel interface + tape drive == tape server. All using the Linux OS.

Today's schematic network topology: CPU servers connect with Fast Ethernet (100 Mbit/s) and disk and tape servers with Gigabit Ethernet (1000 Mbit/s) to a backbone of multiple Gigabit Ethernet links (20 * 1000 Mbit/s), with a Gigabit Ethernet WAN connection. Tomorrow's schematic network topology: CPU servers connect with Gigabit Ethernet (1000 Mbit/s) and disk and tape servers with 10 Gigabit Ethernet (10000 Mbit/s) to a backbone of multiple 10 Gigabit Ethernet links (200 * 10000 Mbit/s), with a 10 Gigabit Ethernet WAN connection.
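To make the jump between the two topologies concrete, here is a small illustrative calculation (not part of the talk) of the aggregate backbone capacity in each case, using the link counts and speeds given above.

    # Aggregate backbone capacity of the two network topologies above.
    def aggregate_gbit(links, gbit_per_link):
        return links * gbit_per_link

    today = aggregate_gbit(20, 1)        # 20 * 1 Gbit/s Gigabit Ethernet links
    tomorrow = aggregate_gbit(200, 10)   # 200 * 10 Gbit/s 10 Gigabit Ethernet links

    print(f"today:    {today} Gbit/s  (~{today / 8:.1f} GBytes/s)")
    print(f"tomorrow: {tomorrow} Gbit/s (~{tomorrow / 8:.0f} GBytes/s)")
    print(f"growth factor: {tomorrow // today}x")

The factor of 100 in backbone capacity is what makes the ~280 GBytes/s figure quoted later for the 2008 computer centre plausible.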

General fabric layout: the main fabric cluster (2-3 hardware generations, 2-3 OS/software versions, 4 experiment environments: old, current, new) plus service control and management (e.g. stager, HSM, LSF master, repositories, GRID services, CA, etc.); a certification cluster, the main cluster en miniature; a development cluster and GRID testbeds for new software and new hardware (purchases); an R&D cluster for new architectures and hardware; and a benchmark and performance cluster for the current architecture and hardware.

Software glue: management of the basic hardware and software, i.e. the installation, configuration and monitoring system (from the European DataGrid project); management of the processor computing resources, i.e. the batch system (LSF from Platform Computing); and management of the storage (disk and tape), i.e. CASTOR, the CERN-developed Hierarchical Storage Management system.
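For illustration, this is roughly how a physics job would be handed to the LSF batch system mentioned above, via the standard bsub command; the queue name, job name and script are made-up placeholders, not CERN's actual configuration, and the sketch assumes a node with the LSF client installed.

    # Hedged sketch: submitting a shell script to the LSF batch system.
    import subprocess

    def submit_job(script="run_analysis.sh", queue="1nd", jobname="higgs_ana"):
        """Submit a script via bsub and return its confirmation message."""
        cmd = ["bsub",
               "-q", queue,              # target batch queue (placeholder name)
               "-J", jobname,            # job name, visible in bjobs
               "-o", f"{jobname}.out",   # file receiving the job's output
               script]
        result = subprocess.run(cmd, capture_output=True, text=True, check=True)
        return result.stdout.strip()     # e.g. "Job <12345> is submitted to queue <1nd>"

    # print(submit_job())   # only works where the LSF client is installed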

Linux: Linux is our choice as the OS for all LHC computing, using the RedHat Enterprise version, with our own 4-person support team. Linux is deployed on ~2000 farm PCs and 1500 desktop nodes. We are still trying to sort out an efficient TCO (total cost of ownership) model: stability versus new features, problem tracking and bug fixes, community support versus licences and support contracts. Boundary conditions: support of old versions; a heterogeneous user community that can't move to new versions easily; a long and complicated certification process for a new version; several third-party products to be supported.

The CERN Computing Centre: ~4000 processors, ~400 TBytes of disk, ~12 PB of magnetic tape. Even with technology-driven improvements in performance and costs, CERN can provide nowhere near enough capacity for LHC!

Considerations: the current state of performance, functionality and reliability is good, and technology developments still look promising, so more of the same for the future!?!? How can we be sure that we are following the right path? How do we adapt to changes?

Strategy: continue and expand the current system, BUT in parallel pursue R&D activities (SAN versus NAS, iSCSI, IA64 processors, ...), technology evaluations (InfiniBand clusters, new filesystem technologies, ...), and Data Challenges to test scalability on larger scales, bringing the system to its limit and beyond; we are already very successful with this approach, especially with the "beyond" part. And watch market trends carefully.

Challenges: 1. Status of the current system: is the stability of the equipment acceptable? Stress-test the equipment; where and what are the weak points / bottlenecks? 2. Physics Data Challenges: test the bookkeeping, organisation and management of data processing. 3. Computing Data Challenge: test the scalability of software and hardware in the fabric; try to verify whether the current architecture would survive the anticipated load in the LHC era.

Dataflow in the local CERN fabric, 2007: a complex organisation with high data rates (~10 GBytes/s) and ~100k streams in parallel. [Diagram: the online filter farm (HLT) delivers raw and calibration data to permanent disk storage and the reconstruction farm; calibration and analysis farms exchange raw, calibration, ESD and AOD data with the disk storage pools; the data flow onward to tape storage and to the Tier-1 data export.]

High Throughput Prototype (openlab + LCG prototype, specific layout, October 2004): 2 * 50 Itanium 2 nodes (dual 1.3/1.5 GHz, 2 GB memory; oplapro0xx and tbed00xx, 10 GE per node); 80 IA32 CPU servers (dual 2.4 GHz P4, 1 GB memory), 40 IA32 CPU servers (dual 2.4 GHz P4, 1 GB memory) and 80 IA32 CPU servers (dual 2.8 GHz P4, 2 GB memory), 1 GE per node; 36 disk servers (dual P4, IDE disks, ~1 TB disk space each; lxsharexxxd); 20 TB IBM StorageTank (lxs50xx); 2 * 12 tape servers with STK 9940B drives; interconnected by 4 Enterasys N7 10 GE switches and 2 Enterasys X-Series, with 4 * GE connections to the backbone and a 10 GE WAN connection.

IT Data Challenge performance: CPU, disk and tape running in parallel with the increasing production service, sustaining 920 MB/s on average; a dip marks a daytime tape server intervention. [Plot: throughput in GBytes/s (0 to 1.4) versus time in minutes.]

CERN computer centre in 2008: a hierarchical Ethernet network tree topology (280 GBytes/s); ~8000 mirrored disks (4 PB); ~4000 dual-CPU nodes (20 million SI2000); ~170 tape drives (4 GB/s); ~25 PB of tape storage; estimated investment in 2006-2008: ~50 million Euro. All numbers hold only IF the exponential growth rate continues (Moore's law)!
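The "IF the exponential growth rate continues" caveat can be quantified. The short calculation below (mine, for illustration) derives the annual growth factor and doubling time implied by going from the 2004 capacities on the earlier slide (~400 TB disk, ~12 PB tape) to these projected 2008 figures (~4 PB disk, ~25 PB tape).

    # Implied growth rates behind the 2004 -> 2008 capacity projection.
    import math

    def implied_growth(start, end, years):
        annual = (end / start) ** (1 / years)                  # growth factor per year
        doubling_months = 12 * math.log(2) / math.log(annual)  # months to double at that rate
        return annual, doubling_months

    for name, start, end in [("disk (TB)", 400, 4000), ("tape (PB)", 12, 25)]:
        annual, months = implied_growth(start, end, years=4)
        print(f"{name:10s}: x{annual:.2f} per year, doubling every ~{months:.0f} months")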

The Computing Tools 2. The World Wide GRID

Why the GRID? The CERN computer centre can deliver only a fraction (~10%) of the CPU/disk capacity needed for the analysis of the huge amount of data delivered by the LHC experiments. We need a transparent mechanism for physicists to run their analysis jobs anywhere in the world.

What is a Grid? Scavenging unused cycles has been going strong since 1986 (e.g. the Berkeley Open Infrastructure for Network Computing), but it is not so easy to scavenge unused storage.

What is the Grid? Resource sharing on a global scale, across labs and universities. Secure access, which needs a high level of trust. Resource use: load balancing, making the most efficient use of what is there. The "death of distance", which requires excellent networking (the slide cites transfers of 5.44 Gbps, 1.1 TB in 30 min, and 6.25 Gbps on 20 April 2004). Open standards, which allow constructive distributed development. There is not (yet) a single Grid.

The GRID middleware: how will it work? It finds convenient places for the scientist's job (computing task) to be run; optimises the use of the widely dispersed resources; organises efficient access to scientific data; deals with authentication to the different sites that the scientist will be using; interfaces to local site authorisation and resource allocation policies; runs the jobs; monitors progress; recovers from problems; and tells you when the work is complete and transfers the results back!

Virtual Organizations for LHC and others (ATLAS VO, CMS VO, BioMed VO, ...): a coupling of computer centres.

A job submission example: from the User Interface (UI), a job described in JDL is sent with its input sandbox to the Resource Broker, which consults the Information Service, the Data Management Services (LFN->PFN resolution) and Authorisation & Authentication, selects a Compute Element and passes the job on via the Job Submission Service together with the brokerinfo and input sandbox. The running job reads and writes data on a Storage Element; Logging & Bookkeeping tracks the job status, which the user can query; finally the output sandbox is returned to the user.
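To make this concrete, here is an illustrative sketch (not an official LCG example) of the user-side submission: a small JDL description is written out and handed to the broker with the EDG command-line tools. The file names, the executable and the requirement expression are placeholders.

    # Illustrative job submission from an LCG User Interface node.
    import subprocess, textwrap

    jdl = textwrap.dedent("""\
        Executable    = "analysis.sh";
        StdOutput     = "std.out";
        StdError      = "std.err";
        InputSandbox  = {"analysis.sh", "cuts.conf"};
        OutputSandbox = {"std.out", "std.err", "histos.root"};
        Requirements  = other.GlueCEPolicyMaxCPUTime > 720;
    """)

    with open("analysis.jdl", "w") as handle:
        handle.write(jdl)   # job description to be matched by the Resource Broker

    # These commands exist only where the LCG User Interface software is installed:
    # subprocess.run(["edg-job-submit", "-o", "jobid.txt", "analysis.jdl"], check=True)
    # subprocess.run(["edg-job-status", "-i", "jobid.txt"], check=True)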

High Energy Physics: leading and leveraging Grid technology. Many national and regional Grid projects (GridPP (UK), INFN-Grid (I), NorduGrid, DutchGrid, US projects) and European projects.

The LHC Computing Grid Project (LCG). Collaboration: the LHC experiments, Grid projects in Europe and the US, regional & national centres. Choices: adopt Grid technology; go for a Tier hierarchy; use Intel CPUs in standard PCs; use the Linux operating system. Goal: prepare and deploy the computing environment to help the experiments analyse the data from the LHC detectors. [Diagram: CERN Tier-0 at the centre, Tier-1 centres (e.g. USA, Italy, UK, France, Japan, Germany, Taipei), Tier-2 grids for regional groups and physics study groups, down to Tier-3 physics department clusters and desktops.]

LHC Computing Model (simplified!!). Tier-0, the accelerator centre: filter the raw data, reconstruct summary data (ESD), record raw data and ESD, and distribute raw data and ESD to the Tier-1s. Tier-1: permanent storage and management of raw, ESD, calibration data, metadata, analysis data and databases; a grid-enabled data service; data-heavy analysis; re-processing of raw data to ESD; national and regional support; online to the data acquisition process; high availability, long-term commitment, managed mass storage. Tier-2: well-managed, grid-enabled disk storage; simulation; end-user analysis, batch and interactive; high-performance parallel analysis (PROOF). Data distribution from Tier-0: ~70 Gbits/s. Below the Tier-2s: small centres, desktops and portables. (The diagram shows Tier-1 and Tier-2 sites such as TRIUMF, RAL, IN2P3, FNAL, CNAF, FZK, PIC, BNL, Taipei, IFCA, UB, MSU, IC, Cambridge, Budapest, Prague, Legnaro, CSCS, Rome, CIEMAT, Krakow, NIKHEF, ICEPP, USC.)

Challenges: service quality (reliability, availability, scaling, performance); security, our biggest risk; management and operations, the grid being a collaboration of computing centres. Maturity is some years away: a second (or third) generation of middleware will be needed before LHC starts. In the short term there will be many grids and middleware implementations for LCG, so inter-operability will be a major headache. How homogeneous does it need to be? Standards help to avoid adapters.

The Summary

The scientific collaborations are large, global, and already in place. There will be a lot of data: complex data handling and a large amount of storage, tens of PB, and that will need a lot of processing power, of the order of 100k processors. The vast majority of the PCs will use Linux as the operating system, a key element of the architecture. We need to pay attention to market developments; technology is of secondary concern. We need to have the computing facility in perfect operational shape by the end of 2006, so there is not much time left for such a complex operation. A utility grid looks like a very good fit for the LHC, and the LHC looks like an ideal pilot application for a utility grid.