GRID Computing at the LHC: Science without Borders
Kajari Mazumdar, Department of High Energy Physics, Tata Institute of Fundamental Research, Mumbai.
Disclaimer: I am a physicist whose research field drives and utilizes cutting-edge technology in electronics, communication, ...
Dr. Paul's Engineering College, Velucherry. September 12, 2011
Basic idea (G. Gilder): when the network is as fast as the computer's internal links, the machine disintegrates across the net into a set of special-purpose appliances.
Plan of talk:
- Requirements of today's scientific community
- The Grid concept in simple terms
- Evolution of the Grid
- The LHC Computing Grid and the CMS experiment
- The CMS Tier-2 Grid Computing Centre at TIFR, Mumbai
- Outlook
Computing requirements and challenges
Today's science is based on computation, data analysis, data visualization, ...
1. Scientific and engineering problems are getting ever more complex.
2. Collaborations are becoming larger.
Computer simulation and modelling is more cost-effective than experimental methods in some cases (e.g. reactor safety, aircraft design).
Users need more accurate and precise solutions to their problems in the shortest time possible (e.g. weather forecasts).
Recent years have seen mammoth scientific projects where the data size is several petabytes per year (e.g. the LHC experiments), to be used by several thousand people.
To work with a colleague, even across a campus, at the petabyte (10^15 bytes) scale, we need ultrafast networks.
Even though CPU power, disk storage, and communication speed continue to increase, computing resources are failing to satisfy users' demands!
Current trends in scientific computing
1. Free, open-source software: GNU/Linux-based operating systems have been developed deliberately, with many applications. Research and academic institutes use cheaper PC clusters to achieve high performance; it is easy to develop loosely coupled distributed applications. Software has to catch up with users' demands and expectations for high-end computing.
2. Parallel computing: multiple computers or processors working together on a common task -- each processor works on its section of the problem -- and processors are allowed to exchange information among themselves. Two big advantages of parallel computers: performance and memory.
3. Internet computing using idle PCs is becoming an important computing platform (LHC@home, SETI@home, Napster, ...). The WWW is a promising candidate for the core component of a wide-area distributed computing environment: efficient client/server models and protocols; transparent networking, navigation, and GUIs with multimedia access and dissemination for data visualization.
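The idea in item 2, each processor working on its own section of the problem and then exchanging results, can be shown with a tiny parallel sum (a generic Python sketch, not tied to any grid software; the function names are illustrative):

```python
# Each worker process sums one section of the data; the partial
# results are then exchanged and combined into the final answer.
from concurrent.futures import ProcessPoolExecutor

def partial_sum(section):
    """Work done independently by one processor."""
    return sum(section)

def parallel_sum(data, n_workers=4):
    # Split the problem into one section per processor.
    size = len(data) // n_workers
    sections = [data[i * size:(i + 1) * size] for i in range(n_workers - 1)]
    sections.append(data[(n_workers - 1) * size:])  # last section takes the remainder
    with ProcessPoolExecutor(max_workers=n_workers) as pool:
        partials = pool.map(partial_sum, sections)  # sections processed in parallel
    return sum(partials)                            # combine the exchanged results

if __name__ == "__main__":
    print(parallel_sum(list(range(1_000_001))))     # -> 500000500000
```

The same split/combine pattern underlies both cluster computing and the volunteer platforms (LHC@home, SETI@home) mentioned above.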
Grid computing in simple words
The Grid is a utility or infrastructure for complex, huge computations, where remote resources are accessible through the web (internet), from a desktop, laptop, or mobile phone. It is similar to the electrical power grid, where the user does not have to worry about the source of the power.
Imagine millions of computers, owned by individuals and institutes in various countries across the world, connected to form a single, huge supercomputer!
This technology, developed over just the last decade, is being used by
--- high energy physicists to store and analyze the data being produced by the LHC experiments at CERN, Geneva, Switzerland;
--- Earth scientists to monitor ozone-layer activity;
--- biologists to monitor the behaviour of bees;
--- ...
It is the natural evolution of the internet.
Going back: the World Wide Web, information sharing
Invented at CERN by Tim Berners-Lee (around 1990) for use in high energy physics experiments; it quickly crossed over into public use. Agreed protocols, like HTTP; anyone can access information and post their own.
The GRID is changing the way science is being done. High-speed networking over large distances has been the key aspect of the GRID.
From Web to Grid computing: working together apart
Use of the internet as infrastructure, and advanced web services for seamless integration.
1. Sharing more than just information: data, computing power, and applications in dynamic, multi-institutional, virtual organizations (tools: email, video conference, webcast, whiteboard).
2. Efficient use of major and minor resources at many institutes; people from many institutions working to solve a common problem; ensuring data is accessible anywhere and anytime.
3. Interactions with the underlying layers need to be transparent and seamless to the user.
4. Harnessing the power of the internet to aggregate and share resources spread across the globe: both challenging and highly cost-effective, it can give unlimited capability. It must grow rapidly, yet remain reliable for more than a decade.
Large Hadron Collider (LHC): the largest scientific project ever
20 years to plan and build, 20 years to work with. 27 km circumference, at 1.9 K, at 10^-13 Torr, 50-175 m below the surface; more than 10,000 magnets. 4 big experiments, with about 10,000 scientists and 3,000 students and engineers. Operational since Q4 2009: excellent performance, a fast harvest of science!
[Figure: timeline of the universe. The LHC probes conditions at ~10^-12 seconds after the Big Bang (p-p collisions) and ~10^-6 seconds (Pb-Pb collisions); experiments in astrophysics and cosmology (COBE, 1989; WMAP, 2001) probe from ~300,000 years up to today.]
In hard numbers
The LHC produces 6-8 hundred million proton-on-proton collisions per second, for several years. Only 1 in ~20 thousand collisions will have an important tale to tell, but we do not know which one, so we have to search through all of them! A huge task: ~15 petabytes (10^15 bytes) of data a year. Analysis requires ~100,000 computers to get results in a reasonable time. GRID computing is essential.
Complexity of the LHC experiments
When two very high energy protons collide at the LHC, the result is a very crowded situation. In a single experiment, several million electrical signals are recorded within a tiny fraction of a second, repeatedly, for a long time; there are 4 big experiments. Using computers, a digital image is created for each such instance. The image size can vary from 1 to 80 MB depending on the impact. But, unfortunately, most of these pictures are not interesting: only one in a few thousand billion collisions will be really useful in providing a clue about the early conditions of the universe! So we store data by colliding intense beams of energetic protons, and statistically search for clues about the early universe, when it was much hotter.
Data volume rates for a typical experiment
Presently: event size ~ 1 MB, data collection rate ~ 400 Hz.
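From these two numbers a rough annual raw-data volume follows directly (a back-of-the-envelope sketch; the ~10^7 live seconds of data-taking per year is an assumed typical figure, and derived and simulated data multiply the total toward the ~15 PB quoted earlier):

```python
# Back-of-the-envelope raw data volume for one LHC experiment.
EVENT_SIZE_MB = 1.0          # ~1 MB per recorded event (from the slide)
RATE_HZ = 400                # ~400 events recorded per second (from the slide)
LIVE_SECONDS_PER_YEAR = 1e7  # assumption: ~10^7 s of data-taking per year

rate_mb_per_s = EVENT_SIZE_MB * RATE_HZ                        # MB written per second
raw_pb_per_year = rate_mb_per_s * LIVE_SECONDS_PER_YEAR / 1e9  # MB -> PB

print(f"raw rate: {rate_mb_per_s:.0f} MB/s")        # -> raw rate: 400 MB/s
print(f"raw volume: {raw_pb_per_year:.1f} PB/year") # -> raw volume: 4.0 PB/year
```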
Layered structure of the CMS GRID: connecting computers across the globe
Tier 0: the experimental site and the CERN computer centre, Geneva.
Tier 1: national centres, e.g. Asia (Taiwan), USA, Germany, Italy, France.
Tier 2: regional groups in a continent/nation, e.g. India (Indiacms, T2_IN_TIFR), China, Korea, Taiwan, Pakistan.
Further down: different universities and institutes in a country (e.g. BARC, TIFR, Delhi Univ., Panjab Univ.), and the individual scientist's PC, laptop, ...
Overview of Grid components: a huge amount of manpower is invisibly at work. Tier-2 components.
Grid middleware
The grid relies on advanced software which interfaces between resources and applications linked by the internet: middleware mediates everything.
1. Secure, effective, and uniform access to a wide range of resources.
2. Optimal use of resources.
3. Authentication to the system by digital certificate, and then to groups and sites; authorisation rights to use the facility for the user's purpose.
4. Application-level management: job execution and monitoring during progress.
5. Problem recovery.
6. Collection of results after execution and delivery to the user.
7. Addressing inter-domain issues of security, policy, etc.
Middleware components: user interface, resource broker / workload management system, information system, file and replica catalogues, logging and book-keeping, storage elements, compute elements.
From the user's point of view:
1. You submit a task to the grid.
2. The grid finds convenient places to execute the task, decomposing it if necessary.
3. It informs you when finished.
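The submit/match/retrieve cycle seen by the user can be sketched in miniature (a toy model only; the class and method names here are hypothetical and not part of any real middleware API):

```python
# Toy sketch of a resource broker matching jobs to sites (hypothetical names).

class Site:
    """A compute site with a number of free job slots."""
    def __init__(self, name, free_slots):
        self.name = name
        self.free_slots = free_slots

    def run(self, job):
        self.free_slots -= 1
        job["status"] = "done"                       # execution and monitoring
        job["output"] = f"results of {job['name']} from {self.name}"
        self.free_slots += 1                         # slot freed after the job

class ResourceBroker:
    """Finds a convenient place to execute each submitted task."""
    def __init__(self, sites):
        self.sites = sites

    def submit(self, job):
        # Match the job to the site with the most free slots.
        site = max(self.sites, key=lambda s: s.free_slots)
        site.run(job)
        return job                                   # delivery back to the user

broker = ResourceBroker([Site("T2_IN_TIFR", 400), Site("T0_CERN", 1000)])
job = broker.submit({"name": "cms_analysis_42", "status": "submitted"})
print(job["status"])  # -> done
```

Real middleware adds the pieces the toy omits: certificates for authentication, catalogues to locate data, and book-keeping so the user is informed when the task finishes.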
GRID portal / gateway
Event-level parallelism: process data event by event. Split a large job over N events into M efficient processes, each dealing with N/M events. Large memory is needed, though scalability is built in.
Grid map for the CMS experiment at the LHC
CMS in total: 1 Tier-0 at CERN (Geneva), 7 Tier-1s on 3 continents, 50 Tier-2s on 4 continents. The CMS Tier-2 in India is one of the 5 in the Asia-Pacific region. Today: 6 collaborating institutes in CMS, ~50 scientists and students; 2.1% of signing authors in publications; contributing ~3% of the computing resources of CMS.
CMS Tier-2 site at TIFR: T2_IN_TIFR
Current resources: storage 450 TB; 400 worker nodes; internet bandwidth > 1 Gbps. Note: continuous monitoring is essential to ensure reliable service and availability, 24x7. About 60 users/scientists at present, and still growing. The Grid facility has been functional at TIFR for the last few years. The CMS collaboration at LHC, CERN has been using the computing resources at Mumbai mainly to perform event simulation and to store physics data. The Indian contribution is noted as a collective service to the experiment.
Grid connectivity within India
Network connections: 1 Gbps to CERN, peered to GEANT; 2.5 Gbps NKN + TEIN3; VECC (India ALICE T2) and TIFR (INDIACMS T2); 100 Mbps to VECC, RRCAT, IPR.
Data transfers from/to TIFR (upload and download)
Total data volume at present: ~250 TB; total transfers during the last 6 months: ~70 TB. Current CMS total CPU pledge at Tier-2s: 18k job slots; nominal analysis pledge: 50%. Slot utilization during Summer/Fall 2009 was reasonable, but we need to go into sustained analysis mode. TIFR hosts (i) centrally managed data (simulated, custodial) and (ii) collision data skims.
[Plot: network traffic, August 15-18, 2011. Maximum: 1.5 Gbps; average: 1 Gbps.]
[Slides: statistics and plots; site summary table, site ranking, site history.]
Conclusion
Front-ranking science and engineering requires massive amounts of computing, including huge data collection, storage, and access to data, databases, etc. The LHC grid is the largest serving grid in the world, with 200 sites in 40 countries, equipped with tens of thousands of Linux servers and tens of petabytes of storage. Seamless and transparent access is enabled by grid technology, without compromising security or convenience.
Challenges for the younger generation: conserving network bandwidth, or using it on an on-demand basis, is a challenge. The technology is still young and immature; good tools are required. Portability and scalability will likely be resolved by virtualization.
YOU ARE WELCOME TO GET STARTED WITH GRID ISSUES!