Advancements in Big Data Processing
1 Advancements in Big Data Processing Alexandre Vaniachine V International Conference "Distributed Computing and Grid-technologies in Science and Education" (Grid2012) July 16-21, 2012, JINR, Dubna, Russia
2 Big Data and the Grid The forthcoming conference will be the fifth one organized by the JINR Laboratory of Information Technologies. Distributed computing and grid technologies have gone beyond academic interest since then. Present-day grid technologies make it possible to integrate computing centers located anywhere worldwide and to provide these computing resources to users. Present-day high technologies call for the development of high-performance computing. Activities involving huge information throughputs now encompass not only various areas of natural science but also business and industry. Over the last few years the deployed distributed computing grid infrastructure has enabled the solution of a wide class of tasks that require the processing of enormous amounts of data.
3 Impressive Conference Series
4 Fourth Paradigm In a speech given just a few weeks before he was lost at sea off the California coast in January 2007, Jim Gray, a database software pioneer and a Microsoft researcher, sketched out an argument that an exaflood of scientific data is fundamentally transforming the practice of science. Dr. Gray called the shift a fourth paradigm. Invited talk at the Grid2010 Conference, Dubna, Russia
5 Big Data The ever-increasing volumes of scientific data present new challenges to distributed computing and Grid technologies. The emerging Big Data revolution drives new discoveries in scientific fields including nanotechnology, astrophysics, high-energy physics, biology and medicine. New initiatives are transforming data-driven scientific fields, enabling massive data analysis in new ways. To address the Big Data processing challenge, the LHC experiments rely on the computational grid infrastructure deployed in the framework of the Worldwide LHC Computing Grid. Following Big Data processing on the Grid, more than 8000 scientists analyze LHC data in search of discoveries.
6 Historic Milestone Expected from a SM Higgs at the given mH. Global Effort, Global Success: results today were only possible due to the extraordinary performance of the accelerators, the experiments and Grid computing. Observation of a new particle consistent with a Higgs boson (but which one?). A historic milestone, but only the beginning: global implications for the future.
7 Big Data Processing for Discoveries J. Incandela for the CMS Collaboration, "The Status of the Higgs Search", July 4th 2012: it would have been impossible to release physics results so quickly without the outstanding performance of the Grid (including the CERN Tier-0). Figure: CMS GEN-SIM/AODSIM production per month, and the number of concurrent ATLAS jobs Jan-July 2012, which includes MC production, user and group analysis at CERN, 10 Tier-1s and ~70 Tier-2 federations at more than 80 sites. More than 1500 distinct ATLAS users do analysis on the Grid. Available resources are fully used and stressed (beyond pledges in some cases); massive production of 8 TeV Monte Carlo samples. A very effective and flexible computing model and operation team accommodate high trigger rates and pile-up, intense MC simulation, and analysis demands from worldwide users (through e.g. dynamic data placement). Photo: Denis Balibouse, Pool/AP
8 Genesis Big Data experiments at the LHC study the quark-hadron transition (ALICE) and the electroweak phase transition. Baryon number was violated earlier; the mechanism is CP violation, A.D. Sakharov (1967). CP violation in the Standard Model is too small for baryogenesis, which indicates new physics. Studies of CP violation and baryogenesis beyond the LHC reach require Big Data.
9 Future Big Data Experiments High rate scenario for the Belle II DAQ, compared with the LHC experiments (LCG TDR, 2005*):

Experiment   Event Size [kB]   Rate [Hz]   Rate [MB/s]
Belle II            300          6,000        1,800
ALICE (HI)       12,500            100        1,250
ALICE (pp)        1,000            100          100
ATLAS             1,600            200          320
CMS               1,500            150          225
LHCb                 25          2,000           50

* The LHC experiments are running at a factor of two or higher event rates. The Belle II raw event size is about 300 kbyte; two copies of the raw data are stored on tape. Limited precision due to many assumptions. 5 July 2012, Martin Sevior, ICHEP, Melbourne 2012
10 Big Data Processing in Future Experiments Belle II Computing Model: Grid-based distributed computing with a common framework for DAQ and offline (basf2), based on ROOT I/O. Raw data storage and processing from the Belle II detector; MC production and ntuple production on the Grid; optional MC production and ntuple analysis at local resources. Zdeněk Doležal, SuperB Collaboration Meeting
11 Six Sigma Quality of Industrial Production In industry, Six Sigma analysis improves the quality of production by identifying and removing the causes of defects. A Six Sigma process is one in which products are essentially free of defects. In practice, an industrial Six Sigma process corresponds to 4.5 sigma, taking into account the 1.5-sigma shift from variations in the production process. In contrast, Big Data processing requires six sigma quality in the true mathematical sense: a 10^-8 level of defects.
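The defect levels above follow from the Gaussian tail probability. A minimal sketch (an illustration added here, not from the talk) using only the standard library:

```python
import math

def defect_rate(sigma_level):
    """One-sided Gaussian tail probability beyond sigma_level standard deviations."""
    return 0.5 * math.erfc(sigma_level / math.sqrt(2))

# True mathematical six sigma: well below the 10^-8 defect level
print(f"6.0 sigma: {defect_rate(6.0):.1e}")   # ~9.9e-10

# Industrial Six Sigma after the 1.5-sigma shift is effectively 4.5 sigma
print(f"4.5 sigma: {defect_rate(4.5):.1e}")   # ~3.4e-06, i.e. 3.4 defects per million
```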
12 Why Six Sigma? Online data processing applies a 10^-5 event selection; offline data processing applies a 10^-9 event selection; more than 1 PB is recorded per year. Figure 4-1: cross-sections and rates for various processes (σ_tot, σ_b, σ_jet, σ_W, σ_Z, σ_t, σ_Higgs) in proton collisions as a function of √s, from the Tevatron to the LHC. Physics discovery requires six sigma quality during Big Data processing.
13 Golden H→4l Channel: One Event per Billion LHC physics discovery requires Big Data processing errors below the 10^-8 level.
14 Six Sigma Quality Failures recovered by re-tries achieve production quality in Big Data processing on the Grid at the six sigma level. No events were lost during the main ATLAS reprocessing campaign of the 2010 data, which reconstructed more than 1 PB of data on the Grid. In the last 2011 data reprocessing, only two collision events of the total could not be reconstructed; these events were reprocessed later in a dedicated data recovery step. Later, silent data corruption was detected in six events from the reprocessed 2010 data and in one case of five adjacent events from the reprocessed 2011 data. Corresponding to event losses below the 10^-8 level, this demonstrates sustained six sigma quality performance in Big Data processing. CMS keeps event losses below the 10^-8 level in the first-pass processing (not Grid).
15 Task Processing: ATLAS vs. CMS In Big Data processing scientists deal with datasets, not individual files. Thus a task, not a job, is the major unit of Big Data processing on the Grid. Figure 1 (workload management architecture): requests are defined into tagged tasks, and DEfT (Dynamic Evolution for Tasks) handles task evolution: recovery, speed-up and force-finish, forking and extensions, transient and intermediate tasks, autosplitting.
16 A Powerful Analogy Splitting a large data processing task into jobs is similar to the splitting of a large file into smaller TCP/IP packets during an FTP data transfer. Splitting data into smaller pieces achieves reliability through re-sending of the dropped TCP/IP packets; likewise, in Big Data processing transient job failures are recovered by re-tries. In file transfer the TCP/IP packet is the unit of checkpointing; in HEP Big Data processing the checkpointing unit is a job (PanDA) or a file (DIRAC). Unlike many tightly-coupled high-performance computing problems, which require inter-process communications to be parallelized, high energy physics Big Data processing is often called embarrassingly parallel, since each event is independent. However, event-level checkpointing is rarely used today, as its granularity is too small for Big Data processing in high energy physics. Next generation systems need event-level checkpointing for Big Data (see the PanDA talk by K. De).
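The job-retry mechanism described above can be sketched in a few lines. The names and the 10% transient-failure rate below are hypothetical, chosen only to illustrate how per-job checkpointing with re-tries recovers a task:

```python
import random

def run_job(job_id):
    """Stand-in for a Grid job; suffers a transient failure ~10% of the time."""
    return random.random() > 0.1   # True on success

def process_task(input_files, max_retries=3):
    """Split a task into one job per input file and re-try failed jobs,
    much as TCP/IP re-sends dropped packets during a file transfer."""
    lost = []
    for job_id in input_files:
        for _attempt in range(1 + max_retries):
            if run_job(job_id):
                break              # checkpoint: this job/file is done
        else:
            lost.append(job_id)    # still failing after all re-tries
    return lost

failed = process_task(range(1000))
print(f"jobs lost after re-tries: {len(failed)}")
```

With four attempts the per-job loss probability drops from 10^-1 to 10^-4, so a 1000-job task usually completes with no losses at all.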
17 Reliability Engineering for Big Data on the Grid LHC experience has shown that Grid failures can occur for a variety of reasons, and Grid heterogeneity makes failures hard to diagnose and repair quickly. Big Data processing on the Grid must tolerate a continuous stream of failures, errors and faults. While many fault-tolerance mechanisms improve the reliability of Big Data processing on the Grid, their benefits come at a cost. Reliability Engineering provides a framework for a fundamental understanding of Big Data processing on the Grid, for which reliability is not a desirable enhancement but a necessary requirement.
18 Jim Gray Hypothesis Heisenbugs are bugs which may or may not cause a fault for a given operation. If a Heisenbug is present in the system, the error can be removed by retrying the operation. Heisenbugs are also called transient or intermittent faults. Bohrbugs are bugs which always cause a failure when a particular operation is performed. In the presence of Bohrbugs, there will always be a failure on retrying the operation that caused it. Bohrbugs are also called permanent faults.
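The distinction can be shown with a toy sketch (hypothetical names and failure rates): re-tries recover a Heisenbug but never a Bohrbug:

```python
import random

def heisenbug_op():
    """Transient fault: each retry has an independent chance to succeed."""
    if random.random() < 0.5:
        raise RuntimeError("transient fault")
    return "ok"

def bohrbug_op():
    """Permanent fault: the operation fails deterministically every time."""
    raise RuntimeError("permanent fault")

def with_retries(operation, tries=10):
    """Return the operation's result, or 'failed' if every retry fails."""
    for _ in range(tries):
        try:
            return operation()
        except RuntimeError:
            continue
    return "failed"

print(with_retries(heisenbug_op))  # almost always "ok" (loss odds 0.5**10)
print(with_retries(bohrbug_op))    # always "failed": retrying cannot help
```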
19 CMS Failure Recovery Cost From CMS conference report CR-2012/070, "A new era for central processing and production in CMS", Edgar Mauricio Fajardo Hernandez for the CMS Collaboration, 10 May 2012, available on the CMS information server; invited talk at the Grid2012 Conference, Dubna, Russia.

Abstract: The goal for CMS computing is to maximise the throughput of simulated event generation while also processing the real data events as quickly and reliably as possible. To maintain this achievement as the quantity of events increases, since the beginning of 2011 CMS computing has migrated at the Tier-1 level from its old production framework, ProdAgent, to a new one, WMAgent. The WMAgent framework offers improved processing efficiency and increased resource usage as well as a reduction in manpower. In addition to the challenges encountered during the design of the WMAgent framework, several operational issues have arisen during its commissioning. The largest operational challenges were in the usage and monitoring of resources, mainly a result of a change in the way work is allocated: instead of work being assigned to operators, all work is centrally injected and managed in the Request Manager system, and the task of the operators has changed from running individual workflows to monitoring the global workload. In this report we present how we tackled some of the operational challenges, and how we benefitted from the lessons learned in the commissioning of the WMAgent framework at the Tier-2 level. As case studies, we show how the WMAgent system performed during some of the large data reprocessing and Monte Carlo simulation campaigns.

Component recovery: if a component is marked as down, the operator logs in to the machine, checks the logs for the given component, restarts it, and makes an entry in the team e-log about the incident. If the problem persists, one of the team leaders checks and diagnoses the situation and, if needed, contacts the WMAgent support (the developers). But keeping up the infrastructure is not limited to just restarting components: it also involves checking the load on the machines and, if necessary, cleaning up disk space. An alert system is being developed to further automate this task; it will soon be commissioned for production [9].

Debugging site problems: given the distributed nature of the CMS computing model, the debugging and monitoring of site-specific problems becomes a communication challenge. Site problems can be divided into two types: jobs cannot enter a site, or jobs can run but fail. The first is solved by communicating the problem to the GlideIn factory support and debugging it with them. The second involves monitoring the failed jobs in the Global Monitor together with the information in the Dashboard. Once the nature of the problem is found (a site problem or a configuration problem) it is propagated accordingly: to the site through the Savannah and GGUS ticket systems [11] in the first case, or to the requestor in the latter. The request is then either aborted (user configuration problems) or the submission to the site is stopped until the problem is understood and solved.

Closing-out requests: once a request is finished, all jobs have either succeeded or failed. At this moment the operator checks that the event count of the request is higher than 95% of the requested events for MC production and reprocessing, and 100% for real data. If it is lower, another request must be created to recover the failed jobs; this task normally involves communication with the coordinators. Once the event count is correct, the transfer subscription (to the corresponding custodial Tier-1 site) is monitored to reach 100%. Once the operator agrees that the event count and transfers are fine, he sets the status of the request in the ReqMgr as closed-out.

Other tasks, mainly done by the team leaders, consist of working tightly with the Integration team to commission new sites for MC production and testing new versions of ReqMgr, Global WorkQueue and WMAgent.
20 ATLAS Failure Recovery Cost Job resubmission avoids data loss at the expense of the CPU time used by the failed jobs. The distribution of tasks ordered by the CPU time used to recover transient failures is not uniform: most of the CPU time required for recovery was used in a small fraction of tasks. In the 2010 reprocessing, the CPU time used to recover transient failures was 6% of the CPU time used for reconstruction; in the 2011 reprocessing it was reduced to 4%. Figure: CPU-hours used to recover transient failures vs. task order. ATLAS Production System, Ljubljana, June 20, 2012
21 Waloddi Weibull 1939: Weibull published his paper on the Weibull distribution in probability theory and statistics. 1941: Weibull received a personal research professorship in Technical Physics at the Royal Institute of Technology in Stockholm from the arms producer Bofors. 1951: Weibull presented his most famous paper on the Weibull distribution, using seven case studies, to the American Society of Mechanical Engineers (ASME). 1972: the ASME awarded Weibull the Gold Medal. 1978: the Royal Swedish Academy of Engineering Sciences awarded Weibull the Great Gold Medal. The Weibull distribution is by far the world's most popular statistical model for production data.
22 Bi-modal Weibull Distribution: CPU Time Used to Recover Job Failures in ATLAS Data Reprocessing The majority of costs come from failures at the end of a job; the 2011 improvements reduced the major costs. Figure: number of tasks vs. log10(CPU-hours).
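A bi-modal Weibull can be modelled as a two-component mixture. The sketch below uses purely illustrative parameters (not the fitted ATLAS values) to show how a minority mode of expensive end-of-job failures can dominate the total recovery cost:

```python
import random

def bimodal_weibull_sample(n, p_cheap=0.7,
                           scale_cheap=2.0, shape_cheap=0.8,     # quick, early failures
                           scale_costly=50.0, shape_costly=3.0):  # failures near job end
    """Draw n recovery costs (CPU-hours) from a two-component Weibull mixture."""
    costs = []
    for _ in range(n):
        if random.random() < p_cheap:
            costs.append(random.weibullvariate(scale_cheap, shape_cheap))
        else:
            costs.append(random.weibullvariate(scale_costly, shape_costly))
    return costs

costs = bimodal_weibull_sample(10_000)
total = sum(costs)
costly_tail = sum(c for c in costs if c > 20)
print(f"share of recovery cost above 20 CPU-hours: {costly_tail / total:.0%}")
```

With these assumed parameters the 30% of failures in the costly mode account for the bulk of the recovered CPU time, mirroring the observation that most recovery cost is concentrated in a small fraction of tasks.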
23 Time Overhead It takes about three million core-hours to process one petabyte of ATLAS data. Transient job failures and retries delay the reprocessing duration. Optimization of the ATLAS Big Data processing workflow and other improvements cut the tails and halved the duration of petabyte-scale reprocessing on the Grid, from almost two months in 2010 to less than four weeks in 2011. Optimization of fault-tolerance techniques vs. the time overhead ("tails") induced in task completion is an active area of research in Big Data processing on the Grid. See the presentation by D. Golubkov in the ATLAS Computing Workshop session today.
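Combining the three-million-core-hours-per-petabyte figure with the 6% (2010) and 4% (2011) recovery fractions quoted earlier gives a back-of-the-envelope estimate of the retry overhead (a rough illustration, assuming the fractions apply to the full processing cost):

```python
CORE_HOURS_PER_PB = 3_000_000  # ~3M core-hours to process 1 PB of ATLAS data

for year, retry_fraction in [(2010, 0.06), (2011, 0.04)]:
    overhead = CORE_HOURS_PER_PB * retry_fraction
    # 2010: ~180,000 core-hours per PB; 2011: ~120,000 core-hours per PB
    print(f"{year}: ~{overhead:,.0f} core-hours per PB spent recovering failures")
```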
24 Conclusions The emerging Big Data revolution drives new discoveries in scientific fields including nanotechnology, astrophysics, high-energy physics, biology and medicine. In Big Data processing scientists deal with datasets, not individual files; a task (comprising many jobs) became the unit of Big Data processing on the Grid. Splitting a Big Data processing task into jobs enables fine-granularity checkpointing at the level of a single job/file. Fault-tolerance achieved through automatic re-tries of failed jobs induces a time overhead in task completion which is difficult to predict. Reliability Engineering provides a framework for a fundamental understanding of Big Data processing on the Grid, for which reliability is not a desirable enhancement but a necessary requirement. The inter-relationship of fault-tolerance techniques and performance prediction is an active area of research in Big Data processing on the Grid.
25 Extra
26 Slide from ICHEP: Which Experiments use Big Data? Progress: Scale of the Solution Progress in distributed computing and evolution of computing capacity: WLCG processes 1.5M jobs on the grid per day, with more than 150 PB of both disk and tape. Ian Fisk, FNAL/CD
27 Big Data Processing in LHC Experiments Big Data processing on the Grid requires a balance between CPU performance and storage. Pushing Big Data limits, LHC computing evolves from the hierarchy to the mesh. IEEE NSS/MIC, October 24
28 Tier-0 First Pass Data Processing Minimizing event losses, LHC experiments implemented multi-pass processing for Big Data: first-pass data processing at Tier-0, and reprocessing at Tier-1 on the Grid. CMS (right) reported first-pass event losses at the 10^-8 level; ATLAS (left) tolerates event losses at the 10^-5 level during first-pass processing. Lost events are recovered during reprocessing.
ADVANCEMENTS IN BIG DATA PROCESSING IN THE ATLAS AND CMS EXPERIMENTS
A.V. Vaniachine on behalf of the ATLAS and CMS Collaborations
Argonne National Laboratory, 9700 S Cass Ave, Argonne, IL, 60439, USA
More informationMission. To provide higher technological educa5on with quality, preparing. competent professionals, with sound founda5ons in science, technology
Mission To provide higher technological educa5on with quality, preparing competent professionals, with sound founda5ons in science, technology and innova5on, commi
More informationAugust 2009. Transforming your Information Infrastructure with IBM s Storage Cloud Solution
August 2009 Transforming your Information Infrastructure with IBM s Storage Cloud Solution Page 2 Table of Contents Executive summary... 3 Introduction... 4 A Story or three for inspiration... 6 Oops,
More informationCMS Software Deployment on OSG
CMS Software Deployment on OSG Bockjoo Kim 1, Michael Thomas 2, Paul Avery 1, Frank Wuerthwein 3 1. University of Florida, Gainesville, FL 32611, USA 2. California Institute of Technology, Pasadena, CA
More informationTier0 plans and security and backup policy proposals
Tier0 plans and security and backup policy proposals, CERN IT-PSS CERN - IT Outline Service operational aspects Hardware set-up in 2007 Replication set-up Test plan Backup and security policies CERN Oracle
More informationScheduling and Load Balancing in the Parallel ROOT Facility (PROOF)
Scheduling and Load Balancing in the Parallel ROOT Facility (PROOF) Gerardo Ganis CERN E-mail: Gerardo.Ganis@cern.ch CERN Institute of Informatics, University of Warsaw E-mail: Jan.Iwaszkiewicz@cern.ch
More informationStatus of Grid Activities in Pakistan. FAWAD SAEED National Centre For Physics, Pakistan
Status of Grid Activities in Pakistan FAWAD SAEED National Centre For Physics, Pakistan 1 Introduction of NCP-LCG2 q NCP-LCG2 is the only Tier-2 centre in Pakistan for Worldwide LHC computing Grid (WLCG).
More informationBENCHMARKING V ISUALIZATION TOOL
Copyright 2014 Splunk Inc. BENCHMARKING V ISUALIZATION TOOL J. Green Computer Scien
More informationTop-Quark Studies at CMS
Top-Quark Studies at CMS Tim Christiansen (CERN) on behalf of the CMS Collaboration ICHEP 2010, Paris 35th International Conference on High-Energy Physics tt 2 km 22 28 July 2010 Single-top 4 km New Physics
More informationTheoretical Particle Physics FYTN04: Oral Exam Questions, version ht15
Theoretical Particle Physics FYTN04: Oral Exam Questions, version ht15 Examples of The questions are roughly ordered by chapter but are often connected across the different chapters. Ordering is as in
More informationNERSC Archival Storage: Best Practices
NERSC Archival Storage: Best Practices Lisa Gerhardt! NERSC User Services! Nick Balthaser! NERSC Storage Systems! Joint Facilities User Forum on Data Intensive Computing! June 18, 2014 Agenda Introduc)on
More informationScalable Multi-Node Event Logging System for Ba Bar
A New Scalable Multi-Node Event Logging System for BaBar James A. Hamilton Steffen Luitz For the BaBar Computing Group Original Structure Raw Data Processing Level 3 Trigger Mirror Detector Electronics
More informationAn Integrated CyberSecurity Approach for HEP Grids. Workshop Report. http://hpcrd.lbl.gov/hepcybersecurity/
An Integrated CyberSecurity Approach for HEP Grids Workshop Report http://hpcrd.lbl.gov/hepcybersecurity/ 1. Introduction The CMS and ATLAS experiments at the Large Hadron Collider (LHC) being built at
More informationDesigning a Cloud Storage System
Designing a Cloud Storage System End to End Cloud Storage When designing a cloud storage system, there is value in decoupling the system s archival capacity (its ability to persistently store large volumes
More informationTier-1 Services for Tier-2 Regional Centres
Tier-1 Services for Tier-2 Regional Centres The LHC Computing MoU is currently being elaborated by a dedicated Task Force. This will cover at least the services that Tier-0 (T0) and Tier-1 centres (T1)
More informationPHYSICS WITH LHC EARLY DATA
PHYSICS WITH LHC EARLY DATA ONE OF THE LAST PROPHETIC TALKS ON THIS SUBJECT HOPEFULLY We may have some two month of the Machine operation in 2008 LONG HISTORY... I will extensively use: Fabiola GIANOTTI
More informationBeyond High Performance Computing: What Matters to CERN
Beyond High Performance Computing: What Matters to CERN Pierre VANDE VYVRE for the ALICE Collaboration ALICE Data Acquisition Project Leader CERN, Geneva, Switzerland 2 CERN CERN is the world's largest
More informationAccelerating Experimental Elementary Particle Physics with the Gordon Supercomputer. Frank Würthwein Rick Wagner August 5th, 2013
Accelerating Experimental Elementary Particle Physics with the Gordon Supercomputer Frank Würthwein Rick Wagner August 5th, 2013 The Universe is a strange place! 67% of energy is dark energy We got no
More informationOnline Performance Monitoring of the Third ALICE Data Challenge (ADC III)
EUROPEAN ORGANIZATION FOR NUCLEAR RESEARCH European Laboratory for Particle Physics Publication ALICE reference number ALICE-PUB-1- version 1. Institute reference number Date of last change 1-1-17 Online
More informationDistributed Computing for CEPC. YAN Tian On Behalf of Distributed Computing Group, CC, IHEP for 4 th CEPC Collaboration Meeting, Sep.
Distributed Computing for CEPC YAN Tian On Behalf of Distributed Computing Group, CC, IHEP for 4 th CEPC Collaboration Meeting, Sep. 12-13, 2014 1 Outline Introduction Experience of BES-DIRAC Distributed
More informationFCC 1309180800 JGU WBS_v0034.xlsm
1 Accelerators 1.1 Hadron injectors 1.1.1 Overall design parameters 1.1.1.1 Performance and gap of existing injector chain 1.1.1.2 Performance and gap of existing injector chain 1.1.1.3 Baseline parameters
More informationDatabase Monitoring Requirements. Salvatore Di Guida (CERN) On behalf of the CMS DB group
Database Monitoring Requirements Salvatore Di Guida (CERN) On behalf of the CMS DB group Outline CMS Database infrastructure and data flow. Data access patterns. Requirements coming from the hardware and
More informationThe Compact Muon Solenoid Experiment. Conference Report. Mailing address: CMS CERN, CH-1211 GENEVA 23, Switzerland
Available on CMS information server CMS CR -2012/114 The Compact Muon Solenoid Experiment Conference Report Mailing address: CMS CERN, CH-1211 GENEVA 23, Switzerland 23 May 2012 CMS Data Transfer operations
More informationBig Data + Big Analytics Transforming the way you do business
Big Data + Big Analytics Transforming the way you do business Bryan Harris Chief Technology Officer VSTI A SAS Company 1 AGENDA Lets get Real Beyond the Buzzwords Who is SAS? Our PerspecDve of Big Data
More informationData Management in the Cloud: Limitations and Opportunities. Annies Ductan
Data Management in the Cloud: Limitations and Opportunities Annies Ductan Discussion Outline: Introduc)on Overview Vision of Cloud Compu8ng Managing Data in The Cloud Cloud Characteris8cs Data Management
More informationATLAS job monitoring in the Dashboard Framework
ATLAS job monitoring in the Dashboard Framework J Andreeva 1, S Campana 1, E Karavakis 1, L Kokoszkiewicz 1, P Saiz 1, L Sargsyan 2, J Schovancova 3, D Tuckett 1 on behalf of the ATLAS Collaboration 1
More informationRoadmap for Applying Hadoop Distributed File System in Scientific Grid Computing
Roadmap for Applying Hadoop Distributed File System in Scientific Grid Computing Garhan Attebury 1, Andrew Baranovski 2, Ken Bloom 1, Brian Bockelman 1, Dorian Kcira 3, James Letts 4, Tanya Levshina 2,
More informationBig Data Analytics. for the Exploitation of the CERN Accelerator Complex. Antonio Romero Marín
Big Data Analytics for the Exploitation of the CERN Accelerator Complex Antonio Romero Marín Milan 11/03/2015 Oracle Big Data and Analytics @ Work 1 What is CERN CERN - European Laboratory for Particle
More informationLeveraging OpenStack Private Clouds
Leveraging OpenStack Private Clouds Robert Ronan Sr. Cloud Solutions Architect! Robert.Ronan@rackspace.com! LEVERAGING OPENSTACK - AGENDA OpenStack What is it? Benefits Leveraging OpenStack University
More informationBig Data : The New Oil
Big Data : The New Oil Data Driven Innova7on: $ 15 billion by 2015 Data Driven Innova,on for Growth and Well Being: OECD Report: Interim Synthesis Report 2014 The new data- driven era of scien7fic discovery
More informationPerformance Monitoring of the Software Frameworks for LHC Experiments
Proceedings of the First EELA-2 Conference R. mayo et al. (Eds.) CIEMAT 2009 2009 The authors. All rights reserved Performance Monitoring of the Software Frameworks for LHC Experiments William A. Romero
More informationClusters in the Cloud
Clusters in the Cloud Dr. Paul Coddington, Deputy Director Dr. Shunde Zhang, Compu:ng Specialist eresearch SA October 2014 Use Cases Make the cloud easier to use for compute jobs Par:cularly for users
More informationRunning a typical ROOT HEP analysis on Hadoop/MapReduce. Stefano Alberto Russo Michele Pinamonti Marina Cobal
Running a typical ROOT HEP analysis on Hadoop/MapReduce Stefano Alberto Russo Michele Pinamonti Marina Cobal CHEP 2013 Amsterdam 14-18/10/2013 Topics The Hadoop/MapReduce model Hadoop and High Energy Physics
More informationThe Elusive U,lity Customer: How Big Data & Analy,cs Connects U,li,es & Their Customers
The Place Analy,cs Leaders Turn to for Answers Member.U(lityAnaly(cs.com The Elusive U,lity Customer: How Big & Analy,cs Connects U,li,es & Their Customers Mike Smith Vice President, U(lity Analy(cs Ins(tute
More informationUnterstützung datenintensiver Forschung am KIT Aktivitäten, Dienste und Erfahrungen
Unterstützung datenintensiver Forschung am KIT Aktivitäten, Dienste und Erfahrungen Achim Streit Steinbuch Centre for Computing (SCC) KIT Universität des Landes Baden-Württemberg und nationales Forschungszentrum
More informationWorkflow Requirements (Dec. 12, 2006)
1 Functional Requirements Workflow Requirements (Dec. 12, 2006) 1.1 Designing Workflow Templates The workflow design system should provide means for designing (modeling) workflow templates in graphical
More informationHow To Find The Higgs Boson
Dezső Horváth: Search for Higgs bosons Balaton Summer School, Balatongyörök, 07.07.2009 p. 1/25 Search for Higgs bosons Balaton Summer School, Balatongyörök, 07.07.2009 Dezső Horváth MTA KFKI Research
More informationReal Time Tracking with ATLAS Silicon Detectors and its Applications to Beauty Hadron Physics
Real Time Tracking with ATLAS Silicon Detectors and its Applications to Beauty Hadron Physics Carlo Schiavi Dottorato in Fisica - XVII Ciclo Outline The ATLAS Experiment The SiTrack Algorithm Application
More informationSummer Student Project Report
Summer Student Project Report Dimitris Kalimeris National and Kapodistrian University of Athens June September 2014 Abstract This report will outline two projects that were done as part of a three months
More informationPLANNING FOR PREDICTABLE NETWORK PERFORMANCE IN THE ATLAS TDAQ
PLANNING FOR PREDICTABLE NETWORK PERFORMANCE IN THE ATLAS TDAQ C. Meirosu *#, B. Martin *, A. Topurov *, A. Al-Shabibi Abstract The Trigger and Data Acquisition System of the ATLAS experiment is currently
More informationSecure Hybrid Cloud Infrastructure for Scien5fic Applica5ons
Secure Hybrid Cloud Infrastructure for Scien5fic Applica5ons Project Members: Paula Eerola Miika Komu MaA Kortelainen Tomas Lindén Lirim Osmani Sasu Tarkoma Salman Toor (Presenter) salman.toor@helsinki.fi
More informationMeasurement of BeStMan Scalability
Measurement of BeStMan Scalability Haifeng Pi, Igor Sfiligoi, Frank Wuerthwein, Abhishek Rana University of California San Diego Tanya Levshina Fermi National Accelerator Laboratory Alexander Sim, Junmin
More informationATLAS grid computing model and usage
ATLAS grid computing model and usage RO-LCG workshop Magurele, 29th of November 2011 Sabine Crépé-Renaudin for the ATLAS FR Squad team ATLAS news Computing model : new needs, new possibilities : Adding
More informationJet Reconstruction in CMS using Charged Tracks only
Jet Reconstruction in CMS using Charged Tracks only Andreas Hinzmann for the CMS Collaboration JET2010 12 Aug 2010 Jet Reconstruction in CMS Calorimeter Jets clustered from calorimeter towers independent
More informationWeb based monitoring in the CMS experiment at CERN
FERMILAB-CONF-11-765-CMS-PPD International Conference on Computing in High Energy and Nuclear Physics (CHEP 2010) IOP Publishing Web based monitoring in the CMS experiment at CERN William Badgett 1, Irakli
More informationThe CMS Tier0 goes Cloud and Grid for LHC Run 2. Dirk Hufnagel (FNAL) for CMS Computing
The CMS Tier0 goes Cloud and Grid for LHC Run 2 Dirk Hufnagel (FNAL) for CMS Computing CHEP, 13.04.2015 Overview Changes for the Tier0 between Run 1 and Run 2 CERN Agile Infrastructure (in GlideInWMS)
More informationScalable stochastic tracing of distributed data management events
Scalable stochastic tracing of distributed data management events Mario Lassnig mario.lassnig@cern.ch ATLAS Data Processing CERN Physics Department Distributed and Parallel Systems University of Innsbruck
More informationBig Science and Big Data Dirk Duellmann, CERN Apache Big Data Europe 28 Sep 2015, Budapest, Hungary
Big Science and Big Data Dirk Duellmann, CERN Apache Big Data Europe 28 Sep 2015, Budapest, Hungary 16/02/2015 Real-Time Analytics: Making better and faster business decisions 8 The ATLAS experiment
More information