Big Data Processing Experience in the ATLAS Experiment


1 Big Data Processing Experience in the ATLAS Experiment
A. on behalf of the ATLAS Collaboration
International Symposium on Grids and Clouds (ISGC) 2014, March 23-28, 2014, Academia Sinica, Taipei, Taiwan

2 Introduction
To improve the data quality for physics analysis and extend the physics reach, the ATLAS collaboration routinely reprocesses petabytes of data on the Grid. During LHC data taking, we completed three major data reprocessing campaigns, with up to 2 PB of raw data being reprocessed every year. At the time of the conference, the latest data reprocessing campaign, covering more than 2 PB of 2012 pp data, is nearing completion. The demands on Grid computing resources grow, as scheduled LHC upgrades will increase the data taking rates tenfold. Since a tenfold increase in WLCG resources is not an option, a comprehensive model for the composition and execution of the data processing workflow within given CPU and storage constraints is necessary to accommodate the physics needs of the next LHC run. We will report on experience gained in ATLAS Big Data processing and on efforts underway to scale up Grid data processing beyond petabytes.

3 ATLAS Detector
7000 tons, 88 million electronics channels, raw event size ~1 MB. With up to 3 billion events per year, ATLAS records petabytes of LHC collision events.
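The two round numbers on this slide are enough for a back-of-the-envelope check of the "petabytes" claim. The sketch below is only illustrative arithmetic: the constant and function names are made up for this example, decimal units are assumed (1 PB = 10^9 MB), and the result is an upper estimate because the slide quotes "up to" 3 billion events.

```python
# Back-of-the-envelope raw data volume from the round numbers on the slide.
RAW_EVENT_SIZE_MB = 1.0            # ~1 MB per raw event (from the slide)
EVENTS_PER_YEAR = 3_000_000_000    # up to 3 billion events per year (from the slide)

MB_PER_PB = 1_000_000_000          # assumption: decimal units, 1 PB = 10^9 MB


def raw_volume_pb(events: int, event_size_mb: float) -> float:
    """Return the raw data volume in petabytes for a given number of events."""
    return events * event_size_mb / MB_PER_PB


yearly_volume_pb = raw_volume_pb(EVENTS_PER_YEAR, RAW_EVENT_SIZE_MB)
print(f"Upper estimate: {yearly_volume_pb:.1f} PB of raw data per year")
```

The estimate is of the same order as the 1 to 2 PB reprocessed per campaign quoted elsewhere in the talk.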

4 Detector Data Processing
The starting point for physics analysis is the reconstruction of raw event data from the detector. Applications process the raw detector data with sophisticated algorithms to identify and reconstruct physics objects such as charged particle tracks.

5 Big Data Processing on the Grid
High Energy Physics data consist of independent events. Reconstruction applications process one event at a time. One raw file contains events taken within a few minutes. A dataset contains files with events close in time. The first-pass processing of all raw event data at the ATLAS Tier-0 computing site at CERN promptly provides the data for quality assessment and physics analysis. To extend the physics reach, the quality of the reconstructed data is improved by further optimizations of software algorithms and conditions/calibrations data. For data processing with improved software and conditions/calibrations (reprocessing), ATLAS uses ten Tier-1 computing sites distributed on the Grid.

6 Increasing Big Data Processing Throughput
High throughput is critical for timely completion of the reprocessing campaigns conducted in preparation for major physics conferences. During LHC data taking, an eight-fold increase in the throughput of Big Data processing was achieved:
2010: 0.9 M jobs processed 1 PB of 2010 data in two months
2011: 1.1 M jobs processed 1 PB of 2011 data in four weeks
2012: 3.5 M jobs processed 2 PB of 2012 data in four weeks

7 Big Data Processing Throughput
For faster throughput, the number of concurrently running jobs exceeded 33k during the ATLAS reprocessing campaign in November 2012. For comparison, the daily average number of running jobs remained below 20k during the legacy reprocessing of 2012 pp data conducted by the CMS experiment in January-March 2013 (K. Bloom, CMS Use of a Data Federation, CMS CR /339).

8 2013 Reprocessing Campaign
To increase the ATLAS physics output, the reprocessing makes it possible to find new signatures through subsequent analysis of the LHC Run-1 data, such as searches for heavy, long-lived particles predicted by several SUSY and exotic models. Input data volume: 2.2 PB. Using trigger signatures, ~15% of events are selected in three major physics streams. High throughput is not required in this campaign: the slow-burner schedule requires just 15% of the available resources. Reprocessing status: more than 95% done.

9 Engineering Reliability
ATLAS data reprocessing on the Grid tolerates a continuous stream of failures, errors and faults. Our experience has shown that Grid failures can occur for a variety of reasons, and Grid heterogeneity makes failures hard to diagnose and repair quickly. While many fault-tolerance mechanisms improve the reliability of data reprocessing on the Grid, their benefits come at a cost. Reliability Engineering provides a framework for a fundamental understanding of Big Data processing, which is not merely a desirable enhancement but a necessary requirement.

10 CHEP2012: Costs of Recovery from Failures
Job retries avoid data loss at the expense of the CPU time used by the failed jobs. The distribution of tasks (1) ranked by the CPU time used to recover from transient failures is not uniform: most of the CPU time required for recovery was used in a small fraction of tasks.
[Figure: CPU-hours used to recover from transient failures vs. Task Rank]
(1) In ATLAS data reprocessing, jobs from the same run are processed in the same task.
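The ranking described on this slide can be sketched in a few lines. This is a minimal illustration, not the ATLAS accounting code: the `JobAttempt` record, the helper names and the sample numbers are all hypothetical, and the recovery cost of a task is assumed here to be the CPU time summed over its failed job attempts.

```python
import math
from collections import defaultdict
from typing import Dict, Iterable, List, NamedTuple, Tuple


class JobAttempt(NamedTuple):
    task_id: str      # hypothetical task identifier (one task per run)
    cpu_hours: float  # CPU time consumed by this attempt
    succeeded: bool   # False for an attempt that failed and was retried


def recovery_cost_by_task(attempts: Iterable[JobAttempt]) -> Dict[str, float]:
    """Sum the CPU-hours spent in failed attempts, per task."""
    cost: Dict[str, float] = defaultdict(float)
    for attempt in attempts:
        if not attempt.succeeded:
            cost[attempt.task_id] += attempt.cpu_hours
    return dict(cost)


def rank_tasks(cost: Dict[str, float]) -> List[Tuple[str, float]]:
    """Order tasks from the most to the least expensive recovery."""
    return sorted(cost.items(), key=lambda item: item[1], reverse=True)


def top_share(ranked: List[Tuple[str, float]], fraction: float) -> float:
    """Share of the total recovery cost carried by the top `fraction` of tasks."""
    total = sum(hours for _, hours in ranked)
    if total == 0:
        return 0.0
    n_top = max(1, math.ceil(len(ranked) * fraction))
    return sum(hours for _, hours in ranked[:n_top]) / total


# Made-up sample data, for illustration only.
attempts = [
    JobAttempt("A", 40.0, False), JobAttempt("A", 10.0, True),
    JobAttempt("B", 2.0, False), JobAttempt("B", 12.0, True),
    JobAttempt("C", 11.0, True),
    JobAttempt("D", 1.0, False), JobAttempt("D", 0.5, False),
    JobAttempt("D", 9.0, True),
]
ranked = rank_tasks(recovery_cost_by_task(attempts))
share = top_share(ranked, 0.25)
print(ranked[0][0], f"{share:.0%} of recovery CPU-hours in the top-ranked task")
```

In this toy sample a single task carries most of the recovery cost, which is the kind of non-uniform distribution the slide reports.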

11 Histogram of Transient Failures Recovery Costs
[Figure: Number of Tasks vs. log10(cpu-hours); CPU-hours used to recover from transient failures vs. Task Rank]

12 Grid2012: Changes in Failure Recovery Costs
The major costs were reduced in 2011. The majority of the costs are from storage failures at the end of the job.
[Figure: Number of Tasks vs. log10(cpu-hours)]

13 CHEP2013: Same Behavior in 2012 Reprocessing
There were more tasks in the 2012 reprocessing of 2 PB of 2012 p-p data.
[Figure: CPU-hours vs. Task Rank for the 2010, 2011 and 2012 reprocessing campaigns]

14 2013 Reprocessing: Confirms Universal Behavior
[Figure: CPU-hours vs. Task Rank for the 2010, 2011, 2012 and 2013 reprocessing campaigns]

15 CPU-time Used to Recover from Job Failures
[Figure: Number of Tasks (Normalized) vs. log10(cpu-hours)]

16 Big Data Processing on the Grid: Performance
[Table columns: Reprocessing campaign | Input Data Volume (PB) | CPU Time Used for Reconstruction (10^6 h) | Fraction of CPU Time Used for Recovery (%)]

17 Scaling Up Big Data Processing beyond Petabytes
The demands on Grid computing resources grow, as scheduled LHC upgrades will increase ATLAS data taking rates. A comprehensive model for the composition and execution of the data processing workflow within given CPU and storage constraints is necessary to accommodate the physics needs of the next LHC run. Coordinated efforts are underway to scale up Grid data processing beyond petabytes:
Preparing ATLAS Distributed Computing for LHC Run 2: http://indico3.twgrid.org/indico/contributiondisplay.py?contribid=160&sessionid=44&confid=513
PanDA's Role in ATLAS Computing Model Evolution: http://indico3.twgrid.org/indico/contributiondisplay.py?contribid=162&sessionid=44&confid=513
Integrating Network Awareness in ATLAS Distributed Computing: http://indico3.twgrid.org/indico/contributiondisplay.py?contribid=189&sessionid=54&confid=513
Extending ATLAS Computing to Commercial Clouds and Supercomputers: http://indico3.twgrid.org/indico/contributiondisplay.py?contribid=191&sessionid=55&confid=513

18 Conclusions
Reliability Engineering is an active area of research providing solid foundations for the efforts underway to scale up Grid data processing beyond petabytes.
Maximizing throughput: during LHC data taking, ATLAS achieved an eight-fold increase in the throughput of Big Data processing on the Grid.
Minimizing costs of recovery from transient failures: ATLAS Big Data processing on the Grid keeps the cost of automatic retries of the failed jobs at the level of 3-6% of the total CPU-hours used for data reconstruction.
Predicting performance: despite substantial differences in all four major ATLAS data reprocessing campaigns on the Grid, we found that the distribution of the CPU time used to recover from transient job failures exhibits the same general log-normal behavior.
The ATLAS experiment continues optimizing the use of Grid computing resources in preparation for the LHC data taking in
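A log-normal distribution of recovery costs means that log10(CPU-hours) is approximately normally distributed, which is why the histograms in this talk use a log10(cpu-hours) axis. The sketch below shows one simple way such a claim could be checked with the Python standard library. It is not the method used in the talk: the data are synthetic, the generator parameters are arbitrary, and the one-sigma test is only a rough sanity check, not a formal goodness-of-fit test.

```python
import math
import random
import statistics
from typing import List, Tuple


def fit_log10_normal(cpu_hours: List[float]) -> Tuple[float, float]:
    """Estimate mean and standard deviation of log10(cpu_hours)."""
    logs = [math.log10(x) for x in cpu_hours if x > 0]
    return statistics.fmean(logs), statistics.pstdev(logs)


def fraction_within_one_sigma(cpu_hours: List[float], mu: float, sigma: float) -> float:
    """Fraction of log10 values within one sigma of the mean (~0.68 if normal)."""
    logs = [math.log10(x) for x in cpu_hours if x > 0]
    inside = sum(1 for value in logs if abs(value - mu) <= sigma)
    return inside / len(logs)


# Synthetic stand-in for per-task recovery costs; parameters are arbitrary.
rng = random.Random(2014)
synthetic_cpu_hours = [rng.lognormvariate(1.0, 1.5) for _ in range(10_000)]

mu, sigma = fit_log10_normal(synthetic_cpu_hours)
frac = fraction_within_one_sigma(synthetic_cpu_hours, mu, sigma)
print(f"log10 mean={mu:.2f}, log10 sigma={sigma:.2f}, within 1 sigma={frac:.2f}")
```

With real per-task recovery costs in place of the synthetic list, a one-sigma fraction far from 0.68 would suggest that the log-normal description does not hold for that sample.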

19 Extra Materials

20 Increasing Big Data Processing Throughput
2010: 0.9 M jobs to process 1 PB of 2010 data in two months
2011: 1.1 M jobs to process 1 PB of 2011 data in four weeks
2012: 3.5 M jobs to process 2 PB of 2012 data in four weeks
High throughput is critical for timely completion of the reprocessing campaigns conducted in preparation for major physics conferences. In the 2011 reprocessing, the throughput doubled in comparison to the 2010 reprocessing campaign. To deliver new physics results for the 2013 Moriond Conference, ATLAS reprocessed twice as much data in November 2012 within the same time period as in the 2011 reprocessing, while, due to increased LHC pileup, the 2012 pp events required twice as much time to reconstruct as the 2011 events.
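One way to read the eight-fold figure is as the product of the three factors of two mentioned above. The sketch below spells out that arithmetic. It is an interpretation rather than a calculation given in the talk: "two months" is taken as 8 weeks, the per-event reconstruction cost is assumed equal for 2010 and 2011, and the dictionary layout and function name are invented for this example.

```python
# Campaign figures from the slide. The 8-week duration for 2010 and the
# equal 2010/2011 event cost are assumptions made for this sketch.
CAMPAIGNS = {
    2010: {"data_pb": 1.0, "weeks": 8.0, "relative_event_cost": 1.0},
    2011: {"data_pb": 1.0, "weeks": 4.0, "relative_event_cost": 1.0},
    2012: {"data_pb": 2.0, "weeks": 4.0, "relative_event_cost": 2.0},
}


def effective_throughput(campaign: dict) -> float:
    """Data volume per week, weighted by the relative CPU cost per event."""
    return campaign["data_pb"] * campaign["relative_event_cost"] / campaign["weeks"]


baseline = effective_throughput(CAMPAIGNS[2010])
gains = {
    year: effective_throughput(campaign) / baseline
    for year, campaign in CAMPAIGNS.items()
}
for year, gain in sorted(gains.items()):
    print(f"{year}: {gain:.0f}x the 2010 throughput")
```

Under these assumptions the 2011 campaign comes out at twice and the 2012 campaign at eight times the 2010 throughput, consistent with the figures quoted in the talk.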
