Big Science and Big Data Dirk Duellmann, CERN Apache Big Data Europe 28 Sep 2015, Budapest, Hungary
5 The ATLAS experiment
- tons, 150 million sensors
- generating data 40 million times per second, i.e. a petabyte/s
CERN IT Department, CH-1211 Genève 23, Switzerland
6 Data Collection and Archiving at CERN
Data flow to permanent storage: 4-6 GB/sec
- LHCb: MB/sec
- ATLAS: 1-2 GB/sec
- ALICE: 4 GB/sec
- CMS: 1-2 GB/sec
Markus.Schulz@cern.ch
7 The Worldwide LHC Computing Grid
An international collaboration to distribute and analyse LHC data. It integrates computer centres worldwide that provide computing and storage resources into a single infrastructure accessible by all LHC physicists.
- Tier-0 (CERN): data recording, reconstruction and distribution
- Tier-1: permanent storage, reprocessing, analysis
- Tier-2: simulation, end-user analysis
- nearly 170 sites, 40 countries
- ~ cores
- 500 PB of storage
- > 2 million jobs/day
- Gb links
8 LHC Big Data
A few PB of raw data become ~100 PB!
- Duplicate raw data
- Simulated data
- Derived data products
- Versions as software improves
- Replicas to allow access by more physicists
9 How do we store/retrieve LHC data? A short history
- 1st try: all data in a commercial object database (1995)
  - good match for complex data model and OO language integration
  - but the market predicted by many analysts did not materialise!
- 2nd try: all data in a relational DB, with object-relational mapping (1999)
  - PB-scale deployment was far from being proven
  - users code in C++ and rejected data model definition in SQL
- Hybrid between RDBMS and structured files (from today)
  - Relational DBs for transactional management of metadata (only TB-scale)
    - file/dataset metadata, conditions, calibration, provenance, workflow
    - via DB abstraction (plugins: Oracle, MySQL, SQLite, Frontier/SQUID)
  - Open source persistency framework (ROOT)
    - uses C++ introspection to store/retrieve networks of C++ objects
    - column store for efficient sparse reading
Ian.Bird@cern.ch
10 Processing a TTree
[Diagram: a TSelector loops over the events 1, 2, ..., n, last of a TTree. Each event is split into branches and leaves, and only the needed parts are read.]
- Begin(): create histograms, define output list
- Process(): preselection; if ok, analysis
- Terminate(): finalize analysis (fitting, ...)
- Results are collected in the output list
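The Begin/Process/Terminate loop and the "read needed parts only" idea can be illustrated with a small Python sketch. This is not ROOT code: `ColumnStore`, `PtSelector`, the branch names and the sample values are all hypothetical, chosen only to show that a column-wise layout lets the selector skip branches it never asks for.

```python
from collections import Counter


class ColumnStore:
    """Toy column-wise event store: one list per branch (hypothetical, not ROOT)."""

    def __init__(self, branches):
        self.branches = branches
        self.reads = Counter()  # how many values were read from each branch

    def __len__(self):
        return len(next(iter(self.branches.values())))

    def read(self, branch, entry):
        self.reads[branch] += 1
        return self.branches[branch][entry]


class PtSelector:
    """Mimics the three selector phases named on the slide."""

    def begin(self):
        # Create histograms and define the output list.
        self.histogram = Counter()
        self.output = {"pt_hist": self.histogram}

    def process(self, store, entry):
        # Preselection touches a single small branch.
        if store.read("n_tracks", entry) < 2:
            return
        # Analysis reads a second branch only for events that passed.
        pt = store.read("pt", entry)
        self.histogram[int(pt // 10) * 10] += 1

    def terminate(self):
        # Finalize the analysis (here: just count the selected events).
        self.output["selected"] = sum(self.histogram.values())
        return self.output


def run(selector, store):
    selector.begin()
    for entry in range(len(store)):  # loop over events
        selector.process(store, entry)
    return selector.terminate()


store = ColumnStore({
    "n_tracks": [1, 3, 2, 0, 5],
    "pt": [12.0, 25.5, 31.0, 8.0, 27.9],
    "raw_hits": [[0] * 100 for _ in range(5)],  # large branch, never read
})
result = run(PtSelector(), store)
print(result["selected"], dict(store.reads))
```

In this toy run the bulky `raw_hits` branch is never touched, which is the point of storing events column-wise for sparse reading.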
11 CERN Disk Storage Overview
               AFS      CASTOR         EOS      Ceph         NFS      CERNBox
Raw Capacity   3 PB     20 PB          140 PB   4 PB         200 TB   1.1 PB
Data Stored    390 TB   86 PB (tape)   27 PB    170 TB       36 TB    35 TB
Files Stored   2.7 B    300 M          284 M    77 M (obj)   120 M    14 M
- AFS is CERN's Linux home directory service
- CASTOR & EOS are mainly used for the physics use case (data analysis and DAQ)
- Ceph is our storage backend for images and volumes in OpenStack
- NFS is mainly used by engineering applications
- CERNBox is our file synchronisation service based on OwnCloud + EOS
12 Tape at CERN
- Data volume: 100 PB physics archive, 7 PB backup (TSM)
- Tape libraries: 3+2 x IBM TS x Oracle SL8500
- Tape drives: 100 physics archive, 50 backup
- Capacity: 70k slots, 30k tapes
[Chart: archive write and archive read; labels 27 PB, 15 PB, 23 PB]
/4/2015 CHEP 2015, Okinawa
13 Archive: Large scale media migration
- Part 1: Oracle T10000D
- Part 2: IBM TS1150
- Deadline: LHC Run 2 start!
[Diagram labels: Repack, LHC Run1 (shown for each part)]
14/4/2015 CHEP 2015, Okinawa
15 Smart vs Simple Archive: HSM Issues
- CASTOR had been designed as a Hierarchical Storage Management (HSM) system
  - disk-only and multi-pool support were added later, painfully
  - required rates for namespace access and file-open exceeded earlier estimates
- Around LHC start, conceptual issues with the HSM model also became visible
  - a file is not a meaningful granule for managing data exchange: experiments use datasets
  - dataset parts needed to be pinned on disk by users to avoid cache thrashing
- Users had to trick the HSM to do the right thing :-(
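The cache thrashing problem can be reproduced with a toy model. The sketch below assumes a plain least-recently-used disk cache in front of tape; `DiskCache`, its capacity and the four-file dataset are hypothetical and do not describe CASTOR's actual policy. It only shows why a dataset slightly larger than the cache defeats file-level LRU, and how pinning part of it helps.

```python
from collections import OrderedDict


class DiskCache:
    """Toy LRU disk cache in front of tape; pinned files are never evicted."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.files = OrderedDict()  # least recently used first
        self.pinned = set()
        self.tape_recalls = 0

    def pin(self, name):
        self.pinned.add(name)

    def open(self, name):
        if name in self.files:
            self.files.move_to_end(name)  # cache hit: mark as recently used
            return
        self.tape_recalls += 1  # cache miss: recall the file from tape
        self.files[name] = True
        while len(self.files) > self.capacity:
            victim = next(
                (f for f in self.files if f not in self.pinned and f != name),
                None,
            )
            if victim is None:
                break  # everything else is pinned: tolerate the overflow
            del self.files[victim]


def recalls(pinned=()):
    """Read a 4-file dataset three times through a 3-file cache."""
    cache = DiskCache(capacity=3)
    for name in pinned:
        cache.pin(name)
    for _ in range(3):
        for name in ["a", "b", "c", "d"]:
            cache.open(name)
    return cache.tape_recalls


print(recalls())            # every access misses
print(recalls(("a", "b")))  # pinned files are recalled only once
```

Without pinning, each of the 12 accesses goes back to tape; pinning two files of the dataset reduces that to 8 in this model.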
16 EOS Project: Goals & Choices
- Server, media and file system failures need to be transparently absorbed
  - key functionality: file-level replication and rebalancing
  - data stays available after a failure, with no human intervention
- Fine-grained redundancy within one h/w setup
  - choose & change redundancy level for specific data
  - either file replica count or erasure encoding
- Support bulk deployment operations
  - e.g. replace hundreds of servers at end of warranty
- In-memory namespace (sparse hash per directory)
  - file stat calls 1-2 orders of magnitude faster
  - write-ahead logging for durability
- Later in addition: transparent multi-site clustering
  - e.g. between Geneva and Budapest
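The trade-off between a file replica count and erasure encoding can be made concrete with a little arithmetic. The sketch below is illustrative only: the class names, the per-directory `policies` table, the paths and the 2-replica and 4+2 layouts are assumptions for the example, not EOS configuration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Replication:
    copies: int

    def raw_bytes(self, size):
        # Every copy stores the whole file.
        return size * self.copies

    def tolerated_failures(self):
        return self.copies - 1


@dataclass(frozen=True)
class ErasureCoding:
    data_stripes: int
    parity_stripes: int

    def raw_bytes(self, size):
        # The file is split into data stripes, plus parity stripes on top.
        total = self.data_stripes + self.parity_stripes
        return size * total / self.data_stripes

    def tolerated_failures(self):
        return self.parity_stripes


# Hypothetical per-directory policy: redundancy is chosen for specific data.
policies = {
    "/example/user": Replication(copies=2),
    "/example/archive": ErasureCoding(data_stripes=4, parity_stripes=2),
}

for path, policy in policies.items():
    print(path, policy.raw_bytes(1000), policy.tolerated_failures())
```

Under these assumed layouts, the 4+2 encoding survives two failures using 1.5 times the file size, while two replicas survive one failure using 2 times the file size.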
17 Connectivity (100 Gbps) Dante/Géant T-Systems
18 EOS Raw Capacity Evolution
19 Why do we develop our own open source storage software?
- Large science community trained to be effective with a set of products
  - efficiency of this community is our main asset, not just the raw utilisation of CPUs and disks
  - integration and specific support do matter
  - community sharing via tools and formats even more
- Long-term projects
  - change of vendor/technology is not only likely but expected
  - we carry old but valuable data through time (bit-preservation)
  - loss of data ownership after first active project period
20 Does Kryder's law still hold?
[Chart: areal density CAGR]
Source: HDD Opportunities & Challenges, Now to 2020, Dave Anderson, Seagate
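For readers unfamiliar with the metric on the chart, a compound annual growth rate (CAGR) is derived from a start value, an end value and the number of years between them. The sketch below uses the standard formula; the density figures are made-up placeholders, not values from the Seagate source.

```python
import math


def cagr(start, end, years):
    """Compound annual growth rate between two values."""
    return (end / start) ** (1.0 / years) - 1.0


def doubling_time(rate):
    """Years needed to double at a constant annual growth rate."""
    return math.log(2.0) / math.log(1.0 + rate)


# Placeholder numbers: a density that quadruples in two years.
rate = cagr(start=1.0, end=4.0, years=2)
print(f"CAGR: {rate:.0%}, doubling every {doubling_time(rate):.1f} years")
```

A falling CAGR translates directly into a longer doubling time, which is what the question on the slide is about.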
21 Object Disk
- Each disk talks an object storage protocol over TCP
  - replication/failover with other disks in a networked disk cluster
  - open access library for app development
- Why now?
  - shingled media come with constrained (object) semantics: e.g. no updates
- Early stage with several open questions
  - port price for disk network vs price gain by reduced server/power cost?
  - standardisation of protocol/semantics to allow app development at low risk of vendor binding?
22 Can we optimise our systems further?
- Infrastructure analytics
  - apply statistical analysis to the complete system: storage, CPU, network, user app
  - measure/predict quantitative impact of changes on the real job population
- Easy!
  - looks like physics analysis with infrastructure metrics instead of physics data
  - really?
23 Non-trivial
- Technically
  - needs consolidated service and application side metrics
  - usually: log data for human consumption without data design
- Conceptually
  - some established metrics turn out to be less useful for analysis of today's hardware than expected
    - cpu efficiency = t_cpu / t_wall?
    - storage efficiency = GB / s?
  - correlation does not imply causal relation
- Sociologically
  - better observe the rule of local discovery
  - people who quantitatively understand the infrastructure are busy running services. Always.
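One reason a metric such as cpu efficiency = t_cpu / t_wall needs care is that the answer depends on how it is aggregated over a job population. The sketch below uses two invented job records to show that the mean of per-job ratios and the pooled ratio can differ widely; it is an illustration of the metric, not an analysis from the talk.

```python
from statistics import mean

# Hypothetical job records: (cpu seconds, wall-clock seconds).
jobs = [
    (90.0, 100.0),   # short job, CPU bound
    (10.0, 1000.0),  # long job, mostly waiting
]


def cpu_efficiency(t_cpu, t_wall):
    return t_cpu / t_wall


# Average of the per-job efficiencies: every job counts equally.
mean_of_ratios = mean(cpu_efficiency(c, w) for c, w in jobs)

# Pooled efficiency: total CPU time over total wall time,
# so long jobs dominate.
pooled = sum(c for c, _ in jobs) / sum(w for _, w in jobs)

print(f"mean of per-job ratios: {mean_of_ratios:.3f}")
print(f"pooled ratio:           {pooled:.3f}")
```

Which of the two numbers is the useful one depends on the question being asked, so the aggregation has to be stated along with the metric.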
24 Data Collection and Analysis Repository
- Sources: eos, lsf, ai monitoring
- JSON files; periodic extract & cleaning into HDFS
- Hadoop cluster of MR nodes, ramping up: ~100 nodes, ~100 TB raw logs
- Example set (EOS):
  - readbytes: number
  - filename: string
  - opentime: time
- User extract/export: small, binary subset
- In production: Flume, HDFS, MR, Pig, Spark, Sqoop, {Impala}
- Current work items:
  - Service: availability (e.g. isolation and rolling upgrades)
  - Analytics: workbooks, support for popular analysis tools: R/python/ROOT
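The extract-and-cleaning step can be pictured as turning raw JSON log lines into records with the typed fields listed for the EOS set. The sketch below is a minimal stand-in in plain Python: the sample lines, the assumption that `opentime` is stored as epoch seconds, and the rule of dropping malformed lines are all illustrative choices, not the production pipeline.

```python
import json
from collections import defaultdict
from datetime import datetime, timezone


def clean(line):
    """Parse one JSON log line into a typed record, or None if unusable."""
    try:
        raw = json.loads(line)
        return {
            "readbytes": int(raw["readbytes"]),
            "filename": str(raw["filename"]),
            # Assumption: opentime is given as seconds since the epoch.
            "opentime": datetime.fromtimestamp(
                float(raw["opentime"]), tz=timezone.utc
            ),
        }
    except (ValueError, KeyError, TypeError):
        return None  # malformed or incomplete line


# Invented sample input, including two lines that must be rejected.
lines = [
    '{"readbytes": 1024, "filename": "/eos/a.root", "opentime": 1443436800}',
    '{"readbytes": "2048", "filename": "/eos/a.root", "opentime": 1443436860}',
    'not json at all',
    '{"filename": "/eos/b.root", "opentime": 1443436900}',
]

records = [r for r in map(clean, lines) if r is not None]

# A first, simple analysis: bytes read per file.
bytes_per_file = defaultdict(int)
for record in records:
    bytes_per_file[record["filename"]] += record["readbytes"]

print(len(records), dict(bytes_per_file))
```

Fixing the field types at this stage is what makes the later analysis steps independent of the quirks of the original log format.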
25 Summary
- CERN has a long tradition in deploying large-scale storage systems used by a distributed science community worldwide
- During the first LHC run period we passed the 100 PB mark at CERN and, more importantly, contributed to the rapid confirmation of the Higgs boson and many other LHC results
- For LHC Run 2 we have significantly upgraded & optimised the infrastructure in close collaboration between service providers and users
- We are adding more quantitative infrastructure analytics to prepare for the High-Luminosity LHC
- CERN is already very active as user and provider in the open source world, and the overlap with other Big Data communities is increasing
26 Thank you!
Data Warehousing and Analytics Infrastructure at Facebook Ashish Thusoo & Dhruba Borthakur athusoo,dhruba@facebook.com Overview Challenges in a Fast Growing & Dynamic Environment Data Flow Architecture,
More informationWell packaged sets of preinstalled, integrated, and optimized software on select hardware in the form of engineered systems and appliances
INSIGHT Oracle's All- Out Assault on the Big Data Market: Offering Hadoop, R, Cubes, and Scalable IMDB in Familiar Packages Carl W. Olofson IDC OPINION Global Headquarters: 5 Speen Street Framingham, MA
More informationConstructing a Data Lake: Hadoop and Oracle Database United!
Constructing a Data Lake: Hadoop and Oracle Database United! Sharon Sophia Stephen Big Data PreSales Consultant February 21, 2015 Safe Harbor The following is intended to outline our general product direction.
More informationOverview. Big Data in Apache Hadoop. - HDFS - MapReduce in Hadoop - YARN. https://hadoop.apache.org. Big Data Management and Analytics
Overview Big Data in Apache Hadoop - HDFS - MapReduce in Hadoop - YARN https://hadoop.apache.org 138 Apache Hadoop - Historical Background - 2003: Google publishes its cluster architecture & DFS (GFS)
More informationThe CMS Tier0 goes Cloud and Grid for LHC Run 2. Dirk Hufnagel (FNAL) for CMS Computing
The CMS Tier0 goes Cloud and Grid for LHC Run 2 Dirk Hufnagel (FNAL) for CMS Computing CHEP, 13.04.2015 Overview Changes for the Tier0 between Run 1 and Run 2 CERN Agile Infrastructure (in GlideInWMS)
More informationData Warehouse as a Service. Lot 2 - Platform as a Service. Version: 1.1, Issue Date: 05/02/2014. Classification: Open
Data Warehouse as a Service Version: 1.1, Issue Date: 05/02/2014 Classification: Open Classification: Open ii MDS Technologies Ltd 2014. Other than for the sole purpose of evaluating this Response, no
More informationCaringo Swarm 7: beyond the limits of traditional storage. A new private cloud foundation for storage needs at scale
Caringo Swarm 7: beyond the limits of traditional storage. A new private cloud foundation for storage needs at scale Prepared for: Caringo May 2014 TABLE OF CONTENTS TABLE OF CONTENTS 1 EXECUTIVE SUMMARY
More informationData sharing and Big Data in the physical sciences. 2 October 2015
Data sharing and Big Data in the physical sciences 2 October 2015 Content Digital curation: Data and metadata Why consider the physical sciences? Astronomy: Video Physics: LHC for example. Video The Research
More informationPrivate Cloud Storage for Media Applications. Bang Chang Vice President, Broadcast Servers and Storage bang.chang@xor-media.com
Private Cloud Storage for Media Bang Chang Vice President, Broadcast Servers and Storage bang.chang@xor-media.com Table of Contents Introduction Cloud Storage Requirements Application transparency Universal
More information