Big Science and Big Data Dirk Duellmann, CERN Apache Big Data Europe 28 Sep 2015, Budapest, Hungary
3 Real-Time Analytics: Making better and faster business decisions (16/02/2015)
5 The ATLAS experiment
- Detector mass: … tons
- 150 million sensors generating data 40 million times per second, i.e. a petabyte/s
CERN IT Department, CH-1211 Genève 23, Switzerland
6 Data Collection and Archiving at CERN
- Data flow to permanent storage: 4-6 GB/sec
- LHCb: … MB/sec
- ATLAS: 1-2 GB/sec
- ALICE: 4 GB/sec
- CMS: 1-2 GB/sec
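The rates on this and the previous slide can be compared with a few lines of arithmetic. The Python sketch below uses only the figures quoted above to compute the ATLAS reduction from sensor output to stored data, and the daily volume written at CERN. Treating the storage rate as sustained around the clock is an assumption made for this illustration, so the daily figure is only an upper bound.

```python
# Rates as quoted on the slides (bytes per second).
ATLAS_SENSOR_RATE = 1e15        # "a petabyte/s" generated by the ATLAS sensors
ATLAS_STORED_RATE = (1e9, 2e9)  # ATLAS share of the flow to permanent storage
CERN_STORED_RATE = (4e9, 6e9)   # total flow to permanent storage at CERN
SECONDS_PER_DAY = 86_400


def reduction_factor(produced, kept):
    """Bytes generated for every byte that is kept."""
    return produced / kept


def tb_per_day(rate):
    """Volume after 24 h of continuous writing, in TB (1 TB = 1e12 bytes)."""
    return rate * SECONDS_PER_DAY / 1e12


lo, hi = ATLAS_STORED_RATE
print(f"ATLAS keeps 1 byte in {reduction_factor(ATLAS_SENSOR_RATE, hi):,.0f}"
      f" to {reduction_factor(ATLAS_SENSOR_RATE, lo):,.0f}")
# prints: ATLAS keeps 1 byte in 500,000 to 1,000,000
lo, hi = CERN_STORED_RATE
print(f"CERN upper bound: {tb_per_day(lo):.0f}-{tb_per_day(hi):.0f} TB/day")
# prints: CERN upper bound: 346-518 TB/day
```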
7 The Worldwide LHC Computing Grid
- An international collaboration to distribute and analyse LHC data
- Integrates computer centres worldwide that provide computing and storage resources into a single infrastructure accessible by all LHC physicists
- Tier-0 (CERN): data recording, reconstruction and distribution
- Tier-1: permanent storage, reprocessing, analysis
- Tier-2: simulation, end-user analysis
- Nearly 170 sites in 40 countries, ~… cores, 500 PB of storage, > 2 million jobs/day, … Gb links
8 LHC Big Data
A few PB of raw data becomes ~100 PB!
- Duplicate raw data
- Simulated data
- Derived data products
- Versions as software improves
- Replicas to allow access by more physicists
9 How do we store/retrieve LHC data? A short history
- 1st try: all data in a commercial object database (1995). A good match for the complex data model and OO language integration, but the market predicted by many analysts did not materialise!
- 2nd try: all data in a relational DB with object-relational mapping (1999). PB-scale deployment was far from being proven; users code in C++ and rejected data model definition in SQL.
- Hybrid between RDBMS and structured files (in use today)
  - Relational DBs for transactional management of metadata (TB-scale only): file/dataset metadata, conditions, calibration, provenance, workflow, via a DB abstraction (plugins: Oracle, MySQL, SQLite, Frontier/SQUID)
  - Open source persistency framework (ROOT): uses C++ introspection to store/retrieve networks of C++ objects; column-store for efficient sparse reading
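The column-store point can be illustrated with a toy model. The sketch below is not ROOT's I/O code: it is a Python illustration with hypothetical event attributes (`px`, `py`, `pz`, `energy`) that counts how many stored values must be touched to read a single attribute from a row-wise layout versus a column-wise one.

```python
from dataclasses import dataclass, fields


@dataclass
class Event:
    # Hypothetical per-event attributes; real LHC events carry far more.
    px: float
    py: float
    pz: float
    energy: float


def row_layout(events):
    """Row-wise: all attributes of one event stored together."""
    return [[getattr(e, f.name) for f in fields(Event)] for e in events]


def column_layout(events):
    """Column-wise: one contiguous array per attribute (similar in spirit to a branch)."""
    return {f.name: [getattr(e, f.name) for e in events] for f in fields(Event)}


def read_attribute_rows(rows, index):
    touched, values = 0, []
    for row in rows:
        touched += len(row)  # the whole record has to be read and decoded
        values.append(row[index])
    return values, touched


def read_attribute_columns(columns, name):
    column = columns[name]
    return list(column), len(column)  # only this one column is read


events = [Event(1.0 * i, 2.0 * i, 3.0 * i, 4.0 * i) for i in range(1000)]
vals_r, touched_r = read_attribute_rows(row_layout(events), 3)  # index 3 = energy
vals_c, touched_c = read_attribute_columns(column_layout(events), "energy")
assert vals_r == vals_c
print(touched_r, touched_c)  # 4000 1000
```

With only four attributes the saving is 4x; it grows with the number of attributes an analysis does not need.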
10 Processing a TTree with a TSelector
- Begin(): create histograms, define the output list
- Process(): preselection; if OK, run the analysis on the event
- Terminate(): finalize the analysis (fitting, ...)
- The TTree is organised in branches and leaves; the loop over events (1, 2, …, n, last) reads only the needed parts
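The Begin/Process/Terminate lifecycle can be mimicked in a few lines of Python. This is a hypothetical stand-in, not the ROOT `TSelector` API: `MiniSelector`, `run`, the dict-of-lists "tree" and the 10-unit histogram bins are all invented for illustration. The driver materialises only the branches the selector declares, echoing "read needed parts only".

```python
from collections import Counter


class MiniSelector:
    """Hypothetical Python stand-in for the Begin/Process/Terminate lifecycle."""
    branches = ("energy",)  # the only branch this analysis reads

    def begin(self):
        self.histogram = Counter()      # create histograms
        self.output = [self.histogram]  # define output list

    def process(self, event):
        if event["energy"] < 10.0:      # preselection
            return False
        self.histogram[int(event["energy"] // 10) * 10] += 1  # fill 10-unit bins
        return True

    def terminate(self):
        # Finalize; a real analysis would fit the histogram here.
        self.n_selected = sum(self.histogram.values())


def run(tree, selector):
    """Loop over events, reading only the branches the selector asks for."""
    selector.begin()
    n_events = len(next(iter(tree.values())))
    for i in range(n_events):
        event = {b: tree[b][i] for b in selector.branches}
        selector.process(event)
    selector.terminate()
    return selector.output


tree = {"energy": [5.0, 12.5, 18.0, 27.0, 9.9], "charge": [1, -1, 1, 1, -1]}
sel = MiniSelector()
(hist,) = run(tree, sel)
print(dict(hist), sel.n_selected)  # {10: 2, 20: 1} 3
```

The `charge` branch is never touched, which is the property a columnar tree makes cheap.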
11 CERN Disk Storage Overview

                 AFS      CASTOR         EOS      Ceph         NFS      CERNBox
Raw capacity     3 PB     20 PB          140 PB   4 PB         200 TB   1.1 PB
Data stored      390 TB   86 PB (tape)   27 PB    170 TB       36 TB    35 TB
Files stored     2.7 B    300 M          284 M    77 M (obj)   120 M    14 M

- AFS is CERN's Linux home directory service
- CASTOR and EOS are mainly used for the physics use case (data analysis and DAQ)
- Ceph is our storage backend for images and volumes in OpenStack
- NFS is mainly used by engineering applications
- CERNBox is our file synchronisation service based on ownCloud+EOS
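Dividing stored volume by file count gives a rough mean file size per service. The sketch below uses the table's rounded figures with decimal units (1 PB = 1e15 B, 1 TB = 1e12 B); for Ceph the count is objects rather than files. A mean of roughly 0.14 MB for AFS against roughly 95 MB for EOS suggests very different metadata loads per byte stored.

```python
# (data stored in bytes, number of files) taken from the table above
SERVICES = {
    "AFS": (390e12, 2.7e9),
    "CASTOR (tape)": (86e15, 300e6),
    "EOS": (27e15, 284e6),
    "Ceph (objects)": (170e12, 77e6),
    "NFS": (36e12, 120e6),
    "CERNBox": (35e12, 14e6),
}


def mean_file_size_mb(stored_bytes, n_files):
    """Average size per file (or object) in MB (1 MB = 1e6 bytes)."""
    return stored_bytes / n_files / 1e6


for name, (stored, n) in SERVICES.items():
    print(f"{name:15s} {mean_file_size_mb(stored, n):8.2f} MB")
```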
12 Tape at CERN
- Data volume: 100 PB physics archive, 7 PB backup (TSM)
- Archive write / archive read (plot values): 27 PB, 15 PB, 23 PB
- Tape libraries: 3+2 x IBM TS…, … x Oracle SL8500
- Tape drives: 100 physics archive, 50 backup
- Capacity: 70k slots, 30k tapes
CHEP 2015, Okinawa
13 Archive: Large-scale media migration
- Part 1: Oracle T10000D
- Part 2: IBM TS1150
- Repack of LHC Run 1 data; deadline: LHC Run 2 start!
CHEP 2015, Okinawa, 14/4/2015
15 Smart vs Simple Archive: HSM Issues
- CASTOR had been designed as a Hierarchical Storage Management (HSM) system; disk-only and multi-pool support were added later, painfully
- Required rates for namespace access and file-open exceeded earlier estimates
- Around LHC start, conceptual issues with the HSM model also became visible:
  - A file is not a meaningful granule for managing data exchange: experiments use datasets
  - Dataset parts needed to be pinned on disk by users to avoid cache thrashing
  - Users had to trick the HSM into doing the right thing :-(
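Cache thrashing can be reproduced with a toy model: a disk cache with least-recently-used (LRU) eviction in front of tape, and a job that repeatedly scans a dataset slightly larger than the cache. LRU is an assumption here (the slide does not say which policy CASTOR used), and the `simulate` function and its `pinned` argument, which models users pinning dataset parts on disk, are hypothetical.

```python
from collections import OrderedDict


def simulate(capacity, accesses, pinned=frozenset()):
    """Count hits of an LRU disk cache (size in files) in front of tape.

    Files in `pinned` are never evicted; they still occupy cache slots."""
    cache = OrderedDict()  # least recently used first
    hits = 0
    for f in accesses:
        if f in cache:
            hits += 1
            cache.move_to_end(f)  # mark as most recently used
            continue
        # Miss: the file is recalled from tape. Make room if the cache is full.
        if len(cache) >= capacity:
            victim = next((c for c in cache if c not in pinned), None)
            if victim is None:
                continue  # everything is pinned: serve without caching
            del cache[victim]
        cache[f] = True
    return hits


scan = list(range(10)) * 3  # a job reading a 10-file dataset three times
print(simulate(8, scan))                              # 0: pure thrashing
print(simulate(8, scan, pinned=frozenset(range(7))))  # 14
```

With plain LRU every access evicts the file that will be needed next; pinning part of the dataset turns 7 of every 10 reads after the first pass into hits.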
16 EOS Project: Goals & Choices
- Server, media and file system failures need to be transparently absorbed
  - Key functionality: file-level replication and rebalancing
  - Data stays available after a failure, with no human intervention
- Fine-grained redundancy within one h/w setup
  - Choose & change the redundancy level for specific data: either file replica count or erasure encoding
- Support bulk deployment operations, e.g. replacing hundreds of servers at end of warranty
- In-memory namespace (sparse hash per directory)
  - File stat calls 1-2 orders of magnitude faster
  - Write-ahead logging for durability
- Later in addition: transparent multi-site clustering, e.g. between Geneva and Budapest
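A minimal sketch of the namespace design named on the slide: an in-memory hash per directory made durable by write-ahead logging. It is a Python illustration, not EOS code. A plain `dict` stands in for the sparse hash, and the `Namespace` class, its methods and the JSON log format are hypothetical.

```python
import json
import os
import tempfile


class Namespace:
    """Hypothetical in-memory namespace: one hash map per directory,
    made durable by a write-ahead log (WAL) replayed on start-up."""

    def __init__(self, log_path):
        self.dirs = {}  # directory path -> {file name: metadata}
        if os.path.exists(log_path):
            with open(log_path, encoding="utf-8") as log:
                for line in log:
                    self._apply(json.loads(line))  # replay history
        self._log = open(log_path, "a", encoding="utf-8")

    def _apply(self, rec):
        entries = self.dirs.setdefault(rec["dir"], {})
        if rec["op"] == "create":
            entries[rec["name"]] = rec["meta"]
        elif rec["op"] == "unlink":
            entries.pop(rec["name"], None)

    def _commit(self, rec):
        # Write-ahead: the record reaches stable storage before memory changes.
        self._log.write(json.dumps(rec) + "\n")
        self._log.flush()
        os.fsync(self._log.fileno())
        self._apply(rec)

    def create(self, directory, name, size):
        self._commit({"op": "create", "dir": directory, "name": name,
                      "meta": {"size": size}})

    def unlink(self, directory, name):
        self._commit({"op": "unlink", "dir": directory, "name": name})

    def stat(self, directory, name):
        # Served from memory only: no disk I/O on the lookup path.
        return self.dirs.get(directory, {}).get(name)

    def close(self):
        self._log.close()


log_path = os.path.join(tempfile.mkdtemp(), "namespace.wal")
ns = Namespace(log_path)
ns.create("/eos/demo", "a.root", 2_000_000)
ns.create("/eos/demo", "b.root", 1_500_000)
ns.unlink("/eos/demo", "a.root")
ns.close()

restarted = Namespace(log_path)  # simulated restart: state rebuilt from the log
print(restarted.stat("/eos/demo", "b.root"))  # {'size': 1500000}
print(restarted.stat("/eos/demo", "a.root"))  # None
restarted.close()
```

Lookups never touch disk, while every mutation pays one synchronous log append; a production design would also need log compaction, which this sketch omits.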
17 Connectivity (100 Gbps): Dante/Géant, T-Systems
18 EOS Raw Capacity Evolution
19 Why do we develop our own open source storage software?
- A large science community trained to be effective with a set of products
  - The efficiency of this community is our main asset, not just the raw utilisation of CPUs and disks
  - Integration and specific support do matter; community sharing via tools and formats matters even more
- Long-term projects
  - A change of vendor/technology is not only likely but expected
  - We carry old but valuable data through time (bit preservation)
  - Loss of data ownership after the first active project period
20 Does Kryder's law still hold?
(Plot: areal density CAGR. Source: HDD Opportunities & Challenges, Now to 2020, Dave Anderson, Seagate)
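Kryder's law describes areal density growing exponentially, so the question on the slide comes down to the compound annual growth rate (CAGR). A CAGR g corresponds to a doubling time of ln 2 / ln(1 + g). The rates below are hypothetical inputs chosen for illustration, not values read from the Seagate plot.

```python
import math


def doubling_time_years(cagr):
    """Years for areal density to double at compound annual growth `cagr` (0.40 = 40%/year)."""
    return math.log(2) / math.log1p(cagr)


for rate in (0.40, 0.15):  # hypothetical growth rates for illustration
    print(f"CAGR {rate:.0%}: doubling every {doubling_time_years(rate):.1f} years")
```

A drop from 40% to 15% per year stretches the doubling time from about 2 to about 5 years, which changes storage cost planning considerably.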
21 Object Disk
- Each disk talks an object storage protocol over TCP
  - Replication/failover with other disks in a networked disk cluster
  - Open access library for app development
- Why now? Shingled media come with constrained (object) semantics, e.g. no updates
- Early stage, with several open questions:
  - Port price for the disk network vs. price gain from reduced server/power cost?
  - Standardisation of protocol/semantics to allow app development at low risk of vendor lock-in?
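The "no updates" constraint changes how applications are written. The sketch below models a hypothetical write-once object interface in Python (the `ObjectDisk` class and `replace` helper are invented; no real object-disk protocol or TCP transport is modelled): objects can be created, read and deleted, and an "update" has to be expressed as delete plus a whole new put.

```python
class ObjectDisk:
    """Hypothetical object-disk interface with write-once semantics."""

    def __init__(self):
        self._objects = {}

    def put(self, key, data: bytes):
        if key in self._objects:
            raise FileExistsError(f"object {key!r} exists; objects cannot be updated")
        self._objects[key] = bytes(data)

    def get(self, key) -> bytes:
        return self._objects[key]

    def delete(self, key):
        del self._objects[key]


def replace(disk, key, data):
    """An 'update' becomes delete + put of a complete new object (not atomic here)."""
    disk.delete(key)
    disk.put(key, data)


disk = ObjectDisk()
disk.put("run2/file1", b"v1")
try:
    disk.put("run2/file1", b"v2")  # in-place overwrite is rejected
except FileExistsError as err:
    print(err)
replace(disk, "run2/file1", b"v2")
print(disk.get("run2/file1"))  # b'v2'
```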
22 Can we optimise our systems further?
- Infrastructure analytics
  - Apply statistical analysis to the complete system: storage, CPU, network, user applications
  - Measure/predict the quantitative impact of changes on the real job population
- Easy! It looks like physics analysis, with infrastructure metrics instead of physics data... really?
23 Non-trivial
- Technically
  - Needs consolidated service- and application-side metrics
  - Usually: log data for human consumption, without data design
- Conceptually
  - Some established metrics turn out to be less useful than expected for analysing today's hardware: CPU efficiency = t_cpu / t_wall? Storage efficiency = GB/s?
  - Correlation does not imply causation
- Sociologically
  - Better observe the rule of local discovery
  - People who quantitatively understand the infrastructure are busy running services. Always.
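One way t_cpu / t_wall can mislead is sketched below with invented numbers. The assumption, labelled as such, is that running on a shared core (for example next to a busy hyperthread sibling) inflates the CPU time needed per event. The ratio then rises while useful throughput falls.

```python
from dataclasses import dataclass


@dataclass
class Job:
    name: str
    t_cpu: float   # CPU seconds consumed
    t_wall: float  # wall-clock seconds
    events: int    # events processed

    @property
    def cpu_efficiency(self):
        return self.t_cpu / self.t_wall

    @property
    def events_per_wall_second(self):
        return self.events / self.t_wall


# Hypothetical numbers: the same 1000-event job on an idle core and on a
# shared core where (by assumption) each event costs more CPU time.
jobs = [Job("idle core", t_cpu=100.0, t_wall=125.0, events=1000),
        Job("shared core", t_cpu=160.0, t_wall=180.0, events=1000)]

for j in jobs:
    print(f"{j.name}: efficiency {j.cpu_efficiency:.2f}, "
          f"{j.events_per_wall_second:.1f} events/s")
```

The shared-core job scores better on CPU efficiency (0.89 vs 0.80) yet processes fewer events per second, so a throughput metric tied to the actual work is the safer basis for comparison.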
24 Data Collection and Analysis Repository
- Sources: eos, lsf, ai monitoring, delivered as JSON files
- Periodic extract & cleaning into small, binary subsets stored in HDFS, e.g. an EOS set with readbytes: number, filename: string, opentime: time
- User extract/export from the sets
- Hadoop cluster of MapReduce nodes, ramping up: ~100 nodes, ~100 TB of raw logs
- In production: Flume, HDFS, MapReduce, Pig, Spark, Sqoop, {Impala}
- Current work items
  - Service: availability (e.g. isolation and rolling upgrades)
  - Analytics: workbooks, support for popular analysis tools: R/python/ROOT
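The "periodic extract & cleaning" step can be sketched as a small function that turns raw JSON log lines into typed records matching the EOS set on the slide (readbytes: number, filename: string, opentime: time). The raw field names `rb`, `path` and `ots` and the epoch-seconds timestamp format are hypothetical; the real EOS log layout is not shown on the slide.

```python
import json
from datetime import datetime, timezone


def clean(raw_lines):
    """Parse JSON log lines, keep a small typed subset, drop malformed records."""
    for line in raw_lines:
        try:
            rec = json.loads(line)
            row = {
                "readbytes": int(rec["rb"]),       # number
                "filename": str(rec["path"]),      # string
                "opentime": datetime.fromtimestamp(float(rec["ots"]),
                                                   tz=timezone.utc),  # time
            }
        except (ValueError, KeyError, TypeError):
            continue  # cleaning: skip records that do not parse
        yield row


raw = [
    '{"rb": 1048576, "path": "/eos/cms/f1.root", "ots": 1443398400}',
    'not json',                                           # malformed line
    '{"path": "/eos/cms/f2.root", "ots": 1443398460}',    # missing readbytes
]
rows = list(clean(raw))
print(len(rows), rows[0]["filename"], rows[0]["opentime"].isoformat())
```

Keeping only typed, well-formed fields at extract time is what makes the resulting sets small and cheap to query repeatedly.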
25 Summary
- CERN has a long tradition of deploying large-scale storage systems used by a distributed, worldwide science community
- During the first LHC run period we passed the 100 PB mark at CERN and, more importantly, contributed to the rapid confirmation of the Higgs boson and many other LHC results
- For LHC Run 2 we have significantly upgraded and optimised the infrastructure, in close collaboration between service providers and users
- We are adding more quantitative infrastructure analytics to prepare for the High-Luminosity LHC
- CERN is already very active as a user and provider in the open source world, and the overlap with other Big Data communities is increasing
26 Thank you!
