LSST Database server. Osman AÏDEL

Similar documents
WHITEPAPER: Understanding Pillar Axiom Data Protection Options

F600Q 8Gb FC Storage Performance Report Date: 2012/10/30

Database Administration with MySQL

Performance Comparison of Intel Enterprise Edition for Lustre* software and HDFS for MapReduce Applications

MySQL Storage Engines

GraySort on Apache Spark by Databricks

Performance Comparison of SQL based Big Data Analytics with Lustre and HDFS file systems

Arrow ECS sp. z o.o. Oracle Partner Academy training environment with Oracle Virtualization. Oracle Partner HUB

Agenda. Enterprise Application Performance Factors. Current form of Enterprise Applications. Factors to Application Performance.

Oracle Database In-Memory The Next Big Thing

White paper. QNAP Turbo NAS with SSD Cache

Vendor and Hardware Platform: Fujitsu BX924 S2 Virtualization Platform: VMware ESX 4.0 Update 2 (build )

Accelerating Enterprise Applications and Reducing TCO with SanDisk ZetaScale Software

Sawmill Log Analyzer Best Practices!! Page 1 of 6. Sawmill Log Analyzer Best Practices

SUN/Oracle WSCA Price List. Please call (877) SUN-LATG (877) (504) Or Fax (504) Or

SQL Server Business Intelligence on HP ProLiant DL785 Server

HP reference configuration for entry-level SAS Grid Manager solutions

MedInformatix System Requirements

TREND MICRO SOFTWARE APPLIANCE SUPPORT

Enkitec Exadata Storage Layout

Optimizing Performance. Training Division New Delhi

Oracle Database Deployments with EMC CLARiiON AX4 Storage Systems

Four Orders of Magnitude: Running Large Scale Accumulo Clusters. Aaron Cordova Accumulo Summit, June 2014

Marvell DragonFly. TPC-C OLTP Database Benchmark: 20x Higher-performance using Marvell DragonFly NVCACHE with SanDisk X110 SSD 256GB

VMware VMmark V1.1.1 Results

S 2 -RAID: A New RAID Architecture for Fast Data Recovery

Scaling from Datacenter to Client

System Requirements for Netmail Archive

Scaling in a Hypervisor Environment

Protect SQL Server 2012 AlwaysOn Availability Group with Hitachi Application Protector

SAP HANA - Main Memory Technology: A Challenge for Development of Business Applications. Jürgen Primsch, SAP AG July 2011

NEWSTAR ENTERPRISE and NEWSTAR Sales System Requirements

Performance Characteristics of VMFS and RDM VMware ESX Server 3.0.1

Virtuoso and Database Scalability

SUN HARDWARE FROM ORACLE: PRICING FOR EDUCATION

Oracle DBA Course Contents

out of this world guide to: POWERFUL DEDICATED SERVERS

SECURE Web Gateway Sizing Guide

System Requirements Table of contents

Managing Storage Space in a Flash and Disk Hybrid Storage System

Amadeus SAS Specialists Prove Fusion iomemory a Superior Analysis Accelerator

MySQL Cluster Deployment Best Practices

HP Z Turbo Drive PCIe SSD

SYSTEM SETUP FOR SPE PLATFORMS

Summary: This paper examines the performance of an XtremIO All Flash array in an I/O intensive BI environment.

PADS GPFS Filesystem: Crash Root Cause Analysis. Computation Institute

Technical Paper. Performance and Tuning Considerations for SAS on Pure Storage FA-420 Flash Array

IBM DB2: LUW Performance Tuning and Monitoring for Single and Multiple Partition DBs

Enterprise Edition. Hardware Requirements

Technical Paper. Performance and Tuning Considerations for SAS on the EMC XtremIO TM All-Flash Array

Appendix 3. Specifications for e-portal

IMPLEMENTING GREEN IT

Partitioning under the hood in MySQL 5.5

IncidentMonitor Server Specification Datasheet

Virtualizing Microsoft SQL Server 2008 on the Hitachi Adaptable Modular Storage 2000 Family Using Microsoft Hyper-V

Distributed File System. MCSN N. Tonellotto Complements of Distributed Enabling Platforms

ewebguru Web Hosting at its best

Cisco Prime Home 5.0 Minimum System Requirements (Standalone and High Availability)

Measuring Interface Latencies for SAS, Fibre Channel and iscsi

N /150/151/160 RAID Controller. N MegaRAID CacheCade. Feature Overview

OBM / FREQUENTLY ASKED QUESTIONS (FAQs) Can you explain the concept briefly on how the software actually works? What is the recommended bandwidth?

Optimizing TYPO3 performance

Deploying and Optimizing SQL Server for Virtual Machines

Certification Document macle GmbH Grafenthal-S1212M 24/02/2015. macle GmbH Grafenthal-S1212M Storage system

How to recover a failed Storage Spaces

AP ENPS ANYWHERE. Hardware and software requirements

Introduction to Cloud Computing

NEXTGEN v5.8 HARDWARE VERIFICATION GUIDE CLIENT HOSTED OR THIRD PARTY SERVERS

Very Large Enterprise Network, Deployment, Users

Uptime Infrastructure Monitor. Installation Guide

Safe Harbor Statement

Solution Brief July All-Flash Server-Side Storage for Oracle Real Application Clusters (RAC) on Oracle Linux

InfoScale Storage & Media Server Workloads

Sizing guide for Microsoft Hyper-V on HP server and storage technologies

System requirements for A+

Xanadu 130. Business Class Storage Solution. 8G FC Host Connectivity and 6G SAS Backplane. 2U 12-Bay 3.5 Form Factor

Current Status of FEFS for the K computer

SAP HANA PLATFORM Top Ten Questions for Choosing In-Memory Databases. Start Here

Video Surveillance Storage and Verint Nextiva NetApp Video Surveillance Storage Solution

Analysis of VDI Storage Performance During Bootstorm

Cisco-EMC Microsoft SQL Server Fast Track Warehouse 3.0 Enterprise Reference Configurations. Data Sheet

Purchase of High Performance Computing (HPC) Central Compute Resources by Northwestern Researchers

Large Scale Storage. Orlando Richards, Information Services LCFG Users Day, University of Edinburgh 18 th January 2013

Investigation of storage options for scientific computing on Grid and Cloud facilities

Benchmarking Cassandra on Violin

High Performance Oracle RAC Clusters A study of SSD SAN storage A Datapipe White Paper

StarWind iscsi SAN: Configuring Global Deduplication May 2012

Performance Validation and Test Results for Microsoft Exchange Server 2010 Enabled by EMC CLARiiON CX4-960

Encrypting MySQL data at Google. Jonas Oreland and Jeremy Cole

GPFS Storage Server. Concepts and Setup in Lemanicus BG/Q system" Christian Clémençon (EPFL-DIT)" " 4 April 2013"

BENCHMARKING CLOUD DATABASES CASE STUDY on HBASE, HADOOP and CASSANDRA USING YCSB

Enterprise Network Deployment, 10,000 25,000 Users

Dragon Medical Enterprise Network Edition Technical Note: Requirements for DMENE Networks with virtual servers

Install Guide for JunosV Wireless LAN Controller

Microsoft SQL Server 2008 Data and Backup Compression

Oracle Architecture. Overview

Update: About Apple RAID Version 1.5 About this update

DB2 for Linux, UNIX, and Windows Performance Tuning and Monitoring Workshop

Date: March Reference No. RTS-CB 018

Intel Solid- State Drive Data Center P3700 Series NVMe Hybrid Storage Performance

Transcription:

LSST Database server Osman AÏDEL

Plan n Infrastructure n State of the art n Post ingestion Procedure n Indexes Osman AÏDEL mercredi 18 décembre 13 2

Infrastructure " Serveur NEC - Express5800/120Rg-1 " 2 processors Intel(R) Xeon(R) CPU 5160 @ 3.00GHz (Dual core) " 40GB RAM " 2 SAS disks [15k tpm] of 300 GB (RAID 1) " QLogic 4Gb/s FC dual-port card " Pillar Axiom 600 " 1 SAN Slammer " 8 x Brick SATA v2 (13x2TB) " File system " Ext4 : block size 4KB " 12 To allocated in RAID5 Osman AÏDEL mercredi 18 décembre 3

Plan n Infrastructure n State of the art n Post ingestion Procedure n Indexes Osman AÏDEL mercredi 18 décembre 4

State of the art lsst_prod_dedupe_byfilter_g lsst_prod_dedupe_byfilter_i lsst_prod_dedupe_byfilter_r lsst_prod_dedupe_byfilter_z lsst_prod_dedupe_byfilter_u lsst_prod_dc_2013_2 18 databases TOP 5 : SDSS DR7 Survey Stripe 82 Index Size (GB) Data Size(GB) krughoff_sdss_quality_db krughoff_early_prod_11032012 Others 0 100 200 300 400 500 600 700 800 Osman AÏDEL mercredi 18 décembre 5

State of the art RunDeepForcedSource - MyISAM Engine - 87 columns - Single Primary key - 7 indexes (keys) - 2 billion rows - Row size : 395 B - Data size : around 710 GB - Index size : around 150 GB Osman AÏDEL mercredi 18 décembre 6

State of the art Osman AÏDEL mercredi 18 décembre 7

Plan n Infrastructure n State of the art n Post ingestion Procedure n Indexes Osman AÏDEL mercredi 18 décembre 8

Post ingestion procedure https://dev.lsstcorp.org/trac/wiki/summer2013/configandstacktestingplans/dedupeforcedsources Step 1: Data Loading on RunDeepSource with PK Step 2 : Enabling keys on RunDeepSource Step 3 : Adding an additional key on RunDeepSource Step 4 : Creating a temporary table including data Step 5 : Adding keys to the temporary table Step 6 : Creating a memory table based on step 4 Step 7 :updating data on RunDeepSource from step 6 Execution Time (hours) 0 10 20 30 40 50 60 Total execution time 154 hours ( 6 days 10 hours) Osman AÏDEL mercredi 18 décembre 9

Post ingestion procedure n Step 2 : enabling keys on RunDeepForcedSource Mysql needs to sort data before building the index Mysql has two sort algorithms for sorting Key cache : recommended for small indexes Filesort : recommended for huge indexes From Mysql client Anyone can run it Very difficult to force the filesort algorithm Ìmpredictible switching from filesort to key cache algorithm Osman AÏDEL mercredi 18 décembre 10

Post ingestion procedure n Step 2 : enabling keys on RunDeepForcedSource From Myisamchk command Possibility to parallelize the index building More flexible than Mysql client Best performance in mono-thread with myisam_sort_buffer_size=2gb A bug prevents to exceed 4GB on the sort_buffer_size Local access is required Table locked Consuming a lot of disk space ( 1 TB) n Step 6 7: Updating data on RunDeepForcedSource Both steps were initialy merged into one step Execution time VERY,VERY long > 4 days Osman AÏDEL mercredi 18 décembre 11

Post ingestion procedure n Procedure only run at CC-IN2P3 : Qserv not yet At NSCA not yet n Suggestions : Step 1 : Loadind data ( 40 hours) Create table RunDeepForcedSource without primary key Alter table add primary key Step 2-3 could be grouped together RunDeepForcedSource is, by default, a heap table it might be interesting to sort data at the datafile level Partitionning but on which column? Osman AÏDEL mercredi 18 décembre 12

Plan n Infrastructure n State of the art n Post ingestion Procedure n indexes Osman AÏDEL mercredi 18 décembre 13 13

Indexes Select * from mytable where id =5; Without index => full table scan Important cost id First Name 1 a a 3 z z 5 e e 8 r r 7 t t 2 y y 4 u u 6 i i 9 k k 0 p p Last name Osman AÏDEL mercredi 18 décembre 14

Indexes Create index idx on mytable(id); Select * from mytable where id =5; Low cost 3 6 9 0 1 2 3 4 5 6 7 8 9 id First Name 1 a a 3 z z 5 e e 8 r r 7 t t 2 y y 4 u u 6 i i 9 k k 0 p p Last name Osman AÏDEL mercredi 18 décembre 15

Indexes -> Optimizer based on COST -> Index selectivity = cardinality / Nb of rows -> example : Index selectivity = 1 Select * from mytable where id > 3 High Cost -> Full scan 3 6 9 0 1 2 3 4 5 6 7 8 9 id First Name 1 a a 3 z z 5 e e 8 r r 7 t t 2 y y 4 u u 6 i i 9 k k 0 p p Last name Osman AÏDEL mercredi 18 décembre 16

Indexes Osman AÏDEL mercredi 18 décembre 17

Indexes datafile Block size : 4Kb Osman AÏDEL mercredi 18 décembre 18

Indexes datafile Block size : 4Kb Osman AÏDEL mercredi 18 décembre 19