Datacenter Computers: modern challenges in CPU design
Dick Sites, Google Inc., February 2015
2 Thesis: Servers and desktops require different design emphasis
3 Goals: Draw a vivid picture of a computing environment that may be foreign to your experience so far; expose some ongoing research problems; perhaps inspire contributions in this area
4-5 Analogy (S.A.T. pre-2005): _ : _ :: _ : _ (two image slides; the analogy itself is not transcribed)
6 Datacenter Servers are Different: move data, big and small; real-time transactions, 1000s per second; isolation between programs; measurement underpinnings
7 Move data: big and small
8 Move data: big and small: move lots of data, disk to/from RAM, network to/from RAM, SSD to/from RAM, within RAM; bulk data and short data (variable-length items); compress, encrypt, checksum, hash, sort
9 Lots of memory: 4004: no memory; i7: 12MB L3 cache
10 Little brain, LOTS of memory: a server has 64GB-1TB of RAM, with drinking-straw access
11 High-Bandwidth Cache Structure: a hypothetical 64-CPU-thread server; Cores 0..15, each with its own L1 caches, groups of cores sharing L2 caches, all behind one L3 cache (diagram)
12 Forcing Function: move 16 bytes every CPU cycle. What are the direct consequences of this goal?
13 Move 16 bytes every CPU cycle: load 16B, store 16B, test, branch, all in one cycle = 4-way issue minimum; need some 16-byte registers. At 3.2GHz that is 50 GB/s read + 50 GB/s write = 100 GB/s through one L1 D-cache. Must avoid reading a cache line before the write if it is to be fully written (else 150 GB/s)
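The arithmetic behind the 100 GB/s figure is worth making explicit; a back-of-the-envelope sketch (the function name is mine; the 3.2 GHz clock is the slide's example):

```c
/* One direction of sustained bandwidth: bytes moved per cycle times the
   clock rate in GHz gives GB/s. 16 B/cycle at 3.2 GHz is 51.2 GB/s, so
   ~50 GB/s read plus ~50 GB/s write is ~100 GB/s through one L1 D-cache
   (and ~150 GB/s if the target line is read before being fully written). */
static double stream_gb_per_s(double bytes_per_cycle, double ghz) {
    return bytes_per_cycle * ghz;   /* B/cycle * Gcycle/s = GB/s */
}
```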
14 Short strings: parsing, words, packet headers, scatter-gather, checksums. Examples: "the _ quick _ brown _ fox"; VLAN | IP hdr | TCP hdr | chksum
15 Generic Move, 37 bytes: "thequickbrownfoxjumpsoverthelazydog.." moved as 5 bytes + 16 bytes + 16 bytes
16 Generic Move, 37 bytes, with 16B-aligned cache accesses: the same 5 + 16 + 16 byte pieces shown against the aligned cache lines
17 Generic Move, 37 bytes, aligned target, 16B-aligned cache accesses: 10 bytes + 16 bytes + 11 bytes, so the full-width stores land aligned; a 32B shift is helpful: LD, >>, ST
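The three-phase pattern of slide 17 (a short head to reach alignment, full 16-byte aligned chunks, then a tail) can be sketched in portable C. This is my illustrative code, not the talk's: memcpy stands in for the LD / >> / ST sequence a real 32B-shift implementation would use.

```c
#include <stdint.h>
#include <stddef.h>
#include <string.h>

/* Copy n bytes so that the bulk of the stores are 16-byte aligned:
   head bytes bring dst up to alignment, then whole 16-byte chunks,
   then a partial tail. */
void aligned_target_move(char *dst, const char *src, size_t n) {
    /* head: copy byte-by-byte until dst is 16B aligned */
    while (n > 0 && ((uintptr_t)dst & 15) != 0) {
        *dst++ = *src++; n--;
    }
    /* body: aligned 16-byte stores */
    while (n >= 16) {
        memcpy(dst, src, 16);
        dst += 16; src += 16; n -= 16;
    }
    /* tail: remaining bytes */
    while (n > 0) { *dst++ = *src++; n--; }
}
```

When the destination starts 10 bytes short of a 16-byte boundary, a 37-byte copy splits 10 + 16 + 11, matching the slide.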
18 Useful Instructions: Load Partial R1, R2, R3 and Store Partial R1, R2, R3. Load/store the low bytes of R1 at 0(R2), with the length taken from the low 4 bits of R3 and the high bytes 0-padded; R1 is a 16-byte register. 0(R2) can be unaligned, and length zero never segfaults. Similar to the Power architecture Move Assist instructions. This handles all short moves; the remaining length is always a multiple of 16
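The Load Partial semantics described above can be modeled in C (the struct and function names are mine; on real hardware this would be a single instruction). The point is the zero-padding and the never-faulting zero length:

```c
#include <string.h>

typedef struct { unsigned char b[16]; } reg16;  /* one 16-byte register */

/* Load Partial: bring 'len & 15' bytes from p into the low bytes of a
   16-byte register, zero-padding the high bytes. Length zero touches
   no memory, so it never segfaults. */
static reg16 load_partial(const void *p, unsigned len) {
    reg16 r;
    memset(r.b, 0, sizeof r.b);   /* 0-pad high bytes */
    len &= 15;                    /* length from the low 4 bits */
    if (len) memcpy(r.b, p, len);
    return r;
}
```

Lengths of 16 and above are not needed here: as the slide says, the remaining length after the partial operations is always a multiple of 16 and uses ordinary full-width loads and stores.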
19 Move 16 bytes every CPU cycle: L1 data cache: 2 aligned LD/ST accesses per cycle = 100 GB/s, plus 100 GB/s fill (per-core L1 caches behind shared L2s and L3; diagram)
20 Move 16 bytes every CPU cycle, sustained throughout L2, L3, and RAM, perhaps for four CPUs simultaneously: 100 GB/s per core at L1, 200 GB/s x 4 at the L2 caches, 400 GB/s x 1 at the L3 cache (diagram)
21 Top 20 STREAM Copy Bandwidth (April 2014): a table of machine IDs, CPU counts, and copy bandwidth; entries include SGI Altix UV and Altix 3700, Fujitsu SPARC M10-4S, ScaleMP Xeon, NEC SX machines, IBM Power, Oracle SPARC T, HP AlphaServer GS, Cray T932, and Intel Xeon Phi SE10P (the numbers did not survive transcription). "Our Forcing Function" is ranked within this top-20 list
22 Latency: buy 256GB of RAM only if you use it; higher cache miss rates; and main memory is ~200 cycles away
23 Cache Hits: a 1-cycle hit next to the 200-cycle miss path (diagram)
24 Cache Miss to Main Memory: a 200-cycle cache miss (diagram)
25 What constructive work could you do during this time?
26 Latency: buy 256GB of RAM only if you use it; higher cache miss rates; and main memory is ~200 cycles away. For starters, we need to prefetch about 16B * 200cy = 3.2KB to meet our forcing function; call it 4KB
27 Additional Considerations: L1 cache size = associativity * page size: need bigger than 4KB pages. A translation buffer of 256 x 4KB entries covers only 1MB of memory, much less than the on-chip caches, and TB misses can consume 15% of total CPU time: need bigger than 4KB pages. With 256GB of RAM and 64MB pages: need bigger than 4KB pages
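The TLB-coverage arithmetic above is simple enough to state as code (a sketch; entry counts and page sizes are the slide's examples, the function name is mine):

```c
/* Memory covered by a fully-utilized translation buffer:
   number of entries times page size, in bytes. */
static unsigned long long tlb_coverage(unsigned entries,
                                       unsigned long long page_bytes) {
    return entries * page_bytes;
}
```

256 entries x 4KB pages cover 1MB; the same 256 entries with 64MB pages cover 16GB, a far better match for 256GB of RAM.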
28 Move data: big and small: it's still the memory. We need a coordinated design of instruction architecture and memory implementation to achieve high bandwidth with low delay
29 Modern challenges in CPU design: lots of memory; multi-issue CPU instructions every cycle that drive full bandwidth; full bandwidth all the way to RAM, not just to L1 cache; more prefetching in software; bigger page size(s). Topic: do better
30 Real-time transactions: 1000s per second
31 A Single Transaction Across ~40 Racks of ~60 Servers Each: each arc is a related client-server RPC (remote procedure call)
32 A Single Transaction, RPC Tree vs. Time: client & 93 servers (diagram)
33 Single-Transaction Tail Latency: one slow response out of 93 parallel RPCs slows the entire transaction
34 One Server, One User-Mode Thread vs. Time: eleven sequential transactions (circa 2004). Normal: msec; long: 1800 msec
35 One Server, Four CPUs: user/kernel transitions, every CPU, every nanosecond (Ktrace); timelines for CPU 0 through CPU 3 (diagram)
36 16 CPUs, 600us, Many RPCs: CPU and RPC timelines (diagram)
37 16 CPUs, 600us, Many RPCs: lock and thread timelines (diagram)
38 That is A LOT going on at once. Let's look at just one long-tail RPC in context
39 16 CPUs, 600us, one RPC: CPU and RPC timelines (diagram)
40 16 CPUs, 600us, one RPC: lock and thread timelines (diagram)
41 Wakeup Detail: thread 19 frees a lock and sends a wakeup to waiting thread 25, but thread 25 actually runs 50us later. A 50us wakeup delay??
42 Wakeup Detail: the target CPU was busy, so the kernel waited. CPU 9 was busy; but CPU 3 was idle
43 CPU Scheduling, 2 Designs: (a) re-dispatch on any idle CPU core, but if the idle core is in a deep sleep it can take microseconds to wake up; (b) wait to re-dispatch on the previous CPU core, to get cache hits: saves squat if it could use the same L1 cache, saves ~10us if it could use the same L2 cache, saves ~100us if it could use the same L3 cache, and is expensive if it forces cross-socket cache refills. Don't wait too long
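The two designs on slide 43 amount to a cost comparison; a hypothetical decision rule (entirely my model, using the slide's microsecond estimates as inputs, not part of the talk):

```c
/* Wait for the previous core (warm caches) only when the expected
   queueing delay is smaller than the cost of dispatching elsewhere:
   deep-sleep wakeup latency plus the cache-refill penalty lost by
   moving (~0 for the same L1, ~10us for the same L2, ~100us for the
   same L3, more for cross-socket refills). Returns 1 to wait. */
static int should_wait_for_prev_core(double wakeup_us,
                                     double refill_us,
                                     double expected_wait_us) {
    return expected_wait_us < wakeup_us + refill_us;
}
```

The "don't wait too long" advice falls out of the model: as expected_wait_us grows past the wakeup-plus-refill cost, re-dispatching elsewhere wins.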
44 Real-time transactions: 1000s per second: not your father's SPECmarks. To understand delays, we need to track simultaneous transactions across servers, CPU cores, threads, queues, locks
45 Modern challenges in CPU design: a single transaction can touch thousands of servers in parallel, and the slowest parallel path dominates; tail latency is the enemy. Must control lock-holding times, scheduler delays, and interference via shared resources. Topic: do better
46 Isolation between programs
47 Histogram: Disk Server Latency, Long Tail: 99th %ile latency = 696 msec (histogram)
48 Non-repeatable tail latency comes from unknown interference. Isolation of programs reduces tail latency; reduced tail latency = higher utilization; higher utilization = $$$.
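A 99th-percentile figure like the one above comes straight from the latency samples; a minimal nearest-rank sketch (names mine; sorts the input in place):

```c
#include <stdlib.h>

static int cmp_double(const void *a, const void *b) {
    double x = *(const double *)a, y = *(const double *)b;
    return (x > y) - (x < y);
}

/* p-th percentile (0..100) of n latency samples, nearest-rank method.
   Sorts 'samples' in place. */
static double percentile(double *samples, int n, double p) {
    qsort(samples, n, sizeof samples[0], cmp_double);
    int rank = (int)(p / 100.0 * n + 0.5);  /* nearest rank, 1-based */
    if (rank < 1) rank = 1;
    if (rank > n) rank = n;
    return samples[rank - 1];
}
```

A CPU profile averages all of these samples together; the percentile picks out the tail the talk is after.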
49 Many Sources of Interference: most interference comes from software, but a bit comes from the hardware underpinnings. In a shared apartment building, most interference comes from jerky neighbors, but thin walls and bad kitchen venting can be the hardware underpinnings
50 Isolation issue: Cache Interference: CPU thread 0 is moving 16B/cycle flat out, filling the caches and hurting threads 3, 7, ... (diagram)
51 Isolation issue: Cache Interference: CPU thread 0 is moving 16B/cycle flat out, hurting threads 3, 7, ... 63. Today it floods the shared cache; desired is that each core keeps its fair share (diagram)
52 Cache Interference: how to get there? Partitioning by ways is no good with 16 threads and 8 ways, no good if the result is direct-mapped, and underutilizes the cache. Instead, selective allocation: give each thread a target cache size; allocate lines freely while under target; replace only its own lines when over target; allow some over-budget slop to avoid underutilization
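The selective-allocation policy sketched above can be illustrated with a toy fully-associative model (my code, not the hardware design): a thread may claim any free or victim line while under its target, but once over target it must evict one of its own lines.

```c
#define LINES 8   /* toy fully-associative cache of 8 lines */

/* owner[i] is the thread that owns line i, or -1 if the line is free.
   Pick a victim line for 'thread', which currently owns 'current' lines
   against a budget of 'target' lines. */
static int pick_victim(const int owner[LINES], int thread,
                       int target, int current) {
    int i;
    for (i = 0; i < LINES; i++)        /* a free line is always allowed */
        if (owner[i] < 0) return i;
    if (current >= target) {           /* over budget: evict an own line */
        for (i = 0; i < LINES; i++)
            if (owner[i] == thread) return i;
    }
    return 0;  /* under budget: any line (replacement ordering elided) */
}
```

The over-budget slop the slide mentions would relax the `current >= target` test; tracking LRU order within the set is also elided here.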
53 Cache Interference: each thread has a target and a current size, and replaces only its own lines when over target (diagram)
54 Cache Interference: the same scheme requires owner bits per cache line (expensive bits) and 64 target/current pairs at the L3; it fails unless the L3 is at least 64-way associative, since a thread can rarely find its own lines in a set
55 Design Improvement: track ownership just by incoming paths, plus a separate target for kernel accesses and a separate target for over-target accesses. Fewer bits, and 8-way associativity is OK (diagram)
56 Isolation between programs: good fences make good neighbors. We need better hardware support for program isolation in shared-memory systems
57 Modern challenges in CPU design: isolating programs from each other on a shared server is hard, and as an industry we do it poorly: shared CPU scheduling, shared caches, shared network links, shared disks. More hardware support needed; more innovation needed
58 Measurement underpinnings
59 Profiles: What, not Why: samples of 1000s of transactions, merging results. Pro: understanding average performance. Blind spots: outliers, idle time. (Image credit: Chuck Close)
60 Traces: Why: full detail of individual transactions. Pro: understanding outlier performance. Blind spot: tracing overhead
61 Histogram: Disk Server Latency, Long Tail: a CPU profile shows the 99% common cases (99th %ile latency = 696 msec); the 1% long tail never shows up
62 Trace: Disk Server, Event-by-Event: read RPC + disk time; write hit, hit, miss; a 700ms mixture. 13 disks, three normal seconds (trace)
63 Trace: Disk Server, 13 disks, 1.5 sec: a phase transition to latencies on exactly 250ms boundaries (read/write, hit/miss trace)
64 Trace: Disk Server, 13 disks, 1.5 sec: latencies of 250ms, 500ms, ... for nine minutes (read/write, hit/miss trace)
65 Why? Probably not on your guessing radar: the kernel was throttling the CPU use of any process that was over its purchased quota. It only happened on old, slow servers
66 Disk Server CPU Quota Bug: understanding why sped up 25% of the entire disk fleet worldwide! It had been going on for three years; the savings paid my salary for 10 years. Hanlon's razor: never attribute to malice that which is adequately explained by stupidity. Sites' corollary: never attribute to stupidity that which is adequately explained by software complexity.
67 Measurement Underpinnings: all performance mysteries are simple once they are understood. A mystery means that the picture in your head is wrong; software engineers are singularly inept at guessing how their view differs from reality
68 Modern challenges in CPU design: we need low-overhead tools to observe the dynamics of performance anomalies: transaction IDs, RPC trees, timestamped transaction begin/end; traces of CPU kernel+user activity, RPCs, locks, threads, disk, network, and power consumption. Topic: do better
69 Summary: Datacenter Servers are Different
70 Datacenter Servers are Different: move data, big and small; real-time transactions, 1000s per second; isolation between programs; measurement underpinnings
71 References: Claude Shannon, "A Mathematical Theory of Communication," The Bell System Technical Journal, Vol. 27, July and October issues. Richard Sites, "It's the Memory, Stupid!," Microprocessor Report, August 5. Ravi Iyer, "CQoS: A Framework for Enabling QoS in Shared Caches of CMP Platforms," ACM International Conference on Supercomputing. M. Bligh, M. Desnoyers, R. Schultz, "Linux Kernel Debugging on Google-sized Clusters," Linux Symposium
72 References: Daniel Sanchez and Christos Kozyrakis, "The ZCache: Decoupling Ways and Associativity," IEEE/ACM Symp. on Microarchitecture (MICRO-43). Benjamin H. Sigelman, Luiz Andre Barroso, Mike Burrows, Pat Stephenson, Manoj Plakal, Donald Beaver, Saul Jaspan, Chandan Shanbhag, "Dapper, a Large-Scale Distributed Systems Tracing Infrastructure," Google Technical Report, April. Daniel Sanchez and Christos Kozyrakis, "Vantage: Scalable and Efficient Fine-Grained Cache Partitioning," Symp. on Computer Architecture (ISCA). Luiz André Barroso and Urs Hölzle, "The Datacenter as a Computer: An Introduction to the Design of Warehouse-Scale Machines," 2nd Edition
73 Thank You. Questions?
74 Thank You
Solving I/O Bottlenecks to Enable Superior Cloud Efficiency
WHITE PAPER Solving I/O Bottlenecks to Enable Superior Cloud Efficiency Overview...1 Mellanox I/O Virtualization Features and Benefits...2 Summary...6 Overview We already have 8 or even 16 cores on one
WHITE PAPER FUJITSU PRIMERGY SERVER BASICS OF DISK I/O PERFORMANCE
WHITE PAPER BASICS OF DISK I/O PERFORMANCE WHITE PAPER FUJITSU PRIMERGY SERVER BASICS OF DISK I/O PERFORMANCE This technical documentation is aimed at the persons responsible for the disk I/O performance
KVM & Memory Management Updates
KVM & Memory Management Updates KVM Forum 2012 Rik van Riel Red Hat, Inc. KVM & Memory Management Updates EPT Accessed & Dirty Bits 1GB hugepages Balloon vs. Transparent Huge Pages Automatic NUMA Placement
White Paper Perceived Performance Tuning a system for what really matters
TMurgent Technologies White Paper Perceived Performance Tuning a system for what really matters September 18, 2003 White Paper: Perceived Performance 1/7 TMurgent Technologies Introduction The purpose
Chapter 18: Database System Architectures. Centralized Systems
Chapter 18: Database System Architectures! Centralized Systems! Client--Server Systems! Parallel Systems! Distributed Systems! Network Types 18.1 Centralized Systems! Run on a single computer system and
Performance Characteristics of Large SMP Machines
Performance Characteristics of Large SMP Machines Dirk Schmidl, Dieter an Mey, Matthias S. Müller [email protected] Rechen- und Kommunikationszentrum (RZ) Agenda Investigated Hardware Kernel Benchmark
One Server Per City: C Using TCP for Very Large SIP Servers. Kumiko Ono Henning Schulzrinne {kumiko, hgs}@cs.columbia.edu
One Server Per City: C Using TCP for Very Large SIP Servers Kumiko Ono Henning Schulzrinne {kumiko, hgs}@cs.columbia.edu Goal Answer the following question: How does using TCP affect the scalability and
RevoScaleR Speed and Scalability
EXECUTIVE WHITE PAPER RevoScaleR Speed and Scalability By Lee Edlefsen Ph.D., Chief Scientist, Revolution Analytics Abstract RevoScaleR, the Big Data predictive analytics library included with Revolution
The Database is Slow
The Database is Slow SQL Server Performance Tuning Starter Kit Calgary PASS Chapter, 19 August 2015 Randolph West, Born SQL Email: [email protected] Twitter: @rabryst Basic Internals Data File Transaction Log
GraySort and MinuteSort at Yahoo on Hadoop 0.23
GraySort and at Yahoo on Hadoop.23 Thomas Graves Yahoo! May, 213 The Apache Hadoop[1] software library is an open source framework that allows for the distributed processing of large data sets across clusters
Programmable Networking with Open vswitch
Programmable Networking with Open vswitch Jesse Gross LinuxCon September, 2013 2009 VMware Inc. All rights reserved Background: The Evolution of Data Centers Virtualization has created data center workloads
DIABLO TECHNOLOGIES MEMORY CHANNEL STORAGE AND VMWARE VIRTUAL SAN : VDI ACCELERATION
DIABLO TECHNOLOGIES MEMORY CHANNEL STORAGE AND VMWARE VIRTUAL SAN : VDI ACCELERATION A DIABLO WHITE PAPER AUGUST 2014 Ricky Trigalo Director of Business Development Virtualization, Diablo Technologies
Big Data Technology Map-Reduce Motivation: Indexing in Search Engines
Big Data Technology Map-Reduce Motivation: Indexing in Search Engines Edward Bortnikov & Ronny Lempel Yahoo Labs, Haifa Indexing in Search Engines Information Retrieval s two main stages: Indexing process
How to handle Out-of-Memory issue
How to handle Out-of-Memory issue Overview Memory Usage Architecture Memory accumulation 32-bit application memory limitation Common Issues Encountered Too many cameras recording, or bitrate too high Too
Exploiting Remote Memory Operations to Design Efficient Reconfiguration for Shared Data-Centers over InfiniBand
Exploiting Remote Memory Operations to Design Efficient Reconfiguration for Shared Data-Centers over InfiniBand P. Balaji, K. Vaidyanathan, S. Narravula, K. Savitha, H. W. Jin D. K. Panda Network Based
