Scalable Internet Architectures
|
|
|
- Wendy Ferguson
- 10 years ago
- Views:
Transcription
1 1 Scalable Internet Architectures how to build scalable production Internet services and... how not to build them
2 2 A bit about the speaker Principal Consultant OmniTI Computer Consulting, Inc. open-source developer mod_backhand, wackamole, Daiquiri, Spread, OpenSSH/SecurID, a variety of CPAN modules, etc. closed-source developer Ecelerity (MTA), EC Cluster (MTA Clustering) open-source advocate Closed source software has technical risk. closed-source advocate It s about business, not software. Finding the right tool for the job sometimes leads to closed-source solutions.
3 3 What is Scalability? Definition: How well a solution to some problem will work when the size of the problem increases. What s missing?... when the size decreases the solution must fit
4 4 Production Environments high uptime low maintenance formal procedures cost controlled
5 5 High Uptime Availability despite individual system failures parallel servers all servers are live and can handle transactions cheap and common for web servers expensive for databases hot spare/standby fail-over system that is seamless and immediate (automated) common for HA/LB solutions many databases have built-in facilities providing hot-spare service warm spare/standby fail-over system is nearly immediate, but not seamless (not automated) common technique for databases, cheap and easy cold spare/standby I have the equipment and backups to get it running if it were to fail.
6 6 Maintenance The single largest expense in most environments Contributing factors: The number of unique required products in the architecture The stability and replaceability of required products Uneducated development and implementation decisions The complexity and frequency of staging and pushing new code
7 7 Formal Procedures Scalability marginally impacts procedure Procedure grossly impacts scalability developer code review religious use of revision control planned and reviewed upgrade strategies intelligent, low-cost (resources) push procedures
8 8 Three Simple Rules optimize where it counts complexity has costs use the right tool
9 9 Three Simple Rules #1: Amdahl s Law Good Better improving execution time by 50% of code that executes 2% of the time results in 1% performance improvement improving execution time by 10% of code that executes 80% of the time results in 8% performance improvement
10 10 Three Simple Rules #2: Complex architectures are expensive adding an additional architectural component to a service or set of services increases the system complexity linearly requiring an additional architectural component for a service increases the system complexity exponentially
11 11 Three Simple Rules #3: Using the wrong tool is expensive (and stupid) using a tool because it is easy or familiar doesn't make it right it is often a gratuitous waste of resources white papers are marketing tools and may not represent the most practical solution it s about good design and implementation practices
12 12 Building Production Systems
13 13 Production Fundamentals understand the stability of the software understand the velocity of development understand administrative aspects understand the likelihood of failure and the support for each component
14 14 Software Stability Stability is not just reliability Also consider: release cycles upgrade paths feature additions, deprications, and removals
15 15 The Need For Speed vs. The Need For Control Understand the velocity of development For Small Projects: use revision control For Large Projects: use revision control For ALL Projects: use revision control No revision control? Accident waiting to happen
16 16 The Need For Speed vs. The Need For Control Unchecked speed is costly Rapid release cycles (once/day) are needed in some businesses An equilibrium must achieved or the situation will explode Properly used revision control allows for speed and control It is challenging, but meticulous unwavering adherence to policy and procedure will deliver you from disaster.
17 17 Administration This deserves a lot of attention (despite the single slide here) Systems Administration costs money Short release cycles on components means perpetual administration Constant change in development product results in different stress on: Databases, Networks, Systems... and the people that maintain them Adding components or complicating the architecture complicates: Monitoring Upgrading Scaling down should the need arise
18 18 Likelihood of Failure (the hidden administrative nightmare) Internally Developed Application Failures Suck. Third-Party Component Failures Suck More! It is seen as an administration responsibility Regardless if developers dictated their inclusion in the architecture SAs, NAs, and DBAs suddenly become responsible for the ongoing maintenance of all third-party products -- open source or commercial This is often beyond the expertise/attention of the individual or team Systems fail, it s part of life Chronic problems and failures will explode your TCO
19 19 Likelihood of Failure (solution to the hidden administrative nightmare) Don t leave requirement assessments at: This won t work... but you re the boss Worst Bad implementing something that won t work being responsible for making it work getting fired for perceived incompetence getting fired for refusing to implement something that has no hope of working Best work with the development team to revise requirements and architectural needs
20 20 Clustered Image Serving
21 21 Goal Static image serving 120MBs throughput 24x7 uptime requirements Three geographically distributed sites
22 22 Decisions high uptime Users Users HA/LB HA/LB Apache Apache Apache White Paper Approach expensive, dedicated, singlepurpose HA/LB devices Apache HA Apache HA Apache HA Peer-based HA cheap and reusable commodity machines
23 23 The Tired Tiered Approach Pros: Fine-grained, connection-based request distribution (load balancing) 100,000+ concurrent connections Session management (sticky) One IP per service Cons: Expensive Single purpose Your HA solution needs HA!!! 3 locations requires 6 units High maintenance (additional hardware component)
24 24 Peer-based HA Wackamole Pros: No specialized hardware Low maintenance (software daemon) Simple Free Cons: Naïve load balancing (DNS RR) Requires multiple IPs for a single service (bad for multi-ssl)
25 25 Policy & Procedure Pushing content Even for small (~100Mb) image repositories, pushes are expensive dumb protocols have horrible network costs rsync still incurs substantial I/O for each mirror multicast rsync could work, but there are no solid implementations Pulling content Assuming a slow rate of change, cache-on-demand is solid Use Apache + mod_proxy (Reverse Proxy + Caching) Fine-grained cache purging is a challenge
26 26 Scaling Up 3 Sites Goal 200Mbs throughput requirement The goal is lower latency only 2 web servers per site needed for fault tolerance Traditional White Paper Approach 3 x dual HA/LB 3 x 2 image web servers Peer-based HA Solution 3 x 2 image web servers 3 x 2 x $ x 2 x $2000 $ x 2 x $2000 $12000
27 27 Scaling Down 1 Site Goal 10Mbs throughput requirement The goal is lower latency only 2 web servers per site needed for fault tolerance Traditional White Paper Approach dual HA/LB 2 image web servers Peer-based HA Solution 2 image web servers 2 x $ x $2000 $ x $2000 $4000
28 28 Technical Details Each box running FreeBSD 5-stable Spread v wackamole Apache /mod_ssl + mod_proxy + patches
29 29 Spread: What is it? Group Communication Messaging Bus Membership Clear Delivery Semantics Reliable or Unreliable FIFO, Causal Agreed, Safe View of membership for delivery Fast and Efficient N subscribers!= N x bandwidth Multicast or broadcast Usable C, Perl, Python, Java, PHP, Ruby API
30 30 Spread: Configuration # New Jersey Site Spread_Segment :4803 { image-0-1 a.b.c.101 image-0-2 a.b.c.102 } # San Jose Site Spread_Segment :4803 { image-1-1 d.e.f.101 image-1-2 d.e.f.102 } image-0-1 segment 1 image-0-2 Internet # Germany Site Spread_Segment :4803 { image-2-1 g.h.i.101 image-2-2 g.h.i.102 } image-1-1 image-1-2 segment 2 segment 3 image-2-1 image-2-2
31 31 Wackamole: Configuration Spread Daemon & Group Virtual Interfaces Controlled Notifications of ownership Spread = 4803 SpreadRetryInterval = 5s Group = wack1 Control = /var/run/wack.it Prefer None VirtualInterfaces { { fxp0:a.b.c.111/32 } { fxp0:a.b.c.112/32 } } Arp-Cache = 90s Notify { fxp0:a.b.c.1/32 fxp0:a.b.c.0/24 throttle 16 arp-cache } Balancing parameters balance { AcquisitionsPerRound = all interval = 4s } mature = 5s
32 32 Apache: Configuration # Don t act as a free image caching service! <Directory proxy:*> deny from all </Directory> # But act provide service to us <Directory proxy: allow from all </Directory> RewriteEngine on RewriteLogLevel 0 RewriteRule ^proxy: - [F] RewriteRule ^(http: ftp:) - [F] RewriteRule ^\/*([^\/]+)(.*)$ [P,L] RewriteRule.* - [F] ProxyRequests on CacheRoot /data/cache CacheSize ProxyPassReverse /
33 33 Why patch mod_proxy? mod_proxy hashes URLs for local caching better distribution of files over directories nice to your filesystem good for forward caches makes purging individual URLs less intuitive Patched to write intuitive filenames a URL like: becomes /data/cache/ SAs can troubleshoot issues with certain URLs cached files can be purged easily with rm use Spread to distribute and coordinate cache purging operations
34 34 The Next Step We have three mini-clusters installed and configured, ready to throw bits by the trillions How people find them?...dns, obviously How do people find the closest cluster to them?...clever DNS, [not so] obviously
35 35 Replica Location server-side redirection Application is responsible Use HTTP redirects: images-sj.example.com images-nj.example.com images-de.example.com
36 36 Replica Location Proximity-based DNS DNS Provides Convergence Can take hours/days DNS servers near clusters: ns-sj.example.com ns-nj.example.com ns-de.example.com
37 37 Replica Location DNS Shared IP (a.k.a. AnyCast) DNS servers near clusters: ns-sj.example.com ns-nj.example.com ns-de.example.com All DNS servers have the same IP Network block is announced from all sites via BGP Routing protocols provide immediate convergence
38 38 Web Cluster Logging
39 39 The Setup and The Goal Cluster of web servers Apache thttpd Logs are vital must be stored in more than one place Real-time assessments hit rates load balancing HTTP response code rates
40 40 Traditional Configuration Local Logging, Post-process Aggregation Log written locally on web servers space must be allocated Consolidation happens periodically crashes will result in missing data aggregators must preserve chronology (expensive) real-time metrics cannot be calculated Monitor(s) must run against log server monitors must tail log files requires resources on the logging server
41 41 Traditional Configuration Local Logging, Post-process Aggregation
42 42 TCP/IP or UDP/IP Logging Syslog, Syslog-NG Logs are written directly to logging server(s) UDP is unreliable and thus not useful TCP is a point-to-point protocol Two logging servers means all info gets sent twice. Add a monitor and that s three times! Real-time metrics can now be collected monitors must still be run against log servers (or the publishers must be reconfigured)
43 43 TCP/IP or UDP/IP Logging Syslog, Syslog-NG
44 44 Passive Network Logging sniffers Requires no modification of architecture Add/remove publishers (web severs) on-the-fly Drops Logs! When tested head-to-head with traditional logging mechanisms we see loss Missing logs are unacceptable Clean the white dust off your upper lip and choose another implementation
45 45 Reliable Multicast Logging mod_log_spread Flexible Operations Add/remove publishers (web severs) on-the-fly Add/remove subscribers (loggers/monitors) on-the-fly Reliable Multicast (based on Spread) Multiple subscribers don t incur addition network overhead Individual real-time passive and active monitors Monitors can be attached without resource consumption concerns Passive monitors that draw graphs, assess trends, detect failures Active monitors that feed metrics back into a production system Who s online Real-time page access metrics
46 46 Reliable Multicast Logging mod_log_spread
47 47 mod_log_spread The Publisher mod_log_spread is really a patch to mod_log_config Like pipes in mod_log_config: /path/to/rotatelogs filename 3600 mls adds a Spread group destination: $groupname LogFormat "%h %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-Agent}i\" %T" combined CustomLog $example combined
48 48 spreadlogd The Subscriber spreadlogd writes logs... BufferSize = Spread { Port = 4803 Log { RewriteTimestamp = CommonLogFormat Group = "example" File = /data/logs/apache/ } }
49 49 Other Real-Time Tools Other Subscribers ApacheTop C++ top -style real-time hit display mls_mon graphical hit rates by server and by code
50 50 clients cache application Caching Architectures
51 51 What is a cache? Cache: A small fast memory holding recently accessed data, designed to speed up subsequent access to the same data. Most often applied to processor-memory access but also used for a local copy of data accessible over a network etc.
52 52 The Layered Cache clients Exists above/in-front Knows little or nothing about what s underneath Works fabulously for static data (like images) cache application
53 53 The Integrated Cache Exists in the application. Knows the data and how the application uses it. Works well for data that doesn t change rapidly but is relatively expensive to query. application DB cache
54 54 The Data Cache Exists in the data store Knows the data, the queries and how the data has changed. Works well always called computational reuse oldest trick in the book MySQL 4 has this they call it a Query Cache application cache DB
55 55 Write-thru Cache Occurs at update location Knows the data, the app, the queries and how the data has changed. Works well for administrative updates many WebLogs work this way Can be very adaptive and flexible admin application cache DB
56 56 A Real-World Example News site News items are stored in Oracle User Preferences are stored in Oracle Hundreds of different sections Each with thousands of different articles Pages: hits/second shows personalized user info on EVERY page front page shows top N F articles for forum F (limit 10)
57 57 The Approach Oracle is fast enough why abuse Oracle for this purposes? surely there are better things for Oracle to be doing Updates are controlled updates to news items only happen from a publisher news update:read ratio is miniscule user preferences are only ever updated by the user
58 58 Articles Article publishing sticks news items in Oracle The straight forward way page pulls user prefs from cookie (or bounces off a cookie populator) page pulls news item from database I hate query strings I like: RewriteRule ^/news/items/([^/]*).html$ /www/docs/news/article.php?id=$1 [L]
59 59 Articles Cached We pull the item that is likely to never change cheaper if the page just hard coded the news item writing the news article out into a PHP page is a hassle... or is it? Have the straight forward page cache it /news/article.php writes /news/items/12345.html as a PHP page that still expands personal info from cookie, but has the news item content statically included as HTML. RewriteCond %{REQUEST_FILENAME} ^/news/items/([^/]*).html RewriteCond %{REQUEST_FILENAME}!-f RewriteRule ^/news/items/([^/]*).html$ /www/docs/news/article.php?id=$1 [L]
60 60 Articles Cache Invalidation Run a cache invalidator on each web server connects to Spread as a subscriber accepts /www/docs/news/items/####.html deletion requests accepts full purge requests Article publishing stash item #### in Oracle (insert or update) publish through Spread an invalidation of #### Changing the look of the article pages change article.php to have the desired effect (and write the appropriate php cache pages) publish through Spread a full purge
61 61 The Result All news item pages require zero DB requests the business can now make your life difficult by requesting new crap on these pages that can t be so easily cached Far fewer database connections required all databases appreciate that (Oracle, MySQL, Postgres) Bottleneck is now Apache+mod_php crazy fast with tools like APC inherently scalable... just add more web servers room for more application features
62 62 Tiered Architectures
63 63 Why Tier? Dedicate resources to specific components Often a good approach to scaling systems up Requiring single purpose components is a good way to lock into a big (expensive) architecture Tiers make computer science problems easier Understand the trade off of solving hard problems vs. maintaining tiered solutions Tiers add complexity and increase maintenance costs More components, more pieces, more moving parts... More can (and does) go wrong.
64 64 Dedicated Resources Classic Example: Apache on dedicated web servers Database on dedicated machine Why? it is easy to have 4 web server, hard to have 4 databases Lock-in Example: Web application on several servers Requires local session state and sticky sessions Why? scaling down to 2 servers will still require a load balancer that can stick sessions.
65 65 Tiering to Compensate (for problems that are hard to solve) Database Replication is hard Anyone who tells you otherwise is lying or not telling you the whole story Session Replication is not so hard use a technology like Splash! offload responsibility to the client Tiers are expensive technically and financially Some problems more difficult than tiers are expensive
66 66 Tiers are expensive Intrinsically difficult to scale down If it is a real production system... Complete staging environment Complete development environment
67 67 Effective Replication Eliminating The Need To Tier
68 68 Types of Database Replication Master-Slave a data set has a master server changes to the data set are sent to slaves dml must be performed at the master read-only queries can be performed anywhere no challenging synchronization algorithms
69 69 Types of Database Replication Master-Master data modification can be performed anywhere coordinating ACID and XA constraints is hard manage full transactions view consistency initial synchronization synchronization algorithms 2-phase commit (2PC) 3-phase commit (3PC)
70 70 Types of Database Replication Multi-Master data modification can be performed anywhere coordinating ACID and XA constraints is hard manage full transactions view consistency initial synchronization synchronization algorithms are complex 2PC and 3PC are unrealistic as N 2 handshakes must happen EVS Engine COReL
71 71 Relative Performance
72 72 Relative Performance
73 73 State of Affairs Multi-Master replication is a long way off current implementors use 2PC no enterprise offerings achieve EVS Engine performance architectures that push databases hard aren t willing to cut performance for replication multi-master is ready for architectures with low update rates that demand replication for data safety Master-Slave is ready for prime time MySQL (native master-slave replication) Oracle snapshots/materialized views
74 74 News Site Revisited Replicate the database on each web server Oracle on each web server replicate the needed tables certainly doesn t scale financially If the site used MySQL... zero capital investment news items don t need to be cached in PHP pages legitimizes more intense queries in live site pages
75 75 The Right Tool For The Job
76 76 Who s Online? a real-world example A service requires who s online info Users that have loaded an object within x minutes Need to know the last page the user hit The service is exposed throughout the site
77 77 Scalability Requirements x = 30 (minutes) 5000 hits/second 100,000 concurrent users Queries: current users online (count) current users on this page sorted by last access (limit 30)
78 78 Let s use a familiar tool We use MySQL anyway We use MySQL to drive the site We are familiar with MySQL Queries are cake: select count(1) from recent_hits where hitdate > SUBDATE(NOW(), INTERVAL 30 MINUTE) select username, hitdate from recent_hits where url =? and hitdate > SUBDATE(NOW(), INTERVAL 30 MINUTE) order by hitdate desc limit 30
79 79 Getting More Specific 100,000+ row table assuming we sweep out stale data 5,000 replaces/second indexes required on hitdate and url 1,000 queries/second MySQL s query cache doesn t help at all, the updates invalidate it Replaces cannot block queries MyISAM is not an option, we use InnoDB Both queries require a full table scan! All in addition to the existing demands of the site!
80 80 Try Some Tests On the development box (Idle dual Xeon, SCSI drives) 1,400 replaces/second 800 queries/second On the production box (dual Xeon, SCSI disk array w/ 1GB cache) 200 replaces/second 150 queries/second ARE YOU INSANE?!
81 81 Build a Custom Tool We need which urls/users/timestamp tracking So... we need to add an update to each page No... that won t catch images, let s use a mod_perl log hook. Wait... we are already writing logs, let s aggregate passively if we use mod_log_spread, we just need to add a subscriber Passive aggregation can handle bursty traffic by lagging behind a bit it can t slow down the app! Pick a data structure Multi-Indexed Skiplist -- why? Free balancing -- randomized O(lg n) insertion, deletion, location O(1) popping (for culling expired sessions) it precisely meets the requirements... and I like them.
82 82 Choose a Language I like C... so sue me. The concept: Skiplist Index on: username url,hitdate hitdate Single thread, event driven Receive messages from Spread: parse username, url, hitdate delete from skiplist by username O(lg n) insert into skiplist O(lg n) pop the end of the skiplist of any hitdate > 30 minutes O(1) Receive client requests Cardinality query is O(1), single write() Users on a url query is O(lg), O(1) to fill out iovec, single writev()
83 83 The Result 6 hours worth of coding and testing 800 lines of C code (server) use libspread and libskiplist 40 lines of perl (client module for web app) On commodity hardware ($2k box) ~80,000 inserts/second more hits than we ll ever see! ~100,000,000 counts/second -- not including write() ~500,000 users for urls/second -- not including writev() That ll do.
84 84 The Right Tool For The Job MySQL is a great tool it is used to drive the rest of the site... spectacularly Using it for this project would: wasted valuable resources in the architecture saved a few hours of work allowed you to not use your brain
85 85 This Image Intentionally Left Blank Conclusion
86 86 Scalable Internet Architectures Building them just isn t that hard... if you carefully analyze the problem at hand don t make sloppy or rash technical decisions always think like a computer scientist
87 87 Thank You OmniTI Computer Consulting, Inc. My biggest fan club Lisa, Zoe & Gianna Look for my book in late 2005!
Scalable Internet Architectures
1 Scalable Internet Architectures how to build scalable production Internet services and... how not to build them 2 A bit about the speaker Principal OmniTI Computer Consulting, Inc. open-source developer
Scalability of web applications. CSCI 470: Web Science Keith Vertanen
Scalability of web applications CSCI 470: Web Science Keith Vertanen Scalability questions Overview What's important in order to build scalable web sites? High availability vs. load balancing Approaches
Wikimedia architecture. Mark Bergsma <[email protected]> Wikimedia Foundation Inc.
Mark Bergsma Wikimedia Foundation Inc. Overview Intro Global architecture Content Delivery Network (CDN) Application servers Persistent storage Focus on architecture, not so much on
High Availability Solutions for the MariaDB and MySQL Database
High Availability Solutions for the MariaDB and MySQL Database 1 Introduction This paper introduces recommendations and some of the solutions used to create an availability or high availability environment
MAGENTO HOSTING Progressive Server Performance Improvements
MAGENTO HOSTING Progressive Server Performance Improvements Simple Helix, LLC 4092 Memorial Parkway Ste 202 Huntsville, AL 35802 [email protected] 1.866.963.0424 www.simplehelix.com 2 Table of Contents
Cloud Based Application Architectures using Smart Computing
Cloud Based Application Architectures using Smart Computing How to Use this Guide Joyent Smart Technology represents a sophisticated evolution in cloud computing infrastructure. Most cloud computing products
Internet Content Distribution
Internet Content Distribution Chapter 2: Server-Side Techniques (TUD Student Use Only) Chapter Outline Server-side techniques for content distribution Goals Mirrors Server farms Surrogates DNS load balancing
Tushar Joshi Turtle Networks Ltd
MySQL Database for High Availability Web Applications Tushar Joshi Turtle Networks Ltd www.turtle.net Overview What is High Availability? Web/Network Architecture Applications MySQL Replication MySQL Clustering
1. Comments on reviews a. Need to avoid just summarizing web page asks you for:
1. Comments on reviews a. Need to avoid just summarizing web page asks you for: i. A one or two sentence summary of the paper ii. A description of the problem they were trying to solve iii. A summary of
Apache Tomcat. Load-balancing and Clustering. Mark Thomas, 20 November 2014. 2014 Pivotal Software, Inc. All rights reserved.
2 Apache Tomcat Load-balancing and Clustering Mark Thomas, 20 November 2014 Introduction Apache Tomcat committer since December 2003 [email protected] Tomcat 8 release manager Member of the Servlet, WebSocket
Lecture 3: Scaling by Load Balancing 1. Comments on reviews i. 2. Topic 1: Scalability a. QUESTION: What are problems? i. These papers look at
Lecture 3: Scaling by Load Balancing 1. Comments on reviews i. 2. Topic 1: Scalability a. QUESTION: What are problems? i. These papers look at distributing load b. QUESTION: What is the context? i. How
Direct NFS - Design considerations for next-gen NAS appliances optimized for database workloads Akshay Shah Gurmeet Goindi Oracle
Direct NFS - Design considerations for next-gen NAS appliances optimized for database workloads Akshay Shah Gurmeet Goindi Oracle Agenda Introduction Database Architecture Direct NFS Client NFS Server
SiteCelerate white paper
SiteCelerate white paper Arahe Solutions SITECELERATE OVERVIEW As enterprises increases their investment in Web applications, Portal and websites and as usage of these applications increase, performance
Product Review: James F. Koopmann Pine Horse, Inc. Quest Software s Foglight Performance Analysis for Oracle
Product Review: James F. Koopmann Pine Horse, Inc. Quest Software s Foglight Performance Analysis for Oracle Introduction I ve always been interested and intrigued by the processes DBAs use to monitor
SCALABILITY AND AVAILABILITY
SCALABILITY AND AVAILABILITY Real Systems must be Scalable fast enough to handle the expected load and grow easily when the load grows Available available enough of the time Scalable Scale-up increase
Portable Scale-Out Benchmarks for MySQL. MySQL User Conference 2008 Robert Hodges CTO Continuent, Inc.
Portable Scale-Out Benchmarks for MySQL MySQL User Conference 2008 Robert Hodges CTO Continuent, Inc. Continuent 2008 Agenda / Introductions / Scale-Out Review / Bristlecone Performance Testing Tools /
Building Scalable Web Sites: Tidbits from the sites that made it work. Gabe Rudy
: Tidbits from the sites that made it work Gabe Rudy What Is This About Scalable is hot Web startups tend to die or grow... really big Youtube Founded 02/2005. Acquired by Google 11/2006 03/2006 30 million
How Comcast Built An Open Source Content Delivery Network National Engineering & Technical Operations
How Comcast Built An Open Source Content Delivery Network National Engineering & Technical Operations Jan van Doorn Distinguished Engineer VSS CDN Engineering 1 What is a CDN? 2 Content Router get customer
Copyright www.agileload.com 1
Copyright www.agileload.com 1 INTRODUCTION Performance testing is a complex activity where dozens of factors contribute to its success and effective usage of all those factors is necessary to get the accurate
Distributed File System. MCSN N. Tonellotto Complements of Distributed Enabling Platforms
Distributed File System 1 How do we get data to the workers? NAS Compute Nodes SAN 2 Distributed File System Don t move data to workers move workers to the data! Store data on the local disks of nodes
Designing, Scoping, and Configuring Scalable Drupal Infrastructure. Presented 2009-05-30 by David Strauss
Designing, Scoping, and Configuring Scalable Drupal Infrastructure Presented 2009-05-30 by David Strauss Understanding Load Distribution Predicting peak traffic Traffic over the day can be highly irregular.
S y s t e m A r c h i t e c t u r e
S y s t e m A r c h i t e c t u r e V e r s i o n 5. 0 Page 1 Enterprise etime automates and streamlines the management, collection, and distribution of employee hours, and eliminates the use of manual
CS514: Intermediate Course in Computer Systems
: Intermediate Course in Computer Systems Lecture 7: Sept. 19, 2003 Load Balancing Options Sources Lots of graphics and product description courtesy F5 website (www.f5.com) I believe F5 is market leader
Apache Performance Tuning
Apache Performance Tuning Part 2: Scaling Out Sander Temme Agenda Introduction Redundancy in Hardware Building Out: Separate Tiers Building Out: Load Balancing Caching Content Conclusion
Exploring Oracle E-Business Suite Load Balancing Options. Venkat Perumal IT Convergence
Exploring Oracle E-Business Suite Load Balancing Options Venkat Perumal IT Convergence Objectives Overview of 11i load balancing techniques Load balancing architecture Scenarios to implement Load Balancing
BASICS OF SCALING: LOAD BALANCERS
BASICS OF SCALING: LOAD BALANCERS Lately, I ve been doing a lot of work on systems that require a high degree of scalability to handle large traffic spikes. This has led to a lot of questions from friends
Software-defined Storage Architecture for Analytics Computing
Software-defined Storage Architecture for Analytics Computing Arati Joshi Performance Engineering Colin Eldridge File System Engineering Carlos Carrero Product Management June 2015 Reference Architecture
The Sierra Clustered Database Engine, the technology at the heart of
A New Approach: Clustrix Sierra Database Engine The Sierra Clustered Database Engine, the technology at the heart of the Clustrix solution, is a shared-nothing environment that includes the Sierra Parallel
ScaleArc for SQL Server
Solution Brief ScaleArc for SQL Server Overview Organizations around the world depend on SQL Server for their revenuegenerating, customer-facing applications, running their most business-critical operations
Database Scalability and Oracle 12c
Database Scalability and Oracle 12c Marcelle Kratochvil CTO Piction ACE Director All Data/Any Data [email protected] Warning I will be covering topics and saying things that will cause a rethink in
Managing your Domino Clusters
Managing your Domino Clusters Kathleen McGivney President and chief technologist, Sakura Consulting www.sakuraconsulting.com Paul Mooney Senior Technical Architect, Bluewave Technology www.bluewave.ie
G22.3250-001. Porcupine. Robert Grimm New York University
G22.3250-001 Porcupine Robert Grimm New York University Altogether Now: The Three Questions! What is the problem?! What is new or different?! What are the contributions and limitations? Porcupine from
Comparing Microsoft SQL Server 2005 Replication and DataXtend Remote Edition for Mobile and Distributed Applications
Comparing Microsoft SQL Server 2005 Replication and DataXtend Remote Edition for Mobile and Distributed Applications White Paper Table of Contents Overview...3 Replication Types Supported...3 Set-up &
Table of Contents. Overview... 1 Introduction... 2 Common Architectures... 3. Technical Challenges with Magento... 6. ChinaNetCloud's Experience...
Table of Contents Overview... 1 Introduction... 2 Common Architectures... 3 Simple System... 3 Highly Available System... 4 Large Scale High-Performance System... 5 Technical Challenges with Magento...
Tier Architectures. Kathleen Durant CS 3200
Tier Architectures Kathleen Durant CS 3200 1 Supporting Architectures for DBMS Over the years there have been many different hardware configurations to support database systems Some are outdated others
Ensuring scalability and performance with Drupal as your audience grows
Drupal performance and scalability Ensuring scalability and performance with Drupal as your audience grows Presented by Jon Anthony Bounty.com Northern and Shell (OK! Magazine etc) Drupal.org/project/
CHAPTER 4 PERFORMANCE ANALYSIS OF CDN IN ACADEMICS
CHAPTER 4 PERFORMANCE ANALYSIS OF CDN IN ACADEMICS The web content providers sharing the content over the Internet during the past did not bother about the users, especially in terms of response time,
Cloud Computing at Google. Architecture
Cloud Computing at Google Google File System Web Systems and Algorithms Google Chris Brooks Department of Computer Science University of San Francisco Google has developed a layered system to handle webscale
Networking Topology For Your System
This chapter describes the different networking topologies supported for this product, including the advantages and disadvantages of each. Select the one that best meets your needs and your network deployment.
Web Application Deployment in the Cloud Using Amazon Web Services From Infancy to Maturity
P3 InfoTech Solutions Pvt. Ltd http://www.p3infotech.in July 2013 Created by P3 InfoTech Solutions Pvt. Ltd., http://p3infotech.in 1 Web Application Deployment in the Cloud Using Amazon Web Services From
Windows Server Performance Monitoring
Spot server problems before they are noticed The system s really slow today! How often have you heard that? Finding the solution isn t so easy. The obvious questions to ask are why is it running slowly
DEPLOYMENT GUIDE Version 1.0. Deploying the BIG-IP LTM with Apache Tomcat and Apache HTTP Server
DEPLOYMENT GUIDE Version 1.0 Deploying the BIG-IP LTM with Apache Tomcat and Apache HTTP Server Table of Contents Table of Contents Deploying the BIG-IP LTM with Tomcat application servers and Apache web
MySQL Enterprise Backup
MySQL Enterprise Backup Fast, Consistent, Online Backups A MySQL White Paper February, 2011 2011, Oracle Corporation and/or its affiliates Table of Contents Introduction... 3! Database Backup Terms...
Comparing MySQL and Postgres 9.0 Replication
Comparing MySQL and Postgres 9.0 Replication An EnterpriseDB White Paper For DBAs, Application Developers, and Enterprise Architects March 2010 Table of Contents Introduction... 3 A Look at the Replication
MySQL High Availability Solutions. Lenz Grimmer <[email protected]> http://lenzg.net/ 2009-08-22 OpenSQL Camp St. Augustin Germany
MySQL High Availability Solutions Lenz Grimmer < http://lenzg.net/ 2009-08-22 OpenSQL Camp St. Augustin Germany Agenda High Availability: Concepts & Considerations MySQL Replication
MySQL és Hadoop mint Big Data platform (SQL + NoSQL = MySQL Cluster?!)
MySQL és Hadoop mint Big Data platform (SQL + NoSQL = MySQL Cluster?!) Erdélyi Ernő, Component Soft Kft. [email protected] www.component.hu 2013 (c) Component Soft Ltd Leading Hadoop Vendor Copyright 2013,
Symantec Endpoint Protection 11.0 Architecture, Sizing, and Performance Recommendations
Symantec Endpoint Protection 11.0 Architecture, Sizing, and Performance Recommendations Technical Product Management Team Endpoint Security Copyright 2007 All Rights Reserved Revision 6 Introduction This
CS 188/219. Scalable Internet Services Andrew Mutz October 8, 2015
CS 188/219 Scalable Internet Services Andrew Mutz October 8, 2015 For Today About PTEs Empty spots were given out If more spots open up, I will issue more PTEs You must have a group by today. More detail
Managing your Red Hat Enterprise Linux guests with RHN Satellite
Managing your Red Hat Enterprise Linux guests with RHN Satellite Matthew Davis, Level 1 Production Support Manager, Red Hat Brad Hinson, Sr. Support Engineer Lead System z, Red Hat Mark Spencer, Sr. Solutions
Web Application Firewalls: When Are They Useful? OWASP AppSec Europe May 2006. The OWASP Foundation http://www.owasp.org/
Web Application Firewalls: When Are They Useful? OWASP AppSec Europe May 2006 Ivan Ristic Thinking Stone [email protected] +44 7766 508 210 Copyright 2006 - The OWASP Foundation Permission is granted
Availability Digest. www.availabilitydigest.com. Redundant Load Balancing for High Availability July 2013
the Availability Digest Redundant Load Balancing for High Availability July 2013 A large data center can comprise hundreds or thousands of servers. These servers must not only be interconnected, but they
High Level Design Distributed Network Traffic Controller
High Level Design Distributed Network Traffic Controller Revision Number: 1.0 Last date of revision: 2/2/05 22c:198 Johnson, Chadwick Hugh Change Record Revision Date Author Changes 1 Contents 1. Introduction
Web Caching and CDNs. Aditya Akella
Web Caching and CDNs Aditya Akella 1 Where can bottlenecks occur? First mile: client to its ISPs Last mile: server to its ISP Server: compute/memory limitations ISP interconnections/peerings: congestion
LOAD BALANCING TECHNIQUES FOR RELEASE 11i AND RELEASE 12 E-BUSINESS ENVIRONMENTS
LOAD BALANCING TECHNIQUES FOR RELEASE 11i AND RELEASE 12 E-BUSINESS ENVIRONMENTS Venkat Perumal IT Convergence Introduction Any application server based on a certain CPU, memory and other configurations
5 Mistakes to Avoid on Your Drupal Website
5 Mistakes to Avoid on Your Drupal Website Table of Contents Introduction.... 3 Architecture: Content.... 4 Architecture: Display... 5 Architecture: Site or Functionality.... 6 Security.... 8 Performance...
Yahoo! Communities Architectures Ian Flint
Yahoo! Communities Architectures Ian Flint November 9, 2007 1 Agenda What makes Yahoo! Yahoo!? Hardware Infrastructure Software Infrastructure Operational Infrastructure Process Examples 2 What makes Yahoo!
ZEN LOAD BALANCER EE v3.04 DATASHEET The Load Balancing made easy
ZEN LOAD BALANCER EE v3.04 DATASHEET The Load Balancing made easy OVERVIEW The global communication and the continuous growth of services provided through the Internet or local infrastructure require to
Understanding and Building HA/LB Clusters
Understanding and Building HA/LB Clusters mod_backhand, wackamole and other such freedoms Theo Schlossnagle Principal Consultant OmniTI Computer Consulting, Inc. Defining Cluster, HA,
Cluster Computing. ! Fault tolerance. ! Stateless. ! Throughput. ! Stateful. ! Response time. Architectures. Stateless vs. Stateful.
Architectures Cluster Computing Job Parallelism Request Parallelism 2 2010 VMware Inc. All rights reserved Replication Stateless vs. Stateful! Fault tolerance High availability despite failures If one
MakeMyTrip CUSTOMER SUCCESS STORY
MakeMyTrip CUSTOMER SUCCESS STORY MakeMyTrip is the leading travel site in India that is running two ClustrixDB clusters as multi-master in two regions. It removed single point of failure. MakeMyTrip frequently
Building a Highly Available and Scalable Web Farm
Page 1 of 10 MSDN Home > MSDN Library > Deployment Rate this page: 10 users 4.9 out of 5 Building a Highly Available and Scalable Web Farm Duwamish Online Paul Johns and Aaron Ching Microsoft Developer
Real World Considerations for Implementing Desktop Virtualization
Real World Considerations for Implementing Desktop Virtualization The Essentials Series sponsored by Intro duction to Desktop Virtualization for the IT Pro... 1 What Is Desktop Virtualization?... 2 VDI
ECE 7650 Scalable and Secure Internet Services and Architecture ---- A Systems Perspective
ECE 7650 Scalable and Secure Internet Services and Architecture ---- A Systems Perspective Part II: Data Center Software Architecture: Topic 1: Distributed File Systems Finding a needle in Haystack: Facebook
Hadoop and Map-Reduce. Swati Gore
Hadoop and Map-Reduce Swati Gore Contents Why Hadoop? Hadoop Overview Hadoop Architecture Working Description Fault Tolerance Limitations Why Map-Reduce not MPI Distributed sort Why Hadoop? Existing Data
Red Hat Network Satellite Management and automation of your Red Hat Enterprise Linux environment
Red Hat Network Satellite Management and automation of your Red Hat Enterprise Linux environment WHAT IS IT? Red Hat Network (RHN) Satellite server is an easy-to-use, advanced systems management platform
Tuning Tableau Server for High Performance
Tuning Tableau Server for High Performance I wanna go fast PRESENT ED BY Francois Ajenstat Alan Doerhoefer Daniel Meyer Agenda What are the things that can impact performance? Tips and tricks to improve
Red Hat Satellite Management and automation of your Red Hat Enterprise Linux environment
Red Hat Satellite Management and automation of your Red Hat Enterprise Linux environment WHAT IS IT? Red Hat Satellite server is an easy-to-use, advanced systems management platform for your Linux infrastructure.
Setup Guide Access Manager 3.2 SP3
Setup Guide Access Manager 3.2 SP3 August 2014 www.netiq.com/documentation Legal Notice THIS DOCUMENT AND THE SOFTWARE DESCRIBED IN THIS DOCUMENT ARE FURNISHED UNDER AND ARE SUBJECT TO THE TERMS OF A LICENSE
Network Attached Storage. Jinfeng Yang Oct/19/2015
Network Attached Storage Jinfeng Yang Oct/19/2015 Outline Part A 1. What is the Network Attached Storage (NAS)? 2. What are the applications of NAS? 3. The benefits of NAS. 4. NAS s performance (Reliability
SAN Conceptual and Design Basics
TECHNICAL NOTE VMware Infrastructure 3 SAN Conceptual and Design Basics VMware ESX Server can be used in conjunction with a SAN (storage area network), a specialized high speed network that connects computer
Rails Application Deployment. July 2007 @ Philly on Rails
Rails Application Deployment July 2007 @ Philly on Rails What Shall We Deploy Tonight? Blogging/publishing system Standard Rails application Ships with gems in vendor directory Easy rake task for database
Geospatial Server Performance Colin Bertram UK User Group Meeting 23-Sep-2014
Geospatial Server Performance Colin Bertram UK User Group Meeting 23-Sep-2014 Topics Auditing a Geospatial Server Solution Web Server Strategies and Configuration Database Server Strategy and Configuration
Web Application Hosting Cloud Architecture
Web Application Hosting Cloud Architecture Executive Overview This paper describes vendor neutral best practices for hosting web applications using cloud computing. The architectural elements described
Virtuoso and Database Scalability
Virtuoso and Database Scalability By Orri Erling Table of Contents Abstract Metrics Results Transaction Throughput Initializing 40 warehouses Serial Read Test Conditions Analysis Working Set Effect of
Monitoring Microsoft Exchange to Improve Performance and Availability
Focus on Value Monitoring Microsoft Exchange to Improve Performance and Availability With increasing growth in email traffic, the number and size of attachments, spam, and other factors, organizations
1Intro. Apache is an open source HTTP web server for Unix, Apache
Apache 1Intro Apache is an open source HTTP web server for Unix, Microsoft Windows, Macintosh and others, that implements the HTTP / 1.1 protocol and the notion of virtual sites. Apache has amongst other
A virtual SAN for distributed multi-site environments
Data sheet A virtual SAN for distributed multi-site environments What is StorMagic SvSAN? StorMagic SvSAN is a software storage solution that enables enterprises to eliminate downtime of business critical
Active-Active and High Availability
Active-Active and High Availability Advanced Design and Setup Guide Perceptive Content Version: 7.0.x Written by: Product Knowledge, R&D Date: July 2015 2015 Perceptive Software. All rights reserved. Lexmark
Application Brief: Using Titan for MS SQL
Application Brief: Using Titan for MS Abstract Businesses rely heavily on databases for day-today transactions and for business decision systems. In today s information age, databases form the critical
Accelerating and Simplifying Apache
Accelerating and Simplifying Apache Hadoop with Panasas ActiveStor White paper NOvember 2012 1.888.PANASAS www.panasas.com Executive Overview The technology requirements for big data vary significantly
MyISAM Default Storage Engine before MySQL 5.5 Table level locking Small footprint on disk Read Only during backups GIS and FTS indexing Copyright 2014, Oracle and/or its affiliates. All rights reserved.
PolyServe Matrix Server for Linux
PolyServe Matrix Server for Linux Highly Available, Shared Data Clustering Software PolyServe Matrix Server for Linux is shared data clustering software that allows customers to replace UNIX SMP servers
Business-centric Storage for small and medium-sized enterprises. How ETERNUS DX powered by Intel Xeon processors improves data management
Business-centric Storage for small and medium-sized enterprises How DX powered by Intel Xeon processors improves data management Data management requirements are increasing day by day Storage administrators
ENTERPRISE-CLASS MONITORING SOLUTION FOR EVERYONE ALL-IN-ONE OPEN-SOURCE DISTRIBUTED MONITORING
ENTERPRISE-CLASS MONITORING SOLUTION FOR EVERYONE ALL-IN-ONE OPEN-SOURCE DISTRIBUTED MONITORING 1 CONTENTS About Zabbix Software... 2 Main Functions... 3 Architecture... 4 Installation Requirements...
An Oracle White Paper July 2011. Oracle Primavera Contract Management, Business Intelligence Publisher Edition-Sizing Guide
Oracle Primavera Contract Management, Business Intelligence Publisher Edition-Sizing Guide An Oracle White Paper July 2011 1 Disclaimer The following is intended to outline our general product direction.
MEASURING WORKLOAD PERFORMANCE IS THE INFRASTRUCTURE A PROBLEM?
MEASURING WORKLOAD PERFORMANCE IS THE INFRASTRUCTURE A PROBLEM? Ashutosh Shinde Performance Architect [email protected] Validating if the workload generated by the load generating tools is applied
TABLE OF CONTENTS THE SHAREPOINT MVP GUIDE TO ACHIEVING HIGH AVAILABILITY FOR SHAREPOINT DATA. Introduction. Examining Third-Party Replication Models
1 THE SHAREPOINT MVP GUIDE TO ACHIEVING HIGH AVAILABILITY TABLE OF CONTENTS 3 Introduction 14 Examining Third-Party Replication Models 4 Understanding Sharepoint High Availability Challenges With Sharepoint
4D WebSTAR 5.1: Performance Advantages
4D WebSTAR 5.1: Performance Advantages CJ Holmes, Director of Engineering, 4D WebSTAR OVERVIEW This white paper will discuss a variety of performance benefits of 4D WebSTAR 5.1 when compared to other Web
<Insert Picture Here> Oracle In-Memory Database Cache Overview
Oracle In-Memory Database Cache Overview Simon Law Product Manager The following is intended to outline our general product direction. It is intended for information purposes only,
BigMemory & Hybris : Working together to improve the e-commerce customer experience
& Hybris : Working together to improve the e-commerce customer experience TABLE OF CONTENTS 1 Introduction 1 Why in-memory? 2 Why is in-memory Important for an e-commerce environment? 2 Why? 3 How does
Exploiting Remote Memory Operations to Design Efficient Reconfiguration for Shared Data-Centers over InfiniBand
Exploiting Remote Memory Operations to Design Efficient Reconfiguration for Shared Data-Centers over InfiniBand P. Balaji, K. Vaidyanathan, S. Narravula, K. Savitha, H. W. Jin D. K. Panda Network Based
How To Choose Between A Relational Database Service From Aws.Com
The following text is partly taken from the Oracle book Middleware and Cloud Computing It is available from Amazon: http://www.amazon.com/dp/0980798000 Cloud Databases and Oracle When designing your cloud
Job Reference Guide. SLAMD Distributed Load Generation Engine. Version 1.8.2
Job Reference Guide SLAMD Distributed Load Generation Engine Version 1.8.2 June 2004 Contents 1. Introduction...3 2. The Utility Jobs...4 3. The LDAP Search Jobs...11 4. The LDAP Authentication Jobs...22
Web Application Hosting in the AWS Cloud Best Practices
Web Application Hosting in the AWS Cloud Best Practices May 2010 Matt Tavis Page 1 of 12 Abstract Highly-available and scalable web hosting can be a complex and expensive proposition. Traditional scalable
Storage Backup and Disaster Recovery: Using New Technology to Develop Best Practices
Storage Backup and Disaster Recovery: Using New Technology to Develop Best Practices September 2008 Recent advances in data storage and data protection technology are nothing short of phenomenal. Today,
