Bricks Cluster Technical Whitepaper
20steps
Version: December 2014 (Draft)

Helmut Hoffer von Ankershoffen (Owner & CTO)
Bastian Brodbeck (Lead Frontend Development)

Zimmerstraße 26, Aufgang B, 8th Floor, 10969 Berlin
Tel. +49 (0)30 9940 400 00, Fax +49 (0)30 9940 400 03
www.20steps.de, info@20steps.de
Overview

Bricks by 20steps¹ is a platform for digital products that need high availability and fast responses. Bricks is therefore deployed on a high-availability cluster called the Bricks Cluster. This whitepaper gives an overview of the architecture and technology used for such cluster systems. The Bricks Cluster is built with scalability and efficiency in mind, without neglecting security. Its architecture is designed for high availability and disaster recovery. The whitelabel configuration for an agency is called a Color in Bricks terminology². Each Color has its own Bricks Cluster.

Architecture

The Bricks Cluster consists of multiple nodes, each designed for its purpose. These nodes are bare-metal machines dedicated to Bricks, and none of the services running on the nodes are virtualized.

Diagram showing Bricks Cluster Architecture

¹ Hereafter referred to as Bricks
² See Bricks - Technical Whitepaper
The Bricks Cluster is connected to the internet via multiple redundant upstreams and peerings. A switch receives all requests and delegates them to a load balancer, which then decides which node running the application handles the request. The applications run on multiple Bricks Nodes. Each Bricks Node is protected by a firewall through which all requests pass. The firewall utilises iptables. Each request is handled either directly by Varnish (HTTP requests) or first by an nginx instance serving as an SSL accelerator (HTTPS requests; see Security). Varnish serves as a cache for requests as well as a load balancer that shares the load across all Bricks Nodes (see Scalability).

Diagram showing Bricks Node configuration

For PHP applications such as Bricks, an Apache with mod_php is used as the web server. Alternatively, the HipHop Virtual Machine (HHVM) can be used instead of mod_php. A Tomcat with a Java Virtual Machine (JVM) handles all Java applications written with frameworks such as Grails or Spring. Memcache, a general-purpose distributed memory caching system, is used to cache data in RAM and reduce the number of times an external data source must be read. Complementary to the Bricks Nodes, the Bricks Cluster has two Database Nodes for persisting the applications' data (see Persistency). The application writes data to this master and reads the data from database slaves located on the Bricks Nodes. An external backup server is located at a different data center, and a monitoring system monitors the Bricks Cluster.
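Varnish's dual role as cache and load balancer can be sketched in VCL. The hostnames, port and round-robin policy below are illustrative assumptions (Varnish 4 syntax), not the production configuration:

```vcl
vcl 4.0;
import directors;

# backend web servers on the Bricks Nodes (addresses are assumptions)
backend node1 { .host = "10.0.0.11"; .port = "8080"; }
backend node2 { .host = "10.0.0.12"; .port = "8080"; }

sub vcl_init {
    # round-robin director sharing the load across all Bricks Nodes
    new cluster = directors.round_robin();
    cluster.add_backend(node1);
    cluster.add_backend(node2);
}

sub vcl_recv {
    # cache misses are delegated to one of the nodes in turn
    set req.backend_hint = cluster.backend();
}
```

Cache hits never reach a backend at all; only misses are delegated to a node by the director.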
Persistency

Most applications have to deal with data and have to store and retrieve it in some way. There are many different kinds of data and many ways an application may need to handle them. To offer a wide variety of storage options, the Bricks Cluster provides multiple solutions for keeping data persistent. Currently the Bricks Cluster offers SQL (MariaDB), NoSQL (Mongo) and fulltext search (Solr, ElasticSearch) back ends ready to be used by Bricks. To keep the data persistent even if a node becomes unresponsive, all data is stored redundantly. The main Database Node is used to write data and is replicated to another dedicated Database Node. Additionally, data is stored redundantly on the Bricks Nodes, which serve as database slaves and are used for reading the stored data. Some data, such as images or videos, is not stored in a database but directly on a Bricks Node. When new media files are uploaded to the responsible Bricks Node, lsyncd, a live syncing daemon monitoring local directories, automatically syncs those files to all the other Bricks Nodes in the Bricks Cluster.

Diagram showing SQL-Replication representative for all database replication and File Replication
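The media replication via lsyncd can be sketched as an lsyncd configuration (Lua). The hostnames and paths below are assumptions, not the production setup:

```lua
-- /etc/lsyncd/lsyncd.conf.lua (sketch; hostnames and paths are assumptions)
settings {
    logfile    = "/var/log/lsyncd/lsyncd.log",
    statusFile = "/var/log/lsyncd/lsyncd.status",
}

-- replicate uploaded media files to every other Bricks Node over rsync+ssh
for _, node in ipairs({ "node2.bricks.internal", "node3.bricks.internal" }) do
    sync {
        default.rsyncssh,
        source    = "/var/www/bricks/media",
        host      = node,
        targetdir = "/var/www/bricks/media",
    }
end
```

lsyncd watches the source directory with inotify and triggers an rsync transfer only for files that actually changed, so replication stays cheap even for large media trees.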
High-Availability

For most online businesses nowadays, the key to success is to always be available. Not only do customers want to access services around the clock, they also want to be sure that no data is lost should the service be unavailable. Unavailability of a service or website does not only mean that the server is down; it also means that the load on the servers is so high that requests have to wait a long time. To prevent that, the Bricks Cluster is built redundantly, with multiple Bricks Nodes and Database Nodes. All requests for an application are shared across the Bricks Nodes, which can accommodate the outage of a node. Even the load balancer is redundant. Each Bricks Node has a software load balancer provided by Varnish: one is responsible for distributing the requests to the nodes, while the others serve as backups that take over for the active one if needed. A regular heartbeat check constantly verifies the availability of each Bricks Node so that the responsible load balancer can be switched. If a Bricks Node stops responding or goes offline, another Bricks Node takes over its responsibility. The Database Node master is replicated to a second Database Node, which can take over in case of an outage.

The Bricks Cluster is monitored to detect any problem as soon as possible. This way we can react quickly and bring the Bricks Cluster back to its full potential in no time. All these methods, together with our data center partner Hetzner, help the Bricks Cluster achieve an uptime of 99%.

Disaster Recovery

To guard against any kind of disaster leading to the loss of data, the Bricks Cluster periodically makes backups of all data and keeps them safe at another location. This additionally minimizes the risk of losing any data. All backups are kept for a certain time, so that multiple backups made at various dates are available in case one is needed.
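The retention step of such a backup scheme can be sketched as a small shell script. The file naming and the 30-day window are assumptions; temporary directories stand in for the real backup location:

```shell
#!/bin/sh
# Sketch of backup retention (the 30-day window and file names are
# assumptions): dated archives older than the window are expired,
# newer ones are kept.
BACKUP_DIR=$(mktemp -d)
RETENTION_DAYS=30

# two example archives: one recent, one 40 days old
touch "$BACKUP_DIR/backup-2014-12-01.tar.gz"
touch -d "40 days ago" "$BACKUP_DIR/backup-2014-10-20.tar.gz"

# expire every dated archive older than the retention window
find "$BACKUP_DIR" -name 'backup-*.tar.gz' -mtime +"$RETENTION_DAYS" -delete

ls "$BACKUP_DIR"
```

Keeping several dated archives rather than a single rolling copy is what makes it possible to restore a state from before a fault was introduced, not just the latest one.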
Additionally, all the application's code is safely stored on a remote git server. Git is a distributed revision control and source code management (SCM) system that allows us not only to restore any state of the application's code at any time, but also to quickly deploy the application on new servers.

Scalability

To balance the current load for one application, a Bricks Node is equipped with a software load balancer provided by Varnish that can delegate a request to another Bricks Node with free capacity. If the capacity provided by the available Bricks Nodes no longer suffices, we simply add one (or more) Bricks Nodes with the same configuration to the Bricks Cluster and add them to the load balancer's configuration; the load balancer then automatically delegates requests to the new systems.
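Bringing the application onto a newly added node is essentially a git push followed by a checkout into the web root. A minimal, self-contained sketch (all paths, names and the PHP file are assumptions; temporary directories stand in for the real node):

```shell
#!/bin/sh
set -e
# Sketch of git-based provisioning: push a release to a bare repository
# on the new Bricks Node, then check it out into the web root.
NODE_REPO=$(mktemp -d)/repo.git     # bare repository on the new node
WEB_ROOT=$(mktemp -d)               # web root served by Apache
WORKDIR=$(mktemp -d)                # developer's working copy

git init --bare -q "$NODE_REPO"

# developer side: commit the application and push the release
cd "$WORKDIR"
git init -q
echo "<?php echo 'bricks';" > index.php
git add index.php
git -c user.email=dev@example.com -c user.name=dev commit -qm "release 1"
git push -q "$NODE_REPO" HEAD:master

# node side: check the pushed revision out into the web root
git --git-dir="$NODE_REPO" --work-tree="$WEB_ROOT" checkout -f master
```

Rolling back is the same checkout pointed at an earlier revision, which is what makes git deployments quickly reversible.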
All databases replicate their data to each Bricks Node, which serves as a backup in case the master database should suffer data loss. Additionally, this data can be read without the overhead of first requesting it from another server. This is especially useful for Solr, for example, which is used for calculation-heavy searches: each Bricks Node has all relevant data available without needing to request it first.

Efficiency

Efficiency is what can make the difference compared to a competitor. A slower website has a higher bounce rate, and with online services continuously growing more complex, the need for efficiency is bigger than ever. The Bricks Cluster helps to get the most out of an application by using various techniques to reduce the time needed to respond to a customer's request. The Bricks Cluster utilises Varnish as a caching system to reduce the load on the Apache web server by storing and serving the response to repeated requests. The application itself uses memcache to store data and objects redundantly, distributed over the Bricks Nodes, to reduce reads from external data sources. The execution of the application on the web server is sped up by using a PHP accelerator or the HipHop Virtual Machine (HHVM). To reduce the page loading time, and therefore the response time, the Bricks Cluster uses the PageSpeed module to automatically apply web performance best practices to pages and associated assets.

Diagram: uncached HTTP-Request
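The caching behaviour can be sketched as a VCL fragment (Varnish 4 syntax; the five-minute fallback TTL is an assumption, not the production value):

```vcl
sub vcl_backend_response {
    # cache successful responses for five minutes unless the application
    # already set a TTL of its own via Cache-Control headers
    if (beresp.status == 200 && beresp.ttl <= 0s) {
        set beresp.ttl = 5m;
    }
}
```

Every request answered from this cache never touches Apache or the database slaves, which is where most of the load reduction comes from.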
Diagram: HTTP-Request with response from cache

Setup

Setting up a new Bricks Cluster is done by setting up a new Color and creating a new whitelabel installation of Bricks. All necessities are handled by a setup script. New Bricks Nodes can be added to the Bricks Cluster at any time via a script written by us for that purpose. No one needs to manually configure the servers when setting up a new Bricks Cluster or a new node.

Security

Each Bricks Node is protected by a firewall configured via iptables to deny unwanted access and protect the data. HTTPS requests are handled by an nginx instance serving as SSL accelerator: it decrypts the request, keeping Apache responsive for other requests, and then passes the decrypted request to the load balancer. The nginx instance not only prevents Apache execution threads from being tied up during decryption, it also adds an additional layer of security by not exposing the web server directly for secure connections. The Bricks Cluster receives available security updates, preventing the exploitation of known security holes.
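The SSL accelerator can be sketched as an nginx server block. The server name, certificate paths and the local Varnish port are assumptions, not the production configuration:

```nginx
# Sketch: nginx terminates TLS and hands the decrypted request to the
# local Varnish load balancer (names, paths and port are assumptions).
server {
    listen 443 ssl;
    server_name www.example-color.de;

    ssl_certificate     /etc/nginx/ssl/color.crt;
    ssl_certificate_key /etc/nginx/ssl/color.key;

    location / {
        proxy_pass http://127.0.0.1:6081;       # local Varnish instance
        proxy_set_header Host $host;
        proxy_set_header X-Forwarded-Proto https;
    }
}
```

Because Apache only ever sees plain HTTP from Varnish, its worker threads are never occupied with TLS handshakes or decryption.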
Diagram: HTTP-Request Handling versus HTTPS-Request Handling

Management

For deploying an application to the Bricks Cluster, the distributed revision control and source code management (SCM) system git is used. This has the benefit of fast and reversible deployments, without having to manually upload the application via (S)FTP. Before an application is deployed to the Bricks Cluster, it is deployed and tested on a pre-live Bricks Cluster with the same configuration. The Bricks Cluster regularly receives software updates to stay up-to-date.

Support

Bricks Cluster offers support via multiple channels. We provide a service hotline where we are reachable 24 hours a day, 7 days a week. In addition, we provide support via email. All systems are monitored, and our hosting partner Hetzner notifies us when a problem arises, so we can react to any problem with the Bricks Cluster.

Data Center

The Bricks Cluster is located at the data center of Hetzner. All nodes have quad-core CPUs with hyper-threading technology and are equipped with an SSD RAID 1. Each node has two NICs, one for internal network traffic and one for external traffic, both redundant 1-gigabit connections. Hetzner guarantees a 1 GBit/s connection to the Internet.

Sustainability
The Bricks Cluster is fully powered by eco-friendly, renewable energy sources. All components used by Hetzner for the Bricks Cluster nodes are energy efficient.