Apache: Traditional vs Cloud Configuration
Course of Enterprise Digital Infrastructure 2014/2015
Authors: Razim Aliyev, Giacomo Bellazzi, Nicolò Marchesi, Paulin Tchonin, Eric Villa
TABLE OF CONTENTS

ABSTRACT
SECTION 1: APACHE CONFIGURATION
1.1 INTRODUCTION TO RASPBERRY PI 2
1.2 INSTALLATION OF APACHE, PHP AND MYSQL
1.3 APACHE ENVIRONMENT
1.4 CACHING MECHANISM
SECTION 2: WORDPRESS
2.1 BENEFITS OF USING WORDPRESS
2.2 INSTALLATION
2.3 LOGIN
2.4 DASHBOARD
SECTION 3: ADVANCED CONFIGURATIONS FOR APACHE
3.1 REGISTRATION OF A DOMAIN NAME
3.2 HTTPS/SSL IMPLEMENTATION
3.3 ADVANCED IMPROVEMENTS
SECTION 4: CLOUD CONFIGURATION
4.1 LAUNCH OF AN AMAZON EC2 INSTANCE
4.2 INSTALL REQUIRED SOFTWARE ON THE EC2 INSTANCE
4.3 SETUP AMAZON RDS FOR THE WORDPRESS DB
4.4 SETUP ELASTIC LOAD BALANCING
4.5 TESTING ELASTIC LOAD BALANCING THROUGH APACHE JMETER
4.6 BENEFITS OF AUTO SCALING
Abstract

In this project we installed a WordPress website both on a single local machine and on a cloud infrastructure, in both cases running in an Apache environment, and then performed a load test to see how the two setups behave in terms of scalability.
Section 1: Apache Configuration

In this first section, the installation of Apache and some of its settings will be discussed in more depth, in order to give an overview of the environment and to identify possible improvements. We consider the general configuration files and the options that can be controlled within Apache during the project.

1.1 Introduction to Raspberry Pi 2

In this section, the installation of Apache on a Linux machine, a Raspberry Pi 2, will be discussed. The Raspberry Pi is a small single-board computer widely used in practice to develop projects and learn programming, which makes it a convenient platform for this project. Regarding the specifications, the machine has a quad-core 900 MHz CPU, 1 GB of SDRAM, a 10/100 Ethernet port, four USB 2.0 ports, a Broadcom VideoCore IV graphics core and so on. Its increased clock frequency and multiple cores make the Raspberry Pi 2 a sufficiently powerful tool for the project.

1.2 Installation of Apache, PHP and MySQL

Apache, PHP and MySQL are installed with the following commands (the ondrej/php5 PPA is added to obtain a recent PHP 5 package):

sudo add-apt-repository ppa:ondrej/php5
sudo apt-get update
sudo apt-get install apache2
sudo apt-get install php5
sudo apt-get install mysql-server

After these commands have been executed, the Apache web server is running with its standard configuration. To test whether the installation was successful, it is enough to open a browser
and type the local IP address of the machine. If the default Apache page is displayed, the installation was successful.

1.3 Apache environment

This section gives an overview of the Apache system, in particular of the main files that allow the administrator to modify the settings of the web server. The /etc/apache2 folder is the main area we work in, because Apache keeps its main configuration files there; during the experiments we need to work with some of the files and sub-directories inside it. The following files and directories are used for the configuration of the server:

apache2.conf is the main configuration file for the server, in which the defaults are configured; it is the central point of access for the server to read configuration details.
ports.conf is used to specify the ports that virtual hosts should listen on.
sites-available contains the virtual host configuration files that define the different web sites.

Now let's have a look at these files in more detail.

apache2.conf file
In this file we set the configuration for the global Apache server process, for the default server and for the virtual hosts. In this project we focus mainly on the global configuration and modify it for testing and experimental purposes. The main directives we concentrate on are:

Timeout decides the amount of time the server has to fulfil each request. By default this parameter is set to 300 seconds, but it can be modified as needed.
KeepAlive controls whether the server allows persistent connections, i.e. more than one request per connection. By default it is on, but it can be turned off for experimental purposes; if it is set to off, each request has to establish a new connection, which can result in significant overhead.
MaxKeepAliveRequests controls how many separate requests each connection will accept before being closed; leaving this number high usually gives maximum performance. By default it is 100, but it can be changed for testing purposes.
KeepAliveTimeout defines how many seconds the server waits for the next request from the same client on the same connection. If the timeout is reached, the connection is closed and the server has to establish a new one. By default it is 5 seconds, but again it can be modified for testing.

ports.conf file
As mentioned, in this file we specify which ports the virtual hosts listen on, and the SSL-related ports can be configured here as well, so it is also important for the implementation of HTTPS. Ports can be changed or added here, for example:

NameVirtualHost *:80
Listen 80
NameVirtualHost *:443
Listen 443

sites-available directory
The sites-available directory contains the virtual host declarations. Our configuration file fantacalciopizza.com.conf is located here, and this is where we configure parameters such as:

ServerAdmin fantacalciopizza@gmail.com
ServerName fantacalciopizza.com
ServerAlias www.fantacalciopizza.com
DocumentRoot /var/www/fantacalciopizza.com/public_html

The document root /var/www is one of the important directories, because by default it is the document root for Apache2 on Ubuntu and it is where the site documents are stored. The default site location can be changed if desired: in our case the root of the site is located in the /var/www/fantacalciopizza.com directory. This aspect is very important for virtual hosting.
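Putting these pieces together, the global tuning directives and the site's virtual host file could look roughly like the sketch below; the keep-alive values shown are simply the defaults discussed above, not necessarily the ones used in the experiments.

# Global tuning in /etc/apache2/apache2.conf
Timeout 300
KeepAlive On
MaxKeepAliveRequests 100
KeepAliveTimeout 5

# Virtual host in /etc/apache2/sites-available/fantacalciopizza.com.conf
<VirtualHost *:80>
    ServerAdmin fantacalciopizza@gmail.com
    ServerName fantacalciopizza.com
    ServerAlias www.fantacalciopizza.com
    DocumentRoot /var/www/fantacalciopizza.com/public_html
</VirtualHost>

The site is then enabled and Apache reloaded:

sudo a2ensite fantacalciopizza.com.conf
sudo service apache2 reload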
1.4 Caching mechanism

Caching is a very important task, because it can improve the performance of the Apache web server. It can be done at different levels, such as caching of files, of key/value data or of HTTP responses.

File caching
File caching is a basic caching strategy which simply opens files when the server starts and keeps them available to speed up access. It is mainly used to improve performance on slow filesystems. The important module here is mod_file_cache, which first has to be enabled:

sudo a2enmod file_cache

Then file handle caching can be set up using the CacheFile directive, which takes a list of file paths. When restarted, Apache will open the listed files and keep them in the cache for faster access (a combined example is given at the end of this section).

Key value caching
Key value caching is mainly used for storing SSL sessions or authentication details, for instance to avoid repeating the expensive operations involved in setting up a client's access to content. The primary modules here are mod_socache_dbm, mod_socache_dc, mod_socache_memcache and mod_socache_shmcb. The handshake that must be performed to establish an SSL connection is a significant overhead, so by caching the session data this overhead can be avoided.

HTTP caching
HTTP caching is used for caching general content. The Apache HTTP caching mechanism caches responses according to the HTTP caching policies. The primary module involved here is mod_cache; in order to enable caching, mod_cache and its disk provider must be enabled by typing:
sudo a2enmod cache
sudo a2enmod cache_disk

Most of the caching configuration happens in the virtual host definitions or in specific locations; however, enabling mod_cache_disk also enables a global configuration for some general attributes. We are mainly interested in the virtual host part, so we edit the site file under sites-enabled:

sudo nano /etc/apache2/sites-enabled/fantacalciopizza.com.conf

One of the interesting concepts for us is cookies. We can tell Apache to ignore Set-Cookie headers and not store them in the cache, so that the Set-Cookie header is removed before the headers are cached:

CacheIgnoreHeaders Set-Cookie

We can then enable caching for our virtual host by configuring directives such as:

CacheEnable disk /
CacheHeader on
CacheDefaultExpire 600
CacheMaxExpire 86400
CacheLastModifiedFactor 0.5

ETags can be controlled with the FileETag directive:

FileETag All

This tells Apache to build the ETag for static content from all available file attributes (inode, modification time and size), which lets clients and caches validate the copies they already hold.
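To show where these directives would live, here is a minimal sketch of a caching-enabled configuration; the cached file path is purely illustrative and the values simply repeat the examples above.

sudo a2enmod file_cache cache cache_disk

# mod_file_cache is a server-level directive (e.g. in apache2.conf):
# keep a handle to a frequently served static file open at startup
CacheFile /var/www/fantacalciopizza.com/public_html/favicon.ico

# In the virtual host file, HTTP caching on disk for the whole site
<VirtualHost *:80>
    ServerName fantacalciopizza.com
    DocumentRoot /var/www/fantacalciopizza.com/public_html
    CacheEnable disk /
    CacheHeader on
    CacheDefaultExpire 600
    CacheMaxExpire 86400
    CacheLastModifiedFactor 0.5
    CacheIgnoreHeaders Set-Cookie
    FileETag All
</VirtualHost>

After editing, the configuration is reloaded with sudo service apache2 reload.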
Section 2: WordPress

In this part we will see how to set up a website using the content management system WordPress. WordPress is an open source software system that anyone can use to create blogs and websites. It is customizable through themes and plugins that can be downloaded from the WordPress site or from other places on the web. WordPress started in 2003 and is now the largest hosted blogging tool, also used by big companies such as Samsung or the New York Times.

2.1 Benefits of using WordPress

Here are some of the reasons that led us to use WordPress:
1) It is easy to use: no particular expertise is needed to set up a blog or a website with WordPress.
2) It is free.
3) It is easy to perform a load test on a web server running a WordPress website, using a tool such as BlazeMeter.

2.2 Installation

To install WordPress we essentially need three things:
a web server;
a MySQL database;
the WordPress installation files, which can easily be downloaded from the WordPress website.

Once we have downloaded the latest version of WordPress, which comes as a zip file, we need to unzip it and move all the files onto the web server. To upload the WordPress files to our web server we need a client that can handle FTP connections; for this purpose we used Cyberduck.
Cyberduck is an open source client for FTP, SFTP, WebDAV and Amazon S3, available for Mac OS X and Windows. Now we can start the installation of WordPress from the browser, by going to the page http://www.fantacalciopizza.com/setup-config.php (the domain name of our website is fantacalciopizza.com).

Step 1: the first thing to do is to choose the language in which we want to continue the installation.
Step 2: provide the information about our database, which WordPress will use to create the tables needed to manage the website.

Step 3: enter the information, such as username and password, that will be used for the administration of the website, and press the button "Install WordPress" to conclude the installation.
2.3 Login

Before making any changes to the website, we need to log in using the username and password defined during the installation process. To log in we can go to the following URL: http://www.fantacalciopizza.com/wp-admin

2.4 Dashboard

Once we have logged in, the WordPress dashboard appears. It is our main administration homepage and is subdivided into three parts: the header area (at the very top of the
dashboard), the menu options (on the left-hand side of the dashboard) and the main panel (in the centre). In the dashboard menu options we can find everything needed to update and configure our website. Here are some of the options:

Posts
This is where you can create a new blog post. You can also update your categories and post tags.

Media
This is where all our uploaded images, documents or files are stored.

Pages
This is where we create and maintain all our pages.

Comments
We can manage all comments within this section, including replying to them or marking them as spam.

Appearance
This menu is where we control how our website looks. We can choose a new theme and manage our site widgets or menus.

Plugins
Plugins extend and expand the functionality of WordPress. We can add or delete plugins here, as well as activate or deactivate them.
Section 3: Advanced configurations for Apache

In this section, some important advanced configurations for Apache will be discussed, such as the registration of a domain name, the implementation of HTTPS and improvements concerning redirection, error pages and security.

3.1 Registration of a domain name

This section describes the procedure followed to register a domain name for the web site implemented in the project, and how to set up the Apache web server so that it works correctly with it. Domain names are very important in the Internet infrastructure for many reasons: first of all, a name is more easily recognizable and memorable for humans than the IP address of the specific web server where a web site is hosted. For the administrator of the web server it is also easier to deal with domain names in case it becomes necessary to move to a different physical location, without any change on the user's side. There are several levels of domain names, for example top-level domains such as .com, .eu and .it, then second-level, third-level and lower. All the regulations for domain names are under the authority of ICANN, which defines the rules, syntax and so on; since there are many registration requests, ICANN has delegated this task to other companies. On the web there is a very large number of providers offering domain registration for a very cheap price. For this project one of the cheapest was chosen, GoDaddy.com, which offers domain registration for 7.99 euros per year. A second-level domain was bought, with the syntax domain_name.extension. The first step to register a domain name is to check whether it is actually available. Once verified that the domain name is free, it is possible to buy different extensions, for example mydomain.com, mydomain.it and so on. After selecting the domain name, some fields with information about the owner of the service have to be filled in. It is very important to enter correct values, because in case of errors the domain registration can fail. Also, in the case of country-code top-level domains, some additional rules may apply: for instance, for .it the owner must have an Italian physical address.
Once all the data has been sent and the service has been paid for, the domain becomes usable in less than one hour. Some additional services can be purchased, for example one that improves privacy against the WHOIS command, which lets anyone obtain personal information about the owner of the domain name. Now it is possible to configure the infrastructure to be used with DNS. First, some further information about the infrastructure: the web server has been installed on a residential ADSL network, where the ISP provides a static IP address. To make the web server reachable from outside the local network, some port forwarding is necessary. To do this, it is necessary to access the control page of the main router and configure it so that every request arriving on port 80 of the public IP of the router is forwarded to the local IP of the machine. To test that this operation has been done correctly, it is enough to open a browser from a different network and type the public IP address of the residential ADSL line: if the page of the web site is displayed, the configuration is correct. The next step is to set up the DNS so that the registered domain name is translated into the IP address of the web server. Usually two ways are possible: the first is to register a record of type A or AAAA, containing the IP address of the web server, in the authoritative name server of the ISP of the residential ADSL. When this procedure is not available (for consumer contracts it typically is not), a kind of pointer operation is required. In this case, since the first way was not available, the second was followed: it is necessary to go to the GoDaddy dashboard, to the DNS Zone File section. In the record of type A, the public IP address of the web server for the purchased domain must be inserted, and the GoDaddy DNS servers will then answer any DNS queries, translating the hostname to the correct IP address. A TTL value for the A record is also required, which determines how long the record is cached and therefore how quickly changes propagate; in this case the value was set to half an hour. The last step is to set the domain name as server name in the configuration file in the folder /etc/apache2/sites-available. In the VirtualHost section for port 80, it is necessary to add the following line:
ServerName www.mydomain.com

Once the file has been saved, it is necessary to reload Apache by typing the following command in the terminal:

sudo service apache2 restart

After completing all these operations and waiting some hours for the DNS records to propagate, it is possible to access the web site hosted on the server by typing the domain name.

3.2 HTTPS/SSL implementation

In this section, the HTTPS protocol and its implementation in the Apache web server will be discussed. In the standard configuration Apache only listens for traffic on port 80, where plain HTTP is served. Since HTTP does not implement any kind of security and all packets are sent in plain text, it is possible to inspect them, and credentials can be sniffed very easily, for instance with Wireshark. For this reason the use of HTTPS is very important on a web server, especially when the web site hosts personal information or credentials. HTTPS is the combination of standard HTTP with the security capabilities of SSL/TLS. This protocol provides encryption of the data exchanged between a client and the server; the main motivations for implementing it are authentication and the protection of the privacy and integrity of the information sent over the Internet. The key points of this mechanism are the creation of a secure channel between the client and the server, the use of public/private keys for encryption and the use of certificates for trust. Because HTTPS is rather complex, its implementation on a web server is not trivial and several steps are required to configure it correctly. To summarize:

Creation of a private key for the web server, for encryption
Creation/acquisition of the certificate, for trust
Installation of the certificate on the web server
Implementation of the VirtualHost on port 443
Port forwarding, where necessary
Testing that everything has been configured correctly
For this project, three different SSL certificates have been installed on the web server for testing:

1. Self-signed
2. From StartSSL
3. From Comodo

The first one was created with the following command on the Linux machine:

sudo openssl req -x509 -nodes -days 365 -newkey rsa:2048 -keyout /etc/apache2/ssl/the_apache_key.key -out /etc/apache2/ssl/the_apache_crt.crt

After filling in some information in the form, the two files are generated on the machine. The second certificate can be obtained by logging in on the web site startssl.com, from which it is possible to get a free SSL certificate; the only requirement is to have a domain name for the web site. In this case it is necessary to create a private key by entering a password, and the key will be displayed on the page; for Apache, it has to be exported in decrypted form. After a few hours the certificate is delivered to the email account provided at registration, and all the required files can then be copied onto the web server. The way to obtain the third certificate is very similar to the previous one: a domain is also required, but the certificate is valid for just 90 days, after which it becomes necessary to pay. There are, however, some differences between these certificates: all of them can be used to send data between client and server in encrypted form, but not all of them can guarantee trust. The first one was generated directly on the machine and nobody else has certified it, so browsers display an alert box warning that it is not trusted, and attacks such as man-in-the-middle remain possible; this certificate is therefore useful only for tests on HTTPS encryption. The other two guarantee more security, since they are trusted by external companies. The only difference is that the second one is displayed correctly in the browser, so the green padlock is shown, but Java does not recognize it as trusted for security reasons; the third one works perfectly with both browsers and the Java environment, but it is not free. After acquiring the certificate, the next step is to enable port 443, the one used for HTTPS, on the web server.
In this case it is necessary to edit the file with the .conf extension in /etc/apache2/sites-available/ for the web site hosted on the server; the default one is called default-ssl.conf. After the VirtualHost block for port 80, a new VirtualHost for port 443 has to be added, with these lines:

<VirtualHost *:443>
    SSLEngine on
    SSLCertificateFile /etc/apache2/ssl/ssl.crt
    SSLCertificateKeyFile /etc/apache2/ssl/private.key
    SSLCertificateChainFile /etc/apache2/ssl/sub.class1.server.ca.pem
</VirtualHost>

Of course, the files with extensions .crt, .key and .pem must have the correct names. Also in this case it is necessary to configure port forwarding for port 443 on the router configuration page. Once all these tasks have been completed, it is possible to test that the Apache web server works correctly with HTTPS. Here are the results of the test: for the first certificate, the self-signed one, the browser displayed a warning, since nobody has certified it; for the other two certificates the trusted padlock was displayed, meaning that HTTPS encrypts the data and, since the certificate has been verified as trusted, it is almost certain that the data are exchanged safely with the correct server.
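As a quick reference, the SSL module has to be enabled before the port 443 virtual host can work, and the setup can be sanity-checked with apachectl and openssl; a minimal sketch (the s_client call is just one way to inspect the certificate actually served):

sudo a2enmod ssl
sudo apachectl configtest
sudo service apache2 restart
# From another machine: show the certificate presented on port 443
openssl s_client -connect www.fantacalciopizza.com:443 -servername www.fantacalciopizza.com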
3.3 Advanced configurations

In this section, some additional features available in Apache to improve the security of the web server will be discussed. By default, Apache lists all the files within a folder if there is no file named index in it; in that case, every file in that folder can be seen and downloaded, which of course reduces the security of the web site. Files with the .php extension are an exception, since they are executed by the server and only their output is sent, so no PHP source code can be downloaded; still, the fact that files which are not linked from the web site can be seen by everyone is clearly not good. To change this behaviour, it is necessary to open the main configuration file apache2.conf, located in /etc/apache2/, find the <Directory> section and replace it with:

<Directory /var/www/>
    Options FollowSymLinks
    AllowOverride None
    Require all granted
</Directory>

Directory listing is disabled because the Indexes option is not included in the Options directive: only FollowSymLinks is allowed. Another improvement concerns the standard "Not Found" page displayed by Apache: it is possible to replace that page with one customized by the user. To do this, it is necessary to add the following line to the VirtualHost section of the .conf file of the website:

ErrorDocument 404 "ERROR MESSAGE"

It is of course possible to point to an entire HTML file, which can be friendlier for the user, and the pages for all the other HTTP error codes can be customized in the same way. The last advanced configuration discussed concerns the redirection mechanism. If the HTTPS protocol is enabled on the web server, it is better to use it instead of HTTP for the private area and not let the user access that area over plain HTTP. For this reason it is a good idea to set up a redirection, to improve security for the user. In this case it is necessary to modify the .conf file of the web site and add the following line to the VirtualHost section for port 80:

Redirect permanent / https://domain_name

With this directive, all requests made to the web site with the standard HTTP protocol are redirected to secure HTTPS. It is of course possible to redirect only some directories, such as the one used for the login. With all these operations, the web server becomes more secure for both the clients and the administrator of the infrastructure.
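Bringing the last two points together, the redirect and the custom error page could be arranged across the two virtual hosts roughly as follows (custom_404.html is a hypothetical file under the document root):

<VirtualHost *:80>
    ServerName www.fantacalciopizza.com
    # Every plain-HTTP request is sent to the HTTPS site
    Redirect permanent / https://www.fantacalciopizza.com/
</VirtualHost>

<VirtualHost *:443>
    ServerName www.fantacalciopizza.com
    DocumentRoot /var/www/fantacalciopizza.com/public_html
    SSLEngine on
    SSLCertificateFile /etc/apache2/ssl/ssl.crt
    SSLCertificateKeyFile /etc/apache2/ssl/private.key
    # Customized page shown instead of the default 404 response
    ErrorDocument 404 /custom_404.html
</VirtualHost>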
Section 4: Cloud configuration

The next step in our project was to migrate the Apache/MySQL server and WordPress configuration to the cloud. For this task we used the services provided by Amazon Web Services. In the next paragraphs we explain the steps we followed to design our cloud-based architecture, which Amazon services we used and their features.

4.1 Launch of an Amazon EC2 Instance

Amazon Elastic Compute Cloud (Amazon EC2) is a web service that provides resizable compute capacity in the cloud. Its simple web service interface allows capacity to be obtained and configured with minimal friction and provides complete control over the computing resources. Amazon EC2 reduces the time required to obtain and boot new server instances to minutes, allowing capacity to be scaled quickly, both up and down, to follow variations in computing requirements. Moreover, it changes the economics of computing by allowing you to pay only for the capacity that is actually in use, and it provides the tools to build failure-resilient applications that avoid common failure scenarios. Like the Raspberry Pi, the Amazon EC2 instance will act as the web server, hosting an Apache web server, WordPress and a MySQL client installation. As we will examine in depth later, to guarantee the statelessness of each instance there is no MySQL server installation on the EC2 instance: the database is hosted on a separate service called RDS. To create the EC2 instance we access the EC2 section of the Amazon dashboard and click on "Launch Instance"; this opens a wizard that guides us through the configuration of our machine in the cloud:

1. Choose an Amazon Machine Image (AMI)
An Amazon Machine Image (AMI) provides the information required to launch an instance, which is a virtual server in the cloud; for our purposes it is a template for the root volume of the instance (i.e. an operating system, an application server and applications). We chose an Amazon Linux AMI (HVM) with an EBS-backed SSD. The default image includes the AWS command line tools, Python, Ruby, Perl and Java, and the repositories include Docker, PHP, MySQL, PostgreSQL and other packages. HVM AMIs are machines with a fully virtualized set of hardware, so this virtualization type provides the ability to run an operating system directly on top of a virtual machine without any modification, as if it were running on bare-metal hardware. The Amazon EC2
host system emulates some or all of the underlying hardware that is presented to the guest.

4.1 - Example of AMI registration and instance launch

2. Choose an Instance Type
Next we can select the hardware configuration of our instance. As the only one eligible for the free tier, we chose a t2.micro instance, which comes with 1 GB of RAM, 1 virtual processor with a 2.5 GHz clock frequency and an EBS-backed storage volume of 8 GB. This kind of instance can only be created inside a Virtual Private Cloud (VPC).

3. Select a Virtual Private Cloud (VPC)
Amazon Virtual Private Cloud lets you provision a logically isolated section of the cloud where AWS resources can be launched in a defined virtual network. This gives complete control over the virtual networking environment, such as the selection of the IP address range, the creation of subnets and the configuration of route tables and network gateways. For launching the EC2 instance we create a new VPC that will contain all the required infrastructure.

4. Configure Security Groups
A security group acts as a virtual firewall that controls the traffic for one or more instances. When we launch the instance we associate one or more security groups with it and add rules to each security group that allow traffic to or from its associated instances. We chose rules that allow us to access the instance in a secure way through SSH:
4.2 - Security groups configuration

5. Create a Key Pair
Amazon EC2 uses public key cryptography to encrypt and decrypt login information: a public key is used to encrypt a piece of data, such as a password, and the recipient uses the private key to decrypt it. The public and private keys are known as a key pair. To log in to the instance we must create a key pair, specify its name when launching the instance and provide the private key when connecting to it. Linux instances have no password, and the key pair is used to log in via SSH. After creating the key pair in .pem format we have to download the key for later use.

6. Recap and Launch
As the last step, Amazon shows a recap of all the information entered so far and asks whether we want to launch the EC2 instance with this configuration. In the instance dashboard we can then see our newly created instance:

4.3 - EC2 instance status and description
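The same launch could also be scripted; a rough AWS CLI equivalent of the wizard, with all IDs and names as placeholders, might look like:

# Create a key pair and save the private key locally
aws ec2 create-key-pair --key-name wordpress-key --query 'KeyMaterial' --output text > wordpress-key.pem

# Launch one t2.micro instance from the chosen Amazon Linux AMI inside the VPC subnet
aws ec2 run-instances \
    --image-id ami-xxxxxxxx \
    --instance-type t2.micro \
    --key-name wordpress-key \
    --security-group-ids sg-xxxxxxxx \
    --subnet-id subnet-xxxxxxxx \
    --count 1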
4.2 Install Required Software on the EC2 Instance

The AMI specification states that the instance already comes with a set of preinstalled software, but for setting up our WordPress installation we still need to install Apache and the MySQL client; we do not need the MySQL server because, as previously stated, the database will not be hosted on the EC2 instance but on a separate service. We can access the EC2 instance through PuTTY, a free implementation of Telnet and SSH for Windows and Unix platforms. PuTTY does not natively support the private key format (.pem) generated by Amazon EC2, but it ships with a tool named PuTTYgen which can convert keys to the required PuTTY format (.ppk). We therefore have to convert our private key into this format before attempting to connect to our instance using PuTTY:

4.4 - .ppk key generation through PuTTYgen
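On Linux or Mac OS X the conversion is not needed and the connection can be made directly with the OpenSSH client; a minimal sketch (the IP address is a placeholder):

chmod 400 wordpress-key.pem
ssh -i wordpress-key.pem ec2-user@<instance-public-ip>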
With our newly generated key we can connect via SSH to the EC2 instance, specifying the IP address of the instance and the key for authentication; the default user created on an Amazon Linux instance is ec2-user. We can now update all the packages installed on the instance and begin the installation of a LAMP web server. The method is no different from any traditional Linux package installation:

sudo yum update
sudo yum install -y httpd24 php56 mysql55 php56-mysqlnd

Lastly we have to install the WordPress package by downloading the archive from the official source and extracting it:

wget https://wordpress.org/latest.tar.gz
tar -xzf latest.tar.gz

As the rest of the configuration is identical to the traditional installation, we skip it here and refer to the previous sections.

4.3 Setup Amazon RDS for the WordPress DB

Amazon Relational Database Service (Amazon RDS) makes it easy to set up, operate and scale a relational database in the cloud. It provides cost-efficient and resizable capacity while managing time-consuming database administration tasks, freeing you up to focus on your application and business. As we want to exploit all the features of the cloud, we have to design our architecture to be stateless: this means that the data needed by the instances hosting the WordPress installation cannot be stored on the instances themselves. In particular, WordPress stores the users and the actual posted content in the database, which will instead be placed on the RDS service and linked to WordPress during configuration.
To create the RDS database we access the RDS section of the Amazon dashboard and click "Launch DB Instance", which starts the DB instance wizard:

1. Select Engine
Amazon RDS supports the MySQL, Oracle, SQL Server and PostgreSQL database engines. We select the MySQL engine.

2. Choose Multi-AZ deployment
Amazon RDS Multi-AZ deployments provide enhanced availability and durability for database (DB) instances, making them optimal for production database workloads. When provisioning a Multi-AZ DB instance, Amazon RDS automatically creates a primary DB instance and synchronously replicates the data to a standby instance in a different Availability Zone (AZ). Each AZ runs on its own physically distinct, independent infrastructure and is engineered to be highly reliable. In case of an infrastructure failure, Amazon RDS performs an automatic failover so that database operations can resume as soon as the failover is complete; since the endpoint of the DB instance remains the same after a failover, the application can resume database operations without any manual administrative intervention. Since ours is a rather basic implementation we did not exploit this feature; nonetheless it is worth saying that it could drastically improve the availability and durability of the DB, removing a configuration in which a single point of failure can result in the unavailability of the service.

3. Specify Database Details
To launch the DB, some specifications are required that define the setup of the database architecture in more detail. The instance specification defines the setup of the machine that will run the database (the engine version, the DB instance class defining the computational power of the machine, the storage type, ...); the settings contain the instance identifier and the user with root access privileges. In the network and security section we define where in the cloud the instance will be started and the associated security group: since we aim to define a robust architecture, the DB instance is created inside the same VPC as the EC2 instance where WordPress is hosted, but it is not accessible from any point outside the VPC. Moreover, the security group is configured to let the DB instance accept requests only from the WordPress instance, adding another layer of security to our architecture.
4.5 - Database settings

4. Recap and Launch
As the last step, Amazon shows a recap of all the information entered so far and asks whether we want to launch the RDS instance with this configuration. We can then see the newly created instance in the RDS dashboard:

4.6 - RDS instance status and description
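Once the instance is available, a quick way to confirm that the EC2 instance can actually reach the database is to connect with the MySQL client using the RDS endpoint; the endpoint, user and database name below are placeholders, and the same endpoint is what WordPress is pointed at as database host during its setup:

mysql -h mywordpressdb.xxxxxxxx.eu-west-1.rds.amazonaws.com -P 3306 -u wpuser -p wordpressdb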
4.4 Setup Elastic Load Balancing

Our next goal was to set up a simple infrastructure in which the number of EC2 instances in use is automatically increased when user demand goes up and decreased when demand goes down. As Auto Scaling dynamically adds and removes EC2 instances, we need to ensure that the traffic coming to the web application is distributed across all of the running EC2 instances. AWS provides the Elastic Load Balancing service to distribute the incoming web traffic (the load) automatically among all the running EC2 instances. Elastic Load Balancing uses load balancers to monitor traffic, handle the requests that come in through the Internet and route them among the EC2 instances of an Auto Scaling group. To use Elastic Load Balancing with an Auto Scaling group, you first create a load balancer and then register the Auto Scaling group with it; the load balancer then acts as a single point of contact for all incoming traffic. Load balancers can be found under the NETWORK & SECURITY section of the AWS Management Console; clicking "Create Load Balancer" starts a wizard for the configuration of the load balancer. In the first step of the wizard we defined:

the name of the load balancer;
the VPC, which was the same defined for the EC2 instance;
the listener configuration;
the subnets of the VPC for the Availability Zones.

Concerning the listener configuration, we decided to make the load balancer receive HTTP traffic on port 80 and forward it to port 80 of the EC2 instances. About the VPC subnets, we created three subnets of the same VPC, associated with three different Availability Zones. The incoming HTTP traffic will be routed by the load balancer to these subnets; by creating more than one subnet, and thus distributing the EC2 instances over different Availability Zones, we provide higher availability behind the load balancer. The next step of the wizard was the creation of an ad-hoc security group for the load balancer: a virtual firewall which only allows it to receive HTTP traffic from anywhere, where "anywhere" means that there is no IP address filtering on the incoming HTTP traffic.
The load balancer automatically performs health checks on the EC2 instances and routes traffic only to those which pass the health check test. The wizard also allowed us to specify the health check configuration: we were able to define the ping protocol, the ping port, the ping path and other advanced settings such as the response timeout, the health check interval and the unhealthy and healthy thresholds. We defined HTTP on port 80 with path /health_check.txt as the ping target; this means that the load balancer performs a GET for a file called health_check.txt in the web server root directory. Obviously, the objective is not to get the file itself but to verify the availability of the EC2 instance (a minimal way to create and test this file is sketched below). We did not add the previously configured EC2 instance to the load balancer, since we wanted to create an Auto Scaling group which launches the instances and then attach the group to the load balancer.

Before creating an Auto Scaling group, we had to create a launch configuration. A launch configuration is a template for the EC2 instances launched into an Auto Scaling group. The first thing to define was the AMI: since an AMI includes a template for the root volume (for example operating system, applications, web server, application server), creating an AMI from the EC2 instance on which we had installed Apache, PHP, the MySQL client and WordPress allowed us to include in the Auto Scaling group different machines with the same root volume template. The second thing to define was the instance type, which describes the hardware configuration of the instances in the Auto Scaling group; we chose t2.micro. The last thing defined for the launch configuration was the security group: we created a dedicated security group for the Auto Scaling group.

Having created the launch configuration for the EC2 instances, our next step was to create the Auto Scaling group itself. We assigned a launch configuration and a name to the Auto Scaling group, then defined the starting number of EC2 instances inside it and the VPC where the instances will be launched. In the advanced details, we assigned the previously created load balancer to the Auto Scaling group.
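The health check target referenced above is just a static file in the document root; a minimal sketch for creating and checking it on an instance (the /var/www/html path is the Amazon Linux httpd default and is assumed here):

echo "OK" | sudo tee /var/www/html/health_check.txt
# The load balancer expects an HTTP 200 for this path
curl -I http://localhost/health_check.txt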
In order to allow the number of EC2 instances to increase or decrease automatically according to a certain logic, we added scaling policies. A scaling policy is a set of instructions to add or remove a specific number of instances in response to an Amazon CloudWatch alarm assigned to it; when the alarm triggers, it executes the policy and adjusts the size of the group accordingly. We based our scaling policies on the NetworkOut metric, which measures the volume of outgoing network traffic of a single instance, expressed in bytes. For the DECREASE policy, we defined that one instance is removed if NetworkOut stays below 500,000 bytes for 60 seconds; for the INCREASE policy, that one instance is added if NetworkOut exceeds 1,000,000 bytes for 60 seconds. At this point, to access the website on the back-end instances, we paste the DNS name that the load balancer received by default into the address field of a web browser.

4.5 Testing Elastic Load Balancing through Apache JMeter

Having set up our scaled and load-balanced website, we exploited the load test functionality of a Java application called Apache JMeter. The first step in creating our load test plan was to add a Thread Group element, which tells JMeter the number of users to simulate, how often the users should send requests and how many requests they should send.
As can be seen from the figure, we decided to simulate 500 users. Since we set the Loop Count to 1, the number of users is also the number of requests to be sent. The Ramp-Up period controls how quickly the simulated users are started: we configured it so that a new user starts roughly every second, which means about 60 requests are issued per minute. In the second step we defined the task the simulated users had to perform: sending HTTP requests to the load balancer. We specified the default settings for the HTTP requests (name, server name, port number) and left the remaining fields at their default values; as server name, we specified the DNS name that the load balancer received by default. In the next step we created the HTTP Request element representing the request sent by a simulated user. The HTTP request method was already set to GET by default, so we only specified the name of the HTTP Request element and the path; as path, we specified the web server's root, /.
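For reference, once such a test plan is saved as a .jmx file, the same test can also be run from the command line in non-GUI mode (file names are placeholders):

jmeter -n -t elb_load_test.jmx -l results.jtl
# -n: non-GUI mode, -t: test plan to run, -l: file where sample results are logged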
After the load test started, we noticed through the AWS CloudWatch monitoring service that the number of instances within the Auto Scaling group scaled from one, which was the initial and minimum number of machines in the Auto Scaling group configuration, up to three. Since the size of the WordPress index page is roughly 30 KByte and JMeter issued 60 HTTP GET requests per minute, in the same time the load balancer sent out roughly 1.8 MByte. As described before, we set the NetworkOut policy threshold to 1 MByte per minute, so the scaling of the EC2 instances we observed was exactly what we expected. Once the load test finished, the number of instances progressively decreased back to one.

4.7 - CloudWatch showing how the number of EC2 instances scales
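For completeness, a scaling policy and alarm pair equivalent to the INCREASE rule driving this behaviour could also be created from the AWS CLI; a rough sketch, with group, policy and alarm names as placeholders:

# Simple scaling policy: add one instance to the group
aws autoscaling put-scaling-policy \
    --auto-scaling-group-name wordpress-asg \
    --policy-name increase-by-one \
    --adjustment-type ChangeInCapacity \
    --scaling-adjustment 1

# CloudWatch alarm that fires when average NetworkOut exceeds 1,000,000 bytes over 60 seconds
aws cloudwatch put-metric-alarm \
    --alarm-name wordpress-networkout-high \
    --namespace AWS/EC2 \
    --metric-name NetworkOut \
    --dimensions Name=AutoScalingGroupName,Value=wordpress-asg \
    --statistic Average \
    --period 60 \
    --evaluation-periods 1 \
    --threshold 1000000 \
    --comparison-operator GreaterThanThreshold \
    --alarm-actions <policy-arn-returned-by-the-previous-command>

The symmetric DECREASE policy would use a scaling adjustment of -1 and an alarm with the LessThanThreshold comparison on the same metric.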
4.6 Benefits of Auto Scaling

By implementing this kind of infrastructure, our website gained the following benefits:

Better fault tolerance. Auto Scaling can detect when an instance is unhealthy, terminate it, and launch a new instance to replace it.
Better availability. Using multiple Availability Zones, if one Availability Zone becomes unavailable, Auto Scaling can launch instances in another one to compensate.
Better cost management. Auto Scaling can dynamically increase and decrease capacity as needed. Because you pay for the EC2 instances you use, you save money by launching instances only when they are actually needed and terminating them when they are not.