CHAPTER 2 THEORETICAL FOUNDATION

2.1 Theoretical Foundation

Cloud computing has become one of the most prominent trends in today's computing world. To understand the concept of the cloud, one should first understand the concept of virtualization. The two are distinct technologies, yet closely related to one another. Therefore, in this chapter the author discusses cloud computing in detail, divided into three sections (characteristics, categories, and attributes), together with the main subject of this thesis: the multi-tenant environment and its risks. The author also explains the foundation of cloud computing, which is virtualization, including its three types, the virtual machine as its most common example, and the hypervisor as its main component. Finally, the author discusses Proxmox OS, which will be used to demonstrate the testing in this project. The author hopes that these explanations of the terminology used frequently throughout the thesis will help the readers understand the theories, concepts, and ideas behind them.

Cloud Computing

Cloud computing has recently become one of the most discussed topics in the technology world. Basically, cloud computing can be defined as an environment in which people use technology as they need it, for as long as they need it, and pay according to their usage, without installing anything on their own desktop. The cloud works only when there is an internet connection; therefore, a fast and reliable internet connection is a crucial element in using the cloud.

Figure 1 - The Cloud Computing Adoption Model

The figure above shows the cloud computing adoption model, which displays the levels of adoption from the general to the specific. The first level, Virtualization, is the basis of cloud computing. Virtualization is closely connected to the hypervisor, which is used to divide a server so that multiple OSs can be installed on it. Virtualization also coordinates applications so that they work properly across multiple virtual images. The second level is Cloud Experimentation: to succeed in the cloud computing world, people need knowledge of this field, which is gathered by experimenting in the cloud and thereby gaining experience. That experience and knowledge are needed to progress to the next level. The third level is Cloud Foundations, in which people establish the foundation for creating applications: a platform that will be used to manage the virtualized applications. This level is compulsory, so that a strong foundation exists before moving to the next level. The fourth level is Cloud Advancement, at which stage people should be ready to implement their own cloud or use a public cloud available on the internet. The applications virtualized at the previous level should also be implemented and ready to use. Finally, the last level of the cloud adoption model is Cloud Actualization: Hypercloud. At this level, cloud technology is used at its maximum performance, which can be defined as a fully dynamic and autonomic computing environment in which the application workload can be distributed equally among the cloud servers. The aim of this level is to gain a comparative cost advantage over the traditional model. In other words, cloud computing is an approach to delivering services, while virtualization is the service itself. Virtualization is essentially one physical computer virtualized into many computing environments, whereas cloud computing is many different computers pretending to be one computing environment. Therefore, the author concludes that cloud computing and virtualization are two different things, but they are closely related to each other.
Characteristics

Characteristics are what make something distinct from others. In the computing world, it is important to have well-defined characteristics so that a proposed technology can stand out in the market. Cloud computing has two characteristics that differentiate it from other technologies.

1. Interconnectivity of computer servers - the user is able to carry out a variety of tasks in different locations. As long as the user has an internet connection, access to the system is possible regardless of the user's location or where the system is hosted.

2. Outsourcing of a key element of the company's work - an IT department sometimes needs a huge budget to implement a good system in a company. One way to reduce the cost of developing an IT system is to adopt cloud computing. The system can be relocated to a cheaper environment, in which the cost of hardware, services, and other cloud-related resources is lower than in the current location. The budget saved by having a cloud computing system can then be allocated to another department that needs it.
Categories

Since the term cloud computing is so broad, a categorization is needed to divide its scope precisely. Cloud computing can be divided into three categories based on the service that the vendors provide: Software-as-a-Service (SaaS), Infrastructure-as-a-Service (IaaS), and Platform-as-a-Service (PaaS), as shown in the figure below.

Figure 2 - Cloud Service Models

Software-as-a-Service (SaaS)

SaaS can be defined as a web-based software deployment model that makes the software available through a web browser. In other words, the vendor rents out the software over the internet, so that people can use it and pay for what they have used. Users of this service do not need to worry about the cost of buying the application, operating system compatibility, or even the language in which the application is written. They can set those concerns aside and simply use the application in the cloud. Gmail, YahooMail, and Hotmail are examples of SaaS in the form of e-mail programs. People can access the mail servers of those companies without installing any software; all they need is a web browser and an internet connection. This differs from Microsoft Outlook or Apple Mail, which provide the same e-mail functionality but require users to install the software itself.

Infrastructure-as-a-Service (IaaS)

IaaS provides computing resources (e.g. servers, routers, processing power, network bandwidth, and storage) as a service. In other words, users can rent resources from IaaS vendors rather than buy them, which would cost much more. Customers of this service pay only for what they have used, no less and no more. By using IaaS, customers gain two benefits:

1. In terms of cost, since customers pay only for the resources they actually use or consume, the cost is lower than in traditional computing, where customers pay a fixed amount whether they use the resources or not.

2. In terms of elasticity, customers can control the amount of resources they are willing to use at any given time. Based on the customers' computing requirements and configuration, the IaaS provider can quickly scale the resources up or down. When traffic is high, customers can ask the provider to scale up the capacity to handle it, and when everything returns to normal, the capacity can be scaled back down to its original level.

GoGrid and ReliaCloud are examples of IaaS vendors. GoGrid is a company offering Windows and Linux cloud servers, elastic hardware load balancing (which divides internet traffic across two or more web servers), and cloud storage. ReliaCloud, on the other hand, offers cloud servers and storage resources.

Platform-as-a-Service (PaaS)

The term platform means a base for creating something. In this case, PaaS vendors provide the customer with the infrastructure and a complete operational and development environment for deploying applications. With this service, customers no longer need to worry about how their data centers should be housed, what hardware is needed to build a data center, or the high cost of electricity for running a huge data center along with the systems that keep it cool and prevent overheating. However, there is a downside to using PaaS: the user needs to write code in the vendor's specified language. The best-known PaaS offerings are Google App Engine and Windows Azure. To use Google App Engine, users need to write their applications in Python within Google's development frameworks. To create an application on Windows Azure, users need to write it in a Microsoft-based programming language, i.e. .NET, or a language supported by Microsoft Visual Studio, i.e. Java, PHP, and JSP.

Attributes

There are five attributes of cloud computing by which people can see and analyze how well a cloud service fits the cloud computing model.
Service Based

A cloud service provided by a vendor can be categorized as off-the-shelf, because the service is created to fulfill the specific needs of the customers, and the technologies are therefore designed to fulfill those needs.

Scalable and Elastic

The service is scalable and elastic because its capacity can be scaled up or down as the customer demands. Different times bring different traffic. In busy hours, for example, traffic is much higher than in usual hours, so customers can decide to scale up the capacity of the resources they use to handle the traffic going into the server. In usual hours, the capacity can be scaled down, so customers do not pay for resources that are not being used.

Shared

The software, infrastructure, or platforms offered by the vendors are shared among the customers of the service. Unused resources can therefore serve multiple needs for multiple customers simultaneously.

Metered by Use

The services provided by the vendors are tracked with usage metrics to enable multiple payment models. One example is the pay-as-you-go plan, in which customers pay only based on their usage (the amount of service used, the hours of usage, and the amount of data transferred), not on the cost of the equipment.

Uses Internet Technologies

There would be no cloud computing without an internet connection. The internet is the crucial factor that allows cloud computing to be established; without an internet connection, there is no way a user can access the system.

Multi-tenant Environment

In a cloud environment, vendors share software, infrastructure, and platforms among multiple customers over internet connectivity. The physical server is still one machine, but it provides services to multiple customers; this is what is called a multi-tenant environment. In IaaS, customers or tenants share infrastructure resources (e.g. servers, hardware, and storage), while in SaaS, tenants use the same application, which means that multiple tenants will most likely store their data in the same database. Therefore, security and complete isolation have become the major concerns for the multi-tenant environment. There is a risk that tenants might pose a threat to one another within the same vendor, and this is a very critical flaw of the multi-tenant environment.
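As a rough illustration, the metered, pay-as-you-go attribute described above can be sketched as a simple usage-based charge calculation. This is only a sketch; the rates, metric names, and usage figures below are hypothetical, not those of any real provider:

```python
# Minimal sketch of pay-as-you-go metering: the customer is charged
# per unit actually consumed, not for the equipment itself.
# All rates and usage figures below are hypothetical examples.

RATES = {
    "cpu_hours": 0.05,   # price per CPU-hour
    "storage_gb": 0.02,  # price per GB stored
    "transfer_gb": 0.01, # price per GB transferred
}

def invoice(usage):
    """Compute the charge from metered usage; unknown metrics are rejected."""
    total = 0.0
    for metric, amount in usage.items():
        if metric not in RATES:
            raise ValueError(f"unmetered resource: {metric}")
        total += RATES[metric] * amount
    return round(total, 2)

# A tenant that used 100 CPU-hours, 50 GB of storage, and 10 GB of transfer:
print(invoice({"cpu_hours": 100, "storage_gb": 50, "transfer_gb": 10}))
# expected: 6.1
```

Scaling capacity down, as described under Scalable and Elastic, simply shrinks the metered amounts, and the charge shrinks with them.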
2.1.3 Virtualization or Virtual Technology

As the author has mentioned above, virtualization acts as the basis for cloud computing; therefore, to understand the concept of cloud computing, the reader should also understand the concept of virtualization. Virtualization can be defined as dissociating the tight bond between software and hardware. In the traditional world of technology, each server operating system requires its own server. In other words, there are many servers to manage, owing to the variety of server OSs available in the market. This condition has negative effects: it consumes more electrical power, wastes resources, and, once a server reaches its maximum capacity, the owner must either scale it up (more memory or processors) or scale it out (more servers). With the discovery of virtualization technology, the ability to dissociate software from hardware means that people can run different software servers on the same hardware. The idea of this technology is to allow a multiplicity of access points from a single outlet, where the outlet can be a physical server, memory, or a computer itself. Virtualization has one crucial element for gaining control of the overall system: a central command unit that supervises the activities carried out on the outpost computers. Once the central unit finds a misbehaving computer, it is detected and the supervisor can check and fix the problem.
Virtualization can be categorized into three types based on the object being virtualized: hardware or server, desktop, and storage virtualization.

Hardware or Server Virtualization

This type of virtualization is the most common, and is widely used in company IT departments. The virtualization takes place on the server hardware, partitioning the physical server into any number of virtual servers that run different OSs within their allocated memory, CPU, and disk footprints. VMware, Microsoft, and Citrix are three established companies with expertise in server virtualization. There are several advantages to server virtualization.

1. In terms of hardware utilization, the cost of buying new hardware/servers can be reduced, since all the different types of OS can run on a single physical server, and energy is saved by running one server instead of several.

2. In terms of security, clean images can be used to restore the current system, and virtual machines provide sandboxing (a technique for creating confined execution environments that can be used to run untrusted programs) and isolation to limit attacks from outside parties.
Therefore, virtualization makes the system much safer, because it runs in complete isolation; even when the system has been infected by malware, users can restore it to the latest image they have and it will return to normal.

3. In terms of development, debugging and performance-monitoring scenarios can easily be handled with images that can be used to rewind the process if anything goes wrong. It is far more convenient to rewind using an image than to hunt down and fix the problem, which takes much more time.

There are also some disadvantages to this type of virtualization.

1. In terms of administration, even though the number of physical servers is smaller, the number of virtual machines can be large. The administrator therefore needs the ability to deal with many OSs during setup and maintenance, which requires more training and education to run this virtualization smoothly.

2. In terms of licensing, running three copies of Windows on a single machine may require three separate licenses, because much software licensing does not take virtualization into account. So even though users can divide a single machine into many OSs, they still need to pay for the OS licenses, which is expensive. If the OS is open source, users should be aware that it may not be safe and might be attacked by hackers.

3. In terms of performance, powerful processors should be installed in the machines hosting the virtual machines in order to realize virtualization's full potential. Ordinary processors, as used in other servers, will work, but will not expose that potential, because the system will run more slowly than normal, since the processor is divided among several OSs.

Desktop Virtualization

These days, the rapid growth of the business world demands a quick and efficient workplace for employees. This is essential, because the more productive an employee is, the more profit the company makes. This is where desktop virtualization comes in handy. Desktop virtualization is the act of separating the different computing layers and storing some or all of them in a data center. This technology allows the administrator to store applications centrally and stream them to a desktop based on user access. The workflow becomes much more efficient and productive than in the traditional setup, where every computer is a stand-alone PC.
There are some advantages to desktop virtualization.

1. In terms of flexibility, users of this type of virtualization gain a huge advantage: they can access their own virtualized desktop from anywhere, via LAN or WAN, at any time. If, for example, a user is in the middle of a meeting and has forgotten one important piece of data, then rather than leaving the meeting room to fetch it on a flash disk, the user can connect to his or her desktop environment and stream the data to the current computer.

2. In terms of security, since the system is centralized, all security applications, i.e. antivirus, firewall, and intrusion detection/prevention systems, can be applied at the data center and easily managed by the administrator. This is much easier than installing and maintaining security applications on each and every computer the company owns.

There are also disadvantages to desktop virtualization.

1. In terms of licensing, the company still needs to buy OS licenses for each and every user, so desktop virtualization brings no reduction in software licensing costs.

2. In terms of cost, the company needs to buy the desktop virtualization software, servers (if it does not already have them), a centralized storage infrastructure, and network bandwidth upgrades so that communication traffic runs smoothly. All of this costs a great deal of money, despite the fact that it will help the company be more productive than before.

Storage Virtualization

Storage has become an important part of the computer system. The main purpose of a storage device is to save data or files for later use, and to provide space for the applications and software installed on the computer. Storage virtualization can be defined as the act of hiding the complexity of the internal functions of a storage service or device from the applications. Rather than exposing different layers of storage devices, i.e. disks, tapes, and optical devices, the virtualization hides these layers and presents them as one group of storage, so that when applications read from or write to storage, they read from or write to a single pool.

Hypervisor

The main component that makes virtualization happen is called the hypervisor, also known as a Virtual Machine Manager (VMM). A hypervisor is a hardware virtualization technique that allows multiple guest OSs to run on a single host system simultaneously. The hypervisor allocates the resources of the host computer to be shared among the guest OSs according to each OS's requirements. The hypervisor therefore plays a big role in virtualization technology: without its resource allocation, the guest OSs would not be able to operate, because they would have insufficient resources.

Virtual Machine

A virtual machine (VM) is an isolated software environment that can run its own operating system and applications inside the host computer. A VM acts like a physical computer, with its own CPU, RAM, hard disk, and network interface card (NIC), but it is actually software installed on a host computer. The VM behaves as though it were a real computer, because the operating system cannot tell the difference between a VM and a physical machine. Implementing virtual machines brings several benefits.

1. In terms of isolation, even though VMs run inside a single computer, they remain completely isolated from the host system, as if they were separate physical machines. When one virtual machine fails, it does not affect the other virtual machines or the host computer.

2. In terms of ease of testing, a VM is portable and easy to manage. VM software has a feature called the snapshot, which allows the user to save the current image of the system. Before installing a new application, it is therefore wise to take a snapshot, because if the application causes an error that affects the whole system, the user can simply roll back to the last snapshot taken.

Proxmox VE (Virtual Environment) OS

Proxmox is an open-source operating system created by Proxmox Server Solutions GmbH. Basically, Proxmox is used as a platform for running virtual machines. It may look similar to other virtual machine platforms, i.e. VMware; however, it has some features that other vendors do not. The features are as follows.

Container and Full Virtualization

This feature provides maximum flexibility for users of the OS, in terms of the performance and usability of the OSs that will be installed inside Proxmox. There are two categories of this feature, as follows.

Container Virtualization (OpenVZ)

OpenVZ is able to create multiple secure and isolated containers. Each container performs just like a traditional server, meaning that it can be rebooted and has root access, IP addresses, memory, configuration files, and applications. This category is most suitable for running Linux servers, i.e. Ubuntu, Debian, and Fedora.
Full Virtualization (KVM)

KVM stands for Kernel-based Virtual Machine. It is a full virtualization solution for x86 hardware with virtualization extensions, i.e. Intel VT or AMD-V CPUs. Each KVM guest has its own private virtualized hardware, for example a network card, hard disk, memory, and graphics adapter. KVM is suitable for installing Windows operating systems.

Central Web-based Management

To manage all the servers installed in Proxmox, the admin can use a single web-based application. There is no need for the admin to open a different application for each server just to monitor or configure it; the admin can simply open a web browser on a computer and type in the IP addresses of the virtual machines. The communication between the admin's computer and the virtual machines goes over SSL (Secure Socket Layer) encryption, so the communication line is secure.

Cluster and Live Migration

Clustering in Proxmox means that the admin can install as many servers as he or she likes, represented as nodes, and join them into one cluster with one master node while the others become child nodes. Control and configuration are fully under the master node, so the admin has no difficulty managing multiple servers at once. Another important feature in Proxmox is live migration, which is used to move a virtual machine from one node to another. Migration is important for keeping every server running smoothly, meaning that hard disk and memory are not fully occupied and the server's workload is not overloaded. There are two types of migration, as follows:

Live/Online Migration: the migration takes place while the virtual machine is still running.

Offline Migration: the migration takes place after the virtual machine has been turned off.

System Implementation

To make the multi-tenant architecture secure, with each tenant isolated from the others, some key components need to be addressed: access policies, application deployment, and data access and protection.
Access Policies

To maintain data confidentiality and isolation within a multi-tenant architecture, a policy is needed specifying which volumes each tenant can, and may only, access. Otherwise, every tenant on that physical machine might be able to access the other tenants' data.

Figure 3 - User Access Policies
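As a rough illustration, such a per-tenant volume policy can be sketched as a simple allow-list check. The tenant and volume names below are hypothetical, and a real cloud platform would enforce this at the storage layer rather than in application code:

```python
# Minimal sketch of a tenant-to-volume access policy: each tenant may
# only touch the volumes explicitly assigned to it.
# Tenant and volume names are hypothetical.

ACCESS_POLICY = {
    "tenant1": {"vol-a"},
    "tenant2": {"vol-b", "vol-c"},
}

def can_access(tenant, volume):
    """Return True only if the policy explicitly grants this volume."""
    return volume in ACCESS_POLICY.get(tenant, set())

# tenant1 may read its own volume, but not tenant2's:
print(can_access("tenant1", "vol-a"))   # expected: True
print(can_access("tenant1", "vol-b"))   # expected: False
```

The default-deny behavior (an unknown tenant gets an empty set) is the important design choice: anything not explicitly allowed is refused.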
Application Deployment

Fully Isolated Business Logic

In this approach, each tenant is given a single physical server for its own purposes. Each tenant thus has its own isolated environment, simply separated from the other tenants' servers.

Figure 4 - Fully Isolated Business Logic

Virtualized Application Servers

In this type of application deployment, the tenants share a single application server, but each tenant is given its own virtual machine. Every process therefore takes place inside that particular virtual machine, under the same application server. Because the application server is shared, the multi-tenancy problem may occur: there is a risk that data being processed can be swapped between tenants, because all processes run inside the same server.

Figure 5 - Virtualized Application Servers

Shared Virtual Servers

This type of application deployment differs slightly from the previous one, Virtualized Application Servers. The only difference is that here a single virtual machine is up and running to be used by all of the tenants. With this type of deployment, the multi-tenancy problem can occur in the virtual machine and/or the application server, similarly to the above. Since everything runs in the same virtual machine, there is a risk that the interface a tenant sees does not belong to that tenant, or even a risk of login problems, for example when Tenant 1 logs into the system but the page of Tenant 2 suddenly appears on the screen. This can happen because of an isolation failure or a software bug.

Figure 6 - Shared Virtual Servers

Shared Application Servers

The last application deployment uses one shared application server for all of the tenants. All tenants use only one application server and execute their own programs in different sessions or threads. Each tenant is given a unique session ID that uniquely identifies the tenant inside the application server. The mechanism is similar to accessing a website: when Tenant 1 wants to log into the system, the server gives the tenant session SID01, and then identifies Tenant 1 by that session SID01.
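The session mechanism described above can be sketched roughly as follows. The session-ID format and tenant names are hypothetical; real application servers generate and validate session IDs with considerably more machinery:

```python
# Minimal sketch of per-tenant sessions on a shared application server:
# each login is issued a unique session ID, and later requests are mapped
# back to the owning tenant through that ID. Details are hypothetical.
import secrets

sessions = {}  # session ID -> tenant name

def login(tenant):
    """Issue a fresh, hard-to-guess session ID for the tenant."""
    sid = "SID" + secrets.token_hex(8)
    sessions[sid] = tenant
    return sid

def tenant_for(sid):
    """Resolve a request's session ID back to its tenant, or None."""
    return sessions.get(sid)

sid1 = login("Tenant 1")
sid2 = login("Tenant 2")
print(tenant_for(sid1))   # expected: Tenant 1
print(sid1 != sid2)       # expected: True (the two IDs are distinct)
```

The login mix-up described above corresponds to `tenant_for` returning the wrong tenant for a session ID, which is exactly the mapping a shared application server must keep correct.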
Figure 7 - Shared Application Servers

Data Access and Protection

All of the data in a cloud computing system is saved in a database, and the structure of the database implementation may vary. There are three ways to manage the data in a multi-tenant environment, as shown in the figure below.

Figure 8 - Three Approaches to Managing Multi-Tenant Data
Separate Database

Having a separate database for each and every tenant is the best practice for implementing complete isolation. Even though resources are shared among the tenants, each tenant has his or her own area, isolated from the others. It is therefore the most secure database implementation of the three approaches. However, its main drawback is the high hardware and maintenance costs required to keep the system up and running.

Figure 9 - Separate Database

Shared Database, Separate Schemas

Another approach is to have the same database for all of the tenants, but a different schema for each tenant. A schema here means the set of tables used by the tenant. The main drawback of this architecture lies in the data recovery process: when there is a failure in the system, restoring the database involves all of the tenants' data, regardless of whether all of the tenants have lost their data or just one tenant has. By comparison, with the separate database approach the recovery process takes place only on that particular tenant's database, rather than on everyone's.

Figure 10 - Shared Database, Separate Schema

Shared Database, Shared Schema

The final approach uses the same database and the same schema for all of the tenants in the cloud. To differentiate among the tenants, a new column is added to each table: the Tenant ID, a unique ID indicating that a row belongs to a particular tenant. This approach has the lowest hardware cost, because it uses only one database. In contrast, security is the main issue for this approach: since the data is saved in one schema inside one database, there is no complete isolation between tenants.

Figure 11 - Shared Database, Shared Schema
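The shared-database, shared-schema approach can be sketched with an in-memory SQLite database. The table, column, and tenant names below are hypothetical; the point is that every tenant's rows live in one table, so isolation depends entirely on each query filtering by the Tenant ID column:

```python
# Minimal sketch of the shared-database, shared-schema approach:
# one table holds every tenant's rows, discriminated by a tenant_id
# column. Table and tenant names are hypothetical.
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE documents (tenant_id TEXT, name TEXT)")
db.executemany(
    "INSERT INTO documents VALUES (?, ?)",
    [("T1", "invoice.pdf"), ("T1", "report.doc"), ("T2", "secret.txt")],
)

def documents_for(tenant_id):
    """Return only this tenant's rows; omitting the WHERE clause
    would silently expose every other tenant's data."""
    rows = db.execute(
        "SELECT name FROM documents WHERE tenant_id = ? ORDER BY name",
        (tenant_id,),
    )
    return [name for (name,) in rows]

print(documents_for("T1"))   # expected: ['invoice.pdf', 'report.doc']
print(documents_for("T2"))   # expected: ['secret.txt']
```

This makes the security trade-off concrete: a single forgotten WHERE filter, or a bug in building it, is enough to leak one tenant's data to another, which is exactly the isolation failure this approach risks.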