Alfresco Enterprise on Azure: Reference Architecture Page 1 of 14
Abstract

Microsoft Azure provides a set of services for deploying critical enterprise workloads on its highly reliable cloud platform. Alfresco is an enterprise content management (ECM) system featuring document management, web content management, collaboration management, records management, and image management. This paper provides policy-makers and system administrators with specific technical guidance on how to configure, deploy, and run an Alfresco server on Azure. It outlines a reference architecture for an Alfresco deployment that addresses scalability, availability, and security requirements. It also includes an implementation guide and describes the Azure services that can be used to quickly create a working Alfresco deployment on Azure.

Introduction

Enterprises need to grow and manage their global computing platform rapidly and efficiently, while simultaneously optimizing and managing capital costs and expenses. The computing and storage services from Azure meet this need by providing a global computing platform, as well as services that simplify managing the platform, storage, and database. With the Azure platform, companies can rapidly provision compute capacity, or quickly and flexibly extend an existing on-premises platform into the cloud.

Alfresco is an enterprise content management (ECM) platform for organizations interested in managing business-critical processes related to document management, collaboration, and secure mobile and desktop access to vital files. The flexible compute, storage, and database services that Azure offers make it an ideal platform on which to run an Alfresco deployment.
Alfresco Enterprise Reference Architecture

While Alfresco supports a wide variety of content management use cases (including documents, records, web publishing, and more), this whitepaper presents a single common core configuration that you can adapt to virtually any scenario. The reference architecture described in this paper maps Azure services to all of the components required by an Alfresco service. This paper also includes some information on using an Azure automation template to install and configure an Alfresco cluster, which can be performed in approximately one hour. For a full detailed walkthrough of the security groups, policies, and configuration file modifications used, please see the Azure Automation template.

A typical Alfresco cluster requires the following components:

- An HTTP(S) load balancer
- Two or more Alfresco servers
- Shared file storage
- A shared database

We can run each of these components on Azure VMs. However, we recommend using the other Azure services that correspond to the Alfresco requirements, which simplifies administration and will probably lower your overall costs. Here are the Azure services that correspond to the Alfresco requirements and that we use in this paper:

- The Load Balancing service provides HTTP and HTTPS load balancing across the Alfresco servers.
- Azure provides auto scaling, with which your Alfresco cluster can add or remove servers based on their use, providing additional servers during peak hours and lowering costs by removing servers during off hours. This functionality is tightly integrated with the Load Balancing service.
- Azure blob storage provides shared file storage for the cluster. Blob storage is an ideal storage system for Alfresco for several reasons:
  - It is highly durable object storage designed to provide 11 9s (99.999999999%) of durability, which means you no longer need to manage backups of your content store.
  - Alfresco stores items as objects. Changes to objects are stored as unique objects rather than as updates to existing objects. This makes blob storage a perfect storage system.
  - Blob storage provides virtually unlimited scalability with support for an unlimited number of objects up to 100 GB in size, and customers only pay for what they use. This greatly simplifies sizing your environment, because you don't need to worry about how much space your cluster will need in the future, and your storage costs map directly to the amount of storage that you use.
- Azure SQL/MS SQL is used as a shared database. Initially, MS SQL on a VM is supported. Azure SQL is a managed database service in which all the administrative tasks for managing the database are handled by Azure. The database is deployed in multiple Availability Zones for high availability and automatically backed up on a schedule that we can define. This option becomes available once Alfresco Enterprise supports Azure SQL (the CloudlyIO team is actively working on this support with the Alfresco team).

Architecture Overview

Before we begin working with Alfresco, it's a good idea to familiarize ourselves with regions, Availability Zones, and endpoints, which are components of the Azure platform.

Fault Tolerance

As far as fault tolerance is concerned, there are some basic differences between AWS and Azure.

Figure 1: Fault Tolerance Comparison between AWS and Azure

Load Balancer

Microsoft Azure offers load balancing services for virtual machines (IaaS) and cloud services (PaaS) hosted in the Microsoft Azure cloud. Load balancing allows our Alfresco deployment to scale and provides resiliency to application failures, among other benefits.

The load balancing services can be accessed by specifying input endpoints on our services, either via the Microsoft Azure Portal or via the service model of our application. Once a hosted service with one or more input endpoints is deployed in Microsoft Azure, it automatically configures the load balancing services offered by the Microsoft Azure platform. To get the benefit of resiliency and redundancy for our services, we need to have at least two virtual machines serving the same endpoint.

The following diagram is an example of an application hosted in Microsoft Azure that uses load balancing services to direct incoming traffic (on address/port 1.2.3.4:80) to three virtual machines, all listening on port 80.
Figure 2: Alfresco Enterprise Reference Architecture

Endpoints

All virtual machines that we create in Azure can automatically communicate using a private network channel with other virtual machines in the same cloud service or virtual network. However, other resources on the Internet or other virtual networks require endpoints to handle the inbound network traffic to the virtual machine.

When we create a virtual machine in the Management Portal, we can create these endpoints, such as for Remote Desktop, Windows PowerShell scripting, or Secure Shell (SSH). After creating the virtual machine, we can create additional endpoints as needed. We also can manage incoming traffic to the public port by configuring rules for the network access control list (ACL) of the endpoint.
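To make the endpoint and ACL concepts concrete, the following is a minimal Python sketch of how ordered ACL rules might filter traffic to an endpoint's public port. It is an illustration only, not the Azure API: the `Endpoint` and `AclRule` classes, their field names, and the `is_allowed` function are hypothetical. The evaluation order (lowest rule number first) and the fall-through behavior (deny unmatched traffic only when the ACL contains at least one permit rule) are assumptions of this sketch and should be checked against the Azure documentation.

```python
from dataclasses import dataclass, field
from ipaddress import ip_address, ip_network


@dataclass(frozen=True)
class AclRule:
    """One hypothetical ACL rule attached to an endpoint."""
    order: int          # assumption: lower numbers are evaluated first
    remote_subnet: str  # CIDR range the rule applies to
    permit: bool        # True = permit, False = deny


@dataclass
class Endpoint:
    """Hypothetical model of a VM endpoint with a public and a private port."""
    name: str
    protocol: str       # "tcp" or "udp"
    public_port: int    # port exposed through the load balancer
    private_port: int   # port the VM listens on
    acl: list = field(default_factory=list)


def is_allowed(endpoint: Endpoint, source_ip: str) -> bool:
    """Return True if traffic from source_ip may reach the public port."""
    addr = ip_address(source_ip)
    # The first matching rule, in ascending order, decides the outcome.
    for rule in sorted(endpoint.acl, key=lambda r: r.order):
        if addr in ip_network(rule.remote_subnet):
            return rule.permit
    # Assumption: unmatched traffic is denied only if some permit rule exists;
    # an endpoint with no ACL (or only deny rules) lets unmatched traffic in.
    return not any(rule.permit for rule in endpoint.acl)


http = Endpoint(
    name="HTTP", protocol="tcp", public_port=80, private_port=80,
    acl=[AclRule(100, "10.0.0.0/8", True), AclRule(200, "0.0.0.0/0", False)],
)
print(is_allowed(http, "10.1.2.3"))   # traffic from the permitted subnet
print(is_allowed(http, "8.8.8.8"))    # traffic caught by the deny rule
```

Modeling the ACL as an ordered list keeps the decision for any source address traceable to a single rule, which is the property that makes such lists easy to audit.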
Each endpoint has a public port and a private port. The private port is used internally by the virtual machine to listen for traffic on that endpoint. The public port is used by the Azure load balancer to communicate with the virtual machine from external resources. After creating an endpoint, we can use the network access control list (ACL) to define rules that help isolate and control the incoming traffic on the public port.

Default values for the ports and protocol for these endpoints are provided when the endpoints are created through the Management Portal. For all other endpoints, we specify the ports and protocol when we create the endpoint. Resources can connect to an endpoint by using either the TCP or UDP protocol. HTTP and HTTPS communication runs over TCP.

Important: Firewall configuration is done automatically for ports associated with Remote Desktop and Secure Shell (SSH), and in most cases for Windows PowerShell scripting. For ports specified for all other endpoints, no configuration is done automatically to the firewall in the guest operating system. When we create an endpoint, we'll need to configure the appropriate ports in the firewall to allow the traffic we intend to route through the endpoint.

Auto Scaling

One of the key benefits that the Microsoft Azure platform delivers is the ability to rapidly scale our Alfresco deployment in the cloud in response to changes in demand.

When we deploy Alfresco to Microsoft Azure, we deploy roles: web roles for the externally facing portions of our application and worker roles to handle back-end processing. When we run Alfresco in Microsoft Azure, our roles run as role instances (we can think of role instances as virtual machines). We can specify how many role instances we want for each of our roles. The more instances we have, the more computing power we have available for that role, but more instances also cost more.
There are, of course, some specific design requirements if our roles are to operate correctly when there are multiple instances of that role, but Microsoft Azure looks after the infrastructure requirements for us.

Scaling Alfresco on Azure

The Autoscaling Application Block allows us to define how Alfresco can automatically handle changes in the load levels that it might experience over time. It helps us minimize our operational costs while still providing excellent performance and availability to our users. It also helps to reduce the number of manual tasks that our operators must perform.

The application block works through a collection of user-defined rules, which control when and how Alfresco should respond when the load varies. Rules are either constraint rules that set limits on the minimum and maximum number of role instances in Alfresco, or reactive rules that adjust the current number of role instances based on counters or measurements that we collect from Alfresco.

The Autoscaling Application Block supports the following techniques for handling varying load levels:

- VM Scaling: The Autoscaling Application Block varies the number of VMs to accommodate variations in the load on Alfresco.
- Throttling: The Autoscaling Application Block limits or disables certain (relatively) expensive operations in Alfresco when the load is above certain thresholds.

These two autoscaling techniques are not mutually exclusive, and we can use both to implement a hybrid autoscaling solution in Alfresco.

For load balancing, both virtual machines need to be under the same cloud service. Only then do we get the option to load balance them. On Azure, we have a load balancer out of the box. We don't need to pay extra for it or activate it. As soon as we have more than one instance of a specific role, the load balancer starts distributing traffic across the instances.
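The interaction between constraint rules and reactive rules described in this section can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the Autoscaling Application Block's actual API or rule format: the class names, the `cpu_percent` metric name, the thresholds, and the `target_instances` function are all hypothetical. The sketch assumes that every reactive rule whose condition holds contributes its change, and that the constraint rule always has the final say. Throttling is not modeled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConstraintRule:
    """Hypothetical constraint rule: hard limits on the instance count."""
    minimum: int
    maximum: int


@dataclass(frozen=True)
class ReactiveRule:
    """Hypothetical reactive rule driven by a collected measurement."""
    metric: str       # name of the counter, e.g. "cpu_percent" (assumed)
    threshold: float
    above: bool       # True: fires when value > threshold, else when value < threshold
    change: int       # instances to add (positive) or remove (negative)

    def fires(self, metrics: dict) -> bool:
        value = metrics.get(self.metric)
        if value is None:
            return False  # no measurement available, so the rule stays silent
        return value > self.threshold if self.above else value < self.threshold


def target_instances(current: int, metrics: dict,
                     constraint: ConstraintRule, reactive_rules: list) -> int:
    """Apply every firing reactive rule, then clamp to the constraint limits."""
    desired = current
    for rule in reactive_rules:
        if rule.fires(metrics):
            desired += rule.change
    return max(constraint.minimum, min(constraint.maximum, desired))


limits = ConstraintRule(minimum=2, maximum=6)
rules = [
    ReactiveRule("cpu_percent", 80.0, above=True, change=1),    # scale out under load
    ReactiveRule("cpu_percent", 30.0, above=False, change=-1),  # scale in when idle
]
print(target_instances(2, {"cpu_percent": 90.0}, limits, rules))
print(target_instances(2, {"cpu_percent": 10.0}, limits, rules))
```

Clamping after the reactive rules run means a misconfigured reactive rule can never push the cluster below its minimum or above its budgeted maximum, which is the purpose constraint rules serve in the text above.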
Use the Azure Automation Template to Deploy an Alfresco Cluster

This section explains the rationale behind the design of the architecture, and describes the steps that the Azure Automation template performs when it creates the infrastructure and configures the Alfresco servers. The Automation template performs three main tasks:

1. Creating the VMs
2. Installing Alfresco and modifying configuration files
3. Configuring Traffic Manager and the Auto Scaling service

Creating the Infrastructure

First, we create a new VM environment for the deployment. When choosing the OS template, we select the Alfresco image created earlier. We must select names for the VMs (e.g. Alfresco0 and Alfresco00). For the VM size, we select A2 (2 cores, 3.5 GB memory). A username and password should be selected for logging into the VM.

Next, we should create a cloud service. It is important that both VMs are in the same cloud service for Traffic Manager and scaling.

Figure 3: Alfresco Cloud Service Configuration
We should select the same region for both of the VMs. If we select the same cloud service for both of the VMs, by default the VMs will be deployed to the same region.

Figure 4: Region, Affinity Group, Virtual Network

Now we have to create and select a single storage account for both of the VMs.

Figure 5: Blob Storage

The availability set is the next option to set before an endpoint can be configured.

Figure 6: Availability Set

Finally, an endpoint can be configured.
Figure 7: Endpoint

Public and private port 80 should be added to the endpoints for HTTP (TCP protocol). We also have to enable the VM agent on the VMs for further accessibility of the Alfresco servers.

Figure 8: VM, VM Agent
Configure the Database

For the first release of Alfresco Enterprise on Azure, Microsoft SQL Server 2008 will be supported. We will be deploying both master and slave databases from a VM image where the database is pre-configured.

Install Alfresco

The setup wizard for Microsoft Windows installs all the software and components that we require for running Alfresco. This setup wizard installs Alfresco and additional software, including a Tomcat application server, PostgreSQL database, SWFTools, LibreOffice, and ImageMagick. We will deselect PostgreSQL from the components in the Advanced Setup Wizard because we want to use Microsoft SQL Server.
Configure Auto Scaling

Auto scaling is enabled but remains inactive until the schedule is set up. When auto scaling VMs, it is important that you proactively provision the maximum number of VMs potentially needed to handle your peak capacity, then add them to the same availability set. For example, if on the busiest day of the week it takes six machines to handle all of your traffic, you will need to create six instances. From there, you install your application on those instances, configure them to handle traffic, and then add each of them to an availability set with the other five machines. For Alfresco, initially we are using two VMs.

Figure 9: Auto Scaling Scheduling
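The relationship between the pre-provisioned pool and the schedule can be sketched as follows. This is a hypothetical Python illustration, not the Azure scheduling configuration itself: the peak window (08:00 to 18:00) is invented for the example, while the pool size of six and the baseline of two VMs are taken from the example figures above. The sketch shows only that a schedule can never request more VMs than were provisioned in the availability set.

```python
from datetime import time

# Hypothetical schedule: (window start, window end, VMs to run).
# The 08:00-18:00 peak window is an assumption made for illustration.
SCHEDULE = [
    (time(8, 0), time(18, 0), 6),
]
BASELINE_VMS = 2      # the paper starts Alfresco with two VMs
PROVISIONED_VMS = 6   # VMs created in advance and added to the availability set


def scheduled_instances(now: time, schedule=SCHEDULE,
                        baseline: int = BASELINE_VMS,
                        provisioned: int = PROVISIONED_VMS) -> int:
    """Return how many pre-provisioned VMs should be running at time `now`."""
    for start, end, count in schedule:
        if start <= now < end:
            # Scaling can only turn on VMs that already exist in the pool.
            return min(count, provisioned)
    return min(baseline, provisioned)


print(scheduled_instances(time(9, 30)))   # inside the peak window
print(scheduled_instances(time(23, 0)))   # outside the peak window
```

Capping the result at the provisioned count mirrors the requirement stated above: peak capacity has to be provisioned ahead of time, because the schedule only switches existing VMs on and off.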
Conclusion

This paper describes a common deployment scenario for Alfresco Enterprise and how it can be deployed in the Azure environment in a manner that is highly available, can scale up and down, and provides a storage option that is both highly durable and low cost. The Autoscaling Application Block allows you to automatically scale out the number of Alfresco instances to closely match the demands of the application. This is an effective technique for controlling running costs while addressing your ECM's scalability needs on demand. By using deployment services such as Azure automation to create a deployment, you are also assured that the results are easily portable and will have a repeatable and predictable output every time.

Further Reading

1. Alfresco on Azure: http://cloudly.io/alfresco
2. Alfresco Enterprise on Azure: Implementation Guide: http://cloudly.io/alfresco/azure_alfresco_enterprise_implementation_guide.pdf
3. Alfresco Enterprise on Azure: Automation Template: http://cloudly.io/alfresco/azure_alfresco_automation_template.ps1