Liferay Portal's Document Library: Architectural Overview, Performance and Scalability
Table of Contents

EXECUTIVE SUMMARY
HIGH LEVEL ARCHITECTURE
    User Interface Layer
    Service Layer
    Storage Layer
SCALABILITY
    Application Servers
        Multi-cluster Configurations
    Storage Components
        Metadata
        Document Binaries and Digital Assets
    Search Components
PERFORMANCE
    Environment Configuration
    Methodology
    Benchmark Results
CONCLUSION
MOVING FORWARD
    Contact Us
    Get a Free Trial
Executive Summary

According to leading analysts, enterprises will face roughly a doubling each year in the quantity of documents, rich media, and social media content they must manage. Many organizations are ill-equipped to manage this influx of content. Organizations that have previously invested in content management tools face pressure from stakeholders, customers, and partners to provide agile, easy-to-use tools for content creation, collaboration, and updating.

Liferay Portal offers IT organizations a balance of practical tools and strong workflow processes crucial to the management of content, with a Document and Media Repository that fulfills key use cases such as:

- Document and Media Management: Asset management features such as content lifecycle management, version control, check-in, check-out, search, and retrieval.
- Categorization and Tagging: The ability to assign taxonomies to content for easy browsing and searching.
- Faceted Search: Users can search content metadata and the contents of documents, PowerPoint presentations, PDFs, and more, and can browse search results by facets such as asset type, taxonomy, and modification date.
- Collaborative Knowledge Repository: Social rating, comments, and faceted search make it easy to find content, related content, and tag suggestions.
- Multi-channel Presentation: The ability to present content via multiple channels, including traditional web and mobile access.

This paper examines the overall architecture of the Liferay Document and Media Repository, how to scale it to keep pace with rapid content growth, and its notable performance characteristics.
High Level Architecture

Liferay's Document and Media Repository follows a typical architecture for content management systems, with user interface, service, and storage layers. Each layer is designed to scale vertically or horizontally depending upon system requirements. Figure 1 illustrates the three architectural layers.

[Figure 1 - Liferay High Level Architecture. Diagram labels: Media and Document Gallery Web UI, Media and Document Display, Sync Mobile, Sync Desktop, Web Services, Security and Authorization Services, Metadata, Asset Repository, Asset Processing, Social Services, Workflow, Search, Search Engine, Metadata Store, Asset Store, Search Indices.]

USER INTERFACE LAYER

User Interface Components

- Document and Media Gallery Web UI: A rich user interface to access repository contents, upload new content, organize content, and more.
- Liferay Sync Mobile: Provides access to the Liferay repository from iOS and Android devices.
- Liferay Sync Desktop: Provides two-way synchronization of content and documents between the repository and desktop environments.

SERVICE LAYER

Content and Document Management Services

- Metadata: Metadata extraction and processing services.
- Versioning: Revision control, versioning, and history.
- Workflow: Business process management services.
- Custom Content Types: User-defined metadata capture and storage.
Search Engine Services

Content search facilities, including full-document and metadata search.

STORAGE LAYER

Asset Storage

Binary asset storage for content files such as Microsoft Office documents, PDFs, instructional videos, audio files, and images.

Metadata Storage

Database storage for content metadata. The metadata can reside in a relational or NoSQL database.

Search Indices Storage

Storage for the search indices used by the search engine.
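The separation of asset, metadata, and search index storage described above can be illustrated with a small sketch. This is not Liferay's API: the class name `DocumentRepository`, its methods, and the in-memory dictionaries standing in for the three stores are all hypothetical, chosen only to show that each kind of data lands in its own store and could therefore be scaled independently.

```python
from dataclasses import dataclass, field


@dataclass
class DocumentRepository:
    """Toy repository that keeps the three storage concerns separate.

    Hypothetical illustration only; plain dicts stand in for real stores.
    """

    asset_store: dict = field(default_factory=dict)     # doc_id -> binary content
    metadata_store: dict = field(default_factory=dict)  # doc_id -> metadata dict
    search_index: dict = field(default_factory=dict)    # term -> set of doc_ids

    def add_document(self, doc_id, content, metadata, text):
        """Store the binary, store its metadata, and index its text."""
        self.asset_store[doc_id] = content
        self.metadata_store[doc_id] = dict(metadata)
        for term in text.lower().split():
            self.search_index.setdefault(term, set()).add(doc_id)

    def search(self, term):
        """Return the sorted ids of documents whose text contains the term."""
        return sorted(self.search_index.get(term.lower(), set()))


repo = DocumentRepository()
repo.add_document("doc-1", b"binary-bytes-1", {"author": "jane"}, "Liferay scaling guide")
repo.add_document("doc-2", b"binary-bytes-2", {"author": "raj"}, "Quarterly budget")
print(repo.search("liferay"))  # ['doc-1']
```

Because a search touches only the index and a download touches only the asset store, each store can sit on different hardware without changing the calling code.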
Scalability

Figure 2 illustrates a reference deployment architecture for Liferay's content management solution. The deployment environment consists of load balancers, web servers, application servers, and storage servers:

- Web servers provide additional services like compression, URL rewriting, and static resource caching.
- Load balancers, whether software (e.g., within Apache) or hardware (e.g., F5 BIG-IP, Cisco), intelligently distribute load between servers. Load balancers may be deployed between web and application servers and/or in front of web servers.
- Application servers host the Java application server (e.g., Tomcat, WebSphere, WebLogic) within which the Liferay solution runs.
- Database, search, and repository servers store relational data, search indices, and binary files, respectively.

[Figure 2 - Reference Deployment View. Diagram labels: Web Servers, Load Balancer, Application Servers, Database Servers, Search Indices, Repository Servers.]

Each component of the architecture can be scaled independently based upon your performance requirements. For the purpose of this discussion, we will focus on scaling the application server, storage, and search components.

Application Servers

As a JEE (Java Enterprise Edition) application, Liferay's content management solution resides within a Java enterprise application server (e.g., Tomcat, JBoss, GlassFish, WebSphere, WebLogic). You may either deploy on a single node and scale vertically, or cluster Liferay across multiple nodes and scale horizontally. Most deployments choose the horizontal scaling path.

MULTI-CLUSTER CONFIGURATIONS

In addition to traditional clustering, you may choose to create multiple clusters dedicated to servicing different types of clients.
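As a rough illustration of dedicating clusters to client types, the sketch below classifies each request by its User-Agent header and hands it to the next node of the matching cluster in round-robin order. Everything in it is an assumption made for illustration: the `ClusterRouter` class, the node names, and the `liferay-sync` User-Agent token are hypothetical, and a real deployment would normally express this rule in the web server or load balancer configuration rather than in application code.

```python
import itertools


class ClusterRouter:
    """Round-robin router with one node pool per client type (hypothetical)."""

    def __init__(self, clusters):
        # One independent round-robin iterator per logical cluster.
        self._cycles = {
            name: itertools.cycle(nodes) for name, nodes in clusters.items()
        }

    @staticmethod
    def classify(user_agent):
        """Naive check: the 'liferay-sync' token is an assumed marker."""
        return "sync" if "liferay-sync" in user_agent.lower() else "web"

    def route(self, user_agent):
        """Return the next node from the cluster that serves this client."""
        return next(self._cycles[self.classify(user_agent)])


router = ClusterRouter({
    "web": ["web-node-1", "web-node-2"],     # hypothetical node names
    "sync": ["sync-node-1", "sync-node-2"],  # hypothetical node names
})
print(router.route("Mozilla/5.0 (X11; Linux x86_64)"))  # web-node-1
print(router.route("Liferay-Sync/1.0"))                 # sync-node-1
print(router.route("Mozilla/5.0 (Windows NT 6.1)"))     # web-node-2
```

Keeping a separate iterator per cluster means a burst of sync traffic never shifts the rotation for browser traffic, which is the isolation the two-cluster layout is meant to provide.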
Figure 3 illustrates how a Liferay deployment can be split into two logical clusters: one to handle requests from mobile devices, another to handle requests from traditional web browsers. This helps ensure an optimal experience for each type of client.

[Figure 3 - Native and Browser Deployment Clusters. Diagram labels: Web Server and Load Balancers, Web UI Cluster, Sync and Sync Mobile.]

Storage Components

While scaling the application server components helps tremendously with performance, the storage components must also be scaled properly. Properly scaled storage components will better cope with your enterprise's rapidly growing content needs. Liferay's Document and Media Repository, like other content management solutions, requires storage for the following information:

- Metadata: Various metadata associated with a given asset (e.g., contract effective date, author).
- Document Binaries and Digital Assets: Typically the asset binaries (e.g., document, video, audio file).

METADATA

Metadata is most often stored in an RDBMS (e.g., Oracle, MySQL, DB2, MSSQL). The RDBMS can be clustered (e.g., Oracle RAC) or replicated, with separate databases dedicated to reading and writing. Database scaling is generally beyond the scope of scaling Liferay. However, Liferay supports read/write database splitting and thus facilitates database scaling.

Metadata also tends to be unstructured. With recent advances in Big Data and NoSQL technology, NoSQL databases like MongoDB and Cassandra provide an added degree of scalability for unstructured data. Liferay's metadata storage layer can store metadata in NoSQL databases.

DOCUMENT BINARIES AND DIGITAL ASSETS

As with metadata storage, you have a choice of how to store document binaries and digital assets. Asset binaries (e.g., PDF, Word document, video) can be stored on a file-based storage device, in the database, in the cloud, or on a distributed file system.
File-based storage tends to be the preferred choice because a SAN (storage area network) or NAS (network-attached storage) device typically provides the highest I/O performance. However, this storage mechanism also costs more to scale.
Another available choice is storage in a relational database (RDBMS). Storing binaries in an RDBMS lets the solution benefit from the replication and backup facilities available in many RDBMS implementations. This approach tends to be the slowest, since streaming Binary Large Objects (BLOBs) in and out of an RDBMS is quite expensive.

Cloud storage offers virtually boundless capacity. For those looking to take advantage of the cloud, Liferay supports binary storage in cloud storage facilities like Amazon S3 and Rackspace CloudFiles. While providing virtually unlimited capacity, cloud storage also incurs higher network latency and reduced I/O speeds.

Finally, Liferay gives you the option to store binaries using distributed file systems like GridFS and HDFS. Made popular by Big Data and NoSQL tools, distributed file systems help bring the concept of sharding to the file system.

SEARCH COMPONENTS

The final layer we will examine is the search component layer. When considering the scalability of search services, one must take into account two different types of requests:

- Index read (or search) requests, e.g., find the word "Liferay" in a set of documents.
- Index write (or update) requests, e.g., add a new PowerPoint presentation to the system and fully index its contents.

The two operations have inherently different performance characteristics. In most content management systems, search requests substantially outnumber indexing requests (e.g., 85% searches but only 15% updates). Search requests also tend to be CPU bound, as the search engine traverses cached indices and computes matches. Index update requests, on the other hand, tend to be I/O bound.

Liferay's content management solution provides a choice of search engines. For simplicity, you may choose to use the embedded search engine.
However, for more complex scenarios, you may wish to deploy an enterprise search engine or appliance like Solr, Google Search Engine, FAST Search Engine, Endeca, or Autonomy. Liferay's search architecture allows search engines to be swapped by deploying a new search engine adapter. Each search engine has its own scaling capabilities. Apache Solr, for example, provides both read/write clustering and index sharding for improved resiliency and load balancing.
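To make the index sharding idea concrete, here is a minimal, generic sketch of hash-based sharding. It is not Solr's routing implementation and not part of Liferay: the `ShardedIndex` class, its method names, and the use of CRC32 as the hash are assumptions chosen for illustration. It shows the asymmetry between the two request types discussed above: an index write touches only the shard that owns the document, while a search fans out to every shard and merges the partial results.

```python
import zlib


class ShardedIndex:
    """Toy inverted index split across a fixed number of shards (hypothetical)."""

    def __init__(self, num_shards):
        # Each shard maps a term to the set of document ids containing it.
        self.shards = [{} for _ in range(num_shards)]

    def shard_for(self, doc_id):
        """Pick a shard from a stable hash of the document id."""
        return zlib.crc32(doc_id.encode("utf-8")) % len(self.shards)

    def index(self, doc_id, text):
        """Write path: only the shard that owns the document is updated."""
        shard = self.shards[self.shard_for(doc_id)]
        for term in text.lower().split():
            shard.setdefault(term, set()).add(doc_id)

    def search(self, term):
        """Read path: query every shard and merge the partial results."""
        hits = set()
        for shard in self.shards:
            hits |= shard.get(term.lower(), set())
        return sorted(hits)


idx = ShardedIndex(num_shards=4)
idx.index("deck-1", "Liferay roadmap")
idx.index("memo-7", "Liferay budget memo")
print(idx.search("liferay"))  # ['deck-1', 'memo-7']
```

In a real cluster each shard would live on its own node, so write load spreads across nodes while every search pays the cost of contacting all of them; read replicas per shard are the usual answer to a search-heavy mix.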
Performance

Having explored the solution's scalability aspects, we will now examine how Liferay's solution performs in a variety of test scenarios. Liferay's engineering team performed intensive tuning and testing to demonstrate the scalability of Liferay Portal EE for several use cases, including content and document management. We will focus on the content and document management results.

ENVIRONMENT CONFIGURATION

The benchmark environment conforms to the reference deployment architecture. It consists of the following tiers:

1. Web Server Tier: Delivers static content elements such as images, rich media, and other static files such as style sheets.
2. Application Tier: Hosts Liferay-supported application servers such as Tomcat, JBoss, Oracle WebLogic, and IBM WebSphere. (Please see the LPEE support matrix for additional platforms.)
3. Database Tier: Hosts Liferay-supported database servers such as MySQL, Oracle, MS SQL, IBM DB2, and Postgres. (Please see the LPEE support matrix for additional platforms.)

For simplicity, Liferay opted not to insert a firewall or a hardware load balancer into the benchmark environment.

[Figure 4 - Performance Testing Environment. Diagram labels: Web Tier (Apache Web Server), Application Tier (Application Server, Application Server), Database Tier (Database Server).]

Hardware platforms:

- Web Server: 1 x Intel Core 2 Duo E6405 2.13GHz CPU, 2MB L2 cache (2 cores total); 4GB memory; 1 x 146GB 7.2k RPM IDE
- Application Server: 2 x Intel Core 2 Quad X5677 3.46GHz CPU, 12MB L2 cache (8 cores and 16 threads); 16GB memory; 2 x 146GB 10k RPM SCSI
- Database Tier: 2 x Intel Core 2 Quad X5677 3.46GHz CPU, 12MB L2 cache (8 cores and 16 threads); 16GB memory; 4 x 146GB 15k RPM SCSI

Network: Gigabit network between all servers and test clients
Software:

- Liferay Portal 6.2 Enterprise Edition
- Sun Java 7 (1.7.0_50)
- Tomcat 7.0.40
- CentOS 6.4 64-bit Linux
- MySQL 5.6.13 Community Server
- Apache HTTPD Server 2.2
- Grinder 3 load test client with Liferay customizations

METHODOLOGY

Liferay utilized the Grinder load testing tool and its distributed load injectors. In all test scenarios, the injectors ramped up users at a rate of one user every 100 milliseconds until achieving the desired virtual user load. The benchmark data was gathered after an initial ramp-up time of five minutes to initialize all application elements and warm up all of the injectors.

As part of data gathering, the following statistics were collected:

- OS-level statistics on web, application, and database servers (including CPU, context switches, and I/O performance).
- JVM garbage collection information using VisualVM and garbage collector logs.
- Average transaction times, standard deviations, and throughput obtained from the Grinder console.

A single application server was used to determine maximum throughput. Once the maximum throughput was reached on a single server, Liferay added a second application server to test the linear scalability hypothesis: that doubling the available application server hardware will double the maximum number of virtual users supported by the system.

BENCHMARK RESULTS

The document repository performance test cases represent typical usage scenarios, with users browsing for files, viewing file details (e.g., metadata, comments, ratings), downloading files, and finally uploading new files. The testing environment removes potential network bottlenecks by providing fast network connections (1Gbps) between the clients downloading files and the document repository.

As shown in the table below, overall transaction times for browsing, viewing, uploading, and downloading documents remain sub-second across most transactions.
At the performance inflection point of 10,000 users, 95% of file downloads completed within 0.137s for a 100KB document. Upload times for a 100KB document with 10,000 virtual users remain under 0.85s for 95% of the users.

Document Library (μ = mean, σ = standard deviation)

Virtual  Duration  Browse Folder     View File Details  Download File     Upload File
Users    (min)     μ (ms)   σ (ms)   μ (ms)   σ (ms)    μ (ms)   σ (ms)   μ (ms)   σ (ms)
7500     30        100      22       30.5     12.9      6.57     8.18     193      31.7
8000     30        109      32.1     33.4     17.1      7.13     12.2     212      46.8
9000     30        133      75.1     41.0     43.8      10.5     35.1     261      119
9500     30        142      67.7     43.3     33.7      9.97     22.3     281      107
10000    30        196      131      64.5     75.6      21.8     57.7     376      219
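The paper does not say how the 95% figures were derived. One reading that matches the published numbers is the common two-sigma rule of thumb, mean plus two standard deviations, applied to the 10,000-user row of the results table. That derivation is an assumption on our part, and the sketch below simply checks the arithmetic; the helper name `two_sigma_bound_ms` is ours.

```python
# (mean_ms, std_dev_ms) at 10,000 virtual users, copied from the results table.
RESULTS_10K = {
    "Browse Folder": (196, 131),
    "View File Details": (64.5, 75.6),
    "Download File": (21.8, 57.7),
    "Upload File": (376, 219),
}


def two_sigma_bound_ms(mean_ms, std_dev_ms):
    """Mean plus two standard deviations (assumed derivation, see lead-in)."""
    return mean_ms + 2 * std_dev_ms


for name, (mean_ms, std_dev_ms) in RESULTS_10K.items():
    print(f"{name}: {two_sigma_bound_ms(mean_ms, std_dev_ms):.1f} ms")
# Browse Folder: 458.0 ms
# View File Details: 215.7 ms
# Download File: 137.2 ms
# Upload File: 814.0 ms
```

Under this reading, the download bound of 137.2 ms corresponds to the quoted 0.137s, and the upload bound of 814 ms falls under the quoted 0.85s. Response times are rarely normally distributed, so a bound computed this way is only an approximation of a measured 95th percentile.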
[Figure: Document Repository Mean Time. Chart title: Document Repository Activity Time; x-axis: Concurrent Users (7500 to 10000); y-axis: Mean TXN Time (ms); series: Browse Folder, View File Details, Download File, Upload File.]

Some key findings of the study are:

1. The Liferay Portal Document Repository easily supports 10,000 virtual users per server while accessing over 2 million documents in the document repository.
2. Given sufficient database resources and efficient load balancing, Liferay Portal can scale linearly as one adds additional application servers to a cluster.

For further information, please see the whitepaper "Liferay Portal Performance - Benchmark Study of Liferay Portal 6.2 Enterprise Edition."

Conclusion

With the Liferay repository, enterprises will be able to implement a lightweight content management solution that:

- Provides an easy-to-use user interface for accessing content and collaborating on content.
- Provides multi-platform access for web and mobile computing.
- Is deployable to both public and private cloud-based environments.
- Easily scales to meet the demands of growing enterprise content.

Moving Forward

CONTACT US

For more information about Liferay Portal, contact sales@liferay.com.

GET A FREE TRIAL

Experience Liferay Portal for yourself by downloading a free trial at /free-trial.
Liferay is a provider of leading enterprise open source portal and collaboration software products, used by major enterprises worldwide including Allianz, Carrefour, Cisco Systems, Danone, Lufthansa Flight Training, Rolex, Siemens, Société Générale, Toyota and the United Nations. Liferay, Inc. offers professional services, technical support, custom development, and professional training to ensure successful deployment in the most demanding IT environments.

© 2014, Liferay, Inc. All rights reserved.