Tableau Server 7.0 scalability




February 2012

Executive summary

In January 2012, we performed scalability tests on Tableau Server to help our customers plan for large deployments. We tested three different server configurations, using one, two, or three servers in a dedicated load-test environment, and two types of reports with varying degrees of complexity. To simulate real-world usage, each user performed a variety of tasks, including loading the report, making a selection, filtering the view, and changing tabs. This paper describes those tests and outlines techniques for improving Tableau Server's performance.

Our test results indicate near-linear scaling with predictable user response times. Based on the testing, and on the conservative estimate that 10% of users are on the system concurrently at any one time, we were able to demonstrate that Tableau Server can scale from 1260 users on a single, eight-core server to 3130 users on a clustered, three-node, 24-core deployment. Since the results show linear scaling, larger user deployments can be supported by adding additional nodes to the system.

Results for a simple workbook:

Number of servers    Number of concurrent users    Total number of users
1                    126                           1260
2*                   226                           2260
3*                   313                           3130

Results for a complex workbook:

Number of servers    Number of concurrent users    Total number of users
1                    39                            390
2*                   81                            810
3*                   112                           1120

* See the Server Configuration section.

Please note that these tests show results from a specific test configuration and should not be taken as a guarantee of client response times. These benchmark results were produced in a controlled lab environment, with no other applications running during execution. Actual results will vary based on a number of variables including, but not limited to, load type, hardware, network speed, browser settings, and database performance.

Scaling from user to enterprise

At Tableau, we know that data visualization significantly improves the ability to understand information. Rather than analyzing data in text form and then creating visualizations of those findings, we invented a technology, called VizQL, in which visualization is part of the journey and not just the destination. As users discovered how easy it was to create their own data visualizations, and how much value those visualizations provided to the business, Tableau's reputation quickly grew. What began with the individual user in mind quickly gained interest at the enterprise level.

In January 2012, we launched version 7.0 of our software. Many of the enhancements we made were in response to the growing demand for our products to support large, enterprise-wide deployments. As more users discover the power of visualization, along with self-service analysis and reporting, IT finds itself being asked to configure and manage the Tableau software and servers to support a larger number of users, groups, and interactions.

It is quite natural, then, that the scalability of Tableau Server is a topic of great interest to CIOs, IT managers, and IT architects. They first want to be assured that Tableau can support an enterprise deployment, and then they want to understand what to expect in terms of performance so they can guide architectural decisions. Understanding scalability, however, means getting your mind around multiple dimensions at once, and that is not easy.

Scalability is not black and white

Every environment is unique and there are many variables that impact performance, and the complexity grows quickly when you consider those variables in combination. These variables include:

Hardware considerations: Server type, disk speed, amount of memory, processor speed, and number of processors.

Architecture: Number of servers, architecture design, network speed/traffic, and data source type and location.

Usage: Workbook complexity, concurrent user activity, and data caching.

Software configuration: Configuration settings of Tableau Server.

Data: Data volumes, database type, and database configuration.

Changes in any one of these variables may affect scalability anywhere from hardly at all to a great deal. For example, Tableau's performance often improves as more users hit the system. Though it's counterintuitive, what happens is that performance gets better as more data is cached and becomes readily available to users. But even caching can behave differently based on other variables, such as the length of time before the cache is set to expire and how many processes the cache is run across. In a simple world, scalability would be as easy as: if one server supports 100 users, then two servers would support 200 users. As you already know, it's just not that simple.

Eating our own dog food: the Tableau Public story

Today, Tableau Server is running at high scale in our own data centers as part of the Tableau Public solution. As we improved the features of Tableau to support very large groups of users, we needed a way to test and refine these features. We wanted the testing to be as true-to-life as possible, replicating even the toughest business conditions. So, what better place to test than Tableau Public?

Tableau Public is a free service that lets anyone publish interactive data to the web. Once the data is uploaded, anyone can interact with the data, download it, or create their own visualizations with it, no programming skills needed. Tableau Public has been very successful and claims such users as the Wall Street Journal, the Pan American Health Organization, The Guardian Datablog, and Oxford University. It has been used to help illustrate research, news, and blog posts, and even to support non-profit advocacy.

Figure 1: Tableau Public. Tableau Public is a free service that lets you create and share data visualizations on the web. The gallery of data visualizations was created by users just like you, using Tableau Public.

Tableau Public supports over 15 million distinct users, serving up over 600,000 views per week. In fact, Tableau Public hit a record of over 94,000 views in one hour in late 2011.

Figure 2: Tableau Public traffic trends. This dashboard illustrates overall usage trends for Tableau Public. You can see the differences between impressions and user sessions, demonstrating interactivity with the published visualizations.

The Tableau Public configuration is similar to a corporate deployment of Tableau Server, with a few exceptions. While the core components of Tableau Public are the same as Tableau Server, Tableau Public users are limited to extracts of 50 MB or less and are not subject to data security restrictions when accessing the data. Tableau Public runs tens of thousands of queries every single day, and although the data sizes are relatively small, they have a high degree of variability. This made Tableau Public the perfect environment to test our new performance improvements. We were able to design Tableau 7.0 with more predictable performance to support more users; in fact, we tested Tableau 7.0 in production on Tableau Public for over six months prior to its formal launch.

How we tested 7.0

We put a lot of thought into how we would test version 7.0 and wanted our tests to reflect real-world scenarios. It was not sufficient simply to hit the server with massive, concurrent requests to max out the system; we wanted to simulate actual usage, with users performing different operations across the different reports and datasets. The goal of these tests was to determine the point at which the system reaches maximum capacity, including the total number of users at that point, where no more users can be served until another user's session ends. Little's Law helps illustrate this point very well.

What's Little's Law?

Little's Law says that the average number of customers in a stable system equals their average arrival rate multiplied by the average time each customer spends in the system. For example, imagine a small coffee shop that has one worker who can only help one customer at a time. People enter, buy coffee, and leave. A basic cup of coffee is served up quickly; more complex drinks take longer. This drives the rate at which the shop serves people and sends them on their way. However, if the number of customers arriving exceeds the number of customers leaving, eventually the coffee shop will fill up and no one else can enter. The coffee shop is maxed out. The variables that determine the maximum number of customers in the shop at any one time are the length of time they spend there, the complexity of their drink orders, and the number of workers serving them.

We designed our tests with Little's Law in mind, looking for the maximum number of users that can be on the system at any point in time. Our variables were the size and configuration of the system, the user session length, and the complexity of the report requests. For our tests, we held the system size and session length constant and ran two different scenarios: one with simple report requests and one with complex report requests.
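For reference, the textbook statement of Little's Law can be written as a formula; the numbers in the worked example below are illustrative and do not come from these tests.

\[
L = \lambda \, W
\]

where \(L\) is the average number of customers in the system, \(\lambda\) is their average arrival rate, and \(W\) is the average time each customer spends in the system. In the coffee-shop example, if customers arrive at \(\lambda = 2\) per minute and each spends \(W = 5\) minutes in the shop, then on average \(L = 2 \times 5 = 10\) customers are in the shop at once. For Tableau Server, \(W\) corresponds to the user session length and \(L\) to the number of concurrent users the system must sustain.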

Server Configuration

The servers that we used for testing were Dell PowerEdge R410 servers running Windows Server 2008 R2 Standard Edition. The servers had 16 GB of RAM and two Intel Xeon 2 GHz quad-core CPUs (eight processing cores). We performed a variety of tests, from single-server to multiple-server deployments. All hardware was identical across the various tests, and we used the default Tableau Server configuration.

In our single-server configuration, all processes are located on one primary server. Our two-server configuration distributes the workload across two worker servers, while keeping the application processes, repository, and extracts on a primary server. Our three-server configuration adds a third worker server. Configuring the server setup in this way has the potential to more than double the performance, as the application overhead is separated from the VizQL and cache workload.

Figure 3: Server architecture with three nodes. In a distributed environment we separated the VizQL and Web App processes from the repository and gateway in order to better understand load across nodes.

Data

We do understand that the database, size, and structure of the data that users need can vary greatly, with some data requiring a direct connection to the database while other data may be available as Tableau Data Extracts. For our tests, we used Tableau Data Extracts in order to reduce the data connectivity variable and better isolate the report complexity results.

Report complexity

Most organizations will have a multitude of reports with varying degrees of complexity, so we wanted to better understand the scaling differences across different types of reports. We ran our tests using two reports: a simple one that renders very quickly, and a more complex report that renders a lot of data and contains many different zones.

Figure 4: Simple Viz. The simple workbook does not have much complexity and renders very quickly.

Figure 5: Complex Viz. The complex workbook displays a large number of marks and uses different visualizations. This workbook renders more slowly than the simple workbook.

Session length

In real-world deployments, organizations will have different types of users, from casual users to power users, and their activity and session lengths will vary accordingly. For this test, we used a 60-second user session length. Session length is the amount of time the user is connected to the system. During the session, we performed a variety of interactions, including selections, filtering, and changing tabs. These operations were performed randomly across the 60-second session and assume some think time in between.

When looking at the types of users in your organization, please note that shorter user session lengths will put more load on the server, since users are performing more actions in a shorter amount of time. Conversely, users connected for longer session lengths, performing about the same number of actions, will allow more users on the system.

7.0 scalability results

When running the tests for 7.0, our goal was to determine the point at which the system is at maximum capacity, or throughput, and the performance of the system begins to degrade. As more transactions are performed on the system, the system eventually reaches a state where no more resources can be requested. These resources include CPU, memory, disk, and network; in the results below, the primary resource at capacity was the CPU. As we tested for maximum throughput, we looked at a number of different metrics:

Number of transactions per second (TPS). This represents the throughput and performance of the system. A high-performance system will have a higher TPS and will be able to support a higher number of concurrent users. When the system becomes overloaded, the number of transactions that can be supported decreases, since an increasing number of clients are waiting in the queue.

Average session length. This is the length of time the user would need to be connected to perform the tasks of the report, either the simple report or the complex report. It represents the performance that a user can expect to see for the viz. When the system becomes overloaded, session lengths will start to increase.

Number of clients. This represents the number of concurrent users performing operations on the system. We record the point where the overall system performance begins to decrease to determine the maximum number of concurrent users that can be supported.
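The rule described above, recording the point where adding clients no longer improves throughput, can be sketched in a few lines of code. The (clients, TPS) pairs below are hypothetical illustrative values, not measurements from these tests, and the helper is only a sketch of the analysis, not part of our test harness.

    # Sketch: locate the saturation point in a load test as the client count
    # at which observed throughput (TPS) peaks. Values are made up for illustration.
    measurements = [(25, 42.0), (50, 80.0), (75, 112.0), (100, 126.0), (125, 117.0)]

    def max_concurrent_clients(samples):
        """Return the client count with the highest observed TPS (the capacity point)."""
        clients, _tps = max(samples, key=lambda pair: pair[1])
        return clients

    print(max_concurrent_clients(measurements))  # -> 100 in this made-up series

Beyond that point, queued requests grow, TPS drops, and average session lengths start to climb, which is the degradation pattern described above.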

Simple report

The first set of tests we performed was based on the simple report, using configurations from one to three servers. Here are the results:

Number of servers    Number of concurrent users    Total number of users
1                    126                           1260
2                    226                           2260
3                    313                           3130

Figure 6: Simple report server scalability. The chart demonstrates the metrics that were evaluated and their behavior at different load levels, showing the scalability of the simple workbook across multiple nodes; when the system is at capacity, performance decreases.

In this simple-report scenario, each of the server configurations handles an increasing number of transactions up to the point where it reaches maximum capacity and must wait for resources to free up. It is at this point that users' session lengths increase. As you can see, session lengths remain fairly constant until the system reaches maximum capacity. In our testing, CPU utilization was the overloaded resource at maximum capacity.

Complex report

We performed the same type of tests with a more complex report. Given the complexity of the operations, we expected and found that the number of users Tableau Server can support is lower than in the simple-report test. The scalability of the system follows similar patterns as before, but the system reaches its peak sooner. Here are the results:

Number of servers    Number of concurrent users    Total number of users
1                    39                            390
2                    81                            810
3                    112                           1120

Figure 7: Complex report server scalability. The chart demonstrates the metrics that were evaluated and their behavior at different load levels, showing the scalability of the complex workbook across multiple nodes. The results indicate near-linear scaling with predictable user response times.

Scalability considerations

Most of the scalability questions we get center on how to determine the system configuration, or how to optimize an existing system, for a given user-base size.

Supporting concurrent users

Maximum throughput can be used to determine the number of concurrent users a given system can sustain. The number of concurrent users is a function of the workload (or throughput) that each client is executing and the amount of time over which that workload is executed. This time period is part of the overall session length, which includes both user interactions and think time. Determining system size for a given user base means understanding how those users will use the system (the complexity of their interactions), how long they are likely to stay connected (or session expiry length), and how much think time they are likely to have between interactions. For initial estimation purposes, feel free to use our test results. For sizing considerations, a conservative concurrency rate is estimated to be 10% of the total number of licensed users at any one point in time.

Best practices for optimization

In addition to a system that is optimally designed, there are best practices that can greatly improve performance and reduce the average response time.

Use Tableau Data Extracts. If your database queries are slow, consider using extracts to increase query performance. Extracts store data in memory so that users can access the data without making direct requests to the database. They can easily be filtered when users don't need the full detail, significantly improving response time. If necessary, the extract engine can be distributed to a local extract engine on a separate machine to maximize performance.

Schedule updates during off-peak times. Often, data sources are updated in real time but users only need the data daily or weekly. Scheduling extract refreshes for off-peak hours can reduce peak-time load on both the database and Tableau Server.

Avoid expensive operations during peak times. Publishing, especially of large files, can be a very resource-consuming task. While it may be difficult to influence login behavior, it is often easy to influence publishing behavior. Ask users to publish during off-peak hours, avoiding busy times like Monday mornings.

Cache views. As multiple users begin to access Tableau Server, response time initially increases due to contention for shared resources. With caching turned on, the view from each request coming into the system is cached and renders much more quickly when the next user requests the same view.

Final thoughts

The test results indicate a linearly scaling system with predictable user response times. Based on our testing, we were able to demonstrate that Tableau Server can scale to support the demands of the enterprise, from 1260 users on a single server all the way to 3130 users on a three-node deployment. Since the results show linear scaling, larger user deployments can be supported by adding additional nodes to the system.

Your organization will likely have a different architecture, a different mix of reports, and differing levels of interactivity. These tests are meant to demonstrate the ways in which the enterprise-class Tableau Server can scale and the number of users that a particular configuration might be able to support. Your mileage will vary; use this data as guidance for your own deployment.
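As one way to turn this guidance into a first-cut estimate, the sketch below combines the 10% concurrency assumption with the single-server results from this paper. The function, its assumption of roughly linear scaling, and the example user count are illustrative only; it is not a Tableau-provided sizing tool.

    import math

    # Sketch: estimate node count for a licensed-user population, assuming the
    # paper's 10% concurrency rate and roughly linear scaling per node.
    # Per-node concurrent capacities are the single-server results reported above.
    CONCURRENT_USERS_PER_NODE = {"simple": 126, "complex": 39}

    def nodes_needed(licensed_users, workload="complex", concurrency_rate=0.10):
        """Return a rough node-count estimate for the given user base."""
        concurrent_users = licensed_users * concurrency_rate
        return math.ceil(concurrent_users / CONCURRENT_USERS_PER_NODE[workload])

    # Example: 2,000 licensed users viewing mostly complex workbooks
    # -> 200 concurrent users -> ceil(200 / 39) = 6 nodes.
    print(nodes_needed(2000, "complex"))

Treat the output as a starting point for capacity planning and validate it against your own workbooks, data sources, and usage patterns.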

About Tableau

Tableau Software helps people see and understand data. Ranked by Gartner in 2011 as the world's fastest-growing business intelligence company, Tableau helps individuals quickly and easily analyze, visualize, and share information. With more than 7,000 customers worldwide, of all sizes and across industries, Tableau is used by individuals throughout an organization, in the office and on the go. See the impact Tableau can have on your data by downloading the free trial at www.tableausoftware.com/trial.

Copyright Tableau Software, Inc. 2012. All rights reserved. 837 North 34th Street, Suite 400, Seattle, WA 98103 U.S.A.