Enterprise Vault TECHNICAL PAPER

Sizing an Enterprise Vault Solution for Exchange Archiving

Version: V1.1
Released: February 15th, 2007
Relates to: Symantec Enterprise Vault 7.0
Author: Andy Joyce, Senior Manager, Enterprise Vault Field Enablement
Table of Contents

Introduction
  Purpose of this Whitepaper
  Target Audience
  Acknowledgements
  Terminology
Overview: What Is Involved In Sizing An Enterprise Vault Solution?
  Main Outputs of EV Sizing
  Main Factors to Consider
Part 1: Sizing Estimates for Presales Scenarios
  Typical Values Used in Sizing Estimates
  Rules of Thumb Used In Presales Sizing Estimates
  Factors That May Affect These Rules of Thumb
  When One EV Server May Not Be Enough
  Example: Sizing for Mailbox Archiving, Journal Archiving and PST Migration for a 5000 user site
Part 2 - Detailed Sizing
  Getting Data on Which to Base a Detailed Sizing
    Using the Symantec Exchange Store Reporter (ESR)
    Using Exchange Mailbox Store Sizes to Determine the Mailbox Backlog
    Using Exchange System Manager to Determine the Mailbox Backlog
    Using Exchange Message Tracking
    Using SMTP Gateway Statistics
    Using Exchange Transaction Logs
    Using 3rd Party Data Capture/Analysis Tools
    Using Performance Monitor
    Getting Data about PST Files
  Estimating EV Server Throughput Requirements
    Throughput for Mailbox Backlog Archiving
    Throughput for PST Migration
    Throughput for Mailbox Steady-state Archiving
    Throughput for Journal Archiving
    Throughput for Combined Mailbox and Journal Archiving
    Throughput for Public Folder Archiving
    Effect of Message size on Archiving Throughput
    Effect of Large Numbers of Mailboxes
  Estimating Storage Requirements
    A Note on Sizing for Specialized Storage Devices
    Storage required for Mailbox Backlog Archiving
      Vault Stores - Mailbox Backlog
      Indexes - Mailbox Backlog Archiving
      SQL Databases - Mailbox Backlog Archiving
    Storage required for Mailbox Steady-state Archiving
      Vault Stores - Mailbox Steady-State Archiving
      Indexes - Mailbox Steady-State Archiving
      SQL Databases - Mailbox Steady-state Archiving
    Storage required for PST Migration
      Vault Stores - PST Migration
      Indexes - PST Migration
      SQL Databases - PST Migration
    Storage required for Journaling
      Vault Stores - Journal Archiving
      Indexes - Journal Archiving
      SQL Databases - Journal Archiving
    Storage required for Public Folder archiving
Additional Reading
APPENDIX Working Out Journaling Fan-Out Factors
Introduction

Purpose of this Whitepaper

The purpose of this whitepaper is to give guidance to Enterprise Vault specialists and consultants in sizing an EV solution: to determine the number and specifications of the EV servers, and the amount of storage that will be required by the solution for the first few years.

After an initial Overview section, introducing some common concepts, this whitepaper has two main sections. The first focuses on producing a sizing estimate for presales, which is based on assumptions and rules of thumb. The second focuses on a more detailed sizing estimate, using actual data and more in-depth calculation. This type of sizing is more typical in a postsales scenario, usually as part of a formal Solution Design, or in presales when a prospective customer requires a greater level of detail and accuracy to be included in the solution sizing.

This whitepaper concerns itself with sizing an EV solution for Exchange and PST file archiving only; it does not include archiving from File Systems, Microsoft SharePoint Services, Lotus Domino, SMTP, IM or any of the other archiving sources that Enterprise Vault supports. These may be the subject of future whitepapers.

This whitepaper is not intended to be a replacement for engaging qualified EV architects to do a full Solution Design for larger and/or more complex sites.

Target Audience

This whitepaper is targeted at Symantec and partner specialists and consultants who are required to size an Enterprise Vault solution. This document assumes that the reader has a good knowledge of the functions and architecture of the Enterprise Vault for Microsoft Exchange solution, and knowledge of Microsoft Exchange Server and common Microsoft Windows management tools and utilities.
Acknowledgements

The author would like to thank the following people for their assistance in developing and reviewing this whitepaper: Steve Blair, Alex Brown, Mark Davies, Chris Dooley, Jason Jensen, Glenn Martin, Wes Smith, Gareth Bridges and Matt Walsh.
Terminology

Although it is expected that the reader will have the prerequisite knowledge of Enterprise Vault and Microsoft Exchange, some terminology may not be familiar and so it is explained in the table below.

Backlog: When archiving is first enabled for an Exchange Server (or, more accurately, for selected mailboxes on the Exchange Server), there will usually be a number of messages that already meet the eligibility requirements for archiving; for example, messages that are older than 60 days. These messages constitute the backlog, which must be archived before the archiving system enters steady-state.

Steady-State: Once the backlog has been cleared, each day an incremental number of messages from each mailbox is archived. This is what is known as steady-state. Steady-state is somewhat of a misnomer as there may be some fluctuation in the number of messages archived each day, and there may also be some growth over an extended period. However, for the purposes of EV design and sizing, the term steady-state is used.

Single-Instance Storage (SIS): Both Microsoft Exchange and Enterprise Vault have the concept of single-instance storage or SIS. This is where storage is optimized by holding only a single copy of a message, although it may be referenced in several mailboxes or Archives. Within Exchange, the boundary of SIS is the Mailbox Store. Within Enterprise Vault, it is the Vault Store Partition, although there are a couple of circumstances that may affect this; these are discussed later in this whitepaper.

Single-Instance Ratio (SIR): The ratio of references to messages to actual copies held is the single-instance ratio or SIR. For example, if there are 1500 references to 1000 messages then the SIR is 1.5.

Sharers: Sharers is a term used to describe the mailboxes or Archives that have references to a particular message. For example, if a message is sent from Fred to Greg and Harry, and all three mailboxes are located in the same Mailbox Store, then all three are initially Sharers of the single copy that is actually stored in Exchange. If one of them deletes their copy (actually, their reference) then the number of sharers is reduced to 2. A similar concept exists for single-instance storage within Enterprise Vault.

Addresses: In the context of this whitepaper, the term Addresses is used to refer to all Addresses, including the sender, of a message. In the above example, Fred, Greg and Harry would all be classed as Addresses.

Exchange Store Reporter (ESR): The Exchange Store Reporter or ESR is a stand-alone tool developed as a side-project by the Enterprise Vault Engineering team. It is used to analyze the contents of Exchange server databases to report on the number and size of messages within certain 15-day age bands. This is output to an XML file that can be loaded into a variety of analysis spreadsheets to produce a presales or post-sales style solution sizing report. At the time of writing this whitepaper there is no standard spreadsheet, and Symantec System Engineers and Consultants in various regions use their own favorite version.
Overview: What Is Involved In Sizing An Enterprise Vault Solution?

Main Outputs of EV Sizing

There are two main outputs when sizing an EV solution:

- Servers: How many EV and SQL servers will be needed to process the estimated daily archiving load, or the initial backlog from mailboxes, Public Folders and PST files, and what are the specifications of these servers? If multiple EV servers are required, where will the EV servers be located? Will these servers be able to cope with the daily archiving in two or three years' time, after message volumes or mailbox numbers have increased?
- Storage: How much storage is required for the Vault Stores, Indexes and SQL databases, both initially (for the backlog and PST migration) and for the ongoing steady-state mailbox, Public Folder and Journal Archiving for the next 2 to 3 years?

Note that it is generally unwise to size an EV solution beyond two or three years ahead. The benchmarks, estimates and assumptions on which the sizing is based are unlikely to be valid beyond that time. Additionally, server and storage technology changes, and what may appear to be an outrageous storage requirement now may be considered insignificant in five years' time.

From a presales perspective, all that is typically required is an indication of the number and specifications of the servers required, including SQL servers, and an estimate of how much storage will be needed. Sometimes it will be sufficient to provide the customer with ballpark estimates based on typical scenarios. Other times, the customer will require a more detailed analysis based on parameters specific to their own environment; for example, mailbox analysis provided using the Symantec Exchange Store Reporter, or their own analysis or estimates of messaging volumes and characteristics within their organization.
Strictly speaking, this level of detailed analysis and design ought to be a part of a formal Professional Services engagement, but it is acknowledged that customers in different regions/segments may have differing expectations regarding the scope and cost of a presales sizing exercise. For post-sales sizing, the level of detail and accuracy required is usually greater; especially if part of a formal Solution Design. In these cases, more detailed calculation (rather than ballpark estimates) is required; although this more detailed sizing will still be based in part on a number of stated assumptions and typical values. In both cases, it may be necessary to determine the potential network usage caused by archiving from remote Exchange servers back to a centralized EV server, or using a remote EV server and a centralized SQL server. This is required to assess the overall topology of the EV solution; to determine the server hardware, licensing costs and implementation effort required.
Main Factors to Consider

The main factors which need to be considered when doing an EV sizing are:

- The amount of message data that needs to be archived. We need to consider the following:
  o How much is in the backlog?
  o How much is in PST files?
  o How much will be archived from mailboxes each day?
  o How much will be archived from Public Folders each day?
  o How much will be archived from Exchange Journal Mailboxes each day?
- The time available to do the archiving. We need to consider the following:
  o Does the backlog need to be cleared in a specified time? For example, if mailboxes are being reduced in size prior to a planned Exchange migration.
  o Does the PST migration need to be completed in a specified time? For example, before a planned file server consolidation or desktop refresh?
  o How much time will be available each day to do the steady-state Mailbox and Public Folder archiving, not coinciding with periods of peak user activity, Exchange and Enterprise Vault backups and scheduled daily system maintenance?
  o What is the main period of activity for Exchange Journaling, so that we can configure an EV solution to keep up?
  o For daily archiving, how will this change over the next two to three years as mailbox numbers increase or decrease, and message volumes and sizes increase?
- The distribution of mailboxes across multiple Exchange Servers. We need to consider the following:
  o How many mailboxes will be enabled for archiving in total?
  o How will these be split across Exchange servers?
  o How many of these Exchange servers are remote (i.e. across a WAN link from where the EV solution will be located)? If there are some, consider two additional questions: What is the usable network capacity for archiving from these remote Exchange servers? What is the required network capacity to successfully archive the estimated amount of data in the available time across these network links?
- How will all these factors change over time? For the steady-state mailbox and Public Folder archiving, and for Journal Archiving, how will this change over the next two or three years as mailbox numbers increase or decrease, and message volumes and sizes increase?
- What other EV components will run on the EV servers? Will file archiving, SharePoint archiving or archiving from other sources be configured on these EV Exchange archiving servers?
- What else will run on these EV servers? It is strongly recommended that, in all but the smallest sites, only Enterprise Vault (and possibly an SQL server, dedicated to EV use) is installed on the EV servers. Backup agents and antivirus software are acceptable, but customers should be discouraged from running other applications on the EV server, or using the EV server as the backup master server. Doing so will potentially lead to degraded archiving performance, retrieval delays or service disruptions. It is strongly recommended that Enterprise Vault is not installed on a production Microsoft Exchange Server.

User access rarely plays a significant part in sizing an EV solution. It may sometimes be a factor to consider in the overall design, in terms of whether it is acceptable to have any latency when a remote user accesses an archived item, but it does not generally figure in the calculations required to determine the EV server characteristics. It is generally accepted that access to archived items is minimal, and some latency when accessing an archive across a WAN is a trade-off for the reduced cost of a fully centralized EV solution.

As an example, an Australian financial services company with approximately 3500 users has a 14-day archiving policy and also allows its users to manually archive on demand. This is a customer where information and communication is key to their business so that the vast majority of users are extremely active within
everyday. With the aggressive 14-day archiving policy, they typically archive between 25 and 30 messages per mailbox per workday. However, each day, less than one third of these users access an archived item, and those users only access an average of around 4 archived messages each per day. As can be seen from this example, even with a very aggressive archiving policy the frequency with which users access archived items is not significant.
Part 1: Sizing Estimates for Presales Scenarios

This section is aimed primarily at producing sizing estimates in a presales scenario, where the time and data required to produce a more detailed analysis and sizing estimate may not be available. These presales estimates are based on rules of thumb and assumptions, obtained from years of Symantec Enterprise Vault specialists' observations in the field.

Typical Values Used in Sizing Estimates

The following are considered typical at the time of writing this whitepaper:

- The average size of a current Microsoft Exchange message is around 75KB.
- A typical office-worker user sends and receives between 60 and 80 messages per workday. 70 is used as a typical value.
- Of these messages, between 15 and 25 will remain in the user's mailbox to be archived a few weeks later (the typical age-based archiving policy is about 60 days). 20 messages archived per mailbox per day is a good average to use for presales sizing.
- The average number of internal addresses of a message is around 3. However, by the time a message has aged to the point it is eligible for Mailbox Archiving, typically only between 1.2 and 1.5 sharers of that message remain. 1.3 is a typical value used.
- A higher number of sharers of items in PST files is assumed; 2.4 is a typical value used. This is due to more useful items being moved to PST files and not being deleted subsequently.
- Messages, including any attachments, will typically be compressed within Enterprise Vault by between 30% and 50%, depending on the nature of any attachments. For example, if the organization has a higher than typical number of image, audio or media attachments, the compression ratio may be toward the lower end of the scale. 50% compression is usually used for presales sizing.
- A typical site will have between 4 and 8 hours per day that can be dedicated to Mailbox Archiving. This is to avoid work hours and backups (including Enterprise Vault backups). Usually for a presales sizing, 6 hours per day is assumed, and it is assumed that archiving will be run 5 nights per week.
- Prior to archiving, approximately 70% of a typical Exchange Mailbox Store is older than 60 days.
- According to Symantec benchmarks [1], a properly configured dual-processor EV server is capable of archiving approximately 25,000 messages per hour. A quad-processor server is capable of archiving approximately 40,000 messages per hour. A single-processor server could archive around 15,000 messages per hour. These benchmarks are based on an average message size of 70KB, with a typical mix of messages with and without attachments, and a typical mix of common attachment types.
- The storage consumed by Medium Indexing is typically around 8% of the original size of the messages. So, for 100GB of unarchived messages, the resulting index will be approximately 8GB when they are archived. For Full Indexing, it is typically around 12%, and for Brief Indexing it is around 3%. Unless requested otherwise, Medium Indexing is used, except in Asian countries where Full Indexing is required in order to support the Asian written languages.
- It is assumed that the organization will have between 250 and 260 official workdays per year, and storage sizing is typically based on this. Additional messages sent or received on weekends and holidays are factored into these figures. Generally, 250 workdays per year is assumed.

[1] Symantec Enterprise Vault EV 7.0 Performance Guide, February 2007
Rules of Thumb Used In Presales Sizing Estimates

Combining the typical values covered in the previous section gives some simple sizing rules of thumb that can be used in presales sizing (and in post-sales formal designs when actual data is not available or cannot be relied upon). These rules are based on a very simple model, with zero growth in message sizes or daily volumes, and with a single Vault Store Partition (i.e. a single EV Server running a Storage Service) to maximize single-instance storage. These additional factors are included in the detailed sizing later in this document.

Rule of Thumb #1: A dual-processor EV server can support Mailbox Archiving from up to 7500 mailboxes

Based on:
- Average of 20 messages archived per day from each mailbox
- Typical mix of messages with and without attachments, typical selection of attachment types
- 6 hour archiving window
- EV server capable of archiving 25,000 messages per hour
- Properly configured EV server, 2.8GHz Xeon, with minimum 2GB RAM, dedicated mirrored pair of physical disks for MSMQ data files, locally attached or SAN storage (or NAS storage over LAN-speed network), adequate SQL server

Notes:
- A single-processor EV server (15,000 messages/hour) could support about 4,500 mailboxes
- A quad-processor EV server (40,000 messages/hour) could support about 12,000 mailboxes

This Rule of Thumb has been derived by taking the number of messages the server can archive in the typical number of hours available for mailbox archiving during a weekday, and dividing it by the average number of messages to archive per day from each mailbox; the result is the number of mailboxes the server can support. 20 messages per mailbox per day is a typical figure, based on Symantec field observations, and is in the range commonly seen when a customer has a moderate age-based archiving policy (e.g. greater than 60 days).

Rule of Thumb #2: Storage required for 100GB of Mailbox Archiving Backlog
- 48GB Vault Store (i.e. roughly 50% of original size)
- 8GB Indexes (Medium level indexing) (i.e. 8% of original size)
- 1GB SQL

Based on:
- Typical mix of messages with and without attachments, typical selection of attachment types
- Average message size, in backlog, of 50KB
- Single Vault Store Partition
- Single-instance ratio of 1.3
- Compression of 50%

Notes:
- This can be scaled linearly for more or fewer mailboxes, or for larger or smaller mailboxes
- For Full indexing, double the Index size, or for Brief indexing, halve the Index size

This Rule of Thumb is based on the size of the "Backlog"; that is, the portion of the current Exchange Mailbox Stores that immediately meets the archiving policy. The SQL size is simply the estimated number of items (averaging 50KB) in 100GB of backlog, multiplied by 500 bytes per item; this is the average size of the SQL entry added per item. The Medium Index size is simply the original size of the backlog messages, multiplied by 8%. The Vault Store size is calculated by multiplying the total backlog size by a typical compression factor (50%) and dividing by a typical single-instance factor (1.3), reflecting Enterprise Vault's single-instance storage within a Vault Store Partition. Then, 5KB per sharer is added, due to the per-sharer information held for each archived item; this includes information such as whether that person read, replied to or forwarded the message, what folder it was archived from, whether any follow-up flags were applied, etc.
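The arithmetic behind Rules of Thumb #1 and #2 can be sketched in a few lines of Python. This is only an illustration of the stated assumptions, not a Symantec sizing tool: the function and parameter names are hypothetical, binary units (1GB = 1024^3 bytes) are assumed, and one sharer is counted per backlog item when adding the 5KB per-sharer overhead.

```python
KB = 1024
GB = 1024 ** 3  # assumption: binary units


def mailboxes_supported(msgs_per_hour, window_hours=6, msgs_per_mailbox_per_day=20):
    """Rule of Thumb #1: mailboxes one EV server can archive within the window."""
    return msgs_per_hour * window_hours // msgs_per_mailbox_per_day


def backlog_storage_gb(backlog_gb, avg_msg_kb=50, compressed_fraction=0.5, sir=1.3,
                       index_fraction=0.08, sql_bytes_per_item=500,
                       sharer_overhead_kb=5):
    """Rule of Thumb #2: Vault Store, Index and SQL sizes (GB) for a backlog."""
    items = backlog_gb * GB / (avg_msg_kb * KB)
    # Compressed, single-instanced message data plus 5KB of per-sharer data per item.
    vault_store = backlog_gb * compressed_fraction / sir + items * sharer_overhead_kb * KB / GB
    index = backlog_gb * index_fraction
    sql = items * sql_bytes_per_item / GB
    return {"vault_store": vault_store, "index": index, "sql": sql}


if __name__ == "__main__":
    for name, rate in [("single", 15_000), ("dual", 25_000), ("quad", 40_000)]:
        print(name, mailboxes_supported(rate))
    print({k: round(v, 1) for k, v in backlog_storage_gb(100).items()})
```

With the default values the first function gives 4,500, 7,500 and 12,000 mailboxes, and the second gives roughly 48GB, 8GB and 1GB for a 100GB backlog, in line with the figures quoted in the two rules.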
Rule of Thumb #3: Storage required for migration of 100GB of PST files
- 26GB Vault Store
- 7GB Indexes (Medium level indexing)
- 0.8GB SQL

Based on:
- Typical mix of messages with and without attachments, typical selection of attachment types
- Average message size, in PST files, of 50KB
- Single Vault Store
- Average number of sharers of a message located in one or more PST files is 2.4
- Compression of 50%

Notes:
- This can be scaled linearly for a smaller or larger amount of PST data
- Each PST file is assumed to contain approximately 10% general overhead, and for each message an additional 5KB overhead. The amount of actual message data to migrate is determined from this.
- For Full indexing, double the Index size
- For Brief indexing, halve the Index size

This Rule of Thumb has been derived in a similar way to the previous rule (for the Mailbox "backlog"). Although it appears to be for the same amount of original data, the resulting storage requirements in EV are different. This is because:
  o The overhead in a PST means that for 100GB of PST files there is only approximately 83GB of actual message data
  o The higher number of sharers for PST items means a greater net reduction in the amount of storage needed when these items are migrated into a single Vault Store Partition.
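The PST overhead adjustment in Rule of Thumb #3 can be illustrated as follows. This is a sketch under assumptions: the whitepaper does not say exactly how the 10% general overhead is applied, so here the PST size is divided by 1.1 and the 5KB per-message overhead is then removed, which reproduces the "approximately 83GB" figure. All names are hypothetical and binary units are assumed.

```python
KB = 1024
GB = 1024 ** 3  # assumption: binary units


def pst_message_data_gb(pst_gb, avg_msg_kb=50, file_overhead=0.10, per_msg_overhead_kb=5):
    """Estimate the actual message data held in a given volume of PST files."""
    without_file_overhead = pst_gb / (1 + file_overhead)  # assumed interpretation of "10%"
    # Each message occupies its own size plus 5KB of overhead inside the PST.
    return without_file_overhead * avg_msg_kb / (avg_msg_kb + per_msg_overhead_kb)


def pst_migration_storage_gb(pst_gb, avg_msg_kb=50, compressed_fraction=0.5, sharers=2.4,
                             index_fraction=0.08, sql_bytes_per_item=500,
                             sharer_overhead_kb=5):
    """Vault Store, Index and SQL sizes (GB) for migrating PST files."""
    data_gb = pst_message_data_gb(pst_gb, avg_msg_kb)
    items = data_gb * GB / (avg_msg_kb * KB)
    vault_store = data_gb * compressed_fraction / sharers + items * sharer_overhead_kb * KB / GB
    return {
        "message_data": data_gb,
        "vault_store": vault_store,
        "index": data_gb * index_fraction,
        "sql": items * sql_bytes_per_item / GB,
    }


if __name__ == "__main__":
    print({k: round(v, 1) for k, v in pst_migration_storage_gb(100).items()})
```

Under these assumptions 100GB of PST files yields a little under 83GB of message data, a Vault Store of about 25.5GB, an index of about 6.6GB and roughly 0.8GB of SQL, close to the rounded figures in the rule.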
Rule of Thumb #4: Storage required for 1000 mailboxes for 1 year Steady-state Mailbox Archiving
- 161GB Vault Store
- 29GB Indexes (Medium level indexing)
- 2.4GB SQL

Based on:
- Average of 20 messages archived per day from each mailbox
- Typical mix of messages with and without attachments, typical selection of attachment types
- Average message size of 75KB
- 250 workdays per year
- Zero growth in message size or volumes during the year
- Single Vault Store Partition
- Single-instance ratio of 1.3
- Compression of 50%

Notes:
- This can be scaled linearly for more or fewer mailboxes, and more or fewer messages per day
- For Full indexing, double the Index size (60GB per 1000 mailboxes)
- For Brief indexing, halve the Index size (15GB per 1000 mailboxes)

This Rule of Thumb is derived by multiplying the typical number of messages archived per mailbox per day by the number of mailboxes and the typical number of workdays per year. The total size is based on this figure, multiplied by a typical average message size. The SQL, Index and Vault Store estimates are then based on these figures:
  o SQL = number of messages multiplied by 500 bytes
  o Medium Indexes = total message size multiplied by 8%
  o Vault Stores = total size multiplied by compression and divided by SIR, then 5KB added for each sharer (the total number of messages archived with no SIS factored)
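The steady-state derivation above can be written out as a short Python sketch. The names are hypothetical and binary units are assumed, so the SQL figure comes out slightly below the published 2.4GB, which may have been rounded using different units.

```python
KB = 1024
GB = 1024 ** 3  # assumption: binary units


def steady_state_storage_gb(mailboxes=1000, msgs_per_mailbox_per_day=20, workdays=250,
                            avg_msg_kb=75, compressed_fraction=0.5, sir=1.3,
                            index_fraction=0.08, sql_bytes_per_item=500,
                            sharer_overhead_kb=5):
    """Rule of Thumb #4: one year of steady-state Mailbox Archiving, zero growth."""
    messages = mailboxes * msgs_per_mailbox_per_day * workdays
    total_gb = messages * avg_msg_kb * KB / GB
    return {
        "messages": messages,
        # Compressed and single-instanced data, plus 5KB for every archived reference.
        "vault_store": total_gb * compressed_fraction / sir
                       + messages * sharer_overhead_kb * KB / GB,
        "index": total_gb * index_fraction,
        "sql": messages * sql_bytes_per_item / GB,
    }


if __name__ == "__main__":
    print(steady_state_storage_gb())
    # The rule scales linearly, e.g. for a 5000 mailbox site:
    print(steady_state_storage_gb(mailboxes=5000))
```

For 1000 mailboxes this gives 5,000,000 messages per year, a Vault Store of about 161GB and an index of about 29GB.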
Rule of Thumb #5: Storage required for 1000 mailboxes for 1 year Journal Archiving
- 270GB Vault Store
- 38GB Indexes (Medium level indexing)
- 3.2GB SQL

Based on:
- Each user sends and receives an average of 80 messages per workday
- Each message has an average of 3 internal addresses
- Typical mix of messages with and without attachments, typical selection of attachment types
- Average message size of 75KB
- 250 workdays per year
- Zero growth in message size or volumes during the year
- Single Vault Store Partition
- Single Journal Mailbox / Single-instance ratio of 1.0
- Compression of 50%

Notes:
- This can be scaled linearly for more or fewer mailboxes, and more or fewer messages per day
- For Full indexing, double the Index size (roughly 76GB per 1000 mailboxes)
- For Brief indexing, halve the Index size (roughly 19GB per 1000 mailboxes)

This Rule of Thumb has been derived by estimating the number of Journal Messages per year: the total number of messages sent and received by internal users over the year, divided by the typical number of internal addressees. This Rule of Thumb assumes that all Journal Messages will go to a single Journal Mailbox; for scenarios with multiple Journal Mailboxes please refer to Part 2 of this whitepaper.

Rule of Thumb #6: Bandwidth required to archive across a WAN connection is 1.5 kilobits/sec per remote mailbox

Based on:
- Average of 20 messages archived per day from each remote mailbox
- Typical mix of messages with and without attachments, typical selection of attachment types
- Average message size of 75KB
- Four hour archiving window

Notes:
- This is based on the EV server being located centrally and archiving across the WAN from a remote Exchange server
- Bandwidth required is to transmit the total amount of data to archive, plus approximately a 50% MAPI protocol overhead, in the allotted archiving window.
Note that this does not include the (minimal) bandwidth required to return to the Exchange server, after the Vault Store backups have completed, to replace safety copies with Enterprise Vault shortcuts.
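The Journal storage and WAN bandwidth rules can be reproduced with the sketch below. Names are hypothetical and binary units are assumed. Note that the straightforward bandwidth arithmetic lands slightly under the published 1.5 kilobits/sec; the published value presumably includes some rounding up, which is an assumption here rather than something the whitepaper states.

```python
KB = 1024
GB = 1024 ** 3  # assumption: binary units


def journal_storage_gb(mailboxes=1000, msgs_per_user_per_day=80, internal_addresses=3,
                       workdays=250, avg_msg_kb=75, compressed_fraction=0.5, sir=1.0,
                       index_fraction=0.08, sql_bytes_per_item=500,
                       sharer_overhead_kb=5):
    """Rule of Thumb #5: one year of Journal Archiving to a single Journal Mailbox."""
    # Each journalled message is counted once, however many internal users saw it.
    journal_msgs = mailboxes * msgs_per_user_per_day * workdays / internal_addresses
    total_gb = journal_msgs * avg_msg_kb * KB / GB
    return {
        "journal_messages": journal_msgs,
        "vault_store": total_gb * compressed_fraction / sir
                       + journal_msgs * sharer_overhead_kb * KB / GB,
        "index": total_gb * index_fraction,
        "sql": journal_msgs * sql_bytes_per_item / GB,
    }


def wan_kbits_per_sec_per_mailbox(msgs_per_day=20, avg_msg_kb=75, mapi_overhead=0.5,
                                  window_hours=4):
    """Rule of Thumb #6: bandwidth to archive one remote mailbox within the window."""
    bits = msgs_per_day * avg_msg_kb * KB * 8 * (1 + mapi_overhead)
    return bits / (window_hours * 3600) / 1000


if __name__ == "__main__":
    print({k: round(v, 1) for k, v in journal_storage_gb().items()})
    print(round(wan_kbits_per_sec_per_mailbox(), 2), "kilobits/sec per remote mailbox")
```

With the default values the journal estimate is about 270GB of Vault Store and 38GB of index for roughly 6.7 million Journal Messages per year, and the bandwidth estimate is about 1.3 kilobits/sec per remote mailbox.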
Rule of Thumb #7: A dual-processor EV server can support Journal Archiving from approximately 15,000 mailboxes (or 400,000 messages per day) [2]

Based on:
- Up to 400,000 messages being Journalled per day (each user sending and receiving 80 messages per day, and these having an average of 3 internal Addresses)
- Typical mix of messages with and without attachments, typical selection of attachment types
- Journal Archiving running throughout the day, but this Rule of Thumb assumes 90% of messages will be Journalled within a 16 hour period each day. This is to allow time for Enterprise Vault backups, and to give some room for growth or contingency for server downtime.
- Journaling to a single Journal Mailbox to avoid duplication of Journal Messages.
- EV server capable of archiving 25,000 messages per hour
- Properly configured EV server, 2.8GHz Xeon, with minimum 2GB RAM, dedicated mirrored pair of physical disks for MSMQ data files, locally attached or SAN storage (or NAS storage over LAN-speed network), adequate SQL server

Notes:
- A single-processor EV server could support Journal Archiving for about 9,000 mailboxes or 240,000 messages per day
- A quad-processor EV server could support Journal Archiving for about 26,000 mailboxes; however, in these large environments it is likely that multiple Journal Mailboxes would be configured and multiple EV servers would be used for redundancy. These factors are considered more in Part 2 of this whitepaper.
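The Journal throughput rule can be checked with the following sketch, which reproduces the unrounded figures the whitepaper gives in its footnote to this rule (about 16,667 mailboxes or 444,444 messages per day for a dual-processor server). The function and parameter names are hypothetical.

```python
def journal_capacity(msgs_per_hour=25_000, busy_hours=16, share_in_busy_hours=0.9,
                     msgs_per_user_per_day=80, internal_addresses=3):
    """Rule of Thumb #7: daily Journal Messages and mailboxes one server can support."""
    # What the server can archive during the 16 hour busy period.
    in_busy_period = msgs_per_hour * busy_hours
    # That period is assumed to carry 90% of the day's Journal Messages.
    per_day = in_busy_period / share_in_busy_hours
    # One Journal Message is shared by about 3 internal users' sent/received counts.
    journal_msgs_per_user = msgs_per_user_per_day / internal_addresses
    return {
        "in_busy_period": in_busy_period,
        "messages_per_day": per_day,
        "mailboxes": per_day / journal_msgs_per_user,
    }


if __name__ == "__main__":
    for name, rate in [("single", 15_000), ("dual", 25_000), ("quad", 40_000)]:
        result = journal_capacity(msgs_per_hour=rate)
        print(name, {k: round(v) for k, v in result.items()})
```

The rounded values quoted in the rule (15,000 mailboxes, 400,000 messages per day) correspond to the 16 hour busy-period figure alone, which leaves the remaining headroom as contingency.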
Rule of Thumb #8: A dual-processor EV server can archive approximately 2GB per hour

Based on:
- Typical mix of messages with and without attachments, typical selection of attachment types
- EV server capable of archiving 25,000 messages per hour
- Archiving from local Exchange Servers with no performance bottlenecks
- Properly configured EV server, 2.8GHz Xeon, with minimum 2GB RAM, dedicated mirrored pair of physical disks for MSMQ data files, locally attached or SAN storage (or NAS storage over LAN-speed network), adequate SQL server

Notes:
- A single-processor EV server could archive approximately 1.2GB per hour
- A quad-processor EV server could archive approximately 2.7GB per hour

Note that there are no rules of thumb for sizing Public Folders. This is because use of Public Folders varies enormously between organizations and no consistent or typical usage patterns have been observed. However, the calculations required to size throughput and storage for Public Folder archiving are contained in the postsales section of this whitepaper.

[2] Note this has been rounded down for simplicity; the actual figures based on the calculations are 16,667 mailboxes or 444,444 messages per day.
Factors That May Affect These Rules of Thumb

There are a number of factors that may affect these Rules of Thumb. It is worth being acquainted with them, both to decide whether it is appropriate to use these Rules of Thumb for an EV solution sizing and to be able to articulate them to the customer when presenting a high-level EV solution design or sizing estimate.

Low mailbox quotas. If low (e.g. 30MB or less) mailbox quotas are in place, it is likely that users have been encouraged, or forced, to move messages out of their mailboxes and into PST files. Ideally, once Enterprise Vault has been implemented, this practice will become unnecessary and will cease; however, it is important to identify whether this user behavior has been prevalent because it may have affected the data on which the sizing is based. For example, if the ESR has been used to analyze the contents of mailboxes, it may have returned an average mailbox growth of 5 messages per day, and perhaps an average message size of only 20KB. These figures are lower than we'd expect to see for most sites but are absolutely typical of a site where a low mailbox quota is in place, and users are moving a lot of messages (and especially larger messages) to PST files, or deleting them. In this case, it is invalid to base a sizing on only 5 messages per mailbox per day, or a low average message size, because once EV is implemented it would be expected that usage patterns shift toward the norm. It would be better to explain the situation to the customer and use typical values in the sizing rather than the actuals obtained from the ESR.

High, or non-existent, mailbox quotas. Conversely, if mailboxes have a very high (>500MB) or non-existent mailbox quota, the users may very rarely delete anything.
A typical user sends and receives about 80 messages per day, of which only a fraction remain by the time they become eligible for archiving (say after 60 days). Typically this will be between 15 and 25 messages. Users will normally delete a large portion of the messages they receive to keep their mailbox free of clutter and below their mailbox quota. However, if they have a very generous mailbox quota, or none set at all, they may not be so inclined to do that daily housekeeping, and so the number of messages that remain in the mailboxes will be higher than our typical rules of thumb. This will be a factor when calculating both backlog and steady-state sizing.

Quota-based archiving. Most of the calculations and rules of thumb in this whitepaper are based on age-based archiving policies rather than quota-based. This is by far the predominant method of setting up Mailbox Archiving. With quota-based archiving it is less easy to predict the number of messages to be archived each day, although generally it will be less than if an age-based archiving policy is implemented (because only those mailboxes that are over the defined threshold are targeted). For sizing purposes it is usual to base the estimates on a typical age-based policy, with the caveat that they will probably be higher than those that will actually result from a quota-based policy.

Multiple Vault Stores. The Vault Store sizing is based on a typical single-instance ratio (SIR) of 1.3. This means that for, say, every 130 messages found in users' mailboxes there are only 100 different messages, the other 30 being duplicates. This SIR is commonly observed in production EV sites where single Vault Stores are used. However, if multiple Vault Stores are used (either to have a logical split, e.g. per Exchange Server, or a physical split, because multiple EV servers are needed) then this single-instancing will be less effective because single-instancing does not span multiple Vault Stores.
If multiple Vault Stores are used it should be assumed that there is a lower rate of single-instancing (i.e. the SIR equals 1.1, or even 1.0), and any additional single-instancing that does occur can then be seen as upside. An exception to this might be when functional groups of users have been assigned to Vault Stores, so that the chance of all instances of a message being archived to the same Vault Store is increased. This allocation of users to specific Vault Stores is extremely easy using Granular Provisioning in Enterprise Vault V7.0.

Multiple Vault Store Partitions. Within each Vault Store it is possible to split the storage across multiple Vault Store Partitions, only one of which is open for writing at any time. This is generally done for scalability (i.e. to expand the storage across additional physical partitions or devices), or to reduce backup times (only the currently open Vault Store Partition needs to be backed up on a regular basis). However, once a Vault Store Partition is closed, it cannot be written to, and so any subsequent sharers of a message that has already been archived into that partition will result in a new copy of the message
being written to the currently open Vault Store Partition. As the Vault Store Partitions are date based (based on the original date of the message), this rollover of Vault Store Partitions will only have a significant effect during backlog archiving or PST migration (when messages are not necessarily being archived in chronological order, and so a partition could be closed before a second or later instance of a message is archived). The single-instancing for steady-state and journal archiving should not be affected significantly by rolling over Vault Store Partitions.

Using Enterprise Vault Collections. If a Vault Store Partition has been configured to use Enterprise Vault Collections (where multiple messages are gathered into a smaller number of collection files to make more efficient use of disk space and reduce backup times, and optionally as preparation for moving to secondary storage) then this could also affect single-instancing. Once a message has been collected, a subsequent instance of the message being archived will result in a new copy of the message being created in the Vault Store Partition. Again, this will have most impact on backlog archiving and PST migration, where messages are not archived in chronological order. Steady-state and journal archiving should be less affected, as Enterprise Vault Collection is normally configured with a delay of a few days between archiving and collection, and this should allow any additional instances to be fully shared before the message is collected.

Remote Exchange servers. Although it is ideal to have a centralized EV solution, archiving from remote Exchange servers across the WAN, there are a number of factors that may mean that archiving, or users accessing archived data, across the WAN is not practical. In this case a separate EV server in the remote location may be required, and this will need to be sized separately (both for archiving throughput and storage).
Because multiple EV servers are then in use, the effect on single-instancing of having multiple Vault Store Partitions (see previous) also needs to be considered. However, it may also be that the majority of email for the remote site's users flows within that site, and therefore single-instancing may not be adversely affected by having a separate Vault Store Partition. This is not an exact science.

Multiple Journal Mailboxes. Ideally, the customer will configure Exchange Journaling so that all journal messages are sent to a single Journal Mailbox on an existing or dedicated Exchange server. This means that only one journal message is sent per message, irrespective of how many Exchange servers and Mailbox Stores hold the mailboxes of the message's addressees. However, if multiple Journal Mailboxes are configured, and unless mailboxes have been functionally grouped into Mailbox Stores, it is likely that multiple copies of journal messages will be generated. For example, with 3 Journal Mailboxes and 3 addresses per message, 100 unique messages may result in around 210 journal messages being generated. This is a complex calculation based on probabilities of single and multiple journal copies per message.
For 3 sharers and 2 Journal Mailboxes the factor is
For 3 sharers and 3 Journal Mailboxes, the factor is
For 3 sharers and 4 Journal Mailboxes the factor is
Although single-instancing will be reinstated if all these Journal Mailboxes are archived to the same Vault Store, the EV servers must still do more work, as each journal message must be retrieved from the Journal Mailboxes and processed. Therefore, it is preferable to have a single Journal Mailbox. However, this may not always be practical, as each Journal Mailbox may only be processed by one EV server and there are limits to how many items per hour can be archived (e.g. 25,000 messages per hour using a dual-processor EV server).

A non-typical mix of message types.
All the standard EV benchmarks and rules of thumb are based on a typical office user, whose email has about 20% with attachments, and whose attachments are typical office documents (Word, Excel, PPT, PDF etc.). If this is not the case at the customer, then the rules of thumb may need to be adjusted. For example, consider a customer that sends around a high proportion of high-quality image files: the average message size will be larger, the compression we can achieve within the archive will be less, and the indexing overhead will be less (as all we can index of an image file is the meta-information, not the content). Large ZIP files also compress little but require more processing to create the indexible stream from which the contents are indexed.

A large proportion of low usage or inactive mailboxes. This tends to occur with larger organizations, especially those that have a lot of field workers. For example, a large state railway department in Australia has 12,000 mailboxes, but the majority of these use email rarely, as they belong to train drivers, station workers, maintenance engineers, track workers etc. In this case, the average number of
messages archived per day from each mailbox, with a 28-day policy, is very low (about 5 messages per day per mailbox), but this has been skewed low by the inactive or low-use mailboxes. The office users have a more typical number of messages (15 to 25) being archived per day. Therefore, it is important when doing a high-level sizing to understand the mix of normal vs. low-use mailboxes and adjust accordingly. It would be wrong to assume that all 12,000 mailboxes have 20 messages per day archived from them, as this would potentially lead to far more servers and storage being purchased than really necessary.

Specialized Storage Devices. These Rules of Thumb are based on storing the archived messages in DVS files (savesets) on a traditional NTFS storage device, and on typical compression and single-instancing values for this platform. Specialized storage, such as EMC Centera™ and WORM storage, will have different characteristics: sometimes superior to NTFS storage, sometimes not. This needs to be factored in when sizing a solution that will use a specialized storage device. For example, if WORM storage is being used as primary Vault Storage then no single-instancing will occur (as the DVS file cannot be updated with information for the second and subsequent sharers). If a device such as an EMC Centera™ is being used, the format in which the archived data is stored is different, and so a different sizing calculation must be used. This is outside the scope of this whitepaper.

When One EV Server May Not Be Enough

There are a number of circumstances in which a single EV server may not be enough. These include:

Steady-state archiving throughput requirements. There will be situations where a single EV server cannot archive enough data in the time available each day.
This may be because the number of mailboxes is high (>7500 is a good rule of thumb), or a sufficient archiving window (approximately 1 hour for every 1250 mailboxes) is not available per day, or more than the typical number of messages per day need to be archived.

Backlog or PST Migration throughput requirements. If there are specific requirements that the backlog needs to be cleared in a certain period (for example, prior to a planned Exchange migration or server consolidation) or PST files must be migrated by a certain date, then the archiving throughput needs to be considered and this may result in additional EV servers being required. Note that even if deploying additional EV servers is a temporary solution, splitting the archive into multiple Vault Stores (a requirement of having multiple EV servers running Storage Services) will be permanent and some single-instancing may be lost. This is a design trade-off that needs to be made in order to archive the backlog or PST files more quickly.

Insufficient available network bandwidth. Specifically, between a remote site with Exchange Servers and a centralized Enterprise Vault site. This would necessitate either a full independent EV solution being deployed at that site, or an EV server which is part of the existing EV site being deployed, temporarily or permanently, at the remote site. This is outside the scope of this whitepaper and, in any case, should really be referred to an experienced Enterprise Vault design consultant.

High-availability requirements. Multiple EV servers may be required in order to implement a high-availability solution. These may be configured using EV Building Blocks or a clustering solution supported by EV; currently Veritas Cluster Services and Microsoft Cluster Services. This is outside the scope of this whitepaper and, in any case, should really be referred to an experienced Enterprise Vault design consultant.

Administrative boundaries.
It may be that an Exchange Server at a remote site is managed by a different group and therefore the EV server may also be required to separate administrative responsibilities. Client access considerations. This is rarely a concern with Enterprise Vault, as users typically access only a small number of messages from the archive per day. However, if it is believed that users will access a large number of messages per day (or a number of large messages) then additional EV servers may be required to provide optimal response times. Generally, this is more the case if the users
are remote and have to access the archived items across a WAN; in which case a local EV server may be a better option. Note: It may be a better solution to implement only a centralized EV server and have the remote users utilize Exchange 2003 in Cached Mode and Offline Vault, as it means users will maintain a local cache of their archive and only download messages from the centralized EV server when the archived content is not found locally.

Heavy use of Archive Explorer, Enterprise Vault Search or Discovery Accelerator. A single Enterprise Vault server can keep approximately 24 users' Indexes in memory at once. If fewer than 24 users are actively searching their Index (which includes an Archive Explorer folder refresh) at the same time, then typical search times will be less than a second. However, if more than this number are attempting to search at exactly the same time, search times will start to increase, as Indexes need to be swapped out of memory to make room for other Indexes being searched. It should be remembered, though, that this is an archive of older messages, and heavy concurrent searching by a lot of users is rare. A general rule is that each Enterprise Vault server can support around 1,000 Archive Explorer users per hour. That is, if all the users invoked and used Archive Explorer for the first time in the space of one hour, that system can support 1,000 users. If the use was spread over five hours, that system could support 5,000 users. It is rare for that number of users to be active Archive Explorer users, even when they are all accessing the same Enterprise Vault server.
Based on this rule of thumb, and the fact that a single EV server can typically support archiving from 7,500 mailboxes, it follows that a server that is sized appropriately for archiving throughput should also be able to support the typical number of Searches and Archive Explorer users expected for that number of mailboxes.

Additional EV components. For example, adding File System Archiving may necessitate adding an additional EV server to the EV site, as the total archiving throughput required can no longer be accommodated by a single EV server. This may not be the case, as perhaps File System Archiving can be performed during the weekend while the Mailbox Archiving runs daily, but this should be considered. Running File System Archiving and Mailbox Archiving at the same time on the same EV server invariably means that not much Mailbox Archiving is done, as File System Archiving effectively jumps the queue by bypassing the message queuing that Mailbox Archiving uses and going directly to the Storage Service. In addition, some EV components simply cannot exist on the same EV server; these are Discovery Accelerator and Compliance Accelerator.

Large Discovery Accelerator Searches or Production Runs. If this is a requirement of the EV solution, Symantec Professional Services should be engaged to assist with sizing the solution.

Example: Sizing for Mailbox Archiving, Journal Archiving and PST Migration for a 5000 user site

The site in this example is as follows:
    There are two Exchange servers in the data-centre, with a total of 5000 active mailboxes, evenly divided between the two servers.
    The ESR has not been run, and the customer does not know what their average message sizes and daily volumes are, but assumes they are typical.
    The users have a 50MB mailbox quota, but an export of the mailbox list from Exchange System Manager has shown that the current average mailbox size is only 25MB.
    There is also a total of 500GB of PST files to be migrated from file servers.
There are no requirements to have the Mailbox backlog or PST files archived by a certain date. The customer has six hours per workday evening that they can schedule Mailbox Archiving. They will run PST migration during the day as this is optimal when using EV client-driven PST migration. The customer has asked for an indicative sizing for Mailbox Archiving for the next two years, including initial requirements for the PST migration and mailbox backlog. The steps to do the sizing estimate are as follows:
Determine number and specification of EV Servers

1. Are multiple EV servers required due to network topology?
No, both Exchange Servers are centrally located.

2. Are there requirements to clear the backlog or PST migration by a certain date?
No.

3. Determine EV servers required for steady-state Mailbox Archiving (using Rule of Thumb #1)
The number of messages to archive per day is assumed to be typical and we have six hours per day to do archiving. From our Rule of Thumb, a dual-processor EV server can support up to approximately 7,500 users. A single-processor server could support up to about 4,500 mailboxes, but this does not allow enough room for growth, so a dual-processor server would be a better choice.

4. Determine EV servers required for Journal Archiving (using Rules of Thumb #7)
The number of messages to archive per day is assumed to be typical. From our Rule of Thumb, a dual-processor EV server can support Journal Archiving for up to 15,000 users and a single-processor server can support Journal Archiving for about 9,000 users. Either would be fine for Journal Archiving, but a dual-processor server is needed to meet the Mailbox Archiving requirements.

5. Will multiple types of archiving run concurrently?
PST Migration will occur during the day, and coincide with Journal Archiving. However, there are no time-critical requirements for PST migration and, in any case, there is sufficient archiving throughput capacity to deal with PST migration and Journal Archiving concurrently. A small amount of Journal Archiving will also coincide with the scheduled Mailbox Archiving. However, the journal volumes at these times are small and the EV server has adequate throughput capacity to cover this.

Conclusion: 1 dual-processor EV server will be sufficient.

Determine number of SQL Servers

6. How many SQL servers are needed?
Best practice is to have 1 SQL Server for up to 8 EV servers.
Conclusion: Clearly, in this case, a single SQL server will suffice, and an existing SQL Server will most likely cope with the load.

Determine the Storage required

7. Estimate storage required for backlog Mailbox Archiving (using Rules of Thumb)
We have 5000 mailboxes, averaging 25MB in size, and we'll assume 70% will be archived in the backlog. This gives an estimated 85GB of physical data, but we need to multiply it by our typical single-instance ratio of 1.3 to get a total message size excluding SIS, as this is what our Rule of Thumb is based on. So, we have approximately 110GB of messages to archive from the backlog. Compare this to one of our Rules of Thumb:

Rule of Thumb #2: Storage required for 100GB of Mailbox Archiving Backlog
    48GB Vault Store (i.e. roughly 50% of original size)
    8GB Indexes (Medium level indexing) (i.e. 8% of original size)
    1GB SQL

Compared to our rule of thumb, we have about 10% more message data to archive. If we assume everything else is equal, as we must for these purposes, then we'll multiply our rule of thumb figures by 110% to get:
    52GB for Vault Stores
    9GB for Medium Indexes
    1.1GB for SQL databases
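The arithmetic in step 7 can be sketched in a few lines of Python. This is only an illustration of the calculation described above: the function and dictionary names are hypothetical, 1GB is assumed to be 1024MB, and the backlog is rounded to the nearest 10GB as the text does before scaling.

```python
# Rule of Thumb #2 from the text: storage needed per 100GB of mailbox backlog.
RULE_2_PER_100GB = {"vault_store_gb": 48.0, "indexes_gb": 8.0, "sql_gb": 1.0}


def backlog_message_gb(mailboxes, avg_mailbox_mb, archived_fraction, sir):
    """Total message size of the backlog (before single-instancing), in GB."""
    physical_gb = mailboxes * avg_mailbox_mb * archived_fraction / 1024  # assumes 1GB = 1024MB
    return physical_gb * sir


def scale_rule(rule, amount, per):
    """Scale every figure in a rule of thumb linearly by amount / per."""
    return {name: value * amount / per for name, value in rule.items()}


# 5000 mailboxes, 25MB average, 70% archived in the backlog, SIR of 1.3.
backlog_gb = backlog_message_gb(5000, 25, 0.70, 1.3)

# Round to the nearest 10GB (about 110GB), then scale the 100GB rule of thumb.
estimate = scale_rule(RULE_2_PER_100GB, round(backlog_gb, -1), 100)
print(round(backlog_gb), estimate)
```

Run unrounded, the scaled figures come out at roughly 52.8GB, 8.8GB and 1.1GB, which is close to the rounded presales figures quoted in the worked example.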
8. Estimate storage for PST Migration (using Rules of Thumb)
We have 500GB of PST files to migrate. Using our PST migration Rule of Thumb:

Rule of Thumb #3: Storage required for migration of 100GB of PST files
    26GB Vault Store
    7GB Indexes (Medium level indexing)
    0.8GB SQL

Compared to our rule of thumb, we have five times the volume of PST files (500GB), so we must multiply our rule of thumb figures by that amount to get the following approximate estimates:
    130GB for Vault Stores
    35GB for Medium Indexes
    4GB for SQL databases

9. Estimate storage for steady-state Mailbox Archiving for two years.
We have 5000 mailboxes, which the customer assumes are typical, so we will use our Rule of Thumb:

Rule of Thumb #4: Storage required for 1000 mailboxes for 1 year Steady-state Mailbox Archiving
    161GB Vault Store
    29GB Indexes (Medium level indexing)
    2.4GB SQL

Compared to our rule of thumb, we have five times the number of mailboxes, so we must multiply our rule of thumb figures by that amount:
    805GB Vault Store
    145GB Indexes (Medium level indexing)
    12GB SQL

For these presales estimates we assume that message sizes and daily volumes do not increase, and so for two years we simply multiply the above figures by two:
    1610GB Vault Store
    290GB Indexes (Medium level indexing)
    24GB SQL

10. Estimate storage for Journal Archiving for two years.
We have 5000 mailboxes, which the customer assumes are typical, so we will use our Rule of Thumb:

Rule of Thumb #5: Storage required for 1000 mailboxes for 1 year Journal Archiving
    270GB Vault Store
    38GB Indexes (Medium level indexing)
    3.2GB SQL

Compared to our rule of thumb, we have five times the number of mailboxes, so we must multiply our rule of thumb figures by five:
    1350GB Vault Store
    190GB Indexes (Medium level indexing)
    16GB SQL

For these presales estimates we assume that message sizes and daily volumes do not increase, and so for two years we simply multiply the above figures by two:
    2700GB Vault Store
    380GB Indexes (Medium level indexing)
    32GB SQL
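Steps 9 and 10 apply the same linear scaling: the per-1000-mailbox, per-year figures are multiplied by the number of thousands of mailboxes and by the number of years. A minimal Python sketch follows, using Rules of Thumb #4 and #5 as quoted above; the function name is hypothetical, and, as in the text, no growth in message size or volume is assumed.

```python
# Rules of Thumb #4 and #5 from the text: GB per 1000 mailboxes per year.
RULE_4_MAILBOX = {"vault_store_gb": 161.0, "indexes_gb": 29.0, "sql_gb": 2.4}
RULE_5_JOURNAL = {"vault_store_gb": 270.0, "indexes_gb": 38.0, "sql_gb": 3.2}


def steady_state_storage(rule_per_1000_per_year, mailboxes, years):
    """Scale a per-1000-mailbox, per-year rule of thumb (no growth assumed)."""
    factor = (mailboxes / 1000) * years
    return {name: round(value * factor, 1)
            for name, value in rule_per_1000_per_year.items()}


mailbox_two_years = steady_state_storage(RULE_4_MAILBOX, 5000, 2)
journal_two_years = steady_state_storage(RULE_5_JOURNAL, 5000, 2)
print(mailbox_two_years)
print(journal_two_years)
```

For a detailed sizing, the constant factor would need to be replaced by a year-by-year calculation that applies the growth rates discussed in Part 2.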
11. Combining all of the above storage estimates we arrive at the following:

                 Backlog    PST Migration    2 Years Mailbox Archiving    2 Years Journal Archiving    Totals
Vault Store      52GB       130GB            1610GB                       2700GB                       4492GB
Indexes          9GB        35GB             290GB                        380GB                        714GB
SQL databases    1.1GB      4GB              24GB                         32GB                         61.1GB
Totals           62.1GB     169GB            1924GB                       3112GB                       5267.1GB

Conclusion: An estimated 5267GB of storage will be required for the first two years, including the initial backlog and PST migration.
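The combined total can be cross-checked by summing the estimates from steps 7 to 10. The sketch below uses the rounded figures quoted in those steps; the dictionary layout and variable names are illustrative only.

```python
# Estimates (in GB) quoted in steps 7 to 10 of the worked example.
ESTIMATES_GB = {
    "Backlog": {"Vault Store": 52, "Indexes": 9, "SQL databases": 1.1},
    "PST Migration": {"Vault Store": 130, "Indexes": 35, "SQL databases": 4},
    "2 Years Mailbox Archiving": {"Vault Store": 1610, "Indexes": 290, "SQL databases": 24},
    "2 Years Journal Archiving": {"Vault Store": 2700, "Indexes": 380, "SQL databases": 32},
}

components = ["Vault Store", "Indexes", "SQL databases"]

# Total per storage component across all four archiving activities.
component_totals = {
    c: round(sum(phase[c] for phase in ESTIMATES_GB.values()), 1) for c in components
}

# Total per archiving activity across all three storage components.
phase_totals = {
    name: round(sum(phase.values()), 1) for name, phase in ESTIMATES_GB.items()
}

grand_total = round(sum(component_totals.values()), 1)
print(component_totals, phase_totals, grand_total)
```

Summing this way reproduces the 5267GB conclusion and makes it easy to see that the two years of Journal Archiving account for the largest share of the storage.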
Part 2 - Detailed Sizing

This section goes into sizing in more detail, and is aimed at Consultants doing formal solution designs and System Engineers needing to go to a deeper level for presales sizing. The sizing techniques in this section use actual data obtained from analyzing the customer environment, or best-guess estimates about the customer's environment, rather than the generic figures on which the sizing in Part 1 is based. In addition, other factors are introduced, such as annual growth rates for message volumes, sizes and mailbox numbers.

Getting Data on Which to Base a Detailed Sizing

To produce a detailed sizing report, the following information is needed:

Mailbox Backlog Archiving
    The number of messages in the backlog.
    The total size of the messages in the backlog.
    The average size of messages in the backlog.
This data is best obtained using the Exchange Store Reporter. However, it can be obtained, or estimated, using other methods described later in this whitepaper.

Mailbox Steady-State Archiving
    The number of mailboxes that will be enabled for archiving.
    The average number of messages to archive from each mailbox per work day. Note that this is not the number of messages sent and/or received per day, unless the users do not delete anything at all or move anything to PST files, which is unlikely.
    The average size of messages currently; i.e. messages sent and/or received in the past couple of weeks.
This data can be obtained using the ESR, although it must always be remembered that the results may not accurately reflect the true picture if users are actively moving messages to PST files or aggressively deleting due to low mailbox quotas. See a later discussion on use of the ESR to get this data.
It is also useful to get an estimate from the customer of the following:
    The anticipated growth in size of the average message over the next year
    The anticipated growth in number of messages that will be archived per mailbox per day over the next 12 months
    The expected growth in number of active mailboxes over the coming years
Some typical estimates of these factors may be used, and are given later in this whitepaper, if the customer cannot provide them.

PST Migration
    The total size of the PST files to be migrated
    The average size of messages in the PST files
This information needs to be obtained by the customer using a method appropriate to their environment; see some guidance on this later in this whitepaper. The average size of messages in the PST files is difficult to obtain without a lot of effort, and so it is usually assumed that this is the same as for the messages in the mailbox backlog. However, this may not be the case if users are aggressively moving messages to PST files (especially larger messages) to stay below a mailbox quota, and so often an assumption must be used. Typically, an average message size of 50KB will be assumed for messages in PST files.
Journal Archiving
    The average number of messages to Journal archive per work day.
    The average size of messages currently; i.e. messages sent and/or received in the past couple of weeks.
    The number of Journal Mailboxes into which messages will be journalled. Note that this is not the total number of Exchange mailboxes!
    An estimate of the typical number of addresses on a message, including the sender. Generally an assumption of 3 to 4 is used.
This information, with the exception of the current average size of messages, cannot be obtained from the ESR results, and usually message tracking logs or performance counters must be used. Often, though, the simplest way to determine how much will be journalled is to turn on Exchange Journaling for a day or two, or ideally longer, and see how much builds up in the Journal Mailboxes. These messages can then be deleted at the end, or on a daily basis, once the customer has recorded the number and total size of these messages. If a customer is reluctant to turn on Exchange Journaling for this purpose, then it is worth pointing out that they will have to do this anyway to implement Journal Archiving. A lot of potential customers do not realize this, believing that journaling is purely a function of Enterprise Vault.

Note also that the number of Journal Mailboxes is important to know. This is because, if multiple Journal Mailboxes are being used, there is a strong possibility that multiple copies of each journal message will be generated within Exchange and must be archived by Enterprise Vault. Although these will, most likely, be single-instanced within the Vault Store, they must still be extracted individually from the Journal Mailboxes and will be indexed separately.
The degree to which this occurs depends on the number of Journal Mailboxes and the average number of internal addresses, but the allocation of users to Mailbox Stores (and thus to Journal Mailboxes) within Exchange is also a factor. For example, if all Sales mailboxes are in one Mailbox Store which is being journalled to a Sales Journal Mailbox, and all the R&D staff are in a different Mailbox Store which is being journalled to the R&D Journal Mailbox, then it may be that very few journal messages are sent to both Journal Mailboxes, as the two communities do not interact that often. However, if a customer's mailboxes are not organized in such a way, and are simply divided between Mailbox Stores on an arbitrary basis (for example, the initial letter of the user's surname), then there is a high chance of journal messages being sent to multiple Journal Mailboxes.

So, does this really matter? Well, assuming the average message has 3 internal addresses, and these users are randomly spread across multiple Mailbox Stores, for which multiple Journal Mailboxes have been configured, then the following table shows the estimated fan-out effect of multiple Journal Mailboxes.

Number of Journal Mailboxes    Number of Journal Messages Generated Per Message

So, if say 100,000 unique messages are sent and/or received per day then, with three Journal Mailboxes, this would fan out to 211,000 journal messages to be archived per day. Although this would be single-instanced when stored within the Vault Store (assuming a single Journal Vault Store Partition), it would still require additional archiving horsepower to process the increased number of messages in the Journal Mailboxes, and would also result in larger Indexes and Enterprise Vault SQL tables.
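One simple way to model the fan-out is to assume that each internal address falls, independently and with equal probability, into a Mailbox Store served by any one of the Journal Mailboxes. The expected number of distinct Journal Mailboxes receiving a copy is then n × (1 − (1 − 1/n)^k) for n Journal Mailboxes and k addresses. This model is an assumption made here for illustration, not necessarily the calculation used to produce the table, although it does reproduce the figure of roughly 211,000 journal messages quoted for three Journal Mailboxes.

```python
def journal_fan_out(journal_mailboxes, internal_addresses=3):
    """Expected journal copies per unique message.

    Assumes each internal address lands, independently and uniformly at
    random, in a Mailbox Store served by one of the Journal Mailboxes.
    """
    n = journal_mailboxes
    # Probability that a given Journal Mailbox receives no copy is (1 - 1/n)^k.
    return n * (1 - (1 - 1 / n) ** internal_addresses)


# Estimated fan-out for 1 to 4 Journal Mailboxes, with 3 addresses per message.
for n in range(1, 5):
    print(n, round(journal_fan_out(n), 2))

# Daily journal volume for 100,000 unique messages and three Journal Mailboxes.
daily_journal_messages = 100_000 * journal_fan_out(3)
print(round(daily_journal_messages, -3))
```

Where mailboxes are grouped functionally into Mailbox Stores, the real fan-out should be lower than this model suggests, because addresses on the same message are then more likely to share a Journal Mailbox.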
Public Folder Backlog Archiving
    The number of messages/documents in the Public Folder backlog.
    The total size of messages/documents in the Public Folder backlog.
    The average size of messages/documents in the Public Folder backlog.
This information could be obtained using the ESR. However, very few of the spreadsheets typically used by Symantec System Engineers and Consultants extract and analyze Public Folder data from the ESR XML report files.

Public Folder Steady-state Archiving
    The average number of messages/documents to archive from Public Folders per work day
    The current average size of messages/documents in Public Folders
As with the Public Folder backlog, this information can be captured using the ESR, but again this data is not normally analyzed by the spreadsheets that Symantec System Engineers and Consultants use. The amount of data being added to Public Folders on a daily basis, though, is unlikely to be significant or consistent, and so it is probably safer and more efficient to base any sizing on a simplistic assumption such as "25% of users will add one 200KB document to the Public Folders each working day". The customer will need to supply this assumption.

In addition, the following values are needed:

The expected compression within the archive. Most Symantec sizing is based on an assumed compression of around 50% for a typical mix of messages and office-style attachments. However, if the customer has more than the typical number of audio, multimedia, graphics or zip files being sent or stored in Exchange, then the compression can be expected to be less, perhaps down to 30% compression or less. Conversely, if the customer's email consists mainly of plain text messages (perhaps sending of graphics files etc. has been blocked) then a higher compression ratio may be achieved.
The general rule of thumb is to use 50% unless you know that the customer's usage is significantly different from the norm.

The default level of Enterprise Vault Indexing required for Mailbox Archiving (including PST files), Public Folders and Journaling. The type of Indexing that the customer requires for each of these types of archiving will determine the size of the respective Indexes. The size of an Index is estimated as a percentage of the size of the original data, and the percentage varies depending on the type of Indexing being used, as shown in the following table.

Indexing Level | What gets indexed | Typical size (% of original data)
Brief | Message attributes such as Subject, Sender, Recipients, Dates, Folder etc. | 3%
Medium | As above, PLUS content of message body and indexible attachments. These are over 250 of the common attachment types, including Microsoft Word, Excel, PowerPoint, Adobe PDF etc. | 8%
Full | As above, but indexing is done at phrase level, rather than only on individual words. | 12%

Note that if the customer has a large percentage of non-indexible attachments (such as graphics, audio or multi-media files) then the resulting size of Medium and Full Indexes may be lower than the figures given above. This is especially true when you consider that these files will also typically be quite large. Again, this is something you may want to factor into a sizing, especially when it can lend some credibility to a sizing if
you acknowledge that a customer's unique usage profile makes some difference to the result, even if it is only a couple of percentage points.

Using the Symantec Exchange Store Reporter (ESR)

Exchange Store Reporter is a tool that gathers data from a Microsoft Exchange Server mailbox or public folder store and then displays the results graphically. More usefully, as the on-screen graph is a little hard to interpret, it also creates an XML report file which can be imported into an Excel spreadsheet for analysis. There are several different spreadsheets that Symantec System Engineers and Consultants use to interpret the ESR XML data, depending on region and purpose of the analysis.

The ESR collects age and size data about all items in the Exchange Server stores, except for the following:
    System mailboxes
    Draft items
    Enterprise Vault shortcuts or pending items (if Enterprise Vault has already been run against the Exchange Server).

The ESR can gather data from the following Microsoft Exchange Servers:
    Exchange Server 2003 SP1 or later, running on Windows 2003 Enterprise SP1 or later.
    Exchange Server 2000 SP3 or later, running on Windows 2000 SP4 or Windows 2000 Advanced Server SP4 or later.
    Exchange Server 5.5 SP3 or later, running on Windows NT 4.0 SP6a or Windows 2000 SP4 or later.

It has been tested with the following clustered configurations:
    Microsoft Exchange Server 2003 on Windows Server 2003 clustered using Microsoft Windows Server 2003 Clustering Services, or VERITAS Cluster Server (VCS) 4.3.
    Microsoft Exchange Server 2000 on Windows Server 2000 clustered using Microsoft Windows Server 2000 Clustering Services, or VERITAS Cluster Server (VCS) 4.3.
The Exchange Store Reporter does not have to be installed on the Exchange server itself; however, the computer on which it is installed must have:
• One of the following operating systems:
  o Windows Server 2003 SP1
  o Windows 2000 SP4 (Server or Professional)
  o Windows XP Professional SP2
• One of the following versions of Microsoft Outlook:
  o Outlook 2000 SP3
  o Outlook 2003 SP2
• A reliable, ideally LAN-speed, network connection to the Exchange Servers.

Note: ESR 2.0 has only been tested running on the above listed versions of Windows and Outlook; however, it may also work with other Service Pack versions in addition to those listed.

The Windows account from which you run Exchange Store Reporter must have sufficient Exchange Server permissions to gather the required data:
• For Exchange 2003 or Exchange 2000 the account must have Full Control access to each Exchange Server.
• For Exchange 5.5 the account must have Service Account Admin permissions at the Site and Configuration level.

The Exchange Store Reporter can be downloaded from ftp://ftp.veritas.com/pub/products/evesr20.zip and does not require a license.

There are a number of very important things to remember about the ESR and the results it produces:
• It only samples the Exchange mailboxes or Public Folders; it does NOT sample PST files.
• It only samples what is in those mailboxes or Public Folders at the time it runs. It is a point-in-time analysis of the current contents of the Exchange stores. This means that it cannot give figures on how many messages are actually sent or received per day; this data must be obtained by other means. It also means that, if users are moving items to PST files, these will not be factored into the results. More on that later.
• The sample rate that is specified when running the tool refers to how many mailboxes are analyzed (rather than the number of messages). The number of messages, and their total size, is then scaled up for the total number of mailboxes on the server. So, for example, if 100,000 messages were found when 25% of mailboxes were sampled, the ESR would report that the total number of messages was 400,000.
• The default sample rate (10%) is rather low for most servers (those with fewer than 2500 mailboxes) because it can mean that the number of mailboxes sampled is not statistically significant. For example, if 10% of 100 mailboxes were sampled, the 10 mailboxes chosen may not accurately represent the typical mailboxes on the server; whereas with 1000 mailboxes, a 10% sample is more statistically accurate. The following table shows the recommended minimum sample sizes for different numbers of Exchange mailboxes on a server:

[Table: recommended minimum sample sizes. Columns: Total Mailboxes on Server; Statistically Significant Sample (# of Mailboxes); Recommended minimum Sample %.]

This has been derived using a sample size calculator and is based on a 95% confidence level and a confidence interval of 5. Statisticians will understand what that means!
• ESR can take a while to run, usually hours for any Exchange server of a reasonable size. This can be reduced by using a lower sample size, but consider the effect on accuracy as shown by the above table.
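The scaling and sample-size reasoning above can be sketched in a few lines of Python. This is only an illustration: the function names are hypothetical, and the sample-size formula (Cochran's formula with a finite population correction, using z = 1.96 for a 95% confidence level and a margin of 5%) is an assumption about how a typical sample size calculator works, not a statement of how the table in this paper was produced.

```python
import math


def scale_up(sampled_messages: int, sample_rate: float) -> int:
    """Scale a count found in the sampled mailboxes up to the whole server.

    sample_rate is the fraction of mailboxes sampled, e.g. 0.25 for 25%.
    """
    return round(sampled_messages / sample_rate)


def min_sample_size(total_mailboxes: int, z: float = 1.96,
                    margin: float = 0.05, p: float = 0.5) -> int:
    """Minimum number of mailboxes to sample (assumed standard formula).

    z=1.96 corresponds to a 95% confidence level, margin=0.05 to a
    confidence interval of 5, and p=0.5 is the most conservative choice.
    """
    n0 = (z ** 2) * p * (1 - p) / margin ** 2            # infinite population
    n = n0 / (1 + (n0 - 1) / total_mailboxes)            # finite correction
    return math.ceil(n)


def min_sample_percent(total_mailboxes: int) -> int:
    """Express the minimum sample as a whole percentage of all mailboxes."""
    return math.ceil(100 * min_sample_size(total_mailboxes) / total_mailboxes)
```

Under these assumptions, a server with 1000 mailboxes needs a sample of 278 mailboxes (about 28%), while a server with only 100 mailboxes needs 80 of them, which illustrates why the 10% default is too low for smaller servers.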
• If leaving the ESR to run over an extended period, remember that any Exchange server or network outage will cause the ESR to abort, and it will have to be restarted from the beginning.
• Remember that the ESR makes a (single) MAPI connection to the Exchange server, using Outlook. It may therefore perform poorly over a WAN link, and it may be better to install ESR on a machine local to the Exchange server rather than try to run it across the WAN.
• When running the ESR, always remember to specify a mailbox that is hosted on the Exchange server that is being sampled. This is simply for the initial connection, and any mailbox on that server that is visible in the address list can be specified. When re-running the ESR against a different Exchange server, remember to specify a mailbox on THAT server. A lot of people mistakenly change only the Exchange server name and leave the mailbox as it was; this results in the ESR going back to the original server (as that is where the specified mailbox is hosted) and sampling it for a second time. Generally, if you get two sets of results that are very similar, perhaps with only the slightest difference in the number and total size of messages and the same number of mailboxes, this is because the ESR has been run against the same Exchange server twice.
• Interpreting the results output directly from the ESR can be difficult, and most System Engineers and Consultants will load the ESR report XML files into Excel spreadsheets for analysis. A number of these exist within Symantec, so contact a local Symantec Enterprise Vault System Engineer or Consultant.
• It is also VERY IMPORTANT to realize that analysis of this data, even with one of the spreadsheets, is best done by an EV specialist with a trained and experienced eye. These people will be able to interpret whether the results have been skewed by factors such as low mailbox quotas and/or users moving items to PST files, or an atypical user demographic. As a quick check, compare the results you get to the rules of thumb listed earlier. For example, is the typical current message size between 50KB and 70KB? Is the typical mailbox growing by between 15 and 25 messages per day? If the answer to either question is no, then it is possible the ESR results have been skewed and you should consider using typical rules of thumb instead.

Using Exchange Mailbox Store Sizes to Determine the Mailbox Backlog

Some customers will quote the size of their Exchange information stores when asking for an EV sizing. In a post-sales environment, this is not adequate for an accurate sizing. However, in a presales environment it may be all you have to work with. One thing is certain: you can't take the size of the Exchange Stores and assume that this represents either the total size of the messages to archive or the size of the resulting Vault Stores. You need to factor in such things as the amount of the Mailbox Stores that is taken up by the Recovered Items cache (a.k.a. the "Dumpster"), unused space (a.k.a. "white space") within the Mailbox Store database, and non-archivable objects within the store.
A conservative estimate might be that no more than 70% of an Exchange database is actual non-deleted messages, and that perhaps 70% of that will be eligible for immediate archiving (i.e. constitutes the "backlog"). Of course, this figure already reflects some single-instancing within the Exchange Store, typically an SIR of between 1.2 and 1.5 for older messages, and we need to factor that in when estimating the total number and size of messages we will archive. (Remember: we must still extract each individual instance of a message from every mailbox it is still in, and only once it is in the archive do we reapply single-instancing.)

This is best illustrated with an example:
• The customer tells us their Exchange Stores total 100GB. We ask if we can run the ESR and they say no (which very rarely happens in real life), so we must reluctantly go ahead with an estimate based on the size of their Exchange stores.
• We will assume that 70% (70GB) is real non-deleted message data that could be archived.
• We will then assume that 70% of this (49GB) would be eligible for archiving immediately (i.e. this is the "backlog").
• We will multiply this by a typical SIR of 1.3 to determine the total size of messages with single-instancing ignored. This gives us approximately 64GB of messages that we will need to process as Backlog archiving.
• Assuming that the average size of older messages is 50KB, we divide our 64GB of backlog by this value to get an estimate of the total number of messages; in this case, 1,342,177 messages.

Note: This is all a gross simplification and only gives an estimate of the size of the backlog, and not the ongoing steady-state. This method should only be used as a last resort when no other information is available and/or the customer is insistent on using the Exchange database size as a basis for the estimate.
Note that this method is based on a number of assumptions, including the amount of actual message data in the Exchange stores and the amount of that data that immediately meets the archiving criteria.
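The arithmetic in the example above can be captured as a small Python sketch. The function name and parameter names are hypothetical; the default values are simply the assumptions used in the example (70% live data, 70% eligible, an SIR of 1.3, and a 50KB average message size), and the result is rounded to whole gigabytes before the message count is derived, as the example does.

```python
def estimate_backlog(store_gb: float,
                     live_fraction: float = 0.70,
                     eligible_fraction: float = 0.70,
                     sir: float = 1.3,
                     avg_message_kb: float = 50):
    """Rough backlog estimate from total Exchange store size (last resort).

    Returns (backlog size in whole GB, estimated number of messages).
    """
    live_gb = store_gb * live_fraction            # non-deleted message data
    eligible_gb = live_gb * eligible_fraction     # meets archiving criteria now
    # Multiply by the SIR because every instance must be extracted
    # individually; round to whole GB as in the worked example.
    backlog_gb = round(eligible_gb * sir)
    messages = int(backlog_gb * 1024 * 1024 / avg_message_kb)
    return backlog_gb, messages


backlog_gb, messages = estimate_backlog(100)
print(f"Backlog: about {backlog_gb}GB, roughly {messages:,} messages")
```

Changing any of the default fractions shows how sensitive the estimate is to these assumptions, which is one reason this method should only be a last resort.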
Using Exchange System Manager to Determine the Mailbox Backlog

A slightly better, but still not ideal, method to determine the mailbox backlog is to do an export of mailbox sizes from Exchange System Manager. This at least removes the assumption about how much actual message data is contained in the Mailbox Stores.

From Exchange System Manager, each Mailbox Store needs to be expanded in turn and the Export List option used (as shown in the figure below) to export the information to a tab- or comma-separated file. These files can then be imported into Microsoft Excel to calculate the total size, the total number of items, and the average item size.

Note that this will include items such as Calendar and Contact items, which may or may not be archived depending on the way in which the customer wants EV configured, and also items in the Deleted Items folder, which are not normally archived. However, it gives a better estimate than the previous method of estimating based on the size of the Exchange Mailbox Store databases.

Using Exchange Message Tracking

An effective means of getting accurate message traffic statistics (for estimating steady-state and journaling) is to use the standard Message Tracking feature built into Exchange. The customer may already have this enabled, but if not it can be switched on for a limited period (say, one week).

The Message Tracking feature logs a number of events into the Message Tracking log file as messages are sent, routed through and received by each Exchange server. The log file is actually a tab-separated text file that may be loaded into an Excel spreadsheet (assuming the log file has fewer lines than Excel can hold, which at typically between 6 and 10 events per message is probably fewer than 10,000 messages) or loaded into Access or some other data analysis program. You could even use a utility like Windows Grep (http://www.wingrep.com/) to search the log file for specific event codes in order to get a count of these events, or to extract just those events to another (smaller) file for importing into Excel. Refer to the relevant Microsoft Knowledge Base article for a description of these event IDs.

There are also a number of third-party tools available to analyze the Message Tracking logs and produce reports of daily message traffic.

Using SMTP Gateway Statistics

Another common occurrence is that a customer will state they have a certain number or volume of messages being sent and received per day. On the surface this would appear to be very useful information, but you need to determine where and how that statistic was measured. Often these statistics have been pulled from the SMTP mail gateway between the customer's internal organization and the Internet. While this data might have some use, the picture is incomplete as it does not include the internal messaging traffic. In most cases you will need to disregard this data completely and use data obtained by other means, or the Rules of Thumb.

Using Exchange Transaction Logs

There is no direct way to interrogate the Exchange transaction logs to determine the message statistics required in order to size an EV solution; however, there are 3rd party tools available that do this.

Using 3rd Party Data Capture/Analysis Tools

As previously mentioned, there are 3rd party tools available to analyze the message tracking logs, the transaction logs, performance counters or the Exchange databases in order to get statistics about daily message volumes.
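Where no third-party tool is to hand, the event counting described for the Message Tracking logs can also be done with a short script rather than with Excel or a grep utility. The following Python sketch is a minimal illustration only: the function name is hypothetical, it assumes comment or header lines begin with "#", and the position of the event ID column is passed in as a parameter because it should be confirmed against the header of the actual log file rather than taken from this example.

```python
from collections import Counter


def count_events(lines, event_column, delimiter="\t"):
    """Count how many times each event ID appears in a tracking log.

    lines        -- any iterable of text lines (e.g. an open file)
    event_column -- zero-based index of the event ID field (an assumption
                    to be checked against the real log's header line)
    """
    counts = Counter()
    for line in lines:
        line = line.rstrip("\r\n")
        if not line or line.startswith("#"):
            continue                      # skip blank and header lines
        fields = line.split(delimiter)
        if len(fields) > event_column:    # ignore truncated lines
            counts[fields[event_column]] += 1
    return counts
```

To use it, open one day's log file and pass the file object in, for example `count_events(open(path, encoding="latin-1"), event_column=8)`, where both the path and the column index are placeholders. Because the file is read line by line, the size of the log is not limited by a spreadsheet's row capacity.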
Using Performance Monitor An extremely useful and easy way to get statistics on regular basis is to use the Windows Performance Monitor, especially if you set up a Counter Log to sample the statistics on an hourly basis. Useful counters to log would be: \\<servername>\msexchangeis Mailbox[_Total]\Local deliveries \\<servername>\msexchangeis Mailbox[_Total]\Messages Sent Local Deliveries are the number of messages being received by the Mailbox Store. Messages Sent are messages that have been submitted to the Exchange message transport service. These two may be added together to give the number of messages being sent and received. If you have a separate Mailbox Store for Exchange Journaling then it would be useful to separate this out by including a counter for that specific Mailbox Store rather than the overall Total; for example \\K2004\MSExchangeIS Mailbox[Journal Storage Group-Journal Store (K2004)]\Local deliveries \\K2004\MSExchangeIS Mailbox[Journal Storage Group-Journal Store (K2004)]\Messages Sent You may also include \\<servername>\msexchangeis Mailbox[_Total]\Total count of Recoverable Items \\<servername>\msexchangeis Mailbox[_Total]\Total size of Recoverable Items These give you an indication of how much is being deleted over period of time.
Getting Data about PST Files

The total size of PST files in an organization can potentially be very significant to an Enterprise Vault sizing estimate. Very often the Exchange Store Reporter analysis may show that the mailboxes are quite small and growing fairly slowly, when the reality is that users are actively moving messages to PST files instead of storing them in their mailboxes.

The size of the PST files needs to be determined in order to work out the amount of storage that will be needed for SQL databases, Indexes and Vault Stores when the PST files are migrated into Enterprise Vault, and also how long this process may potentially take to complete.

There is no equivalent of the ESR available from Symantec to discover and report on PST files, except of course Enterprise Vault itself, once fully implemented in production. It would be unusual to install EV purely to report on PST files during a presales phase, but if it is already installed as a Proof of Concept it may be worth running the PST Locator Task to get details of what PST files exist on client computers and file servers. PST migration is also often done as a second phase in an EV deployment, so it may also be that the PST Locator Task is run to validate the PST sizing that was done, using other methods, during the presales phase.

Estimating EV Server Throughput Requirements

In most cases, we are only required to consider the archiving throughput requirements for steady-state Mailbox Archiving, Journal Archiving and occasionally Public Folder archiving.
Sometimes, we also need to consider the throughput requirements for the Mailbox and/or Public Folder backlog archiving and/or for PST migration, when there is a fixed time in which this phase must be completed; for example, when all the mailboxes must be significantly reduced in size (by archiving the backlog) prior to an Exchange migration, or when the file server that houses all the PST files is being decommissioned on a certain date. These are dealt with separately in the following sections.