Design Implications for Enterprise Storage Systems via Multi-Dimensional Trace Analysis

Transcription

1 Design Impliations for Enterprise Storage Systems via Multi-Dimensional Trae Analysis Yanpei Chen, Kiran Srinivasan, Garth Goodson, Randy Katz University of California, Berkeley, NetApp In. {yhen2, {skiran, ABSTRACT Enterprise storage systems are faing enormous hallenges due to inreasing growth and heterogeneity of the data stored. Designing future storage systems requires omprehensive insights that existing trae analysis methods are ill-equipped to supply. In this paper, we seek to provide suh insights by using a new methodology that leverages an objetive, multi-dimensional statistial tehnique to extrat data aess patterns from network storage system traes. We apply our method on two large-sale real-world prodution network storage system traes to obtain omprehensive aess patterns and design insights at user, appliation, file, and diretory levels. We derive simple, easily implementable, threshold-based design optimizations that enable effiient data plaement and apaity optimization strategies for servers, onsolidation poliies for lients, and improved ahing performane for both. Categories and Subjet Desriptors C.4 [Performane of Systems]: Measurement tehniques; D.4.3 [Operating Systems]: File Systems Management Distributed file systems 1. INTRODUCTION Enterprise storage systems are designed around a set of data aess patterns. The storage system an be speialized by designing to a speifi data aess pattern; e.g., a storage system for streaming video supports different aess patterns than a ument repository. The better the aess pattern is understood, the better the storage system design. Insights into aess patterns have been derived from the analysis of existing file system workloads, typially through trae analysis studies [1, 3, 17, 19, 24]. While this is the orret general strategy for improving storage system design, past approahes have ritial shortomings, espeially given reent hanges in tehnology trends. In this paper, we present a new design methodology to overome these shortomings. The data stored on enterprise network-attahed storage systems is undergoing hanges due to a fundamental shift in the underlying tehnology trends. We have observed three suh trends, inluding: Sale: Data size grows at an alarming rate [12], due to new types of soial, business and sientifi appliations [2], and the desire to never delete data. Heterogeneity: The mix of data types stored on these storage systems is beoming inreasingly omplex, eah having its own requirements and aess patterns [22]. Consolidation: Virtualization has enabled the onsolidation of multiple appliations and their data onto fewer storage servers [6, 23]. These virtual mahines (VMs) also present aggregate data aess patterns more omplex than those from individual lients. Better design of future storage systems requires insights into the hanging aess patterns due to these trends. While trae studies have been used to derive data aess patterns, we believe that they have the following shortomings: Unidimensional: Although existing methods analyze many aess harateristis, they do so one at a time, without revealing ross-harateristi dependenies. Expertise bias: Past analyses were performed by storage system designers looking for speifi patterns based on prior workload expetations. This introdues a bias that needs to be revisited based on the new tehnology trends. Storage server entri: Past file system studies foused primarily on storage servers. This reates a ritial knowledge gap regarding lient behavior. To overome these shortomings, we propose a new design methodology baked by the analysis of storage system traes. We present a method that simultaneously analyzes multiple harateristis and their ross dependenies. We use a multi-dimensional, statistial orrelation tehnique, alled k-means [2], that is ompletely agnosti to the harateristis of eah aess pattern and their dependenies. The K-means algorithm an analyze hundreds of dimensions simultaneously, providing added objetivity to our analysis. To further redue expertise bias, we involve as many relevant harateristis as possible for eah aess pattern. In addition, we analyze patterns at different granularities (e.g., at the user session, appliation, file level) on the storage server as well as the lient, thus addressing the need for understanding lient patterns. The resulting design insights enable poliies for building new storage systems.

2 Client side observations and design impliations 1. Client sessions with IO sizes >128KB are read only or write only. Clients an onsolidate sessions based on only the read-write ratio. 2. Client sessions with duration >8 hours do 1MB of IO. Client ahes an already fit an entire day s IO. 3. Number of lient sessions drops off linearly by 2% from Monday to Friday. Servers an get an extra day for bakground tasks by running at appropriate times during week days. 4. Appliations with <4KB of IO per file open and many opens of a few files do only random IO. Clients should always ahe the first few KB of IO per file per appliation. 5. Appliations with >5% sequential read or write aess entire files at a time. Clients an request file prefeth (read) or delegation (write) based on only the IO sequentiality. 6. Engineering appliations with >5% sequential read and sequential write are doing ode ompile tasks, based on file extensions. Servers an identify ompile tasks; server has to ahe the output of these tasks. Server side observations and design impliations 7. Files with >7% sequential read or write have no repeated reads or overwrites. Servers should delegate sequentially aessed files to lients to improve IO performane. 8. Engineering files with repeated reads have random aesses. Servers should delegate repeatedly read files to lients; lients need to store them in flash or memory. 9. All files are ative (have opens, IO, and metadata aess) for only 1-2 hours in a few months. Servers an use file idle time to ompress or dedupliate to inrease storage apaity. 1. All files have either all random aess or >7% sequential aess. (Seen in past studies too) Servers an selet the best storage medium for eah file based on only aess sequentiality. 11. Diretories with sequentially aessed files almost always ontain randomly aessed files as well. Servers an hange from per-diretory plaement poliy (default) to per-file poliy upon seeing any sequential IO to any files in a diretory. 12. Some diretories aggregate only files with repeated reads and overwrites. Servers an delegate these diretories entirely to lients, tradeoffs permitting. Table 1: Summary of design insights, separated into insights derived from lient aess patterns and server aess patterns. We analyze two reent, network-attahed storage file system traes from a prodution enterprise dataenter. Table 1 summarizes our key observations and design impliations, they will be detailed later in the paper. Our methodology leads to observations that would be diffiult to extrat using past methods. We illustrate two suh aess patterns, one showing the value of multi-granular analysis (Observation 1 in Table 1) and another showing the value of multi-feature analysis (Observation 8). First, we observe (Observation 1) that sessions with more than 128KB of data reads or writes are either read-only or write-only. This observation affets shared ahing and onsolidation poliies aross sessions. Speifially, lient OSs an detet and o-loate ahe sensitive sessions (read-only) with ahe insensitive sessions (write-only) using just one parameter (read-write ratio). This improves ahe utilization and onsolidation (inreased density of sessions per server). Similarly, we observe (Observation 8) that files with >7% sequential read or sequential write have no repeated reads or overwrites. This aess pattern involves four harateristis: read sequentiality, write sequentiality, repeated read behavior, and overwrite behavior. The observation leads to a useful poliy: sequentially aessed files do not need to be ahed at the server (no repeated reads), whih leads to an effiient buffer ahe. These observations illustrate that our methodology an derive unique design impliations that leverage the orrelation between different harateristis. To summarize, our ontributions are: Identify storage system aess patterns using a multidimensional, statistial analysis tehnique. Build a framework for analyzing traes at different granularity levels at both server and lient. Analyze our speifi traes and present the aess patterns identified. Derive design impliations for various storage system omponents from the aess patterns. In the rest of the paper, we motivate and desribe our analysis methodology (Setions 2 and 3), present the aess patterns we found and the design insights (Setion 4), provide the impliations on storage system arhiteture (Setion 5), and suggest future work (Setion 6). 2. MOTIVATION AND BACKGROUND Past trae-based studies have examined a range of storage system protools and use ases, delivering valuable insights for designing storage servers. Table 2 summarizes the ontributions of past studies. Many studies predate urrent tehnology trends. Analysis of real-world, orporate workloads or traes have been sparse, with only three studies among the ones listed [13, 15, 18]. A number of studies have foused on NFS trae analysis only [8, 1]. This fous somewhat neglets systems using the Common Internet File System (CIFS) protool [5], with only a single CIFS study [15]. CIFS systems are important sine CIFS is the network storage protool for Windows, the dominant OS on ommodity platforms. Our work uses the same traes as [15], but we perform analysis using a methodology that extrats multi-dimensional insights at different layers. This methodology is suffiiently different from prior work as to make the analysis findings not omparable. The following disusses the need for this methodology. 2.1 Need for Insights at Different Layers We divide our view of the storage system into behavior at lients and servers. Storage lients interfae diretly with users, who reate and view ontent via appliations. Separately, servers store the ontent in a durable and effiient fashion over the network. Past network storage system trae studies fous mostly on storage servers (Table 2). Storage lient behavior is underrepresented primarily due to the reliane on stateless NFS traes. This leaves a knowledge gap about aess patterns at storage lients. Speifially, these questions are unanswered: Do appliations exhibit lear aess patterns? What are the user-level aess patterns? Any orrelation between users and appliations? Do all appliations interat with files the same way?

3 Study Date of File N/w Multi- Multi- Data Trae Insights/ Traes System FS Dim. Layer Set Info Contributions Ousterhout, et al. [17] 1985 BSD Eng Live Seminal patterns analysis: Large, sequential read aess; limited read-write; bursty I/O; short file lifetimes, et. Ramakrishnan, et al. [18] VAX/ VMS Eng, HPC, Corp Live Relationship between files and proesses - on usage patterns, sharing, et. Baker, et al. [3] 1991 Sprite Eng Live Analysis of distributed file system; omparison to [17], ahing effets. Gribble, et al. [1] Sprite, NFS, VxFS Eng, Bakup Live, Snap Workload self-similarity Doueur, et al. [7] 1998 FAT, FAT32, NTFS Vogels [24] 1998 FAT, NTFS Eng Snap Analysis of file and diretory attributes: size, age, lifetime, diretory depth Eng, HPC Live, Snap Supported past observations and trends in NTFS Zhou et al. [25] 1999 VFAT PC Live Analysis of personal omputer workloads Roselli, et al. [19] VxFS, Eng, Live Inreased blok lifetimes, ahing NTFS Server strategies Ellard, et al. [8] 21 NFS Eng, Live NFS peuliarities, pathnames an aid file layout Agrawal, et al. [1] 2-4 FAT, FAT32, NTFS Eng Snap Distribution of file size and type in namespae, hange in file ontents over time Leung, et al. [15] 27 CIFS Corp, Eng Kavalanekar, et al. [13] 27 NTFS Web, Corp This paper 27 CIFS Corp, Eng Live File re-open, sharing, ativity harateristis; hanges ompared to previous studies Live Study of web (live maps, web ontent, et.) workloads in servers via events traing. Live Setion 4 Table 2: Past studies of storage system traes. Corp stands for orporate use ases. Eng stands for engineering use ases. Live implies live requests or events in traes were studied, Snap implies snapshots of file systems were studied. Insights on these aess patterns lead to better design of both lients and servers. They enable server apabilities suh as per session quality of servie (QoS), or per appliation servie level objetives (SLOs). They also inform various onsolidation, ahing, and prefething deisions at lients. Eah of these aess patterns is visible only at a partiular semanti layer within the lient: users or appliations. We define eah suh layer as an aess unit, with the observed behaviors at eah aess unit being an aess pattern. The analysis of lient side aess units represents an improvement on prior work. On the server side, we extend the previous fous on files. We need to also understand how files are grouped within a diretory, as well as ross-file dependenies and diretory organization. Thus, we perform multi-layer and ross-layer dependeny analysis on the server also. This is another improvement on past work. 2.2 Need for Multi-Dimensional Insights Eah aess unit has ertain inherent harateristis. Charateristis that an be quantified are features of that aess unit. For example, for an appliation, the read size in bytes is a feature; the number of unique files aessed is another. Eah feature represents an independent mathematial dimension that desribes an aess unit. We use the terms dimension, feature, and harateristi interhangeably. The global set of features for an aess unit is limitless. Piking a good feature set requires domain knowledge. Many reent studies analyze aess patterns only one feature at a time. This represents a key limitation. The resulting insights, although valuable, lead to uniform poliies around a single design point. For example, study [15] revealed that most bytes are transferred from larger files. Although this is an useful observation, it does not reveal other harateristis of suh large files: Do they have repeated reads? Do they have overwrites? Do they have many metadata requests? And so on. Adding these dimensions breaks up the predominant aess pattern into smaller, minority aess patterns, eah may require a speifi storage poliy. Understanding minority aess patterns is inreasingly important, beause the trend toward data heterogeneity implies that no ommon ase will dominate storage system behavior. Minority aess patterns beome visible only upon analyzing multiple features simultaneously, hene the need for multi-dimensional insights. We also need to selet a reasonable number of features. Doing so allows us to fully desribe the aess patterns and redue the bias in piking any one feature. Manually identifying multi-feature dependenies is diffiult, and an lead to an untenable analysis. Therefore, we need tehniques that analyze a large number of features, sale to a high number of analysis data points, and do not require a priori knowledge of any ross-feature dependenies. Multi-dimensional statistis tehniques have solved similar problems in other domains [4, 9, 21]. We an apply similar tehniques and ombine them with domain speifi knowledge of the storage systems being analyzed.

4 In short, the need for multi-layered and multi-dimensional insights motivates our methodology. 3. METHODOLOGY In this setion, we desribe our analysis method in detail. We start with a desription of the traes we analyzed, followed by a desription of the aess units seleted for our study. Next, we desribe key steps in our analysis proess, inluding seleting the right features for eah aess unit, using the k-means data lustering algorithm to identify aess patterns, and additional information needed to interpret and generalize the results. 3.1 Traes Analyzed We olleted CIFS traes from two large-sale, enterpriselass file servers deployed at our orporate dataenters. One server overs roughly 1 employees in marketing, sales, finane, and other orporate roles. We all this the orporate trae. The other server overs roughly 5 employees in various engineering roles. We all this the engineering trae. We desribed the trae olleting infrastruture in [15]. The orporate trae reflets ativities on 3TB of ative storage from 9/2/27 to 11/21/27. It ontains ativity from many Windows appliations. The engineering trae reflets ativities on 19TB of ative storage from 8/1/27 to 11/14/27. It interleaves ativity from both Windows and Linux appliations. In both traes, many lients use virtualization tehnologies. Thus, we believe we have representative traes with regards to the tehnology trends in sale, heterogeneity, and onsolidation. Also, sine protoolindependent users, appliations, and stored data remain the primary fators affeting storage system behavior, we believe our analysis is relevant beyond CIFS. 3.2 Aess Units As mentioned in Setion 2.1, we analyze aess patterns at multiple aess units at the server and the lient. Seleting aess units is subjetive. We hose aess units that form lear semanti design boundaries. On the lient side, we analyze two aess units: Sessions: Sessions reflet aggregate behavior of an user. A CIFS session is bounded by mathing session onnet and logoff requests. CIFS identifies it by a tuple - {lient IP address, session ID}. Appliation instane: Analysis at this level leads to appliation speifi optimizations in lient VMs. CIFS identifies eah appliation instane by the tuple - {lient IP address, session ID, and proess ID}. We also analyzed file open-loses, but obtained no useful insights. Hene we omit that aess unit from the paper. We also examined two server side aess units: File: Analyzing file level aess patterns failitates perfile poliies and optimization tehniques. Eah file is uniquely identified by its full path name. Deepest subtree: This aess unit is identified by the diretory path immediately ontaining the file. Analysis at this level enables per-diretory poliies. Session App. Instane App. Instane App. Instane Deepest subtree A Deepest subtree A/B Figure 1: Aess units analyzed. At lients, eah session ontains many appliation instanes. At servers, eah subtree ontains many files. Figure 1 shows the semanti hierarhy among different aess units. At lients, eah session ontains many appliation instanes. At servers, eah subtree ontains many files. 3.3 Analysis Proess Our method (Figure 2) involves the following steps: 1. Collet network storage system traes (Setion 3.1). 2. Define the desriptive features for eah aess unit. This step requires domain knowledge about storage systems (Setion 3.3.1). 3. Extrat multiple instanes of eah aess unit, and ompute from the trae the orresponding numerial feature values of eah instane. 4. Input those values into k-means, a multi-dimensional statistial data lustering tehnique (Setion 3.3.2). 5. Interpret the k-means output and derive aess patterns by looking at only the relevant subset of features. This step requires knowledge of both storage systems and statistis. We also need to extrat onsiderable additional information to support our interpretations (Setion 3.3.3). 6. Translate aess patterns to design insights. We give more details about Steps 2, 4, and 5 below Seleting features for eah aess unit Seleting the set of desriptive features for eah aess unit requires domain knowledge about storage systems (Step 2 in Figure 2). It also introdues some subjetivity, sine the hoie of features limits on how one aess pattern an differ from another. The human designer needs to selet some basi features initially, e.g., total IO size and read-write ratio for a file. We will not know whether we have a good set of features until we have ompleted the entire analysis proess. If the analysis results leave some design hoie ambiguities, we need to add new features to larify those ambiguities, again using domain knowledge. For example, for the deepest subtrees, we ompute various perentiles (25th, 5th, and 75th) of ertain features like read-write ratio beause the average value for those features did not learly separate the aess patterns. We then repeat the analysis proess using the new feature set. This iterative proess leads to a long feature set for all aess units, somewhat reduing the subjetive bias of a small feature set. We list in Setion 4 the hosen features for eah aess unit. Most of the features used in our analysis (Setion 4) are selfexplanatory; some ambiguous or omplex features require preise definitions, suh as: File File File File

5 1. Trae olletion 2. Selet layers, define features 3. Compute numerial feature values means to other lustering algorithms suh as hierarhial lustering and k-means derivatives [2]. 6. Design impliations 5. Interpret results 4. Identify aess patterns by k-means Figure 2: Methodology overview. The two-way arrows and the loop from Step 2 through Step 5 indiate our many iterations between the steps. IO: We use IO as a substitute for read and write. Sequential reads or writes: We onsider two read or writes requests to be sequential if they are onseutive in time, and the file offset + request size of the first request equals the file offset of the seond request. A single read or write request is by definition not sequential. Repeated reads or overwrites: We trak aesses at 4KB blok boundaries within a file, with the offset of the first blok being zero. A read is onsidered repeated if it aesses a blok that has been read in the past half hour. We use an equivalent definition for overwrites Identifying aess patterns via k-means A key part of our methodology is the k-means multi-dimensional orrelation algorithm. We use it to identify aess patterns simultaneously aross many features (Step 4 in Figure 2). K-means is a well-known, statistial orrelation algorithm. It identifies sets of data points that ongregate around a region in n-dimensional spae. These ongregations are alled lusters. Given data points in an n- dimensional spae, k-means piks k points at random as initial luster enters, assigns data points to their nearest luster enters, and reomputes new luster enters via arithmeti means aross points in the luster. K-means iterates the assignment-reompute proess until the luster enters beome stationary. K-means an run with multiple sets of initial luster enters and return the best result [2]. For eah aess unit, we extrat different instanes of it from the trae, i.e., all session instanes, appliation instanes, et. For eah instane, we ompute the numerial values of all its features. This gives us a data array in whih eah row orrespond to an instane, i.e., a data point, and eah olumn orrespond to a feature, i.e., a dimension. We input the array into k-means, and the algorithm finds the natural lusters aross all data points. We onsider all data points in a luster as belonging to a single equivalene lass, i.e., a single aess pattern. The numerial values of the luster enters indiate the harateristis of eah aess pattern. We hoose k-means for two reasons. First, k-means is algorithmially simple. This allows rapid proessing on large data sets. We used a modified version of the k-means C library [14], in whih we made some improvements to limits the memory footprint when proessing large data sizes. Seond, k-means leads to intuitive labels of the luster enters. This helps us translate the statistial behavior extrated from the traes into tangible insights. Thus, we prefer k- K-means requires us to speify k, the number of lusters. This is a diffiult task sine we do not know a priori the number of natural lusters in the data. We ompute the intra-luster residual variane from the k-means results - the sum of squared distanes from eah data point to its assigned luster enter. This is a standard metri for luster quality, and gives us a lower bound on k. We annot set k so small that the residual variane forms a large fration of the total variane, i.e., residual variane the sum of squared distanes from eah data point to the global average of all data points. We optionally inrease k beyond the lower bound until some key aess patterns an be separated. Conurrently, we take are not to inrease k too high, to prevent having an unwieldy number of aess patterns and design targets. We applied this reasoning to set k at eah lient and server aess unit Interpreting and generalizing the results The k-means algorithm gives us a set of aess patterns with various harateristis. We need additional information to understand the signifiane of the results. This information omes from omputing various seondary data outside of k- means analysis (Step 5 in Figure 2: We gathered the start and end times of eah session instane, aggregated by times of the day and days of the week. This gave us insight into how users launh and end sessions. We examine filename extensions of files assoiated with every aess pattern belonging to these aess units: appliation instanes, files, and deepest subtrees. This information onnets the aess patterns to more easily reognizable file extensions. We perform orrelation analysis between the file and deepest subtrees aess units. Speifially, we ompute the number of files of eah file aess pattern that is loated within diretories in eah deepest subtree aess pattern. This information aptures the organizations of files in diretories. Suh information gives us a detailed piture about the semantis of the aess patterns, resulting in human understandable labels to the aess patterns. Suh labels help us translate observations to design impliations. Furthermore, after identifying the design impliations, we explore if the design insights an be extrapolated to other trae periods and other storage system use ases. We aomplish this by repeating our exat analysis over multiple subsets of the traes, for example, a week s worth of traes at a time. This allow us to examine how our analysis would be different had we obtained only a week s trae. Aess patterns that are onsistent, stable aross different weeks would indiate that they are likely to be more general than just our traing period or our use ases. 4. ANALYSIS RESULTS & IMPLICATIONS This setion presents the aess patterns we identified and the aompanying design insights. We disuss lient and

6 (a). Desriptive features for eah session Duration Total metadata requests Overwrite ratio Diretories aessed Total IO size Avg. time between IO requests Tree onnets Appliation instanes seen Read:write ratio by bytes Read sequentiality Unique trees aessed Total IO requests Write sequentiality File opens Read:write ratio by requests Repeated read ratio Unique files opened (b). Corporate session Full day work Half day on- Short ontent Short ontent Supporting Supporting aess patterns tent viewing viewing generation metadata read-write % of all sessions.5%.7% 1.2%.2% 96% 1.4% Duration 8 hrs 4 hrs 1 min 7 min 7 se 1 se Total IO size 11 MB 3 MB 128 KB 3 MB 42 B Read:write ratio by bytes 3:2 1: 1: :1 : 1:1 Metadata requests Read sequentiality 7% 8% % - - % Write sequentiality 8% - - 9% - % File opens:files 2:4 8:15 3:7 5:15 : 6:3 Tree onnet:trees 5:2 3:2 2:2 2:2 1:1 2:2 Diretories aessed Appliation instanes (). Engineering session Full day work Human edit Appliation gene- Short ontent Supporting Mahine geneaess patterns small files rated bakup/opy generation metadata rated update % of all sessions.4% 1.% 4.4%.4% 9% 3.6% Duration 1 day 2 hrs 1 min 1 hr 1 se 1 se Total IO size 5 MB 5 KB 2 MB 2 MB 36 B Read:write ratio 7:4 1:1 1: :1 : 1: Metadata requests Read sequentiality 6% % 9% - - % Write sequentiality 7% % - 9% - - File opens:files 13:2 9:2 6:5 15:6 : 1:1 Tree onnet:trees 1:1 1:1 1:1 1:1 1:1 1:1 Diretories aessed Appliation instanes Table 3: Session aess patterns. (a): Full list of desriptive features. (b) and (): Short names and desriptions of sessions in eah aess pattern; listing only the features that help separate the aess patterns. serve side aess patterns (Setion 4.1, 4.2). We also hek if these patterns persist aross time (Setion 4.3). For eah aess unit, we list the desriptive features (only some of whih help separate aess patterns), outline how we derived the high-level name (label) for eah aess pattern, and disuss relevant design insights. 4.1 Client Side Aess Patterns As mentioned in Setion 3.2, we analyze sessions and appliation instanes at lients Sessions Sessions reflet aggregate behavior of human users. We used 17 features to desribe sessions (Table 3). The orporate trae has 59,76 sessions, and the engineering trae has 232,33. In Table 3, we provide quantitative desriptions and short names for all the session aess patterns. We derive the names from examining the signifiant features: duration, read-write ratio, and IO size. We also looked at the aggregate session start and end times to get additional semanti knowledge about eah aess pattern. Figure 3 shows the start and end times for seleted session aess patterns. The start times of orporate full-day work sessions orrespond exatly to the U.S. work day 9am start, 12pm lunh, 5pm end. Corporate ontent generation sessions show slight inrease in the evening and towards Friday, indiating rushes to meet daily or weekly deadlines. In the engineering trae, the appliation generated bakup and mahine generated update sessions depart signifiantly from human workday and work week patterns, leading us to label them as appliation and mahine (lient OS) generated. One surprise was that the supporting metadata sessions aount for >9% of all sessions in both traes. We believe these sessions are not humanly generated. They last roughly 1 seonds, leaving little time for human mediated interations. Also, the session start rate averages to roughly one per employee per minute. We are ertain that our olleagues are not onneting and logging off every minute of the entire day. However, the shape of the start time graphs have a strong orrelation with the human work day and work week. We all these supporting metadata sessions mahine generated in support of human user ativities. These metadata sessions form a sort of bakground noise to the storage system. We observe the same bakground noise at other layers both at lients and servers. Observation 1: The sessions with IO sizes greater than 128KB are either read-only or write-only, exept for the full-day work sessions. Among these sessions, only read-only sessions utilize buffer ahe for repeated reads and prefethes. Write-only sessions only use the ahe to buffer writes. Thus, if we have a ahe evition poliy that reognizes their writeonly nature and releases the buffers immediately on flushing dirty data, we an satisfy many write-only sessions with relatively little buffer ahe spae. We an attain better onsolidation and buffer ahe utilization by managing the ratio of o-loated read-only and write-only sessions. This insight an be used by virtualization managers and lient operating systems to manage a shared buffer ahe between sessions. Reognizing suh read-only and write-only sessions

7 # of sessions # of sessions Corp full day work start end Corp short ontent generation 5 25 Corp supporting metadata Eng appliation generated bakup or opy Eng mahine generated update hrs of the day hrs of the day hrs of the day hrs of the day hrs of the day days of the week days of the week days of the week days of the week days of the week Figure 3: Number of sessions that start or ends at a partiular time. Number of session starts and ends in times of the day (top) and session starts in days of the week (bottom). Showing only seleted aess patterns. is easy. Examining a session s total read size and write size reveals their read-only or write-only nature. Impliation 1: Clients an onsolidate sessions effiiently based only on the read-write ratio. Observation 2: The full-day work, ontent-viewing, and ontent-generating sessions all do 1MB of IO. This means that a lient ahe of 1s of MB an fit the working set of a day for most sessions. Given the growth of flash devies on lients for ahing, despite large-sale onsolidation, lients should easily ahe a day s worth of data for all users. In suh a senario, most IO requests would be absorbed by the ahe, reduing network lateny and bandwidth utilization, and load on the server. Moreover, omplex ahe evition algorithms are unneessary. Impliation 2: Clients ahes an already fit an entire day s IO. Observation 3: The number of human-generated sessions and supporting sessions peaks on Monday and dereases steadily to 8% of the peak on Friday (Figure 3). This is true for all human generated sessions, inluding the ones not shown in Figure 3. There is onsiderable slak in the server load during evenings, lunh times, and even during working hours. This implies that the server an perform bakground tasks suh as onsisteny heks, maintenane, or ompression/dedupliation, at appropriate times during the week. A simple ount of ative sessions an serve as an effetive start and stop signal. By omputing the area under the urve for session start times by days of the week, we estimate that bakground tasks an squeeze out roughly one extra day s worth of proessing without altering the peak demand on the system. This is a 5% improvement over a setup whih performs bakground tasks only during weekends. In the engineering trae, the appliation generated bakup or opy sessions seem to have been already designed this way. Impliation 3: Servers get an extra day for bakground tasks by running them at appropriate times during week-days Appliation instanes Appliation instane aess patterns reflets appliation behavior, failitating appliation speifi optimizations. We used 16 features to desribe appliation instanes (Table 4). The orporate trae has 138,723 appliation instanes, and the engineering trae has 741,319. Table 4 provides quantitative desriptions and short names for all the appliation instane aess patterns. We derive the names from examining the read-write ratio, IO size, and file extensions aessed (Figures 4 and 5). We see again the metadata bakground noise. The supporting metadata appliation instanes aount for the largest fration, and often do not even open a file. There are many files without a file extension, a phenomenon also observed in reent storage system snapshot studies [16]. We notie that file extensions turn out to be poor indiators of appliation instane aess patterns. This is not surprising beause we separate aess patterns based on read/write properties. A user ould either view a. or reate a.. The same appliation software has different read/write patterns. This speaks to the strength of our multi-layer framework. Aggregating IO by appliation instanes gives lean separation of patterns; while aggregating just by appliation software or file extensions will not. We also find it interesting that most file extensions are immediately reognizable. This means that what people use network storage systems for, i.e., the file extensions, remains easily reognizable, even though how people use network storage systems, i.e., the aess patterns, is ever hanging and beoming more omplex. Observation 4: The small ontent viewing appliation and ontent update appliation instanes have <4KB total reads per file open and aess a few unique files many times. The small read size and multiple reads from the same files means that lients should prefeth and plae the files in a ahe optimized for random aess (flash/ssd/memory). The trend towards flash ahes on lients should enable this transfer. Appliation instanes have bi-modal total IO size - either

8 Fration of appliation instanes Fration of appliation instanes very small or large. Thus, a simple ahe management algorithm suffies; we always keep the first 2 bloks of 4KB in ahe. If the appliation instane does more IO, it is likely to have IO size in the 1KB-1MB range, so we evit it from the ahe. We should note that suh a poliy makes sense even though we proposed earlier to ahe all 11MB of a typial day s working set - 11MB of ahe beomes a onern when we have many onsolidated lients. Impliation 4: Clients should always ahe the first few KB of IO per file per appliation. Observation 5: We see >5% sequential read and write ratio for the ontent update appliations instanes (orporate) and the ontent viewing appliations instanes for humangenerated ontent (both orporate and engineering). Dividing the total IO size by the number of file opens suggest that these appliation instanes are sequentially reading and writing entire files for offie produtivity (.,.,.ppt,.pdf, et.) and multimedia appliations. This implies that the files assoiated with these appliations should be prefethed and delegated to the lient. Prefething means delivering the whole file to the lient before the whole file is requested. Delegation means giving a lient temporary, exlusive aess to a file, with the lient periodially synhronizing to server to ensure data durability. CIFS does delegation using opportunisti loks, while NFSv4 has a dediated operation for delegation. Prefething and delegation of suh files will improve read and write performane, lower network traffi, and lighten server load. The aess patterns again offer a simple, threshold-based deision algorithm. If an appliation instane does more than 1s of KB of sequential IO, and has no overwrite, then it is likely to be a ontent viewing or update appliation instane; suh files are prefethed and delegated to the lients. Impliation 5: Clients an request file prefeth (read) and delegation (write) based on only IO sequentiality. Observation 6: Engineering appliations with >5% sequential reads and >5% sequential writes are doing ode ompile tasks. We know this from looking at the file extensions in Figure 5. These ompile proesses show read sequentiality, write sequentiality, a signifiant overwrite ratio and large number of metadata requests. They rely on the server heavily for data aesses. We need more detailed lient side information to understand why lient ahes are ineffetive in this ase. However, it is lear that the server ahe needs to prefeth the read files for these appliations. The high perentage of sequential reads and writes gives us another threshold-based algorithm to identify these appliations. Impliation 6: Servers an identify ompile tasks by the presene of both sequential reads and writes; server has to ahe the output of these tasks. 4.2 Server Side Aess Patterns As mentioned in Setion 3.2, we analyzed two kinds of server side aess units: files and deepest subtrees Files File aess patterns help storage server designers develop per-file plaement and optimization tehniques. We used 25 features to desribe files (Table 5). Note that some of n.f.e. + lnk n.f.e. + ppt ini pdf n.f.e. + n.f.e. + ontent viewing app - app generated ontent n.f.e. + n.f.e. no files opened supporting metadata n.f.e. + n.f.e. + htm n.f.e. + n.f.e. + n.f.e. app generated file updates n.f.e. + pdf n.f.e. + ppt n.f.e. + pdf n.f.e. + ontent viewing app - human generated ontent n.f.e. + pdf n.f.e. + ppt n.f.e. + lnk n.f.e. + n.f.e. + ontent update app Figure 4: File extensions for orporate appliation instane aess patterns. For eah aess pattern (olumn), showing the fration of the two most frequent file extensions that are aessed together within a single appliation instane. n.f.e. denotes files with no file extension d + h n.f.e. + h + d h + dbo h + h + o ompilation app n.f.e. no files opened supporting metadata txt n.f.e. + n.f.e. rnd ontent update app - small pdf bmp ontent viewing app - human generated ontent txt n.f.e. pst rnd + n.t.e. ontent viewing app - small Figure 5: File extensions for engineering appliation instane aess patterns. For eah aess pattern (olumn), showing the fration of the two most frequent file extensions that are aessed together within a single appliation instane. n.f.e. denotes files with no file extension. the features inlude different perentiles of a harateristi, e.g., read request size as perentiles of all read requests. We believe inluding different perentiles rather than just the average would allow better separation of aess patterns. The orporate trae has 1,155,99 files, and the engineering trae has 1,89,571. In Table 5, we quantitative desriptions and short names for all the file aess patterns. Figures 6 and 7 give the most ommon file extensions in eah. We derived the names by examining the read-write ratio and IO size. For the engineering trae, examining the file extensions also proved useful, leading to labels suh as edit ode and ompile output, and read only log/bakup.

9 (a). Desriptive features for eah appliation instane Total IO size Total metadata requests Repeated read ratio File opens Read:write ratio by bytes Avg. time between IO requests Overwrite ratio Unique files opened Total IO requests by bytes Read sequentiality Tree onnets Diretories aessed Read:write ratio by requests Write sequentiality Unique trees aessed File extensions aessed (b). Corporate appliation Content viewing app - Supporting App generated Content viewing app - Content update instane aess patterns app generated ontent metadata file updates human generated ontent app % of all app instanes 16% 56% 14% 8.8% 5.1% Total IO 1 KB 1 KB 8 KB 3.5 MB Read:write ratio 1: : 1:1 1: 2:3 Metadata requests Read sequentiality 5% - % 8% 5% Write sequentiality - - % - 8% Overwrite ratio - - % - 5% File opens:files 19:4 : 1:4 2:4 6:11 Tree onnet:trees 2:2 : 2:2 2:2 2:2 Diretories aessed File extensions aessed (). Engineering appliation Compilation app Supporting Content update Content viewing app - Content viewing instane aess patterns metadata app - small human generated ontent app - small % of all app instanes 1.6% 93%.9% 2.% 2.5% Total IO 2 MB 2 KB 1 MB 3 KB Read:write ratio 9:1 : :1 1: 1: Metadata requests Read sequentiality 5% - - 9% % Write sequentiality 8% - % - - Overwrite ratio 2% - % - - File opens:files 145:75 : 3:1 5:4 2:1 Tree onnet:trees 1:1 : 1:1 1:1 1:1 Diretories aessed File extensions aessed Table 4: Appliation instane aess patterns. (a): Full list of desriptive features. (b) and (): Short names and desriptions of appliation instanes in eah aess pattern; listing only the features that help separate the aess patterns. We see that there are groupings of files with similar extensions. For example, in the orporate trae, the small random read aess patterns inlude many file extensions assoiated with web browser ahes. Also, multi-media files like.mp3 and. ongregate in the sequential read and write aess patterns. In the engineering trae, ode libraries group under the sequential write files, and read only log/bakup files ontain file extensions. to.99. However, the most ommon file extensions in eah trae still spread aross many aess patterns, e.g., offie produtivity files in the orporate trae and ode files in the engineering trae. Observation 7: For files with >7% sequential reads or sequential writes, the repeated read and overwrite ratios are lose to zero. This implies that there is little benefit in ahing these files at the server. They should be prefethed as a whole and delegated to the lient. Again, the bimodal IO sequentiality offers a simple algorithm for the server to detet whih files should be prefethed and delegated if a file has any sequential aess, it is likely to have a high perentage of sequential aess, therefore it should be prefethed and delegated to the lient. Future storage servers an suggest suh information to lients, leading to delegation requests. Impliation 7: Servers should delegate sequentially aessed files to lients to improve IO performane. Observation 8: In the engineering trae, only the edit ode and ompile output files have a high % of repeated reads. Those files should be delegated to the lients as well. The repeated reads do not show up in the engineering appliation instanes, possibly beause a ompilation proess launhes many hild proesses repeatedly reading the same files. Eah hild proess reads fresh data, even though the server sees repeated reads. With larger memory or flash ahes at lients, we expet this behavior to drop. The working set issues that lead to this senario need to be examined. If the repeated reads ome from a single lient, then the server an suggest that the lient ahe the appropriate files. We an again employ a threshold-based algorithm. Deteting any repeated reads at the server signals that the file should be delegated to the lient. At worst, only the first few reads will hit the server. Subsequent repeated reads are stopped at the lient. Impliation 8: Servers should delegate repeatedly read files to lients. Observation 9: Almost all files are ative (have opens, IO, and metadata aess) for only 1-2 hours over the entire trae period, as indiated by the typial opens/read/write ativity of all aess patterns. There are some regularly aessed files, but they are so few that they do not affet the k-means analysis. The lak of regular aess for most files means that there is room for the server to employ tehniques to inrease apaity by doing ompation on idle files. Common tehniques inlude dedupliation and ompression. The ativity on these files indiate that the IO performane impat should be small. Even if run onstantly, ompation has a low probability of affeting an ative file. Sine ommon libraries like gzip optimize for deompression [11], deompressing files at read time should have only slight performane impat. Impliation 9: Servers an use file idle time to ompress or dedupliate data to inrease storage apaity.

10 Fration of files Fration of files (a). Desriptive features for eah file Number of hours with 1, 2-3, or 4 file opens Number of hours with 1-1KB, 1KB-1MB, or >1MB reads Number of hours with 1-1KB, 1KB-1MB, or >1MB writes Number of hours with 1, 2-3, or 4 metadata requests Read request size - 25th, 5th, and 75th perentile of all requests Write request size - 25th, 5th, and 75th perentile of all requests Avg. time between IO requests - 25th, 5th, and 75th perentile of all request pairs Read sequentiality Write sequentiality Read:write ratio by bytes Repeated read ratio Overwrite ratio (b). Corporate file Metadata only Sequential write Sequential read Small random Smallest random Small random aess patterns write read read % of all files 59% 4.% 4.1% 4.7% 19% 9.2% Opens ativity 2hrs, 1 open 1hr, 2-3 opens 1hr, 2-3 opens 1hr, 2-3 opens 1hr, 1 open 1hr, 1 open Read ativity 1hr, 1KB-1MB 1hr, 1-1KB 1hr, 1-1KB Write ativity 1hr, 1KB-1MB 1hr, 1-1KB Read request size KB - 2KB 32KB Write request size - 6KB KB - - Read sequentiality - - 7% - % % Write sequentiality - 8% - % - - Read:write ratio : :1 1: :1 1: 1: (). Engineering file Metadata only Sequential write Small random Edit ode & Sequential read Read-only aess patterns read ompile output log/bakup % of all files 42% 1.9% 32% 7.3% 8.3% 8.1% Opens ativity 1hr, 1 open 1hr, 2-3 opens 1hr, 2-3 opens 1hr, 2-3 opens 1hr, 2-3 opens 2hrs, 2-3 opens Read ativity 1hr, 1-1KB 1hr, 1-1KB 1hr, 1-1KB 2hrs, 1-1KB Write ativity 1hr, >1MB Read request size KB 4KB 8-16KB 1KB Write request size - 64KB Read sequentiality - - % % 7% % Write sequentiality - 9% Repeated read ratio - - % 5% % % Read:write ratio : :1 1: 1: 1: 1: Table 5: File aess patterns. (a): Full list of desriptive features. (b) and (): Short names and desriptions of files in eah aess pattern; listing only the features that help separate the aess patterns xml io z pdf mp3 ppt pdf mp3 swf htm zip gif xml ppt eth h d bf o wma jar dll mp3 gif d h o d bq h h o bq -99 metadata only files seq write files seq read files small random write files smallest random read files small random read files idle, only metadata seq write files small random read files edit ode & ompile output seq read files read only log/bakup files Figure 6: File extensions for orporate files. Fration of file extensions in eah file aess pattern. Figure 7: File extensions for engineering files. Fration of file extensions in eah file aess pattern. Observation 1: All files have either all random aess or >7% sequential aess. The small random read and write files in both traes an benefit from being plaed on media with high random aess performane, suh as solid state drives (SSDs). Files with a high perentage of sequential aess an reside on traditional hard disk drives (HDDs), whih already optimize for sequential aess. The bimodal IO sequentiality offers yet another threshold-based plaement algorithm if a file has any sequential aess, it is likely to have a high perentage of sequential aess; therefore plae it on HDDs. Otherwise, plae it on SSDs. We note that there are more randomly aessed files than sequentially aessed files. Even though sequential files tend to be larger, we still need to do a working set analysis to determine the right size of server SSDs for eah use ase. Impliation 1: Servers an selet the best storage medium for eah file based only on aess sequentiality Deepest subtrees Deepest subtree aess patterns help storage server designers develop per-diretory poliies. We used 4 features to desribe deepest subtrees (Table 6). Some of the features

11 Fration of files Fration of files xml temp dirs for real data gif msg xml zip swf pdf mp3 lient mix read aheable dirs, dirs mostly seq xml io z metadata only dirs mp3 pdf mix write dirs, mostly seq zip htm gif xml small random read dirs Figure 8: File extensions for orporate deepest subtrees. Fration of file extensions in deepest subtree aess patterns eth h d o bf metadata only dir bq h small random read dir d bq h d o lient mix read aheable dirs, dirs mostly seq dll ryy jar seq write dirs obj dbo d o temp dirs for real data Figure 9: File extensions for engineering deepest subtrees. Fration of file extensions in deepest subtree aess patterns. inlude different perentiles of a harateristi, e.g. per file read sequentiality as perentiles of all files in a diretory. Inluding different perentiles rather than just the average allows better separation of aess patterns. The orporate trae has 117,64 deepest subtrees, and the engineering trae has 161,858. We use diretories and deepest subtrees interhangeably. In Table 6, we provide quantitative desriptions and short names for all the deepest subtree aess patterns. We derive the names using two types of information. First, we analyze the file extensions in eah subtree aess pattern (Figures 8 and 9). Seond, we examine how many files of eah file aess patterns are within eah subtree pattern (Figures 1). For brevity, we show only the graph for orporate deepest subtrees. The graph for the engineering deepest subtrees onveys the same information with regard to our design insights. For example, the random read and lient aheable labels ome from looking at the IO patterns. Temporary diretories aounted for the. files in those diretories. Mix read and mix write diretories onsidered the presene of both sequential and randomly aessed files in those diretories. The metadata bakground noise remains visible at the subtree layer. The spread of file extensions is similar to that for file aess patterns some file extensions ongregate and spread evenly. Interestingly, some subtrees have a large fration of metadata-only files that do not affet the desriptions of those subtrees. Some subtrees ontain only files of a single aess pattern (e.g., small random read subtrees in Figures 1). There, we an apply the design insights from the file aess patterns to the entire subtree. For example, the small random read subtrees an reside on SSDs. Sine there are more files than subtrees, per-subtree poliies an lower the amount of poliy information kept at the server. In ontrast, the mix read and mix write diretories ontain both sequential and randomly aessed files. Those subtrees need per-file poliies: Plae the sequentially aessed files on HDDs and the randomly aessed files on SSDs. Soft links to files an preserve the user-faing diretory organization, while allowing the server optimize per-file plaement. The server should automatially deide when to apply per-file or per-subtree poliies. Observation 11: Diretories with sequentially aessed files almost always ontain randomly aessed files also. Conversely, some diretories with randomly aess files will not ontain sequentially aessed files. Thus, we an default all subtrees to per-subtree poliies. Conurrently, we trak the IO sequentiality per subtree. If the sequentiality is above some threshold, then the subtree swithes to per-file poliies. Impliation 11: Servers an hange from per-diretory plaement poliy (default) to per-file poliy upon seeing any sequential IO to any files in a diretory. Observation 12: The lient aheable subtrees and temporary subtrees aggregate files with repeated reads or overwrites. Additional omputation showed that the repeated reads and overwrites almost always ome from a single lient. Thus, it is possible for the entire diretory to be prefethed and delegated to the lient. Delegating entire diretories an preempt all aesses that are loal to a diretory, but onsumes lient ahe spae. We need to understand the tradeoffs through a more in-depth working set and temporal loality analysis at both the file and deepest subtree levels. Impliation 12: Servers an delegate repeated read and overwrite diretories entirely to lients, tradeoffs permitting. 4.3 Aess Pattern Evolutions Over Time We want to know if the aess patterns are restrited to our partiular traing period or if they persist aross time. Only if the design insights remain relevant aross time an we rationalize their existene in similar use ases. We do not have enough traes to generalize beyond our monitoring period. We investigate the reverse problem - if we

12 (a). Desriptive features for eah subtree Number of hours with 1, 2-3, or 4 file opens Read:write ratio - 25th, 5th, and 75th perentile of files Number of hours with 1-1KB, 1KB-1MB, or >1MB reads Repeated read ratio - 25th, 5th, and 75th perentile of files Number of hours with 1-1KB, 1KB-1MB, or >1MB writes Overwrite ratio - 25th, 5th, and 75th perentile of files Number of hours with 1, 2-3, or 4 metadata requests Read sequentiality - aggregated aross all files Read request size - 25th, 5th, and 75th perentile of all requests Write sequentiality - aggregated aross all files Write request size - 25th, 5th, and 75th perentile of all requests Read:write ratio - aggregated aross all files Avg. time between IO requests - 25th, 5th, and 75th perentile of all request pairs Repeated read ratio - aggregated aross all files Read sequentiality - 25th, 5th, and 75th perentile of files in the subtree Overwrite ratio - aggregated aross all files Write sequentiality - 25th, 5th, and 75th perentile of files in the subtree (b). Corp. subtree Temp dirs for Client ahe- Mix read dirs, Metadata only Mix write dirs, Small random aess patterns real data able dirs mostly sequential dirs mostly sequential read dirs % of all subtrees 2.3% 4.1% 5.6% 64% 3.5% 21% Opens ativity 3hrs, >4 opens 3hr, 1 open 2hr, 1 open 2hr, 1 open 1hr, >4 opens 1hr, >4 opens Read ativity 3hrs, 1-1KB 2hrs, 1-1KB 1hr, 1-1KB 1hr, 1-1KB Write ativity 2hrs,.1-1MB 1hr, >1MB Read request size 4KB 4-1KB 4-32KB KB Write request size 4KB KB - Read sequentiality 1-3% % 5-7% - - % Write sequentiality 5-7% % - Repeat read ratio 2-5% 5% % - - % Overwrite ratio 3-7% % - Read:write ratio 1: to :1 1: 1: : :1 1: (). Eng. subtree Metadata only Small random Client ahe- Mixed read dirs, Sequential Temp dirs for aess patterns dirs read dirs able dirs mostly sequential write dirs real data % of all subtrees 59% 25% 6.1% 7.1% 1.9% 1.3% Opens ativity 1hr, 2-3 opens 1hr, >4 opens 1hr, >4 opens 1hr, >4 opens 1hr, >4 opens 3hrs, >4 opens Read ativity 1hr, 1-1KB 1hr, 1-1KB 1hr,.1-1MB 3hrs, 1-1KB Write ativity 1hr,.1-1MB 1hr, 1-1KB Read request size - 1-4KB 2-4KB 8-1KB KB Write request size KB 4-6KB Read sequentiality - % % 4-7% % Write sequentiality % 6-8% Repeat read ratio - % 5-6% % - -4% Overwrite ratio % -3% Read:write ratio : 1: 1: 1: :1 1: to :1 Table 6: Deepest subtree aess patterns. (a): Full list of desriptive features. (b) and (): Short names and desriptions of subtrees in eah aess pattern; listing only the features that help separate aess patterns. had to analyze traes from only a subset of our traing period, how would our results differ? We divided our traes into weeks and repeated the analysis for eah week. For brevity, we present only the results for weekly analysis of orporate appliation instanes and files. These two layers have yielded the most interesting design insights and they highlight separate onsiderations at the lient and server. Figure 11 shows the result for files. All the large aess patterns remain steady aross the weeks. However, the aess pattern orresponding to the smallest number of files, the small random write files, omes and goes week to week. There are exatly two, temporary, previously unseen aess patterns that are very similar to the small random files. The peaks in the metadata only files orrespond to weeks that ontain U.S. federal holidays or weeks immediately preeding a holiday long weekend. Furthermore, the numerial values of the desriptive features for eah aess pattern vary in a moderate range. For example, the write sequentiality of the sequentiality write files ranges from 5% to 9%. Figure 12 shows the result for appliation instanes. We see no new aess patterns, and the frational weight of eah aess pattern remains nearly onstant, despite holidays. Furthermore, the numerial values of desriptive features also remain nearly onstant. For example, the write sequentiality of the ontent update appliations varies in a narrow range from 8% to 85%. Thus, if we had done our analysis on just a week s traes, we would have gotten nearly idential results for appliation instanes, and qualitatively similar result for files. We believe that the differene omes from the limited duration of lient sessions and appliation instanes, versus the long-term persistene of files and subtrees. Based on our results, we are onfident that the aess patterns are not restrited just to our partiular trae period. Future storage systems should ontinuously monitor the aess patterns at all levels, automatially adjusting poliies as needed, and notify designers of previously unseen aess patterns. We should always be autious when generalizing aess patterns from one use ase to another. For use ases with the same appliations running on the same OS file API, we expet to see the same appliation instane aess patterns. Session aess patterns suh as daily work sessions are also likely to be general. For the server side aess patterns, we expet the files and subtrees with large frational weights to appear in other use ases. 5. ARCHITECTURAL IMPLICATIONS Setion 4 offered many speifi optimizations for plaement, ahing, delegation, and onsolidation deisions. We ombine the insights here to speulate on the arhiteture of future enterprise storage systems.

13 Fration of all app instanes Fration of all files # of files Temp dirs for real data Client aheable dirs Mix read dirs, mostly seq Metadata only dirs Mix write dirs, mostly seq Small random read dirs file aess patterns file aess patterns file aess patterns file aess patterns file aess patterns file aess patterns Figure 1: Corporate file aess patterns within eah deepest subtree. For eah deepest subtree aess pattern (i.e., eah graph), showing the number of files belonging to eah file aess pattern that belongs to subtrees in the subtree aess pattern. Corporate file aess pattern indies:. metadata only files; 1. sequential write files; 2. sequential read files; 3. small random write files; 4. small random read files; 5. less small random read files week # metadata only files sequential write files sequential read files small random write files smallest random read files small random read files medium partly seq write small read write speifi data plaement algorithms (Design insights 1 and 11). Also, the bakground metadata noise seen at all levels suggests that storage servers should both optimize for metadata aesses and redesign lient-server interations to derease the metadata hatter. Depending on the growth of metadata and the performane requirements, we also need to onsider plaing metadata on low lateny, non-volatile media like flash or SSDs. Figure 11: Corporate file aess patterns over 8 weeks. All patterns remain (hollow markers), but the frational weight of eah hanges greatly between weeks. Some small patterns temporarily appear and disappear (solid markers) week # supporting metadata app generated file updates ontent update app ontent viewing app - app generated ontent ontent viewing app - human generated ontent Figure 12: Corporate appliation instane aess patterns over 8 weeks. All patterns remain with near onstant frational weight. No new patterns appear. We see a lear separation of roles for lients and servers. The lient design an target high IO performane by a ombination of effiient delegation, prefething and ahing of the appropriate data. The servers should fous on inreasing their aggregated effiieny aross lients: ollaboration with lients (on ahing, delegation, et.) and exploiting user patterns to shedule bakground tasks. Automating bakground tasks suh as offline data dedupliation delivers apaity savings in a timely and hassle-free fashion, i.e., without system downtime or expliit sheduling. Regarding ahing at the server, we observe that very few aess patterns atually leverage the server s buffer ahe for data aesses. Design insights 4-6, 8 and 12 indiate a heavy role for the lient ahe and Design insight 7 suggests how not to use the server buffer ahe - ahing metadata only and ating as a warm/bakup ahe for lients would result in lower latenies for many aess patterns. We also see simple ways to take advantage of new storage media suh as SSDs. The lear identifiation of sequential and random aess file patterns enables effiient devie- Furthermore, we believe that storage systems should introdue many monitoring points to dynamially adjust the deision thresholds of plaement, ahing, or onsolidation poliies. We need to monitor both lients and servers. For example, when repeated read and overwrite files have been properly delegated to lients, the server would no longer see files with suh aess patterns. Without monitoring points at the lients, we would not be able to quantify the file delegation benefits. Storage systems should make extensible traing APIs to expedite the olletion of long-term future traes. This will failitate future work similar to ours. 6. CONCLUSIONS AND FUTURE WORK We must address the storage tehnology trends toward everinreasing sale, heterogeneity, and onsolidation. Current storage design paradigms that rely on existing trae analysis methods are ill equipped to meet the emerging hallenges beause they are unidimensional, fous only on the storage server, and are subjet to designer bias. We showed that a multi-dimensional, multi-layered trae-driven design methodology leads to more objetive design points with highly targeted optimizations at both storage lients and servers. Using our orporate and engineering use ases, we present a number of insights that informs future designs. We desribed in some detail the aess patterns we observed, and we enourage fellow storage system designers to extrat further insights from our observations. Future work inludes exploring the dynamis of hanging working sets and aess sequenes, with the goal of antiipating data aesses before they happen. Another worthwhile analysis is to look for optimization opportunities aross lients; this requires olleting traes at different lients, instead of only at the server. Also, we would like to explore opportunities for dedupliation, ompression, or data plaement. Doing so requires extending our analysis from data movement patterns to also inlude data ontent patterns. Furthermore, we would like to perform on-line analysis in live storage systems to enable dynami feedbak on plaement and optimization deisions. In addition, it would be useful

14 to build tools to synthesize the aess patterns, to enable designers to evaluate the optimizations we proposed here. We believe that storage system designers fae an inreasing hallenge to antiipate aess patterns. Our paper builds the ase that system designers an longer aurately antiipate aess patterns using intuition only. We believe that the orporate and engineering traes from our orporate headquarters would have similar use ases at other traditional and high-teh businesses. Other use ases would require us to perform the same trae olletion and analysis proess to extrat the same kind of ground truth. We also need similar studies at regular intervals to trak the evolving use of storage system. We hope that this paper ontributes to an objetive and prinipled design approah targeting rapidly hanging data aess patterns. NetApp, the NetApp logo, and Go further, faster are trademarks or registered trademarks of NetApp, In. in the United States and/or other ountries. 7. REFERENCES [1] N. Agrawal, W. J. Bolosky, J. R. Doueur, and J. R. Lorh. A Five-Year Study of File-System Metadata. In FAST 27. [2] E. Alpaydin. Introdution to Mahine Learning. MIT Press, Cambridge, Massahusetts, 24. [3] M. G. Baker, J. H. Hartman, M. D. Kupfer, K. W. Shirriff, and J. K. Ousterhout. Measurements of a distributed file system. In SOSP [4] P. Bodik, M. Goldszmidt, A. Fox, D. B. Woodard, and H. Andersen. Fingerprinting the dataenter: automated lassifiation of performane rises. In EuroSys 21. [5] Common Internet File System Tehnial Referene. Storage Network Industry Assoiation, 22. [6] IDC Whitepaper: The eonomis of Virtualization. Virtualization-appliation-based-ost-model-WP-EN. pdf. [7] J. R. Doueur and W. J. Bolosky. A Large-Sale Study of File-System Contents. In SIGMETRICS [8] D. Ellard, J. Ledlie, P. Malkani, and M. Seltzer. Passive NFS Traing of and Researh Workloads. In FAST 23. [9] A. Ganapathi, H. Kuno, U. Dayal, J. L. Wiener, A. Fox, M. Jordan, and D. Patterson. Prediting Multiple Metris for Queries: Better Deisions Enabled by Mahine Learning. In ICDE 29. [1] S. Gribble, G. S. Manku, E. Brewer, T. J. Gibson, and E. L. Miller. Self-Similarity in File Systems: Measurement and Appliations. In SIGMETRICS [11] The gzip algorithm. [12] IDC Report: Worldwide File-Based Storage Foreast Update. http: // [13] S. Kavalanekar, B. L. Worthington, Q. Zhang, and V. Sharda. Charaterization of storage workload traes from prodution Windows Servers. In IISWC 28. [14] Open Soure Clustering Software - C Clustering Library. luster/software.htm, 21. [15] A. Leung, S. Pasupathy, G. Goodson, and E. Miller. Measurement and analysis of large-sale network file system workloads. In USENIX ATC 28. [16] D. T. Meyer and W. J. Bolosky. A Study of Pratial Dedupliation. In FAST 21. [17] J. K. Ousterhout, H. D. Costa, D. Harrison, J. A. Kunze, M. Kupfer, and J. G. Thompson. A trae-driven analysis of the Unix 4.2 BSD file system. In SOSP [18] K. K. Ramakrishnan, P. Biswas, and R. Karedla. Analysis of file I/O traes in ommerial omputing environments. In SIGMETRICS [19] D. Roselli, J. Lorh, and T. Anderson. A omparison of file system workloads. In USENIX 2. [2] I. Stoia. A Berkeley View of Big Data: Algorithms, Mahines and People. UC Berkeley EECS Annual Researh Symposium, 211. [21] K. Thomas, C. Grier, J. Ma, V. Paxson, and D. Song. Design and evaluation of a real-time URL spam filtering servie. In IEEE Symposium on Seurity and Privay 211. [22] R. Villars. The Migration to Converged IT: What it Means for Infrastruture, Appliations, and the IT Organization. IDC Diretions Conferene 211. [23] VMware Whitepaper: Server Consolidation and Containment. [24] W. Vogels. File system usage in Windows NT 4.. In SOSP [25] M. Zhou and A. J. Smith. Analysis of Personal Computer Workloads. In MASCOTS 1999.