Design Implications for Enterprise Storage Systems via Multi-Dimensional Trace Analysis

Size: px
Start display at page:

Download "Design Implications for Enterprise Storage Systems via Multi-Dimensional Trace Analysis"

Transcription

1 Design Impliations for Enterprise Storage Systems via Multi-Dimensional Trae Analysis Yanpei Chen, Kiran Srinivasan, Garth Goodson, Randy Katz University of California, Berkeley, NetApp In. {yhen2, {skiran, ABSTRACT Enterprise storage systems are faing enormous hallenges due to inreasing growth and heterogeneity of the data stored. Designing future storage systems requires omprehensive insights that existing trae analysis methods are ill-equipped to supply. In this paper, we seek to provide suh insights by using a new methodology that leverages an objetive, multi-dimensional statistial tehnique to extrat data aess patterns from network storage system traes. We apply our method on two large-sale real-world prodution network storage system traes to obtain omprehensive aess patterns and design insights at user, appliation, file, and diretory levels. We derive simple, easily implementable, threshold-based design optimizations that enable effiient data plaement and apaity optimization strategies for servers, onsolidation poliies for lients, and improved ahing performane for both. Categories and Subjet Desriptors C.4 [Performane of Systems]: Measurement tehniques; D.4.3 [Operating Systems]: File Systems Management Distributed file systems 1. INTRODUCTION Enterprise storage systems are designed around a set of data aess patterns. The storage system an be speialized by designing to a speifi data aess pattern; e.g., a storage system for streaming video supports different aess patterns than a ument repository. The better the aess pattern is understood, the better the storage system design. Insights into aess patterns have been derived from the analysis of existing file system workloads, typially through trae analysis studies [1, 3, 17, 19, 24]. While this is the orret general strategy for improving storage system design, past approahes have ritial shortomings, espeially given reent hanges in tehnology trends. In this paper, we present a new design methodology to overome these shortomings. The data stored on enterprise network-attahed storage systems is undergoing hanges due to a fundamental shift in the underlying tehnology trends. We have observed three suh trends, inluding: Sale: Data size grows at an alarming rate [12], due to new types of soial, business and sientifi appliations [2], and the desire to never delete data. Heterogeneity: The mix of data types stored on these storage systems is beoming inreasingly omplex, eah having its own requirements and aess patterns [22]. Consolidation: Virtualization has enabled the onsolidation of multiple appliations and their data onto fewer storage servers [6, 23]. These virtual mahines (VMs) also present aggregate data aess patterns more omplex than those from individual lients. Better design of future storage systems requires insights into the hanging aess patterns due to these trends. While trae studies have been used to derive data aess patterns, we believe that they have the following shortomings: Unidimensional: Although existing methods analyze many aess harateristis, they do so one at a time, without revealing ross-harateristi dependenies. Expertise bias: Past analyses were performed by storage system designers looking for speifi patterns based on prior workload expetations. This introdues a bias that needs to be revisited based on the new tehnology trends. Storage server entri: Past file system studies foused primarily on storage servers. This reates a ritial knowledge gap regarding lient behavior. To overome these shortomings, we propose a new design methodology baked by the analysis of storage system traes. We present a method that simultaneously analyzes multiple harateristis and their ross dependenies. We use a multi-dimensional, statistial orrelation tehnique, alled k-means [2], that is ompletely agnosti to the harateristis of eah aess pattern and their dependenies. The K-means algorithm an analyze hundreds of dimensions simultaneously, providing added objetivity to our analysis. To further redue expertise bias, we involve as many relevant harateristis as possible for eah aess pattern. In addition, we analyze patterns at different granularities (e.g., at the user session, appliation, file level) on the storage server as well as the lient, thus addressing the need for understanding lient patterns. The resulting design insights enable poliies for building new storage systems.

2 Client side observations and design impliations 1. Client sessions with IO sizes >128KB are read only or write only. Clients an onsolidate sessions based on only the read-write ratio. 2. Client sessions with duration >8 hours do 1MB of IO. Client ahes an already fit an entire day s IO. 3. Number of lient sessions drops off linearly by 2% from Monday to Friday. Servers an get an extra day for bakground tasks by running at appropriate times during week days. 4. Appliations with <4KB of IO per file open and many opens of a few files do only random IO. Clients should always ahe the first few KB of IO per file per appliation. 5. Appliations with >5% sequential read or write aess entire files at a time. Clients an request file prefeth (read) or delegation (write) based on only the IO sequentiality. 6. Engineering appliations with >5% sequential read and sequential write are doing ode ompile tasks, based on file extensions. Servers an identify ompile tasks; server has to ahe the output of these tasks. Server side observations and design impliations 7. Files with >7% sequential read or write have no repeated reads or overwrites. Servers should delegate sequentially aessed files to lients to improve IO performane. 8. Engineering files with repeated reads have random aesses. Servers should delegate repeatedly read files to lients; lients need to store them in flash or memory. 9. All files are ative (have opens, IO, and metadata aess) for only 1-2 hours in a few months. Servers an use file idle time to ompress or dedupliate to inrease storage apaity. 1. All files have either all random aess or >7% sequential aess. (Seen in past studies too) Servers an selet the best storage medium for eah file based on only aess sequentiality. 11. Diretories with sequentially aessed files almost always ontain randomly aessed files as well. Servers an hange from per-diretory plaement poliy (default) to per-file poliy upon seeing any sequential IO to any files in a diretory. 12. Some diretories aggregate only files with repeated reads and overwrites. Servers an delegate these diretories entirely to lients, tradeoffs permitting. Table 1: Summary of design insights, separated into insights derived from lient aess patterns and server aess patterns. We analyze two reent, network-attahed storage file system traes from a prodution enterprise dataenter. Table 1 summarizes our key observations and design impliations, they will be detailed later in the paper. Our methodology leads to observations that would be diffiult to extrat using past methods. We illustrate two suh aess patterns, one showing the value of multi-granular analysis (Observation 1 in Table 1) and another showing the value of multi-feature analysis (Observation 8). First, we observe (Observation 1) that sessions with more than 128KB of data reads or writes are either read-only or write-only. This observation affets shared ahing and onsolidation poliies aross sessions. Speifially, lient OSs an detet and o-loate ahe sensitive sessions (read-only) with ahe insensitive sessions (write-only) using just one parameter (read-write ratio). This improves ahe utilization and onsolidation (inreased density of sessions per server). Similarly, we observe (Observation 8) that files with >7% sequential read or sequential write have no repeated reads or overwrites. This aess pattern involves four harateristis: read sequentiality, write sequentiality, repeated read behavior, and overwrite behavior. The observation leads to a useful poliy: sequentially aessed files do not need to be ahed at the server (no repeated reads), whih leads to an effiient buffer ahe. These observations illustrate that our methodology an derive unique design impliations that leverage the orrelation between different harateristis. To summarize, our ontributions are: Identify storage system aess patterns using a multidimensional, statistial analysis tehnique. Build a framework for analyzing traes at different granularity levels at both server and lient. Analyze our speifi traes and present the aess patterns identified. Derive design impliations for various storage system omponents from the aess patterns. In the rest of the paper, we motivate and desribe our analysis methodology (Setions 2 and 3), present the aess patterns we found and the design insights (Setion 4), provide the impliations on storage system arhiteture (Setion 5), and suggest future work (Setion 6). 2. MOTIVATION AND BACKGROUND Past trae-based studies have examined a range of storage system protools and use ases, delivering valuable insights for designing storage servers. Table 2 summarizes the ontributions of past studies. Many studies predate urrent tehnology trends. Analysis of real-world, orporate workloads or traes have been sparse, with only three studies among the ones listed [13, 15, 18]. A number of studies have foused on NFS trae analysis only [8, 1]. This fous somewhat neglets systems using the Common Internet File System (CIFS) protool [5], with only a single CIFS study [15]. CIFS systems are important sine CIFS is the network storage protool for Windows, the dominant OS on ommodity platforms. Our work uses the same traes as [15], but we perform analysis using a methodology that extrats multi-dimensional insights at different layers. This methodology is suffiiently different from prior work as to make the analysis findings not omparable. The following disusses the need for this methodology. 2.1 Need for Insights at Different Layers We divide our view of the storage system into behavior at lients and servers. Storage lients interfae diretly with users, who reate and view ontent via appliations. Separately, servers store the ontent in a durable and effiient fashion over the network. Past network storage system trae studies fous mostly on storage servers (Table 2). Storage lient behavior is underrepresented primarily due to the reliane on stateless NFS traes. This leaves a knowledge gap about aess patterns at storage lients. Speifially, these questions are unanswered: Do appliations exhibit lear aess patterns? What are the user-level aess patterns? Any orrelation between users and appliations? Do all appliations interat with files the same way?

3 Study Date of File N/w Multi- Multi- Data Trae Insights/ Traes System FS Dim. Layer Set Info Contributions Ousterhout, et al. [17] 1985 BSD Eng Live Seminal patterns analysis: Large, sequential read aess; limited read-write; bursty I/O; short file lifetimes, et. Ramakrishnan, et al. [18] VAX/ VMS Eng, HPC, Corp Live Relationship between files and proesses - on usage patterns, sharing, et. Baker, et al. [3] 1991 Sprite Eng Live Analysis of distributed file system; omparison to [17], ahing effets. Gribble, et al. [1] Sprite, NFS, VxFS Eng, Bakup Live, Snap Workload self-similarity Doueur, et al. [7] 1998 FAT, FAT32, NTFS Vogels [24] 1998 FAT, NTFS Eng Snap Analysis of file and diretory attributes: size, age, lifetime, diretory depth Eng, HPC Live, Snap Supported past observations and trends in NTFS Zhou et al. [25] 1999 VFAT PC Live Analysis of personal omputer workloads Roselli, et al. [19] VxFS, Eng, Live Inreased blok lifetimes, ahing NTFS Server strategies Ellard, et al. [8] 21 NFS Eng, Live NFS peuliarities, pathnames an aid file layout Agrawal, et al. [1] 2-4 FAT, FAT32, NTFS Eng Snap Distribution of file size and type in namespae, hange in file ontents over time Leung, et al. [15] 27 CIFS Corp, Eng Kavalanekar, et al. [13] 27 NTFS Web, Corp This paper 27 CIFS Corp, Eng Live File re-open, sharing, ativity harateristis; hanges ompared to previous studies Live Study of web (live maps, web ontent, et.) workloads in servers via events traing. Live Setion 4 Table 2: Past studies of storage system traes. Corp stands for orporate use ases. Eng stands for engineering use ases. Live implies live requests or events in traes were studied, Snap implies snapshots of file systems were studied. Insights on these aess patterns lead to better design of both lients and servers. They enable server apabilities suh as per session quality of servie (QoS), or per appliation servie level objetives (SLOs). They also inform various onsolidation, ahing, and prefething deisions at lients. Eah of these aess patterns is visible only at a partiular semanti layer within the lient: users or appliations. We define eah suh layer as an aess unit, with the observed behaviors at eah aess unit being an aess pattern. The analysis of lient side aess units represents an improvement on prior work. On the server side, we extend the previous fous on files. We need to also understand how files are grouped within a diretory, as well as ross-file dependenies and diretory organization. Thus, we perform multi-layer and ross-layer dependeny analysis on the server also. This is another improvement on past work. 2.2 Need for Multi-Dimensional Insights Eah aess unit has ertain inherent harateristis. Charateristis that an be quantified are features of that aess unit. For example, for an appliation, the read size in bytes is a feature; the number of unique files aessed is another. Eah feature represents an independent mathematial dimension that desribes an aess unit. We use the terms dimension, feature, and harateristi interhangeably. The global set of features for an aess unit is limitless. Piking a good feature set requires domain knowledge. Many reent studies analyze aess patterns only one feature at a time. This represents a key limitation. The resulting insights, although valuable, lead to uniform poliies around a single design point. For example, study [15] revealed that most bytes are transferred from larger files. Although this is an useful observation, it does not reveal other harateristis of suh large files: Do they have repeated reads? Do they have overwrites? Do they have many metadata requests? And so on. Adding these dimensions breaks up the predominant aess pattern into smaller, minority aess patterns, eah may require a speifi storage poliy. Understanding minority aess patterns is inreasingly important, beause the trend toward data heterogeneity implies that no ommon ase will dominate storage system behavior. Minority aess patterns beome visible only upon analyzing multiple features simultaneously, hene the need for multi-dimensional insights. We also need to selet a reasonable number of features. Doing so allows us to fully desribe the aess patterns and redue the bias in piking any one feature. Manually identifying multi-feature dependenies is diffiult, and an lead to an untenable analysis. Therefore, we need tehniques that analyze a large number of features, sale to a high number of analysis data points, and do not require a priori knowledge of any ross-feature dependenies. Multi-dimensional statistis tehniques have solved similar problems in other domains [4, 9, 21]. We an apply similar tehniques and ombine them with domain speifi knowledge of the storage systems being analyzed.

4 In short, the need for multi-layered and multi-dimensional insights motivates our methodology. 3. METHODOLOGY In this setion, we desribe our analysis method in detail. We start with a desription of the traes we analyzed, followed by a desription of the aess units seleted for our study. Next, we desribe key steps in our analysis proess, inluding seleting the right features for eah aess unit, using the k-means data lustering algorithm to identify aess patterns, and additional information needed to interpret and generalize the results. 3.1 Traes Analyzed We olleted CIFS traes from two large-sale, enterpriselass file servers deployed at our orporate dataenters. One server overs roughly 1 employees in marketing, sales, finane, and other orporate roles. We all this the orporate trae. The other server overs roughly 5 employees in various engineering roles. We all this the engineering trae. We desribed the trae olleting infrastruture in [15]. The orporate trae reflets ativities on 3TB of ative storage from 9/2/27 to 11/21/27. It ontains ativity from many Windows appliations. The engineering trae reflets ativities on 19TB of ative storage from 8/1/27 to 11/14/27. It interleaves ativity from both Windows and Linux appliations. In both traes, many lients use virtualization tehnologies. Thus, we believe we have representative traes with regards to the tehnology trends in sale, heterogeneity, and onsolidation. Also, sine protoolindependent users, appliations, and stored data remain the primary fators affeting storage system behavior, we believe our analysis is relevant beyond CIFS. 3.2 Aess Units As mentioned in Setion 2.1, we analyze aess patterns at multiple aess units at the server and the lient. Seleting aess units is subjetive. We hose aess units that form lear semanti design boundaries. On the lient side, we analyze two aess units: Sessions: Sessions reflet aggregate behavior of an user. A CIFS session is bounded by mathing session onnet and logoff requests. CIFS identifies it by a tuple - {lient IP address, session ID}. Appliation instane: Analysis at this level leads to appliation speifi optimizations in lient VMs. CIFS identifies eah appliation instane by the tuple - {lient IP address, session ID, and proess ID}. We also analyzed file open-loses, but obtained no useful insights. Hene we omit that aess unit from the paper. We also examined two server side aess units: File: Analyzing file level aess patterns failitates perfile poliies and optimization tehniques. Eah file is uniquely identified by its full path name. Deepest subtree: This aess unit is identified by the diretory path immediately ontaining the file. Analysis at this level enables per-diretory poliies. Session App. Instane App. Instane App. Instane Deepest subtree A Deepest subtree A/B Figure 1: Aess units analyzed. At lients, eah session ontains many appliation instanes. At servers, eah subtree ontains many files. Figure 1 shows the semanti hierarhy among different aess units. At lients, eah session ontains many appliation instanes. At servers, eah subtree ontains many files. 3.3 Analysis Proess Our method (Figure 2) involves the following steps: 1. Collet network storage system traes (Setion 3.1). 2. Define the desriptive features for eah aess unit. This step requires domain knowledge about storage systems (Setion 3.3.1). 3. Extrat multiple instanes of eah aess unit, and ompute from the trae the orresponding numerial feature values of eah instane. 4. Input those values into k-means, a multi-dimensional statistial data lustering tehnique (Setion 3.3.2). 5. Interpret the k-means output and derive aess patterns by looking at only the relevant subset of features. This step requires knowledge of both storage systems and statistis. We also need to extrat onsiderable additional information to support our interpretations (Setion 3.3.3). 6. Translate aess patterns to design insights. We give more details about Steps 2, 4, and 5 below Seleting features for eah aess unit Seleting the set of desriptive features for eah aess unit requires domain knowledge about storage systems (Step 2 in Figure 2). It also introdues some subjetivity, sine the hoie of features limits on how one aess pattern an differ from another. The human designer needs to selet some basi features initially, e.g., total IO size and read-write ratio for a file. We will not know whether we have a good set of features until we have ompleted the entire analysis proess. If the analysis results leave some design hoie ambiguities, we need to add new features to larify those ambiguities, again using domain knowledge. For example, for the deepest subtrees, we ompute various perentiles (25th, 5th, and 75th) of ertain features like read-write ratio beause the average value for those features did not learly separate the aess patterns. We then repeat the analysis proess using the new feature set. This iterative proess leads to a long feature set for all aess units, somewhat reduing the subjetive bias of a small feature set. We list in Setion 4 the hosen features for eah aess unit. Most of the features used in our analysis (Setion 4) are selfexplanatory; some ambiguous or omplex features require preise definitions, suh as: File File File File

5 1. Trae olletion 2. Selet layers, define features 3. Compute numerial feature values means to other lustering algorithms suh as hierarhial lustering and k-means derivatives [2]. 6. Design impliations 5. Interpret results 4. Identify aess patterns by k-means Figure 2: Methodology overview. The two-way arrows and the loop from Step 2 through Step 5 indiate our many iterations between the steps. IO: We use IO as a substitute for read and write. Sequential reads or writes: We onsider two read or writes requests to be sequential if they are onseutive in time, and the file offset + request size of the first request equals the file offset of the seond request. A single read or write request is by definition not sequential. Repeated reads or overwrites: We trak aesses at 4KB blok boundaries within a file, with the offset of the first blok being zero. A read is onsidered repeated if it aesses a blok that has been read in the past half hour. We use an equivalent definition for overwrites Identifying aess patterns via k-means A key part of our methodology is the k-means multi-dimensional orrelation algorithm. We use it to identify aess patterns simultaneously aross many features (Step 4 in Figure 2). K-means is a well-known, statistial orrelation algorithm. It identifies sets of data points that ongregate around a region in n-dimensional spae. These ongregations are alled lusters. Given data points in an n- dimensional spae, k-means piks k points at random as initial luster enters, assigns data points to their nearest luster enters, and reomputes new luster enters via arithmeti means aross points in the luster. K-means iterates the assignment-reompute proess until the luster enters beome stationary. K-means an run with multiple sets of initial luster enters and return the best result [2]. For eah aess unit, we extrat different instanes of it from the trae, i.e., all session instanes, appliation instanes, et. For eah instane, we ompute the numerial values of all its features. This gives us a data array in whih eah row orrespond to an instane, i.e., a data point, and eah olumn orrespond to a feature, i.e., a dimension. We input the array into k-means, and the algorithm finds the natural lusters aross all data points. We onsider all data points in a luster as belonging to a single equivalene lass, i.e., a single aess pattern. The numerial values of the luster enters indiate the harateristis of eah aess pattern. We hoose k-means for two reasons. First, k-means is algorithmially simple. This allows rapid proessing on large data sets. We used a modified version of the k-means C library [14], in whih we made some improvements to limits the memory footprint when proessing large data sizes. Seond, k-means leads to intuitive labels of the luster enters. This helps us translate the statistial behavior extrated from the traes into tangible insights. Thus, we prefer k- K-means requires us to speify k, the number of lusters. This is a diffiult task sine we do not know a priori the number of natural lusters in the data. We ompute the intra-luster residual variane from the k-means results - the sum of squared distanes from eah data point to its assigned luster enter. This is a standard metri for luster quality, and gives us a lower bound on k. We annot set k so small that the residual variane forms a large fration of the total variane, i.e., residual variane the sum of squared distanes from eah data point to the global average of all data points. We optionally inrease k beyond the lower bound until some key aess patterns an be separated. Conurrently, we take are not to inrease k too high, to prevent having an unwieldy number of aess patterns and design targets. We applied this reasoning to set k at eah lient and server aess unit Interpreting and generalizing the results The k-means algorithm gives us a set of aess patterns with various harateristis. We need additional information to understand the signifiane of the results. This information omes from omputing various seondary data outside of k- means analysis (Step 5 in Figure 2: We gathered the start and end times of eah session instane, aggregated by times of the day and days of the week. This gave us insight into how users launh and end sessions. We examine filename extensions of files assoiated with every aess pattern belonging to these aess units: appliation instanes, files, and deepest subtrees. This information onnets the aess patterns to more easily reognizable file extensions. We perform orrelation analysis between the file and deepest subtrees aess units. Speifially, we ompute the number of files of eah file aess pattern that is loated within diretories in eah deepest subtree aess pattern. This information aptures the organizations of files in diretories. Suh information gives us a detailed piture about the semantis of the aess patterns, resulting in human understandable labels to the aess patterns. Suh labels help us translate observations to design impliations. Furthermore, after identifying the design impliations, we explore if the design insights an be extrapolated to other trae periods and other storage system use ases. We aomplish this by repeating our exat analysis over multiple subsets of the traes, for example, a week s worth of traes at a time. This allow us to examine how our analysis would be different had we obtained only a week s trae. Aess patterns that are onsistent, stable aross different weeks would indiate that they are likely to be more general than just our traing period or our use ases. 4. ANALYSIS RESULTS & IMPLICATIONS This setion presents the aess patterns we identified and the aompanying design insights. We disuss lient and

6 (a). Desriptive features for eah session Duration Total metadata requests Overwrite ratio Diretories aessed Total IO size Avg. time between IO requests Tree onnets Appliation instanes seen Read:write ratio by bytes Read sequentiality Unique trees aessed Total IO requests Write sequentiality File opens Read:write ratio by requests Repeated read ratio Unique files opened (b). Corporate session Full day work Half day on- Short ontent Short ontent Supporting Supporting aess patterns tent viewing viewing generation metadata read-write % of all sessions.5%.7% 1.2%.2% 96% 1.4% Duration 8 hrs 4 hrs 1 min 7 min 7 se 1 se Total IO size 11 MB 3 MB 128 KB 3 MB 42 B Read:write ratio by bytes 3:2 1: 1: :1 : 1:1 Metadata requests Read sequentiality 7% 8% % - - % Write sequentiality 8% - - 9% - % File opens:files 2:4 8:15 3:7 5:15 : 6:3 Tree onnet:trees 5:2 3:2 2:2 2:2 1:1 2:2 Diretories aessed Appliation instanes (). Engineering session Full day work Human edit Appliation gene- Short ontent Supporting Mahine geneaess patterns small files rated bakup/opy generation metadata rated update % of all sessions.4% 1.% 4.4%.4% 9% 3.6% Duration 1 day 2 hrs 1 min 1 hr 1 se 1 se Total IO size 5 MB 5 KB 2 MB 2 MB 36 B Read:write ratio 7:4 1:1 1: :1 : 1: Metadata requests Read sequentiality 6% % 9% - - % Write sequentiality 7% % - 9% - - File opens:files 13:2 9:2 6:5 15:6 : 1:1 Tree onnet:trees 1:1 1:1 1:1 1:1 1:1 1:1 Diretories aessed Appliation instanes Table 3: Session aess patterns. (a): Full list of desriptive features. (b) and (): Short names and desriptions of sessions in eah aess pattern; listing only the features that help separate the aess patterns. serve side aess patterns (Setion 4.1, 4.2). We also hek if these patterns persist aross time (Setion 4.3). For eah aess unit, we list the desriptive features (only some of whih help separate aess patterns), outline how we derived the high-level name (label) for eah aess pattern, and disuss relevant design insights. 4.1 Client Side Aess Patterns As mentioned in Setion 3.2, we analyze sessions and appliation instanes at lients Sessions Sessions reflet aggregate behavior of human users. We used 17 features to desribe sessions (Table 3). The orporate trae has 59,76 sessions, and the engineering trae has 232,33. In Table 3, we provide quantitative desriptions and short names for all the session aess patterns. We derive the names from examining the signifiant features: duration, read-write ratio, and IO size. We also looked at the aggregate session start and end times to get additional semanti knowledge about eah aess pattern. Figure 3 shows the start and end times for seleted session aess patterns. The start times of orporate full-day work sessions orrespond exatly to the U.S. work day 9am start, 12pm lunh, 5pm end. Corporate ontent generation sessions show slight inrease in the evening and towards Friday, indiating rushes to meet daily or weekly deadlines. In the engineering trae, the appliation generated bakup and mahine generated update sessions depart signifiantly from human workday and work week patterns, leading us to label them as appliation and mahine (lient OS) generated. One surprise was that the supporting metadata sessions aount for >9% of all sessions in both traes. We believe these sessions are not humanly generated. They last roughly 1 seonds, leaving little time for human mediated interations. Also, the session start rate averages to roughly one per employee per minute. We are ertain that our olleagues are not onneting and logging off every minute of the entire day. However, the shape of the start time graphs have a strong orrelation with the human work day and work week. We all these supporting metadata sessions mahine generated in support of human user ativities. These metadata sessions form a sort of bakground noise to the storage system. We observe the same bakground noise at other layers both at lients and servers. Observation 1: The sessions with IO sizes greater than 128KB are either read-only or write-only, exept for the full-day work sessions. Among these sessions, only read-only sessions utilize buffer ahe for repeated reads and prefethes. Write-only sessions only use the ahe to buffer writes. Thus, if we have a ahe evition poliy that reognizes their writeonly nature and releases the buffers immediately on flushing dirty data, we an satisfy many write-only sessions with relatively little buffer ahe spae. We an attain better onsolidation and buffer ahe utilization by managing the ratio of o-loated read-only and write-only sessions. This insight an be used by virtualization managers and lient operating systems to manage a shared buffer ahe between sessions. Reognizing suh read-only and write-only sessions

7 # of sessions # of sessions Corp full day work start end Corp short ontent generation 5 25 Corp supporting metadata Eng appliation generated bakup or opy Eng mahine generated update hrs of the day hrs of the day hrs of the day hrs of the day hrs of the day days of the week days of the week days of the week days of the week days of the week Figure 3: Number of sessions that start or ends at a partiular time. Number of session starts and ends in times of the day (top) and session starts in days of the week (bottom). Showing only seleted aess patterns. is easy. Examining a session s total read size and write size reveals their read-only or write-only nature. Impliation 1: Clients an onsolidate sessions effiiently based only on the read-write ratio. Observation 2: The full-day work, ontent-viewing, and ontent-generating sessions all do 1MB of IO. This means that a lient ahe of 1s of MB an fit the working set of a day for most sessions. Given the growth of flash devies on lients for ahing, despite large-sale onsolidation, lients should easily ahe a day s worth of data for all users. In suh a senario, most IO requests would be absorbed by the ahe, reduing network lateny and bandwidth utilization, and load on the server. Moreover, omplex ahe evition algorithms are unneessary. Impliation 2: Clients ahes an already fit an entire day s IO. Observation 3: The number of human-generated sessions and supporting sessions peaks on Monday and dereases steadily to 8% of the peak on Friday (Figure 3). This is true for all human generated sessions, inluding the ones not shown in Figure 3. There is onsiderable slak in the server load during evenings, lunh times, and even during working hours. This implies that the server an perform bakground tasks suh as onsisteny heks, maintenane, or ompression/dedupliation, at appropriate times during the week. A simple ount of ative sessions an serve as an effetive start and stop signal. By omputing the area under the urve for session start times by days of the week, we estimate that bakground tasks an squeeze out roughly one extra day s worth of proessing without altering the peak demand on the system. This is a 5% improvement over a setup whih performs bakground tasks only during weekends. In the engineering trae, the appliation generated bakup or opy sessions seem to have been already designed this way. Impliation 3: Servers get an extra day for bakground tasks by running them at appropriate times during week-days Appliation instanes Appliation instane aess patterns reflets appliation behavior, failitating appliation speifi optimizations. We used 16 features to desribe appliation instanes (Table 4). The orporate trae has 138,723 appliation instanes, and the engineering trae has 741,319. Table 4 provides quantitative desriptions and short names for all the appliation instane aess patterns. We derive the names from examining the read-write ratio, IO size, and file extensions aessed (Figures 4 and 5). We see again the metadata bakground noise. The supporting metadata appliation instanes aount for the largest fration, and often do not even open a file. There are many files without a file extension, a phenomenon also observed in reent storage system snapshot studies [16]. We notie that file extensions turn out to be poor indiators of appliation instane aess patterns. This is not surprising beause we separate aess patterns based on read/write properties. A user ould either view a. or reate a.. The same appliation software has different read/write patterns. This speaks to the strength of our multi-layer framework. Aggregating IO by appliation instanes gives lean separation of patterns; while aggregating just by appliation software or file extensions will not. We also find it interesting that most file extensions are immediately reognizable. This means that what people use network storage systems for, i.e., the file extensions, remains easily reognizable, even though how people use network storage systems, i.e., the aess patterns, is ever hanging and beoming more omplex. Observation 4: The small ontent viewing appliation and ontent update appliation instanes have <4KB total reads per file open and aess a few unique files many times. The small read size and multiple reads from the same files means that lients should prefeth and plae the files in a ahe optimized for random aess (flash/ssd/memory). The trend towards flash ahes on lients should enable this transfer. Appliation instanes have bi-modal total IO size - either

8 Fration of appliation instanes Fration of appliation instanes very small or large. Thus, a simple ahe management algorithm suffies; we always keep the first 2 bloks of 4KB in ahe. If the appliation instane does more IO, it is likely to have IO size in the 1KB-1MB range, so we evit it from the ahe. We should note that suh a poliy makes sense even though we proposed earlier to ahe all 11MB of a typial day s working set - 11MB of ahe beomes a onern when we have many onsolidated lients. Impliation 4: Clients should always ahe the first few KB of IO per file per appliation. Observation 5: We see >5% sequential read and write ratio for the ontent update appliations instanes (orporate) and the ontent viewing appliations instanes for humangenerated ontent (both orporate and engineering). Dividing the total IO size by the number of file opens suggest that these appliation instanes are sequentially reading and writing entire files for offie produtivity (.,.,.ppt,.pdf, et.) and multimedia appliations. This implies that the files assoiated with these appliations should be prefethed and delegated to the lient. Prefething means delivering the whole file to the lient before the whole file is requested. Delegation means giving a lient temporary, exlusive aess to a file, with the lient periodially synhronizing to server to ensure data durability. CIFS does delegation using opportunisti loks, while NFSv4 has a dediated operation for delegation. Prefething and delegation of suh files will improve read and write performane, lower network traffi, and lighten server load. The aess patterns again offer a simple, threshold-based deision algorithm. If an appliation instane does more than 1s of KB of sequential IO, and has no overwrite, then it is likely to be a ontent viewing or update appliation instane; suh files are prefethed and delegated to the lients. Impliation 5: Clients an request file prefeth (read) and delegation (write) based on only IO sequentiality. Observation 6: Engineering appliations with >5% sequential reads and >5% sequential writes are doing ode ompile tasks. We know this from looking at the file extensions in Figure 5. These ompile proesses show read sequentiality, write sequentiality, a signifiant overwrite ratio and large number of metadata requests. They rely on the server heavily for data aesses. We need more detailed lient side information to understand why lient ahes are ineffetive in this ase. However, it is lear that the server ahe needs to prefeth the read files for these appliations. The high perentage of sequential reads and writes gives us another threshold-based algorithm to identify these appliations. Impliation 6: Servers an identify ompile tasks by the presene of both sequential reads and writes; server has to ahe the output of these tasks. 4.2 Server Side Aess Patterns As mentioned in Setion 3.2, we analyzed two kinds of server side aess units: files and deepest subtrees Files File aess patterns help storage server designers develop per-file plaement and optimization tehniques. We used 25 features to desribe files (Table 5). Note that some of n.f.e. + lnk n.f.e. + ppt ini pdf n.f.e. + n.f.e. + ontent viewing app - app generated ontent n.f.e. + n.f.e. no files opened supporting metadata n.f.e. + n.f.e. + htm n.f.e. + n.f.e. + n.f.e. app generated file updates n.f.e. + pdf n.f.e. + ppt n.f.e. + pdf n.f.e. + ontent viewing app - human generated ontent n.f.e. + pdf n.f.e. + ppt n.f.e. + lnk n.f.e. + n.f.e. + ontent update app Figure 4: File extensions for orporate appliation instane aess patterns. For eah aess pattern (olumn), showing the fration of the two most frequent file extensions that are aessed together within a single appliation instane. n.f.e. denotes files with no file extension d + h n.f.e. + h + d h + dbo h + h + o ompilation app n.f.e. no files opened supporting metadata txt n.f.e. + n.f.e. rnd ontent update app - small pdf bmp ontent viewing app - human generated ontent txt n.f.e. pst rnd + n.t.e. ontent viewing app - small Figure 5: File extensions for engineering appliation instane aess patterns. For eah aess pattern (olumn), showing the fration of the two most frequent file extensions that are aessed together within a single appliation instane. n.f.e. denotes files with no file extension. the features inlude different perentiles of a harateristi, e.g., read request size as perentiles of all read requests. We believe inluding different perentiles rather than just the average would allow better separation of aess patterns. The orporate trae has 1,155,99 files, and the engineering trae has 1,89,571. In Table 5, we quantitative desriptions and short names for all the file aess patterns. Figures 6 and 7 give the most ommon file extensions in eah. We derived the names by examining the read-write ratio and IO size. For the engineering trae, examining the file extensions also proved useful, leading to labels suh as edit ode and ompile output, and read only log/bakup.

9 (a). Desriptive features for eah appliation instane Total IO size Total metadata requests Repeated read ratio File opens Read:write ratio by bytes Avg. time between IO requests Overwrite ratio Unique files opened Total IO requests by bytes Read sequentiality Tree onnets Diretories aessed Read:write ratio by requests Write sequentiality Unique trees aessed File extensions aessed (b). Corporate appliation Content viewing app - Supporting App generated Content viewing app - Content update instane aess patterns app generated ontent metadata file updates human generated ontent app % of all app instanes 16% 56% 14% 8.8% 5.1% Total IO 1 KB 1 KB 8 KB 3.5 MB Read:write ratio 1: : 1:1 1: 2:3 Metadata requests Read sequentiality 5% - % 8% 5% Write sequentiality - - % - 8% Overwrite ratio - - % - 5% File opens:files 19:4 : 1:4 2:4 6:11 Tree onnet:trees 2:2 : 2:2 2:2 2:2 Diretories aessed File extensions aessed (). Engineering appliation Compilation app Supporting Content update Content viewing app - Content viewing instane aess patterns metadata app - small human generated ontent app - small % of all app instanes 1.6% 93%.9% 2.% 2.5% Total IO 2 MB 2 KB 1 MB 3 KB Read:write ratio 9:1 : :1 1: 1: Metadata requests Read sequentiality 5% - - 9% % Write sequentiality 8% - % - - Overwrite ratio 2% - % - - File opens:files 145:75 : 3:1 5:4 2:1 Tree onnet:trees 1:1 : 1:1 1:1 1:1 Diretories aessed File extensions aessed Table 4: Appliation instane aess patterns. (a): Full list of desriptive features. (b) and (): Short names and desriptions of appliation instanes in eah aess pattern; listing only the features that help separate the aess patterns. We see that there are groupings of files with similar extensions. For example, in the orporate trae, the small random read aess patterns inlude many file extensions assoiated with web browser ahes. Also, multi-media files like.mp3 and. ongregate in the sequential read and write aess patterns. In the engineering trae, ode libraries group under the sequential write files, and read only log/bakup files ontain file extensions. to.99. However, the most ommon file extensions in eah trae still spread aross many aess patterns, e.g., offie produtivity files in the orporate trae and ode files in the engineering trae. Observation 7: For files with >7% sequential reads or sequential writes, the repeated read and overwrite ratios are lose to zero. This implies that there is little benefit in ahing these files at the server. They should be prefethed as a whole and delegated to the lient. Again, the bimodal IO sequentiality offers a simple algorithm for the server to detet whih files should be prefethed and delegated if a file has any sequential aess, it is likely to have a high perentage of sequential aess, therefore it should be prefethed and delegated to the lient. Future storage servers an suggest suh information to lients, leading to delegation requests. Impliation 7: Servers should delegate sequentially aessed files to lients to improve IO performane. Observation 8: In the engineering trae, only the edit ode and ompile output files have a high % of repeated reads. Those files should be delegated to the lients as well. The repeated reads do not show up in the engineering appliation instanes, possibly beause a ompilation proess launhes many hild proesses repeatedly reading the same files. Eah hild proess reads fresh data, even though the server sees repeated reads. With larger memory or flash ahes at lients, we expet this behavior to drop. The working set issues that lead to this senario need to be examined. If the repeated reads ome from a single lient, then the server an suggest that the lient ahe the appropriate files. We an again employ a threshold-based algorithm. Deteting any repeated reads at the server signals that the file should be delegated to the lient. At worst, only the first few reads will hit the server. Subsequent repeated reads are stopped at the lient. Impliation 8: Servers should delegate repeatedly read files to lients. Observation 9: Almost all files are ative (have opens, IO, and metadata aess) for only 1-2 hours over the entire trae period, as indiated by the typial opens/read/write ativity of all aess patterns. There are some regularly aessed files, but they are so few that they do not affet the k-means analysis. The lak of regular aess for most files means that there is room for the server to employ tehniques to inrease apaity by doing ompation on idle files. Common tehniques inlude dedupliation and ompression. The ativity on these files indiate that the IO performane impat should be small. Even if run onstantly, ompation has a low probability of affeting an ative file. Sine ommon libraries like gzip optimize for deompression [11], deompressing files at read time should have only slight performane impat. Impliation 9: Servers an use file idle time to ompress or dedupliate data to inrease storage apaity.

10 Fration of files Fration of files (a). Desriptive features for eah file Number of hours with 1, 2-3, or 4 file opens Number of hours with 1-1KB, 1KB-1MB, or >1MB reads Number of hours with 1-1KB, 1KB-1MB, or >1MB writes Number of hours with 1, 2-3, or 4 metadata requests Read request size - 25th, 5th, and 75th perentile of all requests Write request size - 25th, 5th, and 75th perentile of all requests Avg. time between IO requests - 25th, 5th, and 75th perentile of all request pairs Read sequentiality Write sequentiality Read:write ratio by bytes Repeated read ratio Overwrite ratio (b). Corporate file Metadata only Sequential write Sequential read Small random Smallest random Small random aess patterns write read read % of all files 59% 4.% 4.1% 4.7% 19% 9.2% Opens ativity 2hrs, 1 open 1hr, 2-3 opens 1hr, 2-3 opens 1hr, 2-3 opens 1hr, 1 open 1hr, 1 open Read ativity 1hr, 1KB-1MB 1hr, 1-1KB 1hr, 1-1KB Write ativity 1hr, 1KB-1MB 1hr, 1-1KB Read request size KB - 2KB 32KB Write request size - 6KB KB - - Read sequentiality - - 7% - % % Write sequentiality - 8% - % - - Read:write ratio : :1 1: :1 1: 1: (). Engineering file Metadata only Sequential write Small random Edit ode & Sequential read Read-only aess patterns read ompile output log/bakup % of all files 42% 1.9% 32% 7.3% 8.3% 8.1% Opens ativity 1hr, 1 open 1hr, 2-3 opens 1hr, 2-3 opens 1hr, 2-3 opens 1hr, 2-3 opens 2hrs, 2-3 opens Read ativity 1hr, 1-1KB 1hr, 1-1KB 1hr, 1-1KB 2hrs, 1-1KB Write ativity 1hr, >1MB Read request size KB 4KB 8-16KB 1KB Write request size - 64KB Read sequentiality - - % % 7% % Write sequentiality - 9% Repeated read ratio - - % 5% % % Read:write ratio : :1 1: 1: 1: 1: Table 5: File aess patterns. (a): Full list of desriptive features. (b) and (): Short names and desriptions of files in eah aess pattern; listing only the features that help separate the aess patterns xml io z pdf mp3 ppt pdf mp3 swf htm zip gif xml ppt eth h d bf o wma jar dll mp3 gif d h o d bq h h o bq -99 metadata only files seq write files seq read files small random write files smallest random read files small random read files idle, only metadata seq write files small random read files edit ode & ompile output seq read files read only log/bakup files Figure 6: File extensions for orporate files. Fration of file extensions in eah file aess pattern. Figure 7: File extensions for engineering files. Fration of file extensions in eah file aess pattern. Observation 1: All files have either all random aess or >7% sequential aess. The small random read and write files in both traes an benefit from being plaed on media with high random aess performane, suh as solid state drives (SSDs). Files with a high perentage of sequential aess an reside on traditional hard disk drives (HDDs), whih already optimize for sequential aess. The bimodal IO sequentiality offers yet another threshold-based plaement algorithm if a file has any sequential aess, it is likely to have a high perentage of sequential aess; therefore plae it on HDDs. Otherwise, plae it on SSDs. We note that there are more randomly aessed files than sequentially aessed files. Even though sequential files tend to be larger, we still need to do a working set analysis to determine the right size of server SSDs for eah use ase. Impliation 1: Servers an selet the best storage medium for eah file based only on aess sequentiality Deepest subtrees Deepest subtree aess patterns help storage server designers develop per-diretory poliies. We used 4 features to desribe deepest subtrees (Table 6). Some of the features

11 Fration of files Fration of files xml temp dirs for real data gif msg xml zip swf pdf mp3 lient mix read aheable dirs, dirs mostly seq xml io z metadata only dirs mp3 pdf mix write dirs, mostly seq zip htm gif xml small random read dirs Figure 8: File extensions for orporate deepest subtrees. Fration of file extensions in deepest subtree aess patterns eth h d o bf metadata only dir bq h small random read dir d bq h d o lient mix read aheable dirs, dirs mostly seq dll ryy jar seq write dirs obj dbo d o temp dirs for real data Figure 9: File extensions for engineering deepest subtrees. Fration of file extensions in deepest subtree aess patterns. inlude different perentiles of a harateristi, e.g. per file read sequentiality as perentiles of all files in a diretory. Inluding different perentiles rather than just the average allows better separation of aess patterns. The orporate trae has 117,64 deepest subtrees, and the engineering trae has 161,858. We use diretories and deepest subtrees interhangeably. In Table 6, we provide quantitative desriptions and short names for all the deepest subtree aess patterns. We derive the names using two types of information. First, we analyze the file extensions in eah subtree aess pattern (Figures 8 and 9). Seond, we examine how many files of eah file aess patterns are within eah subtree pattern (Figures 1). For brevity, we show only the graph for orporate deepest subtrees. The graph for the engineering deepest subtrees onveys the same information with regard to our design insights. For example, the random read and lient aheable labels ome from looking at the IO patterns. Temporary diretories aounted for the. files in those diretories. Mix read and mix write diretories onsidered the presene of both sequential and randomly aessed files in those diretories. The metadata bakground noise remains visible at the subtree layer. The spread of file extensions is similar to that for file aess patterns some file extensions ongregate and spread evenly. Interestingly, some subtrees have a large fration of metadata-only files that do not affet the desriptions of those subtrees. Some subtrees ontain only files of a single aess pattern (e.g., small random read subtrees in Figures 1). There, we an apply the design insights from the file aess patterns to the entire subtree. For example, the small random read subtrees an reside on SSDs. Sine there are more files than subtrees, per-subtree poliies an lower the amount of poliy information kept at the server. In ontrast, the mix read and mix write diretories ontain both sequential and randomly aessed files. Those subtrees need per-file poliies: Plae the sequentially aessed files on HDDs and the randomly aessed files on SSDs. Soft links to files an preserve the user-faing diretory organization, while allowing the server optimize per-file plaement. The server should automatially deide when to apply per-file or per-subtree poliies. Observation 11: Diretories with sequentially aessed files almost always ontain randomly aessed files also. Conversely, some diretories with randomly aess files will not ontain sequentially aessed files. Thus, we an default all subtrees to per-subtree poliies. Conurrently, we trak the IO sequentiality per subtree. If the sequentiality is above some threshold, then the subtree swithes to per-file poliies. Impliation 11: Servers an hange from per-diretory plaement poliy (default) to per-file poliy upon seeing any sequential IO to any files in a diretory. Observation 12: The lient aheable subtrees and temporary subtrees aggregate files with repeated reads or overwrites. Additional omputation showed that the repeated reads and overwrites almost always ome from a single lient. Thus, it is possible for the entire diretory to be prefethed and delegated to the lient. Delegating entire diretories an preempt all aesses that are loal to a diretory, but onsumes lient ahe spae. We need to understand the tradeoffs through a more in-depth working set and temporal loality analysis at both the file and deepest subtree levels. Impliation 12: Servers an delegate repeated read and overwrite diretories entirely to lients, tradeoffs permitting. 4.3 Aess Pattern Evolutions Over Time We want to know if the aess patterns are restrited to our partiular traing period or if they persist aross time. Only if the design insights remain relevant aross time an we rationalize their existene in similar use ases. We do not have enough traes to generalize beyond our monitoring period. We investigate the reverse problem - if we

12 (a). Desriptive features for eah subtree Number of hours with 1, 2-3, or 4 file opens Read:write ratio - 25th, 5th, and 75th perentile of files Number of hours with 1-1KB, 1KB-1MB, or >1MB reads Repeated read ratio - 25th, 5th, and 75th perentile of files Number of hours with 1-1KB, 1KB-1MB, or >1MB writes Overwrite ratio - 25th, 5th, and 75th perentile of files Number of hours with 1, 2-3, or 4 metadata requests Read sequentiality - aggregated aross all files Read request size - 25th, 5th, and 75th perentile of all requests Write sequentiality - aggregated aross all files Write request size - 25th, 5th, and 75th perentile of all requests Read:write ratio - aggregated aross all files Avg. time between IO requests - 25th, 5th, and 75th perentile of all request pairs Repeated read ratio - aggregated aross all files Read sequentiality - 25th, 5th, and 75th perentile of files in the subtree Overwrite ratio - aggregated aross all files Write sequentiality - 25th, 5th, and 75th perentile of files in the subtree (b). Corp. subtree Temp dirs for Client ahe- Mix read dirs, Metadata only Mix write dirs, Small random aess patterns real data able dirs mostly sequential dirs mostly sequential read dirs % of all subtrees 2.3% 4.1% 5.6% 64% 3.5% 21% Opens ativity 3hrs, >4 opens 3hr, 1 open 2hr, 1 open 2hr, 1 open 1hr, >4 opens 1hr, >4 opens Read ativity 3hrs, 1-1KB 2hrs, 1-1KB 1hr, 1-1KB 1hr, 1-1KB Write ativity 2hrs,.1-1MB 1hr, >1MB Read request size 4KB 4-1KB 4-32KB KB Write request size 4KB KB - Read sequentiality 1-3% % 5-7% - - % Write sequentiality 5-7% % - Repeat read ratio 2-5% 5% % - - % Overwrite ratio 3-7% % - Read:write ratio 1: to :1 1: 1: : :1 1: (). Eng. subtree Metadata only Small random Client ahe- Mixed read dirs, Sequential Temp dirs for aess patterns dirs read dirs able dirs mostly sequential write dirs real data % of all subtrees 59% 25% 6.1% 7.1% 1.9% 1.3% Opens ativity 1hr, 2-3 opens 1hr, >4 opens 1hr, >4 opens 1hr, >4 opens 1hr, >4 opens 3hrs, >4 opens Read ativity 1hr, 1-1KB 1hr, 1-1KB 1hr,.1-1MB 3hrs, 1-1KB Write ativity 1hr,.1-1MB 1hr, 1-1KB Read request size - 1-4KB 2-4KB 8-1KB KB Write request size KB 4-6KB Read sequentiality - % % 4-7% % Write sequentiality % 6-8% Repeat read ratio - % 5-6% % - -4% Overwrite ratio % -3% Read:write ratio : 1: 1: 1: :1 1: to :1 Table 6: Deepest subtree aess patterns. (a): Full list of desriptive features. (b) and (): Short names and desriptions of subtrees in eah aess pattern; listing only the features that help separate aess patterns. had to analyze traes from only a subset of our traing period, how would our results differ? We divided our traes into weeks and repeated the analysis for eah week. For brevity, we present only the results for weekly analysis of orporate appliation instanes and files. These two layers have yielded the most interesting design insights and they highlight separate onsiderations at the lient and server. Figure 11 shows the result for files. All the large aess patterns remain steady aross the weeks. However, the aess pattern orresponding to the smallest number of files, the small random write files, omes and goes week to week. There are exatly two, temporary, previously unseen aess patterns that are very similar to the small random files. The peaks in the metadata only files orrespond to weeks that ontain U.S. federal holidays or weeks immediately preeding a holiday long weekend. Furthermore, the numerial values of the desriptive features for eah aess pattern vary in a moderate range. For example, the write sequentiality of the sequentiality write files ranges from 5% to 9%. Figure 12 shows the result for appliation instanes. We see no new aess patterns, and the frational weight of eah aess pattern remains nearly onstant, despite holidays. Furthermore, the numerial values of desriptive features also remain nearly onstant. For example, the write sequentiality of the ontent update appliations varies in a narrow range from 8% to 85%. Thus, if we had done our analysis on just a week s traes, we would have gotten nearly idential results for appliation instanes, and qualitatively similar result for files. We believe that the differene omes from the limited duration of lient sessions and appliation instanes, versus the long-term persistene of files and subtrees. Based on our results, we are onfident that the aess patterns are not restrited just to our partiular trae period. Future storage systems should ontinuously monitor the aess patterns at all levels, automatially adjusting poliies as needed, and notify designers of previously unseen aess patterns. We should always be autious when generalizing aess patterns from one use ase to another. For use ases with the same appliations running on the same OS file API, we expet to see the same appliation instane aess patterns. Session aess patterns suh as daily work sessions are also likely to be general. For the server side aess patterns, we expet the files and subtrees with large frational weights to appear in other use ases. 5. ARCHITECTURAL IMPLICATIONS Setion 4 offered many speifi optimizations for plaement, ahing, delegation, and onsolidation deisions. We ombine the insights here to speulate on the arhiteture of future enterprise storage systems.

13 Fration of all app instanes Fration of all files # of files Temp dirs for real data Client aheable dirs Mix read dirs, mostly seq Metadata only dirs Mix write dirs, mostly seq Small random read dirs file aess patterns file aess patterns file aess patterns file aess patterns file aess patterns file aess patterns Figure 1: Corporate file aess patterns within eah deepest subtree. For eah deepest subtree aess pattern (i.e., eah graph), showing the number of files belonging to eah file aess pattern that belongs to subtrees in the subtree aess pattern. Corporate file aess pattern indies:. metadata only files; 1. sequential write files; 2. sequential read files; 3. small random write files; 4. small random read files; 5. less small random read files week # metadata only files sequential write files sequential read files small random write files smallest random read files small random read files medium partly seq write small read write speifi data plaement algorithms (Design insights 1 and 11). Also, the bakground metadata noise seen at all levels suggests that storage servers should both optimize for metadata aesses and redesign lient-server interations to derease the metadata hatter. Depending on the growth of metadata and the performane requirements, we also need to onsider plaing metadata on low lateny, non-volatile media like flash or SSDs. Figure 11: Corporate file aess patterns over 8 weeks. All patterns remain (hollow markers), but the frational weight of eah hanges greatly between weeks. Some small patterns temporarily appear and disappear (solid markers) week # supporting metadata app generated file updates ontent update app ontent viewing app - app generated ontent ontent viewing app - human generated ontent Figure 12: Corporate appliation instane aess patterns over 8 weeks. All patterns remain with near onstant frational weight. No new patterns appear. We see a lear separation of roles for lients and servers. The lient design an target high IO performane by a ombination of effiient delegation, prefething and ahing of the appropriate data. The servers should fous on inreasing their aggregated effiieny aross lients: ollaboration with lients (on ahing, delegation, et.) and exploiting user patterns to shedule bakground tasks. Automating bakground tasks suh as offline data dedupliation delivers apaity savings in a timely and hassle-free fashion, i.e., without system downtime or expliit sheduling. Regarding ahing at the server, we observe that very few aess patterns atually leverage the server s buffer ahe for data aesses. Design insights 4-6, 8 and 12 indiate a heavy role for the lient ahe and Design insight 7 suggests how not to use the server buffer ahe - ahing metadata only and ating as a warm/bakup ahe for lients would result in lower latenies for many aess patterns. We also see simple ways to take advantage of new storage media suh as SSDs. The lear identifiation of sequential and random aess file patterns enables effiient devie- Furthermore, we believe that storage systems should introdue many monitoring points to dynamially adjust the deision thresholds of plaement, ahing, or onsolidation poliies. We need to monitor both lients and servers. For example, when repeated read and overwrite files have been properly delegated to lients, the server would no longer see files with suh aess patterns. Without monitoring points at the lients, we would not be able to quantify the file delegation benefits. Storage systems should make extensible traing APIs to expedite the olletion of long-term future traes. This will failitate future work similar to ours. 6. CONCLUSIONS AND FUTURE WORK We must address the storage tehnology trends toward everinreasing sale, heterogeneity, and onsolidation. Current storage design paradigms that rely on existing trae analysis methods are ill equipped to meet the emerging hallenges beause they are unidimensional, fous only on the storage server, and are subjet to designer bias. We showed that a multi-dimensional, multi-layered trae-driven design methodology leads to more objetive design points with highly targeted optimizations at both storage lients and servers. Using our orporate and engineering use ases, we present a number of insights that informs future designs. We desribed in some detail the aess patterns we observed, and we enourage fellow storage system designers to extrat further insights from our observations. Future work inludes exploring the dynamis of hanging working sets and aess sequenes, with the goal of antiipating data aesses before they happen. Another worthwhile analysis is to look for optimization opportunities aross lients; this requires olleting traes at different lients, instead of only at the server. Also, we would like to explore opportunities for dedupliation, ompression, or data plaement. Doing so requires extending our analysis from data movement patterns to also inlude data ontent patterns. Furthermore, we would like to perform on-line analysis in live storage systems to enable dynami feedbak on plaement and optimization deisions. In addition, it would be useful

14 to build tools to synthesize the aess patterns, to enable designers to evaluate the optimizations we proposed here. We believe that storage system designers fae an inreasing hallenge to antiipate aess patterns. Our paper builds the ase that system designers an longer aurately antiipate aess patterns using intuition only. We believe that the orporate and engineering traes from our orporate headquarters would have similar use ases at other traditional and high-teh businesses. Other use ases would require us to perform the same trae olletion and analysis proess to extrat the same kind of ground truth. We also need similar studies at regular intervals to trak the evolving use of storage system. We hope that this paper ontributes to an objetive and prinipled design approah targeting rapidly hanging data aess patterns. NetApp, the NetApp logo, and Go further, faster are trademarks or registered trademarks of NetApp, In. in the United States and/or other ountries. 7. REFERENCES [1] N. Agrawal, W. J. Bolosky, J. R. Doueur, and J. R. Lorh. A Five-Year Study of File-System Metadata. In FAST 27. [2] E. Alpaydin. Introdution to Mahine Learning. MIT Press, Cambridge, Massahusetts, 24. [3] M. G. Baker, J. H. Hartman, M. D. Kupfer, K. W. Shirriff, and J. K. Ousterhout. Measurements of a distributed file system. In SOSP [4] P. Bodik, M. Goldszmidt, A. Fox, D. B. Woodard, and H. Andersen. Fingerprinting the dataenter: automated lassifiation of performane rises. In EuroSys 21. [5] Common Internet File System Tehnial Referene. Storage Network Industry Assoiation, 22. [6] IDC Whitepaper: The eonomis of Virtualization. Virtualization-appliation-based-ost-model-WP-EN. pdf. [7] J. R. Doueur and W. J. Bolosky. A Large-Sale Study of File-System Contents. In SIGMETRICS [8] D. Ellard, J. Ledlie, P. Malkani, and M. Seltzer. Passive NFS Traing of and Researh Workloads. In FAST 23. [9] A. Ganapathi, H. Kuno, U. Dayal, J. L. Wiener, A. Fox, M. Jordan, and D. Patterson. Prediting Multiple Metris for Queries: Better Deisions Enabled by Mahine Learning. In ICDE 29. [1] S. Gribble, G. S. Manku, E. Brewer, T. J. Gibson, and E. L. Miller. Self-Similarity in File Systems: Measurement and Appliations. In SIGMETRICS [11] The gzip algorithm. [12] IDC Report: Worldwide File-Based Storage Foreast Update. http: // [13] S. Kavalanekar, B. L. Worthington, Q. Zhang, and V. Sharda. Charaterization of storage workload traes from prodution Windows Servers. In IISWC 28. [14] Open Soure Clustering Software - C Clustering Library. luster/software.htm, 21. [15] A. Leung, S. Pasupathy, G. Goodson, and E. Miller. Measurement and analysis of large-sale network file system workloads. In USENIX ATC 28. [16] D. T. Meyer and W. J. Bolosky. A Study of Pratial Dedupliation. In FAST 21. [17] J. K. Ousterhout, H. D. Costa, D. Harrison, J. A. Kunze, M. Kupfer, and J. G. Thompson. A trae-driven analysis of the Unix 4.2 BSD file system. In SOSP [18] K. K. Ramakrishnan, P. Biswas, and R. Karedla. Analysis of file I/O traes in ommerial omputing environments. In SIGMETRICS [19] D. Roselli, J. Lorh, and T. Anderson. A omparison of file system workloads. In USENIX 2. [2] I. Stoia. A Berkeley View of Big Data: Algorithms, Mahines and People. UC Berkeley EECS Annual Researh Symposium, 211. [21] K. Thomas, C. Grier, J. Ma, V. Paxson, and D. Song. Design and evaluation of a real-time URL spam filtering servie. In IEEE Symposium on Seurity and Privay 211. [22] R. Villars. The Migration to Converged IT: What it Means for Infrastruture, Appliations, and the IT Organization. IDC Diretions Conferene 211. [23] VMware Whitepaper: Server Consolidation and Containment. [24] W. Vogels. File system usage in Windows NT 4.. In SOSP [25] M. Zhou and A. J. Smith. Analysis of Personal Computer Workloads. In MASCOTS 1999.

A Holistic Method for Selecting Web Services in Design of Composite Applications

A Holistic Method for Selecting Web Services in Design of Composite Applications A Holisti Method for Seleting Web Servies in Design of Composite Appliations Mārtiņš Bonders, Jānis Grabis Institute of Information Tehnology, Riga Tehnial University, 1 Kalu Street, Riga, LV 1658, Latvia,

More information

' R ATIONAL. :::~i:. :'.:::::: RETENTION ':: Compliance with the way you work PRODUCT BRIEF

' R ATIONAL. :::~i:. :'.:::::: RETENTION ':: Compliance with the way you work PRODUCT BRIEF ' R :::i:. ATIONAL :'.:::::: RETENTION ':: Compliane with the way you work, PRODUCT BRIEF In-plae Management of Unstrutured Data The explosion of unstrutured data ombined with new laws and regulations

More information

Hierarchical Clustering and Sampling Techniques for Network Monitoring

Hierarchical Clustering and Sampling Techniques for Network Monitoring S. Sindhuja Hierarhial Clustering and Sampling Tehniques for etwork Monitoring S. Sindhuja ME ABSTRACT: etwork monitoring appliations are used to monitor network traffi flows. Clustering tehniques are

More information

An Efficient Network Traffic Classification Based on Unknown and Anomaly Flow Detection Mechanism

An Efficient Network Traffic Classification Based on Unknown and Anomaly Flow Detection Mechanism An Effiient Network Traffi Classifiation Based on Unknown and Anomaly Flow Detetion Mehanism G.Suganya.M.s.,B.Ed 1 1 Mphil.Sholar, Department of Computer Siene, KG College of Arts and Siene,Coimbatore.

More information

Unit 12: Installing, Configuring and Administering Microsoft Server

Unit 12: Installing, Configuring and Administering Microsoft Server Unit 12: Installing, Configuring and Administering Mirosoft Server Learning Outomes A andidate following a programme of learning leading to this unit will be able to: Selet a suitable NOS to install for

More information

Deduplication with Block-Level Content-Aware Chunking for Solid State Drives (SSDs)

Deduplication with Block-Level Content-Aware Chunking for Solid State Drives (SSDs) 23 IEEE International Conferene on High Performane Computing and Communiations & 23 IEEE International Conferene on Embedded and Ubiquitous Computing Dedupliation with Blok-Level Content-Aware Chunking

More information

Chapter 1 Microeconomics of Consumer Theory

Chapter 1 Microeconomics of Consumer Theory Chapter 1 Miroeonomis of Consumer Theory The two broad ategories of deision-makers in an eonomy are onsumers and firms. Eah individual in eah of these groups makes its deisions in order to ahieve some

More information

) ( )( ) ( ) ( )( ) ( ) ( ) (1)

) ( )( ) ( ) ( )( ) ( ) ( ) (1) OPEN CHANNEL FLOW Open hannel flow is haraterized by a surfae in ontat with a gas phase, allowing the fluid to take on shapes and undergo behavior that is impossible in a pipe or other filled onduit. Examples

More information

Behavior Analysis-Based Learning Framework for Host Level Intrusion Detection

Behavior Analysis-Based Learning Framework for Host Level Intrusion Detection Behavior Analysis-Based Learning Framework for Host Level Intrusion Detetion Haiyan Qiao, Jianfeng Peng, Chuan Feng, Jerzy W. Rozenblit Eletrial and Computer Engineering Department University of Arizona

More information

On the Characteristics of Spectrum-Agile Communication Networks

On the Characteristics of Spectrum-Agile Communication Networks 1 On the Charateristis of Spetrum-Agile Communiation Networks Xin Liu Wei Wang Department of Computer Siene University of California, Davis, CA 95616 Email:{liu,wangw}@s.udavis.edu Abstrat Preliminary

More information

Channel Assignment Strategies for Cellular Phone Systems

Channel Assignment Strategies for Cellular Phone Systems Channel Assignment Strategies for Cellular Phone Systems Wei Liu Yiping Han Hang Yu Zhejiang University Hangzhou, P. R. China Contat: wliu5@ie.uhk.edu.hk 000 Mathematial Contest in Modeling (MCM) Meritorious

More information

TECHNOLOGY-ENHANCED LEARNING FOR MUSIC WITH I-MAESTRO FRAMEWORK AND TOOLS

TECHNOLOGY-ENHANCED LEARNING FOR MUSIC WITH I-MAESTRO FRAMEWORK AND TOOLS TECHNOLOGY-ENHANCED LEARNING FOR MUSIC WITH I-MAESTRO FRAMEWORK AND TOOLS ICSRiM - University of Leeds Shool of Computing & Shool of Musi Leeds LS2 9JT, UK +44-113-343-2583 kia@i-maestro.org www.i-maestro.org,

More information

How To Fator

How To Fator CHAPTER hapter 4 > Make the Connetion 4 INTRODUCTION Developing seret odes is big business beause of the widespread use of omputers and the Internet. Corporations all over the world sell enryption systems

More information

Open and Extensible Business Process Simulator

Open and Extensible Business Process Simulator UNIVERSITY OF TARTU FACULTY OF MATHEMATICS AND COMPUTER SCIENCE Institute of Computer Siene Karl Blum Open and Extensible Business Proess Simulator Master Thesis (30 EAP) Supervisors: Luiano Garía-Bañuelos,

More information

Weighting Methods in Survey Sampling

Weighting Methods in Survey Sampling Setion on Survey Researh Methods JSM 01 Weighting Methods in Survey Sampling Chiao-hih Chang Ferry Butar Butar Abstrat It is said that a well-designed survey an best prevent nonresponse. However, no matter

More information

Learning Curves and Stochastic Models for Pricing and Provisioning Cloud Computing Services

Learning Curves and Stochastic Models for Pricing and Provisioning Cloud Computing Services T Learning Curves and Stohasti Models for Priing and Provisioning Cloud Computing Servies Amit Gera, Cathy H. Xia Dept. of Integrated Systems Engineering Ohio State University, Columbus, OH 4310 {gera.,

More information

A Keyword Filters Method for Spam via Maximum Independent Sets

A Keyword Filters Method for Spam via Maximum Independent Sets Vol. 7, No. 3, May, 213 A Keyword Filters Method for Spam via Maximum Independent Sets HaiLong Wang 1, FanJun Meng 1, HaiPeng Jia 2, JinHong Cheng 3 and Jiong Xie 3 1 Inner Mongolia Normal University 2

More information

Henley Business School at Univ of Reading. Pre-Experience Postgraduate Programmes Chartered Institute of Personnel and Development (CIPD)

Henley Business School at Univ of Reading. Pre-Experience Postgraduate Programmes Chartered Institute of Personnel and Development (CIPD) MS in International Human Resoure Management For students entering in 2012/3 Awarding Institution: Teahing Institution: Relevant QAA subjet Benhmarking group(s): Faulty: Programme length: Date of speifiation:

More information

Deadline-based Escalation in Process-Aware Information Systems

Deadline-based Escalation in Process-Aware Information Systems Deadline-based Esalation in Proess-Aware Information Systems Wil M.P. van der Aalst 1,2, Mihael Rosemann 2, Marlon Dumas 2 1 Department of Tehnology Management Eindhoven University of Tehnology, The Netherlands

More information

A Survey of Usability Evaluation in Virtual Environments: Classi cation and Comparison of Methods

A Survey of Usability Evaluation in Virtual Environments: Classi cation and Comparison of Methods Doug A. Bowman bowman@vt.edu Department of Computer Siene Virginia Teh Joseph L. Gabbard Deborah Hix [ jgabbard, hix]@vt.edu Systems Researh Center Virginia Teh A Survey of Usability Evaluation in Virtual

More information

From a strategic view to an engineering view in a digital enterprise

From a strategic view to an engineering view in a digital enterprise Digital Enterprise Design & Management 2013 February 11-12, 2013 Paris From a strategi view to an engineering view in a digital enterprise The ase of a multi-ountry Telo Hervé Paault Orange Abstrat In

More information

State of Maryland Participation Agreement for Pre-Tax and Roth Retirement Savings Accounts

State of Maryland Participation Agreement for Pre-Tax and Roth Retirement Savings Accounts State of Maryland Partiipation Agreement for Pre-Tax and Roth Retirement Savings Aounts DC-4531 (08/2015) For help, please all 1-800-966-6355 www.marylandd.om 1 Things to Remember Complete all of the setions

More information

Granular Problem Solving and Software Engineering

Granular Problem Solving and Software Engineering Granular Problem Solving and Software Engineering Haibin Zhu, Senior Member, IEEE Department of Computer Siene and Mathematis, Nipissing University, 100 College Drive, North Bay, Ontario, P1B 8L7, Canada

More information

Customer Reporting for SaaS Applications. Domain Basics. Managing my Domain

Customer Reporting for SaaS Applications. Domain Basics. Managing my Domain Produtivity Marketpla e Software as a Servie Invoiing Ordering Domains Customer Reporting for SaaS Appliations Domain Basis Managing my Domain Managing Domains Helpful Resoures Managing my Domain If you

More information

Chapter 5 Single Phase Systems

Chapter 5 Single Phase Systems Chapter 5 Single Phase Systems Chemial engineering alulations rely heavily on the availability of physial properties of materials. There are three ommon methods used to find these properties. These inlude

More information

SLA-based Resource Allocation for Software as a Service Provider (SaaS) in Cloud Computing Environments

SLA-based Resource Allocation for Software as a Service Provider (SaaS) in Cloud Computing Environments 2 th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing SLA-based Resoure Alloation for Software as a Servie Provider (SaaS) in Cloud Computing Environments Linlin Wu, Saurabh Kumar

More information

FIRE DETECTION USING AUTONOMOUS AERIAL VEHICLES WITH INFRARED AND VISUAL CAMERAS. J. Ramiro Martínez-de Dios, Luis Merino and Aníbal Ollero

FIRE DETECTION USING AUTONOMOUS AERIAL VEHICLES WITH INFRARED AND VISUAL CAMERAS. J. Ramiro Martínez-de Dios, Luis Merino and Aníbal Ollero FE DETECTION USING AUTONOMOUS AERIAL VEHICLES WITH INFRARED AND VISUAL CAMERAS. J. Ramiro Martínez-de Dios, Luis Merino and Aníbal Ollero Robotis, Computer Vision and Intelligent Control Group. University

More information

Trade Information, Not Spectrum: A Novel TV White Space Information Market Model

Trade Information, Not Spectrum: A Novel TV White Space Information Market Model Trade Information, Not Spetrum: A Novel TV White Spae Information Market Model Yuan Luo, Lin Gao, and Jianwei Huang 1 Abstrat In this paper, we propose a novel information market for TV white spae networks,

More information

Computer Networks Framing

Computer Networks Framing Computer Networks Framing Saad Mneimneh Computer Siene Hunter College of CUNY New York Introdution Who framed Roger rabbit? A detetive, a woman, and a rabbit in a network of trouble We will skip the physial

More information

FOOD FOR THOUGHT Topical Insights from our Subject Matter Experts

FOOD FOR THOUGHT Topical Insights from our Subject Matter Experts FOOD FOR THOUGHT Topial Insights from our Sujet Matter Experts DEGREE OF DIFFERENCE TESTING: AN ALTERNATIVE TO TRADITIONAL APPROACHES The NFL White Paper Series Volume 14, June 2014 Overview Differene

More information

Capacity at Unsignalized Two-Stage Priority Intersections

Capacity at Unsignalized Two-Stage Priority Intersections Capaity at Unsignalized Two-Stage Priority Intersetions by Werner Brilon and Ning Wu Abstrat The subjet of this paper is the apaity of minor-street traffi movements aross major divided four-lane roadways

More information

WORKFLOW CONTROL-FLOW PATTERNS A Revised View

WORKFLOW CONTROL-FLOW PATTERNS A Revised View WORKFLOW CONTROL-FLOW PATTERNS A Revised View Nik Russell 1, Arthur H.M. ter Hofstede 1, 1 BPM Group, Queensland University of Tehnology GPO Box 2434, Brisbane QLD 4001, Australia {n.russell,a.terhofstede}@qut.edu.au

More information

A novel active mass damper for vibration control of bridges

A novel active mass damper for vibration control of bridges IABMAS 08, International Conferene on Bridge Maintenane, Safety and Management, 3-7 July 008, Seoul, Korea A novel ative mass damper for vibration ontrol of bridges U. Starossek & J. Sheller Strutural

More information

Customer Efficiency, Channel Usage and Firm Performance in Retail Banking

Customer Efficiency, Channel Usage and Firm Performance in Retail Banking Customer Effiieny, Channel Usage and Firm Performane in Retail Banking Mei Xue Operations and Strategi Management Department The Wallae E. Carroll Shool of Management Boston College 350 Fulton Hall, 140

More information

Parametric model of IP-networks in the form of colored Petri net

Parametric model of IP-networks in the form of colored Petri net Parametri model of IP-networks in the form of olored Petri net Shmeleva T.R. Abstrat A parametri model of IP-networks in the form of olored Petri net was developed; it onsists of a fixed number of Petri

More information

AUDITING COST OVERRUN CLAIMS *

AUDITING COST OVERRUN CLAIMS * AUDITING COST OVERRUN CLAIMS * David Pérez-Castrillo # University of Copenhagen & Universitat Autònoma de Barelona Niolas Riedinger ENSAE, Paris Abstrat: We onsider a ost-reimbursement or a ost-sharing

More information

Improved Vehicle Classification in Long Traffic Video by Cooperating Tracker and Classifier Modules

Improved Vehicle Classification in Long Traffic Video by Cooperating Tracker and Classifier Modules Improved Vehile Classifiation in Long Traffi Video by Cooperating Traker and Classifier Modules Brendan Morris and Mohan Trivedi University of California, San Diego San Diego, CA 92093 {b1morris, trivedi}@usd.edu

More information

To Coordinate Or Not To Coordinate? Wide-Area Traffic Management for Data Centers

To Coordinate Or Not To Coordinate? Wide-Area Traffic Management for Data Centers To Coordinate Or Not To Coordinate? Wide-Area Traffi Management for Data Centers Srinivas Narayana, Joe Wenjie Jiang, Jennifer Rexford, Mung Chiang Department of Computer Siene, and Department of Eletrial

More information

Findings and Recommendations

Findings and Recommendations Contrating Methods and Administration Findings and Reommendations Finding 9-1 ESD did not utilize a formal written pre-qualifiations proess for seleting experiened design onsultants. ESD hose onsultants

More information

Optimal Sales Force Compensation

Optimal Sales Force Compensation Optimal Sales Fore Compensation Matthias Kräkel Anja Shöttner Abstrat We analyze a dynami moral-hazard model to derive optimal sales fore ompensation plans without imposing any ad ho restritions on the

More information

university of illinois library AT URBANA-CHAMPAIGN BOOKSTACKS

university of illinois library AT URBANA-CHAMPAIGN BOOKSTACKS university of illinois library AT URBANA-CHAMPAIGN BOOKSTACKS CENTRAL CIRCULATION BOOKSTACKS The person harging this material is responsible for its renewal or its return to the library from whih it was

More information

Retirement Option Election Form with Partial Lump Sum Payment

Retirement Option Election Form with Partial Lump Sum Payment Offie of the New York State Comptroller New York State and Loal Retirement System Employees Retirement System Polie and Fire Retirement System 110 State Street, Albany, New York 12244-0001 Retirement Option

More information

5.2 The Master Theorem

5.2 The Master Theorem 170 CHAPTER 5. RECURSION AND RECURRENCES 5.2 The Master Theorem Master Theorem In the last setion, we saw three different kinds of behavior for reurrenes of the form at (n/2) + n These behaviors depended

More information

Using Live Chat in your Call Centre

Using Live Chat in your Call Centre Using Live Chat in your Call Centre Otober Key Highlights Yesterday's all entres have beome today's ontat entres where agents deal with multiple queries from multiple hannels. Live Chat hat is one now

More information

Electrician'sMathand BasicElectricalFormulas

Electrician'sMathand BasicElectricalFormulas Eletriian'sMathand BasiEletrialFormulas MikeHoltEnterprises,In. 1.888.NEC.CODE www.mikeholt.om Introdution Introdution This PDF is a free resoure from Mike Holt Enterprises, In. It s Unit 1 from the Eletrial

More information

UNIVERSITY AND WORK-STUDY EMPLOYERS WEB SITE USER S GUIDE

UNIVERSITY AND WORK-STUDY EMPLOYERS WEB SITE USER S GUIDE UNIVERSITY AND WORK-STUDY EMPLOYERS WEB SITE USER S GUIDE September 8, 2009 Table of Contents 1 Home 2 University 3 Your 4 Add 5 Managing 6 How 7 Viewing 8 Closing 9 Reposting Page 1 and Work-Study Employers

More information

Classical Electromagnetic Doppler Effect Redefined. Copyright 2014 Joseph A. Rybczyk

Classical Electromagnetic Doppler Effect Redefined. Copyright 2014 Joseph A. Rybczyk Classial Eletromagneti Doppler Effet Redefined Copyright 04 Joseph A. Rybzyk Abstrat The lassial Doppler Effet formula for eletromagneti waves is redefined to agree with the fundamental sientifi priniples

More information

Public DNS System and Global Traffic Management

Public DNS System and Global Traffic Management . This paper was presented as part of the main tehnial program at I INFOCOM 211 Publi DNS System and Global Traffi Management Cheng Huang Mirosoft Researh David A. Maltz Mirosoft Researh Albert Greenberg

More information

Robust Classification and Tracking of Vehicles in Traffic Video Streams

Robust Classification and Tracking of Vehicles in Traffic Video Streams Proeedings of the IEEE ITSC 2006 2006 IEEE Intelligent Transportation Systems Conferene Toronto, Canada, September 17-20, 2006 TC1.4 Robust Classifiation and Traking of Vehiles in Traffi Video Streams

More information

A Comparison of Service Quality between Private and Public Hospitals in Thailand

A Comparison of Service Quality between Private and Public Hospitals in Thailand International Journal of Business and Soial Siene Vol. 4 No. 11; September 2013 A Comparison of Servie Quality between Private and Hospitals in Thailand Khanhitpol Yousapronpaiboon, D.B.A. Assistant Professor

More information

Programming Basics - FORTRAN 77 http://www.physics.nau.edu/~bowman/phy520/f77tutor/tutorial_77.html

Programming Basics - FORTRAN 77 http://www.physics.nau.edu/~bowman/phy520/f77tutor/tutorial_77.html CWCS Workshop May 2005 Programming Basis - FORTRAN 77 http://www.physis.nau.edu/~bowman/phy520/f77tutor/tutorial_77.html Program Organization A FORTRAN program is just a sequene of lines of plain text.

More information

Impedance Method for Leak Detection in Zigzag Pipelines

Impedance Method for Leak Detection in Zigzag Pipelines 10.478/v10048-010-0036-0 MEASUREMENT SCIENCE REVIEW, Volume 10, No. 6, 010 Impedane Method for Leak Detetion in igzag Pipelines A. Lay-Ekuakille 1, P. Vergallo 1, A. Trotta 1 Dipartimento d Ingegneria

More information

In order to be able to design beams, we need both moments and shears. 1. Moment a) From direct design method or equivalent frame method

In order to be able to design beams, we need both moments and shears. 1. Moment a) From direct design method or equivalent frame method BEAM DESIGN In order to be able to design beams, we need both moments and shears. 1. Moment a) From diret design method or equivalent frame method b) From loads applied diretly to beams inluding beam weight

More information

RATING SCALES FOR NEUROLOGISTS

RATING SCALES FOR NEUROLOGISTS RATING SCALES FOR NEUROLOGISTS J Hobart iv22 WHY Correspondene to: Dr Jeremy Hobart, Department of Clinial Neurosienes, Peninsula Medial Shool, Derriford Hospital, Plymouth PL6 8DH, UK; Jeremy.Hobart@

More information

Procurement auctions are sometimes plagued with a chosen supplier s failing to accomplish a project successfully.

Procurement auctions are sometimes plagued with a chosen supplier s failing to accomplish a project successfully. Deision Analysis Vol. 7, No. 1, Marh 2010, pp. 23 39 issn 1545-8490 eissn 1545-8504 10 0701 0023 informs doi 10.1287/dea.1090.0155 2010 INFORMS Managing Projet Failure Risk Through Contingent Contrats

More information

Henley Business School at Univ of Reading. Chartered Institute of Personnel and Development (CIPD)

Henley Business School at Univ of Reading. Chartered Institute of Personnel and Development (CIPD) MS in International Human Resoure Management (full-time) For students entering in 2015/6 Awarding Institution: Teahing Institution: Relevant QAA subjet Benhmarking group(s): Faulty: Programme length: Date

More information

Sebastián Bravo López

Sebastián Bravo López Transfinite Turing mahines Sebastián Bravo López 1 Introdution With the rise of omputers with high omputational power the idea of developing more powerful models of omputation has appeared. Suppose that

More information

Static Fairness Criteria in Telecommunications

Static Fairness Criteria in Telecommunications Teknillinen Korkeakoulu ERIKOISTYÖ Teknillisen fysiikan koulutusohjelma 92002 Mat-208 Sovelletun matematiikan erikoistyöt Stati Fairness Criteria in Teleommuniations Vesa Timonen, e-mail: vesatimonen@hutfi

More information

Suggested Answers, Problem Set 5 Health Economics

Suggested Answers, Problem Set 5 Health Economics Suggested Answers, Problem Set 5 Health Eonomis Bill Evans Spring 2013 1. The graph is at the end of the handout. Fluoridated water strengthens teeth and redues inidene of avities. As a result, at all

More information

INCOME TAX WITHHOLDING GUIDE FOR EMPLOYERS

INCOME TAX WITHHOLDING GUIDE FOR EMPLOYERS Virginia Department of Taxation INCOME TAX WITHHOLDING GUIDE FOR EMPLOYERS www.tax.virginia.gov 2614086 Rev. 07/14 * Table of Contents Introdution... 1 Important... 1 Where to Get Assistane... 1 Online

More information

Soft-Edge Flip-flops for Improved Timing Yield: Design and Optimization

Soft-Edge Flip-flops for Improved Timing Yield: Design and Optimization Soft-Edge Flip-flops for Improved Timing Yield: Design and Optimization Abstrat Parameter variations ause high yield losses due to their large impat on iruit delay. In this paper, we propose the use of

More information

Neural network-based Load Balancing and Reactive Power Control by Static VAR Compensator

Neural network-based Load Balancing and Reactive Power Control by Static VAR Compensator nternational Journal of Computer and Eletrial Engineering, Vol. 1, No. 1, April 2009 Neural network-based Load Balaning and Reative Power Control by Stati VAR Compensator smail K. Said and Marouf Pirouti

More information

Big Data Analysis and Reporting with Decision Tree Induction

Big Data Analysis and Reporting with Decision Tree Induction Big Data Analysis and Reporting with Deision Tree Indution PETRA PERNER Institute of Computer Vision and Applied Computer Sienes, IBaI Postbox 30 11 14, 04251 Leipzig GERMANY pperner@ibai-institut.de,

More information

Green Cloud Computing

Green Cloud Computing International Journal of Information and Computation Tehnology. ISSN 0974-2239 Volume 4, Number 4 (2014), pp. 431-436 International Researh Publiations House http://www. irphouse.om /ijit.htm Green Cloud

More information

Information Security 201

Information Security 201 FAS Information Seurity 201 Desktop Referene Guide Introdution Harvard University is ommitted to proteting information resoures that are ritial to its aademi and researh mission. Harvard is equally ommitted

More information

Entrepreneur s Guide. Starting and Growing a Business in Pennsylvania FEBRUARY 2015. newpa.com

Entrepreneur s Guide. Starting and Growing a Business in Pennsylvania FEBRUARY 2015. newpa.com Entrepreneur s Guide Starting and Growing a Business in Pennsylvania FEBRUARY 2015 newpa.om The Entrepreneur s Guide: Starting and Growing a Business in Pennsylvania was prepared by the Pennsylvania Department

More information

A Context-Aware Preference Database System

A Context-Aware Preference Database System J. PERVASIVE COMPUT. & COMM. (), MARCH 005. TROUBADOR PUBLISHING LTD) A Context-Aware Preferene Database System Kostas Stefanidis Department of Computer Siene, University of Ioannina,, kstef@s.uoi.gr Evaggelia

More information

User s Guide VISFIT: a computer tool for the measurement of intrinsic viscosities

User s Guide VISFIT: a computer tool for the measurement of intrinsic viscosities File:UserVisfit_2.do User s Guide VISFIT: a omputer tool for the measurement of intrinsi visosities Version 2.a, September 2003 From: Multiple Linear Least-Squares Fits with a Common Interept: Determination

More information

Account Contract for Card Acceptance

Account Contract for Card Acceptance Aount Contrat for Card Aeptane This is an Aount Contrat for the aeptane of debit ards and redit ards via payment terminals, on the website and/or by telephone, mail or fax. You enter into this ontrat with

More information

TRENDS IN EXECUTIVE EDUCATION: TOWARDS A SYSTEMS APPROACH TO EXECUTIVE DEVELOPMENT PLANNING

TRENDS IN EXECUTIVE EDUCATION: TOWARDS A SYSTEMS APPROACH TO EXECUTIVE DEVELOPMENT PLANNING INTERMAN 7 TRENDS IN EXECUTIVE EDUCATION: TOWARDS A SYSTEMS APPROACH TO EXECUTIVE DEVELOPMENT PLANNING by Douglas A. Ready, Albert A. Viere and Alan F. White RECEIVED 2 7 MAY 1393 International Labour

More information

Pattern Recognition Techniques in Microarray Data Analysis

Pattern Recognition Techniques in Microarray Data Analysis Pattern Reognition Tehniques in Miroarray Data Analysis Miao Li, Biao Wang, Zohreh Momeni, and Faramarz Valafar Department of Computer Siene San Diego State University San Diego, California, USA faramarz@sienes.sdsu.edu

More information

Prices and Heterogeneous Search Costs

Prices and Heterogeneous Search Costs Pries and Heterogeneous Searh Costs José L. Moraga-González Zsolt Sándor Matthijs R. Wildenbeest First draft: July 2014 Revised: June 2015 Abstrat We study prie formation in a model of onsumer searh for

More information

Intuitive Guide to Principles of Communications By Charan Langton www.complextoreal.com

Intuitive Guide to Principles of Communications By Charan Langton www.complextoreal.com Intuitive Guide to Priniples of Communiations By Charan Langton www.omplextoreal.om Understanding Frequeny Modulation (FM), Frequeny Shift Keying (FSK), Sunde s FSK and MSK and some more The proess of

More information

GABOR AND WEBER LOCAL DESCRIPTORS PERFORMANCE IN MULTISPECTRAL EARTH OBSERVATION IMAGE DATA ANALYSIS

GABOR AND WEBER LOCAL DESCRIPTORS PERFORMANCE IN MULTISPECTRAL EARTH OBSERVATION IMAGE DATA ANALYSIS HENRI COANDA AIR FORCE ACADEMY ROMANIA INTERNATIONAL CONFERENCE of SCIENTIFIC PAPER AFASES 015 Brasov, 8-30 May 015 GENERAL M.R. STEFANIK ARMED FORCES ACADEMY SLOVAK REPUBLIC GABOR AND WEBER LOCAL DESCRIPTORS

More information

Agile ALM White Paper: Redefining ALM with Five Key Practices

Agile ALM White Paper: Redefining ALM with Five Key Practices Agile ALM White Paper: Redefining ALM with Five Key Praties by Ethan Teng, Cyndi Mithell and Chad Wathington 2011 ThoughtWorks ln. All rights reserved www.studios.thoughtworks.om Introdution The pervasiveness

More information

arxiv:astro-ph/0304006v2 10 Jun 2003 Theory Group, MS 50A-5101 Lawrence Berkeley National Laboratory One Cyclotron Road Berkeley, CA 94720 USA

arxiv:astro-ph/0304006v2 10 Jun 2003 Theory Group, MS 50A-5101 Lawrence Berkeley National Laboratory One Cyclotron Road Berkeley, CA 94720 USA LBNL-52402 Marh 2003 On the Speed of Gravity and the v/ Corretions to the Shapiro Time Delay Stuart Samuel 1 arxiv:astro-ph/0304006v2 10 Jun 2003 Theory Group, MS 50A-5101 Lawrene Berkeley National Laboratory

More information

Paid Placement Strategies for Internet Search Engines

Paid Placement Strategies for Internet Search Engines Paid Plaement Strategies for Internet Searh Engines Hemant K. Bhargava Smeal College of Business Penn State University 342 Beam Building University Park, PA 16802 bhargava@omputer.org Juan Feng Smeal College

More information

Software Ecosystems: From Software Product Management to Software Platform Management

Software Ecosystems: From Software Product Management to Software Platform Management Software Eosystems: From Software Produt Management to Software Platform Management Slinger Jansen, Stef Peeters, and Sjaak Brinkkemper Department of Information and Computing Sienes Utreht University,

More information

The Basics of International Trade: A Classroom Experiment

The Basics of International Trade: A Classroom Experiment The Basis of International Trade: A Classroom Experiment Alberto Isgut, Ganesan Ravishanker, and Tanya Rosenblat * Wesleyan University Abstrat We introdue a simple web-based lassroom experiment in whih

More information

IEEE TRANSACTIONS ON DEPENDABLE AND SECURE COMPUTING, VOL. 9, NO. 3, MAY/JUNE 2012 401

IEEE TRANSACTIONS ON DEPENDABLE AND SECURE COMPUTING, VOL. 9, NO. 3, MAY/JUNE 2012 401 IEEE TRASACTIOS O DEPEDABLE AD SECURE COMPUTIG, VOL. 9, O. 3, MAY/JUE 2012 401 Mitigating Distributed Denial of Servie Attaks in Multiparty Appliations in the Presene of Clok Drifts Zhang Fu, Marina Papatriantafilou,

More information

Discovering Trends in Large Datasets Using Neural Networks

Discovering Trends in Large Datasets Using Neural Networks Disovering Trends in Large Datasets Using Neural Networks Khosrow Kaikhah, Ph.D. and Sandesh Doddameti Department of Computer Siene Texas State University San Maros, Texas 78666 Abstrat. A novel knowledge

More information

INCOME TAX WITHHOLDING GUIDE FOR EMPLOYERS

INCOME TAX WITHHOLDING GUIDE FOR EMPLOYERS Virginia Department of Taxation INCOME TAX WITHHOLDING GUIDE FOR EMPLOYERS www.tax.virginia.gov 2614086 Rev. 01/16 Table of Contents Introdution... 1 Important... 1 Where to Get Assistane... 1 Online File

More information

Lemon Signaling in Cross-Listings Michal Barzuza*

Lemon Signaling in Cross-Listings Michal Barzuza* Lemon Signaling in Cross-Listings Mihal Barzuza* This paper analyzes the deision to ross-list by managers and ontrolling shareholders assuming that they have private information with respet to the amount

More information

AUTOMATED VISUAL TRAFFIC MONITORING AND SURVEILLANCE THROUGH A NETWORK OF DISTRIBUTED UNITS

AUTOMATED VISUAL TRAFFIC MONITORING AND SURVEILLANCE THROUGH A NETWORK OF DISTRIBUTED UNITS AUTOMATED VISUAL TRAFFIC MOITORIG AD SURVEILLACE THROUGH A ETWORK OF DISTRIBUTED UITS A. Koutsia a, T. Semertzidis a, K. Dimitropoulos a,. Grammalidis a and K. Georgouleas b a Informatis and Telematis

More information

Optimal Online Buffer Scheduling for Block Devices *

Optimal Online Buffer Scheduling for Block Devices * Optimal Online Buffer Sheduling for Blok Devies * ABSTRACT Anna Adamaszek Department of Computer Siene and Centre for Disrete Mathematis and its Appliations (DIMAP) University of Warwik, Coventry, UK A.M.Adamaszek@warwik.a.uk

More information

Measurement of Powder Flow Properties that relate to Gravity Flow Behaviour through Industrial Processing Lines

Measurement of Powder Flow Properties that relate to Gravity Flow Behaviour through Industrial Processing Lines Measurement of Powder Flow Properties that relate to Gravity Flow ehaviour through Industrial Proessing Lines A typial industrial powder proessing line will inlude several storage vessels (e.g. bins, bunkers,

More information

MATE: MPLS Adaptive Traffic Engineering

MATE: MPLS Adaptive Traffic Engineering MATE: MPLS Adaptive Traffi Engineering Anwar Elwalid Cheng Jin Steven Low Indra Widjaja Bell Labs EECS Dept EE Dept Fujitsu Network Communiations Luent Tehnologies Univ. of Mihigan Calteh Pearl River,

More information

An integrated optimization model of a Closed- Loop Supply Chain under uncertainty

An integrated optimization model of a Closed- Loop Supply Chain under uncertainty ISSN 1816-6075 (Print), 1818-0523 (Online) Journal of System and Management Sienes Vol. 2 (2012) No. 3, pp. 9-17 An integrated optimization model of a Closed- Loop Supply Chain under unertainty Xiaoxia

More information

Context-Sensitive Adjustments of Cognitive Control: Conflict-Adaptation Effects Are Modulated by Processing Demands of the Ongoing Task

Context-Sensitive Adjustments of Cognitive Control: Conflict-Adaptation Effects Are Modulated by Processing Demands of the Ongoing Task Journal of Experimental Psyhology: Learning, Memory, and Cognition 2008, Vol. 34, No. 3, 712 718 Copyright 2008 by the Amerian Psyhologial Assoiation 0278-7393/08/$12.00 DOI: 10.1037/0278-7393.34.3.712

More information

CIS570 Lecture 4 Introduction to Data-flow Analysis 3

CIS570 Lecture 4 Introduction to Data-flow Analysis 3 Introdution to Data-flow Analysis Last Time Control flow analysis BT disussion Today Introdue iterative data-flow analysis Liveness analysis Introdue other useful onepts CIS570 Leture 4 Introdution to

More information

A Theoretical Analysis of Credit Card Reform in Australia *

A Theoretical Analysis of Credit Card Reform in Australia * A Theoretial Analysis of Credit Card Reform in Australia * by Joshua S. Gans and Stephen P. King Melbourne Business Shool University of Melbourne First Draft: 12 th May, 2001 This Version: 5 th May, 2003

More information

Optimal Allocation of Filters against DDoS Attacks

Optimal Allocation of Filters against DDoS Attacks Optimal Alloation of Filters against DDoS Attaks Karim El Defrawy ICS Dept. University of California, Irvine Email: keldefra@ui.edu Athina Markopoulou EECS Dept. University of California, Irvine Email:

More information

Picture This: Molecular Maya Puts Life in Life Science Animations

Picture This: Molecular Maya Puts Life in Life Science Animations Piture This: Moleular Maya Puts Life in Life Siene Animations [ Data Visualization ] Based on the Autodesk platform, Digizyme plug-in proves aestheti and eduational effetiveness. BY KEVIN DAVIES In 2010,

More information

Ranking Community Answers by Modeling Question-Answer Relationships via Analogical Reasoning

Ranking Community Answers by Modeling Question-Answer Relationships via Analogical Reasoning Ranking Community Answers by Modeling Question-Answer Relationships via Analogial Reasoning Xin-Jing Wang Mirosoft Researh Asia 4F Sigma, 49 Zhihun Road Beijing, P.R.China xjwang@mirosoft.om Xudong Tu,Dan

More information

OpenScape 4000 CSTA V7 Connectivity Adapter - CSTA III, Part 2, Version 4.1. Developer s Guide A31003-G9310-I200-1-76D1

OpenScape 4000 CSTA V7 Connectivity Adapter - CSTA III, Part 2, Version 4.1. Developer s Guide A31003-G9310-I200-1-76D1 OpenSape 4000 CSTA V7 Connetivity Adapter - CSTA III, Part 2, Version 4.1 Developer s Guide A31003-G9310-I200-1-76 Our Quality and Environmental Management Systems are implemented aording to the requirements

More information

AngelCast: Cloud-based Peer-Assisted Live Streaming Using Optimized Multi-Tree Construction

AngelCast: Cloud-based Peer-Assisted Live Streaming Using Optimized Multi-Tree Construction AngelCast: Cloud-based Peer-Assisted Live Streaming Using Optimized Multi-Tree Constrution Raymond Sweha Boston University remos@s.bu.edu Vathe Ishakian Boston University visahak@s.bu.edu Azer Bestavros

More information

IT Essentials II: Network Operating Systems

IT Essentials II: Network Operating Systems Unit 13: IT Essentials II: Network Operating Systems Learning Outomes: A andidate following a programme of learning leading to this unit will be able to: demonstrate understanding of network operating

More information

3 Game Theory: Basic Concepts

3 Game Theory: Basic Concepts 3 Game Theory: Basi Conepts Eah disipline of the soial sienes rules omfortably ithin its on hosen domain: : : so long as it stays largely oblivious of the others. Edard O. Wilson (1998):191 3.1 and and

More information

Masters Thesis- Criticality Alarm System Design Guide with Accompanying Alarm System Development for the Radioisotope Production L

Masters Thesis- Criticality Alarm System Design Guide with Accompanying Alarm System Development for the Radioisotope Production L PNNL-18348 Prepared for the U.S. Department of Energy under Contrat DE-AC05-76RL01830 Masters Thesis- Critiality Alarm System Design Guide with Aompanying Alarm System Development for the Radioisotope

More information

The Optimal Deterrence of Tax Evasion: The Trade-off Between Information Reporting and Audits

The Optimal Deterrence of Tax Evasion: The Trade-off Between Information Reporting and Audits The Optimal Deterrene of Tax Evasion: The Trade-off Between Information Reporting and Audits Yulia Paramonova Department of Eonomis, University of Mihigan Otober 30, 2014 Abstrat Despite the widespread

More information