A Data De-duplication Access Framework for Solid State Drives


JOURNAL OF INFORMATION SCIENCE AND ENGINEERING 28, (2012)

Department of Electronic Engineering
National Taiwan University of Science and Technology
Taipei, 106 Taiwan
{chwu; m }@mail.ntust.edu.tw

With the rapid development of SSDs (Solid State Drives), traditional hard drives have been replaced by SSDs in many applications. Since SSDs consist of NAND flash memory, the main challenge for SSDs is that NAND flash memory is highly sensitive to write requests. Because flash memory is updated out of place, a large number of write requests forces garbage collection to reclaim free space. Frequent garbage collection reduces both the lifetime of flash memory and overall performance. When SSDs are used for data storage, decreasing the amount of data written therefore becomes an important topic. In this paper, we propose a data de-duplication access framework for SSDs. The objective is to eliminate as much duplicate data as possible and to reduce space consumption. We combine a file-based de-duplication scheme with a static chunking de-duplication scheme to cover both whole-file and partial duplicates. We also exploit application-based locality and file-name locality to find duplicate data. According to the experimental results, the proposed framework identifies duplicate data efficiently and decreases the amount of data written, while its overhead remains reasonable.

Keywords: embedded systems, flash memory, solid state drives, data de-duplication, storage systems

1. INTRODUCTION

At present, most applications such as consumer electronics and embedded systems have adopted NAND flash memory as their main storage media. NAND flash memory is non-volatile, shock-resistant, and power-economic. Since the capacity of NAND flash-memory chips grows rapidly, SSDs (Solid State Drives), which are composed of NAND flash memory, have become popular.
For example, OCZ Storage has released huge-capacity NAND flash-based SSDs (e.g., 1TB SSDs or above) to the market. Some desktops, notebooks, embedded systems, and servers have also adopted SSDs. We will refer to NAND flash memory as flash memory hereafter.

The management of flash memory as a storage system is significantly different from that of main memory and disks. The unit of an erase operation is a block and the unit of a read/write operation is a page. Because flash memory is updated out of place, a page cannot be overwritten until its block has been erased. When a lot of data are written or updated, many pages might be invalidated and must eventually be erased. Therefore, free space on flash memory could become low after a number of writes, and activities that recycle available space (i.e., garbage collection) must be carried out from time to time. However, garbage collection is considered overhead in flash-memory management because live data must be copied.

Received May 31, 2011; accepted March 31. Communicated by Jiman Hong, Junyoung Heo and Tei-Wei Kuo.

Under current technology, a flash-memory block has a limited erase cycle count. For example, a block could be erased 100,000 times [1]. After that, a worn-out block could suffer from frequent write errors. A wear-leveling policy tries to erase all blocks on flash memory evenly so that a longer overall lifetime can be achieved. Garbage collection must therefore keep its overhead as low as possible while respecting the wear-leveling policy. Overall, write operations cause larger overhead than read operations, since write operations may trigger garbage collection and the resulting erase operations shorten the lifetime of flash memory.

Traditional hard disk drives have been gradually replaced by SSDs in personal computers, embedded systems, and even server systems. Since SSDs consist of flash memory, the many write requests issued by these systems lead to performance degradation and reliability problems for SSDs. Fig. 1 shows the percentage of duplicate data on different systems; duplicate data could occupy about 7% to 23% of the storage size. If duplicate data can be found before writing, many write requests to SSDs can be avoided and system performance will improve. This motivates this research. In this paper, we propose a data de-duplication access framework for SSDs. The objective is to decrease the number of write requests by eliminating as much duplicate data as possible.

Fig. 1. Percentage of duplicate data on different systems.

The rest of this paper is organized as follows: Section 2 reviews related work on data de-duplication algorithms. Section 3 presents the characteristics of duplicate data. Section 4 describes the proposed data de-duplication access framework. Section 5 shows the experimental results.
Section 6 is the conclusion.

2. RELATED WORKS

2.1 Duplicate Data

There are two kinds of duplicate data in storage systems [5]: near-duplicate data and full-duplicate data. As shown in Fig. 2, whenever a part of a file is modified, a large part of the existing data may be unmodified but is still re-written, since upper applications (e.g., Word, Vi and Emacs) cannot identify the unmodified data. As shown in Table 1 [7], there is a large overlap between the new version data and the older version data. The unmodified data that is re-written when a file is updated is called near-duplicate data. On the other hand, when a file is replicated, a new file with the same content is created. The content of the new file is called full-duplicate data. In a file system, near-duplicate data and full-duplicate data always exist and waste storage space. To improve the performance of SSDs, duplicate data should be identified and not written, so that garbage collection activity in SSDs decreases and the lifetime of SSDs increases.

Fig. 2. Near-duplicate data.

Table 1. Overlap between the new and older version data.

New version            Older version   Data size   New data   Overlap
emacs 20.7 source      emacs           MB          12.6 MB    76%
Elisp doc. + new page  postscript      4.1 MB      0.4 MB     90%
MSWord doc. + edits    MSWord          1.4 MB      0.4 MB     68%

Muthitacharoen et al. (2001) measured the amount of new data in directories whose files had been edited [7].

2.2 Data De-Duplication Algorithm

The objective of a data de-duplication algorithm is to identify as much duplicate data as possible. The algorithm usually cuts a data stream into fixed or variable size chunks. Each chunk is given an identifier by a fingerprint hash function. Since duplicate data are identified by comparing the identifiers, the hash function should guarantee a very low collision probability. Only data that differ from what is already stored are actually written to storage, as in single-instance storage [2].
There are two kinds of data de-duplication algorithm: in-line checking and post checking. In-line checking identifies duplicate data before writing to storage. Post checking identifies duplicate data after writing to storage. For hard disk drives, post checking may not affect system response time because it can run when the system load is light. However, post checking is not suitable for SSDs since it cannot reduce the actual write requests. As a result, in-line checking is used in the proposed method.

2.3 Fingerprint Hash Function

The purpose of the fingerprint hash function is to generate a fingerprint for each chunk. Michael O. Rabin proposed the following scheme for the fingerprint hash function (i.e., fingerprinting) [3, 4]:

Fingerprinting: Assume $\Omega$ is the set of all possible objects. For all $A, B \in \Omega$, with high probability, $f(A) = f(B) \Rightarrow A = B$.

Note that fingerprinting may have a collision, which occurs if $A \neq B$ and $f(A) = f(B)$. A fingerprint hash function must have a very low collision probability. In this paper, chunks are hashed by the robust one-way hash function SHA-1 [9]. Several researchers show that the probability of a hash collision is negligibly small when the SHA-1 hash function is used [5]. SHA-1 produces a 160-bit digest from a message with a maximum length of $2^{64} - 1$ bits.

2.4 Data Chunking

Since a data de-duplication algorithm usually cuts a data stream into fixed or variable size chunks, there are two chunking approaches: static chunking and variable chunking [5, 6, 8].

Static chunking

Static chunking is a fixed size partitioning. In this approach, a data stream is divided into fixed size chunks, and the de-duplication algorithm can compare the fixed size chunks with low complexity. However, the effectiveness of this approach is highly sensitive to data modifications, since inserting even one byte at the beginning of a file changes the content of all fixed size chunks. For example, Fig. 3 shows that one byte is inserted at offset 5. Because the data after the insertion point are shifted, the content of every chunk after offset 5 changes. This is called the data shifting problem.

Fig. 3. Two data chunking approaches: static chunking and variable chunking.
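The data shifting problem can be illustrated with a minimal Python sketch. It is an illustration only, not the paper's implementation: the 4-byte chunk size, the sample data, and the helper names `chunk_fingerprints` and `count_changed_chunks` are all assumptions chosen to keep the example small. Only `hashlib.sha1` comes from the Python standard library.

```python
import hashlib

CHUNK_SIZE = 4  # hypothetical tiny chunk size, chosen only for illustration


def chunk_fingerprints(data: bytes, chunk_size: int = CHUNK_SIZE) -> list[str]:
    """Split data into fixed-size chunks and return the SHA-1 digest of each."""
    return [
        hashlib.sha1(data[i:i + chunk_size]).hexdigest()
        for i in range(0, len(data), chunk_size)
    ]


def count_changed_chunks(old: bytes, new: bytes) -> int:
    """Count chunk positions whose fingerprints differ between two versions."""
    old_fp = chunk_fingerprints(old)
    new_fp = chunk_fingerprints(new)
    changed = sum(1 for a, b in zip(old_fp, new_fp) if a != b)
    # Chunks that exist in only one of the two versions also count as changed.
    return changed + abs(len(old_fp) - len(new_fp))


original = b"ABCDEFGHIJKLMNOP"                  # four 4-byte chunks
overwritten = b"ABCDEXGHIJKLMNOP"               # one byte overwritten in place
inserted = original[:5] + b"X" + original[5:]   # one byte inserted at offset 5

print(count_changed_chunks(original, overwritten))  # only the touched chunk differs
print(count_changed_chunks(original, inserted))     # every chunk after offset 5 differs
```

An in-place overwrite changes a single fingerprint, while a one-byte insertion changes every fingerprint from the insertion point onward, which is why static chunking alone misses most near-duplicate data after an insertion.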

Variable chunking

Variable chunking is a variable size partitioning. In this approach, a data stream is divided into variable sized chunks. It solves the data shifting problem caused by static chunking. Whenever a chunk of a file is modified, variable chunking dynamically changes the size of the corresponding chunk, as shown in Fig. 3. However, this approach must maintain the mapping information between a variable size chunk and one or more physical blocks. Although this causes management overhead, variable chunking can efficiently eliminate duplicate data by solving the data shifting problem.

3. CHARACTERISTICS OF DUPLICATE DATA

To find duplicate data effectively, we conducted a series of experiments to observe the characteristics of duplicate data. In the experiments, we set up an 8GB SSD storage system with Linux and Windows XP. The duplicate checking program is Easy Duplicate Finder and the checking level is file-based. According to the experimental results, two characteristics of duplicate data are described as follows.

3.1 Application-based Locality

Fig. 4 shows that 86% of duplicate data exist in the same directory, because a large proportion of duplicate data are generated by the same application. These duplicate data are usually temporary files or log files. We call this characteristic application-based locality. Exploiting application-based locality makes it efficient to identify duplicate data and reduces the overhead of the data de-duplication algorithm.

Fig. 4. Two characteristics of duplicate data: application-based locality and file-name locality.

3.2 File-name Locality

Fig. 4 shows that 60% of duplicate data have the same file name (e.g., a file could be copied to another directory with the same file name), 20% of duplicate data have a similar file name (e.g., system.loga and system.logb could contain duplicate data), and the remaining 20% of duplicate data are unrelated in terms of file name. We call this characteristic file-name locality. Exploiting file-name locality makes it efficient to identify duplicate data and reduces the overhead of the data de-duplication algorithm.

4. DATA DE-DUPLICATION ACCESS FRAMEWORK

In this section, the data de-duplication (DDD) access framework is described. The proposed framework identifies duplicate data efficiently and decreases the amount of data written, while its overhead remains reasonable. There are four parts in the framework:

(1) Meta table
(2) Eliminate full-duplicate data
(3) Eliminate near-duplicate data
(4) Reference count

The proposed framework sits above the VFS (Virtual File System) layer and is implemented as library code that can be called by upper applications. In section 4.1, a meta table maintains the meta-data required by the data de-duplication algorithm. Each entry in the meta table denotes a file which is currently maintained by the framework. In sections 4.2 and 4.3, we discuss how to efficiently eliminate full-duplicate data and near-duplicate data. In section 4.4, the concept of reference count is used to maintain the sharing relationship so that files are deleted correctly.

4.1 Meta Table

The design goal of the meta table is to maintain the meta-data required by the data de-duplication algorithm.

Table 2. An entry in the meta table.

Entry                   Example
Name Key
File Name               FileD
File-Based Fingerprint  a9993e aba3...
Reference Count         1
Physical Location       FileD(0,4096) FileD_log(0, 2048) FileD(4097, 6144)

In order to find duplicate data, the framework maintains an entry in the meta table for each file. Each entry consists of a name key, a file name, a file-based fingerprint, a reference count, and a physical location. Table 2 shows an example.
The file name identifies the file that is currently maintained by the framework. The name key is used to quickly identify files that have similar file names. The framework generates the name key with a hash function that sums the ASCII codes of all characters of the file name. This hash function is cheap to compute and still has a high probability of finding similar file names. The file-based fingerprint is the SHA-1 fingerprint of the entire file. It is used to quickly determine whether the file's content is the same as that of another file. Since a file could be referenced by other entries, a reference count is kept; it is explained in section 4.4. The physical location denotes where the file's data are stored. In this example, FileD(0, 4096) FileD_log(0, 2048) FileD(4097, 6144) represents that the latest version data of FileD at offset 4097 to 6145 is located in FileD_log at offset 0 to 2048.

4.2 Eliminate Full-duplicate Data

According to application-based locality, a large portion of duplicate data might be generated by the same application. According to file-name locality, files that have similar file names might share a large portion of duplicate data. Therefore, it is very efficient to eliminate duplicate data based on these localities. As shown in Fig. 5, we use an example to explain how to eliminate duplicate data by file-name locality. Assume that FileD will be created and written.

(1) In the meta table, an entry is generated for FileD. Its file name, name key, and file-based fingerprint are created accordingly.
(2) When FileD's name key (NK) is created, the files whose name keys are in the range [NK - k, NK + k] can be quickly found, where k is a threshold. Since NK (i.e., F + i + l + e + D) is created by summing the ASCII codes of the characters of FileD, k can be used to find the files which have similar file names.
(3) After the files which have similar file names (e.g., FileA and FileC) are found, FileD's fingerprint is compared with theirs. If one file (e.g., FileA) has the same fingerprint as FileD, FileD is not written to the SSD, and the physical location of FileD's entry in the meta table records a reference to FileA.

Fig. 5. An example of eliminating full-duplicate data.
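Steps (1) to (3) can be sketched in Python as follows. This is a hedged illustration under stated assumptions, not the authors' library code: the class names `MetaEntry` and `MetaTable`, the method `write_file`, the tuple layout of the physical location, and the linear scan over entries are all hypothetical. Only the name key rule (sum of ASCII codes), the [NK - k, NK + k] range, and the whole-file SHA-1 fingerprint come from the text.

```python
import hashlib
from dataclasses import dataclass, field


def name_key(file_name: str) -> int:
    """Name key as described in the text: the sum of the ASCII codes of the name."""
    return sum(file_name.encode("ascii"))


@dataclass
class MetaEntry:
    file_name: str
    name_key: int
    fingerprint: str                 # SHA-1 of the whole file content
    reference_count: int = 1
    # Hypothetical layout: list of (file, first offset, last offset) tuples.
    physical_location: list = field(default_factory=list)


class MetaTable:
    def __init__(self, k: int):
        self.k = k                   # threshold on the name-key distance
        self.entries: dict[str, MetaEntry] = {}

    def write_file(self, file_name: str, content: bytes) -> bool:
        """Register a file; return True only if its content must be written."""
        # Step (1): create the entry with its name key and file-based fingerprint.
        nk = name_key(file_name)
        fp = hashlib.sha1(content).hexdigest()
        entry = MetaEntry(file_name, nk, fp)
        last = len(content) - 1
        # Step (2): candidates are files whose name key lies in [NK - k, NK + k].
        # A linear scan is used for brevity; an index on the key would be faster.
        for other in self.entries.values():
            if other.file_name == file_name:
                continue
            if abs(other.name_key - nk) <= self.k and other.fingerprint == fp:
                # Step (3): full duplicate, so reference the existing file.
                other.reference_count += 1
                entry.physical_location = [(other.file_name, 0, last)]
                self.entries[file_name] = entry
                return False
        entry.physical_location = [(file_name, 0, last)]
        self.entries[file_name] = entry
        return True


table = MetaTable(k=4)
print(table.write_file("FileA", b"same content"))   # first copy is written
print(table.write_file("FileD", b"same content"))   # duplicate found via name key
print(table.write_file("zzzz_report", b"same content"))  # name too different
```

The last call shows the trade-off the text accepts: a duplicate whose file name is far from every candidate's name key is written anyway, because searching all entries would cost too much computing time.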

According to our observations, a small portion of files (e.g., 10% to 20%) which have no similar file name might also contain duplicate data. However, finding these files without a similar file name could require a lot of computing time. As a result, to preserve system performance, files that exhibit neither application-based locality nor file-name locality are not checked.

4.3 Eliminate Near-duplicate Data

In this section, the de-duplication algorithm for near-duplicate data is described. The first subsection presents the chunk fingerprint table, which reduces the checking time of fingerprinting. The second subsection discusses the log file accessing scheme, which solves the data shifting problem. The third subsection describes how near-duplicate data are identified. Since invalid data could accumulate in the log file, the last subsection describes a merge operation that reclaims the invalid data.

Chunk fingerprint table

If a file has been updated recently, it might be updated again soon. In order to quickly identify near-duplicate data, the fingerprints of the file's chunks should be recorded, because they allow the unmodified chunks to be recognized at the next update. The chunk fingerprint table maintains a file's fingerprints when the file is updated. Each entry in the chunk fingerprint table contains a file name and a 160-bit (SHA-1) fingerprint for each chunk of the file.

Log file

A log file is used to solve the data shifting problem under static chunking. As shown in Fig. 6 (2), a file may be modified by data insertion. When the file is closed, the whole range from the modified area to the last chunk might be written to the SSD. However, if the modified area can be located, as shown in Fig. 6 (3), only the modified area is written to the log file and the new mapping is added to the corresponding physical location in the meta table. As shown in Fig. 6 (4), the physical location indicates where the valid area is in the original file and where the modified area is in the log file. The physical location format has been described in section 4.1. Since only the modified area is written, unnecessary (redundant) writes are avoided. The following subsection presents how to efficiently identify the modified area when a file is updated.

Fig. 6. Log file: an example of data insertion.

Identification of near-duplicate data

In this subsection, we propose a method to identify near-duplicate data under fixed-sized chunks. Assume that FileD is the old version of a file and FileD' is the new version after FileD is updated. Assume that the modified area consists of two parts, as in Fig. 7. The identification executes forward order checking and backward order checking. Forward order checking compares the chunks of FileD' with the chunks of FileD by their fingerprints in forward sequential order from the beginning. Backward order checking is similar, except that it proceeds in backward sequential order from the end. Since FileD and FileD' might have different lengths, forward order checking and backward order checking can locate the different starting and ending offsets of the parts in the modified area. Note that forward order checking and backward order checking continue until all modified areas are found. After the modified area is identified and written, the corresponding physical location is also updated. As shown in Fig. 8, since the modified area consists of two parts, FileD_log(0,2048) and FileD_log(2049,4097) denote the new data mappings and are added to the corresponding physical location.

Fig. 7. Identification of near-duplicate data.

Fig. 8. The relocation of de-duplicate algorithm.
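Forward and backward order checking can be sketched in Python as below. This is a simplified illustration under stated assumptions rather than the paper's algorithm in full: the function name `locate_modified_area` and the 4-byte chunk size in the usage lines are hypothetical, the fingerprints are recomputed on the fly instead of being read from the chunk fingerprint table, and the sketch returns only one region bounding all changes, whereas the text continues checking until every modified part is found.

```python
import hashlib


def _fingerprint(chunk: bytes) -> bytes:
    """SHA-1 fingerprint of one chunk."""
    return hashlib.sha1(chunk).digest()


def locate_modified_area(old: bytes, new: bytes, chunk_size: int) -> tuple[int, int]:
    """Return (start, end) offsets in `new`, end exclusive, bounding the changes."""
    limit = min(len(old), len(new))

    # Forward order checking: compare chunks aligned to the beginning of the files.
    start = 0
    while (start + chunk_size <= limit
           and _fingerprint(old[start:start + chunk_size])
           == _fingerprint(new[start:start + chunk_size])):
        start += chunk_size

    # Backward order checking: compare chunks aligned to the end of each file,
    # and stop before crossing into the prefix already matched above.
    tail = 0
    while (start + tail + chunk_size <= limit
           and _fingerprint(old[len(old) - tail - chunk_size:len(old) - tail])
           == _fingerprint(new[len(new) - tail - chunk_size:len(new) - tail])):
        tail += chunk_size

    return start, len(new) - tail


old_version = b"AAAABBBBCCCCDDDD"
new_version = old_version[:5] + b"X" + old_version[5:]  # one byte inserted

start, end = locate_modified_area(old_version, new_version, chunk_size=4)
print(start, end, new_version[start:end])  # only this range would go to the log file
```

Aligning the backward pass to the end of each file is what lets it skip data that merely shifted, so only the bytes between the two matched regions need to be appended to the log file.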

Merge operation

Since data are appended to the log file, the log file might accumulate invalid data as time goes by. A merge operation is required to reclaim the invalid data, much like garbage collection for flash memory. As shown in Fig. 9, FileD and its log file FileD_log can be merged into a new file by copying valid data according to the corresponding physical location in the meta table. The merge operation causes overhead in SSDs, but it increases space utilization. To balance this trade-off, the merge operation is executed only when the ratio of invalid data to valid data is larger than a threshold. Note that this ratio must be maintained whenever an update is executed.

Fig. 9. An example of merge operation.

4.4 Reference Count

The reference count is used to maintain the sharing relationship in the framework. The process of deleting files must check the corresponding reference count, and the referenced file can be deleted only when its reference count is 0. As shown in Fig. 10, FileB is referenced by FileA, so the reference counts of FileB and FileA are 2 and 1, respectively. If FileA is deleted, its reference count drops to 0, so FileA is actually deleted, and the reference count of FileB becomes 1. On the other hand, if FileB is deleted, it cannot actually be deleted because FileA still refers to FileB.

Fig. 10. The process of deleting files.

5. EVALUATION

In the experimental environment, the processor is a Pentium Dual CPU E GHz and the main memory size is 3GB. The SSD storage is a Transcend 8GB SLC solid-state disk (TS8GSSD25S-S). In the experiments, we use well-known file systems, namely FAT32, NTFS and Ext3, as our comparison baseline. FAT32 and NTFS are file systems in Windows XP. Ext3 is the most popular file system in Linux and was built on Ubuntu 8.10 in the experiments. The benchmark io_profile [15] was used in the experiments. It generates a large number of open-seek-write-close operations at random locations in a set of files, and it can measure performance without the influence of the operating system's buffer cache. We created a set of test files (about 60 files), each 4MB in size. We set the random write size from 512/4096 bytes to 131,072 bytes and executed 100 repetitions for each random write size. In the experiments, we measured average throughput, average latency, and bytes written.

5.1 Throughput

The experiment measured the average throughput under different random write sizes from 512/4096 bytes to 131,072 bytes. Note that the smallest sector size is 512 bytes for FAT32/NTFS and 4096 bytes for Ext3. The results are shown in Fig. 11. The x-axis represents the random write size for each repetition and the y-axis represents the average throughput. When the random write size was small, the access framework might spend a lot of time to find only a small amount of duplicate data, so the average throughput was not good.
However, when the random write size was increased (e.g., above 2048 bytes), the average throughput was better than that without the access framework. This was because a large amount of duplicate data can be identified for a large write size. As a result, the access framework benefits applications that tend to write large amounts of data.

Fig. 11. Average throughput.

5.2 Latency

The experiment measured the average latency. The average latency is the time needed to finish a write request and also represents the overhead of the access framework. The result is shown in Fig. 12. The x-axis represents the random write size for each repetition and the y-axis represents the average latency. As described in the previous section, high throughput also reflects short latency. This was because a large write size provides a better opportunity to identify a large amount of duplicate data, so that short latency can be achieved. Overall, a file system with the access framework can have short latency.

Fig. 12. Average latency.

5.3 Bytes Written

The experiment measured the actual number of bytes written to the SSD. As shown in Fig. 13, the x-axis represents the random write size for each repetition and the y-axis represents the number of bytes written. When an application issues a save command for a 4MB file without the identification of duplicate data, 4MB of data might have to be written to the SSD. The access framework, in contrast, decreases the actual number of bytes written significantly compared to the case without the identification of duplicate data.

Fig. 13. Bytes written.

6. CONCLUSION

In this paper, we propose a data de-duplication access framework for SSDs. The framework eliminates as much duplicate data as possible so that system performance improves and the lifetime of SSDs increases. The access framework has four parts: (1) Meta table; (2) Eliminate full-duplicate data; (3) Eliminate near-duplicate data; and (4) Reference count. All meta-data required by the data de-duplication algorithm are stored in the meta table. Full-duplicate and near-duplicate data can be identified efficiently by application-based locality and file-name locality. In particular, the concept of reference count is used to maintain the sharing relationship so that files are deleted correctly. According to the experimental results, the access framework eliminates duplicate data efficiently, so that the average throughput and latency improve significantly. At the same time, the overhead caused by fingerprint checking is reasonable.

Future research should further examine the characteristics of duplicate data, especially when different applications are executed. With a more considered approach incorporating application designs and access patterns, a real prototype will be designed and manufactured. The real prototype can provide fast and efficient SSDs.

REFERENCES

1. Samsung Electronics, NAND Flash-Memory Datasheet and SmartMedia Data Book.
2. Microsoft, Single instance storage in Microsoft Windows Storage Server 2003 R2, Technical White Paper.
3. R. M. Karp and M. O. Rabin, Efficient randomized pattern-matching algorithms, IBM Journal of Research and Development, Vol. 31, 1987.
4. M. O. Rabin, Fingerprinting by random polynomials, Center for Research in Computing Technology, Harvard University.
5. A. Brinkmann, Data deduplication, Theoretical Aspects of Storage Systems, pc2.uni-paderborn.de/fileadmin/pc2/media/staffweb/andre_brinkmann/courses/wroclaw_storage_systems/wroclaw_chapter_3_-deduplication.pdf.
6. D. R. Bobbarjung, S. Jagannathan, and C. Dubnicki, Improving duplicate elimination in storage systems, ACM Transactions on Storage, Vol. 2, 2006.
7. A. Muthitacharoen, B. Chen, and D. Mazieres, A low-bandwidth network file system, in Proceedings of the 18th ACM Symposium on Operating Systems Principles, 2001.
8. J. Kubiatowicz, et al., OceanStore: An architecture for global-scale persistent storage, in Proceedings of the 9th International Conference on Architectural Support for Programming Languages and Operating Systems, 2000.
9. National Institute of Standards and Technology, Secure hash standard, Federal Information Processing Standards Publication 180-1, 1995.

Chin-Hsien Wu received his B.S. degree in Computer Science from National Chung Cheng University. He received his M.S. and Ph.D. degrees in Computer Science from National Taiwan University in 2001 and 2006, respectively. He is now an Associate Professor at the Department of Electronic Engineering, National Taiwan University of Science and Technology. He is also a member of ACM and IEEE. His research interests include embedded systems, real-time systems, ubiquitous computing, and flash-memory storage systems.

Hau-Shan Wu received his B.S. degree in Computer Science from Tamkang University. He received his M.S. degree in Electronic Engineering from National Taiwan University of Science and Technology. His research interests include embedded systems and flash-memory storage systems.


More information

Implementation of Buffer Cache Simulator for Hybrid Main Memory and Flash Memory Storages

Implementation of Buffer Cache Simulator for Hybrid Main Memory and Flash Memory Storages Implementation of Buffer Cache Simulator for Hybrid Main Memory and Flash Memory Storages Soohyun Yang and Yeonseung Ryu Department of Computer Engineering, Myongji University Yongin, Gyeonggi-do, Korea

More information

A Novel Way of Deduplication Approach for Cloud Backup Services Using Block Index Caching Technique

A Novel Way of Deduplication Approach for Cloud Backup Services Using Block Index Caching Technique A Novel Way of Deduplication Approach for Cloud Backup Services Using Block Index Caching Technique Jyoti Malhotra 1,Priya Ghyare 2 Associate Professor, Dept. of Information Technology, MIT College of

More information

An Efficient B-Tree Layer Implementation for Flash-Memory Storage Systems

An Efficient B-Tree Layer Implementation for Flash-Memory Storage Systems An Efficient B-Tree Layer Implementation for Flash-Memory Storage Systems CHIN-HSIEN WU and TEI-WEI KUO National Taiwan University and LI PING CHANG National Chiao-Tung University With the significant

More information

An Adaptive Striping Architecture for Flash Memory Storage Systems of Embedded Systems

An Adaptive Striping Architecture for Flash Memory Storage Systems of Embedded Systems An Adaptive Striping Architecture for Flash Memory Storage Systems of Embedded Systems Li-Pin Chang and Tei-Wei Kuo {d65269,ktw}@csientuedutw Department of Computer Science and Information Engineering

More information

hybridfs: Integrating NAND Flash-Based SSD and HDD for Hybrid File System

hybridfs: Integrating NAND Flash-Based SSD and HDD for Hybrid File System hybridfs: Integrating NAND Flash-Based SSD and HDD for Hybrid File System Jinsun Suk and Jaechun No College of Electronics and Information Engineering Sejong University 98 Gunja-dong, Gwangjin-gu, Seoul

More information

ChunkStash: Speeding up Inline Storage Deduplication using Flash Memory

ChunkStash: Speeding up Inline Storage Deduplication using Flash Memory ChunkStash: Speeding up Inline Storage Deduplication using Flash Memory Biplob Debnath Sudipta Sengupta Jin Li Microsoft Research, Redmond, WA, USA University of Minnesota, Twin Cities, USA Abstract Storage

More information

File Systems for Flash Memories. Marcela Zuluaga Sebastian Isaza Dante Rodriguez

File Systems for Flash Memories. Marcela Zuluaga Sebastian Isaza Dante Rodriguez File Systems for Flash Memories Marcela Zuluaga Sebastian Isaza Dante Rodriguez Outline Introduction to Flash Memories Introduction to File Systems File Systems for Flash Memories YAFFS (Yet Another Flash

More information

Model and Validation of Block Cleaning Cost for Flash Memory*, **

Model and Validation of Block Cleaning Cost for Flash Memory*, ** Model and Validation of Block Cleaning Cost for Flash Memory*, ** Seungjae Baek 1, Jongmoo Choi 1, Donghee Lee 2, and Sam H. Noh 3 1 Division of Information and Computer Science, Dankook University, Korea,

More information
