Similarity Evaluation Scheme Using Fixed-length VLC Based Representative Hash


Young Jun Yoo 1, Jin Kim, Jung Min So, Jeong Gun Lee, Sun Jung Kim, Young Woong Ko

Dept. of Computer Engineering, Hallym University, Chuncheon, Korea
{willow72, jinkim, jso, jeonggun.lee, sunkim, yuko}@hallym.ac.kr

Abstract. File similarity evaluation is essential for data deduplication and similar-file searching. During similarity processing, CPU consumption and memory overhead grow as the number of files increases. Moreover, as files get larger, the capacity overhead of the metadata becomes critical. In this paper, we propose a similarity evaluation scheme that combines fixed-length VLC with a representative hash scheme, which reduces the overall processing time of similarity evaluation. Experimental results show that the proposed system can reduce processing time and data bandwidth.

Keywords: File, Synchronization, Chunking, Hash, FLC

1 Introduction

File similarity evaluation is used in various fields such as malicious file detection and data deduplication. In the security area, malicious files are generated by modifying an original executable file, so file similarity search is used to measure file similarity or to find malicious code in the executable programs within a system. Usually, a malicious file is very similar to the original file because a patch is inserted into the middle of the original file. Data deduplication systems also handle similar files, using various chunking approaches. The well-known chunking approaches are FLC (Fixed-length chunking) and VLC (Variable-length chunking). Deduplication approaches can save computing resources by detecting file similarity and eliminating the duplicated regions of a file. In a file similarity search module, performance depends on hash comparison speed, so effective hash comparison is critical.
In this paper, we propose a file similarity evaluation system that determines the similarity between client files and server files. By searching for highly similar files, we can reduce file synchronization overhead. This technique is much faster than traditional file synchronization systems, which compare the hash data of a client with that of the server. The proposed scheme adopts the VLC approach for data chunking; however, it hashes only a fixed-length data block to extract each hash value. The proposed scheme also reduces hash comparison time by using a representative hash scheme, which generates a hash list containing only the key feature hashes of a file.

1 This research was supported by Basic Science Research Program through the National Research Foundation of Korea (NRF) funded by the Ministry of Education, Science and Technology (.2012R1A1A ) and by the MKE, Korea and NHN under the IT/SW Creative research program supervised by the NIPA (NIPA-2012).

IST 2013, ASTL Vol. 23, 2013, SERSC. Proceedings, The 2nd International Conference on Information Science and Technology.

2 Related works

Currently, there are several well-known file synchronization schemes, including Rsync[1], Venti[2], Active Sync[3], HotSync[4] and CPISync[5]. These approaches do not check file similarity, so it is difficult to apply them to a versioning system in which data is modified frequently and files share a large amount of duplicated data. More elaborate schemes consider file duplication and provide efficient file synchronization: LLRFS[6] uses CDC (Content-Defined Chunking) and set reconciliation to diminish data duplication and minimize network traffic. Tpsync[7] finds duplicated regions in a file with CDC and applies a rolling checksum for more enhanced deduplication.

Rsync[1] is a software application for Unix systems which synchronizes files and directories from one location to another while minimizing data transfer, using delta encoding when appropriate. An important feature of Rsync not found in most similar programs and protocols is that the mirroring takes place with only one transmission in each direction. Rsync can copy or display directory contents and copy files, optionally using compression and recursion.

Venti[2] is a network storage system that permanently stores data blocks. A 160-bit SHA-1 hash of the data (called the score by Venti) acts as the address of the data. This enforces a write-once policy, since no other data block can be found with the same address. The addresses of multiple writes of the same data are identical, so duplicate data is easily identified and the data block is stored only once.
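The content-addressed, write-once behaviour described for Venti can be illustrated with a small sketch. This is not Venti's actual interface: the class name `ContentAddressedStore` and its methods are hypothetical, and an in-memory dictionary stands in for the network store. Only the use of a SHA-1 hash of the block as its address comes from the description above.

```python
import hashlib


class ContentAddressedStore:
    """Toy write-once block store (hypothetical; not Venti's real API)."""

    def __init__(self):
        self._blocks = {}  # hex score -> block bytes

    def write(self, block: bytes) -> str:
        # The 160-bit SHA-1 of the data acts as its address (the "score").
        score = hashlib.sha1(block).hexdigest()
        # Writing identical data again yields the same score, so the
        # block is kept only once.
        self._blocks.setdefault(score, block)
        return score

    def read(self, score: str) -> bytes:
        return self._blocks[score]

    def __len__(self) -> int:
        return len(self._blocks)


store = ContentAddressedStore()
first = store.write(b"hello block")
second = store.write(b"hello block")    # duplicate write, same address
other = store.write(b"another block")
print(first == second, len(store))
```

Because the address is derived from the content, detecting a duplicate costs one hash computation and one lookup, which is the property that deduplication systems build on.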
3 Design and Implementation of the system

In this paper, we designed and implemented a file similarity evaluation scheme for a file synchronization system. File synchronization is the process of ensuring that files in multiple locations are updated according to certain rules. Updated files are copied from a source location to one or more target locations, or updated files are copied in both directions. The purpose of file synchronization is to keep the files in multiple locations identical to each other. File synchronization is widely used for backups and updates on storage systems, and it is especially useful for mobile devices and for users who work on multiple computers. To reduce data bandwidth, file synchronization must avoid copying files that are already identical, which also saves processing time during file copying.

In this paper, we propose a practical approach for minimizing metadata communication overhead in a remote file synchronization system. The main idea is to apply an efficient hash comparison technique that reduces the hash keys of a file to a small number of hash keys using representative hash selection. We designed and implemented the proposed file synchronization scheme on a static chunking deduplication system.

Chunking approaches for finding file similarity are usually divided into FLC (Fixed-Length Chunking) and VLC (Variable-Length Chunking). VLC is considerably more effective at determining file similarity than other approaches, but it is likely to incur a large overhead in the comparison process. The fixed-length chunking approach divides a file into blocks of a predefined constant length and calculates a hash value (MD5, SHA1) for each block. It then accumulates the hashes into a hash list and compares these values with the file hashes on the server to find file similarity. FLC is very fast because the file is divided into blocks of constant length. The proposed system reads the whole file stream, cuts fixed-size blocks starting from each anchor position, and generates hash values for them. The number of hash values is then reduced by using the representative hash scheme.

Figure 1. Pseudo code of proposed system

The algorithm (Figure 1) shows how the proposed system performs chunking and hashing. First, the algorithm searches for anchor bytes in the file stream. Second, when it finds anchor bytes, it extracts a block hash from the fixed-size block starting at the anchor.

Figure 2. Representative hash concept
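The two chunking steps described for Figure 1 (search for an anchor, then hash a fixed-size block starting at it) can be sketched as follows, next to plain FLC for comparison. This is a minimal sketch under stated assumptions, not the pseudo code of Figure 1: the anchor pattern `ANCHOR`, the block size `BLOCK_SIZE`, the choice of SHA-1, and the rule that the search resumes after the hashed block are all hypothetical, since the text does not specify them.

```python
import hashlib
import random

ANCHOR = b"\x00\x01"   # hypothetical anchor byte pattern
BLOCK_SIZE = 64        # hypothetical fixed block length in bytes


def flc_hashes(data: bytes, block_size: int = BLOCK_SIZE) -> list:
    """Plain FLC: cut at constant offsets and hash every block."""
    return [hashlib.sha1(data[i:i + block_size]).hexdigest()
            for i in range(0, len(data), block_size)]


def anchored_block_hashes(data: bytes, anchor: bytes = ANCHOR,
                          block_size: int = BLOCK_SIZE) -> list:
    """Hash one fixed-size block starting at each anchor position."""
    hashes = []
    pos = data.find(anchor)
    while pos != -1:
        block = data[pos:pos + block_size]
        hashes.append(hashlib.sha1(block).hexdigest())
        # Assumption: resume the anchor search after the hashed block.
        pos = data.find(anchor, pos + block_size)
    return hashes


# Build a test stream: 8 records, each an anchor followed by 100 payload
# bytes drawn from 2..255 so the anchor never occurs inside a payload.
rng = random.Random(0)
original = b"".join(
    ANCHOR + bytes(rng.randrange(2, 256) for _ in range(100))
    for _ in range(8)
)
shifted = b"\xff" + original   # one byte inserted at the front

print(anchored_block_hashes(original) == anchored_block_hashes(shifted))
print(len(set(flc_hashes(original)) & set(flc_hashes(shifted))))
```

The one-byte insertion moves every FLC block boundary, so FLC hashes of the two streams no longer line up, while blocks cut relative to anchors are unaffected by the shift. That boundary-shift resistance is what the scheme borrows from VLC, while keeping the fixed block length of FLC.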

As can be seen in Figure 2, the Rabin hash function is used for computing a hash key for a block. The Rabin hash is first computed at the first byte of the file, over a window of block-size bytes to its right. Once the Rabin computation at the first byte is complete, the Rabin hash at the second byte is computed incrementally from the first hash value. The hash value at the second byte is in turn used to incrementally compute the hash value at the third byte, and the process continues in this way. We then sort the Rabin hash values and choose only the 10 largest values as the representative hashes. In this work, we built the representative hash list for all files before data deduplication. We extract one representative hash per 1 MByte, so the amount of additional information kept for file similarity is not critical for metadata management.

4 Experiment Result

We now present some experimental results that show the potential of the proposed algorithm. To perform a comprehensive analysis of the proposed algorithm, we implemented the client and the server on a platform consisting of a 3GHz Pentium 4 processor, a WD-1600JS hard disk and a 100Mbps network. We built the experimental data set using patches, where a patch is a data block used to modify a file at a random position. In this experiment, we modified a file by using the lseek function on a Linux system with a random file offset and applied a patch to make each test data file. We performed multiple runs with different data sets and plotted the average resulting value. The original experiment data was modified with the Linux dd command to produce files that are 90%, 80% and 70% duplicates. The goal of the experiment is to measure the file similarity results of fixed-size VLC with representative hash (FVR) and of the traditional FLC approach. We also measured the size of the error in the measured file similarity.

Table 1. File similarity experiment result (columns: FLC, FVR; rows: number of hashes, processing speed in seconds)

Table 1 shows the performance results for execution speed. In more detail, when the hash operation is performed on the same file, the FLC approach produces a hash for every block, whereas the FVR approach produces only 500 representative hashes. The two approaches also differ in processing speed: traditional FLC takes 0.4 sec, whereas FVR takes less.

Table 2. Error distribution of the similarity results

Actual file similarity (%)   Measured result (%)   Error (%)
90                           88.4                  1.6
80                           81                    1
70                           73.8                  3.8
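The representative hash selection described in Section 3 (an incrementally updated hash at every byte offset, followed by keeping only the largest values) can be sketched as below. This is an illustration under stated assumptions rather than the evaluated implementation: a polynomial rolling hash with hypothetical constants `BASE` and `MOD` stands in for a true Rabin fingerprint, the window size of 48 bytes is arbitrary, and the `similarity` function (the fraction of representative hashes that two files share) is an assumed measure, since the text does not define how the percentage is computed.

```python
import heapq
import random

BASE = 257             # hypothetical polynomial base
MOD = (1 << 61) - 1    # hypothetical modulus (a Mersenne prime)


def rolling_hashes(data: bytes, window: int):
    """Yield the hash of every window-sized substring, updated incrementally."""
    if len(data) < window:
        return
    high = pow(BASE, window - 1, MOD)   # weight of the byte leaving the window
    h = 0
    for byte in data[:window]:
        h = (h * BASE + byte) % MOD
    yield h
    for i in range(window, len(data)):
        h = (h - data[i - window] * high) % MOD   # drop the oldest byte
        h = (h * BASE + data[i]) % MOD            # append the newest byte
        yield h


def representative_hashes(data: bytes, window: int = 48, k: int = 10) -> set:
    """Keep only the k largest distinct window hashes of the data."""
    return set(heapq.nlargest(k, set(rolling_hashes(data, window))))


def similarity(rep_a: set, rep_b: set) -> float:
    """Assumed measure: fraction of rep_a that also appears in rep_b."""
    if not rep_a:
        return 0.0
    return len(rep_a & rep_b) / len(rep_a)


rng = random.Random(1)
original = bytes(rng.randrange(256) for _ in range(4096))
patched = original[:2000] + b"PATCH" + original[2000:]   # patch at an offset

print(similarity(representative_hashes(original),
                 representative_hashes(patched)))
```

Because each hash depends only on the bytes inside its window, a patch changes only the hashes of windows that overlap it, so most of the largest values survive the modification. Comparing a handful of representative values per file, instead of one hash per block, is what reduces the comparison cost; the price is an estimate that can deviate from the true similarity, as the error table shows.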

Table 2 shows the errors in the results of the proposed system. In the experiment, the proposed system measured 88.4% duplicates for the file with 90% real duplicates (a 1.6% error), 81% for the file with 80% real duplicates (a 1% error), and 73.8% for the file with 70% real duplicates (a 3.8% error). The proposed system can therefore determine file similarity with high accuracy in a short time.

5 Conclusion

In this paper, we introduced a structure for a representative hash based file similarity evaluation technique, which gives results almost similar to those of traditional FLC while reducing overhead. By searching for highly similar files, we can reduce file synchronization overhead. This technique is much faster than traditional file synchronization systems, which compare the hash data of a client with that of the server. The proposed scheme adopts the VLC approach for data chunking; however, it hashes only a fixed-length data block to extract each hash value.

References

1. Tridgell, A.: Efficient algorithms for sorting and synchronization. PhD thesis, The Australian National University (1999)
2. Quinlan, S., Dorward, S.: Venti: a new approach to archival storage. In: Proceedings of the FAST 2002 Conference on File and Storage Technologies. Volume 4 (2002)
3. Meunier, P., Nystrom, S., Kamara, S., Yost, S., Alexander, K., Noland, D., Crane, J.: ActiveSync, TCP/IP and 802.11b wireless vulnerabilities of WinCE-based PDAs. WETICE 2002, IEEE International Workshops on (2002)
4. HotSync, P.: Palm developer online documentation (2007)
5. Starobinski, D., Trachtenberg, A., Agarwal, S.: Efficient PDA synchronization. Mobile Computing, IEEE Transactions on 2(1) (2003)
6. Yan, H., Irmak, U., Suel, T.: Algorithms for low-latency remote synchronization. In: INFOCOM 2008. The 27th Conference on Computer Communications. IEEE (2008)
7. Xu, D., Sheng, Y., Ju, D., Wu, J., Wang, D.: High effective two-round remote file fast synchronization algorithm. 5(1) (2011)

Two-Level Metadata Management for Data Deduplication System

Two-Level Metadata Management for Data Deduplication System Two-Level Metadata Management for Data Deduplication System Jin San Kong 1, Min Ja Kim 2, Wan Yeon Lee 3.,Young Woong Ko 1 1 Dept. of Computer Engineering, Hallym University Chuncheon, Korea { kongjs,

More information

Multi-level Metadata Management Scheme for Cloud Storage System

Multi-level Metadata Management Scheme for Cloud Storage System , pp.231-240 http://dx.doi.org/10.14257/ijmue.2014.9.1.22 Multi-level Metadata Management Scheme for Cloud Storage System Jin San Kong 1, Min Ja Kim 2, Wan Yeon Lee 3, Chuck Yoo 2 and Young Woong Ko 1

More information

Byte-index Chunking Algorithm for Data Deduplication System

Byte-index Chunking Algorithm for Data Deduplication System , pp.415-424 http://dx.doi.org/10.14257/ijsia.2013.7.5.38 Byte-index Chunking Algorithm for Data Deduplication System Ider Lkhagvasuren 1, Jung Min So 1, Jeong Gun Lee 1, Chuck Yoo 2 and Young Woong Ko

More information

A Novel Way of Deduplication Approach for Cloud Backup Services Using Block Index Caching Technique

A Novel Way of Deduplication Approach for Cloud Backup Services Using Block Index Caching Technique A Novel Way of Deduplication Approach for Cloud Backup Services Using Block Index Caching Technique Jyoti Malhotra 1,Priya Ghyare 2 Associate Professor, Dept. of Information Technology, MIT College of

More information

Theoretical Aspects of Storage Systems Autumn 2009

Theoretical Aspects of Storage Systems Autumn 2009 Theoretical Aspects of Storage Systems Autumn 2009 Chapter 3: Data Deduplication André Brinkmann News Outline Data Deduplication Compare-by-hash strategies Delta-encoding based strategies Measurements

More information

A Data De-duplication Access Framework for Solid State Drives

A Data De-duplication Access Framework for Solid State Drives JOURNAL OF INFORMATION SCIENCE AND ENGINEERING 28, 941-954 (2012) A Data De-duplication Access Framework for Solid State Drives Department of Electronic Engineering National Taiwan University of Science

More information

Deploying De-Duplication on Ext4 File System

Deploying De-Duplication on Ext4 File System Deploying De-Duplication on Ext4 File System Usha A. Joglekar 1, Bhushan M. Jagtap 2, Koninika B. Patil 3, 1. Asst. Prof., 2, 3 Students Department of Computer Engineering Smt. Kashibai Navale College

More information

Availability Digest. www.availabilitydigest.com. Data Deduplication February 2011

Availability Digest. www.availabilitydigest.com. Data Deduplication February 2011 the Availability Digest Data Deduplication February 2011 What is Data Deduplication? Data deduplication is a technology that can reduce disk storage-capacity requirements and replication bandwidth requirements

More information

A Network Differential Backup and Restore System based on a Novel Duplicate Data Detection algorithm

A Network Differential Backup and Restore System based on a Novel Duplicate Data Detection algorithm A Network Differential Backup and Restore System based on a Novel Duplicate Data Detection algorithm GUIPING WANG 1, SHUYU CHEN 2*, AND JUN LIU 1 1 College of Computer Science Chongqing University No.

More information

Quanqing XU Quanqing.Xu@nicta.com.au. YuruBackup: A Highly Scalable and Space-Efficient Incremental Backup System in the Cloud

Quanqing XU Quanqing.Xu@nicta.com.au. YuruBackup: A Highly Scalable and Space-Efficient Incremental Backup System in the Cloud Quanqing XU Quanqing.Xu@nicta.com.au YuruBackup: A Highly Scalable and Space-Efficient Incremental Backup System in the Cloud Outline Motivation YuruBackup s Architecture Backup Client File Scan, Data

More information

IMPLEMENTATION OF SOURCE DEDUPLICATION FOR CLOUD BACKUP SERVICES BY EXPLOITING APPLICATION AWARENESS

IMPLEMENTATION OF SOURCE DEDUPLICATION FOR CLOUD BACKUP SERVICES BY EXPLOITING APPLICATION AWARENESS IMPLEMENTATION OF SOURCE DEDUPLICATION FOR CLOUD BACKUP SERVICES BY EXPLOITING APPLICATION AWARENESS Nehal Markandeya 1, Sandip Khillare 2, Rekha Bagate 3, Sayali Badave 4 Vaishali Barkade 5 12 3 4 5 (Department

More information

A Web Site Protection Oriented Remote Backup and Recovery Method

A Web Site Protection Oriented Remote Backup and Recovery Method 2013 8th International Conference on Communications and Networking in China (CHINACOM) A Web Site Protection Oriented Remote Backup and Recovery Method He Qian 1,2, Guo Yafeng 1, Wang Yong 1, Qiang Baohua

More information

The assignment of chunk size according to the target data characteristics in deduplication backup system

The assignment of chunk size according to the target data characteristics in deduplication backup system The assignment of chunk size according to the target data characteristics in deduplication backup system Mikito Ogata Norihisa Komoda Hitachi Information and Telecommunication Engineering, Ltd. 781 Sakai,

More information

On the Scalability of Data Synchronization Protocols for PDAs and Mobile Devices

On the Scalability of Data Synchronization Protocols for PDAs and Mobile Devices On the Scalability of Data Synchronization Protocols for PDAs and Mobile Devices S. Agarwal D. Starobinski A. Trachtenberg {ska,staro,trachten}@bu.edu Department of Electrical and Computer Engineering

More information

NAS 259 Protecting Your Data with Remote Sync (Rsync)

NAS 259 Protecting Your Data with Remote Sync (Rsync) NAS 259 Protecting Your Data with Remote Sync (Rsync) Create and execute an Rsync backup job A S U S T O R C O L L E G E COURSE OBJECTIVES Upon completion of this course you should be able to: 1. Having

More information

A Survey on Deduplication Strategies and Storage Systems

A Survey on Deduplication Strategies and Storage Systems A Survey on Deduplication Strategies and Storage Systems Guljar Shaikh ((Information Technology,B.V.C.O.E.P/ B.V.C.O.E.P, INDIA) Abstract : Now a day there is raising demands for systems which provide

More information

A Deduplication-based Data Archiving System

A Deduplication-based Data Archiving System 2012 International Conference on Image, Vision and Computing (ICIVC 2012) IPCSIT vol. 50 (2012) (2012) IACSIT Press, Singapore DOI: 10.7763/IPCSIT.2012.V50.20 A Deduplication-based Data Archiving System

More information

09'Linux Plumbers Conference

09'Linux Plumbers Conference 09'Linux Plumbers Conference Data de duplication Mingming Cao IBM Linux Technology Center cmm@us.ibm.com 2009 09 25 Current storage challenges Our world is facing data explosion. Data is growing in a amazing

More information

Data Deduplication HTBackup

Data Deduplication HTBackup Data Deduplication HTBackup HTBackup and it s Deduplication technology is touted as one of the best ways to manage today's explosive data growth. If you're new to the technology, these key facts will help

More information

File Protection using rsync. Setup guide

File Protection using rsync. Setup guide File Protection using rsync Setup guide Contents 1. Introduction... 2 Documentation... 2 Licensing... 2 Overview... 2 2. Rsync technology... 3 Terminology... 3 Implementation... 3 3. Rsync data hosts...

More information

Hardware Configuration Guide

Hardware Configuration Guide Hardware Configuration Guide Contents Contents... 1 Annotation... 1 Factors to consider... 2 Machine Count... 2 Data Size... 2 Data Size Total... 2 Daily Backup Data Size... 2 Unique Data Percentage...

More information

De-duplication-based Archival Storage System

De-duplication-based Archival Storage System De-duplication-based Archival Storage System Than Than Sint Abstract This paper presents the disk-based backup system in which only relational database files are stored by using data deduplication technology.

More information

A Method of Deduplication for Data Remote Backup

A Method of Deduplication for Data Remote Backup A Method of Deduplication for Data Remote Backup Jingyu Liu 1,2, Yu-an Tan 1, Yuanzhang Li 1, Xuelan Zhang 1, Zexiang Zhou 3 1 School of Computer Science and Technology, Beijing Institute of Technology,

More information

A Deduplication File System & Course Review

A Deduplication File System & Course Review A Deduplication File System & Course Review Kai Li 12/13/12 Topics A Deduplication File System Review 12/13/12 2 Traditional Data Center Storage Hierarchy Clients Network Server SAN Storage Remote mirror

More information

Data Deduplication and Tivoli Storage Manager

Data Deduplication and Tivoli Storage Manager Data Deduplication and Tivoli Storage Manager Dave Cannon Tivoli Storage Manager rchitect Oxford University TSM Symposium September 2007 Disclaimer This presentation describes potential future enhancements

More information

Real Time Network Server Monitoring using Smartphone with Dynamic Load Balancing

Real Time Network Server Monitoring using Smartphone with Dynamic Load Balancing www.ijcsi.org 227 Real Time Network Server Monitoring using Smartphone with Dynamic Load Balancing Dhuha Basheer Abdullah 1, Zeena Abdulgafar Thanoon 2, 1 Computer Science Department, Mosul University,

More information

Efficient and Safe Data Backup with Arrow

Efficient and Safe Data Backup with Arrow Efficient and Safe Data Backup with Arrow Technical Report UCSC-SSRC-8-2 June 28 Casey Marshall csm@soe.ucsc.edu Storage Systems Research Center Baskin School of Engineering University of California, Santa

More information

A block based storage model for remote online backups in a trust no one environment

A block based storage model for remote online backups in a trust no one environment A block based storage model for remote online backups in a trust no one environment http://www.duplicati.com/ Kenneth Skovhede (author, kenneth@duplicati.com) René Stach (editor, rene@duplicati.com) Abstract

More information

bup: the git-based backup system Avery Pennarun

bup: the git-based backup system Avery Pennarun bup: the git-based backup system Avery Pennarun 2010 10 25 The Challenge Back up entire filesystems (> 1TB) Including huge VM disk images (files >100GB) Lots of separate files (500k or more) Calculate/store

More information

Design and Implementation of a Storage Repository Using Commonality Factoring. IEEE/NASA MSST2003 April 7-10, 2003 Eric W. Olsen

Design and Implementation of a Storage Repository Using Commonality Factoring. IEEE/NASA MSST2003 April 7-10, 2003 Eric W. Olsen Design and Implementation of a Storage Repository Using Commonality Factoring IEEE/NASA MSST2003 April 7-10, 2003 Eric W. Olsen Axion Overview Potentially infinite historic versioning for rollback and

More information

A Policy-based De-duplication Mechanism for Securing Cloud Storage

A Policy-based De-duplication Mechanism for Securing Cloud Storage International Journal of Electronics and Information Engineering, Vol.2, No.2, PP.70-79, June 2015 70 A Policy-based De-duplication Mechanism for Securing Cloud Storage Zhen-Yu Wang 1, Yang Lu 1, Guo-Zi

More information

A Policy-based De-duplication Mechanism for Securing Cloud Storage

A Policy-based De-duplication Mechanism for Securing Cloud Storage International Journal of Electronics and Information Engineering, Vol.2, No.2, PP.95-102, June 2015 95 A Policy-based De-duplication Mechanism for Securing Cloud Storage Zhen-Yu Wang 1, Yang Lu 1, Guo-Zi

More information

A Survey on Aware of Local-Global Cloud Backup Storage for Personal Purpose

A Survey on Aware of Local-Global Cloud Backup Storage for Personal Purpose A Survey on Aware of Local-Global Cloud Backup Storage for Personal Purpose Abhirupa Chatterjee 1, Divya. R. Krishnan 2, P. Kalamani 3 1,2 UG Scholar, Sri Sairam College Of Engineering, Bangalore. India

More information

MADR Algorithm to Recover Authenticity from Damage of the Important Data

MADR Algorithm to Recover Authenticity from Damage of the Important Data , pp. 443-452 http://dx.doi.org/10.14257/ijmue.2014.9.12.39 MADR Algorithm to Recover Authenticity from Damage of the Important Data Seong-Ho An 1, * Kihyo Nam 2, Mun-Kweon Jeong 2 and Yong-Rak Choi 1

More information

The X-DBaaS-Based Stock Trading System to Overcome Low Latency in Cloud Environment

The X-DBaaS-Based Stock Trading System to Overcome Low Latency in Cloud Environment , pp.127-136 http://dx.doi.org/10.14257/ijmue.2015.10.10.14 The X-DBaaS-Based Stock Trading System to Overcome Low Latency in Cloud Environment Hyoyoung Shin 1 and Hyungjin Kim 2* 1 Department of IT Security,

More information

Egnyte Local Cloud Architecture. White Paper

Egnyte Local Cloud Architecture. White Paper w w w. e g n y t e. c o m Egnyte Local Cloud Architecture White Paper Revised June 21, 2012 Table of Contents Egnyte Local Cloud Introduction page 2 Scalable Solutions Personal Local Cloud page 3 Office

More information

Cyber Forensic for Hadoop based Cloud System

Cyber Forensic for Hadoop based Cloud System Cyber Forensic for Hadoop based Cloud System ChaeHo Cho 1, SungHo Chin 2 and * Kwang Sik Chung 3 1 Korea National Open University graduate school Dept. of Computer Science 2 LG Electronics CTO Division

More information

Research and Performance Analysis of HTML5 WebSocket for a Real-time Multimedia Data Communication Environment

Research and Performance Analysis of HTML5 WebSocket for a Real-time Multimedia Data Communication Environment Vol.46 (Multimedia 2014), pp.307-312 http://dx.doi.org/10.14257/astl.2014.46.64 Research and Performance Analysis of HTML5 WebSocket for a Real-time Multimedia Data Communication Environment Jin-tae Park

More information

SmartSync Backup Efficient NAS-to-NAS backup

SmartSync Backup Efficient NAS-to-NAS backup Allion Ingrasys Europe SmartSync Backup Efficient NAS-to-NAS backup 1. Abstract A common approach to back up data stored in a NAS server is to run backup software on a Windows or UNIX systems and back

More information

Data Reduction: Deduplication and Compression. Danny Harnik IBM Haifa Research Labs

Data Reduction: Deduplication and Compression. Danny Harnik IBM Haifa Research Labs Data Reduction: Deduplication and Compression Danny Harnik IBM Haifa Research Labs Motivation Reducing the amount of data is a desirable goal Data reduction: an attempt to compress the huge amounts of

More information

Rsync Internet Backup Whitepaper

Rsync Internet Backup Whitepaper WHITEPAPER BackupAssist Version 6 www.backupassist.com Cortex I.T. Labs 2001-2008 2 Contents Introduction... 3 Important notice about terminology... 3 Rsync: An overview... 3 Performance... 4 Summary...

More information

MSc Computer Security and Forensics. Examinations for 2009-2010 / Semester 1

MSc Computer Security and Forensics. Examinations for 2009-2010 / Semester 1 MSc Computer Security and Forensics Cohort: MCSF/09B/PT Examinations for 2009-2010 / Semester 1 MODULE: COMPUTER FORENSICS & CYBERCRIME MODULE CODE: SECU5101 Duration: 2 Hours Instructions to Candidates:

More information

Log files management. Katarzyna KAPUSTA

Log files management. Katarzyna KAPUSTA Log files management Katarzyna KAPUSTA CERN openlab 07 September 2012 CERN openlab otn-2012-01 openlab Summer Student Report Log files management Katarzyna KAPUSTA Giacomo TENAGLIA 07 September 2012 Version

More information

EMC VNXe File Deduplication and Compression

EMC VNXe File Deduplication and Compression White Paper EMC VNXe File Deduplication and Compression Overview Abstract This white paper describes EMC VNXe File Deduplication and Compression, a VNXe system feature that increases the efficiency with

More information

Distributed File Systems

Distributed File Systems Distributed File Systems Paul Krzyzanowski Rutgers University October 28, 2012 1 Introduction The classic network file systems we examined, NFS, CIFS, AFS, Coda, were designed as client-server applications.

More information

An Efficient Application Virtualization Mechanism using Separated Software Execution System

An Efficient Application Virtualization Mechanism using Separated Software Execution System An Efficient Application Virtualization Mechanism using Separated Software Execution System Su-Min Jang, Won-Hyuk Choi and Won-Young Kim Cloud Computing Research Department, Electronics and Telecommunications

More information

CLOUDDMSS: CLOUD-BASED DISTRIBUTED MULTIMEDIA STREAMING SERVICE SYSTEM FOR HETEROGENEOUS DEVICES

CLOUDDMSS: CLOUD-BASED DISTRIBUTED MULTIMEDIA STREAMING SERVICE SYSTEM FOR HETEROGENEOUS DEVICES CLOUDDMSS: CLOUD-BASED DISTRIBUTED MULTIMEDIA STREAMING SERVICE SYSTEM FOR HETEROGENEOUS DEVICES 1 MYOUNGJIN KIM, 2 CUI YUN, 3 SEUNGHO HAN, 4 HANKU LEE 1,2,3,4 Department of Internet & Multimedia Engineering,

More information

Data Deduplication and Tivoli Storage Manager

Data Deduplication and Tivoli Storage Manager Data Deduplication and Tivoli Storage Manager Dave annon Tivoli Storage Manager rchitect March 2009 Topics Tivoli Storage, IM Software Group Deduplication technology Data reduction and deduplication in

More information

Fossil an archival file server

Fossil an archival file server Fossil an archival file server Russ Cox rsc@mit.edu PDOS Group Meeting January 7, 2003 http://pdos/~rsc/talks History... Cached WORM file server (Quinlan and Thompson): active file system on magnetic disk

More information

Read Performance Enhancement In Data Deduplication For Secondary Storage

Read Performance Enhancement In Data Deduplication For Secondary Storage Read Performance Enhancement In Data Deduplication For Secondary Storage A THESIS SUBMITTED TO THE FACULTY OF THE GRADUATE SCHOOL OF THE UNIVERSITY OF MINNESOTA BY Pradeep Ganesan IN PARTIAL FULFILLMENT

More information

FAST 11. Yongseok Oh <ysoh@uos.ac.kr> University of Seoul. Mobile Embedded System Laboratory

FAST 11. Yongseok Oh <ysoh@uos.ac.kr> University of Seoul. Mobile Embedded System Laboratory CAFTL: A Content-Aware Flash Translation Layer Enhancing the Lifespan of flash Memory based Solid State Drives FAST 11 Yongseok Oh University of Seoul Mobile Embedded System Laboratory

More information

Security Measures of Personal Information of Smart Home PC
