Applied Mathematics & Information Sciences




Data deduplication is widely deployed in cloud backup storage systems to reduce storage space and to minimize the transmission of redundant data, making better use of network bandwidth. In cloud backup services, the redundancy of typical backup data is dominated by duplicate chunks. The main challenge in such systems is detecting similar chunks. The storage server holds a large volume of chunks, which complicates duplicate detection, lowering deduplication efficiency and raising deduplication overhead. In this paper we propose a Bayesian method for source-side local deduplication that identifies duplicate chunks. To measure chunk similarity, we develop learning-based similarity metrics, and the data features are used to train the Bayesian system. Our experimental results show that the precision, recall and F-measure of the proposed method are higher than those of the SVM and GP approaches. Because of these higher values, the proposed Bayesian method increases deduplication efficiency and reduces deduplication overhead, and therefore performs better than both the Support Vector Machine model and the genetic approach.
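To make the general idea concrete, the block below is a minimal sketch, not the method evaluated in this paper. It assumes a Gaussian naive Bayes classifier trained on two hypothetical pair features (Jaccard similarity of 4-byte shingles and the size ratio of two chunks). The names `chunk_features`, `GaussianNaiveBayes` and `precision_recall_f1`, the shingle length and the variance floor are all illustrative assumptions. The last helper computes precision, recall and the balanced F-measure (F1) for the "duplicate" class.

```python
import math
from typing import Dict, List, Sequence, Set, Tuple


def shingles(data: bytes, k: int = 4) -> Set[bytes]:
    """All overlapping k-byte substrings of a chunk (assumed feature basis)."""
    if len(data) < k:
        return {data}
    return {data[i:i + k] for i in range(len(data) - k + 1)}


def chunk_features(a: bytes, b: bytes) -> List[float]:
    """Hypothetical pair features: shingle Jaccard similarity and size ratio."""
    sa, sb = shingles(a), shingles(b)
    union = sa | sb
    jaccard = len(sa & sb) / len(union) if union else 1.0
    size_ratio = min(len(a), len(b)) / max(len(a), len(b), 1)
    return [jaccard, size_ratio]


class GaussianNaiveBayes:
    """Features are assumed independent and normally distributed per class."""

    def fit(self, X: Sequence[Sequence[float]], y: Sequence[int]) -> "GaussianNaiveBayes":
        self.classes = sorted(set(y))
        self.priors: Dict[int, float] = {}
        self.means: Dict[int, List[float]] = {}
        self.vars: Dict[int, List[float]] = {}
        for c in self.classes:
            rows = [x for x, label in zip(X, y) if label == c]
            n = len(rows)
            self.priors[c] = n / len(X)
            cols = list(zip(*rows))
            self.means[c] = [sum(col) / n for col in cols]
            # A small variance floor avoids division by zero for constant features.
            self.vars[c] = [
                sum((v - m) ** 2 for v in col) / n + 1e-6
                for col, m in zip(cols, self.means[c])
            ]
        return self

    def log_posterior(self, x: Sequence[float], c: int) -> float:
        """Unnormalized log P(c | x) = log P(c) + sum of Gaussian log-likelihoods."""
        lp = math.log(self.priors[c])
        for v, m, var in zip(x, self.means[c], self.vars[c]):
            lp += -0.5 * math.log(2 * math.pi * var) - (v - m) ** 2 / (2 * var)
        return lp

    def predict(self, X: Sequence[Sequence[float]]) -> List[int]:
        """Pick the class with the highest posterior for each feature vector."""
        return [max(self.classes, key=lambda c: self.log_posterior(x, c)) for x in X]


def precision_recall_f1(y_true: Sequence[int], y_pred: Sequence[int],
                        positive: int = 1) -> Tuple[float, float, float]:
    """Precision, recall and F1 for the positive ("duplicate") class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1
```

In use, one would compute `chunk_features` for labelled chunk pairs (1 = duplicate, 0 = distinct), call `GaussianNaiveBayes().fit(X, y)`, predict on held-out pairs, and score the predictions with `precision_recall_f1`. Naive Bayes is chosen here only because it is the simplest Bayesian classifier; the paper's actual model and features may differ.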
