A NOVEL TECHNIQUE FOR DATA DEDUPLICATION WITH SHA-1 IN HADOOP FRAMEWORK

Authors

  • Sonam Bhardwaj* & Preeti Malik

Keywords:

Big Data, Deduplication, SHA-1, Hadoop, HDFS

Abstract

Big Data is an emerging research topic. Analyzing and processing it is a challenge for current systems, leading to high processing costs and degraded performance and quality. Centralized architectures cannot cope with massive data volumes, which results in storage space problems and long processing times. The proposed technique addresses this problem by applying deduplication to various datasets containing unstructured data: the SHA-1 algorithm computes a fixed-size digest for each item, and only items with unique digests are stored. The work is built on Hadoop, whose distributed MapReduce framework provides Mapper and Reducer programs for processing and reducing the data, respectively. Applying the proposed technique saves storage space, reduces processing time, increases the deduplication ratio, and detects duplicate files efficiently.
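
The digest-based approach described in the abstract can be illustrated with a minimal single-process sketch. It is not the authors' Hadoop implementation: the function names (map_phase, reduce_phase, deduplicate), the in-memory dictionary of files, and the choice to report saved bytes are all assumptions made for illustration. The mapper step emits a SHA-1 digest per file, and the reducer step groups files by digest so that only one copy per digest is kept.

```python
import hashlib
from collections import defaultdict


def map_phase(files):
    """Mapper-like step (hypothetical): emit (SHA-1 hex digest, file name) pairs."""
    for name, data in files.items():
        yield hashlib.sha1(data).hexdigest(), name


def reduce_phase(pairs):
    """Reducer-like step (hypothetical): group file names by digest."""
    groups = defaultdict(list)
    for digest, name in pairs:
        groups[digest].append(name)
    return dict(groups)


def deduplicate(files):
    """Keep the first file seen for each digest and report the rest as duplicates."""
    groups = reduce_phase(map_phase(files))
    unique = {names[0]: files[names[0]] for names in groups.values()}
    duplicates = [n for names in groups.values() for n in names[1:]]
    total_bytes = sum(len(d) for d in files.values())
    stored_bytes = sum(len(d) for d in unique.values())
    return unique, duplicates, total_bytes - stored_bytes


# Toy input: two files share identical content, one differs.
files = {
    "a.txt": b"hello world",
    "b.txt": b"hello world",
    "c.txt": b"other data",
}
unique, duplicates, saved = deduplicate(files)
```

In an actual Hadoop job the shuffle stage would perform the grouping by digest between the Mapper and the Reducer; the sketch folds that grouping into reduce_phase to stay self-contained.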

Published

2017-04-30

Section

Articles

How to Cite

A NOVEL TECHNIQUE FOR DATA DEDUPLICATION WITH SHA-1 IN HADOOP FRAMEWORK. (2017). International Journal of Engineering Sciences & Management Research, 4(4), 78-87. https://ijesmr.com/index.php/ijesmr/article/view/377