A NOVEL TECHNIQUE FOR DATA DEDUPLICATION WITH SHA-1 IN HADOOP FRAMEWORK

Sonam Bhardwaj* & Preeti Malik

Keywords:

Big Data, Deduplication, SHA-1, Hadoop, HDFS

Abstract

Big Data is an emerging research area; analyzing and processing it is a challenge for current systems, leading to high processing costs and degraded performance and quality. A centralized architecture cannot cope with massive data volumes, which results in storage space issues and processing time conflicts. The proposed technique addresses this problem by applying deduplication to various datasets containing unstructured data: the SHA-1 algorithm is used to calculate fixed-size digests, and only the unique values are stored. The work is built on Hadoop, whose distributed MapReduce framework provides Mapper and Reducer programs for processing and reducing the data, respectively. Applying the proposed technique yields a gain in space saved, a reduction in time consumed, and an increased deduplication ratio, and duplicate files are detected efficiently.
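
The idea summarized in the abstract can be illustrated with a minimal, single-process sketch. It is not the authors' Hadoop implementation: the function names (`map_phase`, `reduce_phase`, `deduplicate`), the in-memory dictionary of files, and the whole-file granularity are assumptions made for illustration. The mapper step emits a SHA-1 digest per file, and the reducer step groups files by digest so that only one copy per digest is kept.

```python
import hashlib
from collections import defaultdict


def map_phase(files):
    """Mapper stand-in: emit (SHA-1 hex digest, file name) for each file."""
    for name, content in files.items():
        yield hashlib.sha1(content).hexdigest(), name


def reduce_phase(pairs):
    """Reducer stand-in: group file names by their digest."""
    groups = defaultdict(list)
    for digest, name in pairs:
        groups[digest].append(name)
    return dict(groups)


def deduplicate(files):
    """Keep the first file seen for each digest; report the rest as duplicates.

    `files` is a hypothetical mapping of file name -> bytes content.
    Returns (unique files, duplicate names, bytes before, bytes after).
    """
    groups = reduce_phase(map_phase(files))
    unique = {names[0]: files[names[0]] for names in groups.values()}
    duplicates = [n for names in groups.values() for n in names[1:]]
    bytes_before = sum(len(c) for c in files.values())
    bytes_after = sum(len(c) for c in unique.values())
    return unique, duplicates, bytes_before, bytes_after


if __name__ == "__main__":
    sample = {"a.txt": b"hello", "b.txt": b"hello", "c.txt": b"world"}
    unique, duplicates, before, after = deduplicate(sample)
    print(sorted(unique), duplicates, before, after)
```

In an actual Hadoop job the two phases would run as distributed Mapper and Reducer tasks over HDFS blocks rather than over an in-memory dictionary. One common way to express the space gain, assumed here rather than taken from the paper, is the ratio of bytes before deduplication to bytes after.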
