Performance Analysis of Duplicate Record Detection Techniques

Syed Hasan Adil*, Syed Saad Azhar Ali, Kamran Raza, Mansoor Ebrahim

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

4 Scopus citations

Abstract

This paper presents a comprehensive performance analysis of duplicate data detection techniques for relational databases. The research focuses on traditional SQL-based and modern Bloom filter techniques for finding and eliminating records that already exist in the database during bulk insertion (i.e., the bulk insertion involved in the loading phase of the Extract, Transform, and Load (ETL) process and in multisite database synchronization). The analysis was performed on several data sizes using SQL, a Bloom filter, and a parallel Bloom filter. The results show that the parallel Bloom filter is highly suitable for duplicate detection in the database.
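
As a rough illustration of the Bloom filter approach named in the abstract, the sketch below filters a batch of incoming record keys against keys that already exist. It is not the authors' implementation: the `BloomFilter` class, the `filter_new_records` helper, the bit-array size, the number of hashes, and the use of an in-memory set in place of a database table are all assumptions made for this example. It is single-threaded and does not attempt the parallel variant evaluated in the paper.

```python
import hashlib


class BloomFilter:
    """Minimal Bloom filter; size and hash scheme are illustrative assumptions."""

    def __init__(self, num_bits=1 << 16, num_hashes=4):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, key):
        # Derive num_hashes bit positions from one SHA-256 digest (double hashing).
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1
        for i in range(self.num_hashes):
            yield (h1 + i * h2) % self.num_bits

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, key):
        # False means "definitely absent"; True means "possibly present".
        return all(
            self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(key)
        )


def filter_new_records(existing_keys, incoming_keys):
    """Return the incoming keys that are not already stored (hypothetical helper)."""
    existing = set(existing_keys)  # stands in for the database table
    bloom = BloomFilter()
    for key in existing:
        bloom.add(key)

    new_records = []
    for key in incoming_keys:
        # Only a "possibly present" answer needs the exact (more expensive) check.
        if bloom.might_contain(key) and key in existing:
            continue
        new_records.append(key)
        bloom.add(key)
        existing.add(key)  # also catches duplicates inside the batch itself
    return new_records


if __name__ == "__main__":
    print(filter_new_records(["a", "b"], ["b", "c", "c", "d"]))
```

Because a Bloom filter can report false positives but never false negatives, the sketch confirms every "possibly present" answer with an exact lookup. In a real bulk load that lookup would presumably be a database query, which is the cost the filter is meant to avoid for keys that are definitely new.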

Original language: English
Pages (from-to): 4755-4758
Number of pages: 4
Journal: EOS ASSOC
Volume: 9
Issue number: 5
State: Published - Oct 2019
Externally published: Yes

Bibliographical note

Publisher Copyright:
© 2019, Dr D. Pylarinos. All rights reserved.

Keywords

  • SQL
  • bloom filter
  • database
  • duplicate detection

ASJC Scopus subject areas

  • General Engineering
  • Materials Science (miscellaneous)
  • Signal Processing
