Historical document layout analysis using anisotropic diffusion and geometric features

Galal M. BinMakhashen*, Sabri A. Mahmoud

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

13 Scopus citations

Abstract

Digital libraries worldwide maintain valuable historical manuscripts. Digital copies of these manuscripts are usually offered to researchers and readers as raster images. These images carry several kinds of document degradation that may hinder automatic information retrieval tasks such as manuscript indexing, categorization, and retrieval by content. In this paper, we propose a learning-free, hybrid document layout analysis method for handwritten historical manuscripts. It has two main phases: page characterization and segmentation. First, the proposed method makes an initial estimate of the main-content location using top-down whitespace analysis, employing anisotropic diffusion filtering to find whitespaces. Then, it extracts template features representing the writing behavior of the manuscripts' authors. After that, moving windows scan the manuscript page to define the main-content boundaries more precisely. We evaluated the proposed method on two datasets: a publicly available set of 38 historical manuscript pages and a set of 51 historical manuscript pages collected from the online Harvard Library. Experiments on both datasets show promising results, with the segmentation quality of main-content reaching a success rate of up to 98.5%.
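
The abstract does not give the diffusion scheme, its parameters, or the rule that turns the filtered image into whitespace regions. As a rough illustration only, the sketch below uses the classic Perona-Malik formulation of anisotropic diffusion, followed by a simple column-profile threshold. The function names (`perona_malik`, `whitespace_columns`), the parameter values, and the thresholding step are all assumptions made for this example, not the authors' implementation.

```python
import numpy as np


def perona_malik(image, n_iter=20, kappa=0.1, step=0.2):
    """Classic Perona-Malik diffusion on a 2-D image (assumed scheme).

    Small intensity differences (noise, faint stains) are smoothed away,
    while differences much larger than `kappa` (ink edges) are preserved.
    `step` must stay at or below 0.25 for the 4-neighbour scheme to be stable.
    """
    u = np.asarray(image, dtype=np.float64).copy()
    for _ in range(n_iter):
        # Differences to the four neighbours; borders keep a zero difference,
        # which amounts to a no-flux boundary condition.
        d_n = np.zeros_like(u)
        d_s = np.zeros_like(u)
        d_w = np.zeros_like(u)
        d_e = np.zeros_like(u)
        d_n[1:, :] = u[:-1, :] - u[1:, :]
        d_s[:-1, :] = u[1:, :] - u[:-1, :]
        d_w[:, 1:] = u[:, :-1] - u[:, 1:]
        d_e[:, :-1] = u[:, 1:] - u[:, :-1]
        # Edge-stopping function g(d) = exp(-(d / kappa)^2) weights each flux.
        flux = sum(np.exp(-(d / kappa) ** 2) * d for d in (d_n, d_s, d_w, d_e))
        u += step * flux
    return u


def whitespace_columns(ink, threshold=0.05, **diffusion_kwargs):
    """Flag page columns whose mean ink density is low after diffusion.

    `ink` is a 2-D array in [0, 1] where 1 means ink and 0 means background.
    The threshold rule is a hypothetical stand-in for the paper's top-down
    whitespace analysis.
    """
    smoothed = perona_malik(ink, **diffusion_kwargs)
    return smoothed.mean(axis=0) < threshold


# Synthetic page: one solid text block plus a faint speckle in the margin.
page = np.zeros((40, 60))
page[5:35, 20:50] = 1.0
page[2, 5] = 0.02
mask = whitespace_columns(page)
```

On this synthetic page, `mask` marks the left and right margins as whitespace and leaves the columns of the text block unmarked; the faint speckle is diffused away while the sharp block boundary survives. Real manuscript scans would need a binarization or normalization step first, and suitable values for `kappa` and `threshold` depend on the data.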

Original language: English
Pages (from-to): 329-342
Number of pages: 14
Journal: International Journal on Digital Libraries
Volume: 21
Issue number: 3
State: Published - 1 Sep 2020

Bibliographical note

Publisher Copyright:
© 2020, Springer-Verlag GmbH Germany, part of Springer Nature.

Keywords

  • Anisotropic diffusion filtering
  • Document analysis
  • Document indexing
  • Document retrieval
  • Geometric feature
  • Image segmentation
  • Whitespace analysis

ASJC Scopus subject areas

  • Library and Information Sciences
