In this paper, we propose a new dataset and a ground-truthing methodology for layout analysis of historical
documents with complex layouts. The dataset is based on a generic model for ground-truth presentation of
the complex layout structure of historical documents. To extract document contents uniformly, our model
defines five types of regions of interest: page, text block, text line, decoration, and comment.
Unconstrained polygons are used to outline the regions. We propose a performance metric for evaluating
page segmentation methods based on this model. We have analysed four state-of-the-art ground-truthing
tools: TRUVIZ, GEDI, WebGT, and Aletheia. Based on this analysis, we designed and developed Divadia, a
new tool that overcomes some of the drawbacks of these tools and targets simplicity and efficiency in the
layout ground-truthing process for historical document images. With Divadia, we have created a new public
dataset. This dataset contains 120 pages from three historical document image collections of different styles and
is made freely available to the scientific community for historical document layout analysis research.
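
As a minimal illustration of the region model described above, the sketch below represents the five region types and their unconstrained polygon outlines in Python. The names `RegionType`, `Region`, and `area` are hypothetical and chosen only for this example; they are not taken from Divadia or from the dataset's file format, and the area computation is a generic shoelace formula rather than the performance metric proposed in the paper.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Tuple

Point = Tuple[float, float]  # (x, y) in image pixel coordinates (assumed)


class RegionType(Enum):
    """The five region-of-interest types named in the abstract."""
    PAGE = "page"
    TEXT_BLOCK = "text block"
    TEXT_LINE = "text line"
    DECORATION = "decoration"
    COMMENT = "comment"


@dataclass
class Region:
    """A region outlined by an unconstrained polygon (hypothetical structure)."""
    kind: RegionType
    polygon: List[Point]  # vertices in order; any number of points >= 3

    def area(self) -> float:
        """Polygon area via the shoelace formula (simple polygons only)."""
        n = len(self.polygon)
        total = 0.0
        for i in range(n):
            x1, y1 = self.polygon[i]
            x2, y2 = self.polygon[(i + 1) % n]
            total += x1 * y2 - x2 * y1
        return abs(total) / 2.0


# Example: a text line outlined by a four-vertex polygon.
line = Region(RegionType.TEXT_LINE, [(0, 0), (100, 0), (100, 20), (0, 20)])
```

Because the outline is a plain list of vertices, the same structure could hold a rectangle or an arbitrarily shaped contour, which is the flexibility that unconstrained polygons provide over bounding boxes.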