Document flow segmentation for business applications
24 March 2014
Abstract
The aim of this paper is to propose a supervised approach to document flow segmentation, applied to real-world heterogeneous documents. Our algorithm treats the flow of documents as pairs of consecutive pages and studies the relationship between them. First, sets of features are extracted from the pages, and we propose an approach that models each pair of pages as a single feature vector. This representation is provided to a binary classifier, which classifies the relationship as either segmentation or continuity. In the case of segmentation, we consider the current document complete, and the analysis of the flow continues by starting a new document. In the case of continuity, the two pages are assigned to the same document and the analysis of the flow continues. If it is uncertain whether the relationship between the two pages should be classified as continuity or segmentation, a rejection is decided and the pages analyzed up to this point are considered a "fragment". The first classification already provides good results, approaching 90% on certain documents, which is high at this stage of the system.
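The segmentation loop described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. The following are all hypothetical: the per-page feature vectors, the pair representation (concatenation of both pages plus their element-wise absolute difference), and the rejection band on the classifier's segmentation score. The same applies to the choice to close a fragment at the left page of the rejected pair, and to the names `pair_features`, `decide`, `segment_flow` and `toy_classifier`.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Sequence

# Hypothetical labels for the three outcomes described in the abstract.
SEGMENTATION = "segmentation"
CONTINUITY = "continuity"
REJECT = "reject"


def pair_features(left: Sequence[float], right: Sequence[float]) -> List[float]:
    """Model a pair of consecutive pages as one feature vector.

    Assumption: concatenate both page vectors and append their element-wise
    absolute difference. The paper's actual pair representation may differ.
    """
    diff = [abs(a - b) for a, b in zip(left, right)]
    return list(left) + list(right) + diff


def decide(p_segmentation: float, low: float = 0.4, high: float = 0.6) -> str:
    """Map a segmentation probability to a decision.

    Scores inside the uncertain band [low, high] are rejected (assumed thresholds).
    """
    if p_segmentation > high:
        return SEGMENTATION
    if p_segmentation < low:
        return CONTINUITY
    return REJECT


@dataclass
class Segment:
    pages: List[int] = field(default_factory=list)
    kind: str = "document"  # "document" or "fragment"


def segment_flow(
    pages: Sequence[Sequence[float]],
    classifier: Callable[[List[float]], float],
    low: float = 0.4,
    high: float = 0.6,
) -> List[Segment]:
    """Walk the flow pair by pair and split it into documents and fragments."""
    segments: List[Segment] = []
    current: Optional[Segment] = Segment(pages=[0]) if pages else None
    for i in range(1, len(pages)):
        score = classifier(pair_features(pages[i - 1], pages[i]))
        decision = decide(score, low, high)
        if decision == CONTINUITY:
            current.pages.append(i)  # same document, keep going
        else:
            if decision == REJECT:
                current.kind = "fragment"  # uncertain boundary: close as fragment
            segments.append(current)
            current = Segment(pages=[i])  # start a new document at the right page
    if current is not None:
        segments.append(current)
    return segments


def toy_classifier(features: List[float]) -> float:
    """Hypothetical stand-in for a trained binary classifier on 1-D pages.

    features = [left, right, |left - right|]; a large jump suggests a new document.
    """
    return min(1.0, features[-1] / 5.0)
```

With the toy classifier, the pages `[[0.0], [0.1], [5.0], [5.1], [7.5]]` yield a document of pages 0-1, a fragment of pages 2-3 (the last pair scores 0.48, inside the rejection band), and a final document of page 4. Any trained classifier that outputs a segmentation probability could replace `toy_classifier`.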
© (2014) COPYRIGHT Society of Photo-Optical Instrumentation Engineers (SPIE). Downloading of the abstract is permitted for personal use only.
Hani Daher, Abdel Belaïd, "Document flow segmentation for business applications", Proc. SPIE 9021, Document Recognition and Retrieval XXI, 90210G (24 March 2014); doi: 10.1117/12.2043141; https://doi.org/10.1117/12.2043141
Proceedings paper, 11 pages.

