16 February 2010 Document reconstruction by layout analysis of snippets
Author Affiliations +
Document analysis is done to analyze entire forms (e.g. intelligent form analysis, table detection) or to describe the layout/structure of a document. Also skew detection of scanned documents is performed to support OCR algorithms that are sensitive to skew. In this paper document analysis is applied to snippets of torn documents to calculate features for the reconstruction. Documents can either be destroyed by the intention to make the printed content unavailable (e.g. tax fraud investigation, business crime) or due to time induced degeneration of ancient documents (e.g. bad storage conditions). Current reconstruction methods for manually torn documents deal with the shape, inpainting and texture synthesis techniques. In this paper the possibility of document analysis techniques of snippets to support the matching algorithm by considering additional features are shown. This implies a rotational analysis, a color analysis and a line detection. As a future work it is planned to extend the feature set with the paper type (blank, checked, lined), the type of the writing (handwritten vs. machine printed) and the text layout of a snippet (text size, line spacing). Preliminary results show that these pre-processing steps can be performed reliably on a real dataset consisting of 690 snippets.
© (2010) COPYRIGHT Society of Photo-Optical Instrumentation Engineers (SPIE). Downloading of the abstract is permitted for personal use only.
Florian Kleber, Florian Kleber, Markus Diem, Markus Diem, Robert Sablatnig, Robert Sablatnig, } "Document reconstruction by layout analysis of snippets", Proc. SPIE 7531, Computer Vision and Image Analysis of Art, 753107 (16 February 2010); doi: 10.1117/12.843687; https://doi.org/10.1117/12.843687

Back to Top