We present an attribute-grammar formulation of a hierarchical page decomposition technique for technical journal pages. In this approach, block grammars are applied recursively until a page is classified into its most significant sub-blocks. Although the grammar devised for each block depends on that block's logical function, all block grammars can be given a generic description using attribute grammars. This formulation provides the generic framework on which the syntactic approach is based, while the attributes themselves are derived from publication-specific knowledge. The paper covers both the attribute extraction process and the formulation itself. We also discuss an application of attribute grammars to a document analysis problem: the extraction of logical, relational information from images of tables.
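As a rough illustration of the recursive scheme summarized above, the following minimal Python sketch applies one grammar per block label until only blocks without a grammar remain. It is not the paper's formulation: the names `Block`, `GRAMMAR`, `decompose`, and `leaf_labels`, the block labels, and the percentage attributes are all hypothetical, and splitting a block by fixed height percentages stands in for the publication-specific attributes that a real system would extract from the page image.

```python
from dataclasses import dataclass, field


@dataclass
class Block:
    """A horizontal band of the page, identified by its logical label."""
    label: str
    top: int      # inherited attribute: first pixel row of the block
    bottom: int   # inherited attribute: one past the last pixel row
    children: list = field(default_factory=list)


# Hypothetical block grammars, one per block label. Each production lists
# the expected sub-blocks in order, with an assumed publication-specific
# attribute: the percentage of the parent's height each sub-block occupies.
GRAMMAR = {
    "page": [("header", 10), ("body", 80), ("footer", 10)],
    "body": [("title", 25), ("columns", 75)],
}


def decompose(block, grammar):
    """Recursively apply the grammar for block.label; stop at terminals."""
    production = grammar.get(block.label)
    if production is None:
        return block  # no grammar for this label: treat it as a terminal
    height = block.bottom - block.top
    y = block.top
    for i, (label, percent) in enumerate(production):
        is_last = i == len(production) - 1
        # The last child takes the remainder so rounding leaves no gap.
        end = block.bottom if is_last else y + height * percent // 100
        block.children.append(decompose(Block(label, y, end), grammar))
        y = end
    return block


def leaf_labels(block):
    """Synthesized attribute: labels of the terminal blocks, top to bottom."""
    if not block.children:
        return [block.label]
    return [name for child in block.children for name in leaf_labels(child)]


page = decompose(Block("page", 0, 1000), GRAMMAR)
print(leaf_labels(page))
```

The extents passed down to each child play the role of inherited attributes, and the list built by `leaf_labels` plays the role of a synthesized attribute. Swapping in a different `GRAMMAR` table changes the layout knowledge without touching the recursive driver, which is the separation between generic framework and publication-specific attributes that the abstract describes.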
"Document recognition: an attribute grammar approach", Proc. SPIE 2660, Document Recognition III, (7 March 1996); doi: 10.1117/12.234695; https://doi.org/10.1117/12.234695