Keyword and image-based retrieval of mathematical expressions
24 January 2011
Two new methods for retrieving mathematical expressions using conventional keyword search and expression images are presented. An expression-level TF-IDF (term frequency-inverse document frequency) approach is used for keyword search, where queries and indexed expressions are represented by keywords taken from LaTeX strings. TF-IDF is computed at the level of individual expressions rather than documents to increase the precision of matching. The second retrieval technique is a form of Content-Based Image Retrieval (CBIR). Expressions are segmented into connected components, and then components in the query expression and each expression in the collection are matched using contour and density features, aspect ratios, and relative positions. In an experiment using ten randomly sampled queries from a corpus of over 22,000 expressions, precision-at-k (k = 20) was higher for the keyword-based approach (keyword: μ = 84.0, σ = 19.0; image-based: μ = 32.0, σ = 30.7), but for a few of the queries better results were obtained using a combination of the two techniques.
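As a rough illustration of the keyword-based idea, the following minimal Python sketch treats each expression as its own "document" when computing TF-IDF, ranks expressions against a query, and scores a ranking with precision-at-k. It is not the authors' implementation: the tokenizer (`tokenize`), the smoothed IDF formula, the use of cosine similarity, and all function names are assumptions chosen for illustration, since the abstract does not specify them.

```python
import math
import re
from collections import Counter

# Assumed tokenizer: LaTeX commands (e.g. \frac), alphanumeric runs,
# or single symbols. Braces and whitespace are dropped.
TOKEN_RE = re.compile(r"\\[A-Za-z]+|[A-Za-z0-9]+|[^\sA-Za-z0-9{}]")


def tokenize(latex):
    """Split a LaTeX string into keyword tokens."""
    return TOKEN_RE.findall(latex)


def build_index(expressions):
    """Compute TF-IDF vectors with one vector per expression (not per document)."""
    counts = [Counter(tokenize(e)) for e in expressions]
    n = len(counts)
    df = Counter()
    for c in counts:
        df.update(c.keys())  # each expression counts once per token
    # Assumed IDF variant: log(N / df) + 1, so common tokens keep some weight.
    idf = {t: math.log(n / df[t]) + 1.0 for t in df}
    vectors = [{t: tf * idf[t] for t, tf in c.items()} for c in counts]
    return idf, vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def search(query, idf, vectors, k=20):
    """Return indices of the top-k expressions for a LaTeX query string."""
    q_counts = Counter(tokenize(query))
    # Tokens never seen in the collection are ignored.
    q = {t: tf * idf[t] for t, tf in q_counts.items() if t in idf}
    scored = [(cosine(q, v), i) for i, v in enumerate(vectors)]
    scored.sort(key=lambda s: (-s[0], s[1]))  # best score first, ties by index
    return [i for _, i in scored[:k]]


def precision_at_k(ranked, relevant, k=20):
    """Fraction of the first k ranked results that are judged relevant."""
    return sum(1 for i in ranked[:k] if i in relevant) / k
```

With an index built by `build_index`, a call such as `search(r"\frac{a}{b}", idf, vectors)` returns a ranked list of expression indices, and `precision_at_k` compares that list against a set of indices judged relevant. Working at the expression level means a rare token in one expression is not diluted by the rest of the document that contains it.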
© (2011) COPYRIGHT Society of Photo-Optical Instrumentation Engineers (SPIE). Downloading of the abstract is permitted for personal use only.
Richard Zanibbi, Bo Yuan, "Keyword and image-based retrieval of mathematical expressions", Proc. SPIE 7874, Document Recognition and Retrieval XVIII, 78740I (24 January 2011); doi: 10.1117/12.873312; https://doi.org/10.1117/12.873312