Paper
Math expression retrieval using an inverted index over symbol pairs
8 February 2015
David Stalnaker, Richard Zanibbi
Proceedings Volume 9402, Document Recognition and Retrieval XXII; 940207 (2015) https://doi.org/10.1117/12.2074084
Event: SPIE/IS&T Electronic Imaging, 2015, San Francisco, California, United States
Abstract
We introduce a new method for indexing and retrieving mathematical expressions, and a new protocol for evaluating math formula retrieval systems. The Tangent search engine uses an inverted index over pairs of symbols in math expressions. Each key in the index is a pair of symbols along with their relative distance and vertical displacement within an expression. Matched expressions are ranked by the harmonic mean of two percentages: the percentage of the query's symbol pairs that are matched, and the percentage of the candidate expression's symbol pairs that are matched. We have found that our method is fast enough for real-time use and finds partial matches well, such as when subexpressions are rearranged (e.g., expressions moved from the left to the right of an equals sign) or when individual symbols (e.g., variables) differ from those in the query expression. In an experiment using expressions from English Wikipedia, student and faculty participants (N=20) rated expressions returned by Tangent as significantly more similar to the query than those returned by a text-based retrieval system (Lucene) adapted for mathematical expressions. Participants provided similarity ratings on a 5-point Likert scale, evaluating expressions from both algorithms one at a time in randomized order to avoid bias from the position of hits in search result lists. For the Lucene-based system, mean precision at 1 and at 10 hits was 60% and 39% across queries, respectively; for Tangent, it was 99% and 60%. A demonstration and source code are publicly available.
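The indexing and ranking scheme described in the abstract can be illustrated with a minimal Python sketch. It is not the Tangent implementation. It assumes a simplified input in which an expression is a left-to-right list of (symbol, level) tuples, where level is a hypothetical integer baseline offset (0 for the main baseline, +1 for a superscript, -1 for a subscript). The names symbol_pairs and PairIndex, the use of every ordered symbol pair, and the min-count matching rule are likewise assumptions made for illustration.

```python
from collections import Counter, defaultdict


def symbol_pairs(symbols):
    """Return a multiset of (left, right, distance, vertical) keys.

    `symbols` is a list of (symbol, level) tuples in left-to-right order.
    This flat representation is an assumption of the sketch.
    """
    pairs = Counter()
    for i, (left, left_level) in enumerate(symbols):
        for j in range(i + 1, len(symbols)):
            right, right_level = symbols[j]
            # Key: the two symbols, how far apart they are, and the
            # vertical displacement of the right symbol from the left one.
            pairs[(left, right, j - i, right_level - left_level)] += 1
    return pairs


class PairIndex:
    """Toy inverted index mapping symbol-pair keys to expressions."""

    def __init__(self):
        self.postings = defaultdict(dict)  # pair key -> {expr_id: count}
        self.sizes = {}                    # expr_id -> total pair count

    def add(self, expr_id, symbols):
        pairs = symbol_pairs(symbols)
        self.sizes[expr_id] = sum(pairs.values())
        for pair, count in pairs.items():
            self.postings[pair][expr_id] = count

    def search(self, symbols):
        query = symbol_pairs(symbols)
        query_size = sum(query.values())
        matched = Counter()
        for pair, query_count in query.items():
            for expr_id, count in self.postings.get(pair, {}).items():
                matched[expr_id] += min(query_count, count)
        results = []
        for expr_id, m in matched.items():
            in_query = m / query_size                 # share of query pairs matched
            in_candidate = m / self.sizes[expr_id]    # share of candidate pairs matched
            # Harmonic mean of the two shares, as described in the abstract.
            score = 2 * in_query * in_candidate / (in_query + in_candidate)
            results.append((expr_id, score))
        return sorted(results, key=lambda r: r[1], reverse=True)


index = PairIndex()
index.add("a", [("x", 0), ("2", 1), ("+", 0), ("y", 0)])  # x^2 + y
index.add("b", [("y", 0), ("+", 0), ("x", 0), ("2", 1)])  # y + x^2
index.add("c", [("a", 0), ("=", 0), ("b", 0)])            # a = b
results = index.search([("x", 0), ("2", 1), ("+", 0), ("y", 0)])
```

In this toy setup the exact match "a" ranks first, the rearranged expression "b" is still retrieved through the pair it shares with the query, and the unrelated expression "c" is not returned at all.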
© (2015) COPYRIGHT Society of Photo-Optical Instrumentation Engineers (SPIE). Downloading of the abstract is permitted for personal use only.
David Stalnaker and Richard Zanibbi "Math expression retrieval using an inverted index over symbol pairs", Proc. SPIE 9402, Document Recognition and Retrieval XXII, 940207 (8 February 2015); https://doi.org/10.1117/12.2074084
CITATIONS
Cited by 17 scholarly publications.
KEYWORDS
Mathematics
Latex
Genetic algorithms
Neodymium
Electroluminescent displays
Liquid crystals
Matrices