
Tangent-V: Math formula image search using line-of-sight graphs

  • SUNY Buffalo
  • Rochester Institute of Technology

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

8 Scopus citations

Abstract

We present a visual search engine for graphics such as math, chemical diagrams, and figures. Graphics are represented using Line-of-Sight (LOS) graphs, with symbols connected only when they can ‘see’ each other along an unobstructed line. Symbol identities may be provided (e.g., in PDF) or taken from Optical Character Recognition applied to images. Graphics are indexed by pairs of symbols that ‘see’ each other using their labels, spatial displacement, and size ratio. Retrieval has two layers: the first matches query symbol pairs in an inverted index, while the second aligns candidates with the query and scores the resulting matches using the identity and relative position of symbols. For PDFs, we also introduce a new tool that quickly extracts characters and their locations. We have applied our model to the NTCIR-12 Wikipedia Formula Browsing Task, and found that the method can locate relevant matches without unification of symbols or using a math expression grammar. In the future, one might index LOS graphs for entire pages and search for text and graphics. Our source code has been made publicly available.
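The pair-based indexing described in the abstract can be illustrated with a minimal sketch. This is not the published Tangent-V code. It assumes axis-aligned bounding boxes, tests visibility only along the segment joining two box centers, orders each pair left to right, and covers only the first retrieval layer (inverted-index lookup), leaving out the alignment and rescoring layer. The names `Symbol`, `los_edges`, `index_graphic`, and `candidates` are hypothetical.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class Symbol:
    label: str
    x0: float
    y0: float
    x1: float
    y1: float

    @property
    def center(self):
        return ((self.x0 + self.x1) / 2, (self.y0 + self.y1) / 2)

    @property
    def size(self):
        # Bounding-box diagonal, used as a simple size measure (assumption).
        return ((self.x1 - self.x0) ** 2 + (self.y1 - self.y0) ** 2) ** 0.5


def segment_hits_box(p, q, box):
    """Slab clipping test: does the segment p->q intersect the box?"""
    t0, t1 = 0.0, 1.0
    for start, end, lo, hi in ((p[0], q[0], box.x0, box.x1),
                               (p[1], q[1], box.y0, box.y1)):
        d = end - start
        if d == 0:
            if start < lo or start > hi:
                return False
            continue
        ta, tb = sorted(((lo - start) / d, (hi - start) / d))
        t0, t1 = max(t0, ta), min(t1, tb)
        if t0 > t1:
            return False
    return True


def los_edges(symbols):
    """Index pairs (a, b) whose center-to-center segment is unobstructed."""
    edges = []
    for a, b in combinations(range(len(symbols)), 2):
        p, q = symbols[a].center, symbols[b].center
        if not any(segment_hits_box(p, q, s)
                   for k, s in enumerate(symbols) if k not in (a, b)):
            edges.append((a, b))
    return edges


def pair_features(symbols):
    """Yield (label pair, dx, dy, size ratio) for every line-of-sight pair."""
    for a, b in los_edges(symbols):
        first, second = sorted((symbols[a], symbols[b]), key=lambda s: s.center)
        dx = second.center[0] - first.center[0]
        dy = second.center[1] - first.center[1]
        yield (first.label, second.label), dx, dy, second.size / first.size


def index_graphic(index, doc_id, symbols):
    """Add one graphic's symbol pairs to the inverted index."""
    for key, dx, dy, ratio in pair_features(symbols):
        index[key].append((doc_id, dx, dy, ratio))


def candidates(index, query_symbols):
    """First layer only: count query pairs found in each indexed graphic."""
    hits = Counter()
    for key, *_ in pair_features(query_symbols):
        for doc_id in {posting[0] for posting in index.get(key, [])}:
            hits[doc_id] += 1
    return hits
```

Here the index would be a `defaultdict(list)` keyed by label pair. The stored displacement and size ratio are not used by `candidates`; in this sketch they are kept only as the kind of information a second, alignment-based layer could score against.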

Original language: English
Title of host publication: Advances in Information Retrieval - 41st European Conference on IR Research, ECIR 2019, Proceedings
Editors: Djoerd Hiemstra, Philipp Mayr, Norbert Fuhr, Claudia Hauff, Benno Stein, Leif Azzopardi
Publisher: Springer Verlag
Pages: 681-695
Number of pages: 15
ISBN (Print): 9783030157111
State: Published - 2019
Event: 41st European Conference on Information Retrieval, ECIR 2019 - Cologne, Germany
Duration: Apr 14 2019 to Apr 18 2019

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 11437 LNCS
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: 41st European Conference on Information Retrieval, ECIR 2019
Country/Territory: Germany
City: Cologne
Period: 04/14/19 to 04/18/19

Keywords

  • Graphics search
  • Image search
  • Mathematical Information Retrieval (MIR)
  • PDF symbol extraction
