Accessing textual information embedded in Internet images
27 December 2000
Proceedings Volume 4311, Internet Imaging II; (2000) https://doi.org/10.1117/12.411891
Event: Photonics West 2001 - Electronic Imaging, 2001, San Jose, CA, United States
Abstract
Indexing and searching of WWW pages relies on analyzing text. Current technology cannot process the text embedded in images on WWW pages. This paper argues that this is a significant problem, as text in image form is usually semantically important (e.g. headers, titles). The results of a recent study are presented to show that the majority (76%) of words embedded in images do not appear elsewhere in the main text and that the majority (56%) of ALT tag descriptions of images are incorrect or do not exist at all. Research under way to devise tools to extract text from images, based on the way humans perceive color differences, is outlined and results are presented.
© (2000) COPYRIGHT Society of Photo-Optical Instrumentation Engineers (SPIE). Downloading of the abstract is permitted for personal use only.
Apostolos Antonacopoulos, Dimosthenis Karatzas, and Jordi Ortiz-Lopez "Accessing textual information embedded in Internet images", Proc. SPIE 4311, Internet Imaging II, (27 December 2000); https://doi.org/10.1117/12.411891
CITATIONS
Cited by 17 scholarly publications.
KEYWORDS
Internet
Visualization
Image processing
Optical character recognition
RGB color model
Data conversion
Image segmentation
