Indexing and searching WWW pages relies on analyzing text. Current technology cannot process the text embedded in images on WWW pages. This paper argues that this is a significant problem, as text in image form is usually semantically important (e.g. headers, titles). The results of a recent study are presented to show that the majority (76%) of words embedded in images do not appear elsewhere in the main text and that the majority (56%) of ALT tag descriptions of images are incorrect or do not exist at all. Research under way to devise tools to extract text from images based on the way humans perceive color differences is outlined and results are presented.
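To make the idea of a perceptual color difference concrete, the sketch below computes the CIE76 Delta E between two sRGB colors by converting them to CIELAB, a color space designed so that Euclidean distance roughly tracks perceived difference. This is an illustrative assumption only: the abstract does not say which color model or difference measure the research uses, and the function names `srgb_to_lab` and `delta_e76` are hypothetical.

```python
import math

# D65 reference white (CIE 1931 2-degree observer), the white point of sRGB.
WHITE_D65 = (0.95047, 1.0, 1.08883)


def srgb_to_lab(rgb):
    """Convert an 8-bit sRGB triple to CIELAB (L*, a*, b*)."""

    def linearize(c):
        # Undo the sRGB transfer curve to get linear light in [0, 1].
        c = c / 255.0
        return c / 12.92 if c <= 0.04045 else ((c + 0.055) / 1.055) ** 2.4

    r, g, b = (linearize(c) for c in rgb)

    # Linear sRGB to CIE XYZ (D65).
    x = 0.4124564 * r + 0.3575761 * g + 0.1804375 * b
    y = 0.2126729 * r + 0.7151522 * g + 0.0721750 * b
    z = 0.0193339 * r + 0.1191920 * g + 0.9503041 * b

    def f(t):
        # CIELAB companding: cube root, with a linear segment near zero.
        delta = 6 / 29
        return t ** (1 / 3) if t > delta ** 3 else t / (3 * delta ** 2) + 4 / 29

    fx, fy, fz = (f(v / w) for v, w in zip((x, y, z), WHITE_D65))
    return (116 * fy - 16, 500 * (fx - fy), 200 * (fy - fz))


def delta_e76(rgb1, rgb2):
    """CIE76 color difference: Euclidean distance in CIELAB."""
    return math.dist(srgb_to_lab(rgb1), srgb_to_lab(rgb2))
```

Under this measure, pixels whose Delta E to a reference color falls below a chosen threshold could be grouped as belonging to the same character or background region; the threshold would be a tuning parameter and is not taken from the paper.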
Conference Committee Involvement (8)
Document Recognition and Retrieval XVIII
26 January 2011 | San Francisco Airport, California, United States
Document Recognition and Retrieval XVII
20 January 2010 | San Jose, California, United States
Document Recognition and Retrieval XVI
21 January 2009 | San Jose, California, United States
Document Recognition and Retrieval XV
30 January 2008 | San Jose, California, United States
Document Recognition and Retrieval XIV
30 January 2007 | San Jose, California, United States
Document Recognition and Retrieval XIII
18 January 2006 | San Jose, California, United States
Document Recognition and Retrieval XII
19 January 2005 | San Jose, California, United States
Document Recognition and Retrieval XI
21 January 2004 | San Jose, California, United States