KEYWORDS: Optical character recognition, Information science, Scientific research, Lanthanum, Computer security, Data mining, Detection and tracking algorithms, Electronic imaging, Current controlled current source, Mirrors
This paper presents the implementation and evaluation of a pattern-based program to extract date of birth information from OCR text. Although the program finds date of birth information with high precision and recall, this type of information extraction task seems to be negatively impacted by OCR errors.
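The abstract does not list the patterns the program uses. As a minimal sketch of what pattern-based date of birth extraction can look like, the Python below pairs a trigger phrase with a date expression. The trigger phrases, the date formats, and the names `DOB_RE` and `extract_dobs` are all assumptions made for illustration, not the authors' rules.

```python
import re

# Hypothetical trigger phrases and date formats, chosen for illustration only.
TRIGGER = r"\b(?:date\s+of\s+birth|d\.?\s?o\.?\s?b\.?|born(?:\s+on)?)"
MONTH = (
    r"(?:jan(?:uary)?|feb(?:ruary)?|mar(?:ch)?|apr(?:il)?|may|june?|july?"
    r"|aug(?:ust)?|sep(?:t(?:ember)?)?|oct(?:ober)?|nov(?:ember)?|dec(?:ember)?)"
)
# Either a numeric date (03/14/1962) or a written one (March 5, 1970).
DATE = rf"(?:\d{{1,2}}[/-]\d{{1,2}}[/-]\d{{2,4}}|{MONTH}\.?\s+\d{{1,2}},?\s+\d{{4}})"
DOB_RE = re.compile(rf"{TRIGGER}\s*[:\-]?\s*({DATE})", re.IGNORECASE)


def extract_dobs(text):
    """Return every date string that directly follows a trigger phrase."""
    return [match.group(1) for match in DOB_RE.finditer(text)]


print(extract_dobs("Date of Birth: 03/14/1962"))   # clean text
print(extract_dobs("He was born on March 5, 1970."))
print(extract_dobs("Date of Birth: O3/l4/1962"))   # OCR confused 0/O and 1/l
```

In this sketch the third call returns an empty list, because a single character substitution by the OCR engine is enough to stop a strict pattern from matching. That is one plausible way OCR errors could lower recall for this kind of task.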
KEYWORDS: Optical character recognition, Information science, Scientific research, Lanthanum, Data storage, Data processing, Internet, Electronic imaging, Databases, Feature extraction
We report on an attempt to build an automatic redaction system by applying information extraction techniques to the identification of private dates of birth. We conclude that automatic redaction is a promising concept, although information extraction is significantly affected by the presence of OCR errors.
This paper presents the implementation and evaluation of a Hidden Markov Model to extract addresses from OCR text. Although Hidden Markov Models discover addresses with high precision and recall, this type of Information Extraction task seems to be affected negatively by the presence of OCR text.
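The abstract gives no detail about the model's states, features, or training. As a rough sketch of how a Hidden Markov Model can tag address tokens, the Python below runs Viterbi decoding over a two-state model. The states, the token features, every probability, and the names `feature` and `viterbi` are hand-picked assumptions for illustration; a real system would estimate the parameters from labelled data.

```python
import math
import re

STATES = ("OTHER", "ADDR")

# Illustrative, hand-picked probabilities (assumptions, not trained values).
START = {"OTHER": 0.8, "ADDR": 0.2}
TRANS = {
    "OTHER": {"OTHER": 0.8, "ADDR": 0.2},
    "ADDR": {"OTHER": 0.2, "ADDR": 0.8},
}
EMIT = {
    "OTHER": {"NUM": 0.05, "SUFFIX": 0.02, "ZIP": 0.01, "CAP": 0.22, "WORD": 0.70},
    "ADDR": {"NUM": 0.25, "SUFFIX": 0.20, "ZIP": 0.15, "CAP": 0.35, "WORD": 0.05},
}
SUFFIXES = {"st", "street", "ave", "avenue", "rd", "road", "blvd", "dr", "drive"}


def feature(token):
    """Map a raw token to a coarse observation symbol."""
    t = token.strip(".,")
    if re.fullmatch(r"\d{5}", t):
        return "ZIP"
    if t.isdigit():
        return "NUM"
    if t.lower() in SUFFIXES:
        return "SUFFIX"
    if t[:1].isupper():
        return "CAP"
    return "WORD"


def viterbi(tokens):
    """Return the most likely state sequence for the tokens (log space)."""
    obs = [feature(t) for t in tokens]
    if not obs:
        return []
    score = [{s: math.log(START[s]) + math.log(EMIT[s][obs[0]]) for s in STATES}]
    back = []
    for o in obs[1:]:
        prev, cur, ptr = score[-1], {}, {}
        for s in STATES:
            # Best predecessor state for s at this position.
            best = max(STATES, key=lambda p: prev[p] + math.log(TRANS[p][s]))
            cur[s] = prev[best] + math.log(TRANS[best][s]) + math.log(EMIT[s][o])
            ptr[s] = best
        score.append(cur)
        back.append(ptr)
    # Trace back from the best final state.
    state = max(STATES, key=lambda s: score[-1][s])
    path = [state]
    for ptr in reversed(back):
        state = ptr[state]
        path.append(state)
    return path[::-1]


tokens = "lives at 4505 Maryland Ave now".split()
print(list(zip(tokens, viterbi(tokens))))
```

With these toy parameters, the tokens `4505 Maryland Ave` are tagged `ADDR` and the surrounding words `OTHER`. Note that the `feature` function is itself sensitive to OCR noise: a house number read as `45O5` would no longer map to `NUM`, which changes the evidence available to the model.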