Human-like cognition: visual features grouping for hard-to-group text dataset
4 March 2024
Xin Li, Hangyuan Liu, Chunfeng Tao, Ruiyi Han, Shumin Yang
Abstract

Most existing arbitrary-shape text detection methods use connected components and text center lines to group text instances, which assumes that texts in adjacent positions belong to the same instance. However, many hard-to-group scene texts are too complex to be processed effectively in this way. To address this challenge, we propose a novel scene text-spotting method that uses feature-based clustering inspired by human cognitive principles of text perception. Our approach first uses a character spotter to obtain the locations and transcriptions of the characters. Then, a lightweight recognition network extracts the visual features of the characters at those locations. These visual features are grouped into instances by a K-means-fuzzy-net, which explicitly models visual feature similarity to effectively group nested text, large-margin text, continuous text, and text with overlapping characters. Finally, the recognition results of the text instances are processed by a word correction module to improve overall accuracy and reduce vulnerability to errors in individual character detection. Additionally, we contribute a hard-to-group text dataset. Experiments demonstrate the state-of-the-art performance of our method in these hard-to-group scenarios. The hard-to-group text dataset is available at: https://github.com/baggio321/Hard-to-Group-Text-Dataset.
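
The grouping step described above can be illustrated with a minimal sketch. It is not the authors' K-means-fuzzy-net, whose architecture the abstract does not describe; plain k-means stands in for it. The function names `group_characters` and `read_instances`, the hand-made two-dimensional feature vectors, and the assumption that the number of instances `k` is known in advance are all hypothetical and chosen only for illustration.

```python
import numpy as np


def group_characters(features, k, n_iter=50):
    """Plain k-means over per-character feature vectors (illustrative only)."""
    features = np.asarray(features, dtype=float)
    # Farthest-point initialisation: deterministic, starts from character 0.
    centers = [features[0]]
    for _ in range(1, k):
        dists = np.min(
            [np.linalg.norm(features - c, axis=1) for c in centers], axis=0
        )
        centers.append(features[np.argmax(dists)])
    centers = np.stack(centers)

    for _ in range(n_iter):
        # Assign each character to its nearest cluster center.
        d = np.linalg.norm(features[:, None, :] - centers[None, :, :], axis=2)
        labels = np.argmin(d, axis=1)
        # Recompute centers; keep the old center if a cluster is empty.
        new_centers = np.stack([
            features[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels


def read_instances(chars, x_positions, labels):
    """Join the characters of each cluster in left-to-right order."""
    words = []
    for j in sorted(set(labels.tolist())):
        idx = [i for i in range(len(chars)) if labels[i] == j]
        idx.sort(key=lambda i: x_positions[i])
        words.append("".join(chars[i] for i in idx))
    return words


# Toy input (assumed): two interleaved words whose characters differ in style.
chars = ["O", "S", "P", "A", "E", "L", "N", "E"]
x_positions = [0, 1, 2, 3, 4, 5, 6, 7]
features = [
    [0.9, 0.1], [0.1, 0.9], [1.0, 0.0], [0.0, 1.0],
    [0.8, 0.2], [0.2, 0.8], [0.9, 0.0], [0.1, 1.0],
]
labels = group_characters(features, k=2)
words = read_instances(chars, x_positions, labels)
print(words)
```

In this toy case the characters alternate spatially, so adjacency-based grouping would merge them into one string, while clustering on feature similarity separates the two words.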

© 2024 SPIE and IS&T
Xin Li, Hangyuan Liu, Chunfeng Tao, Ruiyi Han, and Shumin Yang "Human-like cognition: visual features grouping for hard-to-group text dataset," Journal of Electronic Imaging 33(2), 023002 (4 March 2024). https://doi.org/10.1117/1.JEI.33.2.023002
Received: 7 August 2023; Accepted: 8 January 2024; Published: 4 March 2024
KEYWORDS
Visualization
Optical character recognition
Cognition
Education and training
Object detection
Visual process modeling
Feature extraction