Replacing speaker-independent recognition task with speaker-dependent task for lip-reading using First Order Motion Model
16 February 2022
Michinari Kodama, Takeshi Saitoh
Proceedings Volume 12083, Thirteenth International Conference on Graphics and Image Processing (ICGIP 2021); 1208323 (2022) https://doi.org/10.1117/12.2623640
Event: Thirteenth International Conference on Graphics and Image Processing (ICGIP 2021), 2021, Kunming, China
Abstract
Lip-reading research tends to address speaker-independent recognition by collecting speech scenes from many speakers, but this data collection is time-consuming. This paper proposes a method to address this problem. First Order Motion Model (FOMM) is a deep generative model that generates a video sequence from a source image according to the motion in a driving video. Our idea is to apply FOMM to all speech scenes in the dataset so that every generated scene appears to be recorded from a single speaker. Used in this way, FOMM serves as a preprocessing step that replaces the speaker-independent recognition task with a speaker-dependent one. We applied the proposed method to two publicly available databases, OuluVS and CUAVE, and confirmed that recognition accuracy improved on both.
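The preprocessing idea in the abstract can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the `Animator` signature, the `placeholder_animator` stand-in, and the `(speaker_id, label, video)` dataset layout are all hypothetical names introduced here. A real pipeline would pass a trained FOMM in place of the placeholder.

```python
from typing import Callable, List, Tuple

# The sketch never inspects pixel values, so a frame can be any object.
Frame = object
Video = List[Frame]

# Hypothetical animator signature: (source_image, driving_video) -> generated video.
Animator = Callable[[Frame, Video], Video]


def placeholder_animator(source_image: Frame, driving_video: Video) -> Video:
    """Stand-in for a trained FOMM (assumption, not the real model).

    Emits one (source_image, driving_frame) pair per driving frame, so the
    output has the same length as the driving video and every generated
    frame is tied to the single source speaker.
    """
    return [(source_image, frame) for frame in driving_video]


def to_single_speaker(
    dataset: List[Tuple[str, str, Video]],
    source_image: Frame,
    animate: Animator = placeholder_animator,
) -> List[Tuple[str, Video]]:
    """Re-render every speech scene with one source speaker.

    Each dataset entry is (speaker_id, label, video). The original video is
    used only as the driving video; the speaker identity is dropped because
    all generated scenes share the same source image.
    """
    converted = []
    for _speaker_id, label, video in dataset:
        generated = animate(source_image, video)
        converted.append((label, generated))
    return converted


if __name__ == "__main__":
    # Toy data: two speakers, string tokens standing in for frames.
    toy_dataset = [
        ("speaker_01", "hello", ["a1", "a2", "a3"]),
        ("speaker_02", "goodbye", ["b1", "b2"]),
    ]
    for label, video in to_single_speaker(toy_dataset, "SOURCE_FACE"):
        print(label, len(video))
```

Passing the animator as a parameter keeps the dataset loop independent of any particular FOMM implementation. The recognizer would then be trained and evaluated on the converted scenes, as in a speaker-dependent setting.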
© (2022) COPYRIGHT Society of Photo-Optical Instrumentation Engineers (SPIE). Downloading of the abstract is permitted for personal use only.
Michinari Kodama and Takeshi Saitoh "Replacing speaker-independent recognition task with speaker-dependent task for lip-reading using First Order Motion Model", Proc. SPIE 12083, Thirteenth International Conference on Graphics and Image Processing (ICGIP 2021), 1208323 (16 February 2022); https://doi.org/10.1117/12.2623640
KEYWORDS
Databases
Motion models
Feature extraction
Image processing
Speaker recognition
Cameras
Mouth
