6 October 1997 Video indexing based on image and sound
Pascal Faudemay, Claude Montacie, Marie-Jose Caraty
Proceedings Volume 3229, Multimedia Storage and Archiving Systems II; (1997) https://doi.org/10.1117/12.290365
Event: Voice, Video, and Data Communications, 1997, Dallas, TX, United States
Abstract
Video indexing is a major challenge for both scientific and economic reasons. Information extraction can sometimes be easier from the sound channel than from the image channel. We first present a multi-channel, multi-modal query interface for querying sound, image, and script through 'pull' and 'push' queries. We then summarize the segmentation phase, which needs information from the image channel. We propose detecting critical segments, which should speed up both automatic and manual indexing. We then present an overview of the information extraction phase. Information can be extracted from the sound channel through speaker recognition, vocal dictation with unconstrained vocabularies, and script alignment with speech. We present experimental results for these techniques. Speaker recognition methods were tested on the TIMIT and NTIMIT databases. Vocal dictation was tested on newspaper sentences spoken by several speakers. Script alignment was tested on part of a cartoon movie, 'Ivanhoe'. For good-quality sound segments, error rates are low enough for use in indexing applications. Major issues are the processing of sound segments with noise or music, and performance improvement through the use of appropriate, low-cost architectures or networks of workstations.
© (1997) COPYRIGHT Society of Photo-Optical Instrumentation Engineers (SPIE). Downloading of the abstract is permitted for personal use only.
Pascal Faudemay, Claude Montacie, and Marie-Jose Caraty "Video indexing based on image and sound", Proc. SPIE 3229, Multimedia Storage and Archiving Systems II, (6 October 1997); https://doi.org/10.1117/12.290365
KEYWORDS
Video
Image segmentation
Speaker recognition
Visualization
Databases
Human-machine interfaces
Associative arrays
