In this paper, we present an approach to learning latent semantic analysis models from loosely annotated images for automatic image annotation and indexing. The given annotations in training images are loose for two reasons: 1. the correspondences between visual features and annotated keywords are ambiguous; 2. the lists of annotated keywords are incomplete. The second reason motivates us to enrich the incomplete annotations in a simple way before learning a topic model. In particular, "imagined" keywords are added to the incomplete annotations by measuring similarity between keywords in terms of their co-occurrence. Both the given and the imagined annotations are then used to learn probabilistic topic models for automatically annotating new images. We conduct experiments on two image databases (i.e., Corel and ESP) with their loose annotations, and compare the proposed method with state-of-the-art discrete annotation methods. The proposed method improves word-driven probabilistic latent semantic analysis (PLSA-words) to a performance comparable with the best discrete annotation method, while retaining a merit of PLSA-words, namely its wider semantic range.
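To make the enrichment step concrete, the following is a minimal sketch of adding "imagined" keywords via co-occurrence similarity. The Dice-style score and the function names (`enrich_annotation` and friends) are illustrative assumptions, not the paper's exact formulation:

```python
# Sketch: enrich an incomplete annotation with "imagined" keywords,
# using a simple co-occurrence similarity (assumed Dice-style score).
from collections import Counter
from itertools import combinations

def cooccurrence_counts(annotations):
    """Count how often each keyword pair co-occurs across training images."""
    pair_counts, word_counts = Counter(), Counter()
    for keywords in annotations:
        word_counts.update(set(keywords))
        for a, b in combinations(sorted(set(keywords)), 2):
            pair_counts[(a, b)] += 1
    return pair_counts, word_counts

def similarity(w1, w2, pair_counts, word_counts):
    """Dice-style co-occurrence similarity between two keywords."""
    co = pair_counts.get(tuple(sorted((w1, w2))), 0)
    return 2.0 * co / (word_counts[w1] + word_counts[w2])

def enrich_annotation(keywords, vocab, pair_counts, word_counts, k=2):
    """Add the top-k unseen keywords most similar to the given ones."""
    candidates = [w for w in vocab if w not in keywords]
    scored = [(max(similarity(w, g, pair_counts, word_counts)
                   for g in keywords), w) for w in candidates]
    imagined = [w for s, w in sorted(scored, reverse=True)[:k] if s > 0]
    return list(keywords) + imagined
```

A topic model such as PLSA would then be trained on the union of given and imagined keywords.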
The success of the bag-of-words approach for text has inspired the recent use of analogous strategies for global representation of images with local visual features.
Applications have been proposed for object detection, automatic image annotation, query-by-example retrieval, relevance feedback, and clustering.
In this paper, we investigate the validity of the bag-of-words analogy for image representation and, more specifically, local pattern selection for feature generation.
We propose a generalized document representation framework and apply it to the evaluation of two pattern selection strategies for images: dense sampling and point-of-interest detection.
We present empirical results that support our contention that text-based experimentation can provide useful insights into the effectiveness of image representations based on the bag-of-visual-words technique.
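To illustrate the two pattern-selection strategies being compared, here is a minimal sketch of a bag-of-visual-words pipeline fed either by dense sampling or by a crude interest-point stand-in. The raw-pixel patches, variance-based detector, and vocabulary size are assumptions for illustration; a real system would use descriptors such as SIFT and a detector such as Harris or DoG:

```python
# Sketch: dense sampling vs. point-of-interest selection feeding one
# shared bag-of-visual-words representation.
import numpy as np
from sklearn.cluster import KMeans

def dense_patches(image, size=8, stride=8):
    """Dense sampling: raw patches on a regular grid (grayscale image)."""
    h, w = image.shape
    return np.array([image[y:y + size, x:x + size].ravel()
                     for y in range(0, h - size + 1, stride)
                     for x in range(0, w - size + 1, stride)])

def interest_patches(image, size=8, n_points=50):
    """Interest points: keep the highest-variance patches as a crude
    stand-in for a real detector."""
    patches = dense_patches(image, size, stride=4)
    return patches[np.argsort(patches.var(axis=1))[-n_points:]]

def build_vocabulary(all_patches, n_words=100):
    """Quantize local descriptors into a visual vocabulary via k-means."""
    return KMeans(n_clusters=n_words, n_init=10).fit(np.vstack(all_patches))

def bag_of_visual_words(patches, vocab):
    """Represent an image as a normalized histogram over visual words."""
    words = vocab.predict(patches)
    hist = np.bincount(words, minlength=vocab.n_clusters).astype(float)
    return hist / hist.sum()
```

Swapping `dense_patches` for `interest_patches` changes only the selection step, which is what makes the two strategies directly comparable under one representation framework.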
Query By Visual Example (QBVE) has been widely exploited in image retrieval. Global visual similarity as well as point-of-interest matching have proven their efficiency when an example image or region is available. When the starting image is missing, the Query By Visual Thesaurus (QBVT) paradigm compensates by allowing the user to compose a mental query image from visual patches summarizing the region database. In this paper, we propose to enrich the paradigm of mental image search by constructing a reliable visual thesaurus of regions using a new coherence criterion. Our criterion encapsulates the local distribution of detected points of interest within a region, leading to a semantic labelling of region categories based on the spatial topology of the points. Our point-based criterion has been validated on a generic image database combining homogeneous regions as well as irregularly and fully textured patterns.
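As a rough illustration of such a point-based coherence criterion, the sketch below scores a region by the density and spatial spread of the interest points it contains, then labels it. The dispersion measure, thresholds, and category names are assumptions made to render the idea concrete, not the paper's exact criterion:

```python
# Sketch: region coherence from the local distribution of interest points.
import numpy as np

def region_coherence(points, region_mask):
    """points: (n, 2) integer keypoint coordinates (row, col);
    region_mask: boolean image, True inside the region."""
    inside = points[region_mask[points[:, 0], points[:, 1]]]
    area = region_mask.sum()
    density = len(inside) / max(area, 1)
    if len(inside) < 2:
        return density, 0.0          # few or no points: likely homogeneous
    # Dispersion: mean distance to the centroid, normalized by the
    # region's equivalent radius.
    spread = np.linalg.norm(inside - inside.mean(axis=0), axis=1).mean()
    return density, spread / np.sqrt(area / np.pi)

def label_region(points, region_mask, density_thresh=1e-3):
    """Crude labelling from point topology (assumed categories)."""
    density, dispersion = region_coherence(points, region_mask)
    if density < density_thresh:
        return "homogeneous"
    return "fully textured" if dispersion > 0.5 else "irregularly textured"
```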
In this paper, we present our work on the automatic generation of textual metadata based on visual content analysis of video news. We present two methods for semantic object detection and recognition from a cross-modal image-text thesaurus. These thesauri represent a supervised association between models and semantic labels. This paper is concerned with two semantic objects: faces and TV logos. In the first part, we present our work on efficient face detection and recognition with automatic name generation. This method also allows us to suggest textual annotations of shots via close-up estimation. In the second part, we automatically detect and recognize the different TV logos present in incoming news from different TV channels. This work was done jointly with the French TV channel TF1 within the "MediaWorks" project, a hybrid text-image indexing and retrieval platform for video news.
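As one way such a logo thesaurus could be queried, the sketch below matches a fixed screen corner against stored logo templates with normalized cross-correlation. The corner location, threshold, and the reduction of the thesaurus to a label-to-template dictionary are illustrative assumptions, not the paper's method:

```python
# Sketch: recognize a TV logo by template matching against a
# label -> template dictionary (templates must fit inside the ROI).
import cv2

def recognize_logo(frame, logo_thesaurus, corner=(0, 0, 120, 120), thresh=0.8):
    """frame: grayscale video frame; logo_thesaurus: dict mapping
    semantic labels to grayscale logo templates; corner: (y, x, h, w)
    region where logos are expected. Returns the best label or None."""
    y, x, h, w = corner
    roi = frame[y:y + h, x:x + w]
    best_label, best_score = None, thresh
    for label, template in logo_thesaurus.items():
        score = cv2.matchTemplate(roi, template, cv2.TM_CCOEFF_NORMED).max()
        if score > best_score:
            best_label, best_score = label, score
    return best_label
```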
To allow efficient browsing of large image collections, we have to provide a summary of their visual content. We present in this paper a robust approach to organizing image databases: Adaptive Robust Competition (ARC). This algorithm relies on unsupervised database categorization, coupled with a selection of prototypes in each resulting category. The categorization is performed using image descriptors that capture the visual appearance of the images. A principal component analysis is performed for every feature to reduce dimensionality. Clustering is then performed under challenging conditions by minimizing a Competitive Agglomeration objective function with an extra noise cluster to collect outliers. The competition is made adaptive to clusters of various densities. In a second step, we provide the user with tools to correct possible misclassifications and personalize the image categories. The constraints for such a system are the simplicity of the user feedback and the speed with which a new category can be proposed from the user's criteria.
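The sketch below shows two of the ingredients named above in a much-simplified form: an extra noise cluster held at a fixed distance to absorb outliers, and the pruning of low-cardinality clusters that Competitive Agglomeration performs. The update rules here are plain fuzzy memberships rather than the full CA objective, and the PCA step is omitted; treat this as illustrative only:

```python
# Sketch: simplified fuzzy clustering with a constant-distance noise
# cluster and competitive pruning of small clusters.
import numpy as np

def cluster_with_noise(X, n_clusters=10, noise_dist=2.0, min_card=5, n_iter=50):
    rng = np.random.default_rng(0)
    centers = X[rng.choice(len(X), n_clusters, replace=False)]
    for _ in range(n_iter):
        # Squared distances to real clusters, plus a constant-distance
        # "noise" column (last) that collects outliers.
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
        d2 = np.hstack([d2, np.full((len(X), 1), noise_dist ** 2)])
        u = 1.0 / np.maximum(d2, 1e-12)
        u /= u.sum(axis=1, keepdims=True)            # fuzzy memberships
        keep = u[:, :-1].sum(axis=0) >= min_card     # prune small clusters
        w = u[:, :-1][:, keep] ** 2
        centers = (w.T @ X) / w.sum(axis=0)[:, None]  # update survivors
    labels = u.argmax(axis=1)                # memberships of last iteration
    return centers, labels, labels == u.shape[1] - 1  # last flag == noise
```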
The problem of segmenting images with fuzzy clustering is considered. A new two-step approach called 'gradual focusing β-decision' is proposed. First, the most 'ambiguous' pixels are separated from the remaining ones. Then, fine boundary segmentation is obtained by focusing only on the ambiguous zone. Since our approach takes both global and local information into account, an accurate and smooth result is obtained.
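A minimal sketch of this two-step idea follows: flag as ambiguous the pixels whose top-two fuzzy memberships are too close (margin below β), then re-decide only those pixels from their confident neighbours. The margin test and the majority-vote refinement are assumptions used to make the idea concrete, not the paper's exact decision rule:

```python
# Sketch: gradual focusing on ambiguous pixels after fuzzy clustering.
import numpy as np

def segment_two_step(memberships, beta=0.2):
    """memberships: (H, W, C) fuzzy memberships from any fuzzy
    clustering. Returns an (H, W) label map."""
    sorted_m = np.sort(memberships, axis=2)
    margin = sorted_m[:, :, -1] - sorted_m[:, :, -2]
    labels = memberships.argmax(axis=2)   # step 1: global decision
    ambiguous = margin < beta             # step 1: flag ambiguous pixels

    # Step 2: focus on the ambiguous zone, relabelling each such pixel
    # by the majority label of its confident 3x3 neighbours (local info).
    h, w = labels.shape
    refined = labels.copy()
    for y, x in zip(*np.nonzero(ambiguous)):
        y0, y1 = max(y - 1, 0), min(y + 2, h)
        x0, x1 = max(x - 1, 0), min(x + 2, w)
        patch = labels[y0:y1, x0:x1][~ambiguous[y0:y1, x0:x1]]
        if patch.size:
            refined[y, x] = np.bincount(patch).argmax()
    return refined
```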