KEYWORDS: Signal to noise ratio, Algorithm development, Atomic force microscopy, Interference (communication), Distance measurement, Digital watermarking, Systems modeling, Databases, Feature extraction, Signal processing
Audio fingerprints can be seen as hashes of the perceptual content of an audio excerpt. Applications include linking metadata to unlabeled audio, watermark support, and broadcast monitoring. Existing systems identify a song by comparing its fingerprint to pre-computed fingerprints in a database. Small changes to the audio induce small differences in the fingerprint, and the song is identified if these fingerprint differences are small enough. In addition, we found that distances between fingerprints of the original and a compressed version can be used to estimate the quality (bitrate) of the compressed version. In this paper, we study the relationship between compression bitrate and fingerprint differences. We present a comparative study of the response to compression of three fingerprint algorithms (each representative of a larger set of algorithms), developed at Philips, Polytechnic University of Milan, and Microsoft, respectively. We have conducted experiments both with the original algorithms and with versions modified to achieve similar operating conditions, i.e., fingerprints using the same number of bits per second. Our study shows similar behavior for all three algorithms.
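A minimal sketch (not the authors' code) of the fingerprint "difference" the abstract refers to: binary fingerprints of the original and of a compressed copy are compared by their bit error rate (BER), and a lower BER suggests a higher-quality (higher-bitrate) copy. The fingerprint length and the 3% bit-flip rate below are arbitrary illustration values.

```python
import numpy as np

def fingerprint_ber(fp_original: np.ndarray, fp_compressed: np.ndarray) -> float:
    """Fraction of differing bits between two equal-length binary fingerprints."""
    assert fp_original.shape == fp_compressed.shape
    return float(np.mean(fp_original != fp_compressed))

# Hypothetical usage with synthetic fingerprints.
rng = np.random.default_rng(0)
fp_ref = rng.integers(0, 2, size=8192)       # fingerprint of the original
flip = rng.random(8192) < 0.03               # 3% of bits disturbed by compression
fp_cmp = np.where(flip, 1 - fp_ref, fp_ref)  # fingerprint of the compressed version
print(f"BER = {fingerprint_ber(fp_ref, fp_cmp):.4f}")
```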
KEYWORDS: Digital watermarking, Linear filtering, Data hiding, Distortion, Fourier transforms, Quantization, Modulation, Optical filters, Electronic filtering, Signal to noise ratio
Rational Dither Modulation (RDM) is a high-rate data hiding method invariant to gain attacks. We propose an extension of RDM that is robust to arbitrary linear time-invariant (LTI) filtering attacks, as opposed to standard Dither Modulation (DM), which we show to be extremely sensitive to such attacks. The novel algorithm, named Discrete Fourier Transform RDM (DFT-RDM), works in the DFT domain, applying the RDM core to each frequency channel. We illustrate the feasibility of DFT-RDM by passing the watermarked signal through an implementation of a graphic equalizer: the average error probability is small enough to justify adding a coding-with-interleaving layer on top of DFT-RDM. Two easily implementable improvements are discussed: windowing and spreading. In particular, the latter is shown to lead to very large gains.
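A minimal sketch of the RDM core that DFT-RDM applies independently on each DFT frequency channel, under several assumptions of mine: a binary payload, a root-mean-square gain function over the past L samples, a single real-valued channel, and an arbitrary quantization step. Because the gain function is recomputed from past watermarked (at the embedder) or past received (at the decoder) samples, a constant gain attack cancels out.

```python
import numpy as np

def rdm_embed(x, bits, delta=0.5, L=32):
    """Embed one bit per sample by dither-modulation quantization of x[k]/g."""
    y = np.empty(len(x), dtype=float)
    buf = np.ones(L)                        # past watermarked samples (arbitrary init)
    for k, (xk, b) in enumerate(zip(x, bits)):
        g = np.sqrt(np.mean(buf ** 2))      # gain-invariant function of past outputs
        r = xk / g
        q = delta * np.round((r - b * delta / 2) / delta) + b * delta / 2
        y[k] = g * q
        buf = np.roll(buf, -1); buf[-1] = y[k]
    return y

def rdm_decode(z, delta=0.5, L=32):
    """Recover the bits from the (possibly gain-attacked) received samples z."""
    bits = np.empty(len(z), dtype=int)
    buf = np.ones(L)                        # past *received* samples
    for k, zk in enumerate(z):
        g = np.sqrt(np.mean(buf ** 2))
        r = zk / g
        dist = [abs(r - (delta * np.round((r - b * delta / 2) / delta) + b * delta / 2))
                for b in (0, 1)]
        bits[k] = int(np.argmin(dist))
        buf = np.roll(buf, -1); buf[-1] = zk
    return bits
```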
KEYWORDS: Digital watermarking, Electronic filtering, Linear filtering, Optical filters, Quantization, Statistical analysis, Signal analyzers, Computer programming, Data modeling, Error analysis
This paper presents a scheme for estimating a two-band amplitude scaling attack within a quantization-based watermarking context. Quantization-based watermarking schemes comprise a class of watermarking schemes that achieves the channel capacity under additive noise attacks. Unfortunately, they are not robust against Linear Time Invariant (LTI) filtering attacks. We concentrate on a multi-band amplitude scaling attack that modifies the spectrum of the signal using an analysis/synthesis filter bank. First, we derive the probability density function (PDF) of the attacked data. Second, using a simplified approximation of the PDF model, we derive a Maximum Likelihood (ML) procedure for estimating the two-band amplitude scaling factors. Finally, experiments are performed with synthetic and real audio signals, showing the good performance of the proposed estimation technique under realistic conditions.
In distributed video coding, the complexity of the video encoder is reduced at the cost of a more complex video decoder. Using the principles of Slepian and Wolf, video compression is then carried out using channel coding principles, under the assumption that the video decoder can temporally predict side information that is correlated with the source video frames. In recent work on distributed video coding, the application of turbo codes has been studied. Turbo codes perform well in typical (tele-)communications settings. However, in distributed video coding the dependency channel between source and side information is inherently non-stationary, for instance due to occluded regions in the video frames. In this paper, we study the modeling of the virtual dependency channel, as well as the consequences of incorrect model assumptions on the turbo decoding process. We observe a strong dependency of the performance of the distributed video decoder on the model of the dependency channel.
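An illustrative sketch, not the authors' model: the virtual dependency channel between a source frame X and its side information Y is commonly modeled in distributed video coding as additive Laplacian noise, X = Y + N with p(n) = (alpha/2) exp(-alpha |n|). Estimating alpha per block, as below, is one simple way to see the non-stationarity the abstract mentions (occluded regions give much heavier noise, i.e. smaller alpha). Block size and the 1e-12 guard are arbitrary choices.

```python
import numpy as np

def laplacian_alpha(residual: np.ndarray) -> float:
    """ML estimate of the Laplacian scale parameter: alpha = 1 / E|N|."""
    return 1.0 / (np.mean(np.abs(residual)) + 1e-12)

def blockwise_alpha(x: np.ndarray, y: np.ndarray, block: int = 16) -> np.ndarray:
    """Per-block alpha estimates over a frame pair (x = source, y = side information)."""
    h, w = x.shape
    out = np.empty((h // block, w // block))
    for i in range(0, h - block + 1, block):
        for j in range(0, w - block + 1, block):
            out[i // block, j // block] = laplacian_alpha(
                x[i:i + block, j:j + block] - y[i:i + block, j:j + block])
    return out
```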
KEYWORDS: Video, Computer programming, Video compression, Distortion, Video coding, Scalable video coding, Control systems, Data modeling, Optimization (mathematics), Internet
Peer-to-peer (P2P) networks form a distributed communication infrastructure that is particularly well matched to video streaming using multiple description coding. We form M descriptions using MDC-FEC, building on a scalable version of the "Dirac" video coder. The M descriptions are streamed via M different application layer multicast (ALM) trees embedded in the P2P network. Client nodes (peers in the network) receive a number of descriptions m < M that depends on their bandwidth. In this paper we consider the optimization of the received video qualities, taking into account the distribution of the clients' bandwidth. We propose three "fairness" criteria that define the objective to be optimized. Numerical results illustrate the effects of the different fairness criteria and client bandwidth distributions on the rates allocated to the compressed video layers and multiple descriptions.
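An illustrative sketch only: the abstract does not spell out the three fairness criteria, so the aggregation rules below (mean quality, worst-case quality, log/proportional-style quality) are generic examples of how received quality could be averaged over the client bandwidth distribution, where p_m is the fraction of clients receiving m of the M descriptions and q_m the corresponding decoded quality.

```python
import numpy as np

def aggregate_quality(p_m: np.ndarray, q_m: np.ndarray, criterion: str = "mean") -> float:
    """p_m[m] = fraction of clients receiving m descriptions (m = 0..M),
    q_m[m]  = decoded video quality when m descriptions are received."""
    if criterion == "mean":      # average quality over all clients
        return float(np.sum(p_m * q_m))
    if criterion == "worst":     # quality of the worst-off clients receiving >= 1 description
        return float(np.min(q_m[1:][p_m[1:] > 0]))
    if criterion == "log":       # proportional-fairness style objective
        return float(np.sum(p_m[1:] * np.log(q_m[1:])))
    raise ValueError(criterion)
```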
Quantization-based watermarking schemes comprise a class of watermarking schemes that achieves the channel capacity under additive noise attacks. The existence of good high-dimensional lattices that can be efficiently implemented and incorporated into watermarking structures has made quantization-based watermarking schemes of practical interest. Because of the structure of these lattices, watermarking schemes that use them are vulnerable to non-additive operations, such as amplitude scaling in combination with additive noise. In this paper, we propose a secure Maximum Likelihood (ML) estimation technique for amplitude scaling factors using subtractive dither. The dither serves mainly security purposes and is assumed to be known to the watermark encoder and decoder. We derive the probability density function (PDF) models of the watermarked and attacked data in the presence of subtractive dither. The derivation of these models follows the lines of reference 5, where we derived the PDF models in the absence of dither. We derive conditions on the dither sequence statistics such that a given security level is achieved, using the error probability of the watermarking system as the objective function. Based on these conditions, we are able to make approximations to the PDF models that are used in the ML estimation procedure. Finally, experiments are performed with real audio and speech signals, showing the good performance of the proposed estimation technique under realistic conditions.
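A minimal sketch, not the paper's derivation: a generic maximum-likelihood grid search for the amplitude scale, assuming some model pdf(z; scale) of the watermarked-and-attacked data is available. The paper derives such a PDF for subtractively dithered quantization-based watermarking; here `pdf_model` is a placeholder argument and the search grid is arbitrary.

```python
import numpy as np

def ml_scale_estimate(z: np.ndarray, pdf_model,
                      scales=np.linspace(0.5, 1.5, 201)) -> float:
    """Return the candidate scale maximizing the log-likelihood of the samples z."""
    loglik = [np.sum(np.log(pdf_model(z, s) + 1e-300)) for s in scales]
    return float(scales[int(np.argmax(loglik))])
```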
KEYWORDS: Nickel, Data storage, Databases, Internet, Multimedia, Prototyping, Personal digital assistants, Data modeling, Human-computer interaction, Data communications
The Wi-Fi walkman is a mobile multimedia application that we developed to investigate the technological and usability aspects of human-computer interaction with personalized, intelligent and context-aware wearable devices in peer-to-peer wireless environments such as the future home, office, or university campus. It is a small handheld device with a wireless link that contains music content. Users carry their own walkman around and listen to music. All music content is distributed in the peer-to-peer network and is shared using ad-hoc networking. The walkman interacts naturally with its user and exchanges user interests with other devices in the peer-to-peer environment. Without requiring tedious interactions, it can learn the user's music interest and taste, and consequently provide personalized music recommendations according to the current context and the user's interest.
We present in this paper the results of our study on the human perception of geometric distortions in images. The ultimate goal of this study is to devise an objective measurement scheme for geometric distortions in images, which should have a good correspondence to human perception of the distortions. The study is divided into two parts. The first part of the study is the design and implementation of a user-test to measure human perception of geometric distortions in images. The result of this test is then used as a basis to evaluate the performance of the second part of the study, namely the objective quality measurement scheme. Our experiment shows that our objective quality measurement has good correspondence to the result of the user test and performs much better than a PSNR measurement.
A possible solution to the difficult problem of geometrical distortion of watermarked images in a blind watermarking scenario is to use a template grid in the autocorrelation function. However, an important drawback of this method is that the watermark itself can be estimated and subtracted, or the peaks in the Fourier magnitude spectrum can be removed. A recently proposed solution is to modulate the watermark with a pattern derived from the image content and a secret key. This effectively hides the watermark pattern, making malicious attacks much more difficult. However, the algorithm to compute the modulation pattern is computationally intensive. We propose an efficient implementation, using frequency domain filtering, to make this hiding method more practical. Furthermore, we evaluate the performance of different kinds of modulation patterns. We present experimental results showing the influence of template hiding on detection and payload extraction performance. The results also show that modulating the ACF-based watermark improves detection performance when the modulation signal can be retrieved sufficiently accurately. Modulation signals with small average periods between zero crossings provide the largest improvement in watermark detection. With these signals, the detector can also tolerate the most errors in retrieving the modulation signal before the detection performance drops below that of the watermarking method without modulation.
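A sketch of the generic speed-up the abstract refers to: carrying out a large 2-D filtering step in the frequency domain instead of by direct convolution. The specific filter that maps image content (plus a secret key) to a modulation pattern is the paper's own; `kernel` below is a stand-in.

```python
import numpy as np

def filter_via_fft(image: np.ndarray, kernel: np.ndarray) -> np.ndarray:
    """Circular 2-D convolution of `image` with `kernel` via FFTs (O(N log N))."""
    H = np.fft.fft2(kernel, s=image.shape)   # zero-pad kernel to the image size
    return np.real(np.fft.ifft2(np.fft.fft2(image) * H))
```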
Quantization-based watermarking schemes are vulnerable to amplitude scaling. Therefore the scaling factor has to be accounted for either at the encoder, or at the decoder, prior to watermark decoding. In this paper we derive the marginal probability density model for the watermarked and attacked data, when the attack channel consists of amplitude scaling followed by additive noise. The encoder is Quantization Index Modulation with Distortion Compensation. Based on this model we obtain two estimation procedures for the scale parameter. The first approach is based on Fourier analysis of the probability density function; the estimation of the scaling parameter relies on the structure of the received data. The second approach is the Maximum Likelihood estimator of the scaling factor. We study the performance of the estimation procedures theoretically and experimentally with real audio signals, and compare them to other well-known approaches for amplitude scale estimation in the literature.
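One plausible rendering of the Fourier-analysis idea, offered as a hedged sketch rather than the paper's exact procedure: the PDF of quantization-based watermarked data is nearly periodic with period beta*Delta after an amplitude scaling by beta, so the empirical characteristic function (the Fourier transform of the PDF) peaks near t = 2*pi/(beta*Delta). Scanning candidate betas for that peak exploits exactly the "structure of the received data".

```python
import numpy as np

def fourier_scale_estimate(z: np.ndarray, delta: float,
                           betas=np.linspace(0.5, 1.5, 501)) -> float:
    """Estimate the amplitude scale from the periodicity left by the quantizer."""
    t = 2.0 * np.pi / (betas * delta)                      # frequencies to probe
    ecf = np.abs(np.exp(1j * np.outer(t, z)).mean(axis=1)) # empirical characteristic fn
    return float(betas[int(np.argmax(ecf))])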
KEYWORDS: Digital watermarking, Probability theory, Sensors, Transform theory, Distortion, Information technology, Image sensors, Image filtering, Linear filtering, Telecommunications
One way of recovering watermarks in geometrically distorted images is by performing a geometrical search. In addition to the computational cost required for this method, this paper considers the more important problem of false positives. The maximal number of detections that can be performed in a geometrical search is bounded by the maximum false positive detection probability required by the watermark application. We show that image and key dependency in the watermark detector leads to different false positive detection probabilities for geometrical searches for different images and keys. Furthermore, the image and key dependency of the tested watermark detector increases the random-image-random-key false positive detection probability, compared to the Bernoulli experiment that was used as a model.
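A sketch of the baseline Bernoulli model the abstract compares against: if a single watermark detection has false positive probability p and a geometrical search performs N independent detections, the overall false positive probability of the search is 1 - (1 - p)^N, roughly N*p for small p. The numbers in the usage line are illustrative only.

```python
def search_false_positive_probability(p_single: float, n_detections: int) -> float:
    """False positive probability of N independent detections (Bernoulli model)."""
    return 1.0 - (1.0 - p_single) ** n_detections

# e.g. p = 1e-6 per detection and a search over 10_000 poses gives roughly 1e-2.
print(search_false_positive_probability(1e-6, 10_000))
```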
One of the most active research areas in the watermarking community is dealing with geometric distortion. The geometric distortion problem has two aspects, namely its effect on watermark detectability and its effect on the perceptual quality of the watermarked data. Most research in this area has concentrated on the first aspect of the problem, and research on objective visual quality assessment of geometrically distorted images is not widely discussed in the literature. As a consequence, there is a lack of objective visual quality measurement for this class of distortion. In this paper we propose a method for objectively assessing the perceptual quality of geometrically distorted images. Our approach is based on modeling a complex, global geometric distortion using local, simpler geometric transformation models. The locality of this simpler geometric transformation determines the visual quality of the distorted images.
A challenging aspect in the development of robust watermarking algorithms is the ability to withstand complex geometric distortion of the media. A few existing techniques are known to deal with such transformations up to a certain level. Traditionally, the measure of the degradation caused by an attack on an image only addressed pixel value modification. However, the degradation resulting from the geometric distortion of an image cannot be measured with traditional criteria. Therefore the evaluation and comparison of the robustness to desynchronization of different watermarking schemes has not been possible. In this paper, we present an innovative method to measure the distortion introduced by complex geometric deformations of an image. The distortion measure is expressed in terms of how closely the applied transform can be approximated by a simpler transform model (e.g., RST transform, affine transform). The scheme relies on local least squares estimation of the parameters of the reference transform model. Finally, we illustrate the proposed measure by presenting results for several complex image distortions.
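A minimal sketch of the building block described above: a local least-squares fit of an affine model to point correspondences taken from a small image region. The residual of the fit indicates how far the actual deformation is from the simpler reference model; how correspondences are obtained and how local residuals are pooled into a single measure is the paper's contribution and is not shown here.

```python
import numpy as np

def fit_affine_lsq(src: np.ndarray, dst: np.ndarray):
    """src, dst: (N, 2) arrays of corresponding points.
    Returns (A, b, rms_residual) for the model dst ~ src @ A.T + b."""
    X = np.hstack([src, np.ones((len(src), 1))])       # rows [x, y, 1]
    params, *_ = np.linalg.lstsq(X, dst, rcond=None)   # (3, 2) solution
    A, b = params[:2].T, params[2]
    resid = dst - (src @ A.T + b)
    return A, b, float(np.sqrt(np.mean(resid ** 2)))
```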
KEYWORDS: Digital watermarking, Sensors, Image filtering, Video, Image processing, Linear filtering, Distortion, Digital image processing, Video processing, Cameras
For most watermarking methods, preserving the synchronization between the watermark embedded in digital data (image, audio or video) and the watermark detector is critical to the success of the watermark detection process. Many digital watermarking attacks exploit this fact by disturbing the synchronization between the watermark and the watermark detector, thus disabling proper watermark detection without actually removing the watermark from the data. Some techniques have been proposed in the literature to deal with this problem. Most of these techniques employ methods to reverse the distortion caused by the attack and then try to detect the watermark from the repaired data. In this paper, we propose a watermarking technique that is not sensitive to synchronization. This technique uses a structured noise pattern and embeds the watermark payload into the geometrical structure of the embedded pattern.
KEYWORDS: Digital watermarking, Directed energy weapons, Video, Visualization, Internet, Spatial resolution, Multimedia, Computer programming, Information technology, Data communications
Digital video data distribution through the internet is becoming more common. Film trailers, video clips and even video footage from computer and video games are now seen as very powerful means to boost sales of the aforementioned products. These materials need to be protected to avoid copyright infringement issues. However, they are encoded at a low bit-rate to facilitate internet distribution, and this poses a challenge to the watermarking operation. In this paper we present an extension to the Differential Energy Watermarking algorithm for use in low bit-rate environments. We present the extension scheme and evaluate its performance in terms of watermark capacity, robustness and visual impact.
In this paper we present a system for automated analysis, classification and indexing of broadcast news programs. The system first analyzes the visual and the speech stream of an input news program in order to obtain an initial partitioning of the program into so-called report segments. The analysis of the visual stream provides the boundaries of the report segments lying at the beginning and the end of each anchorperson shot. This analysis step is performed by applying an existing technique for anchorperson shot detection. The analysis of the speech stream gives the boundaries of the report segments lying in the middle of each silent interval. Then, the transcribed speech of each report segment is matched against the content of a large pre-specified textual topic database. This database covers a large number of topics and can be updated by the user at any time. For each topic a large number of keywords is given, each of which is also assigned a weight that indicates the importance of the keyword for that topic. The result of the matching procedure is a list of probable topics per report segment, where for each topic on the list a likelihood is computed based on the number of relevant keywords found in the segment and on the weights of those keywords. The list of topics per segment is then shortened by separating the most probable from the least probable topics based on their likelihood. Finally, the likelihood values of the most probable topics are used in the last system module to merge related neighboring segments into reports. The most probable topics, which serve as the basis for the segment-merging procedure, are at the same time the retrieval indexes for the reports and are used for classifying together all reports in the database that cover one and the same topic.
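An illustrative sketch of the matching step (the exact likelihood computation is not given in the abstract, so the normalization below is an assumption of mine): each topic has weighted keywords, and a report segment's score for a topic is driven by which of those keywords occur in its transcript and by their weights.

```python
def topic_scores(transcript_words, topic_keywords):
    """transcript_words: iterable of words from the segment's transcribed speech.
    topic_keywords: dict topic -> dict keyword -> weight.
    Returns dict topic -> score in [0, 1] (sum of matched weights / sum of all weights)."""
    words = set(w.lower() for w in transcript_words)
    scores = {}
    for topic, keywords in topic_keywords.items():
        total = sum(keywords.values())
        matched = sum(w for kw, w in keywords.items() if kw.lower() in words)
        scores[topic] = matched / total if total else 0.0
    return scores
```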
KEYWORDS: Forward error correction, Computer programming, Video, Mobile communications, Image compression, Video coding, Data compression, Data communications, Optical spheres, Data modeling
In this paper we assess the independence of the optimization of source and channel coding parameters. We propose a method to separate the source and channel coding optimization as much as possible while maintaining the possibility of joint optimization. We theoretically derive key parameters that must be passed through an interface between source and channel coding. This separation greatly reduces the complexity of the optimization problem and enhances the flexibility.
KEYWORDS: Visualization, Video, Matrices, Image segmentation, Algorithm development, Lithium, Silicon, Information visualization, Information technology, Information theory
The paper addresses visual similarity measurement and hierarchical grouping of the most representative frames for the purpose of video abstraction. Our approach concentrates on measuring the similarity of image regions. To produce a visual similarity measure we use as primary information the color histograms in the YUV color space. The difference from previous histogram-based approaches is that we divide the input images into rectangles whose sizes depend on the local 'structure' of the image. We assume that similar regions in two different images have approximately the same rectangle structure. Therefore, it should be enough to compare the color histograms of the pixels within these rectangles in order to determine the similarity of two regions in two different images. We measure similarity between regions by a similarity score that is asymmetric. Such a measure cannot be used in classical clustering techniques for grouping representative frames. Our approach is therefore based on graph-theoretic techniques. First, we construct an oriented weighted graph having as vertices the original set of key frames. Next, we construct the set of weighted edges, according to the similarity values computed for each ordered pair of key frames. Finally, we transform this graph into a collection of two-level trees, whose root key frames form an abstract of the original ones. For graph construction and transformation, we present two algorithms. The experiments we performed with the proposed technique showed improvements in the way the visual content is represented. This conclusion is based on subjective assessment of the resulting groupings and the selection of the most representative key frames.
In our earlier work we have proposed a watermarking algorithm for JPEG/MPEG streams that is based on selectively discarding high frequency DCT coefficients. As with any watermarking algorithm, the performance of our method must be evaluated in terms of the robustness of the watermark, the size of the watermark, and the visual degradation the watermark introduces. These performance factors are controlled by three parameters, namely the maximal coarseness of the quantizer used in re-encoding, the number of DCT blocks used to embed a single watermark bit, and the lowest DCT coefficient that we permit to be discarded. It is possible to determine these parameters experimentally. In this paper, however, we follow a more rigorous approach and develop a statistical model for the watermarking algorithm. Using this model we derive the probability that a label bit cannot be embedded. The resulting model can be used, for instance, for maximizing the robustness against re-encoding and for developing adequate error correcting codes for the label bit string.
In this paper, we present the concept of an efficient semiautomatic system for analysis, classification and indexing of TV news program material, and show the feasibility of its practical realization. The only input into the system, other than the news program itself, is a set of spoken words serving as keys for topic prespecification. The chosen topics express the user's current professional or private interests and are used for filtering the news material accordingly. After the basic analysis steps on a news program stream, including shot change detection and key frame extraction, the system automatically represents the news program as a series of longer, higher-level segments. Each of them contains one or more video shots and belongs to one of the coarse categories, such as anchorperson (news reader) shots, news shot series, and the starting and ending program sequences. The segmentation procedure is performed on the video component of the news program stream, and the results are used to define the corresponding segments in the news audio stream. In the next step, the system uses the prespecified audio keys to index the segments and group them into reports, which are the actual retrieval units. This step is performed on the segmented news audio stream by applying a word-spotting procedure to each segment. As a result, all reports on prespecified topics are easily reachable for efficient retrieval.
We propose a new model for the prediction of distortion visibility in digital image sequences, which is aimed at use in digital video compression algorithms. The model is an extension of our spatial vision model with a spatio-temporal contrast sensitivity function and an eye movement estimation algorithm. Due to the importance of smooth pursuit eye movements when viewing image sequences, eye movements cannot be neglected in a spatio-temporal vision model. Although eye movements can be incorporated by motion compensation of the contrast sensitivity function, the requirements for this motion compensation are different than those for motion compensated prediction in video coding. We propose an algorithm for the estimation of smooth pursuit eye movements, under the worst-case assumption that the observer is capable of tracking all objects in the image.
KEYWORDS: Video, Databases, Video processing, Data storage, Statistical analysis, Cameras, Visualization, Video compression, Digital libraries, Algorithm development
In the European project SMASH, mass-market storage systems for domestic use are under study. Besides the storage technology that is developed in this project, the related objective of user-friendly browsing and querying of video data is studied as well. Key issues in developing a user-friendly system are (1) minimizing user intervention in preparatory steps (extraction and storage of representative information needed for browsing/query), (2) providing an acceptable representation of the stored video content in view of a higher automation level, (3) the possibility of performing these steps directly on the incoming stream at storage time, and (4) parameter robustness of the algorithms used for these steps. This paper proposes and validates novel approaches for the automation of the mentioned preparatory phases. A detection method for abrupt shot changes is proposed, using a locally computed threshold based on a statistical model for frame-to-frame differences. For the extraction of representative frames (key frames), an approach is presented which distributes a given number of key frames over the sequence depending on content changes in a temporal segment of the sequence. A multimedia database is introduced that automatically stores all bibliographic information about a recorded video as well as a visual representation of the content, without any manual intervention from the user.
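A sketch only: abrupt shot-change detection with a locally computed threshold. Here the threshold is mean + c * std of a frame-difference metric inside a sliding window, which is one simple way to realize a "statistical model for frame-to-frame differences"; the window length, the constant c, and the minimum-history check are assumptions, not the paper's values.

```python
import numpy as np

def detect_cuts(frame_diffs: np.ndarray, window: int = 25, c: float = 4.0):
    """frame_diffs[t] = dissimilarity between frames t-1 and t. Returns cut positions."""
    cuts = []
    for t in range(len(frame_diffs)):
        lo, hi = max(0, t - window), t
        if hi - lo < 5:                        # not enough history yet
            continue
        local = frame_diffs[lo:hi]
        threshold = local.mean() + c * local.std()
        if frame_diffs[t] > threshold:
            cuts.append(t)
    return cuts
```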
In the European project SMASH, a mass multimedia storage device for home usage is being developed. The success of such a storage system depends not only on technical advances, but also on the existence of an adequate copy protection method. Copy protection for visual data requires fast and robust labeling techniques. In this paper, two new labeling techniques are proposed. The first method extends an existing spatial labeling technique. This technique divides the image into blocks and searches for an optimal label-embedding level for each block instead of using a fixed embedding level for the complete image. The embedding level for each block depends on a lower-quality JPEG compressed version of the labeled block. The second method removes high-frequency DCT coefficients in some areas to embed a label. A JPEG quality factor and the local image structure determine how many coefficients are discarded during the labeling process. Using both methods, a perceptually invisible label of a few hundred bits was embedded in a set of true color images. The label added by the spatial method is very robust against JPEG compression. However, this method is not suitable for real-time applications. Although the second, DCT-based method is slightly less resistant to JPEG compression, it is more resistant to line-shifting and cropping than the first one and is suitable for real-time labeling.
We introduce a new model that can be used in the perceptual optimization of standard color image coding algorithms (JPEG/MPEG). The human visual system model is based on a set of oriented filters and incorporates background luminance dependencies, luminance and chrominance frequency sensitivities, and luminance and chrominance masking effects. The main problem in using oriented filter-based models for the optimization of coding algorithms is the difference between the orientation of the filters in the model domain and the DCT block transform in the coding domain. We propose a general method to combine these domains by calculating a local sensitivity for each DCT (color) block. This leads to a perceptual weighting factor for each DCT coefficient in each block. We show how these weighting factors allow us to use advanced techniques for optimal bit allocation in JPEG (e.g. custom quantization matrix design and adaptive thresholding). With the model we propose it is possible to calculate a perceptually weighted mean squared error (WMSE) directly in the DCT color domain, although the model itself is based on a directional frequency band decomposition.
KEYWORDS: Digital video recorders, Video, Video compression, Computer programming, Associative arrays, Digital recording, Head, Telecommunications, Multimedia, Prototyping
The forthcoming introduction of helical scan digital data tape recorders with high access bandwidth and large capacity will facilitate the recording and retrieval of a wide variety of multimedia information from different sources, such as computer data and digital audio and video. For the compression of digital audio and video, the MPEG standard has been internationally accepted. Although helical scan tape recorders can store and play back MPEG compressed signals transparently, they are not well suited for carrying out special playback modes, in particular fast forward and fast reverse. Only random portions of an original MPEG bitstream are recovered on fast playback. Unfortunately, these shreds of information cannot be interpreted by a standard MPEG decoder, due to loss of synchronization and missing reference pictures. In the EC-sponsored RACE project DART (Digital Data Recorder Terminal), the possibilities for recording and fast playback of MPEG video on a helical scan recorder have been investigated. In the approach we present in this paper, we assume that no transcoding is carried out on the incoming bitstream at recording time, nor that any additional information is recorded. To use the shreds of information for the reconstruction of interpretable pictures, a bitstream validator has been developed to achieve conformance to the MPEG-2 syntax during fast playback. The concept has been validated by realizing hardware demonstrators that connect to a prototype helical scan digital data tape recorder.
KEYWORDS: Distortion, Computer programming, Image quality, Quantization, Video compression, Video, Visualization, Visual system, Video coding, Digital signal processing
Variable bit rate transmission opens the way to constant quality video coding. However, this requires a different approach from traditional constant bit rate coding techniques, since a constant distortion does not yield a constant quality. We introduce a method to determine the maximum allowable distortion locally, and a technique to minimize the bit rate subject to this local maximum distortion. It is also shown that with bit stream shaping and peak bit rate control, the bit rate will always be lower than that of a CBR source with similar quality in the most difficult scenes.
The extraction of a dedicated fast playback stream from a normal play MPEG encoded stream is important for recording applications in which fast playback is supported by the device. The most important issue is the selection of the coefficients or codewords to retain for the fast playback stream. In this paper, several codeword extraction methods of varying complexity, ranging from optimal extraction methods to a zonal extraction method, are evaluated. The range of possible solutions makes it possible to trade off performance against complexity. The newly developed selection method, based on a Lagrangian cost minimization per block in combination with a feedback rate control, yields an attractive performance-complexity combination.
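A hedged sketch of a Lagrangian per-block selection with a simple feedback rate control; the constants and the feedback rule are illustrative, not the paper's. For each block, a coefficient is kept only if its distortion reduction outweighs its weighted rate cost, d_i > lambda * r_i; lambda is nudged up or down depending on whether the produced rate over- or undershoots the fast-playback budget.

```python
def select_coefficients(blocks, bit_budget_per_block, lam=1.0, gain=0.05):
    """blocks: list of blocks, each a list of (distortion_reduction, rate_bits) per coefficient.
    Returns (kept_indices_per_block, final_lambda)."""
    kept_all = []
    for coeffs in blocks:
        kept = [i for i, (d, r) in enumerate(coeffs) if d > lam * r]
        spent = sum(coeffs[i][1] for i in kept)
        # feedback: raise lambda when over budget, lower it when under budget
        lam *= 1.0 + gain * (spent - bit_budget_per_block) / max(bit_budget_per_block, 1)
        lam = max(lam, 1e-6)
        kept_all.append(kept)
    return kept_all, lam
```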
KEYWORDS: Video, Scanners, Head, Interfaces, Mode locking, Signal to noise ratio, Computer programming, Reverse modeling, Data modeling, Systems modeling
In this paper, several formatting solutions that provide trick mode (fast forward, fast reverse) support for helical scan recording are proposed. First, two basic methods in which the trick mode signal is built up from semi-random sections of the normal stream will be discussed. These low-complexity methods form the basis for the extension to more advanced solutions that employ a dedicated bit stream for trick modes. A method for formatting this dedicated trick stream on a recorder without phase lock between the scanner and the tape during fast forward is presented by means of three separate case studies. In order to guarantee that the trick mode stream is read at the chosen speed, multiple copies of the same stream need to be placed at well-chosen places on the tape. This solution is extremely attractive for systems where a limited number of speed-up levels is required. Recorders that can perform phase locking during fast forward yield a particular advantage for high speed-up factors (n = 12, 18, ...) and allow for the support of more distinct speed-up levels.
In this paper we introduce a block-based vector field estimation technique which is based upon a genetic algorithm and which exploits the dual sensor nature of a stereoscopic signal in order to accelerate its convergence. This vector field estimation technique has been designed to produce smooth vector fields at a small block size without sacrificing accuracy. Conversely, (false) accuracy does not impinge upon the smoothness of the vector field.
The MPEG video coding algorithm is used in a large variety of video recording applications. A key constraint for video coding algorithms for consumer (tape) recorder applications is the bit stream editability requirement; i.e., it must be possible to replace N consecutive frames by N new consecutive frames on the storage media, using at most the same number of bits. In this paper this constraint is satisfied by the use of a forward rate control mechanism, such that each group of pictures (GoP) will be encoded using a fixed number of bits (within a tolerance margin to be minimized). The problem of performing a forward state allocation (quantizer step allocation) is limited to the picture level by performing a pre-allocation, assigning a fraction of the available bits to each of the frames of a GoP. The state allocation per picture amounts to the correct selection of the quantization step size for all slices. This is done by forming parametric models for both the rate (R) and the distortion (D), such that for a particular slice, the R-D curve can be predicted. Using the R-D curves of every slice of the picture, the state allocation can be performed. With the described algorithm the GoP rate error is within 4% in the stationary mode; if a non-stationary mode that includes a re-allocation based on feedback information is added, the error is within 1%.
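A sketch under stated assumptions: the abstract says rate and distortion are predicted per slice with parametric models, but does not give the model forms. A commonly used choice, adopted here purely for illustration, is R(Q) = a/Q + b; the two parameters are fitted from two trial measurements and the model is inverted to find the quantizer step that meets the slice's bit allocation.

```python
def fit_rate_model(q1, r1, q2, r2):
    """Fit R(Q) = a/Q + b through two measured (quantizer, rate) points."""
    a = (r1 - r2) / (1.0 / q1 - 1.0 / q2)
    b = r1 - a / q1
    return a, b

def quantizer_for_budget(a, b, target_rate, q_min=1, q_max=31):
    """Invert R(Q) = a/Q + b and clip to the legal MPEG quantizer range."""
    if target_rate <= b:
        return q_max
    q = a / (target_rate - b)
    return int(min(max(round(q), q_min), q_max))
```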
For the transmission of HDTV signals a data reduction is necessary. In currently implemented systems this data reduction is achieved using sub-Nyquist sampling for the stationary part of the image sequence. If the concept of sub-Nyquist sampling is to be extended to moving parts of the scene, the problem of critical velocities is introduced. We propose to solve this problem by shifting the sampling lattice according to the motion, in such a way that no discarded pixels are present in the direction of the displacements. As such, this method can be called motion compensated sub-Nyquist sampling. We show how this algorithm can be extended to incorporate fractional accuracy of the motion estimation and combinations with other subsampling structures. The control structure is based on the results of an error analysis of motion compensated interpolation schemes. The experimental results show improved performance compared to fixed subsampling and nonadaptive sub-Nyquist sampling.
Two adaptive approaches for nonstationary filtering of image sequences are presented and experimentally compared. According to the first approach, a recursive spatio-temporal motion-compensated (MC) estimator is applied to the noisy sequence that adapts to the local spatial and temporal signal activity. A separable 3-D estimator is proposed that consists of three coupled 1-D estimators; its input is the noisy image plus additional signals that contain spatial information provided by a simple edge detector or temporal information provided by the MC backward difference (registration error). The steady-state gain and the parameters of this separable estimator are computed by closed-form formulae, thus allowing a very efficient implementation. According to the second approach, the noisy signal is first decomposed into a stationary and a nonstationary part based on an estimate of its local mean and deviation. A minimum variance estimator of the local mean and deviation of the observed signal is used. After the current mean is subtracted from the observed signal and the signal is normalized by the current deviation, a relatively simple noise filter is used to filter the stationary part. The above methods are applied to the filtering of noisy video-conference image sequences for various levels of noise. Both methods show very satisfactory performance, taking into consideration their simplicity and computational efficiency.
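A minimal 1-D sketch (the paper works on image sequences) of the second approach: split the noisy signal into a nonstationary part (local mean and deviation) and a normalized, roughly stationary residual, smooth only the residual, then recombine. The window length and the simple moving-average smoother are illustrative choices, not the paper's minimum-variance estimator.

```python
import numpy as np

def local_stats(x, win=15):
    """Sliding-window local mean and standard deviation."""
    k = np.ones(win) / win
    mean = np.convolve(x, k, mode="same")
    var = np.convolve((x - mean) ** 2, k, mode="same")
    return mean, np.sqrt(np.maximum(var, 1e-12))

def nonstationary_denoise(x, win=15, smooth=5):
    mean, dev = local_stats(x, win)
    stationary = (x - mean) / dev              # roughly unit-variance residual
    k = np.ones(smooth) / smooth
    filtered = np.convolve(stationary, k, mode="same")
    return mean + dev * filtered               # restore local mean and deviation
```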
For an economical introduction of HDTV a substantial data reduction is necessary, while maintaining the high image quality. To this end, sub-Nyquist techniques can be used (MUSE, HD-MAC) for stationary parts of the image, spreading the sampling of one complete frame over several frames and combining these frames at the receiver. Without special precautions, sub-Nyquist sampling is not possible for moving areas of the image. In this paper, a new algorithm is described for the subsampling of moving parts of a video sequence. The advantage of this new method is that the full spatial resolution can be preserved, while also maintaining the full temporal resolution. To prevent aliasing at certain velocities (critical velocities), the image is divided into a high-pass and a low-pass part prior to subsampling. At the receiver, a motion compensated interpolation filter is used to reconstruct the original image.
KEYWORDS: Video, Visual communications, Video coding, Electronic filtering, Optical filters, Image processing, Signal processing, Video compression, Data compression, Receivers
With the introduction of a multitude of video and multimedia services in the context of broadband communication networks, compatibility between various different data compression systems is becoming increasingly important. Significant research efforts have recently been directed towards so-called hierarchical coding schemes that provide compatibility by splitting the source signal into several hierarchical layers. Compatibility is achieved in this way since a receiver selects and decodes only those layers which are relevant for its (fixed resolution) display monitor. In this paper we introduce and investigate two hierarchical spatio-temporal subband decompositions: the so-called 'full' and 'reduced temporal hierarchy' decompositions. The former supports progressive-scan video signals only, while the latter is capable of handling interlaced signals as well. The better frequency discrimination of the 'full temporal hierarchy' decomposition is expected to lead to higher coding performance for interlaced signals, but at a higher complexity.
KEYWORDS: Point spread functions, Image processing, Image resolution, Image restoration, Autoregressive models, Visual communications, Visual process modeling, Linear filtering, Image acquisition, Signal to noise ratio
In this paper we discuss the use of maximum likelihood estimation procedures for the identification of unknown blur from a blurred image. The main focus is on the problem of estimating the coefficients of relatively large point-spread functions, and on the estimation of the support size of point-spread functions in general. Two improved blur identification techniques are proposed, both based on the expectation-maximization algorithm. In the first method we describe the point-spread function by a parametric model, while in the second method resolution pyramids are employed to identify the point-spread function in a hierarchical manner.
KEYWORDS: Quantization, Video, Video coding, Signal to noise ratio, Receivers, Visual communications, Transmitters, Broadband telecommunications, Visualization, Data communications
The Broadband Integrated Services Digital Network (BISDN) based on lightwave technology is expected to become the all-purpose exchange area communications network of the future. All digital video services are integrated, with applications ranging from videophone and teleconferencing to digital TV (signals according to CCIR Rec. 601) and High Definition TV (HDTV) distribution. A desirable feature of the various video services is upward and downward compatibility in resolution, in order to guarantee a free exchange of services, transmitters and receivers. This paper proposes an n-level progressive hierarchical intraframe coding scheme based on subband coding. In this scheme, several spatial low-resolution services are available as subsets of the coded HDTV data, which can be received directly at lower bit rates. Progressive coding of the HDTV signal is employed in order to prevent quantization errors from propagating to higher resolution signals. Special attention is given to the design of the quantizers required for the progressive coding, and to the incorporation of side panel coding.
A number of different algorithms have recently been proposed to identify the image and blur model parameters from an image that is degraded by blur and noise. This paper gives an overview of the developments in image and blur identification under a unifying maximum likelihood framework. In fact, we show that various recently published image and blur identification algorithms are different implementations of the same maximum likelihood estimator, resulting from different modeling assumptions and/or considerations about the computational complexity. The use of maximum likelihood estimation in image and blur identification is illustrated by numerical examples.