In this paper, we propose a novel feature map compression method for Video Coding for Machines (VCM). The proposed method applies a principal component analysis (PCA)-based transform to feature pyramid network (FPN) feature maps using predefined basis and mean vectors. In addition, it exploits the redundancy between FPN layers to reduce redundancy across the different resolution levels of the FPN feature maps. The fixed basis and mean are derived in advance by applying PCA to a training data set. For any input video, the transform coefficients are obtained by applying the transform with the fixed basis and are then compressed using Versatile Video Coding (VVC). Experimental results show that the proposed method achieves BD-rate gains of 89.22% and 86.57% over the VCM feature anchor in instance segmentation and object detection, respectively.
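The fixed-basis transform described in this abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names, the 64-dimensional synthetic vectors standing in for FPN features, and the choice of 8 components are all assumptions made for illustration, and the VVC coding stage is omitted.

```python
import numpy as np

def fit_pca_basis(train, k):
    """Derive a fixed mean vector and a k-component basis from training rows."""
    mean = train.mean(axis=0)
    # SVD of the centred data; rows of vt are principal directions, strongest first.
    _, _, vt = np.linalg.svd(train - mean, full_matrices=False)
    return mean, vt[:k]

def forward(x, mean, basis):
    """Project input vectors onto the fixed basis to get transform coefficients."""
    return (x - mean) @ basis.T

def inverse(coeffs, mean, basis):
    """Reconstruct vectors from their transform coefficients."""
    return coeffs @ basis + mean

rng = np.random.default_rng(0)
# Hypothetical stand-in for feature vectors: 64-dim data near an 8-dim subspace.
mixing = rng.normal(size=(8, 64))
train = rng.normal(size=(500, 8)) @ mixing + 0.01 * rng.normal(size=(500, 64))

# The basis and mean are computed once, offline, and then reused for any input.
mean, basis = fit_pca_basis(train, k=8)

test_x = rng.normal(size=(10, 8)) @ mixing   # unseen input from the same subspace
coeffs = forward(test_x, mean, basis)        # 10 x 8 coefficients instead of 10 x 64
recon = inverse(coeffs, mean, basis)
rel_err = np.linalg.norm(recon - test_x) / np.linalg.norm(test_x)
```

Because the basis and mean are fixed ahead of time, only the coefficients would need to be coded for a new input; the decoder can hold the same basis and mean.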
KEYWORDS: Video, Video compression, Machine vision, Distortion, Video coding, Signal processing, Image compression, Visual process modeling, Networks, Image processing
We previously trained a compression network by optimizing bit rate and distortion (feature-domain MSE) [1]. In this paper, we propose a feature map compression method for Video Coding for Machines (VCM) based on a deep learning-based compression network that is jointly trained to optimize both the compressed bit rate and machine vision task performance. We use the bmshj2018-hyperprior model in CompressAI [2] as the compression network and compress the feature map output by the stem layer of the Faster R-CNN X101-FPN network in Detectron2 [3]. We evaluated the proposed method with the MPEG VCM evaluation framework. The proposed method outperforms the VVC-based MPEG VCM anchor.
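A joint objective of this kind is commonly written as an estimated rate plus a weighted task loss. The sketch below assumes that form; the abstract does not give the exact loss, so the weight `lam`, the function names, and the toy likelihood values are hypothetical.

```python
import math

def rate_bpp(likelihoods, num_pixels):
    """Estimated rate in bits per pixel from the entropy model's symbol likelihoods."""
    return sum(-math.log2(p) for p in likelihoods) / num_pixels

def joint_loss(likelihoods, num_pixels, task_loss, lam):
    """Rate plus lam-weighted task loss (lam is an assumed trade-off weight)."""
    return rate_bpp(likelihoods, num_pixels) + lam * task_loss

# Toy values: four coded symbols with their model likelihoods, over four pixels.
toy_likelihoods = [0.5, 0.25, 0.25, 0.5]
bpp = rate_bpp(toy_likelihoods, num_pixels=4)
loss = joint_loss(toy_likelihoods, num_pixels=4, task_loss=0.8, lam=2.0)
```

In a training loop, `task_loss` would come from the detection network evaluated on the decoded feature map, so that gradients reach the compression network through both terms.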
Since the success of the 3D movie 'Avatar', the broadcasting industry has been preparing to launch 3DTV services. There are many ways to transmit stereoscopic video data and provide a 3DTV service. To provide a 3DTV service of full HDTV quality in terrestrial broadcasting while still delivering a 2DTV service to legacy viewers, two full-resolution images must be encoded separately and transmitted. We present new hybrid encoding schemes, 'MPEG-2+AVC', 'MPEG-2+AVC Interview' and 'MPEG-2+HEVC', and compare the objective and subjective performance of each scheme. 'MPEG-2+HEVC' gives the best results and shows that a backward-compatible 3DTV service can be provided while maintaining HDTV quality for legacy HDTV users.
In this paper, 16-order and 32-order integer transform kernels are designed for HD video coding in H.264|MPEG-4 AVC, and a performance analysis of large transforms is presented. An adaptive block-size transform coding scheme based on the proposed kernels is also proposed. In this scheme, 16-order (16×16, 16×8 and 8×16) and 32-order (32×32, 32×16 and 16×32) transforms are performed in addition to the 8×8 and 4×4 transforms used in the Fidelity Range Extensions of H.264|MPEG-4 AVC. The experimental results show that the variable block-size transforms with the proposed higher-order transform kernels yield a bit saving of up to 14.96% for HD video sequences.
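The 16-order and 32-order kernels themselves are not listed in this abstract, so the sketch below uses the well-known 4×4 core transform matrix of H.264 to show what an integer transform kernel looks like: integer entries, mutually orthogonal rows, and a separable 2D transform. The floating-point inverse is an illustrative simplification; in the actual codec the normalisation is folded into quantization.

```python
import numpy as np

# 4x4 forward core transform matrix of H.264 (integer entries, orthogonal rows).
C4 = np.array([[1,  1,  1,  1],
               [2,  1, -1, -2],
               [1, -1, -1,  1],
               [1, -2,  2, -1]])

def forward_int(block, kernel):
    """Separable 2D transform Y = C X C^T using integer arithmetic only."""
    return kernel @ block @ kernel.T

def inverse_int(coeffs, kernel):
    """Illustrative inverse: undo the squared row norms, then apply C^T Y C."""
    norms = np.sum(kernel * kernel, axis=1)      # squared norm of each row
    scaled = coeffs / np.outer(norms, norms)     # normalise each coefficient
    return kernel.T @ scaled @ kernel

block = np.arange(16).reshape(4, 4)              # toy residual block
coeffs = forward_int(block, C4)
recon = inverse_int(coeffs, C4)
```

Larger kernels follow the same pattern: the rows stay orthogonal so that the transform is invertible, and the per-row norms are compensated for outside the transform itself.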
We propose a rate-distortion optimized transform coding method that adaptively selects, in a rate-distortion sense, either the integer cosine transform, which is an integer approximation of the discrete cosine transform (DCT), or an integer sine transform (IST). The DCT, which has been adopted in most video coding standards, is known as a suboptimal substitute for the Karhunen-Loève transform. However, depending on the correlation of the signal, an alternative transform can achieve higher coding efficiency. We introduce a discrete sine transform (DST) that achieves high energy compaction for correlation coefficients in the range of −0.5 to 0.5 and apply it to the current design of H.264/AVC (Advanced Video Coding). Moreover, to avoid encoder-decoder mismatch and to simplify implementation, we develop an IST that is an integer approximation of the DST. The experimental results show that the proposed method achieves a Bjøntegaard delta rate (BD-rate) gain of up to 5.49% compared with Joint Model 11.0.
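The per-block choice between a cosine and a sine transform can be sketched as a comparison of rate-distortion costs. The following is a floating-point illustration under several assumptions that are not taken from the paper: the DST is taken to be type I, the rate is approximated by the number of nonzero quantized coefficients, and `qstep` and `lam` are arbitrary values.

```python
import numpy as np

def dct2_matrix(n):
    """Orthonormal DCT-II matrix of size n x n."""
    k = np.arange(n)[:, None]
    i = np.arange(n)[None, :]
    m = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    m[0] /= np.sqrt(2.0)   # scale the DC row so that all rows have unit norm
    return m

def dst1_matrix(n):
    """Orthonormal DST-I matrix of size n x n (assumed DST type)."""
    k = np.arange(1, n + 1)[:, None]
    i = np.arange(1, n + 1)[None, :]
    return np.sqrt(2.0 / (n + 1)) * np.sin(np.pi * k * i / (n + 1))

def rd_cost(block, t, qstep, lam):
    """Cost J = D + lam * R for one transform, with a crude rate proxy."""
    levels = np.round((t @ block @ t.T) / qstep)   # transform and quantize
    recon = t.T @ (levels * qstep) @ t             # dequantize and invert
    distortion = float(np.sum((block - recon) ** 2))
    rate = int(np.count_nonzero(levels))           # proxy: nonzero coefficients
    return distortion + lam * rate

def choose_transform(block, qstep=8.0, lam=20.0):
    """Return the name of the cheaper transform and both costs."""
    n = block.shape[0]
    costs = {"DCT": rd_cost(block, dct2_matrix(n), qstep, lam),
             "DST": rd_cost(block, dst1_matrix(n), qstep, lam)}
    return min(costs, key=costs.get), costs

# A flat block: all of its energy lands in the DCT's DC coefficient.
best, costs = choose_transform(np.full((4, 4), 100.0))
```

For the flat block the DCT is selected, since the sine basis has no constant function and spreads the energy over several coefficients. A real encoder would use the actual entropy-coded bit count as the rate and would signal the chosen transform to the decoder.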
In this paper, we propose a new selective regional slice coding method for H.264 that improves the quality of decoded video. It performs better than conventional coding methods and also renders shapes in a picture better near macroblock edges. Flexible macroblock ordering (FMO), which is designed for slice coding in packet-loss environments, is used to transmit the slices. We propose a modification that improves slice coding within the sequence. The experimental results show that the proposed method improves both quality and error robustness: in our implementation, quality improves by 0.11 dB to 1.52 dB compared with non-slice coding.
Technical evolution in the field of information technology has changed many aspects of industry and of human life. Internet and broadcasting technologies are core ingredients of this revolution. Various new services that were never possible before are now available to the general public through these technologies. Multimedia service over IP networks has become one of the easily accessible services these days. Technical advances in Internet services, constantly increasing network bandwidth, and the evolution of multimedia technologies have caused the demand for multimedia streaming services to grow explosively. With this increasing demand, the Internet has become deluged with multimedia traffic. Although multimedia streaming services have become indispensable, the quality of a multimedia service over the Internet cannot be technically guaranteed. Recently, users have come to demand multimedia services whose quality is competitive with traditional TV broadcasting and which offer additional functionalities. Such functionalities include interactivity, scalability, and adaptability. Multimedia that provides these ancillary functionalities is often called richmedia. To satisfy these requirements, the Interactive Scalable Multimedia Streaming (ISMuS) platform was designed and developed. In this paper, the architecture, implementation, and additional functionalities of the ISMuS platform are presented. The platform provides user interaction based on MPEG-4 Systems technology [1] and supports efficient multimedia distribution through overlay network technology. Loaded with feature-rich technologies, the platform can serve both on-demand and broadcast-like richmedia services.
In this paper, we present an effective streaming method for MPEG-4 content that uses the schedule information of image objects and progressive JPEG. The proposed method is designed for the Interactive Scalable Multimedia Streaming (ISMuS) system. In rich interactive content, the volume of image object data is not negligible for a streaming service with QoS. If a streaming system does not manage the image data, it can become a bottleneck in the system. The proposed method considers the schedule information of the image objects to be displayed within a specific time frame, generally a few seconds. Because the proposed method uses progressive JPEG instead of baseline JPEG, it treats each image object as a scalable object. The streaming server always sends the DC data of each image object, and sends the AC data only when the network bandwidth has enough room for it. The priorities of the audio and video elementary streams are considered along with those of the image objects according to the varying network status.
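The DC-first policy described in this abstract can be sketched as a small scheduling function. This is an assumed simplification rather than the ISMuS implementation: the `ImageObject` fields, the byte budget, and the earliest-deadline ordering are hypothetical, and audio and video stream priorities are left out.

```python
from dataclasses import dataclass

@dataclass
class ImageObject:
    name: str
    display_time: float   # seconds from now at which the object must appear
    dc_bytes: int         # size of the progressive JPEG DC scan
    ac_bytes: int         # size of the remaining AC scans

def schedule(objects, window_s, budget_bytes):
    """Plan transmissions for objects due within the window.

    DC data is always sent; AC data is added only while the budget allows.
    """
    due = sorted((o for o in objects if o.display_time <= window_s),
                 key=lambda o: o.display_time)
    plan = [(o.name, "DC") for o in due]
    remaining = budget_bytes - sum(o.dc_bytes for o in due)
    for o in due:
        if o.ac_bytes <= remaining:
            plan.append((o.name, "AC"))
            remaining -= o.ac_bytes
    return plan

objects = [
    ImageObject("a", display_time=1.0, dc_bytes=100, ac_bytes=900),
    ImageObject("b", display_time=2.0, dc_bytes=100, ac_bytes=500),
    ImageObject("c", display_time=10.0, dc_bytes=100, ac_bytes=300),
]
plan = schedule(objects, window_s=3.0, budget_bytes=1200)
```

With these toy values, objects "a" and "b" fall inside the window and both get their DC scans, while only "a" gets its AC scans, so "b" would be shown at coarse quality until more bandwidth is available.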
KEYWORDS: Video, Multimedia, Stereolithography, Chemical species, Internet, Visualization, Networks, Video coding, Scalable video coding, Standards development
In this paper, we present an MPEG-4 content streaming system and propose a priority-based MPEG-4 content streaming scheme. The presented streaming system, which consists of a server and a client, supports MPEG-4 content compliant with ISO/IEC 14496-1 and enables a user to interact with MPEG-4 content over IP networks. The server consists of a GUI, a Server Management Layer, a Sync Layer, and a Delivery Layer. The client can display MPEG-4 content stored in local storage or received over IP networks. Moreover, we propose an MPEG-4 content streaming scheme in which the object a user prefers to watch is sent first by raising its priority, and objects with low priority are dropped at the server side when the network bandwidth is insufficient to transmit all the objects that are supposed to appear in the scene. We tested the proposed scheme with the presented MPEG-4 content streaming system and report the experimental results in this paper. With the proposed scheme, a user can watch a video of interest in high quality and videos of no interest in low quality.
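The priority scheme can be sketched as follows, assuming a strict priority order in which every object below the first one that does not fit is dropped. The tuple layout, the `boost` value for the preferred object, and the bitrate figures are hypothetical and are not taken from the paper.

```python
def select_objects(objects, bandwidth, preferred=None, boost=10):
    """Split objects into (sent, dropped) name lists by priority.

    objects: list of (name, priority, bitrate) tuples.
    The preferred object's priority is raised by `boost` (an assumed value).
    """
    ranked = sorted(objects,
                    key=lambda o: o[1] + (boost if o[0] == preferred else 0),
                    reverse=True)
    sent, dropped, used = [], [], 0
    for name, _, bitrate in ranked:
        # Strict priority: once one object is dropped, all lower ones are too.
        if not dropped and used + bitrate <= bandwidth:
            sent.append(name)
            used += bitrate
        else:
            dropped.append(name)
    return sent, dropped

scene = [("news", 3, 400), ("sports", 2, 500), ("logo", 1, 100)]
default_choice = select_objects(scene, bandwidth=700)
user_choice = select_objects(scene, bandwidth=700, preferred="sports")
```

In this toy scene the server sends "news" by default, and switches to sending "sports" once the user marks it as preferred, dropping the remaining objects because the bandwidth cannot carry them all.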