Proceedings Article | 18 November 2019
KEYWORDS: Video, Computing systems, Systems modeling, Computer vision technology, Visual process modeling, Visual system, Visualization, Neural networks, Internet
Immersive video applications are growing rapidly, allowing users to freely navigate within a virtualized 3D environment for entertainment, productivity, training, etc. Fundamentally, such a system can be facilitated by an interactive Gigapixel Video Streaming (iGVS) platform spanning array-camera capture to end-user interaction. This interactive system demands a large amount of network bandwidth to sustain reliable service provisioning, hindering its mass-market adoption. Thus, we propose to segment the gigapixel scene into non-overlapping spatial tiles, each covering a sub-region of the entire scene. One or more tiles represent the instantaneous viewport of interest to a specific user. Tiles are then encoded at a variety of quality scales using various combinations of spatial, temporal and amplitude resolutions (STAR), and encapsulated into temporally aligned tile video chunks (or simply chunks). Chunks at different quality levels can be processed in parallel for real-time operation. With such a setup, diverse chunk combinations can be accessed simultaneously by heterogeneous users according to their requests, and viewport-adaptive content navigation in an immersive space can also be realized by properly adapting multiscale chunks under bandwidth constraints. A series of computational vision models, measuring the perceptual quality of the viewport video in terms of its quality scales, adaptation factors, and peripheral-vision thresholds, is devised to prepare and guide chunk adaptation toward the best perceptual quality index. Furthermore, in response to time-varying networks, a deep reinforcement learning (DRL) based adaptive real-time streaming (ARS) scheme is developed, learning future decisions from historical network states to maximize the overall quality of experience (QoE) in a practical Internet-based streaming scenario. Our experiments reveal that average QoE can be improved by about 60%, and its standard deviation reduced by ≈ 30%, in comparison with the popular Google congestion control algorithm widely adopted in existing adaptive-streaming systems, demonstrating the efficiency of our multiscale-accelerated iGVS for immersive video applications.
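To make the viewport-driven chunk adaptation concrete, the following is a minimal Python sketch, not the paper's actual implementation: it selects the tiles intersecting a viewport and greedily promotes their STAR quality levels under a bandwidth budget, ranking upgrades by weighted quality gain per bit. All names (Tile, Chunk, adapt_chunks) and the perceptual-weight heuristic are illustrative assumptions; the paper's own vision models would supply the quality scores and foveal/peripheral weights.

```python
# Illustrative sketch only: tile selection and greedy multiscale chunk
# adaptation under a bandwidth budget. Names and the perceptual-weight
# heuristic are hypothetical, not the authors' implementation.
from dataclasses import dataclass

@dataclass
class Chunk:
    bitrate_kbps: float   # cost of this STAR quality level
    quality: float        # perceptual quality score of this level

@dataclass
class Tile:
    x: int                # tile column index in the gigapixel grid
    y: int                # tile row index
    chunks: list          # available quality levels, sorted ascending

def tiles_in_viewport(tiles, vp_x, vp_y, vp_w, vp_h):
    """Return tiles whose grid cell falls inside the viewport rectangle."""
    return [t for t in tiles
            if vp_x <= t.x < vp_x + vp_w and vp_y <= t.y < vp_y + vp_h]

def adapt_chunks(viewport_tiles, budget_kbps, weight):
    """Greedy upgrade: repeatedly promote the tile whose next quality
    level yields the largest weighted quality gain per extra kbps,
    while total bitrate stays within the budget."""
    levels = {id(t): 0 for t in viewport_tiles}   # start at lowest level
    used = sum(t.chunks[0].bitrate_kbps for t in viewport_tiles)
    while True:
        best, best_gain = None, 0.0
        for t in viewport_tiles:
            lv = levels[id(t)]
            if lv + 1 >= len(t.chunks):
                continue
            extra = t.chunks[lv + 1].bitrate_kbps - t.chunks[lv].bitrate_kbps
            if used + extra > budget_kbps:
                continue
            gain = weight(t) * (t.chunks[lv + 1].quality
                                - t.chunks[lv].quality) / extra
            if gain > best_gain:
                best, best_gain = t, gain
        if best is None:
            break
        lv = levels[id(best)]
        used += best.chunks[lv + 1].bitrate_kbps - best.chunks[lv].bitrate_kbps
        levels[id(best)] = lv + 1
    return {(t.x, t.y): levels[id(t)] for t in viewport_tiles}

# Example: an 8x4 tile grid, a 3x2 viewport, and a foveal tile weighted
# above its peripheral neighbors (weights are made-up values).
tiles = [Tile(x, y, [Chunk(500, 1.0), Chunk(1500, 2.0), Chunk(4000, 2.6)])
         for x in range(8) for y in range(4)]
vp = tiles_in_viewport(tiles, 2, 1, 3, 2)
plan = adapt_chunks(vp, budget_kbps=12000,
                    weight=lambda t: 2.0 if (t.x, t.y) == (3, 2) else 1.0)
```

Because peripheral-vision thresholds tolerate lower quality away from the gaze point, a foveation-aware weight of this kind lets the adaptation spend bits where they are perceptually most valuable.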
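The abstract's DRL-based ARS scheme maps historical network states to a per-chunk quality decision. A full deep agent is beyond the scope of a sketch, so the toy below substitutes a tabular Q-learner to convey only the state/action/reward structure, under assumptions the abstract does not specify: state = discretized recent throughput plus the previous decision, action = chunk quality level, reward = a hypothetical QoE proxy trading off quality against stalls and quality switches.

```python
# Illustrative sketch: tabular Q-learning for adaptive streaming decisions.
# The paper's ARS uses deep RL; this toy replaces the neural policy with a
# Q-table to show the state/action/reward structure only. All constants
# (bitrates, QoE weights) are hypothetical.
import random
from collections import defaultdict

QUALITY_KBPS = [500, 1500, 4000]      # hypothetical chunk bitrates
ALPHA, GAMMA, EPS = 0.1, 0.9, 0.1     # learning rate, discount, exploration

def state_of(throughput_kbps, last_action):
    """Discretize recent throughput and pair it with the prior decision."""
    bucket = min(int(throughput_kbps // 1000), 5)
    return (bucket, last_action)

def qoe_reward(action, throughput_kbps, last_action):
    """Toy QoE proxy: reward delivered quality, penalize stalls (chunk
    bitrate above throughput) and abrupt quality switches."""
    quality = action + 1.0
    stall = max(0.0, QUALITY_KBPS[action] - throughput_kbps) / 1000.0
    switch = abs(action - last_action)
    return quality - 2.0 * stall - 0.5 * switch

Q = defaultdict(float)

def choose(state):
    if random.random() < EPS:                      # epsilon-greedy explore
        return random.randrange(len(QUALITY_KBPS))
    return max(range(len(QUALITY_KBPS)), key=lambda a: Q[(state, a)])

def train(throughput_trace, episodes=200):
    """Replay a throughput trace, updating Q toward the discounted
    QoE-maximizing policy (standard one-step Q-learning update)."""
    for _ in range(episodes):
        last = 0
        for i in range(len(throughput_trace) - 1):
            s = state_of(throughput_trace[i], last)
            a = choose(s)
            r = qoe_reward(a, throughput_trace[i], last)
            s2 = state_of(throughput_trace[i + 1], a)
            best_next = max(Q[(s2, b)] for b in range(len(QUALITY_KBPS)))
            Q[(s, a)] += ALPHA * (r + GAMMA * best_next - Q[(s, a)])
            last = a

# Example: train on a noisy synthetic throughput trace (kbps).
train([random.uniform(800, 5000) for _ in range(500)])
```

In the paper's setting a deep network would replace the Q-table so the policy generalizes over continuous network histories, but the learning signal, a QoE reward observed after each chunk decision, is the same in spirit.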