White balance estimation (WBE) is one of the most fundamental and crucial steps in a modern image signal processor (ISP). Recent years have witnessed significant advances in deep-learning-based WBE. However, existing models were mostly trained on individual datasets with limited samples captured by various camera sensors, which makes it hard for the models to generalize. In this paper, we propose a novel channel-attention-optimized U-Net model with an embedded angular loss to accurately estimate white balance. We demonstrate our approach on the recently released large-scale dataset “Cube Plus”, captured with a single camera sensor, where it achieves state-of-the-art performance.
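An angular loss of this kind is commonly built on the angular (recovery) error between the estimated and ground-truth illuminant RGB vectors, θ = arccos(ê · e / (‖ê‖ ‖e‖)). As a minimal sketch, assuming each illuminant is given as a plain RGB triple, the error and its batch mean can be computed as below. The function names are hypothetical, and this is not the authors' training code, which would express the same formula differentiably inside a deep-learning framework.

```python
import math
from typing import Sequence


def angular_error_deg(pred: Sequence[float], gt: Sequence[float]) -> float:
    """Angle in degrees between an estimated and a ground-truth RGB illuminant."""
    dot = sum(p * g for p, g in zip(pred, gt))
    norm_p = math.sqrt(sum(p * p for p in pred))
    norm_g = math.sqrt(sum(g * g for g in gt))
    if norm_p == 0.0 or norm_g == 0.0:
        raise ValueError("illuminant vectors must be non-zero")
    # Clamp so floating-point rounding cannot push acos outside its domain [-1, 1].
    cos_theta = max(-1.0, min(1.0, dot / (norm_p * norm_g)))
    return math.degrees(math.acos(cos_theta))


def mean_angular_error_deg(
    preds: Sequence[Sequence[float]], gts: Sequence[Sequence[float]]
) -> float:
    """Average angular error over a batch (the quantity an angular loss would minimize)."""
    if len(preds) != len(gts) or not preds:
        raise ValueError("need equally sized, non-empty batches")
    return sum(angular_error_deg(p, g) for p, g in zip(preds, gts)) / len(preds)


if __name__ == "__main__":
    # Orthogonal illuminants differ by 90 degrees; parallel ones by (almost) 0.
    print(angular_error_deg([1.0, 0.0, 0.0], [0.0, 1.0, 0.0]))
    print(angular_error_deg([1.0, 1.0, 1.0], [2.0, 2.0, 2.0]))
```

Because only the direction of the illuminant matters for white balance, scaling either vector leaves the error unchanged, which is why this metric is preferred over a plain Euclidean distance between RGB estimates.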
Immersive video applications, which let users freely navigate a virtualized 3D environment for entertainment, productivity, training, and more, are growing rapidly. Such a system can be built on an interactive Gigapixel Video Streaming (iGVS) platform that spans everything from array-camera capture to end-user interaction. This interactive system demands a large amount of network bandwidth to sustain reliable service, which hinders its mass-market adoption. We therefore propose to segment the gigapixel scene into non-overlapping spatial tiles, each covering only a sub-region of the entire scene. One or more tiles are used to represent the instantaneous viewport of interest to a specific user. Tiles are then encoded at a variety of quality scales using various combinations of spatial, temporal, and amplitude resolutions (STAR), and the encoded tiles are encapsulated into temporally aligned tile video chunks (or simply chunks). Chunks at different quality levels can be processed in parallel to meet real-time requirements. With this setup, heterogeneous users can simultaneously access different chunk combinations according to their requests, and viewport-adaptive content navigation in the immersive space can be realized by properly adapting multiscale chunks under bandwidth constraints. A series of computational vision models, which measure the perceptual quality of viewport video in terms of its quality scales, adaptation factors, and peripheral vision thresholds, are devised to prepare and guide chunk adaptation toward the best perceptual quality index. Furthermore, to cope with time-varying network conditions, we develop a deep reinforcement learning (DRL)-based adaptive real-time streaming (ARS) scheme that learns future decisions from historical network states to maximize the overall quality of experience (QoE) in a practical Internet-based streaming scenario.
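The tiling and multiscale chunk adaptation described above can be illustrated with a small sketch, assuming a uniform grid of fixed-size tiles and a single per-tile bitrate ladder. `tiles_for_viewport` finds the non-overlapping tiles that a rectangular viewport touches, and `allocate_levels` picks a quality level for each of those tiles under a bandwidth budget. Both function names, the ladder values, and the round-robin greedy policy are hypothetical stand-ins; they are not the paper's perceptual quality models or its DRL-based ARS scheme.

```python
from typing import Dict, List, Tuple

Tile = Tuple[int, int]  # (row, col) index of a tile in the uniform grid


def tiles_for_viewport(
    vx: int, vy: int, vw: int, vh: int,
    tile_w: int, tile_h: int, scene_w: int, scene_h: int,
) -> List[Tile]:
    """Return the grid tiles (row-major order) that intersect the viewport rectangle."""
    # Clip the viewport to the scene so out-of-range offsets are handled.
    x0, y0 = max(0, vx), max(0, vy)
    x1, y1 = min(scene_w, vx + vw), min(scene_h, vy + vh)
    if x0 >= x1 or y0 >= y1:
        return []  # viewport lies entirely outside the scene
    # Last covered pixel is x1 - 1 / y1 - 1 (exclusive upper bounds).
    first_col, last_col = x0 // tile_w, (x1 - 1) // tile_w
    first_row, last_row = y0 // tile_h, (y1 - 1) // tile_h
    return [
        (r, c)
        for r in range(first_row, last_row + 1)
        for c in range(first_col, last_col + 1)
    ]


def allocate_levels(
    tiles: List[Tile], ladder_kbps: List[float], budget_kbps: float
) -> Dict[Tile, int]:
    """Hypothetical greedy policy: start all tiles at the lowest level, then
    raise levels one step at a time, round-robin, while the budget allows."""
    levels = {t: 0 for t in tiles}
    spent = ladder_kbps[0] * len(tiles)
    if spent > budget_kbps:
        raise ValueError("budget cannot cover the lowest level for every tile")
    upgraded = True
    while upgraded:  # terminates: each pass either upgrades a tile or stops
        upgraded = False
        for t in tiles:
            lvl = levels[t]
            if lvl + 1 < len(ladder_kbps):
                extra = ladder_kbps[lvl + 1] - ladder_kbps[lvl]
                if spent + extra <= budget_kbps:
                    levels[t] = lvl + 1
                    spent += extra
                    upgraded = True
    return levels


if __name__ == "__main__":
    vp_tiles = tiles_for_viewport(900, 500, 1200, 600, 1000, 1000, 4000, 2000)
    print(vp_tiles)
    print(allocate_levels(vp_tiles, [500.0, 1000.0, 2000.0], 7000.0))
```

For a 4000×2000 scene with 1000×1000 tiles, a 1200×600 viewport at (900, 500) touches six tiles; with a 500/1000/2000 kbps ladder and a 7000 kbps budget, the greedy pass lifts every tile to the middle level and spends the remaining 1000 kbps raising the first tile to the top level. A perceptually guided scheme would instead rank upgrades by expected quality gain, for example favoring tiles near the viewport center over those in peripheral vision.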
Our experiments show that, compared with the popular Google Congestion Control algorithm widely adopted in existing adaptive streaming systems, our scheme improves the average QoE by about 60% and reduces its standard deviation by about 30%, demonstrating the efficiency of our multiscale accelerated iGVS for immersive video applications.