MPEG Omnidirectional Media Format (OMAF) specifies both a viewport-dependent video profile and a viewport-dependent presentation profile to enable immersive media applications. A sub-picture-based approach to viewport-dependent streaming is one of the main approaches being explored in MPEG OMAF standardization. This paper presents a sub-picture-based omnidirectional video live streaming platform with state-of-the-art technologies integrated on both the server and client sides to illustrate the benefits of such a viewport-dependent omnidirectional video streaming approach. The technologies include omnidirectional video acquisition, sub-picture partitioning, real-time GPU-accelerated HEVC encoding, and DASH-based live streaming. The presented platform supports virtual reality (VR) clients including both VR head-mounted displays (HMDs) and conventional 2D displays. Viewing orientation tracking and real-time viewport extraction are also supported. As with all live streaming platforms, one of the main goals of our platform is to minimize end-to-end system latency. A new metric called Comparable-Quality Viewport Switching (CQVS) latency is proposed to evaluate the performance of viewport-dependent video streaming and presentation. The CQVS latency is defined as the time it takes, after a viewport switch, for the viewport video quality to recover to a level comparable to that before the switch. The platform was demonstrated at the Joint 3GPP and VRIF workshop on VR and at the 2018 Mobile World Congress as one of the first OMAF-compliant viewport-dependent live streaming solutions.
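The CQVS definition above can be made concrete with a small measurement sketch. The code below is only an illustration under stated assumptions: the function name `cqvs_latency`, the use of PSNR as the quality measure, and the `tolerance` that decides what counts as "comparable" are all hypothetical choices, not details taken from the paper.

```python
from typing import Optional, Sequence


def cqvs_latency(times: Sequence[float],
                 quality: Sequence[float],
                 switch_time: float,
                 tolerance: float = 1.0) -> Optional[float]:
    """Return the time after `switch_time` until viewport quality is comparable again.

    `times` and `quality` are parallel samples of viewport quality (assumed
    here to be PSNR in dB). "Comparable" is assumed to mean: within
    `tolerance` of the last sample taken before the switch.
    Returns None if quality never recovers within the trace.
    """
    if len(times) != len(quality):
        raise ValueError("times and quality must have the same length")
    before = [q for t, q in zip(times, quality) if t < switch_time]
    if not before:
        raise ValueError("no quality sample before the switch")
    reference = before[-1]  # quality just prior to viewport switching
    for t, q in zip(times, quality):
        if t >= switch_time and q >= reference - tolerance:
            return t - switch_time
    return None


# Made-up trace: the viewer turns their head at t = 1.0 s.
times = [0.0, 0.5, 1.0, 1.5, 2.0, 2.5]
psnr = [42.0, 42.0, 33.0, 34.0, 41.5, 42.0]
latency = cqvs_latency(times, psnr, switch_time=1.0)
```

In this made-up trace the quality first returns to within 1 dB of its pre-switch value at t = 2.0 s, so the measured latency is 1.0 s.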
360° video is an emerging format in the media industry, enabled by the growing availability of virtual reality devices. It gives the viewer a new sense of presence and immersion. Compared to conventional rectilinear video (2D or 3D), 360° video poses a new and difficult set of engineering challenges for video processing and delivery. A comfortable and immersive user experience requires very high video quality and very low latency, while the large video file size makes it hard to deliver 360° video at high quality and at scale. Conventionally, 360° video represented in equirectangular or other projection formats can be encoded as a single standards-compliant bitstream using existing video codecs such as H.264/AVC or H.265/HEVC. Such a method usually needs very high bandwidth to provide an immersive user experience. At the client side, however, much of that bandwidth and of the computational power used to decode the video is wasted because the user only watches a small portion (i.e., the viewport) of the entire picture. Viewport-dependent 360° video processing and delivery approaches spend more bandwidth on the viewport than on non-viewport regions and are therefore able to reduce the overall transmission bandwidth. This paper proposes a dual-buffer segment scheduling algorithm for viewport-adaptive streaming methods to reduce latency when switching between high-quality viewports in 360° video streaming. The approach decouples the scheduling of viewport segments from that of non-viewport segments to ensure that the viewport segment requested matches the latest user head orientation. A base layer buffer stores all lower-quality segments, and a viewport buffer stores high-quality viewport segments corresponding to the viewer’s most recent head orientation. The scheduling scheme determines the viewport request time based on the buffer status and the head orientation.
This paper also discusses how to deploy the proposed scheduling design for various viewport-adaptive video streaming methods. The proposed dual-buffer segment scheduling method is implemented in an end-to-end, tile-based, viewport-adaptive 360° video streaming platform, where the entire 360° video is divided into a number of tiles and each tile is independently encoded into representations at multiple quality levels. The client requests a quality level for each tile based on the viewer’s head orientation and the available bandwidth, and then composes all tiles together for rendering. The simulation results verify that the proposed dual-buffer segment scheduling algorithm reduces the viewport switching latency and utilizes the available bandwidth more efficiently. As a result, a more consistent immersive 360° video viewing experience can be presented to the user.
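The decoupling of base layer and viewport scheduling described above can be sketched as a single decision function. This is a minimal illustration, not the paper's algorithm: the thresholds (`base_min_s`, `base_target_s`, `viewport_target_s`), the priority order, and the mapping of yaw angle to one of `num_tiles` equal-width tiles are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class DualBufferState:
    base_level_s: float      # buffered low-quality 360° video, in seconds
    viewport_level_s: float  # buffered high-quality viewport video, in seconds


def next_request(state: DualBufferState,
                 yaw_deg: float,
                 num_tiles: int = 8,
                 base_min_s: float = 2.0,
                 base_target_s: float = 6.0,
                 viewport_target_s: float = 1.0) -> Optional[Tuple[str, Optional[int]]]:
    """Decide what to download next (hypothetical policy).

    Returns ("base", None), ("viewport", tile_index), or None when idle.
    """
    # 1. Protect playback continuity: the base layer must never run dry.
    if state.base_level_s < base_min_s:
        return ("base", None)
    # 2. Keep the viewport buffer short so that each high-quality request
    #    is issued late and uses the most recent head orientation.
    if state.viewport_level_s < viewport_target_s:
        tile = int((yaw_deg % 360.0) // (360.0 / num_tiles))
        return ("viewport", tile)
    # 3. Otherwise top up the long base layer buffer.
    if state.base_level_s < base_target_s:
        return ("base", None)
    return None


decision = next_request(DualBufferState(base_level_s=4.0, viewport_level_s=0.2),
                        yaw_deg=100.0)
```

Keeping the viewport buffer much shorter than the base layer buffer is the design choice that matters here: a long base buffer absorbs bandwidth variation, while a short viewport buffer limits how stale a high-quality segment can be when the viewer turns.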
This paper proposes a layer-based, buffer-aware rate adaptation design that avoids abrupt video quality
fluctuations, reduces re-buffering latency, and improves bandwidth utilization compared to a conventional
simulcast-based adaptive streaming system. The proposed adaptation design schedules DASH segment requests based on the
estimated bandwidth, the dependencies among video layers, and the fullness of each layer buffer.
Scalable HEVC (SHVC) is a state-of-the-art video coding technique that can alleviate various issues caused by
simulcast-based adaptive video streaming. With scalable coded video streams, the video is encoded once into a number
of layers representing different qualities and/or resolutions: a base layer (BL) and one or more enhancement layers (ELs),
each incrementally enhancing the quality of the lower layers. Such a layer-based coding structure allows fine-granularity
rate adaptation for video streaming applications.
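A request scheduler that takes estimated bandwidth, layer dependencies, and per-layer buffer fullness into account could look roughly like the sketch below. The rule set is an assumption made for illustration (the function name `pick_layer`, the per-layer buffer targets, and the cumulative-bitrate check are hypothetical); the paper's actual adaptation logic is not given in the abstract.

```python
from typing import Optional, Sequence


def pick_layer(buffer_s: Sequence[float],
               layer_kbps: Sequence[float],
               bandwidth_kbps: float,
               targets_s: Sequence[float]) -> Optional[int]:
    """Pick the layer whose next segment should be requested (hypothetical rules).

    Index 0 is the base layer; higher indices are enhancement layers.
    Returns None when every eligible layer buffer is full enough.
    """
    cumulative = 0.0
    for layer, (level, rate, target) in enumerate(
            zip(buffer_s, layer_kbps, targets_s)):
        cumulative += rate
        # Bandwidth check: an enhancement layer is only worth fetching if the
        # link can sustain it together with all layers below it.
        if layer > 0 and cumulative > bandwidth_kbps:
            break
        # Dependency check: an enhancement segment cannot be decoded unless
        # the layer below has already buffered the same media time.
        if layer > 0 and level >= buffer_s[layer - 1]:
            continue
        # Fullness check: fetch the lowest layer that is below its target.
        if level < target:
            return layer
    return None


# Made-up state: BL buffer is full, EL1 is lagging, EL2 is nearly empty.
choice = pick_layer(buffer_s=[10.0, 4.0, 1.0],
                    layer_kbps=[1000.0, 1500.0, 2500.0],
                    bandwidth_kbps=3000.0,
                    targets_s=[10.0, 6.0, 3.0])
```

With these made-up numbers the scheduler picks layer 1: the base layer is already at its target, the link can sustain BL plus EL1 (2500 kbps), and EL1 is below its own target.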
Two video streaming use cases are presented in this paper. The first use case streams HD SHVC video over a
wireless network where the available bandwidth varies, and the proposed layer-based streaming approach is
compared with the conventional simulcast streaming approach. The second use case streams
4K/UHD SHVC video over a hybrid access network that consists of a 5G millimeter wave high-speed wireless link and a
conventional wired or WiFi network. The simulation results verify that the proposed layer-based rate adaptation
approach utilizes the bandwidth more efficiently. As a result, a more consistent viewing experience with higher
quality video content and minimal video quality fluctuations can be presented to the user.
Scalable video coding provides an efficient solution to support video playback on heterogeneous devices with various channel conditions in heterogeneous networks. SHVC is the latest scalable video coding standard based on the HEVC standard. To improve enhancement layer coding efficiency, inter-layer prediction using texture and motion information generated from the base layer is used for enhancement layer coding. However, the overall performance of the SHVC reference encoder is not fully optimized because the rate-distortion optimization (RDO) processes in the base and enhancement layers are considered independently. It is difficult to directly extend the existing joint-layer optimization methods to SHVC due to the complicated coding tree block splitting decisions and in-loop filtering processes (e.g., deblocking and sample adaptive offset (SAO) filtering) in HEVC. To solve these problems, a joint-layer optimization method is proposed that adjusts the quantization parameter (QP) to optimally allocate bits between layers. Furthermore, to allocate resources more appropriately, the proposed method also considers the viewing probability of the base and enhancement layers according to the packet loss rate. Based on the viewing probability, a novel joint-layer RD cost function is proposed for joint-layer RDO encoding. The QP values of the coding tree units (CTUs) in lower layers that are referenced by higher layers are decreased accordingly, and the QP values of the remaining CTUs are increased to keep the total number of bits unchanged. Finally, the QP values with the minimal joint-layer RD cost are selected to match the viewing probability. The proposed method was applied to the third temporal level (TL-3) pictures in the Random Access configuration. Simulation results demonstrate that the proposed joint-layer optimization method can improve coding performance by 1.3% for these TL-3 pictures compared to the SHVC reference encoder without joint-layer optimization.
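The abstract does not state the joint-layer RD cost function. One plausible form, assumed here purely for illustration, weights each layer's distortion by the probability that the viewer ends up watching that layer and adds the usual Lagrangian rate term. All names (`Candidate`, `joint_cost`, `best_candidate`) and all numbers below are hypothetical.

```python
from typing import NamedTuple, Sequence


class Candidate(NamedTuple):
    bl_qp_delta: int  # QP change applied to referenced base layer CTUs
    d_bl: float       # distortion if only the base layer is viewed
    d_el: float       # distortion if the enhancement layer is viewed
    bits: float       # total bits across both layers


def joint_cost(c: Candidate, p_el: float, lam: float) -> float:
    """Assumed cost: J = (1 - p_el) * D_BL + p_el * D_EL + lambda * R."""
    return (1.0 - p_el) * c.d_bl + p_el * c.d_el + lam * c.bits


def best_candidate(cands: Sequence[Candidate], p_el: float, lam: float) -> Candidate:
    """Return the QP adjustment with the smallest joint-layer cost."""
    return min(cands, key=lambda c: joint_cost(c, p_el, lam))


# Made-up measurements; total bits are held constant, as in the abstract.
candidates = [
    Candidate(bl_qp_delta=0, d_bl=10.0, d_el=6.0, bits=1000.0),
    Candidate(bl_qp_delta=-1, d_bl=8.0, d_el=6.5, bits=1000.0),
    Candidate(bl_qp_delta=-2, d_bl=7.0, d_el=8.0, bits=1000.0),
]
chosen_low_loss = best_candidate(candidates, p_el=0.9, lam=0.01)
chosen_high_loss = best_candidate(candidates, p_el=0.2, lam=0.01)
```

With these made-up numbers, a high enhancement layer viewing probability (low packet loss) keeps the base layer QP unchanged, while a low probability shifts bits toward the base layer by selecting the -2 adjustment.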
This paper proposes a parallel decoding framework for scalable HEVC (SHVC). Various optimization technologies are implemented on the basis of the SHVC reference software SHM-2.0 to achieve real-time decoding speed for the two-layer spatial scalability configuration. SHVC decoder complexity is analyzed with profiling information. The decoding process at each layer and the up-sampling process are designed to run in parallel and are scheduled by a high-level application task manager. Within each layer, multi-threaded decoding is applied to accelerate the layer decoding speed. Entropy decoding, reconstruction, and in-loop processing are pipelined across multiple threads based on groups of coding tree units (CTUs). A group of CTUs is treated as a processing unit in each pipeline stage to achieve a better trade-off between parallelism and synchronization. Motion compensation, inverse quantization, and inverse transform modules are further optimized with SSE4 SIMD instructions. Simulations on a desktop with an Intel i7-2600 processor running at 3.4 GHz show that the parallel SHVC software decoder is able to decode 1080p spatial 2x bitstreams at up to 60 fps (frames per second) and 1080p spatial 1.5x bitstreams at up to 50 fps, for bitstreams generated under the SHVC common test conditions of the JCT-VC standardization group. The decoding performance at various bitrates with different optimization technologies and different numbers of threads is compared in terms of decoding speed and resource usage, including processor and memory.
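The three-stage pipeline over CTU groups can be sketched structurally with one thread per stage and FIFO queues between stages. This is only a structural illustration: the stage bodies are placeholders, the names `run_pipeline` and `make_stage` are hypothetical, and Python threads will not reproduce the speedups of the native decoder described above.

```python
import queue
import threading
from typing import Callable, List, Tuple

Group = Tuple[List[int], List[str]]  # (CTU indices, stages applied so far)


def make_stage(name: str) -> Callable[[Group], Group]:
    """Build a placeholder stage that only records that it ran."""
    def stage(group: Group) -> Group:
        ctus, trace = group
        return ctus, trace + [name]
    return stage


def run_pipeline(num_ctus: int, group_size: int,
                 stages: List[Callable[[Group], Group]]) -> List[Group]:
    """Push CTU groups through the stages, one worker thread per stage."""
    queues = [queue.Queue() for _ in range(len(stages) + 1)]

    def worker(stage, inbox, outbox):
        while True:
            item = inbox.get()
            if item is None:       # end-of-picture marker: forward and stop
                outbox.put(None)
                return
            outbox.put(stage(item))

    threads = [threading.Thread(target=worker, args=(s, queues[i], queues[i + 1]))
               for i, s in enumerate(stages)]
    for t in threads:
        t.start()
    # A group of CTUs, not a single CTU, is the unit of work handed between
    # stages, which reduces how often the threads have to synchronize.
    for start in range(0, num_ctus, group_size):
        queues[0].put((list(range(start, min(start + group_size, num_ctus))), []))
    queues[0].put(None)

    results = []
    while True:
        item = queues[-1].get()
        if item is None:
            break
        results.append(item)
    for t in threads:
        t.join()
    return results


stage_names = ["entropy_decode", "reconstruct", "in_loop_filter"]
out = run_pipeline(num_ctus=10, group_size=4,
                   stages=[make_stage(n) for n in stage_names])
```

The group size is the tuning knob for the trade-off mentioned above: larger groups mean fewer queue hand-offs but less overlap between stages, and smaller groups mean the opposite.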
The next-generation video coding standard, High Efficiency Video Coding (HEVC), is under development by the
Joint Collaborative Team on Video Coding (JCT-VC) of the ITU-T VCEG and the ISO/IEC MPEG. As the first
version of the single-layer HEVC standard comes close to completion, there is great interest in extending the standard
with scalable capabilities. In this paper, an inter-layer Motion Field Mapping (MFM) algorithm is proposed for the
scalable extension of HEVC to generate the motion field of inter-layer reference pictures, such that the correlation
between the motion vectors (MVs) of the base layer and the enhancement layer can be exploited. Moreover, because the
proposed method does not change any block-level operation, the existing single-layer encoder and decoder logic of
HEVC can be applied directly, without modifying motion vector prediction for the enhancement layer. The
experimental results show the effectiveness of the proposed MFM method in improving the performance of
enhancement-layer motion prediction in scalable HEVC.
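The idea of generating a motion field for an inter-layer reference picture can be illustrated with a simplified mapping: each block of the up-sampled picture takes the motion vector of its co-located base-layer block, scaled by the spatial ratio. The sketch below makes several assumptions that are not taken from the paper: motion is stored on the same block grid in both layers, co-location uses block indices rather than sample positions, and scaling uses Python's `round` instead of the codec's fixed-point arithmetic.

```python
from typing import List, Tuple

MotionField = List[List[Tuple[int, int]]]  # per-block (mv_x, mv_y)


def map_motion_field(bl_mvs: MotionField, scale: float,
                     el_rows: int, el_cols: int) -> MotionField:
    """Build the motion field of an inter-layer reference picture (sketch)."""
    mapped: MotionField = []
    for r in range(el_rows):
        row = []
        for c in range(el_cols):
            # Locate the co-located base-layer block, clamped to the picture.
            src_r = min(int(r / scale), len(bl_mvs) - 1)
            src_c = min(int(c / scale), len(bl_mvs[0]) - 1)
            mv_x, mv_y = bl_mvs[src_r][src_c]
            # Scale the vector to the enhancement-layer resolution.
            row.append((round(mv_x * scale), round(mv_y * scale)))
        mapped.append(row)
    return mapped


# Made-up 2x2 base-layer motion field mapped for 2x spatial scalability.
bl_field = [[(1, -2), (3, 0)],
            [(0, 4), (-5, 6)]]
el_field = map_motion_field(bl_field, scale=2.0, el_rows=4, el_cols=4)
```

Because the mapped field is attached to the inter-layer reference picture as a whole, the enhancement-layer motion vector predictor can treat that picture like any other reference, which is consistent with the claim above that no block-level operation changes.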