MPEG Omnidirectional Media Format (OMAF) specifies both a viewport-dependent video profile and a viewport-dependent presentation profile to enable immersive media applications. A sub-picture-based approach for viewport-dependent streaming is one of the main approaches being explored in MPEG OMAF standardization. This paper presents a sub-picture-based omnidirectional video live streaming platform with state-of-the-art technologies integrated on both the server and client sides to illustrate the benefits of such a viewport-dependent omnidirectional video streaming approach. The technologies include omnidirectional video acquisition, sub-picture partitioning, real-time GPU-accelerated HEVC encoding, and DASH-based live streaming. The presented platform supports virtual reality (VR) clients including both VR head-mounted displays (HMDs) and conventional 2D displays. Viewing orientation tracking and real-time viewport extraction are also supported. As with all live streaming platforms, one of the main goals of our platform is to minimize end-to-end system latency. A new metric called Comparable-Quality Viewport Switching (CQVS) latency is proposed to evaluate the performance of viewport-dependent video streaming and presentation. The CQVS latency is defined as the amount of time it takes for the viewport video quality to improve to a level comparable to that prior to viewport switching. The platform was demonstrated at the joint 3GPP and VRIF workshop on VR and at the 2018 Mobile World Congress as one of the first OMAF-compliant viewport-dependent live streaming solutions.
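The CQVS definition above can be illustrated with a minimal sketch. It assumes the viewport quality is available as timestamped samples (for example PSNR in dB) and that "comparable" means within a fixed margin of the last quality observed before the switch. The function name, the sample format, and the `tolerance` parameter are assumptions made for illustration and are not taken from the paper.

```python
from typing import Optional, Sequence, Tuple


def cqvs_latency(samples: Sequence[Tuple[float, float]],
                 switch_time: float,
                 tolerance: float = 1.0) -> Optional[float]:
    """Seconds from the viewport switch until quality is comparable again.

    samples: (timestamp_seconds, viewport_quality) pairs sorted by time.
    tolerance: assumed margin that defines "comparable" quality.
    Returns None if there is no pre-switch sample or quality never recovers.
    """
    before = [q for t, q in samples if t < switch_time]
    if not before:
        return None  # no reference quality before the switch
    reference = before[-1]  # last quality observed before the switch
    for t, q in samples:
        # First sample at or after the switch that reaches the reference level.
        if t >= switch_time and q >= reference - tolerance:
            return t - switch_time
    return None


trace = [(0.0, 40.0), (1.0, 40.0), (1.5, 30.0),
         (2.0, 32.0), (2.5, 39.5), (3.0, 40.0)]
latency = cqvs_latency(trace, switch_time=1.5)
```

In this trace the quality drops at the switch (t = 1.5 s) and returns to within the margin at t = 2.5 s, so the sketch reports the difference between those two timestamps.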
360° video is an emerging format in the media industry, enabled by the growing availability of virtual reality devices. It gives the viewer a new sense of presence and immersion. Compared to conventional rectilinear video (2D or 3D), 360° video poses a new and difficult set of engineering challenges for video processing and delivery. Enabling a comfortable and immersive user experience requires very high video quality and very low latency, while the large video file size makes it difficult to deliver 360° video with high quality at scale. Conventionally, 360° video represented in equirectangular or other projection formats can be encoded as a single standards-compliant bitstream using existing video codecs such as H.264/AVC or H.265/HEVC. Such a method usually needs very high bandwidth to provide an immersive user experience. At the client side, however, much of this bandwidth and of the computational power used to decode the video is wasted because the user watches only a small portion (i.e., the viewport) of the entire picture. Viewport-dependent 360° video processing and delivery approaches spend more bandwidth on the viewport than on non-viewport regions and are therefore able to reduce the overall transmission bandwidth. This paper proposes a dual-buffer segment scheduling algorithm for viewport-adaptive streaming methods to reduce latency when switching between high-quality viewports in 360° video streaming. The approach decouples the scheduling of viewport segments and non-viewport segments to ensure that the requested viewport segment matches the latest user head orientation. A base layer buffer stores all lower-quality segments, and a viewport buffer stores high-quality viewport segments corresponding to the viewer's most recent head orientation. The scheduling scheme determines the viewport request time based on the buffer status and the head orientation.
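The decoupled scheduling described above can be sketched as a small decision function. This is an illustration under stated assumptions, not the paper's algorithm: the thresholds (`base_min`, `base_target`, `viewport_target`), the integer viewport identifier, and the rule order are hypothetical. The idea shown is that the base layer buffer is kept deep to avoid stalls, while the viewport buffer is kept shallow so that each high-quality request reflects the latest head orientation.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DualBufferState:
    base_seconds: float               # media held in the base layer buffer
    viewport_seconds: float           # media held in the viewport buffer
    buffered_viewport: Optional[int]  # viewport covered by buffered high-quality segments


def next_request(state: DualBufferState,
                 current_viewport: int,
                 base_min: float = 1.0,
                 base_target: float = 4.0,
                 viewport_target: float = 1.0) -> str:
    """Pick the next segment type to request: 'base', 'viewport', or 'idle'."""
    # 1. Protect playback first: a nearly empty base buffer risks a stall.
    if state.base_seconds < base_min:
        return "base"
    # 2. Fetch high quality late, and refetch if the head orientation moved.
    stale = state.buffered_viewport != current_viewport
    if stale or state.viewport_seconds < viewport_target:
        return "viewport"
    # 3. Use spare time to deepen the base buffer up to its target.
    if state.base_seconds < base_target:
        return "base"
    return "idle"


decision = next_request(DualBufferState(3.0, 2.0, buffered_viewport=2),
                        current_viewport=3)
```

Here the viewport buffer is full but covers viewport 2 while the viewer now looks at viewport 3, so the sketch asks for a viewport segment before topping up the base buffer.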
This paper also discusses how to deploy the proposed scheduling design for various viewport-adaptive video streaming methods. The proposed dual-buffer segment scheduling method is implemented in an end-to-end, tile-based, viewport-adaptive 360° video streaming platform, where the entire 360° video is divided into a number of tiles and each tile is independently encoded into representations at multiple quality levels. The client requests a quality level for each tile based on the viewer's head orientation and the available bandwidth, and then composes all tiles together for rendering. The simulation results verify that the proposed dual-buffer segment scheduling algorithm reduces the viewport switching latency and uses the available bandwidth more efficiently. As a result, a more consistent immersive 360° video viewing experience can be presented to the user.
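A minimal sketch of per-tile quality selection follows, assuming tiles are identified only by the yaw angle of their centre, every tile has the same bitrate at a given quality level, and upgrades are granted greedily to the tiles closest to the viewing direction. The function name, the greedy policy, and all parameters are hypothetical; the paper's platform may select qualities differently.

```python
from typing import List, Sequence


def yaw_distance(a: float, b: float) -> float:
    """Smallest absolute angle in degrees between two yaw values."""
    return abs((a - b + 180.0) % 360.0 - 180.0)


def select_tile_qualities(tile_yaw_centers: Sequence[float],
                          head_yaw: float,
                          fov: float,
                          level_bitrates: Sequence[float],
                          budget: float) -> List[int]:
    """Return one quality level index per tile (0 = lowest quality).

    level_bitrates: per-tile bitrate of each quality level, ascending.
    budget: total bitrate the client may spend on all tiles.
    """
    n = len(tile_yaw_centers)
    levels = [0] * n                     # every tile starts at the lowest level
    spent = n * level_bitrates[0]
    in_view = [i for i in range(n)
               if yaw_distance(tile_yaw_centers[i], head_yaw) <= fov / 2.0]
    # Upgrade tiles nearest the viewing direction first.
    in_view.sort(key=lambda i: yaw_distance(tile_yaw_centers[i], head_yaw))
    for i in in_view:
        for level in range(len(level_bitrates) - 1, 0, -1):
            extra = level_bitrates[level] - level_bitrates[0]
            if spent + extra <= budget:
                levels[i] = level
                spent += extra
                break
    return levels


levels = select_tile_qualities([0, 90, 180, 270], head_yaw=80, fov=100,
                               level_bitrates=[1.0, 4.0], budget=8.0)
```

With four tiles and a 100° field of view centred at 80°, only the tile at 90° falls inside the viewport, so it alone is upgraded while the budget allows.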
This paper proposes a layer-based, buffer-aware rate adaptation design that avoids abrupt video quality
fluctuations, reduces re-buffering latency, and improves bandwidth utilization compared to a conventional
simulcast-based adaptive streaming system. The proposed adaptation design schedules DASH segment requests based on the
estimated bandwidth, the dependencies among video layers, and the fullness of each layer buffer.
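The three inputs named above (estimated bandwidth, layer dependencies, and layer buffer fullness) can be combined as in the following sketch. It is a hypothetical illustration rather than the proposed design: the per-layer targets, the rule that a layer may only be fetched while it holds less media than the layer below it, and the function name are all assumptions.

```python
from typing import Optional, Sequence


def next_layer(buffers: Sequence[float],
               targets: Sequence[float],
               layer_bitrates: Sequence[float],
               est_bandwidth: float) -> Optional[int]:
    """Pick the layer whose next segment should be requested.

    buffers[k]: seconds of media buffered for layer k (0 = base layer).
    targets[k]: assumed buffer fullness target for layer k.
    layer_bitrates[k]: bitrate added by layer k on top of lower layers.
    Returns a layer index, or None when nothing should be requested.
    """
    cumulative = 0.0
    for k in range(len(buffers)):
        cumulative += layer_bitrates[k]
        # The base layer is always eligible; an enhancement layer is only
        # worth fetching if the bandwidth sustains it plus all lower layers.
        if k > 0 and cumulative > est_bandwidth:
            break
        # Dependency: layer k is useless beyond what layer k-1 already covers.
        dependency_ok = k == 0 or buffers[k] < buffers[k - 1]
        if buffers[k] < targets[k] and dependency_ok:
            return k
    return None


choice = next_layer(buffers=[6.0, 1.0], targets=[6.0, 2.0],
                    layer_bitrates=[1.0, 2.0], est_bandwidth=5.0)
```

Lower layers are served first, so a drop in bandwidth removes only the enhancement requests and the base layer buffer keeps playback running.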
Scalable HEVC (SHVC) video coding is a state-of-the-art video coding technique that can alleviate various issues caused by
simulcast-based adaptive video streaming. With scalable coded video streams, the video is encoded once into a number
of layers representing different qualities and/or resolutions: a base layer (BL) and one or more enhancement layers (ELs),
each incrementally enhancing the quality of the lower layers. Such a layer-based coding structure allows fine-granularity
rate adaptation for video streaming applications.
Two video streaming use cases are presented in this paper. The first use case is to stream HD SHVC video over a
wireless network where the available bandwidth varies; a performance comparison between the proposed layer-based
streaming approach and the conventional simulcast streaming approach is provided. The second use case is to stream
4K/UHD SHVC video over a hybrid access network that consists of a 5G millimeter wave high-speed wireless link and a
conventional wired or Wi-Fi network. The simulation results verify that the proposed layer-based rate adaptation
approach is able to utilize the bandwidth more efficiently. As a result, a more consistent viewing experience with higher
quality video content and minimal video quality fluctuations can be presented to the user.
This paper proposes an efficient Scalable High Efficiency Video Coding (SHVC) to High Efficiency Video Coding (HEVC) transcoder, which can reduce the transcoding complexity significantly and provide a desired trade-off between transcoding complexity and transcoded video quality. To reduce the transcoding complexity, some of the coding information in the SHVC bitstream, such as coding unit (CU) depth, prediction mode, merge mode, motion vector information, intra direction information, and transform unit (TU) depth information, is mapped and transcoded to the single-layer HEVC bitstream. One major difficulty in transcoding arises when trying to reuse the motion information from the SHVC bitstream, since motion vectors referring to inter-layer reference (ILR) pictures cannot be reused directly in transcoding. Reusing motion information obtained from ILR pictures for those prediction units (PUs) greatly reduces the complexity of the SHVC transcoder, but it also significantly reduces picture quality. In addition, pictures corresponding to the intra refresh pictures in the base layer (BL) are coded as P pictures in the enhancement layer (EL) of the SHVC bitstream, and directly reusing the intra information from the BL for transcoding does not yield good coding efficiency. To solve these problems, various transcoding technologies are proposed. The proposed technologies offer different trade-offs between transcoding speed and transcoding quality. They are implemented on the basis of the reference software SHM-6.0 and HM-14.0 for the two-layer spatial scalability configuration. Simulations show that the proposed SHVC software transcoder reduces the transcoding complexity by up to 98-99% in the low-complexity transcoding mode compared with the cascaded re-encoding method. The transcoder performance at various bitrates with different transcoding modes is compared in terms of transcoding speed and transcoded video quality.
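The speed-versus-quality trade-off around ILR motion vectors can be pictured with a small decision sketch. Everything in it is hypothetical: the two mode names, the `PredictionUnit` fields, and the returned strategy labels are illustrative stand-ins and do not describe the transcoding modes actually implemented in the paper.

```python
from dataclasses import dataclass
from enum import Enum


class Mode(Enum):
    LOW_COMPLEXITY = "low_complexity"  # favour transcoding speed
    HIGH_QUALITY = "high_quality"      # favour transcoded video quality


@dataclass
class PredictionUnit:
    is_inter: bool        # True for inter-predicted PUs
    refers_to_ilr: bool   # motion vector points at an inter-layer reference picture


def motion_strategy(pu: PredictionUnit, mode: Mode) -> str:
    """Decide how the single-layer HEVC encoder obtains prediction data for a PU."""
    if not pu.is_inter:
        return "reuse_intra_direction"
    if not pu.refers_to_ilr:
        # Temporal motion vectors carry over to the single-layer bitstream.
        return "reuse_motion_vector"
    # ILR references do not exist in single-layer HEVC, so the vector cannot
    # be copied as-is; choose between a cheap shortcut and a fresh search.
    if mode is Mode.LOW_COMPLEXITY:
        return "derive_from_ilr_motion"
    return "motion_search"


strategy = motion_strategy(PredictionUnit(is_inter=True, refers_to_ilr=True),
                           Mode.HIGH_QUALITY)
```

Only PUs that reference ILR pictures take the expensive path in the quality-oriented mode; all other PUs reuse the decoded SHVC coding information.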
This paper proposes a parallel decoding framework for scalable HEVC (SHVC). Various optimization technologies are implemented on the basis of the SHVC reference software SHM-2.0 to achieve real-time decoding speed for the two-layer spatial scalability configuration. SHVC decoder complexity is analyzed with profiling information. The decoding process at each layer and the up-sampling process are designed to run in parallel and are scheduled by a high-level application task manager. Within each layer, multi-threaded decoding is applied to accelerate the layer decoding speed. Entropy decoding, reconstruction, and in-loop processing are pipelined across multiple threads based on groups of coding tree units (CTUs). A group of CTUs is treated as a processing unit in each pipeline stage to achieve a better trade-off between parallelism and synchronization. The motion compensation, inverse quantization, and inverse transform modules are further optimized with SSE4 SIMD instructions. Simulations on a desktop with an Intel i7-2600 processor running at 3.4 GHz show that the parallel SHVC software decoder is able to decode 1080p spatial 2x bitstreams at up to 60 fps (frames per second) and 1080p spatial 1.5x bitstreams at up to 50 fps, for bitstreams generated with the SHVC common test conditions of the JCT-VC standardization group. The decoding performance at various bitrates with different optimization technologies and different numbers of threads is compared in terms of decoding speed and resource usage, including processor and memory.
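The pipeline structure described above (one thread per stage, with a group of CTUs as the unit handed between stages) can be sketched as follows. This is a structural illustration only: the stage functions are placeholders, the helper names are hypothetical, and Python threads will not reproduce the speed of the optimized SHM-based decoder.

```python
import queue
import threading
from typing import Callable, List, Sequence

_DONE = object()  # sentinel that tells a stage no more groups will arrive


def group_ctus(num_ctus: int, group_size: int) -> List[range]:
    """Split CTU indices 0..num_ctus-1 into consecutive groups."""
    return [range(start, min(start + group_size, num_ctus))
            for start in range(0, num_ctus, group_size)]


def run_pipeline(items: Sequence, stages: Sequence[Callable]) -> list:
    """Run each item through all stages, one worker thread per stage."""
    queues = [queue.Queue() for _ in range(len(stages) + 1)]

    def worker(stage: Callable, q_in: queue.Queue, q_out: queue.Queue) -> None:
        while True:
            item = q_in.get()
            if item is _DONE:
                q_out.put(_DONE)  # propagate shutdown to the next stage
                return
            q_out.put(stage(item))

    threads = [threading.Thread(target=worker,
                                args=(stage, queues[i], queues[i + 1]))
               for i, stage in enumerate(stages)]
    for t in threads:
        t.start()
    for item in items:
        queues[0].put(item)
    queues[0].put(_DONE)

    results = []
    while True:
        item = queues[-1].get()
        if item is _DONE:
            break
        results.append(item)
    for t in threads:
        t.join()
    return results


# Placeholder stages standing in for entropy decoding, reconstruction,
# and in-loop processing; each just records that it handled the group.
stages = [lambda g: g + "E", lambda g: g + "R", lambda g: g + "L"]
decoded = run_pipeline(["g0", "g1", "g2"], stages)
```

A larger `group_size` means fewer hand-offs between threads but less overlap between stages, which is the parallelism-versus-synchronization trade-off the text refers to.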