In an interactive multiview image navigation system, a user requests switches to adjacent views while observing
the static 3D scene from different viewpoints. In response, the server transmits encoded data to enable client-side
decoding and rendering of the requested viewpoint images. Clearly, consecutive requested viewpoint images are
correlated, and this correlation can be exploited to lower the transmission rate. In previous works, this
is done using a pixel-based synthesis and coding approach for a view-switch along the x-dimension (horizontal camera
motion): given texture and depth maps of the previous view, texture pixels are individually shifted horizontally
to the newly requested view, each according to its disparity value, via depth-image-based rendering (DIBR).
Unknown pixels in the disoccluded region of the new view (pixels not visible in the previous view) are either
inpainted, or intra-coded and transmitted by the server for reconstruction at the decoder.
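The pixel-based DIBR warping described above can be sketched as follows. This is a minimal illustration for a grayscale image with integer disparities, not the implementation used in prior works; the function name and the hole marker (-1) are our own conventions, and conflicts where several source pixels map to the same target are resolved by keeping the pixel with the larger disparity (i.e., the one closer to the camera).

```python
import numpy as np

def dibr_horizontal_shift(texture, disparity):
    """Warp a grayscale texture to an adjacent horizontal view via DIBR.

    Each pixel is shifted horizontally by its disparity; target pixels
    that receive no source pixel are disoccluded holes (marked -1),
    to be inpainted or intra-coded and sent by the server.
    """
    h, w = texture.shape
    warped = np.full((h, w), -1.0)          # -1 marks disoccluded holes
    zbuf = np.full((h, w), -np.inf)         # larger disparity = closer to camera
    for y in range(h):
        for x in range(w):
            d = int(round(disparity[y, x]))
            xn = x + d                      # shifted horizontal position
            if 0 <= xn < w and d > zbuf[y, xn]:
                zbuf[y, xn] = d
                warped[y, xn] = texture[y, x]
    return warped
```

For example, warping a row `[10, 20, 30, 40]` with uniform disparity 1 shifts each pixel one position right, leaving a hole at the left edge where the new view sees scene content absent from the previous view.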
In this paper, to enable an efficient view-switch along the z-dimension (camera motion into / out of the scene),
we propose an alternative layer-based synthesis and coding approach. Specifically, we first divide each multiview
image into depth layers, where adjacent pixels with similar depth values are grouped into the same layer. During
a view-switch into the scene, the spatial region of a layer is enlarged via super-resolution, where the scale factor
is determined by the distance between the layer and the camera. Conversely, during a view-switch out of the
scene, the spatial region of a layer is shrunk via low-pass filtering and down-sampling. Because rescaling
reconstructs the depth layers in the new view at high quality, the server needs to code and transmit a layer of
the new view only in the rare case when the layer-based reconstruction is poor, saving transmission rate.
Experiments show that our layer-based approach can reduce bit-rate by up to 35% compared to the previous
pixel-based approach.
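The per-layer rescaling can be sketched as below. This is a simplified illustration under assumptions of our own: a pinhole camera model (so a fronto-parallel layer at depth z appears scaled by z / (z - dz) after the camera moves dz into the scene), nearest-neighbour enlargement as a stand-in for the super-resolution step, and a fixed 2x2 box filter for the low-pass / down-sampling case. The function names are hypothetical.

```python
import numpy as np

def layer_scale_factor(layer_depth, dz):
    """Scale factor for a layer at depth layer_depth after the camera
    moves dz along the z-axis (pinhole-model assumption).

    dz > 0 (into the scene) gives s > 1: the layer is enlarged;
    dz < 0 (out of the scene) gives s < 1: the layer is shrunk.
    Closer layers (small depth) change scale more than distant ones.
    """
    return layer_depth / (layer_depth - dz)

def rescale_layer(layer, s):
    """Rescale one depth layer by factor s.

    s >= 1: integer-factor enlargement (stand-in for super-resolution).
    s == 0.5: 2x2 box low-pass filter, whose block means are already
    the down-sampled layer. Other factors are omitted in this sketch.
    """
    if s >= 1.0:                                  # view-switch into the scene
        k = int(round(s))
        return np.kron(layer, np.ones((k, k)))    # replicate each pixel k x k
    h, w = layer.shape                            # view-switch out of the scene
    return layer.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))
```

For instance, with the camera stepping dz = 2 toward a layer at depth 4, the scale factor is 4 / (4 - 2) = 2, so that layer's region doubles in each dimension, while a layer at depth 8 would scale by only 8 / 6.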