In recent years, we have observed the advent of plenoptic modalities such as light fields, point clouds and holography in many devices and applications. Beyond the many technical challenges brought by these new modalities, a particular challenge is arising on the horizon: providing interoperability between these devices and applications and, in addition, at a cross-modality level. Based on these observations, the JPEG committee (ISO/IEC JTC1/SC29/WG1 and ITU-T SG16) has initiated a new standardization initiative, JPEG Pleno, intended to define an efficient framework addressing these interoperability issues. This paper provides an overview of its current status and future plans.
High-fidelity, interactive full parallax light field displays currently under development have unique and challenging requirements due to the human factors and space constraints imposed on them. A high-fidelity light field display with no vergence-accommodation conflict and a desktop-size footprint implies a display with tens of gigapixels and a pixel pitch in the range of 10 microns or below. Achieving interactive image and video display performance on such displays requires a fundamental redesign of the display input interface and image processing pipeline. In this paper, we discuss various ways of addressing these issues with light field compression and display system design innovations.
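The gigapixel figure above follows from simple arithmetic. In the sketch below, the panel dimensions are an illustrative assumption, not values from the paper; only the ~10 micron pitch comes from the text.

```python
# Back-of-the-envelope check of the gigapixel claim.
# The 500 mm x 300 mm panel size is an illustrative assumption.
def total_pixels(width_mm: int, height_mm: int, pitch_um: int) -> int:
    """Pixel count of a flat panel of the given size and pixel pitch."""
    return (width_mm * 1000 // pitch_um) * (height_mm * 1000 // pitch_um)

pixels = total_pixels(500, 300, 10)
print(f"{pixels / 1e9:.1f} gigapixels")  # 1.5 gigapixels; sub-10-micron pitches push this toward tens
```

Halving the pitch quadruples the count, which is why pitches below 10 μm quickly reach tens of gigapixels.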
Ostendo’s Quantum Photonic Imager (QPI) is a very small pixel pitch, emissive display with high brightness and low power consumption. We used QPIs to create high-performance light field display tiles with a very small form factor. Using these tiles, we created various full parallax light field displays that demonstrate a small form factor, high resolution and focus cues. In this paper, we explain the design choices made in creating the displays and their effects on display performance, detailing the system design approach including hardware design, software design, compression methods and human factors.
Full parallax light field displays require high pixel density and huge amounts of data. Compression is a necessary tool used by 3D display systems to cope with the high bandwidth requirements. One of the formats adopted by MPEG for 3D video coding standards is the use of multiple views with associated depth maps. Depth maps enable the coding of a reduced number of views, and are used by compression and synthesis software to reconstruct the light field. However, most of the developed coding and synthesis tools target linearly arranged cameras with small baselines. Here we propose to use the 3D video coding format for full parallax light field coding. We introduce a view selection method inspired by plenoptic sampling, followed by transform-based view coding and view synthesis prediction to code residual views. We determine the minimal requirements for view sub-sampling and present the rate-distortion performance of our proposal. We also compare our method with established video compression techniques, such as H.264/AVC, H.264/MVC, and the new 3D video coding algorithm, 3DV-ATM. Our results show that our method not only improves rate-distortion performance but also better preserves the structure of the perceived light field.
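The abstract describes the view-selection method only as "inspired by plenoptic sampling". As a rough illustration of the general idea, not the paper's actual algorithm, a regular sub-sampling of a camera grid into reference and residual views might be sketched as follows (all names here are hypothetical):

```python
# Hypothetical sketch: choose a regular subgrid of reference views from a
# rows x cols camera grid; every other view is coded as a residual view.
def select_reference_views(rows: int, cols: int, step: int):
    """Return (reference, residual) view coordinates for a regular sub-sampling."""
    refs = {(r, c) for r in range(0, rows, step) for c in range(0, cols, step)}
    residual = [(r, c) for r in range(rows) for c in range(cols) if (r, c) not in refs]
    return sorted(refs), residual

refs, residual = select_reference_views(4, 4, 2)
print(len(refs), len(residual))  # 4 reference views, 12 residual views
```

The reference views would be transform-coded directly, while the residual views are predicted by view synthesis from the references plus depth, with only the prediction error coded.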
We introduce our light field display simulation software, which simulates the image observed by a viewer looking at a full parallax light field display. The software uses the display parameters, viewer location and orientation, and viewer pupil size and focus location to simulate the image observed by the viewer. It has been used to simulate full parallax light field displays of various geometries and complexities, as well as image processing and full parallax light field compression algorithms. The simulation results closely match real-world observations.
With the recent introduction of Ostendo’s Quantum Photonic Imager (QPI) display technology, a very small pixel pitch, emissive display with high brightness and low power consumption became available. We used QPIs to create high-performance light field display tiles with a very small form factor. Using 8 of these QPI light field displays tiled in a 4x2 array, we created a tiled full parallax light field display. Each individual tile combines custom-designed micro lens array layers with monochrome green QPIs, and can address 1000 x 800 pixels placed under an array of 20 x 16 lenslets with 500 μm diameters. The tiles are placed with small gaps to create a tiled display of approximately 46 mm (W) x 17 mm (H) x 2 mm (D) in mechanical dimensions. The prototype tiled full parallax light field display demonstrates a small form factor, high resolution and focus cues.
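The quoted tile parameters are internally consistent, as a quick check shows (all values below are taken from the figures in the abstract):

```python
# Consistency check of the quoted tile geometry.
pixels_x, lenslets_x = 1000, 20
lenslet_diameter_um = 500

px_per_lenslet = pixels_x // lenslets_x                  # 50 pixels behind each lenslet
pixel_pitch_um = lenslet_diameter_um // px_per_lenslet   # 10 um pixel pitch
tile_width_mm = lenslets_x * lenslet_diameter_um / 1000  # 10 mm active width per tile
print(px_per_lenslet, pixel_pitch_um, tile_width_mm)     # 50 10 10.0
```

Four such tiles side by side give 40 mm of active width, which with small inter-tile gaps matches the quoted ~46 mm overall width.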
Full parallax light field displays utilize a large volume of data and demand efficient real-time compression algorithms to be viable. Many compression techniques have been proposed; however, such solutions are impractical in their bandwidth, processing or power requirements for a real-time implementation. Our method exploits the spatio-angular redundancy in a full parallax light field to compress the light field image while reducing the total computational load with minimal perceptual degradation. Objective analysis shows that, depending on content, bandwidth reduction of two to four orders of magnitude is possible. Subjective analysis shows that the compression technique produces images with acceptable quality, and that the system can successfully reproduce the 3D light field, providing natural binocular and full parallax cues.
We describe a set of experiments that compare 2D CRT, shutter glasses and autostereoscopic displays; measure user preference for different tasks on different displays; measure the effect of previous user experience on interaction performance for new tasks; and measure the effect of constraining the user's hand motion and hand-eye coordination. In this set of tests, we used interactive object selection and manipulation tasks with standard scalable configurations of 3D block objects. We also used a 3D depth matching test in which subjects are instructed to align two objects located next to each other on the display to the same depth plane. New subjects tested with the hands-out-of-field-of-view constraint performed more efficiently with glasses than with autostereoscopic displays, meaning they were able to match the objects with less movement. This constraint affected females more negatively than males. From the results of the depth test, we note that previous subjects on average performed better than new subjects: they had more correct results and finished the test faster. The depth test showed that glasses are preferred to autostereoscopic displays in a task that involves only stereoscopic depth.
In this paper we describe experimental measurements and comparisons of human interaction with three different types of stereo computer displays. We compare traditional shutter glasses-based viewing with three-dimensional (3D) autostereoscopic viewing on displays such as the Sharp LL-151-3D and the StereoGraphics SG202. The method of interaction is a sphere-shaped “cyberprop” containing an Ascension Flock-of-Birds tracker that allows a user to manipulate objects by imparting the motion of the sphere to the virtual object. The tracking data is processed with OpenGL to manipulate objects in virtual 3D space, from which we synthesize two or more images as seen by virtual cameras observing them. We concentrate on the quantitative measurement and analysis of human performance for interactive object selection and manipulation tasks using standardized and scalable configurations of 3D block objects. The experiments use a series of progressively more complex block configurations that are rendered in stereo on the various 3D displays. In general, performing the tasks using shutter glasses required less time than using the autostereoscopic displays. While both male and female subjects performed almost equally fast with shutter glasses, male subjects performed better with the LL-151-3D display, while female subjects performed better with the SG202 display. Interestingly, users generally had a slightly higher efficiency in completing a task set using the two autostereoscopic displays than with the shutter glasses, although the differences among the displays were relatively small for all users. Shutter glasses were preferred over the autostereoscopic displays for ease of performing tasks, and were slightly preferred for overall image quality and stereo image quality; however, there was little difference in display preference regarding physical comfort and overall preference. We present some possible explanations of these results and point out the importance of the autostereoscopic "sweet spot" in relation to the user's head and body position.
We describe new techniques for interactive input and manipulation of three-dimensional data using a motion tracking system combined with an autostereoscopic display. Users interact with the system by means of video cameras that track a light source or a user's hand motions in space. We process this 3D tracking data with OpenGL to create or manipulate objects in virtual space. We then synthesize two to nine images as seen by virtual cameras observing the objects and interlace them to drive the autostereoscopic display. The light source is tracked within a separate interaction space, so that users can interact with images appearing both inside and outside the display. With displays that use nine images inside a viewing zone (such as the SG202 autostereoscopic display from StereoGraphics), user head tracking is not necessary because there is a built-in left/right look-around capability. With such multi-view autostereoscopic displays, more than one user can see the interaction at the same time, and more than one person can interact with the display.
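The interlacing step can be illustrated with a simplified column-wise model. Real multi-view panels typically interleave at the subpixel level along a slanted lenticular, so this is only a sketch of the idea, not the actual interlacing pattern used:

```python
# Simplified column-wise interlacer for an N-view autostereoscopic panel:
# output column c is taken from view (c mod N). Real panels interleave at
# the subpixel level along a slant; this is only an illustration.
def interlace_views(views):
    """views: list of N same-size images, each a list of rows of pixel values."""
    n = len(views)
    height, width = len(views[0]), len(views[0][0])
    return [[views[c % n][r][c] for c in range(width)] for r in range(height)]

# Two constant 2x4 "views" (all-0 and all-1) make the interleaving visible:
out = interlace_views([[[v] * 4 for _ in range(2)] for v in range(2)])
print(out[0])  # [0, 1, 0, 1]
```

The lenslet or parallax barrier then sends each column group toward a different viewing direction, producing the look-around effect described above.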
This research explores architectures and design principles for monolithic optoelectronic integrated circuits (OEICs) through the implementation of an optical multi-token-ring network testbed system. Monolithic smart pixel CMOS OEICs are of paramount importance to high performance networks, communication switches, computer interfaces, and parallel signal processing for demanding future multimedia applications. The general testbed system is called the Reconfigurable Translucent Smart Pixel Array (R-Transpar) and includes a field programmable gate array (FPGA), a transimpedance receiver array, and an optoelectronic very large-scale integrated (OE-VLSI) smart pixel array. The FPGA is an Altera FLEX10K100E chip that performs logic functions and receives inputs from the transimpedance receiver array. A monolithic OE-VLSI smart pixel device containing an array of 4 x 4 vertical-cavity surface-emitting lasers (VCSELs) spatially interlaced with an array of 4 x 4 metal-semiconductor-metal (MSM) detectors connects to these devices and performs optical input-output functions. These components are mounted on a printed circuit board for testing and evaluation of integrated monolithic OEIC designs and various optical interconnection techniques. The system moves information between nodes by transferring 3-D optical packets in free space or through fiber image guides. The R-Transpar system is reconfigurable to test different network protocols and signal processing functions. In its operation as a 3-D multi-token-ring network, we use a specific version of the system called Transpar-Token-Ring (Transpar-TR) that uses novel time-division multiplexed (TDM) network node addressing to enhance channel utilization and throughput. Host computers interface with the system via a high-speed digital I/O board that sends commands for networking and application algorithm operations. We describe the system operation and experimental results in detail.
We present a networking and signal processing architecture called Transpar-TR (Translucent Smart Pixel Array Token-Ring) that utilizes smart pixel technology to perform 2D parallel optical data transfer between digital processing nodes. Transpar-TR moves data through the network in the form of 3D packets (2D spatial and 1D time). By utilizing many spatially parallel channels, Transpar-TR can achieve high-throughput, low-latency communication between nodes, even with each channel operating at moderate data rates. The 2D array of optical channels is created by an array of smart pixels, each with an optical input and optical output. Each smart pixel consists of two sections: an optical network interface and an ALU-based processor with local memory. The optical network interface is responsible for transmitting and receiving optical data packets using a slotted token-ring network protocol. The smart pixel array operates as a single-instruction multiple-data processor when processing data. The Transpar-TR network, consisting of networked smart pixel arrays, can perform pipelined parallel processing very efficiently on 2D data structures such as images and video. This paper discusses the Transpar-TR implementation in which each node is the printed circuit board integration of a VCSEL-MSM chip, a transimpedance receiver array chip and an FPGA chip.
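The slotted token-ring transfer can be illustrated with a toy simulation. The node count, slot count and single-packet queues below are invented for illustration and do not reflect the actual Transpar-TR protocol parameters:

```python
# Toy model of a slotted token ring (parameters invented for illustration).
# Fixed slots circulate past the nodes; a node delivers any slot addressed
# to it and claims an empty slot for its next queued packet.
def simulate_slotted_ring(num_nodes: int, num_slots: int, steps: int) -> int:
    slots = [None] * num_slots       # None = empty, else destination node id
    pos = list(range(num_slots))     # node currently under each slot
    pending = [1] * num_nodes        # one packet queued per node, for its successor
    delivered = 0
    for _ in range(steps):
        for i in range(num_slots):
            node = pos[i]
            if slots[i] == node:     # slot addressed to this node: deliver it
                slots[i] = None
                delivered += 1
            if slots[i] is None and pending[node]:
                slots[i] = (node + 1) % num_nodes  # address packet to the next node
                pending[node] -= 1
            pos[i] = (pos[i] + 1) % num_nodes      # slot advances one position
    return delivered

print(simulate_slotted_ring(4, 2, 10))  # all 4 queued packets delivered
```

In Transpar-TR each "slot" would carry a full 2D spatial page of data rather than a single value, which is where the throughput advantage of the parallel optical channels comes from.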