The timely detection of terrain drop-offs is critical for safe and efficient off-road mobility, whether with human drivers
or with autonomous terrain-navigation systems that use machine vision. In this paper, we propose a joint tracking and
detection machine-vision approach for accurate and efficient terrain drop-off detection and localization. We formulate
the problem using a hyperstereo camera system and build an elevation map using the range map obtained from a stereo
algorithm. A terrain drop-off is then detected with the use of optimal drop-off detection filters applied to the range
map. For more robust results, a method based on multi-frame fusion of terrain drop-off evidence is proposed. Also
presented is a fast, direct method that does not employ stereo disparity mapping. We compared our algorithm's
detection of terrain drop-offs with time-code data from human observers viewing the same video clips in stereoscopic
3D. The algorithm detected terrain drop-offs an average of 9 seconds sooner, or 12 m farther, than the human
observers. This suggests that passive image-based hyperstereo machine-vision may be useful as an early warning
system for off-road mobility.
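The detection-and-fusion pipeline this abstract describes can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: the thresholding stands in for the optimal drop-off detection filters, and all function names, the cell spacing, and the decay/gain constants are assumptions.

```python
import numpy as np

def detect_dropoffs(elevation, min_drop=0.5):
    """Flag elevation-map cells whose forward elevation step exceeds
    min_drop meters (illustrative stand-in for the paper's optimal
    drop-off detection filters)."""
    # Forward difference along the direction of travel (rows).
    step = np.diff(elevation, axis=0)
    # A large negative step marks a candidate drop-off edge.
    return step < -min_drop

def fuse_evidence(prev_evidence, detections, decay=0.8, gain=1.0):
    """Multi-frame fusion of drop-off evidence: confidence accumulates
    where detections persist across frames and decays where they do not."""
    return decay * prev_evidence + gain * detections.astype(float)
```

A persistent edge then crosses a confidence threshold after a few frames, while single-frame stereo noise decays away; this is the intuition behind the multi-frame fusion step.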
This paper discusses the depth acuity research conducted in support of the development of a Modular Multi-Spectral Stereoscopic (M2S2) night vision goggle (NVG), a customizable goggle that lets the user select one of five goggle configurations: monocular thermal, monocular image intensifier (I2), binocular I2, binocular thermal, and binocular dual-waveband (thermal imagery to one eye and I2 imagery to the other eye). The motives for the development of this type of customizable goggle were (1) the need for an NVG that allows the simultaneous use of two wavebands, (2) the need for an alternative sensor fusion method to avoid the potential image degradation that may accompany digitally fused images, (3) a requirement to provide the observer with stereoscopic, dual spectrum views of a scene, and (4) the need to handle individual user preferences for sensor types and ocular configurations employed in various military operations. Among the increases in functionality that the user will have with this system is the ability to convert from a binocular I2 device (needed for detailed terrain analysis during off-road mobility) to a monocular thermal device (for increased situational awareness in the unaided eye during nights with full moon illumination). Results of the present research revealed potential depth acuity advantages that may apply to off-road terrain hazard detection for the binocular thermal configuration. The results also indicated that additional studies are needed to address ways to minimize binocular incompatibility for the dual waveband configuration.
Military missions often require drivers to maneuver across hazardous, off-road terrain using visual displays rather than direct vision. When soldiers use 2D displays, significantly more mobility errors occur than when soldiers use 3D displays that provide a stereoscopic view of the terrain. The purposes of the present experiment were to quantify the visual forewarning of a drop-off provided by a stereoscopic 3D display compared to a 2D display, and to measure the potential of increased camera separation (i.e., hyperstereo) for enhancing the benefit of 3D for the detection of terrain drop-offs. This experiment consisted of four viewing conditions: 0X (the 2D condition), 1X (stereo with the normal interpupillary distance [IPD] between the viewpoints provided to the two eyes), 2X (stereo with twice the normal IPD), and 3X (stereo with three times the normal IPD). Thirty-two participants viewed 80 video clips, each clip depicting an approach to a terrain drop-off as would be seen in a daytime driving situation. As soon as the drop-off became apparent, the participant pressed a brake pedal. As expected, the average detection time for drop-offs viewed with the 1X (stereo) display was significantly better than when drop-offs were viewed with the 0X (2D) display. The failure to observe further improvements in task performance with 2X and 3X IPD suggests follow-on research to determine whether these unexpected hyperstereo results may be attributable to adverse side effects of hyperstereo: increased mismatch between accommodation and convergence, the minification effect, and increased stereoscopic “frame violation.”
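The geometric expectation behind the 2X and 3X conditions is that binocular disparity scales linearly with the baseline between viewpoints, so a wider baseline should make a distant drop-off resolvable sooner. A small worked sketch (the 30 m viewing distance and 65 mm nominal IPD are illustrative values, not figures from the experiment):

```python
def disparity_mrad(baseline_m, distance_m):
    """Binocular disparity in milliradians for a point at distance_m,
    using the small-angle approximation (disparity ~ baseline / distance)."""
    return 1000.0 * baseline_m / distance_m

IPD = 0.065  # assumed nominal interpupillary distance, in meters
for mult in (1, 2, 3):  # the experiment's 1X, 2X, and 3X conditions
    d = disparity_mrad(mult * IPD, 30.0)
    print(f"{mult}X baseline: {d:.2f} mrad at 30 m")
```

Geometry alone thus predicts that 2X and 3X should double and triple the disparity signal available at any range, which is why the absence of a further performance gain was unexpected and motivates the follow-on research on hyperstereo side effects.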
The research was designed to determine if night vision goggles (NVGs) with white phosphor displays would enable better object recognition and contrast sensitivity than with green phosphor displays. Thirty-six Maryland National Guard members served in the research. Each performed an object recognition task and a contrast sensitivity task using two binocular NVGs (AN/AVS-9, Model F4949G) that were matched in all respects except that one was fitted with white phosphor displays (P-45W), while the other had the yellowish green phosphor displays (P-43). Results indicated an overall advantage for the white phosphor for object recognition, but no difference between the phosphors for contrast sensitivity. Questionnaire data indicated a strong preference for the white phosphor NVG.
Because different imaging sensors provide different signature cues to distinguish targets from backgrounds, there has been a substantial amount of effort put into merging the information from different sensors. Unfortunately, when the imagery from two different sensors is combined, the noise from each sensor is also combined in the resultant image. Additionally, attempts to enhance target distinctness from the background also enhance the distinctness of false targets and clutter. Even so, there has been some progress in mimicking the human vision capability through color contrast enhancement. What has not been tried, however, is mimicking how the human visual system inherently fuses the outputs of its own color cone sensors. We perform this sensor fusion in the pre-attentive phase of human vision, and it relies on binocular stereo vision because we have two eyes. In human vision, the images from each eye are split in half, and the halves are sent to opposite sides of the brain for massively parallel processing. We do not know exactly how this process works, but the result is a visualization of the world that is 3D in nature. This process automatically combines the color, texture, size, and shape of the objects that make up the two images our eyes produce, and it significantly reduces noise and clutter in our visualization of the world. In this pre-attentive phase of human vision, which takes just an instant to accomplish, the visual system performs an extremely efficient fusion of cone imagery. This sensor fusion process produces a scene in which depth perception and surface contour cues are used to orient and distinguish objects in the scene before us. It is at this stage that we begin to attentively sort through the scene for objects or targets of interest. In many cases, however, the targets of interest have already been located because of their depth or surface contour cues.
Camouflaged targets that blend perfectly into complex backgrounds may be made to pop out because of their depth cues. In this paper we will describe a new method termed RGB stereo sensor fusion that uses color coding of the separate pairs of sensor images fused to produce wide baseline stereo images that are displayed to observers for search and target acquisition. Performance enhancements for the technique are given as well as rationale for optimum color code selection. One important finding was that different colors (RGB) and different spatial frequencies are fused with different efficiencies by the binocular vision system.
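The color coding step of RGB stereo sensor fusion can be sketched as channel assignment before stereo display. This is only an illustrative sketch: the specific channel assignment (thermal to red, I2 to green) and the function names are assumptions, and the abstract itself notes that different colors fuse with different efficiencies, so the optimum coding is an empirical question.

```python
import numpy as np

def rgb_stereo_pair(thermal_left, i2_left, thermal_right, i2_right):
    """Color-code two sensor bands into an RGB stereo pair for display:
    thermal in the red channel, I2 in the green channel (one illustrative
    coding among the possibilities the paper evaluates)."""
    def encode(thermal, i2):
        rgb = np.zeros(thermal.shape + (3,), dtype=np.uint8)
        rgb[..., 0] = thermal  # R <- thermal band
        rgb[..., 1] = i2       # G <- image-intensifier (I2) band
        return rgb             # B left empty in this coding
    return encode(thermal_left, i2_left), encode(thermal_right, i2_right)
```

Presenting the two color-coded composites as a wide-baseline stereo pair then leaves the actual fusion, and the noise and clutter suppression, to the observer's binocular vision rather than to a digital fusion algorithm.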
Navigation, especially in aviation, has been plagued since its inception with the hazards of poor visibility conditions. Our ground vehicles and soldiers have difficulty moving at night or in low visibility even with night vision augmentation because of the lack of contrast and depth perception. Trying to land an aircraft in fog is more difficult yet, even with radar tracking. The visible and near-infrared spectral regions have been ignored because of the problem with backscattered radiation from landing light illumination similar to that experienced when using high beam headlights when driving in fog. This paper describes the experimentation related to the development of a visible/near-infrared active hyperstereo vision system for landing an aircraft in fog. Hyperstereo vision is a binocular system with baseline separation wider than the human interocular spacing. The basic concept is to compare the imagery obtained from alternate wings of the aircraft while illuminating only from the opposite wing. This produces images with a backscatter radiation pattern that has a decreasing gradient away from the side with the illumination source. Flipping the imagery from one wing left to right and comparing it to the opposite wing imagery allows the backscattered radiation pattern to be subtracted from both sets of imagery. The use of retro-reflectors along the sides of the runway allows the human stereo fusion process to fuse the forward scatter blurred hyperstereo imagery of the array of retro-reflectors while minimizing backscatter. The appropriate amount of inverse point spread function deblurring is applied for improved resolution of scene content to aid in detection of objects on the runway. The experimental system is described and preliminary results are presented to illustrate the concept.
Navigation, especially in aviation, has been plagued since its inception with the hazards of poor visibility conditions. Vehicular ground movement is also hampered at night or in low visibility, even with night vision augmentation, because of the lack of contrast and depth perception. For landing aircraft in fog, the visible and near-infrared bands have been discounted because of their large backscatter coefficients, primarily in favor of radar, which penetrates water-laden atmospheres. Aircraft outfitted with an Instrument Landing System (ILS) can land safely on an aircraft carrier in fog. Landing at an airport with an ILS is not safe, however, because there is no way to detect small-scale obstacles that do not show up on radar but can cause a landing crash. We have developed and tested a technique to improve navigation through fog based on chopped active visible laser illumination and wide-baseline stereo (hyperstereo) viewing with real-time image correction of backscatter radiation and forward-scattering blur. The basis of the approach to developing this active hyperstereo vision system for landing aircraft in fog is outlined in the invention disclosure of the Army Research Laboratory (ARL) patent application ARL-97-72, filed Dec. 1997. Testing this concept required a matched pair of laser illuminators and cameras with synchronized choppers, a computer for near-real-time acquisition and analysis of the hyperstereo imagery with ancillary stereo display goggles, a set of specular reflectors, and a fog generator/characterizer. The basic concept of active hyperstereo vision is to compare the imagery obtained from alternate wings of the aircraft while illuminating only from the opposite wing. This produces images with a backscatter radiation pattern that has an increasing gradient towards the side with the illumination source.
Flipping the imagery from one wing left to right and comparing it to the opposite wing imagery will allow the backscattered radiation pattern to be subtracted from both sets of imagery. Use of specular reflectors along the sides of the runway will allow the human stereo fusion process to fuse the forward scatter blurred hyperstereo imagery of the array of specular reflectors with backscatter eliminated and allow the appropriate amount of inverse point spread function deblurring to be applied for optimum resolution of scene content (i.e., obstacles on the runway). Results of this testing will be shown.
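The flip-and-compare step described above can be sketched numerically. Because each camera's backscatter gradient (from the opposite wing's illuminator) is the mirror image of the other camera's, flipping one image left-right aligns the two patterns and a common component can be estimated and subtracted. This is an illustrative sketch only: the pixel-wise minimum used to estimate the shared backscatter is an assumption, not the system's actual estimator.

```python
import numpy as np

def remove_backscatter(left_img, right_img):
    """Flip-and-compare backscatter removal (illustrative).
    left_img and right_img are float arrays from the two wing cameras,
    each illuminated from the opposite wing."""
    # Mirror the right image so its backscatter gradient aligns with
    # the left image's.
    flipped_right = np.fliplr(right_img)
    # Pixel-wise minimum as a simple estimate of the shared backscatter.
    backscatter = np.minimum(left_img, flipped_right)
    # Subtract the pattern from both views (mirrored back for the right).
    left_clean = left_img - backscatter
    right_clean = right_img - np.fliplr(backscatter)
    return left_clean, right_clean
```

Scene content that appears in only one view at a given mirrored position survives the subtraction, while the mirror-symmetric backscatter pattern cancels; inverse point-spread-function deblurring would then be applied to the cleaned pair.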
Off-road mobility at night is a critical factor in modern military operations. Soldiers traversing off-road terrain, both on foot and in combat vehicles, often use 2D viewing devices (such as a driver's thermal viewer, or biocular or monocular night-vision goggles) for tactical mobility under low-light conditions. Perceptual errors can occur when 2D displays fail to convey adequately the contours of terrain. Some off-road driving accidents have been attributed to inadequate perception of terrain features due to using 2D displays (which do not provide binocular-parallax cues to depth perception). In this study, photographic images of terrain scenes were presented first in conventional 2D video, and then in stereoscopic 3D video. The percentages of correct answers for 2D and 3D were: 2D pretest, 52%; 3D pretest, 80%; 2D posttest, 48%; 3D posttest, 78%. Other recent studies conducted at the US Army Research Laboratory's Human Research and Engineering Directorate also show that stereoscopic 3D displays can significantly improve visual evaluation of terrain features, and thus may improve the safety and effectiveness of military off-road mobility operations, both on foot and in combat vehicles.
It has been shown that people consistently underestimate distances between objects in the depth direction as compared to the lateral direction. This study examined the use of artificially enhanced stereopsis (hyperstereopsis) in judging relative distances. The data showed that doubling interocular distance by means of a telestereoscope reduced the illusory compression of depth: subjects who viewed the scene without the telestereoscope averaged a depth compression of 0.28. Subjects who used the telestereoscope yielded an average compression of 0.40. Individual verbal self-reports of depth compression effects were unreliable, pointing out the value of quantitative experimental methods.
This study explores the research hypothesis that perceptual learning could occur with visual exposure to a repeating alternation between 2-D vs 3-D video images of terrain hazards typically encountered in off-road driving of ground vehicles. In individual sessions, each of the nine untrained test subjects was shown 20 off-road terrain-hazard scenes on a color video display. Each hazard was shown first in 2-D mode, then in 3-D mode, and then with 2-D/3-D mode alternating on the video screen. In 2-D mode, only one of the 20 terrain hazards was perceived by two of the nine subjects, whereas all 20 terrain hazards were immediately perceived by all subjects when the display switched over to 3-D mode. A post-test presented mirror-image versions of the same 20 hazards in 2-D only, to determine if the previous 2-D/3-D alternation treatment improved the ability to detect terrain hazards in 2-D mode. At the end of their session, test subjects were given a questionnaire asking them to rate the degree of perceptual training resulting from 2-D/3-D alternation. All nine subjects reported that the 2-D/3-D alternation improved their sensitivity to the monocular cues of terrain hazards presented on a 2-D video display. The implications that can be drawn from this preliminary study are: (1) in off-road driving by means of a conventional 2-D video display, operators will fail to perceive many significant terrain hazards; (2) however, with a 3-D video display, operators will immediately perceive most terrain hazards and will interpret terrain contours easily and accurately; (3) a more extensive experiment is indicated to formally determine the extent of the perceptual training that can be obtained by 2-D/3-D alternation.