Mixed reality training systems using Head Mounted Displays (HMDs) require high-precision knowledge of the 3D location and 3D orientation of the user's head. The system needs this information to know where to insert the synthetic actors and objects in the HMD view. The inserted objects must appear stable and must not jitter or drift. Moreover, a pose-estimation latency of less than 5 milliseconds is required for lag-free see-through HMD operation. We describe how to achieve this performance using a multi-camera visual navigation system mounted on the HMD. A Kalman filter is used to integrate high-rate estimates from an IMU with a visual odometry system and to predict head motion. Landmark matching and, when available, GPS are used to correct drift.
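To make the filtering step concrete, the following is a minimal sketch of a constant-velocity Kalman filter for a single pose component that predicts at IMU rate, corrects at visual-odometry rate, and extrapolates the pose ahead to hide display latency. The state layout, noise magnitudes, and names are illustrative assumptions, not the paper's actual filter.

```python
import numpy as np

# Minimal constant-velocity Kalman filter for one pose component
# (e.g., head yaw): predict at IMU rate, correct at visual-odometry
# rate, extrapolate ahead to hide rendering latency. Noise values
# below are illustrative assumptions.

class PoseKF:
    def __init__(self, q=1e-3, r_vo=1e-4):
        self.x = np.zeros(2)            # state: [angle, angular rate]
        self.P = np.eye(2)              # state covariance
        self.q, self.r_vo = q, r_vo     # process / measurement noise

    def predict(self, dt, gyro_rate):
        F = np.array([[1.0, dt], [0.0, 1.0]])
        self.x = F @ self.x
        self.x[1] = gyro_rate           # feed the gyro rate in directly
        self.P = F @ self.P @ F.T + self.q * np.eye(2)

    def update_vo(self, z_angle):
        H = np.array([[1.0, 0.0]])
        S = H @ self.P @ H.T + self.r_vo
        K = self.P @ H.T / S            # Kalman gain (2x1)
        self.x = self.x + (K * (z_angle - H @ self.x)).ravel()
        self.P = (np.eye(2) - K @ H) @ self.P

    def predict_ahead(self, latency_s):
        # Pose extrapolated to the instant the frame will be displayed.
        return self.x[0] + self.x[1] * latency_s
```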
This paper describes a system for automatically detecting potential targets (that pop up or move into view) and cueing the operator to potential threats. Detecting independently moving targets from a moving ground vehicle is challenging due to the strong parallax effects caused by camera motion close to the 3D structure of the environment. We present a 3D approach for detecting and tracking such independently moving targets with multiple monocular cameras. In our approach, we first recover the camera position and orientation by employing a visual odometry method. Next, using multiple consecutive frames with the estimated camera poses, the structure of the scene at the reference frame is explicitly recovered by a motion stereo approach, and the corresponding optical flow fields between the reference frame and the other frames are also estimated. Third, an advanced filter is designed by combining second-order differences between 3D warping and optical flow warping to distinguish moving objects from parallax regions. We present results of the algorithm on data collected with an eight-camera system mounted on a vehicle under multiple scenarios that include moving and pop-up targets.
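The intuition behind the parallax filter can be illustrated with a simplified single-pair residual test: pixels whose measured optical flow disagrees with the flow predicted by the recovered depth and camera motion are flagged as independently moving. The paper's filter uses second-order differences across multiple frames; this sketch, with assumed variable names and threshold, shows only the basic comparison.

```python
import numpy as np

# Simplified residual test in the spirit of the parallax filter:
# compare measured optical flow against the flow induced purely by
# camera motion over the recovered static structure. The threshold
# and data layout are illustrative assumptions.

def predicted_flow(depth, K, R, t, xs, ys):
    """Flow induced by camera motion (R, t) over static 3D structure."""
    pix = np.stack([xs, ys, np.ones_like(xs)], axis=-1)    # (h, w, 3)
    rays = (np.linalg.inv(K) @ pix[..., None])[..., 0]
    pts = rays * depth[..., None]                          # back-project
    pts2 = (R @ pts[..., None])[..., 0] + t                # move to frame 2
    proj = (K @ pts2[..., None])[..., 0]
    proj = proj[..., :2] / proj[..., 2:3]
    return proj - pix[..., :2]

def moving_object_mask(flow_meas, depth, K, R, t, thresh=1.5):
    h, w = depth.shape
    ys, xs = np.mgrid[0:h, 0:w].astype(float)
    flow_geo = predicted_flow(depth, K, R, t, xs, ys)
    residual = np.linalg.norm(flow_meas - flow_geo, axis=-1)
    return residual > thresh   # True where motion is not explained by parallax
```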
Traditional vision-based navigation systems often drift over time during navigation. In this paper, we propose a set of techniques which greatly reduce long-term drift and also improve robustness to many failure conditions. In our approach, two pairs of stereo cameras are first integrated to form a forward/backward multi-stereo camera system. As a result, the field of view of the system is extended significantly to capture more natural landmarks from the scene. This increases pose estimation accuracy and reduces failure situations. Secondly, a global landmark matching technique is used to recognize previously visited locations during navigation. Using the matched landmarks, a pose correction technique eliminates the accumulated navigation drift. Finally, in order to further improve the robustness of the system, measurements from low-cost Inertial Measurement Unit (IMU) and Global Positioning System (GPS) sensors are integrated with the visual odometry in an extended Kalman filtering framework. Our system is significantly more accurate and robust than previously published techniques (1–5% localization error) over long-distance navigation both indoors and outdoors. Real-world experiments on a human-worn system show that the location can be estimated within 1 meter over 500 meters (around 0.1% localization error on average) without the use of GPS information.
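As an illustration of the correction step, here is a minimal sketch of fusing an absolute position fix (a GPS reading or a recognized landmark location) with the drifting visual-odometry position in a single Kalman update. The state, noise values, and numbers are illustrative assumptions, not the system's full extended Kalman filter.

```python
import numpy as np

# Minimal sketch: an absolute position fix (GPS or matched landmark)
# pulls the drifting visual-odometry position back in one Kalman
# update. All covariances here are illustrative assumptions.

def fuse_position(x_vo, P, z_abs, R_abs):
    """x_vo: 3-vector VO position, P: its 3x3 covariance,
    z_abs: absolute fix, R_abs: the fix's 3x3 covariance."""
    H = np.eye(3)                       # the fix observes position directly
    S = H @ P @ H.T + R_abs
    K = P @ H.T @ np.linalg.inv(S)      # Kalman gain
    x_new = x_vo + K @ (z_abs - H @ x_vo)
    P_new = (np.eye(3) - K @ H) @ P
    return x_new, P_new

# Example: several meters of accumulated drift, corrected by a fix
# with 2 m standard deviation per axis.
x, P = np.array([100.0, 50.0, 0.0]), np.diag([25.0, 25.0, 4.0])
z, R = np.array([92.0, 47.0, 0.0]), np.diag([4.0, 4.0, 4.0])
x, P = fuse_position(x, P, z, R)
```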
In order to train war fighters for urban warfare, live exercises are held at various Military Operations on Urban Terrain
(MOUT) facilities. Commanders need to have situation awareness (SA) of the entire mock battlefield, and also of the individual actions of the various war fighters. The commanders must be able to provide instant feedback and play through different actions and 'what-if' scenarios with the war fighters. The war fighters, in turn, should be able to review their actions and rehearse different maneuvers.
In this paper, we describe the technologies behind a prototype training system, which tracks war fighters around an
urban site using a combination of ultra-wideband (UWB) Radio Frequency Identification (RFID) and smart video-based tracking. The system is able to: (1) Tag each individual with a unique ID using an RFID system, (2) Track and
locate individuals within the domain of interest, (3) Associate IDs with visual appearance derived from live videos, (4)
Visualize movement and actions of individuals within the context of a 3D model, and (5) Store and review activities
with (x,y,ID) information associated with each individual.
Dynamic acquisition and recording of the precise locations of individual troops and units during training greatly aids the analysis of training sessions, allowing improved review, critique, and instruction.
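Capability (3) above, associating RFID-derived IDs with anonymous visual tracks, can be illustrated with a minimal assignment sketch. The gating distance, data layout, and function names are assumptions for illustration, not the deployed system's algorithm.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

# Sketch of ID-to-track association: match RFID-derived (x, y, ID)
# fixes to anonymous visual tracks by minimum total ground-plane
# distance. The 3 m gate is an illustrative assumption.

def associate(rfid_xy, rfid_ids, track_xy, gate=3.0):
    """Return {track_index: ID} for pairs closer than `gate` meters."""
    cost = np.linalg.norm(rfid_xy[:, None, :] - track_xy[None, :, :], axis=-1)
    rows, cols = linear_sum_assignment(cost)    # Hungarian algorithm
    return {c: rfid_ids[r] for r, c in zip(rows, cols) if cost[r, c] < gate}

rfid_xy = np.array([[10.0, 5.0], [40.0, 12.0]])
track_xy = np.array([[10.5, 5.2], [39.0, 11.5], [70.0, 3.0]])
print(associate(rfid_xy, ["alpha", "bravo"], track_xy))
# -> {0: 'alpha', 1: 'bravo'}; the third track stays unlabeled
```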
In this paper, we present solutions for tracking the 3D pose (location and orientation) of a robot or vehicle undergoing general motion (6 degrees of freedom, rotation and translation) based on video streams captured by a distributed-aperture passive sensor system. A novel algorithm for multi-camera visual odometry is described. Previously published methods for visual odometry have used video streams from 1, 2, or 3 cameras in monocular, binocular, or trinocular configurations. In this paper, we present general methods and results for visual odometry with a fixed or known configuration of an arbitrary number of cameras. The images from the different cameras may have no overlap whatsoever. The relative pose and configuration of the cameras comprising the distributed-aperture system are assumed to be pre-calibrated and known at any time instant. We demonstrate that we can track the vehicle pose very accurately and robustly using the distributed-aperture system.
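The key coordinate bookkeeping in such a system can be sketched as follows: a motion estimated in one camera's frame is conjugated by that camera's pre-calibrated camera-to-rig extrinsic to express it as motion of the whole rig, so estimates from non-overlapping cameras become directly comparable. The 4x4 homogeneous representation and names are illustrative assumptions.

```python
import numpy as np

# Sketch of the coordinate transform at the heart of multi-camera
# visual odometry with known extrinsics: inter-frame motion measured
# in a camera's own frame is conjugated into the common rig frame.
# All transforms are 4x4 homogeneous matrices.

def rig_motion_from_camera(M_cam, T_cam_to_rig):
    """M_cam: inter-frame motion in camera coordinates.
    T_cam_to_rig: fixed, pre-calibrated extrinsic of that camera."""
    return T_cam_to_rig @ M_cam @ np.linalg.inv(T_cam_to_rig)
```

Per-camera estimates mapped this way into the rig frame can then be fused, for example by weighting each with its covariance, into a single rig motion estimate.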
High-resolution 3D imaging ladar systems can penetrate foliage and camouflage to sample fragments of concealed surfaces of interest. Samples collected while the ladar moves can be integrated into a coherent object shape, provided that sensor poses are known. We detail a system for automatic data-driven registration of ladar frames, consisting of a coarse search stage, a pairwise fine registration stage using an iterated closest points (ICP) algorithm, and a multi-view registration strategy. We evaluate this approach using simulated and field-collected ladar imagery of foliage-occluded objects. Even after alignment and aggregation, it is often difficult for human observers to find, assess, and recognize objects from a point cloud display. We survey and demonstrate basic display manipulations, surface-fitting techniques, and clutter suppression to enhance visual exploitation of 3D imaging ladar data.
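The pairwise fine-registration stage can be illustrated with a minimal point-to-point ICP that alternates nearest-neighbor matching with a closed-form (SVD/Kabsch) rigid update. The iteration count and rejection threshold are illustrative assumptions; the paper's ICP variant and its coarse search stage are not reproduced here.

```python
import numpy as np
from scipy.spatial import cKDTree

# Minimal point-to-point ICP: alternate nearest-neighbor
# correspondence with a closed-form rigid (R, t) update.
# `iters` and `reject` are illustrative assumptions.

def icp(src, dst, iters=30, reject=1.0):
    R, t = np.eye(3), np.zeros(3)
    tree = cKDTree(dst)
    for _ in range(iters):
        moved = src @ R.T + t
        d, idx = tree.query(moved)
        keep = d < reject                   # drop bad correspondences
        a, b = moved[keep], dst[idx[keep]]
        ca, cb = a.mean(0), b.mean(0)
        U, _, Vt = np.linalg.svd((a - ca).T @ (b - cb))
        dR = Vt.T @ U.T
        if np.linalg.det(dR) < 0:           # avoid reflections
            Vt[-1] *= -1
            dR = Vt.T @ U.T
        # compose the incremental update with the running transform
        R, t = dR @ R, dR @ t + cb - dR @ ca
    return R, t
```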
In this paper, we present the Video Flashlight System and 3D Visualization Display for providing total battlefield situation awareness by integrating a blanket of ground and aerial video cameras and UGS data within a 3D model of the site. The system enables visualization of an integrated view of a scene, combining video and sensor data from multiple cameras and UGS. Users can move seamlessly in space: monitor a site from an aerial view and fly down to examine suspicious activity up close. The system detects moving objects from all cameras and provides an integrated view of all motion in the monitored zones. Users can click on a moving object to get a zoomed-in view or updated data from the sensor. The aerial and ground videos are geo-registered to a world coordinate system, and GPS-located UGS data is correctly positioned on the 3D display. Finally, the system can be used to track multiple objects from camera to camera, to make measurements such as velocity, and to fuse data from other emplaced sensors into the displayed scene.
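The geo-registration step that places a GPS-located sensor report into a camera view reduces to a world-to-camera transform followed by pinhole projection, as in the sketch below. The K, R, t arguments are illustrative stand-ins for the system's calibrated, geo-registered camera poses.

```python
import numpy as np

# Sketch of projecting a GPS-located world point into a geo-registered
# camera view: world-to-camera transform, then pinhole projection.

def project_world_point(p_world, K, R, t):
    """R, t: world-to-camera rotation/translation; K: 3x3 intrinsics."""
    p_cam = R @ p_world + t
    if p_cam[2] <= 0:
        return None                 # behind the camera, not visible
    uv = K @ (p_cam / p_cam[2])
    return uv[:2]                   # pixel coordinates for the overlay
```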
In a typical security and monitoring system, a large number of networked cameras are installed at fixed positions around a site under surveillance. There is generally no global view or map that shows the guard how the views of different cameras relate to one another. Individual cameras may be equipped with pan, tilt, and zoom capabilities, and the guard may be able to follow an intruder with one camera, then pick him up with another. But such tracking can be difficult, and hand-off between cameras disorienting. The guard does not have the ability to continually shift his viewpoint. Moreover, current systems do not scale with the number of cameras: the system becomes more unwieldy as cameras are added. In this paper, we present the system and key algorithms for remote immersive monitoring of an urban site using a blanket of video cameras. The guard monitors the world using a live 3D model, which is constantly being updated from different directions using the multiple video streams. The world can be monitored remotely from any virtual viewpoint. The observer can see the entire scene from afar and get a bird's-eye view, or can fly/zoom in and see activity of interest up close. A 3D site model of the urban site is constructed and used as glue for combining the multiple video streams. Moreover, each video camera has smart image processing associated with it, which allows it to detect moving and new objects in the scene and to recover their 3D geometry and the pose of the camera with respect to the world model. Each video stream is overlaid on top of the 3D model using the recovered pose. Virtual views of the scene are generated by combining the various video streams, the background 3D model, and the recovered 3D geometry of foreground objects. These moving objects are highlighted on the 3D model and used as a cue by the operator to direct his viewpoint.
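The per-camera "smart" motion cue can be sketched with a simple running-average background model; the deployed system's change detection and 3D recovery are more sophisticated, and the learning rate and threshold below are assumed values for illustration only.

```python
import numpy as np

# Sketch of a per-camera motion cue: running-average background model
# with thresholded differencing. `alpha` and `thresh` are assumptions.

def update_and_detect(frame, bg, alpha=0.02, thresh=25.0):
    """frame, bg: float grayscale images. Returns (new_bg, motion mask)."""
    mask = np.abs(frame - bg) > thresh       # candidate moving pixels
    new_bg = (1 - alpha) * bg + alpha * frame
    return new_bg, mask
```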
In this paper we present 3D registration techniques for detecting breast cancer lesions based on differences between the high-spatial-resolution 'pre' and 'post' contrast administration MR breast scans. We also present registration techniques for detecting lesions based on the time course of enhancement after administration of the contrast agent, as captured by changes in the low-spatial-resolution, high-temporal-resolution dynamic MR data taken during the absorption of the contrast agent. Alignment of the 'pre' and 'post' MR data and of the dynamic MR images is done by a direct optimization technique that estimates a global affine deformation model using a coarse-to-fine control strategy over a 3D pyramid. This global model is followed by a one-iteration flow estimation to account for any local non-rigidity. We present results of the registration process and visualization of the difference volumes obtained after alignment.
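The coarse-to-fine control strategy can be sketched as one Gauss-Newton step on the brightness-constancy error per pyramid level, with the affine parameters carried from coarse to fine. A real system iterates per level and adds robust weighting; the pyramid depth, interpolation orders, and single-step update here are illustrative assumptions.

```python
import numpy as np
from scipy import ndimage

# Coarse-to-fine sketch of direct affine registration of 3D volumes:
# one Gauss-Newton step per pyramid level on ||fixed - moving(Ax+b)||^2.

def gn_step(fixed, moving, A, b):
    """One Gauss-Newton update of the 12 affine parameters (A, b)."""
    warped = ndimage.affine_transform(moving, A, offset=b, order=1)
    grads = np.stack(np.gradient(warped), axis=-1).reshape(-1, 3)
    coords = np.stack(np.meshgrid(*map(np.arange, fixed.shape),
                                  indexing="ij"), axis=-1).reshape(-1, 3)
    # Jacobian of the warped volume w.r.t. A (row-major) and b: N x 12
    J = np.hstack([(grads[:, :, None] * coords[:, None, :]).reshape(-1, 9),
                   grads])
    r = (fixed - warped).ravel()
    delta, *_ = np.linalg.lstsq(J, r, rcond=None)
    return A + delta[:9].reshape(3, 3), b + delta[9:]

def register_affine(fixed, moving, levels=3):
    A, b = np.eye(3), np.zeros(3)
    pyramid = [(fixed, moving)]
    for _ in range(levels - 1):
        f, m = pyramid[-1]
        pyramid.append((ndimage.zoom(f, 0.5, order=1),
                        ndimage.zoom(m, 0.5, order=1)))
    for i, (f, m) in enumerate(reversed(pyramid)):
        if i > 0:
            b = b * 2.0       # rescale the offset to the finer grid
        A, b = gn_step(f, m, A, b)
    return A, b
```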
In earlier work, least-squares and robust methods were presented for determining the location and orientation of a mobile robot from visual measurements of modeled 3-D landmarks. However, building the 3-D landmark models is a time-consuming and tedious process. For landmark-based navigation methods to be widely applicable, automatic methods have to be developed to build new 3-D models and enhance existing ones. Ideally, a robot would continuously build and update its world model as it explores the environment. This paper presents techniques to determine the 3-D locations of image features from a sequence of monocular 2-D images captured by a camera mounted on the robot. The approach adopted here is to first build a partial (possibly noisy) model, either manually, by stereo, or by tracking and reconstructing shallow structures over a sequence of images using the constraint of affine trackability. This model is subsequently used to compute the pose that relates the model coordinate system to the camera coordinate system of each image frame in the sequence. The unmodeled 3-D features (those not already in the model) are tracked over the image sequence and their 3-D locations recovered by a pseudo-triangulation process, a form of 'induced stereo'. The triangulation process is also used to make new 3-D measurements of the initial model points. These measurements are then fused with the previous estimates to refine the set of initial model points.
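The recovery of an unmodeled feature from its tracked image positions in two frames with known poses can be sketched with the standard linear (DLT) two-view triangulation below; this is an illustrative stand-in for the paper's pseudo-triangulation, with assumed 3x4 projection matrices P = K[R|t] built from the poses computed against the partial model.

```python
import numpy as np

# Linear (DLT) two-view triangulation: each observation contributes
# two rows to a homogeneous system whose null vector is the 3D point.

def triangulate(P1, P2, uv1, uv2):
    """P1, P2: 3x4 projection matrices; uv1, uv2: pixel positions."""
    A = np.stack([uv1[0] * P1[2] - P1[0],
                  uv1[1] * P1[2] - P1[1],
                  uv2[0] * P2[2] - P2[0],
                  uv2[1] * P2[2] - P2[1]])
    _, _, Vt = np.linalg.svd(A)
    X = Vt[-1]
    return X[:3] / X[3]     # homogeneous -> Euclidean 3-D point
```

Repeating this over the tracked sequence yields fresh 3-D measurements that can be fused with the prior model estimates, as the abstract describes.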