Pan-tilt-zoom (PTZ) cameras are frequently used in surveillance applications as they can observe a much larger region of
the environment than a fixed-lens camera while still providing high-resolution imagery. The pan, tilt, and zoom
parameters of a single camera may be simultaneously controlled by online users as well as automated surveillance
applications. To accurately register autonomously tracked objects to a world model, the surveillance system requires
accurate knowledge of camera parameters. Due to imprecision in the PTZ mechanism, these parameters cannot be
obtained from PTZ control commands but must be calculated directly from camera imagery. This paper describes the
efforts undertaken to implement a real-time calibration system for a stationary PTZ camera. The approach continuously
tracks distinctive image feature points from frame to frame, and from these correspondences, robustly calculates the
homography transformation between frames. Camera internal parameters are then calculated from these homographies.
The calculations are performed by a self-contained program that continually monitors images collected by the camera as
it performs pan, tilt, and zoom operations. The accuracy of the calculated calibration parameters is compared to ground-truth
data. Problems encountered include inaccuracies under large orientation changes and long algorithm execution times.
Recently, there has been an increasing interest in using panoramic images in surveillance and target tracking
applications. With the wide availability of off-the-shelf web-based pan-tilt-zoom (PTZ) cameras and the advances of
CPUs and GPUs, object tracking using mosaicked images that cover a scene of 360° in near real-time has become a
reality. This paper presents a system that automatically constructs and maps full view panoramic mosaics to a cube-map
from images captured from an active PTZ camera with 1-25x optical zoom. A hierarchical approach is used in storing
and mosaicking multi-resolution images captured from a PTZ camera. Techniques based on scale-invariant local features
and probabilistic models for verification are used in the mosaicking process. Our algorithm is automatic and robust in
mapping each incoming image to one of the six faces of a cube with no prior knowledge of the scene structure. This
work can be easily integrated into a surveillance system that tracks moving objects in its 360° surroundings.
Large gains have been made in the automation of moving object detection and tracking. As these technologies continue to mature, the size of the field of regard and the range of tracked objects continue to increase. The use of a pan-tilt-zoom (PTZ) camera enables a surveillance system to observe a nearly 360° field of regard and track objects over a wide range of distances. However, use of a PTZ camera also presents a number of challenges. The first challenge is to determine how to optimally control the pan, tilt, and zoom parameters of the camera. The second challenge is to detect moving objects in imagery whose orientation and spatial resolution may vary on a frame-by-frame basis. This paper does not address the first challenge; we assume that the camera parameters are controlled either by an operator or by an automated control process. We address only the second: detecting moving objects in imagery whose orientation and spatial resolution may vary from frame to frame.
We describe a system for detection and tracking of moving objects using a PTZ camera whose parameters are not under our control. A previously published background subtraction algorithm is extended to handle arbitrary camera rotation and zoom changes. This is accomplished by dynamically learning 360°, multi-resolution, background models of the scene. The background models are represented as mosaics on 3D cubes. Tracking of local scale-invariant distinctive image features allows the determination of the camera parameters and the mapping from the current image to the mosaic cube. We describe the real-time implementation of the system and evaluate its performance on a variety of PTZ camera data.
Proc. SPIE. 6978, Visual Information Processing XVII
KEYWORDS: Image processing algorithms and systems, Cell phones, Visual process modeling, Cameras, Image segmentation, 3D modeling, Environmental sensing, Global Positioning System, 3D image processing
We describe an approach to automatically detect building facades in images of urban environments. This is an important
problem in vision-based navigation, landmark recognition, and surveillance applications. In particular, with the proliferation
of GPS- and camera-enabled cell phones, a backup geolocation system is needed when GPS satellite signals are
blocked in so-called "urban canyons."
Image line segments are first located, and then the vanishing points of these segments are determined using the RANSAC
robust estimation algorithm. Next, the intersections of line segments associated with pairs of vanishing points are used
to generate local support for planar facades at different orientations. The plane support points are then clustered using an
algorithm that requires no knowledge of the number of clusters or of their spatial proximity. Finally, building facades are
identified by fitting vanishing point-aligned quadrilaterals to the clustered support points. Our experiments show good performance
in a number of complex urban environments. The main contribution of our approach is its improved performance
over existing approaches while placing no constraints on the facades in terms of their number or orientation, and minimal
constraints on the length of the detected line segments.
Implementation of an intelligent, automated target acquisition and tracking system alleviates the need for operators to monitor video continuously. Such a system could identify situations that fatigued operators might easily miss.
If an automated acquisition and tracking system plans motions to maximize a coverage metric, how does the
performance of that system change when the user intervenes and manually moves the camera? How can the
operator give input to the system about what is important, and understand how that input affects the overall
balance between surveillance and coverage?
In this paper, we address these issues by introducing a new formulation of the average linear uncovered length
(ALUL) metric, specially designed for use in surveilling urban environments. This metric coordinates the often
competing goals of acquiring new targets and tracking existing targets. In addition, it provides current system
performance feedback to system users in terms of the system's theoretical maximum and minimum performance.
We demonstrate the successful integration of the algorithm in simulation.
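One ingredient of a metric like ALUL is measuring how much of a linear region (e.g. a street or perimeter) remains uncovered by current camera footprints. The toy computation below shows only that ingredient; the paper's actual ALUL formulation, which also balances acquisition against tracking, is more involved, and all names here are illustrative.

```python
def uncovered_length(perimeter, covered):
    """Total length of a 1-D region [0, perimeter) left uncovered by a
    list of (start, end) footprint intervals, which may overlap."""
    # Clip intervals to the region and sweep them in start order.
    events = sorted((max(0.0, s), min(perimeter, e)) for s, e in covered if e > s)
    covered_len, cur_end = 0.0, 0.0
    for s, e in events:
        if e <= cur_end:
            continue                       # fully inside an earlier interval
        covered_len += e - max(s, cur_end) # count only the new portion
        cur_end = e
    return perimeter - covered_len
```

Averaging such uncovered lengths over the monitored linear features, and normalizing by the theoretical best and worst coverage, yields the kind of user-interpretable performance feedback the abstract describes.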
The goal of the Demo III Program for Experimental Unmanned Vehicles (XUVs) is to develop and integrate technologies to provide a vehicle platform with autonomous capabilities. The platform will allow for the integration of modular mission packages to allow it to serve multiple needs across the battlefield. For demonstration purposes, the primary mission package will perform the functions of Reconnaissance, Surveillance, and Target Acquisition (RSTA). The RSTA mission package will provide the capability to conduct RSTA functions while both stationary and on the move. It will include a variety of sensor technologies along with signal and image processing capabilities to perform the RSTA mission. The paper will describe goals for the Demo III RSTA mission package, discuss the types of sensors being considered for platform integration, and summarize the RSTA related aspects of the recently awarded integration contract. Processing and algorithm capabilities required for the XUV to perform RSTA in an autonomous fashion, such as aided target recognition, motion detection, motion detection on the move, etc. will also be discussed.
We have developed a robotic unmanned ground vehicle (UGV) that performs reconnaissance, surveillance, and target acquisition. This vehicle has been used in a number of tactical training exercises at Fort Hood, TX with US Army scouts from the 1st Armored Cavalry Division. The UGV, built around a high-mobility, multi-purpose, wheeled vehicle, is designed to be supervised by an operator at a control station located 10 km or more away from the UGV. The UGV's real-time automatic target acquisition (ATA) system uses an infrared sensor to automatically detect and track moving ground vehicles out to a range of 5 km. When commanded by the operator, the UGV will engage a particular target with a laser designator. The ATA system is the topic of this paper. We describe the requirements of the ATA system, the algorithms used, their implementation, and the system's performance.
Modern military surveillance systems typically include a number of different, independently adjustable sensors distributed throughout an environment to be monitored. These sensors should be configured so that their integrated outputs provide the optimal combination of probability of target detection and probability of false alarm. While it is desirable to optimize this measure of system performance, it is also desirable to minimize the enemy's ability to detect these sensors. These are conflicting goals. Each sensor can typically monitor only a small part of the environment and can sample only a small number of target discriminants. Because there are only a limited number of sensors available, sensor placement and configuration are critical to system performance. A system may use passive sensors to cue active sensors, or use low-resolution sensors to cue high-resolution sensors. All available information (properties of the sensors, properties of the environment being monitored, and known target locations and properties) should be used to determine an optimal sensor configuration. We call this the sensor cueing problem. This paper describes an algorithm that uses a heuristic search to efficiently solve the sensor cueing problem. The algorithm assumes that sensor locations are fixed in advance, but that other attributes (pointing direction, field of view, focus, etc.) may be adjusted to maximize system performance. Expected system performance is based on how well the group of sensors covers regions of the environment known to contain targets, as well as regions of the environment where targets are expected to appear. The algorithm's performance and possible extensions are described.
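The structure of the sensor cueing problem above can be sketched with a simple greedy heuristic: each sensor's location is fixed, and the search selects, per sensor, the adjustable configuration that covers the most target regions not yet covered. This is a stand-in for the paper's heuristic search, and every name in it is illustrative.

```python
def greedy_cueing(sensors, targets, coverage):
    """Greedy sensor-configuration sketch.

    sensors:  dict mapping sensor id -> list of candidate configurations
              (e.g. pointing direction / field of view / focus settings).
    targets:  iterable of target-region ids needing coverage.
    coverage: function (sensor, config) -> set of target regions covered.

    Returns the chosen configuration per sensor and the regions that
    remain uncovered."""
    remaining = set(targets)
    assignment = {}
    for sensor, configs in sensors.items():
        # Pick the configuration covering the most still-uncovered regions.
        best = max(configs, key=lambda c: len(coverage(sensor, c) & remaining))
        assignment[sensor] = best
        remaining -= coverage(sensor, best)
    return assignment, remaining
```

A real solver would also weight regions by expected target appearance and sensor detectability, and would search over joint assignments rather than committing sensor by sensor.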