This PDF file contains the front matter associated with SPIE Proceedings Volume 9463, including the Title Page, Copyright information, Table of Contents, Introduction (if any), and Conference Committee listing.
As video systems move from analog NTSC to HD digital video, new system topologies, new transport systems, compression effects and new data spaces must be considered. This paper explores some of the tradeoffs and benefits of HD video. Designing an HD video system involves many new elements of specification. The paper surveys HD video and compares its terminology with that of analog video. It also uncovers issues that did not exist in analog video systems. For example, transport bandwidth requirements are measured in gigabits per second, and just 45 minutes of uncompressed 1080p/60 video requires a terabyte of storage. Compression techniques are used to address transport bandwidth and storage capacity limitations, but compression introduces real-time latency between the source and destination video. Latencies range from 50 milliseconds to several seconds depending on the complexity of the scene and the bandwidth of the transport. These latencies affect human remote control, data collection and time stamping strategies. Latency affects the overlay of time-critical measurements, and compression threatens the legibility of any text overlay made at the source. The paper shows that HD resolution is three-dimensional, defined by lines, pixels and pixel depth. A variety of sampling techniques take advantage of the foibles of human physiology to reduce frame data sizes. Some are barely perceptible to the eye; others compromise image quality. These sampling techniques are described.
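The bandwidth and storage figures quoted above can be checked with simple arithmetic. The following is a minimal sketch, not taken from the paper: it counts active picture data only (no blanking, audio or ancillary data), and the bit depths and chroma sampling schemes shown are assumptions, since the abstract does not state which ones its figures use.

```python
# Average samples per pixel (one luma plus two chroma channels) for common
# chroma sampling schemes. These schemes are illustrative assumptions.
SAMPLES_PER_PIXEL = {"4:4:4": 3.0, "4:2:2": 2.0, "4:2:0": 1.5}


def bits_per_pixel(bit_depth, sampling):
    """Average number of bits per pixel for a given depth and sampling scheme."""
    return bit_depth * SAMPLES_PER_PIXEL[sampling]


def video_rate_bps(width, height, fps, bit_depth, sampling):
    """Active-picture data rate in bits per second (no blanking or audio)."""
    return width * height * fps * bits_per_pixel(bit_depth, sampling)


def storage_bytes(rate_bps, seconds):
    """Bytes needed to store a stream of the given rate for the given duration."""
    return rate_bps * seconds / 8


if __name__ == "__main__":
    duration_s = 45 * 60
    for depth, sampling in [(8, "4:4:4"), (10, "4:2:2"), (8, "4:2:0")]:
        rate = video_rate_bps(1920, 1080, 60, depth, sampling)
        size = storage_bytes(rate, duration_s)
        print(f"{depth}-bit {sampling}: {rate / 1e9:.2f} Gbit/s, "
              f"{size / 1e12:.2f} TB for 45 minutes")
```

Under these assumptions, 8-bit 4:4:4 sampling gives roughly 3 Gbit/s and about 1.0 TB for 45 minutes, while 10-bit 4:2:2 gives roughly 2.5 Gbit/s and about 0.84 TB, which illustrates how strongly pixel depth and sampling affect frame data size.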
Motion video analysis is a challenging task, especially in real-time applications. In most safety- and security-critical applications, a human observer is an obligatory part of the overall analysis system. In recent years, substantial progress has been made in the development of automated image exploitation algorithms. Hence, we investigate how the benefits of automated video analysis can be integrated suitably into current video exploitation systems. In this paper, a system design is introduced which strives to combine the qualities of the human observer’s perception with those of the automated algorithms, thus aiming to improve the overall performance of a real-time video analysis system. The system design builds on prior work in which we showed the benefits for the human observer of a user interface that uses the visual focus of attention, revealed by the eye gaze direction, for interaction with the image exploitation system; eye tracker-based interaction allows much faster, more convenient, and equally precise moving target acquisition in video images than traditional computer mouse selection. The system design also builds on our prior work on automated target detection, segmentation, and tracking algorithms. Besides the system design, a first pilot study is presented, in which we investigated how the participants (all non-experts in video analysis) performed in initializing an object tracking subsystem by selecting a target for tracking. Preliminary results show that the gaze + key press technique is an effective, efficient, and easy-to-use interaction technique for selecting moving targets in videos in order to initialize an object tracking function.
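To make the gaze + key press idea concrete, the sketch below shows one way a key press could be resolved to a target. It is a hypothetical illustration only: the abstract does not describe the selection rule, so the nearest-centre rule, the `max_distance` cut-off, and the function and parameter names are all assumptions.

```python
from math import hypot


def select_target(gaze_xy, boxes, max_distance=50.0):
    """Return the id of the candidate whose box centre is nearest the gaze point.

    gaze_xy      -- (x, y) gaze position in image pixels at the moment of the key press
    boxes        -- dict mapping a target id to its bounding box (x, y, width, height)
    max_distance -- assumed cut-off in pixels; farther candidates are ignored

    Returns None when no candidate lies within max_distance.
    """
    gx, gy = gaze_xy
    best_id, best_dist = None, max_distance
    for target_id, (x, y, w, h) in boxes.items():
        cx, cy = x + w / 2.0, y + h / 2.0
        dist = hypot(cx - gx, cy - gy)
        if dist <= best_dist:
            best_id, best_dist = target_id, dist
    return best_id


if __name__ == "__main__":
    # Hypothetical detections from an automated detection stage.
    detections = {"car": (100, 100, 40, 20), "person": (300, 220, 10, 30)}
    print(select_target((118, 112), detections))
```

The selected id would then be handed to the tracking subsystem as its initial target. A real system would also need to compensate for eye tracker noise and for the delay between fixation and key press, which this sketch ignores.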
The world of television production is beginning to adopt 4K Super 35 mm (S35) image capture for a widening range of program genres that seek both the unique imaging properties of that large image format and the protection of their program assets in a world anticipating future 4K services. Documentary and natural history production in particular are transitioning to this form of production. The nature of their shooting demands long zoom lenses. In their traditional world of 2/3-inch digital HDTV cameras they have a broad choice of portable lenses, with zoom ranges as high as 40:1. In the world of Super 35 mm the longest zoom lens is limited to 12:1, offering a telephoto focal length of 400 mm. Canon was asked to consider a lens with a significantly longer focal range while severely curtailing its size and weight. Extensive computer simulation explored countless combinations of optical and optomechanical systems in a quest to ensure that all operational requests and full 4K performance could be met. The final lens design is anticipated to have applications beyond entertainment production, including a variety of security systems.
This paper presents a technique for capturing 3D motion scans using hardware that can be constructed for approximately $5,000. In addition to capturing the movement of physical structures, this hardware-software solution also captures color and texture data. The scanner configuration developed at the University of North Dakota is large enough to capture scans of a group of humans. Scanning starts with synchronization and then requires modeling of each frame. For some applications, linking structural elements from frame to frame may also be required. The efficacy of this scanning approach is discussed and prospective applications for it are considered.
The projection of controlled moving targets is key to the quantitative testing of video capture and post processing for Motion Imagery. This presentation will discuss several implementations of target projectors with moving targets or apparent moving targets that create motion to be captured by the camera under test. The targets presented are broadband (UV-VIS-IR) and move in a predictable, repeatable and programmable way; several short videos will be included in the presentation. Among the technical approaches will be targets that move independently in the camera’s field of view, as well as targets that change size and shape. The development of a rotating IR and VIS 4-bar target projector with programmable rotational velocity and acceleration control for testing hyperspectral cameras is discussed. A related issue for motion imagery is evaluated by simulating a blinding flash, an impulse of broadband photons lasting less than 2 milliseconds, to assess the camera’s reaction to a large, fast change in signal. A traditional approach of gimbal mounting the camera in combination with the moving target projector is discussed as an alternative to high-priced flight simulators. Based on the use of the moving target projector, several standard tests are proposed that correspond to MTF (resolution), SNR and minimum detectable signal at velocity. Several unique metrics are suggested for Motion Imagery, including Maximum Velocity Resolved (the greatest velocity that is accurately tracked by the camera system) and Missing Object Tolerance (a measure of tracking ability when the target is obscured in the images). These metrics are applicable to UV-VIS-IR wavelengths and can be used to assist in camera and algorithm development as well as to compare various systems by presenting exactly the same scenes to the cameras in a repeatable way.
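As an illustration of how a metric such as Maximum Velocity Resolved might be computed from projector test runs, consider the following sketch. It is an assumption-laden example rather than the authors' procedure: the abstract does not define "accurately tracked", so the error tolerance, the trial format, and the rule that every slower velocity must also pass are all hypothetical.

```python
def maximum_velocity_resolved(trials, max_error):
    """Greatest tested velocity up to which every trial was tracked within max_error.

    trials    -- iterable of (target_velocity, tracking_error) pairs from repeated runs
    max_error -- assumed tolerance that defines "accurately tracked"

    Returns None if even the slowest tested velocity fails.
    """
    # Keep the worst error observed at each tested velocity.
    worst = {}
    for velocity, error in trials:
        worst[velocity] = max(worst.get(velocity, error), error)

    resolved = None
    for velocity in sorted(worst):
        if worst[velocity] > max_error:
            break  # the first failure ends the resolved range
        resolved = velocity
    return resolved


if __name__ == "__main__":
    # Hypothetical runs: (velocity in degrees per second, tracking error in pixels).
    runs = [(10, 0.5), (20, 0.8), (20, 0.6), (30, 2.5), (40, 0.9)]
    print(maximum_velocity_resolved(runs, max_error=1.0))
```

Because the projector presents the same motion repeatably, the same set of trials could be replayed against different cameras or tracking algorithms and the resulting values compared directly.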
SMPTE has designed significant data spaces into each frame that may be used to store time stamps and other time-sensitive data. There are metadata spaces both in the digital equivalent of the analog horizontal blanking interval, referred to as the Horizontal Ancillary (HANC) space, and in the digital equivalent of the analog vertical blanking lines, referred to as the Vertical Ancillary (VANC) space. The HANC space is very crowded with many data types, including information about frame rate and format, 16 channels of audio, copyright controls, billing information and more than 2,000 other elements. The VANC space is relatively unused by cinema and broadcasters, which makes it a prime target for use in test, surveillance and other specialized applications. By taking advantage of the SMPTE structures, one can design and implement custom data gathering and recording systems while maintaining full interoperability with standard equipment. The VANC data space can be used to capture image-relevant data and to overcome the transport latency and diminished image quality introduced by compression.
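A time stamp carried in the VANC space might be packaged along the following lines. This is a minimal sketch that loosely follows the 10-bit ancillary packet layout of SMPTE ST 291 as generally described (ancillary data flag, DID, SDID, data count, user data words, checksum); it should be checked against the standard before use. The DID and SDID values and the 8-byte seconds/nanoseconds payload are placeholder assumptions, not values from the paper.

```python
import struct

ADF = (0x000, 0x3FF, 0x3FF)  # ancillary data flag that opens each packet


def with_parity(byte):
    """Expand an 8-bit value to a 10-bit word: b8 = even parity, b9 = NOT b8."""
    b8 = bin(byte & 0xFF).count("1") & 1
    b9 = b8 ^ 1
    return (b9 << 9) | (b8 << 8) | (byte & 0xFF)


def checksum(words):
    """9-bit sum of the low 9 bits of each word, with b9 = NOT b8."""
    total = sum(w & 0x1FF for w in words) & 0x1FF
    b9 = ((total >> 8) & 1) ^ 1
    return (b9 << 9) | total


def build_timestamp_packet(seconds, nanoseconds, did=0x50, sdid=0x01):
    """Build a list of 10-bit words; did/sdid are placeholder identifiers."""
    payload = struct.pack(">II", seconds, nanoseconds)  # assumed payload format
    body = [with_parity(did), with_parity(sdid), with_parity(len(payload))]
    body += [with_parity(b) for b in payload]
    return list(ADF) + body + [checksum(body)]


def parse_timestamp_packet(words):
    """Validate a packet built above and return (seconds, nanoseconds)."""
    if tuple(words[:3]) != ADF:
        raise ValueError("missing ancillary data flag")
    body, received = words[3:-1], words[-1]
    if checksum(body) != received:
        raise ValueError("checksum mismatch")
    count = body[2] & 0xFF
    payload = bytes(w & 0xFF for w in body[3:3 + count])
    return struct.unpack(">II", payload)


if __name__ == "__main__":
    packet = build_timestamp_packet(1700000000, 123456789)
    print([f"{w:03X}" for w in packet])
    print(parse_timestamp_packet(packet))
```

Because such a packet travels in the same frame as the image it describes, the time stamp stays attached to that frame regardless of downstream transport latency, provided the equipment in the chain preserves VANC data.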
Virtually all of the video data, including full-motion video (FMV), that is currently collected and stored in support of missions has been corrupted to various extents by image acquisition and compression artifacts. Additionally, video collected by wide-area motion imagery (WAMI) surveillance systems, unmanned aerial vehicles (UAVs) and similar sources is often of such low quality, or corrupted in other ways, that it is not worth storing or analyzing. In order to make progress on the problem of automatic video analysis, the first problem to solve is deciding whether the content of the video is worth analyzing at all. We present a work in progress that addresses three types of scenes typically found in real-world data stored in support of Department of Defense (DoD) missions: no or very little motion in the scene, large occlusions in the scene, and fast camera motion. Each of these produces video that is generally not usable by an analyst or automated algorithm for mission support and should therefore be removed or flagged to the user. We utilize recent computer vision advances in motion detection and optical flow to automatically assess FMV and to identify and generate metadata (tags) for video segments that exhibit the unwanted scenarios described above. Results are shown on representative real-world video data.
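The tagging step can be illustrated with a simplified sketch. It assumes that some optical flow estimator has already produced a mean flow magnitude for each frame; the thresholds, label names and segment format are hypothetical, and occlusion detection is not covered. It is not the authors' algorithm.

```python
from itertools import groupby


def label_frame(mean_flow, static_thresh=0.1, fast_thresh=20.0):
    """Classify one frame from its mean optical flow magnitude (pixels per frame)."""
    if mean_flow < static_thresh:
        return "static"
    if mean_flow > fast_thresh:
        return "fast_camera_motion"
    return "usable"


def tag_segments(mean_flows, static_thresh=0.1, fast_thresh=20.0):
    """Merge consecutive frames with the same label into (start, end, label) tags."""
    labels = [label_frame(m, static_thresh, fast_thresh) for m in mean_flows]
    segments = []
    start = 0
    for label, run in groupby(labels):
        length = len(list(run))
        segments.append((start, start + length - 1, label))
        start += length
    return segments


if __name__ == "__main__":
    # Hypothetical per-frame mean flow magnitudes.
    flows = [0.0, 0.05, 3.0, 4.0, 25.0, 30.0, 2.0]
    for segment in tag_segments(flows):
        print(segment)
```

Segments tagged as anything other than usable could then be flagged to the analyst or excluded from further automated processing. In practice the thresholds would need tuning per sensor and frame rate.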