Various visible and infrared cameras have been tested for the early detection of wildfires to protect archeological
treasures. This analysis was possible thanks to the EU Firesense project (FP7-244088). Although visible cameras are low
cost and give good results for daytime smoke detection, they fall short under poor visibility conditions. To improve the fire detection probability and reduce false alarms, several infrared bands are tested, ranging from the NIR to the LWIR. The SWIR and LWIR bands are helpful for locating the fire through smoke if there is a direct line of sight. Emphasis is also put on physical and electro-optical system modeling for forest fire detection at short and longer ranges. The fusion of three bands (visible, SWIR, LWIR) is discussed at the pixel level for image
enhancement and for fire detection.
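One simple form of pixel-level fusion is a per-pixel weighted average of co-registered bands. The sketch below is only an illustration under that assumption; the abstract does not state which fusion rule was used, and the function name `fuse_pixels` and the weights are hypothetical.

```python
def fuse_pixels(visible, swir, lwir, weights=(0.5, 0.25, 0.25)):
    """Fuse three co-registered single-channel images by a per-pixel weighted sum.

    The images are lists of rows with identical dimensions. The weights are
    illustrative assumptions, not values taken from the abstract.
    """
    w_vis, w_swir, w_lwir = weights
    fused = []
    for vis_row, swir_row, lwir_row in zip(visible, swir, lwir):
        fused.append([
            w_vis * v + w_swir * s + w_lwir * l
            for v, s, l in zip(vis_row, swir_row, lwir_row)
        ])
    return fused


# Tiny 1x2 example: the second pixel is bright in the LWIR band.
visible = [[100, 200]]
swir = [[40, 80]]
lwir = [[20, 240]]
fused = fuse_pixels(visible, swir, lwir)
```

Raising the LWIR weight would make hot spots dominate the fused image, which is one way such a rule could be tuned for fire detection rather than for image enhancement.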
In this paper, we study a flexible framework for semantic analysis of human motion from a monocular surveillance
video. Successful trajectory estimation and human-body modeling facilitate the semantic analysis of human
activities in video sequences. As a first contribution, we propose a flexible framework that enables automatic
analysis of human behavior and semantic events. It can be utilized in surveillance applications with four-level
analysis results. The second contribution is the introduction of a 3-D reconstruction scheme for scene understanding.
The total framework consists of four processing levels: (1) a pre-processing level including background
modeling and multiple-person detection, (2) an object-based level performing trajectory estimation and posture
classification, (3) an event-based level for semantic analysis and (4) a visualization level including camera calibration
and 3-D scene reconstruction. The proposed framework was evaluated and proved effective, achieving
near real-time performance (6-8 frames/second).
The extensive amount of video data stored on available media (hard and optical disks) necessitates video content
analysis, which is a cornerstone for different user-friendly applications, such as smart video retrieval and intelligent
video summarization. This paper aims at finding a <i>unified</i> and <i>efficient</i> framework for court-net sports
video analysis. We concentrate on techniques that are generally applicable for more than one sports type to come
to a unified approach. To this end, our framework employs the concept of multi-level analysis, where a novel 3-D
camera modeling is utilized to bridge the gap between the object-level and the scene-level analysis. The new 3-D
camera modeling is based on collecting feature points from two planes, which are perpendicular to each other, so
that a true 3-D reference is obtained. Another important contribution is a new tracking algorithm for the objects
(i.e. players). The algorithm can track up to four players simultaneously. The complete system contributes to
summarization by various forms of information, of which the most important are the moving trajectory and
real-speed of each player, as well as 3-D height information of objects and the semantic event segments in a
game. We illustrate the performance of the proposed system by evaluating it for a variety of court-net sports
videos containing badminton, tennis and volleyball, and we show that the feature detection performance is above
92% and event detection is about 90%.
In this paper, we propose an effective framework for semantic analysis of human motion from a monocular
video. As it is difficult to find a good motion description for humans, we focus on a reliable recognition of the
motion type and estimate the body orientation involved in the video sequence. Our framework analyzes the
body motion in three modules: a pre-processing module, matching module and semantic module. The proposed
framework includes novel object-level processing algorithms, such as a local descriptor and a global descriptor
to detect body parts and analyze the shape of the whole body as well. Both descriptors jointly contribute to the
matching process by incorporating them into a new weighted linear combination for matching. We also introduce
a simple cost function based on time-index differences to distinguish motion types and cycles in human motions.
Our system can provide three different types of analysis results: (1) foreground person detection; (2) motion
recognition in the sequence; (3) 3-D modeling of human motion based on generic human models. The proposed
framework was evaluated and proved effective, achieving motion recognition and body-orientation
classification accuracies of 95% and 98%, respectively.
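The weighted linear combination of the two descriptors can be pictured with a small sketch. The abstract does not give the weights or the distance measures, so the function names `combined_score` and `best_match`, the single `weight` parameter, and the example distances below are all assumptions made for illustration.

```python
def combined_score(local_dist, global_dist, weight=0.5):
    """Weighted linear combination of two descriptor distances (lower is better).

    `weight` is the share given to the local descriptor; the remainder goes
    to the global descriptor. The value 0.5 is an arbitrary assumption.
    """
    return weight * local_dist + (1.0 - weight) * global_dist


def best_match(candidates, weight=0.5):
    """Return the candidate label with the smallest combined distance.

    `candidates` maps a label to a (local_dist, global_dist) pair.
    """
    return min(candidates, key=lambda name: combined_score(*candidates[name], weight))


# Hypothetical distances between an observed pose and two reference motions.
candidates = {"walking": (0.2, 0.6), "running": (0.5, 0.1)}
balanced = best_match(candidates, weight=0.5)
local_heavy = best_match(candidates, weight=0.9)
```

With equal weights the global descriptor decides the match, while a weight of 0.9 lets the local descriptor dominate, which shows why the choice of weights matters for the matching step.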
We propose a fully automatic and flexible framework for analysis and summarization of tennis broadcast video sequences, using visual features and specific game-context knowledge. Our framework can analyze a tennis video sequence at three levels, which provides a broad range of analysis results. The proposed framework includes novel pixel-level and object-level tennis video processing algorithms, such as moving-player detection that takes both the color and the court (playing-field) information into account, and a player-position tracking algorithm based on a <i>3-D</i> camera model. Additionally, we employ scene-level models for detecting events, like service, base-line rally and net-approach, based on a number of <i>real-world</i> visual features. The system can summarize three forms of information: (1) all court-view playing frames in a game, (2) the moving trajectory and real-speed of each player, as well as the relative position between the player and the court, and (3) the semantic event segments in a game. The proposed framework is flexible in choosing the level of analysis that is desired. It is effective because it makes use of several visual cues obtained from the <i>real-world</i> domain to model important events like service, thereby increasing the accuracy of the scene-level analysis. The paper presents experimental results highlighting the system efficiency and analysis capabilities.
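Once player positions are expressed in real-world court coordinates, a real speed follows from the distance covered between frames. The sketch below assumes positions are already given in metres, one per frame, and that the frame rate is known; the function name `player_speeds`, the 25 frames/second rate, and the sample track are hypothetical and not taken from the abstract.

```python
import math


def player_speeds(positions, fps):
    """Return the speed (metres/second) between consecutive court positions.

    `positions` is a list of (x, y) points in metres, one per video frame.
    """
    dt = 1.0 / fps  # time between two consecutive frames, in seconds
    speeds = []
    for (x0, y0), (x1, y1) in zip(positions, positions[1:]):
        speeds.append(math.hypot(x1 - x0, y1 - y0) / dt)
    return speeds


# Hypothetical track: the player moves 0.5 m per frame at 25 frames/second.
track = [(0.0, 0.0), (0.3, 0.4), (0.6, 0.8)]
speeds = player_speeds(track, fps=25)
```

In practice such per-frame speeds are noisy, so a real system would likely smooth the trajectory before differentiating it.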
In this paper, we develop a tree-structured predictive partial matching (PPM) scheme for progressive compression of PointTexture images. By incorporating PPM with tree-structured coding, the proposed
algorithm can compress 3D depth information progressively into a single bitstream. The proposed algorithm also compresses color information using a differential pulse-code modulation (DPCM) coder and interweaves the compressed depth and color information efficiently. Thus, the decoder can reconstruct 3D models from the coarsest resolution to the highest resolution from a single bitstream. Simulation results demonstrate that the proposed algorithm provides much better compression performance than a universal Lempel-Ziv coder, WinZip.
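As a rough illustration of the DPCM idea mentioned above (not the authors' coder), the sketch below predicts each sample from the previous one and stores only the difference. The function names `dpcm_encode` and `dpcm_decode` and the previous-sample predictor are assumptions; the abstract does not specify the predictor or any quantization step.

```python
def dpcm_encode(samples):
    """Lossless DPCM: replace each sample by its difference from the previous one."""
    residuals = []
    prev = 0  # the predictor starts at zero, so the first residual is the sample itself
    for s in samples:
        residuals.append(s - prev)
        prev = s
    return residuals


def dpcm_decode(residuals):
    """Invert dpcm_encode by accumulating the residuals."""
    samples = []
    prev = 0
    for r in residuals:
        prev += r
        samples.append(prev)
    return samples


# Hypothetical run of color values along one scan line.
colors = [120, 122, 121, 125, 125, 130]
encoded = dpcm_encode(colors)
decoded = dpcm_decode(encoded)
```

Because neighbouring color values tend to be similar, the residuals cluster around zero and are cheaper for a subsequent entropy coder to represent than the raw values.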
Stereo images provide an enhanced sense of presence and, in contrast to ordinary images, have been found to be operationally useful in tasks requiring remote manipulation or judgment of spatial relationships. A conventional stereo system with a single left-right pair needs twice the raw data of a monoscopic imaging system. As a result, increasing attention has been given to image compression methods. As an important part of stereo pair coding, disparity estimation influences the precision and efficiency of the coding system. Traditional disparity estimation methods for stereo pair coding mostly use fixed-size block matching (FSBM), but the disparity vectors estimated by this method are not very accurate. To find more accurate disparity vectors, adaptive-size block matching (ASBM) has been used in some stereo matching algorithms. Such algorithms select an appropriate window based on the image content, which improves the accuracy of the estimation. Their primary problem is computational complexity, which prevents their application in stereo coding. In this paper, a novel hybrid block matching (HBM) disparity estimation algorithm is proposed, and a complete stereo coding scheme is built on it. In this scheme, conventional ASBM is improved and integrated with FSBM. The improved ASBM uses only the prediction error of the intensity to control the size of the matching window, which reduces complexity compared with traditional ASBM algorithms. Experimental results show that HBM yields more accurate disparity vectors than a simple FSBM and is less complex than traditional ASBM. Results also demonstrate that the proposed coding scheme provides a higher mean peak signal-to-noise ratio (PSNR), by about 0.7-1.2 dB, than a fixed-size blockwise coding algorithm.
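For readers unfamiliar with the FSBM baseline, the following is a minimal sketch of fixed-size block matching with a sum-of-absolute-differences cost. It illustrates only the baseline, not the proposed HBM algorithm; the function names, the block size, the search range, and the convention that a left-image pixel at column `c` matches the right-image pixel at column `c - d` are assumptions made for this example.

```python
def sad(left, right, row, col, d, block):
    """Sum of absolute differences between a left block and the right block shifted by d."""
    total = 0
    for r in range(row, row + block):
        for c in range(col, col + block):
            total += abs(left[r][c] - right[r][c - d])
    return total


def fsbm_disparity(left, right, block=2, max_disp=3):
    """Return one horizontal disparity per fixed-size block of the left image."""
    height, width = len(left), len(left[0])
    disparities = []
    for row in range(0, height - block + 1, block):
        row_disp = []
        for col in range(0, width - block + 1, block):
            best_d, best_cost = 0, None
            # Only shifts that keep the right block inside the image are tried.
            for d in range(0, min(max_disp, col) + 1):
                cost = sad(left, right, row, col, d, block)
                if best_cost is None or cost < best_cost:
                    best_d, best_cost = d, cost
            row_disp.append(best_d)
        disparities.append(row_disp)
    return disparities


# Synthetic pair: the left image is the right image shifted by two pixels.
right = [[10, 20, 30, 40, 50, 60],
         [15, 25, 35, 45, 55, 65]]
left = [[0, 0, 10, 20, 30, 40],
        [0, 0, 15, 25, 35, 45]]
disparity_map = fsbm_disparity(left, right)
```

The first block sits at the image border, where no shift can be tested, which hints at why a fixed window is a blunt tool and why adaptive window sizes can give more accurate vectors at a higher computational cost.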
The rapidly increasing usage of multimedia environments has led to a greater demand for image retrieval. In this paper, we propose a method for image database retrieval based on salient edges. It achieves both the desired efficiency and accuracy using a three-stage process: in the first stage, we extract edge points from the original image and link them into edge curves; in the second stage, we select salient edges according to their lengths and compute the rotational angle histogram (RAH) and corners' average frequency (CAF) for every salient edge; in the last stage, a feature vector is generated from those RAHs and CAFs. We have tested this technique on an image database containing more than 4000 images, and all results show that our scheme performs retrieval efficiently. When an image database is on the order of tens of thousands of images, suitable indexing methods become critical for efficient query processing. This paper also presents a new indexing method called tree structured triangle inequality (TSTI), which combines the triangle inequality indexing method with a tree structured indexing technique. The experiments provide evidence that the proposed method improves retrieval speed without reducing accuracy.
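The triangle inequality component of such an index can be sketched as follows. This is a generic single-pivot pruning scheme, not the TSTI method itself, whose tree structure the abstract does not describe; the function names, the Euclidean distance, the pivot choice, and the toy feature vectors are assumptions.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def build_index(features, pivot):
    """Precompute the distance from every database feature vector to the pivot."""
    return [euclidean(f, pivot) for f in features]


def range_query(query, radius, features, pivot, pivot_dists):
    """Return indices within `radius` of `query` and the number of full distance computations."""
    dq = euclidean(query, pivot)
    hits, computed = [], 0
    for i, f in enumerate(features):
        # Triangle inequality: |d(q, p) - d(f, p)| <= d(q, f), so a large gap rules f out.
        if abs(dq - pivot_dists[i]) > radius:
            continue
        computed += 1
        if euclidean(query, f) <= radius:
            hits.append(i)
    return hits, computed


# Toy database of 2-D feature vectors with the origin as the pivot.
features = [(0, 0), (1, 0), (5, 5), (10, 0)]
pivot = (0, 0)
pivot_dists = build_index(features, pivot)
hits, computed = range_query((1, 1), 1.5, features, pivot, pivot_dists)
```

Here two of the four database vectors are rejected without computing their distance to the query, and the result is identical to an exhaustive search, which mirrors the claim of faster retrieval without loss of accuracy.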