For the 9,000 train accidents reported each year in the European Union, the Recording Strip (RS) and Filling-Card (FC) related to train activities represent the only usable evidence for SNCF (the French railway operator) and most national authorities. More precisely, the RS contains information about the train journey, speed and related Driving Events (DE) such as emergency brakes, while the FC gives details on the departure and arrival stations. In this context, a complete check of 100% of the RS (instead of the 5% currently inspected) was recently voted by the French authorities, which raises the question of an automated and efficient inspection of this huge volume of recordings. To this end, we propose a machine-vision prototype built around cassettes that receive the RS and FC to be digitized. A video analysis module first determines the type of RS among eight possible types; time/speed curves are then extracted to estimate the covered distance, speed and stops; finally, the associated DE are detected using a convolution process. A detailed evaluation on 15 RS (8,000 kilometres and 7,000 DE) shows very good results (100% correct identification of the band type, and only 0.28% missed DE). An exhaustive evaluation on a panel of about 100 RS is planned as future work.
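The convolution-based detection of Driving Events on the extracted speed curve can be pictured as a matched filter sliding along a 1-D signal. The sketch below is an illustrative assumption, not the prototype's actual implementation: the template shape, the 0.8 threshold and all names are invented for the example.

```python
import numpy as np

def detect_events(speed, template, threshold=0.8):
    """Detect driving events (e.g. emergency brakes) in a 1-D speed
    curve by normalised cross-correlation with an event template
    (matched filter). Threshold and names are illustrative."""
    # Normalise the template to zero mean and unit energy
    t = template - template.mean()
    t = t / np.linalg.norm(t)
    n = len(t)
    scores = np.zeros(len(speed) - n + 1)
    for i in range(len(scores)):
        w = speed[i:i + n] - speed[i:i + n].mean()
        norm = np.linalg.norm(w)
        scores[i] = np.dot(w, t) / norm if norm > 0 else 0.0
    # Keep local maxima of the correlation score above the threshold
    return [i for i in range(1, len(scores) - 1)
            if scores[i] > threshold
            and scores[i] >= scores[i - 1]
            and scores[i] >= scores[i + 1]]
```

A braking event then shows up as a correlation peak at the frame where the speed ramp matches the template.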
This paper presents a new fusion scheme that enhances result quality by combining multiple different detectors. We present a study on fusing the outputs of several video analysis detectors for tasks such as detecting unattended luggage in video sequences. One problem is the time jitter between detectors: typically, one system can trigger an event several seconds before another. Another issue is computing an adequate fusion of the realigned events. We propose a fusion system that overcomes these problems by being able (i) to match, off-line during the learning stage, the ground-truth events with the detector output events using a dynamic programming scheme, (ii) to learn the relation between ground truth and detector output, and (iii) to fuse the events from different detectors in real time, based on the learning stage, so as to maximize the global quality of the result. We show promising results by combining the outputs of different video analysis detector technologies.
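The off-line matching of ground-truth events with detector events can be sketched as a classic dynamic-programming alignment over timestamps, where a match costs the time jitter and an unmatched event pays a gap penalty. The gap cost and all names below are illustrative assumptions, not the paper's actual formulation.

```python
def align_events(gt, det, gap=5.0):
    """Align ground-truth event times `gt` with detected times `det`
    (both sorted lists of seconds), tolerating time jitter.
    Returns the matched (gt_index, det_index) pairs."""
    n, m = len(gt), len(det)
    INF = float("inf")
    # cost[i][j]: minimal cost of aligning gt[:i] with det[:j]
    cost = [[INF] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(n + 1):
        for j in range(m + 1):
            c = cost[i][j]
            if c == INF:
                continue
            if i < n and j < m:   # match: pay the absolute time jitter
                cost[i + 1][j + 1] = min(cost[i + 1][j + 1],
                                         c + abs(gt[i] - det[j]))
            if i < n:             # ground-truth event missed by the detector
                cost[i + 1][j] = min(cost[i + 1][j], c + gap)
            if j < m:             # false alarm in the detector output
                cost[i][j + 1] = min(cost[i][j + 1], c + gap)
    # Backtrack to recover which events were matched
    pairs, i, j = [], n, m
    while i > 0 or j > 0:
        if (i > 0 and j > 0 and
                abs(cost[i][j] - (cost[i-1][j-1] + abs(gt[i-1] - det[j-1]))) < 1e-9):
            pairs.append((i - 1, j - 1)); i -= 1; j -= 1
        elif i > 0 and abs(cost[i][j] - (cost[i-1][j] + gap)) < 1e-9:
            i -= 1
        else:
            j -= 1
    return pairs[::-1]
```

With this alignment in hand, the learning stage can then relate each detector's jitter and miss pattern to the ground truth.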
The major drawback of interactive retrieval systems is the potential frustration caused to the user by excessive labelling work. Active learning has proven to help solve this issue by carefully selecting the examples to present to the user. In this context, the design of the user interface plays a critical role, since it should invite the user to label the examples selected by the active learning.
This paper presents the design and evaluation of an innovative user interface for image retrieval. It has been validated using real-life IEEE PETS video surveillance data.
In particular, we investigated the most appropriate partition of the display area between the retrieved video frames and the active learning examples, taking both objective and subjective user satisfaction parameters into account.
The flexibility of the interface relies on a scalable representation of the video content, such as Motion JPEG 2000 in our implementation.
On-board video analysis has attracted a lot of interest over the last two decades, with the main goal of improving safety by detecting obstacles or assisting the driver. Our study aims at providing a real-time understanding of urban road traffic. Considering a video camera fixed on the front of a public bus, we propose a cost-effective approach to estimate the speed of the vehicles on the adjacent lanes when the bus operates on a dedicated lane. We work on 1-D segments drawn in the image space, aligned with the road lanes. The relative speed of the vehicles is computed by detecting and tracking features along each of these segments. The absolute speed can be estimated from the relative speed if the camera speed is known, e.g. from an odometer and/or GPS. Using pre-defined speed thresholds, the traffic can be classified into categories such as 'fluid' or 'congested'. The solution offers both good performance and low computational complexity, and is compatible with cheap video cameras, which favours its adoption by city traffic management authorities.
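The speed estimation chain described above can be sketched in a few lines: pixel displacement along the lane-aligned segment gives the relative speed, the camera (bus) speed shifts it to an absolute speed, and thresholds map it to a category. The calibration factor, thresholds and all names are illustrative assumptions.

```python
def relative_speed(track_px, px_to_m, fps):
    """Relative speed of a vehicle from its feature positions (pixels
    along the 1-D segment) tracked over consecutive frames.
    px_to_m: metres per pixel along the segment (assumed calibrated);
    fps: camera frame rate."""
    if len(track_px) < 2:
        return 0.0
    # Mean per-frame displacement, converted to metres per second
    dx = (track_px[-1] - track_px[0]) / (len(track_px) - 1)
    return dx * px_to_m * fps

def absolute_speed(rel_speed, camera_speed):
    # Add the bus's own speed (odometer/GPS) to get the absolute speed
    return rel_speed + camera_speed

def classify(speed_mps, fluid=8.0, slow=3.0):
    """Map a speed (m/s) to a traffic category; thresholds are assumed."""
    if speed_mps >= fluid:
        return "fluid"
    if speed_mps >= slow:
        return "slow"
    return "congestion"
```

The per-segment cost is a handful of arithmetic operations per frame, which is what keeps the approach compatible with cheap hardware.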
We present a point-based reconstruction and transmission pipeline for a collaborative tele-immersion system.
Two or more users in different locations collaborate with each other in a shared, simulated environment as
if they were in the same physical room. Each user perceives point-based models of distant users along with
collaborative data like molecule models. Disparity maps, computed by a commercial stereo solution, are filtered
and transformed into clouds of 3D points. The clouds are compressed and transmitted over the network to distant
users. At the other side the clouds are decompressed and incorporated into the 3D scene. The viewpoint used
to display the 3D scene is dependent on the position of the head of the user. Collaborative data is manipulated
through natural hand gestures. We analyse the performance of the system in terms of computation time, latency
and photorealistic quality of the reconstructed models.
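The step that transforms filtered disparity maps into clouds of 3D points follows the standard pinhole stereo model, Z = f·B/d. The sketch below is a minimal illustration under that assumption; the calibration parameters and names are invented for the example.

```python
def disparity_to_points(disp, f, baseline, cx, cy):
    """Convert a disparity map into a cloud of 3D points.
    disp: 2-D list of disparities in pixels (invalid pixels <= 0);
    f: focal length in pixels; baseline: stereo baseline in metres;
    (cx, cy): principal point. All assumed to come from calibration."""
    points = []
    for v, row in enumerate(disp):
        for u, d in enumerate(row):
            if d <= 0:          # filtered-out / unmatched pixel
                continue
            z = f * baseline / d          # depth from disparity
            x = (u - cx) * z / f          # back-project to camera frame
            y = (v - cy) * z / f
            points.append((x, y, z))
    return points
```

The resulting cloud is what gets compressed and transmitted to the distant users.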
Nowadays, video-conferencing is increasingly attractive because of the economic and ecological cost of transport. Several platforms exist. The goal of the TIFANIS immersive platform is to let users interact as if they were physically together. Unlike previous tele-immersion systems, TIFANIS uses generic hardware to achieve an economically realistic implementation. The basic functions of the system are to capture the scene, transmit it through digital networks to the other partners, and then render it according to each partner's viewing characteristics. The image processing part should run in real time.
We propose to analyze the whole system. It can be split into different services: central processing unit (CPU), graphical rendering, direct memory access (DMA), and communications through the network. Most of the processing is done by the CPU resource. It comprises the 3D reconstruction and the detection and tracking of faces from the video stream. However, the processing needs to be parallelized into several threads with as few dependencies as possible. In this paper, we present these issues and the way we deal with them.
Today's video analysis technologies rely on state-of-the-art systems and formalisms, such as ontologies and data warehousing, to handle the huge amounts of data generated, from low-level to high-level descriptors. In the IST CARETAKER project, we develop a multi-dimensional database with distributed features to provide a data-centric view of the scene shared between all the sensors of a network.
We propose to enhance the possibilities of this kind of system by delegating the intelligence to many other entities, known as 'agents': small specialized applications able to travel across the network and work on dedicated sets of data related to their core domain. In other words, we can reduce or enhance the complexity of the analysis by adding or removing feature-specific agents, and processing is limited to the data it actually concerns.
This article explains how to design and develop an agent-oriented system that can be used with a video analysis data warehouse. We also describe how this methodology distributes the intelligence over the system, and how the system can be extended into a self-reasoning architecture using cooperative agents. We demonstrate this approach.
Object tracking from multiple Pan-Tilt-Zoom (PTZ) cameras is an important task. This paper deals with the evaluation of the results of such a system. The performance evaluation first considers the characterization of the PTZ parameters, and then the trajectories themselves. The camera parameters will be evaluated through homography errors; the trajectories will be evaluated according to location and misidentification errors.
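The homography-error criterion for the camera parameters can be sketched as the mean reprojection error of an estimated 3x3 homography over a set of matched points. This is only an illustration of the idea; the function and parameter names are assumptions, not the paper's notation.

```python
def homography_error(H, src_pts, dst_pts):
    """Mean reprojection error of a 3x3 homography H (list of lists)
    mapping src_pts to dst_pts, both lists of (x, y) tuples."""
    total = 0.0
    for (x, y), (xd, yd) in zip(src_pts, dst_pts):
        # Apply H in homogeneous coordinates, then de-homogenise
        w = H[2][0] * x + H[2][1] * y + H[2][2]
        xp = (H[0][0] * x + H[0][1] * y + H[0][2]) / w
        yp = (H[1][0] * x + H[1][1] * y + H[1][2]) / w
        total += ((xp - xd) ** 2 + (yp - yd) ** 2) ** 0.5
    return total / len(src_pts)
```

A perfectly estimated homography scores zero; the error grows with the misalignment between the PTZ views.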
High-level video content analysis, such as video surveillance, is often limited by the computational cost of automatic image understanding: reasoning processes like categorization require huge computing resources, and huge amounts of data are needed to represent knowledge of objects, scenarios and other models.
This article explains how to design and develop a 'near real-time adaptive image datamart', used first as a decision-support system for vision algorithms and then as a mass storage system. Using the RDF specification as the storage format for vision-algorithm metadata, we can optimise data warehouse concepts for video analysis, add processes able to adapt the current model, and pre-process data to speed up queries. In this way, when new data is sent from a sensor to the data warehouse for long-term storage, using remote procedure calls embedded in object-oriented interfaces to simplify queries, it is processed and the in-memory data model is updated. After some processing, possible interpretations of this data can be returned to the sensor.
To demonstrate this new approach, we present typical scenarios applied to this architecture, such as people tracking and event detection in a multi-camera network. Finally, we show how this system becomes a container of high-level semantic data for external data mining.
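The RDF-style storage of vision metadata can be pictured as a set of (subject, predicate, object) triples with wildcard queries. The toy class below is a deliberately minimal sketch of that idea; the class name, the example URIs and the query interface are all invented for illustration and bear no relation to the project's actual schema.

```python
class TripleStore:
    """Minimal in-memory store of RDF-style triples describing
    vision-algorithm metadata (toy illustration only)."""
    def __init__(self):
        self.triples = set()

    def add(self, s, p, o):
        self.triples.add((s, p, o))

    def query(self, s=None, p=None, o=None):
        # None acts as a wildcard, like an unbound SPARQL variable
        return [t for t in self.triples
                if (s is None or t[0] == s)
                and (p is None or t[1] == p)
                and (o is None or t[2] == o)]
```

A sensor reporting a tracked person would then push triples such as ("track:42", "rdf:type", "ex:Person"), and reasoning processes would query the store by pattern.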
The globalisation of interaction in the industrial world and the ecological cost of transport make video-conferencing an interesting solution for collaborative work. However, the lack of immersive perception makes video-conferencing unappealing. The TIFANIS tele-immersion system was conceived to let users interact as if they were physically together. In this paper, we focus on an important feature of the immersive system: the automatic tracking of the user's point of view in order to correctly render in his display the scene from the other site. Viewpoint information has to be computed in a very short time, and the detection system should be non-intrusive; otherwise it would become cumbersome for the user, i.e. he would lose the feeling of 'being there'. The viewpoint detection system consists of several modules. First, an analysis module identifies and follows regions of interest (ROI) where faces are detected. We show the cooperative approach between spatial detection and temporal tracking. Second, an eye detector finds the position of the eyes within the faces. Then, the 3D positions of the eyes are deduced using stereoscopic images from a binocular camera. Finally, the 3D scene is rendered in real time according to the new point of view.
Partners of the CANDELA project are building a system for real-time image processing in traffic and video surveillance applications. This system performs segmentation, labels the extracted blobs and follows their tracks through the scene. We also address the problem of evaluating the results of such processes. We are developing a tool to generate and manage the results of the performance evaluation of VCA systems. The evaluation is done by comparing the results of the global application, and of its components, with a ground-truth file generated manually. Both the manually and the automatically generated description files are formatted in XML. This descriptive markup is then processed to assemble the appropriate parts of the documents and compare the metadata. For a scientific purpose, this tool provides an objective measure of improvement and a means to choose between competing methods. In addition, it is a powerful tool for algorithm designers to measure the progress of their work at the different levels of the processing chain. For an industrial purpose, this tool assesses the accuracy of the VCA, with an obvious marketing impact. We present the definition of the evaluation tool, its metrics, and the specific implementations designed for our applications.
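The core of such a comparison between detector output and ground truth can be sketched as a matching step followed by precision/recall computation. The matching tolerance and all names below are illustrative assumptions; the actual tool's metrics are defined in the paper.

```python
def evaluate(gt_events, det_events, tol=1.0):
    """Score detector output against ground truth: each detection is
    matched to the nearest unmatched ground-truth event within `tol`
    seconds. Returns (precision, recall)."""
    matched = set()
    tp = 0
    for d in det_events:
        best = None
        for i, g in enumerate(gt_events):
            if i in matched or abs(g - d) > tol:
                continue
            if best is None or abs(g - d) < abs(gt_events[best] - d):
                best = i
        if best is not None:      # true positive: detection matched
            matched.add(best)
            tp += 1
    precision = tp / len(det_events) if det_events else 1.0
    recall = tp / len(gt_events) if gt_events else 1.0
    return precision, recall
```

Computed per component and for the global chain, such scores give the objective measure of improvement the abstract mentions.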
We present an integrated real-time smart network camera. The system is composed of an image sensor, an embedded PC-based electronic card for image processing, and network capabilities. The application detects events of interest in visual scenes, raises alarms and computes statistics. The system also produces metadata that can be shared with other cameras in a network. We describe the requirements of such a system and then show how its design is optimized to process and compress video in real time. Indeed, typical video surveillance algorithms such as background differencing, tracking and event detection have to be highly optimized and simplified to run on this hardware. To achieve a good match between hardware and software in this light embedded system, the software management is written on top of the Java-based middleware specification established by the OSGi alliance. We can easily integrate software and hardware in complex environments thanks to the Java Real-Time specification for the virtual machine and several network- and service-oriented Java specifications (such as RMI and Jini). Finally, we report some outcomes and typical case studies of such a camera, such as counter-flow detection.
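Background differencing, the first of the simplified algorithms mentioned above, can be sketched as a running-average background model plus a per-pixel threshold. The learning rate and threshold values are assumptions for illustration, not the camera's actual parameters.

```python
def update_background(bg, frame, alpha=0.05):
    """Running-average background update; bg and frame are 2-D lists
    of grey levels, alpha is an assumed learning rate."""
    return [[(1 - alpha) * b + alpha * f for b, f in zip(br, fr)]
            for br, fr in zip(bg, frame)]

def foreground_mask(bg, frame, thresh=25):
    """Mark pixels that differ from the background model by more than
    `thresh` grey levels as foreground."""
    return [[abs(f - b) > thresh for b, f in zip(br, fr)]
            for br, fr in zip(bg, frame)]
```

Its per-pixel cost is a multiply-add and a comparison, which is why it suits a light embedded platform.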
In this article we present a generic, flexible, scalable and robust approach for an intelligent real-time forensic visual system. The proposed implementation can be rapidly deployed and requires minimal logistic support, as it embeds low-complexity devices (PCs and cameras) that communicate through a wireless network.
The goal of these advanced tools is to provide intelligent storage of potential video evidence for fast intervention during deployment around a hazardous sector after a terrorist attack, a disaster or an air crash, or before an attempted one.
Advanced video analysis tools, such as segmentation and tracking, are provided to support intelligent storage and annotation.
Robust image watermarking algorithms have been proposed as a means of discouraging the illicit copying and distribution of copyrighted material. With robustness to pixel modifications in mind, many watermarking designers use techniques from the communications domain, such as spread spectrum, to embed hidden information in the spatial or in the transform domain. There are numerous attacks, designed to defeat watermarking algorithms, that degrade images through geometric distortions; one way to counter them is to add synchronization information. In this paper we present an analysis of this type of distortion and propose a metric to estimate the distortion undergone by an image. The metric is content-independent and invariant to global translation, rotation and scaling, which can be considered non-meaningful transformations. To demonstrate the relevance of this metric, we compare some of its results with the subjective degradation of images produced by the Stirmark software.
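A metric invariant to global translation, rotation and scaling can be sketched by fitting the best similarity transform between matched point sets (least squares) and measuring the residual displacement it cannot explain. This is only a sketch of the general idea, not the paper's exact metric; the point sets and names are assumptions.

```python
import math

def residual_distortion(src, dst):
    """Fit the best global similarity transform (translation, rotation,
    scale) mapping src to dst, then return the mean residual
    displacement. src/dst: matched lists of (x, y) tuples."""
    n = len(src)
    mx = sum(p[0] for p in src) / n; my = sum(p[1] for p in src) / n
    ux = sum(p[0] for p in dst) / n; uy = sum(p[1] for p in dst) / n
    # Centre both point sets to factor out the translation
    a = [(x - mx, y - my) for x, y in src]
    b = [(x - ux, y - uy) for x, y in dst]
    # Optimal rotation+scale via complex linear regression b ~ s*e^{i*theta}*a
    num_re = sum(ax * bx + ay * by for (ax, ay), (bx, by) in zip(a, b))
    num_im = sum(ax * by - ay * bx for (ax, ay), (bx, by) in zip(a, b))
    den = sum(ax * ax + ay * ay for ax, ay in a) or 1.0
    cr, ci = num_re / den, num_im / den      # s*cos(theta), s*sin(theta)
    res = 0.0
    for (ax, ay), (bx, by) in zip(a, b):
        px = cr * ax - ci * ay               # src point after the fitted transform
        py = ci * ax + cr * ay
        res += math.hypot(px - bx, py - by)
    return res / n
```

Pure translation, rotation or scaling then scores zero, while non-meaningful-preserving attacks such as shearing or local warps produce a positive residual.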
In this article we present a generic, flexible and robust approach for an intelligent real-time video surveillance system. A previous version of the system was presented in . The goal of these advanced tools is to assist operators by detecting events of interest in visual scenes, raising alarms and computing statistics. The proposed system is a multi-camera platform able to handle different video input standards (composite, IP, IEEE 1394), which it can compress (MPEG-4), store and display. The platform also integrates advanced video analysis tools, such as motion detection, segmentation, tracking and interpretation. The design of the architecture is optimised to play back, display and process video flows efficiently for video surveillance applications. The implementation is distributed over a scalable computer cluster based on Linux and an IP network. It relies on POSIX threads for multitasking scheduling. Data flows are transmitted between the different modules using multicast technology, under the control of a TCP-based command network (e.g. for bandwidth occupation control). We report some results and show the potential of such a flexible system in third-generation video surveillance. We illustrate the interest of the system in a real case study: indoor surveillance.