The major drawback of interactive retrieval systems is the potential frustration of the user caused by
excessive labelling work. Active learning has proven to help solve this issue by carefully selecting the
examples to present to the user. In this context, the design of the user interface plays a critical role since it
should invite the user to label the examples selected by active learning.
This paper presents the design and evaluation of an innovative user interface for image retrieval. It has been
validated using real-life IEEE PETS video surveillance data.
In particular, we investigated the most appropriate division of the display area between the retrieved video
frames and the active learning examples, taking both objective and subjective user satisfaction parameters into account.
The flexibility of the interface relies on a scalable representation of the video content, Motion JPEG
2000 in our implementation.
On-board video analysis has attracted a lot of interest over the last two decades, with the main goal of improving safety by
detecting obstacles or assisting the driver. Our study aims at providing a real-time understanding of urban road
traffic. Considering a video camera fixed on the front of a public bus, we propose a cost-effective approach to estimate
the speed of the vehicles on the adjacent lanes when the bus operates on a dedicated lane. We work on 1-D segments
drawn in the image space, aligned with the road lanes. The relative speed of the vehicles is computed by detecting and
tracking features along each of these segments. The absolute speed can be estimated from the relative speed if the camera
speed is known, e.g. from an odometer and/or GPS. Using pre-defined speed thresholds, the traffic can be classified
into different categories such as 'fluid', 'congestion', etc. The solution offers both good performance and low computational
complexity and is compatible with cheap video cameras, which facilitates its adoption by city traffic management authorities.
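
The last two steps described above (absolute speed from relative speed, then threshold-based classification) can be sketched as follows. This is a minimal illustration only, not the authors' implementation: the threshold values, the intermediate 'dense' category, and the sign convention (relative speed is positive when the observed vehicle is faster than the bus) are assumptions that do not appear in the abstract.

```python
from bisect import bisect_right

# Hypothetical thresholds in km/h; the abstract does not give the real values.
THRESHOLDS_KMH = [15.0, 40.0]
# 'congestion' and 'fluid' come from the abstract; 'dense' is a placeholder
# for whatever intermediate categories the "etc." covers.
LABELS = ["congestion", "dense", "fluid"]


def absolute_speed(relative_kmh, camera_kmh):
    """Ground speed of a tracked vehicle.

    Assumes relative_kmh is measured along the lane and is positive when
    the vehicle moves faster than the bus carrying the camera.
    """
    return relative_kmh + camera_kmh


def classify_traffic(speed_kmh):
    """Map a speed to a traffic category using the pre-defined thresholds."""
    # bisect_right counts how many thresholds are less than or equal to the speed.
    return LABELS[bisect_right(THRESHOLDS_KMH, abs(speed_kmh))]


if __name__ == "__main__":
    # Bus at 30 km/h, vehicles on the adjacent lane 10 km/h slower.
    speed = absolute_speed(-10.0, 30.0)
    print(speed, classify_traffic(speed))
```

Keeping the thresholds in a sorted list means further categories can be added without changing the classification function.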
Videoconferencing is becoming increasingly attractive because of the economic and
ecological cost of transport. Several platforms exist. The goal of the TIFANIS immersive platform is to let
users interact as if they were physically together. Unlike previous teleimmersion systems, TIFANIS uses
generic hardware to achieve an economically realistic implementation. The basic functions of the system are
to capture the scene, transmit it through digital networks to other partners, and then render it according to
each partner's viewing characteristics. The image processing part should run in real-time.
We propose to analyze the whole system. It can be split into different services such as central processing
unit (CPU), graphical rendering, direct memory access (DMA), and communication through the network.
Most of the processing is done on the CPU. It consists of the 3D reconstruction and the detection
and tracking of faces in the video stream. However, the processing needs to be parallelized into several
threads that have as few dependencies as possible. In this paper, we present these issues and the way we deal with them.
Globalisation of people's interaction in the industrial world and the ecological cost of transport make videoconferencing an attractive solution for collaborative work. However, the lack of immersive perception makes videoconferencing unappealing. The TIFANIS tele-immersion system was conceived to let users interact as if they were physically together. In this paper, we focus on an important feature of the immersive system: the automatic tracking of the user's point of
view in order to correctly render the scene from the other site on the user's display. Viewpoint information has to be computed in a very short time and the detection system should be non-intrusive; otherwise it would become cumbersome for the user, who would lose the feeling of "being there". The viewpoint detection system consists of several modules. First, an analysis module identifies and follows regions of
interest (ROI) where faces are detected. We show how spatial detection and temporal tracking cooperate. Secondly, an eye detector finds the position of the eyes within faces. Then, the 3D positions of the eyes are deduced using stereoscopic images from a binocular camera. Finally, the 3D scene is rendered in real time according to the new point of view.
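
As an illustration of the third step, the sketch below recovers a 3D eye position from its pixel coordinates in the two images. It assumes an ideal rectified stereo pair under the pinhole model, which the abstract does not state; the function names and the parameters (focal length in pixels, baseline in metres, principal point) are hypothetical.

```python
def triangulate(x_left, x_right, y, focal_px, baseline_m, cx, cy):
    """3D position (in metres, left-camera frame) of a point seen in both images.

    Assumes a rectified pair: the point lies on the same row y in both images
    and differs only by a horizontal disparity.
    """
    disparity = x_left - x_right
    if disparity <= 0:
        raise ValueError("disparity must be positive for a point in front of the cameras")
    z = focal_px * baseline_m / disparity   # depth from similar triangles
    x = (x_left - cx) * z / focal_px        # back-project the pixel column
    y3 = (y - cy) * z / focal_px            # back-project the pixel row
    return (x, y3, z)


def viewpoint(left_eye_3d, right_eye_3d):
    """Midpoint between the two eyes, used here as the user's point of view."""
    return tuple((a + b) / 2.0 for a, b in zip(left_eye_3d, right_eye_3d))


if __name__ == "__main__":
    # Hypothetical camera: 800 px focal length, 10 cm baseline, 640x480 image.
    eye_a = triangulate(340, 320, 240, 800.0, 0.1, 320, 240)
    eye_b = triangulate(356, 336, 240, 800.0, 0.1, 320, 240)
    print(viewpoint(eye_a, eye_b))
```

Taking the midpoint of the two eyes as the viewpoint is one simple choice; a stereoscopic display would instead render one view per eye.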
In this paper, we present an integrated system for smart encoding in video surveillance. This system, developed within the European IST WCAM project, aims at defining an optimized JPEG 2000 codestream organization directly based on the semantic content provided by the video surveillance analysis module. The proposed system produces a fully compliant Motion JPEG 2000 stream that contains region-of-interest data (typically mobile objects) in a separate layer from regions of less interest (e.g. static background). First, the system performs a real-time unsupervised segmentation of mobile objects in each frame of the video. The smart encoding module then uses these region-of-interest maps to construct a Motion JPEG 2000 codestream that allows an optimized rendering of the video surveillance stream in low-bandwidth wireless applications, allocating more quality to mobile objects than to the background. Our integrated system improves the coding representation of the video content without data overhead. It can also be used in applications requiring selective scrambling of regions of interest, as well as in any other application dealing with regions of interest.
This paper presents a new method for remote and interactive browsing of long video surveillance sequences. The solution is based on interactive navigation in JPEG 2000 coded mega-images. We assume that the video 'key-frames' are available through automatic detection of scene changes or abnormal behaviors. These key-frames are concatenated in raster scanning order to form a very large 2D image, which is then compressed with JPEG 2000 to produce a scalable video summary of the sequence. We then exploit a mega-image navigation platform, designed in full compliance with JPEG 2000 Part 9 "JPIP", to search and visualize the desired content based on client requests. The flexibility offered by JPEG 2000 allows highlighting the key-frames corresponding to the required content within a low-quality, low-resolution version of the whole summary. Such fine-grained scalability is a unique feature of the proposed JPEG 2000 video summaries. The possibility to visualize key-frames of interest and play back the corresponding video shots within the context of the whole sequence enables the user to understand the temporal relations between semantically similar events. It is therefore particularly suited to analyzing complex incidents consisting of many successive events spread over a long period.
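
The raster-order layout of the summary can be illustrated with a little index arithmetic. The sketch below only shows how a key-frame index maps to a rectangle of the mega-image and back; it assumes equally sized key-frames and a fixed number of columns, neither of which is specified in the abstract, and it does not implement any part of JPEG 2000 or JPIP. All function names are hypothetical.

```python
import math


def mosaic_size(n_frames, columns, frame_w, frame_h):
    """Pixel size (width, height) of the mega-image holding n_frames key-frames."""
    rows = math.ceil(n_frames / columns)
    return (columns * frame_w, rows * frame_h)


def frame_region(index, columns, frame_w, frame_h):
    """Rectangle (x, y, width, height) occupied by key-frame `index`.

    Key-frames are laid out left to right, then top to bottom.
    """
    row, col = divmod(index, columns)
    return (col * frame_w, row * frame_h, frame_w, frame_h)


def frame_at(px, py, columns, frame_w, frame_h, n_frames):
    """Index of the key-frame under pixel (px, py), or None if there is none."""
    col, row = px // frame_w, py // frame_h
    if not (0 <= col < columns) or row < 0:
        return None
    index = row * columns + col
    return index if index < n_frames else None


if __name__ == "__main__":
    # 10 CIF-sized key-frames (352x288) arranged in 4 columns.
    print(mosaic_size(10, 4, 352, 288))
    print(frame_region(5, 4, 352, 288))
    print(frame_at(400, 300, 4, 352, 288, 10))
```

A rectangle returned by `frame_region` is the kind of window a client could ask for at higher quality or resolution while the rest of the summary stays coarse.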
In this paper, we present an integrated system for video surveillance developed within the European IST WCAM project, using only standard multimedia and networking tools. Besides allowing cost reduction and interoperability, the advantage of such a system is that it benefits from the fast technological evolution of video encoding and distribution tools.
We present an integrated real-time smart network camera. The system is composed of an image sensor, an embedded PC-based electronic card for image processing, and network capabilities. The application detects events of interest in visual scenes, highlights alarms and computes statistics. The system also produces metadata that can be shared with other cameras in a network. We describe the requirements of such a system and then show how its design is optimized to process and compress video in real time. Indeed, typical video surveillance algorithms such as background differencing, tracking and event detection must be highly optimized and simplified to run on this hardware. To achieve a good match between hardware and software in this lightweight embedded system, the software management is written on top of the Java-based middleware specification established by the OSGi Alliance. Software and hardware can be integrated easily in complex environments thanks to the Java Real-Time specification for the virtual machine and to network- and service-oriented Java specifications (such as RMI and Jini). Finally, we report some outcomes and typical case studies of such a camera, such as counter-flow detection.
JPEG2000 offers a new coding of images with a hierarchical data
structure from which a user can pick only the part of the codestream
needed to generate a representation matched to his needs. Therefore, a unified and multipurpose image exchange system can be built by
combining the new JPEG2000 technology with classical mailer tools.
This paper proposes such a flexible access method. The proposed server consists of two POP3 servers, an SMTP server, the image list database and the image database. The first POP3 server performs user
authentication and provides the image list. The SMTP server receives the client's request and transmits the appropriate image.
First, the user connects to the first POP3 server with an ID and password, then receives the list of available images as e-mails. One mail is sent for each available representation of an image, i.e. each JPEG2000 progression order and image format. The hash value of the codestream and the JPEG2000 progression order are included in the Message-ID of each mail header. Information such as the image name, size and quality is written in the message body. Then, the user orders an image by replying to the corresponding mail of the list. When the SMTP server receives the client's e-mail, it prepares the desired image and delivers a mail containing that image to the client via the second POP3 server. The proposed system is validated with mailer tools on different platforms.
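
The Message-ID mechanism can be sketched with Python's standard `email` package. This is an illustration under stated assumptions, not the authors' format: the `<hash.progression@domain>` layout, the choice of SHA-256, the `example.org` addresses and both function names are hypothetical. It also assumes that the user's mailer copies the original Message-ID into the `In-Reply-To` header of the reply, as mail clients commonly do.

```python
import hashlib
from email.message import EmailMessage

DOMAIN = "example.org"  # placeholder domain for the sketch


def build_list_mail(codestream, progression, name, recipient):
    """Build one image-list mail for one representation of an image.

    The Message-ID carries the codestream hash and the progression order
    (e.g. "LRCP" or "RPCL"); the body carries human-readable information.
    """
    digest = hashlib.sha256(codestream).hexdigest()
    msg = EmailMessage()
    msg["From"] = f"images@{DOMAIN}"
    msg["To"] = recipient
    msg["Subject"] = name
    msg["Message-ID"] = f"<{digest}.{progression}@{DOMAIN}>"
    msg.set_content(f"name: {name}\nsize: {len(codestream)} bytes\n")
    return msg


def parse_order(in_reply_to):
    """Recover (hash, progression order) from the In-Reply-To value of a reply."""
    local_part = in_reply_to.strip().strip("<>").split("@", 1)[0]
    digest, progression = local_part.rsplit(".", 1)
    return digest, progression


if __name__ == "__main__":
    mail = build_list_mail(b"\xff\x4f\xff\x51", "LRCP", "gate.jp2", "user@example.org")
    # A reply to this mail would carry its Message-ID in In-Reply-To.
    print(parse_order(mail["Message-ID"]))
```

Encoding the request in a header that mailers echo back automatically is what lets an unmodified mail client place an order simply by replying.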