Following a series of successful launches of MPEG video compression standards, their applications reveal an ever-increasing need to access content without full decompression, i.e., directly in the compressed domain. To this end, we further investigated a number of partial decoding schemes to address the issue of efficient content access to compressed video streams. By controlling the number of DCT coefficients involved in the inverse DCT, a range of partial decoding schemes can be designed featuring fast processing speed and low computing cost. By controlling the size of video frames, perceptual quality can be adjusted to suit various applications, including thumbnail image browsing, low-resolution image processing, head tracking, skin detection, face recognition, and object segmentation, where full-resolution frames are often not required. While achieving improved computing cost and processing speed, our work also features: (i) reasonably good image quality for content browsing; (ii) compatibility with original MPEG-2 bit streams; and (iii) considerable potential for further application of MPEG-2 in video content management, content-based video frame retrieval, compressed video editing, and low bit-rate video communication, such as that involving mobile phones and telephone networks. In addition, extensive experiments were carried out and reported to support our design.
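The coefficient-controlled idea can be illustrated with a toy inverse DCT that uses only the top-left k x k coefficients of an 8x8 block; fewer coefficients means less arithmetic and less spatial detail. This is a minimal sketch of the principle, not the paper's exact scheme:

```python
import math

def partial_idct_8x8(coeffs, k):
    """Inverse 8x8 DCT using only the top-left k x k coefficients.
    Smaller k -> faster decode, lower spatial detail (illustrative sketch)."""
    def c(u):
        return math.sqrt(0.5) if u == 0 else 1.0
    out = [[0.0] * 8 for _ in range(8)]
    for x in range(8):
        for y in range(8):
            s = 0.0
            for u in range(k):        # only k of 8 frequencies per axis
                for v in range(k):
                    s += (c(u) * c(v) * coeffs[u][v]
                          * math.cos((2 * x + 1) * u * math.pi / 16)
                          * math.cos((2 * y + 1) * v * math.pi / 16))
            out[x][y] = 0.25 * s
    return out
```

With k = 1 only the DC term survives, which is exactly the thumbnail-browsing case: each 8x8 block collapses to its average value.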
In this paper, a content-based image filtering technique for an information-filtering agent system is proposed. The proposed technique adopts MPEG-7 content descriptions for image filtering. To verify the usefulness of the proposed method, an image-filtering agent system was developed on the network layer. MPEG-7 texture and color descriptors were employed as the content description, and MPEG-7 encoding of the descriptors was performed immediately after all packets of the image data were received. Experimental results show that the similarity-filtering ratio of the proposed method is much higher than that of the conventional method, at no cost in network speed.
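A descriptor-based accept/reject decision can be sketched generically; note that MPEG-7 defines its own matching measures per descriptor, so the cosine similarity and threshold below are assumptions for illustration only:

```python
import math

def passes_filter(desc, profile, min_sim=0.9):
    """Keep an image whose descriptor is close (cosine similarity) to the
    user's profile descriptor. Hypothetical stand-in for the MPEG-7
    matching step; min_sim is illustrative."""
    dot = sum(a * b for a, b in zip(desc, profile))
    na = math.sqrt(sum(a * a for a in desc))
    nb = math.sqrt(sum(b * b for b in profile))
    return bool(na and nb and dot / (na * nb) >= min_sim)
```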
The challenge in digital, interactive TV (digiTV) is to move the consumer from the refiguration state to the configuration state, where he can influence the story flow, the choice of characters, and other narrative elements. Besides restructuring narrative and interactivity methodologies, one major task is content manipulation, giving the audience the ability to predefine the actors it wants in its virtual story universe. Current solutions in broadcast video provide content as a monolithic structure, composed of graphics, narration, special effects, etc., compressed into one high bit-rate MPEG-2 stream. More personalized and interactive TV requires a contemporary approach that segments video data in real time to customize content. Our research emphasizes techniques for exchanging faces/bodies with virtual anchors in real-time-constrained broadcast video streams. The aim of this paper is to point out solutions for realizing real-time face and avatar customization. The major task for the broadcaster is metadata extraction, by applying face detection/tracking/recognition algorithms, and transmission of this information to the client side. At the client side, our system shall provide the facility to pre-select virtual avatars stored in a local database and synchronize their movements and expressions with the current digiTV content.
We present a real-time implementation of image filtering with retention of small-size details on the TMS320C6701 DSP. The filtering scheme consists of two filters connected in cascade. The first filter uses a scheme similar to the KNN filter to preserve small-size details, combining redescending M-estimators with the median estimator to reject impulsive noise. The second filter uses an M filter to suppress multiplicative noise. We use different types of influence functions in the M-estimator to provide complex noise suppression. The efficiency of the proposed filter has been evaluated by numerous simulations.
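The impulsive-noise stage builds on the classical sliding-window median, which the paper combines with redescending M-estimators. As a hedged illustration, here is only the median building block, in 1-D:

```python
def median_impulse_filter(sig, win=3):
    """Sliding-window median of a 1-D signal: isolated impulses that
    occupy fewer than half the window are replaced by a neighbour value.
    Edge samples are passed through unchanged (illustrative choice)."""
    half = win // 2
    out = list(sig)
    for i in range(half, len(sig) - half):
        out[i] = sorted(sig[i - half:i + half + 1])[half]
    return out
```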
In this paper, a real-time head detection and tracking system called HeadFinder is proposed. HeadFinder is a robust system that detects the heads of people appearing in video images and tracks them. For effective detection, we pay attention to the motion and shape of a head, both of which are features robust to noise in video images. Since a moving circle almost always corresponds to a head in everyday scenes, we exploit this observation to detect heads. First, we detect the outlines of moving people in difference images between two consecutive video frames. Next, for circle detection, we use the Hough transform, which is known as a robust shape detection method. After the position and size (radius) of the detected circle are registered as a head model, HeadFinder switches to the tracking phase. To make tracking more efficient, we predict the region into which the head will move. The size of the predicted region is proportional to the reliability of the head model, that is, the number of successful tracking steps so far. The performance of HeadFinder is examined in indoor and outdoor environments. Through experiments, we confirmed that HeadFinder is robust against environmental change and runs in real time on simple hardware.
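The circle-detection step can be sketched as a minimal Hough voting loop for a fixed radius; the fixed-radius simplification and parameter names are ours, while a real head detector would scan a range of radii:

```python
import math
from collections import Counter

def hough_circle_centers(edge_points, radius, steps=36):
    """Each edge point votes for every centre that would place it on a
    circle of the given radius; the most-voted cell is the best centre.
    Returns ((cx, cy), votes) or None if there are no edge points."""
    acc = Counter()
    for (x, y) in edge_points:
        for i in range(steps):
            t = 2 * math.pi * i / steps
            cx = round(x - radius * math.cos(t))
            cy = round(y - radius * math.sin(t))
            acc[(cx, cy)] += 1
    return acc.most_common(1)[0] if acc else None
```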
This paper presents the real-time implementation of auto-white-balance and auto-exposure algorithms on the TI TMS320DSC21 platform. This platform is a power-efficient single-chip processor that has been specifically designed for digital still cameras. Its architecture consists of five subsystems: an ARM micro-controller, a DSP core, a memory subsystem, two co-processors, and an imaging peripherals subsystem. Due to the memory constraints, the algorithms were modified to allow their real-time implementation on the processor, i.e., a processing rate of 30 frames per second. These modifications are discussed in the paper. The details of the algorithms are reported in an accompanying paper presented in SPIE conference 4669B.
In this paper we study software requirements specification of real-time imaging applications. First, we briefly review the unique challenges faced when specifying real-time imaging systems. Then we present arguments as to why specification of real-time imaging systems should be studied as a special case. Next, we survey a subset of techniques that have been used to specify practical real-time imaging systems. Finally, we make recommendations for best practices in specifying real-time imaging systems requirements.
The ability to detect and track obstacles is essential for safe visual guidance of autonomous vehicles, especially in urban environments. In this paper, we first review different plane projective transformation (PPT) based obstacle detection approaches under the planar-ground assumption. Then, we give a simple proof of this approach with relative affine, a unified framework that includes the Euclidean, projective, and affine frameworks through generalization and specialization. Next, we present a real-time hybrid obstacle detection method, which combines the PPT-based method with a region-segmentation-based method to provide more accurate obstacle locations. Finally, using the vehicle's position information, a Kalman filter is applied to track obstacles from frame to frame. This method has been tested on THMR-V (Tsinghua Mobile Robot V). Through various experiments we demonstrate its real-time performance, high accuracy, and high robustness.
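The PPT-based idea rests on applying a 3x3 homography to pixels of the previous frame: under the planar-ground assumption, ground pixels land on their new positions and obstacle pixels do not, so the warp residual flags obstacles. A minimal sketch of the point transform (the matrix layout is the usual homogeneous convention, not taken from the paper):

```python
def warp_point(H, x, y):
    """Apply a 3x3 plane projective transformation (homography) to a
    pixel (x, y), returning the warped (x', y') after the perspective
    divide. H is a row-major list of three 3-element rows."""
    w = H[2][0] * x + H[2][1] * y + H[2][2]
    return ((H[0][0] * x + H[0][1] * y + H[0][2]) / w,
            (H[1][0] * x + H[1][1] * y + H[1][2]) / w)
```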
Road detection is a major task in autonomous vehicle guidance. We observe that feature lines parallel to the road boundaries are reliable cues for road detection in urban traffic. We therefore present a real-time method that extracts the most likely road model using a set of feature-line pairs (FLPs). Unlike traditional methods that extract single lines, we extract feature lines in pairs. Under a linearly parameterized road model, each FLP exhibits a geometric consistency that allows us to detect it with a Kalman filter tracking scheme. Since each FLP determines a road model, we apply a regression diagnostics technique to robustly estimate the parameters of the whole road model from all FLPs. Another Kalman filter is used to track the road model from frame to frame, providing a more precise and more robust detection result. Experimental results in urban traffic demonstrate real-time processing ability and high robustness.
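The frame-to-frame tracking can be illustrated with a scalar Kalman predict/update cycle; the paper tracks higher-dimensional road-model parameters, and the noise values q and r below are illustrative assumptions:

```python
def kalman_step(x, P, z, q=1e-3, r=1.0):
    """One predict/update cycle of a scalar constant-state Kalman filter.
    x, P: current estimate and its variance; z: new measurement;
    q, r: process and measurement noise variances (illustrative)."""
    # predict: state unchanged, uncertainty grows by process noise
    P = P + q
    # update: blend prediction and measurement by the Kalman gain
    K = P / (P + r)
    x = x + K * (z - x)
    P = (1 - K) * P
    return x, P
```

Fed a steady measurement, the estimate converges toward it while the variance shrinks, which is the behaviour that stabilizes the road model across frames.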
This paper reports on the development of a real-time image fusion demonstration system using COTS technology. The system is designed to operate in a highly dynamic helicopter environment and across a challenging variety of operational conditions. The targeted application for the demonstrator system is low-level helicopter nap-of-the-earth flight, with particular emphasis being placed on providing improved pilot vision for an increased situational awareness capability. This work provides key technology assessment for the UK Ministry of Defence's Day/Night All Weather (D/NAW) helicopter program. The current operational requirement is to fuse imagery from two sensors - one a thermal imager, the other an image intensifier or visible band camera. However, provision has been made within the software and hardware architectures to support scaling to different cameras and further processors. Over the past two years the research has matured from algorithmic development and analysis using pre-registered recorded imagery to the current real-time image fusion demonstrator system which performs registration and warping prior to an adaptive image processing control scheme. This paper concentrates on the design and hardware implementation issues associated with the process of moving from experimental non real-time algorithms to a real-time image fusion demonstrator. Background information about the program is provided to help put the current system in context, and a brief overview of the algorithm set is also given. The design and hardware implementation issues associated with this scheme are discussed and results from initial field trials are presented.
A new image processing LSI, the SuperVchip, with high-performance computing power has been developed. The SuperVchip provides powerful capabilities for vision systems: (1) general image processing with 3x3, 5x5, and 7x7 kernels for high-speed filtering; (2) 16 parallel gray search engine units for robust template matching; (3) 49 block-matching PEs that compute the sum of absolute differences in parallel for stereo vision; and (4) a color extraction unit for color object recognition. The SuperVchip also integrates peripheral functions of vision systems, such as a video interface, PCI extended interface, RISC engine interface, and image memory controller, on a chip. Therefore, small, high-performance vision systems can be realized with the SuperVchip. In this paper, the above circuits are presented, and the architecture of a vision device equipped with the SuperVchip and its performance are also described.
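The per-PE stereo operation, the sum of absolute differences, is simple to state in software; the chip evaluates 49 of these in parallel, while block sizes and data layout on the chip are not shown here:

```python
def sad(block_a, block_b):
    """Sum of absolute differences between two equal-sized 2-D blocks;
    the lower the SAD, the better the match."""
    return sum(abs(a - b)
               for row_a, row_b in zip(block_a, block_b)
               for a, b in zip(row_a, row_b))
```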
A key problem in computer vision is the measurement of object motion in a scene. The main goal is to compute an approximation of the 3D motion from the analysis of an image sequence. Once computed, this information can be used as a basis for higher-level goals in different applications. Motion estimation algorithms pose a significant computational load for sequential processors, limiting their use in practical applications. In this work we propose a hardware architecture for real-time motion estimation based on FPGA technology. The technique used for motion estimation is optical flow, chosen for its accuracy and the density of its velocity estimates, although other techniques are being explored. The architecture is composed of parallel modules working in a pipeline scheme to reach high throughput rates, near gigaflops. The modules are organized in a regular structure to provide a high degree of flexibility across different applications. Some results are presented, and the real-time performance is discussed and analyzed. The architecture is prototyped on an FPGA board with a Virtex device interfaced to a digital imager.
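The optical-flow constraint underlying such pipelines, Ix*u + It = 0, can be shown in a toy 1-D least-squares form; the FPGA design computes dense 2-D flow, so this only demonstrates the governing equation:

```python
def flow_1d(prev, curr):
    """Estimate a single horizontal displacement u from two 1-D signals
    via the optical-flow constraint Ix*u + It = 0, least-squares over
    all interior samples: u = -sum(Ix*It) / sum(Ix*Ix)."""
    num = den = 0.0
    for i in range(1, len(prev) - 1):
        ix = (prev[i + 1] - prev[i - 1]) / 2.0  # spatial gradient
        it = curr[i] - prev[i]                  # temporal gradient
        num += ix * it
        den += ix * ix
    return -num / den if den else 0.0
```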
Recently, a growing community of researchers has used reconfigurable systems to solve computationally intensive problems. Reconfigurability provides optimized processors for system-on-chip designs and makes it easy to import technology into a new system through reusable modules. The main objective of this work is the investigation of a reconfigurable computer system targeted at computer vision and real-time applications. The system is intended to circumvent the inherent computational load of most window-based computer vision algorithms. It aims to provide an FPGA-based hardware architecture for task-specific vision applications with enough processing power, using as few hardware resources as possible, together with a mechanism for building systems from this architecture. On the software side, a library of pre-designed, general-purpose modules that implement common window-based computer vision operations is being investigated. A common generic interface is established for these modules in order to define hardware/software components. These components can be interconnected to develop more complex applications, providing an efficient mechanism for transferring image and result data among modules. Some preliminary results are presented and discussed.
In this paper, an implementation of a watershed algorithm on a dynamically reconfigurable architecture is proposed. The hardware architecture dedicated to this algorithm implements the following transforms: labeled marker image processing, numerical reconstruction, the geodesic distance function, and marker propagation. We show that the fast reprogrammability of the FPGA allows the sequential execution of these different stages, which compute the watershed segmentation on a low-cost hardware architecture.
There are several challenges in transferring time-sensitive, high-bandwidth data across a lossy link with limited bandwidth. Current wireless transmission options do not offer the full bandwidth needed for raw video distribution. Thus, system trade-offs must be made between bandwidth availability, wireless system performance, and the image quality of the video received for display. The user's expectations of image quality are often based on the viewing environment and current experience with wired options. This paper explores the end-user requirements, wireless transmission options, challenges unique to a wireless environment, and system performance issues.
The aim of this research is to extract characteristic points for real-time automatic tracking of a moving object. It is difficult to track human motion by considering all pixels of the human figure, because doing so requires very complex computation and considerable processing time. In this paper, we therefore propose a method of extracting characteristic points to represent the human figure. The characteristic points indicate the position of the human in each frame and can be employed to track the human motion. The variance of the color image data around every pixel of the object is calculated, and pixels whose variance exceeds a threshold are extracted as characteristic points. The points are distributed throughout the human figure by using a Characteristic Point Boundary technique, which disperses the points. Each characteristic point is extended to a square block of color information (which we call a Matching Block) for correlation matching against candidate positions of the human figure in the following frames. Using characteristic points makes the correlation matching easier and faster than using all pixels of the human figure.
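The variance-threshold selection step might look like the following grayscale sketch; the paper operates on color data, and the window size and threshold here are assumptions:

```python
def characteristic_points(img, win=1, var_thresh=100.0):
    """Return (x, y) of pixels whose local intensity variance exceeds
    var_thresh, computed over a (2*win+1)-square window. Flat regions
    yield no points; textured or edge regions do."""
    h, w = len(img), len(img[0])
    pts = []
    for y in range(win, h - win):
        for x in range(win, w - win):
            vals = [img[y + dy][x + dx]
                    for dy in range(-win, win + 1)
                    for dx in range(-win, win + 1)]
            m = sum(vals) / len(vals)
            var = sum((v - m) ** 2 for v in vals) / len(vals)
            if var > var_thresh:
                pts.append((x, y))
    return pts
```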
In this paper, we propose a novel video encryption scheme based on multiple digital chaotic systems, called CVES (Chaotic Video Encryption Scheme). CVES is independent of any video compression algorithm, provides high security for real-time digital video with fast encryption speed, and can be simply realized in both hardware and software. What's more, CVES can be extended to support random retrieval of cipher-video with a considerable maximal time-out; the extended CVES is called RRS-CVES (Random-Retrieval-Supported CVES). Essentially, CVES is a universal fast encryption system and can be easily extended to other real-time applications. In CVES, 2n chaotic maps are used to generate a pseudo-random signal that masks the video and to make a pseudo-random permutation of the masked video. Another single chaotic map is employed to initialize and control the above 2n chaotic maps. Detailed discussions estimate the performance of CVES/RRS-CVES from the viewpoints of speed, security, realization, and experiments.
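A single logistic map can illustrate the masking idea. CVES uses 2n coupled maps plus a control map and adds a permutation stage; this toy is not secure and only shows the keystream/XOR structure:

```python
def logistic_keystream(x0, n, mu=3.99):
    """Generate n pseudo-random bytes from the logistic map
    x -> mu*x*(1-x); the seed x0 in (0, 1) plays the role of the key.
    Toy stand-in for the chaotic maps in CVES."""
    x, out = x0, []
    for _ in range(n):
        x = mu * x * (1 - x)
        out.append(int(x * 256) & 0xFF)
    return out

def chaotic_mask(data, x0):
    """XOR data with the chaotic keystream; applying it twice with the
    same seed recovers the plaintext."""
    ks = logistic_keystream(x0, len(data))
    return bytes(b ^ k for b, k in zip(data, ks))
```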
The increased availability and use of digital video has led to a need for automated video content analysis techniques. Most research on digital video content analysis includes automatic detection of shot boundaries. However, those methods are not efficient in terms of computational time. In this paper, we propose a digital video camera system that provides real-time shot boundary detection using MPEG-7 descriptors. The camera system is built so that MPEG-7 descriptors are extracted from the frames of the video, and shot boundaries are detected by measuring the distance between the MPEG-7 descriptors of consecutive frames in real time. Experimental results show that the proposed video camera system provides fast and effective real-time shot boundary detection.
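The consecutive-frame distance test can be sketched generically; MPEG-7 defines descriptor-specific matching distances, so the L1 distance and threshold below are assumptions:

```python
def shot_boundaries(descriptors, thresh):
    """Flag a shot boundary at frame i wherever the L1 distance between
    the descriptors of frames i-1 and i exceeds thresh. Descriptors are
    equal-length numeric vectors, one per frame."""
    cuts = []
    for i in range(1, len(descriptors)):
        d = sum(abs(a - b) for a, b in zip(descriptors[i - 1], descriptors[i]))
        if d > thresh:
            cuts.append(i)
    return cuts
```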