This PDF file contains the front matter associated with SPIE Proceedings Volume 7724, including the Title Page, Copyright information, Table of Contents, and the Conference Committee listing.
Vanishing points are elements of great interest in the computer vision field, since they are the main source of information about the geometry of the scene and the projection process associated with the camera. They have been studied and applied for decades in plane rectification, 3D reconstruction and, above all, auto-calibration tasks.
Nevertheless, the literature lacks accurate online solutions for multiple vanishing point estimation. Most strategies focus on accuracy, relying on computationally demanding iterative procedures. We propose a novel strategy for multiple vanishing point estimation that finds a trade-off between accuracy and efficiency and is able to operate in real time on video sequences. This strategy takes advantage of the temporal coherence between the images of the sequence to reduce the computational load of the processing algorithms, while an optimization process maintains a high level of accuracy.
The key element of the approach is a robust scheme based on the MLESAC algorithm, which is used in a similar way to the EM algorithm. This approach ensures robust and accurate estimations, since we use MLESAC in combination with a novel error function based on the angular error between the vanishing point and the image features. To increase the speed of the MLESAC algorithm, the selection of minimal sample sets is replaced by a random sampling step that takes temporal information into account to provide better initializations. In addition, for the sake of flexibility, the proposed error function has been designed to work with either gradient pixels or line segments as image features. Hence, we increase the range of applications in which our approach can be used, according to the type of information that is available.
The results show a system that delivers accurate real-time estimations of multiple vanishing points for online processing, tested on moving-camera video sequences of structured scenarios, both indoors and outdoors, such as rooms, corridors, facades and roads.
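As an illustration of the kind of angular error the approach relies on, the following sketch scores a finite candidate vanishing point against a set of line segments; the function names, the bounding threshold and the MLESAC-style truncation are illustrative assumptions, not details taken from the paper.

    import numpy as np

    def angular_error(vp, segment):
        # Angle (radians) between a segment's direction and the direction from
        # its midpoint towards the candidate vanishing point (finite vp assumed).
        p0, p1 = np.asarray(segment, dtype=float)
        mid = 0.5 * (p0 + p1)
        d_seg = p1 - p0
        d_vp = np.asarray(vp, dtype=float) - mid
        cos_a = abs(np.dot(d_seg, d_vp)) / (np.linalg.norm(d_seg) * np.linalg.norm(d_vp) + 1e-12)
        return np.arccos(np.clip(cos_a, 0.0, 1.0))

    def consensus_cost(vp, segments, max_err=np.deg2rad(2.0)):
        # MLESAC-style cost: the angular error of each segment, truncated at max_err.
        return sum(min(angular_error(vp, s), max_err) for s in segments)

A lower cost means that more segments point consistently towards the candidate, which is the criterion a robust estimator of this kind minimizes.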
High-resolution (HR) images have a wide variety of real-world applications in remote sensing, video frame freezing, medicine, robot vision, military information acquisition, etc. Because of the high cost and physical limitations of acquisition hardware, low-resolution (LR) images are frequently used instead. Super-resolution (SR) restoration has therefore emerged as a solution that forms one HR image, or a set of them, from a sequence of LR images. The proposed SR framework employs wavelets based on atomic functions (WAF) and takes into account spatial and spectral wavelet-domain pixel information when reconstructing video of different nature and texture, showing good performance in terms of objective criteria (PSNR, MAE, NCD) and subjective visual perception. Statistical simulations have demonstrated the effectiveness of the novel approach. Real-time digital processing has been implemented on a Texas Instruments TMS320DM642 DSP, demonstrating the effectiveness of SR image reconstruction in real-time processing mode on video sequences of different nature, pixel resolution and motion behavior.
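As a rough illustration of wavelet-domain fusion of several low-resolution observations, the sketch below uses a standard Daubechies wavelet from PyWavelets as a stand-in for the atomic-function wavelets of the paper, and assumes the frames have already been interpolated to the HR grid and registered; it is not the authors' algorithm.

    import numpy as np
    import pywt

    def wavelet_fuse(frames, wavelet='db4'):
        # Fuse registered, upsampled frames: average the approximation band and
        # keep, at each position, the detail coefficient of largest magnitude.
        coeffs = [pywt.dwt2(f.astype(np.float64), wavelet) for f in frames]
        cA = np.mean([c[0] for c in coeffs], axis=0)
        details = []
        for band in range(3):  # horizontal, vertical, diagonal detail bands
            stack = np.stack([c[1][band] for c in coeffs])
            idx = np.argmax(np.abs(stack), axis=0)
            details.append(np.take_along_axis(stack, idx[None], axis=0)[0])
        return pywt.idwt2((cA, tuple(details)), wavelet)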
The advances in automated production processes have resulted in the need for detecting, reading and decoding 2D
datamatrix barcodes at very high speeds. This requires the correct combination of high speed optical devices that are
capable of capturing high quality images and computer vision algorithms that can read and decode the barcodes
accurately. Such barcode readers should also be capable of resolving fundamental imaging challenges arising from
blurred barcode edges, reflections from possible polyethylene wrapping, poor and/or non-uniform illumination,
fluctuations of focus, and rotation and scale changes. Addressing the above challenges, in this paper we propose the design and implementation of a high-speed multi-barcode reader and provide test results from an industrial trial. To the authors' knowledge, such a comprehensive system has not been proposed and fully investigated in the existing literature. To reduce the reflections in the images caused by the polyethylene wrapping used in typical packaging, polarising filters have been used. The images captured using the optical system above will still include imperfections and variations due to scale, rotation, illumination, etc. We use a number of novel image-enhancement algorithms optimised for 2D datamatrix barcodes, performing image de-blurring, contrast correction, point- and self-shadow removal using an affine-transform-based approach, and non-uniform illumination correction. The enhanced images are subsequently used for barcode detection and recognition.
We provide experimental results from a factory trial of using the multi-barcode reader and evaluate the performance of
each optical unit and computer vision algorithm used. The results indicate an overall accuracy of 99.6 % in barcode
recognition at typical speeds of industrial conveyor systems.
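One plausible way to implement the non-uniform illumination correction mentioned above is a flat-field style normalization; the snippet below is only a hedged sketch with an arbitrary blur scale, not the authors' algorithm.

    import cv2
    import numpy as np

    def correct_illumination(gray, sigma=51):
        # Estimate the slowly varying illumination with a heavy Gaussian blur,
        # divide it out, and rescale to the original mean brightness.
        img = gray.astype(np.float32)
        illumination = cv2.GaussianBlur(img, (0, 0), sigma)
        corrected = img / (illumination + 1e-6) * img.mean()
        return np.clip(corrected, 0, 255).astype(np.uint8)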
Computing image local statistics is required in many image processing applications such as local adaptive image
restoration, enhancement, segmentation, target location and tracking, to name a few. These computations must be carried
out in a sliding window of a certain shape and with certain weights. Generally, this is a time-consuming operation with per-pixel computational complexity of the order of the window size, which hampers real-time applications. To accelerate the computations, recursive algorithms are used. However, such algorithms are available only for windows of certain specific forms, such as rectangles and octagons, with uniform weights. We present a general framework for fast parallel and recursive computation of image local statistics in sliding windows of almost arbitrary shape and weights, with "per-pixel" computational complexity of substantially lower order than the window size. As an illustration of this
framework, we describe methods for computing image local moments such as local mean and variance, image local
histograms and local order statistics (in particular, minimum, maximum, median), image local ranks, image local DFT,
DCT, DcST spectra in polygon-shaped windows as well as in windows with non-uniform weights, such as Sine lobe,
Hann, Hamming and Blackman windows.
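For the simplest case covered by the framework, a uniform rectangular window, the idea of per-pixel cost independent of the window size can be illustrated with running (integral) sums; the sketch below shows only that special case, not the general algorithm of the paper.

    import numpy as np

    def _box_sum(a, win):
        # Sum of `a` over every win x win window, via an integral image.
        ii = np.zeros((a.shape[0] + 1, a.shape[1] + 1))
        ii[1:, 1:] = np.cumsum(np.cumsum(a, axis=0), axis=1)
        return (ii[win:, win:] - ii[:-win, win:]
                - ii[win:, :-win] + ii[:-win, :-win])

    def local_mean_var(img, win=7):
        # Local mean and variance in a win x win sliding window, O(1) per pixel.
        pad = win // 2
        padded = np.pad(img.astype(np.float64), pad, mode='reflect')
        n = win * win
        mean = _box_sum(padded, win) / n
        var = _box_sum(padded ** 2, win) / n - mean ** 2
        return mean, var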
Although capturing and displaying stereo 3D content is now commonplace, information-rich light-field video content
capture, transmission and display are much more challenging, resulting in at least one order of magnitude increase in
complexity even in the simplest cases. We present an end-to-end system capable of capturing high-quality light-field video content and displaying it in real time on various HoloVizio light-field displays, providing very high 3D image quality and continuous motion parallax. The system is compact in terms of the number of computers, and provides superior image quality, resolution and frame rate compared to other published systems. To generate light-field content, we have built a camera system with a large number of cameras in an evenly spaced linear arrangement and connected them to PCs. The capture PC was directly connected through a single Gigabit Ethernet connection to the demonstration 3D display, which is supported by a PC computation cluster. Dense light-field displaying requires massively parallel reordering and filtering of the original camera images, for which we utilize both CPU and GPU threads. On the GPU we perform the light-field conversion, reordering, filtering and YUV-to-RGB conversion. We use OpenGL 3.0 shaders and 2D texture arrays to have easy access to individual camera images. A network-based
synchronization scheme is used to present the final rendered images.
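The YUV-to-RGB conversion mentioned above is essentially a per-pixel matrix multiply, which is why it maps well onto GPU shaders; the sketch below uses the common BT.601 full-range coefficients, which may differ from the color space actually delivered by the cameras.

    import numpy as np

    def yuv_to_rgb(yuv):
        # Convert an (H, W, 3) uint8 YUV image (BT.601, full range) to RGB.
        y, u, v = [yuv[..., i].astype(np.float32) for i in range(3)]
        u -= 128.0
        v -= 128.0
        rgb = np.stack([y + 1.402 * v,
                        y - 0.344136 * u - 0.714136 * v,
                        y + 1.772 * u], axis=-1)
        return np.clip(rgb, 0, 255).astype(np.uint8)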
In this work, scalable parallelization methods for computing H.264/AVC video coding in real time on multi-core platforms, such as the most recent Graphics Processing Units (GPUs) and the Cell Broadband Engine (Cell/BE), are proposed. By applying Amdahl's law, the most demanding parts of the video coder were identified, and the Single Program Multiple Data and Single Instruction Multiple Data approaches are adopted to achieve real-time
processing. In particular, video motion estimation and in-loop deblocking filtering were offloaded to be executed
in parallel on either GPUs or Cell/BE Synergistic Processor Elements (SPEs). The limits and advantages of
these two architectures when dealing with typical video coding problems, such as data dependencies and large
input data are demonstrated. We propose techniques to minimize the impact of branch divergences and branch
misprediction, data misalignment, conflicts and non-coalesced memory accesses. Moreover, data dependencies
and memory size restrictions are taken into account in order to minimize synchronization and communication time
overheads, and to achieve the optimal workload balance given the available multiple cores. A data-reuse technique is extensively applied to reduce communication overhead and achieve the maximum processing speedup. Experimental results show that real-time H.264/AVC coding is achieved on both systems, computing 30 frames per second at a resolution of 720×576 pixels when full-pixel motion estimation is applied over 5 reference frames and a 32×32 search area. When quarter-pixel motion estimation is adopted, real-time video coding is obtained on the GPU for larger search areas and on the Cell/BE for smaller ones.
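The profiling step guided by Amdahl's law reduces to the familiar speedup bound; the helper below is a generic illustration, and the parallel fraction and core count used in the example are placeholders rather than figures from the paper.

    def amdahl_speedup(parallel_fraction, n_cores):
        # Upper bound on speedup when a fraction of the work is parallelized
        # perfectly over n_cores and the remainder stays serial.
        return 1.0 / ((1.0 - parallel_fraction) + parallel_fraction / n_cores)

    # If, say, motion estimation and deblocking accounted for 90% of the coding
    # time, offloading them to 8 cores could give at most:
    print(amdahl_speedup(0.9, 8))   # about 4.7x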
This paper presents ongoing work on the design of a two-dimensional (2D) systolic array for image processing.
This component is designed to operate on a multi-processor system-on-chip. In contrast with other 2D systolic-array
architectures and many other hardware accelerators, we investigate the applicability of executing multiple
tasks in a time-interleaved fashion on the Systolic Array (SA). This leads to a lower external memory bandwidth
and better load balancing of the tasks on the different processing tiles. To enable the interleaving of tasks, we
add a shadow-state register for fast task switching. To reduce the number of accesses to the external memory, we
propose to share the communication assist between consecutive tasks. A preliminary, non-functional version of the
SA has been synthesized for an XV4S25 FPGA device and yields a maximum clock frequency of 150 MHz requiring
1,447 slices and 5 memory blocks. Mapping tasks from video content-analysis applications from literature on the
SA yields reductions in the execution time of 1-2 orders of magnitude compared to the software implementation.
We conclude that the choice of an SA architecture is useful, but a scaled version of the SA, featuring less logic with fewer processing and pipeline stages and a lower clock frequency, would be sufficient for a video-analysis system-on-chip.
One of the most important techniques for hyperspectral data exploitation is spectral unmixing, which aims at
characterizing mixed pixels. When the spatial resolution of the sensor is not fine enough to separate different
spectral constituents, these can jointly occupy a single pixel and the resulting spectral measurement will be a
composite of the individual pure spectra. The N-FINDR algorithm is one of the most widely used and successfully
applied methods for automatically determining endmembers (pure spectral signatures) in hyperspectral image
data without using a priori information. The identification of such pure signatures is highly beneficial in order
to 'unmix' the hyperspectral scene, i.e. to perform sub-pixel analysis by estimating the fractional abundance
of endmembers in mixed pixels collected by a hyperspectral imaging spectrometer. The N-FINDR algorithm
attempts to automatically find the simplex of maximum volume that can be inscribed within the hyperspectral
data set. Due to the intrinsic complexity of remotely sensed scenes and their ever-increasing spatial and spectral
resolution, the efficiency of the endmember searching process conducted by N-FINDR depends not only on the
size and dimensionality of the scene, but also on its complexity (directly related to the number of endmembers).
In this paper, we develop a new parallel version of N-FINDR which is shown to scale better as the dimensionality
and complexity of the hyperspectral scene to be processed increases. The parallel algorithm has been implemented
on two different parallel systems, in which two different types of commodity graphics processing units (GPUs)
from NVidia™ are used to assist the CPU as co-processors. Commodity computing in GPUs is an exciting
new development in remote sensing applications since these systems offer the possibility of (onboard) high
performance computing at very low cost. Our experimental results, obtained in the framework of a mineral
mapping application using hyperspectral data collected by the NASA Jet Propulsion Laboratory's Airborne
Visible Infra-Red Imaging Spectrometer (AVIRIS), reveal that the proposed parallel implementation compares
favorably with the original version of N-FINDR not only in terms of computation time, but also in terms of the accuracy of the solutions that it provides. The real-time processing capabilities of our GPU-based N-FINDR
algorithms and other GPU algorithms for endmember extraction are also discussed.
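At the core of N-FINDR is the volume of the simplex spanned by the current endmember set; a compact way to evaluate it for candidates in a dimensionality-reduced space is sketched below (the array shapes and the PCA reduction are illustrative assumptions).

    import numpy as np
    from math import factorial

    def simplex_volume(endmembers):
        # Volume of the simplex whose vertices are p+1 endmembers, given as a
        # (p+1, p) array in a p-dimensional (e.g. PCA-reduced) space.
        E = np.vstack([np.ones(endmembers.shape[0]), endmembers.T])  # (p+1, p+1)
        return abs(np.linalg.det(E)) / factorial(endmembers.shape[1])

N-FINDR then repeatedly tests replacing each current endmember with each pixel of the scene and keeps a swap whenever it increases this volume, which is why the search cost grows with both the scene size and the number of endmembers.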
Human communication relies on a large number of different communication mechanisms such as spoken language, facial expressions, and gestures. Facial expressions and gestures are among the main nonverbal communication mechanisms and pass large amounts of information between human dialog partners. Therefore, to allow for intuitive human-machine interaction, real-time capable processing and recognition of facial expressions and hand and head gestures is of great importance. We present a system that tackles these challenges. The input features for the dynamic head gestures and facial expressions are obtained from a sophisticated three-dimensional model, which is fitted to the user in a real-time capable manner. Applying this model, different kinds of information are extracted from the image data and afterwards handed over to a real-time capable data-transferring framework, the so-called Real-Time DataBase (RTDB). In addition to the head- and face-related features, low-level image features of the human hand (optical flow, Hu moments) are also stored in the RTDB for the evaluation of hand gestures. In general, the input of a single camera is sufficient for the parallel evaluation of the different gestures and facial expressions. The real-time capable recognition of the dynamic hand and head gestures is performed via different Hidden Markov Models, which have proven to be a quick and real-time capable classification method. For the facial expressions, classical decision trees or more sophisticated support vector machines are used for the classification process. The results of the classification processes are again handed over to the RTDB, where other processes (such as a Dialog Management Unit) can easily access them without any blocking effects. In addition, an adjustable amount of history can be stored by the RTDB buffer unit.
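For the low-level hand features mentioned above, the seven Hu moment invariants can be obtained directly from a binary hand mask with standard OpenCV calls; the skin-segmentation step producing hand_mask is assumed here and is not part of the described system.

    import cv2
    import numpy as np

    def hand_hu_features(hand_mask):
        # Seven log-scaled Hu moment invariants of a binary hand mask (uint8).
        hu = cv2.HuMoments(cv2.moments(hand_mask, binaryImage=True)).flatten()
        # Log-scale so the invariants have comparable magnitudes.
        return -np.sign(hu) * np.log10(np.abs(hu) + 1e-30)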
This paper presents a real-time implementation of a logo detection and tracking algorithm for video. The motivation for this work stems from applications on smart phones that require the detection of logos in real time. For example, one application involves detecting company logos so that customers can easily receive special offers in real time. This algorithm
uses a hybrid approach by initially running the Scale Invariant Feature Transform (SIFT) algorithm on the first frame in
order to obtain the logo location and then by using an online calibration of color within the SIFT detected area in order
to detect and track the logo in subsequent frames in a time efficient manner. The results obtained indicate that this hybrid
approach allows robust logo detection and tracking to be achieved in real-time.
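The SIFT-based first stage can be illustrated with standard OpenCV calls; the ratio-test threshold and the minimum match count below are illustrative assumptions, and logo_gray/frame_gray stand for the reference logo image and the first video frame.

    import cv2
    import numpy as np

    sift = cv2.SIFT_create()
    kp_logo, des_logo = sift.detectAndCompute(logo_gray, None)
    kp_frame, des_frame = sift.detectAndCompute(frame_gray, None)

    matches = cv2.BFMatcher(cv2.NORM_L2).knnMatch(des_logo, des_frame, k=2)
    good = [m for m, n in matches if m.distance < 0.75 * n.distance]  # Lowe's ratio test

    if len(good) >= 10:
        src = np.float32([kp_logo[m.queryIdx].pt for m in good]).reshape(-1, 1, 2)
        dst = np.float32([kp_frame[m.trainIdx].pt for m in good]).reshape(-1, 1, 2)
        H, _ = cv2.findHomography(src, dst, cv2.RANSAC, 5.0)  # logo location in the frame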
A novel method for monitoring the entire three-dimensional shape of the chest wall in real time is presented. The system is based on the multiple-line laser triangulation principle. The laser projector generates a light pattern of 33 equally inclined light planes directed toward the measured surface. The camera records the illuminated surface from a different viewpoint, and consequently the light pattern is distorted by the shape of the surface. The acquired images are transferred to a personal computer, where contour detection, three-dimensional surface reconstruction, shape analysis, and display are performed in real time. Surface displacements are calculated by subtracting the current measured surface from the reference one. Differences are displayed with a color palette, where blue represents inward (negative) and red represents outward (positive) movement. The accuracy of the calibrated apparatus is ±0.5 mm, calculated as the standard deviation between points of the measured and nominal reference surfaces. The
measuring range is approximately 400×600×500 mm in width, height and depth. The intention of this study was to
evaluate the system by means of its ability to distinguish between different breathing patterns and to verify the accuracy
of measuring chest wall deformation volumes during breathing. The results demonstrate that the presented 3-D measuring system has great potential as a diagnostic and training tool for monitoring breathing patterns. We believe that exact graphical communication with the patient is much simpler and easier to understand than verbal and/or numerical communication.
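The displacement and volume computation described above reduces to a per-point subtraction and an integral over the measurement grid; the sketch below assumes surfaces already resampled to a common regular grid and a sign convention in which outward motion increases the measured values, both of which are illustrative simplifications.

    import numpy as np

    def chest_displacement(current_mm, reference_mm, cell_area_mm2):
        # Per-point displacement and the corresponding volume change for two
        # surfaces sampled on the same regular grid (values in millimetres).
        displacement = current_mm - reference_mm          # sign depends on the depth axis
        volume_change_ml = np.nansum(displacement) * cell_area_mm2 / 1000.0  # mm^3 -> ml
        return displacement, volume_change_ml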
Automatic speaker identification in a videoconferencing environment will allow conference attendees to focus their
attention on the conference rather than having to be engaged manually in identifying which channel is active and who
may be the speaker within that channel. In this work we present a real-time, audio-coupled video based approach to
address this problem, but focus more on the video analysis side. The system is driven by the need for detecting a talking
human via the use of computer vision algorithms. The initial stage consists of a face detector which is subsequently
followed by a lip-localization algorithm that segments the lip region. A novel approach for lip movement detection based
on image registration and using the Coherent Point Drift (CPD) algorithm is proposed. Coherent Point Drift (CPD) is a
technique for rigid and non-rigid registration of point sets. We provide experimental results to analyse the performance of the algorithm when used to monitor real-life videoconferencing data.
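As a rough illustration of the registration-based idea, the sketch below uses a simple least-squares rigid (Procrustes) alignment in place of CPD and assumes equal-length, corresponding lip-contour point sets, which CPD itself does not require; it is a stand-in, not the proposed algorithm.

    import numpy as np

    def rigid_residual(prev_pts, curr_pts):
        # Align curr_pts to prev_pts with a rigid rotation + translation fit and
        # return the mean residual; a large residual indicates non-rigid lip
        # motion, i.e. a likely talking mouth.
        mu_p, mu_c = prev_pts.mean(axis=0), curr_pts.mean(axis=0)
        P, C = prev_pts - mu_p, curr_pts - mu_c
        U, _, Vt = np.linalg.svd(C.T @ P)
        if np.linalg.det(U @ Vt) < 0:       # avoid an improper rotation (reflection)
            U[:, -1] *= -1
        aligned = C @ (U @ Vt) + mu_p
        return np.linalg.norm(aligned - prev_pts, axis=1).mean()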
Coded structured light is a technique that allows the 3-D reconstruction of poorly textured or non-textured scene areas. The codes, uniquely associated with visual primitives of the projected pattern, make it possible to solve the correspondence problem using local information only, with robustness against perturbations such as high curvature, occlusions, out-of-field-of-view and out-of-focus regions. Real-time 3-D reconstruction is possible with pseudo-random arrays, where the encoding is done in a single pattern using spatial neighbourhoods. Ensuring a higher Hamming distance between all the codewords used allows more mislabeled primitives to be corrected and thus yields globally more robust patterns.1
Up to now, the proposed coding schemes have ensured the Hamming distance between all the primitives of the pattern, which was generated offline beforehand, producing Perfect SubMaps (PSM). But knowing the epipolar geometry of the projector-camera system, one can ensure the Hamming distance only between primitives that will project along nearby epipolar lines, because only these can produce correspondence ambiguity during the decoding process. Since in such a new coding scheme the Hamming distance has to be checked only within subsets of the pattern primitives, the patterns are globally far less constrained and can therefore be generated at video framerate. We call this new pattern coding SubPerfect SubMaps (SPSM).
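For reference, the pairwise Hamming constraint that the scheme relaxes can be written in a few lines; in the SPSM scheme this minimum distance would only be enforced within subsets of primitives projecting near the same epipolar lines, a grouping not reproduced here.

    def hamming(a, b):
        # Hamming distance between two equal-length codewords (symbol sequences).
        return sum(x != y for x, y in zip(a, b))

    def min_pairwise_hamming(codewords):
        # Smallest Hamming distance over all codeword pairs; a pattern can correct
        # roughly (d_min - 1) // 2 mislabeled symbols per codeword.
        return min(hamming(a, b)
                   for i, a in enumerate(codewords)
                   for b in codewords[i + 1:])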
In this paper we propose a hardware capture system for the generation of layered depth video content, together with the real-time feedback mechanisms required to ensure optimal data acquisition for 3DTV productions. The capture system consists of five color cameras and two time-of-flight cameras. The time-of-flight cameras allow direct depth measurements and thus help to overcome the difficulties of classical stereo matching. However, they suffer from low resolution and must be combined with stereo to achieve acceptable results in a post-production process. Real-time previews are hence necessary, so that dynamic adaptation of the scene as well as the capture parameters becomes possible and good data quality can be assured.
In this paper, we present a comparative analysis of local trinocular and binocular depth estimation techniques. Local
techniques are chosen because of their higher computational efficiency compared to global approaches. Our aim is to
quantify the benefits of the third camera with respect to performance and computational burden. We have adopted the
color-weighted local-window approach to stereo matching, where pixels within a local spatial window around the pixel being processed are weighted according to their color difference from the center pixel in order to ensure better adaptivity to local structures. Thus, the window size becomes the main parameter that influences the quality and determines the execution time.
Extensive experiments on a large set of data have been carried out to test the trinocular versus the binocular setting in terms of the quality of the estimated depth and the execution time. Both natural and artificial scenes have been tested, and a set of quality measures has been used to support the comparisons. The MPEG Depth Estimation Reference Software has been used as a reference benchmark as well. Results show that, from a certain window size onwards, the trinocular setting generally outperforms the binocular one, providing higher quality in less computational time. While the comparisons were done for 'pure' depth estimation, we also ran post-processing on the depth estimates in order to analyze the potential of the estimated depths to be further improved.
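A minimal sketch of the color-weighted matching cost described above is given below; the exponential weighting, the parameter gamma_c and the absolute-difference raw cost are illustrative choices in the spirit of adaptive support weights, not necessarily those used in the paper.

    import numpy as np

    def color_weights(window, center, gamma_c=10.0):
        # Support weights inside a local window: pixels whose color differs from
        # the window's center pixel are down-weighted.
        diff = np.linalg.norm(window.astype(np.float64) - center, axis=-1)
        return np.exp(-diff / gamma_c)

    def aggregated_cost(left_win, right_win):
        # Color-weighted aggregation of absolute color differences between the
        # left window and the disparity-shifted right window, both (w, w, 3).
        c = left_win.shape[0] // 2
        w = (color_weights(left_win, left_win[c, c].astype(np.float64))
             * color_weights(right_win, right_win[c, c].astype(np.float64)))
        raw = np.abs(left_win.astype(np.float64) - right_win.astype(np.float64)).sum(axis=-1)
        return (w * raw).sum() / w.sum()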
This paper proposes two novel motion-vector based techniques for target detection and target tracking in surveillance
videos. The algorithms are designed to operate on a resource-constrained device, such as a surveillance
camera, and to reuse the motion vectors generated by the video encoder. The first novel algorithm for target
detection uses motion vectors to construct a consistent motion mask, which is combined with a simple
background segmentation technique to obtain a segmentation mask. The second proposed algorithm aims at
multi-target tracking and uses motion vectors to assign blocks to targets employing five features. The weights
of these features are adapted based on the interaction between targets. These algorithms are combined in one
complete analysis application. The performance of this application for target detection has been evaluated for
the i-LIDS sterile zone dataset and achieves an F1-score of 0.40-0.69. The performance of the analysis algorithm
for multi-target tracking has been evaluated using the CAVIAR dataset and achieves an MOTP of around 9.7
and MOTA of 0.17-0.25. On a selection of targets in videos from other datasets, the achieved MOTP and MOTA
are 8.8-10.5 and 0.32-0.49 respectively. The execution time on a PC-based platform is 36 ms. This includes the
20 ms for generating motion vectors, which are also required by the video encoder.
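The reuse of encoder motion vectors in the detection stage can be pictured as a per-block magnitude test with a temporal consistency check; the sketch below is a simplified stand-in for the consistent-motion-mask construction, with all thresholds chosen arbitrarily.

    import numpy as np
    from collections import deque

    def update_motion_mask(mv, history, mag_thresh=1.0, min_hits=3, window=5):
        # mv: (Hb, Wb, 2) per-block motion vectors taken from the video encoder.
        # A block is flagged as moving only if its vector magnitude exceeded
        # mag_thresh in at least min_hits of the last `window` frames.
        history.append(np.linalg.norm(mv, axis=-1) > mag_thresh)
        if len(history) > window:
            history.popleft()
        return np.sum(np.stack(list(history)), axis=0) >= min_hits

    history = deque()  # kept across frames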
Target detection within a large-field cluttered background from a long distance is a challenging problem: difficulties include low contrast between target and background, the small number of pixels occupied by the target, non-uniform illumination caused by lens vignetting, and system noise. The existing approaches to dim target detection can be roughly divided into
two categories: detection before tracking (DBT) and tracking before detection (TBD). The DBT-based scheme has been
widely used in practical applications due to its simplicity, but it often requires a relatively high signal-to-noise ratio (SNR). In contrast, the TBD-based methods can provide impressive detection results even at very low SNR; unfortunately, their large memory requirement and high computational load prevent these methods from being used in real-time tasks. In this paper, we propose a new method for dim target detection. We address this problem
by combining the advantages of the DBT-based scheme in computational efficiency and of the TBD-based in detection
capability. Our method first predicts the local background, and then employs the energy accumulation and median filter
to remove background clutter. The dim target is finally located by double window filtering together with an improved
high-order correlation that speeds up convergence. The proposed method is implemented on a hardware platform and performs well in outdoor experiments.
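In its simplest spatial form, the background-prediction and clutter-removal step can be approximated by a large median filter whose output is subtracted from the frame; the sketch below only illustrates that idea and omits the energy accumulation, double-window filtering and high-order correlation stages of the proposed method.

    import numpy as np
    from scipy.ndimage import median_filter

    def background_residual(frame, win=15):
        # Predict the locally smooth background with a large median filter and
        # subtract it, leaving small dim targets (and noise) in the residual.
        background = median_filter(frame.astype(np.float64), size=win)
        return frame.astype(np.float64) - background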
This paper considers a novel indoor positioning method that is currently under development at ETH Zurich. The method relies on a digital spatio-semantic interior building model (CityGML) and a range imaging sensor. In contrast to
common indoor positioning approaches, the procedure presented here does not require local physical reference
infrastructure, such as WLAN hot spots or reference markers.
Two synchronized cameras are utilized to obtain independent video streams to detect moving objects from two different
viewing angles. The video frames are directly correlated in time. Moving objects in image frames from the two cameras
are identified and tagged for tracking. One advantage of such a system is that it overcomes the effects of occlusions that leave an object only partially visible, or hidden, in one camera while the same object is fully visible in another camera.
Object registration is achieved by determining the location of common features in the moving object across simultaneous
frames. Perspective differences are adjusted. Combining information from images from multiple cameras increases
robustness of the tracking process. Motion tracking is achieved by detecting anomalies caused by the objects' movement across frames in time, both in each video stream and in the combined video information. The path of each object is determined heuristically. Detection accuracy depends on the speed of the object as well as on variations in the direction of motion.
Fast cameras increase accuracy but limit the speed and complexity of the algorithm. Such an imaging system has
applications in traffic analysis, surveillance and security, as well as object modeling from multi-view images. The
system can easily be expanded by increasing the number of cameras such that there is an overlap between the scenes
from at least two cameras in proximity. An object can then be tracked continuously over long distances or across multiple cameras, which is applicable, for example, in wireless sensor networks for surveillance or navigation.
In this paper we propose a robust real-time image- and sensor-based approach for automatic 3D model acquisition of sewer shafts from survey videos captured by a downward-looking fisheye-lens camera while it is lowered into
the shaft. Our approach is based on Structure from Motion adjusted to the constrained motion and scene, and
involves shape recognition techniques in order to obtain the geometry of the scene appropriately. We perform a
time budget evaluation for the components of an existing off-line application based on previous work and design a
real-time application which can be applied during on-site inspection. The methods of our approach are modified
so that they can be executed on the GPU. Expensive bundle adjustment is avoided by applying a simple and fast
geometric correction of the computed reconstruction which is capable of handling inaccuracies of the intrinsic
camera calibration parameters.
Convolution and correlation are very basic image processing operations with numerous applications ranging from image
restoration to target detection to image resampling and geometrical transformation. In real-time applications, the crucial issue is the processing speed, which implies the mandatory use of algorithms with the lowest possible computational complexity. Fast image convolution and correlation with large convolution kernels are traditionally carried out in the Discrete Fourier Transform domain, computed using Fast Fourier Transform algorithms. However, standard DFT-based convolution implements cyclic rather than linear convolution and, because of this, suffers from heavy boundary effects. We introduce a fast DCT-based convolution algorithm, which is virtually free of the boundary effects of cyclic convolution. We show that this algorithm has the same or even lower computational complexity than the DFT-based algorithm, and demonstrate its advantages in application examples of arbitrary image translation and scaling with perfect discrete sinc interpolation and of scaled image reconstruction from holograms digitally recorded in near and far diffraction zones. In geometrical resampling, scaling by an arbitrary factor is implemented using the DFT-domain scaling algorithm and DCT-based convolution. In scaled hologram reconstruction in far diffraction zones, the Fourier reconstruction method with simultaneous scaling is implemented using DCT-based convolution. In scaled hologram reconstruction in near diffraction zones, the convolutional reconstruction algorithm is implemented by DCT-based convolution.
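The boundary-effect difference discussed above can be seen by comparing plain DFT-domain (cyclic) filtering with filtering of a symmetrically extended image, which is the behaviour DCT-domain convolution effectively provides for symmetric kernels; the sketch below is illustrative and does not reproduce the paper's DCT-domain algorithm.

    import numpy as np
    from scipy.signal import fftconvolve

    def cyclic_filter(img, kernel):
        # DFT-domain filtering: convolution is cyclic, so image content wraps
        # around the borders (kernel origin assumed at element [0, 0]).
        K = np.fft.fft2(kernel, s=img.shape)
        return np.real(np.fft.ifft2(np.fft.fft2(img) * K))

    def symmetric_filter(img, kernel, pad):
        # Filtering on a symmetrically extended image: no wrap-around at the
        # borders, mimicking what DCT-based convolution yields for symmetric kernels.
        ext = np.pad(img, pad, mode='symmetric')
        out = fftconvolve(ext, kernel, mode='same')
        return out[pad:-pad, pad:-pad]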