This PDF file contains the front matter associated with SPIE Proceedings Volume 9406, including the Title Page, Copyright information, Table of Contents, Introduction (if any), and Conference Committee listing.
Quintic Bézier splines are capable of dynamic, adaptive path changes in a curvature-continuous fashion. This paper presents a general-purpose method for synchronizing the turning and steering rates of a mobile robot’s wheels according to a given velocity profile when traversing a quintic Bézier spline or some other differentiable parametric curve. The presented method is applicable to any positioning of the wheels and is part of a more extensive effort to realize adaptive behavior in real-life wheeled robots. Spatial vector algebra was used for the practical benefit of simplified equations, which made the synchronization method simpler to design and implement. This paper gives brief overviews of quintic Bézier splines and spatial vectors. The successful synchronization of wheel motions is demonstrated with two simulated examples.
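The wheel synchronization idea can be illustrated in the plane. The following is a minimal sketch and not the authors' spatial-vector formulation. It assumes a planar robot whose body x-axis stays tangent to the curve, so the yaw rate is ω = v·κ, where v is the body speed and κ is the signed curvature. Each wheel's velocity then follows from rigid-body kinematics, v_wheel = v_body + ω × r. The function names and the (wx, wy) wheel-position convention are illustrative assumptions.

```python
import math
from math import comb

def bezier_derivatives(P, t):
    """Point, first and second derivative of a planar Bezier curve (control points P) at t."""
    n = len(P) - 1  # n = 5 for a quintic

    def b(k, i):  # Bernstein basis polynomial B_{i,k}(t)
        return comb(k, i) * t**i * (1 - t) ** (k - i)

    pt = [sum(b(n, i) * P[i][d] for i in range(n + 1)) for d in (0, 1)]
    d1 = [n * sum(b(n - 1, i) * (P[i + 1][d] - P[i][d]) for i in range(n)) for d in (0, 1)]
    d2 = [n * (n - 1) * sum(b(n - 2, i) * (P[i + 2][d] - 2 * P[i + 1][d] + P[i][d])
                            for i in range(n - 1)) for d in (0, 1)]
    return pt, d1, d2

def wheel_commands(P, t, v, wheels, wheel_radius):
    """Steering angle (rad) and spin rate (rad/s) for each wheel at body position (wx, wy),
    assuming the body x-axis stays tangent to the path and the body moves at speed v."""
    _, d1, d2 = bezier_derivatives(P, t)
    speed_sq = d1[0] ** 2 + d1[1] ** 2
    kappa = (d1[0] * d2[1] - d1[1] * d2[0]) / speed_sq ** 1.5  # signed curvature
    omega = v * kappa                                          # yaw rate
    commands = []
    for wx, wy in wheels:
        vx, vy = v - omega * wy, omega * wx  # rigid-body velocity of the wheel contact point
        commands.append((math.atan2(vy, vx), math.hypot(vx, vy) / wheel_radius))
    param_rate = v / math.sqrt(speed_sq)  # dt/dtau that makes the body follow speed v
    return commands, param_rate
```

To follow a velocity profile v(τ), the curve parameter can be advanced at the returned rate dt/dτ = v/|B′(t)|. On a curved control polygon, the inner and outer wheels receive different steering angles and spin rates.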
In this paper, we propose an image processing scheme for detecting moving objects from a mobile robot with a single camera. It is aimed especially at intruder detection by a security robot on either smooth or uneven ground surfaces. The proposed scheme uses template matching with basis image reconstruction to align two consecutive images in the video sequence. The most representative template patches in one image are first selected automatically based on the gradient energies in the patches. The chosen templates then form a basis image matrix. A windowed subimage is constructed as a linear combination of the basis images, and instances of the templates in the subsequent image are matched by evaluating their reconstruction error from the basis image matrix. Once two images are well aligned, a simple and fast temporal difference can be applied to separate moving objects from the background. The proposed template matching tolerates ±10° of rotation and ±10% of scaling. By adding templates with larger rotation angles to the basis image matrices, the proposed method can also match images under severe camera vibration. The proposed scheme achieves a processing rate of 32 frames per second for images of 160×120 pixels.
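A minimal sketch of the alignment and differencing steps is shown below. It assumes a pure translation between frames. It also assumes a basis matrix with a single column per template; rotated or scaled copies of each template could be appended as extra columns, as the abstract suggests. The reconstruction error is the residual left after projecting a zero-mean, unit-norm window onto the span of the basis images. The function names, patch size, search radius, and threshold are illustrative assumptions, not the authors' parameters.

```python
import numpy as np

def select_templates(img, size=16, k=8, margin=4):
    """Top-k grid patches by gradient energy, kept `margin` pixels away from the border."""
    gy, gx = np.gradient(img.astype(float))
    energy = gx ** 2 + gy ** 2
    h, w = img.shape
    cands = [(energy[r:r + size, c:c + size].sum(), r, c)
             for r in range(margin, h - size - margin + 1, size)
             for c in range(margin, w - size - margin + 1, size)]
    cands.sort(reverse=True)
    return [(r, c) for _, r, c in cands[:k]]

def _unit(x):
    """Zero-mean, unit-norm columns."""
    x = x - x.mean(axis=0)
    return x / np.linalg.norm(x, axis=0)

def estimate_shift(frame0, frame1, size=16, k=8, radius=4):
    """Global (dy, dx) translation; each template is matched by minimum reconstruction error."""
    shifts = []
    for r, c in select_templates(frame0, size, k, radius):
        # One basis column here; rotated/scaled template variants could be appended as columns.
        B = _unit(frame0[r:r + size, c:c + size].reshape(-1, 1).astype(float))
        proj = B @ np.linalg.pinv(B)  # projector onto the span of the basis images
        best = None
        for dr in range(-radius, radius + 1):
            for dc in range(-radius, radius + 1):
                w = frame1[r + dr:r + dr + size, c + dc:c + dc + size].reshape(-1).astype(float)
                w = w - w.mean()
                norm = np.linalg.norm(w)
                if norm < 1e-9:
                    continue  # flat window: reconstruction error is meaningless
                w = w / norm
                err = np.linalg.norm(w - proj @ w)
                if best is None or err < best[0]:
                    best = (err, dr, dc)
        if best is not None:
            shifts.append(best[1:])
    dy, dx = np.median(np.array(shifts), axis=0)  # robust to a few mismatched templates
    return int(round(dy)), int(round(dx))

def moving_mask(frame0, frame1, thresh=0.1):
    """Temporal difference after translation-only alignment of frame0 onto frame1."""
    shift = estimate_shift(frame0, frame1)
    aligned = np.roll(frame0, shift, axis=(0, 1))
    return np.abs(frame1.astype(float) - aligned) > thresh
```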
Robotic exploration, for the purposes of search and rescue or explosive device detection, can be improved by using a team of multiple robots. Potential field navigation methods offer natural and efficient distributed exploration algorithms in which team members are mutually repelled to spread out and cover the area efficiently. However, they also suffer from local minima in the field. Liu and Lyons proposed a Space-Based Potential Field (SBPF) algorithm that disperses robots efficiently and also ensures they are driven in a distributed fashion to cover complex geometry. In this paper, the approach is modified to handle two problems with the original SBPF method: fast exploration of enclosed spaces and fast navigation around convex obstacles. First, a “gate-sensing” function was implemented. It draws the robot toward narrow openings, such as doors or corridors, that it might otherwise pass by, ensuring that every room can be explored. Second, an improved obstacle-field “conveyor belt” function was developed, which allows the robot to avoid walls and barriers while using their surfaces as motion guides so that it does not become trapped. Simulation results, in which the modified SBPF program controls the MobileSim Pioneer 3-AT simulator, are presented for a selection of maps that capture difficult-to-explore geometries. Physical robot results are also presented, in which a team of Pioneer 3-AT robots is controlled by the modified SBPF program. Data collected before the improvements, new simulation results, and robot experiments are presented as evidence of the performance improvements.
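The following generic potential-field sketch is not the SBPF algorithm itself. In it, teammates repel with an inverse-square force. Each nearby obstacle point contributes two components: a repulsive normal component, and a tangential “conveyor belt” component that slides the robot along the surface instead of stopping it. The gains, the influence radius, and the fixed counter-clockwise belt direction are assumptions, and the gate-sensing term is omitted.

```python
import numpy as np

def field_force(pos, teammates, obstacles, k_rep=1.0, k_obs=1.0, k_belt=0.5, influence=2.0):
    """Net 2D force on a robot at pos from teammates and obstacle points within `influence`."""
    force = np.zeros(2)
    for q in teammates:
        d = pos - q
        r = np.linalg.norm(d)
        if 0 < r < influence:
            force += k_rep * d / r ** 3  # magnitude k_rep / r^2, pointing away from teammate
    for o in obstacles:
        d = pos - o
        r = np.linalg.norm(d)
        if 0 < r < influence:
            normal = d / r                                # away from the obstacle surface
            tangent = np.array([-normal[1], normal[0]])   # normal rotated 90 degrees CCW
            force += (k_obs * normal + k_belt * tangent) / r ** 2
    return force
```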
Developing precise and low-cost spatial localization algorithms is essential for autonomous navigation systems. Data collection must be detailed enough to distinguish unique locations, yet coarse enough to enable real-time processing. Active proximity sensors such as sonar and rangefinders have been used for indoor localization, but sonar sensors are generally coarse and rangefinders are generally expensive. Passive sensors such as video cameras are low-cost and feature-rich, but suffer from high dimensionality and excessive bandwidth. This paper presents a novel approach to indoor localization using a low-cost video camera and a spherical mirror. The captured omnidirectional images undergo normalization and unwarping into a canonical representation more suitable for processing. Training images, together with indoor maps, are fed into a semi-supervised linear extension of a graph-embedding manifold learning algorithm to learn a low-dimensional surface that represents the interior of a building. The manifold surface descriptor is used as a semantic signature for particle filter localization. Test frames are conditioned, mapped to the low-dimensional surface, and then localized by an adaptive particle filter algorithm. The particles are temporally filtered to produce the final localization estimate. The proposed method, termed omnivision-based manifold particle filters, reduces convergence lag and increases overall efficiency.
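The measurement update of such a particle filter can be sketched as follows. This illustrative version makes three assumptions. The learned manifold is represented by a table of training locations (train_xy) and their low-dimensional embeddings (train_emb). A particle's expected descriptor is the embedding of its nearest training location. The observed descriptor is the embedding of the current test frame. The Gaussian likelihood of width sigma and the systematic resampling are standard choices, not details taken from the paper.

```python
import numpy as np

def pf_predict(particles, motion, noise_std, rng):
    """Motion update: apply an odometry displacement plus Gaussian noise."""
    return particles + motion + noise_std * rng.standard_normal(particles.shape)

def pf_update(particles, observed, train_xy, train_emb, sigma, rng):
    """Measurement update followed by systematic resampling.
    The descriptor expected at a particle is the embedding of its nearest training
    location, a simple stand-in for the learned low-dimensional surface."""
    d2 = ((particles[:, None, :] - train_xy[None, :, :]) ** 2).sum(axis=-1)
    expected = train_emb[d2.argmin(axis=1)]
    err = ((expected - observed) ** 2).sum(axis=-1)
    w = np.exp(-(err - err.min()) / (2 * sigma ** 2))  # shift by the minimum to avoid underflow
    w /= w.sum()
    n = len(particles)
    positions = (rng.random() + np.arange(n)) / n      # one random offset, evenly spaced
    idx = np.minimum(np.searchsorted(np.cumsum(w), positions), n - 1)
    return particles[idx]
```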
In this paper, we present an enhanced loop closure method based on image-to-image matching that relies on quantized local Zernike moments. In contrast to previous methods, our approach uses additional depth information to extract Zernike moments in a local manner. These moments are used to represent holistic shape information inside the image. The complex-valued moments extracted from both grayscale and depth images are coarsely quantized. To measure the similarity between two locations, a nearest neighbour (NN) classification algorithm is applied. Example results and a practical implementation of the method are also given, using data gathered on a testbed with a Kinect. The method is evaluated on three datasets with different lighting conditions. Adding depth information to the intensity image increases the detection rate, especially in dark environments. The results indicate a successful, high-fidelity online method for visual place recognition and for closing navigation loops, which is crucial information for the well-known simultaneous localization and mapping (SLAM) problem. The technique is also practical because of its low computational complexity and its ability to run in real time with high loop-closing accuracy.
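The sketch below computes local Zernike moments, quantizes them coarsely, and matches places by nearest neighbour. The moments use the standard definition A_nm = (n+1)/π Σ f(x,y) V*_nm(ρ,θ) over the unit disk. Several choices are assumptions made for illustration: the quantization (the signs of the real and imaginary parts), the 16×16 patch grid, the chosen orders, and the Hamming-distance matcher.

```python
import numpy as np
from math import factorial

ORDERS = ((0, 0), (1, 1), (2, 0), (2, 2), (3, 1), (3, 3), (4, 0), (4, 2), (4, 4))

def radial_poly(n, m, rho):
    """Zernike radial polynomial R_nm(rho); requires n - |m| even and non-negative."""
    m = abs(m)
    R = np.zeros_like(rho, dtype=float)
    for s in range((n - m) // 2 + 1):
        c = (-1) ** s * factorial(n - s) / (
            factorial(s) * factorial((n + m) // 2 - s) * factorial((n - m) // 2 - s))
        R += c * rho ** (n - 2 * s)
    return R

def zernike_moments(patch, orders=ORDERS):
    """Complex Zernike moments of a square patch mapped onto the unit disk."""
    N = patch.shape[0]
    ys, xs = np.mgrid[:N, :N]
    x = (2 * xs - N + 1) / N  # pixel centres in [-1, 1]
    y = (2 * ys - N + 1) / N
    rho, theta = np.hypot(x, y), np.arctan2(y, x)
    f = patch.astype(float) * (rho <= 1.0)  # ignore pixels outside the unit disk
    area = (2.0 / N) ** 2                   # area of one pixel in unit-disk coordinates
    return np.array([(n + 1) / np.pi * area *
                     np.sum(f * radial_poly(n, m, rho) * np.exp(-1j * m * theta))
                     for n, m in orders])

def quantize(moments):
    """Coarse 2-bit code per moment: signs of the real and imaginary parts."""
    return np.concatenate([moments.real > 0, moments.imag > 0]).astype(np.uint8)

def frame_code(gray, depth, patch=16):
    """Concatenate quantized local moments of the gray and depth images on a patch grid."""
    codes = []
    for img in (gray, depth):
        H, W = img.shape
        for r in range(0, H - patch + 1, patch):
            for c in range(0, W - patch + 1, patch):
                codes.append(quantize(zernike_moments(img[r:r + patch, c:c + patch])))
    return np.concatenate(codes)

def nearest_place(query, database):
    """Index of the stored frame code with the smallest Hamming distance, and that distance."""
    dists = [int(np.count_nonzero(query != code)) for code in database]
    i = int(np.argmin(dists))
    return i, dists[i]
```

In a loop-closure setting, a match would typically be accepted only if the returned distance falls below a threshold.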
This paper presents improvements made to the intelligence algorithms employed on Q, an autonomous ground vehicle, for the 2014 Intelligent Ground Vehicle Competition (IGVC). In 2012, the IGVC committee combined the formerly separate autonomous and navigation challenges into a single AUT-NAV challenge. In this new challenge, the vehicle is required to navigate through a grassy obstacle course and stay within the course boundaries (a lane of two white painted lines) that guide it toward a given GPS waypoint. Once the vehicle reaches this waypoint, it enters an open course where it is required to navigate to another GPS waypoint while avoiding obstacles. After reaching the final waypoint, the vehicle is required to traverse another obstacle course before completing the run. Q uses a modular, parallel software architecture in which image processing, navigation, and sensor control algorithms run concurrently. A tuned navigation algorithm allows Q to maneuver smoothly through obstacle fields. For the 2014 competition, most revisions occurred in the vision system, which detects white lines and informs the navigation component. Barrel obstacles of various colors presented a new challenge for image processing because the previous color plane extraction algorithm was no longer sufficient. To overcome this difficulty, laser range sensor data were overlaid on the visual data. Q also participates in the Joint Architecture for Unmanned Systems (JAUS) challenge at IGVC. For 2014, significant updates were implemented: the JAUS component accepted a greater variety of messages and showed better compliance with the JAUS technical standard. With these improvements, Q secured second place in the JAUS competition.
Bag-of-words (BoW) is one of the most successful methods for object categorization. This paper proposes a statistical codeword selection algorithm in which the best subset is selected from the initial codewords based on their statistical characteristics. For this purpose, we define two types of codeword confidence: cross-category and within-category confidence. The cross- and within-category confidences eliminate codewords that are indistinctive across categories and codewords that are inconsistent within each category, respectively. An informative subset of codewords is then selected based on these two confidences. Experimental evaluation on a scene categorization dataset and the Caltech-101 dataset shows that the proposed method improves categorization performance by up to 10% in terms of error rate reduction when combined with BoW, sparse coding (SC), and locality-constrained linear coding (LLC). Furthermore, the codebook size is reduced by 50%, leading to lower computational complexity.
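The abstract does not give the confidence formulas, so the scores below are hypothetical stand-ins for the two ideas. Cross-category confidence is one minus the normalized entropy of a codeword's class-mean frequencies, so it is high when the codeword is distinctive to few categories. Within-category confidence is a class-weighted ratio μ/(μ+σ), so it is high when counts are consistent inside a category. Codewords are ranked by the product of the two scores, and the top fraction is kept. At least two classes are assumed.

```python
import numpy as np

def codeword_confidences(H, labels, eps=1e-12):
    """H: (n_images, n_codewords) BoW histograms; labels: class id per image (>= 2 classes)."""
    classes = np.unique(labels)
    mu = np.stack([H[labels == c].mean(axis=0) for c in classes])  # (C, K) class means
    sd = np.stack([H[labels == c].std(axis=0) for c in classes])   # (C, K) class std devs
    p = mu / (mu.sum(axis=0, keepdims=True) + eps)                 # class distribution per codeword
    entropy = -(p * np.log(p + eps)).sum(axis=0) / np.log(len(classes))
    cross = 1.0 - entropy                                          # distinctive across categories
    within = (p * mu / (mu + sd + eps)).sum(axis=0)                # consistent inside categories
    return cross, within

def select_codewords(H, labels, keep=0.5):
    """Sorted indices of the codewords with the highest cross * within score."""
    cross, within = codeword_confidences(H, labels)
    k = max(1, int(round(keep * H.shape[1])))
    return np.sort(np.argsort(cross * within)[::-1][:k])
```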
Mobile robots that rely on vision for navigation and object detection use saliency approaches to identify a set of potential candidates for recognition. State-of-the-art saliency detection for mobile robotics often relies on visible-light imaging with conventional camera setups to distinguish an object from its surroundings based on factors such as feature compactness, heterogeneity, and/or homogeneity. We demonstrate a novel multi-polarimetric saliency detection approach that uses multiple measured polarization states of a scene. We leverage the light-material interaction known as Fresnel reflection to extract rotationally invariant multi-polarimetric textural representations, which are then used to train a high-dimensional sparse texture model. The multi-polarimetric textural distinctiveness is characterized using a conditional probability framework based on the sparse texture model, which is then used to determine the saliency at each pixel of the scene. We observed that including additional polarization states in the saliency analysis produced noticeably improved saliency maps in scenes where objects are difficult to distinguish from their background because of color intensity similarities between the object and its surroundings.
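Multiple polarization states are commonly combined through the linear Stokes parameters. The sketch below takes images captured through a linear polarizer at 0°, 45°, 90°, and 135°. From them it computes S0, S1, S2, the degree of linear polarization (DoLP), and the angle of linear polarization (AoLP). These are standard relations, shown as an example of the per-pixel inputs a polarimetric texture model could use. They are not the authors' sparse texture model.

```python
import numpy as np

def stokes_from_polarizer(i0, i45, i90, i135, eps=1e-12):
    """Linear Stokes parameters, DoLP and AoLP (rad) from four polarizer-angle images."""
    s0 = 0.5 * (i0 + i45 + i90 + i135)        # total intensity
    s1 = i0 - i90                             # horizontal vs vertical preference
    s2 = i45 - i135                           # diagonal preference
    dolp = np.sqrt(s1 ** 2 + s2 ** 2) / (s0 + eps)
    aolp = 0.5 * np.arctan2(s2, s1)
    return s0, s1, s2, dolp, aolp
```

The inputs may be scalars or whole images (NumPy arrays of equal shape). The outputs are then per-pixel maps.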
Segmentation is the first step and a core technology for semantic understanding of video. Many computer vision tasks, such as tracking, recognition, and 3D reconstruction, rely on segmentation results as preprocessing. However, video segmentation is known to be a very complicated and difficult problem. Objects in video change their colors and shapes according to the surrounding illumination, the camera position, or the object motion. Color, motion, or depth has been used individually as the key cue for segmentation in many studies. However, every object in an image is composed of several features, such as color, texture, depth, and motion, which is why single-feature segmentation methods often fail. Humans can segment the objects in video with ease because the human visual system considers color, texture, depth, and motion at the same time. In this paper, we propose a video segmentation algorithm motivated by the human visual system. The algorithm performs video segmentation by simultaneously using color histograms for color, optical flow for motion, and homography for structure. Our results show that the proposed algorithm outperforms other appearance-based segmentation methods in terms of the semantic quality of the segmentation. The proposed segmentation method will serve as a basis for better high-level tasks such as recognition, tracking, and video understanding.
Uncorrected fish-eye images have lower resolution farther from the image center and severe nonlinear distortion. As a result, traditional feature matching methods, which first correct the distortion and then match features in the image, do not perform well in fish-eye lens applications. The center-symmetric local binary pattern (CS-LBP) is a descriptor based on neighborhood grayscale information, with strong grayscale and rotation invariance. In this paper, CS-LBP is combined with the scale-invariant feature transform (SIFT) to solve the problem of feature point matching in uncorrected fish-eye images. We first extract interest points in a pair of fish-eye images with SIFT and then describe the regions around the interest points with CS-LBP. Finally, the similarity of the regions is evaluated using the chi-square distance to obtain a unique pair of matching points, so that for a given interest point the corresponding point in the other image can be found. The experimental results show that the proposed method achieves satisfactory matching performance on uncorrected fish-eye images. This study should help extend the applications of fish-eye lenses in 3D reconstruction and panorama restoration.
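The sketch below covers the CS-LBP description and chi-square matching steps. It assumes the interest points have already been detected, for example by a SIFT detector, and that a square grayscale patch with values in [0, 1] has been cut around each one. The radius-1, 8-neighbour CS-LBP with threshold 0.01 and a 4×4 cell grid follows the usual CS-LBP formulation. The patch size and the nearest-neighbour matcher are illustrative.

```python
import numpy as np

def cs_lbp_codes(img, threshold=0.01):
    """CS-LBP code (0..15) for every interior pixel, radius 1, 8 neighbours."""
    f = img.astype(float)
    # neighbours in circular order: E, NE, N, NW, W, SW, S, SE
    nb = [f[1:-1, 2:], f[:-2, 2:], f[:-2, 1:-1], f[:-2, :-2],
          f[1:-1, :-2], f[2:, :-2], f[2:, 1:-1], f[2:, 2:]]
    code = np.zeros(nb[0].shape, dtype=np.int64)
    for i in range(4):  # compare each neighbour with its centre-symmetric opposite
        code |= (nb[i] - nb[i + 4] > threshold).astype(np.int64) << i
    return code

def cs_lbp_descriptor(patch, cells=4, threshold=0.01):
    """Concatenated 16-bin CS-LBP histograms over a cells x cells grid, L2-normalised."""
    codes = cs_lbp_codes(patch, threshold)
    h, w = codes.shape
    hists = [np.bincount(codes[r * h // cells:(r + 1) * h // cells,
                               c * w // cells:(c + 1) * w // cells].ravel(), minlength=16)
             for r in range(cells) for c in range(cells)]
    d = np.concatenate(hists).astype(float)
    return d / (np.linalg.norm(d) + 1e-12)

def chi_square(a, b, eps=1e-12):
    """Chi-square distance between two non-negative descriptors."""
    return 0.5 * np.sum((a - b) ** 2 / (a + b + eps))

def match(desc1, desc2):
    """For each descriptor in desc1, the index of the closest descriptor in desc2."""
    return [int(np.argmin([chi_square(a, b) for b in desc2])) for a in desc1]
```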
Fourier descriptors have long been used to describe the underlying continuous contours of two-dimensional shapes. Approximating shapes by polygons is a natural step toward efficient algorithms in computer graphics and computer vision. This paper derives mathematical relationships between the Fourier descriptors of the continuous contour and the corresponding descriptors of a polygon obtained by connecting samples on the contour. We show that the polygon's descriptors may be obtained analytically in two ways: first, by summing subsets of the contour's descriptors; and second, from the discrete Fourier transform (DFT) of the polygon's vertices. We also analyze, in the Fourier domain, shape approximation using interpolators. Our results show that polygonal approximation, with its potential benefits for efficient analysis of shape, is achievable in the Fourier descriptor domain.
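The link between the contour's descriptors and the DFT of the sampled vertices can be checked numerically. Suppose z(t) = Σ_k c_k e^{2πikt} and the N vertices are z(n/N). Then the normalized DFT of the vertices satisfies Z[k] = Σ_l c_{k+lN}, so each vertex descriptor is the sum of an aliased subset of contour descriptors. The sketch verifies this identity for an arbitrary finite set of coefficients. It illustrates only the vertex-DFT relation, not the paper's full derivation for the piecewise-linear polygon.

```python
import numpy as np

def sample_contour(coeffs, N):
    """Vertices z(n/N) of the contour z(t) = sum_k c_k exp(2*pi*i*k*t), n = 0..N-1."""
    t = np.arange(N) / N
    return sum(c * np.exp(2j * np.pi * k * t) for k, c in coeffs.items())

def aliased_descriptors(coeffs, N):
    """Predicted normalized DFT of the vertices: Z[k] = sum over l of c_{k + l*N}."""
    Z = np.zeros(N, dtype=complex)
    for k, c in coeffs.items():
        Z[k % N] += c  # every coefficient folds onto index k mod N
    return Z
```

For example, with N = 5 vertices, the coefficients c_2 and c_{-3} both fold onto index 2 (since -3 mod 5 = 2).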
In recent years, a variety of nonlinear dimensionality reduction (NLDR) techniques have been proposed in the literature. They aim to address the limitations of traditional techniques such as PCA and classical scaling. Most of these techniques assume that the data of interest lie on an embedded nonlinear manifold within the higher-dimensional space. They provide a mapping from the high-dimensional space to the low-dimensional embedding and may be viewed, in the context of machine learning, as a preliminary feature extraction step, after which pattern recognition algorithms are applied. Laplacian Eigenmaps (LE) is a nonlinear graph-based dimensionality reduction method. It has been successfully applied to many practical problems such as face recognition. However, like other graph-based DR techniques, the construction of the LE graph suffers from the following issues: (1) the neighborhood graph is artificially defined in advance and thus does not necessarily benefit the desired DR task; (2) the graph is built using the nearest-neighbor criterion, which tends to work poorly due to the high dimensionality of the original space; and (3) its computation depends on two parameters whose values are generally difficult to set, the neighborhood size and the heat kernel parameter. To address these problems for the particular case of the LPP method (a linear version of LE), L. Zhang et al.¹ developed a novel DR algorithm whose idea is to integrate graph construction with a specific DR process into a unified framework. This algorithm results in an optimized graph rather than a predefined one.
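For reference, the baseline LE construction criticized above can be sketched with a k-nearest-neighbour graph and heat-kernel weights. Here n_neighbors and t are exactly the two hand-set parameters mentioned in item (3). The generalized eigenproblem L y = λ D y is solved through the symmetric normalized Laplacian. This is the classical method, not the graph-optimization approach of Zhang et al.

```python
import numpy as np

def laplacian_eigenmaps(X, n_neighbors=8, t=1.0, n_components=2):
    """Embed the rows of X using a k-NN graph with weights W_ij = exp(-||xi - xj||^2 / t)."""
    n = X.shape[0]
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
    nbrs = np.argsort(d2, axis=1)[:, 1:n_neighbors + 1]  # column 0 is the point itself
    rows = np.repeat(np.arange(n), n_neighbors)
    W = np.zeros((n, n))
    W[rows, nbrs.ravel()] = np.exp(-d2[rows, nbrs.ravel()] / t)
    W = np.maximum(W, W.T)                               # symmetrize the k-NN graph
    dinv = 1.0 / np.sqrt(W.sum(axis=1))
    # L y = lambda D y  <=>  (I - D^-1/2 W D^-1/2) u = lambda u  with  y = D^-1/2 u
    L_sym = np.eye(n) - dinv[:, None] * W * dinv[None, :]
    _, U = np.linalg.eigh(L_sym)                         # eigenvalues in ascending order
    Y = dinv[:, None] * U
    return Y[:, 1:n_components + 1]                      # drop the trivial constant solution
```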
If we apply the developed local polar edge detection (LPED) method to a binary image (each pixel either black or white), we can obtain the boundary points of all objects embedded in a randomly distributed noise background in sub-millisecond time. We can then apply our newly developed grouping (clustering) algorithm to separate the boundary points of all objects into individual-object boundary-point groups. Next, we can apply our fast identification-and-tracking technique to automatically identify each object by its unique geometric shape and track the movements of N objects simultaneously, as we did before for two objects. This paper concentrates on the algorithm design of this superfast grouping technique. Unlike classical combinatorial clustering algorithms, whose computation time increases exponentially with the number of points to be clustered, it is a linear-time grouping method: the grouping time increases only linearly with the total number of points to be grouped. The total time for automatically grouping 100 to 200 boundary points into separate object boundary groups is about 10 to 50 milliseconds.
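The paper's grouping algorithm is not specified here, so the sketch below uses a different, well-known technique that also scales roughly linearly when point density is bounded. Each boundary point is hashed into a grid cell whose side equals the linking distance. It is compared only with points in the 3×3 neighbouring cells, and touching points are merged with a union-find structure. The function name and the link_dist parameter are illustrative assumptions.

```python
import math
from collections import defaultdict

def group_points(points, link_dist):
    """Split 2D points into groups whose members are chained by gaps <= link_dist."""
    def cell(p):
        return (math.floor(p[0] / link_dist), math.floor(p[1] / link_dist))

    grid = defaultdict(list)
    for i, p in enumerate(points):
        grid[cell(p)].append(i)

    parent = list(range(len(points)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving keeps trees shallow
            i = parent[i]
        return i

    for i, p in enumerate(points):
        cx, cy = cell(p)
        for dx in (-1, 0, 1):
            for dy in (-1, 0, 1):
                for j in grid.get((cx + dx, cy + dy), ()):
                    if j > i and math.dist(p, points[j]) <= link_dist:
                        parent[find(i)] = find(j)  # union the two groups

    groups = defaultdict(list)
    for i in range(len(points)):
        groups[find(i)].append(i)
    return list(groups.values())
```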
Spherical stereo vision is a stereo vision system built with fish-eye lenses, whose stereo algorithms conform to the spherical model. Epipolar geometry is the theory that describes the relationship between the two imaging planes of a stereo vision system based on the perspective projection model. In an uncorrected fish-eye image, however, the epipolar line is not a straight line but an arc passing through the poles, that is, a polar curve. In this paper, the theory of nonlinear epipolar geometry is explored and a nonlinear epipolar rectification method is proposed to eliminate the vertical parallax between two fish-eye images. Maximally Stable Extremal Region (MSER) uses the gray level as the independent variable and takes the local extremum of the area variation as the detection result. Previous studies have shown that MSER depends only on the gray-level variations of an image and not on its local structural characteristics or resolution. Here, MSER is combined with the proposed nonlinear epipolar rectification method. The intersection of a rectified epipolar line and the corresponding MSER region is taken as the feature set for spherical stereo vision. Experiments show that this approach achieves the expected results.
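The sketch below shows one common way to obtain rectified coordinates in which corresponding points share a row. It assumes an equidistant fish-eye model (r = fθ) and a rig whose two cameras have the same orientation, with the baseline along the camera x-axis. Under those assumptions every epipolar plane contains the x-axis. Its rotation angle β about the baseline is therefore identical in both views and can serve as the row index, while the angle α from the baseline acts as the column (disparity) coordinate. This is an illustration, not necessarily the rectification proposed in the paper.

```python
import numpy as np

def project_equidistant(X, f, cx, cy):
    """Project a 3D point (camera frame, z forward) with the equidistant model r = f * theta."""
    x, y, z = X / np.linalg.norm(X)
    theta = np.arccos(np.clip(z, -1.0, 1.0))  # angle from the optical axis
    phi = np.arctan2(y, x)
    return cx + f * theta * np.cos(phi), cy + f * theta * np.sin(phi)

def backproject_equidistant(u, v, f, cx, cy):
    """Unit viewing ray for pixel (u, v) under the equidistant model."""
    du, dv = u - cx, v - cy
    theta = np.hypot(du, dv) / f
    phi = np.arctan2(dv, du)
    return np.array([np.sin(theta) * np.cos(phi), np.sin(theta) * np.sin(phi), np.cos(theta)])

def epipolar_coords(ray):
    """(beta, alpha) for a rig whose baseline lies along the camera x-axis.
    beta: rotation of the epipolar plane about the baseline (equal for corresponding points);
    alpha: angle between the ray and the baseline inside that plane."""
    x, y, z = ray
    return np.arctan2(y, z), np.arccos(np.clip(x, -1.0, 1.0))
```

A rectified image can then be resampled on a regular (β, α) grid, which makes the epipolar curves horizontal rows.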
The ability to find and grasp target items in an unknown environment is important for working robots. We developed an autonomous navigating and grasping robot. Its operations are locating a requested item, moving to where the item is placed, finding the item on a shelf or table, and picking the item up from the shelf or table. To achieve these operations, we designed the robot with three functions: an autonomous navigation function that generates a map and a route in an unknown environment, an item position recognition function, and a grasping function. We tested this robot in an unknown environment. It achieved a series of operations: moving to a destination, recognizing the positions of items on a shelf, picking up an item, placing it on a cart with its hand, and returning to the starting location. The results of this experiment suggest that such robots can help reduce the required workforce.
This paper presents the groundwork carried out to achieve automatic fine-grained recognition of stone masonry. This is a necessary first step in the development of an analysis tool. The built heritage to be assessed consists of stone masonry constructions, and many of the features analysed can be characterized by the geometry and arrangement of the stones. Much of the assessment is carried out through visual inspection, so we apply image processing to digital images of the elements under inspection. The main contribution of the paper is the performance evaluation of the automatic categorization of masonry walls from a set of extracted straight line segments. The element chosen for this evaluation is the stone arrangement of masonry walls. The validity of the proposed framework is assessed on real images of masonry walls using machine learning paradigms, including classifiers and automatic feature selection.
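As an example of a feature that can be computed from extracted straight line segments, the sketch below builds a length-weighted histogram of undirected segment orientations. This is a hypothetical feature, not the paper's feature set. Vectors like this could then be passed to a classifier and a feature-selection step.

```python
import numpy as np

def segment_orientation_histogram(segments, bins=8):
    """segments: (n, 4) array of (x1, y1, x2, y2). Returns a length-weighted,
    L1-normalised histogram of undirected orientations in [0, pi)."""
    s = np.asarray(segments, dtype=float)
    dx, dy = s[:, 2] - s[:, 0], s[:, 3] - s[:, 1]
    angle = np.mod(np.arctan2(dy, dx), np.pi)  # a segment and its reverse share one bin
    length = np.hypot(dx, dy)
    hist, _ = np.histogram(angle, bins=bins, range=(0.0, np.pi), weights=length)
    return hist / (hist.sum() + 1e-12)
```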
Power line inspection is an important task for the maintenance of electrical infrastructure. Unmanned aerial vehicles (UAVs) can be very useful in the inspection process because of the high costs of obtaining images of power lines from different perspectives and the logistical problems of manned flights. Using the power line itself as a navigation reference can be difficult because of the varying backgrounds, so we consider using the tower as a reference to improve the orientation of the UAV with respect to the electrical grid. In this work, we develop a navigation process based on tower detection. Navigation is performed using information extracted from a frontal camera in a visual control scheme and is validated in virtual environments.
LIDAR devices for on-vehicle use need a wide field of view and good fidelity. For instance, a LIDAR that helps a helicopter avoid collisions during landing needs a wide field of view and must show reasonable detail of the area. The same is true for an online LIDAR scanning device mounted on an automobile. In this paper, we describe a LIDAR system with full color and enhanced resolution that has an effective vertical scanning range of 60 degrees with a central 20-degree fovea. The extended range with fovea is achieved by using two standard Velodyne 32-HDL LIDARs placed head to head and counter-rotating. The HDL LIDARs each scan 40 degrees vertically and a full 360 degrees horizontally, with an outdoor effective range of 100 meters. Positioned head to head, they overlap by 20 degrees (40 + 40 − 20 = 60 degrees of total coverage), which creates a double-density fovea. The LIDAR returns from the two Velodyne sensors do not natively contain color. To add color, a Point Grey LadyBug panoramic camera is used to gather color data of the scene. In the first stage of our system, the two LIDAR point clouds and the LadyBug video are fused in real time at a frame rate of 10 Hz. A second stage intelligently interpolates the point cloud and increases its resolution by approximately four times while maintaining accuracy with respect to the 3D scene. Using GPGPU programming, we can compute this at 10 Hz. Our backfilling interpolation method works by first computing local linear approximations from the perspective of the LIDAR depth map. The color features from the image are used to select point cloud support points that are the best points in a local group for building the local linear approximations. This makes the colored point cloud more detailed while maintaining fidelity to the 3D scene. Our system also makes objects appearing in the PanDAR display easier for a human operator to recognize.
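The following is a CPU sketch of color-guided backfilling on a LIDAR depth map. It assumes that missing pixels are marked with NaN and that colors are RGB values in [0, 1]. It also takes the “best support points” to be valid neighbours whose color lies within color_tol of the target pixel. A plane z = a·u + b·v + c is fitted to those supports by least squares and evaluated at the hole. The window radius, tolerance, and minimum support count are illustrative, and this is not the authors' GPGPU implementation.

```python
import numpy as np

def fill_depth(depth, color, radius=2, color_tol=0.1, min_support=3):
    """Fill NaN depth pixels with a local plane fit z = a*u + b*v + c over colour-similar
    valid neighbours (u = column, v = row). Supports come from the original depth map."""
    out = depth.copy()
    H, W = depth.shape
    for r, c in zip(*np.where(np.isnan(depth))):
        r0, r1 = max(0, r - radius), min(H, r + radius + 1)
        c0, c1 = max(0, c - radius), min(W, c + radius + 1)
        vv, uu = np.mgrid[r0:r1, c0:c1]
        z = depth[r0:r1, c0:c1]
        dcol = np.linalg.norm(color[r0:r1, c0:c1] - color[r, c], axis=-1)
        ok = ~np.isnan(z) & (dcol <= color_tol)  # valid and similar in colour
        if ok.sum() < min_support:
            continue                              # leave the hole unfilled
        A = np.column_stack([uu[ok], vv[ok], np.ones(ok.sum())])
        coef, *_ = np.linalg.lstsq(A, z[ok], rcond=None)
        out[r, c] = coef @ np.array([c, r, 1.0])
    return out
```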
In this paper, we propose an enhanced method of 3D object description and recognition based on local descriptors using an RGB image and depth information (D) acquired by a Kinect sensor. Our main contribution is an extension of the SIFT feature vector with 3D information derived from the depth map (SIFT-D). We also propose a novel local depth descriptor (DD) that includes a 3D description of the keypoint neighborhood. The 3D descriptor thus defined can then enter the decision-making process. Two approaches are proposed, tested, and evaluated in this paper. The first approach is an object recognition system that uses the original SIFT descriptor in combination with our proposed 3D descriptor, where the 3D descriptor is responsible for pre-selecting the objects. The second approach demonstrates object recognition using the SIFT feature vector extended with the local depth description. We present the results of two experiments evaluating the proposed depth descriptors. The results show that the recognition system that includes the 3D local description is more accurate than the same system without it. Our experimental object recognition system runs in near real time.
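The sketch below shows one way a SIFT vector could be extended with local depth information. The depth descriptor is a hypothetical choice: a normalized histogram of depths relative to the keypoint depth within a square window. The 128-D SIFT vector is taken as given from any SIFT implementation, and the weight balancing the two parts is an assumption.

```python
import numpy as np

def depth_descriptor(depth, kp, radius=8, bins=8, max_rel=0.1):
    """Normalised histogram of depths relative to the keypoint depth in a square window.
    kp = (row, col); depth in metres, NaN where invalid."""
    r, c = kp
    win = depth[max(0, r - radius):r + radius + 1, max(0, c - radius):c + radius + 1]
    rel = np.clip(win[np.isfinite(win)] - depth[r, c], -max_rel, max_rel)
    hist, _ = np.histogram(rel, bins=bins, range=(-max_rel, max_rel))
    hist = hist.astype(float)
    return hist / (np.linalg.norm(hist) + 1e-12)

def sift_d(sift_vec, depth, kp, weight=0.5, **kwargs):
    """Hypothetical SIFT-D: unit-norm SIFT vector followed by a weighted depth histogram."""
    s = np.asarray(sift_vec, dtype=float)
    s = s / (np.linalg.norm(s) + 1e-12)
    return np.concatenate([s, weight * depth_descriptor(depth, kp, **kwargs)])
```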
A fish-eye lens is a camera lens with a short focal length (f = 6–16 mm). Its field of view (FOV) approaches or even exceeds 180°×180°. Many studies show that a multiple-view geometry system built with fish-eye lenses obtains a larger stereo field than a traditional stereo vision system based on a pair of perspective projection images. Since a fish-eye camera usually has a wider-than-hemispherical FOV, most image processing approaches based on the pinhole camera model for conventional stereo vision systems are unsuitable for this category of stereo vision built with fish-eye lenses. This paper focuses on the calibration and epipolar rectification methods for a novel machine vision system set up with four fish-eye lenses, called the Special Stereo Vision System (SSVS). The characteristic of SSVS is that it can produce 3D coordinate information for the whole global observation space and simultaneously acquire a 360°×360° panoramic image with no blind area, using a single vision device with one static shot. Parameter calibration and epipolar rectification are the basis for SSVS to realize 3D reconstruction and panoramic image generation.
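The radial projection functions show why pinhole-based processing breaks down for wider-than-hemispherical lenses. The sketch compares the perspective model r = f·tan θ with the equidistant fish-eye model r = f·θ. The equidistant model is one common fish-eye model, assumed here for illustration.

```python
import numpy as np

def image_radius(theta, f, model="equidistant"):
    """Radial distance from the principal point for a ray at incidence angle theta (rad)."""
    if model == "perspective":   # pinhole: r = f * tan(theta), diverges at 90 degrees
        return f * np.tan(theta)
    if model == "equidistant":   # fish-eye: r = f * theta, finite beyond 90 degrees
        return f * theta
    raise ValueError(f"unknown model: {model}")
```

With f = 8 mm, the perspective radius grows without bound as θ approaches 90° and becomes negative beyond it. The equidistant radius stays finite (4π ≈ 12.57 mm at 90°), so rays beyond the hemisphere still map to the image.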
Nonverbal communication, also known as body language, is an important form of communication. Nonverbal behaviors such as posture, eye contact, and gestures send strong messages. Of these, eye contact is one of the most important forms that an individual can use. However, eye contact is lost when we use a video conferencing system, because the disparity between the locations of the eyes and the camera gets in the way. A lack of eye contact can create an unapproachable and unpleasant impression. In this paper, we propose an eye gaze correction method for video conferencing. We use two cameras installed at the top and bottom of the television. The two captured images are rendered by 2D warping to a virtual position. We apply view morphing to the detected face and synthesize the face with the warped image. Experimental results verify that the proposed system is effective in generating natural gaze-corrected
Proc. SPIE 9406, Increasing signal-to-noise ratio of reconstructed digital holograms by using light spatial noise portrait of camera's photosensor, 94060O (8 February 2015); https://doi.org/10.1117/12.2079077
Digital holography is a technique that includes recording an interference pattern with a digital photosensor, processing the obtained holographic data, and reconstructing the object wavefront. Increasing the signal-to-noise ratio (SNR) of reconstructed digital holograms is especially important in fields such as image encryption, pattern recognition, and static and dynamic display of 3D scenes. In this paper, compensation of the photosensor light spatial noise portrait (LSNP) is proposed to increase the SNR of reconstructed digital holograms. To verify the proposed method, numerical experiments were performed with computer-generated Fresnel holograms with a resolution of 512×512 elements. The registration of shots with a Canon EOS 400D digital camera was simulated. It is shown that averaging over frames alone increases SNR by at most a factor of 4, and that further increase of SNR is limited by spatial noise. Applying the LSNP compensation method together with averaging over frames yields a tenfold SNR increase. This value was obtained for an LSNP measured with 20% error. With a more accurate LSNP, SNR can be increased up to 20 times.
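A toy simulation can reproduce, qualitatively, both the saturation of frame averaging and the benefit of LSNP compensation. It assumes a sensor whose spatial noise is a fixed per-pixel gain (photo-response nonuniformity) plus independent temporal noise. An LSNP measured with 20% error is modelled as a per-pixel Gaussian relative error. All noise levels are illustrative, and the numbers the simulation produces are not the paper's.

```python
import numpy as np

def averaged_snr(signal, n_frames, prnu_std=0.01, temporal_std=0.05, lsnp_error=0.2, seed=0):
    """Return (SNR of the frame average, SNR after LSNP compensation) for a toy sensor."""
    rng = np.random.default_rng(seed)
    spatial = prnu_std * rng.standard_normal(signal.shape)  # fixed spatial gain noise
    gain = 1.0 + spatial
    # "Measured" LSNP with a relative error of lsnp_error (assumed Gaussian per pixel)
    gain_est = 1.0 + spatial * (1.0 + lsnp_error * rng.standard_normal(signal.shape))
    acc = np.zeros_like(signal, dtype=float)
    for _ in range(n_frames):  # averaging suppresses temporal noise only
        acc += signal * gain + temporal_std * rng.standard_normal(signal.shape)
    avg = acc / n_frames

    def snr(est):
        return np.linalg.norm(signal) / np.linalg.norm(est - signal)

    return snr(avg), snr(avg / gain_est)
```

Going from 100 to 1000 frames barely improves the uncompensated SNR, because the fixed spatial term dominates. Dividing by the estimated gain removes most of that term.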
Nowadays, computer vision is widely used in daily life. To obtain reliable information, camera calibration cannot be neglected. Traditional camera calibration is often impractical in real scenes because accurate coordinates of the reference control points are unavailable. In this article, we present a camera calibration algorithm that determines the intrinsic parameters together with the extrinsic parameters. The algorithm is based on parallel lines, which are commonly found in real-life photos, so both the intrinsic and the extrinsic parameters can be obtained from information in everyday photos. In more detail, we use two pairs of parallel lines to compute vanishing points. In particular, if the two line directions are perpendicular, the two vanishing points are conjugate with respect to the image of the absolute conic (IAC), and several views (at least 5) can be used to determine the IAC. The intrinsic parameters are then easily obtained by Cholesky factorization of the IAC matrix. Moreover, the line connecting a vanishing point with the camera's optical center is parallel to the original lines in the scene plane, and from this the extrinsic parameters R and T can be obtained. Both the simulation and the experimental results meet our expectations.
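The IAC step can be sketched numerically. Each pair of vanishing points v1, v2 of orthogonal directions gives one linear constraint v1ᵀ ω v2 = 0 on the symmetric 3×3 matrix ω, which has six unknowns up to scale, so at least five pairs are needed. The null vector is found by SVD. Since ω = K⁻ᵀK⁻¹, a Cholesky factorization ω = LLᵀ gives K = L⁻ᵀ. The coordinate scaling, the function names, and the assumption of noise-free vanishing points are illustrative. The rotation helper recovers R only up to the sign of each column, and the sketch does not estimate T.

```python
import numpy as np

def iac_from_orthogonal_vps(pairs, scale=1000.0):
    """Estimate the IAC (in scaled coordinates) from vanishing-point pairs of orthogonal
    directions. Returns (omega_s, S) with S = diag(1/scale, 1/scale, 1) the scaling used."""
    S = np.diag([1.0 / scale, 1.0 / scale, 1.0])
    rows = []
    for v1, v2 in pairs:
        a = S @ np.asarray(v1, dtype=float)
        a /= np.linalg.norm(a)
        b = S @ np.asarray(v2, dtype=float)
        b /= np.linalg.norm(b)
        # a^T omega b = 0 with unknowns (w11, w12, w13, w22, w23, w33)
        rows.append([a[0] * b[0], a[0] * b[1] + a[1] * b[0], a[0] * b[2] + a[2] * b[0],
                     a[1] * b[1], a[1] * b[2] + a[2] * b[1], a[2] * b[2]])
    w = np.linalg.svd(np.array(rows))[2][-1]  # right singular vector, smallest singular value
    omega = np.array([[w[0], w[1], w[2]], [w[1], w[3], w[4]], [w[2], w[4], w[5]]])
    return (omega if omega[2, 2] > 0 else -omega), S  # fix the sign so omega is positive definite

def intrinsics_from_iac(omega_s, S):
    """omega = K^-T K^-1, so with omega = L L^T (Cholesky, L lower) we get K = L^-T."""
    L = np.linalg.cholesky(omega_s)
    K_s = np.linalg.inv(L).T
    K_s /= K_s[2, 2]
    return np.linalg.inv(S) @ K_s  # undo the coordinate scaling

def rotation_from_vps(K, vx, vy):
    """Rotation columns from the back-projected directions K^-1 v (each up to sign)."""
    Kinv = np.linalg.inv(K)
    r1 = Kinv @ vx
    r1 = r1 / np.linalg.norm(r1)
    r2 = Kinv @ vy
    r2 = r2 / np.linalg.norm(r2)
    return np.column_stack([r1, r2, np.cross(r1, r2)])
```

Scaling pixel coordinates before building the constraints keeps the linear system well conditioned. The scaling is undone after the Cholesky step.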