This PDF file contains the front matter associated with SPIE Proceedings Volume 10806, including the Title Page, Copyright information, Table of Contents, Introduction, and Conference Committee listing.
In old films, jitter commonly arises from translation, rotation, and zooming. To address this common video-jitter phenomenon, this paper proposes a method that combines Lucas-Kanade sparse optical flow with feature-point matching to estimate the global motion parameters. The method is then applied to old-film restoration, compensating the motion of each jittered frame relative to a reference frame and thereby stabilizing the film's continuous frame sequence. Experimental results show that the algorithm has good real-time performance and effectively solves the problem of smooth frame-to-frame transitions.
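The estimate-and-compensate step described here can be sketched with OpenCV; this is a minimal illustration, not the authors' code, and the feature counts and RANSAC choice are assumptions:

```python
import cv2
import numpy as np

def stabilize_frame(ref_gray, cur_gray, cur_frame):
    # Detect trackable corners in the reference frame
    pts_ref = cv2.goodFeaturesToTrack(ref_gray, maxCorners=200,
                                      qualityLevel=0.01, minDistance=8)
    # Track them into the jittered frame with pyramidal Lucas-Kanade
    pts_cur, status, _ = cv2.calcOpticalFlowPyrLK(ref_gray, cur_gray,
                                                  pts_ref, None)
    good = status.ravel() == 1
    # Robustly fit translation + rotation + scale (the jitter model)
    M, _ = cv2.estimateAffinePartial2D(pts_cur[good], pts_ref[good],
                                       method=cv2.RANSAC)
    # Warp the jittered frame back onto the reference frame
    h, w = cur_gray.shape
    return cv2.warpAffine(cur_frame, M, (w, h))
```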
To address the low success rate of automatic sorting, a matching feature for packing boxes is constructed with regular expressions, based on the identified physical characteristics of the objects (contour, color, and relative distance) and their distinct vibration characteristics, together with a library containing colors, lines, locations, and vibrations. Experimental results showed a recognition rate of 92.5% under a constant perspective, with an average recognition time of 50.3 ms. Compared with the Scale-Invariant Feature Transform (SIFT) and Speeded-Up Robust Features (SURF), the proposed method achieves a significant improvement in automatic sorting and identification under a constant perspective.
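The abstract does not specify how features are encoded for regular-expression matching; the sketch below assumes a hypothetical string encoding of the library entries purely to illustrate the idea:

```python
import re

# Hypothetical encoding: each detected box is serialized as a feature
# string built from the library, e.g. "color=red;lines=4;loc=A3;vib=low".
BOX_PATTERN = re.compile(
    r"color=(red|blue);lines=\d+;loc=[A-D]\d;vib=(low|high)"
)

def matches_box(feature_string: str) -> bool:
    # A candidate is accepted when its encoded features fit the pattern
    return BOX_PATTERN.fullmatch(feature_string) is not None

print(matches_box("color=red;lines=4;loc=A3;vib=low"))    # True
print(matches_box("color=green;lines=4;loc=A3;vib=low"))  # False
```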
Existing salient-object extraction methods for low depth-of-field (DOF) images are usually based on local saliency. However, in a low-DOF image the smooth regions of salient objects resemble the background in local saliency, so the two are easily confused. In this paper, a novel salient-object extraction method is proposed that introduces Support Vector Data Description (SVDD) to describe the shape of salient objects; to our knowledge, this is the first use of SVDD for salient-object extraction. SVDD makes full use of the global characteristics of salient objects, allowing our approach to accurately extract salient objects that contain smooth regions. Experiments on a Flickr dataset of 141 low-DOF images show that the F-measure of our approach is higher than that of existing methods.
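SVDD is closely related to the one-class SVM; as a stand-in, a minimal scikit-learn sketch of describing an object region's global shape might look as follows (the seed points and kernel parameters are placeholders):

```python
import numpy as np
from sklearn.svm import OneClassSVM

# coords: (N, 2) pixel coordinates sampled from a coarse saliency seed;
# the learned boundary description covers smooth interior regions too.
coords = np.random.rand(500, 2)                 # placeholder seeds

svdd = OneClassSVM(kernel="rbf", gamma=5.0, nu=0.1).fit(coords)

# Pixels with non-negative decision values lie inside the learned
# shape description and are labeled as salient-object pixels.
grid = np.stack(np.meshgrid(np.linspace(0, 1, 64),
                            np.linspace(0, 1, 64)), -1).reshape(-1, 2)
mask = (svdd.decision_function(grid) >= 0).reshape(64, 64)
```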
To reduce errors in depth estimation, a credible depth estimation method based on superpixel-constrained matching is proposed. It consists of normalized binocular-image disparity optimization, credible granularity-region segmentation, and a similarity measure over granularity regions. The method finely segments the normalized binocular images using superpixel granulation, dividing each image into a large number of fine granularity regions. To find the best match for each region, the candidate matching area is first restricted by the epipolar-line constraint. A matching similarity measure is then applied to obtain the best superpixel-region matches between the two images, yielding the two-dimensional correspondence of each granularity region. Finally, depth is estimated from the binocular parallax. Experimental results show that this method clearly reduces the depth-estimation errors of traditional methods.
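As a rough illustration of superpixel-level matching under the epipolar constraint, the sketch below segments a rectified pair with SLIC and matches regions by mean color; the paper's similarity measure is richer, so treat this as a toy version with assumed parameters:

```python
import numpy as np
from skimage.segmentation import slic

def region_stats(img, seg):
    # Mean color and centroid (row, col) of every superpixel region
    out = {}
    for label in np.unique(seg):
        ys, xs = np.nonzero(seg == label)
        out[label] = (img[ys, xs].mean(axis=0), ys.mean(), xs.mean())
    return out

def match_regions(left, right, n_segments=600, row_tol=10):
    stats_l = region_stats(left, slic(left, n_segments=n_segments))
    stats_r = region_stats(right, slic(right, n_segments=n_segments))
    matches = {}
    for ll, (cl, yl, xl) in stats_l.items():
        # Epipolar constraint: in a rectified pair a region's match
        # lies on (almost) the same row, so candidates are pre-filtered.
        cands = [(float(np.linalg.norm(cl - cr)), lr, xr)
                 for lr, (cr, yr, xr) in stats_r.items()
                 if abs(yl - yr) < row_tol]
        if cands:
            _, lr, xr = min(cands)
            matches[ll] = (lr, xl - xr)    # column offset ~ disparity
    return matches
```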
To better maintain the structural coherence of a repaired image, an exemplar-based image inpainting algorithm using structural-feature offset statistics is proposed. First, the degraded image is partitioned into a structural part and a non-structural part using the Canny operator. The patch-offset statistics of each part are then calculated separately, and only a few dominant offsets per part are selected as candidate labels. Finally, a global energy function that jointly considers color and gradient information is constructed and solved with multi-label graph cuts. Experimental results show that the proposed method generally yields better inpainting than three state-of-the-art methods in terms of structural coherence and neighborhood consistency.
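The offset-statistics step lends itself to a compact sketch: given a nearest-neighbor field over the known region (computed, e.g., with PatchMatch), the dominant offsets become the candidate labels for the graph-cut stage. The function below is an illustrative reading, not the authors' implementation:

```python
import numpy as np
from collections import Counter

def dominant_offsets(nn_field, k=60):
    # nn_field: (H, W, 2) per-patch nearest-neighbor offsets (dy, dx)
    # computed over one part (structural or non-structural) of the image.
    counts = Counter(map(tuple, nn_field.reshape(-1, 2)))
    # The k most frequent offsets become the candidate labels for
    # the multi-label graph-cut optimization.
    return [np.array(o) for o, _ in counts.most_common(k)]
```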
To achieve stable wide-baseline matching for structured scenes with low-texture regions, a new matching method based on line intersection features (LIFS) is proposed, combining the robustness of line features with the distinctiveness of keypoint descriptors. First, lines are detected and their intersections computed. Second, intersections arising from the perspective projection of parallel or skew lines are filtered out by parallel-line clustering and a coplanarity constraint, which increases the stability and accuracy of the remaining intersections. Third, local non-maximum suppression limits intersections that lie too close to each other. Fourth, a feature scale is computed for each LIFS simply from the geometric distribution of the intersections and the endpoints of the intersecting lines. Finally, SURF descriptors are computed for the LIFS at the computed scales, achieving scale and rotation invariance. Experimental results show that, compared with traditional matching based on local features, the proposed method is more robust to image noise and illumination change. It is also invariant to scale and rotation changes and to a certain degree of viewpoint change, providing an effective wide-baseline matching method for images of structured scenes.
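Computing a line intersection is cleanest in homogeneous coordinates, where the line through two points is their cross product and two lines meet at the cross product of their coefficient vectors. A small sketch:

```python
import numpy as np

def intersect(seg1, seg2, eps=1e-9):
    # Each segment is ((x1, y1), (x2, y2)). In homogeneous coordinates
    # the line through points p and q is p x q, and two lines meet at
    # the cross product of their coefficient vectors.
    line = lambda p, q: np.cross([*p, 1.0], [*q, 1.0])
    pt = np.cross(line(*seg1), line(*seg2))
    if abs(pt[2]) < eps:           # (near-)parallel lines never meet
        return None
    return pt[:2] / pt[2]          # back to inhomogeneous (x, y)

print(intersect(((0, 0), (1, 1)), ((0, 1), (1, 0))))  # [0.5 0.5]
```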
The Scale-Invariant Feature Transform (SIFT) algorithm is widely used for its excellent stability under rotation, scale, and affine transformation, and the local SIFT descriptor offers excellent accuracy and robustness. However, it is based only on gray scale and ignores the image's overall color information, so it recognizes images with rich color details poorly. This paper proposes an optimized SIFT algorithm that shows superior performance in feature extraction and matching. RGB color-space normalization is used to eliminate the effects of illumination position and intensity on the image. We then propose a novel similarity retrieval method that processes the key points extracted from the normalized color space using a K-nearest-neighbor search over a K-D tree (k-dimensional tree), filtering and combining the key points of RGB space efficiently. Experimental results demonstrate that the optimized algorithm clearly outperforms the original SIFT algorithm in matching: the average matching accuracy on test samples is 87.05%, an average increase of 18.21%.
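The K-D tree nearest-neighbor stage maps directly onto SciPy; a minimal sketch with Lowe's ratio test (the ratio threshold is a common default, not taken from the paper):

```python
import numpy as np
from scipy.spatial import cKDTree

def match_descriptors(desc_a, desc_b, ratio=0.75):
    # Build a K-D tree over image B's descriptors and query the two
    # nearest neighbors of every descriptor from image A.
    tree = cKDTree(desc_b)
    dists, idx = tree.query(desc_a, k=2)
    # Ratio test: keep a match only when clearly better than runner-up
    keep = dists[:, 0] < ratio * dists[:, 1]
    return np.nonzero(keep)[0], idx[keep, 0]   # index pairs (A, B)
```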
Thanks to their large storage capacity and small code area, QR (quick response) codes are widely used for automatic identification in commercial applications such as parcel packaging and business cards. Existing methods mainly address unambiguous QR code location against simple backgrounds and typically rely on the machine operating unaided. QR code images of low quality and with complex backgrounds, however, hurt the accuracy and efficiency of location in automatic identification, especially when the finder patterns are destroyed. With human help, interactive learning approaches can overcome such cognitive obstacles in computer operation. This paper focuses on locating blurred QR codes against complex backgrounds using an efficient interactive two-stage framework. The first stage is rough location, comprising interactive feature-template setting and clustering with our improved mean shift algorithm. Accurate location is then performed by optimizing the finder-pattern detection. Experiments on damaged, contaminated, and scratched images with complex backgrounds show quite promising results for QR code location.
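The clustering stage of the rough-location step can be pictured with scikit-learn's stock mean shift (the paper uses an improved variant, and the feature design below is an assumption):

```python
import numpy as np
from sklearn.cluster import MeanShift

def rough_locations(feats, bandwidth=20.0):
    # feats: (N, d) features gathered with the interactively chosen
    # template, e.g. patch position plus local contrast (assumed here).
    ms = MeanShift(bandwidth=bandwidth, bin_seeding=True).fit(feats)
    # Each cluster center is one rough QR-code location candidate
    return ms.cluster_centers_, ms.labels_
```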
Feature learning is widely used for image recognition, but limited training samples and heavy noise often make practical classification challenging; in particular, they make the sample covariance matrix deviate from the true one. To alleviate this bias, we use a fractional-order strategy to re-model the sample spectrum of the covariance matrix. Moreover, since class boundaries are rarely clear-cut in practice, it is necessary to incorporate fuzzy relationships into feature learning. In this paper, we propose fuzzy fractional canonical correlation analysis (FFCCA), in which the sample spectrum is reconstructed by fractional modeling while fuzzy label information is taken into account. Experimental results on visual recognition show that FFCCA learns more discriminative low-dimensional features than existing feature-learning methods.
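The fractional re-modeling idea can be sketched by raising the eigenvalues of the sample covariance to a fractional power, shrinking the spread of the noisy spectrum; the exponent below is a placeholder, not the paper's value:

```python
import numpy as np

def fractional_covariance(X, alpha=0.9):
    # X: (n_samples, n_features), assumed zero-mean. Re-model the
    # sample spectrum by raising the eigenvalues to a fractional power
    # alpha in (0, 1], reducing the bias of the noisy estimate.
    C = X.T @ X / len(X)
    w, V = np.linalg.eigh(C)
    w = np.clip(w, 0.0, None) ** alpha
    return (V * w) @ V.T          # V diag(w^alpha) V^T
```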
In fine-grained object recognition, over-fitting often occurs because each category has few fine-grained samples, especially for CNNs with deep layers and millions of parameters. We therefore propose a data augmentation method based on feature interest points, which alleviates over-fitting and effectively improves classification accuracy. The key idea is to find the interest points that attract the classifier, taken from the output of a middle CNN layer, locate the corresponding areas in the original images, and crop those areas out as augmented samples. All of this happens entirely within the training procedure; the method requires no additional models or extra parameters. Applied to the CUB200-2011, Stanford Dogs, and Aircraft datasets, the proposed augmentation achieved excellent performance, with an 11.32% improvement in classification accuracy. The experimental results show that the method mitigates over-fitting on fine-grained images.
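A minimal reading of the crop-around-interest-point step: find the spatial peak of a middle-layer activation map, map it back to image coordinates, and cut out the surrounding region as a new training sample (crop size and scale mapping are assumptions):

```python
import numpy as np

def crop_around_peak(image, feat_map, crop_frac=0.5):
    # feat_map: (h, w) activation map from a middle CNN layer for this
    # image; its argmax is taken as the classifier's interest point.
    H, W = image.shape[:2]
    h, w = feat_map.shape
    py, px = np.unravel_index(feat_map.argmax(), feat_map.shape)
    cy, cx = int(py * H / h), int(px * W / w)   # back to image coords
    ch, cw = int(H * crop_frac), int(W * crop_frac)
    y0 = int(np.clip(cy - ch // 2, 0, H - ch))
    x0 = int(np.clip(cx - cw // 2, 0, W - cw))
    return image[y0:y0 + ch, x0:x0 + cw]        # augmented sample
```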
Traditional hyperspectral image classification typically uses raw spectral signatures without considering spatial characteristics. In this paper, we propose a novel method for hyperspectral image classification based on morphological attribute profiles. We employ independent component analysis (ICA) for dimensionality reduction and design extended multi-attribute profiles (EMAPs) to extract spatial features in the ICA-induced subspaces. For accurate classification, we propose a Bayesian maximum a posteriori formulation that couples EMAP-based feature extraction for the class-conditional probability with an MRF-based prior. Experimental results show that the proposed method substantially outperforms traditional and state-of-the-art methods, tending to produce smoother classification maps with fewer erroneous outliers.
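The ICA dimensionality-reduction step might look like the following scikit-learn sketch (the number of components is illustrative); the EMAPs are then built on each resulting component image:

```python
import numpy as np
from sklearn.decomposition import FastICA

def ica_reduce(cube, n_components=8):
    # cube: (H, W, B) hyperspectral cube with B spectral bands
    H, W, B = cube.shape
    X = cube.reshape(-1, B)                        # one row per pixel
    Z = FastICA(n_components=n_components,
                random_state=0).fit_transform(X)
    return Z.reshape(H, W, n_components)           # ICA score images
```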
Computer-vision-based interaction between bare hands and virtual objects is a pressing problem in augmented reality and teleoperation, and bare-hand tracking is one of its key issues. This paper studies an effective hand-tracking method based on compressed sensing and multiple feature descriptors. First, a rectangular tracking window containing the hand is set manually in the initial frame, and key Haar and HOG (histogram of oriented gradients) feature values of that window are computed under compressed-sensing theory to initialize a classifier. For subsequent frames, positive and negative samples around the moving hand are captured, their Haar and HOG feature values are computed, and the classifier is updated; the candidate region that maximizes the classifier response is taken as the hand's target region in each frame. Simulation and real experiments demonstrate that the proposed method tracks the moving hand effectively, making it applicable to human-computer interaction, augmented reality, and teleoperation.
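In compressive tracking, the compressed-sensing ingredient is a very sparse random projection that maps high-dimensional Haar/HOG features to a low-dimensional vector for the classifier; a sketch of that projection (dimensions are placeholders):

```python
import numpy as np

rng = np.random.default_rng(0)

def sparse_measurement_matrix(n_feat, n_out=50, density=0.1):
    # Entries are -1/0/+1 with most entries zero, as in compressive
    # tracking; the same fixed matrix is reused for every frame.
    return rng.choice([-1.0, 0.0, 1.0], size=(n_out, n_feat),
                      p=[density / 2, 1.0 - density, density / 2])

R = sparse_measurement_matrix(n_feat=10_000)
compressed = R @ rng.random(10_000)   # low-dimensional feature vector
```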
A support vector classification method based on projection-vector boundary features is proposed. Drawing on statistical theory and the characteristics of the normal distribution in one-dimensional space, the algorithm introduces a new definition of the margin; an objective function is constructed in high-dimensional space, and solving it yields a projection line. After the training samples are projected onto this line, boundary vector sets are constructed in one-dimensional space and used to train a support vector machine (SVM). Experiments on two artificial data sets and a UCI standard data set show that the proposed method is effective.
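The train-on-boundary-vectors idea can be sketched as: project all samples onto the learned direction, keep the fraction of each class nearest the opposite class along that line, and fit the SVM only on those. The selection rule and class-to-side mapping below are assumptions:

```python
import numpy as np
from sklearn.svm import SVC

def boundary_subset(X, y, w, frac=0.2):
    # Project samples onto direction w; for each class keep the frac
    # of samples whose projections lie closest to the other class
    # (class 0 assumed on the negative side, class 1 on the positive).
    t = X @ w
    keep = []
    for cls, side in [(0, -1.0), (1, 1.0)]:
        idx = np.nonzero(y == cls)[0]
        order = np.argsort(side * t[idx])    # nearest the margin first
        keep.extend(idx[order[: max(1, int(frac * len(idx)))]])
    return np.asarray(keep)

# usage: keep = boundary_subset(X, y, w)
#        svm = SVC(kernel="rbf").fit(X[keep], y[keep])
```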
In this paper, we propose a new, effective, and robust framework for recognizing human actions from depth-map sequences. First, a 3D motion trail model (3DMTM) is extracted to represent the temporal motion information. Two effective heterogeneous features are then built on the 3DMTM to describe actions more comprehensively: computing Multilayer Histograms of Oriented Gradient (MHOG) on the 3DMTM yields 3DMTM-MHOG, which captures the local details of different actions, while combining Gist with the 3DMTM yields 3DMTM-Gist, which models the holistic structure of actions. Feature-level fusion merges the two descriptors into the final feature, and a support vector machine (SVM) performs the multi-class action recognition. Experimental results on a public depth action dataset (MSR Action3D) show that our method is superior to state-of-the-art methods.
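Feature-level fusion of the two descriptors is essentially normalize-and-concatenate before the SVM; a minimal sketch (the normalization choice is assumed):

```python
import numpy as np
from sklearn.svm import SVC

def fuse_and_train(mhog, gist, labels):
    # mhog, gist: (n_sequences, d1) and (n_sequences, d2) descriptors
    # computed on the 3DMTM. L2-normalize each, then concatenate.
    l2 = lambda a: a / (np.linalg.norm(a, axis=1, keepdims=True) + 1e-8)
    X = np.hstack([l2(mhog), l2(gist)])
    return SVC(kernel="linear").fit(X, labels)   # multi-class SVM
```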
This article addresses a number of issues in current robot object identification, with the goal of enabling robots to identify specific objects in home scenes and of verifying the approach's feasibility and practicality. Building on the SURF algorithm for local feature extraction and an SVM classifier for training, this paper applies PCA and a Bag-of-Visual-Words model to reduce the dimensionality of and cluster the extracted features, which eases SVM training while improving recognition accuracy and reducing computation time. Multi-view capture and image-pyramid segmentation are also used to handle occlusion and complex backgrounds. All experiments were performed on the Webots robotics development platform with the OpenCV library. Experimental results show that the method maintains real-time performance while ensuring recognition accuracy, demonstrating its feasibility and practical value.
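The SURF + Bag-of-Visual-Words front end can be sketched as follows; ORB stands in for SURF here because SURF lives in OpenCV's non-free contrib module, and the vocabulary size is illustrative:

```python
import numpy as np
import cv2
from sklearn.cluster import KMeans

def bovw_histograms(gray_images, k=200):
    detector = cv2.ORB_create()       # stand-in for SURF (non-free)
    per_img = []
    for img in gray_images:
        _, desc = detector.detectAndCompute(img, None)
        per_img.append(desc.astype(np.float32))  # assumes keypoints found
    codebook = KMeans(n_clusters=k, n_init=4).fit(np.vstack(per_img))
    # Histogram of visual-word occurrences per image -> PCA/SVM input
    return np.array([np.bincount(codebook.predict(d), minlength=k)
                     for d in per_img], dtype=float)
```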
A methodology for evaluating the accuracy of automated object recognition from sub-meter-resolution multispectral aerial images with a neural network is proposed. The methodology is applied to the detection of five land-cover classes from visible and infrared images using a multilevel convolutional neural network (CNN). Well-known classification-accuracy indicators are used: the confusion matrix and the Kappa coefficient. Analysis of the image-processing results shows that the recognized object boundaries are delineated with sufficiently high accuracy and that the classes are well separated. Testing confirmed the sufficiently high qualitative and quantitative indicators of the developed methodology (classification accuracy, sustainability, reproducibility).
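Both indicators are standard and available off the shelf; a minimal evaluation sketch:

```python
import numpy as np
from sklearn.metrics import confusion_matrix, cohen_kappa_score

def evaluate(y_true, y_pred):
    # y_true, y_pred: per-pixel labels for the five land-cover classes
    cm = confusion_matrix(y_true, y_pred)
    overall_accuracy = np.trace(cm) / cm.sum()
    kappa = cohen_kappa_score(y_true, y_pred)   # chance-corrected
    return cm, overall_accuracy, kappa
```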
Recognizing hand postures is of great significance, since people with hearing and speech disabilities use sign language as their main medium of communication. To overcome the low recognition rates caused by redundant features in traditional 3D hand posture recognition methods, this paper proposes a 3D hand posture recognition algorithm over spatial coordinates based on optimal feature selection, which innovatively incorporates the XGBoost method. Three main steps are involved: feature extraction, optimal feature selection, and posture recognition. First, self-defined attributes and features are extracted from 3D coordinate data collected by a Leap Motion Controller. Then, an XGBoost model combined with cross-validation selects the optimal features across the different attributes. Finally, the selected features, rather than all extracted features, are fed into a Gaussian naive Bayes classifier to recognize the target posture. The proposed method is tested on data sequences covering ten heavily used postures of Chinese Sign Language. The experimental results show that with optimal feature selection the method achieves a higher recognition rate than traditional methods while halving the number of training samples needed at the peak recognition rate.
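The select-then-classify pipeline maps directly onto common libraries; the number of retained features below is a placeholder:

```python
import numpy as np
from xgboost import XGBClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.model_selection import cross_val_score

def select_and_classify(X, y, top_k=20):
    # Rank features by XGBoost importance, keep the top_k, then train
    # the Gaussian naive Bayes recognizer on the reduced feature set.
    xgb = XGBClassifier(n_estimators=200, max_depth=4).fit(X, y)
    top = np.argsort(xgb.feature_importances_)[::-1][:top_k]
    score = cross_val_score(GaussianNB(), X[:, top], y, cv=5).mean()
    return top, GaussianNB().fit(X[:, top], y), score
```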
This article addresses automatic target recognition (ATR) in Synthetic Aperture Radar (SAR) images. By learning a hierarchy of features automatically from massive training data, learning networks such as Convolutional Neural Networks (CNNs) have recently achieved state-of-the-art results in many tasks. Moreover, unlike optical imaging, SAR imaging offers reduced sensitivity to weather conditions, day-and-night operation, and the ability to penetrate obstacles. Despite these advantages, several factors can degrade classification accuracy, such as errors in the pixel brightness values and in the geometry registered by the satellite sensors. To correct these errors, extract better features from SAR targets, and obtain better accuracy, a two-step algorithm called SAE-CNN-Recognizer (SCR) is proposed. First, a pre-processing step enhances the image with a Sparse Auto-Encoder (SAE) to emphasize image features for the subsequent analysis. Second, a CNN architecture performs feature extraction followed by classification with a softmax classifier. Experimental results on the Moving and Stationary Target Acquisition and Recognition (MSTAR) dataset show that this approach achieves an average accuracy above 97% when classifying targets into ten categories, higher than traditional CNN results.
Sign language is conveyed primarily through changes in hand posture, but traditional color-based detection methods can be confounded by complex backgrounds, skin tones, and other parts of the body. To overcome these problems, this article adopts an RGB-D based method to detect the gesture area in video. Key frames of the sign are then extracted adaptively according to changes in the gesture area, converting the problem into one of obtaining standard static gesture images. The identification results are sent to a NAO robot, completing the human-robot interaction. Experimental results show that combining color space with a depth threshold greatly reduces the influence of complex backgrounds and skin-colored regions, and that key-frame extraction provides a solid foundation for improving the hand gesture recognition rate.
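Combining a depth gate with a skin-color mask is the core of the RGB-D detection step; a minimal OpenCV sketch using classic YCrCb skin thresholds (the depth window is an assumption tied to the capture setup):

```python
import cv2
import numpy as np

def gesture_mask(bgr, depth_mm, d_near=400, d_far=900):
    # Skin color in YCrCb (classic Cr/Cb thresholds) ...
    ycrcb = cv2.cvtColor(bgr, cv2.COLOR_BGR2YCrCb)
    skin = cv2.inRange(ycrcb, (0, 133, 77), (255, 173, 127))
    # ... ANDed with a depth gate: the signing hand is assumed to be
    # the surface nearest the sensor, so background skin drops out.
    near = ((depth_mm > d_near) & (depth_mm < d_far)).astype(np.uint8) * 255
    return cv2.bitwise_and(skin, near)
```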
Batch learning is usually adopted for traditional SAR target identification, but in practice the training data of a system cannot be acquired all at once, and when a new training sample is added the batch method must retrain the whole system. To solve this problem, this paper uses the Cholesky factorization principle to extend the extreme learning machine to an incremental form and applies it to classifier training for SAR target identification. Moreover, to address the limited approximation capability of traditional single kernel functions, a multi-scale wavelet kernel is constructed to improve classification performance. Experimental results show that when a new SAR target sample is obtained, the algorithm only needs to update the output weights, without any retraining; it is extremely fast, and its identification rate is higher than those of the traditional kernel extreme learning machine, SVM, and related algorithms, making it a good choice for online updating of SAR target identification systems.
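The incremental flavor can be sketched with the regularized normal equations of the ELM output layer: adding a sample is a rank-one update of two accumulators, and only the output weights are re-solved. A true rank-one Cholesky update is omitted for brevity; here the factorization is simply recomputed:

```python
import numpy as np
from scipy.linalg import cho_factor, cho_solve

class IncrementalELM:
    def __init__(self, n_hidden, n_out, lam=1e-3):
        self.A = lam * np.eye(n_hidden)        # accumulates H^T H + lam*I
        self.B = np.zeros((n_hidden, n_out))   # accumulates H^T T

    def add_sample(self, h, t):
        # h: hidden-layer response of the new sample, t: one-hot target.
        # Hidden weights stay fixed; only these accumulators change.
        self.A += np.outer(h, h)
        self.B += np.outer(h, t)

    def output_weights(self):
        # Cholesky solve of the normal equations gives the new output
        # weights without retraining on the earlier samples.
        return cho_solve(cho_factor(self.A), self.B)
```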
Feature extraction based on a Gammatone filterbank is more robust than Mel-filterbank features for underwater acoustic recognition. However, both conventional auditory features represent only the energy-based amplitude of the signal, and their performance degrades in low-SNR underwater environments. Phase, represented by instantaneous frequency (IF), may also carry characteristics of the target. This paper proposes a novel fusion feature based on the outputs of Gammatone filters, together with an optimized algorithm for estimating instantaneous frequency. Experiments employ a Support Vector Machine (SVM) as the classifier, and the results indicate that significant performance gains can be obtained by adding instantaneous-frequency information under low-SNR conditions.
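A standard way to obtain instantaneous frequency from a filter output is through the analytic signal; the paper optimizes this estimate, but the baseline computation is:

```python
import numpy as np
from scipy.signal import hilbert

def instantaneous_frequency(x, fs):
    # x: one Gammatone channel's output, fs: sample rate in Hz.
    # Analytic signal -> unwrapped phase -> time derivative = IF.
    phase = np.unwrap(np.angle(hilbert(x)))
    return np.diff(phase) * fs / (2.0 * np.pi)   # Hz, length len(x)-1
```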
We propose a gait recognition method based on an optimized neural network, replacing the widely used gait energy image with the gait Gaussian image. An eight-layer convolutional neural network is built and initialized with the parameters of the well-trained AlexNet model, which speeds up convergence and effectively prevents over-fitting; compared with traditional methods, training time is shortened while the model's expressive power is increased. The gait Gaussian images of human motion are then used to train the optimized network and update its parameters; training with gait Gaussian images yields a better representation than traditional training with gait energy images. To our knowledge, this is the first application of a gait-Gaussian-image-based neural network to gait recognition. Extensive experiments show satisfactory recognition results, especially when subjects are carrying objects or wearing coats. The experimental results confirm that the optimized network speeds up model training, that the optimization strategy avoids over-fitting, and that the gait Gaussian image makes the model better than previous ones.
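Initializing with pretrained AlexNet weights and swapping the output head is a two-liner in recent torchvision (the class count is a placeholder; older torchvision versions use pretrained=True instead of the weights argument):

```python
import torch.nn as nn
from torchvision import models

n_classes = 124   # placeholder: number of gait identities in the data
net = models.alexnet(weights=models.AlexNet_Weights.DEFAULT)
net.classifier[6] = nn.Linear(4096, n_classes)   # new gait output head
# Fine-tune on gait Gaussian images: pretrained layers typically take
# a smaller learning rate than the freshly initialized head.
```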
For 3D object recognition, a discriminative point-cloud descriptor is required to represent each object. Existing global descriptors encode the whole object into one vector but are sensitive to occlusion; local descriptors encode only a small neighborhood of a keypoint and are more robust to occlusion, yet many objects share similar local surfaces. This paper presents a novel hybrid method that segments a point cloud into multiple subparts to overcome both shortcomings. In the offline training stage, a model library is built that integrates both the global and local surfaces of partial point clouds. In the online recognition stage, scene objects are represented by their subparts, and a voting scheme performs the recognition. Experimental results on public datasets show that the proposed method improves recognition performance significantly compared with conventional global and local descriptors.
This paper proposes a system that recognizes strawberries among other plants using convolutional neural networks (CNNs). The architecture of the proposed CNN comprises a number of convolutional layers. The proposed CNN identifies strawberry leaves among other leaves with 95.2% accuracy, which compares favorably with the recent literature. The article also analyzes how factors such as kernel size, number of feature maps, and number of CNN layers affect the accuracy.
Under sea-sky backgrounds, island-shore backgrounds, and other complex conditions, the recognition rate and false-alarm rate of existing ship target recognition systems based on a single wide-band infrared image suffer. To solve these problems, this paper studies ship target recognition based on multi-spectral infrared images. An image data set of five medium-wave infrared bands was collected, and a sample set was constructed by manually annotating the multi-spectral images. First, dense SIFT features were extracted from each infrared image. Second, PCA was applied to each SIFT feature, reducing its dimensionality from 128 to 64. The spatial and spectral position information of each SIFT feature was then integrated into its feature vector. Based on a Gaussian mixture model, the feature vectors of the multi-spectral images were encoded into a Fisher vector representing the target. Finally, a linear SVM classifier was applied to the Fisher vector to recognize the target. Experimental results show that recognition based on multi-spectral infrared images is more accurate than that based on a single spectral band; the recognition rate of the proposed algorithm reaches 0.97. This work thus provides a new method for ship target recognition from multi-spectral infrared images.
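The descriptor-preparation stage (PCA reduction plus appending spatial and spectral position) is compact enough to sketch; shapes and normalization are assumptions:

```python
import numpy as np
from sklearn.decomposition import PCA

def prepare_descriptors(desc, xy, band, pca=None):
    # desc: (N, 128) dense SIFT descriptors; xy: (N, 2) positions
    # normalized to [0, 1]; band: (N,) spectral band index per feature.
    if pca is None:
        pca = PCA(n_components=64).fit(desc)     # 128 -> 64 dimensions
    reduced = pca.transform(desc)
    # Append spatial and spectral position before the GMM / Fisher
    # vector encoding stage.
    return np.hstack([reduced, xy, band[:, None]]), pca
```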
Unimodal analysis of finger-vein (FV) and finger dorsal texture (FDT) has been investigated intensively for personal recognition, but it is not robust to segmentation errors and noise. Motivated by the distribution traits of FV and FDT within a finger, we present a multimodal recognition method, weighted sparse fusion for identification (WSFI), which fuses FV and FDT images at the pixel level. First, a fused test sample is obtained as a per-pixel weighted sum of the FV and FDT images, with weights computed from the reconstruction error of each FV and FDT pixel; a dictionary associated with the fused test sample is constructed in the same manner. Second, for every fused test sample and its associated dictionary, sparse representation based classification (SRC) is applied for recognition. Experiments show that, compared with state-of-the-art techniques, our method achieves significant improvements in accuracy rate (AR) and equal error rate (EER).
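A plausible reading of the per-pixel weighted fusion (the inverse-error weighting below is an assumption; the paper derives weights from per-pixel reconstruction errors):

```python
import numpy as np

def weighted_fusion(fv, fdt, err_fv, err_fdt, eps=1e-8):
    # fv, fdt: aligned finger-vein and finger-dorsal-texture images;
    # err_*: per-pixel sparse reconstruction errors of each modality.
    w_fv, w_fdt = 1.0 / (err_fv + eps), 1.0 / (err_fdt + eps)
    # Smaller reconstruction error -> larger weight in the fused sample
    return (w_fv * fv + w_fdt * fdt) / (w_fv + w_fdt)
```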
A color-model-based control scheme for a nonlinear system with significant lighting-disturbance effects in an image processing problem is proposed. First, a design methodology based on Lyapunov analysis is presented. Second, the scheme combines an adaptive control part, a neural controller that accounts for error effects, with a supervisory control part that enhances robustness against LED lighting disturbances and image-model uncertainties. Third, an effective supervised adaptive control theory is applied to the image identification problem. Experimental results with a Kinect image sensor on a practical marker identification system show that the proposed image identification technique performs excellently compared with the traditional image processing method, and that the photoresistor feed-forward term provides additional improvement in image identification.
Background information is full of essential clues for distinguishing the foreground of an image, especially when images contain multiple targets or complex backgrounds. In this paper, we formulate saliency detection as a labelling problem and propose a novel saliency detection method that fuses a set of background-based features. We first extract background features referred to as the uniqueness, dense, and sparse features. Specifically, the uniqueness feature is defined by color distinction and spatial distance based on the K-means algorithm; the dense feature of the background segments is calculated with the PCA algorithm; and the sparse feature is computed with a sparse-coding algorithm. These background features are then fused within a CRF framework. Finally, we evaluate the proposed method on a new dataset constructed from the THUS10000, SOD, and ECSSD datasets to cover different scenarios. The experimental results show that our method compares well against previous methods in terms of precision and recall.
Pedestrian detection is an important application of computer vision. Owing to uneven illumination, severe occlusion, low-quality images, unusual postures, and other factors, pedestrian detection suffers from low accuracy in complex scenes. This paper studies a pedestrian detection algorithm based on a deep convolutional neural network. Since shorter connections between input and output layers help build deeper, more efficient CNNs, a densely connected convolution structure is introduced to optimize the Deconvolutional Single Shot Detector, improving feature utilization and reducing the number of network parameters. Meanwhile, augmenting the context information improves detection of small pedestrians. Initial experimental results show that the proposed algorithm raises detection accuracy to 87.84% at 12.3 fps on a low-resolution (64×128) pedestrian dataset, outperforming the reference algorithms.
To address target loss under occlusion in a variety of Correlation Filter based trackers, an improved tracking algorithm based on occlusion awareness and a target re-detection mechanism is proposed. The occlusion-awareness module evaluates whether the tracked object is occluded and whether the tracking result is reliable. When events such as occlusion cause tracking failure, the re-detection module is triggered to re-detect the original target using an integral map of pixel-wise object confidence derived from color information. Furthermore, when the tracking quality is unreliable and no reliable object is re-detected, the tracking model is not updated. Experiments show that the proposed algorithm effectively avoids the loss of the tracked object and the model drift caused by occlusion that afflict Correlation Filter variants, and its tracking performance is clearly improved over several state-of-the-art Correlation Filter trackers.
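The re-detection step rests on integral images: once a pixel-wise color-confidence map is built, the sum of any window is an O(1) lookup, so an exhaustive search for the most confident window stays cheap. A sketch:

```python
import numpy as np

def redetect(conf, wh, ww):
    # conf: pixel-wise object-color confidence map; (wh, ww): the
    # window size of the lost target. The integral image makes every
    # window sum a constant-time lookup.
    ii = np.pad(conf, ((1, 0), (1, 0))).cumsum(0).cumsum(1)
    sums = ii[wh:, ww:] - ii[:-wh, ww:] - ii[wh:, :-ww] + ii[:-wh, :-ww]
    y, x = np.unravel_index(sums.argmax(), sums.shape)
    return y, x   # top-left corner of the re-detected target window
```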
Object detection is one of the most important issues in remote sensing analysis. The lack of semantic information about objects makes it difficult for traditional methods to find effective features for object discrimination. Thanks to their feature-extraction capability, a series of region-based convolutional neural networks (R-CNNs) have recently been applied widely and successfully to object detection in natural images. However, most of them detect small-sized targets poorly, so few can be used directly for small-object detection in remote sensing images. This paper proposes a modified method based on Faster R-CNN, composed of a feature extraction network, a region proposal network, and an object detection network. Compared with Faster R-CNN, the proposed feature extraction network removes the fourth pooling layer and employs dilated convolutions in all subsequent convolutional layers to enhance the resolution of the final feature maps, providing more detailed and semantic feature information that helps detect objects, especially small ones. In the object detection network, contextual features around the region proposals are added as complementary information to help distinguish objects accurately. Experiments on two data sets verify that the proposed method achieves superior performance on small-object detection in remote sensing images.
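The resolution-preserving substitution can be seen in isolation: a 3×3 convolution with dilation 2 covers a 5×5 receptive field without shrinking the feature map, which is what replaces the removed pooling stage. Channel counts and sizes below are illustrative:

```python
import torch
import torch.nn as nn

# Padding equal to the dilation keeps the spatial size unchanged while
# the dilated 3x3 kernel sees a 5x5 neighborhood.
conv = nn.Conv2d(512, 512, kernel_size=3, dilation=2, padding=2)
x = torch.randn(1, 512, 38, 38)
print(conv(x).shape)   # torch.Size([1, 512, 38, 38]): resolution kept
```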
This paper proposes a lane detection algorithm based on a spline model for aerial video of freeways, consisting of three stages: image segmentation, clustering of lane feature points, and lane-model parameter estimation. First, segmentation is based on lane characteristics such as color, width, and shape. For clustering the lane feature points, spectral clustering groups the effective feature points, with the similarity matrix constructed from the line spacing. Lane-model selection and parameter estimation proceed in three steps: 1) a cubic B-spline curve represents the lane, which is more accurate and extends farther along the road; 2) the model parameters are estimated with an improved RANSAC algorithm; 3) a Kalman filter corrects and predicts the lane parameters. Experiments demonstrate that the proposed method detects the model parameters of every lane in aerial freeway video with high stability and high detection accuracy.
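Fitting one lane cluster with a cubic B-spline is a one-call affair in SciPy; in a RANSAC loop this fit would simply be repeated on random subsets and scored by inlier count (the smoothing factor is a guess):

```python
import numpy as np
from scipy.interpolate import splprep, splev

def fit_lane(points, n_samples=100):
    # points: (N, 2) feature points of one lane cluster, N >= 4
    pts = points[np.argsort(points[:, 1])]       # order along the road
    tck, _ = splprep([pts[:, 0], pts[:, 1]], k=3, s=float(len(pts)))
    x, y = splev(np.linspace(0.0, 1.0, n_samples), tck)
    return np.stack([x, y], axis=1)   # smooth cubic B-spline lane curve
```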
We propose a novel method for improving algorithms that detect the presence of people in video sequences, focusing on applications that must report and analyze all scenes with detected people in long recordings. One target quality of the classification result is therefore its stability, understood as a low number of invalid scene boundaries. Many existing methods process each image in a recording separately; the proposed method builds on the observation that real-life videos depict underlying continuous processes. The method, named FSA (Frame Sequence Analyzed), is applicable to any underlying binary classification algorithm, improving it through an additional result-postprocessing step. The experiments improve an established face detection algorithm, evaluated on a public dataset. The effectiveness of the FSA method is verified with very good results, improving the underlying algorithm on all considered error measures. Possible future improvements are discussed at the end.
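The abstract does not detail FSA's postprocessing, but its flavor, exploiting temporal continuity to suppress spurious scene boundaries, can be illustrated with simple morphological smoothing of the per-frame labels:

```python
import numpy as np
from scipy.ndimage import binary_closing, binary_opening

def smooth_detections(per_frame, min_len=5):
    # per_frame: raw binary people/no-people labels, one per frame.
    # Closing bridges short detection gaps; opening removes isolated
    # spurious scenes, reducing invalid scene boundaries overall.
    s = np.asarray(per_frame, dtype=bool)
    s = binary_closing(s, structure=np.ones(min_len, dtype=bool))
    s = binary_opening(s, structure=np.ones(min_len, dtype=bool))
    return s
```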
Millimeter-wave imaging technology is used for human-body security inspection in public spaces; compared with traditional X-ray imaging, it is more efficient and harmless to the human body, so automatic detection of dangerous objects in millimeter-wave images can save substantial human labor. Due to technological limitations, however, millimeter-wave images usually have low resolution and high noise, making dangerous objects hidden on the body hard to find; detection speed also matters greatly in practice. This paper proposes an efficient dangerous-object detection method for millimeter-wave images based on a single unified CNN (Convolutional Neural Network). Unlike traditional region-based methods such as R-CNN, it sets default anchors of different aspect ratios over the last feature map, framing object detection as regression to these anchors while predicting class probabilities. The model reaches 70.9 mAP at 50 frames per second on a millimeter-wave image dataset, outperforming the compared method and showing promise for practical use.
A single-pulse operating mode, still used in a certain type of fire-control radar, makes effective detection and tracking of high-speed small targets difficult. Based on coherent accumulation detection theory, this paper proposes pulse accumulation detection to improve the target detection performance of the fire-control radar. Using the radar equation, the performance improvement is analyzed and calculated. The results show that the proposed method effectively extends the detection range and the warning time for high-speed small targets.
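The expected benefit follows directly from coherent-integration theory and the radar equation: integrating N pulses coherently multiplies SNR by N, and since received SNR falls as the fourth power of range, detection range grows as N^(1/4). A worked example with an assumed pulse count:

```python
import numpy as np

N = 64                         # pulses accumulated coherently (assumed)
gain_db = 10 * np.log10(N)     # SNR gain of coherent integration, ~18.1 dB
range_factor = N ** 0.25       # radar equation: R_max scales as SNR^(1/4)
print(f"{gain_db:.1f} dB gain -> ~{range_factor:.2f}x detection range")
```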
Videos taken with a hand-held camera easily contain both motion and jitter, which can cause many false detections and poor performance in moving-object detection. In this paper, we propose a moving-object detection algorithm adapted to hand-held camera videos. The algorithm uses optical flow to perform motion estimation and motion compensation on the videos, reducing the interference caused by the hand-held camera, and then builds a background model to detect the moving objects. The proposed algorithm is verified on hand-held camera videos and compared with several state-of-the-art algorithms; experimental results demonstrate that it is effective for moving-object detection in such videos.
Vehicle detection remains a challenging task for intelligent vehicle platforms; the main difficulties are real-time requirements together with changes in vehicle pose, illumination conditions, and occlusion levels. To handle these difficulties, a new vehicle detection algorithm is proposed. A region of interest is obtained with an improved geometric-constraints algorithm, and integral images are then used to accelerate feature extraction within that region. Finally, multi-feature fusion is performed over the confidence scores of Gentle AdaBoost classifiers trained on Haar-like, HOG, and LBP features respectively; in the testing phase, the three confidence scores of the classifiers determine the classification result. Experimental results show that the proposed method effectively reduces detection time and improves vehicle-detection accuracy.
The Single Shot MultiBox Detector (SSD) is among the fastest algorithms in object detection, using a fully convolutional neural network to detect objects of all scales in an image. The Deconvolutional Single Shot Detector (DSSD) introduces more context information by adding a deconvolution module to SSD, raising mean Average Precision (mAP) on PASCAL VOC2007 from SSD's 77.5% to 78.6%. Although DSSD gains 1.1 points of mAP over SSD, its speed drops from 46 to 11.8 frames per second (FPS). In this paper, we propose a single-stage, end-to-end image detection model called ESSD to overcome this dilemma. Our solution extends better context information to the shallow layers of the best single-stage detectors (e.g., SSD). Experimental results show that our model reaches 79.4% mAP, higher than DSSD and SSD by 0.8 and 1.9 points respectively. For 300×300 input, the testing speed is 25 FPS on a single Nvidia Titan X GPU, which exceeds the original execution speed of DSSD.
The application of deep learning in traditional industries has not gained much attention, yet deep learning has great potential to be transplanted to other fields. We apply two deep-learning techniques, object detection and tracking, to dynamic object counting and test them on a basic problem in the steel industry: rebar counting. An infrared camera was used to collect video of rebar on site so that the rebar stands out clearly from the background, and counting is performed on this video in two stages, detection and tracking. We improved the SSD model to satisfy the accuracy and speed demands of detection, and use KCF for tracking. Because the rebar objects in the video are scale-invariant, we reduced the number of feature maps and anchors and gained a considerable speed-up without worsening the accuracy. To eliminate errors caused by the vibration of the conveyor belt, we improved the tracking algorithm, with satisfactory results. The proposed object counting system is not limited to rebar counting and can be transplanted to other fields.
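The tracking half of the pipeline pairs naturally with OpenCV's KCF implementation (requires the contrib build); detection boxes seed trackers, and each newly seeded tracker increments the count once. A minimal sketch of following one detected rebar:

```python
import cv2

def track_rebar(frames, init_box):
    # init_box: (x, y, w, h) of one rebar end face from the SSD detector
    tracker = cv2.TrackerKCF_create()
    tracker.init(frames[0], tuple(init_box))
    boxes = [init_box]
    for frame in frames[1:]:
        ok, box = tracker.update(frame)
        if not ok:
            break                # track lost: hand back to the detector
        boxes.append(box)
    return boxes
```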
Lane detection is one of the most important parts of an ADAS. This paper designs and implements an embedded system based on the TDA2EG, and proposes a fast and effective lane detection algorithm suited to the embedded platform. The algorithm comprises the following steps: contrast enhancement, color-space transformation, lane-line feature extraction and matching, and least-squares fitting. Experimental results show that, compared with traditional edge-detection methods, the algorithm performs better while guaranteeing the accuracy and stability of the detection.
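The final least-squares step fits each lane line to the extracted feature points; parameterizing x as a function of y suits the near-vertical lines seen by a forward camera:

```python
import numpy as np

def fit_lane_line(xs, ys):
    # Least-squares fit of x = a*y + b through the extracted lane-line
    # feature points (xs, ys), solved via the normal equations.
    A = np.stack([ys, np.ones_like(ys)], axis=1)
    (a, b), *_ = np.linalg.lstsq(A, xs, rcond=None)
    return a, b
```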
The mainstream detection methods Faster R-CNN and SSD are designed mainly for general datasets; they do not emphasize small-target detection and cannot achieve high average detection accuracy on such targets. To overcome this problem, we present a target detection method based on a hierarchical, multi-scale convolutional neural network aimed at detecting maritime targets in complex scenarios. To enhance small-target detection, proposals of different scales are extracted from multi-resolution convolutional feature maps in the region proposal network. To further improve detection accuracy, an object detection network is added: high-resolution convolutional feature maps are used to extract the targets, and an upsampling layer enhances the resolution of the feature maps. The region proposal network and object detection network are then combined to detect targets accurately. Experimental results demonstrate that the proposed method performs well on a maritime-target dataset, with detection accuracy surpassing the mainstream detection methods.
This paper designs a moving target tracking system. On the hardware side, a tracking platform is built from a visible-light CCD camera, a servo pan-tilt-zoom (PTZ) unit, an FPGA control card and a PC; the platform handles target image acquisition, image transmission, PTZ control and algorithm processing. On the algorithm side, to address the poor anti-background-interference ability of the Mean Shift (MS) algorithm, a target modeling method is proposed that fuses HSV color features and edge orientation features for the Mean Shift iteration. The improved algorithm is tested on the DARPA Egtest01 sequences; experimental results show that it tracks well even when the target color is similar to the background. Finally, the improved algorithm is implemented on the hardware platform, and real-time automatic tracking experiments show that the system obtains satisfactory tracking results.
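The color half of such a target model can be sketched with OpenCV's hue-histogram Mean Shift; the fused edge-orientation feature is omitted, and the initial window and file name are assumptions:

    import cv2

    cap = cv2.VideoCapture("sequence.avi")             # hypothetical input
    ok, frame = cap.read()
    x, y, w, h = 200, 150, 60, 60                      # initial target window
    roi = cv2.cvtColor(frame[y:y+h, x:x+w], cv2.COLOR_BGR2HSV)
    hist = cv2.calcHist([roi], [0], None, [180], [0, 180])   # hue target model
    cv2.normalize(hist, hist, 0, 255, cv2.NORM_MINMAX)
    term = (cv2.TERM_CRITERIA_EPS | cv2.TERM_CRITERIA_COUNT, 10, 1)
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        hsv = cv2.cvtColor(frame, cv2.COLOR_BGR2HSV)
        back = cv2.calcBackProject([hsv], [0], hist, [0, 180], 1)
        ret, (x, y, w, h) = cv2.meanShift(back, (x, y, w, h), term)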
In contrast to detection against sky or sea backgrounds, infrared small target detection against a near-earth background has its own particularity and complexity. This paper proposes an infrared image preprocessing algorithm that combines dark-image processing with an improved K-SVD algorithm. First, an infrared image model for the near-earth background is constructed. Because infrared images share the low-contrast characteristic of foggy images, we treat infrared images analogously to foggy images, so that image dehazing theory can be employed. In this preprocessing algorithm, near-earth background suppression is achieved by the dark-image processing method. After background suppression, an improved K-SVD algorithm based on the NLM algorithm is applied for denoising: considering the correlation between different image blocks and the orthogonality between the post-denoising residual and the chosen atoms, a regularized constraint representing image self-similarity is introduced into K-SVD. Experiments show that the proposed algorithm effectively suppresses the near-earth background, enhances the contrast between the target and its surroundings, and improves infrared image preprocessing performance.
Visual saliency prediction has gained significant popularity in recent years, but most research addresses static saliency. This paper proposes an approach to detect dynamic saliency in videos using spatial-temporal fusion. Spatial saliency is detected by a trained convolutional neural network in which some layers use larger convolutional kernels, since saliency is influenced by global contrast according to visual psychology. Temporal saliency is extracted by optical flow combined with K-means clustering, which yields a more accurate result. The two maps are then fused with optimal weights. Experiments on the DIEM dataset show our model outperforming four other dynamic saliency models on two metrics.
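The temporal branch can be approximated as below; the trained CNN spatial branch and the optimal weighted fusion are omitted, and the frame files are assumed inputs:

    import cv2
    import numpy as np
    from sklearn.cluster import KMeans

    prev = cv2.imread("frame0.png", cv2.IMREAD_GRAYSCALE)
    curr = cv2.imread("frame1.png", cv2.IMREAD_GRAYSCALE)
    flow = cv2.calcOpticalFlowFarneback(prev, curr, None,
                                        0.5, 3, 15, 3, 5, 1.2, 0)
    mag = np.linalg.norm(flow, axis=2).reshape(-1, 1)     # motion magnitude
    labels = KMeans(n_clusters=2, n_init=10).fit_predict(mag)
    # take the cluster with the larger mean magnitude as salient motion
    salient_cluster = np.argmax([mag[labels == k].mean() for k in (0, 1)])
    temporal_saliency = (labels == salient_cluster).reshape(prev.shape)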
Recently, object detection has been widely used in power systems to assist fault diagnosis of transmission lines. However, it still faces great challenges because a single inspection image contains targets of multiple sizes. Current state-of-the-art object detection pipelines such as Faster R-CNN perform well on large objects at low resolution, but usually fail to detect small objects due to low resolution and poor feature representation. Many existing detectors address this with feature pyramids or multi-scale image inputs, which attain high accuracy but are computationally and memory intensive. In this paper, we propose an improved cascaded Faster R-CNN framework that reduces the computational cost while maintaining high detection accuracy for multi-size object detection in high-resolution inspection images: the first-stage Faster R-CNN detects large objects, while the second detects small objects relative to the large ones. We further merge the two stages into a single network by sharing convolutional features; exploiting the semantic context between multi-size targets, the first stage tells the second where to look. For the "tell" step, we simply map the bounding box coordinates of large objects detected in the first stage onto the VGG16 feature maps, crop the corresponding regions, and feed them to the second stage. Experiments on the test datasets demonstrate that our method achieves a higher detection mAP of 87.6% at 5 FPS on an NVIDIA Titan X compared with the one-stage Faster R-CNN.
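The coordinate mapping of the "tell" step reduces to dividing box coordinates by the network's total stride (16 for VGG16's conv5 features); a minimal sketch, with the stride as the only assumption:

    # Map an image-space box (x1, y1, x2, y2) onto feature-map coordinates.
    def map_box_to_feature(box, stride=16):          # VGG16 conv5 stride
        x1, y1, x2, y2 = box
        return (x1 // stride, y1 // stride,
                -(-x2 // stride), -(-y2 // stride))  # ceil right/bottom edges

    fx1, fy1, fx2, fy2 = map_box_to_feature((320, 160, 800, 640))
    # crop = feature_map[:, fy1:fy2, fx1:fx2]        # fed to the second stage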
Robust hand detection and classification is one of the most essential tasks in sign language recognition, but it is very challenging due to the complexity of hands in sign language. The performance of existing approaches is easily affected by the numerous variations of sign language gestures, the small and unobtrusive hand areas, and constantly changing hand locations. To robustly detect and classify hands in sign language, which are small objects rich in information, we propose an improved Faster R-CNN approach, Multi-scale Faster R-CNN. Our approach extends the Faster R-CNN framework with a multi-scale strategy that incorporates hierarchical convolutional feature maps. We evaluate the approach on a self-built sign language dataset, and the experimental results demonstrate its effectiveness.
This paper focuses on estimating the fundamental matrix with unknown radial distortion. The standard approach is the Gröbner basis method, which solves nontrivial polynomial equations formed from pairs of correspondences under the one-parameter division model for radial distortion; this formulation is nonconvex and not noise-resistant. Using results from polynomial optimization and rank minimization, this paper shows that the problem can be solved as a sequence of convex semidefinite programs. Experiments show that the proposed method works well and is more noise-resistant.
The main contribution of this article is to address detection across large scale differences. For small targets, the low-level features of the convolutional network are used to construct hyper-feature maps that achieve better detection and recognition. For larger targets, dilated convolutions integrate context information at different scales into the high-level features according to different receptive fields. We adopt the lightweight convolutional network SqueezeNet as the base feature network; it is small, fast to train and expressive. On a single Titan X GPU, the distribution of the target dataset can be better learned by increasing the batch size during training. After pre-training on the VOC dataset, transfer training was carried out on a remote sensing image dataset, and detection of the 12 target classes reached an mAP of 0.937205, a strong detection result.
Fabric defect detection, a popular topic in automation, is a necessary and essential step of quality control in the textile manufacturing industry. Machine learning algorithms such as deep learning typically require a large number of training samples for fabric defect detection; however, as production technology has advanced, the fabric defect rate has greatly decreased. An algorithm called Bayesian Small Sample Learning (BSSL), based on Naive Bayes, is proposed to address this lack of training samples. First, noise is removed from the images collected on the experimental platform. Next, reference values are obtained by learning from a few samples of different defective and defect-free fabrics. Finally, feature values are extracted from the fabric under inspection, and the Bayesian algorithm computes the posterior probability of the reference values given the feature values; the fabric is judged defective or not by the maximum posterior probability. Experimental results show that BSSL requires few defective samples for learning and achieves high detection accuracy.
In infrared scenes, the locally adaptive regression kernels (LARK) feature has the advantage of being sensitive to small structural changes. In the continuously adaptive mean shift (CamShift) algorithm for single target tracking, the probability distribution of the target is weighted by the similarity of global feature matching, which weakens background interference. To robustly track infrared targets undergoing shape changes, global matching is turned into local statistical matching according to the invariance of the target's local structure, with the number of similar characteristics in the area around a point used as the weight.
In this paper, a multispectral salient object detection algorithm based on the frequency domain is proposed, which has the advantages of efficiency and simplicity. The principle of frequency-domain saliency detection is first studied, and its advantages for detecting abnormal targets are demonstrated. The quaternion Fourier transform is then used to extract salient information from spectral features in two spectral intervals, and the saliency maps of the two intervals are fused using PCA. An adaptive threshold is used to segment the image and highlight salient objects. Finally, we collect multispectral images from a spectral system built with an AOTF and a near-infrared camera, and test both original and noisy images. The results show that the algorithm is efficient and robust.
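As a single-channel analogue of the frequency-domain principle (the paper's quaternion transform over two spectral bands and the PCA fusion are not reproduced here), a spectral-residual style saliency map can be sketched as:

    import cv2
    import numpy as np

    img = cv2.imread("band1.png", cv2.IMREAD_GRAYSCALE).astype(np.float32)
    f = np.fft.fft2(img)
    log_amp = np.log(np.abs(f) + 1e-8)
    phase = np.angle(f)
    residual = log_amp - cv2.blur(log_amp, (3, 3))      # spectral residual
    sal = np.abs(np.fft.ifft2(np.exp(residual + 1j * phase))) ** 2
    sal = cv2.GaussianBlur(sal, (9, 9), 2.5)
    mask = sal > 3.0 * sal.mean()                       # adaptive threshold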
Moving object detection and background estimation are important steps in numerous computer vision applications. Methods based on low-rank and sparse representation have attracted wide attention in background modeling, but many existing methods ignore the spatio-temporal information of the foreground. In this paper, a new low-rank and sparse representation model for moving object detection is proposed, in which the image sequence is regarded as the sum of a low-rank static background matrix, a sparse foreground matrix and a sparser dynamic background matrix. A 3D total variation regularizer and a weighted nonconvex nuclear norm are incorporated to refine the model. Extensive experiments on challenging datasets demonstrate that our method works effectively and outperforms many state-of-the-art approaches.
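The two primitive operators behind such decompositions are soft thresholding (for the sparse term) and singular-value thresholding (for the low-rank background); the paper's 3D total variation and weighted nonconvex nuclear norm go beyond this sketch:

    import numpy as np

    def shrink(M, tau):
        """Soft thresholding: proximal operator of the L1 norm."""
        return np.sign(M) * np.maximum(np.abs(M) - tau, 0.0)

    def svt(M, tau):
        """Singular-value thresholding: proximal operator of the nuclear norm."""
        U, s, Vt = np.linalg.svd(M, full_matrices=False)
        return U @ np.diag(shrink(s, tau)) @ Vt

    # With D holding one vectorized frame per column, one alternating
    # step might read: B = svt(D - F, tau); F = shrink(D - B, lam)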
Autonomous cars establish driving strategies by detecting the road. Most previous methods detect the road with image semantic segmentation, which assigns pixel-wise class labels and predicts segmentation masks. We propose U-net, a segmentation network that learns deep convolution and deconvolution features. The architecture consists of an encoder and a decoder network: the trainable encoder is followed by a corresponding decoder and a pixel-wise classification layer, and the encoder is topologically identical to a stack of convolutional layers. The novelty of U-net lies in the manner in which the decoder deconvolves its lower-resolution input feature maps. Specifically, the decoder conjoins encoder convolution features and decoder deconvolution features using a "concat" operation, which achieves a good mapping between classes and filters on the expansive side of the network. The network is trained end-to-end and yields precise pixel-wise predictions at the original input resolution.
Facial expression reenactment forgery (FERF) is a complicated and meticulous type of video tampering compared with simpler types such as copy-paste of frames or objects. At its best, FERF can make a target actor's facial expressions follow the source actor's in real time. Existing video tampering detection methods aim at simple tampering types, such as intra-frame or inter-frame forgery, and are of little use for detecting FERF. In this paper, a novel video forgery detection method for FERF is proposed. Through careful analysis of the general process of FERF, abnormal subtle changes in the facial region are exposed and used to verify the authenticity of videos. Moment features of detail wavelet transform coefficients and optical-flow features of the videos are combined into feature vectors fed to a Support Vector Machine (SVM) for classifying original versus forged videos. The experimental results show that the proposed method is effective in detecting FERF; we also compare our results with previous popular copy-paste forgery detection algorithms.
Varying illumination is a tricky issue in face recognition. In this paper, we improve the logarithmic total variation (LTV) algorithm to handle varying illumination in face images. First, LTV is adopted to separate a face image into high-frequency and low-frequency features. Then, a novel illumination normalization method based on contrast limited adaptive histogram equalization (CLAHE) is proposed to handle the low-frequency feature, and threshold-value filtering is used to enhance the high-frequency feature. Finally, the normalized face image is formed from the normalized low-frequency feature and the enhanced high-frequency feature. Comparative experiments on the YALE B database, covering three types of techniques, show that the proposed CLAHE-and-threshold LTV algorithm achieves excellent recognition performance compared with other state-of-the-art algorithms.
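The low-frequency branch reduces to OpenCV's CLAHE; the LTV decomposition and the high-frequency threshold filtering are omitted, and the clip limit and tile size below are assumed values:

    import cv2

    low = cv2.imread("low_freq.png", cv2.IMREAD_GRAYSCALE)  # illumination part
    clahe = cv2.createCLAHE(clipLimit=2.0, tileGridSize=(8, 8))
    normalized = clahe.apply(low)       # illumination-normalized component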
Binary gradient pattern (BGP) is a concise and efficient descriptor for face recognition that is robust to lighting, expression and occlusion, and it has achieved remarkable results in face recognition applications. However, BGP is a universal operator that does not reflect the particularities of the human face. Drawing on the way humans recognize faces mainly by facial features (eyes, nose and mouth), this paper proposes a face recognition method based on this heuristic information. The method first determines the approximate locations of these facial features according to human experience. The BGP operator is then used to extract features, with the face divided into sub-blocks to obtain a histogram feature per sub-block. Finally, the histogram features corresponding to the facial feature positions are weighted. The method is fully validated on the Yale and ORL databases; compared with the original BGP method, recognition accuracy and robustness are significantly improved.
The standard pipeline in pedestrian detection slides a pedestrian model over an image feature pyramid to detect pedestrians at different scales. In this pipeline, constructing the feature pyramid is time consuming and becomes the bottleneck for fast detection. Recently, multiresolution filtered channels (MRFC) was proposed, which uses only single-scale feature maps to achieve fast detection. However, because MRFC uses grid-wise sampling in feature extraction, the receptive field correspondence across scales is weak, which limits its accuracy. In this paper, we propose a method that also uses single-scale feature maps; the main difference from MRFC lies in feature extraction. Instead of grid-wise sampling, we use scale-aware pooling, which yields better receptive field correspondence. Experiments on the Caltech dataset show that our detector achieves fast detection speed together with high accuracy.
Face recognition can be used to observe students' learning behaviors in class, which is useful for both teaching quality estimation and individualized teaching; exact face detection is the first and necessary task in such an application. The real setting of a classroom makes this challenging: the particular positions of the cameras lead to varied poses and severe occlusion, problems that also occur in other indoor surveillance settings such as large gatherings. In this paper, a forehead-based face detection model for such environments is proposed. The key idea is to obtain faces by detecting the forehead area, which sits relatively high on the face and is rich in shape, color and texture information, instead of using commonly used landmarks. The method consists of a first classifier based on an extended Haar-like feature and a second classifier based on a color feature called the Multi-Channel-Color-Frequency Feature (MCCFF); for efficiency, both are combined in the same cascade framework. Experiments on a database collected from real classes, BNULSVED, show that the proposed approach is effective and efficient.
In recent years, the applications of age and gender estimation from face images have become increasingly wide and deep. Existing pipelines usually process images with machine learning methods such as SVM and AdaBoost, but the performance of such methods is usually limited to images with strict conditions or simple backgrounds; age and gender estimation in open environments still faces enormous challenges. In this paper, we introduce a method based on a double-channel convolutional neural network (CNN) for accurate age and gender estimation in complex scenarios. First, face regions are detected, whether the image contains a single face or multiple faces. Second, faces are aligned based on facial landmark detection. Finally, a double-channel CNN with XGBoost is used to train the age and gender estimation model. Experiments show that the proposed double-channel CNN achieves higher accuracy at comparable time cost than a single-channel CNN and is robust to face images from wild conditions.
Reliable human detection is important for a wide range of applications. In this paper, a method specifically designed for real-time human detection in depth images is developed; it is robust in cluttered and dynamic environments. The method has two steps: first, plausible candidate positions are localized by a superpixel-based segmentation and merging approach; then, a descriptor encoding both depth difference information and 3D geometric characteristics of the human upper body is used to refine the candidates with a deep randomized decision forest classifier. Our approach achieves very fast speed and high accuracy on three publicly available datasets.
Reliable human detection and tracking is important for a wide range of applications. In this paper, a method specifically designed for real-time human detection in depth images is proposed; it is robust in cluttered and dynamic environments. The method has two steps: first, hypothesis head regions are localized by a superpixel-based segmentation and merging approach; then, a multi-channel measurement with a neural network classifier separates human from non-human regions. Our approach achieves very fast speed and high accuracy on three publicly available datasets.
Most existing local-feature-based facial expression recognition systems concentrate on salient regions of the face, yet the effectiveness of the selected regions and the computational complexity of these systems still need improvement. To overcome the limits of previous work, we propose a kernel ReliefF algorithm to select discriminative patches on the face. The approach considers the whole feature while also enhancing the locality of the variation of the expressive face, and it has lower computational complexity. Experimental results on CK+ and RML demonstrate that the method significantly outperforms the state-of-the-art.
Traditional people detection is mainly based on two-dimensional data acquired from RGB images or videos. With a Time-of-Flight (TOF) camera, this two-dimensional data can be augmented with depth information, enabling more accurate people detection. This work uses only the depth information and forms an important part of a people counting system. After preprocessing the depth images, an algorithm based on connected component analysis is proposed according to the characteristics of people seen from a top view. To address the algorithm's shortcomings in crowds, a Hough transform (HT) head detection algorithm combining depth information and prior conditions is proposed; with it, non-head objects are screened out and real-time, accurate people detection is achieved. This study lays a solid foundation for subsequent people counting.
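A rough sketch of the head detection step on a top-view depth map, using OpenCV's Hough circle transform; the radius limits and the depth prior below are illustrative assumptions (closer objects are taken to map to larger gray values):

    import cv2
    import numpy as np

    depth = cv2.imread("depth.png", cv2.IMREAD_GRAYSCALE)   # top-view depth map
    depth = cv2.medianBlur(depth, 5)                        # preprocessing
    circles = cv2.HoughCircles(depth, cv2.HOUGH_GRADIENT, dp=1, minDist=40,
                               param1=80, param2=25, minRadius=10, maxRadius=40)
    heads = []
    if circles is not None:
        for cx, cy, r in np.round(circles[0]).astype(int):
            if depth[cy, cx] > depth.mean():   # prior: heads are nearest regions
                heads.append((cx, cy, r))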
Dimensionality reduction is widely used to deal with high-dimensional data. In this paper, based on manifold learning and collaborative representation, an efficient subspace learning algorithm named Manifold Aware Discriminant Collaborative Graph Embedding (MADCGE) is proposed for face recognition. First, representation coefficients of face images are obtained by collaborative representation combined with label information and manifold structure. Then, a new graph is constructed using these coefficients as adjacency weights. Finally, graph embedding is exploited to learn an optimal projection matrix for feature extraction. The proposed algorithm thus avoids choosing a neighborhood size for the graph, which is difficult in practice. More importantly, it not only preserves the linear reconstructive relationships between samples, but also exploits label information and the nonlinear manifold structure to further improve discriminative ability. Extensive experiments on face databases (the AR and YALE-B databases) demonstrate that the proposed method outperforms several commonly used methods.
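The collaborative representation step has a closed form: the coefficients of a sample over the training dictionary come from a ridge-regularized least squares. A minimal sketch, with the regularization weight as an assumed value:

    import numpy as np

    def collab_coeffs(X, y, lam=0.01):
        """w = argmin ||y - Xw||^2 + lam*||w||^2, X: one sample per column."""
        n = X.shape[1]
        return np.linalg.solve(X.T @ X + lam * np.eye(n), X.T @ y)

    # the coefficients then serve as adjacency weights in the embedded graph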
Shooting is a regular military training course in which targets are often scored manually, which suffers from poor safety and low scoring efficiency. Existing automatic target-reading devices have disadvantages such as inaccurate data, high price and poor portability. It is therefore of great significance for military modernization to develop an automatic target-reading device that is fast, accurate, inexpensive and portable. In this paper, the research object is the chest ring target. Using image recognition technology with the hardware support of a Raspberry Pi, we perform image acquisition, image processing, bullet hole identification and ring value determination, and set up an automatic target-reading system with data storage, management and other functions.
In this paper, we propose a novel gait representation based on 3D CNNs: learning spatio-temporal multi-scale gait identity features (GaitID) using 3-dimensional convolutional networks. Our contributions are: 1) exploring different numbers of input frames for the 3D-CNN model, 2) evaluating different features and gait representations in the 3D-CNN, and 3) improving the network structure to learn multi-scale gait features with low dimensionality. A nearest neighbor (NN) classifier was applied to identify the gait. Compared with existing methods on the CASIA-B dataset, the proposed method not only achieved competitive performance but also retained its discriminative power at a very low dimensionality (128-D), even with this simple classifier.
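A minimal 3D-convolution block of the kind described, sketched in PyTorch; the layer sizes are illustrative, with only the 128-D output dimension taken from the paper:

    import torch
    import torch.nn as nn

    class GaitBlock(nn.Module):
        def __init__(self):
            super().__init__()
            self.net = nn.Sequential(
                nn.Conv3d(1, 16, kernel_size=3, padding=1),  # spatio-temporal conv
                nn.ReLU(),
                nn.MaxPool3d(2),
                nn.AdaptiveAvgPool3d(1),
                nn.Flatten(),
                nn.Linear(16, 128),          # 128-D gait identity feature
            )

        def forward(self, x):                # x: (N, 1, frames, H, W)
            return self.net(x)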
Oil tanks are foundational industrial facilities for the storage of oil and petrochemical products, and automatic recognition of oil depots in remote sensing imagery has important practical significance in many fields. The Unmanned Aerial Vehicle (UAV) now provides an alternative to satellites for monitoring oil depots, owing to its flexibility, rapid response and minimal cost. In this paper, a novel oil tank extraction method based on detection of the elliptical rooftop is proposed. First, straight line segments of object boundaries are extracted from the UAV imagery. Second, these lines are linked into arc segments based on geometric criteria, and elliptical rooftops are extracted from these arcs to generate hypotheses of potential oil tanks. Finally, within the Region of Interest (ROI) of each rooftop, hypotheses are disambiguated and targets verified primarily by extracting the facade contours of the oil tanks. Experimental results demonstrate the good performance of our method on a variety of complex scenes.
Skeleton-based methods have been proposed to detect and recognize meaningful human motion, and most of them contain parameters whose values must be chosen. To achieve better recognition performance, various evolutionary schemes have been applied to select the optimal parameters in each phase of these methods, which requires experimental evaluation of many parameter settings. In this paper, we propose an adaptive skeleton-based human action recognition system that automatically adjusts its parameters according to the input data. We first extract spatio-temporal local features from position differences of joints, which model actions over time. A two-layer affinity propagation (AP) algorithm is then employed to select crucial postures. Our experimental results demonstrate that the proposed method works well on different datasets.
The vehicle logo is key information about a vehicle; combined with other vehicle characteristics, it makes vehicle management in intelligent transportation systems more effective. However, extracting effective features for vehicle logo recognition remains challenging, largely due to variations in illumination and low resolution. To improve the recognition rate, this paper proposes a new vehicle logo recognition method. For feature extraction, an algorithm based on the fusion of SIFT and Dense-SIFT features generates local feature descriptors, and a bag-of-words model describes the logo features as a visual dictionary histogram; since the bag-of-words model ignores the spatial structure of objects, a spatial pyramid model is introduced into it. For recognition, logos are classified by a Support Vector Machine (SVM) with a one-against-the-rest multi-classification structure. Finally, experiments comparing our method with others verify its effectiveness.
Convolutional neural networks are widely used in image recognition, but the associated models are computationally demanding, and several solutions have been proposed to accelerate their computation. Sparsifying a neural network is an effective way to reduce its computational complexity, yet most current acceleration schemes do not make full use of this property. In this paper, we design an acceleration unit using an FPGA as the hardware platform. The accelerator achieves parallelism through multiple CU modules and eliminates unnecessary operations with a Match module to improve efficiency. Experimental results show that at ninety percent sparsity, performance increases by a factor of 3.2.
Texture recognition is a key topic in many image analysis applications, and many techniques have been proposed to measure texture characteristics. Among them, texture energy extracted with a "Tuned" mask is a rotation- and scale-invariant texture descriptor; however, the tuning process is computationally intensive and easily trapped in local optima. In the proposed approach, obtaining the "Tuned" mask is viewed as a combinatorial optimization problem, and the optimal mask is acquired by maximizing the texture energy via a newly proposed cuckoo search (CS) algorithm. Experimental results on samples and images show that the proposed method is suitable for texture recognition, that its recognition accuracy is higher than that of "Tuned" masks optimized by genetic algorithm (GA) or particle swarm optimization (PSO), and that water areas can be well recognized in the original image. It is a robust and efficient method for obtaining the optimal "Tuned" mask for texture analysis.
A Self-Organizing Feature Map (SOFM) has been used for the qualitative identification of strawberry juice powders. The research was based on image recognition of powders obtained through an industrial spray-drying process. Results demonstrated that color features were able to effectively distinguish the research material, consisting of spray-dried strawberry juice powders. The best model in terms of the lowest RMS (Root Mean Square) error contained 46 neurons in the input layer and neurons in the output layer. The model is an effective tool for classifying undesirable color changes in strawberry powders.
The automatic recognition of lunar terrain has been a hot topic in recent years. Algorithms that use only CCD or only DEM data as the source cannot achieve satisfactory results. Some algorithms combine CCD and DEM data and perform terrain identification in the time domain; their recognition rate is improved, but their time efficiency is not satisfactory. To solve these problems, this paper proposes a fast terrain recognition algorithm based on the wavelet domain.
Previous approaches to scene text detection and recognition have already achieved promising performance across various benchmarks, and there are many superior neural network models from which to train the desired classifiers. Besides designing loss functions and network architectures, the size and quality of the dataset are key to using neural networks. In this paper we propose a new method for synthesizing text into natural scene images that takes data balance into account. For each image we obtain region normals based on depth and region information; after choosing text from a text resource, we blend it into the original image using the homography between the original region contours and the mask contours into which the text is placed. In particular, the text source is sampled via a specific loss function reflecting the distance between the current character distribution and the target character distribution. Text detection experiments on the standard ICDAR2015 dataset and the augmented dataset demonstrate that our balanced synthetic dataset yields an 84.5% F-score, a 2% increase over the standard dataset alone and also higher than an unbalanced synthetic dataset. Training on balanced synthetic data likewise improves text recognition substantially compared with some public standard recognition datasets and outperforms unbalanced synthetic data.
Automatic license plate detection and recognition (ALPDR) in natural scenes is useful but difficult owing to all-weather operation and varied lighting conditions. Although deep-learning-based ALPDR methods can achieve much higher recognition rates, they need a large number of human-labelled samples to train the deep neural network. In this paper, we propose a synthetic-data-based CNN ALPDR method that avoids manually labelling large amounts of data and stabilizes training. First, our data engine generates 100K synthetic car license plates that simulate real scenes and are used to train the networks. Then, we design a recognition network that predicts all characters holistically, avoiding character segmentation. Real-scene datasets are employed to validate the effectiveness of the presented method: the accuracy of our ALPDR system is 91.18% and 95% on toll station datasets and 94.2% on a traffic surveillance dataset.
Integrated patrolling inspection trains are used worldwide for railway safety monitoring; a camera mounted under the train captures track images for abnormal fastener detection. To reduce the high false positive rate of rail fastener recognition arising from ballast occlusion and non-uniform illumination, we propose a fastener defect recognition method using deep learning and construct four network structures based on AlexNet and ResNet to learn fastener features in complex backgrounds. The experimental results show that the ResNet18 model with unfrozen convolutional layers not only performs well on the trained line but also generalizes well to a new line, making it the more appropriate model for fastener recognition in comparison with traditional handcrafted features and existing deep learning models.
Because of relative motion between the camera and the scene during exposure, image deblurring is commonly categorized into space-invariant and space-variant versions; space-variant deblurring is more difficult since the blurring cannot be modeled by convolution. In this paper, we focus on space-variant text image deblurring. To guarantee high-quality imaging, a hyper-Laplacian prior is exploited to model the distribution of text image gradients. Under the maximum a posteriori (MAP) estimation framework, we develop a nonconvex variational model for space-variant text image deblurring. The proposed method suppresses the undesirable space-variant blurring and ringing artifacts while preserving the main structural features. Experiments comparing our method with two popular image deblurring methods demonstrate its superior performance in terms of PSNR, ISNR, MSSIM and visual quality.
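Writing $K$ for the space-variant blur operator, $y$ for the observed image and $0<\alpha<1$ for the hyper-Laplacian exponent, a standard form of such a MAP model (the paper's exact data term and weights may differ) is

    \min_x \; \frac{1}{2}\|Kx - y\|_2^2 + \lambda \sum_i |(\nabla x)_i|^{\alpha}, \qquad 0 < \alpha < 1,

whose nonconvexity comes from the fractional exponent on the gradient magnitudes.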
In this paper, we study the restoration of space-variant motion-blurred images in the general situation, in which the motion blur kernel must be estimated before restoration. From the optical imaging principle of a digital camera, the camera's rotation about three axes is found to be the main factor in outdoor long-distance imaging blur. We use a Gaussian Mixture Model (GMM) and a Mixed Exponential Model (EMM) to fit, respectively, the gradient distribution of natural images and the distribution of the elements of the motion blur kernel, and solve the parameters of both models with the Expectation Maximization (EM) algorithm. The motion blur kernel estimation is formulated in the Bayesian framework. After estimating the space-variant kernel, and assuming that the noise in both the pixel domain and the gradient domain follows Gaussian distributions, we add a new regularization constraint to the restoration model, which effectively recovers details such as texture contours while the restoration converges to the optimum. Experimental results demonstrate the satisfactory performance of the proposed method.
To address the contrast deficiency and blurred details of some low-light images, this paper presents a low-light image enhancement method using multi-layer fusion and detail recovery. First, the V channel in HSV color space is copied to three layers: a Retinex layer, a brightness layer and a detail layer. In the Retinex layer, the combination of weighted guided image filtering and morphology eliminates halo artifacts, while brightness and details are enhanced by an improved Retinex. In the brightness layer, adaptive normalization further enhances brightness. In the detail layer, an improved local linear model recovers more details. Finally, detail recovery based on pixel arrangement avoids the partial blurring of details caused by fusion. Experimental results show that the proposed method more effectively highlights image details and improves contrast.
In deep sea image acquisition, illumination is uneven owing to the limitations of deep sea imaging conditions. To address this, an image enhancement algorithm with adaptive exposure is proposed. The algorithm first calculates the distance between the light source and the imaged object from the previous and next frames, then calculates an exposure coefficient from that distance, and finally adaptively exposes the image according to the coefficient to enhance it. Experimental results show that the algorithm effectively corrects the uneven illumination caused by undersea conditions.
To avoid color distortion and loss of detail, this paper presents an improved defogging algorithm. It uses the dark channel prior to estimate the atmospheric light and the transmission, and proposes a new gradient-domain filter to refine the transmission; intensity compensation is then used to enhance the color information of the image. In parallel, the atmospheric light is estimated and boundary constraints are used to roughly estimate a second transmission, with weights added to refine it. With both scene transmissions refined, the image is restored through the atmospheric scattering model, and the two restored images are fused using a fusion strategy; the fused image is fine-tuned with automatic white balance to yield the final image. Experimental results show that the algorithm effectively solves the problems of color distortion and loss of detail in defogged images.
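The dark channel step of such a pipeline can be sketched as follows; the atmospheric light, patch size and weight below are customary placeholder values rather than the paper's estimates:

    import cv2
    import numpy as np

    img = cv2.imread("hazy.png").astype(np.float32) / 255.0
    A = np.array([0.9, 0.9, 0.9], dtype=np.float32)   # assumed atmospheric light
    dark = cv2.erode((img / A).min(axis=2),           # channel-min over a patch
                     np.ones((15, 15), np.uint8))
    t = np.clip(1.0 - 0.95 * dark, 0.1, 1.0)          # rough transmission
    recovered = (img - A) / t[..., None] + A          # invert scattering model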
In this paper, we define a characteristic response of the convolutional layer mappings in a convolutional neural network and explore the correlations between them by adjusting the structure of a trained model. Traditional ink painting synthesis methods usually generate images with only basic features: a particular style cannot be assigned, and the results can be stiff and lack artistic conception. To address this, this paper proposes an ink painting synthesis method based on a convolutional neural network that produces better images. We first preprocess the photos, including contrast enhancement. We then match the photo's characteristic response in a random image to obtain its content information, and match the correlations between the characteristic responses of an ink painting to obtain its style information; the final step synthesizes the image. The result retains both the outline information of the original photo and the overall texture of the ink painting. The paper thus presents a method for merging an ink painting style into a photo, which works well for synthesizing grayscale images such as ink paintings.
In this paper, a new adaptive multi-directional weighted mean filter for removing salt-and-pepper noise is proposed. The filter first introduces a variance parameter to judge the gray-level difference between the current pixel and its neighborhood, and designs the noise detector by combining this variance parameter with the gray-level extremes. After noise detection, the filter restores the corrupted pixels using multi-directional image information: it adaptively selects the optimal filtering window and direction template, then replaces the gray level of each corrupted pixel with the weighted mean gray level of the pixels on the optimal template. Experimental results show that the proposed filter outperforms many existing filters in terms of denoising and detail preservation.
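The detection half of such a scheme can be sketched as below, flagging gray-level extremes that deviate strongly from their neighborhood; the window size, variance threshold and the omitted directional templates are simplified assumptions:

    import numpy as np

    def detect_noise(img, thr=40.0):
        """Return a boolean map of suspected salt-and-pepper pixels."""
        pad = np.pad(img.astype(np.float32), 1, mode="edge")
        noisy = np.zeros(img.shape, dtype=bool)
        for i in range(img.shape[0]):
            for j in range(img.shape[1]):
                win = pad[i:i + 3, j:j + 3]                   # 3x3 neighborhood
                extreme = img[i, j] in (0, 255)               # gray-level extreme
                deviates = abs(img[i, j] - win.mean()) > thr  # variance criterion
                noisy[i, j] = extreme and deviates
        return noisy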
An improved matching method based on Speeded Up Robust Features (SURF) is proposed to address false matches between images. First, the SURF keypoints and their feature vectors are extracted. Then, coarse matching is performed with the traditional Euclidean-distance criterion. Finally, an adaptive threshold on the Spearman correlation coefficient is applied to obtain the final set of matching points. Experiments show that the proposed algorithm, based on SURF and the Spearman correlation coefficient, reduces the false-match rate compared with the original algorithm under translation, zooming, rotation, varying illumination, and noise.
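The final filtering step can be illustrated with SciPy's Spearman correlation; the quantile-based threshold below is a simple stand-in for the paper's adaptive rule, which is not specified here.

import numpy as np
from scipy.stats import spearmanr

def filter_by_spearman(desc_a, desc_b, matches, quantile=0.25):
    """Keep coarse matches whose descriptor pair has a high Spearman
    rank correlation; the quantile threshold is an assumption."""
    rho = []
    for i, j in matches:
        r, _ = spearmanr(desc_a[i], desc_b[j])
        rho.append(r)
    rho = np.array(rho)
    thresh = np.quantile(rho, quantile)   # adapt to the score distribution
    return [m for m, r in zip(matches, rho) if r >= thresh]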
Image matching is an important topic in computer vision. Owing to their robustness and accuracy, SIFT and its variants are generally used for image matching. The traditional SIFT method operates on grayscale images and disregards color information, which can reduce both the number of matching points and the matching accuracy. Prevailing color descriptors can effectively add color information to SIFT but dramatically increase the complexity of the algorithm. In this paper, a novel approach is proposed to exploit color information for SIFT-based image matching. The algorithm uses the gradient information of the color channels to compensate the luminance channel, effectively incorporating color information into SIFT. Experimental results show that the number of feature points and the matching accuracy are significantly improved, while complexity and matching performance remain well balanced.
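A hedged sketch of the compensation idea: the luminance gradient is reinforced with the strongest per-pixel color-channel gradient before descriptors are computed. The exact weighting the paper uses is not given here, so the max-combination is an assumption.

import cv2
import numpy as np

def compensated_gradient(bgr):
    """Luminance gradient plus the strongest color-channel gradient,
    as an illustrative form of color compensation."""
    y = cv2.cvtColor(bgr, cv2.COLOR_BGR2GRAY).astype(np.float32)
    gy = np.hypot(cv2.Sobel(y, cv2.CV_32F, 1, 0),
                  cv2.Sobel(y, cv2.CV_32F, 0, 1))
    chan = [np.hypot(cv2.Sobel(c.astype(np.float32), cv2.CV_32F, 1, 0),
                     cv2.Sobel(c.astype(np.float32), cv2.CV_32F, 0, 1))
            for c in cv2.split(bgr)]
    return gy + np.max(chan, axis=0)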
Template matching for image sequences captured by a mobile camera is widely applied in machine vision, real-time location systems (RTLS), advanced driver assistance systems (ADAS), intelligent transportation systems (ITS), and video surveillance. Current target-tracking algorithms fall into two categories, generative and discriminative, with discriminative methods now dominant; these mainly combine image features with machine learning to achieve template matching. Such algorithms require large numbers of image samples and heavy computation, which makes them difficult to deploy in on-board systems such as ADAS and ITS. In this paper, we present a method based on visual feature information and structure information that effectively improves the accuracy of template matching, together with a CUDA-based parallel acceleration algorithm. Compared with previous methods, the proposed method matches templates robustly while keeping the running time short, so it can easily be ported to vehicle-mounted systems.
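The paper's structure-information and CUDA extensions are not detailed here, but the baseline being improved upon, template matching by normalized cross-correlation, looks like this in OpenCV:

import cv2

def match(frame, template):
    """Generic normalized cross-correlation baseline; the paper's
    structure-aware and CUDA-accelerated versions go beyond this."""
    res = cv2.matchTemplate(frame, template, cv2.TM_CCOEFF_NORMED)
    _, score, _, top_left = cv2.minMaxLoc(res)
    h, w = template.shape[:2]
    return score, (top_left[0], top_left[1], w, h)  # confidence and box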
Thanks to the unprecedented development of sensors, platforms, and algorithms for 3D data acquisition and generation, airborne and close-range data, whether image-based or LiDAR (Light Detection and Ranging) point clouds, digital elevation models (DEM), or 3D city models, are more accessible than ever before. Change detection and time-series analysis in 3D have gained great attention because volumetric dynamics enable more applications and more accurate results. We use mini-UAV platforms to detect unauthorized construction. Because such platforms carry low-cost sensors, direct geo-referencing alone fails to register the densely matched point clouds they produce. This paper therefore proposes a registration method for densely matched point clouds: SIFT keypoints are extracted from images acquired at different times and matched to find corresponding points, the correspondences are lifted to control points in the point clouds, and the point clouds are then registered using these control points.
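The control-point step reduces to standard SIFT matching between the two epochs' images, for example with OpenCV; the ratio-test value below is the usual heuristic rather than a figure from the paper.

import cv2

def control_point_matches(img_t0, img_t1, ratio=0.75):
    """Match SIFT keypoints between two acquisition epochs; the matched
    image points are later lifted to 3D control points."""
    sift = cv2.SIFT_create()
    kp0, des0 = sift.detectAndCompute(img_t0, None)
    kp1, des1 = sift.detectAndCompute(img_t1, None)
    knn = cv2.BFMatcher().knnMatch(des0, des1, k=2)
    good = [p[0] for p in knn
            if len(p) == 2 and p[0].distance < ratio * p[1].distance]
    return [(kp0[m.queryIdx].pt, kp1[m.trainIdx].pt) for m in good]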
Deep-space image registration is an important part of space exploration research. To improve robustness and efficiency, a new method is proposed based on the geometry of triangles constructed from neighboring stars. Considering the characteristics of deep-space images, such as low signal-to-noise ratio and few stable features, the star points in the image are chosen as feature points, and the geometric distribution of the surrounding stars serves as the descriptor. First, the distance between every pair of stars is calculated, and the neighborhood stars are determined by sorting these distances. The main direction of the current star is then determined from the intensity distribution of its neighbors, and the surrounding space is divided into eight sectors clockwise, starting from the main direction. The strongest star in each sector is selected to construct the triangles that form the descriptor of the current star. Finally, a matching distance between stars is defined and computed, and a voting matrix is built to determine the matching pairs. Experimental results show that, compared with traditional matching methods, the proposed algorithm achieves higher efficiency and precision under translation, rotation, and noise.
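A rough sketch of the descriptor construction, with simplified rules for the neighborhood size and the intensity-weighted main direction (both are assumptions; the paper's exact rules are not given here):

import numpy as np

def neighbor_triangles(stars, intensities, i, k=16):
    """stars: (n, 2) positions; intensities: (n,). Find the k nearest
    stars, split them into eight sectors around a main direction, keep
    the brightest star per sector, and pair consecutive picks with the
    current star to form triangles."""
    d = np.linalg.norm(stars - stars[i], axis=1)
    nbrs = np.argsort(d)[1:k + 1]                      # skip the star itself
    offsets = stars[nbrs] - stars[i]
    vec = intensities[nbrs] @ offsets                  # intensity-weighted direction
    main = np.arctan2(vec[1], vec[0])
    ang = (np.arctan2(offsets[:, 1], offsets[:, 0]) - main) % (2 * np.pi)
    sectors = (ang // (np.pi / 4)).astype(int)         # eight sectors
    picks = [nbrs[sectors == s][np.argmax(intensities[nbrs][sectors == s])]
             for s in range(8) if np.any(sectors == s)]
    return [(i, a, b) for a, b in zip(picks, picks[1:] + picks[:1])]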
Accurate co-registration of SAR images is a crucial step in interferometric SAR (InSAR) processing; lower registration accuracy leads to larger interferometric phase error. The most commonly used approach is the 2-D polynomial registration method (2-DR), which is simple and works well in most conditions, but in the case of long baselines and complex terrain it cannot meet the precision requirements. The DEM-assisted geometrical SAR image registration method (GeoR), which uses an external DEM and orbit data to calculate offsets, can theoretically achieve ideal precision for any baseline and terrain. In GeoR, the elevation of each pixel in the master image must be extracted from the external DEM. Unfortunately, errors in the DEM and orbit data often produce an approximately horizontal shift and a mismatch between the DEM and the SAR image, causing low-precision elevation extraction and ultimately degrading registration performance. In this paper, the influences of DEM error and orbit error on elevation extraction are first analyzed, and an improved geometrical SAR image registration method based on elevation correction (I-GeoR) is then proposed. In the elevation correction, the DEM elevation is correctly assigned to each pixel of the image through deformation extraction based on SAR image simulation. Finally, two ALOS-PALSAR images are processed to validate the new method. The results show that I-GeoR effectively corrects the elevation and achieves higher registration precision.
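The elevation-extraction step that GeoR relies on amounts to interpolating the DEM at each master-image pixel's ground position. A minimal bilinear lookup, assuming the positions are already expressed in DEM pixel coordinates and lie inside the grid:

import numpy as np

def elevation_at(dem, rows, cols):
    """Bilinear lookup of DEM heights at fractional (row, col)
    positions; boundary handling is omitted for brevity."""
    r0, c0 = np.floor(rows).astype(int), np.floor(cols).astype(int)
    dr, dc = rows - r0, cols - c0
    return ((1 - dr) * (1 - dc) * dem[r0, c0] +
            (1 - dr) * dc * dem[r0, c0 + 1] +
            dr * (1 - dc) * dem[r0 + 1, c0] +
            dr * dc * dem[r0 + 1, c0 + 1])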
The registration of infrared and visible images is a common multi-modal registration problem, widely used in military, remote sensing, and other fields. After describing this registration task, the paper focuses on the SIFT (Scale-Invariant Feature Transform) and SURF (Speeded-Up Robust Features) algorithms, both based on local invariant features. First, SIFT and SURF keypoints are extracted from the infrared and visible images. Next, an approximate nearest-neighbor search based on a k-d tree is used to match the keypoints. Finally, to improve matching accuracy, RANSAC is applied to eliminate mismatched points. The experiments show that for both algorithms the number of keypoints in the infrared image is markedly smaller than in the visible image, and that on this image pair SURF outperforms SIFT.
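The described pipeline maps directly onto standard OpenCV calls; the FLANN parameters and ratio threshold below are common defaults, not the paper's settings.

import cv2
import numpy as np

def register(ir, vis):
    """SIFT keypoints, k-d-tree (FLANN) matching with a ratio test,
    and RANSAC to reject mismatches."""
    sift = cv2.SIFT_create()
    kp1, d1 = sift.detectAndCompute(ir, None)
    kp2, d2 = sift.detectAndCompute(vis, None)
    flann = cv2.FlannBasedMatcher({"algorithm": 1, "trees": 5}, {"checks": 50})
    knn = flann.knnMatch(d1, d2, k=2)
    good = [p[0] for p in knn
            if len(p) == 2 and p[0].distance < 0.7 * p[1].distance]
    src = np.float32([kp1[m.queryIdx].pt for m in good]).reshape(-1, 1, 2)
    dst = np.float32([kp2[m.trainIdx].pt for m in good]).reshape(-1, 1, 2)
    H, mask = cv2.findHomography(src, dst, cv2.RANSAC, 5.0)  # needs >= 4 matches
    return H, int(mask.sum())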
Given noise, background interference, and the sheer volume of information, recognizing human actions in videos is a challenging problem. In this paper, we present a training-free spatiotemporal multiscale statistical matching (SMSM) model, based on dense computation of the spatiotemporal locally adaptive regression kernel, to identify non-compact human actions; because no training is required, the model avoids the overfitting caused by large-sample training. First, we encode local context similarity with Gaussian-difference LARK (GLARK) features, which describe well the shape and trend of weak edges. Second, we propose a multiscale composite template set for SMSM that is robust for detecting human actions at different scales. The SMSM model balances the GLARK structure of small local windows against the neighborhood structure of large local windows. Moreover, the statistical matching process mitigates the missed detection of weak edges caused by background interference and improves multiscale matching efficiency. In our experiments, the proposed algorithm significantly outperforms existing matching methods and some supervised methods on a widely acknowledged challenging dataset.
To improve the accuracy of profile measurement for aviation parts, a high-accuracy profile measurement method based on a step boundary model is proposed in this paper. First, the ways a light stripe images at the target boundary are analyzed, and corresponding ideal boundary models are built to determine the ideal boundary locations. Then, a subpixel boundary extraction method based on feature moments is presented, which refines the coarse boundary detection to subpixel locations. Next, the profile of the aviation part is measured with the reconstruction algorithms. Finally, experiments on a standard part verify the accuracy of the method, and a measurement experiment verifies the effectiveness of the boundary extraction. The results show that the presented method extracts object boundaries under complex background interference and lighting conditions, reaches an accuracy of 0.056%, and satisfies the requirements of field measurement.
In this paper, we propose a novel real-time mosaic-block detection method based on intensity order and shape features that automatically detects mosaic blocks in video. In the proposed method, the video frame is first converted to a gray image and divided into several layers according to gray level to obtain binary images. Morphological dilation and erosion are then applied to each binary image to remove isolated pixels and small regions. Finally, shape features of the connected components of each binary image are extracted to detect the mosaic blocks. Experiments show that the proposed method effectively detects abnormal frames containing mosaic blocks and performs well in terms of accuracy.
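A condensed sketch of the layering, morphology, and shape-feature chain; the layer count, area threshold, and rectangularity criterion are illustrative assumptions, not the paper's values.

import cv2
import numpy as np

def mosaic_candidates(frame, levels=8, min_area=64, max_extent=1.2):
    """Slice the gray image into gray-level layers, clean each layer
    with morphological opening, and keep large, nearly rectangular
    connected components as mosaic-block candidates."""
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    kernel = np.ones((3, 3), np.uint8)
    step = 256 // levels
    boxes = []
    for lo in range(0, 256, step):
        layer = cv2.inRange(gray, lo, lo + step - 1)
        layer = cv2.morphologyEx(layer, cv2.MORPH_OPEN, kernel)
        n, _, stats, _ = cv2.connectedComponentsWithStats(layer)
        for x, y, w, h, area in stats[1:]:             # skip background label
            if area >= min_area and w * h <= max_extent * area:
                boxes.append((x, y, w, h))             # near-solid rectangle
    return boxes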
Liver tumor segmentation on computed tomography slices is a very difficult task: medical images are often corrupted by noise and sampling artifacts, and liver tumors are often surrounded by other abdominal structures of similar density, producing pronounced intensity inhomogeneity. Traditional level set methods have been applied to liver tumor segmentation, but the results are unsatisfying because of noise and the weak gradient response at the tumor boundary. In this paper, we propose a multi-distribution level set method that overcomes both under-segmentation and over-segmentation. We compare our approach with the CV and LSACM models in extensive experiments, and we also apply it to the public data set of the 3D Liver Tumor Segmentation Challenge. All results show that our method performs better, even for liver tumors with low contrast and blurred boundaries.
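For context, the CV (Chan-Vese) baseline the paper compares against is available in scikit-image in a morphological variant; the iteration count and normalization below are illustrative, not the paper's settings.

import numpy as np
from skimage.segmentation import morphological_chan_vese

def cv_baseline(roi_slice, iterations=80):
    """Chan-Vese baseline on a normalized CT region of interest."""
    img = (roi_slice - roi_slice.min()) / np.ptp(roi_slice)
    return morphological_chan_vese(img, iterations,
                                   init_level_set="checkerboard",
                                   smoothing=2)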
Image segmentation is an important part of many computer vision tasks such as image recognition and image understanding. Traditional segmentation algorithms are susceptible to complex backgrounds involving illumination, shading, and occlusion, so applying convolutional neural networks to segmentation has become a research hot spot. However, as convolution proceeds, the image loses edge information, blurring the final segmentation boundaries. To overcome this problem, we propose a segmentation algorithm that combines a fully convolutional network (FCN) with K-means clustering. By matching pixels between the coarse FCN segmentation and the K-means segmentation, the algorithm sharpens the classification of pixels along edges and improves segmentation accuracy. The model is trained and optimized in two stages. Experimental results on the VOC2012 set validate the effectiveness of the proposed method.
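The K-means half of the combination is a per-pixel clustering, sketched below with plain RGB features; the paper's feature choice may differ.

import numpy as np
from sklearn.cluster import KMeans

def kmeans_labels(image, k=4, seed=0):
    """Cluster pixels by color; the resulting label map is what gets
    matched against the coarse FCN output to sharpen edges."""
    h, w, c = image.shape
    km = KMeans(n_clusters=k, n_init=10, random_state=seed)
    flat = image.reshape(-1, c).astype(np.float64)
    return km.fit_predict(flat).reshape(h, w)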
In this paper, we propose a semi-automatic pulmonary nodule segmentation algorithm that operates within a region of interest around each nodule. It has two parts: unsupervised training of an auto-encoder and supervised training of a segmentation network. The auto-encoder is trained without labels, and its encoder is kept as a feature extractor; new network layers are then added after the encoder and trained with supervision to obtain the final segmentation network. Compared with the traditional maximum 2-D entropy thresholding segmentation, the Dice coefficient of this algorithm is 1%-9% higher across 36 region-of-interest segmentation experiments.
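A minimal PyTorch sketch of the two-stage idea: pretrain an auto-encoder, then reuse its encoder under a small supervised segmentation head. The layer sizes are illustrative assumptions, not the paper's architecture.

import torch
import torch.nn as nn

encoder = nn.Sequential(
    nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
    nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(),
)
decoder = nn.Sequential(
    nn.Upsample(scale_factor=2), nn.Conv2d(32, 1, 3, padding=1),
)
autoencoder = nn.Sequential(encoder, decoder)   # stage 1: reconstruct ROIs

seg_head = nn.Sequential(
    nn.Conv2d(32, 16, 3, padding=1), nn.ReLU(),
    nn.Upsample(scale_factor=2), nn.Conv2d(16, 1, 1), nn.Sigmoid(),
)
segmenter = nn.Sequential(encoder, seg_head)    # stage 2: supervised mask

x = torch.randn(1, 1, 64, 64)                   # one hypothetical ROI
assert autoencoder(x).shape == x.shape and segmenter(x).shape == x.shape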
The three channels of a color image are correlated with each other, but edge detection on gray images ignores this correlation. In this paper, we focus on the relationship between the components, highlighting the edge changes caused by color information. An edge detection algorithm based on anisotropic Gaussian kernels (ANGKs) and chromatic difference is proposed to improve edge detection in color images. First, the chromatic difference S is derived from the gray scale Y and the three channels of the RGB color space. Then the ANGKs are used to compute the gradient magnitude and direction of S and Y, giving Smag and Ymag, respectively. The final edges are obtained by fusing Smag and Ymag, applying non-maximum suppression, and double thresholding. We evaluate the proposed algorithm qualitatively and quantitatively on noise-free and noisy images; the experimental results show performance comparable to other approaches.
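A hedged sketch of the fusion-and-threshold stage: isotropic Gaussian-derivative gradients stand in for the anisotropic kernels, and a simple max-minus-min chroma term stands in for the paper's S (both are substitutions); non-maximum suppression is omitted for brevity.

import cv2
import numpy as np
from skimage.filters import apply_hysteresis_threshold

def fused_edges(bgr, low=0.1, high=0.3):
    """Fuse luminance and chroma gradient magnitudes, then apply
    double (hysteresis) thresholding."""
    bgr = bgr.astype(np.float32) / 255.0
    y = cv2.cvtColor(bgr, cv2.COLOR_BGR2GRAY)
    s = bgr.max(axis=2) - bgr.min(axis=2)      # crude chromatic difference
    def mag(ch):
        ch = cv2.GaussianBlur(ch, (5, 5), 1.0)
        return np.hypot(cv2.Sobel(ch, cv2.CV_32F, 1, 0),
                        cv2.Sobel(ch, cv2.CV_32F, 0, 1))
    fused = np.maximum(mag(y), mag(s))         # fuse Ymag and Smag
    return apply_hysteresis_threshold(fused, low, high)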
Traditional 3D information acquisition of the human body relies on foreground extraction or threshold segmentation against a plain background and is difficult to apply directly in complex scenes. In this paper, a novel method based on binocular vision is proposed that combines FCN semantic segmentation with depth segmentation to obtain a human-body depth map. The depth map is computed by a binocular camera, and each point in it corresponds to a point in the left camera image. The position of the human body is obtained through semantic segmentation of the left image; depth segmentation can then be performed automatically around the body's depth in the depth map. The final result is the intersection of the depth-segmentation result and the semantic-segmentation result. The results show that the segmentation precision is considerably higher than that of purely semantic FCN segmentation, with the accuracy improved by about 2%.
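The intersection step is simple to state in code; the depth window around the body's median depth is an assumed heuristic, since the paper's exact depth-segmentation rule is not given here.

import numpy as np

def fuse_masks(depth, semantic_mask, window=0.3):
    """Take the median depth inside the semantic mask, keep pixels
    within +/- window meters of it, and intersect with the mask."""
    body_depth = np.median(depth[semantic_mask])
    depth_mask = np.abs(depth - body_depth) < window
    return semantic_mask & depth_mask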
Quickly obtaining precise local images of the main structures within the skull from a whole set of computed tomography (CT) images is an important and difficult task in skull CT analysis. A local segmentation method for skull CT images based on morphological processing and the sparse-field level set is presented in this paper. First, various morphological operations remove unnecessary regions and produce a rough local image of the target region. Then, taking its contour as the initial evolution curve, the sparse-field level set method segments the skull CT image, so precise local regions of the main structures within the skull can be obtained, such as the occipital bone, which is prone to injury. Moreover, because adjacent CT slices are very similar, the target contour obtained from one segmentation can serve as the initial contour for the next slice, which speeds up segmentation of the whole CT set and saves considerable time in clinical diagnosis. The experimental results show that the proposed method is feasible and effective.
As a fundamental step toward successful human detection in thermal infrared imagery, image segmentation remains difficult. In this paper, we propose an approach for segmenting infrared human sequences captured by a camera mounted on a moving platform. The approach first detects moving regions that may contain humans via motion compensation and frame differencing. Next, it detects static regions that may also contain humans in the same frame via intensity thresholding. Finally, it fuses the outputs of the two stages with morphological operations to produce the final segmentation. Experimental results indicate that the proposed approach outperforms its rivals in segmentation accuracy.
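A compact sketch of the three stages on 8-bit gray frames, assuming the motion-compensating affine warp has already been estimated (e.g., from tracked background features); both thresholds are illustrative.

import cv2

def segment_humans(prev_frame, frame, warp, diff_thresh=25, hot_thresh=200):
    """Motion compensation + frame differencing for moving regions,
    intensity thresholding for static warm regions, morphological
    fusion of the two."""
    h, w = frame.shape
    stabilized = cv2.warpAffine(prev_frame, warp, (w, h))  # compensate ego-motion
    moving = cv2.threshold(cv2.absdiff(frame, stabilized),
                           diff_thresh, 255, cv2.THRESH_BINARY)[1]
    static = cv2.threshold(frame, hot_thresh, 255, cv2.THRESH_BINARY)[1]
    fused = cv2.bitwise_or(moving, static)
    kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (5, 5))
    return cv2.morphologyEx(fused, cv2.MORPH_CLOSE, kernel)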
In infrared image segmentation, spectral clustering must compute a similarity matrix between pixels; the amount of data is large and the computation time-consuming. To solve this problem, an improved spectral-clustering segmentation algorithm for infrared images based on a sparse similarity matrix is proposed. The algorithm combines whole-image features with the relationships between pixels: a convolutional network extracts feature information from the infrared image, the selected features are used to construct a sparse similarity matrix, and segmentation is completed by spectral clustering. Experimental results show that the algorithm effectively reduces the computational complexity of spectral clustering and improves the segmentation of the target region in infrared images.
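With scikit-learn, a sparse similarity matrix can be realized as a k-nearest-neighbor affinity graph over per-pixel features. Treating the conv-net features as given, a sketch (cluster count and neighbor count are illustrative):

import numpy as np
from sklearn.cluster import SpectralClustering

def sparse_spectral_segment(features, h, w, k=2, n_neighbors=10):
    """features: (h*w, d) per-pixel feature vectors. The k-NN affinity
    keeps the similarity matrix sparse instead of dense pixel-pixel."""
    sc = SpectralClustering(n_clusters=k, affinity="nearest_neighbors",
                            n_neighbors=n_neighbors,
                            assign_labels="kmeans", random_state=0)
    return sc.fit_predict(features).reshape(h, w)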
We present a model for real-time pedestrian detection based on a deep learning framework. For the feature-extraction backbone, we improve on MobileNet, a simple and fast convolutional neural network: we keep only the front part of the network and build several new multi-scale convolutional layers on top of it to compute multi-scale feature maps. For the detection network behind the feature extraction, we use a simplified SSD (single shot multibox detector) that detects pedestrians from fewer feature maps. In addition, we design detection boxes with specific sizes matched to pedestrians' shape characteristics. To avoid overfitting, we apply data augmentation and dropout during training. Experimental results on PASCAL VOC and KITTI confirm that the speed of our detection model increases by 22.2% while precision remains almost unchanged. Our approach trades off speed against precision and has a clear speed advantage over other detection approaches.
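A sketch of the backbone idea with torchvision's MobileNetV2: keep the front blocks and append stride-2 convolutions that supply extra, coarser maps to the SSD heads. The cut point and channel widths are illustrative assumptions, not the paper's configuration.

import torch
import torch.nn as nn
from torchvision.models import mobilenet_v2

front = mobilenet_v2(weights=None).features[:7]   # early MobileNet blocks
extra = nn.ModuleList([
    nn.Sequential(nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU()),
    nn.Sequential(nn.Conv2d(64, 128, 3, stride=2, padding=1), nn.ReLU()),
])

x = torch.randn(1, 3, 300, 300)
maps = [front(x)]                                 # first detection scale
for block in extra:                               # progressively coarser scales
    maps.append(block(maps[-1]))
print([tuple(m.shape) for m in maps])             # feature maps for SSD heads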
To achieve fast and accurate segmentation of ship targets in images, a segmentation method based on fuzzy entropy and salient-region extraction is proposed. First, multi-level fuzzy entropy optimized by differential evolution is applied to obtain a quick image segmentation. Then, to obtain the seed points, a saliency detection method based on dual pyramids and feature fusion is used, and the target core region is generated by a morphological opening by reconstruction together with the regional maximum. Finally, the segmentation results are binarized layer by layer and combined, and the region with the largest overlap with the target core region is selected as the target segmentation result. The experimental results show that the new method achieves fast and accurate segmentation of ship targets in a variety of complex scenes.
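As a stand-in for the fuzzy-entropy step, multi-Otsu thresholding also selects several gray-level thresholds at once; the paper instead optimizes fuzzy entropy with differential evolution, so the sketch below only illustrates the multi-level-thresholding shape of the computation.

import numpy as np
from skimage.filters import threshold_multiotsu

def multilevel_segment(gray, classes=3):
    """Pick several gray-level thresholds, then label each pixel with
    the index of the interval it falls into (0 .. classes-1)."""
    thresholds = threshold_multiotsu(gray, classes=classes)
    return np.digitize(gray, bins=thresholds)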