This PDF file contains the front matter associated with SPIE Proceedings Volume 11526, including the Title Page, Copyright information, and Table of Contents.
Access to the requested content is limited to institutions that have purchased or subscribe to SPIE eBooks.
You are receiving this notice because your organization may not have SPIE eBooks access.*
*Shibboleth/Open Athens users: please sign in to access your institution's subscriptions.
To obtain this item, you may purchase the complete book in print or electronic format on SPIE.org.
Fifth International Workshop on Pattern Recognition
Traditional discernibility criteria for 2D targets are mostly based on the Johnson criterion. To overcome the limitations of the Johnson criterion and fill the gap for 3D point clouds, a novel discernibility criterion for 3D point clouds is proposed. The spatial distribution of the 3D point cloud is described by its multifractal spectrum. By analyzing the multifractal spectra at different resolutions, the feature trend and the final discernible resolution are determined. The experimental results show that the limiting resolution is 585 mm for T90 and F15C, 517 mm for T90 and Rexton, and 541 mm for F15C and Rexton. The proposed criterion can provide theoretical support for determining the limiting identification resolution of 3D point cloud targets.
Graph matching is a classical NP-hard problem, and it plays an important role in many applications in computer science. In this paper, we propose an approximate graph matching method. For two graphs to be matched, our method first constructs an association graph with nodes representing the candidate correspondences between the two original graphs. It then constructs an affinity matrix based on the local and global distance information between the original graphs’ nodes. Each element of the matrix represents the mutual consistency of a pair of nodes of the association graph. After simulating random walks on the association graph, a stable quasi-stationary distribution is obtained. With the Hungarian algorithm, our method finally discretizes the distribution to achieve an approximate matching between the two original graphs. Experiments on two commonly used datasets demonstrate the effectiveness of our method on graph matching.
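The pipeline in this abstract (association graph, affinity matrix, random walk, discretization) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the affinity uses only pairwise Euclidean distances with a Gaussian kernel, the quasi-stationary distribution is approximated by normalized power iteration on the affinity matrix, and an exhaustive search over permutations stands in for the Hungarian algorithm, so it is only practical for very small graphs. All function names and the `sigma` parameter are hypothetical.

```python
import itertools
import math


def affinity_matrix(pts1, pts2, sigma=0.01):
    """Affinity between candidate correspondences (i -> a) and (j -> b).

    Assumption: consistency is measured only by how well the distance
    between nodes i and j in graph 1 agrees with the distance between
    nodes a and b in graph 2.
    """
    n1, n2 = len(pts1), len(pts2)
    size = n1 * n2
    W = [[0.0] * size for _ in range(size)]
    for i in range(n1):
        for a in range(n2):
            for j in range(n1):
                for b in range(n2):
                    if i == j or a == b:
                        continue  # conflicting correspondences get zero affinity
                    diff = math.dist(pts1[i], pts1[j]) - math.dist(pts2[a], pts2[b])
                    W[i * n2 + a][j * n2 + b] = math.exp(-diff * diff / sigma)
    return W


def quasi_stationary(W, iters=100):
    """Normalized power iteration x <- xW, started from the uniform distribution."""
    size = len(W)
    x = [1.0 / size] * size
    for _ in range(iters):
        y = [sum(x[u] * W[u][v] for u in range(size)) for v in range(size)]
        total = sum(y)
        x = [value / total for value in y]
    return x


def discretize(x, n):
    """Pick the one-to-one assignment with the largest total score.

    Exhaustive search, used here in place of the Hungarian algorithm.
    """
    best = max(itertools.permutations(range(n)),
               key=lambda p: sum(x[i * n + p[i]] for i in range(n)))
    return list(best)


def match_graphs(pts1, pts2):
    """Return match[i] = index in pts2 assigned to node i of pts1 (equal sizes)."""
    x = quasi_stationary(affinity_matrix(pts1, pts2))
    return discretize(x, len(pts1))
```

For anything beyond a handful of nodes, the exhaustive `discretize` step would need to be replaced by a proper assignment solver.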
Exemplar-based voice conversion (VC) methods have several disadvantages: too many exemplars, phoneme mismatches, and low conversion efficiency. To solve these problems, this paper proposes a voice conversion method based on nonnegative matrix factorization (NMF) with dictionary optimization and clustering, which uses low-resolution features instead of high-resolution features to construct dictionaries. Dictionary optimization, based on minimizing cepstral distortion, selects better-fitting exemplars from the original dictionary. Exemplar clustering divides the dictionary into multiple sub-dictionaries that offer better representation based on feature parameters. The ARCTIC database is used for experiments. Results show that the proposed method can significantly improve the quality of converted speech while reducing the number of exemplars and improving efficiency.
As an important component of intelligent transportation, license plate recognition plays an irreplaceable role in daily life. For example, vehicles involved in violations often escape punishment because their plates are defaced or intentionally occluded, which increases the difficulty of law enforcement. It is therefore important for automatic recognition systems to improve the identification of contaminated or occluded license plates. This paper focuses on the recognition of occluded plates. License plates can be divided into four categories: normal, partially occluded, completely occluded, and unsuspended plates. Traditional OCR algorithms achieve high accuracy in recognizing Chinese characters, letters, and numbers. Although OCR also performs well on normal and partially occluded plates, its recognition of completely occluded and unsuspended plates is still very poor. With the development of artificial intelligence, it has become possible to identify completely occluded and unsuspended plates more reliably. Combining the advantages of both approaches, this paper uses traditional OCR together with a current deep learning algorithm to improve the recognition of stained license plates.
Many traditional object detection methods exist, but they do not meet the accuracy and speed demands of real-life scenarios. Compared with a mobile platform, a cloud service is also less suited to practical use. Therefore, we optimize the YOLO (You Only Look Once, a method for real-time object detection) algorithm through renormalization processing, build a Chinese road sign dataset, and apply random affine transformation, random blur, and brightness transformation to the dataset to enhance the generalization ability of the final model. The model parameters are fine-tuned to reduce the training time and improve performance. Finally, the object detection model is ported to an iOS mobile terminal to meet the real-time and accuracy requirements of autonomous driving scenarios. We identify three types of road objects. The detection accuracy for pedestrians in road scenes reaches 75.9%, the average detection accuracy for buses, cars, bicycles, and motorcycles is 72%, and the detection accuracy for road signs is 69%. The total accuracy is 74.31%. The average detection rate in tests on mobile phones is 12.5 frames per second.
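The three augmentations named in this abstract (random affine transformation, random blur, brightness transformation) can be illustrated with a small standard-library sketch on grayscale images stored as nested lists. The parameter ranges, the 3x3 box blur, and all function names are assumptions chosen for illustration; the abstract does not specify them. For a detection dataset, the bounding boxes would also have to be transformed along with the image, which this sketch omits.

```python
import random


def adjust_brightness(img, factor):
    """Scale every pixel by `factor` and clip to the 0..255 range."""
    return [[min(255, max(0, round(p * factor))) for p in row] for row in img]


def box_blur(img):
    """3x3 mean filter; border pixels average over the neighbours that exist."""
    h, w = len(img), len(img[0])
    out = []
    for y in range(h):
        row = []
        for x in range(w):
            vals = [img[yy][xx]
                    for yy in range(max(0, y - 1), min(h, y + 2))
                    for xx in range(max(0, x - 1), min(w, x + 2))]
            row.append(round(sum(vals) / len(vals)))
        out.append(row)
    return out


def affine(img, a, b, c, d, tx, ty, fill=0):
    """Nearest-neighbour affine warp by inverse mapping.

    Output pixel (x, y) is read from source (a*x + b*y + tx, c*x + d*y + ty);
    pixels that map outside the image are set to `fill`.
    """
    h, w = len(img), len(img[0])
    out = [[fill] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            sx = round(a * x + b * y + tx)
            sy = round(c * x + d * y + ty)
            if 0 <= sx < w and 0 <= sy < h:
                out[y][x] = img[sy][sx]
    return out


def augment(img, rng):
    """Apply a random scale/shift, an optional blur, and a random brightness change.

    The ranges below are illustrative assumptions, not values from the paper.
    """
    scale = rng.uniform(0.9, 1.1)
    tx, ty = rng.uniform(-1.0, 1.0), rng.uniform(-1.0, 1.0)
    out = affine(img, scale, 0.0, 0.0, scale, tx, ty)
    if rng.random() < 0.5:
        out = box_blur(out)
    return adjust_brightness(out, rng.uniform(0.7, 1.3))
```

Passing a seeded `random.Random` instance as `rng` keeps the augmentation reproducible between runs.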
The main task of object detection is to identify and locate objects of interest in still images or video sequences. It is one of the key tasks in computer vision. However, objects vary in brightness, shape, occlusion, and so on, and are subject to diverse and complex environmental interference, so research on object detection algorithms presents both opportunities and challenges. In this paper, the main framework of regression-based object detection with convolutional neural networks is studied. We propose a real-time object detection algorithm based on a fully convolutional network, which aims to solve the problems of low detection accuracy and poor localization accuracy in regression methods. The innovation is that the proposed fully convolutional network increases the detection flexibility of the model because it is not affected by the input scale. In addition, we propose a multi-feature fusion and multi-border prediction strategy, which effectively improves the detection accuracy for small objects. To demonstrate the effectiveness of the proposed algorithm, we carry out object detection experiments on the PASCAL VOC dataset and calculate the accuracy of each object category and the average accuracy over all categories. Experiments show that the multi-feature fusion algorithm based on the fully convolutional network outperforms regression-based methods such as YOLO, with accuracy more than 10% higher than that of the YOLO model.
Four-dimensional (4D) flight trajectories play an important role in future air traffic plans. In this paper, the characteristics of the time and altitude variables in 4D trajectories are analyzed, a procedure for preprocessing flight trajectory data is provided, and support vector regression and decision tree regression are introduced to build prediction models for trajectory time and altitude, respectively. Experiments on actual flight trajectory data demonstrate that the proposed method can effectively improve 4D trajectory prediction accuracy.
Hierarchical models with HMMs have the advantage of recognizing Chinese characters in digital ink from non-native writers. However, recognition performance has been limited by the generative nature of the HMM. In this paper, we apply the Hidden Conditional Random Field (HCRF) to improve the performance of hierarchical models. First, the strokes in a Chinese character are classified with HCRF and then concatenated into a stroke symbol sequence. Meanwhile, the component structure of the ink character is extracted. According to the extraction result and the stroke symbol sequence, candidate characters are traversed and scored. Finally, the candidate recognition results are listed in descending order of score. The proposed approach is validated on 19815 samples of handwritten Chinese characters written by foreign students.
Sign language recognition is challenging due to the scarcity of annotated corpora and the difficulty of handling a large vocabulary. In this paper, we study the task on DEVISIGN, a Chinese sign language (SL) database, which has only a few samples for training a deep network from scratch. First, we segment the hand to eliminate the disturbance of irrelevant factors. By analyzing the characteristic movement tendencies of sign words, we propose two novel key-frame selection schemes. Since no other dataset has a data distribution similar to our preprocessed data, we devise a novel cross-sampling approach, which successfully prevents overfitting with small samples. To enhance the diversity of the data, we take several sampling-based videos as input and learn spatiotemporal features with an 18-layer R(2+1)D network, which has been successful in action recognition tasks. Finally, we show that our solution achieves state-of-the-art performance.
In this paper, a classification method for rotor imbalance fault (RIF) using a support vector machine (SVM) is proposed. It adopts an improved shuffled frog-leaping algorithm (ISFLA) to optimize the parameters of the SVM. Because the original SFLA suffers from a nonuniform initial population and a tendency to become trapped in local optima, several improvements are introduced in ISFLA-SVM. ISFLA employs random uniform design (RUD) to generate the initial population. In addition, the updating strategy of Xw in each subgroup is changed so that the global optimum can be found. The performance of three classification algorithms, namely particle swarm optimization (PSO)-SVM, SFLA-SVM, and ISFLA-SVM, is compared. Analysis results show that ISFLA-SVM has the highest recognition accuracy.
In this article, an expression recognition algorithm based on feature fusion is proposed. First, 40 sets of Gabor filters are applied to the expression images to enhance their texture features. Local Binary Pattern (LBP) operators are then used to extract features from the filtered image output by each Gabor channel, yielding LBP feature maps. These feature maps are taken as the input of a convolutional neural network, which is then trained. Finally, the input to the fully connected layer of the trained network is extracted as the feature of the expression image, and these features are classified using the extreme learning machine algorithm. The experimental results show that the proposed method outperforms methods using a single feature and can effectively improve the recognition rate in expression recognition.
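The LBP step mentioned in this abstract can be illustrated with the basic 8-neighbour operator on a grayscale image stored as nested lists. This is a sketch under assumptions: the abstract does not state which LBP variant, radius, or bit ordering is used, and the Gabor filtering that precedes it is not shown. The function names are hypothetical.

```python
def lbp_code(patch):
    """Basic 8-neighbour LBP code of a 3x3 patch (three rows of three values).

    Neighbours are visited clockwise starting at the top-left corner; a
    neighbour that is >= the centre sets the corresponding bit. The bit
    ordering is a convention chosen for this sketch.
    """
    center = patch[1][1]
    neighbours = [patch[0][0], patch[0][1], patch[0][2], patch[1][2],
                  patch[2][2], patch[2][1], patch[2][0], patch[1][0]]
    code = 0
    for bit, value in enumerate(neighbours):
        if value >= center:
            code |= 1 << bit
    return code


def lbp_map(img):
    """LBP code for every interior pixel; the result is 2 smaller in each dimension."""
    h, w = len(img), len(img[0])
    return [[lbp_code([row[x - 1:x + 2] for row in img[y - 1:y + 2]])
             for x in range(1, w - 1)]
            for y in range(1, h - 1)]


def lbp_histogram(img):
    """256-bin histogram of the LBP codes, usable as a texture descriptor."""
    hist = [0] * 256
    for row in lbp_map(img):
        for code in row:
            hist[code] += 1
    return hist
```

In the setting described above, `lbp_map` would be applied to each Gabor-filtered image rather than to the raw image.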
Visual question answering (VQA) is a task of significant importance for research in artificial intelligence. However, most studies use simple gated recurrent units (GRU) to extract high-level question or image features, which is not sufficient for achieving better performance. In this paper, two improvements are proposed to a general VQA model based on the dynamic memory network (DMN). First, we initialize the question module of our model using a pre-trained language model. Second, we use a new module to replace the GRU in the input fusion layer of the input module. Experimental results demonstrate the effectiveness of our method, with an improvement of 1.52% over the baseline on the Visual Question Answering V2 dataset.
Preliminary research shows that the linear prediction (LP) residual contains more discriminative information related to replay spoofing attacks, so this paper proposes three features for replay spoofing countermeasures based on the LP residual and IMel filter banks, which are densely distributed in the high-frequency regions. They are the residual IMel frequency cepstral coefficient (RIMFC), the LP residual Hilbert envelope IMel frequency cepstral coefficient (LHIMFC), and the residual phase cepstral coefficient (RPC). The effectiveness of these features is demonstrated on the ASVspoof 2017 Challenge Version 2.0 dataset. Experimental results indicate that the proposed features outperform the baseline system using the constant Q cepstral coefficient (CQCC), and the equal error rate (EER) is reduced under the same conditions. Moreover, feature fusion achieves higher performance than the traditional IMel frequency cepstral coefficient (IMFCC) and CQCC, which indicates that the complementary information of different features is beneficial for detecting replay attacks.
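The LP residual on which these features are built can be computed as sketched below, using the autocorrelation method with the Levinson-Durbin recursion. This is an illustrative assumption about how the residual is obtained; the abstract does not state the LP order, windowing, or estimation method, and the later steps (IMel filter banks, Hilbert envelope, cepstral coefficients) are not shown. The function names are hypothetical.

```python
def autocorr(x, max_lag):
    """Autocorrelation r[0..max_lag] of a finite signal."""
    return [sum(x[n] * x[n - k] for n in range(k, len(x)))
            for k in range(max_lag + 1)]


def levinson_durbin(r, order):
    """Solve the LP normal equations.

    Returns coefficients a[0..order-1] such that the prediction is
    x_hat[n] = sum(a[k] * x[n - 1 - k] for k in range(order)).
    """
    a = [0.0] * (order + 1)  # a[1..order] hold the coefficients while iterating
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] - sum(a[j] * r[i - j] for j in range(1, i))
        k = acc / err  # reflection coefficient
        new_a = a[:]
        new_a[i] = k
        for j in range(1, i):
            new_a[j] = a[j] - k * a[i - j]
        a = new_a
        err *= 1.0 - k * k
    return a[1:]


def lp_residual(x, a):
    """Residual e[n] = x[n] - x_hat[n]; samples before the signal start count as zero."""
    res = []
    for n in range(len(x)):
        pred = sum(a[k] * x[n - 1 - k] for k in range(len(a)) if n - 1 - k >= 0)
        res.append(x[n] - pred)
    return res
```

In practice the signal would be processed frame by frame, typically with a window applied before the autocorrelation is computed.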
To achieve camera calibration, the procedure for calculating the camera's internal and external parameters was derived from an established camera calibration model. The calibration model was simplified using coplanar points. With a distortion model and the Levenberg-Marquardt algorithm, the accuracy of the system calibration was improved. The experimental results showed that the calibration error was smaller and the error data were more concentrated, achieving accurate calibration of the camera.
Driven by recent computer vision applications, recovering 3D pose in figure skating has become increasingly important. However, conventional approaches suffer because they derive 3D information directly from the corresponding 2D information or ignore the specific characteristics of the sport. Issues such as self-occlusion, abnormal poses, and venue limitations lead to poor results. Motivated by these problems, this paper proposes a multitask architecture based on a calibrated multi-camera system to facilitate joint estimation of the 3D jump pose of a figure skater in the presence of the 2D Part Confidence Map. The proposal consists of three key components: discrete probability point selection based on temporal smoothness and likelihood distribution; large-scale venue 3D reconstruction based on multi-perspective and combination unification; and human skeleton estimation based on spatial confidence point groups and multiple constraints. This work can be applied to 3D animated display and video motion capture of figure skating competitions. The accuracy on the test sequences is 82.32% at the body level and 92.96% at the joint level.
This paper studies the deployment strategy for software components in distributed systems. Based on genetic algorithms, an Intelligent Deployment Strategy is designed to optimize the allocation and deployment of software components so that distributed systems achieve load balancing and high efficiency. Simulation results show that the Intelligent Deployment Strategy achieves better allocation of system resources than the common Round-Robin scheduling strategy.
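The comparison in this abstract can be illustrated with a toy model in which each software component has a fixed cost and the goal is to minimize the load of the busiest node. Everything here is an assumption made for illustration: the cost model, the fitness function, the genetic operators, and all names and parameter values are hypothetical and are not taken from the paper. The initial population includes the round-robin assignment and the best half of each generation is carried over unchanged, so the result is never worse than round-robin under this fitness.

```python
import random


def node_loads(assignment, costs, n_nodes):
    """Total cost placed on each node; assignment[i] is the node of component i."""
    loads = [0.0] * n_nodes
    for component, node in enumerate(assignment):
        loads[node] += costs[component]
    return loads


def max_load(assignment, costs, n_nodes):
    """Fitness to minimize: the load of the busiest node."""
    return max(node_loads(assignment, costs, n_nodes))


def round_robin(costs, n_nodes):
    """Baseline: component i goes to node i mod n_nodes."""
    return [i % n_nodes for i in range(len(costs))]


def genetic_deploy(costs, n_nodes, pop_size=30, generations=200,
                   mutation_rate=0.1, seed=0):
    """Toy genetic algorithm with elitism, one-point crossover, and random mutation.

    Requires at least two components.
    """
    rng = random.Random(seed)
    n = len(costs)

    def fitness(assignment):
        return max_load(assignment, costs, n_nodes)

    # Seed the population with the round-robin baseline plus random assignments.
    population = [round_robin(costs, n_nodes)]
    population += [[rng.randrange(n_nodes) for _ in range(n)]
                   for _ in range(pop_size - 1)]
    for _ in range(generations):
        population.sort(key=fitness)
        elite = population[:pop_size // 2]  # survivors are kept unchanged
        children = []
        while len(elite) + len(children) < pop_size:
            p1, p2 = rng.sample(elite, 2)
            cut = rng.randrange(1, n)
            child = p1[:cut] + p2[cut:]  # one-point crossover (new list)
            for gene in range(n):
                if rng.random() < mutation_rate:
                    child[gene] = rng.randrange(n_nodes)
            children.append(child)
        population = elite + children
    return min(population, key=fitness)
```

A realistic deployment model would also account for constraints such as communication cost between components and heterogeneous node capacity, which this sketch ignores.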
In this work, a video synthesis model based on Generative Adversarial Networks (Human GAN) is proposed, whose objective is to generate photorealistic output by learning the mapping function from an input source to an output video. Although image-to-image generation is a popular problem, the video synthesis problem is still unexplored. Directly applying existing image generation methods without taking temporal dynamics into account often leads to temporally incoherent output with low visual quality. The proposed approach solves this problem through carefully designed generators and discriminators combined with spatio-temporal adversarial objectives. Compared with strong baselines on public benchmarks, the proposed model proves superior in generating temporally coherent videos with very few artifacts. The results achieved by the proposed model are also more realistic on both quantitative and qualitative measures than those of existing baseline techniques.
The system uses the Unity3D engine to develop the Android app and the Vuforia toolkit to develop the AR modules, integrating geographic information service technology and panorama technology. We combine two tracking and registration methods, one based on sensors and one based on natural image features, to implement a tourism navigation and AR introduction system for Kulangsu, a famous scenic spot in Xiamen. The system is divided into two main modules: the route navigation module and the scenic spot guide module.