This PDF file contains the front matter associated with SPIE Proceedings Volume WPR19, including the Title Page, Copyright information, Table of Contents, and Author and Conference Committee lists.
Access to the requested content is limited to institutions that have purchased or subscribe to SPIE eBooks.
You are receiving this notice because your organization may not have SPIE eBooks access.*
*Shibboleth/Open Athens users: please sign in to access your institution's subscriptions.
To obtain this item, you may purchase the complete book in print or electronic format on SPIE.org.
The Bayer color filter array (CFA) pattern is the most widely used CFA pattern in the digital color camera market. Chroma 4:2:0 subsampling of Bayer CFA images is a necessary step prior to compression. In this paper, based on a CFA block-distortion minimization criterion, we propose an effective region-based chroma 4:2:0 subsampling method for Bayer CFA images. Experimental results on the Kodak and IMAX test datasets demonstrate that, in the current high efficiency video coding (HEVC) reference software HM-16.18, our method delivers substantially better reconstructed-image quality than six existing chroma subsampling methods.
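As context for the subsampling step, the baseline (non-region-based) chroma 4:2:0 operation simply averages each 2x2 block of a chroma plane; the paper's contribution is to replace this uniform averaging with a selection that minimizes CFA block distortion. A minimal sketch of the baseline, for illustration only:

```python
import numpy as np

def subsample_420(chroma):
    """Baseline chroma 4:2:0 subsampling: average each 2x2 block.

    `chroma` is an (H, W) plane with even H and W.  The paper's
    region-based method replaces this uniform averaging with a
    CFA block-distortion minimizing choice per region.
    """
    h, w = chroma.shape
    blocks = chroma.reshape(h // 2, 2, w // 2, 2)
    return blocks.mean(axis=(1, 3))

u = np.array([[1., 3., 5., 7.],
              [1., 3., 5., 7.]])
print(subsample_420(u))  # [[2. 6.]]
```

Each output sample stands in for four input samples, which is what makes the subsampling decision matter for reconstruction quality.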
Counterfeit protection plays an essential role in industry. Manufacturers and sellers of counterfeit products gain income by misleading consumers into buying replicas and Class A copies. This holds true for designer clothes, bags, jewelry, shoes, etc. Counterfeit products are produced with the intent to take advantage of the superior value of the original product. In this paper, the researchers propose a method of detecting counterfeit shoes, specifically the Adidas Stan Smith and Gazelle models. The shoes are captured in specific areas such as the midsole, insole, quarter, tongue, sole, and heel cap, where the Adidas logo or trademark is present. This study uses image processing techniques such as the Circular Hough Transform, A-KAZE, and Optical Character Recognition. The results showed that the methods were successful in distinguishing an authentic shoe from a non-authentic shoe. The system achieved an accuracy of 93% and 96%, with error rates of 7% and 4%, for the Gazelle and Stan Smith respectively. Furthermore, a true positive rate of 100% for both shoes means that every authentic shoe was correctly identified as authentic. A false positive rate of 12.3% and 7.4% for the Gazelle and Stan Smith respectively indicates how often a shoe is predicted to be authentic when it is not.
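The reported metrics all derive from a binary confusion matrix. A short sketch of how accuracy, true positive rate, and false positive rate relate (the counts below are illustrative, not the paper's data):

```python
def rates(tp, fp, tn, fn):
    """Confusion-matrix metrics of the kind reported in the abstract.

    tp/fn: authentic shoes predicted authentic / non-authentic;
    fp/tn: non-authentic shoes predicted authentic / non-authentic.
    """
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    tpr = tp / (tp + fn)   # true positive rate (recall)
    fpr = fp / (fp + tn)   # false positive rate
    return accuracy, tpr, fpr

# Hypothetical counts for a 100-shoe test set.
acc, tpr, fpr = rates(tp=50, fp=4, tn=46, fn=0)
print(acc, tpr, fpr)  # 0.96 1.0 0.08
```

A TPR of 1.0 with a nonzero FPR, as in the paper's results, means no authentic shoe is missed while a small fraction of counterfeits slip through.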
With the great development of image display technology and the widespread use of various image acquisition devices, recapturing high-quality images from high-fidelity LCD (liquid crystal display) screens has become relatively convenient. These recaptured images pose serious threats to image forensic technologies and bio-authentication systems. To close the security loophole of image recapture attacks, and inspired by the effectiveness of LBP (local binary pattern) features for recaptured image detection and the strong performance of deep learning techniques on many image forensics tasks, we propose a recaptured image detection method based on convolutional neural networks with local binary pattern coding. The LBP coded maps are extracted as the input of the proposed convolutional neural network architecture. Extensive experiments on two public high-quality recaptured image databases under two different scenarios demonstrate the superiority of our method over state-of-the-art approaches.
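For reference, a basic 3x3 LBP coded map can be computed as below; LBP has several variants (radius, neighbor count, uniform patterns), and the abstract does not specify which one feeds the network, so this is the simplest form only:

```python
import numpy as np

def lbp_map(img):
    """Basic 3x3 local binary pattern coding.

    Each interior pixel is coded by thresholding its 8 neighbors
    against the center value and packing the bits clockwise from
    the top-left neighbor.
    """
    img = np.asarray(img, dtype=np.int32)
    h, w = img.shape
    out = np.zeros((h - 2, w - 2), dtype=np.uint8)
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    for i in range(1, h - 1):
        for j in range(1, w - 1):
            c = img[i, j]
            code = 0
            for bit, (di, dj) in enumerate(offsets):
                if img[i + di, j + dj] >= c:
                    code |= 1 << bit
            out[i - 1, j - 1] = code
    return out

patch = [[9, 9, 9],
         [0, 5, 0],
         [0, 0, 0]]
print(lbp_map(patch))  # [[7]]
```

The resulting code map, rather than the raw pixels, becomes the CNN input in the proposed pipeline.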
Convolutional neural networks have achieved great success in single image super-resolution. In this paper, we present a progressive approach that reconstructs a high resolution image and optimizes the network at each level. In addition, our method can generate multi-scale HR images with one feed-forward network. The proposed method also exploits the relationships among different scales, which helps our network perform well at large scaling factors. Experiments on a benchmark dataset demonstrate that our method achieves competitive performance against most state-of-the-art methods, especially for large scaling factors (e.g., 8×).
Fine-grained visual classification (FGVC) is difficult due to the under-utilization of low-level features. This paper proposes MBNet, a real-time method based on a multi-stream multi-scale cross bilinear CNN that helps solve this problem. First, the layers of the multi-stream CNN are extracted by a base network such as VGGNet; then the multi-stream cross bilinear vectors and bottom bilinear vectors are computed from the low- and high-level features respectively. The FGVC results are predicted after feature fusion, which addresses the problem that small, low-level details in the original image are easily overlooked. On the widely used Caltech-UCSD Birds, Stanford Cars, and Aircraft datasets, the proposed method significantly improves accuracy over existing methods, reaching state-of-the-art levels of 88.51%, 94.73%, and 92.41%. It also meets the requirements of real-time tasks.
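The core of any bilinear-CNN variant is bilinear pooling: a per-location outer product of two feature streams, summed over locations. A generic sketch with the usual signed-square-root and L2 normalization (standard bilinear-CNN practice, not MBNet's exact fusion):

```python
import numpy as np

def bilinear_pool(fa, fb):
    """Bilinear pooling of two feature streams.

    fa, fb: (H*W, Ca) and (H*W, Cb) feature matrices.  The sum of
    per-location outer products equals the matrix product fa.T @ fb.
    """
    b = fa.T @ fb                         # (Ca, Cb)
    v = b.reshape(-1)
    v = np.sign(v) * np.sqrt(np.abs(v))   # signed square-root
    n = np.linalg.norm(v)
    return v / n if n > 0 else v

fa = np.random.rand(49, 8)    # e.g. a 7x7 grid with 8 channels
fb = np.random.rand(49, 16)   # a second stream with 16 channels
v = bilinear_pool(fa, fb)
print(v.shape)  # (128,)
```

Cross bilinear vectors between streams at different levels, as in MBNet, pair low-level and high-level feature matrices in the same way.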
The purpose of this paper is to determine the meat quality of raw pork and beef by means of a gas sensor and Open Source Computer Vision for real-time pattern recognition, reinforcing meat quality detection. Nowadays, people rely on simple test methods to determine meat quality, including sensory evaluation and physical, chemical, and microbiological testing. Lipid oxidation is a reaction that takes place when oxygen has access to products containing fat or pigments. The main purpose of the study is to determine the quality of raw pork and beef via different but effective methods. Subsequently, the oxidation pattern of the meat was also investigated.
In this paper, an improved threshold function is proposed to address the discontinuity of the hard threshold function and the constant deviation between the original and estimated wavelet coefficients in the soft threshold function. The proposed function remedies these deficiencies of the traditional threshold functions by introducing control coefficients, which give it a degree of flexibility. Experimental results show that the improved wavelet threshold function removes Gaussian image noise better than the traditional threshold functions, and both the PSNR (peak signal-to-noise ratio) and EPI (edge preservation index) values of the denoised images are improved.
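The two defects named above are easy to see in code: hard thresholding jumps at |w| = t, while soft thresholding shrinks every surviving coefficient by t. One common family of improved functions with a control coefficient interpolates between them; the form below is illustrative and not necessarily the exact function proposed in the paper:

```python
import numpy as np

def hard_threshold(w, t):
    """Hard thresholding: keeps w unchanged but is discontinuous at |w| = t."""
    return np.where(np.abs(w) > t, w, 0.0)

def soft_threshold(w, t):
    """Soft thresholding: continuous, but every kept coefficient is
    shrunk by t (the constant-deviation problem)."""
    return np.sign(w) * np.maximum(np.abs(w) - t, 0.0)

def improved_threshold(w, t, a=2.0):
    """An illustrative improved threshold with control coefficient `a`:
    continuous at |w| = t like the soft rule, but the shrinkage decays
    exponentially, so large coefficients approach the hard rule."""
    shrink = t * np.exp(-a * (np.abs(w) - t))
    return np.where(np.abs(w) > t, np.sign(w) * (np.abs(w) - shrink), 0.0)

w = np.array([-3.0, -0.5, 0.5, 3.0])
print(hard_threshold(w, 1.0))      # [-3.  0.  0.  3.]
print(soft_threshold(w, 1.0))      # [-2.  0.  0.  2.]
print(improved_threshold(w, 1.0))  # large |w| stays close to w
```

At |w| = t the exponential shrinkage equals t, so the improved rule outputs 0 there, matching the soft rule's continuity while avoiding its constant bias for large coefficients.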
Recognizing fine-grained categories is difficult due to the challenges of discriminative region localization and fine-grained feature learning. To handle this, we propose a novel model termed SDN-Net (SE-DPN-Navigator Network), which consists of DPN (Dual Path Networks), SE-blocks (Squeeze-and-Excitation blocks), and a Navigator. DPN shares common features while maintaining the flexibility to explore new features. We add SE-blocks to DPN to form the SE-DPN, which acts as the feature extractor of the proposed model; the SE-blocks help the model use global information to selectively emphasize informative features and suppress less useful ones. We also use a Navigator to help the model detect the most informative regions without extra bounding box/part annotations. Our model can be trained end-to-end. With the cooperation of these three components, we achieve state-of-the-art performance on two publicly available fine-grained recognition datasets (CUB-200-2011 and Stanford Cars). In addition, we conducted ablation studies confirming the effectiveness of each component of the proposed model.
Many clustering algorithms work well on small data sets of fewer than 200 data objects. However, a large database may contain millions of objects, and clustering on such a large data set may lead to biased results. As data volumes and availability continue to grow, so does the need for large-dataset analytics. Among the most commonly used clustering algorithms, k-means has proved to be one of the most popular choices, providing acceptable results in a reasonable amount of time. In this paper, we present an improved k-means algorithm with better initial centroids, and we implement this modified algorithm on the Hadoop platform. Experiments show that the improved k-means algorithm converges faster than classic k-means, and the average execution time is reduced compared to traditional k-means.
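"Better initial centroids" can mean several seeding schemes; one simple, deterministic example is farthest-first initialization, sketched below on 1-D data. This is an illustration of the general idea, not necessarily the paper's exact scheme:

```python
def farthest_first_init(points, k):
    """Deterministic farthest-first seeding for k-means.

    Start from the point closest to the global mean, then repeatedly
    add the point farthest from its nearest chosen centroid.  Compared
    with uniform random seeding, this spreads the initial centroids
    across the data, which typically speeds up convergence.
    """
    mean = sum(points) / len(points)
    centroids = [min(points, key=lambda p: abs(p - mean))]
    while len(centroids) < k:
        nxt = max(points, key=lambda p: min(abs(p - c) for c in centroids))
        centroids.append(nxt)
    return centroids

# Two well-separated 1-D clusters: the seeds land one per cluster.
pts = [0.0, 0.1, 0.2, 10.0, 10.1, 10.2]
print(sorted(farthest_first_init(pts, 2)))  # [0.2, 10.2]
```

On Hadoop, the same seeding can be computed in a preliminary MapReduce pass before the iterative assignment/update rounds.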
Recently, the complex-valued convolutional neural network (CV-CNN) has been used for the classification of polarimetric synthetic aperture radar (PolSAR) images and has shown performance superior to most traditional algorithms. However, it usually yields unreliable results for pixels lying within heterogeneous regions or edge areas. To solve this problem, this paper combines the CV-CNN with an edge reassigning scheme based on Markov random fields (MRF). In this scheme, both the polarimetric statistical properties and label context information are employed. Experiments performed on a benchmark PolSAR image of Flevoland demonstrate the superior performance of the proposed algorithm.
This paper presents a new unsupervised classification framework based on tensor product graph (TPG) diffusion, which is generally used for optical image segmentation or image retrieval and is applied to PolSAR image classification for the first time in our work. First, the PolSAR image is divided into many superpixels using a fast superpixel segmentation method. Second, seven features are extracted from the PolSAR image to form a feature vector for each segmented superpixel, and a similarity matrix is constructed using a Gaussian kernel. Third, TPG diffusion is performed on this similarity matrix to obtain a more discriminative similarity matrix by mining the higher-order information between data points. Finally, spectral clustering on the diffused similarity matrix automatically yields the classification results. Experimental results on both a simulated and a real-world PolSAR image demonstrate that our algorithm can effectively exploit higher-order neighborhood information and achieve higher classification accuracy.
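Steps two and three can be sketched compactly. The commonly cited TPG formulation iterates A ← S A Sᵀ + I with S a normalized affinity; the damping factor `alpha` below is an assumption added here to guarantee convergence, and the paper may normalize differently:

```python
import numpy as np

def tpg_diffusion(w, alpha=0.9, iters=100):
    """Iterative tensor product graph diffusion on a similarity matrix.

    A <- (alpha*S) A (alpha*S)^T + I, with S the row-normalized
    affinity.  alpha < 1 keeps the iteration convergent (a
    simplification of the standard TPG scheme).
    """
    s = alpha * w / w.sum(axis=1, keepdims=True)
    a = np.eye(len(w))
    for _ in range(iters):
        a = s @ a @ s.T + np.eye(len(w))
    return a

# Gaussian-kernel similarity from 1-D feature values, as in the
# paper's second step (sigma = 0.5 is an illustrative choice).
x = np.array([0.0, 0.1, 5.0])
w = np.exp(-(x[:, None] - x[None, :]) ** 2 / (2 * 0.5 ** 2))
a = tpg_diffusion(w)
print(a[0, 1] > a[0, 2])  # True: near points stay more similar
```

The diffused matrix then feeds spectral clustering exactly as the original similarity matrix would, but with higher-order neighborhood relations folded in.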
The diagnosis of skull fracture is mainly made by analyzing scanned images of the skull, so it is essentially a special image classification problem. Recently, image classification methods based on deep learning have achieved good performance on general image classification. However, applying these methods directly to skull fracture diagnosis is unsatisfactory, because fracture regions are difficult to distinguish from the background in a scanned image: the extracted features of skull fractures and of the background are very similar. To solve these problems, this paper proposes a novel skull fracture image classification approach that combines an attention mechanism, the proposed multi-scale transfer learning, and a residual network (ResNet), called attention-based multi-scale transfer ResNet (AMT-ResNet). In AMT-ResNet, the attention mechanism gives different weights to the feature information extracted by ResNet. In addition, the proposed multi-scale transfer learning extracts common features from multi-scale skull fracture images. Our approach is evaluated on datasets provided by Fujian Medical University Union Hospital. Experimental results show that AMT-ResNet obtains better classification accuracy than other methods on skull fracture image classification.
The measurement of trajectory distance is the basis of trajectory clustering. To address flight trajectory clustering in air traffic, a novel method is proposed in this paper to measure flight trajectory distance. This method views a trajectory as a set of segments whose end points are trajectory points; it measures the distance from a trajectory point to another trajectory and from that derives a distance definition between trajectories. Based on the calculated distance matrix, a spectral clustering algorithm is adopted to cluster flight trajectories. Experiments on actual flight trajectory data verify the effectiveness of the proposed method.
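The segment-based view described above can be sketched as follows; the final symmetric aggregation (averaging both directions) is one plausible reading of the paper's definition, and the exact aggregation may differ:

```python
import math

def point_segment_dist(p, a, b):
    """Euclidean distance from 2-D point p to segment ab."""
    ax, ay = a; bx, by = b; px, py = p
    dx, dy = bx - ax, by - ay
    if dx == 0 and dy == 0:
        return math.hypot(px - ax, py - ay)
    # Projection parameter clamped to the segment.
    t = max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / (dx * dx + dy * dy)))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))

def point_traj_dist(p, traj):
    """Distance from a trajectory point to another trajectory,
    viewed as the set of segments between consecutive points."""
    return min(point_segment_dist(p, traj[i], traj[i + 1])
               for i in range(len(traj) - 1))

def traj_dist(t1, t2):
    """Symmetric trajectory distance: mean point-to-trajectory
    distance, averaged over both directions (illustrative choice)."""
    d12 = sum(point_traj_dist(p, t2) for p in t1) / len(t1)
    d21 = sum(point_traj_dist(p, t1) for p in t2) / len(t2)
    return 0.5 * (d12 + d21)

a = [(0, 0), (1, 0), (2, 0)]
b = [(0, 1), (1, 1), (2, 1)]
print(traj_dist(a, b))  # 1.0: parallel tracks one unit apart
```

Evaluating this distance over all trajectory pairs produces the distance matrix that spectral clustering consumes.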
The population of those developing caries lesions is increasing. Aiding dental practitioners in detecting and identifying caries lesions, so that active lesions can be observed more quickly and objectively, is a great help in slowing the growth of dental cases. Near-infrared (NIR) light has been used in several medical studies as a non-ionizing alternative to radiography. To maximize the use of NIR light, a prototype with an image filtering and segmentation process and a machine learning program was designed to identify caries lesion severity using the International Caries Classification and Management System (ICCMS) merged caries categories. It uses CART (Classification and Regression Trees), a decision tree algorithm, to classify data, along with various classifiers for machine learning and model training. In the study, images with NIR illumination, assessed beforehand by a dental practitioner, were used to test the performance of the prototype. A total of 122 tooth samples were used in the simulation: 20% of the samples were classified as R0, 40% as RA, 16% as RB, and 24% as RC according to the ICCMS caries categories. The prototype was shown to yield results with a confidence level of no less than 95%. The study is relevant to immediate, non-ionizing determination of caries lesions and to the developing role of NIR light for tooth illumination.
At present, academic research mainly focuses on detecting driver fatigue and distraction through the driver's eyes and head. There are few studies on detecting driving behavior through the head, hands, or body; most use skin color detection to extract full-image pixels as features, whose dimension is too large, and problems such as instantaneous region overlap and partial occlusion inevitably occur during detection, reducing accuracy. In this paper, we propose a driving posture detection method based on video and skin-color region distances. Image features are represented by extracting the centroid coordinates of skin-color regions in images sampled from video and converting them into feature distances. A BP neural network is then used to identify and classify driving behavior, which effectively improves the detection rate of driving behaviors and enables real-time warnings while driving.
The idea of ensemble learning can conveniently solve privacy-preserving distributed data mining problems. Owners of distributed datasets can securely obtain an integrated model just by sharing and combining the sub-models built on their respective sample sets, and the integrated model is generally more powerful than any sub-model. However, sharing the sub-models themselves may cause serious privacy problems in some cases. In this paper, we present a new method by which data holders can securely and efficiently integrate their polynomial regression sub-models without sharing them, obtaining the optimal combined regression model. In addition to theoretical analysis, we verify the practicality of the new method through experiments.
This paper proposes a face recognition algorithm based on a conjugate gradient extreme learning machine. The general extreme learning machine is solved by computing a generalized inverse, a process that requires a large amount of computation and memory. To address this problem, this paper proves the positive definiteness of the matrix to be solved and, based on this, proposes an extreme learning machine solution algorithm based on the conjugate gradient method; a kernel function is introduced to improve its nonlinear classification performance. At the same time, the DAG method is used to extend the binary-classification conjugate gradient extreme learning machine to multi-class problems. Experimental results show that the proposed algorithm is faster than the general extreme learning machine algorithm, and its classification accuracy is higher.
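Positive definiteness is exactly what licenses the swap from a generalized inverse to conjugate gradient. A minimal CG solver for A x = b with A symmetric positive definite (in kernel ELM form, A would be a regularized kernel Gram matrix; the small system below is just a check):

```python
import numpy as np

def conjugate_gradient(A, b, tol=1e-10, max_iter=None):
    """Conjugate gradient for A x = b, A symmetric positive definite.

    Avoids forming any inverse: only matrix-vector products are
    needed, which is the computational saving the paper exploits.
    """
    n = len(b)
    x = np.zeros(n)
    r = b - A @ x          # residual
    p = r.copy()           # search direction
    rs = r @ r
    for _ in range(max_iter or n):
        Ap = A @ p
        alpha = rs / (p @ Ap)
        x += alpha * p
        r -= alpha * Ap
        rs_new = r @ r
        if np.sqrt(rs_new) < tol:
            break
        p = r + (rs_new / rs) * p
        rs = rs_new
    return x

A = np.array([[4.0, 1.0], [1.0, 3.0]])  # symmetric positive definite
b = np.array([1.0, 2.0])
x = conjugate_gradient(A, b)
print(np.round(x, 4))  # [0.0909 0.6364]
```

In exact arithmetic CG converges in at most n iterations, and in practice far fewer for well-conditioned kernel matrices, which is where the speedup over the generalized-inverse route comes from.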
Automatic heartbeat classification is an important technique for assisting doctors in identifying ectopic heartbeats in long-term Holter recordings. In this paper, the ECG signals in the MIT-BIH database are first filtered, and R-peak detection is then performed with the classical Pan-Tompkins method. The 100 data points before and the 150 data points after each R-peak are chosen as the matching signal. Following the recommendation of the Association for the Advancement of Medical Instrumentation (AAMI), all heartbeat samples in MIT-BIH are grouped into four classes: normal or bundle branch block (class N), supraventricular ectopic (class S), ventricular ectopic (class V), and fusion of ventricular and normal (class F). The division of training and testing data complies with the inter-patient scheme. The ECG signals are matched and recognized as specific cardiac conditions using curve fitting and a hierarchical dynamic time warping (DTW) algorithm. Experimental results show that the average classification accuracy of the proposed DTW algorithm is 92.51%, outperforming the other methods. The sensitivities for classes N, S, V, and F are 98.94%, 99.06%, 96.77%, and 93.81% respectively, and the corresponding positive predictive values are 93.94%, 91.18%, 88.24%, and 96.67%.
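The core matcher is classic dynamic time warping; the paper's hierarchical variant adds structure on top of this basic recurrence, sketched here for 1-D sequences:

```python
def dtw(a, b):
    """Dynamic time warping distance between two sequences.

    d[i][j] is the cost of the best alignment of a[:i] and b[:j];
    each cell extends the cheaper of a match, an insertion, or a
    deletion, so the warping path can absorb timing differences.
    """
    inf = float("inf")
    n, m = len(a), len(b)
    d = [[inf] * (m + 1) for _ in range(n + 1)]
    d[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            d[i][j] = cost + min(d[i - 1][j],       # deletion
                                 d[i][j - 1],       # insertion
                                 d[i - 1][j - 1])   # match
    return d[n][m]

print(dtw([1, 2, 3], [1, 2, 2, 3]))  # 0.0: warping absorbs the repeat
print(dtw([1, 2, 3], [2, 3, 4]))     # 2.0
```

This elasticity in time is what lets a template heartbeat match beats of slightly different duration, unlike a rigid point-by-point Euclidean comparison.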
This paper proposes an automated system that captures infrared images using an Adafruit AMG8833 IR thermal camera, records audio via an omnidirectional microphone connected to a sound card, and processes the data to determine whether a swine is experiencing thermal stress. The temperature, together with the frequency and noise intensity of the swine, is logged into the system for data analysis. Once the system detects that a swine is under thermal stress, misting and ventilation are activated to reduce the heat the swine experiences. Two tests were conducted for comparison: a controlled setup with misting and ventilation, and an uncontrolled setup with only the thermal camera and microphone. The data gathered show that maintaining the pigs' temperature at normal levels with an automated sprinkling and ventilating device results in better growth performance.
Point cloud registration in military scenarios is pivotal to automatic object reconstruction and recognition. This paper proposes 1) a multi-scale binary feature representation called mLoVS (multi-scale local voxelized structure) and 2) a "min-pooling" based feature matching technique for accurate registration of tank point clouds. The key insight of our method is that traditional fixed-scale feature matching methods suffer either from limited shape information or from missing data caused by occlusion, while the multi-scale approach provides a flexible matching choice. In addition, the binary nature of our feature representation alleviates the increased time budget required by multi-scale feature matching. Experiments on several sets of tank point clouds confirm the effectiveness and overall superiority of our method.
The extraction of errors is an important aspect of Chinese character handwriting research, and stroke errors are the origin of most handwriting mistakes. Previous works have made some effort on the types of errors extracted, but most are either preset by rules or fail to cover all types of stroke errors. For foreign students learning Chinese as a foreign language, especially beginners whose writing habits are affected by those of their native languages, preset methods are difficult to apply. Therefore, starting from the data itself, this paper proposes an adaptive approach to extract handwriting errors based on stroke matching that is accurate to the sampling points within strokes. Given a tagging list as a matching index, writing errors are adaptively extracted for the different stroke errors of Chinese characters, including missing strokes, extra strokes, concatenated strokes, broken strokes, redundant strokes, incomplete strokes, and orientation and order errors. A series of experiments indicates that the proposed approach is effective in extracting handwriting stroke errors.
In this paper, we propose a novel vision- and system-on-chip (SoC)-based fall detection method for the elderly. Once a fall event is detected, an alarm signal is immediately sent out to request first aid for the elderly person. Our fall detection method consists of five steps: checking whether the lighting condition has stabilized; GMM-based background and foreground estimation; a new strategy to solve the foreground lag problem; solving the false fall detection problem when light comes from a neighboring room; and fall detection determination with a general-purpose input/output based warning mechanism. Experiments on test videos demonstrate that our proposed fall detection method can meet real-time, low-cost, and high-accuracy demands.
Accurate detection of traffic signs is vital in many applications, such as driving assistance systems and autonomous vehicles. However, urban scenes are often cluttered with confusing objects, and signs may appear at varying scales (from large to small) in a single image or in image sequences of traffic scenes when autonomous vehicles move fast. Traffic signs therefore need to be detected early and accurately, while they still appear small in the images, and tracked for timely recognition and decision-making.
We propose an efficient discriminative method for polyphonic piano multi-pitch detection that uses AdaBoost binary classifiers combined with musical signal properties. As features, we use the spectral components at multiples and divisions of each note's fundamental frequency, which reduces feature redundancy compared with the full spectrum. For frame-level multi-pitch detection, the features of notes with adjacent pitches are similar (we call this shift invariance), which allows one binary classifier to detect those notes' pitches; to a certain extent, these adjacent notes improve the classifier's generalizability. In the post-processing stage, to incorporate temporal properties, we concatenate each note's several consecutive frame-level predictions as new features for final pitch detection. The proposed method achieves better performance with fewer classifiers compared with other methods.
Compared with the original speech, replayed speech passes through a complex channel composed mainly of a recording device and a playback device, and the frequency response of this channel noticeably alters the high- and low-frequency bands of the original speech spectrum. This paper proposes a Channel Difference Enhancement Cepstral Coefficient (CDECC) feature that emphasizes the channel frequency-response difference and detects replayed speech by enhancing the spectral differences caused by the channel. Experiments on the ASVspoof 2017 Challenge dataset show that the proposed method significantly improves detection performance over the baseline system using Constant Q Cepstral Coefficients (CQCC): the equal error rate (EER) is reduced by 18.20% under the same conditions, indicating that the CDECC feature is more effective than CQCC and MFCC features for detecting replay attacks.
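For readers unfamiliar with cepstral features, the following minimal sketch computes generic cepstral coefficients (log-magnitude spectrum followed by an inverse transform). CDECC additionally re-weights the spectrum to emphasize the channel-affected bands, a step whose exact form the abstract does not specify, so this code is only a generic baseline with all names assumed:

```python
import numpy as np

def cepstral_coeffs(frame, n_coeffs=13, eps=1e-10):
    """Generic cepstral coefficients of one speech frame: log-magnitude
    spectrum, then an inverse FFT back to the quefrency domain.
    Band emphasis (as in CDECC) would weight `spec` before the log."""
    spec = np.abs(np.fft.rfft(frame)) + eps  # eps avoids log(0)
    cep = np.fft.irfft(np.log(spec))
    return cep[:n_coeffs]
```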
Based on the SSD (Single Shot MultiBox Detector) convolutional neural network algorithm, this paper develops corresponding training strategies and uses source data generated from a large number of power-grid scenarios to train a 100-megabyte neural network model for intelligent monitoring of external-force damage to transmission lines. Using deep compression technology, the trained model is re-trained and optimized in a targeted manner to achieve a compression ratio of 30%-50% without degrading accuracy. As a result, hardware storage resources can be configured more economically when the model is deployed on an embedded platform.
Change detection is a challenging visual task due to the dynamic nature of real-world scenes. The good performance of existing methods depends largely on prior background images or long-term observation; these methods, however, degrade severely when applied to detecting changes that occur instantaneously, with only a few preceding frames available. In this paper, we exploit spatio-temporal convolutional networks to address this challenge and propose a novel retrospective convolution, which features efficient extraction of change information between the current frame and frames from the historical observation. To address the problem of foreground-specific overfitting in learning-based methods, we further propose a data-augmentation method, named static sample synthesis, that guides the network to focus on learning change-cued information rather than the specific spatial features of the foreground. Trained end-to-end on complex scenarios, our framework proves accurate in detecting instantaneous changes and robust against diverse noises. Extensive experiments demonstrate that our proposed method significantly outperforms existing methods.
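The paper's retrospective convolution is a learned operation; as a rough, assumption-laden sketch of the underlying idea (extracting change information between the current frame and a short history), one can form a difference map against each historical frame and smooth it with a fixed 3x3 kernel standing in for learned weights:

```python
import numpy as np

def retrospective_features(current, history):
    """Toy change extraction: one smoothed absolute-difference map per
    history frame. The 3x3 box filter is a stand-in for a learned kernel."""
    k = np.ones((3, 3)) / 9.0
    maps = []
    for past in history:
        diff = np.abs(current - past)
        padded = np.pad(diff, 1, mode="edge")  # same-size output
        smoothed = np.zeros_like(diff)
        h, w = diff.shape
        for i in range(h):
            for j in range(w):
                smoothed[i, j] = (padded[i:i + 3, j:j + 3] * k).sum()
        maps.append(smoothed)
    return np.stack(maps)  # shape: (len(history), H, W)
```

A static scene yields all-zero maps, which is exactly the cue the static-sample-synthesis augmentation exploits: the network should respond to change, not to foreground appearance.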
Single-image crowd counting remains challenging primarily due to issues such as large scale variations, perspective distortion, and non-uniform crowd distribution. In this paper, we propose a novel architecture, referred to as the Second-Order Convolutional Network (SOCN), that tackles this task by improving the feature-transformation capability of the network. SOCN uses a convolutional neural network as the backbone. We introduce three cascaded second-order blocks behind the backbone to enlarge the family of transformation operations and increase the nonlinearity of the network, which extracts multi-scale and discriminative features. Furthermore, we design a context attention module (CAM) with dilated convolutions that assigns weights to the score map of each second-order block so that the features that contribute to counting are highlighted. Experiments on the ShanghaiTech and UCF_CC_50 datasets demonstrate the effectiveness of our method.
With the rapid development of oblique photography (OP) in recent years, the accuracy of reality modeling has increased, which has led to a surge in computational complexity. To address this, much reality-modeling software adopts cluster parallel computing. In this paper, regression analysis is used to study the influence of the configuration of the compute nodes in the cluster, aiming to improve the cluster's computational efficiency on 3D reconstruction tasks. Furthermore, the M/M/S queuing model from queuing theory is used to model multi-task assignment in the cluster, and a mathematical model relating the compute nodes to cluster performance is established, enabling effective quantitative evaluation of cluster computing efficiency. Experiments show that the CPU performance of the compute nodes is the most critical hardware factor affecting cluster efficiency.
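The M/M/S model referenced above has a standard closed form. The sketch below computes the Erlang C waiting probability and the mean queueing delay for s compute nodes; the function and parameter names are illustrative, not taken from the paper:

```python
from math import factorial

def mms_metrics(lam, mu, s):
    """M/M/s queue: probability that a task must wait (Erlang C formula)
    and mean waiting time in the queue.
    lam: task arrival rate, mu: per-node service rate, s: number of nodes."""
    rho = lam / (s * mu)          # utilization; must be < 1 for stability
    assert rho < 1, "queue is unstable"
    a = lam / mu                  # offered load in Erlangs
    p0_inv = sum(a**k / factorial(k) for k in range(s)) \
             + a**s / (factorial(s) * (1 - rho))
    p_wait = a**s / (factorial(s) * (1 - rho)) / p0_inv   # Erlang C
    wq = p_wait / (s * mu - lam)  # mean time a task waits before service
    return p_wait, wq
```

With s = 1 this reduces to the familiar M/M/1 results (waiting probability rho, mean wait rho/(mu - lam)), which makes the formula easy to sanity-check.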
Wireless Capsule Endoscopy (WCE) enables physicians to examine the gastrointestinal (GI) tract without surgery. It has become a widely used diagnostic technique, but the huge volume of image data places a heavy burden on doctors. As a result, computer-aided diagnosis systems that assist doctors as a second observer have attracted great research interest. In this paper, we demonstrate the feasibility of deep learning for lesion recognition. We propose a Second Glance framework for ulcer detection and verify its effectiveness and robustness on a large ulcer WCE dataset (to our knowledge, the largest for this problem), which consists of 1,504 independent WCE videos. Compared with off-the-shelf detection frameworks, our method achieves the best ROC-AUC of 0.9235, outperforming RetinaNet (0.8901), Faster R-CNN (0.9038), and SSD-300 (0.8355).
3D models reconstructed from oblique photography are often too large to load into Unity3D efficiently and robustly for roaming. To solve this problem, we propose a novel roaming method for oblique-photography three-dimensional models in Unity3D. The method can quickly load a large-scale oblique-photography model in Unity3D and achieve fluent virtual roaming. First, models at different levels of detail are generated using LOD (level of detail) technology, and each LOD model is divided into blocks of equal size. Second, the entire low-LOD model is loaded as a panoramic view of the scene, while only the high-LOD model blocks around the current viewpoint are loaded dynamically during roaming; a nine-palace (3x3 grid) mode is adopted as the selection strategy for high-LOD blocks. Finally, coroutines and asynchronous loading are used to further smooth the roaming process. Experimental results show that our method is faster than Acute3D Viewer at visualizing oblique-photography models.
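The nine-palace (3x3) selection strategy can be sketched as follows; the block indexing, coordinate convention, and clamping at scene borders are assumptions, since the abstract does not spell them out:

```python
def nine_palace_blocks(x, z, block_size, grid_w, grid_h):
    """Indices of the 3x3 ('nine-palace') high-LOD blocks centered on the
    block containing the viewpoint (x, z), clamped to the scene grid."""
    cx, cz = int(x // block_size), int(z // block_size)
    blocks = []
    for dz in (-1, 0, 1):
        for dx in (-1, 0, 1):
            bx, bz = cx + dx, cz + dz
            if 0 <= bx < grid_w and 0 <= bz < grid_h:  # drop out-of-scene blocks
                blocks.append((bx, bz))
    return blocks
```

As the viewpoint crosses a block boundary, the returned set shifts by one row or column, so only the newly entered blocks need to be loaded (asynchronously) and the departed ones unloaded.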
To address the low accuracy and poor safety of traditional contact-based piston inspection, this paper proposes a multi-parameter geometric measurement system for pistons based on laser projection. Based on the principle of parallel-light column projection imaging, a mechanical structure with board-card-based motion control was designed to realize non-contact measurement of multiple parameters: piston ring-groove width, ring-groove depth, and ring-groove inclination. Experiments show that the dimensional tolerance of the piston ring groove is about 0.02 mm and the angular tolerance is about 0.2 degrees; an error analysis of the whole measuring system is also carried out. The results show that laser projection is a feasible principle for multi-parameter geometric inspection of piston ring grooves: it simplifies piston inspection, ensures measurement accuracy, and improves efficiency.
In recent years, research and development on environment and activity monitoring has been flourishing at home and abroad. RF sensing has achieved a series of breakthrough results, owing to its advantages in behavior recognition, localization, and target monitoring. This paper first introduces the application scenarios of RF-aware environment monitoring and compares them with traditional environmental-monitoring technologies. It then analyzes the basic principles of RF sensing and the application of signal-acquisition, feature-extraction, fingerprint-database construction, and machine-recognition methods to behavior recognition, localization, and target monitoring. Finally, the limitations of current research and future development directions are pointed out.
This study proposes a design framework for adaptive e-learning environments with a focus on learner personas. Following the TPACK model, it puts forward a framework of Persona-based Technological Pedagogical Content Design (PTPCD). The PTPCD framework guides designers of adaptive e-learning environments in recommending Technological Pedagogical Content to target learners with matching personas, intelligently or semi-intelligently. Designers are advised to select indicators deliberately based on the pedagogy, technology, and specific content, and to use data-mining techniques to label personalized classifications, which yield learner personas. PTPCD has theoretical and practical implications for designers and researchers of adaptive e-learning environments. Future studies are suggested to demonstrate and refine the framework in practice.
Classification is one of the most important techniques in machine learning. Among classification algorithms, logistic regression and decision trees are two efficient supervised-learning methods. In this paper, we tested logistic regression and the CART decision-tree algorithm on different datasets. The experimental results show that the CART decision tree performs much better on datasets with more attributes and a slightly imbalanced class distribution, while logistic regression is more accurate on datasets with fewer attributes and a balanced class distribution.
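As context for the comparison above, the core of CART is choosing splits that minimize weighted Gini impurity. A minimal single-attribute sketch (names and structure assumed, not the paper's code):

```python
def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels))

def best_split(xs, ys):
    """CART-style split search on one numeric attribute: pick the threshold
    minimizing the weighted Gini impurity of the two children."""
    best = (float("inf"), None)
    for t in sorted(set(xs)):
        left = [y for x, y in zip(xs, ys) if x <= t]
        right = [y for x, y in zip(xs, ys) if x > t]
        if not left or not right:
            continue  # skip degenerate splits
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(ys)
        best = min(best, (score, t))
    return best  # (weighted Gini, threshold)
```

A full CART tree applies this search recursively over all attributes; logistic regression instead fits a single linear decision boundary, which explains its edge on low-dimensional, balanced data.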
Vision-based 1D barcode reading has attracted increasing research interest due to the great demand for high degrees of automation. For detecting the image region of 1D barcodes, existing approaches barely balance speed and precision: deep-learning-based methods can locate a 1D barcode quickly but lack an effective and accurate segmentation process, while purely geometric methods incur unnecessary computational cost when processing high-resolution images. We propose to integrate the deep-learning and geometric approaches, using the former for robust barcode localization against complicated backgrounds and the latter for accurate barcode detection within the localized region. Our integrated solution benefits from the complementary advantages of the two methods. Extensive experiments on standard benchmarks show that our integrated approach outperforms the state of the art by at least 5 percentage points.
In recent years, quantizing the weights of a deep neural network has drawn increasing attention in the area of network compression. An efficient and popular way to quantize the weight parameters is to replace a filter with the product of binary values and a real-valued scaling factor. However, the quantization error of such a binarization method grows as the number of parameters in a filter increases. To reduce the quantization error of existing network-binarization methods, we propose Group Binary Weight Networks (GBWN), which divide the channels of each filter into groups such that every channel in the same group shares the same scaling factor. We binarize the popular network architectures VGG, ResNet, and DenseNet, and verify the performance on the CIFAR-10, CIFAR-100, Fashion-MNIST, SVHN, and ImageNet datasets. Experimental results show that GBWN achieves considerable accuracy gains over recent network-binarization methods, including BinaryConnect, Binary Weight Networks, and Stochastic Quantization Binary Weight Networks.
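A minimal sketch of the group-wise binarization idea described above, under the common convention that each group's scaling factor is the mean absolute weight of the group (the paper may derive the factor differently):

```python
import numpy as np

def group_binarize(filt, n_groups):
    """Quantize a filter of shape (C, k, k): channels are split into groups,
    and each group is replaced by alpha * sign(w), where alpha is the mean
    absolute weight over that group."""
    channels = np.arange(filt.shape[0])
    out = np.empty_like(filt)
    for g in np.array_split(channels, n_groups):
        block = filt[g]
        alpha = np.abs(block).mean()      # per-group scaling factor
        out[g] = alpha * np.sign(block)   # binary weights times alpha
    return out
```

On a toy filter, per-group scaling reconstructs the weights more closely than a single whole-filter scaling factor, which is the intuition behind GBWN's reduced quantization error.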
The class-imbalance problem is one of the key challenges in machine learning and data mining. Imbalanced data can result in sub-optimal performance of classification models. To address the problem, a variety of data-sampling methods have been proposed in previous studies. However, there is no universal solution, and it is worth exploring which data-sampling techniques are more effective at balancing the class distribution for a given type of data and classifier. In this work, we present an experimental study based on a number of real-world datasets obtained from different disciplines. The goal is to investigate how effectively different sampling techniques increase classification performance on imbalanced datasets. In particular, we study ten sampling methods of different types, including random sampling, cluster-based sampling, and ensemble sampling. The C4.5 decision-tree algorithm is used to train the base classifiers, and performance is measured using precision, G-Measure, and Cohen's Kappa statistic.
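As one concrete instance of the sampling techniques surveyed above, random undersampling can be sketched as follows (a generic version, not tied to the paper's experimental code):

```python
import random

def random_undersample(samples, labels, seed=0):
    """Balance classes by randomly dropping majority-class samples until
    every class has as many samples as the smallest class."""
    rng = random.Random(seed)
    by_class = {}
    for s, y in zip(samples, labels):
        by_class.setdefault(y, []).append(s)
    n_min = min(len(v) for v in by_class.values())
    out = []
    for y, items in by_class.items():
        for s in rng.sample(items, n_min):  # keep n_min per class
            out.append((s, y))
    return out
```

Oversampling works in the opposite direction, replicating (or synthesizing) minority-class samples; both aim to keep the base classifier, here C4.5, from ignoring the rare class.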