This PDF file contains the front matter associated with SPIE Proceedings Volume 11719, including the Title Page, Copyright information, and Table of Contents.
Access to the requested content is limited to institutions that have purchased or subscribe to SPIE eBooks. You are receiving this notice because your organization may not have SPIE eBooks access. Shibboleth/OpenAthens users: please sign in to access your institution's subscriptions. To obtain this item, you may purchase the complete book in print or electronic format on SPIE.org.
Intelligent Recognition Technology and Image Application
In this paper, an adaptive method based on a local gamma transform and the illumination-reflection model is proposed to address over-enhancement in low-light image enhancement and the adaptivity of parameter settings. The source image is first converted from the RGB color space to the YUV color space, and the illumination distribution of the scene is extracted using a guided filter. Then, an adaptive local gamma transform is designed to enhance the illumination component and expand the dynamic range of the gray scale. Finally, the image is converted from YUV space back to RGB space. Experimental results show that the proposed algorithm not only effectively improves the visual effect of unevenly lit images but also reveals more detail in dark regions.
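The core idea of the pipeline above can be sketched in a few lines of numpy. This is a minimal illustration, not the authors' implementation: a box-filter local mean stands in for the guided filter, and the gamma mapping below is one plausible choice of adaptive transform (the paper's exact formula is not reproduced here).

```python
import numpy as np

def adaptive_gamma_enhance(y, radius=7):
    """Local-gamma enhancement on the luma (Y) channel, y in [0, 1].

    A box-filter mean approximates the illumination component; the
    gamma mapping (dark regions get gamma < 1, i.e. brightening) is
    an illustrative stand-in for the paper's adaptive transform.
    """
    k = 2 * radius + 1
    pad = np.pad(y, radius, mode="edge")
    illum = np.zeros_like(y)
    h, w = y.shape
    for i in range(h):
        for j in range(w):
            illum[i, j] = pad[i:i + k, j:j + k].mean()
    # Illustrative mapping: low illumination -> gamma below 1.
    gamma = np.power(0.5, (0.5 - illum))
    return np.clip(np.power(y, gamma), 0.0, 1.0)
```

In practice the Y channel would come from an RGB-to-YUV conversion and the result would be recombined with the untouched U and V channels, as the abstract describes.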
At present, remote sensing image vehicle detection based on deep learning has achieved certain results, but most of them rely on powerful PC computing power and cannot be deployed in satellites, so they cannot provide support for satellite in-orbit detection. Aiming at this problem, this paper proposes a remote sensing image vehicle detection method based on YOLOv5 model and successfully deploys it in Jetson TX2 embedded equipment that can be deployed on a satellite platform. Experiments have proved that the algorithm proposed in this article detects vehicle targets in a 12000*12000 pixels wide remote sensing image in an embedded device, and the detection time is only about 1 minute and 20 seconds at the fastest.
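Running a detector over a 12000×12000-pixel scene on an embedded device typically means tiling the image into overlapping windows and merging the detections. The paper does not give its tiling parameters; the sketch below uses an assumed 1024-pixel tile with 128-pixel overlap purely for illustration.

```python
def tile_image(width, height, tile=1024, overlap=128):
    """Return (x, y, w, h) windows covering a large scene.

    Assumes width and height are at least `tile`. The final row and
    column are snapped to the image edge so no pixels are missed.
    Tile and overlap sizes are illustrative, not from the paper.
    """
    step = tile - overlap
    xs = list(range(0, width - tile + 1, step))
    ys = list(range(0, height - tile + 1, step))
    if xs[-1] + tile < width:
        xs.append(width - tile)
    if ys[-1] + tile < height:
        ys.append(height - tile)
    return [(x, y, tile, tile) for y in ys for x in xs]
```

Each window would be fed to the YOLOv5 model independently, with overlapping detections de-duplicated afterwards (e.g. by non-maximum suppression).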
This paper proposes a warship image segmentation algorithm based on the Mask R-CNN network. The Mask R-CNN network structure was constructed on the TensorFlow + Keras deep learning framework. Mask R-CNN is a state-of-the-art convolutional neural network mainly used for object detection and instance segmentation of natural images. Segmentation of warship images at sea level was achieved by supervised learning on a tagged data set. Because warship samples are difficult to obtain and the data set is small, data augmentation was adopted to expand it. Through parameter tuning and experimental verification, the mAP for warships reaches 0.603, which meets the requirements of high-precision segmentation. The experimental results show that the Mask R-CNN model performs very well on image segmentation of naval ships at sea.
Nitrogen is an essential nutrient for citrus growth. In the traditional agronomic method, chemical analysis of leaf tissue is needed to determine nitrogen, which is time consuming, labor intensive, and costly. Satellite remote sensing (RS) can quickly acquire multispectral images of large-scale orchards and thus supports low-cost, periodic monitoring of nitrogen content. RS data have been widely used for monitoring nitrogen content in various crops and have performed well in related research. However, few studies have evaluated the leaf nitrogen content (LNC) of citrus on the basis of satellite RS data. In this study, Landsat 8 RS image data are used to estimate the distribution of LNC in an orchard, and the effectiveness of different estimation methods for monitoring LNC is studied. Linear regression, partial least squares regression (PLSR), support vector regression (SVR), random forest regression (RF), and deep neural network (DNN) models are constructed and compared. Experimental results demonstrate the feasibility of using satellite RS data to determine LNC in sugar citrus. In evaluating LNC, the PLSR algorithm outperforms the other algorithms on the testing data, reaching a coefficient of determination of 0.864, a root mean squared error of 1.217, and a mean relative error of 3.5%. An accurate spatial distribution of nitrogen content in an orchard can be obtained from our model, providing powerful support for practical orchard management and operation.
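The three figures of merit quoted above (coefficient of determination, RMSE, and mean relative error) are standard regression metrics and can be computed as follows; the function names are ours, not the paper's.

```python
import numpy as np

def regression_metrics(y_true, y_pred):
    """R^2, RMSE, and mean relative error (percent) for a regression
    model, matching the three metrics reported for the LNC models."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    r2 = 1.0 - ss_res / ss_tot
    rmse = np.sqrt(np.mean((y_true - y_pred) ** 2))
    mre = 100.0 * np.mean(np.abs(y_true - y_pred) / np.abs(y_true))
    return r2, rmse, mre
```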
With the development of remote sensing technology and the differences among remote sensing image classification methods, it is particularly important to accurately apply classification methods to images and to compare classification algorithms. In this paper, taking Yangshuo County as the study area, five common supervised classifiers, namely support vector machine (SVM), maximum likelihood classification (MLC), neural network (NN), spectral angle mapping (SAM), and spectral information divergence (SID), are used to classify the land cover in remote sensing image data from GF-2, Landsat 8, and their fusion over the same area. The classification results are obtained and compared, and the overall accuracy (OA) and Kappa coefficient are used to evaluate the performance of each classification algorithm. The results show that MLC and SVM perform best on all three data sets. For the higher-spatial-resolution GF-2 and fusion data, the OA and Kappa coefficients of both classifiers are about 10% higher than those for the Landsat 8 data, which has higher spectral resolution.
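The two evaluation measures used above are computed from the confusion matrix in the standard way: overall accuracy is the trace divided by the total, and Cohen's kappa corrects that figure for chance agreement.

```python
import numpy as np

def overall_accuracy_and_kappa(cm):
    """OA and Cohen's kappa from a square confusion matrix
    (rows: reference classes, columns: predicted classes)."""
    cm = np.asarray(cm, dtype=float)
    n = cm.sum()
    po = np.trace(cm) / n                                   # overall accuracy
    pe = np.sum(cm.sum(axis=0) * cm.sum(axis=1)) / n ** 2   # chance agreement
    kappa = (po - pe) / (1.0 - pe)
    return po, kappa
```

A kappa near 0 means the classifier does no better than chance even if OA looks respectable, which is why both numbers are reported together.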
This paper proposes a low-resolution ground surveillance radar automatic target recognition (ATR) method based on a one-dimensional convolutional neural network (1D-CNN), which addresses the overfitting that arises when complex CNNs are used for such data classification. First, the target recognition algorithm combines the time-domain waveform, the power spectrum, and the power-transform spectrum into the three channels of the 1D-CNN input. Then, an autoencoder is used to reduce the feature dimension and improve the classifier's ability to select parameters autonomously. Finally, Bayesian hyperparameter optimization is used to tune the hyperparameters, which both simplifies the network structure and reduces the parameter calculation scale. We tested our method on collected data to classify people and cars, and the results show that the recognition accuracy reached 99%. Compared with traditional target recognition methods based on handcrafted feature extraction, our model has better recognition performance and adaptability.
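The three-channel input construction can be sketched as below. The power-transform exponent `alpha` and the per-channel normalization are our illustrative assumptions; the paper does not specify them here.

```python
import numpy as np

def three_channel_input(x, alpha=0.5):
    """Stack the time-domain waveform, its power spectrum, and a
    power-transformed spectrum (|X|^2)^alpha as three 1-D channels.

    `alpha` and the max-abs normalization are illustrative choices,
    not taken from the paper.
    """
    x = np.asarray(x, dtype=float)
    ps = np.abs(np.fft.fft(x)) ** 2
    pts = ps ** alpha
    # Normalize each channel so the network sees comparable scales.
    chans = [c / (np.max(np.abs(c)) + 1e-12) for c in (x, ps, pts)]
    return np.stack(chans)
```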
Screening is crucial for the early diagnosis of melanoma. As melanoma progresses, it can be readily separated from other materials on the basis of spectral and spatial features. Working with microscopic hyperspectral images, this paper first applies spectral math to preprocess the image and then uses three traditional supervised classifiers, maximum likelihood classification (MLC), convolutional neural networks (CNN), and support vector machine (SVM), to segment the preprocessed image. Finally, we evaluate the accuracy of the three results to determine the best segmentation method among them. The experiment shows practical value in pathological diagnosis.
As a typical pattern recognition problem, specific emitter identification (SEI) is a crucial step toward efficient spectrum sensing. In this work, an emitter identification method based on a Signal Graph Capsule Network, referred to as SGCN, is proposed. First, the emitter signal is transformed into an undirected graph according to the Euclidean distances between its sampling points, and this undirected graph is taken as the input of the network. Second, the topological structural characteristics are optimized by graph convolution operations on the undirected graph. Finally, a capsule network is introduced to improve generalization and enhance robustness. Extensive analysis and experiments on signals from 30 individual emitters demonstrate the effectiveness of the proposed model.
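The signal-to-graph step can be illustrated with a thresholded distance matrix. The scalar distance measure and the threshold below are our assumptions for a minimal sketch; the paper's construction may operate on richer per-sample features.

```python
import numpy as np

def signal_to_graph(x, threshold):
    """Adjacency matrix of an undirected graph over the signal's
    sampling points: nodes i and j are connected when the Euclidean
    distance |x[i] - x[j]| falls below `threshold` (illustrative)."""
    x = np.asarray(x, dtype=float)
    d = np.abs(x[:, None] - x[None, :])   # pairwise distances
    adj = (d < threshold).astype(int)
    np.fill_diagonal(adj, 0)              # no self-loops
    return adj
```

The resulting symmetric adjacency matrix is exactly the structure a graph convolution layer consumes.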
For high-sensitivity cameras with a super-large star catalog, conventional star identification methods for star sensors face storage and computation demands that current computers cannot afford. This paper presents a two-stage full-sky star identification method. First, 3-4 prominent stars are quickly identified from a simplified star catalog to determine the view direction. Then, three different strategies are adopted to recognize the remaining stars in the field of view: the first automatically loads the K-vector table of the corresponding sky zone; the second temporarily generates a K-vector table from the candidate star set and identifies the remaining stars by their angular distances from the prominent stars; the third obtains the image coordinates of the candidate star set and considers a proximity position constraint in addition to the angular-distance constraint from the prominent stars. Experiments show that the third strategy is about 20% faster while maintaining a high recognition rate (F1 of about 0.92). This two-stage method resolves the huge amount of calculation caused by the super-large star catalog, can identify enough stars (on the order of ten thousand) in a single frame, and provides sufficient control points for the subsequent intrinsic calibration.
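The table-lookup step common to the first two strategies amounts to a range query over sorted inter-star angles. The sketch below uses a plain sorted array with binary search as a stand-in for the K-vector structure; function names and the tolerance parameter are ours.

```python
import bisect
import numpy as np

def build_pair_table(angles):
    """Sorted table of catalog inter-star angles (a simple stand-in
    for the K-vector table described in the paper)."""
    order = np.argsort(angles)
    return np.asarray(angles)[order], order

def query_pairs(table, measured, tol):
    """Indices of catalog pairs whose angle lies within `tol` of the
    measured angular distance."""
    angles, order = table
    lo = bisect.bisect_left(angles, measured - tol)
    hi = bisect.bisect_right(angles, measured + tol)
    return order[lo:hi]
```

A real K-vector replaces the binary search with a precomputed index for constant-time lookup, which matters when the catalog holds millions of pairs.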
The detection and identification of faults in electricity distribution networks is essential for improving the reliability of power supply. After observing many fault current signals, we found that (1) the features of many recorded fault electrical signals were unknown or obscure, and (2) the fault types of most sample signals had no clear definition; that is, labeled samples were very limited. In this situation, the semi-supervised support vector machine (S3VM) and SVM active learning were introduced to distinguish short circuits from grounding faults in distribution networks. We used wavelet packet analysis to extract energy-spectrum features as the physical features of the electrical signals, and some statistical characteristics were also computed and selected to form a mixed feature set. A case study was conducted on a real data set of 72 labeled and 7720 unlabeled electrical signals. Performing transductive support vector machine (TSVM) and SVM active learning with the mixed features, our experiments showed that both models can effectively identify the fault types, and the accuracy of TSVM is higher than that of SVM active learning.
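The feature-extraction idea, distributing signal energy across frequency bands and using the relative energies as a feature vector, can be illustrated without a wavelet library. The sketch below uses FFT band energies as a stand-in for the wavelet-packet energy spectrum the paper actually uses; the band count is an assumption.

```python
import numpy as np

def band_energy_features(x, n_bands=8):
    """Relative energy in equal-width frequency bands.

    An FFT-based stand-in for the wavelet-packet energy spectrum:
    each feature is the fraction of total signal energy falling in
    one band, so the vector sums to 1.
    """
    spec = np.abs(np.fft.rfft(np.asarray(x, dtype=float))) ** 2
    bands = np.array_split(spec, n_bands)
    e = np.array([b.sum() for b in bands])
    return e / e.sum()
```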
The data projection method can be used to track the signal subspace. However, when the signal is corrupted by impulse noise, the data projection method loses its tracking ability during the periods when impulse noise appears. In this paper, two stable, low-complexity subspace tracking algorithms are proposed to solve this problem. The basic idea is to multiply the step size of the fast and stable data projection method by a weight coefficient. When impulse noise occurs, the weight coefficient drops significantly; as a result, the step size shrinks and the adverse effect of the outlier data on subspace tracking is reduced. Both algorithms are verified through numerical simulations, and the effect of the signal-to-noise ratio is discussed.
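The weighting idea can be made concrete with any function that stays near 1 for typical errors and collapses toward 0 for outliers. The Gaussian-style weight below is one plausible choice for illustration only; the paper's actual coefficient is not reproduced here.

```python
import numpy as np

def robust_step_weight(e, sigma):
    """Step-size weight that shrinks when the instantaneous error
    `e` is large relative to the typical scale `sigma` (an impulse).
    Illustrative form, not the paper's exact formula."""
    return np.exp(-(e / sigma) ** 2)
```

Multiplying the fixed step size by this weight leaves normal adaptation untouched while an impulse-noise sample barely moves the subspace estimate.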
Circular synthetic aperture sonar (CSAS) has attracted great attention in the field of high-resolution SAS imaging. Time-domain circular synthetic aperture imaging algorithms can accommodate the non-uniform azimuth sampling caused by non-uniform platform velocity, have lower memory demands, and are suitable for parallel computation. However, exact time-domain imaging demands huge computational resources. The Fast Factorized Back-Projection (FFBP) time-domain imaging algorithm can reduce the computational load dramatically. In this work, the FFBP imaging algorithm is applied to experimental data from circular SAS trajectories. The first step is to split the entire aperture into several sub-apertures; then the data in each sub-aperture are processed with the back-projection method. The last step is to obtain the full-aperture CSAS image by merging all sub-images obtained from the sub-aperture processing. The experimental results validate the FFBP imaging algorithm against a reference simulation result. Moreover, the results also show that FFBP imaging quality decreases as the FFBP approximation error increases.
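The first FFBP step, partitioning the aperture, is simple bookkeeping; a minimal sketch (with our own function name and an even-split policy the paper does not specify):

```python
def split_aperture(n_pulses, n_sub):
    """Split the full aperture's pulse indices into `n_sub` contiguous
    sub-apertures whose sizes differ by at most one pulse."""
    base, extra = divmod(n_pulses, n_sub)
    out, start = [], 0
    for i in range(n_sub):
        size = base + (1 if i < extra else 0)
        out.append(range(start, start + size))
        start += size
    return out
```

Each returned index range is back-projected onto a coarse sub-image, and the sub-images are then interpolated and coherently merged into the full-aperture image.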
To solve the problems of low recognition rate, lack of autonomous detection, and weak generality in existing surface defect detection methods, an improved deep learning surface defect detection method is proposed. The method improves a convolutional neural network model and divides it into two modules: a segmentation module and a decision module. After preprocessing, the image is input to the segmentation module for training, and then the output of the segmentation module and its network features are used as input to the decision module to detect defects in the image. In the improved model, the convolution layers and kernel sizes in the segmentation module are optimized, and a new convolutional network model is constructed. In downsampling, max pooling is used instead of a large stride, and the loss function and activation function are designed accordingly. Experiments show that the method achieves a high defect detection accuracy of 99%, realizes autonomous detection, and has a certain universality.
To improve imaging quality and reduce the computational burden, this paper proposes a sparse tensor recovery based method for multiple-input multiple-output (MIMO) radar 3D imaging. First, by constructing the sensing matrices in the range and angle directions in a pseudo-polar coordinate system, the sparse tensor recovery model for target 3D imaging is established. Then, the tensor sequential order one negative exponential (Tensor-SOONE) function is proposed to measure the sparsity of the received signal tensor. Finally, the gradient projection (GP) method is employed to solve the sparse tensor recovery problem and obtain the 3D image of the targets. Compared with conventional imaging methods, the proposed method achieves a high-resolution 3D image of targets with a reduced number of samples. Compared with existing sparse recovery based imaging methods, it has higher accuracy and robustness while its computational complexity is relatively small. Simulations verify the effectiveness of the proposed method.
This paper proposes a video scaling hardware system based on the Lanczos algorithm to achieve real-time, high-quality scaling of video resolution at any ratio. Traditional image scaling algorithms cannot achieve ideal interpolation quality and calculation speed at the same time: bilinear and nearest-neighbor interpolation are very fast but produce mosaics and blurred edges, while bicubic interpolation gives better results but is very slow. Experimental results show that the interpolation quality of the Lanczos algorithm is better than that of the traditional interpolation algorithms, and its time cost is less than that of bicubic interpolation. Real-time scaling of the video signal is achieved on the FPGA system designed in this paper.
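The kernel at the heart of the system is the standard windowed-sinc Lanczos function, L(x) = sinc(x)·sinc(x/a) for |x| < a and 0 elsewhere. A reference implementation (the hardware version would tabulate these values in fixed point):

```python
import numpy as np

def lanczos_kernel(x, a=3):
    """Lanczos kernel L(x) = sinc(x) * sinc(x/a) for |x| < a, else 0.

    a=3 is the common 'lanczos3' choice; numpy's sinc is the
    normalized sinc, sin(pi*x)/(pi*x), as the formula requires.
    """
    x = np.asarray(x, dtype=float)
    out = np.sinc(x) * np.sinc(x / a)
    return np.where(np.abs(x) < a, out, 0.0)
```

Each output pixel is a weighted sum of the 2a nearest input samples per axis, with weights given by this kernel evaluated at the fractional offsets; that footprint (6 taps per axis for a=3) is what an FPGA line-buffer design has to provision for.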
With the development of wireless video services, more video transmission schemes have emerged. Because of the cliff effect and the rapid decline in signal quality, traditional wireless transmission schemes have disadvantages in broadcast and mobile application scenarios. In response, new wireless video transmission schemes have been proposed. The analog scheme represented by SoftCast avoids the cliff effect and delivers better objective video quality, but its coding efficiency is not high. To further improve the quality of video broadcasting, hybrid digital-analog schemes have been proposed; they combine digital and analog coding, balancing coding efficiency and quality scalability. This is a survey of the traditional digital scheme; analog schemes, including SoftCast and ParCast; and hybrid digital-analog schemes, including WSVC, DCast, WaveCast, and LineCast.
To meet the need of locating abnormal video events at the pixel level, a video abnormal event detection method based on CNN (convolutional neural networks) and multiple instance learning is proposed. First, a Gaussian background model is used to extract the moving targets in the video, and the connected regions of the moving targets are obtained by image processing. Second, the pre-trained VGG16 model is used to extract features of the connected regions, which constitute the multiple instance learning bags. Finally, the multiple instance learning model is trained using the MISVM (Multiple-Instance Support Vector Machines) and NSK (Normalized Set Kernel) algorithms and makes predictions at the pixel level. The experimental results show that the proposed method can accurately locate abnormal events at the pixel level.
To answer semantically complicated questions about an image, a Visual Question Answering (VQA) model needs to fully understand the visual scene, especially the dynamic interactions between objects. This task inherently requires reasoning about the visual relationships among the objects in the image, and the reasoning process should be guided by the information in the question. In this paper, we propose a semantic relation graph reasoning network in which the semantic relation reasoning is guided by a cross-modal attention mechanism. In addition, a Gated Graph Convolutional Network (GGCN) constructed from the cross-modal attention weights injects the semantic interaction information between objects into their visual features, producing relation-aware features. In particular, we train a semantic relationship detector to extract the semantic relationships between objects for constructing the semantic relation graph. Experiments demonstrate that the proposed model outperforms most state-of-the-art methods on the VQA v2.0 benchmark dataset.
Aiming at the intrinsic calibration of infrared cameras in certain atmospheric absorption bands, this paper proposes intrinsic-parameter correction methods based on the angular invariance of infrared-star calibration points. Four cases, the inner-product cosine invariant and the outer-product sine invariant, each under image models with and without distortion, are compared and analyzed. According to the experimental results, the outer-product sine invariant gives higher correction accuracy owing to its higher linearity at small angles, while the distortion-free image model is more sensitive to star-centroid extraction noise and yields a large intrinsic calibration error. The experiments also show that star-centroid extraction noise should be kept within half a pixel as much as possible; otherwise the intrinsic calibration accuracy may be degraded when distortion in the field of view is severe. Experiments show that the sine-invariant correction algorithm under the distortion model is well suited to intrinsic calibration of high-sensitivity infrared cameras using star points as a large set of control points.
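The two invariants compared above are just the cosine and sine of the inter-star angle, computed from the inner product and the cross-product norm of the unit direction vectors. A minimal sketch (function name ours) that also shows why the sine is the better small-angle observable: sin(θ) ≈ θ for small θ, whereas cos(θ) ≈ 1 − θ²/2 is nearly flat there.

```python
import numpy as np

def cos_sin_invariants(u, v):
    """Cosine (inner product) and sine (cross-product norm) of the
    angle between two star direction vectors, after normalization."""
    u = np.asarray(u, dtype=float)
    v = np.asarray(v, dtype=float)
    u = u / np.linalg.norm(u)
    v = v / np.linalg.norm(v)
    return float(np.dot(u, v)), float(np.linalg.norm(np.cross(u, v)))
```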
Understanding classroom activities can help parents and education experts analyze the teaching situation. However, employing people to supervise events in the classroom requires substantial human resources, and deploying surveillance video systems is considered a good solution to this problem. Converting the videos captured by cameras into descriptions can further reduce data transmission and storage costs. In this paper, we propose a new task named Classroom Video Captioning (CVC), which aims at describing the events in classroom videos in natural language. We collect classroom videos and annotate them with sentences. To tackle the task, we employ an effective architecture called a rethinking network to encode the visual features and generate the descriptions. Extensive experiments on our dataset demonstrate that our method can describe the events in classroom videos satisfactorily.
This paper presents a novel matrix completion algorithm, the penalty decomposition method based on augmented Lagrange multipliers (PD-ALM), to improve direction-of-arrival (DOA) estimation with sparse arrays. The PD-ALM algorithm applies the penalty decomposition method to solve the low-rank matrix completion problem directly. First, we construct a low-rank matrix using the special structure of the received signals of a uniform linear array (ULA). Then, the PD-ALM algorithm is used to complete the received signals of the sparse array. Finally, we apply the Multiple Signal Classification (MUSIC) algorithm to estimate the directions of arrival. Numerical experiments validate the effectiveness of the proposed algorithm.
In this paper, a new approach is proposed for classifying targets captured by low-resolution ground surveillance radar. Radar targets are detected via the Doppler effect in the radar echo signal. These signals can be processed in various domains to obtain unique target features that can be used in radar target classification and enhance its effectiveness. The proposed approach consists of two steps: transforming the original signals from 1D to 2D, and constructing a deep 2D convolutional neural network (CNN). In the first step, a Toeplitz matrix is used to reconstruct the radar signal into a 2D plane of data. The reconstruction does not change the characteristic distribution of the signal; it maps the signal from one dimension to two by rearranging the samples, which makes it possible to train a 2D CNN on the data. In the second step, we take advantage of the "bottleneck" block to build the 2D CNN, which guarantees the depth of the CNN and eases the problem of vanishing/exploding gradients during back-propagation. The method was tested on an actual collected database of humans and cars, achieving 99.7% accuracy on the original test set and 97% accuracy after adding noise.
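The 1D-to-2D Toeplitz rearrangement can be sketched as follows: a length-(r + c − 1) signal fills an r×c matrix whose descending diagonals are constant, so every sample is reused but none is altered. The exact matrix shape the paper uses is not specified here.

```python
import numpy as np

def signal_to_toeplitz(x, rows):
    """Map a 1-D echo sequence of length rows + cols - 1 into a
    rows x cols Toeplitz matrix, T[i, j] = x[i - j + cols - 1].

    Descending diagonals are constant, so this only rearranges the
    samples into a 2-D plane suitable for a 2-D CNN.
    """
    x = np.asarray(x)
    cols = len(x) - rows + 1
    return np.array([[x[i - j + cols - 1] for j in range(cols)]
                     for i in range(rows)])
```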
The synchrosqueezing transform (SST), a kind of reassignment method, aims to sharpen the time-frequency representation. In this paper, we consider a synchrosqueezing transform based on the short-time Fourier transform that incorporates the instantaneous frequency rate of change to analyze nonlinear and nonstationary signals, called the adaptive synchrosqueezing transform (ASST). Compared with the SST, the window width of the ASST is adaptively adjusted using the instantaneous frequency rate estimate extracted at the signal ridge. The proposed method generates a more energy-concentrated time-frequency representation for non-stationary signals with fast-varying frequencies. Simulation results demonstrate the effectiveness of the proposed method.
In this paper, we focus on the application of the LightGBM model to audio sound classification. Though convolutional neural networks (CNNs) generally have superior performance, the LightGBM model possesses certain notable advantages, such as low computational cost, feasibility of parallel implementation, and comparable accuracy over many datasets. To improve the generalization ability of the model, data augmentation operations are performed on the audio clips, including pitch shifting, time stretching, dynamic-range compression, and adding white noise. The accuracy of speech recognition heavily depends on the reliability of the representative features extracted from the audio signal. The audio signal is originally a one-dimensional time series, in which frequency changes are difficult to visualize, so it is necessary to extract the discernible components of the signal. To improve the representative capacity of our proposed model, we use the Mel spectrum and MFCCs (Mel-frequency cepstral coefficients) as two-dimensional input features to accurately characterize the internal information of the signal. The techniques in this paper are trained mainly on the Google Speech Commands dataset. The experimental results show that the method, an optimized LightGBM model based on the Mel spectrum, achieves high word classification accuracy.
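As a concrete illustration of the Mel scale these features are built on (the formula below is the common 2595·log10 variant; the paper does not specify which implementation it uses):

```python
import numpy as np

def hz_to_mel(f):
    # Common mel-scale mapping: roughly linear below 1 kHz, logarithmic above.
    return 2595.0 * np.log10(1.0 + np.asarray(f, dtype=float) / 700.0)

def mel_to_hz(m):
    # Inverse mapping, used to place triangular filter edges back in Hz.
    return 700.0 * (10.0 ** (np.asarray(m, dtype=float) / 2595.0) - 1.0)

# Equally spaced points on the mel axis give the edge frequencies of a mel
# filterbank; an MFCC then takes a DCT of the log filterbank energies.
edges_hz = mel_to_hz(np.linspace(hz_to_mel(0.0), hz_to_mel(8000.0), 12))
```

The perceptual motivation is that equal steps on the mel axis correspond to roughly equal perceived pitch differences, so filterbank resolution is concentrated at low frequencies.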
Cylindrical phased-array radar plays an important role in low-altitude target surveillance, and signal processing is one of the important component modules of the radar system. As one kind of phased-array radar, cylindrical radar is characterized by full-azimuth multi-beam coverage and a huge data volume, which places high demands on signal processing. FPGAs occupy an important position in radar signal processing because of their high speed and real-time capability. In this paper, a signal processing scheme based on Xilinx's Kintex-7 FPGA and a multi-core digital signal processor (DSP) is proposed, which mainly implements functions such as data reception, pulse compression, and moving target detection (MTD) processing. Comparing the actual results with MATLAB simulation results shows that this scheme achieves good stability with fast processing speed. Moreover, it has obvious advantages in design and provides great value for engineering.
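Pulse compression itself is a matched-filtering operation; a minimal NumPy sketch (the waveform parameters are invented for illustration and unrelated to the paper's radar) shows the correlation peak landing at the echo delay:

```python
import numpy as np

def pulse_compress(rx, ref):
    # Matched filter via FFT: cross-correlate the received signal with the
    # transmitted reference waveform; the output peaks at the echo delay.
    n = len(rx) + len(ref) - 1
    nfft = 1 << (n - 1).bit_length()          # next power of two
    out = np.fft.ifft(np.fft.fft(rx, nfft) * np.conj(np.fft.fft(ref, nfft)))
    return out[:n]

fs, T, B = 1e6, 100e-6, 100e3                 # sample rate, width, bandwidth
t = np.arange(100) / fs                       # 100 samples = T * fs
ref = np.exp(1j * np.pi * (B / T) * t ** 2)   # LFM chirp
delay = 40
rx = np.concatenate([np.zeros(delay), ref])   # noise-free delayed echo
peak = int(np.argmax(np.abs(pulse_compress(rx, ref))))
```

On an FPGA the same computation is typically pipelined as FFT, complex multiply by a stored reference spectrum, and inverse FFT.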
The blocking matrix preprocessing (BMP) and eigen-projection matrix preprocessing (EMP) algorithms for suppressing mainlobe interference lead to an offset of the main peak. To deal with this problem, conventional methods such as diagonal loading, diagonal loading combined with linear constraints, whitening, and weight-coefficient compensation are used. Guided by eigen-projection and the L2-norm constraint, this paper presents a novel method based on blocking matrix preprocessing and an L2-norm constraint. The improved BMP algorithm not only overcomes the deviation of the peak value, but also achieves excellent performance at low snapshot counts in comparison with traditional methods. Besides, the accurate direction of the sidelobe interference is not required, so the improved method is easy to implement in engineering. Its biggest advantage is that it maintains a high output signal-to-interference-plus-noise ratio (SINR) in the case of strong mainlobe jamming. Theoretical analysis and simulations demonstrate that the proposed method has optimal performance for mainlobe jamming suppression.
In linear systems, the conventional least mean fourth (LMF) algorithm has faster convergence and lower steady-state error than the LMS algorithm. However, in many applications censored observations occur frequently. In this paper, an LMF algorithm with censored regression (CRLMF) is proposed for adaptive filtering. When the identified system possesses a certain degree of sparsity, the CRLMF algorithm may encounter performance degradation. Therefore, a reweighted zero-attracting LMF algorithm based on the censored regression model (RZA-CRLMF) is further proposed. Simulations are carried out in system identification and echo cancellation scenarios, and the results verify the effectiveness of the proposed CRLMF and RZA-CRLMF algorithms. Moreover, in sparse systems the RZA-CRLMF algorithm further improves filter performance in terms of convergence speed and mean squared deviation in the presence of sub-Gaussian noise.
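The LMF update minimizes the mean fourth error E[e⁴] instead of E[e²]; a noise-free NumPy identification sketch (step size, filter order, and signals are illustrative, and the censored-regression extension of the paper is not shown):

```python
import numpy as np

def lmf_identify(x, d, order, mu):
    # LMF adaptive filter: w <- w + mu * e^3 * u, the stochastic gradient
    # of (1/4) E[e^4] with respect to the weights.
    w = np.zeros(order)
    for n in range(order - 1, len(x)):
        u = x[n - order + 1 : n + 1][::-1]   # regressor, newest sample first
        e = d[n] - w @ u
        w += mu * (e ** 3) * u
    return w

rng = np.random.default_rng(0)
x = rng.uniform(-1.0, 1.0, 50000)            # bounded white input
h = np.array([0.5, -0.3, 0.2, 0.1])          # "unknown" system
d = np.convolve(x, h)[: len(x)]              # noise-free desired response
w = lmf_identify(x, d, order=4, mu=0.05)     # w should approach h
```

The cubic error term makes adaptation aggressive for large errors and gentle near convergence, which is the source of LMF's low steady-state error.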
Speech watermarking has risen to be an efficient and promising solution for safeguarding speech signals in today’s world of swiftly advancing communication technologies. In this paper, robust principal component analysis (RPCA) and formant manipulation (FM) are used to embed the watermark into the host speech signal. RPCA obtains the sparse component of the speech signal for accurate embedding and extraction of the watermark, while FM modifies the formants by exploiting the properties of line spectral frequencies (LSFs). A non-blind watermark detection scheme is proposed that demonstrates better stability and accuracy. Performance evaluation reveals that the proposed technique is robust and the embedded watermark is imperceptible. The robustness of the method is also verified by testing it against several speech processing attacks.
In this paper, an end-to-end speaker verification system based on an attentive deep convolutional neural network (CNN) is presented. It takes log filter bank coefficients as input and measures speaker similarity between a test utterance and enrollment utterances by cosine similarity for verification. The approach utilizes the channel attention module of the convolutional block attention module (CBAM) to increase representational power by giving different weights to feature maps. In addition, softmax pre-training is used to initialize the network weights, and the tuple-based end-to-end (TE2E) loss function is responsible for fine-tuning in the evaluation stage; this strategy not only yields notable improvements over the baseline model but also allows direct optimization of the evaluation metric. Experimental results on the VoxCeleb dataset indicate that the proposed model achieves an equal error rate (EER) of 3.83%, slightly worse than x-vectors but outperforming i-vectors.
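The verification decision itself is simple once embeddings exist; a sketch of the cosine-similarity scoring stage (the toy embeddings and threshold are placeholders, and the CNN that produces real embeddings is not shown):

```python
import numpy as np

def cosine_score(test_emb, enroll_embs):
    # Average enrollment embeddings into a speaker model, then score the
    # test embedding against it with cosine similarity in [-1, 1].
    model = np.mean(enroll_embs, axis=0)
    return float(test_emb @ model /
                 (np.linalg.norm(test_emb) * np.linalg.norm(model)))

# Toy 2-D "embeddings"; a real system would use the CNN's output vectors
# and tune the acceptance threshold on a development set.
enroll = np.array([[2.0, 0.0], [4.0, 0.0]])
same = cosine_score(np.array([1.0, 0.0]), enroll)   # aligned  -> 1.0
diff = cosine_score(np.array([0.0, 1.0]), enroll)   # orthogonal -> 0.0
```

Because cosine similarity ignores vector magnitude, it scores only the direction of the embedding, which is what the TE2E loss optimizes directly.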
Feature extraction and utilization are of great importance for machine fault diagnosis. In this paper, a multi-head deep learning network is proposed to classify machine health status using features of different sizes. Firstly, statistical characteristics that reflect the machine signal status in the time and frequency domains are summarized into feature vectors used as the one-dimensional network input. Secondly, the Mel power spectrum and its incremental characteristics are utilized as a three-channel two-dimensional network input. Lastly, the multi-head network analyzes both the one-dimensional and two-dimensional features with two different sub-networks and classifies the machine health status according to the joint feature-analysis result. Experiments on the Case Western Reserve University bearing database show that the proposed method has good mechanical-signal classification ability and better stability. Moreover, our final test accuracy of fault diagnosis on 16 kinds of bearing working signals reaches about 99.53%.
The conventional channel detector based on the maximum a posteriori (MAP) algorithm for coded multiple-input multiple-output (MIMO) multiuser systems has a computational complexity growing exponentially with the product of the number of users, the number of transmit antennas, and the symbol constellation size. In this paper, we consider the multiuser detection problem from a combinatorial optimization viewpoint and develop a low-complexity iterative receiver based on the evolutionary programming (EP) technique. Simulation results show that with the proposed receiver, the performance of coded multiuser systems approaches that of the iteratively MAP-decoded single-user (SU) MIMO system at a significantly reduced computational complexity even for unknown channel scenarios.
In this paper, we propose a fully graph-based iterative detection and decoding scheme for low-density parity-check (LDPC) coded generalized two-dimensional (2D) intersymbol interference (ISI) channels. The 2D detector consists of a downtrack detector based on the symbol-level sum-product algorithm (SPA) and a bit-level SPA-based crosstrack detector. An LDPC decoder based on simplified check-node operations provides soft information for the 2D channel detector. Numerical results show that the proposed receiver significantly reduces decoding complexity and achieves better performance than the trellis-based BCJR detector over 2×2 2D channels.
For bistatic inverse synthetic aperture radar (Bi-ISAR) cross-range scaling (CRS), the effective rotational velocity (ERV) must be estimated and the linear-geometry distortion corrected at the same time. In this paper, the ERV, rotational center (RC), and ratio of linear-geometry distortion (RLGD) are jointly estimated by optimizing the image quality, measured by the image entropy. After parameter estimation and phase compensation, an image free of linear-geometry distortion is generated by the matched Fourier transform (MFT). Numerical results validate that the proposed method works robustly under different signal-to-noise ratio (SNR) conditions.
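Image entropy, the focus measure being minimized during the joint estimation, can be sketched as follows (a generic formulation; the paper's exact normalization may differ):

```python
import numpy as np

def image_entropy(img, eps=1e-12):
    # Shannon entropy of the normalized image energy: focused (sharp)
    # images concentrate energy in few pixels and give low entropy,
    # so parameter search minimizes this value.
    p = np.abs(img) ** 2
    p = p / (p.sum() + eps)
    return float(-np.sum(p * np.log(p + eps)))

blurred = np.ones((8, 8))            # energy spread out: high entropy
focused = np.zeros((8, 8))
focused[3, 3] = 1.0                  # energy concentrated: low entropy
```

A uniform image attains the maximum entropy log(N) for N pixels, while a single-point image attains (nearly) zero, which is why entropy discriminates well between defocused and well-scaled ISAR images.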
Passive radar systems are widely used in military, scientific, and commercial fields, but they face a great challenge in detecting small, slow-moving targets. This paper proposes a new Bernoulli track-before-detect filter to deal with this detection difficulty. By estimating the ground clutter parameters, the new method can adapt to changes in the ground clutter. Simulation results demonstrate the effectiveness of the new method.
The Ethernet media access control (MAC) controller is an indispensable IP core in field-programmable gate array (FPGA) designs. To realize independent intellectual property rights for a MAC controller IP core, this paper designs a MAC controller that supports the media independent interface (MII) and gigabit media independent interface (GMII) in both full-duplex and half-duplex modes. According to the Ethernet frame format, the MAC control frame structure, and the station (STA) management frame format defined in the IEEE 802.3 protocol, the overall structure of the MAC controller and the function of each module are designed. The Advanced High-performance Bus (AHB) and Advanced Peripheral Bus (APB) are used to provide separate access to the data cache and the configuration registers, improving the bus transmission efficiency of the MAC controller. Electronic design automation (EDA) and FPGA board-level verification results show that the MAC controller meets the design requirements for data transmission.
To deal with impulsive noise, the traditional filtered-s normalized maximum correntropy criterion (FsNMCC) adaptive algorithm offers good robustness in nonlinear active noise control (ANC) systems. However, the FsNMCC algorithm uses a single Gaussian kernel, whose noise-reduction performance is susceptible to the value of the kernel width. To surmount this shortcoming, the filtered-s normalized maximum mixture correntropy criterion (FsNMMCC) algorithm is designed for ANC systems based on a functional link artificial neural network (FLANN). Simulation results show that the proposed FsNMMCC algorithm has better noise-reduction performance than the FsNMCC algorithm in active noise control of impulsive noise with a standard symmetric α-stable (SαS) distribution.
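The advantage of a mixture kernel can be seen in the error weighting it induces on the adaptive update; a sketch (the kernel widths and mixing coefficient are arbitrary illustrative values, not the paper's settings):

```python
import numpy as np

def mcc_mixture_weight(e, s1=1.0, s2=4.0, alpha=0.5):
    # Error weighting induced by a two-kernel mixture correntropy cost:
    # each Gaussian kernel contributes exp(-e^2 / (2 s^2)) / s^2, so large
    # impulsive errors are strongly de-emphasized while the two widths
    # cover different error scales.
    e = np.asarray(e, dtype=float)
    return (alpha / s1 ** 2) * np.exp(-e ** 2 / (2 * s1 ** 2)) \
        + ((1 - alpha) / s2 ** 2) * np.exp(-e ** 2 / (2 * s2 ** 2))
```

With a single kernel, one width must cover all error magnitudes; the mixture keeps meaningful gradient weight over a broader error range, which is what reduces sensitivity to the kernel-width choice.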
RSS-based target localization algorithms are usually derived from a channel path-loss model in which the measurement noise is generally assumed to obey a Gaussian distribution. In this paper, we approximate the realistic measurement-noise distribution by a Gaussian mixture model and propose an improved mixture-noise-analysis-based RSS target localization algorithm employing expectation maximization, called the Gaussian mixture-expectation maximization (GMEM) approach, to estimate target coordinates iteratively; it can efficiently handle the unknown parameters of maximum likelihood estimation and the resulting non-convex optimization. Simulations show a considerable performance gain for our proposed localization algorithm in a 2-D wireless sensor network.
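For context, the log-distance path-loss model such algorithms start from, together with a brute-force single-Gaussian ML baseline over a grid (all parameter values are invented for illustration; the paper's GMEM approach replaces the single-Gaussian assumption with an EM-fitted mixture):

```python
import numpy as np

def rss_model(target, anchors, p0=-40.0, n=2.5, d0=1.0):
    # Log-distance path-loss model: RSS(d) = P0 - 10 n log10(d / d0) [dBm].
    d = np.linalg.norm(anchors - target, axis=1)
    d = np.maximum(d, 1e-3)                  # guard against zero distance
    return p0 - 10.0 * n * np.log10(d / d0)

def grid_ml_estimate(rss, anchors, lim=10.0, step=0.1):
    # Brute-force ML under i.i.d. Gaussian noise (least squares over a grid).
    # GMEM instead models the noise as a Gaussian mixture whose parameters
    # are estimated jointly with the position by EM.
    best, best_cost = None, np.inf
    for x in np.arange(0.0, lim, step):
        for y in np.arange(0.0, lim, step):
            r = rss_model(np.array([x, y]), anchors)
            cost = float(np.sum((rss - r) ** 2))
            if cost < best_cost:
                best, best_cost = np.array([x, y]), cost
    return best

anchors = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0], [10.0, 10.0]])
true_pos = np.array([3.0, 4.0])
est = grid_ml_estimate(rss_model(true_pos, anchors), anchors)
```

In the noise-free case the cost is zero only at the true position, so the grid estimate lands on the nearest grid point to it.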
Constant false alarm rate (CFAR) detectors are widely used in modern radar systems to declare the presence of targets. Under multiple strong interferences, one or more outliers appear in the reference cells and inflate the clutter power estimate, which raises the detection threshold: the detection probability of CFAR detectors decreases and the false alarm rate deviates significantly from its design value. This paper proposes an adaptive weighted truncation statistic CFAR (AWTS-CFAR) algorithm that achieves good performance. By improving the truncation process, the truncated larger values are adaptively weighted together with the smaller values in the reference cells. Since AWTS-CFAR lets the larger reference-cell values also participate in the background clutter power estimate, even a smaller truncation threshold does not cause excessive CFAR loss, and the clutter-edge effect is suppressed as much as possible in clutter-edge environments.
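For reference, the baseline such schemes improve on is cell-averaging CFAR; a standard sketch (cell counts and P_fa are arbitrary, and the adaptive weighted truncation itself is not reproduced here):

```python
import numpy as np

def ca_cfar(power, guard=2, train=8, pfa=1e-4):
    # Cell-averaging CFAR: estimate clutter power from training cells on
    # both sides of the cell under test (CUT), excluding guard cells, and
    # compare the CUT against a scaled threshold.
    n = 2 * train
    alpha = n * (pfa ** (-1.0 / n) - 1.0)   # scaling for exponential clutter
    detections = np.zeros(len(power), dtype=bool)
    for i in range(train + guard, len(power) - train - guard):
        lead = power[i - guard - train : i - guard]
        lag = power[i + guard + 1 : i + guard + train + 1]
        noise = (lead.sum() + lag.sum()) / n
        detections[i] = power[i] > alpha * noise
    return detections

power = np.ones(100)        # flat clutter floor (unit power)
power[50] = 1000.0          # strong target in cell 50
det = ca_cfar(power)
```

When a second strong return falls inside the training window, `noise` is inflated and the threshold rises, which is exactly the masking effect a truncation-based estimator is designed to counter.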
In cloud computing, the resource allocation of a cloud platform faces not only single-node resource requests but also complex multi-node requests. In particular, users who need to run parallel or distributed tasks have very strict delay and bandwidth requirements for communication between nodes in the cloud cluster. Existing cloud platforms often allocate resources one virtual machine at a time, ignoring, or finding it difficult to guarantee, the link resources between nodes; that is, cloud clusters face a multi-resource allocation problem. Therefore, this paper proposes a new cloud resource description method and improves the particle swarm cloud resource allocation method. Simulation results show that the proposed method effectively allocates cloud resources, improves the average revenue and resource utilization, reduces resource cost by at least 10% compared with traditional methods, and achieves shorter task execution times (within 30 ms).
In this paper, a novel robust algorithm called geometric algebra least mean M-estimate (GA-LMM) is proposed, which extends the conventional LMM algorithm to GA space. To further improve convergence performance, a variable step-size GA-LMM (VSS-GA-LMM) algorithm is also proposed, which effectively balances the trade-off between convergence rate and steady-state misalignment. Finally, a multidimensional system identification problem is considered to verify the performance of the proposed GA-LMM and VSS-GA-LMM algorithms. Simulation results show that the proposed algorithms are superior to other GA-based algorithms in terms of convergence rate and steady-state misalignment in impulsive noise environments.
Pregnancy complications put gestational women at risk, especially those over 35, and can seriously threaten the safety of mother and fetus. This paper aims to detect comprehensive adverse pregnancy outcomes based on electronic medical records (EMRs) from the obstetrical department. However, EMR data are usually incomplete, imbalanced, and high-dimensional with sparsity, so missing-value imputation and data balancing methods were applied to improve data quality. Manual feature selection based on medical prior knowledge and automatic feature selection methods were also implemented to extract risk factors and were evaluated for classification. The experimental results show that our system is capable of identifying patients at risk, achieving a best accuracy of 0.8707 and a best recall of 0.7454. Besides, the extracted risk factors offer the opportunity to assist clinical diagnosis and improve labor processing procedures.
In this paper, we propose Detector-Net, a point cloud registration method that learns a 3D feature detector for a specific descriptor. Unlike traditional detectors, a deep neural network generates this detector, and manual annotation of feature points is not required. Instead, we leverage aligned point clouds to deduce distinguishing points and generate training data. An indoor point cloud dataset is used as the training set, and experimental results show that Detector-Net achieves better accuracy than traditional detectors.
In adjacent multi-target scenarios, the Gaussian mixture probability hypothesis density (GM-PHD) algorithm suffers from inaccurate target-number estimation and low tracking accuracy. To tackle these problems, this paper proposes an improved component-management strategy for the GM-PHD algorithm. We develop a master-slave mode for processing Gaussian components: the master components, whose weights exceed the extraction threshold, are retained without being merged with each other, which guarantees the accuracy of target-number estimation. Meanwhile, the slave components that satisfy the merging conditions are merged into the corresponding master components to improve tracking accuracy. Simulation results show that the proposed algorithm achieves better performance than the conventional GM-PHD algorithm in different clutter environments.
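The moment-matched merge that such component management relies on can be sketched as follows (a standard GM-PHD merging step; the master-slave selection logic of the paper is not shown):

```python
import numpy as np

def merge_components(weights, means, covs):
    # Moment-matched merge of Gaussian components into a single Gaussian:
    # the merged weight, mean, and covariance preserve the mixture's
    # first two moments (covariance includes the spread-of-means term).
    w = float(np.sum(weights))
    m = np.sum(weights[:, None] * means, axis=0) / w
    P = np.zeros_like(covs[0])
    for wi, mi, Pi in zip(weights, means, covs):
        d = (mi - m)[:, None]
        P += wi * (Pi + d @ d.T)
    return w, m, P / w

# Two equal-weight 1-D components at +/-1 with unit covariance.
weights = np.array([0.5, 0.5])
means = np.array([[1.0], [-1.0]])
covs = np.array([[[1.0]], [[1.0]]])
w, m, P = merge_components(weights, means, covs)
```

Merging two well-separated components inflates the covariance (here from 1 to 2), which is why merging the high-weight "master" components with each other would blur distinct adjacent targets into one.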