This PDF file contains the front matter associated with SPIE Proceedings Volume 13086, including the Title Page, Copyright information, Table of Contents, and Conference Committee information.
Camshafts, mechanical transmission devices with highly reflective surfaces, are widely utilized in the automotive industry. To ensure their performance and longevity, comprehensive quality inspections must be performed to identify any minor defects. Although deep learning has achieved success in industrial quality inspection applications, high computational power costs have impeded its progress. To address this limitation, this study proposes CamLite, a lightweight model for anomaly detection on edge devices. CamLite incorporates three key innovations: a lightweight backbone network, an auxiliary learning paradigm, and a dynamic weighted average strategy. The lightweight backbone network, a straight-through network constructed with reparameterization blocks, efficiently extracts detailed and semantic features with minimal parameter and memory overhead. The auxiliary learning paradigm improves the model’s performance by incorporating an anomaly classification task with higher information entropy without increasing inference latency. Moreover, the dynamic weighted average strategy adjusts the loss weights of the primary and auxiliary tasks, allowing the network optimization process to focus more on minimizing the loss of the primary task. Experimental results demonstrate that CamLite achieves an outstanding balance between accuracy, model size, and latency, with a Matthews correlation coefficient of 0.924 and only 0.63M parameters.
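The dynamic weighted average strategy is described only at a high level in the abstract; the following is a minimal sketch of one plausible interpretation, in which each task's weight is derived from its recent loss descent rate and the primary task receives an extra bias. The function name, temperature, and bias parameter are assumptions for illustration, not the paper's formulation.

```python
import math

def dynamic_task_weights(loss_history, temperature=2.0, primary_bias=1.2):
    """Compute loss weights for (primary, auxiliary) tasks from their recent descent rates.

    loss_history: dict mapping task name -> list of recent scalar losses (>= 2 entries).
    A task whose loss is falling slowly gets a larger weight; the primary task is
    additionally biased so optimization focuses on it.
    """
    rates = {t: losses[-1] / max(losses[-2], 1e-8) for t, losses in loss_history.items()}
    exps = {t: math.exp(r / temperature) for t, r in rates.items()}
    total = sum(exps.values())
    weights = {t: len(exps) * e / total for t, e in exps.items()}
    weights["primary"] *= primary_bias  # extra emphasis on the primary (anomaly detection) task
    return weights

# toy usage: combine the two task losses with the dynamic weights
history = {"primary": [0.80, 0.72], "auxiliary": [0.50, 0.49]}
w = dynamic_task_weights(history)
total_loss = w["primary"] * 0.72 + w["auxiliary"] * 0.49
print(w, total_loss)
```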
We present a robust facial landmark detection network based on multiscale attention residual blocks (MARBNet) for effectively predicting facial landmarks. MARBNet consists of three modules. First, the coarse feature extraction module obtains coarse features through convolution, batch normalization, ReLU activation, and max pooling. The fine feature extraction module is composed of 33 multiscale attention residual blocks (MARB). Each MARB is composed of a 1x1 convolution layer, a 3x3 convolution layer, a 1x1 convolution layer, two multiscale convolution modules (MulRes), and a channel attention module (CAM). MulRes extracts complementary features at different scales, obtaining more feature information under different receptive fields and avoiding excessive loss of key information in the input image. CAM enables the network to pay more attention to high-frequency information across channels and effectively prevents information loss, thereby improving facial landmark detection. The output module consists of two 1x1 convolution layers, one of which outputs the landmark heatmap score and landmark coordinate offset, while the other outputs the nearest-neighbor landmark offset. Experimental results on the WFLW and 300W datasets show that our method is superior to existing algorithms in terms of the normalized mean square error metric.
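As a reading aid, here is a simplified PyTorch sketch of a MARB-like block built from the listed components (1x1, 3x3, and 1x1 convolutions plus a multiscale module and channel attention with a residual skip). The channel widths, kernel choices inside MulRes, and the exact ordering are assumptions, not the paper's definition.

```python
import torch
import torch.nn as nn

class MulRes(nn.Module):
    """Multiscale convolution module: parallel 3x3 and 5x5 branches fused by a 1x1 conv."""
    def __init__(self, ch):
        super().__init__()
        self.b3 = nn.Conv2d(ch, ch, 3, padding=1)
        self.b5 = nn.Conv2d(ch, ch, 5, padding=2)
        self.fuse = nn.Conv2d(2 * ch, ch, 1)
    def forward(self, x):
        return self.fuse(torch.cat([self.b3(x), self.b5(x)], dim=1))

class CAM(nn.Module):
    """Channel attention module: squeeze-and-excitation style channel gating."""
    def __init__(self, ch, r=4):
        super().__init__()
        self.fc = nn.Sequential(nn.AdaptiveAvgPool2d(1),
                                nn.Conv2d(ch, ch // r, 1), nn.ReLU(inplace=True),
                                nn.Conv2d(ch // r, ch, 1), nn.Sigmoid())
    def forward(self, x):
        return x * self.fc(x)

class MARB(nn.Module):
    """Multiscale attention residual block: 1x1 -> MulRes -> 3x3 -> MulRes -> 1x1 -> CAM, with skip."""
    def __init__(self, ch):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(ch, ch, 1), nn.ReLU(inplace=True),
            MulRes(ch),
            nn.Conv2d(ch, ch, 3, padding=1), nn.ReLU(inplace=True),
            MulRes(ch),
            nn.Conv2d(ch, ch, 1),
            CAM(ch))
    def forward(self, x):
        return x + self.body(x)

x = torch.randn(1, 32, 64, 64)
print(MARB(32)(x).shape)  # torch.Size([1, 32, 64, 64])
```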
Multi-scale feature fusion strategies are effective and widely used in image restoration. However, there is a lack of research on applying the multi-scale fusion strategy to attention mechanisms, even though attention has also been shown to be effective in image restoration. To address this problem, we propose a residual multi-scale pixel attention fusion block (RMPAFB) to refine the input feature, which successfully combines the multi-scale fusion strategy with pixel attention. RMPAFB can capture feature correspondences from multi-scale pixel attention maps, which are more effective for feature refinement than a single-scale pixel attention map. Based on RMPAFB, we build an efficient and effective network called RMPAFNet for image deblurring. Extensive experiments on several benchmark datasets show that multi-scale pixel attention performs better than single-scale pixel attention and that our proposed RMPAFNet achieves state-of-the-art performance while requiring less overhead than recent competing deblurring models.
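A minimal sketch of the multi-scale pixel attention idea is given below: pixel attention maps are computed at several downsampled scales, upsampled back, fused, and applied to the input feature with a residual connection. The scale set, layer widths, and fusion layer are assumptions rather than the paper's RMPAFB.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiScalePixelAttention(nn.Module):
    """Pixel attention maps computed at several scales, upsampled and fused (RMPAFB-style sketch)."""
    def __init__(self, ch, scales=(1, 2, 4)):
        super().__init__()
        self.scales = scales
        self.pa = nn.ModuleList([
            nn.Sequential(nn.Conv2d(ch, ch, 1), nn.ReLU(inplace=True),
                          nn.Conv2d(ch, ch, 1), nn.Sigmoid())
            for _ in scales])
        self.fuse = nn.Conv2d(len(scales) * ch, ch, 1)

    def forward(self, x):
        h, w = x.shape[-2:]
        maps = []
        for s, branch in zip(self.scales, self.pa):
            xs = F.avg_pool2d(x, kernel_size=s) if s > 1 else x   # downsample for coarser attention
            att = branch(xs)
            maps.append(F.interpolate(att, size=(h, w), mode="bilinear", align_corners=False))
        att = self.fuse(torch.cat(maps, dim=1))
        return x + x * att   # residual refinement of the input feature

x = torch.randn(1, 16, 64, 64)
print(MultiScalePixelAttention(16)(x).shape)
```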
We present a two-dimensional human pose estimation network constrained by human structure information (HSINet). HSINet effectively fuses features of different scales and explicitly integrates human structure information to enhance the precision of key point localization. The architecture of HSINet comprises three pivotal modules: the feature extraction module, the encoding module, and the decoding module. The feature extraction module employs the architecture of High-Resolution Net (HRNet); in contrast to HRNet, we remove redundant layers and enhance the ability to combine global and local features using the Gated Attention Unit (GAU). The encoding module encodes the feature maps derived from the feature extraction module: each feature map corresponds to a joint point and is characterized by two feature vectors representing the x and y axes, and graph convolution is used for encoding to introduce constraints based on human structure information. Subsequently, these encoded feature maps are decoded into precise coordinates of the key points. Experimental results on the COCO dataset show that our proposed method improves the precision of key point detection while effectively reducing the number of parameters.
Suitable matching area selection (SMAS) is one of the key technologies for aircraft scene matching navigation (SMN). Recently, deep neural networks have been applied to SMAS due to their powerful feature extraction capabilities, and have achieved significant performance improvements compared to hand-crafted suitable matching indicators. However, SMAS based solely on deep networks does not make full use of existing image suitability information. Therefore, this paper constructs a dual-branch multi-modal fusion network to maximize the utilization of image suitability features and improve SMAS performance. The network contains two parallel branches: one takes hand-crafted suitable matching indicators as input and uses natural language processing networks to obtain deep suitable matching parameter vectors, while the other employs deep networks to extract deep features from the original image patches. Finally, the two types of multi-modal information are fused to produce multi-modal suitable matching features (MMSMF), which are input to the output layer to predict the matching probability. By setting a matching probability threshold, we can distinguish between suitable and unsuitable matching image patches. Compared with traditional hand-crafted suitable matching indicators and deep networks, MMSMF yields a richer suitability feature representation. The public Sentinel dataset named SEN1-2 is used to evaluate the performance of MMSMF. Experimental results show its advantages over other representative methods.
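To make the dual-branch layout concrete, here is a hedged sketch in which a small MLP stands in for the indicator-vector encoder (the paper uses an NLP-style network), a shallow CNN encodes the image patch, and the concatenated features feed a sigmoid head that predicts the matching probability. All dimensions, layer counts, and names are assumptions.

```python
import torch
import torch.nn as nn

class DualBranchSMAS(nn.Module):
    """Fuse hand-crafted suitability indicators with deep image-patch features (sketch)."""
    def __init__(self, n_indicators=8, feat_dim=64):
        super().__init__()
        # branch 1: encode the hand-crafted indicator vector (stand-in for the NLP-style encoder)
        self.ind_branch = nn.Sequential(nn.Linear(n_indicators, feat_dim), nn.ReLU(),
                                        nn.Linear(feat_dim, feat_dim), nn.ReLU())
        # branch 2: shallow CNN over the original image patch
        self.img_branch = nn.Sequential(
            nn.Conv2d(1, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(32, feat_dim), nn.ReLU())
        # fusion head -> matching probability
        self.head = nn.Sequential(nn.Linear(2 * feat_dim, feat_dim), nn.ReLU(),
                                  nn.Linear(feat_dim, 1), nn.Sigmoid())

    def forward(self, indicators, patch):
        mmsmf = torch.cat([self.ind_branch(indicators), self.img_branch(patch)], dim=1)
        return self.head(mmsmf)   # probability that the patch is a suitable matching area

model = DualBranchSMAS()
p = model(torch.randn(4, 8), torch.randn(4, 1, 128, 128))
print(p.shape)  # torch.Size([4, 1]); threshold p to label suitable / unsuitable patches
```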
Our work focuses on exploring the emerging field of cross-modal vehicle re-identification. Achieving accurate cross-modal vehicle re-identification requires a network that can capture local details from images of two different modalities while effectively fusing their valid information. However, existing methods only consider extracting high-level semantics, leading to a loss of fine-grained details and imprecise identification. Additionally, insufficient attention has been paid to the effective information in different modalities, as cross-modality interaction has not been thoroughly explored. To address these issues, we propose a new cross-modal vehicle re-identification network consisting of a multi-scale feature fusion module and a cross-modal attention module. Specifically, the multi-scale feature fusion module captures both global high-level semantics and local details by integrating multi-scale information in the feature extraction process, reducing the loss of local details. The cross-modal attention module explores valid information from different modalities and achieves feature-level fusion. We conducted experiments on the RGBNT100 cross-modal vehicle re-identification dataset to verify the proposed method’s effectiveness.
Line-structured light bar images acquired in industrial metal part inspection may suffer from inaccurate extraction of the light bar centerline due to interference from ambient lighting and the strong diffuse reflection or weak reflection of the part surface material. For this reason, this study proposes a method based on G-HL (Gauss-Hyper-Laplace distribution) curve fitting to improve the center extraction accuracy of the line-structured light bar in complex environments. First, the method uses Gaussian filtering to denoise the line-structured light image. Then, the laser line target pixel region is localized by finding the upper and lower boundaries of the light bar using the neighborhood gradient of each image column; the width of the light bar is calculated from the difference of the boundary coordinates, and the fitting interval is determined. Finally, within the fitting interval, the G-HL curve is fitted to the gray levels of the light bar cross-section to obtain the G-HL parameters, and the location of the maximum of the fitted curve is taken as the center coordinate of the line-structured light. Experiments show that the proposed method can accurately extract the center of the line-structured light under low exposure. Compared with methods such as the grayscale center-of-gravity method and the extreme value method, the proposed method has higher accuracy and robustness.
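The exact G-HL profile is not given in the abstract, so the sketch below uses a plain Gaussian as a stand-in for the cross-section model while following the described pipeline: denoise, locate the bright band per column from the vertical intensity gradient, fit a bell-shaped curve over the interval, and take the peak location as the sub-pixel center. Thresholds and parameter guesses are assumptions.

```python
import numpy as np
from scipy.ndimage import gaussian_filter
from scipy.optimize import curve_fit

def stripe_centers(img, grad_thresh=15.0):
    """Column-wise sub-pixel stripe centers (Gaussian profile standing in for the G-HL curve)."""
    img = gaussian_filter(img.astype(np.float64), sigma=1.0)   # denoise

    def bell(y, a, mu, sigma, b):
        return a * np.exp(-(y - mu) ** 2 / (2 * sigma ** 2)) + b

    centers = []
    for c in range(img.shape[1]):
        col = img[:, c]
        grad = np.gradient(col)
        top = int(np.argmax(grad))            # rising edge of the stripe (upper boundary)
        bot = int(np.argmin(grad))            # falling edge (lower boundary)
        if bot <= top or grad[top] < grad_thresh:
            centers.append(np.nan)            # no reliable stripe in this column
            continue
        y = np.arange(top, bot + 1)           # fitting interval from the boundary difference
        p0 = [col[y].max() - col[y].min(), y.mean(), max((bot - top) / 4, 1.0), col[y].min()]
        try:
            popt, _ = curve_fit(bell, y, col[y], p0=p0, maxfev=2000)
            centers.append(popt[1])           # peak location = stripe center row
        except RuntimeError:
            centers.append(np.nan)
    return np.array(centers)

# synthetic stripe for a quick check
rows = np.arange(100)[:, None]
test = 200 * np.exp(-(rows - 47.3) ** 2 / (2 * 3.0 ** 2)) + 5 + np.random.rand(100, 120)
print(np.nanmean(stripe_centers(test)))  # ~47.3
```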
Currently, object detection based on deep learning has received extensive research attention in the field of grid inspection, achieving high detection accuracy and recognition precision. However, pre-trained object detection models lack overall perception and reasoning capabilities, resulting in more false positives and missed detections due to a lack of holistic understanding of challenging samples. Recently, the combination of natural language models and image understanding in multi-modal large language models has gained significant attention. In this paper, we propose the Grid-Blip model, a multi-modal large model enhanced with general knowledge, to specifically study wildfire detection in grid inspection. Grid-Blip is based on the BLIP model architecture, which includes a natural language model, a visual generation model, and a fusion model. We conduct large-scale sample annotation at the semantic level of whole-image grid inspection, providing crucial training samples for multi-modal large-model research. Furthermore, we investigate the design of the fusion model network, training the model to effectively integrate the pre-trained natural language model and visual generation model. Experimental results demonstrate that, compared to object detection models, the proposed multi-modal large model achieves overall semantic perception and reasoning capabilities. The Grid-Blip model reduces the false alarm rate for wildfire smoke trend prediction from 20% to 10% and the missed detection rate from 18% to 13%.
Scene recognition has a wide range of applications in autonomous driving, security monitoring, smart homes, and more. Although traditional methods achieved good results in this field, deep-learning methods are now dominant. In this paper, we conduct an extensive comparison of the performance of five deep-learning models on a common dataset to reveal their strengths and weaknesses. Experimental results show that, of the five deep-learning models, ConvNeXt works best. In addition, all five models outperform a traditional method that was formerly the state of the art.
This paper investigates a binarized backbone neural network design optimization method for dedicated hardware circuits, in which the activation values and weights of the convolutional operations are represented with 1 bit to achieve low computational cost and a low parameter count while maintaining high image classification accuracy. The network adopts a hyperparameterized training method that introduces hyperparameters into the binarization of the activation function, weights, and activation values to improve accuracy; during inference, all hyperparameters are absorbed into two binarization thresholds to simplify the hardware implementation. The downsampling layers adopt a separate design, using pooling layers to complete the downsampling task of each convolutional module, which improves accuracy while ensuring that all convolutional layers in the network have a stride of 1, providing uniformity and convenience for hardware deployment. The input layer is also binarized: the three RGB channels are binarized into 24 channels to retain the input information, so that all convolutions operate on binary data, and only the BN and pooling layers retain multi-valued information. Based on the above scheme, the proposed binary network architecture has good hardware adaptability, and its computational cost and parameter count are lower than those of other common binary networks while maintaining high accuracy, achieving 89.27% and 66.21% top-1 accuracy on the CIFAR-10 and CIFAR-100 datasets, respectively. A model that retains multi-valued information in the input layer and binary data in the other layers achieves a top-1 accuracy of 65% on the ImageNet dataset.
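A minimal sketch of 1-bit weight/activation binarization with a straight-through gradient estimator is shown below, with learnable thresholds standing in for the place where the paper's hyperparameters would be absorbed at inference. The stride-1 constraint from the text is kept; the threshold scheme and layer sizes are assumptions, not the paper's exact design.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class BinarizeSTE(torch.autograd.Function):
    """Sign binarization against a threshold, with a straight-through gradient estimator."""
    @staticmethod
    def forward(ctx, x, threshold):
        ctx.save_for_backward(x)
        return torch.where(x >= threshold, torch.ones_like(x), -torch.ones_like(x))
    @staticmethod
    def backward(ctx, grad_out):
        (x,) = ctx.saved_tensors
        return grad_out * (x.abs() <= 1).float(), None   # clip gradient outside [-1, 1]

class BinaryConv2d(nn.Module):
    """Conv layer whose weights and activations are binarized to +/-1 (stride fixed to 1)."""
    def __init__(self, in_ch, out_ch, k=3):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(out_ch, in_ch, k, k) * 0.1)
        # at inference, scaling hyperparameters could be absorbed into these two thresholds
        self.act_threshold = nn.Parameter(torch.zeros(1), requires_grad=False)
        self.w_threshold = nn.Parameter(torch.zeros(1), requires_grad=False)

    def forward(self, x):
        xb = BinarizeSTE.apply(x, self.act_threshold)
        wb = BinarizeSTE.apply(self.weight, self.w_threshold)
        return F.conv2d(xb, wb, stride=1, padding=self.weight.shape[-1] // 2)

x = torch.randn(2, 24, 32, 32)   # e.g. RGB expanded to 24 binary channels at the input layer
print(BinaryConv2d(24, 32)(x).shape)
```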
Deep learning technology has been widely applied in the field of image object detection, and many mature object detection models have emerged; these rely on large numbers of data samples for learning and training. However, in many practical application scenarios, it is difficult to obtain a large number of correctly labeled samples. The required quantity and quality of training samples is therefore an important issue in few-shot detection. This paper explores the relationship between sample size and training effectiveness through model training experiments on different datasets. We find that the accuracy and recall of the model are both above 70% when the sample size exceeds 500, and below 10% when the sample size is under 100. We optimized a dataset of 197 images with data augmentation, improving the model's mean average precision by 17.1%. By adjusting the simulated azimuth and pitch angles to obtain datasets with different sparsity, we trained the detection model on these datasets and tested its detection performance on test images. We found that increasing the shooting angle interval makes the dataset sparser, resulting in a decrease in the mean average precision on the validation set and a decrease in detection performance on the test images. Moreover, an overly sparse dataset can cause over-fitting problems.
Structured light (SL) scanning in industrial metrology offers the advantages of non-contact measurement, high speed, and high precision. However, fringes may exhibit excessive modulation when measuring highly reflective metal parts, leading to invalid phase calculations and abnormal reconstructed point clouds. High dynamic range (HDR) scanning is a commonly used approach to address this issue, but the optimal exposure time of this method is usually influenced by the reflective properties of the object. In this paper, we propose an optimal exposure time selection method based on clustering of fringe modulation images. First, we acquire modulation degree images containing only the object. Subsequently, the K-Means method is employed to cluster the modulation degree images. The K best exposure times are calculated based on the average maximum gray value of the fringe images in each cluster. Finally, K sets of fused fringe images are captured and used to reconstruct three-dimensional (3D) information with enhanced coverage. Using our method, the 3D information of a printed circuit board (PCB) is reconstructed, achieving an improved coverage rate of 98.1% compared to the 88.3% obtained from a single-shot capture. Additionally, our method achieves a measurement accuracy of 0.08 mm. Experimental results demonstrate that the proposed method efficiently selects the optimal fusion exposure times, making it suitable for measuring highly reflective PCBs.
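A minimal sketch of the exposure selection step is given below: pixels are clustered by their fringe modulation with K-Means, and one exposure time per cluster is derived from the cluster's mean gray level. The inverse-gray scaling rule, target gray level, and variable names are assumptions used only to illustrate where the clustering fits.

```python
import numpy as np
from sklearn.cluster import KMeans

def select_exposures(modulation_img, mean_gray_img, base_exposure_ms=20.0,
                     target_gray=200.0, k=3):
    """Cluster pixels by fringe modulation, then pick one exposure time per cluster so that
    the average gray value of that cluster is driven toward `target_gray` (assumed rule)."""
    mask = modulation_img > 0                       # keep pixels belonging to the object
    feats = modulation_img[mask].reshape(-1, 1)
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(feats)

    grays = mean_gray_img[mask]
    exposures = []
    for c in range(k):
        cluster_gray = grays[labels == c].mean()    # average gray level of this reflectivity group
        exposures.append(base_exposure_ms * target_gray / max(cluster_gray, 1.0))
    return sorted(exposures)

# toy modulation / gray images with three reflectivity regions
mod = np.concatenate([np.full(1000, 0.2), np.full(1000, 0.5), np.full(1000, 0.9)])
gray = np.concatenate([np.full(1000, 60.0), np.full(1000, 150.0), np.full(1000, 245.0)])
print(select_exposures(mod.reshape(60, 50), gray.reshape(60, 50)))
```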
Major countries around the world face the problem of population aging. Whether a fall has internal or external causes, a delayed rescue can cause great harm to the elderly. We therefore urgently need real-time and accurate fall detection technology to enable timely rescue after an elderly person falls. Existing sensor-based wearable fall detection devices are expensive to popularize, and elderly users may forget to wear them. Therefore, a fall detection model based on AlphaPose combined with LSTM and LightGBM is proposed. In this algorithm, AlphaPose is first used to extract the key points of the human body; two LSTM sub-networks then extract temporal and spatial features, which are sent to a main LSTM network for feature fusion, and LightGBM performs the final classification to achieve more accurate detection results. Experiments were conducted on two fall datasets, KFALL and UR, and the fall detection accuracy rates were 94.43% and 93.81%, respectively.
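To illustrate the pipeline structure, here is a hedged sketch: one LSTM runs over per-frame keypoints, one over frame-to-frame keypoint motion, a main LSTM fuses them, and the final hidden state is classified by LightGBM. Random tensors stand in for AlphaPose keypoints and the labels are placeholders; dimensions and the spatial/temporal split are assumptions.

```python
import torch
import torch.nn as nn
import numpy as np
from lightgbm import LGBMClassifier

class PoseFeatureLSTM(nn.Module):
    """Two LSTM sub-networks (pose per frame, pose motion between frames) fused by a main LSTM."""
    def __init__(self, n_joints=17, hidden=64):
        super().__init__()
        self.spatial = nn.LSTM(2 * n_joints, hidden, batch_first=True)   # raw keypoints
        self.temporal = nn.LSTM(2 * n_joints, hidden, batch_first=True)  # frame-to-frame motion
        self.fusion = nn.LSTM(2 * hidden, hidden, batch_first=True)

    def forward(self, kpts):                       # kpts: (batch, frames, 2 * n_joints)
        motion = kpts[:, 1:] - kpts[:, :-1]        # joint displacements between frames
        s, _ = self.spatial(kpts[:, 1:])
        t, _ = self.temporal(motion)
        fused, _ = self.fusion(torch.cat([s, t], dim=-1))
        return fused[:, -1]                        # last time step as the sequence descriptor

# extract descriptors for toy sequences and classify fall / no-fall with LightGBM
torch.manual_seed(0)
net = PoseFeatureLSTM().eval()
with torch.no_grad():
    feats = net(torch.randn(40, 30, 34)).numpy()  # 40 sequences, 30 frames, 17 joints * (x, y)
labels = np.random.randint(0, 2, size=40)         # placeholder fall labels
clf = LGBMClassifier(n_estimators=50).fit(feats, labels)
print(clf.predict(feats[:5]))
```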
Nowadays, image recognition plays a pivotal role in acquiring data via sensors. However, the adaptability of traditional algorithms is hindered by the unpredictable nature of open environments, varying sensor quality, and image dimensions. Challenges arise in adverse conditions like inclement weather, low light, and optical distortions. Retinex-based methods have emerged as a viable solution, effectively enhancing images plagued by shadows or poor lighting. Yet, issues surface when images possess saturated colors; the conventional multi-scale Retinex with color restoration risks color inversion. Moreover, during gain compensation, extreme histogram values occupy significant gray level space, obscuring vital image details. This study delves into these challenges and proposes an enhanced multi-scale Retinex algorithm. Our approach substitutes logarithmic functions with tansig functions, eliminating color inversion risks. Additionally, a novel gain compensation method, integrating histogram stretching with Gamma correction, refines image clarity. The algorithm's robustness is evidenced in diverse scenarios, including adverse weather, low light, underwater imaging, and non-uniform lighting. Experimental results validate our method's superiority, surpassing other Retinex-based techniques both qualitatively and quantitatively. This research contributes valuable insights into image enhancement methodologies, fostering advancements in sensor-based data gathering in Smart Spaces.
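A compact sketch of the core idea follows: a multi-scale Retinex in which the tansig function (equivalent to tanh) replaces the logarithm, followed by percentile histogram stretching and gamma correction as a simple stand-in for the paper's gain compensation. The scale set, percentiles, and gamma value are assumptions.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def tansig(x):
    return np.tanh(x)  # MATLAB-style tansig(n) = 2/(1+exp(-2n)) - 1

def msr_tansig(img, sigmas=(15, 80, 250), gamma=0.8, p_low=1, p_high=99):
    """Multi-scale Retinex using tansig instead of log, then histogram stretch + gamma."""
    img = img.astype(np.float64) / 255.0 + 1e-6
    out = np.zeros_like(img)
    for ch in range(img.shape[2]):
        acc = np.zeros_like(img[..., ch])
        for s in sigmas:
            illum = gaussian_filter(img[..., ch], sigma=s) + 1e-6
            acc += tansig(img[..., ch]) - tansig(illum)   # reflectance estimate at this scale
        out[..., ch] = acc / len(sigmas)
    # gain compensation stand-in: percentile stretch to [0, 1], then gamma correction
    lo, hi = np.percentile(out, [p_low, p_high])
    out = np.clip((out - lo) / max(hi - lo, 1e-6), 0, 1) ** gamma
    return (out * 255).astype(np.uint8)

dark = (np.random.rand(64, 64, 3) * 60).astype(np.uint8)  # synthetic low-light image
print(msr_tansig(dark).mean() > dark.mean())               # enhanced image is brighter
```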
Plant fruit counts can predict the yield of the whole orchard, which provides key guidance for the agricultural production process. In this work, we propose a context feature aggregated convolutional neural network for plant fruit counting. The features extracted by a VGG network are pooled to obtain context information, which is concatenated with the original features for classification and regression. The output of the network is a set of points, which are matched to the ground-truth points to define the loss. Experimental results show that, compared with existing network structures for fruit counting, our method achieves better counting performance on a publicly available fruit dataset. The improvement in accuracy indicators shows that the proposed method performs well on plant images with different scales, illumination, contrast, and occlusion.
Multi-view stereo analyzes and processes images from multiple viewpoints to estimate the 3D geometric information of a scene and achieve 3D reconstruction. To improve the accuracy of 3D reconstruction in large-scale scenes and reduce the complexity of the reconstruction algorithm, in this paper we propose a coarse-to-fine multi-view stereo network based on an attention mechanism. First, we use a feature pyramid to extract multi-scale features, introducing richer geometric information and more contextual information at different levels of the pyramid to improve modeling accuracy. Then, we apply position encoding on the coarse-scale feature map and introduce an attention mechanism to obtain more context information. We adopt a cascade structure to achieve high-resolution depth map construction, and we use the reference image to further refine the final result and enhance details such as edges. We conduct experiments on the publicly available DTU dataset. Experimental results show that our proposed method improves accuracy compared with existing algorithms. In addition, we conduct experiments on other representative public datasets, and the accuracy of those results further validates the effectiveness of our proposed method.
Semantic segmentation aims to divide a scene into regions with different semantic categories. The prevalent technique in scene semantic segmentation is per-pixel segmentation, whereby a class is assigned to each individual point. However, the way these methods make predictions differs significantly from how a scene is processed by human vision: when encountering new scenes, humans initially concentrate on each instance within the three-dimensional scene rather than on individual pixels. Inspired by M2F, a popular instance-based architecture proposed for 2D segmentation, we propose a 3D semantic segmentation algorithm based on M2F. It departs from the common practice of per-pixel classification in point cloud semantic segmentation. Instead, we first predict instance masks and then assign a label to each point within the corresponding instance. In our experiments, the scene is divided into regions, and each region is treated as a complete entity associated with a single global class label prediction. Thus, the pixel-wise classification problem in existing 3D semantic segmentation is transformed into a region-based classification problem in a 3D scene. Experiments on the popular public S3DIS dataset show that the proposed method achieves 68.5% mIoU / 74.3% mAcc, outperforming other competing approaches by a clear margin.
In this paper, a new fuzzy hyperbolic secant function clustering algorithm is presented to improve the positional accuracy of cluster centers and the robustness of clustering. The core of this algorithm is to utilize the highly discriminative bell curve of the hyperbolic secant function to obtain the cluster centers and their number. Comparative experiments show that the proposed hyperbolic secant function clustering algorithm is superior to the Gaussian function clustering algorithm in the location accuracy of cluster centers, and is better than the FCM clustering algorithm in both the location accuracy of cluster centers and clustering robustness.
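The abstract does not spell out the update equations, so the following is only a minimal sketch of the underlying idea: hyperbolic-secant (bell-shaped) memberships are used to relocate cluster centers as membership-weighted means. The width parameter, iteration count, and update rule are assumptions for illustration.

```python
import numpy as np

def sech(x):
    return 1.0 / np.cosh(x)

def sech_cluster_centers(X, centers, width=1.0, iters=20):
    """Iteratively refine cluster centers using hyperbolic-secant (bell-shaped) memberships."""
    centers = np.asarray(centers, dtype=float).copy()
    for _ in range(iters):
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)  # (n, k) distances
        u = sech(d / width)                                              # bell-shaped membership
        u = u / u.sum(axis=1, keepdims=True)                             # normalize per sample
        centers = (u.T @ X) / u.sum(axis=0)[:, None]                     # weighted mean update
    return centers

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 0.3, (100, 2)), rng.normal(3, 0.3, (100, 2))])
print(sech_cluster_centers(X, centers=[[0.5, 0.5], [2.5, 2.5]]))  # ~[[0, 0], [3, 3]]
```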
Semantic segmentation in aquatic scenes is a key technology for water environment monitoring. Detecting and segmenting small-scale objects in aquatic scenes is a major challenge in water body semantic segmentation. Current typical semantic segmentation methods often use multi-scale feature fusion, in which features of different scales from different network layers are aggregated so that the resulting features have both the strong semantic representation of high-level features and the strong detail expression capability of low-level features. However, although current methods attend to the details of small-scale objects, they primarily rely on low-level features to determine the presence of objects when adapting the network scale for small object detection, which loses accuracy when high-level semantic features are used for prediction. Moreover, cross-scale fusion does not depend on category characteristics. Therefore, existing methods are not ideal for semantically constrained small object segmentation, such as water surface garbage and plant debris. Our method focuses on cross-level semantic information aggregation and utilization for object segmentation in aquatic scenes, providing a new approach for small object segmentation in complex semantic environments. In aquatic scenes, object categories have strong contextual relevance. Therefore, this paper proposes a cross-level semantic aggregation network to address the problem of small object segmentation in aquatic scenes. The cross-level semantic aggregation method guides the high-level features to perform semantic aggregation using low-level features, enabling the aggregation of features with high-level semantic features of the same category as small objects, while introducing relevant contextual scene features of different categories. Compared to traditional scale fusion, this introduces a new aggregation method within the semantic framework to handle small object segmentation under complex contextual relationships. We conducted extensive experiments on our self-built water body scene dataset, ColorWater, and on the public Aeroscapes dataset. In addition to achieving state-of-the-art overall segmentation performance, we obtained particularly significant advantages on small object categories such as floating garbage on the water surface and plant debris, which are the focus of this paper.
Underwater images can provide underwater information intuitively and effectively. However, due to wavelength- and distance-related attenuation and scattering, underwater images may exhibit color distortion and low contrast. To address these two degradation issues, a novel two-stage network named DAMcS-Net is proposed in this paper. In the first stage, a dual attention module that combines channel attention and spatial attention mechanisms is designed to amplify the network’s perception of detail textures. In the second stage, a multi-color space stretch module is designed to adaptively adjust the histogram distribution in the RGB, HSI, and Lab color spaces, so that color casts and artifacts can be effectively eliminated. Quantitative and qualitative experiments show that our model achieves state-of-the-art performance in comparison with existing methods.
Coarse registration is the initial step of aligning one point cloud with another, aiming to bring the two point clouds into approximately the correct relative position. There are many coarse registration methods, and among them, SAC-IA (Sample Consensus Initial Alignment) is widely used. It selects corresponding point pairs by matching the geometric features of the point clouds, and it offers high registration accuracy and fast registration speed. However, when the geometric features of the point clouds are relatively simple or not distinctive, its registration performance may degrade. With the development of color image sensors, acquiring color point clouds has become increasingly important, and the RGB information of point clouds can effectively compensate for the shortcomings of the SAC-IA algorithm. Building on the traditional SAC-IA algorithm, the color characteristics of the points of interest are extracted by fusing the color information of feature points and their neighboring points, including the first-order moment of the point cloud color and the CFH of the point cloud. Experiments show that the improved SAC-IA algorithm has better accuracy and robustness.
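The first-order color moment of a neighborhood is simply the mean color of the points around each feature point; a minimal sketch of computing it with a k-d tree is shown below. The radius, array layout, and the idea of concatenating the result with geometric descriptors (e.g., FPFH) before the SAC-IA correspondence search are assumptions; the CFH itself is not reproduced here.

```python
import numpy as np
from scipy.spatial import cKDTree

def color_first_moments(points, colors, keypoint_idx, radius=0.05):
    """First-order color moment (mean RGB) of each keypoint's neighborhood.

    points: (N, 3) xyz coordinates; colors: (N, 3) RGB values in [0, 1]. The returned (K, 3)
    vectors can be concatenated with geometric descriptors before correspondence matching."""
    tree = cKDTree(points)
    feats = np.zeros((len(keypoint_idx), 3))
    for i, idx in enumerate(keypoint_idx):
        nbrs = tree.query_ball_point(points[idx], r=radius)   # always contains the point itself
        feats[i] = colors[nbrs].mean(axis=0)                  # first-order moment = mean neighbor color
    return feats

rng = np.random.default_rng(1)
pts = rng.random((500, 3))
cols = rng.random((500, 3))
print(color_first_moments(pts, cols, keypoint_idx=[0, 10, 42]).shape)  # (3, 3)
```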
Radar target recognition methods usually use only a single type of radar data, such as synthetic aperture radar (SAR) images or high-resolution range profiles (HRRP). Compared with SAR, HRRP lacks the azimuth distribution information of the scattering centers, but it has much looser imaging conditions than SAR. Both are important for radar target recognition, and in fact there is a correspondence between them. Therefore, in this paper, we propose an end-to-end fusion network that makes full use of the different characteristics of HRRP data and SAR images. The proposed network automatically extracts the features of HRRP and SAR data for fused target recognition. It has a dual-stream structure containing two separate feature extraction streams: one stream uses a 1D CNN to extract the complex features of HRRP data for full-angle-domain recognition, and the other uses a multi-scale 2D CNN to extract SAR features. An adaptive fusion module is designed to deeply fuse the two streams' features and output the final recognition results. The contributions of this method mainly include: (1) a new end-to-end HRRP/SAR fusion network is proposed, and experiments show that it significantly improves recognition accuracy; (2) in the HRRP feature extraction stream, a 1D CNN is used, which can extract full-angle features; (3) a multi-scale convolutional neural network is used for SAR image feature extraction, which addresses the scale imbalance problem of SAR.
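A hedged sketch of the dual-stream layout follows: a 1D CNN over the HRRP vector, a multi-scale 2D CNN over the SAR chip, and an adaptive (gated) fusion of the two feature vectors before the classifier. All layer sizes, kernel choices, and the softmax gate are assumptions standing in for the paper's adaptive fusion module.

```python
import torch
import torch.nn as nn

class HRRPSARFusion(nn.Module):
    """1D CNN (HRRP) + multi-scale 2D CNN (SAR) + adaptive gated fusion (sketch)."""
    def __init__(self, n_classes=10, dim=64):
        super().__init__()
        self.hrrp = nn.Sequential(
            nn.Conv1d(1, 16, 7, padding=3), nn.ReLU(), nn.MaxPool1d(2),
            nn.Conv1d(16, 32, 5, padding=2), nn.ReLU(),
            nn.AdaptiveAvgPool1d(1), nn.Flatten(), nn.Linear(32, dim))
        # multi-scale SAR branch: parallel kernels of different sizes
        self.sar_k3 = nn.Conv2d(1, 16, 3, padding=1)
        self.sar_k5 = nn.Conv2d(1, 16, 5, padding=2)
        self.sar_head = nn.Sequential(nn.ReLU(), nn.AdaptiveAvgPool2d(1),
                                      nn.Flatten(), nn.Linear(32, dim))
        self.gate = nn.Sequential(nn.Linear(2 * dim, 2), nn.Softmax(dim=-1))  # adaptive weights
        self.cls = nn.Linear(dim, n_classes)

    def forward(self, hrrp, sar):
        fh = self.hrrp(hrrp)                                   # (B, dim) HRRP features
        fs = self.sar_head(torch.cat([self.sar_k3(sar), self.sar_k5(sar)], dim=1))
        w = self.gate(torch.cat([fh, fs], dim=1))              # per-sample modality weights
        fused = w[:, :1] * fh + w[:, 1:] * fs
        return self.cls(fused)

logits = HRRPSARFusion()(torch.randn(4, 1, 256), torch.randn(4, 1, 64, 64))
print(logits.shape)  # torch.Size([4, 10])
```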
To effectively address the instability of clustering results caused by randomly initialized cluster centers in fuzzy clustering, a new fuzzy clustering method based on an improved snake optimization algorithm is proposed. First, the population initialization of the snake optimization algorithm is improved; then the improved algorithm is applied to preprocess the dataset and obtain initial cluster centers adapted to the dataset; finally, the generated cluster centers are used to initialize the iterative updating of fuzzy clustering. Comparisons with the traditional fuzzy clustering algorithm show that the initial cluster centers generated by the improved snake optimization algorithm effectively avoid falling into local optima and offer better robustness, which effectively improves the stability and accuracy of the clustering algorithm.
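The snake optimization details are not given in the abstract; the sketch below shows only the standard fuzzy c-means iteration that accepts externally supplied initial centers, to make clear where an optimizer-generated initialization would plug in. Parameter values and the toy data are assumptions.

```python
import numpy as np

def fcm(X, init_centers, m=2.0, iters=100, tol=1e-6):
    """Fuzzy c-means starting from given initial centers (e.g. from the improved optimizer)."""
    centers = np.asarray(init_centers, dtype=float).copy()
    for _ in range(iters):
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2) + 1e-10  # (n, c)
        u = 1.0 / (d ** (2.0 / (m - 1.0)))
        u = u / u.sum(axis=1, keepdims=True)                 # fuzzy memberships
        um = u ** m
        new_centers = (um.T @ X) / um.sum(axis=0)[:, None]   # membership-weighted center update
        if np.linalg.norm(new_centers - centers) < tol:
            centers = new_centers
            break
        centers = new_centers
    return centers, u

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-2, 0.5, (150, 2)), rng.normal(2, 0.5, (150, 2))])
centers, u = fcm(X, init_centers=[[-1.0, -1.0], [1.0, 1.0]])
print(np.round(centers, 2))   # close to [[-2, -2], [2, 2]]
```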
Accurate 3D comprehension of point cloud scenes under diverse weather conditions is of paramount significance in applications such as autonomous driving, outdoor robot operation, and autonomous drones. Presently, the majority of studies on semantic segmentation algorithms for 3D point clouds focus on clear weather conditions. However, adverse weather introduces specific types of noise that significantly deteriorate point cloud quality, which makes it challenging to achieve high accuracy and efficiency in point cloud semantic segmentation for large-scale outdoor scenarios. To tackle this issue, this paper presents a novel semantic segmentation method designed for large point cloud scenes under foggy weather conditions. We validate our approach on the Foggy SemanticKITTI dataset, effectively improving the mean intersection-over-union while maintaining computational efficiency.
Deep neural networks have emerged as the predominant technical approach for remote sensing image interpretation and processing, surpassing traditional methods in various tasks such as target extraction, classification, and recognition. However, the decision-making processes underlying these deep networks usually lack transparency, making interpretability a pressing concern. In response to this concern, we employ four prominent feature attribution methods, namely Integrated Gradients, GradientShap, Occlusion, and Saliency, to perform interpretability analysis on deep learning models designed for visible light remote sensing image classification. Our objective is to unveil the foundational principles guiding the decision-making in remote sensing image classification and recognition. We also assess the effectiveness of these attribution methods in identifying crucial decision regions. Through our visual attribution analysis, we aim to contribute to a better understanding of the decision-making mechanisms employed by remote sensing image classification models.
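The four attribution methods named above are all available in the Captum library; below is a minimal sketch of running two of them (Integrated Gradients and Occlusion) on a generic image classifier. The torchvision ResNet-18 is only a stand-in for a trained remote sensing model, and the baseline, step count, and occlusion window/stride sizes are assumptions.

```python
import torch
from torchvision.models import resnet18
from captum.attr import IntegratedGradients, Occlusion

# a stand-in classifier; in practice this would be the trained remote sensing model
model = resnet18(weights=None).eval()
x = torch.rand(1, 3, 224, 224, requires_grad=True)   # placeholder scene image
target = int(model(x).argmax())                      # explain the predicted class

# Integrated Gradients: accumulate gradients along a path from a black baseline to the input
ig_attr = IntegratedGradients(model).attribute(x, baselines=torch.zeros_like(x),
                                               target=target, n_steps=32)

# Occlusion: slide a patch over the image and record the drop in the target score
occ_attr = Occlusion(model).attribute(x, target=target,
                                      sliding_window_shapes=(3, 16, 16),
                                      strides=(3, 8, 8))

print(ig_attr.shape, occ_attr.shape)   # both (1, 3, 224, 224): per-pixel importance maps
```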