This PDF file contains the front matter associated with SPIE Proceedings Volume 13274, including the Title Page, Copyright information, Table of Contents, and Conference Committee information.
Accurate recognition of the cutting path is crucial for the manufacture of invisible aligners. Current methods for cutting path recognition suffer either from the low efficiency of manual point selection and line drawing, or from the low robustness and high complexity of automatic recognition. This paper proposes an accurate and rapid algorithm for the automatic recognition of cutting paths for invisible aligners. Based on the curvature at the mesh vertices, the algorithm extracts the feature region for the cutting path. An initial cutting path is obtained via three-dimensional morphological operations and an improved skeleton extraction algorithm. The final cutting path is achieved by applying a conformal mapping algorithm to unfold the initial cutting path onto a two-dimensional plane for smoothing, and then mapping the smoothed two-dimensional cutting path back into three-dimensional space. Experimental results on 80 digital dental models verify the accuracy and robustness of the proposed method.
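The abstract does not specify the smoothing operator applied to the unfolded path; as a purely illustrative sketch, the snippet below smooths a closed two-dimensional polyline (such as an unfolded cutting path) with a cyclic moving average before it would be mapped back onto the mesh. The function name, window size, and iteration count are placeholders, not the authors' implementation.

```python
import numpy as np

def smooth_closed_path_2d(points_2d, window=5, iterations=3):
    """Moving-average smoothing of a closed 2D polyline.

    points_2d: (N, 2) array of cutting-path vertices after unfolding.
    The path is treated as cyclic, so the end connects back to the start.
    """
    pts = np.asarray(points_2d, dtype=float)
    half = window // 2
    kernel = np.ones(window) / window
    for _ in range(iterations):
        # Pad cyclically so the averaging window wraps around the closed path.
        padded = np.concatenate([pts[-half:], pts, pts[:half]], axis=0)
        pts = np.stack(
            [np.convolve(padded[:, d], kernel, mode="valid") for d in range(2)],
            axis=1,
        )
    return pts

# Example: a jagged circle becomes a smoother circle of the same length.
theta = np.linspace(0, 2 * np.pi, 200, endpoint=False)
noisy = np.stack([np.cos(theta), np.sin(theta)], axis=1) + 0.02 * np.random.randn(200, 2)
smooth = smooth_closed_path_2d(noisy)
```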
The synchronous classroom is an innovative practice in China that aims to build a community of urban and rural schools and promote the balanced development of education. As distance learners are physically separated from the local teacher, they are more prone to losing attention. Therefore, real-time detection and recognition of the attention status of distance learners is an important means of enhancing the effectiveness and quality of teaching. This research achieved real-time detection and recognition of attention status from the perspective of classroom behavior recognition. Specifically, the research used the SAM algorithm to construct a classroom behavior dataset containing two typical behaviors for the attentive status and three for the inattentive status, and utilized the YOWO network model, which integrates two-dimensional spatial features and three-dimensional spatio-temporal features, to recognize and classify the five typical classroom behaviors with an accuracy of 89.7%. In the experiment, the correlation between the attention detection results of this model and the evaluation results of professional teachers reached 0.863, demonstrating effective evaluation of the attention status of distance learners.
In this study, we introduce a sophisticated automatic identification technique tailored for repair area detection on high-speed train bodies, with an emphasis on weak defects and flatness discrepancies within putty coatings. The methodology commences with developing a coating weak defect identification strategy, leveraging the Xception network and ASPP module for enhanced accuracy. This is followed by the integration of weak defect identification outcomes into the flatness geometric difference identification framework, effectively reducing aggregated point clouds' interference with flatness assessments. The process culminates in a coating repair area identification approach designed to detect defects and necessary machining removals on the coating surface. Our results indicate that by incorporating the effects of defects on flatness into the analysis, the precision of flatness discrepancy detection is significantly improved. Moreover, the amalgamation of defect identification with the evaluation of flatness variations enables more precise mapping of repair zones, representing notable progress in the field of robotic repair technologies.
Generative adversarial networks (GANs) have recently been introduced to synthesize radar micro-Doppler spectrograms for data augmentation. However, traditional GANs easily overfit with limited training data, and it is hard to ensure that the synthetic spectrograms have high kinematic fidelity (e.g., continuous activity is suddenly interrupted in the time dimension). To address these issues, we propose a physics-aware few-shot generative adversarial network (PFGAN) to synthesize high-quality spectrograms with limited radar data. Our main contributions are two-fold. First, we design an attention ranking-based local fusion module (ARLFM). ARLFM learns to select important local features for matching and replacement, taking into account the distribution characteristics of micro-Doppler signatures. Second, we improve the kinematic fidelity of synthetic spectrograms by designing a multi-scale envelope-extraction module (MEM), in which three types of spectrogram envelopes are extracted at different resolutions to reflect physical motion information. Experiments show that PFGAN can generate diverse spectrograms with high kinematic fidelity from few samples. The effectiveness of the synthetic spectrograms is also demonstrated on human activity recognition tasks.
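To illustrate the kind of physical cue the envelope-extraction module relies on, here is a minimal, hypothetical sketch of extracting upper and lower Doppler envelopes from a spectrogram; the thresholding rule and single-resolution handling are assumptions, not the paper's MEM.

```python
import numpy as np

def spectrogram_envelopes(spec_db, floor_db=-40):
    """Upper/lower Doppler envelopes of a micro-Doppler spectrogram.

    spec_db: (n_doppler_bins, n_time_frames) magnitude in dB.
    For each time frame, the envelopes are the highest and lowest Doppler
    bins whose energy lies within |floor_db| dB of that frame's maximum.
    """
    upper, lower = [], []
    for frame in spec_db.T:
        active = np.flatnonzero(frame >= frame.max() + floor_db)
        upper.append(active.max())
        lower.append(active.min())
    return np.asarray(upper), np.asarray(lower)
```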
Synthetic Aperture Radar (SAR) images comprise target areas, shadow regions, and background clutter. The target and shadow regions play a crucial role in target recognition and classification in SAR images. Existing SAR automatic target recognition (ATR) methods do not effectively separate the target and shadow regions of a SAR image and extract deep features from each, which prevents further accuracy gains. This paper proposes a network that separates the target and shadow regions, extracts features from each, and then integrates them after segmentation. It addresses the inability to extract features from the two kinds of regions separately, and it improves recognition accuracy to some extent compared with existing methods. The method consists of three stages. Firstly, the original image undergoes threshold processing, counting filtering, morphological operations, contour extraction, etc., resulting in images containing only target or shadow regions, which are then cropped. Secondly, the preprocessed dataset undergoes data augmentation through affine transformation. The augmented target and shadow region images are then fed into parallel network structures for feature extraction. Finally, the features extracted from the two regions are fused to produce the final classification decision. The effectiveness of the proposed method is validated through experiments using the MSTAR (Moving and Stationary Target Acquisition and Recognition) dataset.
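As a rough sketch of the preprocessing stage described above (thresholding, morphological operations, and contour extraction to isolate target and shadow regions), the OpenCV snippet below is illustrative only; the threshold values are placeholders and the counting-filter step from the paper is not reproduced.

```python
import cv2
import numpy as np

def extract_target_and_shadow(sar_img_u8, target_thresh=180, shadow_thresh=60):
    """Illustrative split of a SAR chip into bright (target) and dark (shadow) masks.

    sar_img_u8: single-channel uint8 SAR image chip.
    Thresholds are placeholders; the paper's exact preprocessing may differ.
    """
    kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (5, 5))

    # Bright scatterers -> candidate target region.
    _, target = cv2.threshold(sar_img_u8, target_thresh, 255, cv2.THRESH_BINARY)
    target = cv2.morphologyEx(target, cv2.MORPH_CLOSE, kernel)

    # Dark pixels -> candidate shadow region.
    _, shadow = cv2.threshold(sar_img_u8, shadow_thresh, 255, cv2.THRESH_BINARY_INV)
    shadow = cv2.morphologyEx(shadow, cv2.MORPH_OPEN, kernel)

    # Keep only the largest connected contour of each mask.
    def largest_blob(mask):
        contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
        out = np.zeros_like(mask)
        if contours:
            biggest = max(contours, key=cv2.contourArea)
            cv2.drawContours(out, [biggest], -1, 255, thickness=cv2.FILLED)
        return out

    return largest_blob(target), largest_blob(shadow)
```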
To address the limited feature extraction capability of convolutional neural networks and the inaccurate localization of facial features, which lead to low recognition rates, this paper proposes an enhanced pyramid convolutional attention network, PCAF-Net. The network adopts PyConvResNet-50 as its backbone, incorporates the IMPy module to enhance feature extraction capability, optimizes the PyConv block to improve multi-scale feature extraction, integrates Coordinate Attention to precisely locate key facial areas, and utilizes the ArcFace loss function to enhance expression discrimination. To evaluate its performance, experiments were conducted on the Fer2013 and CK+ datasets. The findings reveal that Coordinate Attention and the ArcFace loss function exhibit superior recognition and classification capabilities compared with alternative attention mechanisms and loss functions. Notably, PCAF-Net achieves recognition rates of 71.58% and 92.02% on the Fer2013 and CK+ datasets, respectively, surpassing state-of-the-art networks without increasing the number of parameters.
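ArcFace is a published loss, so its core computation can be sketched; the head below is a generic PyTorch version with assumed scale and margin values, not the exact configuration used in PCAF-Net.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ArcFaceHead(nn.Module):
    """Minimal ArcFace margin head: adds an angular margin m to the target logit."""

    def __init__(self, feat_dim, num_classes, s=30.0, m=0.50):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(num_classes, feat_dim))
        nn.init.xavier_uniform_(self.weight)
        self.s, self.m = s, m

    def forward(self, features, labels):
        # Cosine similarity between L2-normalised features and class weights.
        cosine = F.linear(F.normalize(features), F.normalize(self.weight))
        theta = torch.acos(cosine.clamp(-1 + 1e-7, 1 - 1e-7))
        # Add the angular margin only on the ground-truth class.
        target_logit = torch.cos(theta + self.m)
        one_hot = F.one_hot(labels, cosine.size(1)).float()
        logits = self.s * (one_hot * target_logit + (1 - one_hot) * cosine)
        return F.cross_entropy(logits, labels)

# Usage sketch: loss = ArcFaceHead(512, num_expression_classes)(embeddings, labels)
```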
This study addresses the pressing need for precise skin disease diagnostics in the Philippines. Leveraging YOLOv5 deep learning, the research verifies an accurate categorization system for prevalent conditions like Acne, Leprosy, and Psoriasis. Comparative analysis with ANN and SVM classifiers demonstrates the transformative impact of this approach on skin disease diagnosis. By expediting care and supporting dermatologists, the study charts a course for healthcare advancements, bridging critical gaps in Filipino dermatology. The methodology outlines the implementation of a Skin Disease Detection Model utilizing YOLOv5 Deep Learning within a robust Medical Image Analysis framework. Rigorous evaluation metrics highlight YOLOv5L's accuracy in identifying Leprosy and Warts, despite challenges in recognizing diseases like Atopic Dermatitis and Keratosis Pilaris. The research underscores YOLOv5L's potential while emphasizing the need for dataset refinement and model optimization. Recommendations include diversifying datasets, exploring hyperparameters, and incorporating additional disease categories, offering valuable insights for advancing dermatological diagnostics in the Philippines.
It is difficult to detect fish targets accurately and quickly because fish target detection algorithms face problems such as complex underwater backgrounds, scale variation, and occlusion. To solve this problem, an improved lightweight YOLOv5 detection algorithm is proposed. Firstly, the building blocks of the lightweight classification network ShuffleNet v2 replace the original backbone for feature extraction, reducing the model's computation and parameter count and improving inference speed. Secondly, Repulsion Loss, based on the principle of bounding-box repulsion and attraction, is used to improve the model's robustness to occlusion and to make full use of occluded positive samples during training. Finally, the CA attention mechanism is introduced to enhance the model's ability to locate targets accurately. The experimental results show that the improved lightweight model has a parameter size of only 5.3 MB, a reduction of 1.8 MB; its detection accuracy (mAP) reaches 95.2%, 3.6 percentage points higher than the original model; and its computation is only 8.7 GFLOPs, a reduction of 5.7 GFLOPs. Compared with current mainstream target detection algorithms such as SSD, YOLOv3, and YOLOv7, it not only offers superior detection accuracy and inference speed but also significantly reduces computation, achieving a balance between being lightweight and accurate. This study provides a reference for research on lightweight fish target detection.
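The CA (coordinate attention) mechanism referenced above is a published module; the PyTorch sketch below reproduces its factorised-pooling idea in simplified form (ReLU instead of the original h-swish, average pooling via mean), and is not the authors' code.

```python
import torch
import torch.nn as nn

class CoordinateAttention(nn.Module):
    """Coordinate attention: pools along H and W separately so channel
    attention also encodes positional information."""

    def __init__(self, channels, reduction=32):
        super().__init__()
        mid = max(8, channels // reduction)
        self.conv1 = nn.Conv2d(channels, mid, 1)
        self.bn1 = nn.BatchNorm2d(mid)
        self.act = nn.ReLU(inplace=True)
        self.conv_h = nn.Conv2d(mid, channels, 1)
        self.conv_w = nn.Conv2d(mid, channels, 1)

    def forward(self, x):                                          # x: (N, C, H, W)
        n, c, h, w = x.shape
        pool_h = x.mean(dim=3, keepdim=True)                       # (N, C, H, 1)
        pool_w = x.mean(dim=2, keepdim=True).permute(0, 1, 3, 2)   # (N, C, W, 1)
        y = torch.cat([pool_h, pool_w], dim=2)                     # (N, C, H+W, 1)
        y = self.act(self.bn1(self.conv1(y)))
        y_h, y_w = torch.split(y, [h, w], dim=2)
        a_h = torch.sigmoid(self.conv_h(y_h))                      # (N, C, H, 1)
        a_w = torch.sigmoid(self.conv_w(y_w.permute(0, 1, 3, 2)))  # (N, C, 1, W)
        return x * a_h * a_w
```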
Many existing works fail to make full use of temporal information and ignore the diversity of normal behaviors in video anomaly detection tasks. In this paper, we propose a video anomaly detection method based on a multi-scale dynamic prototype unit. Prior work proposed an autoencoder anomaly detection model based on a dynamic prototype unit (DPU), which effectively improves anomaly detection performance but ignores the importance of features at different levels for modeling normal events. Therefore, this paper proposes an anomaly detection model based on a multi-scale dynamic prototype unit that uses memory units to establish connections between the encoder and decoder, so that normal patterns at different scales are learned. In addition, the Temporal Shift technique is used to mine the temporal information of video more effectively and generate future video frames. Experimental results on the UCSD Ped2, CUHK Avenue, and ShanghaiTech datasets show that the proposed method is superior to current mainstream video anomaly detection methods while meeting real-time requirements.
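The Temporal Shift technique mentioned above is a known operation that mixes information across frames at essentially zero extra compute; a generic sketch follows. The fraction of shifted channels is an assumed default, and this is not the paper's full multi-scale DPU model.

```python
import torch

def temporal_shift(x, n_segments, shift_div=8):
    """TSM-style temporal shift: move a fraction of channels one step
    forward/backward in time so 2D convs can mix temporal context.

    x: (N*T, C, H, W) feature tensor, where T = n_segments.
    """
    nt, c, h, w = x.shape
    n = nt // n_segments
    x = x.view(n, n_segments, c, h, w)
    fold = c // shift_div

    out = torch.zeros_like(x)
    out[:, 1:, :fold] = x[:, :-1, :fold]                    # shift forward in time
    out[:, :-1, fold:2 * fold] = x[:, 1:, fold:2 * fold]    # shift backward in time
    out[:, :, 2 * fold:] = x[:, :, 2 * fold:]               # remaining channels unchanged
    return out.view(nt, c, h, w)
```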
Guiding drivers clearly and accurately to available parking spaces by means of computer vision and computational intelligence is highly valuable, yet it is an extremely difficult task because of the scene complexity in parking lots. In this paper, a new computational intelligence method is proposed to detect available spaces in outdoor parking lots by fusing a region convolutional neural network (RCNN) and random forests (RF), which we call RCNN-RF. Within RCNN-RF, a new activation function is designed, and with the help of random forests, RCNN-RF successfully copes with scene complexity including illumination, weather, and slight occlusion. Extensive experimental results on the benchmark PKLot dataset and the self-built GZMU-LOT dataset outperform the state of the art, showing the superiority of the proposed RCNN-RF method.
In response to the low frame rate and small gesture scale that cause accuracy problems in current hand interaction detection, we propose a lightweight small-target hand recognition method, RB-YOLOv5s. The method first addresses the excessive parameter count and low recognition accuracy by replacing the Conv modules in YOLOv5s with RepVGGBlock modules and reducing the number of modules. Secondly, a bidirectional feature pyramid structure is introduced into the feature fusion network to improve the fusion of semantic and location information. Finally, the CIoU loss function in YOLOv5s is replaced with SIoU to accelerate training and improve efficiency. We validated this method on a public dataset with distant small targets. The experimental results show that the recognition accuracy of the proposed model is 91%, the parameter count is only 1.754×10^6, and the frame rate increases to 104.17 FPS.
This paper focuses on classifying and detecting the traffic volume and congestion of a major highway and delivering predictions and forecasts of traffic congestion severity on these roadways. This would be performed by utilizing the YOLOv8 object detection model trained on the COCO dataset. The dataset would consist of videos taken of a road overlooking passing vehicles. YOLOv8 is a deep neural network developed as a regression-based high-performance algorithm for real-time detection. ByteTrack would be utilized to assign a unique tracking ID to each detected vehicle and to count the number of objects seen in each frame. This data would then be used to develop a Long Short-Term Memory neural network (LSTM) that can predict traffic congestion. While there are multiple techniques for time series prediction, such as mathematical and statistical modeling, a machine learning approach such as LSTM has advantages in robustness and accuracy and does not rely on simulating an environment. Afterward, regularization and experiments with different optimizers would be performed on the LSTM model to improve its accuracy. Regularization is a technique for imposing constraints (such as L1 or L2) on the weights within LSTM nodes, which reduces overfitting and improves model performance. Multiple iterations of hyperparameter tuning would be used to determine the sets of hyperparameters and optimizers most suitable for a model that provides accurate results.
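As a minimal sketch of the forecasting stage, the following PyTorch snippet trains an LSTM regressor on per-frame vehicle counts; the layer sizes, optimizer settings, and the use of weight decay as the L2 regularizer are illustrative assumptions, not the study's final configuration.

```python
import torch
import torch.nn as nn

class TrafficLSTM(nn.Module):
    """Minimal LSTM regressor: a window of past vehicle counts -> next-step count."""

    def __init__(self, input_size=1, hidden_size=64, num_layers=2, dropout=0.2):
        super().__init__()
        self.lstm = nn.LSTM(input_size, hidden_size, num_layers,
                            batch_first=True, dropout=dropout)
        self.head = nn.Linear(hidden_size, 1)

    def forward(self, x):                 # x: (batch, seq_len, 1)
        out, _ = self.lstm(x)
        return self.head(out[:, -1])      # predict the next count

# Sketch of one training step on counts produced by the detector/tracker.
model = TrafficLSTM()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3, weight_decay=1e-4)  # L2 regularisation
loss_fn = nn.MSELoss()
counts = torch.rand(32, 30, 1)            # dummy batch: 32 windows of 30 time steps
target = torch.rand(32, 1)
loss = loss_fn(model(counts), target)
loss.backward()
optimizer.step()
```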
Motion object detection is a commonly used video image processing method for extracting motion regions, thereby reducing the amount of data required for further processing such as object detection and recognition. ViBe is a high-quality motion object detection algorithm, but in practical applications camera pan-tilt rotation and shaking easily cause scene switching in the image content, leading to deviations in the background update and in the extracted motion regions. This article introduces scene content difference analysis to achieve more reliable background updates and to further reduce the computational cost of background updating. Experiments show that this method improves the original ViBe algorithm in both quality and speed.
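A minimal sketch of a scene-content difference test is shown below; the thresholds are placeholders and this is not the article's exact analysis, only an illustration of when a background model might be reinitialised rather than updated incrementally.

```python
import numpy as np

def scene_changed(prev_gray, curr_gray, diff_thresh=30, ratio_thresh=0.5):
    """Crude scene-content difference test between consecutive grayscale frames.

    If a large fraction of pixels changed strongly, assume a camera pan/shake
    caused a scene switch and reinitialise the background model instead of
    updating it sample by sample.
    """
    diff = np.abs(prev_gray.astype(np.int16) - curr_gray.astype(np.int16))
    changed_ratio = np.mean(diff > diff_thresh)
    return changed_ratio > ratio_thresh
```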
Remote sensing (RS) image change detection (CD) plays a crucial role in monitoring and understanding dynamic environmental processes. In recent years, CD tasks have seen numerous attempts involving pure convolutional neural networks (CNNs). However, it has become evident that CNNs are limited in their capacity to capture global context and long-range spatial relationships. In this paper, we propose a Bitemporal Feature Enhancement Transformer Network (BFETNet) for remote sensing image change detection. Specifically, BFETNet utilizes a transformer encoder-decoder network to enrich the contextual information of CNN features by incorporating a designed spatial-channel semantic tokenizer (SCST). Besides, we employ a channel-coordinate attention module (CCAM) to further model the positional and channel information of the feature maps. Finally, the change map is obtained by taking the difference of the two feature maps. Extensive experimental results on both the LEVIR-CD and WHU-CD datasets show that BFETNet performs significantly better than existing state-of-the-art CD methods.
The safe navigation of unmanned surface vessels is inextricably linked to advanced obstacle detection technology. This technology serves as a critical mechanism, enabling the vessels to detect and navigate around obstacles in time, thereby guaranteeing their safety throughout the voyage. To address the issues of inaccurate water surface segmentation and false obstacle detection caused by reflection regions, a novel obstacle detection approach is proposed. This method leverages an enhanced semantic segmentation model to effectively tackle these challenges. The approach encompasses horizon detection, reflection region identification and elimination, and obstacle detection using the enhanced semantic segmentation model. Firstly, the approximate water surface area and horizon parameters are determined by horizon detection. Then, the reflection regions on the water surface are detected and eliminated. Finally, the horizon parameter is integrated into the semantic segmentation model, where the initialization and prior information of the model are adjusted and the super-prior parameter is attenuated. This optimization enhances the speed and precision of water edge and surface obstacle detection. The experimental results show that, compared with other methods, the proposed method has the lowest error in water edge detection, the highest accuracy, recall, and F-score in obstacle detection, and the lowest average number of false positives (αFP) per frame.
Semantic segmentation of natural scenes is an important basic task in remote sensing information processing. LiDAR can collect the elevation information of the scene, and hyperspectral remote sensing images can collect the unique spectral information of the scene objects. Making full use of the data collected by different kinds of sensors can effectively improve the accuracy of semantic segmentation. To this end, this paper proposes a novel semantic segmentation method that fuses the extended extrema morphological profiles (EEMPs) of LiDAR and hyperspectral image (HSI) data. Firstly, principal component analysis (PCA) is used as the feature extractor to construct the feature maps by extracting the first informative feature from the LiDAR and HSI data. Secondly, the extrema morphological profiles (EMPs) are used to extract the spatial structure feature from the informative feature maps to construct the EEMPs. Thirdly, to synthesize the elevation information of LiDAR and the spectral information of HSI, a feature-level image fusion method based on a total variation model is adopted. Finally, a support vector machine (SVM) is utilized to obtain accurate semantic segmentation from the fused EEMPs. Four metrics, i.e., class accuracy (CA), overall accuracy (OA), average accuracy (AA), and the Kappa coefficient, are used to quantitatively evaluate the semantic segmentation results. The high semantic segmentation accuracy demonstrates that EEMPs can efficiently extract the complementary information from LiDAR and HSI data.
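The EMP construction and total-variation fusion are specific to the paper, but the PCA feature extraction and SVM classification steps can be sketched generically; the snippet below (with an assumed RBF kernel and hyperparameters) stacks the first HSI principal component with the LiDAR elevation and classifies each pixel.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.svm import SVC

def classify_pixels(hsi_cube, lidar_dsm, labels, train_mask):
    """Baseline spatial-spectral fusion: first PCA component of the HSI
    stacked with the LiDAR elevation, classified per pixel by an SVM.

    hsi_cube:  (H, W, B) hyperspectral image
    lidar_dsm: (H, W)    LiDAR elevation raster
    labels:    (H, W)    integer class map (used only where train_mask is True)
    """
    h, w, b = hsi_cube.shape
    pc1 = PCA(n_components=1).fit_transform(hsi_cube.reshape(-1, b))   # (H*W, 1)
    feats = np.concatenate([pc1, lidar_dsm.reshape(-1, 1)], axis=1)    # (H*W, 2)

    clf = SVC(kernel="rbf", C=10.0, gamma="scale")
    clf.fit(feats[train_mask.ravel()], labels.ravel()[train_mask.ravel()])
    return clf.predict(feats).reshape(h, w)
```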
Medical image segmentation is crucial in numerous clinical applications. Nevertheless, accurately segmenting anatomical structures in medical images proves challenging. This difficulty arises from factors including the local volume effect, intensity inhomogeneity, and inter-object occlusion. To address these challenges, our paper introduces an innovative method for medical image segmentation. This approach integrates a hybrid attention mechanism (HCA) and dynamic convolution (DConv) operations, aiming to elevate the overall segmentation performance. The HCA integrates channel attention and spatial attention to capture contextual correlations between channels and spatial locations. DConv, on the other hand, extends traditional convolution operators with learnable offsets to better accommodate geometric transformations, enabling the network to adapt to complex segmentation tasks. Integrating these two mechanisms into an end-to-end medical image segmentation framework enables the network to understand and process image information more comprehensively. Through experiments conducted on datasets consisting of MRI and CT scans, our methodology not only attains performance at the forefront but also surpasses prior techniques notably in segmentation accuracy. These outcomes present a viable means to enhance the precision of medical image segmentation and open up additional prospects for future advancements in the realm of medical imaging.
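The exact HCA design is the paper's contribution; as a generic illustration of combining channel and spatial attention, here is a CBAM-style PyTorch sketch with an assumed reduction ratio.

```python
import torch
import torch.nn as nn

class HybridAttention(nn.Module):
    """Channel attention followed by spatial attention (CBAM-style)."""

    def __init__(self, channels, reduction=16):
        super().__init__()
        self.channel_mlp = nn.Sequential(
            nn.Linear(channels, channels // reduction), nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels))
        self.spatial_conv = nn.Conv2d(2, 1, kernel_size=7, padding=3)

    def forward(self, x):                                   # x: (N, C, H, W)
        n, c, _, _ = x.shape
        avg = self.channel_mlp(x.mean(dim=(2, 3)))
        mx = self.channel_mlp(x.amax(dim=(2, 3)))
        x = x * torch.sigmoid(avg + mx).view(n, c, 1, 1)    # channel re-weighting

        avg_map = x.mean(dim=1, keepdim=True)
        max_map = x.amax(dim=1, keepdim=True)
        attn = torch.sigmoid(self.spatial_conv(torch.cat([avg_map, max_map], dim=1)))
        return x * attn                                      # spatial re-weighting
```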
With the rapid development of deep learning theory in the field of image processing, its potential in the detection and recognition of agricultural diseases is gradually emerging. Given that apple leaf diseases are one of the common types of apple diseases that directly affect yield and quality, and considering the limitations of traditional segmentation methods, this paper proposes an apple leaf disease image segmentation method based on an improved U-net network. To overcome the deficiencies in detail capture and segmentation accuracy of existing methods, a pre-trained VGG16 network is introduced as a feature encoder, and an Enhanced Convolution Layer (EnhancedConvLayer) is proposed. The design of this layer includes parallel processing paths to fuse different feature information and incorporates the Convolutional Block Attention Module (CBAM), aiming to enhance the model's focus on key image features. Experimental results on the ATLDSD dataset show that the improved model achieves better Mean Intersection over Union (mIoU) and Mean Pixel Accuracy (MPA) than U-net, SegNet, and Unet++ in the detection of apple leaf diseases.
Coal maceral image analysis is crucial for predicting coal behavior in processes such as gasification and coking. However, automated segmentation of coal macerals remains challenging due to the grayscale similarity between maceral components like liptinite and the background in coal photomicrographs. In this study, we propose a novel improved network, AR-UNet, for maceral image segmentation. First, we combine attention gates with Residual UNet (Res-UNet), then incorporate an additional loss function. Furthermore, we construct a Coal Maceral image dataset to evaluate our method, comprising 908 images containing vitrinite, inertinite, and liptinite macerals. According to the evaluation based on this Coal Maceral image dataset using the Intersection over Union (IoU) and Pixel Accuracy (PA) metrics, which are widely used for assessing segmentation performance, our proposed AR-UNet model demonstrates superior performance compared to most cutting-edge segmentation algorithms.
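The IoU and PA metrics used for evaluation are standard; a small NumPy sketch of how they are typically computed on integer label maps follows, for reference only.

```python
import numpy as np

def iou_and_pixel_accuracy(pred, target, num_classes):
    """Per-class IoU and overall pixel accuracy for integer label maps."""
    pred, target = pred.ravel(), target.ravel()
    ious = []
    for cls in range(num_classes):
        inter = np.logical_and(pred == cls, target == cls).sum()
        union = np.logical_or(pred == cls, target == cls).sum()
        ious.append(inter / union if union else float("nan"))
    pixel_acc = (pred == target).mean()
    return np.array(ious), pixel_acc
```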
Recently, hybrid CNN-Transformer models have attained a state-of-the-art performance in 3D medical segmentation tasks. The effectiveness of this approach is predominantly ascribed to the large receptive field for non-local self-attention, albeit at the cost of a substantial number of model parameters. Despite the introduction of deep convolution to reduce FLOPs of the model, deep convolution still has the problem of frequent memory access. In this paper, we introduce FSTU-Net, a lightweight 3D medical image segmentation model, which aims to adapt the inherent features of the Swin Transformer through ConvNet modules and enhance volumetric segmentation performance using a smaller model capacity. Specifically, we introduce Volumetric Fast Convolution (FConv) with Large Kernel (LK) size to emulate the large receptive field operations generated from attention in the Swin Transformer. We have developed a simple, rapid, and efficient new Fast Convolution (FConv) to eliminate the problem of frequent memory access associated with deep convolution or group convolution, reducing computational redundancy and the number of memory accesses. It holds significant potential to replace the current preferred method of deep convolution. We validated our FConv method and FSTU-Net model on BTCV and AMOS 2022 public datasets, assessing both efficiency and accuracy. Our results demonstrated state-of-the-art performance, effectively reducing the number of model parameters.
Mainstream monocular depth estimation methods generally excel in accuracy but fall short in runtime performance. The main challenge in improving computational efficiency is to reduce the computational complexity and memory usage. To address this issue, we introduce an unsupervised monocular depth estimation method that not only achieves strong real-time performance but also maintains high accuracy. We present a lightweight depth estimation network that leverages inverted residuals. Besides, we build a training scheme with multiple effective loss functions. Experimental validation on the KITTI dataset demonstrates that our method not only rivals mainstream models in accuracy but also has fewer parameters and FLOPs.
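The inverted residual building block the network leverages follows the well-known MobileNetV2 pattern; a generic PyTorch sketch is given below (the expansion factor and activation are assumed defaults, not the paper's exact block).

```python
import torch.nn as nn

class InvertedResidual(nn.Module):
    """MobileNetV2-style inverted residual: expand -> depthwise -> project."""

    def __init__(self, in_ch, out_ch, stride=1, expand=6):
        super().__init__()
        hidden = in_ch * expand
        self.use_skip = stride == 1 and in_ch == out_ch
        self.block = nn.Sequential(
            nn.Conv2d(in_ch, hidden, 1, bias=False),                          # expand
            nn.BatchNorm2d(hidden), nn.ReLU6(inplace=True),
            nn.Conv2d(hidden, hidden, 3, stride, 1, groups=hidden, bias=False),  # depthwise
            nn.BatchNorm2d(hidden), nn.ReLU6(inplace=True),
            nn.Conv2d(hidden, out_ch, 1, bias=False),                          # project
            nn.BatchNorm2d(out_ch),
        )

    def forward(self, x):
        out = self.block(x)
        return x + out if self.use_skip else out
```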
Compared to conventional cameras, aerial images captured by UAVs primarily consist of small target images at a distance, exhibiting more complex information and richer detailed features. Traditional homography estimation tasks and existing deep learning methods often fail to fully exploit the shallow features of these images, resulting in limited accuracy when estimating homography for complex objects such as drones. Therefore, this paper proposes a deep learning-based method for estimating homography between image pairs. Specifically, we employ a feature extractor and a mask estimator to extract the feature map and mask of each image pair respectively. These maps are then multiplied together to obtain a weighted feature map that effectively highlights the regions contributing significantly to homography estimation. The weighted feature map is subsequently concatenated as the input of our homography estimator network, which incorporates both U-Net architecture and DLT layer as its backbone components. This integration enables accurate calculation of the homography matrix by effectively leveraging shallow features within the deep network structure. Experimental results demonstrate that our proposed method achieves superior accuracy in estimating the homography matrix compared to previous approaches.
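The DLT layer mentioned above solves for a homography from point correspondences; for reference, a plain NumPy sketch of the classical (non-differentiable) DLT is shown below. It illustrates only the linear system, not the network's differentiable implementation.

```python
import numpy as np

def homography_dlt(src_pts, dst_pts):
    """Direct Linear Transform: estimate H (3x3) from >= 4 point correspondences.

    src_pts, dst_pts: (N, 2) arrays with dst ~ H @ src in homogeneous coordinates.
    """
    rows = []
    for (x, y), (u, v) in zip(src_pts, dst_pts):
        rows.append([-x, -y, -1, 0, 0, 0, u * x, u * y, u])
        rows.append([0, 0, 0, -x, -y, -1, v * x, v * y, v])
    A = np.asarray(rows, dtype=float)
    _, _, vt = np.linalg.svd(A)
    H = vt[-1].reshape(3, 3)       # null-space vector = homography up to scale
    return H / H[2, 2]
```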
In the domain of human pose estimation, graph convolutional networks have exhibited notable performance enhancements owing to their adeptness in naturally modeling the representation of human poses through graph structures. However, prevailing methods predominantly concentrate on the local physical connections between joints, overlooking higher-order neighboring nodes. This limitation curtails their ability to effectively exploit relationships between distant joints. This article introduces a Multiscale Spatio-Temporal Hypergraph Convolutional Network (MST-HCN) designed to capture spatio-temporal information and higher-order dependencies. MST-HCN encompasses two pivotal modules: Multiscale Hypergraph Convolution (MHCN) and Multiscale Temporal Convolution (MTCN). The MHCN module represents human poses as hypergraphs in various forms, enabling the comprehensive extraction of both local and global structural information. In contrast to traditional stride convolutions, MTCN leverages multiple branches to learn important frames based on their significance, thereby filtering out redundant frames. Experimental results underscore that MST-HCN surpasses state-of-the-art methods in benchmark tests such as Human3.6M and MPI-INF-3DHP. In particular, our proposed MST-HCN method boosts performance by 1.5% and 0.9%, compared to the closest latest method, using detected 2D poses and ground truth 2D settings respectively.
In orthopedic surgical navigation, real-time tracking of surgical instrument positions can be achieved by attaching the optical reflective markers (ORMs) of the optical tracking system (OTS) to the surgical instruments. However, this approach is susceptible to environmental obstructions, leading to a loss of crucial positional information and adversely affecting the safety of the surgical procedure. To counter this issue, the present study employs a YOLOv8 network to predict the 2D positional information of the ORMs from the two perspectives of a dual-camera system. Subsequently, a 3D reconstruction based on a binocular algorithm is employed to reconstruct the spatial position of the ORMs. The experimental outcomes demonstrate that the methodology proposed herein provides an improved balance between accuracy and processing speed. The performance of the method was evaluated using an artificial dataset, and two variants of YOLOv8 were evaluated. The YOLOv8-N model achieved an average processing speed of 33 frames per second (FPS) with a mean detection error of 8.59 pixels. In contrast, the YOLOv8-X model processed at a slower frame rate of 10 FPS on average, but with a reduced average detection error of 6.36 pixels. This indicates that the proposed method can effectively mitigate the issue of missing position information due to occlusions and offers a feasible solution for accurate tracking of moving objects in environments where occlusion is a concern.
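The binocular 3D reconstruction step can be illustrated with classical linear triangulation; the sketch below assumes calibrated 3x4 projection matrices for the two cameras and is not the study's exact reconstruction code.

```python
import numpy as np

def triangulate_point(P_left, P_right, uv_left, uv_right):
    """Linear triangulation of one 3D point from two calibrated views.

    P_left, P_right: 3x4 camera projection matrices.
    uv_left, uv_right: 2D detections (e.g. marker centres from the detector).
    """
    def two_rows(P, uv):
        u, v = uv
        return np.stack([u * P[2] - P[0], v * P[2] - P[1]])

    A = np.concatenate([two_rows(P_left, uv_left), two_rows(P_right, uv_right)])
    _, _, vt = np.linalg.svd(A)
    X = vt[-1]
    return X[:3] / X[3]   # inhomogeneous 3D coordinates
```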
The curve-based lighting adjustment technique is widely used in fields such as photography and image processing, and deep learning-based lighting curve adjustment has shown excellent performance in low-light image enhancement. However, existing curve-based deep learning methods tend to use complex mathematical formulas to define the curve model and add a large number of regularization constraints to ensure that the curve conforms to real physical scenes. This limits the flexibility of the lighting curve, making it unable to accurately enhance the brightness of low-light images and resulting in problems such as regional color distortion and overall color bias. To solve this problem, we propose a novel low-light image enhancement model called Discrete Brightness Curve Estimation (DBCE-Net). In DBCE-Net, we introduce a new way of defining curves to enhance regional illumination more effectively. At the same time, we propose a discrete parameter calculation network based on a mutual attention mechanism to estimate the discrete brightness adjustment curve from low-light images. Finally, we use a multi-scale denoising network to handle noise introduced by brightness enhancement in shadow areas. Extensive experiments on various datasets demonstrate that our DBCE-Net achieves competitive performance in terms of both objective quantitative metrics and subjective visual quality.
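For context, prior curve-based enhancement methods typically apply an iterative quadratic curve of the form LE(x) = x + alpha * x * (1 - x); the sketch below shows that conventional formulation, which the discrete curve in DBCE-Net is designed to relax. The iteration count and parameter map are illustrative assumptions.

```python
import numpy as np

def apply_quadratic_curve(image, alpha_map, iterations=4):
    """Iteratively apply the quadratic enhancement curve
    LE(x) = x + alpha * x * (1 - x), with image values in [0, 1].

    alpha_map: per-pixel curve parameter in [-1, 1] (broadcastable to image).
    """
    x = np.clip(image.astype(np.float32), 0.0, 1.0)
    for _ in range(iterations):
        x = x + alpha_map * x * (1.0 - x)
    return np.clip(x, 0.0, 1.0)
```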
Under highly sparse down-sampling in k-space, magnetic resonance fingerprinting (MRF) faces the challenge of reconstructing clean fingerprints because of aliasing noise. Based on compressed sensing (CS) theory, model-based reconstruction methods recover the fingerprints by solving an optimization problem. However, while most of the stationary information in the fingerprints can be reconstructed easily, the temporally varying features, which decide the accuracy of dictionary matching, cannot. Sparsifying transform learning adaptively learns the transform domain from image patches, leading to improved sparse representation and consequently enhancing reconstruction efficiency and quality. Therefore, a sparsifying transform learning based magnetic resonance fingerprinting (STLMRF) method is proposed, which integrates the MRF reconstruction model with a sparsifying transform learning model for the first time. Firstly, aliasing noise is suppressed well thanks to the adaptive sparsity levels obtained by sparsifying transform learning. Additionally, the discriminative temporal features of the fingerprint are retrieved through a dictionary matching constraint. Secondly, to accelerate the reconstruction process, singular value decomposition (SVD) is applied to map the dictionary and k-space data into a lower-dimensional subspace. Finally, numerical experiments are conducted to validate the performance of the proposed STLMRF method. Compared with state-of-the-art reconstruction methods, STLMRF performs better in terms of accuracy and robustness, meaning that STLMRF holds significant promise for improving the quality and reliability of MRF-based parameter estimation.
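The SVD-based dimensionality reduction is a standard MRF acceleration step; a minimal NumPy sketch of compressing the dictionary into a low-rank temporal subspace follows (the rank is an assumed parameter, and the same basis would also be applied to the measured data).

```python
import numpy as np

def compress_dictionary(D, rank):
    """Project an MRF dictionary (atoms x time points) onto a low-rank
    temporal subspace via truncated SVD.

    Returns the compressed dictionary and the basis needed to project
    image/k-space time series into the same subspace.
    """
    # D: (n_atoms, n_timepoints); rows are simulated fingerprints.
    _, _, vt = np.linalg.svd(D, full_matrices=False)
    basis = vt[:rank]               # (rank, n_timepoints) temporal basis
    D_compressed = D @ basis.T      # (n_atoms, rank)
    return D_compressed, basis
```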
As machine learning continues to improve the performance of image compression, there is a high demand for deep learning-based image compression algorithms. The first generation of deep learning-based image compression standard, JPEG AI, has emerged. Compared to the linear transform methods in traditional compression frameworks, deep learning-based image compression codecs use non-linear transform to extract visual features ranging from low to high levels in a large number of training samples, thereby achieving much higher compression performance. JPEG AI aims to explore image encoding methods that are more efficient than existing image codecs. In the JPEG AI official verification model, the Content Adaptive Inter-Channel Correlation Information (ICCI) subnetwork is used to reconstruct compressed images to achieve higher quality, but the complexity and parameter number of this subnetwork are relatively high. To solve this problem, we propose a simplified ICCI (sICCI) based on the Y, U, and V components. Compared to the standard ICCI module in JPEG AI and its lightweight version eICCI, our proposed sICCI significantly reduces network complexity and model parameters while keeping competitive image reconstruction quality.
Positron Emission Tomography (PET) images suffer from low spatial resolution, resulting in suboptimal visual effects and the inability to effectively display subtle pathological areas, hindering the early detection of potential issues. To address this problem, we propose a cascade multi-output super-resolution reconstruction method based on multi-channel input. Firstly, we construct a cascade super-resolution model by introducing degradation functions to refine the LR-to-HR mapping range. Through gradual super-resolution reconstruction, we comprehensively consider information loss and deformation issues during the image resolution reduction process, providing more accurate and higher-quality super-resolution reconstruction results. Secondly, we introduce high-resolution CT images to accurately restore image texture details by incorporating additional high-frequency detail information and maintain overall image consistency by the integration of extra structural information. Finally, we incorporate region-based super-resolution detection information to adaptively reconstruct different areas of the image, avoiding distortion caused by excessive super-resolution and blurriness resulting from insufficient super-resolution. Experimental results demonstrate that our approach outperforms other methods, with SSIM, PSNR, and RMSE metrics reaching 0.9607, 34.9438, and 0.0201, respectively, achieving state-of-the-art performance. Furthermore, visual experiments demonstrate a significant improvement in the resolution of the reconstructed PET images using the method proposed in this paper. This effectively compensates for the deficiencies in the original images, providing strong support for the early detection of potential issues.
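The RMSE and PSNR metrics quoted above are standard; a short NumPy sketch of their computation is included for reference (SSIM is omitted because it requires windowed statistics).

```python
import numpy as np

def rmse_psnr(reference, reconstructed, data_range=1.0):
    """RMSE and PSNR between a reference image and a super-resolved result."""
    err = reference.astype(np.float64) - reconstructed.astype(np.float64)
    rmse = np.sqrt(np.mean(err ** 2))
    psnr = 20 * np.log10(data_range / rmse) if rmse > 0 else float("inf")
    return rmse, psnr
```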
This study introduces FE-RMVSNet, a novel deep learning-based multi-view 3D reconstruction network. It effectively addresses common challenges in 3D reconstruction, such as occlusion and computational limitations, by enhancing feature extraction. FE-RMVSNet demonstrates superior performance in both accuracy and completeness compared to existing methods.
Hyperspectral image (HSI) reconstruction based on coded aperture snapshot spectral imaging (CASSI) systems aims to recover 3D HSI data from 2D measurements, which is an ill-posed inverse problem. In this paper, by taking advantage of mask modulation information and the spectral correlation, we propose a lightweight frequency-enhanced network with spectral-spatial dual priors (called FSDPNet). First, we treat the modulation mask as a spatial prior and propose frequency-spatial reconstruction modules (FSRMs) to progressively recover spatial details. In FSRMs, frequency learning blocks are designed to model long-range spatial dependencies in the frequency domain, and enhance the conventional spatial feature learning. Second, based on the spectral prior of HSIs, we design a spectral similarity loss to guide the reconstruction process. Experimental results on both simulation and real HSI datasets demonstrate that FSDPNet with few network parameters (0.98M) outperforms state-of-the-art methods in terms of reconstruction quality.
Current plant organ analysis algorithms are mostly based on two-dimensional images. Plant organs overlap in 2D space, and only specific plant species can be handled well in 2D. Therefore, we propose a method for propagating plant organ labels by using a modified graph neural network on three-dimensional plant point clouds. We first convert the three-dimensional point cloud files of plants into a dataset with a graph data structure, and then input it into an improved GCNII for training. Only a few sampled points of the plant are needed for label propagation, which improves the efficiency of plant organ segmentation and classification. The accuracy of the label propagation results for maize, sorghum, tomato, and tobacco reaches over 95%. The research is of great significance for reducing the manual labeling workload in 3D plant phenotyping.
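Converting a point cloud into a graph data structure is a prerequisite for the GNN step; the sketch below builds a symmetric k-nearest-neighbour edge list with an assumed k, and is not the paper's exact graph construction.

```python
import numpy as np
from scipy.spatial import cKDTree

def knn_graph(points, k=16):
    """Build a symmetric k-nearest-neighbour edge list from an (N, 3) point cloud,
    i.e. the graph structure a GNN such as GCNII consumes for label propagation."""
    tree = cKDTree(points)
    _, idx = tree.query(points, k=k + 1)           # first neighbour is the point itself
    src = np.repeat(np.arange(len(points)), k)
    dst = idx[:, 1:].reshape(-1)
    edges = np.stack([src, dst])                   # (2, N*k) directed edges
    # Make the graph undirected by adding reversed edges and removing duplicates.
    edges = np.unique(np.concatenate([edges, edges[::-1]], axis=1), axis=1)
    return edges
```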
This paper introduces a point cloud registration algorithm based on neighborhood feature similarity, specifically targeting the challenge of quality optimization in component assembly within the modern manufacturing industry, particularly for precise contour fitting. By enhancing the classic Iterative Closest Point (ICP) algorithm, the study focuses on feature extraction of contour point clouds and improving fitting accuracy. The algorithm incorporates neighborhood feature similarity based on curvature characteristics and multi-neighborhood normal vector variation, along with a loss function design based on contour deviation. Experiments on fitting actual headlight contour point clouds with CAD theoretical model point clouds demonstrate the algorithm's high precision and faster iteration speed in aligning narrow-band point clouds with unclear feature points, significantly outperforming traditional ICP and its variants. The research not only enhances component assembly quality but also provides an effective tool for contour point cloud fitting in complex manufacturing scenarios.
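For reference, one iteration of the classic point-to-point ICP that the algorithm extends looks like the NumPy sketch below; the paper's curvature-based neighbourhood similarity and contour-deviation loss are not reproduced here.

```python
import numpy as np
from scipy.spatial import cKDTree

def icp_step(source, target):
    """One point-to-point ICP iteration: match nearest neighbours, then solve
    the best rigid transform (Kabsch/SVD) aligning source to its matches.

    source, target: (N, 3) and (M, 3) point arrays.
    """
    tree = cKDTree(target)
    _, idx = tree.query(source)
    matched = target[idx]

    src_c, dst_c = source.mean(0), matched.mean(0)
    H = (source - src_c).T @ (matched - dst_c)
    U, _, Vt = np.linalg.svd(H)
    R = Vt.T @ U.T
    if np.linalg.det(R) < 0:                 # avoid reflections
        Vt[-1] *= -1
        R = Vt.T @ U.T
    t = dst_c - R @ src_c
    return (R @ source.T).T + t, R, t
```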
CT image reconstruction imposes certain requirements on the projection data. When the projection data are complete, analytical algorithms (such as the filtered back-projection algorithm, FBP) or iterative algorithms (such as ART) can be used for reconstruction. However, in the actual process of acquiring projections, due to limitations such as the geometric position of the machine scan, the structure of the scanned object, and the need to keep the radiation dose reasonable and as low as possible, it is difficult for the system to obtain complete projection data, resulting in sparse-projection or limited-angle reconstruction problems. This paper proposes a deconvolution iterative algorithm based on direct back-projection for the limited-angle reconstruction problem of G-arm CT, and uses the newly proposed reconstruction algorithm combined with L0-GIF to simulate the reconstruction of limited-angle projection data. By changing the missing angle, the reconstruction effect under different missing angles was studied. The experimental results show that our algorithm can achieve good reconstruction quality below a missing angle of 100.
The development of remote sensing technology has made large-scale and long-term observations possible, but extracting land cover distribution from complex remote sensing data is full of challenges. Convolutional neural networks (CNNs) have played an important role in remote sensing data processing, but their performance is still limited by the ability to combine spatial-spectral information and extract discriminative features. To address these issues, a multiscale feature extraction network (MFENet) was designed for land cover classification. Firstly, multiple multiscale feature extraction layers are used to mine deep spatial-spectral features. Secondly, the gated feature enhancement (GFE) module is employed to further enhance the non-linear feature expression ability. Finally, the joint optimization of the classification model using cross-entropy loss and feature discrimination loss achieves inter-class feature differentiation. Extensive experiments over four wetland datasets demonstrate the superiority of the proposed MFENet compared with several classification technologies.
The Fitzpatrick scale is a commonly used tool in dermatology to categorize skin types based on melanin and sensitivity to ultraviolet (UV) light. Existing methodologies for Fitzpatrick scale classification use the Individual Typology Angle (ITA) approach for image classification. A primary task is to apply specific filters to detect skin regions in the image. However, such approaches relax their accuracy criteria by allowing a one-tone difference, and the classification accuracy is no more than 75%. In this paper, we present a novel approach that uses specialized filters to detect and remove skin surface attributes, i.e., wrinkles and pores, over a dataset produced in a controlled environment by a lightweight u-health edge device. Image features are modeled as a 3-dimensional feature vector, and we conducted extensive Fitzpatrick classification experiments using machine learning (ML) models. The cross-validation outcomes demonstrate improved accuracy, reaching up to 90%, while outperforming state-of-the-art methods without relaxing the accuracy criteria.
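The ITA approach mentioned above uses the standard formula ITA = arctan((L* - 50) / b*) × 180 / π on CIELAB values; the sketch below computes it and maps the angle to categories using commonly cited ITA cut-offs. The cut-offs and category mapping approximate typical practice and are not taken from the paper.

```python
import numpy as np

def individual_typology_angle(L_star, b_star):
    """ITA in degrees from CIELAB lightness L* and yellow-blue component b*.

    Uses arctan2 so b* = 0 is handled; for the usual b* > 0 skin values this
    equals arctan((L* - 50) / b*). Higher ITA -> lighter skin.
    """
    return np.degrees(np.arctan2(np.asarray(L_star) - 50.0, np.asarray(b_star)))

def skin_category_from_ita(ita_deg):
    """Map ITA to a skin-tone category using commonly cited ITA cut-offs."""
    bins = [55, 41, 28, 10, -30]                       # very light ... brown; below -30 -> dark
    labels = ["I", "II", "III", "IV", "V", "VI"]
    for cutoff, label in zip(bins, labels):
        if ita_deg > cutoff:
            return label
    return labels[-1]
```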
With the development of computer technology, there is growing interest in developing deep learning systems to assist doctors in diagnosis. In specific clinical applications, existing deep learning methods face many problems. One of the most important is the lack of large and reliable labeled datasets, since the annotation of medical images requires professional knowledge. Furthermore, medical images contain a lot of noise and are more erratic and blurry, making detection more challenging. This research attempts to create a deep learning-based system for classifying benign and malignant breast tumors, which can help physicians diagnose patients more accurately, increase productivity, and lower the risk of misdiagnosis. Breast ultrasonography is a common diagnostic tool for breast malignancies. Our work proposes classifying breast tumor ultrasound images using a deblurring masked autoencoder (MAE). During pretraining, this technique adds deblurring to the MAE proxy task, which is better suited to the tumor classification task based on ultrasound images. Our experimental findings show that the proposed model works well, achieving an AUROC score of 93.89 on a large dataset and an AUROC score of 88.45 on a small dataset, resulting in state-of-the-art performance in the ultrasound image classification of breast tumors.
In this study, we propose a fusion model based on DenseNet201 and InceptionV3 aimed at improving the accuracy of 8-class classification of the BreakHis breast cancer histopathological image dataset. The BreakHis dataset poses significant challenges for automatic classification tasks due to its high feature similarity and class imbalance. Our model leverages the deep connectivity of DenseNet201 and the broad exploratory capabilities of InceptionV3 to capture comprehensive features from local to global scales, enhancing adaptability to various feature scales. This fusion model outperforms individual DenseNet201 or InceptionV3 models in key performance metrics such as precision, recall, and F1 score. It shows marked performance improvements, particularly in categories with high feature similarity and fewer samples, demonstrating its effectiveness in addressing internal dataset imbalances. Additionally, the model exhibits stability across multiple training and testing iterations, further validating its reliability and effectiveness in classifying breast cancer histopathological images. The results indicate that the fusion of DenseNet201 and InceptionV3 models can enhance the accuracy of automatic classification of breast cancer histopathological images, especially in categories with high feature similarity. This finding provides valuable insights for the development and selection of classification models and contributes to future work in medical image analysis.
Most current display devices support a bit-depth of eight or higher. However, common multimedia tools cannot reach this bit-depth standard. Image de-quantization can improve the visual quality of low-bit-depth images displayed on high-bit-depth screens. To achieve image de-quantization, we propose the DAGAN algorithm, which performs super-resolution on image intensity resolution, a dimension orthogonal to spatial resolution. DAGAN employs Generative Adversarial Networks (GANs) and achieves photo-realistic de-quantization via end-to-end learning. Our DAGAN consists of a dense residual non-local network (DRNN) and a discriminative network. We design the Dense Residual Non-local Block (DRNB) to construct the DRNN. The DRNB utilizes a dense network architecture to enhance representation ability and employs a non-local module to extract features that capture long-range dependencies between pixels. Furthermore, we use the adversarial learning framework to encourage the DRNN to produce high-quality natural images. Experimental results on several public benchmarks show that our DAGAN can generate photo-realistic high-bit-depth images without quantization artifacts.
The Gabor wavelet is widely used to simulate the receptive fields of simple cells in the low-level visual cortex (such as V1, V2, and V3) of the human visual system. Building on this, end-to-end encoding models have achieved advanced encoding results in the low-level visual cortex. However, most current end-to-end encoding models are lightweight, with relatively simple structures and few parameters. This limitation may cause the models to perform poorly when processing detailed features of different frequencies and orientations in complex Gabor feature maps. In this paper, a novel visual encoding model based on Gabor features, GaborNeXt, is proposed. The model utilizes ConvNeXt convolutional layers that group independent convolutional kernels for the convolution operation and concatenates the outputs of each group to enhance non-linear expressive power. We conducted experiments on the NSD (Natural Scenes Dataset), and the results demonstrate that our model outperforms the baseline models in encoding accuracy across several low-level visual cortex regions. Additionally, we compared the effect of various Gabor convolutional layer kernel sizes on model performance through ablation experiments and found that using larger convolutional kernels in the Gabor convolutional layer has a positive impact on the model's performance.
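Since the model builds on Gabor feature maps, a brief sketch of how such a filter bank can be generated with OpenCV may help; the particular orientations, wavelengths, and kernel size below are illustrative choices, not the paper's settings.

    import cv2
    import numpy as np

    def gabor_feature_maps(gray, thetas=(0, np.pi / 4, np.pi / 2, 3 * np.pi / 4),
                           lambdas=(4, 8, 16), ksize=31, sigma=4.0, gamma=0.5):
        # Build a small Gabor filter bank over orientations and wavelengths and
        # filter the image with each kernel, mimicking V1-like receptive fields.
        maps = []
        for theta in thetas:
            for lambd in lambdas:
                kernel = cv2.getGaborKernel((ksize, ksize), sigma, theta, lambd, gamma)
                maps.append(cv2.filter2D(gray, cv2.CV_32F, kernel))
        return np.stack(maps)  # shape: (orientations * wavelengths, H, W)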
To address video artifact reduction, a new algorithm based on the bilateral filter is proposed. Compared with ad-hoc denoising algorithms, the new method demonstrates superior performance in both subjective and objective evaluation. Additionally, to reduce the heavy computational load of the bilateral filter, a fast fixed-point bilateral filter is proposed. The test results show that the proposed fast implementation maintains the same performance as the original filter.
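For reference, the classical (non-accelerated) bilateral filter that the paper builds on is available directly in OpenCV; the parameter values here are placeholders, and the paper's fast fixed-point variant is not reproduced.

    import cv2

    def denoise_frame(frame_bgr, d=9, sigma_color=50, sigma_space=50):
        # Edge-preserving smoothing: each pixel is averaged with neighbours that
        # are close both spatially (sigma_space) and in intensity (sigma_color).
        return cv2.bilateralFilter(frame_bgr, d, sigma_color, sigma_space)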
Various sources of measurement error can compromise the accuracy of three-dimensional (3D) surface topography measurement systems in complicated situations with high ambient light noise; the impact of ambient light on measurement accuracy stands out among them. To improve the accuracy of structured light surface reconstruction, this work proposes a novel compensation technique based on a multiple error model fitting scheme. Discrete error analysis (DEA) is incorporated into the method, and a Double Perceived Phase Shifting (DPPS) algorithm is introduced. To create a binocular structured light reconstruction system utilizing the triangulation approach, the research first analyzes the 3D reconstruction algorithm of structured light based on spatial phase unwrapping. The analysis then concentrates on the measuring system's grating acquisition error while taking into account the effects of various noise sources on fringe acquisition in the binocular reconstruction system. The sources of error are categorized, and a Gaussian mixture stochastic model and a periodic error model are used to build the measurement system's overall error model. The optimal fringe is then obtained, and the mapping relationship between the projection and the camera acquisition is refined for better spatial fringe quality by applying the DEA approach to assess the impact of the inaccuracy on the phase error of the structured light. Finally, a comparative study is carried out to evaluate the effectiveness of the suggested method. The findings show that the algorithm is stable and has high-precision measuring capability. The approach, which requires no extra phase-shifting steps, shows notable improvements over traditional techniques such as the Dual-Frequency Pattern Scheme (DFPS) and the Third Harmonic Injection Algorithm (THIA). DEA+DPPS achieved an average accuracy improvement of 23.92%.
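As background for the phase-based reconstruction described above, a standard N-step phase-shifting retrieval is sketched below with NumPy; the DEA and DPPS refinements proposed in the paper are not reproduced, and the function name is an assumption.

    import numpy as np

    def wrapped_phase(fringe_images):
        # fringe_images: (N, H, W) stack of N equally phase-shifted fringe images,
        # I_n = A + B * cos(phi + 2*pi*n/N). Standard N-step phase retrieval.
        n = fringe_images.shape[0]
        deltas = 2 * np.pi * np.arange(n) / n
        num = np.tensordot(np.sin(deltas), fringe_images, axes=1)
        den = np.tensordot(np.cos(deltas), fringe_images, axes=1)
        return np.arctan2(-num, den)  # wrapped phase in (-pi, pi]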
Neural implicit representation has emerged as a popular research direction within 3D deep learning, with a variety of implicit representations such as the occupancy field, signed distance field (SDF), unsigned distance field (UDF), and NeRF being extensively employed in applications including 3D reconstruction. In this paper, we introduce an innovative blockwise high-resolution voxel representation and a rough-voxel super-resolution technique based on diffusion models. We encode high-resolution voxel models using a set of latent vectors and reconstruct the original voxel models through the diffusion process. The experimental results validate that our approach achieves highly precise reconstruction in both the voxel implicit representation and rough-voxel super-resolution tasks.
Pansharpening synthesizes high-resolution hyperspectral (HRHS) images by combining low-spatial-resolution hyperspectral (LRHS) images and panchromatic (PAN) images. Existing methods perform inadequately in extreme pansharpening, often producing excessively blurred HRHS images. The main reasons are that the combination of inputs and the training loss functions is overly simplistic, and that the spatial details of the upsampled LRHS are severely distorted, which weakens the neural network's ability to exploit the spatial information of PAN images. To address this issue, we propose a spectral-spatial dual injection network (SSDINet) combined with a panchromatic loss for extreme pansharpening. SSDINet alleviates the blurring of HRHS images during extreme pansharpening by adding a pseudo-hyperspectral (PHS) image input and combining it with the upsampled LRHS images to form an additional spectral injection branch distinct from the spatial injection branch. Additionally, during network training an extra panchromatic loss is used to alleviate the incomplete utilization of PAN images; the panchromatic mapping is realized by a neural network. Experimental results demonstrate the superior performance of our approach compared to representative methods.
In the process of generating Synthetic Aperture Radar (SAR) images, speckle noise is produced for physical reasons, and its presence seriously affects the interpretation and post-processing of the images. This study focuses on the limitations imposed by speckle noise in SAR images and surveys the denoising algorithms in this field. Firstly, the SAR imaging principle is introduced, and the model of speckle noise generated during the imaging process is explained. Then, the Web of Science, Elsevier, ScienceDirect, and Scopus databases are searched using Boolean operators, and 30 deep learning-based denoising algorithms for SAR images published in the past five years (2018-2022) are introduced and their characteristics summarized. Finally, this study aims to provide ideas for subsequent research on SAR image denoising by summarizing the current mainstream algorithms and their characteristics.
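As a reference point for the speckle model discussed above, a common simulation of fully developed multiplicative speckle on an intensity image is sketched here; the gamma-noise formulation and the default number of looks are conventional choices, not taken from the paper.

    import numpy as np

    def add_speckle(intensity_image, looks=4, rng=None):
        # Multiplicative speckle model commonly used for SAR intensity images:
        # Y = X * F, with F drawn from a unit-mean Gamma distribution whose
        # shape parameter is the number of looks L.
        rng = np.random.default_rng() if rng is None else rng
        noise = rng.gamma(shape=looks, scale=1.0 / looks, size=intensity_image.shape)
        return intensity_image * noise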
Although modern image inpainting has gradually become more reasonable and realistic in quality, large-hole inpainting and diverse inpainting remain significant challenges. Models trained adversarially generate realistic but not diverse results, while models optimizing the lower bound of the likelihood fit the original data distribution but produce blurred reconstructed images. To overcome these problems, we propose a new image inpainting method based on a denoising diffusion probabilistic model. The diffusion model directly optimizes the data log-likelihood, thus achieving more accurate inference. We introduce additional conditions during network training and sampling to guide image generation. By adjusting the model structure, we introduce an attention mechanism into the noise-prediction network to improve the quality of image generation. We conducted experiments on CelebA-HQ and FFHQ to evaluate the effectiveness of our model. Compared with the most advanced current inpainting methods, our model can generate higher-quality and more diverse images.
Quantum Image Processing (QIP) is a field that aims to utilize the benefits of quantum computing for manipulating and analyzing images. However, QIP faces two challenges: the limited number of qubits and the presence of noise in a quantum machine. In this research we propose a novel approach to address the issue of noise in QIP. By training and employing a machine learning model that identifies and corrects the noise in quantum-processed images, we can compensate for the noise introduced by the machine and retrieve a processing result similar to that produced by a classical computer, with higher efficiency. The model is trained on a dataset consisting of both classically processed images and quantum-processed images from open-access datasets. This model provides the confidence level for each pixel and its potential original value. To assess the model's accuracy in compensating for loss and decoherence in QIP, we evaluate it using three metrics: Peak Signal-to-Noise Ratio (PSNR), Structural Similarity Index (SSIM), and Mean Opinion Score (MOS). Additionally, we discuss the applicability of our model across domains as well as its cost-effectiveness compared to alternative methods.
Video surveillance images in coal mines often suffer from overall darkness, low contrast, and strong background noise. Developing comprehensive algorithms that can effectively enhance low-light images while simultaneously reducing noise remains a challenging task. This paper presents a two-stage image enhancement algorithm tailored for mine environments, employing deep learning techniques. The algorithm comprises a low-light enhancement stage that utilizes light enhancement curves to improve image sharpness and contrast, followed by a denoising stage that removes noise from the enhanced images while preserving crucial mine image details. Furthermore, two dedicated mine image datasets were constructed to evaluate the proposed method. Experimental results on these mine image datasets and the BSD300 dataset demonstrate that the algorithm can significantly improve performance, achieving state-of-the-art results by synergistically combining brightness enhancement and denoising tailored for low-light mine conditions.
Multiple image steganography involves hiding multiple secret images in a single cover image and eventually recovering all secret images from the stego image. Although image steganography has greatly improved in capacity and invisibility, its security, as well as its capacity and invisibility in style transfer scenarios, faces serious challenges. To solve these problems, an improved high-security multiple image steganography framework, SMIS, is proposed by combining an invertible neural network and a style transfer network. First, a pre-trained style transfer network is introduced to disrupt the hidden features of the secret image embedded in the stego image, significantly reducing the detection accuracy of steganalysis. Secondly, to enable the dense invertible neural network (DINN) to extract secret images effectively, a feature reconstruction module (FRM) based on the UNet structure is designed to reconstruct the semantic features of the secret information, and a semantic reconstruction loss is designed to constrain the overall training of the steganography model. In addition, to address the limited global receptive field and excessive feature redundancy of UNet, the explicit visual center (EVC) and a self-calibration neck (ScNeck) are introduced. The experimental results show that the proposed SMIS can efficiently hide multiple secret images, greatly improve the invisibility of the steganography, and extract the secret images from the stylized stego image. At the same time, the security of the proposed method is higher than that of existing mainstream methods.
The temporal and spatial variation characteristics of MODIS NDVI data in Ningxia during the past ten years were analyzed by the proper orthogonal decomposition (POD) method. The analysis indicates that the temporal and spatial distribution of NDVI data in the Ningxia area has three typical modes, and the cumulative contribution rate of the eigenvalue variance of the first three modes is more than 95%. The time coefficients of mode 1 and mode 3 show an upward trend, and that of mode 2 shows a downward trend, with obvious interannual variation. Mode 1 captures the consistent NDVI trend, characterized by an increase or decrease over the whole region. The eigenvector of mode 2 shows strong oscillating characteristics in parts of the plain area, which may be strongly related to the rapid urban development there. The eigenvector of mode 3 shows a small oscillation in the southern Liupan Mountain area, which may be related to local farmland being returned to woodland over the decade. The results show that the POD method is also suitable for the analysis of the NDVI field.
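A compact way to reproduce this kind of POD analysis is an SVD of the mean-removed snapshot matrix, as sketched below; the function name and return layout are assumptions, and the paper's preprocessing of the MODIS NDVI fields is not reproduced.

    import numpy as np

    def pod_modes(snapshots, n_modes=3):
        # snapshots: (T, P) matrix, one flattened NDVI field per time step.
        # POD of the mean-removed snapshot matrix via SVD; the squared singular
        # values give each mode's share of the total variance.
        mean_field = snapshots.mean(axis=0)
        u, s, vt = np.linalg.svd(snapshots - mean_field, full_matrices=False)
        energy = s ** 2 / np.sum(s ** 2)
        time_coeffs = u[:, :n_modes] * s[:n_modes]   # temporal coefficients
        spatial_modes = vt[:n_modes]                  # spatial eigenvectors
        return time_coeffs, spatial_modes, energy[:n_modes].sum()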
Multiple Object Tracking (MOT) is a classical task in computer vision that aims to identify and track all objects in a video scene and assign a unique ID to each object. The tracking-by-detection (TBD) paradigm has become the mainstream framework for MOT due to its high tracking accuracy. With the development of UAV technology, MOT research for UAV video has important military and civilian value. However, it faces challenges such as class imbalance, many small targets, and occlusion of targets in the scene, which make it difficult to correctly match and continuously track targets. We propose a new algorithm for the MOT problem in UAV scenarios. On the one hand, to address the class imbalance problem for small targets, a dynamic parameter adjustment method based on the gradient information of training samples is proposed to improve the generalization ability of the traditional loss function in multi-class target tracking. On the other hand, to improve the accuracy of inter-frame matching, this paper introduces a new feature similarity calculation method based on the Wasserstein distance that optimizes the matching process according to a weight allocation mechanism for feature importance. Finally, the effectiveness of the proposed algorithm is verified on the VisDrone-MOT2019 dataset. The results show that, compared with existing MOT algorithms, the proposed algorithm delivers significant improvements in tracking accuracy, trajectory integrity, and identity maintenance, achieving 38.8% MOTA and 52.8% IDF1, better than existing state-of-the-art tracking algorithms.
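The paper's exact feature-importance weighting is not given, so the sketch below only illustrates one plausible way to turn the 1-D Wasserstein distance between two appearance-feature distributions into a matching similarity; the histogramming scheme and the similarity mapping are assumptions.

    import numpy as np
    from scipy.stats import wasserstein_distance

    def appearance_similarity(feat_a, feat_b, n_bins=32):
        # Hypothetical matching cost: compare the value distributions of two
        # re-identification feature vectors via the 1-D Wasserstein distance.
        bins = np.linspace(min(feat_a.min(), feat_b.min()),
                           max(feat_a.max(), feat_b.max()), n_bins + 1)
        centers = 0.5 * (bins[:-1] + bins[1:])
        ha, _ = np.histogram(feat_a, bins=bins, density=True)
        hb, _ = np.histogram(feat_b, bins=bins, density=True)
        d = wasserstein_distance(centers, centers, ha + 1e-8, hb + 1e-8)
        return 1.0 / (1.0 + d)  # larger means more similar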
In low-light environments, collected images exhibit low contrast, a low signal-to-noise ratio, and loss of detail, leading to an overall decline in image quality. Low-light image enhancement aims to restore images with complete details and has gradually become a research hotspot in computer image processing. With the large-scale increase in data volume in recent years, deep learning-based methods have gradually become mainstream. This article provides a detailed classification and analysis of deep learning-based low-light image enhancement methods, surveys the various networks used, introduces the basic principles and steps of the algorithms, and reviews existing low-light image enhancement datasets and evaluation methods. Finally, it summarizes the field, pointing out the difficulties in current research and outlining prospects for future research directions.
The COVID-19 outbreak has caused widespread disruption, resulting in significant economic losses and casualties. CT scanning, as a fast and sensitive detection tool, plays an essential role in delineating infected regions. However, CT images of COVID-19 exhibit complex variations in infected regions, and expert annotation data are limited. In this research, a novel Dilated Convolution (DC) module incorporating a residual structure is introduced to systematically broaden the receptive field of the model. Following this, the Pyramid Pooling Module (PPM) and Multiscale Attention Module (MSA) are fused to create an advanced multilayer attention module. Finally, the proposed Enhanced Decoder Path (EDPath) module bridges the gap between the encoder and decoder, effectively addressing the loss of high-resolution information and the problem of vanishing gradients during decoder transmission. A large number of ablation and comparative experiments demonstrate that each of the proposed modules improves model performance, and the model outperforms more advanced models in terms of DICE, IoU, and SEN. This experimental study demonstrates that an improved segmentation network based on UNet and attention mechanisms achieves high accuracy in the COVID-19 lesion segmentation task.
Currently, graph-based Alzheimer's disease prediction methods suffer from sample imbalance and an insufficient number of samples. This can bias classifiers towards the majority-class samples and cause overfitting. To address this problem, a graph-based data augmentation node expansion algorithm is proposed. Firstly, graph representation learning is used to reduce the original feature vectors to a low-dimensional space. This message-aggregation step ensures that the low-dimensional vectors contain the latent structural information of the data, preventing the structural damage that may arise from expanding directly on the original data. Secondly, in the low-dimensional space, an adaptive-weight node expansion algorithm is employed to generate new nodes, overcoming the boundary-fuzziness issue of traditional oversampling algorithms. This weighted expansion algorithm adjusts the priority of each expansion node to control the position and number of the new nodes generated. Finally, the expanded graph is fed into a graph neural network classifier for prediction. Quantitative experiments on the Tadpole and NACC datasets demonstrate that the proposed graph-based data augmentation model achieves the highest accuracy: 93.84% vs 92.8% on the Tadpole dataset and 90.11% vs 88.29% on the NACC dataset. In addition, ablation experiments demonstrate the effectiveness of node expansion in graph structures.
In this paper, we propose and construct a new model for predicting cisplatin resistance in ovarian cancer, using a deep learning neural network based on multimodal data fusion. The multimodal data consist of ovarian ultrasound (US) images and color Doppler flow (CDF) images. Firstly, clinical multimodal data are collected and a U-Net network is trained to segment regions of interest in the images. Subsequently, the trained U-Net model is used to segment the images and obtain the regions of interest in the ultrasound and blood flow images. Finally, the segmented regions of interest are input into a diagnostic network for feature extraction. During feature extraction, a transformer module is used to interact and fuse the two modalities, and the features of both are comprehensively utilized to achieve visual prediction of cisplatin resistance in ovarian cancer. In the experiments, we compared this model with other network models and demonstrated that our proposed model effectively predicts drug resistance in ovarian cancer patients, maintaining approximately 81% accuracy.
Immunohistochemistry is an essential technique in the quantitative evaluation of tumors, significantly impacting the diagnosis, treatment and prognosis of esophageal cancer. This paper presents a novel method for segmenting immunohistochemical images of esophageal cancer cells, which improves upon the adaptive genetic algorithm and the Otsu algorithm to address issues of inconsistent staining and imprecise segmentation. The method introduces a newly designed threshold discriminant function that takes into account both within-class and between-class variances. Utilizing an enhanced adaptive genetic algorithm, the method efficiently determines the optimal threshold, while an upgraded crossover probability function prevents the algorithm from settling on local optima. The experimental results show that the algorithm can achieve a segmentation accuracy of 95.3%, surpassing the performance of the standard Otsu and genetic algorithms, especially in processing immunohistochemical images with uneven staining.
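For context, the baseline Otsu criterion that the adaptive genetic algorithm replaces with a guided search is an exhaustive maximization of the between-class variance, sketched below; the paper's modified discriminant, which mixes within-class and between-class variance, is not reproduced here.

    import numpy as np

    def otsu_threshold(gray_uint8):
        # Exhaustive Otsu search: pick the threshold that maximizes the
        # between-class variance of the grey-level histogram.
        hist = np.bincount(gray_uint8.ravel(), minlength=256).astype(float)
        p = hist / hist.sum()
        best_t, best_var = 0, -1.0
        for t in range(1, 256):
            w0, w1 = p[:t].sum(), p[t:].sum()
            if w0 == 0 or w1 == 0:
                continue
            mu0 = (np.arange(t) * p[:t]).sum() / w0
            mu1 = (np.arange(t, 256) * p[t:]).sum() / w1
            var_between = w0 * w1 * (mu0 - mu1) ** 2
            if var_between > best_var:
                best_t, best_var = t, var_between
        return best_t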
Perivascular space (PVS) segmentation algorithms based on deep learning have made steady progress, yet these methods are usually limited by the information loss on the thin tubular structure of PVSs during downsampling and the difficulty of capturing their varied morphologies. Addressing these challenges, we propose a global-local interactive multi-scale network (GLoMS-Net) based on a hybrid convolutional neural network and transformer architecture for automatic PVS segmentation. Specifically, global imaging features are extracted using a Transformer-based sub-network, and multi-level local features are extracted using the encoder of the TransUNet architecture. The global and multi-level local features then interact, and these interactive features are fed into the decoder of the TransUNet via skip connections to compensate for the information loss due to downsampling. Furthermore, we use a multi-scale feature learning module and a multi-scale fusion strategy to capture comprehensive contextual features of PVSs. Experimental results on 3T brain MR images from 56 participants with 168 slices demonstrate the superior performance of the proposed GLoMS-Net compared with several state-of-the-art methods for PVS segmentation.
Precise volumetric evaluation of the liver is crucial to mitigate the risk of postoperative liver failure following hepatectomy. However, existing liver resection volumetry calculation methods offer limited functionality, providing only liver and tumor volume, and simple calculation for the future liver remnant (FLR). To enhance understanding of liver resection volumetry, we introduce a flexible tool, able to integrate the resection plans with different underlying data (liver parenchyma, liver segments classification) and allow the user to interactively select and calculate the volume of chosen regions of interest (ROI) whether individually or in combination with other ROIs. This flexibility makes this tool scale to complex cases, for example, multiple resections in the same resection plan. Working alongside an experienced surgeon, we implemented two resection strategies and investigated various ROI volumes to see the difference between the two strategies. Through the experimented usage scenarios, we effectively showcase the tool’s proficiency in facilitating complex liver volumetry analysis for liver resection planning.
Hybrid transformer-based segmentation methods have proven to be highly effective in analyzing medical images. However, they often demand significant computational resources for training and inference, which can be challenging for resource-scarce medical settings. In response, a novel framework named Condensed UNETR is introduced, which strikes a balance between precision and efficiency by combining the strengths of convolutional neural networks and transformers. The Condensed UNETR Block, a key element of this approach, facilitates efficient information flow through a self-attention mechanism decomposition and streamlined representation merging. The framework also incorporates the throughput metric as a measure of efficiency to monitor the model's resource usage. Experiments have shown that Condensed UNETR surpasses leading models in accuracy, size, and efficiency on devices with limited resources.
Space-time adaptive processing (STAP) is an important radar and sonar technique used to suppress clutter and jamming. However, traditional constant false alarm rate (CFAR) cascade detection methods struggle to provide the explicit location and number of targets and jammers, while general-purpose data-driven object detectors usually consume a large number of floating-point operations (FLOPs) and parameters. To deal with this problem, we propose a dedicated data-driven object detector that predicts bounding boxes and class probabilities directly from STAP power images (STAPDet). This design embraces the characteristics of STAP to customize the detector architecture. Specifically, STAPDet first adopts an ultra-lightweight backbone to effectively recognize the clearly distinct STAP objects. Second, the detector enlarges the receptive field of the detection head to cover the limited scales of STAP objects instead of using a complicated neck to fuse multi-scale features. Last, STAPDet adopts a single detection head to predict the sparse STAP objects with better simplicity and fewer parameters. Experiments on real-world data demonstrate that STAPDet provides accurate location and number information of objects while greatly reducing computational complexity and parameter count compared with existing state-of-the-art counterparts. These results validate the effectiveness of our idea and suggest a new perspective for designing efficient dedicated detectors.
Remote sensing image change detection is an important task in the field of remote sensing image analysis, and it is widely used in urban planning, disaster detection, environmental protection and other fields. A U-Net++ based remote sensing image change detection network is proposed to address the issues of complex backgrounds, diverse types of changes, missed detections, and rough boundary recognition in high-resolution remote sensing images in change detection tasks. This algorithm uses U-Net++ as the backbone extraction network, and applies a Siamese neural network structure in its encoder to extract features from two different time images. In the convolutional part, the CBAM attention module and Mish activation function are fused to improve the network's feature extraction ability. In addition, the MSOF strategy is used to fuse the results of different levels of the U-Net++ network to output the final result map to improve the accuracy of the network.
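Of the components named above, the Mish activation has a simple closed form, mish(x) = x * tanh(softplus(x)); a one-function PyTorch sketch follows (PyTorch also ships this as torch.nn.Mish), while the CBAM module and MSOF strategy are not reproduced here.

    import torch
    import torch.nn.functional as F

    def mish(x):
        # Mish activation used in the convolutional blocks of the network:
        # mish(x) = x * tanh(softplus(x)).
        return x * torch.tanh(F.softplus(x))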
This study aims to use object detection techniques for the purpose of identifying and detecting the location of cows on a field. The motivation for this study is to identify any cows that exhibit abnormal behavior such as self-isolation from the herd or lack of movement, which may indicate sickness or injury. Early detection of such behavior is crucial for swift intervention mitigating the risk of further injury or fatality. The proposed solution to this problem is to use an automated drone that periodically flies over the field taking pictures of the cattle. These pictures can then be analyzed to determine the precise location of all cows within the image. To identify all cows within aerial images, two methods that work together are developed and used. The first method focuses on color image processing and aims to identify all cows within the image by locating the highest concentration of cow related pixels within a given area. The second method uses the bag of features machine learning model and categorizes image pixels into ‘cow’ and ‘grass’ subcategories. These methods complement each other as the first provides a general center location of cows in an image, aiding in training the machine learning model with correctly identified cow images. The bag of features model refines this by not only identifying cows but also approximating their general shape and size. Together, these techniques demonstrate promising results in efficient and accurate cow detection in aerial field images.
Human-Object Interaction (HOI) detection aims to locate and recognize HOI instances in images or videos. However, the significant cost of manpower and resources for annotation poses challenges, particularly for the long-tail or even zero-shot distribution problem. Additionally, generating visual features from fixed semantic information suffers from a lack of diversity. In response to these challenges, we develop a novel Visual-Semantic and Multi-source Feature Generation (VSMG) network for zero-shot human-object interaction detection. Firstly, by combining the visual-semantic GAN, the model not only generates visual/semantic features from the corresponding semantic/visual ones but also introduces diverse unseen visual features during the training phase. Secondly, by employing a knowledge-aware relation graph, the model encodes the relations between objects, actions, and interactions, covering both seen and unseen classes. Based on the relation weights in the relation graph, a dynamic multi-source feature generation strategy is performed to generate diverse virtual visual features for unseen classes. Finally, experimental results on the HICO-DET dataset validate the effectiveness of our proposed method, demonstrating improvements in the detection performance of the trained HOI detector.
Compared with traditional image target detection, the targets to be detected in remote sensing images exhibit much larger scale variation, and other objects in the complex background greatly interfere with detection accuracy. This makes it difficult for the model to capture crucial detailed information about the targets, thereby impacting detection accuracy. Therefore, this paper proposes an improved algorithm aimed at enhancing the model's perception of multi-scale targets and improving detection accuracy. Firstly, we propose the SPPCSPC-NCA module, which accelerates model inference while maintaining the same receptive field by arranging the maximum pooling layers and the Hardswish activation function in series. Secondly, to reduce feature loss, this paper uses a sub-pixel convolution upsampling module to better capture detail features. Finally, the ASFF module is used to achieve adaptive weighted fusion of features at different scales, further improving detection performance. Experimental results show that on the DIOR dataset the improved model gains 1.5% and 2.1% on mAP50 and mAP50:95 respectively, verifying the effectiveness of the proposed improvements.
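The sub-pixel convolution upsampling mentioned above is commonly implemented as a convolution that expands channels by r^2 followed by a pixel shuffle; a minimal PyTorch sketch is given below, with layer sizes chosen for illustration rather than taken from the paper.

    import torch.nn as nn

    class SubPixelUpsample(nn.Module):
        # Sub-pixel convolution: a convolution expands the channel dimension by
        # r*r, then PixelShuffle rearranges those channels into an r-times
        # larger feature map, avoiding the blur of plain interpolation.
        def __init__(self, in_ch, out_ch, scale=2):
            super().__init__()
            self.conv = nn.Conv2d(in_ch, out_ch * scale * scale, kernel_size=3, padding=1)
            self.shuffle = nn.PixelShuffle(scale)

        def forward(self, x):
            return self.shuffle(self.conv(x))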
Detection of small objects is fundamental for advanced driving assistance systems (ADAS) to implement avoidance measures. Nevertheless, detecting small objects on roads presents a significant challenge. To tackle this challenge, this paper proposes an enhanced algorithm for detecting road foreground small objects using YOLOv5s. First, the Polarized Self-Attention (PSA) module is introduced to boost the performance of the YOLOv5s model. This module allows the model to prioritize foreground object learning while reducing attention to the background, thus improving detection robustness in complex background conditions. Second, optimization of the loss function is conducted by utilizing the Area-Based IoU (ABIoU) Loss, which imposes constraints on object regression loss based on area ratios. This significantly improves the model's ability to learn image features and enhances the accuracy of object detection. The experimental findings reveal that the enhanced YOLOv5s model demonstrated a notable improvement over the original model, with an increase in mAP of 2.7% and 2.9% on the Lost and Found and MS COCO datasets, respectively. Additionally, the precision witnessed a corresponding enhancement of 3.5% and 3.2%. This advancement facilitates the more efficient and precise detection of road foreground small objects.
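The exact formulation of the ABIoU loss is not spelled out in the abstract, so the sketch below pairs a standard IoU computation with a hypothetical area-ratio weighting of the (1 - IoU) term; treat the abiou_loss weighting as an assumption, not the authors' definition.

    import torch

    def iou(box_a, box_b, eps=1e-7):
        # Boxes given as (x1, y1, x2, y2); plain intersection-over-union.
        x1 = torch.max(box_a[..., 0], box_b[..., 0])
        y1 = torch.max(box_a[..., 1], box_b[..., 1])
        x2 = torch.min(box_a[..., 2], box_b[..., 2])
        y2 = torch.min(box_a[..., 3], box_b[..., 3])
        inter = (x2 - x1).clamp(min=0) * (y2 - y1).clamp(min=0)
        area_a = (box_a[..., 2] - box_a[..., 0]) * (box_a[..., 3] - box_a[..., 1])
        area_b = (box_b[..., 2] - box_b[..., 0]) * (box_b[..., 3] - box_b[..., 1])
        return inter / (area_a + area_b - inter + eps)

    def abiou_loss(pred, target, eps=1e-7):
        # Hypothetical area-ratio weighting: small predicted/target area ratios
        # increase the penalty, emphasizing small-object regression.
        area_p = (pred[..., 2] - pred[..., 0]) * (pred[..., 3] - pred[..., 1])
        area_t = (target[..., 2] - target[..., 0]) * (target[..., 3] - target[..., 1])
        ratio = torch.min(area_p, area_t) / (torch.max(area_p, area_t) + eps)
        return (1.0 - iou(pred, target)) * (2.0 - ratio)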
Various applications such as urban monitoring, security, and autonomous systems rely heavily on object classification in video imagery. In this paper we present a backbone for an object detection model that uses ConvNeXt architectures with transfer learning, focusing specifically on vehicle classification. We adapt the ConvNeXtBase and ConvNeXtXLarge models and use the "Car Object Detection" dataset, which consists of numerous videos captured in different environmental settings, including varying traffic densities, weather changes, and light intensities. To improve vehicle classification, these adaptations incorporate custom convolutional and fully connected layers, and our transfer learning approach helps the models produce the distinctive features needed for accurate detection. Our models are systematically evaluated using standard performance metrics: ConvNeXtBase achieves 97.91% accuracy (97.82% validation accuracy), while ConvNeXtXLarge achieves 98.34% accuracy (98.11% validation accuracy). These results not only outperform numerous baseline models but also demonstrate that our models are effective in real-world scenarios. This study contributes to the development of intelligent transport systems and provides a solid foundation for future improvements in object classification via transfer learning methods applied to video surveillance tasks.
Change detection in remote sensing is an important technique used to monitor surface changes on Earth. However, due to the complexity of remote sensing images and noise interference, traditional change detection methods and some deep learning methods often suffer from incomplete change boundaries and missed or false detections. To address these issues, we propose a Prior Change Information Guided Network (PCIGNet). PCIGNet utilizes known change information as prior knowledge to guide the network in accurately predicting change regions during the detection process. First, we introduce a self-attention module called the Guided Fusion Module (GFM), which overcomes the limitations of the fixed receptive fields of traditional convolutions and effectively captures long-range information. It leverages the high-level semantic representations of deep features and the local detailed information of shallow features as prior knowledge to aid the progressive concatenation and fusion of multi-level features. In addition, we propose a Feature Aggregation Module (FAM) to aggregate features extracted by the backbone network, thereby enhancing the strength of the dual-temporal feature representation. The effectiveness and robustness of our proposed method have been validated on two widely used public datasets.
Lensless digital holographic microscopy holds significant importance in areas such as environmental monitoring and biological specimen analysis. To address the cumbersome procedures and suboptimal recovery results encountered when obtaining amplitude and phase information of objects in lensless imaging, in this paper we propose a purely physics-supervised Fourier neural operator network (PSF-Net) for holographic particle imaging. The network is trained solely on a few holographic particle images, with no ground truth. Once parameter optimization is complete, holographic reconstruction can be achieved without knowing the defocus distance. A lensless holographic system is set up to capture holograms of particle fields. The proposed network is employed for amplitude and phase reconstruction, and its performance is compared with other methods. The results demonstrate that our proposed method exhibits superior performance in terms of reconstruction quality, noise resistance, and twin-image elimination.
This paper proposes a novel screen-shooting-resilient document watermarking scheme, allowing watermark messages to be extracted from documents captured off screens. Through careful analysis of the distortions introduced by screen shooting, a specialized watermark embedding and extraction algorithm tailored to document image characteristics is developed. The scheme ensures precise detection of watermark regions despite the distortions inherent in the screen-shooting process. Additionally, stable Discrete Cosine Transform (DCT) coefficients within the document region are selected for embedding and extraction, enhancing the robustness of the algorithm. By focusing on the relative stability of the DCT coefficients, the algorithm gains resilience against potential distortions and interference, contributing to a more reliable watermarking process.
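The specific coefficient-selection rule is not disclosed, so the following sketch only illustrates the general idea of embedding one bit per 8x8 block by enforcing an ordering between two mid-frequency DCT coefficients; the chosen coefficient positions and embedding strength are assumptions.

    import numpy as np
    from scipy.fft import dctn, idctn

    def embed_bit(block8x8, bit, strength=4.0, c1=(2, 3), c2=(3, 2)):
        # Embed one message bit into an 8x8 luminance block by enforcing an
        # ordering between two mid-frequency DCT coefficients (a common DCT
        # watermarking scheme; not the paper's exact coefficient selection).
        coeffs = dctn(block8x8.astype(float), norm='ortho')
        a, b = coeffs[c1], coeffs[c2]
        if bit == 1 and a <= b:
            coeffs[c1], coeffs[c2] = b + strength, a
        elif bit == 0 and a >= b:
            coeffs[c1], coeffs[c2] = b, a + strength
        return idctn(coeffs, norm='ortho')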
The presence of moving objects in real-world scenarios can lead to mismatched feature points in visual odometry, thereby affecting the accuracy of positioning and mapping by the SLAM system and reducing its robustness in practical applications. This paper introduces a visual SLAM algorithm that leverages the ORB-SLAM3 framework and deep learning techniques. Enhancements to the SLAM system's tracking thread enable identifying and removing dynamic feature points, increasing its adaptability to dynamic environments. YOLOv8s, the variant with the smallest depth and feature-map width in its model family, is selected as the object detection network, with VanillaNet, a lightweight network, replacing its backbone. This combination effectively determines the mobility of objects within the environment. Consequently, we propose an enhanced algorithm based on YOLOv8s capable of performing both object detection and semantic segmentation to eliminate dynamic feature points precisely. Finally, the algorithm's accuracy and real-time performance were assessed using indoor dynamic scene data from the TUM RGB-D datasets. In comparison with models lacking any such strategy, test results on the TUM datasets show more favorable outcomes in dynamic environments.
In this paper, we develop a novel general covariance matrices based partial least squares (GCMPLS) method for learning latent features from high-dimensional datasets. GCMPLS uses Gaussian kernel mappings to map the input features into higher-dimensional spaces, thereby extracting the nonlinear relationship between two-view features. It uses a simple weighted fusion strategy to fuse the general covariance matrices and adopts an alternating iteration method to solve the resulting problem. The effectiveness of this method is validated through a series of experiments; the results indicate that GCMPLS outperforms traditional methods on multiple datasets.
To address the high latency of H.264/H.265 video transmission schemes, a design optimization scheme for low-latency H.264/H.265 transmission was studied based on an analysis of the video data flow and the encoding and decoding processes. Image acquisition and encoding time was optimized by reducing the number of read and write operations in the video cache pool, increasing the input image frame rate, and encoding with multiple encoders in parallel. Transmission, decoding, and display time was optimized by methods such as network transmission delay optimization, parallel decoding with multiple decoders, decoding and display delay optimization, decoding data path optimization, and display output optimization. The experimental results show that the optimal solution is to use VI (Video Input), VPSS (Video Processing Sub-System), and VENC (Video Encoder) binding in online mode at the encoding end, and VDEC (Video Decoder), VPSS, and VO (Video Output) binding in direct mode at the decoding end, with an end-to-end display delay of 2-4 frames, which greatly reduces the delay of H.264/H.265 video transmission.
The rapid growth in digital content has necessitated advanced retrieval systems capable of efficiently managing and searching through multimodal data. This paper introduces the Cascading Fusion Cross-attention Model for Visual Textual Retrieval, a novel approach designed to enhance the accuracy of retrieving textual descriptions corresponding to visual data and vice versa. The model employs a unique cascading fusion strategy that integrates multiple levels of feature representations from both modalities, ensuring a comprehensive understanding of the content. Simultaneously, a cross-attention mechanism dynamically aligns these features, focusing on the semantic correlations between the visual and textual modalities. Through extensive experiments and comparisons with existing state-of-the-art models, the proposed method demonstrates superior performance in terms of recall. The findings suggest that the Cascading Fusion Cross-attention Model holds great promise for advancing the field of multimodal retrieval.
Modern Image Processing and Engineering Applications
Multi-view SAR is an important observation mode of SAR that can obtain information from different views of a scene. Because the anisotropic scattering parameters of a target are not consistent across views, the same target shows different grayscale variations in SAR images acquired from different views, so fewer feature points can be extracted. In scenes with complex height variations, such as mountainous areas, the height-induced offsets differ between views, causing severe local distortion in multi-view SAR images, and feature matching is therefore prone to a large number of incorrect matches. It is thus crucial to preserve correct correspondences and eliminate incorrect matches in multi-view SAR images. Combining the principles of multi-view SAR imaging, this paper proposes using the nearest-to-second-nearest neighbor ratio algorithm and the epipolar constraint for feature point matching of multi-view SAR images. First, based on the different scattering coefficients of the same target contained in the multi-view SAR images, features are extracted and the nearest-to-second-nearest neighbor ratio algorithm is used for initial coarse matching. Then, to increase matching accuracy, the epipolar constraint of the multi-view SAR images is introduced for feature matching. Experiments show that the proposed algorithm retains more correct feature points than the nearest-to-second-nearest neighbor ratio matching algorithm and the MSAC algorithm for multi-view SAR image feature matching.
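The two matching stages described above can be sketched with standard OpenCV primitives as follows; SIFT features, the 0.8 ratio, and the 1.5-pixel epipolar threshold are assumptions of this illustration, and the paper's SAR-specific feature extraction is not reproduced.

```python
import cv2
import numpy as np

def match_with_ratio_and_epipolar(img1, img2, ratio=0.8, epi_thresh=1.5):
    sift = cv2.SIFT_create()
    kp1, des1 = sift.detectAndCompute(img1, None)
    kp2, des2 = sift.detectAndCompute(img2, None)

    # Stage 1: coarse matching with the nearest / second-nearest distance ratio.
    matcher = cv2.BFMatcher(cv2.NORM_L2)
    coarse = [m for m, n in matcher.knnMatch(des1, des2, k=2)
              if m.distance < ratio * n.distance]

    pts1 = np.float32([kp1[m.queryIdx].pt for m in coarse])
    pts2 = np.float32([kp2[m.trainIdx].pt for m in coarse])

    # Stage 2: estimate the fundamental matrix robustly and keep matches whose
    # points lie close to the corresponding epipolar lines.
    F, mask = cv2.findFundamentalMat(pts1, pts2, cv2.FM_RANSAC, epi_thresh, 0.99)
    lines2 = cv2.computeCorrespondEpilines(pts1.reshape(-1, 1, 2), 1, F).reshape(-1, 3)
    dist = np.abs(np.sum(lines2[:, :2] * pts2, axis=1) + lines2[:, 2])
    dist /= np.linalg.norm(lines2[:, :2], axis=1)
    keep = (mask.ravel() == 1) & (dist < epi_thresh)
    return [m for m, k in zip(coarse, keep) if k]
```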
Mining the change characteristics of ship traffic from AIS data helps in understanding the shipping industry and shifts in economic policy, and provides a reference for the historical events of the corresponding period. This paper takes the number of coastal ships along the east coast of the United States as its research object. The AIS data set contains about 687 GB of trajectory data of various kinds, totaling roughly 216 million AIS trajectory points. Based on a stationarity test of the number of ships along the east coast of the United States, the paper performs a time series analysis of five ship types across three time nodes. The results show that the number of cargo ships varies the most strongly, while the counts of the other ship types behave as purely random, non-stationary series with no discernible pattern. The outbreak of COVID-19 affected most ship types along the east coast of the United States and produced traffic troughs of varying severity; after herd immunization, the numbers of cargo ships and of most other types except the 'other' category increased again and soon reached a peak. The outbreak of the Russia-Ukraine war drove the number of tankers to a further peak, which is inconsistent with the impression that European and American sanctions on Russia would reduce oil imports.
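A stationarity check of the kind described above can be sketched with an augmented Dickey-Fuller test; the file name, column names, and daily resampling below are assumptions for illustration.

```python
import pandas as pd
from statsmodels.tsa.stattools import adfuller

# Hypothetical daily ship-count table with columns: date, ship_type, ship_count.
ais = pd.read_csv("east_coast_ais_counts.csv", parse_dates=["date"])
for ship_type, grp in ais.groupby("ship_type"):
    series = grp.set_index("date")["ship_count"].asfreq("D").dropna()
    adf_stat, p_value, *_ = adfuller(series)
    verdict = "stationary" if p_value < 0.05 else "non-stationary"
    print(f"{ship_type}: ADF={adf_stat:.2f}, p={p_value:.3f} -> {verdict}")
```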
Incremental learning is crucial for handling classification tasks involving continuously increasing new categories. Traditional models often grapple with challenges such as data forgetting and high storage requirements. An approach grounded in one-class envelope techniques effectively mitigates these concerns. However, this method introduces two new challenges: swiftly determining the class label of a new sample, and effectively distinguishing mixed samples online before training new sub-classifiers. To address these issues, this paper introduces an efficient incremental learning strategy that leverages one-class envelopes, where a dedicated one-class classifier is built for each category. A novel hybrid-distance-based search algorithm for the target category is introduced to enhance prediction efficiency during the testing phase. Moreover, for online mixed samples from multiple unknown categories, the K-means algorithm is employed to cluster all outlier samples. Silhouette coefficients are then calculated to identify well-defined clusters for subsequent classifier training. Experimental results show that the proposed approach reduces prediction time by 90% on the Cifar100-30 dataset, and the incremental process for unlabeled mixed samples on the MNIST dataset achieves a performance improvement of up to 55.3%.
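The clustering-and-filtering step for online mixed samples might look roughly like the sketch below, where outliers rejected by all existing one-class classifiers are clustered with K-means and only clusters with a high mean silhouette value receive a new one-class envelope; the cluster count, silhouette threshold, and OneClassSVM settings are illustrative assumptions.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_samples
from sklearn.svm import OneClassSVM

def grow_classifiers(outliers, n_clusters=3, sil_thresh=0.3):
    # Cluster the accumulated outlier samples.
    labels = KMeans(n_clusters=n_clusters, n_init=10, random_state=0).fit_predict(outliers)
    sil = silhouette_samples(outliers, labels)
    new_classifiers = []
    for c in range(n_clusters):
        members = outliers[labels == c]
        # Only clusters whose members are, on average, well separated from the
        # other clusters get a dedicated one-class envelope.
        if sil[labels == c].mean() >= sil_thresh:
            new_classifiers.append(OneClassSVM(nu=0.1, gamma="scale").fit(members))
    return new_classifiers
```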
With the booming development of deep learning and image generation technology, research on generating face images from sketches has achieved remarkable results; however, deficiencies remain in scenarios that require high face-image fidelity, where existing methods cannot generate images that are semantically and geometrically consistent with the input sketches. A semantically controllable method for generating face images from sketches is proposed, in which dedicated modules extract the sketch semantics, merge them with the text semantics into a more expressive representation, and feed this representation into the generator together with the sketches to achieve semantic and geometric alignment. The proposed method is experimentally validated on open-source and self-built datasets, and the experimental results show that it effectively improves the quality of the generated images.
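One plausible reading of the semantic-merging step is a gated fusion of the sketch-derived and text-derived embeddings into a single conditioning vector, as in the hypothetical sketch below; the dimensions and gating rule are assumptions, not the paper's actual modules.

```python
import torch
import torch.nn as nn

class SemanticFusion(nn.Module):
    def __init__(self, dim=256):
        super().__init__()
        self.gate = nn.Sequential(nn.Linear(2 * dim, dim), nn.Sigmoid())

    def forward(self, sketch_sem, text_sem):
        # Learn, element-wise, how much of each semantic source to keep.
        g = self.gate(torch.cat([sketch_sem, text_sem], dim=-1))
        return g * sketch_sem + (1 - g) * text_sem

fusion = SemanticFusion()
cond = fusion(torch.randn(2, 256), torch.randn(2, 256))  # conditioning vector for the generator
```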
Efficient feature selection is essential for processing hyperspectral images due to their high dimensionality and computational complexity. This paper proposes a band selection method based on a multi-agent system, aiming to reduce dimensionality, decrease computational expenses, and enhance data processing efficiency. The method decomposes the optimization problem into exploration and exploitation tasks, promoting collaboration among multiple agents to search for global optimal solutions in the feature subset space. Additionally, adaptive search boundary design is employed to adjust the search strategy, enhancing search efficiency. Experimental results demonstrate the superiority of the proposed method in band selection tasks, showing significant advantages in search efficiency and stability compared to traditional methods. These findings highlight the potential of the proposed method for practical applications.
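The exploration/exploitation decomposition can be illustrated with a toy multi-agent band search, where explorer agents sample random band subsets and exploiter agents perturb the best subset found so far; the fitness function, agent counts, and iteration budget below are placeholders, and the paper's adaptive search-boundary design is not reproduced.

```python
import numpy as np

rng = np.random.default_rng(0)

def fitness(cube, labels, bands):
    # Hypothetical score: ratio of between-class to within-class scatter on the
    # selected bands (a stand-in for the paper's real objective).
    X = cube[:, bands]
    overall = X.mean(axis=0)
    between = within = 0.0
    for c in np.unique(labels):
        Xc = X[labels == c]
        between += len(Xc) * np.sum((Xc.mean(axis=0) - overall) ** 2)
        within += np.sum((Xc - Xc.mean(axis=0)) ** 2)
    return between / (within + 1e-12)

def multi_agent_select(cube, labels, k=10, n_explore=6, n_exploit=4, n_iter=50):
    n_bands = cube.shape[1]
    best = rng.choice(n_bands, size=k, replace=False)
    best_fit = fitness(cube, labels, best)
    for _ in range(n_iter):
        # Explorer agents: fresh random band subsets.
        candidates = [rng.choice(n_bands, size=k, replace=False) for _ in range(n_explore)]
        # Exploiter agents: swap one band in the current best subset.
        for _ in range(n_exploit):
            cand = best.copy()
            new_band = rng.integers(n_bands)
            if new_band not in cand:
                cand[rng.integers(k)] = new_band
            candidates.append(cand)
        for cand in candidates:
            f = fitness(cube, labels, cand)
            if f > best_fit:
                best, best_fit = cand, f
    return np.sort(best)
```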
Grade estimation of fresh corn with bracts is currently done mainly by hand, through visual volume estimation, hand squeezing, and husking. To reduce the large variation in subjective manual judgment and the low sorting efficiency, this study proposes an image-feature-fusion-based model for grading fresh corn with bracts. Weight, external dimensions, and the cob center bone line were extracted with weight-sensing and machine-vision technology, respectively. The data were analyzed statistically for distribution pattern, correlation, and clustering. A cob grade estimation model based on the support vector machine (SVM) algorithm was constructed to discriminate the three classes, and different feature clusters were validated. The results show that all eight categories of extracted data follow a normal distribution and conform to the expected statistical pattern. The parameters of fresh corn with bracts are significantly correlated with the corresponding parameters of the fresh corn cobs, and correlations between parameters of a similar nature are higher than the others. The correlation between length of fresh corn with bracts (LFCB) and length of fresh corn cobs (LFCC) is 0.542, that between diameter of fresh corn with bracts (DFCB) and diameter of fresh corn cobs (DFCC) is 0.7, and that between the effective length of the fresh corn cob (LECC) and LFCB is 0.473; all other correlations exceed 0.2. The highest classification accuracy is achieved with weight of fresh corn with bracts (WFCB), DFCB, and length of inflection point (LIP) as inputs, and the results of a cluster analysis of LFCC, DFCC, and weight of fresh corn cob (WFCC) as outputs; the modeling accuracy is 91%, and cluster-analysis modeling that incorporates weight information outperforms the other clusters. The proposed method improves the efficiency of cob grade estimation for fresh corn with bracts and provides a reference for the automated sorting of fresh corn.
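An illustrative version of the grading pipeline, with grade labels obtained by clustering the cob parameters and an SVM trained on the bract-on measurements, might look like the sketch below; the file and column names, kernel, and hyperparameters are assumptions.

```python
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Hypothetical measurement table with one row per ear of corn.
df = pd.read_csv("fresh_corn_measurements.csv")
X = df[["WFCB", "DFCB", "LIP"]]
# Grade labels derived by clustering the cob parameters into three classes.
y = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(df[["LFCC", "DFCC", "WFCC"]])

model = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=10.0))
scores = cross_val_score(model, X, y, cv=5)
print(f"mean CV accuracy: {scores.mean():.3f}")
```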