Proceedings Volume Fourth International Conference on Signal Processing and Machine Learning (CONF-SPML 2024), 1307701 (2024) https://doi.org/10.1117/12.3027482
This PDF file contains the front matter associated with SPIE Proceedings Volume 13077, including the Title Page, Copyright information, Table of Contents, and Conference Committee information
Fourth International Conference on Signal Processing and Machine Learning (CONF-SPML 2024)
D. Chandana, M. Tushara, A. Ramya Sri, Sridevi Sakhamuri, Laith Abualigah
Proceedings Volume Fourth International Conference on Signal Processing and Machine Learning (CONF-SPML 2024), 1307702 (2024) https://doi.org/10.1117/12.3027104
Everyone lives with some form of limitation, and we educate ourselves and invent technology to overcome it. Sign language is a means of communication for deaf and mute people based on hand gestures and actions. This deep learning paper aims to build a communication bridge between people who do not know sign language and those who use it. We used the Amazon Rekognition service, which uses a deep CNN algorithm, to detect signs in static images. Because most signs stand for words, they take the form of videos, so we used the I3D algorithm to classify videos of word signs. The PyTorch framework supports cuDNN (the NVIDIA CUDA Deep Neural Network library), which provides fast GPU implementations for deep neural networks. The experimental results show that the models perform well in detecting the words.
Proceedings Volume Fourth International Conference on Signal Processing and Machine Learning (CONF-SPML 2024), 1307703 (2024) https://doi.org/10.1117/12.3027105
The methylation status of the O6-methylguanine-deoxyribonucleic acid (DNA) methyltransferase gene promoter has been found to be a strong indicator for brain tumors. The value of this indicator suggests the stage and severity of the tumor. To evaluate the stage of cancer a patient is in, an ensemble model is proposed. This study combined 2D and 3D DenseNet, ResNet, and EfficientNet to create an ensemble model and achieved a best result of 0.87 accuracy, 0.90 precision, 0.86 area under the receiver operating characteristic curve (AUC), and 0.80 recall. The efficiency of artificial intelligence (AI) diagnosis and its relatively high accuracy help radiologists confirm their evaluations and make patients safer by double-checking the radiologist's diagnosis. Brain tumor segmentation allows for precise detection of malignant tumors on radiological brain scans. Conventionally, this process has been carried out by trained radiologists and requires a significant amount of time and effort. To assist the specialist, this study proposes segmenting cerebral magnetic resonance imaging (MRI) scans with machine learning. Using the U-Net architecture, this study achieved a Dice score of 0.99 ± 0.00013, an accuracy of 0.99 ± 0.000070, a recall of 0.99 ± 0.00014, and an F1 score of 0.99 ± 0.00013. Comparing the segmentation ground truth with the model output shows that the output mappings are largely in line with the ground truth in all categories it contains. With one graphics processing unit (GPU), it takes less than 10 seconds to build the model, read an input, and generate a segmented result. This efficiency may help radiologists diagnose tumors more easily and at lower cost, which ultimately saves time and patients' lives.
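The Dice score reported above measures the overlap between a predicted segmentation mask and the ground truth. The following is a minimal sketch of that metric for flat binary masks, not the authors' evaluation code; the function name, the toy masks, and the convention for two empty masks are assumptions made for illustration.

```python
def dice_score(pred, truth):
    """Dice coefficient 2|A∩B| / (|A| + |B|) for two flat binary masks."""
    if len(pred) != len(truth):
        raise ValueError("masks must have the same number of pixels")
    # Pixels marked as foreground in both masks.
    intersection = sum(1 for p, t in zip(pred, truth) if p and t)
    total = sum(1 for p in pred if p) + sum(1 for t in truth if t)
    # Assumed convention: two empty masks count as a perfect match.
    return 1.0 if total == 0 else 2.0 * intersection / total


# Hypothetical 6-pixel masks: 2 shared foreground pixels, 3 + 2 foreground in total.
predicted = [0, 1, 1, 1, 0, 0]
reference = [0, 1, 1, 0, 0, 0]
print(dice_score(predicted, reference))  # 2 * 2 / 5 = 0.8
```

For binary masks the Dice score coincides with the F1 score of the foreground class, which is consistent with the two figures in the abstract being identical.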
Proceedings Volume Fourth International Conference on Signal Processing and Machine Learning (CONF-SPML 2024), 1307704 (2024) https://doi.org/10.1117/12.3027110
This paper explores the application and challenges of Multiple Input, Multiple Output (MIMO) systems in the near-field range of wireless communication. As electromagnetic wavelengths shorten, wireless communication gradually shifts from far-field to near-field operation. In the near field, the spherical-wave property of electromagnetic waves dominates, rather than the plane-wave property assumed in far-field communication. It is therefore necessary to reevaluate the existing far-field theories. The study investigates the use of MIMO to enhance channel capacity in the near-field range, considering the effective transmission distance of electromagnetic waves and the transition from plane to spherical waves. The complexities of using millimeter waves and the potential of other frequency bands are examined. An in-depth analysis is provided of the practical implementation of near-field MIMO systems, including their applicability to current 4G and emerging 5G technologies. The manuscript offers a comprehensive understanding of this approach to improving information transmission in wireless communication, paving the way for future research and development.
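To see why shorter wavelengths push links into the near field, one commonly used boundary is the Rayleigh (Fraunhofer) distance, 2D²/λ, where D is the antenna aperture. The abstract does not say which criterion the paper adopts, so the sketch below is only an illustration under that assumption; the aperture and the two carrier frequencies are hypothetical values.

```python
SPEED_OF_LIGHT = 299_792_458.0  # metres per second


def rayleigh_distance(aperture_m, frequency_hz):
    """Approximate near-field/far-field boundary 2 * D^2 / wavelength, in metres."""
    wavelength = SPEED_OF_LIGHT / frequency_hz
    return 2.0 * aperture_m ** 2 / wavelength


# Hypothetical 0.5 m array at a sub-6 GHz carrier and at a millimeter-wave carrier.
for frequency in (3.5e9, 28e9):
    print(f"{frequency / 1e9:.1f} GHz: {rayleigh_distance(0.5, frequency):.1f} m")
```

For a fixed aperture the boundary grows linearly with frequency, so moving from 3.5 GHz to 28 GHz extends the near-field region by a factor of eight.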
Proceedings Volume Fourth International Conference on Signal Processing and Machine Learning (CONF-SPML 2024), 1307705 (2024) https://doi.org/10.1117/12.3027111
Blockchain stands as a paramount research focus in the current technological landscape. While significant progress has been observed in digital currencies such as Bitcoin and Ethereum, a notable research void persists concerning voting systems. This manuscript underscores the merits of blockchain-based voting mechanisms. Embracing blockchain technology can bolster the authenticity and transparency of electoral outcomes, tackle the intricacies of identity verification, and thwart malicious incursions or unauthorized alterations of results. Utilizing the Solidity programming language, a smart contract is crafted to orchestrate a fundamental voting paradigm. The article delineates the logical architecture underpinning this smart contract. A synthesis of existing literature discerns the quintessential functions vital for a robust voting system. In a detailed exposition, the testing procedures for various functionalities of this smart contract are unveiled. To conclude, the discourse suggests a compendium of supplementary features that, when integrated, could enhance the practicality and efficacy of the smart contract in future iterations.
Proceedings Volume Fourth International Conference on Signal Processing and Machine Learning (CONF-SPML 2024), 1307706 (2024) https://doi.org/10.1117/12.3027113
In recent years, the rapid advancement of 5G technology has brought to the forefront the pivotal role of Multiple-Input Multiple-Output (MIMO) system algorithms. This paper delves into a comprehensive exploration of two distinct algorithmic approaches within the context of 5G applications for massive MIMO systems. These two approaches are matrix transformation and machine learning, and the following paragraphs will shed light on their respective attributes and intricacies. Matrix transformation is a fundamental technique in MIMO systems, which aims to optimize the transmission of signals by manipulating the channel matrices. This method, while established and reliable, exhibits certain limitations in accommodating the dynamic and complex nature of 5G environments. On the other hand, machine learning algorithms, with their adaptability and capacity for self-improvement, have gained prominence in recent years. They offer a promising avenue for addressing the challenges presented by 5G MIMO systems, such as handling interference and optimizing resource allocation. In this paper, we provide concrete examples to analyze the strengths and weaknesses of both matrix transformation and machine learning in the context of 5G applications. Furthermore, we explore potential directions for the application of these algorithms and propose areas for improvement, with the ultimate goal of enhancing the efficiency and performance of massive MIMO systems in the evolving landscape of 5G technology.
Proceedings Volume Fourth International Conference on Signal Processing and Machine Learning (CONF-SPML 2024), 1307707 (2024) https://doi.org/10.1117/12.3027114
The rapid expansion of the Internet and the utilization of big data have significantly contributed to a transformative shift in the tourism industry. As online travel reviews become more abundant, they provide insights into sentiments and attitudes related to travel experiences. This paper mainly concentrates on sentiment analysis of travel reviews utilizing deep learning methods and Transformer models. In particular, we explore the benefits of deep learning, specifically the Bi-LSTM, BERT, and ERNIE models. Rigorous comparative experiments on a database comprising 6,000 travel reviews from Henan Province, China are conducted. Experimental results demonstrate the advantage of the ERNIE model, which incorporates knowledge integration and diverse training tasks. The ERNIE model achieves a prominent enhancement in accuracy, recall and F1 score compared to the previous models. The findings underscore the efficacy of pre-trained language models in sentiment analysis tasks and their capacity to comprehend context and semantic nuances, leading to enhanced performance in sentiment classification.
Proceedings Volume Fourth International Conference on Signal Processing and Machine Learning (CONF-SPML 2024), 1307708 (2024) https://doi.org/10.1117/12.3027116
The escalating advancement of generative AI models amplifies the need for capable data valuation techniques. Among the many available methodologies, Shapley value estimation techniques such as Data Shapley have garnered attention for their data valuation capabilities, despite the computational challenges they face on large datasets. This paper introduces an empirically driven batch method that aims to speed up data valuation while preserving precision. The method optimizes training batch sizes and testing subsets to balance computational efficiency against valuation accuracy, a critical step forward given the substantial volume of data processed in contemporary machine learning tasks. A thorough evaluation of different Shapley value estimation techniques is conducted, underscoring TMC-Shapley for its notable efficacy. Furthermore, the study examines the model-agnostic nature of Shapley value estimations by using diverse machine learning models across distinct training phases. This practice demonstrates the versatility of Shapley value methods and highlights their adaptability and generalizability across varied model architectures. The approach and findings presented here serve as a foundation for future exploration and optimization of data valuation, paving the way for more nuanced and efficient methodologies.
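As a rough illustration of the truncated Monte Carlo idea behind TMC-Shapley, the sketch below averages each data point's marginal contribution over random permutations and stops evaluating a permutation once the running score is within a tolerance of the full-data score. It does not reproduce the paper's batch method; the toy utility (fraction of three class labels covered), the parameter names, and the default values are all assumptions.

```python
import random


def tmc_shapley(points, utility, n_permutations=200, tolerance=1e-3, seed=0):
    """Estimate one Shapley value per data point by truncated permutation sampling."""
    rng = random.Random(seed)
    n = len(points)
    full_score = utility(points)
    values = [0.0] * n
    for _ in range(n_permutations):
        order = list(range(n))
        rng.shuffle(order)
        subset = []
        previous = utility(subset)
        for idx in order:
            if abs(full_score - previous) < tolerance:
                # Truncation: the remaining points get zero marginal gain.
                marginal = 0.0
            else:
                subset.append(points[idx])
                current = utility(subset)
                marginal = current - previous
                previous = current
            values[idx] += marginal / n_permutations
    return values


# Hypothetical utility: fraction of the classes {0, 1, 2} present in the subset.
def coverage(labels):
    return len(set(labels) & {0, 1, 2}) / 3.0


labels = [0, 0, 1, 2]  # two redundant points of class 0, one each of classes 1 and 2
values = tmc_shapley(labels, coverage)
print(values)
```

In this toy setting the two redundant class-0 points share the credit for covering that class, so each receives a lower value than the unique class-1 and class-2 points.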
Proceedings Volume Fourth International Conference on Signal Processing and Machine Learning (CONF-SPML 2024), 1307709 (2024) https://doi.org/10.1117/12.3027117
Adversarial attacks and defenses are central in deep learning, with various attack methods and defense strategies, including adversarial training, proposed over the years. However, limited research has examined the differences in robustness across models of different sizes. This study seeks to explore these robustness variations through the application of multiple attack methods and attention visualization techniques on four prominent models: VGG16, ResNet18, GoogleNet, and Vision Transformers, employing four popular adversarial attack methods—Fast Gradient Sign Method (FGSM), Basic Iterative Method (BIM), Projected Gradient Descent (PGD), and Carlini-Wagner (CW). Plain adversarial training was used as a defense mechanism. By comparing the resulting changes and discrepancies in correctness, a notable decrease is observed in the robustness of larger models compared to smaller ones after applying this defense strategy. This phenomenon is likely associated with the distinct feature extraction approaches employed by the larger model and its reduced training efficiency. From a practical standpoint, it is advisable to prioritize the use of smaller models in real-world applications. Additionally, techniques like knowledge distillation can be considered to enhance the correctness of smaller models while minimizing computational resource requirements.
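Of the attacks listed above, FGSM is the simplest: it perturbs the input by a fixed step in the direction of the sign of the loss gradient. The sketch below applies one FGSM step to a tiny logistic-regression classifier rather than to the deep networks studied in the paper; the weights, input, and step size are hypothetical, and the closed-form gradient holds only for this toy model.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict(x, weights, bias):
    """Probability of class 1 under a logistic-regression model."""
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)


def fgsm(x, y, weights, bias, epsilon):
    """One FGSM step: x_adv = x + epsilon * sign(d loss / d x), with label y in {0, 1}."""
    p = predict(x, weights, bias)
    # For cross-entropy loss on this model, the input gradient is (p - y) * w.
    grad = [(p - y) * w for w in weights]

    def sign(g):
        return (g > 0) - (g < 0)

    return [xi + epsilon * sign(g) for xi, g in zip(x, grad)]


# Hypothetical model and input; the true label is 1.
weights, bias = [2.0, -1.0], 0.0
x = [1.0, 0.5]
x_adv = fgsm(x, 1, weights, bias, epsilon=1.0)
print(predict(x, weights, bias), predict(x_adv, weights, bias))
```

BIM and PGD can be viewed as repeating this step with a smaller step size while keeping the perturbed input inside an epsilon-ball around the original.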
Proceedings Volume Fourth International Conference on Signal Processing and Machine Learning (CONF-SPML 2024), 130770A (2024) https://doi.org/10.1117/12.3027119
Detection and obstacle avoidance for autonomous intelligent machines have become a trend in contemporary research. For an intelligent machine to explore an unknown environment and avoid obstacles efficiently, it needs a suitable, simple algorithm that applies to most cases. The proposed algorithm is inspired by the principle of plant phototropism and by the Braitenberg vehicle: obstacles are likened to light sources. A detector measures the distance between the obstacles and the machine and is connected directly to the motor. The motor is regulated by positive and negative signals from the detector, which play a role similar to that of auxin, thus realizing detection and obstacle avoidance. The experimental results show that an algorithm based on this design can effectively explore an unknown environment and achieve efficient obstacle avoidance.
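The direct sensor-to-motor coupling described above can be sketched as follows. This is a guess at the general idea, not the authors' controller: it assumes a differential-drive machine with one distance sensor per side, uses only excitatory (positive) coupling from each sensor to the motor on the same side, and all names and constants are hypothetical.

```python
def motor_speeds(left_distance, right_distance,
                 base_speed=0.5, gain=0.5, max_range=2.0):
    """Map two distance readings (metres) directly to left and right wheel speeds."""
    def proximity(distance):
        # 0.0 when nothing is within range, 1.0 when the obstacle touches the sensor.
        clamped = min(max(distance, 0.0), max_range)
        return 1.0 - clamped / max_range

    # Same-side excitation: a near obstacle on the left speeds up the left wheel,
    # which steers a differential-drive machine to the right, away from the obstacle.
    left_motor = base_speed + gain * proximity(left_distance)
    right_motor = base_speed + gain * proximity(right_distance)
    return left_motor, right_motor


print(motor_speeds(0.2, 2.0))  # obstacle close on the left
print(motor_speeds(2.0, 2.0))  # nothing in range: both wheels at base speed
```

No planning or map is involved; the steering behaviour emerges from the wiring alone, which is the appeal of the Braitenberg-style design.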
Proceedings Volume Fourth International Conference on Signal Processing and Machine Learning (CONF-SPML 2024), 130770B (2024) https://doi.org/10.1117/12.3027120
Parking in a parking lot is a challenging task for most drivers because it requires precise maneuvers within limited space and with restricted visibility. Automatic parking technology uses advanced sensors, robotics, and artificial intelligence to make parking more convenient for drivers. Various studies have explored different control methods for autonomous parking systems. This paper investigates the advantages of control systems based on Model Predictive Control (MPC) and Reinforcement Learning (RL) control. It also explores how the MPC and RL control methods operate during the parking process, such as the equations they use to modify the feedback. The control system was trained using the Proximal Policy Optimization (PPO) algorithm. After three rounds of training, a notable improvement in the parking success rate is observed. The paper also explores how the PPO algorithm can be optimized by modifying the reward function. The results indicate that a larger sample size is needed to draw conclusive findings.
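For readers unfamiliar with PPO, its core is a clipped surrogate objective that limits how far a single update can move the policy. The sketch below shows that objective for one sample; it is a generic illustration, not the training code used in the paper, and the clipping value 0.2 is a common default assumed here.

```python
def ppo_clipped_objective(ratio, advantage, clip_epsilon=0.2):
    """Per-sample PPO surrogate: min(r * A, clip(r, 1 - eps, 1 + eps) * A).

    ratio is new_policy_prob / old_policy_prob for the action taken,
    advantage is the estimated advantage of that action.
    """
    clipped_ratio = max(1.0 - clip_epsilon, min(ratio, 1.0 + clip_epsilon))
    return min(ratio * advantage, clipped_ratio * advantage)


# A good action (positive advantage) whose probability rose by 50%:
# the gain is capped at 1.2 * advantage.
print(ppo_clipped_objective(1.5, 1.0))
# A bad action (negative advantage) whose probability halved:
# the objective is capped at 0.8 * advantage, so there is no extra incentive to push further.
print(ppo_clipped_objective(0.5, -1.0))
```

The reward function mentioned in the abstract enters through the advantage estimate, so changing the reward changes which actions this objective reinforces.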
Proceedings Volume Fourth International Conference on Signal Processing and Machine Learning (CONF-SPML 2024), 130770C (2024) https://doi.org/10.1117/12.3027121
The Cycle-Consistent Adversarial Network (CycleGAN) has been pivotal for image style transfer within the realm of computer vision. However, CycleGAN's output often lacks fine-grained detail when generating realistic images. This paper therefore proposes a realistic image generation approach based on an integrated CycleGAN-Diffusion network to achieve higher image quality with a comparatively small model. To evaluate the Diffusion model's ability to produce high-fidelity images under resource constraints and to compare its image quality with datasets processed using CycleGAN, we apply irregular masks with Gaussian noise during the inpainting and restoration phases. To assess sample fidelity, we use Mean Squared Error (MSE), Inception Score (IS), and Fréchet Inception Distance (FID). Extensive experiments show that the proposed network generates higher-fidelity images, achieving an FID score of 5.98 and an IS score of 8.34 at 160×160 resolution in the restoration process and an IS score of 7.69 in the inpainting process, both outperforming the previous CycleGAN.
Proceedings Volume Fourth International Conference on Signal Processing and Machine Learning (CONF-SPML 2024), 130770D (2024) https://doi.org/10.1117/12.3027122
As the importance of human-computer interaction (HCI) continues to strengthen and the field of deep learning evolves, numerous models have found their application in the realm of Speech Emotion Recognition (SER), leading to significant advancements in recent years. However, effectively recognizing and processing human emotions through computational systems remains a complex and formidable challenge. This review aims to provide a comprehensive summary of the latest accomplishments in SER, encompassing a diverse range of application scenarios, from education and healthcare to criminal investigation. Additionally, it delves into various models and preprocessing techniques such as Convolutional Neural Networks (CNN), Convolutional Recurrent Neural Networks (CRNN), Long Short-Term Memory (LSTM), and datasets like RAVDESS and RECOLA, which encompass a wide array of scenes and languages. While the recent strides in SER have undeniably achieved impressive accuracy rates, a notable gap exists in research that addresses more intricate emotional contexts, including situations involving irony or sarcasm. Consequently, this review focuses on a comprehensive analysis of the limitations inherent in different feature engineering strategies. Moreover, it investigates the challenge of interpretability posed by complex models, the constraint posed by singular and hard-to-gather datasets, and the expansive scope of potential applications SER could serve. Considering these complexities, a potential pathway to further enhance SER's effectiveness and applicability is proposed. This involves exploring the concept of non-binary emotion classification, harnessing rich contextual information, and integrating datasets that incorporate gesture and textual data. By adapting feature extraction techniques to align with the unique demands of specific scenarios, the performance of SER models could be markedly improved.
Siting Luo, Xianghui Meng, Xinran Niu, Hanyue Kong
Proceedings Volume Fourth International Conference on Signal Processing and Machine Learning (CONF-SPML 2024), 130770E (2024) https://doi.org/10.1117/12.3027125
Attention-Deficit/Hyperactivity Disorder (ADHD) is a prevalent neurodevelopmental disorder, necessitating accurate diagnostic methods. Our research introduces a deep learning approach using Convolutional Neural Network (CNN) integrated with Bidirectional Long Short-Term Memory (BiLSTM) networks to analyse resting-state functional Magnetic Resonance Imaging (fMRI). This novel method captures intricate spatiotemporal patterns in brain activity, offering insights into ADHD characteristics that surpass traditional diagnostic techniques. Employing the ADHD-200 Sample, our study presents a comparative analysis demonstrating the enhanced efficacy of deep learning in ADHD diagnosis. The integration of CNN with BiLSTM allows for comprehensive analysis of fMRI data, revealing complex neural dynamics associated with ADHD. This approach marks a significant advancement in neuroimaging-based clinical neuroscience, potentially transforming ADHD diagnosis by providing a more objective, accurate, and efficient diagnostic tool. Our findings highlight the potential of deep learning technologies in medical imaging and diagnosis, opening new avenues for research and application in clinical neuroscience. The study underscores the importance of integrating advanced computational methods with clinical expertise to improve diagnostic accuracy and patient care in ADHD and potentially other neurodevelopmental disorders.
Proceedings Volume Fourth International Conference on Signal Processing and Machine Learning (CONF-SPML 2024), 130770F (2024) https://doi.org/10.1117/12.3027127
With the swift development of deep learning technologies, speech recognition has emerged as an essential tool in the domain of emotion analysis. These technologies are capable of analysing and recognizing the subtle variations in human emotions, thus enriching the emotional dimension of human-computer interaction. However, existing emotion speech recognition models often exhibit vulnerabilities when faced with meticulously crafted adversarial attacks. To address the challenge, a strategy of adversarial training using the Fast Gradient Sign Method (FGSM) aimed at enhancing the robustness of emotion speech recognition systems is proposed. Through a series of experiments, adversarial training with Long Short-Term Memory (LSTM) and Convolutional Neural Network (CNN) models has notably enhanced the models' resilience to adversarial intrusions, while maintaining a high recognition accuracy. Specifically, the method led to an approximate 7% increase in overall LSTM model robustness and a 3.5% increase for the CNN model against such attacks, with a concomitant reduction in the rate of misrecognition, thereby affirming the efficacy of adversarial training in strengthening model security. This study not only showcases the potential of adversarial training in enhancing the security features of LSTM and CNN models but also opens new avenues for the design and refinement of future emotion speech recognition systems.
Proceedings Volume Fourth International Conference on Signal Processing and Machine Learning (CONF-SPML 2024), 130770G (2024) https://doi.org/10.1117/12.3027128
In recent years, facial editing technology based on StyleGAN has developed rapidly. It takes advantage of StyleGAN's powerful generator, but practical applications still expose problems, many of which have been widely identified and have proposed solutions. Pivotal Tuning Inversion (PTI), released in 2021, is a relatively new and effective technique for optimizing the generator. In actual tests, however, some problems remain. In this work, two significant flaws were found when PTI was applied to editing human faces, and it is confirmed that their negative effect is widespread and, in some cases, non-negligible. Following the original PTI paper, this paper investigates how these defects arise from two aspects. A hyperparameter tuning method is proposed to improve the inverted output image. Finally, a conjecture is proposed that a discriminator could be trained to help the machine learn human preferences, an approach that has the potential to minimize the impact of feature loss.
Proceedings Volume Fourth International Conference on Signal Processing and Machine Learning (CONF-SPML 2024), 130770H (2024) https://doi.org/10.1117/12.3027129
In recent years, 3D printing technology has become a hot topic. It is now widely used in industry, medicine, and other fields, often in combination with computer technology, biotechnology, and other disciplines. However, current research shows that 3D printing is still generally inefficient and relatively expensive. Slicing is a central part of the 3D printing process: it converts a 3D model into a series of layer slices, which the printer then prints layer by layer to construct the entire model. Improving slicing efficiency has therefore become a critical issue. This paper introduces several 3D printing slicing methods, analyzes and evaluates the advantages and disadvantages of existing slicing algorithms, and summarizes their shortcomings, directions for improvement, and future research on path-planning optimization and cut-in points for multi-material 3D printing.
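The basic operation behind slicing is intersecting each triangle of the model's mesh with a horizontal plane at the layer height. The sketch below shows that step for a single triangle; it is a generic illustration rather than any of the algorithms the paper reviews, it assumes a triangle-mesh input, and it deliberately ignores the degenerate case where a vertex lies exactly on the plane.

```python
def slice_triangle(triangle, z):
    """Return the 2D segment where a triangle crosses the plane at height z, or None.

    triangle is a sequence of three (x, y, z) vertices.
    """
    points = []
    for i in range(3):
        (x1, y1, z1), (x2, y2, z2) = triangle[i], triangle[(i + 1) % 3]
        # An edge crosses the plane when its endpoints lie strictly on opposite sides.
        if (z1 - z) * (z2 - z) < 0:
            t = (z - z1) / (z2 - z1)  # interpolation parameter along the edge
            points.append((x1 + t * (x2 - x1), y1 + t * (y2 - y1)))
    return tuple(points) if len(points) == 2 else None


# Hypothetical triangle spanning heights 0 to 2, sliced at the layer z = 1.
tri = [(0.0, 0.0, 0.0), (2.0, 0.0, 2.0), (0.0, 2.0, 2.0)]
print(slice_triangle(tri, 1.0))
print(slice_triangle(tri, 5.0))  # plane above the triangle: no intersection
```

A full slicer repeats this for every triangle and every layer and then chains the segments into closed contours, which is where most of the efficiency concerns raised in the abstract come from.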
Proceedings Volume Fourth International Conference on Signal Processing and Machine Learning (CONF-SPML 2024), 130770I (2024) https://doi.org/10.1117/12.3027130
Lung nodules, solid or subsolid lung masses smaller than 3 centimetres, including subtle nodules within complex lung tissue, can go unnoticed by medical professionals due to fatigue or limited expertise. To address this challenge, our study proposes an algorithm for lung region of interest (ROI) computed tomography (CT) image processing based on the Attention U-net architecture and an enhanced variant called Dense-Attention U-net. The Attention U-net incorporates Attention Gates in the decoding path, facilitating the passage of relevant information while reducing irrelevant learning. We evaluate model performance using Dice loss and receiver operating characteristic (ROC) curve analysis. The Dense-Attention U-net enhances the model with dense connectivity in both encoder and decoder sections, ensuring complete layer connections. We used a dataset of 27,190 lung CT images for evaluation. Both U-net variants perform well, with the Dense-Attention U-net outperforming the Attention U-net. The Attention U-net took about eight hours to reach a training loss of 0.13, while the Dense-Attention U-net achieved the same in just half an hour. Notably, the Dense-Attention U-net achieves higher predictive accuracy, with area under the curve (AUC) values of 0.94 and 0.91 for the ROC curves, respectively. Visual results demonstrate excellent segmentation performance for both models. In conclusion, our study introduces and analyses two U-net variants for pulmonary nodule segmentation, emphasizing attention mechanisms and dense connections to enhance feature focus and model efficiency. We acknowledge challenges such as dataset biases and suggest future research directions, including individual nodule labeling and quantification, to enhance diagnostic accuracy.
Proceedings Volume Fourth International Conference on Signal Processing and Machine Learning (CONF-SPML 2024), 130770J (2024) https://doi.org/10.1117/12.3027132
This paper presents a comprehensive review and analysis of the A-star (A*) pathfinding algorithm and its variations. The A* algorithm's core principles, practical representations, and diverse applications are examined. The study extends to various A* derivatives, including Weighted A*, IDA* (Iterative Deepening A*), ARA* (Anytime Repairing A*), D* (Dynamic A*), LPA* (Lifelong Planning A*), D* Lite, and AD* (Anytime Dynamic A*). Each variant's unique adaptations and efficiencies are explored, highlighting their suitability for specific challenges in pathfinding tasks. This work aims to elucidate the intricacies of these algorithms, demonstrating their significance and versatility in solving complex navigational problems, thus offering valuable insights for future research and application development in artificial intelligence and the game industry.
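To make the relationship between A* and Weighted A* concrete, here is a minimal sketch on a 4-connected grid. The grid encoding (0 = free, 1 = obstacle), the unit step cost, the Manhattan heuristic, and the function name `weighted_a_star` are assumptions chosen for illustration; they are not taken from the paper.

```python
import heapq


def weighted_a_star(grid, start, goal, weight=1.0):
    """Search a 4-connected grid where 0 is a free cell and 1 is an obstacle.

    weight=1.0 gives plain A*; weight > 1.0 inflates the heuristic
    (Weighted A*), trading path optimality for fewer expansions.
    Returns the path as a list of (row, col) cells, or None if unreachable.
    """
    def h(cell):
        # Manhattan distance: admissible for unit-cost 4-connected moves.
        return abs(cell[0] - goal[0]) + abs(cell[1] - goal[1])

    rows, cols = len(grid), len(grid[0])
    g = {start: 0}              # best known cost from start
    parent = {start: None}      # back-pointers for path reconstruction
    open_heap = [(weight * h(start), start)]
    closed = set()

    while open_heap:
        _, cur = heapq.heappop(open_heap)
        if cur == goal:
            path = []
            while cur is not None:
                path.append(cur)
                cur = parent[cur]
            return path[::-1]
        if cur in closed:
            continue  # stale heap entry
        closed.add(cur)
        r, c = cur
        for nxt in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            nr, nc = nxt
            if not (0 <= nr < rows and 0 <= nc < cols) or grid[nr][nc] == 1:
                continue
            if nxt in closed:
                continue  # simplification: closed cells are not reopened
            new_g = g[cur] + 1
            if new_g < g.get(nxt, float("inf")):
                g[nxt] = new_g
                parent[nxt] = cur
                # f = g + weight * h is the only change relative to plain A*.
                heapq.heappush(open_heap, (new_g + weight * h(nxt), nxt))
    return None


grid = [
    [0, 0, 0],
    [1, 1, 0],
    [0, 0, 0],
]
path = weighted_a_star(grid, (0, 0), (2, 0))
```

The incremental and anytime variants listed in the abstract (D*, LPA*, D* Lite, ARA*, AD*) reuse search effort across replanning episodes and need considerably more bookkeeping than this sketch shows.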
Proceedings Volume Fourth International Conference on Signal Processing and Machine Learning (CONF-SPML 2024), 130770K (2024) https://doi.org/10.1117/12.3027134
This paper explores adversarial machine learning attacks in the context of malware detection, focusing on API sequence-based models. The vulnerability of machine learning algorithms to well-crafted attacks is addressed, particularly in the non-invertible and non-differentiable software domain. A preprocessing method is proposed to tackle issues of imbalance and excessive length in API sequences, enhancing model accuracy and reducing training time. Additionally, a universal trigger attack method for API sequence-based malware detection is introduced. This approach demonstrates transferable adversarial triggers, enabling black-box attacks without prior knowledge of the target model. Experimental results validate the effectiveness of the strategy, particularly in reducing attack overhead for deep learning models. Specifically, the average attack effectiveness in the problem space is 86.68%, with an average attack overhead of 0.0020%. Overall, our work contributes to advancing the understanding and mitigation of adversarial attacks in API sequence-based malware detection.
Proceedings Volume Fourth International Conference on Signal Processing and Machine Learning (CONF-SPML 2024), 130770L (2024) https://doi.org/10.1117/12.3027176
In the rapidly developing field of artificial intelligence (AI), this paper delves into the progress and significance of AI conversational agents in enhancing human-computer interaction. The analysis focuses on the challenges and technological advancements in system design, particularly in terms of Natural Language Processing (NLP) and multilingual capabilities. Specific collected data are then cited as examples for analysis. At the heart of the analysis lies an emphasis on the adaptability, interoperability, and user-centric design of AI conversational agents. Finally, the study reveals the transformative potential of AI conversational agents in simplifying and enriching interactions, pointing to future developments in the field.
Proceedings Volume Fourth International Conference on Signal Processing and Machine Learning (CONF-SPML 2024), 130770M (2024) https://doi.org/10.1117/12.3027177
This study uses a multimodal fusion model for early student depression detection by analysing student data from Sina Weibo. It compares early and late fusion methods with traditional Natural Language Processing models and achieves a 3% accuracy improvement over 100 cycles. The study shows that standardising only structured data without neural network mapping reduces predictive performance. It was also found that, while both fusion methods exhibited similar predictive capabilities, the late fusion model exhibited overfitting, suggesting that the late fusion strategy has potential to further improve model performance. This study summarises the ability of multimodal fusion models to effectively detect early signs of student depression and lays the foundation for future research on model interpretability for early student depression detection and on student behaviour analysis.
Proceedings Volume Fourth International Conference on Signal Processing and Machine Learning (CONF-SPML 2024), 130770N (2024) https://doi.org/10.1117/12.3027179
As the impact of online reviews on consumer decision-making grows more pronounced, this study is dedicated to identifying service-related issues by analyzing online restaurant customer reviews. We utilized user reviews from the "Beijing Must-Eat List" on DianPing as the data source, employing artificial intelligence alongside spatial geographic analysis methods. Reviews were categorized for sentiment using the Bidirectional Encoder Representations from Transformers (BERT) model, with word clouds created for a visual display. The study also integrates hotspot estimation and kernel density estimation from spatial geographic analysis to delve into the geographic characteristics of sentiments in reviews. The model's effectiveness was assessed using metrics such as Precision, Recall, and F-Measure. Results indicated that our model excelled, demonstrating a precision of 98.73%, recall rate of 91.06%, and an F-Measure of 94.74%. This research offers insightful contributions towards a more nuanced understanding of consumer preferences and enhancing the marketing strategies in the food and beverage sector.
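The three metrics reported above are linked: the F-Measure is the harmonic mean of precision and recall. The short check below is a sketch of that arithmetic only; the function name `f_measure` and the `beta` parameter are illustrative assumptions, and the paper's own evaluation code is not shown in the abstract.

```python
def f_measure(precision, recall, beta=1.0):
    """F-beta score; beta=1.0 gives the harmonic mean of precision and recall."""
    if precision <= 0 and recall <= 0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Values from the abstract, in percent.
f1 = f_measure(98.73, 91.06)
print(round(f1, 2))  # prints 94.74
```

The result agrees with the F-Measure of 94.74% quoted in the abstract, so the three reported figures are mutually consistent.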
Proceedings Volume Fourth International Conference on Signal Processing and Machine Learning (CONF-SPML 2024), 130770O (2024) https://doi.org/10.1117/12.3027181
With the development of drone technology, unmanned aerial vehicles (UAVs) are widely used in fields such as the military, daily life, agriculture, and commerce, with low-altitude UAVs being the most widely used. Because the low-altitude environment is complex, the demands on a drone's flight capability have increased. In different working scenarios, factors such as wind speed, temperature, and humidity affect the drone's route, increasing energy loss, reducing safety, and possibly preventing the drone from completing its task. To address this issue, a UAV path planning method based on an improved D* algorithm is proposed. Building on the D* algorithm's good fit for dynamic environments, the method dynamically adjusts the D* cost function according to environmental factors, improves the heuristic estimation function, and adjusts the associated weights in real time according to the environment, so that path selection is adjusted and optimized to better cope with changing conditions. MATLAB simulation results show that the algorithm reduces energy loss and enhances the UAV's adaptability to changing environments during operation.
Proceedings Volume Fourth International Conference on Signal Processing and Machine Learning (CONF-SPML 2024), 130770P (2024) https://doi.org/10.1117/12.3027183
Drones are now used everywhere in daily life and in a wide range of ways, with collecting information and images among the most common. However, drones in operation are often affected by many factors, which ultimately degrades the quality of the data collected. To increase drone efficiency and make the information obtained more accurate and reliable, many researchers therefore work on reducing noise in drone signals, using technologies such as deep learning and convolutional neural networks to reduce or eliminate noise interference and clutter and thereby improve communication quality and system performance. This article analyzes the noise reduction methods currently in use, explains the advantages and disadvantages of each, and proposes directions for improvement based on the existing problems. It also discusses how each neural network can be combined with additional algorithms to form a more complete method, and how to choose the appropriate neural network for different environments so as to avoid shortcomings and maximize the efficiency of the drone. This article has practical significance for enhancing the performance and application of UAVs and provides direction and a theoretical basis for subsequent research.
Proceedings Volume Fourth International Conference on Signal Processing and Machine Learning (CONF-SPML 2024), 130770Q (2024) https://doi.org/10.1117/12.3027184
The market for UAVs is now huge, and they are widely used in civilian, commercial, and even military applications. To support so many functions, the drone's trajectory tracking strategy is fundamental. PID control technology is quite mature, but it is less robust and easily affected by external factors such as wind disturbance. Moreover, since the UAV dynamic model is not a black box and most of it can be modeled mathematically, PID alone cannot make good use of that model. To achieve precise and robust flight control and improve the performance and safety of UAV systems, we design an MPC controller that uses the UAV's dynamic model and target constraints, enabling adaptive control in changing environments and more reliable and accurate flight performance. In addition, Gaussian Process Regression is used to learn from historical data and predict the error, which is then used to compensate the MPC controller, enhancing the robustness and adaptability of the system. In simulations using SIMULINK and MATLAB, the UAV trajectory tracking curves show higher accuracy and robustness than those obtained with PID control and ordinary MPC control. This paper applies MPC to the UAV position loop with an auxiliary compensation strategy based on Gaussian process prediction; the resulting strategy is also suitable for multi-UAV control.
Proceedings Volume Fourth International Conference on Signal Processing and Machine Learning (CONF-SPML 2024), 130770R (2024) https://doi.org/10.1117/12.3027186
The landing gear is an important load-bearing component of a drone. Especially for small UAVs with short endurance, poor landing performance, and high failure rates, the landing gear is key to protecting the airframe from damage when landing in complex environments. It is therefore of great value to find landing gear suited to a variety of environments. However, most drone landing gear currently on the market offers very limited protection. In view of the problems of small civilian UAVs and the functional gaps in existing UAV landing gear, an inflatable UAV landing gear was conceived. In this paper, ANSYS software was used to model three types of landing gear (straight rod, skid, and inflatable airbag) and to simulate the stresses on each when landing on uneven hard surfaces (simulating hard roads and potholes) and on flat surfaces prone to collapse (simulating snow and sand). The analysis identifies the limitations of straight rod and skid landing gear and the advantages of inflatable airbag landing gear, and a design concept for the inflatable UAV landing gear is put forward.
Proceedings Volume Fourth International Conference on Signal Processing and Machine Learning (CONF-SPML 2024), 130770S (2024) https://doi.org/10.1117/12.3027187
Research on Unmanned Aerial Vehicle (UAV) path planning is crucial for enhancing autonomous flight capabilities. Bioinspired heuristic algorithms have been proven to effectively solve such complex problems. The heuristic algorithm selected in this paper is the Sparrow Search Algorithm. However, due to the limitations of this original algorithm, i.e., the inclination to become trapped in local optima, low search accuracy, and insufficient population diversity, improvements are necessary. To address these shortcomings, this paper introduces the Improved Tent Chaotic Mapping, Opposite-Based Learning strategy (OBL), Gaussian-Cauchy mutation mechanism, and Adaptive adjustment strategy for discoverers and joiners to improve the original algorithm. The improved algorithm is named the Chaotic Mapping Adaptive Mutation-Sparrow Search Algorithm (CMAM-SSA). This algorithm is applied to UAV path planning in MATLAB simulations, combined with a simulation environment featuring mountainous terrain modeling and threat areas. The cost function integrates external environmental constraints, UAV performance limitations, and path planning objectives. Furthermore, a six-degree-of-freedom UAV path tracker is implemented using PID control on the Simulink platform. The simulation outcomes demonstrate that the CMAM-SSA algorithm exhibits a more rapid convergence rate and superior accuracy, affirming its effectiveness and superiority. The excellent performance of the Simulink path tracker provides further validation for the proposed improvements.
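Chaotic maps are typically used in swarm algorithms to spread the initial population more evenly than plain random sampling. The sketch below shows the idea with a standard skewed tent map; the abstract does not specify the paper's "Improved Tent Chaotic Mapping", so the map form, the break point `a=0.7`, the seed `x0`, and the function names are all assumptions made for illustration.

```python
def tent_sequence(x0, n, a=0.7):
    """Generate n values of a skewed tent map on (0, 1).

    x_{k+1} = x_k / a              if x_k < a
              (1 - x_k) / (1 - a)  otherwise
    a=0.7 is an assumed break point; a=0.5 is avoided here because the
    symmetric map tends to collapse to 0 in floating-point arithmetic.
    """
    if not (0.0 < x0 < 1.0) or not (0.0 < a < 1.0):
        raise ValueError("x0 and a must lie strictly between 0 and 1")
    xs = []
    x = x0
    for _ in range(n):
        x = x / a if x < a else (1.0 - x) / (1.0 - a)
        xs.append(x)
    return xs


def init_population(size, dim, lower, upper, x0=0.37):
    """Map a chaotic sequence onto [lower, upper] to seed `size` candidates."""
    seq = tent_sequence(x0, size * dim)
    return [
        [lower + (upper - lower) * seq[i * dim + j] for j in range(dim)]
        for i in range(size)
    ]


# Five hypothetical 3-dimensional candidates in the box [-10, 10]^3.
population = init_population(size=5, dim=3, lower=-10.0, upper=10.0)
```

In a path planning setting each candidate would encode waypoint coordinates, and the search algorithm would then refine these candidates against the cost function.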
Proceedings Volume Fourth International Conference on Signal Processing and Machine Learning (CONF-SPML 2024), 130770T (2024) https://doi.org/10.1117/12.3027188
Unmanned aerial vehicles (UAVs) are widely utilized in various fields. However, during mission execution, mechanical failures or subsystem malfunctions, including fuel shortages, may result in the UAV landing in an unspecified area. Additionally, in emergency situations, the UAV may be forced to land in densely populated areas or treacherous terrain. Careful consideration of suitable landing points before touchdown is crucial, making research on UAV landing technology a significant contemporary topic. Traditional landing techniques, which rely on satellite navigation and inertial navigation, struggle to adapt to complex environments and to terrain interference with satellite signals. Applying machine vision-based technology to UAVs presents a promising solution, enabling autonomous landings in signal-deprived scenarios. Therefore, this paper investigates machine vision-based UAV landing technology by processing terrain images captured by the UAV. A series of image processing steps is applied to reconstruct the terrain in three dimensions, generating a point cloud map of the terrain. Through the analysis of this map, a range of methods is employed to determine the optimal landing points. The study achieves three-dimensional reconstruction of the terrain and, in experiments on complex terrain, locates the best landing points and accomplishes autonomous UAV landings. This research leverages numerous algorithms to optimize the terrain map, resulting in a more comprehensive point cloud. By combining two landing strategies, the study achieves more precise landing points.