Proceedings Volume Third International Conference on Computer Graphics, Image, and Virtualization (ICCGIV 2023), 1293401 (2023) https://doi.org/10.1117/12.3017187
This PDF file contains the front matter associated with SPIE Proceedings Volume 12934, including the Title Page, Copyright information, Table of Contents, and Conference Committee information.
Proceedings Volume Third International Conference on Computer Graphics, Image, and Virtualization (ICCGIV 2023), 1293402 (2023) https://doi.org/10.1117/12.3008110
With the development of Qinghai's digital economy, the Tibetan carpet industry has encountered new opportunities for digitalization, and the demand for three-dimensional simulation in digital Tibetan carpet interaction has grown steadily. Existing realistic simulations of wool fabrics mostly use image processing, stacked-slice networks, NURBS generation, and similar methods; traditional image-based simulation of wool fabric is usually slow and poorly reusable because of the complexity of texture simulation. This paper proposes a three-dimensional realistic simulation and rendering method for Tibetan carpets based on physical properties, physics-engine shaders, and illumination characteristics, which reduces the heavy computational cost of high-resolution texture maps. Experiments on the surface light-scattering behavior improve visual qualities such as the density and graininess of the wool fabric and increase the realism of the simulation, providing a fast, templated reference method for the realistic simulation of wool fabrics such as Tibetan carpets.
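The abstract does not reproduce the shader itself; as a reference point, the sketch below shows the classic Kajiya-Kay fiber shading model, a common starting point for yarn- and wool-like highlights, not the paper's actual shader. The vectors and specular exponent are illustrative.

```python
import numpy as np

def kajiya_kay(tangent, light_dir, view_dir, spec_power=32.0):
    """Classic Kajiya-Kay fiber shading, often used for yarn/wool-like
    materials. All inputs are unit 3-vectors; returns (diffuse, specular)."""
    t_dot_l = np.clip(np.dot(tangent, light_dir), -1.0, 1.0)
    # Diffuse term: sine of the angle between fiber tangent and light.
    diffuse = np.sqrt(max(0.0, 1.0 - t_dot_l ** 2))
    half = light_dir + view_dir
    half = half / np.linalg.norm(half)
    t_dot_h = np.clip(np.dot(tangent, half), -1.0, 1.0)
    # Specular term: sine of the angle between tangent and half vector.
    specular = np.sqrt(max(0.0, 1.0 - t_dot_h ** 2)) ** spec_power
    return diffuse, specular
```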
Proceedings Volume Third International Conference on Computer Graphics, Image, and Virtualization (ICCGIV 2023), 1293403 (2023) https://doi.org/10.1117/12.3007983
Virtual chemistry experiments play a vital role in today's middle school chemistry teaching and can enhance students' enthusiasm for learning. However, existing virtual experiment systems suffer from unnatural human-computer interaction and a lack of authenticity in operation. To address these problems, this paper proposes an experimental interaction device based on speech and vision and designs an algorithm to resolve multimodal information conflicts. The paper first introduces the extraction of speech and visual features; based on the two resulting intents, a weighted-average multimodal fusion algorithm is then proposed to resolve conflicts between the channels. Finally, a virtual chemistry experiment system is designed that combines real experimental equipment with virtual experimental scenes. Experiments show that the algorithm greatly improves the user's sense of authenticity and operability and effectively solves the problem of unnatural human-computer interaction.
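As an illustration of the weighted-average fusion idea described above, here is a minimal sketch that combines per-intent confidence scores from the two channels; the intent labels and weights are hypothetical, not taken from the paper.

```python
def fuse_intents(speech_probs, vision_probs, w_speech=0.6, w_vision=0.4):
    """Weighted-average fusion of per-intent confidence scores from the
    speech and vision channels; returns the winning intent and all scores."""
    assert abs(w_speech + w_vision - 1.0) < 1e-9
    fused = {intent: w_speech * speech_probs.get(intent, 0.0)
                     + w_vision * vision_probs.get(intent, 0.0)
             for intent in set(speech_probs) | set(vision_probs)}
    return max(fused, key=fused.get), fused

# Example: the two channels disagree; fusion resolves the conflict.
label, scores = fuse_intents({"pour": 0.7, "heat": 0.3},
                             {"pour": 0.2, "heat": 0.8})
```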
Proceedings Volume Third International Conference on Computer Graphics, Image, and Virtualization (ICCGIV 2023), 1293404 (2023) https://doi.org/10.1117/12.3008186
This paper proposes a motion intention understanding algorithm for assisting the elderly. The main innovations are: (a) a remote control system based on gesture tracking; (b) an intention understanding algorithm based on motion trends, which enables the robot to infer the user's intention and actively grasp the target. Designed around the physiological characteristics and living habits of the elderly, the algorithm helps users accurately grasp objects in complex home environments. Deployed on an Xarm7 robotic arm, the algorithm achieves a grasping accuracy of 93%, demonstrating its effectiveness and accuracy.
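A motion-trend intention reader can be sketched as scoring each candidate object by how well the hand's velocity direction points at it; this cosine-similarity heuristic is an assumption standing in for the paper's algorithm.

```python
import numpy as np

def infer_target(hand_pos, hand_vel, object_positions):
    """Rank candidate objects by how well the current motion trend
    (velocity direction) points toward each object."""
    v = hand_vel / (np.linalg.norm(hand_vel) + 1e-9)
    best, best_score = None, -np.inf
    for name, pos in object_positions.items():
        d = pos - hand_pos
        d = d / (np.linalg.norm(d) + 1e-9)
        score = float(np.dot(v, d))  # cosine of motion vs. direction-to-object
        if score > best_score:
            best, best_score = name, score
    return best, best_score
```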
Proceedings Volume Third International Conference on Computer Graphics, Image, and Virtualization (ICCGIV 2023), 1293405 (2023) https://doi.org/10.1117/12.3007969
This article proposes a multimodal-fusion auxiliary-line intention detection algorithm for smart pens. The main innovations are: (a) a depth camera tracks the smart pen, allowing the operator to move it through virtual space with natural hand movements; (b) vertices are selected by hovering; (c) the user's intention to draw auxiliary lines is judged by fusing operator behavior with voice commands. Compared with understanding intention from a single channel (behavioral actions or voice alone), the proposed method improves the accuracy of intention understanding, reduces the complexity of user operations, and better fits the concept of natural interaction. The method achieves good accuracy in a Unity-based virtual teaching environment, demonstrating the effectiveness of the proposed algorithm.
Proceedings Volume Third International Conference on Computer Graphics, Image, and Virtualization (ICCGIV 2023), 1293406 (2023) https://doi.org/10.1117/12.3008019
Neural Radiance Fields (NeRF) learns an implicit representation of scene space with a multi-layer perceptron, mapping a 3D position and a 2D viewing direction to voxel density and color, and can output high-quality images for novel view synthesis. Although NeRF performs well under ideal conditions, static scenes with precise camera calibration, it can hardly handle freely shot images, and its low training efficiency hinders its application to real-world scene reconstruction. This paper proposes a NeRF model extended with hash position encoding and view-dependent mapping, which better handles image sets collected in the real world under complex lighting conditions while improving learning speed and the recovery of scene details. Experiments show that it achieves better results than the classic NeRF and its variants.
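Hash position encoding replaces NeRF's frequency encoding with learned features looked up through a spatial hash of grid vertices. The sketch below is a minimal single-level, nearest-vertex version (the full scheme, as in Instant-NGP, interpolates eight corners over multiple resolutions); table size and resolution are illustrative.

```python
import torch

PRIMES = (1, 2654435761, 805459861)  # hashing primes from the Instant-NGP paper

def hash_encode(xyz, table, resolution):
    """Minimal single-level hash position encoding: map 3D points in [0,1)^3
    to learned features via a spatial hash of their grid cell."""
    idx = (xyz * resolution).long()                         # (N, 3) cell coords
    h = (idx[:, 0] * PRIMES[0]) ^ (idx[:, 1] * PRIMES[1]) ^ (idx[:, 2] * PRIMES[2])
    return table(h % table.num_embeddings)                  # (N, feat_dim)

table = torch.nn.Embedding(2 ** 16, 2)                      # learnable hash table
feats = hash_encode(torch.rand(1024, 3), table, resolution=64)
```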
Hao Cui, Xin Wang, Xiankun Pu, Lei Shi, Zhiqiang Zhou, Jun Gao
Proceedings Volume Third International Conference on Computer Graphics, Image, and Virtualization (ICCGIV 2023), 1293407 (2023) https://doi.org/10.1117/12.3008006
Traditional multi-view target detection can only obtain the light-intensity information of the target, so its detection ability is very limited in complex environments. Traditional polarization target detection mainly relies on a single camera to obtain the polarization information of the target, making it difficult to capture all of the target's polarization information against some complex backgrounds. To overcome the limitations of both detection approaches, this paper studies a multi-angle polarization target detection method based on a polarization array system. The method mainly applies traditional multi-view image processing algorithms to polarized images to achieve polarization image stitching. To address the inability to stitch polarized images directly in low-light environments, the paper introduces image enhancement algorithms that enable stitching and target detection of multi-angle polarized images under low light. Based on the characteristics of polarization feature images, a fusion method built on polarization features is studied to improve the contrast between the target and the background.
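A polarization array typically samples intensities behind 0°/45°/90°/135° polarizers; the standard Stokes-parameter reconstruction below shows how per-pixel polarization features such as the degree of linear polarization are obtained before stitching and fusion. This is textbook polarimetry, not the paper's specific pipeline.

```python
import numpy as np

def stokes_from_polarizers(i0, i45, i90, i135):
    """Linear Stokes parameters and degree/angle of linear polarization
    from intensity images behind 0/45/90/135 degree polarizers."""
    s0 = 0.5 * (i0 + i45 + i90 + i135)               # total intensity
    s1 = i0 - i90
    s2 = i45 - i135
    dolp = np.sqrt(s1 ** 2 + s2 ** 2) / (s0 + 1e-9)  # degree of linear polarization
    aolp = 0.5 * np.arctan2(s2, s1)                  # angle of linear polarization
    return s0, dolp, aolp
```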
Proceedings Volume Third International Conference on Computer Graphics, Image, and Virtualization (ICCGIV 2023), 1293408 (2023) https://doi.org/10.1117/12.3008244
Analyzing the relationship between Generation Z employees' job performance and their online short-video behavior is a focus of employment management and guidance in companies in the era of big data. This paper extracts features from employees' job burnout information and online short-video behavior data, constructs a label model with new employees as the research object, uses association rules to mine the relationship between employee performance and online behavior, and analyzes the behavioral characteristics associated with different performance levels. The results show three typical portraits and three typical paths among A-grade employees, and two typical paths among C-grade employees. These results help guide employees' online short-video behavior and demonstrate the application of association rule mining and behavior analysis to job performance.
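A minimal association-rule mining sketch in the spirit of the above, using mlxtend's Apriori on a hypothetical one-hot behavior/performance table; all column names, thresholds, and data are illustrative.

```python
import pandas as pd
from mlxtend.frequent_patterns import apriori, association_rules

# Hypothetical one-hot table: rows are employees, columns are behavior/
# performance flags (names are illustrative, not from the paper).
df = pd.DataFrame({
    "late_night_video": [1, 1, 0, 1, 0, 1],
    "short_video_3h+":  [1, 1, 0, 1, 0, 0],
    "perf_grade_C":     [1, 1, 0, 1, 0, 0],
    "perf_grade_A":     [0, 0, 1, 0, 1, 1],
}).astype(bool)

itemsets = apriori(df, min_support=0.3, use_colnames=True)
rules = association_rules(itemsets, metric="confidence", min_threshold=0.8)
print(rules[["antecedents", "consequents", "support", "confidence", "lift"]])
```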
Proceedings Volume Third International Conference on Computer Graphics, Image, and Virtualization (ICCGIV 2023), 1293409 (2023) https://doi.org/10.1117/12.3007964
With the development of machine vision technology, using machine vision for target localization and measurement has become a hot research topic for ensuring precise positioning of industrial robots. This paper analyzes the composition of an industrial robot system based on binocular vision, designs target localization algorithms, and trains target images using VisionPro software. Multiple measurement experiments are then conducted with the measurement system to determine the repeatability of the target localization accuracy. Experimental data show that the repeatability error of this localization method is within 3 μm, indicating its simplicity and measurement stability.
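For rectified binocular vision, target depth follows from the standard triangulation relation Z = f·B/d; a minimal sketch, with illustrative numbers:

```python
def stereo_depth(disparity_px, focal_px, baseline_mm):
    """Standard rectified-stereo depth: Z = f * B / d. Focal length in
    pixels, baseline in mm, so depth comes out in mm."""
    if disparity_px <= 0:
        raise ValueError("disparity must be positive for a valid match")
    return focal_px * baseline_mm / disparity_px

# e.g. f = 1200 px, B = 60 mm, d = 24 px  ->  Z = 3000 mm
print(stereo_depth(24.0, 1200.0, 60.0))
```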
Proceedings Volume Third International Conference on Computer Graphics, Image, and Virtualization (ICCGIV 2023), 129340A (2023) https://doi.org/10.1117/12.3008215
Substation automation is one of the hot areas in the power industry: from early RTU remote devices and unitary microcomputer protection and measurement-and-control equipment to today's integrated intelligent equipment operation and inspection programs, its level of automation has gradually improved alongside research progress across disciplines. Traditional inspection requires systematic staff training, resulting in high costs, and manual inspection in rain, snow, and China's extremely cold regions carries a high degree of uncertainty, so manual inspection can no longer meet the growing demand for all types of substation inspection. There is therefore a growing demand for inspection robots to replace manual inspection. The substation inspected by the autonomous power-system inspection robot described in this paper is located in Yunnan and is suitable for indoor inspection of substations rated 100 kV and above. To meet the inspection tasks, the robot performs identification of various instrument types and temperature and appearance monitoring of power equipment, built on 3D laser SLAM, dynamic obstacle removal, and indoor/outdoor relocalization.
Wenqi Huang, Ruiye Zhou, Qunsheng Zeng, Yang Wu, Zhuojun Cai, Jianing Shang, Lingyu Liang, Xuanang Li
Proceedings Volume Third International Conference on Computer Graphics, Image, and Virtualization (ICCGIV 2023), 129340B (2023) https://doi.org/10.1117/12.3007987
Pin defects can seriously affect the safety of transmission lines, and because pins are small, their defects are difficult to detect. Most existing methods detect pin defects by adding feature layers or cascade mechanisms. However, since high-resolution feature maps contain much redundant information, existing methods struggle to balance high-resolution feature maps against inference speed. In this paper, we propose Sparse RetinaNet to effectively relieve the contradiction between high-resolution feature layers and slow inference. Specifically, we introduce high-resolution features into prediction and propose a sparsification mechanism for the high-resolution feature layer, so that high-resolution features can be used without seriously affecting inference speed. Extensive experiments on our own pin defect detection dataset show that the proposed method significantly improves training efficiency and performance.
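One simple way to realize the sparsification idea, keeping only the strongest responses of the high-resolution feature map, is a per-image top-k mask; this is an assumption for illustration, not necessarily the paper's exact mechanism.

```python
import torch

def sparsify_topk(feat, keep_ratio=0.1):
    """Zero all but the strongest activations of a high-resolution feature
    map. feat: (B, C, H, W) -> same shape, top keep_ratio responses kept."""
    flat = feat.abs().flatten(1)                    # (B, C*H*W)
    k = max(1, int(keep_ratio * flat.shape[1]))
    thresh = flat.topk(k, dim=1).values[:, -1:]     # per-sample k-th value
    mask = (flat >= thresh).view_as(feat)
    return feat * mask
```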
Proceedings Volume Third International Conference on Computer Graphics, Image, and Virtualization (ICCGIV 2023), 129340C (2023) https://doi.org/10.1117/12.3008038
With the continuous advancement of science and technology, three-dimensional (3D) scene images have gradually become part of daily life. From immersive tours of museums and art galleries to in-car imaging, current 3D panoramic products can meet individual user needs. However, because of their lengthy production cycles and high costs, they remain bespoke creative products that cannot satisfy the demand for building fast, real-time, dynamic 3D panoramas in large quantities. Against the continuous development of digital information technology, studying process-oriented production models and technical specifications is therefore a necessary condition for adapting 3D panoramas to the next generation of the information service industry. On this basis, this paper proposes a series of development directions guided by advanced manufacturing concepts.
Proceedings Volume Third International Conference on Computer Graphics, Image, and Virtualization (ICCGIV 2023), 129340D (2023) https://doi.org/10.1117/12.3008093
Virtual reality (VR) is a new type of media that can provide users with an unprecedented sense of immersion. In achieving this immersion, the spatial information perception of human vision plays a crucial role and is one of the key requirements for perceiving both the environment and virtual reality. Existing VR display rendering often adopts foveated rendering, which exploits the characteristics of human vision to save computational resources. Building on these visual characteristics, this paper proposes a real-time computing method for peripheral-vision metamer images based on peripheral vision encoding, addressing the lack of such encoding in current foveated rendering methods. Our method renders visual metamer images more efficiently and achieves real-time computation with limited additional computing resources.
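For orientation, a crude foveated-rendering baseline blends a sharp fovea into a blurred periphery by eccentricity; the paper's metamer encoding is more sophisticated, so treat this as a contrast case. Assumes an HxWx3 image; the blur sigma and radius are illustrative.

```python
import cv2
import numpy as np

def foveated_blur(img, cx, cy, fovea_radius):
    """Blend a sharp fovea (centered at cx, cy) into a Gaussian-blurred
    periphery, with blur weight growing with eccentricity."""
    blurred = cv2.GaussianBlur(img, (0, 0), sigmaX=8)
    h, w = img.shape[:2]
    yy, xx = np.mgrid[0:h, 0:w]
    ecc = np.sqrt((xx - cx) ** 2 + (yy - cy) ** 2)
    alpha = np.clip((ecc - fovea_radius) / fovea_radius, 0, 1)[..., None]
    return (img * (1 - alpha) + blurred * alpha).astype(img.dtype)
```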
Proceedings Volume Third International Conference on Computer Graphics, Image, and Virtualization (ICCGIV 2023), 129340E (2023) https://doi.org/10.1117/12.3008033
This article proposes a Facial Expression Hierarchical Detection Network (FHDN), a multi-branch convolutional neural network for facial expression detection. To further improve feature extraction, the method adds an ESSAM module as an attention mechanism; ESSAM adaptively adjusts the weight of each feature map, improving the model's performance in feature extraction and facial expression recognition. The method was evaluated on a self-made dataset, where the model achieved a detection accuracy of 81.40%, an improvement of 5% and 0.7% over YOLOv5 and YOLOv8, respectively. Compared with conventional deep learning techniques, this approach extracts image characteristics more quickly and accurately without labor-intensive manual work.
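The ESSAM design is not detailed in the abstract; as a stand-in, the sketch below shows a squeeze-and-excitation style module that likewise adaptively reweights feature maps.

```python
import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    """Squeeze-and-excitation style attention that adaptively reweights
    feature maps, analogous in spirit to the ESSAM module described above
    (the exact ESSAM design is an assumption here)."""
    def __init__(self, channels, reduction=16):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction), nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels), nn.Sigmoid())

    def forward(self, x):                       # x: (B, C, H, W)
        w = x.mean(dim=(2, 3))                  # squeeze: global average pool
        w = self.fc(w).unsqueeze(-1).unsqueeze(-1)
        return x * w                            # excite: per-channel reweighting
```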
Proceedings Volume Third International Conference on Computer Graphics, Image, and Virtualization (ICCGIV 2023), 129340F (2023) https://doi.org/10.1117/12.3008022
To objectively describe the current state of research on driver-assistance technology for large vehicles in China, the Chinese literature database of the China National Knowledge Infrastructure and the patent search and analysis system of the State Intellectual Property Office of China were searched with "large vehicle assisted driving" as the keyword. The results show that the mainstream basic technology for large-vehicle driver assistance in China is static 3D panoramic technology. Further analysis shows that future research in this field should focus on real-time dynamic 3D panorama construction technology that satisfies the "four characteristics".
Proceedings Volume Third International Conference on Computer Graphics, Image, and Virtualization (ICCGIV 2023), 129340G (2023) https://doi.org/10.1117/12.3008391
With the rapid development of the internet industry, the influence of the internet context on young people's language habits has gradually increased, and internet language shows a tendency to decouple from real-life discourse. Contemporary Chinese is the result of thousands of years of accumulation, with a complete grammatical structure, clear usage logic, and concise presentation. Some internet users, however, now use a newer, not yet fully validated collective discourse system initiated online, which cannot be directly integrated into the learning and transmission of the Chinese language system because of its confused grammar. To enhance communication between users of both language systems and to help integrate the new internet language into contemporary Chinese, a crossword-style puzzle of Chinese words may help. This paper therefore takes the popular game Wordle as an example, quantifies its features, and investigates it through modelling from a game designer's perspective. From the model and the estimation of its results, we can see more clearly how to adjust the game to give players a better experience, making them more willing to practise and learn contemporary Chinese in this way and prompting them to think about the relationship between contemporary Chinese and online terms. Further studies focus on the Latvian version of the game and examine how players form guesses from already-revealed hints, analysing guess patterns, the characteristics of easy and difficult words, and player behaviour and response.
Proceedings Volume Third International Conference on Computer Graphics, Image, and Virtualization (ICCGIV 2023), 129340H (2023) https://doi.org/10.1117/12.3008406
Speech recognition has made breakthrough progress and is widely used, and new requirements continue to emerge along with its development. First, acoustic parameters are related to the natural attributes of speakers; second, computing acoustic parameters depends on large corpus resources; and more research effort is needed in language recognition, speaker recognition, speech visualization, and automatic speech annotation. English contains 48 phonemes, and their correct recognition is an important basis for analyzing the acoustic characteristics of continuous intonation. In this paper, a convolutional neural network is first used to extract visual features at different scales, and the image features of different scales are fused effectively, so that the fused feature vector contains more detailed image information and the problem of image information loss is alleviated. An intonation acoustic-feature recognition model based on an attention mechanism is then constructed, which accounts for both early and late feature fusion and improves the effectiveness of information fusion. The experimental results show that the training error decreases gradually with the number of iterations and stabilizes after 1000 iterations; the model essentially converges and is reliable and feasible. In the phoneme recognition experiment, for sentences with both many and few phonemes, the model's recognition rate exceeds 60% with a loss rate below 5%, recognizing about 60 phonemes per minute. The proposed model therefore improves English intonation acoustic-feature recognition to a certain extent.
Haifeng Zhang, Jiaxin Wu, Delian Liu, Jiaxin Duan, Gao Guo, Gaopeng Zhang, Long Ren, Jianzhong Cao, Chao Mei
Proceedings Volume Third International Conference on Computer Graphics, Image, and Virtualization (ICCGIV 2023), 129340I (2023) https://doi.org/10.1117/12.3008182
In recent years, the aerospace field has placed higher requirements on position and pose measurement tasks for targets such as rocket engines, and computer vision is the key technology for these measurement tasks. Rocket engine pose measurement provides technical support for optimizing engine pose control strategies. Based on the principle of monocular vision, this article designs a measurement system with strong stability and high accuracy for launch vehicle engines and verifies its effectiveness through two sets of simulation experiments. For image preprocessing, the noise reduction effects of spatial filtering and the wavelet transform were compared, and a preprocessing method combining the wavelet transform with image enhancement was designed. For feature extraction, the traditional Features from Accelerated Segment Test (FAST) algorithm was improved to reduce the impact of lighting on feature point extraction under different poses while retaining the high measurement precision of point features. For pose measurement, an iterative pose optimization algorithm initialized from the centroid and translation vector is adopted, which greatly reduces the computational complexity of the pose-calculation iterations and improves the speed of spatial target pose computation.
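Two of the described preprocessing and feature steps can be sketched with standard libraries: soft-threshold wavelet denoising (PyWavelets) followed by FAST corner detection (OpenCV). The thresholds, wavelet choice, and file path are illustrative, and the paper's improved FAST variant is not reproduced.

```python
import cv2
import numpy as np
import pywt

def wavelet_denoise(gray, wavelet="db4", level=2, thresh=10.0):
    """Soft-threshold wavelet denoising as one plausible preprocessing step."""
    coeffs = pywt.wavedec2(gray.astype(np.float32), wavelet, level=level)
    denoised = [coeffs[0]] + [
        tuple(pywt.threshold(c, thresh, mode="soft") for c in detail)
        for detail in coeffs[1:]]
    return np.clip(pywt.waverec2(denoised, wavelet), 0, 255).astype(np.uint8)

gray = cv2.imread("engine.png", cv2.IMREAD_GRAYSCALE)   # illustrative path
clean = wavelet_denoise(gray)
fast = cv2.FastFeatureDetector_create(threshold=25, nonmaxSuppression=True)
keypoints = fast.detect(clean, None)                    # FAST corner candidates
```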
Proceedings Volume Third International Conference on Computer Graphics, Image, and Virtualization (ICCGIV 2023), 129340J (2023) https://doi.org/10.1117/12.3008027
In modern industrial development and the era of intelligent production, fabric defects have become a crucial concern for the textile industry, and accurately identifying fabric defects with computer vision technology holds significant research value. This article provides an overview of machine vision techniques for fabric defect detection. It classifies and analyzes existing defect detection techniques, summarizes achievements and innovations in recent years, and reviews research progress both domestically and internationally. The article discusses the relevant models and theories in detail, compares commonly used defect detection techniques and identifies their main shortcomings, highlights the imperfections of mainstream fabric defect detection, and offers predictions for future development trends.
Shengzhou Luo, Weijie Li, Lu Han, Chengju Zhou, Jiahui Pan
Proceedings Volume Third International Conference on Computer Graphics, Image, and Virtualization (ICCGIV 2023), 129340K (2023) https://doi.org/10.1117/12.3008070
We present a novel approach for optimizing transfer functions to generate volume rendering images that closely resemble a target image. Our approach uses a differentiable volume ray-casting renderer to compute transfer function parameters and leverages gradient-based optimization to minimize the difference between the rendered and target images. Additionally, we introduce a convolutional neural network to learn the volume rendering image’s local characteristics and optimize color transfer parameters from the target image. This results in an output image with a similar color style to the target image and significantly improved visual quality.
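The core loop can be sketched as a learnable RGBA transfer-function table, a differentiable piecewise-linear lookup, and front-to-back compositing optimized by gradient descent; the renderer below is heavily simplified (pre-sampled densities, stand-in data) relative to a full ray caster.

```python
import torch
import torch.nn.functional as F

def apply_tf(scalars, tf):
    """Differentiable piecewise-linear lookup into an RGBA transfer table."""
    k = tf.shape[0] - 1
    x = scalars.clamp(0, 1) * k
    i0 = x.floor().long().clamp(max=k - 1)
    w = (x - i0.float()).unsqueeze(-1)
    return (1 - w) * tf[i0] + w * tf[i0 + 1]

def composite(samples, tf):
    """Front-to-back alpha compositing of per-ray scalar samples."""
    rgba = torch.sigmoid(apply_tf(samples, tf))        # colors/opacity in (0,1)
    rgb, alpha = rgba[..., :3], rgba[..., 3:]
    trans = torch.cumprod(torch.cat(
        [torch.ones_like(alpha[:, :1]), 1 - alpha[:, :-1]], dim=1), dim=1)
    return (trans * alpha * rgb).sum(dim=1)            # (n_rays, 3)

tf = torch.randn(64, 4, requires_grad=True)            # learnable transfer function
opt = torch.optim.Adam([tf], lr=1e-2)
samples = torch.rand(4096, 32)                         # stand-in ray samples
target = torch.rand(4096, 3)                           # stand-in target pixels
for _ in range(200):
    opt.zero_grad()
    loss = F.mse_loss(composite(samples, tf), target)  # match the target image
    loss.backward()
    opt.step()
```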
Proceedings Volume Third International Conference on Computer Graphics, Image, and Virtualization (ICCGIV 2023), 129340L (2023) https://doi.org/10.1117/12.3008021
Semantic image synthesis aims to generate realistic images from semantic label maps. Current generative adversarial network models still struggle to understand visual information with irregular topological structures because they rely on convolutional neural networks, which interpret semantic label maps as grid structures. In this work, we propose a Vision GNN Generative Adversarial Network (VG-GAN). In our redesigned generator, the input semantic label map is embedded in patches and then converted into a graph structure, and graph neural networks on the constructed graph learn the complex interrelationships within it. Experimental results show that our method generates images of irregular and complex objects that appear more realistic and outperform those of state-of-the-art methods.
Proceedings Volume Third International Conference on Computer Graphics, Image, and Virtualization (ICCGIV 2023), 129340M (2023) https://doi.org/10.1117/12.3008041
In on-orbit control missions, it is important to use stable line segment features to measure the pose of a space target. However, traditional line segment detection methods are prone to failure under the complex illumination conditions of space. To address this, we propose a robust line segment detection method that exploits the geometric properties of line segments. The method improves the anchor extraction algorithm by using the inherent properties of line segments. In the validation stage, a line segment is verified in two steps: the Helmholtz principle is introduced to verify the initial segment, and the segment is further verified by its anchor density. We verify the effectiveness of the proposed method on visible-image data collected on a semi-physical simulation platform and find that it outperforms traditional line segment detection methods in accuracy.
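The Helmholtz (a-contrario) validation step commonly used in line segment detection accepts a candidate when its expected number of false alarms under a background noise model is small; the standard formulation, which the paper presumably builds on, is:

```latex
% A-contrario (Helmholtz) validation of a candidate segment: a segment of
% n pixels, k of which have gradient orientation aligned with it up to
% precision p, is accepted when its expected number of false alarms under
% the noise model is below a threshold epsilon.
\[
  \mathrm{NFA}(n,k) \;=\; N_{\text{tests}} \cdot
  \sum_{i=k}^{n} \binom{n}{i}\, p^{\,i} (1-p)^{\,n-i},
  \qquad \text{accept if } \mathrm{NFA}(n,k) \le \varepsilon .
\]
```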
Proceedings Volume Third International Conference on Computer Graphics, Image, and Virtualization (ICCGIV 2023), 129340N (2023) https://doi.org/10.1117/12.3008208
The spatial offset between visible and thermal images can make it hard to accurately match objects across the two modalities, degrading the performance of object detection algorithms. To address this, we propose a feature-alignment-based algorithm for unaligned RGB-T image object detection. Our approach adopts a coarse-to-fine feature alignment strategy with an attention-guided feature offset prediction module. A multi-headed self-attention mechanism is introduced to predict and correct the feature map offsets of visible images during feature extraction. To further correct the offset between RGB-T features, a region-of-interest alignment module performs quadratic regression for each candidate box in the pooling stage. Our algorithm also introduces a light-aware weighting module that adaptively adjusts the contribution of each modality by reweighting region-of-interest features. Experimental results on the FLIR ADAS dataset demonstrate that the proposed method achieves high accuracy and stability.
Proceedings Volume Third International Conference on Computer Graphics, Image, and Virtualization (ICCGIV 2023), 129340O (2023) https://doi.org/10.1117/12.3008092
A lightweight face recognition algorithm based on MobileNet is proposed in this paper to address the limited computational power and storage resources available for patient recognition on mobile nursing robots. First, MobileNet-v2 is used as the backbone network, and redundant blocks are pruned to reduce the parameter count. Second, ShuffleNet's spatially separable convolution is introduced into the residual blocks to increase network parallelism. Finally, the original Softmax loss is replaced with an improved ArcFace loss, which applies a Taylor expansion to the target logit, to strengthen the network constraint and achieve better separability. Experimental results show that the improved algorithm achieves a combined recognition rate of 97% and a combined average speed of 0.725 s, fulfilling the goal of a lightweight and efficient deep learning network.
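The standard ArcFace formulation adds an angular margin to the target-class angle before scaling; a minimal sketch follows (the paper's Taylor-expanded variant of the target logit is not reproduced here).

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ArcFaceHead(nn.Module):
    """Standard ArcFace: cosine logits with angular margin m on the target
    class, scaled by s, fed into cross-entropy."""
    def __init__(self, feat_dim, num_classes, s=64.0, m=0.5):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(num_classes, feat_dim))
        self.s, self.m = s, m

    def forward(self, feats, labels):
        cos = F.linear(F.normalize(feats), F.normalize(self.weight)).clamp(-1, 1)
        theta = torch.acos(cos)
        target = torch.cos(theta + self.m)            # margin on the target angle
        onehot = F.one_hot(labels, cos.shape[1]).bool()
        logits = torch.where(onehot, target, cos) * self.s
        return F.cross_entropy(logits, labels)
```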
Proceedings Volume Third International Conference on Computer Graphics, Image, and Virtualization (ICCGIV 2023), 129340P (2023) https://doi.org/10.1117/12.3008398
Image tampering can easily be used in illegal activities such as false propaganda, fake news, and falsified court evidence, with negative impacts on society, so image tampering detection technology must be constantly updated and improved. TransUNet is an efficient model for medical image segmentation. This paper modifies TransUNet and migrates it to image forgery localization, and experiments on standard datasets demonstrate its effectiveness. The findings show that the proposed framework is superior to several existing forged-area localization techniques.
Proceedings Volume Third International Conference on Computer Graphics, Image, and Virtualization (ICCGIV 2023), 129340Q (2023) https://doi.org/10.1117/12.3008532
Image style transfer merges the content of a source image with the style(s) of one or more reference images, creating images that combine the original content with other styles. This paper uses a convolutional neural network (CNN) to achieve this goal. Style transfer is implemented with data augmentation, a loss network, and an image transformation network. The VGG-19 network extracts features in the loss network, and the content and style loss functions are optimized iteratively by gradient descent. Additionally, a custom residual-module network is trained to enable a specific style conversion of the image. The final model shows significant improvement, with the final style loss reduced to 2000E+4 and the total loss reduced to 6000E+4, achieving good results.
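The style term in such loss networks is conventionally the squared difference of Gram matrices of VGG-19 activations; a minimal sketch of that piece:

```python
import torch

def gram_matrix(feat):                # feat: (B, C, H, W) VGG-19 activations
    b, c, h, w = feat.shape
    f = feat.view(b, c, h * w)
    return f @ f.transpose(1, 2) / (c * h * w)   # (B, C, C), normalized

def style_loss(gen_feats, style_feats):
    """Sum of squared Gram-matrix differences across the chosen VGG-19
    layers, as in standard neural style transfer."""
    return sum(torch.mean((gram_matrix(g) - gram_matrix(s)) ** 2)
               for g, s in zip(gen_feats, style_feats))
```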
Proceedings Volume Third International Conference on Computer Graphics, Image, and Virtualization (ICCGIV 2023), 129340R (2023) https://doi.org/10.1117/12.3007990
Deep learning is now widely used for hyperspectral image classification, where methods based on fusing pixel-level and superpixel-level features combine the respective advantages of convolutional neural networks and graph convolutional networks to achieve better classification results. In such methods, the superpixel-level graph neural network uses linear discriminant analysis to reduce the dimensionality of hyperspectral remote sensing images; however, linear discriminant analysis, as a supervised linear method, cannot properly reduce the dimensionality of nonlinear hyperspectral data. To solve this problem, we implement a hyperspectral image classification method based on kernel principal component analysis for superpixel-level graph neural networks. In general, principal component analysis suits linear dimensionality reduction, while kernel principal component analysis achieves nonlinear dimensionality reduction and can handle linearly inseparable datasets. Extensive experiments on two datasets show that the improved method is competitive with other mainstream methods.
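The KPCA step itself maps directly onto scikit-learn; a minimal sketch with stand-in data and illustrative kernel parameters:

```python
import numpy as np
from sklearn.decomposition import KernelPCA

# X: hyperspectral cube flattened to (n_pixels, n_bands); stand-in data here.
X = np.random.rand(2000, 200).astype(np.float32)

# RBF-kernel PCA performs the nonlinear, unsupervised dimensionality
# reduction described above; n_components and gamma are illustrative.
kpca = KernelPCA(n_components=30, kernel="rbf", gamma=1e-2)
X_reduced = kpca.fit_transform(X)                  # (2000, 30)
```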
Proceedings Volume Third International Conference on Computer Graphics, Image, and Virtualization (ICCGIV 2023), 129340S (2023) https://doi.org/10.1117/12.3008065
The incorporation of human-computer interaction (HCI) technology into senior care robots in recent years has pushed HCI development in a more compassionate direction, with gesture interaction being the emphasis of research in this area. This paper addresses the poor gesture detection accuracy, misjudgments, and hard-to-understand gesture interaction intentions of elderly care robots with a gesture recognition system based on ResNet+LSTM. The process begins with the creation of a gesture dataset, followed by extraction of gesture features with ResNet, training of a neural network model with LSTM, and finally application of the trained model to the Pepper robot for evaluation and verification. The test results show that the algorithm significantly improves the elderly care robot's ability to recognize gestures.
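A generic ResNet+LSTM pipeline for dynamic gestures, per-frame CNN features fed through an LSTM, can be sketched as follows; layer sizes are illustrative and this is not the exact paper model.

```python
import torch
import torch.nn as nn
from torchvision import models

class GestureNet(nn.Module):
    """Per-frame ResNet features fed to an LSTM for dynamic gesture
    classification (a generic ResNet+LSTM sketch)."""
    def __init__(self, num_gestures, hidden=256):
        super().__init__()
        backbone = models.resnet18(weights=None)
        self.cnn = nn.Sequential(*list(backbone.children())[:-1])  # drop fc
        self.lstm = nn.LSTM(512, hidden, batch_first=True)
        self.head = nn.Linear(hidden, num_gestures)

    def forward(self, clips):                  # clips: (B, T, 3, H, W)
        b, t = clips.shape[:2]
        feats = self.cnn(clips.flatten(0, 1)).flatten(1)   # (B*T, 512)
        out, _ = self.lstm(feats.view(b, t, -1))
        return self.head(out[:, -1])           # classify from the last timestep
```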
Proceedings Volume Third International Conference on Computer Graphics, Image, and Virtualization (ICCGIV 2023), 129340T (2023) https://doi.org/10.1117/12.3008312
The digital, intelligent, and visual operation of substations has become a current research trend. To obtain all-weather substation image information, this paper proposes enhancing low-light night images with gamma correction, matching feature points on the enhanced images, and applying the Fourier-Mellin transform (FM) to obtain the overlapping area; the fast library for approximate nearest neighbours (FLANN) matching algorithm is used for coarse matching, and progressive sample consensus (PROSAC) is introduced to finely filter the feature pairs and improve the correct-match rate; finally, the stitched image is weighted-fusion processed to remove the stitching seam. Experimental results show that the algorithm can enhance night images, improve matching accuracy, and meet stitching quality requirements.
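The enhancement and coarse-matching stages map onto standard OpenCV calls; a minimal sketch with illustrative parameters and file paths (the PROSAC fine-filtering stage would follow on the ratio-filtered matches).

```python
import cv2
import numpy as np

def gamma_correct(img, gamma=0.5):
    """Brighten a low-light night image: out = 255*(in/255)**gamma, gamma < 1."""
    lut = (255.0 * (np.arange(256) / 255.0) ** gamma).astype(np.uint8)
    return cv2.LUT(img, lut)

img1 = gamma_correct(cv2.imread("night_left.jpg"))    # illustrative paths
img2 = gamma_correct(cv2.imread("night_right.jpg"))
sift = cv2.SIFT_create()
k1, d1 = sift.detectAndCompute(cv2.cvtColor(img1, cv2.COLOR_BGR2GRAY), None)
k2, d2 = sift.detectAndCompute(cv2.cvtColor(img2, cv2.COLOR_BGR2GRAY), None)
flann = cv2.FlannBasedMatcher({"algorithm": 1, "trees": 5}, {"checks": 50})
good = [m for m, n in flann.knnMatch(d1, d2, k=2)
        if m.distance < 0.7 * n.distance]             # coarse Lowe-ratio filter
# PROSAC-style fine filtering and homography estimation would follow here.
```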
Ke Zhang, Wenning Hao, Xiaohan Yu, Tianhao Shao, Qiuhui Shen
Proceedings Volume Third International Conference on Computer Graphics, Image, and Virtualization (ICCGIV 2023), 129340U (2023) https://doi.org/10.1117/12.3008039
The interpretable image classifier VAE-FNN can extract high-level features for classification from complex image information and provide explanations consistent with human intuition. However, owing to the insufficient reconstruction ability of the VAE, feature extraction and interpretable classification remain challenging for high-definition images. This paper proposes an image preprocessing method and constructs a model named E2GAN that extracts low-dimensional interpretable features from high-definition images. The model is based on a pre-trained StyleGAN generator, with two trained mapping networks: one extracts the low-dimensional compressed encoding of the input image, and the other restores it to the matrix representation required by the StyleGAN generator, effectively improving the quality of feature extraction and image reconstruction. A discriminator is introduced for adversarial training with the mapping networks, further improving the realism of the reconstructed images. The training algorithm of E2GAN is designed, and a decoupling loss on the low-dimensional encoding is added to further improve its semantic interpretability. Experiments on the CelebA-HQ dataset show that E2GAN can extract low-dimensional, semantically informative features from high-definition images, which can be used to train accurate and interpretable fuzzy neural network classifiers.
Proceedings Volume Third International Conference on Computer Graphics, Image, and Virtualization (ICCGIV 2023), 129340V (2023) https://doi.org/10.1117/12.3008026
Semantic segmentation of high-resolution traffic scene images is challenging due to complex backgrounds, diverse object shapes, similar appearances of different objects, and the multi-scale characteristics of the same object. Many existing semantic segmentation networks perform only simple feature fusion when dealing with different object shapes and multi-scale properties, often failing to provide satisfactory results. To address these issues, we propose an end-to-end multi-scale feature fusion network called EFFNet for traffic-scene semantic segmentation. EFFNet adopts an encoder-decoder structure with ResNet-34 as the backbone for feature extraction. At each stage of feature extraction, we introduce a Feature Fusion Module for multi-scale information, which fuses the feature maps extracted by the backbone at different stages and enriches their spatial and semantic information. Through end-to-end training, the accuracy of the segmentation results is significantly improved. We evaluated EFFNet on the Camvid and Cityscapes traffic datasets using a device with a GTX 1650Ti graphics card; EFFNet achieves mIoU scores of 55.3% and 50% on the two datasets, respectively, outperforming existing methods and demonstrating excellent segmentation accuracy in traffic-scene semantic segmentation tasks.
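One plausible reading of the Feature Fusion Module, upsample the deeper semantic map, concatenate it with the shallower spatial map, and fuse with a convolution, is sketched below; the actual EFFNet block may differ.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class FeatureFusionModule(nn.Module):
    """Fuse a deep (semantic) feature map with a shallow (spatial) one by
    bilinear upsampling, concatenation, and a 3x3 conv."""
    def __init__(self, c_shallow, c_deep, c_out):
        super().__init__()
        self.fuse = nn.Sequential(
            nn.Conv2d(c_shallow + c_deep, c_out, 3, padding=1, bias=False),
            nn.BatchNorm2d(c_out), nn.ReLU(inplace=True))

    def forward(self, shallow, deep):
        deep_up = F.interpolate(deep, size=shallow.shape[2:],
                                mode="bilinear", align_corners=False)
        return self.fuse(torch.cat([shallow, deep_up], dim=1))
```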
Proceedings Volume Third International Conference on Computer Graphics, Image, and Virtualization (ICCGIV 2023), 129340W (2023) https://doi.org/10.1117/12.3008207
Accurate stroke outcome prediction is of great significance for making treatment plans and evaluating patients' rehabilitation state. Previous works paid more attention to basic patient information and the volume of ischemic tissue when predicting outcomes, ignoring the role of the whole brain. The purpose of this paper is to demonstrate the value of whole-brain features in outcome prediction. In detail, the pre-trained Med3D model extracts whole-brain features from minimum intensity projections (MinIP) of PWI-DSC images, the least absolute shrinkage and selection operator (LASSO) selects outstanding whole-brain features, and ten machine learning models validate the role of the selected features in predicting outcomes. With ResNet10, ResNet18, ResNet34, and ResNet50 as encoders in the Med3D model, the best AUCs of the outstanding whole-brain features were 0.88, 0.939, 0.781, and 0.883, and the mean ± std over the ten machine learning models were 0.756 ± 0.097, 0.766 ± 0.123, 0.714 ± 0.044, and 0.761 ± 0.105, respectively. It can be concluded that the outstanding whole-brain features extracted from the MinIP image can distinguish good from poor outcomes in ischemic stroke patients, and that the whole-brain features from ResNet18 performed best. The method may provide new insight for ischemic stroke research.
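The selection-plus-validation pipeline can be sketched with scikit-learn, using L1-penalized logistic regression as a LASSO-style selector and cross-validated AUC as the metric; features and outcomes below are stand-in data.

```python
import numpy as np
from sklearn.feature_selection import SelectFromModel
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# X: pre-extracted whole-brain features (e.g. from a Med3D encoder); y: outcome.
X, y = np.random.rand(120, 512), np.random.randint(0, 2, 120)  # stand-in data

# L1 (LASSO-style) selection of "outstanding" features...
selector = SelectFromModel(
    LogisticRegression(penalty="l1", solver="liblinear", C=0.1)).fit(X, y)
X_sel = selector.transform(X)

# ...then cross-validated AUC with a downstream classifier.
auc = cross_val_score(LogisticRegression(max_iter=1000), X_sel, y,
                      scoring="roc_auc", cv=5).mean()
```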
Proceedings Volume Third International Conference on Computer Graphics, Image, and Virtualization (ICCGIV 2023), 129340X (2023) https://doi.org/10.1117/12.3008219
Detecting small objects with oriented bounding boxes in aerial images is a recent hot topic. However, since aerial images are not collected at the same height, the Ground Sample Distance (GSD) differs between images, so small objects are easily overlooked. Existing algorithms are designed for multi-scale object detection, and their feature fusion is time-consuming, producing a large number of model parameters that are not easy to deploy on embedded devices. We propose three methods to address these problems. First, we scale the collected aerial images to a common scale according to their GSD values. Second, we change the structure of the Feature Pyramid Network (FPN) and keep only the necessary low-level feature maps. Finally, we rescale the anchors for the specific scene. We validate the proposed method on the DOTA dataset. The results show that the modified model identifies more small-scale objects: model parameters are reduced by up to 2.7%, inference speed increases by 13.24%, and model size shrinks by up to 28%, while detection accuracy matches the original algorithm.
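GSD normalization is a single resize per image: scale each image so one pixel covers the same ground distance. A minimal sketch, with an illustrative target GSD:

```python
import cv2

def normalize_gsd(img, gsd_m, target_gsd_m=0.3):
    """Rescale an aerial image so one pixel covers target_gsd_m metres on
    the ground, putting all images on a common physical scale."""
    scale = gsd_m / target_gsd_m       # > 1 means the image must be enlarged
    return cv2.resize(img, None, fx=scale, fy=scale,
                      interpolation=cv2.INTER_LINEAR)
```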
Proceedings Volume Third International Conference on Computer Graphics, Image, and Virtualization (ICCGIV 2023), 129340Y (2023) https://doi.org/10.1117/12.3008139
Ultrasonic Lamb wave nondestructive testing (NDT) is a fast, long-range, wide-area, relatively low-cost NDT method developed in recent years. Because of dispersion and multi-mode propagation, the reflected signal cannot be used directly for imaging: it must be dispersion-compensated and, if multiple modes are present, separated by mode extraction. The total focusing method (TFM) is a widely used array imaging technique thanks to its accurate results and large dynamic range, but its imaging signal-to-noise ratio (SNR) is low due to separate excitation and separate reception. To overcome this, this paper proposes a modified TFM imaging method that incorporates the sign coherence factor, which enhances pixel values in the defective region and suppresses pixel values caused by noise interference, thereby improving the SNR, horizontal resolution, and overall imaging quality.
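In the standard literature formulation (which this paper modifies), TFM delay-and-sums every transmit-receive pair, and the sign coherence factor weights each pixel by the sign agreement of the delayed samples; hedged to the usual definitions:

```latex
% TFM delay-and-sum over all transmit-receive pairs (i, j), with tau_i(r)
% the travel time from element i to pixel r; b_ij(r) is the sign bit of
% the delayed sample. SCF follows the Camacho et al. style definition.
\[
  I(\mathbf{r}) \;=\; \Bigl|\sum_{i=1}^{N}\sum_{j=1}^{N}
      s_{ij}\bigl(\tau_i(\mathbf{r}) + \tau_j(\mathbf{r})\bigr)\Bigr|,
  \qquad
  \mathrm{SCF}(\mathbf{r}) \;=\; 1 - \sqrt{\,1 -
      \Bigl(\tfrac{1}{N^2}\textstyle\sum_{i,j} b_{ij}(\mathbf{r})\Bigr)^{2}}\,,
\]
\[
  I_{\mathrm{SCF}}(\mathbf{r}) \;=\; \mathrm{SCF}(\mathbf{r})\, I(\mathbf{r}).
\]
```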
Proceedings Volume Third International Conference on Computer Graphics, Image, and Virtualization (ICCGIV 2023), 129340Z (2023) https://doi.org/10.1117/12.3008055
A medical image fusion algorithm based on a multi-scale co-occurrence filter and ResNet152 is proposed. First, the source image is decomposed by a multi-scale co-occurrence filter, which effectively preserves edge structure while extracting detail and contour information at different scales. ResNet152 then extracts source-image features and generates initial weight maps for the base layer. An entropy map of the source image, obtained with the entropy function, is combined with the initial weight map to produce the final weight map used for base-layer fusion, improving image contrast. In detail-layer fusion, the maximum symmetric surround saliency algorithm is introduced to extract salient features of the intermediate base-layer image; the resulting saliency map is processed by a guided filter to obtain a weight map, and the detail-layer images are weighted and fused to exploit detail information effectively. Finally, the fused base and detail layers are reconstructed into the final fusion image. Experimental results show that, compared with existing fusion methods, the proposed method achieves better results in both subjective and objective evaluations.
Proceedings Volume Third International Conference on Computer Graphics, Image, and Virtualization (ICCGIV 2023), 1293410 (2023) https://doi.org/10.1117/12.3008005
Image description is an important research direction in deep learning: it uses computer vision and natural language processing techniques to turn the features extracted from an image into high-level semantic information and generate a textual description, i.e., to give computers the ability to "read pictures and talk". This paper collates the representative research methods that have emerged during the development of image description. Template- and retrieval-based methods were popular at the beginning of the research; later, as deep learning flourished, deep-learning-based image description became mainstream, starting from end-to-end encoder-decoder models and subsequently refined with attention mechanisms, and nowadays new techniques based on the Transformer and generative adversarial networks have greatly improved description accuracy.
Proceedings Volume Third International Conference on Computer Graphics, Image, and Virtualization (ICCGIV 2023), 1293411 (2023) https://doi.org/10.1117/12.3008558
With the upgrading of applications in computer vision and pattern recognition, dynamic face recognition technology has developed rapidly. At present, most face recognition pipelines in video surveillance perform only simple recognition and tracking on the front-end device, while the bulk of the complex operations, including face eigenvalue comparison and big-data processing, are completed on the server side. For video surveillance users, this approach cannot display face recognition and comparison results on the monitoring terminal in real time. To meet this demand, this paper proposes a face recognition video transmission scheme based on edge computing that synchronizes face recognition results between the capture side and the player side of the video. In this scheme, the video processing module of the terminal device performs video coding and face recognition on the video data simultaneously: using the face database on the terminal, it extracts and analyzes the face recognition results, records the corresponding coordinates in the video, and fuses the two streams through a synchronization packet mechanism to obtain a video data package containing both encoded data and face information. Tests on a verification system prove that the scheme solves the synchronization problem of face-frame display and tracking in live video, with high accuracy and real-time performance.
Proceedings Volume Third International Conference on Computer Graphics, Image, and Virtualization (ICCGIV 2023), 1293412 (2023) https://doi.org/10.1117/12.3007980
Image captioning takes an image as input to a computer model, and the model outputs an accurate textual caption of the image. This paper summarizes image captioning methods based on templates, retrieval, and deep learning, and reviews the development history of each. Among these, deep-learning-based methods are the most effective, with the encoder-decoder model being the most commonly used. The paper then introduces the datasets and evaluation metrics widely used in the field, analyzes the current challenges of image captioning, and discusses future research directions.
Proceedings Volume Third International Conference on Computer Graphics, Image, and Virtualization (ICCGIV 2023), 1293413 (2023) https://doi.org/10.1117/12.3008029
Neural Best Buddies (NBB) was proposed to identify semantically related features or geometrically similar parts between pairs of images that differ substantially in appearance and semantic category. However, NBB is not robust to rotated scenes. To address this limitation, we integrate SuperGlue, a matcher that can successfully match image pairs with significant rotation differences in the same scene, into NBB's nearest-neighbor search. This integration gives our improved NBB algorithm better rotational robustness in cross-domain image matching. Experimental results demonstrate that our approach outperforms NBB.
Kaizheng Li, Yusheng Hao, Qiaoqiao Li, Weilan Wang
Proceedings Volume Third International Conference on Computer Graphics, Image, and Virtualization (ICCGIV 2023), 1293414 (2023) https://doi.org/10.1117/12.3008395
To address the false and missed detections caused by the random distribution, varied scales, and diverse shapes of Tibetan text in natural scenes, this paper proposes a natural-scene Tibetan text detection algorithm based on feature enhancement with a spatial attention mechanism. The spatial attention mechanism is introduced into the feature pyramid network module of the feature extractor to capture richer local and global information and strengthen feature extraction; feature-kernel clustering better distinguishes adjacent text instances, and the predicted similarity vectors accurately aggregate text pixels to their corresponding text kernels, further improving detection accuracy and effectively reducing false and missed detections. Evaluated on the TCSD scene Tibetan dataset, the method reaches an F-measure of 81.09%, outperforming previous scene Tibetan text detection algorithms.
Zhi Yang, Xiaojun Dou, Sihang Zhang, Chuang Li, Mengxuan Li, Shaohua Wang, Te Li, Chang Liu, Bin Liu, et al.
Proceedings Volume Third International Conference on Computer Graphics, Image, and Virtualization (ICCGIV 2023), 1293415 (2023) https://doi.org/10.1117/12.3007962
To extract information on hidden dangers of external breakage to transmission lines, we first introduce lidar and optical remote sensing image data acquisition, processing, and information extraction technologies, then analyze the characteristics of transmission corridors captured by these two sensing methods and summarize their respective advantages and disadvantages. On this basis, we propose a method for extracting external-breakage hidden-danger information for transmission lines by combining remote sensing images with LiDAR point clouds. The method simultaneously acquires the texture and spatial three-dimensional data of external-breakage targets in the transmission corridor and forms high-quality three-dimensional model data, which facilitates the effective identification and accurate extraction of external-breakage hidden dangers.
Proceedings Volume Third International Conference on Computer Graphics, Image, and Virtualization (ICCGIV 2023), 1293416 (2023) https://doi.org/10.1117/12.3007960
To enhance the precision of small-target detection in remote sensing images, this paper proposes a target detection algorithm based on an improved Faster R-CNN. To strengthen feature extraction for object categories, the algorithm adopts the large network model EfficientNet as the backbone feature extraction network. An improved FPN (feature pyramid network) is introduced to reduce the loss of key small-target information caused by deep networks, enhancing the model's ability to extract small-target features from remote sensing images, coping better with complex backgrounds and drastic scale changes, and thereby reducing the miss rate. Finally, a CBAM attention module is applied to the feature maps output by the network to sharpen the model's focus on informative features and improve detection ability. Experiments on the NWPU VHR-10 dataset show that the proposed algorithm improves accuracy by 8.82%.
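CBAM itself is a published module (Woo et al., ECCV 2018), so the attention step can be shown as a compact reference sketch; exactly where the authors attach it to the detector's feature maps is our assumption, and the reduction ratio and kernel size below are the usual defaults rather than values from the paper.

```python
# Reference sketch of CBAM: channel attention followed by spatial attention.
import torch
import torch.nn as nn

class CBAM(nn.Module):
    def __init__(self, channels, reduction=16, kernel_size=7):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
        )
        self.spatial = nn.Conv2d(2, 1, kernel_size, padding=kernel_size // 2)

    def forward(self, x):
        b, c, _, _ = x.shape
        # channel attention from average- and max-pooled descriptors
        avg = self.mlp(x.mean(dim=(2, 3)))
        mx = self.mlp(x.amax(dim=(2, 3)))
        x = x * torch.sigmoid(avg + mx).view(b, c, 1, 1)
        # spatial attention from channel-wise average and max maps
        s = torch.cat([x.mean(dim=1, keepdim=True), x.amax(dim=1, keepdim=True)], dim=1)
        return x * torch.sigmoid(self.spatial(s))
```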
Proceedings Volume Third International Conference on Computer Graphics, Image, and Virtualization (ICCGIV 2023), 1293417 (2023) https://doi.org/10.1117/12.3008203
Efficient spatial-temporal feature extraction from input video streams is crucial for dynamic gesture recognition. In video classification, convolutional neural networks (CNNs) are widely used as feature extractors, while recurrent neural networks (RNNs) are commonly employed for sequence modeling. However, RNNs cannot model global dependencies and have a limited attention span in the temporal dimension, which becomes a performance bottleneck for dynamic gestures that are sensitive to temporal correlations. To address this issue, this paper proposes a dynamic gesture recognition model called R(2+1)D-Transformer, a Transformer-based approach that focuses on global modeling. First, the R(2+1)D network is employed as a spatial-temporal feature extractor to capture spatiotemporal information. Then, a self-attention-based Transformer maps the spatiotemporal feature sequence to a semantic representation of the gesture movement, considering both temporal and spatial context. Finally, the gesture recognition results are obtained through an MLP classification head. Experiments on two publicly available dynamic gesture datasets, IPN-Hand and NvGesture, demonstrate the effectiveness and potential of the proposed R(2+1)D-Transformer, providing valuable insights and a reference for further research and applications in dynamic gesture recognition.
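A minimal skeleton of the described pipeline, assuming torchvision's r2plus1d_18 as the R(2+1)D extractor; the layer counts, head sizes, and pooling choices below are guesses for illustration, not the authors' configuration.

```python
# Hypothetical skeleton: R(2+1)D features -> Transformer encoder -> MLP head.
import torch
import torch.nn as nn
from torchvision.models.video import r2plus1d_18

class R2Plus1DTransformer(nn.Module):
    def __init__(self, num_classes, d_model=512, n_heads=8, n_layers=4):
        super().__init__()
        backbone = r2plus1d_18(weights=None)
        # keep everything up to (but excluding) the global pool and classifier
        self.features = nn.Sequential(*list(backbone.children())[:-2])
        layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, n_layers)
        self.head = nn.Linear(d_model, num_classes)

    def forward(self, clip):                      # clip: (B, 3, T, H, W)
        f = self.features(clip)                   # (B, 512, T', H', W')
        f = f.mean(dim=(3, 4)).transpose(1, 2)    # spatial pool -> (B, T', 512)
        f = self.encoder(f)                       # global temporal self-attention
        return self.head(f.mean(dim=1))           # pool over time, classify
```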
Proceedings Volume Third International Conference on Computer Graphics, Image, and Virtualization (ICCGIV 2023), 1293418 (2023) https://doi.org/10.1117/12.3008196
To address the difficulties of small-target detection and of classifying adhered and overlapping parts in the classification and recognition of mechanical parts, this paper proposes an automatic classification and recognition algorithm based on an improved YOLOv5 convolutional neural network. The feature pyramid module PANet is added to the traditional YOLOv5 network structure to strengthen its representational ability, while an NMS algorithm and an image segmentation algorithm improve boundary segmentation, detection accuracy, and recall. The experimental results show that both the traditional YOLOv5 and the improved YOLOv5 can classify and identify mechanical parts, and that the improved YOLOv5 offers better detection precision and accuracy, faster detection speed, and better network robustness.
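Greedy IoU-based NMS, one of the post-processing tools the abstract names, is standard and easy to sketch; the threshold below is a typical default, not the paper's.

```python
# Greedy non-maximum suppression over (x1, y1, x2, y2) boxes.
import numpy as np

def nms(boxes, scores, iou_thresh=0.5):
    order = scores.argsort()[::-1]      # highest-scoring boxes first
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(i)
        # IoU of the top box against all remaining boxes
        x1 = np.maximum(boxes[i, 0], boxes[order[1:], 0])
        y1 = np.maximum(boxes[i, 1], boxes[order[1:], 1])
        x2 = np.minimum(boxes[i, 2], boxes[order[1:], 2])
        y2 = np.minimum(boxes[i, 3], boxes[order[1:], 3])
        inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
        area_i = (boxes[i, 2] - boxes[i, 0]) * (boxes[i, 3] - boxes[i, 1])
        areas = ((boxes[order[1:], 2] - boxes[order[1:], 0])
                 * (boxes[order[1:], 3] - boxes[order[1:], 1]))
        iou = inter / (area_i + areas - inter)
        order = order[1:][iou <= iou_thresh]   # drop heavily overlapping boxes
    return keep
```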
Proceedings Volume Third International Conference on Computer Graphics, Image, and Virtualization (ICCGIV 2023), 1293419 (2023) https://doi.org/10.1117/12.3008023
Factors such as turbid water and unstable light make terrestrial saliency detection models less effective when applied to underwater vehicles, degrading performance. We offer an underwater saliency detection model based on an improved attentional feedback mechanism to overcome these issues. Top- and bottom-level features are effectively fused by a cascaded feedback decoder built from cross-feature modules with added channel-spatial attention, after which a residual refinement module performs further refinement. Training and testing use open underwater datasets. Comparative experiments against four general saliency detection methods and two underwater saliency detection methods on four underwater image datasets demonstrate that our method is superior, proving the model's viability and efficacy.
Proceedings Volume Third International Conference on Computer Graphics, Image, and Virtualization (ICCGIV 2023), 129341A (2023) https://doi.org/10.1117/12.3008483
In gait recognition, occlusion by pedestrians' clothes and backpacks prevents discriminative gait features from being extracted; to improve accuracy by capturing more information, such as the local features of the input gait contour map, we propose an optimized GaitPart model for gait recognition. Because the triplet loss function used by the original GaitPart suffers from slow convergence, unstable performance, and easy overfitting during training, we propose a joint loss function with L2 regularization to co-supervise the training of the gait recognition network, improving its discriminative ability and hence its recognition accuracy. Extensive training on the public CASIA-B gait dataset shows cross-view recognition accuracies of 96.592% for pedestrians walking normally, 92.565% for pedestrians carrying backpacks, and 80.825% under clothing occlusion; Rank-1 accuracy is greatly improved.
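The abstract names the ingredients of the joint loss but not the weights; a minimal sketch of triplet plus cross-entropy supervision with an explicit L2 penalty might look like the following, where the coefficients and function name are placeholders.

```python
# Hypothetical joint supervision: triplet + cross-entropy + explicit L2 penalty.
import torch
import torch.nn.functional as F

def joint_loss(logits, labels, anchor, positive, negative, model,
               margin=0.2, ce_weight=1.0, l2_weight=1e-4):
    triplet = F.triplet_margin_loss(anchor, positive, negative, margin=margin)
    ce = F.cross_entropy(logits, labels)
    # explicit L2 regularization over all trainable parameters
    l2 = sum(p.pow(2).sum() for p in model.parameters())
    return triplet + ce_weight * ce + l2_weight * l2
```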
Proceedings Volume Third International Conference on Computer Graphics, Image, and Virtualization (ICCGIV 2023), 129341B (2023) https://doi.org/10.1117/12.3008015
Depth image estimation is an important technology in integral imaging display systems, and the quality of estimated depth images has attracted wide attention from researchers. In recent years, more and more researchers have extracted depth images from RGB images using deep learning. To obtain high-quality depth images and solve the problems of unclear edges and incomplete outlines, we add a semantic segmentation module (SSM) to the depth estimation network (DEN) so that depth estimation and semantic segmentation (DE&SS) share parameters. The SSM extracts multi-level semantic feature information, effectively fuses global and local features, and guides depth estimation with richer semantic information to improve depth image quality. Experiments are carried out on the general datasets NYU-Depth V2 and KITTI. The results show that the depth images obtained by the proposed method have clearer edges and more complete outlines than those obtained by other advanced methods.
Proceedings Volume Third International Conference on Computer Graphics, Image, and Virtualization (ICCGIV 2023), 129341C (2023) https://doi.org/10.1117/12.3008209
Surface defect detection is an indispensable part of industrial production for guaranteeing product quality. With the rapid development of deep learning, automatic surface defect detection is gradually being applied in a variety of industrial scenarios. However, defect detection still faces challenges such as diverse defect types and varied defect sizes and texture structures. To address these problems, we propose a local and global feature fusion network (LGFNet) for surface defect segmentation. The network adopts a U-shaped encoder-decoder structure with a convolution-based local feature extraction unit (LFE) and a transformer-based global feature extraction unit (GFE). The LFE uses multi-head convolutional attention to capture the detailed textures of defects, and the GFE uses a dual attention module to capture global contextual information. LGFNet cross-cascades the two feature extraction units to obtain multi-scale defect features, adapting the segmentation network to different defect types. Experiments on two widely used surface defect datasets (NEU-Seg, Road Defect) demonstrate that the network can accurately segment defects of multiple shapes and sizes.
Virtual Reality Technology and Big Data Processing
Proceedings Volume Third International Conference on Computer Graphics, Image, and Virtualization (ICCGIV 2023), 129341D (2023) https://doi.org/10.1117/12.3008184
As the main carrier of Metaverse VR technology, the head-mounted display system plays a vital role in leading this technological trend. This paper implements a head-mounted display system based on Metaverse VR technology, built mainly on a high-resolution LCD screen, a microcontroller, and embedded programming. The system collects data from the gyroscope and orientation sensors in the head-mounted terminal to switch the viewing angle and scene on a custom Android system, and realizes human-system interaction through a handheld terminal based on an STM32 microcontroller. It isolates the user's vision and hearing from the outside world, guiding the user into a sense of being in a virtual environment. The image generated by the small display in the helmet is magnified by a Fresnel lens, and a three-dimensional impression arises in the mind as the brain fuses the differing images presented to the left and right eyes.
Proceedings Volume Third International Conference on Computer Graphics, Image, and Virtualization (ICCGIV 2023), 129341E (2023) https://doi.org/10.1117/12.3008313
This paper presents a novel algorithm for fast approximation of scattered data that leverages the parallelism of the Graphics Processing Unit (GPU) for high performance. The algorithm transforms all key computational steps into corresponding CUDA kernel functions, allowing thousands of threads to execute simultaneously. We also employ powerful optimization techniques in our implementation to avoid atomic operations and enhance efficiency. Experimental results show that our algorithm achieves significant speedup compared with conventional CPU-based solutions. Furthermore, using adaptive spatial subdivision, we extend the algorithm to handle large-scale datasets, and it adapts well even to arbitrary datasets with highly varying data densities.
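The paper's CUDA kernels are not reproduced here. As a CPU reference for the per-query work such a kernel would assign to one thread, here is a vectorized inverse-distance (Shepard) approximation, one common scattered-data scheme; the abstract does not say which scheme the authors accelerate, so treat this as an illustrative stand-in.

```python
# CPU reference: inverse-distance (Shepard) approximation of scattered data.
import numpy as np

def shepard_idw(points, values, queries, power=2.0, eps=1e-12):
    """points: (N, d) sample sites, values: (N,), queries: (Q, d)."""
    # (Q, N) matrix of squared distances from every query to every sample
    d2 = ((queries[:, None, :] - points[None, :, :]) ** 2).sum(-1)
    w = 1.0 / (d2 ** (power / 2) + eps)            # inverse-distance weights
    return (w * values[None, :]).sum(1) / w.sum(1)  # weighted average per query
```

Each query's weighted average is independent of the others, which is exactly the structure that maps one query to one GPU thread.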
Proceedings Volume Third International Conference on Computer Graphics, Image, and Virtualization (ICCGIV 2023), 129341F (2023) https://doi.org/10.1117/12.3008018
3D ink-wash scene rendering is a kind of non-photorealistic rendering, and this non-photorealistic 3D ink effect is the focus of this paper. The paper first introduces the production characteristics of 3D ink-wash rendering and Unreal Engine 5. Based on the characteristics of traditional Chinese ink-wash art, a 3D ink-wash rendering technique referencing hand-drawn texture maps and cartoon strokes is designed, and the workflow for achieving the 3D ink-wash effect in Unreal Engine 5 is realized. The proposed design method is then verified in practice. Finally, we conclude and summarize the design concept and implementation, and look forward to future applications of 3D ink-wash scenes produced in UE.
Proceedings Volume Third International Conference on Computer Graphics, Image, and Virtualization (ICCGIV 2023), 129341G (2023) https://doi.org/10.1117/12.3007978
In this paper, we design a Kinect + Unity-based AR human-computer interaction algorithm for chemical experiments. The innovations of this paper are: (1) overcoming the limitations of traditional gesture recognition algorithms, namely a single gesture type and insufficiently accurate recognition, and improving the robustness of experimental operation; (2) adding an interaction method that combines gesture recognition with speech recognition, replacing the gesture-only way of determining experimental steps and avoiding some errors that occur during gesture recognition.
Proceedings Volume Third International Conference on Computer Graphics, Image, and Virtualization (ICCGIV 2023), 129341H (2023) https://doi.org/10.1117/12.3008441
To quickly demonstrate the effectiveness of stage designs for musical dramas and provide a basis for stage control, this research on musical drama stage control systems builds a 3D rendering and control engine and establishes a dynamic 3D simulation model of the spatiotemporal architecture, presenting the comprehensive, nonlinear effects of creative musical staging and outputting data such as the state and position of the creative scenery during simulation for precise stage control. Through behavioral modeling of the three-dimensional digital stage, the system completes spatial analysis and calculation, realizes a WYSIWYG stage design presentation algorithm, and delivers high-fidelity display and effect evaluation of stage creativity. Based on the mechanical characteristics of the three-dimensional digital stage, the system implements a video editing method based on projection decomposition, which segments and recombines video data with parallel, multi-core fast computation and generates video files matched to the stage motion. Engineering practice has proven that the system effectively improves the practicality, reliability, and efficiency of stage design.
Proceedings Volume Third International Conference on Computer Graphics, Image, and Virtualization (ICCGIV 2023), 129341I (2023) https://doi.org/10.1117/12.3008439
Existing ways of displaying grottoes suffer from a weak sense of historical immersion and poor user interaction. A virtual-reality-based grotto model display system can effectively solve these problems. The system uses 3D modeling technology to define the scenes and models, and completes scene selection and model rendering through development in the Unity3D engine. In testing, the average processing time of the system's operations is less than 0.1 seconds, and under normal circumstances the error probability is only 0.2%. The system gives users a virtual historical experience and the ability to interact with the cave models, providing a new and effective way to visit the caves.
Proceedings Volume Third International Conference on Computer Graphics, Image, and Virtualization (ICCGIV 2023), 129341J (2023) https://doi.org/10.1117/12.3008028
As one of the most fundamental areas in computer vision, pedestrian detection aims to locate each object instance with a bounding box representing its position and boundary in an input image. Recently, pedestrian detection has attracted extensive attention in academia and industry for its essential role in high-level vision research and downstream tasks. In this paper, we build a pedestrian detection algorithm based on YOLOv5s to locate pedestrians in the input image. To address complex backgrounds and bodies that are hard to detect under occlusion, an improved Inception module is added to YOLOv5s to extract feature information at different scales; CBAM attention is added to the CSP structure; and the EIoU loss function is used to improve the accuracy and generalization of the network and achieve accurate bounding-box regression. The model is trained on the pre-processed VOC 2012 dataset. Compared with the original YOLOv5s model, pedestrian recognition accuracy and average precision are improved by 2.5% and 3.1%, respectively.
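EIoU is a published loss (Zhang et al., 2021): IoU plus normalized penalties on the center distance and on the width and height differences. A direct transcription for corner-format boxes:

```python
# EIoU loss for boxes in (x1, y1, x2, y2) format, shape (N, 4).
import torch

def eiou_loss(pred, target, eps=1e-7):
    # intersection and union
    lt = torch.max(pred[:, :2], target[:, :2])
    rb = torch.min(pred[:, 2:], target[:, 2:])
    wh = (rb - lt).clamp(min=0)
    inter = wh[:, 0] * wh[:, 1]
    area_p = (pred[:, 2] - pred[:, 0]) * (pred[:, 3] - pred[:, 1])
    area_t = (target[:, 2] - target[:, 0]) * (target[:, 3] - target[:, 1])
    iou = inter / (area_p + area_t - inter + eps)
    # smallest enclosing box
    elt = torch.min(pred[:, :2], target[:, :2])
    erb = torch.max(pred[:, 2:], target[:, 2:])
    cw, ch = erb[:, 0] - elt[:, 0], erb[:, 1] - elt[:, 1]
    c2 = cw ** 2 + ch ** 2 + eps
    # squared center distance plus width/height differences
    dc = ((pred[:, :2] + pred[:, 2:]) / 2
          - (target[:, :2] + target[:, 2:]) / 2).pow(2).sum(1)
    dw = ((pred[:, 2] - pred[:, 0]) - (target[:, 2] - target[:, 0])) ** 2
    dh = ((pred[:, 3] - pred[:, 1]) - (target[:, 3] - target[:, 1])) ** 2
    return (1 - iou + dc / c2 + dw / (cw ** 2 + eps) + dh / (ch ** 2 + eps)).mean()
```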
Proceedings Volume Third International Conference on Computer Graphics, Image, and Virtualization (ICCGIV 2023), 129341K (2023) https://doi.org/10.1117/12.3007966
Because the color tones of ancient buildings are inconsistent, model construction accuracy is low. To improve the fine reconstruction of ancient buildings, an adaptive fusion model of multi-angle LiDAR measurement data for fine virtual reconstruction is proposed. Fine data are obtained through data collection and preprocessed; judgment conditions are set based on the processing results; fine feature information of the ancient building is extracted from the multi-angle LiDAR measurements; and fine virtual reconstruction of the building is achieved. The experimental results show that the scenes reconstructed by the proposed algorithm are clear and complete, and that the reconstructed length, width, and height of the buildings are consistent with actual measurements, with high accuracy and good results.
Proceedings Volume Third International Conference on Computer Graphics, Image, and Virtualization (ICCGIV 2023), 129341L (2023) https://doi.org/10.1117/12.3008031
With the development of subsurface scattering technology in recent years, skin rendering has become increasingly realistic. However, specular rendering, an important part of skin rendering, still mostly uses the K/S model based on Beckmann's normal distribution function, which makes skin specular reflection overly concentrated and reduces realism. To solve this problem, this paper proposes an improved normal distribution function, GGX-Improved, based on the GGX normal distribution function. By introducing a parameter K into the GGX normal distribution function, the tail length of the K/S model's specular reflection can be adjusted while preserving the shape of the lobe, making the skin's specular reflection softer and more natural and enhancing the realism of skin rendering.
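The paper's exact K parameterization is not given in the abstract. For illustration, the sketch below shows the standard GGX NDF next to the generalized Trowbridge-Reitz (GTR) form, whose exponent plays the same tail-lengthening role that the abstract ascribes to K; this is a known generalization, not the authors' formula.

```python
# Standard GGX NDF and a tail-controlled GTR variant for comparison.
import numpy as np

def ggx_ndf(n_dot_h, alpha):
    """GGX / Trowbridge-Reitz normal distribution function."""
    a2 = alpha * alpha
    d = n_dot_h * n_dot_h * (a2 - 1.0) + 1.0
    return a2 / (np.pi * d * d)

def gtr_ndf(n_dot_h, alpha, gamma):
    """Generalized Trowbridge-Reitz: smaller gamma lengthens the tail;
    gamma = 2 recovers GGX. Normalization constant omitted for gamma != 2."""
    a2 = alpha * alpha
    d = n_dot_h * n_dot_h * (a2 - 1.0) + 1.0
    return a2 / (np.pi * d ** gamma)
```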
Proceedings Volume Third International Conference on Computer Graphics, Image, and Virtualization (ICCGIV 2023), 129341M (2023) https://doi.org/10.1117/12.3007994
A reversible data hiding (RDH) method based on double embedding within blocks is proposed to improve the embedding capacity of existing data hiding algorithms and to generate stego-images with high visual quality. First, the original image is divided into blocks, and different reversible embedding methods are applied to different parts of the sub-blocks. To improve capacity, the secret information is pre-processed and embedded hierarchically, raising the utilization of each embeddable point. The prediction method is also improved to produce a steeper error histogram with more embeddable pixels. In addition, a suitable combination of prediction and embedding produces a compensating reduction effect during double embedding, reducing the number of invalid shifts and improving carrier image quality. Experiments verify that the proposed method outperforms some state-of-the-art RDH work, with higher embedding capacity and lower distortion: for the standard Lena image, a peak signal-to-noise ratio of up to 66.21 dB is obtained at an embedding capacity of 10000 bits.
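The block partitioning and double embedding are specific to the paper, but the underlying prediction-error-expansion mechanism is standard. Below is a toy, genuinely reversible variant on a checkerboard lattice: modified pixels are predicted only from untouched neighbors, so a decoder can recompute the same predictions (overflow handling and payload-termination bookkeeping omitted).

```python
# Toy prediction-error expansion with histogram shifting (checkerboard layout).
import numpy as np

def pee_embed(img, bits):
    """Error bins {0, -1} carry one payload bit each; other bins are shifted."""
    out = img.astype(np.int32).copy()
    k = 0
    for i in range(1, out.shape[0] - 1):
        # visit only pixels whose four neighbours are never modified
        for j in range(1 + i % 2, out.shape[1] - 1, 2):
            pred = (out[i-1, j] + out[i+1, j] + out[i, j-1] + out[i, j+1]) // 4
            e = int(out[i, j]) - pred
            if e in (0, -1) and k < len(bits):
                out[i, j] = pred + 2 * e + bits[k]   # expand bin, embed one bit
                k += 1
            elif e >= 1:
                out[i, j] += 1                       # shift positive errors outward
            elif e <= -2:
                out[i, j] -= 1                       # shift negative errors outward
    return out, k                                    # k = number of bits embedded
```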
Proceedings Volume Third International Conference on Computer Graphics, Image, and Virtualization (ICCGIV 2023), 129341N (2023) https://doi.org/10.1117/12.3007970
To understand the international research progress and frontiers of virtual reality technology applied to landscape architecture, the research status and frontiers of the field were analyzed using the Web of Science Core Collection database, an online bibliometric analysis platform, and CiteSpace software. The results show that the overall number of articles is rising; China and the United States publish the most and cooperate most closely with each other; Beijing University of Posts and Telecommunications has the largest number of publications; and IEEE Access, Landscape and Urban Planning, and IEICE Transactions on Communications are the major international journals. The international research hotspots in this field are mainly virtual reality, design, and systems, and the current research frontiers are the Internet, algorithms, computer architecture, and the Internet of Things. As virtual reality technology develops further, it will bring more possibilities for landscape architecture and advance the discipline.
Wenqi Huang, Yang Wu, Zhuojun Cai, Ruiye Zhou, Qunsheng Zeng, Lingyu Liang, Jianing Shang, Xuanang Li
Proceedings Volume Third International Conference on Computer Graphics, Image, and Virtualization (ICCGIV 2023), 129341O (2023) https://doi.org/10.1117/12.3008050
The power system is constantly exposed to outdoor environments, making it susceptible to invasion by foreign bodies such as tree branches and garbage bags. Most current deep-learning-based detection methods assume that foreign bodies are present in the image, and their detection accuracy still has room for improvement. In this paper, a foreign body detection method for the power system is proposed based on Inception-V3 and Transformer. The method first classifies inspection images by whether foreign bodies are present, and then detects the foreign bodies that have invaded the power system. It does not rely on pre-defined anchors, converting object detection into a direct bounding-box prediction problem, which greatly streamlines existing detection methods. Experimental results on real datasets show that our method improves the accuracy and efficiency of foreign body detection compared with detection algorithms based on Faster R-CNN and YOLOv3.
Wenqi Huang, Zhuojun Cai, Ruiye Zhou, Qunsheng Zeng, Yang Wu, Lingyu Liang, Jianing Shang, Xuanang Li
Proceedings Volume Third International Conference on Computer Graphics, Image, and Virtualization (ICCGIV 2023), 129341P (2023) https://doi.org/10.1117/12.3008040
To address the low efficiency, insufficient accuracy, and high miss rate of traditional inspection methods for surface defects on the ceramic insulators of transmission towers, this paper introduces a UAV-based intelligent inspection solution built on a deformable U-Net network to effectively detect and recognize such defects. Deformable convolution operators optimize the U-Net's convolution layers, extending the perceptual range of the convolution kernel to better preserve defect detail. Meanwhile, a full-scale skip-connection model integrates high- and low-dimensional feature information to further improve the accuracy of defect feature recognition. The experimental results show that the solution achieves an identification accuracy of 97.5%, an average precision of 95.55%, and an average intersection over union (IoU) of 91.67% in ceramic insulator surface defect detection. Compared with the traditional U-Net method, the proposed solution improves inspection accuracy by 7.6%.
Proceedings Volume Third International Conference on Computer Graphics, Image, and Virtualization (ICCGIV 2023), 129341Q (2023) https://doi.org/10.1117/12.3008408
Knowledge graph reasoning is a hot research direction aimed at solving the many challenges and pain points of knowledge graphs. This paper centers on temporal data prediction and presents a multi-level framework that leverages causal knowledge graphs, seamlessly integrating them with temporal data to enhance prediction accuracy. The framework comprises two key components: causal knowledge graph construction and multi-level gated graph neural network prediction. By representing facts and relationships within the domain as causal knowledge graphs, the framework strengthens the temporal data prediction model. The proposed design can give researchers in the field a better understanding of domain knowledge and achieve accurate prediction of temporal data.
Proceedings Volume Third International Conference on Computer Graphics, Image, and Virtualization (ICCGIV 2023), 129341R (2023) https://doi.org/10.1117/12.3008238
To address the problems that the current YOLOv5 target detection algorithm has many parameters, a complex network structure, and high hardware requirements for training, this paper proposes a lightweight algorithm based on YOLOv5, CNS-YOLO. First, the ConvNeXt structure is used to optimize the neck network of YOLOv5; second, ShuffleNetV2 is used to improve the backbone, and the feature extraction and feature fusion networks are reconstructed. These two structural improvements reduce the excessive parameter count of YOLOv5. Finally, group convolution and an attention mechanism are added to strengthen information extraction and suppress background noise, improving the algorithm's detection speed and accuracy. Results on the RSOD dataset show that CNS-YOLO's mAP@0.5 increases by 2.3 percentage points over the baseline, FLOPs decrease by 12.8G, and the generated model file shrinks by 10.6M. With fewer model parameters, mAP@0.5 still improves, indicating that the algorithm achieves good improvements in all aspects and enhances target detection.
Proceedings Volume Third International Conference on Computer Graphics, Image, and Virtualization (ICCGIV 2023), 129341S (2023) https://doi.org/10.1117/12.3008179
Mesh simplification is a fundamental problem in geometry processing. Since general simplification algorithms are difficult to parallelize, the main challenge is to process meshes of tens of millions of faces quickly and with low memory consumption while maintaining high-quality output. In this paper, we propose a multi-threaded algorithmic framework for mesh simplification. First, we design a robust and fast serial simplification model based on edge collapse with low memory consumption: we implement a simplification algorithm based on Probabilistic QEM, take strict measures to protect the mesh topology, and use a greedy strategy to speed up the algorithm. We then design a parallel framework based on divide-and-conquer followed by global optimization, which executes much faster with the same memory consumption as the serial method while maintaining high-quality output. Experiments show that our parallel algorithm outperforms current open-source software in speed and memory consumption, and maintains good output on all tested models.
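Probabilistic QEM extends classic QEM with Gaussian uncertainty on positions and normals; the sketch below shows only the classic deterministic core that both variants share (per-face plane quadrics and the optimal-position solve for an edge collapse), with hypothetical function names.

```python
# Classic QEM core: plane quadrics and optimal edge-collapse position.
import numpy as np

def face_quadric(v0, v1, v2):
    """Fundamental quadric K = p p^T of a triangle's supporting plane, p = [n, d]."""
    n = np.cross(v1 - v0, v2 - v0)
    n = n / np.linalg.norm(n)
    p = np.append(n, -np.dot(n, v0))        # plane equation: n . x + d = 0
    return np.outer(p, p)                   # a vertex quadric is the sum over faces

def collapse_cost(Q, fallback):
    """Cost and optimal position for collapsing an edge with summed quadric Q.
    Falls back to a given point (e.g. the edge midpoint) if Q is near-singular."""
    A = Q.copy()
    A[3] = [0.0, 0.0, 0.0, 1.0]             # solve grad(v^T Q v) = 0 in homogeneous coords
    try:
        v = np.linalg.solve(A, [0.0, 0.0, 0.0, 1.0])
    except np.linalg.LinAlgError:
        v = np.append(fallback, 1.0)
    return float(v @ Q @ v), v[:3]
```

Collapses are then processed greedily from a priority queue keyed on this cost, which is the serial loop the paper's divide-and-conquer framework parallelizes across mesh partitions.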
Proceedings Volume Third International Conference on Computer Graphics, Image, and Virtualization (ICCGIV 2023), 129341T (2023) https://doi.org/10.1117/12.3008014
This work addresses the limitations of traditional water-coastline segmentation and extraction technology. By comparing and analyzing different neural network algorithms, a coastline recognition and extraction method based on a VGG16-U-Net network is proposed to improve recognition ability and accuracy on high-resolution images. The study uses a UAV aerial photography dataset and trains with a pre-trained model and an improved VGG+U-Net enhanced feature extraction network, finally marking the coastline on the original image and generating the corresponding picture. The paper first introduces the research background and related technologies, then analyzes the structure and principle of the algorithm model, and elaborates the implementation, modification, and testing process. The innovation of this paper is to build on the symmetric structure of the U-Net network, replacing the original U-Net backbone feature extraction network with the VGG16 model and applying double upsampling followed by feature fusion, so that the resulting feature layer matches the height and width of the original image, which improves recognition accuracy.
Proceedings Volume Third International Conference on Computer Graphics, Image, and Virtualization (ICCGIV 2023), 129341U (2023) https://doi.org/10.1117/12.3008107
Recognizing and segmenting artistic targets in Chinese paintings is an important method for analyzing and studying this art form. To enrich the expressive forms and cultural connotations of Chinese paintings and promote the modernization of traditional culture, this paper proposes a segmentation method for animals in Chinese paintings. First, a Swin Transformer detects artistic targets such as animals and crops the image blocks of interest. Then, an Attention U-Net model performs high-precision segmentation of the animals. Experimental results demonstrate that the algorithm successfully and accurately segments 19 species of animals in the sample dataset, including the artistic targets in traditional Chinese flower-and-bird paintings. These results can be applied to the digital study of Chinese paintings, providing technical references for the inheritance and development of traditional Chinese painting.
Proceedings Volume Third International Conference on Computer Graphics, Image, and Virtualization (ICCGIV 2023), 129341V (2023) https://doi.org/10.1117/12.3008104
Chinese paintings are generally divided into calligraphy, white drawing, and brushwork, often painted with ink, metallic pigments, and vegetable pigments on physical carriers with distinct textures such as rice paper and cloth, and have a distinctive artistic style. In this paper, we propose a data augmentation method for Chinese-style paintings that generates digital images matching their characteristics while remaining as semantically realistic as possible. First, we train SinGAN on a single Chinese painting and generate 50 augmented results, which reproduce the image texture and brush-stroke style of the original. The Repaint model is then used to semantically refine the augmented results so that they appear more realistic from a subjective perspective. Finally, we verify the effect of the augmentation on an image classification task based on VGG16 and InceptionV3, comparing traditional augmentation techniques with the deep-learning augmentation proposed here. The experimental results show that a training set processed with the deep-learning augmentation improves the classification model's prediction accuracy, and that combining traditional and deep-learning augmentation improves it further. This indicates that deep-learning data augmentation can improve the efficiency of image tasks and avoid overfitting, and can be used in the study of the digitization of Chinese paintings.
Lingyi Chi, Liuping Feng, Yangquan Zhou, Guokai Wang
Proceedings Volume Third International Conference on Computer Graphics, Image, and Virtualization (ICCGIV 2023), 129341W (2023) https://doi.org/10.1117/12.3008020
Many reversible data hiding algorithms only scan images in a fixed sequence, leading to low embedding capacity and poor stego-image quality. This paper proposes an adaptive algorithm that divides images into smooth and texture regions based on complexity, and uses different embedding algorithms for each region. Three predictors are used to generate asymmetric histograms and increase embedding capacity through two rounds of embedding in smooth regions. Pixel value ordering (PVO) is used for embedding fewer data in texture regions to maintain image quality. Experimental results show that the algorithm has high embedding capacity and low visual distortion.
Proceedings Volume Third International Conference on Computer Graphics, Image, and Virtualization (ICCGIV 2023), 129341X (2023) https://doi.org/10.1117/12.3008404
People's pursuit of beauty keeps rising, bringing new ideas to experiential ceramic space design, which has evolved from a single artistic style to one integrating shape, color, and decoration. Using virtual reality technology for creative production and interactive display in modern ceramic art spaces can make it a new and unique art form. The purpose of this paper's research on ceramic art space design with virtual reality is to strengthen the sense of ceramic technology and art through existing technical means and achieve effective visual communication. The paper mainly uses experimental simulation, analyzing the factors involved in experiential ceramic art space design through component analysis. The experimental results show that the response time of each module in the system stays within 2-3 seconds, and that the system's computing capability can be improved at the algorithmic level.
Proceedings Volume Third International Conference on Computer Graphics, Image, and Virtualization (ICCGIV 2023), 129341Y (2023) https://doi.org/10.1117/12.3008108
Traditional lane line detection algorithms are strongly affected by lighting, perform poorly when the contrast between the lane lines and the road surface is low, and mistakenly split a single line into multiple straight segments during local detection. To address these problems, a color-enhanced lane line detection algorithm in the HSL color space is proposed. First, the RGB image is converted into an HSL image, and brightness enhancement, color extraction, and combined channel thresholding are performed to reduce the influence of illumination and facilitate the subsequent capture of the lane lines of interest. The image is then warped to a bird's-eye view, which simplifies segmentation and subsequent processing. Next, an improved line segment extraction algorithm with better thresholds and line-segment clustering improves extraction performance. Finally, the lane lines are synthesized by curve fitting. The experimental results show that the proposed method performs well, has low environmental requirements and high recognition accuracy, and fully meets the requirements of lane line detection.
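The color-space stage is concrete enough to sketch with OpenCV (which names the space HLS rather than HSL); the thresholds and warp corner points below are placeholders assuming a 1280x720 frame, not values from the paper.

```python
# Sketch of the colour stage: HLS conversion, lightness boost, thresholding,
# then a perspective warp to a bird's-eye view.
import cv2
import numpy as np

def lane_mask(bgr):
    hls = cv2.cvtColor(bgr, cv2.COLOR_BGR2HLS)
    h, l, s = cv2.split(hls)
    l = cv2.equalizeHist(l)                                  # brightness enhancement
    mask = ((l > 180) | (s > 120)).astype(np.uint8) * 255    # illustrative thresholds
    # placeholder corner points mapping the road trapezoid to a rectangle
    src = np.float32([[560, 460], [720, 460], [1180, 700], [100, 700]])
    dst = np.float32([[200, 0], [1080, 0], [1080, 720], [200, 720]])
    M = cv2.getPerspectiveTransform(src, dst)
    return cv2.warpPerspective(mask, M, (1280, 720))         # bird's-eye binary mask
```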
Proceedings Volume Third International Conference on Computer Graphics, Image, and Virtualization (ICCGIV 2023), 129341Z (2023) https://doi.org/10.1117/12.3008205
Object detection is an important research area in computer vision. Mainstream object detection is generally performed on class-balanced datasets and has made great progress; however, data in real scenarios usually follow long-tailed distributions, and the imbalance in the number of samples per class causes a significant drop in detection performance. Most current long-tailed object detection algorithms improve the performance of tail classes at the expense of accuracy on head and common classes. In this paper, we adopt an Adaptive Effective Class Suppression Loss (AECSL) that adjusts the model's attention to tail classes by allocating different weight costs to different classes during training. Comprehensive experiments on the challenging LVIS benchmark show that AECSL achieves competitive results: 28.6% segmentation AP and 28.5% box AP on LVIS v0.5, and 27.5% segmentation AP and 27.6% box AP on LVIS v1.0 with a ResNet-101 backbone.
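AECSL itself is the paper's contribution and is not reproduced here. As a generic illustration of the underlying idea of per-class cost allocation, here is the widely used "effective number of samples" re-weighting (Cui et al., CVPR 2019) applied to cross-entropy; it stands in for, and is not, the AECSL formulation.

```python
# Generic per-class re-weighting: "effective number of samples" weights.
import torch
import torch.nn.functional as F

def class_balanced_weights(counts, beta=0.999):
    """counts: per-class sample counts; rare classes receive larger weights."""
    effective = 1.0 - torch.pow(beta, counts.float())
    w = (1.0 - beta) / effective
    return w * len(counts) / w.sum()        # normalise so weights average to 1

def weighted_ce(logits, labels, counts):
    return F.cross_entropy(logits, labels, weight=class_balanced_weights(counts))
```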