Proceedings Volume Third International Conference on Computer Vision and Information Technology (CVIT 2022), 1259001 (2023) https://doi.org/10.1117/12.2674350
This PDF file contains the front matter associated with SPIE Proceedings Volume 12590, including the Title Page, Copyright information, Table of Contents, and Conference Committee information.
Access to the requested content is limited to institutions that have purchased or subscribe to SPIE eBooks.
You are receiving this notice because your organization may not have SPIE eBooks access.*
*Shibboleth/Open Athens users: please sign in to access your institution's subscriptions.
To obtain this item, you may purchase the complete book in print or electronic format on SPIE.org.
Proceedings Volume Third International Conference on Computer Vision and Information Technology (CVIT 2022), 1259002 (2023) https://doi.org/10.1117/12.2670011
Arbitrary shape detection, compared with analytic shape detection, plays a more significant role in machine vision for industrial automation. As industrial automation develops, the requirements for low-delay detection and high-precision operation are gradually increasing. However, existing works on arbitrary shape detection focus on detection accuracy, and few researchers attempt to achieve ultra-low detection delay because of the limited bandwidth between memory and CPU. This paper proposes clustered relative-vector-based parallelization and a temporal constraint to compress the generalized Hough transform (GHT) algorithm and achieve an ultra-low-delay processing system, implemented on FPGA. By clustering relative vectors among closed edge pixels into a clustered vector and defining a regularized R-Table structure, the parallelism of GHT is increased. Moreover, fully utilizing the temporal information in high frame rate video reduces accumulator memory consumption, by confining the search window and restricting the rotation range according to the detection result from the previous frame. The evaluation shows that the proposed architecture completes detection on a VGA-sized sequence with an ultra-low processing delay of 1.851 ms per frame.
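The temporal constraint described in this abstract can be illustrated with a minimal software sketch. It is not the paper's FPGA design: the function names, the window half-size, and the rotation limit below are hypothetical values chosen for illustration, and only the VGA frame size (640x480) is a known fact. The sketch shows how confining the search to a neighbourhood of the previous detection shrinks the number of accumulator cells.

```python
def constrained_search_space(prev_x, prev_y, prev_angle_deg,
                             frame_w=640, frame_h=480,
                             half_window=32, max_rotation_deg=5):
    """Return the (x_range, y_range, angles) to scan in the current frame.

    half_window and max_rotation_deg are illustrative assumptions,
    not values taken from the paper.
    """
    # Clamp the search window to the frame borders.
    x_range = (max(0, prev_x - half_window), min(frame_w - 1, prev_x + half_window))
    y_range = (max(0, prev_y - half_window), min(frame_h - 1, prev_y + half_window))
    # Only rotations close to the previous orientation are considered.
    angles = [(prev_angle_deg + d) % 360
              for d in range(-max_rotation_deg, max_rotation_deg + 1)]
    return x_range, y_range, angles


def accumulator_cells(x_range, y_range, angles):
    """Number of accumulator cells needed for the given search space."""
    width = x_range[1] - x_range[0] + 1
    height = y_range[1] - y_range[0] + 1
    return width * height * len(angles)


if __name__ == "__main__":
    xr, yr, ang = constrained_search_space(320, 240, 10)
    full = 640 * 480 * 360  # unconstrained: every pixel, 1-degree rotation steps
    print(accumulator_cells(xr, yr, ang), "cells instead of", full)
```

Under these assumed parameters the accumulator covers a 65x65 window and 11 rotation hypotheses rather than the whole frame and all rotations, which is the kind of memory reduction the abstract attributes to the temporal constraint.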
Proceedings Volume Third International Conference on Computer Vision and Information Technology (CVIT 2022), 1259003 (2023) https://doi.org/10.1117/12.2669973
High frame rate, ultra-low delay corner detection plays an increasingly important role in factory automation scenarios that demand accurate and robust corner features. However, classic intensity-based corner detection such as the Harris method has limitations in determining corner types and in parameter selection. Conventional contour-based corner detection such as the Chord to Triangular Arms Ratio (CTAR) method uses global curve extraction over the whole frame, which leads to high delay. Performing corner detection nearly simultaneously with image capture offers a workable way to minimize the delay. To replace conventional detection methods, which access arbitrary pixels across the entire input, a multi-line-buffer-based pipeline architecture is proposed. Using this pipeline, the whole frame is divided into lines that are processed independently. Junction connectivity analysis is proposed to define corner types based on this architecture. The proposed algorithm nearly preserves the robustness of the original CTAR method (Average Repeatability of 0.5715 and Localization Error of 0.4285, versus AR of 0.5832 and LE of 0.4374) and outperforms the Harris method (AR of 0.5322, LE of 0.7324).
Proceedings Volume Third International Conference on Computer Vision and Information Technology (CVIT 2022), 1259004 (2023) https://doi.org/10.1117/12.2670053
Population aging is occurring in developed and some developing countries, which means the proportion of elderly people in society is increasing dramatically. The elderly often suffer from bone and joint diseases that make their daily lives difficult. As a result, accidental falls are a major cause of loss of autonomy and of injury among the elderly. Healthcare surveillance systems need to be improved to care for the elderly given the shortage of nurses, and this paper presents a solution to this problem. Using a vision-based architecture, the healthcare surveillance system can detect falls during people's daily activities and then notify a nurse to provide timely assistance. YOLOv3-tiny is applied to detect humans in the frame, bounding boxes are generated to visualize the detections, and the humans are then tracked using the Kalman filter algorithm. AlphaPose is applied to generate keypoints for the detected persons. Each keypoint coordinate changes from frame to frame as the persons move; these keypoint coordinate states are fed into ST-GCN as input, and the ST-GCN model predicts the probability of each person's activity (fall or not fall). Our model is more effective at detecting and predicting falls than many previously proposed models. Our improvements include fall detection in low lighting conditions, multi-person fall detection, and fall detection when the human body is partially occluded. The experimental results show that the proposed vision-based surveillance system achieves an accuracy of 99.08%, a precision of 98.84%, and a recall of 98.03%.
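For readers unfamiliar with the three reported metrics, the following sketch shows their standard definitions for a binary fall / not-fall classifier. The confusion-matrix counts in the usage example are hypothetical and are not taken from the paper.

```python
def classification_metrics(tp, fp, tn, fn):
    """Return (accuracy, precision, recall) from binary confusion-matrix counts.

    tp: falls correctly detected        fp: non-falls flagged as falls
    tn: non-falls correctly rejected    fn: falls that were missed
    """
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return accuracy, precision, recall


if __name__ == "__main__":
    # Hypothetical counts, for illustration only.
    acc, prec, rec = classification_metrics(tp=8, fp=2, tn=85, fn=5)
    print(f"accuracy={acc:.4f} precision={prec:.4f} recall={rec:.4f}")
```

In a fall-detection setting, recall is usually the metric to watch most closely, since a missed fall (a false negative) is costlier than a false alarm.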
Proceedings Volume Third International Conference on Computer Vision and Information Technology (CVIT 2022), 1259005 (2023) https://doi.org/10.1117/12.2669826
In recent years, with the popularity of the Internet, digital media has become an indispensable part of people's lives. Image acquisition equipment is widely used in daily life and work. However, due to technical problems, it cannot meet users' performance requirements. At the same time, large amounts of data and information are difficult to process effectively, which reduces image quality, reduces available storage space, and increases image distortion. It is therefore necessary to develop new algorithms to address these defects, and this paper designs and develops an automatic digital media image recognition system based on a mean clustering algorithm. First, the paper describes the concept and characteristics of digital media. It then studies the application of the mean clustering algorithm. On this basis, an automatic digital media image recognition system is designed, and its performance is tested by simulation. The test results show that the algorithm can accurately identify media images. In general, the performance of the automatic digital media image recognition system based on the mean clustering algorithm can meet the needs of target users.
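The abstract does not specify which mean clustering variant is used. Assuming it refers to standard k-means (Lloyd's algorithm), a minimal sketch on scalar values such as grayscale intensities could look like the following; the function name, the evenly spaced initialisation, and the sample values are illustrative choices, not details from the paper.

```python
def kmeans_1d(values, k, iterations=20):
    """Lloyd's algorithm on scalar values (e.g. grayscale pixel intensities).

    Returns (centers, labels). Centers are initialised evenly across the
    value range so that the result is deterministic.
    """
    lo, hi = min(values), max(values)
    centers = [lo + (hi - lo) * (i + 0.5) / k for i in range(k)]
    labels = [0] * len(values)
    for _ in range(iterations):
        # Assignment step: attach every value to its nearest center.
        labels = [min(range(k), key=lambda c: abs(v - centers[c])) for v in values]
        # Update step: move every center to the mean of its members.
        new_centers = []
        for c in range(k):
            members = [v for v, lab in zip(values, labels) if lab == c]
            new_centers.append(sum(members) / len(members) if members else centers[c])
        if new_centers == centers:  # converged
            break
        centers = new_centers
    return centers, labels


if __name__ == "__main__":
    intensities = [10, 12, 11, 200, 205, 198]  # hypothetical dark and bright pixels
    print(kmeans_1d(intensities, k=2))
```

A recognition system would typically cluster feature vectors rather than raw intensities, but the assignment and update steps are the same.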
Cerwin Dexter L. Dela Rosa, Kreed Zion Lorenzo G. Lagunilla, Jomari V. Ramos, Austin Kenneth V. San Pedro, Joseph Marvin R. Imperial
Proceedings Volume Third International Conference on Computer Vision and Information Technology (CVIT 2022), 1259006 (2023) https://doi.org/10.1117/12.2669907
Common approaches to vision-based tasks such as character and object recognition use Convolutional Neural Networks (CNNs) due to their practicality in processing images and their theoretical grounding. In this work, we take a different perspective on the task of Baybayin script recognition by exploring Vision Transformers (ViT), a new paradigm for processing images inspired by the Transformer model. We compare the performance of CNNs and ViT and analyze model confidence on a set of test images using Local Interpretable Model-Agnostic Explanations (LIME). Results show that, performance-wise, convolution-based architectures (CNNs) still outperform sequence-based methods (ViT) at discriminating Baybayin scripts, with nearly double the accuracy (84.5% versus 48.8%, respectively).
Xu Yanliang, Zhang Chuanyue, Shi Gongzuo, Jin Ye, Qu Yujin, Mao Xueshun, Xu Jingtao, Mohammad Noori, Wael A. Altabey
Proceedings Volume Third International Conference on Computer Vision and Information Technology (CVIT 2022), 1259007 (2023) https://doi.org/10.1117/12.2669715
With the continuous development of the social economy, road traffic plays an increasingly significant role in the national economy and in people's lives. Roads have become an important part of the country's infrastructure for daily travel and economic construction, and they are a key indicator of a region's economic development. During the long-term operation of expressways, various types of pavement distress occur. To address the shortcomings of traditional pavement crack detection, a pavement crack detection method based on image processing under complex backgrounds is proposed. A pavement crack image segmentation model based on semantic segmentation is built, and cracks in high-resolution crack images are extracted using this model. The results show that, compared with existing algorithms, the proposed algorithm has a better detection effect and stronger generalization ability in complex road scenes.
Proceedings Volume Third International Conference on Computer Vision and Information Technology (CVIT 2022), 1259008 (2023) https://doi.org/10.1117/12.2669900
Science and technology have changed life. With the increasingly widespread application of image processing technology in sports competitions, landing point recognition systems that recognize and locate fast-moving balls play an important role in ball-game competitions and daily training, and have become a research hotspot for artificial intelligence in the field of culture and entertainment. This work collects videos of players, identifies the landing points of table tennis balls, and analyzes the landing areas, applying target detection and tracking technology to an intelligent table tennis training scene. The system algorithm was evaluated in a number of experiments in the project team's intelligent table tennis training system, where it accurately and effectively identified landing points and scored landing areas. It achieves high recognition accuracy and real-time performance, and basically meets the functional requirements.
Proceedings Volume Third International Conference on Computer Vision and Information Technology (CVIT 2022), 1259009 (2023) https://doi.org/10.1117/12.2670017
Due to their unique properties, such as high availability and reliability, distributed systems are gaining popularity. However, the rapid growth of Big Data in distributed systems creates new issues for dataset reliability and availability. In any distributed computer system, the presence and recurrence of failures is an inescapable factor. Both hardware and software components of distributed systems are prone to failure. As a result, fault tolerance is recognized as a fundamental theme and an essential requirement for constructing and maintaining distributed computing systems. Fault tolerance refers to the ability of an application to continue executing under failure conditions by detecting and correcting faults. Reactive fault tolerance techniques are used to troubleshoot systems effectively once failures occur. This paper aims to provide a better understanding of reactive fault tolerance techniques and identifies various approaches used for reactive fault tolerance in distributed systems. Based on the review conducted in this research, various reactive fault tolerance techniques, such as replication, checkpointing, task resubmission, and job migration, can improve the performance of distributed systems in terms of availability, reliability, total execution time, and communication cost.
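Two of the reactive techniques named in this abstract, task resubmission and checkpointing, can be illustrated with a minimal single-process sketch. The function names and the dictionary-based checkpoint are hypothetical simplifications; a real distributed system would persist checkpoints to stable storage and resubmit tasks through its scheduler.

```python
def run_with_resubmission(task, max_attempts=3):
    """Task resubmission: re-run a failed task until it succeeds.

    Returns (result, attempts_used); raises RuntimeError if every attempt fails.
    """
    last_error = None
    for attempt in range(1, max_attempts + 1):
        try:
            return task(), attempt
        except Exception as exc:  # react to the failure by trying again
            last_error = exc
    raise RuntimeError(f"task failed after {max_attempts} attempts") from last_error


def process_with_checkpoint(items, work, checkpoint):
    """Checkpointing: resume from the last saved position after a failure.

    `checkpoint` is a plain dict standing in for durable storage.
    """
    results = checkpoint.setdefault("results", [])
    start = checkpoint.setdefault("next_index", 0)
    for i in range(start, len(items)):
        results.append(work(items[i]))
        checkpoint["next_index"] = i + 1  # record progress after each item
    return results
```

If `work` raises partway through, calling `process_with_checkpoint` again with the same `checkpoint` continues from the first unprocessed item instead of repeating completed work, which is how checkpointing reduces total execution time after a failure.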
Proceedings Volume Third International Conference on Computer Vision and Information Technology (CVIT 2022), 125900A (2023) https://doi.org/10.1117/12.2669648
The fine-grained classification of remote sensing airplane images is meaningful work, yet few existing works have paid attention to the fine-grained classification of remote sensing objects. The purpose of our research is to improve fine-grained classification performance on remote sensing airplane images. In this paper, we propose a fine-grained classification method for remote sensing airplanes based on few-shot learning. Few-shot learning is used to alleviate the extremely imbalanced distribution of samples across categories. We found two factors that affect classification accuracy: the direction of the airplane and the background distribution. To increase classification accuracy and weaken the influence of the background, we propose an algorithm that uses the symmetry of the image to predict the direction of the airplane, and we add a Transpose encoder to alleviate the impact of the background distribution. The experimental results on the FAIR1M dataset demonstrate the effectiveness of our method, which obtains an accuracy improvement of 5.73%.
Proceedings Volume Third International Conference on Computer Vision and Information Technology (CVIT 2022), 125900B (2023) https://doi.org/10.1117/12.2669979
Object tracking plays an important role in computer vision and has many applications, such as video surveillance and vehicle navigation. Occlusion, however, is one of the most challenging problems in these applications. Although many object tracking approaches focus on occlusion scenes, occlusion by large barriers over long periods still cannot be handled. To address this problem, this paper proposes a reliable particle-filter-based tracking method focused on long-term full occlusion by large barriers. In this paper, a large barrier is defined as one with a pixel width from 350 to 600 in fixed-resolution images, and long-term occlusion is defined as an occlusion lasting from 180 to 600 frames. First, this paper proposes a particle position reset module that replaces the resampling process during occlusion periods, to solve the problem of losing the target after occlusion. In addition, a hybrid-feature-based likelihood model is proposed for judging when occlusion begins and ends. Experiments on sequences with extreme occlusion demonstrate the reliability and accuracy of the proposed work in these challenging scenes. The algorithm achieves an average success rate of 92% on the tested sequences.
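The idea of swapping resampling for a position reset during occlusion can be sketched as follows. The abstract does not describe how the reset module places particles, so the uniform scatter around the last known position, the `spread` parameter, and all function names are assumptions made purely for illustration.

```python
import random


def resample(particles, weights, rng):
    """Standard multinomial resampling: draw particles in proportion to weight."""
    return rng.choices(particles, weights=weights, k=len(particles))


def reset_positions(count, last_known, spread, rng):
    """Hypothetical reset: scatter particles uniformly around the last known position."""
    x, y = last_known
    return [(x + rng.uniform(-spread, spread), y + rng.uniform(-spread, spread))
            for _ in range(count)]


def update_particles(particles, weights, occluded, last_known, spread=50.0, rng=None):
    """Resample in normal frames; reset positions while the target is occluded."""
    rng = rng or random.Random(0)  # fixed seed keeps the sketch reproducible
    if occluded:
        # Weights are unreliable while the target is hidden, so they are ignored.
        return reset_positions(len(particles), last_known, spread, rng)
    return resample(particles, weights, rng)
```

The motivation is that resampling during full occlusion concentrates particles on background clutter, because no particle can match the hidden target; a reset keeps them spread over the region where the target may reappear.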
Proceedings Volume Third International Conference on Computer Vision and Information Technology (CVIT 2022), 125900C (2023) https://doi.org/10.1117/12.2669795
Successful jumps in figure skating depend on critical parameters such as proper jump height, spin speed, and the number of jump rotations, which are valuable for analysis in athlete training. Driven by recent computer vision applications, reconstructing the 3D poses of figure skaters to extract these variables has become increasingly important. However, many conventional works obtain 3D poses directly from the corresponding 2D information, which ignores the particular difficulties of figure skating, such as self-occlusion and abnormal poses. This paper proposes a multi-view voxel-based system for calibration and error correction that reconstructs the 3D jumping poses of figure skaters from 2D heatmaps. The proposed method consists of two key components: a voxel-based method for recovering high-probability areas in the 2D heatmaps, and a rectification method based on plain 2D smoothness and on separable 3D smoothness of motion trajectories and relative joint positions. This work is shown to be applicable to 3D pose dynamics in figure skating jumps. The Mean Per Joint Position Error (MPJPE) is 34.58 mm in the pre-jump stage, 16.51 mm in the jumping stage, 15.73 mm in the post-jump stage, and 16.93 mm over the whole jump, a 36% improvement compared with the conventional work.
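MPJPE is conventionally defined as the mean Euclidean distance between predicted and ground-truth joint positions. The sketch below computes it for a single frame; the joint coordinates in the usage example are made up, and any alignment step applied before the comparison in the paper is not modelled here.

```python
import math


def mpjpe(predicted, ground_truth):
    """Mean Per Joint Position Error for one frame.

    Both arguments are equal-length sequences of (x, y, z) joint positions;
    the result is in the same unit as the inputs (e.g. millimetres).
    """
    if not predicted or len(predicted) != len(ground_truth):
        raise ValueError("joint lists must be non-empty and of equal length")
    distances = [math.dist(p, g) for p, g in zip(predicted, ground_truth)]
    return sum(distances) / len(distances)


if __name__ == "__main__":
    # Two hypothetical joints, coordinates in millimetres.
    pred = [(0.0, 0.0, 0.0), (3.0, 4.0, 0.0)]
    truth = [(0.0, 0.0, 0.0), (0.0, 0.0, 0.0)]
    print(mpjpe(pred, truth))
```

For a sequence, the per-frame values are typically averaged over all frames in the stage being evaluated, which is how per-stage figures such as those above are usually obtained.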
Proceedings Volume Third International Conference on Computer Vision and Information Technology (CVIT 2022), 125900D (2023) https://doi.org/10.1117/12.2669825
With the continuing construction of digital communities, digital cities, and the digital earth, 3D model visualization technology has become an increasingly important direction of development. This paper takes deep learning as its basis and designs a 3D visualization model based on the StyleGAN2 algorithm. First, the StyleGAN2 algorithm was used to segment the image, and Three.js was selected as the Web 3D visualization framework for the model; then the 3D visualization model based on the StyleGAN2 algorithm was constructed. After repeated training and testing of the model, the corresponding modeling image was acquired. After this image was aligned and its coordinates positioned, it was placed in a suitable modeling scene, and fast modeling of buildings was finally realized. The results show that the StyleGAN2 algorithm has superior performance and high modeling efficiency, which can lower modeling cost and thus improve the 3D visualization of urban buildings.
Allen Bryant S. Lineses, Cerwin Dexter L. Dela Rosa, Anjelica M. Castillo, Kreed Zion Lorenzo G. Lagnilla, Austin Kenneth V. San Pedro, Ramon L. Rodriguez
Proceedings Volume Third International Conference on Computer Vision and Information Technology (CVIT 2022), 125900E (2023) https://doi.org/10.1117/12.2669651
As computer vision advances, the Philippines lags behind in web-based tools for image annotation. In response, this research develops a web application that lets users crowdsource annotations for uploaded files and provides a web interface that summarizes all the stored images. Image annotation labels the prominent features of an image, for tasks such as classification and object detection, so that a computer can be trained to recognize and identify unlabeled images. The app provides open access to any user or researcher and allows them to collaborate with others. Although the app requires a large amount of image data to improve its overall performance, it also accepts input from users to widen the range of labels it can identify. This research used agile software development to ensure continuous development. In addition, testing was performed continuously to gather feedback at each development step. Ten evaluators were invited to test the system. The scenario completion times show that the CROMA Web Application is easy to use and navigate and makes good use of users' time. The scores for each usability component show that the system is easy to navigate and that users learn to use it well. This is a good result and suggests that users can perform the specific tasks they need to do in the system. Furthermore, the user acceptance test scores for each usability component show that the web application performs its function efficiently and that users are satisfied with its features. After three layers of agile testing, the CROMA Web Application has been given the go-ahead to be deployed on a cloud server. For future work, automated labeling and a search function would help users navigate their preferences for every category of images.
Proceedings Volume Third International Conference on Computer Vision and Information Technology (CVIT 2022), 125900F (2023) https://doi.org/10.1117/12.2669839
With the rapid development of virtual reality (VR) technology, VR is gradually being introduced into the development of ski simulator systems. As simulation equipment that can correctly reproduce skiing movements, a ski simulator can not only train skiers and shorten the training cycle, but also popularize and promote winter sports to the public through a new means of communication. The software architecture of the control system of the ski simulator platform is designed in a modular fashion, including the design of the communication interface with the VR host computer. The control system is divided into a motor setting module, motor control module, sensor data module, motor status module, real-time position module for the motion axis, communication data module, and data recording module. The test results show that the control system is fully functional, with a friendly communication interface and human-computer interaction, and that the skiing experience provided by the ski simulator system is good and meets the design requirements.
Yuying Xu, Fang Wu, Li Shi, Zifan Feng, Rongxin Qiu
Proceedings Volume Third International Conference on Computer Vision and Information Technology (CVIT 2022), 125900G (2023) https://doi.org/10.1117/12.2669830
Different teaching modes arising from different educational backgrounds have an important influence on the architectural design of colleges. There is a relationship between the layout of informal learning spaces in college teaching buildings and the probability of informal learning behaviors. This paper combines visual and topological analyses to investigate the socio-spatial properties of informal learning spaces and uses them to assess how the floor plan characteristics of a teaching building influence the probability of informal learning behaviors. It finds that floor plans with high accessibility and visibility are more conducive to informal learning behaviors. Informal learning spaces increase the probability of informal learning behaviors by relying on corridor spaces with high visibility and high frequency of use. This study examines how the layout of informal learning spaces influences the occurrence of informal learning behaviors, providing a reference for the future renovation and new construction of college teaching buildings.