This PDF file contains the front matter associated with SPIE Proceedings Volume 12084, including the Title Page, Copyright information, Table of Contents, Introduction, and Conference Committee.
The use of research information systems (RIS) depends to a large extent on the quality of the data recorded in them. Scientifically proven methods and procedures are required to ensure high data quality efficiently. This paper proposes a concept for managing the quality of research data that was developed by the author as part of a dissertation. It is aimed at those responsible for data management and quality assurance in research information systems. The concept is intended to help users analyze and improve the quality of their data so that it can ultimately be used for decision-making.
Given the wide variety and sheer volume of industrial and domestic waste, automatic detection and sorting of waste is a major challenge. Based on the YOLOv5 algorithm, this paper proposes a method for rapid detection and classification of garbage. We train the model on the TACO [1] garbage dataset and extract the location and feature information of garbage through this network. In practice, the model can effectively detect the garbage classes covered by the dataset. After testing, the mAP (mean average precision) of the model reaches 97.62%, the detection accuracy is 95.49%, and the detection speed reaches 5.52 fps. Compared with the YOLOv3 network model, it better completes the task of garbage classification and detection. This network model meets the technical requirements for the algorithms of waste-sorting robots.
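As a rough illustration of how such a detector could be applied, the sketch below loads a YOLOv5 model through the public ultralytics hub interface; the checkpoint `taco_best.pt` and the image path are hypothetical stand-ins for a model fine-tuned on TACO, not artifacts released with the paper.

```python
import torch

# Load a YOLOv5 model from the ultralytics hub; "taco_best.pt" is a
# hypothetical checkpoint fine-tuned on the TACO garbage dataset.
model = torch.hub.load("ultralytics/yolov5", "custom", path="taco_best.pt")

# Run inference on a single image; YOLOv5 resizes internally.
results = model("street_litter.jpg")

# Each detection row: x1, y1, x2, y2, confidence, class index.
for *box, conf, cls in results.xyxy[0].tolist():
    print(f"{model.names[int(cls)]}: conf={conf:.2f}, box={box}")
```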
In this paper, we propose POTATOES (Partitioning OverfiTting AuTOencoder EnSemble), a new method for unsupervised outlier detection (UOD). More precisely, given any autoencoder for UOD, this technique can be used to improve its accuracy while at the same time removing the burden of tuning its regularization. The idea is to not regularize at all, but to rather randomly partition the data into sufficiently many equally sized parts, overfit each part with its own autoencoder, and to use the maximum over all autoencoder reconstruction errors as the anomaly score. We apply our model to various realistic datasets and show that if the set of inliers is dense enough, our method indeed improves the UOD performance of a given autoencoder significantly. For reproducibility, the code is made available on GitHub so the reader can recreate the results in this paper as well as apply the method to other autoencoders and datasets.
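The scheme is simple enough to sketch. Below, `make_autoencoder` is a placeholder for a user-supplied, Keras-style factory returning a fresh unregularized autoencoder; this wrapper illustrates the partition-overfit-max idea and is not the authors' released code.

```python
import numpy as np

def potatoes_scores(X, make_autoencoder, n_parts=10, epochs=200, seed=0):
    """Partition the data, overfit one autoencoder per part, and score each
    sample by its maximum reconstruction error over all autoencoders."""
    rng = np.random.default_rng(seed)
    parts = np.array_split(rng.permutation(len(X)), n_parts)  # equal-size parts
    scores = np.empty((len(X), n_parts))
    for j, part in enumerate(parts):
        ae = make_autoencoder()                      # fresh, unregularized model
        ae.fit(X[part], X[part], epochs=epochs, verbose=0)   # deliberate overfit
        recon = ae.predict(X, verbose=0)
        scores[:, j] = np.mean((X - recon) ** 2, axis=1)     # per-sample error
    return scores.max(axis=1)                        # anomaly score = max error
```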
Procedurally defined implicit functions, such as CSG trees and recent neural shape representations, offer compelling benefits for modeling scenes and objects, including infinite resolution, differentiability, and trivial deformation, all with a low memory footprint. The common approach to fitting such models to measurements is to solve an optimization problem involving the function evaluated at points in space. However, the computational cost of evaluating the function makes it challenging to use visibility information from range sensors and 3D reconstruction systems. We propose a method that uses visibility information and requires a number of function evaluations per iteration proportional to the scene area. Our method builds on recent results for bounded Euclidean distance functions, introducing a coarse-to-fine mechanism to avoid the requirement for correct bounds. This makes our method applicable to a greater variety of implicit modeling techniques for which deriving the Euclidean distance function or appropriate bounds is difficult.
Within the farming industry, computer vision technologies provide great support for various processes. One example is the grain harvest with a combine harvester, which requires considerable expert knowledge to reach a certain level of quality: the amount of byproducts, including straw and hulls, should be as small as possible. Inside a combine harvester, several sensors deliver important information about the current harvesting state, including a vision-based sensor that constantly delivers images of the composition of the processed grain to the driver. Normally, it is the driver's task to decide whether the quality is sufficient or whether machine settings need to be adjusted to improve the outcome. Yet resolving this task entirely manually is rather error-prone, due to the large number of fairly similar images that need to be analyzed in a short time. We therefore designed and implemented a system that automatically detects unwanted byproducts in the wheat harvest and highlights those parts in the images delivered by the visual sensor. The system combines an automated preprocessing step with a variation of a U-Net. The preprocessing step filters unwanted byproducts and can automatically generate training data for the U-Net. The U-Net is lightweight, easy to train, and can potentially be used onboard the combine harvester.
Monitoring airport runways in panchromatic remote sensing images helps both civil and strategic communities make effective use of large-area acquisitions. This paper proposes a novel multimodal semantic segmentation approach for effective delineation of runways in panchromatic remote sensing images. The proposed approach learns complementary information from two modalities, a panchromatic image and a digital elevation model (DEM), to obtain discriminative features of the runway. Image features and the corresponding terrain information are fused by stacking the image and DEM, leveraging the merits of both Transformers and the U-Net architecture. We perform experiments on Cartosat-1 panchromatic satellite images with the corresponding Cartosat-1 DEM scenes. The experimental results demonstrate a significant contribution of terrain information to the segmentation process in effectively recovering the contours of airport runways.
Visual attention and its modeling have received increasing attention over the past decades and have been used for years in various fields, such as the automotive industry, robotics, and even diagnostic medicine. So far, research has focused mainly on generalizing the collected data, while identifying the unique features of an individual's visual attention remains an open research topic. The aim of this paper is to propose a methodology that can cluster people into groups based on individual characteristics of their visual attention patterns. Unlike former research approaches focused on the classification problem, where the class of the subjects must be known, we address the open problem of unsupervised machine learning based solely on measured data about subjects' visual attention. Our methodology is based on a clustering method that utilizes individual feature vectors created from measured visual attention data. The proposed feature vectors, which form a fingerprint of an individual's attention, are based on the directions of the individual's saccades. The methodology is designed to work with a limited set of measured eye-tracking data without any additional information.
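One plausible concrete reading of such a fingerprint is sketched below: a normalized histogram of saccade directions per subject, clustered with k-means. The eight-bin scheme and the synthetic scanpaths are illustrative assumptions, not the paper's exact feature design.

```python
import numpy as np
from sklearn.cluster import KMeans

def saccade_fingerprint(fixations, n_bins=8):
    """Histogram of saccade directions from an (N, 2) array of fixations."""
    deltas = np.diff(fixations, axis=0)              # saccade vectors
    angles = np.arctan2(deltas[:, 1], deltas[:, 0])  # directions in [-pi, pi]
    hist, _ = np.histogram(angles, bins=n_bins, range=(-np.pi, np.pi))
    return hist / max(hist.sum(), 1)                 # normalize to a distribution

# Synthetic stand-in for per-subject eye-tracking scanpaths.
rng = np.random.default_rng(0)
subject_scanpaths = [rng.uniform(0, 1, size=(120, 2)) for _ in range(20)]

fingerprints = np.stack([saccade_fingerprint(s) for s in subject_scanpaths])
groups = KMeans(n_clusters=3, n_init=10).fit_predict(fingerprints)
```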
Despite comprehensive work on point cloud ground segmentation for flat roads, research on rough roads has rarely been conducted due to dataset scarcity. To study point cloud ground segmentation on rough roads, we provide a synthetic geometric transformation of flat roads motivated by an investigation of real-world rough roads. Our proposed TransGSnet framework consists of two modules: a pillar feature extractor, which turns a raw point cloud into a pseudo-image as an intermediate representation, and a transformer-based segmentation network that performs ground segmentation. Specifically, the segmentation network exploits the U-Net architecture and includes three sub-modules: a Transformer, a mobile block (MB), and a convolutional block attention module (CBAM). We thoroughly evaluate our framework in experiments, including comparisons against state-of-the-art approaches on SemanticKITTI and the synthetic rough-road dataset. As a result, our framework shows a favorable performance/cost trade-off.
Parkinson's disease (PD) is a neurological disorder that affects the central nervous system, leading to cognitive, emotional, and speech disorders. Many methods have been proposed over time for discriminating between people with PD and healthy people using signal processing. In this paper, a new approach based on i-vector subspace modeling is defined to discriminate healthy people from people with PD. I-vector features are among the parameters that have shown promising results in the domain of speech recognition. In this study, i-vectors of two dimensionalities (100 and 200 dimensions) are extracted from voice recordings using Gaussian mixture models based on a universal background model (GMM-UBM) of varying size (64, 128, and 256 Gaussians). Finally, we assess the effect of the i-vector features using a support vector machine (SVM). The results show that the proposed approach can be strongly recommended for distinguishing Parkinson's patients from healthy individuals.
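The classification stage maps onto a standard SVM pipeline, sketched below under the assumption that the i-vectors have already been extracted by a GMM-UBM front end; the arrays are synthetic stand-ins for the voice-derived features.

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-ins: one 100-dimensional i-vector per recording, assumed
# to come from a GMM-UBM front end (e.g., 256 Gaussians).
rng = np.random.default_rng(0)
ivectors = rng.normal(size=(60, 100))
labels = rng.integers(0, 2, size=60)        # 0 = healthy, 1 = PD

clf = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0))
acc = cross_val_score(clf, ivectors, labels, cv=5).mean()
print(f"cross-validated accuracy: {acc:.2f}")
```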
This paper considers methods for classifying business document images based on data extracted after recognition. The peculiarities of analyzing recognized text are pointed out, and an identification mechanism for recognized words is described. The advantages and disadvantages of the Levenshtein distance are listed, and other string distance metrics are considered: Jaro-Winkler similarity, the multiset metric, and the Most Frequent K Characters (MFKC) metric. The standard Levenshtein distance is compared with these metrics, and a modification of the Levenshtein distance is proposed that targets the peculiarities of recognized characters. The paper provides experimental results illustrating the application of the proposed distance.
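For intuition, a weighted Levenshtein distance in this spirit can discount substitutions between characters that OCR engines commonly confuse. The confusion pairs and the 0.5 substitution cost below are illustrative assumptions, not the modification proposed in the paper.

```python
OCR_CONFUSIONS = frozenset({("0", "O"), ("1", "I"), ("5", "S")})

def ocr_levenshtein(a, b, confusions=OCR_CONFUSIONS):
    """Levenshtein distance with a reduced cost for typical OCR confusions."""
    def sub_cost(x, y):
        if x == y:
            return 0.0
        return 0.5 if (x, y) in confusions or (y, x) in confusions else 1.0

    prev = [float(j) for j in range(len(b) + 1)]     # row for the empty prefix
    for i, x in enumerate(a, 1):
        cur = [float(i)]
        for j, y in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                        # deletion
                           cur[j - 1] + 1,                     # insertion
                           prev[j - 1] + sub_cost(x, y)))      # substitution
        prev = cur
    return prev[-1]

print(ocr_levenshtein("INVO1CE", "INVOICE"))  # 0.5: '1' vs 'I' is a cheap swap
```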
In this paper we discuss the concept of the Cross-Barcode (P,Q) introduced and studied in the recent work [1]. In particular, we describe the emergence of this concept from the combinatorics of matrices of the pairwise distances between the two data representations. We also illustrate the applications of the Cross-Barcode (P,Q) to the evaluation of disentanglement in data representations. Experiments are carried out with the dSprites dataset from computer vision.
Understanding road scenes is a fundamental problem in advanced driver assistance systems (ADAS) for safe and comfortable driving. In our previous work, we proposed a novel approach that utilizes the advantages of an enhancement-based segmentation method to improve road segmentation performance at reasonable computational effort. However, unpaved roads, cast shadows, grass, and sidewalks near the road boundary remain fuzzy and cannot be precisely identified; they suffer from either insufficient training data or the limitation of higher-order potentials in pairwise conditional random field (CRF) models. To overcome these drawbacks, we propose a semi-supervised refinement strategy based on a modified cycle generative adversarial network (CycleGAN), which is more generalizable because it enforces higher-order consistency without being limited to a very specific class of higher-order potentials. The proposed method uses only a fraction of annotated images, which may significantly reduce human annotation effort. Our contribution is that, unlike existing adversarial learning methods, we propose a modified generative model with fewer parameters than the original CycleGAN, which improves performance while decreasing computational cost. Moreover, we enforce cycle consistency to learn the mapping between 4-channel unpaired images and the label domain. To guarantee that the image generated by our modified network corresponds to the original image, we add the distance between a subset of images and their paired target labels. The adversarial learning procedure is limited to the road boundary already predicted in our recent work, which, together with the limited number of annotated images, boosts segmentation performance. Experiments on the KITTI public road segmentation benchmark show an improvement of 4-7% with respect to our previous superpixel-CNN approach and comparable performance among the top-performing algorithms for recent un/semi-supervised semantic segmentation tasks.
Deep learning based techniques have been widely used for semantic segmentation. The underlying voluminous DNN models are trained on large datasets that have been annotated at the pixel level by humans. Such low-level annotation is expensive to obtain for newly collected datasets. Alternatively, we propose ComViSe, a segmentation pipeline that requires only high-level annotations that remain relatively accessible (e.g., bounding boxes and labels of a detection, labels of a legend) to segment a given image. ComViSe embeds a segmentation framework, pre-trained on a semantically different dataset, to generate image region proposals. The pipeline then relies on several semantic, visual, and geometric criteria to characterize each proposed region and combines them to select the optimal segmentation mask, comparing diverse aggregation strategies ranging from handcrafted formulas to automatic ones, supervised or not. An experimental study conducted on the PASCAL VOC dataset shows that these combined criteria are enough to select the mask proposals with the best IoU score in most cases, and that the aggregation can be done automatically.
Recent approaches have achieved excellent results on few-shot object detection. However, most detectors are easily confused by visually similar classes, leading to misclassification of objects of interest. In this work, we introduce an anti-confusion grouping mechanism for this problem. Our model refines the results of the main multi-class classifier of the few-shot object detector with an anti-confusion module. Instead of maximizing the feature-distribution distance between similar classes in the feature space, our approach uses an additional auxiliary grouping module to distinguish similar classes on the same feature space as in the base-training phase. Concretely, class groups are obtained according to visual class similarity and are then used to train the auxiliary module. The main classifier, the regressor, and the auxiliary anti-confusion module are trained end-to-end with a multi-task loss. In the test phase, the auxiliary module is combined with the main classifier to produce the final classification result. Through extensive experiments, we demonstrate that our model outperforms well-established baselines for few-shot object detection. We also present analyses of various aspects of our model, aiming to provide inspiration for future few-shot detection work.
Consistency training has proven to be an advanced semi-supervised framework and has achieved promising results in medical image segmentation tasks by enforcing invariance of the predictions over different views of the inputs. However, with the iterative updating of model parameters, the models tend to reach a coupled state and eventually lose the ability to exploit unlabeled data. To address this issue, we present a novel semi-supervised segmentation model based on a parameter decoupling strategy that encourages consistent predictions from diverse views. Specifically, we adopt a two-branch network to simultaneously produce predictions for each image. During training, we decouple the two prediction branches' parameters by a quadratic cosine distance to construct different views in latent space. On this basis, the feature extractor is constrained to encourage consistency of the probability maps generated by the classifiers under diversified features. In the overall training process, the parameters of the feature extractor and the classifiers are updated alternately by the consistency regularization operation and the decoupling operation to gradually improve the generalization performance of the model. Our method achieves a competitive result against state-of-the-art semi-supervised methods on the Atrial Segmentation Challenge dataset, demonstrating the effectiveness of our framework. Code is available at https://github.com/BX0903/PDC.
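One plausible form of the decoupling term is sketched below: the squared cosine similarity between the two branches' flattened parameter vectors, minimized to push the branches toward orthogonality. This is a sketch of the idea under that assumption, not the released implementation (see the linked repository for the actual code).

```python
import torch

def decoupling_loss(branch_a, branch_b, eps=1e-8):
    """Quadratic cosine distance between two branches' parameter vectors."""
    va = torch.cat([p.flatten() for p in branch_a.parameters()])
    vb = torch.cat([p.flatten() for p in branch_b.parameters()])
    cos = torch.dot(va, vb) / (va.norm() * vb.norm() + eps)
    return cos ** 2  # minimal when the parameter vectors are orthogonal
```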
Despite recent impressive results of generative adversarial networks on text-to-image generation, generating complex scenes with multiple objects against complicated backgrounds remains challenging; moreover, end-to-end text-to-image generation still suffers from poor image quality. In this work, we propose a sequential text-to-image generation algorithm that can synthesize high-quality images (larger than 1024x1024 pixels). The proposed approach consists of location inference, key object extraction, image search, layout generation, and image harmonization stages. We compare the suggested approach with the state-of-the-art text-to-image generation model DALL-E. Our approach demonstrates the effectiveness and visual plausibility of images generated from golden-section layouts.
Automatic table understanding in document images is one of the most challenging topics in the research community, owing to the fact that tables appear in various structures and designs. However, a large majority of tables are designed with ruling lines, and recognizing these lines in images is mandatory in numerous table understanding processes. Previous works have used hand-crafted features that are applicable only to distortion-free images. We present a compact CNN as an alternative solution, capable of segmenting ruling lines in challenging environments. In addition to the proposed architecture, a new dataset containing 35K labeled samples is generated for this task. The reported results on this dataset show the effectiveness of the method. Our implementation and dataset are available online.
One important challenge in the document liveness detection process for identity document verification is quality verification. To tackle this challenge, this paper proposes a reference hashing approach, called CheckScan, to discriminate between the original template of an identity document image and a scanned copy. In practice, the discrimination takes place between two aligned identity document images. The proposed approach consists of two steps: feature extraction based on the fast Fourier transform (FFT) and hash construction. The feature extraction step partitions the identity document image into a set of non-overlapping blocks and computes the FFT magnitude spectrum of each block; a fixed number of FFT magnitude peaks is then selected as discriminative features. The hash construction step quantizes the selected peaks into binary codes by applying a new quantization approach based on the coordinates of the selected peaks. Together, these two steps achieve good discrimination (anti-collision) capability for distinct identity document images. Experiments were conducted to analyze and identify the most appropriate parameters for achieving high discrimination performance. The experiments were performed on the Mobile Identity Document Video dataset (MIDV-2020), and the results show that the proposed approach builds binary codes that are quite discriminative for distinct identity document images.
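The two-step structure can be sketched as follows; selecting the strongest per-block FFT peaks follows the description above, while the parity-based quantization of peak coordinates is an illustrative stand-in for the paper's quantization scheme.

```python
import numpy as np

def fft_block_hash(gray, block=64, n_peaks=4):
    """Per-block FFT magnitude peaks quantized into a binary code."""
    h, w = gray.shape
    bits = []
    for r in range(0, h - block + 1, block):
        for c in range(0, w - block + 1, block):
            mag = np.abs(np.fft.fft2(gray[r:r + block, c:c + block]))
            mag[0, 0] = 0.0                                # drop the DC component
            flat = np.argsort(mag, axis=None)[-n_peaks:]   # strongest peaks
            for y, x in zip(*np.unravel_index(flat, mag.shape)):
                bits.extend([y % 2, x % 2])   # illustrative coordinate quantizer
    return np.array(bits, dtype=np.uint8)

# The Hamming distance between two aligned documents' codes then serves as
# the discrimination score.
```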
This paper presents three different problems, and their solutions, for detecting defects in very-large-scale-integration wafer images. The goal is to illustrate the typical categories of wafer inspection problems: topology, device inspection, and wafer analysis. Unusual and particular cases have been selected to show the high variety of problems within the same wafer inspection field. This variety calls for very different solutions and for specific variations of classic image analysis techniques and computer vision methods.
Age verification is an important task in various application contexts, such as access control for hotel spaces that are off-limits to children and teenagers, for spaces that are dangerous for children, and for public areas during the spread of a virus, among others. Age verification consists in classifying face images into different age groups while dealing with variations in face appearance caused by occlusion, pose variation, low resolution, scale variation, and illumination variation. This work introduces an access control application based on age verification in an uncontrolled environment. We propose a new two-level age classification method based on deep learning that classifies face images into eight age groups; the two-level classification strategy helps reduce the confusion between and within age groups. Our experiments were performed on the multi-constrained Adience benchmark, and the obtained results illustrate the effectiveness and robustness of the proposed age classification method in an uncontrolled environment.
A technique is presented for determining, from experimental images, the parameter values of a porous structure generator used to synthesize porous phantoms. Algorithms are considered for the fast determination of porosity and of the standard deviation of the Gaussian filter used as input parameters of the phantom generator. Phantoms generated with the found parameters have geometric characteristics similar to the original images, which makes it possible to use such phantoms both for studying and modeling processes in porous media and as basic structures for creating training samples for segmentation algorithms of experimental images using machine learning methods.
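For intuition, porosity estimation and phantom synthesis can each be sketched in a few lines; the mean-intensity threshold and the Gaussian-filtered-noise generator below are simplifying assumptions rather than the paper's exact algorithms.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def estimate_porosity(image, threshold=None):
    """Porosity = fraction of pore (dark) pixels after binarization."""
    if threshold is None:
        threshold = image.mean()
    return (image < threshold).mean()

# Synthesize a phantom by thresholding Gaussian-filtered noise: sigma sets
# the characteristic pore size, the quantile sets the target porosity.
rng = np.random.default_rng(0)
noise = gaussian_filter(rng.normal(size=(256, 256)), sigma=3.0)
phantom = (noise >= np.quantile(noise, 0.35)).astype(float)  # pores are dark
print(estimate_porosity(phantom, threshold=0.5))             # ~0.35
```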
Aircraft type recognition remains challenging due to the tiny size of aircraft and the geometric distortions in large-scale panchromatic satellite images. This paper proposes a framework for aircraft type recognition that focuses on shape preservation, spatial transformations, and the derivation of geospatial attributes. First, we construct an aircraft segmentation model to obtain masks representing the shape of aircraft by employing a learnable shape-preserving and deformable network in the Mask R-CNN architecture. Then, the orientation of each segmented aircraft is determined by estimating its symmetry axis using gradient information. Besides template matching, we derive the length and width of aircraft using the geotagged information of the images to further categorize aircraft types. We also present an effective inference mechanism to overcome partial detections or missed aircraft in large-scale images. The efficacy of the proposed framework is demonstrated on large-scale panchromatic images with a ground sampling distance of 0.65 m (C2S).
Early detection of damage or wear on tools is crucial for performing any manufacturing operation on a workpiece with good quality and precision. If tool cutting edges are worn or damaged, they can be reground instead of replaced with a new tool, saving considerable machining costs. Traditionally, damage detection is done by manual inspection with an optical microscope, and the damaged locations are reground on a CNC machine. However, damage detection on coated milling tools using an optical microscope is time-consuming and quite challenging due to factors such as non-homogeneous illumination intensity, the large amount of reflections captured by the camera system, and the different forms of damage. This paper therefore proposes a novel approach for automatic image-based damage detection on optically critical components, such as TiN (titanium nitride) coated milling tools, using a new lighting source: a cylindrical shaped enclosure (CSE) with 14 multi-spectral light-emitting diodes (LEDs) distributed uniformly around its circumference to enhance multi-light scattering. This illumination allows capturing high-resolution images with good contrast and low noise, which improves the damage detection task. To date, this work is the first of its kind in which an optically critical object is inserted into a cylindrical illumination source to capture high-quality images for damage detection. Finally, the proposed method is compared with the traditional approach in terms of damage detection capability. This is experimentally oriented work that presents a practical solution to a given problem; with the proposed lighting system, the image processing algorithm can better localize the damage.
Sign language recognition (SLR) aims to interpret sign language into text or speech, so as to facilitate communication between deaf-mute people and hearing people. The task has broad social impact but is still very challenging due to the complexity and large variation of hand actions. The existing dataset for SLR in Bangla Sign Language (BdSL) is based on RGB images, while recent research on sign language recognition has shown better recognition accuracy using depth-based features. In this paper, we present a complete dataset for Bangla sign digits from zero (shunno in Bangla) to nine (noy in Bangla) using MediaPipe, a cross-platform depth-map estimation framework. The proposed method can utilize hand-skeleton joint points containing depth information in addition to the x, y coordinates obtained from RGB images alone. To validate the effectiveness of our approach, we ran MediaPipe on a benchmark American Sign Language (ASL) dataset. Running different classifiers on our proposed dataset, we achieved 98.65% accuracy using a support vector machine (SVM). Moreover, we compared our dataset with the existing Bangla digit dataset Ishara Bochon using a deep learning based approach and achieved significantly higher accuracy.
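The landmark extraction step maps onto the public MediaPipe Hands API, sketched below; the image path is a hypothetical sample, and flattening the 21 joints into a 63-value vector is one straightforward way to feed them to a classifier.

```python
import cv2
import mediapipe as mp

# Extract 21 hand-skeleton joints (x, y, and MediaPipe's relative depth z).
hands = mp.solutions.hands.Hands(static_image_mode=True, max_num_hands=1)

image = cv2.imread("bangla_digit.jpg")          # hypothetical sample image
result = hands.process(cv2.cvtColor(image, cv2.COLOR_BGR2RGB))

if result.multi_hand_landmarks:
    lm = result.multi_hand_landmarks[0].landmark
    features = [v for p in lm for v in (p.x, p.y, p.z)]  # 21 joints x 3
    print(len(features))                                 # 63-value feature vector
```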
In response to the needs of maritime target monitoring, and considering the practical application of synthetic aperture radar (SAR), an anchor-free SAR image ship detection model (AF-YOLO) based on YOLO with a preliminary sea-land segmentation step is proposed. Sea-land segmentation based on the Otsu method removes interference from the terrestrial environment and improves ship identification. An anchor-free detection head is applied to YOLO, and a weightable feature fusion structure is used for multi-scale fusion. Experiments show that the mAP of the proposed algorithm on a public SAR ship dataset reaches 93.4%.
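The sea-land segmentation step maps directly onto OpenCV's Otsu thresholding, sketched below; a real SAR pipeline would typically add speckle filtering first, and the file name is a placeholder.

```python
import cv2

# Otsu picks the threshold separating bright land returns from darker sea.
sar = cv2.imread("sar_scene.png", cv2.IMREAD_GRAYSCALE)   # hypothetical scene
_, land_mask = cv2.threshold(sar, 0, 255,
                             cv2.THRESH_BINARY + cv2.THRESH_OTSU)

# Zero out land pixels so the detector only sees the sea surface.
sea_only = cv2.bitwise_and(sar, sar, mask=cv2.bitwise_not(land_mask))
# `sea_only` is then passed to the anchor-free YOLO detection head.
```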
In text line recognition, much attention is paid to augmenting the training images, yet the inner structure of the textual information in the images also affects the accuracy of the resulting model. In this paper, we propose an ANN-based method for generating textual data to be printed onto background images of a synthetic training sample. Our method avoids both completely random sequences and purely dictionary-based ones. As a result, we obtain data that preserves the basic properties of the target language model, such as the balance of vowels and consonants, while avoiding lexicon-based properties such as the prevalence of specific characters. Moreover, as our method focuses only on high-level features and does not try to generate real words, we can use a small training sample and a lightweight ANN for text generation. To evaluate the method, we train three ANNs with the same architecture but different training samples. We choose machine-readable zones as the target field because their structure does not correspond to an ordinary lexicon. Experiments on three public datasets of identity documents demonstrate the effectiveness of our method and allow us to improve upon the state-of-the-art results for the target field.
Admittedly, machine vision-based assistive applications are beneficial for blind and visually impaired (BVI) persons, and numerous outdoor assistive solutions have already been implemented. However, there are far fewer effective solutions for indoor navigation and orientation, due to the absence of GPS signals and the need for infrastructural investments (such as Wi-Fi signals, beacons, and RFID tags). In this paper, we present another way: a wearable electronic traveling aid (ETA) system for BVI persons that uses outsourcing, i.e., volunteers' mapping of indoor routes in buildings. Volunteers use the proposed wearable ETA device to record indoor routes, which are stored in a web cloud database using web services. Smartphone IMU and other sensors, stereo and depth cameras, audio and haptic devices, computer vision algorithms, and computational intelligence are employed for object detection and recognition and, consequently, for intelligent routing and mapping of indoor spaces. Integration of semantic data on points of interest (such as stairs, doors, WCs, entrances/exits) and building (evacuation) schemes makes the proposed approach even more attractive to BVI users. The presented approach can also be employed for crowdsourcing real-time help in complex navigational situations such as dead reckoning, avoiding various obstacles, or unforeseen situations.
This paper endeavors to leverage spatio-temporal visual cues to improve video-based action detection. We propose NOLA, a NOn-Local anchor-free Action detector built on the recent moving center detector (MOC), which extends MOC by efficiently aggregating long-range spatio-temporal information. In detail, a highly efficient spatio-temporal motion-aware non-local block is explored to provide global motion context to all predictive branches of MOC; as a byproduct, large batches can run on a resource-limited device. In addition, we propose a lightweight data augmentation method for video-based tasks, termed clip augmentation, which improves the generalization ability of the detector with an economical scale-and-addition operation. NOLA runs in real time with both schemes enabled. Experiments on two benchmark datasets show that NOLA significantly exceeds MOC. Compared to other existing methods, NOLA reaches the state of the art in terms of video-level mean average precision (video mAP).
Advertising in TV shows and movies is expensive and has one of the largest market shares in the entire advertising industry. We address the task of adding a given advertising banner to a given video. In this paper, we propose a new algorithm for detecting and replacing advertising banners in videos that preserves the quality of the original video content and allows given posters to be inserted into a video in a fully automated mode. To replace a banner, the algorithm requires only the video and an image of the banner to be inserted. Our algorithm uses computer vision methods to localize banners in the scene, analyze them, and transform them. We also suggest an approach for creating a synthetic dataset for fine-tuning advertising banner detection models. We implement three different methods for the banner localization task and compare these approaches with each other and with existing methods. The source code and examples of the algorithm's performance are publicly available at https://github.com/leonfed/ReAds.
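The geometric core of banner replacement is a homography warp, sketched below with OpenCV; the corner ordering is an assumption about the localizer's output, and the paper's analysis and harmonization steps are omitted.

```python
import cv2
import numpy as np

def insert_banner(frame, banner, corners):
    """Warp `banner` onto the quadrilateral `corners` found in `frame`.

    `corners` is a (4, 2) float32 array in top-left, top-right,
    bottom-right, bottom-left order (an assumed convention).
    """
    h, w = banner.shape[:2]
    src = np.float32([[0, 0], [w, 0], [w, h], [0, h]])
    H = cv2.getPerspectiveTransform(src, np.float32(corners))
    size = (frame.shape[1], frame.shape[0])
    warped = cv2.warpPerspective(banner, H, size)
    mask = cv2.warpPerspective(np.full((h, w), 255, np.uint8), H, size)
    out = frame.copy()
    out[mask > 0] = warped[mask > 0]     # paste the warped banner over the region
    return out
```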
Deep convolutional neural networks have proven effective in computer vision, especially in the task of image classification. Nevertheless, this success is limited to supervised learning approaches, which require extensive amounts of labeled training data and impose time-consuming manual effort. Unsupervised deep learning methods were introduced to overcome this challenge, but the gap to the classification accuracy of supervised learning is still significant. This paper presents a deep learning framework for images of planktonic organisms with no ground truth or manually labeled data. The work combines feature extraction methods using state-of-the-art unsupervised training schemes with clustering algorithms to minimize the labeling effort while improving the classification process based on essential features learned by the deep learning model. The models used in the framework are tested on existing planktonic datasets. Empirical results show that unsupervised approaches that cluster the data based on the deep learning model's feature-space representations improve the classification task and can identify classes that were not seen during the learning process.
As re-identification (Re-ID) becomes more and more common in today's world, the need for more optimized algorithms grows as well. High accuracy is important because the consequences of an incorrect match can mean security issues if Re-ID is used to gain access, or incorrect findings in science due to wrong data. This paper explores enhancing the performance of Siamese neural networks by comparing loss functions so as to better suit the user's Re-ID needs. These loss functions are triplet loss, triplet hard loss, and quadruplet loss. Results show that the triplet hard loss function performs better than the other two. The functions were tested on a human dataset as well as on animal datasets.
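For reference, the baseline triplet loss is sketched below in PyTorch; the margin value is illustrative. The triplet hard variant differs only in how triplets are mined within a batch.

```python
import torch
import torch.nn.functional as F

def triplet_loss(anchor, positive, negative, margin=0.3):
    """Pull anchor-positive pairs together, push anchor-negative pairs
    apart until separated by at least `margin` (batches of embeddings)."""
    d_ap = F.pairwise_distance(anchor, positive)   # same identity
    d_an = F.pairwise_distance(anchor, negative)   # different identity
    return F.relu(d_ap - d_an + margin).mean()

# Triplet *hard* loss instead mines, per anchor within the batch, the
# hardest positive (largest d_ap) and hardest negative (smallest d_an).
```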
Frequent inspection of salmon cage integrity is essential for early detection and prevention of possible escapes of farmed salmon, minimizing the risk of any negative impact on the remaining wild salmon stock. Current state-of-the-art computer vision-based approaches can detect net irregularities under "optimal" net and illumination conditions but may fail under real-world conditions. In this paper, we present a novel modularized processing framework based on advanced computer vision and machine learning approaches to effectively detect potential net damage in video recordings from cleaner robots traversing the net cages. The framework includes a deep learning-based approach for segmenting interpretable net structure from the background, transfer learning-facilitated classification of potential holes versus irrelevant objects, and computer vision-based modules for irregularity detection, filtering, and tracking. Filtering and classification are vital steps to ensure that temporally consistent holes within the net structure are reported and that irrelevant objects such as passing fish are ignored. We evaluate our approach on representative real-world videos from real cleaning operations and show that it can cope with the difficult lighting conditions typical of aquaculture environments.
To achieve good results with existing object detection frameworks, a large amount of annotated data is often needed. However, acquiring annotated data is a laborious process, and for some categories it is even impossible to obtain sufficient annotated data. To reduce the dependence of deep learning models on large-scale data, a new end-to-end single-stage detector network (FSSSD) is proposed based on metric learning. In this way, objects not seen during training can be detected given only a small number of samples. The main innovation of this paper is that the traditional single-stage detection model FCOS is improved by adding a Class Feature Extractor module, turning it into a correlation detector. The model thus extracts the feature distribution of the support category from the small number of pictures provided in the support set. Thereafter, the model converts the query-set feature map into an object probability distribution map and fuses it with the original feature map to enhance the feature representation of potential objects consistent with the support category, so that the model pays more attention to such objects in classification and regression. The method requires no fine-tuning or retraining when recognizing objects of a new category; it only needs support pictures of the corresponding category at test time. At the same time, our modules are flexible and easy to migrate, theoretically suitable for all object detection models, and can improve their performance on few-shot problems.
Electromyography (EMG) is among the most used clinical and biological signals and is an essential element for detecting several neurodegenerative and neuromuscular disorders such as ALS, myopathy, and neuropathy. Among the various functional abnormalities of the motor neuron, amyotrophic lateral sclerosis (ALS) is one of the deadliest neurodegenerative diseases. Several recent studies have sought an appropriate approach to enhance the identification of ALS. In this paper, we develop a new algorithm based on multi-resolution analysis, the fast wavelet transform, and a wavelet network, serving as feature extraction, selection, and reduction in a single technique. The results of our study are adequate for classifying ALS, normal, and myopathy patients. Furthermore, our approach with the AdaBoost classifier outperformed all recent studies, with 100% overall accuracy.
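The feature extraction stage can be sketched with a standard fast wavelet transform; the per-sub-band statistics and the synthetic signals below are illustrative choices, and the wavelet-network selection/reduction step is not reproduced here.

```python
import numpy as np
import pywt
from sklearn.ensemble import AdaBoostClassifier

def dwt_features(signal, wavelet="db4", level=4):
    """Multi-resolution features: summary statistics per DWT sub-band."""
    coeffs = pywt.wavedec(signal, wavelet, level=level)
    stats = (np.mean, np.std, lambda c: np.mean(np.abs(c)))
    return np.array([f(c) for c in coeffs for f in stats])

# Synthetic stand-in for labeled EMG recordings (0=ALS, 1=normal, 2=myopathy).
rng = np.random.default_rng(0)
X = np.stack([dwt_features(rng.normal(size=4096)) for _ in range(90)])
y = rng.integers(0, 3, size=90)
clf = AdaBoostClassifier(n_estimators=100).fit(X, y)
```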
Human action recognition has received a lot of attention in the computer vision community given its relevance to many real applications. In this paper, we propose a new method for human action recognition based on deep learning. The main contribution of the proposed method is an efficient combination of two convolutional neural networks in a two-stream framework that fully utilizes the rich multimodal information in videos. Specifically, we explore the complementarity between appearance and motion information to represent human actions: a spatial convolutional neural network operates on still individual images to model spatial information, while, to exploit motion between frames, a second convolutional neural network processes accumulated optical flow images obtained by stacking the optical flow estimates between consecutive frames into a single image. A score fusion between the two convolutional neural networks then yields the final class. To demonstrate the performance of our method, we trained and evaluated our architecture on a standard human actions benchmark, the Weizmann dataset.
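The late score fusion can be sketched as a weighted average of the softmaxed stream outputs; the 0.4/0.6 weighting below is an illustrative assumption, not the paper's reported setting.

```python
import torch

def fuse_scores(spatial_logits, temporal_logits, w_spatial=0.4):
    """Average the two streams' class probabilities, then pick the class."""
    p_spatial = torch.softmax(spatial_logits, dim=1)
    p_temporal = torch.softmax(temporal_logits, dim=1)
    fused = w_spatial * p_spatial + (1 - w_spatial) * p_temporal
    return fused.argmax(dim=1)   # predicted action class per video
```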
Local patch descriptors are used in many computer vision tasks, and over the past decades many descriptor extraction methods have been proposed. In recent years, researchers have started to train descriptors via convolutional neural networks (CNNs), which have shown their advantages in many other computer vision fields. However, the resulting descriptors are usually represented as long real-valued vectors, which leads to high computational complexity and memory usage in real applications where a large amount of data is processed. Binary local descriptors were designed to deal with this problem, but they are still large. In this paper, we propose a method for creating discrete low-dimensional local descriptors with a lightweight CNN. We show that for small descriptors the quality drops significantly under simple binarization compared to floating-point descriptors. Experiments on the HPatches dataset [1] demonstrate that our discretization approach dramatically outperforms naive binarization for compact descriptors.
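The contrast drawn here can be illustrated directly: naive sign binarization versus a few-level discrete quantization. Both functions below are illustrative simplifications of the respective schemes, not the paper's trained quantizer.

```python
import numpy as np

def binarize_naive(desc):
    """Naive 1-bit quantization: keep only the sign of each component.
    For short descriptors this discards most of the metric structure."""
    return (desc > 0).astype(np.uint8)

def quantize_levels(desc, n_levels=4):
    """Discrete low-dimensional variant: quantize each component to a few
    uniform levels instead of a single bit."""
    lo, hi = desc.min(), desc.max()
    q = (desc - lo) / (hi - lo + 1e-8) * n_levels
    return np.clip(q.astype(int), 0, n_levels - 1)
```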
In this paper, a new method for protecting the copyright of pretrained deep neural networks is proposed. The main idea is to embed a digital watermark into a pretrained model by fine-tuning the final layer weights. A deep neural network is retrained on a unique trigger set formed by synthesizing pseudo-holographic images and embedding them into raster images of the original dataset. To preserve the accuracy of the original model, the deep model watermarking process is implemented with the addition of a new class intended for the elements of the trigger sample. Experimental results show that the quality of the original model is not affected by the watermarking process. Furthermore, the model can be retrained to distinguish the watermark of the legal owner from an unauthorized one.
This paper presents a human pose estimation method for martial arts video analysis using a semantic graph convolutional network (SemGCN) instead of an ordinary convolutional neural network (CNN). The inputs to the model are videos from the Human3.6M dataset, in addition to those from the Martial Arts, Dancing and Sports (MADS) dataset. A data unification process is described so that the MADS joints can be adapted to the Human3.6M base setting. The performance of the model when trained only on Human3.6M is compared to training on both Human3.6M and MADS, resulting in a lower mean per-joint position error (MPJPE) for the latter. Finally, performance indicators such as the vertical position of the center of mass, balance, and stability are calculated for the MADS sequences in order to provide insights into martial arts execution.
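Both evaluation quantities are straightforward to compute from 3D joint arrays. In the sketch below, the uniform joint weights and the convention that the z axis is vertical are simplifying assumptions; biomechanical segment masses would be used in practice.

```python
import numpy as np

def mpjpe(pred, gt):
    """Mean per-joint position error over (frames, joints, 3) arrays."""
    return np.linalg.norm(pred - gt, axis=-1).mean()

def vertical_com(joints, weights=None):
    """Vertical center of mass of one pose as a weighted joint mean."""
    return np.average(joints, axis=0, weights=weights)[2]  # z = vertical
```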
We present a novel computer vision-based deep learning approach for metadata extraction, as both a central component of and an ancillary aid to structured information extraction from scientific literature in various formats. The number of scientific publications is growing rapidly, but existing methods cannot efficiently combine layout extraction and text recognition because of the varied formats used by scientific publishers. In this paper, we introduce an end-to-end trainable neural network for segmenting and labeling the main regions of scientific documents while simultaneously recognizing text from the detected regions. The proposed framework combines object detection techniques based on a recurrent convolutional neural network (RCNN) for scientific document layout detection with a convolutional recurrent neural network (CRNN) for text recognition. We also contribute a novel dataset of main-region annotations for scientific literature metadata extraction, complementing the limited availability of high-quality datasets. The final outputs of the network are the text content (payload) and the corresponding labels of the major regions. Our results show that our model outperforms state-of-the-field baselines.
Deep learning architectures have emerged as powerful function approximators in a broad spectrum of complex representation learning tasks, such as computer vision, natural language processing, and collaborative filtering. These architectures bear a high potential to learn the intrinsic structure of data and extract valuable insights. Despite the surge in the development of state-of-the-art intelligent systems using deep neural networks (DNNs), these systems have been found to be vulnerable to adversarial examples produced by adding small-magnitude perturbations. Such adversarial examples are adept at misleading DNN classifiers. In the past, different attack strategies have been proposed to produce adversarial examples in the digital, physical, and transform domains, but generating perceptually realistic adversarial examples still requires further research. In this paper, we present a novel approach to producing adversarial examples by combining the single-shot fast gradient sign method (FGSM) with spatial as well as transform-domain image processing techniques. The resulting perturbations neutralize the impact of low-intensity regions, instilling noise only in selected high-intensity regions of the input image. Combining the customized perturbation with the one-step FGSM perturbation in an untargeted black-box attack scenario, the proposed approach successfully fools state-of-the-art DNN classifiers, with 99% of adversarial examples misclassified on the ImageNet validation dataset.
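The single-shot FGSM component the paper builds on can be sketched in a few lines of PyTorch. The toy model and epsilon below are assumptions; the paper's intensity-based masking of the perturbation is only hinted at in a comment.

```python
# Minimal sketch of single-shot FGSM, the building block combined above
# with spatial/transform-domain masking.
import torch
import torch.nn as nn

def fgsm(model, x, y, eps=8 / 255):
    """Return untargeted FGSM adversarial examples for inputs x, labels y."""
    x = x.clone().detach().requires_grad_(True)
    loss = nn.CrossEntropyLoss()(model(x), y)
    loss.backward()
    # One signed-gradient step, then clamp back to the valid image range.
    x_adv = x + eps * x.grad.sign()
    return x_adv.clamp(0, 1).detach()

# Toy classifier and data, for illustration only.
model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 10))
x, y = torch.rand(4, 3, 32, 32), torch.randint(0, 10, (4,))
print((fgsm(model, x, y) - x).abs().max())  # perturbation bounded by eps

# The paper additionally restricts the noise to high-intensity regions,
# e.g. roughly: mask = (x.mean(1, keepdim=True) > t).float();
#               x_adv = x + mask * (x_adv - x)
```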
The visually impaired community, especially blind people, needs support from advanced technologies to help them understand and answer questions about image content. In the multi-modal area, Visual Question Answering (VQA) is a notable cutting-edge task requiring the combination of images and texts via a co-attention mechanism. Inspired by the Deep Co-attention Layer, we propose a Bi-directional Co-Attention VT-Transformer network to jointly learn visual and textual features simultaneously. In our system, the relationships and interactions between objects of the two modalities are digested and combined into a shared, meaningful space. Moreover, using a consistent Transformer architecture for both the feature extractors and the multi-modal attention function is efficient enough to reduce the number of attention layers as well as the computational cost. Through experimental results and ablation studies, our model achieves promising performance against existing approaches and a uni-directional mechanism on the VizWiz-VQA 2020 dataset for blind people.
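A minimal sketch of one bi-directional co-attention block in the spirit of the architecture described above: each modality queries the other through standard multi-head attention. The dimensions, token counts, and residual wiring are assumptions, not the paper's exact design.

```python
# Bi-directional cross-attention between text tokens and image patches.
import torch
import torch.nn as nn

class BiCoAttention(nn.Module):
    def __init__(self, dim=512, heads=8):
        super().__init__()
        self.t2v = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.v2t = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, text, vis):
        # Each modality queries the other; both directions run in parallel.
        text_out, _ = self.t2v(query=text, key=vis, value=vis)
        vis_out, _ = self.v2t(query=vis, key=text, value=text)
        return text + text_out, vis + vis_out   # residual connections

block = BiCoAttention()
text = torch.randn(2, 20, 512)   # batch of 2, 20 word tokens (assumed)
vis = torch.randn(2, 49, 512)    # 7x7 = 49 image patches (assumed)
t, v = block(text, vis)
print(t.shape, v.shape)
```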
The delineation of plants and trees and their structural analysis is becoming more important in agricultural and ecological applications. In our paper, we propose an approach in which 2D-projected graph-based tree models are generated from monocular images with the help of deep neural networks (DNNs): the three main blocks of the network are responsible for segmentation, contour detection, and centerline detection. Graph structures are then built upon these predicted structural elements. We demonstrate that the applied DNN can also help to reconstruct the spatial (depth) order of crossing branches. We believe the proposed method has the potential to soon replace current expensive and time-consuming laser scanning approaches for many applications.
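One plausible final step of such a pipeline, building a 2D graph from a predicted centerline, can be sketched as follows. The DNN predictions are replaced here by a synthetic mask, and the 8-connectivity graph construction is an illustrative assumption rather than the paper's exact procedure.

```python
# Turning a branch mask into a skeleton and then into a pixel graph.
import numpy as np
import networkx as nx
from skimage.morphology import skeletonize

mask = np.zeros((64, 64), dtype=bool)   # stand-in for a predicted branch mask
mask[10:54, 30:34] = True               # a thick vertical "branch"
mask[30:34, 10:54] = True               # a crossing horizontal "branch"

skel = skeletonize(mask)
graph = nx.Graph()
for y, x in zip(*np.nonzero(skel)):
    for dy in (-1, 0, 1):
        for dx in (-1, 0, 1):
            yy, xx = y + dy, x + dx
            if (dy or dx) and 0 <= yy < 64 and 0 <= xx < 64 and skel[yy, xx]:
                graph.add_edge((y, x), (yy, xx))

# Endpoints have degree 1; junctions (crossing branches) have degree > 2.
print(sum(1 for n in graph if graph.degree[n] > 2), "junction pixel(s)")
```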
Lately, some of the most common illegal activities involve the use of firearms. In such dangerous situations, there is a dire need for preventive measures that can automatically detect such weapons. This paper presents the use of computer vision and deep learning to detect weapons such as guns, revolvers, and pistols. Convolutional Neural Networks (CNNs) can be used efficiently for object detection. In this paper, precisely two CNN architectures, Faster R-CNN with VGG16 and YOLOv3, have been used to carry out the detection of such weapons. The pre-trained neural networks were fed with images of guns from the Internet Movie Firearms Database (IMFDB), a benchmark gun database. For negative-case images, the MS COCO dataset was used. The goal of this paper is to present and compare the performance of the two models for gun detection in any given scenario. YOLOv3 outperforms Faster R-CNN with VGG16. The ultimate aim of this paper is to detect guns in an image accurately, which in turn can aid crime investigation.
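A minimal inference sketch for the Faster R-CNN side of this comparison is shown below. Note that the paper pairs Faster R-CNN with a VGG16 backbone, while torchvision ships a ResNet-50-FPN variant, used here as a freely available stand-in; the confidence threshold and input frame are also assumptions.

```python
# Faster R-CNN inference with torchvision's pretrained detector.
import torch
from torchvision.models.detection import fasterrcnn_resnet50_fpn

model = fasterrcnn_resnet50_fpn(weights="DEFAULT").eval()
image = torch.rand(3, 480, 640)        # stand-in for a surveillance frame

with torch.no_grad():
    out = model([image])[0]            # dict with boxes, labels, scores

keep = out["scores"] > 0.5             # simple confidence threshold
print(out["boxes"][keep], out["labels"][keep])
```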
Image segmentation is one of the key components in systems performing computer vision recognition tasks. Various algorithms for image segmentation have been developed in the literature. Among them, more recently, deep learning algorithms have been remarkably successful in performing this task. A downside of deep neural networks for segmentation is that they require a large amount of labeled data for training. This prerequisite is one of the main reasons that led researchers to adopt data augmentation approaches in order to minimize manual labeling efforts while maintaining highly accurate results. This paper uses classical non-deep-learning methods for background extraction to increase the size of the dataset used to train deep learning attention segmentation algorithms when images are presented to the model as time series. The presented method adopts Gaussian mixture-based (MOG2) foreground-background segmentation followed by dilation and erosion to create the masks necessary to train the deep learning models. It is applied in the context of planktonic images captured in situ as time series. Various evaluation metrics and visual inspection are used to compare the performance of the deep learning algorithms. Experimental results show higher accuracy achieved by the deep learning algorithms for time-series image attention segmentation when the proposed data augmentation methodology is utilized to increase the training dataset.
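The mask-generation step described above maps naturally onto OpenCV primitives. A minimal sketch, with the kernel size, MOG2 parameters, and synthetic frames as assumptions:

```python
# MOG2 background subtraction + dilation/erosion to build training masks.
import cv2
import numpy as np

subtractor = cv2.createBackgroundSubtractorMOG2(history=200, detectShadows=False)
kernel = np.ones((5, 5), np.uint8)

def make_mask(frame):
    fg = subtractor.apply(frame)                 # raw foreground mask
    fg = cv2.dilate(fg, kernel, iterations=2)    # close small gaps
    fg = cv2.erode(fg, kernel, iterations=1)     # trim dilation overshoot
    return (fg > 0).astype(np.uint8)             # binary training mask

# Hypothetical time series of BGR frames.
frames = [np.random.randint(0, 255, (240, 320, 3), np.uint8) for _ in range(10)]
masks = [make_mask(f) for f in frames]
print(masks[-1].sum(), "foreground pixels in the last frame")
```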
In this work, we present an auto-clustering method that can be used for pattern recognition tasks and applied to the training of a metric convolutional neural network. The main idea is that the algorithm creates clusters consisting of classes that are similar from the network's point of view. The use of clusters allows the network to pay more attention to classes that are hard to distinguish. This method improves the generation of pairs during the training process, which is a current problem because the optimal generation of data significantly affects the quality of training. The algorithm works in parallel with the training process and is fully automatic. To evaluate this method we chose the Korean alphabet with the corresponding PHD08 dataset and compared our auto-clustering with random mining, hard mining, and distance-based mining. The open-source framework Tesseract OCR 4.0.0 was also used to establish a baseline.
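One way such cluster-driven pair mining could look in practice is sketched below: class centroids in embedding space are clustered, and negative pairs are drawn from classes sharing a cluster, i.e. classes the network finds hard to distinguish. The embeddings, cluster count, and sampling rule are illustrative assumptions, not the paper's algorithm.

```python
# Cluster classes by embedding centroid, then mine negatives within clusters.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
n_classes, dim, n_clusters = 40, 64, 8
centroids = rng.normal(size=(n_classes, dim))   # per-class mean embeddings

clusters = KMeans(n_clusters=n_clusters, n_init=10,
                  random_state=0).fit_predict(centroids)

def sample_negative_pair():
    """Pick two different classes from the same cluster."""
    members = np.array([])
    while len(members) < 2:                     # skip singleton clusters
        c = rng.integers(n_clusters)
        members = np.flatnonzero(clusters == c)
    return tuple(rng.choice(members, size=2, replace=False))

print([sample_negative_pair() for _ in range(3)])
```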
The problem addressed in this paper is feature extraction and classification of images. As a solution, we propose a Deep Wavelet Network architecture based on the Wavelet Network and Stacked Auto-encoders. In this work, we shift from deep learning based on neural networks to deep learning based on wavelet networks. The latter does not change the general form of deep learning based on neural networks, but it is a novel method that exposes the feature extraction process and explains the image classification system. Our Deep Wavelet Network is created for the training and classification phases. After the training phase, a linear classifier is applied. Finally, our method is evaluated experimentally on the COIL-100 dataset.
Due to the manufacturing process and environmental effects, steel surfaces can have a variety of defects. The non-uniform surface brightness and the variety of defect shapes make their detection challenging. In our paper we propose neural networks for the recognition of new defect classes and also for the classification of known types. For the former, a zero-shot approach based on a siamese network is used, learning features to classify unseen classes without a single training example. Additionally, we can utilize one branch (one structural part) of the same network for the classification of previously trained defects. For performance evaluation, experiments were carried out on two benchmark datasets: the Northeastern University (NEU) and the Xsteel surface defect datasets. Results show that our method outperforms the state-of-the-art solutions on the NEU dataset for zero-shot learning and for classification, with accuracies of 85.80% and 100%, respectively. On the Xsteel dataset, we reached 98% for classification (the best known performance).
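A minimal sketch of siamese-style zero-shot classification as described above: a shared encoder embeds a query image and one reference per unseen class, and the query is assigned to the nearest reference. The stand-in encoder and image sizes are assumptions, not the paper's network.

```python
# Nearest-reference classification with a shared (siamese) encoder.
import torch
import torch.nn as nn

encoder = nn.Sequential(
    nn.Conv2d(1, 16, 3, stride=2, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(16, 32)
)

def classify(query, references):
    """references: (num_classes, 1, H, W); returns the nearest class index."""
    with torch.no_grad():
        q = encoder(query.unsqueeze(0))          # (1, 32)
        refs = encoder(references)               # (num_classes, 32)
    return torch.cdist(q, refs).argmin().item()

refs = torch.rand(5, 1, 64, 64)                  # one example per unseen class
print(classify(torch.rand(1, 64, 64), refs))
```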
Monitoring of pigs is currently done by humans who observe pig pens to acquire information, such as behaviour, that informs actions such as temperature regulation. These actions can have vital outcomes, as taking the wrong action, or not acting at all, can result in fatalities. A vision-based system that automatically monitors pigs therefore has the potential to improve the welfare of pigs and enhance the pig industry. This work investigates such a system and demonstrates several modules that can be utilized to obtain behavioural information, as well as how such information can be interpreted. Moreover, the utilized system modules and side-view data are evaluated. The code is public and available at: https://github.com/KHML-master/apimos.
Multi-source unsupervised domain adaptation (MUDA) has received increasing attention; it leverages knowledge from multiple relevant source domains with different distributions to improve learning performance on the target domain. The most common approach to MUDA is to perform pairwise distribution alignment between the target and each source domain. However, existing methods usually treat each source domain identically in source-source and source-target alignment, which ignores the differences between source domains and may lead to imperfect alignment. In addition, these methods often neglect samples near the classification boundaries during the adaptation process, resulting in misalignment of these samples. In this paper, we propose a new framework for MUDA, named Joint Alignment and Compactness Learning (JACL). We design an adaptive weighting network to automatically adjust the importance of marginal and conditional distribution alignment, and these weights are adopted to adaptively align each source-target pair of domains. We further propose to learn intra-class compact features for target samples that lie near boundaries to reduce the domain shift. Extensive experiments demonstrate that our method achieves remarkable results on three datasets (Digit-five, Office-31, and Office-Home) compared to recent strong baselines.
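The pairwise source-target alignment that JACL builds on can be sketched with a Gaussian-kernel MMD loss, weighted per source domain. The fixed weights below stand in for the paper's learned adaptive weighting network; the features and dimensions are assumptions.

```python
# Weighted pairwise source-target alignment with a Gaussian-kernel MMD.
import torch

def mmd(x, y, sigma=1.0):
    """(Biased) Maximum Mean Discrepancy between two feature batches."""
    def k(a, b):
        return torch.exp(-torch.cdist(a, b) ** 2 / (2 * sigma ** 2))
    return k(x, x).mean() + k(y, y).mean() - 2 * k(x, y).mean()

sources = [torch.randn(32, 128) for _ in range(3)]   # 3 source domains
target = torch.randn(32, 128)                        # target-domain features
weights = [0.5, 0.3, 0.2]                            # stand-in for learned weights

loss = sum(w * mmd(s, target) for w, s in zip(weights, sources))
print(loss.item())
```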
Moving object classification is a crucial step for several video surveillance applications, whether in the visible or thermal spectrum. It remains an active field of research given the diversity of challenges related to this topic, mainly in the context of outdoor scenes. To overcome several intricate situations, many moving object classification methods have been proposed in the literature, with particular interest given to the classes "Pedestrian" and "Vehicle". In this paper, we propose a moving object classification approach based on deep learning methods applied to the visible and infrared spectra. Three series of experiments carried out on the challenging "CD.net 2014" dataset show that the proposed method reaches accurate moving object classification results when compared to methods based on deep learning and handcrafted features.
Over time, crack patterns (craquelure) inevitably develop in paintings as a sign of their ageing, sometimes accompanied by larger losses of paint (lacunas). In restoration treatments, cracks are typically not filled in, and virtual restoration is often the only option to "reverse" the ageing of paintings by simulating their original appearance. Moreover, virtual restoration can serve as an important supporting step in decision making during physical restoration. In this research, we investigate the possibility of applying deep learning-based methods to virtual restoration. In particular, our crack detection method is based on a convolutional autoencoder (U-Net), and we employ a generative adversarial network (GAN) to virtually inpaint the detected cracks. We propose an original way of training the GAN model for painting restoration, which improves its practical performance. A series of experiments shows encouraging results in comparison with known methods and indicates the huge potential of deep learning for virtual painting restoration.
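The inpainting stage can be illustrated as follows. Since the paper's GAN requires trained weights, classical OpenCV inpainting (Telea's method) stands in here, applied to a synthetic crack mask of the kind the U-Net detector would produce.

```python
# Inpaint detected crack pixels; the GAN is replaced by cv2.inpaint.
import cv2
import numpy as np

painting = np.full((128, 128, 3), 180, np.uint8)     # stand-in for a painting
mask = np.zeros((128, 128), np.uint8)
cv2.line(mask, (10, 10), (118, 118), 255, 2)         # a synthetic "crack"
painting[mask > 0] = 30                              # darken crack pixels

restored = cv2.inpaint(painting, mask, inpaintRadius=3,
                       flags=cv2.INPAINT_TELEA)
print(abs(int(restored[64, 64].mean()) - 180) < 20)  # crack pixels refilled
```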
Malignant melanoma is a common skin cancer that is mostly curable before metastasis, when growths spawn in organs away from the original site. Melanoma is the most dangerous type of skin cancer if left untreated, due to the high risk of metastasis. This paper presents Melatect, a machine learning (ML) model embedded in an iOS app that identifies potential malignant melanoma. Melatect accurately classifies lesions as malignant or benign over 96.6% of the time, with no apparent bias or overfitting. Using the Melatect app, users can take pictures of skin lesions (moles) and subsequently receive a mole classification. The app provides a convenient way to get free advice on lesions and to track them over time. A recursive computer image analysis algorithm and a modified MLOps pipeline were developed to create a model that performs at a higher accuracy than existing models. Our training dataset included 18,400 images of benign and malignant lesions: 18,000 from the International Skin Imaging Collaboration (ISIC) archive and 400 gathered from local dermatologists; these images were augmented to 54,054 images using DeepAugment, an AutoML tool.
There is a growing need for accurate depth measurements of on-chip structures, fueled by the ongoing size reduction of integrated circuits. However, current metrology methods do not offer a satisfactory solution. As Critical Dimension Scanning Electron Microscopes (CD-SEMs) are already being used for fast and local 2D imaging, it would be beneficial to leverage the 3D information hidden in these images. In this paper, we present a method that can predict depth maps from top-down CD-SEM images. We demonstrate that the proposed neural network architecture, together with a tailored training procedure, leads to accurate depth predictions on synthetic and real experimental data. Our training procedure includes a domain adaptation step, which utilizes data from a different modality (scatterometry), in the absence of ground-truth data in the experimental CD-SEM domain. The mean relative error of the proposed method is smaller than 6.2% on a contact-hole dataset of synthetic CD-SEM images with realistic noise levels. Furthermore, we show that the method performs well in terms of important semiconductor metrics. To the best of our knowledge, we are the first to achieve accurate depth estimation results on experimental data by combining data from the aforementioned modalities, with a mean relative error smaller than 1%.
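For clarity, the mean relative error metric quoted above can be computed as follows; the depth maps are hypothetical and the epsilon guard is an assumption:

```python
# Mean relative error between predicted and ground-truth depth maps.
import numpy as np

def mean_relative_error(pred, gt, eps=1e-8):
    return np.mean(np.abs(pred - gt) / (np.abs(gt) + eps))

pred = np.random.rand(128, 128) * 100    # hypothetical predicted depths
gt = pred + np.random.randn(128, 128)    # small deviations from prediction
print(f"{100 * mean_relative_error(pred, gt):.2f}%")
```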
We present a collection of images of the same rotating plastic object made in the X-ray and visible spectra. Both parts of the dataset contain 400 images, taken every 0.5 degrees of the object's axial rotation. The collection is designed for evaluating the performance of circular motion estimation algorithms, as well as for studying the influence of the X-ray modality on image analysis algorithms such as keypoint detection and description. The dataset is available at https://github.com/Visillect/xvcmdataset.
The development of new methods for reconstructing an object image from its sinogram and some additional information about the object is important, because the reconstructed image may contain artifacts or lack sharpness when the additional information used does not hold. We propose and develop a new method for processing tomographic images based on combining an assumption-free image of the object with a dummy estimate, followed by projection onto the set of images of possible objects, for the case when this set is the non-negative cone. The additional information here corresponds to the natural condition that the estimated brightnesses are non-negative. The proposed method, with different iterative solvers and parameters, is illustrated by examples of processing sinograms of teeth.
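The projection step is straightforward to state in code: projecting an estimate onto the non-negative cone zeroes out negative brightnesses, and this step can be inserted into any iterative solver. A minimal sketch:

```python
# Projection onto the non-negative cone: clip negative brightnesses to zero.
import numpy as np

def project_nonnegative(image):
    return np.clip(image, 0.0, None)

estimate = np.random.randn(64, 64)   # intermediate estimate, may go negative
estimate = project_nonnegative(estimate)
assert (estimate >= 0).all()
```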
In computed tomography (CT), the relative trajectories of the sample, the detector, and the signal source are traditionally considered to be known, since they result from the intentional, preprogrammed movement of the instrument parts. However, due to mechanical backlash, rotation sensor measurement errors, and thermal deformations, the real trajectory differs from the desired one. This negatively affects the quality of the tomographic reconstruction. Neither calibration nor preliminary adjustment of the device completely eliminates the inaccuracy of the trajectory, but both significantly increase the cost of instrument maintenance. A number of approaches to this problem are based on automatically refining the estimate of the source and sensor positions relative to the sample for each projection (at each time step) during the reconstruction process. A similar problem of position refinement while observing different images of an object from different angles is well known in robotics (particularly for mobile robots and self-driving vehicles) and is called Simultaneous Localization And Mapping (SLAM). The scientific novelty of this work is to treat the problem of trajectory refinement in microtomography as a SLAM problem. This is achieved by extracting Speeded-Up Robust Features (SURF) from X-ray projections, filtering matches with Random Sample Consensus (RANSAC), calculating angles between projections, and using them in a factor graph in combination with stepper motor control signals in order to refine the rotation angles.
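The feature-matching stage of this pipeline can be sketched with OpenCV. SURF requires an OpenCV build with the nonfree modules enabled, so ORB is substituted below as a freely available detector; RANSAC filtering runs through cv2.findHomography, and the projections are synthetic stand-ins.

```python
# Keypoint matching between adjacent projections with RANSAC filtering.
import cv2
import numpy as np

proj_a = np.random.randint(0, 255, (256, 256), np.uint8)  # stand-ins for two
proj_b = np.random.randint(0, 255, (256, 256), np.uint8)  # X-ray projections

orb = cv2.ORB_create(nfeatures=500)
kp_a, des_a = orb.detectAndCompute(proj_a, None)
kp_b, des_b = orb.detectAndCompute(proj_b, None)

if des_a is not None and des_b is not None:
    matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
    matches = matcher.match(des_a, des_b)
    if len(matches) >= 4:
        src = np.float32([kp_a[m.queryIdx].pt for m in matches]).reshape(-1, 1, 2)
        dst = np.float32([kp_b[m.trainIdx].pt for m in matches]).reshape(-1, 1, 2)
        H, inlier_mask = cv2.findHomography(src, dst, cv2.RANSAC, 5.0)
        if inlier_mask is not None:
            print(int(inlier_mask.sum()), "matches survive RANSAC filtering")
```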
The ETL process is responsible for the integration of data into the data warehouse. It extracts data from different sources, transforms it to deliver quality data of value for analysis, and loads it into the warehouse. In this context, we propose a new approach called Mapping-ELT (M-ELT) to handle basic ELT operations while taking semantic heterogeneity into account. To accelerate data handling, Hive is used to improve data warehousing capabilities, and an ontology is used as a solution to the problems of semantic heterogeneity. Experimental results confirm that the ELT operations work well, particularly with the adapted operations.