This PDF file contains the front matter associated with SPIE Proceedings Volume 11736, including the Title Page, Copyright Information, and Table of Contents.
Deep Neural Networks (DNNs), although promising for image processing, have significant computational complexity, which impedes their implementation in resource-constrained systems. This paper presents effective heuristic approaches for porting DNNs onto mobile devices. Four sets of heuristics are studied: (1) heuristics based on the reuse of transferred weight matrices and weight pruning; (2) heuristics based on parameter reduction, network acceleration, and non-tensor layer improvements; (3) a suite of heuristics for low-power acceleration of DNNs based on dataflow, near-memory and in-memory processing, transform schemes, and analog-based approaches; and (4) heuristics based on feature and feature-map pruning utilizing cosine distances. These sets of heuristics achieve significant complexity, memory, and power reductions with minimal loss of accuracy across an assortment of state-of-the-art DNNs and applications.
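As an illustration of heuristics (1) and (4), here is a minimal sketch (ours, not the authors' code) of magnitude-based weight pruning and cosine-distance filter pruning on a toy PyTorch convolution; layer sizes and thresholds are assumptions:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

conv = nn.Conv2d(16, 32, 3, padding=1)  # toy layer

# Heuristic (1) ingredient: zero out the smallest fraction p of weights by magnitude.
def magnitude_prune(weight: torch.Tensor, p: float) -> torch.Tensor:
    threshold = weight.abs().flatten().kthvalue(int(p * weight.numel())).values
    return weight * (weight.abs() > threshold)

# Heuristic (4) ingredient: flag one of each pair of filters whose flattened
# kernels are nearly parallel (cosine distance below tau), since such filters
# produce redundant feature maps.
def redundant_filters(weight: torch.Tensor, tau: float = 0.05) -> set:
    flat = weight.flatten(1)  # (out_channels, in_channels * k * k)
    sim = F.cosine_similarity(flat.unsqueeze(1), flat.unsqueeze(0), dim=2)
    drop = set()
    for i in range(sim.size(0)):
        for j in range(i + 1, sim.size(0)):
            if j not in drop and 1.0 - sim[i, j].item() < tau:
                drop.add(j)
    return drop

with torch.no_grad():
    conv.weight.copy_(magnitude_prune(conv.weight, p=0.5))
print("redundant filters:", redundant_filters(conv.weight))
```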
Fingerprinting is a form of biometrics, a science that can be used for personal identification. It is one of the most important techniques and security measures for human authentication across the globe due to its uniqueness and individualistic characteristics. Fingerprints are made up of an arrangement of ridges, called friction ridges. Each ridge contains pores, which are attached to glands under the skin. Several algorithms have proposed different approaches to reconstruct fingerprint images; however, these works encountered problems with poor quality and the presence of structured noise in the images. In this paper, we present a novel fingerprint system built on a more distinctive and robust algorithm that can distinguish between individuals effectively. A sparse autoencoder (SAE), an unsupervised deep learning model that replicates its input at its output, is used to reconstruct fingerprint images. The architecture is designed and trained on datasets of fingerprint images that are pre-processed to fit the model. Three datasets of fingerprint images were used to validate the robustness of the model, each split into 70% for training and 30% for testing. The SAE is fine-tuned and optimized with L2 and sparsity regularization, which increased the efficiency of representation learning. The sparse autoencoder proves a suitable deep learning model for significantly improving the reconstruction of fingerprint images. The proposed approach showed promising results and can enhance the quality of reproduced fingerprint images, yielding a clear ridge structure and eliminating various overlapping patterns.
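A minimal sketch of a sparse autoencoder with L2 and sparsity regularization in Keras, under assumed patch size and layer widths (the paper's sparsity term is likely KL-based; an L1 activity penalty stands in for it here):

```python
import tensorflow as tf
from tensorflow.keras import layers, regularizers

inp = layers.Input(shape=(96 * 96,))  # flattened 96x96 grayscale patch (assumed size)
code = layers.Dense(
    256, activation="sigmoid",
    kernel_regularizer=regularizers.l2(1e-4),    # L2 weight decay
    activity_regularizer=regularizers.l1(1e-5),  # sparsity penalty on activations
)(inp)
out = layers.Dense(96 * 96, activation="sigmoid")(code)

sae = tf.keras.Model(inp, out)
sae.compile(optimizer="adam", loss="mse")
# x_train: (n, 9216) array of normalized fingerprint patches (70/30 split).
# sae.fit(x_train, x_train, epochs=50, validation_split=0.1)
```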
Real-time processing of images and videos is becoming increasingly crucial in modern applications of machine learning (ML) and deep neural networks. Faster, more compact floating-point arithmetic can significantly increase the performance of such applications by optimizing memory occupation and the transfer of information. In this field, the novel posit number system is very promising. In this paper we exploit posit numbers to evaluate the performance of several machine learning algorithms in real-time image and video processing applications. Future steps will involve further hardware acceleration for native posit operations.
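A hedged sketch of evaluating an ML kernel (a dot product) in posit arithmetic, assuming the open-source softposit package; the posit16 type and the toy kernel are our illustration, not the paper's benchmark:

```python
import numpy as np
import softposit as sp  # assumed: pip-installable software posit library

rng = np.random.default_rng(0)
x = rng.standard_normal(256).astype(np.float32)
w = rng.standard_normal(256).astype(np.float32)

ref = float(np.dot(x, w))  # float32 reference result

acc = sp.posit16(0.0)      # 16-bit posit accumulator
for xi, wi in zip(x, w):
    acc += sp.posit16(float(xi)) * sp.posit16(float(wi))

print("float32:", ref, " posit16:", acc)
```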
Segmentation of individual seagrass images is of importance to biologists who are investigating individual seagrass blade cover to correlate the surface cover information with benthic environmental factors. Seagrasses may be covered with epiphytes such as crustose and filamentous algae and tubeworms, all bioindicators of nutrient and turbidity conditions in the seagrass environment. Classical image processing techniques to segment seagrasses have been successful; however, such techniques are relatively time-consuming. We introduce deep learning as a computationally efficient approach to perform semantic segmentation of multiple seagrass images to determine each blade’s percent cover and surface composition. Pre-trained ResNet-18 and ResNet-50 convolutional neural networks have been adapted using transfer learning to classify seagrass blade surface composition. Seagrass surface semantic segmentation and mapping is achieved for five classes: the bare seagrass blade (no cover), general epiphyte, tubeworm, filamentous algae, and background. We present the application of deep learning in two convolutional neural networks to achieve semantic segmentation of seagrass blades as a fast tool for seagrass surface classification. Classification accuracy and computational performance of the two deep CNNs are presented.
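A hedged sketch of the transfer-learning step with a recent torchvision: the ImageNet-pre-trained ResNet-18 gets a new 5-class head while the backbone stays frozen. Patch-wise classification tiled back into a segmentation map is one plausible reading of the approach; the details below are ours, not the paper's:

```python
import torch
import torch.nn as nn
from torchvision import models

# Requires torchvision >= 0.13 for the weights enum.
model = models.resnet18(weights=models.ResNet18_Weights.IMAGENET1K_V1)
for p in model.parameters():      # freeze the pre-trained backbone
    p.requires_grad = False
model.fc = nn.Linear(model.fc.in_features, 5)  # new trainable 5-class head

optimizer = torch.optim.Adam(model.fc.parameters(), lr=1e-3)
criterion = nn.CrossEntropyLoss()
# Train on fixed-size blade patches, then tile the patch labels back into a
# per-blade segmentation map to obtain percent cover per class.
```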
Prior work leveraging neural networks in agriculture has achieved significant results in the autonomous classification of plant diseases. One notable complication for classification using neural networks, however, is the inability to acknowledge classes the model was not trained to identify. With the recent advancements in computer vision, we develop a convolutional neural network for the specific intent of detecting Northern Corn Leaf Blight via segmentation, resulting in a network that is resistant to diseases the model is not capable of classifying and thus reduces occurrences of Type I and Type II error. The model is trained on a publicly available dataset of maize images with Northern Corn Leaf Blight and annotations documenting the precise locations of the disease in each image. We report the mean average precision (mAP) of the developed model and its effectiveness in real-time detection, including its latency and computational overhead. The impact of this research is a reliable means of identifying specific diseases in plants, reducing misclassification due to inability to classify, and facilitating the development of products that incorporate microcontrollers while demonstrating their suitability for real-time disease detection.
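One way a segmentation output can yield a detector that rejects unknown diseases is an area-based decision rule, sketched below; seg_model, the thresholds, and the rule itself are illustrative assumptions, not the paper's stated method:

```python
import numpy as np

def detect_nclb(prob_mask: np.ndarray, pixel_thresh=0.5, area_thresh=0.01):
    """prob_mask: HxW lesion probabilities from a trained seg_model(image)."""
    lesion = prob_mask > pixel_thresh
    coverage = lesion.mean()  # fraction of pixels predicted as lesion
    # Healthy leaves and unknown diseases fall below the area threshold and
    # are rejected instead of being forced into a known class.
    return coverage >= area_thresh, coverage
```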
Photoacoustic imaging is an emerging imaging technology based on the photoacoustic effect. As a hybrid modality that combines pure optical imaging and ultrasound imaging, it offers the high resolution and rich contrast of optical imaging together with the high penetration depth of acoustic imaging. With these advantages, photoacoustic imaging has extremely broad applications in biomedical testing, such as brain imaging and tumor imaging. Due to the optical diffraction limit of the objective lens, the resolution of the obtained image is hard to improve further; therefore, finer structural information is difficult to obtain. To solve this problem, we use an end-to-end low-resolution-to-high-resolution convolutional neural network to further process the obtained low-resolution images, producing optimized high-resolution images and improving imaging quality. The convolutional neural network is built in PyCharm using the open-source TensorFlow library. Bicubic interpolation is used to preprocess the original data, the network is trained on the processed sample data, and finally a series of photoacoustic microscopy images of cerebral blood vessels[1,2] is tested. The test results show that the resolution of the images is significantly improved and clearer images are obtained. The experimental results verify that this end-to-end low-resolution-to-high-resolution convolutional neural network can effectively improve the resolution of photoacoustic imaging, laying a good foundation for follow-up biomedical research[3] with photoacoustic imaging technology.
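A minimal SRCNN-style sketch of the described pipeline (bicubic upscaling followed by an end-to-end CNN) in TensorFlow; the layer sizes follow the classic SRCNN recipe and are an assumption, not the paper's exact architecture:

```python
import tensorflow as tf

def build_sr_model():
    inp = tf.keras.Input(shape=(None, None, 1))  # bicubic-upscaled input
    x = tf.keras.layers.Conv2D(64, 9, padding="same", activation="relu")(inp)
    x = tf.keras.layers.Conv2D(32, 1, padding="same", activation="relu")(x)
    out = tf.keras.layers.Conv2D(1, 5, padding="same")(x)  # restored image
    return tf.keras.Model(inp, out)

model = build_sr_model()
model.compile(optimizer="adam", loss="mse")
# Preprocessing step from the abstract:
# lr_up = tf.image.resize(lr, hr_shape, method="bicubic")
# model.fit(lr_up, hr, epochs=...)
```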
Human action recognition has been utilized in many applications such as human-computer interaction, video surveillance, assistive living, and gaming. Deployment of human action recognition demands the processing to be carried out in real-time or in a computationally efficient manner. The real-time requirement is addressed by only a subset of the developed methods in the literature. This paper provides a review of computationally efficient human action recognition methods in which a vision sensor is used. The reviewed papers are categorized in terms of conventional and deep learning approaches as well as in terms of single vision and multi-vision modality sensing.
People recognition is a relevant subset of the generic image-based recognition task with many possible application areas, such as security, surveillance, human-robot interaction, or, more recently, social safety in a pandemic context. In this work we present a lightweight recognition pipeline for time-of-flight cameras based on deep learning techniques tailored to this specific type of camera with registered infrared and depth images. By combining the maturity of 2D image-based recognition techniques with custom depth sensing, we achieved effective solutions for a number of relevant industrial applications. In particular, our focus was on automatic door-control and people-counting applications.
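A minimal sketch of how registered infrared and depth images could be exploited: stacking them as a two-channel input to a small person/no-person CNN. The architecture and frame size are purely illustrative; the paper does not specify its network:

```python
import tensorflow as tf

inp = tf.keras.Input(shape=(120, 160, 2))  # ToF frame: [IR, depth] channels
x = tf.keras.layers.Conv2D(16, 3, strides=2, activation="relu")(inp)
x = tf.keras.layers.Conv2D(32, 3, strides=2, activation="relu")(x)
x = tf.keras.layers.GlobalAveragePooling2D()(x)
out = tf.keras.layers.Dense(1, activation="sigmoid")(x)  # person present?
model = tf.keras.Model(inp, out)  # small enough for embedded door-control use
```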
Streaming 360-degree videos over the internet is a challenging task, but it provides rich multimedia experiences by allowing viewers to navigate 360-degree content. 360-degree videos need larger bandwidth and lower latency than conventional videos to be streamed over the internet. Therefore, the non-visible area must be discarded from the video to save bandwidth. View prediction techniques have been used to predict the visible area of the 360-degree video frames to be streamed. Linear regression on a viewer’s past viewing behavior is useful for predicting the viewer’s short-term future behavior, but not when the network delay is longer than the prediction horizon. Object detection techniques help predict viewers’ future motion over a longer prediction horizon, since viewers tend to follow the objects that draw their attention. However, conventional object detection techniques using a convolutional neural network, such as YOLO, are difficult to apply to 360-degree videos. Distortions arise when the spherical 360-degree video is projected into equirectangular videos for processing and storage, and the same object can take different shapes in the equirectangular video depending on its angular position on the sphere. Therefore, in this paper, we propose a multi-directional projection (MDP) technique to detect objects in 360-degree videos. The proposed multi-directional projection technique mitigates the distortions in the equirectangular videos and feeds the redirected videos to the object detection system, so a neural network trained on a conventional video dataset can be used without any change. Experimental results show that the proposed method helps detect objects at the edges of the 360-degree videos.
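A sketch of the re-projection idea: render several rectilinear views from the equirectangular frame so a standard detector sees undistorted objects. This is a textbook gnomonic projection written with OpenCV, not the authors' exact MDP implementation:

```python
import cv2
import numpy as np

def rectilinear_view(equi, yaw, pitch, fov_deg=90, size=416):
    H, W = equi.shape[:2]
    f = 0.5 * size / np.tan(np.radians(fov_deg) / 2)  # pinhole focal length
    u, v = np.meshgrid(np.arange(size) - size / 2, np.arange(size) - size / 2)
    # Unit rays in the virtual camera frame (z forward, y down).
    rays = np.stack([u, v, np.full_like(u, f)], axis=-1)
    rays /= np.linalg.norm(rays, axis=-1, keepdims=True)
    # Rotate rays by pitch (about x), then yaw (about y).
    cp, sp = np.cos(pitch), np.sin(pitch)
    cy, sy = np.cos(yaw), np.sin(yaw)
    Rx = np.array([[1, 0, 0], [0, cp, -sp], [0, sp, cp]])
    Ry = np.array([[cy, 0, sy], [0, 1, 0], [-sy, 0, cy]])
    rays = rays @ (Ry @ Rx).T
    lon = np.arctan2(rays[..., 0], rays[..., 2])
    lat = np.arcsin(np.clip(rays[..., 1], -1, 1))
    map_x = ((lon / (2 * np.pi) + 0.5) * W).astype(np.float32)
    map_y = ((lat / np.pi + 0.5) * H).astype(np.float32)
    return cv2.remap(equi, map_x, map_y, cv2.INTER_LINEAR,
                     borderMode=cv2.BORDER_WRAP)

# views = [rectilinear_view(frame, yaw, 0.0)
#          for yaw in np.linspace(0, 2 * np.pi, 6, endpoint=False)]
```

Running an unmodified detector such as YOLO on each view and mapping the boxes back to the sphere completes the pipeline.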
The presence of Advanced Driver Assistance Systems (ADAS) in modern vehicles has become a reality in recent years, enhancing the comfort and safety of drivers and road users. On the track to full autonomous driving, it is vital to include Driver Monitoring Systems (DMS) as part of the automation set of systems to assure possible hand-over/hand-back actions. The development of DMS usually involves the integration of different computer vision and deep learning components. In this work we present a modular approach for rapid prototyping of DMS by defining atomic processing units (i.e., Analyzers) and the interface (i.e., Measures) between these units. This approach allows the definition of a network of Analyzers that can be easily interconnected in pipelines to perform specific DMS tasks (drowsiness, distraction, identity recognition). A key advantage of our approach is that a single step can be re-used for multiple DMS functionalities without the need to double computational resources. In addition, it is possible to test and validate different methods that share the same interfaces and produce the same measures, so it is easy to switch between algorithms in a pipeline. The distributed processing capabilities of the resulting DMS architectures allow the generation of parallel processes on specialized hardware (i.e., multi-core CPU and GPU boards) with a positive impact on real-time performance. Our DMS framework is compatible with the RTMaps automotive-grade platform for real-time multi-sensor data processing, and the interfaces are compliant with the ASAM OpenLABEL concept paper through the VCD description format.
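A minimal sketch of the Analyzer/Measure contract described above: each Analyzer consumes named Measures and publishes new ones, so a pipeline is just a dependency-ordered list and one Analyzer (e.g., face detection) can feed several DMS functions. Class and field names here are our illustration, not the framework's actual API:

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

@dataclass
class Measure:
    name: str
    value: object
    timestamp: float = 0.0

class Analyzer:
    def __init__(self, name: str, inputs: List[str], fn: Callable):
        self.name, self.inputs, self.fn = name, inputs, fn

    def run(self, store: Dict[str, Measure]) -> None:
        args = [store[k].value for k in self.inputs]
        for m in self.fn(*args):  # fn returns a list of new Measures
            store[m.name] = m

def run_pipeline(analyzers: List[Analyzer], store: Dict[str, Measure]):
    for a in analyzers:  # dependency order; independent Analyzers can run in parallel
        a.run(store)
    return store
```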
Real-time circle detection requires a considerable amount of computing power, especially with growing image size. This paper presents a modified version of the Hough transform with a dedicated and streamlined pre-processing stage to detect circles in video images in real time on mid-range smartphones. The Hough transform for detecting co-circular line pixels requires a 3-dimensional data space, instead of the 2 dimensions needed for detecting co-linear pixels. This dimensional complexity, and the fact that the Hough transform in general requires computationally expensive pre-processing, makes optimizations for hand-held or embedded systems inevitable. This paper shows multiple modifications that tune the algorithms by trading mathematical accuracy against processing speed, improving the overall computational performance significantly. These optimizations include replacing the edge detection process entirely with simple but smart thresholding and pixel-wise neighbourhood inspection, using pre-calculated lookup tables instead of complex calculations, and restricting the Hough space in size and precision. These modifications were implemented and tested on both desktop and mobile devices for comparison, without any GPU support. Benchmarks showed that more than 60 FPS on desktops and more than 20 FPS on mobile devices are achievable when processing full-HD images, which allows implementations to meet the real-time constraints and deadlines specified by a concrete application: an ambulant water quality analysis scenario.
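Two of the named optimizations, a pre-computed cos/sin lookup table and a coarsened accumulator that restricts Hough-space size and precision, can be sketched as follows (parameters are illustrative, not the paper's tuned values):

```python
import numpy as np

ANGLES = np.deg2rad(np.arange(0, 360, 4))  # coarse angular step (precision trade-off)
COS_LUT, SIN_LUT = np.cos(ANGLES), np.sin(ANGLES)  # computed once, reused per vote

def hough_circles(edge_pts, shape, radii, bin_size=2):
    h, w = shape[0] // bin_size, shape[1] // bin_size  # coarsened Hough space
    acc = np.zeros((len(radii), h, w), dtype=np.uint32)
    for (y, x) in edge_pts:
        for ri, r in enumerate(radii):
            cx = ((x - r * COS_LUT) / bin_size).astype(int)
            cy = ((y - r * SIN_LUT) / bin_size).astype(int)
            ok = (cx >= 0) & (cx < w) & (cy >= 0) & (cy < h)
            np.add.at(acc[ri], (cy[ok], cx[ok]), 1)
    return acc  # accumulator peaks give circle centres and radii
```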
One of the most popular platforms for developing real-time image and video processing applications is NVIDIA's Jetson Nano, which is equipped with CUDA to accelerate performance. However, no research has yet evaluated CUDA performance on the Jetson Nano for real-time image/video processing applications. This research evaluates the CUDA performance of the Jetson Nano by running a thresholding application with and without the CUDA feature. The aspects evaluated in this study are CPU usage percentage, GPU usage percentage, temperature level, and current draw on the CPU and GPU. The results show that the CUDA feature does not always add value or performance: CUDA runs very effectively offline (not real-time), but in real-time operation the performance with and without CUDA is almost the same.
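A benchmark sketch in the spirit of this evaluation: the same binary threshold timed on CPU and GPU, assuming an OpenCV build with CUDA support (cv2.cuda is absent from stock pip wheels). Including the host-device copies in the GPU timing is what erodes the real-time advantage:

```python
import time
import cv2
import numpy as np

frame = np.random.randint(0, 256, (1080, 1920), dtype=np.uint8)

t0 = time.perf_counter()
_, cpu_out = cv2.threshold(frame, 127, 255, cv2.THRESH_BINARY)
t_cpu = time.perf_counter() - t0

gpu_in = cv2.cuda_GpuMat()
gpu_in.upload(frame)  # in a live pipeline this copy happens every frame
t0 = time.perf_counter()
_, gpu_out = cv2.cuda.threshold(gpu_in, 127, 255, cv2.THRESH_BINARY)
result = gpu_out.download()  # device->host copy included in the timing
t_gpu = time.perf_counter() - t0

print(f"CPU: {t_cpu * 1e3:.2f} ms  GPU incl. transfers: {t_gpu * 1e3:.2f} ms")
```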
Since the first quarter of this year, the spread of the SARS-CoV-2 virus has been a worldwide health priority. Medical testing consists of lab studies, PCR tests, CT, and PET, which are time-consuming, and some countries lack these resources. One medical tool for diagnosis is X-ray imaging, which is one of the fastest and lowest-cost resources for physicians to detect and distinguish among these different diseases. We propose an X-ray CAD system based on DCNNs, using well-known architectures such as DenseNet-201, ResNet-50, and EfficientNet. These architectures are pre-trained on data from the ImageNet classification challenge, and transfer learning methods are used to fine-tune the classification stage. The system can visualize the learned recognition patterns by applying the Grad-CAM algorithm, aiming to help physicians seek hidden features beyond perceptual vision. The proposed CAD can differentiate between COVID-19, pneumonia, nodule, and normal lung X-ray images.
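A hedged Grad-CAM sketch for the visualization step, applicable to a fine-tuned Keras DenseNet-201; the default layer name "relu" (the last activation block of tf.keras DenseNet-201) and the surrounding code are our assumptions:

```python
import tensorflow as tf

def grad_cam(model, image, class_idx, conv_layer="relu"):
    """image: HxWx3 preprocessed array; returns a heat map in [0, 1]."""
    sub = tf.keras.Model(model.inputs,
                         [model.get_layer(conv_layer).output, model.output])
    with tf.GradientTape() as tape:
        fmaps, preds = sub(image[None, ...])
        score = preds[:, class_idx]
    grads = tape.gradient(score, fmaps)
    weights = tf.reduce_mean(grads, axis=(1, 2))  # global-average-pooled gradients
    cam = tf.nn.relu(tf.einsum("bhwc,bc->bhw", fmaps, weights))[0]
    return (cam / (tf.reduce_max(cam) + 1e-8)).numpy()
```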
Land-cover classification is one of many applications of remote sensing. This task usually requires image processing to compute relevant features, which are then the input to a classifier. Some feature extraction algorithms can become costly in processing time since remote sensing images, such as hyperspectral ones, consist of a large number of bands. This delays the classification stage; consequently, the final land-cover classification results cannot be obtained in real time. Therefore, a parallel implementation of the feature extraction stage may contribute to real-time classification of hyperspectral images by reducing the computing time of the features. In the specific case of hyperspectral images, features can be categorized as spatial and spectral, with the algorithms for the spatial ones more susceptible to increased computational time due to parameters such as neighborhood size. One spatial-feature extraction method that has led to desirable classification results in image processing is the Gabor filter. Nonetheless, it entails a high computational cost because of the application of a filter bank composed of various rotations and scales. This work proposes a parallel implementation of a Gabor-filter feature extraction method for hyperspectral images on a Graphics Processing Unit (GPU) and a multi-core Central Processing Unit (CPU). The performance of the implementation is compared with the non-parallel version of the process in terms of computing time and the time complexity of the algorithms. Furthermore, the feature extraction method is evaluated with a Support Vector Machine (SVM), using overall accuracy and the kappa coefficient as quality metrics.
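The CPU reference of the Gabor stage can be sketched with OpenCV as below; a GPU port would replace the inner filter2D convolution. Kernel sizes, scales, and orientations are illustrative:

```python
import cv2
import numpy as np

def gabor_bank(ksize=21, scales=(2.0, 4.0), n_orient=4):
    # One kernel per (scale, rotation) pair; this bank is what makes the
    # stage expensive when applied to every band.
    return [cv2.getGaborKernel((ksize, ksize), sigma=s, theta=t,
                               lambd=2 * s, gamma=0.5)
            for s in scales
            for t in np.linspace(0, np.pi, n_orient, endpoint=False)]

def gabor_features(cube):  # cube: (H, W, bands) hyperspectral image
    bank = gabor_bank()
    feats = [cv2.filter2D(cube[..., b], cv2.CV_32F, k)
             for b in range(cube.shape[-1]) for k in bank]
    return np.stack(feats, axis=-1)  # (H, W, bands * |bank|) feature stack
```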
Modern Advanced Driver Assistance Systems (ADAS) require the ability to sense and process information in real time. More specifically, these devices need to detect lanes in images accurately and quickly. The Hough transform (HT) is a very accurate method for finding lines in a still image. In order to meet real-time requirements with low power consumption, a hardware architecture for the Hough transform in a real-time lane detection system is proposed. This design efficiently and aggressively utilizes the DSP and embedded memory blocks of a configurable platform to speed up the HT calculation and reduce the resource requirements of the system. The proposed design uses a parallel-pipeline architecture to allow full coverage of all line possibilities while optimizing for hardware restrictions. Initial results show that the proposed design achieves a normalized processing rate of 6.06 ns per pixel, which is suitable for real-time lane detection applications.
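A software model of the per-pixel work the hardware pipelines: each edge pixel votes rho = x*cos(theta) + y*sin(theta) for every theta bin, and the proposed design evaluates the theta bins in parallel DSP blocks. Python sketch for illustration only:

```python
import numpy as np

THETAS = np.deg2rad(np.arange(0, 180, 1))
COS, SIN = np.cos(THETAS), np.sin(THETAS)

def hough_lines(edge_pts, rho_max, rho_res=1.0):
    """rho_max: image diagonal length, so all votes fall inside the accumulator."""
    n_rho = int(2 * rho_max / rho_res) + 1
    acc = np.zeros((len(THETAS), n_rho), dtype=np.uint32)
    for (y, x) in edge_pts:
        rho = x * COS + y * SIN  # one multiply-accumulate pair per theta bin
        bins = ((rho + rho_max) / rho_res).astype(int)
        acc[np.arange(len(THETAS)), bins] += 1
    return acc  # peaks correspond to detected lane lines
```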
Processing hyperspectral image data can be computationally expensive and difficult to employ for real-time applications due to its extensive spatial and spectral information. Further, applications in which computational resources may be limited, such as those requiring artificial intelligence at the edge, can be hindered by the volume of data that is common with airborne hyperspectral imagery. This paper proposes utilizing band selection to down-select the number of spectral bands for a given classification task so that classification can be performed at the edge with lower computational complexity. Specifically, we consider popular techniques for band selection and investigate their feasibility for identifying discriminative bands such that classification performance is not drastically hindered. This would greatly benefit applications where time-sensitive solutions are needed to ensure optimal outcomes (related to defense, natural disaster relief/response, agriculture, etc.). Performance of the proposed approach is measured in terms of classification accuracy and run time.
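As one example of a popular band-selection technique of the kind the study considers, bands can be ranked by mutual information with the class labels; the scikit-learn call below is our illustration and not necessarily one of the paper's methods:

```python
import numpy as np
from sklearn.feature_selection import mutual_info_classif

def select_bands(X, y, k=20):
    """X: (pixels, bands) spectra; y: (pixels,) ground-truth labels."""
    mi = mutual_info_classif(X, y, random_state=0)  # relevance score per band
    keep = np.argsort(mi)[::-1][:k]                 # top-k discriminative bands
    return np.sort(keep)

# X_small = X[:, select_bands(X, y)]  # classify at the edge on k bands only
```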
Hyperspectral image analysis has been attracting research attention in a variety of fields. Since the size of hyperspectral data cubes can easily reach gigabytes, their costly transfer, manual delineation, and intrinsic heterogeneity have become serious obstacles to building ground-truth datasets in emerging scenarios. Therefore, applying supervised learners for hyperspectral classification and segmentation remains a difficult yet very important task in practice, as segmentation is a pivotal step in extracting useful information about the scanned area from such highly dimensional data. We tackle this problem using self-organizing maps and exploit an unsupervised algorithm for segmenting such imagery. The experimental study, performed over two benchmark hyperspectral scenes and backed up with a sensitivity analysis, showed that our technique can be applied for this purpose: it is flexible, delivers reliable segmentations, and offers fast operation.
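An unsupervised segmentation sketch with a self-organizing map, assuming the open-source MiniSom package: each pixel's spectrum is assigned to its winning SOM node, whose index serves as the segment label. Grid size and iteration count are illustrative:

```python
import numpy as np
from minisom import MiniSom  # assumed: pip-installable SOM library

def som_segment(cube, grid=(3, 3), iters=5000):
    H, W, B = cube.shape
    pixels = cube.reshape(-1, B).astype(float)
    pixels /= np.linalg.norm(pixels, axis=1, keepdims=True) + 1e-12
    som = MiniSom(grid[0], grid[1], B, sigma=1.0, learning_rate=0.5,
                  random_seed=0)
    som.train_random(pixels, iters)          # unsupervised: no labels needed
    labels = np.array([som.winner(p)[0] * grid[1] + som.winner(p)[1]
                       for p in pixels])     # winning node index per pixel
    return labels.reshape(H, W)              # up to grid[0]*grid[1] segments
```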
Moving object detection and recognition is widely used in the computer vision and remote sensing fields. When a foreground object exists in the initial frame, the original ViBe algorithm suffers from the ghost phenomenon, and its fixed threshold is not always appropriate for different background complexities. In light of this, an improved ViBe algorithm is proposed in this paper. To reduce the repetition rate of pixel values in the background model, the proposed method changes the way the neighborhood is selected so as to improve the accuracy of background model initialization. During the background model update process, different time subsampling factors are used to speed up the update. Based on the characteristically low texture of ghost regions, texture feature operators are used to further remove ghosts. In addition, an adaptive threshold replaces the fixed threshold to improve the anti-noise performance of the algorithm. Shadow features (distinctive brightness, hue, and saturation) are used to counteract the drop in detection accuracy caused by moving shadows. Experiments conducted on the public ChangeDetection.net dataset indicate that the proposed method is superior to the original ViBe algorithm: higher detection accuracy is achieved, and the ghost phenomenon and moving shadows are alleviated at similar detection efficiency.
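For reference, the core ViBe step that the listed improvements modify can be sketched as follows (simplified, grayscale, with a fixed radius R and subsampling factor; the paper makes both adaptive):

```python
import numpy as np

N, R, MIN_MATCHES, SUBSAMPLE = 20, 20, 2, 16

def vibe_step(frame, samples):
    """frame: (H, W) uint8; samples: (H, W, N) background sample model."""
    dist = np.abs(samples.astype(int) - frame[..., None].astype(int))
    matches = (dist < R).sum(axis=2)  # fixed R; an adaptive threshold replaces it
    fg = matches < MIN_MATCHES        # foreground where too few samples match
    # Conservative update: background pixels refresh one random sample with
    # probability 1/SUBSAMPLE; varying this factor speeds up the update.
    upd = (~fg) & (np.random.randint(SUBSAMPLE, size=fg.shape) == 0)
    ys, xs = np.nonzero(upd)
    samples[ys, xs, np.random.randint(N, size=len(ys))] = frame[ys, xs]
    return fg
```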
Deep Neural Networks (DNNs) have been deployed in many real-world applications in various domains, both industrial and academic, and have proven to deliver outstanding performance. However, DNNs are vulnerable to adversarial attacks: small perturbations embedded in an image. As a result, introducing DNNs into safety-critical systems, such as autonomous vehicles, unmanned aerial vehicles, or healthcare devices, carries a very high risk of limiting their capability to recognize and interpret the environment in which they are used, which could lead to devastating consequences. Thus, enhancing the robustness of DNNs by developing defense mechanisms is of the utmost importance. In this paper, we evaluate a set of state-of-the-art denoising filters designed for impulsive noise removal as defensive solutions. The proposed methods are applied as a pre-processing step in which the adversarial patterns in the source image are removed before performing the classification task. As a result, the pre-processing defense block can be easily integrated with any type of classifier, without any knowledge of the training procedures or internal architecture of the model. Moreover, the evaluated filtering methods can be considered universal defensive techniques, as they are completely unrelated to the internal aspects of the selected attack and can be applied against any type of adversarial threat. The experimental results obtained on the German Traffic Sign Recognition Benchmark (GTSRB) prove that the denoising filters provide high robustness against sparse adversarial attacks and do not significantly decrease the classification performance on non-altered data.
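A sketch of the defense-by-filtering scheme: denoise before classifying, leaving the classifier itself untouched. The plain median filter below is a stand-in for the evaluated impulsive-noise filters:

```python
import cv2
import numpy as np

def defended_predict(model, image_u8, ksize=3):
    """image_u8: HxWx3 uint8 image possibly carrying sparse adversarial pixels."""
    cleaned = cv2.medianBlur(image_u8, ksize)  # removes impulsive outliers
    x = cleaned.astype(np.float32)[None] / 255.0
    # Works with any classifier (e.g., one trained on GTSRB); no knowledge of
    # its architecture or training is required.
    return model.predict(x)
```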
Living in a constant news cycle creates the need for automated tracking of events as they happen. This can be achieved by examining broadcast overlay text, which carries a great amount of information to be deciphered before further processing, with applications spanning from politics to sports. We utilize image processing to create mean cropping masks, based on binary slice clustering, for intelligent retrieval of areas of interest. This data is handed off to CEIR, which is based on the connectionist text proposal network (CTPN) to fine-tune the text locations and an advanced convolutional recurrent neural network (CRNN) to recognize the text strings. To improve accuracy and reduce processing time, this novel approach uses a preprocessing mask-identification and cropping module to reduce the amount of data processed by the more finely tuned neural network.
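A sketch of such a mask-identification and cropping module: binarize the frame, merge nearby white regions, and emit crops for the CTPN+CRNN stage. The thresholds and morphology below are illustrative; the paper's mask construction is more elaborate:

```python
import cv2

def overlay_crops(frame_gray, min_area=500):
    # Bright overlay text stands out against broadcast footage.
    _, mask = cv2.threshold(frame_gray, 200, 255, cv2.THRESH_BINARY)
    # Horizontal dilation merges characters into text-line blobs.
    mask = cv2.dilate(mask, cv2.getStructuringElement(cv2.MORPH_RECT, (15, 3)))
    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL,
                                   cv2.CHAIN_APPROX_SIMPLE)
    boxes = [cv2.boundingRect(c) for c in contours
             if cv2.contourArea(c) >= min_area]
    # Only these crops reach the heavier CTPN/CRNN stage.
    return [frame_gray[y:y + h, x:x + w] for (x, y, w, h) in boxes]
```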