This PDF file contains the front matter associated with SPIE Proceedings Volume 11401, including the Title Page, Copyright information, and Table of Contents
Access to the requested content is limited to institutions that have purchased or subscribe to SPIE eBooks. You are receiving this notice because your organization may not have SPIE eBooks access.* *Shibboleth/Open Athens users: please sign in to access your institution's subscriptions. To obtain this item, you may purchase the complete book in print or electronic format on SPIE.org.
While the data throughput of electronic processors is rather high (TFLOP range), the question arises whether optical neural networks (NNs) offer a value proposition for zero-delay (real-time) processing. The answer might be ‘yes’, since once the network is trained, the time to perform an inference task is simply given by the time-of-flight of the photons through the processor’s NN. Here we discuss how the three functions of the perceptron (dot-product synaptic weighting, summation, and nonlinear thresholding) can be mapped onto a) optoelectronic [George et al., Opt. Exp. 2019] and b) all-optical hardware [Miscuglio et al., OMEx 2018]. The latter is realized via co-integration of phase-change materials atop silicon photonics. Once trained, the weights require only rare updating, thus saving power. Performance-wise, such an integrated all-optical NN is capable of sub-fJ/MAC operation, based on experimentally demonstrated pump-probe measurements [Waldecker et al., Nat. Mat. 2015], with a per-perceptron delay on the order of picoseconds.
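The latency and energy figures quoted above can be combined into a back-of-envelope estimate. The sketch below uses illustrative values consistent with the abstract's orders of magnitude (the stage count and MAC count are assumptions, not figures from the paper):

```python
# Back-of-envelope latency/energy for an integrated photonic NN.
# Assumed illustrative constants (order-of-magnitude only):
ENERGY_PER_MAC_J = 1e-15   # ~1 fJ per multiply-accumulate (abstract claims < fJ)
DELAY_PER_STAGE_S = 1e-12  # ~1 ps time-of-flight per perceptron stage

def inference_cost(n_stages: int, n_macs: int):
    """Total time-of-flight latency and optical energy for one inference."""
    latency_s = n_stages * DELAY_PER_STAGE_S
    energy_j = n_macs * ENERGY_PER_MAC_J
    return latency_s, energy_j

# A hypothetical 10-stage network performing one million MACs:
latency, energy = inference_cost(n_stages=10, n_macs=1_000_000)
```

Under these assumptions, latency stays in the tens of picoseconds regardless of the MAC count, since only the network depth (not its width) adds to the photon's time of flight.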
Semantic segmentation using convolutional neural networks is a trending technique in scene understanding. Because these techniques are data-intensive, many devices struggle to store and process even a small batch of images at a time. Moreover, since the volume of training data required by the training algorithms is very high, it can be wise to store these datasets in compressed form; likewise, to accommodate the limited bandwidth of transmission networks, images can be compressed before being sent to their destination. The Joint Photographic Experts Group (JPEG) standard is a popular technique for image compression; however, JPEG introduces several unwanted artifacts into compressed images. In this paper, we explore the effect of JPEG compression on the performance of several deep-learning-based semantic segmentation techniques, for both synthetic and real-world datasets at various compression levels. For some established architectures trained with compressed synthetic and real-world datasets, we observed performance equivalent to (and sometimes better than) training on the uncompressed datasets, with a substantial reduction in storage space. We also analyze the effect of combining the original dataset with compressed datasets at different JPEG quality levels, and we observed a performance improvement over the baseline. Our evaluation and analysis indicate that a segmentation network trained on a compressed dataset can be the better option in terms of performance. We also illustrate that JPEG compression acts as a data augmentation technique, improving the performance of semantic segmentation algorithms.
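The augmentation idea described above can be sketched as mixing compressed copies of each training image into the dataset. To keep the sketch dependency-free, a crude intensity quantization stands in for a real JPEG codec (which would instead re-encode via 8x8 DCT blocks); the function names and quality levels are illustrative, not from the paper:

```python
import numpy as np

def lossy_roundtrip(img: np.ndarray, quality: int) -> np.ndarray:
    """Crude stand-in for a JPEG encode/decode cycle: lower quality means
    a larger quantization step and therefore stronger artifacts.
    A real pipeline would use an actual JPEG codec here."""
    step = max(1, round(100 / quality))
    return (img.astype(np.int64) // step * step).astype(img.dtype)

def augment_with_compression(images, labels, qualities=(75, 50, 25)):
    """Mix the original images with compressed copies at several quality
    levels; segmentation labels carry over unchanged."""
    aug_x, aug_y = list(images), list(labels)
    for q in qualities:
        aug_x.extend(lossy_roundtrip(im, q) for im in images)
        aug_y.extend(labels)
    return aug_x, aug_y
```

With three quality levels, the training set quadruples in image count while the compressed copies expose the network to codec-like artifacts during training.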
This paper presents the real-time implementation of deep neural networks on smartphone platforms to detect and classify diabetic retinopathy from eye fundus images. This implementation extends a previously reported implementation by considering all five stages of diabetic retinopathy. Two deep neural networks are first trained using transfer learning, one to detect four stages and the other to further classify the last stage into two more stages, based on fundus images from the EyePACS and APTOS datasets. Then, it is shown how these trained networks are turned into a smartphone app, in both Android and iOS versions, to process images captured by smartphone cameras in real time. The app is designed so that fundus images can be captured and processed in real time by smartphones together with commercially available lens attachments. The developed real-time smartphone app provides a cost-effective and widely accessible approach for conducting first-pass diabetic retinopathy eye exams in remote clinics or areas with limited access to fundus cameras and ophthalmologists.
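The two-network arrangement amounts to a cascade: the first network assigns a coarse grade, and the second is consulted only when the coarse grade is the last one. A minimal sketch of that control flow, with stand-in classifiers in place of the trained networks (the grade encoding is an assumption for illustration):

```python
def cascade_predict(stage_a, stage_b, image):
    """Two-network cascade. Assumes stage_a outputs a coarse grade in
    {0, 1, 2, 3} and stage_b refines the last coarse grade into two finer
    grades by returning 0 or 1. Both classifiers are stand-ins here."""
    coarse = stage_a(image)
    if coarse == 3:                  # last coarse grade: refine with net 2
        return 3 + stage_b(image)    # yields final grade 3 or 4
    return coarse

# Stand-in "classifiers" for illustration only (an image is a list of numbers):
stage_a = lambda img: 3 if sum(img) > 10 else 1
stage_b = lambda img: 1 if max(img) > 9 else 0
```

The cascade keeps the common path cheap: the second network's cost is paid only for images that reach the ambiguous last grade.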
Real-time face verification from a live stream remains an open problem despite considerable attention in recent years. Several biometric techniques are widely used for authentication purposes in both military and civilian settings. Face verification is the task of detecting a candidate face and comparing it with the faces in a database to validate whether they belong to the same person. Typically, a face verification pipeline is composed of four stages: face detection, alignment, recognition, and matching. First, the faces in the frames of the live stream are detected by a deep neural network (DNN). Then the detected faces are aligned, and another DNN extracts the face features. The feature vector of each face is matched against other vectors in the database to validate the identity. Recent developments in deep learning have led to human-level performance on these tasks; however, the networks used in each stage require high computational power. To achieve real-time performance on resource-limited devices, lightweight networks are preferred. Unfortunately, such networks dramatically decrease detection and recognition performance in some frames of a live stream. As a result, the set of feature vectors collected for an individual from the live stream contains outliers, which complicates obtaining a robust reference feature vector, essential for achieving high confidence in verification tasks. In this work, a conditional generative network is utilized to generate these vectors for a given candidate. We conduct experiments on a real-life scenario to show the performance improvement achieved by the proposed generative network.
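To see why outliers matter for the reference vector, consider the simple baseline the paper's generative network improves upon: aggregating the collected embeddings into one reference and matching by cosine similarity. A coordinate-wise median (used here as an outlier-resistant stand-in; the function names and threshold are illustrative) already tolerates a few bad frames:

```python
import numpy as np

def robust_reference(embeddings: np.ndarray) -> np.ndarray:
    """Coordinate-wise median of the collected embeddings, renormalized.
    The median resists the outlier vectors that lightweight detectors
    produce on difficult frames; the paper's conditional generative
    network replaces this simple baseline."""
    med = np.median(embeddings, axis=0)
    return med / np.linalg.norm(med)

def verify(reference: np.ndarray, query: np.ndarray, threshold: float = 0.6) -> bool:
    """Cosine-similarity matching of a query embedding against the reference."""
    q = query / np.linalg.norm(query)
    return float(reference @ q) >= threshold
```

A mean would be pulled toward the outliers; the median ignores them as long as they remain a minority of the collected frames.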
Handedness is one of the most obvious functional asymmetries, but its relation to anatomical asymmetry in the brain has not yet been clearly demonstrated. Because there is no significant evidence to prove or disprove this structure-function correlation, left-handed patients are often excluded from magnetic resonance imaging (MRI) studies. MRI classification of left and right hemispheres is a difficult task in its own right due to the complexity of the images and the structural similarities between the two halves. We demonstrate a deep artificial neural network approach, combined with a detailed preprocessing pipeline, for classifying lateralization in T1-weighted MR images of the human brain. Preprocessing includes bias field correction and registration to the MNI template. Our classifier is a convolutional neural network (CNN) trained on 287 images, each duplicated and mirrored on the mid-sagittal plane. The best model reached an accuracy of 97.594%, with a mean of 95.42% and a standard deviation of 1.37%. Additionally, the model’s performance was evaluated on an independent set of 118 images, reaching a classification accuracy of 97%. In a larger study, we tested the model on grey-matter images of 927 left-handed and 927 right-handed patients from the UK Biobank; here, all right-handed images and all left-handed images were classified as belonging to one class. The results suggest that there is no structural difference in grey matter between the two hemispheres that can be distinguished by the deep learning classifier.
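The duplicate-and-mirror step described above doubles the training set while giving each image a counterpart of the opposite laterality. A minimal sketch, assuming the left-right axis is the last array axis after registration (the label encoding is an illustrative choice):

```python
import numpy as np

def mirror_augment(volumes, labels):
    """Duplicate every image with a flip across the mid-sagittal plane and
    invert its hemisphere label ('L' <-> 'R'). Assumes the left-right axis
    is the last array axis after registration to the MNI template."""
    flip_label = {"L": "R", "R": "L"}
    aug_x, aug_y = [], []
    for vol, lab in zip(volumes, labels):
        aug_x.extend([vol, np.flip(vol, axis=-1)])   # original + mirrored copy
        aug_y.extend([lab, flip_label[lab]])
    return aug_x, aug_y
```

Besides doubling the data, this guarantees the two classes are perfectly balanced, so the classifier cannot exploit a class prior.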
Glaucoma, cataract, age-related macular degeneration (AMD), and diabetic retinopathy (DR) are among the leading retinal diseases. Thus, there is an active effort to develop methods that automate the screening of retinal diseases. Many Computer-Aided Diagnosis (CAD) systems have been developed and are widely used for ocular diseases. Recently, Deep Neural Networks (DNNs) have been adopted in ophthalmology and applied to fundus images, achieving detection of retinal abnormalities from retinal images. There are essentially two approaches. The first is a hybrid method that employs image processing for preprocessing, feature extraction, and post-processing, with a DNN used only for classification. The second is the fully deep-learning method, where the DNN performs both feature extraction and classification. Several DNN models and their variants have been proposed for detecting retinal abnormalities, such as AlexNet, VGG, GoogLeNet, Inception, U-Net, Residual Net (ResNet), and DenseNet. The aim of this work is to provide the background and methodology to conduct a benchmarking analysis, including the computational aspects, of the representative DNNs proposed in the state of the art for detecting DR. For each DNN, different characteristics, performance indices (i.e., model complexity, computational complexity, inference time, memory use), and disease detection performance (i.e., accuracy rate) must be taken into account to find the most accurate model. The public-domain datasets used for training and testing the DNN models, such as Kaggle, MESSIDOR, and EyePACS, are outlined and analyzed, in particular for DR detection.
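Of the performance indices listed above, inference time is the one most easily measured in a framework-agnostic way. A minimal timing harness, sketched with a stand-in model (parameter count and memory use would come from the specific framework's introspection APIs, which are not shown):

```python
import time

def mean_inference_time(model_fn, sample, repeats: int = 20) -> float:
    """Mean wall-clock time per inference of a callable model on one
    sample. One warm-up call is made first so caching and lazy
    initialization do not skew the measurement."""
    model_fn(sample)                      # warm-up, excluded from timing
    t0 = time.perf_counter()
    for _ in range(repeats):
        model_fn(sample)
    return (time.perf_counter() - t0) / repeats

# Stand-in "model" for illustration: a quadratic reduction over the input.
toy_model = lambda x: sum(v * v for v in x)
```

Repeating the call and averaging smooths out scheduler noise, which matters when comparing models whose per-inference times differ by small margins.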
Diseases in plants are substantially problematic for agricultural yield management. Compounded with insufficient information to correctly diagnose crops, they can lead to significant economic loss and yield inefficiencies. Due to the success of deep learning in many image processing applications, the first part of this paper involves designing a deep neural network for the detection of disease in maize, chosen for its economic significance. A convolutional neural network is designed and trained on a public-domain dataset of labeled images of maize plant leaves with and without disease. In the second part of this paper, the trained convolutional neural network is turned into a smartphone app running in real time for the purpose of performing maize crop disease detection in the field in an on-the-fly manner. The smartphone app offers a cost-effective, portable, and universally accessible way to detect disease in maize. The approach developed in this paper enables recognizing early signs of plant pathogens from maize crop images in real time in the field, thus leading to preemptive corrective actions prior to significant yield loss.
Keywords: Artificial intelligence in agriculture, real-time detection of crop disease, smartphone
Remote sensing is the acquisition of information about a specific area. This process serves as a basis for detecting and monitoring objects without physical contact with the landscape. One of the many signal representations that can be captured through this process is the hyperspectral image. This kind of image is characterized by its large number of bands, which means that a single pixel may have hundreds of values. To identify the objects registered in the images, their pixels need to be classified. The classification of hyperspectral images carries a high computational cost due to their dimensions. This study aims to optimize the running time of the classification process for these images. To this end, feature extraction methods using wavelet filters, such as Haar, Daubechies, Biorthogonal, Coiflets, and Symlets, are compared in order to shrink the image's dimensionality. Furthermore, three artificial neural network architectures are proposed with the objective of classifying the images using features based on the wavelet transform. These architectures are implemented in a parallel programming model to be executed on a Graphics Processing Unit (GPU). Additionally, a multi-threaded variant programmed for a multi-core Central Processing Unit (CPU) is presented. Both implementations and a non-parallel version of the methods are compared using algorithmic computational complexity, computing time, overall accuracy, and the kappa coefficient. To measure the performance of the methods, experiments using cross-validation and different numbers of training samples are carried out.
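The dimensionality shrinkage can be illustrated with the simplest of the listed filters: each level of the Haar wavelet transform keeps the approximation (low-pass) coefficients along the spectral axis, halving the band count per pixel. A minimal sketch, assuming pixels are stacked as rows of a (pixels x bands) matrix:

```python
import numpy as np

def haar_spectral_features(pixels: np.ndarray, levels: int = 1) -> np.ndarray:
    """Haar approximation coefficients along the spectral axis of an array
    of shape (n_pixels, n_bands); each level halves the band count.
    Assumes n_bands is divisible by 2**levels. Detail coefficients are
    discarded, which is what shrinks the classifier's input dimension."""
    out = pixels.astype(float)
    for _ in range(levels):
        # pairwise averages scaled by 1/sqrt(2): the Haar low-pass branch
        out = (out[:, ::2] + out[:, 1::2]) / np.sqrt(2.0)
    return out
```

Two levels reduce a 200-band pixel to 50 features, so the downstream neural network processes a quarter of the original input per pixel.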
Image quality assessment (IQA) is a difficult field to master, as it attempts to measure the quality of an image with reference to the complex human visual system. In IQA, there are three dominant strands of research: full-reference, reduced-reference, and no-reference image quality assessment. No-reference image quality assessment is the hardest to achieve, as the reference images required for determining the quality of the given images are not available. In one of our previous papers, we quantified no-reference IQA using state-of-the-art multitasking neural networks, particularly VGG-16 and shallow neural networks, and achieved good classification accuracy for most distortions. However, one drawback of those networks was poor classification accuracy for JPEG2000-compressed images, which were incorrectly classified as blurry or noisy. In this paper, we classify compressed images more accurately using residual neural networks (ResNets). These deep learning models are built upon micro-architecture modules and are task-focused entities, each determining the distortion type and distortion level of an artifact present in the image. The test images were obtained from the LIVE II, CSIQ, and TID2013 databases for comparison with previous work. In contrast to our previous approach, where training was limited to one specific distortion at a time, we train the collection of ResNets with all the distortion types present in the test databases. Preprocessing of the images is done using local and global contrast normalization. All the hyper-parameters in the ResNets collection, such as activation functions, dropout regularization, and optimizers, are tuned to produce optimal classification accuracy. The results are evaluated with metrics such as PLCC, SROCC, and MSE; the ResNets collection achieves high linear correlation, and the results are compared to those of our previous work.
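The two correlation metrics named above are standard in IQA evaluation: PLCC is Pearson's linear correlation between predicted and subjective scores, and SROCC is the same correlation computed on the ranks. A minimal numpy sketch (it ignores tie handling, which a full Spearman implementation would average over):

```python
import numpy as np

def plcc(x, y) -> float:
    """Pearson Linear Correlation Coefficient between two score vectors."""
    return float(np.corrcoef(x, y)[0, 1])

def srocc(x, y) -> float:
    """Spearman Rank-Order Correlation Coefficient: PLCC of the ranks.
    No tie correction, so it assumes scores are distinct."""
    rank = lambda a: np.argsort(np.argsort(a)).astype(float)
    return plcc(rank(np.asarray(x)), rank(np.asarray(y)))
```

SROCC is insensitive to any monotonic mapping between predicted and subjective scores, which is why it is reported alongside PLCC: a predictor can rank images perfectly (SROCC = 1) while still being nonlinearly related to the subjective scale.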
Steganography in digital images commonly uses a carrier image and embeds secret data into it to create the stego-image, using spatial- or frequency-domain methods. These methods directly modify the bits of the carrier image, altering pixel intensities and leaving traces of the data embedding, which makes successful steganalysis possible. This paper proposes a digital image steganography framework that does not embed data directly into images: instead, the secret data are extracted from a convolutional neural network trained with distance local binary pattern images from an indexed image database. Experimental results demonstrate that the proposed framework is resistant to common steganalysis tools and to intentional and unintentional image attacks such as luminance and contrast changes, rescaling, noise addition, and compression.
When analyzing data recorded on mobile devices (including unmanned aerial vehicles and cars), additional errors arise due to inter-frame blur and relative inter-frame displacement. Eliminating the noise component under such conditions is very difficult, since processing can only be carried out frame by frame, ruling out multi-frame analysis. The paper presents an approach that allows complex primary processing of data obtained by a group of sensors operating in the visible range together with cameras capturing data in the infrared range. An NVIDIA Jetson board is used as the unmanned vehicle's computing device for control, data collection, and initial processing. A pair of cameras with a resolution of 1980x1080 pixels at 10 frames per second, as well as a SEAK thermal imaging camera with a resolution of 320x240 pixels, are used as data sensors. The algorithm presented in the work is based on step-by-step identification of stationary areas in a series of images, followed by the search for their correspondence and simplification. At the next stage, a local reduction of the noise component is performed using a method based on multi-criteria smoothing. At the final stage, the filter parameters are jointly applied to the series of images obtained in the various electromagnetic ranges.
A proof-of-concept, compact, portable Fourier Ptychographic Microscope (FPM) for wide field-of-view, high spatial resolution imaging (<1 μm) of biosignature motility in liquid samples is presented. The FPM has the potential to be developed as a space-based payload for future landers destined for the Ocean Worlds. A portable FPM using an existing Fourier ptychography (FP) algorithm adapted for reconstruction is demonstrated. An NVIDIA Jetson Nano board and camera, combined with FP, are used to computationally reconstruct sub-micron-resolution images. Additionally, deep learning was employed for inference, enabling the FPM to operate as an on-edge device.
In this paper, a novel parallel color image watermarking scheme in the frequency domain, based on multithreaded coding, is proposed for copyright protection of Multiple Picture Object (MPO) files. The designed technique consists of three steps. First, the color watermark is encoded using the Curvelet Transform (CvT), reducing the quantity of information that represents the watermark. The second step is the embedding process: the MPO file contains two images, each of which is divided into 8x8-pixel blocks, and the DCT is applied to each block. The color watermark is inserted into the medium-frequency sub-bands of each DCT block, because modifications performed in this sub-band are less perceptible to the Human Visual System (HVS). Finally, in the third step, the color watermark is recovered. The proposed scheme demonstrates a significant improvement in processing time through parallelization, moving from serial programming to the use of threads on multicore CPUs. According to numerous experiments, where the Peak Signal-to-Noise Ratio (PSNR) and Structural Similarity Index Measure (SSIM) were used as quality criteria for the watermarked MPO image and the recovered color watermark, the novel method is not intrusive, since it does not degrade the quality of the watermarked MPO image. Additionally, the proposed framework is resistant to the most common image processing attacks.
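The mid-frequency embedding step can be illustrated on a single 8x8 block: transform the block with the 2-D DCT, overwrite one mid-frequency coefficient so that its sign carries a bit, and invert the transform. The coefficient position and strength below are illustrative choices, not the paper's parameters:

```python
import numpy as np

N = 8
# Orthonormal DCT-II basis matrix for 8x8 blocks (C is orthogonal, so
# the inverse transform is simply the transpose).
C = np.array([[np.sqrt((1.0 if k == 0 else 2.0) / N)
               * np.cos(np.pi * (2 * n + 1) * k / (2 * N))
               for n in range(N)] for k in range(N)])

def embed_bit(block: np.ndarray, bit: bool, pos=(3, 4), strength=12.0):
    """Force one mid-frequency DCT coefficient positive or negative to
    carry a single watermark bit, then return the modified pixel block."""
    B = C @ block @ C.T          # forward 2-D DCT
    B[pos] = strength if bit else -strength
    return C.T @ B @ C           # inverse 2-D DCT

def extract_bit(block: np.ndarray, pos=(3, 4)) -> bool:
    """Recover the bit from the sign of the same coefficient."""
    return bool((C @ block @ C.T)[pos] > 0)
```

Placing the payload in mid frequencies trades robustness for invisibility: low-frequency changes are visible to the HVS, while high-frequency coefficients are the first to be destroyed by compression.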