Purpose: Diabetic retinopathy (DR) is characterized by retinal lesions that develop in people who have had diabetes for several years. It is one of the leading causes of visual impairment worldwide. To diagnose the disease, ophthalmologists must manually analyze retinal fundus images. Computer-aided diagnosis systems can help alleviate this burden by automatically detecting DR on retinal images, saving physicians' time and reducing costs. The objective of this study is to develop a deep learning algorithm capable of detecting DR on retinal fundus images. Nine public datasets comprising more than 90,000 images are used to assess the efficiency of the proposed technique. In addition, an explainability algorithm is developed to visually show the DR signs detected by the deep model.
Approach: The proposed deep learning algorithm fine-tunes a pretrained deep convolutional neural network for DR detection. The model is trained on a subset of the EyePACS dataset using a cosine annealing strategy with warm-up to decay the learning rate, which improves training accuracy. Tests are conducted on the nine datasets. An explainability algorithm based on gradient-weighted class activation mapping (Grad-CAM) is developed to visually show the signs the model uses to classify retinal images as DR.
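For illustration, a minimal sketch of a cosine-annealing learning-rate schedule with linear warm-up follows; the warm-up length and learning-rate bounds are assumptions, not the values used in the paper.

```python
import math

def lr_with_warmup(step, total_steps, warmup_steps=500,
                   base_lr=1e-3, min_lr=1e-6):
    """Cosine-annealed learning rate with a linear warm-up phase."""
    if step < warmup_steps:
        # Linear ramp from 0 to base_lr during warm-up.
        return base_lr * (step + 1) / warmup_steps
    # Cosine decay from base_lr down to min_lr afterwards.
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return min_lr + 0.5 * (base_lr - min_lr) * (1 + math.cos(math.pi * progress))
```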
Result: The proposed network achieves high classification performance, with an area under the curve (AUC) of 0.986, sensitivity of 0.958, and specificity of 0.971 on EyePACS. For MESSIDOR, MESSIDOR-2, DIARETDB0, DIARETDB1, STARE, IDRID, E-ophtha, and UoA-DR, the AUC is 0.963, 0.979, 0.986, 0.988, 0.964, 0.957, 0.984, and 0.990, respectively.
Conclusions: The obtained results achieve state-of-the-art performance and outperform previously published works trained using only publicly available datasets. The proposed approach can robustly classify fundus images and detect DR. The explainability model shows that our network efficiently identifies the different signs of DR used to detect this health issue.
Unmanned aerial vehicles (UAVs) are very popular and increasingly used in a range of applications, many of which can benefit from collaboration between UAVs. In this work, we propose vision-based collaboration between UAVs. The proposed approach uses images captured by one UAV and deep learning to detect and follow another UAV. To detect the leader UAV, we developed an approach based on the deep YOLO algorithm; it processes video at 30 fps and achieves a high mAP for UAV detection. To follow the leader UAV, we developed a high-level control algorithm based on the coordinates of the detected bounding box: the box size and position are used to compute the command sent to the follower UAV. Tests were conducted in outdoor scenarios using quadcopter UAVs. The obtained results and the high mAP are promising and show the potential of this kind of vision-based deep learning approach for UAV collaboration.
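As an illustration of the bounding-box-based control idea, the sketch below converts a detected box into high-level velocity commands; the gains, the target box area, and the command convention are assumptions, not the paper's actual controller.

```python
def follow_command(box, frame_w, frame_h,
                   target_area_ratio=0.05, k_yaw=1.0, k_alt=1.0, k_fwd=2.0):
    """Derive (yaw_rate, vertical_vel, forward_vel) from a detected box.

    box: (x_min, y_min, x_max, y_max) in pixels.
    """
    x_min, y_min, x_max, y_max = box
    cx = (x_min + x_max) / 2.0
    cy = (y_min + y_max) / 2.0
    # Horizontal offset of the box centre steers the yaw.
    yaw_rate = k_yaw * (cx / frame_w - 0.5)
    # Vertical offset drives the climb/descent command.
    vertical_vel = -k_alt * (cy / frame_h - 0.5)
    # Box area relative to the frame approximates distance:
    # a small box means the leader is far, so move forward.
    area_ratio = (x_max - x_min) * (y_max - y_min) / float(frame_w * frame_h)
    forward_vel = k_fwd * (target_area_ratio - area_ratio)
    return yaw_rate, vertical_vel, forward_vel
```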
Over the past few years, UAVs have grown in popularity and are now widely used in many applications. The use of multiple UAVs and UAV swarms is attracting increasing interest from the research community, leading to the exploration of topics such as UAV cooperation and autonomous multi-drone navigation. In this work, we are interested in UAV tracking and pursuit. The goal is to use deep learning and the images captured by one UAV to detect and track a second, moving UAV. The proposed approach uses deep reinforcement learning for UAV pursuit: the input is the current frame cropped around the last target pose, and the output is a probability distribution over a set of possible actions. The experimental results are promising and show that the proposed algorithm achieves high performance in challenging outdoor scenarios.
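A minimal Keras sketch of such a policy network is shown below; the crop size and the discrete action set are assumptions, since the abstract does not specify them.

```python
from tensorflow.keras import layers, models

NUM_ACTIONS = 7  # assumed action set, e.g. left/right/up/down/forward/back/hover

def build_policy_network(input_shape=(84, 84, 3)):
    """CNN mapping the cropped current frame to a distribution over actions."""
    inputs = layers.Input(shape=input_shape)
    x = layers.Conv2D(32, 8, strides=4, activation="relu")(inputs)
    x = layers.Conv2D(64, 4, strides=2, activation="relu")(x)
    x = layers.Conv2D(64, 3, strides=1, activation="relu")(x)
    x = layers.Flatten()(x)
    x = layers.Dense(256, activation="relu")(x)
    # Softmax head: probability distribution over the possible actions.
    outputs = layers.Dense(NUM_ACTIONS, activation="softmax")(x)
    return models.Model(inputs, outputs)
```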
In recent years, we have witnessed considerable improvements in machine learning and deep learning, and many advanced techniques are now based on deep neural networks. Although many software libraries are available, developing deep neural networks requires solid mathematical knowledge and strong programming skills. In this work, we present a visual tool that simplifies the programming of deep learning networks. The developed framework, DeepViP, comprises a node editor that provides users with a toolbox of different types of neural layers. It allows the different blocks to be connected and the important hyperparameters of each layer to be configured, thus speeding up experimentation with different architectures. Additionally, the developed solution lets users generate a Python script of the designed network that can be run using specific libraries such as keras or tensorflow.
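To illustrate, the following is the kind of Python script such a node editor might emit; the architecture shown is an arbitrary example, not DeepViP's actual output format.

```python
# Example of a script a visual editor could generate for a small classifier.
from tensorflow.keras import layers, models

model = models.Sequential([
    layers.Input(shape=(28, 28, 1)),          # "Input" node
    layers.Conv2D(32, 3, activation="relu"),  # "Conv2D" node, user-set filters/kernel
    layers.MaxPooling2D(2),                   # "Pooling" node
    layers.Flatten(),                         # "Flatten" node
    layers.Dense(10, activation="softmax"),   # "Dense" node
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
```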
Detecting drifting icebergs is an important task for avoiding threats to navigation and offshore activities. Governments and companies use aerial reconnaissance and shore-based observation platforms to detect these icebergs. However, in some areas with harsh weather conditions, only satellite imagery can be used to monitor this risk. In this work, we propose the use of deep convolutional neural networks to detect and classify these small remotely sensed targets as ships or icebergs. We use satellite radar imagery composed of two bands; the image patches contain fewer than 6K pixels and are noisy. To address this challenge, we developed a deep convolutional network architecture and optimized its hyperparameters for this classification task. The obtained results show that the proposed network achieves very promising accuracy for classifying icebergs vs. ships in radar satellite images.
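A minimal sketch of a CNN for such two-band patches follows; the patch size (75 × 75 × 2, under 6K pixels) and the layer configuration are assumptions, not the optimized architecture from the paper.

```python
from tensorflow.keras import layers, models

def build_iceberg_classifier(input_shape=(75, 75, 2)):
    """Small CNN for two-band radar patches (iceberg vs. ship)."""
    model = models.Sequential([
        layers.Input(shape=input_shape),
        layers.Conv2D(32, 3, activation="relu"),
        layers.MaxPooling2D(2),
        layers.Conv2D(64, 3, activation="relu"),
        layers.MaxPooling2D(2),
        layers.Conv2D(128, 3, activation="relu"),
        layers.GlobalAveragePooling2D(),
        layers.Dropout(0.3),                    # helps with noisy radar patches
        layers.Dense(1, activation="sigmoid"),  # P(iceberg)
    ])
    model.compile(optimizer="adam", loss="binary_crossentropy",
                  metrics=["accuracy"])
    return model
```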
Unmanned aerial vehicles have become widespread and are used for applications ranging from real estate marketing and bridge inspection to defense and military applications. These applications share some form of autonomous navigation that requires good localization capability at all times. Most UAVs use a combination of global navigation satellite systems (GNSS) and an inertial measurement unit (IMU) to perform this task. Unfortunately, GNSS is subject to signal unavailability and various kinds of interference that impede the UAV's ability to self-localize. In this paper, we propose a new algorithm for localization in GNSS-denied environments using a relative visual localization technique. We developed a new measure, based on local feature points extracted with ORB, that estimates the likelihood that a previously captured image was taken at a position close to the current UAV location. The measure is embedded in a particle filter in which IMU data is used to reduce the number of images that must be analyzed to perform localization. The resulting method shows significant improvements in both accuracy and execution time compared with previous approaches.
Extracting face images at a distance, in a crowd, or with a lower-resolution infrared camera leads to poor-quality face images that are barely distinguishable. In this work, we present a deep convolutional generative adversarial network (DCGAN) for infrared face image enhancement. The proposed algorithm builds a super-resolution face image from its lower-resolution counterpart. The resulting images are evaluated in terms of qualitative and quantitative metrics on infrared face datasets (NIR and LWIR). The proposed algorithm performs well and preserves important details of the face. The analysis of the resulting images shows that the proposed framework is promising and can help improve image super-resolution generation and enhancement in the infrared spectrum.
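As an illustration, a minimal generator sketch in Keras follows; unlike a classic DCGAN it is conditioned on the low-resolution input rather than a noise vector, as is common in super-resolution variants, and all sizes are assumptions.

```python
from tensorflow.keras import layers, models

def build_generator(lr_shape=(16, 16, 1)):
    """DCGAN-style generator: 4x upsampling of a low-resolution IR face."""
    inputs = layers.Input(shape=lr_shape)
    x = layers.Conv2D(64, 3, padding="same", activation="relu")(inputs)
    # Two transposed convolutions give a 4x spatial upsampling (16 -> 64).
    x = layers.Conv2DTranspose(64, 4, strides=2, padding="same",
                               activation="relu")(x)
    x = layers.Conv2DTranspose(32, 4, strides=2, padding="same",
                               activation="relu")(x)
    outputs = layers.Conv2D(1, 3, padding="same", activation="tanh")(x)
    return models.Model(inputs, outputs)
```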
Wildland fires are considered one of the major natural risks, affecting almost every country in the world. The impacts of these fires are huge in terms of environmental, economic, and social losses. Experts estimate that, with climate change and global warming, the frequency and size of fires will increase in the coming years. In this paper, we present advances in the use of multi-spectrum computer vision to process, analyze, and understand wildland fire behavior. We introduce the different multispectral technologies used in image capture, the techniques developed to detect and extract fires from images, and how multispectral fusion is used in the context of wildland fires. We show our recent results using multiple multimodal stereovision systems in which different modalities are combined to extract important fire characteristics in three-dimensional space. Finally, we discuss the use of UAVs to monitor fires at a larger scale.
Ear biometrics has seen growing interest from the computer vision research community in recent years, mainly because ear geometric features can be extracted in a non-intrusive way, are unique to each individual, and do not change over time. Different techniques have been proposed to extract ear features in 2D and 3D space and use them in person recognition systems. In this work, we propose Deep-Ear, a deep convolutional residual network for ear recognition. The proposed algorithm uses a 50-layer deep residual network (ResNet50) as a feature extractor, followed by two fully connected layers and a final softmax layer for classification. Experimental tests were performed on the AMI-DB ear dataset. The obtained top-1 accuracy is 95.67% and the top-3 accuracy is 99.67%. These results show that the proposed architecture is a promising step toward a robust, handcrafted-feature-free ear recognition technique based on deep learning.
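A minimal Keras sketch of this architecture follows; the sizes of the two fully connected layers are assumptions, since the abstract does not specify them.

```python
from tensorflow.keras import layers, models
from tensorflow.keras.applications import ResNet50

def build_deep_ear(num_classes, input_shape=(224, 224, 3)):
    """ResNet50 backbone followed by two FC layers and a softmax classifier."""
    backbone = ResNet50(include_top=False, weights="imagenet",
                        input_shape=input_shape, pooling="avg")
    x = layers.Dense(512, activation="relu")(backbone.output)  # FC sizes assumed
    x = layers.Dense(256, activation="relu")(x)
    outputs = layers.Dense(num_classes, activation="softmax")(x)
    return models.Model(backbone.input, outputs)
```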
Face recognition has been widely studied by the computer vision community in recent years. Most of the work deals with close frontal images of the face, where facial structures can be easily distinguished; little work deals with recognizing faces at a distance, where faces have very low resolution and are barely distinguishable. In this work, we present a deep learning architecture that enhances lower-resolution facial images captured at a distance. The proposed framework uses deep convolutional generative adversarial networks (DCGAN) and works well even when only a small number of images is available for learning. The enhanced images are then sent to a face recognition algorithm for classification. The proposed framework outperforms classical enhancement techniques and increases face recognition performance.
In the last decade, research has been conducted to develop measurement solutions for forest fires based on image processing and computer vision. Significant progress has been achieved in developing such tools for fire propagation in controlled laboratory environments. However, these developments are not suitable for outdoor unstructured environments. Additionally, wildland fires cover large areas, which limits the use of vision-based ground systems. Unmanned aerial vehicles (UAVs) equipped with cameras for remote sensing are promising, as their performance/price ratio keeps improving over time. They can provide a low-cost alternative for prevention, detection, propagation monitoring, and real-time support for firefighting. In this paper, we give an overview of past work on the use of UAVs in the context of wildland and forest fires, and propose a framework based on cooperative UAVs and UGVs for monitoring fires at a larger scale.
Most of today's UAVs use multi-sensor GNSS/INS fusion for localization during navigation. In this context, GNSS is a compact and cost-effective way to constrain the unbounded localization error induced by the INS sensors. Unfortunately, GNSS has proven unreliable in multiple contexts: the drawback of such an approach lies in the radio communications needed to acquire the localization data, and radio communication systems are prone to availability problems in some environments, to signal alteration, and to interference. The root cause of the problem is the use of global information to solve a local problem. In this work, we propose the use of local visual information to perform relative localization in an unknown outdoor environment. The algorithm uses feature-point methods to extract salient points from a set of candidate-match images during navigation. The extracted features are matched against visual data stored during previous navigation or taken from an aerial-view map. Different feature extraction techniques were analyzed, and ORB gave the best mean absolute error: the estimated distance between the best match and the ground-truth localization was within 70 meters on average at an altitude of 150 meters. Experimental tests were conducted on outdoor videos captured with a quadcopter. The obtained results are promising and show that relative visual data can be used in GPS/GNSS-denied environments to improve the robustness of UAV navigation.
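As an illustration, a minimal ORB-based similarity measure using OpenCV is sketched below; the feature count and Hamming-distance threshold are assumptions, and the paper's actual measure may differ.

```python
import cv2

orb = cv2.ORB_create(nfeatures=1000)
matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)

def match_score(img_query, img_candidate, max_dist=50):
    """Fraction of ORB matches below a Hamming-distance threshold, used as a
    likelihood that the two images were captured near the same position."""
    _, des_q = orb.detectAndCompute(img_query, None)
    _, des_c = orb.detectAndCompute(img_candidate, None)
    if des_q is None or des_c is None:
        return 0.0
    matches = matcher.match(des_q, des_c)
    if not matches:
        return 0.0
    good = [m for m in matches if m.distance < max_dist]
    return len(good) / float(len(matches))
```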
Every year, forest and wildland fires affect more than 350 million hectares worldwide, resulting in important environmental, economic, and social losses. To fight this major risk efficiently, specific actions are deployed, and their efficiency is tightly linked to knowledge of the phenomena and to improving the tools for detecting, predicting, and understanding fire propagation. An important step in vision-based fire analysis is the detection of fire pixels. In this work, we propose Deep-Fire, a deep convolutional neural network for fire pixel detection and fire segmentation. The proposed technique is tested on a database of wildland fires. The obtained results show that the proposed architecture gives very high performance for the segmentation of wildland and forest fire areas in outdoor, non-structured scenarios.
Three-dimensional (3D) vision scanning for metrology and inspection applications is attracting increasing industrial interest, driven by recent advances in 3D technologies that make high-precision measurements attainable at an affordable cost. 3D vision allows the visible surface of objects to be modelled and inspected. When subsurface defects must be detected, active infrared (IR) thermography is one of the most widely used tools for non-destructive testing (NDT) of materials. Fusing these two modalities allows surface and subsurface defects to be detected simultaneously and visualized as overlays on a 3D model of the scanned and modelled parts or on their 3D computer-aided design (CAD) models. In this work, we present a framework for automatically fusing 3D data (scanned or CAD) with infrared thermal images for an NDT process in 3D space.
Tracking pedestrians is an area of computer vision that has attracted a lot of interest in recent years. Much of this work was conducted in the visible spectrum, some in the thermal infrared spectrum, but the majority of it used a single spectrum at a time. In this work, we present a fusion framework that uses the thermal infrared and visible spectrums to robustly track detected moving objects. The detected objects are described using HOG features and classified as pedestrian or non-pedestrian using an SVM. Tests were conducted in outdoor scenarios, and the obtained results are promising and show the efficiency of the proposed framework.
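A minimal sketch of the HOG + SVM classification stage follows, using scikit-image and scikit-learn; the window size and SVM parameters are assumptions.

```python
import numpy as np
from skimage.feature import hog
from skimage.transform import resize
from sklearn.svm import LinearSVC

def hog_descriptor(window):
    """HOG features for a detected-object window (grayscale image)."""
    window = resize(window, (128, 64))  # assumed pedestrian window size
    return hog(window, orientations=9, pixels_per_cell=(8, 8),
               cells_per_block=(2, 2), block_norm="L2-Hys")

def train_classifier(windows, labels):
    """Train a linear SVM on windows labelled 1 (pedestrian) or 0 (other)."""
    X = np.array([hog_descriptor(w) for w in windows])
    clf = LinearSVC(C=0.01)
    clf.fit(X, labels)
    return clf
```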
In a milking robot, the correct localization and positioning of the milking teat cups is of very high importance. Milking robot technology has not changed in a decade and is based primarily on laser profiles for estimating approximate teat positions. This technology has reached its limits: it does not allow optimal positioning of the milking cups, and in the presence of occlusions the robot fails to milk the cow. These problems have economic consequences for producers and affect animal health (e.g., the development of mastitis). To overcome the limitations of current robots, we developed a new system based on 3D vision that is capable of efficiently positioning the milking cups. A prototype of an intelligent 3D-vision system for the real-time positioning of a milking robot was built and tested under various conditions on a synthetic udder model, in both static and moving scenarios. Experimental tests were performed using 3D time-of-flight (TOF) and RGBD cameras. The proposed algorithms permit the online segmentation of teats by combining 2D and 3D visual information, from which the 3D position of each teat is computed and sent to the milking robot for teat cup positioning. The vision system runs in real time and monitors the optimal positioning of the cups even in the presence of motion. The results obtained with both TOF and RGBD cameras show the good performance of the proposed system, with the best performance obtained using RGBD cameras; this latter technology will be used in future real-life experimental tests.
Tracking targets in video surveillance with the possibility of moving the camera to keep the target within the field of view is an important task for security personnel working in sensitive sites.
This work presents a real-time 3D tracking system based on stereovision. The camera system is mounted on a pan-and-tilt platform to continuously track a detected target. Particle filters are used for tracking, and a pattern recognition approach keeps the focus on the target of interest. The 3D position of the target relative to the stereovision frame is computed using stereovision techniques, making it possible to follow the target's position on a georeferenced site map in real time.
Tests conducted in outdoor scenarios show the efficiency of the proposed approach.
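For illustration, a minimal sketch of recovering the target's 3D position from a rectified stereo pair follows; it assumes matched pixel coordinates, a known focal length in pixels, baseline, and principal point.

```python
def target_position_3d(u_left, u_right, v, focal_px, baseline_m, cx, cy):
    """3D position (metres) of the target in the stereo-rig frame, computed
    from matched pixel coordinates in a rectified stereo pair."""
    disparity = u_left - u_right
    if disparity <= 0:
        raise ValueError("target must have positive disparity")
    z = focal_px * baseline_m / disparity  # depth from disparity
    x = (u_left - cx) * z / focal_px       # lateral offset
    y = (v - cy) * z / focal_px            # vertical offset
    return x, y, z
```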
This work introduces a new framework for fusing active and passive infrared images for face recognition applications. Two multispectral face recognition databases were used in our experiments: the Equinox database (Visible, SWIR, MWIR, LWIR) and the m-Faces database (Visible, NIR, MWIR, LWIR). The proposed framework uses a fusion scheme in texture space to increase face recognition performance. The proposed texture space is based on binary and ternary patterns, and a new adaptive ternary pattern is also introduced. Active (SWIR and NIR) and passive (MWIR, LWIR) infrared modalities are used in this fusion scheme, and both intra-spectral and inter-spectral fusion approaches are introduced. The obtained results are promising and show an increase in recognition performance when texture channels are fused in a multi-scale fusion scheme.
Face recognition is an area of computer vision that has attracted a lot of interest from the research community, and a growing demand for robust face recognition in security applications has driven interesting advances in this field. In this work, we introduce a new multistep approach for face recognition in the infrared spectrum. The proposed approach works in texture space using binary and ternary pattern descriptors and operates in two steps. In the first step, dimensionality reduction techniques are used to classify the preprocessed infrared face image and select the highest-scoring candidates. In the second step, this small set of candidates is classified using a correlation-based approach to select the best-matching candidate. The obtained results show a large increase in face recognition performance when the multistep approach is used, compared with dimensionality reduction techniques alone.
In fire research and forest firefighting, there is a need for robust metrological systems able to estimate the geometrical characteristics of outdoor spreading fires, and recent years have seen increased interest in non-destructive, computer-vision-based techniques in wildfire research. This paper presents a new approach for estimating the geometrical characteristics of fire using near-infrared stereovision. Spreading-fire information such as position, rate of spread, height, and surface is estimated from the computed 3D fire points. The proposed system can track fire spreading over a ground area of 5 m × 10 m.
Keywords: near infrared, stereovision, spreading fire, geometrical characteristics
This paper presents a new approach for estimating fire front volume in indoor laboratory experiments on fire spreading across inclinable tables. The method is based on two synchronized stereovision systems positioned behind and in front of the fire propagation direction, respectively. The two vision systems extract complementary 3D fire points, which are projected into a common reference frame and used to build a global shape of the fire front. An inter-system calibration procedure is presented; it computes the projection matrix needed to bring all the data into a unique reference frame. From the obtained 3D fire points, a three-dimensional surface rendering is performed and the fire volume is estimated.
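As an illustration of the projection step, the sketch below applies a 4 × 4 transform, such as one estimated by an inter-system calibration, to bring 3D fire points into a common reference frame.

```python
import numpy as np

def to_common_frame(points, T):
    """Project Nx3 fire points into the common reference frame using the
    4x4 homogeneous transform T from the inter-system calibration."""
    pts_h = np.hstack([points, np.ones((points.shape[0], 1))])  # homogeneous
    return (pts_h @ T.T)[:, :3]
```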
This work presents a framework for fast texture analysis in computer vision. The speedup is obtained using general-purpose processing on graphics processing units (GPGPU). For this purpose, we selected the following texture analysis techniques: LBP (local binary patterns), LTP (local ternary patterns), Laws texture kernels, and Gabor filters. The GPU optimizations are compared with CPU optimizations based on MMX/SSE technologies and multicore parallel programming. The experimental results show an important increase in the performance of the proposed algorithms when GPGPU is used, particularly for large image sizes.
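For reference, a plain NumPy implementation of the basic 8-neighbour LBP operator follows; the per-pixel independence of this computation is what makes it map well to GPU parallelism, but the sketch itself is a CPU reference, not the GPGPU version.

```python
import numpy as np

def lbp_8neighbours(img):
    """Basic 8-neighbour LBP code for each interior pixel of a 2D image."""
    c = img[1:-1, 1:-1]  # centre pixels
    code = np.zeros_like(c, dtype=np.uint8)
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    for bit, (dy, dx) in enumerate(offsets):
        # Shifted view of the image: each pixel's neighbour at (dy, dx).
        neighbour = img[1 + dy: img.shape[0] - 1 + dy,
                        1 + dx: img.shape[1] - 1 + dx]
        # Set the bit when the neighbour is at least as bright as the centre.
        code |= (neighbour >= c).astype(np.uint8) << bit
    return code
```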
Face recognition is an area of computer vision that has attracted a lot of interest from the research community, and a growing demand for robust face recognition software in security applications has driven the development of interesting approaches in this field. A large share of face recognition research deals with visible face images, but in the visible spectrum, changes in illumination and facial expression represent a significant challenge for recognition systems. To avoid these problems, researchers have recently proposed the use of 3D and infrared imaging for face recognition.
In this work, we introduce a new framework for infrared face recognition using texture descriptors. The framework exploits linear and nonlinear dimensionality reduction techniques for face learning and recognition in texture space. Active and passive infrared imaging modalities are used, and a comparison with visible face recognition is performed. Two multispectral face recognition databases were used in our experiments: the Equinox database (Visible, SWIR, MWIR, LWIR) and the Laval University multispectral database (Visible, NIR, MWIR, LWIR).
The obtained results show a large increase in recognition performance when texture descriptors such as LBP (local binary patterns) and LTP (local ternary patterns) are used. The best result was obtained in the short-wave infrared (SWIR) spectrum using nonlinear dimensionality reduction techniques.
In recent years, interest in intelligent and efficient tracking systems for surveillance applications has increased. Many of the proposed techniques are designed for static-camera environments; when the camera is moving, tracking moving objects becomes more difficult and many techniques fail to detect and track the desired targets. The problem becomes even more complex when a specific object must be tracked in real time using a pan-and-tilt camera system (PTU). Tracking a target with a PTU so as to keep it within the image is important in surveillance applications: when a target is detected, being able to automatically track it and keep it within the image until action is taken is very important for security personnel working in sensitive areas.
This work presents a real-time tracking system based on particle filters. The proposed system permits the detection and continuous tracking of a selected target using a pan-and-tilt camera platform. A novel, simple, and efficient approach for dealing with occlusions is presented, and a new intelligent forgetting factor is introduced to take target shape variations into account and avoid learning undesired objects. Tests conducted in outdoor operational scenarios show the efficiency and robustness of the proposed approach.
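A minimal sketch of one particle-filter cycle for 2D target tracking follows; the random-walk motion model and the generic likelihood hook are assumptions, and the occlusion handling and forgetting factor described above are not modelled here.

```python
import numpy as np

def particle_filter_step(particles, weights, measurement_likelihood,
                         motion_std=5.0):
    """One predict/update/resample cycle for an Nx2 array of (x, y) particles.

    measurement_likelihood: function mapping particle positions to likelihoods,
    e.g. a colour/shape similarity between image patches and the target model.
    """
    n = len(particles)
    # Predict: random-walk motion model.
    particles = particles + np.random.normal(0.0, motion_std, particles.shape)
    # Update: re-weight particles by the observation likelihood.
    weights = weights * measurement_likelihood(particles)
    weights /= weights.sum() + 1e-12
    # Resample: systematic resampling keeps the particle set focused.
    positions = (np.arange(n) + np.random.uniform()) / n
    indices = np.searchsorted(np.cumsum(weights), positions)
    indices = np.minimum(indices, n - 1)
    return particles[indices], np.full(n, 1.0 / n)
```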
In recent years, interest in efficient tracking systems for surveillance applications has increased. Many of the proposed techniques work well on good-quality images with objects of a certain size; however, images from UAVs or surveillance cameras are noisy, and many techniques fail to detect and track the real moving objects.
This work presents a tracking technique based on combined spatial and temporal wavelet processing of the image sequence. For sequences coming from a UAV, the images are rectified using features detected in the scene: a modified Harris corner detector selects points of interest, and regions around these points are matched in successive frames to find the transformations between successive images. These transformations are used to stabilize the images and to build a complete scene mosaic from the original sequence during object tracking.
A spatial discrete wavelet transform is then used to extract potential target regions, and these detections are refined using a temporal wavelet transform. Mathematical morphology eliminates targets caused by image noise, the remaining targets are processed with a Kalman filter, and a refinement selection strategy keeps only the targets with the highest scores.
The obtained results are promising and show that moving objects can be tracked efficiently in noisy images captured by a moving camera. The proposed technique also works efficiently on noisy infrared sequences captured by a surveillance system.
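As an illustration of the spatial wavelet step, the sketch below uses PyWavelets to extract a detail-energy map and threshold it into candidate-target regions; the wavelet choice and threshold are assumptions.

```python
import numpy as np
import pywt

def wavelet_detail_map(frame, wavelet="db2"):
    """Single-level 2D DWT; the combined detail sub-bands highlight small
    high-frequency regions that are candidate moving targets."""
    _, (ch, cv, cd) = pywt.dwt2(frame.astype(float), wavelet)
    detail = np.abs(ch) + np.abs(cv) + np.abs(cd)
    # Threshold the detail energy to obtain a candidate-target mask.
    return detail > (detail.mean() + 3 * detail.std())
```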
Each year, hundreds of millions of hectares of forest burn, causing human and economic losses. For efficient firefighting, personnel on the ground need tools that can predict fire front propagation.
In this work, we present a new technique for automatically tracking fire spread in three-dimensional space. The proposed approach uses a stereo system to extract a 3D shape from fire images.
A new segmentation technique is proposed that extracts fire regions in complex unstructured scenes. It works in the visible spectrum and combines information extracted from the YUV and RGB color spaces; unlike other techniques, our algorithm does not require prior knowledge of the scene.
The resulting fire regions are classified into homogeneous zones using clustering techniques. Contours are then extracted, and a feature detection algorithm detects interest points such as local maxima and corners. The points extracted from the stereo images are used to compute the 3D shape of the fire front, from which the fire volume is built. The final model is used to compute important spatial and temporal fire characteristics such as spread dynamics, local orientation, and heading direction.
Tests conducted on the ground show the efficiency of the proposed scheme, which is being integrated with a mathematical fire-spread model to predict and anticipate fire behaviour during firefighting. Also of interest to firefighters is the proposed automatic segmentation technique, which can be used for the early detection of fire in complex unstructured scenes.
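As an illustration of combining RGB and YUV information for fire segmentation, a minimal OpenCV sketch follows; the decision rules and thresholds are simplified placeholders, not the paper's actual segmentation algorithm.

```python
import cv2
import numpy as np

def fire_mask(bgr):
    """Candidate fire pixels from simple RGB and YUV rules (illustrative
    thresholds only; real fire segmentation is more elaborate)."""
    b, g, r = [bgr[..., i].astype(int) for i in range(3)]
    yuv = cv2.cvtColor(bgr, cv2.COLOR_BGR2YUV)
    _, u, v = [yuv[..., i].astype(int) for i in range(3)]
    rgb_rule = (r > g) & (g > b) & (r > 150)  # red-dominant and bright
    yuv_rule = v > u                          # chrominance skewed toward red
    return ((rgb_rule & yuv_rule) * 255).astype(np.uint8)
```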
Face recognition in the infrared spectrum has attracted a lot of interest in recent years. Many of the techniques used in the infrared are based on their visible-spectrum counterparts, especially linear techniques such as PCA (principal component analysis) and LDA (linear discriminant analysis).
In this work, we introduce nonlinear dimensionality reduction approaches for multispectral face recognition. For this purpose, the following techniques were developed: global nonlinear techniques (kernel-PCA, kernel-LDA) and local nonlinear techniques (local linear embedding, locality preserving projection). Their performance was compared with that of classical linear face recognition techniques such as PCA and LDA.
Two multispectral face recognition databases were used in our experiments: the Equinox face recognition database and the Laval University database. The Equinox database contains images in the visible, short-, mid-, and long-wave infrared spectra. The Laval database contains images in the visible, near-, mid-, and long-wave infrared spectra, with variations in time and in the metabolic activity of the subjects.
The obtained results are interesting and show an increase in recognition performance when local nonlinear dimensionality reduction techniques are used for infrared face recognition, particularly in the near- and short-wave infrared spectra.
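For illustration, a minimal scikit-learn sketch of kernel-PCA-based recognition follows; the kernel, its parameters, and the nearest-neighbour matcher are assumptions.

```python
from sklearn.decomposition import KernelPCA
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline

def build_kpca_recognizer(n_components=100):
    """Kernel-PCA projection followed by nearest-neighbour matching."""
    return make_pipeline(
        KernelPCA(n_components=n_components, kernel="rbf", gamma=1e-4),
        KNeighborsClassifier(n_neighbors=1),
    )

# Usage: X_train holds flattened face images (n_samples, n_pixels),
# y_train holds subject identities.
# clf = build_kpca_recognizer()
# clf.fit(X_train, y_train)
# predicted_ids = clf.predict(X_test)
```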
An imaging technique for the hand vein tree is presented in this paper. Using the natural human circulatory system and a controlled pressure applied by an armband around the arm, a lock-in thermography technique with internal excitation is carried out. Since the inspection depth increases as the stimulation frequency decreases, imaging the subcutaneous layer requires a very low frequency; a sawtooth waveform is therefore preferred, to minimize the duration of the pressure applied by the armband during the experiment. A frequency of approximately 0.03 Hz and a pressure range between 100 and 140 mmHg, in accordance with diastolic and systolic blood pressure, are used as stimulation. Dorsal-hand amplitude and phase images are then obtained with IR_view (Klein, 1999), a tool specifically designed to analyze infrared images.
The hand vein structure is thermally mapped by an infrared camera operating in the mid-wavelength infrared range (MWIR) at room temperature. Parasitic frequencies are avoided by keeping the hand fixed. The resulting images show a temperature gradient between the surrounding tissues and the back-of-hand veins. The vascular signature is segmented from the amplitude and phase images using a fast Fourier transform image processing technique. This work could be used for vein localization for perfusion, or for the early diagnosis of vein diseases such as primary varicose veins and deep vein thrombosis (DVT). A hand vein signature database for identification purposes is also possible.
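As an illustration of extracting amplitude and phase images at the stimulation frequency, a minimal NumPy sketch follows; it assumes a (T, H, W) thermal sequence and a known frame rate.

```python
import numpy as np

def lockin_images(frames, fps, f_stim=0.03):
    """Amplitude and phase images at the stimulation frequency from a
    (T, H, W) thermal sequence, via a per-pixel Fourier transform."""
    t = frames.shape[0]
    freqs = np.fft.rfftfreq(t, d=1.0 / fps)
    k = np.argmin(np.abs(freqs - f_stim))  # FFT bin closest to ~0.03 Hz
    spectrum = np.fft.rfft(frames, axis=0)[k]
    return np.abs(spectrum), np.angle(spectrum)
```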