The pervasiveness of small Unmanned Aerial Vehicles (UAVs), due to low cost, ease of control, and portability, opens
the possibility of their use in urban environments for illegal or adversarial purposes. Such use includes unauthorized
surveillance, reconnaissance, and weaponization. Detecting adversarial UAVs in urban environments is difficult. Urban
canyons provide shielding from visibility. The small size of quadcopter-type UAVs limits the number of object pixels
available for processing, which reduces standoff detection performance. UAVs fly against a background of ground
motion clutter which can mask their motion. One possible solution to small UAV detection in urban environments uses
low-cost UAV surveillance platforms, equipped with optical sensors, together with computer vision algorithms to detect
adversarial UAVs in video data. In this paper we adapt the astronomical technique of transit photometry to detect small
UAVs, operating in urban environments, in video data. Transit photometry, typically used for exoplanet discovery,
detects small changes in background brightness due to a transiting object. As the UAV traverses a bright
background region, for example, the vehicle occludes the background and reduces the perceived brightness. This
brightness dip may be used to infer the existence of a potential UAV passing across the background. The transit
photometry curve, resulting from this brightness dip, reveals information about the traversing vehicle. We investigate
mathematical properties of the transit photometry curve and derive a closed-form expression for it. We present numerical
results demonstrating the technique on real video data acquired from a small UAV operating in an urban environment.
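The brightness-dip idea described above can be illustrated with a minimal sketch. It is not the closed-form curve derived in the paper: the function names, the synthetic 8x8 frames, the fixed monitoring region and the 5% relative-drop threshold are all assumptions made for illustration only.

```python
def brightness_curve(frames, region):
    """Mean brightness of a fixed background region in each frame.

    frames: list of 2-D lists of pixel intensities.
    region: (row_start, row_end, col_start, col_end), end-exclusive.
    """
    r0, r1, c0, c1 = region
    curve = []
    for frame in frames:
        pixels = [frame[r][c] for r in range(r0, r1) for c in range(c0, c1)]
        curve.append(sum(pixels) / len(pixels))
    return curve


def detect_dips(curve, rel_drop=0.05):
    """Frame indices whose brightness is more than rel_drop below the median."""
    baseline = sorted(curve)[len(curve) // 2]  # median as the un-occluded level
    return [i for i, v in enumerate(curve) if v < baseline * (1 - rel_drop)]


def make_frame(occluder_col=None, size=8, bright=200, dark=20):
    """Synthetic bright background with an optional dark 2x2 'vehicle'."""
    frame = [[bright] * size for _ in range(size)]
    if occluder_col is not None:
        for r in range(3, 5):
            for c in range(occluder_col, min(occluder_col + 2, size)):
                frame[r][c] = dark
    return frame


# 10 clear frames, 7 frames with the object crossing, 10 clear frames.
frames = (
    [make_frame() for _ in range(10)]
    + [make_frame(c) for c in range(7)]
    + [make_frame() for _ in range(10)]
)
curve = brightness_curve(frames, (0, 8, 0, 8))
dips = detect_dips(curve)
print(dips)
```

In this toy setting the dip depth is proportional to the fraction of the monitored region that the object occludes, which is why the curve carries information about the traversing vehicle.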
Image inpainting is the process of filling in a missing region so as to preserve the continuity of the image's overall content and semantics. In this paper, we present a novel approach that improves an existing scheme, the exemplar-based inpainting algorithm, using Topological Data Analysis (TDA). TDA is a mathematical approach concerned with studying shapes or objects to gain information about their connectivity and closeness properties. The challenge in using exemplar-based inpainting is that the neighborhood of the missing region needs to have a relatively simple texture and structure. We studied the topological properties (e.g. number of connected components) of the areas surrounding the missing region by building a sequence of simplicial complexes (as used in persistent homology) based on a selected group of uniform Local Binary Patterns (LBP). Connected components of image regions generated by certain landmark pixels, at different thresholds, automatically quantify the texture of the areas surrounding the missing region. Such quantification helps determine the appropriate patch size for propagation. We have modified the patch propagation priority function using geometrical properties of the curvature of isophotes and improved the patch matching criteria by calculating correlation coefficients in the spatial, gradient and Laplacian domains. We use several image quality measures to illustrate the performance of our approach in comparison to similar inpainting algorithms. In particular, we shall illustrate that our proposed scheme outperforms the state-of-the-art exemplar-based inpainting algorithms.
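As a rough illustration of counting connected components at increasing thresholds, the sketch below joins landmark points whose Euclidean distance is at most the threshold and counts the resulting components with a union-find structure. The landmark coordinates, the thresholds and the function name are hypothetical; selecting landmarks from uniform LBP codes, as the abstract describes, is not shown.

```python
import math


def count_components(points, threshold):
    """Number of connected components when points within `threshold` are joined."""
    parent = list(range(len(points)))

    def find(i):
        # Follow parent links to the root, compressing the path on the way.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            if math.dist(points[i], points[j]) <= threshold:
                parent[find(i)] = find(j)

    return len({find(i) for i in range(len(points))})


# Hypothetical landmark pixel coordinates (row, col).
landmarks = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (10, 0)]
thresholds = [0, 1, 2, 6, 8]
component_counts = [count_components(landmarks, t) for t in thresholds]
print(component_counts)
```

The resulting sequence is non-increasing in the threshold. How quickly it falls to one component gives a simple summary of how tightly the landmarks cluster.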
The automatic restoration of image colours (inpainting) is an interesting challenge in computer vision. The aim is to restore missing colour information in a region, based on information from the surrounding region, in a way that looks acceptable to the human eye. We investigate two functionals based on the difference between the directional derivative of the gradient (and of the Laplacian) of two channels. The Euler-Lagrange process applied to the two functionals produces a nonlinear second-order (and a nonlinear fourth-order) PDE, the numerical solutions of which restore the colour in the region of interest. The first method extends an already established method that was developed for only one specific colour space. We shall establish the effectiveness of the corresponding image inpainting schemes, in both the spatial and wavelet domains, for 8 different colour spaces. We demonstrate the success of both schemes on a large number of natural images, with better performance than the popular Poisson formula.
How to describe an image accurately with the most useful information is the key issue in any face recognition task. Finding efficient and discriminative facial information that remains stable under different image acquisition conditions is therefore a major challenge. Most existing approaches use only one type of feature. In this paper, we argue that a robust face recognition technique requires several different kinds of information to be taken into account, which suggests incorporating several feature sets into a single fused one. Therefore, a new technique that combines the facial shape with the local structure and texture of the face image is proposed, namely multi-feature fusion (MFF). It is based on local boosted features (LBF) and Gabor wavelet techniques. Given an input image, the LBF histogram and Gabor features histogram are built separately. Then a final MFF feature descriptor is formed by concatenating these three histograms, and is fed to a support vector machine (SVM) classifier to recognize the face image. The proposed MFF approach is evaluated on three different face datasets and provides promising results.
In the human visual system, visible objects are recognized by features, which can be classified into local features that are based on their simple components (e.g., line segment, angle, color, etc.) and global features that are based on the whole objects (e.g., connectivity, number of holes, etc.). Over the past half century, anatomical, physiological, behavioral and computational studies of the visual systems have led to a generally accepted model of vision, which starts by processing local features in the early stages of the visual pathways and then integrates them into global features in the later stages. However, this popular local-to-global model has been challenged by a set of experiments showing that the visual systems in humans, non-human primates and honey bees are more sensitive to global features than to local features. These “global-first” studies further motivated the development of new paradigms and approaches to understand human vision and build new vision models. In this study, we started a new series of experiments that examine how two representative pre-trained Convolutional Neural Networks (CNNs), AlexNet and VGG-19, process local and global features. The CNNs were trained to classify geometric shapes into two categories based on local features (e.g., triangle, square and circle) or a global feature (e.g., having a hole). In contrast to the biological visual systems, the CNNs were more effective at classifying images based on local features than on the global feature. We further showed that adding distractors greatly lowered the performance of the CNNs, again unlike the biological visual systems. Ongoing studies will extend these analyses to other geometrical invariants and internal representations of the CNNs. The overarching goal is to use the powerful CNNs as a tool to gain insights into the biological visual systems, including those of humans and non-human primates.
A library usually holds thousands of books, and each book is assigned a unique position so that visitors can find it easily by checking the library's database. However, misplaced books are difficult for readers to find. Therefore, finding these misplaced books and rearranging them is one of the important jobs of librarians. In this paper, a convolutional-neural-network-based book label recognition algorithm is proposed to help librarians find misplaced books by scanning the book labels. The algorithm is divided into two parts. The first part applies image processing techniques to extract the characters of the labels attached to each book from images of the bookshelves. The second part uses convolutional neural networks (CNNs) to train a classifier for recognizing characters. In this part, a CNN with four convolutional layers is designed to train classifiers for the letters and digits that make up the label text.
Eye tracking technology allows researchers to monitor the position of the eye and infer one’s gaze direction, which is used to understand the nature of human attention within psychology, cognitive science, marketing and artificial intelligence. Commercially available head-mounted eye trackers allow researchers to track pupil movements (saccades and fixations) using an infrared camera and to capture the field of vision with a front-facing scene camera. The wearable eye tracker opened a new way to conduct research in unconstrained environments; however, the recorded scene video typically has non-uniform illumination, low-quality image frames, and moving scene objects. One of the most important tasks in analyzing the recorded scene video data is finding the boundary between different objects in a single frame. This paper presents a multi-level fixation-oriented object segmentation method (MFoOS) that addresses the above challenges in segmenting the scene objects in video data collected by the eye tracker in order to support cognition research. MFoOS is task-driven and offers position invariance together with tolerance to illumination changes and noise. The proposed method is tested using real-world case studies designed by our team of psychologists focused on understanding visual attention in human problem solving. Extensive computer simulations demonstrate the method’s accuracy and robustness for fixation-oriented object segmentation. Moreover, deep-learning semantic image segmentation using MFoOS results as label data was explored to demonstrate the possibility of on-line deployment of eye tracker fixation-oriented object segmentation.
Many human detection algorithms are able to detect humans in various environmental conditions with high accuracy, but they rely heavily on color information for detection, which is not robust to lighting changes and varying colors. This problem is further amplified with infrared imagery, which contains only grayscale information. The proposed algorithm for human detection uses intensity distribution, gradient and texture features for effective detection of humans in infrared imagery. For intensity, histogram information is obtained in the grayscale channel. For extracting gradients, we utilize the Histogram of Oriented Gradients for better information in the various lighting scenarios. For extracting texture information, the center-symmetric local binary pattern gives rotational invariance as well as lighting invariance for robust features under these conditions. Various binning strategies help keep the inherent structure embedded in the features, which provides enough information for robust detection of the humans in the scene. The features are then classified using an AdaBoost classifier to provide a tree-like structure for detection at multiple scales. The algorithm has been trained and tested on IR imagery and has been found to be fairly robust to viewpoint changes and lighting changes in dynamic backgrounds and visual scenes.
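To make the texture feature concrete, here is a minimal sketch of the commonly used 8-neighbor, radius-1 center-symmetric LBP code, which compares the four pairs of diametrically opposite neighbors and yields a 16-bin histogram. The neighbor ordering, the comparison threshold `t` and the function names are assumptions; the paper's binning strategies and exact variant are not reproduced here.

```python
# (row offset, col offset) of the 8 neighbors, ordered so that entry i and
# entry i + 4 are diametrically opposite around the center pixel.
OFFSETS = [(0, 1), (-1, 1), (-1, 0), (-1, -1), (0, -1), (1, -1), (1, 0), (1, 1)]


def cs_lbp_code(img, r, c, t=0):
    """4-bit CS-LBP code at pixel (r, c); bit i is set if n_i - n_(i+4) > t."""
    code = 0
    for i in range(4):
        dr1, dc1 = OFFSETS[i]
        dr2, dc2 = OFFSETS[i + 4]
        if img[r + dr1][c + dc1] - img[r + dr2][c + dc2] > t:
            code |= 1 << i
    return code


def cs_lbp_histogram(img, t=0):
    """16-bin histogram of CS-LBP codes over all interior pixels."""
    hist = [0] * 16
    for r in range(1, len(img) - 1):
        for c in range(1, len(img[0]) - 1):
            hist[cs_lbp_code(img, r, c, t)] += 1
    return hist


patch = [
    [10, 10, 10],
    [0, 5, 20],
    [0, 0, 0],
]
brighter = [[v + 50 for v in row] for row in patch]  # uniform brightness offset
print(cs_lbp_code(patch, 1, 1), cs_lbp_code(brighter, 1, 1))
```

Because only differences between opposite neighbors are used, adding a constant brightness offset to the patch leaves the code unchanged, which is one sense in which the descriptor tolerates lighting changes.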
Camouflage aims to make objects disappear into the background environment by presenting textures, color information and patterns similar to those of the background. Camouflaged objects can be divided into two groups: dark camouflage and light camouflage. Many detection algorithms have been published for locating camouflaged objects, and their performance depends strongly on the image enhancement used as a pre-processing step. Even though existing histogram equalization-based image enhancement algorithms perform well on either dark camouflage images or light camouflage images, it remains a challenge to deal with an image containing both dark camouflage and light camouflage. To meet this challenge, a new hill climbing-based histogram equalization algorithm is proposed, which follows a three-step framework of segmentation, enhancement and integration. Unlike existing approaches, the proposed method segments the dark camouflage content and the light camouflage content by utilizing the hill climbing algorithm. The segmented camouflage contents are enhanced by their corresponding histogram equalization. Finally, the enhanced segments are combined by an integration process to obtain final output images of satisfactory quality. This hill climbing-based histogram equalization can enhance the detailed structural information in both dark and light regions of images simultaneously. Experimental and comparison results demonstrate its superior performance.
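The segment, enhance and integrate framework can be sketched as follows. This is only an illustration under strong assumptions: a fixed intensity split at 128 stands in for the hill climbing segmentation (which is not implemented here), each segment is equalized onto its own intensity range, and the image is a flat list of 8-bit values. All names are hypothetical.

```python
def equalize(values, lo, hi):
    """Map each distinct value to [lo, hi] by histogram equalization."""
    n = len(values)
    hist = {}
    for v in values:
        hist[v] = hist.get(v, 0) + 1

    # Cumulative distribution over the sorted distinct values.
    cdf, running = {}, 0
    for v in sorted(hist):
        running += hist[v]
        cdf[v] = running

    cdf_min = min(cdf.values())
    if n == cdf_min:  # a single distinct value: nothing to stretch
        return {v: v for v in hist}
    return {
        v: lo + round((cdf[v] - cdf_min) / (n - cdf_min) * (hi - lo))
        for v in hist
    }


def enhance(pixels, split=128):
    """Equalize dark and light content separately, then recombine in place."""
    dark = [p for p in pixels if p < split]
    light = [p for p in pixels if p >= split]
    mapping = {}
    if dark:
        mapping.update(equalize(dark, 0, split - 1))
    if light:
        mapping.update(equalize(light, split, 255))
    return [mapping[p] for p in pixels]


pixels = [10, 12, 14, 16, 240, 242, 244, 246]
enhanced = enhance(pixels)
print(enhanced)
```

Equalizing the two segments separately stretches the contrast inside both the dark and the light content, whereas a single global equalization would mostly spread the two groups apart from each other.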
In most pattern recognition applications, the object of interest is represented by a very high-dimensional data vector. The high dimensionality of modeling vectors poses serious challenges related to the efficiency of retrieving, analyzing and classifying the pattern of interest. The Curse of Dimensionality is a general reference to these challenges, which are commonly addressed by Dimension Reduction (DR) techniques. The most commonly used DR schemes, such as Principal Component Analysis (PCA), are data-dependent. However, we may expect over-fitting and bias of the adaptive models towards the training sets as consequences of a low ratio of sample density to dimension. Therefore, data-independent DR schemes such as Random Projections (RP) are more desirable. In this paper, we investigate and test the performance of differently constructed overcomplete Hadamard-based m x n (m << n) sub-matrices using Walsh-Paley (WP) matrices as a DR scheme for Gait-based Gender Classification (GBGC). In particular, we shall demonstrate that these Hadamard-based RPs perform as well as, if not better than, PCA and Gaussian-based RPs. Moreover, we shall show that Walsh-Paley Structured Matrices (WPSM) perform better than Walsh-Paley Random Matrices (WPRM).
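A data-independent projection built from rows of a Hadamard matrix can be sketched as below. The sketch uses the Sylvester (natural-order) construction rather than the Walsh-Paley ordering used in the paper, and the two row-selection rules shown (first m rows versus randomly sampled rows) are only hypothetical stand-ins for the structured and random constructions. The 1/sqrt(m) scaling is likewise an assumption.

```python
import random


def sylvester_hadamard(n):
    """n x n Hadamard matrix in natural (Sylvester) order; n is a power of two."""
    if n < 1 or n & (n - 1):
        raise ValueError("n must be a power of two")
    h = [[1]]
    while len(h) < n:
        # Block recursion: [[H, H], [H, -H]].
        h = [row + row for row in h] + [row + [-x for x in row] for row in h]
    return h


def project(x, rows, h):
    """Project x onto the selected Hadamard rows, scaled by 1/sqrt(m)."""
    scale = len(rows) ** -0.5
    return [scale * sum(hv * xv for hv, xv in zip(h[r], x)) for r in rows]


n, m = 8, 3
h = sylvester_hadamard(n)
x = [3.0, -1.0, 2.0, 0.5, 0.0, 4.0, -2.0, 1.0]

structured_rows = list(range(m))                   # fixed choice of rows
random.seed(0)
random_rows = sorted(random.sample(range(n), m))   # randomly sampled rows

y_structured = project(x, structured_rows, h)
y_random = project(x, random_rows, h)
print(len(y_structured), len(y_random))
```

Because the rows are mutually orthogonal with entries of ±1, the projection needs only additions and subtractions and no training data. With the scaling above, keeping all n rows preserves the squared norm of x exactly.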
Image steganography is the technique of hiding sensitive data (a secret message) inside cover images in a way that raises no suspicion among attackers, while steganalysis is the technique of detecting the embedded data by unauthorized persons. As a first step in detecting hidden data, distinguishing between original images (images without a secret message) and stego images (images containing a secret message) is important. In this paper we design and propose a novel scheme based on persistent homological (PH) invariants (e.g. the number of connected components), a concept from the emerging field of Topological Data Analysis (TDA), associated with certain image features. A selected group of uniform Local Binary Pattern (LBP) codes, where LBP is a texture descriptor, represents the image features and is used to construct a sequence of simplicial complexes (SC) from an increasing sequence of distance thresholds (T). We calculate the corresponding non-increasing sequence of homological invariants, which shows the speed at which the constructed sequence of SCs terminates. This approach is sensitive enough to differentiate original images from stego images. We test this approach on three different embedding techniques, namely the Traditional Least Significant Bits (TLSB) embedding technique, spatial Universal Wavelet Relative Distortion (S-UNIWARD) and the LSB-Witness embedding technique, using a large number of images chosen randomly from a large image database. Preliminary results show that the PH sequence defines a discriminating criterion for steganalysis, with over 90% classification accuracy.
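For readers unfamiliar with the simplest of the embedding techniques mentioned above, the following is a minimal sketch of traditional LSB embedding and extraction on a flat list of 8-bit pixel values. Sequential embedding from the first pixel and the function names are assumptions; S-UNIWARD and LSB-Witness are not shown.

```python
def embed_lsb(pixels, bits):
    """Return a copy of `pixels` whose first len(bits) LSBs carry `bits`."""
    if len(bits) > len(pixels):
        raise ValueError("message is longer than the cover")
    stego = list(pixels)
    for i, bit in enumerate(bits):
        stego[i] = (stego[i] & ~1) | bit  # clear the LSB, then set it to the bit
    return stego


def extract_lsb(pixels, n_bits):
    """Read back the first n_bits least significant bits."""
    return [p & 1 for p in pixels[:n_bits]]


cover = [100, 101, 102, 103, 254, 255]
message = [1, 0, 1, 1, 0, 1]
stego = embed_lsb(cover, message)
print(stego, extract_lsb(stego, len(message)))
```

Each pixel changes by at most one intensity level, which is why the change is invisible to the eye and why a detector has to rely on statistical or structural cues such as the texture-based invariants described above.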
In this paper we consider the problem of image inpainting in medical image analysis, where the objective is to reconstruct missing or deteriorated parts of an image. Inpainting is a useful tool for medical applications such as vascular reconstruction, specular reflection removal in endoscopic images, and MRA artefact removal. Most inpainting approaches require a good image model to infer the unknown pixels. The proposed approach uses a modified exemplar-based technique and combines a mapping from image patches with a pre-trained deep neural network. In our work, we exploit the concept of sparse representation, which takes a group of nonlocal patches with similar textures as the basic unit instead of a single patch. Moreover, color and multi-direction constraints are incorporated into the optimisation criterion to obtain sharp inpainting results. As a result, the proposed method provides plausible restoration while propagating edge information into the target region. Experimental results demonstrate the effectiveness of the proposed method in medical image inpainting tasks.
Accurate representation of objects, large and small, has always been at the forefront of scientific interest. Three-dimensional (3D) information about objects is traditionally extracted using projective and descriptive geometry of the objects in two dimensions. Currently, 3D scanners are extensively used to perform such processes. 3D scanners can examine an object to gather information about its physical structure, including shape, volume, and texture. Although various 3D scanners utilize different capturing and reconstruction methods, they ultimately produce a form of depth image, which is used to generate 3D point clouds. However, the fundamental problem with depth image-based processing is that the depth image could contain blocky artifacts, discontinuities and noise. This could result in occlusion and image warping. 3D image quality metrics are critical in the evaluation of 3D images in fields and industries such as automotive, art, biometrics, and biomedicine. Although there are several two-dimensional (2D or color) quality metrics, there is a visible void in the field of objective depth quality assessment. This paper proposes a novel no-reference depth image measure and further fuses this measure with an extended color quality metric. The Color-Depth image quality measure (CDME) places no constraint on the 3D images being compared and demonstrates a very high correlation with human judgment. Extensive computer simulations are performed to evaluate the proposed color-depth image quality measure against other no-reference image error measurements. The effectiveness of the presented measure is evaluated using the NYU Depth Dataset V2. Experimental results show that the proposed measure provides a clear distinction between lower-quality and higher-quality images. Eventually, the presented method could be used to provide optimal parameters for 3D post-processing algorithms.
Security surveillance systems are low-cost and ubiquitous, and are employed in smart cities around the world for threat monitoring and assessment. Manually observing, monitoring and tracking the population, and detecting and reporting abnormal events in crowded places, can be very challenging. Smart cities favor the use of sophisticated security systems, which can overcome human error. Moreover, multi-view near-infrared surveillance systems pose challenges such as poor image quality, color discontinuity, occlusion, and image blur. Also, the performance of a recognition system depends on the specifications of the camera. All these distortions interfere with the feature extraction process in face or object classification systems. In this article, an intelligent multi-view image mosaicking algorithm, which combines near-infrared images captured from dozens of cameras/sensors, is introduced. The presented system a) preserves facial features, b) avoids vertical banding (exposure variation), and c) resolves color discontinuity, which aids face detection systems. The performance of this technique is tested against its ground truth, both subjectively and quantitatively. The quantitative analysis is performed using measures such as SSIM, MS-SSIM, AME, LogAMEE, and TDMEC.
The proposed method is a novel image enhancement method for color medical images. In this method, the 3-D medical image is first transformed to a 2-D grayscale image, and then enhancement algorithms, in either the frequency domain or the spatial domain, are applied to the grayscale image. This paper describes the enhancement effects on medical images obtained with the proposed transformation model followed by the alpha-rooting method, as the frequency-domain algorithm, and histogram equalization, as the spatial-domain algorithm. The enhancement is quantitatively measured with a metric called the color enhancement measure estimation (CEME). The proposed method shows good CEME values compared with the original images.
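Alpha-rooting, in its usual form, keeps the phase of each Fourier coefficient and raises its magnitude to a power alpha. The sketch below applies this to a short 1-D signal with a naive DFT. The 1-D setting, the sample values and alpha = 0.9 are assumptions for illustration; the paper's color-to-grayscale transformation model is not shown.

```python
import cmath


def dft(x, inverse=False):
    """Naive O(n^2) discrete Fourier transform (inverse includes the 1/n factor)."""
    n = len(x)
    sign = 1 if inverse else -1
    out = [
        sum(x[t] * cmath.exp(sign * 2j * cmath.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]
    return [v / n for v in out] if inverse else out


def alpha_root(signal, alpha):
    """Replace each coefficient magnitude |X| by |X|**alpha, keeping the phase."""
    spectrum = dft(signal)
    rooted = [
        c * abs(c) ** (alpha - 1) if abs(c) > 1e-12 else 0j
        for c in spectrum
    ]
    # The input is real and the scaling is symmetric, so the result is real
    # up to rounding error; only the real part is kept.
    return [v.real for v in dft(rooted, inverse=True)]


signal = [10.0, 12.0, 11.0, 30.0, 12.0, 10.0, 11.0, 12.0]
enhanced = alpha_root(signal, 0.9)
print([round(v, 3) for v in enhanced])
```

With alpha below 1, large coefficients (typically the low frequencies) are attenuated more strongly than small ones, which raises the relative weight of fine detail. With alpha equal to 1 the signal is returned unchanged.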
The problem of composing color images from original grayscale images is considered. A few models are proposed and analyzed, based on the observation, made on many color images, that on average a proportion exists between the primary colors in images. Re-coloring of grayscale images is performed in the RGB color model, and the Golden, Silver and Aesthetic ratios, as well as other rules of proportion between the primary colors, are considered. The gray value is used as the main color to be mapped into the three colors at each pixel. We also describe a parameterized model for re-coloring images with a given ratio.
Image Processing and Computer Vision solutions have become commodities for software developers, thanks to the growing availability of Application Programming Interfaces (APIs) that encapsulate rich functionality, powered by advanced algorithms. Tech giants like Apple, Google, IBM, and Microsoft have made APIs and micro-services available in the cloud for the agile integration of machine learning and intelligent features into everyday applications. As privacy and cyber welfare become prime concerns, special efforts have been devoted to the field of face processing and recognition. In this context, this paper presents a friendly, intuitive and fun-to-use mobile app that leverages state-of-the-art APIs for face, age, gender and emotion recognition. The Face-It-Up app was implemented for the iOS platform and uses the Microsoft Cognitive Services APIs as a tool for human vision and face processing research. Experimental work on image compression, upside-down orientation, the Thatcher effect, negative inversion, high frequency, facial artifacts, caricatures and image degradation was performed to test the application. For this purpose, we used the Radboud and 10k US Adult Faces Databases. The app benefits from access to high-resolution imagery and touch input on smart devices, allowing for a wide range of new experiments from the user perspective. Furthermore, our approach serves as a potential framework for new initiatives in image-based biometrics, the Internet of Things, and citizen science.
Face recognition technologies have been in high demand in the past few decades due to the increase in human-computer interactions. Face recognition is also one of the essential components in interpreting human emotions, intentions and facial expressions for smart environments. This non-intrusive biometric authentication system relies on identifying unique facial features and pairing similar structures for identification and recognition. Application areas of facial recognition systems include homeland and border security, identification for law enforcement, access control to secure networks, authentication for online banking and video surveillance. While it is easy for humans to recognize faces under varying illumination conditions, it is still a challenging task in computer vision. Non-uniform illumination and uncontrolled operating environments can impair the performance of visual-spectrum based recognition systems. To address these difficulties, a novel Anisotropic Gradient Facial Recognition (AGFR) system that is capable of autonomous thermal infrared to visible face recognition is proposed. The main contributions of this paper include a framework for a thermal/fused-thermal-visible to visible face recognition system and a novel human-visual-system inspired thermal-visible image fusion technique. Extensive computer simulations using the CARL, IRIS, AT&T, Yale and Yale-B databases demonstrate the efficiency, accuracy, and robustness of the AGFR system.