With the popularity of commercial unmanned aerial vehicles (UAVs), people have easy access to UAVs. However, people's privacy and safety can be threatened if UAVs fly over airports, private yards, and other sensitive places. It is important to detect illegal UAVs accurately and promptly at these vulnerable sites. However, motion blur, occlusion, and truncation occur frequently due to the fast movement of UAVs, and the small size of UAVs in images makes correct prediction hard. In this paper, we propose an anchor-free one-stage method for UAV detection. The method eliminates the anchor boxes used in most existing detectors, which makes it simpler and more efficient. We improve detection accuracy in two ways. First, a new multi-scale feature fusion method is proposed to enhance the exchange of semantic information between different scales. Second, a loss function is adopted that increases the proportion of the loss contributed by small UAVs. Experimental results validate the effectiveness of our improvements, and our proposed detector achieves superior performance.
As the number of vehicles has soared, the demand for parking spaces has grown intense. Although many horizontal parking spaces are located outdoors, many existing methods cannot detect them effectively due to the complex outdoor environment and the long span of a horizontal parking space. In this paper, a method based on vehicle-mounted fisheye images is proposed for outdoor parking space detection. As a car passes a parking space at idle speed, fisheye images are captured by the vehicle-mounted fisheye camera. Image processing algorithms are used to remove noise and classify light intensity. Then, different grayscale thresholds are used to enhance image contrast, and the Canny operator is utilized to detect the contour of the parking space line. Next, the Hough transform is performed to detect straight-line segments, and the angles of the segments are calculated to determine whether they are perpendicular to each other. If a right angle is detected in two consecutive frames, the second frame is taken as the signal for the starting or ending position of the parking space. Different distances between the space line and the wheels are tested to verify that the method adapts well to distance. Experiments show that the proposed method can detect parking lines effectively and meets real-time requirements.
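The perpendicularity test at the heart of the pipeline above can be sketched in a few lines. This is an illustrative reconstruction, not the authors' code: segments are assumed to be endpoint tuples (x1, y1, x2, y2), such as those returned by a probabilistic Hough transform, and the 5° tolerance is an assumed parameter.

```python
import math

def segment_angle(x1, y1, x2, y2):
    """Orientation of a line segment in degrees, normalized to [0, 180)."""
    return math.degrees(math.atan2(y2 - y1, x2 - x1)) % 180.0

def is_perpendicular(seg_a, seg_b, tol_deg=5.0):
    """True if the two segments meet at (approximately) a right angle."""
    diff = abs(segment_angle(*seg_a) - segment_angle(*seg_b))
    diff = min(diff, 180.0 - diff)  # smallest angle between the two directions
    return abs(diff - 90.0) <= tol_deg
```

A tolerance rather than an exact 90° comparison is needed because residual fisheye-rectification error and line-fitting noise perturb the detected segment angles.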
Constant false alarm rate (CFAR) detection is an important method that is widely used in automatic radar detection. However, the performance of CFAR detection degrades dramatically in non-homogeneous environments because of interference targets and clutter in the reference window of the test cell. To this end, an intelligent multistrategy fusion (IMSF) CFAR detection algorithm is proposed in this paper. By combining the preprocessing results of FOCA-CFAR, BOCA-CFAR, and OSVI-CFAR, IMSF-CFAR can utilize more suitable independent, identically distributed reference cells than conventional CFAR methods, which yields better detection performance. The simulation results reveal that IMSF-CFAR maintains stable detection performance in both homogeneous and heterogeneous environments.
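The FOCA, BOCA, and OSVI variants fused by IMSF-CFAR all build on the same sliding-window scheme as the classical cell-averaging (CA) CFAR, which can be sketched as follows. This is a generic baseline for context, not the paper's algorithm; the window sizes and false alarm rate are assumed parameters.

```python
import numpy as np

def ca_cfar(x, num_ref=8, num_guard=2, pfa=1e-3):
    """Cell-averaging CFAR over a 1-D power sequence: for each test cell,
    estimate the noise level from reference cells on both sides (skipping
    guard cells) and declare a detection above a scaled threshold."""
    n = len(x)
    N = 2 * num_ref                        # total number of reference cells
    alpha = N * (pfa ** (-1.0 / N) - 1.0)  # CA-CFAR scaling for the target Pfa
    detections = np.zeros(n, dtype=bool)
    for i in range(num_ref + num_guard, n - num_ref - num_guard):
        lead = x[i - num_guard - num_ref : i - num_guard]
        lag  = x[i + num_guard + 1 : i + num_guard + num_ref + 1]
        noise = (lead.sum() + lag.sum()) / N
        detections[i] = x[i] > alpha * noise
    return detections
```

IMSF-CFAR's improvement lies in choosing *which* reference cells feed this average when interferers contaminate the window, rather than in changing the thresholding step itself.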
Face reconstruction is a long-standing and extremely challenging problem. To solve it more accurately and efficiently, in this paper we propose a model-based stereo method to reconstruct a finely detailed 3D face from calibrated stereo face images captured by a stereo camera. Our model-based stereo method avoids both the limited expressive power of model-based reconstruction and the mismatches that appear in stereo reconstruction. In the proposed method, sparse landmarks detected from the stereo images are used to reconstruct coarse shape and pose parameters with a 3D morphable model (3DMM). The detailed face is then obtained by per-vertex deformation of the coarse shape according to illumination, gradient, and surface-smoothness terms; the deformation of each vertex follows the direction of the vertex normal, which is the fastest-changing direction of the local mesh area. The experimental results show that our proposed method achieves high performance compared with the E-eos method.
Analyzing physiological signals during sleep can assist experts in diagnosing sleep arousal. To relieve medical technologists of this time-consuming manual work, a multi-task algorithm for automatically identifying sleep arousal events is proposed in this work. The algorithm contains two parts: feature extraction and classification. The extracted features consist of two standard arousal features and one proposed feature, fuzzy entropy, which highlights the possibility of an event. With this contribution and the rest of the pipeline, our result reaches a sensitivity of 0.903 and a specificity of 0.834.
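Fuzzy entropy measures the irregularity of a signal with a soft (exponential) similarity criterion instead of a hard threshold. A common formulation is sketched below; the exact variant and parameters used in the paper are not specified, so the embedding dimension, tolerance, and fuzzy power here are conventional defaults, not the authors' settings.

```python
import numpy as np

def fuzzy_entropy(x, m=2, r=0.2, n=2):
    """Fuzzy entropy (FuzzyEn) of a 1-D signal.
    m: embedding dimension, r: tolerance (fraction of the signal std),
    n: power of the fuzzy membership function."""
    x = np.asarray(x, dtype=float)
    r = r * x.std()

    def phi(dim):
        # Mean-removed embedding vectors of length `dim`.
        N = len(x) - dim
        vecs = np.array([x[i:i + dim] - x[i:i + dim].mean() for i in range(N)])
        # Chebyshev distance between every pair of vectors.
        d = np.abs(vecs[:, None, :] - vecs[None, :, :]).max(axis=2)
        sim = np.exp(-(d ** n) / r)      # fuzzy similarity degree
        np.fill_diagonal(sim, 0.0)       # exclude self-matches
        return sim.sum() / (N * (N - 1))

    return np.log(phi(m)) - np.log(phi(m + 1))
```

A regular signal (e.g., a sinusoid) yields a lower fuzzy entropy than random noise, which is why the feature helps separate arousal events from background sleep activity.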
In digital radiography, the interaction between X-rays and the object causes scattered radiation that reduces image contrast. Scatter kernel superposition (SKS), a computerized scatter correction method, can remove scatter from digital X-ray images. The parameters of the scatter kernels in SKS are commonly obtained by Monte Carlo N-Particle Transport Code (MCNP) simulation. However, the simulated scatter kernel is biased with respect to the physical scatter characteristics, owing to errors in the device's physical parameters and in the MCNP simulation. Because the hyper-parameters of the scatter kernel are difficult to optimize, we introduce Bayesian optimization to optimize them further. According to the results of phantom and clinical experiments, our method improves the contrast and peak signal-to-noise ratio of images compared with traditional SKS.
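The core SKS idea, before any kernel optimization, is that the measured image equals the primary image plus the primary convolved with a scatter kernel, so the primary can be recovered by fixed-point iteration. The sketch below assumes a single spatially invariant Gaussian kernel, which is a simplification of the multi-kernel superposition used in practice; kernel size, width, and scatter fraction are illustrative values.

```python
import numpy as np

def conv2_same(img, ker):
    """'Same'-size 2-D convolution via zero-padded FFTs (NumPy only)."""
    s = (img.shape[0] + ker.shape[0] - 1, img.shape[1] + ker.shape[1] - 1)
    full = np.fft.irfft2(np.fft.rfft2(img, s) * np.fft.rfft2(ker, s), s)
    r0, c0 = (ker.shape[0] - 1) // 2, (ker.shape[1] - 1) // 2
    return full[r0:r0 + img.shape[0], c0:c0 + img.shape[1]]

def scatter_kernel(size, sigma, scatter_fraction):
    """Gaussian scatter kernel whose weights sum to `scatter_fraction`."""
    ax = np.arange(size) - size // 2
    xx, yy = np.meshgrid(ax, ax)
    k = np.exp(-(xx**2 + yy**2) / (2.0 * sigma**2))
    return scatter_fraction * k / k.sum()

def sks_correct(measured, kernel, n_iter=10):
    """Iteratively subtract the scatter estimate: p <- measured - p * kernel.
    The iteration converges when the kernel weights sum to less than 1."""
    primary = measured.copy()
    for _ in range(n_iter):
        primary = measured - conv2_same(primary, kernel)
    return primary
```

Bayesian optimization, as proposed above, would tune the kernel's hyper-parameters (here, `sigma` and `scatter_fraction`) against measured data instead of trusting the MCNP-simulated values.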
Although it has long been believed that contextual information and the relations between pedestrians would help pedestrian recognition, this idea is rarely used in the deep learning era. This is because the convolutions of deep neural networks do not easily fuse related features, and doing so increases the amount of computation. In this paper, we propose a single-shot, proposal-relation-based approach for pedestrian detection. We generate proposals from image features at different scales and use the relations among these proposals to extend the features of each proposal. Finally, the position of each pedestrian is obtained through the convolutional neural network. The computational cost is small, and the approach is easy to embed into existing networks. Our detector is trained end to end, and experimental results on the Caltech Pedestrian dataset show that our approach achieves state-of-the-art performance.
In this paper we propose a new approach to the challenging problem of robust fundamental matrix estimation from corrupted correspondences. Compared with traditional robust methods, the proposed approach achieves enhanced estimation accuracy and stability. These gains are attributed mainly to two novelties. First, a new, more easily solvable analytic objective function is proposed that accounts for both the presence of correspondence outliers and computational convenience. Second, an adjusted gradient projection method is developed to provide a more stable solver for robust estimation. Experimental results show that the proposed approach performs better than the traditional robust methods RANSAC, MSAC, LMEDS, and MLESAC, in particular when correspondences are seriously corrupted.
We propose a novel end-to-end supervised convolutional neural network (CNN) to compute disparity from a pair of stereo images. To address the problem of computing high-quality disparity in ill-posed areas, our cascade spatial pyramid pooling (CSPP) substructure gathers global context by aggregating context information at different positions and different feature-block scales, from coarse to fine. We also introduce a warp layer: the right feature map is warped with the previously predicted disparity and then compared with the left feature map to form a cost volume. We learn the disparity from the cost volume using feature information at different levels. We evaluate our method on three stereo datasets; the results show that it has advantages in textured areas, in target edge areas, and in efficiency, and it also achieves a high ranking.
As cGANs have achieved great success on pixel-to-pixel problems, we propose a new cGAN-based architecture to solve the optical flow estimation problem. Specifically, we propose a loss function that consists of an adversarial loss and a content loss. The adversarial loss is a pixel-to-pixel loss: a discriminator network is trained to differentiate the ground-truth flow from the generated flow in pixel space. The content loss focuses on the perceptual similarity of the ground-truth and generated flows. Our architecture (FlowGan) contains a generator based on FlowNetS with Dense Blocks to make it deeper, and a Markovian discriminator that classifies image patches instead of the whole image. We train our network on the FlyingChairs dataset and evaluate it on MPI-Sintel. FlowGan obtains competitive results at practical speed.
This paper focuses on the problem of estimating the fundamental matrix with unknown radial distortion. The standard approach is the Gröbner basis method, which solves nontrivial polynomial equations formed from pairs of correspondences under the one-parameter division model for radial distortion; this approach is nonconvex and not noise-resistant. Using results from polynomial optimization and rank minimization, this paper shows that the problem can be solved as a sequence of convex semidefinite programs. In the experiments, we show that the proposed method works well and is more noise-resistant.
Unimodal analysis of finger vein (FV) and finger dorsal texture (FDT) has been investigated intensively for personal recognition. Unfortunately, it is not robust to segmentation error and noise. Motivated by the distribution traits of FV and FDT within a finger, we present a multimodal recognition method, called weighted sparse fusion for identification (WSFI), which fuses FV and FDT images at the pixel level. First, a new fused test sample, a per-pixel weighted sum of the FV and FDT images, is obtained; the weight values are computed according to the reconstruction error of each FV and FDT pixel. A new dictionary associated with the fused test sample is constructed in the same manner. Second, for every fused test sample and its associated dictionary, sparse representation based classification (SRC) is applied for recognition. Experiments show that, compared with state-of-the-art techniques, our method achieves significant improvement in terms of accuracy rate (AR) and equal error rate (EER).
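The per-pixel fusion step can be sketched as follows. The specific weighting rule is an assumption for illustration: each pixel's weight favors the modality with the smaller reconstruction error, which matches the abstract's description in spirit but is not guaranteed to be the authors' exact formula.

```python
import numpy as np

def weighted_pixel_fusion(fv, fdt, err_fv, err_fdt, eps=1e-8):
    """Fuse FV and FDT images per pixel: the modality with the lower
    reconstruction error at a pixel receives the larger weight
    (assumed rule). All inputs are same-shape 2-D arrays."""
    w = (err_fdt + eps) / (err_fv + err_fdt + 2.0 * eps)
    return w * fv + (1.0 - w) * fdt
```

With this rule, a pixel whose FV reconstruction error is zero is taken almost entirely from the FV image, and vice versa; `eps` guards against division by zero where both errors vanish.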
Screen content images (SCIs) contain both textual and pictorial regions and have become increasingly common in daily life with the widespread adoption of multimedia applications. Image quality assessment (IQA) of SCIs is important because of its ability to guide and optimize many image processing systems. However, no-reference (NR) IQA algorithms have received little attention and achieve unsatisfactory performance. Hence, this paper proposes a novel no-reference IQA method for SCIs based on patch-wise multi-order derivatives. The method includes two stages: patch-wise quality evaluation and quality pooling. The first stage learns the visual quality of local regions. Two kinds of patch features are extracted: multi-order derivative statistics and multi-order derivative histograms, which describe the global and local information of the multi-order derivatives, respectively. Support vector regression (SVR) is then applied to measure the visual quality of image patches given the extracted features. The second stage pools the patch-wise quality into an overall quality score, with weights derived from the entropy of the gradient information of the SCIs. Experimental results show that our method outperforms state-of-the-art NR-IQA approaches on the SIQAD database of SCIs and achieves competitive performance against state-of-the-art FR-IQA methods for SCIs.
Correlation filter based trackers have proved to be very efficient and robust in object tracking, with performance competitive with state-of-the-art trackers. In this paper, we propose a novel object tracking method named Adaptive Kernelized Correlation Filter (AKCF), which incorporates the Kernelized Correlation Filter (KCF) with Structured Output Support Vector Machines (SOSVM) in a collaborative and adaptive way, and can effectively handle severe object appearance changes at low computational cost. AKCF works by dynamically adjusting the learning rate of KCF and verifying the intermediate tracking result with an online SOSVM classifier. Meanwhile, we bring Color Names into this formulation to boost performance, owing to the rich feature information they encode. Experimental results on several challenging benchmark datasets reveal that our approach outperforms numerous state-of-the-art trackers.
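The core of a correlation filter tracker is a filter trained and applied in the Fourier domain, plus a linear model update whose learning rate is exactly the quantity AKCF adapts. The sketch below uses the simpler linear (MOSSE-style) formulation rather than the kernelized one, as a baseline for context; all parameter values are assumptions.

```python
import numpy as np

def gaussian_response(shape, center, sigma=2.0):
    """Desired correlation output: a Gaussian peak at `center`."""
    yy, xx = np.mgrid[:shape[0], :shape[1]]
    return np.exp(-((yy - center[0])**2 + (xx - center[1])**2) / (2.0 * sigma**2))

def train(patch, g, lam=1e-4):
    """Closed-form linear correlation filter: H* = (G . conj(F)) / (F . conj(F) + lam)."""
    F = np.fft.fft2(patch)
    G = np.fft.fft2(g)
    return (G * np.conj(F)) / (F * np.conj(F) + lam)

def detect(H_conj, patch):
    """Correlation response of a new patch; the peak gives the target shift."""
    return np.real(np.fft.ifft2(H_conj * np.fft.fft2(patch)))

def adaptive_update(H_old, H_new, eta):
    """Linear model interpolation; eta is the learning rate that an
    AKCF-style verifier would raise or lower per frame (sketch)."""
    return (1.0 - eta) * H_old + eta * H_new
```

In AKCF the SOSVM classifier acts as the verifier: when it agrees with the filter's peak, a larger `eta` lets the model adapt quickly; when it disagrees, a small `eta` protects the model from corruption.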
An active depth sensing approach using a laser speckle projection system is proposed. After capturing the speckle pattern with an infrared digital camera, we extract the pure speckle pattern using a direct-global separation method. The pure speckles are then represented by Census binary features. By evaluating the matching cost and uniqueness between the real-time image and the reference image, robust correspondences are selected as support points. We then build a disparity grid and propose a generative graphical model to compute disparities; an iterative approach propagates messages between blocks and updates the model. Finally, a dense depth map is obtained by subpixel interpolation and transformation. Experimental evaluations demonstrate the effectiveness and efficiency of our approach.
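The Census binary feature used to describe the pure speckles compares each pixel with its neighbors and packs the results into a bit string; the matching cost between two pixels is then the Hamming distance between their codes. A minimal NumPy sketch follows; the window size is an assumed parameter, and the wrap-around border handling via `np.roll` is a simplification.

```python
import numpy as np

def census_transform(img, window=5):
    """Census binary feature: each pixel is encoded as a bit string of
    neighbor-vs-center comparisons (borders wrap, for simplicity)."""
    r = window // 2
    out = np.zeros(img.shape, dtype=np.uint64)
    for dy in range(-r, r + 1):
        for dx in range(-r, r + 1):
            if dy == 0 and dx == 0:
                continue
            shifted = np.roll(np.roll(img, dy, axis=0), dx, axis=1)
            out = (out << np.uint64(1)) | (shifted < img).astype(np.uint64)
    return out

def hamming_cost(c1, c2):
    """Matching cost between census codes: per-pixel Hamming distance."""
    v = c1 ^ c2
    count = np.zeros_like(v)
    while np.any(v):
        count += v & np.uint64(1)
        v >>= np.uint64(1)
    return count
```

Because the code depends only on the *ordering* of intensities, Census matching is robust to the illumination and gain differences between the real-time and reference speckle images.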
Real-time, accurate motion detection is a key step in many visual applications, such as object detection and smart video surveillance. Although considerable research effort has been devoted to it, it remains challenging due to illumination variation and other factors. To enhance robustness to illumination changes, many block-based motion detection algorithms have been proposed. However, these methods usually neglect the influence of different block sizes, and they cannot choose the background-modeling scale automatically as the environment changes. These weaknesses limit the algorithms' flexibility and their application scenarios. In this paper, we propose a multi-scale motion detection algorithm that benefits from different block sizes. Moreover, an adaptive linear fusion strategy is designed by analyzing the accuracy and robustness of the background models at different scales; during detection, the weights of the different scales are adjusted as the scene changes. In addition, to reduce the computational cost at each scale, we design an integral image structure for the HOG features of different scales, so all features need to be computed only once. Experiments on various outdoor scenes demonstrate the performance of the proposed model.
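The integral image trick that lets every block size share one feature computation can be sketched as follows. In the scheme described above one such table would be kept per HOG orientation bin; a single channel is shown here for brevity.

```python
import numpy as np

def integral_image(x):
    """Summed-area table with a zero row/column prepended, so any
    rectangle sum reduces to four lookups."""
    ii = np.zeros((x.shape[0] + 1, x.shape[1] + 1), dtype=np.float64)
    ii[1:, 1:] = x.cumsum(axis=0).cumsum(axis=1)
    return ii

def block_sum(ii, top, left, height, width):
    """Sum of x[top:top+height, left:left+width] in O(1), for any block size."""
    return (ii[top + height, left + width] - ii[top, left + width]
            - ii[top + height, left] + ii[top, left])
```

Once `integral_image` is computed, blocks of every scale are summed in constant time, which is what makes the multi-scale background models affordable in real time.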
This paper presents a machine vision system for automated label inspection, with the goal of reducing labor cost and ensuring consistent product quality. First, the images captured by each single camera are distorted because the inspected object is approximately cylindrical; therefore, this paper proposes an algorithm based on adverse cylinder projection, in which label images are rectified by distortion compensation. Second, to overcome the limited field of view of each single camera, our method combines the images of all cameras and builds a panorama for label inspection. Third, considering the shake of production lines and electronic signal error, we design real-time image registration to calculate the offsets between the template and inspected images. Experimental results demonstrate that our system is accurate, runs in real time, and can be applied to the inspection of many approximately cylindrical objects.
Proc. SPIE 9045, 2013 International Conference on Optical Instruments and Technology: Optoelectronic Imaging and Processing Technology
KEYWORDS: Image fusion, Optical filters, Digital filtering, Machine learning, Active remote sensing, Optimization (mathematics), 3D vision, Magnetorheological finishing, 3D image processing, RGB color model
In this paper, we consider the task of hole filling in depth maps with the help of an associated color image. We take a supervised learning approach: the model is learnt from a training set of pixels that have depth values and is then applied to predict the depth values inside the holes. Our model uses a regional Markov random field (MRF) that incorporates multiscale absolute and relative features computed from the color image, and models depth not only at individual points but also between adjacent points. The experiments show that the proposed approach recovers fairly accurate depth values and achieves a high-quality depth map.
Cell nuclei segmentation is a key issue in automatic cell image analysis for nuclear malignancy. However, due to the complexity of microscopic images, it is usually not easy to obtain satisfactory segmentation results, especially for the separation of touching or overlapping nuclei. We propose a method to separate overlapping nuclei whose shapes are similar to ellipses, even if they are tightly clustered and no edge is present where they touch. As a class-specific approach, it introduces a statistical shape model as an extra constraint within the energy functional that measures the homogeneity of regional intensity. The desired contour of each nucleus is obtained by minimizing this energy functional. The proposed algorithm has been tested on human cervical nuclei images. Experimental results show that our method can separate touching or overlapping ellipse-like nuclei from each other accurately, and tests on noisy and textured nuclei images also demonstrate its robustness. The resulting segmentation contours are ellipses of different sizes and orientations, so the shapes of the nuclei are preserved to a certain degree. The algorithm extends naturally to color images and also has the potential to handle the separation of overlapping nuclei of other shapes.
Magnetic resonance imaging (MRI) is widely used in radiological diagnosis, especially for pathology detection in the human brain. Most current methods for automatically segmenting brain tumors rely exclusively on T1-weighted sequences, despite the fact that the imaging modality is multi-spectral. This work focuses on the integration, or fusion, of the information provided by each sequence, i.e., T1, T2, and PD. Based on the aggregators proposed in fuzzy theory, a system integrating all this information is established. The paper discusses several well-known operators, their properties, and their application to tumor segmentation; in particular, the Davies-Bouldin index is used to determine the parameters of the parametric operators. The results show the importance of data fusion in the segmentation process and reveal that T-norms are less robust to noise than mean operators. Meanwhile, the allocated weights illustrate the order of importance of each spectrum in pathology detection and agree with their characteristics.
Automated medical image processing and analysis offer a powerful tool for medical diagnosis. In this work, a decision-tree based white blood cell (WBC) classification scheme for peripheral blood images is developed. Based on a thorough analysis of the characteristics of white blood cells, 10 efficient features are extracted, covering size, shape, intensity, and color, and a decision-tree based classification scheme is designed to classify 6 types of normal white blood cells. In particular, an efficient approach to separating two types of neutrophil is presented. The scheme is tested on 59 WBCs from 3 sets of blood images obtained under different staining and imaging conditions. Results show classification accuracy above 96%.
Computer analysis of magnetic resonance images is useful to aid the diagnosis of disease. We present in this paper an automatic segmentation method for the principal brain tissues. It is based on the possibilistic clustering approach, an improvement of fuzzy c-means clustering. To improve the efficiency of the clustering process, the initialization problem is discussed and solved in combination with a histogram analysis method: our method can automatically determine the number of classes to cluster and the initial values for each class. It has been tested on a set of forty MR brain images, with and without the presence of tumor. The experimental results show that it is simple, rapid, and robust in segmenting the principal brain tissues.
This paper proposes a semiautomatic algorithm for the accurate extraction of an athlete from color diving sequences. Change detection and edge detection techniques are combined to extract the moving object. Color information and interactive input are used to obtain a rough region of the athlete of interest. A robust edge map is derived from the difference between successive frames, and the rough athlete region is then refined using this robust edge information. The proposed method is useful in applications with a relatively still background. Experimental results show that the method provides accurate extraction with pixel-wise precision, thus providing a reliable input to further analysis or applications such as MPEG-4.
This paper presents a fuzzy information fusion method to automatically extract tumor areas of the human brain from multispectral magnetic resonance (MR) images. The multispectral images consist of T1-weighted (T1), proton density (PD), and T2-weighted (T2) feature images, in which the signal intensities of a tumor differ; some tissue is more visible in one image type than in the others, so fusion of the information is necessary. Our method models the fuzzy information about the tumor by membership functions. This modeling is based on the a priori knowledge of radiology experts and on the MR signals of the brain tissues. Three membership functions, one for each image type, are proposed according to their characteristics. The tumor extraction is then carried out by fusing all three sources of fuzzy information. The experimental results (based on 5 patients) show a mean false-negative rate of 2% and a mean false-positive rate of 1.3%, compared with the results obtained by a radiologist using manual tracing.
This paper provides a novel algorithm for face rendering applications. Ensuring low-complexity algorithms for rendering virtual humans over very-low-bit-rate (VLBR) networks is at the heart of our new facial rendering system, which differs from others such as parametric animation models and interpolation solutions. The novelties include a dual segment growing algorithm and a heat-diffusion rendering method. The extraction process takes into account information in both the gradient domain and the topographic features, and segments are used to carry this information, which greatly reduces the transmitted packet size. Face rendering is based on these segments and is carried out like a heat diffusion process. Experimental results, reported in the following, demonstrate the effectiveness of the proposed system. Furthermore, the scheme can be extended to more general video or image analysis and synthesis systems.
When using videophone or distance learning, people want to see human faces as realistically as possible, even at very low bit rates, so how to synthesize a human face for delivery over networks such as the Internet and PSTN draws much attention. Conventional techniques based on low-level features cannot perform the desired operation, while model-based methods need much prior knowledge. The authors present a new algorithm for human face synthesis that can give a virtual face based on the human vision system at bit rates ranging from several kb/s to tens of kb/s. An Adaptive Face Image Filter (AFIF) is used to attenuate noise while preserving face edges and details. A facial region detection method detects the pixels that belong to a face. After that, with a novel facial texture interpolation method, the face is rendered in gray scale; its key feature is a group of diffusion functions for interpolation. Then color is rendered over the whole face in a scalable manner.
Rate control is an important component in a video encoder for data storage or real-time visual communication. In this paper, we discuss rate control in an MPEG encoder for real-time video communication over a Variable Bit Rate (VBR) channel. In interactive video communication, the transmission is subject to both channel rate constraints and end-to-end delay constraints. Our goal is to modify the rate control in an MPEG-2 encoder to satisfy the rate constraints, and to study how to improve video quality in the VBR transmission scenario. We employ a Leaky Bucket to describe the traffic parameters and to monitor the encoder's output. Based on the rate-distortion models we developed, we present a rate control algorithm that achieves almost uniform distortion both within a frame and between frames in a scene. With adaptive rate-distortion models and an additional scene detection function, our method can robustly handle scenes with different statistical characteristics. Compared with MPEG-2 TM5, in real-time video communication we keep the buffer delay constant while maintaining stable decoded image quality; furthermore, the bit allocation in our algorithm is more reasonable and controllable. Our method therefore realizes the advantages offered by VBR video communication, such as small end-to-end delay, consistent image quality, and high channel efficiency.
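The leaky-bucket constraint used to monitor the encoder's output can be sketched as follows. This is a generic illustration of the mechanism, not the paper's encoder logic; the capacity, channel rate, and frame rate are assumed example values.

```python
class LeakyBucket:
    """Leaky-bucket traffic monitor: encoded bits fill the bucket and
    drain at the channel rate. Overflow means a frame was too large for
    the channel and delay budget; the remaining headroom gives the
    per-frame bit budget the rate controller may spend."""

    def __init__(self, capacity_bits, drain_rate_bps, frame_rate):
        self.capacity = capacity_bits
        self.drain_per_frame = drain_rate_bps / frame_rate
        self.fullness = 0.0

    def add_frame(self, frame_bits):
        """Account for one encoded frame; returns the new bucket fullness."""
        self.fullness = max(0.0, self.fullness + frame_bits - self.drain_per_frame)
        if self.fullness > self.capacity:
            raise OverflowError("bucket overflow: frame too large for channel")
        return self.fullness

    def bit_budget(self):
        """Largest next frame that keeps the bucket within capacity."""
        return self.capacity - self.fullness + self.drain_per_frame
```

The bucket capacity corresponds directly to the maximum buffer delay: keeping fullness bounded is what yields the constant buffer delay claimed above.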