RGB-D object detection is challenging because it requires effective processing of both visible-modality and depth-modality features. However, existing RGB-D object detection models have several deficiencies, including a reliance on hand-crafted settings and an insufficient ability to fuse cross-modal features. In this paper, we propose a novel cross-modal RGB-D object detection model based on Deformable DETR, named CM-DETR. The proposed model effectively fuses multi-modal information and does not need hand-crafted settings derived from prior information. Extensive experiments show that our model achieves a substantial improvement, exceeding the baseline by more than 4.6% mAP on SUN-RGBD and 6.9% mAP on NYUDv2.
Due to the increasing demand for deploying CNNs on resource-constrained platforms, efficient neural networks are becoming more and more popular, in which depthwise convolution plays an indispensable role. Recently, larger kernel sizes (≥5) have been applied to depthwise convolution, but with significantly increased computational cost and parameter size. In this paper, we propose a novel extremely separated convolutional block (XSepConv), which fuses spatially separable convolutions into depthwise convolution to substantially reduce both the computational cost and the parameter size induced by large kernels. We also propose an extra 2×2 depthwise convolution coupled with a new and improved symmetric padding strategy to compensate for the side effect brought by spatially separable convolutions. XSepConv is a more efficient alternative to vanilla depthwise convolution with large kernel sizes; moreover, extensive experiments on popular image classification and object detection benchmarks demonstrate that XSepConv can strike a better trade-off between accuracy and efficiency, and the improvement is more significant for larger kernels.
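The parameter saving described in this abstract can be illustrated with a small counting sketch. It assumes that the block replaces one k×k depthwise filter per channel with a 1×k and a k×1 depthwise filter plus the extra 2×2 depthwise filter, and it ignores biases and normalization layers. The function names and the channel count are hypothetical and are not taken from the paper.

```python
def depthwise_params(channels: int, k: int) -> int:
    """Weights in a k x k depthwise convolution (one k x k filter per channel)."""
    return channels * k * k


def separated_params(channels: int, k: int) -> int:
    """Weights in a 1 x k plus k x 1 depthwise pair and an extra 2 x 2 depthwise layer.

    Assumption: this layout is inferred from the abstract, not from the paper's code.
    """
    return channels * (k + k + 2 * 2)


if __name__ == "__main__":
    channels = 32  # hypothetical channel count, for illustration only
    for k in (5, 7, 9):
        full = depthwise_params(channels, k)
        sep = separated_params(channels, k)
        print(f"k={k}: depthwise={full}, separated={sep}, ratio={sep / full:.2f}")
```

Under these assumptions the per-channel weight count falls from 25 to 14 for k = 5 and from 49 to 18 for k = 7, so the relative saving grows with the kernel size.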
Face reconstruction is a long-standing and extremely challenging problem. To solve it more accurately and efficiently, we propose a model-based stereo method that reconstructs a finely detailed 3D face from calibrated stereo face images captured by a stereo camera. The method avoids both the limited expressiveness of model-based reconstruction and the mismatches that appear in stereo reconstruction. Sparse landmarks detected in the stereo images are used to estimate coarse shape and pose parameters with a 3D morphable model (3DMM). The detailed face is then obtained by per-vertex deformation of the coarse shape according to illumination value, gradient, and surface smoothness terms; each vertex is deformed along its normal, which is the direction of fastest change of the local mesh area. Experimental results show that the proposed method achieves high performance compared with the E-eos method.
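The per-vertex deformation step can be sketched as moving each vertex of the coarse mesh along its unit normal by a scalar offset. The sketch below assumes the offsets are already known; in the method described above they would come from optimizing the illumination, gradient, and smoothness terms, which are not reproduced here. The function name and array layout are hypothetical.

```python
import numpy as np


def displace_along_normals(vertices: np.ndarray,
                           normals: np.ndarray,
                           offsets: np.ndarray) -> np.ndarray:
    """Move each vertex along its (normalized) normal by a signed scalar offset.

    vertices: (N, 3) coarse mesh vertices
    normals:  (N, 3) per-vertex normals, not necessarily unit length
    offsets:  (N,)   signed displacement per vertex (assumed given)
    """
    unit = normals / np.linalg.norm(normals, axis=1, keepdims=True)
    return vertices + offsets[:, None] * unit


if __name__ == "__main__":
    verts = np.zeros((2, 3))
    norms = np.array([[0.0, 0.0, 2.0], [1.0, 0.0, 0.0]])
    offs = np.array([0.5, -1.0])
    print(displace_along_normals(verts, norms, offs))
```

Restricting the deformation to one scalar per vertex keeps the number of unknowns equal to the number of vertices rather than three times that number.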
Although it has long been believed that contextual information and relations between pedestrians help pedestrian recognition, this idea is rarely used in the deep learning era. This is because the convolutions of deep neural networks do not easily fuse related features, and doing so increases the amount of computation. In this paper, we propose a single-shot, proposal-relation-based approach for pedestrian detection. We generate proposals on image features at different scales and use the relations between these proposals to extend the features of each proposal. Finally, the position of each pedestrian is obtained through the convolutional neural network. The approach has a small computational cost and is easy to embed into existing networks. Our detector is trained in an end-to-end fashion, and experimental results on the Caltech Pedestrian dataset show that our approach achieves state-of-the-art performance.
KEYWORDS: Electroencephalography, Feature extraction, Signal detection, Electromyography, Spindles, Fuzzy logic, Polysomnography, Signal processing, Linear filtering, Data modeling
Analyzing physiological signals during sleep can assist experts in diagnosing sleep arousal. To relieve medical technologists of this time-consuming manual work, this work proposes a multi-task algorithm for automatically identifying sleep arousal events. The algorithm contains two parts: feature extraction and classification. Feature extraction uses two regular arousal features and one proposed feature, fuzzy entropy, which highlights the possibilities of events. With this feature and the others, our method reaches a sensitivity of 0.903 and a specificity of 0.834.
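For reference, the two reported metrics can be computed from binary labels as follows. This is a minimal sketch using the standard definitions (sensitivity = TP / (TP + FN), specificity = TN / (TN + FP)); it assumes that label 1 marks an arousal event and that both classes occur in the ground truth. The function name and the sample labels are hypothetical.

```python
from typing import Sequence, Tuple


def sensitivity_specificity(y_true: Sequence[int],
                            y_pred: Sequence[int]) -> Tuple[float, float]:
    """Return (sensitivity, specificity) for binary labels, where 1 = event."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # events correctly detected
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # events missed
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # non-events correctly rejected
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false alarms
    return tp / (tp + fn), tn / (tn + fp)


if __name__ == "__main__":
    truth = [1, 1, 0, 0, 0]       # hypothetical ground-truth labels
    predicted = [1, 0, 0, 0, 1]   # hypothetical classifier output
    print(sensitivity_specificity(truth, predicted))
```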
In digital radiography, the interaction between X-rays and the object causes scattered radiation that reduces image contrast. Scatter kernel superposition (SKS), a computerised scatter correction method, can remove scatter from digital X-ray images. The parameters of the scatter kernels in SKS are commonly obtained using Monte Carlo N-Particle Transport Code (MCNP) simulation. However, the simulated scatter kernel is biased relative to the physical scatter characteristics, owing to errors in the physical parameters of the device and in the MCNP simulation. Because the hyper-parameters of the scatter kernel are difficult to optimize, we introduce Bayesian optimization to further optimize them. According to the results of phantom and clinical experiments, our method improves the contrast and the peak signal-to-noise ratio of images compared to traditional SKS.
We propose a novel end-to-end supervised convolutional neural network (CNN) to compute disparity from a pair of stereo images. To address the problem of computing high-quality disparity in ill-posed areas, our cascade spatial pyramid pooling (CSPP) substructure gathers global context information by aggregating context at different positions and different feature block scales, from coarse to fine. We also introduce a warp layer: the right feature map is warped with the previously predicted disparity and then compared with the left feature map to form a cost volume. We learn the disparity from the cost volume using feature information from different levels. We evaluate our method on three stereo datasets, and the results show that it has advantages in textured areas, target edge areas, and efficiency. We also achieve a high ranking performance.
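The warp step can be sketched in NumPy as follows. The sketch assumes rectified images, so that a left-view pixel at column x with disparity d corresponds to column x - d in the right view; it uses linear interpolation along the row and zeroes out samples that fall outside the image. The function name, the array layout (C, H, W), and the absolute-difference comparison in the usage example are assumptions, not the paper's implementation.

```python
import numpy as np


def warp_right_to_left(right_feat: np.ndarray, disparity: np.ndarray) -> np.ndarray:
    """Warp a right-view feature map into the left view.

    right_feat: (C, H, W) feature map of the right image
    disparity:  (H, W) disparity of the left view, in pixels
    """
    _, h, w = right_feat.shape
    # Column in the right image that each left-view pixel should sample from.
    src_x = np.arange(w, dtype=np.float64)[None, :] - disparity
    x0 = np.floor(src_x).astype(int)
    frac = src_x - x0
    valid = (src_x >= 0) & (src_x <= w - 1)
    # Clip indices so the gather is safe; invalid samples are masked afterwards.
    x0c = np.clip(x0, 0, w - 1)
    x1c = np.clip(x0 + 1, 0, w - 1)
    rows = np.arange(h)[:, None]
    sample0 = right_feat[:, rows, x0c]  # (C, H, W)
    sample1 = right_feat[:, rows, x1c]  # (C, H, W)
    warped = (1.0 - frac) * sample0 + frac * sample1
    return warped * valid


if __name__ == "__main__":
    right = np.arange(8, dtype=np.float64).reshape(1, 2, 4)
    left = right.copy()
    disp = np.ones((2, 4))
    warped = warp_right_to_left(right, disp)
    # One possible comparison (an assumption): per-pixel absolute difference.
    cost = np.abs(left - warped)
    print(warped[0])
    print(cost[0])
```

If the predicted disparity is accurate, the warped right features line up with the left features, so the comparison only has to capture the remaining residual.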
In this paper we propose a new approach to the challenging problem of robust fundamental matrix estimation from corrupted correspondences. Compared with traditional robust methods, the proposed approach achieves higher estimation accuracy and stability. These gains are attributed mainly to two novelties. First, a new, more easily solvable analytic objective function is proposed that accounts for both the presence of correspondence outliers and computational convenience. Second, an adjusted gradient projection method is developed to provide a more stable solver for robust estimation. Experimental results show that the proposed approach performs better than the traditional robust methods RANSAC, MSAC, LMEDS, and MLESAC, in particular when the correspondences are seriously corrupted.
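The abstract does not state the objective function, so the sketch below shows only the quantity that fundamental matrix estimation is built on: the algebraic epipolar residual |x2ᵀ F x1| of each correspondence, which is zero for an exact match and large for an outlier. The function name and the example matrix (pure translation along the x-axis) are illustrative assumptions.

```python
import numpy as np


def epipolar_residuals(F: np.ndarray, pts1: np.ndarray, pts2: np.ndarray) -> np.ndarray:
    """Absolute algebraic epipolar residual |x2^T F x1| per correspondence.

    F:    (3, 3) fundamental matrix
    pts1: (N, 2) points in the first image
    pts2: (N, 2) corresponding points in the second image
    """
    x1 = np.hstack([pts1, np.ones((len(pts1), 1))])  # homogeneous coordinates
    x2 = np.hstack([pts2, np.ones((len(pts2), 1))])
    return np.abs(np.einsum("ni,ij,nj->n", x2, F, x1))


if __name__ == "__main__":
    # Skew-symmetric matrix of t = (1, 0, 0): matching points share the same row.
    F = np.array([[0.0, 0.0, 0.0],
                  [0.0, 0.0, -1.0],
                  [0.0, 1.0, 0.0]])
    pts1 = np.array([[10.0, 5.0], [20.0, 7.0]])
    pts2 = np.array([[14.0, 5.0], [25.0, 9.0]])  # second pair is off by 2 rows
    print(epipolar_residuals(F, pts1, pts2))
```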
As cGANs have achieved great success on pix-to-pix problems [12], we propose a new architecture based on cGAN to solve the optical flow estimation problem. Specifically, we propose a loss function that consists of an adversarial loss and a content loss. The adversarial loss is the pixel-to-pixel loss: we use a discriminator network that is trained to differentiate the ground-truth flow from the generated flow in pixel space. The content loss focuses on the perceptual similarity between the ground-truth flow and the generated flow. Our architecture (FlowGan) contains a generator based on FlowNetS, with Dense Blocks to make it deeper, and a Markovian discriminator that classifies image patches instead of the whole image. We train our network on the FlyingChairs dataset and evaluate it on MPI-Sintel. FlowGan achieves competitive results at practical speed.
This paper focuses on the problem of estimating the fundamental matrix under unknown radial distortion. The general approach to this problem is the Gröbner basis method, which solves nontrivial polynomial equations formed by a pair of correspondences under the one-parameter division model for radial distortion; this formulation is nonconvex and not noise-resistant. Using results from polynomial optimization and rank minimization, this paper shows that the problem can be solved as a sequence of convex semi-definite programs. In the experiments, we show that the proposed method works well and is more noise-resistant.
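The one-parameter division model referred to above is commonly written as p_u = c + (p_d - c) / (1 + λ‖p_d - c‖²), where p_d is the distorted point, c is the distortion center, and λ is the single distortion parameter. The sketch below applies that mapping; the function name, the default center, and the sample values are illustrative assumptions, and the fundamental matrix estimation itself is not shown.

```python
import numpy as np


def undistort_division(points_d: np.ndarray, lam: float,
                       center: tuple = (0.0, 0.0)) -> np.ndarray:
    """Undistort 2D points with the one-parameter division model.

    points_d: (N, 2) distorted image points
    lam:      radial distortion parameter (lam = 0 means no distortion)
    center:   distortion center, assumed known
    """
    c = np.asarray(center, dtype=np.float64)
    p = np.asarray(points_d, dtype=np.float64) - c
    r2 = np.sum(p ** 2, axis=1, keepdims=True)  # squared radius per point
    return p / (1.0 + lam * r2) + c


if __name__ == "__main__":
    pts = np.array([[1.0, 0.0], [0.0, 0.5]])
    print(undistort_division(pts, lam=-0.5))
```

Because λ enters the undistorted coordinates through a division, the epipolar constraint becomes polynomial in the unknowns, which is why polynomial solvers are used for this problem.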