In this work, we developed a multimodal imaging system for real-time applications by integrating 2D image sensors covering different spectral ranges, as well as a polarization camera, into a high-speed optical 3D sensor. To generate the multimodal image data, the 2D images of the different modalities are aligned with the 3D data at pixel level by applying a projection matrix to each point in the 3D point cloud. To calculate the projection matrix for each 2D image sensor, we propose a calibration procedure for the extrinsic calibration of arbitrarily positioned image sensors. The final imaging system delivers multimodal video data at one-megapixel resolution and a frame rate of 30 Hz. As application examples, we demonstrate the estimation of vital signs and the detection of human body parts with this imaging system.
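The pixel-level alignment described above can be illustrated with a minimal sketch: each 3D point is expressed in homogeneous coordinates, multiplied by a 3×4 projection matrix, and divided by the resulting depth to obtain pixel coordinates. The function name and the example matrix below are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

def project_points(points_3d, P):
    """Project Nx3 3D points to pixel coordinates via a 3x4 projection matrix P."""
    n = points_3d.shape[0]
    homo = np.hstack([points_3d, np.ones((n, 1))])  # homogeneous coordinates (N, 4)
    proj = homo @ P.T                                # image-plane coordinates (N, 3)
    return proj[:, :2] / proj[:, 2:3]                # perspective divide -> (N, 2) pixels

# Illustrative projection matrix: identity intrinsics, no rotation or translation
P = np.hstack([np.eye(3), np.zeros((3, 1))])
pts = np.array([[0.0, 0.0, 2.0], [1.0, 1.0, 2.0]])
uv = project_points(pts, P)  # pixel coordinates for each 3D point
```

In the full system, one such matrix per 2D sensor (obtained from the extrinsic calibration) maps every point of the 3D point cloud into that sensor's image, which yields the per-pixel correspondence between modalities.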
Deep Learning (DL) currently demonstrates powerful capabilities for image processing, but it cannot output exact photometric process parameters and its results are not interpretable. Considering these limitations, this paper presents a robot vision system based on Convolutional Neural Networks (CNNs) and Monte Carlo algorithms, serving as an example of how to apply DL in industry. In our approach, a CNN is used for preprocessing and offline tasks; the 6-DoF object pose is then estimated using a particle filter. Experiments show that our approach is efficient and accurate. In the future, it could offer potential solutions for human-machine collaboration systems.