We address the automatic differentiation of human tissue using multispectral imaging, which has promising potential for automatic visualization during surgery. Currently, tissue types have to be differentiated continuously based solely on the surgeon’s knowledge. Furthermore, automatic methods based on optical in vivo properties of human tissue do not yet exist, as these properties have not been sufficiently examined. To overcome this, we developed a hyperspectral camera setup to monitor the different optical behavior of tissue types in vivo. The aim of this work is to collect and analyze these behaviors to open up optical opportunities during surgery. Our setup uses a digital camera and several bandpass filters in front of the light source to illuminate different tissue types with 16 specific wavelength ranges. We analyzed the different intensities of eight healthy tissue types over the visible spectrum (400 to 700 nm). Using our setup and sophisticated postprocessing to handle motion during capture, we are able to find tissue characteristics not visible to the human eye and to differentiate tissue types in the 16-dimensional wavelength domain. Our analysis shows that this approach has the potential to support the surgeon’s decisions during treatment.
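As a rough illustration of how such 16-band measurements could be used for differentiation, the following minimal sketch classifies a pixel’s 16-dimensional intensity vector against per-tissue reference spectra. The data, tissue labels, and the nearest-centroid rule are illustrative assumptions, not the analysis pipeline described above.

```python
import numpy as np

# Hypothetical reference spectra: one mean 16-band intensity vector per tissue type,
# e.g. averaged over annotated regions of the in-vivo recordings (placeholder values).
reference_spectra = {
    "liver":  np.random.rand(16),
    "muscle": np.random.rand(16),
    "fat":    np.random.rand(16),
}

def classify_pixel(spectrum, references):
    """Assign a 16-band spectrum to the closest reference tissue.

    Spectra are L2-normalized first so that the decision depends on the
    spectral shape rather than on absolute illumination intensity.
    """
    s = spectrum / np.linalg.norm(spectrum)
    best_label, best_dist = None, np.inf
    for label, ref in references.items():
        r = ref / np.linalg.norm(ref)
        d = np.linalg.norm(s - r)
        if d < best_dist:
            best_label, best_dist = label, d
    return best_label

# Example: classify a single measured 16-band pixel.
pixel = np.random.rand(16)
print(classify_pixel(pixel, reference_spectra))
```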
We present a complete system for the automatic creation of talking-head video sequences from text messages. Our system converts the text into MPEG-4 Facial Animation Parameters and a synthetic voice. A user-selected 3D character performs lip movements synchronized to the speech data. The 3D models, created from a single image, range from realistic people to cartoon characters. Voice selection for different languages and genders, as well as a pitch-shift component, enables personalization of the animation. The animation can be shown on different displays and devices, ranging from 3GPP players on mobile phones to real-time 3D render engines. Therefore, our system can be used in mobile communication to convert regular SMS messages into MMS animations.
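A minimal sketch of the lip-synchronization idea is shown below; the phoneme symbols, the viseme table, and the frame rate are illustrative assumptions, not the system’s actual tables, which follow the MPEG-4 Facial Animation Parameter specification.

```python
# Hypothetical, simplified mapping from phonemes (with durations from a TTS engine)
# to MPEG-4-style viseme indices, sampled at a fixed animation frame rate.
PHONEME_TO_VISEME = {  # illustrative subset only
    "p": 1, "b": 1, "m": 1,     # closed lips
    "f": 2, "v": 2,             # lower lip against upper teeth
    "a": 10, "o": 11, "u": 12,  # open/rounded vowels
    "sil": 0,                   # neutral face
}

def phonemes_to_viseme_track(phonemes, fps=25):
    """Expand (phoneme, duration_in_seconds) pairs into one viseme index per frame."""
    track = []
    for phoneme, duration in phonemes:
        frames = max(1, round(duration * fps))
        track.extend([PHONEME_TO_VISEME.get(phoneme, 0)] * frames)
    return track

# Example: a short utterance produced by a TTS front end.
print(phonemes_to_viseme_track([("m", 0.08), ("a", 0.20), ("sil", 0.12)]))
```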
In this paper, a next-generation 3-D video conferencing system is presented that provides immersive tele-presence and a natural representation of all participants in a shared virtual meeting space. The system is based on the principle of a shared virtual table environment, which guarantees correct eye contact and gesture reproduction and enhances the quality of human-centered communication. The virtual environment is modeled in MPEG-4, which also allows the seamless integration of explicit 3-D head models for low-bandwidth connections to mobile users. In this case, facial expression and motion information is transmitted instead of video streams, resulting in bit-rates of a few kbit/s per participant. Besides the low bit-rates, the model-based approach enables new possibilities for image enhancement such as digital make-up, digital dressing, or modification of the scene lighting.
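To make the bit-rate claim concrete, the back-of-the-envelope calculation below uses assumed parameter counts and coding costs, not measured figures from the system; it only shows why transmitting facial animation parameters instead of video stays in the range of a few kbit/s.

```python
# Rough bit-rate estimate for parameter-based transmission (illustrative numbers).
faps_per_frame = 68   # MPEG-4 Facial Animation Parameters per frame
bits_per_fap = 2      # assumed average bits per parameter after prediction/entropy coding
frame_rate = 25       # animation frames per second

bitrate_kbps = faps_per_frame * bits_per_fap * frame_rate / 1000
print(f"approx. {bitrate_kbps:.1f} kbit/s per participant")  # ~3.4 kbit/s
```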
Illumination variability has a considerable influence on the performance of computer vision algorithms and video coding methods. The efficiency and robustness of these algorithms can be significantly improved by removing the undesired effects of changing illumination. In this paper, we introduce a 3-D model-based technique for estimating and manipulating the lighting in an image sequence. The current scene lighting is estimated for each frame by exploiting 3-D model information and synthetically re-lighting the original video frames. To provide the estimator with surface normal information, the objects in the scene are represented by 3-D shape models, and their motion and deformation are tracked over time using a model-based estimation method. Given the normal information, the current lighting is estimated with a linear algorithm of low computational complexity using an orthogonal set of light maps. This results in a small set of parameters that efficiently represents the scene lighting. In our experiments, we demonstrate how this representation can be used to create video sequences with arbitrary new illumination. When encoding a video sequence for transmission over a network, significant coding gains can be achieved by removing the time-varying lighting effects prior to encoding. Improvements of up to 3 dB in PSNR are observed for an illumination-compensated sequence.
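A minimal numerical sketch of the linear estimation step follows. The light-map basis here is random placeholder data and the solver is ordinary least squares; in the approach above, the basis is rendered from the tracked 3-D model, and with a truly orthogonal basis the fit reduces to simple projections.

```python
import numpy as np

# Hypothetical data: N pixels of the current frame and K basis light maps
# rendered for the same geometry and pose (placeholder random values here).
rng = np.random.default_rng(0)
num_pixels, num_basis = 10000, 9                    # e.g. 9 basis light maps
light_maps = rng.random((num_pixels, num_basis))    # columns = basis light maps
true_coeffs = rng.random(num_basis)
frame = light_maps @ true_coeffs + 0.01 * rng.standard_normal(num_pixels)

# Linear, low-complexity estimation: least-squares fit of the lighting
# coefficients that best explain the observed frame intensities.
coeffs, *_ = np.linalg.lstsq(light_maps, frame, rcond=None)
print(np.round(coeffs - true_coeffs, 3))            # estimation error is near zero

# Re-lighting: synthesize the frame under arbitrary new lighting coefficients.
new_coeffs = np.zeros(num_basis)
new_coeffs[0] = 1.0                                 # e.g. first basis map only
relit_frame = light_maps @ new_coeffs
```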
Block-based disparity compensation is an efficient prediction scheme for encoding multi-view image data. Available scene geometry can be used to further enhance prediction accuracy. In this paper, three different strategies are compared that combine prediction based on depth maps with prediction based on 3-D geometry. Three real-world image sets are used to examine prediction performance for different coding scenarios. Depth maps and geometry models are derived from the calibrated image data. Bit-rate reductions of up to 10% are observed when depth-map-based prediction is suitably augmented with geometry-based prediction.
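The sketch below illustrates the depth-map-based prediction idea at the pixel level for rectified views with a known baseline and focal length; these parameters, the sign convention, and the nearest-pixel warping are assumptions for illustration, not the paper’s full block-prediction scheme.

```python
import numpy as np

def depth_to_disparity(depth, focal_length, baseline):
    """Convert a depth map to a horizontal disparity map for rectified views."""
    return focal_length * baseline / depth

def predict_view(reference, disparity):
    """Predict a neighboring view by horizontally warping the reference image
    with per-pixel disparity (nearest-pixel lookup, no hole filling,
    sign convention chosen for illustration)."""
    h, w = reference.shape
    xs = np.arange(w)
    predicted = np.zeros_like(reference)
    for y in range(h):
        src_x = np.clip(np.round(xs - disparity[y]).astype(int), 0, w - 1)
        predicted[y] = reference[y, src_x]
    return predicted

# Example with synthetic data: constant depth gives a constant horizontal shift.
ref = np.tile(np.arange(8, dtype=float), (4, 1))
depth = np.full((4, 8), 2.0)
pred = predict_view(ref, depth_to_disparity(depth, focal_length=4.0, baseline=1.0))
print(pred)
```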