Texture classification plays a major role in many computer vision applications. Local binary patterns (LBP) encoding schemes have largely been proven to be very effective for this task. Improved LBP (ILBP) are conceptually simple, easy to implement, and highly effective LBP variants based on a point-to-average thresholding scheme instead of a point-to-point one. We propose the use of this encoding scheme for extracting intra- and interchannel features for color texture classification. We experimentally evaluated the resulting improved opponent color LBP alone and in concatenation with the ILBP of the local color contrast map on a set of image classification tasks over 9 datasets of generic color textures and 11 datasets of biomedical textures. The proposed approach outperformed other grayscale and color LBP variants in nearly all the datasets considered and proved competitive even against image features from last generation convolutional neural networks, particularly for the classification of biomedical images.
The analysis of color and texture has a long history in image analysis and computer vision. These two properties are often considered as independent, even though they are strongly related in images of natural objects and materials. Correlation between color and texture information is especially relevant in the case of variable illumination, a condition that has a crucial impact on the effectiveness of most visual descriptors. We propose an ensemble of hand-crafted image descriptors designed to capture different aspects of color textures. We show that the use of these descriptors in a multiple classifiers framework makes it possible to achieve a very high classification accuracy in classifying texture images acquired under different lighting conditions. A powerful alternative to hand-crafted descriptors is represented by features obtained with deep learning methods. We also show how the proposed combining strategy hand-crafted and convolutional neural networks features can be used together to further improve the classification accuracy. Experimental results on a food database (raw food texture) demonstrate the effectiveness of the proposed strategy.
Automatic action recognition in videos is a challenging computer vision task that has become an active research area in recent years. Existing strategies usually use kernel-based learning algorithms that considers a simple combination of different features completely disregarding how such features should be integrated to fit the given problem. Since a given feature is most suitable to describe a given image/video property, the adaptive weighting of such features can improve the performance of the learning algorithm. In this paper, we investigated the use of the Multiple Kernel Learning (MKL) algorithm to adaptive search for the best linear relation among the considered features. MKL is an extension of the support vector machines (SVMs) to work with a weighted linear combination of several single kernels. This approach allows to simultaneously estimate the weights for the multiple kernels combination as well as the underlying SVM parameters. In order to prove the validity of the MKL approach, we considered a descriptor composed of multiple features aligned with dense trajectories. We experimented our approach on a database containing 36 cooking actions. Results confirm that the use of MKL improves the classification performance.
Intelligent multi-camera systems that integrate computer vision algorithms are not error free, and thus both false positive and negative detections need to be revised by a specialized human operator. Traditional multi-camera systems usually include a control center with a wall of monitors displaying videos from each camera of the network. Nevertheless, as the number of cameras increases, switching from a camera to another becomes hard for a human operator.
In this work we propose a new method that dynamically selects and displays the content of a video camera from all the available contents in the multi-camera system. The proposed method is based on a computational model of human visual attention that integrates top-down and bottom-up cues. We believe that this is the first work that tries to use a model of human visual attention for the dynamic selection of the camera view of a multi-camera system.
The proposed method has been experimented in a given scenario and has demonstrated its effectiveness with respect to the other methods and manually generated ground-truth. The effectiveness has been evaluated in terms of number of correct best-views generated by the method with respect to the camera views manually generated by a human operator.
In order to create a cooking assistant application to guide the users in the preparation of the dishes relevant to their profile diets and food preferences, it is necessary to accurately annotate the video recipes, identifying and tracking the foods of the cook. These videos present particular annotation challenges such as frequent occlusions, food appearance changes, etc. Manually annotate the videos is a time-consuming, tedious and error-prone task. Fully automatic tools that integrate computer vision algorithms to extract and identify the elements of interest are not error free, and false positive and false negative detections need to be corrected in a post-processing stage. We present an interactive, semi-automatic tool for the annotation of cooking videos that integrates computer vision techniques under the supervision of the user. The annotation accuracy is increased with respect to completely automatic tools and the human effort is reduced with respect to completely manual ones. The performance and usability of the proposed tool are evaluated on the basis of the time and effort required to annotate the same video sequences.
In this paper we present a descriptor for texture classification based on the histogram of a local measure of the color contrast. The descriptor has been concatenated to several other color and intensity texture descriptors in the state of the art and has been experimented on three datasets. Results show, in nearly every case, a performance improvement with respect to results achieved by baseline methods thus demonstrating the effectiveness of the proposed texture features. The descriptor has also demonstrated to be robust with respect to global changes in lighting conditions.