We focus on saliency estimation in digital images. We describe why it is important to adopt a data-driven model for such an ill-posed problem, allowing a universal concept of “saliency” to emerge naturally from data that are typically annotated with drastically heterogeneous criteria. Our learning-based method also involves an explicit analysis of the input at multiple scales, in order to take into account images of different resolutions depicting subjects of different sizes. Furthermore, despite training our model on binary ground truths only, we are able to output a continuous-valued confidence map, which represents the probability of each image pixel being salient. Each contribution of our method for saliency estimation is individually tested on a standard evaluation benchmark, and our final proposal proves to be very effective in a comparison with the state of the art.
We present a method for the automatic restoration of images subjected to the application of photographic filters, such as those made popular by photo-sharing services. The method uses a convolutional neural network (CNN) for the prediction of the coefficients of local polynomial transformations that are applied to the input image. The experiments we conducted on a subset of the Places-205 dataset show that the quality of the restoration performed by our method is clearly superior to that of traditional color balancing and restoration procedures, and to that of recent CNN architectures for image-to-image translation.
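As an illustration of the core mechanism described above, the sketch below applies per-pixel polynomial color transformations to an image. The quadratic basis, the coefficient layout and all names are assumptions made for this example; in the method the coefficient maps would be predicted by the CNN.

```python
import numpy as np

def apply_polynomial_transform(img, coeffs):
    """Apply a local polynomial color transform to an RGB image.

    img:    H x W x 3 float array in [0, 1] (the filtered input).
    coeffs: H x W x 3 x 10 array of per-pixel, per-channel coefficients,
            here assumed to multiply a degree-2 polynomial basis of (r, g, b).
            In the paper these coefficients are predicted by a CNN.
    """
    r, g, b = img[..., 0], img[..., 1], img[..., 2]
    # Degree-2 polynomial basis (an assumed choice of expansion).
    basis = np.stack([np.ones_like(r), r, g, b,
                      r * r, g * g, b * b, r * g, r * b, g * b], axis=-1)
    # Each output channel is a per-pixel dot product between its
    # coefficient vector and the basis vector.
    restored = np.einsum('hwck,hwk->hwc', coeffs, basis)
    return np.clip(restored, 0.0, 1.0)
```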
We present a fully automated approach for smile detection. Faces are detected using a multiview face detector and aligned and scaled using automatically detected eye locations. Then, we use a convolutional neural network (CNN) to determine whether the detected face is smiling or not. To this end, we investigate different shallow CNN architectures that can be trained even when the amount of learning data is limited. We evaluate our complete processing pipeline on the largest publicly available image database for smile detection in an uncontrolled scenario. We investigate the robustness of the method to different kinds of geometric transformations (rotation, translation, and scaling) due to imprecise face localization, and to several kinds of distortions (compression, noise, and blur). To the best of our knowledge, this is the first time that this type of investigation has been performed for smile detection. Experimental results show that our proposal outperforms state-of-the-art methods on both high- and low-quality images.
The analysis of color and texture has a long history in image analysis and computer vision. These two properties are often considered as independent, even though they are strongly related in images of natural objects and materials. The correlation between color and texture information is especially relevant in the case of variable illumination, a condition that has a crucial impact on the effectiveness of most visual descriptors. We propose an ensemble of hand-crafted image descriptors designed to capture different aspects of color textures. We show that the use of these descriptors in a multiple classifiers framework makes it possible to achieve a very high accuracy in classifying texture images acquired under different lighting conditions. A powerful alternative to hand-crafted descriptors is represented by features obtained with deep learning methods. We also show how, within the proposed combining strategy, hand-crafted and convolutional neural network features can be used together to further improve the classification accuracy. Experimental results on a food database (raw food texture) demonstrate the effectiveness of the proposed strategy.
A simple but effective technique for absolute colorimetric camera characterization is proposed. It offers a large dynamic range while requiring just a single, off-the-shelf target and a commonly available controllable light source for the characterization. The characterization task is broken down into two modules, respectively devoted to absolute luminance estimation and to colorimetric characterization matrix estimation. The characterized camera can be effectively used as a tele-colorimeter, giving an absolute estimation of the XYZ data in cd/m². The user is only required to vary the f-number of the camera lens or the exposure time t, to better exploit the sensor dynamic range. The estimated absolute tristimulus values closely match the values measured by a professional spectroradiometer.
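A minimal sketch of how the characterized camera could be used as a tele-colorimeter. The exposure normalization by t/N², the matrix M, and the luminance scale k are placeholders for the quantities produced by the two characterization modules; the exact formulation is an assumption for illustration.

```python
import numpy as np

def camera_to_xyz(raw_rgb, M, k, f_number, exposure_time):
    """Convert linear camera RGB to absolute XYZ (cd/m^2).

    raw_rgb:       ... x 3 linear, demosaiced, dark-subtracted RGB values.
    M:             3 x 3 colorimetric characterization matrix (estimated in
                   the second module of the characterization).
    k:             absolute luminance scale factor (estimated in the first
                   module with the controllable light source).
    f_number, exposure_time: capture settings, used to normalize exposure so
                   that different f-numbers / exposure times are comparable.
    All names and the exact normalization are illustrative assumptions.
    """
    # Normalize for exposure: sensor irradiance scales roughly with
    # t / N^2 (exposure time over squared f-number).
    exposure = exposure_time / (f_number ** 2)
    normalized = raw_rgb / exposure
    # Relative XYZ via the 3x3 matrix, then absolute scaling to cd/m^2.
    xyz = normalized @ M.T
    return k * xyz
```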
In security applications the human face plays a fundamental role; however, we have to assume non-collaborative subjects. A face can be partially visible or occluded due to common accessories such as sunglasses, hats, scarves and so on. The head pose also influences face recognizability. Given an input video sequence, the proposed system is able to establish whether a face is depicted in a frame, and to determine its degree of recognizability in terms of clearly visible facial features. The system implements a feature filtering scheme combined with skin-based face detection to improve its robustness to false positives and cartoon-like faces. Moreover, the system takes into account the recognizability trend over a customizable sliding time window to allow a high-level analysis of the subject's behaviour. The recognizability criteria can be tuned for each specific application. We evaluate our system in both qualitative and quantitative terms, using a data set of manually annotated videos. Experimental results confirm the effectiveness of the proposed system.
The aim of this work is to study the image quality of both singly and multiply distorted images. We address images corrupted by Gaussian noise or JPEG compression as the single-distortion cases, and images corrupted by Gaussian noise and then JPEG compressed as the multiple-distortion case. Subjective studies were conducted in two parts to obtain human judgments on the singly and multiply distorted images. We study how these subjective data correlate with state-of-the-art No Reference quality metrics. We also investigate how to properly combine No Reference metrics to achieve better performance. Results are analyzed and compared in terms of correlation coefficients.
The processing pipeline of a digital camera converts the RAW image acquired by the sensor into a representation of the original scene that should be as faithful as possible. There are mainly two modules responsible for the color-rendering accuracy of a digital camera: the former is the illuminant estimation and correction module, and the latter is the color matrix transformation aimed at adapting the color response of the sensor to a standard color space. These two modules together form what may be called the color correction pipeline. We design and test new color correction pipelines that exploit different illuminant estimation and correction algorithms that are tuned and automatically selected on the basis of the image content. Since illuminant estimation is an ill-posed problem, illuminant correction is not error-free. An adaptive color matrix transformation module is optimized, taking into account the behavior of the first module in order to alleviate the amplification of color errors. The proposed pipelines are tested on a publicly available dataset of RAW images. Experimental results show that exploiting the cross-talks between the modules of the pipeline can lead to higher color-rendition accuracy.
Pattern matching, also known as template matching, is a computationally intensive problem aimed at localizing the instances of a given template within a query image. In this work we present a fast technique for template matching that is able to use histogram-based similarity measures on complex descriptors. In particular we focus on Color Histograms (CH), Histograms of Oriented Gradients (HOG), and Bag of visual Words histograms (BOW). The image is compared with the template via histogram matching exploiting integral histograms. In order to introduce spatial information, the template and the candidates are divided into sub-regions, and multiple descriptor sizes are computed. The proposed solution is compared with the Full-Search-equivalent Incremental Dissimilarity Approximations, a state-of-the-art approach, in terms of both accuracy and execution time on different standard datasets.
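The sketch below shows the integral-histogram machinery that makes histogram-based matching fast: after one cumulative pass over the image, the histogram of any rectangular candidate region is obtained in constant time per bin. The quantization of pixels into a per-pixel bin map is an assumed preprocessing step.

```python
import numpy as np

def integral_histogram(bin_map, n_bins):
    """Build an integral histogram from a per-pixel bin index map.

    bin_map: H x W integer array, each entry the quantized feature bin of a
             pixel (e.g. a color or gradient-orientation bin).
    Returns an (H+1) x (W+1) x n_bins cumulative array S such that the
    histogram of any rectangle [y0:y1, x0:x1] is obtained in O(n_bins).
    """
    H, W = bin_map.shape
    one_hot = np.eye(n_bins, dtype=np.int64)[bin_map]        # H x W x n_bins
    S = np.zeros((H + 1, W + 1, n_bins), dtype=np.int64)
    S[1:, 1:] = one_hot.cumsum(axis=0).cumsum(axis=1)
    return S

def region_histogram(S, y0, x0, y1, x1):
    """Histogram of the rectangle with corners (y0, x0) and (y1, x1)."""
    return S[y1, x1] - S[y0, x1] - S[y1, x0] + S[y0, x0]
```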
In order to create a cooking assistant application to guide users in the preparation of dishes relevant to their diet profiles and food preferences, it is necessary to accurately annotate the video recipes, identifying and tracking the foods handled by the cook. These videos present particular annotation challenges such as frequent occlusions, food appearance changes, etc. Manually annotating the videos is a time-consuming, tedious and error-prone task. Fully automatic tools that integrate computer vision algorithms to extract and identify the elements of interest are not error-free, and false positive and false negative detections need to be corrected in a post-processing stage. We present an interactive, semi-automatic tool for the annotation of cooking videos that integrates computer vision techniques under the supervision of the user. The annotation accuracy is increased with respect to completely automatic tools, and the human effort is reduced with respect to completely manual ones. The performance and usability of the proposed tool are evaluated on the basis of the time and effort required to annotate the same video sequences.
In this paper we present a descriptor for texture classification based on the histogram of a local measure of color contrast. The descriptor has been concatenated to several other color and intensity texture descriptors from the state of the art and has been evaluated on three datasets. Results show, in nearly every case, a performance improvement with respect to the results achieved by the baseline methods, thus demonstrating the effectiveness of the proposed texture features. The descriptor has also proved to be robust with respect to global changes in lighting conditions.
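A minimal sketch of a descriptor of this kind, assuming a simple local color-contrast measure (distance of each pixel from the mean color of its neighborhood); the measure actually used in the paper may differ.

```python
import numpy as np
from scipy.ndimage import uniform_filter

def local_color_contrast_histogram(img, n_bins=32, size=5):
    """Histogram of a simple local color-contrast measure (illustrative).

    img: H x W x 3 float image (e.g. RGB or an opponent color space).
    The contrast measure used here -- Euclidean distance between each pixel
    and the mean color of its size x size neighborhood -- is an assumed
    stand-in for the measure defined in the paper.
    """
    local_mean = np.stack(
        [uniform_filter(img[..., c], size=size) for c in range(3)], axis=-1)
    contrast = np.linalg.norm(img - local_mean, axis=-1)
    hist, _ = np.histogram(contrast, bins=n_bins,
                           range=(0.0, contrast.max() + 1e-8))
    # Normalized histogram, to be concatenated with other texture descriptors.
    return hist / hist.sum()
```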
The aim of our research is to specify experimentally, and further model, spatial frequency response functions that quantify human sensitivity to spatial information in real complex images. Three visual response functions are measured: the isolated Contrast Sensitivity Function (iCSF), which describes the ability of the visual system to detect any spatial signal in a given spatial frequency octave in isolation; the contextual Contrast Sensitivity Function (cCSF), which describes the ability of the visual system to detect a spatial signal in a given octave in an image; and the contextual Visual Perception Function (VPF), which describes visual sensitivity to changes in suprathreshold contrast in an image. In this paper we present the relevant background, along with our first attempts to derive experimentally and further model the VPF and CSFs. We examine the contrast detection and discrimination frameworks developed by Barten, which we find provide a sound starting position for our own modeling purposes. Progress is presented in the following areas: verification of the chosen model for detection and discrimination; choice of contrast metrics for defining contrast sensitivity; apparatus, laboratory set-up and imaging system characterization; stimuli acquisition and stimuli variations; spatial decomposition; and methodology for subjective tests. Initial iCSFs are presented and compared with findings obtained using simple visual stimuli, as well as with more recent relevant work.
We present here the results obtained by including a new image descriptor, which we call the prosemantic feature vector, within the framework of the QuickLook2 image retrieval system. By coupling the prosemantic features and
the relevance feedback mechanism provided by QuickLook2, the user can move in a more rapid and precise way
through the feature space toward the intended goal. The prosemantic features are obtained by a two-step feature
extraction process. At the first step, low level features related to image structure and color distribution are
extracted from the images. At the second step, these features are used as input to a bank of classifiers, each
one trained to recognize a given semantic category, to produce score vectors. We evaluated the efficacy of the
prosemantic features under search tasks on a dataset provided by Fratelli Alinari Photo Archive.
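A compact sketch of the two-step prosemantic extraction described above; the extractor and classifier interfaces, and the scikit-learn-style decision_function call, are assumptions made for illustration.

```python
import numpy as np

def prosemantic_vector(image, low_level_extractors, category_classifiers):
    """Two-step prosemantic feature extraction (illustrative sketch).

    low_level_extractors: list of functions, each mapping an image to a
        low-level feature vector (e.g. color distribution, structure).
    category_classifiers: dict mapping a semantic category name to a trained
        scorer exposing decision_function(features) -> score, as in
        scikit-learn.  The categories and the classifier API are assumptions.
    """
    # Step 1: low-level features describing structure and color.
    low_level = np.concatenate([f(image) for f in low_level_extractors])
    # Step 2: one score per semantic category -> the prosemantic vector.
    scores = [clf.decision_function(low_level.reshape(1, -1))[0]
              for clf in category_classifiers.values()]
    return np.asarray(scores)
```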
We address the problem of image quality assessment for natural images, focusing on No Reference (NR) assessment
methods for sharpness. The metrics proposed in the literature are based on edge pixel measures that
significantly suffer from the presence of noise. In this work we present an automatic method that selects edge segments,
making it possible to evaluate sharpness on more reliable data. To reduce the noise influence, we also propose a
new sharpness metric for natural images.
We propose a bio-inspired framework for automatic image quality enhancement. Restoration algorithms usually have fixed parameters whose values are not easy to set and depend on the image content. In this study, we show that it is possible to correlate no-reference visual quality values with specific parameter settings, so that the quality of an image can be effectively enhanced through the restoration algorithm. As a case study we chose JPEG blockiness distortion. As for the restoration algorithm, we used either a bilateral filter or a total variation denoising detexturer. Experimental results on the LIVE database are reported. These results demonstrate that a better visual quality is achieved with the optimized parameters over the entire range of compression, with respect to the algorithm's default parameters.
In this work we present an automatic local color transfer method based on semantic image annotation. With this annotation, images are segmented into homogeneous regions assigned to seven different classes (sky, vegetation, snow, water, ground, street, and sand). Our method makes it possible to automatically transfer the color distribution from regions of the source and target images annotated with the same class (for example the class "sky"). The amount of color transfer can be controlled by tuning a single parameter. Experimental results show that our local color transfer is usually more visually pleasing than a global approach.
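As an illustration, the sketch below transfers first- and second-order color statistics between two regions annotated with the same class, with a single parameter alpha controlling the amount of transfer; the statistics-matching form is an assumed stand-in for the exact transfer used in the paper.

```python
import numpy as np

def transfer_region_color(target_img, target_mask, source_img, source_mask,
                          alpha=1.0):
    """Transfer per-channel color statistics between same-class regions.

    target_img, source_img: H x W x 3 float images (e.g. in a perceptual
        color space such as CIELab).
    target_mask, source_mask: boolean masks of regions annotated with the
        same semantic class (e.g. 'sky') in the two images.
    alpha: single parameter in [0, 1] controlling the amount of transfer.
    Mean/std matching is an assumed, commonly used form of color transfer.
    """
    out = target_img.copy()
    for c in range(3):
        t = target_img[..., c][target_mask]
        s = source_img[..., c][source_mask]
        # Match the target region statistics to those of the source region.
        matched = (t - t.mean()) / (t.std() + 1e-8) * s.std() + s.mean()
        out[..., c][target_mask] = (1 - alpha) * t + alpha * matched
    return out
```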
In this work we present a system which visualizes the results obtained from image search engines in such a way
that users can conveniently browse the retrieved images. The way in which search results are presented allows
the user to grasp the composition of the set of images "at a glance". To do so, images are grouped and positioned
according to their distribution in a prosemantic feature space which encodes information about their content at
an abstraction level that can be placed between visual and semantic information. The compactness of the feature
space allows a fast analysis of the image distribution so that all the computation can be performed in real time.
We propose here a strategy for the automatic annotation of outdoor photographs. Images are segmented into homogeneous regions, which are then assigned to one of seven classes: sky, vegetation, snow, water, ground, street, and sand. These categories allow for content-aware image processing strategies. Our annotation strategy uses a normalized cut segmentation to identify the regions to be classified by a multi-class Support Vector Machine. The strategy has been evaluated on a set of images taken from the LabelMe dataset.
In the present article we focus on enhancing the contrast of low-illumination images that present large underexposed regions. For these particular images, applying standard contrast enhancement techniques also over-enhances the noise within the darker regions. Even if both the contrast enhancement and denoising problems have been widely addressed in the literature, these two processing steps are, in general, considered independently in the processing pipeline. The goal of this work is to integrate contrast enhancement and denoising algorithms to properly enhance this type of image. The method has been applied to a dedicated database of underexposed images. The results have been qualitatively assessed by comparing the images before and after applying the proposed algorithm.
In this work we propose an image quality assessment tool. The tool is composed of different modules that
implement several No Reference (NR) metrics (i.e. where the original or ideal image is not available). Different
types of image quality attributes can be taken into account by the NR methods, such as blurriness, graininess, blockiness, lack of contrast, and lack of saturation or colorfulness, among others. Our tool aims to give a structured
view of a collection of objective metrics that are available for the different distortions within an integrated
framework. As each metric corresponds to a single module, our tool can be easily extended to include new
metrics or to substitute some of them. The software makes it possible to apply the metrics not only globally but also locally to different regions of interest in the image.
A method for contrast enhancement is proposed. The algorithm is based on a local and image-dependent exponential correction. The technique aims to correct images that simultaneously present overexposed and underexposed regions. To prevent halo artifacts, the bilateral filter is used as the mask of the exponential correction. Depending on the characteristics of the image (determined by histogram analysis), an automated parameter-tuning step is introduced, followed by stretching, clipping, and saturation-preserving treatments. Comparisons with other contrast enhancement techniques are presented. The Mean Opinion Score (MOS) experiment on grayscale images gives the greatest preference score to our algorithm.
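A minimal sketch of a local, image-dependent exponential correction of the kind described above. The paper uses a bilateral filter as the mask to prevent halo artifacts; here a Gaussian blur is used only as a simple stand-in, and the mapping from mask to exponent is an assumption.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def local_exponential_correction(luma, sigma=15.0):
    """Local, image-dependent exponential (gamma-like) correction.

    luma: H x W luminance image in [0, 1].
    The exponent varies per pixel with a smoothed mask: dark regions
    (mask < 0.5) get an exponent < 1 (brightened), bright regions get an
    exponent > 1 (darkened).  The paper uses a bilateral filter as the mask;
    a Gaussian blur is used here only as a simple stand-in, and the exponent
    mapping is an assumption.
    """
    mask = gaussian_filter(luma, sigma=sigma)
    exponent = 2.0 ** ((mask - 0.5) / 0.5)    # in [0.5, 2], image dependent
    return np.clip(luma ** exponent, 0.0, 1.0)
```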
We have designed a new self-adaptive image cropping algorithm that is able to detect several relevant regions in the
image. These regions can then be sequentially proposed to the user as thumbnails, according to their relevance order, thus allowing the viewer to visualize the relevant image content and eventually to display or print only the regions in which he or she is most interested. The algorithm exploits both visual and semantic information. Visual information is
obtained by a visual attention model, while semantic information relates to the detection and recognition of particularly
significant objects. In this work we concentrate our attention on two of the most common elements found in personal photos: face and skin regions. Examples are shown to illustrate the effectiveness of the proposed method.
Correct image orientation is often assumed by common imaging applications such as enhancement, browsing, and
retrieval. However, the information provided by camera metadata is often missing or incorrect. In these cases
manual correction is required, otherwise the images cannot be correctly processed and displayed. In this work
we propose a system which automatically detects the correct orientation of digital photographs. The system
exploits the information provided by a face detector and a set of low-level features related to distributions in the
image of color and edges. To prove the effectiveness of the proposed approach we evaluated it on two datasets
of consumer photographs.
The present work concerns the development of a no-reference demosaicing quality metric. The demosaicing
operation converts a raw image acquired with a single sensor array, overlaid with a color filter array, into a
full-color image. The most prominent artifact generated by demosaicing algorithms is called zipper. In this work
we propose an algorithm to identify these patterns and measure their visibility in order to estimate the perceived
quality of rendered images. We have conducted extensive subjective experiments, and we have determined the
relationships between subjective scores and the proposed measure to obtain a reliable no-reference metric.
No-reference quality metrics estimate the perceived quality exploiting only the image itself. Typically, no-reference metrics are designed to measure specific artifacts using a distortion model. Some psycho-visual experiments have shown that the perception of distortions is influenced by the amount of detail in the image content, suggesting the need for a "content weighting factor." This dependency is coherent with known masking effects of the human visual system. In order to explore this phenomenon, we set up a series of experiments applying
regression trees to the problem of no-reference quality assessment. In particular, we have focused on the blocking
distortion of JPEG compressed images. Experimental results show that information about the visual content of
the image can be exploited to improve the estimation of the quality of JPEG compressed images.
This paper focuses on full-reference image quality assessment and presents different computational strategies aimed at improving the robustness and accuracy of some well-known and widely used state-of-the-art models, namely the Structural
Similarity approach (SSIM) by Wang and Bovik and the S-CIELAB spatial-color model by Zhang and Wandell. We
investigate the hypothesis that combining error images with a visual attention model could allow a better fit of the
psycho-visual data of the LIVE Image Quality assessment Database Release 2. We show that the proposed quality
assessment metric better correlates with the experimental data.
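The hypothesis above amounts to pooling a per-pixel error (or similarity) map with a visual attention map. A minimal sketch of such a weighted pooling, given as an assumed illustration rather than the exact strategy evaluated in the paper:

```python
import numpy as np

def attention_weighted_score(error_map, saliency_map, eps=1e-8):
    """Pool a full-reference error/similarity map with a visual attention map.

    error_map:    H x W per-pixel quality map (e.g. a local SSIM or
                  S-CIELAB error map).
    saliency_map: H x W non-negative visual attention map for the same image.
    The weighted-average pooling shown here is one simple way to combine the
    two, given as an illustration of the idea investigated in the paper.
    """
    w = saliency_map / (saliency_map.sum() + eps)
    return float((w * error_map).sum())
```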
In the framework of multimedia applications, image quality may have different meanings and interpretations. In this paper, considering the quality of an image as the degree of adequacy to its function/goal within a specific application field, we provide an organized overview of image quality assessment methods, highlighting their applicability and limitations in different application domains. Three scenarios have been chosen, representing three typical applications with different degrees of constraints in their image workflow chains and requiring different image quality assessment methodologies.
Although traditional content-based retrieval systems have been successfully employed in many multimedia applications, the need to explicitly associate higher-level concepts with images has been a pressing demand from users. Much research has been conducted on reducing the semantic gap between visual features and the semantics of the image content. In this paper we present a mechanism that combines broad high-level concepts and low-level visual features within the framework of the QuickLook content-based image retrieval system. This system also implements a relevance feedback algorithm to learn the user's intended query from positive and negative image examples. With the relevance feedback mechanism, the retrieval process can be efficiently guided toward the semantic or pictorial contents of the images by providing the system with suitable examples. Qualitative experiments performed on a database of more than 46,000 photos downloaded from the Web show that the combination of semantic and low-level features, coupled with a relevance feedback algorithm, effectively improves the accuracy of the image retrieval sessions.
We present different computational strategies for colorimetric characterization of scanners using multidimensional polynomials. The designed strategies allow us to determine the coefficients of an a priori fixed polynomial, taking into account different color error statistics. Moreover, since there is no clear relationship between the polynomial chosen for the characterization and the intrinsic characteristics of the scanner, we show how genetic programming could be used to generate the best polynomial. Experimental results on different devices are reported to confirm the effectiveness of our methods with respect to others in the state of the art.
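For illustration, the sketch below fits an a priori fixed second-order polynomial from scanner RGB to CIELAB by ordinary least squares; the paper instead optimizes the coefficients with respect to different color-error statistics and uses genetic programming to search the polynomial itself, so this is only a baseline instance of the idea.

```python
import numpy as np

def fit_polynomial_characterization(rgb, lab):
    """Fit a fixed second-order polynomial mapping scanner RGB -> CIELAB.

    rgb: N x 3 device RGB values of the training target patches.
    lab: N x 3 measured CIELAB values of the same patches.
    The degree-2 polynomial and the plain least-squares fit are illustrative
    assumptions; the paper optimizes with respect to color-error statistics
    and searches the polynomial with genetic programming.
    """
    r, g, b = rgb.T
    terms = np.stack([np.ones_like(r), r, g, b,
                      r * g, r * b, g * b, r * r, g * g, b * b], axis=1)
    coeffs, *_ = np.linalg.lstsq(terms, lab, rcond=None)   # 10 x 3
    return coeffs

def apply_polynomial_characterization(rgb, coeffs):
    """Apply the fitted polynomial to new RGB samples."""
    r, g, b = np.atleast_2d(rgb).T
    terms = np.stack([np.ones_like(r), r, g, b,
                      r * g, r * b, g * b, r * r, g * g, b * b], axis=1)
    return terms @ coeffs
```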
Skin detection is a preliminary step in many applications. We analyze some of the most frequently cited binary skin classifiers based on explicit color cluster definition and present possible strategies to improve their performance. In particular, we demonstrate how this can be accomplished by using genetic algorithms to redefine the cluster boundaries. We also show that the fitness function can be tuned to favor either recall or precision in pixel classification. Some combining strategies are then proposed to further improve the performance of these binary classifiers in terms of recall or precision. Finally, we show that, whatever the method or the strategy employed, the performance can be enhanced by preprocessing the images with a white balance algorithm. All the experiments reported here have been run on a large and heterogeneous image database.
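An example of the kind of classifier being improved: one frequently cited explicit RGB cluster rule (the daylight rule of Kovac et al.), whose fixed thresholds are exactly the boundaries that the genetic algorithm is used to redefine.

```python
import numpy as np

def explicit_skin_cluster(img_rgb):
    """Binary skin classification with an explicit RGB cluster rule.

    img_rgb: H x W x 3 uint8 image.
    The thresholds below are the well-known published defaults of the Kovac
    et al. rule; in the paper such boundaries are re-optimized with a genetic
    algorithm whose fitness can be tuned toward recall or precision.
    """
    r = img_rgb[..., 0].astype(np.int32)
    g = img_rgb[..., 1].astype(np.int32)
    b = img_rgb[..., 2].astype(np.int32)
    mx = np.maximum(np.maximum(r, g), b)
    mn = np.minimum(np.minimum(r, g), b)
    return ((r > 95) & (g > 40) & (b > 20) &
            ((mx - mn) > 15) &
            (np.abs(r - g) > 15) & (r > g) & (r > b))
```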
Several algorithms were proposed in the literature to recover
the illuminant chromaticity of the original scene. These algorithms
work well only when prior assumptions are satisfied, and the
best and the worst algorithms may be different for different scenes.
We investigate the idea of not relying on a single method but instead
consider a consensus decision that takes into account the responses
of several algorithms and adaptively chooses the algorithms
to be combined. We investigate different combining strategies
of state-of-the-art algorithms to improve the results in the
illuminant chromaticity estimation. Single algorithms and combined
ones are evaluated for both synthetic and real image databases
using the angular error between the RGB triplets of the measured
illuminant and the estimated one. Since we are interested in comparing the performance of the methods over large data sets, experimental results are also evaluated using the Wilcoxon signed rank test. Our
experiments confirm that the best and the worst algorithms do not
exist at all among the state-of-the-art ones and show that simple
combining strategies improve the illuminant estimation.
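For reference, the angular error used in the evaluation between the measured illuminant $\boldsymbol{\rho}_m$ and the estimated one $\boldsymbol{\rho}_e$ (both RGB triplets) is

$$ e_{\mathrm{ang}} = \arccos\!\left( \frac{\boldsymbol{\rho}_m \cdot \boldsymbol{\rho}_e}{\lVert \boldsymbol{\rho}_m \rVert \, \lVert \boldsymbol{\rho}_e \rVert} \right). $$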
Low quality images are often corrupted by artifacts and generally need to be heavily processed to become visually pleasing. We present a modified version of unsharp masking that is able to perform image smoothing, while not only preserving but also enhancing the salient details in images. The premise supporting the work is that biological vision and image reproduction share common principles. The key idea is to process the image locally according to topographic maps obtained from a neurodynamical model of visual attention. In this way, the unsharp masking algorithm becomes local and adaptive, enhancing the edges differently according to human perception.
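A minimal sketch of the local, adaptive variant of unsharp masking described above, assuming the attention model output is available as a map in [0, 1]; the gain range and its linear dependence on the attention map are illustrative assumptions.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def adaptive_unsharp_masking(luma, attention_map, lambda_min=-0.5,
                             lambda_max=1.5, sigma=2.0):
    """Unsharp masking with a locally varying gain (illustrative sketch).

    luma:          H x W luminance image in [0, 1].
    attention_map: H x W map in [0, 1] from a visual attention model; in the
                   paper this is a topographic map from a neurodynamical
                   saliency model.
    The gain range (lambda_min, lambda_max) and the linear mapping from
    attention to gain are assumptions made for illustration.
    """
    blurred = gaussian_filter(luma, sigma=sigma)
    detail = luma - blurred                      # high-pass residual
    gain = lambda_min + (lambda_max - lambda_min) * attention_map
    # Negative gain smooths non-salient areas, positive gain enhances the
    # details in salient areas, mimicking the local/adaptive behavior above.
    return np.clip(blurred + (1.0 + gain) * detail, 0.0, 1.0)
```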
In this work we consider six methods for automatic white balance available in the literature. The idea investigated
does not rely on a single method, but instead considers a consensus decision that takes into account
the compendium of the responses of all the considered algorithms. Combining strategies are then proposed and
tested both on synthetic and multispectral images, extracted from well known databases. The multispectral
images are processed using a digital camera simulator developed by Stanford University. All the results are
evaluated using the Wilcoxon Sign Test.
In this paper we investigate the relationship between matrixing methods, the number of filters adopted and the
size of the color gamut of a digital camera. The color gamut is estimated using a method based on the inversion of
the processing pipeline of the imaging device. Different matrixing methods are considered, including an original
method developed by the authors. For the selection of a hypothetical fourth filter, three different quality measures
have been implemented. Experimental results are reported and compared.
Illuminant estimation plays an important role in many application domains, such as digital still cameras and mobile phones, where the final image quality can be heavily affected by a poor compensation of the ambient illumination effects. In this paper we present an algorithm, independent of the acquiring device, for illuminant estimation and compensation directly in the color filter array (CFA) domain of digital still cameras. The proposed algorithm takes into account both chromaticity and intensity information of the image data, and performs the illuminant compensation by a diagonal transform. It works by combining a spatial segmentation process with empirically designed weighting functions aimed at selecting the scene objects containing more information for the light chromaticity estimation. The algorithm has been designed exploiting an experimental framework developed by the authors, and it has been evaluated on a database of real scene images acquired under different, carefully controlled, illuminant conditions. The results show that a combined multi-domain pixel analysis leads to an improvement in performance when compared to single-domain pixel analysis.
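The compensation step amounts to a diagonal (von Kries-like) transform driven by the estimated illuminant. A minimal sketch, shown on demosaiced RGB for brevity (the algorithm itself operates in the CFA domain), with green-channel normalization as an assumed convention:

```python
import numpy as np

def diagonal_illuminant_correction(img_rgb, illuminant_rgb):
    """Compensate the estimated illuminant with a diagonal transform.

    img_rgb:        H x W x 3 linear image (demosaiced here for simplicity;
                    the paper operates directly on CFA data).
    illuminant_rgb: length-3 estimate of the scene illuminant color.
    Each channel is scaled so that the estimated illuminant maps to a
    neutral (gray) response; normalizing by the green channel is a common
    convention and an assumption here.
    """
    rho = np.asarray(illuminant_rgb, dtype=np.float64)
    gains = rho[1] / rho                 # (g/r, 1, g/b)
    return img_rgb * gains.reshape(1, 1, 3)
```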
This paper presents FaceLab, an innovative, open environment created to evaluate the performance of face recognition strategies. It simplifies, through an easy-to-use graphical interface, the basic steps involved in testing procedures such as data organization and preprocessing, definition and management of training and test sets, definition and execution of recognition strategies and automatic computation of performance measures. The user can extend the environment to include new algorithms, allowing the definition of innovative recognition strategies. The performance of these strategies can be automatically evaluated and compared by the tool, which computes several performance measures for both identity verification and identification scenarios.
The segmentation of skin regions in color images is a preliminary step in several applications. Many different methods for discriminating between skin and non-skin pixels are available in the literature. The simplest, and often applied, methods build what is called an "explicit skin cluster" classifier which expressly defines the boundaries of the skin cluster in certain color spaces. These binary methods are very popular as they are easy to implement and do not require a training phase. The main difficulty in achieving high skin recognition rates, and producing the smallest possible number of false positive pixels, is that of defining accurate cluster boundaries through simple, often heuristically chosen, decision rules. In this study we apply a genetic algorithm to determine the boundaries of the skin clusters in multiple color spaces. To quantify the performance of these skin detection methods, we use recall and precision scores. A good classifier should provide both high recall and high precision, but generally, as recall increases, precision decreases. Consequently, we adopt a weighted mean of precision and recall as the fitness function of the genetic algorithm. Keeping in mind that different applications may have sharply different requirements, the weighting coefficients can be chosen to favor either high recall or high precision, or to satisfy a reasonable tradeoff between the two, depending on application demands. To train the genetic algorithm (GA) and test the performance of the classifiers applying the GA suggested boundaries, we use the large and heterogeneous Compaq skin database.
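Under the assumption of an arithmetic weighted mean, the fitness can be written as f = w·P + (1 − w)·R, with 0 ≤ w ≤ 1, where P and R are the precision and recall obtained with the candidate cluster boundaries; choosing w > 0.5 favors precision, w < 0.5 favors recall, and w = 0.5 gives a balanced tradeoff.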
Spectral characterization involves building a model that relates the device-dependent representation to the reflectance function of the printed color, usually represented with a high number of reflectance samples at different wavelengths. Look-up table-based approaches, conventionally employed for colorimetric device characterization, cannot be easily scaled to multispectral representations, so methods for the analytical description of devices are required. The article describes an innovative analytical printer model based on the Yule–Nielsen Spectral Neugebauer equation and formulated with a large number of degrees of freedom in order to account for dot gain, ink interactions, and printer driver operations. To estimate our model's parameters we use genetic algorithms. No assumption is made concerning the sequence of inks during printing, and the printers are treated as RGB devices (the printer-driver operations are included in the model). We have tested our characterization method, which requires only about 130 measurements to train the learning algorithm, on four different inkjet printers, using different kinds of paper and drivers. The test set used for model evaluation was composed of 777 samples, uniformly distributed over the RGB color space.
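For reference, the Yule–Nielsen Spectral Neugebauer equation at the core of the model predicts the printed reflectance as

$$ R(\lambda) = \left( \sum_{i} w_i \, R_i(\lambda)^{1/n} \right)^{\! n}, $$

where $R_i(\lambda)$ are the measured reflectances of the Neugebauer primaries, $w_i$ their fractional area coverages (functions of the effective, dot-gain-corrected ink amounts), and $n$ is the Yule–Nielsen factor; in the model above these quantities are given additional degrees of freedom and estimated with genetic algorithms.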
This paper presents an innovative method that combines a feature-based approach with a holistic approach for tri-dimensional face detection and localization. Salient face features, such as the eyes and nose, are detected through an analysis of the curvature of the surface. In a second stage, each triplet consisting of a candidate nose and two candidate eyes is then processed by a PCA-based classifier, trained to discriminate between faces and non-faces. The method has been tested on about 150 3D faces acquired by a laser range scanner with good results.
In many applications it is necessary to compute the color appearance of a surface under different illuminants. If multispectral information about the surface reflectance is available, computing the tristimulus values under an illuminant of specified SPD is straightforward. When only colorimetric information is known, the illuminant change is performed by adopting transforms based on the von Kries scaling model (e.g. the Bradford transform). Unfortunately, the three-dimensional colorimetric space may be too limited to properly compute the change in appearance of colors due to a change of illuminant. In this paper, a solution to this problem is presented. The assumption of the proposed method is that the problem is specific to a domain of colors, and that this domain can be modeled in a three-dimensional Gaussian space. Given the domain Gaussian space, colors may be represented through synthesized reflectance spectra. Exploiting synthesized reflectances proves to be an effective strategy to implement an illuminant change transform.
We propose an innovative approach to the selection of representative frames of a video shot for video summarization. By analyzing the differences between two consecutive frames of a video sequence, the algorithm determines the complexity of the sequence in terms of visual content changes. Three descriptors are used to express the frame’s visual content: a color histogram, wavelet statistics, and an edge direction histogram. Similarity measures are computed for each descriptor and combined to form a frame difference measure. The use of multiple descriptors provides a more precise representation, capturing even small variations in the frame sequence. This method can dynamically and rapidly select a variable number of key frames within each shot, and does not exhibit the complexity of existing methods based on clustering strategies. The method has been tested on various video segments of different genres (trailers, news, animation, etc.), and preliminary results show that the algorithm is able to effectively summarize the shots, capturing the most salient events in the sequences.
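A minimal sketch of how the per-descriptor similarities could be combined into a single frame difference measure; the normalized L1 distance and the uniform weights are assumptions made for this example.

```python
import numpy as np

def frame_difference(desc_prev, desc_curr, weights=None):
    """Combine per-descriptor distances into a single frame-difference value.

    desc_prev, desc_curr: dicts mapping a descriptor name ('color_hist',
        'wavelet_stats', 'edge_dir_hist') to its feature vector for the
        previous and current frame.
    The L1 distance per descriptor and the (uniform by default) weighted sum
    are assumptions; the paper combines per-descriptor similarity measures
    in a comparable way.
    """
    names = sorted(desc_prev)
    if weights is None:
        weights = {n: 1.0 / len(names) for n in names}
    total = 0.0
    for n in names:
        a = np.asarray(desc_prev[n], dtype=float)
        b = np.asarray(desc_curr[n], dtype=float)
        # Normalized L1 distance per descriptor, then weighted accumulation.
        d = np.abs(a - b).sum() / (np.abs(a).sum() + np.abs(b).sum() + 1e-8)
        total += weights[n] * d
    return total
```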
The acquisition of large or high-resolution multispectral images may require that different parts of the scene be acquired separately and then be mosaicked to obtain the whole image. While the problem of stitching together parts of an image to form a consistent whole has been studied rather extensively for traditional images, in this case view angles are typically wide, geometrical distortions are significant, and shots are often markedly misaligned with one another. On the other hand, many current applications for multispectral acquisition present a different scenario, as view angles are narrow, shots follow a much more precise alignment, and the quality of the resulting mosaic may be sensitive to even the tiniest registration errors, depending on the context. Moreover, given the nature of multispectral imaging, precision in color reproduction is usually much more important than it is when dealing with traditional images. All these issues raise the question of whether traditional strategies will work for typical multispectral images too.
In this paper, we report an experience in multispectral mosaicking. This experience was a first attempt at developing a mosaicking framework for multispectral images, including a suitable outline of the flow of operations as well as the selection of methods and procedures for geometrical registration and color synchronization. The scenario was that of an overhung camera acquiring images from above, with lighting provided by lamps used in professional photography; this can be seen as a typical studio setup, half-way between laboratory tests and 'field usage'. The performances of our framework in terms of quality of the resulting mosaics, resource requirements, and processing time, were evaluated through tests based on real acquisitions of different images.
One of the most important components in a multispectral acquisition system is the set of optical filters that allow acquisition in different bands of the visible light spectrum. Typically, either a rather large set of traditional optical filters or a tunable filter capable of many different configurations is employed. In both cases, minimising the actual number of filters used while keeping the error sufficiently low is important to reduce operational costs, acquisition time, and data storage space. In this paper, we introduce the Filter Vectors Analysis Method for choosing an optimal subset of filters or filter configurations among those available for a specific multispectral acquisition system. This method is based on a statistical analysis of the data resulting from an acquisition of a representative target, and tries to identify those filters that yield the most information in the given environmental conditions. We have compared our method with a simple method (ESF, for 'evenly spaced filters') that chooses filters so that their transmittance peak wavelengths are as evenly spaced as possible within the considered spectrum. The results of our experiments suggest that the Filter Vectors Analysis Method cannot bring substantial improvements over the ESF method, but they also indicate that the ideas behind our method deserve further investigation.
Information about the spectral reflectance of a color surface is useful in many applications. Assuming that reflectance functions can be adequately approximated by a linear combination of a small number of basis functions, and exploiting genetic algorithms, we address here the problem of synthesizing a spectral reflectance function given the standard CIE 1931 tristimulus values. Different sets of basis functions have been tested, and different data sets have been used for benchmarking.
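In symbols, the synthesized reflectance is constrained to a low-dimensional linear model,

$$ \hat r(\lambda) = \sum_{i=1}^{k} c_i \, b_i(\lambda), $$

where the $b_i(\lambda)$ are the chosen basis functions, and the coefficients $c_i$ are searched by the genetic algorithm so that the tristimulus values computed from $\hat r(\lambda)$ with the CIE 1931 observer, e.g. $X = k_n \int S(\lambda)\, \hat r(\lambda)\, \bar x(\lambda)\, d\lambda$ (and similarly for $Y$ and $Z$, with $S(\lambda)$ the illuminant and $k_n$ a normalization constant), match the given target.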
The paper describes an algorithm for the automatic removal of "redeye" from digital photos. First an adaptive color cast removal algorithm is applied to correct the color photo. This phase not only facilitates the subsequent steps of processing, but also improves the overall appearance of the output image. A skin detector, based mainly on analysis of the chromatic distribution of the image, creates a probability map of skin-like regions. A multi-resolution neural
network approach is then exploited to create an analogous probability map of candidate faces. These two distributions are then combined to identify the most probable facial regions in the image. Redeye is searched for within these regions, seeking areas with high “redness” and applying geometric constraints to limit the number of false hits. The redeye removal algorithm is then applied automatically to the red eyes identified. Candidate areas are appropriately smoothed to avoid unnatural transitions between the corrected and original parts of the eyes. Experimental results of the application of this procedure on a set of over 300 images are presented.
The paper describes an innovative image annotation tool for classifying image regions into one of seven classes - sky, skin, vegetation, snow, water, ground, and buildings - or as unknown. This tool could be productively applied in the management of large image and video databases where a considerable volume of images/frames must be automatically indexed. The annotation is performed by a classification system based on a multi-class Support Vector Machine. Experimental results on a test set of 200 images are reported and discussed.
The paper addresses the problem of distinguishing between pornographic and non-pornographic photographs, for the design of semantic filters for the web. Both decision forests of trees built according to the CART (Classification And Regression Trees) methodology and Support Vector Machines (SVM) have been used to perform the classification. The photographs are described by a set of low-level features that can be automatically computed simply from the gray-level and color representations of the image. The database used in our experiments contained 1500 photographs, 750 of which were labeled as pornographic on the basis of the independent judgement of several viewers.
Image retrieval is a two-step process: 1) indexing, in which a set or a vector of features summarizing the properties of each image in the database is computed and stored; and 2) retrieval, in which the features of the query image are extracted and compared with the others in the database. The database images are then ranked in order of their similarity. We introduce an innovative image retrieval strategy, the Dynamic Spatial Chromatic Histogram, which makes it possible to take into account spatial information in a flexible way without greatly adding to computation costs. Our preliminary results on a database of about 3000 images show that the proposed indexing and retrieval strategy is a powerful approach.
In recent years, many methods have been proposed for the spectral-based characterization of inkjet printers. To our knowledge, the majority of these are based on a physical description of the printing process, employing different strategies to deal with mechanical dot gain and the physical interaction among inks. But our experience tells us that, since printing is a physical process involving a large number of effects and unpredictable interactions, it is not unusual to be unable to fit a mathematical model to a given printer. The question becomes, therefore, whether it is feasible, and to what degree, to employ an analytical printer model even if it appears to be incapable of describing the behavior of a given device. A key objective of our work is to obtain a procedure that can spectrally characterize any printer, regardless of the paper and the printer driver used. We in fact treat the printers as RGB devices, and incorporate the printer driver operations, even if they are unknown to us, into the analytical model.
We report here our experimentation on the use of genetic algorithms to tune a spectral printer model based on the Yule-Nielsen modified Neugebauer equation. In our experiments we have considered three different inkjet printers and used different kinds of paper and printer drivers. For each device the printer model has been tuned, using a genetic algorithm, on a data set of some 150 measured reflectance spectra. The test set was composed of 777 samples, uniformly distributed in the RGB color space.
Tasks such as assessing the capabilities of a device or performing gamut mapping require the accurate estimation of device gamut boundaries in colorimetric space. We propose here a new method (which we call the Face Triangulation Method) to estimate the physical gamut boundary of a colour printer. The method has been tested on a real device, using two different paper media, and a common type of Gamut Boundary Descriptor. Given the small number of sample colours used for the estimate, and the high number of test colours employed, we conclude that the results achieved are rather good. We also indicate how to extend the Face Triangulation Method so that the boundary estimation can be further improved either locally or globally.
The paper describes an adaptive and tunable color cast removal algorithm. This multi-step algorithm first quantifies the strength of the cast by applying a color cast detector, which classifies the input images as having no cast, evident cast, ambiguous cast (images with a weak cast, or for which the existence of a cast is a matter of subjective opinion), or intrinsic cast (images presenting a cast that is probably due to a predominant color we want to preserve, such as in underwater images). The cast remover, a modified version of the white balance algorithm, is then applied in the two cases of evident or ambiguous cast. The method we propose has been tuned and tested, with positive results, on a large data set of images downloaded from personal web pages or acquired by various digital cameras.
The paper addresses the problem of annotating photographs with broad semantic labels. To cope with the great variety of photos available on the WEB we have designed a hierarchical classification strategy which first classifies images as pornographic or not-pornographic. Not-pornographic images are then classified as indoor, outdoor, or close-up. On a database of over 9000 images, mostly downloaded from the web, our method achieves an average accuracy of close to 90%.
The paper presents an innovative approach to the spectral-based characterization of ink-jet color printers. Our objective was to design a color separation procedure based on a spectral model of the printer, managed here as an RGB device. The printer was a four-ink device, and we assumed that the driver always replaced the black completely when converting from RGB to CMYK amounts of ink. The color separation procedure, which estimates the RGB values given a reflectance spectrum, is based on the inversion of the Yule-Nielsen modified Neugebauer model. To improve the performance of the direct Neugebauer model in computing the reflectance spectrum of the print, given the amounts of ink, we designed a method that exploits the results of the numerical inversion of the Neugebauer model to estimate a correction of the amount of black ink computed on RGB values. This correction can be considered a first step in optimization of the Neugebauer model; it accounts for ink-trapping and the lack of knowledge on how the black is actually replaced by the printer driver.
We present here a web-based prototype for the interactive search of items in quality electronic catalogues. The system, based on a multimedia information retrieval architecture, allows the user to query a multimedia database according to several retrieval strategies, and to progressively refine the system's response by indicating the relevance or non-relevance of the items retrieved. Once a subset of images meeting the user's information needs has been identified, these images can be displayed in a virtual exhibition that the user can visit interactively exploiting VRML technology.
The paper describes a method for detecting a color cast (i.e. a superimposed dominant color) in a digital image without any a priori knowledge of its semantic content. The color gamut of the image is first mapped into the CIELab color space. The color distribution of the whole image and of the so-called Near Neutral Objects (NNO) is then investigated using statistical tools to determine the presence of a cast. The boundaries of the near neutral objects in the color space are set adaptively by the algorithm on the basis of a preliminary analysis of the image color gamut. The method we propose has been tuned and successfully tested on a large data set of images downloaded from personal web pages or acquired using various digital and traditional cameras.
The need to retrieve visual information from large image and video collections is shared by many application domains. This paper describes the main features of Quicklook, a system that combines in a single framework the alphanumeric relational query, the content-based image query exploiting automatically computed low-level image features, and the textual similarity query exploiting any text attached to image database items.
A hierarchical classification for photographs, graphics, texts and compound documents is described. The key features of the strategy are the use of CART trees for classification and the indexing of the images considering only low-level perceptual features, such as color, texture, and shape, automatically computed on the images. Preliminary results are reported and discussed.
We have examined the performance of various color-based retrieval strategies when coupled with a pre-filtering Retinex algorithm to see whether, and to what degree, Retinex improved the effectiveness of the retrieval, regardless of the strategy adopted. The retrieval strategies implemented included color and spatial-chromatic histogram matching, color coherence vector matching, and the weighted sum of the absolute differences between the first three moments of each color channel. The experimental results are reported and discussed.
In this paper, we describe the main features of a system supporting the selection of color palettes for qualitative data representation and graphic interface design. The system is mainly based on visual interaction, providing effective tools for browsing the Munsell color space and setting perceptual constraints on the number and type of colors the system selects automatically. The system can also manage ICC device profiles, making it possible to process colors in terms of standard, device-independent color representations. Experimental results are also reported.
The effective classification of the contents of an image allows us to adopt the most appropriate strategies for image enhancement, color processing, compression, and rendering. We address here the problem of distinguishing photographs from graphics and texts purely on the basis of low-level feature analysis. The preliminary results of our experimentation are reported.
There is a great demand for efficient tools that can, on the basis of the pictorial content, organize large quantities of images and rapidly retrieve those of interest. With that goal in mind we present a method for indexing complex color images. The basic idea is to exploit image data decomposition and compression based on the standard Haar multiresolution wavelet transform to describe image content. In this way we are able to effectively eliminate data redundancy and concisely represent the salient features of the image in image signatures of predefined length. In the retrieval phase, image signatures are compared using a similarity measure that the system has 'learned' from users. Experimental results confirm the feasibility of our approach, which outperforms more standard procedures in retrieval accuracy at lower computational cost.
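A minimal sketch of a Haar-based signature of the kind described above: a few levels of the standard Haar decomposition are computed and the largest-magnitude coefficients are kept as a fixed-length signature. The truncation strategy and the signature length are assumptions made for this example.

```python
import numpy as np

def haar_signature(gray, levels=3, signature_len=64):
    """Compact image signature from a Haar multiresolution decomposition.

    gray: 2-D float array whose sides are divisible by 2**levels.
    The signature keeps the coarse approximation plus the positions and signs
    of the largest-magnitude detail coefficients; truncating to a predefined
    length in this way is an assumed variant of the scheme described above.
    """
    a = gray.astype(np.float64)
    details = []
    for _ in range(levels):
        # One separable Haar step: averages and differences along each axis.
        h_lo = (a[:, 0::2] + a[:, 1::2]) / 2.0
        h_hi = (a[:, 0::2] - a[:, 1::2]) / 2.0
        ll = (h_lo[0::2, :] + h_lo[1::2, :]) / 2.0
        lh = (h_lo[0::2, :] - h_lo[1::2, :]) / 2.0
        hl = (h_hi[0::2, :] + h_hi[1::2, :]) / 2.0
        hh = (h_hi[0::2, :] - h_hi[1::2, :]) / 2.0
        details.extend([lh.ravel(), hl.ravel(), hh.ravel()])
        a = ll
    coeffs = np.concatenate([a.ravel()] + details)
    # Keep the largest-magnitude coefficients as a fixed-length signature.
    idx = np.argsort(-np.abs(coeffs))[:signature_len]
    return idx, np.sign(coeffs[idx])
```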