This paper addresses the problem of face recognition using a graphical representation to identify structure that is
common to pairs of images. Matching graphs are constructed where nodes correspond to image locations and edges are
dependent on the relative orientation of the nodes. Similarity is determined from the size of maximal matching cliques in
pattern pairs. The method uses a single reference face image to obtain recognition without a training stage. Performance
is compared with earlier work on the Yale Face Database A, whose faces contain variations in expression, illumination,
occlusion and pose, and a 100% correct recognition result is obtained for the first time.
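The matching-graph idea in this abstract can be illustrated with a minimal sketch. It is not the paper's implementation and rests on several assumptions: interest points are supplied as (x, y) tuples, a node pairs one point from each image, and two nodes are joined when the orientation of the line between the two points agrees in both images to within a hypothetical tolerance `tol_deg`. The Bron-Kerbosch search is a standard clique algorithm swapped in for illustration; the search used in the paper may differ.

```python
import math
from itertools import combinations

def angle(p, q):
    """Orientation in degrees of the vector from point p to point q."""
    return math.degrees(math.atan2(q[1] - p[1], q[0] - p[0]))

def angle_diff(a, b):
    """Smallest absolute difference between two angles in degrees."""
    d = abs(a - b) % 360.0
    return min(d, 360.0 - d)

def build_matching_graph(pts_a, pts_b, tol_deg=5.0):
    """Nodes are (i, j) pairings; edges link pairings with consistent orientation."""
    nodes = [(i, j) for i in range(len(pts_a)) for j in range(len(pts_b))]
    adj = {n: set() for n in nodes}
    for (i, j), (k, l) in combinations(nodes, 2):
        if i == k or j == l:
            continue  # assumption: each point takes part in at most one pairing
        if angle_diff(angle(pts_a[i], pts_a[k]), angle(pts_b[j], pts_b[l])) <= tol_deg:
            adj[(i, j)].add((k, l))
            adj[(k, l)].add((i, j))
    return adj

def max_clique(adj):
    """Largest clique found by Bron-Kerbosch with pivoting and a size bound."""
    best = []

    def expand(r, p, x):
        nonlocal best
        if not p and not x:
            if len(r) > len(best):
                best = list(r)
            return
        if len(r) + len(p) <= len(best):
            return  # this branch cannot beat the best clique so far
        pivot = max(p | x, key=lambda v: len(adj[v] & p))
        for v in list(p - adj[pivot]):
            expand(r + [v], p & adj[v], x & adj[v])
            p.remove(v)
            x.add(v)

    expand([], set(adj), set())
    return best

def similarity(pts_a, pts_b, tol_deg=5.0):
    """Similarity score: the number of pairings in the largest matching clique."""
    return len(max_clique(build_matching_graph(pts_a, pts_b, tol_deg)))
```

For example, a square of four points compared with a translated copy of itself yields a clique of four pairings, while displacing one of the four points far from its position reduces the score. The exhaustive node set grows as the product of the two point counts, so this sketch is only practical for small point sets.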
Visual attention is commonly modelled by attempting to characterise objects using features that make them special or in some way distinctive in a scene. These approaches have the disadvantage that it is never certain what features will be relevant in an object that has not been seen before. This paper provides a brief outline of the approaches to modeling human visual attention together with some of the problems that they face. A graphical representation for image similarity is described that relies on the size of maximally associative structures (cliques) that are found to be reflected in pairs of images. While comparing an image with itself, the similarity mechanism is shown to model pop-out effects when constraints are placed on the physical separation of pixels that correspond to nodes in the maximal cliques. Background regions are found to contain structure in common that is not present in the salient regions which are thereby identified by its absence. The approach is illustrated with figures that exemplify asymmetry in pop-out, the conjunction of features, orientation disturbances and the application to natural images.
This paper provides a brief outline of the approaches to modeling human visual attention. Bottom-up and top-down
mechanisms are described together with some of the problems that they face. It has been suggested in brain science that
memory functions by trading measurement precision for associative power; sensory inputs from the environment are
never identical on separate occasions, but the associations with memory compensate for the differences. A graphical
representation for image similarity is described that relies on the size of maximally associative structures (cliques) that
are found to be reflected in pairs of images. This is applied to the recognition of movie posters, the location and
recognition of characters, and the recognition of faces. The similarity mechanism is shown to model pop-out effects
when constraints are placed on the physical separation of pixels that correspond to nodes in the maximal cliques. The
effect extends to modeling human visual behaviour on the Poggendorff illusion.
This paper describes a recognition mechanism based on the relationships between interest points and their properties that is applied to the problem of modelling the Poggendorff illusion. The recognition mechanism is shown to perform in the same manner as human
vision on the standard illusion, and reduced effects are modelled on a variant without parallels. The model shows that the effect can
be explained as a perceptual compromise between alignment of the elements in the oblique axis and their displacement from each
other in the vertical. In addition, an explanation is offered of how obtuse angled variants of the Poggendorff figures yield stronger effects than the acute angled variants.
Proc. SPIE. 6057, Human Vision and Electronic Imaging XI
KEYWORDS: Visual process modeling, Visual analytics, Visualization, Cameras, Image processing, 3D modeling, Human vision and color perception, Analytical research, Broadband telecommunications, Network architectures
This paper proposes a new algorithm that extracts color correction parameters from pairs of images and enables the perceived illumination of one image to be imposed on the other. The algorithm does not rely upon prior assumptions regarding illumination constancy and operates between images that can be significantly different in content. The work derives from related research on visual attention and similarity in which the performance distributions of large numbers
of randomly generated features reveal characteristics of images being analysed. A proposed color correction service to be offered over a mobile network is described.
This paper describes a new approach to the automatic detection of human faces and places depicted in photographs taken on cameraphones. Cameraphones offer a unique opportunity to pursue new approaches to media analysis and management: namely to combine the analysis of automatically gathered contextual metadata with media content analysis to fundamentally improve image content recognition and retrieval. Current approaches to content-based image analysis are not sufficient to enable retrieval of cameraphone photos by high-level semantic concepts, such as who is in the photo or what the photo is actually depicting. In this paper, new methods for determining image similarity are combined with analysis of automatically acquired contextual metadata to substantially improve the performance of face and place recognition algorithms. For faces, we apply Sparse-Factor Analysis (SFA) to both the automatically captured contextual metadata and the results of PCA (Principal Components Analysis) of the photo content to achieve a 60% face recognition accuracy of people depicted in our database of photos, which is 40% better than media analysis alone. For location, grouping visually similar photos using a model of Cognitive Visual Attention (CVA) in conjunction with contextual metadata analysis yields a significant improvement over color histogram and CVA methods alone. We achieve an improvement in location retrieval precision from 30% precision for color histogram and CVA image analysis, to 55% precision using contextual metadata alone, to 67% precision achieved by combining contextual metadata with CVA image analysis. The combination of context and content analysis produces results that can indicate the faces and places depicted in cameraphone photos significantly better than image analysis or context analysis alone. We believe these results indicate the possibilities of a new context-aware paradigm for image analysis.
Whilst storage and capture technologies are able to cope with huge numbers of images, image retrieval is in danger of rendering many repositories valueless because of the difficulty of access. This paper proposes a similarity measure that imposes only very weak assumptions on the nature of the features used in the recognition process. This approach does not make use of a pre-defined set of feature measurements which are extracted from a query image and used to match those from database images, but instead generates features on a trial and error basis during the calculation of the similarity measure. This has the significant advantage that features that determine similarity can match whatever image property is important in a particular region whether it be a shape, a texture, a colour or a combination of all three. It means that effort is expended searching for the best feature for the region rather than expecting that a fixed feature set will perform optimally over the whole area of an image and over every image in a database. The similarity measure is evaluated on a problem of distinguishing similar shapes in sets of black and white symbols.
Visual systems that have evolved in nature appear to exercise a mechanism that places emphasis upon areas in a scene without necessarily recognising objects that lie in those areas. This paper describes the application of a new model of visual attention to the automatic assessment of the degree of damage in cultured human lung fibroblasts. The visual attention estimator measures the dissimilarity between neighbourhoods in the image, giving higher visual attention values to neighbouring pixel configurations that do not match identical positional arrangements in other randomly selected neighbourhoods in the image. A set of tools has been implemented that processes images and produces corresponding arrays of attention values. Additional functionality has been added that provides a measure of DNA damage for images of treated lung cells affected by ultraviolet light. The unpredictability of the image attracts visual attention, with the result that greater damage is reflected by higher attention values. Results are presented that indicate that the ranking provided by the visual attention estimates compares favourably with an expert's visual assessment of the degree of damage. Potentially, visual attention estimates may provide an alternative method of calculating the efficacy of genotoxins or modulators of DNA damage in treated human cells.
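The attention estimator described in this abstract can be sketched roughly as follows. This is an illustration under stated assumptions, not the authors' tool: the image is a greyscale list of lists, and the parameters `trials`, `fork_size`, `radius` and `threshold` are hypothetical, as is the choice to always include the centre pixel in each random configuration. For every pixel, a random set of offsets is compared against the same positional arrangement at a randomly chosen location elsewhere, and each mismatch raises that pixel's attention score.

```python
import random

def attention_map(img, trials=50, fork_size=3, radius=2, threshold=20, seed=0):
    """Return a 2D array of mismatch counts; higher values mean more attention."""
    rng = random.Random(seed)  # fixed seed so repeated runs give the same map
    h, w = len(img), len(img[0])
    scores = [[0] * w for _ in range(h)]
    # Border pixels are skipped so every offset stays inside the image.
    for y in range(radius, h - radius):
        for x in range(radius, w - radius):
            for _ in range(trials):
                # Random pixel configuration around (x, y); centre always included.
                fork = [(0, 0)] + [
                    (rng.randint(-radius, radius), rng.randint(-radius, radius))
                    for _ in range(fork_size)
                ]
                # Randomly selected comparison neighbourhood elsewhere in the image.
                cy = rng.randint(radius, h - radius - 1)
                cx = rng.randint(radius, w - radius - 1)
                mismatch = any(
                    abs(img[y + dy][x + dx] - img[cy + dy][cx + dx]) > threshold
                    for dy, dx in fork
                )
                if mismatch:
                    scores[y][x] += 1
    return scores
```

On a uniform image every comparison matches and all scores stay at zero, whereas an isolated bright pixel on a uniform background mismatches almost every comparison and so receives a much higher score than its surroundings.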
This paper describes a video summarization and semantics editing tool that is suited for content-based video indexing and retrieval with appropriate human operator assistance. The whole system has been designed with a clear focus on the extraction and exploitation of motion information inherent in the dynamic video scene. The dominant motion information has been used explicitly for shot boundary detection, camera motion characterization, description of visual content variations, and key frame extraction. Various contributions have been made to ensure that the system works robustly with complex scenes and across different media types. A window-based graphical user interface has been designed to make interactive analysis and editing of semantic events and episodes very easy where appropriate.