Personal consumer photography collections often contain photos captured by numerous devices stored both locally and
via online services. The task of gathering, organizing, and assembling still and video assets in preparation for sharing
with others can be quite challenging. Current commercial photobook applications are largely manual and require
significant user interaction. To assist the consumer in organizing these assets, we propose an automatic method to
assign a fitness score to each asset, whereby the top-scoring assets are used for product creation. Our method uses cues
extracted from pixel data, metadata embedded in the file, and ancillary tags or online comments. When a
face occurs in an image, its features have a dominating influence on both the aesthetic and compositional properties of the
displayed image. As such, this paper emphasizes the contribution of faces to the overall fitness score of
an image. To understand consumer preference, we conducted a psychophysical study that spanned 27 judges, 5,598
faces, and 2,550 images. Preferences on a per-face and per-image basis were independently gathered to train our
classifiers. We describe how to use machine learning techniques to merge differing facial attributes into a single
classifier. Our novel methods of facial weighting, fusion of facial attributes, and dimensionality reduction produce
state-of-the-art results suitable for commercial applications.
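As a rough illustration of how per-face preferences might dominate a per-image fitness score, the sketch below combines face scores with a non-face score and keeps the top-scoring assets. The abstract does not specify the weighting or fusion scheme, so the area-weighted average, the `face_weight` value of 0.7, and all names (`Face`, `Asset`, `fitness`, `top_assets`) are assumptions made for illustration only, not the authors' method.

```python
from dataclasses import dataclass, field


@dataclass
class Face:
    area_fraction: float  # fraction of the image area covered by the face (0..1)
    quality: float        # hypothetical per-face preference score (0..1)


@dataclass
class Asset:
    name: str
    base_score: float                 # hypothetical score from non-face cues (0..1)
    faces: list = field(default_factory=list)


def fitness(asset, face_weight=0.7):
    """Blend an area-weighted face score with the non-face score.

    Assumption: larger faces matter more, and faces dominate the result
    when present. Both choices are illustrative, not taken from the paper.
    """
    total_area = sum(f.area_fraction for f in asset.faces)
    if total_area == 0:
        # No faces (or zero-area detections): fall back to the non-face cues.
        return asset.base_score
    face_score = sum(f.area_fraction * f.quality for f in asset.faces) / total_area
    return face_weight * face_score + (1.0 - face_weight) * asset.base_score


def top_assets(assets, k):
    """Return the k assets with the highest fitness score."""
    return sorted(assets, key=fitness, reverse=True)[:k]
```

In this sketch an image with no detected faces is ranked purely on its non-face cues, which keeps scenery shots comparable with portraits.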
A key aspect of image effectiveness is how well the image visually communicates the main subject. In consumer images,
two important features that impact viewer appreciation of the main subject are the amount of clutter and the main subject
placement within the image. Two subjective experiments were conducted to assess the relationship between aesthetic
and technical quality and perception of clutter and image center. For each experiment, 30 participants evaluated the same
70 images on 0-to-100-point scales for aesthetic and technical quality. For the clutter experiment, participants also
evaluated the images on 0-to-100-point scales for amount of clutter and main subject emphasis. For the center
experiment, participants pointed directly onto the image to mark the center of interest. Results indicate that aesthetic
quality, technical quality, amount of clutter, and main subject emphasis are strongly correlated. Based on 95%
confidence ellipses and mean-shift clustering, expert main subject maps are consistent with observer identification of
main subject location. Further, the distribution of the observer identification of the center of interest is related to the
object class (e.g., person, scenery). Additional features related to image composition can be used to explain clusters formed by patterns of mean ratings.
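The 95% confidence ellipse used to summarize where observers marked the center of interest can be sketched as follows. This is a minimal version assuming the click coordinates are approximately bivariate normal, so that the squared Mahalanobis distance follows a chi-square distribution with two degrees of freedom; the function names are hypothetical, and the abstract does not state how the ellipses were actually computed.

```python
import math

# 95% quantile of the chi-square distribution with 2 degrees of freedom.
# For 2 dof the quantile has the closed form -2 * ln(1 - p).
CHI2_95_2DOF = -2.0 * math.log(0.05)  # about 5.991


def confidence_ellipse(points):
    """Return (center, semi_major, semi_minor, angle) for 2-D points.

    angle is the orientation of the major axis in radians.
    """
    n = len(points)
    if n < 2:
        raise ValueError("need at least two points")
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    # Sample covariance matrix entries.
    sxx = sum((x - mx) ** 2 for x, _ in points) / (n - 1)
    syy = sum((y - my) ** 2 for _, y in points) / (n - 1)
    sxy = sum((x - mx) * (y - my) for x, y in points) / (n - 1)
    # Closed-form eigenvalues of the symmetric 2x2 covariance matrix.
    half_trace = (sxx + syy) / 2.0
    spread = math.hypot((sxx - syy) / 2.0, sxy)
    lam_major = half_trace + spread
    lam_minor = half_trace - spread
    if lam_minor <= 0.0:
        raise ValueError("points are collinear; ellipse is degenerate")
    angle = 0.5 * math.atan2(2.0 * sxy, sxx - syy)
    semi_major = math.sqrt(CHI2_95_2DOF * lam_major)
    semi_minor = math.sqrt(CHI2_95_2DOF * lam_minor)
    return (mx, my), semi_major, semi_minor, angle


def contains(ellipse, point):
    """Check whether a point lies inside the ellipse."""
    (mx, my), a, b, angle = ellipse
    dx, dy = point[0] - mx, point[1] - my
    # Rotate the offset into the ellipse's principal-axis frame.
    u = dx * math.cos(angle) + dy * math.sin(angle)
    v = -dx * math.sin(angle) + dy * math.cos(angle)
    return (u / a) ** 2 + (v / b) ** 2 <= 1.0
```

A check such as `contains(ellipse, expert_center)` could then test whether an expert-marked main subject location falls inside the region covered by observer clicks; `expert_center` here is a hypothetical coordinate pair.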
The primary goal of the current research was to develop image categorization algorithms that are more consistent with users' search strategies for their personal image collections. Other goals were to provide users with the option of correcting and labeling these image groups and to understand user behaviors and needs while they are using an automated image-organization system. The main focus of this paper is to provide automatic organization of images by two of the most important semantic classes in the consumer domain: events and people. Methods are described for automatically producing meaningful groups of images, in which each group depicts an event, as well as clusters of similar faces in users' collections. Given that the proposed system envisions user interaction and is intended for organizing and searching personal collections, a usability study focused on consumers was conducted to gauge the performance of the system.
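One simple way to picture event-based grouping is to split a collection wherever the gap between consecutive capture times is large. The sketch below does this with a fixed threshold; the abstract does not describe the actual event-detection algorithm, so the fixed two-hour `max_gap` and the function name `group_into_events` are assumptions made for illustration.

```python
from datetime import datetime, timedelta


def group_into_events(timestamps, max_gap=timedelta(hours=2)):
    """Group capture times into events using a fixed time-gap threshold.

    Assumption: two consecutive photos belong to the same event when they
    were taken no more than max_gap apart. Real systems typically adapt
    this threshold and may also use image content.
    """
    events = []
    for ts in sorted(timestamps):
        if events and ts - events[-1][-1] <= max_gap:
            events[-1].append(ts)   # close enough in time: same event
        else:
            events.append([ts])     # large gap: start a new event
    return events
```

Because the grouping only depends on sorted capture times, a user correction (merging or splitting two groups) can be applied directly to the returned list without re-running the grouping.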
With the increasing use of digital imaging in general consumer applications, there is a great deal of interest in developing
new products that increase the value and enjoyment level of viewing digital images in consumers' living rooms. One way
to enrich image viewing and sharing is to combine images with voice annotation and music. A picture VCD system
(PVCD) was developed for multimedia authoring, centered around and driven by still photos, with an emphasis on
composing still images with sound, including music and spoken annotations. We describe the overall system, as well as
major enabling technology components, including multimedia authoring, semantic image classification, and cross-media
indexing. The finished multimedia bit stream is primarily recorded on DVD or VCD to facilitate enriched enjoyment
through a TV set, but can also be shared on a desktop/laptop, via email, or online.
In consumer photography, image appeal may be defined by the interest that a picture generates when viewed by third-party observers. In this paper, the results of a ground truth experiment on human estimation of image appeal are reported, in which 11 participants were asked to rank pictures in 30 groups, based on their relative appeal within their group, and to comment on the factors that influenced their decisions. Based on their responses, a list of both positive and negative influences was compiled, and the influences were grouped into general categories that include people, composition, subject, and objective metrics. The results of our experiment indicate that image appeal is related to image quality only with respect to the influences in the category of objective metrics, while the majority of influences belong to the categories of people, composition, and subject. The influences in these categories are scene dependent and fundamentally different in nature from traditional image quality metrics. Thus, a new set of metrics needs to be developed for evaluating image appeal. Individual influences and their relative merits are discussed.
For video applications such as video-on-demand, signaling is an important step for service initialization, including service setup, authentication, and resource allocation. This paper describes the design and implementation of an experimental platform for delivery of real-time MPEG encoded video over a LAN with DSM-CC signaling. The platform consists of a real-time MPEG encoder system as a server, a network node workstation, and a client with MPEG decoding capability. Specifically, we have implemented a client-initiated subset of the DSM-CC user-to-network (UN) protocol for signaling. Communication between the network node and server/client is achieved by using the TCP/IP protocol. The network node provides the functionality of upstream message exchange, UN and/or UU (user-to-user) resource negotiation, and MPEG bitstream relay. The operation of our current video platform setup is quite robust and can deliver MPEG video up to about 1.5 Mb/s via DSM-CC protocols. It is demonstrated that practical implementation of video applications is feasible over small-scale LANs.
This paper presents a dual architecture for the high-speed realization of basic morphological operations. Since morphological filtering can be described as a combination of erosion and dilation, two basic building blocks are required for the realization of any morphological filter. Architectures for the two basic units, namely the erosion unit and the dilation unit, are proposed and studied in terms of cycle time, hardware complexity, and cost. These basic units are similar in structure to the systolic array architecture used in the implementation of linear digital filters. Correspondingly, the proposed units are highly modular and are suitable for efficient VLSI implementation. These basic units allow the processing of either binary or gray-scale images. They are particularly suitable for applications in robotics, where speed, size and cost are of critical importance.
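The statement that morphological filters are built from erosion and dilation can be illustrated in software. The sketch below is a plain reference implementation with a flat square structuring element, not a model of the proposed hardware units; the function names, the square window, and the border handling (windows clipped to the image) are assumptions. With a flat element, erosion is a local minimum and dilation a local maximum, so the same code handles binary (0/1) and gray-scale images.

```python
def _window_filter(image, size, reducer):
    """Apply reducer (min or max) over a size x size window at each pixel.

    size is assumed to be odd; windows are clipped at the image border.
    """
    height, width = len(image), len(image[0])
    r = size // 2
    out = [[0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            window = [
                image[j][i]
                for j in range(max(0, y - r), min(height, y + r + 1))
                for i in range(max(0, x - r), min(width, x + r + 1))
            ]
            out[y][x] = reducer(window)
    return out


def erode(image, size=3):
    """Flat erosion: each output pixel is the minimum over its window."""
    return _window_filter(image, size, min)


def dilate(image, size=3):
    """Flat dilation: each output pixel is the maximum over its window."""
    return _window_filter(image, size, max)


def opening(image, size=3):
    """Erosion followed by dilation; removes small bright details."""
    return dilate(erode(image, size), size)


def closing(image, size=3):
    """Dilation followed by erosion; fills small dark holes."""
    return erode(dilate(image, size), size)
```

Opening and closing are the two standard compositions; other morphological filters can be assembled from the same two primitives in a similar way.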