This PDF file contains the front matter associated with SPIE Proceedings Volume 8664, including the Title Page, Copyright Information, Table of Contents, Introduction, and Conference Committee listing.
This paper describes how videos can be implemented into printed photo books. More than half of consumers take videos
with DSCs, while the rest use camcorders, smartphones, and other devices. This means that consumers who use DSCs to
capture the images used in their photo books are an ideal target group for a new service, enabling them to use the CEWE
PHOTOBOOK software to select scenes (frames) from their videos and have them printed together with a QR code in
the book. Once the customer receives the printed product, they can scan the QR code with any smart phone or tablet and
the video clip will be played on the mobile device.
The popularity of tablets in recent years has given rise to a new style of presenting web content, one that mimics the appearance of print magazines. We present a pipeline that automatically harvests and reformats web content from arbitrary sources into a paginated multi-column form and delivers the publication to the reader via print or mobile. We are working to integrate this pipeline into HP’s Scheduled Delivery program.
Over one million interactive whiteboards (IWBs) are sold annually worldwide, predominantly for classroom use, with few sales for corporate use. Unmet needs for corporate IWB use were investigated, and the CloudBoard Research Platform (CBRP) was developed to investigate and test technology for meeting these needs. The CBRP supports audio conferencing with shared remote drawing activity, casual capture of whiteboard activity for long-term storage and retrieval, and use of standard formats such as PDF for easy import of documents via the web and email as well as easy export. Company RFID badges and key fobs provide secure access to documents at the board, and automatic logout occurs after a period of inactivity. Users manage their documents with a web browser. Analytics and remote device management are provided for administrators. The IWB hardware consists of off-the-shelf components (a Hitachi UST projector, SMART Technologies, Inc. IWB hardware, a Mac Mini, a Polycom speakerphone, etc.) and a custom occupancy sensor. Three back-end servers provide the web interface, document storage, and stroke and audio streaming. Ease of use, security, and robustness sufficient for internal adoption were achieved. Five of the 10 boards installed at various Ricoh sites have been in daily or weekly use for the past year, and total system downtime was less than an hour in 2012. Since CBRP was installed, 65 registered users, 9 of whom use the system regularly, have created over 2600 documents.
This paper provides an overview of a system for the automatic composition of publications. The system first composes nested hierarchies of contents, then applies layout engines at branch points in the hierarchies to explore layout options, and finally selects the best overall options for the finished publications. Although the system has been developed as a general platform for automated publishing, this paper describes its application to the composition and layout of a magazine-like publication for social content from Facebook. The composition process works by assembling design fragments that have been populated with text and images from the Facebook social network. The fragments constitute a design language for a publication. Each design fragment is a nested, mutable sub-layout that has no specific size or shape until after it has been laid out. The layout process balances the space requirements of the fragment’s internal contents with its external context in the publication. The mutability of sub-layouts requires that their layout options be kept open until all the other contents that share the same space have been considered. Coping with large numbers of options is one of the greatest challenges in layout automation. Most existing layout methods work by rapidly eliminating design options rather than by keeping options open. A further goal of this publishing system is to confirm that a custom publication can be generated quickly by the described methods. In general, the faster publications can be created, the greater the opportunities for the technology.
To increase the flexibility and enrich the reading experience of e-books on small portable screens, a graph-based method is proposed to perform layout analysis on Portable Document Format (PDF) documents. Digitally born documents have inherent advantages, such as representing texts and fractional images in explicit form, which can be straightforwardly exploited. To integrate traditional image-based document analysis with the inherent metadata provided by a PDF parser, the page primitives, including text, image, and path elements, are processed to produce text and non-text layers for separate analysis. A graph-based method is developed at the superpixel representation level, and page text elements corresponding to vertices are used to construct an undirected graph. The Euclidean distance between adjacent vertices is applied in a top-down manner to cut the graph tree formed by Kruskal’s algorithm, and edge orientation is then used in a bottom-up manner to extract text lines from each subtree. Non-textual objects, on the other hand, are segmented by connected component analysis. For each segmented text and non-text composite, a 13-dimensional feature vector is extracted for labelling purposes. Experimental results on selected pages from PDF books are presented.
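The tree-cutting step described in the abstract can be illustrated with a minimal sketch: build a minimum spanning tree over text-element positions with Kruskal's algorithm, then cut every edge whose Euclidean length exceeds a threshold, leaving one connected subtree per text block. The coordinates and threshold below are hypothetical stand-ins for real page-element centroids.

```python
# Minimal sketch of MST-based grouping: Kruskal's algorithm over element
# centroids, cutting edges longer than a threshold (the top-down cut).
import math
from itertools import combinations

def find(parent, x):
    while parent[x] != x:
        parent[x] = parent[parent[x]]  # path compression
        x = parent[x]
    return x

def group_text_elements(points, cut_threshold):
    """Return groups of point indices connected by MST edges <= cut_threshold."""
    # Kruskal: consider all candidate edges in order of Euclidean distance.
    edges = sorted(
        (math.dist(points[i], points[j]), i, j)
        for i, j in combinations(range(len(points)), 2)
    )
    parent = list(range(len(points)))
    for dist, i, j in edges:
        if dist > cut_threshold:
            break  # every remaining MST edge would be cut anyway
        ri, rj = find(parent, i), find(parent, j)
        if ri != rj:
            parent[rj] = ri  # accept the MST edge, merge the components
    groups = {}
    for idx in range(len(points)):
        groups.setdefault(find(parent, idx), []).append(idx)
    return list(groups.values())

# Two well-separated clusters of hypothetical text-element centroids.
pts = [(0, 0), (1, 0), (0, 1), (10, 10), (11, 10)]
print(group_text_elements(pts, cut_threshold=3.0))  # → [[0, 1, 2], [3, 4]]
```

The paper's bottom-up pass (using edge orientation to extract text lines from each subtree) would then operate within each returned group.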
Document aesthetics measures are key to automated document composition. Recently we presented a probabilistic document model (PDM), a micro-model for document aesthetics based on probabilistic modeling of designer choice in document design. The PDM comes with efficient layout synthesis algorithms once the aesthetic model is defined. A key element of this approach is an aesthetic prior on template parameters that encodes aesthetic preferences; the parameters of this prior previously had to be chosen empirically by designers. In this work we show how probabilistic template models (and hence the PDM cost function) can be learnt directly by observing a designer making design choices while composing sample documents. From such training data, our learning approach can produce a quality measure that mimics some of the design tradeoffs a designer makes in practice.
Automated publishing requires large databases containing document page layout templates. The number of layout templates that need to be created and stored grows exponentially with the complexity of the document layouts. A better approach for automated publishing is to reuse layout templates of existing documents for the generation of new documents. In this paper, we present an algorithm for template extraction from a document page image. We use the cost-optimized segmentation algorithm (COS) to segment the image, and Voronoi decomposition to cluster the text regions. Then, we create a block image where each block represents a homogeneous region of the document page. We construct a geometrical tree that describes the hierarchical structure of the document page. We also implement a font recognition algorithm to analyze the font of each text region. We present a detailed description of the algorithm and our preliminary results.
In the design of a magazine cover, making a set of decisions regarding the color distribution of the cover image and the colors of other graphical and textual elements constitutes the concept of color design. This concept raises a number of subjective challenges, specifically how to determine a set of colors that is aesthetically pleasing yet also contributes to the functionality of the design, the legibility of textual elements, and the stylistic consistency of the class of magazine. Our solution to automatic color design quantifies these challenges by deploying a number of well-known color theories. These color theories span both color harmony and color semantics. The former includes a set of geometric structures that suggest which colors are in harmony together. The latter offers a higher level of abstraction: color semantics bridges sets of color combinations with color-mood descriptors. For automatic design, we deploy these two viewpoints by applying geometric structures to the design of text color and color semantics to the selection of cover images.
Consumer photos are typically authored once, but need to be retargeted for reuse in various situations. These
include printing a photo on different size paper, changing the size and aspect ratio of an embedded photo to
accommodate the dynamic content layout of web pages or documents, adapting a large photo for browsing on
small displays such as mobile phone screens, and improving the aesthetic quality of a photo that was badly
composed at capture time. In this paper, we propose a novel, effective, and comprehensive content-aware
automatic cropping (hereafter referred to as “autocrop”) method for consumer photos to achieve the above
purposes. Our autocrop method combines the state-of-the-art context-aware saliency detection algorithm, which
aims to infer the likely intent of the photographer, and the “branch-and-bound” efficient subwindow search
optimization technique, which seeks to locate the globally optimal cropping rectangle efficiently. Unlike
most current autocrop methods, which can only crop a photo into an arbitrary rectangle, our autocrop method
can automatically crop a photo into either a rectangle of arbitrary dimensions or a rectangle of the desired aspect
ratio specified by the user. The aggressiveness of the cropping operation may be either automatically determined
by the method or manually indicated by the user with ease. In addition, our autocrop method is extended to
support the cropping of a photo into non-rectangular shapes such as polygons of any number of sides. It may also
be potentially extended to return multiple cropping suggestions, which will enable the creation of new photos to
enrich the original photo collections. Our experimental results show that the proposed autocrop method in this
paper can generate high-quality crops for consumer photos of various types.
In this paper, we present improvements to image selection and image layout for automatic photobook generation algorithms. These improvements are designed to help the user easily create a photo album that matches the user’s preferences and strengthens the aesthetic quality of the photobook. Image content, composition, and metadata are utilized to determine the set of selected images and to suggest the layout of each page.
Recommender systems seek to predict the interest a user would find in an item, person, or social element they have not yet considered, based upon the properties of the item, the user's past experience, and similar users. However, recommended items are often presented to the user with no context and no ability to influence the results. We present a novel visualization technique for recommender systems in which a user can see the items recommended for them and understand why they were recommended. Focusing on a user, we render a planar visualization listing a set of recommended items. The items are organized such that similar items reside near each other on the screen, centered around categories generated in real time. We use a combination of iconography, text, and tag clouds, make maximal use of screen real estate, and keep items from overlapping. We apply our visualization to expert relevance maps in the enterprise and to a book recommendation system for consumers. The latter is based on Shelfari, a social network for reading and books.
For objects with the same texture but different colors, it is difficult to discriminate between them with the traditional scale-invariant feature transform (SIFT) descriptor, because it is designed for grayscale images only. It is therefore important to ensure a high probability that the key points used form correct pairs. In addition, evenly distributed key points are preferable to overly dense, clustered key points for image matching and other applications. In this paper, we address these two problems. First, we propose a color- and scale-invariant method to extract more evenly distributed key points, which is invariant to illumination intensity but remains sensitive to object reflectance. Second, we reduce the accumulated error in each key point’s canonical orientation by dispersing each pixel’s gradient direction relative to the current key point. Finally, we build the descriptors on a Gaussian pyramid and match the key points with our enhanced two-way matching rules. Experiments are performed on the Amsterdam Library of Object Images dataset and on manually synthesized images. The results show that the extracted key points are more evenly distributed and more numerous than those of SIFT. The feature descriptors can discriminate well between images with the same content and texture but different colors.
Nowadays, video has gradually become the mainstream dissemination medium owing to its rich information capacity and intelligibility, and text in videos often carries significant semantic information, making a substantial contribution to video content understanding and to the construction of content-based video retrieval systems. Text-based video analysis usually consists of text detection, localization, tracking, segmentation, and recognition. A large amount of research has been done on video text detection and tracking, but most solutions focus on processing text content in static frames, and few make full use of the redundancy between video frames. In this paper, a unified framework for text detection, localization, and tracking in video frames is proposed. We select the edge and corner distribution of text blocks as text features, on which localization and tracking are performed. By making good use of the redundancy between frames, location relations and motion characteristics are determined, effectively reducing false alarms and raising the localization accuracy. Tracking schemes are proposed for static and rolling texts respectively. Through multi-frame integration, text quality is improved, and so is OCR accuracy. Experiments demonstrate the reduction of false alarms and the increase in localization and recognition accuracy.
Current image vectorization techniques mainly deal with images having simple and plain colors. For full-color photographs, many difficulties still exist in object segmentation, feature line extraction, and color distribution. In this paper, we propose a high-efficiency image vectorization method based on importance sampling and triangulation. A set of blue-noise sampling points is first generated on the image plane by an improved error-diffusion sampling method. The point set preserves the features in the image well. Then, after triangulation on this point set, color information can be recorded on the mesh vertices to form a vector image. After image editing, e.g. scaling or transforming, the whole image can be reconstructed by interpolating colors inside each triangle. Experiments show that the method is highly efficient and preserves features well. It can benefit many applications, e.g. image compression, editing, transmission, and resolution enhancement.
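The reconstruction step above (interpolating colors inside each triangle from its vertices) is standard barycentric interpolation, sketched below with a hypothetical triangle and vertex colors.

```python
# Sketch of per-triangle color reconstruction: a pixel's color is the
# barycentric-weighted blend of the colors stored at the triangle's vertices.

def barycentric(p, a, b, c):
    """Barycentric coordinates of point p with respect to triangle (a, b, c)."""
    det = (b[1]-c[1])*(a[0]-c[0]) + (c[0]-b[0])*(a[1]-c[1])
    w1 = ((b[1]-c[1])*(p[0]-c[0]) + (c[0]-b[0])*(p[1]-c[1])) / det
    w2 = ((c[1]-a[1])*(p[0]-c[0]) + (a[0]-c[0])*(p[1]-c[1])) / det
    return w1, w2, 1.0 - w1 - w2

def interpolate_color(p, verts, colors):
    w = barycentric(p, *verts)
    # Blend each RGB channel by the barycentric weights.
    return tuple(sum(wi * col[ch] for wi, col in zip(w, colors))
                 for ch in range(3))

verts = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
colors = [(255, 0, 0), (0, 255, 0), (0, 0, 255)]  # RGB stored at each vertex
print(interpolate_color((1/3, 1/3), verts, colors))  # equal blend at the centroid
```

Because the interpolation is resolution-independent, the same mesh reconstructs the image at any scale after editing.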
Effective local feature extraction is one of the fundamental tools for retrieval applications in computer vision. However, it is difficult to achieve distinguishable local features under large viewpoint variation. In this paper, we propose a novel non-iterative approach to normalized feature extraction under large viewpoint variation, which adapts local regions to the rotation, scale change, and rigid distortion arising from affine transformation. Our approach is based on two key ideas: 1) localization and scale selection can be achieved directly from the centroid and covariance matrix of the spatial distribution of pixels in a local region; 2) principal component analysis (PCA) on the intensity gradients gives information on texture, so it can be used to obtain a resampled region that is isotropic in terms of gradient variance. Experiments demonstrate that our normalized approach yields a significant improvement in matching score under large viewpoint variation.
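The second idea can be illustrated with a minimal sketch: estimate the 2x2 covariance of the intensity gradients in a region, then apply the inverse square root of that matrix so the transformed region has isotropic gradient variance. The gradient samples below are synthetic stand-ins for per-pixel (gx, gy) values; the closed-form 2x2 eigendecomposition replaces a general PCA routine.

```python
# Gradient-covariance whitening: make the variance of gradients isotropic.
import math

def covariance2(samples):
    """Mean-centered 2x2 covariance (a, b, c) of 2D samples: [[a, b], [b, c]]."""
    n = len(samples)
    mx = sum(g[0] for g in samples) / n
    my = sum(g[1] for g in samples) / n
    a = sum((g[0]-mx)**2 for g in samples) / n
    b = sum((g[0]-mx)*(g[1]-my) for g in samples) / n
    c = sum((g[1]-my)**2 for g in samples) / n
    return a, b, c

def inv_sqrt2(a, b, c):
    """Closed-form inverse square root of an SPD matrix [[a, b], [b, c]]."""
    half_tr, d = (a + c) / 2, math.hypot((a - c) / 2, b)
    l1, l2 = half_tr + d, half_tr - d            # eigenvalues, l1 >= l2 > 0
    if abs(b) < 1e-12:
        v = (1.0, 0.0) if a >= c else (0.0, 1.0)  # already axis-aligned
    else:
        v = (l1 - c, b)
        norm = math.hypot(*v)
        v = (v[0]/norm, v[1]/norm)               # eigenvector for l1
    w = (-v[1], v[0])                            # orthogonal eigenvector for l2
    s1, s2 = 1/math.sqrt(l1), 1/math.sqrt(l2)
    return [[s1*v[0]*v[0] + s2*w[0]*w[0], s1*v[0]*v[1] + s2*w[0]*w[1]],
            [s1*v[1]*v[0] + s2*w[1]*w[0], s1*v[1]*v[1] + s2*w[1]*w[1]]]

# Anisotropic synthetic gradients: far more variance along x than along y.
grads = [(4, 1), (-4, -1), (2, 0), (-2, 0), (0, 1), (0, -1)]
M = inv_sqrt2(*covariance2(grads))
whitened = [(M[0][0]*gx + M[0][1]*gy, M[1][0]*gx + M[1][1]*gy) for gx, gy in grads]
print(covariance2(whitened))  # close to the identity: (~1, ~0, ~1)
```

In the paper's setting, the same whitening transform would drive the resampling of the local image region rather than being applied to the gradient list directly.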
Most packaging is printed using spot colors to reduce cost, produce consistent colors, and achieve a wide color gamut on the package. Most watermarking techniques are designed to embed a watermark in cyan, magenta, yellow, and black for printed images or red, green, and blue for displayed digital images. Our method addresses the problem of watermarking spot color images. An image containing two or more spot colors is embedded with a watermark in two of the colors with the maximum signal strength within a user-selectable visibility constraint. The user can embed the maximum watermark signal while meeting the required visibility constraint. The method has been applied to the case of two spot colors and images have been produced that are more than twice as robust to Gaussian noise as a single color image embedded with a luminance-only watermark with the same visibility constraint.
In this paper, a QR code with a dual-resolution structure is presented. It contains a high-resolution layer that is coded in luminance and is consistent with the conventional QR code, and a low-resolution layer, coded in chrominance and robust to blurring, that provides additional error-checking information. The proposed QR code is compatible with its underlying conventional black-and-white barcode, as it can be read by conventional decoders. Its advantage is additional reliability when a color decoder is used. In particular, it enhances decoding accuracy for barcodes printed in small sizes and read on devices such as mobile phones.
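The luminance/chrominance separation this design relies on can be illustrated with the standard full-range ITU-R BT.601 RGB-to-YCbCr conversion; the paper's actual encoding details are not reproduced here, and the pixel values are arbitrary examples.

```python
# Illustration of the luminance/chrominance split: the high-resolution layer
# lives in Y, so a conventional (grayscale) decoder is unaffected by data
# carried in the chrominance channels Cb/Cr.

def rgb_to_ycbcr(r, g, b):
    """Full-range BT.601 conversion: Y is luminance, Cb/Cr are chrominance."""
    y  =  0.299 * r + 0.587 * g + 0.114 * b
    cb = -0.168736 * r - 0.331264 * g + 0.5 * b + 128
    cr =  0.5 * r - 0.418688 * g - 0.081312 * b + 128
    return y, cb, cr

# Pure black and white differ only in Y (the conventional QR layer);
# neutral gray pixels carry no chrominance signal (Cb = Cr = 128).
print(rgb_to_ycbcr(0, 0, 0))        # (0.0, 128.0, 128.0)
print(rgb_to_ycbcr(255, 255, 255))  # Y near 255, Cb and Cr near 128
```

Shifting a module's color away from neutral changes Cb/Cr while leaving Y (and hence the conventional decode) essentially intact, which is what lets the second layer ride along invisibly to black-and-white decoders.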
We are developing tangible imaging systems [1-4] that enable natural interaction with virtual objects. Tangible imaging systems are based on consumer mobile devices that incorporate electronic displays, graphics hardware, accelerometers, gyroscopes, and digital cameras, in laptop or tablet-shaped form factors. Custom software allows the orientation of a device and the position of the observer to be tracked in real time. Using this information, realistic images of three-dimensional objects with complex textures and material properties are rendered to the screen, and tilting or moving in front of the device produces realistic changes in surface lighting and material appearance. Tangible imaging systems thus allow virtual objects to be observed and manipulated as naturally as real ones, with the added benefit that object properties can be modified under user control. In this paper we describe four tangible imaging systems we have developed: the tangiBook, our first implementation on a laptop computer; tangiView, a more refined implementation on a tablet device; tangiPaint, a tangible digital painting application; and phantoView, an application that takes the tangible imaging concept into stereoscopic 3D.
People are often the most important subjects in videos. It is highly desirable to automatically summarize the occurrences of different people in a large collection of videos and to quickly find the video clips containing a particular person among them. In this paper, we present a person-based video summarization and retrieval system named VideoWho, which extracts temporal face sequences in videos and groups them into clusters, with each cluster containing video clips of the same person. This is accomplished with advanced face detection and tracking algorithms, together with a semi-supervised face clustering approach. The system achieved good clustering accuracy when tested on a hybrid video set including home videos, TV plays, and movies. On top of this technology, a number of applications can be built, such as automatic summarization of major characters in videos, person-related video search on the Internet, and personalized UI systems.