This PDF file contains the front matter associated with SPIE Proceedings Volume 7879, including the Title Page, Copyright information, Table of Contents, Introduction, and the Conference Committee listing.
Short run printing technology and web services such as MagCloud provide new opportunities for long-tail magazine
publishing. They enable self publishers to supply magazines to a wide range of communities, including groups that are
too small to be viable as target communities for conventional publishers.
In a Web 2.0 world where users constantly discover new services and where they may be infrequent patrons of any
single service, it is unreasonable to expect users to learn the complex service behaviors. Furthermore, we want to open
up publishing opportunities to novices who are unlikely to have prior experience of publishing and who lack design
Magazine design automation is an ambitious goal, but recent progress with another web service, Autophotobook, shows
that some level of automation of publication design is feasible. This paper describes our current research effort to extend
the automation capabilities of Autophotobook to address the issues of magazine design so that we can provide a service
to support professional-quality self publishing by novice users for a wide range of community types and sizes.
In personalized digital printing of products such as greeting cards, calendars and photo books, people select artwork to
match their photos according to their preferences. Artwork design elements are often categorized by occasion, style,
and product. The number of designs grows significantly as customers demand more choices and as popular design
trends rise and fade from season to season. It is crucial to manage and understand how design elements
are used in order to create the most desirable products. In this paper, we analyze and compare different design
tracking systems. Artwork designs are labeled, ranked, and cross-referenced. For each system, we describe
the scale of application, the data collection techniques, and its advantages and disadvantages.
Web article pages usually have hyperlinks (or links) that lead to print-friendly web pages containing mainly the article
content. Content extraction using these print-friendly pages is generally easier and more reliable, but there are many
variations of the print-link representations in HTML that make robust print-link detection more difficult than it first
appears. First, the link can be text-based, image-based, or both. For example, there is a lexicon of phrases used to
indicate print-friendly pages, such as "print", "print article", "print-friendly version", etc. In addition, some links use
printer-resembling image icons with or without a print phrase present. To complicate the matter further, not all the links
no URL is available for extraction. We estimate that more than 90% of Web article pages have print-links,
of which about 35% have valid print-friendly URLs. Our solution to the print-link
extraction problem has two stages: (1) detection of the print-link, and (2) retrieval of the print-friendly page
URL from the link attributes, including a test of its validity. Experimental results on roughly 2000 web article
pages suggest that our solution achieves over 99% precision and 97% recall.
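The two stages described above can be illustrated with a minimal sketch. It is not the authors' implementation: the phrase lexicon, the icon heuristic (the word "print" in an image's `alt` or `src`), and the validity test (reject empty, fragment-only, and `javascript:` links) are all assumptions made for illustration.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse

# Hypothetical lexicon; the full phrase list used in the paper is not given.
PRINT_PHRASES = ("print", "print article", "print-friendly version")


class PrintLinkFinder(HTMLParser):
    """Stage 1: collect <a> elements whose text or image icon hints at printing."""

    def __init__(self):
        super().__init__()
        self.candidates = []      # href values (possibly None) of detected print-links
        self._in_anchor = False
        self._href = None
        self._text = []
        self._icon_hint = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "a":
            self._in_anchor = True
            self._href = attrs.get("href")
            self._text = []
            self._icon_hint = False
        elif tag == "img" and self._in_anchor:
            # Assumed icon heuristic: "print" appears in the alt text or file name.
            hint = " ".join(filter(None, (attrs.get("alt"), attrs.get("src")))).lower()
            self._icon_hint = self._icon_hint or "print" in hint

    def handle_data(self, data):
        if self._in_anchor:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._in_anchor:
            text = " ".join("".join(self._text).split()).lower()
            if self._icon_hint or text in PRINT_PHRASES:
                self.candidates.append(self._href)
            self._in_anchor = False


def valid_print_url(href, base_url):
    """Stage 2: resolve the href; return None when no retrievable URL is present."""
    if not href or href.strip().lower().startswith(("javascript:", "#")):
        return None
    url = urljoin(base_url, href.strip())
    return url if urlparse(url).scheme in ("http", "https") else None


def extract_print_urls(html, base_url):
    finder = PrintLinkFinder()
    finder.feed(html)
    urls = (valid_print_url(href, base_url) for href in finder.candidates)
    return [u for u in urls if u]
```

In this sketch a link can be detected in stage 1 (for example, a printer icon wired to a script) and still yield no URL in stage 2, which is one way the gap between pages with print-links and pages with valid print-friendly URLs could arise.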
Current printing technologies enable customers to reproduce high quality, realistic, and colorful hard copies of
their digital documents. Although the activity of printing is transparent to the customers, the progression of
a customer's document through the color printing workflow (CPW) is a complex process that may alter the
colors in the print job. Given the complexity of the CPW, it is a difficult problem to diagnose the source of the
color issue. Novel tools and methods that address this challenge are beneficial for both the manufacturer and
its customers. We propose a Web-based troubleshooting tool that helps customers to self-solve color issues with
electrophotographic laser printers when printing solid colors in graphics and text. The tool helps the customer
to reconfigure his/her CPW following printing best practices. If the issue is still unresolved, the tool guides
the user to search the gamut of the printer for his/her color preference. The usability of the tool was carefully
evaluated with human subject experiments. Also, the description and organization of the troubleshooting tasks
were continuously reviewed and improved in regular meetings of the development team. In this paper, we describe
the troubleshooting strategy, the color preference search algorithm, and the results of the usability experiments.
Natural language color (NLC) was initially developed as a web-based application and then deployed in one
Xerox print driver. NLC changes the image-editing paradigm from the use of curves, sliders, and knobs, to the
use of verbal text-based commands such as "make light green much less yellowish". The technology appeals
to a common user who has no expert knowledge in color science, and this naturally leads one to think about
its use in mobile devices. A prototype GUI design for language-based color editing on the iPhone platform will
be presented that uses several of its haptic interfaces (e.g. "slot-machine", shaking, swiping, etc.). A textual
interface is provided to select a color to be modified within the image and a direction of change for the
modification. A swipe interface is provided to select a magnitude and polarity for the modification. Actions on
the textual and swipe interface are converted to natural language commands that are in turn used to derive a
color transformation that is applied to relevant portions of the image to yield a modified image. The
modifications are displayed in real time to the user.
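As a rough illustration of how the textual and swipe inputs might be combined into a natural language command, consider the sketch below. The function name, the normalized swipe range of [-1, 1], the thresholds, and the adverb "slightly" are hypothetical; only the command form "make light green much less yellowish" comes from the text above.

```python
def build_command(color, direction, swipe):
    """Turn a color name, a direction of change, and a swipe into an NLC-style command.

    `swipe` is assumed to be normalized to [-1, 1]: its sign gives the polarity
    (more/less) and its absolute value gives the magnitude of the modification.
    """
    if swipe == 0 or not -1.0 <= swipe <= 1.0:
        raise ValueError("swipe must be a non-zero value in [-1, 1]")
    magnitude = abs(swipe)
    # Hypothetical thresholds mapping swipe length to a magnitude adverb.
    if magnitude < 0.34:
        adverb = "slightly "
    elif magnitude < 0.67:
        adverb = ""
    else:
        adverb = "much "
    polarity = "more" if swipe > 0 else "less"
    return f"make {color} {adverb}{polarity} {direction}"


print(build_command("light green", "yellowish", -0.9))
```

The resulting string would then be handed to whatever component derives the color transformation; that step is outside the scope of this sketch.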
The availability of web and on-line image sharing services makes image personalization and customization a more
interesting topic. Nonetheless, designing a personalized image is a time-consuming task, requiring hours of work
by expert designers. Observing the potential opportunity to make the design process easier and more amenable
to ordinary users, we presented a semi-automatic tool for designing personalized images in the Electronic Imaging
(EI) symposium last year [1, 2].
As a follow-up, we present several improvements to the original semi-automatic tool, for both text insertion
and text replacement on planar surfaces. We also describe our effort in implementing the tool as a true web-based
service, which eliminates the need for installation of any software or packages by the user. We believe that we
have made the technology of image personalization more friendly and accessible to ordinary users.
Managing large document databases is an important task today. Being able to automatically compare
document layouts and to classify and search documents with respect to their visual appearance
proves to be desirable in many applications. We measure single page documents' similarity with
respect to distance functions between three document components: background, text, and saliency.
Each document component is represented as a Gaussian mixture distribution; and distances between
different documents' components are calculated as probabilistic similarities between corresponding
distributions. The similarity measure between documents is represented as a weighted sum of the
components' distances. Using this document similarity measure, we propose a browsing mechanism
operating on a document dataset. For these purposes, we use a hierarchical browsing environment
which we call the document similarity pyramid. It allows the user to browse a large document dataset
and to search for documents in the dataset that are similar to the query. The user can browse the
dataset on different levels of the pyramid, and zoom into the documents that are of interest.
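The weighted-sum similarity measure can be sketched as follows. The abstract does not specify which probabilistic similarity is used between mixtures, so this sketch substitutes the closed-form L2 distance between one-dimensional Gaussian mixtures; the component weights and the 1-D representation are likewise assumptions.

```python
import math


def gauss(x, mean, var):
    """Density of a 1-D Gaussian with the given mean and variance at x."""
    return math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)


def cross(p, q):
    """Integral of p(x) * q(x) for two mixtures given as [(weight, mean, var), ...]."""
    return sum(wp * wq * gauss(mp, mq, vp + vq)
               for wp, mp, vp in p for wq, mq, vq in q)


def l2_distance(p, q):
    """Closed-form L2 distance between two 1-D Gaussian mixtures (stand-in measure)."""
    return math.sqrt(max(cross(p, p) - 2 * cross(p, q) + cross(q, q), 0.0))


# Hypothetical weights for the three document components.
WEIGHTS = {"background": 0.3, "text": 0.4, "saliency": 0.3}


def document_distance(doc_a, doc_b, weights=WEIGHTS):
    """Weighted sum of the per-component distances between two documents."""
    return sum(w * l2_distance(doc_a[c], doc_b[c]) for c, w in weights.items())


def rank_by_similarity(query, dataset):
    """Return dataset keys ordered from most to least similar to the query."""
    return sorted(dataset, key=lambda name: document_distance(query, dataset[name]))
```

A ranking like this could serve as the retrieval step when searching for documents similar to a query; the pyramid structure itself is not modeled here.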
Document images are obtained regularly by rasterization of document content and as scans of printed documents.
Resizing via background and white space removal is often desired for better consumption of these images, whether on
displays or in print. While white space and background are easy to identify in images, existing methods such as naïve
removal and content aware resizing (seam carving) each have limitations that can lead to undesirable artifacts, such as
uneven spacing between lines of text or poor arrangement of content. An adaptive method based on image content is
hence needed. In this paper we propose an adaptive method to intelligently remove white space and background content
from document images. Document images are different from pictorial images in structure. They typically contain
objects (text letters, pictures and graphics) separated by uniform background, which includes both white paper space and
other uniform color background. Pixels in uniform background regions are excellent candidates for deletion if resizing
is required, as they introduce less change in document content and style, compared with deletion of object pixels. We
propose a background deletion method that exploits both local and global context. The method aims to retain the
document structural information and image quality.
Even though technology has allowed us to measure many different aspects of images, it is still a challenge to
objectively measure their aesthetic appeal. A more complex challenge is presented when an arrangement of
images is to be analyzed, such as in a photo-book page. Several approaches have been proposed to measure the
appeal of a document layout that, in general, make use of geometric features such as the position and size of a
single object relative to the overall layout. Fewer efforts have been made to include in a metric the influence of
the content and composition of images in the layout. Many of the aesthetic characteristics that graphic designers
and artists use in their daily work have been either left out of the analysis or only roughly approximated in an
effort to materialize the concepts.
Moreover, graphic design tools such as transparency and layering play an important role in the professional
creation of layouts for documents such as posters and flyers. The main goal of our study is to apply similar
techniques within an automated photo-layout generation tool. Among other design techniques, the tool makes
use of layering and transparency in the layout to produce a professional-looking arrangement of the pictures.
Two series of experiments with participants at different levels of graphic design expertise provided us with the
tools to make the results of our system more appealing. In this paper, we discuss the results of our experiments
in the context of distinct graphic design concepts.
Automatic picture orientation recognition is of great significance in many applications such as consumer gallery
management, webpage browsing, content-based searching or web printing. We try to solve this high-level classification
problem using relatively low-level features, including Spatial Color Moment (CM) and Edge Direction Histogram (EDH).
An improved distance-based classification scheme is adopted as our classifier. We propose an input-vector-rotating
strategy, which is computationally more efficient than several conventional schemes, instead of collecting and training
samples for all four classes. We then investigate classifier combination algorithms to make full use of the
complementarity between different features and classifiers. Our classifier combination methods operate at two levels:
feature-level and measurement-level. We present two classifier combination structures (parallel and cascaded) at
the measurement level with a rejection option. As a precondition for the measurement-level methods, the theory of Classifier's
Confidence Analysis (CCA) is introduced with the definition of concepts such as classifier's confidence and generalized
confidence. The classification system finally approached 90% recognition accuracy on a wide unconstrained consumer
Whiteboards support face to face meetings by facilitating the sharing of ideas, focusing attention, and summarizing.
However, at the end of the meeting participants desire some record of the information from the whiteboard. While there
are whiteboards with built-in printers, they are expensive and relatively uncommon. We consider the capture of the
information on a whiteboard with a mobile phone, improving the image quality with a cloud service, and sharing the
results. This paper describes the algorithm for improving whiteboard image quality, the user experience for both a web
widget and a smartphone application, and the necessary adaptations for providing this as a web service. The web widget,
and mobile apps for both iPhone and Android are currently freely available, and have been used by more than 50,000
There is considerable effort underway to digitize all books that have ever been printed. There is a need for a service that
can take raw book scans and convert them into Print on Demand (POD) books. Such a service augments the
digitization effort and enables access by a wider audience. To make this service practical, we have identified
three key challenges that need to be addressed: a) producing high-quality images by eliminating
artifacts that exist due to the age of the document or that are introduced during the scanning process; b) developing
an efficient automated system to process book scans with minimum human intervention; and c) building an ecosystem
that allows the target audience to discover these books.
Empowering the group collaboration and knowledge-sharing capabilities of the Universal Digital Library (UDL)
is an important task now that more than 1.5 million digitized books have been made accessible online. One
motivation for developing such a platform is the emergence of Web 2.0 in recent years, especially the rapidly
increasing popularity of Wikipedia. This paper presents our vision, which we call iULib, about where and how
UDL and Wikipedia could meet. In the first phase, we directly apply the Wiki architecture and software in UDL to
upgrade the digital library as an interactive platform that facilitates community and collaboration. Preliminary
implementation shows the feasibility and reliability of our design. Furthermore, as a free encyclopedia that
assembles contributions from different users, Wikipedia may also be used as a knowledge base for UDL. As a
result, UDL can be upgraded as an intelligent platform for information retrieval and knowledge sharing. Our
experience with the WikipediaMM task at ImageCLEF 2008 shows that the knowledge network constructed from
Wikipedia can be used to effectively expand the query semantics of image retrieval. It is expected that Wikipedia
and digital libraries can adopt each other's valuable results and best practices to their mutual benefit.
We describe a cloud-based automated-publishing platform that allows third party developers to embed our software
components into their applications, enabling their users to rapidly create documents for interactive viewing, or
fulfillment via mail or retail printing. We also describe how applications built on this platform can integrate with a
variety of different consumer digital ecosystems, and how we will address the quality and scaling challenges.
Recently, we observed a substantial increase in the users' interest in sharing their photos online in travel blogs,
social communities and photo sharing websites. An interesting aspect of these web platforms is their high
level of user-media interaction, which makes them a high-quality source of semantic annotations: users comment on
each other's photos, add external links to their travel blogs, tag each other in the social communities and
add captions and descriptions to their photos. However, while those media assets are shared online, many users
still highly appreciate the representation of these media in appealing physical photo books where the semantics
are represented in form of descriptive text, maps, and external elements in addition to their related photos.
Thus, in this paper we aim at fulfilling this need and provide an approach for creating photo books from Web
2.0 resources. We concentrate on two kinds of online shared media as resources for printable photo books:
(a) blogs, especially travel blogs, and (b) social community websites such as Facebook, which host a rapidly growing
number of shared media elements including photos. We introduce an approach to select media elements including
photos, geographical maps and texts from both blogs and social networks semi-automatically, and then use these
elements to create a printable photo book with an appealing layout. Because there can be too many selected media
elements for the resulting book, we choose the most suitable ones using content-based, social-based,
and interaction-based criteria. Additionally, we add external media elements such as geographical maps, texts
and externally hosted photos from linked resources. Having selected the important media, our approach uses a
genetic algorithm to create an appealing layout using aesthetic rules, such as positioning the photo with the
related text or map in a way that respects the golden ratio and symmetry. Distributing the media over the pages
is done by optimizing the distribution according to several rules such that no pages with purely textual elements
without photos are produced. For the page layout, appropriate photos are chosen for the background based on
their salience. Other media assets, such as texts, photos and geographical maps are positioned in the foreground
by a dynamic page layout algorithm respecting both the content of the photos and the background, and common
rules for visual layout. The result of our system is a photo book in a printable format. We implemented our
approach as web services that analyze the media elements, enrich them, and create the layout in order to finally
publish a photo book. The connection to those services is implemented in two interfaces. The first is a tool to
select entries from personal blogs, and the second is a Facebook application that allows the user to select photos
from their albums.
This paper presents a scheme which utilizes comments given to images on an image-sharing site in order to obtain an
appropriate image for insertion into poem-like weblogs (blogs) as a way to represent their atmosphere (impression). The
result shows that utilizing comments is effective. To achieve this purpose, there are two issues: how impression words
are extracted from blogs and how images representing the impression words are obtained. Assuming that it is important
to obtain images representing the impression words, this paper focuses on only the latter issue. We hypothesize that
comments and tags extracted from an image-sharing site can be adequate for obtaining images corresponding to
impression words at low cost. In particular, utilizing comments can be more appropriate for the image search with
impression words than utilizing tags because the impression words are often used in comments. Therefore, we propose a
scheme which utilizes comments to obtain appropriate images. In order to investigate the effectiveness of utilizing
comments, conformance between impression words and the images was evaluated. The rating for conformance is 3.5 on
a scale of 1 to 5 when utilizing comments, which is 0.6 higher than when utilizing tags.
Extracting informative content from Web article pages has many applications such as printing and content reuse. The title is
a very significant and unique component of an article. However, identifying the true title is not an easy problem even for
human readers. In this paper, we present a title identification method that takes into account several features, including
the title field of the HTML page and HTML tag of a DOM node as well as font size and horizontal alignment. We tested
our method on a ground truth data set consisting of 1993 pages from 98 web sites and achieved 97.5% accuracy, about
20% above a baseline method based on only the font size.
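One way to combine the four cues named above (HTML title field, HTML tag, font size, horizontal alignment) is a weighted score per DOM node, sketched below next to a font-size-only baseline. The `Node` type, the tag scores, and the feature weights are hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher


@dataclass
class Node:
    text: str
    tag: str          # HTML tag of the DOM node, e.g. "h1" or "div"
    font_size: float  # rendered font size in pixels
    centered: bool    # horizontal alignment cue

# Hypothetical per-tag scores; heading tags are assumed to be more title-like.
TAG_SCORE = {"h1": 1.0, "h2": 0.7, "h3": 0.4}


def title_score(node, page_title, max_font):
    """Weighted combination of the four features (weights are illustrative only)."""
    overlap = SequenceMatcher(None, node.text.lower(), page_title.lower()).ratio()
    return (0.40 * overlap
            + 0.25 * TAG_SCORE.get(node.tag, 0.0)
            + 0.25 * node.font_size / max_font
            + 0.10 * float(node.centered))


def identify_title(nodes, page_title):
    """Pick the candidate node with the highest combined score."""
    max_font = max(n.font_size for n in nodes)
    return max(nodes, key=lambda n: title_score(n, page_title, max_font))


def font_size_baseline(nodes):
    """Baseline that relies only on font size."""
    return max(nodes, key=lambda n: n.font_size)
```

A page whose site banner is set in the largest font shows why the baseline can fail: the banner wins on size alone, while the combined score favors the heading that matches the HTML title field.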
3D head models have many applications, such as virtual conferencing and 3D web games. Several existing web-based
face modeling solutions can create a 3D face model from one or two user-uploaded face images, but they are limited to
generating a 3D model of the face region only. The accuracy of such reconstruction is very limited for side views, as well
as for hair regions. The goal of our research is to develop a framework for reconstructing a realistic 3D human head based
on two approximately orthogonal views. Our framework takes two images and goes through segmentation, feature point
detection, 3D bald head reconstruction, 3D hair reconstruction and texture mapping to create a 3D head model. The main
contribution of the paper is that the processing steps are applied to both the face region and the hair region.
In recent years, mobile devices have quickly reached almost every corner of our daily life in a variety of forms: personal
media players, smart phones, netbooks, and tablets. Besides the more powerful, smaller, and more versatile hardware,
another driving force is the vast number of software applications ("apps") on those mobile devices. A number of mobile
apps employ intelligent multimedia understanding (MU) technologies. This paper gives an overview of such apps. The
focus is not on the underlying MU techniques, which are already covered by a huge amount of literature. Instead, it
attempts to shed some light on the junction of mobile apps and MU. For this purpose, it addresses a number of important
aspects: unique requirements and characteristics of MU-related apps, values brought in by MU, typical MU
technologies, various system architectures, available development tools, and related standards.
Being able to detect distinguishable objects is a key component in many high level computer vision
applications. Traditional methods for building such detectors require a large amount of carefully
collected and cleaned data. For example, to build a face detector, a large number of face images need
to be collected, and the faces in each image need to be cropped and aligned to serve as training data. This
process is tedious and error-prone. More and more people now share their photos on the
internet; if we could leverage these data for building a detector, it would save a tremendous amount of
effort in collecting training data. Popular internet search engines and community photo websites such as
Google image search, Picasa, and Flickr make it possible to harvest online images for image
understanding tasks. In this paper, we develop a method leveraging images obtained from online
image search to build an object detector. The proposed method can automatically identify the most
distinguishable features across the downloaded images. Using these learned features, a detector can
be built to detect the object in a new image. Experiments show promising results of our approach.
Images meant for marketing and promotional purposes (i.e. coupons) represent a basic component in incentivizing
customers to visit shopping outlets and purchase discounted commodities. They also help department stores attract
more customers and, potentially, speed up their cash flow. While coupons are available from various sources (print,
web, etc.), categorizing these monetary instruments benefits users. We are interested in an automatic categorizer
system that aggregates these coupons from different sources (web, digital coupons, paper coupons, etc.) and assigns a
type to each of these coupons in an efficient manner. While there are several dimensions to this problem, in this paper
we study the problem of accurately categorizing the coupons. We propose and evaluate four different
techniques for categorizing the coupons, namely a word-based model, an n-gram-based model, an externally weighing model,
and a weight decaying model, all of which take advantage of known machine learning algorithms. These techniques
achieve accuracies in the range of 73.1% to 93.2%. We provide various examples of accuracy optimizations
that can be performed and show a progressive increase in categorization accuracy for our test dataset.
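To make the word-based and n-gram-based models more concrete, the sketch below classifies coupon text with a multinomial naive Bayes model over word n-grams. The abstract does not say which learning algorithm underlies each model, so the choice of naive Bayes with add-one smoothing, the class name, and the example categories are assumptions.

```python
import math
from collections import Counter, defaultdict


def ngrams(text, n):
    """Word n-grams of a coupon's text; n=1 gives the word-based features."""
    words = text.lower().split()
    return [" ".join(words[i:i + n]) for i in range(len(words) - n + 1)]


class NaiveBayesCategorizer:
    """Multinomial naive Bayes over word n-grams (an assumed stand-in model)."""

    def __init__(self, n=1):
        self.n = n
        self.counts = defaultdict(Counter)  # category -> feature counts
        self.docs = Counter()               # category -> number of training coupons
        self.vocab = set()

    def fit(self, samples):
        for text, label in samples:
            feats = ngrams(text, self.n)
            self.counts[label].update(feats)
            self.docs[label] += 1
            self.vocab.update(feats)
        return self

    def predict(self, text):
        total_docs = sum(self.docs.values())

        def log_posterior(label):
            total = sum(self.counts[label].values())
            score = math.log(self.docs[label] / total_docs)
            for feat in ngrams(text, self.n):
                # Add-one smoothing so unseen features do not zero out a category.
                score += math.log((self.counts[label][feat] + 1)
                                  / (total + len(self.vocab)))
            return score

        return max(self.docs, key=log_posterior)
```

Switching `n` from 1 to 2 changes the feature set from single words to word pairs, which is the only difference between the two variants in this sketch.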
Web images constitute an important part of web documents and have become a powerful medium of expression, especially
images containing text. The text embedded in web images often carries semantic information related to the layout and
content of the pages. Statistics show that there is a significant need to detect and recognize text in web images. In this
paper, we first give a short review of the methods proposed for text detection and recognition in web images; then a
framework to extract text from web images is presented, including stages of text localization and recognition. In the text
localization stage, a localization method is applied to generate text candidates and a two-stage strategy is utilized to select
text candidates; text regions are then localized using a coarse-to-fine text line extraction algorithm. For text recognition,
two text region binarization methods are proposed to improve the performance of text recognition in web images.
Experimental results for text localization and recognition demonstrate the effectiveness of these methods. Additionally, a
recognition evaluation for text regions in web images has been conducted as a benchmark.
Web image annotation has become an important issue with the explosive growth of web images and the need for effective image
search. Social tags have recently been utilized for image annotation because they reflect users' tagging tendencies
and reduce the semantic gap. However, an effective filtering procedure is required to extract the relevant tags because of
user subjectivity and noisy tags. In this paper, we propose a two-step filtering of social tags for image annotation. This
method conducts the filtering and verification tasks by analyzing the tags of visually neighboring images using a voting method
and co-occurrence analysis. Our method consists of the following three steps: 1) the tag candidate set is found by
searching the visual neighbor images, 2) from the given tag candidate set, coarse filtering is conducted by tag grouping and
a voting technique, 3) dense filtering is conducted by using similarity verification on the coarse-filtered candidate tag set.
To evaluate the performance of our approach, we conduct experiments on a social-tagged image dataset obtained
from Flickr. We compare the annotation accuracy of the voting method and our proposed method. Our
experimental results show that our method improves image annotation.
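The coarse and dense filtering steps can be sketched as below, assuming the visual neighbor search has already returned the neighbors' tag lists. The vote threshold, the use of Jaccard co-occurrence as the similarity check, and the similarity threshold are all assumptions; tag grouping is omitted.

```python
from collections import Counter
from itertools import combinations


def coarse_filter(neighbor_tags, min_votes=2):
    """Voting: keep tags used by at least `min_votes` visual neighbors."""
    votes = Counter(tag for tags in neighbor_tags for tag in set(tags))
    return {tag for tag, count in votes.items() if count >= min_votes}


def cooccurrence(corpus):
    """Build a Jaccard co-occurrence function from a corpus of tag lists."""
    single, pair = Counter(), Counter()
    for tags in corpus:
        uniq = sorted(set(tags))
        single.update(uniq)
        pair.update(combinations(uniq, 2))

    def jaccard(a, b):
        both = pair[tuple(sorted((a, b)))]
        union = single[a] + single[b] - both
        return both / union if union else 0.0

    return jaccard


def dense_filter(candidates, jaccard, min_sim=0.3):
    """Verification: keep tags that co-occur strongly with the other candidates."""
    kept = set()
    for tag in candidates:
        others = [t for t in candidates if t != tag]
        if others and sum(jaccard(tag, t) for t in others) / len(others) >= min_sim:
            kept.add(tag)
    return kept


def annotate(neighbor_tags, corpus):
    """Coarse voting followed by dense co-occurrence verification."""
    return dense_filter(coarse_filter(neighbor_tags), cooccurrence(corpus))
```

In this sketch a subjective tag such as "me" can survive the vote if several neighbors use it, and is then removed by the co-occurrence check because it rarely appears together with the other surviving tags.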