Extracting informative content from Web article pages has many applications, such as printing and content reuse. The title
is a significant and unique component of an article. However, identifying the true title is not easy, even for human
readers. In this paper, we present a title identification method that takes into account several features, including the
title field of the HTML page, the HTML tag of a DOM node, font size, and horizontal alignment. We tested our method on a
ground truth data set consisting of 1993 pages from 98 web sites and achieved 97.5% accuracy, about 20% higher than a
baseline method based only on font size.
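The following Python sketch makes the combination of features concrete. It scores each candidate DOM node using four signals: similarity to the HTML `<title>` field, a bonus for heading tags, font size relative to the largest font on the page, and whether the node is centered. It then compares the result with a font-size-only baseline. The weights, the `Candidate` structure, and the use of `difflib.SequenceMatcher` for title similarity are assumptions for illustration, not the method or parameters evaluated in this paper.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher
from typing import List

@dataclass
class Candidate:
    text: str
    tag: str          # HTML tag of the DOM node, e.g. "h1" or "div"
    font_size: float  # rendered font size in pixels
    align: str        # horizontal alignment: "left", "center" or "right"

# Hypothetical weights and tag bonuses, chosen for illustration only.
TAG_BONUS = {"h1": 1.0, "h2": 0.7, "h3": 0.4}
W_TITLE, W_FONT, CENTER_BONUS = 2.0, 1.0, 0.3

def title_similarity(text: str, page_title: str) -> float:
    """Similarity in [0, 1] between a candidate and the HTML <title> field."""
    return SequenceMatcher(None, text.lower(), page_title.lower()).ratio()

def score(c: Candidate, page_title: str, max_font: float) -> float:
    """Hypothetical linear combination of the four features."""
    return (W_TITLE * title_similarity(c.text, page_title)
            + TAG_BONUS.get(c.tag, 0.0)
            + W_FONT * c.font_size / max_font
            + (CENTER_BONUS if c.align == "center" else 0.0))

def identify_title(cands: List[Candidate], page_title: str) -> Candidate:
    """Pick the candidate with the highest combined feature score."""
    max_font = max(c.font_size for c in cands)
    return max(cands, key=lambda c: score(c, page_title, max_font))

def baseline_title(cands: List[Candidate]) -> Candidate:
    """Baseline: the node with the largest font size wins."""
    return max(cands, key=lambda c: c.font_size)

if __name__ == "__main__":
    page_title = "Solar Output Hits Record | Example News"
    cands = [
        Candidate("Example News", "div", 36, "center"),          # site banner
        Candidate("Solar Output Hits Record", "h1", 28, "left"),  # headline
        Candidate("By Jane Doe, staff writer", "p", 14, "left"),  # byline
    ]
    print(identify_title(cands, page_title).text)  # Solar Output Hits Record
    print(baseline_title(cands).text)              # Example News
```

In this invented example the site banner has the largest font, so the baseline picks it. The combined score instead favors the `h1` headline, which also closely matches the page title. This illustrates the kind of error a font-size-only baseline can make, but the data here are made up.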

Advertising today provides the revenue model that sustains the WWW ecosystem. Targeted or contextual ad insertion plays
an important role in optimizing the financial return of this model. Nearly all current ads on web sites are designed for
on-screen display, such as banner and "pay-per-click" ads. Little attention, however, has been paid to deriving
additional ad revenue when the content is repurposed for an alternative means of presentation, e.g., printing. Although
more and more content is moving to the Web, there are still many occasions where a printed copy of web content, such as
a map or an article, is desirable; thus, inserting ads into printouts can potentially be lucrative. In this paper, we
describe a contextual ad insertion network aimed at generating new revenue for print service providers from web printing.
We introduce a cloud print service that inserts contextual ads, matched to the main content of the web page, when a
printout of the page is requested. To encourage use of the service, it is designed to provide higher-quality printouts
than current browser print drivers, which generally produce poor output such as badly formatted pages. For now, we limit
the scope to article pages, although the concept can be extended to arbitrary web pages. The key components of this
system are (1) extraction of the article from a web page, (2) extraction of semantics from the article, (3) querying an
ad database for a matching advertisement or coupon, and (4) joint layout of content and ads for print output.
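Steps (2) and (3) can be illustrated with a deliberately simple stand-in. The sketch below extracts keywords from the article by term frequency and then ranks advertisements or coupons by keyword overlap. The `Ad` schema, the stopword list, and the overlap score are hypothetical. The paper's actual semantic extraction and ad-database query may work quite differently, and step (4), joint layout, is not shown.

```python
import re
from collections import Counter
from dataclasses import dataclass
from typing import FrozenSet, List

# Hypothetical stopword list; a production system would use a fuller one.
STOPWORDS = {"the", "and", "for", "with", "from", "that", "this", "are", "was", "should"}

@dataclass
class Ad:
    ad_id: str
    keywords: FrozenSet[str]  # terms the advertiser targets (hypothetical schema)
    kind: str = "ad"          # "ad" or "coupon"

def extract_keywords(article_text: str, k: int = 5) -> List[str]:
    """Stand-in for step (2): the k most frequent non-stopword terms."""
    words = re.findall(r"[a-z]+", article_text.lower())
    counts = Counter(w for w in words if len(w) > 2 and w not in STOPWORDS)
    return [w for w, _ in counts.most_common(k)]

def match_ads(keywords: List[str], ad_db: List[Ad], limit: int = 1) -> List[Ad]:
    """Stand-in for step (3): rank ads by keyword overlap, dropping non-matches."""
    kw = set(keywords)
    scored = [(len(kw & ad.keywords), ad) for ad in ad_db]
    ranked = sorted((p for p in scored if p[0] > 0), key=lambda p: p[0], reverse=True)
    return [ad for _, ad in ranked[:limit]]

if __name__ == "__main__":
    ad_db = [
        Ad("outdoor-gear", frozenset({"hiking", "boots", "trail"}), "coupon"),
        Ad("car-loan", frozenset({"car", "loan", "finance"})),
        Ad("forest-retreat", frozenset({"forest", "cabin"})),
    ]
    article = ("The hiking trail climbs through the forest. Hikers should "
               "bring boots and water. The trail is steep.")
    keywords = extract_keywords(article)
    print(keywords)
    print([ad.ad_id for ad in match_ads(keywords, ad_db, limit=2)])
```

Keyword overlap is the simplest contextual signal. A matched coupon would then be handed to the layout step together with the extracted article.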

The publishing industry is undergoing a major paradigm shift with the advent of digital publishing technologies, and many components of the publishing and print production workflow are being transformed. However, the process as a whole still requires a great deal of human intervention for decision making and for resolving exceptions during job execution. Furthermore, most best-of-breed applications for publishing and print production are inherently designed to be driven by humans. As a result, the human-intensive nature of the current prepress process accounts for a significant share of the overhead cost of fulfilling jobs on press. One challenge is to automate applications built around a human-driven execution model. Another is to orchestrate the various components of the publishing and print production pipeline so that they work together seamlessly, enabling the system to detect potential failures automatically and take corrective action proactively. There is therefore a strong need for a coherent, unifying workflow architecture that streamlines and automates the process as a whole, yielding an end-to-end automated digital print production workflow that requires no human intervention. This paper describes an architecture and a set of building blocks that lay the foundation for a range of automated print production workflows.
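As a rough illustration of automatic failure detection with corrective action, the sketch below runs pipeline stages in order. When a stage reports an error, the orchestrator looks up a registered fix for that error code, applies it, and retries the stage. It escalates only when no fix applies or retries run out. The stage names, error codes, and the `fetch_high_res_image` action are hypothetical placeholders, not components of the architecture described in this paper.

```python
from typing import Callable, Dict, List

class JobError(Exception):
    """Raised by a stage; `code` names the detected problem (hypothetical codes)."""
    def __init__(self, code: str):
        super().__init__(code)
        self.code = code

Stage = Callable[[dict], None]
Fix = Callable[[dict], None]

def run_pipeline(job: dict, stages: List[Stage], fixes: Dict[str, Fix],
                 max_retries: int = 2) -> List[str]:
    """Run stages in order; on a known error, apply its fix and retry the stage."""
    log: List[str] = []
    for stage in stages:
        for attempt in range(max_retries + 1):
            try:
                stage(job)
                log.append(f"{stage.__name__}: ok")
                break
            except JobError as err:
                fix = fixes.get(err.code)
                if fix is None or attempt == max_retries:
                    raise  # no automatic remedy left: escalate to a human
                fix(job)
                log.append(f"{stage.__name__}: fixed {err.code}")
    return log

# Hypothetical stages and corrective action, for illustration only.
def preflight(job: dict) -> None:
    if job["image_dpi"] < 300:
        raise JobError("low_res")

def impose(job: dict) -> None:
    job["imposed"] = True

def fetch_high_res_image(job: dict) -> None:
    job["image_dpi"] = 300  # placeholder for swapping in a high-resolution asset

if __name__ == "__main__":
    job = {"image_dpi": 150}
    for line in run_pipeline(job, [preflight, impose], {"low_res": fetch_high_res_image}):
        print(line)
```

Keeping the fixes in a registry separate from the stages lets new corrective actions be added without modifying the components themselves. That separation is one plausible way to orchestrate tools that were originally designed to be human-driven.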

Running a targeted campaign involves coordination and management across numerous organizations and complex process flows. Every step, including market analytics on customer databases, content and image acquisition, composition of the materials, compliance with the sponsoring enterprise's brand standards, production and fulfillment, and evaluation of results, is currently performed by experienced, highly trained staff. We present a solution that not only brings together technologies that automate each process but also automates the entire flow, so that a novice user can easily run a successful campaign from the desktop. This paper presents the technologies, structure, and process flows used to build this system, and highlights how these technologies hide the complexity of running a targeted campaign from the user while still providing the benefits of a professionally managed campaign.
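The sketch below shows one way a single call could chain several of the campaign steps named above, so that the user never manages them individually. It covers target selection, composition, a brand-standard check, and hand-off to production, with result evaluation as a separate helper. Every name, the propensity threshold, and the color-palette rule are hypothetical simplifications. The system described in this paper integrates far richer technologies at each step.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Customer:
    name: str
    segment: str
    score: float  # hypothetical propensity score from market analytics

@dataclass
class Piece:
    customer: str
    text: str
    color: str

BRAND_COLORS = {"#003366", "#FFFFFF"}  # hypothetical approved brand palette

def select_targets(customers: List[Customer], min_score: float = 0.5) -> List[Customer]:
    """Analytics stand-in: keep customers at or above a propensity threshold."""
    return [c for c in customers if c.score >= min_score]

def compose(target: Customer, template: str, color: str) -> Piece:
    """Composition stand-in: fill a text template for one customer."""
    return Piece(target.name, template.format(name=target.name, segment=target.segment), color)

def meets_brand(piece: Piece) -> bool:
    """Brand-standard stand-in: only approved colors are allowed."""
    return piece.color in BRAND_COLORS

def run_campaign(customers: List[Customer], template: str, color: str) -> List[Piece]:
    """One call drives the whole flow, hiding the individual steps from the user."""
    pieces = [compose(c, template, color) for c in select_targets(customers)]
    rejected = [p for p in pieces if not meets_brand(p)]
    if rejected:
        raise ValueError(f"{len(rejected)} piece(s) violate brand standards")
    return pieces  # would be handed to production and fulfillment

def response_rate(responded: int, sent: int) -> float:
    """Evaluation stand-in: fraction of recipients who responded."""
    return responded / sent if sent else 0.0

if __name__ == "__main__":
    customers = [Customer("Ana", "runners", 0.9), Customer("Bo", "runners", 0.2)]
    for piece in run_campaign(customers, "Hi {name}, new gear for {segment}!", "#003366"):
        print(piece.text)
```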