In our previous work [1], we presented a block-based technique to analyze printed page uniformity both visually
and metrically. The features derived from this analysis were then used in a Support Vector Machine (SVM)
framework to classify each page into one of two categories: acceptable or unacceptable quality. In this
paper, we introduce a set of machine learning tools for assessing printed page uniformity. This
work is primarily targeted at the printing industry, and specifically at the ubiquitous laser electrophotographic printer.
We use features that are well-correlated with the rankings of expert observers to develop a novel machine
learning framework that allows one to achieve the minimum "false alarm" rate, subject to a chosen "miss" rate.
Surprisingly, most machine learning research does not consider this framework.
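One way to state this objective formally is sketched below. The notation (x for a scanned page, y for the expert label, f and t for a score function and decision threshold, α for the chosen "miss" rate) is introduced here for illustration and is not taken from the original. A "miss" is an unacceptable page that the algorithm declares a "pass", and a "false alarm" is an acceptable page that it flags as a "fail". This is a Neyman-Pearson-type formulation in which the miss rate is the bounded quantity.

```latex
\hat{y}(x) = \mathbb{1}\left[ f(x) \ge t \right], \qquad
y \in \{0, 1\} \quad (y = 1:\ \text{page unacceptable to an expert})

P_{\mathrm{M}}(f, t) = \Pr\left( \hat{y}(x) = 0 \mid y = 1 \right), \qquad
P_{\mathrm{FA}}(f, t) = \Pr\left( \hat{y}(x) = 1 \mid y = 0 \right)

\min_{f,\, t} \; P_{\mathrm{FA}}(f, t) \quad \text{subject to} \quad P_{\mathrm{M}}(f, t) \le \alpha
```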
During the development of a new product, test engineers print hundreds of test pages, which can
be scanned and then analyzed by an automated algorithm. Most of these pages are likely to be of acceptable
quality; the objective is to find the ones that are not. These pages provide critically important information to
systems designers about issues that must be addressed to improve the printer design. A "miss" is a page
that an expert observer would judge to be of unacceptable quality, but that the prediction algorithm declares
a "pass". Misses are a serious problem, since they represent problems that the systems designers will never
see. "False alarms", on the other hand, are pages that an expert observer would judge to be of
acceptable quality, but that the prediction algorithm flags as "fails". In a typical printer testing and
development scenario, such pages would be examined by an expert and found to be of acceptable quality after
all. "False alarm" pages therefore add to the number of pages that expert observers must examine, which increases labor cost.
Nevertheless, "false alarms" are not nearly as catastrophic as "misses", which represent potentially serious problems that
are never seen by the systems developers. This asymmetry motivates us to develop a machine learning framework
that achieves the minimum "false alarm" rate subject to a specified "miss" rate. To construct the corresponding
receiver operating characteristic (ROC) curves [2], we examine various prediction tools, ranging from
an exhaustive search over the space of nonlinear discriminants to a Cost-Sensitive SVM [3] framework. We then
compare the curves obtained from these methods. Our work suggests that the same framework for obtaining a full ROC curve
may be applied to other machine learning problems in industry.
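To illustrate how a single operating point can be chosen once a classifier assigns each page a score, here is a minimal sketch. It assumes a real-valued score per page where higher means "more likely unacceptable" (for an SVM, the signed distance to the decision boundary is one common choice), and it assumes expert labels with 1 meaning unacceptable. The function names `roc_points` and `select_threshold` are hypothetical. The sketch only sweeps a threshold on a fixed score; it is not the exhaustive discriminant search or the Cost-Sensitive SVM examined in this paper.

```python
from typing import List, Sequence, Tuple

# One ROC operating point: (threshold, miss rate, false-alarm rate).
Point = Tuple[float, float, float]


def roc_points(scores: Sequence[float], labels: Sequence[int]) -> List[Point]:
    """Sweep a decision threshold over classifier scores (hypothetical helper).

    Assumed convention: a page is flagged as a "fail" when score >= threshold,
    and labels[i] == 1 means an expert judged page i unacceptable.
    """
    n_bad = sum(1 for y in labels if y == 1)
    n_good = len(labels) - n_bad
    if n_bad == 0 or n_good == 0:
        raise ValueError("need at least one acceptable and one unacceptable page")

    # +inf flags nothing (every unacceptable page is missed);
    # the smallest score flags every page (no misses).
    thresholds = [float("inf")] + sorted(set(scores), reverse=True)
    points: List[Point] = []
    for t in thresholds:
        # Miss: expert says unacceptable, algorithm says "pass".
        misses = sum(1 for s, y in zip(scores, labels) if y == 1 and s < t)
        # False alarm: expert says acceptable, algorithm says "fail".
        false_alarms = sum(1 for s, y in zip(scores, labels) if y == 0 and s >= t)
        points.append((t, misses / n_bad, false_alarms / n_good))
    return points


def select_threshold(scores: Sequence[float], labels: Sequence[int],
                     max_miss: float) -> Point:
    """Lowest false-alarm rate among points whose miss rate is <= max_miss.

    Ties in false-alarm rate are broken by the lower miss rate.
    """
    if max_miss < 0:
        raise ValueError("max_miss must be non-negative")
    # The smallest threshold has miss rate 0, so this list is never empty.
    feasible = [p for p in roc_points(scores, labels) if p[1] <= max_miss]
    return min(feasible, key=lambda p: (p[2], p[1]))


if __name__ == "__main__":
    # Toy scores and labels for illustration only; not data from this study.
    scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1]
    labels = [1, 1, 0, 1, 0, 1, 0, 0]
    print(select_threshold(scores, labels, max_miss=0.25))  # (0.6, 0.25, 0.25)
    print(select_threshold(scores, labels, max_miss=0.0))   # (0.3, 0.0, 0.5)
```

On the toy data, tightening the allowed miss rate from 0.25 to 0 lowers the threshold and doubles the false-alarm rate. This is the trade-off that a full ROC curve makes visible. A threshold chosen this way on training pages is an empirical estimate, so the miss rate on new pages may differ.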