Most methods for classifier design assume that the training samples
are drawn independently and identically from an unknown data
generating distribution (i.i.d.), although this assumption is violated in several real-life problems. Relaxing this i.i.d. assumption, we
develop training algorithms for the more realistic situation where
batches or sub-groups of training samples may have internal
correlations, although the samples from different batches may be
considered to be uncorrelated; we also consider the extension to
cases with a hierarchical (i.e., higher-order) correlation structure
between batches of training samples. After describing efficient
algorithms that scale well to large datasets, we provide some
theoretical analysis to establish their validity. Experimental
results from real-life Computer Aided Detection (CAD) problems
indicate that relaxing the i.i.d. assumption leads to statistically
significant improvements in the accuracy of the learned classifier.
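The correlation assumption described above can be made concrete as a sample-by-sample covariance matrix. The sketch below illustrates only the assumption, not the training algorithm itself: the function names, the single shared within-batch correlation `rho_batch`, and the single between-batch correlation `rho_group` for the hierarchical case are hypothetical simplifications. Under the i.i.d. assumption this matrix would be the identity (scaled by the variance).

```python
from typing import List

Matrix = List[List[float]]


def hierarchical_covariance(groups: List[List[int]], rho_batch: float,
                            rho_group: float, var: float = 1.0) -> Matrix:
    """Covariance between training samples (hypothetical two-level model).

    groups[g] lists the sizes of the batches that belong to super-group g.
    Samples in the same batch have correlation rho_batch, samples in different
    batches of the same super-group have correlation rho_group, and samples in
    different super-groups are uncorrelated.
    """
    batch_ids: List[int] = []
    group_ids: List[int] = []
    b = 0
    for g, sizes in enumerate(groups):
        for size in sizes:
            batch_ids += [b] * size
            group_ids += [g] * size
            b += 1
    n = len(batch_ids)
    cov = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                cov[i][j] = var
            elif batch_ids[i] == batch_ids[j]:
                cov[i][j] = rho_batch * var
            elif group_ids[i] == group_ids[j]:
                cov[i][j] = rho_group * var
    return cov


def batch_covariance(batch_sizes: List[int], rho: float, var: float = 1.0) -> Matrix:
    """Flat case: correlated within each batch, uncorrelated across batches.

    Putting every batch in its own super-group yields a block-diagonal matrix.
    """
    return hierarchical_covariance([[s] for s in batch_sizes], rho, 0.0, var)


if __name__ == "__main__":
    # Two batches of sizes 2 and 1: one 2x2 correlated block and one isolated sample.
    for row in batch_covariance([2, 1], rho=0.5):
        print(row)
```

Setting `rho = 0` recovers the i.i.d. (identity) case, which is one way to see the batch model as a strict generalization of the usual assumption.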

Colon cancer is a widespread disease and, according to the American Cancer Society, it is estimated that in 2006
more than 55,000 people will die of colon cancer in the US. Early detection of colorectal polyps, however,
drastically reduces mortality. Computer-Aided Detection (CAD) of colorectal polyps is a tool that could help
physicians find such lesions in CT scans of the colon.
In this paper, we present the first phase, candidate generation (CG), of our technique for the detection of
colonic polyp candidate locations in CT colonoscopy. Since polyps typically appear as protrusions on the surface
of the colon, our cutting-plane algorithm identifies all those areas that can be "cut off" by a plane. The key
observation is that for any protruding lesion there is at least one plane that cuts a fragment off. Furthermore,
the intersection between the plane and the polyp will typically be small and circular. In contrast, a plane
cannot cut a small circular cross-section from a wall or a fold: because of their concave or elongated paraboloid
morphology, these structures yield cross-sections that are much larger or non-circular.
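The abstract does not say how "small" and "circular" are measured. As a minimal sketch under stated assumptions, the test below takes one planar cross-section as a closed 2D polygon (in mm), computes its area with the shoelace formula, its equivalent-circle diameter, and the isoperimetric circularity 4πA/P² (1.0 for a circle, near 0 for an elongated shape). The function names and both thresholds are hypothetical; the sketch does not search over planes or extract cross-sections from CT data.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def polygon_area(pts: List[Point]) -> float:
    """Shoelace formula; the absolute value makes vertex orientation irrelevant."""
    s = 0.0
    n = len(pts)
    for k in range(n):
        x0, y0 = pts[k]
        x1, y1 = pts[(k + 1) % n]
        s += x0 * y1 - x1 * y0
    return abs(s) / 2.0


def polygon_perimeter(pts: List[Point]) -> float:
    """Sum of edge lengths of the closed polygon."""
    n = len(pts)
    return sum(math.dist(pts[k], pts[(k + 1) % n]) for k in range(n))


def is_polyp_like_cut(pts: List[Point], max_diameter_mm: float = 20.0,
                      min_circularity: float = 0.8) -> bool:
    """Hypothetical test: accept a cross-section only if it is small and round."""
    area = polygon_area(pts)
    perimeter = polygon_perimeter(pts)
    if area == 0.0 or perimeter == 0.0:
        return False
    circularity = 4.0 * math.pi * area / perimeter ** 2  # 1.0 for a perfect circle
    diameter = 2.0 * math.sqrt(area / math.pi)  # diameter of a circle with equal area
    return diameter <= max_diameter_mm and circularity >= min_circularity


def circle(radius: float, n: int = 64) -> List[Point]:
    """Regular n-gon approximating a circle, used here as a toy cross-section."""
    return [(radius * math.cos(2 * math.pi * k / n),
             radius * math.sin(2 * math.pi * k / n)) for k in range(n)]


if __name__ == "__main__":
    print(is_polyp_like_cut(circle(4.0)))   # small, round: accepted
    print(is_polyp_like_cut(circle(30.0)))  # round but too large (wall-like)
    print(is_polyp_like_cut([(0, 0), (40, 0), (40, 2), (0, 2)]))  # elongated (fold-like)
```

The first call should print `True` and the other two `False`, mirroring the three cases named in the text: a protrusion, a large cut through a wall, and a long cut through a fold.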
The algorithm has been incorporated as part of a prototype CAD system. An analysis on a test set of
more than 400 patients yielded high per-patient sensitivities of 95% and 90% for clean and tagged preparations,
respectively, for polyps ranging from 6 mm to 20 mm in size.
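The abstract does not define how per-patient sensitivity was computed. A common convention, assumed here, is that a patient with at least one polyp in the size range counts as detected if at least one of those polyps is found, and patients with no in-range polyps are excluded. The function name and the toy data are hypothetical and are not the study's results.

```python
import math
from typing import Dict, List, Tuple

# patient id -> list of (polyp size in mm, detected by the CAD system?)
Findings = Dict[str, List[Tuple[float, bool]]]


def per_patient_sensitivity(polyps: Findings, min_mm: float = 6.0,
                            max_mm: float = 20.0) -> float:
    """Fraction of patients with an in-range polyp in whom at least one is detected."""
    eligible = 0
    detected = 0
    for findings in polyps.values():
        in_range = [hit for size, hit in findings if min_mm <= size <= max_mm]
        if not in_range:
            continue  # patient has no polyp in the size range of interest
        eligible += 1
        if any(in_range):
            detected += 1
    return detected / eligible if eligible else math.nan


if __name__ == "__main__":
    toy = {
        "p1": [(8.0, True), (12.0, False)],  # one of two polyps found: counts as detected
        "p2": [(7.0, False)],                # missed
        "p3": [(4.0, True)],                 # below 6 mm: excluded
        "p4": [(15.0, True)],                # detected
    }
    print(per_patient_sensitivity(toy))  # 2 of 3 eligible patients
```

Under this convention, the reported figures would be computed separately on the clean-preparation and tagged-preparation subsets.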