Proceedings Volume Visual Communications and Image Processing 2003, (2003) https://doi.org/10.1117/12.502558
For various applications, such as data compression, structure from motion, medical imaging and video enhancement, there is a need for an algorithm that divides video sequences into independently moving objects. Because our focus is on video enhancement and structure from motion for consumer electronics, we strive for a low-complexity solution. For still images, several colour-based approaches exist, but these fall short in both speed and segmentation quality. For instance, colour-based watershed algorithms produce a so-called oversegmentation, in which many segments cover each single physical object. Other colour segmentation approaches limit the number of segments to reduce this oversegmentation problem. However, this often results in inaccurate edges or even missed objects.
Most likely, colour is an inherently insufficient cue for real world object segmentation, because real world objects can
display complex combinations of colours. For video sequences, however, an additional cue is available, namely the motion of
objects. When different objects in a scene have different motion, the motion cue alone is often enough to reliably
distinguish objects from one another and the background. However, because efficient motion estimators, like the 3DRS block
matcher, lack sufficient resolution, the resulting segmentation is not at pixel resolution, but at block resolution. Existing
pixel resolution motion estimators are either too computationally expensive or, compared to block-based approaches, more
sensitive to noise, more affected by aperture problems, or less faithful to the true motion of objects.
Its tendency towards oversegmentation shows that colour segmentation is particularly effective near edges of
homogeneously coloured areas. On the other hand, block-based true motion estimation is particularly effective in heterogeneous
areas, because heterogeneous areas increase the chance that a block is unique and thus decrease the chance that a wrong
position produces a good match. Consequently, a number of methods exist which combine motion and colour segmentation. These
methods either use colour segmentation as a basis for the motion segmentation and estimation, or perform an independent
colour segmentation in parallel, the result of which is then combined with the motion segmentation. The presented
method uses both techniques to complement each other by first segmenting on motion cues and then refining the segmentation
with colour. To our knowledge, few methods exist which adopt this approach. One example is \cite{meshrefine}. This method uses
an irregular mesh, which hinders its efficient implementation in consumer electronics devices. Furthermore, the method produces
a foreground/background segmentation, while our applications call for the segmentation of multiple objects.
NEW METHOD
As mentioned above, we start with motion segmentation and afterwards refine the edges of this segmentation with a pixel
resolution colour segmentation method. There are several reasons for this approach:
+ Motion segmentation does not produce the oversegmentation which colour segmentation methods normally produce, because
objects are more likely to have colour discontinuities than motion discontinuities. In this way, the colour segmentation only
has to be done at the edges of segments, confining the colour segmentation to a smaller part of the image. In such a part, it
is more likely that the colour of an object is homogeneous.
+ This approach restricts the computationally expensive pixel resolution colour segmentation to a subset of the image.
Together with the very efficient 3DRS motion estimation algorithm, this helps to reduce the computational
complexity.
+ The motion cue alone is often enough to reliably distinguish objects from one another and the background.
To obtain the motion vector fields, we used a variant of the 3DRS block-based motion estimator which analyses three frames of
input. The 3DRS motion estimator is known for its ability to estimate motion vectors which closely resemble the true motion.
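To illustrate what a block-resolution motion vector field looks like, the following minimal Python sketch estimates one vector per block by exhaustive search with a sum-of-absolute-differences criterion. It is not the 3DRS estimator: 3DRS evaluates only a small set of spatial and temporal candidate vectors per block, and the variant used here analyses three frames rather than two. The function names `sad` and `block_match`, the block size and the search range are assumptions made for this example only.

```python
def sad(prev, curr, bx, by, mx, my, bs):
    """Sum of absolute differences for the block at (bx, by) under motion (mx, my)."""
    total = 0
    for y in range(by, by + bs):
        for x in range(bx, bx + bs):
            # A pixel that moved by (mx, my) came from (x - mx, y - my) in prev.
            total += abs(curr[y][x] - prev[y - my][x - mx])
    return total


def block_match(prev, curr, bs=4, search=2):
    """Return a grid with one (mx, my) vector per bs x bs block (exhaustive search)."""
    h, w = len(curr), len(curr[0])
    field = []
    for by in range(0, h - bs + 1, bs):
        row = []
        for bx in range(0, w - bs + 1, bs):
            best_key, best_vec = None, (0, 0)
            for my in range(-search, search + 1):
                for mx in range(-search, search + 1):
                    # Skip candidates whose source block leaves the frame.
                    if not (0 <= bx - mx and bx - mx + bs <= w
                            and 0 <= by - my and by - my + bs <= h):
                        continue
                    # Prefer the lowest cost; break ties with the shortest vector.
                    key = (sad(prev, curr, bx, by, mx, my, bs), abs(mx) + abs(my))
                    if best_key is None or key < best_key:
                        best_key, best_vec = key, (mx, my)
            row.append(best_vec)
        field.append(row)
    return field
```

Frames are given as lists of rows of grey values. The returned grid has one entry per block, which is why a segmentation built on it alone is limited to block resolution.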
BLOCK-BASED MOTION SEGMENTATION
As mentioned above, we start with a block-resolution segmentation based on motion vectors. The presented method is inspired by
the well-known $K$-means segmentation method \cite{K-means}. Several other methods (e.g. \cite{kmeansc}) adapt $K$-means for
connectedness by adding a weighted shape-error. This introduces the difficulty of finding the correct weights for the
shape-parameters. Also, these methods are often biased towards one particular pre-defined shape. The presented method, which
we call $K$-regions, encourages connectedness because only blocks at the edges of segments may be assigned to another segment.
This constrains the segmentation method to such a degree that least squares can be used for the robust fitting of an
affine motion model to each segment. Contrary to \cite{parmkm}, the segmentation step still operates on vectors instead of
model parameters. To make sure the segmentation is temporally consistent, the segmentation of the previous frame is used
as initialisation for every new frame. We also present a scheme which makes the algorithm independent of the initially chosen
number of segments.
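The abstract does not give the details of $K$-regions, so the following Python sketch only illustrates the two ideas named above: a least-squares affine motion model per segment, and reassignment restricted to blocks on segment edges. The 4-connected neighbourhood, the squared vector error, the fallback to a mean vector for degenerate segments and all function names are assumptions. Plain least squares is used, and neither the temporal initialisation nor the scheme for the number of segments is covered.

```python
def solve3(m, r):
    """Solve a 3x3 linear system by Gauss-Jordan elimination; None if singular."""
    a = [row[:] + [rv] for row, rv in zip(m, r)]
    for c in range(3):
        p = max(range(c, 3), key=lambda i: abs(a[i][c]))
        if abs(a[p][c]) < 1e-9:
            return None
        a[c], a[p] = a[p], a[c]
        for i in range(3):
            if i != c:
                f = a[i][c] / a[c][c]
                a[i] = [u - f * v for u, v in zip(a[i], a[c])]
    return [a[i][3] / a[i][i] for i in range(3)]


def fit_affine(blocks, field):
    """Least-squares fit of vx and vy, each modelled as a0 + a1*x + a2*y."""
    m = [[0.0] * 3 for _ in range(3)]
    rx, ry = [0.0] * 3, [0.0] * 3
    for x, y in blocks:
        basis = (1.0, float(x), float(y))
        vx, vy = field[y][x]
        for i in range(3):
            rx[i] += basis[i] * vx
            ry[i] += basis[i] * vy
            for j in range(3):
                m[i][j] += basis[i] * basis[j]
    px, py = solve3(m, rx), solve3(m, ry)
    if px is None or py is None:
        # Assumed fallback for too few or collinear blocks: pure translation.
        n = len(blocks)
        px, py = [rx[0] / n, 0.0, 0.0], [ry[0] / n, 0.0, 0.0]
    return px, py


def model_error(model, x, y, v):
    """Squared difference between a block's vector and the model prediction."""
    px, py = model
    ex = px[0] + px[1] * x + px[2] * y - v[0]
    ey = py[0] + py[1] * x + py[2] * y - v[1]
    return ex * ex + ey * ey


def k_regions_step(labels, field):
    """Refit every segment, then let only edge blocks switch to a neighbouring segment."""
    h, w = len(labels), len(labels[0])
    members = {}
    for y in range(h):
        for x in range(w):
            members.setdefault(labels[y][x], []).append((x, y))
    models = {k: fit_affine(b, field) for k, b in members.items()}
    new = [row[:] for row in labels]
    for y in range(h):
        for x in range(w):
            own = labels[y][x]
            cands = {own}
            for nx, ny in ((x - 1, y), (x + 1, y), (x, y - 1), (x, y + 1)):
                if 0 <= nx < w and 0 <= ny < h:
                    cands.add(labels[ny][nx])
            if len(cands) > 1:  # block lies on a segment edge
                # Lowest model error wins; on a tie the block keeps its own label.
                new[y][x] = min(
                    sorted(cands),
                    key=lambda k: (model_error(models[k], x, y, field[y][x]), k != own),
                )
    return new


def k_regions(labels, field, max_iter=20):
    """Iterate until the labelling no longer changes (or max_iter is reached)."""
    for _ in range(max_iter):
        new = k_regions_step(labels, field)
        if new == labels:
            break
        labels = new
    return labels
```

Here `field[y][x]` is the motion vector of block `(x, y)` and `labels` is an initial block labelling, for which the segmentation of the previous frame could be passed in, as described above. Because a block can only take a label that one of its neighbours already has, segments grow and shrink at their edges, which encourages connected segments without an explicit shape term, although this simple sketch does not guarantee connectedness.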
COLOUR-BASED INTRA-BLOCK SEGMENTATION
The block resolution motion-based segmentation forms the starting point for the pixel resolution segmentation. The pixel
resolution segmentation is obtained from the block resolution segmentation by reclassifying pixels only at the edges of
clusters. We assume that an edge between two objects can be found in either one of two neighbouring blocks that belong to
different clusters. This assumption allows us to do the pixel resolution segmentation on each pair of such neighbouring blocks
separately. Because of the local nature of the segmentation, it largely avoids problems with heterogeneously coloured areas.
Because no new segments are introduced in this step, it also does not suffer from oversegmentation problems. The presented
method has no problems with bifurcations. For the pixel resolution segmentation itself, we reclassify pixels such that we
optimise an error norm which favours similarly coloured regions and straight edges.
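The error norm itself is not specified in this abstract. As a strongly simplified illustration of the idea, the Python sketch below handles one pair of horizontally adjacent blocks, considers only straight vertical edges, and picks the edge position that makes both sides most homogeneous in colour. The restriction to vertical edges, the sum-of-squared-deviations cost and the names `region_cost` and `refine_vertical_edge` are assumptions, not the presented method.

```python
def region_cost(pixels):
    """Sum of squared distances of the pixel colours to their mean colour."""
    if not pixels:
        return 0.0
    n = len(pixels)
    channels = len(pixels[0])
    mean = [sum(p[c] for p in pixels) / n for c in range(channels)]
    return sum((p[c] - mean[c]) ** 2 for p in pixels for c in range(channels))


def refine_vertical_edge(pair):
    """Find the best straight vertical edge inside a pair of neighbouring blocks.

    pair: rows of colour tuples spanning both blocks side by side.
    Returns e such that columns < e belong to the left segment and
    columns >= e belong to the right segment.
    """
    width = len(pair[0])
    best_e, best_cost = None, None
    for e in range(1, width):
        left = [row[x] for row in pair for x in range(e)]
        right = [row[x] for row in pair for x in range(e, width)]
        cost = region_cost(left) + region_cost(right)
        if best_cost is None or cost < best_cost:
            best_e, best_cost = e, cost
    return best_e
```

Because every candidate only redistributes pixels between the two existing segments, no new segments can appear, and straightness of the edge is enforced here by construction rather than by a penalty term.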
SEGMENTATION MEASURE
To assist in the evaluation of the proposed algorithm, we developed a quality metric. Because the problem does not have an
exact specification, we defined, for a given input, a ground truth output which we find desirable. We define the segmentation
quality measure as the degree to which the segmentation differs from this ground truth. Our measure enables us to evaluate
oversegmentation and undersegmentation separately. It also allows us to evaluate which parts of a frame suffer from
oversegmentation or undersegmentation. The proposed algorithm has been tested on several typical sequences.
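The formula of the measure is not given in this abstract. Purely as an illustration of how oversegmentation and undersegmentation can be scored separately against a ground truth, the Python sketch below uses a hypothetical overlap-based measure: a ground-truth object that is spread over several computed segments counts towards oversegmentation, and a computed segment that covers several ground-truth objects counts towards undersegmentation. The names `split_error` and `over_and_under` and the definition itself are assumptions.

```python
from collections import Counter


def split_error(reference, other):
    """Fraction of pixels lying outside the dominant `other` label of their `reference` region."""
    overlap = {}
    total = 0
    for ref_row, oth_row in zip(reference, other):
        for r, o in zip(ref_row, oth_row):
            overlap.setdefault(r, Counter())[o] += 1
            total += 1
    # For each reference region, every pixel not under its most frequent label is stray.
    stray = sum(sum(c.values()) - max(c.values()) for c in overlap.values())
    return stray / total


def over_and_under(ground_truth, segmentation):
    """Return (oversegmentation, undersegmentation) as fractions of the frame."""
    over = split_error(ground_truth, segmentation)   # objects split across segments
    under = split_error(segmentation, ground_truth)  # segments covering several objects
    return over, under
```

For example, with `ground_truth = [[0, 0, 1, 1]]` and `segmentation = [[0, 0, 1, 2]]`, the second object is split in two, so the sketch reports an oversegmentation of 0.25 and an undersegmentation of 0.0.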
CONCLUSIONS
In this abstract we presented a new video segmentation method which performs well in the segmentation of multiple
independently moving foreground objects from each other and the background. It combines the strong points of both colour and
motion segmentation in the way we expected. One of the weak points is that the segmentation method suffers from
undersegmentation when adjacent objects display similar motion. In sequences with detailed backgrounds the segmentation will
sometimes display noisy edges. Apart from these results, we think that some of the techniques, and in particular the
$K$-regions technique, may be useful for other two-dimensional data segmentation problems.