A new class of nonlinear filters for color image processing was proposed by Lucchese and Mitra. These filters process the chromatic component of images encoded in the International Commission on Illumination (CIE) u'v' color space, and images processed with them do not show color shifts near edges between regions of different intensity. Internally, the filter relies on linear convolution operations, and it is both effective and efficient for denoising and regularizing color images. Image processing systems are computationally intensive and usually require a large amount of hardware area to reach the desired performance. On-line arithmetic, in which operands enter and results leave digit-serially, most significant digit first, can reduce the area of a hardware implementation while still maintaining a reasonable throughput. This work presents the design of the color filter as a network of on-line arithmetic modules and describes the network topology together with some details of each arithmetic module. The final implementation targets FPGAs, and its area is compared with an estimate for a conventional design. The throughput of this solution is sufficient for real-time processing of common image formats.
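The two ingredients named above, chromaticity in CIE u'v' and linear convolution, can be combined as in the following Python sketch. It uses the standard CIE 1976 definitions u' = 4X/(X + 15Y + 3Z) and v' = 9Y/(X + 15Y + 3Z). The filter itself is a hypothetical illustration and not necessarily the Lucchese and Mitra formulation. It convolves the numerators 4X and 9Y and the common denominator S = X + 15Y + 3Z with the same kernel h and then divides, so that, for a nonnegative kernel, each output chromaticity is an S-weighted average of the input chromaticities. The function names, the 1-D scanline, the border replication, and the [0.25, 0.5, 0.25] kernel are assumptions chosen for illustration.

```python
from typing import List, Sequence, Tuple

Tristimulus = Tuple[float, float, float]  # CIE XYZ values of one pixel


def uv_prime(X: float, Y: float, Z: float) -> Tuple[float, float]:
    """CIE 1976 UCS chromaticity: u' = 4X/S, v' = 9Y/S with S = X + 15Y + 3Z."""
    s = X + 15 * Y + 3 * Z
    return 4 * X / s, 9 * Y / s


def convolve_same(signal: Sequence[float], kernel: Sequence[float]) -> List[float]:
    """1-D convolution with output length equal to input length.

    Assumes an odd-length kernel centred on its middle tap; samples outside
    the signal are replaced by the nearest border sample.
    """
    r = len(kernel) // 2
    n = len(signal)
    out = []
    for i in range(n):
        acc = 0.0
        for k, h in enumerate(kernel):
            j = min(max(i - (k - r), 0), n - 1)  # clamp index: replicate border
            acc += h * signal[j]
        out.append(acc)
    return out


def chroma_filter(row: Sequence[Tristimulus],
                  kernel: Sequence[float]) -> List[Tuple[float, float]]:
    """Hypothetical chromatic filter: u'_f = (h*4X)/(h*S), v'_f = (h*9Y)/(h*S)."""
    num_u = convolve_same([4 * X for X, _, _ in row], kernel)
    num_v = convolve_same([9 * Y for _, Y, _ in row], kernel)
    den = convolve_same([X + 15 * Y + 3 * Z for X, Y, Z in row], kernel)
    # Every output chromaticity requires one division per component.
    return [(nu / s, nv / s) for nu, nv, s in zip(num_u, num_v, den)]


if __name__ == "__main__":
    base = (0.3, 0.4, 0.2)                   # arbitrary XYZ triple (assumption)
    dark = tuple(0.2 * c for c in base)      # same chromaticity, lower intensity
    row = [dark] * 4 + [base] * 4            # intensity-only step edge
    print("input u'v':", uv_prime(*base))
    for u, v in chroma_filter(row, [0.25, 0.5, 0.25]):
        print(f"{u:.6f} {v:.6f}")
```

Across the intensity-only step in the usage example, every filtered pixel keeps the chromaticity of `base`, which illustrates how this hypothetical structure avoids color shifts at edges that differ only in intensity. Each output also needs a division, which is the kind of operation an on-line divider would perform in a network of on-line modules.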
On-line division is one of the slowest basic arithmetic operations and naturally becomes a bottleneck in networks of on-line modules that use it. Because a radix-4 divider retires two quotient bits per iteration, a higher-radix divider has the potential to attain higher throughput than a radix-2 divider and therefore to improve the overall throughput of networks that require division. The gain from moving to radix 4 is not straightforward, however, since several components of the divider become more complex than in the radix-2 case. Previously proposed radix-4 designs relied on operand pre-scaling to simplify the selection function and reduce the critical-path delay. This came at the cost of more complex algorithm conditions and operations, several separate phases for pre-scaling and actual division, and a variable on-line delay, which is a very unattractive feature when low-precision operands are used (usually the case in DSP). This paper proposes a design approach based on overlapped replication that yields a radix-4 on-line division module with low algorithmic complexity, a single division phase, fewer restrictions on the input values, and a small, fixed on-line delay.
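To make the notions of on-line delay and digit selection concrete, the following is a minimal radix-2 sketch in Python using exact rational arithmetic rather than a hardware model. It implements the residual recurrence V = 2W[j-1] + (x_{j+δ} - Q[j-1]·d_{j+δ})·2^{-δ}, W[j] = V - q_j·D[j+δ], where X[k], D[k], Q[j] are digit prefixes. It is not the radix-4 overlapped-replication design proposed here. Assumptions: the function names, δ = 3, a selection constant of 1/4 applied to the full-precision V (hardware would use a truncated estimate), divisor digits in {0, 1} with d_1 = 1 so that every divisor prefix lies in [1/2, 1), and x_1 = x_2 = 0 so that |x| < 1/4. Under these assumptions the residual stays within |W[j]| ≤ D[j+δ] - 1/4, which gives |x/d - Q| < 2^{-n} after n quotient digits.

```python
from fractions import Fraction
from itertools import product
from typing import List, Sequence


def value(digits: Sequence[int]) -> Fraction:
    """Value of the MSD-first radix-2 fraction 0.d1 d2 ... dn."""
    return sum((Fraction(di, 2 ** i) for i, di in enumerate(digits, start=1)),
               Fraction(0))


def online_divide(x_digits: Sequence[int], d_digits: Sequence[int],
                  delta: int = 3) -> List[int]:
    """Radix-2 on-line division sketch with quotient digits in {-1, 0, 1}.

    Assumes x_1 = x_2 = 0, divisor digits in {0, 1} with d_1 = 1, delta >= 3.
    """
    n = len(x_digits)
    assert len(d_digits) == n and delta >= 3

    def digit(seq: Sequence[int], k: int) -> int:
        # Operands have n digits; positions beyond n read as zero.
        return seq[k - 1] if k <= n else 0

    scale = Fraction(1, 2 ** delta)  # 2^-delta
    W = Fraction(0)                  # residual W[j]
    D = Fraction(0)                  # divisor prefix D[j + delta]
    Q = Fraction(0)                  # quotient prefix Q[j]

    # Initialization: consume the first delta digits; no quotient digit yet.
    for k in range(1, delta + 1):
        W = 2 * W + digit(x_digits, k) * scale
        D += Fraction(digit(d_digits, k), 2 ** k)

    q_digits: List[int] = []
    for j in range(1, n + 1):
        k = j + delta
        xk, dk = digit(x_digits, k), digit(d_digits, k)
        V = 2 * W + (xk - Q * dk) * scale  # Q is still Q[j-1] here
        D += Fraction(dk, 2 ** k)          # D becomes D[j + delta]
        # Selection with constant 1/4 on the exact V.
        if V >= Fraction(1, 4):
            q = 1
        elif V <= Fraction(-1, 4):
            q = -1
        else:
            q = 0
        W = V - q * D
        Q += Fraction(q, 2 ** j)
        q_digits.append(q)
    return q_digits


if __name__ == "__main__":
    # Exhaustive check for n = 6 under the stated operand assumptions.
    n = 6
    worst = Fraction(0)
    for xs in product((-1, 0, 1), repeat=n - 2):
        for ds in product((0, 1), repeat=n - 1):
            x, d = [0, 0, *xs], [1, *ds]
            err = abs(value(x) / value(d) - value(online_divide(x, d)))
            worst = max(worst, err)
    print("max error:", worst, "bound:", Fraction(1, 2 ** n))
```

The first quotient digit is produced only after δ + 1 digits of each operand have been consumed, and that delay is fixed regardless of precision. A radix-4 version would retire two quotient bits per iteration but needs a wider digit set and a more complex selection function, which is the trade-off discussed above.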