Many multimedia processing algorithms as well as communication algorithms implemented in mobile devices are based
on intensive implementation of linear algebra methods, in particular, implying implementation of a large number of inner
products in real time. Among most efficient approaches to perform inner products are the Associative Computing (ASC)
approach and Distributed Arithmetic (DA) approach. In ASC, computations are performed on Associative Processors
(ASP), where Content-Addressable memories (CAMs) are used instead of traditional processing elements to perform
basic arithmetic operations. In the DA approach, computations are reduced to look-up table reads with respect to binary
planes of inputs. In this work, we propose a modification of Associative processors that supports efficient
implementation of the DA method. Thus, the two powerful methods are combined to further improve the efficiency of
multiple inner product computation. Computational complexity analysis of the proposed method illustrates significant
speed-up when computing multiple inner products as compared both to the pure ASC method and to the pure DA method
as well as to other state-of the art traditional methods for inner product calculation.
An FFT-ordered L-filter (FFT-LF) has been previously proposed as an alternative to the L-filter. FFT-LF filter is defined as a linear combination of 'FFT-ordered' inputs which can be considered as outputs of a bank of stack filters formed according to an FFT-like flowchart. It has been shown that FFT-LF's efficiently remove mixed noise. They possess good characteristics of performance and are simple in implementation. The idea of this paper is based on using FFT-LF in adaptive LMS filtering framework. This allows to incorporate advantages of both transform-domain adaptive LMS (TD-LMS) filters and adaptive L-filters into a unified design. We propose a new adaptive filter structure, called adaptive FFT-LF filter which consists of two stages: preprocessing and the LMS algorithm. Many different adaptive filters, in particular, TD-LMS, adaptive L-filters can be realized based on the proposed structure. The range of these filters is efficiently implemented on a unified device.
A new efficient parallel algorithm and an architecture for order statistic, weighted order statistic and stack filters are suggested in this paper. This design is based on coding the order relations between input samples within the filter's window by so called binary ordering P-matrices. A simple scheme utilized in the design allows to update the current binary matrix from the previous one using just one parallel step of comparisons and simple, regular bit shifts. The architecture of the design is simple and suits well for implementation in systolic arrays. It is unified for all the above filters meaning that in all the cases the structure is the same and only one block, the output former, is different for each of these filters.
A filter structure formed as a linear combination of stack filters (L-stack-filters), is studied. This type of filters include many useful filter classes, e.g., linear FIR filters and nonlinear threshold Boolean filters, and L-filters. An efficient algorithm for finding optimal filter coefficients under the mean squared error (MSE) criterion is derived. A subclass of the above filters, called FFT-ordered L-filters (FFT-LF), is studied in detail. In this case the bank of filters is formed according to a generalized structure of the FFT flowchart. It
is shown that FFT-LFs effectively remove mixed Gaussian and impulsive noise. While possessing good characteristics of performance, FFT-LFs are simple in implementation. In the sense of implementation the most complicated FFT-LFs are the well-known L-filters. We suggest an efficient parallel architecture implementing FFT-LFs as well as a family of discrete orthogonal transforms including discrete Fourier and Walsh transforms. Both linear and nonlinear L-filter-type filters are implemented effectively on the architecture. Comparison with known architectures implementing both linear
and nonlinear filters reveals advantages of the proposed architecture. An efficient implementation of L-stack-filters is also proposed.
A filter structure formed as a linear combination of a bank of nonlinear filters, in particular, as linear combination of a bank of stack filters, is studied. This type of filter includes many known filter classes, e.g., linear FIR filters and nonlinear threshold Boolean filters, L-filters. An efficient algorithm based on joint distribution functions of stack filters for finding optimal filter coefficients under MSE (mean squared error) criterion is proposed. A subclass of the above filters, called FFT-ordered L-filters (FFT-LF), is studied in detail. In this case the bank of filters is formed according to the generalized structure of the FFT flowgraph. It is shown that FFT-LFs effectively remove mixed Gaussian and impulsive noise. Possessing good characteristics of performance, FFT-LFs are simple in implementation. The most complicated (in the sense of implementation) FFT-LFs are well known L-filters. We suggest an efficient parallel architecture implementing FFT-LFs as well as a family of discrete orthogonal transforms including discrete Fourier, Walsh and other transforms. Both linear and nonlinear L-filter-type filters are implemented effectively on the architecture. Comparison with known architectures implementing both linear and nonlinear filters reveals advantages of the proposed architecture.
An efficient spectral algorithm for representing any Boolean function as a linear combination of positive Boolean functions (PBF) is proposed. A processor architecture realizing this algorithm with the varying levels of parallelism is suggested. The algorithm finds not only the truth table of PBF's, but also their minimal disjunctive normal form representations. This allows to incorporate previously proposed efficient stack filtering designs into the construction of architectures for threshold Boolean filters.
Efficient algorithms with various level of parallelism are proposed for the recently introduced class of Binary Polynomial Transforms (BPT) including Walsh and conjunctive (Reed-Muller) transforms. A unique generic algorithm is proposed for the class of BPT. For each level of parallelism this algorithm is optimal for most of BPT with regard to speedup factor, including Walsh-Hadamard and conjunctive transforms. A family of processors realizing the proposed algorithms is also suggested. The processors can be implemented using a varying number of processor elements of unified architecture. They are universal, i.e. a class of binary polynomial transforms is effectively realized in the processor. Although the processors are universal, their area-time complexities are comparable with the complexities of known Hadamard processors. Processors from the proposed family can be included as blocks in construction of a signal/image processing system.
In this work we introduce a new wide family of 'unbounded' DOTs based on parametric representations of transform matrices. This family contains the generalized Haar transform. A computational model corresponding to linear MISD type (pipelined) algorithms is introduced. Lower bounds are found for the complexity of linear transforms relative to the proposed model. Unified pipelined-parallel algorithms with various level of parallelism which can be implemented on MISD systems to compute unbounded DOTs are developed. It is shown that the proposed algorithms are asymptotically optimal, i.e. the order of the upper bounds coincide with the order of the lower bounds. A unified processor architecture realizing the proposed algorithms is developed. Each processor is universal for a family of unbounded DOTs, meaning that each transform of the family is effectively realized in the processors. The processors can be implemented using different number of processor elements based on the same architecture. Although the processors are universal their area-time complexities are comparable with complexities of known Haar processors.
A new fast running filtering algorithm of decompositional type for realizing stack filters that are based on a subclass of positive Boolean functions, namely cyclic positive Boolean functions, is proposed. Peculiar to the presented algorithm is the use of Fibonacci p-codes, which make it possible to have a unified approach to running filtering, containing as special cases the complete threshold decomposition (when p is greater than the maximal value of input data) and the binary-tree threshold decomposition (when p equals 0). The choice of optimal value of p depends on the statistics of input data and reduces the complexity of running stack filtering.