In this work we present algorithms and schemes for computing several common arithmetic expressions defined
in the complex domain as hardware-implemented operators. The operators include Complex Multiply-Add
(<i>CMA : ab + c</i>), Complex Sum of Products (<i>CSP : ab + ce + f</i>), Complex Sum of Squares (<i>CSS : a<sup>2</sup> + b<sup>2</sup></i>),
and Complex Integer Powers (<i>CIPk : x<sup>2</sup>, x<sup>3</sup>, ..., x<sup>k</sup></i>). The proposed approach is to map the expression to a
system of linear equations, apply a complex-to-real transform, and compute the solutions to the linear system
using a digit-by-digit, most-significant-digit-first recurrence method. The components of the solution vector
correspond to the expressions being evaluated. The number of digit cycles is about <i>m</i> for <i>m</i>-digit precision. The
basic modules are similar to left-to-right multipliers. The interconnections between the modules are digit-wide.
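The complex-to-real step can be illustrated for the CMA operator with a minimal sketch. It shows only how a complex expression splits into real-valued components, not the digit-recurrence solver itself; the function name `cma_real` and the pair representation of complex numbers are assumptions made for illustration.

```python
def cma_real(a, b, c):
    """Compute a*b + c for complex a, b, c given as (real, imag) pairs.

    Illustrative only: each complex operand x + iy is handled through its
    real components, which is the effect of a complex-to-real transform.
    """
    ar, ai = a
    br, bi = b
    cr, ci = c
    re = ar * br - ai * bi + cr  # real part of ab + c
    im = ar * bi + ai * br + ci  # imaginary part of ab + c
    return re, im
```

For example, `cma_real((1, 2), (3, -1), (0.5, 0.25))` evaluates (1 + 2i)(3 - i) + (0.5 + 0.25i) using real operations only.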
We describe a hardware-oriented design of a complex division algorithm. This algorithm is similar to a radix-r digit-recurrence division algorithm with real operands and prescaling. Prescaling of the complex operands allows efficient selection of complex quotient digits in higher radices. The use of the digit-recurrence method allows a hardware implementation similar to that of conventional dividers. Moreover, this method makes correct rounding of the complex quotient possible. On the other hand, the proposed scheme requires prescaling tables that are more demanding than the tables in similar dividers with real operands. In this paper we present the main design ideas and implementation details, and give a rough estimate of the expected latency. We also compare this estimate with the estimated latency of Smith's algorithm, which is used in software routines for complex division.
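For reference, the software baseline mentioned above can be sketched as follows. This is a common textbook formulation of Smith's algorithm, not code from the paper; the function name `smith_divide` and the four-scalar interface are assumptions.

```python
def smith_divide(a, b, c, d):
    """Return (re, im) of (a + ib) / (c + id) using Smith's method.

    The branch on |c| >= |d| keeps the ratio r at magnitude <= 1, which
    avoids forming c*c + d*d directly and so reduces the risk of
    intermediate overflow or underflow.
    """
    if abs(c) >= abs(d):
        r = d / c
        den = c + d * r          # equals (c^2 + d^2) / c
        return (a + b * r) / den, (b - a * r) / den
    r = c / d
    den = c * r + d              # equals (c^2 + d^2) / d
    return (a * r + b) / den, (b * r - a) / den
```

Each call costs three divisions plus a few multiplications and additions, which is the kind of operation count a latency comparison with a hardware divider would start from.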
Proc. SPIE. 5559, Advanced Signal Processing Algorithms, Architectures, and Implementations XIV
KEYWORDS: Digital signal processing, Surface plasmons, Digital image processing, Detection and tracking algorithms, Image processing, Signal processing, Laser induced breakdown spectroscopy, Embedded systems, Binary data, Standards development
This paper presents a C library for the software support of single-precision floating-point (FP) arithmetic on processors without FP hardware units, such as VLIW or DSP processor cores for embedded applications. The library provides several levels of compliance with the IEEE 754 FP standard: either the complete specification of the standard or a relaxed subset, such as restricted rounding modes or computations without denormal numbers. The library is evaluated on the ST200 VLIW processors from STMicroelectronics.
We present a new elementary function library, called CR-LIBM. This library implements the various functions defined by the ANSI C99 standard. It provides correctly rounded functions: the returned result is always the floating-point number that is closest to the exact result. When writing this library, our primary goal was to certify correct rounding while keeping the library reasonably fast and its memory use low. Hence, the library can be used without difficulty on real-scale problems.
We present an algorithm for implementing correctly rounded exponentials in double-precision floating-point arithmetic. This algorithm is based on floating-point operations in the widespread IEEE-754 standard, and is therefore more efficient than those using multiprecision arithmetic, while being fully portable. It requires a table of reasonable size and IEEE-754 double-precision multiplications and additions. In a preliminary implementation, the overhead due to correct rounding is a sixfold slowdown compared to the standard library function.
In several cases, the input argument of an elementary function evaluation is given bit-serially, most significant bit first. We suggest a solution for performing the first step of the evaluation (namely, the range reduction) on the fly: the computation is overlapped with the reception of the input bits.
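The idea of overlapping reduction with bit reception can be illustrated with a minimal sketch. It is not the algorithm of the paper: it assumes the argument and the reduction constant are integers in a common fixed-point scale, and the name `serial_reduce` is hypothetical. Each incoming bit triggers one doubling and at most one subtraction, so the reduced value is available as soon as the last bit arrives.

```python
def serial_reduce(bits, modulus):
    """Reduce a bit stream (most significant bit first) modulo `modulus`.

    Invariant: 0 <= r < modulus after every step, so 2*r + bit < 2*modulus
    and a single conditional subtraction is enough per received bit.
    """
    r = 0
    for bit in bits:
        r = 2 * r + bit      # shift in the newly received bit
        if r >= modulus:
            r -= modulus     # keep the partial remainder in range
    return r
```

For example, feeding the binary digits of 1000 with `modulus=7` yields the same value as `1000 % 7`.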
This paper deals with the computation of reciprocals, square roots, inverse square roots, and some elementary functions using small tables, small multipliers, and, for some functions, a final 'large' multiplication. We propose a method that allows fast evaluation of these functions in double-precision arithmetic. The strength of this method is that the same scheme allows the computation of all these functions.
We present a method, called the value-preserving (VP) method, for reducing the amount of work when computing the value of a function at regularly spaced points. The VP method uses the fact that if two argument values x and y have p common digits, then the values f(x) and f(y) computed with an on-line algorithm of delay δ have at least p − δ common digits. We discuss evaluation of polynomials using the VP method and compare its performance with several traditional techniques.
The most-significant-digit-first function evaluation method (E-method) allows efficient evaluation of polynomials and certain rational functions on custom hardware. The time required for the computation is of the order of m carry-free addition operations, m being the number of digits in the result. We discuss a digit-parallel and a digit-serial implementation of this method on a DecPeRLe-1 board, built from Xilinx FPGAs. After a presentation of the E-method, we give a description of the architecture of the DecPeRLe-1 board, present our designs, and analyze their performance.
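A rough software model of this kind of digit recurrence, for polynomial evaluation in radix 2 with digits in {-1, 0, 1}, might look like the sketch below. It is not the FPGA design described in the paper: exact rationals and selection by rounding stand in for the truncated redundant residuals used in hardware, the names `select_digit` and `e_method_poly` are hypothetical, and the bounds |x| <= 1/4 and |p_i| <= 3/4 are assumptions chosen so that this simplified selection stays inside the digit set.

```python
from fractions import Fraction

def select_digit(w):
    """Round the residual w to the nearest digit in {-1, 0, 1}."""
    if w >= Fraction(1, 2):
        return 1
    if w <= Fraction(-1, 2):
        return -1
    return 0

def e_method_poly(coeffs, x, m):
    """Evaluate p(x) = sum(coeffs[i] * x**i), most significant digit first.

    Models the linear system y_i - x*y_(i+1) = p_i, whose component y_0
    equals p(x). Assumes |x| <= 1/4 and |coeffs[i]| <= 3/4 (see lead-in).
    Returns the accumulated value of y_0 and its m signed digits.
    """
    n = len(coeffs)
    x = Fraction(x)
    w = [Fraction(c) for c in coeffs]   # initial residuals are the p_i
    d = [0] * n                         # no digits selected yet
    y0 = Fraction(0)
    digits = []
    for j in range(1, m + 1):
        # Residual update: w_i <- 2 * (w_i - d_i + x * d_(i+1))
        w = [2 * (w[i] - d[i] + x * (d[i + 1] if i + 1 < n else 0))
             for i in range(n)]
        d = [select_digit(wi) for wi in w]
        digits.append(d[0])
        y0 += Fraction(d[0], 2 ** j)    # digit j of y_0 has weight 2**-j
    return y0, digits
```

All n residuals are updated in every step using only additions and a multiplication by a single digit, which is why the number of steps grows with the number of result digits m rather than with the polynomial degree.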
We present here some algorithms for on-line computation of elementary functions. These algorithms use shift-and-add as the elementary step and need signed-digit representations of numbers. We then give some theoretical results about on-line computation of functions. For instance, we show that a finite automaton (in practice, an operator of bounded size and memory) can compute on-line only piecewise affine functions.