The Logarithmic Number System (LNS) has area and power advantages over fixed-point and floating-point number systems in some applications that tolerate moderate precision. LNS multiplication/division require only addition/subtraction of logarithms. Normally, LNS is implemented with ripple-carry binary arithmetic for manipulating the logarithms; however, this paper uses carry-free residue arithmetic instead. The Residue Logarithmic Number System (RLNS) has the advantage of faster multiplication and division. In contrast, RLNS addition requires table lookup, which is its main area and delay cost. The bipartite approach, which uses two tables and an integer addition, is introduced here to optimize RLNS addition. Using the techniques proposed here, RLNS with dynamic range and precision suitable for MPEG applications can be synthesized. Synthesis results show that bipartite RLNS achieves area savings and shorter delays compared to naive RLNS.
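The bipartite idea can be illustrated in software: the LNS addition function sb(z) = log2(1 + 2^z) is approximated by two small tables whose outputs are simply added. The index split, table sizes, and input interval below are illustrative assumptions, not the paper's synthesized design.

```python
import math

# Bipartite approximation of the LNS addition function
# sb(z) = log2(1 + 2^z) on z in [-8, 0), with a 12-bit index split
# into three 4-bit fields x2|x1|x0 (illustrative choices).

N2, N1, N0 = 4, 4, 4                 # high, middle, low field widths
NBITS = N2 + N1 + N0
STEP = 8.0 / (1 << NBITS)            # input quantization step

def sb(z):
    return math.log2(1.0 + 2.0 ** z)

def z_of(x):                         # map a 12-bit index to z in [-8, 0)
    return -8.0 + x * STEP

# Table A: sb at the midpoint of each (x2, x1) segment.
# Table B: a linear correction per (x2, x0), one slope per x2 region.
A, B = {}, {}
for x2 in range(1 << N2):
    for x1 in range(1 << N1):
        mid = (x2 << (N1 + N0)) | (x1 << N0) | (1 << (N0 - 1))
        A[(x2, x1)] = sb(z_of(mid))
    seg = x2 << (N1 + N0)
    slope = (sb(z_of(seg + (1 << N0) - 1)) - sb(z_of(seg))) / ((1 << N0) - 1)
    for x0 in range(1 << N0):
        B[(x2, x0)] = slope * (x0 - (1 << (N0 - 1)))

def sb_bipartite(x):                 # two lookups and one addition
    x2 = x >> (N1 + N0)
    x1 = (x >> N0) & ((1 << N1) - 1)
    x0 = x & ((1 << N0) - 1)
    return A[(x2, x1)] + B[(x2, x0)]
```

A direct table for 12 index bits needs 4096 entries; the two bipartite tables need only 2 x 256 entries, at the cost of one extra integer addition.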
Previous research shows the Signed Logarithmic Number System (SLNS) offers lower power consumption than the fixed-point number system for MPEG decoding. SLNS represents a value with the logarithm of its absolute value and a sign bit. Subtraction is harder in SLNS than the other operations. This paper examines a variant, Dual-Redundant LNS (DRLNS), where addition and subtraction are equally easy, but DRLNS-by-DRLNS multiplication is not. DRLNS represents a value as the difference of two terms, both of which are represented logarithmically. DRLNS is appropriate for the Inverse Discrete Cosine Transform (IDCT) used in MPEG decoding because a novel accumulator register can hold the running sum in DRLNS while the products are fed to it in non-redundant SLNS format. Since DRLNS doubles the word size, the accumulator contents need to be converted back into SLNS. This paper considers two such methods. One computes the difference of the two parts using LNS. The other converts the two parts separately to fixed point and then computes the logarithm of their difference. A novel factoring of a common term out of the two parts reduces the bus widths. Mitchell's low-cost logarithm/antilogarithm approximation is shown to produce acceptable visual results in this conversion.
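A behavioural sketch of the DRLNS accumulation idea, using Python floats to stand in for the log-domain words: the running sum is held as the difference of two logarithmically represented parts, so folding in a signed SLNS product only ever needs the easy LNS addition function, never the difficult subtraction function. The class and method names are invented for illustration.

```python
import math

def sb(z):
    # the "easy" LNS addition function log2(1 + 2^z), z <= 0
    return math.log2(1.0 + 2.0 ** z)

class DRLNSAcc:
    """Accumulator whose value is 2**P - 2**N (DRLNS, illustrative)."""

    def __init__(self):
        self.P = float('-inf')       # log2 of the positive part (zero so far)
        self.N = float('-inf')       # log2 of the negative part (zero so far)

    def add(self, log_mag, sign):
        # fold an SLNS product (sign, log2|v|) into the matching part;
        # both branches use only sb(), never a subtraction function
        if sign >= 0:
            self.P = max(self.P, log_mag) + sb(-abs(self.P - log_mag))
        else:
            self.N = max(self.N, log_mag) + sb(-abs(self.N - log_mag))

    def value(self):
        return 2.0 ** self.P - 2.0 ** self.N
```

Converting the final sum back to SLNS means evaluating 2^P - 2^N, which is exactly where the two conversion methods and Mitchell's approximation mentioned above come into play.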
The Logarithmic Number System (LNS) has lower power and larger dynamic range than fixed point, which makes LNS suitable for designing low-power, portable devices. Motion estimation is a key part of the MPEG encoding system. This paper introduces LNS into motion estimation for the MPEG encoding system. The block matching technique is the most commonly used motion-estimation method in MPEG encoding. The Mean Absolute Difference (MAD) is an inexpensive fixed-point cost function that sums the absolute differences of the pixel values in the reference and encoded frames. Since LNS addition and subtraction are expensive, we propose using the quotient of the two pixel values instead of their difference. LNS division needs only a fixed-point subtractor. By analogy with the absolute difference, we take the quotient of the larger value over the smaller value. We call this new cost function Mean Larger Ratio (MLR). The product of such ratios is calculated for each of the macroblocks in MPEG frames. With MLR, an LNS implementation requires approximately the same hardware as a fixed-point MAD implementation. Example videos show that MLR is a practical cost function for motion estimation with LNS.
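The contrast between the two cost functions can be sketched directly. In LNS hardware the log of each larger-over-smaller ratio is obtained by one fixed-point subtraction of logarithms, mirroring the one fixed-point subtraction per pixel that MAD needs; the log-domain mean below is an illustrative formulation, not the paper's exact hardware.

```python
import math

def mad(ref, cur):
    # Mean Absolute Difference: fixed-point friendly cost function
    return sum(abs(a - b) for a, b in zip(ref, cur)) / len(ref)

def mlr_log(ref, cur):
    # Mean Larger Ratio accumulated in the log domain: the ratio of
    # the larger pixel over the smaller is |log2 a - log2 b|, i.e.
    # one fixed-point subtraction per pixel in LNS (pixels assumed > 0)
    return sum(abs(math.log2(a) - math.log2(b))
               for a, b in zip(ref, cur)) / len(ref)
```

A perfect match scores zero under both functions and a worse match scores higher, which is all a block-matching search requires of its cost function.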
This paper describes truncated squarers, which are specialized squarers with a portion of the squaring matrix eliminated. Rounding error and errors due to matrix reduction are quantified and analyzed. Constant and variable correction techniques are presented that minimize either the mean error or the maximum absolute error as required by the application. Area and delay estimates are presented for a number of designs, as well as error statistics obtained both analytically and numerically by exhaustive simulation. As an example, one design of a 16-bit truncated squarer using constant correction is 10.1% faster and requires 27.9% less area than a comparable standard squarer with true rounding. The range of error for this truncated squarer is -0.892 to +0.625 ulps, compared to +/-0.5 ulps for the standard squarer.
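The construction can be sketched behaviourally: partial-product bits of the squaring matrix below a cut-off column are never formed, and a constant correction recovers their average contribution. The 8-bit width, cut-off column, and mean-style correction below are illustrative assumptions, not the 16-bit design reported in the paper.

```python
N, K = 8, 5          # 8-bit operand; columns below 2**K are never formed

def kept(x):
    # sum of the squaring-matrix partial products in columns K and up
    s = 0
    for i in range(N):
        for j in range(N):
            if i + j >= K and (x >> i) & 1 and (x >> j) & 1:
                s += 1 << (i + j)
    return s

# Constant correction: average value of the discarded columns plus a
# half-ulp rounding offset for the N-bit result.
mean_disc = sum(x * x - kept(x) for x in range(1 << N)) / (1 << N)
C = round(mean_disc) + (1 << (N - 1))

def tsq(x):
    # N-bit truncated square with constant correction, N-bit result
    return (kept(x) + C) >> N
```

For this toy design the result stays within one ulp of the exact scaled square; choosing the constant differently trades mean error against maximum absolute error, as the paper quantifies.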
Truncated multipliers offer significant improvements in area, delay, and power. However, little research has been done on their use in actual applications, probably due to concerns about the computational errors they introduce. This paper describes a software tool used for simulating the use of truncated multipliers in DCT and IDCT hardware accelerators. Images that have been compressed and decompressed by DCT and IDCT accelerators using truncated multipliers are presented. In accelerators based on Chen's algorithm (256 multiplies per 8 x 8 block for DCT, 224 multiplies per block for IDCT), there is no visible difference between images reconstructed using truncated multipliers with 55% of the multiplication matrix eliminated and images reconstructed using standard multipliers with the same operand lengths and intermediate precision.
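The kind of arithmetic the simulation tool models can be sketched for a single multiply: dropping the low columns of the partial-product matrix bounds the error by the maximum value of the discarded bits. The 8-bit width and cut-off below are illustrative; dropping the low 8 of 16 columns eliminates about 56% of an 8 x 8 matrix, comparable to the 55% figure cited above.

```python
def trunc_mul(a, b, n, t):
    # n x n truncated multiply: partial products in the t low-order
    # columns are never formed (no correction applied in this sketch)
    s = 0
    for i in range(n):
        for j in range(n):
            if i + j >= t and (a >> i) & 1 and (b >> j) & 1:
                s += 1 << (i + j)
    return s

def fraction_eliminated(n, t):
    # fraction of the n*n partial-product bits that are dropped
    dropped = sum(1 for i in range(n) for j in range(n) if i + j < t)
    return dropped / (n * n)

def max_dropped(n, t):
    # worst-case value of the discarded bits (all dropped bits equal 1)
    return sum((1 << (i + j)) for i in range(n) for j in range(n)
               if i + j < t)
```

The truncated product always underestimates the exact product, by at most max_dropped(n, t); this bound is what a correction term, or a DCT/IDCT simulation like the one described above, must absorb.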
Floating-point and fixed-point arithmetic are expensive for portable multimedia devices. Low-cost Logarithmic Number System (LNS) arithmetic can reduce power consumption of MPEG decoding in exchange for barely perceptible video artifacts. Different number representations need different word sizes to produce the same quality image. LNS can produce good visual results using fewer bits than fixed point. Round-to-nearest is the usual choice for fixed point and floating point, but LNS allows a cheaper unrestricted faithful-rounding mode that does not degrade the visual quality of MPEG outputs. This paper also describes how the Berkeley MPEG tools were modified to carry out these MPEG arithmetic experiments.
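The two rounding modes can be contrasted on the LNS representation itself: round-to-nearest keeps the stored logarithm within half an ulp of the exact value, while a faithful rounding only promises one of the two neighbouring representable values, and truncation is one such rounding that is cheaper to produce. The 6-fractional-bit format below is an assumed example, not a format from the paper.

```python
import math

F = 6                                # assumed fractional bits of the log

def lns_nearest(x):
    # round-to-nearest: within half an ulp of the exact logarithm
    return round(math.log2(x) * (1 << F))

def lns_faithful(x):
    # truncation: a faithful rounding, always one of the two
    # representable neighbours of the exact logarithm
    return math.floor(math.log2(x) * (1 << F))

def decode(e):
    # map a stored logarithm back to a real value
    return 2.0 ** (e / (1 << F))
```

Either mode keeps the decoded value within one representable step of x; faithful rounding simply drops the extra hardware that decides which neighbour is closer.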
Field Programmable Gate Arrays (FPGAs) have some difficulty with the implementation of floating-point operations. In particular, the large number of slices needed by floating-point multipliers makes floating point impractical on smaller, less expensive FPGAs. An alternative is the Logarithmic Number System (LNS), in which multiplication and division are easy and fast. LNS also has the advantage of lower power consumption than fixed point. The problem with LNS has been the implementation of addition. There are many price/performance tradeoffs in the LNS design space between pure software and specialised high-speed hardware. This paper focuses on a compromise between these extremes. We report on a small RISC core of our own design (loosely inspired by the popular ARM processor) in which only a 4 percent additional investment in FPGA resources beyond that required for the integer RISC core more than doubles the speed of LNS addition compared to a pure software approach. Our approach shares resources in the datapath of the non-LNS parts of the RISC, so that the only significant cost is the decoding and control for the LNS instruction. Since adoption of LNS depends on its cost effectiveness (e.g., FLOPs/slice), we compare our design against an earlier LNS ALU implemented in a similar FPGA. Our preliminary experiments suggest that modest LNS-FPGA implementations like ours are more cost effective than pure software and can be as cost effective as more expensive LNS-FPGA implementations that attempt to maximise speed. Thus, our LNS-RISC fits in the Virtex-300, which is not possible for comparable speed-oriented designs.