The design of a microprocessor is a long, tedious, and error-prone task that typically consists of three design phases:
architecture exploration; software design (assembler, linker, loader, profiler); and architecture implementation (RTL
generation for FPGA or cell-based ASIC) and verification. The Language for Instruction-Set Architectures (LISA)
allows a microprocessor to be modeled not only at the instruction-set level but also at the architecture level, including
pipelining behavior, which keeps the design and development tools consistent across all levels of the design.
To explore the capabilities of the LISA processor design platform, also known as CoWare Processor Designer, we present
in this paper three microprocessor designs that implement an 8/8 wavelet transform processor of the kind typically
used in today's FBI fingerprint compression scheme. We first designed a 3-stage pipelined 16-bit RISC processor
(NanoBlaze). Although RISC µPs are usually considered "fast" processors because of design concepts such as constant
instruction word size, deep pipelines, and many general-purpose registers, it turns out that DSP operations
consume a substantial share of the processing time in a RISC processor. In a second step we used design principles from
programmable digital signal processors (PDSPs) to improve the throughput of the DWT processor. A multiply-accumulate
operation together with indirect addressing was the key to achieving higher throughput. A
further improvement is possible with today's FPGA technology. Today's FPGAs offer a large number of embedded
array multipliers, and it is now feasible to design a "true" vector processor (TVP). With our TVP, a multiplication of two
vectors takes just one clock cycle and a complete scalar product takes two clock cycles. Code
profiling and Xilinx FPGA ISE synthesis results are provided that demonstrate the substantial improvement
a TVP offers over traditional RISC or PDSP designs.
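
The difference between a multiply-accumulate (MAC) loop and a vector-style scalar product can be illustrated with a small Python sketch. This is a toy cycle-count model, not the LISA description of any of the three processors: the function names are hypothetical, the filter taps are arbitrary placeholder integers rather than the 8/8 wavelet coefficients, and the assumption of one MAC per clock cycle for the PDSP-style loop is ours. Only the two-cycle figure for the vector case is taken from the text above.

```python
def scalar_product_mac(x, h):
    """PDSP-style scalar product: one multiply-accumulate per loop pass.

    Assumption: each pass costs one clock cycle (not stated in the source).
    """
    acc = 0
    cycles = 0
    for xi, hi in zip(x, h):
        acc += xi * hi  # multiply and accumulate in a single step
        cycles += 1
    return acc, cycles


def scalar_product_vector(x, h):
    """Vector-style scalar product using one multiplier per element.

    Cycle 1: all element-wise products are formed in parallel.
    Cycle 2: the products are summed (for example by an adder tree).
    """
    products = [xi * hi for xi, hi in zip(x, h)]  # cycle 1
    total = sum(products)                         # cycle 2
    return total, 2


# Placeholder data: eight samples and eight arbitrary integer taps.
samples = [3, -1, 4, 1, -5, 9, 2, -6]
taps = [1, 2, 3, 4, 4, 3, 2, 1]

mac_result, mac_cycles = scalar_product_mac(samples, taps)
vec_result, vec_cycles = scalar_product_vector(samples, taps)
```

Both functions return the same scalar product; under the stated assumptions only the modeled cycle count differs, growing with the vector length for the MAC loop and staying at two for the vector version.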

Cascaded integrator-comb (CIC) filters are among the most economical multirate filters and are widely used as decimators in digital receivers. Size and power improvements in this crucial unit of a digital receiver may substantially improve system performance, including the battery lifetime of a wireless portable system. Since the only block in this filter structure that can be improved for better size/power performance is the adder, we have designed several CIC decimators using different arithmetic schemes and synthesized the designs for Altera Field Programmable Gate Arrays (FPGAs). The arithmetic schemes used are two's complement addition, a Carry Save Adder (CSA) using parallel counter logic, and a Modified Carry Save Adder (MCSA) that incorporates a Wallace tree structure. Each of these CIC decimators is a 5-stage design with 16-bit input and output and a rate change factor of 1000. Due to the presence of the integrators, the required internal bit-width is 66 bits. To keep the output bit-width equal to the input bit-width, two different pruning schemes are used: pruning at each stage of the design, and pruning at the final stage alone. Synthesis results are tabulated for the individual adder designs and for the CIC designs with both pruning schemes, for all possible synthesis options. Depending on the requirements of the application, the synthesis results can be used to choose the CIC decimator that consumes the least silicon, the design that provides the best speed, or the design that is most cost-effective in terms of area/speed product.
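
The 66-bit internal width quoted above is consistent with the standard register-growth bound for CIC filters, input width plus ceil(N * log2(R * M)), for N = 5 stages, rate change R = 1000, and a differential delay M = 1 (the value of M is an assumption; the abstract does not state it). The following Python sketch computes that bound and models a plain two's complement CIC decimator with wrap-around arithmetic. All function names are hypothetical, and the final-stage pruning shown is simple truncation of low-order bits, which may differ from the pruning actually used in the designs.

```python
def cic_register_width(stages, rate, input_bits, diff_delay=1):
    """Full-precision register width: input_bits + ceil(N * log2(R * M))."""
    gain = (rate * diff_delay) ** stages   # DC gain of the CIC filter
    growth = (gain - 1).bit_length()       # ceil(log2(gain)) for gain >= 1
    return input_bits + growth


def cic_decimate(samples, stages, rate, input_bits, diff_delay=1):
    """Two's complement CIC decimator with modular (wrap-around) registers."""
    width = cic_register_width(stages, rate, input_bits, diff_delay)
    mask = (1 << width) - 1
    integrators = [0] * stages
    comb_delays = [[0] * diff_delay for _ in range(stages)]
    outputs = []
    for n, x in enumerate(samples):
        acc = x & mask
        # Integrator section runs at the input rate; overflow simply wraps.
        for i in range(stages):
            integrators[i] = (integrators[i] + acc) & mask
            acc = integrators[i]
        # Keep every `rate`-th sample, then run the comb section.
        if (n + 1) % rate == 0:
            y = acc
            for i in range(stages):
                delayed = comb_delays[i].pop(0)
                comb_delays[i].append(y)
                y = (y - delayed) & mask
            # Reinterpret the register contents as a signed value.
            if y >= 1 << (width - 1):
                y -= 1 << width
            outputs.append(y)
    return outputs


def prune_final_stage(value, width, output_bits):
    """Final-stage pruning by truncation: drop the low-order bits."""
    return value >> (width - output_bits)


internal_width = cic_register_width(stages=5, rate=1000, input_bits=16)
```

With the parameters from the text, `internal_width` evaluates to 66. The wrap-around in the integrators is harmless as long as every register is at least as wide as this bound, which is why two's complement arithmetic is the usual baseline for such filters.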