In this paper we present an implementation of a series of complex-valued operators defined in Ref. 2: Complex-Multiply-Add (CMA), Complex-Sum of Squares (CSS), and Complex-Sum of Products (CSP). The preceding
paper (Ref. 2) defined these operators at an algorithmic level, for which we now provide actual hardware performance
metrics through detailed discussion of their implementation for an Altera Stratix II (Ref. 17) FPGA device. In addition
to discussing these designs in particular, we present our methodology and choice of tools to create a pragmatic
ISA extensions for DLX-type architectures are proposed to perform high-radix online floating-point addition
on fixed-point units with extended feature sets. Online arithmetic computes results most significant digit first,
which allows overlapped execution of dependent operations and offers greater instruction scheduling
opportunities than software implementations of conventional floating-point addition. We seek
an ISA formulation that occupies a middle ground between full hardware floating-point addition units and software
implementations based strictly on available ALU logic.
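To make the most-significant-digit-first idea concrete, the following is a minimal sketch of radix-2 signed-digit addition written as a digit-serial stream. It is not the high-radix floating-point scheme or the ISA extension proposed above; it is the textbook carry-free addition over the digit set {-1, 0, 1}, and the names `split_position_sum`, `online_add`, and `value` are hypothetical helpers introduced only for illustration. Each result digit is emitted after a fixed lookahead of two input digits, which is what lets a dependent operation start before the addition has finished.

```python
from fractions import Fraction
from itertools import chain, product


def split_position_sum(p, p_next):
    """Split p = 2*t + w, choosing w so that w plus the transfer arriving
    from the next (less significant) position stays inside {-1, 0, 1}."""
    if p == 2:
        return 1, 0
    if p == -2:
        return -1, 0
    if p == 1:
        return (1, -1) if p_next >= 0 else (0, 1)
    if p == -1:
        return (0, -1) if p_next >= 0 else (-1, 1)
    return 0, 0


def online_add(xs, ys):
    """Consume signed digits of x and y MSD first (weights 2^-1, 2^-2, ...)
    and yield the digits of x + y MSD first (weights 2^0, 2^-1, ...)."""
    prev_p = None  # position sum still waiting for its lookahead digit
    prev_w = 0     # interim digit waiting for the incoming transfer
    # Two trailing zeros flush the last digits out of the pipeline.
    for p_next in chain((x + y for x, y in zip(xs, ys)), (0, 0)):
        if prev_p is not None:
            t, w = split_position_sum(prev_p, p_next)
            yield prev_w + t
            prev_w = w
        prev_p = p_next


def value(digits, first_exp):
    """Exact value of a digit string whose first digit has weight 2^first_exp."""
    return sum(Fraction(d) * Fraction(2) ** (first_exp - i)
               for i, d in enumerate(digits))


x = [1, 1, -1]  # 1/2 + 1/4 - 1/8 = 5/8
y = [1, 0, 1]   # 1/2 + 1/8 = 5/8
z = list(online_add(x, y))
print(z, value(z, 0))  # [1, 1, -1, 0] 5/4

# Exhaustive check over all 3-digit operand pairs: the result digits stay in
# the signed-digit set and the represented value is exactly x + y.
for xs in product((-1, 0, 1), repeat=3):
    for ys in product((-1, 0, 1), repeat=3):
        zs = list(online_add(xs, ys))
        assert all(d in (-1, 0, 1) for d in zs)
        assert value(zs, 0) == value(xs, -1) + value(ys, -1)
```

The first result digit is produced once two operand digit pairs have been consumed, and every later digit follows one step behind the inputs, so a consumer of the sum never waits for the full word.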
In this paper we propose an interconnection scheme to compute any unfactored arithmetic expression as a network of online modules. This is accomplished by mapping the expression to a doubly-linked hypercube network of online units. The mapping algorithm guarantees a maximum dilation of 2 with unit load, and we conjecture that any unfactored expression can be mapped to the proposed architecture with a small delay overhead. The proposed architecture requires no reconfiguration to accomplish the mapping, providing an efficient way to compute any network of online operations.
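The two embedding metrics named above can be checked mechanically. The sketch below assumes a plain binary hypercube whose nodes are labeled by integers, so that the distance between two nodes is the Hamming distance of their labels; it does not model the doubly-linked variant or reproduce the paper's mapping algorithm. The expression, the node names, and the placement are hypothetical and hand-chosen for illustration.

```python
from collections import Counter


def hamming(u, v):
    """Number of hypercube links on a shortest path between nodes u and v."""
    return bin(u ^ v).count("1")


def dilation(edges, placement):
    """Largest host distance between the images of two adjacent guest nodes."""
    return max(hamming(placement[a], placement[b]) for a, b in edges)


def load(placement):
    """Largest number of guest nodes placed on a single host node."""
    return max(Counter(placement.values()).values())


# Operator tree of a*b + c*d + e*f + g*h: four multipliers (m0..m3) feed
# two adders (s0, s1), which feed the final adder (root).
EDGES = [
    ("m0", "s0"), ("m1", "s0"),
    ("m2", "s1"), ("m3", "s1"),
    ("s0", "root"), ("s1", "root"),
]

# Hand-chosen placement on a 3-dimensional hypercube (assumption, not the
# output of the mapping algorithm described in the text).
PLACEMENT = {
    "root": 0b000,
    "s0": 0b001, "s1": 0b010,
    "m0": 0b011, "m1": 0b101,
    "m2": 0b110, "m3": 0b111,
}

print(dilation(EDGES, PLACEMENT), load(PLACEMENT))  # 2 1
```

Here only the edge from `m3` to `s1` spans two hypercube links; every other operand travels over a single link, and no hypercube node hosts more than one operator.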
We present an online arithmetic scheme for hardware evaluation of multinomials arising in Bayesian networks. The design approach represents the multinomial in factored form as an arithmetic circuit, which is then partitioned into subgraphs and mapped to FPGA hardware as a network of online modules connected serially and operating in an overlapped manner. This minimizes the interconnect demand without a drastic increase in computation latency. We developed a partitioning/mapping algorithm, designed basic radix-2 online operators and modules, and determined their cost/performance characteristics. We also evaluated the cost/performance characteristics of implementing a Bayesian network on an FPGA chip.
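As an illustration of what a factored multinomial looks like as an arithmetic circuit, the sketch below evaluates the network polynomial of a hypothetical two-variable Bayesian network A -> B in software. The network, its parameters, and the names `CIRCUIT`, `PARAMS`, `leaves_for`, and `evaluate` are assumptions made for this example; the partitioning, the online modules, and the FPGA mapping described above are not modeled.

```python
from fractions import Fraction as F
from math import prod

# Factored polynomial: f = sum_a lam_a * th_a * (sum_b lam_b * th_b|a).
# Nodes are listed so that every child appears before its parent.
CIRCUIT = [
    ("t_b0_a0", "mul", ["lam_b0", "th_b0_a0"]),
    ("t_b1_a0", "mul", ["lam_b1", "th_b1_a0"]),
    ("sum_a0", "add", ["t_b0_a0", "t_b1_a0"]),
    ("t_b0_a1", "mul", ["lam_b0", "th_b0_a1"]),
    ("t_b1_a1", "mul", ["lam_b1", "th_b1_a1"]),
    ("sum_a1", "add", ["t_b0_a1", "t_b1_a1"]),
    ("t_a0", "mul", ["lam_a0", "th_a0", "sum_a0"]),
    ("t_a1", "mul", ["lam_a1", "th_a1", "sum_a1"]),
    ("f", "add", ["t_a0", "t_a1"]),
]

# Hypothetical network parameters: P(A) and P(B | A).
PARAMS = {
    "th_a0": F(7, 10), "th_a1": F(3, 10),
    "th_b0_a0": F(8, 10), "th_b1_a0": F(2, 10),
    "th_b0_a1": F(1, 10), "th_b1_a1": F(9, 10),
}


def leaves_for(evidence):
    """Set each evidence indicator to 1 unless it contradicts the evidence."""
    leaves = dict(PARAMS)
    for var in ("a", "b"):
        for val in (0, 1):
            consistent = evidence.get(var, val) == val
            leaves[f"lam_{var}{val}"] = F(1) if consistent else F(0)
    return leaves


def evaluate(circuit, leaves):
    """Evaluate the circuit bottom-up; the last node listed is the output."""
    values = dict(leaves)
    for name, op, children in circuit:
        args = [values[c] for c in children]
        values[name] = sum(args) if op == "add" else prod(args)
    return values[circuit[-1][0]]


print(evaluate(CIRCUIT, leaves_for({"b": 1})))  # 41/100, i.e. P(B = 1)
```

Sharing the inner sums between terms is what the factored form buys: the circuit uses nine operator nodes here, and each node corresponds to one operator that a hardware mapping would have to place and connect.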