Direct digital synthesizers (DDS), or numerically controlled oscillators, are a functional requirement of virtually every digital communications system, including modems and software defined radios. Frequency synthesis is commonly realized using application specific standard parts or as software on a DSP processor. With ever increasing amounts of digital signal processing being realized using field programmable gate array (FPGA) based hardware platforms, it is fruitful to explore various DDS architectures and evaluate the many possible architecture/performance tradeoffs with a view to FPGA implementation. This paper describes three DDS architectures and presents several designs that illustrate DDS performance and highlight design considerations for FPGA implementation.
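As an illustration of the kind of structure being discussed, the following is a minimal software sketch of the common phase-accumulator-plus-lookup-table form of a DDS. It is not taken from the paper and is not necessarily one of its three architectures; the function names, the 32-bit accumulator width, and the 10-bit table depth are all assumptions chosen for the example.

```python
import math

def make_sine_table(table_bits):
    """Build one full sine period with 2**table_bits entries (assumed table layout)."""
    size = 1 << table_bits
    return [math.sin(2 * math.pi * i / size) for i in range(size)]

def tuning_word_for(f_out, f_clk, acc_bits=32):
    """Phase increment per clock that yields roughly f_out for a given f_clk."""
    return round(f_out / f_clk * (1 << acc_bits))

def dds_samples(tuning_word, n_samples, acc_bits=32, table_bits=10):
    """Generate samples from a phase accumulator truncated to a table address."""
    table = make_sine_table(table_bits)
    mask = (1 << acc_bits) - 1          # accumulator wraps modulo 2**acc_bits
    shift = acc_bits - table_bits       # keep only the top table_bits of phase
    phase = 0
    out = []
    for _ in range(n_samples):
        out.append(table[phase >> shift])
        phase = (phase + tuning_word) & mask
    return out
```

In this sketch the output frequency is set purely by the tuning word, and the accumulator width fixes the frequency resolution at f_clk / 2**acc_bits, while the table depth trades memory for phase-truncation error.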
This contribution describes the design and implementation of an algorithm-agile cryptographic co-processor board. The core of the board is an FPGA which can be dynamically configured with a variety of block ciphers. The FPGA is capable of encrypting data at high speed through an ISA bus interface. The board contains a RAM with a collection of FPGA configuration files. In addition, algorithms can be added or deleted during operation. The co-processor board also contains other reconfigurable logic, a microprocessor for control functions, and high-speed FIFOs for data storage. We report on the general design, our experiences with this proof-of-concept implementation, and recommendations for future work.
Connected Component Labelling is an important task in intermediate image processing. Several algorithms have been developed to handle this problem. Hardware implementations have typically been based on massively parallel architectures, with one logical processing element per pixel. This approach requires a great deal of logic, so current solutions are often implemented in VLSI rather than on FPGAs, and are limited in the size of image which can be labelled.
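For readers unfamiliar with the task, the sketch below shows the classic sequential two-pass labelling algorithm with a union-find equivalence table. It is a software reference only, not the massively parallel hardware approach the abstract refers to; the function name `label_components`, the 4-connectivity choice, and the list-of-lists image format are assumptions made for this example.

```python
def label_components(image):
    """Label 4-connected foreground regions of a binary image (list of rows)."""
    rows = len(image)
    cols = len(image[0]) if rows else 0
    labels = [[0] * cols for _ in range(rows)]
    parent = [0]  # union-find table; index 0 is reserved for background

    def find(x):
        # Follow parent links to the root, halving the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[max(ra, rb)] = min(ra, rb)

    # First pass: assign provisional labels and record equivalences.
    for r in range(rows):
        for c in range(cols):
            if not image[r][c]:
                continue
            up = labels[r - 1][c] if r > 0 else 0
            left = labels[r][c - 1] if c > 0 else 0
            if up == 0 and left == 0:
                parent.append(len(parent))      # new provisional label
                labels[r][c] = len(parent) - 1
            else:
                labels[r][c] = min(l for l in (up, left) if l)
                if up and left:
                    union(up, left)

    # Second pass: replace each provisional label by a compact final label.
    remap = {}
    for r in range(rows):
        for c in range(cols):
            if labels[r][c]:
                root = find(labels[r][c])
                labels[r][c] = remap.setdefault(root, len(remap) + 1)
    return labels
```

The equivalence table grows with the number of provisional labels, which is one reason image size matters for any implementation of this task.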
QUICKFLEX's patent-pending Quick Qard Technology (QQT) Reconfigurable Computing (RC) architecture approaches RC from an Operating System (OS) viewpoint, rather than from the perspective of any one particular application. As such, many problems important to integrating reconfigurable logic into the OS are solved. These problems would otherwise prevent RC from easily being used in a general-purpose mainstream market. The difference between QQT and other RC board approaches lies not so much in the RC hardware and boards as in the way the RC hardware is utilized as part of a system.
Cryptographic algorithms are constantly evolving to meet security needs, and modular arithmetic is an integral part of these algorithms, especially in the case of public-key cryptosystems. To achieve optimal system performance while maintaining physical security, it is desirable to implement cryptographic algorithms in hardware. However, many public-key cryptographic algorithms require the implementation of modular arithmetic, specifically modular multiplication, for operands of 1024 bits in length. Additionally, algorithm agility is required to support algorithm independent protocols, a feature of most modern security protocols. Reprogrammability, particularly in-system reprogrammability, is critical in enabling the switching between cryptographic algorithms required for algorithm independent protocols. Field Programmable Gate Arrays (FPGAs) are a viable option for achieving this goal. Ideally, the targeted FPGA will have been designed with the architectural requirements for wide-operand modular arithmetic in mind in an effort to maximize system performance. This contribution investigates existing FPGA architectures with respect to modular multiplication. It also proposes a new FPGA architecture optimized for the wide-operand additions required for modular multiplication.
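To make concrete why wide-operand addition dominates modular multiplication, here is a minimal sketch of the textbook interleaved (shift-and-add) method, in which every step is a shift, an addition, or a conditional subtraction on full-width operands. This is an illustrative assumption, not the specific algorithm evaluated in the paper; the function name `mod_mul` is hypothetical.

```python
def mod_mul(a, b, m):
    """Return (a * b) mod m using only shifts, additions, and subtractions.

    Assumes m > 0 and 0 <= a < m; b may be any non-negative integer.
    Bits of b are scanned from most significant to least significant.
    """
    if m <= 0 or not (0 <= a < m) or b < 0:
        raise ValueError("require m > 0, 0 <= a < m, b >= 0")
    result = 0
    for i in reversed(range(b.bit_length())):
        result <<= 1                 # double the partial result (< 2m)
        if result >= m:
            result -= m              # one wide subtraction restores result < m
        if (b >> i) & 1:
            result += a              # one wide addition (< 2m again)
            if result >= m:
                result -= m
    return result
```

For 1024-bit operands this loop performs on the order of a thousand iterations, each containing 1024-bit additions or subtractions, so the speed of the carry path for wide adders largely determines overall throughput.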
A high bandwidth, low latency processor interconnect has been constructed using off-the-shelf FPGAs and a small amount of additional logic. The Achilles system comprises a router in which the main circuit elements (the FPGAs) have been placed on ten small PCBs and connected together to make a router "stack" and a PCI interface. On the first prototype, channel bandwidth has been measured at 28 Mbytes/second and latency at 800 ns. The newer low-voltage FPGAs will be used in the next Achilles variant, allowing the bandwidth to increase to close to the theoretical maximum for the PCI bus. The router stack and PCI interface also have application as a general purpose reconfigurable processor which may be arbitrarily extended by adding further stacks to the system.
In this paper, we present a rapid prototyping environment for mixed signal systems. The environment consists of programmable mixed signal hardware together with a set of integrated CAD tools to enable fast prototyping of mixed signal designs from high-level specifications. The prototyping hardware comprises field-programmable analog arrays (FPAAs) and field-programmable gate arrays (FPGAs), on which the analog and digital sections of the design are respectively implemented. Field-programmable interconnect routes signals between multiple devices. A bank of data converters constitutes the interface between the analog and digital parts. Design tools are required to map the given design onto the prototyping hardware. The high-level design specification is first compiled into an intermediate format suitable for synthesis. Following this, the design is partitioned into analog and digital sections. The analog and digital subsystems are synthesized for the target FPAA and FPGA devices respectively. Configuration bitstreams are generated and downloaded onto the respective devices.
The PLDSP is a reconfigurable computing and signal processing architecture. It uses FPGA technology to allow custom signal processing functions to be implemented which can run at rates comparable to real-time dedicated hardware but with easy software-based reconfiguration. Using this methodology, some kinds of algorithms can be made to perform at speeds many times faster than a general purpose computer or even a DSP chip. Depending on the amount of parallelism, processing speeds can easily be in the range of billions of operations per second. It may also be used for rapid prototyping and development of IP which may eventually be implemented in custom ASIC devices. This paper describes the key features of the PLDSP board and several plug-in modules. Several applications from signal and image processing are described, and their implementation, performance, and resource usage are reviewed.
In this paper we introduce a validation process for digital designs implemented in FPGAs. It allows designers to test their implementation using real application data and to obtain real results. With such real data, it becomes easier to identify where an error occurs and then to understand it.
Although many studies have been performed to determine the overall parallelism of various applications, little is known about how parallelism changes dynamically during program execution. In this paper, we present a methodology for measuring the dynamic parallelism of a general purpose workstation workload, as represented by a subset of the SPEC95 benchmark suite. We measure the range of parallelism encountered, the rate of major parallelism changes, and the regularity of these changes using a detailed model of an aggressive out-of-order speculative microprocessor. We find that parallelism can vary significantly and rapidly during application execution, and that varying the level of hardware support for exploiting application parallelism can have a non-uniform effect. We discuss how configurable processors oriented towards general purpose computing can potentially exploit these application characteristics.
FPGAs have been successfully used to accelerate a wide variety of applications on a large number of systems. The FPGA devices in these systems are typically configured once by the application and seldom perform any sort of reconfiguration during execution. With the advent of new device architectures and new software tools, the interest in Run-Time Reconfiguration (RTR) has increased. As with previous efforts, the focus of RTR work has primarily been either on purely theoretical results or on demonstrating performance improvements over software-based solutions. In this paper we explore some of the more practical design issues surrounding RTR systems, and evaluate the advantages of RTR in terms of savings in hardware and software complexity. Preliminary results indicate that RTR can dramatically reduce the amount of FPGA logic and software support necessary for even simple coprocessing applications.
Dynamically reconfigurable processors are becoming increasingly viable with the advent of modern field-programmable devices. A key feature of dynamically reconfigurable FPGAs is that the logic and interconnect are time-multiplexed. This enables the implementation of large circuits by partitioning the specification into multiple segments that execute one after the other on the reconfigurable processor. The available resources can be reused, which gives us a virtually infinite pool of resources. In this paper, we introduce a novel technique for temporal partitioning and synthesis of behavioral specifications for reconfigurable architectures. We try to optimize the overall latency by performing a trade-off between the total number of partitioned segments and the latency of each segment. Our approach integrates partitioning and scheduling, and performs design exploration to exploit the trade-off. We also introduce an enhanced Force-Directed List Scheduling algorithm to perform partitioning. We demonstrate the effectiveness of our approach with experimental results.
This paper presents an Image Processing Coprocessor implementation for XC6000 series FPGAs. The FPGA acts as a semi-autonomous abstract coprocessor carrying out image processing operations independently. This paper outlines the main structure of the image processing coprocessor in addition to its high level programming environment. The environment provides a library of very high level, parametrized architecture descriptions which are scalable and general.
GeneticFPGA is a Java-based tool for evolving digital circuits on Xilinx XC4000EX™ and XC4000XL™ devices. Unlike other FPGA architectures popular with Evolutionary Hardware researchers, the XC4000 series architectures cannot accept arbitrary configuration data. Only a small subset of configuration bit patterns will produce operational circuits; other configuration bit patterns produce circuits which are unreliable and may even permanently damage the FPGA device. GeneticFPGA uses novel software techniques to produce legal circuit configurations for these devices, permitting experimentation with evolvable hardware on the larger, faster, more mainstream devices. In addition, these techniques have led to methods for evolving circuits which are neither temperature, voltage, nor silicon dependent. An 8-bit counter and several digital frequency dividers have been successfully evolved using this approach. GeneticFPGA uses Xilinx's JBits™ interface to control the generation of bitstream configuration data and the XHWIF portable hardware interface to communicate with a variety of commercially available FPGA-based hardware. GeneticFPGA, JBits, and XHWIF are currently being ported to the Xilinx Virtex™ family of devices, which will provide greatly increased reconfiguration speed and circuit density.
We describe the design and implementation of a speed feedback module for an adaptive motor control system. This system is innovative in its use of reconfigurable logic to accurately measure the speed of a motor.
This paper discusses the Chidi holographic video processing system (called Holo-Chidi) used for real-time computation of computer generated holograms and the subsequent display of the holograms at video frame rates. Chidi is a reconfigurable multimedia processing system designed at the MIT Media Laboratory for real-time synthesis and analysis of multimedia data in general and digital video frames in particular. Holo-Chidi, which is an adaptation of Chidi, comprises two main components: the sets of processor cards and the display interface cards.
The ability to rapidly reconfigure the gate-level logic of SRAM-based FPGAs at run-time allows algorithm implementations to be both adaptive and highly application-specific. In this way, adaptive algorithms can be implemented in hardware, and deliver high performance with a flexibility not normally associated with an application-specific solution. This paper describes the static implementation and operation of an adaptive image-enhancement algorithm called SUSAN on a Xilinx XC4085XL, demonstrating order-of-magnitude performance improvements over optimized software running on high-end general-purpose processors, and discusses the implementation of adaptive solutions in both hardware and software. The effective use of dynamic reconfiguration for run-time reconfigurable systems based on XC4000-series devices is examined, along with the use of the JBits API for implementing FPGA-based adaptive signal processing algorithms. Both the performance and suitability of XC4000-series devices for adaptive processing are shown to be critically dependent on device capacity.
The hardware implementation of fixed-point multiplication has become a standard feature in almost all processors and computing systems. Though many researchers have studied various multiplication techniques for ASIC technology, the same techniques may not yield the same performance for FPGA-based multipliers. In this paper, we investigate the cost and speed of various multiplication techniques implemented on a single XC4010PQ160-5 device. The investigation reveals the factors that most significantly influence performance in FPGA-based multiplier design. Based on an understanding of these factors, we propose a parallel multiplication technique appropriate for FPGA implementations. The implementation results demonstrate that the proposed technique is valid and effective. This paper offers useful references for FPGA-based computing unit designs, and provides important groundwork for the effective design and development of FPGA-based computing systems.
The paper proposes an architecture for an adaptive reconfigurable router suitable for the vision system of an autonomous vision-guided mobile robot. A definition of an adaptive router is given, along with its simulation using object-oriented technology. This system design approach allows the router architecture to be adapted to the targeted vision system performance through different repartitions of hardware and software router modules (algorithm-architecture adequation with constraints (A3C)).