Image processing requires high computational power, plus the ability to experiment with algorithms. Recently, reconfigurable hardware devices in the form of Field Programmable Gate Arrays (FPGAs) have been proposed as a way of obtaining high performance at an economical price. At present, however, users must program FPGAs at a very low level and must have a detailed knowledge of the architecture of the device being used. To reduce design time for FPGA-based image processing, this paper reports on the design and realization of an FPGA-based image processing machine and its associated high-level programming model. Central to the design of the architecture blocks is the `design to fit' approach. The abstract machine is based on a PC host system with a PCI-bus add-on card containing Xilinx XC6200 series FPGA(s). The machine's high-level instruction set is based on the operators of Image Algebra, and XC6200 series FPGA configurations have been developed to implement each high-level instruction.
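The abstract describes high-level instructions drawn from Image Algebra. As a rough illustration of that programming style (the machine's actual instruction names are not given in the abstract, so the operators below are generic examples, not its instruction set), a pointwise operator and a neighborhood (template) operator might look like:

```python
def pointwise(op, a, b):
    """Pointwise image-algebra operator: combine two equal-sized images
    pixel by pixel with an arbitrary binary operation."""
    return [[op(x, y) for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

def neighborhood_max(a):
    """3x3 neighborhood maximum (grey-scale dilation), a typical
    Image Algebra template operation; borders use the clipped window."""
    h, w = len(a), len(a[0])
    out = [[0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            out[i][j] = max(a[ii][jj]
                            for ii in range(max(0, i - 1), min(h, i + 2))
                            for jj in range(max(0, j - 1), min(w, j + 2)))
    return out
```

Each such operator has a fixed dataflow structure, which is what makes a per-instruction FPGA configuration feasible.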
To develop a cost-effective reconfigurable DSP engine, it has been proposed to upgrade an existing custom-designed TMS320C40-based multiprocessing architecture with run-time configuration capabilities. The upgrade will consist of four Xilinx XC6200 series field programmable gate arrays, which will enable concurrent algorithm structures to be efficiently mapped onto the system. Furthermore, the upgraded architecture will provide a platform for the development of adaptive routing structures and self-configuration techniques, and will facilitate the merging of instruction-based and hardware-based parallelism.
We are mapping an image clustering algorithm onto an FPGA-based computer system. Our approach processes raw pixel data in the red, green, blue color space and generates an output image in which every pixel is assigned to a class, a group of pixels with similar color and location. These classes are then used as the basis of further processing to generate tags, which in turn are used to generate queries for searching libraries of digital images. The image clustering algorithm runs on an FPGA board; only the classified image is communicated to the host PC, where the further processing is performed. Our experimental system consists of an Annapolis Wildforce board with four Xilinx XC4000 chips and a PCI connection to a host PC. Keeping the raw image data local to the FPGAs allows us to parallelize the image processing on the FPGA board and to minimize the data handled by the PC. FPGA platforms are ideally suited to this sort of initial processing: the large amount of image data can be preprocessed by exploiting the inherent parallelism of FPGA architectures, keeping unnecessary data off the host processor. Our algorithm reduces the number of bits required to represent each pixel by up to a factor of six, so the host needs far less processing and memory than it would to handle the raw data. Classifying pixels on the FPGA-based system simplifies tag generation and accelerates digital library search.
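The abstract does not give the clustering rule itself, so the sketch below is only one plausible reading: a greedy pass that assigns each pixel to the first class whose running centroid is close in both location and color, creating a new class otherwise. The thresholds and the L1 distance metric are assumptions.

```python
def cluster_pixels(pixels, color_thresh, dist_thresh):
    """Greedy clustering of (x, y, r, g, b) pixels by color + location.
    Each class keeps running sums so centroids are cheap to update.
    An illustrative simplification, not the paper's exact algorithm."""
    classes = []  # each entry: [count, sum_x, sum_y, sum_r, sum_g, sum_b]
    labels = []
    for (x, y, r, g, b) in pixels:
        for idx, c in enumerate(classes):
            n = c[0]
            cx, cy = c[1] / n, c[2] / n
            cr, cg, cb = c[3] / n, c[4] / n, c[5] / n
            if (abs(x - cx) + abs(y - cy) <= dist_thresh and
                    abs(r - cr) + abs(g - cg) + abs(b - cb) <= color_thresh):
                for i, v in zip(range(6), (1, x, y, r, g, b)):
                    c[i] += v              # fold pixel into the class
                labels.append(idx)
                break
        else:
            classes.append([1, x, y, r, g, b])  # start a new class
            labels.append(len(classes) - 1)
    return labels
```

With at most 16 classes, each 24-bit RGB pixel reduces to a 4-bit label, which is consistent with the up-to-factor-of-six bit reduction cited above.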
Many vision tasks have stringent latency and throughput requirements, and real-time processing demands greater parallelism. This paper describes a real-time computer-vision application developed on a reconfigurable platform, the RIPP10 from Altera, using our simple, modular, reconfigurable, and parallel specialized architecture called DIPSA. We analyze and quantify the performance requirements of an automatic target recognition and tracking algorithm and then propose a partitioning of the algorithm into DIPSA modules to achieve frame-rate processing. Results of the implementation are used to address some of the issues that benefit and limit configurable computing systems.
This paper proposes a parallel systolic VLSI circuit that efficiently supports the implementation of a dynamic programming algorithm forming part of a procedure for matching two aerial images. The dynamic programming algorithm estimates the dense field of local luminosity differences (distances) between the images in O(N) steps (the image size being N x N). The calculated field is a sampling of the projective transform that links the two images. The final values of the transform parameters are obtained through pyramidal calculations (at different image resolutions) and least-squares approximations.
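The recurrence itself is not given in the abstract. The toy one-dimensional dynamic program below conveys the flavor: choose a shift for each pixel of one scanline that trades local luminosity difference against shift smoothness. All cost terms, the shift range, and the smoothness weight are illustrative assumptions, not the paper's formulation.

```python
import math

def match_scanlines(a, b, shifts=(-1, 0, 1), smooth=1.0):
    """1-D DP: pick a per-pixel shift s so that the total of
    |a[i] - b[i+s]| plus a penalty on shift changes is minimized.
    Returns the optimal shift sequence (a sampled local displacement
    field in one dimension)."""
    def data(i, s):
        j = i + s
        return abs(a[i] - b[j]) if 0 <= j < len(b) else math.inf

    cost = {s: data(0, s) for s in shifts}
    back = []
    for i in range(1, len(a)):
        # for each shift, record the cheapest predecessor shift
        step = {s: min(shifts, key=lambda p: cost[p] + smooth * abs(p - s))
                for s in shifts}
        cost = {s: data(i, s) + cost[step[s]] + smooth * abs(step[s] - s)
                for s in shifts}
        back.append(step)
    s = min(shifts, key=lambda t: cost[t])
    path = [s]
    for step in reversed(back):
        s = step[s]
        path.append(s)
    return path[::-1]
```

Because each pixel's update depends only on the previous column of costs, the recurrence maps naturally onto a systolic pipeline, one cell per shift.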
The advent of reconfigurable computers (RCs) containing field-programmable gate-array (FPGA) ICs presents a potential solution to the problem of processing telemetry data at the high rates required to support the latest remote-sensing satellites. For example, one satellite scheduled for launch in 1999 by NASA's Earth Science Enterprise project will generate as much Earth-science telemetry in six months as has been collected in NASA's entire 40-year history. NASA is developing software for large, expensive, conventional parallel-processing computer systems in an attempt to meet the expected processing requirements, but whether or not the resulting performance will be adequate remains unknown. For computationally intensive, repetitive applications like this, RC technology can provide the critical performance edge. The Adaptive Scientific Data Processing (ASDP) project at NASA Goddard Space Flight Center has been investigating RC applications in scientific processing systems. ASDP has developed prototype RC solutions which have achieved processing speeds an order of magnitude faster than a conventional high-end computer workstation alone. This paper presents an overview of remote-sensing satellite telemetry, outlines a particular telemetry processing challenge, describes ASDP's application of RC, discusses the results, and analyzes the current and future state of the art.
Finite impulse response (FIR) filters are very commonly used in digital signal processing (DSP) applications and are traditionally implemented using ASICs or DSP processors. They are a challenging subject for FPGA implementation because of the high throughput rate and large computational power required under real-time constraints. Indeed, the limited resources on an FPGA, i.e., logic blocks and flip-flops, together with the high routing delays, require compact circuit implementations. Three approaches to implementing high-performance symmetric FIR filters on lookup-table-based FPGAs are considered in this paper: fully parallel distributed arithmetic, table-lookup multiplication, and conventional hardware multiplication. Implementation results are illustrated by an 8-tap, 8-bit symmetric FIR filter, and comparative considerations of the above approaches for Xilinx FPGAs are also given.
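Coefficient symmetry is what all such architectures exploit: taps equidistant from the two ends share a coefficient, so the corresponding input samples can be pre-added and the number of multiplications halved. A plain software sketch of that folding (illustrating the arithmetic, not any particular one of the three FPGA mappings):

```python
def symmetric_fir(x, h):
    """Symmetric FIR filter output for every full window of x.
    Because h[k] == h[N-1-k], samples sharing a coefficient are
    pre-added, halving the multiplication count."""
    n = len(h)
    assert all(h[k] == h[n - 1 - k] for k in range(n)), "h must be symmetric"
    y = []
    for i in range(len(x) - n + 1):
        win = x[i:i + n]
        acc = 0
        for k in range(n // 2):
            acc += h[k] * (win[k] + win[n - 1 - k])  # shared coefficient
        if n % 2:                                    # odd length: center tap
            acc += h[n // 2] * win[n // 2]
        y.append(acc)
    return y
```

For the paper's 8-tap case this needs 4 multipliers per output instead of 8, which is the saving the compact FPGA implementations are after.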
This paper proposes a new methodology for rapid virtual system design and prototyping guided by a set of basic system architectures. This methodology includes automatic hardware/software partitioning of the considered application. The partitioning is performed under user-specified constraints (temporal performance, real time, system volume, ...). The target platform for system realization is a PC board with a DSP and an FPGA. The proposed methodology will be applied to the design of a generic router node for an autonomous robot image processing board.
This work is part of a project studying the implementation of neural network algorithms in reconfigurable hardware as a way to obtain a high-performance neural processor. Results are presented for Adaptive Logic Network (ALN) type binary networks, with and without learning in hardware. The designs were made on a hardware platform consisting of a PC compatible as the host computer and an Altera RIPP10 reconfigurable board with nine FLEX8K FPGAs and 512 KB of RAM. The different designs were run on the same hardware platform, taking advantage of its configurability. A software tool was developed to automatically convert the ALN network description resulting from the training process with the ATREE 2.7 for Windows software package into a hardware description file. This approach enables easy generation of the hardware necessary to evaluate the very large combinatorial functions that result from an ALN. For an on-board learning version, an ALN basic node was designed and optimized for the number of cells used per node. Several nodes connected in a binary tree structure for each output bit, together with a control block, form the ALN network. The total amount of logic available on the board limits the maximum network size to the small-to-medium range. Performance was studied in pattern recognition applications, and the results are compared with software simulation of ALN networks.
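An ALN is a binary tree of AND/OR gates whose leaves are (possibly negated) boolean inputs, which is what makes a trained network so directly convertible to logic. A minimal evaluator (the tuple encoding of nodes here is our own illustration, not ATREE's file format):

```python
def eval_aln(node, inputs):
    """Evaluate an Adaptive Logic Network tree.
    Leaves: ("lit", input_index, negated); internal nodes:
    ("and"/"or", left_subtree, right_subtree)."""
    kind = node[0]
    if kind == "lit":
        _, idx, neg = node
        return (not inputs[idx]) if neg else inputs[idx]
    left = eval_aln(node[1], inputs)
    right = eval_aln(node[2], inputs)
    return (left and right) if kind == "and" else (left or right)

# XOR as an ALN: (x0 AND NOT x1) OR (NOT x0 AND x1)
xor_tree = ("or",
            ("and", ("lit", 0, False), ("lit", 1, True)),
            ("and", ("lit", 0, True), ("lit", 1, False)))
```

Flattening such a tree into a hardware description file is a purely structural translation: one gate per node, one tree per output bit.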
We present methods for architectural adaptation that use application-specific hardware assists and policies to provide substantial improvements in performance on a per-application basis. We have used architectural customization to improve the performance of the memory hierarchy and to better utilize network bisection in the multiprocessor architecture. We demonstrate the utility of architectural customization for efficient memory hierarchy management and reduced memory bandwidth requirements using an application in sparse matrix manipulation. The experimental work is presented in the context of the MORPH machine, currently being designed to provide high system performance by directly addressing the memory system limitations of current machines. Based on our preliminary results, we propose that application-driven machine customization provides a cost-effective way to achieve high performance and combat performance fragility while maintaining application retargetability across architectures.
A complete computing system supports a design path from problem description to implementation. The term configurable computing refers to complete computing systems that support the development of applications for configurable computing machines. Configurable computing systems generally include a microprocessor-based host, a configurable processing array, and the tools necessary for capturing the problem and mapping it into software for the host and configurations for the hardware. This work proposes a framework for a set of platform-independent configurable computing tools. The proposed tools temporally partition large designs, described in a textual language, into stages that can be mapped onto the computing array. The temporal partitions are then spatially partitioned to support multiple-FPGA arrays. These results are given to platform-specific backends that convert the tools' description of the design into functional FPGA configurations, hardware controllers, and host-based control code.
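Temporal partitioning, the first step above, can be illustrated with a greedy pass over a topologically ordered operation list: operations are packed into a configuration stage until the array's capacity is exhausted, then a new stage begins. The capacity model and area numbers below are made up for illustration; real tools must also buffer values that cross stage boundaries.

```python
def temporal_partition(ops, capacity):
    """Greedily pack (name, area) operations, given in topological
    order, into successive configuration stages that each fit the
    array's logic capacity.  Returns a list of stages (name lists)."""
    stages, current, used = [], [], 0
    for name, area in ops:
        if area > capacity:
            raise ValueError(f"{name} alone exceeds the array capacity")
        if used + area > capacity:     # close this stage, open a new one
            stages.append(current)
            current, used = [], 0
        current.append(name)
        used += area
    if current:
        stages.append(current)
    return stages
```

Spatial partitioning then splits each stage across the FPGAs of the array, subject to per-chip capacity and inter-chip pin limits.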
We report on our ongoing work in the development of automated CCM mapping and scheduling tools. We seek efficient methods to assign a high-level computational description across the processing elements of a target CCM. Such an assignment requires both a partitioning in space (the task map) and a partitioning in time (the execution schedule). We embrace a number of algorithmic design techniques, spanning the spectrum from the highly theoretical to the extremely applied. Our goal is to produce suites of tools that meet a variety of design objectives.
The ATM network was designed to deal with all sorts of traffic, such as data, voice, and video. This is achieved using service categories (CBR, VBR, ABR, and UBR) and AAL protocols. Existing ATM network interface cards (NICs) realize the full range of service categories but generally implement only one AAL protocol (AAL1 or AAL5). Research in our laboratory investigates the possibilities that reconfigurable logic (FPGAs) offers in the field of ATM NICs. In this paper, we present a flexible multiservice ATM network interface. It is flexible through its use of FPGAs (and eventually other resources, such as multipurpose processors), and it is multiservice in that it supports many AAL protocols at one and the same time, including complete AAL1 and AAL5 as well as new specialized, proprietary adaptation protocols. These new protocols could be specially designed for security, compression, data structure recognition, etc. We present bandwidth estimates based on sample implementations, along with possible target architectures using different types of FPGAs (statically or partially/dynamically reconfigurable), and discuss their advantages and drawbacks. Finally, we try to isolate the open questions related to the flexible multiservice network interface.
This paper analyzes the performance differences between software and hardware/software implementations of a reformulated Fuzzy ART neural network algorithm, a solution to a real-time radar signal clustering problem. The software implementations run on a 50 MHz TMS320C40 DSP; the hardware/software implementation runs its software part on the same DSP, while the FPGA-based application-specific hardware accelerator is realized on MiroTech's X-CIM TIM40 module. This investigation of FPGA-based acceleration gave excellent results for our application: acceleration factors of up to 68.9 were reached.
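For readers unfamiliar with the algorithm being accelerated, the sketch below restates standard Fuzzy ART (complement coding, choice function, vigilance test, fast learning). It is the textbook formulation, not the paper's reformulated variant, and the parameter values are only defaults.

```python
def fuzzy_art(inputs, rho=0.75, alpha=0.001, beta=1.0):
    """Cluster vectors in [0,1]^d with standard Fuzzy ART.
    rho: vigilance, alpha: choice parameter, beta: learning rate.
    Returns the category label assigned to each input."""
    def fmin(a, b):                      # fuzzy AND (elementwise min)
        return [min(x, y) for x, y in zip(a, b)]
    def norm(v):                         # L1 norm of a vector in [0,1]^d
        return sum(v)
    weights, labels = [], []
    for i in inputs:
        i = i + [1.0 - x for x in i]     # complement coding
        # categories ranked by the choice function |i ^ w| / (alpha + |w|)
        order = sorted(range(len(weights)),
                       key=lambda j: -norm(fmin(i, weights[j])) /
                                      (alpha + norm(weights[j])))
        for j in order:
            m = fmin(i, weights[j])
            if norm(m) / norm(i) >= rho:              # vigilance test
                weights[j] = [beta * a + (1 - beta) * b
                              for a, b in zip(m, weights[j])]
                labels.append(j)
                break
        else:
            weights.append(i)                         # new category
            labels.append(len(weights) - 1)
    return labels
```

The min/compare/accumulate structure of the choice and vigilance computations is what makes the algorithm attractive for an FPGA datapath.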
XBI(tm), the Xilinx Bitstream Interface, is a set of Java(tm) classes that provide an application program interface (API) to the Xilinx FPGA bitstream. This interface operates either on bitstreams generated by Xilinx design tools or on bitstreams read back from actual hardware, providing the capability of designing, modifying, and dynamically reconfiguring circuits in Xilinx XC4000(tm) series FPGA devices. The programming model used by XBI is a 2D array of Configurable Logic Blocks (CLBs). Each CLB is referenced by a row and column, and all configurable resources in the selected CLB may be set or probed. Additionally, control of all routing resources adjacent to the selected CLB is made available. Because the code is written in Java, compilation times are very fast, and because control is at the CLB level, bitstreams can typically be modified or generated in about one second or less. This API has been used to construct complete circuits and to modify existing circuits. In addition, the object-oriented support of the Java programming language has permitted a small library of parameterizable, object-oriented macro circuits, or Cores, to be implemented. Finally, this API may be used as a base on which to construct other tools, including traditional design tools for tasks such as circuit placement and routing, as well as application-specific tools for more narrowly defined tasks.
There has recently been much research interest in the concept of evolvable hardware--partly due to the rapid technological changes brought about by reconfigurable devices and partly due to the success of evolutionary techniques in software systems. In this paper we contribute to this effort and present a scalable single chip solution for evolvable hardware. This employs standard off-the-shelf Field Programmable Gate Arrays as opposed to a custom silicon solution. The resulting system permits the automatic evolution of digital circuits to match some given specification and has significant advantages and features over existing design flows. The system employs evolutionary programming as the adaptive design process--however the underlying system architecture is independent of the evolutionary algorithm being employed and so may be changed as required. The system is described in the hardware description language VHDL and hence is portable to other programmable devices satisfying the architectural requirements which are also detailed.
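The evolve-evaluate-select loop at the heart of such a system can be shown in miniature. In the toy below an individual is simply a truth table rather than an FPGA configuration, fitness counts agreements with the target specification, and mutation flips random entries; this is our own minimal illustration of evolutionary programming, not the paper's architecture.

```python
import random

def evolve_circuit(target, n_inputs, pop=20, generations=200, seed=1):
    """Evolve a truth table (one output bit per input combination)
    toward a target specification with a simple evolutionary loop:
    keep the fitter half, mutate each survivor once per generation."""
    random.seed(seed)
    size = 2 ** n_inputs

    def fitness(ind):
        return sum(a == b for a, b in zip(ind, target))

    population = [[random.randint(0, 1) for _ in range(size)]
                  for _ in range(pop)]
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        if fitness(population[0]) == size:     # specification met
            break
        survivors = population[:pop // 2]
        children = []
        for parent in survivors:               # one random bit flip each
            child = parent[:]
            child[random.randrange(size)] ^= 1
            children.append(child)
        population = survivors + children      # elitist replacement
    return max(population, key=fitness)

# evolve a 2-input XOR specification
best = evolve_circuit([0, 1, 1, 0], 2)
```

In the real system the genotype is a device configuration and each fitness evaluation is a hardware run, which is why the architecture is kept independent of the particular evolutionary algorithm.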
Reconfigurable computing devices are emerging as a viable alternative to fixed-function components and programmable processors. To expand our knowledge of the role and optimization of these devices, it is increasingly imperative for us to compare implementations of tasks and subroutines across this wide spectrum of implementation options. The fact that most processors, FPGAs, ASICs, and memories are fabricated in a uniform technology medium, CMOS VLSI, where area scaling is moderately well understood, eases our comparison task. Nonetheless, the rapid pace of technology, limited device size selection, and economic artifacts complicate the picture. In this paper, we look at the task of comparing computing machines, reviewing normalization techniques and many important issues which arise during comparisons. This paper includes examples intended to underscore the methodology and comparison issues, but does not attempt to make definitive conclusions about the merits of the technology alternatives from the small sample set. The immediate intent of this work is to help designers faced with tradeoffs between technological alternatives. The longer term intent is to help the community collect and analyze the broad-based data needed to better understand the range of available computing options.
The paper presents an analysis of technology trends based on the data available from the recently released National Technology Roadmap for Semiconductors (NTRS 1997). This analysis shows that increasing clock rates and system diameter in clock periods will make efficient management of communication and coordination increasingly critical. Due to the decreasing cost of logic versus interconnect and the electrical necessity of signal regeneration to counter the worsening effects of interconnect geometries, the use of configurable logic blocks even in custom datapaths presents a unique opportunity to customize the bindings, mechanisms, and policies which comprise the interaction of processing, memory, I/O, and communication resources. This programming flexibility, or `customizability,' can provide the key to achieving robust high performance. We use the results of this study to make a case for the evolution of computer architectures into `Application Adaptive' (AA) architectures. These architectures exploit the capability of the underlying hardware to reconfigure logic to achieve system-level cost/performance goals through extensive analysis and profiling of application data and runtime characteristics. A key distinction of AA architectures from traditional custom-computing machines is that architectural flexibility is used to customize architectural mechanisms and policies, instead of building additional functional resources, an approach commonly adopted by custom computing machines. Thus relatively small amounts of reconfigurable circuit blocks can be leveraged to yield high performance on a per-application basis.
In this paper, we introduce the architecture of a new embedded field programmable processor array (E-FPPA), which consists of a low-power multiprocessor system embedded with standard programmable logic blocks and memory. Each block (processor, programmable logic, ...) is coupled to a transfer controller responsible for all transfers between blocks. Instead of a classical crossbar interconnection network, we propose a low-cost hierarchical ring that combines a simple interface with high-performance communication when data locality is observed. The architecture is fully scalable and is based on a NUMA (non-uniform memory access time) multiprocessor scheme. Its core is a small RISC processor (a very low-power CoolRisc has been chosen), embedded with programmable logic blocks (similar to a standard CPLD or FPGA), static RAMs, and other devices (DSP coprocessor, peripherals, ...). Using the 8-bit CoolRisc processor, an E-FPPA comprising a cluster of 16 processors and 16 transfer controllers, with 4 Kbytes of data memory and 5.5 Kbytes of program memory per processor, can deliver up to 3200 Mops at 100 MHz. The chip size has been evaluated at 52 mm2 in a 0.35 micrometer process.
Multimedia applications commonly require high computational power, mostly in conjunction with high data throughput. As an additional challenge, such applications are increasingly used in handheld devices, where small package outlines and low power consumption are also important. Many research efforts have shown that accelerators based on reconfigurable hardware can satisfy these performance demands. Most of these approaches use commercial fine-grained FPGAs to implement reconfigurable accelerators. However, it has been shown that these devices are not always well suited to reconfigurable computing, their drawbacks being area inefficiency and the insufficiency of the available design tools. Besides fine-grained FPGAs, coarse-grained reconfigurable architectures have been developed that are more area-efficient and better suited to computational purposes. In this paper, an implementation of such an architecture, the KressArray, is introduced, and its use in the Map-oriented Machine with Parallel Data Access (MoM-PDA) is shown. The MoM-PDA is an FPGA-based custom computing machine that can perform concurrent memory accesses by means of a dedicated memory organization scheme. The benefits of this architecture are illustrated by an application example.
We propose a methodology for specifying abstract models of reconfigurable architectures. These models may be used by compilers, synthesis systems and other design agents to evaluate the correctness and performance of postulated reconfiguration schedules. We show how the proposed methodology can be used to model reconfigurable computation, interconnect, memory and I/O elements interacting with each other using various protocols. We illustrate the modeling approach through small case studies. The proposed methodology is embedded in a modeling language called PDL+ and its support environment called ARC.
Many real-time DSP applications can benefit from the use of FPGA or adaptive computing devices because these devices provide very high throughput for a small hardware cost. This efficiency comes with a price: it is very time-consuming to program complicated algorithms into these devices. We have developed a new method of performing this implementation process, providing an order-of-magnitude reduction in design time. Our approach consists of an interactive algorithm development tool closely coupled to a set of FPGA devices. From an algorithm script, the tool derives a hardware design, which includes the data path, host interface, custom sequencer, and address generators. The design is loaded and executed in the FPGA whenever a call for the function is encountered in the algorithm script. Pieces of the algorithm may be ported incrementally; the algorithm will always execute properly, regardless of the state of porting from software to hardware. This approach is optimized to allow mathematicians, generally unskilled in efficient hardware algorithm design, to implement algorithms directly in FPGAs. This enables designers to quickly see the effect of algorithmic changes and approximations on hardware efficiency, reducing the number and duration of design iterations. We are currently porting our approach to the publicly available Ptolemy environment, which will facilitate the transfer of this methodology to the adaptive computing community.
This paper investigates two options for the field programmable gate array (FPGA) implementation of a very high-performance 2D discrete cosine transform (DCT) processor for real-time applications. The first architecture exploits the separability of the transform and uses a row-column decomposition; the row and column processors are realized using distributed arithmetic (DA) techniques. The second approach uses a natively 2D method based on polynomial transforms. The paper provides an overview of the DCT calculation using DA methods and describes the FPGA implementation. A tutorial overview of a computationally efficient method for computing 2D DCTs using polynomial transforms is presented, together with a detailed analysis of the datapath for this approach on an 8 × 8 data set. Comparisons show that, for equal throughput, the polynomial transform approach requires 67% of the logic resources of a DA processor. The polynomial transform approach is also shown to scale better with increasing block size than the DA approach.
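The row-column decomposition rests on the separability of the 2D DCT: an N × N transform can be computed as N one-dimensional DCTs on the rows followed by N one-dimensional DCTs on the columns. A minimal pure-Python floating-point reference sketch of this equivalence (not the paper's distributed-arithmetic hardware):

```python
import math

def dct1d(v):
    # Orthonormal 1-D DCT-II.
    N = len(v)
    out = []
    for k in range(N):
        s = sum(v[n] * math.cos(math.pi * (2 * n + 1) * k / (2 * N))
                for n in range(N))
        scale = math.sqrt(1 / N) if k == 0 else math.sqrt(2 / N)
        out.append(scale * s)
    return out

def dct2d_row_column(block):
    # Row-column decomposition: 1-D DCT on every row, then on every column.
    rows = [dct1d(r) for r in block]
    cols = [dct1d(list(c)) for c in zip(*rows)]
    return [list(r) for r in zip(*cols)]

def dct2d_direct(block):
    # Direct 2-D DCT-II, for comparison against the separable form.
    N = len(block)
    def a(k):
        return math.sqrt(1 / N) if k == 0 else math.sqrt(2 / N)
    return [[a(u) * a(v) * sum(
                block[i][j]
                * math.cos(math.pi * (2 * i + 1) * u / (2 * N))
                * math.cos(math.pi * (2 * j + 1) * v / (2 * N))
                for i in range(N) for j in range(N))
             for v in range(N)] for u in range(N)]
```

For an N × N block, the separable form needs 2N length-N transforms instead of one N²-term sum per output coefficient, which is what makes the row-column hardware structure attractive.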
This paper proposes and evaluates an approach for improving the performance of arithmetic calculations via delayed addition. Our approach employs the idea used in Wallace trees to delay addition until the end of a repeated calculation such as accumulation or dot-product; this effectively removes carry-propagation overhead from the calculation's critical path. We present integer and floating-point designs that use this technique. Our pipelined integer multiply-accumulate design is based on a fairly traditional multiplier design, but with delayed addition as well. This design achieves a 37 MHz clock rate on an XC4036XL-2 FPGA. Next, we present a 32-bit floating-point accumulator based on delayed addition. Here, delayed addition requires a novel alignment technique that decouples the incoming operands from the accumulated result. A conservative version of this design achieves a 33 MHz clock rate. Finally, we also present a more aggressive 32-bit floating-point accumulator design that achieves a 66 MHz clock rate. These designs demonstrate the utility of delayed addition for accelerating FPGA calculations in both the integer and floating-point domains.
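The delayed-addition idea can be modelled in software: a 3:2 carry-save compressor keeps the running total in redundant (sum, carry) form, so the slow carry-propagate addition happens only once, at the end of the accumulation. A minimal Python model for non-negative integers, illustrative only and not the paper's FPGA design:

```python
def csa(a, b, c):
    # 3:2 carry-save adder: compresses three operands into a sum word
    # and a carry word without propagating carries across bit positions.
    s = a ^ b ^ c
    carry = ((a & b) | (a & c) | (b & c)) << 1
    return s, carry

def delayed_accumulate(values):
    # Keep the running total in redundant (sum, carry) form; each step
    # is one carry-free 3:2 compression. The expensive carry-propagate
    # add happens exactly once, on the final line.
    s, c = 0, 0
    for v in values:
        s, c = csa(s, c, v)
    return s + c  # single final carry-propagate addition
```

In hardware, each loop iteration corresponds to one level of carry-free compression, which is why removing the carry chain from the loop shortens the critical path.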
Geometry processing comprises a great many computationally intensive floating-point operations. Real-time graphics systems generally use application-specific, custom-designed parallel hardware to provide the necessary computational performance. When designing a graphics engine on an FPGA-based configurable computing system, cost-effectiveness is important. This paper investigates and proposes a cost-effective FPGA-based floating-point datapath for geometry processing, designed to be a basic building block for FPGA-based geometry processors. The implemented datapath operates at a frequency of 6.25 MHz and has an average floating-point operation time of 10.2 microseconds.
Reconfigurable machines have recently been used as co-processors to accelerate the execution of certain algorithms or program subroutines. The problems with this approach include high reconfiguration time and limited partial reconfiguration. By far the most critical problems are (1) the small on-chip memory, which results in slower execution, and (2) FPGA areas too small to implement large subroutines. The Dynamically Programmable Cache (DPC) is a novel architecture for embedded processors that offers solutions to both problems. To solve the memory access problem, DPC processors merge reconfigurable arrays with the data cache at various cache levels to create a multi-level reconfigurable machine. As a result, DPC machines have both higher data accessibility and higher FPGA memory bandwidth. To solve the limited-FPGA-resource problem, DPC processors implement a multi-context switching (virtualization) concept. Virtualization allows large subroutines to be implemented with fewer FPGA cells. Additionally, DPC processors can parallelize the execution of several operations, resulting in faster execution. In this paper, DPC machines are shown to be 5X faster than an Altera FLEX10K FPGA chip and 2X faster than a Sun Ultra 1 SPARCstation on two different algorithms (convolution and motion estimation).
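Convolution, one of the two benchmark algorithms, is typical of the regular, data-parallel kernels that map well onto reconfigurable arrays. A minimal reference sketch of a 'valid'-mode 2D convolution follows; it is computed without kernel flipping (i.e. as cross-correlation), as hardware convolvers commonly are, and does not reproduce the paper's DPC implementation:

```python
def convolve2d_valid(image, kernel):
    # Slide the kernel over every fully-overlapping window of the image
    # and accumulate an element-wise multiply-add at each position.
    # The inner multiply-accumulate loops are what a reconfigurable
    # array would unroll into parallel hardware.
    kh, kw = len(kernel), len(kernel[0])
    ih, iw = len(image), len(image[0])
    out = []
    for y in range(ih - kh + 1):
        row = []
        for x in range(iw - kw + 1):
            acc = 0
            for dy in range(kh):
                for dx in range(kw):
                    acc += image[y + dy][x + dx] * kernel[dy][dx]
            row.append(acc)
        out.append(row)
    return out
```

Every output pixel reads a kh × kw window of the image, which is why on-chip memory size and cache bandwidth, the two problems DPC targets, dominate the performance of such kernels.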
As FPGA density increases, so does the potential of configurable computing machines. Unfortunately, the larger designs that take advantage of these higher densities require much more effort and longer design cycles, making configurable computing even less appealing to users outside the field. To combat this problem, we present the Reconfigurable Computing Application Development Environment (RCADE). The goals of RCADE are to produce high-performance applications, to make FPGA design more accessible to those who are not hardware engineers, to shorten the design lifecycle, and to ease the process of migration from one platform to another. Here, we discuss the environment's architecture, the current set of agents, and other agents to be developed.