Because functional magnetic resonance imaging (fMRI) studies of the brain are dynamic, fast pulse sequences such as echo planar imaging (EPI) and spiral imaging are often used to achieve higher temporal resolution. Hundreds of two-dimensional (2-D) image frames or multiple three-dimensional (3-D) images are often acquired to cover a larger range in space and time. fMRI therefore often requires much more data storage, a faster data transfer rate, and more processing power than conventional MRI. In Mercury Computer Systems' PCI-based embedded computer system, the architecture allows a DMA engine to transfer data while the CPU concurrently processes data. This lets a multicomputer distribute processing and data with minimal time spent on data transfer. Different types and numbers of processors are available to optimize system performance for the application. The fMRI reconstruction was first implemented on Mercury's PCI-based embedded computer system using one digital signal processing (DSP) chip, with the host computer running Windows NT. Double buffers in SRAM or cache were created for concurrent I/O and processing. The fMRI reconstruction was then implemented in parallel using multiple DSP chips. Data transfer and interprocessor synchronization were carefully managed to optimize algorithm efficiency. Image reconstruction times were measured with one to 10 processors. With one DSP chip, reconstructing 100 fMRI images of 128 × 64 pixels took 1.24 seconds, which is already faster than most existing commercial MRI systems. Performance improved nearly linearly with the number of processors, so this PCI-based embedded multicomputer architecture provides high performance for fMRI processing. In summary, this embedded multicomputer system allows the choice of computer topologies to fit the specific application and achieve maximum system performance.
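
The double-buffering scheme mentioned above can be illustrated with a minimal sketch. It is not the implementation used on the Mercury system: a Python thread stands in for the DMA engine, a list copy stands in for the data transfer, and the hypothetical `reconstruct` function is a placeholder for the per-frame reconstruction. The names `double_buffered_pipeline`, `reconstruct`, `free`, and `ready` are assumptions introduced for illustration only.

```python
import queue
import threading


def reconstruct(frame):
    # Placeholder for per-frame reconstruction (assumption: the real step
    # would be an image reconstruction, not a sum of samples).
    return sum(frame)


def double_buffered_pipeline(frames, process=reconstruct):
    """Overlap loading of the next frame with processing of the current one."""
    buffers = [None, None]   # the two buffers of the double-buffer scheme
    free = queue.Queue()     # indices of buffers the loader may fill
    ready = queue.Queue()    # indices of filled buffers, in arrival order
    free.put(0)
    free.put(1)

    def loader():
        # Stand-in for the DMA engine: fills whichever buffer is free.
        for frame in frames:
            i = free.get()            # block until a buffer is released
            buffers[i] = list(frame)  # simulated transfer into the buffer
            ready.put(i)
        ready.put(None)               # sentinel: no more frames

    worker = threading.Thread(target=loader)
    worker.start()

    results = []
    while True:
        i = ready.get()               # block until a buffer has been filled
        if i is None:
            break
        results.append(process(buffers[i]))
        free.put(i)                   # hand the buffer back to the loader
    worker.join()
    return results
```

While the main thread processes one buffer, the loader can already fill the other, so transfer and computation overlap and frames are still returned in acquisition order. For scale, the reported 1.24 seconds for 100 images corresponds to about 12.4 ms per image on one DSP chip.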