An inner product processor is presented that can perform 3.3 million inner products per second, where each vector consists of 100 elements, each 20 bits wide. This is equivalent to more than 660 million 40-bit arithmetic operations per second. The latency of a single calculation is 12.3 microseconds. The processor can be built entirely from 1024 × 6-bit ROMs with 300 ns cycle times and latchable inputs or outputs. Modular arithmetic is used internally; the input and output are binary. The architecture's specifications are compatible with the stricter structural requirements of an optical implementation.
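The sketch below is illustrative only, not the processor described here. It shows how the quoted figures fit together and how an inner product can be computed in modular (residue) arithmetic with binary input and output. It assumes the processor is fully pipelined and delivers one result per 300 ns ROM cycle, which gives about 3.33 million inner products per second. It also assumes each inner product counts as 100 multiplications plus 99 additions, which gives about 663 million operations per second. Under the same assumption, the 12.3 µs latency corresponds to 41 cycles. The moduli, the function names (`to_residues`, `from_residues`, `rns_inner_product`), and the use of the Chinese remainder theorem for the residue-to-binary conversion are hypothetical choices made for this example. Each modulus is below 64, so every residue fits in 6 bits.

```python
import math
import random

# Hypothetical moduli: pairwise coprime, each < 64 so a residue fits in 6 bits.
# Illustrative only; these are not taken from the processor described above.
MODULI = (64, 63, 61, 59, 53, 47, 43, 41, 37)
M = math.prod(MODULI)  # dynamic range of the residue representation

N = 100      # elements per vector
WIDTH = 20   # bits per element (unsigned inputs assumed)

# The moduli must be pairwise coprime for the CRT reconstruction to be unique.
assert all(math.gcd(a, b) == 1
           for i, a in enumerate(MODULI) for b in MODULI[i + 1:])
# The largest possible inner product must fit in [0, M) to avoid wrap-around.
assert N * (2**WIDTH - 1) ** 2 < M


def to_residues(x):
    """Binary (ordinary integer) input -> one residue digit per modulus."""
    return [x % m for m in MODULI]


def from_residues(r):
    """Residue digits -> binary result via the Chinese remainder theorem."""
    total = 0
    for ri, mi in zip(r, MODULI):
        Mi = M // mi
        total += ri * Mi * pow(Mi, -1, mi)  # pow(..., -1, m) needs Python 3.8+
    return total % M


def rns_inner_product(a, b):
    """Multiply-accumulate independently in each residue channel (no carries
    between channels), then convert the accumulated residues back to binary."""
    acc = [0] * len(MODULI)
    for x, y in zip(a, b):
        rx, ry = to_residues(x), to_residues(y)
        acc = [(s + p * q) % m for s, p, q, m in zip(acc, rx, ry, MODULI)]
    return from_residues(acc)


# Throughput arithmetic, assuming one inner product completes every ROM cycle.
CYCLE_NS = 300
inner_products_per_second = 1e9 / CYCLE_NS              # ~3.33 million
ops_per_second = inner_products_per_second * (2 * N - 1)  # 100 mults + 99 adds
latency_cycles = 12_300 // CYCLE_NS                       # 12.3 us at 300 ns/cycle

if __name__ == "__main__":
    a = [random.randrange(2**WIDTH) for _ in range(N)]
    b = [random.randrange(2**WIDTH) for _ in range(N)]
    print(rns_inner_product(a, b) == sum(x * y for x, y in zip(a, b)))
    print(f"{inner_products_per_second:.3g} inner products/s, "
          f"{ops_per_second:.3g} ops/s, latency {latency_cycles} cycles")
```

Because each residue channel only ever operates on values below its own modulus, the per-channel arithmetic stays narrow. The wide binary result appears only at the final conversion step.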