An optical cell switch for interconnecting massively parallel nodes could reduce the size, power consumption, and cost of high-performance computing (HPC) interconnects. We designed an architecture based on a broadcast-and-select approach that is highly flexible in the number of supported ports and can scale from a 16x16 to a 2048x2048 port switch by exploiting both wavelength multiplexing and fiber multiplexing. The optical system is designed for 40 Gb/s operation, but the architecture can also support the full 160 Gb/s switching likely required of a commercial system. At the core of the switch is an array of semiconductor optical amplifiers (SOAs), which provide fast switching (~1 ns), a high extinction ratio (>40 dB) for cross-talk reduction, and optical gain (typically 15 dB). The full optical switch uses a 2-stage broadcast-and-select design, with one stage for fiber select and one for color (wavelength) select, resulting in a bufferless, low-latency crossbar cell switch. A switch system demonstrator with 8 full optical paths has been implemented and used for performance characterization. A fast 40 Gb/s cell receiver was developed and shown to support a dynamic range of up to 9 dB. Measurements on the system demonstrator show that a raw bit-error rate (BER) of 10⁻¹⁵ is achievable. Optical cross-talk is negligible and does not degrade system performance. The system design and verification experiments demonstrate that a scalable 40 Gb/s switch for massively parallel systems is feasible and that, by bringing the scalability of an optical solution to an HPC system, it offers the potential for significant reductions in size, power consumption, and cost.
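The 2-stage fiber-select/color-select crossbar can be illustrated with a small model. This is a minimal sketch under stated assumptions, not the authors' implementation: it assumes each of N = F × W input ports is identified by one broadcast fiber (F fibers) and one wavelength on that fiber (W wavelengths), with ports numbered row-major. The 64-fiber × 32-wavelength split used below for the 2048-port case is hypothetical, since the abstract does not state the factorization, and the class and method names are illustrative only.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class BroadcastSelectSwitch:
    """Hypothetical model of a 2-stage (fiber select, color select) crossbar.

    Assumption: N = fibers * wavelengths input ports; every input is broadcast
    to all outputs, and each output picks one input by gating SOAs.
    """

    fibers: int
    wavelengths: int

    @property
    def ports(self) -> int:
        return self.fibers * self.wavelengths

    def input_location(self, port: int) -> tuple[int, int]:
        """Map an input port to (fiber, wavelength), assuming row-major numbering."""
        if not 0 <= port < self.ports:
            raise ValueError(f"port {port} out of range 0..{self.ports - 1}")
        return divmod(port, self.wavelengths)

    def soa_gates(self, input_port: int) -> tuple[list[bool], list[bool]]:
        """On/off pattern of the SOAs at one output that selects input_port.

        Stage 1 (fiber select) turns on exactly one fiber SOA;
        stage 2 (color select) turns on exactly one wavelength SOA.
        """
        fiber, color = self.input_location(input_port)
        fiber_stage = [f == fiber for f in range(self.fibers)]
        color_stage = [w == color for w in range(self.wavelengths)]
        return fiber_stage, color_stage

    def soas_per_output(self) -> int:
        """SOA gates per output with two selection stages (F + W)."""
        return self.fibers + self.wavelengths

    def soas_per_output_single_stage(self) -> int:
        """SOA gates per output if each input were gated directly (N)."""
        return self.ports


if __name__ == "__main__":
    switch = BroadcastSelectSwitch(fibers=64, wavelengths=32)  # hypothetical split
    print(switch.ports)  # 2048
    print(switch.input_location(100))  # (3, 4), since 100 = 3 * 32 + 4
    print(switch.soas_per_output(), switch.soas_per_output_single_stage())  # 96 2048
```

Under this model, selecting one of N inputs at an output takes F + W SOA gates instead of N, which is one way to see why splitting the selection into a fiber stage and a color stage helps the port count scale.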