Embedded smart cameras are gaining popularity in a number of real-time outdoor surveillance applications. However, challenges remain, namely computational latency, variation in illumination, and occlusion. To address these challenges, multimodal systems that integrate multiple imagers can be used. The trade-off, however, is more stringent processing and communication requirements for embedded platforms. To meet these requirements, we investigated two low-complexity, high-performance preprocessing architectures for a multi-imager node on a field-programmable gate array (FPGA). In the proposed architectures, the majority of the tasks are performed on the thermal images because of their lower spatial resolution. Analysis with different sets of images shows that the system with the proposed architectures offers better detection performance and can reduce the output data by a factor of 1.7 to 99 compared with full-size images. The proposed architectures achieve a frame rate of 53 fps, logic utilization of 2.1% to 4.1%, memory consumption of 987 to 148 KB, and power consumption in the range of 141 to 163 mW on an Artix-7 FPGA. We conclude that the proposed architectures offer reduced design complexity and lower processing and communication requirements while retaining the configurability of the system.