Detecting drones in real time at large distances with high-resolution visual (VIS) cameras requires a careful system design. Our setup uses four 10-bit 25-megapixel cameras running at 25 fps to detect drones, which corresponds to a theoretical raw data rate of about 3.3 GB/s from the VIS cameras alone. We implemented a small-object detection algorithm, based on a point detector with a subsequent clustering step, on two different platforms: a Xilinx Kintex-7 FPGA and an Nvidia GeForce GTX 1080 GPU. We explain how the algorithm was ported from a software reference implementation to the FPGA. We show that the FPGA implementation of the point detector achieves optimal throughput and that the clustering can be performed in a streaming fashion. With the FPGA implementation, the cameras can be operated in free-running mode while their data are processed in real time. The CUDA implementation shows that the computing capabilities of modern GPGPUs allow an easy port of the algorithm to this platform, reaching real-time performance at 68 fps. Operating four high-resolution cameras also requires a careful overall system design; we present our hardware architecture and the central data distribution system. Since the small-object detections are consumed by several processes, such as tracking and display, a fast, safe, and robust distribution mechanism must be provided. We describe our approach, which uses the Boost Interprocess library to provide a portable data distribution system.
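To illustrate the kind of pipeline the abstract describes, here is a minimal sketch of a point detector followed by a clustering step. The concrete detector (local-background subtraction over a 3x3 neighbourhood), the threshold, and the cluster distance are assumptions for illustration, not the implementation used in the paper.

```python
# Hedged sketch: point detector + clustering for small-object detection.
# Window size, threshold, and clustering rule are illustrative assumptions.

def point_detector(image, threshold=50):
    """Flag pixels that stand out from the mean of their 3x3 neighbourhood."""
    h, w = len(image), len(image[0])
    hits = []
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            neigh = [image[y + dy][x + dx]
                     for dy in (-1, 0, 1) for dx in (-1, 0, 1)
                     if not (dy == 0 and dx == 0)]
            background = sum(neigh) / len(neigh)
            if image[y][x] - background > threshold:
                hits.append((x, y))
    return hits

def cluster(hits, max_dist=2):
    """Greedily merge hits whose Chebyshev distance is <= max_dist,
    then return one centroid per cluster."""
    clusters = []
    for (x, y) in hits:
        for c in clusters:
            if any(max(abs(x - cx), abs(y - cy)) <= max_dist for (cx, cy) in c):
                c.append((x, y))
                break
        else:
            clusters.append([(x, y)])
    return [(sum(p[0] for p in c) / len(c), sum(p[1] for p in c) / len(c))
            for c in clusters]
```

On a dark frame containing a bright two-pixel blob, the detector flags both pixels and the clustering step merges them into a single centroid, i.e. one object detection.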
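The distribution system in the paper is built on the C++ Boost Interprocess library; as a language-neutral illustration of the same single-writer, multiple-reader shared-memory pattern, the following Python sketch uses the standard-library `multiprocessing.shared_memory` module instead. The segment name and the per-detection record layout are assumptions.

```python
# Hedged sketch: publishing detection results through a named shared-memory
# segment so that several consumer processes (e.g. tracker, display) can
# read them. Python analogue of the Boost Interprocess pattern; names and
# record layout are illustrative assumptions.

import struct
from multiprocessing import shared_memory

RECORD = struct.Struct("<ffI")   # per detection: x, y, cluster size

def publish(name, detections):
    """Writer side: pack detections into a named shared-memory segment."""
    shm = shared_memory.SharedMemory(
        name=name, create=True, size=4 + RECORD.size * len(detections))
    struct.pack_into("<I", shm.buf, 0, len(detections))   # record count
    for i, det in enumerate(detections):
        RECORD.pack_into(shm.buf, 4 + i * RECORD.size, *det)
    return shm   # keep the handle so the segment stays alive

def consume(name):
    """Reader side: attach to the segment and unpack all detections."""
    shm = shared_memory.SharedMemory(name=name)
    (count,) = struct.unpack_from("<I", shm.buf, 0)
    dets = [RECORD.unpack_from(shm.buf, 4 + i * RECORD.size)
            for i in range(count)]
    shm.close()
    return dets
```

Any number of reader processes can attach to the segment by name, which is the property that lets the detections feed the tracking and display processes without copying the data through sockets or pipes.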