Connected Component Labeling (CCL) is a basic algorithm in image processing and an essential step in nearly every application dealing with object detection. It groups together pixels belonging to the same connected component (e.g. object). Special architectures such as ASICs, FPGAs and GPUs were utilised for achieving high data throughput, primarily for video processing. In this article, the FPGA implementation of a CCL method is presented, which was specially designed to process high resolution images with complex structure at high speed, generating a label mask. In general, CCL is a dynamic task and therefore not well suited for parallelisation, which is needed to achieve high processing speed with an FPGA. Facing this issue, most of the FPGA CCL implementations are restricted to low or medium resolution images (≤ 2048 ∗ 2048 pixels) with lower complexity, where the fastest implementations do not create a label mask. Instead, they extract object features like size and position directly, which can be realized with high performance and perfectly suits the need for many video applications. Since these restrictions are incompatible with the requirements to label high resolution images with highly complex structures and the need for generating a label mask, a new approach was required. The CCL method presented in this work is based on a two-pass CCL algorithm, which was modified with respect to low memory consumption and suitability for an FPGA implementation. Nevertheless, since not all parts of CCL can be parallelised, a stop-and-go high-performance pipeline processing CCL module was designed. The algorithm, the performance and the hardware requirements of a prototype implementation are presented. Furthermore, a clock-accurate runtime analysis is shown, which illustrates the dependency between processing speed and image complexity in detail. Finally, the performance of the FPGA implementation is compared with that of a software implementation on modern embedded platforms.