The early detection of abnormal regions with increased tracer uptake in positron emission tomography (PET) is a key driver of imaging system design and optimization, as well as of the choice of imaging protocol. Detectability, however, remains difficult to assess because it requires realistic objects that mimic the clinical scene, many lesion-present and lesion-absent images, and multiple observers. Fillable phantoms, which trade off complexity against utility, provide a means to quantitatively test and compare imaging systems under known-truth conditions. These phantoms, however, often focus on quantification rather than detectability. This work presents extensions to a novel phantom design, together with analysis techniques, for evaluating detectability against realistic, non-piecewise-constant backgrounds. In this design, a phantom is filled with small solid plastic balls and a radionuclide solution to mimic heterogeneous background uptake. A set of 3D-printed regular dodecahedral ‘features’ was placed at user-defined locations within the phantom to create ‘holes’ in the matrix of chaotically packed balls. When filled, these features exhibit approximately 3:1 contrast relative to the lumpy background. A series of signal-known-present (SP) and signal-known-absent (SA) sub-images was generated and used as input to observer studies. The design was imaged in a head-like cylinder (20 cm diameter, 20 cm long) and in a body-like tank (36 cm wide, 21 cm tall, 40 cm long). Detectability indices from a series of model observers were compared across scan conditions (count level, number of scan replicates), across PET image reconstruction methods (with and without time-of-flight (TOF) and point-spread-function (PSF) modeling), and across PET/CT scanner designs, using the same phantom imaged on multiple systems. The detectability index was also compared with the noise-equivalent count (NEC) level to characterize the relationship between NEC and observer SNR.
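The text does not state which model observers were used. As an illustration only, the sketch below shows how a detectability index d' can be computed from sets of SP and SA sub-images, using a non-prewhitening (NPW) matched-filter observer as an assumed stand-in, and how the conventional noise-equivalent count rate NEC = T²/(T + S + kR) is formed from trues T, scatters S and randoms R. The function names `npw_detectability` and `nec`, the half-split template estimation, and the synthetic noise images in the demo are hypothetical choices for this sketch, not the authors' method or data.

```python
import numpy as np


def npw_detectability(sp_images, sa_images):
    """Hypothetical NPW model-observer detectability index d'.

    sp_images, sa_images: arrays of shape (n_images, ny, nx) holding
    signal-present (SP) and signal-absent (SA) sub-images. The first half
    of each set estimates the template (mean SP minus mean SA); the second
    half is scored, so template and test images are independent.
    """
    sp = np.asarray(sp_images, dtype=float).reshape(len(sp_images), -1)
    sa = np.asarray(sa_images, dtype=float).reshape(len(sa_images), -1)
    if len(sp) < 4 or len(sa) < 4:
        raise ValueError("need at least 4 SP and 4 SA sub-images")
    n_sp, n_sa = len(sp) // 2, len(sa) // 2

    # NPW template: estimated mean signal difference (no noise whitening).
    template = sp[:n_sp].mean(axis=0) - sa[:n_sa].mean(axis=0)

    # Observer test statistics on the held-out halves.
    t_sp = sp[n_sp:] @ template
    t_sa = sa[n_sa:] @ template

    # d' = separation of the class means over the pooled standard deviation.
    pooled_var = 0.5 * (t_sp.var(ddof=1) + t_sa.var(ddof=1))
    return (t_sp.mean() - t_sa.mean()) / np.sqrt(pooled_var)


def nec(trues, scatters, randoms, k=1.0):
    """Noise-equivalent counts, NEC = T^2 / (T + S + k*R).

    k = 1 for a low-noise randoms estimate; k = 2 when randoms are
    estimated by delayed-window subtraction.
    """
    return trues ** 2 / (trues + scatters + k * randoms)


if __name__ == "__main__":
    # Synthetic stand-in for phantom sub-images: white Gaussian noise plus
    # a small square "feature". This is NOT real phantom data.
    rng = np.random.default_rng(0)
    signal = np.zeros((8, 8))
    signal[3:5, 3:5] = 1.0
    sa = rng.normal(0.0, 1.0, size=(200, 8, 8))
    sp = rng.normal(0.0, 1.0, size=(200, 8, 8)) + signal
    print("NPW d' =", npw_detectability(sp, sa))
    print("NEC =", nec(trues=1e6, scatters=4e5, randoms=2e5, k=2.0))
```

Estimating the template and scoring the images on separate halves keeps d' from being inflated by reusing the same sub-images, at the cost of fewer test samples; with the limited number of SP and SA sub-images a physical phantom yields, that trade-off matters when comparing scan conditions.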