The current development of increasingly sensitive low-light detector technologies in the VNIR/SWIR regions shows great promise for future night vision applications, including digital image fusion. By combining spectral bands from the reflective and the thermally emissive domains, which provide complementary band-specific cues and advantages, a fused representation is anticipated to increase situational awareness and target discrimination performance. Performance assessment of image fusion remains an open problem, however, as suitable procedures, models and image quality metrics are still largely missing. A night-time data collection was made for a side-aspect two-hand object identification task over several ranges in a rural/woodland area using a common line-of-sight VNIR/LWIR system. Perception experiments based on an 8-alternative forced choice (8AFC) object ID task were performed on the two individual bands as well as on several common pixel-based fusion algorithms (including maximum, subtraction and averaging). As image fusion is highly task and scene dependent, it is difficult to draw general conclusions from a single experiment, but for the particular task/scene combination investigated most of the fusion algorithms are shown to perform better than the VNIR channel, although most of them fail to perform as well as the LWIR channel. This is thought to result from the VNIR channel being contrast-limited for the particular task/scene studied and from the low dynamic range of the low-light EBCMOS camera used in the fusion setup.
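
As an illustration of the pixel-based fusion rules named above (maximum, subtraction and averaging), the following is a minimal sketch rather than the processing chain used in the study. It assumes two co-registered, equally sized single-band images given as nested lists of intensities scaled to [0, 1]. The function name `fuse_pixels`, the absolute-difference convention for subtraction and the final clipping step are all assumptions made for illustration.

```python
def fuse_pixels(vnir, lwir, method="average"):
    """Fuse two co-registered single-band images pixel by pixel.

    vnir, lwir: nested lists (rows of floats) scaled to [0, 1].
    method: "maximum", "average" or "subtraction".
    """
    same_rows = len(vnir) == len(lwir)
    if not same_rows or any(len(a) != len(b) for a, b in zip(vnir, lwir)):
        raise ValueError("bands must be co-registered and equally sized")

    rules = {
        # Keep the brighter of the two bands at each pixel.
        "maximum": lambda v, t: max(v, t),
        # Equal-weight mean of the two bands.
        "average": lambda v, t: 0.5 * (v + t),
        # Assumed convention: absolute difference of the two bands.
        "subtraction": lambda v, t: abs(t - v),
    }
    if method not in rules:
        raise ValueError(f"unknown fusion method: {method}")
    rule = rules[method]

    # Apply the rule per pixel and clip the result to the display range.
    return [
        [min(1.0, max(0.0, rule(v, t))) for v, t in zip(row_v, row_t)]
        for row_v, row_t in zip(vnir, lwir)
    ]


vnir_band = [[0.25, 0.75], [0.5, 0.0]]
lwir_band = [[0.75, 0.25], [0.5, 1.0]]
fused_max = fuse_pixels(vnir_band, lwir_band, "maximum")
fused_avg = fuse_pixels(vnir_band, lwir_band, "average")
fused_sub = fuse_pixels(vnir_band, lwir_band, "subtraction")
```

None of these rules adapts to local contrast, so a band with low contrast or limited dynamic range contributes to every fused pixel regardless of how much task-relevant detail it carries.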