Moving object detection is one of the most promising research areas and is required in applications such as video monitoring and surveillance systems, human activity recognition systems, vehicle counting, and anomaly detection. Various methods for object detection using a single sensor, and a few using multimodal techniques, have been reported in the literature. However, such systems fail to handle adverse or challenging conditions such as illumination variations, changes in the scale and appearance of objects or targets, occlusions, and camouflage. We present an approach for the detection of moving objects using the structural similarity index (SSIM) and a Gaussian mixture model (GMM). SSIM is used to compute the similarity between a reference mean background frame and the foreground frame, independently for the visible spectrum (VIS) and thermal infrared (IR) modalities. The similarity measure is computed in the image spatial domain. The thresholded SSIM results are fused using different pixel-level fusion methods, such as logical "OR," the discrete wavelet transform, and principal component analysis. Temporal analysis is then performed on the fused results using a GMM to eliminate noise and false positives (unwanted background regions). We compared the results with recent methods for different complex scenarios and found that the F-measure increases up to approximately 80%. Hence, the proposed method proves to be a robust moving object detection technique in the multimodality domain.
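The per-modality SSIM comparison and the logical "OR" fusion described above can be illustrated with a minimal sketch. It is not the authors' implementation: the uniform 7 x 7 window, the 0.5 threshold, the synthetic frames, and all function names are assumptions chosen for illustration, and the GMM-based temporal analysis is omitted.

```python
import numpy as np

def box_mean(img, win=7):
    """Local mean over a win x win window (odd win), via an integral image."""
    pad = win // 2
    padded = np.pad(img, pad, mode="edge")
    # Integral image with a leading row and column of zeros.
    ii = np.zeros((padded.shape[0] + 1, padded.shape[1] + 1))
    ii[1:, 1:] = padded.cumsum(axis=0).cumsum(axis=1)
    h, w = img.shape
    total = (ii[win:win + h, win:win + w] - ii[:h, win:win + w]
             - ii[win:win + h, :w] + ii[:h, :w])
    return total / (win * win)

def ssim_map(ref, cur, win=7, data_range=255.0):
    """Per-pixel SSIM between two grayscale images (uniform window assumed)."""
    ref = ref.astype(np.float64)
    cur = cur.astype(np.float64)
    c1 = (0.01 * data_range) ** 2  # standard SSIM stabilising constants
    c2 = (0.03 * data_range) ** 2
    mu_r, mu_c = box_mean(ref, win), box_mean(cur, win)
    var_r = box_mean(ref * ref, win) - mu_r ** 2
    var_c = box_mean(cur * cur, win) - mu_c ** 2
    cov = box_mean(ref * cur, win) - mu_r * mu_c
    num = (2 * mu_r * mu_c + c1) * (2 * cov + c2)
    den = (mu_r ** 2 + mu_c ** 2 + c1) * (var_r + var_c + c2)
    return num / den

def foreground_mask(background, frame, threshold=0.5, win=7):
    """Pixels with low similarity to the background are marked as foreground."""
    return ssim_map(background, frame, win) < threshold

def fuse_or(mask_vis, mask_ir):
    """Pixel-level logical OR fusion of the two thresholded SSIM results."""
    return np.logical_or(mask_vis, mask_ir)

# Synthetic example: each modality sees a different object.
bg_vis = np.tile(np.linspace(10, 40, 64), (64, 1))  # mean VIS background
bg_ir = np.full((64, 64), 30.0)                     # mean IR background
frame_vis = bg_vis.copy()
frame_vis[10:20, 10:20] = 220.0                     # object visible only in VIS
frame_ir = bg_ir.copy()
frame_ir[30:40, 30:40] = 230.0                      # warm object visible only in IR

mask_vis = foreground_mask(bg_vis, frame_vis)
mask_ir = foreground_mask(bg_ir, frame_ir)
fused = fuse_or(mask_vis, mask_ir)
```

In this toy case the fused mask contains both objects although each modality detects only one of them, which is the motivation for combining VIS and IR. A wavelet- or PCA-based fusion and a temporal GMM stage would replace or follow `fuse_or` in a fuller pipeline.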