Algorithms developed for the detection of landmines must discriminate a wide variety of targets under a diverse array of environmental conditions. However, the potential performance of a detection algorithm may be underestimated when it is evaluated in batch on a large, diverse dataset, because environmental (or, more generally, contextual) factors can contribute significant variance to the algorithm's output across different contexts. One way to view this is as a problem of miscalibration: within each context, the output scores of a detection algorithm can be seen as miscalibrated relative to the scores produced in the other contexts. As a result of this miscalibration, the observed receiver operating characteristic (ROC) curve for a detector can have a sub-optimal area under the curve (AUC). One solution, then, is to re-calibrate the detector within each context. In this work, we identify multiple sets of contexts in which different landmine detection algorithms exhibit significant output variance and, consequently, miscalibration. We then apply a monotonic calibration strategy that maximizes AUC and demonstrate the gain in observed performance when a landmine detection algorithm is properly calibrated within each context.
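The effect described above can be illustrated with a toy sketch. This is not the paper's method or data: the contexts, score distributions, and the choice of the empirical CDF as the monotonic transform are all assumptions made for illustration only (the paper's AUC-maximizing calibration may differ). The sketch pools scores from two synthetic contexts whose score scales disagree, showing that the pooled AUC is degraded relative to what a per-context monotonic re-calibration recovers.

```python
# Illustrative sketch only (synthetic data, hypothetical contexts); the
# empirical CDF stands in for a generic monotonic per-context calibration.
import numpy as np

rng = np.random.default_rng(0)

def auc(scores, labels):
    """Area under the ROC curve via the rank-sum (Mann-Whitney) statistic."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=int)
    order = scores.argsort()
    ranks = np.empty(len(scores))
    ranks[order] = np.arange(1, len(scores) + 1)
    for s in np.unique(scores):          # average ranks over tied scores
        tied = scores == s
        ranks[tied] = ranks[tied].mean()
    n_pos = labels.sum()
    n_neg = len(labels) - n_pos
    u = ranks[labels == 1].sum() - n_pos * (n_pos + 1) / 2
    return u / (n_pos * n_neg)

def empirical_cdf_calibrate(scores):
    """Monotonic (rank-preserving) map of one context's scores onto [0, 1]."""
    scores = np.asarray(scores, dtype=float)
    order = scores.argsort()
    ranks = np.empty(len(scores))
    ranks[order] = np.arange(1, len(scores) + 1)
    return ranks / len(scores)

# Two hypothetical contexts (e.g. two soil conditions): within each, targets
# score above clutter, but the two contexts' score scales disagree badly.
ctx_a = np.concatenate([rng.normal(0.2, 0.05, 200),   # clutter
                        rng.normal(0.4, 0.05, 50)])   # targets
ctx_b = np.concatenate([rng.normal(5.0, 0.5, 200),    # clutter
                        rng.normal(7.0, 0.5, 50)])    # targets
labels = np.concatenate([np.zeros(200), np.ones(50),
                         np.zeros(200), np.ones(50)]).astype(int)

raw = np.concatenate([ctx_a, ctx_b])
cal = np.concatenate([empirical_cdf_calibrate(ctx_a),
                      empirical_cdf_calibrate(ctx_b)])

print(f"pooled AUC, raw scores:             {auc(raw, labels):.3f}")
print(f"pooled AUC, per-context calibrated: {auc(cal, labels):.3f}")
```

In this toy setting the raw pooled AUC suffers because context B's clutter outscores context A's targets; the per-context monotonic transform removes that cross-context scale mismatch while preserving each context's internal ranking, so the pooled ROC curve improves.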