We have performed an experiment that compares the performance of human observers with that of a robust algorithm for the detection of targets in difficult, nonurban forward-looking infrared imagery. Our purpose was to benchmark the comparison and document performance differences for future algorithm improvement. The scale-insensitive detection algorithm, used as a benchmark by the Night Vision Electronic Sensors Directorate for algorithm evaluation, employed a combination of contrastlike features to locate targets. Detection receiver operating characteristic curves and observer-confidence analyses were used to compare human and algorithmic responses and to gain insight into differences. The test database contained ground targets, in natural clutter, whose detectability, as judged by human observers, ranged from easy to very difficult. In general, as compared with human observers, the algorithm detected most of the same targets, but correlated confidence with correct detections poorly and produced many more false alarms at any useful level of performance. Though characterizing human performance was not the intent of this study, results suggest that previous observational experience was not a strong predictor of human performance, and that combining individual human observations by majority vote significantly reduced false-alarm rates.