The purpose of this work is to provide a model for the average time to detection for observers searching for targets in photo-realistic images of cluttered scenes. The proposed model builds on previous work that constructs a fixation probability map (FPM) from the image. This FPM is constructed from bottom-up features, such as local contrast, but also includes top-down cognitive effects, such as the location of the horizon. The FPM is used to generate a set of conspicuous points that are likely to be fixation points, along with their initial probabilities of fixation. These points are used to assemble fixation sequences. The order of the fixations is crucial for determining the time to fixation. Recognizing that different observers (unconsciously) choose different orderings of the conspicuous points, the present model performs a Monte Carlo simulation to find the probability of fixating each conspicuous point at each position in the sequence. The three main assumptions of this model are: the observer can attend only to the area of the image being fixated, each fixation has an approximately constant duration, and there is a short-term memory for the locations of previous fixation points. This fixation point memory is an essential feature of the model, and the memory decay constant is a parameter of the model. Simulations show that the average time to fixation for a given conspicuous point in the image depends on the distribution of the other conspicuous points. This holds even if the initial probability of fixation for the given point is the same across distributions and only the initial probabilities of fixation of the other points are distributed differently.
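The Monte Carlo procedure described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not specify how the memory acts on the fixation probabilities, so the sketch assumes an exponentially decaying memory trace that multiplicatively suppresses recently fixated points. The function names, the 0.3 s fixation duration, the decay constant (expressed in units of fixations), and the two example scenes are all hypothetical choices made for illustration.

```python
import math
import random

FIXATION_DURATION_S = 0.3  # assumed constant fixation duration (hypothetical value)

def simulate_sequence(p0, n_fix, decay, rng):
    """Draw one fixation sequence as a list of conspicuous-point indices."""
    last_seen = [None] * len(p0)  # sequence position of each point's latest fixation
    sequence = []
    for t in range(n_fix):
        weights = []
        for p, seen in zip(p0, last_seen):
            # Assumed memory trace: exp(-lag / decay), lag counted in fixations.
            memory = 0.0 if seen is None else math.exp(-(t - seen) / decay)
            weights.append(p * (1.0 - memory))
        if sum(weights) <= 0.0:  # every point fully remembered: fall back to p0
            weights = list(p0)
        i = rng.choices(range(len(p0)), weights=weights)[0]
        last_seen[i] = t
        sequence.append(i)
    return sequence

def run_trials(p0, n_fix=30, decay=3.0, n_trials=5000, seed=0):
    """Simulate many observers, each producing one fixation sequence."""
    rng = random.Random(seed)
    return [simulate_sequence(p0, n_fix, decay, rng) for _ in range(n_trials)]

def position_probabilities(sequences, n_points):
    """prob[k][i]: fraction of trials in which point i is fixated at position k."""
    prob = [[0.0] * n_points for _ in range(len(sequences[0]))]
    for seq in sequences:
        for k, i in enumerate(seq):
            prob[k][i] += 1.0 / len(sequences)
    return prob

def mean_time_to_fixation(sequences, target, duration=FIXATION_DURATION_S):
    """Mean time through the first fixation of the target, over trials that reach it."""
    times = [(seq.index(target) + 1) * duration for seq in sequences if target in seq]
    return sum(times) / len(times) if times else float("nan")

# Point 0 has initial probability 0.2 in both scenes; only the other points differ.
scene_a = [0.2, 0.8]
scene_b = [0.2, 0.2, 0.2, 0.2, 0.2]
time_a = mean_time_to_fixation(run_trials(scene_a), target=0)
time_b = mean_time_to_fixation(run_trials(scene_b), target=0)
print(f"scene A: {time_a:.2f} s, scene B: {time_b:.2f} s")
```

Under these assumptions, point 0 is reached sooner on average in scene A, where a single strong competitor is quickly suppressed by the memory, than in scene B, where four equally weighted competitors share the remaining probability. This is only a qualitative illustration of the dependence on the distribution of the other points; the numerical values depend entirely on the assumed memory form and parameters.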