Open Access Paper
11 March 2014 Visual search from lab to clinic and back
Many of the tasks of medical image perception can be understood as demanding visual search tasks (especially if you happen to be a visual search researcher). Basic research on visual search can tell us quite a lot about how medical image search tasks proceed because even experts have to use the human “search engine” with all its limitations. Humans can only deploy attention to one or a very few items at any one time. Human search is “guided” search. Humans deploy their attention to likely target objects on the basis of the basic visual features of objects and on the basis of an understanding of the scene containing those objects. This guidance operates in medical images as well as in the mundane scenes of everyday life. The paper reviews some of the dialogue between medical image perception by experts and visual search as studied in the laboratory.



Many medical image perception tasks are tasks of interpretation. The clinician is asked to look at something and evaluate it. Is it broken? Is it infected? In another set of tasks, the clinician is asked to find something. Is there a stroke in this brain? Is there cancer in this breast? These are visual search tasks, where the location and even the presence of a finding are uncertain. The human “search engine” is, at once, powerful and constrained. It is useful to understand the way that the capabilities and limitations of human search interact with the demands of medical search tasks because that understanding can lead to improvements in vitally important medical image perception tasks.



Imagine that you are looking for a 1977 penny in a pile of pennies. There is not much you can do except to direct your attention to each penny in turn, rejecting it when it turns out to have a different date and moving on to the next one. Since that date inscription is small, you will need to fixate each penny to get the numbers on the fovea. Voluntary fixations occur at a rate of about 4 per second, so your penny search will be constrained to be at least that slow. If the numbers on the pennies were big enough that you could read them without the need to foveate each one, the rate at which you could process the pennies would increase to something like 25-50 pennies per second. This tells you either that the “spotlight” of attention can be deployed separately from and faster than the deployment of the eyes or, perhaps, that, under the right circumstances, more than one item can be processed in parallel during each fixation of the eyes 1.
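The arithmetic behind these rates can be sketched as follows. This is a minimal illustration, assuming a pile with exactly one target that is equally likely to sit at any position, so that a serial, self-terminating search inspects (N + 1) / 2 items on average. The function name and the pile size of 100 are illustrative assumptions, not values from the studies cited.

```python
def expected_search_seconds(n_items: int, items_per_second: float) -> float:
    """Mean time for a serial, self-terminating search with one target.

    Assumes the target is equally likely to be at any position, so the
    searcher inspects (n_items + 1) / 2 items on average before finding it.
    """
    if n_items < 1 or items_per_second <= 0:
        raise ValueError("need at least one item and a positive rate")
    expected_items_inspected = (n_items + 1) / 2
    return expected_items_inspected / items_per_second


pile = 100  # hypothetical pile size
rates = [
    ("one penny per fixation (4/s)", 4.0),
    ("no fixation needed, low estimate (25/s)", 25.0),
    ("no fixation needed, high estimate (50/s)", 50.0),
]
for label, rate in rates:
    print(f"{label}: {expected_search_seconds(pile, rate):.2f} s")
```

Under these assumptions, the fixation-limited search is slower than the attention-limited search by the ratio of the two rates, regardless of pile size.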

If all of visual search were a succession of such “serial, self-terminating” searches, finding anything from the cat to an aneurysm would be a needle-in-the-haystack experience and it would be hard to get anything done. Even at 25-50 deployments of attention per second, it would take far too long to find your keys, your socks, or a nodule in the liver. Fortunately, the human “search engine” is smarter than that. It uses several sources of information to “guide” attention. Suppose, for example, you were still looking for that 1977 penny but now the coins are not all pennies. They are a mix of pennies, nickels, dimes, and quarters. You will still need to search through one coin after another. However, you will be able to use the size and color of the coins to restrict your attention to the pennies, avoiding the other coins 2. If that 1977 penny were the only penny in the pile, search would be trivial. With the target defined by a unique, salient feature, the penny would “pop-out” of the display 3.



The penny example suggests that there is a set of features that are processed in parallel across the entire visual field 4. A single, unique item will tend to attract attention whether or not the observer is looking for that feature. This is an example of the “bottom-up”, stimulus-driven guidance of attention. In Figure 1, the white, horizontal item on the left attracts attention in this bottom-up manner because its features are markedly different from those of its homogeneous neighbors. The same type of white horizontal item is much less salient on the right, in a more diverse neighborhood. It is this local, bottom-up salience that is captured by most “saliency” algorithms 5,6.

Figure 1: Notice that the white horizontal item on the left attracts your attention more than the white horizontal item on the right.
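The dependence of bottom-up salience on the neighborhood can be illustrated with a toy calculation. This sketch is not one of the published saliency algorithms. It assumes that each item is described by only two hypothetical features (luminance from 0 to 1 and orientation in degrees), that an item's local contrast is its mean feature distance from all other items, and that salience is that contrast relative to the strongest competitor in the display.

```python
from statistics import mean


def local_contrast(items, index):
    """Mean feature distance between one item and every other item."""
    lum, ori = items[index]
    return mean(
        abs(lum - other_lum) + abs(ori - other_ori) / 90.0
        for j, (other_lum, other_ori) in enumerate(items)
        if j != index
    )


def relative_salience(items, index):
    """Contrast of one item divided by the strongest competing contrast."""
    own = local_contrast(items, index)
    strongest_other = max(
        local_contrast(items, j) for j in range(len(items)) if j != index
    )
    return own / strongest_other


# Item 0 is white (luminance 1.0) and horizontal (0 degrees) in both displays.
homogeneous = [(1.0, 0.0)] + [(0.0, 90.0)] * 8
diverse = [(1.0, 0.0), (0.0, 90.0), (1.0, 90.0), (0.0, 0.0),
           (1.0, 90.0), (0.0, 0.0), (0.0, 90.0), (1.0, 90.0), (0.0, 0.0)]

print("homogeneous neighbors:", round(relative_salience(homogeneous, 0), 2))
print("diverse neighbors:", round(relative_salience(diverse, 0), 2))
```

In this toy measure the identical white horizontal item stands far above its competitors among homogeneous neighbors and barely above them among diverse neighbors.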


Bottom-up saliency is useful up to a point but, if you think about a typical medical image, the clinically significant finding is not likely to be the most salient feature in the image – not if we define salience in terms of these local differences in basic features. In deliberate search tasks, when we have a target in mind, we will configure our search engine in a top-down, user-driven manner to guide attention to candidate targets. Thus, in Figure 1, you can configure yourself to look for black and vertical. This will rapidly guide your attention to the black vertical item even if it was not the most salient item and had not attracted much attention bottom-up 7. As a medical example, in lung CT, in a search for lung nodules, attention will be guided to small white objects.

Note that, in both of these examples, the target does not happen to be defined by a single feature but by a conjunction of features. Some items are black. Other items are vertical. You are looking for the conjunction of black and vertical and the intersection of the set of black things and the set of vertical things would be an excellent place to look for black vertical targets.
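The set logic of conjunction guidance can be written out directly. This is a minimal sketch with a hypothetical six-item display. The `Item` type and the feature labels are illustrative assumptions, and real guidance is graded rather than all-or-none.

```python
from typing import NamedTuple


class Item(NamedTuple):
    color: str
    orientation: str


# Hypothetical display; only one item is both black and vertical.
display = [
    Item("black", "horizontal"),
    Item("white", "vertical"),
    Item("black", "vertical"),  # the conjunction target
    Item("white", "horizontal"),
    Item("black", "horizontal"),
    Item("white", "vertical"),
]

black = {i for i, item in enumerate(display) if item.color == "black"}
vertical = {i for i, item in enumerate(display) if item.orientation == "vertical"}

# Guidance by two attributes at once narrows search to the intersection.
candidates = black & vertical
print("black items:", sorted(black))
print("vertical items:", sorted(vertical))
print("candidates worth attending:", sorted(candidates))
```

Neither feature alone isolates the target here, but the intersection of the two sets does.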

There is a limited set of attributes that can guide attention. You cannot direct your search engine to guide your attention to breast cancer. A trained radiologist can search for breast cancer, of course, but the guidance that limits the set of sensible places to deploy attention will be drawn from a limited vocabulary of attributes. There appear to be one to two dozen of these 8. Everyone agrees on attributes like color and motion. Other attributes like “shininess” have less experimental attestation 9, while other candidates like faces or facial emotion remain controversial even after years of research 10,11.



The human search engine is not a search engine like Google that allows you to type anything into the search box. Not only is the set of attributes limited, but the use of those attributes is also limited.

One set of constraints is illustrated in Figure 2 where there are two types of target shown in the “search box”: items with big and small parts on the left and small things with big parts on the right. Use your search engine to find each in turn. It turns out that the human search engine is limited to one feature value for each attribute. That is, observers can search for the item that has the color feature “RED” and the size feature “BIG”. This yields the intersection of the sets of red and big items. However, if the observer tries to look for two size features (BIG and SMALL), that search seems to yield the union of the big and small items. In this case, that includes all of the dumbbell objects and thus there is no guidance 12. However, the system is capable of searching for a feature of the whole object and a feature, in the same dimension, of a part of that object. Thus, the search for the small square objects with the larger, enclosed square parts is guided and should feel somewhat easier 13,14. (Did you find both examples of each target?)

Figure 2: The “search box” at the top of the figure tells you what to look for. You should find that it is harder to find the dumbbell with big and small parts than it is to find the small square item with the large black square part. (There are two of each.)
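The failure of guidance for the dumbbell targets can be sketched in the same set terms. The six dumbbells below are hypothetical, and treating a request for two values of one attribute as a plain set union is a simplifying assumption based on the description above.

```python
# Each hypothetical dumbbell is described by the sizes of its two parts.
dumbbells = [
    ("big", "big"),
    ("small", "small"),
    ("big", "small"),    # target: one big part and one small part
    ("small", "small"),
    ("big", "big"),
    ("small", "big"),    # second target
]

has_big = {i for i, parts in enumerate(dumbbells) if "big" in parts}
has_small = {i for i, parts in enumerate(dumbbells) if "small" in parts}

# Asking for BIG and SMALL at once behaves like a union of the two sets,
# which here covers every item and so excludes nothing.
guided = has_big | has_small

# The targets are the intersection, which guidance cannot deliver directly.
targets = has_big & has_small

print(f"{len(guided)} of {len(dumbbells)} items remain candidates")
print("actual targets:", sorted(targets))
```

Because every dumbbell has at least one big or one small part, the union leaves the whole display as candidates, and the two targets must be found item by item.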


Notice that the size terms used here are “big” and “small”. We seem to be able to talk to our search engine only in a very limited vocabulary. We can guide to big and small (but not medium-sized) 15. Orientation seems to be defined by the terms: steep, shallow, left, and right 16. Color guidance is probably guidance by color categories like “red” and “blue” not “rose red” or “610nm” 17. This color categorical guidance has medical image perception consequences. Think of color heat maps in, for example, PET images. The scale is continuous but we are predisposed to see these maps categorically. For example, in a standard red-to-green heat map, we might see a red hot spot of a specific dimension. The shape and size of that perceived hot spot would be quite different if the color mapping was changed even though that mapping would not change the underlying data.
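The heat map point can be made concrete with a one-dimensional toy example. The uptake profile and the two category boundaries below are hypothetical. The sketch assumes only that a viewer treats every sample at or above the “red” boundary of a color map as part of the hot spot.

```python
def hot_spot_width(values, red_boundary):
    """Count samples that would fall into the 'red' category."""
    return sum(1 for v in values if v >= red_boundary)


# Hypothetical smooth uptake profile across a lesion (arbitrary units).
profile = [0.1, 0.2, 0.4, 0.6, 0.8, 0.9, 0.8, 0.6, 0.4, 0.2, 0.1]

# Two color maps that place the boundary of "red" at different values.
narrow = hot_spot_width(profile, red_boundary=0.75)
wide = hot_spot_width(profile, red_boundary=0.5)

print("perceived width with boundary 0.75:", narrow)
print("perceived width with boundary 0.50:", wide)
```

The data are identical in both cases. Only the position of the category boundary changes, and with it the apparent extent of the hot spot.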

Even with this limited vocabulary of search terms, it is possible to guide search quite intelligently because, as a general rule, the world is not constituted like Figure 2. In the real world, knowing that you are looking for a big, red, shiny thing with a small yellow part is likely to substantially reduce the set of candidates. We know that it is possible to guide to many attributes at the same time 18. Moreover, visual search for objects is best if you show the observer the exact target object just before the search 19, suggesting that our search engine can translate a picture cue into an effective search template quickly and effectively.



There is a downside to effective guidance. In mechanical terms, guidance probably involves setting “weights” in the nervous system to boost the effectiveness of some feature (e.g. red) or some dimension (e.g. color) 20-22. That can make it less likely that another, incidental target will be found. We had observers searching for nodules in lung CT so, we may presume, they had their search engine set for small and white. As a consequence, 84% of our expert radiologists (and 100% of non-radiologists) failed to report a gorilla the size of a matchbook that we had inserted across 5 slices in the last case (upper right of Figure 3) 23. Others have seen similar effects (like missing a missing clavicle 24). We were tracking our radiologists’ eye movements so we could see that the gorilla was often fixated and still missed. Looking at an object is not quite the same thing as “seeing” it. Indeed, it is even possible to pick up a target and move it without noticing that it is the thing you are looking for 25.

Figure 3: What is a gorilla doing in the upper right quadrant?
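One way to picture the weighting account is as a weighted sum of feature signals. The following is a toy sketch, not a model taken from the cited studies: the feature names, the signal values, and the weights are all hypothetical.

```python
def priority(features, weights):
    """Weighted sum of feature signals; unlisted features get weight 0."""
    return sum(weights.get(name, 0.0) * value for name, value in features.items())


# Hypothetical feature signals in the range 0 to 1.
nodule = {"white": 0.9, "small": 0.9, "large": 0.1}
incidental_item = {"white": 0.2, "small": 0.1, "large": 0.9}

# Assumed top-down setting for a lung nodule search: boost small and white.
nodule_search_weights = {"white": 1.0, "small": 1.0}

print("nodule priority:", round(priority(nodule, nodule_search_weights), 2))
print("incidental item priority:",
      round(priority(incidental_item, nodule_search_weights), 2))
```

With weights tuned to small and white, an item that is neither receives little priority even when it is large, which is one way to think about how an unexpected object could be fixated and still go unreported.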




When researchers have looked at the eye movements of expert radiologists and compared the results to those from novices, the most striking difference is that experts look in fewer places 26. Clearly something is guiding the expert’s search and it seems very unlikely that this guidance is produced by better appreciations of the basic features of the targets. Indeed, guidance of attention by basic features will only get us part of the way toward explaining the efficiency of search in real world scenes in general. An expert, searching for breast cancer, and a shopper, searching for cucumbers, are both guided by a set of “scene guidance” cues. These are not present in random arrays of items like those in Figures 1 and 2. However, they are present in most standard search tasks. Consider the cucumber search. A shopper, in the produce section, will be aided by “syntactic” guidance – guidance based on the physical rules of the world. Cucumbers will be in a bin somewhere. They will not be floating in mid-air because cucumbers just do not do that. Search will be aided by “semantic” guidance – guidance based on what we know about cucumbers, beyond their status as physical objects. They are likely to be near the carrots and celery because the produce section is typically ordered by rules like “put salad vegetables near each other”. Other rules are perfectly possible (arrange by color or alphabetically) but that is not how it is typically done and our shopper knows that. Finally, the shopper might be aided by “episodic” guidance. The cucumbers were in this location last week. They are probably in the same place today. (“Episodic” from episodic memory.) Take the shopper to another store and the episodic guidance will fail. Take the shopper to another country and the semantic guidance might fail. Syntactic guidance should be reliable as long as the laws of gravity do not change.

The rules of feature guidance probably come with the system. You don’t have to learn to guide attention to color or size. On the other hand, semantic guidance, in particular, must be learned. There is no rule of nature that says that forks tend to lie near plates on a table. A significant part of search expertise in tasks from radiology to satellite surveillance must involve learning the contingencies that are reliable in landscapes from North Korea to CT colonoscopy. In radiology, it is interesting to note the rapid rate of change in the “landscape” as the technology evolves. For example, the 2D world of the chest x-ray has evolved into the 3D stack of CT images representing a volume and not just a plane. The relative novelty of 3D volumetric image data gives us the chance to observe what might be thought of as the evolution of guidance.

The gorilla study, mentioned above, was part of just one case in a study that was actually directed at measuring the eye movements of radiologists as they searched for nodules in lung CT. Using an eye tracker, we could measure the eyes’ position in the XY plane while also registering “depth” by tracking the slice that was being viewed as the clinicians scrolled up and down through the stack of images of the lung 27. We found that radiologists fell into two groups. “Drillers” tended to have their eyes fixated in one spot in the XY plane or, at least, in one quadrant of the lung, while they scrolled up and down through the lung. “Scanners” moved more slowly in depth while moving their eyes throughout all four quadrants in the present slice (Quadrants are coded by color in Figure 4). We do not yet know if one of these methods is superior. It is possible, however, that scanning is the older style, having been all that could be done with a 2D chest x-ray. Drilling might be an adaptation to the new, 3D world.

Figure 4: Representation of the eye movements of four radiologists as they search a stack of lung CT images for lung nodules. The x-axis shows time. The y-axis shows depth – simply the slice fixated at a specific time. Colors show which quadrant the eyes are fixating at that time. “Drillers” stay in one quadrant while drilling through the whole stack. Then they move and drill again. Scanners examine all quadrants before moving to the next slice. The two scanners shown here make only a single pass through the stack.




One of the striking aspects of scene perception is the speed with which one can extract the “gist” of a scene. Tens of milliseconds of exposure are all that is required to know that a scene is natural or man-made or that it is navigable or that it contains an animal 28,29. Interestingly, this gist signal is often quite global in nature. That is, an observer might have a reliable sensation that an animal is present but not actually know what animal it might be or where in the image it is located 30. Experts sometimes report a similar phenomenon. A radiologist might have the sense that an image contains bad news for the patient before locating the actual problem. Kundel and Nodine 31 talk about an initial stage of “holistic” or “gestalt” processing of radiological images before the clinician gets down to actual search. We found that expert mammographers were significantly above chance in classifying a mammogram as normal or abnormal after just 250 msec of exposure 32. Technologists, reading cervical cancer slides, had similar abilities. In both cases, the experts were at chance in localizing the pathology. Apparently, they had become sensitive to a global signal in a very specific kind of scene. They could detect the “gist” of cancer at above chance levels. Non-experts were at chance in these tasks.



The various forms of guidance, discussed here, make it possible for humans to find what they are looking for quite effectively in many, if not most, cases. However, guidance does not solve (and may even complicate) one of the most fundamental search problems. When are you finished? If you are looking for your cell phone and you find your cell phone, then the answer is straightforward. You are done. But suppose you do not find the phone? How long should you keep looking? Or suppose that you are looking for the best lemon in the supermarket or all of the metastases in an abdominal CT? In these cases, the correct quitting time is not obvious. Guidance complicates matters because, with guidance, the correct answer isn’t “look through everything”. Perhaps you will look through all or most of the candidates above some guidance threshold. But if you are looking at a complex image like a mammogram, setting that threshold becomes the quitting problem. It is not at all obvious where to set the threshold and when to quit. Mammography is a good venue to worry about quitting time because, in screening mammography (as in baggage screening and a number of other important tasks), actual targets are very rare. Almost all the cases end when the clinician decides to quit without a positive finding.

It turns out that the extreme rarity of targets in a task like breast cancer screening is, itself, a problem. You are more likely to miss a rare target than a common one 33,34. This seems to be true, even if you are an expert radiologist. We took 100 mammograms, 50 positive cases and 50 negative, and introduced them into the normal workflow of breast cancer screening at a slow rate of less than one per day. Under those low prevalence conditions, radiologists missed 30% of the cancers. When we had radiologists read the same 100 cases in a single session outside the clinic (50% prevalence), they missed just 12% of the cancers 35. Prevalence modulated performance. The same thing happened with cervical cancer screening stimuli 36 and with airport baggage screeners 37. Like the gorilla experiment, these results do not reflect badly on radiologists (or airport screeners, for that matter). These results tell us that the limits on the human search engine and on human decision-making processes apply to experts as well as to novices. We need to understand how humans perform search tasks if we are going to ask experts to do difficult, important search tasks and if we want them to do those tasks well.



Of course, the limits of human search engines would not be a problem if we could turn over medical image perception tasks to the computer; but we can’t – not yet, in most cases. Computer Aided Detection (CADe) systems are good but they are not perfect so, at present, they are partners with human observers, not replacements for those humans. One would think that a good radiologist plus a good CADe system would be markedly better than either alone. Curiously, that is not the case. On balance, CAD helps but the improvement is modest when it is found at all 38,39. Radiologists do not appear to make optimal use of CAD signals. For instance, in one study, radiologists failed to act on 70% of correct positive CAD marks 40. The prevalence problem might be one reason. Suppose we screen 1000 women. There might be 3 cancers and a good CADe system might mark them all. A really good system might also mark 10% of negative cases. Even a system this good, therefore, is producing about 100 false positive marks in these 1000 cases. The radiologist gets 103 marks, three of them correctly marking cancer. That is not a positive predictive value designed to inspire confidence. Moreover, a CAD mark at one location may actually make it less likely that an observer will find a target at another location. That was our finding with non-radiologists and a simulated CAD task 41. This is a relative of a more general problem known as “satisfaction of search” 42-44 or “subsequent search misses” 45 where finding one target makes it less likely that you will find a second one in displays that have multiple targets.
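The positive predictive value in this example can be worked through explicitly. The sketch below uses the numbers from the text (1000 cases, 3 cancers, every cancer marked, 10% of negative cases marked). The function name and signature are illustrative assumptions, and the sketch assumes one mark per marked case.

```python
def cad_mark_summary(n_cases, n_cancers, sensitivity, false_mark_rate):
    """Return (true marks, false marks, positive predictive value).

    Assumes one CAD mark per marked case.
    """
    true_marks = n_cancers * sensitivity
    false_marks = (n_cases - n_cancers) * false_mark_rate
    total_marks = true_marks + false_marks
    ppv = true_marks / total_marks if total_marks else float("nan")
    return true_marks, false_marks, ppv


true_marks, false_marks, ppv = cad_mark_summary(
    n_cases=1000, n_cancers=3, sensitivity=1.0, false_mark_rate=0.10
)
print(f"true marks: {true_marks:.0f}")
print(f"false marks: {false_marks:.1f}")
print(f"positive predictive value: {ppv:.1%}")
```

Strictly, 10% of the 997 negative cases is 99.7 false marks, which the text rounds to 100. Either way, roughly 3 marks in every 100 point to a cancer.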



Radiologists and visual attention researchers have a lot to say to each other. Radiologists perform remarkable feats of visual search on a daily basis. By trying to understand what they do, we can come closer to understanding the quotidian world of search in which we all live. At the same time, by understanding the capabilities and limitations of the human search engine, we may be able to identify pitfalls and opportunities in the world of medical image perception.


The work reviewed here was supported by grants from NIH (EY017001), the Office of Naval Research (ONR MURI N000141010278), and Toshiba Medical Systems (BWH Agreement No. A203079). In addition, support was provided by NIH postdoctoral fellowships for Trafton Drew (1F32EB011959) and Karla Evans (1F32EY019819).



1. Wolfe, J.M., “Moving towards solutions to some enduring controversies in visual search,” Trends Cogn Sci, 7, 70–76 (2003).

2. Egeth, H.E., Virzi, R.A. & Garbart, H., “Searching for conjunctively defined targets,” J. Exp. Psychol: Human Perception and Performance, 10, 32–39 (1984).

3. Egeth, H., Jonides, J. & Wall, S., “Parallel processing of multielement displays,” Cognitive Psychology, 3, 674–698 (1972).

4. Treisman, A. & Gelade, G., “A feature-integration theory of attention,” Cognitive Psychology, 12, 97–136 (1980).

5. Koch, C. & Ullman, S., “Shifts in selective visual attention: Towards the underlying neural circuitry,” Human Neurobiology, 4, 219–227 (1985).

6. Itti, L. & Koch, C., “Computational modelling of visual attention,” Nature Reviews Neuroscience, 2, 194–203 (2001).

7. Wolfe, J.M., Cave, K.R. & Franzel, S.L., “Guided Search: An alternative to the Feature Integration model for visual search,” J. Exp. Psychol. - Human Perception and Perf., 15, 419–433 (1989).

8. Wolfe, J.M. & Horowitz, T.S., “What attributes guide the deployment of visual attention and how do they do it?,” Nature Reviews Neuroscience, 5, 495–501 (2004).

9. Wolfe, J.M. & Franzel, S.L., “Binocularity and visual search,” Perception and Psychophysics, 44, 81–93 (1988).

10. Becker, S.I., Horstmann, G. & Remington, R.W., “Perceptual grouping, not emotion, accounts for search asymmetries with schematic faces,” Journal of Experimental Psychology: Human Perception and Performance, 37, 1739–1757 (2011).

11. Horstmann, G., Scharlau, I. & Ansorge, U., “More efficient rejection of happy than of angry face distractors in visual search,” Psychon Bull Rev, 13, 1067–1073 (2006).

12. Wolfe, J.M. et al., “Limitations on the parallel guidance of visual search: Color X color and orientation X orientation conjunctions,” J. Exp. Psychol: Human Perception and Performance, 16, 879–892 (1990).

13. Wolfe, J.M., Friedman-Hill, S.R. & Bilsky, A.B., “Parallel processing of part/whole information in visual search tasks,” Perception and Psychophysics, 55, 537–550 (1994).

14. Bilsky, A.A. & Wolfe, J.M., “Part-whole information is useful in size X size but not in orientation X orientation conjunction searches,” Perception and Psychophysics, 57, 749–760 (1995).

15. Hodsoll, J.P., Humphreys, G.W. & Braithwaite, J.J., “Dissociating the effects of similarity, salience, and top-down processes in search for linearly separable size targets,” Percept Psychophys, 68, 558–570 (2006).

16. Wolfe, J.M., Friedman-Hill, S.R., Stewart, M.I. & O’Connell, K.M., “The role of categorization in visual search for orientation,” J. Exp. Psychol: Human Perception and Performance, 18, 34–49 (1992).

17. Yokoi, K. & Uchikawa, K., “Color category influences heterogeneous visual search for color,” J Opt Soc Am A Opt Image Sci Vis, 22, 2309–2317 (2005).

18. Nordfang, M. & Wolfe, J.M., “Guided Search for Triple Conjunctions,” Atten Percept Psychophys (2014).

19. Wolfe, J., Horowitz, T., Kenner, N.M., Hyle, M. & Vasan, N., “How fast can you change your mind? The speed of top-down guidance in visual search,” Vision Research, 44, 1411–1426 (2004).

20. Zehetleitner, M., Krummenacher, J., Geyer, T., Hegenloh, M. & Müller, H., “Dimension intertrial and cueing effects in localization: support for pre-attentively weighted one-route models of saliency,” Attention, Perception, & Psychophysics, 73, 349–363 (2011).

21. Krummenacher, J., Muller, H.J. & Heller, D., “Visual search for dimensionally redundant pop-out targets: Parallel-coactive processing of dimensions is location specific,” J. Exp. Psychol: Human Perception and Performance, 28, 1303–1322 (2002).

22. Found, A. & Muller, H.J., “Searching for unknown feature targets on more than one dimension: Investigating a ‘dimension weighting’ account,” Perception and Psychophysics, 58, 88–101 (1996).

23. Drew, T., Vo, M.L.-H. & Wolfe, J.M., “The Invisible Gorilla Strikes Again: Sustained Inattentional Blindness in Expert Observers,” Psychological Science, 24, 1848–1853 (2013).

24. Potchen, E.J., “Measuring observer performance in chest radiology: some experiences,” J Am Coll Radiol, 3, 423–432 (2006).

25. Solman, G.J., Cheyne, J.A. & Smilek, D., “Found and missed: failing to recognize a search target despite moving it,” Cognition, 123, 100–118 (2012).

26. Nodine, C.F., Kundel, H.L., Lauver, S.C. & Toto, L.C., “Nature of expertise in searching mammograms for breast masses,” Acad Radiol, 3, 1000–1006 (1996).

27. Drew, T. et al., “Scanners and drillers: Characterizing expert visual search through volumetric images,” Journal of Vision (2013).

28. Greene, M.R. & Oliva, A., “The briefest of glances: the time course of natural scene understanding,” Psychol Sci, 20, 464–472 (2009).

29. Li, F.F., VanRullen, R., Koch, C. & Perona, P., “Rapid natural scene categorization in the near absence of attention,” Proc Natl Acad Sci U S A, 99, 9596–9601 (2002).

30. Evans, K.K. & Treisman, A., “Perception of objects in natural scenes: is it really attention free?,” J Exp Psychol Hum Percept Perform, 31, 1476–1492 (2005).

31. Kundel, H.L. & Nodine, C.F., “Interpreting chest radiographs without visual search,” Radiology, 116, 527–532 (1975).

32. Evans, K., Georgian-Smith, D., Tambouret, R., Birdwell, R. & Wolfe, J.M., “The gist of the abnormal: Above-chance medical decision making in the blink of an eye,” Psychonomic Bulletin & Review, 1–6 (2013).

33. Colquhoun, W.P. & Baddeley, A.D., “Influence of signal probability during pretraining on vigilance decrement,” J Exp Psychol, 73, 153–155 (1967).

34. Wolfe, J.M., Horowitz, T.S. & Kenner, N.M., “Rare items often missed in visual searches,” Nature, 435, 439–440 (2005).

35. Evans, K.K., Birdwell, R.L. & Wolfe, J.M., “If You Don’t Find It Often, You Often Don’t Find It: Why Some Cancers Are Missed in Breast Cancer Screening,” PLoS ONE, 8(5) (2013).

36. Evans, K.K., Tambouret, R., Wilbur, D.C., Evered, A. & Wolfe, J.M., “Prevalence of Abnormalities Influences Cytologists’ Error Rates in Screening for Cervical Cancer,” Archives of Pathology & Laboratory Medicine, 135, 1557–1560 (2011).

37. Wolfe, J.M., Brunelli, D.N., Rubinstein, J. & Horowitz, T.S., “Prevalence effects in newly trained airport checkpoint screeners: Trained observers miss rare targets, too,” Journal of Vision, 13 (2013).

38. Warren Burhenne, L.J. et al., “Potential contribution of computer-aided detection to the sensitivity of screening mammography,” Radiology, 215, 554–562 (2000).

39. Doi, K., “Computer-aided diagnosis in medical imaging: Historical review, current status and future potential,” Comput. Med. Imaging Graph, 31, 198–211 (2007).

40. Nishikawa, R.M. et al., “Clinically missed cancer: how effectively can radiologists use computer-aided detection?,” AJR Am J Roentgenol, 198, 708–716 (2012).

41. Drew, T., Cunningham, C.A. & Wolfe, J.M., “When and Why Might a Computer-aided Detection (CAD) System Interfere with Visual Search? An Eye-tracking Study,” Acad Radiol, 19, 1260–1267 (2012).

42. Berbaum, K.S. et al., “Satisfaction of search in diagnostic radiology,” Invest Radiol, 25, 133–140 (1990).

43. Nodine, C.F., Krupinski, E.A., Kundel, H.L., Toto, L. & Herman, G.T., “Satisfaction of search (SOS),” Invest Radiol, 27, 571–573 (1992).

44. Berbaum, K.S. et al., “Satisfaction of Search from Detection of Pulmonary Nodules in Computed Tomography of the Chest,” Academic Radiology, 20, 194–201 (2013).

45. Cain, M.S., Biggs, A.T., Darling, E.F. & Mitroff, S.R., “A little bit of history repeating: Splitting up multiple-target visual searches decreases second-target miss errors,” JEP:Applied (2014).
© (2014) COPYRIGHT Society of Photo-Optical Instrumentation Engineers (SPIE). Downloading of the abstract is permitted for personal use only.
Jeremy M. Wolfe "Visual search from lab to clinic and back", Proc. SPIE 9037, Medical Imaging 2014: Image Perception, Observer Performance, and Technology Assessment, 903702 (11 March 2014);
