Medical Image Perception & Performance
Visual Search Workshop SPIE 1997
Visual Search : Mechanisms & Issues In Radiology
1) Reader Error Studies
Why study visual search? Studies on mass screening accuracy after WWII showed large inter- & intra-observer variations. Of the 4 techniques studied, could not show that any one yielded better diagnostic accuracy because of the large variations. The magnitude of these errors had not previously been appreciated.
2) CRN Program
Because of unexpected results in first study, it was repeated (CRN = initials of the radiologists). Got the same results so repeated trying to use only descriptors rather than diagnostic categories & still got wide variations. Not just a matter of semantics.
3) Figure from Garland (1949).
The scientific evaluation of diagnostic procedures. Radiology 52:309-327. Variations in descriptions of roentgen shadows. Corners refer to apparent characteristics of roentgen shadow. Apex = uniform density, left corner = mottling, right corner = honeycombed shadow. Shows individual tendencies of readers to describe shadows in different terms. Reader R (filled in circles) tends to see lesions as mottled or spotty, reader N as more uniform, and reader C as more honeycombed.
4) GMZ Studies
Repeated the study again trying to break down diagnostic process into 3 phases: see, describe & evaluate the lesion. In all 3 categories still got wide variations. These results tend to suggest that differences may lie in perceptual & cognitive factors rather than looking for explanations like bad technique, poor processing etc.
5) Reasons For Errors
From Garland (1959). Again suggesting it is worth looking at perceptual & cognitive factors associated with diagnosing radiographic images.
6) Slide of hidden figure.
7) More Reasons Tuddenham (1960)
Gestalt psychology provided some interesting possibilities for failures of perception. Most prominent being the concept of figure-ground relations. In radiology the normal appearance of the image generally becomes the background against which the figure abnormality stands out. Subtlety of the lesion, camouflage of lesion by normal features etc. can disrupt figure ground organization, making the figure harder to see.
8) Old/young woman figure. Once you organize the figure-ground one way it can be difficult to switch. The same can happen in radiology.
9) Fox in trees figure. Example of how camouflaging of certain features/figures forces you to organize field one way, making it difficult to see the hidden figures.
10) Illusory triangle figure. Certain arrangements of features can cause you to perceive things that are not there. Same with FPs in radiology.
11) More Reasons
One of the first papers to suggest looking specifically at visual search processes. Looked at Gestalt principles etc. plus read a book by Buswell (1935) "How people look at pictures" which examined scanning patterns while looking at general pictures (artworks). If can do the same thing in radiology, maybe some answers can be found to see why errors are made. Possibly the first person to suggest the SOS phenomenon.
12) More Suggestions
These suggestions come from a number of investigators at the time of the error studies. Many of them are still researched today. Dual : Yerushalmy (1950), Hessel et al. (1978), Beam et al. (1996). Flow charts : Tuddenham (1968, 1969), many radiology textbooks. Checklists : Getty et al. (1988). Heuristics : Wackenheim (1984, 1986, 1987). Directed scanning : Tuddenham (1962). Within directed scanning could probably put CAD & other aids that point things out to radiologists.
13) Why Visual Search?
Why concentrate on visual search? Could focus only on improving the image - better film, processing, computer displays etc. However, the radiologist is still the final answer - the one that has to read the images and reach a diagnostic decision. If we can find ways to understand the processes involved in perception & decision making, maybe we can improve error rates.
14) Figure of rod & cone distributions. There are a number of factors that make it necessary for humans to search images. Physiologically we are limited by the fact that high-resolution foveal vision has a limited range. The fovea subtends an angle of only about 2 deg (size of a dime at arm's length). To see fine details the stimulus information must fall on the cones. Rods have very limited fine detail ability.
15) Visual Acuity fall-off figure. Visual acuity is greatest in the fovea region of the retina. By 5 deg from center of fovea acuity is down by about 70%.
16) Visual acuity on a chest image. Kundel (1975). Concentric circles show fall-off in acuity with fovea at center of the chest (14" x 17" image).
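The visual angles quoted above can be turned into distances on the film with basic trigonometry. A minimal sketch: the 60 cm viewing distance and the function name are assumptions for illustration, not values taken from the cited studies.

```python
import math

def subtended_size_cm(angle_deg, distance_cm):
    """Linear extent (cm) covered by a visual angle at a given viewing distance."""
    return 2.0 * distance_cm * math.tan(math.radians(angle_deg) / 2.0)

VIEWING_DISTANCE_CM = 60.0      # assumed, roughly arm's length
FILM_WIDTH_CM = 14 * 2.54       # 14 in side of a 14" x 17" chest film

fovea_cm = subtended_size_cm(2.0, VIEWING_DISTANCE_CM)    # roughly 2 cm
fovea_share_of_width = fovea_cm / FILM_WIDTH_CM           # roughly 6% of the film width
```

Under these assumptions a single foveal fixation covers only a small fraction of the film width, which is one way to see why the image has to be searched.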
17) T-Scope Studies
Exactly what can be seen when we restrict visual search? These studies show 1) how important visual search is & why it should be studied, & 2) how much can actually be perceived in a very brief flash of information (200 msec). In a brief flash a number of things can actually be seen in an image. Kundel et al. with chest images, Mugglestone et al. with mammograms. Similar findings in both types of images.
18) Figure from Kundel & Nodine (1975). Shows ROC curves for flash vs free search on chest radiographs. d' about 2.5 for free search vs. about 1.2 for flash presentation.
19) Figure from Mugglestone et al. (1996). Shows ROC curves for flash vs free viewing of mammograms. Az about 0.52 for flash, 0.68 for free search.
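The two figures above report different indices (d' and Az). A hedged sketch of how they relate, assuming the equal-variance normal signal detection model (an assumption made here, not something stated for either study). The operating points are hypothetical, not data from the cited papers.

```python
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # mean 0, standard deviation 1

def d_prime(hit_rate, false_alarm_rate):
    """Equal-variance detectability index: z(hit rate) - z(false alarm rate)."""
    return _STD_NORMAL.inv_cdf(hit_rate) - _STD_NORMAL.inv_cdf(false_alarm_rate)

def az_from_d_prime(dp):
    """ROC area implied by d' under the equal-variance normal model."""
    return _STD_NORMAL.cdf(dp / 2 ** 0.5)

# Hypothetical operating points, chosen only to illustrate the calculation.
free_search_dp = d_prime(0.90, 0.10)   # about 2.56
flash_dp = d_prime(0.70, 0.30)         # about 1.05
```

Rates of exactly 0 or 1 have no finite z-score, so `inv_cdf` raises an error for them. In practice some correction is needed before computing d' from raw counts.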
20) Figure from Carmody et al. (1980). 360 msec flash presentation of chest image with nodules at various displacements from the center. Progressive drop-off in probability of detection as eccentricity increases. Can find a lot in a flash, but peripheral vision (& corresponding drop in acuity) comes into play very quickly.
21) Peripheral Vision
What about peripheral vision & detection of radiographic lesions? T-scope studies show can detect some lesions peripherally - how much of peripheral vision is used during visual search? Kundel et al. developed an interactive display that either made nodule visible only when foveally hit or only in periphery using various window sizes. Too small of a window degrades performance & it takes longer to hit the nodule. Beyond a 5 deg window get no more improvement. Seems as if we have a useful visual field of about 5 deg which incorporates foveal & some parafoveal vision & we move this around a display during search. Probably want to include parafoveal because need to differentiate normal from abnormal & is more efficient if both are included in the useful field.
22) Kundel et al. (1991) figure. Shows drop off in time to hit nodule with different window sizes. By 5 deg no change.
23) Kundel et al. (1991) figure. Schematic of probability of detecting an inconspicuous nodule during a single fixation as a function of distance from axis of gaze. Cylinder shows roughly the useful visual field.
24) More on Complexity
In radiology can differentiate between random & structured noise. Both affect detection, but structured noise may more directly affect visual search. Revesz et al. (1974, 1977), Kundel (1975), Kundel & Revesz (1976), Kundel et al. (1984, 1985). A number of studies looking at effect of structured noise on probability of nodule detection. Over the years refined definition of nodule conspicuity as it relates to the background complexity. In general, as level of structured noise is increased, probability of detection decreases.
25) Figure from Kundel & Revesz (1976). Shows probability of detecting a nodule in a chest image vs conspicuity.
26) Figure from Kundel et al. (1979). Shows how nodule properties affect detection probability. Detection is therefore a combination of figure (nodule) properties & background (chest anatomy) properties.
27) Other Factors Affecting Detection
There are many other factors affecting lesion detection & hence visual search. Just a few are mentioned here.
28) 1st Visual Search Experiment
Probably the first visual search experiment in radiology. Not a very precise study, but many of the findings have been substantiated in later visual search studies.
29) 1st Eye-Position Study
Probably the first eye-position recording study in radiology. Noted that there are 2 stages in perception of detail : 1) object must be focused on fovea (hence need for search) & 2) information must be translated by brain to meaning. Experiment done with 5 readers just entering training. 30 sec search time.
30) Why Search Only Certain Areas?
Why do readers only fixate certain areas? Kundel divided chest into 64 squares & 1) asked radiologists & lay people to create 8 basic areas & rate them as having highest to lowest information content, 2) recorded eye position to see if fixations correlate with subjective estimates of information content. Correlations were fairly high. Differences between radiologists & lay people due to knowledge about nodule probabilities & general experience.
31) Specific Aspects of Search
A number of studies have followed these early studies, looking into more and more specific aspects of the search process. Eye-position recording techniques have been very useful tools. Some of the more important aspects will be discussed in detail.
32) Dwell & Decisions
These are general dwell times associated with the 4 possible decision categories (TP, FN, FP & TN) as derived from a variety of eye-position recording studies (Kundel, Nodine, Krupinski, Hu, Gale, Barrett, Papin, deValk) with 3 types of images. Of interest is to note the similarity of findings across image types. In general, TP & FP decisions are associated with the longest dwell times, suggesting that extensive information processing is associated with reaching these decisions. TN decisions are typically associated with the shortest dwell times suggesting little information processing is needed to reach these decisions. FN decisions have intermediate dwell times, suggesting that at some level some feature extraction/recognition is taking place, but that observers either actively reject these areas (not enough information?) or do not recognize the features as belonging to an abnormality.
33-35) Survival curve figures for chest, bone & mammography. Survival analysis has proven to be a useful tool for characterizing the distributions of fixation dwells associated with the various decisions.
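As a rough illustration of what such a survival curve represents: for each time point, the fraction of fixation clusters whose cumulative dwell exceeds that time. This is a minimal empirical sketch with no censoring, and the dwell values are hypothetical, not study data.

```python
def survival_curve(dwells_ms, time_points_ms):
    """Fraction of fixation clusters whose cumulative dwell exceeds each time point."""
    n = len(dwells_ms)
    if n == 0:
        raise ValueError("need at least one dwell time")
    return [sum(1 for d in dwells_ms if d > t) / n for t in time_points_ms]

# Hypothetical cumulative dwells (ms) for two decision categories.
tp_dwells = [900, 1500, 2200, 3100, 4000]
tn_dwells = [150, 300, 450, 700, 1200]
times = [0, 500, 1000, 2000]

tp_curve = survival_curve(tp_dwells, times)
tn_curve = survival_curve(tn_dwells, times)
```

With values like these the TN curve drops off much faster than the TP curve, which is the pattern described under Dwell & Decisions.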
36) Error Classification
One of the goals of studying visual search is to try and determine why errors are made. Kundel et al. used eye-position recording and decision dwell times to devise an error classification system. 3 types of errors were classified based on dwell : scanning, recognition & decision. The cut offs for these decisions may of course vary for different images & tasks.
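A hedged sketch of how a dwell-based classification of missed lesions might look. The mapping used here (no dwell = scanning, short dwell = recognition, prolonged dwell = decision) is one reading of the scheme, and the 1000 ms cutoff is a hypothetical placeholder, since the cut offs vary with image & task.

```python
def classify_miss(cumulative_dwell_ms, recognition_cutoff_ms=1000.0):
    """Label a missed lesion by the cumulative gaze dwell it received.

    recognition_cutoff_ms is a hypothetical placeholder, not a published value.
    """
    if cumulative_dwell_ms <= 0:
        return "scanning"       # lesion never fell within the fixated region
    if cumulative_dwell_ms < recognition_cutoff_ms:
        return "recognition"    # fixated briefly, features not recognized
    return "decision"           # prolonged dwell, but lesion not reported

# Hypothetical dwell values (ms) for three missed lesions.
labels = [classify_miss(d) for d in (0, 400, 2500)]
```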
37) Scanning Patterns
It was obvious from the early studies that search patterns differed among observers. Is there a way to characterize scanning patterns? General pattern types have been observed. They depend on the task & image to a very large degree. Kundel & Wright (1969) told readers to either do a general search or nodule search & saw distinct differences - general look at most areas, nodule look at lungs much more so. Image type affects pattern as well - anatomic layout guides search to some extent - in chest see more circumferential, in bone more circular & radial.
38) Figure from Kundel & Wright (1969). Shows 3 basic search patterns on chests. In nodule search get the circumferential pattern mostly & in general search get the complex pattern mostly.
39-41) Figures of general scanning patterns for chest, bone & mammography images.
42) Comparison Scans
A number of texts/articles do suggest that radiologists adopt specific search strategies. Do they? One common recommendation is to use comparison scans for reading chest images. Carmody et al. (1984) found that radiologists are indeed taught (& do teach others) to use comparison scans, but that in practice they rarely do. Most patterns are circumferential or complex & rarely integrate left-right comparisons.
43) Figure from Carmody et al. (1981). One thing that does promote comparison scans, however, is nodule visibility. As visibility decreases the probability of making a comparison scan increases. If not sure of what you are seeing need to check out similar areas to try and resolve the ambiguities.
44) Comparisons & Task
Task affects probability of comparisons. In mammography find many more comparison scans because you have more images and need to confirm presence of lesions on different views. Beard et al. looked at comparison scan rates to devise workstation for mammography and determine which pairs of images should be pre-stored for easy access.
45) Circadian Variations
Does time of day affect search? To some degree yes. The period right after lunch differs from morning & evening.
46) Search & Viewing Time
Up to this point most of the studies discussed have limited search time to some extent. Others have looked at the entire search process & tried to characterize viewing time in general. One important finding that corresponds nicely with the T-scope studies is that some nodules can be found very quickly (in a glance) and some take longer to find. This suggests that there are 2 components to search - a fast & slow one. Some lesions need very little search - can be found quickly & very often peripherally. Some require search & take more time. This leads to various models of the search process.
47) Figure from Christensen et al. (1981). Viewing time differs for staff vs residents. Time at which TPs & FPs made also differ. Staff tend to make more TPs early in search, residents more spread out into later search. Staff & residents make FPs throughout search, but staff tend to stop search before FP rate gets too high compared to TPs.
48) Figure from Christensen et al. (1981). Shows how search has a rapid & slow component.
49) Figure from Oestmann et al. (1988). Search time is affected by lesion subtlety. The more obvious the lesion the more likely it is to be detected at a variety of viewing times (0.25, 1, 4 sec & unlimited).
50) Modelling Search
Much of the search data has led to development of various models. Can approach from a number of ways (cognitive, perceptual, mathematical) & use for a number of purposes (general visual models, task oriented - WS design).
51) Figure from Kundel et al. (1987). Kundel et al. compared human search to two models - random search & systematic search (covers whole image systematically). Found that human search is much more like a random search process, but is not totally random. So what is it?
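To make the two reference models concrete, a simplified sketch of image coverage under each. This is not the analysis used by Kundel et al. It assumes the image is divided into equal regions, each fixation covers exactly one region, and random fixations are independent & uniform. All of these are assumptions for illustration.

```python
def systematic_coverage(n_fixations, n_regions):
    """Fraction of regions visited when every fixation lands on a new region."""
    return min(n_fixations, n_regions) / n_regions

def random_coverage(n_fixations, n_regions):
    """Expected fraction visited when each fixation picks a region uniformly at random."""
    return 1.0 - (1.0 - 1.0 / n_regions) ** n_fixations

# Arbitrary example: 64 regions, 64 fixations.
full = systematic_coverage(64, 64)      # systematic search has covered everything
partial = random_coverage(64, 64)       # random search is expected to leave gaps
```

Under these assumptions random search revisits regions and covers the image more slowly than systematic search for the same number of fixations.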
52) Figure of Neisser's general search model from psychology. Shows interaction between the stimulus to be searched, exploration processes & a general cognitive schema that helps guide search & is modified by search. Many later models incorporate some general aspects of this model. Swensson used this basic model to suggest a basic 2-stage model of search : perceptual recognition & decision-making.
53) Figure of Blesser's model for the radiologic process. Breaks the process down into psychophysical, psychological & nosological parts. Visual search falls into psychological & nosological parts. Parts are not sequential because can get feedback. This is a very basic model but does represent 3 very important aspects of the radiologic process in general. The psychological phase does bring in some aspects discussed in early papers - Gestalt principles, gaining meaning out of patterns etc.
54) Early model by Nodine & Kundel (1987). Specifically on the search & decision process. Global : peripheral vision has a big role, immediate impression, overall characteristics perceived. Discovery : systematic sampling with useful visual field, if find lesion use Foveal Verification to identify & confirm or can return to it after subsequent search & then decide (Reflective Search). If for some reason time is limited may use Post-Search Recall to identify lesions. After all this reach a decision. Can alternate between global & discovery & verification search. Prior knowledge obviously affects Global part but also impacts in all other areas & other areas add to base of knowledge as more images are searched.
55) Figure from Kundel & Nodine (1983). Shows how 1) prior knowledge & 2) recognition of features leads to increase in fixation time (foveal verification) on image features.
56) Figure from Nodine et al. (1987). A more advanced model of the search process. Still incorporates the same basic features - a global phase of basic characteristic recognition, scanning processes & focal attention to specific details to get a plausible perception interpretation, and a final decision phase. Again is a cycle not a serial process.
57) Figure from Gale et al. (1994). Similar model of search specifically for mammography, incorporating same basic global & focal aspects.
58) Influence of Experience
All of the models have had a component called experience, prior knowledge etc. & implied that it has some effect not only on diagnostic accuracy/performance, but on search as well. Listed are some aspects of the search process where experience or prior knowledge has some effect on search. These findings have been seen for chest, bone, mammography & dental images. Possible to train people to adopt patterns (Gale et al. 1983), but it takes a lot of training & a good deal of feedback so it can become automatic.
59) Figure from Krupinski (1996). Scanning patterns of experienced vs inexperienced reader on a mammogram. Experienced find lesions earlier & tend to spend less time searching lesion-free areas. Could be that global view is much more efficient at identifying areas that are suspicious vs those that are not.
60) Figure from Nodine et al. (1996). In mammography, experienced readers fixate nodules earlier than less experienced & then fixate them in another view (i.e., CC vs MLO or right vs left) much more quickly than less experienced. Again, prior knowledge & experience guide search patterns - know to confirm or validate initially detected lesion in the other view.
61) History & Prompting
Another aspect of prior knowledge is clinical history or some other type of prompting. Especially now with advances in CAD schemes, the effects of prompting on search & decision-making need to be examined. Kundel & Wright (1969) already showed that prompting for a given task (general vs nodule search) does influence scanning patterns. What else does it affect?
62) Global vs Segmented
This study looked at prompting in the sense that it limited search to only specific segments of a chest image, so all observer had to do really was decide nodule present/absent. Basically performance deteriorated in segmented search because TPs were the same as with whole viewing but FPs went up. Possibly 1) need whole image available for global perception of general characteristics of the film & 2) perhaps need some amount of comparison scans between segments. Not exactly prompting, but might suggest that limiting search could be detrimental.
63) Figure from Carmody et al. (1980). Graph showing effect of global vs segmented search for staff vs residents.
64) "Superiority of Search"
A debate arose as to whether prompting/history (limiting search) was an effective means of improving perception. Swensson et al. took the position that there was a "superiority of search" - prompting did not improve performance, it merely resulted in a criterion shift that gave more TPs & more FPs.
65) "Superiority of Search"
Swensson et al. carried out a number of experiments (various chest lesions) limiting search by giving various types of histories - broad vs specific vs extremely specific. Overall, it was found that free search generally is the same as search with history and was often better (FPs would increase without search). Different cases were used, different analyses etc. but the result was always the same. However, other groups showed that history does help & argued about methodology 1) how the areas were chosen for non-search evaluation & 2) how these areas were given default negative ratings for the search condition, possibly biasing the results.
66) Figure from Swensson et al. (1982). One of many similar figures showing the "superiority of search" result - free search vs focused search (broad area defined) vs non-search (specific location & description given).
67) History Helps
A lot of other studies were showing that providing history does indeed improve overall performance - TP increases without FP increase.
68) Berbaum et al. Studies
Main opponent of the Swensson et al. results. Berbaum et al. studies have generally shown that history does improve performance. Used a variety of chest lesions & found that history helped in all cases but not too much for nodules. TP increased with history while FP did not change. Concluded that history actually results in an improvement in perception, not just a criterion shift or decision-making. Why not nodules? Attentional & perceptual processes may differ for nodules. Nodules may be distorted more by interactions with the background, so typically small, round nodules may not look so small & round when they interact with background anatomy (camouflaged). If history primes the reader for specific nodule features & those features are camouflaged, then detection/recognition may be less likely to take place.
69) Figure from Berbaum et al. (1986). ROC curves for prompted & unprompted lesions - diverse & nodules. Prompting improves performance for diverse lesions, not really for nodules.
70) Berbaum et al. Studies
A number of other Berbaum et al. studies have confirmed the positive influence of history, of films etc. on detection of lesions in a variety of film types.
71-72) Figures from Hutt et al. (1994) & Mugglestone et al. (1996). Demonstrate effects of history/prompts in mammographic images (CAD). Mugglestone results show one possible negative result from prompting in mammography - readers tend to make fewer comparisons between images with prompting.
73) History & Detection Times
Berbaum et al. looked at effect of history on time course of detection (also an SOS study, dealt with later). Used native lesions &/or nodules with native or metastatic disease history or no history. Giving a history for a native lesion with & without distracter nodule present improves accuracy. When a distracter is present with a distracter history do get decrease in detection of native lesion (SOS). No history in general worse than history. History for both nodule & native increases the speed with which the prompted lesion is found. With appropriate history the prompted lesion is found prior to detection of other lesions.
74) Figure from Berbaum et al. (1993). Detection vs seconds of search for normal (left) & abnormal images (right). Normal = no native lesion. A & B : no nodules, native history. C & D : nodules, native history = native found prior to nodule. E & F : nodules, mets history = nodules found prior to native & native lower than when no nodules present.
75) Perceptual Feedback
Perceptual feedback (using eye-position dwell data to feed back areas of prolonged dwell) can be seen as a form of prompting. In this case perceptual feedback resulted in a 16% improvement (increase in TP, decrease in FP) in performance compared to a second look without feedback. Initial view while eye-position recorded provides areas for feedback. Second view seen with or without feedback.
76) Figure of chest image with feedback circles superimposed.
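A hedged sketch of how areas of prolonged dwell might be picked out of eye-position data for feedback. It is not the procedure used in the study. The greedy clustering, the radius & the dwell threshold are all hypothetical choices, and fixation durations are assumed to be positive.

```python
import math

def prolonged_dwell_regions(fixations, radius, min_dwell_ms):
    """Group fixations (x, y, duration_ms) into clusters and keep long-dwell ones.

    Returns (centroid_x, centroid_y, total_dwell_ms) for each retained cluster.
    """
    clusters = []  # each entry: [sum of x*dur, sum of y*dur, total dwell]
    for x, y, dur in fixations:
        for c in clusters:
            cx, cy = c[0] / c[2], c[1] / c[2]   # dwell-weighted centroid so far
            if math.hypot(x - cx, y - cy) <= radius:
                c[0] += x * dur
                c[1] += y * dur
                c[2] += dur
                break
        else:
            clusters.append([x * dur, y * dur, dur])
    return [(c[0] / c[2], c[1] / c[2], c[2]) for c in clusters if c[2] >= min_dwell_ms]

# Hypothetical fixations in pixel coordinates with durations in ms.
fixations = [(100, 100, 300), (104, 98, 500), (400, 300, 200), (102, 101, 400)]
regions = prolonged_dwell_regions(fixations, radius=20, min_dwell_ms=1000)
```

Each returned region could then be drawn as a circle over the image for the second view.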
77) Why do prompts work to improve performance? Probably do affect basic perceptual & attentional mechanisms. Later studies by Krupinski et al. (1993) & Kundel et al. (1995) showed that prompts do affect search parameters such as likelihood of fixating lesions & how many times. Possibly prompts act as some sort of fiducial marker, telling eyes where to fixate in reference to where just looked - keep the eyes & attention in the appropriate place, increasing likelihood of fixating lesion.
78) Satisfaction of Search
Tuddenham noted in 60s that SOS - breaking off search once quest for meaning has been satisfied - could be one source of errors. Have we made any progress in documenting & explaining SOS?
79) Does it Exist?
Yes, SOS exists. Berbaum et al. have done most of the studies. For fractures & chest images SOS results in a decrease in TPs but not FPs. In contrast studies of the abdomen get decrease in FPs as well. Possibly different mechanisms operate for different images/tasks. In these cases (chest) nodule was more conspicuous than subtle native lesions.
80) Figure from Berbaum et al. (1990). Effect of nodules present on detection of native lesion (left). Right : FROC curves showing same effect.
81) Time Course of SOS
Early experiment to try and determine why SOS occurs - what is its effect on search time? Used an interruption technique to study - observer stops search each time abnormality detected, reports while image gone, then can resume search. Time to native (25.5 vs 23.9 sec with & without nodule) or nodule (18.2 vs 17.a sec with & without native) not affected by presence of the other. Will find nodules earlier in search than natives. FPs found later in search than nodules or natives (around 33 sec). Total search about 45 sec in all cases. Conclusion : is not early termination of search. More like stopping search before FPs get too high (as in Christensen). What is it? We have feature detectors which are not independent of each other & which are mediated by attention. If are activated by one set of features (nodules) this may inhibit or mediate attention for other features (native lesion). It makes it less likely that attentional & perceptual processing resources will be available to allocate to the native lesion, so it remains undetected. Is a perceptual set or bias (goes back to Gestalt from earlier - seeing figure-ground one way inhibits seeing it another way).
82) Figure from Berbaum et al. (1991). Time course of SOS as described above.
83) Figure from Berbaum et al. (1991). Shows the same effect in terms of decision confidence. 1 = definite 4 = possible but probably normal.
84) SOS & Eye-Position
Eye-position recording may provide a more accurate measure of why SOS occurs.
85) Figure from Samuel et al. (1995). Native lesions were more obvious than nodules so found a reverse SOS effect here - native affects detection of nodules. If image has no native lesion nodules detected more than if native present (SOS) - left. Native lesions & secondary subtle lesion features not affected by nodules (right).
86) Figure from Samuel et al. (1995). Dwell on FNs (nodules) greater on native-free than native lesions. Suggests native does detract from fixating nodules to some degree but FNs are fixated so not as if lesions were not fixated at all.
87) More Results
Readers had 30 total but could terminate search early if chose to. In general terminated earlier for single native lesions than for multiple natives or those with nodules - SOS not really due to early halt if nodule image search lasted longer than native search. When there was premature halt there were FNs. But FNs were fixated just as often as TN areas. Based on eye-position data errors (n = 37 in native, 27 in native-free) could be classified. Get fewer scanning errors when native present & more decision errors in native free. Not due to lack of fixating.
88) Obvious native lesions detected early in search decreasing attention & perceptual resources later on (capture of attention). Missed lesions are still fixated but not as much as when native-free. More salient features satisfy quest for meaning & may search around more but perceptual set makes feature activation less likely for other types of features contributing to early termination.
89) Can We Disregard Inadequate Scanning?
Samuel et al. showed that for nodules they are scanned but inadequately and recognition does not take place. Berbaum et al. suggest contrast studies of abdomen may have a different SOS explanation since in these SOS studies saw a decrease in FPs as well as TPs. Recorded eye-position on contrast studies of the abdomen (plain-film then with contrast). SOS = see lesion on plain film then miss in contrast study. Without PF abnormality time on contrast > on plain (97 vs 82). With PF abnormality time about same (100 vs 95). Therefore not premature halting of search. Basically, on contrast film spend much more time on contrast areas & less on the plain film areas than spent looking at plain film areas on the plain films. Lesions in either case that are detected get prolonged dwell, while lesions in either case that are missed get less dwell.
90) Error Types & Mechanism
Look at error types by dwell times - in contrast films get a huge increase in scanning errors to the plain film abnormality areas. This can be seen as a decision not to scan/visual neglect. Differs from nodule search where more FNs were actually fixated but not reported. Is a definite case of capture of attention.
91) Figure from Berbaum et al. 1996. Time course of SOS from eye-position. On right shows that still fixate PF area a little during search - right at beginning & right at end, but contrast area quickly grabs attention. Fixating on PF area later in search may not matter because readers tend to be reporting what's in the contrast area - may just be idling.
92) Workstation Design
We can study visual search in other areas as well. One area has been characterizing search patterns to design workstations. If we know how people scan films then we can devise pre-load patterns and decide how many monitors are really needed. Beard et al. (1991, 1997) conducted such a study. Comparison scans were important for deciding on pre-load patterns.
93) Assess Effects of Modality Differences
Can look at if & why scan patterns are different for different ways of displaying the same images (e.g., monitor vs film). Does the presence of a menu affect scanning? Are scan patterns & dwell times different? Do these factors affect diagnostic performance? Will they affect use of the new imaging modalities?
94) CAD & Visual Search
Goes back a little to effect of prompts, but is specific to CAD. Does CAD affect the way people search images? Yes it does. It may give readers a false sense of security. May be more harmful for less experienced readers who 1) may depend on CAD since their global perceptions are less efficient & 2) may not easily disregard the multiple FP areas prompted.
Reader Error & Variability