Two noise experiments were performed to determine the perception threshold of camera noise in dynamic sequences of medical low-dose X-ray images. The X-ray dose per image was at a fluoroscopic level of 1 μR/image. In the first experiment, an adjustable amount of white noise was mixed with the video output signal of a standard medical X-ray image-intensifier/video-camera imaging chain. The paper explains the difficulties with this experiment and how the second experiment was organized to obtain more reliable results on noise perception. The second experiment was based on software simulations of both quantum noise and camera noise. The perception threshold was determined with a two-alternative forced-choice (2AFC) method while the noisy image sequences were presented dynamically on a split-screen display. The perception threshold of the camera noise was as low as 27 dB below the signal level.
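To make the 27 dB figure concrete: if the ratio is expressed as 20·log10 of an amplitude ratio, as is usual for video signals (an assumption; the abstract does not state the convention), the threshold corresponds to a noise RMS of roughly 4.5 percent of the signal level. The sketch below converts dB to an amplitude ratio and adds white Gaussian noise at that level to a frame. `add_camera_noise` and its parameters are hypothetical and are not the authors' simulation code.

```python
import numpy as np

def db_to_amplitude_ratio(level_db):
    """Amplitude (RMS) ratio for a level in dB, using the 20*log10 convention."""
    return 10.0 ** (level_db / 20.0)

def add_camera_noise(frame, noise_db, signal_level, rng):
    """Add white Gaussian 'camera' noise whose RMS is noise_db relative to signal_level.

    Hypothetical helper: the abstract does not describe the simulation details.
    """
    sigma = signal_level * db_to_amplitude_ratio(noise_db)
    return frame + rng.normal(0.0, sigma, size=frame.shape)

# Usage: a flat test frame with noise at the reported threshold of -27 dB.
rng = np.random.default_rng(1)
frame = np.full((256, 256), 0.5)
noisy = add_camera_noise(frame, noise_db=-27.0, signal_level=1.0, rng=rng)
print(db_to_amplitude_ratio(-27.0))   # about 0.0447
print(np.std(noisy - frame))          # empirical noise RMS, close to the same value
```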
Investigation of human signal-detection performance for noise-limited tasks with statistically defined signal or image parameters represents a step towards clinical realism. However, the ideal observer procedure is then usually nonlinear, and analysis becomes mathematically intractable. Two linear but suboptimal observer models, the Hotelling observer and the non-prewhitening (NPW) matched filter, have been proposed for mathematical convenience. Experiments by Rolland and Barrett involving detection of signals in white noise superimposed on statistically defined backgrounds showed that the Hotelling model gave a good fit, whereas the simple NPW matched filter gave a poor fit. It will be shown that the NPW model can be modified to fit their data by adding a spatial frequency filter similar in shape to the human contrast sensitivity function. The best fit is obtained with an eye filter model E(f) = f^1.3 exp(-c f^2), with c selected to give a peak at 4 cycles per degree.
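The constant c follows from setting the derivative of ln E(f) to zero: 1.3/f - 2cf = 0, so c = 1.3/(2 f_peak^2), which gives c ≈ 0.0406 deg² for a peak at 4 cycles per degree. A short numerical check, assuming NumPy is available (the function names are illustrative):

```python
import numpy as np

def eye_filter(f, c, exponent=1.3):
    """Eye filter E(f) = f**exponent * exp(-c * f**2), with f in cycles per degree."""
    return f ** exponent * np.exp(-c * f ** 2)

def c_for_peak(f_peak, exponent=1.3):
    """Choose c so that E(f) peaks at f_peak.

    d(ln E)/df = exponent / f - 2 * c * f = 0  =>  c = exponent / (2 * f_peak**2)
    """
    return exponent / (2.0 * f_peak ** 2)

c = c_for_peak(4.0)                  # 1.3 / 32 = 0.040625 (degrees squared)
f = np.linspace(0.01, 20.0, 20001)   # frequency grid, cycles per degree
f_at_max = f[np.argmax(eye_filter(f, c))]
print(c, f_at_max)                   # numerical peak lands next to 4 cycles/degree
```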
We find that pulsed fluoroscopy of stationary objects gives average dose savings of 22 percent, 38 percent, and 49 percent at 15, 10, and 7.5 acq/sec, respectively. Dose savings depend on object size, with smaller savings for smaller objects. A model that includes the spatio-temporal response of the human visual system (HVS) describes these results with one free parameter. Pulsed fluoroscopy at 30 acq/sec (pulsed-30) is always better than continuous fluoroscopy for detecting moving objects. Likewise, pulsed-15 saves dose compared with pulsed-30, but the savings decrease for high velocities and small objects.
The purpose of this study was to enhance the ability of quantitative sonography to distinguish between B-scan images of malignant and benign breast lesions. Several second-order pixel gray-level statistics have been used to characterize breast lesions, with good but not yet acceptable diagnostic accuracy. This study therefore sought to optimize the diagnostic accuracy of second-order statistics. The co-occurrence matrix is the most useful second-order statistic studied so far. It is an estimate of the joint probability distribution of the gray levels of two pixels separated by a given distance and orientation. Several distances and orientations have been tried previously, but no systematic attempt had been made to find the optimum parameters for diagnosis. In this study, co-occurrence statistics of malignant and benign lesion images were determined as a function of distance and orientation. In particular, the correlation function was modeled as a separable exponential function, first order for increments in both the x and y directions. The model parameters were used as features for discriminating benign from malignant lesions. An attempt was made to optimize the features by excluding noisy data from the fit and again using the model parameters as features.
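The sketch below computes a normalized co-occurrence matrix for a displacement (dx, dy), the correlation statistic derived from it, and a first-order exponential decay fitted separately along x and y, in the spirit of the separable model described above. The 32-level quantization, the lag range, and the straight-line fit in log space are illustrative choices, not the study's procedure. The synthetic field only checks that the fit recovers known decay constants; no lesion data are involved.

```python
import numpy as np

def cooccurrence(img, dx, dy, levels):
    """Normalized co-occurrence matrix P[i, j] for pixel pairs offset by dx columns and dy rows.

    Assumes integer gray levels in [0, levels) and non-negative dx, dy.
    """
    img = np.asarray(img)
    h, w = img.shape
    ref = img[:h - dy, :w - dx]          # reference pixels
    nbr = img[dy:, dx:]                  # displaced partner pixels
    P = np.zeros((levels, levels))
    np.add.at(P, (ref.ravel(), nbr.ravel()), 1.0)
    return P / P.sum()

def glcm_correlation(P):
    """Correlation statistic of a co-occurrence matrix."""
    i, j = np.indices(P.shape)
    mi, mj = (i * P).sum(), (j * P).sum()
    si = np.sqrt(((i - mi) ** 2 * P).sum())
    sj = np.sqrt(((j - mj) ** 2 * P).sum())
    return ((i - mi) * (j - mj) * P).sum() / (si * sj)

def fit_decay(lags, corr):
    """Fit corr ~ A * exp(-k * lag) as a straight line in log space; return k."""
    slope, _intercept = np.polyfit(lags, np.log(corr), 1)
    return -slope

def separable_exponential_fit(img, levels, max_lag=4):
    """Decay constants (kx, ky) of a separable model rho = exp(-kx|dx|) * exp(-ky|dy|)."""
    lags = np.arange(1, max_lag + 1)
    rx = [glcm_correlation(cooccurrence(img, d, 0, levels)) for d in lags]
    ry = [glcm_correlation(cooccurrence(img, 0, d, levels)) for d in lags]
    return fit_decay(lags, rx), fit_decay(lags, ry)

# Synthetic check: a separable AR(1) field has correlation rho_x**|dx| * rho_y**|dy|.
rng = np.random.default_rng(0)
rho_x, rho_y = 0.8, 0.6
field = rng.standard_normal((320, 320))
for col in range(1, field.shape[1]):     # recursive filter along x
    field[:, col] += rho_x * field[:, col - 1]
for row in range(1, field.shape[0]):     # recursive filter along y
    field[row, :] += rho_y * field[row - 1, :]
field = field[32:, 32:]                  # drop the start-up transient
levels = 32
scaled = (field - field.min()) / (field.max() - field.min())
img = np.minimum((scaled * levels).astype(int), levels - 1)
kx, ky = separable_exponential_fit(img, levels)
print(kx, -np.log(rho_x), ky, -np.log(rho_y))   # fitted vs generating decay constants
```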
A new mathematical technique is described that makes it possible to find an exact analytical solution to the quadratic optimization problem of maximizing the detectability index d' for visual signal detection on a white-noise background. The observer model is assumed to have one channel with a 2D spatial sensitivity function A(x), subject to the constraint A(x) ≤ 1, and internal noise that is taken to be a stationary white-noise process. Two types of internal noise are considered: noise independent of, and noise dependent on, the input luminance distribution L(x). The technique is based on replacing A(x) by one or two specially defined nonnegative measures, depending on the type of internal noise. This substitution makes it possible to introduce a generalized operator describing the joint action of spatially transformed external (image) noise and internal noise, and thereby to obtain an analytical expression for the optimal function A(x), which provides the largest value of d'. The optimal A(x) is proved to depend not only on the signal to be detected but also on the ratio of internal to external noise variance.
We developed techniques to unobtrusively track the gaze direction and pupil diameter of radiologists reading a wide variety of films. This study concentrated on mammography, since mammography is important for the successful treatment of breast cancer, and we wished to determine how previous studies of eye gaze with lung films relate to the specialized field of mammography. Our objective was to identify eye-gaze patterns in mammographic experts as they observe features such as masses and microcalcification clusters; identifying these patterns could lead to an improved rate of early detection of breast cancer. Our near-infrared (IR) light system successfully tracked the eye-gaze direction and pupil diameter of mammographic experts evaluating films. The association of long eye-gaze dwells with diagnostic accuracy varied with the type of object being viewed. In films with masses, false-positive diagnoses were associated with long dwells, which is similar to published results for lung nodule diagnosis. In mammograms with microcalcifications, true-positive diagnoses were associated with long dwells. The association of prolonged dwells with true-positive diagnoses of microcalcifications is a new observation.
We have been monitoring the eye fixations of radiologists in order to identify why errors occur in the detection of pulmonary nodules. From these studies we have concluded that most errors are due not to overlooking nodules but to failing to report nodules that have been fixated, often for prolonged gaze durations. Observers visually attend to the nodule, perhaps even apply decision criteria, but falsely decide that it is not the target of search. Because omission errors are associated with clusters of eye fixations having prolonged gaze durations, visual dwell can be used to predict the locations of potentially missed nodules. A computer-assisted perception (CAP) algorithm has been developed to localize potential lung nodules by measuring the durations of clusters of eye fixations during search for nodules. In an experiment to test the effectiveness of CAP, observers' eye position was recorded during initial image reading. This was followed by CAP, which consisted of playing back highlighted locations of prolonged dwells that indicated sites of potentially missed nodules. The observers evaluated each CAP site for nodules and revised their initial decisions. For comparison, the same observers read the images a second time, after a two-month interval, without CAP, in a counterbalanced design. The CAP condition resulted in a 16 percent increase in detection performance (AFROC) compared with the non-CAP condition.
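The dwell-based localization step can be sketched as follows: fixations are grouped into spatial clusters, the fixation durations in each cluster are summed, and clusters whose cumulative dwell reaches a threshold are reported as candidate sites. The greedy clustering rule, the 2.5-unit radius, and the 1000 ms threshold are hypothetical choices for illustration; the abstract gives neither the clustering method nor the parameter values used by the CAP algorithm.

```python
import math
from dataclasses import dataclass

@dataclass
class Fixation:
    x: float            # horizontal position (e.g. degrees of visual angle)
    y: float            # vertical position
    duration_ms: float  # fixation duration

def _centroid(cluster):
    n = len(cluster)
    return (sum(f.x for f in cluster) / n, sum(f.y for f in cluster) / n)

def cluster_fixations(fixations, radius):
    """Greedy spatial clustering: a fixation joins the first cluster whose centroid lies within radius."""
    clusters = []
    for fix in fixations:
        for cluster in clusters:
            cx, cy = _centroid(cluster)
            if math.hypot(fix.x - cx, fix.y - cy) <= radius:
                cluster.append(fix)
                break
        else:
            clusters.append([fix])
    return clusters

def cap_sites(fixations, radius=2.5, dwell_threshold_ms=1000.0):
    """Return (x, y, dwell_ms) for clusters whose cumulative dwell reaches the threshold.

    radius and dwell_threshold_ms are hypothetical defaults, not values from the study.
    """
    sites = []
    for cluster in cluster_fixations(fixations, radius):
        dwell = sum(f.duration_ms for f in cluster)
        if dwell >= dwell_threshold_ms:
            cx, cy = _centroid(cluster)
            sites.append((cx, cy, dwell))
    return sites

scan = [Fixation(10.0, 10.0, 400), Fixation(0.0, 0.0, 300), Fixation(10.5, 9.5, 450),
        Fixation(20.0, 5.0, 300), Fixation(9.8, 10.3, 380), Fixation(20.4, 5.2, 250)]
print(cap_sites(scan))   # one site near (10.1, 9.9) with 1230 ms of dwell
```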
Perceptual feedback (by circling) of chest image areas receiving prolonged gaze duration significantly increases pulmonary nodule detection performance. Other methods of perceptual cueing do not lead to such dramatic increases in performance. The mechanisms by which circling influences detection performance were examined. The results of a number of experiments indicate that circling improves nodule detection because (1) the circle isolates the nodule-containing region from the rest of the image, making the disembedding and perceptual integration of nodule features more likely, (2) the circle insulates the region-of-interest from distracters in the chest anatomy outside of the circle boundary which tend to interfere with attention and detection processes, and (3) the circle increases the precision with which the eye fixates relevant nodule features within the region-of-interest, and decreases the dispersion of fixations within this area. The facilitative effects of circling thus seem to influence some basic visual processes. The results should be generalizable to other types of radiological search and detection tasks.
An overview of the UK breast screening program is presented, and the importance of observer performance to its success is demonstrated. A conceptual model is presented that leads to the consideration of diagnostic errors in three classes. Appropriate training can help reduce the occurrence of these errors. Data are presented from a national self-assessment program that aims to give radiologists insight into aspects of their performance. In particular, data on calcification detection and on the benefits of double reading are considered as means of improving cancer detection rates.
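As a rough illustration of why double reading can raise cancer detection rates, assume (a strong simplification, since real readers' errors are correlated) that two readers miss cancers independently and that a case is recalled if either reader flags it. The combined sensitivity is then 1 - (1 - s1)(1 - s2). The same rule also raises the false-positive recall rate, so any gain in sensitivity has to be weighed against the extra recalls. The operating points below are hypothetical, not data from the self-assessment program.

```python
def either_reader_recall(p1, p2):
    """Probability that at least one of two independent readers flags a case."""
    return 1.0 - (1.0 - p1) * (1.0 - p2)

# Hypothetical single-reader operating points (not data from the program).
sens1, sens2 = 0.80, 0.75      # sensitivity of each reader
fpr1, fpr2 = 0.05, 0.04        # false-positive (recall) rate on normal cases
combined_sensitivity = either_reader_recall(sens1, sens2)   # 1 - 0.20 * 0.25 = 0.95
combined_fpr = either_reader_recall(fpr1, fpr2)             # 1 - 0.95 * 0.96 = 0.088
print(combined_sensitivity, combined_fpr)
```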
Previous models of radiological skill have been primarily concerned with either visual search or the interpretation of abnormality. In breast cancer screening, both skills are important for the efficient assessment of screening mammograms. They can be accounted for in a double two-stage model of observer performance, incorporating detection and interpretation levels, each consisting of an analysis stage and a decision stage. The model is evaluated in terms of its ability to explain results from two training studies involving lay people learning to read mammograms. In the first study, a comparison of novice error rates with those of trainees with prior experience in mammography revealed that novices make more detection errors, whereas interpretation errors are more frequent in experienced trainees. The observed differences support the notion that there are two separate stages of processing in mammogram interpretation. Accurate visual processing may be a developmental precursor of full interpretation. The second study demonstrates that even specific instruction on abnormal features in mammograms leads to improved recognition of normality in complete novices. The present model predicts this because true positives require a more complex processing route. The advantages, for understanding radiological skill, of a process model that takes decision outcome into account are discussed.
We have studied the influence of the parameter gamma on the perception of digital X-ray images. Gamma is the exponent of the function describing how grey values are transformed into luminance. We applied several values of gamma to angiographic images and presented these images on a monitor (soft copy) and as films on a light box (hard copy). Perceived contrast and quality were assessed with category scaling techniques. Radiologists, technical experts (people involved in the development of X-ray systems), and nonexperts participated in the experiments. We found that perceived contrast increases with gamma. Gamma also affects perceived quality, but the effect depends on the image content. Furthermore, judgments differed slightly between groups of subjects. Technical experts preferred higher gamma values than nonexperts did. For radiologists, perceived quality depended on the image (and its diagnostic purpose) to a much greater extent than for nonexperts. No differences were found between the results for hard and soft copies.
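One way to see why contrast grows with gamma: for a power-law display L ∝ g^γ, a small grey-value step Δg at grey value g produces a relative luminance change ΔL/L ≈ γ·Δg/g, so physical contrast scales roughly linearly with gamma. The sketch assumes an ideal power law with no black-level offset and arbitrary values for maximum luminance and grey-value range. It addresses physical contrast only, not perceived quality, which the study found to be image-dependent.

```python
import numpy as np

def display_luminance(g, gamma, l_max=100.0, g_max=255.0):
    """Power-law display model L = l_max * (g / g_max) ** gamma (black level neglected)."""
    return l_max * (np.asarray(g, float) / g_max) ** gamma

def weber_contrast(g, dg, gamma):
    """Relative luminance change produced by a grey-value step dg at grey value g."""
    l0 = display_luminance(g, gamma)
    return (display_luminance(g + dg, gamma) - l0) / l0

gammas = [1.0, 1.5, 2.2, 3.0]
contrasts = [float(weber_contrast(128, 4, gam)) for gam in gammas]
print(contrasts)   # grows roughly as gamma * dg / g = gamma * 0.03125
```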
Following a large ROC study to assess the diagnostic accuracy of PA chest computed radiography (CR) images displayed in a variety of formats, we asked nine experienced radiologists to subjectively assess their acceptance of, and preferences for, display modes in primary diagnosis of erect PA chest images. Our results indicate that radiologists felt somewhat less comfortable interpreting CR images displayed on either laser-printed films or workstations than interpreting conventional films. The use of four minified images was thought to somewhat decrease diagnostic confidence and to increase interpretation time. Reverse-mode (black bone) images increased radiologists' confidence in the detection of soft-tissue abnormalities.
Attempts to implement digital chest imaging in the Neonatal Intensive Care Unit at Georgetown University have proven unsuccessful. This paper explores some of the reasons for this. Based on our experience, we also suggest how best to use the storage phosphor system, should one wish to implement it.
Judging the perceptual quality of processed images is a cognitive process in which the perceived impressions of basic image attributes, such as sharpness and presence of noise, play an important role. In this paper, we show how multidimensional scaling can be used to study the perceptual factors that influence the quality impression of computed tomography images processed by a noise reduction technique. We also show how intersubject differences in the assessment of image attributes can be characterized.
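Multidimensional scaling turns a matrix of judged dissimilarities between images into a low-dimensional configuration whose axes can then be interpreted as perceptual attributes such as sharpness or noise. The sketch below uses classical (Torgerson) metric MDS because it fits in a few lines of NumPy. The study may well have used a nonmetric or individual-differences variant (the latter is what characterizes intersubject differences), so treat this as an illustration of the idea only. The dissimilarities are synthetic.

```python
import numpy as np

def classical_mds(D, n_dims=2):
    """Classical (Torgerson) MDS: coordinates whose distances approximate D."""
    D = np.asarray(D, float)
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n        # centering matrix
    B = -0.5 * J @ (D ** 2) @ J                # double-centered squared distances
    eigvals, eigvecs = np.linalg.eigh(B)       # eigenvalues in ascending order
    top = np.argsort(eigvals)[::-1][:n_dims]   # keep the largest ones
    return eigvecs[:, top] * np.sqrt(np.maximum(eigvals[top], 0.0))

def pairwise_distances(X):
    diff = X[:, None, :] - X[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

# Hypothetical example: dissimilarities between 6 processed images, generated
# from known 2-D "attribute" coordinates (e.g. sharpness, noise).
rng = np.random.default_rng(3)
true_coords = rng.uniform(0.0, 10.0, size=(6, 2))
D = pairwise_distances(true_coords)
recovered = classical_mds(D)
print(np.abs(pairwise_distances(recovered) - D).max())   # close to 0: distances reproduced
```

The recovered configuration matches the original only up to rotation, reflection, and translation, which is why the check compares distances rather than coordinates.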
In this paper, we consider the nature of mental workload and how one might determine whether mental workload affects the potential accuracy of a particular medical image display method. We then detail observer experiments we have conducted evaluating electronic display of CT images. In presenting these results, we focus on the evidence we have accumulated suggesting that mental workload has a significant influence on workstation accuracy and interpretation speed. These results suggest that, when evaluating the effectiveness of a new display device or method, experiments involving interpretation tasks very similar to those in the clinic are needed in addition to conventional ROC analysis.
We are currently developing an easy-to-use, microcomputer-based software application to help researchers perform ROC studies. The software will have facilities for aiding the researcher in all phases of an ROC study, including experiment design, setting up and conducting test sessions, analyzing results and generating reports. The initial version of the software, named 'ROC Assistant', operates on Macintosh computers and enables the user to enter a case list, run test sessions and produce an ROC curve. We are in the process of developing enhanced versions which will incorporate functions for statistical analysis, experimental design and online help. In this paper we discuss the ROC methodology upon which the software is based as well as our software development effort to date.
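The core computation behind producing an ROC curve from rated cases can be sketched as follows: each distinct rating is used in turn as a decision threshold, and the area under the resulting empirical curve is obtained with the trapezoidal rule (equivalent to the Wilcoxon-Mann-Whitney statistic with ties counted as one half). This is a generic illustration, not code from ROC Assistant; the function names and rating data are hypothetical, and fitted (e.g. binormal) curves are not covered.

```python
import numpy as np

def empirical_roc(pos_ratings, neg_ratings):
    """Empirical ROC points from confidence ratings (higher = more likely abnormal)."""
    pos = np.asarray(pos_ratings, float)
    neg = np.asarray(neg_ratings, float)
    thresholds = np.unique(np.concatenate([pos, neg]))[::-1]   # strictest threshold first
    tpf = np.array([0.0] + [np.mean(pos >= t) for t in thresholds])
    fpf = np.array([0.0] + [np.mean(neg >= t) for t in thresholds])
    return fpf, tpf

def trapezoidal_auc(fpf, tpf):
    """Area under the empirical ROC curve by the trapezoidal rule."""
    return float(np.sum(np.diff(fpf) * (tpf[1:] + tpf[:-1]) / 2.0))

# Hypothetical 5-point ratings for three abnormal and three normal cases.
fpf, tpf = empirical_roc([5, 4, 3], [1, 2, 3])
print(trapezoidal_auc(fpf, tpf))   # 17/18, about 0.944
```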
The digital archiving system at Madigan Army Medical Center (MAMC) uses a 10:1 lossy data compression algorithm for most forms of computed radiography. A systematic study of the potential effect of lossy image compression on patient care has been initiated with a series of studies focused on specific diagnostic tasks. The studies are based on the receiver operating characteristic (ROC) method of analysis for diagnostic systems. The null hypothesis is that observer performance with approximately 10:1 compressed and decompressed images does not differ from that with the original, uncompressed images for detecting subtle pathologic findings on computed radiographs of bone, chest, or abdomen viewed on a high-resolution monitor. Our design involves collecting cases from eight pathologic categories. Truth is determined by committee, using confirmatory studies performed during routine clinical practice whenever possible. Software has been developed to aid in case collection and to allow the cases to be read on stand-alone Siemens Litebox workstations. Data are analyzed with two methods: ROC analysis and free-response ROC (FROC) analysis. This study will be one of the largest ROC/FROC studies of its kind and could benefit clinical radiology practice using PACS technology. The study design and results from a pilot FROC study are presented.
The ability of human observers to detect low-contrast targets in screen-film (SF) images, computed radiographic (CR) images, and compressed CR images was measured using contrast-detail (CD) analysis. The results of these studies were used to design a two-alternative forced-choice (2AFC) experiment to investigate the detectability of nodules in adult chest radiographs. CD curves for a common screen-film system were compared with those for CR images compressed at ratios up to 125:1. Data from clinical chest exams were used to define a CD region of clinical interest that sufficiently challenged the observer. From these data, simulated lesions were introduced into 100 normal CR chest films, and forced-choice observer performance studies were performed. CR images were compressed using a full-frame discrete cosine transform (FDCT) technique, in which the 2D Fourier space was divided into four areas of different quantization depending on the cumulative power spectrum (energy) of each image. The characteristic curve of the CR images was adjusted so that optical densities matched those of the SF system. The CD curves for the SF and uncompressed CR systems were statistically equivalent. The slope of the CD curve for each was -1.0, as predicted by the Rose model. There was a significant degradation in detection for CR images compressed to 125:1. Furthermore, contrast-detail analysis demonstrated that many pulmonary nodules encountered in clinical practice are significantly above the average observer threshold for detection. We designed a 2AFC observer study using simulated 1-cm lesions introduced into normal CR chest radiographs. Detectability was reduced for all compressed CR radiographs.
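The slope of -1.0 follows directly from the Rose model. For a disk of diameter d and contrast C imaged with a quantum fluence N per unit area, SNR = C·sqrt(N·πd²/4). At a fixed threshold SNR k, the threshold contrast is therefore C_T = k / sqrt(N·πd²/4), which is proportional to 1/d, so a contrast-detail curve plotted as log C_T against log d has slope -1. The sketch checks this numerically; k = 5 is the value commonly quoted for Rose's criterion, and the fluence is an arbitrary illustrative number, not a value from the study.

```python
import numpy as np

def rose_threshold_contrast(diameter_mm, fluence_per_mm2, k=5.0):
    """Threshold contrast of a disk in the Rose model.

    SNR = C * sqrt(fluence * area), with area = pi * d**2 / 4; threshold when SNR = k.
    """
    d = np.asarray(diameter_mm, float)
    area = np.pi * d ** 2 / 4.0
    return k / np.sqrt(fluence_per_mm2 * area)

diameters = np.array([0.5, 1.0, 2.0, 4.0, 8.0])                  # mm
c_t = rose_threshold_contrast(diameters, fluence_per_mm2=3.0e4)   # hypothetical fluence
slope, _ = np.polyfit(np.log10(diameters), np.log10(c_t), 1)
print(slope)   # -1.0 (up to rounding): C_T is proportional to 1/d
```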
We explore the relative value of a number of commonly used performance indices in breast cancer screening by mammography. Such indices are used extensively in quality-assurance evaluation of the screening services on offer, and it is important that they provide an accurate measure of performance. The conditional probabilities known as sensitivity and specificity are widely used as measures of performance. While these measures are necessary elements of performance evaluation, there is interest in a single composite measure of general performance that remains stable across situations with differing incidence of abnormality. Such indices should be useful in evaluating the performance of individual screeners as well as entire screening centers. In addition, they will be of value in the assessment of training, where the ratio of abnormal to normal mammograms is higher than in practice. Three classes of overall performance index are identified and discussed: probabilities, area under the ROC curve, and correlation. A Monte Carlo study is described in which 13 indices are compared on a large number of data sets with various criterion levels. The indices were evaluated against the following criteria: ease of interpretation, ease of calculation, evenness of distribution, and insensitivity to differences in decision criteria. Two indices emerge with properties that make them particularly useful for mammography screening: the Pollack-Norman area index and the point-symmetry adjusted phi (or G-index).
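For reference, the sketch below implements the closed-form expression in common use for the nonparametric area index A' (the Pollack-Norman area index, as it is usually computed) from a single hit rate H and false-alarm rate F. It reproduces the standard signal-detection formula, not code from the study. The point-symmetry adjusted phi is not reproduced because the abstract does not define it, and the example operating point is hypothetical.

```python
def a_prime(hit_rate, fa_rate):
    """Nonparametric area index A' from a single (hit rate, false-alarm rate) point."""
    h, f = hit_rate, fa_rate
    if h == f:
        return 0.5
    if h > f:
        return 0.5 + ((h - f) * (1.0 + h - f)) / (4.0 * h * (1.0 - f))
    return 0.5 - ((f - h) * (1.0 + f - h)) / (4.0 * f * (1.0 - h))

# Hypothetical screener: sensitivity 0.80, specificity 0.80 (false-alarm rate 0.20).
print(a_prime(0.80, 0.20))   # about 0.875
```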
Many investigators have pointed out the need for performance measures that describe how well the images produced by a medical imaging system aid the end user in performing a particular diagnostic task. To this end we have investigated a variety of imaging tasks to determine the applicability of Bayesian and related strategies for predicting human performance. We have compared Bayesian and human classification performance for tasks involving a number of sources of decision-variable spread, including quantum fluctuations contained in the data set, inherent biological variability within each patient class, and deterministic artifacts due to limited data sets.
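In the simplest case relevant here, a signal known exactly on a known background in white Gaussian noise (quantum-like fluctuations only), the Bayesian ideal observer's log-likelihood ratio reduces to a linear template, and its detectability is d' = ||s||/σ. The sketch estimates d' by simulation and compares it with that closed form. It does not model biological variability or limited-data artifacts, which make the ideal observer nonlinear, and the signal profile and noise level are arbitrary choices.

```python
import numpy as np

def ideal_observer_dprime(signal, sigma, n_trials, rng):
    """Monte Carlo d' of the ideal observer for a signal known exactly in white Gaussian noise.

    With everything known except the noise, the log-likelihood ratio is (up to constants)
    the template output t = s . g / sigma**2.
    """
    s = np.asarray(signal, float).ravel()
    noise_only = rng.normal(0.0, sigma, size=(n_trials, s.size))
    with_signal = rng.normal(0.0, sigma, size=(n_trials, s.size)) + s
    t0 = noise_only @ s / sigma ** 2
    t1 = with_signal @ s / sigma ** 2
    return (t1.mean() - t0.mean()) / np.sqrt(0.5 * (t0.var() + t1.var()))

x = np.arange(64) - 31.5
signal = 0.6 * np.exp(-x ** 2 / (2 * 4.0 ** 2))   # hypothetical 1-D Gaussian profile
sigma = 1.0
rng = np.random.default_rng(7)
d_mc = ideal_observer_dprime(signal, sigma, 5000, rng)
d_theory = np.linalg.norm(signal) / sigma
print(d_mc, d_theory)   # simulated vs closed-form detectability
```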
Much of the reading of medical images is in terms of figures: objects or object components. We focus on the part of the visual process in which the viewer attentively scrutinizes objects to determine their shape and their geometry relative to other nearby objects. We propose, and give evidence, that figures are represented in terms of their middles and widths by a mechanism that separates information at various levels of detail. Important figural properties, as well as inter-figure relationships, appear directly in, or are easily derived from, this representation by a set of cores. Cores are traces in scale space derived from a value called medialness. Medialness is, in turn, derived from boundariness values, which are multiscale graded measurements derived directly from the image intensities. Core formation is guided by attention, which directs processing to an approximate location and size (scale), i.e., "about here, about this width." This paper gives justification for the model and discusses its usefulness in modeling various visual tasks performed when viewing medical images.
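A much simplified one-dimensional illustration of the boundariness-to-medialness idea, not the authors' formulation: boundariness is taken as the scale-normalized gradient magnitude of the blurred profile, and medialness at position x and radius r is the sum of boundariness at x - r and x + r, measured with a blurring scale proportional to r. The proportionality constant k = 0.25 and the bar profile are hypothetical. The maximum over (x, r) lands at the middle and half-width of the bar, which is the "about here, about this width" information a core carries.

```python
import numpy as np

def gaussian_derivative_kernel(sigma):
    """Sampled derivative of a normalized Gaussian, truncated at 4 sigma."""
    radius = int(np.ceil(4 * sigma))
    x = np.arange(-radius, radius + 1, dtype=float)
    g = np.exp(-x ** 2 / (2 * sigma ** 2))
    g /= g.sum()
    return -x / sigma ** 2 * g

def boundariness(profile, sigma):
    """Scale-normalized gradient magnitude |sigma * d/dx (G_sigma * f)|."""
    kernel = gaussian_derivative_kernel(sigma)
    return sigma * np.abs(np.convolve(profile, kernel, mode="same"))

def medialness(profile, radii, k=0.25):
    """M[r_index, x] = B(x - r; k r) + B(x + r; k r); -inf where x +- r leaves the profile."""
    n = len(profile)
    M = np.full((len(radii), n), -np.inf)
    for ri, r in enumerate(radii):
        b = boundariness(profile, k * r)
        x = np.arange(r, n - r)
        M[ri, x] = b[x - r] + b[x + r]
    return M

profile = np.zeros(200)
profile[81:120] = 1.0                 # a bright "figure" 39 samples wide, centred at 100
radii = np.arange(2, 60)
M = medialness(profile, radii)
r_index, x_center = np.unravel_index(np.argmax(M), M.shape)
print(x_center, radii[r_index])       # centre near 100, half-width near 19-20
```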
Medical image quality can be defined in terms of observer performance in detecting image abnormalities, since diagnosis is essentially based on visual inspection of medical images. A large body of theoretical and experimental work specifies image quality in terms of signal-to-noise ratio, area under the ROC curve, and detectability index. However, comparing the theoretical and experimental figures of merit (FOMs) is difficult because the FOMs are defined for different signals and observers. In this paper we investigate the relationships between such signals, observers, and FOMs; the soundness of the underlying assumptions; and the possibility of optimizing image display. In Section 2 we define three signal-observer pairs and recall the main theoretical and experimental results for each. We also present results obtained in our laboratory to show their consistency with results found in the literature. In Section 3 we describe an experiment designed to evaluate the relationships between the three types of signal-observer pairs and to assess the robustness of the model with respect to the assumptions, and we present its results. In Section 4 these results and the relevance of FOMs are discussed.
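Under the common assumption that the decision variable is Gaussian with equal variance under both hypotheses, the three figures of merit are linked: d' equals the observer SNR, and AUC = Φ(d'/√2), which is also the expected proportion correct in a two-alternative forced-choice test (for example, d' = 1 gives about 0.760). When the equal-variance assumption fails, these conversions no longer hold exactly. The sketch uses only the Python standard library.

```python
import math
from statistics import NormalDist

_PHI = NormalDist()   # standard normal distribution

def auc_from_dprime(d_prime):
    """AUC (= 2AFC proportion correct) for an equal-variance Gaussian observer."""
    return _PHI.cdf(d_prime / math.sqrt(2.0))

def dprime_from_auc(auc):
    """Inverse relation: d' = sqrt(2) * inverse-Phi(AUC)."""
    return math.sqrt(2.0) * _PHI.inv_cdf(auc)

for d in (0.5, 1.0, 2.0):
    auc = auc_from_dprime(d)
    print(d, round(auc, 4), round(dprime_from_auc(auc), 4))
```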
The main focus of this paper is the extent to which radiological expertise is based on low-level perceptual processes. Experiment 1 showed that naive observers can perform well above chance in classifying mammograms after just a few hours of training. Experiment 2 showed that expert radiologists performed better than naive observers on a 'perceptual simulation' of a radiographic task, even though high-level knowledge of anatomy and disease processes was of no assistance. Experiment 3 showed that contrast sensitivity, one of the fundamental parameters of the visual system likely to be involved in radiographic performance, could be improved with practice. Experiment 4 showed that naive observers improved on a perceptual simulation task similar to that used in Experiment 2; although there was partial interocular transfer, the results suggest that at least some of the learning was based on low-level perceptual processes. Overall, the results show that some degree of skill in radiological search can be acquired with no high-level medical knowledge at all, and that some aspects of radiological skill may be based on changes in the effectiveness of early visual processes.
Acutance refers to the sharpness, and thus the perceptibility, of features in an image. Currently available methods for characterizing the perceptibility of image features are either highly subjective or statistical. There is a need for a purely objective measure of the perceptibility of image features. This paper proposes a measure of acutance based on gray-level variations across the boundary of an object. An algorithm that calculates acutance using region growing and a derivative measure across region boundaries has been designed and implemented. Tests on various images show that this measure accurately reflects changes in the appearance of objects due to blurring and sharpening. With this technique, the level of enhancement in a digital image can be quantified by calculating the acutance before and after the enhancement operation.
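A minimal sketch of a boundary-based acutance measure, assuming the object region is already available as a binary mask (the region-growing step is omitted). It takes the root-mean-square gray-level difference between each boundary pixel and its outside 4-neighbour and normalizes it by the image's gray-level range. This single-step difference is a simplification chosen for brevity, not the derivative measure defined in the paper, and `box_blur` is a hypothetical helper used only to produce a blurred test image.

```python
import numpy as np

def boundary_acutance(img, mask):
    """Simplified acutance: RMS inside-minus-outside difference across the region
    boundary (4-neighbour pairs), normalized by the image's gray-level range."""
    img = np.asarray(img, float)
    mask = np.asarray(mask, bool)
    diffs = []
    for a, b, ma, mb in (
        (img[:, :-1], img[:, 1:], mask[:, :-1], mask[:, 1:]),   # horizontal pairs
        (img[:-1, :], img[1:, :], mask[:-1, :], mask[1:, :]),   # vertical pairs
    ):
        crossing = ma != mb                                     # pair straddles the boundary
        inside_minus_outside = np.where(ma, a - b, b - a)
        diffs.append(inside_minus_outside[crossing])
    d = np.concatenate(diffs)
    return float(np.sqrt(np.mean(d ** 2)) / (img.max() - img.min()))

def box_blur(img, size=5):
    """Separable moving-average blur (zero padding at the image border)."""
    kernel = np.ones(size) / size
    rows = np.apply_along_axis(lambda r: np.convolve(r, kernel, mode="same"), 1, img)
    return np.apply_along_axis(lambda c: np.convolve(c, kernel, mode="same"), 0, rows)

mask = np.zeros((64, 64), bool)
mask[20:44, 20:44] = True            # the "object" region (given here, not region-grown)
sharp = mask.astype(float)
blurred = box_blur(sharp)
sharp_value = boundary_acutance(sharp, mask)
blurred_value = boundary_acutance(blurred, mask)
print(sharp_value, blurred_value)    # 1.0 for the sharp edge, much lower after blurring
```

Applying the same function before and after an enhancement step gives a single number for how much the enhancement sharpened the object boundary.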