Ocular fundus imaging has long played a key role in the documentation, diagnosis, and progression assessment of ophthalmic diseases. With the advent of digital cameras, ophthalmic imaging changed dramatically. Among the advantages of digital imaging are the ease and speed of access to data, fast and exact duplication, archiving and transmission, and digital image analysis techniques. Altogether, these advantages set the foundations for modern ophthalmology in the framework of telemedicine.
Fundus cameras can be mydriatic or non-mydriatic. Mydriatic fundus cameras require pharmacological dilation, while non-mydriatic cameras use a near-infra-red (NIR) viewing system to exploit the patient’s natural dilation in a dark room.1 Infra-red light is used to preview the retina on a video monitor. Once the monitor’s image is focused and aligned, a flash of visible light from a Xenon arc lamp is fired and the image is captured. Non-mydriatic fundus cameras are equipped with a focusing mechanism that displaces a compensation lens. It is basically an aspheric objective lens design that, when combined with the optics of the eye, matches the image plane to the eye fundus. The focus control of the fundus camera is used to compensate for refractive errors in the subject’s eye. Until recently,2 these cameras were entirely operated manually with the focusing mechanism assisted by a split-line visual aid. Manual focusing is error prone, especially when performed by inexperienced photographers, and may lead to images that require additional restoration or enhancement.3 The auto-focus feature offered in new retinal cameras is a significant advance that ultimately leads to a more robust imaging system, especially for medical screening purposes. On the other hand, the auto-focus feature still relies on the split-line mechanism, whereas in this work we propose a passive focus measure (FM) completely based on image analysis. For further details on fundus imaging, the reader is referred to Refs. 1, 4, and 5.
In Ref. 6 we studied the performance of several state-of-the-art no-reference image quality metrics for eye fundus imaging. The most interesting finding relates to the importance of directional properties with image quality. In other words, the measure of anisotropy as a quality metric. This was proposed by two co-authors of this paper (Gabarda and Cristóbal) in Ref. 7 and it represents an important step forward in the area of no-reference quality metrics. However, given the properties of the NIR fundus focusing system, the FM should be robust to noise (spatial and temporal) and to changes in illumination and contrast. Furthermore, real-time imaging is constrained by the overhead required to compute the directional Rényi entropy as described in Ref. 7. Therefore, in this work we made use of our findings on the directional dependency of eye fundus images against defocus to define a new and robust FM based on the directional variance of the normalized discrete cosine transform (DCT). The FM proposed here could impact the design of portable retinal cameras with autofocus function or the manufacturing of low-cost retinal cameras because they would require fewer optical components.
In a single-lens optical imaging system operating within the paraxial regime, the process of focusing consists in adjusting the relative position of the object, the lens, the image sensor, or a certain combination of the three to obtain a focused image. Let be the focused image of a planar object and a sequence of images recorded for a sequence of camera parameter settings. The eye fundus is actually a curved surface; however, in our case corresponds to a small region of the fundus so it can be considered as an isoplanatic patch.8 We consider the variation of only one camera parameter at a time—either the lens position or the focal length. The acquired set of images can be expressed by convolution9 Ideally, the best possible case occurs when , therefore . In practice, all have an unknown low-pass filter effect.
An FM may be understood as a functional defined on the image space which reflects the amount of blurring introduced by . Let be the FM with which we look for the “best” image by maximizing/minimizing over . A reasonable FM should be monotonic with respect to blur and robust to noise. Groen et al.10 used eight different criteria for the evaluation of focus functions. Ideally, the focus function should be unimodal, but in practice it can present various local maxima that can affect the convergence of the auto-focus procedure. Moreover, the focus curve should ideally be sharp at the top and long tailed, which can accelerate the convergence of the screening procedure.
Various FMs have been reported in the literature.2,9,1112.–13 They mainly consist of a focus-measuring operator that estimates the sharpness of the image. The image that yields a maximum FM is considered as the focused one. Almost all FMs depend directly on the amount of high-frequency information in the image. The high-frequency components correspond to edge information. On the other hand, their accuracy can deviate depending on the content of the processed images. Because well-focused images have sharper edges, they are expected to have higher-frequency content than blurred ones.13 The common FMs are based on norm of gradient or second derivative of the image, gray level variance, and Laplacian energy. Surprisingly, little is known about the performance of these methods for fundus imaging, and the literature on this subject is scarce.
To the best of our knowledge, there exist only two published works that deal with autofocusing in retinal imaging;2,14 however, they use conventional mydriatic imaging in the visible spectrum, which is not our case. In Ref. 14 they do not propose a FM; instead they use several preprocessing operations to improve the performance of traditional FMs for segmentation purposes. On the other hand, in the recent work by Moscaritolo et al.,2 they propose a filtering technique to assess sharpness of optic nerve head images; however, they do not compare with other methods. In this section we briefly summarize five notable approaches—including that of Moscaritolo et al.2—for later comparison with our proposed method.
The first FM was proposed in Ref. 2. It may be defined mathematically as15 proposed a noise-insensitive FM based on the summed modified Laplacian operators. When two second partial derivatives with respect to horizontal and vertical directions have different signs, one offsets the other and the evaluated focus value is incorrect. The method is a modification to obtain the absolute value of each second partial derivative as 16 is a high-pass nonlinear filter based on the difference of medians. It is well known as a nonlinear edge detector that removes impulsive noise effectively. The FSWM uses several nonlinear subfilters having a weight according to the frequency acting like a bandpass filter as
Subbarao and Tyan17 analyzed the robustness of three FMs: the image variance (not included here), , and . They recommended using because of its tolerance to additive noise; however, the differences among individual measures were not significant. There are many other FMs, such as the wavelet-based FM proposed in Ref. 12 or the mid-frequency discrete cosine FM in Ref. 11, but they were not included in our study either because of their lack of robustness to noise or their complex implementation. For a review and evaluation of FMs in natural images, the reader is referred to Refs. 9 and 13.
Discrete Cosine Transform
DCT is an invertible, linear transformation . An image is transformed to its spectral representations by projection onto a set of of orthogonal two-dimensional (2-D) basis functions. The amplitudes of these projections are called the DCT coefficients. Let , for and , denote an image and its DCT denoted by , given by the equationFig. 1). Therefore, low-order coefficients depict slow spatial variations in image intensity, while those of higher orders depict rapid variations.
The DCT is closely related to the discrete Fourier transform (DFT), a standard tool in signal processing, and has been reported as a suitable transform for spectral-based focusing algorithms.18 However, the DCT has a greater energy compaction property than the DFT, i.e., most of the image information tends to be concentrated in a few low-frequency DCT coefficients. This is also why the JPEG compression standard is based on the DCT. In addition, many efficient schemes for the computation of DCT exist,19 and hardware implementations are commonly available.20
The normalized DCT21 of an image is defined as
For illustrating the nature of blurring and the behavior of the DCT, we take the red channel from a sharp RGB fundus image (because it resembles more the NIR image) and simulate the imaging system as a linear shift-invariant system to acquire a sequence of images by varying the lens position. This was carried out by means of Fresnel propagation. In Fig. 2, we show the original sharp image, image patches of both the sharp and blurred images, and their DCT spectra (in the same log scale). Notice how the spectrum changes—there is less high- and mid-frequency content in the blurred image spectrum. In addition, in the original spectrum there are some favored orientations in the mid- and low-frequency coefficients, while in the blurred spectrum they seem to become more uniformly distributed. Another important feature is that in the blurred spectrum the coefficients related to high frequency have decreased significantly, and, as described in Sec. 2, many FMs are actually based on the idea of emphasizing high frequencies. While this may be true in theory, in practice there will always be noise contributing to the high-frequency content due to different acquisition conditions. Furthermore, given that the focusing mechanism involves acquiring a sequence of images, there will be spatial and temporal variations of noise.
Measure of Anisotropy
As we have seen in the previous example, the overall nature of blurring can be described as a low-pass filtering that tends to break down the characteristic anisotropy of the original image. The FM proposed here aims to quantify this anisotropic dependence based on the normalized DCT of the image.
To define our measure, we introduce some notation. From Eq. (10), is the normalized DCT of of size , and , for , is a vector along one of the three main orientations of the spectrum depicted in Fig. 3. We will restrict our study to angular partitions of the spectrum roughly equivalent to vertical, diagonal, and horizontal components of the image space. Our measure of anisotropy mainly consists in calculating a difference of weighted coefficients along these orientations. Let be the set of DCT coefficients located between and angles, for . The function takes as input , performs orthogonal projection of all its elements along vector , and averages the elements that after projection fall on the same discrete coordinates. With we seek to compact the information around the three main orientations in a one-dimensional vector of elements. To illustrate, let us compute , where is the set of coefficients located between and . In Fig. 3(b), we show the projection of the coefficient with coordinates (4,2) along . After projection, this coefficient has coordinates (4,1). Therefore, the element . Consequently, we can stack all to form the following matrix,
DCT Coefficient Weighting
The first issue to address is the selection of a suitable . In DCT-based pattern recognition, robustness is achieved by means of coefficient truncation.22 It is known that low frequencies are related to illumination variation and smooth regions, and high frequencies represent noise as well as small variations (like edge and details) of the image. The middle-frequency coefficients contain useful information of basic structure; therefore these are suitable candidates for recognition.23 Consequently, a trade-off between low-frequency and high-frequency truncation should be achieved to obtain a robust FM that is monotonic with respect to blur, unimodal, and at the same time robust to noise and illumination variations.
We decided to find a that meets our requirements based on a training set of images. This can be formulated as an optimization problem. The goal would be to find the vector that simultaneously optimizes K objective values . Every objective value is formulated so that the FM decreases with respect to blur, . There are subsets of all generated in the same way as described in Eq. (1), but they differ in that every stands for a different kind of noise degradation, except for , the noise-free case. In other words, we want to find a that guarantees monotonicity of with respect to blur under different types of noise. The objective values are implicitly defined in terms of permutations of the ordered set . Thus, the reference permutation is , and any other arbitrary permutation of violates the decreasing property of with respect to blur. As a result, our goal is to find a that produces permutations for all types of noise equal to that of . The objective value is defined as the -norm of the difference between and ,24 is the weighted linear sum of all , where all weights are equal to 1.
The solution to this problem is not a straightforward task, as the search space is multivariate and a unique global optimum cannot be guaranteed to exist. Therefore, we solved it using a probabilistic metaheuristic approach called simulated annealing.25 It provides an acceptably good solution in a fixed amount of time. Each step of the algorithm replaces the current solution by a random nearby solution, chosen with a probability that depends both on the difference between the corresponding function values and on a global parameter (called the temperature), that is gradually decreased during the process. The dependency is such that the current solution changes almost randomly when is large, but increasingly downhill as goes to zero. (For further details see Ref. 24.)
Typically, FMs are applied to a region called the focusing window, which is much smaller than the image. To achieve real-time computation, we decided to implement our measure by dividing the focusing window into subwindows. The measure is computed in the following manner:
1. The focusing window is divided into non-overlapping sub-images of size . This is chosen so that the most basic structures of the image fit in the subwindows.
2. Each subwindow image is transformed with the normalized DCT, and the FM is computed.
3. An overall FM is computed by taking the mean of all values from the subwindows.
According to this implementation, the parameter consists of 16 elements. The considered noise degradations for the procedure described in Sec. 4.2 are Gaussian noise (), speckle noise (), and impulsive noise (). The resulting is shown in Fig. 4. As expected, the first two coefficients are practically zero. Observe the distribution of instead of the individual values per coefficient. This means that a strong emphasis should be put to mid-frequency coefficients. It is perhaps not surprising that the distribution resembles a bandpass filter. This finding is consistent with the work in Ref. 9, where they showed that bandpass filtering causes the FMs to have sharp peaks while retaining monotonicity and unimodality. Interestingly, these weights also resemble the band pass response of the contrast sensitivity function of the human visual system. In the DCT domain, different approaches have been considered for computing visually optimized coefficients for a given image.26 A major feature of our approach is the fast computation of the FM. The average execution time per frame, in MATLAB implementation on a PC with a 2.66-GHz Intel Core 2 Duo processor, is 40 ms. In most cases this is sufficient; however, if needed, implementation in a low-level programming language could significantly reduce the execution time. In addition, because we divide the focusing window into subwindows, our implementation could be further improved by taking advantage of large parallel architectures such as in graphics processor unit computing.
Simulated Images and Robustness Assessment
To evaluate the robustness of our proposed FM we simulated the focusing procedure. We generated a sequence for from the red channel of a sharp RGB fundus image and propagated it at different distances through a linear imaging system of fixed focal length by means of Fresnel propagation. This is equivalent to displacing the lens or the sensor to look for the optimal focus position. From this noise-free sequence, we generated six additional sequences by corrupting it with two levels of three different types of noise: Gaussian, speckle, and impulse noise. We carried out this procedure for 20 retinal images for a total of 140 focusing sequences. Ideally, a noise-robust FM should produce the same (or similar) focusing curve for both the noise-free and corrupted sequences. To quantify the similarity between two focusing curves and we used the zero-lag normalized cross-correlation defined as
All FMs were computed using a focusing window of located over retinal structures. In Fig. 5, we show an example to illustrate the robustness assessment of the FMs. The FM curves represent the normalized measure value over the search space for different lens positions. The highest value should be obtained when the lens is in the optimal focus position identified by the dashed vertical line. As the lens gets farther from the optimal position, the measure value should decrease proportionally to the distance. It comes as no surprise that all measures performed sufficiently well in the noise-free sequence shown in Fig. 5(a), where all curves follow a typical bell shape with a unique maximum. However, in the curves shown in Fig. 5(b)–5(d) where the focusing sequence is corrupted by different types of noise, the proposed FM clearly outperforms the other measures in terms of monotonicity and unimodality. Notice that under Gaussian and speckle noise [Fig. 5(b)–5(c)], the curves are nearly identical to the noise-free curve in Fig. 5(a). Without jumping to conclusions, this result is interesting because it graphically shows the robustness of the proposed FM. The results for all of the 140 sequences are summarized in Table 1. Each value represents the average cross-correlation obtained for all 20 sequences corrupted with a specified type and level of noise for a particular FM. The overall average for each FM is shown in the last column. These results provide further evidence that the proposed FM has a considerable robustness to noise, with an overall performance value of 0.929 and an exceptional 0.996 for the sequence corrupted with Gaussian noise with . The second and third best FMs were and , with overall values of 0.781 and 0.502, respectively. In comparison with these values represent a moderate to mild noise robustness. In the following section, we use these two FMs to compare with in real images.
Average normalized cross-correlation results for noise robustness assessment of focus measures from 140 sequences generated from 20 retinal images corrupted with different types and levels of noise. The three best FMs are in bold type.
|Gaussian (σ2=0.001)||Gaussian (σ2=0.005)||Speckle (σ2=0.001)||Speckle (σ2=0.005)||Impulse (d=0.01)||Impulse (d=0.05)||Overall average|
In this subsection we show the results obtained from real NIR-focusing eye fundus images. The images have a relatively low signal to noise ratio (SNR) which justifies the need for a robust FM. All images were acquired using a digital fundus camera system (TRC-NW6S, Topcon, Tokyo, Japan) taking the video output from the infrared focusing system with a resolution of . The focusing system enables a compensation range of in normal operation. For strong myopia or hyperopia, two additional compensation lenses are available to compensate the ranges: and , respectively. The image sequences analyzed here were acquired by means of an in-house assembled motor mechanism for the displacement of the compensation lens across the whole range for normal operation.
It is well known that as a person ages the crystalline lens of the eye gradually gets opacified, obstructing the passage of light. This is called a cataract. A complete loss of transparency is observed only in advanced stages in untreated patients. In early stages of cataracts retinal examination is considered practicable—however, it is not without difficulty. For this reason, we decided to test our focusing method on healthy young subjects and elderly subjects with first signs of cataracts, not only to demonstrate its applicability on real images, but to assess its limitations as well. In this work we show results from five representative subjects of ages 27, 40, 68, 70, and 81 years for a total number of 10 eye fundi.
First we show the effects of placing the focusing window on different regions of the retinal image. A retinal image has distinct sharp structures such as the blood vessels and the optic disk, as opposed to the relatively uniform background. No FM is reliable without placing the focusing window on top of structures with edges, a fact easily appreciable from the three focusing curves shown in Fig. 6, which were computed from the right eye fundus of the 27-year-old subject. The optimal focus position identified by the dashed vertical line was verified via the split-line focusing mechanism. The curves computed from regions (b) and (c) are clearly reliable in terms of monotonicity and unimodality and coincide on the optimal focus position. Conversely, the and curves fail to produce a reliable profile against the curves that display a steeper peak at the optimal focus position, evidence of the measure’s robustness to noise. In contrast, all measures computed from region (d) are unusable because they are mainly given by noise.
To illustrate the link between the focusing curves and the image quality, in Fig. 7 we show three image details depicting the optic disk region for three different focusing positions. The image detail in Fig. 7(b) corresponds to the focused image (optimal focus position 11 in the curve Fig. 6). Notice how this image is properly focused: it has sharp details such as the blood vessels. The other two images are blurred, demonstrating the consistency of the curves with image quality or sharpness. The result that emerges from this example is that to effectively locate the best-focused image, homogeneous regions should be avoided. An adaptive technique, based in an edge detector for example, could prove useful for detecting such prominent structures and therefore candidate regions for applying the focusing technique automatically. The focusing curves shown hereafter, however, were all computed from a focusing window located manually over retinal structures.
To further analyze the performance of the FM in Fig. 8, we show the focusing curves obtained from four of the five subjects; the ages are shown in the figure caption. From the four cases shown only in one [Fig. 8(c)], the measure peak did not coincide precisely with the optimal focus position. However, the error is no more than a single position. The FMs curves of and are generally flatter than those of which in a focus search strategy is not wanted because of the difficulty to properly distinguish the optimum position in a coarse or initial search. From the curves in Fig. 8, we can also note that there appears to be little difference between the curves from young and elderly subjects. In Fig. 9, we show the focusing curves obtained from the 81-year-old subject for both eye fundi. This case is interesting on its own because in the right eye [Fig. 9(a)], the crystalline lens has been extracted and replaced with an intraocular lens, whereas the left eye [Fig. 9(b)], is in an early stage of cataract. While both focusing curves are able to successfully identify the optimal focus position, the curve in Fig. 9(b) is certainly flatter throughout most of the search space. This is most likely due to the difference in visibility and clarity from both eyes. In general, from the comparison against and it can clearly be stated that the proposed FM outperforms them in the considered cases.
A close examination of the results reveal that the shape of the focusing curve is not exclusively given by the degree of defocus, but by the state of the subject’s eye and the analyzed region of the fundus as well. This is important because it conditions the strategy for searching the optimal focus position. Finally, even though the results seem to indicate that the FM could be successfully applied to both young and elderly subjects, further research on a higher number and variety of subjects is necessary. Additionally, we report here that we encountered some difficulty in the procedure with the elderly subjects related to sustaining fixation during the acquisition procedure. From an initial number of six subjects one was excluded from all calculations due to this condition. Patient inability to successfully establish fixation is a true challenge in fundus photography, and dealing with it is out of the scope of this work.
In this paper, a new robust FM for nonmydriatic retinal imaging has been proposed. It is based on a measure of anisotropy, mainly the weighted directional variance of the normalized DCT. The weights were calculated by means of an optimization procedure to maximize the noise robustness of the FM. Not only were the resulting weights in agreement with previous works,9 but they also provide a key insight into the design of noise-invariant FMs. By both simulation and real fundus imaging, we demonstrated the robustness and the accuracy of the proposed FM, which clearly outperformed the other considered measures. The findings presented here may have a number of implications for the design and operation of auto-focusing in modern retinal cameras. Finally, in this study we included several young and elderly subjects to assess the limitations of the proposed FM. Even though we found no significant differences between the focusing curves, there was some difficulty in the acquisition of images from the elderly mainly given by inability to sustain fixation. As with all such studies, there are limitations that offer opportunities for further research. Adapting our method to these variations within the patient population is a goal worth pursuing.
This research has been partly funded by the Spanish Ministerio de Economía y Competitividad and Fondos FEDER (project DPI2009-08879) and projects TEC2010-09834-E and TEC2010-20307. The first author also thanks the Spanish Ministerio de Educación for an FPU doctoral scholarship. The authors especially thank the Pérez-Cabré and the Sisquella-Cabré families for their cooperation in the experimental session.