Glaucoma is the second leading cause of vision loss in the world.1 Moreover, the number of people with glaucoma is estimated to be 60.5 million in 2010 and 79.6 million in 2020.2 In a population-based prevalence survey of glaucoma in Tajimi City, Japan, one in 20 people aged over 40 was found to have the disease.3, 4 Glaucoma is a group of eye diseases causing optic nerve damage; the exact causes of this damage are not fully understood, but involve mechanical compression and/or decreased blood flow to the optic nerve.5 Although incurable, glaucoma can be treated if diagnosed early. Mass screening of glaucoma using retinal fundus images is simple and effective.6
The cup/disk (C/D) ratio, which is the ratio of the diameter of the depression (cup) to that of the optic nerve head (ONH, disk), is one of the important parameters for an early diagnosis of glaucoma. The C/D ratio is generally used in clinical practice, because its value is greater in the case of glaucoma. The interpretation of the ONH (which actually has a 3-D structure) by using a 2-D image is subjective, and there is a wide variation between the examinations of the ONH by different observers and even between the examinations by the same observer.7 A more quantitative alternative is to use the Heidelberg Retina Tomograph (HRT), which is a confocal laser scanning microscope, for the acquisition and analysis of the 3-D measures of the ONH.8, 9 It has been revealed that an HRT is capable of ONH imaging, and it is an established technique for detecting glaucomatous structural changes.
A computerized technique for the qualitative estimation of the depth of the ONH from the stereoscopic pairs of retinal fundus images has been suggested as another objective method for the 3-D analysis of the depression of the ONH.10, 11, 12, 13 It has been shown that this technique is useful for the investigation of the 3-D measures of the ONH. Corona 12 calculated 2-D and 3-D C/D ratios, and the results showed good correlation between clinical measures and computer-generated measures. Xu and Chutatape13 described an automatic reconstruction method from a stereo image pair by the use of a new approach that included sparse disparity measurement, subpixel modification, searching range autoadjustment, and piece-wise cubic interpolation and smoothing operations. The subpixel modification performed reconstruction from a low-resolution stereo retinal fundus image pair. Xu 14 also compared the C/D vertical ratio generated from the stereo image pair with the results from the HRT and an experienced ophthalmologist for evaluating the reconstruction results. The correlations of the results with the ophthalmologist’s measurement were 0.71 and 0.67, respectively, for the proposed method and the HRT. However, the relative depth value was expressed as the number of pixels in these techniques, and the experimental results regarding the quantitative depth value calculated from the stereo image pair of the ONH have not been reported. Although Xu 14 reported the real depths, the values were calibrated by using the maximum cup depth obtained from the HRT results. Moreover, there have been no studies in which the depth value calculated from the stereo image pair has been compared with the HRT outputs.
In this study, an automatic method for reconstructing the 3-D structure of the ONH by using the stereo retinal fundus image pair has been proposed. To evaluate the accuracy of our method, the shape of the depression of a disk depth model, which had a circular dent when generated from the stereo image pair and which was used to model the ONH, was compared with its true depth value. The true depth value was measured by use of a vernier caliper with accuracy level of . Moreover, the depth value of the ONH obtained from the stereo image pair was compared with the HRT measurement results.
In our technique, the depth value is obtained from the stereo retinal fundus image pair, which is a digital color image; the technique mainly consists of five processes. The flowchart of the main procedure is shown in Fig. 1 . A stereo image pair consists of a left image and a right image captured from different perspectives. The stereo image pair can be generated by taking two shots with a parallel shift using a single-lens retinal camera, or by taking a single shot using a stereo retinal camera. In the first step, the images of the ONH region are cut out from the original retinal fundus images in the first step. In the second step, the registration process of the stereo ONH image pair is performed to remove any displacements. The displacement between the stereo retinal fundus image pair is caused by movement of the eye and differential refraction of light due to astigmatic eye. It is necessary for precise measurement of the depth value to reduce displacement. In the third step, the “corresponding points” in each stereo ONH image are detected, as discussed in Sec. 2.3. In the fourth step, the fluctuations in the disparity are reduced by both medium and smoothing filters. The fluctuations of disparity are caused by failure in the disparity measurement process. In the last step, the depth values of the 3-D structures are calculated from the results of the disparities measured for the configuration of the corresponding points.15
Cutout of Optic Nerve Head Region
The images of the ONH region were cut out from the original stereo image pair to reduce the processing area to expedite the subsequent steps, namely the registration of the stereo pair and disparity detection. In this processing step, the retinal fundus images were cut out to form quadrates at the position of the ONH region that was extracted automatically.
The ONH region has relatively high pixel values in three channels (red, green, and blue channels of the image) in the color stereo retinal fundus image pairs. -tile thresholding16 can be applied to define a threshold for an approximate extraction of the ONH region, because individual variations of the ONH do not vary significantly. -tile thresholding was performed on three channel images; subsequently, the region that was extracted in more than two channel images was determined as the extraction result. When more than two regions were extracted, the region with the maximum area was selected as the ONH region. The value of in the -tile thresholding operation was experimentally set to a value slightly greater than the average area of the ONH.
The blood vessels (BVs) running on the surface of the ONH interfered with the correct extraction of the ONH region in the -tile thresholding operation. To solve this problem, the extraction of the ONH region was performed by using the images in which the BVs were erased. These erased pixels were then interpolated by using the RGB values of the pixels in the surrounding region. The pixel value used in the interpolation was calculated asdenotes the values of the pixels in the surrounding region, is the number of surrounding pixels, and is the distance between the interpolated pixel and each surrounding pixel. An example of the blood-vessel-erased image is shown in Fig. 2c .
The BVs were extracted by using the black-top-hat transformation, which is a type of grayscale morphological operation,17, 18 from the green channel of the color stereo retinal fundus image pairs. This grayscale morphological transformation was well suited to the task of segmenting BVs from the retinal fundus images. The black-top-hat transformation is defined as the residue between the image processed by morphological closing, which is a dilation followed by an erosion operation, and the original image. The “structure element” used in this transformation was a disk whose diameter was set to the same level as the thickness of the largest BVs in the ONH region. There are differences in the diameters of BVs among individuals. However, the differences are not significant. The thickness of the largest BV was determined in advance from experiments. The regions containing BVs were extracted after applying the Otsu thresholding technique19 to the black-top-hat transformed image. An example of the blood vessel image is shown in Fig. 2b.
The center of the quadrate of the region of interest (ROI) around the ONH region was the gravity point of the ONH region extracted from the images in which the BVs were erased, as shown in Fig. 2d.
In this study, the size of the original retinal fundus image was , the angle of view was , and the size of the ROI around the ONH region was , as shown in Fig. 3 . The size of the ROI was determined by considering the vertical diameter of the ONH in advance from experiments. The extraction process of the ONH region was implemented by using an image was reduced to one-sixth of its original left retinal fundus image.
Registration of Stereo Image Pair
The disparity, which is defined as the difference in the position of the corresponding points in the stereo image pair, depends on the change in the position of not only the camera but also the subject (subject’s motion). The disparity due to the subject’s motion affects the calculated result of the depth value. To accurately measure the depth value, it is necessary to rectify the disparity due to the subject’s motion. However, it is difficult to determine the motion that induces the disparity only on the basis of observations.
In the retinal fundus, the bell-shaped ONH has a dent on the opposite side, that is, the side facing the camera. Therefore, the cup region of the ONH exists at a distant position from the camera, and the disparity is small. Theoretically, it is possible to use the pixels in the ONH region, which have small disparities, for image registration. However, the region of the retina around the ONH is more suitable for the image registration task, because the blood vessels in the retina exist even on the curved surface. In this region, the right and left images exhibit a parallel shift. Moreover, the 3-D structure of the retinal region is simpler than that of the ONH. From the previously mentioned description, the image registration for rectifying the disparity due to a subject’s motion was performed by using the pixels from regions other than the ONH region.
To exclude the ONH region from the registration process, the pixels of a retinal fundus image were allocated to two regions: the retina and ONH region. The pixels in the retina region were used for registering the stereo image pair. The boundary between the two regions was obtained by automatically extracting the ONH region.
In the first step, a contour of the ONH region was extracted from two stereo fundus images. The ONH region has a tendency to have a higher pixel value than the other regions. Furthermore, the contour of the ONH region can be expressed with a smooth closed curve in many cases. Therefore, the contour that exhibits high edge intensity, which is defined as a change in the brightness, was extracted as the smooth closed curve by using an active contour model.20 Some techniques for the extraction of the ONH region have been previously reported.21, 22, 23, 24 The ONH extraction would be relatively easy, as long as the contour of the ONH is clear without any obscured region. Therefore, details pertaining to the definition of the energy and parameters of the active contour model used in the ONH extraction in this study are briefly explained here.
In the next step, the registration of the stereo image pair was performed by using all the pixels of the images from regions other than the ONH region. If the positional error is minimum, the sum of all the differences between the pixel values of the two images will be minimum. Therefore, the right image was translated and rotated until the sum of all the differences was minimum. This registration procedure used the cross correlation between the two images, calculated asand are the feature values in the coordinate system of the left and right images, respectively; and are the width and height of the image, respectively; and and are the average pixel values in the left and right images, respectively. The information used in the registration was the pixel values of the three channels and its edge images created by the Sobel filter, whose algorithm detects the horizontal and vertical qualities of edges.16 The cross correlation coefficients were calculated between the red, green, and blue channels of the images and the edge images of these three channel images; subsequently, the average cross correlation coefficient was used as the index of positional error.
The required disparity for obtaining the depth value was calculated from the location differences between the corresponding points. The detection of the corresponding points was performed using the pixels of the area within the registered ONH image pair. The detection of the corresponding points comprised the search for a point on the right image that corresponded to the reference point on the left image. The search was performed by setting up regions of interest (ROIs), including the pixels around the reference point and the candidate point separately. Two points on the left and right images, having a similar texture in their respective ROIs, were regarded as the corresponding points. The similarity was measured by the cross correlation coefficient, defined asand are the feature values of the pixels in the ROIs set in the ONH image pair, and are the average feature values of the ROIs, is the coordinate of the reference point in the left image, is the coordinate of the candidate point in the right image, and and are the width and height of the ROI, respectively.
The information used in the detection of the corresponding points was the pixel values of the three channels of the images and its edge images created by the Sobel filter16 using the pixel value in the three channel images. In other words, the cross correlation coefficients were calculated between the red-, green-, and blue-channel images and the edge images of these three images; subsequently, the average cross correlation was used as an index of similarity.
The point having the maximum cross correlation coefficient was considered to be the corresponding point. When the maximum value of the cross correlation coefficient was smaller than a preset threshold value, it was assumed that the corresponding point of the reference point was not found. Additionally, when the texture amount in the ROI was low, the detection of the corresponding points was skipped because the reliability of the results of the detection from such regions was particularly low. The texture amount was estimated by considering the contrast of the ROI. The contrast was defined as the difference between the maximum and minimum pixel values. The disparity of the point that did not have a corresponding point was interpolated by the average of the disparities of the surrounding reference points. The threshold values were 0.5 for the cross correlation coefficient and 10 for the texture. These values were determined by a trial-and-error method. Therefore, the similarity of the ROI was computed as
The disparity variance in the stereo retinal image pair is not very large, because the depth of the ONH is not over . To distinguish the disparity difference, subpixel measurement was carried out by using expanded images. If the disparity measurement is implemented by using a double-sized (expanded) image, the disparity whose fineness is half of the original pixel size is obtained by dividing the result of the disparity measurement by two. The expanded images were generated from the original images by employing a bilinear interpolation technique. The expansion was performed in the horizontal direction of the image, and the expansion ratio was set to 3.
The parameters of this process are shown in Fig. 4 . The size of the ROI was set to , and the searching range was set to ([ , ]). The reference points arranged at the equally spaced positions and the intervals were set to . Note that all the parameters about the disparity measurement using expanded images in conjunction with the expansion procedure, such as the size of the ROI and the searching range, were multiplied by three.
Sharp changes in the disparities were observed during the disparity detection. The depth of the ONH varies smoothly in the ROI. Therefore, if depth varies considerably in the ROI, the singular depth value assumes an incorrect result (these are called “noise” in this study). To reduce this type of noise, five iterations of a median filter and a moving average filter16 were applied to the disparity matrix.
Quantitative Depth Calculation
In general, there are two typical vision systems for implementing stereo vision—convergent and parallel (nonconvergent) systems. The depth value of the 3-D position was determined according to the value of the disparity at each location of the reference point in both the systems. Our stereo fundus camera has a parallel stereo configuration, in which there is no rotation between two optical axes. However, parallel light rays from the camera are refracted by the cornea and the crystal lens inside the eye. Hence, the parallel visual system is not suitable for the retinal fundus examination. The convergent visual system (Fig. 5 ) was selected in this study, and the depth value was calculated asand are the horizontal coordinates of the corresponding points in the left and right images, respectively. The original points in the coordinate system were arranged on the optical axis of the left and right viewpoints. is the angle of view of the images; is the angle between the position of the corresponding point and the optical axis; is the width of the images; and is the length of the baseline, which is the distance between the optical centers of the camera. is the angle of the optical axis. The depth value calculated by this method is the distance from the point at which the light rays are refracted.
Retinal fundus camera
A new prototype stereo fundus camera (prototype of the WX-1, Kowa Company Limited, Tokyo, Japan) was used in this study. The camera can capture two images sequentially by the parallel movement of the aperture diaphragm in . Thus, a stereo image pair of retinal fundus is obtained without any mydriatic drug. The length of the base line, which is the length between the two cameras in a stereo vision system, equals the amount of parallel movement and is . The size of the image is ; the diameter is ; and the angle of view is .
Heidelberg retina tomograph
The HRT is a confocal scanning laser ophthalmoscope that uses a wavelength diode laser to scan the retinal surface in three dimensions.25, 26 The HRT used in this study was the first model marketed in 1991. The HRT provides a topographic image of the ONH, which is derived from multiple optical sections at 32 consecutive focal depth planes.
Depth Measurement of the Disk Depth Model
A basic experiment was conducted to estimate the accuracy of the proposed method. The test object is a disk depth model, as shown in Fig. 6a , that was built in-house. It comprises a flat plate made of paper and a lens. The flat plate was arranged on the focal plane of the lens; it has a circular dent at the center to model the ONH, as shown in Fig. 6b. The diameter of the dent is and its depth is . The focal length of the lens is . Therefore , which is related to the angle of the optical axis, was set to and the resolution of the images is per pixel.
The accuracy of the proposed method was tested using the disk depth model described. The stereo image pair of the disk depth model is shown in Fig. 7a . The registration of the stereo image pair was not performed because the subject did not move. The depth maps generated from the stereo image pair are shown in Fig. 7b. The left image depicts the depth maps obtained with noise reduction and the right-hand image obtained shows the depth maps without noise reduction. The noise reduction improved the uniformity of the depth distribution.
Figure 7c shows the depth profile curve sampled at the center of the dent region in the model. The vertical axis indicates the depth from the base plane, and the horizontal axis indicates the location. The focal length of the disk depth model is defined in the base plane. The white dots in this figure refer to the depth calculated with the noise reduction, and black dots refer to the depth calculated without noise reduction. The difference between these two depth distributions was small. The test result yielded a value of approximately for the depth of the circular dent. The average depth distribution in the range from was . The measured results showed good accordance with the actual value.
Depth Measurement of the Optic Nerve Head
The proposed technique was evaluated using 12 stereo image pairs of the ONH. All the subjects had normal eyes exception for refractive errors. The mean age of the subjects was (range ). The mean refractive error of the subjects was (range ). This experiment assumed a focal length of for the subjects’ eyes. Therefore, was set to and the resolution of the images was per pixel. The focal length value of was obtained from the data of the simplified Gullstrand eye model.27
The depth value of the same ONH was measured using the HRT. Three topographic images were obtained for each eye, and the mean topographic image was generated. Each image consisted of , with each pixel corresponding to the retinal height at its location. The disk margin was determined by using a contour line of the ONH that was drawn by an ophthalmologist.
The first and second columns of Fig. 8 show the depth map generated from the stereo retinal fundus image pair and the topographic images obtained using the HRT, respectively. The results of the depth measurements obtained by using the stereo image pair were in accordance with the results of the HRT at some level.
The correlation coefficient between two depth distributions on the line running through the point with the maximum depth value was calculated for a comparison of the stereo fundus camera and the HRT. For this comparison experiment, 16 equally spaced points were selected from a total of 140 points in the horizontal direction of the depth map.
The third column of Fig. 8 shows the horizontal cross section of the depth distributions along with the correlation coefficients of the comparison results. In the point diagram, the white dots refer to the depths obtained by using the stereo fundus camera, and the black dots refer to the depth obtained using the HRT. The depth values obtained from the stereo fundus camera were horizontally shifted for comparison with the results of the HRT through manual procedures.
Table 1 presents the age and refractive power of the subjects, the calculated correlation coefficients, and the -values obtained in the analysis results. There were four cases with significant correlation with , and four cases with good correlation with to 0.9 . The mean of the correlation coefficients was 0.80 . There were three cases with poor correlation with ; however, in these three cases, the distribution patterns were similar to each other. The relation between the refractive power of the eyes and the correlation coefficients are shown in Table 1. There was no correlation between the refraction and correlation coefficients ( , ).
Correlation coefficient between depth values obtained from stereo fundus camera and HRT. (RE)=right eye; (LE)=left eye; D=diopter .
|Oculus||Age||Refractive power||r||p value|
The displacements of stereo image pairs could be compensated by employing the registration technique in which pixels from regions other than the ONH region were used. It is difficult for actual eyes to determine the magnitude of displacements. Instead of estimating the accuracy of the registration, the effects of the displacements on the calculation results of the depth values were simulated by changing the amount of the displacement. Figure 9 shows different profiles of the depth values with varying horizontal displacement in the stereo image ranging from (further separation between the stereo image pair) to . The reference point (marked by ) is the location having the maximum cross correlation coefficient in the proposed registration technique. The depth values of the ONH region were increased by , owing to the increase in the amount of displacement by . The maximum amount of horizontal displacement in the stereo image pair used in this study was . From these facts, it is inferred that the registration is an indispensable procedure to obtain accurate depth values. The amount of displacement presumed from the pixel values in the stereo image pair will be different for the registration approach. However, it is assumed that the differences in the displacement calculated by various methods are a few pixels. Thus, the differences between the depth values are considered to be small.
In the disparity detection step, the thresholds of the cross correlation coefficient and the texture were set with the viewpoint of eliminating low-trust results. If the thresholds are set to a high value, only the high-trust corresponding points are determined. However, the number of corresponding points determined is decreased in such cases. Thus, the undetermined points cannot be interpolated by using the disparities of the surrounding reference points. The threshold values should be automatically adjusted according to the condition of the stereo image pair.
The median filter was effective in reducing noise in the depth maps without substantial deformation of the curve shape. In Fig. 10 , some incorrect depth values can be observed; these are labeled as “noise” in the depth maps without the noise reduction. It is considered that the inaccuracy was caused by an error while searching for the corresponding points. The noise tended to appear in regions in which the amount of textures was larger than the threshold, but differences in the color between the left and right image were large. In the proposed method, the amount of texture, in the ROI used for the disparity detection was presumed by using the contrast defined as the difference between the maximum and minimum pixel values in the ROI. However, for the case where the color gradient is smooth in the ROI, there may not be a considerable amount of texture even if the contrast is high. Therefore, it might be better to use only the edge-extracted image to evaluate the amount of texture in the ROI. If a property of a subject can be presumed, the range of admissible disparity values will be limited to drop a sharply changed disparity as a wrong result. Moreover, because the searching range can also be limited, not only the appearance of noises but also the calculation time will be reduced.
The axis length of the eye was assumed as for all cases in this study. However, there are individual variations in the focal lengths of actual eyes. The refraction index of the eye also varies from place to place on the surface of the subject’s cornea in astigmatic eyes. These are considered the causes for the differences in the obtained depth values between the stereo fundus camera and the HRT in absolute level. Moreover, the camera lens distortion should be connected to obtain more accurate quantitative values.
In this study, we conduct quantitative measurements of the depth value of an ONH region from stereo retinal images. The measurement results obtained when the disk depth model is used are approximately consistent. The depth values obtained from the stereo image pair are in accordance with the results of the HRT. These depth values might be useful as assisting parameters for ophthalmologists in the diagnosis of degrees of glaucoma.
This work was partly supported by a grant for the Knowledge Cluster Creation Project from the Ministry of Education, Culture, Sports, Science and Technology, Japan. The authors would like to acknowledge the contribution of R. Shiraki of Gifu University for the acquisition of the HRT data.