Translator Disclaimer
1 November 2010 Optimal algorithm for automatic detection of microaneurysms based on receiver operating characteristic curve
Author Affiliations +
Microaneurysms (MAs) are the first manifestations of the diabetic retinopathy (DR) as well as an indicator for its progression. Their automatic detection plays a key role for both mass screening and monitoring and is therefore in the core of any system for computer-assisted diagnosis of DR. The algorithm basically comprises the following stages: candidate detection aiming at extracting the patterns possibly corresponding to MAs based on mathematical morphological black top hat, feature extraction to characterize these candidates, and classification based on support vector machine (SVM), to validate MAs. Feature vector and kernel function of SVM selection is very important to the algorithm. We use the receiver operating characteristic (ROC) curve to evaluate the distinguishing performance of different feature vectors and different kernel functions of SVM. The ROC analysis indicates the quadratic polynomial SVM with a combination of features as the input shows the best discriminating performance.



Diabetic retinopathy (DR) is one of the most serious and most frequent eye diseases in the world. It is a complication of diabetes and the most common cause of blindness in adults. In order to prevent the damage of DR, it is very important to diagnose DR early. The analysis of digital retinal images (Fig. 1), obtained by the fundus camera, is viewed as a feasible approach because the acquisition of the retinal image is nonintrusive and low cost.

Fig. 1

Fundus retinal image (a MA marked with an arrow).


There are two categories of analysis based on retinal images. One is morphological analysis on arteries, veins, and optical cup.1 The other is based on the detection of the pathology lesion, such as hemorrhages, microaneurysms (MAs), hard exudates, and cotton wool spots.2, 3 MAs are the first unequivocal signs of DR, which appear as small reddish isolated patterns of circular shape in color fundus images. They are characterized by their diameters, which are always <125 μm.4 Because they are situated on capillaries and capillaries are not visible in color retinal images, they appear as isolated patterns (i.e., disconnected from the vascular tree).

In this paper, we propose a new method to optimize the algorithm to identify MAs. The algorithm for automatic detection of MAs generally comprises three steps. The first step aims at detecting candidates i.e., all patterns possibly corresponding to MAs based on a mathematical morphological black top hat. Then, features are extracted to characterize these candidates. And finally, a support vector machine (SVM) is used to distinguish MAs from all candidates. We use the receiver operating characteristic (ROC) curve to evaluate the distinguishing ability of different feature vectors and different kernel functions of SVM in this paper. Thus, the optimal feature vector and classifier can be determined by ROC analysis.


Automatic Detection of MA


Candidates Detection-Based Morphological Operation

Mathematical morphology operation is a way of nonlinear filtering used for image processing.5 The primary operations of mathematical morphology are dilation and erosion, denoted by ⊕ and Θ, on which other operations are based, such as opening and closing operations. In this paper, we use a black-top-hat transform. It can extract dark objects and structures in gray images. In the RGB images, the green channel exhibits the best contrast. We work on the gray image from the green channel, denoted by f. In the gray image f, MAs appear as dark patterns, small, isolated, and of circular shape.

Black-top-hat transform is the first closing operation on the image f and then subtracting the original image, defined as

Eq. 1

[TeX:] \documentclass[12pt]{minimal}\begin{document}\begin{equation} f_{\rm b} = (f \bullet {e}) - f, \end{equation}\end{document} fb=(fe)f,
where e is a structuring element and fe means closing operation to f with e. The morphological closing operation dilates an image f and then erodes the dilated image using the same structuring element, which is defined as follows:

Eq. 2

[TeX:] \documentclass[12pt]{minimal}\begin{document}\begin{equation} f \bullet e = (f \oplus e)\,\Theta\, e. \end{equation}\end{document} fe=(fe)Θe.

The blood vessels in the retinal images are usually considered as piecewise linear structures at different orientations. A total of 12 rotated linear structures are used with a radial resolution of 15 deg. The length of a linear structuring element should be such that it is larger than MAs in retinal images. MAs appear as dark patterns within s pixels, which is <125 μm. Thus, the length of the linear-structuring element must be longer than s pixels and s is variable for different sizes of original images. For each pixel, record the minimum response as the closed result applied with those 12 rotated linear-structuring elements. Then, we obtain the image f b by subtracting the original gray image from the closed image. The image f b contains MAs, which appear as the local bright regions in Fig. 2(a).

Fig. 2

(a) Image after black-top-hat transform and (b) binary image after matched filter.


Only a certain number of candidates are reasonable (for example, several dozens for each image). A matched filter is used to extract regions of interest (ROI) from f b. The matched filter is a 2-D Gaussian function with σ = 1 and has a size of s × s pixels. Gaussian difference [TeX:] $d_{\rm G}$ dG is an index image to evaluate the difference between the local region in f b and Gaussian function, calculated as

Eq. 3

[TeX:] \documentclass[12pt]{minimal}\begin{document}\begin{equation} d_{\rm G}(i,j) = \frac{\sum_{i',j' \in S} {[f_{\rm b}(i',j') - g(i',j')} ]^2}{s^2}, \end{equation}\end{document} dG(i,j)=i,jS[fb(i,j)g(i,j)]2s2,
where g is the normalized 2-D Gaussain function centered at (i, j) and (i′, j′) is the pixel position within the local region of f b centered at (i, j), with size of s × s pixels. Thus, d G(i, j) is summed over all the pixels (i′, j′) in the s×s region, and this local region is denoted by S in Eq. 3.

The ROI are the local bright regions in f b with lower Gaussian difference value [d G(i, j)]. They are extracted by a global threshold of Gaussian difference. Each ROI has a size of s × s pixels and is recorded by its center coordinates for facilitating to extract corresponding ROI from different source. The binary candidate region of each ROI is determined by thresholding, which minimizes the intraclass variance of candidate and surrounding [shown in Fig. 2(b)].


Feature Extraction for Candidates

Because MAs are mainly characterized by their shape, size, and color, we use three types of features as follows:

  • 1 shape features, such as relative size and compactness of candidate

  • 2 texture features based on the gray-level co-occurrence matrix (GLCM) of ROI from the green channel

  • 3 color features within the ROI from different color space

Shape is one of the essential characteristics of the object, which provides meaningful information. The Shape feature can be divided into two categories: one is based on the boundary of the candidate and the other is based on the region. We use the relative size and compactness to describe the shape of candidates. Each ROI is a region of s×s pixels, where s depends on the size of the original image to ensure the length of s pixels is not <125 μm. The binary candidate of each ROI is obtained by thresholding the ROI, making the surrounding black and the candidate white. In the binary ROI, area A and perimeter P of the candidate is calculated. Relative size, s 1, is defined as the ratio of area of candidates and ROI. Compactness, s 2, is used to describe circularity of the shape of candidate, which is defined as

Eq. 4

[TeX:] \documentclass[12pt]{minimal}\begin{document}\begin{equation} \begin{array}{*{20}c} {s_1 = A/s^2 }, \\[2pt] {s_2 = 1 - 4\pi A/P^2 }, \\ \end{array} \end{equation}\end{document} s1=A/s2,s2=14πA/P2,
where A is the number of pixels in a candidate and P is the total number of pixels around the boundary of the candidate.

Texture is a significant property in digital images. It has an important role in human visual perception and offers information for recognition and interpretation. The GLCM is a powerful statistical tool that has proved its usefulness in a variety of image-analysis applications. It captures second-order gray-level information, which is mostly related to human perception and the discrimination of textures. It is common practice to utilize the 14 well-known Haralick's coefficients as the GLCM-based features.6 The coefficients are usually calculated from the average GLCM obtained by averaging the matrices calculated for 0-, 45-, 90-, and 135-deg directions. In this work, we first convert the gray scale of each ROI to 11, then calculate the average GLCM, [TeX:] $\bar C$ C¯ . Last, we select four coefficients based on GLCM. They are energy t 1, contrast t 2, local homogeneity t 3, and entropy t 4.

Eq. 5

[TeX:] \documentclass[12pt]{minimal}\begin{document}\begin{equation} \displaystyle\begin{array}{*{20}c} {t_1 = \sum\limits_{i,j = 1}^N {\overline{C}^2 (i,j)} }, \\[2pt] {t_2 = \sum\limits_{i,j = 1}^N {(i - j)^2 \overline{C}(i,j)} }, \\[2pt] {t_3 = \sum\limits_{i,j = 1}^N \displaystyle{\frac{1}{{1 + (i - j)^2 }}\overline{C}(i,j)} }, \\[2pt] {t_4 = \sum\limits_{i,j = 1}^N {\bar C(i,j)} \log \overline{C}(i,j)}. \\[2pt] \end{array} \end{equation}\end{document} t1=i,j=1NC¯2(i,j),t2=i,j=1N(ij)2C¯(i,j),t3=i,j=1N11+(ij)2C¯(i,j),t4=i,j=1NC¯(i,j)logC¯(i,j).

Intensity is the only available information in the gray image, but color images provide more abundant color information, except intensity. The color contrast is a useful feature. It is defined as the sum of squared differences between the candidate region and its surroundings.4 We extract color features in RGB color space first, defined as

Eq. 6

[TeX:] \documentclass[12pt]{minimal}\begin{document}\begin{equation} \begin{array}{*{20}c} {c_1 = [\mu _{{\rm ext}} (R) - \mu _{{\mathop{\rm int}} } (R)]^2 }, \\[2pt] {c_2 = [\mu _{{\rm ext}} (G) - \mu _{{\mathop{\rm int}} } (G)]^2 }, \\[2pt] {c_3 = [\mu _{{\rm ext}} (B) - \mu _{{\mathop{\rm int}} } (B)]^2 }, \\ \end{array} \end{equation}\end{document} c1=[μext(R)μint(R)]2,c2=[μext(G)μint(G)]2,c3=[μext(B)μint(B)]2,
where [TeX:] $\mu _{{\mathop{\rm int}} }$ μint is the mean value on the candidate region and [TeX:] $\mu _{{\rm ext}}$ μext is the mean value on the surroundings in each ROI, and R, G, B in parentheses represent that the calculation is executed in R, G, and B channels, respectively.

The selection of these color features is a complicated task due to the variety of color models. The RGB space of the original image is transformed to hue, saturation, and value space (HSV) because HSV color space is more appropriate since it allows the value component to be separated from the other two color components. Then we extract two kinds of color feature from the hue and saturation components as

Eq. 7

[TeX:] \documentclass[12pt]{minimal}\begin{document}\begin{equation} \begin{array}{*{20}c} {c_4 = [\mu _{{\rm ext}} (H) - \mu _{{\mathop{\rm int}} } (H)]^2 }, \\[4pt] {c_5 = [\mu _{{\rm ext}} (S) - \mu _{{\mathop{\rm int}} } (S)]^2 }. \\ \end{array}\vadjust{\vspace*{-2pt}\pagebreak} \end{equation}\end{document} c4=[μext(H)μint(H)]2,c5=[μext(S)μint(S)]2.

In order to select the optimal feature vector, we construct several feature vectors, denoted by S, T, C1, C2, A1, and A2. Here, S = [s 1,s 2], T = [t 1,t 2,t 3,t 4], C1 = [c 1,c 2,c 3], C2 = [c 4,c 5], A1 = {S,T,C1}, and A2 = {S, T,C2}. We test these feature vectors by SVM in Section 2.3.


Validation of MA Based on SVM

The SVM, first introduced by Vapnik, is a learning algorithm for two-class classification. It is widely used in pattern recognition applications. It is based on strong foundations from the broad area of statistical learning theory according to structural risk minimization. A SVM is known for its good performance. The basic principle behind a binary SVM is to find the hyperplane that best separates vectors from both classes in feature space while maximizing the distance from each class to the hyperplane.7 There are both linear and nonlinear approaches to a SVM. If the two classes are linearly separable, then the SVM attempts to find the optimal separating hyperplane by maximizing the margin between both classes. The margin is [TeX:] $2/\!\left\| w \right\|^2$ 2/w2 . For a linear SVM, to find the optimal hyperplane is equal to min [TeX:] $\frac{1}{2}\left\| w \right\|^2$ 12w2 . When the two classes are nonlinearly separable, the SVM computes the optimal separating hyperplane by minimizing the following equation:

Eq. 8

[TeX:] \documentclass[12pt]{minimal}\begin{document}\begin{equation} J(w) = \frac{1}{2}\left\| w \right\|^2 + C\sum\limits_{i = 1}^{\rm N} {\xi _i }, \end{equation}\end{document} J(w)=12w2+Ci=1Nξi,
where the constant C > 0 is user defined and determines the trade-off between the maximization of the margin and minimization of the classification error and [TeX:] $\xi _i$ ξi are slack variables introduced for nonlinearly separable classes.

Different kernel functions determine the classification performance of the SVM. Thus, the selection of the kernel function is important to the SVM. We compare the performances of commonly used kernel functions based on the receiver operation characteristic (ROC) curve. The commonly used kernels are defined as

Eq. 9

[TeX:] \documentclass[12pt]{minimal}\begin{document}\begin{eqnarray} \begin{array}{*{20}c} {{\rm Linear}\,{\rm function{:}}\; K(x_1, x_2) = \left\langle {x_1, x_2 } \right\rangle },\\ {{\rm Polynomial}\,{\rm (poly){:}}\; K(x_1, x_2) = (\left\langle {x_1, x_2 } \right\rangle + 1)^p },\hspace*{-10pt} \\ {{\rm Gaussian}\,{\rm radial}\,{\rm basis}\,{\rm function}\,{\rm (rbf){:}}\; K(x_1, x_2) = e^{{{ - \left| {x_1 - x_2 } \right|^2 } \mathord{\left/ {\vphantom {{ - \left| {x_1 - x_2 } \right|^2 } {2\sigma ^2 }}} \right. \kern-\nulldelimiterspace} {2\sigma ^2 }}} }\!, \end{array}\nonumber\hspace*{-10pt}\\ \end{eqnarray}\end{document} Linearfunction:K(x1,x2)=x1,x2,Polynomial(poly):K(x1,x2)=(x1,x2+1)p,Gaussianradialbasisfunction(rbf):K(x1,x2)=ex1x22x1x222σ22σ2,
where p is order of polynomial and usually takes 2 or 3 and σ is width of the rbf and controls its functionary range.


Optimizing Features Vector and Classifier Based on ROC Curve

The ROC is a performance measure commonly used to compare different classifiers. The ROC curve can be drawn using sensitivity as the x coordinate and 1-specificity as the y coordinate.8 Sensitivity describes the probability of a positive test among all positive samples and indicates the probability of a negative test among all negative samples. Therefore, the diagonal in an ROC plot is the performance of random guessing. The ROC curves move toward the upper left corner, indicating rising accuracy of performance. The area under the ROC curve (AUCROC) is an appropriate performance measure. The AUCROC value ranges from 0.5 to 1. The larger the AUCROC is and the closer it is to 1.0, the higher the validity of the classifiers is. Conversely, the nearer the AUCROC is to 0.5, the lower the validity of the classifiers is. If AUCROC is 1, then this means the classifier is perfect.


Results and Discussion


Sample Sets

MAs can easily be confounded with other dark patterns. One of the major problems in detection of a MA is to establish a “gold standard,” namely, a set of annotated samples for learning and testing. In this paper, we used the 50 annotated retinal images from the Retinopathy Online Challenge database.9 The 50 images were from patients with diabetes without known diabetic retinopathy at the moment of photography; they represent a random sample of unique patients with “red lesions” from a large (>10,000 patients) diabetic retinopathy screening program. These images were taken with Topcon NW 100, NW 200, or Canon CR5-45NM “nonmydriatic” cameras at their native resolution and compression settings. The retinal specialist annotations were obtained from a combination of three ophthalmologists with retinal fellowship training. The first 20 images are for training, whereas the remaining 30 are for testing in this paper.

We extract all annotated MA regions as true positive samples and ∼30 spurious objects as true negative samples from each training image and construct a training sample set, size of 727. And we obtain a test sample set, size of 1112, in the same manner.


Experimental Results

In order to identify the most favorable feature vector to distinguish real MAs from spurious objects, the feature vector shape features S, texture features T, and color features C1, C2, and the different combinations of them, A1 and A2, are respectively used as the input of the SVM. ROC curves for different feature vectors are shown in Fig. 3.

Fig. 3

ROC curves for different feature vectors.


The diagonal in Fig. 3 is the ROC of the SVM using C1 or C2 or S as feature vector, separately. They are laid over each other and have no diagnostic value. The ROC for T, A1, and A2 have higher AUCROC. Table 1 shows that the ROC for A2 has the highest AUCROC and the optimal classification performance among these feature vectors.

Table 1

Sensitivity and specificity of different feature vectors for SVM.


An SVM with different kernel functions has a significant difference of classification performance. In this paper, the commonly used kernel functions, such as linear function, Gaussian rbf, and polynomial kernel, are discussed. Figure 4 shows these ROC curves of SVM using these different kernel functions, and the best result is obtained by using the polynomial function kernel with p = 2. Table 2 shows that the ROCs for linear and polynomial functions have higher areas than the one for rbf, and the ROC for the polynomial function with p = 2 has the largest AUCROC value (0.7574), which is slightly larger than the one for the linear function. This indicates that the SVM using the polynomial function with p = 2 has the highest classification performance among the above-mentioned kernels.

Fig. 4

ROC curves for different kernels of SVM.


Table 2

Sensitivity and specificity of different kernels of SVM.

LinearrbfPoly 2Poly 3

Different thresholds can bring different classification results. Thus, the threshold is very important for classification performance of the classifier. Here, sensitivity × specificity is used as a performance measure to determinate which threshold has the better diagnostic value.9 When threshold is set at −0.9, sensitivity × specificity reached its maximum, as shown in Fig. 5. Because the threshold of classification is −0.9, the sensitivity is 64% and the specificity is 80%.

Fig. 5

Threshold for SVM.




In this paper, we present a new method to optimize the algorithm of automatic detection of MAs. The MA detection algorithm comprises three steps: candidate detection, feature extraction, and classification. We use a ROC curve to evaluate the distinguishing ability of different feature vectors and different kernel functions of an SVM. The feature vector A2 has the highest AUCROC in Fig. 3 and is used for the input of an SVM, and the polynomial function kernel with p = 2 shows the best discriminating performance according to those the ROC curves in Fig. 4. As mentioned above, the ROC curve is a useful technique for estimate classification performance of classifiers. It can be used to select the appropriate feature vector and to optimize the classifier.



W. L. Yun, U. R. Acharya, and Y. V. Venkatesh, “Identification of different stages of diabetic retinopathy using retinal optical images,” Inf. Sci., 178 (1), 106 –121 (2008). Google Scholar


Y. Hatanaka, T. Nakagawa, and Y. Hayashi, “CAD scheme to detect hemorrhages and exudates in ocular fundus images,” Proc. SPIE, 6514 65142M1 (2007). Google Scholar


B. Zhang, X. Wu, J. You, Q. Li, and F. Karray, “Detection of microaneurysms using multi-scale correlation coefficients,” Pattern Recogn., 43 (6), 2237 –2248 (2010). Google Scholar


T. Walter, P. Massin, and A. Erginay, “Automatic detection of microaneurysms in color fundus imagews,” Med. Image Anal., 11 (6), 555 –566 (2007). Google Scholar


A. Sopharak, B. Uyyanonvara, S. Barman, and T. H. Williamson, “Automatic detection of diabetic retinopathy exudates from non-dilated retinal images using mathematical morphology methods,” Comput. Med. Imaging Graphics, 32 (8), 720 –727 (2008). Google Scholar


A. Gelzinis, A. Verikas, and M. Bacauskiene, “Increasing the discrimination power of the co-occurrence matrix-based features,” Pattern Recogn., 40 (9), 2367 –2372 (2007). Google Scholar


M. H. Horng, “Multi-class support vector machine for classification of the ultrasonic images of supraspinatus,” Expert Syst. Appl., 36 (4), 8124 –8133 (2009). Google Scholar


X. Yuan, G. Ao, C. Quan, J. Zhang, P. Wang, and Y. Tian, “ROC analysis of CT hemodynamic in the diagnosis of breast cancer,” Chin.-German J. Clin. Oncol., 9 (3), 165 –168 (2010). Google Scholar


Retinopathy Online Challenge, 〈 Google Scholar
©(2010) Society of Photo-Optical Instrumentation Engineers (SPIE)
Lili Xu and Shuqian Luo "Optimal algorithm for automatic detection of microaneurysms based on receiver operating characteristic curve," Journal of Biomedical Optics 15(6), 065004 (1 November 2010).
Published: 1 November 2010

Back to Top