Tumor segmentation from breast magnetic resonance images using independent component texture analysis

Abstract. A new spectral signature analysis method for tumor segmentation in breast magnetic resonance images is presented. The proposed method is called an independent component texture analysis (ICTA), which consists of three techniques including independent component analysis (ICA), entropy-based thresholding, and texture feature registration (TFR). ICTA was mainly developed to resolve the inconsistency in the results of independent components (ICs) due to the random initial projection vector of ICA and then accordingly determine the most likely IC. A series of experiments were conducted to compare and evaluate ICTA with principal component texture analysis, traditional ICA, traditional principal component analysis (PCA), fuzzy c-means, constrained energy minimization, and orthogonal subspace projection methods. The experimental results showed that ICTA had higher efficiency than existing methods.


Introduction
Breast magnetic resonance imaging (MRI) has gradually gained much popularity in clinical use because results have shown that the screening accuracy of MRI is significantly higher than that of mammography and ultrasound. 1 Currently, doctors generally rely on breast MRI for obtaining the region of tumor since it is usually an important sign of breast cancer for diagnosis. Based on these considerations, this study proposes a new scheme that adopts mathematical algorithms from multispectral image processing techniques for specific object contrast enhancement. The tumor region can be segmented from contrast-enhanced images and shown as a binary image. We anticipate that the generated binary tumor region images will aid doctors in clinical diagnosis.
Over the years, many computer-assisted methods have been developed for analyzing single-spectral MRI, such as principal component analysis (PCA), 2 eigenimage analysis, 3 neural networks, 4 and fuzzy c-means (FCMs). 5 Eigenimage analysis has been shown to be effective in segmentation and feature extraction, and neural networks have been found to perform well in segmenting brain tissues and have been compared with classical maximum likelihood methods. However, because multispectral images provide more information for processing or analysis, multispectral analysis techniques can be used to improve the performance. Hence, several methods have been developed for processing multispectral MRIs, such as orthogonal subspace projection (OSP) 6 and Kalman filter, 7 but both of them require prior knowledge. With these considerations, we have developed a new method called the independent component texture analysis (ICTA) to segment the tumor region in multispectral breast MRIs. ICTA comprises three techniques: independent component analysis (ICA), entropy-based thresholding (ET), and texture feature registration (TFR). Among them, ICA, originally, is a blind source separation (BSS) method in the signal processing field, and it is a powerful tool for feature extraction and data representation such as speech recognition, image recognition, and statistical analysis. 8 The proposed method initially assumes multispectral breast MRIs to be gray-level values in a three-dimensional space composed of several independent components (ICs) that can be regarded as different tissues of the breast such as fat, glands, tumor mass, and muscle. First, the ICA technique is used to separate the ICs. Second, the binary images that indicate the suspicious tumor region are generated using the ET technique. Finally, texture feature extraction is employed to find the consistency of the tumor texture feature in suspicious tumor regions, which is called TFR. The texture features adopted for feature description are called the texture spectrum (TS), which is based on the varying relationship between the gray levels of both image pixels and surrounding pixels. We divided the TS according to three feature descriptors named black-white symmetry (BWS), geometric symmetry (GS), and degree of direction (DD). 9 According to the experimental results, the three feature values may effectively reflect differences between tumors and normal tissues. The binary image of the tumor region selected by TFR is the output of proposed ICTA method that could assist doctors in their diagnosis. A flowchart of the proposed ICTA is shown in Fig. 1. The performance of ICTA is compared with that of principal component texture analysis (PCTA), FCM, constrained energy minimization (CEM), and OSP methods using a set of breast MRIs to evaluate the feasibility of this new method in medical and clinical applications.
The remainder of this paper is organized as follows: Sec. 2 presents the existing multispectral image blind separation technique, ICA. Section 3 describes in detail the proposed method, ICTA. Section 4 explains the experiments and their results. Finally, the conclusions are presented in Sec. 5.
2 Independent Component Analysis ICA has been developed to solve BSS problems such as the cocktail party problem, and it is an extension of the covariance-based PCA method. The observed signals are considered to be a linear combination of the original signals and a mixing matrix. The goal is to find the mixing matrix. In this paper, we applied the ICA technique to separate multispectral breast MR images of different tissues. First, let us define an n-dimensional original signal denoted as a vector S ¼ ðs 1 ; s 2 ; : : : ; s n Þ T . Through linear transformation, we obtain an m-dimensional observed signal denoted as a vector X ¼ ðx 1 ; x 2 ; : : : ; x m Þ T . We assume that the linear transformation X is composed of a mixing matrix A of size m × n and the original signal S 2 6 6 6 4 . .
For simplicity, we rewrite the above equation as follows: To obtain the mixing matrix A, we compute its inverse W ¼ A −1 and obtain the IC asŜ S ¼ S: Currently, many algorithms have been developed for the implementation of ICA, but most of them either require complex computation or have slow convergence. In this paper, we applied the fast fixed-point algorithm (FastICA) 10 proposed by Hyvarinen and Oja for the effective implementation of ICA. The principal advantages of this algorithm are fast convergence and simple computation.

Independent Component Texture Analysis
Because ICs generated by ICA are inconsistent and cannot identify tissue automatically, this paper proposes ICTA, which combines ICA, ET, and TFR, to cope with the aforementioned problems of ICA and accordingly determine the target tumor region from ICs. The following sections describe ET and TFR in more detail.

Entropy-Based Thresholding
In ICTA, ET (Refs. 11 and 12) is first used to segment suspicious regions from an IC. For a start, we let t ij be the (i, j)'th element of a co-occurrence matrix W that considers the graylevel transitions between two adjacent pixels. We define it as where The probability of a transition from gray level i to j is obtained as Assume that t is the threshold used to threshold an image. Then, t partitions the co-occurrence matrix defined by Eq. (6) into four quadrants, A, B, C, and D, as shown in Fig. 2. These four quadrants can be further grouped into two classes. We assume that pixels with gray levels above the threshold are assigned to the foreground (objects) and those equal to or below the threshold are assigned to the background. Quadrants A and C correspond to local transitions within the background and foreground, respectively, whereas quadrants B and D represent transitions across boundaries between the background and foreground. The probabilities associated with each quadrant are then given by The probabilities in each quadrant can be further obtained by so-called cell probabilities: which are conditional probabilities for a specific quadrant. Three definitions-local entropy, joint entropy, and global entropy-can be derived on the basis of the cell probabilities, each of which yields a different method. According to experimental results, the threshold obtained from the local entropy is better than that obtained from the joint and global entropies. Therefore, we focused on the local entropy in this study.

Local entropy
Because quadrants A and C contain local transitions from background to background (BB) and objects to objects (FF), respectively, the local entropy of BB, denoted by H BB ðtÞ, and the local entropy of FF, denoted by H FF ðtÞ, can be defined as By summing up the local within-class transition entropies of the foreground and the background, a second-order local entropy, denoted by H LE ðtÞ, has been derived by Pal and Pal 11 as The LE method proposed by Pal and Pal 11 selects a threshold t LE by maximizing H LE ðtÞ as defined in Eq. (12) over t so that 3.2 Texture Feature Registration After ICs are segmented by ET, the TS (Ref. 9) is then applied to extract texture features from the segmented region (suspicious region) for identifying the optimum IC. TS is intended for feature description based on the varying relationship between the gray levels of the image pixels and the surrounding pixels. It has shown high efficiency in tissue discrimination. The feature value (TS) calculated from the suspicious region is then compared with the value calculated from a proven tumor region to define its probability and determine the most probable IC, and so is referred to as "TFR." In TFR, TS is subdivided into three feature descriptors named BWS, GS, and DD, 9 which are described as follows.

Texture spectrum
A group of vectors V ¼ fV 0 ; V 1 ; : : : ; V 8 g is defined for the pixel of which V 0 represents the gray level of the mask central pixel and V 1 ; V 2 ; : : : ; V 8 represent the gray levels of the surrounding eight points. The relationship E between the central pixel V 0 and the surrounding eight pixels is The relationship between the central pixel and its eight adjacent pixels is defined as a texture unit (TU), TU ¼ fE 1 ; E 2 ; : : : ; E 8 g, with each element having three possibilities. Thus, the possible available number of TU is 3 8 ¼ 6561, which is associated with a TU code N TU , defined as The surrounding eight pixels are assigned letters a to h clockwise, as illustrated in Fig. 3. Using the TS, it is possible to define the following texture features that represent the textures of images: BWS: GS: DD: where SðiÞ represents the occurrence frequency of the i'th TU code, i ¼ 0; : : : ; 6560 and S j ðiÞ represents the occurrence   frequency of the i'th TU code with j as the starting position, j ¼ 1; 2; : : : ; 8 (corresponding to a; b; : : : ; h in Fig. 3).
For the implementation of TFR, the tumor regions of T1, T2, and PD were first identified by experienced radiologists and then the TS was used to extract the texture features from these known tumor regions. Different sections in cases were utilized to obtain the reference tumor texture features. Table 1 shows the texture feature mean values extracted from the proven tumor regions.

Experimental Results
In this section, the performances of CEM, OSP, FCM, PCTA, and the proposed ICTA are evaluated with breast MRI datasets.

Experimental Data
In total, 16 cases of breast MRIs were acquired from Tri-Service General Hospital in Taiwan for performance evaluation, of which 11 cases contain tumor and five cases without tumor. The MRIs were performed on a 1.5T MAGNETOM Vision Plus system (Siemens, Erlangen, Germany). One demonstrated case which contains a medium size tumor is selected as an example and shown in Fig. 4 with four different sequences (bands): PD-weighted spectral image acquired with TR∕TEðrepetition time∕echo timeÞ ¼ 3000∕ 15 ms, T1-weighted spin-echo image acquired with TR∕TE ¼ 832∕20 ms, T2-weighted spin-echo image acquired with TR∕TE ¼ 3000∕105 ms, and T1_FS (T1weighted fat-saturated image). The resolution and size of each band are 8-bit gray level and 427 × 427, respectively. The slice thickness of all MRIs is 2 mm.
In each case, the standard tumor region required for performance evaluation was conformably verified by three experienced radiologists as shown in Fig. 5(d). In Fig. 5, Fig. 5(a)-5(c) shows the contours of the tumor mass delineated by three experienced radiologists, and Fig. 5(d) is the result of the intersection of Fig. 5(a)-5(c).

Abundance percentage thresholding method
In order to quantitatively study and compare the results with different methods, we converted the abundance fractional images generated by the different methods to binary images. Hence, we adopted the method proposed in Ref. 13, which used the abundance fraction percentage as the cutoff threshold value for such a conversion. First, using the normalized abundance fraction of the image with the range of [0, 1], let r be the pixel vector of the image and a 1 ðrÞ;â 2 ðrÞ; : : : ;â p ðrÞ be the estimated abundance fractions of a 1 ; a 2 ; : : : ; a p in r; then the normalized abundance fractionã j ðrÞ of each estimated abundance fractionα j ðrÞ can be obtained byã j ðrÞ ¼α j ðrÞ − min rαj ðrÞ max rαj ðrÞ − min rαj ðrÞ : Assume that α% is the cutoff abundance fraction threshold value. If the normalized abundance fraction of the pixel Journal of Electronic Imaging 023027-5 Apr-Jun 2013/Vol. 22 (2) vector is greater than or equal to α%, then the pixel will be detected as a desired object pixel and set to "1"; otherwise, it will be set to "0" and not be detected as a desired object pixel. The use of this cutoff threshold value to threshold a fractional abundance image will be referred to as the "α% thresholding method."

Receiver operating characteristic curve analysis
Using the α% thresholding method, we were able to calculate the number of detected pixels in the generated fractional abundance image. Then the performance indices were calculated using receiver operating characteristic (ROC)  analysis. 14 We define fd 1 ; d 2 ; : : : ; d p g ¼ fd i g p i¼1 as p interesting objects to be classified. Nðd i Þ is the total number of pixels specified by the i'th object signature d i , N D ðd i Þ is the number of pixels specified by the i'th object signature d i and actually detected as d i , and N F ðd i Þ is the number of falsealarm pixels that are not specified by the object signature d i but are detected as d i . Using the definitions of Nðd i Þ, N D ðd i Þ, and N F ðd i Þ, we further define the detection rate R D ðd i Þ, false alarm rate R F ðd i Þ, mean detection rate R D , and mean false rate R F as follows: where N is the total number of pixels in the image and pðd i Þ ¼ Nðd i Þ∕ P p i¼1 Nðd i Þ. Note that the mean detection rate R D as defined by Eq. (22) is the average of the detection rates for all detected objects; similarly, the mean falsealarm rate R F as defined by Eq. (23) is the average of the false-alarm rates for all detected objects. According to Eqs. (20) and (21), each fixed α% can generate a pair of R D and R F . Furthermore, increasing α% from 0% to 100% gradually can generate a set of pairs ðR D ; R F Þ. A two-dimensional ROC curve of ðR D ; R F Þ is then plotted, and the area size under the ROC curve (AUC) is measured as a performance index (classification rate).

Experimental Results
The experimental results obtained by the traditional ICA method are shown in Table 2, where the highest classification rates are given in bold face. In Table 2, four IC and four IC' (negative film of IC) values are generated in 10 run times. As seen in Table 2, the highest classification rates did not appear in the same IC in each run time. Therefore, the classification rates would not be high as shown in the last row if we chose a fixed IC as the classification result. Table 3 presents the mean values of the 10 highest classification rates manually selected from ICs in 100 run times for three cases with different tumor sizes (big, medium, and small). From the results of Table 3, we can see that the performance of the ICA method was better than the PCA method, but both of them were unable to identify the tumor region by employing a fixed IC.
The texture features calculated from the suspicious tumor region for each IC were then compared with the reference tumor texture features to find out the most probable ICs according to their Euclidean distance calculated as where S is the number of ICs, and the texture features distance between ICs and the known tumor is calculated by with a most probable IC selected using minimum distance as minfICTA 1 ; ICTA 2 ; : : : ; ICTA S g: (28) Table 4 shows the tumor classification rates for all IC and IC' values of the three cases using traditional ICA, and the minimum distances are given in bold face. From Table 4, we can see that ICs with minimum distance will have highest classification rate. Such a result also indicates that correct ICs could be found by using TFR. Table 5 shows the correct IC identification rates of TFR for 16 cases, of which 11 cases contain tumor and five cases without tumor. According to experimental experience, the result of a case will be identified as no tumor within if all the distances of ICs are >0.4. Figure 6 shows high-contrast images resulting from ICTA, PCTA, CEM, and OSP methods, where the PCTA method has the same procedure with ICTA, but using PCA instead of ICA. Figure 7 shows ROC curves for the classified sections shown in Fig. 6. Table 6 presents the area under the ROC curves for ICTA, PCTA, CEM, and OSP. Figure 8 shows the segmentation results of applying ICTA, PCTA, CEM, and OSP on the images shown in Fig. 4. In addition, we add an FCM into Fig. 8 for comparison because it is a common representative in classification. Table 7 shows the averaged tumor correct-classification and false-alarm rates obtained from applying ICTA, PCTA, FCM, CEM, and OSP in our experimental data set. From Tables 6 and 7, we can see that ICTA has good performance with lowest false-alarm rate, as does CEM. But unlike CEM, which requires prior knowledge, ICTA is a blind classification technique.

Conclusion
Although ICA shows good performance in multispectral MRI analysis, it may produce inconsistent results for ICs because ICA uses random initial projection vectors. To solve this problem, we propose a new method called the ICTA for tumor contrast enhancement and segmentation using breast MRIs. ICTA consists of three processes: ICA, ET, and TFR.
After comparison with PCTA, FCM, CEM, OSP, and contrast-injected breast MRIs, our results indicate the possibility of using ICTA for accurate diagnosis. First, ICTA is a BSS method that, unlike CEM or OSP, does not require prior knowledge. Second, the correct IC identification rate by TFR is as high as 92.09%. Third, the tumor region correctclassification rate is as high as 95.78% with a 2.05% falsealarm rate. We anticipate that the high-contrast and binary images generated by ICTA will assist radiologists in breast MRI screening and increase the accuracy of breast tumor diagnosis.