## 1. Introduction

Noninvasive cardiac imaging is an invaluable tool for the diagnosis and treatment of cardiovascular disease (CVD). Magnetic resonance imaging (MRI), computed tomography (CT), positron emission tomography (PET), single photon emission computed tomography (SPECT), and ultrasound (US) have been used extensively for physiologic understanding and diagnostic purposes in cardiology. These imaging technologies have greatly increased our understanding of normal and diseased anatomy. Cardiac image segmentation plays a crucial role and allows for a wide range of applications, including quantification of volume, computer-aided diagnosis, localization of pathology, and image-guided interventions. However, manual delineation is tedious, time-consuming, and limited by inter- and intraobserver variability. In addition, many segmentation algorithms are sensitive to initialization, so their results are not always reproducible and are further subject to interalgorithm variability. Furthermore, the amount and quality of imaging data that needs to be routinely acquired in one or more subjects has increased significantly. Therefore, it is crucial to develop automated, precise, and reproducible segmentation methods. Figure 1 illustrates an example of segmentation of the heart on a CT scan.

A variety of segmentation techniques have been proposed over the last few decades. While earlier approaches were often based on heuristics, recent studies employ more sophisticated and principled techniques. However, cardiac image segmentation still remains a challenge due to the highly variable nature of cardiac anatomy, function, and pathology.^{2} Furthermore, intensity distributions are heavily influenced by the disease state, imaging protocols, artifacts, or noise. Therefore, many researchers are seeking techniques to deal with such constraints. The research in cardiac image segmentation ranges from the fundamental problems of image analysis, including shape modeling and tracking, to more applied topics such as clinical quantification, computer-aided diagnosis, and image-guided interventions.

In this review, we aim to provide an overview of cardiac segmentation methods applied to images from major noninvasive modalities such as US, PET/SPECT, CT, and MRI. We focus on the segmentation of the cardiac chambers and the whole heart applied to static and gated images (obtained through the cardiac cycle). In addition, we also discuss important clinical applications, characteristics of imaging modalities, and validation methods used for cardiac segmentation. We do not discuss coronary vessel tracking, which is a separate topic. We hope that this article can serve as a useful guide to recent developments in this growing field. The review is organized as follows. The clinical background of cardiac image segmentation is discussed in Sec. 2. Numerous segmentation methods are described in Sec. 3. Cardiac imaging modalities are reviewed in Sec. 4. Approaches to validation of the segmentation results are discussed in Sec. 5. Concluding remarks are given in Sec. 6.

## 2. Clinical Background

CVD is the major cause of morbidity and mortality in the western world. More than 2,200 patients die of CVD each day in the United States alone.^{3} CVD involves a variety of disorders of the cardiac muscle and the vascular system. The common causes of CVD include ischemic heart disease and congestive heart failure.^{4} Cardiac imaging has played a crucial and complementary role in the diagnosis and treatment of patients with known or suspected CVD. In the case of ischemic heart disease, the first consequence of the disease is a change in myocardial perfusion, which can be assessed by SPECT and PET or by MRI.^{5} In particular, the perfusion deficit leads to metabolic changes in myocardial tissues that can be assessed by PET. Myocardial ischemia can further diminish the ejection of blood by reducing the contractile capacity of the heart, which can be analyzed through the myocardial contractile function using US, PET/SPECT, CT, or MRI.

Assessment of the left ventricle (LV) contractile function is essential for diagnosis and prognosis of CVD. The LV contractile function is commonly analyzed as it pumps oxygenated blood to the entire body.^{6}^{,}^{7} The computer-aided or fully automated segmentation of the ventricular myocardium is generally used to standardize analysis and improve the reproducibility of the assessment of contractile cardiac function.^{8} In addition, it forms an important preliminary step to provide useful diagnostic information by quantifying clinically important parameters, including end-diastolic volume (EDV), end-systolic volume (ESV), ejection fraction (EF), wall motion and thickening, wall thickness, stroke volume (SV), and transient ischemic dilation (TID).^{9} Furthermore, segmentation of the LV is necessary for the quantification of myocardial perfusion,^{10} the size of the myocardial infarct,^{11} or myocardial mass.^{12} Accurate determination of these parameters can help with a variety of diagnostic or prognostic applications in cardiology.

In addition to the LV segmentation, the whole heart, including the right ventricle, atria, aorta, and pulmonary artery,^{13} is often segmented for 3-D visualization purposes to analyze coronary lesions or other cardiac abnormalities.

The primary application of cardiac segmentation has been the measurement of cardiac function. The most commonly used index of LV contractile function is the EF, which is the index of volume strain (change in volume divided by initial volume).^{7}^{,}^{14} The EF can be derived from EDV and ESV given by

## (1)

$$\mathrm{EF}\text{\hspace{0.17em}}(\%)=\frac{\mathrm{EDV}-\mathrm{ESV}}{\mathrm{EDV}}\times 100.$$

Methods to quantify wall motion^{15}^{–}^{18} can rely on detecting endocardial motion by observing image intensity changes, determining the boundary wall of the ventricle, or attempting to track anatomical myocardial landmarks.^{19}
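The EF and SV definitions above reduce to simple arithmetic; a minimal sketch (function names and example volumes are illustrative, not from the text):

```python
def ejection_fraction(edv_ml: float, esv_ml: float) -> float:
    """EF (%) = (EDV - ESV) / EDV * 100."""
    if edv_ml <= 0:
        raise ValueError("EDV must be positive")
    return (edv_ml - esv_ml) / edv_ml * 100.0


def stroke_volume(edv_ml: float, esv_ml: float) -> float:
    """SV (ml): the volume ejected per beat, EDV - ESV."""
    return edv_ml - esv_ml
```

For example, an EDV of 120 ml with an ESV of 50 ml gives an SV of 70 ml and an EF of roughly 58%.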

Wall thickening (WT) is usually measured using centerlines,^{15}^{,}^{16} which can be defined in terms of percentage of systolic thickening and calculated per landmark point as

## (2)

$$\mathrm{WT}\text{\hspace{0.17em}}(\%)=\frac{{w}_{\mathrm{es}}-{w}_{\mathrm{ed}}}{{w}_{\mathrm{ed}}}\times 100,$$

where ${w}_{\mathrm{ed}}$ and ${w}_{\mathrm{es}}$ denote the wall thickness at end-diastole and end-systole, respectively.^{13} Moreover, TID of the LV is a specific and sensitive parameter for detecting severe coronary artery disease (CAD).^{20} TID is defined as the ratio of the blood pool volume after stress compared with rest, and it has mostly been measured by SPECT.^{20}
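Eq. (2) and the TID ratio likewise reduce to elementary computations; a minimal sketch (function names are illustrative):

```python
def wall_thickening_percent(w_ed: float, w_es: float) -> float:
    """WT (%) per Eq. (2): systolic thickening relative to end-diastolic wall thickness."""
    return (w_es - w_ed) / w_ed * 100.0


def transient_ischemic_dilation(stress_volume_ml: float, rest_volume_ml: float) -> float:
    """TID: ratio of the LV blood pool volume after stress to that at rest."""
    return stress_volume_ml / rest_volume_ml
```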

## 3. Segmentation Techniques

In this section, we review several techniques for the segmentation of heart chambers and the whole heart. Cardiac image segmentation techniques can be divided into four main categories: (1) boundary-driven techniques, (2) region-based techniques, (3) graph-cuts techniques, and (4) model-fitting techniques; multiple techniques are often combined to address the segmentation problem efficiently. We describe the methods in each category and discuss their advantages and disadvantages.

### 3.1. Boundary-Driven Techniques

#### 3.1.1. Active contours (or snakes)

Boundary-driven segmentation techniques are based on the concept of evolving contours, deforming from the initial to the final position. One of the most widely used methods is the “active contour” model, which is also referred to as “snakes.”^{21} The active contour model allows a curve defined in the image domain to evolve under the influence of internal and external forces. The internal force is imposed on the contour in order to control the smoothness while the external force is usually derived from the image itself. An edge detector function is utilized as the external force in the classical active contour model. Most active contour models only detect objects with edges defined by the gradients. Kass et al.^{21}^{,}^{22} were the first to formulate the classical active contour model using an energy minimization approach. The active contour model seeks the lowest energy of an objective function, where the total energy of the active contour model is defined as

## (5)

$${E}_{\mathrm{in}}={\int}_{0}^{1}{E}_{\mathrm{in}}[v(s)]\mathrm{d}s\phantom{\rule[-0.0ex]{1em}{0.0ex}}\text{and}\phantom{\rule[-0.0ex]{1em}{0.0ex}}{E}_{\mathrm{ex}}={\int}_{0}^{1}{E}_{\mathrm{ex}}[v(s)]\mathrm{d}s,$$

where $v(s)$ denotes the parameterized contour and the total energy is $E={E}_{\mathrm{in}}+{E}_{\mathrm{ex}}$. The internal and external energy terms are

## (6)

$${E}_{\mathrm{in}}(v(s))=\alpha (s){\left|\frac{dv}{ds}\right|}^{2}(\text{Elasticity})+\beta (s){\left|\frac{{d}^{2}v}{d{s}^{2}}\right|}^{2}(\text{Stiffness}),$$

## (7)

$${E}_{\mathrm{ex}}(v(s))=-[{G}_{x}^{2}(v(s))+{G}_{y}^{2}(v(s))].$$

A simple edge detector is used in Eq. (7) to formulate the external energy term ${E}_{\mathrm{ex}}$, where ${G}_{x}$ and ${G}_{y}$ denote the gradient images along the $x$ and $y$ axes, respectively.

An example of the evolving 2-D contours obtained by applying the active contour model to a sequence of US of the LV is shown in Fig. 2. These contours deform gradually to the exact object boundaries by minimizing the energy of the active contour model. Although the active contour model has been a seminal work, it has some limitations. For instance, it is sensitive to the initialization as the contour may get stuck to a local minimum near the initial contour. The curve may pass through the boundary of the field of view of the image when the image has high amounts of noise. In addition, the accuracy of the active contour model depends on the convergence criteria employed in the minimization technique. A few attempts have been made to improve the original model by adopting new types of external field, including gradient vector flow^{23} and the balloon model.^{24}
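A discrete version of the snake energy in Eqs. (5)-(7) can be evaluated directly; the sketch below is a simplified illustration (not the authors' implementation), using finite differences for the derivative terms and the negative squared gradient magnitude as the external energy:

```python
import numpy as np

def snake_energy(contour, image, alpha=0.1, beta=0.01):
    """Discrete total energy of a closed snake, after Eqs. (5)-(7).

    contour: (N, 2) array of (row, col) points; image: 2-D array."""
    v = np.asarray(contour, dtype=float)
    # Finite-difference approximations of dv/ds and d^2v/ds^2 (closed curve).
    d1 = np.roll(v, -1, axis=0) - v
    d2 = np.roll(v, -1, axis=0) - 2.0 * v + np.roll(v, 1, axis=0)
    e_int = np.sum(alpha * np.sum(d1**2, axis=1) + beta * np.sum(d2**2, axis=1))
    # External energy: negative squared gradient magnitude sampled at the contour.
    gy, gx = np.gradient(image.astype(float))
    rows = np.clip(np.round(v[:, 0]).astype(int), 0, image.shape[0] - 1)
    cols = np.clip(np.round(v[:, 1]).astype(int), 0, image.shape[1] - 1)
    e_ext = -np.sum(gx[rows, cols]**2 + gy[rows, cols]**2)
    return e_int + e_ext
```

A full snake implementation would minimize this energy iteratively, e.g., by greedy local moves of the contour points or gradient descent.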

#### 3.1.2. Geodesic active contour

The original active contour model can be expressed as the geodesic active contour^{26}^{–}^{29} using a level-set formulation.^{30} This method enables an implicit parameterization, allowing automatic changes in topology. The geodesic active contour extends the geometric active contour^{31} by using a geometric flow to shrink or expand a curve. It allows stable boundary detection when the image gradients suffer from large variations.^{26} The problem of fitting a contour is equivalent to finding geodesics of the minimal distance curves by minimizing the intrinsic energy given by

## (8)

$$E(v)={\int}_{0}^{1}g\{|\nabla I[v(p)]|\}|{v}^{\prime}(p)|\mathrm{d}p={\int}_{0}^{L(v)}g\{|\nabla I[v(s)]|\}\mathrm{d}s,$$

where $L(v)$ is the length of the curve and $g$ is a decreasing edge-stopping function, e.g., $g(r)=1/(1+{r}^{2})$. The corresponding level-set evolution equation is

## (9)

$$\frac{\partial \varphi}{\partial t}=g(I)|\nabla \varphi |\left[\mathrm{div}\left(\frac{\nabla \varphi}{|\nabla \varphi |}\right)+k\right]+\nabla g(I)\cdot \nabla \varphi ,$$

where $k$ is a constant balloon-force term.^{26}

The geodesic active contour with the level-set representation has become the basis of many boundary-driven segmentation techniques developed in the last decade.^{32}^{,}^{33} Although the geodesic active contour model has been applied to cardiac image segmentation, it has several limitations.^{32}^{,}^{33} One example is the sensitivity of the computed gradient value to noise because the differentiation of gray levels tends to magnify noise.
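The edge-stopping behavior can be illustrated with the common choice $g(|\nabla I|)=1/(1+|\nabla I|^{2})$ (one standard option; the text does not fix a particular $g$):

```python
import numpy as np

def edge_stopping(image):
    """g(|grad I|) = 1 / (1 + |grad I|^2): close to 1 in flat regions and
    close to 0 at strong edges, so the evolution in Eq. (9) slows at boundaries."""
    gy, gx = np.gradient(image.astype(float))
    return 1.0 / (1.0 + gx**2 + gy**2)
```

Because $g$ is computed from differentiated gray levels, it inherits the noise sensitivity noted above; in practice the image is usually smoothed (e.g., with a Gaussian) before the gradient is taken.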

### 3.2. Region-Based Techniques

In region-based segmentation techniques, regions of interest, such as the chambers as distinct from extracardiac structures, are partitioned using a selected global model that provides an approximation of the region of interest. In other words, global homogeneity properties defined within the region of interest are used to differentiate it from other regions.^{34}^{,}^{35} Hybrid techniques that combine region-based and boundary-based information have also been proposed to enhance segmentation performance.

#### 3.2.1. Mumford-Shah functional

Mumford and Shah^{36} proposed a functional utilizing a piecewise smooth model, in which the fitted function is smooth within regions but may be discontinuous across region boundaries. The Mumford-Shah functional is defined as

## (10)

$$E(f,C)=\lambda {\iint}_{R}{(f(x,y)-I(x,y))}^{2}\mathrm{d}x\mathrm{d}y+{\iint}_{R-C}{\Vert \nabla f(x,y)\Vert}^{2}\mathrm{d}x\mathrm{d}y+\mu |C|,$$

where $I$ is the image defined on the domain $R$, $f$ is a piecewise-smooth approximation of $I$, $C$ is the set of region boundaries, and $|C|$ denotes the total boundary length.

This segmentation model has some drawbacks. It is computationally expensive^{37} and is not robust in the presence of strong noise and/or missing information. To circumvent these limitations, a fuzzy algorithm was introduced into Mumford-Shah segmentation using Bayesian and maximum a posteriori (MAP) estimation.^{38} Prior knowledge has also been incorporated^{39}^{–}^{42} to overcome the problems of noise and/or missing information that commonly occur in medical imaging.

#### 3.2.2. Level-set based technique

Unlike the parametric representation, the level-set framework represents curves implicitly as the zero level set of a scalar function, as proposed by Osher and Sethian.^{43} Following the introduction of the level-set framework, Sethian,^{30}^{,}^{44} Osher and Fedkiw,^{45} and Osher and Paragios^{46} built a solid foundation of the level-set representation applied to a variety of problems. An example of LV segmentation using the level-set method is depicted in Fig. 3. The representation for contour evolution in the level-set framework is implicit, parameter-free, and intrinsic. Let $\mathrm{\Omega}\subset {\mathbb{R}}^{n}$, where $n$ is 2 or 3, denote the image domain. A contour $C\subset \mathrm{\Omega}$ can be represented by the zero level set of a higher-dimensional embedding function $\varphi (x):\mathrm{\Omega}\to \mathbb{R}$ as given by

## (11)

$$\{\begin{array}{l}C=\{x\in \mathrm{\Omega}|\varphi (x)=0\}\\ \text{interior}(C)=\{x\in \mathrm{\Omega}|\varphi (x)>0\}\\ \text{exterior}(C)=\{x\in \mathrm{\Omega}|\varphi (x)<0\}\end{array},$$

The interface is the zero level of $\varphi $ (*i.e.*, $\varphi (C(t),t)=0$ for all $t$). An evolution equation for $\varphi $ can then be derived using the normal $\overrightarrow{n}=\frac{\nabla \varphi}{|\nabla \varphi |}$ as

## (12)

$$\frac{\partial \varphi}{\partial t}=-F|\nabla \varphi |,$$

where $F$ is the speed of the contour along its normal; the contour evolution $\frac{dC}{dt}=F\overrightarrow{n}$ corresponds exactly to this evolution of $\varphi $. The level-set based segmentation method has been extensively utilized in image segmentation problems due to a variety of advantages: it is parameter-free, implicit, can change topology, and provides a direct way to estimate the geometric properties of the evolving contour. In addition, a large amount of effort has been devoted to its performance improvement.^{27}^{,}^{29}^{,}^{31}^{,}^{47}^{–}^{50}
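The implicit representation and the normal-speed evolution $\partial \varphi /\partial t=-F|\nabla \varphi |$ can be sketched in a few lines (a toy illustration with a circular contour; the explicit update scheme is an assumption made for simplicity):

```python
import numpy as np

def circle_sdf(shape, center, radius):
    """Signed distance embedding of a circle: phi > 0 inside the contour,
    phi = 0 on it, and phi < 0 outside, matching the sign convention above."""
    rr, cc = np.indices(shape)
    dist = np.sqrt((rr - center[0])**2 + (cc - center[1])**2)
    return radius - dist

def evolve(phi, speed, dt=0.5):
    """One explicit time step of d(phi)/dt = -F * |grad(phi)|."""
    gy, gx = np.gradient(phi)
    return phi - dt * speed * np.sqrt(gx**2 + gy**2)
```

A positive speed $F$ shrinks the contour (interior values of $\varphi $ decrease), and topology changes need no special handling because the contour is only ever read off as the zero level set.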

In boundary-driven techniques, the gradient is used as the criterion to stop the evolving curve. However, there are objects whose boundaries cannot be defined this way, such as smeared boundaries. Chan and Vese^{32} proposed a different model that incorporates an implicit, region-based energy functional over the boundaries $C$, combining active contours with the level-set representation by modifying the Mumford-Shah functional, i.e.,

## (14)

$$E(f,C)=\sum _{i=1}^{N}{\lambda}_{i}{\iint}_{{R}_{i}}{[{c}_{i}(x,y)-I(x,y)]}^{2}\mathrm{d}x\mathrm{d}y+\mu |C|,$$

where ${c}_{i}$ is the mean intensity of region ${R}_{i}$. For the two-region case, the associated level-set evolution equation is

## (15)

$$\frac{\partial \varphi}{\partial t}=\delta (\varphi )\left[\mu \,\mathrm{div}\left(\frac{\nabla \varphi}{|\nabla \varphi |}\right)-{|{c}_{1}-I|}^{2}+{|{c}_{2}-I|}^{2}\right],$$

where $\delta (\cdot )$ denotes the Dirac delta.

#### 3.2.3. Clustering

Clustering algorithms have been used to group image pixels with similar features in image segmentation problems. The resulting pixel-cluster memberships provide a segmentation of the image. Clustering-based segmentation methods are considered an old yet robust technique.^{51}^{–}^{54} One of the most widely used clustering techniques is the $K$-means algorithm. This approach uses an objective function that expresses the quality of a representation for $k$ given clusters. If we represent the center of each image cluster by ${m}_{i}$ and the $j$'th element in cluster $i$ by ${x}_{j}$, the objective function can be defined as

## (16)

$$\mathrm{\Phi}(\text{clusters},\text{data})=\sum _{i\in \text{clusters}}\left\{\sum _{j\in i\text{'}\mathrm{th}\text{\hspace{0.17em}}\text{cluster}}{({x}_{j}-{m}_{i})}^{T}({x}_{j}-{m}_{i})\right\}.$$

The algorithm alternates between assigning each sample to its nearest cluster center and recomputing each center as the mean of its assigned samples.^{55}
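A minimal $K$-means on scalar pixel intensities, directly minimizing the objective of Eq. (16) (a simplified sketch, not a production implementation):

```python
import numpy as np

def kmeans_1d(intensities, k=2, iters=20, seed=0):
    """Minimal K-means on scalar pixel intensities (objective of Eq. 16)."""
    x = np.asarray(intensities, dtype=float)
    rng = np.random.default_rng(seed)
    centers = rng.choice(x, size=k, replace=False)  # initialize from the data
    for _ in range(iters):
        # Assign each sample to its nearest center, then update the centers.
        labels = np.argmin(np.abs(x[:, None] - centers[None, :]), axis=1)
        for i in range(k):
            if np.any(labels == i):
                centers[i] = x[labels == i].mean()
    return labels, centers
```

The fixed seed makes the (otherwise initialization-dependent) result reproducible, which echoes the sensitivity to initialization noted below.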

Another clustering-based segmentation method is the fuzzy c-means algorithm, based on $K$-means and fuzzy set theory.^{56}^{–}^{58} The conventional fuzzy c-means method does not fully utilize the spatial information of the image. To cope with this limitation, an approach was developed to incorporate the spatial information into the objective function by indicating the strength of association between each pixel and a particular cluster (i.e., the probability that a pixel belongs to a specific cluster) in order to improve the segmentation results.^{59}
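The soft memberships that distinguish fuzzy c-means from $K$-means can be computed as follows (a sketch of the standard membership update with fuzzifier $m$; the spatial extension mentioned above is not included):

```python
import numpy as np

def fcm_memberships(samples, centers, m=2.0, eps=1e-12):
    """Fuzzy c-means membership of sample j in cluster i:
    u_ij = d_ij^(-2/(m-1)) / sum_k d_kj^(-2/(m-1))."""
    x = np.asarray(samples, dtype=float)
    c = np.asarray(centers, dtype=float)
    d2 = (x[None, :] - c[:, None]) ** 2 + eps  # (clusters, samples); eps avoids 0-division
    w = d2 ** (-1.0 / (m - 1.0))
    return w / w.sum(axis=0, keepdims=True)    # memberships in each column sum to 1
```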

In addition, the expectation-maximization (EM) algorithm using the Gaussian mixture model is one of the well-established clustering-based methods. The iterative algorithm uses the posterior probabilities and the maximum likelihood estimates of the means, covariances, and coefficients of the mixture model.^{60}^{,}^{61} Furthermore, the EM algorithm can be combined with various models such as the hidden Markov random field model in order to achieve accurate and robust segmentation results.^{62} However, clustering-based methods have a few weaknesses. The methods are sensitive to initialization, noise, and inhomogeneities of image intensities.^{63}

### 3.3. Graph-Cuts Techniques

The graph-cuts technique^{64}^{,}^{65} originated from Greig's maximum a posteriori (MAP) estimation,^{66} which used maximum flow to compute exact MAP solutions for binary images. An interactive graph-cuts technique can find a globally optimal segmentation of an image. The user selects some pixels, called "seed points," as hard constraints inside the object to be segmented as well as some pixels belonging to the background. The objective function is typically defined by boundary and regional properties of the segments. Therefore, the obtained segmentation provides the best balance of boundary and region properties among all segmentations satisfying the constraints.^{65}

In graph-cuts theory,^{65} an image is interpreted as a graph in which each pixel is connected to its neighbors. A node set $P$ and an edge set $Q$, connecting nodes $v\in P$, form a graph $G=\{P,Q\}$. Terminals are two special nodes, known as the source (s) and sink (t), which are the start and end nodes of the flow in the graph, respectively. There are two types of edges: n-links that connect neighboring pixels and t-links that connect pixels to the terminal nodes. A cost or weight ${w}_{e}$ is assigned to each edge $e\in Q$. The costs of n-links are penalties for discontinuities between pixels, and the costs of t-links are penalties for assigning the corresponding terminal to a pixel. Thus, the total cost of the n-links represents the cost of the boundary, while the total cost of the t-links indicates the regional properties. A cut $\mathrm{X}\subset Q$ is a set of edges that separates the graph into regions connected to the terminal nodes. The cost of a cut is the sum of the costs of its edges, denoted by

## (17)

$$|\mathrm{X}|=\sum _{e\in \mathrm{X}}{w}_{e}.$$

Optimal segmentation using the graph-cuts technique then amounts to finding a minimal-cost cut. An example of medical image segmentation using the graph-cuts techniques is illustrated in Fig. 4.

Several methods to find an optimal cut have been proposed, such as minimizing the maximum cut between the segments^{68} and normalizing the cost of a cut.^{69} Boykov and Kolmogorov^{70} proposed a max-flow/min-cut algorithm and compared its efficiency with Goldberg-Tarjan's push-relabel^{71} and Ford-Fulkerson's augmenting-paths^{72} algorithms. Based on the cut cost described above, an energy function consisting of a boundary term and a regional term can be formulated. Let ${l}_{p}$ be the label for a given pixel $p$, which can be either the object or the background, let $S$ be the set of pixels, and let $N$ be the set of all pairs of neighboring elements. The energy function^{73} for graph cuts can then be given by

## (18)

$$E(L)=\lambda \sum _{p\in S}{R}_{p}({l}_{p})+\sum _{(p,q)\in N}{B}_{p,q}\,\delta ({l}_{p}\ne {l}_{q}),$$

where ${R}_{p}({l}_{p})$ is the regional penalty for assigning label ${l}_{p}$ to pixel $p$, ${B}_{p,q}$ is the boundary penalty between neighboring pixels $p$ and $q$, $\lambda $ weighs the two terms, and $\delta (\cdot )$ equals 1 when its argument holds and 0 otherwise.
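This regional-plus-boundary energy can be evaluated for any candidate labeling; a small sketch (the data structures are illustrative, and a real graph-cuts solver would minimize the energy via max-flow rather than score labelings one at a time):

```python
def graphcut_energy(labels, regional, boundary, lam=1.0):
    """E(L) = lam * sum_p R_p(l_p) + sum_{(p,q) in N} B_pq * [l_p != l_q].

    labels:   dict pixel -> 0 (background) or 1 (object)
    regional: dict pixel -> (R_background, R_object) penalties
    boundary: dict (p, q) -> n-link weight B_pq"""
    e_region = sum(regional[p][labels[p]] for p in labels)
    e_boundary = sum(w for (p, q), w in boundary.items() if labels[p] != labels[q])
    return lam * e_region + e_boundary
```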

One limitation of the graph-cuts technique is that it is not fully automated, as it demands the initialization of seed points in the object and the background regions.

### 3.4. Model-Fitting Techniques

Model-fitting segmentation attempts to match a predefined geometric shape to the locations of extracted image features. A two-step procedure is usually needed: (1) generating the shape model from a training set and (2) fitting the model to a new image. The models contain information about the shape and its variations. The main tasks in model fitting are the extraction of the features and the generation of the best-fitting model from those features. Given an accurate and appropriate model, the segmentation procedure becomes an optimization problem of finding the best model parameters for a given patient image. Human heart anatomy exhibits consistent features, and therefore shape or intensity information common across hearts can be utilized as shape-prior knowledge. Prior knowledge can be used to compensate for common difficulties such as poor image contrast, noise, and missing boundaries.

Integrating prior knowledge using an explicit shape representation into the segmentation process has been a topic of interest for decades. For instance, global shape information with closed curves represented by Fourier descriptors was proposed, where a Gaussian prior was assumed for the Fourier coefficients.^{74}^{,}^{75} The shape model was built by learning the distribution of Fourier coefficients. In addition, active shape models (ASM) have been used in a variety of segmentation tasks.^{18}^{,}^{76}^{–}^{78} In brief, key landmark points on each training image generate a statistical model of shape variation, and a statistical model of intensity is built by warping each example image to match the mean shape. Principal component analysis (PCA) is applied to the key landmark points, where the sample distribution is assumed to be Gaussian. Any sample within the distribution can then be expressed as the mean shape plus a linear combination of eigenvectors.^{79} Cootes et al.^{76}^{,}^{80}^{,}^{81} built statistical models by positioning control points across training images and developed the active appearance model (AAM).^{82} An example of image segmentation based on the AAM is illustrated in Fig. 5. The landmark points should be placed in a consistent way over a large database of training shapes in order to avoid incorrect parameterization.^{77} Also, if the size of a training set is small, the model cannot capture its variability and is unable to approximate data that are not included in the training set.^{78} Furthermore, a statistical model can be incorporated to describe intersubject shape variability; for example, the dimension of the parametric contours has been reduced by the use of PCA. By projecting the shape onto the shape parameters and enforcing limits, global shape constraints have been applied to ensure that the current shape remains similar to those in the training set.^{76} Wang and Staib^{84} extended the work of Cootes et al.^{76} using a Bayesian framework to adjust the weights between the statistical prior knowledge and the image information based on image quality and the reliability of the training set. A B-splines based curve representation has also been applied to the classical active contour model.^{85}^{–}^{87}
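The PCA-based point-distribution model described above can be sketched as follows (a simplified illustration of the statistical shape model, not the cited ASM/AAM implementations):

```python
import numpy as np

def shape_pca(shapes, n_modes=2):
    """Point-distribution model: mean shape plus principal modes of variation.

    shapes: (n_samples, 2 * n_landmarks) flattened landmark coordinates."""
    X = np.asarray(shapes, dtype=float)
    mean = X.mean(axis=0)
    Xc = X - mean
    cov = Xc.T @ Xc / max(len(X) - 1, 1)       # sample covariance
    evals, evecs = np.linalg.eigh(cov)         # eigenvalues in ascending order
    order = np.argsort(evals)[::-1][:n_modes]  # keep the largest modes
    return mean, evecs[:, order], evals[order]

def synthesize(mean, modes, b):
    """New shape x = mean + P b for mode weights b; limiting b (e.g., to a few
    standard deviations per mode) keeps the shape similar to the training set."""
    return mean + modes @ np.asarray(b, dtype=float)
```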

There have been several attempts to incorporate prior knowledge of shape into the implicit shape representation. Leventon et al.^{88} incorporated shape-prior information into the level-set framework with a set of previously segmented data using the signed distance function. A shape-prior model was also proposed to restrict the flow of the geodesic active contour, where the prior shape was derived by performing PCA on a collection of signed distance functions of the training shapes. A similar approach was proposed in Ref. 89 with an energy functional, including the image gradient information and the shape of interest in geometric active contours, using distance functions to represent the training shapes. Another objective function for segmentation was proposed in Ref. 90 by applying PCA to a collection of signed distance representations of the training data. Rousson and Paragios^{91} applied a shape constraint to the implicit level-set representation to formulate an energy functional, where an initial segmentation result can be corrected by the level-set shape-prior model through PCA. They also considered a stochastic framework in constructing the shape model with two unknown variables: the shape image and the local degrees of shape deformation.

In specific applications, 3-D heart modeling was explored in Ref. 19, and four-chamber heart modeling was proposed in Refs. 1 and 92. Geometric constraints have also been incorporated into the LV segmentation problem. The model-based approach in Ref. 93 has gained a lot of attention as a solution to the image segmentation problem with incomplete image information.^{94}^{,}^{95}

Several other model-fitting methods have been investigated to date. Atlas-based segmentation was carried out based on registration, where multiple atlases were registered to a target image by propagation of the atlas image labels with spatially varying decision-fusion weights in CT scans.^{96} In addition, a deformable surface represented by a simplex mesh in 3-D space used time constraints in segmenting the SPECT cardiac image sequence in Ref. 2. Modeling of the four-chamber heart was performed for 3-D cardiac CT segmentation,^{97} where simplex meshes were used to provide a stable computation of curvature-based internal forces. Heart modeling was accomplished with a statistical shape model,^{76} and labeling was performed on mesh points corresponding to special anatomical structures, such as control points that integrate the mesh models.^{1} A whole-heart segmentation method covering the four chambers, myocardium, and great vessels in CT images was proposed in Ref. 98, exploiting ASM and the generalized Hough transform for automatic model initialization.

## 4. Applications to Specific Imaging Modalities

In this section, several modalities for cardiac examinations are reviewed and techniques used for segmentation in each modality are presented. We summarize roles and characteristics of each modality with reference to the recent work,^{99} and describe the segmentation techniques used for each modality.

### 4.1. Ultrasound Imaging

US imaging is the most widely used technique in cardiology for evaluation of contractile cardiac function. It has several advantages, including good temporal resolution and relatively low cost. It can be used to assess tissue perfusion by myocardial contrast echocardiography.^{100} Additionally, it is well-suited for image-guided interventions due to its recent advances, allowing visualization of instruments as well as cardiac structures through the blood pool.^{101} However, US imaging suffers from low SNR (signal-to-noise ratio) and speckle noise,^{102} making the LV segmentation task challenging. Moreover, the acquisition is usually performed in 2-D^{102} and therefore depends on the orientation, leading to missing boundaries and low contrast between regions of interest.^{103} US imaging of the heart involves 2-D, 2-$\mathrm{D}+t$, 3-D, 3-$\mathrm{D}+t$, and Doppler echocardiography, each of which poses different challenges. In this review, we focus primarily on the segmentation of the 3-D and 3-$\mathrm{D}+t$ data.

A recent advance in this field of cardiac imaging is three-dimensional echocardiography (3-DE). This tool was used only for research purposes in the past but, owing to recent improvements in software algorithms and transducer technology, is now used in clinical practice.^{104}^{,}^{105} 2-D and 3-D echocardiography use different transducers. 3-DE is well suited for quantifying LV mass, volumes, and EF^{104}^{,}^{105} because 2-D imaging can potentially provide biased measurements of EF.^{106}

Numerous segmentation techniques have been proposed for US imaging. A 3-D AAM was proposed,^{79}^{,}^{107} whose model was learned from manual segmentation results and combined the shape and image appearance of cardiac structures in a single model. Level-set and active contour segmentation methods have also been applied to US segmentation.^{108}^{–}^{111} A level-set based method with specialized processing was adopted to extract highly curved volumes while ensuring smoothness.^{108} Additionally, an algorithm based on deep neural networks and optimization was employed,^{112} and a discriminative classifier, the random forest, was used to delineate the myocardium.^{113} For an in-depth review of the segmentation of US images, we refer the reader to Ref. 102.

### 4.2. Nuclear Imaging (SPECT and PET)

Nuclear imaging has been an accepted clinical gold standard for the quantification of relative myocardial perfusion at stress and rest.^{114} It is also the mainstream imaging technique for estimating myocardial hypoperfusion due to coronary stenosis. Gated myocardial perfusion SPECT^{115} is also widely used for the quantitative assessment of LV function. LV regional wall motion and thickening by SPECT play an integral part in assessing coronary artery disease and determining the extent and severity of functional abnormalities.^{116} Accurate segmentation of the LV and quantification of its volume offer an objective means of determining risk stratification and therapeutic strategy.^{117} However, delineation of the endocardial surface with nuclear imaging is challenging due to the relatively low image resolution, extracardiac background activity, the partial volume effect, count statistics, and reconstruction parameters.^{118}

A few techniques have been developed for nuclear imaging segmentation. Germano et al.^{119} proposed an LV segmentation method for SPECT, which is widely used in nuclear cardiology practice, as illustrated in Fig. 6. In addition, wall motion and thickening were further investigated with the same technique.^{116} In brief, an asymmetric Gaussian was fit to each count profile in each interval of a gated myocardial perfusion SPECT (MPS) volume, from which a maximal-count myocardial surface was determined. Other well-established methods for the quantitative analysis of nuclear myocardial perfusion imaging exist, such as Corridor4DM,^{120} the Emory Cardiac Toolbox,^{121} the University of Virginia quantification program,^{122} and the Yale quantification software.^{123} These automated software tools allow highly automated definition of the LV contours and measure perfusion defect size, EF, EDV, and LV mass.

In other developments, the level-set technique was employed for the segmentation of cardiac gated SPECT images^{124} and a geometric active contour-based SPECT segmentation technique was proposed.^{125} Slomka et al.^{126} and Declerck et al.^{127} proposed a template-based segmentation method using the registration-based approach. Additionally, the 4-D (3-$\mathrm{D}+t$) shape prior was adopted in Ref. 128 using implicit shape representation of the left myocardium in SPECT image segmentation. This study extended the shape modeling to the spatiotemporal domain by treating time as the fourth dimension and applied the 4-D PCA. Faber et al.^{129} employed an explicit edge detection method to estimate endocardial and epicardial boundaries using the structural information in gated SPECT perfusion images. The 3-D ASM segmentation algorithm was adopted in Refs. 118 and 130 for cardiac perfusion gated SPECT studies and the construction of geometrical shape and appearance models. Reutter et al.^{131} used a 3-D edge detection technique for the segmentation of respiratory-gated PET transmission images and Markov random fields were adopted for 3-D segmentation of cardiac PET images.^{132}

#### 4.2.1. Gated SPECT analysis

In gated cardiac imaging, a short and cyclic image sequence is generated, representing a single heartbeat that summarizes data acquired over many cardiac cycles.^{133}^{,}^{134} Gated SPECT images can provide global and regional parameters of LV function as described in Sec. 2. Once the LV is segmented,^{119}^{,}^{129}^{,}^{135} the endocardial and epicardial boundaries are utilized for the quantification of global and regional parameters. The LV cavity volume is determined by the volume of each voxel and the number of voxels bounded by the LV endocardium and valve plane.^{116}^{,}^{119}^{,}^{136} Measurements of EF, as well as ES and ED volumes, from gated SPECT have been validated in many studies, demonstrating good accuracy.^{8}^{,}^{119}^{,}^{136}^{,}^{137} However, the relatively low resolution of nuclear cardiac images can lead to an underestimation of the LV cavity size, especially in patients with small ventricles, therefore resulting in overestimation of the EF.^{138}^{–}^{140} Quantitative measurement of wall motion is obtained from displacements of the endocardium from ED to ES,^{116}^{,}^{141}^{,}^{142} and WT is quantified by assessing the apparent intensity change of the myocardium from ED to ES resulting from the partial volume effect.^{18}^{,}^{129}^{,}^{143}^{–}^{145} Despite the low resolution of gated MPS, the partial volume effect is actually exploited to analyze motion and thickening, since changes in image intensity are related to the thickening of the myocardium.^{116}
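The volume and EF quantities above reduce to simple arithmetic once the endocardial boundary is known; a minimal sketch with invented masks on a 1-mm isotropic grid:

```python
import numpy as np

def lv_cavity_volume_ml(endo_mask, voxel_mm3):
    """Cavity volume: voxel count bounded by the endocardium and valve
    plane, times the volume of one voxel (mm^3 converted to mL)."""
    return endo_mask.sum() * voxel_mm3 / 1000.0

def ejection_fraction(edv_ml, esv_ml):
    """EF (%) from end-diastolic and end-systolic cavity volumes."""
    return 100.0 * (edv_ml - esv_ml) / edv_ml

# toy gated study on a 1 mm isotropic grid (masks are invented)
ed = np.zeros((60, 60, 60), bool); ed[10:50, 10:50, 10:50] = True
es = np.zeros((60, 60, 60), bool); es[15:45, 15:45, 15:45] = True
edv = lv_cavity_volume_ml(ed, 1.0)   # 64.0 mL
esv = lv_cavity_volume_ml(es, 1.0)   # 27.0 mL
ef = ejection_fraction(edv, esv)     # 57.8125%
```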

## 4.3.

### Computed Tomography (CT)

In cardiac CT, there are two imaging procedures: (1) coronary calcium scoring with noncontrast CT and (2) noninvasive imaging of coronary arteries with contrast-enhanced CT. Typically, noncontrast CT imaging exploits the natural density of tissues. As a result, tissues with different attenuation values, such as air, calcium, fat, and soft tissue, can be easily distinguished.^{146} Noncontrast CT imaging is a low-radiation-exposure method, performed within a single breath hold, that determines the presence of coronary artery calcium.^{146} In comparison, contrast-enhanced CT is used for imaging of the coronary arteries with contrast material, such as a bolus or continuous infusion of a high concentration of iodinated contrast material.^{147} Furthermore, coronary CT angiography has been shown to be highly effective in detecting coronary stenosis.^{148} With recent rapid advances in CT technology, CT can provide detailed anatomical information on chambers, vessels, and coronary arteries, as well as coronary calcium scoring. Coronary CT angiography can visualize not only the vessel lumen but also the vessel wall, allowing noninvasive assessment of the presence and size of noncalcified coronary plaque.^{149} Additionally, CT imaging provides functional as well as anatomical information, which can be used for quantitative assessment of systolic WT and regional wall motion.^{150}^{,}^{151}
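As a concrete illustration of density-based analysis in noncontrast CT, Agatston-style calcium scoring thresholds the slice at 130 HU, labels connected lesions, and weights each lesion's area by its peak attenuation. A simplified single-slice sketch (the function name and minimum-area cutoff are illustrative):

```python
import numpy as np
from scipy.ndimage import label

def agatston_slice(hu, pixel_area_mm2, min_area_mm2=1.0):
    """Agatston-style score for one noncontrast CT slice."""
    lesions, n = label(hu >= 130)            # 130 HU calcium threshold
    score = 0.0
    for i in range(1, n + 1):
        m = lesions == i
        area = m.sum() * pixel_area_mm2
        if area < min_area_mm2:              # drop tiny, likely-noise specks
            continue
        peak = hu[m].max()                   # density weight from peak HU
        w = 1 if peak < 200 else 2 if peak < 300 else 3 if peak < 400 else 4
        score += area * w
    return score

# toy slice: one 4 mm^2 lesion with peak attenuation 250 HU -> weight 2
hu = np.zeros((8, 8)); hu[2:4, 2:4] = 150.0; hu[2, 2] = 250.0
score = agatston_slice(hu, pixel_area_mm2=1.0)   # 4 * 2 = 8.0
```

A full score would sum these contributions over all slices through the coronary arteries.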

Various segmentation techniques have been proposed for cardiac CT applications. Funka-Lea et al.^{152} proposed a method to segment the entire heart using graph-cuts. Segmenting the entire heart was performed for clearer visualization of the coronary vessels on the surface of the heart. They set up an initialization process to find seed regions automatically using a "blowing balloon" that measures the maximum heart volume, and added an extra constraint, a blob energy term, to the original graph-cuts formulation. Extraction of the myocardium in 4-D cardiac MR and CT images was proposed in Ref. 67 using graph-cuts as well as EM-based segmentation. Zheng et al.^{1} presented a segmentation method based on marginal space learning, searching for the optimal smooth surface. Model-based techniques were also adopted for cardiac CT image segmentation using ASM with PCA.^{153} Region growing^{154}^{,}^{155} and thresholding^{156}^{,}^{157} methods were also employed. An entirely different topic is the segmentation of coronary arteries from CT angiography data, which is well covered by other reviews.^{158}^{,}^{159}
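The graph-cuts formulation used by several of these methods can be demonstrated on a toy 1-D profile: unary terms become terminal-edge capacities, a pairwise smoothness term becomes inter-pixel capacities, and the minimum s-t cut yields the labeling. A minimal sketch using SciPy's max-flow solver (the intensity models and weights are invented for illustration):

```python
import numpy as np
from collections import deque
from scipy.sparse import csr_matrix
from scipy.sparse.csgraph import maximum_flow

intens = np.array([20, 22, 21, 80, 82, 79])     # dark background, bright object
n = len(intens)
S, T = n, n + 1                                 # source and sink node indices
cap = np.zeros((n + 2, n + 2), dtype=np.int32)  # solver requires int32 capacities
for i, v in enumerate(intens):
    cap[S, i] = abs(int(v) - 20)                # penalty for labeling i background
    cap[i, T] = abs(int(v) - 80)                # penalty for labeling i foreground
for i in range(n - 1):                          # pairwise smoothness between neighbors
    cap[i, i + 1] = cap[i + 1, i] = 15
graph = csr_matrix(cap)
res = maximum_flow(graph, S, T)

# pixels still reachable from the source in the residual graph are foreground
resid = cap - res.flow.toarray()
seen, q = {S}, deque([S])
while q:
    u = q.popleft()
    for v in range(n + 2):
        if v not in seen and resid[u, v] > 0:
            seen.add(v)
            q.append(v)
labels = [int(i in seen) for i in range(n)]     # [0, 0, 0, 1, 1, 1]
```

Real segmentation methods use the same construction on a 3-D voxel grid, with unary terms derived from learned intensity models rather than fixed reference values.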

## 4.4.

### MRI

Cardiac MRI allows comprehensive cardiac assessment through several types of acquisitions that can be performed during one scanning session.^{9} It provides high-resolution visualization of cardiac chamber volumes, function, and myocardial mass.^{160} Cardiac MRI has been established as the research gold standard for these measurements, with increasing clinical impact. Moreover, the recently developed delayed enhancement imaging with gadolinium contrast has emerged as a highly sensitive and specific method for detecting myocardial necrosis, allowing improved evaluation of myocardial infarction.^{161}^{,}^{162} Perfusion MR imaging can also be performed for the diagnosis of ischemic heart disease. However, perfusion MR imaging depends on a first-pass technique, which limits the conspicuity of perfusion defects.^{163}^{,}^{164} The advantages of MRI include exquisite soft-tissue contrast, high spatial resolution, high SNR, the ability to characterize tissue with a variety of pulse sequences, and the absence of ionizing radiation. Compared to PET or SPECT, the dependence of the MR signal on regional hypoperfusion is minimal and does not prevent segmentation tasks. Among the disadvantages, cardiac MRI typically employs one breath-hold per slice, with 5 to 15 slices per patient study, necessitating multiple breath-holds for each patient dataset. Additionally, the images have high in-plane resolution but low through-plane resolution (typically 8 to 10 mm). Multiple breath-hold acquisitions can also cause errors in spatial alignment and result in artifacts in the 3-D heart image. These misalignments can be corrected by software registration techniques.^{165} Recently, full-volume 3-D MRI acquisitions have been proposed.^{9}

Cardiac MR tagging is an important reference technique to measure myocardial function, allowing quantification of local myocardial strain and strain rate.^{166}^{,}^{167} Tagged MR produces signals that can be used to track motion. Several techniques have been developed, including magnetization saturation, spatial modulation of magnetization (SPAMM), delay alternating with nutation for tailored excitation (DANTE), and complementary SPAMM (CSPAMM). These techniques produce a visible pattern of magnetization saturation on the magnitude-reconstructed image without any post-processing; however, quantifying myocardial motion from them requires exhaustive post-processing. In contrast, more advanced techniques such as harmonic phase (HARP), displacement encoding with stimulated echoes (DENSE), and strain encoding (SENC)^{167}^{,}^{168} compute motion directly from the signal and do not directly show a tagging pattern; only simple post-processing is required to obtain myocardial motion information. For more details, we refer readers to a recent review of cardiac tagged MRI.^{167}
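The HARP idea, that the phase of an isolated tag harmonic encodes tissue position, can be sketched in 1-D: demodulate a simulated SPAMM-like pattern and recover the displacement from the unwrapped harmonic phase. This is a simplified illustration of the principle, not any cited implementation:

```python
import numpy as np
from scipy.signal import hilbert

x = np.linspace(0.0, 1.0, 512, endpoint=False)   # normalized position
k = 2 * np.pi * 16                               # tag frequency: 16 lines per FOV
disp = 0.01 * np.sin(2 * np.pi * x)              # smooth, small displacement field
tagged = np.cos(k * (x - disp))                  # SPAMM-like pattern moved with tissue

# HARP-style demodulation: the analytic signal isolates the positive-frequency
# harmonic, and its unwrapped phase tracks the underlying tissue position
phase = np.unwrap(np.angle(hilbert(tagged)))
recovered = x - phase / k                        # displacement estimate
```

In 2-D HARP the harmonic is isolated with a bandpass filter in k-space rather than the analytic signal, but the displacement-from-phase step is the same.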

Numerous image segmentation techniques have been applied to MRI; they are summarized below. Petitjean et al.^{169} presented a review of segmentation methods for short-axis MR images. Paragios^{35} used a level-set technique with a geometric flow to segment the endo- and epicardium of the LV. Two evolving contours were employed for the endo- and epicardium, and the method combined visual information with anatomical constraints to segment both regions of interest simultaneously. Paragios et al.^{35}^{,}^{170}^{,}^{171} applied shape prior knowledge within the level-set representation to achieve robust and accurate results.

Moreover, several constraints and types of prior knowledge have been incorporated into the level-set framework for efficiently segmenting regions of interest. For example, a velocity-constrained front propagation method was proposed that uses the magnitude and direction of the phase-contrast velocity as constraints.^{172} Woo et al.^{173} proposed a statistical distance between the shapes of the endo- and epicardium as a shape constraint, using signed distance functions. Tsai et al.^{174} proposed a shape-based approach to curve evolution, and Ciofolo et al.^{175} proposed a myocardium segmentation scheme for late-enhancement cardiac MR images that incorporates a shape prior into the contour evolution. Zhu et al.^{176} applied a dynamic statistical shape model with a Bayesian method.
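A signed distance function of the kind used in such shape constraints is easy to build from a binary mask by combining the Euclidean distance transforms of the mask and its complement. A small sketch (the negative-inside sign convention and the mean-squared shape distance are illustrative choices):

```python
import numpy as np
from scipy.ndimage import distance_transform_edt as edt

mask = np.zeros((7, 7), bool)
mask[2:5, 2:5] = True                 # toy "myocardium" region

# signed distance: negative inside the shape, positive outside
phi = edt(~mask) - edt(mask)

def shape_dissimilarity(phi_a, phi_b):
    """One simple shape distance: mean squared difference of two SDFs."""
    return float(np.mean((phi_a - phi_b) ** 2))
```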

Segmentation techniques using thresholding,^{177}^{,}^{178} region growing,^{179}^{,}^{180} and boundary detection^{181}^{,}^{182} have also been applied to MRI data. For instance, a local assessment of boundary detection was proposed to improve the capture range and accuracy.^{183} A segmentation algorithm using an optimal binary thresholding method and region growing was presented to delineate 3-$\mathrm{D}+t$ cine MR images.^{184} In addition, learning frameworks were used to segment 2-D tagged cardiac MR images.^{185}^{,}^{186}
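Region growing as used above can be summarized in a few lines: starting from a seed, iteratively absorb connected neighbors whose intensity stays within a tolerance of the seed value. A minimal 2-D sketch (4-connectivity and a fixed tolerance are simplifying assumptions):

```python
import numpy as np
from collections import deque

def region_grow(img, seed, tol):
    """Grow a region from `seed`, absorbing 4-connected neighbors whose
    intensity lies within `tol` of the seed intensity."""
    h, w = img.shape
    seg = np.zeros((h, w), bool)
    seg[seed] = True
    ref = float(img[seed])
    q = deque([seed])
    while q:
        y, x = q.popleft()
        for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            ny, nx = y + dy, x + dx
            if (0 <= ny < h and 0 <= nx < w and not seg[ny, nx]
                    and abs(float(img[ny, nx]) - ref) <= tol):
                seg[ny, nx] = True
                q.append((ny, nx))
    return seg

# toy image: dark plateau on the left, bright region on the right
img = np.array([[10, 11, 50, 52],
                [12, 10, 51, 53],
                [11, 12, 49, 50]])
seg = region_grow(img, (0, 0), tol=5)   # covers the 6 dark pixels only
```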

## 4.5.

### Parameter Correlation between Imaging Modalities

Several attempts have been made to compare and correlate the quantitative parameters obtained by different imaging modalities and different image segmentation approaches. Various reports in the literature indicate that cardiac MRI can provide accurate estimates of EF and LV volumes^{187}^{–}^{191} as well as wall motion/thickening analysis.^{187}^{–}^{189} In addition, gated SPECT has been extensively validated against various two-dimensional imaging techniques, such as echocardiography,^{192} but there are only a limited number of studies comparing gated SPECT with other three-dimensional techniques such as cardiac MRI, which is considered the reference standard for assessing LV volumes.^{193}^{–}^{196} Visual interpretations of wall motion by observers on the two modalities have been compared along with LV volumes,^{193}^{–}^{196} but a quantitative comparison of regional wall motion/thickening assessment has not been reported previously. Using echocardiographic sequences, values of LV volumes, EF, and regional endocardial shortening also correlate with MR. Cardiac MRI has been used as a reference method for comparison with unenhanced and contrast-enhanced echocardiography.^{197}^{,}^{198} LV mass obtained by contrast-enhanced color Doppler echocardiography has shown excellent agreement with that from MRI.^{199} The left and right ventricular EDV, ESV, stroke volume, EF, and myocardial mass obtained by dual-source CT also correlated well with those from MRI.^{200}
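Agreement between two modalities is typically summarized with a correlation coefficient and Bland-Altman limits of agreement; a minimal sketch on invented paired EF values:

```python
import numpy as np

# hypothetical paired EF measurements (%) from two modalities
ef_mri   = np.array([55.0, 62.0, 48.0, 70.0, 58.0, 40.0])
ef_spect = np.array([57.0, 60.0, 50.0, 73.0, 55.0, 43.0])

r = np.corrcoef(ef_mri, ef_spect)[0, 1]          # Pearson correlation

diff = ef_spect - ef_mri
bias = diff.mean()                               # Bland-Altman bias
loa = (bias - 1.96 * diff.std(ddof=1),           # 95% limits of agreement
       bias + 1.96 * diff.std(ddof=1))
```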

## 5.

## Validation (Evaluation) of Segmentation Results

Automatic cardiac image segmentation results can be evaluated on their own or by comparison with a reference, possibly from a different imaging modality, including a manual segmentation result or a ground truth. For stand-alone evaluation, one can exploit statistical properties of heart anatomy and/or visually inspect the segmented images. For reference-based evaluation, both quantitative and qualitative comparisons can be performed. Quantitative comparison can be done by measuring various metrics between the segmented structures, such as the fractional energy difference, the Hausdorff distance, the average perpendicular distance, the Dice metric, and the mean absolute distance.^{110} The average perpendicular distance measures the distance from the automatically segmented contour to the corresponding contour manually drawn by experts, averaged over all contour points. For LV segmentation, these measures are typically computed at the ED and ES phases of all slices. The EF and the LV mass are also important clinical parameters to evaluate. Table 1 summarizes previous studies on cardiac segmentation validation with respect to imaging modality, imaging target, number of data sets, and evaluation results.
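The overlap and distance metrics listed above are straightforward to compute for binary masks and contour point sets; a small sketch (the toy arrays are illustrative only):

```python
import numpy as np
from scipy.spatial.distance import cdist, directed_hausdorff

def dice(a, b):
    """Dice metric: 2|A and B| / (|A| + |B|) for two binary masks."""
    return 2.0 * np.logical_and(a, b).sum() / (a.sum() + b.sum())

def hausdorff(p, q):
    """Symmetric Hausdorff distance between two (N, 2) point sets."""
    return max(directed_hausdorff(p, q)[0], directed_hausdorff(q, p)[0])

def mean_abs_distance(p, q):
    """Mean, over points of p, of the distance to the closest point of q."""
    return cdist(p, q).min(axis=1).mean()

# toy example: two 2x2 masks offset by one pixel, and two contour point sets
a = np.zeros((4, 4), bool); a[0:2, 0:2] = True
b = np.zeros((4, 4), bool); b[0:2, 1:3] = True
p = np.array([[0.0, 0.0], [0.0, 1.0]])
q = np.array([[0.0, 2.0], [0.0, 3.0]])
```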

## Table 1

Cardiac image segmentation results with validation.

References | Year | Modality | ROI | Dim | Data size | Validation results
---|---|---|---|---|---|---
Bosch et al.^{83} | 2002 | US | LV | 2D+T | 129 | The average border positioning error was 4.27±2.52 mm
Yue et al.^{110} | 2008 | US | WH | 2D | 21 | The Hausdorff distance and mean absolute distance were measured
Angelini et al.^{108} | 2005 | US | LV, RV | 3D+T | 10 | The error intervals were 1.31±6.27% for RV EF and 2.93±6.13% for LV EF
Mitchell et al.^{79} | 2002 | MR, US | LV | 3D | 18 | The mean signed endocardial surface positioning error was 0.46±1.33 mm and the mean signed epicardial surface positioning error was 0.29±1.16 mm
Tsai et al.^{174} | 2003 | MR | LV | 2D | 100 images from 1 patient | N/A
Uzumcu et al.^{201} | 2003 | MR | LV, RV | 2D | 150 | Median point-to-point error was 1.86 pixels
Woo et al.^{173} | 2009 | MR | LV | 2D, 3D | 10 | The mean values of EDV, ESV, and LVEF were calculated using manual segmentation and the proposed algorithm: EDV was 139±41 mL, ESV was 68±49 mL, and LVEF was 55±19%
Funka-Lea et al.^{152} | 2006 | CT | WH | 3D | 70 | 2/70 failed. The average error between the manually and automatically generated surfaces was 5.5 mm
Isgum et al.^{96} | 2009 | CT | WH | 3D | 29 | The Tanimoto coefficient between the reference and the automated segmentation was computed; the accuracy was 70.15±17.38%
van Assen et al.^{202} | 2008 | CT | LV | 3D | 9 | Average point-to-point distances per patient between the manually drawn ground truth and the proposed algorithm's contours were 1.85 mm (endocardium) and 1.60 mm (epicardium)
Zheng et al.^{92} | 2007 | CT | Four chambers | 3D | 137 | Point-to-mesh distance based on four-fold cross-validation, with an average error rate of 2.3%
Kohlberger et al.^{128} | 2006 | SPECT | LV | 3D+T | 15 | The average error rate was 27% compared to hand-segmented ground truth, using a relative symmetric voxel error calculation
Yang et al.^{125} | 2006 | SPECT | LV | 2D | 20 | The accuracy was 87.3±6.7% based on a region overlap measure against hand-segmented ground truth
Debreuve et al.^{124} | 2001 | SPECT | LV | 3D+T | 8 | The number of voxels of the segmented myocardium was computed

## 6.

## Conclusions

Several advanced segmentation techniques have been proposed in the image processing and computer vision communities for cardiac image analysis. In this review, we have categorized them into four major classes: (1) boundary-driven techniques, (2) region-driven techniques, (3) graph-cuts techniques, and (4) model-fitting techniques. These techniques have been applied to the segmentation of cardiac images acquired with different imaging modalities, providing high automation and accuracy in determining clinically significant parameters. These computational techniques aid clinicians in the evaluation of cardiac anatomy and function, and ultimately lead to improvements in patient care. However, cardiac image segmentation remains a challenge due to the complex anatomy of the heart, limited spatial resolution, imaging characteristics, cardiac and respiratory motion, and variable pathology and anatomy. Therefore, improved segmentation techniques with enhanced reliability, reduced computation time, superior accuracy, and full automation will be needed in the future.

## References

## Biography

**Dongwoo Kang** received the BS degree from Seoul National University, Seoul, in 2007 and the MS degree from the University of Southern California (USC), Los Angeles, in 2009, both in electrical engineering. He is currently working toward his PhD degree in the Department of Electrical Engineering and the Signal and Image Processing Institute at USC. His research interests are in the areas of medical image analysis, including segmentation, tracking, and detection of lesions.

**Jonghye Woo** received the BS degree from the Seoul National University, Seoul, in 2005 and the MS and PhD degrees from the University of Southern California (USC), Los Angeles, in 2007 and 2009, respectively, all in electrical engineering. He is presently a postdoctoral fellow and visiting scientist at the University of Maryland and Johns Hopkins University, Baltimore, respectively. His research interests are in the areas of medical image analysis including registration, segmentation, and quantitative analysis. He is the recipient of the USC Viterbi School of Engineering Best Dissertation Award in 2010.

**Piotr J. Slomka** is a research scientist with the artificial intelligence in medicine program. Dr. Slomka is also a professor of medicine at the University of California, Los Angeles (UCLA) David Geffen School of Medicine. Dr. Slomka is widely recognized as one of the leading contributors in the world to the area of research in software algorithms for medical image analysis. His current focus is in the development of automated computer algorithms for myocardial perfusion quantification and multimodality image registration. Dr. Slomka received his doctorate in medical biophysics from the University of Western Ontario in Canada and his master’s degree in medical instrumentation from the Warsaw University of Technology in Poland.

**Damini Dey** is a research scientist with the departments of biomedical sciences and imaging at the Cedars-Sinai Medical Center. She is also the technical director of the Experimental Image Analysis Lab, Biomedical Imaging Research Institute, at Cedars-Sinai Medical Center, and assistant professor at the University of California, Los Angeles (UCLA) David Geffen School of Medicine. She received her doctorate in medical physics from the University of Calgary in Canada. She is recognized as an expert in the area of algorithms for computer-aided quantitative analysis of cardiac CT. Her recent research focus is in automated derivation of clinically relevant imaging measures from noninvasive cardiac image data, clinical implementation of novel automated computer processing algorithms, and the application of these tools to solve key clinical problems. Her current investigations include the development of automated algorithms for detection and measurement of coronary plaque from coronary CT angiography, and quantification of epicardial and thoracic fat from noncontrast CT.

**Guido Germano** received his doctorate and master's degrees in biomedical physics from UCLA. He also earned his MBA from GEPI, Ministry of the Treasury in Rome, Italy. He is currently the director of the artificial intelligence in medicine program at Cedars-Sinai Medical Center in Los Angeles. He is also a professor of medicine at the University of California, Los Angeles (UCLA) David Geffen School of Medicine. His research and expertise play an integral role in Cedars-Sinai's nuclear cardiology program. One of his most outstanding contributions to the field's knowledge and clinical practice has been his creation of new artificial intelligence techniques to accurately determine the location of the heart from 3-D tomographic (SPECT) images, estimate epicardial and endocardial boundaries, and quantify heart volumes in a completely automated fashion. In addition, he has written over 200 original manuscripts and book chapters and received numerous awards for excellence in research in the fields of heart research, medical physics, and nuclear medicine.

**C.-C. Jay Kuo** received the BS degree from the National Taiwan University, Taipei, in 1980 and the MS and PhD degrees from the Massachusetts Institute of Technology, Cambridge, in 1985 and 1987, respectively, all in electrical engineering. He is presently director of the Signal and Image Processing Institute (SIPI) and professor of electrical engineering and computer science at the USC. His research interests are in the areas of multimedia data compression, communication and networking, multimedia content analysis and modeling, and information forensics and security. He is editor-in-chief for the *IEEE Transactions on Information Forensics and Security* and editor emeritus for the *Journal of Visual Communication and Image Representation* (an Elsevier journal). Dr. Kuo is a fellow of AAAS, IEEE, and SPIE.