Riemannian manifolds have attracted increasing attention for visual classification tasks, especially video or image set classification. Covariance matrices are natural second-order statistics of image sets. Nonsingular covariance matrices, known as symmetric positive definite (SPD) matrices, lie on a non-Euclidean Riemannian manifold (the SPD manifold). Covariance discriminative learning (CDL) is an effective method that performs discriminative learning on the Riemannian manifold in the SPD kernel space. In practice, however, the discriminative learning of CDL often suffers from poor generalization and overfitting caused by the finite number of training samples and by noise corruption. We propose to address these problems by introducing eigenspectrum regularization and a graph-embedded framework. Discriminative learning on the SPD manifold is generalized by the graph-embedded framework, which is combined with eigenspectrum regularization in the SPD kernel space. Three local Laplacian graphs of the graph-embedded framework and two eigenspectrum regularization models are incorporated into the proposed method. A comprehensive mathematical derivation of the proposed method is given using the “kernel trick.” Experimental results on set-based face recognition and object categorization tasks demonstrate the effectiveness of the proposed method.

## 1.

## Introduction

The development of intelligent video surveillance, social networks, and electronic commerce enables a probe image set to be matched against all gallery image sets, which constitutes an image set classification task.^{1} Image sets can be extracted from videos or albums. Each probe image set and gallery image set contains multiple images that belong to the same class, which allows the extraction of considerably more discriminative information than is possible in the traditional single-image classification task.^{2} Image set classification has achieved widespread success in face recognition^{3}^{–}^{6} and object categorization.^{7}^{–}^{11}

Recently, many studies have indicated that numerous visual features lie on a Riemannian manifold.^{12} The subspaces of image sets form the Grassmann manifold,^{8} symmetric positive definite (SPD) matrices form the SPD manifold,^{13} and two-dimensional shapes lie on Kendall shape spaces.^{14} Using a Riemannian manifold to model image sets and building the corresponding classifier has become popular in recent years.^{15} Subspaces and (nonsingular) covariance matrices^{10} are the two typical representations for modeling image sets on a Riemannian manifold. The linear subspace is a popular choice for modeling image sets due to its excellent accommodation of image variations; hence, the Grassmann manifold formed by subspaces is widely used for image set classification. However, linear subspace-based modeling has the limitation that it incorporates only relatively weak information (such as the subspace angles) about the location and boundary of the samples in the input space.^{10} The second-order statistic, namely the nonsingular covariance matrix of an image set, forms the SPD manifold and characterizes the set structure more faithfully.^{10} Many studies^{10}^{,}^{12}^{,}^{16} have shown the effectiveness of the SPD manifold for image set classification and that the covariance descriptor is robust to noise and illumination variations.

Covariance discriminative learning (CDL)^{10} is one of the most representative methods that use the covariance descriptor for image set classification. The covariance descriptor provides a natural representation of an image set and makes no assumption about the set data distribution. Hence, it characterizes the set structure more faithfully, and the representation possesses stronger resistance to outliers.^{10} The SPD manifold formed by covariance matrices is mapped to a high-dimensional reproducing kernel Hilbert space (RKHS), where Euclidean geometry applies. Subsequently, linear discriminant analysis (LDA)^{17} is applied to perform discriminative learning with the “kernel trick,” which is known as kernel discriminant analysis.^{18} CDL has achieved considerable results on set-based face recognition and object categorization tasks.

In this work, we focus on the discriminative learning problem of the SPD manifold on the mapped RKHS. Due to the conventional problems of linear discriminative learning, such as the singularity of the within-class scatter matrix and the instability of its inverse caused by a finite number of training samples,^{17} CDL may suffer from overfitting and poor generalization, since these problems may also occur during discriminative learning in the kernel space.^{19} To address the conventional problems of LDA, numerous approaches, such as Fisherface LDA,^{17} direct LDA,^{20} and null space LDA^{21} on the linear Euclidean space, have been proposed. For the conventional problems in the kernel space, there exist the kernel methods of kernel Fisherface LDA,^{22} null space kernel LDA,^{19} and kernel direct-LDA.^{23} However, these approaches usually discard a subspace (either the principal space or the null space) to circumvent the singularity before discriminant learning, which causes a loss of discriminative information.^{24} Although dual-subspace LDA^{25} considers the contributions of both subspaces, the associated average scaling factor may not be a suitable choice for the information in the principal subspace. To address these problems, the eigenfeature regularization and extraction (ERE)^{24} and complete discriminant evaluation and feature extraction (CDEFE)^{26} approaches were proposed for the linear flat space and the nonlinear kernel Euclidean space, respectively. ERE considers that the entire eigenspace of the within-class scatter matrix ${\mathbf{S}}_{W}$ should be retained for discriminant analysis and regularized by an eigenspectrum regularization weighting function.
The entire eigenspace is partitioned into three parts according to the median operation, and three different strategies based on the eigenspectrum of ${\mathbf{S}}_{W}$ are devised for regularization.^{24} CDEFE tackles these problems in the kernel space by nonlinear mapping; it decomposes the kernel within-class variation matrix into principal and noise-dominated subspaces. A weighting function based on the ratios of successive eigenvalues of the eigenspectrum was proposed to circumvent the undue scaling of projection vectors.^{26} The discriminative vectors obtained by applying predicted eigenvalues^{27} combine the eigenspectrum regularization models of ERE and CDEFE. Recently, regularized locality preserving discriminant embedding^{28} and locality regularization embedding (LRE)^{29} were proposed; these methods generalize the eigenfeature extraction of ERE by the graph-embedded framework to better preserve data locality. An adaptive locality preserving regularization model was devised for eigenspectrum regularization. The experimental results have demonstrated the effectiveness of these eigenspectrum regularization techniques.^{29}

Inspired by eigenspectrum regularization, in this work, we aim to address the conventional problems of CDL in discriminative learning by exploiting the eigenspectrum regularization with the graph-embedded framework in the RKHS, which is mapped from the SPD manifold. We refer to the proposed method as regularized graph-embedded covariance discriminative learning (RGCDL). Figure 1 shows the conceptual illustration of the proposed method. The main contributions of this paper are presented as follows.

1. We circumvent the instability, overfitting, and poor generalization of CDL^{10} with a kernel eigenspectrum regularization architecture. The input elements in the high-dimensional SPD manifold are reduced to a lower-dimensional SPD manifold by principal component analysis (PCA).

2. We incorporate the graph-embedded framework with three local Laplacian graphs into the Riemannian kernel eigenspectrum regularization architecture to better preserve data locality. We give a systematic derivation of the graph-embedded framework that incorporates the eigenspectrum regularization in the Riemannian kernel space.

3. We evaluate the advantages of the proposed method with two eigenspectrum regularization models on face recognition and object categorization tasks. The experimental results show the stability of the extracted features, robustness to noise-corrupted data, and high classification rate to numerous set-based classification methods.

The rest of this paper is organized as follows. We present the works related to image set classification according to the image set representations in Sec. 2. The original CDL method and the architecture of eigenspectrum regularization are introduced in Sec. 3. Then, the RGCDL approach is presented in Sec. 4. Experimental evaluation and discussions are presented in Sec. 5. Finally, Sec. 6 concludes this paper.

## 2.

## Related Work

In this paper, we aim to use the proposed method to solve the image set classification task. The major issues of image set classification are how to represent an image set and how to measure the distance or similarity between two sets.^{5} Various techniques have been proposed to represent image sets, such as the statistical distribution,^{30}^{,}^{31} affine/convex hull model,^{4} sparse representation,^{5} subspace,^{7}^{,}^{32} and covariance matrix.^{10}^{,}^{12}

The methods^{30}^{,}^{31} that model each image set by a statistical distribution are among the earliest approaches employed for image set classification. They measure the similarities between pairs of distributions of two sets and achieve considerable results. However, if the set data have no strong statistical correlations for parameter estimation, these methods often fail to work.^{5} The most representative affine/convex hull-based methods are the affine/convex hull-based image set distance (AHISD/CHISD);^{4} AHISD/CHISD represents images as points in a linear or affine feature space and computes the distance between the convex geometric regions spanned by the feature points. Hu et al.^{5} incorporated sparse representation to regularize the affine hull model. Zhu et al.^{6} employed the collaborative representation technique to utilize the discrimination information between gallery sets. The affine/convex hull approaches actually aim to find the synthetic nearest points between image sets.^{11} However, these hull models usually cannot handle the complex appearance variations caused by multiple views and extreme illumination.

Subspace modeling is a popular and effective approach for representing image sets. The mutual subspace method (MSM)^{32} is one of the earliest classic subspace-based methods for image set classification. MSM models all image sets by linear subspaces, and the similarity between pairs of subspaces is measured by canonical correlation analysis (CCA).^{33} Fukui and Yamaguchi^{34} and Fukui and Maki^{35} projected the linear subspaces to a “difference subspace,” which can extract the disparity between two subspaces. Kim et al.^{7} incorporated discriminative learning into subspace-based set classification according to canonical correlations (DCC). DCC attempts to obtain a linear transformation that maximizes the canonical correlations of within-class subspaces and minimizes the canonical correlations of between-class subspaces. Arandjelovic^{36} extended CCA to an extended version (ECCA) by extracting the most similar modes of variability within two sets and exploited the discriminative learning architecture to train a classifier (DECCA).

Subspaces can also be treated as points that lie on a special type of Riemannian manifold, which is known as the Grassmann manifold. The method in Ref. 3 represents an image set as multiple local linear subspaces and treats them as points on the Grassmann manifold; then, the manifold-to-manifold distance (MMD) is defined between the two manifolds of two image sets. Manifold discriminant analysis^{37} was proposed to learn an embedding space by maximizing the manifold margin of the MMD. The Grassmann manifold can also be mapped to an RKHS, where Euclidean geometry applies; Grassmann discriminant analysis (GDA)^{8} implements LDA on the mapped RKHS by the Grassmannian kernel. GDA is generalized to kernel GDA (KGDA) using Gaussian kernel principal subspaces.^{38} Graph-embedding Grassmann discriminant analysis (GGDA)^{9} is another counterpart to the GDA method; it exploits the graph-embedded framework to implement discriminant analysis on the mapped RKHS. Grassmann nearest points (GNP)^{11} finds the nearest Grassmann points on the mapped vector space using the affine hull. More recently, regularized Grassmann discriminant analysis (RGDA)^{2} was proposed to circumvent the conventional problems of LDA when the training sets are insufficient.
However, as previously mentioned, linear subspace-based methods have the limitation of using weak information to measure similarity.^{10} Modeling visual features as covariance matrices for visual classification has become popular in recent years^{10}^{,}^{12}^{,}^{39} since the nonsingular covariance matrix (also known as an SPD matrix) forms a special Riemannian manifold, which is referred to as the SPD manifold.^{12} Previous studies employed covariance matrices to characterize local regions within an image, which is named the region covariance.^{39} Different from the region covariance descriptor, CDL is the crucial method that models the whole image set by the covariance descriptor to address image set classification on the SPD manifold. Huang et al.^{40} proposed log-Euclidean metric learning to learn a tangent mapping from the original tangent space of the SPD manifold to a new discriminative space. Tan and Gao^{16} proposed a patch-based principal covariance discriminative learning (PPCDL) method, in which the image set is partitioned into several local maximum linear patches by a hierarchical divisive clustering method, the local patches are modeled by covariance matrices, and the final discriminative learning is similar to CDL. Discriminant analysis on the Riemannian manifold of Gaussian distributions (DARG)^{41} models the image set with a Gaussian mixture model (GMM) and derives a series of kernels for discriminative learning of Gaussians on the SPD manifold. Symmetric positive definite manifold learning^{12} learns an orthonormal projection from the high-dimensional SPD manifold to a low-dimensional, more discriminative manifold.

## 3.

## Preliminaries

In this section, we first review the theory of CDL^{10} and then present the architecture of eigenspectrum regularization according to LRE.^{29}

## 3.1.

### Covariance Discriminative Learning

CDL uses a natural methodology to characterize image sets by the covariance descriptor. Let $\mathbf{X}=[{x}_{1},{x}_{2},\dots ,{x}_{n}]$ denote the data matrix of an image set with $n$ image vectors, where ${x}_{i}\in {\mathbb{R}}^{D}$ in the $D$-dimensional vector space. The covariance descriptor can be expressed as

## Eq. (1)

$$\mathbf{B}=\frac{1}{n-1}\sum _{i=1}^{n}({x}_{i}-\overline{x}){({x}_{i}-\overline{x})}^{T},$$

where $\overline{x}$ is the mean of the image vectors. Because $\mathbf{B}$ may be singular, a small perturbation is added.^{10} This perturbation can be denoted as ${\mathbf{B}}^{*}=\mathbf{B}+\eta \mathbf{I}$, where $\mathbf{I}$ is the identity matrix and $\eta $ is a scaling parameter. Hence, the nonsingular covariance matrix becomes a $D\times D$ SPD matrix ${\mathrm{sym}}_{D}^{+}$, which is an element of a Riemannian manifold. In the remainder of this paper, we still use $\mathbf{B}$ to denote the nonsingular covariance matrix for simplicity. After modeling the image sets as multiple SPD matrices, CDL explores a Riemannian kernel that is induced by a Riemannian metric, such as the log-Euclidean distance (LED),^{42} to map ${\mathrm{sym}}_{D}^{+}$ to a Euclidean space. The Riemannian metric of LED defines a true geodesic on the Riemannian manifold, as it is induced by a positive definite kernel,^{42} and the manifold structure can be preserved as much as possible. The metric of LED is defined as

## Eq. (2)

$${d}_{\mathrm{LED}}({\mathbf{B}}_{1},{\mathbf{B}}_{2})={\Vert \mathrm{log}({\mathbf{B}}_{1})-\mathrm{log}({\mathbf{B}}_{2})\Vert}_{F},$$

where the matrix logarithm is computed through the eigendecomposition $\mathbf{B}=\mathbf{U}\mathbf{\Sigma}{\mathbf{U}}^{T}$ as

## Eq. (3)

$$\mathrm{log}(\mathbf{B})=\mathbf{U}\text{\hspace{0.17em}}\mathrm{log}\left(\mathbf{\Sigma}\right){\mathbf{U}}^{T}.$$

The Riemannian kernel induced by the LED metric is

## Eq. (4)

$${k}_{\mathrm{log}}({\mathbf{B}}_{1},{\mathbf{B}}_{2})=\mathrm{tr}[\mathrm{log}({\mathbf{B}}_{1})\mathrm{log}({\mathbf{B}}_{2})].$$

The kernel function ${k}_{\mathrm{log}}$ is shown to be an SPD kernel^{10}^{,}^{13} that obeys Mercer’s theorem.^{43} Therefore, the manifold structure can be preserved by the LED Riemannian kernel.
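To make Eqs. (1)–(4) concrete, the covariance descriptor, the matrix logarithm, and the LED metric and kernel can be sketched in NumPy as follows. This is a minimal illustration; the function names and the default value of the perturbation $\eta$ are our own choices, not part of the original CDL formulation.

```python
import numpy as np

def covariance_descriptor(X, eta=1e-3):
    """Covariance descriptor of an image set [Eq. (1)], with a small
    perturbation eta*I added so the matrix is strictly positive definite."""
    B = np.cov(X)  # X is D x n; np.cov uses the unbiased 1/(n-1) estimator
    return B + eta * np.eye(X.shape[0])

def log_map(B):
    """Matrix logarithm of an SPD matrix via eigendecomposition [Eq. (3)]."""
    w, U = np.linalg.eigh(B)  # eigenvalues of an SPD matrix are positive
    return U @ np.diag(np.log(w)) @ U.T

def d_led(B1, B2):
    """LED geodesic distance between two SPD matrices [Eq. (2)]."""
    return np.linalg.norm(log_map(B1) - log_map(B2), ord="fro")

def k_log(B1, B2):
    """Log-Euclidean (LED) Riemannian kernel [Eq. (4)]."""
    return np.trace(log_map(B1) @ log_map(B2))
```

Note that the metric and the kernel are consistent in the usual way: ${d}_{\mathrm{LED}}^{2}({\mathbf{B}}_{1},{\mathbf{B}}_{2})={k}_{\mathrm{log}}({\mathbf{B}}_{1},{\mathbf{B}}_{1})+{k}_{\mathrm{log}}({\mathbf{B}}_{2},{\mathbf{B}}_{2})-2{k}_{\mathrm{log}}({\mathbf{B}}_{1},{\mathbf{B}}_{2})$.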

The explicit kernel feature mapping allows the application of any standard vector space learning algorithm. The discriminative learning of CDL is conducted by kernel LDA^{18} with the kernel trick. The mapping of the Riemannian manifold to a Euclidean space is defined by the function $\varphi (\cdot )$. Therefore, if $L$ points of the specified Riemannian manifold are spanned by the ${\mathrm{sym}}_{D}^{+}$ matrices $\{{\mathbf{B}}_{1},{\mathbf{B}}_{2},\dots ,{\mathbf{B}}_{L}\}$, the mapped feature points in the Euclidean space can be denoted as $\{\varphi ({\mathbf{B}}_{1}),\varphi ({\mathbf{B}}_{2}),\dots ,\varphi ({\mathbf{B}}_{L})\}$. With the inner product $\langle \varphi ({\mathbf{B}}_{i}),\varphi ({\mathbf{B}}_{j})\rangle ={k}_{\mathrm{log}}({\mathbf{B}}_{i},{\mathbf{B}}_{j})$, CDL seeks to solve the following optimization:^{10}

## Eq. (5)

$${\mathbf{\alpha}}_{\mathrm{opt}}=\mathrm{arg}\text{\hspace{0.17em}}\mathrm{max}\text{\hspace{0.17em}}\frac{{\mathbf{\alpha}}^{T}\mathbf{K}\mathbf{W}\mathbf{K}\mathbf{\alpha}}{{\mathbf{\alpha}}^{T}\mathbf{K}\mathbf{K}\mathbf{\alpha}},$$

where $\mathbf{K}$ is the kernel Gram matrix with ${K}_{ij}={k}_{\mathrm{log}}({\mathbf{B}}_{i},{\mathbf{B}}_{j})$ and the weight matrix $\mathbf{W}$ is defined as

## Eq. (6)

$${W}_{ij}=\{\begin{array}{ll}\frac{1}{{N}_{c}},& \text{if}\text{\hspace{0.17em}\hspace{0.17em}}{\mathbf{B}}_{i}\in {C}_{c}\text{\hspace{0.17em}\hspace{0.17em}}\text{and}\text{\hspace{0.17em}\hspace{0.17em}}{\mathbf{B}}_{j}\in {C}_{c}\\ 0,& \text{otherwise}\end{array},$$

where ${N}_{c}$ is the number of image sets in class ${C}_{c}$.

## 3.2.

### Eigenspectrum Regularization Technique

Eigenspectrum regularization^{24}^{,}^{26}^{,}^{29} was originally proposed to address the conventional problems of LDA (problems caused by the singularity of ${\mathbf{S}}_{W}$ and the numerical instability of its inverse) on the linear Euclidean space. In this section, we introduce LRE^{29} as an instance since it is the prototype of the proposed method in this paper.

Consider $n$ samples of training data $\mathbf{X}=[{x}_{1},{x}_{2},\dots ,{x}_{n}]$ with $\{{x}_{i}\in {\mathbb{R}}^{D}|i=\mathrm{1,2},\dots ,n\}$. In LRE, intrinsic data structure is modeled to regularize the directions of data locality $\mathbf{X}{\mathbf{L}}_{\mathrm{loc}}{\mathbf{X}}^{T}$.^{29} The eigenspectrum and directions can be obtained by decomposing

## Eq. (8)

$$\mathbf{\lambda}={\mathbf{V}}^{T}\mathbf{X}{\mathbf{L}}_{\mathrm{loc}}{\mathbf{X}}^{T}\mathbf{V},$$

where ${\mathbf{L}}_{\mathrm{loc}}$ is a local Laplacian matrix^{29} and $\mathbf{\lambda}$ is a diagonal matrix whose diagonal elements are the eigenvalues $[{\lambda}_{1},{\lambda}_{2},\dots ,{\lambda}_{D}]$ in descending order. The plot of the eigenvalues ${\lambda}_{k}$ against the index $k$ is referred to as the eigenspectrum. $\mathbf{V}=[{v}_{1},\dots ,{v}_{D}]$ contains the eigenvectors (directions) of the locality preserving matrix $\mathbf{X}{\mathbf{L}}_{\mathrm{loc}}{\mathbf{X}}^{T}$ corresponding to $\mathbf{\lambda}$. The locality preserving matrix $\mathbf{X}{\mathbf{L}}_{\mathrm{loc}}{\mathbf{X}}^{T}$ has been shown to be exactly equal to the within-class scatter matrix when equal weights are placed on the edges of adjacent data pairs in ${\mathbf{L}}_{\mathrm{loc}}$.^{28}

LRE decomposes the entire eigenspace $\mathbf{V}$ into two subspaces: (1) the disparity subspace ${\mathbf{V}}_{\text{disparity}}=[{v}_{1},{v}_{2},\dots ,{v}_{q}]$, which corresponds to lower locality preservation, and (2) the principal subspace ${\mathbf{V}}_{\text{principal}}=[{v}_{q+1},{v}_{q+2},\dots ,{v}_{D}]$, which corresponds to higher locality preservation. LRE indicates that the first few eigenvectors of the eigenspace, which correspond to large eigenvalues, provide lower locality preserving capability, whereas the eigenvectors that correspond to smaller eigenvalues provide higher locality preserving capability. Hence, larger weights are imposed on the subspace with higher locality preservation, whereas smaller weights are assigned to the subspace with lower locality preservation. A method is devised that determines “fences” to separate the disparity subspace and the principal subspace and then regularizes the two subspaces according to an adaptive eigenspectrum regularization model. The fence is defined by a split point on the eigenspectrum, ${\lambda}_{\text{disparity}}=\gamma (Q3+1.5\times IQR)$, where $Q3$ is the third quartile of the eigenvalues (the value below which the lowest 75% of them fall), $\gamma $ is a parameter for adaptively scaling the separating value, and $IQR=Q3-Q1$, where $Q1$ is the first quartile.
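The fence computation above can be sketched as follows. We assume that the quartiles $Q1$ and $Q3$ are taken over the eigenvalues themselves; the exact quartile convention of LRE may differ, so this is an illustrative reading rather than the reference implementation.

```python
import numpy as np

def lre_split_point(eigvals, gamma=1.0):
    """Upper 'fence' on the eigenspectrum in the spirit of LRE:
    lambda_disparity = gamma * (Q3 + 1.5 * IQR).
    Returns the fence value and the (1-based) split index q, where
    lambda_q is the largest eigenvalue not exceeding the fence."""
    q1, q3 = np.percentile(eigvals, [25, 75])   # first and third quartiles
    lam_disp = gamma * (q3 + 1.5 * (q3 - q1))   # IQR = Q3 - Q1
    lam = np.sort(eigvals)[::-1]                # descending eigenspectrum
    below = np.flatnonzero(lam <= lam_disp)
    q = int(below[0]) + 1 if below.size else len(lam)
    return lam_disp, q
```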

This adaptive eigenspectrum regularization model finds the $q$’th split eigenvalue that satisfies ${\lambda}_{q}=\mathrm{max}\{\forall \text{\hspace{0.17em}\hspace{0.17em}}{\lambda}_{i}|{\lambda}_{i}\le {\lambda}_{\text{disparity}}\}$. The piecewise regularization function of LRE is defined as

## Eq. (9)

$${w}_{k}^{\mathrm{LRE}}=\{\begin{array}{ll}{\lambda}_{k}^{-1/2},& 1\le k\le q\\ {\lambda}_{q}^{-1/2},& q<k\le D\end{array}.$$

The regularization function is imposed on the corresponding eigenvectors to form a full-dimensional transformation matrix $\tilde{\mathbf{V}}=[{w}_{1}^{\mathrm{LRE}}{v}_{1},{w}_{2}^{\mathrm{LRE}}{v}_{2},\dots ,{w}_{D}^{\mathrm{LRE}}{v}_{D}]$. Then, LRE can obtain a more localized feature representation by transforming the original training data as $\tilde{\mathbf{X}}={\tilde{\mathbf{V}}}^{T}\mathbf{X}$. We note that no dimensional reduction occurs in this transformation, so the information of the original training data is preserved as much as possible.
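The regularization and full-dimensional transform described above can be sketched as follows for the linear case. The helper name is ours, and the local Laplacian `L_loc` is assumed to be given; flooring the eigenvalues at ${\lambda}_{q}$ before taking the inverse square root reproduces the piecewise weights of Eq. (9) exactly, since the eigenspectrum is in descending order.

```python
import numpy as np

def lre_regularize(X, L_loc, q):
    """Eigenspectrum-regularized transform of LRE [Eqs. (8)-(9)].
    X: D x n data matrix; L_loc: n x n local Laplacian; q: split index (1-based).
    Returns the regularized data X_tilde; no dimension is discarded."""
    S = X @ L_loc @ X.T
    lam, V = np.linalg.eigh(S)
    lam, V = lam[::-1], V[:, ::-1]        # descending eigenspectrum
    lam_floor = np.maximum(lam, lam[q - 1])  # lambda_q floor for k > q
    w = lam_floor ** -0.5                 # piecewise weights of Eq. (9)
    V_tilde = V * w                       # scale each eigenvector by w_k
    return V_tilde.T @ X                  # full-dimensional transform
```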

In the subsequent step of LRE, feature extraction and dimensional reduction from the regularized and more compact data are performed. To further preserve the within-locality and between-locality power, a similarity weight matrix ${\mathbf{G}}^{\mathrm{LRE}}$ is utilized. The element of ${\mathbf{G}}^{\mathrm{LRE}}$ is defined as

## Eq. (12)

$${G}_{ij}^{\mathrm{LRE}}=\{\begin{array}{ll}\frac{1}{{n}_{c}}-\frac{1}{n},& \text{if}\text{\hspace{0.17em}\hspace{0.17em}}{x}_{i}\in {C}_{c}\text{\hspace{0.17em}\hspace{0.17em}}\text{and}\text{\hspace{0.17em}\hspace{0.17em}}{x}_{j}\in {C}_{c}\\ -\frac{1}{n},& \text{otherwise}\end{array},$$

where ${n}_{c}$ is the number of samples in class ${C}_{c}$.^{29} The final objective function of LRE is defined as

## Eq. (13)

$${\mathbf{U}}^{*}=\underset{{\mathbf{U}}^{T}\mathbf{U}=1}{\mathrm{arg}\text{\hspace{0.17em}}\mathrm{max}}\text{\hspace{0.17em}}{\mathbf{U}}^{T}\tilde{\mathbf{X}}{\mathbf{G}}^{\mathrm{LRE}}{\tilde{\mathbf{X}}}^{T}\mathbf{U}.$$

This problem can be easily solved by converting it to an eigenvalue problem $\tilde{\mathbf{X}}{\mathbf{G}}^{\mathrm{LRE}}{\tilde{\mathbf{X}}}^{T}{u}_{i}={\phi}_{i}{u}_{i}$. By retaining the $d$ eigenvectors $\mathbf{U}=[{u}_{1},{u}_{2},\dots ,{u}_{d}]$ ($d\le D$) that correspond to the largest $d$ eigenvalues, the projection matrix ${\mathbf{Z}}^{\mathrm{LRE}}=\tilde{\mathbf{V}}\mathbf{U}$ is used for the final lower-dimensional eigenfeature extraction.
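A sketch of this extraction step [Eqs. (12) and (13)] with a hypothetical helper name; `X_tilde` is the regularized data and `labels` gives the class of each column:

```python
import numpy as np

def lre_extract(X_tilde, labels, d):
    """Build the similarity weight matrix G of Eq. (12) and keep the d
    leading eigenvectors of X_tilde @ G @ X_tilde.T [Eq. (13)]."""
    labels = np.asarray(labels)
    n = len(labels)
    same = labels[:, None] == labels[None, :]
    counts = np.array([np.sum(labels == c) for c in labels])  # n_c per sample
    G = np.where(same, 1.0 / counts[:, None], 0.0) - 1.0 / n  # Eq. (12)
    S = X_tilde @ G @ X_tilde.T
    _, U = np.linalg.eigh(S)
    return U[:, ::-1][:, :d]  # eigenvectors of the d largest eigenvalues
```

The final features are then obtained as `U.T @ X_tilde` (equivalently, via the projection matrix ${\mathbf{Z}}^{\mathrm{LRE}}=\tilde{\mathbf{V}}\mathbf{U}$ applied to the raw data).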

## 4.

## Proposed Method

In this section, the proposed RGCDL is presented. To incorporate the eigenspectrum regularization and the graph-embedded framework with the SPD manifold in the kernel space, the algorithm of RGCDL is quite different from the original CDL algorithm. Generally, the RGCDL algorithm comprises two steps: the first is eigenspectrum regularization, and the second is feature extraction and dimensional reduction.

## 4.1.

### Representation of SPD Manifold

First, according to Harandi et al.^{12} and Tan and Gao,^{16} the computational cost of the Riemannian kernel with high-dimensional SPD matrices is quite high. Several strategies are available to lower the dimensionality of the SPD matrix and reduce the computational cost of constructing the Riemannian kernel matrix.^{12} Here, we combine all the training data in the different training sets to collaboratively produce the dimensionality reduction projection matrix by PCA.

Consider $L$ training image sets $\mathbf{\chi}={\{{\mathbf{X}}_{i}\}}_{i=1}^{L}$, where the $i$’th set contains ${L}_{i}$ images, ${\mathbf{X}}_{i}=[{x}_{1},{x}_{2},\dots ,{x}_{{L}_{i}}]$. We combine all images of all sets to build a sample data collection

## Eq. (14)

$$\chi ={\{{x}_{j}\}}_{j=1}^{N},\phantom{\rule[-0.0ex]{1em}{0.0ex}}N=\sum _{i=1}^{L}{L}_{i}.$$

The dimensionality reduction projection matrix can be obtained by decomposing the following sample covariance matrix:

## Eq. (15)

$$\mathbf{\Pi}=\frac{1}{N}\sum _{j=1}^{N}({x}_{j}-\overline{x}){({x}_{j}-\overline{x})}^{T},$$

where $\overline{x}$ is the mean of all training samples. The eigenvectors of $\mathbf{\Pi}$ that correspond to the ${d}_{1}$ largest eigenvalues form the projection matrix, which maps each image vector to a ${d}_{1}$-dimensional feature. This simple PCA applied to all training sets not only alleviates the high computational complexity of constructing the SPD kernel matrix but also preserves the main variations in the set data used to build the covariance matrices, which form the SPD manifold. This operation of refining the high-dimensional SPD matrices can also be viewed as a transformation from a high-dimensional manifold to a low-dimensional manifold.^{16}
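The pooled PCA of Eqs. (14) and (15) can be sketched as follows. Using the eigendecomposition of $\mathbf{\Pi}$ is our implementation choice (an SVD of the centered data would work equally well), and the helper name is hypothetical.

```python
import numpy as np

def pca_projection(image_sets, d1):
    """Pool all images of all training sets [Eq. (14)], form the sample
    covariance of Eq. (15) with the 1/N normalization, and return the
    projection onto the d1 leading eigenvectors."""
    X = np.hstack(image_sets)                 # D x N, N = sum of set sizes
    Xc = X - X.mean(axis=1, keepdims=True)    # center on the global mean
    Pi = (Xc @ Xc.T) / X.shape[1]             # sample covariance [Eq. (15)]
    _, U = np.linalg.eigh(Pi)
    return U[:, ::-1][:, :d1]                 # leading d1 eigenvectors
```

Each reduced set is then `Y_i = P.T @ X_i`, which yields a ${d}_{1}\times {d}_{1}$ covariance descriptor in the next step.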

## 4.2.

### Eigenspectrum Regularization with SPD Manifold

The $i$’th dimensionally reduced set can be represented as ${\mathbf{Y}}_{i}=[{y}_{1}^{i},{y}_{2}^{i},\dots ,{y}_{{L}_{i}}^{i}]$. ${\mathbf{Y}}_{i}$ can be modeled by the covariance descriptor [Eq. (1)] and represented as ${\mathbf{B}}_{i}\in {\mathbb{R}}^{{d}_{1}\times {d}_{1}}$. To ensure that ${\mathbf{B}}_{i}$ is nonsingular and thus lies on the SPD manifold, a small perturbation is added to the covariance matrix: ${\mathbf{B}}_{i}+\eta \mathbf{I}$. Hence, the perturbed ${\mathbf{B}}_{i}$ is an SPD matrix ${\mathrm{sym}}_{{d}_{1}}^{+}$.

The $L$ image sets of $C$ classes can be denoted as a collection of ${\mathrm{sym}}_{{d}_{1}}^{+}$ matrices $\mathbf{B}=\{{\mathbf{B}}_{1},{\mathbf{B}}_{2},\dots ,{\mathbf{B}}_{L}\}$ that form an SPD manifold. By defining the Riemannian mapping $\varphi :\mathcal{M}\to \mathcal{H}$, we can obtain the samples $\mathbf{\Phi}(\mathbf{B})=[\varphi ({\mathbf{B}}_{1}),\varphi ({\mathbf{B}}_{2}),\dots ,\varphi ({\mathbf{B}}_{L})]$ on the RKHS $\mathcal{H}$, which is homeomorphic to a Euclidean space.

To further preserve the local structure, we incorporate the graph-embedded framework into our proposed method. The local Laplacian matrix ${\mathbf{L}}_{\mathrm{loc}}$ is utilized to preserve the locality information, whereas the global Laplacian matrix treats all vertices as adjacent regardless of their class membership.^{29} In this step, we aim to obtain the eigenspectrum and directions of the local structure in the SPD Riemannian kernel space. They can be obtained by decomposing the locality preserving matrix $\mathbf{\Phi}(\mathbf{B}){\mathbf{L}}_{\mathrm{loc}}\mathbf{\Phi}{(\mathbf{B})}^{T}$ on the mapped space; we denote $\mathbf{\Phi}(\mathbf{B})$ as $\mathbf{\Phi}$ for simplicity. Then, we have

## Eq. (17)

$$\mathbf{\lambda}={\mathbf{V}}^{T}\mathbf{\Phi}{\mathbf{L}}_{\mathrm{loc}}{\mathbf{\Phi}}^{T}\mathbf{V},$$

where ${\mathbf{L}}_{\mathrm{loc}}$ is a local Laplacian matrix. One choice is the class Laplacian ${\mathbf{L}}_{\text{class}}$, which is built from the weights ${W}_{ij}$ of Eq. (6):^{10}

## Eq. (18)

$${\mathbf{L}}_{\text{class}}=\{\begin{array}{ll}1-{W}_{ij},& i=j\\ -{W}_{ij},& i\ne j,\text{\hspace{0.17em}}\text{if}\text{\hspace{0.17em}\hspace{0.17em}}{\mathbf{B}}_{i}\in {C}_{c}\text{\hspace{0.17em}\hspace{0.17em}}\text{and}\text{\hspace{0.17em}\hspace{0.17em}}{\mathbf{B}}_{j}\in {C}_{c}\\ 0,& \text{otherwise}\end{array},$$

The locality preserving matrix $\mathbf{\Phi}{\mathbf{L}}_{\text{class}}{\mathbf{\Phi}}^{T}$ can be proved to be equal to the kernel within-class scatter matrix ${\mathbf{S}}_{W}^{\mathrm{\Phi}}$ when equal weights are placed on the edges of adjacent data pairs in ${\mathbf{L}}_{\text{class}}$.^{29}
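For illustration, with the weights of Eq. (6) (which include ${W}_{ii}=1/{N}_{c}$), the class Laplacian of Eq. (18) reduces to $\mathbf{I}-\mathbf{W}$ and can be built as follows (the helper name is ours):

```python
import numpy as np

def class_laplacian(labels):
    """Weights W of Eq. (6) (1/N_c for same-class set pairs) and the
    corresponding class Laplacian of Eq. (18), here I - W."""
    labels = np.asarray(labels)
    same = labels[:, None] == labels[None, :]
    Nc = np.array([np.sum(labels == c) for c in labels])  # sets per class
    W = np.where(same, 1.0 / Nc[:, None], 0.0)
    return np.eye(len(labels)) - W
```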

Unlike the edge weights in ${\mathbf{L}}_{\text{bin}}$ and ${\mathbf{L}}_{\text{class}}$, which are fixed values, the edge weights in ${\mathbf{L}}_{\mathrm{adj}}$ are variables based on different similarity definitions, such as the heat kernel in locality preserving projections^{44} and the neighborhood reconstruction coefficients in neighborhood preserving embedding.^{45} ${\mathbf{L}}_{\mathrm{adj}}$ is a Laplacian matrix whose weights are computed by

## Eq. (20)

$${W}_{ij}=\{\begin{array}{ll}\mathrm{exp}(-\frac{{\Vert \varphi ({\mathbf{B}}_{i})-\varphi ({\mathbf{B}}_{j})\Vert}^{2}}{\sigma}),& \text{if}\text{\hspace{0.17em}\hspace{0.17em}}{\mathbf{B}}_{i}\in {C}_{c}\text{\hspace{0.17em}\hspace{0.17em}}\text{and}\text{\hspace{0.17em}\hspace{0.17em}}{\mathbf{B}}_{j}\in {C}_{c}\\ 0,& \text{otherwise}\end{array},$$

where $\sigma $ is the width parameter of the heat kernel. As is known from linear algebra, the projection directions $\mathbf{V}$ in Eq. (17) can be represented as a linear combination of the mapped samples:

## Eq. (21)

$$\mathbf{V}=\mathbf{\Phi}\mathbf{\alpha}.$$

By substituting Eq. (21) into Eq. (17), we obtain $\mathbf{\lambda}={\mathbf{\alpha}}^{T}{\mathbf{\Phi}}^{T}\mathbf{\Phi}{\mathbf{L}}_{\mathrm{loc}}{\mathbf{\Phi}}^{T}\mathbf{\Phi}\mathbf{\alpha}$. Using the Riemannian kernel function [e.g., Eq. (4)] to build the kernel Gram matrix $\mathbf{K}={\mathbf{\Phi}}^{T}\mathbf{\Phi}$, Eq. (17) can be rewritten as

## Eq. (22)

$$\mathbf{\lambda}={\mathbf{\alpha}}^{T}\mathbf{K}{\mathbf{L}}_{\mathrm{loc}}\mathbf{K}\mathbf{\alpha}.$$

Equation (22) can be solved by the eigendecomposition of $\mathbf{K}{\mathbf{L}}_{\mathrm{loc}}\mathbf{K}$ subject to ${\mathbf{\alpha}}^{T}\mathbf{\alpha}=1$.
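Both the heat-kernel graph of Eq. (20) and the eigendecomposition of Eq. (22) can be computed purely from the kernel matrix $\mathbf{K}$, since ${\Vert \varphi ({\mathbf{B}}_{i})-\varphi ({\mathbf{B}}_{j})\Vert}^{2}={K}_{ii}+{K}_{jj}-2{K}_{ij}$. A sketch, assuming the Laplacian is formed in the usual way as $\mathbf{D}-\mathbf{W}$ with $\mathbf{D}$ the degree matrix:

```python
import numpy as np

def adjacency_laplacian(K, labels, sigma=1.0):
    """Heat-kernel weights of Eq. (20) via the kernel trick, then the
    graph Laplacian L_adj = D - W (our assumed construction)."""
    labels = np.asarray(labels)
    d2 = np.diag(K)[:, None] + np.diag(K)[None, :] - 2 * K  # RKHS distances^2
    W = np.exp(-d2 / sigma)
    W[labels[:, None] != labels[None, :]] = 0.0  # keep same-class pairs only
    return np.diag(W.sum(axis=1)) - W

def kernel_eigenspectrum(K, L_loc):
    """Eigenspectrum and coefficients alpha of Eq. (22): eigendecomposition
    of K L_loc K with unit-norm alpha, in descending order."""
    lam, A = np.linalg.eigh(K @ L_loc @ K)
    return lam[::-1], A[:, ::-1]
```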

In the theory of eigenspectrum regularization, we need to regularize the whole feature space $\mathbf{V}$ [see Eq. (21)] on the mapped space. The regularization can be generalized to a weighting function over the eigenspectrum, defined as

## Eq. (23)

$${w}_{k}=f({\lambda}_{k}),\phantom{\rule[-0.0ex]{1em}{0.0ex}}k=\mathrm{1,2},\dots ,L,$$

where $L$ is the number of all training sets. The full-dimensional feature space $\mathbf{V}$ contains $L$ vectors $[{v}_{1},{v}_{2},\dots ,{v}_{L}]$ since the size of the matrix $\mathbf{K}{\mathbf{L}}_{\mathrm{loc}}\mathbf{K}$ is $L\times L$. Hence, the regularized eigenspace can be computed as

## Eq. (24)

$$\tilde{\mathbf{V}}=[{w}_{1}{v}_{1},{w}_{2}{v}_{2},\dots ,{w}_{L}{v}_{L}].$$

According to Eq. (21), we have ${v}_{k}=\mathbf{\Phi}{\mathbf{\alpha}}_{k}$. By defining

## Eq. (25)

$$\mathit{\rho}=[{w}_{1}{\mathbf{\alpha}}_{1},{w}_{2}{\mathbf{\alpha}}_{2},\dots ,{w}_{L}{\mathbf{\alpha}}_{L}],$$

the regularized eigenspace of Eq. (24) can be rewritten as

## Eq. (26)

$$\tilde{\mathbf{V}}=\mathbf{\Phi}\mathit{\rho}.$$

The regularized eigenspace acts as a transformation matrix^{24} that can transform the original feature data to an intermediate feature vector space. It is worth noting that the transformation matrix $\tilde{\mathbf{V}}$ is a full-dimensional matrix with size $L\times L$. Hence, the mapped data $\mathbf{\Phi}(\mathbf{B})$ from the SPD manifold can be transformed to the new feature vector space $\tilde{\mathbf{\Phi}}(\mathbf{B})$ with no dimensional reduction, which preserves information as much as possible. We denote $\tilde{\mathbf{\Phi}}(\mathbf{B})$ as $\tilde{\mathbf{\Phi}}$ for simplicity. The transformation is depicted as

## Eq. (27)

$$\tilde{\mathbf{\Phi}}={\tilde{\mathbf{V}}}^{T}\mathbf{\Phi}.$$

Although $\mathbf{\Phi}$ is implicitly defined, the transformed feature $\tilde{\mathbf{\Phi}}$ can be explicitly expressed by the kernel trick. According to Eqs. (26) and (27), we can denote the transformed feature by $\tilde{\mathbf{\Phi}}={\mathit{\rho}}^{T}{\mathbf{\Phi}}^{T}\mathbf{\Phi}$. As $\mathbf{K}={\mathbf{\Phi}}^{T}\mathbf{\Phi}$, Eq. (27) can be rewritten as

## Eq. (28)

$$\tilde{\mathbf{\Phi}}={\mathit{\rho}}^{T}\mathbf{K}.$$

In this respect, according to the previous derivation, by defining the regularized eigenspace $\mathit{\rho}$ [see Eq. (25)], the regularization of the eigenspace $\mathbf{V}$ is turned into the regularization of the eigenspace $\mathbf{\alpha}$. In other words, applying an eigenspectrum regularization model to the eigenspace $\mathbf{\alpha}$ is equivalent to applying it to the eigenspace $\mathbf{V}$.

The selection of a suitable eigenspectrum regularization model is a critical aspect of the proposed method. A proper eigenspectrum regularization model ensures that the regularized data are very close to the real population variances.^{46} The eigenspectrum regularization of LRE is an adaptive model that estimates the optimal parameter $\gamma $ using the training data. However, this process is usually time-consuming, and the performance decays quickly when the training data are insufficient. In this paper, we employ the data-independent eigenspectrum regularization models of ERE and CDEFE to regularize the eigenspace $\mathbf{\alpha}$, which are more general and robust. The first model is the eigenspectrum regularization model of ERE.^{24} The heuristic basis of ERE for designing the eigenspectrum regularization model is the median operation. The weighting function applied to the eigenspace $\mathbf{\alpha}$ is defined as

## Eq. (29)

$${w}_{k}^{\mathrm{ERE}}=\{\begin{array}{ll}{\lambda}_{k}^{-1/2}& k<{m}_{1}\\ {\left(\frac{a}{k+b}\right)}^{-1/2}& {m}_{1}\le k\le r\\ {\left(\frac{a}{r+1+b}\right)}^{-1/2}& r<k\le L\end{array},$$

## Eq. (30)

$${\lambda}_{{m}_{1}}=\mathrm{max}\{\forall \text{\hspace{0.17em}\hspace{0.17em}}{\lambda}_{k}|{\lambda}_{k}<[{\lambda}_{\mathrm{med}}+\mu ({\lambda}_{\mathrm{med}}-{\lambda}_{r})]\},$$

where ${\lambda}_{\mathrm{med}}$ is the median of the eigenspectrum^{24} and $r$ is the rank of $\mathbf{K}{\mathbf{L}}_{\mathrm{loc}}\mathbf{K}$. The parameters $a$ and $b$ are calculated as

## Eq. (31)

$$a=\frac{{\lambda}_{1}{\lambda}_{{m}_{1}}({m}_{1}-1)}{{\lambda}_{1}-{\lambda}_{{m}_{1}}},\phantom{\rule{0ex}{0ex}}b=\frac{{m}_{1}{\lambda}_{{m}_{1}}-{\lambda}_{1}}{{\lambda}_{1}-{\lambda}_{{m}_{1}}}.$$

The second regularization model is taken from CDEFE.^{26} The eigenspectrum regularization model of CDEFE regularizes the eigenspace in a Gaussian kernel space, which may have a special effect on the proposed RGCDL in the Riemannian kernel space.
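The first (ERE) model above, Eqs. (29)–(31), can be sketched as follows. This is our own minimal sketch, not the authors' implementation: `ere_weights` is a hypothetical helper name, the default $\mu=1$ and the rank tolerance are assumptions, and the eigenvalues are assumed sorted in descending order.

```python
import numpy as np

def ere_weights(lam, mu=1.0):
    """ERE weighting sketch for a descending eigenspectrum `lam` (length L)."""
    lam = np.asarray(lam, dtype=float)
    L = lam.size
    r = int(np.sum(lam > 1e-12))                     # rank of K L_loc K
    lam_med = np.median(lam[:r])                     # median of nonzero spectrum
    thresh = lam_med + mu * (lam_med - lam[r - 1])   # Eq. (30) threshold
    below = np.where(lam[:r] < thresh)[0]
    m1 = int(below[0]) + 1 if below.size else r      # 1-based index m_1
    l1, lm = lam[0], lam[m1 - 1]
    a = l1 * lm * (m1 - 1) / (l1 - lm)               # Eq. (31)
    b = (m1 * lm - l1) / (l1 - lm)
    w = np.empty(L)
    for k in range(1, L + 1):                        # 1-based index k, Eq. (29)
        if k < m1:
            w[k - 1] = lam[k - 1] ** -0.5
        elif k <= r:
            w[k - 1] = (a / (k + b)) ** -0.5
        else:
            w[k - 1] = (a / (r + 1 + b)) ** -0.5
    return w
```

Note that the fitted decay model $a/(k+b)$ passes exactly through ${\lambda}_{1}$ at $k=1$ and ${\lambda}_{{m}_{1}}$ at $k={m}_{1}$, so the weights vary smoothly across the ${m}_{1}$ boundary.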

The second regularization model aims to find the minimum eigenratio from the eigenspectrum of $\mathbf{K}{\mathbf{L}}_{\mathrm{loc}}\mathbf{K}$, which is formed by the eigenvalues [see Eq. (17)] in descending order. Letting ${\delta}_{k}$ denote the ratio of the two adjacent eigenvalues ${\lambda}_{k}$ and ${\lambda}_{k+1}$ in the eigenspectrum, we have

## Eq. (32)

$${\delta}_{k}=\frac{{\lambda}_{k}}{{\lambda}_{k+1}},\phantom{\rule[-0.0ex]{1em}{0.0ex}}1\le k<r.$$
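As a minimal sketch (assuming the eigenratio of Eq. (32) is ${\delta}_{k}={\lambda}_{k}/{\lambda}_{k+1}$, and with `min_eigenratio` a helper name of ours), the search for the minimum eigenratio of Eq. (33) can be written as:

```python
import numpy as np

def min_eigenratio(lam):
    """Return (delta_s, s) for a descending eigenspectrum `lam`.

    Only the leading r nonzero eigenvalues are used; s is 1-based,
    matching the index convention of Eqs. (32) and (33).
    """
    lam = np.asarray(lam, dtype=float)
    r = int(np.sum(lam > 1e-12))        # rank: number of nonzero eigenvalues
    delta = lam[:r - 1] / lam[1:r]      # delta_k = lambda_k / lambda_{k+1}
    s = int(np.argmin(delta)) + 1       # 1-based position of the minimum
    return float(delta[s - 1]), s
```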

The minimum eigenratio can be formulated as

## Eq. (33)

$${\delta}_{s}=\mathrm{min}\{\forall \text{\hspace{0.17em}\hspace{0.17em}}{\delta}_{k},1\le k<r\}.$$

## 4.3.

### Feature Extraction and Dimensional Reduction

The new feature vectors $\tilde{\mathbf{\Phi}}$ are compact and full dimensional; the eigenfeatures of $\tilde{\mathbf{\Phi}}$ should be decorrelated and dimensionally reduced for classification. According to Jiang et al.,^{24} PCA is exploited to extract the final discriminative eigenfeatures since it is less sensitive to different training databases. However, class affinity is not considered in Ref. 24, which may cause discriminative information to be lost. In this work, we employ a graph-embedded framework to extract the final discriminative features, which incorporates a similarity weight matrix to form the final scatter matrix. Although LRE extended the graph-embedded framework to eigenfeature extraction, our method is designed to address these problems in a Riemannian kernel space.

According to Eq. (13), the eigenfeature extraction and dimensional reduction of RGCDL can be achieved by solving the following eigendecomposition problem on the mapped space:

## Eq. (35)

$${\mathbf{U}}^{*}=\underset{{\mathbf{U}}^{T}\mathbf{U}=1}{\mathrm{arg}\text{\hspace{0.17em}}\mathrm{max}}\text{\hspace{0.17em}}{\mathbf{U}}^{T}\tilde{\mathbf{\Phi}}\mathbf{G}{\tilde{\mathbf{\Phi}}}^{T}\mathbf{U},$$

where the similarity weight matrix $\mathbf{G}$ is defined as

## Eq. (36)

$${\mathbf{G}}_{ij}=\{\begin{array}{ll}\frac{1}{{N}_{c}}-\frac{1}{L},& \text{if}\text{\hspace{0.17em}\hspace{0.17em}}{\mathbf{X}}_{i}\in {C}_{c}\text{\hspace{0.17em}\hspace{0.17em}}\text{and}\text{\hspace{0.17em}\hspace{0.17em}}{\mathbf{X}}_{j}\in {C}_{c}\\ -\frac{1}{L},& \mathrm{otherwise}\end{array},$$

where ${N}_{c}$ is the number of training sets in class ${C}_{c}$. The final projection matrix is

## Eq. (37)

$$\mathbf{Z}=\tilde{\mathbf{V}}\mathbf{U}.$$

Obviously, $\mathbf{Z}$ does not have an explicit expression since $\tilde{\mathbf{V}}=\mathbf{\Phi}\mathit{\rho}$, and $\mathbf{\Phi}$ is the vector space mapped by the Riemannian mapping, which is implicitly defined.
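The graph-embedded extraction can be sketched in the explicit (kernel-trick) feature space. This is a sketch under our assumptions: ${N}_{c}$ is read as the class size, `Phi_t` stands for the explicit transformed features of Eq. (28), and the helper names are ours.

```python
import numpy as np

def class_scatter_graph(labels):
    """Similarity weight matrix G of Eq. (36) for class labels of L sets."""
    labels = np.asarray(labels)
    L = labels.size
    same = labels[:, None] == labels[None, :]
    Nc = same.sum(axis=1, keepdims=True).astype(float)  # class size per row
    return np.where(same, 1.0 / Nc, 0.0) - 1.0 / L

def top_projections(Phi_t, G, d2):
    """Eq. (35) sketch: leading d2 eigenvectors U of Phi_t @ G @ Phi_t.T."""
    S = Phi_t @ G @ Phi_t.T
    S = (S + S.T) / 2.0                 # symmetrize for numerical safety
    vals, vecs = np.linalg.eigh(S)      # eigh returns ascending eigenvalues
    return vecs[:, ::-1][:, :d2]        # keep the top-d2 directions
```

A useful sanity check on this construction is that every row of $\mathbf{G}$ sums to zero, so the constant vector is always in its null space.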

However, an explicit expression can be provided by the kernel trick when calculated with the test samples. For a given test nonsingular covariance matrix ${\mathbf{B}}^{te}$, which is an element of the SPD manifold, we use ${\varphi}^{te}$ to denote the test feature vector that is mapped by the Riemannian mapping. Subsequently, we can extract the discriminative feature $\mathbf{F}$ by the transformation

## Eq. (38)

$$\mathbf{F}={\mathbf{Z}}^{T}{\varphi}^{te}={\mathbf{U}}^{T}{\tilde{\mathbf{V}}}^{T}{\varphi}^{te}.$$

Since $\tilde{\mathbf{V}}=\mathbf{\Phi}\mathit{\rho}$, Eq. (38) can be computed by the kernel trick as

## Eq. (39)

$$\mathbf{F}={(\mathit{\rho}\mathbf{U})}^{T}{\mathbf{K}}^{te},\phantom{\rule[-0.0ex]{1em}{0.0ex}}{\mathbf{K}}^{te}={[{k}_{\mathrm{log}}({\mathbf{B}}_{1},{\mathbf{B}}^{te}),\cdots ,{k}_{\mathrm{log}}({\mathbf{B}}_{L},{\mathbf{B}}^{te})]}^{T}.$$

## 4.4.

### Complete RGCDL Algorithm

The steps of the RGCDL algorithm are given in Algorithm 1.

## Algorithm 1

RGCDL algorithm.

At the training stage:

1. Collaboratively reduce $L$ training sets $\{{\mathbf{X}}_{1},{\mathbf{X}}_{2},\dots ,{\mathbf{X}}_{L}\}$ by Eq. (16) to $\{{\mathbf{Y}}_{1},{\mathbf{Y}}_{2},\dots ,{\mathbf{Y}}_{L}\}$, and model them as ${\mathrm{sym}}_{{d}_{1}}^{+}$ matrices by the covariance descriptor [Eq. (1)], denoted as $\{{\mathbf{B}}_{1},{\mathbf{B}}_{2},\dots ,{\mathbf{B}}_{L}\}$.

2. Use the Riemannian mapping $\varphi :\mathcal{M}\to \mathcal{H}$ to map the ${\mathrm{sym}}_{{d}_{1}}^{+}$ matrices $\{{\mathbf{B}}_{1},{\mathbf{B}}_{2},\dots ,{\mathbf{B}}_{L}\}$ to the RKHS, and designate them $\{{\varphi}_{1},{\varphi}_{2},\dots ,{\varphi}_{L}\}$.

3. Obtain the optimal coefficient $\mathbf{\alpha}$ and the eigenspectrum $\mathbf{\lambda}$ by solving the eigendecomposition problem of Eq. (22), replacing the local Laplacian matrix ${L}_{\mathrm{loc}}$ with ${L}_{\mathrm{bin}}$, ${L}_{\text{class}}$, and ${L}_{\mathrm{adj}}$, respectively.

4. Regularize the eigenspace $\mathbf{V}$ using the weighting functions of Eqs. (29) and (34), respectively, and construct the full-dimensional transformation matrix $\tilde{\mathbf{V}}$ using Eq. (24).

5. Transform the mapped feature $\mathbf{\Phi}$ to an intermediate full-dimensional feature space using Eqs. (25) and (28).

6. Solve the eigendecomposition problem of Eq. (35), and compute the final projection matrix $\mathbf{Z}$ by Eq. (37).

At the testing stage:

1. Given a test image set ${\mathbf{X}}^{te}$, project it to a lower-dimensional feature space and extract the covariance feature ${\mathbf{B}}^{te}$.

2. Map ${\mathbf{B}}^{te}$ to the RKHS by the Riemannian mapping, and designate it ${\varphi}^{te}$.

3. Project ${\varphi}^{te}$ onto the final projection matrix $\mathbf{Z}$ to extract the discriminative features by Eq. (39).

4. Measure the similarity between the extracted features of the training and test sets, and assign the label with a classifier (e.g., NN).
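The testing stage above can be sketched end to end. This is our own illustration under stated assumptions: the trace-regularized covariance in `covariance_descriptor` and the trace-of-log-products form of ${k}_{\mathrm{log}}$ are assumptions of ours about Eqs. (1) and (4), and all helper names are hypothetical.

```python
import numpy as np
from scipy.linalg import logm

def covariance_descriptor(X, eps=1e-3):
    """Model an image set X (d x n, one feature vector per column) as a
    nonsingular covariance matrix; the eps * trace perturbation that
    guards nonsingularity is an assumption."""
    Xc = X - X.mean(axis=1, keepdims=True)
    C = Xc @ Xc.T / (X.shape[1] - 1)
    return C + eps * np.trace(C) * np.eye(X.shape[0]) / X.shape[0]

def k_log(X, Y):
    """Riemannian kernel induced by the LED, assumed to be
    k_log(X, Y) = tr(log(X) log(Y))."""
    return float(np.trace(logm(X) @ logm(Y)).real)

def classify(B_test, train_spds, train_labels, rho, U):
    """Testing steps 2-4: kernel-trick feature extraction (Eq. (39))
    followed by NN classification against the training features."""
    K_te = np.array([k_log(B, B_test) for B in train_spds])
    Z = rho @ U                          # the rho U factor of Eq. (39)
    f_te = Z.T @ K_te                    # test feature F
    K_tr = np.array([[k_log(Bi, Bj) for Bj in train_spds]
                     for Bi in train_spds])
    F_tr = (Z.T @ K_tr).T                # one training feature per row
    d = np.linalg.norm(F_tr - f_te, axis=1)
    return train_labels[int(np.argmin(d))]
```

With identity `rho` and `U`, the sketch reduces to NN matching on raw LED kernel responses, which already separates well-conditioned toy classes.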

## 5.

## Experimental Results

Experiments were conducted on set-based face recognition and object categorization tasks. First, we compare the proposed RGCDL to the original CDL method when using different numbers of extracted features. Second, we show the advantages of RGCDL over the recent RGDA method. Finally, we evaluate the recognition performance of our RGCDL and compare it to numerous image set-based classification methods.

## 5.1.

### Dataset and Parameter Settings

We employed the Extended Yale face database B (ExtYaleB)^{47} for the face recognition task and the RGB-D object database^{48} for the object categorization task. The ExtYaleB database is the extension of the Yale face database B; it contains 16,128 images of 28 human subjects under 64 illumination conditions and 9 poses per subject. We built 9 image sets per subject ($\sim 60$ images per set), one for each of the 9 poses. We utilized a cascaded face detector^{49} to collect faces from each image frame. The captured faces were then converted to grayscale and resized to $20\times 20\text{\hspace{0.17em}\hspace{0.17em}}\text{pixels}$. Some example images are shown in Fig. 2. We selected 2 to 5 of the 9 image sets per subject for discriminative training (103 sets) and employed the remaining sets for testing (149 sets). The experiments were repeated 10 times by randomly choosing the reference sets for training and the test sets for probe.

The RGB-D object database is a large-scale dataset of 300 common household objects that are organized into 51 categories (classes). Each category contains 3 to 14 objects. For each object, 3 video sequences were recorded with a camera mounted at different heights so that the object is viewed from different angles relative to the horizon. The video sequences were captured by placing each object on a turntable for a whole rotation using a Kinect-style three-dimensional camera. More than 100 images were extracted from each object’s video sequences; they comprise RGB color channels and a depth channel. We removed the depth images in this study to ensure fair comparisons. We built one image set per object, forming a total of 300 image sets (102 sets for training and 198 sets for testing). Grayscale images resized to $20\times 20\text{\hspace{0.17em}\hspace{0.17em}}\text{pixels}$ were adopted for the RGB-D dataset. Some example objects are shown in Fig. 3. To obtain more general results, we also conducted 10 cross-validation experiments by randomly choosing different combinations of training sets and test sets. The NN classifier was applied for all evaluations to ensure fair comparisons.

## 5.2.

### Stability of Extracted Features

In this section, we evaluated the stability of the extracted features from RGCDL. We show that by applying eigenspectrum regularization, the features extracted by RGCDL are more stable than those extracted by the original CDL method.

As described in Sec. 4, the proposed RGCDL aims to extract features from the whole regularized eigenfeature space. As the number of final extracted features increases [controlled by varying the ${d}_{2}$ dimensions of $\mathbf{U}$ in Eq. (37)], higher performance can be achieved by our RGCDL, whereas the original CDL algorithm cannot retain this characteristic. To confirm this assumption, we conducted experiments on real data from the face and object datasets.

The collaborative dimensional reduction of each image set was set to 100 dimensions; that is, the dimension-reduced covariance matrix of each image set is $100\times 100$. Owing to the covariance descriptor, the computational cost of constructing the Riemannian kernel matrices does not depend on the number of images within a set. The Riemannian kernel induced by the LED in Eq. (4) was employed for the RGCDL and CDL methods. We varied the final feature dimensions of RGCDL and CDL to perform a comprehensive comparison. The comparison results are shown in Figs. 4 and 5. Each figure plots the recognition rate against the number of final extracted features. The recognition rates are the average results of 10 cross-validation experiments. Two eigenspectrum regularization models, ERE and CDEFE, were evaluated for the proposed RGCDL.

As shown in Figs. 4 and 5, with increasing dimensions of the final extracted features, the proposed RGCDL methods with the two regularization models generally produce low error rates, whereas the original CDL degrades rapidly beyond dimension $C-1$. The dimension $C-1$ represents the optimal number of features in LDA.^{17} The degradation of CDL is caused by the incorrectly scaled null kernel space of the within-class scatter matrix,^{24} which causes overfitting and poor generalization. These results reveal that, with the eigenspectrum regularization models, the conventional problems (e.g., the singularity of the within-class scatter matrix) caused by limited training samples in CDL can be alleviated. Since the new feature space is properly scaled, the estimated eigenvalues obey the true variances of the population;^{24} hence, better generalization can be achieved. The final extracted features are learned from the regularized full-dimensional transformation matrix $\tilde{\mathbf{V}}$ [Eq. (26)], which preserves as much discriminative information as possible. As a result, even as the number of features increases, the recognition rates of RGCDL remain stable.

## 5.3.

### Performance Evaluation against RGDA

RGDA is the preliminary work of this paper; however, RGDA was proposed to solve the overfitting and poor generalization problems of Grassmann discriminative learning, whereas our RGCDL solves these problems for CDL on the SPD manifold. Moreover, we further employed different local Laplacian graphs to analyze the locality preserving ability and improve the performance; locality preserving is evaluated in Sec. 5.4. We discovered that the performance of the SPD manifold with the eigenspectrum regularization techniques is better than that of the Grassmann manifold. We conducted two experiments to evaluate the advantages of the proposed method. First, we compared the classification ability of RGDA and RGCDL when different dimensions of the extracted features were applied. As shown in Figs. 6 and 7, for both eigenspectrum regularization models on the two datasets, the error rates of RGCDL with different numbers of features are always lower than those of RGDA. Moreover, the error rate curves of RGCDL are smoother and steadier across different numbers of features, and RGCDL can achieve a lower error rate even with a small number of features, particularly on the ExtYaleB dataset. The RGDA method usually cannot achieve high performance using lower feature dimensions. This finding demonstrates that the SPD manifold formed by the covariance matrices preserves discriminative information better than the Grassmann manifold formed by subspaces.

Subsequently, we used noisy set data to evaluate the robustness of the proposed method. Image sets may contain noisy data in real-world applications; for example, outliers from other categories or subjects may exist within sets, which can degrade the performance of classifiers. Here, we show that the SPD manifold-based RGCDL method is more robust than the Grassmann manifold-based RGDA method. We conducted experiments by systematically corrupting the training (gallery) sets or test (probe) sets. The corruption is implemented by adding images from other classes. The data with no noise are denoted as “clean,” the data with noise in the gallery sets are denoted as “N_G,” and the data with noise in the probe sets are denoted as “N_P.” Experiments were evaluated on both face recognition and object categorization.

The average classification rates of several cross-validations on the different noise-corrupted datasets are shown in Figs. 8 and 9. The classification rates of our RGCDL always outperform those of RGDA on clean and corrupted data. Especially on the gallery-corrupted data N_G, RGCDL-ERE and RGCDL-CDEFE exhibit great advantages over RGDA-ERE and RGDA-CDEFE. Once again, these results demonstrate that the SPD manifold formed by the second-order statistic covariance matrices can account for noisy set data better than the Grassmann manifold formed by subspaces; this reveals the robustness of RGCDL when dealing with noisy set data.

## 5.4.

### Performance Comparison to Other Set-Based Classification Methods

We further evaluated the proposed RGCDL compared with other set-based classification methods. Multiple image set-based classification methods were evaluated for comprehensive comparison. The compared methods include the subspace-based methods DCC^{7} and ECCA;^{36} the Grassmann manifold methods of GDA,^{8} KGDA,^{38} GGDA,^{9} MMD,^{3} GNP,^{11} and RGDA;^{2} the SPD Riemannian manifold methods of CDL,^{10} PPCDL,^{16} and DARG.^{41}

The parameter settings of the different methods are as follows. The final feature dimensions of GDA, KGDA, GGDA, and RGDA were set following the recommendation of Ref. 2, and only the projection kernel^{8} was employed for Grassmannian mapping. The parameters of MMD (such as the nonlinearity score and the number of NNs of data points) were tuned to be optimal on our datasets with the code provided by the authors. For CDL and PPCDL, the final feature dimension was set to the recommended value $C-1$. The dimension of the input covariance matrices of DARG and PPCDL was set to $100\times 100$, the same as the proposed RGCDL for fairness. The Riemannian kernel induced by the LED was applied for CDL, PPCDL, and our RGCDL. We chose kernel-based DARG with the well-performing MD + LED^{41} distance metric for evaluation. We fixed the dimension to 150 for DCC by preapplying PCA to the data.^{7} The number of canonical correlations of DCC and ECCA was set to 20, the same as the Grassmannian dimension of the GDA, KGDA, GGDA, and RGDA methods. For RGDA and our RGCDL, the eigenspectrum regularization models of ERE and CDEFE were applied for comparison. Three local Laplacian matrices, ${L}_{\text{bin}}$, ${L}_{\text{class}}$, and ${L}_{\mathrm{adj}}$, were evaluated in the proposed RGCDL.

Experiments were evaluated on the ExtYaleB and RGB-D datasets. The experimental results are reported as average classification rates and standard deviations over 10-fold cross-validations. As shown in Table 1, the proposed RGCDL with the regularization models of ERE and CDEFE achieves the best classification results among all methods. The SPD manifold methods with covariance matrices (CDL, PPCDL, DARG, and our RGCDL) usually achieve better performance than the other methods on ExtYaleB, which shows the better accommodative ability of the second-order statistic covariance matrix in handling illumination-varying face recognition. The inferior performances of PPCDL and DARG on the RGB-D dataset may be caused by the conventional problems of discriminative learning and the improperly estimated GMM. Benefitting from the eigenspectrum regularization with the graph-embedded framework, the proposed RGCDL with different models outperforms all other methods. For the evaluation of different local Laplacian graphs, ${L}_{\text{class}}$ achieves the best results with the ERE regularization model. However, the performance of the adjustable Laplacian matrix ${L}_{\mathrm{adj}}$ is also outstanding with the ERE model, and it achieves the best results with the CDEFE model. The adjustable Laplacian matrix ${L}_{\mathrm{adj}}$ performs stably with different regularization models, revealing its good locality-preserving ability. Nevertheless, ${L}_{\mathrm{adj}}$ is not necessarily the best local Laplacian matrix for locality preserving; a better affinity matrix can be designed according to suitable theories.

## Table 1

Average classification rates and standard deviations (%) on ExtYaleB and RGB-D datasets.

Method | ExtYaleB | RGB-D
---|---|---
MMD | $58.9\pm 4.1$ | $54.4\pm 2.8$
KGDA | $86.7\pm 2.7$ | $66.5\pm 2.6$
GGDA | $73.4\pm 4.1$ | $61.9\pm 2.6$
GDA | $89.7\pm 2.1$ | $66.3\pm 3.6$
ECCA | $68.2\pm 2.5$ | $56.3\pm 2.7$
DCC | $79.9\pm 2.6$ | $67.3\pm 2.6$
CDL | $97.4\pm 1.2$ | $69.7\pm 3.4$
GNP | $80.1\pm 1.6$ | $62.1\pm 1.1$
PPCDL | $97.2\pm 1.3$ | $51.8\pm 2.7$
DARG | $91.8\pm 2.4$ | $55.3\pm 4.1$
RGDA-ERE | $91.7\pm 2.4$ | $65.2\pm 3.0$
RGDA-CDEFE | $91.0\pm 1.1$ | $64.9\pm 3.2$
RGCDL-ERE-${L}_{\text{bin}}$ | $98.5\pm 1.3$ | $71.9\pm 3.3$
RGCDL-ERE-${L}_{\text{class}}$ | $98.5\pm 1.3$ | $72.1\pm 3.3$
RGCDL-ERE-${L}_{\mathrm{adj}}$ | $98.5\pm 1.3$ | $72.0\pm 2.2$
RGCDL-CDEFE-${L}_{\text{bin}}$ | $98.1\pm 1.1$ | $71.4\pm 3.3$
RGCDL-CDEFE-${L}_{\text{class}}$ | $98.3\pm 1.1$ | $71.1\pm 2.2$
RGCDL-CDEFE-${L}_{\mathrm{adj}}$ | $98.5\pm 1.1$ | $71.5\pm 2.2$

## 6.

## Conclusion

In this paper, we proposed a regularized graph-embedded CDL method, referred to as RGCDL. The eigenspectrum regularization and graph-embedded framework are collaboratively employed to attenuate the overfitting and poor generalization problems of the original CDL method. Comprehensive mathematical deduction in the SPD manifold kernel space is given to exhibit the combination of these techniques. The experimental results of evaluating different numbers of extracted features show that the proposed method maintains stable and lower error rates throughout all dimensions of the extracted features. This result manifests the stability that eigenspectrum regularization brings to linear discriminative learning in the SPD manifold kernel space. The graph-embedded framework benefits the method by preserving compact within-class affinity relations and achieves higher performance. Compared with the more recent RGDA method, our RGCDL achieves higher and steadier performance when different numbers of features are employed. Moreover, our RGCDL is more robust than RGDA when the gallery or probe sets are corrupted by noise. In plentiful comparisons with other set-based classification methods, our RGCDL has shown considerable results. The local Laplacian matrix reflects the local structure within a class; how to devise the similarity of intraclass vertex pairs to better preserve locality information is one of our future works.

## Acknowledgments

This work was supported in part by the National Natural Science Foundation of China under Grant Nos. 61701126, 61802148, and 61802079, the Research projects in Guangzhou University, China, under Grant No. RP2020123, and the Scientific Research Program of Guangzhou under Grant No. 201904010493.

## References

## Biography

**Hengliang Tan** received his BE degree from Foshan University, Foshan, China, in 2006, and his ME and PhD degrees from Sun Yat-sen University, Guangzhou, China, in 2011 and 2016, respectively. He joined the School of Computer Science and Cyber Engineering, Guangzhou University, Guangzhou, in 2016. His current research interests include machine learning, pattern recognition, and manifold learning.

**Ying Gao** received his PhD from the South China University of Technology, Guangzhou, China, in 2002. He is currently a professor at the School of Computer Science and Cyber Engineering, Guangzhou University, Guangzhou. His main research interests include intelligent optimization algorithms, pattern recognition, and signal processing.

**Jiao Du** received her MS and PhD degrees from the Chongqing University of Posts and Telecommunications in 2013 and 2017, respectively. She is currently a lecturer at the School of Computer Science and Cyber Engineering, Guangzhou University, Guangzhou, China. Her research interests include pattern recognition and image processing.

**Shuo Yang** received his master’s degree in software engineering from Dalian Jiaotong University, China, in 2013. He was awarded a doctorate degree in software engineering, University of Macau, in 2017. He is currently a lecturer at the School of Computer Science and Cyber Engineering, Guangzhou University, Guangzhou, China. His research interests include semantic interoperability and semantic inference with artificial intelligence technology, mainly applied to the fields of e-commerce, e-marketplace, and clinical area.