Targeted adversarial discriminative domain adaptation

Abstract. Domain adaptation is a technology that enables aided target recognition and other algorithms in environments and for targets where data or labeled data are scarce. Recent advances in unsupervised domain adaptation have demonstrated excellent performance, but only when the domain shift is relatively small. We propose targeted adversarial discriminative domain adaptation (T-ADDA), a semi-supervised domain adaptation method that extends the ADDA framework. By providing at least one labeled target image per class, used as a cue to guide the adaptation, T-ADDA significantly boosts the performance of ADDA and is applicable to the challenging scenario in which the sets of targets in the source and target domains are not the same. The efficacy of T-ADDA is demonstrated by cross-domain, cross-sensor, and cross-target experiments using the common digits datasets and several aerial image datasets. Using just a few labeled images, T-ADDA yields an average improvement of 15% over ADDA when adapting to a small domain shift and a 60% improvement when adapting to large domain shifts.


Introduction
Aided target recognition (AiTR) focuses on developing automatic target recognition (ATR) to aid a human user. 1 Three examples of AiTR include discriminability with data fusion, 2 extendibility over data sparsity, 3 and interpretability from data compression. 4 Data fusion supports AiTR through enhancing target recognition by combining data from two or more sensors. Data fusion has demonstrated numerous capabilities for applications such as infrared (IR) and millimeter-wave IR for object detection, 5 electro-optical (EO) and IR for object tracking, 6 and EO and radar for enhanced situation awareness. 7 Combining EO with radar signatures allows for machine processing of multiresolution data with sparsity and complexity. 8 These data fusion methods afford interpretability of data for task success. Examples include interpretability over compressed imagery data, 9 3D volumetric lidar data, 10 and classifier assessment. 11 One of the challenges is to develop efficient methods for large volumes of data, for which deep learning (DL) methods have become popular.
Deep convolutional neural networks (CNNs) trained on large datasets have demonstrated excellent performance on computer vision tasks such as object classification, 12 change detection, 13 and ATR 14 from EO and radar data. However, the data distribution in the target domain, where testing takes place, may be different from the data distribution in the source domain, where training occurs. Domain adaptation (DA) aims to overcome the domain shift, or dataset bias, 15 that reduces classifier performance when classification takes place in a target domain. The shift in the data distribution may be due to differences in illumination, sensor type, perspective, background, and target classes. Conventional deep transfer learning utilizes pretrained CNN models for feature extraction and performs fine-tuning for training on a labeled dataset of interest. Unsupervised DA deals with unlabeled data in the target domain after training with labeled data in the source domain. Many unsupervised DA approaches have demonstrated excellent performance, but only when the domain shift is small.
For applications such as transferring knowledge from one set of targets to another set of targets, any unsupervised DA approach is doomed to fail: the class correspondence is typically ambiguous, and without further information an unsupervised DA method has limited knowledge of how the adaptation should proceed. Figure 1 shows an example of DA ambiguity. In Fig. 1(a), the red uppercase letters represent the source domain, and the blue lowercase letters represent the target domain. For an unsupervised DA, the target feature vectors in classes a, b, and c will be adapted to nearby source classes A, B, and C, respectively. Figure 1(b) shows what an unsupervised DA approach can achieve. Without knowing the correspondence between the classes in the source and target domains, adjacent classes in the source and target domains will be merged, thus representing the best that an unsupervised DA method can achieve. Obviously, the adaptation results are not necessarily correct without a domain mapping.
The need for the algorithm to know where the target classes a, b, and c should be adapted to for correct adaptation 16 motivates the targeted adversarial discriminative DA (T-ADDA) approach. T-ADDA assumes the availability of at least one labeled target image per target class (i.e., one labeled target feature vector per class). The labeled target feature vectors are indicated by the dark blue, underlined lowercase letters in Fig. 2(a). By enforcing all labeled target feature vectors to move toward their targeted source class centers as indicated by the dashed lines, T-ADDA adapts the target model, so the resulting target classes in the target domain correctly match the corresponding source classes as shown in Fig. 2(b).
This paper is organized as follows. Section 2 provides a brief review of unsupervised DA approaches and a deeper look into the unsupervised DA approach that T-ADDA was built upon, i.e., adversarial discriminative DA (ADDA). The proposed T-ADDA is detailed in Sec. 3 and followed by implementation methods in Sec. 4. In Sec. 5, the results of four experiments using digit datasets and real aerial image datasets (AID) are presented. Finally, concluding remarks are provided in Sec. 6.
2 Literature Review of Domain Adaptation

Unsupervised Domain Adaptation
Subspace alignment (SA) 17 is one of the early unsupervised DA approaches that performs a transformation on the source and target domain representations to generate features that are domain invariant. Other methods that perform subspace alignment include CORAL 18 and manifold aligned label transfer DA. 19 Adversarial learning is often used by DA methods. The domain adversarial neural networks method 20 uses a gradient reversal layer to learn features that are class discriminative and domain invariant. Domain symmetric networks (SymNets) 21 are based on a symmetric design of source and target task classifiers and adversarial training with a domain confusion scheme for learning domain invariant representations.
The work that is most relevant to the proposed T-ADDA semi-supervised domain adaptation method is the unsupervised ADDA framework by Tzeng et al. 22 In fact, the T-ADDA approach can be considered an extension of ADDA from unsupervised learning to semi-supervised learning.
ADDA is a generalized framework for adversarial DA that combines discriminative modeling, untied weight sharing, and a generative adversarial network (GAN) loss. ADDA first learns a discriminative representation using the labels in the source domain and then a separate encoding that maps the target data to the same space using an asymmetric mapping learned through a domain-adversarial loss. It is a simple, flexible, yet surprisingly powerful approach that achieves state-of-the-art visual adaptation results on standard DA datasets.
All of the above unsupervised DA methods assume that the initial domain shift is relatively small and that adjacent classes in the source and target domains correspond to the same target class. However, the small domain shift assumption may not hold if the source and target domains are very different. When the domain shift is large, extra information in the form of a few labeled target images is needed; this setting is known as semi-supervised DA (SSDA).

Semi-Supervised Domain Adaptation
SSDA is an important task; however, it has not been fully explored with regard to DL-based methods. 23 One notable SSDA work was the minimax entropy DA by Saito et al. 23 In minimax entropy DA, domain invariant class prototypes are defined as the weight vectors of the classifier C, which takes normalized feature vectors as its input and outputs the probability of classes with a softmax activation function. Then, the weight vectors are updated during training to maximize the entropy measured by the similarity between W, the weight vectors associated with the classifier C, and the unlabeled target features. Next, the feature extractor F is updated to minimize the entropy on unlabeled target examples to yield discriminative features extracted by F. At the same time, C and F are trained to classify both labeled source examples and a few labeled target examples correctly by minimizing the cross-entropy.

Adversarial Discriminative Domain Adaptation
ADDA 22 uses a GAN framework along with an adversarial loss for DA. Details of ADDA are provided next to set the stage for describing the proposed T-ADDA. To begin, we need source images $X_s$ and labels $Y_s$ drawn from a source domain distribution $p_s(x, y)$ and target images $X_t$ drawn from a target domain distribution $p_t(x, y)$, where there are no labels available. The goal is to learn a target feature encoder $M_t$ and a target classifier $C_t$ that can correctly classify target images into one of $K$ categories at test time, despite the lack of target domain annotations. Since direct supervised learning on the target is not possible, DA instead learns a source feature encoder $M_s$ along with a source classifier $C_s$ and then adapts that model for use in the target domain. The adaptation is accomplished by minimizing the distance between the two empirical source and target distributions $M_s(X_s)$ and $M_t(X_t)$ and setting $C_t = C_s$. The source classification model is trained using the standard supervised cross-entropy loss as

$$\min_{M_s, C_s} L_{\mathrm{cls}}(X_s, Y_s) = -\mathbb{E}_{(x_s, y_s) \sim (X_s, Y_s)} \sum_{k=1}^{K} \mathbb{1}_{[k = y_s]} \log C_s(M_s(x_s))_k. \tag{1}$$

To minimize the distance between the empirical source ($M_s(X_s)$) and target ($M_t(X_t)$) distributions, the adversarial learning of ADDA consists of alternating the following two optimizations:

$$\min_{D} L_{\mathrm{adv}_D}(X_s, X_t, M_s, M_t), \tag{2}$$

and

$$\min_{M_t} L_{\mathrm{adv}_M}(X_s, X_t, D). \tag{3}$$

Eq. (2) states that the discriminator $D$ is optimized to distinguish source features from target features according to

$$L_{\mathrm{adv}_D}(X_s, X_t, M_s, M_t) = -\mathbb{E}_{x_s \sim X_s}\left[\log D(M_s(x_s))\right] - \mathbb{E}_{x_t \sim X_t}\left[\log\left(1 - D(M_t(x_t))\right)\right]. \tag{4}$$

Eq. (3) states that the target encoder $M_t$ is optimized according to the GAN loss function $L_{\mathrm{GAN}}$, which is defined as

$$L_{\mathrm{adv}_M}(X_s, X_t, D) = L_{\mathrm{GAN}} = -\mathbb{E}_{x_t \sim X_t}\left[\log D(M_t(x_t))\right]. \tag{5}$$

It is worth noting that the source encoder $M_s$ is optimized during pretraining and is fixed during the above adversarial learning process.
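The two alternating adversarial objectives can be illustrated numerically. The sketch below assumes the standard ADDA formulation, in which the discriminator is trained with a binary cross-entropy (source features labeled 1, target features labeled 0) and the target encoder is trained with the inverted-label GAN loss. The function name `adversarial_losses` and its arguments are illustrative and do not come from the paper.

```python
import numpy as np

def adversarial_losses(d_source, d_target, eps=1e-12):
    """Return (discriminator loss, target-encoder GAN loss).

    d_source: discriminator outputs D(M_s(x_s)) for source images, in (0, 1)
    d_target: discriminator outputs D(M_t(x_t)) for target images, in (0, 1)
    """
    d_source = np.clip(np.asarray(d_source, dtype=float), eps, 1.0 - eps)
    d_target = np.clip(np.asarray(d_target, dtype=float), eps, 1.0 - eps)
    # Discriminator objective: source features -> 1, target features -> 0.
    loss_d = -np.mean(np.log(d_source)) - np.mean(np.log(1.0 - d_target))
    # Target-encoder objective: make target features look like source features.
    loss_m = -np.mean(np.log(d_target))
    return loss_d, loss_m

# A discriminator that cannot tell the two domains apart outputs 0.5 everywhere.
confused_d, confused_m = adversarial_losses([0.5, 0.5], [0.5, 0.5])
# A confident discriminator gives a low discriminator loss and a high encoder loss.
sharp_d, sharp_m = adversarial_losses([0.9, 0.9], [0.1, 0.1])
```

With a fully confused discriminator the two losses equal 2 ln 2 and ln 2, respectively, which is the equilibrium that the adversarial game drives toward.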
3 Proposed T-ADDA Approach

Assumption
T-ADDA, as illustrated in Fig. 2, makes two assumptions. The first assumption is that the source and target features of different target classes are well separated and clustered, and the second assumption is that all target feature points of the same classes will follow the movements of the few labeled target feature points to result in the desired adaptation result. The success of T-ADDA relies on the validity of the above two assumptions.
To enforce the validity of the first assumption, the combined cross-entropy and center loss function 24 is adopted to encourage separation and clustering of source feature vectors. However, it is not straightforward to enforce clustering of target feature vectors, which are encoded by the initial target feature extractor (target feature encoder). In T-ADDA, clustering of target feature vectors is supported experimentally by carefully choosing the initial target feature encoder.
The second assumption, that all target feature points of the same classes will follow the movements of the few labeled target feature points to result in the desired adaptation result, is enforced by adversarial learning as described in ADDA and is validated by extensive experiments shown in the results.

Targeted Adversarial Discriminative Domain Adaptation
When there are no labeled target images, the proposed T-ADDA is identical to ADDA reviewed in Sec. 2. When few labeled target images are available, three types of input data can be distinguished in T-ADDA: (1) the labeled source data $X_s$, (2) the target data $X_t$, and (3) the few labeled target data $X'_t \subset X_t$. The use of $X_s$ and $X_t$ in T-ADDA is identical to ADDA as described in Eqs. (4) and (5). When few labeled target images are available, i.e., $X'_t$ is not an empty set, the target encoder $M_t$ defined in Sec. 2 is additionally optimized according to the following feature class matching (FCM) loss function using $X'_t$:

$$L_{\mathrm{FCM}}(X'_t, M_t) = \frac{1}{2} \sum_{i} \left\| x'_{t_i} - S_{C_{t_i}} \right\|_2^2, \tag{6}$$

where $x'_{t_i}$ is the feature vector extracted from the $i$'th labeled target image and $S_{C_{t_i}}$ is the corresponding source feature class center, which is extracted after the source model is trained. Figure 3 shows the overview of the proposed T-ADDA approach, which consists of three steps. In Step 1, a source model is pretrained using the source domain training dataset with either the cross-entropy or the combined cross-entropy and center loss function, which is described in the next subsection. Once the source model is pretrained, T-ADDA computes and saves the center of the features in each class. Step 2, the key contribution of the proposed T-ADDA, adapts a target encoder $M_t$ so that the features extracted by it cannot be distinguished from the features extracted by the source encoder $M_s$. In this step, $L_{\mathrm{adv}_D}$, $L_{\mathrm{adv}_M}$, and $L_{\mathrm{FCM}}$, given in Eqs. (4)-(6), respectively, are optimized alternately. Finally, in Step 3, the target model is formed by the adapted target encoder concatenated with the classification layer(s) of the source model and is used to classify images in the target domain.
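The FCM term can be sketched in a few lines. This example assumes the loss takes a center-loss form, i.e., half the summed squared Euclidean distance between each labeled target feature vector and the source class center it is targeted at; the exact scaling used by the authors is not recoverable here, and the name `fcm_loss` is hypothetical.

```python
import numpy as np

def fcm_loss(labeled_target_features, targeted_source_classes, source_centers):
    """Assumed FCM form: 0.5 * sum of squared distances to targeted centers.

    labeled_target_features: (n, d) features of the labeled target images
    targeted_source_classes: (n,) index of the source class each image maps to
    source_centers: (K, d) class centers saved after source pretraining
    """
    feats = np.asarray(labeled_target_features, dtype=float)
    centers = np.asarray(source_centers, dtype=float)
    idx = np.asarray(targeted_source_classes, dtype=int)
    diff = feats - centers[idx]          # offset of each feature from its center
    return 0.5 * np.sum(diff ** 2)

source_centers = np.array([[0.0, 0.0], [4.0, 0.0], [0.0, 4.0]])
features = np.array([[1.0, 0.0], [4.0, 1.0], [0.0, 4.0]])
classes = np.array([0, 1, 2])
loss = fcm_loss(features, classes, source_centers)   # 0.5 * (1 + 1 + 0)
```

For cross-target adaptation, `targeted_source_classes` holds the source class that each target class is assigned to, so the same function covers the case in which the source and target label sets differ.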

Center Loss
Through supervised training, via minimizing categorical cross entropy loss, discriminative features are guaranteed to be generated, but well-clustered features are not guaranteed. The idea of center loss was originally presented in Ref. 24 and was adopted in Ref. 19 for DA. It was shown that by combining cross-entropy loss and center-loss functions, well clustered features are generated, and the accuracy of classifiers can be improved, 24 which is confirmed by our experiments. Thus, in this section, the center loss is presented and then employed throughout our experiments for improved source model performance and, thus, improved T-ADDA performance.
The center loss function is formulated as

$$L_C = \frac{1}{2} \sum_{i=1}^{m} \left\| x_i - c_{y_i} \right\|_2^2, \tag{7}$$

where $x_i$ is the feature vector of the $i$'th training sample, $c_{y_i}$ is the center of its class $y_i$, and $m$ is the number of samples. Training proceeds in two stages. The first stage trains the source model with the cross-entropy loss function only and computes the centers of all classes to be used in Eq. (7) to encourage feature clustering. In the second stage, the computed class centers are then used in Eq. (7) to compute the center loss. The complete loss to be minimized is the combination of cross entropy and center loss given as

$$L = L_S + \lambda L_C, \tag{8}$$

where $L_S$ denotes the standard cross entropy loss, $L_C$ is the center loss given in Eq. (7), and $\lambda$ weights the center loss. A visual comparison of features resulting from cross entropy loss and combined cross entropy and center loss is given in Fig. 4. Figure 4 uses MNIST data and a LeNet++ source model 24 to generate the plots by setting the feature dimension to be two. We note that, in T-ADDA, the computed source class centers are used in the feature matching loss function given in Eq. (6), where the source class centers are denoted by $S_{C_t}$.
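The two-stage procedure can be sketched as follows. This is a minimal NumPy illustration that assumes the center-loss form of Ref. 24 (half the summed squared distance to the class center) and a scalar weight `lam` on the center loss; the weight value and all function names are hypothetical.

```python
import numpy as np

def class_centers(features, labels, num_classes):
    """Stage 1: mean feature vector of every class (each class must be present)."""
    return np.stack([features[labels == k].mean(axis=0) for k in range(num_classes)])

def cross_entropy(logits, labels):
    """Mean softmax cross-entropy computed from raw class scores."""
    shifted = logits - logits.max(axis=1, keepdims=True)   # numerical stability
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(labels)), labels].mean()

def center_loss(features, labels, centers):
    """Half the summed squared distance between features and their class centers."""
    return 0.5 * np.sum((features - centers[labels]) ** 2)

def combined_loss(logits, features, labels, centers, lam=0.1):
    """Stage 2: cross-entropy plus weighted center loss (lam is an assumed weight)."""
    return cross_entropy(logits, labels) + lam * center_loss(features, labels, centers)

features = np.array([[0.0, 0.0], [2.0, 0.0], [10.0, 0.0], [12.0, 0.0]])
labels = np.array([0, 0, 1, 1])
logits = np.zeros((4, 2))                       # uninformative classifier output
centers = class_centers(features, labels, 2)    # [[1, 0], [11, 0]]
c_loss = center_loss(features, labels, centers) # 0.5 * (1 + 1 + 1 + 1)
total = combined_loss(logits, features, labels, centers)
```

In a real training loop the centers would be recomputed or updated as the encoder changes; here they are computed once to keep the example short.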

Implementation
The pseudo code of the proposed T-ADDA approach is provided in Fig. 5. The inputs are the source domain data X_source, the source class centers sourceCenters, the unlabeled target domain data Y_target_unlabeled, and the labeled target domain data Y_target_labeled, together with the neural networks sourceEncoder, targetEncoder, discriminator, gan, and fcm. The loss functions employed by discriminator, gan, and fcm are Eqs. (4), (5), and (6), respectively.
For the experimental results involving digits datasets (Secs. 5.2.1-5.2.2), we constructed the source model based on LeNet++. 24 Table 1 shows the summary of our LeNet++ based model, which is a variation of LeNet++ that incorporates batch normalization and dropout layers. The source encoder is formed from the InputLayer up to the layer ip1. The dimension of the feature space is fixed at 500. The dense layer ip2 serves as a linear 10 class classifier. The LeNet++ based source models, after being trained with source domain datasets, are used as the initial target models for adaptation in Experiments 1 and 2 involving digit datasets.
For the experimental results involving real AID (Secs. 5.2.3-5.2.4), we built our source model based on ImageNet 25 pretrained models, as we found that the LeNet++ based model was not able to extract well-separated and clustered features. Specifically, we employed the ImageNet pretrained DenseNet 26 model provided in Keras, and the layer GlobalMaxPooling2D was adopted to reduce the CNN features from dimension 7 × 7 × 1024 to 1024 to form the base model. Then, a dense (fully connected) layer fc1 was added to form the source encoder. To complete the source classification model, we added another fully connected layer fc2 to the source encoder as the classifier. The purpose of the fc1 layer is to further reduce the feature space dimension to a predetermined size, in this case, 256. To construct the initial target encoder, two strategies were considered: (1) adopting the source encoder as the initial target encoder and (2) concatenating the base model with the fc1 layer from the source encoder. The first strategy tunes the feature extraction CNN toward classifying the specific source classes, and the second strategy keeps the feature extractor intact. In our experiments, we found that the first approach worked better when the source and target domains shared the same object classes, and the second approach was preferred when new classes were introduced in the target domain. We believe that the reason is that when the source and target domains share common target classes, the fine-tuned feature extraction layers are able to extract well separated and clustered features in the source domain as well as in the target domain. However, if new classes appear in the target domain, since the feature extraction layers are fine-tuned toward extracting features that separate the source classes well, they may not have the power to extract the features that distinguish different target classes in the target domain. In this case, it is preferred to retain all feature extraction layers that are pretrained on the large ImageNet dataset.
Next, the network discriminator in both cases consists of three dense layers as shown in Fig. 6.
The network GAN is formed by concatenating targetEncoder and discriminator, and only the targetEncoder is trainable. Finally, the network FCM is implemented similarly to how the source model is trained by combined center loss and cross-entropy loss functions. However, in FCM, only the center loss is employed. The label dummy_y is randomly generated as no label is required for employing the center loss function.
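The alternation among the discriminator, GAN, and FCM updates can be sketched with a deliberately small model. The example below is a toy under stated assumptions, not the Keras networks described in the text: the target encoder is a single linear map `Wt`, the discriminator is a logistic regressor `(w, b)`, gradients are written by hand, and the FCM term is assumed to be half the summed squared distance to the targeted source centers. All names and the learning rate are hypothetical.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def tadda_step(Wt, w, b, Fs, Xt, Xl, yl, centers, lr=0.01):
    """One alternating update of a toy linear T-ADDA model.

    Wt: target encoder weights (d_in, d_feat); features are X @ Wt
    w, b: logistic discriminator, D(f) = sigmoid(f @ w + b)
    Fs: fixed source features; Xt: unlabeled target inputs
    Xl, yl: labeled target inputs and their targeted source classes
    """
    # 1) Discriminator step: source features labeled 1, target features labeled 0.
    Ft = Xt @ Wt
    ps, pt = sigmoid(Fs @ w + b), sigmoid(Ft @ w + b)
    grad_w = Fs.T @ (ps - 1.0) / len(Fs) + Ft.T @ pt / len(Ft)
    grad_b = np.mean(ps - 1.0) + np.mean(pt)
    w, b = w - lr * grad_w, b - lr * grad_b

    # 2) GAN step: discriminator frozen, only the target encoder is updated
    #    so that target features are scored as source (inverted labels).
    pt = sigmoid((Xt @ Wt) @ w + b)
    Wt = Wt - lr * (Xt.T @ np.outer(pt - 1.0, w)) / len(Xt)

    # 3) FCM step: pull labeled target features toward their source class centers.
    Wt = Wt - lr * (Xl.T @ (Xl @ Wt - centers[yl]))
    return Wt, w, b

rng = np.random.default_rng(0)
centers = np.array([[3.0, 0.0], [-3.0, 0.0]])            # two source class centers
Fs = np.repeat(centers, 20, axis=0) + 0.1 * rng.standard_normal((40, 2))
Xt = rng.standard_normal((40, 3))
Xl, yl = Xt[:2], np.array([0, 1])                         # one labeled image per class
Wt, w, b = 0.1 * rng.standard_normal((3, 2)), np.zeros(2), 0.0

def fcm(Wt):
    return 0.5 * np.sum((Xl @ Wt - centers[yl]) ** 2)

fcm_before = fcm(Wt)
for _ in range(200):
    Wt, w, b = tadda_step(Wt, w, b, Fs, Xt, Xl, yl, centers)
fcm_after = fcm(Wt)
```

Freezing the discriminator during the second update mirrors the statement that only targetEncoder is trainable inside the gan network.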

Datasets
The proposed T-ADDA is first evaluated in two experiments involving three datasets with 10 digit classes and then tested in two experiments involving six aerial datasets. Brief descriptions of each dataset are provided below.

Modified National Institute of Standards and Technology
The Modified National Institute of Standards and Technology (MNIST) 27 database consists of 70,000 grayscale handwritten digit images. Among them, 60,000 images are sequestered for the training set, and the remaining 10,000 are saved for the test set. The MNIST database is commonly used for developing and testing various image processing systems.

Street View House Numbers
The Street View House Numbers (SVHN) 28 dataset, obtained from house numbers in Google Street View images, is a real-world image dataset for developing machine learning and object recognition algorithms with minimal requirements on data preprocessing and formatting. The image size of SVHN is 32 × 32. It can be seen as similar in style to MNIST (e.g., the images are of small cropped digits), but it incorporates an order of magnitude more labeled data (over 600,000 digit images) and comes from a significantly harder, unsolved, real-world problem (recognizing digits and numbers in natural scene images). Among them, 73,257 images are for training, and 26,032 are for testing.

Devanagari Handwritten Character
The Devanagari handwritten character (DHC) dataset 29 is a database of handwritten Devanagari characters consisting of 46 classes of characters, including ten Devanagari digits, with 2000 examples each. The image size of the DHC dataset is 32 × 32.
In Fig. 7, example digit images from the MNIST, SVHN, and DHC databases are provided for comparison. We note that visually similar digits in Arabic and Devanagari numerals do not necessarily have the same meaning. For example, the Arabic digit 9 resembles the Devanagari digit 1.

Aerial Image Datasets
AID 30 contains over 10,000 aerial images from 30 classes. The image size is 600 × 600 pixels, obtained at multiple ground sampling distances (GSDs) (8 to 0.5 m). The source is Google Earth images from various countries.

The University of California Merced
The University of California Merced (UCM) landmass dataset 31 has 2100 images representing 21 classes with 100 images per class. The UCM images are of size 256 × 256 pixels, at a GSD of 1 foot/pixel. They are manually extracted images from the United States Geological Survey National Map Urban Area Imagery.

xView
The xView 2018 dataset 32 is one of the largest publicly available datasets of overhead imagery. It contains around 1 million labeled object samples divided across 60 classes with the option of using either 3-band or 8-band imagery. The images were obtained from the WorldView-3 satellite at 0.3-m ground sample distance. The xView dataset is an imbalanced dataset that has some classes with a few instances and some with many instances.

DOTA
The DOTA dataset 33 is a large-scale dataset designed for the development and evaluation of object detectors for aerial imagery. It contains 2806 aerial images from different sensors and platforms. Image sizes range from about 800 × 800 to 4000 × 4000 pixels and contain objects exhibiting a wide variety of scales, orientations, and shapes. There are sixteen object categories in DOTA-v1.0, including plane, ship, and storage tank.

NWPU
The NWPU-RESISC45 dataset 34 consists of 31,500 images divided into 45 scene classes. Each class includes 700 images that have a size of 256 × 256 pixels. The spatial resolution varies from about 30 to 0.2 m per pixel for most of the classes except for island, lake, mountain, and snowberg, which have lower spatial resolutions.

Remote Sensing Image Classification Benchmark
The Large-Scale Remote Sensing Image Classification Benchmark (RSI-CB) dataset 35 consists of two parts: the RSI-CB256 dataset and the RSI-CB128 dataset. They both have spatial resolutions of 0.3 to 3 m. RSI-CB256 contains 35 categories and more than 24,000 images of size 256 × 256. RSI-CB128 contains 45 categories and more than 36,000 images of size 128 × 128. Both datasets have six common categories: agricultural land, construction land and facilities, transportation and facilities, water and water conservancy facilities, woodland, and other land, and there are various subcategories within them.

Experiments and Results
Three transfer learning scenarios are considered in four experiments. Each experiment, including the considered scenario, experimental procedure, and experimental result, is provided below.

Experiment 1
In the first experiment, we consider the transfer learning from simulated data to measured data. For this scenario, SVHN is employed as the simulated data, as they were collected from printed house numbers, and MNIST is employed as the measured data, as they were hand-written digits. In the first stage, we performed source model training using cross-entropy as the loss function to be minimized. Then, we computed and saved the centers of source classes $S_i$, $i = 1, \ldots, K$, in the feature space, where $K$ is the number of source classes. Next, we performed source model training by minimizing the combined cross-entropy and center loss function. This completed the first stage of source model training.
In the second stage, adversarial DA, we used the source model as the initial target model, randomly selected N target images for labeling, 0 ≤ N ≤ 10, and then performed T-ADDA. When N equals 0, T-ADDA reduces to ADDA. This process was repeated 10 times, and the results were averaged together. Finally, in the last stage, we combined the classification layer of the source model and the adapted target encoder to evaluate the performance of the target model before and after adaptation. Table 2 shows the common settings used in Experiment 1 and Experiment 2.
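The random selection of labeled target images can be sketched as below. The example assumes that N counts images per class, which is consistent with the percentages quoted for N = 10 in the experiments; the function name and the synthetic label array are illustrative only.

```python
import numpy as np

def select_labeled_indices(target_labels, n_per_class, rng):
    """Randomly pick n_per_class image indices from every target class."""
    target_labels = np.asarray(target_labels)
    if n_per_class == 0:
        return np.array([], dtype=int)       # N = 0 reduces T-ADDA to ADDA
    picked = []
    for k in np.unique(target_labels):
        members = np.flatnonzero(target_labels == k)
        picked.append(rng.choice(members, size=n_per_class, replace=False))
    return np.concatenate(picked)

# Synthetic stand-in for target labels: 10 classes with 50 images each.
labels = np.repeat(np.arange(10), 50)
rng = np.random.default_rng(0)
chosen = select_labeled_indices(labels, 3, rng)
per_class = np.bincount(labels[chosen], minlength=10)
```

Repeating the selection with different seeds and averaging the resulting accuracies reproduces the repeated-trial protocol described above.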
The accuracy of the cross entropy trained source classifier on source validation data is 92.86%, and the accuracy of the combined cross entropy and center loss trained source classifier on source validation data is 93.65%. Intuitively, these two values can be used as the upper bounds of target classifier performance after adaptation. Table 3 and Fig. 8 show the numerical and graphical results of the experiment. Clearly, T-ADDA is very effective, with performance improvements of 3% to 18% over ADDA as N increases from 1 to 10. In addition, we observe that the standard deviation decreases with increased N. This indicates that the target images that are selected for labeling have an impact on the adaptation result. How to effectively select target images for labeling within a given selection budget is a topic to investigate in the future. Also, the results showed that the combined cross entropy and center loss consistently outperformed the cross entropy loss by 2% to 4%. This indicates that a better clustered source domain is beneficial to performing DA via T-ADDA. Figure 9 shows two t-distributed stochastic neighbor embedding (t-SNE) visualizations of the source domain containing features of the ten digits classes, and the t-SNE visualization of features in the target domain. In Fig. 9(a), features are extracted from the cross-entropy trained source model; in Figs. 9(b) and 9(c), features are extracted from the combined cross-entropy and center loss trained source model. We note that the target features extracted from the cross-entropy trained source model are very similar to the ones extracted from the combined cross-entropy and center loss trained source model; thus, they are not shown.
It is worth noting that the target features are well separated and clustered in this case.Also notice that, in both cases, the performance of T-ADDA when 10 target images (∼1% of the total target images) are randomly selected for labeling approaches the upper bounds established by evaluating the source classifier on source validation data.

Experiment 2
The transfer learning scenario that learns how to classify a set of targets from a classifier that is trained to classify a different set of targets was considered in Ref. 36, in which the authors utilize one labeled sample per class to transfer the classification from lung cancer to breast cancer. Two experiments, i.e., Experiments 2 and 4, are conducted under this scenario. In Experiment 2, SVHN and DHC datasets are employed. Though images of numerals from zero to nine are employed in both datasets, from Fig. 7 we see that only 0, 2, and 3 are visually similar and represent the same numerals. Others are either new to one another, i.e., 1, 4, 5, 7, and 8 in SVHN, or represent different numerals, i.e., 6 and 9 in SVHN. Table 4 and Fig. 10 show the numerical and graphical results of the experiment. In this experiment, ADDA failed; this was expected as the domain shift is more likely to be large and the adapted target domain will not necessarily match the source domain in terms of class labels. On the other hand, T-ADDA is very effective, with performance improvements of 18% to 80% over ADDA as N increases from 1 to 10. However, the improvement of the combined cross entropy and center loss trained source classifier over the cross-entropy trained source classifier is reduced. It is interesting to note that, when 10 target images per class (<0.6%) are randomly selected for labeling, the adaptation result from the cross entropy trained source classifier reaches the performance upper bound, and the adaptation result from the combined cross entropy and center loss trained source classifier exceeds the performance upper bound established by applying the source classifier on source validation data. This is indicated by the bold values in Table 4.
The outstanding performance from T-ADDA of SVHN to DHC adaptation that exceeds both the performance of SVHN to MNIST adaptation shown in Experiment 1 and the intuitive Clearly, DHC features are better separated than SVHN validation set features, which explains why the target classifier outperforms the source classifier when they are evaluated against target and source domain datasets, respectively.
To visualize the adaptation result, Fig. 12 shows the parametric t-SNE 37 visualizations of DHC features (a) before adaptation, (b) after ADDA adaptation, and (c) after T-ADDA adaptation. The parametric t-SNE model was trained by the features of the source training set extracted by the source encoder as shown in Fig. 12(d), to which the DHC features are adapted. Finally, it is interesting to examine the classification performance for each target class by observing the confusion matrices resulting from the target model before and after T-ADDA. As shown in Fig. 13(a), the initial target model has relatively good performance for digits 0, 2, and 3, with classification accuracies of 0.68, 0.6, and 0.47, respectively. This is consistent with the observation that these three digits share very similar forms. After adaptation, the distributions of all ten target classes are very close to the distributions of the corresponding ten source classes as observed from Fig. 13(b), with the lowest classification accuracy associated with numeral seven in the source domain. In this case, about 60% of numeral seven in the DHC dataset are correctly classified as numeral seven in the SVHN dataset, and about 20% are misclassified as numeral one in the SVHN dataset.
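Per-class accuracies such as those read off the confusion matrices can be computed as follows. This sketch assumes row-normalized matrices, with rows indexed by true class and columns by predicted class; the labels in the example are made up for illustration and are not results from the paper.

```python
import numpy as np

def confusion_matrix(true_labels, predicted_labels, num_classes):
    """Count matrix with rows = true class and columns = predicted class."""
    true_labels = np.asarray(true_labels, dtype=int)
    predicted_labels = np.asarray(predicted_labels, dtype=int)
    counts = np.zeros((num_classes, num_classes), dtype=int)
    np.add.at(counts, (true_labels, predicted_labels), 1)
    return counts

def per_class_rates(counts):
    """Divide each row by its total so the diagonal holds per-class accuracy."""
    row_sums = counts.sum(axis=1, keepdims=True)
    return counts / np.maximum(row_sums, 1)   # guard against empty classes

# Illustrative labels only.
true = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
pred = [0, 0, 0, 1, 1, 1, 1, 1, 1, 0]
counts = confusion_matrix(true, pred, 2)
rates = per_class_rates(counts)
```

The off-diagonal entries of `rates` give the fraction of each true class that is assigned to every other class, which is how a statement such as "about 20% are misclassified as numeral one" would be read.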

Experiment 3
In the next two experiments, aerial images are used. In Experiment 3, we consider the transfer learning scenario from one imaging condition to another. For this scenario, we formed augmented xView and augmented DOTA datasets and performed DA from the former to the latter. Table 5 shows the common settings used in Experiment 3 and Experiment 4. It is worth noting that, in this case, both the source and target domains have the same eight target classes. In Table 6, we list all eight classes and the number of images used in each class. In this experiment, DenseNet was selected as the base model. After constructing the source model, we fine-tuned all parameters with the training set of the source data and employed the trained source encoder as the initial target encoder because the same target classes were involved in both the source and target domains. The number of labeled target images N is set to be 0, 2, 4, 6, 8, and 10, and for each value of N, we ran the experiment five times and reported the mean ± standard deviation of the classification accuracies in the target domain. Table 7 and Fig. 15 show the numerical and graphical results of the experiment. Surprisingly, in this case, ADDA failed to improve target classification accuracy after adaptation. We believe that this is because the domain shift is not small enough for successful ADDA adaptation. However, T-ADDA still shows promising results when N is increased from 2 to 10. Again, the results showed that the combined cross entropy and center loss consistently outperformed the cross entropy loss by 1.2% to 5.5%. This indicates that a better clustered source domain is beneficial to performing DA via T-ADDA.
Also notice that in both cases, the performance of T-ADDA when six or more target images (0.6% to 1% of the total target images) are randomly selected for labeling exceeds the upper bounds established by evaluating the source classifier on source validation data. We conjecture that the reason for this is that target features are better separated and clustered than the source validation data features. This is confirmed by the t-SNE visualization plots provided in Fig. 16.
To quantify the clustering quality, we applied K-means clustering 38 and computed the clustering accuracies, which are indicated under each t-SNE plot.
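The text does not state how the clustering accuracy was computed from the K-means output. One common convention, assumed here, is to score the cluster assignments under the best one-to-one mapping between clusters and classes. The sketch below illustrates that convention using only the Python standard library; the function name `clustering_accuracy` is hypothetical, and the matching is found by an exact search over class subsets, which is practical for the 8 to 10 classes used in these experiments.

```python
from collections import Counter
from functools import lru_cache


def clustering_accuracy(true_labels, cluster_ids):
    """Accuracy under the best one-to-one mapping of clusters to classes.

    Assumption: this is one common definition of clustering accuracy;
    the paper does not specify which definition it used.
    """
    classes = sorted(set(true_labels))
    clusters = sorted(set(cluster_ids))
    if len(clusters) > len(classes):
        raise ValueError("expected at most one cluster per class")

    # overlap[(cluster, cls)] = number of samples shared by the pair.
    overlap = Counter(zip(cluster_ids, true_labels))

    @lru_cache(maxsize=None)
    def best(i, used):
        # Best total overlap when clusters[i:] are assigned to the
        # classes whose bit is not yet set in `used`.
        if i == len(clusters):
            return 0
        return max(
            overlap[(clusters[i], cls)] + best(i + 1, used | (1 << j))
            for j, cls in enumerate(classes)
            if not used & (1 << j)
        )

    return best(0, 0) / len(true_labels)
```

Here `cluster_ids` would be the K-means assignments of the encoded features and `true_labels` the corresponding class labels. Because cluster indices are arbitrary, a clustering that separates the classes perfectly scores 1.0 regardless of how the clusters are numbered.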

Experiment 4
In the last experiment, we again consider the scenario involving different classes of targets in the source and target domains. The datasets employed were AID and UCM. Ten classes from the AID dataset and ten classes from the UCM dataset were selected for the experiment. Among them, five classes were common to both the AID and UCM datasets, and five classes were unique to each dataset. Table 8 shows the ten source and target domain classes along with the number of images in each class. Sample images from some of the common classes are shown in Fig. 17, and those from some of the unique classes are shown in Fig. 18.
For source model training, we first randomly split the source images into training (85%) and validation (15%) sets. All images in the training set were used for source model training. After training, the source model was evaluated against the source validation set as well as all target domain images, which comprised the 10 classes from the UCM dataset with 100 images in each target class. For T-ADDA adaptation, the target encoder was initialized with the ImageNet pretrained DenseNet model, with the feature space dimension equal to 256 and the number of classes set to 10. For each number of labeled target images N = 0, 2, ..., 10, we ran the experiment five times and reported the mean ± standard deviation of the classification accuracies in the target domain.
The accuracy of the cross entropy trained source classifier on source validation data was 98.4%, and the accuracy of the combined cross entropy and center loss trained source classifier on source validation data was 98.8%. Intuitively, these two values can be used as the upper bounds of the target classifier performance after adaptation. Table 9 and Fig. 19 show the numerical and graphical results of the experiment. As expected, ADDA failed to improve the target classification accuracy after adaptation due to different target classes in the source and target domains. On the other hand, T-ADDA shows promising results even when N is equal to 2, with the accuracy improving from about 20% to above 60%. However, in this experiment, the advantage of the combined cross entropy and center loss was not as clear as in other experiments. To see the classification accuracy of each individual class, in Fig. 20, we provide the confusion matrices for N = 0, 2, ..., 10. From Fig. 20, no clear difference between the five common classes and the five classes unique to each domain is observed.
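The evaluation protocol described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: all function names are hypothetical, N is taken to be the number of labeled images drawn per target class, and the sample standard deviation is assumed because the text does not say which estimator was used.

```python
import random
import statistics


def split_source(items, train_fraction=0.85, seed=0):
    """Randomly split source items into training and validation lists."""
    rng = random.Random(seed)
    shuffled = list(items)
    rng.shuffle(shuffled)
    cut = round(train_fraction * len(shuffled))
    return shuffled[:cut], shuffled[cut:]


def pick_labeled_targets(labels, n_per_class, rng):
    """Return indices of n_per_class randomly chosen target images per class.

    Assumption: N counts labeled images per class.
    """
    by_class = {}
    for index, label in enumerate(labels):
        by_class.setdefault(label, []).append(index)
    chosen = []
    for label in sorted(by_class):
        chosen.extend(rng.sample(by_class[label], n_per_class))
    return chosen


def summarize_runs(accuracies):
    """Mean and sample standard deviation over repeated runs.

    Assumption: sample (n - 1) standard deviation; the paper may have
    used the population form instead.
    """
    return statistics.mean(accuracies), statistics.stdev(accuracies)
```

In use, one would call `pick_labeled_targets` with a differently seeded `random.Random` for each of the five runs at a given N, adapt and evaluate the target model, and pass the five resulting accuracies to `summarize_runs`.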
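The comparison between common and unique classes can also be made numerically from a confusion matrix. The sketch below is illustrative only: it assumes rows index the true class and columns the predicted class (the layout of Fig. 20 is not specified in the text), and the function names are hypothetical.

```python
def per_class_accuracy(confusion):
    """Fraction of each true class that was predicted correctly.

    Assumption: confusion[i][j] counts samples of true class i
    that were predicted as class j.
    """
    result = []
    for k, row in enumerate(confusion):
        total = sum(row)
        # A class with no samples has an undefined accuracy.
        result.append(row[k] / total if total else float("nan"))
    return result


def mean_accuracy(confusion, class_indices):
    """Average per-class accuracy over a chosen subset of classes."""
    accuracies = per_class_accuracy(confusion)
    return sum(accuracies[k] for k in class_indices) / len(class_indices)
```

Calling `mean_accuracy` once with the indices of the five common classes and once with the indices of the five unique classes gives two numbers whose difference summarizes what is otherwise judged by eye from the confusion matrices.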

Conclusions
The

Fig. 4 MNIST features obtained using (a) cross-entropy and (b) combined center and cross-entropy loss as the function to be minimized.

Fig. 5 Pseudocode of the proposed T-ADDA.

Fig. 6 Structure of the implemented discriminator in T-ADDA.

Fig. 9 t-SNE visualization of the (a) source features obtained from cross-entropy trained source model; (b) source features obtained from combined cross-entropy and center-loss trained source model; and (c) target features obtained from combined cross-entropy and center-loss trained source model.
performance upper bounds may be attributed to the lack of diversity of the DHC data within the same classes as compared with that in the MNIST and SVHN datasets. In other words, we anticipate that the DHC features encoded by the SVHN trained source encoder are very well separated and clustered. This is confirmed by the t-SNE visualization shown in Fig. 11(a). As a comparison, Fig. 11(b) shows the t-SNE visualization of features of the SVHN validation dataset.

Fig. 11 The t-SNE visualization of target features encoded by source encoders: (a) DHC features encoded by SVHN trained source encoder and (b) features of SVHN validation set encoded by SVHN trained source encoder.

Fig. 13 Confusion matrix resulting from the target model (a) before and (b) after T-ADDA (N = 10).

Fig. 16 (a) t-SNE visualization of source validation data features encoded by the source encoder and (b) t-SNE visualization of target features encoded by the source encoder.

Fig. 17 Example images of some common classes in (a) AID and (b) UCM.
$$\mathcal{L}_{\mathrm{adv}_D} = -\mathbb{E}_{x_s \sim X_s}\left[\log D(M_s(x_s))\right] - \mathbb{E}_{x_t \sim X_t}\left[\log\left(1 - D(M_t(x_t))\right)\right],$$
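As a rough numerical illustration of the discriminator loss above, the two expectations can be estimated by averaging over batches of discriminator outputs. The sketch below is not the authors' implementation. It assumes the discriminator outputs lie in (0, 1) and are read as the probability that a feature came from the source encoder; the function name and the clipping constant `eps` are hypothetical.

```python
import math


def discriminator_adv_loss(d_source, d_target, eps=1e-12):
    """Batch estimate of the adversarial discriminator loss.

    d_source: values of D(M_s(x_s)) for a batch of source images.
    d_target: values of D(M_t(x_t)) for a batch of target images.
    eps: hypothetical clipping constant that keeps log() finite.
    """
    source_term = sum(math.log(max(p, eps)) for p in d_source) / len(d_source)
    target_term = sum(
        math.log(max(1.0 - p, eps)) for p in d_target
    ) / len(d_target)
    return -source_term - target_term
```

Under these assumptions, a discriminator that cannot tell the domains apart (every output equal to 0.5) gives a loss of 2 ln 2, whereas one that separates them perfectly drives the loss toward zero.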

Table 1
Summary of the implemented LeNet++ based model.

Table 2
Settings employed in the first two experiments.

Table 4
Numerical results of Experiment 2: SVHN to DHC digits adaptation.

Table 5
Common settings in Experiments 3 and 4.

Table 6
Augmented xView and augmented DOTA datasets.

Table 7
Numerical results of Experiment 3: augmented xView to augmented DOTA adaptation.

Table 8
Source and target classes used in Experiment 4.

Table 9
Numerical results of Experiment 4: AID to UCM adaptation.
paper describes a robust DA framework, T-ADDA. It is a semi-supervised approach that provides the required robustness for scenarios in which the initial domain shift is large. Digit image datasets and real aerial image datasets were employed to demonstrate the proposed T-ADDA framework. Three scenarios were tested: transferring knowledge from simulated data to measured data (SVHN to MNIST), transferring knowledge from one set of targets to another, different set of targets (SVHN to DHC and AID to UCM), and transferring knowledge from one imaging condition or sensor to new imaging conditions or sensors (augmented xView to augmented DOTA). Our experimental results show that T-ADDA is very effective in all three scenarios. When the available labeled target images are as few as two images per class, T-ADDA increases performance over ADDA by at least 8% in the simulated-to-measured scenario, 12% in the sensor-to-sensor scenario, and over 40% in the target-to-target scenario.

was an assistant professor in the Department of Computer Science and Engineering, University of Texas at Arlington from 2002 to 2008. He joined Intelligent Fusion Technology Inc. in 2015 and was promoted to principal scientist in 2017. His research areas are in digital signal/image/video processing and in machine learning.

Andreas Savakis is professor of computer engineering and director of the Center for Human-aware Artificial Intelligence at Rochester Institute of Technology. He received his PhD in electrical and computer engineering with a mathematics minor from North Carolina State University and was with the Kodak Research Labs before joining RIT. His research has generated over 120 publications and 12 U.S. patents. His interests include computer vision, deep learning, DA, visual tracking, pose estimation, and scene analysis.

Ashley Diehl is a research electronics engineer with the Air Force Research Laboratory (AFRL) in Dayton, Ohio. She began her career in the early 2010s as an undergraduate researcher. For the last eight years, her work has focused on designing classification frameworks for vibrometry and EO imagery. Her research interests include hierarchical learning, transfer learning, and few-shot learning. She is also a doctoral candidate in the electrical engineering program at Wright State University.

Erik Blasch is an Air Force Research Laboratory program officer. He received his BS degree in mechanical engineering from MIT and his PhD in electrical engineering from Wright State with master's degrees in ME, industrial engineering, EE, medicine, military studies, economics, and business. His assignments include USAF Reserve colonel (ret) and adjunct associate professor with interests in information fusion and human-machine integration. He is the author of 900+ papers, 35 patents, and 8 books. He is an AIAA associate, IEEE member, and SPIE fellow.

Sixiao Wei received his MS degree from the Department of Computer and Information Sciences, Towson University in 2014 and his BS degree in electrical engineering from Huazhong University of Science and Technology, Wuhan, China, in 2010. He has been a research scientist at Intelligent Fusion Technology, Inc., Germantown, Maryland, since summer 2014. His research interests are wireless cyber security, computer networking, cloud computing, and big data.

Genshe Chen is the CTO of Intelligent Fusion Technology, Inc. He received his BS and MS degrees in electrical engineering and his PhD in aerospace engineering in 1989, 1991, and 1994, respectively, all from Northwestern Polytechnical University, Xian, China. He has been the PM/PI/Technical Lead for 100+ projects. He served as a technical conference chair of SPIE DSS Sensors and Systems for Space Applications for many years. He is a senior member of SPIE.