## 1.

## Introduction

The twin satellite mission Gravity Recovery and Climate Experiment (GRACE) was launched on March 17, 2002. In orbit for the past 13 years, the satellites have been providing global models of the time-varying gravity field of the Earth. As is well known, the composition and structure of the Earth, including the distribution of the atmosphere and water mass on and below the Earth’s surface (e.g., rivers, lakes, soil moisture storage, groundwater storage, and ice including snow), are reflected by the Earth’s gravity field.^{1} In this regard, the sensitivity of the GRACE satellite mission to detecting changes in the continental water cycle has clearly been demonstrated over the last years (see, e.g., Ref. 2 and references therein). GRACE has been widely used for estimating terrestrial water storage anomalies (TWSA) and changes, for instance, in the Amazon,^{3} Yangtze,^{4} Volta,^{5} Nile,^{6} and other river basins. Additional studies from GRACE are related to monitoring the groundwater withdrawal in India^{7} and China,^{8} contributions of glaciers and ice caps to sea-level rise,^{9} and evaluation of naturally and anthropogenically induced variations in water availability in Africa.^{10} In this work, however, the aim is to assess the quality of the GRACE-derived TWSA fields inverted from the level-2 products that were calculated by four different centers. Three of these centers are identified in the mission proposal as the GRACE Science Data System (SDS), while the other is the solution provided by the Groupe de Recherche de Géodésie Spatiale (GRGS) at the Centre National d’Études Spatiales (CNES).^{11}^{,}^{12} The three SDS centers are the University of Texas Center for Space Research (CSR),^{13} the NASA Jet Propulsion Laboratory (JPL),^{14} and the Deutsches GeoForschungsZentrum (GFZ).^{15}

Error estimates for space-borne sensors often rely on ground truth validation;^{16} therefore, in the absence of such data, the assessment of the uncertainties of satellite-based products is a challenge. This applies especially to regions such as Africa and South America for which data are limited. This is even worse for the GRACE-derived TWSA fields, for which there are no direct TWSA measurements to validate the satellite-derived measurements. Uncertainties of GRACE-derived TWSA fields have been estimated by fitting and removing a constant and an annual cycle from the monthly values of each Stokes coefficient.^{16}^{,}^{17} For instance, based on 22 monthly GRACE gravity field solutions, Wahr et al.^{17} have estimated the global area weighted root mean square error (RMSE) of TWSA at 21 mm when filtered with a 750 km Gaussian smoothing radius. Additionally, they have reported that the errors decrease as the radius increases, falling from 38 mm at 500 km to 15 mm at 1000 km. Recently, Sakumura et al.^{18} removed the majority of the signals attributed to seasonal, subseasonal (semiannual and 161-day cycle), and linear variations as a means of assessing the four different data centers (i.e., CSR, GFZ, JPL, and the GRGS). These authors have reported an RMSE of 12.3 mm for CSR, 11.3 mm for GFZ, 14.4 mm for JPL, and 13.0 mm for GRGS at a global scale. At a basin scale, irrespective of the size and relative signal amplitude, the solutions all vary from the ensemble mean (composed from CSR, JPL, GFZ, and GRGS) on the order of 10 to 15 mm.^{18}

Efforts have also been made to assess the GRACE-derived TWSA fields; for example, combining measurements and/or models in the context of the water-budget equations ^{19}20.21.22.^{–}^{23} or in terms of groundwater estimations.^{8}^{,}^{24}25.^{–}^{26} Additionally, direct comparisons with hydrology models,^{27}^{,}^{28} remotely sensed rainfall data,^{29}^{,}^{30} and ocean-bottom pressure products^{31} have been reported. In practice, however, the data considered to represent the truth in these validations are subject to measurement errors and/or spatial scale mismatching. Nevertheless, these regional comparisons show typical values of $<25\text{\hspace{0.17em}\hspace{0.17em}}\mathrm{mm}$, which appears to indicate the level of accuracy of GRACE solutions in terms of TWSA. While the estimation of GRACE errors could be improved by using the full covariance matrix^{32} and errors in the background models,^{33} a comparison of the GRACE measurements with true observations of the target quantity would be more interesting for validation purposes. However, as mentioned in the previous paragraph, there are no such *in situ* data for assessing GRACE-derived TWSA fields. The three-cornered hat (TCH) method can offer an alternative approach to quantify the uncertainty of each GRACE processing center. Originally applied to assessing the relative precision of oscillators and timing devices,^{34} this method has recently been applied to the fields of geodesy^{35}36.37.^{–}^{38} and hydrology.^{39}^{,}^{40}

The TCH method is based on a difference approach relying on the removal of common signals from the observations (i.e., the truth), then providing the uncertainties that reflect the measurement errors.^{38} According to Chin et al.,^{35} the method is algebraically simple to apply, at least to the three datasets that have statistically independent measurement error processes. However, the generalized formulation of the TCH method, as proposed by Premoli and Tavella,^{41} considers the correlations among the noises of the time series by minimizing the global correlation among them. In this investigation, the generalized formulation of the TCH method, taking into account the correlations, is applied to assess the relative quality of GRACE-derived TWSA fields from the three SDS processing centers and the GRGS solution. Although using the mean of the four (or more) centers is recommended (see, e.g., Ref. 24), it is still necessary to assess the individual uncertainties of each processing center. This is important to guide the GRACE processing centers and help the end users by confirming that the accuracy and precision of the estimates lie within a required specification.

## 2.

## Methodology and Data

## 2.1.

### Three-Cornered Hat Method

In the absence of a reference dataset, the TCH method can be used to estimate the relative uncertainties of the GRACE-derived TWSA from different sources if at least three products are available. To estimate the uncertainty in the TWSA datasets, consider the time series of the available products stored as ${\{{X}_{i}\}}_{i=1,2,\dots ,N}$, where $i$ corresponds to each solution center (that is, $N=4$, three products from the GRACE SDS processing centers, CSR, JPL, and GFZ, and one from the GRGS solution), and split each time series as

## (1)

$${X}_{i}=S+{\u03f5}_{i}\phantom{\rule[-0.0ex]{1em}{0.0ex}}\forall \text{\hspace{0.17em}\hspace{0.17em}}i=1,\dots ,N,$$^{35}), here representing the noise deviation of the GRACE processing center $i$. Since no true estimate of $S$ is available, the differences between $N-1$ processing centers and one center designated as the reference (chosen arbitrarily) can be computed as

^{36}

## (2)

$${Y}_{iN}\equiv {X}_{i}-{X}_{N}={\u03f5}_{i}-{\u03f5}_{N}\phantom{\rule[-0.0ex]{1em}{0.0ex}}\forall \text{\hspace{0.17em}\hspace{0.17em}}i=1,\dots ,N-1,$$The samples of the $N-1$ solution centers’ differences [Eq. (2)] are concatenated in an $M\times (N-1)$ matrix as

where each row contains a monthly observation (here, $M=143$; i.e., 143 months from August 2002 to June 2014, after making provision for the missing months as shown in Sec. 3.2), and each column represents the difference between each time series and the reference. The covariance matrix $S$ of the series of differences is computed as where $\mathrm{cov}(\u2022)$ is the covariance operator. A generic element of $S$ represents either a variance estimate (for $i=j$) or a covariance estimate (for $i\ne j$). The $N\times N$ covariance matrix (Allan covariance matrix) of the individual noises $R$, whose elements are the unknowns of the problem and should be determined, is related to $S$ by^{42}

## (5)

$$S={H}^{\mathrm{T}}\xb7R\xb7H\phantom{\rule[-0.0ex]{1em}{0.0ex}}\text{with}\phantom{\rule[-0.0ex]{1em}{0.0ex}}H=\left[\begin{array}{c}I\\ -{\mathbf{u}}^{\mathrm{T}}\end{array}\right],$$Equation (5) can be rewritten as^{43}

## (6)

$$S=[I-\mathbf{u}]\left[\begin{array}{cc}\widehat{R}& \mathbf{r}\\ {\mathbf{r}}^{\mathrm{T}}& {r}_{NN}\end{array}\right]\left[\begin{array}{c}I\\ -{\mathbf{u}}^{\mathrm{T}}\end{array}\right],$$## (7)

$$\widehat{R}=S-{r}_{NN}[{\mathbf{uu}}^{\mathrm{T}}]+{\mathbf{ur}}^{\mathrm{T}}+{\mathbf{ru}}^{\mathrm{T}}.$$^{43}However, Premoli and Tavella

^{41}have pointed out that this condition is not sufficient to determine a unique solution for $R$. They have proposed an optimal choice criteria for the free parameters based on the minimization of the global correlation among the noises of the involved time series, maintaining the positive definiteness of $R$. Galindo and Palacio

^{44}have suggested the constrained minimization problem aiming to reach a unique solution (minimum of the quadratic mean of covariances) using the Kuhn-Tucker theorem, where the objective function $F$ is given by

^{44}with a constraint function

^{44}

## (9)

$$G(\mathbf{r},{r}_{NN})\equiv -\frac{{r}_{NN}-{[\mathbf{r}-{r}_{NN}\mathbf{u}]}^{\mathrm{T}}\xb7{S}^{-1}\xb7[\mathbf{r}-{r}_{NN}\mathbf{u}]}{K}<0,$$## (10)

$${r}_{iN}^{(0)}=0,\phantom{\rule[-0.0ex]{1em}{0.0ex}}i<N\phantom{\rule[-0.0ex]{1em}{0.0ex}}\text{and}\phantom{\rule[-0.0ex]{1em}{0.0ex}}{r}_{NN}^{(0)}={(2\xb7{\mathbf{u}}^{\mathrm{T}}\xb7{S}^{-1}\xb7\mathbf{u})}^{-1},$$^{45}After determining the free parameters (${r}_{1N},{r}_{2N},\dots ,{r}_{N-1,N}$, and ${r}_{NN}$) by minimizing Eq. (8), the remaining unknown elements of $R$ can be determined by Eq. (7).

## 2.2.

### Gravity Recovery and Climate Experiment Derived Terrestrial Water Storage Anomalies

## 2.2.1.

#### Terrestrial water storage anomaly computations

The shape of the Earth’s gravity field is a geoid, the equipotential surface that best fits, in a least-squares sense, the global mean sea level. Consequently, variations in gravity are equivalent to variations in the geoid. Temporal variations of the geoid, predominantly resulting from changes in water mass (considering that the atmospheric and oceanic contributions have been removed from GRACE measurements during data processing for dealiasing purposes) can be inverted to surface mass anomalies $\mathrm{\Delta}\sigma $, which is expressed as^{46}

## (11)

$$\mathrm{\Delta}\sigma (\theta ,\lambda ,t)=\sum _{n=0}^{{n}_{\mathrm{max}}}\sum _{m=0}^{n}[\mathrm{\Delta}{\tilde{C}}_{nm}(t)\mathrm{cos}\text{\hspace{0.17em}}m\lambda +\mathrm{\Delta}{\tilde{S}}_{nm}(t)\mathrm{sin}\text{\hspace{0.17em}}m\lambda ]{\overline{P}}_{nm}(\mathrm{cos}\text{\hspace{0.17em}}\theta ),$$^{46}

## (12)

$$\left[\begin{array}{c}\mathrm{\Delta}{\tilde{C}}_{nm}(t)\\ \mathrm{\Delta}{\tilde{S}}_{nm}(t)\end{array}\right]=R\frac{{\rho}_{e}}{3}\frac{2n+1}{1+{k}_{n}}\left[\begin{array}{c}\mathrm{\Delta}{\overline{C}}_{nm}(t)\\ \mathrm{\Delta}{\overline{S}}_{nm}(t)\end{array}\right],$$## (13)

$$\left[\begin{array}{c}\mathrm{\Delta}{\overline{C}}_{nm}(t)\\ \mathrm{\Delta}{\overline{S}}_{nm}(t)\end{array}\right]=\left[\begin{array}{c}{\overline{C}}_{nm}(t)\\ {\overline{S}}_{nm}(t)\end{array}\right]-\frac{1}{M}\sum _{t=1}^{M}\left[\begin{array}{c}{\overline{C}}_{nm}(t)\\ {\overline{S}}_{nm}(t)\end{array}\right],$$Equation (11) is used for computing the TWSA for each grid point defined by $\theta $ and $\lambda $; thus, one can use the grid for averaging the TWSA over a particular region (e.g., a river basin). Another approach for averaging the TWSA over a river basin is based on the exact averaging kernel, $\vartheta (\theta ,\lambda )$, which is a function that describes the shape of the basin as^{47}

## (14)

$$\vartheta (\theta ,\lambda )=\{\begin{array}{cc}0& \text{outside the basin}\\ 1& \text{inside the basin}\end{array}.$$## (15)

$$\left\{\begin{array}{c}{\vartheta}_{nm}^{c}\\ {\vartheta}_{nm}^{s}\end{array}\right\}=\frac{1}{4\pi}{\int}_{0}^{2\pi}{\int}_{0}^{\pi}\vartheta (\theta ,\lambda ){\overline{P}}_{nm}(\mathrm{cos}\text{\hspace{0.17em}}\theta )\left\{\begin{array}{c}\mathrm{cos}\text{\hspace{0.17em}}m\lambda \\ \mathrm{sin}\text{\hspace{0.17em}}m\lambda \end{array}\right\}\mathrm{sin}\text{\hspace{0.17em}}\theta \mathrm{d}\theta \text{\hspace{0.17em}}\mathrm{d}\lambda .$$^{47}

## (16)

$$\overline{\mathrm{\Delta}\sigma}(t)=\frac{1}{\mathrm{\Omega}}\sum _{n=0}^{{n}_{\mathrm{max}}}\sum _{m=0}^{n}[{\vartheta}_{nm}^{c}\mathrm{\Delta}{\tilde{C}}_{nm}(t)+{\vartheta}_{nm}^{s}\mathrm{\Delta}{\tilde{S}}_{nm}(t)],$$^{47}an average calculated using the exact representation of the basin shape is influenced by the satellite measurement errors, which rapidly increase for high degrees and orders (short wavelengths).

## 2.2.2.

#### Gravity Recovery and Climate Experiment level 2 products

Gravity field solutions (static and time-variable gravity fields) are obtained from the level 1 data after further processing. As few users can construct their own solutions from the level 1 product, the GRACE mission partner institutions (CSR, JPL, and GFZ) and CNES/GRGS provide monthly solutions in the form of Stokes coefficients known as level 2 (L2) data to some degree and order (d/o), freely available on the Internet (e.g., Ref. 48). Additionally, institutions such as the Delft Institute of Earth Observation and Space Systems, Delft University of Technology, Netherlands, and Tongji University, China, provide independent solutions based on alternative approaches. Each center processes the data using different methodology, which may result in differences in their solutions.^{28} Additionally, Wouters et al.^{49} have reported that the differences in the approaches of the processing centers are attributed to the background models used (e.g., an *a priori* model of the Earth’s gravity field, force models representing luni-solar and third body tides, and gravitational effects of the ocean and atmospheric mass variations), the period over which the orbits are integrated, weighting of the data, etc.

In this study, however, the monthly solutions generated by the SDS processing centers (CSR Release 05,^{13} GFZ Release 05,^{15} and JPL Release 05.1^{14}) and GRGS Release 03-v1^{12} were used. The processing center solutions of CSR, GFZ, GRGS, and JPL are computed based on the integration of variational equations (dynamic method), where monthly solutions (GFZ also provides weekly solutions) have been estimated throughout the whole GRACE mission period. Currently, an alternative approach using radial basis functions is under investigation at GFZ with promising results.^{50} Despite the approach being the same, possible differences can be related to, for example, the background models. As such, it is important to determine the noise deviations of each solution relative to an unknown true value (see Sec. 2.1) to assess their strengths and weaknesses. The Stokes coefficients from the four processing centers were considered up to d/o 60 in Eq. (11) (i.e., ${n}_{\mathrm{max}}=60$, which can be associated with the shortest resolution as $\rho \text{\hspace{0.17em}}[\mathrm{km}]=(\pi /{n}_{\mathrm{max}})R\text{\hspace{0.17em}}[\mathrm{km}]$ at the equator). The degree 1 coefficients (${C}_{\mathrm{1,0}}$, ${C}_{\mathrm{1,1}}$, and ${S}_{\mathrm{1,1}}$), which represent the changes in the geocenter, and the degree 2 coefficient (${C}_{\mathrm{2,0}}$), which is associated with the oblate shape of the Earth, were not considered here. Since they would be a common-mode signal for the four centers, through Eq. (2), the contribution of these low-degree coefficients is canceled out.

GRACE gravity fields at high-degree coefficients exhibit a high level of noise, which is known as stripes at the spatial domain.^{51} Therefore, to obtain coherent results, it is necessary to remove these stripes in postprocessing by reducing correlated errors with a minimal impact on the real signal. These correlations can be reduced by using an *a priori* synthetic model of the observation geometry, as suggested by Kusche in Ref. 52. The filtered coefficients, ${\mathbf{x}}_{\gamma (a)}$, are obtained by^{52}

*a priori*signal covariance matrix. Parameter $a$ can be adjusted to tune the smoothness of the solution. Here, a value of $a=1\times {10}^{13}$ was adopted, which is called DDK2, with the level of smoothing approximately corresponding to a Gaussian smoothing radius of 340 km. An exhaustive comparison of the suitability of the filter methods available can be found in Ref. 53, and the choice of DDK2 is supported by their results and, additionally, is consistent with the results provided by Sakumura et al. in Ref. 18. Although it is not necessary to filter GRGS monthly solutions,

^{11}

^{,}

^{12}since they have already been stabilized during their generation process, in this study, they have been filtered as the SDS processing centers.

## 3.

## Results and Discussion

In this section, the main results from this study are presented. Section 3.1 shows the spectral intercomparisons of L2 products (i.e., Stokes coefficients) of the four processing centers. Section 3.2 shows the results related to global comparisons of the four GRACE processing centers, and Sec. 3.3 shows the results at the basin scale. Finally, Sec. 3.4 shows a summary of the results.

## 3.1.

### Spectral Intercomparison of the Gravity Recovery and Climate Experiment Processing Centers

The relative intercomparisons pertaining to the spectral domain of the GRACE SDS processing centers and the GRGS solution in Fig. 1 show the Pearson’s linear correlation coefficient (CC) in the Stokes coefficients after applying the DDK2 filter. It is possible that CSR and GFZ are more correlated to each other than, for example, CSR and JPL. CSR and GFZ show a mean correlation for Stokes coefficients $\mathrm{\Delta}\overline{C}$ and $\mathrm{\Delta}\overline{S}$ of 0.60 and 0.59, respectively, while between CSR and JPL, it is 0.57 and 0.57 for $\mathrm{\Delta}\overline{C}$ and $\mathrm{\Delta}\overline{S}$, respectively. GFZ and JPL show a relative correlation of 0.53 and 0.52 for $\mathrm{\Delta}\stackrel{\u203e}{C}$ and $\mathrm{\Delta}\overline{S}$, respectively. Additionally, the strength of relationships between SDS processing centers and the GRGS solution is moderate. CSR and GRGS show a mean CC of 0.41 and 0.41 for $\mathrm{\Delta}\overline{C}$ and $\mathrm{\Delta}\overline{S}$, respectively; GFZ and GRGS show a mean CC of 0.41 and 0.42 for $\mathrm{\Delta}\stackrel{\u203e}{C}$ and $\mathrm{\Delta}\overline{S}$, respectively; JPL and GRGS show a mean CC of 0.38 and 0.38 for $\mathrm{\Delta}\overline{C}$ and $\mathrm{\Delta}\overline{S}$, respectively. It is possible to see that for the sectorial ($n=m$) coefficients with d/o higher than 20 and for coefficients with d/o higher than 40, the CC values are weak for SDS centers and GRGS comparison pairs.

Additionally, Fig. 2 shows the Nash-Sutcliffe efficiency (NSE) coefficient [for details, see Eq. (18) of Ref. 54]. The NSE coefficient ranges between $-\infty $ and 1, where the coefficient tends to 1 provided that the phase, amplitude, and mean of the simulated and observed time series agree. Values between 0 and 1 are generally viewed as acceptable levels of performance, whereas values $<0$ indicate that the mean observed value is a better predictor than the simulated value.^{55} Therefore, results with NSE values $<0$, which indicate unacceptable performance, were masked out in black. Similar to the correlation, NSE coefficients decrease with increasing d/o. However, it is possible to see that for the sectorial coefficients, the NSE values are $<0$ for all comparison pairs. Overall, the SDS processing centers show good NSE values up to degree 50 and order 20 (near zonal coefficients) among themselves. For d/o higher than 40 and 20, respectively, the relative comparison between GRGS and the SDS solutions shows low NSE values, where the majority are $<0$. This could be due to amplitude reduction rather than phase shift, since NSE evaluates consistency in phase and amplitude and CC only in phase.

## 3.2.

### Global Comparisons of the Solutions

First, the global grids ($1\text{\hspace{0.17em}\hspace{0.17em}}\mathrm{deg}\text{\hspace{0.17em}}\times \text{\hspace{0.17em}}1\text{\hspace{0.17em}\hspace{0.17em}}\mathrm{deg}$) of TWSA were computed using Eq. (11) for the period of August 2002 to June 2014, starting from degree 3 and order 0 up to d/o 60. Subsequently, 134 GRACE-derived TWSA monthly fields were resampled for exactly the middle of each month, and the data of the nine missing months were interpolated. This was necessary since the number of days differs by a few days among the GRACE processing centers and all the datasets must be aligned in time in order to use the TCH method. Following the computations of TWSA fields, a simultaneous fitting of a time series taking into account constant, linear, and periodic variations (annual, semiannual, and 161-day amplitudes and phases) was carried out at each grid point ($\theta ,\lambda $). This was necessary to verify whether the differences/similarities from the intercomparisons in Sec. 3.1 propagated to the spatial domain in terms of linear trends and annual amplitudes. Overall, the relative comparisons of annual amplitudes and linear trends (see Fig. 3) between every set of two products showed correlations higher than 0.99 and the same patterns. As shown in Fig. 1, larger amplitude coefficients exhibited higher NSE and CC values, which implies that annual and long-term components of TWSA fields derived from these processing centers provide similar results.

The quality of the results published by the four processing centers was assessed using the uncertainties computed from the TCH method. First, Eq. (8) was minimized at each grid point to derive the four ($N=4$) free parameters, which were used to compute the remaining elements of matrix $R$ using Eq. (7). The diagonal elements of matrix $R$, which contains the noise variances (${r}_{11}$, ${r}_{22}$, ${r}_{33}$, ${r}_{44}$), were then used to compute the noise deviations, which express the quality of each processing center (GFZ, GRGS, JPL, and CSR). Despite the differences between the strategies adopted by each processing center to compute the Stokes coefficients, the GRACE measurements are the same among the centers. Thus, it is possible to assume that GRACE detects the same geophysical phenomena and the common signals are canceled out by using Eq. (2), making the TCH method able to provide methodology-dependent errors of each processing center.

The results of TCH-derived uncertainties are presented in Fig. 4. It is possible to see that all GRACE processing centers (i.e., CSR, GFZ, GRGS, and JPL) present patterns of uncertainty that resemble the stripes. CSR seems to present the smallest uncertainties relative to the other three solutions, especially over the Greenland, Amazon, and Antarctic regions. The uncertainty map of GFZ shows larger overall error patterns than JPL, which shows higher uncertainties near the poles. Figure 4 also illustrates the uncertainties over the polar regions for the four processing centers; it confirms that the JPL-computed uncertainties are higher over these regions. Wahr et al.^{16} have reported that the errors are generally smaller over the poles than at low latitudes, apparently due to denser ground-track coverage near those regions. Furthermore, in Ref. 18, Sakumura et al. have reported that JPL-derived TWSA fields present relatively higher variations with respect to CSR over the North and South Poles. They have linked this deviation to the known motion of the sun relative to the orbital plane, which manifests as error in tides and spacecraft environment, and somehow, it is larger in the JPL solution. Additionally, CSR and JPL calculate the monthly solutions as a deviation from the GIF-48 mean geopotential model.^{13}^{,}^{14} However, there is a slight difference between CSR and JPL on the diurnal and semidiurnal band of ocean tide contribution, where CSR RL05 uses the GOT4.8 and JPL RL05.1 uses the GOT4.7, which differ only in the harmonics of the S2 tide.^{13}

The area-weighted uncertainties for each processing center were computed by considering the uncertainty distribution presented in Fig. 4. The results show that CSR presents a globally averaged (weighted) uncertainty of 9.4 mm, GFZ 13.7 mm, GRGS 14.8 mm, and JPL 13.2 mm. The low performance of GFZ is evident over the mid-latitude and tropic regions (Fig. 4), and appears to be primarily due to increased striping over the oceans. Chambers and Bonin et al.^{31} have reported that the CSR, GFZ, and JPL time series are virtually the same over the oceans in terms of comparisons based on ocean-bottom pressure. Large uncertainties of GRGS are evident, particularly over regions above the latitudes $\pm 80\text{\hspace{0.17em}\hspace{0.17em}}\mathrm{deg}$, Northern Australia, and Southeast Asia. It is well known that the ensemble mean is effective in reducing the noise in comparison with the members used in its computation, as shown in Ref. 18. Here, the ensemble mean using the CSR, GFZ, GRGS, and JPL series was computed and compared with each individual solution. The standard deviation of the differences was 9.4 mm for CSR, 11.5 mm for GFZ, 12.0 mm for GRGS, and 11.2 mm for JPL, which confirms the results derived by the TCH method, despite the magnitude of the uncertainties and the differences between standard deviation and Allan deviation.

However, the noises are insufficient to describe the relative quality of each processing center. Hence, for each GRACE product (i.e., CSR, GFZ, GRGS, and JPL), the signal-to-noise ratio (SNR) was estimated as the ratio between the standard deviation of the time series of TWSA and the uncertainties presented in Fig. 4; the results are presented in Fig. 5. The spatial patterns of SNRs are different from those of the noise distribution (Fig. 4), indicating higher values in regions with strong hydrological signals (e.g., Amazon basin, Congo/Zambezi basins, and Northwest India) and in regions under significant ice storage changes (e.g., Greenland and Asian high mountains). Area-weighted SNRs were computed for each processing center considering the SNRs presented in Fig. 5. The results show that CSR presents an SNR of 4.2, GFZ of 2.9, GRGS of 2.3, and JPL of 2.9, indicating the overall good performance of the CSR product relative to the other two SDS centers as well as to GRGS. However, over regions of low signal variations, such as the Sahara Desert and Gobi Desert, China, it is apparent that GFZ and JPL show smaller variations relative to their uncertainties, while CSR presents high SNRs over these regions. Again, in comparison with the ensemble mean, the mean of the SNRs shows 3.9 for CSR, 3.2 for GFZ, 2.8 for GRGS, and 3.3 for JPL.

## 3.3.

### Basin Scale Comparison of the Solutions

The river basins of the various drainage areas and locations (Fig. 6) were carefully chosen in order to quantify the relative differences of the four data centers at basin scale. The boundaries of the selected 91 basins with areas larger than $100\times {10}^{3}\text{\hspace{0.17em}\hspace{0.17em}}{\mathrm{km}}^{2}$ were obtained from the WRI Major Watersheds of the World Delineation, available at Ref. 56. This choice was based on the fact that basins smaller than the spatial resolution of the GRACE-derived TWSA ($\sim 333.6\text{\hspace{0.17em}\hspace{0.17em}}\mathrm{km}$ at the equator, considering the expansion up to d/o 60) could not be adequately captured by GRACE. However, for anyone interested in using GRACE for basins smaller than this spatial resolution, the study of Longuevergne et al. in Ref. 57 is recommended. The basin-averaged TWSA for all 91 basins were computed by applying Eq. (16) for the period of August 2002 to June 2014, starting from degree 3 and order 0 up to d/o 60. The spherical harmonic coefficients ${\vartheta}_{nm}^{c}$ and ${\vartheta}_{nm}^{s}$, defining a mask with the perimeter of the basins, shown in Fig. 6, were computed up to d/o 60 using Eq. (15), then filtered with the DDK2 filter scheme using Eq. (17). An example of the geographic mask is shown in Fig. 6 for the Amazon basin after filtering its spherical harmonic coefficients (${\vartheta}_{nm}^{c}$ and ${\vartheta}_{nm}^{s}$).

The TCH-based uncertainties are shown in Fig. 7(a), from which it is evident that CSR provides the lowest uncertainties for almost all 91 basins, and GFZ and JPL present similar results, with JPL performing slightly better than GFZ. However, GRGS presents the lowest performance compared to the SDS processing centers over the river basins considered here, particularly over basins 21, 27, 33, 52, 66, and 90. The TCH-based uncertainties of the four processing centers vary with respect to the basin size, which is in agreement with the fact that GRACE is more sensitive and precise for large river basins (see, e.g., Refs. 16, 58, and 59). In order to determine whether the relative basins’ hydrological signal variations influence the magnitude of the uncertainties, SNR values for each river basin were computed [Fig. 7(b)]. It is apparent that the scatter of the SNRs is relatively decreasing; that is, it generally depends on the size of the basin. This finding somewhat agrees with the results presented in Ref. 18 (see Fig. 3), where it has been shown that the differences in the solutions are due to random errors rather than signal differences. Regardless of the size of the basin, the amplitude of the hydrological signal also plays an important role in the performance of the GRACE-derived TWSA fields. For example, the SNRs of basins 02 to 11 and 31 to 45 are relatively lower than those of the basins with smaller areas. Overall, all four centers showed high performance in recovering the hydrological variations over the basins considered here (see Table 1 for the names and Fig. 6 for the locations of all 91 basins).

## Table 1

Uncertainties and SNRs for the 91 basins larger than 100×103 km2. Each basin location is shown in Fig. 6 and can be identified by the numbers in the first column of the table.

Basin | Area (km2) | Uncertainties | SNR | |||||||
---|---|---|---|---|---|---|---|---|---|---|

No. | Name | CSR | GFZ | GRGS | JPL | CSR | GFZ | GRGS | JPL | |

01 | Amazon | 6,035,841 | 4.2 | 7.6 | 13.0 | 4.4 | 24.5 | 13.6 | 7.6 | 22.9 |

02 | Zaire | 3,679,287 | 5.7 | 8.3 | 9.2 | 6.2 | 6.3 | 4.4 | 3.9 | 5.7 |

03 | Mississippi | 3,228,838 | 3.4 | 7.4 | 8.1 | 5.5 | 11.7 | 5.7 | 4.6 | 7.3 |

04 | Ob | 3,085,466 | 3.5 | 4.4 | 11.2 | 7.6 | 6.8 | 5.2 | 2.0 | 3.1 |

05 | Nile | 3,057,377 | 5.1 | 9.4 | 9.0 | 8.3 | 2.4 | 1.7 | 1.3 | 1.7 |

06 | Parana | 2,565,049 | 6.4 | 8.0 | 12.0 | 7.7 | 6.7 | 5.5 | 3.4 | 5.3 |

07 | Lena | 2,418,600 | 3.2 | 5.6 | 9.9 | 6.0 | 7.1 | 4.2 | 2.5 | 4.0 |

08 | Amur | 2,214,920 | 4.1 | 6.4 | 10.3 | 6.0 | 6.2 | 3.8 | 2.6 | 3.8 |

09 | Niger | 2,197,243 | 5.0 | 7.3 | 8.3 | 6.3 | 4.8 | 3.4 | 2.8 | 3.9 |

10 | Yenisey | 1,926,905 | 4.3 | 7.3 | 8.7 | 6.9 | 4.7 | 2.8 | 2.2 | 3.2 |

11 | Yangtze | 1,709,102 | 5.6 | 6.2 | 7.8 | 6.3 | 5.3 | 5.0 | 3.8 | 5.1 |

12 | Ganges-Brahmaputra | 1,662,311 | 6.0 | 7.1 | 10.9 | 7.7 | 13.9 | 12.1 | 7.8 | 10.6 |

13 | Mackenzie | 1,537,735 | 3.6 | 6.5 | 9.6 | 4.9 | 10.8 | 6.1 | 3.9 | 7.9 |

14 | Lake Chad | 1,497,075 | 5.9 | 9.3 | 8.7 | 7.9 | 3.7 | 2.7 | 2.3 | 3.0 |

15 | Volga | 1,393,757 | 4.8 | 6.0 | 9.4 | 6.5 | 8.2 | 6.6 | 4.3 | 6.1 |

16 | Zambezi | 1,381,198 | 5.4 | 8.0 | 11.5 | 7.1 | 16.2 | 10.8 | 7.6 | 11.9 |

17 | Saskatchewan-Nelson | 1,138,155 | 2.3 | 7.1 | 10.8 | 6.3 | 14.3 | 4.6 | 3.0 | 5.4 |

18 | St. Lawrence | 1,055,493 | 7.0 | 8.9 | 9.5 | 7.7 | 5.1 | 4.1 | 3.1 | 4.2 |

19 | Murray-Darling | 1,049,014 | 6.4 | 8.7 | 9.8 | 9.3 | 4.6 | 3.2 | 3.0 | 3.1 |

20 | Indus | 1,030,597 | 6.8 | 8.1 | 10.8 | 8.0 | 3.5 | 3.1 | 2.2 | 3.1 |

21 | Orinoco | 934,465 | 5.9 | 11.1 | 16.9 | 7.3 | 15.5 | 8.2 | 5.5 | 12.4 |

22 | Hwang Ho | 881,193 | 8.7 | 6.9 | 8.5 | 7.7 | 2.9 | 3.6 | 2.7 | 3.2 |

23 | Yukon | 834,934 | 3.4 | 3.6 | 11.4 | 7.1 | 19.2 | 17.9 | 6.0 | 8.8 |

24 | Mekong | 801,880 | 6.7 | 8.7 | 11.1 | 9.2 | 12.3 | 9.6 | 7.4 | 9.0 |

25 | Danube | 801,832 | 5.3 | 7.4 | 9.3 | 7.6 | 7.3 | 5.0 | 4.0 | 5.0 |

26 | Jubba-Shebelle | 799,531 | 7.2 | 9.6 | 9.5 | 7.9 | 4.3 | 3.3 | 3.1 | 3.7 |

27 | Tocantins | 776,785 | 5.7 | 11.6 | 16.8 | 10.1 | 18.7 | 9.2 | 6.2 | 10.5 |

28 | Kolyma | 751,157 | 4.4 | 8.8 | 7.8 | 7.1 | 5.7 | 3.0 | 3.4 | 3.6 |

29 | Okavango | 692,213 | 6.8 | 10.6 | 10.4 | 8.6 | 10.1 | 6.6 | 6.8 | 8.2 |

30 | Columbia | 670,165 | 5.0 | 9.3 | 9.4 | 8.4 | 11.6 | 6.4 | 6.3 | 6.9 |

31 | Colorado | 669,314 | 5.8 | 10.1 | 14.1 | 9.9 | 5.8 | 3.7 | 2.3 | 3.5 |

32 | Rio Grande | 641,609 | 6.0 | 10.5 | 12.3 | 9.1 | 6.2 | 3.9 | 2.7 | 4.3 |

33 | Sao Francisco | 621,622 | 5.4 | 10.6 | 16.2 | 9.9 | 9.6 | 4.9 | 3.0 | 5.3 |

34 | Orange | 620,401 | 6.8 | 9.2 | 11.9 | 8.2 | 2.9 | 2.3 | 1.6 | 2.4 |

35 | Amu Darya | 611,451 | 6.2 | 8.3 | 11.7 | 7.9 | 6.0 | 4.5 | 3.1 | 4.7 |

36 | Tigris-Euphrates | 600,305 | 7.5 | 11.0 | 8.0 | 9.6 | 6.6 | 4.6 | 6.4 | 5.3 |

37 | Baikal | 588,895 | 5.6 | 8.0 | 8.7 | 8.1 | 4.4 | 3.2 | 2.6 | 3.2 |

38 | Tarim (Yarkand) | 580,923 | 5.8 | 8.8 | 13.6 | 9.9 | 4.7 | 3.2 | 2.1 | 2.8 |

39 | Dnieper | 534,768 | 7.4 | 8.4 | 9.4 | 7.6 | 4.5 | 4.0 | 3.4 | 4.5 |

40 | Don | 487,495 | 7.5 | 8.2 | 9.7 | 8.9 | 5.6 | 4.9 | 4.2 | 4.7 |

41 | Ili | 478,974 | 5.7 | 8.2 | 14.1 | 10.9 | 4.1 | 3.0 | 1.9 | 2.3 |

42 | Si | 459,251 | 7.3 | 8.4 | 10.4 | 9.5 | 5.3 | 4.7 | 3.7 | 4.2 |

43 | Senegal | 434,749 | 6.0 | 11.4 | 8.8 | 7.7 | 5.5 | 2.9 | 3.3 | 4.0 |

44 | Syr Darya | 429,847 | 7.2 | 8.0 | 12.5 | 8.6 | 4.3 | 3.7 | 2.6 | 3.7 |

45 | Rio Colorado | 418,990 | 7.0 | 9.5 | 10.8 | 10.3 | 6.3 | 4.4 | 4.1 | 4.1 |

46 | Limpopo | 413,588 | 6.5 | 9.1 | 13.9 | 10.2 | 4.9 | 3.7 | 2.2 | 3.1 |

47 | Volta | 412,481 | 5.9 | 13.3 | 9.4 | 8.5 | 8.4 | 3.7 | 5.0 | 6.0 |

48 | Irrawaddy | 386,957 | 8.1 | 11.8 | 12.8 | 10.9 | 12.3 | 8.4 | 7.8 | 8.8 |

49 | N. Dvina | 361,615 | 5.0 | 8.5 | 11.3 | 9.2 | 8.2 | 4.9 | 3.6 | 4.5 |

50 | Uruguay | 343,288 | 7.5 | 9.2 | 14.1 | 11.2 | 6.8 | 5.3 | 3.3 | 4.6 |

51 | Indigirka | 330,261 | 5.6 | 9.3 | 10.1 | 8.4 | 7.6 | 4.6 | 4.4 | 5.3 |

52 | Parnaiba | 329,441 | 5.3 | 11.4 | 16.6 | 9.7 | 11.3 | 5.2 | 3.4 | 6.1 |

53 | Godavari | 319,823 | 9.2 | 9.2 | 11.6 | 9.0 | 8.1 | 7.9 | 6.2 | 8.0 |

54 | Helmand | 317,312 | 7.1 | 10.0 | 10.2 | 9.8 | 3.9 | 3.0 | 2.5 | 2.8 |

55 | Liao | 274,646 | 6.6 | 8.4 | 12.1 | 8.7 | 3.7 | 2.9 | 2.2 | 3.0 |

56 | Ural | 262,748 | 5.3 | 9.4 | 10.4 | 7.4 | 6.7 | 3.9 | 3.5 | 4.8 |

57 | Krishna | 251,617 | 7.9 | 9.6 | 11.1 | 8.4 | 6.3 | 5.0 | 4.4 | 5.8 |

58 | Salween | 249,041 | 6.3 | 10.8 | 10.9 | 9.3 | 11.9 | 7.0 | 6.8 | 7.8 |

59 | Fraser | 240,935 | 5.1 | 9.4 | 9.2 | 8.8 | 14.1 | 7.8 | 7.9 | 8.4 |

60 | Magdalena | 232,208 | 6.4 | 11.8 | 11.0 | 9.4 | 6.1 | 3.5 | 3.7 | 4.2 |

61 | Ogooue | 221,217 | 8.5 | 12.7 | 11.6 | 8.9 | 6.2 | 3.9 | 4.2 | 5.6 |

62 | Rufiji | 203,544 | 7.3 | 11.1 | 12.0 | 10.4 | 6.6 | 4.3 | 4.1 | 4.6 |

63 | Vistula | 183,927 | 6.4 | 8.0 | 11.5 | 9.6 | 4.7 | 3.6 | 2.4 | 3.0 |

64 | Song Hong | 171,848 | 10.8 | 10.5 | 10.4 | 9.2 | 5.7 | 6.0 | 5.9 | 7.0 |

65 | Oder | 164,965 | 6.1 | 8.8 | 12.4 | 9.5 | 5.2 | 3.4 | 2.3 | 3.2 |

66 | Essequibo | 164,857 | 7.0 | 12.9 | 18.6 | 8.9 | 12.9 | 7.0 | 4.9 | 10.0 |

67 | Kura-Araks | 159,531 | 8.2 | 9.5 | 10.9 | 10.3 | 7.3 | 6.3 | 5.9 | 5.8 |

68 | Negro | 160,588 | 6.5 | 8.7 | 11.2 | 10.3 | 6.1 | 4.4 | 3.6 | 3.8 |

69 | Ruvuma | 152,339 | 7.0 | 11.4 | 11.5 | 10.2 | 8.7 | 5.2 | 5.5 | 5.9 |

70 | Rhine | 151,487 | 6.9 | 11.4 | 11.3 | 8.1 | 4.3 | 2.6 | 2.3 | 3.6 |

71 | Chao Phraya | 151,450 | 9.4 | 10.2 | 14.2 | 10.5 | 10.8 | 9.8 | 6.9 | 9.4 |

72 | Mahanadi | 149,125 | 9.3 | 10.6 | 13.8 | 9.8 | 9.4 | 8.3 | 6.3 | 8.7 |

73 | Chobut | 141,152 | 5.8 | 9.5 | 12.2 | 11.3 | 5.1 | 3.1 | 2.5 | 2.6 |

74 | Kwanza | 140,896 | 7.8 | 12.3 | 12.4 | 10.4 | 7.1 | 4.5 | 4.8 | 5.3 |

75 | Sanaga | 133,895 | 8.0 | 13.1 | 10.2 | 9.8 | 7.7 | 4.7 | 5.6 | 6.1 |

76 | Fly | 133,142 | 8.3 | 10.3 | 13.7 | 10.5 | 4.9 | 3.9 | 3.0 | 3.7 |

77 | Usumacinta | 126,436 | 7.3 | 11.9 | 11.1 | 10.6 | 8.1 | 4.9 | 5.1 | 5.4 |

78 | Hari Rud | 126,310 | 8.1 | 9.9 | 9.4 | 10.0 | 3.6 | 3.0 | 2.8 | 2.9 |

79 | Santiago-Lerma-Chapala | 126,031 | 8.4 | 12.2 | 11.2 | 9.7 | 5.1 | 3.5 | 3.5 | 4.2 |

80 | Sacramento | 125,338 | 6.1 | 12.8 | 10.1 | 9.6 | 7.7 | 3.8 | 4.6 | 5.0 |

81 | Brazos | 121,181 | 8.1 | 10.9 | 11.3 | 8.9 | 6.9 | 5.3 | 4.5 | 6.3 |

82 | Balsas | 119,007 | 7.3 | 12.1 | 10.0 | 8.7 | 6.4 | 3.8 | 4.5 | 5.2 |

83 | Kuskokwim | 118,428 | 6.6 | 8.1 | 11.7 | 7.9 | 10.2 | 8.3 | 6.0 | 8.2 |

84 | Elbe | 115,016 | 6.1 | 10.3 | 12.0 | 8.9 | 4.9 | 2.9 | 2.3 | 3.3 |

85 | Loire | 114,547 | 9.7 | 10.0 | 11.8 | 8.1 | 3.1 | 2.7 | 2.4 | 3.6 |

86 | Alabama | 113,852 | 8.3 | 10.9 | 11.3 | 9.3 | 7.9 | 6.3 | 5.8 | 7.1 |

87 | Save | 113,497 | 6.8 | 8.8 | 14.1 | 12.3 | 6.1 | 4.8 | 2.8 | 3.2 |

88 | Colorado River | 112,249 | 8.1 | 11.2 | 11.7 | 9.7 | 6.7 | 5.1 | 4.2 | 5.8 |

89 | Cunene | 109,646 | 7.6 | 12.4 | 12.6 | 8.9 | 6.7 | 4.1 | 4.4 | 6.0 |

90 | Kapuas | 102,482 | 7.5 | 8.7 | 21.1 | 12.5 | 3.7 | 3.4 | 1.6 | 2.2 |

91 | Rio Salado | 102,038 | 7.6 | 10.4 | 13.9 | 11.2 | 7.3 | 5.2 | 3.7 | 5.1 |

Nevertheless, CSR (blue solid line) noise deviation [Fig. 7(a)] and SNR [Fig. 7(b)], for basins Saskatchewan-Nelson (No. 17), Volta (No. 47), Parnaiba (No. 52), Mississippi (No. 03), Sacramento (No. 80), Tocantins (No. 27), and Sao Francisco (No. 33), present the relatively lowest noise deviations and the largest SNRs by a factor of $\sim 2$ among SDS processing centers. The GRACE SDS processing centers present the highest SNR values for the Amazon basin (No. 01), because of its size and the strength of its hydrological signal (Table 1), while GRGS presents the highest SNR for the Fraser basin (No. 59). The noise deviation of the Amazon basin is 4.2 mm based on the CSR, 4.4 for JPL solutions, 7.6 mm for GFZ, and 13.0 mm for GRGS (Table 1). The relative comparison between the CSR and GRGS, in terms of SNR, shows that the former presents values approximately three times larger than the latter for the Ob (No. 04), Parnaiba (No. 52), Amazon (No. 01), Sao Francisco (No. 33), Yukon (No. 23), and Tocantins (No. 27) river basins. Additionally, for the Saskatchewan-Nelson basin (No. 17), the SNR of the CSR is four times larger than that of the GRGS.

Similar to Sec. 3.2, the ensemble solution calculated as the arithmetic mean of TWSA time series derived from the Stokes coefficients provided by the four processing centers was used to assess each ensemble member. Despite the differences between the magnitudes of the TCH and ensemble mean based uncertainties, the results shown in Fig. 7(c) are comparable to those of Fig. 7(a). It is evident that the uncertainties increase overall, while the basin size decreases. Additionally, the figure shows the overall good performance of CSR relative to that of JPL and GFZ. The second-best performance is that of JPL, followed by GFZ. The SNR results shown in Fig. 7(d) are also comparable to those of Fig. 7(b), where, taking into consideration the relative hydrological signal, it can be concluded that GRACE is able to detect changes in the terrestrial water storage of smaller basins.

In order to summarize the finds presented in Table 1 and Fig. 7, a multiple-comparison test was performed to determine the similarities (if any) of the CSR, GFZ, GRGS, and JPL solutions over the 91 river basins. First, a nonparametric Kruskal-Wallis test^{60} at 95% confidence interval was applied to the uncertainties shown in Table 1 to determine whether there were statistically significant differences between the solutions. Figures 8(a) and 8(b) show the box plots that visually present the summary statistics for each solution pertaining to 91 river basins for TCH and ensemble mean based errors, respectively. It is evident that the notches in the GFZ and JPL box plots overlap, and it can be concluded with 95% confidence that the true medians do not differ. Overall, the box plots for TCH and ensemble mean based error panels [Figs. 8(a) and 8(b)] show the same patterns, while GRGS presents outliers. The basins classified as outliers for the GRGS solutions are No. 90, No. 66, No. 21, No. 27, No. 52, and No. 33, for both centers. These six basins are located in regions where GRGS seems to present high uncertainties [Fig. 4(c)] relative to the SDS centers, such as Indonesia (Kapuas River basin, No. 90) and South America (basin Nos. 66, 21, 27, 52, and 33).

Subsequently, a follow-up test was conducted to identify which processing center presented uncertainties resulting from a different distribution. The relative performance pertaining to the 91 river basins by the multiple-comparison test, based on the Tukey-Kramer procedure,^{61} is shown in Figs. 8(c) and 8(d) (TCH and ensemble mean based errors, respectively). The test indicates a significant difference between the uncertainties of CSR and those of the other three processing centers. However, there is no significant difference between the uncertainties of GFZ and JPL; therefore, the test does not reject the null hypothesis that the uncertainties of these two solutions obtained from the same distribution. Additionally, the uncertainties of GRGS, for the 91 river basins, originate from a different distribution to those of CSR, GFZ, and JPL, since there is significant difference among their means. Again, the results from the ensemble solution confirm those computed by the TCH method, despite the magnitude of the estimated uncertainties from both approaches.

## 3.4.

### Results Summary

The intercomparison of the Stokes coefficients (Sec. 3.1) from the four GRACE processing centers (CSR, GFZ, GRGS, and JPL) showed an overall good agreement in terms of correlation coefficient between each pair of solutions up to degree 50 and order 40, with the GRGS being less correlated with the SDS processing centers (Fig. 1). Additionally, the Stokes coefficients from the four centers, up to degree 40 and order 20, showed relatively good agreement in terms of the NSE coefficient (Fig. 2). Moreover, the sectorial Stokes coefficients ($n=m$) showed NSE values less than zero for each pair of solutions, which indicates unacceptable performance. However, this relative comparison is unable to validate these products since it generally depends on the choice of one of them as the truth. One option would be, for example, to use an ensemble solution, computed as an arithmetic mean of the four centers, as suggested by Sakumura et al.^{18} to assess each ensemble member. As mentioned in Sec. 1, GRACE errors could be estimated by using the full covariance matrix and errors in the background models;^{32}^{,}^{33} yet a comparison of the GRACE-derived TWSA fields with true observations would be more interesting for validation purposes. However, there are no such *in situ* data for directly assessing GRACE-derived TWSA.

The importance of the present study lies in assessing the quality of each processing center using a generalized formulation of the TCH method (Sec. 2.1), which allows the noises of the time series to have a certain degree of correlation (see, e.g., Ref. 36). At a global scale, CSR presents a weighted averaged uncertainty of 9.4 mm and an SNR of 4.2, while GFZ, GRGS, and JPL have uncertainties of 13.7, 14.8, and 13.2 mm, with SNRs of 2.9, 2.3, and 2.9, respectively (Sec. 3.2). In addition to the recognized problem associated with the GRGS over regions above latitudes $\pm 82\text{\hspace{0.17em}\hspace{0.17em}}\mathrm{deg}$, large uncertainties were also indicated for regions such as Northern Australia, Southeast Asia, and the Amazon basin [see Fig. 4(c)]. Comparisons pertaining to the 91 large river basins exceeding $100\times {10}^{3}\text{\hspace{0.17em}\hspace{0.17em}}{\mathrm{km}}^{2}$ showed that CSR, GFZ, GRGS, and JPL present mean uncertainties of 6.5, 9.4, 11.4, and 8.8 mm (Sec. 3.3). The TCH-based uncertainties were confirmed by comparing the ensemble solution (arithmetic mean of the four centers in terms of TWSA) with each ensemble member. However, TCH-based internal precision makes sense only when all processing centers are considered together. Overall, the results are encouraging and show that the TCH method has the potential to assess the quality of other space-borne sensors that deliver observations of the same target variable.

## 4.

## Conclusions

The quality of four GRACE processing centers has been estimated by using a generalized formulation of the TCH method. The overall comparisons of TCH-based uncertainties pertaining to the period from August 2002 to June 2014 showed that the four solutions (CSR, GFZ, GRGS, and JPL) lie within 5 to 15 mm in terms of global grids, as well as basin-averaged values of TWSA. However, during the time span of the comparisons, CSR provides the most precise monthly solution in terms of TWSA at the global and basin scale, though the performance of each processing center differs from basin to basin. Over the polar regions, JPL shows higher uncertainties relative to the other two SDS processing centers, while the performance of GRGS is inadequate for high-latitude regions, Northern Australia, and Southeast Asia. However, the low performance of GRGS over these regions is yet to be investigated. Overall, it can be concluded that the applicability of GRACE for a specific river basin (or a region) would depend on the strength of the signal (i.e., on the annual cycle), rather than on the size of the basin. Although the use of the ensemble mean of the four (or more) centers is recommended from the practical standpoint, the present study contributed with the possibility of properly choosing the weights, so the TWSA ensemble mean series will have a low noise variance. However, studies are needed to determine whether these findings could be impacted in the case in which the assumption of low correlation among the noises of the series fails.

## Acknowledgments

Vagner G. Ferreira acknowledges the financial support provided by the National Natural Science Foundation of China (Grant Nos. 41204016 and 41574001) and the Fundamental Research Funds for the Central Universities (Grant No. 2015B21014). He is also grateful for the research opportunity provided by Humboldt-Foundation, Research Alumni Initiative, and benefited by a grant from Karlsruhe Institute of Technology (International Scholars and Welcome Office) as part of the Research in Germany Campaign, funded by the Federal Ministry of Education and Research. Professor Dr.-Ing. Jürgen Kusche is appreciated for providing the DDK filter coefficients. We are grateful for the data used in this study as described in Secs. 2.2.2 and 3.3. We thank Professor Hongjie Xie (associate editor) and the three anonymous reviewers for their constructive comments that helped us to improve the quality of this manuscript.

## References

*in situ*river level and precipitation data,” Remote Sens. Environ. 114(8), 1629–1637 (2010). http://dx.doi.org/10.1016/j.rse.2010.02.005 Google Scholar

## Biography

**Vagner G. Ferreira** is an associate professor at the School of Earth Sciences and Engineering, Hohai University, China. He received his MSc and PhD degrees in geodetic sciences from the Federal University of Paraná, Brazil, in 2008 and 2011, respectively. He was with the Geodetic Institute of Karlsruhe (GIK), Karlsruhe Institute of Technology (KIT), Germany, during 2009, where he worked on local gravity field modeling. His current research interests include hydro-geodesy.

**Henry D. C. Montecino** is a PhD candidate at the Graduate Program in Geodetic Science, Federal University of Paraná, Brazil, and a lecturer at the Department of Geodetic Sciences and Geomatics, University of Concepción, Chile. He received his BSc degree from the University of Concepción in 2006 and his MSc degree from the Federal University of Paraná in 2012.

**Caleb I. Yakubu** is an MSc student at the School of Earth Sciences and Engineering, Hohai University, China. He received his BSc degree in geomatic engineering from the Kwame Nkrumah University of Science and Technology, Ghana, in 2013.

**Bernhard Heck** is a full professor at KIT, responsible for the chair of physical and satellite geodesy at GIK. His scientific activities include various branches of geodesy, such as mathematical and physical geodesy, analysis of geodetic networks, GNSS positioning and GNSS meteorology, as well as geodynamics.