Uncertainties of the Gravity Recovery and Climate Experiment time-variable gravity-field solutions based on three-cornered hat method

Abstract. Currently, various satellite processing centers produce extensive data, with different solutions of the same field being available. For instance, the Gravity Recovery and Climate Experiment (GRACE) has been monitoring terrestrial water storage (TWS) since April 2002, while the Center for Space Research (CSR), the Jet Propulsion Laboratory (JPL), the GeoForschungsZentrum (GFZ), and the Groupe de Recherche de Géodésie Spatiale (GRGS) provide individual monthly solutions in the form of Stokes coefficients. The inverted TWS maps (or the regionally averaged values) from these coefficients are being used in many applications; however, as no ground truth data exist, the uncertainties are unknown. Consequently, the purpose of this work is to assess the quality of each processing center by estimating their uncertainties using a generalized formulation of the three-cornered hat (TCH) method. Overall, the TCH results for the study period of August 2002 to June 2014 indicate that at a global scale, the CSR, GFZ, GRGS, and JPL presented uncertainties of 9.4, 13.7, 14.8, and 13.2 mm, respectively. At a basin scale, the overall good performance of the CSR was observed at 91 river basins. The TCH-based results were confirmed by a comparison with an ensemble solution from the four GRACE processing centers.

Uncertainties of the Gravity Recovery and Climate Experiment time-variable gravity-field solutions based on three-cornered hat method 1 Introduction 2 Methodology and Data

Three-Cornered Hat Method
In the absence of a reference dataset, the TCH method can be used to estimate the relative uncertainties of the GRACE-derived TWSA from different sources if at least three products are available. To estimate the uncertainty in the TWSA datasets, consider the time series of the available products stored as fX i g i¼1;2;: : : ;N , where i corresponds to each solution center (that is, N ¼ 4, three products from the GRACE SDS processing centers, CSR, JPL, and GFZ, and one from the GRGS solution), and split each time series as E Q -T A R G E T ; t e m p : i n t r a l i n k -; e 0 0 1 ; 1 1 6 ; 6 2 1 where S is the true signal and ε i is a zero-mean white noise process (representing the measurement error 35 ), here representing the noise deviation of the GRACE processing center i. Since no true estimate of S is available, the differences between N − 1 processing centers and one center designated as the reference (chosen arbitrarily) can be computed as 36 E Q -T A R G E T ; t e m p : i n t r a l i n k -; e 0 0 2 ; 1 1 6 ; 5 4 5 Y iN ≡ X i − X N ¼ ε i − ε N ∀ i ¼ 1; : : : ; N − 1; (2) where X N is the reference time series. The CSR-derived TWSA time series was selected as the reference series. Nevertheless, the results are independent of the special choice of a particular GRACE processing center (see, e.g., Refs. 36 and 40). The samples of the N − 1 solution centers' differences [Eq. (2)] are concatenated in an M × ðN − 1Þ matrix as E Q -T A R G E T ; t e m p : i n t r a l i n k -; e 0 0 3 ; 1 1 6 ; 4 5 4 where each row contains a monthly observation (here, M ¼ 143; i.e., 143 months from August 2002 to June 2014, after making provision for the missing months as shown in Sec. 3.2), and each column represents the difference between each time series and the reference. The covariance matrix S of the series of differences is computed as E Q -T A R G E T ; t e m p : i n t r a l i n k -; e 0 0 4 ; 1 1 6 ; 3 7 5 S ¼ covðYÞ; (4) where covð•Þ is the covariance operator. A generic element of S represents either a variance estimate (for i ¼ j) or a covariance estimate (for i ≠ j). The N × N covariance matrix (Allan covariance matrix) of the individual noises R, whose elements are the unknowns of the problem and should be determined, is related to S by 42 E Q -T A R G E T ; t e m p : i n t r a l i n k -; e 0 0 5 ; 1 1 6 ; 2 9 8 where I is the ðN − 1Þ × ðN − 1Þ identity matrix and u is the ðN − 1Þ vector ½1 1 · · · 1 T . Equation (5) can be rewritten as 43 E Q -T A R G E T ; t e m p : i n t r a l i n k -; e 0 0 6 ; 1 1 6 ; 2 3 1 whereR is the ðN − 1Þ × ðN − 1Þ submatrix, r is the (N − 1) vector ½r 1N r 2N · · · r N−1;N T grouping the covariance estimates that involve the N'th time series, and r NN is the variance of the N'th series (i.e., the reference series). The partitioning of R as in Eq. (6) is necessary to solve the underdetermined problem in Eq. (5) by isolating the N free parameters (that is, r and r NNspecifically, r 1N ; r 2N ; : : : ; r N−1;N , and r NN ). Once the free parameters have been estimated, the solution for the other unknown elements of R (that is, the elements ofR) is given by E Q -T A R G E T ; t e m p : i n t r a l i n k -; e 0 0 7 ; 1 1 6 ; 1 1 7R To determine the N free parameters, a suitable objective function can be defined, which must always fulfill the positive definiteness of R. This important constraint on the solution domain for the free parameters is valid if and only if detðRÞ > 0, as shown by Tavella and Premoli. 43 However, Premoli and Tavella 41 have pointed out that this condition is not sufficient to determine a unique solution for R. They have proposed an optimal choice criteria for the free parameters based on the minimization of the global correlation among the noises of the involved time series, maintaining the positive definiteness of R. Galindo and Palacio 44 have suggested the constrained minimization problem aiming to reach a unique solution (minimum of the quadratic mean of covariances) using the Kuhn-Tucker theorem, where the objective function F is given by 44 E Q -T A R G E T ; t e m p : i n t r a l i n k -; e 0 0 8 ; 1 1 6 ; 6 5 1 with a constraint function 44 E Q -T A R G E T ; t e m p : i n t r a l i n k -; e 0 0 9 ; 1 1 6 ; 5 9 3 . The initial conditions were selected to assure that the initial values fulfill the constraints E Q -T A R G E T ; t e m p : i n t r a l i n k -; e 0 1 0 ; 1 1 6 ; 5 2 8 as proposed by Torcaso et al. 45 After determining the free parameters (r 1N ; r 2N ; : : : ; r N−1;N , and r NN ) by minimizing Eq. (8), the remaining unknown elements of R can be determined by Eq. (7).

Terrestrial water storage anomaly computations
The shape of the Earth's gravity field is a geoid, the equipotential surface that best fits, in a leastsquares sense, the global mean sea level. Consequently, variations in gravity are equivalent to variations in the geoid. Temporal variations of the geoid, predominantly resulting from changes in water mass (considering that the atmospheric and oceanic contributions have been removed from GRACE measurements during data processing for dealiasing purposes) can be inverted to surface mass anomalies Δσ, which is expressed as 46 E Q -T A R G E T ; t e m p : i n t r a l i n k -; e 0 1 1 ; 1 1 6 ; 3 2 4 Δσðθ; λ; tÞ ¼ X n max n¼0 X n m¼0 ½ΔC nm ðtÞ cos mλ þ ΔS nm ðtÞ sin mλP nm ðcos θÞ; (11) in kg∕m 2 for a particular month t.P nm is the normalized associated Legendre function of degree n and order m, θ is the co-latitude, λ is the longitude, and ΔC nm and ΔS nm are the residual surface density coefficients, which are given by 46 E Q -T A R G E T ; t e m p : i n t r a l i n k -; e 0 1 2 ; 1 1 6 ; 2 3 9 ΔC nm ðtÞ where ρ e is the average density of the Earth, R is the mean radius of the Earth (6371 × 10 3 m), and k n is the load Love number for degree n. The residual spherical harmonic coefficients, which describe the functionals of the Earth's gravity field (also known as Stokes coefficients), ΔC nm ðtÞ and ΔSðtÞ, are defined as E Q -T A R G E T ; t e m p : i n t r a l i n k -; e 0 1 3 ; 1 1 6 ; 1 4 6 ΔC nm ðtÞ ΔS nm ðtÞ ¼ where the long-term mean of Stokes coefficientsC nm andS nm is removed from each monthly value to exclude the static gravitational field. In Eq. (13), M is the total number of monthly solutions (134 months from August 2002 till July 2014; nine months are missing).
Equation (11) is used for computing the TWSA for each grid point defined by θ and λ; thus, one can use the grid for averaging the TWSA over a particular region (e.g., a river basin). Another approach for averaging the TWSA over a river basin is based on the exact averaging kernel, ϑðθ; λÞ, which is a function that describes the shape of the basin as 47 E Q -T A R G E T ; t e m p : i n t r a l i n k -; e 0 1 4 ; 1 1 6 ; 6 8 7 ϑðθ; λÞ ¼ 0 outside the basin 1 inside the basin : The function ϑðθ; λÞ can be expanded in spherical harmonic coefficients, ϑ c nm and ϑ s nm , as E Q -T A R G E T ; t e m p : i n t r a l i n k -; e 0 1 5 ; 1 1 6 ; 6 3 1 ϑ c nm ϑ s ϑðθ; λÞP nm ðcos θÞ cos mλ sin mλ sin θdθ dλ: (15) Using ϑ c nm and ϑ s nm , which describe ϑðθ; λÞ, the averaged surface mass anomalies Δσ can be expressed by a sum of Stokes coefficients as 47 E Q -T A R G E T ; t e m p : i n t r a l i n k -; e 0 1 6 ; 1 1 6 ; 5 6 2 ½ϑ c nm ΔC nm ðtÞ þ ϑ s nm ΔS nm ðtÞ; (16) where Ω, the angular area of the region of interest over the sphere, is equal to the ϑ c 00 coefficient. The use of Eq. (16) is recommended because, as observed by Swenson and Wahr,47 an average calculated using the exact representation of the basin shape is influenced by the satellite measurement errors, which rapidly increase for high degrees and orders (short wavelengths).

Gravity Recovery and Climate Experiment level 2 products
Gravity field solutions (static and time-variable gravity fields) are obtained from the level 1 data after further processing. As few users can construct their own solutions from the level 1 product, the GRACE mission partner institutions (CSR, JPL, and GFZ) and CNES/GRGS provide monthly solutions in the form of Stokes coefficients known as level 2 (L2) data to some degree and order (d/o), freely available on the Internet (e.g., Ref. 48). Additionally, institutions such as the Delft Institute of Earth Observation and Space Systems, Delft University of Technology, Netherlands, and Tongji University, China, provide independent solutions based on alternative approaches. Each center processes the data using different methodology, which may result in differences in their solutions. 28 Additionally, Wouters et al. 49 have reported that the differences in the approaches of the processing centers are attributed to the background models used (e.g., an a priori model of the Earth's gravity field, force models representing luni-solar and third body tides, and gravitational effects of the ocean and atmospheric mass variations), the period over which the orbits are integrated, weighting of the data, etc.
In this study, however, the monthly solutions generated by the SDS processing centers (CSR Release 05, 13 GFZ Release 05, 15 and JPL Release 05.1 14 ) and GRGS Release 03-v1 12 were used. The processing center solutions of CSR, GFZ, GRGS, and JPL are computed based on the integration of variational equations (dynamic method), where monthly solutions (GFZ also provides weekly solutions) have been estimated throughout the whole GRACE mission period. Currently, an alternative approach using radial basis functions is under investigation at GFZ with promising results. 50 Despite the approach being the same, possible differences can be related to, for example, the background models. As such, it is important to determine the noise deviations of each solution relative to an unknown true value (see Sec. 2.1) to assess their strengths and weaknesses. The Stokes coefficients from the four processing centers were considered up to d/o 60 in Eq. (11) (i.e., n max ¼ 60, which can be associated with the shortest resolution as ρ ½km ¼ ðπ∕n max ÞR ½km at the equator). The degree 1 coefficients (C 1;0 , C 1;1 , and S 1;1 ), which represent the changes in the geocenter, and the degree 2 coefficient (C 2;0 ), which is associated with the oblate shape of the Earth, were not considered here. Since they would be a common-mode signal for the four centers, through Eq. (2), the contribution of these low-degree coefficients is canceled out.
GRACE gravity fields at high-degree coefficients exhibit a high level of noise, which is known as stripes at the spatial domain. 51 Therefore, to obtain coherent results, it is necessary to remove these stripes in postprocessing by reducing correlated errors with a minimal impact on the real signal. These correlations can be reduced by using an a priori synthetic model of the observation geometry, as suggested by Kusche in Ref. 52. The filtered coefficients, x γðaÞ , are obtained by 52 E Q -T A R G E T ; t e m p : i n t r a l i n k -; e 0 1 7 ; 1 1 6 ; 6 6 3 using the weighting factor a to the signal- x is the vector of unfiltered Stokes coefficients, N is the normal matrix, and Efxx T g ¼ M −1 is an a priori signal covariance matrix. Parameter a can be adjusted to tune the smoothness of the solution. Here, a value of a ¼ 1 × 10 13 was adopted, which is called DDK2, with the level of smoothing approximately corresponding to a Gaussian smoothing radius of 340 km. An exhaustive comparison of the suitability of the filter methods available can be found in Ref. 53, and the choice of DDK2 is supported by their results and, additionally, is consistent with the results provided by Sakumura et al. in Ref. 18. Although it is not necessary to filter GRGS monthly solutions, 11,12 since they have already been stabilized during their generation process, in this study, they have been filtered as the SDS processing centers.

Results and Discussion
In this section, the main results from this study are presented. Section 3.1 shows the spectral intercomparisons of L2 products (i.e., Stokes coefficients) of the four processing centers. Section 3.2 shows the results related to global comparisons of the four GRACE processing centers, and Sec. 3.3 shows the results at the basin scale. Finally, Sec. 3.4 shows a summary of the results.

Spectral Intercomparison of the Gravity Recovery and Climate Experiment Processing Centers
The relative intercomparisons pertaining to the spectral domain of the GRACE SDS processing centers and the GRGS solution in Fig. 1 show the Pearson's linear correlation coefficient (CC) in the Stokes coefficients after applying the DDK2 filter. It is possible that CSR and GFZ are more correlated to each other than, for example, CSR and JPL. CSR and GFZ show a mean correlation for Stokes coefficients ΔC and ΔS of 0.60 and 0.59, respectively, while between CSR and JPL, it is 0.57 and 0.57 for ΔC and ΔS, respectively. GFZ and JPL show a relative correlation of 0.53 and 0.52 for ΔC and ΔS, respectively. Additionally, the strength of relationships between SDS processing centers and the GRGS solution is moderate. CSR and GRGS show a mean CC of 0.41 and 0.41 for ΔC and ΔS, respectively; GFZ and GRGS show a mean CC of 0.41 and 0.42 for ΔC and ΔS, respectively; JPL and GRGS show a mean CC of 0.38 and 0.38 for ΔC and ΔS, respectively. It is possible to see that for the sectorial (n ¼ m) coefficients with d/o higher than 20 and for coefficients with d/o higher than 40, the CC values are weak for SDS centers and GRGS comparison pairs. Additionally, Fig. 2 shows the Nash-Sutcliffe efficiency (NSE) coefficient [for details, see Eq. (18) of Ref. 54]. The NSE coefficient ranges between −∞ and 1, where the coefficient tends to 1 provided that the phase, amplitude, and mean of the simulated and observed time series agree. Values between 0 and 1 are generally viewed as acceptable levels of performance, whereas values <0 indicate that the mean observed value is a better predictor than the simulated value. 55 Therefore, results with NSE values <0, which indicate unacceptable performance, were masked out in black. Similar to the correlation, NSE coefficients decrease with increasing d/o. However, it is possible to see that for the sectorial coefficients, the NSE values are <0 for all comparison pairs. Overall, the SDS processing centers show good NSE values up to degree 50 and order 20 (near zonal coefficients) among themselves. For d/o higher than 40 and 20, respectively, the relative comparison between GRGS and the SDS solutions shows low NSE values, where the majority are <0. This could be due to amplitude reduction rather than phase shift, since NSE evaluates consistency in phase and amplitude and CC only in phase.  and the data of the nine missing months were interpolated. This was necessary since the number of days differs by a few days among the GRACE processing centers and all the datasets must be aligned in time in order to use the TCH method. Following the computations of TWSA fields, a simultaneous fitting of a time series taking into account constant, linear, and periodic variations (annual, semiannual, and 161-day amplitudes and phases) was carried out at each grid point (θ; λ). This was necessary to verify whether the differences/similarities from the intercomparisons in Sec. 3.1 propagated to the spatial domain in terms of linear trends and annual amplitudes. Overall, the relative comparisons of annual amplitudes and linear trends (see Fig. 3) between every set of two products showed correlations higher than 0.99 and the same patterns. As shown in Fig. 1, larger amplitude coefficients exhibited higher NSE and CC values, which implies that annual and long-term components of TWSA fields derived from these processing centers provide similar results. The quality of the results published by the four processing centers was assessed using the uncertainties computed from the TCH method. First, Eq. (8) was minimized at each grid point to derive the four (N ¼ 4) free parameters, which were used to compute the remaining elements of matrix R using Eq. (7). The diagonal elements of matrix R, which contains the noise variances (r 11 , r 22 , r 33 , r 44 ), were then used to compute the noise deviations, which express the quality of each processing center (GFZ, GRGS, JPL, and CSR). Despite the differences between the strategies adopted by each processing center to compute the Stokes coefficients, the GRACE measurements are the same among the centers. Thus, it is possible to assume that GRACE detects the same geophysical phenomena and the common signals are canceled out by using Eq. (2), making the TCH method able to provide methodology-dependent errors of each processing center.

Global Comparisons of the Solutions
The results of TCH-derived uncertainties are presented in Fig. 4. It is possible to see that all GRACE processing centers (i.e., CSR, GFZ, GRGS, and JPL) present patterns of uncertainty that resemble the stripes. CSR seems to present the smallest uncertainties relative to the other three solutions, especially over the Greenland, Amazon, and Antarctic regions. The uncertainty map of GFZ shows larger overall error patterns than JPL, which shows higher uncertainties near the poles. Figure 4 also illustrates the uncertainties over the polar regions for the four processing centers; it confirms that the JPL-computed uncertainties are higher over these regions. Wahr et al. 16 have reported that the errors are generally smaller over the poles than at low latitudes, apparently due to denser ground-track coverage near those regions. Furthermore, in Ref. 18,Sakumura et al. have reported that JPL-derived TWSA fields present relatively higher variations with respect to CSR over the North and South Poles. They have linked this deviation to the known motion of the sun relative to the orbital plane, which manifests as error in tides and spacecraft environment, and somehow, it is larger in the JPL solution. Additionally, CSR and JPL calculate the monthly solutions as a deviation from the GIF-48 mean geopotential model. 13,14 However, there is a slight difference between CSR and JPL on the diurnal and semidiurnal band of ocean tide contribution, where CSR RL05 uses the GOT4.8 and JPL RL05.1 uses the GOT4.7, which differ only in the harmonics of the S2 tide. 13 The area-weighted uncertainties for each processing center were computed by considering the uncertainty distribution presented in Fig. 4. The results show that CSR presents a globally averaged (weighted) uncertainty of 9.4 mm, GFZ 13.7 mm, GRGS 14.8 mm, and JPL 13.2 mm. The low performance of GFZ is evident over the mid-latitude and tropic regions (Fig. 4), and appears to be primarily due to increased striping over the oceans. Chambers and Bonin et al. 31 have reported that the CSR, GFZ, and JPL time series are virtually the same over the oceans in terms of comparisons based on ocean-bottom pressure. Large uncertainties of GRGS are evident, particularly over regions above the latitudes AE80 deg, Northern Australia, and Southeast Asia. It is well known that the ensemble mean is effective in reducing the noise in comparison with the members used in its computation, as shown in Ref. 18. Here, the ensemble mean using the CSR, GFZ, GRGS, and JPL series was computed and compared with each individual solution. The standard deviation of the differences was 9.4 mm for CSR, 11.5 mm for GFZ, 12.0 mm for GRGS, and 11.2 mm for JPL, which confirms the results derived by the TCH method, despite the magnitude of the uncertainties and the differences between standard deviation and Allan deviation.
However, the noises are insufficient to describe the relative quality of each processing center. Hence, for each GRACE product (i.e., CSR, GFZ, GRGS, and JPL), the signal-to-noise ratio (SNR) was estimated as the ratio between the standard deviation of the time series of TWSA and the uncertainties presented in Fig. 4; the results are presented in Fig. 5. The spatial patterns of SNRs are different from those of the noise distribution (Fig. 4), indicating higher values in regions with strong hydrological signals (e.g., Amazon basin, Congo/Zambezi basins, and Northwest India) and in regions under significant ice storage changes (e.g., Greenland and Asian high mountains). Area-weighted SNRs were computed for each processing center considering the SNRs presented in Fig. 5. The results show that CSR presents an SNR of 4.2, GFZ of 2.9, GRGS of 2.3, and JPL of 2.9, indicating the overall good performance of the CSR product relative to the other two SDS centers as well as to GRGS. However, over regions of low signal variations, such as the Sahara Desert and Gobi Desert, China, it is apparent that GFZ and JPL show smaller variations relative to their uncertainties, while CSR presents high SNRs over these regions. Again, in comparison with the ensemble mean, the mean of the SNRs shows 3.9 for CSR, 3.2 for GFZ, 2.8 for GRGS, and 3.3 for JPL.

Basin Scale Comparison of the Solutions
The river basins of the various drainage areas and locations (Fig. 6) were carefully chosen in order to quantify the relative differences of the four data centers at basin scale. The boundaries of the selected 91 basins with areas larger than 100 × 10 3 km 2 were obtained from the WRI Major Watersheds of the World Delineation, available at Ref. 56. This choice was based on the fact that  Fig. 6 for the Amazon basin after filtering its spherical harmonic coefficients (ϑ c nm and ϑ s nm ). The TCH-based uncertainties are shown in Fig. 7(a), from which it is evident that CSR provides the lowest uncertainties for almost all 91 basins, and GFZ and JPL present similar results, with JPL performing slightly better than GFZ. However, GRGS presents the lowest performance compared to the SDS processing centers over the river basins considered here, particularly over basins 21,27,33,52,66, and 90. The TCH-based uncertainties of the four processing centers vary with respect to the basin size, which is in agreement with the fact that GRACE is more sensitive and precise for large river basins (see, e.g., Refs. 16, 58, and 59). In order to determine whether the relative basins' hydrological signal variations influence the magnitude of the uncertainties, SNR values for each river basin were computed [ Fig. 7(b)]. It is apparent that the scatter of the SNRs is relatively decreasing; that is, it generally depends on the size of the basin. This finding somewhat agrees with the results presented in Ref. 18 (see Fig. 3), where it has been shown that the differences in the solutions are due to random errors rather than signal differences. Regardless of the size of the basin, the amplitude of the hydrological signal also plays an important role in the performance of the GRACE-derived TWSA fields. For example, the SNRs of basins 02 to 11 and 31 to 45 are relatively lower than those of the basins with smaller areas. Overall, all four centers showed high performance in recovering the hydrological variations over the basins considered here (see Table 1 for the names and  Fig. 6 Watersheds of the world larger than 100;000 km 2 extracted from the WRI Major Watersheds of the World Delineation. The numbers assigned here to each basin can be used to identify the names of the river basins ( Table 1). The embedded panel shows the geographic mask adopted for the Amazon basin after DDK2 smoothing.
13.0 mm for GRGS (  17), the SNR of the CSR is four times larger than that of the GRGS. Similar to Sec. 3.2, the ensemble solution calculated as the arithmetic mean of TWSA time series derived from the Stokes coefficients provided by the four processing centers was used to assess each ensemble member. Despite the differences between the magnitudes of the TCH and ensemble mean based uncertainties, the results shown in Fig. 7(c) are comparable to those of Fig. 7(a). It is evident that the uncertainties increase overall, while the basin size decreases. Additionally, the figure shows the overall good performance of CSR relative to that of JPL and GFZ. The second-best performance is that of JPL, followed by GFZ. The SNR results shown in Fig. 7(d) are also comparable to those of Fig. 7(b), where, taking into consideration the relative hydrological signal, it can be concluded that GRACE is able to detect changes in the terrestrial water storage of smaller basins.
In order to summarize the finds presented in Table 1 and Fig. 7, a multiple-comparison test was performed to determine the similarities (if any) of the CSR, GFZ, GRGS, and JPL solutions over the 91 river basins. First, a nonparametric Kruskal-Wallis test 60 at 95% confidence interval was applied to the uncertainties shown in Table 1 to determine whether there were statistically significant differences between the solutions. Figures 8(a) and 8(b) show the box plots that visually present the summary statistics for each solution pertaining to 91 river basins for TCH and ensemble mean based errors, respectively. It is evident that the notches in the GFZ and JPL box plots overlap, and it can be concluded with 95% confidence that the true medians do not differ.  Table 1 Uncertainties and SNRs for the 91 basins larger than 100 × 10 3 km 2 . Each basin location is shown in Fig. 6 and can be identified by the numbers in the first column of the  Subsequently, a follow-up test was conducted to identify which processing center presented uncertainties resulting from a different distribution. The relative performance pertaining to the 91 river basins by the multiple-comparison test, based on the Tukey-Kramer procedure, 61 is shown in Figs. 8(c) and 8(d) (TCH and ensemble mean based errors, respectively). The test indicates a significant difference between the uncertainties of CSR and those of the other three processing centers. However, there is no significant difference between the uncertainties of GFZ and JPL; therefore, the test does not reject the null hypothesis that the uncertainties of these two solutions obtained from the same distribution. Additionally, the uncertainties of GRGS, for the 91 river basins, originate from a different distribution to those of CSR, GFZ, and JPL, since there is

Results Summary
The intercomparison of the Stokes coefficients (Sec. 3.1) from the four GRACE processing centers (CSR, GFZ, GRGS, and JPL) showed an overall good agreement in terms of correlation coefficient between each pair of solutions up to degree 50 and order 40, with the GRGS being less correlated with the SDS processing centers (Fig. 1). Additionally, the Stokes coefficients from the four centers, up to degree 40 and order 20, showed relatively good agreement in terms of the NSE coefficient (Fig. 2). Moreover, the sectorial Stokes coefficients (n ¼ m) showed NSE values less than zero for each pair of solutions, which indicates unacceptable performance. However, this relative comparison is unable to validate these products since it generally depends on the choice of one of them as the truth. One option would be, for example, to use an ensemble solution, computed as an arithmetic mean of the four centers, as suggested by Sakumura et al. 18 to assess each ensemble member. As mentioned in Sec. 1, GRACE errors could be estimated by using the full covariance matrix and errors in the background models; 32,33 yet a comparison of the GRACE-derived TWSA fields with true observations would be more interesting for validation purposes. However, there are no such in situ data for directly assessing GRACE-derived TWSA.
The importance of the present study lies in assessing the quality of each processing center using a generalized formulation of the TCH method (Sec. 2.1), which allows the noises of the time series to have a certain degree of correlation (see, e.g., Ref. 36). At a global scale, CSR presents a weighted averaged uncertainty of 9.4 mm and an SNR of 4.2, while GFZ, GRGS, and JPL have uncertainties of 13.7, 14.8, and 13.2 mm, with SNRs of 2.9, 2.3, and 2.9, respectively (Sec. 3.2). In addition to the recognized problem associated with the GRGS over regions above latitudes AE82 deg, large uncertainties were also indicated for regions such as Northern Australia, Southeast Asia, and the Amazon basin [see Fig. 4(c)]. Comparisons pertaining to the 91 large river basins exceeding 100 × 10 3 km 2 showed that CSR, GFZ, GRGS, and JPL present mean uncertainties of 6.5, 9.4, 11.4, and 8.8 mm (Sec. 3.3). The TCH-based uncertainties were confirmed by comparing the ensemble solution (arithmetic mean of the four centers in terms of TWSA) with each ensemble member. However, TCH-based internal precision makes sense only when all processing centers are considered together. Overall, the results are encouraging and show that the TCH method has the potential to assess the quality of other space-borne sensors that deliver observations of the same target variable.

Conclusions
The quality of four GRACE processing centers has been estimated by using a generalized formulation of the TCH method. The overall comparisons of TCH-based uncertainties pertaining to the period from August 2002 to June 2014 showed that the four solutions (CSR, GFZ, GRGS, and JPL) lie within 5 to 15 mm in terms of global grids, as well as basin-averaged values of TWSA. However, during the time span of the comparisons, CSR provides the most precise monthly solution in terms of TWSA at the global and basin scale, though the performance of each processing center differs from basin to basin. Over the polar regions, JPL shows higher uncertainties relative to the other two SDS processing centers, while the performance of GRGS is inadequate for high-latitude regions, Northern Australia, and Southeast Asia. However, the low performance of GRGS over these regions is yet to be investigated. Overall, it can be concluded that the applicability of GRACE for a specific river basin (or a region) would depend on the strength of the signal (i.e., on the annual cycle), rather than on the size of the basin. Although the use of the ensemble mean of the four (or more) centers is recommended from the practical standpoint, the present study contributed with the possibility of properly choosing the weights, so the TWSA ensemble mean series will have a low noise variance. However, studies are needed to determine whether these findings could be impacted in the case in which the assumption of low correlation among the noises of the series fails.