Ground-truth of a 1-km downscaled NLDAS air temperature product using the New York City Community Air Survey

Abstract. Ground-truthing results are presented for a new 1-km air temperature product downscaled for New York City (NYC) from ∼12 km North American Land Data Assimilation System (NLDAS) air temperature data using 1 km moderate resolution imaging spectroradiometer surface temperature data. The downscaled product was compared against a unique highly spatially resolved ground-level ambient air temperature dataset collected through the New York City Community Air Survey (NYCCAS), a neighborhood level air pollution and temperature monitoring network, for the years 2009 and 2010. This work focuses on the spatial variation in daily minimum temperatures within the five counties that comprise NYC (∼784 km²). Overall, the downscaled daily minimum temperature was well correlated with ground station data, with NYCCAS minimum temperatures being slightly higher. Minimum temperature R² values were 0.9 and 0.92, and mean absolute errors were 0.69°C and 0.86°C for years 2009 and 2010, respectively. The smallest differences between NYCCAS and the downscaled data were seen at lower temperatures, in less densely urbanized areas, and in areas with higher vegetative cover, suggesting systematic bias in the downscaled data related to land-use. The 1-km dataset discerned neighborhood level temperature differences in high-density urban situations with heterogeneous land cover.

1 Introduction
data products is a critical component in the development of accurate exposure metrics for health studies. Prior work in the validation of NLDAS components has relied on the Oklahoma Mesonet and the Atmospheric Radiation Measurement Program (ARM) as data sources for comparison. Luo et al. 5 found that air temperature at 2 m from model simulations forced with NLDAS data compared well with those forced with station observations. Another retrospective forcing study of NLDAS radiation and precipitation data 6 found the best results for radiation measures. Robock et al. 6 found good agreement for soil temperature, but differences in surface energy partitioning between the model and station observed values. Properly estimating energy partitioning between latent and sensible heat is an ongoing challenge in many of these studies and may explain discrepancies in this study's outcomes as well. Energy absorbed at the surface and released as sensible heat will heat the surface and the air near the surface, while energy used to evaporate water will cool the same surface. The amount of surface cooling is related to both the vegetation coverage of the land surface and soil moisture content, both of which vary significantly across a coastal urban setting such as New York City (NYC).
Although performing well in ground-truthing analyses, one drawback of the NLDAS datasets is that the spatial resolution (∼12 km) may not be appropriate for describing exposures in areas with large variation in land cover, such as complex coastlines, steep slopes, or urban areas. In order to address this gap, our project created a downscaled 1-km resolution air temperature dataset. Prior ground-truthing has not been done for a downscaled NLDAS dataset or for air temperature in a highly variable urban landscape such as NYC.
The New York City Community Air Survey (NYCCAS) has the most comprehensive geographic coverage of any urban air monitoring network in the United States, with up to 150 monitors in a 790-km² area. While developed to capture spatial variation in air pollution, it also records ground-level air temperature and relative humidity. Using NYCCAS' highly spatially resolved data, we assessed the extent to which downscaled NLDAS data products capture the intraurban patterns in warm season temperature and identified adjustments to the downscaling process that improved the accuracy of the data product.
Once the downscaled data products have been finalized, they will be incorporated into a collaborative heat stress vulnerability study involving the National Aeronautics and Space Administration (NASA), the New York State Department of Health (NYSDOH), and other agencies. The fine spatial scale of the final dataset will fill in spatial gaps in ground-based temperature sources. The accuracy of the 1-km downscaled data in highly variable landscapes, such as cities, will be evaluated and improved through the ground-truthing process in NYC. The final nationwide dataset will be disseminated via the Centers for Disease Control and Prevention (CDC) Environmental Public Health Tracking Network to be used as an alternative to ground-based or 12-km NLDAS surface temperature data in vulnerability mapping or potentially for the calibration or validation of other models.

New York City Community Air Survey: Air Temperature Records
NYCCAS is a monitoring dataset that was developed primarily for the purposes of tracking street-level air pollutant distribution across the city. The spatial designation of the sensors was optimized to capture intraurban spatial variation in fine particulate pollutant concentrations from local sources in NYC. More details of the site selection process and monitoring strategy can be found in Ref. 7. In order to compute the temperature-corrected flow volume of the sensor air pump, temperature and relative humidity were also recorded, thereby providing more spatially resolved temperature data than the three National Weather Service locations in NYC. Most NYCCAS sites were monitored for 14 continuous days, once per season (four times per year). A small subset of sites (five) was continuously monitored, so as to track city-wide temporal variation in pollution. The monitors were mounted on utility poles 3 m above the street and recorded temperature and relative humidity every 15 min. Our study uses May to October data from the years 2009 to 2013, for which the NYCCAS and downscaled NLDAS datasets coincide. The number of NYCCAS sites has fluctuated over time due to funding availability, with 155 sites monitored in 2009 and 2010, 100 sites in 2011 and 2012, and 60 sites in 2013. These air-sampling units were not standard weather stations and were variably subject to direct sunlight, which may have introduced additional errors in the monitored daytime maximum temperature across sites. Because of this, our study focused on minimum temperatures, which tended to occur during nighttime when the influence of shading across sites was presumed to be minimal.
To provide context to our temperature comparison, we retrieved normalized difference vegetation index (NDVI) data from the United States Geological Survey (Landsat 5, image taken April 22, 2010) and calculated the average value around each NYCCAS sampling location at 15 buffer distances from 100 to 1000 m. We also developed measures of the total interior square footage of all buildings within the 15 buffers around each location using extensive land use and geographic data at the tax lot level maintained by the New York City Department of City Planning. 8
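The buffer averaging described above can be sketched as follows. This is a minimal illustration, not the authors' GIS workflow: it assumes the raster has already been reduced to cell-center coordinates in projected meters, and the names `buffer_means`, `cells`, and `radii_m` are hypothetical. The exact spacing of the 15 buffer distances is not stated in the text, so the radii are left as an input.

```python
from math import hypot
from statistics import fmean


def buffer_means(cells, site_xy, radii_m):
    """Average raster values inside circular buffers around one site.

    cells   : list of (x, y, value) tuples, cell centers in projected meters
    site_xy : (x, y) of the monitoring site, same coordinate system
    radii_m : iterable of buffer radii in meters
    Returns {radius: mean value, or None if no cell center falls inside}.
    """
    sx, sy = site_xy
    # Distance from the site to every cell center, computed once.
    dist_val = [(hypot(x - sx, y - sy), v) for x, y, v in cells]
    means = {}
    for radius in radii_m:
        inside = [v for d, v in dist_val if d <= radius]
        means[radius] = fmean(inside) if inside else None
    return means


# Hypothetical NDVI cells along a line east of a site at the origin.
example_cells = [(0.0, 0.0, 0.2), (100.0, 0.0, 0.4), (500.0, 0.0, 0.8)]
example_means = buffer_means(example_cells, (0.0, 0.0), [50, 100, 1000])
```

A cell is counted when its center lies within the radius; area-weighted clipping of partially covered cells would be a reasonable refinement but is not attempted here.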

NLDAS Meteorological Reanalysis 1 × 1 km² Downscaled Temperature Data
A fine-scaled, near-surface (2 m above ground level) temperature dataset was derived from meteorological reanalysis and remote sensing data for the years 2005 to 2013. Historical NLDAS temperature data were derived from the North American Regional Reanalysis (NARR) analysis fields, which have a 32-km spatial resolution and 3-hourly temporal frequency. 4 The NARR fields that were used to generate NLDAS meteorological fields were spatially interpolated to the finer resolution of the NLDAS 1/8-deg grid (∼12 km) and then temporally disaggregated to the NLDAS hourly frequency. From the hourly NLDAS data, we derived daily maximum and minimum air temperatures on the 1/8-deg grid for the months of May to October. These daily temperatures, described in Ref. 9, were then downscaled to a 1-km product using the 1-km moderate resolution imaging spectroradiometer (MODIS) land surface temperature (LST) product "MYD11A2." In this downscaling algorithm, LST data from the daytime and nighttime (∼1:30 PM and AM local standard time, respectively) Aqua MODIS overpasses were used to provide 1-km spatial patterns of LST for the conterminous United States. The MODIS LST grids form the basis for the downscaled patterns of air temperatures, using the algorithm described below. Daytime LST data were used for downscaling daily maximum temperatures, and nighttime LST data were used for downscaling daily minimum temperatures.
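The step from hourly values to daily extremes can be illustrated for a single grid cell. This is a sketch under assumptions: timestamps are taken to be in local time already, the day boundary is the calendar day, and the function name `daily_min_max` and the sample values are hypothetical rather than drawn from NLDAS.

```python
from collections import defaultdict
from datetime import datetime


def daily_min_max(hourly):
    """Group (timestamp, temperature) pairs by calendar day.

    hourly : iterable of (datetime, temperature in deg C) for one grid cell
    Returns {date: (daily minimum, daily maximum)}, ordered by date.
    """
    by_day = defaultdict(list)
    for stamp, temp_c in hourly:
        by_day[stamp.date()].append(temp_c)
    return {day: (min(vals), max(vals)) for day, vals in sorted(by_day.items())}


# Hypothetical hourly samples spanning two days.
sample_hours = [
    (datetime(2010, 7, 1, 2), 21.5),
    (datetime(2010, 7, 1, 14), 30.0),
    (datetime(2010, 7, 1, 23), 24.0),
    (datetime(2010, 7, 2, 3), 22.0),
    (datetime(2010, 7, 2, 15), 31.5),
]
daily = daily_min_max(sample_hours)
```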
We chose this remote sensing-based approach, as opposed to using available data sets that rely solely on ground station data, such as DayMet 10 and PRISM (parameter-elevation relationships on independent slopes model), 11 in order to capture geographic temperature distributions, including intraurban variations, which are extremely difficult to represent using ground station data alone. Remotely sensed LST has been shown to be an excellent predictor of spatial patterns in near-surface air temperature over large expanses, [12][13][14] which substantiates the use of LST for estimating air temperature at the MODIS scale.
The downscaling model is based on the assumption that in the absence of strong horizontal temperature advection, air temperature is driven by sensible heat flux from the surface, thus the spatial patterns of air temperature mimic the patterns of LST. Another assumption is that daily maximum air temperatures occur during the early-mid afternoon, near the time of the PM Aqua overpass (1:30 PM local standard time), and the daily minimum air temperatures occur in the early morning, near the 1:30 AM Aqua overpass. These two assumptions are generally appropriate for quiescent conditions associated with weak synoptic flow, typical of the warm season at mid-latitudes. A final assumption is that the spatial pattern (but not the magnitude) of temperatures is nearly constant from day to day within a 40-day period, so that use of MODIS LST data predating the current day by up to 40 days is still valid. This assumption can be violated by rapidly changing vegetation conditions or large changes in soil moisture, but neither of these conditions typically is relevant to an urban environment. [15][16][17]
Air temperature variations tend to be much smaller in magnitude than corresponding LST variations. Therefore, our method computes and applies normalized MODIS LST spatial anomalies to disaggregate daily maximum NLDAS air temperature; the normalization procedure accounts for this difference in magnitude of variation. We first created daytime and nighttime LST grids using the most recent previous 8-day composite MODIS LST products. If LST data are missing due to cloudiness in the most recent composite, we use data from prior periods, going back a maximum of five 8-day periods, or 40 days, from the day for which downscaling is being performed. By using up to 40 days of LST data, the missing data problem is virtually eliminated. After the composite LST grid is established, standardized LST departures, $Z_{HR}$ [Eq. (1)], are calculated from the composite LST grid, in which the spatial means and standard deviations are calculated within a local neighborhood or "moving window" (kernel). The size of the window can be varied; we consider it to be a "tunable parameter." As a step toward optimizing the downscaling model, we tested two sizes of the spatial neighborhood, 3 × 3 NLDAS grid cells [version 1 (v1)] and 5 × 5 NLDAS grid cells [version 2 (v2)], or ∼36 × 36 km² and ∼60 × 60 km², respectively. The departures are calculated according to
$$Z_{HR} = \frac{T_{HR} - T_{HR,\mathrm{mean}}}{\sigma_{HR}}, \tag{1}$$
where $T_{HR}$ is the high-resolution LST, and $T_{HR,\mathrm{mean}}$ and $\sigma_{HR}$ are the mean and standard deviation, respectively, of high-resolution LST over the neighborhood. Once the departures are calculated, we then compute the downscaled daily maximum air temperatures for each day, $T_{DIS}$, based on the standardized LST departures
$$T_{DIS} = T_{LR} + Z_{HR}\,\sigma_{LR}, \tag{2}$$
where $T_{LR}$ and $\sigma_{LR}$ are the mean and standard deviation, respectively, of low-resolution daily maximum or minimum air temperatures over the neighborhood. Through this algorithm, areas within the NLDAS grid cell for which the MODIS LST composite is warmer (colder) than the NLDAS grid cell mean will result in a maximum or minimum air temperature that is warmer (colder) than the NLDAS air temperature recorded for the respective day. More details of this procedure are provided in Ref. 18.
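The two steps described above, gap-filling the LST composite and applying the standardized departures, can be sketched in a few lines. The sketch reads Eq. (1) as Z_HR = (T_HR − T_HR,mean) / σ_HR and Eq. (2) as T_DIS = T_LR + Z_HR · σ_LR, following the symbol definitions in the text. Several details are assumptions that the text does not settle: the window is centered on the coarse cell containing each pixel, it is clipped at the grid edge, population standard deviations are used, and a flat LST window yields a zero departure. The function names `composite_lst` and `downscale` are hypothetical; the operational procedure is the one described in Ref. 18.

```python
from statistics import fmean, pstdev


def composite_lst(periods, max_periods=5):
    """Fill each pixel from the most recent 8-day grid that has data.

    periods : list of 2D grids, most recent first; None marks a missing pixel.
    Looks back at most max_periods grids (five 8-day periods = 40 days).
    """
    recent = periods[:max_periods]
    n_rows, n_cols = len(recent[0]), len(recent[0][0])
    return [[next((g[i][j] for g in recent if g[i][j] is not None), None)
             for j in range(n_cols)] for i in range(n_rows)]


def downscale(lst_hr, t_lr, factor, kernel=5):
    """Apply Eq. (1) and Eq. (2) to every high-resolution pixel.

    lst_hr : 2D list of gap-free high-resolution LST
    t_lr   : 2D list of low-resolution air temperature, one value per coarse cell
    factor : fine pixels per coarse cell along one side
    kernel : neighborhood width in coarse cells (3 for v1, 5 for v2)
    """
    half = kernel // 2
    n_rows, n_cols = len(t_lr), len(t_lr[0])
    out = [[0.0] * len(lst_hr[0]) for _ in lst_hr]
    for ci in range(n_rows):
        for cj in range(n_cols):
            # Coarse-cell neighborhood, clipped at the grid edge (assumption).
            rows = range(max(0, ci - half), min(n_rows, ci + half + 1))
            cols = range(max(0, cj - half), min(n_cols, cj + half + 1))
            coarse = [t_lr[r][c] for r in rows for c in cols]
            fine = [lst_hr[i][j]
                    for r in rows for i in range(r * factor, (r + 1) * factor)
                    for c in cols for j in range(c * factor, (c + 1) * factor)]
            t_hr_mean, sigma_hr = fmean(fine), pstdev(fine)
            t_lr_mean, sigma_lr = fmean(coarse), pstdev(coarse)
            for i in range(ci * factor, (ci + 1) * factor):
                for j in range(cj * factor, (cj + 1) * factor):
                    # Eq. (1): standardized LST departure (0 where LST is flat).
                    z = (lst_hr[i][j] - t_hr_mean) / sigma_hr if sigma_hr > 0 else 0.0
                    # Eq. (2): rescale the departure to air-temperature units.
                    out[i][j] = t_lr_mean + z * sigma_lr
    return out


# Hypothetical toy grids: two coarse cells, each covering 2 x 2 fine pixels.
toy_lst = [[30.0, 30.0, 34.0, 34.0],
           [30.0, 30.0, 34.0, 34.0]]
toy_air = [[20.0, 22.0]]
toy_out = downscale(toy_lst, toy_air, factor=2, kernel=3)
```

In the toy case the LST pattern spans two standard deviations and the coarse air temperatures span the same, so the fine pixels recover 20 and 22; the point of the normalization is that a 4-degree LST contrast maps onto a 2-degree air temperature contrast.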

Statistical Analysis
In order to assess the performance of the downscaling algorithm, we intersected NYCCAS monitor locations with the native NLDAS grid (∼12 km) and downscaled (1 km) NLDAS grid (see Fig. 1 for graphical explanation) and matched the minimum temperatures by day for all days present in both datasets during the months of May to October. We then averaged the values by year and NLDAS grid cell (see Table 1 for sample sizes). We characterized the relationship between native and downscaled NLDAS data and NYCCAS monitor data using the following metrics: coefficient of determination (R²) from a linear model with the spatial distribution of the NYCCAS minimum temperatures as the dependent variable and the downscaled NLDAS minimum temperatures as the explanatory variable, root-mean-square error (RMSE) for information on bias and precision, mean absolute error (MAE) to describe the accuracy of the modeled data, and the slope of the regression equation to determine any bias in the linear relationship. Initial comparisons included all 5 years (2009 to 2013); however, the analyses presented here focused on years with the most spatial coverage in the NYCCAS program, 2009 and 2010, a cooler and a typical summer, respectively. We tested for spatial autocorrelation in the residuals from this model with Moran's I, computed using the R package ape. 19 In order to account for spatial autocorrelation in the residuals, we included a thin-plate smooth function of the XY coordinates in a generalized additive model. We tested various degrees of freedom in the spatial smooth term and selected the minimum number necessary to remove spatial autocorrelation in the residuals. We explored the spatial pattern of residuals from the model without the smooth term by correlating them with indicators of the intensity of development and greenness (NDVI, density of interior built space). All analyses were conducted using statistical package R version 3.3.2 (R Core Team, Vienna).
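The four agreement metrics named above can be written out explicitly. The analysis in this study was run in R; the following is only a Python sketch of the standard definitions, with the monitored values regressed on the modeled values as described. The function name `agreement_metrics` and the sample temperatures are hypothetical.

```python
from math import sqrt
from statistics import fmean


def agreement_metrics(modeled, observed):
    """Slope, intercept, and R^2 of observed ~ modeled, plus RMSE and MAE.

    modeled  : downscaled minimum temperatures (explanatory variable)
    observed : monitored minimum temperatures (dependent variable)
    """
    mx, my = fmean(modeled), fmean(observed)
    sxx = sum((x - mx) ** 2 for x in modeled)
    syy = sum((y - my) ** 2 for y in observed)
    sxy = sum((x - mx) * (y - my) for x, y in zip(modeled, observed))
    slope = sxy / sxx
    intercept = my - slope * mx
    # For simple linear regression, R^2 equals the squared Pearson correlation.
    r2 = sxy ** 2 / (sxx * syy)
    diffs = [y - x for x, y in zip(modeled, observed)]
    rmse = sqrt(fmean([d * d for d in diffs]))
    mae = fmean([abs(d) for d in diffs])
    return {"slope": slope, "intercept": intercept, "r2": r2,
            "rmse": rmse, "mae": mae}


# Hypothetical site averages: monitors read exactly 1 deg C warmer than the model.
example_modeled = [18.0, 19.0, 20.0, 21.0]
example_observed = [19.0, 20.0, 21.0, 22.0]
example_fit = agreement_metrics(example_modeled, example_observed)
```

The toy data show why the metrics are reported together: a constant 1°C offset leaves R² and the slope at 1 while RMSE and MAE both equal 1°C, so correlation alone would hide the bias.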

12 km NLDAS-NYCCAS Minimum Temperature Comparisons
The results from the comparison of the native NLDAS data with average NYCCAS temperature were based on only the five cells whose centroids fall within NYC boundaries (Table 1 and Fig. 2). These results provided context for the 1-km outcomes, demonstrating that the temperature variability found within NYC boundaries was not well described at the ∼12 km resolution.
Fig. 2 Example of the native and downscaled NLDAS minimum temperatures for July 7, 2010; the high reported at JFK weather station on this day was 38°C, and the low was 25°C.
Examination of the residuals did not reveal violations of linear model assumptions; however, we did see a significant spatial pattern in the residuals (Moran's I p-value < 0.000001). The xy smooth function with seven degrees of freedom was sufficient to control spatial autocorrelation in the residuals, which increased the R² from 0.85 to 0.9, decreased the error by over 20% while changing the slope by less than 3% and the intercept by on average 15% (Table 1).
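For readers unfamiliar with the Moran's I statistic used on the residuals, a minimal sketch of the statistic itself follows. It assumes inverse-distance weights between sites, which is a common choice but is not stated in the text, and it does not compute the p-value that the R package reports. The function name `morans_i` and the sample points are hypothetical.

```python
from math import hypot
from statistics import fmean


def morans_i(coords, values):
    """Moran's I with inverse-distance weights (w_ij = 1 / d_ij, w_ii = 0).

    coords : list of (x, y) site locations, no two sites coincident
    values : residual at each site, same order as coords
    """
    n = len(values)
    mean = fmean(values)
    dev = [v - mean for v in values]
    cross = 0.0   # sum over i != j of w_ij * dev_i * dev_j
    w_sum = 0.0   # sum of all weights
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            dist = hypot(coords[i][0] - coords[j][0], coords[i][1] - coords[j][1])
            w = 1.0 / dist
            cross += w * dev[i] * dev[j]
            w_sum += w
    return (n / w_sum) * cross / sum(e * e for e in dev)


# Hypothetical residuals at four evenly spaced sites on a line.
line_sites = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0)]
alternating = morans_i(line_sites, [1.0, -1.0, 1.0, -1.0])
clustered = morans_i(line_sites, [1.0, 1.0, -1.0, -1.0])
```

Values are judged against the expectation under no spatial structure, −1/(n − 1), rather than against zero; in this four-site example the clustered residuals score higher than the alternating ones, as expected.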
NYCCAS-NLDAS temperature differences are mapped in Figs. 4(a) and 4(b) (see Fig. 5 for histograms). A comparison of these two plots shows the improvement that the 5 × 5 kernel makes in the downscaling, with the average difference between NYCCAS and NLDAS decreasing from 0.93°C with a 3 × 3 kernel (v1) to 0.64°C with a 5 × 5 kernel (v2) for 2009. In 2010, while the average differences are higher, the 5 × 5 kernel improves the performance of the downscaled data to the same extent as in 2009 [2.29°C for v1 and 1.98°C for v2, Figs. 4(c) and 4(d)]. Orange points in Fig. 4(c), representing a 4°C to 5°C difference between NYCCAS and downscaled NLDAS, are not present in Fig. 4(d). Figures 4(a) and 4(b) also show greater divergence between the datasets in the higher density areas of Midtown Manhattan and Brooklyn. NYCCAS minimum temperatures tend to be higher than downscaled estimates in most of NYC except Staten Island, the least built-up borough, as well as in less densely urbanized parts of the other boroughs. We found the spatial pattern in the residuals from the model without the spatial smooth to be correlated most strongly with NDVI at a buffer size of 300 m (r = −0.65 for 2010 v2), with lower NDVI values indicating more urbanization and less green cover corresponding to areas where the downscaled dataset underestimated NYCCAS temperatures the most [Fig. 6(b)], while temperatures in greener areas were overestimated by the downscaled NLDAS data.
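Selecting the buffer size at which residuals correlate most strongly with NDVI amounts to computing a Pearson correlation per buffer and keeping the one with the largest magnitude. The sketch below assumes that selection rule; the function names `pearson_r` and `strongest_buffer` and all sample numbers are hypothetical and are not values from this study.

```python
from math import sqrt
from statistics import fmean


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = fmean(xs), fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def strongest_buffer(residuals, ndvi_by_buffer):
    """Return (buffer radius, r) for the buffer with the largest |r|.

    residuals      : model residual at each site
    ndvi_by_buffer : {radius in m: [mean NDVI at each site, same order]}
    """
    scored = {radius: pearson_r(ndvi, residuals)
              for radius, ndvi in ndvi_by_buffer.items()}
    best = max(scored, key=lambda radius: abs(scored[radius]))
    return best, scored[best]


# Hypothetical residuals (observed minus modeled) and NDVI means at four sites.
toy_residuals = [1.0, 0.5, 0.0, -0.5]
toy_ndvi = {
    100: [0.1, 0.3, 0.2, 0.4],
    300: [0.1, 0.2, 0.3, 0.4],
}
best_radius, best_r = strongest_buffer(toy_residuals, toy_ndvi)
```

A negative r here has the same reading as in the text: sites with less vegetation have larger positive residuals, meaning the modeled temperature falls further below the monitored one.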

Overall Results
This study takes advantage of a unique high-density monitoring network in NYC to ground-truth remotely sensed and modeled temperature estimates, specifically in terms of the spatial variability in temperature within a complex urban and coastal environment. The native NLDAS (∼12 km) resolution performs well in regions that are flat and landlocked with comparatively low population density, such as the Southern Great Plains, as demonstrated in the many studies that use Oklahoma Mesonet data. 20,21 NYC has, in common with many other cities, a high population density and complex coastal features, which are among the contributors to surface temperature heterogeneity that are not well captured at a ∼12-km scale. However, the spatial variability in warm season daily minimum temperatures in NYC is well described by downscaling the native NLDAS air temperature estimates to a 1-km product using an algorithm that employs a 5 × 5 kernel neighborhood to impose the spatial temperature pattern from MODIS LST data. While the datasets were highly correlated (R² = 0.85), NLDAS downscaled data consistently underestimated the warmest minimum temperatures measured by NYCCAS. Using the relationship of the downscaled NLDAS data to ground station monitoring that we have determined for NYC, other cities can make use of the regression parameters from our adjusted model and apply them to the downscaled data now available for the contiguous United States. The US Centers for Disease Control and Prevention (CDC) recommends that health departments use data at the finest geographic resolution available when assessing vulnerability to climate change, 22 and many local health departments express the desire for finer resolution exposure data.
Similar measures of R², MAE, and RMSE have been used in prior studies that compare remote sensing sources with monitored air temperature. [23][24][25] A good correspondence (i.e., a high R²) between satellite-based and ground-measured data is possible only when the observations contain sufficient variation in measured values, which is more often typical of country-scale studies. The very small spatial extent of cities makes a good fit less likely; Ho et al. 25 found that the spatial pattern of Landsat-derived daytime surface temperature correlated poorly (R² = 0.39) with maximum air temperature measured at 39 weather stations for 6 hot summer days in the greater Vancouver area. In Birmingham, United Kingdom, nighttime MODIS-based surface temperature averaged over June to August was moderately correlated (R² = 0.6) with minimum air temperatures measured at 107 locations. 23 In comparison, we find good correspondence in the spatial patterns between the temperature measures in our study.
Other data sets exist that provide high-resolution daily meteorological variables. For our specific application, the primary shortcoming of some of these, such as DayMet 10 and PRISM 11 (parameter-elevation relationships on independent slopes model), is that their temperatures are based on station data alone, with no remotely sensed inputs. From the station data, DayMet and PRISM interpolate to a 1-km or 800-m grid through a sophisticated spatial algorithm, which also adjusts temperatures based on surface elevation in the case of DayMet, and surface elevation and other physiographic factors such as coastal proximity, topographic facet orientation, and vertical atmospheric layer in the case of PRISM. However, ground stations are rarely available at anything close to 1 km or 800 m spacing. Thus, even in urban areas where surface observations are relatively dense, the estimated temperatures rely heavily on spatial interpolation. In more rural areas, ground observations are very sparse and thus the DayMet temperatures are even more heavily interpolated. In our approach, use of remote sensing data at the desired spatial resolution of the resulting data set allows us to capture the thermal details of, for example, an urban environment. Even in rural areas, temperature variations that occur due to agricultural patterns, soil moisture variations, and riparian areas are not well captured by ground observations and thus are mostly absent in the DayMet and PRISM temperature data sets. These features are observed by MODIS LST and are therefore represented in our downscaled temperature data. Admittedly, the MODIS data are not of air temperature, but they at least provide the fine-scale structure of temperature, and we argue that our downscaling model provides a bridge between LST and air temperature.
In NYC, there is variability in built density as measured by NDVI, and the fact that we found monitored values to exceed remotely sensed modeled estimates in areas of high built density is consistent with other studies. Bechtel compared temporal trends in air temperature with remotely sensed LST at six ground-station sites in Hamburg, Germany, which ranged from suburban neighborhoods and urban parks to very compact buildings and highly sealed areas; the remotely sensed data underestimated air temperature in sites with extensive impervious surface. 24 This inverse relationship of modeled versus observed temperature differences and NDVI could have several explanations. 26,27 MODIS surface temperature used in the downscaling process tends to amplify temperature extremes in both directions, especially over built or sparsely vegetated surfaces. 9,28 Satellite-derived surface temperature spatial patterns have been found to diverge from the spatial pattern of air temperature in urban settings, hypothesized to be due to the advection of heat produced in the city center. 23 Perhaps the warm season synoptic flow in NYC is not as weak as is assumed in the downscaling model. The summertime prevailing winds from the southwest in NYC may push heat created in Midtown Manhattan and the heavily industrialized area of New Jersey into residential neighborhoods to the north and east, exacerbating air temperatures already impacted by high surface temperatures as seen in Birmingham, United Kingdom. 29 Weather patterns in NYC, as with many other coastal cities, include complex land-water boundaries with sharp changes in energy and moisture exchanges. These drive a host of meteorological effects, such as the pattern of land-sea breezes, which are distinct from inland areas and can vary in effect on the urban climate depending on the weather pattern. 26,27
While the combination of the NLDAS-based air temperatures and the MODIS surface temperatures did well in describing air temperatures in NYC, further steps could be taken to improve the fit such as using land-use regression modeling to combine the downscaled data with NDVI.
Evrendilek et al. 30 used satellite-based and mesoscale regression modeling of air temperature with other variables including NDVI to compare against 83 ground-based stations in Turkey. The best minimum air temperature model had an adjusted R² of 0.65 and an RMSE of 5.9°C, and it included NDVI in addition to the remotely sensed surface temperature. The downscaled NLDAS dataset is available for the continental United States, as is NDVI (USGS), which would allow other cities to produce 1-km resolution temperature surfaces for exposure assessment in health studies and evaluation of urban heat mitigation activities.

Limitations
The entire area of NYC is coastal and urban, and weather patterns evident here may not apply to inland locations. We do not have access to information regarding when monitors are in sunlight or shade, thus daytime NYCCAS temperatures are likely more subject to errors than recordings at night. Daytime comparisons (Fig. 7) should be interpreted with caution. While we limited this analysis to ground-truthing overnight minimum temperatures due to daytime temperature limitations, minimum temperatures are highly correlated with maximum temperatures and heat index in NYC and have similar association with warm-season natural cause mortality. 31 The frequency of extreme warm overnight temperatures is increasing more rapidly than extreme daytime temperatures in the United States (graph 32,33 ).

Summary
Our study has demonstrated the improvement of the downscaled NLDAS model in capturing the spatial variability of temperature across NYC neighborhoods over the native ∼12-km resolution.
Comparisons of warm-season (May to October) downscaled NLDAS modeled temperatures with NYCCAS ground station measurements showed strong agreement overall, with better agreement for v2, which uses a 5 × 5 kernel. NYCCAS measured minimum temperatures were warmer on average than the downscaled NLDAS temperatures, with the best agreement occurring for areas and years with cooler minimum temperatures. Improvements to the downscaling algorithm in v2 reduced these differences further for both normal and cooler summers and improved model fit in the more densely urbanized areas. This study demonstrates that the 1-km downscaled NLDAS dataset provides high accuracy temperature data at a temporal and spatial resolution that is desired by end-users in the climate and health community for research and policy development and evaluation purposes. 34 While the ∼12-km data are useful in places with less heterogeneous land-cover and lower population density, the higher resolution is needed for coastal and/or urban applications. In cities without a high resolution network of temperature monitors such as NYCCAS, this dataset could provide exposure estimates useful in the understanding of heat-related health effects, as heat-health vulnerability exhibits high rates of geographic variability. 35