Evaluation of remote sensing and modeled chlorophyll-a products of the Baltic Sea

Abstract. The Baltic Sea is an optically very complex study object for water color remote sensing because of its high quantity of colored dissolved organic matter, two optically distinct phytoplankton seasons, high variability in the concentrations of optically active substances, and low sun angles. Despite this, numerous remote sensing and modeled chlorophyll-a (Chl-a) products are publicly available for the Baltic Sea. Sixteen openly accessible Chl-a products were tested against 267 in situ Chl-a measurements carried out in Estonian marine waters from 2016 to 2021. All modeled products and about half of the remote sensing products failed to produce reliable results. The best-performing remote sensing Chl-a product was Case2/Regional CoastColour produced from Sentinel-3 Ocean and Land Color Imager (OLCI) reflectance, with R² = 0.55, root mean squared error (RMSE) = 4.5 mg m⁻³, and mean absolute percentage error (MAPE) = 74%. In addition, eight different band ratio algorithms were applied to Sentinel-3 OLCI and Sentinel-2 multispectral instrument data. The best band ratio algorithm was derived from top-of-atmosphere reflectance of Sentinel-3 data using the 665, 709, and 754 nm bands (R² = 0.67, RMSE = 3.9 mg m⁻³, and MAPE = 63%). Our results show good suitability of Sentinel-3 for Chl-a retrieval. However, the high uncertainties point to the need for further product development and validation.


Introduction
Chlorophyll-a (Chl-a) is the dominant pigment of phytoplankton, and its concentration is often used as a proxy for phytoplankton biomass, making Chl-a one of the most studied indicators of water quality. 1,2 The wide ecological importance of phytoplankton also creates an extensive need for accurate Chl-a data at different temporal and spatial scales. Chl-a is an optically active component, i.e., it affects water reflectance (color). Consequently, it can be estimated by remote sensing.
The European Space Agency opened a new era in water remote sensing by launching the twin satellites Sentinel-3A and -3B (Sentinel-3), carrying the Ocean and Land Color Imager (OLCI) sensor, in 2016 and 2018, respectively. Previous water color sensors, such as the medium resolution imaging spectrometer (MERIS) on the Environmental Satellite (ENVISAT), were one-off scientific missions, whereas the Sentinel program now guarantees the availability of data for decades to come, thus allowing monitoring of water quality. Sentinel-3 is specially designed for marine monitoring and has 21 well-placed spectral bands for that purpose, with medium spatial resolution (300-m pixel). It provides global coverage (at the equator) every two days. 3 At the latitude of the Baltic Sea, the two Sentinel-3 satellites provide two images per day.
Two identical multispectral instruments (MSIs) onboard Sentinel-2A and -2B (Sentinel-2) were launched in 2015 and 2017, respectively. Sentinel-2 was designed for land monitoring but has proved to be suitable for estimating water quality as well. 4 It has 13 spectral bands, which offer high resolution optical imagery at 10, 20, and 60 m spatial resolution, depending on the band. This mission provides global coverage every 5 days. 5 At the latitude of the Baltic Sea, Sentinel-2 images can be acquired every 2 to 3 days.
*Address all correspondence to Tuuli Soomets, tuuli.soomets@ut.ee
The Baltic Sea is the world's largest inland brackish water sea, where the combination of a large catchment area with a high rate of human activities and a small volume with limited exchange with the Atlantic Ocean makes it especially sensitive to eutrophication. 6 The Baltic Sea is also notorious for its cyanobacterial blooms, 7-10 which lead to many serious ecological problems. The concentration of Chl-a is spatially and temporally very variable in the Baltic Sea. Moreover, unlike most phytoplankton, cyanobacteria can move in the water column, and the vertical distribution of cyanobacteria has a significant impact on the remote sensing signal. 11 There are two biologically and optically distinct phytoplankton seasons separated by a relatively clear water period. 12 Diatoms dominate the spring blooms, and cyanobacteria dominate the summer, late summer, and early autumn blooms.
Estonian marine waters include different regions of the Baltic Sea: the Gulf of Finland, the Gulf of Riga, and the northern Baltic Proper. These areas are among the most eutrophic parts of the Baltic Sea. 13 Furthermore, high spatial and temporal variation in the optical properties of water (color and transparency) is characteristic of these areas, making them an extremely complex study object for ocean color remote sensing. In addition to great variations in Chl-a concentrations, a high amount of colored dissolved organic matter (CDOM) received from the catchment area of the Baltic Sea makes the water dark, lowering the water-leaving signal and requiring highly sensitive remote sensing sensors and very accurate atmospheric correction. 4 Besides Chl-a and CDOM, a highly variable concentration of total suspended matter (TSM) is also characteristic of coastal waters in the eastern and southern parts of the Baltic Sea. This is caused by shallow and sandy coastal areas where wind causes resuspension of sediments. Finally, the sun angle is low during most of the year in the Baltic Sea area, which lowers the water-leaving signal detectable by remote sensing sensors and increases sun and sky glint, which is noise when mapping water quality parameters such as Chl-a. More than 90% of the signal measured by satellites above water bodies originates from the atmosphere, not from the water. Therefore, even a small error in the atmospheric correction of dark-water imagery may be as large as the entire water-leaving signal.
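The last point can be made concrete with a back-of-the-envelope sketch. It assumes a simple additive model in which the signal at the sensor is the sum of an atmospheric part and a water-leaving part, and in which the atmospheric correction misestimates the atmospheric part by some relative error. The function name and the error values are illustrative assumptions; only the 90% atmospheric share comes from the text.

```python
def water_signal_relative_error(atmos_fraction: float, atmos_rel_error: float) -> float:
    """Relative error in the water-leaving signal caused by a relative error
    in the estimated atmospheric contribution (simple additive model)."""
    if not 0.0 <= atmos_fraction < 1.0:
        raise ValueError("atmos_fraction must be in [0, 1)")
    water_fraction = 1.0 - atmos_fraction
    # The absolute error (atmos_rel_error * atmos_fraction) is carried
    # entirely by the much smaller water-leaving part.
    return atmos_rel_error * atmos_fraction / water_fraction


# Hypothetical case: 90% atmospheric share, 5% error in the atmospheric estimate.
example = water_signal_relative_error(0.90, 0.05)
print(f"{example:.0%}")
```

Under these assumptions a 5% error in the atmospheric term becomes a 45% error in the water-leaving signal, and an error of about 11% (1/9) in the atmospheric term is as large as the whole water-leaving signal.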
Despite this, Chl-a has been estimated with remote sensing techniques in the Baltic Sea region numerous times before. 4,8,9,14-25 In general, these algorithms can provide high accuracy at certain times and in certain areas, but their performance is usually not adequate in other conditions. Ligi et al. 23 and Simis et al. 26 have shown that the optical differences between the two bloom seasons are so large that seasonal remote sensing algorithms may be needed to achieve higher accuracy in Chl-a mapping. Regardless of these issues, and even if the statistical error of the Chl-a products is as high as 100% to 200%, remote sensing has proven to be a valuable method for monitoring water quality because of its advantage in temporal and especially spatial coverage compared with in situ methods. 27,28 Numerous remote sensing and modeled Chl-a products for the Baltic Sea are publicly available and free to use. Their users belong to very different interest groups, from policymakers (the European Commission, HELCOM, local authorities, etc.) and researchers to the general public. Nevertheless, it is extremely challenging for end users to decide which product to choose. Therefore, the objective of this study was to assess the performance of the freely available remote sensing and modeled Chl-a products of the Baltic Sea using in situ observations from Estonian marine waters. Additionally, the performance of different well-known band ratio (BR) algorithms was tested.
in dark and cold for <10 h before filtering. Depending on the particle concentration in the water, 0.5 to 1 l was filtered through two parallel Whatman glass microfiber filters (GF/F, pore size 0.7 μm). Phytoplankton pigments were extracted from the filters with 96% ethanol at 20°C for 24 h, and optical density was measured with a PerkinElmer Lambda 35 UV/VIS spectrophotometer. The formula of Jeffrey and Humphrey 29 was then applied, and the mean of the two parallel filters was taken to calculate the Chl-a values.

Remote Sensing and Modeled Chlorophyll-a Products
Almost all (15 of the 16) of the currently available Chl-a products for the Baltic Sea used in this study are partly or fully based on data from the Copernicus program (the European Union's Earth Observation Programme) 30 collected with Sentinel-3 OLCI and with the MSI onboard Sentinel-2. Only the eco-hydrodynamic model SatBaltyk uses data from the moderate resolution imaging spectroradiometer (MODIS) and EcoSat. 31 Freely available remote sensing (13) and modeled (3) Chl-a products were gathered, and only same-day match-ups were used. Of the 13 remote sensing products, five were used at both high and low spatial resolution (HR and LR, respectively), so altogether 21 different Chl-a products were used in this study.
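The same-day match-up criterion can be sketched as follows. This is a minimal illustration, not the procedure used in the study: the function name, the record layout, and the use of UTC calendar dates are assumptions, and the station and scene identifiers are hypothetical.

```python
from datetime import datetime


def same_day_matchups(in_situ, overpasses):
    """Pair each in situ sample with the satellite scenes acquired on the
    same calendar day (timestamps are assumed to share one time zone)."""
    scenes_by_day = {}
    for scene_id, acquired in overpasses:
        scenes_by_day.setdefault(acquired.date(), []).append(scene_id)
    # Keep only samples that have at least one scene on their sampling day.
    return [
        (station, scenes_by_day[sampled.date()])
        for station, sampled in in_situ
        if sampled.date() in scenes_by_day
    ]


# Hypothetical records: (identifier, timestamp)
in_situ = [("A", datetime(2018, 5, 3, 9, 30)), ("B", datetime(2018, 5, 4, 11, 0))]
overpasses = [("scene1", datetime(2018, 5, 3, 8, 50)), ("scene2", datetime(2018, 5, 5, 9, 0))]
matchups = same_day_matchups(in_situ, overpasses)
```

Here only station A is retained, because station B has no scene on its sampling day.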
Different atmospheric correction processors [Case-2 Regional/CoastColor (C2RCC), Case2R/CoastColor-Extreme (C2X), and Case2R/CoastColor COMPLEX 32 and POLYMER 33] were used in the ESTHub Processing Platform 34 to derive some of the Chl-a products (Table 1: products 1 and 2, and 9-12). Some Chl-a products were produced by EUMETSAT or were based on reflectances produced by EUMETSAT (OLCI WFR Level 2) 44 (Table 1: products 3-6). Some Level 3 and Level 4 (modeled) Chl-a products were produced by the Copernicus Marine Service 30 (Table 1: products 7 and 8, 13, and 15 and 16). The difference between Level 3 and Level 4 is that in the Level 4 data the missing values of the daily remote sensing Chl-a estimates are optimally interpolated; therefore, we consider those modeled products (Table 1: products 14-16). One Chl-a product was downloaded from the SatBałtyk portal 31 (Table 1: product 14). A detailed list of all 21 Chl-a products, with their match-up selection criteria, is given in Table 1.
In addition to the products listed in Table 1, we tested different BRs (Table 2) on top-of-atmosphere (TOA) and bottom-of-atmosphere (BOA) reflectances from both Sentinel-3 OLCI and Sentinel-2 MSI data. The selected band ratios are based on the best band ratios of previous studies in similar regions. 4,23,45 For Sentinel-3 OLCI, the C2RCC atmospheric correction processor was used, and for Sentinel-2 MSI, both the POLYMER and C2X processors were used.
Although we tested all the BRs listed in Table 2, here we present the details only for the best result for each satellite sensor. The best BR for Sentinel-3 OLCI (BR8_S3) (1 × 1 match-up pixels at 300-m resolution) is from TOA reflectances (using the bands at 754, 709, and 665 nm). BR8_S3 Chl-a was derived using Eq. (1), where x is the TOA reflectance band ratio R754/R709 − R754/R665 (BR8, Table 2). Match-up pixels with any of the following flags raised were removed from the study: invalid, sun_glint_risk, cloud_risk, cloud, and cloud_shadow.
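The BR8 predictor and the flag screening described above can be sketched in a few lines. The sketch stops at the predictor x, because the regression that converts x into Chl-a (Eq. (1)) is not reproduced here. The function names and the representation of flags as a list of strings are assumptions made for illustration; the band wavelengths and flag names are those given in the text.

```python
import numpy as np

# Flags that disqualify a Sentinel-3 OLCI match-up pixel (as listed in the text).
EXCLUDED_FLAGS = ("invalid", "sun_glint_risk", "cloud_risk", "cloud", "cloud_shadow")


def br8(r665, r709, r754):
    """BR8 predictor x = R754/R709 - R754/R665 from TOA reflectances."""
    r665, r709, r754 = (np.asarray(a, dtype=float) for a in (r665, r709, r754))
    return r754 / r709 - r754 / r665


def is_valid(raised_flags):
    """True if none of the excluded flags is raised for the pixel."""
    return not any(flag in EXCLUDED_FLAGS for flag in raised_flags)


# Hypothetical TOA reflectances for one match-up pixel.
x = br8(r665=0.04, r709=0.05, r754=0.02)
```

Because the function operates on NumPy arrays, the same call works for a single pixel or for whole image bands.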
The best BR for Sentinel-2 MSI (BR3_S2) (3 × 3 mean of match-up pixels at 20-m resolution) is from POLYMER reflectances using the bands at 704 and 665 nm. BR3_S2 Chl-a was derived using Eq. (2), where x is the reflectance band ratio R704/R665 (BR3, Table 2). Only match-up pixels with no masking flags raised were considered valid. Where the match-up point lay in the overlap of two image tiles, the better-matching concentration was used.
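A 3 × 3 match-up window and the BR3 predictor can be sketched as below. Two choices here are assumptions rather than the documented procedure: masked pixels are represented as NaN and ignored in the window mean, and the ratio is taken between the window-mean reflectances rather than averaged per pixel. The function names are hypothetical, and the conversion of x to Chl-a (Eq. (2)) is not reproduced.

```python
import numpy as np


def window_mean(band, row, col, half_width=1):
    """Mean reflectance in the square window centred on (row, col);
    NaN (masked) pixels are ignored and image edges truncate the window."""
    window = band[max(row - half_width, 0): row + half_width + 1,
                  max(col - half_width, 0): col + half_width + 1]
    return float(np.nanmean(window))


def br3(r665, r704):
    """BR3 predictor x = R704/R665."""
    return r704 / r665


# Hypothetical 5 x 5 reflectance tiles with one masked pixel next to the station.
r665 = np.full((5, 5), 0.02)
r704 = np.full((5, 5), 0.03)
r665[1, 1] = np.nan
r704[1, 1] = np.nan
x = br3(window_mean(r665, 2, 2), window_mean(r704, 2, 2))
```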

Statistical Parameters
The in situ dataset is described by basic descriptors: mean, median, first and third quartiles (first and third Q), and standard deviation (StDev). The performance of the remote sensing products was evaluated using the determination coefficient (R²). R² describes how well the observed in situ values are predicted by the model, based on the proportion of the total variation of outcomes explained by the model; it is calculated using Eq. (3):
$$R^2 = 1 - \frac{\sum_{i=1}^{n} (\hat{y}_i - y_i)^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2}, \tag{3}$$
where ŷ is the predicted value, y is the observed value, ȳ is the mean of the observed y values, and n is the number of observations. Together with R², a p-value is presented, which shows the statistical significance by describing how likely it is that the data would have occurred by random chance.
To evaluate the errors of the remote sensing products, root mean squared error (RMSE) and bias were used (in mg m⁻³). RMSE is a frequently used measure of the differences between values observed in situ and predicted by a model; it is calculated using Eq. (4). Bias shows the systematic error, and it is calculated using Eq. (5):
$$\mathrm{RMSE} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (\hat{y}_i - y_i)^2}, \tag{4}$$
$$\mathrm{bias} = \frac{1}{n} \sum_{i=1}^{n} (\hat{y}_i - y_i), \tag{5}$$
where ŷ is the predicted value, y is the observed value, and n is the number of observations. In addition, the uncertainty in the remote sensing products was evaluated using the mean absolute percentage error (MAPE, in %). MAPE measures the percentage error of the estimated values in relation to the actual values; it is calculated using Eq. (6):
$$\mathrm{MAPE} = \frac{100}{n} \sum_{i=1}^{n} \left| \frac{\hat{y}_i - y_i}{y_i} \right|, \tag{6}$$
with ŷ, y, and n as defined above.
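The four statistics can be computed together, as in the minimal sketch below, which follows the definitions given in the text (ŷ predicted, y observed). Two points are assumptions: bias is taken as predicted minus observed, and R² is computed in its 1 − SS_res/SS_tot form, whereas an R² reported alongside a regression p-value may instead be the squared correlation of a fitted line. The function name and the sample values are hypothetical.

```python
import numpy as np


def evaluate(observed, predicted):
    """Return R2, RMSE, bias, and MAPE (%) for paired observed/predicted values."""
    y = np.asarray(observed, dtype=float)
    y_hat = np.asarray(predicted, dtype=float)
    residual = y_hat - y                      # predicted minus observed
    ss_res = np.sum(residual ** 2)            # unexplained variation
    ss_tot = np.sum((y - y.mean()) ** 2)      # total variation of observations
    return {
        "r2": float(1.0 - ss_res / ss_tot),
        "rmse": float(np.sqrt(np.mean(residual ** 2))),
        "bias": float(np.mean(residual)),
        "mape": float(100.0 * np.mean(np.abs(residual / y))),
    }


# Hypothetical Chl-a values in mg m^-3.
stats = evaluate(observed=[2.0, 4.0, 6.0, 8.0], predicted=[3.0, 4.0, 5.0, 10.0])
```

MAPE divides by the observed value, so it is undefined for zero observations and grows quickly at low concentrations, which is worth keeping in mind for a dataset whose median is close to 5 mg m⁻³.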

In Situ Database
We carried out in situ measurements in five years covering different parts of Estonian coastal and territorial waters (Fig. 1). More measurements were carried out in the western part of the Gulf of Finland and around the largest islands; coverage was sparser in the eastern part of the Gulf of Finland.
All the measurements were made from March to October, mostly from April to July, when most of the cloud-free days occurred (Table 3).
The in situ Chl-a dataset combines 267 unique measurements ranging from 0.4 to 45.9 mg m⁻³. Although the range is 45.5 mg m⁻³, the mean of the dataset is 6.9 mg m⁻³ and the median is 4.9 mg m⁻³, with most of the measured values below 8 mg m⁻³ (Fig. 2; Table 4). A Chl-a concentration exceeding the threshold of 5 mg m⁻³ is often considered a bloom in the Baltic Sea. As background information, for the same dataset the absorption coefficient of CDOM at a wavelength of 420 nm ranged from 0.42 to 12.84 m⁻¹ (mean 1.53 m⁻¹ and median 0.87 m⁻¹) and TSM from 0.60 to 29.40 mg m⁻³ (mean 8.45 mg m⁻³; median 8.06 mg m⁻³).

Chlorophyll-a Products
In this study, we used 16 freely available Chl-a products, including 3 modeled and 13 remote sensing products, that are available for the Baltic Sea. Five of the products are available in two different spatial resolutions (Table 1). The number of match-ups found ranged from 28 to 262 (Table 5), depending on the satellite data and quality flags used for each product (Table 1). The products based on Sentinel-3 data generally had more match-ups than those based on Sentinel-2 data because of the more frequent revisits.
Table 3 The temporal distribution of the in situ Chl-a; n denotes the number of measurements.
The R² of the Chl-a products ranged from <0.001 to 0.55, the latter given by the C2RCC_S3 product. This and maximum chlorophyll index_S3 (MCI_S3) are the two available Chl-a products with the highest R², but C2RCC_S3 has better (lower) MAPE and RMSE than MCI_S3 [Fig. 3(a)].
Generally, products based on Sentinel-3 data performed better than products based on Sentinel-2 or MODIS. The best product based on Sentinel-2 data was POLYMER_S2_HR [Fig. 3(a)]. The weakest performance was shown by C2RCC_S2_LR and the ocean Chl-a products [fluorescence line height_S3 (FLH_S3), OC4ME_S3]. These results were also not statistically significant. The uncertainties were high for all the products, with MAPE ranging from 66% to 456% and RMSE from 3.8 to 22.1 mg m⁻³ (Table 5).
The BR8 (Table 2) based on Sentinel-3 TOA reflectances outperformed all the other Chl-a products (BR8_S3, Table 5). The BR3 (Table 2) on Sentinel-2 POLYMER_S2_HR reflectance also had a high R², with the lowest MAPE (59%) and RMSE (3.6 mg m⁻³) of all (BR3_S2, Table 5) [Fig. 3(b)]. All the R² values from band ratio testing are shown in Table 6. We chose the atmospheric correction processors based on the results of the Chl-a products (C2RCC for Sentinel-3, and POLYMER and C2X for Sentinel-2).
Fig. 2 Box plot of the in situ Chl-a dataset (n = 267). Gray color shows Chl-a values from first Q to median, orange from median to third Q, and X marks the mean value.
Table 4 Descriptive statistics of the in situ Chl-a (mg m⁻³) dataset: t is the time period, n is the number of measurements, Q is the quartile, and StDev is the standard deviation of the dataset.
Table 5 Performance of the different Chl-a remote sensing products: n denotes the number of match-ups, R² is the determination coefficient, and p-value is the statistical significance of the linear regression. MAPE (%) and root mean square error (RMSE, mg m⁻³) show the uncertainties of the products. Bias is the systematic error. The products are in decreasing order of R².
Fig. 3 Comparison of the in situ and the best (a) available remote sensing Chl-a products; and (b) BR for Sentinel-2 (POLYMER_S2_HR and BR3_S2) and -3 (C2RCC_S3 and BR8_S3).

Discussion
The in situ measurements covered the seasonal variations of phytoplankton (spring bloom, summer minimum, and cyanobacteria bloom), giving a good sense of the state of the Baltic Sea (Fig. 1 and Table 3). The 267 unique in situ measurements showed variability with a range of 45.5 mg m⁻³, but concentrations of Chl-a remained mostly between 2 and 8 mg m⁻³ (first to third Q), with an average of 7 mg m⁻³ and a median of 5 mg m⁻³ (Fig. 2; Table 4).
Only five Chl-a products out of 21 had R² > 0.3, and only six products had RMSE < 5 mg m⁻³ (the median of the in situ database). The best-performing available Chl-a remote sensing product was C2RCC_S3 (R² = 0.55, RMSE = 4.5 mg m⁻³, MAPE = 74%, and n = 125). This product has also shown good performance in other parts of the Baltic Sea. Kyryliuk and Kratzer 24 tested it on the west coast of the Baltic Sea (Sweden) and found a strong relationship with in situ data (R² = 0.79 and n = 27). Kratzer and Plowey 25 found correlations similar to ours (R² = 0.56 and n = 59) in the northwestern part of the Baltic Sea.
The six products with the lowest RMSE were all remote sensing Chl-a products. Three of them were POLYMER Chl-a products (Table 5). While the POLYMER products showed some of the lowest RMSE values, the C2X products had the lowest MAPE (66% to 67%). Although the MCI_S3, ST_S3, and BaltAlg_S3 remote sensing products had a stronger relationship with the in situ data, they also had much higher uncertainties, so those products might not be reliable. In addition, we can claim without doubt that all model-based products (CEMS_model, ERGOM_model, and SatBaltyk_model) and almost half of the remote sensing products (C2RCC_S2_HR, C2RCC_S2_LR, FLH_S3, HR_OC_S3, OC4ME_S3, ONNS_S3_HR, and ONNS_S3_LR) failed to retrieve Chl-a in Estonian marine waters: R² < 0.1, MAPE > 90%, RMSE > 5 mg m⁻³, and the p-value mostly showed no statistical significance. In most cases, remote sensing products outperformed modeled Chl-a products (Table 5). Overall, the C2RCC_S3 and POLYMER_S2_HR products have reasonably high R² and lower errors than the other products (Fig. 3 and Table 5).
Of the eight band ratios tested, the Sentinel-3 TOA BR2, BR3, and BR8 and the BOA BR3 and BR8 outperformed all the available Chl-a products for the Baltic Sea (Table 6). The best was BR8 on TOA reflectances (R² = 0.67, n = 125, and MAPE = 63%). With Sentinel-2 data the BRs were not as successful: no BR outperformed the best available Chl-a product (C2RCC_S3). For Sentinel-2 data, the most successful were BR3 on POLYMER (R² = 0.50, n = 77, and MAPE = 59%) and BR8 on TOA reflectances (Table 6). POLYMER showed the best suitability as atmospheric correction in the Baltic Sea area for Sentinel-2 MSI data. The same conclusion was reached by Warren et al. 46 Overall, BR3 and BR8 were the most successful and seemed to work better with Sentinel-2 and -3 TOA data. That the best BR algorithm was based on TOA reflectances is not new; TOA band ratios have been tested before. 4,45 Moreover, Soomets et al. 45 also found BR8 and BR3 to be the most successful in inland waters with low Chl-a (∼5 mg m⁻³, which agrees well with the Chl-a median of the present study). Both BR3 and BR8 are essentially algorithms that use the depth of the Chl-a absorption feature, comparing the reflectance peak at 700 to 710 nm against the Chl-a absorption maximum (reflectance minimum). BR8 additionally normalizes it to a near-infrared band (754 nm) to minimize atmospheric and glint effects. The fact that empirical algorithms using distinct spectral features in the TOA data perform better than any other method on atmospherically corrected data suggests that atmospheric corrections still need improvement.
Table 6 The determination coefficients (R²) for each tested band ratio (BR; the formulae of the band ratios are in Table 2). The band ratios were tested on Sentinel-3 OLCI TOA and C2RCC reflectances (S3 TOA and S3 C2RCC, respectively) and on Sentinel-2 MSI TOA, Case-2Extreme, and POLYMER reflectances (S2 TOA, S2 C2X, and S2 POLYMER, respectively). The highest values for S3 and S2 are shown in bold; n denotes the number of match-ups.
The high uncertainties of the results are due to the optically complex nature of the Baltic Sea, which has high temporal and spatial variability of Chl-a and turbidity caused by the high loads of organic matter and nutrients from rivers, shipping, or other human activities. 10 In the cyanobacterial blooms that occur in the Baltic Sea every summer, Chl-a often varies by two to three orders of magnitude over a distance of a few meters. 14 This makes the development of remote sensing algorithms very difficult, as a water sample collected at one point may have nothing in common with the concentration obtained for a satellite pixel that is 300 × 300 m in size. 14 The same problem occurs in the validation of satellite products: in situ sampling has to be carried out at a very precise location in space and time, and even then it may not be useful if the water around the sampling station is very heterogeneous.
The best available Chl-a product and the tested BR algorithms showed high errors: an RMSE of 3.6 to 4.5 mg m⁻³ is too high for an in situ dataset with a median of 4.9 mg m⁻³, and this might influence the reliability of the derived Chl-a concentrations. Regardless of the high errors, the tested BR algorithms still show more promise for accurately deriving Chl-a in the Baltic Sea than any of the evaluated products. In particular, the model (Level 4) results were very poor and are not useful for either smaller- or larger-scale analysis. The same applies to all the Copernicus Marine Service Level 3 products except BaltAlg_S3. This product is competitive in its results, although it is still not preferable (usable in larger analyses) because of the lack of valid data: there were only 28 match-ups, noticeably fewer than for any other product. Because there is still room for improvement in deriving Chl-a in the Baltic Sea, it is necessary to continue validating the remote sensing products.

Conclusions
Altogether 21 different Chl-a products of the Baltic Sea, both modeled and remote sensing-based, were tested against in situ data collected in Estonian marine waters during 2016-2021. In addition, eight different empirical band ratio algorithms were tested on TOA and BOA reflectances of Sentinel-3 and Sentinel-2. The best-performing Chl-a product was C2RCC of Sentinel-3 OLCI (R² = 0.55, RMSE = 4.5 mg m⁻³, MAPE = 74%, and n = 125). All modeled Chl-a products and about half of the remote sensing Chl-a products failed to produce reasonable results. The empirical BR with Sentinel-3 data was in fact more successful than any of the available remote sensing or modeled products: R² = 0.67, RMSE = 3.9 mg m⁻³, MAPE = 63%, and n = 125. Our results showed better performance with TOA reflectance, which suggests that there is still room for improvement in atmospheric correction methods. Also, the high uncertainties of the retrieved Chl-a concentrations (from products or band ratio algorithms), caused by the complex waters of the Baltic Sea, show that further validation and improvement of Chl-a retrieval methods are still needed.
Kaire Toming is an associate professor at the University of Tartu. She received her MS degree in hydrology from the University of Tartu in 2006, and her PhD in hydrobiology from the Estonian University of Life Sciences in 2013. She is the author of 23 journal papers. Her current research interests include inland and coastal optical remote sensing, water quality, aquatic ecology, and limnology.
Birgot Paavel is an associate professor at the University of Tartu. She received her MS and PhD degrees from the University of Tartu in environmental physics and hydrobiology in 2004 and 2008, respectively. She is the author of 32 journal papers. Her current research interests include inland and coastal optical remote sensing, environmental technology, and physical oceanography.
Tiit Kutser is a professor at the University of Tartu. He received his in environmental physics from the University of Tartu in 1997. He is the author of 127 journal papers. He has been opponent of more than 10 PhD theses in Estonia, Australia, Canada, Finland, Spain, Sweden, UK, Switzerland. His current research interests include inland and coastal optical remote sensing, water quality, geophysics, and physical oceanography.