Predictive performance of NMME seasonal forecasts of global precipitation: A spatial-temporal perspective

Predictive performance of NMME seasonal forecasts of global precipitation: A spatial-temporal perspective

Accepted Manuscript Research papers Predictive performance of NMME seasonal forecasts of global precipitation: a spatial-temporal perspective Tongtieg...

1MB Sizes 0 Downloads 10 Views

Accepted Manuscript Research papers Predictive performance of NMME seasonal forecasts of global precipitation: a spatial-temporal perspective Tongtiegang Zhao, Yongyong Zhang, iaohong Chen PII: DOI: Reference:

S0022-1694(19)30005-8 https://doi.org/10.1016/j.jhydrol.2018.12.036 HYDROL 23357

To appear in:

Journal of Hydrology

Received Date: Revised Date: Accepted Date:

24 October 2018 14 December 2018 17 December 2018

Please cite this article as: Zhao, T., Zhang, Y., Chen, i., Predictive performance of NMME seasonal forecasts of global precipitation: a spatial-temporal perspective, Journal of Hydrology (2019), doi: https://doi.org/10.1016/ j.jhydrol.2018.12.036

This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.

Predictive performance of NMME seasonal forecasts of global precipitation: a spatial-temporal perspective

Tongtiegang Zhao 1, Yongyong Zhang 2, and Xiaohong Chen 1

1. Center for Water Resources and Environment, Sun Yat-Sen University, Guangzhou, China 2. Key Laboratory of Water Cycle and Related Land Surface Processes, Institute of Geographic Sciences and Natural Resources Research, Chinese Academy of Sciences, Beijing, China Corresponding author: Tongtiegang Zhao ([email protected])

Revised manuscript submitted to Journal of Hydrology

1

Abstract: Global climate models (GCMs) produce informative seasonal forecasts of global precipitation months ahead of the occurrence for hydrological forecasting. Meanwhile, the skill of GCM forecasts varies by location and initialization time. In this paper, we investigate the anomaly correlation, which indicates the correspondence between forecasts and observations, for 10 sets of global precipitation forecasts in the North American Multi-Model Ensemble (NMME) project. We propose to use principal component analysis to characterize the variation of anomaly correlation. We identify the existence of spatial and temporal patterns at the global scale. The spatial pattern reveals that high (low) anomaly correlation at one initialization time coincides with high (low) anomaly correlation at other initialization times. In other words, for a grid cell, the anomaly correlation at different initialization times tends to be similarly high, or low. It is observed that some of the regions where grid cells are with overall high anomaly correlation tend to exhibit tele-connections with global climate drivers. On the other hand, the temporal pattern suggests that the anomaly correlation tends to improve with initialization time. This pattern is attributable to data assimilation that bases forecasts at a later initialization time on more global observations and simulations. Generally, the two patterns are effective and explain 50% to 70% of the variation of anomaly correlation for the 10 sets of NMME forecasts. The projections of anomaly correlation vectors to the two patterns help illustrate where and when the NMME precipitation forecasts are skillful. Keywords: Global climate model; Seasonal forecasts; Precipitation; Anomaly correlation; North American Multi-Model Ensemble.

2

1. Introduction Global climate models (GCMs) that formulate land-ocean-atmosphere processes to predict climate around the world are run by major climate centers to provide operational forecasts (Merryfield et al., 2013; Saha et al., 2014; Jia et al., 2015). The National Centers of Environmental Prediction (NCEP) of the United States uses the Climate Forecast System version 2 (CFSv2) (Saha et al., 2014). The Geophysical Fluid Dynamics Laboratory (GFDL) develops climate model version 2.1 and forecast-oriented climate model version 2.5 (Jia et al., 2015). The Canadian Meteorological Centre (CMC) employs coupled climate model versions 3 and 4 (Merryfield et al., 2013). In the North American Multi-Model Ensemble (NMME) project, there are in total 10 active GCMs that produce global climate forecasts (Kirtman et al., 2014). In the meantime, informative GCM forecasts have been increasingly integrated into hydrological forecasting systems and are shown to considerably improve forecast skill and lead time (e.g., Schepen and Wang, 2014; Bennett et al., 2017; Lehner et al., 2017). GCM forecasts are updated dynamically by regular initialization of GCMs based on global observational and simulation datasets; timely updated forecasts are expected to become better at predicting future climate (Bauer et al., 2015). They are found to be indicative of global climate drivers, such as El Niño Southern Oscillation (Tippet et al., 2017), North Atlantic Oscillation (Dunstone et al., 2016), and Madden Julian Oscillation (Saha et al., 2014). Also, they are observed to capture precipitation and temperature in the United States (Slater et al., 2016), Europe (Slater et al., 2017), Africa (Sheffield et al., 2014; Shukla et al., 2016), China (Ma et al., 2016), Northern Hemisphere of the Earth (Kim et al., 2012), and around the world (Yuan et al., 2011). Despite the great potential of GCM forecasts, their predictive performance is elusive as it varies spatially and temporally (Yuan et al., 2011; Kim et al., 2012; Luo et al., 2013; Saha et al., 2014; Zhao et al., 2018). For gridded GCM precipitation forecasts, the variation means that the performance changes by grid cell and that for one grid cell, the performance differs by initialization time (Zhao et al., 2017a, 2017b). Spatial averaging at catchment and region levels is used to deal with the variation (e.g., Ma et al., 2016; McEvoy et al., 2016; Slater et al., 2016; Schepen et al., 2018) and to facilitate more efficient use of GCM forecasts, lots of efforts are devoted to empirically identifying in which catchment/region and at what initialization time GCM forecasts perform well (e.g., Merryfield et al., 2013; Yuan and Wood, 2013; Saha et al., 2014; Jia et al., 2015; Slater et al., 2017). On the other hand, it is still largely unknown whether general patterns exist in the spatially and temporally varying 3

predictive performance. For regions with skillful GCM forecasts, do forecasts improve steadily with initialization time? For regions with poor forecasts, is forecast skill always unsatisfactory? These issues have useful implications for hydrological forecasting and are yet to be addressed. In this paper, we aim to identify patterns to characterize the variation of anomaly correlation, a widely used measure of forecast skill (e.g., Yuan et al., 2011; Luo et al., 2013; Barnston et al., 2016; Ma et al., 2016; Slater et al., 2017), of GCM seasonal forecasts of global precipitation. The objectives are threefold: (1) to show patterns of the spatially and temporally varying anomaly correlation at the global scale. We retrieve the retrospective forecasts, i.e., hindcasts, of 10 GCMs in NMME and conduct principal component analysis (PCA); (2) to quantify the variation explained by the patterns. The quantification is achieved by formulating anomaly correlation at different initialization times as a vector and projecting vectors across grid cells onto the patterns; and (3) to evaluate the relative effectiveness of the patterns for the 10 sets of forecasts. As will be demonstrated later in this paper, the proposed patterns are effective and explain 50% to 70% of the variation of anomaly correlation globally.

2. Data The NMME retrospective forecasts are obtained from the data library of the International Research Institute (IRI) at Columbia University (https://iridl.ldeo.columbia.edu/SOURCES/.Models/.NMME/). The GCMs were originally run at different spatial resolutions and time steps. In NMME, their forecasts have been regridded and aggregated to form a consistent gridded dataset. Specifically, the spatial resolution of the NMME forecasts is 1 degree by 1 degree, and the temporal resolution is 1 month. The data preprocessing facilitates efficient inter-comparison of results for different sets of forecasts (e.g., Kirtman et al., 2014; Ma et al., 2016; Shukla et al., 2016; Slater et al., 2016, 2017). Basic information, such as the number of ensemble members and the range of lead time, on the NMME forecasts are summarized in Table 1.

Table 1. Basic information on the 10 sets of NMME forecasts investigated in this paper Climate center

Number of ensemble

Model 4

Lead time (month)

members Canadian coupled model version 3 (CanCM3)

10

0–11

Canadian coupled model version 4 (CanCM4)

10

0–11

Community climate system model version 3 (CCSM3)

6

0–11

Community climate system model version 4 (CCSM4)

10

0–11

Climate model version 2.1 (CM2p1)

10

0–11

Climate model version 2.1 (CM2p1-aer04)

10

0–11

Climate model version 2.5 with forecast-oriented low ocean resolution (CM2p5FLOR-A06)

12

0–11

Climate model version 2.5 with forecast-oriented low ocean resolution (CM2p5FLOR-B01)

12

0–11

National Center for Atmospheric Research (NCAR)

Community earth system model version 1 (CESM1)

10

0–11

National Centers for Environmental Prediction (NCEP)

Climate forecast system version 2 (CFSv2)

24

0–9

Canadian Meteorological Center (CMC)

Center for Ocean-Land-Atmosphere Studies, Rosenstiel School of Marine and Atmospheric Science (COLARSMAS)

Geophysical Fluid Dynamics Laboratory (GFDL)

The Climate Prediction Center’s merged analysis of precipitation (CPC–CMAP) corresponding to the forecasts is also obtained from the IRI’s data library (https://iridl.ldeo.columbia.edu/SOURCES/.Models/.NMME/.CPC-CMAP/). The CPCCMAP dataset, which is a monthly dataset on a 2.5° latitude-longitude grid, was produced through the integration of worldwide-gauge observations, satellite estimates, and numerical model outputs (Xie and Arkin, 1997; Xie et al., 2007). In NMME, this dataset has been regridded to match its resolution with that of the forecasts (Kirtman et al., 2014). In this paper, the anomaly correlation between CPC-CMAP observations and NMME forecasts in the hindcast period from 1982 to 2010 is calculated. 5

3. Methods 3.1 Formulation of anomaly correlation Each of the 10 sets of NMME forecasts has five dimensions, namely, latitude, longitude, initialization time, lead time, and ensemble member. The CPC–CMAP observations have three dimensions, which are latitude, longitude, and time. In this paper, the focus is on the NMME seasonal forecasts initialized in January, February, March, April, May, and June. Specifically, the forecasts targeting at the total precipitation in the three months of June, July, and August are investigated. In the Northern Hemisphere, the three targeted months generally comprise the summer, when floods and droughts are of concern (e.g., Yuan and Wood, 2013; Emerton et al., 2017). In the Southern Hemisphere, they are winter months when floods and droughts occur under the Mediterranean climate (Bennett et al., 2017). The anomaly correlation between raw ensemble means and observations is calculated. That is, the anomaly correlation is obtained for forecasts initialized from January to June. For each grid cell, the six anomaly correlation coefficients are formulated as a vector Rc  [rc ,1

rc ,2

rc ,3

rc ,5

rc ,4

rc ,6 ] .

(1)

It is worthwhile to note the dimensions considered in Eq. (1). First of all, the subscript c is the index of grid cell. It accounts for the first two dimensions, i.e., latitude and longitude, of GCM forecasts. The value of c ranges from 1 to 16 643 because 16 643 grid cells of land surface locate between 60° S and 80° N. Secondly, the script m in rc ,m represents the third dimension of initialization time, i.e., m= 1, 2, 3, 4, 5, 6. As the targeted months are fixed, m also indicates the fourth dimension of lead time. That is, the lead time is reduced from 5 months to 0 month as m progresses from January to June. Finally, the fifth dimension of the ensemble member is eliminated because the anomaly correlation is calculated for the ensemble mean. In Eq. (1), the vector Rc formulates the anomaly correlation at 6 initialization times at grid cell c. The anomaly correlation vectors are obtained for each set of NMME forecasts. The 10 sets of anomaly correlation vectors lay the basis for the investigation of spatial and temporal patterns. Following the mathematical formulations of PCA (Hotelling, 1933; Demsar et al., 2013), the variation of Rc is calculated as the sum of the squared anomaly correlation 6

(2)

6

varc  |Rc |2  Rc RcT   rc2,m , m 1

The variation across the globe is calculated based on Eq. (2) 16643

16643

vartotal =  varc =  |Rc |2  c 1

c 1

16643 6

 r c 1 m 1

2 c ,m

.

(3)

In this way, the variation of anomaly correlation with initialization time across the globe is quantified.

3.2 Variation of anomaly correlation PCA is a data-driven approach that has been widely used to analyze linear dependency relationships among a group of variables (Hotelling, 1933; Demsar et al., 2013). In this paper, PCA is applied to anomaly correlation at different initialization times. The core of PCA is a set of eigenvectors Vn (n= 1, 2, 3, 4, 5, 6), which are unit vectors orthogonal to one another. That is,

1 (n1  n2 ) Vn1VnT2   0 (n1  n2 )

(4)

The anomaly correlation vectors Rc (c= 1, 2, …, 16 643) are projected onto the space of the eigenvectors using the mathematical operation of inner product 6

Rc   vc ,nVn n 1

v

c ,n

 RcVnT  ,

(5)

where vc ,n is the projection of Rc onto Vn . The variation at grid cell c is re-calculated with Eq. (5), i.e., 6

6

varc  |Rc |    vc ,n1Vn1VnT2 vc ,n2 . 2

(6)

n1 1 n2 1

The above Eq. (6) is simplified using Eq. (4): 6

varc  |Rc |2   vc2,n . n 1

7

(7)

As is illustrated, PCA retains the variation of the anomaly correlation at grid cell c. Specifically, 6

6

m 1

n 1

(8)

varc   rc2,m   vc2,n .

The above equation also indicates that PCA preserves the variation of anomaly correlation across the globe vartotal 

where

16643

v c 1

2 c ,n

16643

 |Rc |2  c 1

16643 6

6 16643

  vc2,n    vc2,n , c 1 n 1

(9)

n 1 c 1

is the variation explained by the n-th eigenvector Vn .

Based on linear algebraic operations, PCA makes V1 the most effective and explain the most variation of anomaly correlation, V2 the second most effective and explain the second most variation, and so on: 16643

16643

16643

16643

16643

16643

c 1

c 1

c 1

c 1

c 1

c 1

 vc2,1 

 vc2,2 

 vc2,3 

 vc2,4 

 vc2,5 

 vc2,6 .

(10)

Therefore, the effectiveness of Vn is measured by the percentage of variation explained by this eigenvector 16643

Vn %  100   vc2,n vartotal

(n  1, 2,3, 4,5, 6) .

(11)

c 1

The eigenvectors determined by PCA are wholly driven by the anomaly correlation vectors. From this perspective, PCA facilitates an objective examination of the variation of anomaly correlation. According to the formulations, the first and second eigenvectors, V1 and V2, are respectively the most and second most effective in characterizing how [rc ,1

rc ,2

rc ,3

rc ,4

rc ,5

rc ,6 ] correlate with one another across the 16 643 grid cells. In the

analysis, the attention is paid to these two eigenvectors. When deriving V1 and V2, their sampling uncertainty is also assessed. It is through bootstrapping. Specifically, the 16 643 anomaly correlation vectors are sampled randomly with replacement for 1000 times. The first and second eigenvectors in the 1000 rounds of sampling are saved. The 1000 sets of eigenvectors tell how the eigenvectors are varying 8

subject to the sampling variability of anomaly correlation vectors. By pooling the eigenvectors, the 10th, 25th, 75th, and 90th percentiles are obtained respectively for V1 and V2. Then, the 50% and 80% uncertainty intervals of the eigenvectors are determined. The uncertainty intervals show whether the relationships among [rc ,1 rc ,2

rc ,3

rc ,4

rc ,5

rc ,6 ] ,

as are characterized by V1 and V2, are robust.

3.3 Spatial and temporal patterns This paper focuses on the variation of anomaly correlation between NMME forecasts and observations. In an investigation of NMME forecasts of summer precipitation in China, Zhao et al. (2018) observed two patterns, spatial and temporal, of the anomaly correlation. This paper examines the two patterns at a global scale and verifies them objectively using the eigenvectors of PCA. The spatial pattern suggests that the anomaly correlation at different initialization times tends to be at the same level. It is formulated as follows, P1 

[1 1 1 1 1 1] 12  12  12  12  12  12

(12)

.

From a mathematical perspective, this pattern means that for certain grid cells, high (low) anomaly correlation at one initialization time coincides with high (low) anomaly correlation at other initialization times. The temporal pattern suggests that the anomaly correlation improves linearly with initialization time. The formulation is shown below, P2 

[5 3 1 1 3 5] (5) 2  (3) 2  (1) 2  12  32  52

.

(13)

The indication of the temporal pattern is that the anomaly correlation at a later initialization time tends to be higher. The two patterns are unit vectors orthogonal to each other. This feature follows the definition of eigenvectors in PCA. P1 and P2 facilitate spatial and temporal projections, respectively, to investigate the properties of anomaly correlation vectors. The projections of the anomaly correlation vector onto the two patterns are obtained using inner product (Hotelling, 1933; Demsar et al., 2013) 9

 pc ,1  Rc P1T ,  T  pc ,2  Rc P2

(14)

where pc,i (i= 1, 2) is the projection of Rc onto Pi . In the meantime, the projections allow the quantification of the effectiveness of the two patterns for all grid cells across the globe. The effectiveness is measured by the percentage of variation explained by the patterns 16643  P   pc2,1 vartotal % 100   1  c 1 .  16643 2  P %  100  pc ,2 vartotal  2  c 1 

(15)

An illustrative example is presented for the mathematical projection. Supposing an anomaly correlation vector is as follows R 'c  [0.25 0.35 0.45 0.55 0.65 0.75] .

(16)

In this case, the anomaly correlation vector has two nice properties. First of all, it is positive at the six initialization times. The positive anomaly correlation means that high (low) values of forecasts correspond to high (low) values of observations. Secondly, the correlation linearly increases from January to June. That is, the correspondence between forecasts and observations improves steadily with initialization time. Using Eq. (14), the projections of R 'c onto P1 and P2 are obtained  p 'c ,1  R 'c P1T  1.225 ,  T  p 'c ,2  R 'c P2  0.418

(17)

R 'c  1.225 P1  0.418 P2 ,

(18)

Further,

The above equation can be verified using the values in Eq. (16), Eq. (12), and Eq. (13). In the meantime, the fact that PCA preserves the variation of anomaly correlation can also be verified:

|R 'c |2  0.252  0.352  0.452  0.552  0.652  0.752  1.675 ,  2 2 2 |R 'c |  1.225  0.418  1.675 10

(19)

With Eq. (19), the percentage of variation of R 'c explained by the first and second pattern is quantified  1.2252  P1 %  100  1.675  89.6%  .  2 0.418  P %  100   10.4%  2 1.675 

(20)

The example in Eqs. (16 to 20) is hypothetical with the variation of R 'c wholly explained by the two patterns. For NMME forecasts, the variation of anomaly correlation is far more complex. The anomaly correlation can be negative, and it can change drastically with initialization time. As a result, the anomaly correlation vector of NMME forecasts can be projected onto not only the spatial and temporal patterns, but also other directions. In the following, the spatial and temporal patterns are used as exploratory tools to handle the variation of anomaly correlation. The existence and effectiveness of the two patterns are verified by PCA.

4. Results 4.1 Spatial-temporal variation of anomaly correlation The anomaly correlation measures the correspondence between forecasts and observations. It is one of the most popular measures of the skill of GCM forecasts (e.g., Merryfield et al., 2013; Saha et al., 2014; Jia et al., 2015). For CFSv2 seasonal forecasts initialized in January, February, March, April, May, and June, the anomaly correlation is shown by month in Figure 1. In this figure, the red-blue color scheme represents the value of anomaly correlation. Many instances of red pixels that indicate high positive correlation exist. Forecasts are informative in these cases. That is, large values of forecasts correspond to large values of observations, and small values of forecasts coincide with small values of observations. However, blue pixels that represent negative correlation also exist. They imply that large (small) values of forecasts tend to correspond to small (large) values of observations. In those cases, forecasts are generally non-informative.

11

Figure 1. Anomaly correlation of CFSv2 forecasts of global precipitation in June, July, and August. The six subplots are respectively for forecasts initialized in a) January, b) February, c) March, d) April, e) May, and f) June. The anomaly correlation has also been evaluated for the other 9 sets of forecasts, and the results are shown in the supplementary material.

Given the variation of anomaly correlation from January to June, users of the forecasts would look for regions where clusters of red pixels exist. The regions with “good” anomaly correlation, persistently high across the six initialization times or steadily improving with initialization time, would be of particular interest. Taking a close look at Figure 1, clusters of red pixels can be seen, for example, in the Northwest United States, North Brazil, Southeast Asia, and East Australia. However, over these regions, the spatial extents of the clustered red pixels tend to change from month to month. In some regions, such as Siberia and North Canada, clusters of red pixels are even scattered and surrounded by blue pixels. Overall, it is none too easy for users to identify regions with “good” anomaly correlation. This outcome implies the complexity of the predictive performance of GCM forecasts. It also calls for exploratory mathematical tools to deal with the variation of anomaly correlation.

12

4.2 Existence of spatial and temporal patterns While the spatial-temporal variation is highlighted in Figure 1, the temporal variation is further detailed in Figure 2. For the 16 643 grid cells under investigation, the anomaly correlation is plotted against initialization time in the upper parts of Figure 2. Each colored thin line is for one grid cell. The anomaly correlation is observed to vary drastically with initialization time. In addition, the anomaly correlation falls below zero in many instances. PCA is employed to handle the variation of anomaly correlation with initialization time across the 16 643 grid cells. In PCA, the eigenvectors tell how the anomaly correlation in January, February, …, and June relate to one another. The blue dashed lines in the left- and right-hand sides of Figure 2 respectively show the first and second eigenvectors from PCA. The eigenvectors yield insights into the spatial-temporal variation. The first eigenvector has similar loadings on the six initialization times. The indication is that the anomaly correlation tends to be at a similar level. In other words, for a grid cell, the anomaly correlation at different initialization times can be similarly high, or low. Quantitatively, this eigenvector explains 57.2% of the variation of anomaly correlation for CFSv2 forecasts. The message is that the variation of anomaly correlation, as is shown in Figure 1, is largely spatial. This result can, to some extent, be observed in the subplots of Figure 1. For example, in Australia, high correlation can generally be observed in the eastern and western parts of the country from January to June. By contrast, the correlation appears to be always low in the middle part of the country. The loadings of the second eigenvector tend to increase from January to June. This result implies the temporal improvement of anomaly correlation with initialization time. On the other hand, the temporal improvement is more difficult to be observed than the spatial difference. This outcome is in accordance with the fact that the second eigenvector is not as prominent as the first one and explains 12.7% of the variation. Nevertheless, the cumulative improvements can be observed globally. In Figure 1, the anomaly correlation exhibits more instances of red pixels in June than in January.

13

Figure 2. Respective similarities of the first two eigenvectors V1 and V2 to the spatial and temporal patterns P1 and P2 for CFSv2 forecasts. a) First eigenvector and its spatial pattern in the context of all the anomaly correlation vectors (thin colored lines and different colors representing different grid cells); b) 50% and 80% uncertainty intervals of the first eigenvector; c) Second eigenvector and its temporal pattern; and d) 50% and 80% uncertainty intervals of the second eigenvector. The eigenvectors have also been evaluated for the other 9 sets of forecasts, and the results are shown in the supplementary material.

The first and second eigenvectors are respectively illustrated with spatial and temporal patterns the in Figure 2. The similarity of the eigenvectors to the patterns highlights the 14

existence of the conceptual patterns (Zhao et al., 2018) at the global scale. Furthermore, the persistence of the patterns is evaluated by bootstrapping, i.e., sampling the anomaly correlation vectors randomly with replacement for 1000 times. The 50% and 80% uncertainty intervals of the eigenvectors are shown in the lower part of Figure 2. It can be observed that the intervals are very narrow. Therefore, it is concluded that the sampling variability of anomaly correlation vectors across the globe has a very small impact on the eigenvectors. In other words, the patterns as are indicated by the eigenvectors are robust. This result is attributed to the large sample size, i.e., 16 643 anomaly correlation vectors are employed to derive the eigenvectors.

4.3 Projection to the spatial pattern As the existence of spatial pattern has been verified by PCA, the anomaly correlation vectors are projected onto the spatial pattern, as shown in Figure 3. This figure uses a red-blue scheme, which is slightly different from the color scheme of Figure 1, to represent the value of the projection. While the spatial-temporal variation complicates the analysis of anomaly correlation, the projection yields insights into the variation.

Figure 3. Projection of the anomaly correlation vectors to the spatial pattern for CFSv2 forecasts. The spatial projections under the other 9 sets of forecasts are shown in the supplementary material.

15

The projection onto the spatial pattern helps locate the regions where forecasts tend to be highly correlated with observations across the six initialization times. In Figure 3, the projection is high in the Northwest United States, North Brazil, Southeast Asia, and Eastern and Western Australia. Based on Figure 1, it can be observed that these regions tend to exhibit high positive anomaly correlation from January to June. The projection also suggests that the anomaly correlation tends to be high in Southern Europe, North Africa, and South Africa. Therein, clusters of red pixels can also be seen in Figure 1. It is noted that some of the regions with overall high anomaly correlation were shown to exhibit hydro-climatic teleconnections by previous studies. For example, El Niño Southern Oscillation and Pacific Decadal Oscillation are observed to impact Northwest United States and have been incorporated into seasonal hydro-climatic forecasting since the 1990s (Hamlet and Lettenmaier, 1999). Also, El Niño Southern Oscillation is shown to affect the precipitation in North Brazil, Southeast Asia, and Eastern and Western Australia (Emerton et al., 2017; Dai and Wigley, 2000). On the other hand, the projection is negative in North Asia and North Canada (Figure 3). This result is in accordance with the low and negative anomaly correlation across six initialization times therein (Figure 1). The overall low anomaly correlation in some regions suggests that GCMs are not yet perfect and can have limitations. For example, the anomaly correlation in the Great Lakes region in the Northeast United States and Southeast Canada is not satisfactory for CFSv2 forecasts; it is also deemed to be unsatisfactory for the other 9 sets of NMME forecasts under investigation (supplementary material). This outcome can be due to that the Great Lakes considerably influence the surrounding climate, but the resolution of GCMs is too coarse to realistically account for the impacts of the Great Lakes (Gula and Peltier, 2012).

4.4 Projection to the temporal pattern The projection to the temporal pattern is illustrated in Figure 4. Unfortunately, the temporal projection is no longer high in Northwest United States, North Brazil, Southeast Asia, and East Australia, where the spatial projection is prominently high. The implication is that though forecasts are overall skillful in these regions, they would not necessarily become increasingly more skillful as initialization time progresses. On the other hand, the projection is high in part of North Asia and North Canada. This result suggests that the anomaly correlation tends to improve with initialization time therein. 16

Figure 4. Projection of the anomaly correlation vectors to the temporal pattern for CFSv2 forecasts. The temporal projections under the other 9 sets of forecasts are shown in the supplementary material.

The temporal pattern is meant to reflect the temporal improvement of anomaly correlation. The improvement is generally attributable to data assimilation systems in GCMs (Bauer et al., 2015). Specifically, more observations, as well as model simulations, become available as initialization time progresses; the information is all assimilated into the GCM forecasting process. Conceptually, forecasts at a later initialization time are based on more information. From this perspective, the anomaly correlation is expected to increase with initialization time, which would contribute to the effectiveness of the temporal pattern in explaining the variation. However, the anomaly correlation turns out to change drastically with initialization time in many cases. There is thus a gap between the conceptual improvement and the actual drastic changes. This gap implies the difficulty of assimilating information from multiple sources at a global scale (Merryfield et al., 2013; Saha et al., 2014; Jia et al., 2015). Nevertheless, the existence of the temporal pattern does suggest that data assimilation works. As assimilation algorithms become more efficient, the temporal pattern is expected to be more effective.

17

4.5 Results for 10 sets of NMME forecasts The variation of anomaly correlation has been investigated for all the 10 sets of NMME forecasts (please refer to the supplementary material for the details). The 10 first and second eigenvectors are obtained and their effectiveness is evaluated. In Figures 5a and 5b, the two patterns are illustrated with the first two eigenvectors. It can be observed that the spatial and temporal patterns are respectively similar to the first and second eigenvectors for all the 10 sets of forecasts in NMME. This result highlights the existence of the spatial and temporal patterns at the global scale. The percentage of the variation of anomaly correlation explained by the patterns and the eigenvectors is quantified and shown by bar plots in Figure 5c. The spatial pattern explains 57.2% of the variation for CFSv2, 38.8% for CanCM3, 48.3% for CanCM4, 39.2% for CCSM3, 50.6% for CCSM4, 50.2% for CM2p1, 54.1% for CM2p1-aer04, 52.0% for FLORA06, 50.9% for FLOR-B01, and 52.8% for CESM1. The temporal pattern is not as prominent. It explains 12.7% of the variation for CFSv2, 14.6% for CanCM3, 13.5% for CanCM4, 15.1% for CCSM3, 11.6% for CCSM4, 13.2% for CM2p1, 11.5% for CM2p1-aer04, 12.5% for FLOR-A06, 13.3% for FLOR-B01, and 11.1% for CESM1. The stacked bar charts in Figure 5c indicate that the two patterns and the first two eigenvectors are similar not only in the loadings on the 6 initialization times (Figures 5a and 5b), but also in the effectiveness in explaining the variation of anomaly correlation.

18

Figure 5. Effectiveness of the two patterns in characterizing the spatial-temporal variation of anomaly correlation for the 10 sets of NMME forecasts. a) Similarity of the spatial pattern to the first 10 eigenvectors, b) Similarity of the temporal pattern to the second 10 eigenvectors, c) Percentages of variation explained by the spatial and temporal patterns and the first two eigenvectors

Furthermore, the percentage of explained variation is plotted against the average variation for the 10 sets of forecasts in Figure 6. For CMC1-CanCM3 and CMC2-CanCM4, the respective average variations are 0.265 and 0.278, and the respective percentages of explained variations are 53.4% and 61.8%. For COLA-RSMAS-CCSM3 and COLA-RSMAS-CCSM4, the 19

respective average variations are 0.251 and 0.288, and the respective percentages of explained variations are 54.3% and 62.2%. The upgrades in the GCMs appear to contribute to the variation of anomaly correlation. Also, they tend to contribute to the effectiveness of the two patterns in explaining the variation. For the 10 sets of NMME forecasts, a linear relationship exists between the average variation and the percentage of explained variation. The correlation coefficient is as high as 0.94.

Figure 6. Relationship between the average variation of anomaly correlation and the percentage of explained variation for the 10 sets of NMME forecasts

For Figure 6, it is noted that the dot in the top-right corner is for CFSv2 forecasts. In other words, among the 10 sets of forecasts, CFSv2 forecasts exhibit the highest average variation and are with the highest percentage of variation explained by the spatial and temporal patterns. This result is in accordance with a previous finding that CFSv2 exhibits the highest anomaly correlation skill in NMME (Kirtman et al., 2014). Despite this result, it is noted that in some regions, e.g., North Asia and North Canada, CFSv2 can still be outperformed by other GCMs in terms of the spatial and temporal projections (supplementary material). This 20

outcome highlights that besides global investigation, it is also important to consider regional difference of predictive performance for the use of GCM forecasts. The spatial and temporal patterns can serve as exploratory tools to investigate the predictive performance at both global and regional scales.

5. Discussion This paper has investigated the spatial and temporal patterns of anomaly correlation for 10 sets of NMME seasonal forecasts. Overall, the two patterns are effective and explain 50% to 70% of the variation of anomaly correlation. In this paper, the analysis of the patterns is based on characterizing anomaly correlation at a single grid cell as a vector and pooling vectors across the grid cells. The mathematical formulations are demonstrated to be effective in addressing the variation of anomaly correlation at the global scale. Flexibility is an advantage of this approach. While it is applied to anomaly correlation, it is applicable to other measures of predictive performance, such as mean absolute error and root mean squared error. Computational efficiency is another advantage. PCA and the related projections rely on linear algebraic operations (Hotelling, 1933; Demsar et al., 2013). They can be adapted to handle large datasets, for example, forecasts produced by regional climate models (RCMs) and numerical weather predictions (NWPs) that have higher spatial resolutions and are more frequently initialized. The projections of the anomaly correlation vectors onto the spatial and temporal patterns yield insights into the use of GCM forecasts. For regions with positive projection onto the spatial pattern, the anomaly correlation tends to be at a high level in January, February, …, and June. For these regions, hydrological forecasting systems can be developed to leverage GCM forecasts to predict streamflow months ahead (Hamlet and Lettenmaier, 1999; Emerton et al., 2017; Lehner et al., 2017). For regions with positive projection onto the temporal pattern, the anomaly correlation tends to steadily improve from January to June. For those regions, it is important to incorporate timely updated GCM forecasts into hydrological forecasting (Schepen and Wang, 2014; Bennett et al., 2017). On the other hand, there exist regions with low projections onto the spatial and temporal patterns. The implication is that the anomaly correlation of GCM forecasts is neither high at the six initialization times, nor improve with initialization time. Therein, hydrological forecasting may need to seek information other than GCM forecasts, such as RCM forecasts and NWPs. 21

6. Conclusions GCMs produce informative seasonal forecasts of global precipitation months ahead of its occurrence. In this paper, we propose two patterns to characterize the predictive performance, as is indicated by anomaly correlation, of GCM forecasts of global precipitation. The projections of anomaly correlation vectors to the two patterns help illustrate where and when the GCM precipitation forecasts are skillful. The spatial pattern locates where high (low) anomaly correlation at one initialization time coincides with high (low) anomaly correlation at other initialization times. The temporal pattern tells how anomaly correlation improves with initialization time. Overall, the two patterns explain 50% to 70% of the variation of anomaly correlation for the 10 sets of NMME seasonal forecasts of global precipitation in June, July, and August. Future studies can analyze GCM forecasts for other seasons and apply the spatial and temporal to investigate the predictive performance. Apart from precipitation, GCMs also produce forecasts of other climate variables of interest, such as temperature, wind speed, and solar radiation. In the future, efforts can be devoted to investigating the patterns for more variables and interpreting the predictive performance of GCM forecasts for users.

Acknowledgments: We are grateful to the editors and reviewers for the constructive comments that have led to substantial improvements to the paper. This study is supported by the National Natural Science Foundation of China (NSFC) and the Ministry of Science and Technology of China (MSTC). Both the observations and the forecasts are downloaded from the International Research Institute for Climate and Society, Earth Institute, Columbia University (https://iridl.ldeo.columbia.edu/SOURCES/.Models/.NMME/).

References [1]

Barnston, A. G.; Lyon, B., Does the NMME Capture a Recent Decadal Shift toward Increasing Drought Occurrence in the Southwestern United States? Journal of Climate 2016, 29 (2), 561-581. 22

[2]

Bauer, P.; Thorpe, A.; Brunet, G., The quiet revolution of numerical weather prediction. Nature 2015, 525 (7567), 47-55.

[3]

Becker, E.; van den Dool, H.; Zhang, Q., Predictability and Forecast Skill in NMME. Journal of Climate 2014, 27 (15), 5891-5906.

[4]

Bennett, J.C.; Wang, Q.J.; Robertson, D.E.; Schepen, A.; Li, M.; Michael, K., Assessment of an ensemble seasonal streamflow forecasting system for Australia. Hydrology and Earth System Sciences 2017, 21(12), 6007-6030.

[5]

Crochemore, L.; Ramos, M. H.; Pappenberger, F., Bias correcting precipitation forecasts to improve the skill of seasonal streamflow forecasts. Hydrology and Earth System Sciences 2016, 20 (9), 3601-3618.

[6]

Dai, A.; Wigley, T. M. L., Global patterns of ENSO-induced precipitation. Geophysical Research Letters 2000, 27 (9), 1283-1286.

[7]

Demsar, U.; Harris, P.; Brunsdon, C.; Fotheringham, A. S.; McLoone, S., Principal Component Analysis on Spatial Data: An Overview. Annals of the Association of American Geographers 2013, 103 (1), 106-128.

[8]

Dunstone, N.; Smith, D.; Scaife, A.; Hermanson, L.; Eade, R.; Robinson, N.; Andrews, M.; Knight, J., Skilful predictions of the winter North Atlantic Oscillation one year ahead. Nature Geoscience 2016, 9 (11), 809-+.

[9]

Emerton, R.; Cloke, H. L.; Stephens, E. M.; Zsoter, E.; Woolnough, S. J.; Pappenberger, F., Complex picture for likelihood of ENSO-driven flood hazard. Nature Communications 2017, 8.

[10] Gula, J.; Peltier, W. R., Dynamical Downscaling over the Great Lakes Basin of North America Using the WRF Regional Climate Model: The Impact of the Great Lakes System on Regional Greenhouse Warming. Journal of Climate 2012, 25 (21), 77237742. [11] Hamlet, A. F.; Lettenmaier, D. P., Columbia River streamflow forecasting based on ENSO and PDO climate signals. Journal of Water Resources Planning and Management-Asce 1999, 125 (6), 333-341. [12] Hotelling, H., Analysis of a complex of statistical variables into principal components. Journal of Educational Psychology 1933, 24, 498-520. [13] Jia, L. W.; Yang, X. S.; Vecchi, G. A.; Gudgel, R. G.; Delworth, T. L.; Rosati, A.; Stern, W. F.; Wittenberg, A. T.; Krishnamurthy, L.; Zhang, S. Q.; Msadek, R.; Kapnick, S.; Underwood, S.; Zeng, F. R.; Anderson, W. G.; Balaji, V.; Dixon, K., 23

Improved Seasonal Prediction of Temperature and Precipitation over Land in a HighResolution GFDL Climate Model. Journal of Climate 2015, 28 (5), 2044-2062. [14] Kim, H. M.; Webster, P. J.; Curry, J. A., Seasonal prediction skill of ECMWF System 4 and NCEP CFSv2 retrospective forecast for the Northern Hemisphere Winter. Climate Dynamics 2012, 39 (12), 2957-2973. [15] Kirtman, B. P.; Min, D.; Infanti, J. M.; Kinter, J. L.; Paolino, D. A.; Zhang, Q.; van den Dool, H.; Saha, S.; Mendez, M. P.; Becker, E.; Peng, P. T.; Tripp, P.; Huang, J.; DeWitt, D. G.; Tippett, M. K.; Barnston, A. G.; Li, S. H.; Rosati, A.; Schubert, S. D.; Rienecker, M.; Suarez, M.; Li, Z. E.; Marshak, J.; Lim, Y. K.; Tribbia, J.; Pegion, K.; Merryfield, W. J.; Denis, B.; Wood, E. F., THE NORTH AMERICAN MULTIMODEL ENSEMBLE Phase-1 Seasonal-to-Interannual Prediction; Phase-2 toward Developing Intraseasonal Prediction. Bulletin of the American Meteorological Society 2014, 95 (4), 585-601. [16] Lehner, F.; Wood, A.W.; Llewellyn, D.; Blatchford, D.B.; Goodbody, A.G.; Pappenberger, F., Mitigating the impacts of climate nonstationarity on seasonal streamflow predictability in the US Southwest. Geophysical Research Letters 2017, 44 (24), 12208-12217. [17] Luo, L. F.; Tang, W.; Lin, Z. H.; Wood, E. F., Evaluation of summer temperature and precipitation predictions from NCEP CFSv2 retrospective forecast over China. Climate Dynamics 2013, 41 (7-8), 2213-2230. [18] Ma, F.; Ye, A. Z.; Deng, X. X.; Zhou, Z.; Liu, X. J.; Duan, Q. Y.; Xu, J.; Miao, C. Y.; Di, Z. H.; Gong, W., Evaluating the skill of NMME seasonal precipitation ensemble predictions for 17 hydroclimatic regions in continental China. International Journal of Climatology 2016, 36 (1), 132-144. [19] McEvoy, D. J.; Huntington, J. L.; Mejia, J. F.; Hobbins, M. T., Improved seasonal drought forecasts using reference evapotranspiration anomalies. Geophysical Research Letters 2016, 43 (1), 377-385. [20] Merryfield, W. J.; Lee, W. S.; Boer, G. J.; Kharin, V. V.; Scinocca, J. F.; Flato, G. M.; Ajayamohan, R. S.; Fyfe, J. C.; Tang, Y. M.; Polavarapu, S., The Canadian Seasonal to Interannual Prediction System. Part I: Models and Initialization. Monthly Weather Review 2013, 141 (8), 2910-2945. [21] Saha, S.; Moorthi, S.; Wu, X. R.; Wang, J.; Nadiga, S.; Tripp, P.; Behringer, D.; Hou, Y. T.; Chuang, H. Y.; Iredell, M.; Ek, M.; Meng, J.; Yang, R. Q.; Mendez, M. 24

P.; Van Den Dool, H.; Zhang, Q.; Wang, W. Q.; Chen, M. Y.; Becker, E., The NCEP Climate Forecast System Version 2. Journal of Climate 2014, 27 (6), 2185-2208. [22] Schepen, A.; Zhao, T. T. G.; Wang, Q. J.; Robertson, D. E., A Bayesian modelling method for post-processing daily sub-seasonal to seasonal rainfall forecasts from global climate models and evaluation for 12 Australian catchments. Hydrology and Earth System Sciences 2018, 22 (2), 1615-1628. [23] Schepen, A.; Wang, Q.J., Ensemble forecasts of monthly catchment rainfall out to long lead times by post-processing coupled general circulation model output. Journal of hydrology 2014, 519, 2920-2931. [24] Sheffield, J.; Wood, E. F.; Chaney, N.; Guan, K. Y.; Sadri, S.; Yuan, X.; Olang, L.; Abou, A.; Ali, A.; Demuth, S.; Ogallo, L., A drought monitoring and forecasting system for Sub-Sahara African water resources and food security. Bulletin of the American Meteorological Society 2014, 95 (6), 861-+. [25] Shukla, S.; Roberts, J.; Hoell, A.; Funk, C. C.; Robertson, F.; Kirtman, B., Assessing North American multimodel ensemble (NMME) seasonal forecast skill to assist in the early warning of anomalous hydrometeorological events over East Africa. Climate Dynamics 2016. [26] Slater, L. J.; Villarini, G.; Bradley, A. A., Weighting of NMME temperature and precipitation forecasts across Europe. Journal of Hydrology 2017, 552, 646-659. [27] Slater, L. J.; Villarini, G.; Bradley, A. A., Evaluation of the skill of North-American multi-model ensemble (NMME) global climate models in predicting average and extreme precipitation and temperature over the continental USA. Climate Dynamics 2016, 1-16. [28] Tippett, M. K.; Ranganathan, M.; L’Heureux, M.; Barnston, A. G.; DelSole, T., Assessing probabilistic predictions of ENSO phase and intensity from the North American Multimodel Ensemble. Climate Dynamics 2017, 1-22. [29] Wilks, D. S.; Hamill, T. M., Comparison of ensemble-MOS methods using GFS reforecasts. Monthly Weather Review 2007, 135 (6), 2379-2390. [30] Xie, P. P.; Arkin, P. A., Global precipitation: A 17-year monthly analysis based on gauge observations, satellite estimates, and numerical model outputs. Bulletin of the American Meteorological Society 1997, 78 (11), 2539-2558. [31] Xie, P. P.; Arkin, P. A.; Janowiak, J. E., CMAP: The CPC merged analysis of precipitation. In Measuring precipitation from space, Springer: 2007; pp 319-328. 25

[32] Yuan, X.; Wood, E. F., Multimodel seasonal forecasting of global drought onset. Geophysical Research Letters 2013, 40 (18), 4900-4905. [33] Yuan, X.; Wood, E. F.; Luo, L. F.; Pan, M., A first look at Climate Forecast System version 2 (CFSv2) for hydrological seasonal prediction. Geophysical Research Letters 2011, 38. [34] Zhao, T. T. G.; Chen, X. H.; Liu, P.; Zhang, Y. Y.; Liu, B. J.; Lin, K. R., Relating Anomaly Correlation to Lead Time: Principal Component Analysis of NMME Forecasts of Summer Precipitation in China. Journal of Geophysical ResearchAtmospheres 2018, 123 (11), 6039-6052. [35] Zhao, T. T. G.; Liu, P.; Zhang, Y. Y.; Ruan, C. Q., Relating anomaly correlation to lead time: Clustering analysis of CFSv2 forecasts of summer precipitation in China. Journal of Geophysical Research-Atmospheres 2017a, 122 (17), 9094-9106. [36] Zhao, T. T. G.; Bennett, J.C.; Wang, Q.J.; Schepen, A.; Wood, A.W.; Robertson, D.E.; Ramos, M.H., How suitable is quantile mapping for postprocessing GCM precipitation forecasts? Journal of Climate 2017b, 30(9), pp.3185-3196.

Highlights

Anomaly correlation persistently exhibits spatial and temporal patterns at the global scale for 10 sets of NMME seasonal precipitation forecasts

The spatial pattern reveals that high (low) anomaly correlation at one initialization time coincides with high (low) anomaly correlation at other initialization times

The temporal pattern suggests that the anomaly correlation tends to improve with initialization time

26

27

28

29

30

Declaration of interests

☒ The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

☐The authors declare the following financial interests/personal relationships which may be considered as potential competing interests:

31

32