Journal Pre-proof Ensemble of 4DVARs (En4DVar) data assimilation in a coastal ocean circulation model. Part I: Methodology and ensemble statistics Ivo Pasmans, Alexander L. Kurapov
PII: DOI: Reference:
S1463-5003(19)30119-2 https://doi.org/10.1016/j.ocemod.2019.101493 OCEMOD 101493
To appear in:
Ocean Modelling
Received date : 5 May 2019 Revised date : 2 October 2019 Accepted date : 4 October 2019 Please cite this article as: I. Pasmans and A.L. Kurapov, Ensemble of 4DVARs (En4DVar) data assimilation in a coastal ocean circulation model. Part I: Methodology and ensemble statistics. Ocean Modelling (2019), doi: https://doi.org/10.1016/j.ocemod.2019.101493. This is a PDF file of an article that has undergone enhancements after acceptance, such as the addition of a cover page and metadata, and formatting for readability, but it is not yet the definitive version of record. This version will undergo additional copyediting, typesetting and review before it is published in its final form, but we are providing this version to give early visibility of the article. Please note that, during the production process, errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain. © 2019 Published by Elsevier Ltd.
Journal Pre-proof
A data assimilation system with ensemble-based error statistics was constructed. The cluster search method was presented to speed-up computation. Realistic spatial dependence in wind forcing errors was estimated from observations.
Jo
urn al P
re-
pro
of
Performance was in general agreement with theoretical expectations.
Journal Pre-proof
4
Ivo Pasmansa,1 , Alexander L. Kurapova,b
5 6 7
a
pro
2
of
3
Ensemble of 4DVARs (En4DVar) data assimilation in a coastal ocean circulation model. Part I: Methodology and ensemble statistics
1
College of Earth Ocean & Atmospheric Sciences, Oregon State University, Corvallis, OR, USA b NOAA, Silver Spring, MD, USA
Abstract
9
The ocean state off Oregon-Washington, U.S. West coast, is highly vari-
10
able in time. Under these conditions the assumption made in traditional
11
4-dimensional variational data assimilation (4DVAR) that the prior model
12
(background) error covariance is the same in every data assimilation (DA)
13
window can be limiting. A DA system based on an ensemble of 4DVARs
14
(En4DVar) has been developed in which the background error covariance is
15
estimated from an ensemble of model runs and is thus time-varying. This
16
part describes details of the En4DVar method and ensemble statistics ver-
17
ification tests. The control run and 39 ensemble members are forced by
18
perturbed wind fields and corrected by DA in a series of 3-day windows.
19
Wind perturbations are represented as a linear combination of empirical or-
20
thogonal functions (EOFs) for the larger scales and Debauchies wavelets for
21
the smaller scales. The variance of the EOF expansion coefficients is based
22
on estimates of the wind field error statistics derived using scatterometer
23
observations and a Bayesian Hierarchical Model. It is found that the vari-
24
ance of the wind errors relative to the natural wind variability increases as
Jo
urn al P
re-
8
Preprint Ocean Modelling (Ivo Pasmans) Emailsubmitted address: to
[email protected] 1 Now at the University of New Orleans, New Orleans, LA, USA
October 2, 2019
Journal Pre-proof
the horizontal spatial scales decrease. DA corrections to the control run and
26
ensemble members are calculated in parallel by the newly developed, cost-
27
effective cluster search minimization method. For a realistic coastal ocean
28
application, this method can generate a 30% wall time reduction compared
29
to the restricted B-conjugate gradient (RBCG) method. Ensemble statis-
30
tics are generally found to be consistent with background error statistics. In
31
particular, ensemble spread is maintained without inflating. However, sea-
32
surface height background errors can not be fully reproduced by the ensemble
33
perturbations.
34
Keywords: 4DVAR, Coastal ocean, Data assimilation, Ensemble,
35
Numerical modelling, USA, Oregon
36
1. Introduction
urn al P
re-
pro
of
25
Data assimilation (DA) is a procedure, e.g., used in meteorology and
38
oceanography, in which the output of a numerical model is combined with
39
observations to find the most-likely estimate for the true state of the system.
40
DA algorithms require specification of the error statistics for the model and
41
the observations. These statistics are often assumed to follow a multidimen-
42
sional normal distribution with zero mean and a covariance that is static in
43
time, i.e., the same from one assimilation cycle to the next. An example
44
of such a DA system is the Oregon State University coastal ocean forecast
45
system in an area offshore Oregon-Washington (OR-WA) at the U.S. West
46
coast (Kurapov et al., 2011; Pasmans et al., 2019; Yu et al., 2012). This sys-
47
tem applies the 4DVAR algorithm in a series of consecutive 3-day windows.
48
Initial conditions at the beginning of each window are corrected to yield
Jo
37
2
Journal Pre-proof
the nonlinear analysis that fits observations in this window. The simulation
50
is then continued for another three days to provide the forecast. Summer
51
dynamics in this region are dominated by the wind-forced upwelling and re-
52
laxation and the outflow of the Columbia River (Hickey et al., 2005, 2010;
53
Huyer, 1983; Liu et al., 2009). In such a dynamic environment it is unlikely
54
that the stationarity assumption on the error statistics holds. In particular,
55
the temperature-salinity model error covariance will strongly depend on the
56
presence of the river plume.
pro
of
49
Over the past three decades, ensemble methods have been developed in
58
meteorology to deal with non-stationarity in the error statistics. In these
59
methods, the forecast (“background”) error covariance is estimated from an
60
ensemble of perturbed model runs. One of the earliest, and most popular,
61
examples of such a method is the ensemble Kalman filter (Anderson, 2001;
62
Bishop et al., 2001; Evensen, 1994; Lermusiaux and Robinson, 1999). More
63
recently in meteorology, these ensemble Kalman filter systems have been com-
64
bined with 4DVAR systems in which the background error covariance used
65
by the 4DVAR system is estimated using the ensemble from the Kalman filter
66
system (Buehner et al., 2009; Clayton et al., 2013; Zhang and Zhang, 2012).
67
In this manuscript, we describe our approach to using an ensemble of 4DVARs
68
(En4DVar) to provide a state-dependent background error covariance. This
69
methodology will be tested with the OR-WA 4DVAR system. Generaliza-
70
tion of the ensemble methodology to an ensembles of 4DVARs is nontrivial
71
for three reasons. First, 4DVAR is computationally intensive. Calculation
72
of a DA correction requires minimization of a cost function, or equivalently,
73
solving a linear system with some symmetric and positive definite matrix
Jo
urn al P
re-
57
3
Journal Pre-proof
A. This matrix is large and generally not available in explicit form. Instead
75
only the algorithm for the product of A and a vector is at hand. An iterative
76
method, e.g., the restricted B-conjugate gradient (RBCG) algorithm, is used
77
to find an approximate solution of this system. Each iteration requires prop-
78
agation of the tangent linear (TL) model over the analysis period, forward
79
in time, and its adjoint counterpart (ADJ), backward in time. For practical
80
systems, the 4DVAR cycle can take 10-100 times as much time as a single
81
forward model run. Second, En4DVar only compounds this problem as it
82
requires running the computationally intensive 4DVAR algorithm for each
83
ensemble member. Third, the ensemble has to be initialized and evolved in
84
such a manner that its covariance is representative of the background error
85
covariance.
re-
pro
of
74
Over the last decade, much effort has been put into overcoming these chal-
87
lenges. Several solutions have been found. Instead of applying full 4DVAR
88
to each ensemble member, it has become customary to calculate a low-rank
89
approximation to A, using e.g. Ritz pairs found by the Lanczos algorithm
90
(Trefethen and Bau, 1997). In the Ensemble-Variational Integrated Localized
91
Data Assimilation (EVIL) methodology (Aulign´e et al., 2016), minimization
92
of the cost function is only carried out for the control, or deterministic,
93
model run. From the Ritz vectors obtained as a by-product from this min-
94
imization, a low-rank approximation of A is constructed. The inverse of
95
this approximation is then used to solve the linear system for the ensemble
96
members. Desroziers and Berre (2012); Lorenc et al. (2017) followed similar
97
approaches, but use the Ritz pairs solely to construct a preconditioner. Ad-
98
vances have also been made to speed up the 4DVAR minimization algorithm
Jo
urn al P
86
4
Journal Pre-proof
itself. Parallelisation can be applied to the TL and ADJ models by assigning
100
the calculations for different parts of the domain to different processor cores,
101
as well as to the minimization algorithm. The former is currently standard
102
practice, while the latter is still an area of active research. One popular
103
approach is to use an ensemble of concurrently produced nonlinear model
104
runs to generate approximations of the TL and ADJ model. Examples of
105
this approach are 4DEnVar (Amezcua et al., 2017; Desroziers et al., 2014;
106
Gustafsson and Bojarova, 2014; Liu et al., 2008; Tian et al., 2017) and the
107
Ensemble Kalman Smoother-4DVAR (EKS-4DVAR) (Mandel et al., 2016).
108
4DEnVar and EKS-4DVAR can be used to minimise the same cost func-
109
tion, but their implementation differs in two major ways. First, 4DEnVar
110
uses all observations within a DA window to correct the initial condition,
111
while EKS-4DVAR processes the observations in batches with each batch
112
generating corrections to the model at, and prior to, the batch time. Sec-
113
ond, both in En4DVar and EKS-4DVAR the propagation of perturbations
114
from the background state to the next time step and into the observation
115
space is carried out by a finite-difference scheme involving the nonlinear
116
model and nonlinear sampling operators. The finite-difference scheme in
117
EKS-4DVAR uses a smaller step size and thus better approximates the tan-
118
gent linear model and linearised sampling operators used in classic 4DVAR.
119
Background error localization in these methods is non-trivial. In absence of
120
a TL model to propagate the localised background error covariance forward
121
in time, localization is often assumed to be static in these methods. For
122
limited-size ensembles with non-dense observation networks this can lead to
123
a decrease in forecast performance compared to variational methods that do
Jo
urn al P
re-
pro
of
99
5
Journal Pre-proof
use TL and ADJ models (Poterjoy and Zhang, 2015; Poterjoy et al., 2016).
125
A similar problem is encountered when attempting to implement a hybrid
126
background covariance, a linear combination of an ensemble and climato-
127
logical, static covariance, in these 4DEnVar systems. The use of a hybrid
128
background covariance was found to improve the accuracy of the forecasts
129
produced by traditional 4DVAR systems (Clayton et al., 2013; Kuhl et al.,
130
2013). More specifically, it was found that the best fit to the assimilated
131
observations is achieved when the climatological part makes up the major
132
part of the background covariance (Clayton et al., 2013; Lorenc and Jardak,
133
2018). However, without TL and ADJ models to propagate the covariance
134
back and forth in time, hybrid 4DEnVar failed to provide the same benefits
135
(Lorenc et al., 2015). One different approach to parallelisation that does not
136
suffer from these problems is taken by Rao and Sandu (2016) and Fisher
137
and G¨ urol (2017). They make use of the TL and ADJ model and parallelize
138
the 4DVAR minimization algorithm in time. This is done by dividing the
139
analysis period into separate time intervals. The DA correction is found by
140
minimizing a cost function that consists of the sum of the 4DVAR cost func-
141
tions for each interval plus an additional term representing the constraint
142
that the correction should be continuous going from one interval to another.
143
Another recent approach that circumvents the problems with localisation and
144
hybrid-covariances encountered in En4DVar is the Localized Ensemble-Based
145
Tangent Linear Model (LETLM) in which the matrix for the tangent linear
146
model is not constructed using a linearised version of the extensive nonlinear
147
model, but retrieved from a simple regression against the ensemble members
148
(Allen et al., 2017; Bishop et al., 2017; Frolov and Bishop, 2016; Frolov et al.,
Jo
urn al P
re-
pro
of
124
6
Journal Pre-proof
149
2018). Yet another alternative approach, explored in this paper, is to try to
151
accelerate the linear system solver by using several search directions in par-
152
allel. The current OR-WA system uses the restricted B-conjugate gradient
153
method, RBCG (G¨ urol et al., 2014). In this method, an approximation to the
154
linear system solution is sought in a low-dimensional Krylov space and the
155
search space dimension is equal to the number of iterations. Several generic
156
iterative solvers have already been developed in which the search space di-
157
mension grows faster than that. Consequently, less iterations are necessary
158
to produce a good approximation to the solution. Among these are the En-
159
larged Krylov space method (Grigori et al., 2016) in which the search space
160
is expanded by multiple directions per iteration simultaneously or the Aug-
161
mented Krylov space methods where extra search directions are added to
162
the system coming either from an earlier attempt to solve a similar system
163
(Erhel and Guyomarc’h, 2000; Morgan, 1995), or from the eigenvectors of
164
a preconditioner (Kharchenko and Yeremin, 1995), or from an attempt to
165
solve the system with a different initial residual (Chapman and Saad, 1996).
166
Additional search directions can also lie outside the Krylov subspace. E.g.
167
Yaremchuk et al. (2017) uses model-based Empirical Orthogonal Functions
168
(EOFs) to create search directions. Once a general search space is defined,
169
the approximation can be defined as the vector in the search space that has
170
the smallest distance to the exact solution in some appropriately chosen norm
171
(Brezinski, 1999).
173
pro
re-
urn al P
Jo
172
of
150
In the En4DVar system proposed in this this paper, the EVIL method
(Aulign´e et al., 2016) is parallelized using a new variation of the enlarged
7
Journal Pre-proof
and augumented Krylov space methods. Two principal and novel elements
175
of the En4DVar will be described. The first is the cluster search method,
176
used to enlarge the search space at each iteration at a price of running a
177
relatively small number of TL-ADJ applications in parallel. The second is
178
the use of a Bayesian Hierarchical Model to estimate the magnitude of the
179
wind forcing perturbations for the ensemble members. Part II of this study
180
(Pasmans et al., in preparation) will include (a) a comparison of the ensemble
181
background error covariance produced by the En4DVar system and the static
182
background error covariance based on the balance operator and (b) compar-
183
ative tests of En4DVar and traditional 4DVAR implemented in the OR-WA
184
coastal ocean forecast system. This paper is organized as follows: section 2
185
describes the experimental setup and the layout of the En4DVar system.
186
Derivation of the cluster search method is presented in section 3. Wind per-
187
turbations for the ensemble members are discussed in section 4. In section 5
188
we check if the En4DVar statistics are representative of the background error
189
statistics. Discussion and conclusions are presented in section 6.
190
2. The En4DVar System
urn al P
re-
pro
of
174
Pasmans et al. (2019) describe the OR-WA coastal ocean forecast sys-
192
tem in every detail, implemented as standard 4DVAR with a static back-
193
ground error covariance. A short summary is only provided here. The non-
194
linear model dynamics are described by the Regional Ocean Modeling Sys-
195
tem (ROMS, www.myroms.org) integrating three-dimensional, fully nonlin-
196
ear, hydrostatic, Boussinesq equations featuring advanced numerics (Shchep-
197
etkin and McWilliams, 2003, 2005). The model domain is shown in Fig-
Jo
191
8
re-
pro
of
Journal Pre-proof
urn al P
Figure 1: The model domain and observations assimilated in the window of 26-28 August, 2011: (a) high-frequency radar (HFR) daily-averaged radial velocity components, (b) seasurface temperature (SST) and (c) sea-surface height (SSH, absolute dynamic topography minus the mean along each satellite track).
198
ure 1. The model resolution is approximately 2 km in the horizontal and 40
199
terrain-following layers in the vertical direction. The computational grid has
200
310 × 522 points. Non-tidal boundary conditions are taken from the global 1/12◦ Hycom-NCODA analyses (COAPS, 2015). Tidal forcing is added along
202
the open boundaries (Egbert and Erofeeva, 2002, 2010). Atmospheric surface
203
forcing is calculated based on the bulk flux algorithm (Fairall et al., 2003) us-
204
ing the 12-km resolution Northern American Mesoscale (NAM) model anal-
205
ysis fields (NOAA, 2011b). The fresh water discharge from the Columbia
206
River, Fraser River, and 15 small rivers in Puget Sound is also included. Each
Jo
201
9
Journal Pre-proof
207
hour the three-dimensional ocean state calculated by the model is saved to
208
disk. While standard ROMS includes TL and ADJ models, these are tightly
210
integrated into the code such that implementing the En4DVar directly into
211
ROMS was too challenging for us as the users. Instead, we utilize the stand-
212
alone TL and ADJ AVRORA codes developed in-house (Kurapov et al.,
213
2009, 2011; Yu et al., 2012) and integrate these with the nonlinear ROMS
214
and other components of the En4DVar via Linux shell scripts, similarly to
215
how it is done in the present OR-WA operational forecast system. The TL
216
and ADJ runs are performed on a coarser, 4-km resolution model grid and
217
their output is interpolated to and from the 2-km model grid.
re-
pro
of
209
All computations have been carried out on the COMET cluster with
219
computer allocations made available through the XSEDE framework (Towns
220
et al., 2014). Both the nonlinear ROMS and TL-ADJ AVRORA codes are
221
run using message passing interface (MPI) parallelisation. The model grid
222
is divided into horizontal tiles and computation in the interior of each tile is
223
performed on a separate processor core. Owing to a relatively small grid size,
224
a small number of Ncores = 6 tiles are used for each instance of the ADJ and
225
TL model. More nodes are available to us, and later in this paper we discuss
226
how these can be used to speed up the iterative minimization algorithm.
urn al P
218
In this paper results from two experiments are compared, Ens and No DA.
228
In experiment Ens, the En4DVar system is used to run M = 40 instances
229
of the model forward in time. In the discussions below the run with index
230
m = 0 is referred to as the control run, while instances m = 1, 2, . . . , M − 1
231
are referred to as ensemble members. Since the dynamics in this region
Jo
227
10
Journal Pre-proof
are dominated by the wind forcing, we assume that the errors in the wind
233
velocity are the dominant model error source. To include this error into the
234
error statistics, the nonlinear forecasts for each ensemble member are run
235
with perturbed wind velocities as detailed in section 4. It will be evaluated
236
in section 5 whether the addition of these perturbations alleviates the need for
237
the customary ensemble inflation (Anderson, 2001; Anderson and Anderson,
238
1999; Hamill et al., 2001). No wind perturbations are added to the control
239
run. In our system, the analyses and forecasts from the unperturbed control
240
run are considered to provide the best estimate of the ocean state. Unless
241
explicitly stated otherwise, the control DA run is compared with the results
242
from experiment No DA in section 5. Since no perturbations are added to the
243
control run, the probability distributions of the errors in the control run will
244
deviate from that in the ensemble members. Therefore the control run is not
245
utilized in the calculation of B. The ensemble members are all initialized from
246
the same no-DA model output on 10 March 2011 and propagated forward
247
in time without DA using the perturbed winds thus generating an ensemble
248
of perturbed ocean states on 19 April 2011. Both Ens and No DA cases are
249
then run and compared over the period from 19 April 2011 to 1 October
250
2011.
urn al P
re-
pro
of
232
The set of observations for assimilation includes radial surface currents
252
from high-frequency radars (HFR), alongtrack altimetry, and satellite sea-
253
surface temperature (SST). Although the model includes tides, mainly to
254
include their effect on river and ocean water mixing, our focus is on correct-
255
ing subtidal variability. Surface tidal currents can be dominanted by non-
256
stationary internal tides (e.g. Kurapov et al., 2003; Osborne et al., 2011) that
Jo
251
11
Journal Pre-proof
are poorly predictable and poorly constrained by the available data. At the
258
same time the daily-averaged HFR velocity data present a useful constraint
259
on the 3-day ocean forecasts (Yu et al., 2012). Following our practice in the
260
OR-WA system, we assimilate daily averaged HFR observations, matching
261
those to the daily-averaged model outputs. Altimetry observations can suffer
262
from large errors in the specification of the geoid. To suppress these and the
263
tidal errors in the DA, we assimilate differences from the mean along-track
264
SSH averaged over 24 hours. Details of the procedure can be found in Ap-
265
pendix A. More details on the data and their associated observational error
266
variances are described in Pasmans et al. (2019). The only difference here
267
is that the level 2 GOES-11 SST (Maturi et al., 2008) is used instead of
268
polar-orbiting satellite products. The observational error covariance matrix
269
R is diagonal.
urn al P
re-
pro
of
257
As shown schematically in Figure 2, DA proceeds in a series of 3-day
271
windows. At the beginning of each window, the initial conditions for the
272
control run and each ensemble member are corrected. Then every model
273
starts from the corrected initial conditions and is run forward using the non-
274
linear ROMS for the length of the assimilation window (3 days) to obtain
275
the analyses and continues for another 3 days to obtain the forecasts. The
276
3-day window length is long enough to let the non-tidal dynamics evolve and
277
provide dynamically based space and time interpolation of the data. It is
278
still short enough such that the correction to the initial conditions at the
279
beginning of the DA window impacts forecasts.
280
281
Jo
270
To explain the DA method in more detail, we combine the temperature,
salinity, sea surface height and horizontal velocity fields at the beginning
12
urn al P
re-
pro
of
Journal Pre-proof
Figure 2: Overview of the En4DVar system. Panels a, b, and c show the progression of tasks. In (a), the control run (task 1, solid line) and ensemble members (with the envelope shown as dashed lines, task 2) are run for six days. The first three days are the analyses (blue), the last three days are the forecasts (red). In (b), task 3 is B calculation from the
Jo
ensemble, and task 4 is the calculation of the DA corrections for the control run and the ensemble members using information from the observations (black circles). In (c), task 5 are the new six day model runs started from the corrected ocean states.
13
Journal Pre-proof
282
of the window and prior to DA into a vector of real numbers of length N :
283
xb
284
the window is denoted as y ∈ RD . The innovation vector for each ensemble
∈ RN , m = 0, . . . , M − 1. The vector containing all observations within
of
(m)
(m)
(m)
285
member is defined as d(m) = y − HM(xb ) + (m) . Here M(xb ) is the
286
nonlinear model trajectory started from the initial conditions xb
287
is the collection of data operators. A perturbation (m) is added to the
288
innovation vector for each ensemble member (m = 1, . . . , M − 1) to account
289
for the uncertainty in the analysis introduced by the presence of observational
290
errors (Burgers et al., 1998; Houtekamer and Mitchell, 1998). It is drawn from
291
a normal distribution with zero mean and covariance R. An overview of the
292
symbols used is included as Appendix B.
pro
and H
re-
293
(m)
(m)
The DA correction to the background state xb
is denoted as x(m) ∈ RN .
It is found by minimizing the following cost function for each m (Courtier
295
et al., 1994):
urn al P
294
1 1 T J(x(m) ) = x(m) B−1 x(m) + (d(m) −H M(m) x(m) )T R−1 (d(m) −H M(m) x(m) ). 2 2 (1) (m)
Here, M(m) is the TL model, linearised with respect to M(xb ). M(m),T
297
is the ADJ model. B is the background error covariance obtained as the
298
sample covariance of the ensemble members with localization as described
299
in (Pasmans and Kurapov, 2017). To ensure that B represents dynamics
300
on relatively slow, subtidal and subinertial temporal scales, each ensemble
301
member is time-averaged over the 24h time interval centred at the beginning
302
of the DA window, using the last 12h of the analysis and first 12h of the
303
forecast from the previous window, before it is used in the calculation of B.
304
The minimizer of (1) is sought as a solution of a linear, symmetric and
Jo
296
14
Journal Pre-proof
positive-definite system of equations that can take different forms, e.g., de-
306
pending on whether the solution is sought in a space of size N or D, and on
307
how the system is preconditioned. RBCG proved to be an efficient solver in
308
the data space of dimension D with good convergence (G¨ urol et al., 2014).
309
It finds approximations of x(m) that minimize the cost-function in (1) by
310
solving the system
pro
of
305
(2)
re-
def ˆ (m) ˆ (m) x ˆ (m) = d (I + R−1/2 H M(m) BM(m),T HT R−1/2 )ˆ x(m) = A
311
ˆ (m) = R−1/2 d(m) . ˆ (m) and d where x(m) = BM(m),T R−1/2 x
312
3. Cost Function Minimization
In this section we discuss several approaches to finding an approximation
314
ˆ (m) , the solution of (2), and propose the new, computationally efficient to x
315
cluster search algorithm. We recognize that some mathematical details can
316
be overwhelming to an educated reader who only wants to grasp the idea.
317
For that reason we first provide a “high level” overview in the beginning of
318
this session. This is followed by more formal sections 3.1-3.3 where all the
319
necessary theoretical and algorithmic details are documented.
320
321
322
urn al P
313
(m)
ˆ i , the i-th iteration approximation to x ˆ (m) , is sought in the In RBCG, x ˆ (m) ), where low-dimensional Krylov subspace Ki (d(m) , A
Ki (z, A) = span(z, Az, A2 z, . . . , Ai−1 z). This search space grows by one
dimension per iteration. Let I 0 be the number of iterations necessary to
324
ˆ (m) ˆ (m) . Then obtaining bring x within a certain prescribed error margin of x i
325
ˆ (m) x for each m can require a considerable amount of wall time as well as I0
Jo
323
15
Journal Pre-proof
computational resources. Indeed, every iteration requires the multiplication
327
ˆ (m) with a vector. This demands that for each m a single implementation of A
328
of the ADJ model over the analysis window is run, followed by application of
329
B, and the TL model. To carry this out, a total of Ncores ×M processor cores
330
need to be available in parallel (as described in more detail in section 3.1).
pro
of
326
ˆ (m) , for each m, could be Faster convergence to the exact solution x
332
ˆ (m) achieved by expanding the space in which x is sought with vectors that i
333
ˆ (m) ). Such vectors can be generated at no extra comlie outside Ki (d(m) , A
334
putational cost if we, similarly to Aulign´e et al. (2016), make the assumption
335
that
re-
331
def
M(m) ≈ M(0) = M,
(3)
ˆ (m) ≈ A ˆ (0) def ˆ In that case, the solution of (2) for and consequently A = A.
337
different m can be combined into one system of equations. For each m, the
338
search space will grow by M dimensions per iteration. This will allow to
339
ˆ (m) with the same target accuracy in I < I 0 iterations using approximate x
340
the block diagonal CG method (see section 3.2). This approach will poten-
341
tially exhibit faster convergence, but would still require the same Ncores × M
342
cores per iteration as an ensemble of regular RBCGs. Given presently avail-
343
able resources, this method would be feasible for our relatively small OR-WA
344
forecast system, but it can become prohibitively expensive for larger forecast
345
systems requiring Ncores = O(103 ) (e.g. Kurapov et al., 2017). For these
346
systems, the new cluster search method is introduced (see section 3.3). It
347
also depends on the assumption (3) and involves Ns new direction searches
348
at every iteration, where 1 ≤ Ns M . These new search directions are gen-
349
erated in parallel, requiring Ncores × Ns cores to be available simultaneously.
Jo
urn al P
336
16
Journal Pre-proof
It serves as a compromise between RBCG and block diagonal CG.
351
3.1. RBCG
352
ˆ (m) Using RBCG (G¨ urol et al., 2014), x satisfies i
of
350
ˆ (m) , A ˆ (m) ) ∀w ˆ ∈ Ki (d
pro
(m) ˆ (m) , A ˆ (m) ) : ||ˆ ˆ (m) ˆ x(m) ||E x ∈ Ki (d xi −ˆ x(m) ||E ≤ ||w−ˆ i
(4)
ˆ (m) can be found as a linear combination I.e., the i-th approximation to x
354
of the vectors spanning the i-th Krylov space that minimizes the solution
355
ˆ (m) A ˆ (m) w)1/2 with B ˆ (m) = A ˆ (m) − I = error in the E-norm ||w||E = (wT B
re-
353
356
ˆ (m) R−1/2 H M(m) BM(m),T HT R−1/2 . Then x is uniquely determined as the i
357
ˆ (m) , A ˆ (m) ): ˆ (m) on Ki (d E-projection of x
358
359
urn al P
ˆ i (V ˆ TB ˆ (m) A ˆ (m) V ˆ T )−1 V ˆ TB ˆ (m) A ˆ (m) x ˆ (m) ˆ (m) = V x i i i ˆ (m) ˆ i (V ˆ TB ˆ (m) A ˆ (m) V) ˆ −1 V ˆ TB ˆ (m) d = V i i
(5)
ˆ (m) ˆ i T−1 (V ˆ TB ˆV ˆ i )−1 V ˆ TB ˆ (m) d ˆ (m) x =V i i i i
(6)
or alternatively,
ˆ (m) , A ˆ i is equal to Ki (d ˆ (m) ) and where the column space of V
ˆ TB ˆ (m) V ˆ i )−1 V ˆ TB ˆ (m) A ˆ (m) V ˆ i . Here V ˆ i and Ti depend on m via A ˆ (m) Ti = (V i i
361
(m) ˆ (m) . This yields residuals ˆ ˆ (m) -orthogonal and d ri that are by construction B
362
ˆ i . We refer to the column vectors of V ˆ i as the search directions. As x ˆ (m) to V i
363
is a projection, it is independent of the search directions chosen as long as
364
they span the same space. In RBCG i + 1-th search direction would be
365
ˆ i . Here the i + 1-th search chosen to be E-orthogonal, i.e. conjugate, to V
366
direction is chosen to be equal to ri
Jo
360
(m)
17
ˆ (m) -orthogonal to V ˆ i . I.e. which is B
Journal Pre-proof
ˆ i+1 = [V ˆ i , r(m) ]. In this case, the E-orthonormalization is contained in V i
368
ˆ TB ˆ (m) V ˆ i )−1 in (6). The pseudo-code for this method is included in T−1 (V i
369
Table C.8.
370
3.2. Full Parallelisation: Block Diagonal Conjugate Gradient Method
372
ˆ (m) = A, ˆ B ˆ (m) = B ˆ and (2) for the different m If (3) is assumed then A
pro
371
of
367
can be combined into a single linear system ˆX ˆ = D, ˆ A
374
ˆ (0) , d ˆ (1) , . . . , d ˆ (M −1) ] ∈ RD×M . ˆ = [ˆ ˆ = [d ˆ (1) , . . . , x ˆ (M −1) ] and D where X x(0) , x
re-
373
ˆ i can be found as Similar to (6), the i-th approximation X
ˆi = V ˆ i (V ˆ TB ˆA ˆV ˆ i )−1 V ˆ TB ˆA ˆX ˆ =V ˆ i T−1 (V ˆ TB ˆV ˆ i )−1 V ˆ TB ˆD ˆ X i i i i i
(8)
ˆA ˆV ˆ i. V ˆi = D ˆ if i = 1 and V ˆ i = [V ˆ i−1 , D ˆ − ˆ TB ˆV ˆ i )−1 V ˆ TB where Ti = (V i i
urn al P
375
(7)
376
ˆX ˆ i ] if i > 1. T, V ˆ i are independent of m. The column space of V ˆ i is now A
377
ˆ A). ˆ The advantage here, compared to RBCG, is that the search space Ki (D,
378
ˆ ˆ (m) for each x i , spanned by Vi , now has dimension i × M instead of i. The
379
ˆ i that are no longer B-orthogonal, ˆ ˆ method results in matrices V but B-block
380
ˆ i then v ˆ vq = 0 if |p−q| ≥ M , ˆ p and v ˆ q are two columns of V ˆ pT Bˆ orthogonal: if v ˆ i retrieved in this way are but might be non-zero otherwise. The estimates X
382
the same as those found using the block diagonal CG method (O’Leary, 1980)
383
with B-preconditioning. The pseudo-code for the block diagonal CG is given
384
in Table C.9.
385
3.3. Partial Parallelisation: Cluster Search Method
386
387
Jo
381
ˆ i−1 to V ˆ i in the block diagonal CG method, M In order to expand V
ˆ to a vector are necessary. This will require Ncores × M applications of A 18
Journal Pre-proof
388
cores and can be prohibitively expensive for large systems. Here we introduce
389
the cluster search method that requires Ns M concurrent applications of
ˆ to create the expansion. In this case, we still look for a solution to (7) A
391
ˆ i still given by (8) but with V ˆ i constructed differently. In particular, with X
392
we focus on x ˆ(0) as it is the control run that will be used to produce the
393
operational forecasts and therefore minimization of the error in x ˆi
394
ˆ i is constructed, we momentarily assume that the priority. To explain how V
395
ˆ =U ˆ 0Λ ˆ 2U ˆ T with U ˆ TU ˆ 0 = I is available and require eigendecomposition A 0 0
396
ˆ vq = 0 if |p − q| ≥ Ns similar to block diagonal CG and (ii) ˆ pT Bˆ that: (i) v
pro
of
390
(0)
has top
(0) ˆ i+1 as is the the residual for the control run, ˆri , is in the column space of V
398
case in RBCG for m = 0.
400
(0) ˆ (0) = U ˆ T Aˆ ˆ x(0) and ˆ T ˆr(0) = U ˆ T A(ˆ ˆ x(0) − x ˆ Td ˆ0 = U ˆ 00 = U Define v ˆi ), v 0 0 i 0 0
ˆi = U ˆ0 −A ˆV ˆ i T−1 (V ˆ TB ˆV ˆ i )−1 V ˆ TB ˆU ˆ 0 . Then U i i i
urn al P
399
re-
397
(0) ˆ 0v ˆ x(0) = ˆr(0) − A ˆV ˆ i T−1 (V ˆ TB ˆV ˆ i )−1 V ˆ T Br ˆ (0) = U ˆ iv ˆ 0 = ˆr(0) ˆ 00 (9) U r0 − Aˆ 0 0 i i i =ˆ i i 401
402
403
404
405
406
407
ˆ i to V ˆ i+1 we look for Ns new search vectors of the form s(n) = To expand V P ˆ iΛ ˆ 2 Pn v ˆ 00 with n = 1, 2, . . . , Ns and Pn = ˆ 00 ) with ˆ U ed (ˆ eTd v ed the d∈Dn ˆ
unit vector in direction d. Here Dn is a subset of {1, 2, . . . , D} such that the union of D1 , D2 , . . . , DNs is {1, 2, . . . , D} and Dp and Dq are disjoint if p 6= q. P s ˆ 0v ˆ iv ˆ 00 = v ˆ 00 . Combined with the equality U ˆ0 = U ˆ 00 Consequently, N n=1 Pn v (0)
in (9) this then ensures that ˆri
ˆ i+1 . Thus search vectors of the lies within V
form s(n) satisfy requirement (ii). Furthermore,
Jo
ˆ TB ˆU ˆi = V ˆ TB ˆU ˆ0 −V ˆ TB ˆA ˆV ˆ i T−1 (V ˆ TB ˆV ˆ i )−1 V ˆ TB ˆU ˆ0 V i i i i i i
ˆ TB ˆU ˆ0 −V ˆ TB ˆA ˆV ˆ i (V ˆ TB ˆA ˆV ˆ i )−1 V ˆ TB ˆU ˆ0 = 0 = V i i i i
408
(10)
ˆ i is B-orthogonal ˆ ˆ i and since the Ns new search This shows that U to V 19
Journal Pre-proof
409
ˆ i+1 are linear combinations of the column vectors of U ˆ i , they directions in V
410
satisfy requirement (i). ˆU ˆ i . For i = 0, ˆ TA For the following we also need to be able to estimate U 0
412
ˆi = U ˆ 0 and so U ˆ TA ˆU ˆ 0 = Λ2 , while for i > 0 exact expressions are not U 0
413
ˆ i are the residuals directly available. Instead we observe that the columns of U
414
ˆ 0 =U ˆ 0 in obtained after trying to find a solution to the linear system AX
415
ˆ 0 Λ−2 . Here we the search space Vi . This system has the exact solution U
416
make an ad-hoc assumption that these residuals are, in good approximation,
417
ˆ 0 , i.e. U ˆi ≈ U ˆ 0 Ξ with Ξ diagonal. In this case, Ξ multiples of the columns U
pro
ˆ i. ˆ 0Ξ ≈ U ˆ TU ˆ TU can be estimated as Ξ2 = ΞT U i 0
re-
418
of
411
(0)
ˆ i ) that is defined as the vector in the search space span(V
In RBCG x ˆi
419
minimizes the error (4) in the E-norm. The novel idea behind cluster search
421
is to find a clustering D1 , D2 , . . . , DNs and the associated Ns new search
422
vectors s(n) such that the reduction of the expected error ||ˆ x(0) − x ˆi+1 ||E is
423
larger than can be achieved using any other clustering. Using the properties
424
ˆi ≈ U ˆ 0 Ξi , and the orthonormality of U ˆ 0 , we find of Pn , the estimation U
425
that the E-norm of the expected error for x ˆi+1 can be estimated as (0)
||ˆ x
−
urn al P
420
(0) x ˆi
−
Ns X n=1
ˆ i Pn v ˆ 00 ||2E ≈ αn U
(0)
(0)
Ns X X
−2 00 02 0 2 [(1 − λ−2 d )vd − 2αn vd λd (1 − λd )vd ξd
n=1 d∈Dn +αn2 λ4d (1
002 2 − λ−2 d )vd ξd ],
(11)
where λd and ξd are the d-th element on the diagonal of Λ and Ξ, correspond-
427
ingly. To find the minimum of this function, we set the derivative of (11) as
428
a function of αn to zero and get
Jo
426
α ˆn = (
X
d∈Dn
4 002 2 (1 − λ−2 d )λd vd ξd
X vd0 4 002 2 −1 )( (1 − λ−2 d )λd vd ξd ) vd00 ξd λ2d d∈D n
20
(12)
Journal Pre-proof
n
v0 1 = 00 , v ξd λ2d
where ·n denotes the weighted mean over the cluster Dn with weights (1 −
of
429
(13)
4 002 2 λ−2 ˆ n from (13) back into (11) gives that for our guesses d )λd vd ξd . Inserting α
431
of s(1) , . . . , s(Ns ) the error squared obtains a minimum
pro
430
P s (0) ˆ i Pn v ˆ 00 ||2E ||ˆ x(0) − x ˆi − N ˆnU n=1 α P s P vd02 −2 002 2 4 ≈ N ˆ n2 ] n=1 d∈Dn (1 − λd )vd ξd λd [ vd002 ξd2 λ4d − α P s vd0 = N n=1 Wn varn ( v 00 ξd λ2 ) d
d
P
(14)
−2 4 002 2 d∈Dn (1−λd )λd vd ξd
with varn the weighted variance over cluster n and Wn =
433
the normalization coefficient for the n-th cluster. The K-means clustering al-
434
gorithm (MacQueen, 1967) can now be used to find an approximate cluster-
435
ing that approximately minimizes (14). Once K-means produces a clustering,
436
ˆ iΛ ˆ i+1 = [V ˆ i , s(1) , . . . , s(Ns ) ]. ˆ 2 Pn v ˆ 00 are known and V s(n) = U
urn al P
re-
432
ˆ is not available. Instead it is In reality the eigenvalue decomposition of A
438
used that if R and B are true estimates of the observational and background
439
ˆd ˆ T i (Desroziers et al., 2005). Here h·i denote ˆ = hd error covariance then A
440
the expected value. Approximations to the eigenvectors and eigenvalues of
441
ˆ are then found by calculating the eigenvalue decomposition of A
442
1 ˆ (0) ˆ (1) ˆ (M −1) ][d ˆ (0) , d ˆ (1) , . . . , d ˆ (M −1) ]T [d , d , . . . , d M
443
pseudocode for the cluster search method can be found in Table C.10. An
444
overview of where the cluster search method enters the cost-function mini-
445
mization algorithm is shown in Figure 3.
446
Jo
437
=
1 ˆ ˆT DD M
ˆd ˆ T i. The ≈ hd
Notice that if Ns = 1 there is only one cluster and requirement (i) ensures (0)
447
that the new search direction is equal to ri . Consequently, the clustering
448
method reverts to RBCG described in section 3.1 for m = 0. If Ns = M , 21
Journal Pre-proof
ˆ i constitutes its own cluster and hence the new each column vector of U
450
ˆ i . By construction search directions are multiples of the column vectors of U
451
ˆ i are linear combinations of the i-th residuals from the column vectors of U
452
ˆ i ) is equal in both the different ensemble members. Consequently, span(V
453
the block diagonal CG and cluster search methods.
pro
of
449
In section 5, we will compare convergence rates of RBCG, full parallelisa-
455
tion, and the cluster search methods in the realistic OR-WA system set-up.
456
Before we can proceed with those, we next describe the wind perturbations
457
that will be utilized in the tests of section 5.
458
4. Wind Perturbations
re-
454
Conventionally, multiplicative ensemble inflation (Anderson and Ander-
460
son, 1999) is applied to the ensemble members to compensate for the fact
461
that ensembles generally fail to account for all error sources. Multiplicative
462
ensemble inflation implicitly assumes that the ensemble underestimates the
463
relative growth of the background errors uniformly throughout the model. In
464
the Oregon-Washington region surface currents, the strength of the coastal
465
upwelling (Halpern, 1976; Huyer, 1983), and the location of the fresh water
466
Columbia River plume (Hickey et al., 1998, 2005; Liu et al., 2009) all depend
467
on the wind forcing. Therefore uncertainty in the wind forcing is assumed to
468
be the dominant source of model error. In an attempt to better approximate
469
the structure and strength of the background errors, we have opted for an
470
approach different from multiplicative ensemble inflation. In this approach
471
physically realistic wind perturbations are added to the ensemble members
472
(m = 1, . . . , M − 1) and it is left up to the model physics to translate these
Jo
urn al P
459
22
Journal Pre-proof
wind forcing errors into background errors in the ocean state. Although no
474
comparison with ensemble inflation will be made, we will verify later in this
475
manuscript if adding the wind perturbations helps to avoid ensemble vari-
476
ance shrinking alleviating the need for the ensemble inflation (Hamill and
477
Whitaker, 2005; Li et al., 2009; Whitaker and Hamill, 2002).
pro
478
of
473
The perturbed wind fields for an ensemble members are generated as w(t) = wN AM (t) + wL (t) + wS (t)
(15)
with wN AM (t) ∈ R2Nw the vector containing the meridional and zonal wind
480
velocity components from the NAM model interpolated to the Nw ROMS
481
model surface grid points. Fields wL (t) and wS (t) represent the large-scale
482
and small-scale wind perturbations respectively.
re-
479
For the large-scale perturbations, we use the empirical orthogonal func-
484
0 tion (EOF) decomposition of the series wN AM (t) = wN AM (t) − hwN AM i,
485
where the winds are provided every 6 hr from 1 January 2011 00:00 to 31
486
December 2011 18:00 and hwN AM i is the mean wind field over this period.
487
After the EOF decomposition the NAM wind field can be written as
urn al P
483
wN AM (t) = hwN AM i +
NX EOF i=1
βN AM,i (t)wEOF,i + w⊥ (t),
(16)
where wEOF,i is the EOF mode associated with the i-th largest singular
489
value, w⊥ (t)T wEOF,i = 0 for i = 1, 2, . . . , NEOF and βN AM,i (t) are the EOF
490
expansion coefficients associated with different EOFs and different times.
491
Here, we use 10 EOFs (NEOF = 10) that explain 95% of the variance of
492
wN AM in time. Similarly to H´enaff et al. (2009) and Vervatis et al. (2016),
Jo
488
23
Journal Pre-proof
we define the large-scale wind perturbation to be wL (t) =
NX EOF
βL,i (t)wEOF,i .
i=1
of
493
(17)
The expansion coefficients for the large-scale wind perturbations, βL,i , with
495
specified standard deviation σ ˆEOF,i , are generated by an AR1-process
pro
494
βL,i (t) = σ ˆEOF,i β,i (t)
for t = 0 q ˆEOF,i β,i (t) for t ≥ ∆t. βL,i (t) = cβ βL,i (t − ∆t) + 1 − c2β σ
(18)
Here β,i (t) is drawn from a standard normal distribution, ∆t = 6 h is the out-
497
put time step of the NAM model and correlation coefficient cβ = 0.4 (Milliff
498
et al., 2011). The two dominant wind EOFs scaled by the standard devia-
499
tions of their expansion coefficients in the large-scale wind errors (ˆ σEOF,i ) are
500
shown in Figure 4a,b.
re-
496
The standard deviation of the large-scale expansion coefficients βL,i is
502
estimated based on the differences between the 6-hourly NAM model wind
503
output and the daily, 25 km ASCAT satellite wind product (Figa-Saldaa
504
et al., 2002; NOAA, 2011a), and NDBC buoy (numbers 46089,46015, 46050,
505
46029, 46041) wind observations (NOAA, 2016). The estimate σ ˆEOF,i used
506
is the mode of
urn al P
501
2 p(σEOF,i |wobs ) = R 2 2 p(wS , β L , obs , σEOF,1 , . . . , σEOF,N , σS2 |wobs ) dwS EOF Q EOF 2 2 ×dβ L dobs N j=1,j6=i dσEOF,j dσS ,
(19)
2 the conditional probability distribution for σEOF,i given all the scatterometer
508
and buoy wind observations in the model domain in 2011 mapped to the
509
NAM model output times (vector wobs ). For the buoy observations the time-
510
mapping is done by selecting the buoy wind measurement closest to the NAM
Jo
507
24
Journal Pre-proof
model output time, while the daily ASCAT observations are compared with
512
the NAM model output time on the same day for which the RMSE between
513
ASCAT observations and NAM model output is minimal. The conditional
514
probability distribution in (19) is constructed using a Bayesian Hierarchical
515
Model (BHM) similar to the one used in (Milliff et al., 2011; Wikle et al.,
516
2001). The BHM consists of three stages:
pro
of
511
wobs (tj ) = Htj wtrue (tj ) + obs (tj ) (data stage) P EOF wEOF,i βL,i (tj ) (process stage) wtrue (tj ) = wN AM (tj ) + wS (tj ) + N i=1 (parameter stage)
re-
2 2 σobs , σS2 , σEOF,i
(20)
with the underbar denoting the concatenation of vectors taken at different
518
NAM model output times tj into one vector, e.g. wS = [wS (t1 ); wS (t2 ); . . . ; wS (tNt )],
519
Htj the linear operator that maps the wind velocities at time tj from the
520
model grid to the ASCAT and buoy observation locations, wtrue (tj ) ∈ R2Nw
521
the unknown true wind field at time tj , obs (tj ) the measurement error in
522
the ASCAT/NDBC buoy wind observations, wS (tj ) ∈ R2Nw the error in the
523
small-scale wind field and βL,i (tj ) the expansion coefficient for the i-th EOF
524
mode in the large-scale error in the wind field. Prior distributions of obs (tj ),
525
2 wS (tj ), βL,i (tj ), σS2 and σEOF,i are assumed to be:
urn al P
517
2 obs (tj ) ∼ N (obs (tj ); 0, σobs I)
Jo
wS (tj ) ∼ N (wS (tj ); 0, σS2 I)
2 βL,i (tj ) ∼ N (βL,i (tj ); 0, σEOF,i )
σS2 ∼ IG(σS2 ; aS , bS )
2 2 σEOF,i ∼ IG(σEOF,i ; aEOF,i , bEOF,i )
25
(21)
Journal Pre-proof
where N (x; µ, C) is a normal distribution with mean µ and covariance C,
527
IG(x; a, b) the inverse gamma distribution with parameters a, b (see Ap-
528
pendix D for the details) and and ∼ denotes that a value or vector is
529
randomly drawn from the given distribution. Any spatial structure in the
530
small-scale errors wS (tj ) is neglected. Based on ASCAT validation (Ver-
531
speek et al., 2013) σobs is set to 0.7 m s−1 . We pick aEOF,i =
532
0.1aEOF,i var(βN AM,i ) with Nt the number of days on which ASCAT observa-
533
tions are available and var(βN AM,i ) the variance of the coefficient βi in (16).
534
2 with mode This gives an a priori distribution for σEOF,i
535
2 = These values were chosen such that this mode corresponds to the σ ˆEOF,i
536
0.09 var(βN AM,i ) estimate used by H´enaff et al. (2009) and Vervatis et al.
537
(2016). Similarly, aS and bS are chosen to be aS =
538
2 . given the a priori distribution of σS2 with a mode of approximately σobs
re-
pro
of
526
bEOF,i 1+aEOF,i
bEOF,i =
≈ 0.1 var(βi ).
2 and bS = σobs aS
urn al P
2 NN 20 t w
1 N, 20 t
The conditional probability distribution (19) derived using the BHM
540
above is retrieved using a Gibbs sampler (see Appendix D) and is shown in
541
Figure 5 for the nine dominant EOF modes. Also indicated in Figure 5 is the
542
percentage of the variance in the NAM wind fields explained by each EOF.
543
In addition, the 9% of this variance is shown by dashed lines, as this value
544
was used in other studies to estimate σ ˆEOF,i (H´enaff et al., 2009; Vervatis
545
et al., 2016). Figure 5 shows that the BHM estimate for σ ˆEOF,i is higher
546
than the 9% estimate in all modes except mode one. The difference between
547
the two estimates increases for increasing EOF number. For the higher EOFs
548
(mode 4 and higher), which represent smaller spatial scales in the wind field
549
(not shown), the 9% estimate severely underestimates the contribution of the
550
mode to the error in the wind fields.
Jo
539
26
Journal Pre-proof
The study of scatterometer wind measurements over the Pacific Ocean
552
shows that the power spectral density (PSD) of the wind field scales with κ−ˆγ ,
553
where κ is the wave number and γˆ ≈ 2 (Chin et al., 1998). The PSD of the
554
meridional NAM wind field determined using a Hamming window (Figure 6,
555
solid blue line) decreases faster than this for κ > 0.3 rad km−1 owing to
556
the limited (12-km) NAM resolution. As the NAM model cannot represent
557
the small-scale wind field, probable small-scale wind fields are added to the
558
ensemble members. Following Wikle et al. (2001), it is assumed that the
559
small-scale wind field in (15) can be decomposed into Daubechies-2 wavelets
560
(Cohen et al., 1993):
re-
pro
of
551
wS (t) = γ0
9 X X
561
(n)
(n)
(22)
i
urn al P
n=1
(n)
γi (t)ψi
with γi (t) coming from an AR1-process (n) γi (t)
=
(n) cγ γi (t
− ∆t) +
q
(n)
1 − c2γ σγ(n) i (t)
(23)
Here n indicates the level of the wavelet, with the length scale of the wavelets
563
doubling as the level increases with one and i running over all the wavelets
564
that are available at level n. Similarly to the large-scale wind field, we use
565
cγ = 0.4. This wavelet approach yields small-scale wind perturbations that
566
are local in space and are simultaneously constrained to a limited spectral
567
band. By experimentation, the standard deviation of σγ is chosen such that
568
the dependence of the PSD S on the wave number κ of the total ensemble
569
member wind field scales as S(κ) ∼ κ−2 . This was achieved by setting
570
σγ
571
the variance of the wind speed is (0.55 m s−1 )2 . The PSDs for the wind
Jo
562
(n)
(n)
= exp[1.3(n − 3)](0.5 − 0.5 tanh[0.4(n − 4)]) and picking γ0 such that
27
Journal Pre-proof
fields of the different ensemble members on 8 August 2011 00:00 are shown
573
together with the linear least-square log-log fit to the ensemble mean PSD
574
for κ > 0.1 rad km−1 in Figure 6. The fit confirms that the PSDs have indeed
575
the desired P SD ∼ κ−2.0 relationship.
576
5. Results
577
5.1. Convergence
pro
of
572
The effectiveness of the cluster search algorithm using a different number
579
of clusters is compared with that of RBCG and block diagonal CG. Even
580
though the DA correction x(m) is only calculated after the last inner loop
581
iteration i = I, the cost function (1) can be calculated for each inner loop
582
iteration i if the substitution xi
re-
578
(m)
ˆi = BMT HT R−1/2 x
is made in (1):
urn al P
(m)
(m)
J(xi ) =
1 (m),T ˆ (m) ˆ Bˆ xi x 2 i
ˆ (m) − Bˆ ˆ (m) − Bˆ ˆ x(m) )T (d ˆ x(m) ). + 21 (d i i
(24)
ˆ (m) which is readˆ x(m) = B ˆV ˆ i T−1 (V ˆ TB ˆV ˆ i )−1 V ˆ TB ˆd Using (5) we find that Bˆ i i i i
584
ˆV ˆ i is stored. Using (24) the value of the cost function was ily available as B
585
calculated prior to each inner loop iteration using the cluster search method
586
with different numbers of clusters (1 ≤ Ns ≤ 40). The cost function nor-
587
malized by its value at the start of the minimization is shown in Figure 7a
588
and b, for the windows starting on 31 May and 26 August 2011 respectively.
589
Increase in the number of clusters Ns , and correspondingly the number of
590
search directions at each iteration, consistently improves the rate at which
591
the cost function decreases as the function of the inner loop iteration number.
592
To provide a more quantitative assessment of the advantage of using several
593
search directions in parallel, we compute the speed-up ratio
Jo
583
28
Journal Pre-proof
I(1) , I(Ns )
(25)
of
a(Ns ) =
where I(Ns ) is the number of iterations needed to reach a specific refer-
595
ence level of the cost function J = Jref using the cluster method with Ns
596
new search directions per iteration. I(1) corresponds to RBCG. For Jref , we
597
choose the value in the case using Ns = 4 and I = 12 iterations as this will
598
be adopted later as the standard setup in the long-term system evaluation in
599
Part 2 of this study (Pasmans et al., in preparation). The speed-up ratios are
600
shown in Figure 7c and indicate, e.g., that RBCG (Ns = 1) needs approx-
601
imately 30% more iterations than cluster search with Ns = 4 to reach the
602
same level of cost function reduction. A fit of a 2nd order polynomial to a
603
(dashed black lines in Figure 7c) shows that the coefficient for the quadratic
604
term is negative and significantly different from zero at a 95% significance
605
level indicating that the additional benefit of adding more clusters diminishes
606
as the number of clusters increases.
urn al P
re-
pro
594
Figure 8 compares differences in the initial condition corrections between
608
RBCG (Ns = 1) and the cluster search with Ns = 4 on 26 August 2011.
609
As the time available to perform DA is constrained in operational settings,
610
the minimization in both these two cases is terminated after I = 12 inner
611
loop iterations. The plots on the left show the DA correction calculated for
612
SST, surface velocity and SSH fields with Ns = 1; and the plots on the right
613
show the difference between the DA corrections in the cases with Ns = 4 and
614
Ns = 1. While both methods yield similar large-scale corrections, they differ
615
in details at the scale of geostrophic eddies.
616
Jo
607
For the ensemble members additional dependency on the search space 29
Journal Pre-proof
comes from the fact that when cluster search is used the right hand side of
618
ˆ (7) is replaced by its B-projection on the search space. I.e., on the right-hand
619
ˆ (m) is effectively replaced by V ˆ (m) . For the ˆ i (V ˆ TB ˆV ˆ i )−1 V ˆ TB ˆd side of (7), d i i
620
ˆ (0) lies in span(V ˆ 1 ), but control run this is not an issue as by construction d
621
for the ensemble members this could result in the systematic elimination of a
622
ˆ (m) . Such an elimination would result in DA part of the errors contained in d
623
corrections for the ensemble members that are too small and consequently an
624
ensemble spread that will be too large. To test whether this is a valid concern,
625
the normalized RMSE for each observation type, i.e. the RMS of the elements
626
ˆ (m) associated with one type of observations, is calculated and compared of d
627
ˆ (m) on V ˆ ˆ i with i = 12. If with the RMSE after taking the B-projection of d
628
ˆ (m) lies completely in V ˆ i , as is the case for m = 0, the ratio of the latter d
629
over the former is one. The actual ratio in the experiment is calculated for
630
each ensemble member and each window and the lower bound, upper bound
631
and ensemble mean are shown in Figure 9. Figure 9 shows that, as expected,
632
using the projection can result in the reduction of the RMSE (up to 40%).
633
However, the figure also shows that taking the projection can increase the
634
RMSE. This paradoxical behaviour emerges because the projection uses the
635
ˆ B-inner product, while in the calculation of the RMSE involves the normal,
636
Euclidean, inner product. Taking the mean of the ratios over all ensemble
637
ˆ members shows that increases in the RMSEs created by the B-projection
638
mostly, but not completely, offset the reductions in RMSEs and that the net
639
result is a small decrease in the RMSE of 1.7% for SST, 3.6% for HFR and
640
2.8% for SSH observations. So, the projection effect might indeed result in
641
overestimation of the error variances by the ensemble, but this effect is small.
Jo
urn al P
re-
pro
of
617
30
Journal Pre-proof
642
5.2. Error Reduction
of
Table 1: RMS and the mean of the difference experiment minus observations as shown in Figure 10 and 11 for the different models and over the period 22 April to 28 September
pro
2011.
RMSE
Bias
HFR
SSH
SST
HFR
[◦ C]
[cm s−1 ]
[cm]
[◦ C]
[cm s−1 ]
No DA
1.17
18.3
6.4
-0.18
1.5
Control analysis
0.75
10.5
3.8
-0.04
0.2
Ensemble mean analysis
0.76
9.7
3.9
-0.05
-0.2
Control forecast
0.94
13.2
4.9
-0.07
0.4
Ensemble mean forecast
0.92
11.2
4.8
-0.13
-0.1
urn al P
re-
SST
To test whether the system is effective correcting RMSE not only for the
644
control run but also for the ensemble members, the RMSE between the data
645
used in the assimilation and the nonlinear analyses and forecasts is calcu-
646
lated for the ensemble members as shown in Figure 10. Each line segment
647
represents the RMSE in the analysis (left point) and in the forecast for the
648
subsequent window (right point). Note that the forecast RMSE (right points)
649
are calculated with respect to formally future observations. The En4DVar
650
system is effective in reducing the RMSE: the analysis RMSE for the ensem-
651
ble members exceeds that in experiment No DA (blue line) in less than 4%
652
of the cases. Forecast RMSE for the ensemble members are smaller than No
653
DA forecast RMSE in 73% of cases. The RMSEs for the ensemble members,
Jo
643
31
Journal Pre-proof
however, are consistently larger than those for the control run (green line)
655
as they are forced with perturbed wind fields and corrected with perturbed
656
observations. However, the errors introduced by the perturbations cancel out
657
in the ensemble mean. Indeed, the ensemble mean RMSE lies below that of
658
the ensemble members. Note that in the ensemble Kalman filter there is no
659
“control run” and the ensemble mean is used as the best estimate (Evensen,
660
1994). We could have used the same approach, but the additional commu-
661
nication between the computational nodes required to calculate the mean
662
ocean state would have increased the wall time significantly. Table 1 shows
663
that the RMSE of the control run is on par with that of the ensemble mean,
664
with the exception of the RMSE in the HFR observations after 14 August
665
2011 (see Figure 10b). Hence, our choice to pick the control run over the
666
ensemble mean to produce the forecasts will have only a limited negative
667
impact on the forecast accuracy.
urn al P
re-
pro
of
654
Figure 11 shows the observation-model bias in experiment No DA, the
669
ensemble members, the control run, and the ensemble mean. As the along-
670
track mean is removed from both the altimetry observations and their model
671
equivalents prior to assimilation (see section 2), the along-track mean of
672
both the assimilated altimetry observations and their model equivalents is
673
by construction zero. Consequently, the bias along each track, and thus in
674
general, is zero and is therefore not included in Figure 11. The bias in the
675
HFR observation shows a spread around zero for both the ensemble members
676
as well as the control run forecasts. The bias in the forecasts predictions for
677
the SST observations, however, has a negative tendency with particularly
678
large negative biases during the periods 13-16 May, 5-8 June, 21-24 July,
Jo
668
32
Journal Pre-proof
26-29 August, 25-28 September 2011. This results in an overall negative
680
bias over the whole period as shown in Table 1. It is indicative of either
681
insufficient surface heating in the model, too much mixing in the upper layer,
682
or a positive bias in the satellite observations. Further verification against
683
independent in-situ observations is described in Part II.
684
5.3. Ensemble Reliability
pro
of
679
If the ensemble statistics are truly representative of the background error
686
statistics, the ensemble is said to have high reliability. A rank diagram is
687
a diagnostic that can be used to test this (Hamill, 2001). Figure 12 shows
688
rank diagrams for the three different types of observations. The steps to
689
construct these are as follows: (a) sample each ensemble member forecast
690
at the observation locations and times and add a random observation error,
691
(b) for each observation, count the number of ensemble forecasts that are
692
lower than the measured value, (c) by definition, this number plus one is the
693
rank of the observation, (d) count the frequency of each rank and divide by
694
the total number of observations to determine the normalized frequency. If
695
the ensemble is reliable, the rank diagram should be flat (Hamill, 2001). The
696
95%-confidence interval for the normalized frequency of a reliable ensemble is
697
shown as dashed lines in Figure 12. The figure shows that the ensemble relia-
698
bility is different for different fields. In the rank diagram for SST (Figure 12a)
699
there is no distinctive peak. Instead the rank diagram has an upward slope.
700
This can be due to the negative bias in the ensemble (see Figure 11a). For the
701
HFR observations, mid-range ranks are relatively more abundant than the
702
tails (see Figure 12b). This indicates that either the spread in the ensemble
703
is larger than the standard deviation of the HFR background errors or that
Jo
urn al P
re-
685
33
Journal Pre-proof
the the observational error magnitude is overestimated. The opposite is the
705
case for SSH observations. Here the U-shape (Figure 12c) implies that the
706
forecast ensemble underestimates the magnitude of the background errors.
of
704
Finally, estimates for the background error and observational covariances
708
used in the DA system are compared with estimates obtained from the in-
709
novation statistics. The relations between innovation statistics and error
710
variances are given by (Desroziers et al., 2005):
E (0) (y − HM(xb ))2d = (HMBMT HT + R)dd D E (0) (0) (y − HM(xb ))d (y − HM(xb ) − HMx(0) )d = (R)dd D E (0) (HMx(0) )d (y − HM(xb ))d = (HMBMT HT )dd
re-
D
pro
707
(26) (27) (28)
where 1 ≤ d ≤ D is the index of the observation and h·i denotes the expec-
712
tation value, (·)d the d-th element of the vector, and (·)dd the d-th element
713
on the diagonal of a matrix. The expectation values on the left-hand side
714
of (26)-(28) are approximated by averaging over all observations of the same
715
type in each window. These estimates are shown as blue lines in Figure 13
716
(where the top, middle, and bottom plots are for (26), (27), and (28) corre-
717
spondingly). An approximation to the right-hand side of (27) is obtained by
718
averaging (R)dd over all the observations of the same type. For the right-hand
719
side of (28) an approximation is obtained by doing the same for
urn al P
711
(Bens )dd
=
=
(HMBMT HT )dd = M1−2 PM −1 (m) 1 m=1 HM(xb ) M −1
Jo
HM(xb )
def
PM −1 m=1
(m)
(HM(xb ) − HM(xb ))2d (29)
720
Error standard deviations based on these estimates are displayed as dashed
721
black lines in Figure 13. 34
Journal Pre-proof
Equations (26)-(28) only hold if R and B correctly represent the true
723
observational and background error covariances. Figure 13 shows to what
724
extend this is the case in our system. Figure 13g,h,i show that the ensemble
725
error standard deviation (black line) grows over time for all three types of
726
observations and hence that the wind perturbations are sufficient to prevent
727
the ensemble spread from collapsing even without ensemble inflation. For
728
SST the error standard deviation estimates from the innovation statistics
729
are in agreement with the specified standard deviations (see Figure 13a,d,g).
730
Error standard deviation estimates for HFR observations are consistent up
731
to 1 July 2011 too. After 1 July 2011, however, the total error standard
732
deviation estimate is too large (Figure 13b). The standard deviations for
733
the observational errors agree (Figure 13e), so the overestimation is due to
734
the fact that after 1 July the ensemble background error standard devia-
735
tion estimate (black line Figure 13h) is larger than the standard deviation
736
error estimate based on the forecast-observation differences (blue line in Fig-
737
ure 13h). This finding is consistent with the shape of the rank histogram
738
in Figure 12b indicating overdispersion in the ensemble. Further investiga-
739
tion (not shown here) indicates that the difference between the estimates for
740
background error standard deviation can be attributed nearly entirely to the
741
sparse HFR observations taken far offshore (depth > 2 km). Closer to the
742
shore (depth < 1 km), where numerous, closely-spaced, HFR observations are
743
available for DA to reduce the background error, the ensemble estimates for
744
the observational and background error standard deviations and those based
745
on (26)-(28) show good agreement (Figure 13b, light blue/grey). Initially,
746
the total SSH error standard deviation estimate from R and Bens (black line
Jo
urn al P
re-
pro
of
722
35
Journal Pre-proof
in Figure 13c) is smaller than the total SSH error standard deviation from the
748
innovations (blue line in Figure 13c). One could put forward the hypothesis
749
that this is due to the fact that the standard deviation from Bens (black line
750
in Figure 13i) is smaller than the observational error estimate used in the
751
DA (black line in Figure 13f) resulting in small SSH DA corrections. This
752
would be a satisfactory explanation near the shore (depth < 1 km) where
753
ˆ and B ˆ + R remain nearly constant over time (Figensemble estimates for B
754
ure 13i, light blue/grey). However, in general the standard deviation from
755
Bens keeps increasing over time, growing beyond the specified observational
756
error standard deviation of 2 cm, and (27) is not satisfied: the innovation
757
statistics estimate (blue line in Figure 13f) for the observational error stan-
758
dard deviation continues to lie above the specified standard deviation (black
759
line in Figure 13f). Either we have underestimated the observational error
760
standard deviation while specifying R or the structure of the background
761
errors is such that the system cannot remove them effectively.
762
6. Conclusions and Discussion
urn al P
re-
pro
of
747
The development of ensemble-based 4DVAR systems has been one of
764
the main focus areas in numerical weather prediction. Similarly, there is a
765
rationale to applying ensemble-based 4DVAR systems for oceanic prediction.
766
Utility of a static B can be limiting in shelf applications. The OR-WA
767
forecast system, used in this study as a test ground for En4DVar, is a good
768
example where model error statistics are influenced by high temporal and
769
spatial ocean state variability. Before En4DVar can be applied successfully
770
to the OR-WA or any other coastal system, many technical details must be
Jo
763
36
Journal Pre-proof
worked out as outlined in this manuscript. The newly developed En4DVar
772
systems need to go through statistical tests for self-consistency using actual
773
observations, which help us understand the system behaviour and potential
774
biases in the data.
of
771
Critical to a successful implementation of En4DVar for large prediction
776
systems will be the development of time-efficient cost function minimization
777
algorithms that take advantage of the massive parallel computer architec-
778
tures. The cluster search method developed and tested in this study explores
779
Ns search directions in parallel at each inner loop iteration. It was found that
780
using a relatively small number of parallel direction computations, Ns = 4,
781
can reduce the wall clock time by 30% compared to RBCG to achieve the
782
same level of cost function reduction. It can be interesting to see in future
783
studies whether combination of this method with saddle point algorithms
784
(Rao and Sandu, 2016; Fisher and G¨ urol, 2017) can deliver an even better
785
4DVAR performance given the same limited number of cores.
urn al P
re-
pro
775
Our system did not employ ensemble inflation (e.g. Anderson, 2001; An-
787
derson and Anderson, 1999; Hamill et al., 2001), but generated background
788
errors by perturbing the wind fields in the ensemble members in a realistic
789
way. Although no comparison was made with ensemble inflation, and thus
790
it cannot be concluded that wind perturbations are superior to ensemble in-
791
flation, no collapse of the ensemble spread was observed and therefore wind
792
perturbations alleviated the need for ensemble inflation. The common as-
793
sumption (H´enaff et al., 2009; Palmer et al., 2009; Vervatis et al., 2016) that
794
the variance of the wind errors is proportional to the natural time-variability
795
was found to be unrealistic. This possibly due to the atmospheric model
Jo
786
37
Journal Pre-proof
being less able to represent small scales, or due to the inability to represent
797
all possible small-scale error modes correctly with a very limited set of EOFs.
798
Instead we found using a BHM that the wind errors increase in proportion
799
to the natural variability as the spatial scale of the wind error decreases.
800
Based on this, we agree with the findings of Whitaker and Hamill (2012)
801
that additive inflation is more suitable to representing model errors, like the
802
errors in wind forcing, than multiplicative inflation in which error variances
803
are assumed to be proportional to the temporal variance in the signal.
pro
of
796
Even though the En4DVar system was effective in reducing forecast er-
805
rors compared to the case No DA, the rank diagram analysis suggests that
806
the ensemble fails to represent the background error statistics perfectly: the
807
ensemble overestimates the spread in the surface velocity background errors,
808
while it underestimates the spread in the SSH background errors (see Fig-
809
ure 12). Although the rank diagrams in Figure 12 are not uniform, the bias
810
and the maximum/minimum frequency-ratio of the diagrams is not excep-
811
tionally large compared to the rank diagram analyses in ensemble Kalman
812
filter DA studies (e.g. Cookson-Hills et al., 2017; Fujita et al., 2007; Leeuwen-
813
burgh, 2007; Meng and Zhang, 2008).
urn al P
re-
804
This concludes the introduction of the new En4DVar system for the OR-
815
WA system and evaluation of the error. In Part II of this study Pasmans et al.
816
(in preparation) we will discuss if En4DVar yields better quality predictions
817
than the traditional 4DVAR with balance operator background covariance
818
currently used in the operational OR-WA system.
Jo
814
38
Journal Pre-proof
819
7. Acknowledgements This work was supported by the National Oceanic and Atmospheric Ad-
821
ministration (NOAA) Coastal Ocean Modelling Testbed (COMT) [grant
822
number NA13NOS0120139], the NOAA Quantitative Observing System As-
823
sessment Program (QOSAP), Integrated Ocean Observing System / North-
824
west Association of Networked Ocean Observing Systems (IOOS/NANOOS)
825
[grant number NA16NOS0120019], the National Aeronautics and Space Ad-
826
ministration (NASA) SWOT Science Definition Team project [grant num-
827
ber NNX13AD89G]. This work used the Extreme Science and Engineering
828
Discovery Environment (XSEDE) [allocation TG-OCE160001], which is sup-
829
ported by the National Science Foundation [grant number ACI-1548562].
830
The views, opinions, and findings contained in this paper are those of the
831
authors and should not be construed as an official NOAA or U.S. government
832
position, policy, or decision. We would like to thank the four anonymous re-
833
viewers for their very motivating and insightful comments.
Jo
urn al P
re-
pro
of
820
39
urn al P
re-
pro
of
Journal Pre-proof
Figure 3: Overview of two (blue and green) inner loop iterations of the 4DVAR algorithm using the cluster search method. (1) For the control run and each ensemble member the residuals are normalized by the observation error standard deviation, (2) the EOF modes ˆ (3) and singular values of the residuals are determined as they provide an estimate of B, the EOF modes are combined using the cluster search algorithm into M 0 search directions such that the search direction sum to the residual of the control run, (4) the ADJ, B, and TL are applied to each search direction, (5) for each ensemble member a solution in the shared search space is sought and new residuals are calculated (6) in the end the ADJ and
Jo
B are applied to each solution, generating a correction in the model space.
40
D
pro
of
Journal Pre-proof
E
/ D W L W X G H
>m s 1 @
urn al P
re-
/ R Q J L W X G H
F
>m s 1 @
/ R Q J L W X G H
>m s 1 @
/ R Q J L W X G H
Figure 4: (a) σ ˆEOF,1 wEOF,1 , first EOF of the wind field scaled by the standard deviation of the wind perturbations for this mode, (b) σ ˆEOF,2 wEOF,2 the second EOF of the wind field scaled by the standard deviation of the wind perturbations for this mode, (c) wS , example of a small-scale wind error field for one time. Colour scale shows the wind error
Jo
speed of the different wind fields in m s−1 .
41
0 R G H
0 R G H
9 D U L D Q F H >104 m2s 2 @
0 R G H
9 D U L D Q F H >104 m2s 2 @
0 R G H
urn al P
0 R G H 0 R G H
0 R G H
0 R G H
of
0 R G H
pro
re-
3 U R E D E L O L W \ G H Q V L W \ >10 5 m 2s2 @ 3 U R E D E L O L W \ G H Q V L W \ >10 5 m 2s2 @ 3 U R E D E L O L W \ G H Q V L W \ >10 5 m 2s2 @
Journal Pre-proof
9 D U L D Q F H >104 m2s 2 @
Figure 5: A priori (red) and posteriori (blue) probability density distributions for the vari2 ance of the EOF coefficients σEOF,i . The variance for which the blue distribution attains
Jo
2 its maximum is used as estimate σ ˆEOF,i . For reference, the values based on estimates used
previously in literature, σ ˆEOF,i is 30% of the standard deviation in βN AM,i (t), is marked by the vertical, black dashed line.
42
pro
re-
urn al P
3 6 ' >m3s 2 @
of
Journal Pre-proof
: D Y H Q X P E H U >radkm 1 @
Figure 6: Power spectral density of the meridional wind field as a function of the meridional wave number from the interpolated NAM wind field (blue) and the perturbed wind fields of the ensemble members (grey) on 8 August 2011 00:00. Also shown are the fits to the mean of the ensemble power spectral densities (dashed black line) and the NAM power
Jo
spectral density (dashed blue line) and the 95% confidence interval for the PSD estimates.
43
, Q Q H U O R R S L W H U D W L R Q i
re-
1 R U P D O L ] H G F R V W I X Q F W L R Q
1 R U P D O L ] H G F R V W I X Q F W L R Q
E : L Q G R Z V W D U W L Q J R Q $ X J Ns Ns Ns Ns Ns Ns Ns
pro
D : L Q G R Z V W D U W L Q J R Q 0 D \ Ns Ns Ns Ns Ns Ns Ns
of
Journal Pre-proof
, Q Q H U O R R S L W H U D W L R Q i
F 5 D W L R U H T X L U H G L W H U D W L R Q V 0 D \ $ X J
urn al P
5 D W L R L W H U D W L R Q V
Ns
Figure 7: The cost function (5) versus the number of inner loop iterations using different levels of parallelisation Ns for the window starting (a) on 31 May 2011 and (b) on 26 August 2011. For reference, the value of the cost function after 12 inner loop iterations with Ns = 4 is marked by a dashed black line. The cost function values are normalized
Jo
by the value before the start of 4DVAR. Panel (c) shows the speed-up ratio a(Ns ) (25).
44
Journal Pre-proof
' $ F R U U H F W L R Q Ns
' L I I H U H Q F H Ns
D 6 6 7
> &