Multidomain, sparse, spectral-tau method for helically symmetric flow

Multidomain, sparse, spectral-tau method for helically symmetric flow

Accepted Manuscript Multidomain, sparse, spectral-tau method for helically symmetric flow M. Beroiz, T. Hagstrom, S.R. Lau, R.H. Price PII: DOI: Refer...

814KB Sizes 0 Downloads 77 Views

Accepted Manuscript Multidomain, sparse, spectral-tau method for helically symmetric flow M. Beroiz, T. Hagstrom, S.R. Lau, R.H. Price PII: DOI: Reference:

S0045-7930(14)00233-3 http://dx.doi.org/10.1016/j.compfluid.2014.05.028 CAF 2576

To appear in:

Computers & Fluids

Received Date: Revised Date: Accepted Date:

20 January 2014 23 May 2014 24 May 2014

Please cite this article as: Beroiz, M., Hagstrom, T., Lau, S.R., Price, R.H., Multidomain, sparse, spectral-tau method for helically symmetric flow, Computers & Fluids (2014), doi: http://dx.doi.org/10.1016/j.compfluid.2014.05.028

This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.

MULTIDOMAIN, SPARSE, SPECTRAL-TAU METHOD FOR HELICALLY SYMMETRIC FLOW M. BEROIZ, T. HAGSTROM, S. R. LAU, AND R. H. PRICE

Abstract. We consider the application of a multidomain, sparse, and modal spectral-tau method to the helically reduced Navier Stokes equations describing pipe flow. This work (i) formulates the corresponding modal approximations, (ii) describes improved boundary conditions for the helically reduced equations, and (iii) constructs iterative solutions of the corresponding elliptic problem that arises in the reduction. Regarding (iii), we also present and test a method for preconditioning the matching conditions between subdomains, a method based on statistical sampling and the interpolative decomposition. Although the following application is only discussed in our concluding section, a partial motivation for this work has been our ongoing development of similar spectral methods for the construction binary neutron star spacetimes.

1. Introduction and preliminaries 1.1. Introduction. Spectral methods are well-developed for the solution of complex flow problems in Cartesian, cylindrical, and spherical domains; see, for example, the monographs [1, 2]. However, important flow structures in pipes and jets often exhibit exact or approximate helical symmetry. Consider, for example, the travelling wave solutions constructed by Gertsenshtein and Nikitin [3], Pringle and Kerswell [4], and Smith and Bodonyi [5]. In this paper we develop an efficient spectral-tau method for solving the helically reduced Navier Stokes equations describing spiral-wave pipe flow (which we often informally refer to as the “pipe problem”). Applications will be considered in subsequent works. The techniques we apply in this paper to iteratively solve the helically reduced Navier-Stokes equations are as follows: (i) tau-integration sparsification [7, 6], (ii) block Jacobi preconditioning adapted to our modal approximations, and (iii) interpolative decomposition via stochastic sampling [8]. The following paragraphs briefly review each of these techniques. Following this review, we discuss our approach in the context of modern preconditioning techniques [9, 10] for elliptic equations. Spectral tau methods [11] constitute a basic approach for approximation of differential equations. Differential operators are realized, as is standard for a modal spectral method, via recursion relations applied to the modal coefficients. In a tau method, the spectral basis functions are not adapted to the particular boundary conditions enforced in the problem; rather, supplementary relations between the modal coefficients (the “tau conditions”) are introduced to enforce the boundary conditions. Tau integration-sparsification [7, 6] (called “integration preconditioning” in these references) is a particular tau formulation in which differential operators are undone using integration realized through recursion relations. For certain operators this formulation results in sparse matrices. It turns out that the integration step also reduces the number of equations, so that there are fewer equations than the Corresponding author: S. R. Lau. Email: [email protected], Phone: +15052774613, Fax: +15052775505. 1

number of unknown modal coefficients. The extra information needed to complete the system is then exactly equivalent to supplying integration constants. In a physical problem this information comes in the form of auxiliary conditions (say boundary conditions) enforced as tau conditions. Tau integration-sparsification [7, 6] was originally developed for ordinary differential equations with rational coefficients, a setting in which the technique yields sparse, banded, wellconditioned system matrices. However, for partial differential equations the benefits of the method are not obvious a priori. This partly motivated our exploration of tau integrationsparsification with the PDEs in Refs. [12, 13]. In two and three dimensions, we found that while tau integration still yields sparse linear systems (for certain operators), conditioning issues prove more complicated. Nevertheless, [13] showed that further preconditioning on top of integration sparsification afforded drastic improvements in conditioning. Due to the results of our explorations, in the PDE context we now refer to the technique as integration sparsification rather than integration preconditioning. However, sparsification is only obtained for certain classes of operators. The further preconditioning mentioned in the previous paragraph, our second technique, is block-Jacobi preconditioning. While this is of course a standard preconditioner, its precise formulation in our sparse modal approach requires some description. We give such a description below for the 2d problems considered in this paper. However, we also point out that for the 3d helically reduced wave equation, essentially the same implementation of block-Jacobi preconditioning proved extraordinarily effective [13]. As we will see, having an already effective “rough” or “base” preconditioner at the ready, is necessary if we are subsequently to correct this preconditioner through interpolative decomposition. The third ingredient, added for the first time in this paper, is an additional preconditioning step in which the modern technique of interpolative decomposition [8] is employed to “correct” the block Jacobi preconditioning, thereby obtaining an overall preconditioner which is closer to the exact inverse of the full system matrix. As we will see, this correction is chiefly responsible for preconditioning the equations responsible for the joining or “gluing” of conforming subdomains. We show empirically that the corrected preconditioner scales when increasing the number of subdomains (annuli in our examples). A similar strategy has been considered in [14]; however, that reference only considered a Poisson-type problem posed on two adjacent rectangular subdomains, each discretized via the finite element method. Here we consider the first application of the strategy to spectral elements in the context of many subdomains and for a nonlinear problem. Moreover, as we shall see, the strategy makes use of sparsity, which is not present in typical (modal or nodal) spectral element formulations. The preconditioning technique we examine in this paper is experimental. Therefore, here we attempt to put our work in context. From both theoretical and implementation standpoints, preconditioning techniques [9, 10] for elliptic problems are highly developed for finite element and high-order spectral element methods. In particular, iterative substructuring methods are designed to handle conforming (non-overlapping subdomains). These are based on the use of the Schur complement to eliminate degrees of freedom which are interior to a subdomain, and lead to multilevel schemes which scale with the number of subdomains. However, we do not know how to exploit Schur complements in the context of our modal approach in which subdomain matching is expressed through tau-conditions.

2

The next subsection describes the helically reduced Navier Stokes equations. Section 2 presents the two scalar problems meant to elucidate the solution process. This section provides details for our block Jacobi and statistically corrected preconditioning, in addition to the results of several numerical tests based on the scalar problems. Throughout, our experiments have been carried out in serial with a 3.00GHz Intel(R) Core(TM)2 Duo CPU E8400 with 8GB of RAM (although only a fraction of this has been used). Nevertheless, Section 2 also comments on parallelization of the corrected preconditioner. Section 3 discusses our numerical methods for solving the full nonlinear pipe problem and presents the results of numerical tests. This section also presents a new and more efficient way of implementing boundary conditions for the pipe problem. Some of the details from Sections 1.2 and 2 are relegated to four Appendices. Our concluding section summarizes our results and describes our next steps. 1.2. Helically reduced Navier Stokes equations. We use subscripts both to indicate partial derivatives and (orthonormal) cylindrical components of vector fields. Although this dual practice may be confusing, the former case always pertains to scalar variables (e.g. u0 and ψ with derivatives ψr , u0ϕ , etc.) and the latter to vector fields (e.g. u with components ur , uθ , and uz ). Partial derivatives of vector components are alway explicitly written out (e.g., ∂ur /∂z). The incompressible Navier-Stokes system is ∂u + u · ∇u + ∇p = νΔu, ∇ · u = 0, (1) ∂t where u is the fluid velocity three-vector, p is the pressure, ν is the constant viscosity, and Δ = ∇2 is the standard Laplacian. In terms of cylindrical coordinates (r, θ, z), we introduce ϕ = θ + αz and assume a solution to (1) of the form (2)

u = u(t, r, ϕ),

p = P0 (z) + q(t, r, ϕ),

P0 (z) = −Qz,

where α is a constant. The helical reduction here really means that we assume a solution of the form w(t, r, θ + αz) = u(t, r, θ, z) and we use the symbol u in place of w. We introduce the vector 1 γ2 = (3) h = γ 2 (ez − αreθ ), 1 + α2 r 2 which obeys the following identities (each easily shown): (i) h · h = γ 2 ,

(ii) ∇ · h = 0,

(iii) ∇ × h = −2αγ 2 h,

(iv) h · ∇ϕ = 0.

Identity (iii) shows that h is a Beltrami vector field. We collect two other useful results concerning the Beltrami vector field h. First, since u = u(t, r, ϕ) satisfies ∇ · u = 0, there exist scalar functions u0 = u0 (t, r, ϕ) and ψ = ψ(t, r, ϕ) such that (4)

u = u0 h + ∇ψ × h.

Second, for any scalar function ψ [which need not be the one in (4)] we have (5)

∇ × (∇ψ × h) = −hγ −2 Δ∗ ψ,

where the action of the pipe Laplacian Δ∗ is   1 ∂2ψ 1 ∂ ∗ 2 ∂ψ rγ + 2 2. (6) Δψ ≡ r ∂r ∂r r ∂ϕ Here γ 2 ≡ 1/(1 + α2 r2 ). Appendices B and C respectively verify (4) and (5). 3

The original equations (1) are three evolution equations (equations for time development of the solution) and one constraint involving three velocity variables u and the pressure p. Helical reduction will yield two evolution equations and one constraint for three new variables: u0 , ψ, and η ≡ γ −2 h · ω, where ω = ∇ × u is the vorticity. In particular, we find evolution equations for u0 and η, as well as an auxiliary equation for ψ. We derive these equations in the Appendix, but here express them as follows: 1 1 ∗ (7a) Δ ψ + 2αu0 + 2 η = 0 4 γ γ    1 ∗ 1 ∂ψ ∂u0 ∂ψ ∂u0 1  (7b) Δ u0 + 2αη + 2 − = 2 u0t − Q 4 γ νγ r ∂r ∂ϕ ∂ϕ ∂r νγ   2 2α 1 ∂ψ ∂η ∂ψ ∂η 2α2 ∂ψ ∂u0 1 ∗ Δ η − (7c) u + − + η = 0 γ4 ν ∂ϕ νγ 2 r ∂r ∂ϕ ∂ϕ ∂r ν ∂ϕ  1 2α  u0t − Q + 2 ηt . ν νγ These equations, essentially those derived by Landman [15], hold on the disk (15) with a “wall boundary condition” u = rΩeθ at r = rmax . The pipe rotation rate Ω here is unrelated to the parameter in Eq. (16) below. Owing to the helical symmetry, we have ∇ψ = ψr er + r−1 ψϕ eϕ + αψϕ ez , whence (4) becomes (8)

u = r−1 ψϕ er − (γ 2 ψr + αrγ 2 u0 )eθ + (γ 2 u0 − αrγ 2 ψr )ez .

Therefore, the wall boundary condition enforces  2π 2 ψ(t, r, ϕ)dϕ = 0 (all at r = rmax ) , (9) u0 = −αr Ω, ψr = −rΩ, ψϕ = 0, 0

where we add the last condition since ψ is always differentiated in (7) and the remaining boundary conditions. In effect, we have a homogeneous Dirichlet condition ψ = 0 at r = rmax on the stream function, in addition to the Neumann condition ψr = −rΩ. On the other hand, no boundary condition is imposed on the vorticity variable η. Landman [15, 16] has studied (7) as a time evolution problem, but here our interest lies in viewing (7) as an elliptical system, in which case the above wall boundary boundary conditions (9) completes the associated boundary value problem. This viewpoint is akin to the one adopted in constructing helically symmetric binary neutron stars; see the concluding section. One scenario in which this alternative viewpoint arises is implicit or implicit-explicit (IMEX) evolution of (7); another is the enforcement of time-dependence consistent with spiral waves. Above we have assumed that the three-dimensional solution depends on θ and z only through ϕ = θ +αz, where the constant α is the pitch of the spiral wave being studied. Let us now further assume dependence on θ, z, and t only through (10)

ζ = θ + α(z − ct),

where the wave has axial speed c. Subject to this assumption, time derivatives on the right-hand sides of (7b,c) can be expressed in terms of ϕ derivatives via the replacements (11)

u0t → −αcu0ϕ ,

ηt → −αcηϕ .

With these replacements made in the right-hand sides, we get a nonlinear system of equations for ψ, u0 , η, and c in the independent variables r and ϕ only. Here α and Q are viewed as freely chosen and fixed. If c is treated as a new unknown, then the system (7), with the above 4

replacements in (7b,c), needs to be augmented with a supplemental algebraic equation, a “phase condition” [17]. Such spiral waves are also the eigensolutions to the eigenvalue problem stemming from linearization about the basic stationary solution to (7), (12)

u=U ≡

Q 2 − r2 )ez + rΩeθ , (r 4ν max

1 p = −Qz + Ω2 r2 2

or, equivalently,

    2 2 2 r2 αQrmax αQ 4 rmax αQrmax ψ=Ψ≡− (13a) Ω+ + r + Ω+ 2 4ν 16ν 2 8ν Q 2 u0 = U0 ≡ (13b) − r2 ) − αr2 Ω (r 4ν max αQ 2 (13c) r η = H ≡ 2Ω − 2ν 1 q = Ω2 r2 . (13d) 2 Consider perturbations (δΨ, δU0 , δH) of the basic solution (13) to (7). We assume that the nearby solution (Ψ + δΨ, U0 + δU0 , H + δH) obeys the same boundary conditions as (Ψ, U0 , H), so that the perturbations obey homogeneous boundary conditions. Appealing to both the stationary equations for (Ψ, U0 , H) and the ϕ-independence of this solution, dropping quadratic terms in the perturbations, and making the replacements δU0t → −αcδU0ϕ and δHt → −αcδHϕ , we cast (7) into the system 1 ∗ 1 Δ δΨ + 2αδU0 + 2 δH = 0 4 γ γ   1 ∗ 1 ∂Ψ ∂δU0 ∂δΨ ∂U0 αc (14b) Δ δU0 + 2αδH + 2 − = − 2 δU0ϕ 4 γ νγ r ∂r ∂ϕ ∂ϕ ∂r νγ   2 2α 1 ∂Ψ ∂δH ∂δΨ ∂H 2α2 ∂δΨ ∂δU0 1 ∗ Δ H − (14c) U + − + H = 0 γ4 ν ∂ϕ νγ 2 r ∂r ∂ϕ ∂ϕ ∂r ν ∂ϕ αc 2α2 c − δU0ϕ − 2 δHϕ . ν νγ This is a generalized eigenvalue problem of the form HδU = cKδU , and we consider the numerical solution of this problem below. (14a)

2. Two scalar problems Although our interest lies with the pipe Laplacian (6), this section considers a second scalar operator stemming from the 2d helically reduced wave equation (HRWE). This operator affords a more challenging test of our preconditioning methods, and also serves as a foil for the pipe Laplacian (6). We consider the problems associated with both operators in this section, because their contrast elucidates our ideas on preconditioning. Both problems are posed on a disk (15)

D = {x ∈ R2 : 0 ≤ |x| < rmax }.

Cartesian coordinates x = r cos ϕ, y = r sin ϕ are associated in the usual way with polar coordinates r, ϕ. Figure 1 shows a decomposition of D into a central square and annuli. 5

The number Nann of concentric annuli can be chosen to meet the needs of the computation; Fig. 1 depicts Nann = 2 for simplicity. The square subdomain S fills the region of the disk center, thereby avoiding troublesome coordinate issues. We have chosen such a domain decomposition since, due to conjectured radial layers [5], the modeling of spiral waves in pipe flow likely requires such control over the radial resolution. A second motivation involves similar aspects of the 3d domain decomposition needed to model neutron stars; see the concluding section. 2.1. Statement of the scalar boundary value problems. Our first problem is the 2d helically reduced wave equation (HRWE) [12]. For this case, the angle on D is1 ϕ = φ − Ωt, where t, r, φ are the time-spherical polar coordinates associated with an inertial frame. Therefore, for this problem our Cartesian coordinates x = r cos ϕ, y = r sin ϕ are not fixed in an inertial frame, but instead are fixed in a frame that rotates at angular velocity Ω with respect to an inertial frame. Moreover, as is appropriate for a “two-center problem”, we will assume that the rotation center x, y = −a, −b is not the coordinate center x, y = 0, 0 of D, as illustrated in Fig. 1. The equation is then most easily expressed in the above rotating Cartesian coordinates. In these coordinates the wave operator, L = ∇2 − ∂t2 , subject to helical reduction, takes the form (16)

L ≡ ∂x2 + ∂y2 − Ω2 [(x + a)∂y − (y + b)∂x ]2 .

The problem to solve is therefore (17)

Lψ = g, x ∈ D;

ψ = h, x ∈ ∂D,

where and g and h are prescribed sources. We sometimes refer to (16) as the L-problem. Our second problem is the pipe Poisson problem [cf. (6)]: (18)

Δ∗ ψ = g, x ∈ D;

ψ = h, x ∈ ∂D.

We sometimes refer to (18) as the Δ∗ -problem. While the physical natures of our two problems are different, both arise from the mixing of periodic variable (azimuthal angle) with 1The

parameter Ω here is unrelated to the Ω appearing in (9) above.

2

y

1 x

S b a

(a) domain decomposition

(b) rotational center for HRWE

Figure 1. Domain D. The details of the domain decomposition are shown on the left. The dashed smallest circle is the inner boundary of annulus 1. For simplicity, the decomposition is shown for Nann = 2 concentric annuli. For the case of the HRWE only, the relationship between D and the rotational center (heavy black dot) is shown on the right. 6

a nonperiodic variable of infinite range, either z (Δ∗ -problem) or t (L-problem). Moreover, from the numerical standpoint the problems (16) and (18) will be treated similarly. This section mostly focuses on the more challenging L-problem. 2.2. Sparse modal approximation. We approximate both (16) and (18) via modal spectral methods through “integration preconditioning” [6, 12, 13], which we also refer to as integration sparsification. Throughout, we assume both familiarity with the technique and the notation used in these references. The first step is to express both of these problems on the square S and on a generic annulus in analytical forms that facilitate the integration sparsification technique [6, 12, 13]. First, for the 2d HRWE on S we write L = ∂x2 [1 − Ω2 (y + b)2 ] + ∂y2 [1 − Ω2 (x + a)2 ] (19) − Ω2 [∂x (x + a) + ∂y (y + b) − 2∂x ∂y (x + a)(y + b)], here viewing L as an operator which eventually acts on the scalar function ψ. On each annulus we view (16) as r2 Lψ = r2 g. Then, in terms of the angular functions F (ϕ) = a sin ϕ − b cos ϕ, G(ϕ) = a cos ϕ + b sin ϕ, we have [12] (20)

r2 L = ∂r2 r2 (1 − Ω2 F 2 ) + ∂r r[−3 + Ω2 (2F 2 + G2 + rG)] + ∂ϕ2 [1 − Ω2 (r + G)2 ] + Ω2 ∂ϕ rF − 2Ω2 ∂ϕ ∂r rF (r + G) + 1 − Ω2 (G2 + rG).

For the pipe Laplacian (6) in the Cartesian coordinates on S, we have 1 ∗ Δ = ∂x2 (1 + α2 r2 )(1 + α2 y 2 ) + ∂y2 (1 + α2 r2 )(1 + α2 x2 ) γ4 − 2α2 ∂x ∂y (xy + α2 x3 y + α2 xy 3 ) (21) + α2 ∂ (−5x + α2 x3 + α2 xy 2 ) x

2

+ α ∂y (−5y + α2 y 3 + α2 x2 y) + 8α2 . As before, the Cartesian derivatives on the left-hand side see all terms to the right and the function ψ on which γ −4 Δ∗ eventually acts. After multiplication by r2 on each annulus, we find the form for the Δ∗ -problem equivalent to (20) r2 ∗ Δ = ∂r2 (r2 + α2 r4 ) − ∂r (3r + 9α2 r3 ) + (1 + 15α2 r2 ) + ∂ϕ2 (1 + α2 r2 )2 . γ4 The key point in Eqs. (19), (20), (21), and (22) is that all derivatives have been pulled out to the left, and so each operator is in a form ready for integration sparsification. For example, starting with the form of r2 L in (20), we replace all physical space operators by their matrix counterparts in the space of modal coefficients (e.g., ∂r → Dr , r → Ar , F (ϕ) → F). All derivative matrices then appear at the left. Therefore, these differentiations can be “undone” via the action of integration matrices from the left. Here this action is 2 . The result is given in Eq. (62) of Ref. [12]. defined through the “sparsifier” B = Bϕ2 ⊗ Br[2] We use the same technique for the pipe Laplacian on annuli, and the relevant matrix after integration preconditioning is

2 BAL = Bϕ2 ⊗ Ir[2] (Ir + α2 A2r )A2r − 3Br[2] (Ir + 3α2 A2r )Ar + Br[2] (Ir + 15α2 A2r ) (23) 2 (Ir + 2α2 A2r + α4 A4r ), + Iϕ[1] ⊗ Br[2] (22)

7

where AL = (Iϕ ⊗ A2r ) · L is the spectral matrix representation of r2 γ −4 Δ∗ and B has the 2 as for the 2d HRWE. same form B = Bϕ2 ⊗ Br[2] 2.3. Block Jacobi preconditioner. This subsection describes the block Jacobi preconditioner that we have used in iteratively solving the above equations with GMRES. To simplify the description, we again take Nann = 2, and we start by decomposing D as in Fig. 1. On S we let AL denote L (HRWE) or γ −4 Δ∗ (pipe), and on each annulus we let AL denote r2 L (HRWE) or r2 γ −4 Δ∗ (pipe). With B representing integration sparsification on all subdomains, the linear system of equations approximating one of the problems above has the form (24)

= BA Mψ g,

1, ψ 2 )T of modal coefficients from each is the concatenation (ψ S, ψ where M = BAL and ψ is a similar concatenation of coefficients for g or γ −4 g, as the case subdomain. Moreover, g may be. In Eq. (24) rows in both M and BA g that would have otherwise been empty (due to empty rows in B) have been filled with the tau conditions responsible for both the Dirichlet boundary conditions and the “gluing” of S to 1 and 1 to 2. The gluing (or matching) conditions involve the appropriate use of Dirichlet and Neumann vectors; see [12] for a detailed description. The sparse matrix M has the structure ⎞ ⎛ MSS MS1 (25) M = ⎝ M1S M11 M12 ⎠ . M21 M22 The “rough” preconditioner we use then has the following structure: ⎞ ⎛ GSS ⎠, G11 (26) G=⎝ G22 −1 where GSS ≡ M−1 SS is stored as a precomputed P LU factorization. In general G11 = M11 −1 and G22 = M22 , but G11 and G22 are themselves block diagonal. On an annulus the operator (20) is not diagonal in the Fourier space dual to the ϕ coordinate since this operator mixes Fourier modes. To explain the construction of G11 , we let α, β, · · · denote the global indices associated has components ψ(α). with the overall set of concatenated equations (24), so that ψ The unknowns for the central square are modal coefficients for a double Chebyshev expansion. Let Nx denote the maximum Chebyshev mode number used for the x dependence in the central square, with the analogous meaning ascribed to Ny . Nx + 1 is then the number number of x modes used, and Ny + 1 the number of y modes. We define the integer σ1 ≡ (Nx + 1)(Ny + 1), the “shift” for annulus 1. The components ψ(α) which belong to S lie in the range 0 ≤ α ≤ σ1 − 1, since the indexing of α starts at 0. It follows that the block matrix MSS operating on the these mode coefficients has elements MSS (I, J) with 0 ≤ I, J ≤ σ1 −1. The indexing for annulus 1 starts at σ1 . We let Nr1 and Nϕ1 specify the truncations for the radial and angular modes on annulus 1 (so, in particular, Nϕ1 + 1 is the total number of sines and cosines). The components ψ(α) which belong to annulus 1 then have σ1 ≤ α ≤ σ1 + 1 1 (Nr +1)(Nϕ +1)−1. Moreover, the block M11 has components M11 (I, J) = M(σ1 +I, σ1 +J),

8

with 0 ≤ I, J ≤ (Nr1 + 1)(Nϕ1 + 1) − 1. Now, the direct product representation on annulus 1 (and similarly on all annuli) is chosen such that the two-index modal coefficients are [12] (27)

1 + (N 1 + 1)q + j). ψ 1qj = ψ(σ r

Therefore, we may likewise express I = (Nr1 + 1)q + j and J = (Nr1 + 1)p + k, here with 0 ≤ q, p ≤ Nϕ1 , as the mode indices dual to ϕ, and 0 ≤ j, k ≤ Nr1 as those dual to r. Therefore, M11 q(Nr1 + 1) + j, p(Nr1 + 1) + k is our direct product representation for the components M11 (I, J) of M11 . To define G11 , we sample M11 (I, J) only for (I, J) pairs with p = q, i.e. on the Fourier ¯ 11 M11 with the p = q elements of block diagonal, in order to define an approximation M −1 ¯ M11 set to zero. We set G11 ≡ M11 , and in practice the application of G11 is defined through ¯ 11 . The matrix G22 precomputed P LU factorizations of the Fourier blocks which comprise M is defined in the same way, as is the matrix Gjj for the jth annuli. We have found empirically that such (sub-)block Jacobi preconditioning drastically reduces the iteration count for 3d subdomain GMRES solves [13]. Here we find the same iteration reduction, but we also study “correction” of this basic preconditioner through statistical sampling. Finally, notice that (23) is diagonal in Fourier space; whence for the pipe Laplacian Δ∗ we have G11 = M−1 11 and −1 G22 = M22 , with G11 and G22 as defined above. In fact, we have the following. Claim 2.1. For the approximation of the Δ∗ -problem, consider the coefficient matrix M for an arbitrary number Nann of annuli. Let G be the corresponding block Jacobi preconditioner described above. If M−1 = G + ΔG, then the rank of the correction ΔG is (28)

 = 2(Nx + Ny ) + 2    term 1 

N ann 

(Nϕj + 1) − (NϕNann + 1) .    j=1 term 3   term 2

Here term 1 is the number of empty rows corresponding to the edge boundary conditions of S, term 2 is the number of empty rows corresponding to the inner/outer boundary conditions of all annuli, and term 3 (subtracted) is the number of empty rows corresponding to the outer boundary condition for the outermost annulus. Proof. To justify the claim, first note that for our approximation of the Δ∗ -problem the difference ⎞ ⎛ ⎞ ⎛ ¯ M11 M11 ... ... ⎠=M−⎝ ⎠ (29) M−⎝ ¯ MNann ,Nann MNann ,Nann has all zero elements, except for those rows corresponding to “gluing tau conditions”. There are precisely  of these rows; whence this difference is rank . The claim then follows from the Woodbury matrix identity.  Remark 2.2. The claim does not hold for our approximation of the L-problem. These observations lie behind our comparison of approximations for Δ∗ and L. 9

M/PPSPD square S annuli (j = 1, 2) 8.50 Nx = 5 = Ny Nrj = 6, Npj = 16 11.1 Nx = 7 = Ny Nrj = 8, Npj = 20 13.1 Nx = 9 = Ny Nrj = 9, Npj = 24 15.4 Nx = 10 = Ny Nrj = 11, Npj = 28 18.6 Nx = 12 = Ny Nrj = 14, Npj = 32 21.2 Nx = 14 = Ny Nrj = 16, Npj = 36 23.8 Nx = 16 = Ny Nrj = 18, Npj = 40 25.5 Nx = 18 = Ny Nrj = 20, Npj = 44 28.1 Nx = 20 = Ny Nrj = 22, Npj = 48 Table 1. Sequence of resolutions. These truncations correspond to the number of modes and/or number of nodal points. M/PPSPD stands for modes/points per subdomain per dimension. 2.4. Modal versus nodal preconditioning. This subsection considers the 2d HRWE (L-problem) on the subdivided disk D, showing that our sparse-tau method with the blockJacobi preconditioning described above is competitive in comparison to a standard nodal method with finite-difference preconditioning based on an incomplete LU factorization [1]. Appendix A describes the finite-difference preconditioner used for the nodal method. We use the method of manufactured solutions, choosing the sources g and h such that the exact solution to the problem (17) is



(30) ψ(x, y) = sin x + κx sin(λx x) cos y + κy cos(λy y) for λx = −40, λy = 32, and κx = 10−14 = κy . We motivate the 10−14 choice below. The operator is fixed by (a, b) = (2, 21 ) and Ω = 0.0975. Moreover, the domain decomposition of D has been chosen as − 12 ≤ (x − a), (y − b) ≤ 12 for S, 25 ≤ r ≤ 54 for annulus 1, and 5 ≤ r ≤ 2 for annulus 2. 4 We have numerically solved (17) on the sequence of truncations listed in Table 1, using preconditioned GMRES [18] with a 10−12 tolerance on the relative residual. The parameters κx and κy have been chosen to ensure that the derivatives of ψ have sizes on the order of the tolerance. The resulting errors are compared in Fig. 2 and the iteration count and times required in Fig. 3. The errors have been calculated as follows. For the nodal method ψ corresponds to a collection of nodal values on the various subdomains, from which we may compute nodal derivative values, using appropriate nodal derivative matrices. Computing x- and y-derivative nodal values on S, and r- and ϕ-derivative nodal values on annuli, we are then able to assemble nodal values for the Cartesian (first or second order) derivatives over all of D. These nodal values for the x- and y-derivatives are then compared to the exact nodal values, with the maximum error then computed over all nodal values and over each coordinate direction. For the sparse-tau (modal) method, our goal is to also compute such nodal values for the x- and y-derivatives, thereby setting up the exact same comparison. corresponds to a collection of modal values (expansion coefficients) on each However, now ψ subdomain. We could, but do not, convert these modal values to nodal values and proceed as before. However, we first compute modal values for the x- and y-derivatives on S, and rand ϕ-derivative modal values on each annuli. To compute these, we apply (modal) spectral differentiation matrices. Once the modal values (expansion coefficients) for each needed 10

second derivative first derivative zeroth derivative

−3

modal nodal

10 −6 10 −9 10 −12 10 10

15

20

25

−3

30 modal nodal

10 −6 10 −9 10 −12 10 10

15

20

25

−3

30 modal nodal

10 −6 10 −9 10 −12 10

10 15 20 25 30 modes or nodes per subdomain per dimension

Figure 2. Maximum derivative errors. Computation of these errors is described in the text. derivative has been obtained, we then transform these derivative values to nodal values and assemble the relevant Cartesian derivative values.

iterations

150 100

modal nodal

50

t(GMRES)

10 6 4

20

25

30

15

20

25

30

modal nodal

2 0 6

t(total)

15

4

10 modal nodal

2 0

10 15 20 25 modes or nodes per subdomain per dimension

30

Figure 3. Iterations and timings. Plots here correspond to those in Fig. 2. Figures 2 and 3 indicate that for the problem considered the two methods perform similarly. In Fig. 3 we see slightly better run times (in seconds) for the modal method, with the cost 11

clearly dominated by the construction of the block Jacobi preconditioner. However, this construction appears to be sensitive to cache size, and in our observations whether the modal or nodal method is fastest depends on the machine used. Figure 2 shows that the modal method gives better derivative errors for the given parameters; however, here the errors are limited by the tolerance for GMRES. Moreover, for larger choices of κx and κy the derivative errors are comparable. We have arranged our experiment so that each solution corresponds to the tolerance being achieved without restarts (starting from zero initial condition). With more iterations (perhaps with restarts) all errors between the two methods are essentially the same. The modal tau method does offer two advantages over the nodal approach. First, in our experience it is more robust (iterations are less prone to hanging with restarts seldom needed). Second, as the figures clearly indicate, the modal method should prove more efficient when multiple right-hand sides/solves are encountered. Indeed, its iteration count is essentially independent of the truncation, while its setup costs are independent of the number of righthand sides. 2.5. Correction of the preconditioner through interpolative decomposition. We now sketch an algorithm [8] for computing the interpolative decomposition (ID) of a matrix A, an algorithm based on statistical sampling and designed to efficiently compress a low-rank matrix. We use essentially this algorithm to “correct” the preconditioner G given in (26) in order to account for the gluing of subdomains in our multidomain sparse modal treatment of (17) and (18). We work with real matrices, whereas [8] used complex ones. Suppose we wish to approximate Am×m , known to be of low rank (or approximately of low rank) k < m. An interpolative decomposition (ID) of A then has the form [8] (31)

Am×m ≈ Bm×k · Pk×m ,

Bm×k · Pk×m − Am×m 2  σk+1 ,

where · 2 is the matrix two-norm, Bm×k (a “column skeleton”) is comprised of k columns of Am×m , σk+1 is the (k + 1)st singular value of Am×m , and a subset of the columns of Pk×m comprise the k × k identity matrix. No entry of Pk×m is larger in magnitude than 2. With high probability the following algorithm [8] produces the ID for a rank-k matrix. Algorithm 1 Low-cost construction of the ID for a rank-k matrix 1: For k   < m choose R×m with entries that are independent and identically distributed Gaussian random variables with zero mean and unit variance. 2: Compute the product Y×m = R×m · Am×m . 3: Via a pivoted-QR factorization2 of Y×m , construct the ID Y×m ≈ Z×k · Pk×m ,

Z×k · Pk×m − Y×m 2  τk+1 .

Here Pk×m is the same matrix as above, Z×k is comprised of k columns of Y×m , and τk+1 is the (k + 1)st singular value of Y×m . 4: Construct Bm×k so that its columns correspond to those of Am×m in exactly the same way the columns of Z×k correspond to those of Y×m . Let us now describe how we use essentially this algorithm to correct the “rough” preconditioner (26). Note that the block-Jacobi preconditioner G does not account for the coupling between the subdomains. A key point is the following: as seen in the experiments above, use of G by itself yields accurate results with reasonable iteration counts (that is to say, with 12

preconditioned GMRES we can solve the relevant linear system accurately with G alone as the preconditioner). The approach we describe is only worthwhile if a large number of right-hand sides g are encountered, or multiple solves within a system or an outer NewtonRaphson iteration. These are precisely the situations encountered below when considering the helically reduced Navier-Stokes equations. The idea is to assume (32)

M−1 = G + ΔG,

where ideally the correction ΔG is low rank. We seek to approximate ΔG through an ID. Let (33)

R×m = N×m · Mm×m ,

where N×m is a random Gaussian matrix. The computation (33) is fast, since Mm×m is sparse. However, it does require the application of the transpose, since what we actually T . The right-hand side of compute is MTm×m Nm× (34)

R×m · ΔGm×m = N×m − R×m · Gm×m

can then be computed directly, although, similar to before, this requires computation of T T Gm×m Rm× . Due to sparsity, the cost of (33) is O(m) rather than O(m2 ); the cost of (34) is also subordinate to O(m2 ), since Gm×m is block diagonal. Equation (34) serves as the starting point (step 2 in Algorithm 1) for an interpolative decomposition of ΔG. For  m the dominant cost of the QR factorization (step 3 in Algorithm 1) is O(2 m). Therefore, for our sparse formulation the construction and interpolative decomposition of Y×m = R×m · ΔGm×m has a cost subordinate to O(m2 ). In any case, here R×m is not a Gaussian random matrix as described above, rather it is a Gaussian matrix applied to Mm×m ). Nevertheless, we proceed as above, and from step 4 therefore need to compute k   select columns of ΔG. Since columns of G are easily generated through application of the block Jacobi preconditioner on the canonical basis vectors, obtaining the pth column of ΔG is then tantamount to solving Mz = ep , since ΔGep = z − Gep . Clearly the construction of ΔG is then k times more expensive than a single linear solve. Here we test construction and use of the corrected preconditioner on the same problem considered above. We choose k =  in the algorithm above as precisely the number (28) of gluing tau conditions, and, moreover, apply the above algorithm even when rank(ΔG) > . As shown in Claim 2.1, for the Δ∗ -problem rank(ΔG) = . However, for the L-problem the rank(ΔG) > , and the correction computed via the ID will only approximate the exact correction. The L-problem is therefore a morerobust test of the method. For both cases the ann k k 2 cost of applying G to a vector is O(Nx2 Ny2 + N k=1 Nϕ (Nr ) ), and the cost of applying the correction ΔG (whether truly the exact one or not) is O(m), where the system size is (35)

m = (Nx + 1)(Ny + 1) +

N ann 

(Nrk + 1)(Nϕk + 1).

k=1

Assuming our ID algorithm computes the exact correction ΔG, for the Δ∗ -problem the exact inverse M−1 can then be applied to a vector with cost subdominant to O(m2 ). Table 2 displays the results of solving the same L-problem as above, using our sparse modal approach and three choices of preconditioner: no preconditioner, the block Jacobi preconditioner considered above, and the block Jacobi preconditioner with correction ΔG computed as above. Clearly the corrected preconditioner yields the fastest solve and requires 13

no preconditioning (2 annuli) t(GMRES) error 0 error 1 7.6e-02 2.2199e-05 3.2900e-04 2.4e-01 1.3613e-07 2.1646e-05 4.6e-01 1.8377e-09 1.1036e-07 8.7e-01 1.0763e-10 9.1362e-09 1.8e+00 1.4376e-10 1.5369e-08 3.0e+00 3.4054e-10 5.0385e-08 4.8e+00 1.1208e-09 2.2544e-07 5.6e+00 2.6867e-10 4.3812e-08 8.4e+00 9.8875e-10 1.9889e-07

MPSPD 8.5 11.1 13.1 15.4 18.6 21.2 23.8 25.5 28.1

iterations 166 233 289 343 419 495 570 561 638

MPSPD 8.5 11.1 13.1 15.4 18.6 21.2 23.8 25.5 28.1

block Jacobi preconditioning (2 iterations t(GMRES) error 0 45 1.6e-02 2.2199e-05 45 3.6e-02 1.3613e-07 46 5.6e-02 1.8376e-09 46 8.4e-02 1.8061e-11 46 1.4e-01 8.3400e-13 46 1.9e-01 7.5950e-13 46 2.5e-01 8.5365e-13 46 2.9e-01 8.0547e-13 46 3.7e-01 8.2279e-13

MPSPD 8.5 11.1 13.1 15.4 18.6 21.2 23.8 25.5 28.1

corrected block Jacobi iterations t(GMRES) 10 8.0e-03 9 1.2e-02 9 2.0e-02 9 2.8e-02 9 4.4e-02 9 6.4e-02 9 9.2e-02 9 1.0e-01 9 1.4e-01

annuli) error 1 3.2900e-04 2.1646e-05 1.1036e-07 4.7783e-09 1.0757e-11 6.1950e-12 3.8676e-12 4.8276e-12 4.0188e-12

preconditioning (2 annuli) error 0 error 1 2.2199e-05 3.2900e-04 1.3613e-07 2.1646e-05 1.8376e-09 1.1036e-07 1.8063e-11 4.7783e-09 8.3500e-13 1.0763e-11 7.6161e-13 3.3424e-12 8.5398e-13 6.4925e-12 8.1113e-13 5.8906e-12 8.2634e-13 1.6957e-12

error 2 6.6734e-03 1.0571e-04 7.5988e-07 5.5204e-07 1.1703e-06 4.9883e-06 2.9337e-05 9.8955e-06 2.6070e-05 error 2 6.6734e-03 1.0571e-04 7.3158e-07 3.2999e-08 2.3935e-10 3.3724e-10 2.0967e-10 2.1378e-10 2.2273e-10 error 2 6.6734e-03 1.0571e-04 7.3158e-07 3.2999e-08 2.0541e-10 3.0519e-10 1.1477e-09 1.0893e-09 2.2483e-10

Table 2. Comparison with corrected preconditioner. Here error 0 corresponds to the max norm of the error in the solution, error 1 the max norm in the error of the derivative, and error 2 the max norm in the error of the second derivative.

the fewest iterations. Note, however, there is degradation in the derivative errors for the overresolved truncations, although at the optimal resolution of 18.6 MPSPD the error values of the second and third sub-tables are comparable. Table 3 shows some of the same results as shown in Table 2, but for the Δ∗ -problem instead of the L-problem. The results of this table indicate that, apart from finite precision effects, the corrected preconditioner is indeed the exact inverse, as theoretically argued. The degradation in the derivative errors seen with the corrected preconditioner likely stems from the following mechanism. Effected through finite-precision matrix multiplication, the correction introduces perturbations in the components of the vector to which it is applied. Since modal Chebyshev differentiation is ill-conditioned, such perturbations are then “pumped up,” especially for large truncations. Table 4 (top) shows results for the corrected preconditioner, only now with 4 annuli defined by 25 ≤ r ≤ 45 , 45 ≤ r ≤ 65 , 65 ≤ r ≤ 85 , and 85 ≤ r ≤ 2. The radial resolution 14

MPSPD 8.5 11.1 13.1 15.4 18.6 21.2 23.8 25.5 28.1

corrected block Jacobi iterations t(GMRES) 1 0.0e+00 1 4.0e-03 1 4.0e-03 1 4.0e-03 1 1.2e-02 1 1.6e-02 1 2.0e-02 1 2.0e-02 1 2.8e-02

preconditioning (2 annuli) error 0 error 1 4.0259e-05 6.4721e-04 1.0357e-07 1.3734e-05 1.5229e-09 2.1160e-07 9.1173e-12 2.4355e-09 4.3432e-13 3.9453e-11 5.4105e-13 5.8913e-11 1.4668e-12 1.6113e-10 2.0399e-12 2.4261e-10 3.8755e-11 1.2271e-08

error 2 1.2326e-02 7.4549e-05 9.0623e-07 1.1552e-08 3.1705e-09 6.6716e-09 2.3859e-08 2.4436e-08 1.7926e-06

MPSPD 8.5 11.1 13.1 15.4 18.6 21.2 23.8 25.5 28.1

corrected block Jacobi iterations t(GMRES) 1 0.0e+00 1 4.0e-03 2 4.0e-03 2 8.0e-03 2 1.2e-02 2 2.0e-02 2 2.8e-02 2 2.8e-02 2 4.0e-02

preconditioning (2 annuli) error 0 error 1 4.0259e-05 6.4721e-04 1.0357e-07 1.3734e-05 1.5228e-09 2.1160e-07 9.1295e-12 2.4355e-09 5.9508e-14 2.0482e-11 1.6653e-15 1.3806e-13 3.8303e-15 1.4169e-13 2.9976e-15 1.1746e-13 3.8858e-15 5.2375e-14

error 2 1.2326e-02 7.4549e-05 9.0622e-07 1.1550e-08 1.1347e-10 1.1667e-11 1.7272e-11 1.0964e-11 7.8440e-12

Table 3. Corrected preconditioner results for the Δ∗ -problem. This table shows the same results for the corrected preconditioner as shown in Table 2, except here for the Δ∗ -problem rather than the L-problem. In the top table the GMRES solve was constrained to a single iteration (again starting from a zero initial iterate). In the bottom table the iteration count is determined by the tolerance choice as before.

is therefore effectively doubled. Table 4 (bottom) shows the results for 6 annuli defined by 2 ≤ r ≤ 0.7125, 0.7125 ≤ r ≤ 0.935, 0.935 ≤ r ≤ 1.2, 1.2 ≤ r ≤ 1.475, 1.475 ≤ r ≤ 1.75, and 5 1.75 ≤ r ≤ 2. For the new cases (4 or 6 annuli) the number of modes on the rectangle and each annuli again stems from Table 1 (increasing the number of annuli does slightly affect the listed MPSDPD value). While the setup cost associated with constructing the preconditioner correction increases with the number of annuli (subdomains), if we ignore this cost (as is justified when the preconditioner will be reused), then the corrected preconditioner scales: the total number of iterations appears to be independent of the number of annuli. Of course by itself, the block Jacobi preconditioner does not scale. 2.6. Potential for parallelization. As mentioned, our experiments have been carried out in serial, but we here offer simple observations regarding parallelization of our approach. Our procedure yields a factorization of the form (36)

ΔGm×m ≈ Bm× · P×m .

: mhigh (likely asAssuming that the μth processor Proc(μ) has an ownership range mlow μ μ sociated with a given subdomain or mode thereon) corresponding to the index m, Proc(μ) high would then hold both B(mlow μ : mμ , :) and P (:, :), where we have switched to colon notation for matrices. Application of the correction ΔG to a vector x distributed across processors could then be carried out via parallel matrix-multiplication; with the implementation details then relying –as is standard– on the message passing procedure for x. We have yet to 15

MPSPD 9.0 11.7 13.7 16.3 19.7 22.4 25.0 27.3 29.9

corrected block Jacobi iterations t(GMRES) 9 1.2e-02 9 2.8e-02 8 4.0e-02 9 7.2e-02 9 1.2e-01 10 1.9e-01 9 2.3e-01 9 2.9e-01 9 3.7e-01

preconditioning (4 annuli) error 0 error 1 2.5506e-05 3.1871e-04 1.3707e-07 2.1623e-05 6.8277e-10 1.1007e-07 1.8338e-11 4.7779e-09 9.1394e-13 1.0761e-11 7.9536e-13 2.7923e-12 8.3833e-13 2.0056e-12 8.1690e-13 3.6423e-12 8.2201e-13 1.2392e-11

error 2 6.7079e-03 1.0292e-04 6.9083e-07 3.2985e-08 1.0210e-09 1.3761e-10 3.6708e-10 1.6350e-09 7.3218e-09

MPSPD 9.3 12.0 14.0 16.7 20.1 22.8 25.6 28.0 30.7

corrected block Jacobi iterations t(GMRES) 11 2.8e-02 9 5.2e-02 9 8.0e-02 9 1.3e-01 8 2.0e-01 9 3.2e-01 9 4.3e-01 9 5.5e-01 9 7.0e-01

preconditioning (6 annuli) error 0 error 1 2.5759e-05 3.1762e-04 1.4033e-07 2.1623e-05 6.9876e-10 1.1007e-07 1.8187e-11 4.7779e-09 9.0894e-13 1.7552e-11 7.9692e-13 2.9205e-12 8.3955e-13 2.6229e-12 8.1135e-13 2.9496e-12 8.1946e-13 1.4793e-11

error 2 6.7095e-03 1.0285e-04 6.9174e-07 3.2984e-08 4.9230e-09 2.9729e-10 1.0561e-09 1.4460e-09 1.0542e-08

Table 4. Corrected preconditioner. The setup for this table is the same as for Table 2, except here it involves either 4 or 6 rather than 2 annuli.

consider whether or not construction of the factorization can be parallelized. However, this factorization is a start-up cost. 3. Numerical treatment of the helically reduced Navier Stokes system 3.1. Numerical implementation. Since in this paper we view the system (7) in the context of a boundary value problem, our solution method is Newton-Raphson iteration with the linear problem for each Newton step solved by preconditioned GMRES. We use the sparse-tau method to approximate the system (7), and here comment on some implementation issues. These issues pertain both to (7) and its linearization on any of the constituent subdomains, i.e. essentially the same issues arise in (i) computing the nonlinear residual and (ii) defining the matrix-vector multiply within GMRES. Item (i) is of course a key ingredient in Newton-Raphson iteration. Item (ii) is also a key ingredient, since it affords iterative solution of the linear problem defining the Newton step. To fix ideas, let us now consider only the modal approximation of (7b) on the jth annulus with the replacement of the time derivative given in (11). That is, we consider only the u0 field component in the calculation of the nonlinear residual, ingredient (i). In addressing ingredient (ii) for the u0 component we would consider an equation like (14b). Ingredients (i) and (ii) also pertain to the square S, although for this subdomain all equations must be expressed in terms of the Cartesian coordinates. In Section 2 we introduced global indices α, β, · · · corresponding to the linear system (24). For the 2d HRWE problem on the jth annulus there is only a single scalar field ψ, and the two-index modes are [cf. (27)] (37)

j + (N j + 1)q + j), ψ qj = ψ(σ r 16

 k k where σj = (Nx + 1)(Ny + 1) + j−1 k=1 (Nr + 1)(Nϕ + 1). For the pipe problem we must now consider three fields ψ, u0 , η in the concatenation of modal coefficients. We therefore with components U (A) such that introduce super indices A, B and the vector U (38)

(σψ + α), ψ(α) =U

(σu0 + α), 0 (α) = U u

(ση + α), (α) = U η

where σψ = 0, σu0 = σNann +1 , ση = 2σNann +1 . (These shifts use the expression for σj with j = Nann + 1.) To isolate just the coefficients for the jth annulus, we now set (39)

j (I) = ψ(σ j + I), ψ

j0 (I) = u 0 (σj + I), u

j (I) = η (σj + I), η

(A), the modal where 0 ≤ I ≤ (Nrj + 1)(Nϕj + 1) − 1. In terms of these subvectors of U approximation of (7b) on the jth annulus with the above replacement of the time derivative is     j · Dϕ u j · Dr u . j0 − Dϕ ψ j0 = −ν −1 BΓ2 αcDϕ u j0 + Q η j +ν −1 BΓ1 Dr ψ (40) BAL uj0 +2αBA As described in Section 2 the matrix BAL is our sparse form of the pipe Laplacian. (Since j j , rmax , this it is of course determined by the truncations Nrj , Nϕj and the parameters rmin matrix —and others to be described— might also carry the j label.) The matrices Γ1 and Γ2 represent the action of rγ −2 and r2 γ −2 , respectively. (Recall that on annuli the equation is multiplied by an overall r2 factor.) The matrices Dr = Iϕ ⊗ Dr and Dϕ = Dϕ ⊗ Ir represent spectral differentiation. Such modal differentiations present in our approximations of (7) · Dϕ u 0 are have motivated our study of derivative errors in Section 2. Products such as Dr ψ computed as Cauchy products, which is expensive but avoids the introduction of aliasing errors associated with computing products in physical point space. Here we have described computation of the u0 component of the nonlinear residual (on an annulus). However, similar · Dϕ u 0 , appearing Cauchy products are used to compute variational terms, for example Dr δ ψ the defining matrix-vector multiply for GMRES. The boundary conditions described above feature both fixation of the stream function ψ (Dirichlet condition) and its derivative ψr (Neumann condition). With the Dirichlet condition on u0 , we therefore have three sets of tau conditions involving the modal coefficient for the outer annulus “O” These conditions may be expressed as   NrO NrO NrO   2 O + + O + + + + ψmn δn = hm , ψmn νn = q m , (u0 )O (41) mn δn = k m , O O r − r max min n=0 n=0 n=0 where δn+ and νn+ are components of the Dirichlet and Neumann vectors[12] . Here we are using a superscript O as shorthand for Nann , so that, for example, NrO ≡ NrNann and σO = σNann . To complete each encountered linear system these tau conditions must fill empty rows which arise from the integration sparsification. Since there are three variables (ψ, u0 , η), there are three sectors or “super” blocks associated with the linear system. We fill empty rows in the ψ-ψ superblock with the first set of tau conditions in (41), and the u0 -u0 superblock with the last set. However, the middle conditions must then be placed in the η-ψ superblock, since the empty rows in the ψ and u0 sector have already been filled with the other tau conditions. As a result the spectral representation of the pipe Laplacian for the η sector is singular, although the overall linear system is of course nonsingular. Nevertheless, we seek an improved implementation of the boundary conditions and their implementation as tau conditions. With an eye toward the conditioning of the discretized system and to 17

speed up the convergence of its iterative solution, we now trade fixation of ψr for a boundary condition involving η (and also u0 ). The idea is to have each block operator in the system associated with its own boundary condition (expressed as tau conditions), and our treatment is akin to the approach to boundary conditions developed in [19, 20, 21]. For reasons which will be apparent, the new conditions will be called transpose response boundary conditions, and TRANS boundary conditions for short. Upon discretization the equation (7a) takes the form (42)

+ BA , Mψ g = BAJ

is the modal coefficient vector for all domains (concatenation), with g holding and J where ψ 2 the same coefficients for 2αu0 + η/γ and J, respectively. The source J is not present in (7a), but we include it here to allow for manufactured solutions. As discussed in Section 2, B represents integration preconditioning on all domains, and A the process of multiplication by Iϕ ⊗ A2r on annuli. On the annuli the source J, for example, is premultiplied by r2 prior to its conversion to modal coefficients. Here M represents BAL, where L is the spectral representation of γ −4 Δ∗ . O of ψ The relevant outer boundary condition is expressed in terms of the coefficients ψ O corresponding to the outer annulus (“O”), or ψmn in two-index form. As usual, certain  O + δn = hm that enforce empty rows of the system (42) are filled with the equations n ψ mn the Dirichlet condition, here generalized with right-hand side hm to the inhomogeneous case. In particular, the Dirichlet vector components δn+ are placed within rows of M, and we may . If both g are prescribed, then and J view the hm as inserted into rows of BAJ

O (I) ≡ ψ(σ O + I) = M−1 BA(J −g ) (σO + I). (43) ψ O O ((N O + 1)m + n) with the scaled Neumann We then contract the coefficients ψ mn = ψ r + O O vector 2νn /(rmax − rmin ) in order to achieve an expression for the modal coefficients of ψr on the outer boundary, an expression depending on the modal coefficients for u0 and η on all subdomains. To compute M−1 (σO + I, :) for 0 ≤ I ≤ (NrO + 1)(NϕO + 1) − 1, that is, to compute the last (NrO + 1)(NϕO + 1) rows of M−1 , we exploit the formula

(44)

(M−1 )T (:, α) = (M−1 )T eα ,

where eα is a canonical basis vector with α = Nann + I and I in the above range. That is, we solve the transpose problem (45)

MT z = eα ,

for the column z = (M−1 )T (:, α) which gives the αth row M−1 (α, :) of the inverse. Here α can be expressed as α = σO + (NrO + 1)m + n, with 0 ≤ n ≤ NrO and 0 ≤ m ≤ NϕO . Therefore, for a fixed index m (angular mode) (46)

−1 O ), ψ O mn = M (σO + (Nr + 1)m + n, :)BA(J − g

and we can precompute and store the NrO + 1 rows of M−1 needed to recover ψ O mn for O 0 ≤ n ≤ Nr . The Neumann derivative is then   Nr 2 ). νn+ ψ O (47) mn = G(m, :)BA(J − g O O rmax − rmin n=1 18

α ν Q c A1 , B1 A2 , B2 A3 , B3

0.25 1.20 2.03 1.11 1.00,1.00 0.90,1.10 0.80,-0.70

Nx Ny 11 11 annulus 1 2 3

rectangle xmin xmax ymin ymax -0.5 0.5 -0.5 0.5 Nr Nϕ rmin rmax 11 40 0.40 1.25 12 41 1.25 2.00 13 42 2.00 3.00

annulus Nr 1 11 2 12 3 13 4 14 5 15 6 16

Nϕ 40 41 42 43 44 45

rmin 0.40 0.80 1.25 1.60 2.00 2.50

rmax 0.80 1.25 1.60 2.00 2.50 3.00

Table 5. Parameters specifying manufactured solution and subdomains. The first collection of three tables to the left specifies the 3-annuli test; for the 6-annuli test, replace the third table with the last one. Here for each m mode we have introduced a vector   Nr 2 νn+ M−1 (σO + (NrO + 1)m + n, :). (48) G(m, :) ≡ O O rmax − rmin n=1 Now, if the Neumann conditions to be enforced are   Nr 2 + νn+ ψ O m , (49) mn = q O O rmax − rmin n=1 then with the above equations we can express these conditions as − q + . (50) G(m, :)BA g = G(m, :)BAJ m

In our three-field super linear system, we will use these equations as a set of tau conditions which replace the middle conditions in (41). In this context, write (51)

= 2α η, g u0 + G

where G is the spectral representation of multiplication by γ −2 . The relevant tau conditions, which are placed within empty rows stretching across the u0 -η and η-η super blocks, are then − q + . = G(m, :)BAJ (52) 2αG(m, :)BA u0 + G(m, :)BAG η j

(NrO

1)(NϕO

+ + 1) global transpose solves, to Therefore, numerically the strategy is, from precompute the non-square matrix G(0 : NϕO , :). To apply the boundary condition as part η ) and of the matrix-vector multiply for the iterative solver, we then (i) form BA(2α u0 + G then (ii) apply G(0 : NϕO , :). Notice that construction of G(0 : NϕO , :) involves global Dirichlet solves, i.e. a scalar problem as considered in Section 2. Therefore, these solves may be sped up considerably by using the aforementioned block Jacobi preconditioner corrected through the randomized algorithm. 3.2. Various boundary conditions and preconditioners. We first consider the system (7) with a manufactured solution. Again with v = (ψ, u0 , η), we fix (53)

vk = cos(Ak y) sin(Bk x) + x + y 2 + 2,

with different constants Ak and Bk for k = 1, 2, 3. We also pick α, ν, Q, and c, where t derivatives are expressed in terms of ϕ derivatives as prescribed in (11). We must also introduce the appropriate forcing terms J = (Jψ , Ju0 , Jη ) on the right-hand sides of (7a,b,c) in order to balance the equations. For the experiment described here parameter and subdomain choices are listed in Table 5. We consider both the case of 3 annuli and 6 annuli. 19

For 3 annuli Table 6 reports results for the solution of (7) corresponding to the described manufactured solution for various choices of boundary conditions and options for the preconditioner. For all cases Dirichlet conditions are adopted for ψ and u0 , but the final boundary condition can be one of the following: (i) a Neumann condition (PSI-N) also enforced on ψ, (ii) the transpose response conditions (TRANS) described above, and (iii) a Dirichlet condition (ETA-D) on η. Each of these cases is considered in Table 6. However, not listed is the case of no preconditioning. With a Neumann condition also on ψ (the PSI-N case), no preconditioning corresponds to nearly 104 cumulative GMRES iterations accumulated over the Newton steps, and a total wall-clock time of ttotal  2000 seconds. Evidently, this case is expensive and not comparable to the other table entries. For all other reported cases we at least use a block Jacobi preconditioner for each sector. This preconditioner is the same for the ψ and u0 sectors, but is different for the η sector if TRANS boundary conditions are adopted (since this sector is then filled with tau conditions which specify the transpose response boundary conditions rather than Dirichlet conditions). Actually, for the PSI-N boundary conditions, the relevant tau-conditions are placed within the η-ψ block, and therefore the discretized η-η block operator corresponding to Δ∗ is singular (its empty rows are not filled with tau conditions). For this case we nevertheless use the block Jacobi preconditioner corresponding to Dirichlet boundary conditions. Table 7 reports the same results, but now with the disk partitioned into 6 rather than 3 annuli (with corresponding increase in the radial resolution). For each of the cases the number of Newton iterations performed is not the same as in Table 6, hindering direct comparison. Nevertheless, for the CPC cases doubling the number of annuli has little effect on the cumulative number of iterations, suggesting that also for these solves the corrected preconditioner scales. Inspection of the errors in Tables 6 and 7 indicates that 6 annuli corresponds to over resolution. For all cases in the table the tolerance for each linear solve within the Newton-GMRES iteration is tied to the size of the nonlinear residual F as follows [22]. For the first n = 0 linear solve we choose the tolerance tol as maxtol = 0.1, and thereafter as   (n) ) 2 / F(U (n−1) ) 2 . (54) tol = min maxtol, 21 F(U 2 2 (n) ) 2 / F(0) 2 < 1.0e-13. The termination criterion for the outer Newton iteration is F(U All setup solves, e.g. preliminary solutions to construct preconditioner corrections via the randomized algorithm or the transpose response boundary conditions, use a tolerance of 1.0e-12. As mentioned, the transpose of the computed correction to the Dirichlet block Jacobi preconditioner can also be used to speed up the solves necessary to construct the matrix G(0 : NϕO , :) described above. Enforcement of Dirichlet conditions on η (ETA-D) is also considered in the table, but this option is only possible here since we have chosen a manufactured solution. In the physical pipe problem, η does not have its own boundary condition, and this “ideal” option is not at our disposal. Nevertheless, the table results indicate that the TRANS boundary conditions yield results which are comparable with this ideal case. 3.3. Computation of eigenvalues for the linearized problem. Now we turn to the represent eigenvalue problem (14), expressed symbolically as HδU = cKδU . Let X ≡ δ U the associated concatenation of spectral expansion coefficients for all three fields on each 20

BCs PSI-N PSI-N TRANS TRANS ETA-D ETA-D

CPC n y n y n y

CPCT NA NA n y NA NA

itotal 670 273 253 79 365 69

iNew 5 6 5 5 6 4

tNewGMR 147.34 64.44 74.82 30.46 80.78 17.08

ttotal 147.46 83.80 75.12 74.87 80.90 36.80

errψ 1.0081E-12 1.0054E-12 1.3192E-11 1.1715E-12 1.4067E-12 3.8625E-11

erru0 2.8129E-12 2.8111E-12 6.0156E-12 2.8320E-12 2.9035E-12 2.9424E-11

errη 3.5169E-11 3.5162E-11 9.2614E-10 3.4668E-11 2.4428E-11 3.6950E-11

Table 6. Influence of boundary conditions (3 annuli). The top table corresponds to 3 annuli, the bottom table to 6 annuli. The BC types PSI-N, TRANS, and ETA-D are described in the text. CPC = y/n refers to whether or not the block Jacobi preconditioner is corrected (with interpolative decomposition), and CPCT = y/n to whether or not the supplemental block Jacobi preconditioner for the η sector with transpose response boundary conditions is also corrected. Here itotal refers to the cumulative GMRES iterations over iNEW outer Newton iterations, and tNewGMR is the time in seconds spent on linear solves within the Newton-GMRES iterations, whereas ttotal also includes the cost of constructing the preconditioners and (as the case may be) transpose response boundary conditions. Variable errors such as errψ are computed relative the exact solution after interpolation onto a uniform physical grid (they are maximum pointwise errors).

BCs PSI-N PSI-N TRANS TRANS ETA-D ETA-D

CPC n y n y n y

CPCT NA NA n y NA NA

itotal 1023 193 459 81 580 72

iNew 5 5 6 5 7 4

tNewGMR 620.07 129.63 336.86 112.90 349.10 50.54

ttotal 620.37 344.47 337.61 498.30 349.39 265.14

errψ 2.4869E-13 6.9544E-13 1.2941E-11 4.5963E-13 2.3412E-12 1.1161E-11

erru0 2.4247E-13 5.3380E-13 5.3664E-12 3.0598E-13 1.4504E-12 7.2129E-12

errη 1.3101E-12 3.4070E-11 8.1263E-10 9.1838E-12 9.3481E-13 8.0989E-12

Table 7. Influence of boundary conditions (6 annuli).

subdomain. The preconditioned equation is then

(55)

HX = cKX,

where the precise expressions for the real matrices for H and K involve integration matrices as described earlier. We solve this generalized eigenvalue problem via inverse iteration, doing so with purely real arithmetic despite the possibility of complex eigenvalues. Assume that ¯ the corresponding two-dimensional {c, c¯} are conjugate eigenvalues, with X = span(X, X) eigenspace. Further assume that {c, c¯} are the eigenvalues closest to the real number μ. The following yields approximations for {c, c¯}. 21

Algorithm 2 Inverse iteration for (55) 1: 2: 3: 4: 5: 6: 7: 8: 9: 10: 11: 12:

Choose tol > 0 and random unit vectors v (−2) , v (−1) , v (0) for j = 1, 2, . . . do Solve (H − μK)w = Kv (j−1) v (j) = w/ w 2 Compute QR = [v (j) , v (j−1) , v (j−2) ] if |(R)33 | < tol then break end if end for Choose the first two columns {q 1 , q 2 } of Q as an orthonormal basis for X Solve both (H − μK)w1 = Kq 1 and (H − μK)w2 = Kq 2 Compute the eigenvalues of the 2-by-2 matrix [ˆ q 1 , qˆ 2 ]T · [w1 , w2 ]

We remark on several of the steps above. Steps 2 through 4 constitute power iteration for the matrix (H − μK)−1 K, and this matrix has eigenvalue (c − μ)−1 , provided c−1 is an eigenvalue of H−1 K. For complex c and X, after many iterations we expect that the successive iterates v (j) , v (j−1) , v (j−2) will all lie in X , i.e. they will be linearly dependent and therefore in step 5 subject to a QR-factorization of the form ⎞ ⎛ 1 r12 r13 (56) [q 1 , q 2 , q 3 ] ⎝ 0 r22 r23 ⎠ = [v (j) , v (j−1) , v (j−2) ], |r33 | 1. 0 0 r33 The size |r33 | therefore indicates whether the set {q 1 , q 2 } is an approximate basis for X . With respect to this basis, steps 11 and 12 effectively compute the restriction (H − μK)−1 K|X of the full matrix (H − μK)−1 K to the eigenspace X . The eigenvalues of the matrix in step 12 are then approximately {(c − μ)−1 , (¯ c − μ)−1 }. Mackrodt [23] has considered this eigenvalue problem via different finite-difference methods, and as a validation of our procedures we compute one eigenvalue, comparing our answer to his. Mackrodt assumes [see his Eq. (2.1)] a perturbation of the form uˆ(r, φ, z, t) = u(r) exp[i(αz + γφ − βt)]

(57)

(Mackrodt’s α is not our α).

To avoid confusion with our variables, we first rewrite Mackrodt’s argument αz + γφ − βt as az + γθ − βt. To match the dependence in (10), we then set az + γθ − βt = γ(αz + θ − αct), 2 /ν, and he also uses3 from which a = γα and β = γαc. Mackrodt’s variable W0 = 14 Qrmax 2 3 4ν Ωrmax Qrmax c , Re = = . , Re here φ z 2 Qrmax ν 4ν 2 In one scenario Mackrodt fixes a¯ = 1 = −γ, so we choose rmax = 2 and α = −0.5. He also has Reφ = 0 and Rez = 200, so we choose Q = 25, ν = 0.5, and Ω = 0. Then cMack = 50c. For this scenario Mackrodt quotes

(58)

V0 = Ωrmax ,

(59)

a ¯ = armax ,

cMack =

cMack = 0.64526 − 0.12921i

as one of a conjugate pair corresponding to the linear problem. 3Literally,

Mackrodt’s Eq. (2.4) is c = β/αW0 , which we interpret as cMack = β/(aW0 ) = chere /W0 , a formula which makes sense dimensionally. Also note that Mackrodt’s α ¯ is our a ¯. 22

y

Reψ 2

1

1

1

0

0

0

−1

−1

−1

−2 −2

0

2

−2 −2

Imψ

y

Reη

Reu0

2

0

2

2

−2 −2

2

1

1

1

0

0

0

−1

−1

−1

0

x

2

−2 −2

0

x

2

Imη

Imu0

2

−2 −2

0

2

2

−2 −2

0

x

2

Figure 4. Eigen-solution for Ω = 0. Since the mode (57) has the single azimuthal wave number 1, the equations which define the eigenvalue problem are effectively ODE. Nevertheless, we may use our PDE solver to compute the eigenvalue, merely taking Npj = 3 as the number of angular modes for all subdomains (3 will include both sin/cos modes of wave number 1). We choose three equally spaced annuli, each with Nrj = 25 and Nx = 15 = Ny . We take μ = 0.64 (real) and run the inverse iteration with 1.0e-11 as the tolerance for |r33 |. After 42 iterations with the adjoint response boundary conditions and ID correction preconditioning, we find 4ν (60) c = 0.6452614916 − 0.1292054348i. 2 Qrmax Figure 4 shows the corresponding eigen-solution. Results for PSI-N boundary conditions and without corrected preconditioning are similar, but take nearly three time longer to compute. To examine the effect of pipe rotation, we examine the same scenario, but now with Ω = 4.55997, that is with Mackrodt’s Reφ = 36.47976. This configuration is close to a point on the “γ = −1 neutral stability curve” depicted in Fig. 2(b) on page 157 of [23]. For this 2 ) is  10−7 . Figure choice of pipe rotation we find that the imaginary part of 4νc/(Qrmax 5 shows the corresponding eigen-solution. Relative to the same plots in Fig. 4, notice the enhancement of vorticity η at the boundary. 4. Summary and Conclusions We have presented an efficient sparse, multidomain, spectral-tau method for solving the helically reduced Navier Stokes equations (7) on a disk, whether viewed in the context of spiral waves or an implicit time-stepping scheme. In particular, with M representing the matrix approximation corresponding to the pipe Laplacian Δ∗ (the key operator featured in the system (7)), we have shown how to use the interpolative decomposition to construct a 23

y

Reψ 2

1

1

1

0

0

0

−1

−1

−1

−2 −2

0

2

−2 −2

Imψ

y

Reη

Reu0

2

0

2

2

−2 −2

2

1

1

1

0

0

0

−1

−1

−1

0

x

2

−2 −2

0

x

2

Imη

Imu0

2

−2 −2

0

2

2

−2 −2

0

x

2

Figure 5. Eigen-solution for Ω = 4.55997. preconditioner which effectively serves as a fast direct inverse. The construction has been supported by theoretical arguments (cf. Claim 2.1) and numerical tests (cf. Table 3). Furthermore, we have empirically studied this new technique on a more complicated operator, the operator L (16) stemming from the 2d HRWE. Our results tentatively suggest that correction of a “rough” preconditioner via statistical sampling and the interpolative decomposition may prove useful for a broader class of problems. Finally, to further support our approach, we have made a basic comparison between preconditioned-nodal and preconditioned-sparsemodal spectral methods as applied to the 2d HRWE (cf. Subsection 2.4). We have also described improved boundary conditions for the system (7) when it is solved as boundary value problem (cf. Subsection 3.1). The combination of our fast preconditioner and these “transpose response boundary conditions” (the TRANS-y-y option in Table 6) should prove efficient for parameter sweeps, provided the constructed preconditioner and boundary conditions can be reused for different Reynolds numbers and pipe rotation rates. As an application of our approach, we hope to numerically construct the neutral modes conjectured in [5]. Such construction will likely require parameter continuation and potentially a huge number of preconditioned Newton-GMRES solves. Optimized boundary conditions and preconditioning would therefore prove correspondingly important. As a separate application of the described methods, we mention ongoing work toward construction of binary neutron star (BNS) spacetimes. A BNS is a pair of orbiting neutron stars which loses energy due to outgoing gravitational waves and, as a result, slowly spirals inward. BNS inspiral should prove a secure source of gravitational waves for ground-based detectors (LIGO, VIRGO, GEO600, TAMA) now operating, and for their advanced versions in the near future [24, 25, 26, 27]. The problem of constructing appropriate BNS initial data is difficult, and current prescriptions [28, 29, 30, 31, 32] have relied on a simplifying technical assumption (conformal flatness) which renders the data unrealistic. Our approach towards 24

construction of BNS initial data relies on imposition of helical symmetry on the coupled coupled matter-Einstein equations. See Refs. [33, 34, 35, 36, 12, 13] preliminary work in this direction. The reduced equations are of mixed type (elliptic within a “light cylinder,” and hyperbolic outside of it), and they are solved as a boundary value problem on a large, solid, spherical domain surrounding the BNS, notably with radiation boundary conditions placed on the outer spherical boundary. The nature of the resulting solutions requires that we partition [13] the domain into a complicated configuration of block, cylindrical-shell, and sphericalshell subdomains, as depicted in Fig. 6. In particular, notice that each star (see the right panel in Fig. 6) is surrounded by concentric spherical shells, with a central cube filling in an inner hole. In [13] we have numerically solved the scalar three-dimensional helically reduced scalar wave equation (3d HRWE) on a similar “two-centered domain” (essentially the domain shown in Fig. 6, but with a topological difference since the central cubes were absent). The approach in [13] involved numerical approximation of the 3d HRWE by the same sparse modal methods considered here, with the resulting linear system solved by preconditioned GMRES. Much of [13] focused on a particular realization of block Jacobi preconditioning, a realization that yielded a drastic reduction in GMRES iterations. However, such preconditioning is enacted only on a subdomain-by-subdomain basis; it ignores the coupling between subdomains. As evident in Fig. 6, our configuration of subdomains does include overlapping elements, and we have used the alternating Schwarz method to precondition such overlap. However, the decomposition chiefly features conforming subdomains, and heretofore our preconditioners have ignored the coupling between such subdomains. As solving the full BNS problem by iterative methods will require a huge number of linear solves which are themselves essentially inversions of the 3d HRWE, development of a preconditioning strategy for the conforming subdomains in our decomposition is imperative. Based on our results in the paper, we believe that the interpolative decomposition can be used in this regard, and we plan to implement the ideas tested here in our developing software for constructing helically symmetric BNS spacetimes.

(a) domain decomposition

(b) close up of decomposition

Figure 6. Domain decomposition for neutron star problem. Here we show the 3d decomposition. 25

5. Acknowledgments This work was primarily supported by NSF Grant DMS-1216866 to UNM and UTB. In addition, Hagstrom was partially supported by ARO Grant W911NF-09-1-0344 and NSF Grant OCI-0904773. The conclusions and recommendations in this manuscript are those of the authors and do not necessarily represent the views of the NSF or ARO. We are grateful for discussions with D. Appel¨o and E. A. Coutsias, whose comments and suggestions have improved the manuscript. Appendix A. Incomplete LU preconditioner on an annulus For the nodal method in Section 2 we have employed a preconditioner based on an incomplete LU factorization of a corresponding finite-difference stencil obtained as follows (our construction is similar to the one given in [1]). First, in the operator (20) we drop all first derivatives in both ϕ and r, keeping only the second-order derivative terms in order to define r2 LFD ≡ (1 − Ω2 F 2 )r2 ∂r2 + [1 − Ω2 (r + G)2 ]∂ϕ2 .

(61)

Similar to our notation in the main text, we use ψjk to indicate nodal values and ψ(σ +(Nr + 1)j + k) for the corresponding values stored as a single-array vector. Here 0 ≤ j ≤ Nϕ and 0 ≤ k ≤ Nr , and σ is a shift term which locates the annulus coefficients within the overall linear system; for simplicity, we now set σ = 0. We approximate (61) by a finite difference stencil (here with the index restrictions 1 ≤ k ≤ Nr − 1) with the following form: (62)

(HFD ψ)jk = δjM AM k ψ0k + (1 − δj0 )Bjk ψj−1,k + Cjk ψj,k−1 + Djk ψjk + Ejk ψj,k+1 + (1 − δjM )Fjk ψj+1,k + δj0 G0k ψM k ,

where we have used the shorthands M = Nϕ and (HFD ψ)jk = (HFD ψ)((Nr + 1)j + k). To these equations we add boundary conditions of the form (with N = Nr ) (63)

(HFD ψ)j0 = (ψj1 − ψj0 )/Δr,

(HFD ψ)jN = ψjN ,

where Δr = r1 −r0 . Evidently, here we consider a Dirichlet condition on the outer boundary, and a Neumann condition on the inner boundary (we choose a first-order stencil for the latter in order to preserve the diagonal structure of HFD ). Other possibilities are handled similarly. Using the same notation, now define the actions of L and U as (both expressions with 1 ≤ k ≤ N − 1) (64a)

(Lψ)jk = δjM AM k ψ0k + (1 − δj0 )Bjk ψj−1,k + Cjk ψj,k−1 + jk ψjk

(64b)

(Uψ)jk = ψjk + ujk ψj,k+1 + (1 − δjM )vjk ψj+1,k + δj0 w0k ψM k .

To determine our incomplete factorization LU HFD , we compute (LUψ)jk and then match seven of the terms in the resulting expression with the corresponding ones in (62). This process yields the equations (again, all with 1 ≤ k ≤ N − 1)

(65)

Djk Ejk Fjk G0k

= jk + Cjk uj,k−1 + (1 − δj0 )Bjk vj−1,k + δjM AM k w0k = jk ujk = jk vjk = 0k w0k . 26

Enforcement of the boundary conditions in (63) is affected through (66)

(Lψ)j0 = −ψj0 /Δr, (Lψ)jN = ψjN , (Uψ)j0 = ψj0 − ψj1 , (Uψ)jN = ψjN .

When coupled with the formulas (65) above, we then have the following algorithm for generating the incomplete factorization. Algorithm 3 Construction of L and U 1: for j = 0, 2, . . . , M do 2: Set j0 = −1/Δr, uj0 = −1 3: for k = 1, N − 1 do 4: jk = Djk − Cjk uj,k−1 − (1 − δj0 )Bjk vj−1,k − δjM AM k w0k 5: ujk = Ejk /jk 6: if j < M then 7: vjk = Fjk /jk 8: end if 9: if j = 0 then 10: w0k = G0k /0k 11: end if 12: end for 13: Set jN = 1, ujN = 0 14: end for The preconditioner is then the inverse of LU, and this inverse can be applied on the fly due to the special triangular structure of L and U. Appendix B. Velocity decomposition For ϕ = θ + αz let u = u(r, ϕ) and ∇ · u = 0, with both holding inside a “pipe” P = {(r, θ, z) : r < rmax , 0 ≤ θ < 2π, |z| < ∞}. Then there exist scalar functions u0 = u0 (r, ϕ) and ψ = ψ(r, ϕ) such that u = u0 h + ∇ψ × h,

(67) 2

where h = γ (ez − αreθ ). We make the following comments: (i) here all time dependence is suppressed, as it plays no role in the result; (ii) again, we precisely mean (cf. the text just after Eq. (2)) u(r, θ, z) = v(r, θ + αz), for example; and (iii) here we view all functions and vectors as defined on the three-dimensional domain P. To prove the result, we first use identity (i) from the beginning section to write h·u a · h = 0, (68) u = 2 h + a × h, γ and therefore u0 ≡ γ −2 h · u. To show that a is conservative, that is can be written as a = ∇ψ, it suffices to demonstrate that ∇ × a = 0 follows from our assumptions. Here we are relying on the fact that our pipe P is a star-shaped domain. Since h × u = h × (a × h) = (h · h)a − (h · a)h, again by identity (i) we have    er eθ ez   h × u  =  0 −αr 1  = −(αruz + uθ )er + ur eθ + αrur ez . (69) a= 2 γ  ur u θ u z  27

The last result implies

(70)

    e re e r θ z  1  ∂/∂r ∂/∂θ ∂/∂z  ∇×a=  r  −(αru + u ) ru αrur  z θ r   ∂ur ∂ur er = α − ∂θ ∂z   ∂ ∂uz ∂uθ − α (rur ) + αr + eθ ∂r ∂z ∂z   1 ∂ ∂uz 1 ∂uθ + (rur ) + α + ez . r ∂r ∂θ r ∂θ

The er component of the last expression vanishes by the helical symmetry. To simplify the remaining components, we use the assumption 0=∇·u=

(71) to write

1 ∂uθ ∂uz 1 ∂ (rur ) + + , r ∂r r ∂θ ∂z

    ∂uz ∂uz ∂uθ ∂uθ − eθ + α − ez . ∇×a= α ∂θ ∂z ∂θ ∂z

(72)

Whence ∇ × a = 0, again by the helical symmetry. Given a helically symmetric u, computation of ψ from u would require solution of the elliptic problem. Indeed, using (4), (5), and identity (iii), we find ∇ × u = ∇u0 × h − 2αγ 2 u0 h − hγ −2 Δ∗ ψ,

(73) from which (74)

h · (∇ × u) = −2αγ 4 u0 − Δ∗ ψ.

Therefore, as the elliptic equation we get (75)

Δ∗ ψ = −h · (∇ × u) − 2αγ 2 (h · u)

which must be solved subject to appropriate boundary conditions on ∂P. Appendix C. Vector form of the helical Laplacian This Appendix establishes (5). Via a standard vector identity and identity (ii) listed above for h, we have (76)

∇ × (∇ψ × h) = −hΔψ + (h · ∇)∇ψ − (∇ψ · ∇)h.

Focusing on the last two terms in this expression, we find (77)

(h · ∇)∇ψ − (∇ψ · ∇)h =     ∂ ∂ ∂ ∂ ∂ 2 −1 −2 −α (ψr er + r ψϕ eθ + αψϕ ez ) − ψr + r ψϕ + αψϕ h. γ ∂z ∂θ ∂r ∂θ ∂z 28

Continuing with ∂θ er = eθ and ∂θ eθ = −er , we then get (h · ∇)∇ψ − (∇ψ · ∇)h

(78)



 ∂ ∂ ∂ −2 = αγ (r ψϕ er − ψr eθ ) − ψr + r ψϕ + αψϕ γ 2 (ez − αreθ ) ∂r ∂θ ∂z ∂γ 2 ∂ ez + αψr (rγ 2 )eθ − αγ 2 r−1 ψϕ er = αγ 2 (r−1 ψϕ er − ψr eθ ) − ψr ∂r ∂r ∂γ 2 ∂γ 2 ez + αrψr eθ = −ψr ∂r ∂r ∂ = −hψr log γ 2 . ∂r 2

−1

Substitution of this result into (76) with γ −2 Δ∗ ψ = Δψ + ψr ∂r log γ 2 yields (5). Appendix D. Helical reduction We first express the system (7) as follows: (79a) (79b) (79c)

Δ∗ ψ + 2αγ 4 u0 + γ 2 η = 0    γ 2 ∂u0 γ2  ∗ 4 ∇ψ × ∇u0 · h = −Q Δ u0 + 2αγ η + ν ν ∂t  

2α2 γ 4 ∂u0 1 2αγ 4 ∂u0 γ 2 ∂η ∗ 2 u0 + ∇ψ × ∇(γ η) · h = −Q + . Δη− ν ∂ϕ ν ν ∂t ν ∂t

The equivalence of (79) and (7) can be checked by evaluating the curl terms in cylindrical polar coordinates, for example using    er eθ ez   (80) ∇ψ × ∇u0 =  ψr r−1 ψϕ αψϕ  .  u0r r−1 u0ϕ αu0ϕ  The dot product with h then yields (81)

h · (∇ψ × ∇u0 ) = r−1 (ψr u0ϕ − ψϕ u0r ).

Using ∇γ 2 = −2α2 rγ 4 er , we similarly find (82)

h · [∇ψ × ∇(γ 2 η)] = r−1 γ 2 (ψr ηϕ − ψϕ ηr ) + 2α2 γ 4 ηψϕ .

Therefore, our task is to derive (79) from the Navier-Stokes equations (1) and the definition ω = ∇ × u of vorticity, subject to our assumptions (2). Our derivation will make repeated use of (4) and (5), as well as identities (i)–(iv) above. Before deriving Eqs. (79), we note that (79c) can also be expressed as   1 ∂ψ ∂ξ ∂ψ ∂ξ 2α2 γ 4 ∂u0 1 ∗ 2 2 ∂ξ 2 4 + 4α γ ξ − u0 + − Δ ξ + 4α rγ γ2 ∂r ν ∂ϕ νr ∂r ∂ϕ ∂ϕ ∂r (83) 1 ∂w 2αγ 4 − Q, = ν ∂t ν 29

where ξ ≡ γ 2 η and w ≡ 2αγ 4 u0 + γ 2 η. We then find 4 1 ∗ 1 ∗ 2 ∂w 2 2 4 2 ∂(2αγ u0 ) Δ w + 4α r γ w − Δ (2αγ u ) − 4α r + 4α − 4α2 γ 2 (2αγ 4 u0 ) 0 γ4 ∂r γ4 ∂r     2α2 γ 2 ∂u0 1 ∂ψ ∂w ∂ψ ∂w 1 ∂ψ ∂(2αγ 4 u0 ) ∂ψ ∂(2αγ 4 u0 ) (84) − u0 + 2 − − 2 − ν ∂ϕ νγ r ∂r ∂ϕ ∂ϕ ∂r νγ r ∂r ∂ϕ ∂ϕ ∂r 2 1 ∂w 2αγ − Q, = 2 νγ ∂t ν as another expression for the preceding equation.

D.1. Equation (79a). To establish this equation, we first combine the definition of vorticity with (4) and (5) to reach (85)

ω = −2αγ 2 u0 h + ∇u0 × h − hγ −2 Δ∗ ψ.

Contraction of this equation with h then yields (79a). Note that we may also write ω = ηh + ∇u0 × h,

(86)

in parallel with the decomposition (4) for u. D.2. Equation (79b). Since the velocity u is divergence free, we can write ∂u + u · ∇u + ∇p + ν∇ × (∇ × u) = 0. ∂t Now ω = ∇ × u, so from (85) and (79a) we have (87)

(88)

∇ × u = ηh + ∇u0 × h.

Therefore, dotting the second to last equation with h, we get ∂u0 (89) + h · (u · ∇u) + h · ∇p + νh · ∇ × (ηh + ∇u0 × h) = 0. γ2 ∂t In simplifying this equation, we first focus on the viscous terms proportional to ν. First, direct calculation gives

∇ × ηh = ∇ × γ 2 η(ez − αreθ )     (90) ∂ 2 α ∂ 2 2 −1 2 2 2 γ η eθ − r γ η ez . = (r γ + α γ r)ηϕ er − ∂r r ∂r From the last equation, we then have (91)

h · (∇ × ηh) = αγ 2 r

∂ 2 αγ 2 ∂ 2 2 γ η− r γ η = −2αγ 4 η. ∂r r ∂r

Second, from (92)

∇ × (∇u0 × h) = ∇u0 ∇ · h − hΔu0 + (h · ∇)∇u0 − (∇u0 · ∇)h,

we similarly calculate from (5) that (93)

∇ × (∇u0 × h) = −hγ −2 Δ∗ u0 .

Therefore, in total the viscous terms are (94)

νh · ∇ × (ηh + ∇u0 × h) = −2ναγ 4 η − νΔ∗ u0 . 30

Turning to the advective terms, notice that 1 (95) u · ∇u = ∇(u · u) − u × ω. 2 Therefore, again by the helical symmetry, h · (u · ∇u) = −h · (u × ω).

(96)

With (4) and (86), standard vector identities, and the identities obeyed by h, we then find u × ω = γ 2 u0 ∇u0 − γ 2 η∇ψ + [(∇ψ × ∇u0 ) · h] h,

(97) from which (98)

h · (u · ∇u) = −γ 2 (∇ψ × ∇u0 ) · h.

Substitution of (94), (98), and h · ∇p = −γ 2 Q (which again holds by the helical symmetry) into (89) yields (79b). D.3. Equation (79c). To derive (79c), we first substitute (95) into the the curl of (87), thereby finding ∂ω − ∇ × (u × ω) + ν∇ × (∇ × ω) = 0. (99) ∂t Contraction of this equation with h gives



∂η (100) γ2 − h · ∇ × (u × ω) + νh · ∇ × (∇ × ω) = 0. ∂t Let consider the advective and viscous terms in this equation separately. First, consider the advective term in (100). Taking the curl of (97) and then invoking identity (iii), we write (101)

∇ × (u × ω) = u0 (∇γ 2 ) × ∇u0 − (∇γ 2 η) × ∇ψ + h × ∇[h · (∇u0 × ∇ψ)] − [h · (∇u0 × ∇ψ)](−2αγ 2 h).

Therefore, contraction with −h yields

−h· ∇ × (u × ω) = (102) − u0 h · [(∇γ 2 ) × ∇u0 ] + h · [(∇γ 2 η) × ∇ψ] − 2αγ 4 h · (∇u0 × ∇ψ). Finally, using ∇γ 2 = −2α2 rγ 4 er and er × ∇u0 = r−1 γ −2 u0ϕ h, we arrive at

−h· ∇ × (u × ω) = (103) ∂u0 2α2 γ 4 u0 + h · [(∇γ 2 η) × ∇ψ] − 2αγ 4 h · (∇u0 × ∇ψ) ∂ϕ as the reduction of the advective term. We use (86) in the viscous term in (100) to reach





(104) νh · ∇ × (∇ × ω) = νh · ∇ × (∇ × ηh) + νh · ∇ × (∇ × (∇u0 × h)) . The first term on the right-hand side is





(105) νh · ∇ × (∇ × ηh) = νh · ∇ × (∇η × h) + νh · ∇ × (η∇ × h) . Eq. (5) and identity (ii) immediately determine that



(106) νh · ∇ × (∇ × ηh) = −νΔ∗ η + νh · ∇ × (η∇ × h) . 31

Next, with identities (ii) and (iii) we reduce the last term above to find

(107) νh · ∇ × (∇ × ηh) = −νΔ∗ η + 4να2 γ 6 η. Again by (5), the remaining term in (104) is



(108) νh · ∇ × (∇ × (∇u0 × h)) = −νh · ∇ × (hγ −2 Δ∗ u0 ) . Again with identities (ii) and (iii), we then have

(109) νh · ∇ × (∇ × (∇u0 × h)) = 2ναγ 2 Δ∗ u0 . Collecting the results of this paragraph, we get

(110) νh · ∇ × (∇ × ω) = −νΔ∗ η + 4να2 γ 6 η + 2ναγ 2 Δ∗ u0 as the reduction of the viscous term in (100). Finally, substitution of (110) and (103) into (100) yields ∂u0 − h · [(∇γ 2 η) × ∇ψ] = νΔ∗ η − 2α2 γ 4 u0 ∂ϕ (111) ∂η + 4να2 γ 6 η + 2ναγ 2 Δ∗ u0 − 2αγ 4 h · (∇u0 × ∇ψ), γ2 ∂t which upon rearrangement becomes 2α2 γ 4 ∂u0 1 Δ∗ η − u0 − [(∇γ 2 η) × ∇ψ] · h = ν ∂ϕ ν   (112) 2 γ2 γ ∂η 2 ∗ 4 + 2αγ Δ u0 + 2αγ η − (∇u0 × ∇ψ) · h . ν ∂t ν Substitution of (79b) into this equation yields (79c). Note that the factors in the curl terms need to be exchanged to match the expressions. References [1] C. Canuto, M. Y. Hussaini, A. Quarteroni, and T. A. Zang, Spectral Methods: Evolution to Complex Geometries and Applications to Fluid Dynamics (Springer, 2007). [2] M. O. Deville, P. F. Fischer, E. H. Mund, High-order Methods for Incompressible Fluid Flow (Cambridge, 2002). [3] S. Ya. Gertsenshtein and N. V. Nikitin, Finite-amplitude self-oscillations in a rotating Hagen-poiseuille flow Fluid Dynamics, vol. 20, issue 4, 513-518 (1985); Translated from Izv. Akad. Nauk. SSSR, Mekh. Zhidk. Gaza, No. 4, 22-28 (1985). [4] C. C. T. Pringle and R. R. Kerswell, Asymmetric, Helical, and Mirror-Symmetric Traveling Waves in Pipe Flow, Phys. Rev. Lett. 99, 074502 (2007) [4 pages]. [5] F. T. Smith and R. J. Bodonyi, Amplitude-dependent neutral modes in Hagen-Poiseuille flow through a circular pipe, Proc. R. Soc. A 384, 463-489 (1982). [6] E. A. Coutsias, T. Hagstrom, J. S. Hesthaven, and D. Torres, Integration preconditioners for differential operators in spectral τ -methods, in (ICOSAHOM 3, 1995, p. 21-38 / Spec. Issue Houston Journal of Mathematics - 1996). [7] E. A. Coutsias, T. Hagstrom, and D. Torres, An efficient spectral method for ordinary differential equations with rational function coefficients, Math. Comput., vol. 65, no. 214, pp. 611-635 (1996). [8] E. Liberty, F. Woolfe, P.-G. Martinsson, V. Rokhlin, and M. Tygert, Randomized algorithms for the low-rank approximation of matrices, Proc. Nat. Acad. Sci., vol. 104, no. 51 (2007) 20167-20172. [9] A. Toselli and O. Widlund, Domain Decomposition Methods — Algorithms and Theory, (SpringerVerlag, Berlin & Heidelberg, 2005). [10] B. Smith, P. Bjorstad, and W. Gropp, Domain Decomposition, Parallel Multilevel Methods for Elliptic Partial Differential Equations (Cambridge University Press, Cambridge, 2004). 32

[11] J. S. Hesthaven, S. Gottlieb, and D. Gottlieb, Spectral Methods for Time-Dependent Problems (Cambridge University Press, Cambridge, 2007). [12] S. R. Lau and R. H. Price, Multidomain Spectral Method for the Helically Reduced Wave Equation, J. Comput. Phys. 227, 1126-1161 (2007). We regret an error in the definition of ν − in Eq. (42). The correct expressions are ν ± = [T0 (±1), T1 (±1), T2 (±1), T3 (±1), T4 (±1), · · · ] = [0, 1, ±4, 9, ±16, · · · ] . The right-hand side of the second equation of (69) is also off by a sign. [13] S. R. Lau and R. H. Price, Sparse spectral-tau method for the three-dimensional helically reduced wave equation on two-center domains, J. Comput. Phys. 231, issue 22, 7695-7714 (2012). [14] A. D. Quintana, Preconditioners Constructed from the Interpolative Decomposition for the Variable Coefficient Poisson Problem, Master’s Thesis, University of New Mexico, July 2013. [15] M. J. Landman, On the generation of helical waves in circular pipe flow, Phys. Fluids A 2 (5), 738-747 (1990). [16] M. J. Landman, Time-dependent helical waves in rotating pipe flow, J. Fluid Mech. 221, 289-310 (1990). [17] N. Toplosky and T. R. Akylas, Nonlinear spiral waves in rotating pipe flow, J. Fluid Mech. 190, 39-54 (1988). [18] V. Frayss´e, L. Giraud, S. Gratton, and J. Langou, A Set of GMRES Routines for Real and Complex Arithmetics on High Performance Computers, CERFACS Technical Report TR/PA/03/3, July 2007. Available at http://www.cerfacs.fr/algor/. [19] E. A. Coutsias, T. Huld, and J. P. Lynov, in Preprint Volume of the Ninth Symposium on Turbulence and Diffusion, p. 212 (American Meteorological Society, Boston, 1990). [20] E. A. Coutsias and J. P. Lynov, Fundamental interactions of vortical structures with boundary layers in two-dimensional flows, Physica D 51, 482-497 (1991). [21] E. A. Coutsias, J. S. Hesthaven, and J. P. Lynov, An accurate and efficient spectral tau method for the incompressible Navier-Stokes equations in a planar channel (ICOSAHOM 3, 1995, p. 39-54 / Spec. Issue Houston Journal of Mathematics - 1996). [22] C. T. Kelly, Iterative Methods for Linear and Nonlinear equations (SIAM, Philadelphia, 1995). [23] P.-A. Mackrodt, Stability of Hagen-Poiseuille flow with superimposed rigid rotation, J. Fluid Mech. 73, 153-164 (1976). [24] S. J. Waldman (for the LIGO Scientific Collaboration), Status of LIGO at the start of the fifth science run, Class. Quantum Grav. 23, S653-S660 (2006). [25] F. Acernese et al., The Virgo status, Class. Quantum Grav. 23, S635-S642 (2006). [26] H. L¨ uck et al., Status of the GEO600 detector, Class. Quantum Grav. 23, S71-S78 (2006). [27] M. Ando and the TAMA Collaboration, Current status of the TAMA300 gravitational-wave detector, Class. Quantum Grav. 22, S881-S889 (2005). [28] J. W. York, Jr., Conformal “Thin-Sandwich” Data for the Initial-Value Problem of General Relativity, Phys. Rev. Lett. 82, 1350-1353 (1999). [29] H. P. Pfeiffer and J. W. York, Jr., Extrinsic curvature and the Einstein constraints, Phys. Rev. D 67, 044022 (2003) [8 pages]. [30] S. Bonazzola, E. Gourgoulhon, and J.-A. Marck, Relativistic formalism to compute quasiequilibrium configurations of nonsynchronized neutron star binaries, Phys. Rev. D 56, 7740-7749 (1997). [31] S. Bonazzola, E. Gourgoulhon, and J.-A. Marck, Numerical Models of Irrotational Binary Neutron Stars in General Relativity, Phys. Rev. Lett. 82, 892895 (1999). [32] E. Gourgoulhon, P. Grandcl´ement, K. Taniguchi, J.-A. Marck, and S. Bonazzola, Quasiequilibrium sequences of synchronized and irrotational binary neutron stars in general relativity: Method and tests, Phys. Rev. D 63, 064029 (2001) [27 pages]. [33] B. Bromley, R. Owen, and R. H. Price, Periodic standing-wave approximation: Nonlinear scalar fields, adapted coordinates, and the eigenspectral method, Phys. Rev. D 71, 104017 (2005) [27 pages]. [34] C. Beetle, B. Bromley, and R. H. Price, Periodic standing-wave approximation: Eigenspectral computations for linear gravity and nonlinear toy models, Phys. Rev. D 74, 024013 (2006) [17 pages]. [35] C. Beetle, B. Bromley, N. Hern´ andez, and R. H. Price, Periodic standing-wave approximation: PostMinkowski computations, Phys. Rev. D 76, 084016 (2007) [16 pages]. [36] N. Hern´ andez and R. H. Price, Periodic standing-wave approximation: Computations in full general relativity, Phys. Rev. D 79, 064008 (2009) [13 pages]. 33

(M. Beroiz and R. H. Price) Center for Gravitational Wave Astronomy, Department of Physics & Astronomy, University of Texas at Brownsville, Brownsville, TX 78520 (T. Hagstrom) Department of Mathematics, Southern Methodist University, Dallas, TX 75275 (S. R. Lau) Department of Mathematics and Statistics, 1 University of New Mexico, Albuquerque, NM 87131

34

Highlights • Sparse spectral-tau methods • Helically symmetric flow • Correction of a multidomain preconditioner • Interpolative decomposition