Domain Decomposition: an approach to building efficient linear solvers

Gabriel Mateescu

Table of Contents


Introduction

This essay aims at convincing the reader that domain decomposition is a powerful strategy for obtaining efficient solution methods for partial differential equations. The definition of domain decomposition given by Smith, Bjorstad and Gropp is:

        domain decomposition refers to the process of subdividing the solution
        of a large linear system into smaller problems whose solutions can be
        used to produce a preconditioner (or solver) for the system of equations
        that results from discretizing the PDE on the entire domain.

There are two main classes of domain decomposition strategies:

In this document we consider only the substructuring variant of domain decomposition.

A domain decomposition approach provides a way of preprocessing and preconditioning the algebraic representation

A x = b             (1)

of a boundary value problem. Preprocessing and preconditioning based on domain decomposition achieve:

A remark is in order with respect to the last item on the above list. Many problems arising from modeling physics phenomena have the property of global dependency, in the sense that the value of the unknown function at each grid point depends on the values of the unknown function at all the other grid points. This implies that totally independent subproblems cannot be defined. However, the effect of "close" neighbors is much larger than the effect of "remote" neighbors, which gives an opportunity for parallelizable preconditioning of the problem.

Note that domain decomposition provides a unifying framework both for preprocessing (reordering) and preconditioning, in the sense that the block structure of the matrix obtained with domain decomposition reordering leads naturally to a preconditioner.


Paradigm of Krylov-based iterative methods

Preprocessing and preconditioning are components of a Krylov subspace iterative solution method. We can break down the process of solving a linear system with a Krylov subspace method in three stages: Note that we call M the preconditioner, while some authors call M-1 the preconditioner. The above variant of preconditioning is called left preconditioning, because A is left-multiplied by M -1.

As an example of preprocessing method, we cite the TPABLO reordering scheme -- introduced by Choi and Szyld and applied as a preprocessor for ILU by Benzi, Choi and Szyld -- which achieves the low fill-in property using a symmetric permutation of the matrix A. The permutation is determined with three-parameter criterion for finding the diagonal block to which a vertex of the adjacency graph belongs. An independent set reordering for a given matrix A is a symmetric permutation P of A such that

                          | D       F |
        P T A P =   |               |
                          | E       C |
where D is nonsingular and diagonal.

Some popular preconditioning methods are incomplete LU factorizations (denoted by ILU(k)), and threshold ILU. Here k is the maximum level of fill of undiscarded elements. Threshold ILU is denoted by ILUT(p, tol), where p and tol are parameters used in the drop rules.


Preprocessing

Preprocessing reorders the rows and columns of the matrix A. Preprocessing can achieve one or both of the goals: Usually, preprocessing reorders matrices which are obtained using natural ordering of the grid points. Natural ordering is a well known method. It leads to matrices having desirable properties; for example, for second order partial differential equations defined on subdomains of R2 which are discretized with finite differences using the 5-point stencil, the resulting matrix has a block-tridiagonal structure.

Two examples of reordering schemes are TPABLO and independent set reordering, the latter used by multi-elimination block-ILU. Both start with the matrix A obtained with natural ordering and apply a symmetric permutation to the matrix. Successive reduction applies recursively symmetric permutations. The TPABLO preprocessing method improves the condition number of an incomplete LU factorization preconditioner. It reduces the number of fill-ins in ILUT and IQR factorizations and improves the convergence of incomplete LU factorization preconditioned GMRES, Bi-CGSTAB, and QMR. TPABLO attenuates the undesirable property of incomplete LU factorization of potentially generating an ill-conditioned matrix M even when A is well conditioned.

Domain Decomposition based Preprocessing

Domain decomposition preprocessing achieves the goal of generating a block structure of the matrix A for which a good preconditioner can be constructed. The effect of applying domain decomposition reordering is to obtain an "arrowhead matrix pattern" (adjacency graph) for the reordered matrix PT A P .

Like many other reordering schemes, domain decomposition preprocessing is also using natural ordering as the starting point. The reordering generated by domain decomposition can be mathematically be described as a symmetric permutation. However, the idea of the reordering is based on the substructuring approach pioneered by Przemieniecki. Two essential advantages of this approach are:

Domain decomposition reordering works as follows:
  1. partition the domain into subdomains and define three kinds of nodes
  2. order the subdomains (e.g., in natural order) and number first the interior points in each subdomain, following the subdomain order. Second, number interface points, and third number the cross points.
Let B be the matrix generated from A by the above reordering. Let p be the number of subdomains. The matrix B has an "arrowhead" matrix pattern. This arrowhead matrix contains p + 2 diagonal blocks in which the first p diagonal blocks represent the subdomain problems, the next block represents the interface (edge) problem, and the last block represents the crosspoint problem. The matrix has the important property that there is no coupling between interior unknowns in different subdomains. This property is essential for parallelism.


Domain Decomposition Based Preconditioning

After reordering the matrix by a preprocessing method we can take advantage on the block structure of the resulting matrix to construct a good preconditioner. First, let's define what are the desirable properties of a good preconditioner. A good preconditioner: The methods to solve preprocessed problem may actually solve the reduced Schur complement (RSC) problem or may iterate on the full matrix A. As Yousef Saad observes, a preconditioned Schur complement iteration is mathematically equivalent to a certain full matrix preconditioned iteration. The advantage of the full matrix iteration is that it does not need to explicitly multiply the Schur complement operator S by a vector, therefore the convergence of the iterative process does not depend on the precision of the computation S u .

For the sake of brevity, we give next a Krylov subspace iteration for the RSC problem:

  1. Form the RSC problem, which involves the interface unknowns
  2. Solve the RSC problem with a Krylov subspace method preconditioned by the matrix

    B = U D U T

    where U is a basis change from a hierarchical basis to the nodal basis restricted to the interface, and D is a 2-block diagonal matrix. The top diagonal block of D is itself a block diagonal matrix, with each block being an interface preconditioner for a corresponding edge. The bottom block of D is the discrete form of the PDE operator on the grid defined by the crosspoints.

  3. Solve (in parallel) the subproblems associated with the interior points.


Domain Decomposition based parallel iterative solvers

Domain decomposition provides a framework for obtaining well condtioned and efficient preconditioners. The approach achieves this challenging goal by employing physics-driven (model-driven) reasoning.

A domain-decomposition-based linear system solver will operate as follows:

Note that applying the preconditioner has good parallelism: once the cross-point and interface point unknowns are determined, the unknowns associated with interior points in different subdomains can be determined in parallel. The parallelism of applying the preconditioner is an essential feature required for effeicient solution of large linear sytems.


Special Preprocessing for Collocation Discretization

Now that we have the big and pretty picture it is time to go into the dirty implementation details. One such detail is that some discretization methods, such as collocation, require special effort in order to get the natural ordering style of matrix structure. This section explains the problem and the solution.

For the case of collocation-based discretization, simply scanning the grid nodes in natural order does not provide a natural-ordering-like matrix pattern. Therefore, an initial preprocessing step is needed before performing the domain decomposition reordering. The reason for this peculiarity is that -- unlike, say, finite difference discretization, where for each grid point there is one unknown and one equation -- in the collocation method there are more unknowns and equations associated with a grid point.

We illustrate the problem for a collocation method which employs piecewise Hermite bicubics basis functions. However, the ideas, with slight modifications, apply to other collocation methods, e.g., quadratic spline collocation. With piecewise Hermite bicubics, for each grid point G(i, j) we have associated four unknowns, namely

u(xi, yj), uy(xi, yj), ux(xi, yj), uxy(xi, yj)

For each node n the four unknowns associated with n are numbered with

4n-3, 4n-2, 4n -1, 4n.

On the other hand, for each grid cell, where a cell is defined by four grid points G(i, j), G(i, j+1), G(i+1, j+1), and G(i+1, j), the PDE is imposed at four collocation points. The four equations associated with grid cells while the unknowns are associated with grid points. In the "standard" collocation approach, the four collocation points are numbered in cell-order, also known as collocation order. If the discretization uses the collocation ordering then the resulting matrix has mostly zeros on the main diagonal and around it.

The problem is caused by the equation ordering scheme not folllowing the unknown numbering scheme. Therefore, we need a preprocessing step whose goal is to permute the equations so that for each node n the four enclosing collocation points are numbered with

4n-3, 4n-2, 4n -1, 4n

that is, exactly the same way as unknowns are associated with n.