The asymptotic distribution of rounding off errors in linear transformations

The asymptotic distribution of rounding off errors in linear transformations

TRE ASYMPTOTIC DISTRIBUTION OF ROUNDING OFF ERRORS IN LINEAR TRANSFORMATIONS* V. V. VOEVODIN ?~oscow 9 December (Received 1966) 1. LET G be a simp...

739KB Sizes 0 Downloads 57 Views

TRE ASYMPTOTIC DISTRIBUTION OF ROUNDING OFF ERRORS IN LINEAR TRANSFORMATIONS* V. V.

VOEVODIN ?~oscow 9 December

(Received

1966)

1. LET G be a simple-connected convex closed region in real n-dimensional sperce f?. The vectors z of G are assumed to be random quantities, the distribution density of which is a continuous function D(Z), where ?(z) > c > 0. Let a sequence of transformations on the vectors Z, the transformation

:]I, I/,, . . ., ‘!h, matrices being

k < ;I be

performed

r= 1,2,..., k.

U, = E - atblr,

Here a, and b, are n x 1 matrices (vector-columns). Most direct and iterative methods of solving problems of linear algebra are based on such transformations. Let zr denote the vector format ions: then

obtained

from z c

C after

the first

zr--i - (b,, z+i)a,.

Zr =

r trans-

(2)

We cannot in fact perform these transformations, since all the computations have to be realized to a finite number of places. Suppose the computations are performed to t binary places after the point, while intermediate results having more than t places are rounded off according to themselves will be dethe usual rule (see below). The transformations fined by the matrices (t)=

u

E

_

where the vectors *

Zh.

vychis

1.

a,

ct ),

b, ct

Mat.

mat.

Fiz.

b(‘)

d”

r

r

T

) are obtained 7,

5,

3

965

7

from a, and b,

- 976,

1967.

by rounding

4

V.V.

Voevodin

off their coordinates to t places. In many problems of linear algebra, e.g. when solving sets of linear algebraic equations, it is unimportant whether the transformations (2) be performed with the vectors a,, b, or c_x~(~), b,.(t), the only thing that matters is that the actual transformation be performed fairly accurately. It is therefore assumed in future that the vectors z,. are computed from

(3) But even in this case, only the vector zr (t) is actually where

computed,

and E,.(~) is the error introduced at the r-th step by the inaccurate realization of (3). We have

zr

= ...=

[J(l) u’ft T 7

1..

.

...

UP(2 + 2 + u,‘“’-’ +

. * *

j-p-

El

(0

LI;(‘r’. *, $r*

4

+

up SF> )

A

82

(0

+.

. .

,

Here su(t) is the error obtained from rounding off the components of z to t places after the point. From the last formula, zp ft ) can be regarded as the result of accurate tr~sfo~ation of the vector in the brackets. The latter differs from the given vector z by

which, following ElII will be called the equivalent perturbation. Major& inequalities for the equivalent perturbation are examined in [ll . In the present paper nr tt ) is investigated as a function of the random argument t. In addition, we differ from [tl in examining only the effect of the error on the transformation equivalence, so that errors in computing the vectors +(tf, b,(t) can be ignored, and as a result, rather better inequalities obtained. It is shown below that, as t + co, the principal term of nr (t) is the sum of independent uniformly distributed quantities. Bounds are obtained for the principal term and the terms of higher order.

Rounding

off

errors

in Einear

2. Suppose that, when realizing (3), to double the number of places: then

5

transformations

the scalar

product is computed

where 5, ct ) is the error from rounding off the scalar product (!I?!;, z:!.~), and (srft ) the error from rounding off the product of the rounded scalar product and the vector a, (t 1. Obviously,

‘Ibe second inequality means that none of the moduli of the components of cryct ) is greater than 52-t. From (5) and (6),

where

The error distribution for the main numerical. methods is shown below to be determined by the function vr tt ). We therefore examine v,(t) as a function of the random argument z, and obtain an upper bound for K, (t 1. With a, ft)

fixed.

prod&$ (b’“” cit r , z,._&

the error u, tt) depends only on the rounded scalar Since the vectors subjected to tr~sfo~ation

are

much more numerous than the possible rounded off values of (@,

z$!r),

a certain number of vectors z!?, will necessarily ur (t ) at the r -th transformat ion.

have the same error

Let G(t) denote the set of vectors into which vectors of G transform after rounding off their components to t places, Obviously, G(t) C G, except for points close to the boundary. Divide Jft) into non-intersecting subsets

G&...Rr,

so that each only contains vectors of Z(t)

which have the same rounded off scalar products, namely, k,, kz, . . . , k, during all transfo~ations from the 1st to the r-th. The subsets

G&.-A,

will be termed bundles of the r-th order. From

6

V.V.

Voevodin

what has been said, at least some of the bundles consist of more than all the vectors belonging to a given one vector. From the definition, r-th order bundle will yield the same error v,(*). In addition, clearly

Our immediate tasks are to examine the errors which a bundle carries, to investigate a bundle structure, and to evaluate the probability of coming within a bundle. We define

the deviation

of a number x from the nearest

wt

LrT= where Ix)

denotes

the fractional

number x rounded off {

1,

L

1,

(

(4 -

component of which is obtained of the initial vector. It is easily

(5) <

if

{x> b %,

If

a vector

l/2,

in a sign

is contained

of the operation by applying

by

By we denote the

part of x [21.

to t places.

> , the result

1,

if

integer

vector,

is another

the operation

each

to the component

shown that

(a> = a -

2-fL2’a7,

so that (0

csr = The deviation

-

from the nearest

of the fractional

part,

while

2-q__ 2t ( integer

1 be the rounded off

error

of.‘)

will

clearly

i.e.

all

The layer angles

the vectors thickness

to b,(t).

scalar

be contained

*/,2-t

l-

<

belonging is

2-‘~#‘~’

is a piecewise

linear

the number 2f((b(,‘ls$z1 )>

Investigation of the error carried ing the distribution of fractional runs over a sequence of integers. Let

(b!‘),&)> a?’1. function

is an integer.

by a bundle thus amounts to examinparts of the vectors ka,(t ), where k

product in all

(by’ , &)

for the

< I+

the vector &,

z,_x. The same ^(*)

for which

1/22-t,

to a layer with one boundary excluded. while

its

plane

faces

are at right

Rounding

off

errors

in

linear

transformations

7

whence the transformation of vectors belonging to a given bundle amounts merely to the operation of translation, since the vector -

&)

((b!“,

) af” + $)

This means that region lar

all

will

be constant

the vectors

for

bl’,

nesses

(0 ,. . . , by) bz

vectors

non-intersecting,

and

U

respectively

of two distinct

G,ka...kf-

r+i

ha

=

all

the vectors

of the region

transform

to t places

after

in the bundle

belonging

to the coordinateexes,

boundaries,

into

the point.

Gk$!*...k

perpendicu-

None of the regions

bundles,

all

u Gk, =

..k r)

to a

these

regions

are

G.

kl

Denote by E, the set of vectors parallel

belong

and having the thick-

.

kr+i

and sides

of the bundle.

of S with r layers

2-tII bi(” IIs-’ & 2-t)I bf’ IjE’, . . . ,2-‘11 br(‘)[ii

Gk,k2..:kr contains

vectors

G&..:k,

of the bundle

Gk,k,..:kTI which is the intersection

to the vectors

all

f

G n E,,

the vector

to the cube with centre equal to 2-t.

If zc

with the exception z after

The probability

rounding

z

GW,

of certain

their

components

p(z c G$z...kr)

of z being

is

P(z C

Gk:!~.:.kr) = 1 P(W, G(Jd A,Al...Ar

where (E)

U

Gk,k,.:.kr= z

c

G(‘) A,Ao...Ar

(GnEz).

3. Some propositions from the theory of numbers will what follows. Given [21 a finite number of vectors z (9), components Zj ‘9’, distributed in the unit cube, i.e.

Let a and P be n-dimensional

vectors

be needed for 1~ q ~2, with

with components oj and Sj,

8

V.V.

Voevodin

G < aj < Pj < 1. Denote by F(a, in the parall elepiped

p) the number of vectors

Zj”’

CZj <

of volume ll(Pj - oj).

is called

<

z(q),

lying

pj

Then,

the deviation

Given an infinite first (2 vectors. If

of the vectors

vector

sequence,

z(q). let DQ denote

the deviation

of

its

limDq=O, Q--

the sequence unit cube.

will

of z(g)

be said

to be uniformly

distributed

in the

From bl, a sequence of vectors {k(3), k = 1, 2, . . . , is uniformly distributed provided the components of the vector 8 are rationally independent together with the number 1. Take a sequence of vectors G,, p = 1, 2, . . . . convergent

to 8, and construct

vectors && + Y), { (kP + WP 0, the deviation of this system. ‘%eorem

for

each p a set of

+ ~1,. . . , { UG + Wb Then,

+ ~1.

Denote by

1

If the vector

8 has components rationally

independent

together

with

t e number 1, and lim I, =

lim 0 p = 0,

00,

P-+-J

P--J

then IimD, P+-

uniformly

with respect

to k,

regions

lemma establishes

‘%,k*.:.k, @es

0

and y.

We shall not dwell on the proof no particular immediate interest. The following

=

since

it

the ratio

is very laborious

and is of

between the volumes of the

&,k*...k ,,) and &!$*...k, (IUS ($2 ,...,+.

~ound~~~

off

errors

in

linear

9

transformations

For almost all vectors bl, bz, . . ., hr. mes Gk,k,.:.k_

the conversance

r =

being uniform with respect to almost all k,,

ks,

. . . , k,.

The underwing idea of the proof can be seen by taking the ease IZ= 2, 1, to which we shall confine ourselves for simplicity.

It can be assumed without loss of generality that the elements of t?(t) are the base-points of a mesh with unity interval. To prove the lemma, it only needs to be shown that the number of elements of -i(t) occurring in a layer is asymptotically equal to the volume of the layer. All the points of Zct) on a given straight line will be called a ray. The number of points in a layer is equal to the number of its projections (allowing for multiplicity~ on the vector blct 1, which enter into the projection of the layer itself on to this vector. The projection of a layer is a semi-interval

of length l]bj”’//il.

Project the points of a ray of Gctf on to the vector bl(t). These projections form on bl(t) a uniform mesh with interval cos cyct), where q(t) is the angle between the ray and bl(t ). Projection of the points of the adjacent rey is equivalent to a shift through sin act) of the first mesh leftwards or rightwards. If the deviation of sin q(t )/cos g(t ) is convergent to an irrational number (which is defined solely by the vector blct 11, then, by Theorem 1, the number of points belonging to any F adjacent rays and the projections of which on to b,(t) come within a fixed segment, is asymptotically pro~rtional to p, independently of the position of the segment. Independently of the position of the layer, its volume is asymptotitally proportional to the numberof reys intersecting it. Hence the lemmafollows from the fact that

kg

kt

The convergence (‘7) will be uniform for sll leyers except those cant inu ous to the points of intersection of the vectors b, tt ) with the boundary of ii.

10

V. V. Voevodin

Given the conditions

the convergence

being uniform

The function division

of Lemma 1, we have

o(z)

of C into

with respect

is uniformly

to almost

continuous

non-intersecting

regions

all

kl,

in G, so that, gi(e),

gs(s),

kz,. . .,k,. given E, a

. . . ,gN(e),

can

be found such that the deviation of P(z) fran the mean value on each K~(E) is not greater than E. Obviously, the gi(e) can be regarded as regions G of Lemma 1. Consequently,

mesgf(4 fl GM....k,.

lim

t-+m mes gi (e) Let the mean value mean value theorem

=

C-U

1

n G6fds...k P

*

> 0. Using the

of P(z) in the region gi(e) be pi&c for an integral and (9), we have

P(z c gi (8) II G~tkr...k~) = *- P (z c gi (e) n G,$.. .Q

lim

=

lim(~‘+O(e))mesgf(e)nGh,~...~,

= 1+

*-((~~+O(e))mesg,(e)nG~~~...h~ Hence, given

E,

a

to(~)

(idO(e))P(ZCgi(e)

can

be found such that,

for

all

fV%....g,

O(e) 2 0.

Further,

p(Z c Gitlk2.../z,.) =

t

G&...

k,.)

[ ~~~zcg,(e)nGk~k~...k~)] i=l N

whence, for all

t 2 to(~),

tato(~),

,I G W = gt(8) nG~,,k,.~.k ,) G

nGcm ~...k

G (1 +O(eHW=g4e)

p(Z

O(e).

X

Rounding

off

*-O(e)< Since

e is arbitrary,

Now fix

in

linear

(2 C Gk,i+../a,,>

p

P(z c G&,...kr ) fl

+0(e).

small number u > 0 and consider

from 4 by cutting

out the boundaries

Gk,kz...kr, for which the uniform convergence shall

11

transformations

the theorem follows.

an arbitrarily

Gg) , obtained

errors

(8)

the region

of the regions

is not satisfied.

We

assume here that

P(zcGtf')> 1-p. It can be assumed without loss of generality that. the error ur ct ) is distributed in the unit cube, having the same number of dimensions as the non-zero coefficients of the vector a,. Let S, be a parallelepiped inside this cube. Then, Lemma 2 We have p

lim t-co the convergence

(or(‘)c S,, 2 c

Gk:‘+k,_J

mes S,P (2 G Gk$, ,.. A,_~)

being

uniform

for

all

bundles

=

1

t

Gf).

of

We have [31

P(2)

c

sr,

zc

=

Gdfk, ...R

Notice

that,

all

bundles

s,/zc

Gk’fh,..

k,_i

) = mes S,

of GY).

as t - COeach bundle of order

generates an infinity . . ., b, are linearly

),

if we can show that

lim p (u!t) C

t-ww for

)=

P(~cG~~~~...k,_~)P(urcSr/z=G~fk,...k,_,

so that the lemma follows

uniformly

r_-i

r

-

1 in the region

Gr)

of r-th order bundles provided the vectors b 1, b2, If all the r-th order bundles were independent.

V.V.

12

equiprobable, this relation Theorem 2. Given

E

>

Voevodin

(IO) would follow trivially from Theorem 1. The proof of in the general case is very similar to the proof of

0, we find a division

of the region

Gf,f) into non-inter-

secting regions ci(e) bounded by planes perpendicular to the vectors (1: b (0 and the boundaries G$). It will be assumed that all b I , 2 ,**-, bQ)r the r-th order bundles in each qi(~) factor

(1+0(a))

for all

According to the division

are equiprobable apart from a

t 3 ti(s).

of

We get

L33

G$) into regions q i(E),

limP(a!“cS,/zcqi(e)nG~~~*...k,*)= t-t=

mesS,(l

We have t-Q(s))

uniformly for all bundles and qi (E). Since E iNarbitrary, follows. Theorem

(10) now

7

independent The errors a? , r = 1, 2, *a., n - 1, are ~ymptotical~ uniformly distributed entities for almost all vectors or and b,. Using Lemma2, lim P (PC t-w= =

&) = lim[P(lo!f)c: S,.,z c Gf’) +

P (as*)C S,, 2 E G$)l

t-too

lim z t-k03[ $1

P(o,“‘c

ST, 2 c Gf’n G&

kik-. . . . k,_*

+P(df’cS,,

z=G:l’)]=mesS,+O(p)

... k,_i ) +

=

13

a**~o$‘cE 8,) lim t+m

~P(~CG~~~l...k,_l)P(~jf)CSr/~CG~~~:...I,_ F 1) =

IfI p (2 c: qtJ, ._.x,_J

C mes S, lim P ( CT?, c Sr_*, . . . , ai

c

Si) =

mes S, me.9S,i . . . mes Si.

The summation in these latter sums is only over those bundles which give errors from S,, S,, . . ., S,_,. 4. Let v=ai+az+...+ UT be a random ~antity , equal to the sum of random independent quantities oi, each of which is uniformly distributed in a parallelepiped with centre zero and sides of length

oil,

GiZ,s**,

cfir.

The

mathematical expectation of llylt2 is given by Ml~llJ? =

‘/12

2

oij’.

iJ

This is easily verified directly. Now take some concrete transformations fitting into the scheme (1) and encountered in numerical methods of solving problems of linear algebra t41. Gauss’ s method (without division of a row by a principal element). It can be assumed without loss of generality that the matrix U, in this method is

14

V. V. Voevodin

Consequently , r-1

a lp’=

(n,

b,‘=(O

,...,

All the above assertions tion,

Sf)=O,

r=l,

0, 0,

%tl, r9 * . -

I, 0,

9

%r)7

. . ..O).

hold for the unit vectors b,,

and in addi-

1. Further, only the first

2,... *n-

are different

in the matrix Uf’-’ V.$f)“. . . Ul”-’

columns of the unit matrix, while the first

r

r

columns

from the corresponding components of the vector

u, ft ) are zero. Hence

Since n - 1 transformations are required to realize we have

Method

of reflect

Gauss s method,

In this method the matrices Ur are orthogonal,

ions.

and = 1. u p = E - 2W
rlx:~*II2 ,p+liv

lim

f-+m

‘Jethod

of

22f

lim JfltdLll2 4

ort~o~on~~~zut

*

22f

f-km

(without normali~tion

ion

this method 1

0

1

u, = %l,

0 Hence

n* Q.

-

1.

. .

%+I,

P

1

. 1

of the rows). In

Rounding

off

a,’

errors

in

=(O_,

b,’ = (a,+~,1,

. . . .

linear

15

transformations

I,0

O),

,...)

a,+~,r, 0, 0, . . . , 0).

It, follows that ur ct) = 0. Further, only the first

r columns of the matrix Uf’-‘Us’-‘.

differ from the columns of the unit matrix, while the first of the vector a,(t) are zero. Hence

uiw-1u;tr’

. . Uf.“‘-’

r components

. . . u,“’--i6,(t) a,(1) = 6:’ a:‘.

Finally,

An interesting than a sixfold

point is that,

if we consider

improvement in the inequality

M/I I&\$,

not more

is obtained.

5. Taking the example of Gaussian transformations, we now describe the method of estimating the most probable value of the principal term of the norm of the equivalent’ perturbation, We have

1

max Ilv!$Il~ < 2-t n3 ZCG

(12)

12’

Let oi’,“’ denote the i-th component of o,(? The error of o$) in each bundle of order r - 1 is none other than the error from multiplying two numbers, one of which is the i-th component of the vector o$) , and the other the rounded off value of the scalar product (brcf), z$). using IAt

lM(d?/zc M

Gk,kr.:.k,._,) I = O( (~,...k,_~2’)

&) (Ci$z C &$!..A

42)

Hence,

2-f,

= (1 + O( (%..:k _,2Y’~9)~.

in the direction where 2k,...k?_, is the nlength” of the bundle G/,!.k,l of the vector

6,

(t ).

Further, the function

‘&

is constant in each bundle of order r - 1,

16

V.V.

Voevodin

so that

i==r-+-i

i=r+l

G(t) kl...k,_l

i--,-i-1

Here, the constants of the terms O(2-tfi) depend only on the region I; and the distribution density Yz), and are independent of n. Without dwelling in detail on the truth of these assertions in these last relationships, we merely remark that it follows from the existence of the math~ati~al expectation of the terms ~((~2~)-“~~, where x is the “length” of the bundles. Consider ~11~~~~11~ in the light of these relationships:

G

Mll~~~*ll~ + G(1

+0(2-t/2))+

n~2-~~0(2-~/~)sg.,

.

2-2fn2 . . .

zg 7(1+

nq2-f/2)).

Rounding

off

This gives us asymptotic the

dispersion

of

The final equality

[31:

I’(ll&llE

Q Y $$I



solution

+

in

errors

formulae

for

\Iv~>,~/~;. In fact

of our

linear

is

+-0(2-@)))=

>

the

mathematical

17

expectation

and

Chebyshev’s

in-

[31,

problem

+ nOC-f’W

transformations

obtained

by using

I--P(IIY,-IlIE>

b 1 -P(

1IldflillE - MII&IIE~

(Y- 1)2-‘n(IfnO(2-t/Z)))>+

> (y&2.

-f2d

In particular,

P(llv:i& < These

results

are

in good agreement

n2-ya0.93.

with

both

(11)

and

Translated

(12). by

D.E. Brown

REFERENCES 1.

WILKINSON, J.H. The Press. 1965.

2.

CASSELS, J.W.S. Introduction Cambridge U.P., 1957.

3.

GNEDENKO, B.V. Course on probability theory nostei), Moscow, Gostekhizdat, 1954.

4.

VOEVODIN,

algebry) 5.

V.V.

algebraic eigenoalue problem, Oxford, Clarendon

Computational

, Moscow - Leningrad,

to Diophantine

methods

Nauka,

of

approximations.

(Kurs teorii veroyat-

algebra

(Chislennye

metody

1966.

KIM, G. and CHIBISOV, D.M. Rounding off error dlstrlbution when multiplying two numbers on a computer witi fixed memory. Mat. Zametki 1, 2, 225 - 234, 196'7.