Automatic differentiation in MATLAB

Automatic differentiation in MATLAB

Applied Numerical Mathematics 9 (1992) 33-43 North-Holland 33 Lawrence C. Rich The Mathematical Software Group, 54 Sequoia Drive, Newtown, PA 18940,...

1MB Sizes 2 Downloads 133 Views

Applied Numerical Mathematics 9 (1992) 33-43 North-Holland

33

Lawrence C. Rich The Mathematical Software Group, 54 Sequoia Drive, Newtown, PA 18940, USA

David R. Hill Mathematics Department, Temple University,Philadelphia, PA 19122, USA

Abstract Rich, L.C. and D.R. Hill, Automatic differentiation 33-43.

in MATLAB, Applied Numerical Mathematics 9 (1992)

Automatic differentiation is a chain rule based technique for derivative evaluation introduced by Wengert (1964) (see also Wilkins (1964)) and further developed by Rail (1980, 1981, 1983). With the development of more powerful programming languages automatic differentiation has received increased attention as a viable alternative to symbolic differentiation and finite difference methods. (See Griewank (19891.) Here we present a C language implementation of automatic differentiation that can be called from lI4ATLAB, a highly interactive robust scientific computing environment. Our implementation accepts a multivariate function and a point as input returning a function value and gradient, hence the Jacobian at a point can be constructed. The implementation along with examples is described. The code is available from the authors.

Keywords. Automatic differentiation,

parser, data structure, C language, gradient, Jacobian, MATLAB.

1. Introduction

I. I. Automa tic differentiation

First partial derivatives can now be computed easily and accurately within the MATLAB ’ software environment using a new C language implementation of automatic differentiation 2. Automatic differentiation is a chain rule based technique used to obtain numerical values for the derivatives of a multivariate function simultaneously with the function evaluation without appeakrg to numeric or symbolic differentiation. Our C language function GRAD [25] automatically computes the value of a multivariate function and its gradient vector of first partial derivatives at a given domain point. G called from MATLAB with a character string representing a function f of n variables and a point X in n-space at which f and its first partial derivatives are to be evaluated. A vector of ’ MATLAB is an interactive computing environment that provides access to quality matrix computation software as in LINPACK and EISPACK. 2 The program we describe is also available from the authors as a stand-alone C function on C’s and the VAX.

L.C. Rich, D. R. Hill / Automatic differentiation in MA I-LAB

33

rz + 1 values is returned from GRAD representing f(X) and the first partial derivatives of f at X. In the following we compare GRAD to several implementations of automatic differentiation, explain how GRAD was implemented, illustrate how GRAD is used in solving a system of nonlinear equations, and indicate that GRAD can be used in many applications that require the computation of first partial derivatives. A wide variety of automatic differentiation implementations appear in the literature. Here we mention several to indicate how our work relates to the existing imple*.rentations. 1.2. Table algorithms In the works of Kalabs and others [ 12-15,I 1,30,5], a table approach to automatic differentiation is used. In this approach the values of a multivariate function and its partial derivatives are obtained by decomposing a multivariate function into elementary functions. The elementary functions are then evaluated by calling a series of subroutines that compute the value of each elementary function and its partial derivatives simultaneously. The user must compute the partial derivatives of each elementary function by hand, translate them into subroutines, and then manually order the subroutine calls in a program before compiling pnd linking to produce an executable code. One advantage of this approach is that it can be extended to compute higher-order partial derivatives by continuing the table. However, when the multivariate function is changed the process outlined above must be repeated before an executable code can be produced. i3

Operator overloading

An alternative approach of operator overloading is used by Rall in a Pascal-SC implementation [23] of automatic differentiation. Pascal-SC allows the user to redefine the standard arithmetic operators so they can be used with user defined data types. The GRADIENT data type defined by Rall represents the value of a function of n variables and its gradient vector of first partial derivatives with respect to the 12independent variables. By redefining the standard arithmetic operators to perform GRADIENT arithmetic [24] the values of the first partial derivatives of the function can be computed simultaneously with the function evaluation (see Appendix A for a brief explanation of gradient arithmetic). The main advantage of this approach is that the user is not required to decompose the multivariate function into elementary functions, since this is handled by the Pascal-SC compiler. However, as in the table algorithm, each time the multivariate function is changed the program must be compiled and linked to produce an executable code. A similar approach is used in [7] where the implementation is in ADA. See also [ 10,181. I. 4. Precompilers

Several FORTRAN precompilers exist [4,6,9] that accept as input FQRTRAN source code for the evaluation of a multivariate function. Each precompiler translates the given source code into a FORTRAN subroutine that computes both the value of the multivariate function and its partial derivatives. The resulting subroutine must then bc compiled and linked with the

L.C. Rich, D. R. Jill / Automatic differerztiation in MATLA B

35

FORTRAN application code to produce an executable code. This method offers flexibility in the complexity of the multivariate functions that can be differentiated, As with the methods mentioned above, each time the multivariate function is changed a compile and link step is required to produce an executable code. 1.5. Forward and reverse .modes

The implementations of automatic differentiation desrribed in Sections 1.2 and 1.3 operate in what is termed the forward mode. That is, the value of a function and its partial derivatives are evaluated simultaneously. In the reverse mode due to Speelpenning [27] the partial derivatives are computed after the function has been evaluated. The precompilers described in Section 1.4 all operate in the reverse mode (GRESS [6] also operates in the forward mode). The works of Iri [S] and Griewank [3] indicate that the reverse mode is superior in terms of computational effort, but that the forward mode may often require significantly less storage. An excellent discussion of the forward and reverse modes is found in Griewank [3] together with the execution speed and storage requirements of the two modes. In the next section we describe GRAD, which is a forward mode implementation of automatic differentiation.

2. The function GRAD in MATLAB 2.1. Motivation In 1972 Kuba and Rall [17] described a FORTRAN program for obtaiiring rigorous error estimates for the solution of systems of equations. A parser was used to read in the system of equations, decompose each equation into its components parts, and determine the correct calling sequence of subroutines to evaluate each of the partial derivatives at a given point. The motivation for our function GRAD is derived from this and other work by Rall [23,24]. 2.2. The implementation of GRAD in IMATLAB User defined functions can be added to extend the capabilities of MATLAB. The MEX-File interface of MATLAB [19] makes it possible to call FORTRAN and C programs as if they were built-in functions. The MEX-File interface accepts programs written in MS-FORTRAN, MS-C, and TURBO-C. Our MEX-File interface for GRAD is contained in Appendix B. The format for calling GRAD in MATLAB is result = grad ( math -expression, domain goin t )

where math_expression is a character string representing a function of n variables composed of unary and binary operations, domainyoint is a vector representing a point in n-space at which the function and its first partial derivatives are to be evaluated, and result is a vector of n f 1 values representing the function and its first partial derivatives evaluated at domairzgoint. Standard arithmetic has been replaced with GRA arithmetic to co the first partial derivatives simultaneously wit the value of math _expressiortat domaingoint.

L.C. Rich, D. R. Hill / Automatic differentiation in MA TUB

36

The evaluation of math_expression is directed by a parser. The parser used in GRAD is a modified version of a C program distributed with Borland’s TURBO-C. In order to understand how the parser affects the evaluation of math-expression, we consider the following example. xarnpk 2.2.1. Suppose math _expression is given as x, * x2 +x3. The parser scans math _expression one character at a time searching for items called tokens. Each item in x, * x2 +x,

is considered to be a token. As scanning proceeds, tokens are pushed onto a stack if there is not sufficient information to decide what to do next, or popped off the stack when the parser has sufficient information to perform an action. Once the parser “sees” x2 it has enough information to perform x1 * x2. In this case x 1, x2, and * are popped off the stack, GRADIENT multiplication is performed, and the result is pushed onto the stack. When the parser “sees” x3 it has enough information to perform x, * x2 +x,. In this case x, * x2 and + are popped off the stack, GRADIENT addition is performed, and the result is pushed onto the stack. The stack result is then returned to MATLAB containing the value of math-expression and its first partial derivatives evaluated at domainqoint. If math-expression is changed, there is no need to produce a new executable code, only modifications in the character string are required before calling GRAD. Note that if math_expressioa remains unchanged in subsequent calls, repeated parsing occurs since the evaluation of the stack is driven by the recognition of tokens in math_expression. For moderate sized problems that can be expressed succinctly as character strings, GRAD offers a convenient way to compute the value of a multivariate function and its first partial derivatives without repeated production of an executable code. This approach complements MATLAB’s highly interactive environment by providing a new tool for problem solving. pie 2.2.2. As an example of the use of our function GRAD we consider the nonlinear system e(-xl+x2)

-

0.1

=

0,

e(-xl-x2)

-

0.1

=

0,

which appears in 121.Newton’s method with GRAD to determine the Jacobian at each step is used with an initial guess x1 = 4.3 and x2 = 2.0. The termination criteria required the infinity norm of the Newton increment to be less than or equal to the tolerance l.OE-12. In addition we monitored the condition number of the Jacobian using MATLAB’s built-in LINPAC estimator RCOND, which estimates the reciprocal of the condition number in the l-norm (see il9]). In the MATLAB program NEWTG (see Appendix C) we display a warning message whenever RCOND is below a user defined threshold. The results we obtained were virtually identical to those reported in [a], convergence in 56 iterations. The condition number estimates for the Jacobian computed by NEWTG varied between a wide range of approximately l.OE-1 to l.OE-22. When RCQND was below l.OE-16 MATLAB’s linear equation solver issued the warning message Matrix Results

is close may be

to singular inaccurate.

or

badly

scaled,

L.C. Rich, DR. Hill / Automatic differentiation in MA TLA B

together with the corresponding value of RCOND. The ATLAB environment makes to incorporate such condition number estimates into your algorithms.

33 iteasy

3. Remarks Automatic differentiation is being used in an increasingly wide variety of applications including categorical data analysis [26], the solution of nonlinear systems [2], the solution of nonlinear least squares [13], quadrature [ 11, weather modeling [28], economics modeling, sensitivity analysis [ 161,optimization, and the solution of differential equations. The function GRAD offers a convenient method for computing the gradient vector or Jacobian matrix of a given multivariate function in the MATLAB environment. As Example 2.2.2 illustrates it is easy to develop algorithms in MATLAB in combination with GRAD. An enhanced version of this work will soon be available for IIessians.

In this appendix we consider the differentiation arithmetic defined by Rall [24] in 1986. ifferentiation arithmetic is defined as an ordered pair arithmetic in which the basic elements are pairs of real numbers. If U = (u, u’) and V= (v, v’) are elements of R*, then the basic rules for differentiation arithmetic can be expressed as: U+V=h+v,

u'+v'),

Addition: Subtraction: Multiplication:

lbv=b.J-v/ U"V'), UV=(uv, u'v+uv'),

Division:

u/v=h.J/v,

cvu'-uv'3/v2L

The first component of each ordered pair represents the function value formed by the rules for evaluation in real arithmetic. The second component represents the value of the derivative formed by the corresponding rules for differentiation. A function of one variable and its first derivative can be evaluated with the rules given above if X = (x, 1) is substituted for x and C = (c, 0) is substituted for each constant c in the expression of the function. Notice that these values are just t):e rules for differentiation of an independent variable and a constant. Differentiation arithmetic can be extended to include any differentiable function by means of the following formula f

w =(f (49 @i-‘(u)~9

where f’(u) is the fir:: derivative of $ with respect to U. This formula allows us to define the necessary rules for computing the derivatives of the trigonometric functions, ine logarithmic functions, the exponential function, and other functions. By extending differentiation arithmetic to include functions of several variables we can t f be a function of n vari compute partial derivatives. first partial derivatives can valuated using the rules d variable is replaced by

38

L.C. Rich, D. R. Hill / Automa tic differentiation in MA TLAB

C = (c, 0,) where 0, is the zero vector. This extension of differentiation arithmetic is called GRADIENT ARITHMETIC. Taking this process one step further allows us to compute second partial derivatives. Using gradient arithmetic as a model we can define a new arithmetic called HESSIAN ARITHMETIC based on ordered triples U = (u, u’, u”) of real numbers and the known rules for obtaining first and second partial derivatives of the basic unary and binary operations. In HESSIAN ARITHMETIC the indepclldent variables Xi are replaced by Xi = (Xi, e,, I),) and constants C are replaced by C = (c, OV,0,) where e, is the ith unit vector, 0, is the zero vector, and 0, is the zero matrix.

This appendix contains the TURBO-C/MATLAB #include

"grad.h"

#include

"cmex.h"

user

( nlhs,

plhs,

;

/*

number

of

left-hand

/*

number

of

right-hand

fen

int

nlhs

int

nrhs

Matrix

; *plhs&LJ

guments

nrhs,

;

/*

MATLAB

;

/*

MATLAB

prhs

MEX-File interface for.+GRAD.

1

Matrix

side

arguments

side

structure

arguments for

*/ */

left-hand

side

ar-

right-hand

side

ar-

*/

Matrix

*prhsCl

guments

Matrix

structure

for

*/

c int

i

int

NDimension

int

NumberOfTokens;

char

; ;

MathExpressionCMAX

double

DomainPointLMAX_VARIABLES3

GRADIENT

Result

; number

/*

verify

the

if

( nrhs

!=

mex

MATH_EXPRESSION+13

error

(

2

of

;

;

input

arguments

*/

1

"(grad):

Two

input

arguments

required.\ntB

1

;

required/W'

1

;

1/*

verify

the

if

( nlhs

!=

number 1

of

output

arguments

*/

1

< mex 1-

error

(

"(grad):

One

output

argument

L. C. Rich, D. R. I-M / Automatic differentiation in MA TU B /*

ccpy

input

math

NumberOfTokens for

expression

from

= prhsCOl+n

( i = 0 ;

; itt

= prhsCOl+prCiIl

Gil

matrix

structure

*/

;

i < NumberOfTokens

MathExpression

MATLAB

39

1

;

3 MathExpressionCNumberOfTokensl /*

copy

input

NDimension for

domain

point

= prhstll->n

( i = 0

; i <

= from

"\O ’ ; MATLAB

matrix

structure

*/

;

NDimension

; it+

1

c DomainPointCil

= prhsCll->prCil

;

3 /* co,mpute tives */ Result /*

value

= grad(

setup

MATLAB = create

/*

Result

pihsCO1 for

->prCOl

( i = 0

MathExpression

MathExpression,

plhsC01 copy

of

matrix

DomainPoint

structure

matrix( to

MATLAB

=

Resu1t.f

and

to

return

prhstll->m, matrix

its

first

deriva-

1 ; Result

to

UprhsCll->n)

structure

partial

MATLAB +

I),

*k/ REAL

1 ;

*/

;

; i < NDimension

;

it+

1

c plhsCOl->prCi+ll

= Result.dfCil

;

3 3

This appendix contains the MATLAB program NEWT6 used to solve systems of nonlinear equations with Newton’s method. The function GR.4D is called to compute the Jacobian matrix. function %NEWTG %

CY,

cl

- Solves with

= newtgt the

the

FJ

nonlinear

Jacobian

X0,

tol,

system matrix

max,

jsen

1

C=(X) = 0 by

determined

by

Newton's the

autom

method

L.C. Rich, D.R. Hill / Automatic differentiation in A&lTLAB

40 %

differentiation

function

%

Use

-->

%

Input F

%

in

the

arguments -- A column

x0 to1

%

max jscn

%

of

k functions

--

method Maximum

--

Note:

number

Tolerance matrix J

% %

vector

---

% %

= newtg(F,XO,tol,max,jseni

CY,cI

strings) A vector of k values Error tolerance for

% %

form

grad.

of

for

A warning message If jsen is omitted

% %

is

set

to

(character

(initial termination

iteration

condition

<-=-

guess) of Newton's

to

number

is issued if as an input

be

performed

of

the

rcond( J argument,

Jacobian

1 < jsen. then it

lO*eps.

Output arguments containing the results of each iteration as Y -- A matrix The last column is the approximate column vectors. solution of the nonlinear system to the given toler-

%

% % % %

ante.

%

C

A vector

--

%

cobians

% check if nargin

for

the

containing at

correct

each

number

the

condition

numbers

of

the

iteration. of

input

arguments

< 4

disp( ’ ’ 1 disp( 'NEEGCTGERROR: disp( ’ ’ 1 help newtg

incorrect

number

of

input

4,

then

arguments.

’ 1

return end %

if

the

number

of

input

arguments

is

if nargin == 4 jsen = IO * eps end % check Emf,

x nx

nf3

the

= size(

= reshapet =

dimensions

X0,

of

the

input

F 1 ; 1,

lengtht

X0

1 1 ;

arguments

set

jsen

to

lO*eps

Ja-

L.C. Rkh, D.R. Hill / Automatic differentiation in A44TUB if

mf

-= nx

disp(



’ 1

disp ( return end

'NEWTG

ERROR:

% apply

Newton's

notdone Y =x;

=

while

compute

%

in

method

==

function

i =

values

CY

% check

to

x3

(stored

in

fJ

and

Jacobian

(stored

iteration

condition

Y

<

of

the

Jacobian

matrix

check

J 1 ;

jsen

NEWTG

WARNING:

rcondf

J 1 is

Less

’ 1

whether

norm(

disp( disp( solution

the

d ', inf ’

J

’ 1

'**** ’

matrix

number

= rcond(

cUteration l

vector

end

if

values

;

the

disp( disp( disp(

%

‘Y ’

;

correction 1;

the

ctiteration) if

equal

disagree'

1 ;

= G(l,j)

increment =

is

variables

1:mf

compute the d= J \ t-f x= X+d;

Y

notdone

and

J)

J(i.j-I) end end

%

while

equations

'Y ’

G= Grad( F(i,:), X f(i,l) = G(l,l) ; for j = 2:imf+l)

%

of

= 1 ;

notdone

%

number

'Y ’ ;

iteration

for

41

error

1 <=

tolerance

has

been

met

tol

’ 1

'Error tolerance = X ’

= C 'after tmp_msg disp( tmp_msg 1



satisfied int2strCiteration)

ith ’ 1 ’

i

than

jsen.

’ 1

1

L.C. Rich, D. R. Hill / Automatic differentiation in n/t4 TLAB

42 disp(



disp(

'The

error

=

’ 1 estimated

norm<

notdone

=

error

d ', inf

'N

in

the

final

approximation

is ’ 1

1

’ ;

end %

check

%

formed if

whether

notdone

the

==

maximum

'Y ’ &

disp(

'The

disp(



maximum

disp(

'The

Last

=

'II’ ;

number

of

>=

max

iteration number

of

iterations

iterations

has

have

been

been

performed

per-

’ 1

’ 1 approximate

soiution

was

’ 1

x ’ notdone end %

increment if

notdone

iteration

the

number

==

'Y ’

=

iteration

of

+

iterations

1

performed

;

end end

eferences 113G.F. Corliss and L.B. Rail, Adaptive, self-validating numerical quadrature,

SIIAM J. Sci. Cornput. 8 (1987)

8::: -847.

El A.A.M. Cuyt and L.B. Rail, Computational

implementation of the multivariate Halley method for solving nonlinear systems of equations, ACM Trans. Math. Software 11 (1985) 20-36. [31A. Griewank, On automatic differentiation, in: M. Iri and K. Tanabe, eds., Mathematical Programming: Recent Developments and Applications (Kluwer, Dordrecht, 1989) 83-108. Mathematics and 141K.E. Hillstrom, User’s guide for JAKEF, Technical Memorandum ANL/MCS-TM-16, Computer Science Division, Argonne National Laboratory, Argonne, IL 60439 (1985). El S.W. Horton, Numerical derivatives: A comparative study, Master of Science Thesis, University of Southern California (1989). b1 J.E. Horwedel, B.A. Worley, E.M. Oblow and F.G. Pin, Grcss version 0.0 user’s manual, ORNL/TM 10835, OakRidge National Laboratory, Oak Ridge, TN 37830 (1988). 171 R.E. HUSS, An ADA library for automatic evaluation of dcrivativcs, Appl. Math. Cornput. 35 (19901 103-123. Bl M. Iri, Simuhaneous computation of functions, partial derivatives and estimates of rounding errors-complexity and practicality, Japan J. Appl. Math. 1 (1984) 223-252. [91 M. Iri and K. Kubota, Methods of fast automatic differentiation and applications, Research Memorandum RMI 87-0, Department of Mathematical Engineering, University; of Tokyo (1987). ml M.E. Jerrell, Automatic differentiation in almost any language, SIGNUM 24 (1) (1989) 2-9. Dll H. Kagiwada, R. Kalaba, N. Rosakhoo and K. Spingarn, Nrrmerical Deriltatilfes and Nonlinear Anailysis, Mathematical Collcepts and Methods in Science and Engineering 31 (Plenum Press, New York, 1986). WI R. Kalaba, L. Tcsfatsion and J.-L. Wang, A finite algorithm for the exact evaluation of higher order partial derivatives of functions of many variables, J. Math. Anal. Appl. 92 (1983) 552-63.

L.C. Rich, D. R. Hill / Automatic differentiation in MA TLAB

43

[13] R. Kalaba, N. Rasakhoo and A. Tishler, Nonlinear least squares via automatic derivative evaluation, A&. Math. Comput. 12 (1983) 119-137. [14] R. Kalaba and L. Tesfatsion, Automatic differentiation of functions of derivatives, Conzput. M&h. Appl. 12A (1986) 1091-l 103. [15] R. Kalaba and L. Tesfatsion, Automatior! of nested matrix and derivative operations, Appl. Math. Cornput. 23 ( 1987) 243-268. [ 161 R. Kalaba and L. Tesfatsion, Nonlocal automated sensitivity analysis, Comput. Mgth. Appl. 20 (1990) 53-65. [17] D. Kuba and L.B. Rall, A UNIVAC 1108 program for obtaining rigorous error bounds for approximate solutions of systems of equations, Technical Summary Report 1168, Mathematics Research Center, University of Wisconsin at Madison, Madison, WI (1972). [18] R.D. Neidinger, Automatic differentiation and APL, College Math, J. 20 (3) (1989) 238-251. [19] PC-MATLAB User’s Guide (MathWorks, Sherborn, MA, 1987). [20] L.B. Rail, Applications of software for automatic differentiation in numerical computation, Computing 2 (1980) 141-156. [21] L.B. Rail, Automatic Differentiation:

Techniques and Applications, Lecture Notes in Computer Science 120 (Springer, Berlin, 1981). [22] L.B. Rail, Differentiation and generation of Taylor coefficients in PASCAL-SC, in: 1J.W. Kulisch and W.L. Miranker, eds., A New Approach to Scientific Computation (Academic Press, New York, 1983) 291-309. [23] L.B. Rail, Differentiation in Pascal-SC: Type GRADIENT, ACM Trans. Math. Software 10 (1984) 161-184. [24] L.B. Rail, The arithmetic of differentiation, Math. Mag. 59 (5) (1986) 275-282. 1251 L.C. Rich, An implementation of automatic differentiation, Master’s Thesis, Department of Mathematics, Temple University, Philadelphia, PA (1989). [26] J.W. Sawyer, First partial differentiation by computer with application to categorical data analysis, Amer. Statist. 38 (1984) 300-308. [27] B. Speelpenning, Compiling fast partial dericatices of functions giuen by algorithms, Ph.D. Thesis, Department

[28] [29] [30]

[31]

of Computer Science, University of Illinois at Urbana-Champaign, UIUCDCS-R-80-1002 (1980). W.C. Thacker, Large-Least Squares Problems and the Need for Automating the Generation of Adjoint Codes, Lectures in Applied Mathematics (Amer. Math. Sot., Providence, RI, 1990) 645-677. R.E. Wengert, A simple automatic derivative evaluation program, Comm. ACM 7 (1964) 463-464. A.S. Wcxlcr, Automatic evaluation of derivatives, Appl. Math. Comput. 24 (1987) 19-46. R.D. Wilk ins, Investigation of a new analytic method for numerical derivative evaluation, Comm. ACM '7(1964) 465-471.