Applied Numerical Mathematics 9 (1992) 33-43 North-Holland
Automatic differentiation in MATLAB
Lawrence C. Rich
The Mathematical Software Group, 54 Sequoia Drive, Newtown, PA 18940, USA

David R. Hill
Mathematics Department, Temple University, Philadelphia, PA 19122, USA
Abstract

Rich, L.C. and D.R. Hill, Automatic differentiation in MATLAB, Applied Numerical Mathematics 9 (1992) 33-43.
Automatic differentiation is a chain rule based technique for derivative evaluation introduced by Wengert (1964) (see also Wilkins (1964)) and further developed by Rall (1980, 1981, 1983). With the development of more powerful programming languages, automatic differentiation has received increased attention as a viable alternative to symbolic differentiation and finite difference methods. (See Griewank (1989).) Here we present a C language implementation of automatic differentiation that can be called from MATLAB, a highly interactive, robust scientific computing environment. Our implementation accepts a multivariate function and a point as input, returning a function value and gradient; hence the Jacobian at a point can be constructed. The implementation is described along with examples. The code is available from the authors.
Keywords. Automatic differentiation, parser, data structure, C language, gradient, Jacobian, MATLAB.
1. Introduction
1.1. Automatic differentiation
First partial derivatives can now be computed easily and accurately within the MATLAB¹ software environment using a new C language implementation of automatic differentiation². Automatic differentiation is a chain rule based technique used to obtain numerical values for the derivatives of a multivariate function simultaneously with the function evaluation, without appealing to numeric or symbolic differentiation. Our C language function GRAD [25] automatically computes the value of a multivariate function and its gradient vector of first partial derivatives at a given domain point. GRAD is called from MATLAB with a character string representing a function f of n variables and a point X in n-space at which f and its first partial derivatives are to be evaluated. A vector of n + 1 values is returned from GRAD representing f(X) and the first partial derivatives of f at X.

¹ MATLAB is an interactive computing environment that provides access to quality matrix computation software such as LINPACK and EISPACK.
² The program we describe is also available from the authors as a stand-alone C function on PCs and the VAX.
In the following we compare GRAD to several implementations of automatic differentiation, explain how GRAD was implemented, illustrate how GRAD is used in solving a system of nonlinear equations, and indicate that GRAD can be used in many applications that require the computation of first partial derivatives.

A wide variety of automatic differentiation implementations appear in the literature. Here we mention several to indicate how our work relates to the existing implementations.

1.2. Table algorithms

In the works of Kalaba and others [12-15,11,30,5], a table approach to automatic differentiation is used. In this approach the values of a multivariate function and its partial derivatives are obtained by decomposing the multivariate function into elementary functions. The elementary functions are then evaluated by calling a series of subroutines that compute the value of each elementary function and its partial derivatives simultaneously. The user must compute the partial derivatives of each elementary function by hand, translate them into subroutines, and then manually order the subroutine calls in a program before compiling and linking to produce an executable code. One advantage of this approach is that it can be extended to compute higher-order partial derivatives by continuing the table. However, when the multivariate function is changed the process outlined above must be repeated before an executable code can be produced.

1.3. Operator overloading
An alternative approach, operator overloading, is used by Rall in a Pascal-SC implementation [23] of automatic differentiation. Pascal-SC allows the user to redefine the standard arithmetic operators so they can be used with user-defined data types. The GRADIENT data type defined by Rall represents the value of a function of n variables and its gradient vector of first partial derivatives with respect to the n independent variables. By redefining the standard arithmetic operators to perform GRADIENT arithmetic [24], the values of the first partial derivatives of the function can be computed simultaneously with the function evaluation (see Appendix A for a brief explanation of gradient arithmetic). The main advantage of this approach is that the user is not required to decompose the multivariate function into elementary functions, since this is handled by the Pascal-SC compiler. However, as in the table algorithm, each time the multivariate function is changed the program must be compiled and linked to produce an executable code. A similar approach is used in [7], where the implementation is in ADA. See also [10,18].
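The same idea can be sketched in modern MATLAB, which also permits the arithmetic operators to be overloaded for a user-defined class. The class below is our illustration only, not Rall's Pascal-SC code and not our function GRAD; it carries a function value v together with its gradient g and supplies GRADIENT addition and multiplication.

   % File Grad.m -- illustrative GRADIENT type via operator overloading.
   classdef Grad
      properties
         v   % function value
         g   % row vector of first partial derivatives
      end
      methods
         function obj = Grad(v, g)       % construct the pair (v, g)
            obj.v = v ;
            obj.g = g ;
         end
         function W = plus(U, V)         % U + V = (u + v, u' + v')
            W = Grad(U.v + V.v, U.g + V.g) ;
         end
         function W = mtimes(U, V)       % U*V = (uv, u'v + uv')
            W = Grad(U.v * V.v, U.g * V.v + U.v * V.g) ;
         end
      end
   end

Every + or * applied to Grad objects then dispatches to these rules, just as Pascal-SC dispatches on Rall's GRADIENT type; a short trace using this class is given after Example 2.2.1 below.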
1.4. Precompilers

Several FORTRAN precompilers exist [4,6,9] that accept as input FORTRAN source code for the evaluation of a multivariate function. Each precompiler translates the given source code into a FORTRAN subroutine that computes both the value of the multivariate function and its partial derivatives. The resulting subroutine must then be compiled and linked with the
FORTRAN application code to produce an executable code. This method offers flexibility in the complexity of the multivariate functions that can be differentiated. As with the methods mentioned above, each time the multivariate function is changed a compile and link step is required to produce an executable code.

1.5. Forward and reverse modes
The implementations of automatic differentiation described in Sections 1.2 and 1.3 operate in what is termed the forward mode. That is, the value of a function and its partial derivatives are evaluated simultaneously. In the reverse mode, due to Speelpenning [27], the partial derivatives are computed after the function has been evaluated. The precompilers described in Section 1.4 all operate in the reverse mode (GRESS [6] also operates in the forward mode). The works of Iri [8] and Griewank [3] indicate that the reverse mode is superior in terms of computational effort, but that the forward mode may often require significantly less storage. An excellent discussion of the forward and reverse modes, including their execution speed and storage requirements, is found in Griewank [3]. In the next section we describe GRAD, which is a forward mode implementation of automatic differentiation.
2. The function GRAD in MATLAB

2.1. Motivation

In 1972 Kuba and Rall [17] described a FORTRAN program for obtaining rigorous error estimates for the solution of systems of equations. A parser was used to read in the system of equations, decompose each equation into its component parts, and determine the correct calling sequence of subroutines to evaluate each of the partial derivatives at a given point. The motivation for our function GRAD is derived from this and other work by Rall [23,24].

2.2. The implementation of GRAD in MATLAB

User-defined functions can be added to extend the capabilities of MATLAB. The MEX-File interface of MATLAB [19] makes it possible to call FORTRAN and C programs as if they were built-in functions. The MEX-File interface accepts programs written in MS-FORTRAN, MS-C, and TURBO-C. Our MEX-File interface for GRAD is contained in Appendix B. The format for calling GRAD in MATLAB is

   result = grad( math_expression, domain_point )
where math_expression is a character string representing a function of n variables composed of unary and binary operations, domain_point is a vector representing a point in n-space at which the function and its first partial derivatives are to be evaluated, and result is a vector of n + 1 values representing the function and its first partial derivatives evaluated at domain_point. Standard arithmetic has been replaced with GRADIENT arithmetic to compute the first partial derivatives simultaneously with the value of math_expression at domain_point.
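For concreteness, a call might look as follows; the expression matches Example 2.2.1 below, while the particular domain point and the variable names f and df are our illustrative choices, and GRAD is assumed to have already been compiled and linked into MATLAB through the MEX-File interface.

   result = grad( 'x1 * x2 + x3', [2.0 3.0 1.0] ) ;
   f  = result(1) ;       % value of the function at the domain point (here 7)
   df = result(2:4) ;     % first partial derivatives, here [3 2 1]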
The evaluation of math_expression is directed by a parser. The parser used in GRAD is a modified version of a C program distributed with Borland's TURBO-C. To understand how the parser affects the evaluation of math_expression, we consider the following example.

Example 2.2.1. Suppose math_expression is given as x1 * x2 + x3. The parser scans math_expression one character at a time searching for items called tokens. Each item in x1 * x2 + x3 is considered to be a token. As scanning proceeds, tokens are pushed onto a stack if there is not sufficient information to decide what to do next, or popped off the stack when the parser has sufficient information to perform an action. Once the parser "sees" x2 it has enough information to perform x1 * x2. In this case x1, x2, and * are popped off the stack, GRADIENT multiplication is performed, and the result is pushed onto the stack. When the parser "sees" x3 it has enough information to perform x1 * x2 + x3. In this case x1 * x2 and + are popped off the stack, GRADIENT addition is performed, and the result is pushed onto the stack. The stack result, containing the value of math_expression and its first partial derivatives evaluated at domain_point, is then returned to MATLAB.

If math_expression is changed, there is no need to produce a new executable code; only modifications in the character string are required before calling GRAD. Note that if math_expression remains unchanged in subsequent calls, repeated parsing occurs, since the evaluation of the stack is driven by the recognition of tokens in math_expression. For moderate sized problems that can be expressed succinctly as character strings, GRAD offers a convenient way to compute the value of a multivariate function and its first partial derivatives without repeated production of an executable code. This approach complements MATLAB's highly interactive environment by providing a new tool for problem solving.
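The two GRADIENT operations the parser performs for this expression can be traced with the illustrative Grad class sketched in Section 1.3; this is purely an illustration of the reduction sequence, not GRAD's internal C routines.

   % Trace of Example 2.2.1 at the domain point (2, 3, 1); illustrative only.
   X1 = Grad(2, [1 0 0]) ;  X2 = Grad(3, [0 1 0]) ;  X3 = Grad(1, [0 0 1]) ;
   T = X1 * X2 ;    % x1, x2 and * popped: GRADIENT multiplication, T pushed
   R = T + X3 ;     % x1*x2, x3 and + popped: GRADIENT addition, R pushed
   % R.v = 7 and R.g = [3 2 1]; GRAD would return [7 3 2 1] to MATLAB.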
Example 2.2.2. As an example of the use of our function GRAD we consider the nonlinear system

   e^(-x1+x2) - 0.1 = 0,
   e^(-x1-x2) - 0.1 = 0,
which appears in [2]. Newton's method, with GRAD used to determine the Jacobian at each step, was applied with an initial guess x1 = 4.3 and x2 = 2.0. The termination criterion required the infinity norm of the Newton increment to be less than or equal to the tolerance 1.0E-12. In addition we monitored the condition number of the Jacobian using MATLAB's built-in LINPACK estimator RCOND, which estimates the reciprocal of the condition number in the 1-norm (see [19]). In the MATLAB program NEWTG (see Appendix C) we display a warning message whenever RCOND is below a user-defined threshold. The results we obtained were virtually identical to those reported in [2]: convergence in 56 iterations. The condition number estimates for the Jacobian computed by NEWTG varied over a wide range, approximately 1.0E-1 to 1.0E-22. When RCOND was below 1.0E-16 MATLAB's linear equation solver issued the warning message

   Matrix is close to singular or badly scaled,
   Results may be inaccurate.

together with the corresponding value of RCOND. The MATLAB environment makes it easy to incorporate such condition number estimates into your algorithms.
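To make the role of GRAD in the iteration concrete, a stripped-down version of a single Newton step for this system might read as follows; the exp(...) spelling inside the character strings, the fixed dimension k = 2, and the variable names are our illustrative assumptions, and the complete program with its tolerance and RCOND checks is NEWTG in Appendix C.

   F = [ 'exp(-x1+x2)-0.1' ; 'exp(-x1-x2)-0.1' ] ;   % k = 2 equations as strings
   x = [ 4.3 2.0 ] ;                                 % initial guess of Example 2.2.2
   k = 2 ;
   for i = 1:k
      G = grad( F(i,:), x ) ;          % [ f_i, df_i/dx1, df_i/dx2 ]
      f(i,1)   = G(1) ;                % ith residual
      J(i,1:k) = G(2:k+1) ;            % ith row of the Jacobian
   end
   d = J \ (-f) ;                      % Newton increment (column vector)
   x = x + d' ;                        % updated iterate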
3. Remarks

Automatic differentiation is being used in an increasingly wide variety of applications, including categorical data analysis [26], the solution of nonlinear systems [2], nonlinear least squares [13], quadrature [1], weather modeling [28], economics modeling, sensitivity analysis [16], optimization, and the solution of differential equations. The function GRAD offers a convenient method for computing the gradient vector or Jacobian matrix of a given multivariate function in the MATLAB environment. As Example 2.2.2 illustrates, it is easy to develop algorithms in MATLAB in combination with GRAD. An enhanced version of this work, for Hessians, will soon be available.
Appendix A

In this appendix we consider the differentiation arithmetic defined by Rall [24] in 1986. Differentiation arithmetic is defined as an ordered pair arithmetic in which the basic elements are pairs of real numbers. If U = (u, u') and V = (v, v') are elements of R^2, then the basic rules for differentiation arithmetic can be expressed as:

   Addition:        U + V = (u + v, u' + v'),
   Subtraction:     U - V = (u - v, u' - v'),
   Multiplication:  UV = (uv, u'v + uv'),
   Division:        U/V = (u/v, (vu' - uv')/v^2).
The first component of each ordered pair represents the function value formed by the rules for evaluation in real arithmetic. The second component represents the value of the derivative formed by the corresponding rules for differentiation. A function of one variable and its first derivative can be evaluated with the rules given above if X = (x, 1) is substituted for x and C = (c, 0) is substituted for each constant c in the expression of the function. Notice that these values are just the rules for differentiation of an independent variable and a constant. Differentiation arithmetic can be extended to include any differentiable function f by means of the formula

   f(U) = (f(u), f'(u)u'),

where f'(u) is the first derivative of f with respect to u. This formula allows us to define the necessary rules for computing the derivatives of the trigonometric functions, the logarithmic functions, the exponential function, and other functions. By extending differentiation arithmetic to include functions of several variables we can compute partial derivatives. Let f be a function of n variables. Its value and first partial derivatives can be evaluated using the rules given above if each independent variable x_i is replaced by X_i = (x_i, e_i), where e_i is the ith unit vector, and each constant c is replaced by
C = (c, 0_v), where 0_v is the zero vector. This extension of differentiation arithmetic is called GRADIENT ARITHMETIC. Taking this process one step further allows us to compute second partial derivatives. Using gradient arithmetic as a model we can define a new arithmetic, called HESSIAN ARITHMETIC, based on ordered triples U = (u, u', u'') of real numbers and the known rules for obtaining first and second partial derivatives of the basic unary and binary operations. In HESSIAN ARITHMETIC the independent variables x_i are replaced by X_i = (x_i, e_i, 0_m) and constants c are replaced by C = (c, 0_v, 0_m), where e_i is the ith unit vector, 0_v is the zero vector, and 0_m is the zero matrix.
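To make the pair rules concrete, the following small MATLAB sketch (ours, not the arithmetic routines inside GRAD) carries each pair (u, u') as a 1-by-2 vector; the extension formula above supplies, for example, the rule for the exponential.

   % Differentiation arithmetic for one variable; illustrative only.
   dmul = @(U, V) [ U(1)*V(1),  U(2)*V(1) + U(1)*V(2) ] ;   % UV = (uv, u'v + uv')
   dsub = @(U, V) [ U(1)-V(1),  U(2)-V(2) ] ;               % U - V = (u - v, u' - v')
   dexp = @(U)    [ exp(U(1)),  exp(U(1))*U(2) ] ;          % f(U) = (f(u), f'(u)u')

   X = [0.5, 1] ;       % independent variable: X = (x, 1) at x = 0.5
   C = [0.1, 0] ;       % constant:             C = (c, 0)
   R = dsub( dexp( dmul([-1 0], X) ), C ) ;   % exp(-x) - 0.1 and its derivative
   % R(1) = exp(-0.5) - 0.1 and R(2) = -exp(-0.5): value and derivative at x = 0.5.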
Appendix B

This appendix contains the TURBO-C/MATLAB MEX-File interface for GRAD.

#include "grad.h"
#include "cmex.h"

user_fcn( nlhs, plhs, nrhs, prhs )
int    nlhs ;      /* number of left-hand side arguments                    */
int    nrhs ;      /* number of right-hand side arguments                   */
Matrix *plhs[] ;   /* MATLAB Matrix structure for left-hand side arguments  */
Matrix *prhs[] ;   /* MATLAB Matrix structure for right-hand side arguments */
{
   int      i ;
   int      NDimension ;
   int      NumberOfTokens ;
   char     MathExpression[MAX_MATH_EXPRESSION+1] ;
   double   DomainPoint[MAX_VARIABLES] ;
   GRADIENT Result ;

   /* verify the number of input arguments */
   if ( nrhs != 2 )
      mex_error( "(grad): Two input arguments required.\n" ) ;

   /* verify the number of output arguments */
   if ( nlhs != 1 )
      mex_error( "(grad): One output argument required.\n" ) ;

   /* copy input math expression from MATLAB matrix structure */
   NumberOfTokens = prhs[0]->n ;
   for ( i = 0 ; i < NumberOfTokens ; i++ )
      MathExpression[i] = prhs[0]->pr[i] ;
   MathExpression[NumberOfTokens] = '\0' ;

   /* copy input domain point from MATLAB matrix structure */
   NDimension = prhs[1]->n ;
   for ( i = 0 ; i < NDimension ; i++ )
      DomainPoint[i] = prhs[1]->pr[i] ;

   /* compute value of MathExpression and its first partial derivatives */
   Result = grad( MathExpression, DomainPoint ) ;

   /* set up MATLAB matrix structure to return Result to MATLAB */
   plhs[0] = create_matrix( prhs[1]->m, (prhs[1]->n + 1), REAL ) ;

   /* copy Result to MATLAB matrix structure */
   plhs[0]->pr[0] = Result.f ;
   for ( i = 0 ; i < NDimension ; i++ )
      plhs[0]->pr[i+1] = Result.df[i] ;
}
Appendix C

This appendix contains the MATLAB program NEWTG used to solve systems of nonlinear equations with Newton's method. The function GRAD is called to compute the Jacobian matrix.

function [Y, c] = newtg( F, X0, tol, max, jsen )
%NEWTG - Solves the nonlinear system F(X) = 0 by Newton's method
%        with the Jacobian matrix determined by the automatic
%        differentiation function grad.
%
%  Use -->  [Y,c] = newtg(F,X0,tol,max,jsen)
%
%  Input arguments
%    F    -- A column vector of k functions (character strings)
%    X0   -- A vector of k values (initial guess)
%    tol  -- Error tolerance for termination of Newton's method
%    max  -- Maximum number of iterations to be performed
%    jsen -- A warning message is issued if rcond( Jacobian matrix J ) < jsen.
%            Note: If jsen is omitted as an input argument, then it
%            is set to 10*eps.
%
%  Output arguments
%    Y -- A matrix containing the results of each iteration as
%         column vectors.  The last column is the approximate
%         solution of the nonlinear system to the given tolerance.
%    c -- A vector containing the condition numbers of the Jacobians
%         at each iteration.

% check for the correct number of input arguments
if nargin < 4
   disp( ' ' )
   disp( 'NEWTG ERROR: incorrect number of input arguments.' )
   disp( ' ' )
   help newtg
   return
end

% if the number of input arguments is 4, then set jsen to 10*eps
if nargin == 4
   jsen = 10 * eps ;
end

% check the dimensions of the input arguments
[mf, nf] = size( F ) ;
x  = reshape( X0, 1, length( X0 ) ) ;
nx = length( x ) ;
if mf ~= nx
   disp( ' ' )
   disp( 'NEWTG ERROR: number of equations and number of variables disagree' )
   return
end

% apply Newton's method
notdone   = 'Y' ;
Y         = x' ;
iteration = 1 ;
while notdone == 'Y'

   % compute function values (stored in f) and the Jacobian (stored in J)
   for i = 1:mf
      G = grad( F(i,:), x ) ;
      f(i,1) = G(1,1) ;
      for j = 2:(mf+1)
         J(i,j-1) = G(1,j) ;
      end
   end

   % check the condition number of the Jacobian matrix
   c(iteration) = rcond( J ) ;
   if c(iteration) < jsen
      disp( ' ' )
      disp( '**** NEWTG WARNING: rcond( J ) is less than jsen. ****' )
      disp( ' ' )
   end

   % compute the correction vector d and the new values of the variables
   d = J \ (-f) ;
   x = x + d' ;
   Y = [ Y x' ] ;

   % check whether the error tolerance has been met
   if norm( d, inf ) <= tol
      tmp_msg = [ 'Error tolerance satisfied after ' int2str(iteration) ' iterations' ] ;
      disp( tmp_msg )
      disp( ' ' )
      disp( 'The estimated error in the final approximation is' )
      norm( d, inf )
      notdone = 'N' ;
   end

   % check whether the maximum number of iterations has been performed
   if notdone == 'Y' & iteration >= max
      disp( 'The maximum number of iterations have been performed' )
      disp( ' ' )
      disp( 'The last approximate solution was' )
      x
      notdone = 'N' ;
   end

   % increment the number of iterations performed
   if notdone == 'Y'
      iteration = iteration + 1 ;
   end
end
References

[1] G.F. Corliss and L.B. Rall, Adaptive, self-validating numerical quadrature, SIAM J. Sci. Comput. 8 (1987) 831-847.
[2] A.A.M. Cuyt and L.B. Rall, Computational implementation of the multivariate Halley method for solving nonlinear systems of equations, ACM Trans. Math. Software 11 (1985) 20-36.
[3] A. Griewank, On automatic differentiation, in: M. Iri and K. Tanabe, eds., Mathematical Programming: Recent Developments and Applications (Kluwer, Dordrecht, 1989) 83-108.
[4] K.E. Hillstrom, User's guide for JAKEF, Technical Memorandum ANL/MCS-TM-16, Mathematics and Computer Science Division, Argonne National Laboratory, Argonne, IL 60439 (1985).
[5] S.W. Horton, Numerical derivatives: A comparative study, Master of Science Thesis, University of Southern California (1989).
[6] J.E. Horwedel, B.A. Worley, E.M. Oblow and F.G. Pin, GRESS version 0.0 user's manual, ORNL/TM-10835, Oak Ridge National Laboratory, Oak Ridge, TN 37830 (1988).
[7] R.E. Huss, An ADA library for automatic evaluation of derivatives, Appl. Math. Comput. 35 (1990) 103-123.
[8] M. Iri, Simultaneous computation of functions, partial derivatives and estimates of rounding errors - complexity and practicality, Japan J. Appl. Math. 1 (1984) 223-252.
[9] M. Iri and K. Kubota, Methods of fast automatic differentiation and applications, Research Memorandum RMI 87-0, Department of Mathematical Engineering, University of Tokyo (1987).
[10] M.E. Jerrell, Automatic differentiation in almost any language, SIGNUM 24 (1) (1989) 2-9.
[11] H. Kagiwada, R. Kalaba, N. Rasakhoo and K. Spingarn, Numerical Derivatives and Nonlinear Analysis, Mathematical Concepts and Methods in Science and Engineering 31 (Plenum Press, New York, 1986).
[12] R. Kalaba, L. Tesfatsion and J.-L. Wang, A finite algorithm for the exact evaluation of higher order partial derivatives of functions of many variables, J. Math. Anal. Appl. 92 (1983) 552-563.
[13] R. Kalaba, N. Rasakhoo and A. Tishler, Nonlinear least squares via automatic derivative evaluation, Appl. Math. Comput. 12 (1983) 119-137.
[14] R. Kalaba and L. Tesfatsion, Automatic differentiation of functions of derivatives, Comput. Math. Appl. 12A (1986) 1091-1103.
[15] R. Kalaba and L. Tesfatsion, Automation of nested matrix and derivative operations, Appl. Math. Comput. 23 (1987) 243-268.
[16] R. Kalaba and L. Tesfatsion, Nonlocal automated sensitivity analysis, Comput. Math. Appl. 20 (1990) 53-65.
[17] D. Kuba and L.B. Rall, A UNIVAC 1108 program for obtaining rigorous error bounds for approximate solutions of systems of equations, Technical Summary Report 1168, Mathematics Research Center, University of Wisconsin at Madison, Madison, WI (1972).
[18] R.D. Neidinger, Automatic differentiation and APL, College Math. J. 20 (3) (1989) 238-251.
[19] PC-MATLAB User's Guide (MathWorks, Sherborn, MA, 1987).
[20] L.B. Rall, Applications of software for automatic differentiation in numerical computation, Computing 2 (1980) 141-156.
[21] L.B. Rall, Automatic Differentiation: Techniques and Applications, Lecture Notes in Computer Science 120 (Springer, Berlin, 1981).
[22] L.B. Rall, Differentiation and generation of Taylor coefficients in PASCAL-SC, in: U.W. Kulisch and W.L. Miranker, eds., A New Approach to Scientific Computation (Academic Press, New York, 1983) 291-309.
[23] L.B. Rall, Differentiation in Pascal-SC: Type GRADIENT, ACM Trans. Math. Software 10 (1984) 161-184.
[24] L.B. Rall, The arithmetic of differentiation, Math. Mag. 59 (5) (1986) 275-282.
[25] L.C. Rich, An implementation of automatic differentiation, Master's Thesis, Department of Mathematics, Temple University, Philadelphia, PA (1989).
[26] J.W. Sawyer, First partial differentiation by computer with application to categorical data analysis, Amer. Statist. 38 (1984) 300-308.
[27] B. Speelpenning, Compiling fast partial derivatives of functions given by algorithms, Ph.D. Thesis, Department of Computer Science, University of Illinois at Urbana-Champaign, UIUCDCS-R-80-1002 (1980).
[28] W.C. Thacker, Large least squares problems and the need for automating the generation of adjoint codes, Lectures in Applied Mathematics (Amer. Math. Soc., Providence, RI, 1990) 645-677.
[29] R.E. Wengert, A simple automatic derivative evaluation program, Comm. ACM 7 (1964) 463-464.
[30] A.S. Wexler, Automatic evaluation of derivatives, Appl. Math. Comput. 24 (1987) 19-46.
[31] R.D. Wilkins, Investigation of a new analytic method for numerical derivative evaluation, Comm. ACM 7 (1964) 465-471.