MODELING LANGUAGES II
Copyright © IFAC Dynamic Modelling and Control of National Economies, Washington DC, USA 1983
A MATHEMATICAL-COMPUTER LANGUAGE FOR LINEAR PROGRAMMING PROBLEMS
D. Kendrick
Center for Economic Research, Department of Economics, University of Texas at Austin, Austin, Texas, USA
Abstract
This paper describes the development of a compiler which permits the translation of linear programming problems from a mathematical statement of the problem to a GAMS statement. The GAMS statement can then be read by the computer and the linear programming problem can be solved. Thus it is demonstrated that it is feasible to develop a computer language which will permit one to use mathematical symbols to direct the computer to solve a class of linear programming problems.

The paper provides a small example linear programming transportation problem and shows how the problem may be input to the computer in a mathematical formatting language and then translated (i) into a mathematical statement and (ii) into a GAMS statement. Then a description is given of the lexical analyzer generator and the parser generator that were used to develop the math to GAMS translator.

The present version of the translator can handle only a very restricted set of problems. Thus the paper closes with a description of the work which will be required to enable the translator to handle a much wider class of programming problems.

INTRODUCTION
Linear programming models are conceptualized and communicated among individuals in mathematical terms.* In contrast the models are communicated to computers in the "MPS format", a long column of numbers giving row and column names and parameter values. While the mathematical statement of a linear programming model is easy to understand, the input to the computer is extremely tedious to decipher. Therefore it is difficult to assure that the mathematically stated model is indeed the same as the model being solved by the computer.

In order to remedy this situation and make the input to the computer more understandable to humans, a variety of matrix generators have been developed. While matrix generators offer an improved way of communicating with computers, they leave much to be desired as a medium of communication among humans. A powerful new approach to modeling called GAMS (General Algebraic Modeling System) has recently been developed by Alexander Meeraus (1982) at the World Bank. This approach greatly improves the ease of communication between individuals and computers as well as among individuals.

However, even with the use of the GAMS language there remains a gap between the mathematical notation that is used to conceptualize and communicate models and the language that is used to input the models to the computer. This paper describes a step toward filling that gap by the creation of a computer language that accepts mathematical input and creates a GAMS statement of a linear programming model. This is done by writing the mathematical model in a text formatting language and using this input not only to create the mathematical statement but also to create the GAMS statement. Thus one can conceptualize and write in mathematics and can be assured that the computer is solving precisely the same model as is described by the mathematics.

This paper does not announce the completion of a fully operational language to fill the gap between mathematics and GAMS. Rather it provides something which in physics might be called a "proof of principle". That is, a simple linear programming model has been written out mathematically and a compiler has been created which will translate this model to a GAMS statement. The GAMS statement has then been used to solve the model. Thus it has been demonstrated that the gap can be filled. However, the language created in this process is presently relatively specific to the example problem and much work remains before it will have general applicability.

The next section of this paper provides statements of a small linear programming model and shows how the gap between mathematics and GAMS can be filled for this problem. This is followed by a section which provides a description of the lexical analyzer generator (Lex) and the compiler generator (YACC) which were used to develop the math to GAMS translator. The paper closes with a description of the present status of the translator and of the developments which will be required to make it operational.

* The author is indebted to Bill Lee and Alex Meeraus for many helpful comments and suggestions.
1. An Example and Proof of Principle
While it would be desirable to translate directly from the mathematical statement to the GAMS statement, this cannot yet be done. Rather, it is necessary to begin with the problem in the mathematical formatting language EQN and use this statement to produce both the mathematical and the GAMS statements of the problem. Therefore, this section provides statements of the small linear programming example problem in three languages: (i) mathematics, (ii) GAMS, and (iii) EQN. It would be logical to begin with EQN and then go to mathematics and GAMS. However, the mathematical statement is easier to read than GAMS and GAMS is easier to read than EQN. So they will be presented in the math to GAMS to EQN order.

A small linear programming transportation problem was chosen as the example problem because of its simple structure and because this particular problem has been used to illustrate the functioning of GAMS. The problem is given in Figure 1. It is a cannery problem in which there are two plants and three markets. The problem is to meet market demands at minimum transportation cost without violating capacity constraints at the plants.

The model statement is divided into three sections:
• declarations
• mathematics
• data
The declarations section includes a list of all the sets, variables, parameters, and equations which are used in the model. The mathematics section includes the criterion function and the constraints. Finally, the data section includes the vectors and matrices of parameters for the problem.

The GAMS statement of the problem is given in Figure 2. It has a slightly different structure from the mathematical statement, namely
• sets
• data
• lists of variables and equations
• constraints
• model and solve statements
The primary difference between the order of the mathematical and GAMS statements is that the data is presented after the constraints in the math statement and before the constraints in the GAMS statement. A second difference is that the declaration section of the math statement includes sets, variables, parameters, and equations grouped together while the equivalent segment of the GAMS statement is in two parts and need not include the parameter declarations. Thus there are *'s in column one of the parameters part of the GAMS statement to indicate that the statements are ignored by the GAMS compiler and treated like comments.

Aside from the order of presentation, the primary difference between the two statements is of course that the constraints are stated mathematically in the one case and in the GAMS notation in the other. To the analyst who is trained in mathematics but not in GAMS the math statement will be easier to read. However, students quickly learn to read the GAMS language and to use it with facility. The choice between writing in mathematics and in GAMS will be a matter of individual style. Once the choice is available some will find that they can think and communicate more clearly and efficiently in mathematics and others will find that GAMS suits them better.

As was stated earlier it is not yet possible to go directly from the math to the GAMS statement. Rather it is necessary to write the problem first in a mathematical formatting language and then use that language to create both the mathematical and GAMS statements. Actually two formatting languages are used. One is the general formatting language NROFF (see Ossanna (1977) and Smith and Mashey (1977)) and the other is the mathematical typesetting language, EQN, and its typewriter counterpart, NEQN (see Kernighan and Cherry (1977)). The description of the problem in the NROFF-NEQN language is given in Figure 3.
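The three parts of the cannery model (declarations, mathematics, data) can also be written down directly as a quick check of what the model asks for. The following is a minimal Python sketch under stated assumptions, not part of the translator: the data values are those shown in Figures 1 and 2, while the names `capacity`, `demand`, `cost`, `total_cost`, `is_feasible`, and the hand-picked shipment `plan` are hypothetical and do not appear in the paper.

```python
# Data of the cannery example (values as shown in Figures 1 and 2).
capacity = {"Seattle": 350, "San-Diego": 600}              # k_i, cases per year
demand = {"New-York": 300, "Chicago": 300, "Topeka": 300}  # r_j, cases per year
cost = {                                                    # c_ij, dollars per case
    ("Seattle", "New-York"): 25, ("Seattle", "Chicago"): 17, ("Seattle", "Topeka"): 18,
    ("San-Diego", "New-York"): 25, ("San-Diego", "Chicago"): 18, ("San-Diego", "Topeka"): 14,
}


def total_cost(x):
    """Criterion function: total transportation cost of shipment plan x."""
    return sum(cost[i, j] * x[i, j] for i in capacity for j in demand)


def is_feasible(x):
    """Check the capacity, requirement, and non-negativity constraints."""
    within_capacity = all(sum(x[i, j] for j in demand) <= capacity[i] for i in capacity)
    meets_demand = all(sum(x[i, j] for i in capacity) >= demand[j] for j in demand)
    nonnegative = all(v >= 0 for v in x.values())
    return within_capacity and meets_demand and nonnegative


# Hypothetical candidate plan chosen by hand; it is not taken from the paper.
plan = {
    ("Seattle", "New-York"): 50, ("Seattle", "Chicago"): 300, ("Seattle", "Topeka"): 0,
    ("San-Diego", "New-York"): 250, ("San-Diego", "Chicago"): 0, ("San-Diego", "Topeka"): 300,
}

# Lower bound: serve every market from its cheapest plant, ignoring capacities.
lower_bound = sum(min(cost[i, j] for i in capacity) * demand[j] for j in demand)

print(is_feasible(plan), total_cost(plan), lower_bound)
```

Because this feasible plan attains the capacity-free lower bound, no cheaper plan exists for these data. A linear programming routine would of course be needed for problems where such a bound is not attained.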
The first few lines in Figure 3 are used to set the page width and paragraphing and heading style. Also, a library of formatting subroutines is called with the .so statement. Next come the declarations: first the sets and then the variables, parameters and equations. In the variables section, the commands .EQ and .EN are used to set off a section of the input which contains mathematics and which must therefore be processed by EQN before being passed to NROFF. The ~ characters, or tildes, used in this section of the input force extra spaces into the mathematical statement of the model. Without them the words and symbols in the EQN section of the input would run together. Next notice the Greek variable which is spelled out here as xi. It will be translated to $\xi$ in the mathematical statement and to XI in the GAMS statement. This is followed by the variable x sub {i j}. This variable will appear as $x_{ij}$ in the mathematical statement and as X(I,J) in the GAMS statement. After the "minimize" command comes the listing of the first equation, which is the objective function. The term "mark" in this equation may be used to line up the equations which follow. The term "SIGMA" produces an upper case $\Sigma$ in the mathematical statement of the problem and a "SUM" in the GAMS statement. Similarly the term "epsilon" produces a lower case $\epsilon$ in the math statement and nothing in the GAMS statement. Finally the data input parts of the NROFF-NEQN statement are copied in a straightforward manner to both the math and GAMS statements.

In summary, the model is input in the form shown in Figure 3. Then the NEQN and NROFF software is used to translate the model into the mathematical form shown in Figure 1. Next the input in Figure 3 is processed with the math to GAMS translator developed by the author to produce the GAMS statement in Figure 2. The combination of these three statements permits one to work with a mathematical statement of the problem and make changes to this statement which can then be translated into GAMS and solved with a linear programming routine. Furthermore, GAMS will eventually be able to solve non-linear problems, so the method can be extended to a much larger class of problems.

The statement in Figure 3 was input by the author and used to produce both Figure 1 and Figure 2. Thus it is possible "in principle" to fill the gap between mathematics and GAMS statements. The software which was used to develop the programs to fill this gap is described in the next section.

Figure 1 Mathematical Statement

    For
    sets
        I    plants     { Seattle, San-Diego }
        J    markets    { New-York, Chicago, Topeka }
    variables
        $\xi$, $x_{ij}$
    parameters
        $c_{ij}$, $k_i$, $r_j$ ;
    equations 1, 2, 3, 4 ;
    minimize
        (1)    $\xi = \sum_{i \epsilon I} \sum_{j \epsilon J} c_{ij} x_{ij}$
    subject to
        (2)    $\sum_{j \epsilon J} x_{ij} \le k_i$        $i \epsilon I$
        (3)    $\sum_{i \epsilon I} x_{ij} \ge r_j$        $j \epsilon J$
        (4)    $x_{ij} \ge 0$        $i \epsilon I$, $j \epsilon J$
    where
    Vector k    Capacity of Plants    cases per year
                    Seattle      350
                    San-Diego    600
    Vector r    Requirements at Markets    cases per year
                    New-York    Chicago    Topeka
                       300        300       300
    Matrix c    Unit Transportation Costs    dollars per case
                                 New-York    Chicago    Topeka
                    Seattle         25          17        18
                    San-Diego       25          18        14

Figure 2 GAMS Statement

    SETS  I  PLANTS   / SEATTLE, SAN-DIEGO /
          J  MARKETS  / NEW-YORK, CHICAGO, TOPEKA /
    *
    DATA
    PARAMETER K  CAPACITY OF PLANTS  CASES PER YEAR
        /  SEATTLE    350
           SAN-DIEGO  600  /
    PARAMETER R  REQUIREMENTS AT MARKETS  CASES PER YEAR
        /  NEW-YORK   300
           CHICAGO    300
           TOPEKA     300  /
    TABLE C  UNIT TRANSPORTATION COSTS  DOLLARS PER CASE
                     NEW-YORK   CHICAGO   TOPEKA
        SEATTLE         25         17       18
        SAN-DIEGO       25         18       14
    VARIABLES
        XI, X(I,J)
    * PARAMETERS
    *   C(I,J)
    *   K(I)
    *   R(J)
    EQUATIONS E1, E2, E3, E4;
    E1..       XI =E= SUM( (I), SUM( (J), C(I,J) * X(I,J))) ;
    E2(I)..    SUM( (J), X(I,J)) =L= K(I)
    E3(J)..    SUM( (I), X(I,J)) =G= R(J)
    E4(I,J)..  X(I,J) =G= 0 ;
    MODEL PROBLEM /ALL/
    SOLVE PROBLEM USING LP MINIMIZING XI

2. Development of the Processor

Two pieces of software were crucial to the development of the language described in this paper. Both pieces are a part of the UNIX system which was developed at the Bell Telephone Laboratories. The first package is "Lex", a lexical analyzer generator developed by Lesk and Schmidt (1977), and the second is "YACC", yet another compiler-compiler, developed by Johnson (1977).

Lex uses (i) a list of tokens and (ii) a list of actions to be taken when each token is recognized to generate a lexical analyzer. Similarly YACC uses a grammar for the language to generate a compiler which will parse the language.

a. Lexical Analyzer

The lexical analyzer processes the input as a long string of characters and breaks it into pieces (tokens). As each token is recognized either an action may be taken (such as printing a string of characters) or a message is passed to the parser indicating the token that has been found.

Consider first the way in which the lexical analyzer is created and then the manner in which it is used. First, as was discussed above, a list of tokens and actions is developed. An example of such a list is given in Table 1. This list is input to Lex, which in turn writes a computer program which is the lexical analyzer. Table 1 provides an illustrative subset of the tokens used in the language developed by the author. For example, when the word "For" is recognized no action is taken, but when "sets" is recognized the lexical analyzer will print "SETS" in capital letters and inform the parser that the keyword "sets" has been found in the stream of input. Similarly, when the words "minimize" or "maximize" are found in the input stream, the lexical analyzer informs the parser by passing the character strings "MIN" and "MAX" respectively. The Greek letter "epsilon" is recognized in a similar manner and the parser is informed that a lower case epsilon has been found.

The lexical analyzer will also recognize classes of characters. Consider the description of a number. The characters [0-9] mean any digit from zero to nine, and the * placed after the bracket means that this digit may be repeated any number of times. The \ is called the escape character and means that the character following it should not be given its usual special meaning but rather should be treated as a regular character. Also the ? means that the character preceding it is optional. Thus the characters \.? mean that there is an optional decimal point. The decimal point may be followed by another string of digits. Identifiers (denoted by the return of ID) consist of a string of upper and lower case letters and numbers of any length and may include one or more dashes as well.

In summary, the lexical analyzer is called by the parser and returns the set of letters indicated on the right hand side of Table 1 to specify the token which has been recognized.
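The token and action list of Table 1 can be imitated in a few lines to show how recognition proceeds. What follows is a minimal Python sketch, not the C program that Lex writes: the names `RULES` and `tokenize` are hypothetical, the whitespace rule is an added assumption, and matches of length zero are skipped because the number and identifier patterns of Table 1 can both match an empty string. As in Lex, the longest match wins and ties go to the rule listed first.

```python
import re

# (pattern, token code passed to the parser or None, text printed on recognition or None)
RULES = [
    (r"For",              None,        None),
    (r"sets",             "SET",       "SETS"),
    (r"minimize",         "MIN",       None),
    (r"maximize",         "MAX",       None),
    (r"subject to",       "ST",        None),
    (r"where",            "WH",        None),
    (r"epsilon",          "LCEPSILON", None),
    (r"sub",              "SUB",       None),
    (r"sup",              "SUP",       None),
    (r",",                "CM",        None),
    (r";",                "SC",        None),
    (r"[0-9]*\.?[0-9]*",  "NUMBER",    None),
    (r"[-A-Za-z0-9]*",    "ID",        None),
    (r"\s+",              None,        None),  # assumption: white space is skipped
]
COMPILED = [(re.compile(p), code, out) for p, code, out in RULES]


def tokenize(text):
    """Return (tokens, printed): longest match wins, ties go to the earlier rule."""
    tokens, printed, pos = [], [], 0
    while pos < len(text):
        best = None
        for regex, code, out in COMPILED:
            m = regex.match(text, pos)
            # Ignore empty matches; replace the best match only if strictly longer.
            if m and m.end() > pos and (best is None or m.end() > best[0].end()):
                best = (m, code, out)
        if best is None:
            raise ValueError(f"unrecognized character {text[pos]!r} at position {pos}")
        m, code, out = best
        if out is not None:
            printed.append(out)             # action such as print(SETS)
        if code is not None:
            tokens.append((code, m.group()))  # message passed to the parser
        pos = m.end()
    return tokens, printed


tokens, printed = tokenize("For sets I minimize xi subject to x sub i , 350 ; where")
print(printed)
print([code for code, _ in tokens])
```

The longest-match rule is what lets "subject to" be returned as a single ST token rather than as the identifier "subject", while "San-Diego" falls through to the ID rule because no keyword matches it.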
b. Parser

Consider first the way in which the parser is created and then the way in which it is used. The parser is created by providing a grammar for the language to YACC. A partial grammar of this sort is shown in Table 2. For example, the first line in Table 2 indicates that the entire input is called a problem statement and that statement can be divided into two parts: (i) the declarations and (ii) the problem. The declarations in turn consist of four lists: (i) sets, (ii) variables, (iii) parameters, and (iv) equations. Furthermore, the problem can be separated into the math and the data, which are separated by the keyword "where". The math in turn is divided into a direction and a body, where the direction is either minimize or maximize. Finally the body is divided by the keywords "subject to" into the criterion function and the constraints. The sets of upper case letters such as WH, MIN, MAX, and ST are used to indicate terminals in the parse tree and are tokens which are passed from the lexical analyzer to the parser as was shown in Table 1. A more complete grammar is provided in Appendix A. (Appendix A available upon request from author. It was removed to shorten the paper.)
A grammar like that in Table 2 is used with YACC to generate a compiler for the language. This compiler is written in the language C, which was developed by Kernighan and Ritchie (1978). Once the compiler is completed, a linear programming problem in the NROFF-NEQN form shown in Figure 3 may be used to generate a GAMS statement like that shown in Figure 2.
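The top-level structure of the grammar described above can be sketched by hand to show how the token codes drive the parse. This is a minimal Python sketch written as a recursive-descent stand-in, not the parser that YACC generates: the function names, the dictionary layout of the parse tree, and the rule names in the comments are hypothetical, and the declarations, criterion, constraints, and data are left as unparsed token lists. Only the terminals MIN, MAX, ST, and WH and the wording of the SOLVE line in Figure 2 come from the paper; "MAXIMIZING" is assumed as the counterpart for a maximize problem.

```python
def split_at(tokens, code):
    """Split a token list at the first token whose code equals `code`."""
    for k, (tok_code, _) in enumerate(tokens):
        if tok_code == code:
            return tokens[:k], tokens[k + 1:]
    raise SyntaxError(f"expected token {code}")


def parse_problem_statement(tokens):
    # problem_statement ::= declarations problem
    # The declarations run up to the direction keyword (MIN or MAX).
    for k, (code, _) in enumerate(tokens):
        if code in ("MIN", "MAX"):
            return {"declarations": tokens[:k], "problem": parse_problem(tokens[k:])}
    raise SyntaxError("expected MIN or MAX")


def parse_problem(tokens):
    # problem ::= math WH data
    math, data = split_at(tokens, "WH")
    return {"math": parse_math(math), "data": data}


def parse_math(tokens):
    # math ::= direction body        direction ::= MIN | MAX
    (direction, _), body = tokens[0], tokens[1:]
    return {"direction": direction, "body": parse_body(body)}


def parse_body(tokens):
    # body ::= criterion ST constraints
    criterion, constraints = split_at(tokens, "ST")
    return {"criterion": criterion, "constraints": constraints}


def solve_statement(tree, objective="XI"):
    """Emit the final GAMS line from the direction found in the parse tree."""
    direction = tree["problem"]["math"]["direction"]
    word = {"MIN": "MINIMIZING", "MAX": "MAXIMIZING"}[direction]
    return f"SOLVE PROBLEM USING LP {word} {objective}"


# A shortened, hypothetical token stream in the (code, text) form a lexer might return.
example = [
    ("SET", "sets"), ("ID", "I"),
    ("MIN", "minimize"), ("ID", "xi"),
    ("ST", "subject to"), ("ID", "x"),
    ("WH", "where"), ("NUMBER", "350"),
]
tree = parse_problem_statement(example)
print(tree["problem"]["math"]["direction"])
print(solve_statement(tree))
```

A YACC-generated parser would instead attach an action to each grammar rule and emit the GAMS text as the rules are reduced; the sketch only separates the input into the parts that those rules name.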
Figure 3 NROFF-NEQN Statement

    .nr O 4
    .nr W 72
    .nr N 2
    .nr Pt 1
    .so /usr/lib/tmac.m
    .DS
    For
    sets
        I    plants     { Seattle, San-Diego }
        J    markets    { New-York, Chicago, Topeka }
    variables
    .EQ
    ~~~~~~~~xi ,~ x sub {i j}
    .EN
    parameters
    .EQ
    ~~~~~~~~c sub {i j} ,~ k sub i ,~ r sub j ;
    .EN
    equations 1, 2, 3, 4 ;
    minimize
    .DE
    .EQ (1)
    ~~~~~~~~~~mark xi ~ = SIGMA from {i epsilon I} ~~~ SIGMA from {j epsilon J} ~~~ c sub {i j} ~ x sub {i j}
    .EN
    subject to
    .EQ (2)
    ~~~~~~~~~~SIGMA from {j epsilon J} ~~~ x sub {i j} <= ~~~ k sub i ~~~~~~~~~~~~~~ i epsilon I
    .EN
    .EQ (3)
    ~~~~~~~~~~SIGMA from {i epsilon I} ~~~ x sub {i j} >= ~~~ r sub j ~~~~~~~~~~~~~~ j epsilon J
    .EN
    .EQ (4)
    ~~~~~~~~~~x sub {i j} ~~ >= 0 ~ ; i epsilon I ~~ j epsilon J
    .EN
    .DS
    where
    Vector k    Capacity of Plants    cases per year
                    Seattle      350
                    San-Diego    600
    Vector r    Requirements at Markets    cases per year
                    New-York    Chicago    Topeka
                       300        300       300
    Matrix c    Unit Transportation Costs    dollars per case
                                 New-York    Chicago    Topeka
                    Seattle         25          17        18
                    San-Diego       25          18        14
    .DE

3. Project Status

This section provides a review of (i) what has been accomplished and (ii) the work that remains to be done. While much has already been completed, the project is a large one and is nearer to the beginning than the end.

a. Accomplished

The first task in the development of the math to GAMS translator was the conceptualization. This phase began with the use of GAMS for several years and the realization that it would be useful to be able to write GAMS-like statements in mathematics. Also, as my students made increasing use of GAMS it became apparent that while they originally developed their research problems in mathematics, the more they used GAMS the more they attempted to think through their problems in GAMS terms rather than in mathematics.
Table 1 List of Tokens and Actions

    "For"                  ;
    "sets"                 {print(SETS); return(SET)}
    "minimize"             return(MIN);
    "maximize"             return(MAX);
    "subject to"           return(ST);
    "where"                return(WH);
    "epsilon"              return(LCEPSILON);
    "sub"                  return(SUB);
    "sup"                  return(SUP);
    ","                    return(CM);
    ";"                    return(SC);
    [0-9]*\.?[0-9]*        return(NUMBER);
    [-A-Za-z0-9]*          return(ID);

Table 2 A Partial Grammar
::= ::= ::=