Journal of Economic Behavior & Organization Vol. 38 (1999) 349–356
On the optimum number of library software licenses

Richard E. Quandt*

The Andrew W. Mellon Foundation, Princeton University, Princeton, NJ 08544-1021, USA

Received 13 July 1998; received in revised form 13 July 1998; accepted 28 September 1998
Abstract

The cost of licenses to use integrated library software generally depends on the number of concurrent users. The greater the number of concurrent users, the higher the software cost, but the lower the implicit cost incurred when a user does not have access to the software, because all concurrent user licenses are in use. Three different models are formulated for determining the optimum number of licenses that a library should acquire. The optimum depends on the cost of the license, the cost of not being able to satisfy patrons, the total patron population, and the probability that a patron will require access to the software in any unit time period. © 1999 Elsevier Science B.V. All rights reserved.

JEL classification: D21; D22

Keywords: Software licenses; Expected costs; Disequilibrium; Optimization; Probability of shortfalls; Monte Carlo integration
1. Introduction

One of the major cost elements in introducing integrated automated library software in a library is the cost of the software license. A recent request for proposals by a consortium of eight libraries has yielded software proposals ranging from $592,680 to $937,825, with annual maintenance charges ranging from $88,900 to $110,250. In the light of the cost pressures faced by libraries in recent decades (Cummings et al., 1992; Quandt, 1996), it behoves libraries to make software decisions with care and with due regard to the cost of the software.

* E-mail: [email protected]
0167-2681/99/$ – see front matter © 1999 Elsevier Science B.V. All rights reserved. PII: S0167-2681(99)00014-1
Software vendors have various complicated charging schemes; there is typically a basic fee that covers usage of the software by a certain number of simultaneous or concurrent users, as well as an annual maintenance fee that ensures that the purchaser receives maintenance and update services. Sometimes different modules¹ carry separate charges. License fees also differ by the nature of the license provided: licenses for use by librarians will tend to cost more than simple OPAC licenses for students. In the present paper we shall abstract from most of these refinements and concentrate on the basic issue, namely the 'correct' number of concurrent users that the library should make provision for. The more concurrent users, the higher the cost, although we have not undertaken a systematic investigation of how the cost increases with the number of concurrent users.² It seems to be a safe assumption that the total cost of licenses increases in the number of concurrent users, and we shall make the convenient assumption that the cost of licenses is linear in concurrent users, although other assumptions can be handled with ease.

The basic trade-off out of which an optimality calculation emerges is the following. The library faces two types of costs. The first is the cost of licenses (plus the cost of terminals), which clearly increases with the number of licenses. The second is incurred if a patron arrives in the library and wishes to use a terminal but cannot do so, either because there are not enough terminals or because there are not enough concurrent user licenses. This is not a monetary cost but a 'dissatisfaction cost', although that does not make it any less real³ (Portes et al., 1987). This cost clearly decreases as there are more terminals and licenses. The aggregate cost is then the sum of these two.
Three approaches are discussed in the present paper: the first two are couched in terms of licenses for students and the last one in terms of licenses for librarians.

2. Model I: the expected cost of disequilibrium

Let us denote by k the number of terminals and user licenses (for simplicity we assume that there is exactly one terminal for each user license). There may well be two or more sub-populations using the library with very different requirements; thus, for example, library-using patrons may differ drastically in their behavior from librarians. We shall concentrate here on a single class of users; in principle, the analysis would then have to be conducted separately for each class, and the total number of licenses would be the sum of the optimal numbers of licenses for the separate classes.⁴ We shall also denote by x the unknown number of patrons who demand terminal service during a given interval of time.
¹ Cataloguing, acquisitions, circulation, serial control, union catalogue, etc.
² One vendor has priced licenses for staff users as follows: for up to 30 users, $2500 per user; for 31–90 users, $2000 per user; above 90 users, $1500 per user.
³ It is somewhat analogous to what has been called the cost of disequilibrium in a planned-economy context, when the planners derive disutility if consumption demand exceeds the supply of consumption goods.
⁴ Thus, the analysis ignores the possibility that some terminals may be used by members of both classes, and implicitly assumes that each class has its own terminals. See Section 4 for a relaxation of this assumption.
A key assumption of the analysis is that x is a random variable, and we take 1 hour as our unit interval.⁵ We make the simplifying assumption that during prime time (say, between the hours of 8 a.m. and 10 p.m.) the distribution of x does not depend on the time of day; thus we shall assume that the arrival of patrons is (statistically) the same in any 1-hour interval. Our next assumption pertains to characterizing the distribution of patron arrivals. It seems entirely plausible to assume that this is binomial: there is a given probability p that a student will want to use a terminal in a given 1-hour interval (which we call a 'success' in the parlance of the binomial distribution).⁶ Then the probability of exactly x students out of a student population of n students wishing to use terminals in a 1-hour interval is

\[ g(x) = \binom{n}{x} p^{x} (1-p)^{n-x}. \]

Now assume that the unit cost of terminals/licenses is \(c_1\) and the unit cost of expected dissatisfied demand for terminals is \(c_2\). Due to its discreteness, the binomial distribution is awkward to deal with in the present context; however, we know that the binomial converges in distribution to the normal density \(N(\mu, \sigma^2)\), where \(\mu = pn\) and \(\sigma = \sqrt{np(1-p)}\). If \(x \le k\), the disequilibrium cost is zero; a positive disequilibrium cost is incurred only if \(x > k\). Hence, the cost of not satisfying patrons, \(h(x,k)\), can be denoted as

\[ h(x,k) = \begin{cases} c_2 (x-k), & \text{if } x \ge k, \\ 0, & \text{otherwise.} \end{cases} \]

Hence, the expected cost of the program is

\[ E(C) = E\,h(x,k) + c_1 k. \tag{1} \]

Substituting in Eq. (1), we can rewrite this as

\[ E(C) = c_2 \int_k^{\infty} \frac{x-k}{\sigma\sqrt{2\pi}}\, e^{-0.5 (x-\mu)^2/\sigma^2}\, dx + c_1 k \]

or

\[ E(C) = c_2 \left[ (\mu-k)\left(1-\Phi\!\left(\frac{k-\mu}{\sigma}\right)\right) + \frac{\sigma}{\sqrt{2\pi}}\, e^{-0.5 (k-\mu)^2/\sigma^2} \right] + c_1 k, \tag{2} \]

where \(\Phi\) denotes the cumulative standard normal integral. Setting \(dE(C)/dk = 0\) yields

\[ \frac{dE(C)}{dk} = -c_2 \left[1-\Phi\!\left(\frac{k-\mu}{\sigma}\right)\right] + c_1 = 0. \tag{3} \]
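As a minimal sketch of how Eqs. (2) and (3) can be evaluated in practice, the following Python fragment computes the expected cost and solves Eq. (3) by rearranging it to Φ((k − μ)/σ) = 1 − c1/c2 and applying the inverse normal CDF, which is itself computed numerically. The function names are illustrative and not part of the paper; the only library used is the standard-library `statistics.NormalDist`.

```python
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal: mean 0, standard deviation 1


def demand_moments(n, p):
    """Mean and standard deviation of the normal approximation to Binomial(n, p)."""
    return n * p, sqrt(n * p * (1.0 - p))


def expected_cost(k, n, p, c1, c2):
    """Expected hourly cost E(C) of holding k licenses, following Eq. (2)."""
    mu, sigma = demand_moments(n, p)
    z = (k - mu) / sigma
    # Expected unmet demand E[max(x - k, 0)] when x ~ N(mu, sigma^2).
    shortfall = (mu - k) * (1.0 - STD_NORMAL.cdf(z)) + sigma * STD_NORMAL.pdf(z)
    return c2 * shortfall + c1 * k


def optimal_licenses(n, p, c1, c2):
    """Solve Eq. (3), i.e. 1 - Phi((k - mu) / sigma) = c1 / c2, for k."""
    if not 0.0 < c1 < c2:
        raise ValueError("Eq. (3) has an interior solution only when 0 < c1 < c2")
    mu, sigma = demand_moments(n, p)
    return mu + sigma * STD_NORMAL.inv_cdf(1.0 - c1 / c2)


if __name__ == "__main__":
    k_star = optimal_licenses(n=30_000, p=0.005, c1=0.12, c2=0.48)
    print(round(k_star, 2), round(expected_cost(k_star, 30_000, 0.005, 0.12, 0.48), 3))
```

With n = 30,000, p = 0.005, c1 = 0.12 and c2 = 0.48, `optimal_licenses` agrees with the value of 158.24 reported for that case in this section. The solution is left as a real number, as in the text; rounding to a whole number of licenses is a separate decision.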
⁵ There is no reason why we should not take the unit interval to be some different period.
⁶ The binomial distribution assumes that successes are independent of one another; hence we rule out the kind of behavior where a student decides to go to the library and tells his room-mate that he might want to go to the library as well.
Eq. (3) does not permit a solution for k in closed form, but we can certainly solve \(dE(C)/dk = 0\) by numerical methods. First we note that

\[ \frac{d^2E(C)}{dk^2} = \frac{c_2}{\sigma\sqrt{2\pi}}\, e^{-0.5 (k-\mu)^2/\sigma^2} > 0. \]

Hence, since E(C) is convex everywhere, the solution of Eq. (3) defines a global minimum. We further note that the solution may be smaller or larger than the mean demand np; it will be larger than the mean demand if and only if \(c_2 > 2c_1\), since we have from Eq. (3) that \(c_1/c_2 = 1 - \Phi((k-\mu)/\sigma)\), and the right-hand side of this latter equation is less than 1/2 if and only if the argument of the \(\Phi(\cdot)\) function is greater than zero. Finally, we note that the total differential of Eq. (3) is

\[ \frac{d^2E(C)}{dk^2}\, dk - \left[1-\Phi\!\left(\frac{k-\mu}{\sigma}\right)\right] dc_2 + dc_1 = 0, \]

from which we have the comparative statics results that \(dk/dc_1 < 0\) and \(dk/dc_2 > 0\), as we would expect.

In Fig. 1, we show an example of the expected cost function for \(c_1 = 0.12\), \(c_2 = 1.2\), n = 10,000, and p = 0.005. The value of \(c_2\) expresses the assumption that the library is penalized $1.20 for every patron turned away. The hypothetical cost \(c_1\) is arrived at as follows: if we assume that the cost of a single license plus terminal is $3000, that the license and terminal have a useful life of 5 years, and that there are 5000 usable hours per year, the hourly cost (neglecting interest payments) is 3000/25,000 = 0.12. From Fig. 1 it appears that the optimum is near 60 or so.

Fig. 1.

For a few selected cases we have computed the exact solution to the problem. Generally, variations in \(c_1\) and \(c_2\) have only a modest effect on the optimal number of licenses. Thus, for example, if p = 0.005, \(c_1 = 0.12\) and \(c_2 = 0.48\), the optimal number of licenses for 30,000 users is 158.24; if \(c_2\) is increased to 2.0, the optimal number of licenses increases only to 168.99.⁷ What really affects the optimal number of licenses in a major way is our estimate for p, the probability that a randomly chosen student will demand terminal service in any one particular hour. Thus, for example, with \(c_1 = 0.12\), \(c_2 = 0.48\) and p = 0.01, the optimal number of licenses is 311.62, in contrast with the earlier figure of 158.24. Hence, the key element that has to be estimated correctly is the value of p.⁸ We also note that the number of licenses increases nearly linearly with the population n; for the case of p = 0.01, \(c_1 = 0.12\) and \(c_2 = 0.48\), the optimal number of licenses is 106.71 when n = 10,000, 209.49 when n = 20,000, 311.62 when n = 30,000, and 413.42 when n = 40,000.

3. Model II: The expected cost of the probability of shortfalls

A quite different model is obtained if we assume that costs are incurred when there is a high probability that an arriving student will not have an available terminal or license. According to this model, the cost would be written as

\[ C = c_2\, \mathrm{Prob}\{x \ge k\} + c_1 k, \tag{4} \]
where the second term is, as before, the cost of acquiring the licenses. The first term penalizes the cost function if the probability of a shortfall of terminals or licenses is high. Since this is obviously measured in quite different units than the expected number of students who cannot get terminals, we are dealing with a very different metric, and the appropriate value of \(c_2\) in this model will be quite different from its value in Model I. Substituting from the normal density, we can write Eq. (4) as

\[ C = c_2 \int_k^{\infty} \frac{1}{\sigma\sqrt{2\pi}}\, e^{-0.5 (x-\mu)^2/\sigma^2}\, dx + c_1 k, \tag{5} \]

and differentiating and setting the derivative equal to zero yields

\[ \frac{dC}{dk} = -\frac{c_2}{\sigma\sqrt{2\pi}}\, e^{-0.5 (k-\mu)^2/\sigma^2} + c_1 = 0. \tag{6} \]

The second derivative is

\[ \frac{d^2C}{dk^2} = \frac{c_2 (k-\mu)}{\sigma^3\sqrt{2\pi}}\, e^{-0.5 (k-\mu)^2/\sigma^2} \gtrless 0. \tag{7} \]

It is immediately clear from (6) that there are two solutions, provided that

\[ \frac{c_1 \sigma \sqrt{2\pi}}{c_2} < 1, \tag{8} \]
⁷ The number of licenses should increase, because in the latter case the penalty for not satisfying demand is greater.
⁸ We note in passing that a relatively straightforward sample survey could determine a realistic estimate for p.
and no interior solution in the converse case. These are given by

\[ k = \mu \pm \sigma \sqrt{-2 \log\!\left(\frac{c_1 \sigma \sqrt{2\pi}}{c_2}\right)}. \tag{9} \]
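A short Python sketch, under the same normal approximation, may help to make condition (8) and the two roots of Eq. (9) concrete. It returns both stationary points when they exist and evaluates the cost of Eq. (4) at each, so that the two can be compared directly. The function names are hypothetical and chosen only for illustration.

```python
from math import log, pi, sqrt
from statistics import NormalDist


def model2_roots(n, p, c1, c2):
    """Both roots of Eq. (6) as given by Eq. (9), or None if condition (8) fails."""
    mu = n * p
    sigma = sqrt(n * p * (1.0 - p))
    ratio = c1 * sigma * sqrt(2.0 * pi) / c2  # left-hand side of condition (8)
    if ratio >= 1.0:
        return None  # no interior stationary point
    offset = sigma * sqrt(-2.0 * log(ratio))
    return mu - offset, mu + offset


def model2_cost(k, n, p, c1, c2):
    """Cost of Eq. (4): c2 * Prob{x >= k} + c1 * k, with x ~ N(mu, sigma^2)."""
    demand = NormalDist(n * p, sqrt(n * p * (1.0 - p)))
    return c2 * (1.0 - demand.cdf(k)) + c1 * k


if __name__ == "__main__":
    low, high = model2_roots(n=10_000, p=0.005, c1=0.12, c2=10.0)
    print(round(low, 2), round(model2_cost(low, 10_000, 0.005, 0.12, 10.0), 3))
    print(round(high, 2), round(model2_cost(high, 10_000, 0.005, 0.12, 10.0), 3))
```

For n = 10,000, p = 0.005, c1 = 0.12 and c2 = 10, the larger root is about 62.42 and has the lower cost of the two roots. The sketch does not compare the interior roots with the corner k = 0; that comparison would have to be added if a global minimum is required.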
But it is clear from the second derivative in Eq. (7) that the smaller of the two solutions corresponds to a local maximum of the cost function; hence the cost-minimizing solution is

\[ k = \mu + \sigma \sqrt{-2 \log\!\left(\frac{c_1 \sigma \sqrt{2\pi}}{c_2}\right)}, \tag{10} \]

and, unlike the case of Model I, the optimal solution is always larger than the mean. We finally need to comment on Condition (8). What this says is that \(c_2\) must be sufficiently large relative to the product \(c_1\sigma\). If this is not the case (if, for example, the cost of licenses is too high), the optimal solution will be a corner solution, namely k = 0. Just for the sake of argument, assume that \(c_1 = 0.12\), p = 0.01, and n = 30,000. Then \(c_1 \sigma \sqrt{2\pi} < c_2\) if \(c_2 > 1.63927\).

We finally do some sensitivity tests. As in Model I, variations in p have a rather massive effect on the optimal value of k. Thus, for example, when \(c_1 = 0.12\), \(c_2 = 10\), and n = 10,000, p-values ranging from 0.005 to 0.014 generate k-values from 62.42 for the smallest p to 156.95 for the largest. When n = 40,000, the corresponding k-values range from 218.47 to 579.58. The optimal k grows a little less fast than n; a quadrupling of n causes k to less than quadruple. Variations in \(c_1\) and \(c_2\) have a much smaller effect, as in Model I, which is to be expected from Eq. (10).

4. An alternative approach

An alternative approach has been suggested by Martin for the CALICO Consortium of research libraries in the Cape Town area (Martin, 1998). Martin and colleagues have identified eight classes of librarians, defined by the value of the probability that a librarian in each class will require access to a computer during the morning peak hours. The data are shown in Table 1. For each class i, i = 1, ..., 8, Martin computes a number \(x_i\) from the binomial distribution, such that the probability of more than \(x_i\) persons wishing to have computer access is less than or equal to, say, 0.05. The aggregate number of licenses is then estimated as \(k = \sum_{i=1}^{8} x_i\), which turns out to be 313 for the case represented by Table 1. As Martin correctly notes, this provides an overestimate of the total required, because a computer not used by a person in one class may well be used temporarily by a person in another class.
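The class-by-class rule attributed to Martin above can be sketched in a few lines of Python using exact binomial tail probabilities. The pairing of probabilities with class sizes follows the order in which the two columns of Table 1 are listed, which is an assumption about the flattened table, and the function names are illustrative only.

```python
from math import comb

# (probability of needing a computer, number of librarians) for classes 1..8,
# paired in the order in which the two columns of Table 1 are listed.
CLASSES = [
    (1.00, 179), (0.95, 30), (0.85, 24), (0.75, 4),
    (0.65, 52), (0.45, 28), (0.35, 38), (0.10, 12),
]


def binomial_pmf(x, n, p):
    """Probability of exactly x successes in n independent trials."""
    return comb(n, x) * p**x * (1.0 - p) ** (n - x)


def licenses_for_class(n, p, tail=0.05):
    """Smallest x such that Prob{demand > x} <= tail for demand ~ Binomial(n, p)."""
    cdf = 0.0
    for x in range(n + 1):
        cdf += binomial_pmf(x, n, p)
        if 1.0 - cdf <= tail:
            return x
    return n  # fallback; the loop returns by x = n apart from rounding error


def total_licenses(classes, tail=0.05):
    """Sum of the per-class license counts, with no sharing across classes."""
    return sum(licenses_for_class(n, p, tail) for p, n in classes)


if __name__ == "__main__":
    print([licenses_for_class(n, p) for p, n in CLASSES])
    print(total_licenses(CLASSES))
```

With a tail probability of 0.05 the per-class counts sum to 313, the total quoted in the text. Because each class is treated separately, the sum ignores any sharing of computers between classes.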
Table 1
Classes of librarians by computer usage

Class   Probability   Number of librarians
1       1.00          179
2       0.95          30
3       0.85          24
4       0.75          4
5       0.65          52
6       0.45          28
7       0.35          38
8       0.10          12

We now reformulate the problem to take this substitutability into account. First, as in the previous sections, we approximate the binomial distributions by normal distributions with means \(\mu_i\) and variances \(\sigma_i^2\) in class i.⁹ It is clear that in Class 1 we shall have to provide exactly 179 licenses. Since we have no particular reason to assume that the demands in the various classes are correlated, the joint density function of demands in the remaining seven classes is

\[ f(x_2, \ldots, x_8) = \prod_{i=2}^{8} \frac{1}{\sqrt{2\pi\sigma_i^2}} \exp\left\{ -\frac{(x_i - \mu_i)^2}{2\sigma_i^2} \right\}. \]

⁹ Only in the last class is this approximation not very good.

We need to determine a value k such that [equation not recoverable]
Evaluating the integral by Monte Carlo integration, for which we have used 10,000 random drawings, yields k = 113.27, which, when added to the 179 licenses we have to provide for librarians in Class 1, yields a total number of required licenses of approximately 292, about 7 percent less than the solution obtained by Martin. If we relax the stringent assumption that librarians in Class 1 have a 1.00 probability of needing a computer and set that probability at 0.99, the required number of licenses declines to 290.47.

5. Conclusions

The principal conclusions from the first two models are as follows: (1) the optimal k depends primarily on p, our estimate of the probability that a randomly selected student will demand terminal/license services in a particular 1-hour interval; (2) the optimal k clearly depends very much on the value of n, the size of the target population, and increases nearly linearly with, or slightly less fast than, n itself; (3) \(c_1\) and \(c_2\) have a lesser impact on k in the ranges examined; (4) we formulated two models, in one of which the optimal solution is always larger than the mean value of the demand. On balance, Model I appears to be the more defensible one, simply because it formulates expected cost directly in terms of the expected 'disequilibrium', the costs of which are more easily quantifiable by librarians than the costs of the probability that a student will be turned away.
In the last model, we take some concrete data and compute the number of licenses to be provided for librarians when they can be sorted into discrete classes, each with its own probability that a librarian in the class will require a license in a peak period. From this, we can compute an aggregate number of required licenses so that the probability is, say, 0.95 that all demands will be satisfied.

Acknowledgements

I am indebted to Avinash Dixit, Duncan Martin, and Thomas Nygren for helpful comments. I alone am responsible for errors.

Department of Economics, Princeton University, Princeton, NJ 08544-1021

References

Cummings, A.M., Witte, M.L., Bowen, W.G., Lazarus, L.O., Ekman, R.H., 1992. University Libraries and Scholarly Communication. Association of Research Libraries, Washington, DC.
Martin, D., 1998. Estimation of the Number of Concurrent User Licenses Required by CALICO for Library Staff. University of Cape Town, Cape Town.
Portes, R., Quandt, R.E., Winter, D., Yeo, S., 1987. Macroeconomic planning and disequilibrium: Estimates for Poland, 1955–1980. Econometrica 55, 19–41.
Quandt, R.E., 1996. Simulation model for journal subscription by libraries. Journal of the American Society for Information Science 47(8), 610–617.