
Valid inequalities for concave piecewise linear regression

Karthick Gopalswamy, Yahya Fathi, Reha Uzsoy*

Edward P. Fitts Department of Industrial and Systems Engineering, North Carolina State University, Raleigh, NC, USA

*Corresponding author. Email addresses: [email protected] (Karthick Gopalswamy), [email protected] (Yahya Fathi), [email protected] (Reha Uzsoy)

Abstract

We consider the problem of fitting a concave piecewise linear function to multivariate data using the Least Absolute Deviation objective. We propose new valid inequalities for the problem using the properties of concave functions. Results with univariate data show that the proposed valid inequalities improve the root relaxation lower bound, permitting significant improvements in solution time.

Keywords: concave regression, piecewise linear fitting, valid inequalities, clearing function

1. Introduction

Piecewise linear functions are a powerful and flexible tool for data modeling. The problem of fitting a piecewise linear function to empirical data dates back to the work of Hildreth [8] on estimating production functions. Several other authors [1, 15, 17, 19] have considered the problem, with applications including the approximation of cost-to-go functions in reinforcement learning and approximate dynamic programming [16]. Magnani and Boyd [14] proposed an allocate-fit algorithm analogous to the location-allocation procedure for facility location problems. Hannah and Dunson [7] proposed a convex adaptive partitioning (CAP) algorithm that partitions the domain space of the function. While these approaches [7, 14] are heuristic in nature, we focus on obtaining an exact solution using a mixed integer programming formulation. Current state-of-the-art algorithms [6, 7, 14] for piecewise linear fitting require hyper-parameter tuning, which can be avoided by using mixed integer linear programming (MILP) formulations. A general formulation with possibly discontinuous piecewise linear functions defined over a polyhedral partition is presented in [2]. Lauer [12] discusses the computational complexity of the problem, noting that the general problem is NP-complete; however, when the number of segments is equal to 2 and the dimension of the domain space is fixed, solution algorithms polynomial in the number of observations exist. The problem studied here, where the fitted piecewise linear function is constrained to be concave (equivalently, convex), does not admit this polynomial time solution approach.

Discrete optimization techniques, including mixed integer linear programming (MILP) models, have been widely used to solve statistical problems such as classification and clustering [5, 13]. The generic optimization formulation of the multivariate fitting problem we consider can be stated as:

    min  Σ_{i=1}^{m} |f(x_i) − y_i|^q
    s.t. f ∈ F

where F is a finite set of piecewise linear functions. The data associated with the problem consist of m observations (x_i, y_i), i = 1, ..., m, where x_i ∈ R^n denotes a vector of independent variables (covariates) and y_i ∈ R the response variable, while q ∈ {1, 2} defines the norm to be minimized. The specific functional form considered for fitting is the minimum of affine functions (min-affine), i.e., f : R^n → R with f(x) = min_{k=1,...,p} {c^k x + d^k}, where c^k and d^k denote the gradient and intercept, respectively, of the k-th affine function. We assume that the domain partitions are not predetermined and that a known number p of linear segments or polytopes is used to fit the data. The objective function seeks to minimize the sum of absolute deviations of the function value f_i = f(x_i) evaluated at a given data point x_i from the observed value y_i. The min-affine functional form is non-linear, but the problem can be formulated as a MILP using disjunctive constraints [18]. A special case of this formulation with p = q = 1 is linear regression, which can be formulated as a linear program (LP). Our contribution is the identification of valid inequalities (VIs) to strengthen the weak LP relaxation of the MILP formulation of Toriello and Vielma [18] when the slopes c^k are bounded both above and below. Although the proposed VIs are valid for q ∈ {1, 2}, our computational experiments focus on the case q = 1.

Section 2 introduces the formulation and necessary notation along with the MILP constraints used to model the disjunctions, and restates the results from Toriello and Vielma [18] upon which our work is based. In Section 3 we propose VIs for the problem based on the convexity or concavity of the function used in fitting. These non-linear VIs are linearized in an exact manner. Section 4 discusses the reduction obtained in the special case of univariate functions with q = 1. Section 5 provides computational results on the performance of the models with the VIs. We conclude with some directions for future research in Section 6.
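To make the min-affine functional form and the q = 1 objective concrete, the following Python sketch evaluates f(x) = min_k {c^k x + d^k} and the resulting sum of absolute deviations. The parameter values are hypothetical, chosen only for illustration.

    import numpy as np

    def min_affine(x, C, d):
        # f(x) = min_k (c^k . x + d^k); C has shape (p, n), d shape (p,), x shape (n,)
        return np.min(C @ x + d)

    def lad_objective(X, y, C, d):
        # sum of absolute deviations (q = 1) over the m observations
        return sum(abs(min_affine(xi, C, d) - yi) for xi, yi in zip(X, y))

    # hypothetical univariate example with p = 2 segments
    C = np.array([[1.0], [0.2]])          # slopes c^k (here n = 1)
    d = np.array([0.0, 8.0])              # intercepts d^k
    X = np.array([[1.0], [5.0], [20.0]])  # observations x_i
    y = np.array([1.2, 4.8, 11.5])        # responses y_i
    print(lad_objective(X, y, C, d))      # LAD value of this fixed fit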

2. Formulation

The generic problem with the parametric functional form and notation defined in Section 1 yields the formulation:

    min  Σ_{i=1}^{m} |f_i − y_i|                                        (1)
    s.t. f_i = min_{k=1,...,p} {c^k x_i + d^k}   ∀i = 1, ..., m          (2)
         f_i ∈ R, c ∈ R^{n×p}, d ∈ R^p                                   (3)

MILP formulations for both parametric and non-parametric fitting were proposed by Toriello and Vielma [18]. The natural approach would be to define the disjunctive constraints f (x) ≤ ck x + dk p _

∀k = 1, ..., p

(4)

{f (x) ≥ ck x + dk }

(5)

k=1

where

W

is the disjunction (multiple choice) operator. However, the min oper-

ator cannot be modeled with disjunctive techniques because the polyhedra (5) have different recession cones [4, 9]. While (4) holds due to the concavity of f (x), the disjunctions (5) can be linearized using the Big-M technique to yield f (x) ≤ ck x + dk

∀k = 1, ..., p

(6)

f (x) ≥ ck x + dk − M (1 − z k ) ∀k = 1, ..., p p X

zk = 1

(7) (8)

k=1

Defining f_i = f(x_i) ∀i = 1, ..., m and enforcing concavity on the data points yields the following MILP:

    min  Σ_{i=1}^{m} |f_i − y_i|                                         (9a)
    s.t. f_i ≤ c^k x_i + d^k               ∀k = 1, ..., p  ∀i = 1, ..., m (9b)
         f_i ≥ c^k x_i + d^k − M(1 − z_i^k)   ∀k, ∀i                      (9c)
         Σ_{k=1}^{p} z_i^k = 1             ∀i = 1, ..., m                 (9d)
         z ∈ {0, 1}^{m×p}                                                 (9e)
         c ∈ R^{n×p}, d ∈ R^p                                             (9f)
         f_i ∈ R                           ∀i = 1, ..., m                 (9g)

The variable z_i^k takes the value 1 if observation i ∈ {1, ..., m} is allocated to the linear segment or region k ∈ {1, ..., p}, and the value 0 otherwise.
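As an illustration, the following sketch assembles formulation (9) with the Gurobi Python interface (gurobipy); the authors solved their instances with Gurobi, but this particular encoding is ours, and the linearization of the absolute value in (9a) through auxiliary variables e_i ≥ |f_i − y_i| is a standard modeling step not spelled out in the paper.

    import gurobipy as gp
    from gurobipy import GRB

    def build_milp(X, y, p, M=1000.0):
        # X: (m, n) array of covariates, y: (m,) responses, p: number of pieces
        m, n = X.shape
        mdl = gp.Model("concave_pwl_fit")
        f = mdl.addVars(m, lb=-GRB.INFINITY, name="f")
        e = mdl.addVars(m, lb=0.0, name="e")               # e_i >= |f_i - y_i|
        c = mdl.addVars(p, n, lb=-GRB.INFINITY, name="c")  # slopes c^k
        d = mdl.addVars(p, lb=-GRB.INFINITY, name="d")     # intercepts d^k
        z = mdl.addVars(m, p, vtype=GRB.BINARY, name="z")  # allocation variables (9e)
        for i in range(m):
            mdl.addConstr(e[i] >= f[i] - y[i])             # |.| linearization
            mdl.addConstr(e[i] >= y[i] - f[i])
            mdl.addConstr(z.sum(i, "*") == 1)              # (9d)
            for k in range(p):
                cx = gp.quicksum(c[k, l] * X[i, l] for l in range(n))
                mdl.addConstr(f[i] <= cx + d[k])                       # (9b)
                mdl.addConstr(f[i] >= cx + d[k] - M * (1 - z[i, k]))   # (9c)
        mdl.setObjective(e.sum(), GRB.MINIMIZE)            # (9a)
        return mdl, f, c, d, z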

The above formulation has been studied by Toriello and Vielma [18], who report that its LP relaxation is weak and suggest the VIs presented in the following proposition.

Proposition 2.1. There is an optimal solution to the MILP (9) such that

    c_1^1 ≥ c_1^2 ≥ ... ≥ c_1^p  and  d^1 ≤ d^2 ≤ ... ≤ d^p,             (10)

and if

    c^k x_i + d^k ≤ 0   ∀k = 1, ..., p  ∀i = 1, ..., m,                   (11)

then

    f_i ≥ Σ_{k=1}^{p} (c^k x_i + d^k)   ∀i = 1, ..., m                    (12)

is valid for (9).

Proof. Given in Toriello and Vielma [18].

Since the function is concave, there must exist an order satisfying (10) in at least one of the dimensions; otherwise the function would not be concave. The condition (11) is not restrictive because we can add a large negative number to the y_i, shifting the regression problem by a constant, so that (12) can be included in the model. (Indeed, when every term c^k x_i + d^k is non-positive, their minimum f_i is bounded below by their sum, which is why (12) is valid under (11).) Inequalities (10) and (12) were added to the model (9) in the computational experiments of Toriello and Vielma [18], but did not tighten the LP relaxation of (9) sufficiently to avoid high solution times even for small problem instances.
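Continuing the gurobipy sketch above, inequalities (10) and (12) of Proposition 2.1 could be added as follows; the code assumes the responses have already been shifted so that condition (11) holds, as described above, and all names are ours.

    def add_tv_vis(mdl, X, f, c, d, p):
        # (10): slopes ordered in the first coordinate, intercepts in reverse order
        for k in range(p - 1):
            mdl.addConstr(c[k, 0] >= c[k + 1, 0])
            mdl.addConstr(d[k] <= d[k + 1])
        # (12): valid once (11) holds, since the minimum of non-positive terms
        # is bounded below by their sum
        m, n = X.shape
        for i in range(m):
            mdl.addConstr(f[i] >= gp.quicksum(
                gp.quicksum(c[k, l] * X[i, l] for l in range(n)) + d[k]
                for k in range(p)))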

3. New Valid Inequalities

We propose the following new valid inequalities to tighten formulation (9).

Lemma 3.1. Let ξ_i ∈ R^n be a sub-gradient of the function f at x_i and ∆_{ji} = x_j − x_i; then

    f_i + ⟨∆_{ji}, ξ_i⟩ ≥ f_j,   i ≠ j ∈ {1, ..., m}                      (13)

is a valid inequality for (9).

Proof. By concavity we have f(x) + ⟨w − x, ξ_x⟩ ≥ f(w) for all x, w ∈ dom f, where ξ_x is a sub-gradient of f at x. Applying this result to every ordered pair of data points (x_i, x_j) completes the proof.

The valid inequalities (13) require the computation of a sub-gradient of the function at each data point x_i. The sub-gradients c^k in formulation (9) do not relate to the data points directly; one must take the c^k of the affine piece attaining the minimum in f(x_i) = min_k {c^k x_i + d^k}. We can compute this sub-gradient at each point using the variables z_i^k, ∀i, k, which take the value 1 if x_i is allocated to the linear segment or region k and 0 otherwise. This can be modeled as

    ξ_{li} = Σ_{k=1}^{p} c_l^k z_i^k   ∀i = 1, ..., m  ∀l = 1, ..., n
    f_i + ⟨∆_{ji}, ξ_i⟩ ≥ f_j,         i ≠ j ∈ {1, ..., m}                (14)

The ξ variables are sums of products of binary and continuous variables. If the slopes c_l^k are bounded above by C̄_l^k and below by 0, we can linearize the constraints (14) as

    ξ_{li} = Σ_{k=1}^{p} w_{li}^k              ∀i = 1, ..., m  ∀l = 1, ..., n               (15a)
    w_{li}^k ≤ C̄_l^k z_i^k                    ∀i = 1, ..., m  ∀k = 1, ..., p  ∀l = 1, ..., n (15b)
    w_{li}^k ≤ c_l^k                           ∀i = 1, ..., m  ∀k = 1, ..., p  ∀l = 1, ..., n (15c)
    w_{li}^k ≥ c_l^k − (1 − z_i^k) C̄_l^k      ∀i = 1, ..., m  ∀k = 1, ..., p  ∀l = 1, ..., n (15d)
    w_{li}^k ≥ 0                               ∀i = 1, ..., m  ∀k = 1, ..., p  ∀l = 1, ..., n (15e)

This linearization of products of binary and continuous variables requires bounds on the continuous variables, and hence bounds on the slopes c_l^k, which must be determined from knowledge of the application domain of interest. We shall refer to the constraints (13) and (15) as the Linearized Valid Inequalities (LVIs) in the remainder of the paper.
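A sketch of how the LVIs could be appended to the earlier gurobipy model follows; Cbar[k][l] denotes the assumed upper bound C̄_l^k on the slopes (with 0 as the lower bound, as in the clearing-function application of Section 5), and all names are ours.

    def add_lvi(mdl, X, f, c, z, Cbar):
        # w[i, k, l] linearizes the product c[k, l] * z[i, k], per (15b)-(15e)
        m, n = X.shape
        p = len(Cbar)
        w = mdl.addVars(m, p, n, lb=0.0, name="w")           # (15e)
        xi = mdl.addVars(m, n, lb=0.0, name="xi")            # sub-gradient components
        for i in range(m):
            for l in range(n):
                mdl.addConstr(xi[i, l] == w.sum(i, "*", l))  # (15a)
                for k in range(p):
                    mdl.addConstr(w[i, k, l] <= Cbar[k][l] * z[i, k])                  # (15b)
                    mdl.addConstr(w[i, k, l] <= c[k, l])                               # (15c)
                    mdl.addConstr(w[i, k, l] >= c[k, l] - (1 - z[i, k]) * Cbar[k][l])  # (15d)
        # (13): f_i + <x_j - x_i, xi_i> >= f_j for all ordered pairs i != j
        for i in range(m):
            for j in range(m):
                if i != j:
                    mdl.addConstr(f[i] + gp.quicksum(
                        (X[j, l] - X[i, l]) * xi[i, l] for l in range(n)) >= f[j])
        return w, xi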

6

3.1. Size of the Formulation

The MILP (9) with the additional LVIs has a large number of constraints, as shown in Table 1, rendering it impractical to implement all the LVIs simultaneously, especially when the number of observations m is very large. The natural way to include them would be through cutting plane generation methods. Leaving this approach for future research, this paper focuses on the special case of univariate data.

Table 1: Model comparison

                                MILP      MILP + LVI (13) & (15)
    # of continuous variables   m + np    m + np + npm
    # of binary variables       mp        mp
    # of constraints            mp        m^2 − m + 3mpn + mp
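For concreteness, at the largest setting in our experiments (m = 400, p = 4, n = 1) the LVIs contribute m^2 − m = 159,600 sub-gradient inequalities (13) plus 3mpn = 4,800 linearization constraints (15), so the strengthened model has roughly 166,000 constraints versus mp = 1,600 for the plain MILP, which illustrates why adding all LVIs up front does not scale in m.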

4. Special Case of a Univariate Function

In the case of a univariate function with n = 1 the number of LVIs (13) and (15) can be reduced significantly using the following lemma:

Lemma 4.1. In the univariate case (n = 1) the valid inequalities (13) imply that

    (f_i − f_{i−1}) / (x_i − x_{i−1}) ≥ (f_{i+1} − f_i) / (x_{i+1} − x_i),
    with x_{i−1} < x_i < x_{i+1},  ∀i = 2, ..., m − 1.                     (16)

Proof. By Lemma 3.1 we have

    f_i + ⟨∆_{ji}, ξ_i⟩ ≥ f_j,  i ≠ j ∈ {1, ..., m},  with ∆_{ji} = x_j − x_i,

that is,

    (x_j − x_i) ξ_i ≥ f_j − f_i,   i ≠ j ∈ {1, ..., m}.                    (17)

Setting j = i + 1 and j = i − 1 in (17) yields

    (x_{i+1} − x_i) ξ_i ≥ f_{i+1} − f_i  ⟹  ξ_i ≥ (f_{i+1} − f_i)/(x_{i+1} − x_i)   (since x_{i+1} > x_i),   (18)
    (x_{i−1} − x_i) ξ_i ≥ f_{i−1} − f_i  ⟹  ξ_i ≤ (f_{i−1} − f_i)/(x_{i−1} − x_i)   (since x_{i−1} < x_i).   (19)

Combining (18) and (19) yields the desired result.

We shall refer to the constraints (16) as the Reduced Valid Inequalities (RVIs). The RVIs (16) are implied by the non-increasing derivative condition that characterizes concave functions. We also note that (16) is implied by (15) but is not equivalent to it; our computational results show that the LVIs (13) and (15) are much stronger than the RVIs (16). As seen by comparing Tables 1 and 2, Lemma 4.1 reduces the number of constraints required to implement the valid inequalities from m^2 − m to m − 1. In addition, the RVIs (16) are linear in the original MILP variables, so no additional variables are required. We also note that the non-increasing derivative requirement underlying the proposed VIs is implied by (10) in the univariate case, but the resulting solution time and lower bound are significantly different.

Proposition 4.2. If a feasible solution of the MILP with n = 1 satisfies

    (f_i − f_{i−1})/(x_i − x_{i−1}) ≥ (f_{i+1} − f_i)/(x_{i+1} − x_i)   ∀i = 2, ..., m − 1,

then it also satisfies c_1^1 ≥ c_1^2 ≥ ... ≥ c_1^p.

Proof. We first show that for every affine piece k, if the data set contains at least two points (x_i, f_i), (x_{i+1}, f_{i+1}) allocated to it, then

    (f_{i+1} − f_i)/(x_{i+1} − x_i) = c^k.

Substituting the points into the affine function y = c^k x + d^k gives

    f_i = c^k x_i + d^k   and   f_{i+1} = c^k x_{i+1} + d^k.

Solving these two equations for the slope yields

    c^k = (f_{i+1} − f_i)/(x_{i+1} − x_i).

Since the feasible solution satisfies (16), the above argument implies that c_1^1 ≥ c_1^2 ≥ ... ≥ c_1^p.
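In the earlier gurobipy sketch the RVIs could be added as follows, assuming the scalar observations are pre-sorted in increasing order; the cross-multiplied form keeps the constraints linear in the f variables, since both denominators in (16) are positive.

    def add_rvi(mdl, x, f):
        # x: sorted 1-D observations (x[0] < x[1] < ... < x[m-1]); f: fitted values
        m = len(x)
        for i in range(1, m - 1):
            # (f_i - f_{i-1})/(x_i - x_{i-1}) >= (f_{i+1} - f_i)/(x_{i+1} - x_i)
            mdl.addConstr((f[i] - f[i - 1]) * (x[i + 1] - x[i])
                          >= (f[i + 1] - f[i]) * (x[i] - x[i - 1]))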

Table 2: Model comparison for univariate case n = 1



MILP

MILP + RVI (16)

m + np

m + np

binary variables

mp

mp

constraints

mp

m + mp − 2

continuous variables

5. Numerical Experiments

Our motivating application is the fitting of clearing functions (CFs), which describe the expected output of a production resource as a function of its expected workload and have yielded promising results when used in production planning models [10]. CFs are concave, monotone and non-negative in the positive orthant. When used in production planning models to represent production resources, the performance of the planning models can be highly sensitive to parameter estimates [3]. The data for the formulation and experimental study are generated from a deterministic clearing function that relates observations x_i of the workload of the resource (machine) in each of m discrete periods to the observed output y_i of the resource in that period. In addition, the slope of the clearing function is bounded below by zero, since output cannot decrease with workload as long as sequence-dependent setup times are not present, and above by 1, since the output cannot exceed the workload during the period. We use the functional form suggested by [11], which takes the form

    f(x) = k_1 x / (k_2 + x).

We consider the domain x ∈ [0, 100] with parameter values k_1 = 30 and k_2 = 15. The cases with 100 and 200 observations are shown in Figure 1 with their corresponding fits. The data are generated from f(x) and then perturbed with a random normal error whose coefficient of variation (CV) varies per the experimental design; a sketch of this data-generating process is given after Table 3.

We present computational experiments to examine the benefit of the VIs in the univariate case using all three models: the formulation of [18] without any

VIs, with the LVIs (15) and with the reduced VIs for the univariate case (16), denoted MILP, MILP + LVI and MILP + RVI, respectively. The comparisons are based on the solution of a static formulation in which all valid inequalities are added at the root node, not through a branch-and-cut procedure, which will be the subject of future work. The models were solved using Gurobi 7.2 on an Intel Xeon 3.4 GHz system with 64 GB of RAM. We limited the solution time to 1000 seconds. The Big-M value used for the experiments is 1000. The experiment is conducted for a univariate function following the experimental design presented in Table 3.

Table 3: Experimental Design

    Factor                           Values           Levels
    # of observations (m)            100, 200, 400    3
    # of segments (p)                2, 3, 4          3
    Coefficient of Variation (CV)    0.5, 1, 2        3
    # of independent replications    10
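As a concrete illustration, a minimal sketch of this data-generating process follows. The paper does not spell out the exact error specification, so the sketch assumes the error standard deviation equals CV times the noise-free function value; the seed and sampling of workloads are likewise our assumptions.

    import numpy as np

    def generate_dataset(m, cv, k1=30.0, k2=15.0, seed=0):
        rng = np.random.default_rng(seed)              # arbitrary seed
        x = np.sort(rng.uniform(0.0, 100.0, size=m))   # workloads on [0, 100]
        f = k1 * x / (k2 + x)                          # clearing function values
        y = f + rng.normal(0.0, cv * f)                # assumed CV interpretation
        return x, y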

The number of observations and the number of segments affect the size of the model and hence the solution time. We include the coefficient of variation of the error distribution to understand the effect of variability in the data on the solution time. Each independent replication represents an independent sample from the random error distribution. In total, we generate 90 datasets and solve 270 random problem instances using each of the three models; three values of p (2, 3 and 4) are considered for each dataset. The performance measures considered for comparison are solution time, number of nodes explored, LP-bound gap and optimality gap under the time limit. The optimality gap and LP-bound gap are defined as follows:

    Optimality gap = (obj* − obj_LB) / obj* × 100
    LP-bound gap   = (obj_IP − obj_LP) / obj_IP × 100

where obj_LP and obj_IP denote the optimal objective function values of the LP relaxation and the MILP, respectively, and obj* and obj_LB denote the objective value of the best IP feasible solution (upper bound) and the best lower bound found by the solver within the time limit of 1000 s. We use the algorithm of Hannah and Dunson [7] to generate an initial feasible solution. The original CAP algorithm solves a least squares linear regression subproblem, which we replace with the least absolute deviation problem, a special case of MILP (9) with p = 1.
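These two gaps can be read off the solver; a sketch in gurobipy, continuing the earlier code, is shown below. Here obj_lp would come from solving the relaxed model returned by mdl.relax(), and at proven optimality the incumbent objective ObjVal coincides with obj_IP in the definition above.

    def report_gaps(mdl, obj_lp):
        # mdl: solved MILP; obj_lp: objective value of its LP relaxation
        opt_gap = (mdl.ObjVal - mdl.ObjBound) / mdl.ObjVal * 100  # optimality gap
        lp_gap = (mdl.ObjVal - obj_lp) / mdl.ObjVal * 100         # LP-bound gap
        return opt_gap, lp_gap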

[Figure 1: Plots showing the results of fitting for different cases. Panels: (a) m = 100, p = 2; (b) m = 200, p = 2; (c) m = 100, p = 3; (d) m = 200, p = 3.]

5.1. Computational results

First, in Table 4, we report the LP-bound gap for the three models. We note that MILP+LVI provides the strongest LP bounds (smallest LP-bound gaps), followed by MILP+RVI and then the MILP model itself. This is not surprising, as the number of additional constraints (valid inequalities) in MILP+LVI is significantly larger than in MILP+RVI, and MILP has no additional valid inequalities. Furthermore, for both MILP+LVI and MILP+RVI the LP-bound gap tends to decrease as the number of segments p increases, and this decrease is more notable for MILP+LVI when the value of m is relatively large. This is also not surprising, since the LVIs include 3mp constraints that grow with p, whereas the number of RVIs is invariant in the number of segments p.

Table 4: Summary of LP-bound gap, (obj_IP − obj_LP)/obj_IP × 100 (averages)

                 CV = 0.5               CV = 1                 CV = 2
    m    p   MILP   LVI    RVI     MILP   LVI    RVI     MILP   LVI    RVI
    100  2   100    48.33  48.69   100    23.59  24.54   100    8.95   10.58
         3   100    24.46  24.98   100    9.14   10.26   100    2.86   4.61
         4   100    11.41  12.06   100    2.94   4.18    100    0.56   2.36
    200  2   100    45.94  58.35   100    22.15  40.93   100    7.87   30.82
         3   100    20.45  38.69   100    7.09   29.46   100    2.06   26.42
         4   100    9.04   29.87   100    2.93   26.30   100    0.68   25.38
    400  2   100    44.54  80.27   100    21.39  72.25   100    8.05   67.70
         3   100    19.70  71.35   100    6.94   67.07   100    2.03   65.53
         4   100    8.00   67.19   100    2.63   65.56   100    0.74   65.09

Tables 5, 6 and 7 present the solution times for the instances with m = 100, 200 and 400, respectively. We present the average and worst-case results for the various performance measures along with their standard deviations. The lower bound from the original MILP is indeed very weak compared to the other models, resulting in long computation times. MILP+LVI provides a lower bound slightly better than MILP+RVI and reaches the optimal solution exploring fewer nodes. The solution times do not differ greatly when p = 2, whereas for p = 3 and 4 MILP was unable to solve all problem instances to optimality within the allocated time limit. As the number of observations increases, the performance of the models differs significantly. The MILP model consistently performs worse than the other models. MILP+RVI is faster when the number of observations and the number of segments are small, but as m and p increase MILP+RVI cannot solve all instances to optimality, whereas MILP+LVI does so within the given time limit. The increase in the CV of the random error does not affect the solution times when the other factors remain constant.

Tables 8, 9 and 10 present the optimality gaps for all problem instances. The optimality gap increases with the number of segments and observations for both the MILP and MILP+RVI models, whereas the MILP+LVI model has a zero gap in all scenarios, even though MILP+LVI has considerably more constraints than the other models. The number of nodes explored by MILP+LVI is the lowest among all the models for p = 3, 4 and does not increase greatly with the number of observations m, as shown in Table 11.

We would like to emphasize the importance of the RVIs, especially in cases where MILP+LVI cannot be implemented. To examine this issue we select a particular setting with p = 2 and increase the number of observations (m = 1000 and 5000) to quantify the effect of having a weaker but smaller problem to solve at each node of the branch and bound tree. We cannot use MILP+LVI in this scenario because of the large number of constraints (≈ 10^6 and 25 × 10^6, respectively), so these problem instances are solved using MILP+RVI; the results are presented in Table 12. Although MILP+RVI could not solve the original problem instances for p = 3, 4, it scales well for the case p = 2.

Table 5: Computation Time Summary for Test instances: m = 100 (seconds)

               CV = 0.5                   CV = 1                    CV = 2
    p  Model   Avg.     Std.    Max       Avg.     Std.    Max      Avg.     Std.    Max
    2  MILP    0.81     0.26    1.17      2.17     1.23    4.94     6.98     0.39    7.59
       LVI     4.96     0.93    6.71      4.66     0.50    5.88     4.60     0.92    5.98
       RVI     0.40     0.06    0.52      0.26     0.03    0.30     0.27     0.04    0.34
    3  MILP    820.43   245.03  1000.07   835.42   298.98  1000.09  955.63   140.53  1000.09
       LVI     65.59    28.43   126.28    58.12    27.44   129.11   52.05    44.16   171.93
       RVI     3.54     1.42    6.84      4.11     5.89    20.59    1.45     0.73    2.63
    4  MILP    1000.10  0.03    1000.15   1000.10  0.04    1000.20  1000.08  0.02    1000.12
       LVI     195.54   71.45   293.82    133.51   42.84   199.42   152.91   140.38  543.67
       RVI     19.20    16.74   53.63     17.12    19.39   50.41    3.79     1.23    6.00

Table 6: Computation Time Summary for Test instances: m = 200 (seconds)

               CV = 0.5                   CV = 1                    CV = 2
    p  Model   Avg.     Std.    Max       Avg.     Std.    Max      Avg.     Std.    Max
    2  MILP    34.07    14.30   61.30     47.66    70.58   247.89   42.09    24.09   90.70
       LVI     9.09     0.84    10.70     9.08     0.88    10.81    9.43     0.84    10.67
       RVI     0.79     0.18    1.17      0.68     0.11    0.91     0.81     0.42    1.77
    3  MILP    1000.05  0.02    1000.09   1000.03  0.01    1000.05  1000.04  0.02    1000.06
       LVI     64.07    25.89   110.30    46.52    14.39   70.09    51.69    19.11   87.88
       RVI     60.49    93.76   316.46    134.48   305.15  1000.01  334.37   397.98  1000.03
    4  MILP    1000.07  0.04    1000.14   1000.07  0.02    1000.09  1000.06  0.03    1000.12
       LVI     201.55   43.30   257.85    163.41   73.07   293.36   148.46   60.28   263.85
       RVI     952.62   141.36  1000.06   1000.03  0.01    1000.04  952.41   150.61  1000.06

Table 7: Computation Time Summary for Test instances: m = 400 (seconds)

               CV = 0.5                   CV = 1                    CV = 2
    p  Model   Avg.     Std.    Max       Avg.     Std.    Max      Avg.     Std.    Max
    2  MILP    1000.04  0.01    1000.06   973.55   83.80   1000.12  1000.06  0.02    1000.09
       LVI     16.14    1.18    18.52     17.25    1.46    18.98    18.20    1.56    20.22
       RVI     104.55   38.35   183.81    74.98    7.02    84.42    86.68    28.72   150.53
    3  MILP    1000.07  0.02    1000.09   1000.07  0.02    1000.09  1000.08  0.03    1000.14
       LVI     136.03   27.23   183.11    108.80   22.91   156.22   115.80   16.35   145.86
       RVI     1000.08  0.03    1000.14   1000.07  0.03    1000.12  1000.07  0.03    1000.14
    4  MILP    1000.08  0.02    1000.11   1000.09  0.05    1000.19  1000.08  0.03    1000.14
       LVI     384.12   71.34   535.27    299.84   92.48   480.95   246.11   50.97   316.38
       RVI     1000.09  0.03    1000.14   1000.09  0.01    1000.11  1000.09  0.02    1000.12

Table 8: Optimality Gap Summary for Test Instances: m = 100 (%)

               CV = 0.5                 CV = 1                  CV = 2
    p  Model   Avg.   Std.   Max        Avg.   Std.   Max       Avg.   Std.   Max
    2  MILP    0.00   0.00   0.00       0.00   0.00   0.00      0.00   0.00   0.00
       LVI     0.00   0.00   0.00       0.00   0.00   0.00      0.00   0.00   0.00
       RVI     0.00   0.00   0.00       0.00   0.00   0.00      0.00   0.00   0.00
    3  MILP    11.24  12.55  38.54      10.61  8.26   21.35     23.27  13.10  39.56
       LVI     0.00   0.00   0.00       0.00   0.00   0.01      0.00   0.00   0.01
       RVI     0.00   0.00   0.00       0.00   0.00   0.00      0.00   0.00   0.01
    4  MILP    36.38  8.60   44.49      40.37  8.76   58.07     48.20  7.65   58.34
       LVI     0.00   0.00   0.01       0.00   0.00   0.00      0.00   0.00   0.01
       RVI     0.00   0.00   0.01       0.00   0.00   0.01      0.00   0.00   0.00

Table 9: Optimality Gap Summary for Test Instances: m = 200 (%)

               CV = 0.5                 CV = 1                  CV = 2
    p  Model   Avg.   Std.   Max        Avg.   Std.   Max       Avg.   Std.   Max
    2  MILP    0.00   0.00   0.00       0.00   0.00   0.00      0.00   0.00   0.00
       LVI     0.00   0.00   0.00       0.00   0.00   0.00      0.00   0.00   0.01
       RVI     0.00   0.00   0.00       0.00   0.00   0.00      0.00   0.00   0.00
    3  MILP    50.93  9.07   62.45      46.45  9.20   53.54     32.57  12.43  50.34
       LVI     0.00   0.00   0.00       0.00   0.00   0.01      0.00   0.00   0.01
       RVI     0.00   0.00   0.01       0.27   0.87   2.74      0.49   1.17   3.62
    4  MILP    76.73  3.50   81.21      74.78  3.16   81.58     62.07  11.09  72.51
       LVI     0.00   0.00   0.01       0.00   0.00   0.01      0.00   0.00   0.01
       RVI     6.14   6.09   19.72      6.43   2.85   10.47     6.18   3.35   11.65

Table 10: Optimality Gap Summary for Test Instances: m = 400 (%)

               CV = 0.5                 CV = 1                  CV = 2
    p  Model   Avg.   Std.   Max        Avg.   Std.   Max       Avg.   Std.   Max
    2  MILP    25.78  7.67   33.67      18.40  8.42   31.77     12.44  7.53   26.74
       LVI     0.00   0.00   0.00       0.00   0.00   0.00      0.00   0.00   0.00
       RVI     0.00   0.00   0.00       0.00   0.00   0.00      0.00   0.00   0.00
    3  MILP    69.94  8.02   79.18      54.05  9.67   68.75     45.13  8.18   56.31
       LVI     0.00   0.00   0.01       0.00   0.00   0.00      0.00   0.00   0.01
       RVI     40.75  8.92   49.74      37.45  4.19   41.53     32.55  4.93   37.65
    4  MILP    86.23  3.19   90.27      72.07  12.29  85.99     65.72  12.22  75.65
       LVI     0.00   0.00   0.01       0.00   0.00   0.01      0.00   0.00   0.01
       RVI     48.48  6.50   59.63      47.34  6.62   54.24     35.86  6.93   50.43

Table 11: Summary of Number of Nodes Explored for Test instances (averages)

               CV = 0.5                          CV = 1                           CV = 2
    p  Model   m=100    m=200      m=400         m=100    m=200      m=400        m=100    m=200      m=400
    2  MILP    2772     58513      586940*       14523    66021      435500*      29951    49675      403187*
       LVI     169      182        140           143      125        151          139      97         116
       RVI     980      1410       81190         72       1099       36918        54       1551       46360
    3  MILP    169017*  3946118*   975319*       473715*  1629180*   526514*      324829*  1136754*   306227*
       LVI     4371     3256       3414          4016     2602       3110         5193     2680       3340
       RVI     5217     291679     394249*       13871    662085*    403187*      2526     1660790*   279376*
    4  MILP    199433*  2216536*   1149838*      534326*  1346244*   418631*      259092*  719697*    363487*
       LVI     14831    10179      12561         7707     8288       9601         9152     6971       7661
       RVI     57343    2847785*   572195*       47845    2529864*   368853*      8615     2062138*   304662*

    *Problem instances not solved to optimality (i.e., non-zero optimality gap; see Tables 8, 9 and 10).

Table 12: Average solution time and optimality gap for p = 2 with large m

    m      CV           Time in s (RVI)           Time in s (MILP)      Gap (RVI)          Gap (MILP)
    1000   0.5, 1, 2    11.66, 8.07, 3.95         10^3, 10^3, 10^3      0.00, 0.00, 0.00   41.18, 34.25, 34.52
    5000   0.5, 1, 2    287.93, 227.80, 183.14    10^3, 10^3, 10^3      0.00, 0.00, 0.00   96.01, 97.47, 98.20

6. Conclusions and Future Directions

We have discussed the MILP model for concave piecewise linear fitting and introduced valid inequalities to tighten the model. The original form of the proposed valid inequalities is non-linear, so a linear reformulation (LVI) is provided. For the special case of univariate functions, a second linear reformulation (RVI) is also provided that has a much smaller number of valid inequalities, although it is not as strong as the first reformulation. Through a computational experiment, we show that these valid inequalities indeed provide a computational advantage in all experimental cases.

There are several promising directions for future work, beginning with determining the complexity class of the problem of concave piecewise linear fitting. For the multi-dimensional case the number of proposed VIs is large, so a branch-and-cut algorithm is a possible direction, with methods to choose the strongest VI in each iteration (the separation problem). The numerical experiments show that the VIs are in fact cuts, but further exploration of the strength of these cuts is needed. The effect of aggregation of the proposed VIs on solution time is also of interest, as aggregation results in a smaller formulation.

Acknowledgment

This research did not receive any specific grant from funding agencies in the public, commercial, or not-for-profit sectors.

References

[1] Gad Allon, Michael Beenstock, Steven Hackman, Ury Passy, and Alexander Shapiro. Nonparametric estimation of concave production technologies by entropic methods. Journal of Applied Econometrics, 22(4):795–816, 2007.

[2] Edoardo Amaldi, Stefano Coniglio, and Leonardo Taccari. Discrete optimization methods to fit piecewise affine models to data points. Computers & Operations Research, 75:214–230, 2016.

[3] Jakob Asmundsson, Ronald L. Rardin, Can Hulusi Turkseven, and Reha Uzsoy. Production planning with resources subject to congestion. Naval Research Logistics (NRL), 56(2):142–157, 2009.

[4] Egon Balas. Disjunctive programming: Properties of the convex hull of feasible points. Discrete Applied Mathematics, 89(1):3–44, 1998.

[5] Dimitris Bertsimas and Romy Shioda. Classification and regression via integer optimization. Operations Research, 55(2):252–271, 2007. URL http://www.jstor.org/stable/25147075.

[6] Jerome H. Friedman. Multivariate adaptive regression splines. The Annals of Statistics, pages 1–67, 1991.

[7] Lauren A. Hannah and David B. Dunson. Multivariate convex regression with adaptive partitioning. The Journal of Machine Learning Research, 14(1):3261–3294, 2013.

[8] Clifford Hildreth. Point estimates of ordinates of concave functions. Journal of the American Statistical Association, 49(267):598–619, 1954.

[9] Robert G. Jeroslow and James K. Lowe. Modelling with integer variables. Mathematical Programming at Oberwolfach II, pages 167–184, 1984.

[10] Necip Baris Kacar, Lars Mönch, and Reha Uzsoy. Modeling cycle times in production planning models for wafer fabrication. IEEE Transactions on Semiconductor Manufacturing, 29(2):153–167, 2016.

[11] Uday S. Karmarkar. Capacity loading and release planning with work-in-progress (WIP) and leadtimes. Journal of Manufacturing and Operations Management, 2:105–123, 1989.

[12] Fabien Lauer. On the complexity of piecewise affine system identification. Automatica, 62:148–153, 2015.

[13] Alessandro Magnani and Stephen P. Boyd. Convex piecewise-linear fitting. Optimization and Engineering, 10(1):1–17, 2009. doi: 10.1007/s11081-008-9045-3.

[14] Alessandro Magnani and Stephen P. Boyd. Convex piecewise-linear fitting. Optimization and Engineering, 10(1):1–17, 2009.

[15] Rahul Mazumder, Arkopal Choudhury, Garud Iyengar, and Bodhisattva Sen. A computational framework for multivariate convex regression and its variants. arXiv preprint arXiv:1509.08165, 2015.

[16] Warren B. Powell. Approximate Dynamic Programming: Solving the Curses of Dimensionality, volume 703. John Wiley & Sons, 2007.

[17] Emilio Seijo and Bodhisattva Sen. Nonparametric least squares estimation of a multivariate convex regression function. The Annals of Statistics, 39(3):1633–1657, 2011.

[18] Alejandro Toriello and Juan Pablo Vielma. Fitting piecewise linear continuous functions. European Journal of Operational Research, 219(1):86–95, 2012. doi: 10.1016/j.ejor.2011.12.030.

[19] Hal R. Varian. The nonparametric approach to production analysis. Econometrica: Journal of the Econometric Society, pages 579–597, 1984.