An exact dynamic programming algorithm for large-scale unconstrained two-dimensional guillotine cutting problems


Computers & Operations Research 50 (2014) 97–114


Mauro Russo a, Antonio Sforza b, Claudio Sterle b,*

a Uniplan Software S.r.l., Via Manzoni 21, 84018 Scafati, Salerno, Italy
b Department of Electrical Engineering and Information Technology, University “Federico II” of Naples, Via Claudio 21, 80125 Naples, Italy

Abstract

Available online 15 April 2014

In the unconstrained two-dimensional cutting problem (U2DCP) small rectangular objects have to be extracted from a large rectangular sheet, with no limit on the number of small objects. The exact U2DCP solving approaches present in the literature show some limits in tackling very large size instances, due to their high memory requirements. In this work we propose five improvements, three original and two derived from the literature, in order to overcome these limits and to reduce the computational burden of the knapsack function based U2DCP solving approaches. These improvements, based on proven theoretical results, reduce the search space and avoid redundant solutions without loss of the feasible ones. The presented improvements, together with several computational refinements, are integrated in a new dynamic programming algorithm, which modifies the one by Russo et al. (2013) [16]. The proposed algorithm has been tested on instances from the literature and compared with the best U2DCP solving approaches. The obtained results show that it significantly outperforms them and determines the optimal solution of previously unsolved very large size instances. © 2014 Elsevier Ltd. All rights reserved.

Keywords: Guillotine cutting; Knapsack function; Dynamic programming

1. Introduction

The two-dimensional cutting problem (2DCP) consists in dissecting a large rectangular object (stock sheet) into small rectangles (pieces) with the aim of minimizing the trim loss or maximizing the profit associated with the extracted pieces. The guillotine variant of the 2DCP uses just straight cuts dissecting a rectangle into two parts. When the number of pieces to be extracted is unbounded, the problem is referred to as the unconstrained two-dimensional guillotine cutting problem (U2DCP); otherwise it is referred to as constrained (C2DCP). The 2DCP has been widely treated in the literature by exact (dynamic programming, branch and bound, MIP models) and heuristic approaches, as witnessed by the survey papers of Dyckhoff et al. [7], Lodi et al. [14] and Wäscher et al. [18].

Gilmore and Gomory [10] introduced the 2DCP and proposed two dynamic programming procedures based on the knapsack function. Herz [12] and Christofides and Whitlock [3] proposed the first approaches introducing the so-called discretization points (also known as normal coordinates). The ideas proposed in these


works have been widely explored in the literature. Concerning dynamic programming, one of the main reasons for the research interest in this approach is that it returns the optimal solution of all the U2DC subproblems related to the subrectangles included in the sheet; these solutions are generally used as upper bounds in the C2DCP. Concerning the discretization points, they make it possible to reduce the problem size and consequently the computational effort. For this reason several contributions have presented effective methods for the computation and the reduction of the discretization points. In this perspective Beasley [1] was the first to propose an exact and a heuristic algorithm complementing the basic Gilmore and Gomory [10] dynamic programming procedure with the discretization points. In particular, in the heuristic algorithm the author finds a suboptimal solution using a subset of the discretization points, obtained through the iterative elimination of the smallest pieces until a prefixed size of the subset is achieved. Scheithauer and Terno [17] introduced the raster points, which constitute a subset of the discretization points; unlike the subset proposed by Beasley [1], the raster points can be used in an exact dynamic programming algorithm without loss of optimality. Cintra et al. [4], working on the idea of Beasley [1], proposed an exact dynamic programming procedure which simplifies the computation of the knapsack function and presented efficient procedures for


the computation of the discretization points. Moreover, they reduced the number of discretization points introducing an idea which partially recalls the raster points. Kang and Yoon [13] proposed a branch and bound algorithm which is the best exact algorithm in the literature for the U2DCP. They modified the algorithm of Young-Gun et al. [19], improving the upper bound and using anti-redundancy and dominance techniques based on the development of the strategies proposed in Cung et al. [6]. Moreover they introduced a pre-processing, to be performed before running the algorithm, aimed at reducing the number of piece types to be taken into account; this pre-processing is independent of the solving approach.

Recently Birgin et al. [2] proposed a two-phase heuristic for the non-guillotine case of the U2DCP, whose first phase solves the guillotine variant of the problem. This phase has two steps: a fast heuristic step, based on the two-stage algorithm by Gilmore and Gomory [9], and an exact dynamic programming step. They used the raster points and introduced a pre-processing similar to the one of Kang and Yoon [13], less effective but faster. For very large size instances they obtained good suboptimal solutions using a subset of the raster points.

Russo et al. [16] corrected and improved one of the two dynamic programming procedures by Gilmore and Gomory [10]. Moreover they introduced in their algorithm the reduction of the discretization points proposed by Cintra et al. [4], and a pre-processing which modifies the one proposed by Birgin et al. [2]. To the best of our knowledge, this is the best exact dynamic programming algorithm proposed in the literature for the U2DCP.

The main weakness of the exact approaches, and in particular of the dynamic programming algorithms, resides in their large memory requirements. In this work we propose five improvements which make it possible to overcome these limits. The first two improvements are derived from ideas present in the literature, i.e. the raster points and the pre-processing on piece types: concerning the raster points, we provide an efficient data structure for their integration, whereas for the pre-processing we propose new effective strategies. It is important to underline that both these improvements can be used regardless of the adopted solving approach. The remaining three improvements are strongly correlated and specifically refer to knapsack function based algorithms, in particular to the one by Russo et al. [16]. These improvements, based on proven theoretical results, significantly reduce the solution space of the algorithm, avoiding redundant solutions without loss of the feasible ones. All these enhancements, together with several computational refinements, are integrated in a new dynamic programming algorithm which modifies the one by Russo et al. [16]. The proposed algorithm significantly outperforms the best solving approaches available in the literature for the U2DCP and determines the optimal solution of previously unsolved very large size instances.

The paper is structured as follows. In Section 2 we recall the U2DCP setting, the knapsack function and the algorithm by Russo et al. [16]. In Section 3 we describe the proposed improvements. Section 4 presents the new algorithm and provides some computer programming refinements. Finally, Section 5 presents a computational study of the proposed improvements and the results obtained on literature instances.

2. Unconstrained cutting problem and knapsack function

In the U2DCP a single large rectangular stock sheet of dimensions L and W has to be dissected in order to obtain smaller rectangular pieces among n given types {t1, t2, …, tn}, each characterized by its dimensions (li, wi) and profit πi. Pieces are extracted from the sheet, without overlapping, by a sequence of vertical and horizontal straight guillotine cuts, i.e. cuts which divide a rectangle into two parts. No piece rotation is allowed. A set of guillotine cuts is referred to as a cutting scheme. Each set corresponds to a cutting pattern, i.e. an n-dimensional integer vector (b1, b2, …, bn), where bi represents the number of pieces of type ti extracted from the sheet. No bound exists on the bi values. If the profits πi are equal to the areas of the pieces, the problem is referred to as unweighted and consists in finding the solution which minimizes the trim loss; otherwise it is referred to as weighted and consists in finding the solution which maximizes the profit associated with the set of extracted pieces.

Gilmore and Gomory [10] introduced the knapsack function to compute the maximum profit associated with a rectangle of dimensions (l, w). The knapsack function, refined by Russo et al. [16] and used in their dynamic programming algorithm (referred to as RS_kf), is defined as follows:

f(l, w) = max{f0′(l, w), fv′(l, w), fh′(l, w), f(l−1, w), f(l, w−1)}   (1)

where

f0′(l, w) = max{0, πi : li ≤ l, wi ≤ w}   (2a)

fv′(l, w) = max{f(x1, w) + f(x2, w) : 0 < x1 ≤ x2, x1 + x2 = l}   (2b)

fh′(l, w) = max{f(l, y1) + f(l, y2) : 0 < y1 ≤ y2, y1 + y2 = w}   (2c)
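For concreteness, the recurrence (1)-(2) can be evaluated directly by a triple loop. The following self-contained C++ sketch (hypothetical piece data, illustrative names) computes f without the anti-redundancy machinery introduced below; it is an illustration, not the authors' implementation.

#include <algorithm>
#include <array>
#include <cstdio>
#include <vector>

int main() {
    const int L = 10, W = 10;                              // hypothetical sheet
    // hypothetical piece types: {length, width, profit}
    const std::vector<std::array<int, 3>> types = {{3, 4, 12}, {5, 2, 10}, {7, 7, 49}};
    // f[l][w] = knapsack function value for a subrectangle (l, w)
    std::vector<std::vector<long>> f(L + 1, std::vector<long>(W + 1, 0));
    for (int l = 1; l <= L; ++l)
        for (int w = 1; w <= W; ++w) {
            long best = std::max(f[l - 1][w], f[l][w - 1]); // last two terms of (1)
            for (const auto& t : types)                     // one-piece term (2a)
                if (t[0] <= l && t[1] <= w) best = std::max(best, (long)t[2]);
            for (int x1 = 1; x1 <= l / 2; ++x1)             // vertical cuts (2b)
                best = std::max(best, f[x1][w] + f[l - x1][w]);
            for (int y1 = 1; y1 <= w / 2; ++y1)             // horizontal cuts (2c)
                best = std::max(best, f[l][y1] + f[l][w - y1]);
            f[l][w] = best;
        }
    std::printf("f(L, W) = %ld\n", f[L][W]);
    return 0;
}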

The term f0′ is the profit obtained by extracting a single piece; for the sake of clarity, in the following it will be referred to as the one-piece term. The term fv′ is the profit obtained by performing vertical cuts at width w. The term fh′ is the profit obtained by performing horizontal cuts at length l. The last two terms of (1), f(l−1, w) and f(l, w−1), make it possible to take into account solutions where the sum of the vertical or horizontal cut coordinates (x1 + x2 or y1 + y2) does not correspond exactly to l or w respectively.

In (1), differently from Russo et al. [16], we use the notations f0′, fv′ and fh′ instead of f0, fv and fh for the first three terms. This choice has two main reasons. Concerning the one-piece term f0′, we will propose a new definition in Section 3.3 and we will refer to the new function as f0. Concerning the other two terms, we need to distinguish between the definitions of the functions fv′ and fh′ and the matrix values actually computed by the algorithm, denoted by fv and fh respectively.

The RS_kf algorithm computes the knapsack function exploiting the support functions λ and ω, proposed by Gilmore and Gomory [10], and the corrected version of the three anti-redundancy conditions on their values. In particular the support functions used are

λ′(l, w) = 0 if f(l, w) = f(l−1, w); otherwise
λ′(l, w) = min{l, x1 : 0 < x1 ≤ l/2, Cv,1(l−x1, w), f(l, w) = f(x1, w) + f(l−x1, w)}   (3a)

ω′(l, w) = 0 if f(l, w) = f(l, w−1); otherwise
ω′(l, w) = min{w, y1 : 0 < y1 ≤ w/2, Ch,1(l, w−y1), f(l, w) = f(l, y1) + f(l, w−y1)}   (3b)

and the three pairs of conditions are:

- Cv,1(x2, w) ≡ ∃ x ≤ x2 : ω(x, w) > 0
- Cv,2(x1, w) ≡ λ(x1, w) = x1
- Cv,3(x1, x2, w) ≡ x1 ≤ λ(x2, w)
- Ch,1(l, y2) ≡ ∃ y ≤ y2 : λ(l, y) > 0
- Ch,2(l, y1) ≡ ω(l, y1) = y1
- Ch,3(l, y1, y2) ≡ y1 ≤ ω(l, y2)

Also in this case we use the notations λ′ and ω′ instead of λ and ω to distinguish between the definitions of the support functions and the matrix values actually computed by the algorithm.



Fig. 1. Redundancy examples: (a) and (b) a D2-type redundancy; (c) and (d) a D3-type redundancy, with four pieces of width w and lengths u1 < u2 < u3 < u4 arranged in a rectangle (l, w).

It is important to note the presence of the conditions Cv,1 and Ch,1 in the definitions (3a) and (3b) respectively. This implies a “cross effect” in the computation of the two support functions, since the ω values intervene in the computation of the λ values and vice versa.

The value λ(l, w) represents the minimum coordinate x usable for a vertical guillotine cut dissecting the rectangle of dimensions (l, w) in two parts, such that the value f(l, w) is the sum of the values associated with the smaller rectangles, f(x, w) and f(l−x, w). If the value f(l, w) is obtainable with just one horizontal cut, then λ(l, w) = l. If instead the value f(l, w) is equal to the one associated with a rectangle of smaller length, i.e. f(l−1, w), then λ(l, w) = 0. This makes it possible to discard all sums between f(l, w) and values associated with rectangles of width w, since the value f(l−1, w) can be used in its place. A similar description holds for ω.

The redundancies tackled by the three pairs of conditions are referred to as D2-type and D3-type by Cung et al. [6]. In the following, for the sake of brevity, we explain the effect of the three conditions only on vertical dissections. The first condition Cv,1 eliminates the D2-type redundancies, exemplified in Fig. 1a and b: the main vertical cut in Fig. 1a can be replaced by a main horizontal cut at the y-coordinate equal to the width of the biggest piece (Fig. 1b), hence the sum f(x1, w) + f(x2, w) associated with Fig. 1a is avoided. The two conditions Cv,2 and Cv,3 eliminate the D3-type redundancies, exemplified in Fig. 1c and d, where four pieces of width w and lengths u1 < u2 < u3 < u4 are differently arranged within a rectangle (l, w). The condition Cv,2 drops all sums f(x1, w) + f(l−x1, w) where x1 is obtained as the sum of two or more addends among u1, u2, u3, u4, since in that case λ(x1, w) equals the minimum of them and is smaller than x1. Then, in case x1 corresponds to one of the four values u1, u2, u3, u4, all satisfying Cv,2, the condition Cv,3 allows only the sum with x1 = u1, since otherwise l−x1 would have u1 as an addend and λ(l−x1, w) = u1 < x1. Hence, among the possible solutions, only the one of Fig. 1d (with x1 = u1) is built.

Algorithm 1 reports the pseudocode of RS_kf. Here we just recall the structure of the algorithm and notice that the first pair of conditions (Cv,1 and Ch,1) is realized through two Boolean vectors, Λ1(l), l = 1, …, L, and Ω1(w), w = 1, …, W. The entry Ω1(w) is set to true as soon as ω(l, w) is proved to be positive for at least one l value, because this means that Cv,1(x, w) is satisfied for x ≥ l. Similarly Λ1(l) is set to true when λ(l, w) is positive for at least one value of w. The algorithm runs two nested outer cycles, on the width y2 (line 3) and on the length x2 (line 4). For each pair (x2, y2) it first computes the values of f(x2, y2), λ(x2, y2) and ω(x2, y2) and updates the two Boolean values Ω1(y2) and Λ1(x2). Then two independent inner cycles are run, the first on the length x1 (line 9) and the second on the width y1 (line 13). In the first inner cycle the algorithm combines f(x2, y2) with the values f(x1, y2) and updates the corresponding values fv(x1+x2, y2) and λ(x1+x2, y2). In the second inner cycle it combines f(x2, y2) with the values f(x2, y1) and updates the corresponding values fh(x2, y1+y2) and ω(x2, y1+y2).

Input: (L, W), (li, wi, πi) for i = 1, …, n
Output: f(l, w), λ(l, w), ω(l, w) for 0 ≤ l ≤ L, 0 ≤ w ≤ W
1  Set f(l, 0) = 0 for 1 ≤ l ≤ L, f(0, w) = 0 for 1 ≤ w ≤ W; Set fv(l, w) = fh(l, w) = 0 for 1 ≤ l ≤ L, 1 ≤ w ≤ W
2  Set Λ1(l) = false for 1 ≤ l ≤ L, Ω1(w) = false for 1 ≤ w ≤ W
3  For y2 = 1 To W Do //outer cycle over the widths
4    For x2 = 1 To L Do //outer cycle over the lengths
     //settings of f, λ, ω, Λ1 and Ω1
5      Set f(x2, y2) = max{f0′(x2, y2), fv(x2, y2), fh(x2, y2), f(x2−1, y2), f(x2, y2−1)}
6      If f(x2, y2) = f(x2−1, y2) Then Set λ(x2, y2) = 0 Else-if f(x2, y2) > fv(x2, y2) Then Set λ(x2, y2) = x2 End If
7      If f(x2, y2) = f(x2, y2−1) Then Set ω(x2, y2) = 0 Else-if f(x2, y2) > fh(x2, y2) Then Set ω(x2, y2) = y2 End If
8      If λ(x2, y2) > 0 Then Set Λ1(x2) = true End If; If ω(x2, y2) > 0 Then Set Ω1(y2) = true End If
     //inner cycle over the lengths
9      If Ω1(y2) Then For x1 = 1 To min{λ(x2, y2), L−x2} Do
10       If λ(x1, y2) = x1 and sum ≡ f(x1, y2) + f(x2, y2) ≥ fv(x1+x2, y2) Then
11         Set fv(x1+x2, y2) = sum, λ(x1+x2, y2) = x1 End If
12     End For End If
     //inner cycle over the widths
13     If Λ1(x2) Then For y1 = 1 To min{ω(x2, y2), W−y2} Do
14       If ω(x2, y1) = y1 and sum ≡ f(x2, y1) + f(x2, y2) ≥ fh(x2, y1+y2) Then
15         Set fh(x2, y1+y2) = sum, ω(x2, y1+y2) = y1 End If
16     End For End If
17   End For
18 End For

Algorithm 1. RS_kf.
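For readers who prefer code, a compact C++ transliteration of Algorithm 1 follows. It is a sketch under the definitions above, with illustrative names, and not the authors' implementation.

#include <algorithm>
#include <vector>

struct Piece { int l, w; long p; };
using Mat = std::vector<std::vector<long>>;
using IMat = std::vector<std::vector<int>>;

// f, lam (λ) and om (ω) are returned with dimensions (L+1) x (W+1)
void rs_kf(int L, int W, const std::vector<Piece>& types, Mat& f, IMat& lam, IMat& om) {
    auto f0p = [&](int l, int w) {                    // one-piece term f0' of (2a)
        long best = 0;
        for (const auto& t : types)
            if (t.l <= l && t.w <= w) best = std::max(best, t.p);
        return best;
    };
    Mat fv(L + 1, std::vector<long>(W + 1, 0)), fh = fv;
    f.assign(L + 1, std::vector<long>(W + 1, 0));
    lam.assign(L + 1, std::vector<int>(W + 1, 0));
    om = lam;
    std::vector<char> L1(L + 1, 0), O1(W + 1, 0);     // Λ1 and Ω1
    for (int y2 = 1; y2 <= W; ++y2)                   // outer cycle over the widths
        for (int x2 = 1; x2 <= L; ++x2) {             // outer cycle over the lengths
            f[x2][y2] = std::max({f0p(x2, y2), fv[x2][y2], fh[x2][y2],
                                  f[x2 - 1][y2], f[x2][y2 - 1]});        // line 5
            if (f[x2][y2] == f[x2 - 1][y2]) lam[x2][y2] = 0;             // line 6
            else if (f[x2][y2] > fv[x2][y2]) lam[x2][y2] = x2;
            if (f[x2][y2] == f[x2][y2 - 1]) om[x2][y2] = 0;              // line 7
            else if (f[x2][y2] > fh[x2][y2]) om[x2][y2] = y2;
            if (lam[x2][y2] > 0) L1[x2] = 1;                             // line 8
            if (om[x2][y2] > 0) O1[y2] = 1;
            if (O1[y2])                               // inner cycle over the lengths
                for (int x1 = 1; x1 <= std::min(lam[x2][y2], L - x2); ++x1) {
                    long sum = f[x1][y2] + f[x2][y2];
                    if (lam[x1][y2] == x1 && sum >= fv[x1 + x2][y2]) {
                        fv[x1 + x2][y2] = sum; lam[x1 + x2][y2] = x1;
                    }
                }
            if (L1[x2])                               // inner cycle over the widths
                for (int y1 = 1; y1 <= std::min(om[x2][y2], W - y2); ++y1) {
                    long sum = f[x2][y1] + f[x2][y2];
                    if (om[x2][y1] == y1 && sum >= fh[x2][y1 + y2]) {
                        fh[x2][y1 + y2] = sum; om[x2][y1 + y2] = y1;
                    }
                }
        }
}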

3. Modifications and enhancements of the RS_kf algorithm

As said above, the main limit of the U2DCP solving approaches, and in particular of the ones based on the knapsack function, is the high memory required to allocate large size matrices: f, f0′, fv, fh, λ, ω. In the following subsections we propose five modifications of RS_kf with the twofold aim of reducing the memory allocation and speeding up the algorithm:

(i) integration of the raster points;
(ii) two-step pre-processing;
(iii) modification of the one-piece term;
(iv) λ and ω reduction;
(v) inner and outer cycles reduction.


It is important to note that the last three modifications can be realized independently of the first two, but we present them following the order used in the complete version of our algorithm.

3.1. Integration of the raster points

The discretization points have been introduced independently by Herz [12] and Christofides and Whitlock [3]. With reference to the horizontal (vertical) coordinates of the vertical (horizontal) cuts, the discretization points are defined as the subset of {0, 1, 2, …, L} ({0, 1, 2, …, W}) containing the values which are a non-negative integer combination of the piece lengths (widths):

Dx = {L} ∪ {d = ∑i=1..n mi·li : d ≤ L, mi ≥ 0, mi integer}   (4a)

Dy = {W} ∪ {d = ∑i=1..n mi·wi : d ≤ W, mi ≥ 0, mi integer}   (4b)
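The sets (4) can be generated by a classic reachability dynamic program. The following minimal self-contained C++ sketch computes Dx (the computation for Dy is symmetric); it is an illustration, not the specific procedure of Cintra et al. [4].

#include <vector>

std::vector<int> discretizationPoints(int L, const std::vector<int>& lengths) {
    std::vector<bool> reach(L + 1, false);
    reach[0] = true;                          // empty combination
    for (int d = 0; d <= L; ++d)
        if (reach[d])
            for (int li : lengths)
                if (d + li <= L) reach[d + li] = true;
    std::vector<int> Dx;
    for (int d = 0; d <= L; ++d)
        if (reach[d]) Dx.push_back(d);        // ascending order
    if (Dx.back() != L) Dx.push_back(L);      // (4a) always includes L
    return Dx;
}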

The raster points have been introduced by Scheithauer and Terno [17] and constitute a reduction of the discretization points. Adopting the notations

⟨s⟩v = max{d ∈ Dx : d ≤ s};   ⟨s⟩h = max{d ∈ Dy : d ≤ s}   (5)

the two sets of raster points are defined as Rx = {⟨L−d⟩v : d ∈ Dx} and Ry = {⟨W−d⟩h : d ∈ Dy}.

As done by Birgin et al. [2] in their dynamic programming algorithm, in this work we integrate the RS_kf algorithm with the raster points. This yields smaller matrices and consequently reduces the memory requirements and the computational effort. The integration is realized using two pairs of vectors, (Nx, IRx) and (Ny, IRy), for the vertical and the horizontal cuts respectively:

Nx(i), i = 1, …, |Rx|;   Ny(i), i = 1, …, |Ry|   (6)

IRx(d) = min{i : 1 ≤ i ≤ |Rx|, Nx(i) ≥ d};   IRy(d) = min{i : 1 ≤ i ≤ |Ry|, Ny(i) ≥ d}   (7)

Nx and Ny contain the raster points of the associated axis in increasing order. IRx and IRy contain an index for each coordinate d of the axis, from 1 to the dimension of the sheet (L or W): this index is the position, in the first vector, of the minimum raster point equal to or larger than d.
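One possible realization of Rx and of the pair (Nx, IRx) of (6)-(7), given Dx in increasing order, is sketched below (0-based indices and illustrative names; a sketch under these assumptions, not the authors' data structure).

#include <algorithm>
#include <set>
#include <vector>

struct RasterIndex {
    std::vector<int> Nx;   // raster points in increasing order
    std::vector<int> IRx;  // IRx[d] = index of the min raster point >= d
};

RasterIndex buildRaster(int L, const std::vector<int>& Dx) {
    std::set<int> rx;
    for (int d : Dx) {
        // <L-d>_v = largest discretization point not exceeding L-d
        auto it = std::upper_bound(Dx.begin(), Dx.end(), L - d);
        rx.insert(*(--it));
    }
    RasterIndex r;
    r.Nx.assign(rx.begin(), rx.end());
    r.IRx.assign(L + 1, -1);
    for (int d = 0, i = 0; d <= L; ++d) {
        while (i < (int)r.Nx.size() && r.Nx[i] < d) ++i;
        r.IRx[d] = (i < (int)r.Nx.size()) ? i : -1;   // L is always a raster point
    }
    return r;
}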

3.2. Two-step pre-processing

The pre-processing strategies proposed in the literature consist in removing a piece type tr if a profit higher than πr can be obtained in a rectangle of dimensions (lr, wr) with a homogeneous solution, i.e. using just one piece type ti not larger than tr. In particular, Kang and Yoon [13] exploit the raster points and remove a piece type tr if

πr ≤ πi · ⌊elr/li⌋ · ⌊ewr/wi⌋   (8)

where, using the notations (5), elr = L − ⟨L−lr⟩v and ewr = W − ⟨W−wr⟩h. We refer to the rectangle (elr, ewr) as the extended rectangle for the type tr. Russo et al. [16] remove a piece type if

πr ≤ πi · α · β,  with α = ⌊lr/li⌋, β = ⌊wr/wi⌋   (9)
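As an illustration, a scan based on test (9) can be sketched as follows (hypothetical names; a sketch, not the authors' code).

#include <vector>

struct Piece { int l, w; long p; };

// returns the indices of the piece types surviving condition (9)
std::vector<int> preprocess(const std::vector<Piece>& t) {
    std::vector<int> keep;
    for (size_t r = 0; r < t.size(); ++r) {
        bool removed = false;
        for (size_t i = 0; i < t.size() && !removed; ++i) {
            if (i == r) continue;
            long a = t[r].l / t[i].l, b = t[r].w / t[i].w;   // α and β of (9)
            removed = (t[r].p <= t[i].p * a * b);            // a*b = 0 if ti is larger
        }
        if (!removed) keep.push_back((int)r);
    }
    return keep;
}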

This condition is a modified version of the one proposed by Birgin et al. [2], where the parameters α and β are both set to 1. Condition (9) is less effective than condition (8), since it does not consider the raster points and the extended rectangles; for the same reason it is faster.

Birgin et al. [2] also suggested a more general pre-processing that considers heterogeneous solutions, i.e. solutions where all smaller piece types can be used to fill the rectangle under investigation. However they did not implement this idea, with the motivation that it does not determine a reduction of the discretization points in the unweighted variant of the U2DCP, and it requires a significant computational effort in the weighted variant.

Starting from this idea we propose a two-step mixed pre-processing (2SM-PP) which exploits the strong points of both strategies. In the first step we verify condition (9) in order to obtain a first straight reduction of the piece type set. In the second step we iteratively look for further reductions of the remaining piece type set exploiting the extended rectangles. In this way a reduction of the raster points can be achieved also in the unweighted case.

In particular, in the second step the 2SM-PP examines the piece types by increasing extended rectangle area. For each piece type tr it verifies whether another piece with equivalent extended area and bigger profit exists; otherwise it tries to fill tr using smaller piece types not yet removed. It classifies the piece types as small and large. A piece type tr is large if the following condition holds: elr · ewr ≥ A ≡ (L·W)/n. For these piece types we look for homogeneous solutions, applying (8). The other piece types are small, and for them we look for heterogeneous solutions, using the RS_kf algorithm on the extended rectangle (elr, ewr). Since in the worst case the number of small piece types is n, the choice of the threshold value A is aimed at limiting the size of the extended areas for which heterogeneous solutions have to be determined.

At each iteration we compute the raster points from scratch. This can determine an increase of the extended rectangle for some residual piece types; hence we verify the removal of these types only and avoid doing it for the others. The second step ends when the number of piece types removed in the last iteration is lower than γ·n′, where γ = 0.1 (constant) and n′ is the number of residual types before the last iteration.

In the above classification an exception is represented by the small piece types with a non-small dimension with respect to the sheet dimension L or W. With reference to L, this occurs when elr > L/√n (large length) and hence ewr < W/√n (small width). In this case the chosen strategy depends on the width wr. If wr is a discretization point obtainable through the widths of the smaller residual piece types, then the algorithm looks for homogeneous solutions: the elimination of the piece would not provide a reduction of the raster points on the y-axis, the subproblem could require a significant effort, and the large length has a low impact on the raster point set on the x-axis. If instead wr is not such a discretization point, a heterogeneous solution is searched. The 2SM-PP operates similarly with reference to W, when ewr > W/√n and elr < L/√n, according to the length lr.

The results of the 2SM-PP, in terms of removed pieces and computation time with respect to other pre-processing strategies, will be shown in Section 5.

3.3. Modification of the one-piece term

Given a pair (l, w), the RS_kf algorithm sets the final value f(l, w) comparing the five candidate values of relation (1). They are elements of four matrices (f0′, fv, fh, f) requiring a large amount of memory.
For this reason, in order to avoid the allocation of the first matrix, we propose the following new definition of the one-piece term, to be used in place of the f0′ given in (2a):

f0(l, w) = max{0, πi : li = l, wi = w}   (10)

Hence we consider just the case where a single piece of dimensions l and w is extracted from the rectangle with no trim loss. We can do this since the extraction of a smaller piece is taken into account in relation (1) by f(l−1, w) or f(l, w−1).
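Since f0 of (10) is positive for at most n pairs (l, w), it needs no matrix at all. A constant-time lookup alternative is sketched below with a hash map; this is one possible realization (illustrative names, hypothetical key packing), not the supporting lists actually used in the algorithm and described next.

#include <algorithm>
#include <unordered_map>
#include <vector>

struct Piece { int l, w; long p; };

std::unordered_map<long long, long> buildF0(const std::vector<Piece>& t) {
    std::unordered_map<long long, long> f0;
    for (const auto& pc : t) {
        long long key = (long long)pc.l << 32 | (unsigned)pc.w;
        f0[key] = std::max(f0[key], pc.p);   // exact-dimension pieces only
    }
    return f0;
}
// usage: auto it = f0.find((long long)l << 32 | (unsigned)w);
//        long value = (it == f0.end()) ? 0 : it->second;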


In order to support definition (10) we use the list of increasing piece widths

Z = (w1, w2, …, wq)   (11)

where q ≤ n and w1 < w2 < … < wq. Moreover, for each width wj in Z, we define the list of the corresponding increasing piece lengths, denoting by oj its size:

Uj = (lj,1, lj,2, …, lj,oj)   (12)
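A compact way to build Z, the lists Uj and the type-index lists Tj introduced just below is sketched here (C++17, illustrative names; a sketch, not the authors' code).

#include <map>
#include <vector>

struct Piece { int l, w; long p; };

void buildLists(const std::vector<Piece>& t,
                std::vector<int>& Z,
                std::vector<std::vector<int>>& U,
                std::vector<std::vector<int>>& T) {
    std::map<int, std::map<int, int>> byWidth;   // width -> (length -> type index)
    for (int i = 0; i < (int)t.size(); ++i)
        byWidth[t[i].w][t[i].l] = i;             // unique after the pre-processing
    for (const auto& [w, lens] : byWidth) {      // widths in increasing order
        Z.push_back(w);
        U.emplace_back();
        T.emplace_back();
        for (const auto& [l, idx] : lens) {      // lengths in increasing order
            U.back().push_back(l);
            T.back().push_back(idx);
        }
    }
}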

For each pair (j, k), with j ∈ {1, …, q} and k ∈ {1, …, oj}, the pair (wj, lj,k) corresponds to the dimensions of a piece type, whose index is denoted by pj,k. Hence we associate a type index list Tj = {pj,1, …, pj,oj} with each list Uj. Each pj,k is uniquely determined because of the pre-processing. The implementation of this modification in the algorithm will be detailed in Section 4.1. Here we just underline that its usage has a fundamental role in the performance of the outer cycles reduction, as will be explained in Section 3.5.

3.4. Reduction of λ and ω values

In Section 2 we recalled the λ and ω support functions and we highlighted that their definitions contain the conditions Cv,1 and Ch,1. These conditions have two opposite effects on the computational effort. Indeed, with reference to λ, Cv,1 makes it possible to discard a sum f(x, w) + f(l−x, w) when ω(x, w) = ω(l−x, w) = 0. This can determine a computed value fv(l, w) lower than the value fv′(l, w) defined in (2b), because the set of performed sums is reduced. As a consequence, this reduction can determine an increase of λ(l, w) with respect to the case when Cv,1 is not used and, in turn, a greater number of sums involving f(l, w) allowed by Cv,2 and Cv,3. Indeed all sums accepted by Cv,2 and Cv,3 remain accepted also after an increase of λ: on the one hand Cv,2 requires λ(x1, y2) = x1, and such a value of λ is maximum and cannot increase; on the other hand Cv,3 requires x1 ≤ λ(x2, y2), and this condition remains satisfied if λ increases. The reduction and the increase of the number of allowed sums have opposite effects on the computational effort. In order to maintain the first effect and avoid the second one, we observe that a sum f(x, w) + f(l−x, w) discarded by Cv,1 is equivalent to the sum f(x, w−1) + f(l−x, w−1), which is taken into account in the computation of fv(l, w−1). Hence, our idea is to compare fv(l, w) with fv(l, w−1) and, if the second is greater, to replicate the values of fv and λ from (l, w−1) to (l, w). A similar discussion applies to Ch,1, ω and fh, with (l−1, w) instead of (l, w−1). In the following we show that, by performing this “replication”, we achieve the minimization of the λ and ω values.

First of all we redefine the support functions by removing the conditions Cv,1 and Ch,1 from definitions (3), and we refer to them as λ0 and ω0 respectively:

λ0(l, w) = 0 if f(l, w) = f(l−1, w); otherwise
λ0(l, w) = min{l, x1 : 0 < x1 ≤ l/2, f(l, w) = f(x1, w) + f(l−x1, w)}   (13a)

ω0(l, w) = 0 if f(l, w) = f(l, w−1); otherwise
ω0(l, w) = min{w, y1 : 0 < y1 ≤ w/2, f(l, w) = f(l, y1) + f(l, w−y1)}   (13b)

By comparing (13) with (3), it is clear that λ0 ≤ λ′ and ω0 ≤ ω′. Indeed λ′ (ω′ resp.) can be greater than λ0 (ω0 resp.) only when they are both positive: if λ0 = 0 (ω0 = 0), then λ′ (ω′ resp.) is necessarily zero, since in this case their values are determined just by f(l, w) and f(l−1, w) (f(l, w−1) resp.), regardless of the used conditions. We can also affirm that λ0 and ω0 are the minimum support functions since, without the Cv,1 and Ch,1 conditions, all sums are taken into account. Moreover it is important to recall that in Russo et al. [16] the following has been proven:

Property 1. In RS_kf the second and third conditions do not affect the λ and ω values actually computed, regardless of the usage of the first condition.

As a consequence, when Cv,1 and Ch,1 are used, the RS_kf algorithm obtains values of λ and ω equal to those of λ′ and ω′; otherwise it returns values equal to λ0 and ω0. In particular, with reference to λ, when 0 < λ(l, w) < l, the coordinates x1 = λ(l, w) and x2 = l − λ(l, w) satisfy the three conditions Cv,1, Cv,2 and Cv,3 in the first case (λ = λ′), and Cv,2 and Cv,3 in the second case (λ = λ0); hence the related sum is performed in both cases. On this basis, the goal of the replication is to obtain λ = λ0 and ω = ω0 without dropping the first condition. In order to prove how this result is achieved, some preliminary considerations are needed. With reference to λ0, it is easy to verify the following property, valid also for ω0:

Property 2. Let l, u1, …, uk be length values such that l = u1 + … + uk. If λ0(uj, w) = 0 for at least one index j ∈ {1, …, k}, then f(u1, w) + … + f(uk, w) ≤ f(l−1, w).

Let us introduce two further functions, fv,0 and fh,0, different from the fv′ and fh′ given in (2):

fv,0(l, w) = max{0, f(x1, w) + f(x2, w) : 0 < x1 ≤ x2, x1 + x2 = l, λ0(x1, w) > 0, λ0(x2, w) > 0}   (14a)

fh,0(l, w) = max{0, f(l, y1) + f(l, y2) : 0 < y1 ≤ y2, y1 + y2 = w, ω0(l, y1) > 0, ω0(l, y2) > 0}   (14b)

With respect to the definitions (2b) and (2c), (14a) and (14b) avoid sums involving terms f(x, y) for which λ0(x, y) = 0 or ω0(x, y) = 0, whose results are not greater than f(l−1, w) or f(l, w−1) respectively, according to Property 2. Hence the values of fv,0(l, w) and fh,0(l, w) do not depend on f(l−1, w) and f(l, w−1) respectively, and the following relations hold:

fv′(l, w) = max{fv,0(l, w), f(l−1, w)};   fh′(l, w) = max{fh,0(l, w), f(l, w−1)}   (15)

f(l, w) = max{f0(l, w), fv,0(l, w), fh,0(l, w), f(l−1, w), f(l, w−1)}   (16)

Taking into account (15) and the term f(l−1, w), it is easy to verify that relation (16) remains valid if we replace fv,0(l, w) with fv(l, w), whenever the latter is contained in [fv,0(l, w), fv′(l, w)]. Moreover, according to (13a), Property 2, (14a) and (15), the following equivalence holds:

0 < λ0(l, w) < l  ⟺  f(l, w) = fv′(l, w) = fv,0(l, w) > f(l−1, w)   (17)

Hence, in the case λ0(l, w) ∉ {0, l}, the interval expressed above is necessarily a singleton, because the bounds fv,0(l, w) and fv′(l, w) coincide. For the opposite case, i.e. λ0(l, w) ∈ {0, l}, we introduce the following property:

Property 3. If λ0(l, w) ∈ {0, l}, then relation (16) with the replacement fv,0 → fv remains valid provided that fv(l, w) ≤ fv′(l, w), regardless of fv,0(l, w).

Proof. Based on (13a), (15) and (17), if λ0(l, w) = 0 then f(l, w) is equal to f(l−1, w), which is a candidate in (16) regardless of fv; whereas if λ0(l, w) = l then f(l, w) > fv′(l, w) ≥ f(l−1, w) and the maximum candidate is one among f0(l, w), fh,0(l, w) and f(l, w−1).


The bound on fv′(l, w) ensures that the value f(l, w) does not exceed the maximum candidate. A similar discussion can be made to prove that fh,0 can be replaced by fh in relation (16). Finally, from the definitions (1) and (2) we can easily deduce the following:

Property 4. f, fv′ and fh′ are monotone.

According to the previous definitions and properties, and referring for the sake of clarity to the RS_kf pseudocode, we can assert the following:

Proposition 1. Let fv*(l, w) be the value of fv(l, w) before line 5 of RS_kf. The replication fv(l, w) = fv(l, w−1) and λ(l, w) = λ(l, w−1), performed in case fv*(l, w) < fv(l, w−1), allows to obtain λ(l, w) = λ0(l, w) for each pair (l, w); at the same time the value assigned to fv(l, w) is equal to fv,0(l, w) (and fv′(l, w)) when 0 < λ0(l, w) < l, and not greater than fv′(l, w) when λ0(l, w) ∈ {0, l}.

For the sake of brevity, the proof is reported in Appendix A. Here we just highlight that the setting of f(l, w) can be performed either before or after the replication.

3.5. Inner and outer cycles reduction

In this section we propose several computational improvements which reduce the number of operations in the inner and outer cycles of the RS_kf algorithm. A single discussion covers the two inner cycles (over the lengths and over the widths), since the reduction is performed in the same way. Concerning the outer cycles, two different discussions are required; we refer to the cycle over the widths as the first outer cycle and to the cycle over the lengths as the second. For the sake of clarity, also in this case we refer to the RS_kf pseudocode when explaining the proposed strategies and we do not consider the raster points, whose integration can be easily made (as will be discussed in Section 4.1).

3.5.1. Inner cycles reduction

In the inner cycle over the lengths, the RS_kf algorithm tries as horizontal coordinates of vertical cuts all the values x1 ∈ {1, …, x1,max}, x1,max = min{λ(x2, y2), L−x2} (line 9), and then checks whether the condition Cv,2 is verified (line 10), i.e. whether λ(x1, y2) = x1. The same is done in the inner cycle over the widths (lines 13-14), with y1 ∈ {1, …, y1,max}, y1,max = min{ω(x2, y2), W−y2}, and ω(x2, y1) = y1 (Ch,2). The modification described in the following reduces the set of cut coordinates, keeping just the ones satisfying Cv,2 and Ch,2.

For each value w = 1, …, W and l = 1, …, L we define a vector, Λ2(w) and Ω2(l) respectively. Each vector Λ2(w) contains, in increasing order, the length values l ≤ L such that λ(l, w) = l; each vector Ω2(l) contains the widths w ≤ W such that ω(l, w) = w. In this way, in the inner cycle over the lengths with y2 = w, we avoid scanning all the values x1 ∈ {1, …, x1,max}, by considering just the ones belonging to Λ2(w) and not greater than x1,max. The same is done in the inner cycle over the widths with x2 = l by exploiting Ω2(l). Note that, since the outer cycles are performed by increasing coordinates (first on y2 and then on x2), the introduced vectors can be filled by appending (push back) the current value of x2 to Λ2(y2) when λ(x2, y2) = x2, and the current value of y2 to Ω2(x2) when ω(x2, y2) = y2.

3.5.2. First outer cycle reduction

Concerning the outer cycle over the widths (first outer cycle), we can avoid taking into account a coordinate value y2 = w for which no ω(x2, w) is positive. Indeed, if ω(x2, w) = 0 for all x2, then no sum is performed in the inner cycle over the widths because of the condition Ch,3 (line 13). Moreover no sum is performed in the inner cycle over the lengths because of the condition Cv,1 (line 9), since in this case Ω1(w) remains false. Hence, to skip a coordinate value y2 = w, we should know the ω(x2, w) values for all x2, but their evaluation can be accomplished only if the cycle with y2 = w is performed. For this reason we show sufficient conditions which imply ω(x2, w) = 0. In particular we prove the following:

Proposition 2. f0(x, w) = fh,0(x, w) = 0 ∀ x ∈ {1, …, x2} ⟹ ω0(x, w) = 0 ∀ x ∈ {1, …, x2}.

Proof. From (13b) the condition ω0(x, w) = 0 is equivalent to f(x, w) = f(x, w−1), hence we proceed by demonstrating that f(x, w−1) is the maximum candidate for f(l, w), with l = x, in relation (16). By hypothesis the two candidates f0(x, w) and fh,0(x, w) are both equal to zero. The other three candidates are fv,0(x, w), f(x−1, w) and f(x, w−1). We proceed by induction. If x = 1, then fv,0(x, w) and f(x−1, w) are both equal to zero and hence the only possible positive candidate is f(x, w−1). In the case 1 < x ≤ x2, by induction f(δ, w) = f(δ, w−1) for each δ ≤ x−1, hence the candidate f(x−1, w) is equal to f(x−1, w−1), which is not greater than f(x, w−1). The same occurs for fv,0(x, w) because, if positive, it is obtained from sums f(u1, w) + f(u2, w) with u1 + u2 = x but, by induction, both terms are equal to the values associated with a smaller width, f(u1, w−1) and f(u2, w−1) respectively. Hence the sum cannot be greater than f(x, w−1).

Proposition 2 is valid also for fh and ω, for which the proof applies using code lines 5 and 7 of Algorithm 1 with y2 = w, and it holds regardless of the replication. We exploit Proposition 2 with x2 = L by introducing a Boolean vector Ω3(w), w = 1, …, W. Each Ω3(w) is true if at least one between f0(x, w) and fh(x, w), x ≤ L, is positive, and it is used in the outer cycle step with y2 = w: for each value y2 the nested code (i.e. the second outer cycle) is not performed if Ω3(y2) = false. Finally, in the computation of f and ω at lines 5 and 7, we have to replace y2−1 with y2p, i.e. the previous maximum value of y2 for which the cycle has been proficiently performed (Ω1(y2p) = true). In particular, note that Ω3(y2) = true does not imply Ω1(y2) = true (the opposite holds), and the usage of Ω1 for the choice of y2p reduces the memory requirements, as will be discussed in Section 4.2.

To conclude, it is important to note that Proposition 2 could also be verified using the original definition f0′ of the one-piece term. However f0′(l, w) is positive for most of the pairs (l, w), whereas f0(l, w) is positive for no more than n pairs: as explained in Section 3.3, f0 is positive just for the pairs (l, w) corresponding to the dimensions of a piece type. Hence the modification of the one-piece term definition significantly affects the effectiveness of the outer cycles reduction.

3.5.3. Second outer cycle reduction

In order to speed up the outer cycle over the lengths (second outer cycle) we can use the same strategy presented in the previous section, with the introduction of another Boolean vector, referred to as Λ3(l), l = 1, …, L, and of a variable x2p to be used in place of x2−1 in the computation of f and λ. For each width w, when y2 = w in the first outer cycle, each Λ3(l) is true if one between f0(l, y) and fv(l, y) is positive for at least one y ≤ y2. Indeed, the condition Λ3(l) = true is necessary to have a positive value of one or more λ(l, y), according to the following:

Proposition 3. f0(l, y) = fv,0(l, y) = 0 ∀ y ∈ {1, …, y2} ⟹ λ0(l, y) = 0 ∀ y ∈ {1, …, y2}.


The proof is equivalent to the one of Proposition 2, and also in this case fv,0 and λ0 can be replaced by fv and λ respectively, regardless of the replication. By exploiting Proposition 3, for each value x2 the nested code is not performed if Λ3(x2) = false. Moreover, denoting the set {1, …, y2} by Y2 and a generic subset of Y2 by S, a more general proposition can be easily derived:

Proposition 4. f0(l, y) = fv(l, y) = 0 ∀ y ∈ Y2−S and λ(l, y) = 0 ∀ y ∈ S ⟹ λ(l, y) = 0 ∀ y ∈ Y2.

The same proof by induction presented for Proposition 2 can be repeated to show that λ(l, y) is zero for y ∈ Y2−S. Proposition 4 is useful in the following special case. When y2 = w, x2 = l and Λ3(l) = true, it is possible to have λ(l, w) = 0 with Λ1(l) still false. This means that no λ(l, y) with y ≤ w is positive. Moreover no sum with a vertical cut could determine a positive value of fv(l, y) with y > w, since the outer cycle over the widths is performed by increasing values of y2. In this case we can break the operations and go to the successive step of the outer cycle over the lengths (x2 = l+1), as in the case Λ3(l) = false, without updating x2p. Finally, based on Proposition 4 with y2 > w and S = {1, …, w}, Λ3(l) is switched back to false. It will return true if, in a successive step of the outer cycles with y2 > w, we have a positive value of f0(l, y2) or fv(l, y2). This does not apply to the first outer cycle, where setting Ω3(w) back to false would have no effect.

The last consideration concerns the situation when Λ3(l) = true and Λ1(l) changes from false to true. In this case, for all pairs (l, y) with y < y2 (y2 = w), the second outer cycle step with x2 = l has not been performed, for two reasons: in some cases Λ3(l) was false, and in the other cases, even if Λ3(l) was true, we had λ(l, y) = 0 and Λ1(l) still false, as explained above. Hence all the ω(l, y) values, and several λ(l, y) and f(l, y) values, are unset for y < w. To overcome this occurrence we have to set the λ(l, y), ω(l, y) and f(l, y) values, y < w, to 0, ω(x2p, y) and f(x2p, y) respectively. Moreover all the y values equal to the corresponding ω(l, y) have to be added to the vector Ω2(l); to this aim it is sufficient to replicate the y values currently contained in Ω2(x2p), so adding them to Ω2(l). Obviously they are not greater than w. It is easy to prove that ω(l, w) will be equal to w, so y2 (y2 = w) is added to Ω2(l) in any case. Note that, since the λ(l, y) values are zero, no vector Λ2(y) is modified. These updates represent a replication, referred to as “recovery replication”, between the columns x2p and x2 = l. We replicate only the values related to the y coordinates for which Ω1(y) = true.
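A minimal sketch of the inner-cycle reduction of Section 3.5.1 follows, with Λ2 filled by push back and scanned in place of the full range {1, …, x1,max} (hypothetical values, illustrative names; a fragment rather than the full algorithm).

#include <algorithm>
#include <vector>

int main() {
    const int L = 10, W = 8;                      // hypothetical dimensions
    std::vector<std::vector<int>> Lam2(W + 1);    // Λ2: one list per width
    int x2 = 4, y2 = 3;                           // hypothetical outer-cycle step
    int lambda = 4;                               // hypothetical λ(x2, y2)
    if (lambda == x2) Lam2[y2].push_back(x2);     // increasing order is preserved
    // inner cycle over the lengths: only coordinates satisfying Cv,2 are scanned
    const int x1max = std::min(lambda, L - x2);
    for (int x1 : Lam2[y2]) {
        if (x1 > x1max) break;                    // the list is increasing
        // ... combine f(x1, y2) with f(x2, y2) as in lines 37-38 of Algorithm 2 ...
    }
    return 0;
}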

4. The revised RS_kf algorithm

In this section we present the revised RS_kf algorithm, referred to as RS_kf2, which integrates all the improvements proposed in Section 3. We explain the pseudocode without taking into account the raster points and then briefly discuss the details of their integration. The section concludes with further memory reduction refinements.

4.1. The pseudocode

Algorithm 2 contains the pseudocode of RS_kf2, with comments on the code lines that are new with respect to Algorithm 1. We describe the main implementation issues following the sequence of the modifications used in Section 3. A first note concerns x1,max and y1,max, which are computed one time at lines 34 and 43, so reducing the number of operations; the same holds for the difference W−y2 at line 12 (y1,bound).


Concerning f0, we compute the value just for the pair of coordinates (x2, y2) under investigation (line 15). This value is equal to πi if y2 corresponds to the current wj and x2 corresponds to the current lj,k, where i = pj,k is the k-th element of Tj, i.e. the index of the piece type with dimensions (x2, y2) = (lj,k, wj); otherwise f0(x2, y2) = 0. Note that at line 10 all the y-coordinate values wj of the list Z, defined in (11), are taken into account, since Ω3(wj) = true (line 5) satisfies the check at line 8. The same occurs, when y2 = wj at line 15, for the x-coordinate values lj,k of the list Uj, defined in (12), according to the settings at line 11. This strategy avoids the allocation of the matrix f0.

The replications of the λ and fv values (λ reduction) at line 17, and of the ω and fh values (ω reduction) at line 28, are performed after the evaluation of f(x2, y2), since they do not affect this value.

In the following we focus on the pairs Λ1 and Ω1, Λ2 and Ω2, Λ3 and Ω3. The update of Λ1(x2) (line 25) is performed before the ω-settings (lines 28-32) since, in this way, if Λ1(x2) is false then we jump directly to the successive x2 value (line 27) without performing further computation. If instead Ω1(y2) is still false after line 32 (hence ω(x2, y2) = 0), then both inner cycles are rapidly skipped, being Ω1(y2) = false (line 33) and y1,max = 0 (line 43).

Concerning Λ2, we associate a logic size vector, denoted by v2, with it: each v2(w) is the number of coordinates added to the corresponding vector Λ2(w). The code lines related to their usage and to the reduction of the inner cycle over the lengths are 4, 20 and 35-36; in particular, lines 35 and 36 substitute the cycle on x1 between 1 and x1,max of the RS_kf algorithm. In the same way, a vector h2 is associated with Ω2, and lines 4, 23-24 (recovery replication), 31 and 44-45 are related to their usage and to the reduction of the inner cycle over the widths. About the recovery replication, we underline the decrease of h2(x2) at line 23, needed in order to drop y2, which is then surely added at line 31. The lines related to the vector Λ3 and the second outer cycle reduction are 5, 9, 11, 14, 22-24 (recovery replication), 27 and 51, besides the update of Λ3 at line 38, whereas the lines for Ω3 and the first outer cycle reduction are 5, 6, 8 and 53, besides the update of Ω3 at line 47.

The raster points are integrated by exploring the elements of the vectors Nx and Ny defined in (6), instead of all the coordinates: the first has to be scanned at line 13 (outer cycle over the lengths) and the second at line 7 (outer cycle over the widths). The vectors IRx and IRy defined in (7) are useful when, at lines 37-38 and 46-47 respectively, we have to derive the index of the raster point associated with the sum x1+x2 or y1+y2. Moreover the lists Z, Uj and Tj, used for the f0 computation, have to be built according to the dimensions of the extended rectangles.

In the remainder of the section we define several parameters, later exploited in the experimental tests, which estimate the effectiveness of the reduction strategies. First, let cΛ and cΩ be the number of x and y coordinates for which the algorithm ends with Λ1(x) = Λ3(x) = true and Ω1(y) = true respectively, and let c′Ω be the number of coordinates y such that Ω3(y) = true; the relation c′Ω ≥ cΩ holds. The required memory depends on cΛ and cΩ, as will be explained in the next section. The computation time increases with the product cΛ · c′Ω, later referred to as c′, but the number of pairs (x2, y2) for which the λ and ω settings are performed can be different from c′. Moreover we define cin as the total number of coordinates to scan in the inner cycles (in case the raster points are not used, cin is the sum of all the values x1,max and y1,max when lines 34 and 43 are performed). Finally, we denote by s the number of computed sums (lines 37 and 46) and by uf the number of fv or fh updates (lines 38 and 47). The inequalities cin ≥ s ≥ uf hold.


Hence, denoting by snr the number of sums that would be computed without the λ and ω reduction, the effectiveness of this strategy is measured by the decrease of s with respect to snr. The effectiveness of the inner cycles reduction grows with the ratio between cin and s. Concerning the outer cycles reduction, its effectiveness increases for values of cΛ and cΩ much lower than |Rx| and |Ry| respectively. The number of updates uf estimates the cost of this reduction strategy, since each update involves one value of the two vectors Λ3 and Ω3. We conclude by remarking that, even if the effect can be negligible in some cases, each reduction strategy also affects the parameters presented as related to the other strategies. For example the λ and ω reduction does not directly affect the positivity of the λ and ω values, but it affects the values in Ω3, and this could provide a decrease of c′Ω (but no effect on cΛ and cΩ).

Input: (L, W), (li, wi, πi) for i = 1, …, n
Output: f(l, w), λ(l, w), ω(l, w) for 0 ≤ l ≤ L, 0 ≤ w ≤ W
1  Compute Z (type widths wj) sorted list, Uj (type lengths) sorted lists, Tj index lists; Set j = 0 //f0 computation
2  Set fv(l, w) = fh(l, w) = 0 for 1 ≤ l ≤ L, 1 ≤ w ≤ W; f(l, 0) = 0 for 1 ≤ l ≤ L, f(0, w) = 0 for 1 ≤ w ≤ W
3  Set Λ1(l) = false for 1 ≤ l ≤ L; Ω1(w) = false for 1 ≤ w ≤ W
4  Set v2(w) = 0 (logic size of Λ2(w)) for 1 ≤ w ≤ W; h2(l) = 0 (logic size of Ω2(l)) for 1 ≤ l ≤ L //inner cycles reduction
5  Set Λ3(l) = false for 1 ≤ l ≤ L; Ω3(w) = (w ∈ Z) for 1 ≤ w ≤ W //outer cycles reduction
6  Set y2p = 0 //first outer cycle reduction
   // << Outer cycles >>
7  For y2 = 1 To W Do
8    If Ω3(y2) = true Then //first outer cycle reduction
9      Set x2p = 0 //second outer cycle reduction
10     Set y0 = false; If y2 = wj+1 Then Set y0 = true; j = j+1; k = 1 //f0 computation
11       For i = 1 To oj ≡ |Uj| Do Set Λ3(lj,i) = true End For End If //second outer cycle reduction
12     Set y1,bound = W − y2 //one-time computation
13     For x2 = 1 To L Do
14       If Λ3(x2) = true Then //second outer cycle reduction
15         Set f0 = 0; If y0 = true and x2 = lj,k Then Set i = pj,k, f0 = πi, k = k+1 End If //f0 computation
16         Set f(x2, y2) = max{f0, fv(x2, y2), fh(x2, y2), f(x2p, y2), f(x2, y2p)}
           // << λ-settings >>
17         If fv(x2, y2p) > fv(x2, y2) Then Set fv(x2, y2) = fv(x2, y2p), λ(x2, y2) = λ(x2, y2p) End If //replication (λ reduction)
18         If f(x2, y2) = f(x2p, y2) Then Set λ(x2, y2) = 0
19         Else-if f(x2, y2) > fv(x2, y2) Then Set λ(x2, y2) = x2;
20           v2(y2) = v2(y2)+1, Λ2(y2)[v2(y2)] = x2 End If //inner length cycle reduction
21         If λ(x2, y2) > 0 and Λ1(x2) = false Then
             // << recovery replication (second outer cycle reduction) >>
22           Foreach y (< y2) with Ω1(y) = true Do Set f(x2, y) = f(x2p, y); λ(x2, y) = 0; ω(x2, y) = ω(x2p, y) End For
23           Set h2(x2) = h2(x2p); If h2(x2p) > 0 and Ω2(x2p)[h2(x2p)] = y2 Then Set h2(x2) = h2(x2)−1 End If //inner width cycle reduction
24           For i = 1 To h2(x2) Do Set Ω2(x2)[i] = Ω2(x2p)[i] End For
25           Set Λ1(x2) = true
26         End If
27         If Λ1(x2) = false Then Set Λ3(x2) = false; Continue For x2 cycle End If //second outer cycle reduction
           // << ω-settings >>
28         If fh(x2p, y2) > fh(x2, y2) Then Set fh(x2, y2) = fh(x2p, y2); ω(x2, y2) = ω(x2p, y2) End If //replication (ω reduction)
29         If f(x2, y2) = f(x2, y2p) Then Set ω(x2, y2) = 0
30         Else-if f(x2, y2) > fh(x2, y2) Then Set ω(x2, y2) = y2;
31           h2(x2) = h2(x2)+1; Ω2(x2)[h2(x2)] = y2 End If //inner width cycle reduction
32         If ω(x2, y2) > 0 Then Set Ω1(y2) = true End If
           // << Inner cycles >>
33         If Ω1(y2) = true Then
34           Set x1,max = min{λ(x2, y2), L − x2} //one-time computation
35           For i = 1 To v2(y2) Do //inner length cycle reduction
36             If x1 ≡ Λ2(y2)[i] > x1,max Then Break For i cycle End If
37             If sum ≡ f(x1, y2) + f(x2, y2) ≥ fv(x1+x2, y2) Then
38               Set fv(x1+x2, y2) = sum; λ(x1+x2, y2) = x1; Λ3(x1+x2) = true //second outer cycle reduction
39             End If
40           End For
41         End If
42         If Λ1(x2) = true Then
43           Set y1,max = min{ω(x2, y2), y1,bound} //one-time computation
44           For i = 1 To h2(x2) Do //inner width cycle reduction
45             If y1 ≡ Ω2(x2)[i] > y1,max Then Break For i cycle End If
46             If sum ≡ f(x2, y1) + f(x2, y2) ≥ fh(x2, y1+y2) Then
47               Set fh(x2, y1+y2) = sum; ω(x2, y1+y2) = y1; Ω3(y1+y2) = true //first outer cycle reduction
48             End If
49           End For
50         End If
51         Set x2p = x2 //second outer cycle reduction
52       End If
53     End For //for x2
54     If Ω1(y2) = true Then Set y2p = y2 End If //first outer cycle reduction
55   End If
56 End For //for y2

Algorithm 2. RS_kf2.

4.2. Memory reduction

In this section we discuss some computer programming expedients integrated in the RS_kf2 algorithm, which obtain a memory saving of at least 71%. Also in this case, for the sake of simplicity, we discuss them without taking into account the raster points.

The algorithm uses eight matrices: f0, f, λ, fv, ω, fh, Λ2 and Ω2. The allocation of f0 has been avoided as discussed above. For the matrices λ, ω, Λ2 and Ω2, whose elements are coordinate values, we propose to use just two bytes per value, since no instance in the literature or in practical problems involves rectangle dimensions greater than 65,535 (i.e. 2^16 − 1). Moreover, concerning Ω2, taking into account the condition Ch,3 (i.e. y1 ≤ ω(x2, y2)), we can use half the required memory, since each column Ω2(x2) can have maximum size W/2: in the inner cycle over the widths, being y1,max = min{ω(x2, y2), W−y2}, if y2 ≤ W/2 then y1 ≤ ω(x2, y2) ≤ y2 ≤ W/2, whereas if y2 > W/2 then y1 ≤ W−y2 < W/2. A similar discussion holds for Λ2.

In the inner cycle over the lengths we access both the λ and fv matrices with a fixed width value (w = y2) and a variable length value l = x1+x2 (lines 37-38). In the inner cycle over the widths the opposite occurs for the matrices ω and fh (lines 46-47). Consequently we implemented the matrices λ and fv by rows, and in each inner cycle step we access the values fv(l, y2) in the row


corresponding to the current width y2. On the contrary, the matrices ω and fh are implemented by columns. Also the matrices Λ2 and Ω2 are naturally managed by rows and columns respectively. Concerning the matrix f, we access it with fixed width or fixed length in the inner cycle over the lengths and over the widths respectively; this aspect is discussed below.

The management of fv and Λ2 by rows can be exploited to strongly reduce the memory. In particular we note that, for each w, the two rows fv(w) and Λ2(w) are useful only in the first outer cycle step associated with y2 = w. Hence it is useless to allocate these two matrices: we just allocate the memory space for one row of fv and one row of Λ2, used for the current value of y2. This idea cannot be applied to λ, because it is used to backtrack and generate the cutting scheme of the optimal solution. Moreover, note that the algorithm also uses the values fv(x2, y2p) (line 17), hence for fv the allocation of an additional row, associated with y2p, is required.

Because of the elimination of fv, we choose to implement the matrix f by columns and to allocate a shared memory space for f and fh. Indeed, for each value f(x2, y2) computed at line 16, the corresponding value fh(x2, y2) is no longer useful, except at lines 28 and 30 in the same outer cycle step. Therefore we use a single temporary variable fnew, where the value computed for f(x2, y2) is stored, and we perform the assignment f(x2, y2) = fnew after line 30. Moreover, at line 28 the value of fh(x2p, y2) is needed: to this aim we use a further variable fh,p, where we store the value of fh(x2, y2) obtained after line 28, so making it available for the next significant value of x2.

We adopt a further refinement in order to improve the algorithm. For the inner cycle over the lengths we allocate a row vector f2 (with size L/2), associated with the row of Λ2, where we store the list of the f(x, y2) values corresponding to the lengths x contained in Λ2. In such a way we scan consecutive elements of f2, instead of accessing different columns of f. In the inner cycle over the widths this is not necessary, because f is managed by columns.

Summarizing, instead of allocating eight initial matrices, we allocate just 2.25 equivalent matrices: f with 4 bytes per value, λ and ω with 2 bytes per value (half memory each) and half of Ω2 with 2 bytes per value (a quarter memory). This determines a memory saving of 71%. The percentage increases if we exploit the vectors Λ1, Ω1, Λ3 and Ω3 and the related conditions (lines 8, 14, 21, 27). Indeed we allocate each row λ(y2) just after positively verifying the condition at line 8 on Ω3(y2), and the columns f(x2), ω(x2) and Ω2(x2) just when Λ3(x2) = Λ1(x2) = true before line 22. Moreover, if Ω3(y2) = true but Ω1(y2) = false, we do not update y2p, and the row allocated for λ can be saved and reused for the next y2 value such that Ω3(y2) = true. Hence the total required memory is proportional to 1.75 · cΛ · |Ry| + 0.5 · cΩ · |Rx|.
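A sketch of the storage layout choices described above follows (2-byte coordinate entries, λ row-major and ω column-major; illustrative names, not the authors' code).

#include <cstdint>
#include <vector>

struct KfStorage {
    int L, W;
    std::vector<uint16_t> lam;  // λ, row-major: fixed width, varying length
    std::vector<uint16_t> om;   // ω, column-major: fixed length, varying width
    KfStorage(int L_, int W_)
        : L(L_), W(W_),
          lam((size_t)(W + 1) * (L + 1), 0),
          om((size_t)(L + 1) * (W + 1), 0) {}
    // 2 bytes per value suffice for dimensions up to 65,535
    uint16_t& lambda(int l, int w) { return lam[(size_t)w * (L + 1) + l]; }
    uint16_t& omega (int l, int w) { return om [(size_t)l * (W + 1) + w]; }
};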

5. Computational experiments

In this section we report the computational results obtained on large and very large size U2DCP instances from the literature. With reference to Russo et al. [16], we consider the six large weighted and the six large unweighted instances with the highest computation time. The six weighted instances are APT20, APT22, APT24, APT28, APT29 and W4; they belong to the OR library PackLib2 by Fekete and van der Veen [8]. The six unweighted instances are B7 by Cui et al. [5], gcut16, gcut16r, gcut17 and gcut17r by Cintra et al. [4] and U4 from PackLib2. Besides these instances, we tackle eight very large size instances introduced by Hifi [11], four weighted (LW1-LW4) and four unweighted (LU1-LU4). The author presented solutions for a problem variant with additional constraints and allowed 90° piece rotation. Concerning the general U2DCP treated here, Kang and Yoon [13] found the optimum just for LW1, and Birgin et al. [2] provided heuristic solutions for all these instances. In the following we show


that the RS_kf2 algorithm optimally solves all the instances, and that only for LU4 the heuristic solution by Birgin et al. [2] is optimal.

This section is structured in four subsections. In the first subsection we detail the results obtained by the above described two-step mixed pre-processing (2SM-PP) and compare it with pre-processing strategies from the literature. In the second and third subsections we focus on the results of the RS_kf2 algorithm on weighted and unweighted instances respectively. In the last subsection we study the effect of the pre-processing on the performance of our algorithm. The experimentation has been run on an Intel 4-core i5-2410M, 2.30 GHz, 6 GB RAM, Windows 7 Home Premium 64-bit laptop; the code has been developed in C++ and compiled in Dev-C++ 5.3.0.5 with the highest optimization options.

5.1. Pre-processing

In order to test the performance of the 2SM-PP, we compared it with four pre-processing strategies. The first two are the ones used by Russo et al. [16] and by Kang and Yoon [13], denoted by RS1-PP and KY-PP respectively. The first is equivalent to the first step of the 2SM-PP, whereas the second corresponds to one iteration of the second step of the 2SM-PP when all pieces are considered as large, i.e. only homogeneous solutions are searched. The other two strategies are directly derived from the 2SM-PP. The first, referred to as single-step mixed pre-processing (1SM-PP), implements just the second step, with the distinction between small and large piece types. Hence it does not exploit the fast reduction of the first step but, conversely, it has more chances to repeat the second step, despite γ > 0, because it starts with n′ = n; moreover, the extended rectangles can be smaller. The second variant, referred to as two-step exact pre-processing (2SE-PP), implements both steps but considers all piece types as small and ignores γ, i.e. it always searches heterogeneous solutions and cycles until no further piece type is removed.

In Table 1, for each instance we report: the number of piece types n, the number of large piece types nL, the ratio between the maximum and minimum piece areas Amax/Amin, the number of piece types removed by each strategy (columns nOFF) and the related computation time (columns t). The five strategies are presented in increasing order of number of removed piece types, and in bold we highlight the cases where a strategy outperforms the previous one.

We can observe that the two groups of six large size instances are characterized by a high number of large piece types: just the three unweighted instances gcut16r, gcut17r and U4 present a percentage of large piece types slightly lower than 50%. This means that the strategies based on the classification between large and small pieces are less effective, and for this reason they do not determine an increase in the number of removed pieces. For five instances, one weighted (W4) and four unweighted (B7, gcut17, gcut17r, U4), the number of removed pieces is the same for all strategies, and it is zero for B7 and U4. For the other seven large instances, the 2SE-PP removes more piece types than the other strategies, from one on APT24, APT28 and gcut16 to four on APT22, but it spends a higher computation time on all instances except W4, for which the first step leaves just 10 piece types.
Concerning the eight very large size instances, the 2SM-PP increases the number of removed pieces for all the weighted instances (in particular, twenty-six more removals on LW4) and for two unweighted instances, with 25 removals on LU4. The 1SM-PP never increases the number of removed pieces, but in some cases it requires a higher computation time, with particular reference to the weighted instances LW3 and LW4. The 2SE-PP removes more pieces on all the unweighted instances, from one (on LU2) to five (on LU3), but the computation time required by the heterogeneous solutions increases significantly.
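To make the control flow of the five strategies explicit, the following C++ sketch reproduces the description above. It is only an illustration: step1 (the fast reduction) and step2 (one pass of the extended-rectangle test) are hypothetical stubs standing in for the routines of Section 3, and all names and flags are ours, not the paper's.

```cpp
#include <iostream>

// Placeholder for the piece type list handled by the real routines.
struct PieceSet { int activeTypes = 0; };

// Fast first-step reduction (the RS1-PP of Russo et al. [16]); stub here.
int step1(PieceSet&) { return 0; }

// One pass of the second step (extended-rectangle test); stub here.
//   allLarge    - treat every piece type as large, i.e. search homogeneous
//                 solutions only (what KY-PP does);
//   allSmall    - treat every piece type as small (what 2SE-PP does);
//   ignoreGamma - always search heterogeneous solutions, ignoring gamma.
// Returns the number of piece types switched off by this pass.
int step2(PieceSet&, bool /*allLarge*/, bool /*allSmall*/, bool /*ignoreGamma*/)
{ return 0; }

enum class Strategy { RS1_PP, KY_PP, SM1_PP, SM2_PP, SE2_PP };

int preprocess(PieceSet& p, Strategy s) {
    int removed = 0;
    // RS1-PP, 2SM-PP and 2SE-PP all begin with the fast first step.
    if (s != Strategy::KY_PP && s != Strategy::SM1_PP) removed += step1(p);
    if (s == Strategy::RS1_PP) return removed;          // first step only
    if (s == Strategy::KY_PP)                           // one homogeneous pass
        return removed + step2(p, true, false, false);
    const bool exact = (s == Strategy::SE2_PP);         // 2SE-PP: all small,
    int r;                                              // gamma ignored
    do {                                                // cycle until no
        r = step2(p, false, exact, exact);              // further removal
        removed += r;
    } while (r > 0);
    return removed;
}

int main() {
    PieceSet p;
    std::cout << preprocess(p, Strategy::SM2_PP) << " piece types removed\n";
}
```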


Table 1
Analysis of the performance of the pre-processing strategies. nOFF = number of removed piece types; t = computation time (s).

Instance | n | nL | Amax/Amin | RS1-PP nOFF/t | KY-PP nOFF/t | 2SM-PP nOFF/t | 1SM-PP nOFF/t | 2SE-PP nOFF/t

Weighted large instances
APT20 | 45 | 30 | 18.7 | 21/0.0 | 21/0.0 | 21/0.0 | 21/0.015 | 23/0.031
APT22 | 52 | 43 | 32.7 | 28/0.0 | 28/0.0 | 28/0.0 | 28/0.015 | 32/0.015
APT24 | 44 | 29 | 18.3 | 16/0.0 | 16/0.0 | 16/0.0 | 16/0.015 | 17/0.015
APT28 | 54 | 41 | 29.8 | 23/0.0 | 23/0.016 | 23/0.015 | 23/0.015 | 24/0.031
APT29 | 59 | 51 | 17.3 | 23/0.0 | 23/0.0 | 23/0.0 | 23/0.016 | 26/0.031
W4 | 80 | 61 | 42.0 | 70/0.0 | 70/0.015 | 70/0.016 | 70/0.016 | 70/0.016

Unweighted large instances
B7 | 180 | 179 | 11.5 | 0/0.0 | 0/0.015 | 0/0.015 | 0/0.0 | 0/0.109
gcut16 | 62 | 42 | 21.2 | 1/0.0 | 1/0.0 | 1/0.016 | 1/0.0 | 2/0.047
gcut16r | 124 | 56 | 21.2 | 2/0.0 | 2/0.015 | 2/0.031 | 2/0.0 | 4/0.093
gcut17 | 82 | 67 | 21.2 | 1/0.0 | 1/0.0 | 1/0.015 | 1/0.0 | 1/0.047
gcut17r | 164 | 78 | 21.2 | 2/0.0 | 2/0.016 | 2/0.015 | 2/0.0 | 2/0.078
U4 | 40 | 19 | 24.9 | 0/0.0 | 0/0.0 | 0/0.031 | 0/0.0 | 0/0.047

Weighted very large instances
LW1 | 100 | 7 | 4845 | 73/0.0 | 73/0.031 | 79/0.046 | 79/0.093 | 79/0.047
LW2 | 100 | 3 | 5002 | 80/0.0 | 80/0.031 | 85/0.047 | 85/0.078 | 85/0.047
LW3 | 150 | 3 | 5084 | 112/0.0 | 112/0.031 | 121/0.062 | 121/0.297 | 121/0.047
LW4 | 200 | 4 | 35,434 | 142/0.0 | 142/0.047 | 168/0.358 | 168/2.465 | 168/0.375

Unweighted very large instances
LU1 | 100 | 7 | 4845 | 3/0.0 | 3/0.031 | 3/0.140 | 3/0.156 | 5/41.22
LU2 | 100 | 3 | 5002 | 2/0.0 | 2/0.047 | 2/0.218 | 2/0.218 | 3/60.19
LU3 | 150 | 3 | 5084 | 2/0.0 | 2/0.047 | 3/1.326 | 3/1.326 | 8/60.20
LU4 | 200 | 4 | 35,434 | 0/0.0 | 0/0.047 | 25/13.17 | 25/13.17 | 29/38.38


5.2. Weighted instances

In this section we show the results obtained by the RS_kf2 algorithm on the 10 weighted instances. In Table 2 we present the values of the parameters introduced in Section 4.1 to estimate the effectiveness of the reduction strategies proposed in Sections 3.4 and 3.5. We report the values |Rx|, |Ry|, cΛ, cΩ and the ratio between c′ = c′Λ·c′Ω and |Rx|·|Ry| (column c′/R %). Moreover, we present the values cin, s and uf, and the percentage decrease of s with respect to snr (column Δs%). These values refer to the full RS_kf2 algorithm, i.e. when all strategies are used. Concerning the computation time, we consider several variants of RS_kf2, identified by the subset of reduction strategies they use; all variants exploit the raster points, the pre-processing and the memory reduction. We report the percentage impact of each reduction strategy on the computation time in case of single or combined usage (columns single and all, respectively): in the first case the impact is measured against the variant with no reduction strategy, in the second against the variant using only the other two strategies. When the increase/decrease is lower than 20 ms or less than 1%, we use the symbol "=". The actual computation times of the best variants are reported later in Table 3. The decrease of significant coordinates with respect to the raster points is highlighted by c′/R. This decrease is due to the outer cycles reduction; it is relevant for the APT instances (from 23% to 43%) and for W4 (92%), whereas on the LW instances the phenomenon is less appreciable (less than 25% on LW1 and LW2, more than 10% on LW3 and just 1.6% on LW4). This is due to the presence of very small piece types, for which it is less frequent to have λ and ω values equal to zero.
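As a worked reading of these indicators (all numbers from Table 2), for APT20 the inner cycles reduction skips most of the candidate sums, while the bookkeeping of the outer cycles reduction remains cheap; LW4 sits at the other extreme:

```latex
\[
\frac{c_{\mathrm{in}}}{s}\bigg|_{\mathrm{APT20}} = \frac{27.6\times10^{6}}{3.24\times10^{6}} \approx 8.5,
\qquad
\frac{s}{u_f}\bigg|_{\mathrm{APT20}} = \frac{3.24\times10^{6}}{1.40\times10^{6}} \approx 2.3,
\qquad
\frac{s}{u_f}\bigg|_{\mathrm{LW4}} = \frac{101\times10^{9}}{4.32\times10^{9}} \approx 23.
\]
```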


The effectiveness of the inner cycles reduction is highlighted by the ratio between cin and s. This ratio is about 8 for the APT instances, 16 for W4, between 6 and 7 for LW1, LW2 and LW3, and 3 for LW4. Concerning the λ and ω reduction, the effectiveness in terms of Δs% is around 10% for the six large instances and not higher than 5% for the four very large instances. It is interesting to note that the ratio between the number s of sums and the number uf of updates is around 2 for the six large instances, whereas for the four very large instances it significantly increases, between 5 and 6, and almost 25 for LW4. Conversely, the parameter c′/R has a similar trend but in the opposite direction; this means that on the LW instances the outer cycles reduction requires a lower computational cost (parameter uf) but is less effective. Concerning the impact of the reduction strategies on the computation time for the six large instances, we have to consider that it is lower than 0.2 s in most cases. The inner cycles reduction has the greatest impact on the five APT instances, whereas the outer cycles reduction has the highest impact on W4, in accordance with the low values of cΛ and cΩ. Considering the LW instances, the inner cycles reduction is the best strategy, since it provides a decrease between 52% and 72%; the λ and ω reduction, if used alone, also decreases the computation time, between 10% and 14%. The outer cycles reduction, if used alone, increases the computation time for LW1 and LW3 and decreases it for LW2 and LW4; in both cases the percentage variation is not higher than 10%. When all strategies are combined, an increase of the computation time occurs: once the inner cycles reduction has lowered the computation time, the positive effects of the other two reduction strategies are overcome by the computational effort due to the checks. However, the inner cycles reduction also increases the memory requirements, because of the matrix Ω2. We conclude that for the large instances it is useful to combine the three strategies, whereas for the very large instances there is a trade-off between the minimization of the computation time and of the memory requirements: for the first target it is convenient to use just the inner cycles reduction; for the second it is useful to combine only the other two strategies, so achieving benefits on the required memory either for the matrix Ω2 or for the lower value of c′/R.


Table 2
Analysis of the reduction strategies on weighted instances. For each strategy the impact on the computation time (Δt%) is given as single/all usage; "=" denotes a variation below 20 ms or below 1%.

Instance | (|Rx|, |Ry|) | (cΛ, cΩ) | c′/R % | cin | s | Δs% | uf | λ,ω single/all | Inner cycles single/all | Outer cycles single/all

Weighted large instances
APT20 | (1702, 1318) | (1254, 485) | −43% | 27.6×10^6 | 3.24×10^6 | −12% | 1.40×10^6 | =/= | =/−22% | =/=
APT22 | (1593, 1290) | (1085, 728) | −38% | 36.4×10^6 | 4.23×10^6 | −13% | 1.63×10^6 | =/= | −29%/= | =/=
APT24 | (1178, 1883) | (925, 1055) | −23% | 49.4×10^6 | 5.62×10^6 | −12% | 2.38×10^6 | =/= | −43%/−33% | =/=
APT28 | (1154, 1968) | (810, 1456) | −31% | 61.5×10^6 | 7.35×10^6 | −14% | 2.75×10^6 | =/= | −37%/−28% | =/=
APT29 | (1141, 1675) | (663, 972) | −43% | 25.1×10^6 | 2.66×10^6 | −9.3% | 1.26×10^6 | =/= | −22%/−38% | =/−28%
W4 | (3006, 3367) | (643, 194) | −92% | 10.5×10^6 | 0.64×10^6 | −12% | 0.38×10^6 | =/= | =/−44% | −62%/−79%

Weighted very large instances
LW1 | (9608, 11,150) | (7449, 10,647) | −23% | 15.3×10^9 | 2.22×10^9 | −5.3% | 0.45×10^9 | −10%/+1.2% | −52%/−44% | +9.6%/+13%
LW2 | (11,969, 16,285) | (9534, 15,387) | −21% | 38.0×10^9 | 5.52×10^9 | −4.4% | 0.88×10^9 | −14%/+5.7% | −58%/−44% | −1.3%/+13%
LW3 | (18,399, 13,547) | (16,447, 13,064) | −11% | 54.1×10^9 | 8.17×10^9 | −2.6% | 1.36×10^9 | −9.9%/+1.9% | −58%/−48% | +4.2%/+15%
LW4 | (22,458, 17,729) | (22,104, 17,666) | −1.6% | 293×10^9 | 101×10^9 | −1.4% | 4.32×10^9 | −13%/= | −72%/−65% | −10%/+2.8%

On the basis of the results of Table 2 and for the sake of brevity, we focus just on three variants of RS_kf2, whose details are reported in Table 3:

• VRall: all reduction strategies are used;
• VRi: only the inner cycles reduction;
• VRλω,o: only the other two reduction strategies (λ, ω reduction and outer cycles reduction).

In Table 3 we report for each instance: the number of piece types n; the sheet dimensions (L, W); the computation time (in seconds) of RS_kf; the computation time and the allocated memory (in Mbytes) for the three variants of the new RS_kf2 algorithm; the optimal solution. For the first six large instances, optimally solved by all the algorithms, we also report the computation times of the algorithm by Birgin et al. [2] (referred to as BLM) and of the exact algorithm by Kang and Yoon [13] (referred to as KY); the last column gives the time percentage decrease (Δt%) between the best variant of RS_kf2 and RS_kf. For the four very large size instances we compare just with BLM, reporting its computation time, its heuristic solution and the time percentage decrease (Δt%); this choice is due to the fact that RS_kf and KY (with the only exception of LW1, discussed below) are not able to solve them.

Table 3
Results for weighted instances (times in s, memory in Mb).

Weighted large instances
Instance | n | (L, W) | RS_kf t | VRall t | VRi t | VRλω,o t | VRall Mem | VRi Mem | VRλω,o Mem | Optimum | BLM t | KY t | Δt% vs RS_kf
APT20 | 45 | (2840, 2858) | 0.296 | 0.109 | 0.125 | 0.140 | 14 | 21 | 12 | 5,545,818 | 80.18 | 1.031 | −63.2%
APT22 | 52 | (2711, 2110) | 0.250 | 0.110 | 0.110 | 0.125 | 13 | 20 | 12 | 4,145,317 | 50.09 | 0.485 | −56.0%
APT24 | 44 | (2070, 2729) | 0.328 | 0.125 | 0.125 | 0.187 | 16 | 21 | 14 | 3,948,037 | 48.55 | 0.625 | −61.9%
APT28 | 54 | (2020, 2796) | 0.375 | 0.156 | 0.156 | 0.218 | 15 | 21 | 13 | 4,065,011 | 51.89 | 0.078 | −58.4%
APT29 | 59 | (1839, 2829) | 0.218 | 0.078 | 0.109 | 0.125 | 11 | 18 | 10 | 3,652,858 | 43.52 | 0.063 | −64.2%
W4 | 80 | (7500, 7381) | 0.795 | 0.078 | 0.344 | 0.140 | 18 | 94 | 17 | 377,910 | 19.44 | 0.000 | −90.2%

Weighted very large instances
Instance | n | (L, W) | VRall t | VRi t | VRλω,o t | VRall Mem | VRi Mem | VRλω,o Mem | Optimum | BLM t | BLM heuristic | Δt% vs BLM
LW1 | 100 | (20,789, 23,681) | 32.32 | 27.58 | 57.60 | 754 | 925 | 696 | 171,257,398 | 7868 | 171,245,481 | −99.65%
LW2 | 100 | (25,587, 34,563) | 65.61 | 58.53 | 123.3 | 1403 | 1691 | 1255 | 325,051,375 | 6986 | 325,011,119 | −99.16%
LW3 | 150 | (37,587, 27,563) | 100.0 | 84.69 | 191.8 | 1967 | 2163 | 1742 | 430,708,158 | 11,801 | 430,706,636 | −99.28%
LW4 | 200 | (45,237, 35,983) | 476.6 | 435.9 | 1361 | 3382 | 3429 | 3009 | 566,205,113 | 10,651 | 566,102,201 | −95.9%

The RS_kf algorithm provides the optimal solution for the large size instances; for the very large size instances it does not provide any result, because of memory allocation problems. Note also that, with respect to the paper by Russo et al. [16], we obtain reduced computation times for RS_kf by compiling the code and running it on the new system. The computation times for BLM have been obtained on the same system as RS_kf2, using a compiled version of the algorithm for the Windows operating system provided by the authors; in this way we obtain a computation time reduction between 25% and 35% with respect to the times reported in Birgin et al. [2], except for LW4 (10%). The computation times for KY are derived from the paper and were obtained on a 32-bit Pentium 4 Core2 Duo, 3 GHz CPU, 2 GB RAM system, except for LW1.

From Table 3 we can note a significant decrease of the computation time with respect to RS_kf. The reduction factor is between 2 and 3 for the APT instances and about 10 for the instance W4 with the variant VRall, which presents the best results. We highlight that for the instance W4 the BLM algorithm finds the optimal solution in the first step, i.e. a two-stage solution, in just 2.68 s, but it provides the optimality proof only after the second step. The variant VRλω,o is the one characterized by the minimum amount of required memory, since it does not use the matrix Ω2. The variant VRall allocates more memory, but it is the fastest. The variant VRi allocates much more memory, since it does not exploit the low values of cΛ and cΩ; hence it is the slowest on W4. With respect to KY, the RS_kf2 algorithm is faster on the three instances APT20, APT22 and APT24, whereas the opposite occurs on APT28, APT29 and W4, i.e. the instances with the smaller computation times. We point out that RS_kf2 also provides the optimum for all subrectangles, whereas KY does not.

Concerning the very large instances, the variant VRi is the fastest, with a computation time decrease of 95.9% on LW4 with respect to BLM and higher than 99% on the other three instances. In terms of profits, no solution presented by Birgin et al. [2] was optimal. To the best of our knowledge, the instances LW2, LW3 and LW4 were never solved before, whereas LW1 had already been solved by Kang and Yoon [13], on a Windows Vista 64-bit, Pentium 4 Core2 Duo, 3 GHz CPU, 4 GB RAM system, with a computation time of 1395.8 s, about 50 times the 27.58 s of the proposed RS_kf2 algorithm. The three variants of RS_kf2 trade memory for time: for the LW instances, the variants VRi and VRλω,o require the largest and the smallest amount of memory respectively; the variant VRall presents a percentage increase of the computation time similar to the percentage reduction of the memory requirements, whereas for the variant VRλω,o the computation time increase is more significant.

In Fig. 2 we show the optimal solutions of the four LW instances, drawn as macro-rectangles (later referred to as blocks), each containing a homogeneous dissection with just one piece type. For each block we report the dimensions (l, w), the piece type name and, in square brackets, the number of pieces: a single value [h] indicates h horizontal replications only (as for the 195 pieces of type "B84" in LW2), whereas the notation [; m] indicates m vertical replications only (as for the 44 pieces of type "B149" in LW3). For the sake of clarity we do not represent the cuts to be performed on the sheet, and the blocks are not drawn to scale, but the cutting sequences that can be derived from the pictures are unchanged. For each instance we also report the total number of pieces and the profits of the used piece types.
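The Δt% columns of Table 3 can be re-derived directly from the reported times; for instance, for APT20 (best RS_kf2 variant, VRall, vs. RS_kf) and for LW1 (VRi vs. BLM):

```latex
\[
\Delta t\%_{\mathrm{APT20}} = \frac{0.109 - 0.296}{0.296}\cdot 100 \approx -63.2\%,
\qquad
\Delta t\%_{\mathrm{LW1}} = \frac{27.58 - 7868}{7868}\cdot 100 \approx -99.65\%.
\]
```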

Fig. 2. Optimal solutions of the LW instances. (a) LW1 [1,306 pieces], (b) LW2 [1,752 pieces], (c) LW3 [3,278 pieces] and (d) LW4 [5,646 pieces].


5.3. Unweighted instances

In this section we show the results obtained on the unweighted instances. In Tables 4 and 5 we consider parameters similar to the ones of Tables 2 and 3 respectively. Concerning Table 4, in place of cΛ, cΩ and c′/R, we consider the gap between cΛ and |Rx| (column ΔcΛ) and the gap between cΩ and |Ry| (column ΔcΩ). This choice is motivated by the fact that on the unweighted instances the outer cycles reduction is less effective, since it is infrequent that the λ and ω values are 0. Also in this case we report the actual computation times later in Table 5, and only for the best variants of RS_kf2, since we first want to show the effectiveness of the reduction strategies.

Table 4
Analysis of the reduction strategies on unweighted instances. For each strategy the impact on the computation time (Δt%) is given as single/all usage; "=" denotes a variation below 20 ms or below 1%.

Instance | (|Rx|, |Ry|) | ΔcΛ | ΔcΩ | cin | s | Δs% | uf | λ,ω single/all | Inner cycles single/all | Outer cycles single/all

Unweighted large instances
B7 | (3306, 1326) | 0 | 0 | 2.23×10^9 | 1.31×10^9 | −1.5% | 39.3×10^6 | −26%/−1.4% | −54%/−32% | −28%/−1.4%
gcut16 | (2146, 2614) | 2 | 3 | 2.24×10^9 | 1.00×10^9 | −4.4% | 44.1×10^6 | −20%/−1.7% | −61%/−48% | −19%/+1.1%
gcut17 | (2356, 2634) | 3 | 3 | 2.79×10^9 | 1.34×10^9 | −3.0% | 49.4×10^6 | −20%/−1.1% | −62%/−49% | −20%/=
gcut16r | (2700, 2700) | 2 | 2 | 4.28×10^9 | 2.36×10^9 | −1.9% | 65.5×10^6 | −21%/= | −60%/−46% | −22%/=
gcut17r | (2742, 2742) | 2 | 2 | 4.72×10^9 | 2.76×10^9 | −1.4% | 68.5×10^6 | −20%/= | −59%/−44% | −22%/−1.0%
U4 | (4950, 3215) | 15 | 25 | 4.42×10^9 | 1.02×10^9 | −9.3% | 82.6×10^6 | −22%/−2.6% | −61%/−46% | −14%/+5.2%

Unweighted very large instances
LU1 | (9752, 21,521) | 0 | 0 | 161×10^9 | 83.4×10^9 | −6.7% | 2.47×10^9 | −20%/−3.2% | −58%/−45% | −19%/+1.3%
LU2 | (23,367, 32,277) | 0 | 0 | 908×10^9 | 402×10^9 | −15% | 14.0×10^9 | −7.5%/−14% | −38%/−40% | −14%/+3.2%
LU3 | (36,455, 13,583) | 0 | 0 | 129×10^9 | 89.0×10^9 | −4.0% | 3.88×10^9 | =/−1.4% | −20%/−17% | =/+3.2%
LU4 | (22,470, 17,793) | 0 | 0 | 60.9×10^9 | 40.6×10^9 | −7.5% | 1.15×10^9 | −6.5%/−2.2% | −14%/−1.1% | =/+7.0%

In Table 4 we note that the decrease of the significant coordinates cΛ and cΩ is null for the instance B7 and for the LU instances, and negligible for the other five unweighted instances. Notwithstanding that, the outer cycles reduction strategy can still reduce the computation time, in particular for the instance B7. This can occur when some Λ3(x) is temporarily equal to false, even if at the end of the algorithm the corresponding Λ1(x) is true. The effectiveness of the inner cycles reduction on the first six instances is not as significant as in the weighted case: the ratio between cin and s is about 2 in five cases and higher than 4 just for the instance U4, while on the four very large instances it is about 2 for LU1 and LU2 and about 1.5 for LU3 and LU4. Also the effect of the λ and ω reduction on the decrease of the allowed sum number is less significant on the six large unweighted instances, where Δs% varies between 1.4% for gcut17r and 4.4% for gcut16, reaching 9.3% only on U4; the effect on the LU instances is more significant, up to 15% for the most difficult instance, i.e. LU2. The ratio between s and uf on the unweighted instances is higher than in the weighted case: it is between 30 and 40 for all the instances, with the only exceptions of gcut16 and LU3, for which it is about 23, and of U4, for which it is 12. Concerning the impact of the reduction strategies on the computation time, the inner cycles reduction is still the most effective; the λ and ω reduction and the outer cycles reduction fall below 1% of impact only on the instance LU3 and on the instances LU3 and LU4, respectively.

On the basis of the results of Table 4 and for the sake of brevity, also for the unweighted instances we focus on three variants of RS_kf2, whose details are reported in Table 5:

• VRλω,i: λ, ω reduction and inner cycles reduction;
• VRi: only the inner cycles reduction;
• VRλω,o: λ, ω reduction and outer cycles reduction.

For each unweighted instance we report the same information used in Table 3 for the weighted instances. Concerning the memory allocation, note that the variants VRλω,i and VRi require the same amount of memory.

Table 5
Results for unweighted instances (times in s, memory in Mb).

Unweighted large instances
Instance | n | (L, W) | RS_kf t | VRλω,i t | VRi t | VRλω,o t | VRλω,i/VRi Mem | VRλω,o Mem | Optimum | BLM t | KY t | Δt% vs RS_kf
B7 | 180 | (4000, 2000) | 10.39 | 4.602 | 4.275 | 6.646 | 40 | 36 | 8,000,000 | 41.36 | 0.078 | −58.9%
gcut16 | 62 | (3500, 3500) | 14.23 | 4.352 | 4.181 | 8.471 | 51 | 45 | 12,248,836 | 27.42 | – | −70.6%
gcut17 | 82 | (3500, 3500) | 16.79 | 5.429 | 5.117 | 10.72 | 57 | 50 | 12,248,892 | 87.92 | – | −69.5%
gcut16r | 124 | (3500, 3500) | 23.12 | 8.471 | 7.940 | 15.60 | 67 | 59 | 12,250,000 | 178.5 | – | −65.7%
gcut17r | 164 | (3500, 3500) | 24.41 | 9.487 | 8.798 | 16.82 | 68 | 60 | 12,250,000 | 267.9 | – | −64.0%
U4 | 40 | (7350, 6579) | 30.19 | 6.567 | 6.442 | 12.73 | 146 | 129 | 48,304,289 | 1660 | 55.625 | −78.7%

Unweighted very large instances
Instance | n | (L, W) | VRλω,i t | VRi t | VRλω,o t | VRλω,i/VRi Mem | VRλω,o Mem | Optimum | BLM t | BLM heuristic | Δt% vs BLM
LU1 | 100 | (20,789, 23,681) | 330.2 | 321.7 | 613.6 | 1806 | 1606 | 492,279,580 | 4961 | 492,278,922 | −93.5%
LU2 | 100 | (25,587, 34,563) | 1795 | 2099 | 3100 | 6489 | 5769 | 884,358,706 | 6985 | 884,341,464 | −73.5%
LU3 | 150 | (37,587, 27,563) | 451.6 | 430.3 | 565.0 | 4297 | 3788 | 1,035,972,260 | 9917 | 1,035,971,468 | −95.6%
LU4 | 200 | (45,237, 35,983) | 277.7 | 264.3 | 300.4 | 3443 | 3061 | 1,627,681,752 | 94.91 | 1,627,681,752 | +178%

Also in this case we retrieved the computation times for BLM by running it on our system; the reduction with respect to the paper by Birgin et al. [2] is between 25% and 35% for most instances, but 80% and 90% for gcut16 and gcut17 respectively. The authors did not test BLM on gcut16r and gcut17r. Concerning KY, the computation time is available just for the instances B7 and U4, and it was unable to solve the LU instances. In this case VRi is the fastest variant on all instances except LU2, for which the memory requirement exceeds the 6 GB of RAM available in the system used for the experimentation; hence an important role is played by the swaps between RAM and hard disk. The computation times of the variant VRλω,i are similar to the ones of VRi, whereas the variant VRλω,o needs about twice the time on LU1, U4 and the gcut instances. With respect to KY, the RS_kf2 algorithm is faster on U4 and slower on B7; hence, as occurred on the weighted instances, RS_kf2 behaves better than KY on the bigger instances. Concerning the instance B7, it is important to note that it has a no-waste solution, and this situation favors KY over RS_kf2, which, in any case, computes the optimum for all subrectangles. The computation time reduction factor with respect to RS_kf is between 2 and 3 on the first six instances and almost 5 on the instance U4. Concerning the four very large instances, with respect to BLM we achieve a significant decrease of the computation time, around 95% for LU1 and LU3; the reduction for LU2 is still high, about 73.5%, notwithstanding the swaps. About the instance LU4, the BLM algorithm finds the optimum with a two-stage solution after 92 s of initialization and 2.65 s of heuristic step, and then it concludes the second step in a short time; however, BLM provides no proof of optimality, even if the used area equals the product of the maximum discretization points on the two axes, L−1 and W−1 respectively. For the instances LU1, LU2 and LU3 the BLM algorithm does not provide the optimum, and they were never solved before. Fig. 3 shows the solutions of the four LU instances; for the sake of simplicity we do not show the dimensions of some very small blocks. With respect to Fig. 2, we also report the total number of blocks, the number of waste rectangles (all having one dimension equal to 1) and the dimensions of the pieces present only in very small blocks.

5.4. Results without pre-processing

In this section we study the effect of the proposed pre-processing on the computation time and the memory requirements of the RS_kf2 algorithm, with the aim of showing the effectiveness of our algorithm regardless of the use of the pre-processing. For the sake of completeness, it is important to recall that the piece type reduction obtained by the pre-processing does not necessarily imply a decrease of the instance difficulty, which is mainly determined by the presence of pieces that are small with respect to the sheet area, as stated by Parada et al. [15] and Kang and Yoon [13]. Hence, despite the significant reduction of the piece type number, the instances LW1–LW4 can still be considered very large instances, as can the LU1–LU4 instances, for which the percentage of removed piece types is small.

In the following we compare two variants of the RS_kf2 algorithm, with and without pre-processing, in terms of computation time and memory requirements. We focus on the eight very large instances, giving only some hints for the other ones. Moreover, in order to gain a deeper insight into the performance of the RS_kf2 algorithm, we generated seven further very large size instances. The first two, referred to as LW5 and LU5, are directly derived from LW1–LW4 and LU1–LU4 respectively: LW5 and LU5 contain all the piece types of the instances LW1–LW4 and LU1–LU4 respectively, and their sheets have the dimensions of the largest sheet in the related group, i.e. LW4 and LU4 respectively. The other five instances, referred to as LX1–LX5, were instead generated by modifying LW1–LW5 so as to make the pre-processing ineffective. This result is obtained by iteratively modifying the profit of the smallest piece type ti removable by a heterogeneous subsolution: more precisely, let (li, wi) and (eli, ewi) be the dimensions and the extended rectangle of the piece type ti; we change the profit of ti to f(eli, ewi)+1, where f(eli, ewi) is the best profit associated to (eli, ewi) before the modification. Moreover, if another piece type tj has the same extended rectangle dimensions, then we also change the dimensions of ti to (eli+1, wi−1). A code sketch of this rule is given at the end of this subsection. These five instances, even if weighted, present some characteristics typical of the unweighted case: they have several small piece types with similar areas and the same profit, and several big piece types with similar profit-to-area ratios.

The results of the tested RS_kf2 variants on LW1–LW5 and LU1–LU5 are reported in Table 6. For the sake of clarity we use the abbreviations pp and np to indicate the cases with and without pre-processing respectively. The first column identifies the instance. The second column, κ, reports the combinatorial degree, defined as |a1·(⌊(L·W)/Amin⌋ − 1) + a2·ln(ln(n+1))|, where a1 = log10(2) and a2 = 1/ln(10) (Parada et al. [15]); a worked reading of κ is given below. The columns (|Rx,0|, |Ry,0|) and (|Rx|, |Ry|) contain the numbers of raster points before and after the pre-processing respectively (the latter as reported in Tables 2 and 4). The following four columns report the computation time and the memory requirements of the variant VRi in the pp and np cases (denoted by pp-VRi and np-VRi) and of the variant VRall in the pp and np cases (pp-VRall and np-VRall). The choice of these two variants is motivated by the fact that VRi was the fastest variant on these instances, according to the results of the previous subsections, whereas VRall better withstands the absence of the pre-processing, since the outer cycles reduction strategy is able to detect most of the useless coordinates. The last column of Table 6 is aimed at justifying this statement: it reports, for the np case, the numbers of significant x and y coordinates, denoted by cΛ,0 and cΩ,0, which correspond to the values cΛ and cΩ analyzed in the pp case (Tables 2 and 4).

At first we note that, in the np case, each LW instance has the same raster point numbers as the corresponding LU instance, since they share the piece type dimensions. Moreover, for all the LU instances the pre-processing determines no reduction of the raster points and has small effects on the computation time of both variants.
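Since a2·ln x = log10 x, the combinatorial degree κ is essentially a digit count. The following rewriting and its inversion for LW1 are our own illustration, not taken from the source:

```latex
\[
\kappa = \left|\log_{10}\!\left(2^{\lfloor LW/A_{\min}\rfloor-1}\cdot \ln(n+1)\right)\right|,
\qquad
\kappa_{\mathrm{LW1}} = 9045 \;\Rightarrow\;
\left\lfloor \frac{LW}{A_{\min}} \right\rfloor \approx \frac{9045}{\log_{10}2} \approx 3\times 10^{4},
\]
```

i.e. the LW1 sheet can host roughly thirty thousand copies of its smallest piece, which is why these instances remain very large even after the pre-processing has removed many piece types.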
This occurs also for LU4 and LU5, despite the removal of 25 and 218 piece types respectively. Hence the 13 s needed for the pre-processing of LU4 determine a higher computation time in the pp case. Concerning LU5, the algorithm cannot solve it, because of the large amount of memory still required, up to 13 GB. About the LW instances, the pre-processing has different effects. For LW1 and LW3 the raster point number decreases by about 50% when switching from np to pp, and this reduction concerns just one of the two axes (the y-axis for LW1 and the x-axis for LW3).

Fig. 3. Optimal solutions of the LU instances. (a) LU1 [6,933 pieces, 42 blocks, 4 wastes], (b) LU2 [27,781 pieces, 69 blocks, 5 wastes], (c) LU3 [11,421 pieces, 28 blocks, 4 wastes] and (d) LU4 [112,464 pieces, 9 blocks, 2 wastes].

For LW2 and LW5 the halving of the raster point number occurs on both axes, with a total reduction of about 75%. No appreciable reduction is instead obtained on LW4, despite the drastic reduction of the number of piece types. Concerning the results of the VRi variant on the weighted instances, the memory requirement variations between the np and pp cases reflect the behavior observed for the number of raster points. Moreover, as expected, the computation time grows when switching from pp to np: the increase for LW1 and LW3 is about 60% and 30% respectively, whereas for LW4 the variation is less than 10 s. A specific discussion is required for LW2 and LW5: in the np case, for LW2 we have a significant increase of the computation time, and for LW5 we do not determine the optimal solution (f(L, W) = 679,579,381, obtained instead with pp-VRi). This is due to the increased number of raster points discussed above, which affects the memory requirements: the threshold of 6 GB RAM is overcome and the effect is similar to the one described in the previous subsection for LU2.
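The quoted 75% can be checked against the raster point counts of Table 6 for LW2:

```latex
\[
\frac{|R_x|\cdot|R_y|}{|R_{x,0}|\cdot|R_{y,0}|}
= \frac{11{,}969\times 16{,}285}{23{,}367\times 32{,}277} \approx 0.26,
\]
```

i.e. the pre-processing leaves about a quarter of the raster point grid.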

Concerning instead the VRall variant, it is easy to note that, contrarily to VRi, it better withstands the absence of the pre-processing. Indeed, except for LW4, the memory requirement of np-VRall is lower than that of np-VRi on the weighted instances: transposing the rough relation given at the end of Section 4.2, the required memory is proportional to 1.75·cΛ,0·|Ry,0| + 0.5·cΩ,0·|Rx,0| for np-VRall and to 2.25·|Rx,0|·|Ry,0| for np-VRi. Hence, by the ratio between these two expressions, the reduction factor is 0.78·(cΛ,0/|Rx,0|) + 0.22·(cΩ,0/|Ry,0|), and the biggest effects are on the x-axis, due to the order of the outer cycles. However, even if the values cΛ,0 and cΩ,0 are equal or very close to cΛ and cΩ (reported in Tables 2 and 4), the variant np-VRall still requires more memory than pp-VRall, because it allocates matrix columns of size |Ry,0| instead of |Ry| and matrix rows of size |Rx,0| instead of |Rx|. About the computation time on the weighted instances, the variant np-VRall, with the only exception of LW4, is always better than the variant np-VRi, thanks to the usage of all the reduction strategies. In the comparison with pp-VRi and pp-VRall, the computation time of the np-VRall variant is always the highest; however, the gap among the computation times decreases as the instance difficulty increases (from LW1 to LW4), and it is significantly higher only on LW5, because the memory requirement overcomes the available 6 GB RAM of our system.
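Plugging the LW1 values of Table 6 into this reduction factor reproduces the measured memory ratio between np-VRall and np-VRi almost exactly:

```latex
\[
0.78\cdot\frac{c_{\Lambda,0}}{|R_{x,0}|} + 0.22\cdot\frac{c_{\Omega,0}}{|R_{y,0}|}
= 0.78\cdot\frac{7505}{9752} + 0.22\cdot\frac{10{,}682}{21{,}521}
\approx 0.71 \approx \frac{1280\ \text{Mb}}{1805\ \text{Mb}}.
\]
```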

Table 6
Analysis of the pre-processing effects on the LW and LU instances (t in s, memory in Mb; pp/np = with/without pre-processing).

Instance | κ | (|Rx,0|, |Ry,0|) | (|Rx|, |Ry|) | pp-VRi t/Mem | np-VRi t/Mem | pp-VRall t/Mem | np-VRall t/Mem | (cΛ,0, cΩ,0)
LW1 | 9045 | (9752, 21,521) | (9608, 11,150) | 27.58/925 | 44.54/1805 | 32.32/754 | 38.42/1280 | (7505, 10,682)
LW2 | 16,772 | (23,367, 32,277) | (11,969, 16,285) | 58.53/1691 | 832.7/6489 | 65.61/1403 | 77.11/2776 | (9630, 15,546)
LW3 | 19,971 | (36,455, 13,583) | (18,399, 13,547) | 84.69/2163 | 109.3/4296 | 100.0/1967 | 106.2/2426 | (16,470, 13,082)
LW4 | 218,752 | (22,470, 17,793) | (22,458, 17,729) | 435.9/3429 | 445.1/3439 | 476.6/3382 | 486.5/3392 | (22,110, 17,698)
LW5 | 218,752 | (44,509, 34,775) | (22,466, 17,769) | 430.8/3423 | – | 478.7/6724 | 1141/3434 | (22,409, 17,726)
LU1 | 9045 | (9752, 21,521) | (9752, 21,521) | 321.7/1806 | 324.6/1805 | 334.6/1806 | 338.0/1805 | (9752, 21,521)
LU2 | 16,773 | (23,367, 32,277) | (23,367, 32,277) | 2099/6489 | 2109/6489 | 1852/6489 | 1858/6488 | (23,367, 32,277)
LU3 | 19,971 | (36,455, 13,583) | (36,455, 13,583) | 430.3/4297 | 432.9/4296 | 466.2/4297 | 471.7/4296 | (36,455, 13,583)
LU4 | 218,752 | (22,470, 17,793) | (22,470, 17,793) | 264.3/3443 | 255.4/3439 | 297.1/3443 | 289.7/3439 | (22,470, 17,793)
LU5 | 218,752 | (44,509, 34,775) | (44,509, 34,775) | – | – | – | – | –

This confirms the effectiveness of VRall with or without the pre-processing. We do not report the results for the instances APT, gcut, W4, U4 and B7, because the comparison is less significant, given the smaller computation times; we just mention that for the weighted and unweighted instances the computation times slightly increase (up to 0.05 and 0.5 s respectively) for all the np variants, except for W4, for which both the computation time and the required memory of np-VRi roughly double.

A different discussion is required for LX1–LX5, because for them there is no difference between the pp and np cases. The results of the two variants VRi and VRall are reported in Table 7. For each instance we provide the same information as in Table 6, adding the column n (number of piece types) and the optimal solution value; we do not report the combinatorial degree, since it is the same as for LW1–LW5.

Table 7
Results of the RS_kf2 algorithm on the LX instances (t in s, memory in Mb).

Instance | n | (|Rx|, |Ry|) | Optimum | VRi t | VRi Mem | VRall t | VRall Mem | (cΛ, cΩ)
LX1 | 100 | (9752, 21,645) | 171,257,398 | 60.12 | 1818 | 71.23 | 1565 | (8737, 15,972)
LX2 | 100 | (23,693, 32,399) | 325,051,414 | 1343 | 6607 | 382.9 | 5342 | (18,319, 30,253)
LX3 | 150 | (36,455, 26,237) | 430,708,220 | 2891 | 8235 | 210.2 | 4353 | (19,430, 13,442)
LX4 | 200 | (22,470, 17,793) | 566,205,113 | 726.5 | 3444 | 792.5 | 3414 | (22,244, 17,745)
LX5 | 550 | (44,875, 35,575) | – | – | – | – | – | –

Concerning the raster points, the value of |Ry| for the instance LX1 is double with respect to LW1, but cΩ increases just by 50%. For LX2, LX3 and LX5, |Rx| and |Ry| increase by a factor of 2, but cΛ and cΩ follow the same trend only on LX2, whereas LX3 presents just an increase of 18% of cΛ. The optima of the instances LX1 and LX4 are the same as for LW1 and LW4 respectively, whereas the optima of LX2 and LX3 are slightly higher than those of LW2 and LW3. Also in this case we can note that the variant VRi performs better when the values (|Rx|, |Ry|) are not too high with respect to the available RAM (LX1 and LX4); otherwise the variant VRall is preferable (LX2 and LX3). Concerning LX5, as occurred for LU5, the algorithm is not able to solve it, because of the huge number of raster points.
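The LX generation rule described at the beginning of this subsection can be summarized as follows. This is a hedged sketch: bestProfit (the knapsack value f of the extended rectangle) and smallestRemovable stand in for the actual routines, and all names are ours, not the paper's.

```cpp
#include <vector>
#include <functional>

struct Piece { int l, w, el, ew; long long profit; };

// Sketch of the LX1-LX5 generation rule (Section 5.4): repeatedly take the
// smallest piece type t_i that the pre-processing could remove through a
// heterogeneous subsolution and make it non-removable.
void makePreprocessingIneffective(
    std::vector<Piece>& pieces,
    // f(el, ew): best profit of the extended rectangle before the change.
    const std::function<long long(int, int)>& bestProfit,
    // Index of the smallest removable piece type, or -1 when none is left.
    const std::function<int(const std::vector<Piece>&)>& smallestRemovable)
{
    for (int i = smallestRemovable(pieces); i >= 0;
         i = smallestRemovable(pieces)) {
        Piece& t = pieces[i];
        // Raise the profit just above the best use of its extended rectangle.
        t.profit = bestProfit(t.el, t.ew) + 1;
        // If another type shares the same extended rectangle, also perturb
        // the dimensions of t_i to (el_i + 1, w_i - 1), as in the paper.
        for (const Piece& other : pieces)
            if (&other != &t && other.el == t.el && other.ew == t.ew) {
                t.l = t.el + 1;
                t.w = t.w - 1;
                break;
            }
    }
}
```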

6. Conclusions

In this paper we tackled the unconstrained two-dimensional cutting problem (U2DCP), proposing a new dynamic programming algorithm which enhances the exact algorithm by Russo et al. [16]. The algorithm first performs an original pre-processing in order to reduce the number of piece types to be taken into account. Then it implements a new definition of the knapsack function and of the λ and ω support functions. Finally, several proved original conditions are introduced in order to speed up the inner and the outer cycles over the widths and over the lengths. These conditions also constitute a significant methodological and theoretical contribution for knapsack function based algorithms. The presented improvements have a twofold aim: they significantly reduce the memory requirements for the allocation of the large size matrices which are at the base of most U2DCP solving approaches and, moreover, they provide a significant decrease of the computational burden. Hence the proposed algorithm significantly outperforms the algorithms present in the literature for the U2DCP and determines the optimal solution of seven previously unsolved very large size instances within a reasonable computation time.


Appendix A. Proof of Proposition 1

We proceed by induction. When w = 1, then fv(l, w−1) = 0 and the replication does not occur, i.e. fv(l, 1) = fnv(l, 1), and f is correctly computed as in RS_kf. Moreover, f(l, 1) > 0 ⇔ ω(l, 1) > 0, so the condition Cv,1 never intervenes either. Hence, by Property 1, λ(l, 1) = λ0(l, 1) for all l and, if 0 < λ0(l, 1) < l, then fnv(l, 1) = fv,0(l, 1). In order to complete the case w = 1, we need to prove fnv(l, 1) ≤ fv′(l, 1). It comes from two inequalities, proved later also for w > 1: the first is deduced from the fact that Cv,2 and Cv,3 allow for fnv just the sums related to coordinates x with λ(x, w) > 0, hence contained in the definition of fv,0 because λ = λ0; the second follows from relation (15):

fnv(l, w) ≤ fv,0(l, w) ≤ fv′(l, w)    (18)

Let us now suppose that the thesis is valid for all the coordinate couples considered in the outer cycles before (l, w). This has three consequences. The first is that, by λ = λ0 for the previous couples, the discussion on Cv,2 and Cv,3 can be repeated, so deducing (18) for the couple (l, w). The other two consequences hold also if we do not use the replication on the current couple (l, w). Indeed, the definition of fnv and the replication setting fv(l, w) = fv(l, w−1) imply that fnv(l, w) ≤ fv(l, w) ≤ max{fnv(l, w), fv(l, w−1)}, where the second inequality becomes an equality if we apply the replication. Hence, by (18) and the inductive hypothesis, i.e. fnv(l, w) ≤ fv′(l, w) and fv(l, w−1) ≤ fv′(l, w−1), we obtain fv(l, w) ≤ max{fv′(l, w), fv′(l, w−1)} and, by Property 4 (on fv′),

fv(l, w) ≤ fv′(l, w)    (19)

The last consequence is the accuracy of f and λ in the case λ0(l, w) ∈ {0, l}. According to Property 3, the computation of f(l, w) is correctly performed at line 5 of RS_kf whether we use fnv or fv instead of fv,0, because both of them are bounded by fv′ (18 and 19). About λ, by exploiting the arguments used in the proof of Property 3, if λ0(l, w) = 0 then f(l, w) = f(l−1, w), whereas if λ0(l, w) = l then f(l, w) > fv′(l, w) and, by (15) and (19), f(l, w) > max{f(l−1, w), fv(l, w)}. Hence in both cases line 6 of RS_kf correctly performs the update λ(l, w) = λ0(l, w).

We consider now the case 0 < λ0(l, w) < l which, by (17), is equivalent to

f(l, w) = fv,0(l, w) > f(l−1, w)    (20)

In particular, denoting λ0(l, w) by u0, we first consider the case in which the sum associated to x1 = u0 and x2 = l−u0 has been discarded. It could be rejected only by Cv,1, according to the inductive hypothesis on λ = λ0 and to Property 1. Hence ω(x, w) = 0 ∀x ≤ l−u0, with two consequences:

f(u0, w) = f(u0, w−1);  f(l−u0, w) = f(l−u0, w−1)    (21)

fnv(l, w) < fv,0(l, w)    (22)

Indeed, from ω = 0, relation (21) comes from the inductive hypothesis (ω = ω0) and the definition (13b). If instead relation (22) were not satisfied then, according to (18), fnv(l, w) = fv,0(l, w), which is positive by (20) and would be the result of a sum allowed by Cv,1, necessarily associated to an x′1 < u0 (l−x′1 > l−u0), against the definition of λ0(l, w) as a minimum in (13a). Note that λ0(u0, w) > 0, otherwise by f(l, w) = f(u0, w) + f(l−u0, w) and by Property 2 and (13a) we would get the absurd λ0(l, w) = 0. As a consequence, using (21), (13a) and Property 4 (on f), f(u0, w−1) = f(u0, w) > f(u0−1, w) ≥ f(u0−1, w−1), and hence λ0(u0, w−1) is also positive. The same occurs with l−u0 in place of u0. This means that the sum f(u0, w−1) + f(l−u0, w−1) contributes also to the value fv,0(l, w−1), and hence we get

fv,0(l, w−1) ≥ fv,0(l, w)    (23)

0 < λ0(l, w−1) ≤ λ0(l, w)  or  λ0(l, w−1) ∈ {0, l}    (24)

Using (16), (23), (20) and Property 4 (on f), we can write f(l, w−1) ≥ fv,0(l, w−1) ≥ fv,0(l, w) = f(l, w) ≥ f(l, w−1), i.e. f(l, w) = f(l, w−1) = fv,0(l, w−1) = fv,0(l, w), which, by (20), is greater than f(l−1, w) and, in turn, than f(l−1, w−1), again by Property 4 (on f). Hence (23) can be rewritten as fv,0(l, w−1) = fv,0(l, w) (later referred to as (23)′), and a relation equivalent to (20) is deduced for the couple (l, w−1). From this, applying (17) to the couple (l, w−1), we get

λ0(l, w−1) ∉ {0, l}    (25)

Then, by the inductive hypothesis and by (23)′ and (22), fv(l, w−1) = fv,0(l, w−1) = fv,0(l, w) > fnv(l, w); hence the replication occurs and determines the update of fv(l, w) to fv,0(l, w). Also the replication setting λ(l, w) = λ0(l, w−1) is correct because, denoting λ0(l, w−1) by u1, we can prove that u1 = λ0(l, w) (= u0). Indeed, taking into account (25), we can first simplify relation (24) to 0 < u1 ≤ u0. Moreover, using also the monotonicity of f and relations (2b) and (1), we obtain fv,0(l, w−1) = f(u1, w−1) + f(l−u1, w−1) ≤ f(u1, w) + f(l−u1, w) ≤ fv′(l, w) ≤ f(l, w), where the two extreme terms, i.e. fv,0(l, w−1) and f(l, w), are both equal to fv,0(l, w) by (23)′ and (20) respectively; the consequence is that f(l, w) = f(u1, w) + f(l−u1, w). Hence we cannot have u1 < u0, because this would contradict the definition (13a) of λ0(l, w) as a minimum.

About the setting of f(l, w) at line 5 of RS_kf, the replication again has no effect. Indeed, even if we use fnv(l, w) instead of fv(l, w) (= fv,0(l, w)), by (23)′ it is equivalent to fv,0(l, w−1), which has already been taken into account in the computation of the candidate f(l, w−1).

The last case to be analyzed occurs when 0 < λ0(l, w) < l and the sum associated to λ0(l, w) has been performed. In this case it is obvious that fnv(l, w) = fv,0(l, w) and λ(l, w) correctly assumes the value λ0(l, w). Moreover, we can prove that the replication does not occur, and this concludes the proof. Indeed, if by absurd the replication occurred, taking into account the inductive hypothesis we would have fv′(l, w−1) ≥ fv(l, w−1) > fnv(l, w) = fv,0(l, w) and, in turn, by Property 4 (on fv′), fv′(l, w) > fv,0(l, w), so contradicting relation (20), according to (15).

References

[1] Beasley JE. Algorithms for unconstrained two-dimensional guillotine cutting. J Oper Res Soc 1985;36:297–306.
[2] Birgin EG, Lobato RD, Morabito R. Generating unconstrained two-dimensional non-guillotine cutting patterns by a recursive partitioning algorithm. J Oper Res Soc 2012;63:183–200.
[3] Christofides N, Whitlock C. An algorithm for two-dimensional cutting problems. Oper Res 1977;25:30–44.
[4] Cintra GF, Miyazawa FK, Wakabayashi Y, Xavier EC. Algorithms for two-dimensional cutting stock and strip packing problems using dynamic programming and column generation. Eur J Oper Res 2008;191:61–85.
[5] Cui Y, Wang Z, Li J. Exact and heuristic algorithms for staged cutting problems. Proc Inst Mech Eng, Part B: J Eng Manuf 2005;219(2):201–7.
[6] Cung VD, Hifi M, Le Cun B. Constrained two-dimensional cutting stock problems. A best-first branch-and-bound algorithm. Int Trans Oper Res 2000;7:185–200.
[7] Dyckhoff H, Scheithauer G, Terno J. Cutting and packing (C&P). In: Dell'Amico M, Maffioli F, Martello S, editors. Annotated bibliographies in combinatorial optimization. Chichester: John Wiley & Sons; 1997. p. 393–412.
[8] Fekete SP, van der Veen JC. PackLib2: an integrated library of multi-dimensional packing problems. Eur J Oper Res 2007;183:1131–5. 〈http://www.ibr.cs.tu-bs.de/alg/packlib/instances.shtml〉.
[9] Gilmore PC, Gomory RE. Multistage cutting stock problems of two and more dimensions. Oper Res 1965;13:94–120.
[10] Gilmore PC, Gomory RE. The theory and computation of knapsack functions. Oper Res 1966;14:1045–74.
[11] Hifi M. Exact algorithms for large-scale unconstrained two and three staged cutting problems. Comput Optim Appl 2001;18:63–88.
[12] Herz JC. A recursive computing procedure for two-dimensional stock cutting. IBM J Res Dev 1972;16:462–9.
[13] Kang MK, Yoon K. An improved best-first branch-and-bound algorithm for unconstrained two-dimensional cutting problems. Int J Prod Res 2011;49:4437–55.


[14] Lodi A, Martello S, Monaci M. Two-dimensional packing problems: a survey. Eur J Oper Res 2002;141:241–52.
[15] Parada V, Palma R, Sales D, Gómez A. A comparative numerical analysis for the guillotine two-dimensional cutting problem. Ann Oper Res 2000;96:245–54.
[16] Russo M, Sterle C, Sforza A. An improvement of the knapsack function based algorithm of Gilmore and Gomory for the unconstrained two-dimensional guillotine cutting problem. Int J Prod Econ, Special Issue on Cutting and Packing 2013;145(2):451–62.

[17] Scheithauer G, Terno J. The G4-heuristic for the pallet loading problem. J Oper Res Soc 1996;47:511–22.
[18] Wäscher G, Haußner H, Schumann H. An improved typology of cutting and packing problems. Eur J Oper Res 2007;183:1109–30.
[19] Young-Gun G, Seong YJ, Kang MK. A best-first branch and bound algorithm for unconstrained two-dimensional cutting problems. Oper Res Lett 2003;31:301–7.