A multi-objective Monte Carlo tree search for forest harvest scheduling

A multi-objective Monte Carlo tree search for forest harvest scheduling

ARTICLE IN PRESS JID: EOR [m5G;October 17, 2019;5:34] European Journal of Operational Research xxx (xxxx) xxx Contents lists available at ScienceD...

2MB Sizes 0 Downloads 35 Views

ARTICLE IN PRESS

JID: EOR

[m5G;October 17, 2019;5:34]

European Journal of Operational Research xxx (xxxx) xxx

Contents lists available at ScienceDirect

European Journal of Operational Research journal homepage: www.elsevier.com/locate/ejor

Interfaces with Other Disciplines

A multi-objective Monte Carlo tree search for forest harvest scheduling Teresa Neto a,∗, Miguel Constantino b, Isabel Martins c, João Pedro Pedroso d a

Escola Superior de Tecnologia e Gestão, Instituto Politécnico de Viseu, Viseu 3504-510, Portugal Centro de Matemática, Aplicações Fundamentais e Investigação Operacional, Faculdade de Ciências, Universidade de Lisboa Cidade Universitária, Lisboa 1749-016, Portugal c Centro de Matemática, Aplicações Fundamentais e Investigação Operacional, Instituto Superior de Agronomia, Universidade de Lisboa Tapada da Ajuda, Lisboa 1349-017, Portugal d INESC TEC, Faculdade de Ciências, Universidade do Porto Rua do Campo Alegre, Porto 4169-007, Portugal b

a r t i c l e

i n f o

Article history: Received 6 February 2018 Accepted 18 September 2019 Available online xxx Keywords: Multiple objective programming Forest planning Core area Connectivity Tree search

a b s t r a c t While the objectives of forest management vary widely and include the protection of resources in protected forests and nature reserves, the primary objective has often been the production of wood products. However, even in this case, forests play a key role in the conservation of living resources. Constraining the areas of clearcuts contributes to this conservation, but if it is too restrictive, a dispersion of small clearcuts across the forest might occur, and forest fragmentation might be a serious ecological problem. Forest fragmentation leads to habitat loss, not only because the forest area is reduced, but also because the core area of the habitats and the connectivity between them decreases. This study presents a Monte Carlo tree search method to solve a bi-objective harvest scheduling problem with constraints on the clearcut area, total habitat area and total core area inside habitats. The two objectives are the maximization of both the net present value and the probability of connectivity index. The method is presented as an approach to assist the decision maker in estimating efficient alternative solutions and the corresponding trade-offs. This approach was tested with instances for forests ranging from some dozens to over a thousand stands and temporal horizons from three to eight periods. In general, multi-objective Monte Carlo tree search was able to find several efficient alternative solutions in a reasonable time, even for medium and large instances. © 2019 Elsevier B.V. All rights reserved.

1. Introduction Land clearing for timber leads to the so-called forest fragmentation, the process of breaking up continuous habitats. This process has many negative impacts on the local environments, mainly on wildlife habitat availability, in particular on habitat area and interhabitat connections (Harris, 1984; Kurttila, Pukkala, & Loikkanen, 2002). Haddad et al. (2015) synthesize results from the set of longterm experiments conducted in a wide variety of ecosystems to demonstrate consistent impacts of fragmentation, how those impacts change over time, and how they align with predictions from theory and observation. In summary, across experiments spanning numerous studies and ecosystems, fragmentation consistently degraded ecosystems, reducing species persistence and species richness of plants and animals, nutrient retention, trophic dynamics,



Corresponding author. E-mail addresses: [email protected] (T. Neto), [email protected] (M. Constantino), [email protected] (I. Martins), [email protected] (J.P. Pedroso).

and, in more isolated fragments, movement, often changing the composition of entire communities. Constraints on clearcut area were originally imposed in forest management for timber production to reduce some harvest impacts on the local environments, mainly on soil, water and aesthetics besides wildlife. Nevertheless, the dispersion of smaller clearcuts across the forest created by such constraints may have a perverse effect involving forest fragmentation. One way to address forest fragmentation concerns in harvest scheduling problems is to directly consider the aspects of fragmentation, fragment area, edge effects or core area inside fragments, and inter-fragment connections, in particular when fragments are habitats. Integrating these aspects adds substantial complexity to the models and solution techniques. Core area, and consequently habitat area, has been considered either directly or indirectly. Indirect approaches use area and shape of forest patches, and can be applied to approximate the core area. However, there is no single indicator that summarizes all characteristics of shape. For example, shape index values (values on ratio perimeter to area McGarigal, Cushman, Neel, & Ene, 2002), range from one (perfect

https://doi.org/10.1016/j.ejor.2019.09.034 0377-2217/© 2019 Elsevier B.V. All rights reserved.

Please cite this article as: T. Neto, M. Constantino and I. Martins et al., A multi-objective Monte Carlo tree search for forest harvest scheduling, European Journal of Operational Research, https://doi.org/10.1016/j.ejor.2019.09.034

JID: EOR 2

ARTICLE IN PRESS

[m5G;October 17, 2019;5:34]

T. Neto, M. Constantino and I. Martins et al. / European Journal of Operational Research xxx (xxxx) xxx

circle) to higher values (irregular shapes), but two patches of different shapes may share the same shape index value (Baskent & Jordan, 1995). There are many other approaches that try to grasp the essential characteristics of shapes (Davis, 1977; Moellering & Rayner, 1981; Wentz, 20 0 0; Xia, 1996). As far as we know, the only indirect approaches to core area used in the literature are the shape index and measures of perimeter and area. Martins, Constantino, and Borges (1999), Rebain and McDill (2003a), Rebain and McDill (2003b), and Martins, Constantino, and Borges (2005) constrain the minimum total area of forest patches with a minimum area requirement. Tóth, McDill, and Rebain (2006) constrain or minimize the total perimeter of the forest patches. All these five works proposed to solve mixed integer linear programming models with constraints on the maximum clearcut area. The approach described in Öhman and Wikström (2008) lies in multi-objective programming, where maximization of the total perimeter of the forest patches is regarded as an additional objective, besides maximizing the net present value. Constraints on clearcut area have not been considered in this study. In order to circumvent computational limitations on the usage of exact methods, some studies proposed heuristics to solve forest planning problems with constraints both on the total area of forest patches above a minimum area requirement and on the clearcut area (simulated annealing and hybrid heuristics by Falcão and Borges (2002), tabu search by Caro, Constantino, Martins, and Weintraub (2003)). Öhman and Lämås (2005) used the shape index as a criterion for evaluating the forest fragmentation. A bi-objective problem was solved using simulated annealing, the purpose of which was to maximize the net present value and minimize the shape index of the forest. Constraints on clearcut area have not been considered in this study. Approaches that include core area directly have been developed by, for example, Wei and Hoganson (2007) and Zhang, Constantino, and Falcão (2011), who solved mixed integer programming formulations, Öhman and Eriksson (1998), Öhman (20 0 0), and Öhman, Pukkala et al. (2002), who proposed simulated annealing, Wei and Hoganson (2008) and Hoganson, Wei, and Hokans (2005), who proposed dynamic programming-based heuristics, and Neto, Constantino, Martins, and Pedroso (2016), who implemented a branch-and-bound approach. From these works, only the last one considered constraints on the clearcut area. Hof, Bevers, Joyce, and Kent (1994) modeled inter-habitat connections indirectly proposing mixed integer linear programming formulations that take into account the spatial and temporal developments of wildlife populations. Constraints on clearcut area have not been included in the formulations. Neto, Constantino, Martins, and Pedroso (2013) used the so-called probability of connectivity index to measure the inter-habitat connections (Saura & Hortal, 2007). Those authors presented a branch-and-bound approach to solve forest harvest scheduling problems with constraints on the total area of forest patches with a minimum area requirement, as well as on the probability of connectivity index. These problems include constraints on clearcut area. The majority of the models applied to forest management problems with forest fragmentation concerns consider a single objective subject to lower or upper bounds on, for example, clearcut size, total perimeter of forest patches, total area of forest patches with a minimum area requirement, total core area, total edge and probability of connectivity index. However, it might be difficult to identify the appropriate threshold requirement bounds that will adequately meet the environmental concerns and, simultaneously, will not be too restrictive. As the objectives are conflicting, a multi-objective optimization approach may overcome this drawback, minimizing or maximizing some of those measures. In this case, trade-offs between the objectives can be provided, thus allowing the decision maker to choose the one that he/she considers to be the best

Multi-objective models have been applied to forest management problems, in general, by transforming the multi-objective goals into a weighted-sum single-objective function. Snyder and ReVelle (1997) used a bi-objective integer programming model that addresses two objectives related with wildlife habitat and timber harvesting activities. Timber harvest constraints were incorporated to prevent harvests from occurring next to one another as well as next to habitat areas. Williams (1998) considered a bi-objective binary programming model where the objectives are the minimization of corridor land costs and the minimization of the amount of unsuitable land within the corridor system. The model specifies sets of stands that were not going to be harvested, which would then be corridors between areas that had been assigned to wildlife. Martins, Ye, Constantino, Fonseca, and Cadima (2014) incorporated an even flow policy as objective in a bi-objective mixed integer programming model with constraints on clearcut area. All these models have been solved by exact methods. Example of models solved by heuristics are Öhman and Eriksson (1998), Öhman (20 0 0), Öhman et al. (2002), Öhman and Lämås (2005) and Öhman and Wikström (2008). Tóth et al. (2006) evaluated the performance of five methods, the weighted sum method and the  -constraint method, the decomposition method based on the Tchebycheff metric, and the triangles and alpha-delta methods, to generate the Pareto front for a bi-objective harvest scheduling problem, where the objectives are maximization of the net present value and maximization of the total area of forest patches with a minimum area requirement. Constraints on clearcut area have been considered. The weighted sum method (Geoffrion, 1968) transforms the multi-objective optimization problem into a series of singleobjective optimization problems, where each one of these problems is a convex combination of the objectives. The optimal solution to the weighted problem is a Pareto-optimal solution to the original problem as long as the weights are greater than zero. When the formulation involves discrete decision variables, the weighted sum method only generates supported Pareto-optimal solutions that is, those belonging to the boundary of the convex hull of the set of feasible solutions in the objective space. The decomposition method based on the Tchebycheff metric (Eswaran, RavindranH, & Moskowitz, 1989) minimizes the weighted Tchebycheff distance between the solutions (in the objective space) and the ideal point, that is, the maximum weighted difference in the attainments of the respective objectives between the solutions and the ideal point. The  -constraint method (Haimes, Lasdon, & Wismer, 1971) generates single objective subproblems, called  -constraint problems, by transforming all but one objectives into constraints. The right-hand sides of these constraints are given by the  -vector and, by varying it, the Pareto front can theoretically be generated if the optimal solutions to the single objective problems are unique. This method and the previous one can generate weakly efficient solutions arising from the existence of alternative optimal solutions to the scalar optimization problems. The triangles method (Chalmet, Lemonidis, & Elzinga, 1986) and the alpha-delta method (Tóth et al., 2006) are hybrid methods combining the weighted sum method and the  -constraint method. The triangles method seeks Pareto-optimal solutions between two adjacent Pareto points that have already been identified. The convex combination of the objectives with the weights fixed arbitrarily is optimized at each step subject to constraints that impose lower bounds (in maximization problems) on the objectives. These lower bounds are slightly above the lower achievements on the objectives at the two adjacent solutions. The alpha-delta method assigns, in a bi-objective problem, a substantially larger weight to one objective than to the other, and optimizes at each step the convex combination of the objectives (with these weights) subject to a constraint that requires the achievement value of the second objective to be greater than or equal to the achievement value

Please cite this article as: T. Neto, M. Constantino and I. Martins et al., A multi-objective Monte Carlo tree search for forest harvest scheduling, European Journal of Operational Research, https://doi.org/10.1016/j.ejor.2019.09.034

JID: EOR

ARTICLE IN PRESS

[m5G;October 17, 2019;5:34]

T. Neto, M. Constantino and I. Martins et al. / European Journal of Operational Research xxx (xxxx) xxx

obtained by the previous step plus a sufficiently small value. For more general discussions on multi-objective optimization, we refer those interested to Steuer (1986). For the instance used in Tóth et al. (2006) (a forest consisting of fifty stands and a planning horizon of three periods), the  constraint, the alpha-delta and the triangles methods found thirty six Pareto points in 58.75, 6.27 and 17.13 hours, respectively. The Tchebycheff decomposition method found thirty four solutions in 36.83 hours, while the weighted sum method found six in 1.72 hours. The authors of this paper concluded that these techniques can only be applied to multi-objective harvest scheduling problems that describe wildlife goals at a small scale, pilot-study level, and therefore approaches that efficiently tackle these problems on larger forests are necessary. Monte Carlo methods have a long history within numerical algorithms. Monte Carlo integer programming is a heuristic commonly used in forest management problems. This approach was applied, for example, in Nelson and Brodie (1990) to solve a combined harvest scheduling and transportation planning problem with concerns as to clearcut area, and in O’Hara, Faaland, and Bare (1989) and Clements and Dallain (1990) to solve other spatial harvest scheduling problems. Monte Carlo tree search (MCTS) is a recent tree search technique that builds iteratively a tree in an asymmetric manner, concentrating on the more promising subtrees, making use of the outcomes of stochastic simulations. MCTS was proposed by Coulom (2006) for the game of Go (9 × 9 board) with considerable success. Game-playing is still the area where the algorithm and its many variants are most commonly used. It has also been applied in artificial intelligence approaches to solve problems that can be represented by trees of sequential decisions and planning problems (see Browne et al. (2012) for a comprehensive survey). There are, however, very few publications on its application to solve combinatorial optimization problems. In this field, some results have been provided for general mixed integer optimization, Sabharwal, Samulowitz, and Reddy (2012) and Pedroso and Rei (2015) used MCTS to solve specific optimization problems. An application of MCTS to the multi-objective optimization has been performed by Wang and Sebag (2012). To the best of our knowledge, there are no applications of MCTS to forest management problems. In this work, we propose a multi-objective MCTS approach to solve a bi-objective harvest scheduling problem with constraints on clearcut area, total habitat area and total core area inside habitats. There are two objectives to be maximized: the net present value of the timber harvested and the probability of connectivity index. We use the concept of subregions to measure the core area (Öhman & Eriksson, 1998; Wei & Hoganson, 20 07; 20 08; Zhang et al., 2011), and the probability of connectivity index to measure inter-habitat connections. Each stand is partitioned into pieces, the so-called subregions, where each piece can be edge, core area or neither edge nor core area, depending on the position of the piece inside the stand and the vegetation condition of the adjacent stands and the stand itself. The probability of connectivity index is a ratio of the area of the habitats, weighted by an indicator of the inter-habitat distance, to the area of the forest. Saura and Hortal (2007) has compared this measure with other indices through a set of properties that any connectivity index should fulfill and it was the only that satisfied all properties. The method is proposed as an approach to generating alternative solutions, indicative of the extent to which economic planning and environmental objectives can be achieved and of the trade-offs between these objectives. MCTS is used as an alternative to standard binary tree search, in situations where the construction and storage of the tree would be computationally prohibitive; MCTS is aimed mainly at solving medium and large instances. The solution at a child node is evaluated by fixing the partial solution of its parent node and consid-

3

ering the changes corresponding to the stand in which the cutting decision is taken. This incremental procedure may provide a considerable speedup for solution evaluation. As opposed to metaheuristics, MCTS is not based on neighbourhoods, and hence does not suffer from local optima. However, except for very small instances, the search tree cannot be fully explored, and hence there is not guarantee of reaching the optimum (or, in our case, the Pareto front). This paper is outlined as follows. In Section 2, we describe the forest planning problem. In Section 3, we present the multiobjective Monte Carlo tree search. In Section 4, we report on computational experiments, performed on benchmark instances corresponding to forests ranging from 32 to 1363 stands. In the last section, we present some conclusions. The supplementary material for the paper includes the following: additional details on the concepts of subregion and probability of connectivity index, which can also be found in Neto et al. (2016) and Neto et al. (2013), respectively; a formulation that we propose for the problem, which we did not use in the multi-objective Monte Carlo tree search due to its unmanageable number of variables for medium and large instances; a proposition stating that if the probability of connectivity index is greater than or equal to a certain value, the total habitat area constraint is redundant; a description of how some parameters and data of the problem were fixed, which can also be found in Neto et al. (2013, 2016); a background information about the hypervolume, the metric that we used to compare the quality of different approximation sets of the Pareto front of the multi-objective optimization problem. 2. Bi-objective harvest scheduling 2.1. Some underlying concepts In this study, we consider forests classified into stands, a current forestry practice. A stand is a grouping of vegetation sufficiently uniform in species composition, age, and condition to be managed as a single unit. As mature forests, i.e., forests that are dominated by mature trees, are strongly related to the supply of wildlife habitat (Franklin & Forman, 1987), our attention goes to this type of forests. A tree is mature if it is older than a certain threshold age. A stand is mature if is dominated by mature trees. A forest patch (a continuous area of the forest) can be classified into edge and core area (or interior space). Core area is defined as the interior area of the patch where ecological functioning is not impacted by the effect of immediate surrounding conditions, the so-called edge effect. The edge effect corresponds to a buffer area (edge), separating the core area from outside influences, and is due to neighbors clearcuts (or patches of early successional species) and non-forest patches (Fig. 1). In this study, the core area is measured by using the concept of subregion which is based on the concept of impact zone. We consider that the edge effect induced by a harvested stand attains a region – the impact zone – starting in the stand’s boundary up to a constant distance outward (the depth of the edge effect) (Baskent & Jordan, 1995). The intersection of the impact zones of all stands and the stands themselves determines a set of subregions. Each subregion is included in one stand. Decisions regarding adjacent stands and the stand itself (clearcutting or doing nothing) may determine whether the subregion is a piece of edge or core area. A subregion is neither a core area nor a piece of edge if it is not included in a mature stand (Fig. 2). Connectivity represents the degree to which landscape promotes or prevents species movements among resource patches. It depends on the travel distances that species need to cover to populate the habitats and on the existence of intermediate steps (e.g.,

Please cite this article as: T. Neto, M. Constantino and I. Martins et al., A multi-objective Monte Carlo tree search for forest harvest scheduling, European Journal of Operational Research, https://doi.org/10.1016/j.ejor.2019.09.034

ARTICLE IN PRESS

JID: EOR 4

[m5G;October 17, 2019;5:34]

T. Neto, M. Constantino and I. Martins et al. / European Journal of Operational Research xxx (xxxx) xxx

Fig. 1. Core area (shaded area) and edge (white area) for two different patch shapes (a) and (b) (area and immediate surrounding conditions held constant), for two different patch sizes (b) and (c) (shape and immediate surrounding conditions held constant), and for two different immediate surrounding conditions (c) and (d) (area and shape held constant).

A path between two habitats h and h is made up of a sequence of distinct mature patches 1 = h, 2, 3, . . . , n = h , such that the pair (i, i + 1 ), with i = 1, . . . , n − 1, represents a direct movement of an individual from i to i + 1, without passing through any other mature patch. The intermediate mature patches can also meet a minimum required area, which can be less than the minimum habitat area if they are intermediate steps or corridors. Let l and m be two mature patches. The possibility of a direct movement occurrence of an individual between l and m is measured by the indicator plm . If the distance to each other allows species dispersal, then plm is given by

plm = e−β dlm ,

Fig. 2. Five-stand mature forest with the subregions. Stand B encompasses the subregions B1 up to B6 . The impact zone of stand B is {A2 , C2 , C4 , D2 , D4 , E2 }. If stand B is harvested, the gray areas stay as core area subregions and the light gray areas become edge subregions.

forest non-habitat patches) or corridors (e.g., strips of forest) that shorten these distances. In this work, a habitat is defined as a mature patch meeting a minimum core area. Such a patch may provide the environmental conditions (e.g., climate, topography, etc.) and the life needs (e.g., food, shelter, etc.) for a given set of wildlife species, but may not be suitable for other species. For example, in Fig. 1, although both (mature) patches (a) and (b) have the same area, for the concerned species, (b) may have enough core area to be habitat and (a) may not, while for other species, (a) may have enough core area or, on the contrary, the core area in (b) may not be large enough. The probability of connectivity index, used to summarize the characteristics of forest connectivity is, for each period t, as follows:

 It =

 h∈Ht

h ∈Ht sh sh ghh 2

F

,

(1)

where Ht is the set of all habitats in period t; sh and sh are the areas of habitats h and h , with h, h ∈ Ht ; F is the total area of the forest; ghh is the largest connectivity of paths from h to h . Details about how to compute ghh are given next.

(2)

where β is a constant greater than zero called the coefficient of dispersion (species’ dependent), and dlm is the edge-to-edge distance l and m. Constant β is computed considering a distance d for which the possibility of a direct movement p is empirically known, β = − ln p /d . The smaller the inter-mature patch distance, and the more favorable the occurrence of a direct movement, the closer the indicator is to 1. If l and m are too distant from each other to allow species dispersal, plm is considered to be null. Therefore, plm ∈ [0, 1]. The connectivity of a path between habitats h and h is given by the product of the indicators pi,i+1 of all pairs (i, i + 1 ) that form the path; ghh indicates the largest connectivity among all paths between h and h , corresponding to the path with the greatest chance of dispersion (ghh ∈ [0, 1]). It ranges from 0 to 1, and increases with connectivity improvement. It is equal to 1 when all the forest is a single habitat, and is equal to zero when there are no habitats, or all habitats are completely isolated (by being too distant). 2.2. The multi-objective optimization problem The bi-objective harvest scheduling problem (referred to as problem P) deals with determining which stands should be harvested in each period during a given planning horizon in order to maximize two potentially conflicting objectives, the net present value generated by the harvestings and the probability of connectivity index (the smallest value of the index over the planning horizon). Stand selection is subject to volume and average ending age constraints (non-spatial constraints) and constraints on clearcut size, total habitat area and total core area (spatial constraints) as follows. Volume constraints: lower and upper bounds on the volume of timber harvested in each period. These constraints ensure a regular production of timber. Average ending age constraints: a minimum average age for the forest at the end of the planning horizon. These constraints

Please cite this article as: T. Neto, M. Constantino and I. Martins et al., A multi-objective Monte Carlo tree search for forest harvest scheduling, European Journal of Operational Research, https://doi.org/10.1016/j.ejor.2019.09.034

ARTICLE IN PRESS

JID: EOR

[m5G;October 17, 2019;5:34]

T. Neto, M. Constantino and I. Martins et al. / European Journal of Operational Research xxx (xxxx) xxx

5

Table 1 Constraint parameters of the harvest scheduling problem. Constraint

Parameter

Notation

Volume

Target volume of timber to be harvested Deviation allowed from target volume Minimum age requirement, in terms of periods Upper bound on the area of a clearcut Lower bound on the total habitat area Lower bound on the core area of a habitat Lower bound on the total core area within habitats

v0

Average ending age Maximum clearcut size Total habitat area Total core area

aim at mainly preventing harvestings in any part of the forest where any immediate profit can be made, ensuring the quality of the forest age profile at the end of the planning horizon. Maximum clearcut size constraints: an upper bound on the area of each clearcut. For the sake of simplicity, it is considered that stands adjacent to clearcuts cannot just be harvested in the period of the interventions, i.e. the minimum number of periods (referred to as green-up time) in which stands adjacent to a clearcut cannot be harvested is equal to one period. Total habitat area constraints: a lower bound on the total area of habitats (edge and core area) in each period. Total core area constraints: a lower bound on the total core area within habitats in each period. In this study, it is assumed the following: The edge effect (not induced except by harvestings) only occurs during the period of the intervention and is the same for all harvested stands, i.e. the width of the impact zone is constant for all stands. Each stand is harvested at the most once, i.e., the minimum rotation in the stand is longer than the planning horizon. A minimum age is required for harvesting, and a harvested stand may become mature within the planning horizon. Harvesting occurs at the beginning of the periods, and all the periods are of the same length. To identify clearcuts and habitats in the forests, we need to define when two stands are adjacent. Two different definitions of adjacency are considered (Goycoolea, Murray, Barahona, Epstein, & Weintraub, 2005). For clearcuts, two stands are considered to be adjacent if they share a boundary of positive length. For habitats, they are considered to be adjacent if they share at least one single point (Goycoolea, Murray, Vielma, & Weintraub, 2009). The first definition is called strong adjacency and the second weak adjacency. From an aesthetic standpoint, rooted in beauty and visual appreciation, besides other forest values such as soil erosion and water quality, it is reasonable to assume that a pair of clearcuts sharing only one single point are non-adjacent and hence can not define one clearcut. As for protecting wildlife, with regard to the flow of genetic material throughout the landscape, it is reasonable to assume that a pair of mature “patches” sharing only one single point are adjacent and hence define one patch. Table 1 displays the constraint parameters of the problem. In min addition, Agemin cut is the minimum harvest age and Ageold is the minimum mature age, both in terms of periods. The volume of timber harvested in each period is constrained between the lower bound (1 − α )v0 and the upper bound (1 + α )v0 . 3. Monte Carlo tree search In this section, we describe the multi-objective Monte Carlo tree search approach that we propose for problem P. For the sake

α

Agemin end Amax min Htot Cmin Cmin tot

of better understanding this approach, we present first a general description of the Monte Carlo tree search; the four main steps are depicted in Fig. 3, using a binary tree for clarity. 3.1. General description Monte Carlo tree search (MCTS) is an iterative method that builds a tree search in an incremental and asymmetric manner, exploring more deeply parts of the tree which correspond to more promising solutions. Stochastic simulations are used for estimating the value of its nodes. The method guides search within the tree, keeping a balance between exploitation of areas where good solutions have been found, and exploring areas previously insufficiently searched. Each node keeps track of the number of simulations started from its state, as well as their outcomes, and these data are used to produce an estimate value for the node when deciding how to expand the tree. Each iteration of MCTS consists of four main steps: Selection: starts from the root node and iteratively selects the child node which currently looks more promising, until a node which has not yet been fully expanded is reached. The best known version of MCTS is the Upper Confidence Bounds for Trees (UCT), first introduced by Kocsis and Szepesvári (2006), which employs Eq. (3) as a tree node selection policy (referred to as UCB1). In this policy, the selection of a child node is treated as a multi-armed bandit problem, where the value of each child is the expected reward approximated by Monte Carlo simulations; as such, these rewards correspond to random variables with unknown distributions. Under this policy, at each node, the child node k is chosen to maximize the following value:



U (k ) = Q (k ) + σ

ln s p(k ) , sk

(3)

where σ is a constant that balances both terms (the value depends on √ the problem approached, but it is common to find σ = 2), sp(k) is the number of simulations performed under the parent node p(k), and sk is the number of simulations performed under the child node k (i.e., simulations started from k or from any node in the subtree under k). A family of policies for determining U(k) for game-playing is proposed in Auer, Cesa-Bianchi, and Fischer (2002); it was designed with rewards in the interval [0,1]. In order to adapt it to the optimization context and maintain the proper balance between the two components of U(k), we use the following expression for Q(k) (Pedroso & Rei, 2015), guaranteeing a value also in [0,1]:

Q (k ) =

∗ − W Zk∗ , ∗ − W Z∗

(4)

 ∗ are, respectively, the best and the worst where  Z ∗ and W simulation outcomes found in the whole of the tree explored so far, and  Zk∗ is the best simulation outcome under node k.

Please cite this article as: T. Neto, M. Constantino and I. Martins et al., A multi-objective Monte Carlo tree search for forest harvest scheduling, European Journal of Operational Research, https://doi.org/10.1016/j.ejor.2019.09.034

JID: EOR 6

ARTICLE IN PRESS

[m5G;October 17, 2019;5:34]

T. Neto, M. Constantino and I. Martins et al. / European Journal of Operational Research xxx (xxxx) xxx

Fig. 3. Main steps of MCTS: selection – the algorithm starts from the root node s and iteratively select child k with maximum U(k) until k is a node which has not been fully expanded (green nodes); expansion – node k generates the children nodes k + 1 and k + 2; Simulation – from the new nodes, a simulation is run by randomly choosing a child of each non-expanded node until a terminal state is reached; backpropagation – the outcome of the simulation is updated through the selected nodes (green nodes).

The first term of the right-hand side of Eq. (3) focuses on the exploitation of previously expanded nodes which appear to be promising, and the second term focuses on the exploration of a new area of the search space. As node k visits increase, the denominator on the exploration term tends to increase, lessening its contribution. If sibling nodes are visited instead, then the numerator of that term increases, promoting its exploration. Expansion: one or more child nodes of the selected node k are added to expand the tree, according to the available actions. Two strategies for node expansion could be considered (Pedroso & Rei, 2015). In the single expansion, a single child node is created using a randomly chosen unexplored decision in k; other unexplored decisions are kept for a later time when node k is again selected for expansion. In full expansion, all children of node k are immediately created by generating all possible decisions in the node. Simulation: from each node created in the expansion step, simulation is performed as a typical Monte Carlo method by randomly choosing a child of each non-expanded node until a terminal state is reached. Various approaches can be applied. The simplest approach consists in taking uniform random decisions, requiring nothing more than a generative model of the problem. Heuristic construction algorithms incorporate domain-specific knowledge, and typically allow for faster convergence at the expense of simplicity and generality. The simulation outcomes are used for computing Q(k) in Eq. (3), in the selection step. Backpropagation: step that propagates the outcome of the simulation backwards through the selected nodes. This updates statistics (simulation scores and counts) that inform future tree policy decisions on all nodes between the selected node and the root.

This process is repeated until a termination criterion is met (such as reaching the allowed number of iterations or elapsed time). A great benefit of MCTS is that the values of intermediate

states do not have to be evaluated: only the value of the terminal state at the end of each simulation is required. 3.2. MCTS for multi-objective optimization The multi-objective MCTS (MO-MCTS) builds iteratively a tree search until a computational time limit is reached. At this point, the set F of all possible non-dominated points found is returned. Each node represents a partial solution of the problem (state) and its children represents actions leading to increasingly complete solutions. The possible actions correspond to the decision of harvesting or not a given stand at a given period. A solution of the problem (terminal state) is reached at the end of a simulation, or when all (stand, period) pairs had been fixed. The result of a simulation, Z, is a vector of values for each objective at the corresponding solution. Let Z = (npv, I ), where npv is the net present value and I is the value of the connectivity index (the smallest value of the index over the planning horizon) corresponding to a solution x; npv is the sum, for all harvested stands, of the individual net present values of each stand (which is given as data). At each node, the solution x is an integer vector with as many components as the number of stands, where xi is the period at which stand i is harvested. If the stand is not harvested across the planning horizon, then xi = 0. The first step of MO-MCTS is the creation of root node s. This node contains a solution where no decision is taken and a set S of all pairs (stand i, period t) such that i is available for harvesting at t (i.e., i is not younger than Agemin cut in t, and its area is not greater than Amax ). The steps of MO-MCTS are presented in Algorithm 1. The four main steps of each iteration are explained in detail below.

Selection: at each node, Eq. (3) is used to select the child node k with maximum value U(k). The outcomes of simulation are now bi-dimensional vectors that cannot be used in this formula. As dealing with a Pareto front in the Monte Carlo tree search would be extremely complex, we reduced this twodimensional vector to a scalar based on the value of the

Please cite this article as: T. Neto, M. Constantino and I. Martins et al., A multi-objective Monte Carlo tree search for forest harvest scheduling, European Journal of Operational Research, https://doi.org/10.1016/j.ejor.2019.09.034

JID: EOR

ARTICLE IN PRESS

[m5G;October 17, 2019;5:34]

T. Neto, M. Constantino and I. Martins et al. / European Journal of Operational Research xxx (xxxx) xxx

verified. For the left branch (harvesting), constraints on volume, average ending age, maximum clearcut size, minimum total habitat area and minimum total core area of habitats are checked. For the right branch (not harvesting), only the constraint of lower bound on timber volume produced at period t needs to be checked; the values for all the other constraints are unchanged. The stand and period (i, t) chosen for expansion are the first tuple of set S, ordered by descending order of net present value of its elements. After a tuple is fixed, it is removed from S. For left branch, it is also removed any pair (i , t ) such that harvesting stand i in period t violates any the following constraints: upper bound on the volume of timber harvested in t ; ending-age constraint; maximum clearcut area for all i adjacent to i. The constraint of lower bound on the volume at period t can only be fully checked when all the decisions have been taken in that period. Otherwise, we nevertheless check if harvesting all stands still available for period t would give a volume of timber smaller that the lower bound; in such a case, the node is fathomed. Simulation: when expanding a node, for each non-infeasible child created, a simulation is performed from that point until reaching a terminal state (a complete solution of the problem or an infeasible solution). In each step of the simulation a tuple (i, t) is randomly chosen and removed from S. The decision to harvest i in t or not depends on the admissibility of the corresponding solution. If both decisions correspond to non-infeasible solutions, the choice is random. When both decisions are infeasible, the simulation ends without a feasible solution. Otherwise, the choice of the feasible alternative is obligated. Backpropagation: so as to use Eqs. (3) and (4), after an expansion, for each node in the path from a newly expanded node and the root, we update the following statistics: the number of times the node was selected and the best Z obtained in simulations of its children.

Algorithm 1: Main steps of Multi-objective Monte Carlo Tree Search. Initialization: let F := ∅ create root node s let S be the set of all possible tuples (stand, period) Step 2 Termination: if time limit is reached then if F = ∅ then nothing can be said else propose heuristic solutions that yielded F Step 1

Node selection: repeat starting from s, recursively select child k with maximum U (k ) until k is a node which has not been fully expanded Step 4 Expansion: choose a not yet fixed pair (stand, period) ∈ S; create two children: one where the stand is cut in that period, another where it isn’t let C be the set of non-infeasible children foreach k’ ∈ C do Step 5 Simulation: repeat choose randomly a (stand, period) pair not yet fixed, and create a child until a complete solution or an infeasible solution is found let Z be the penalized volume corresponding to simulation from k Step 6 Backpropagation: foreach node in the path from a newly expanded node and s do update the number of times the node was selected and the best Z obtained in simulations of its children if Z is non-dominated then F := F ∪ {Z } F := F \ {d ∈ F : d is dominated} Step 3

update set S removing tuple (stand, period) and others which violate some constraint go to Step 2

hypervolume – a scalar measure for the quality of Pareto set approximations. The hypervolume indicator of a set F of non-dominated points is the volume of the objective space dominated by F and bounded below by a reference point, usually a nadir point (in our case, (0,0); see supplementary material of the paper for more information). A similar approach for tackling this difficulty, also using the scalarization into a hypervolume, has been proposed in Wang and Sebag (2012). Our approach is the following. If a simulation outcome Z is non-dominated, Z is replaced by the value of the hypervolume of F ∪ {Z}, where F is the current set of efficient solutions in the objective space. For a dominated solution, Z is replaced by the value of the hypervolume of F minus the Euclidean distance between Z and the convex hull of F. The distance here sets a penalty for dominated points, increasing with their distance to the convex hull of F. Each simulation in our method is evaluated in terms of this penalized hypervolume value. Expansion: two child nodes of the selected node k are immediately created (full expansion), corresponding to the decision of harvesting stand i or not in period t, nodes k + 1 (left branch) and k + 2 (right branch), respectively. The admissibility of the solution corresponding to each child is

7

Even though we are using only two objectives, the method based on hypervolume described in this paper can be employed with any number of objectives (Wang & Sebag, 2012; Zitzler & Thiele, 1998), albeit at the expense of increased computational cost. 4. Computational experiment 4.1. Instances In order to test the performance of the Monte Carlo Tree Search method we have used instances publicly available in the Internet1 . The instances include: El Dorado, a National Forest in northern California, USA (referred to in Goycoolea et al. (2005), Goycoolea et al. (2009), Könnyu˝ and Tóth (2013)); Stafford, a forest in British Columbia, Canada (Crowe, Nelson, & Boyland, 2003); Kittaning4, Bear Town, PhyllisLeeper and FivePoints, forests in Pennsylvania, USA (Könnyu˝ & Tóth, 2013); WLC (Bettinger, Graet, Boston, Sessions, & Chung, 2002); FLG9 and FLG10 (Paradis & Richards, 2001), the computer generated instances (through the Forest Landscape Generator). The number of stands ranges from 32 (Kittaning4) to 1363 (El Dorado) and the number of edges from 48 to 4087 (considering weak adjacency). The length of the temporal horizon ranges from 3 to 8 periods. Some instances are tested for different lengths of horizon T; the notation used is instanceT . Values for α , v0 , Agemin end , min Hmin tot , Ctot and other parameters of the problem are fixed as in 1

Instances have been obtained at the web site http://www.unbf.ca/fmos/.

Please cite this article as: T. Neto, M. Constantino and I. Martins et al., A multi-objective Monte Carlo tree search for forest harvest scheduling, European Journal of Operational Research, https://doi.org/10.1016/j.ejor.2019.09.034

[m5G;October 17, 2019;5:34]

695 417 3468 2890 18,415 10,523 4437 3169 426 256 102,638 68,425 114,149 76,099 668,558 683,632 10 10 5 5 4.5 4.5 5 5 5 5 2.5 2.5 2.5 2.5 7.5 5 15 15 10 10 10 10 10 10 10 10 5 5 5 5 10 10 20 20 40 40 40 40 40 40 40 40 46 46 46 46 40 40 20 20 40 40 40 40 40 40 40 40 46 46 46 46 40 40 6/60 6/60 8/80 8/80 8/40 8/40 8/80 8/80 6/60 6/60 8/40 8/40 8/40 8/40 6/60 8/80 6/60 6/60 8/80 8/80 8/40 8/40 8/80 8/80 6/60 6/60 8/40 8/40 8/40 8/40 6/60 6/60 238 238 546 546 897 897 646 646 677 677 10,000 10,000 10,000 10,000 10,444 21,147 66.59 66.59 95.49 95.49 46.58 46.58 94.38 94.38 63.11 63.11 27.02 27.02 31.91 31.91 50.83 105.86 (10) (10) (10) (10) (5) (5) (10) (10) (10) (10) (5) (5) (5) (5) (10) (10) 3 5 3 5 3 7 3 5 3 5 3 8 3 8 3 3 2.70 2.70 2.82 2.82 2.66 2.66 2.70 2.70 3.00 3.00 2.47 2.47 2.47 2.47 2.58 2.00 0.71 0.71 0.70 0.70 1.33 1.33 0.67 0.67 0.50 0.50 1.10 1.10 0.99 0.99 0.99 1.06 337 337 776 776 675 675 961 961 1347 1347 9129 9129 10,151 10,151 10,560 19,910 7.44 7.44 7.69 7.69 12.28 12.28 7.26 7.26 7.52 7.52 13.10 13.10 11.76 11.76 10.36 15.52

(hectare) (periods/ years) (periods/ years) (hectare) (years) (years per period) Average no. stands Average area (hectare)

Average area (hectare) No. No.

32 32 71 71 73 73 89 89 90 90 763 763 850 850 1008 1363 Kittaning43 Kittaning45 Bear Town3 Bear Town5 WLC3 WLC7 PhyllisLeeper3 PhyllisLeeper5 Five Points3 Five Points5 FLG103 FLG108 FLG93 FLG98 Stafford El Dorado

(hectare)

(% of F)

(% of F)

v0 Cmin tot Hmin tot Cmin Amax Agemin old Agemin cut F age1

No. periods

The platform used was an Intel Core i7 quad-core CPU, running at 3.4 gigahertz in a Mac OS X version 10.10.2, with 24 gigabyte of RAM. All programs were implemented in Python (version 2.7.9). We ran MO-MCTS 12 times for each instance. MO-MCTS was allowed to run for two hours at each time. For each instance, a solution that appeared to be Pareto-optimal within a run may no longer be Pareto-optimal in another run. We shall refer to the first solution as false-efficient and the solution that remains Pareto-optimal over all runs as keep-efficient. Thus, the set of efficient solutions obtained by each run as well as the set of keep-efficient solutions over all runs define potential Pareto fronts for each run and over all runs, respectively. For the sake of simplicity, we refer to both sets simply as Pareto fronts. Five measures were used to evaluate the performance of MOMCTS: (i) the average number of efficient solutions (including false-efficient and keep-efficient), (ii) the average number of keepefficient solutions and (iii) the average number of simulations obtained per run (iv) the dimensionless coefficient of variation of hypervolume (DCV ) and (v) the relative range of hypervolume (RV ) over all runs (see the supplementary material for the paper for the concept of hypervolume). Measure (i) and (ii) reflect the ability of the method to obtain efficient solutions and keep-efficient solutions, respectively. Smaller values of measure (iii) indicate larger, more difficult instances. Measures (iv) and (v) are related to the ability of the method to obtain diversity in the set of efficient solutions; both are measures of relative dispersion of the hypervolume and they normalize a dispersion measure with a localization measure. The coefficient of variation is the ratio between the standard deviation of the hypervolumes and the average of these values, normalizing standard deviation with average. The relative range is the ratio between the total range of the hypervolumes (the maximum minus the minimum) and the maximum hypervolume, normalizing the total range with the maximum value. Larger values of the coefficient of variation and relative range indicate a better ability to obtain diversity in the set of efficient solutions. Table 3 displays, for each instance, the average number of efficient solutions, keep-efficient solutions and simulations obtained by each run, and the values for DCV and RV . The average number of efficient solutions ranged from 2.50 (ElDorado) to 13.58 (Kittaning43 ). Five instances (31.3% of the instances) had an average number of keep-efficient solutions not less than one, meaning that each run contributes, on average, to one keep-efficient solution at least. Kittaning43 was the instance with the largest number of simulations, as opposed to ElDorado. The smallest numbers of efficient solutions and simulations were observed in case of the large instances. As expected, in general, the number of efficient solutions increase with the increase of the number of simulations. For the medium sized instance WLC, a small number of efficient solutions were provided within a large number of simulations. We noted a tendency for many efficient solutions to cease to be efficient among all runs. This leads us to conclude that MO-MCTS must be run several times to find better approximations to the optimal Pareto front. In respect to DCV and RV values, for DCV , the smallest value occurs for WLC3 (0.0 0 05) and the largest value for FLG98 (0.3214). Concerning RV , the smallest value occurs also for WLC3 (0.0016) and the largest value for FivePoints5 (0.4426). Since both indicators assume a wide range of values, the approach appears to provide from very small variability to considerable variability on the hypervolume over the runs. Or, in other words, the ability of

Subregions (impact zone 50 meter)

4.2. Computational results

Stands

Neto et al. (2013) and Neto et al. (2016) (see the supplementary material for the paper for a summary description). An impact zone width of 50 meter has been used to generate the subregions. Table 2 summarizes the characteristics of the instances.

(meter 3 )

T. Neto, M. Constantino and I. Martins et al. / European Journal of Operational Research xxx (xxxx) xxx

Instance

8

ARTICLE IN PRESS Table 2 Characteristics of the instances: with respect to subregions, the third column displays the average number of stands that determine whether a subregion is a piece of edge or core area; age1 is the average age of the stands in min max is the maximum clearcut area; Cmin is the minimum core area for a habitat; Hmin is the minimum total the first period; F is the area of the forest; Agemin cut is the minimum harvest age; Ageold is the minimum mature age; A tot habitat area in each period; Cmin tot is the minimum total core area in each period; v0 is the target volume for each period.

JID: EOR

Please cite this article as: T. Neto, M. Constantino and I. Martins et al., A multi-objective Monte Carlo tree search for forest harvest scheduling, European Journal of Operational Research, https://doi.org/10.1016/j.ejor.2019.09.034

JID: EOR

ARTICLE IN PRESS

[m5G;October 17, 2019;5:34]

T. Neto, M. Constantino and I. Martins et al. / European Journal of Operational Research xxx (xxxx) xxx

9

Fig. 4. Pareto front for Kittaning43 obtained in each of the 12 runs (x-axis and y-axis represent the values of I and npv (103 euros), respectively). The gray lines show the evolution of the Pareto front during a particular run; the dark solid line is the best Pareto front reached in that run; the dotted line represents the best Pareto front over all runs.

the method to generate different approximations to the optimal Pareto front seems to be irregular. For DCV , the first quartile, median and third quartile are 0.028825, 0.05745 and 0.112925, respectively. For Rv , these measures are 0.083900, 0.1686 and 0.289425, respectively. The plot of the coefficients of variation and relative ranges of hypervolume against the size of the instances (number of stands and periods) did not seem to reveal a pattern in these measures of relative dispersion of the hypervolume along the size of the instances. Fig. 4 represents the Pareto fronts obtained for Kittaning43 . Each graphic corresponds to one run (at the top, the first six runs), where the x-axis and y-axis represent the values of I and npv (103

euros), respectively. The dark solid line corresponds to the best Pareto front for each run (false-efficient and keep-efficient solutions) and the dashed line corresponds to the best Pareto front over all runs (keep-efficient solutions). Gray lines show the evolution of the Pareto front throughout each run. This representation enables us to observe, in each figure, the sequence of nondominated points generated by an MO-MCTS run. It also shows that each run generated different sets of efficient solutions, hence producing a different Pareto front approximation. This figure allow us to observe that the trade-offs between net present value of timber harvested and probability of connectivity index are relatively flat between some consecutive alternative

Please cite this article as: T. Neto, M. Constantino and I. Martins et al., A multi-objective Monte Carlo tree search for forest harvest scheduling, European Journal of Operational Research, https://doi.org/10.1016/j.ejor.2019.09.034

ARTICLE IN PRESS

JID: EOR

−52.17 −32.55 −0.73 −5.80 −0.15 −0.81 −0.32 −2.19 −12.36 −4.25 −11.37 −4.17 −4.79 −2.54 −5.42 −2.68

npv−npv∗ npv∗

801,334 936,671 4,337,020 4,249,328 4,629,039 4,146,478 5,571,034 4,840,641 470,481 376,620 36,050,416 44,331,128 42,661,754 49,067,652 127,204,397 1,966,640 50 53 100 101 108 119 59 100 98 108 1460 1433 1462 1443 983 4388 55 55 116 103 116 121 72 107 111 109 1506 1501 1502 1500 1045 5145 1 1 1 1 1 1 1 1 1 1 20 15 18 15 17 11 1675,397 1,388,788 4,368,877 4,510,859 4,635,768 4,180,389 5,589,066 4,949,207 536,853 393,334 40,675,587 46,257,990 44,808,615 50,346,461 134,494,025 2,020,748

0.0536 0.0536 0.0452 0.0355 0.0151 0.0147 0.0309 0.0278 0.0272 0.0262 0.0091 0.0088 0.0099 0.0097 0.0035 0.0307

745 24 4540 2907 4753 4927 6306 2131 6801 5577 2258 6371 4382 4741 1644 7200

(euros)

I∗

0.1940 0.2567 0.0965 0.0561 0.0174 0.0174 0.0845 0.0439 0.0527 0.0796 0.0113 0.0110 0.0121 0.0117 0.0042 0.0429

124 2982 4523 5095 1528 4758 4831 4092 5394 5568 6733 5424 4288 4763 5173 7200

2 2 1 1 1 1 1 1 1 2 12 16 9 13 14 9

123 147 169 129 122 190 187 136 155 220 1503 1500 1501 1502 1049 5651

116 142 162 101 119 184 171 135 154 216 1471 1422 1431 1444 965 4908

261.9 378.9 113.5 58.0 15.2 18.4 173.5 57.9 93.8 203.8 24.2 25.0 22.2 20.6 20.0 39.7

I∗ −I 100 I

time (seconds)

No. habitats

Habitat area (hectare)

Core area (hectare)

Trade-offs (%)

100

Kittaning43 Kittaning45 BearTown3 BearTown5 WLC3 WLC7 PhyllisLeeper3 PhyllisLeeper5 FivePoints3 FivePoints5 FLG103 FLG108 FLG93 FLG98 Stafford ElDorado

solutions, mainly in the left side of the graphic, and very pronounced between other solutions, in the other side, where the values of the connectivity index are larger. In the first case, ensuring a more inter-habitat connectivity is not always obtained at the expense of a significant reduction in the present value. In the second case, slight increases of inter-habitat connectivity produce considerably less net present values. Similar figures for other instances show this pattern, while for other instances, the trade-offs are always flat. Table 4 shows some characteristics of the keep-efficient solutions corresponding to the best net present value and to the best connectivity index. The values of the number of habitats, their total area and total core area, refer to the last period of the planning horizon. There is a tendency for the total area and total core area of habitats to increase in the solutions with the best connectivity index, in particular, for the small and medium instances. For these instances, generally the number of habitats remained the same. For Kittaning43/5 and FivePoints5 , this number increased from one to two habitats. For almost all the large instances, the number of habitats decreased. For FLG108 , the number of habitats did not change, and for FLG93 , the number of habitats decreased

(euros)

Fig. 5. Stand map representing the solutions obtained for FLG103 with the best net present value or the best connectivity index, concerning the last period of the planning horizon.

Best connectivity index

0.2393 0.3427 0.1726 0.0800 0.0016 0.0047 0.3485 0.2970 0.2028 0.4426 0.1114 0.0852 0.0081 0.1652 0.1343 0.2869

npv

0.0983 0.1498 0.0590 0.0233 0.0005 0.0018 0.1265 0.1084 0.0559 0.2565 0.0310 0.0292 0.0277 0.3214 0.0410 0.0899

Core area (hectare)

14763647.25 9459201.92 7326169.33 5572913.08 6590686.27 3731875.00 4416487.42 3180922.75 3326018.58 2033565.67 43826.42 18312.42 30022.75 14015.75 16131.92 9172.83

Habitat area (hectare)

1.17 1.50 1.00 1.17 0.33 0.33 0.83 0.25 0.25 1.33 0.67 0.42 0.33 0.42 0.42 0.42

RV

No. habitats

13.58 12.67 10.17 11.00 3.91 2.92 10.17 9.00 10.42 10.50 7.00 2.42 3.75 2.58 3.50 2.50

DCV

time (seconds)

Simulations

I

Keep-efficient solutions

Best net present value

Efficient solutions

npv∗

Kittaning43 Kittaning45 BearTown3 BearTown5 WLC3 WLC7 PhyllisLeeper3 PhyllisLeeper5 FivePoints3 FivePoints5 FLG103 FLG108 FLG93 FLG98 Stafford ElDorado

Average number of

Instance

Table 3 Average number of efficient solutions (false-efficient and keep-efficient), keepefficient solutions and simulations by run, coefficient of variation (DCV ) and relative range (RV ) of the hypervolume over all runs; an efficient solution that ceases/continues to be efficient among all runs is false-efficient/keep-efficient. Instance

[m5G;October 17, 2019;5:34]

T. Neto, M. Constantino and I. Martins et al. / European Journal of Operational Research xxx (xxxx) xxx Table 4 Net present and index values, CPU time, number, total area and total core area of habitats in the last period for the keep-efficient solutions with the best net present value (npv∗ ) or the best connectivity index (I∗ ). Trade-off between these two solutions.

10

Please cite this article as: T. Neto, M. Constantino and I. Martins et al., A multi-objective Monte Carlo tree search for forest harvest scheduling, European Journal of Operational Research, https://doi.org/10.1016/j.ejor.2019.09.034

JID: EOR

ARTICLE IN PRESS T. Neto, M. Constantino and I. Martins et al. / European Journal of Operational Research xxx (xxxx) xxx

(significantly) as well as their total area and core area (slightly). Fig. 5 shows, for FLG103 , that these habitats become closer to each other compared with those solutions that provide the best net present value. For all instances except Stafford, the values of the connectivity index provided by the keep-efficient solution corresponding to the best net present value makes the total habitat area constraints redundant (see the discussion about the relationship between probability of connectivity index and habitat area in the supplementary material of the paper). Table 4 also shows, for each instance, the trade-off between the keep-efficient solutions referred to above. To reach the best connectivity index, the net present value decreased from 0.15% (WLC3 ) to 52.17% (Kittaning3 ). The net present value decreased within 5% or slight above for the majority of the instances, with increases of the connectivity index from 15 to 200%. 5. Conclusions This work deals with crucial spatial issues that affect forest fragmentation caused by harvesting: total habitat area, total core area inside habitats and inter-habitat connections. We consider a bi-objective harvest scheduling problem with constrains on the maximum clearcut area and on the habitats’ total area and total core area. The problem aims at maximizing both the net present value and the probability of connectivity index. Core area is modeled directly using the concept of subregions, and inter-habitat connections is measured by the probability of connectivity index. We developed a multi-objective Monte Carlo tree search (MOMCTS) to find a set of efficient solutions of the problem. This approach was tested with forests ranging from 32 to 1363 stands, and temporal horizons ranging from three to eight periods. We ran MO-MCTS 12 times for each instance, each time allowing MO-MCTS to run for two hours at the most. The structure of MO-MCTS allows us to know the sequence of non-dominated points that are being generated. Due to the stochastic nature of the method, each run usually gives different set of efficient solutions. The average number of efficient solutions obtained for each instance ranges from 2.50 to 13.58. This average decreases significantly when we select the efficient solutions obtained in all runs (keep-efficient solutions). This means that running MO-MCTS more times, or for a longer time in each run, allows us to discard false-efficient solutions, which may provide better approximations to the optimal Pareto front. With respect to the trade-off between net present value of timber harvested and probability of connectivity index, we observed that in general slightly decreasing the net present value with respect to its best value may allow a significant increase in the connectivity index. Ensuring the best connectivity index may require a significant reduction in the net present value; however, there are instances where even the solution with the best connectivity is obtained with a low reduction of the net present value. By comparing the best solutions in terms of connectivity index with the best solutions in terms of net present value (among the keep-efficient solutions) concerning the habitats, we found that, from the latter to the former solutions, the total area of habitats increased for almost all instances and the total core area of habitats increased with the total area of habitats. The number of habitats decreased for the majority of the large instances, and increased or did not change for the small and medium instances. MO-MCTS can be used to assist decision makers by providing several efficient alternative solutions, thus allowing him/her to choose the one with the best trade off between net present value and probability of connectivity index, according to his/her own preferences. Further work to improve the ability of the algorithm to generate more efficient solutions and to solve much larger instances is needed. A great merit of MO-MCTS is that its structure

[m5G;October 17, 2019;5:34] 11

can be easily adapted to include other issues, so further work may also include other forest management concerns. Supplementary material Supplementary material associated with this article can be found, in the online version, at doi:10.1016/j.ejor.2019.09.034. References Auer, P., Cesa-Bianchi, N., & Fischer, P. (2002). Finite-time analysis of the multiarmed bandit problem. Machine Learning, 47(2–3), 235–256. Baskent, E. Z., & Jordan, G. A. (1995). Characterizing spatial structure of forest landscapes. Canadian Journal of Forest Research, 25(11), 1830–1849. Bettinger, P., Graet, D., Boston, K., Sessions, J., & Chung, W. (2002). Eight heuristic planning techniques applied to three increasingly difficult wildlife planning problems. Silva Fennica, 36(2), 561–584. Browne, C. B., Powley, E., Whitehouse, D., Lucas, S. M., Cowling, P. I., Rohlfshagen, P., . . . Colton, S. (2012). A survey of Monte Carlo tree search methods. IEEE Transactions on Computational Intelligence and AI in Games 4, 1–43. Caro, F., Constantino, M., Martins, I., & Weintraub, A. (2003). A 2-opt tabu search procedure for the multiperiod forest harvesting problem with adjacency, greenup, old growth, and even flow constraints. Forest Science, 49(5), 738–751. Chalmet, L. G., Lemonidis, L., & Elzinga, D. J. (1986). An algorithm for the bi-criterion integer programming problem. European Journal of Operational Research, 25, 292–300. Clements, S. E., & Dallain, P. L. (1990). An operational, spatially constrained harvest scheduling model. Canadian Journal of Forest Research, 20, 1438–1447. Coulom, R. (2006). Efficient selectivity and backup operators in Monte-Carlo tree search. In Proceedings of the international conference on computers and games (pp. 72–83). Springer. Crowe, K., Nelson, J., & Boyland, M. (2003). Solving the area-restricted harvest-scheduling model using the branch and bound algorithm. Canadian Journal of Forest Research, 33(9), 1804–1814. Davis, L. S. (1977). Understanding shape: Angles and sides. IEEE Transactions on Computers, 100(3), 236–242. Eswaran, P. K., RavindranH, A., & Moskowitz, H. (1989). Algorithms for nonlinear integer bicriterion problems. Journal of Optimization Theory and Applications, 63(2), 261–279. Falcão, A. O., & Borges, J. (2002). Combining random and systematic search heuristic procedures for solving spatially constrained forest management scheduling models. Forest Science, 48(3), 608–621. Franklin, J. F., & Forman, R. T. T. (1987). Creating landscape patterns by forest cutting: Ecological consequences and principles. Landscape Ecology, 1(1), 5–18. Geoffrion, A. M. (1968). Proper efficiency and the theory of vector maximization. Journal of Mathematical Analysis and Applications, 22(3), 618–630. Goycoolea, M., Murray, A. T., Barahona, F., Epstein, R., & Weintraub, A. (2005). Harvest scheduling subject to maximum area restrictions: exploring exact approaches. Operations Research, 53(3), 490–500. Goycoolea, M., Murray, A. T., Vielma, J. P., & Weintraub, A. (2009). Evaluating approaches for solving the area restriction model in harvest scheduling. Forest Science, 55(2), 149–165. Haddad, N. M., Brudvig, L. A., Clobert, J., Davies, K. F., Gonzalez, A., Holt, R. D., . . . Collins, C. D., et al. (2015). Habitat fragmentation and its lasting impact on earth’s ecosystems. Science Advances, 1(2), e150 0 052. Haimes, Y. Y., Lasdon, L. S., & Wismer, A. (1971). On a bicriterion formulation of the problems of integrated system identification and system optimization. IEEE Transactions on Systems Man and Cybernetics, 1(3), 296–297. Harris, L. D. (1984). The fragmented forest: Island biogeography theory and the preservation of biotic diversity. University of Chicago press. Hof, J., Bevers, M., Joyce, L., & Kent, B. (1994). An integer programming approach for spatially and temporally optimizing wildlife populations. Forest Science, 40(1), 177–191. Hoganson, H., Wei, Y., & Hokans, R. H. (2005). Integrating spatial objectives into forest plans for Minnesota’s National Forests. Technical Report 656. USDA Forest Service - General Technical Report PNW-GTR. Kocsis, L., & Szepesvári, C. (2006). Bandit based Monte-Carlo planning. In Proceedings of the Ecml: 6 (pp. 282–293). Springer. ˝ N., & Tóth, S. F. (2013). A cutting plane method for solving harvest schedulKönnyu, ing models with area restrictions. European Journal of Operational Research, 228(1), 236–248. Kurttila, M., Pukkala, T., & Loikkanen, J. (2002). The performance of alternative spatial objective types in forest planning calculations: a case for flying squirrel and moose. Forest Ecology and Management, 166(1), 245–260. Martins, I., Constantino, M., & Borges, J. (1999). Forest management models with spatial structure constraints. Working Paper 2. CIO/Faculdade de Ciências de Lisboa. Martins, I., Constantino, M., & Borges, J. (2005). A column generation approach for solving a non-temporal forest harvest model with spatial structure constraints. European Journal of Operational Research, 161(2), 478–498. Martins, I., Ye, M., Constantino, M., Fonseca, M., & Cadima, J. (2014). Modeling target volume flows in forest harvest scheduling subject to maximum area restrictions. TOP, 22(1), 343–362.

Please cite this article as: T. Neto, M. Constantino and I. Martins et al., A multi-objective Monte Carlo tree search for forest harvest scheduling, European Journal of Operational Research, https://doi.org/10.1016/j.ejor.2019.09.034

JID: EOR 12

ARTICLE IN PRESS

[m5G;October 17, 2019;5:34]

T. Neto, M. Constantino and I. Martins et al. / European Journal of Operational Research xxx (xxxx) xxx

McGarigal, K., Cushman, S. A., Neel, M. C., & Ene, E. (2002). FRAGSTATS: Spatial pattern analysis program for categorical maps. University of Massachusetts, Amherst URL https://www.umass.edu/landeco/research/fragstats/fragstats.html. Moellering, H., & Rayner, J. N. (1981). The harmonic analysis of spatial shapes using dual axis fourier shape analysis (DAFSA). Geographical Analysis, 13(1), 64–77. Nelson, J., & Brodie, J. D. (1990). Comparison of random search algorithm and mixed integer programming for solving area-based forest plans. Canadian Journal of Forest Research, 20(4), 934–942. Neto, T., Constantino, M., Martins, I., & Pedroso, J. D. (2013). A branch-and-bound procedure for forest harvest scheduling problems addressing aspects of habitat availability. International Transactions in Operational Research, 20(5), 689–709. Neto, T., Constantino, M., Martins, I., & Pedroso, J. P. (2016). Forest harvest scheduling with clearcut and core area constraints. Annals of Operations Research. doi:10.1007/s10479- 016- 2313- 2. O’Hara, A. J., Faaland, B. H., & Bare, B. B. (1989). Spatially constrained timber harvest scheduling. Canadian Journal of Forest Research, 19(6), 715–724. Öhman, K. (20 0 0). Creating continuous areas of old forest in long-term forest planning. Canadian Journal of Forest Research, 30(11), 1817–1823. Öhman, K., & Eriksson, L. O. (1998). The core area concept in forming contiguous areas for long-term forest planning. Canadian Journal of Forest Research, 28(7), 1032–1039. Öhman, K., & Lämås, T. (2005). Reducing forest fragmentation in long-term forest planning by using the shape index. Forest Ecology and Management, 212(1), 346–357. Öhman, K., Pukkala, T., et al. (2002). Spatial optimisation in forest planning: A review of recent swedish research.. Multi-Objective Forest Planning, 153–172. Öhman, K., & Wikström, P. (2008). Incorporating aspects of habitat fragmentation into long-term forest planning using mixed integer programming. Forest Ecology and Management, 255(3), 440–446. Paradis, G., & Richards, E. W. (2001). Flg: A forest landscape generator. CORS-SCRO Bulletin, 35(3). Pedroso, J. J., & Rei, R. (2015). Tree search and simulation. In M. Mujica Mota, I. F. De La Mota, & D. Guimarans Serrano (Eds.), Applied simulation and optimization (pp. 109–131). Springer International Publishing. doi:10.1007/ 978- 3- 319- 15033- 8_4. Rebain, S., & McDill, M. E. (2003a). Can mature patch constraints mitigate the fragmenting effects of harvest opening size restrictions? International Transactions in Operational Research, 10(5), 499–513.

Rebain, S., & McDill, M. E. (2003b). A mixed-integer formulation of the minimum patch size problem. Forest Science, 49(4), 608–618. Sabharwal, A., Samulowitz, H., & Reddy, C. (2012). Guiding combinatorial optimization with UCT. In Proceedings of the ninth international conference on integration of ai and or techniques in constraint programming for combinatorial optimization problems (CPAIOR 2012) (pp. 356–361). Berlin, Heidelberg: Springer. Saura, S., & Hortal, L. (2007). A new habitat availability index to integrate connectivity in landscape conservation planning: comparison with existing indices and application to a case study. Landscape and Urban Planning, 83(2), 91–103. Snyder, S., & ReVelle, C. (1997). Multiobjective grid packing model: an application in forest management. Location Science, 5(3), 165–180. Steuer, R. E. (1986). Multiple criteria optimization: Theory, computation and application. Graduate Texts in Mathematics. Wiley. Tóth, S. F., McDill, M. E., & Rebain, S. A. (2006). Finding the efficient frontier of a bi-criteria, spatially explicit, harvest scheduling problem. Forest Science, 52(1), 93–107. Wang, W., & Sebag, M. (2012). Multi-objective Monte Carlo tree search. In Proceedings of the Asian conference on machine learning (pp. 507–522). Wei, Y. H., & Hoganson, H. (2007). Scheduling forest core area production using mixed integer programming. Canadian Journal of Forest Research, 37(10), 1924–1932. Wei, Y., & Hoganson, H. M. (2008). Tests of a dynamic programming-based heuristic for scheduling forest core area production over large landscapes. Forest Science, 54(3), 367–380. Wentz, E. A. (20 0 0). A shape definition for geographic applications based on edge, elongation, and perforation. Geographical Analysis, 32(2), 95–112. Williams, J. C. (1998). Delineating protected wildlife corridors with multi-objective programming. Environmental Modeling and Assessment, 3, 77–86. Xia, L. (1996). Technical note. a method to improve classification with shape information. International Journal of Remote Sensing, 17(8), 1473–1481. Zhang, H., Constantino, M., & Falcão, A. (2011). Modeling forest core area with integer programming. Annals of Operations Research, 190(1), 41–55. Zitzler, E., & Thiele, L. (1998). Multiobjective optimization using evolutionary algorithms – A comparative case study. In Proceedings of the international conference on parallel problem solving from nature (pp. 292–301). Springer.

Please cite this article as: T. Neto, M. Constantino and I. Martins et al., A multi-objective Monte Carlo tree search for forest harvest scheduling, European Journal of Operational Research, https://doi.org/10.1016/j.ejor.2019.09.034