Improved derivation of forest stand canopy height structure using harmonized metrics of full-waveform data

Improved derivation of forest stand canopy height structure using harmonized metrics of full-waveform data

Remote Sensing of Environment 235 (2019) 111436 Contents lists available at ScienceDirect Remote Sensing of Environment journal homepage: www.elsevi...

2MB Sizes 0 Downloads 44 Views

Remote Sensing of Environment 235 (2019) 111436

Contents lists available at ScienceDirect

Remote Sensing of Environment journal homepage: www.elsevier.com/locate/rse

Improved derivation of forest stand canopy height structure using harmonized metrics of full-waveform data

T

Chinsu Lin Department of Forestry and Natural Resources, National Chiayi University, Chiayi, 60004, Taiwan

A R T I C LE I N FO

A B S T R A C T

Keywords: Spaceborne LiDAR data Harmonized stand height model Plot-aggregate canopy height Probability density function fitting Stand height distribution model Forest dynamics Forest biomass estimation

A variety of prototypical metrics of ICESat-1 GLAS full-waveform data have been applied to estimate the forest canopy height over a footprint area and presented using the values of maximum canopy height, crown-areaweighted canopy height, or basal-area-weighted canopy height. The full-waveform pattern of a large footprint is changeable due to complicated interaction of canopy elements and terrain relief, consequently a noticeable bias may be introduced to the estimation. This paper begins with the approach of area-based growing stock volume calculation, developing harmonized metrics to integrate versatile signals in a denoised multimodal waveform of ICESat-1 data. These novel metrics were applied to derive a harmonized canopy height model for estimating the canopy height at footprint level and the distribution of canopy height at stand level. The performance of this model was evaluated with respect to the prototypical metrics derived canopy height model based on airborne LiDAR data determined canopy heights of a secondary evergreen mixed forest in a steep rugged terrain in central Taiwan. Footprint-level canopy height was evaluated via accuracy measures using a cross-validation method. Appropriateness of canopy height structure was examined via a post hoc test of the shape similarity for both ICESat-1 data-derived and airborne LiDAR-derived canopy height distribution models at stand level. Results showed that the harmonized canopy height model was able to achieve an accuracy of RMSE = 3.13 ± 0.08 m and RMSE% = 22.40 ± 0.55% as well as a significant reduction of bias. The canopy height distribution of the forest stand derived by harmonized metrics of ICESat-1 full-waveform data is statistically the same as that determined by airborne LiDAR data. Harmonized metrics provide compensatory information of trailing-edge extent and terrain index for describing canopy height. In contrast to the canopy height model using only prototypical waveform metrics, the harmonized metrics based canopy height model can mitigate the potential influence of terrain effects and dense canopy coverage on waveform pattern. Because a priori knowledge of terrain relief data is not required for modeling, the harmonized algorithm is useful for retrieving the trend of canopy height growth over the terrestrial forest ecosystem.

1. Introduction A LiDAR remote sensing system actively emits laser pulses towards the ground from an aircraft or spacecraft and senses the energy reflected by objects on the ground enabling analysis of tree canopies. Varieties of LiDAR data have been widely used to estimate forest volume, biomass, and carbon stocks. The major volumetric parameter used in area-based approaches is mean canopy height, generally derived from low resolution LiDAR data (Halla et al., 2005). However, volume/biomass/carbon estimates derived using area-based techniques are notably imprecise in mountainous forest due to changes in the slope of the terrain and heterogeneous growth among individual trees. Treebased algorithms are more accurate in determining individual tree attributes (Popescu, 2007; Reitberger et al., 2007; Lo and Lin, 2013; Lin

et al., 2018) and so more capable of providing precise estimate of forest volume/biomass/carbon stocks (Lin et al., 2016a). Airborne laser scanner (ALS) has been widely used to collect high resolution multiple laser returns over a small area in order to provide detailed information of individual tree crowns. Though ALS data has been repeatedly used for state-wide forest inventories in Europe (Noordermeer et al., 2019), the high acquisition cost of the data might remain a problem for many other countries (Popescu et al., 2018). The ICESat-1 launched in 2003 provides moderate resolution satellite altimeter data that can gather the changes of forest canopy over a wide terrestrial ecosystem. The ICESat-1 Geoscience Laser Altimeter System (GLAS) has a larger laser beam divergence which increases the footprint size and the chances of hitting both tree apexes and the ground over a footprint coverage (Mallet and Bretar, 2009). Typically,

E-mail address: [email protected]. https://doi.org/10.1016/j.rse.2019.111436 Received 28 January 2019; Received in revised form 15 September 2019; Accepted 19 September 2019 0034-4257/ © 2019 Elsevier Inc. All rights reserved.

Remote Sensing of Environment 235 (2019) 111436

C. Lin

structure heterogeneity and terrain slope increased. The pattern of a full-waveform is a result of the interaction between canopy structure and topography, and a change in the pattern is likely able to influence the relationship between individual metrics and canopy height and will introduce significant uncertainty in the estimation of forest canopy height. For example, as shown in Table 2, canopy height was positively related to leading-edge extent of a waveform in TE(6) while it was negatively related to leading-edge in TE(8). Though the metrics wfE, le10, and te10 in NE(3), NE(4) and TE(8), the models developed by Lefsky (2010) revealed an identical positive relationship with forest canopy height, their intercept coefficient regulated the canopy height prediction in an inconsistent way (positive for conifer but negative for broadleaf and mixed forest). This kind of relationship inconsistency was due to the difference of data range in datasets and co-effects of the predictors. Interestingly, an inconsistent relationship between terrain index (see Table 1) and canopy height was also observed in NE(1) and NE(2), the models developed by Hayashi et al. (2013) in the case of forest land with a slope ≤10° or > 10°. We also found that leading-edge extent of a waveform predicted canopy height in opposite ways (see negative vs. positive effects) in TE(4) and TE(6). The oppositional cause-effect relationship of such models seems to indicate that some kind of canopy features and/or terrain features were not appropriately retrieved from the full-waveform for deriving the canopy height estimation model. The pulse energy returns in the full-waveform data will be increased due to the multiple returns from subcanopy vegetation. If the surface targets are distributed in a range of altitudes within a large footprint, they form a vertical-multiple-layer canopy structure. Changes of a fullwaveform pattern should then be considered as valuable canopy information for gathering possibly critical metrics for stand height determination at footprint level. Heights of individual dominant trees can be accurately identified through high-resolution images and airborne LiDAR data (Reitberger et al., 2007; Popescu, 2007; Lin et al., 2011a; Lin et al., 2016a). This enables a reliable estimate of stand height at the footprint level to support the need of deriving harmonized metrics (refer to section 2.3) to integrate diverse features of the full-waveform data. So, the objectives of this study were (1) to demonstrate the methods for deriving harmonized metrics of a full-waveform data, (2) to develop an appropriate harmonized stand height model (abbreviated HSH model) for estimating stand height at footprint level, and (3) to examine if the HSH model is appropriate for deriving canopy height distribution at stand level.

the simplest type of forest canopy full-waveform appears to be a bimodal waveform which increases confidence in the determination of canopy height. As vertical and horizontal canopy-architecture complexity increases, the waveform becomes multimodal due to more energy being reflected from subcanopy elements such as foliage, lower tree branches and/or lower stratum vegetation. The multimodal waveform raises the probability of interference from subcanopy elements (Lee and Lucas, 2007) introducing uncertainty into canopy features over a large footprint area, particularly in the case of complicated waveforms where multiple subcanopy elements may combine with the terrain effects found in mountainous forest areas. GLAS full waveform data has been mostly applied to determine forest canopy height (Lefsky et al., 2005; Rosette et al., 2008; Simard et al., 2011; Los et al., 2012; Montesano et al., 2016) in the form of maximum canopy height (CHMAX). However, from the perspective of area-based methods of volume/biomass/carbon stocks calculation, CHMAX may not be representative of stand tree height particularly in stands where individual trees exibit a large variation in height and where trees are distributed unevenly over the space vertically and horizontally. ICESat-1 GLAS full-waveform data (hereafter the full-waveform data) derived canopy height can also be crown-area-weighted canopy height (abbreviated as CHCA) (Pang et al., 2008), basal-area-weighted canopy height (named as Lorey's height and abbreviated as CHL) (Lefsky, 2010; Fang and Cao, 2014). Trees show a wide range of size parameters due to growth, competition with neighbors, natural/anthropogenic disturbances, and slope/aspect positions. Even trees of similar total height may show a variety of crown sizes and trunk diameters in a competitive or disturbed environment (Lin et al., 2016b). This complicates the full-waveform data and leads to a difficulty in using CHL and CHCA to describe the variations of forest canopy height as well as aboveground biomass over a large footprint area (Asner and Mascaro, 2014; Fayad et al., 2014). In traditional forestry, an area-based approach is used to determine stand stock volume as a product of average diameter and average total height of trees and stem density per hectare (Carron, 1968; Tesfamichael et al., 2010; Lin and Lin, 2013; Lin et al., 2016a). Based on the tree crown levels, the average total height of dominant and codominant trees within the area of a plot or forest stand is defined as the stand height, abbreviated CHDO (Avery and Burkhart, 2002), which is related to site quality and is generally considered as an indicator of forest productivity. The full-waveform data is a footprint covering a specific area that is practically identical to a sample plot in forest inventory. By definition, applying CHDO as the representative of a footprint canopy height needs no priori knowledge of tree trunk diameter and crown size. CHDO is therefore most likely able to account for both vertical and horizontal variations in canopy structure of a forest stand over a large footprint area and can assist in the estimation of volume/ biomass/carbon stocks of forest stands. As shown in Table 1, a variety of metrics such as the full-waveform extent, leading-edge extent, trailing-edge extent, the energy quartile or distance quartile information of a waveform have been developed to describe natural features of forest canopy over slope lands. Because multimodal full-waveforms tend to be a combination of Gaussian components (Blair and Hofton, 1999; Hofton et al., 2000), parameters of Gaussian fits, i.e., the amplitude and width of the full-waveform data have also been applied to examine forest canopy height (Harding, 2005; Brenner et al., 2011; Fang and Cao, 2014). Empirical modeling is primarily the best way to achieve this purpose. According to Hayashi et al. (2013), the extent-based statistical methods use only extent related metrics as the predictors of canopy height for example the equations with codes TE(4, 8, 9) in Table 2. In contrast, the DEM-based statistical methods such as the equations coded TE(3, 5–7) use extra DEM-related variables to predict canopy height. As seen in Table 2, the equations TE (1, 2, 10) developed recently show DEM information used as a priori knowledge of extent-based method for better estimation. As noted, the full-waveform complexity is raised as the canopy

2. Materials and methods 2.1. Study site and data collection The Tsengwen-Reservoir Watershed Area (Fig. 1) in central Taiwan centered at 23° 19′ N, 120° 36′ E was selected for this study. The watershed covers an area of 48020.8 ha. The elevation ranges from 300m to 2012m, and the slope is mostly greater than 20° and averaged 31 ± 15°. The 2009 Taiwan Forestry Bureau vegetation inventory indicated this area is mainly covered with secondary forest of an evergreen broadleaf Machilus-Castanopsis and partially bamboo-broadleaf mixed forest. In addition plantations of the species Chamaecyparis formosensis, Pinus spp., Chryptomeria japonica, Acacia confusa, Cinnamomum camphora, Fraxinus griffithii, Zelkova serrata, and Liquidamar formosana account for 9.5% of the total area. The age of trees in 2017 ranged between 45 and 93 years since artificial regeneration. The forest is primarily managed for water-resource conservation by Taiwan Forestry Bureau under the National Forest Law and Water-Soil Conservation Act. The reservoir in the watershed area is the largest earth dam by volume in Taiwan regulating the flow of the Tsengwen River for the irrigation of the Chianan Plain. The ICESat-1 GLAS data (GLA01 and GLA14) with the laser identifier L3A was acquired on October 8 and 18, 2004 over the Tsengwen2

Remote Sensing of Environment 235 (2019) 111436

C. Lin

Table 1 Definition and abbreviation of the full-waveform metrics and parameters of canopy height models. Abbreviations

Descriptions

Appeared in

wfE leE, teE

Waveform extent, the distance between signal start and signal end. (Harding and Carabajal, 2005; Lefsky, 2005) Leading-edge extent and trailing-edge extent, the edge extents between the two points in the waveform with half the maximum intensity and the corresponding start and end points of the waveform. (Lefsky, 2007; Pang et al., 2008; Chen, 2010) Leading-edge 10% extent and trailing-edge 10% extent, le10 and te10 represent the height difference between the signal start/the signal end and the point where 10%/90% of the cumulative waveform energy return occurs. (Lefsky, 2010) The proportion of energy in the first and the fourth of four equal elevation divisions. (Duncanson et al., 2010; Fang and Cao, 2014) The proportion of energy in the first of four equal energy return divisions. (Duncanson et al., 2010; Fang and Cao, 2014) The 1st, 2nd, …, 9th decile of the returned-energy. (Lefsky, 2010) Amplitude/width of the nth Gaussian peaks determined by Gaussian decomposition. (Fang and Cao, 2014) The range of DEM within a footprint, also Terrain index (TI). (Lefsky, 2005) The distance between the ith vegetated peak and the ground peak. The intensity of the vegetated peak i. The Gaussian-fitted area of the vegetated peak i. Peak-distance-based harmonized metric, the original prototype of harmonized extent metrics, is determined as the average of the distance between each of the vegetated peaks and the ground peak (PDi). Energy weighted peak-distance-based harmonized metric, the vegetated peak's intensity weighted Hpd whereby effect of the distance of each vegetated peak (PDi) on a full-waveform is weighted in proportion to its intensity (PEi). Area weighted peak-distance-based harmonized metric, the vegetated peak's area weighted Hpd whereby effect of the distance of each vegetated peak (PDi) on a full-waveform is weighted in proportion to its Gaussian-fitted area (PAi). Three kinds of harmonized stand height models (HSH models), which are used to estimate the stand height within a footprint area using two predictors: a harmonized extent metric (Hpd/Hpa/Hpe) and a correction factor. Correction factor, an exogenous variable of HSH models, is used to provide compensatory information of the full-waveform complex features particularly the ground surface feature or teE. Based on the harmonized extent metric (Hpd/Hpa/Hpe) used in HSH models, this factor can be in one of the three types CFpd, CFpa, and CFpe.

TE(1–9) TE(4,6)

le10, te10 e14, e44 E14 E10,E20,…,E90 agn vs wgn demE PDi PEi PAi Hpd Hpe Hpa HSHpd, HSHpa,HSHpe CF

TE(1,2,8) TE(9, 10) TE(9, 10) Eqs. 8–10 TE(9, 10) TE(1–5) Fig. 4 Fig. 4 Fig. 4 Eq. (2) Eq. (3) Eq. (4) Eq. (5) Eqs. 8–10

Reservoir Watershed. GLA14 listed the centroid location and the shape parameters of footprints such as azimuth of major axis, the length of major and minor axes, and the mean and standard deviation of background noise. GLA01 gave raw data of the full waveforms of the energy that was returned by object surfaces and bare-earth surfaces within footprints. As atmospheric backward scattering can cause full waveform signals to be saturated and lead to a problem of outliers in a regression model, only the footprints encoded with flag Fri_qaflag = 15 or 14 (cloud-free or almost cloud-free) and saturation index satNdx = 0 or ≤8 (saturation-free or less saturated) were selected for the analysis. The products were the release No.33. According to Brenner et al. (2011), the background noise contained in the raw data of full waveforms was removed using the threshold value that was determined using Eq. (1),

A set of airborne LiDAR data obtained from the Southern Region Water Resources Office, Ministry of Economic Affairs, Taiwan (SRWRO, 2012) was used for deriving the stand height and forest structure information. The data was from January 2012 using the Leica Geosystems ALS60 at an average ground level of 2100–2500 m. The point cloud density was around 3.7 points per square meter from which 1-m cell size of DSM and DEM were produced (Chung et al., 2016). Accordingly, the 1-m resolution of canopy height model (CHM) was determined by CHM = DSM−DEM (Fig. 1). A variety of ancillary data including ortho-rectified aerial photographs and SPOT multispectral images obtained in 2004–2005 were collected for interpreting the land cover.

Threshold  =  E ‾ + nsig⋅Estd

Fig. 2 describes the procedures used for integrating a variety of data to derive stand height at footprint level and height structure at stand level. Each of the ICESat-1 GLAS waveforms was first denoised using a moving average technique and used to derive the prototypical metrics and harmonized metrics. The airborne LiDAR canopy height model was used to derive tree metrics and determine the stand height at footprint level. A regression analysis was carried out to generate appropriate

2.2. ICESat-1 GLAS waveform data and airborne LiDAR data processing

(1)

where the mean noise (E ‾ ) and standard deviation (Estd ) were given at GLAS01 and the nsig was 4.5. As shown in Fig. 1(a), there were 180 footprints aligned in two transect lines over the study site. Excluding those saturated and/or water-body covered points a total of 108 points were used.

Table 2 A comparison to the effect of full-waveform metrics and terrain index in canopy height estimation. Canopy height

Equations

Codesa

Sources

CHMAX

H= 5.902 + 0.814 × wfE − 0.877(le10 + te10) ,if demE ≤ 15 m (n = 238) H= 10.721 + 0.409 × wfE − 0.313(le10 + te10) ,if demE > 15 m (n = 238) H= 0.899 × wfE − 0.431 × demE (slope≤10°) H= 0.549 × wfE + 0.519 × demE (slope > 10°) H= 0.84 × wfE − 0.31 × demE H= 0.81 × wfE − 0.17(leE + teE) H= 4.86 + 0.91(wfE − demE) H= 0.6211(wfE − 0.3692 × demE + 0.4184 × leE ) (n = 23) H= 0.6878 × wfE − 0.1452 × demE  (n = 23) H= 0.95 + 0.59 × wfE − 0.106 × le10 − 0.074 × te10 (for conifer forest) (n = 389)  H = −4.5 + 0.55 × wfE − 0.102 × le10 − 0.0895 × te10 (for broadleaf forest) (n = 95) H= −2.3 + 0.56 × wfE − 0.106 × le10 − 0.0486 × te10 (for mixed forest) (n = 484) H= 0.152 + 0.608 × wfE + 0.573 × E14 − 0.362 × e14 − 0.239 × e44 + 0.287 × ag4 (no slope effect considered) H= 0.593 − 0.418 × e14 − 0.453 × e44 + 0.786 × ag4 + 0.153 × wg6 (slope > 20°)

TE(1) TE(2) NE(1) NE(2) TE(3) TE(4) TE(5) TE(6) TE(7) NE(3) NE(4) TE(8) TE(9) TE(10)

Hayashi et al. (2015)

CHMAX CHMAX CHMAX CHMAX CHL

CHL

a

TE and NE denotes the tested or non-tested for models' performance comparison based on the form and sources of the considerations. 3

Hayashi et al. (2013) Chen (2010) Rosette et al. (2008) Lefsky et al. (2005) Lefsky (2010)

Fang and Cao (2014)

Remote Sensing of Environment 235 (2019) 111436

C. Lin

Fig. 1. A SPOT image overlaid with two transect lines of ICESat-1 GLAS footprints (a) and airborne LiDAR data derived DEM image (b) and CHM image (c) over the Tsengwen-Reservoir Watershed area. The two boxes embedded on upper left and lower right of (c) detail the CHM variation of the parts located on relatively lower and higher DEM areas respectively. The upper left was with a maximum CHM of 29.62 m and averaged at 8.33 ± 6.35 m while the lower right was with a maximum CHM of 33.64 m and averaged at 15.97 ± 6.27 m.

deriving waveform features. A thresholding method of choosing an appropriate peak width to differentiate spike noise from meaningful peaks in a full-waveform can be made based on two general rules regarding the relationship between crown dimension and tree size. One is that the crown width of an individual tree in a forest is generally 15 times its stem diameter (D) if D ≤ 0.75 m and the living trees in a forest have a variety of diameters (Hemery et al., 2005). Another is that the predicted crown length (crown depth) of a target tree in forests may vary with a difference of 1–3 m to 5–14 m when the number of competitors (neighboring trees) is increased (Thorpe et al., 2010). This suggests that crown length estimation might have a minimum error of 1–3 m. Therefore, a peak with a width less than 1.5 m (or 10 bins) can be considered as spike noise. Moving average is a simple Low Pass Finite-Impulse-Response filter commonly used for removing wavelet noise along 1-Dimensional waveforms which takes the average of K-samples while retrieving the

footprint-level stand height estimation models using the prototypical metrics and harmonized metrics separately. Finally, a probability density function (PDF) fitting technique was applied to each dataset of the stand-height models to derive the height structure of the forest stand. Performance of the models in stand height estimation at footprint level was evaluated using cross-validation method, and the appropriateness of the models in presenting height structure of a forest stand was examined by a post hoc test of the similarity of PDF-parameter between the LiDAR-based and GLAS-based structure models using Duncan's new multiple range test (abbreviated as “Duncan's test”). 2.2.1. Removal of wavelet-noise in large-footprint full-waveform A GLAS full-waveform generally contains significant spikes which can be characterized by significantly large or small amplitudes with very short duration or bins width. This spike noise is generally the result of a series of natural disturbances, and should be removed before

Fig. 2. Flowchart of the data processing and the determination of footprint-level harmonized stand height and stand-level height structure. 4

Remote Sensing of Environment 235 (2019) 111436

C. Lin

Fig. 3. An example of waveform denoising. The red ellipse in aerial photo (a) and the canopy height model (b) showed the land cover and the height distribution of objects over the area of a GLAS full waveform in (c). (For interpretation of the references to color in this figure legend, the reader is referred to the Web version of this article.)

precedent peak position. This filter has been applied to remove noise from an in-situ collected hyperspectral reflectance and assist in deriving a vegetation signature (Lin et al., 2012). An 11 bins-width moving average filter was therefore applied to all of the waveforms to obtain denoised waveforms for further analysis. Fig. 3 illustrates the full-waveform denoising process. The black line depicts the original waveform and the denoised waveform was derived by a moving average filter as shown by the red line. The footprint was mainly covered with betel palm and broadleaf trees on the left and bare soil on the right; and the first and second pronounced peaks (B and C) of the denoised waveform appeared to be the dominant and co-dominant canopy of betel tree and broadleaf tree; the last peak D was contributed by ground surface. A and E represent the signal start and the signal end of the waveform.

0.23 × 7 = 1.61 m. To summarize, the stand-height determination of a GLAS footprint trees was a four-step process applied to high resolution airborne LiDAR data. First, the trees within a footprint area were delineated using MMAC technique. Second, the height of individual trees was determined by LH−1.61. Third, trees with a height greater than the mean value of all trees within the area of that footprint were selected as dominant or co-dominant trees. Finally, the average height of these selected trees was calculated as the stand height of that footprint. 2.3. Deriving harmonized full-waveform metrics for representing canopy height information The patterns of the full-waveform data can be classified into six patterns, in which a bimodal waveform is the typical one while a multimodal waveform is the most complicated and very fuzzy with no distinct ground and vegetation returns (Hillbert and Schmullius, 2012). Similarly, a sparse single-cohort forest (an even-aged stand with one canopy layer) on flat ground also appeared as a bimodal pattern (Harding and Carajabal, 2005) while a spare two-cohort forest stand showed a trimodal pattern (Hancock et al., 2012). Interestingly, the waveform of a single-cohort stand on a slope demonstrated by Lefsky (2007) and Pang et al. (2008) is almost identical to a multimodal pattern. This demonstrates that the pattern of a full-waveform changes dramatically in nature depending on canopy structure and topography. When canopy heterogeneity and topography variation within a largefootprint area were raised, the complexity of a full-waveform increased. As shown in Fig. 4, the physical principle of a multimodal full-

2.2.2. Plot location identification and airborne LiDAR data processing Based on the GLAS14 data, the central location of a footprint with the geodetic latitude, longitude, and height above WGS84 ellipsoid was transformed to the local-map coordinate system, Taiwan datum TWD97. With additional parameters, i.e., major axis, minor axis, and azimuth of a footprint shape information, the area (57.0 by 34.2 m) of a footprint in the airborne LiDAR-derived DEM and CHM image was determined. The terrain features such as slope and elevation were thus directly extracted from the DEM data and the physical features such as tree height, crown radius, and the number of trees within a footprint were extracted from the CHM using the multi-level morphological active contour algorithm (MMAC) (Lin et al., 2011a,b). 2.2.3. Deriving stand height of a footprint area from airborne LiDAR data The airborne LiDAR data was obtained in October 2004 while the GLAS data was from January 2012. Thus there was a difference between the airborne LiDAR-derived tree height (LH) measured in 2012 and the total height of trees in 2004 due to growth in the intervening years. Therefore, the LiDAR determined LH of the trees in January 2012 within each GLAS footprint were further corrected using generalized height-growth-rate (HGR) for tree species in a mixed forest and yeardifference (YD) factors in order to approximately restore the height of trees (adjusted-LH) as they would have been in October 2004. Mathematically, adjusted-LH = LH – HGR × YD where the product item is called tree height calibration scalar in this study. A generalized estimation of HGR was available from the literature. For example, an average HGR of 22.98 ± 6.63 cm was suggested for mixed species forest in Central Taiwan based on the inventory research carried out by Lin et al. (2014). Comparison with the HGR observed from a mixed species forest in Illinois (23.02 ± 9.73 cm/yr) (Ritchie and Hann, 1989), a conifer forest in Oregon (33.3 ± 6.91 cm/yr) (Ware, 1990), and a tropical forest in Barro Colorado Island (33.2 ± 21.1 cm/yr) (Leigh, 1999), suggests that this local HGR appeared to be a reasonable figure for a mixed species forest at the study site. Consequently, the calibration scalar was determined as

Fig. 4. A diagrammatic demonstration of the multimodal full-waveform for small-footprint (highlighted in color) and large-footprint (black) LiDAR data (modified from Mallet and Bretar, 2009). (For interpretation of the references to color in this figure legend, the reader is referred to the Web version of this article.) 5

Remote Sensing of Environment 235 (2019) 111436

C. Lin

to the ground peak and the PEi denotes the returned energy intensity observed at the peak position. Correspondingly, PAi is the area of a fitted Gaussian curve for vegetated peak i. According to Mallet and Bretar (2009), PEi and PAi may change in respect to the complex of canopy structure and biomass amounts, therefore these two parameters are also likely able to account for partial information of stand height. In Eq. (2), the original prototype of harmonized extent metrics Hpd is determined as the average of the distance between each of the vegetated peaks and the ground peak. Hpe in Eq. (3) accounts for the influence of each vegetated peak's intensity on the protocol harmonized metric Hpd via the weighting factor wi . For any vegetated peak i whose weight is calculated as the proportion of its energy intensity to the intensity summation of all vegetated peaks. Similarly, Hpa is designed to specify the effect of each vegetated peak i on the full waveform extent by the Gaussian area of the corresponding peak using Eq. (4), i.e., the weighting factor wi is determined as the areal proportion of any vegetated peak to all vegetated peaks. According to Eq. (2), as a bimodal waveform occurred in a footprint, this waveform will have an Hpd that is mathematically identical to the PD, the particular distance between the vegetated peak and the ground peak. And the weighting wi in Hpe and Hpa will be 1. In other words, Hpd = Hpe = Hpa = PD. In this case, the canopy structure within a footprint should be quite simple and the HSH will work like the classic extent-based statistical method.

waveform can be summarized as follow: 1) a small-footprint lidar pulse echoed by a single target is basically in a pattern of unimodal Gaussian waveform, 2) the lidar pulse transmitted through a canopy to the ground, a waveform with a few discrete Gaussian peaks is generated, 3) for the GLAS data, the pulses echoed by all targets at different locations and with different properties in a subpixel of a large-footprint area are additively fused and tend to generate a continuously complex multimodal full-waveform (Blair and Hofton, 1999; Duncanson et al., 2010), and 4) the pattern of a full-waveform is therefore a consequence of complicated interaction of the varying energy of a pulse against time/ distance and the structural complexity of object(s) in both vertical and horizontal dimensions. It is particularly challenging to derive canopy height features using the Gaussian decomposition method for the footprints covered by dense vegetation over a complex topography area because more discrete elements within a large-footprint will increase the number of Gaussian curves that make up a given waveform (Duncanson et al., 2010). As the peaks in a full-waveform can be thought of as the energy enhanced by the subcanopy, changes in the pattern of a full-waveform should be considered as valuable canopy information for gathering possibly critical metrics for stand height determination. Moreover, the distance between the last and first Gaussian peak is typically used to present the height from the mean elevation of the ground return to the centroid of the uppermost canopy layer (Wang et al., 2016). Any peak in a fullwaveform can be thought of as the energy enhanced by subcanopy elements that distribute vertically and horizontally over the area of a large footprint. So, in contrast to these traditional or prototypical metrics used in Equations in Table 2, the distance, returned energy, and the area of a Gaussian fit of each peak clearly appearing in a full-waveform can be combined to generate additional harmonized waveform metrics (Hpd, Hpe, and Hpa) to provide valuable information for stand height estimation. As shown in Fig. 5, the denoised waveform of a vegetated footprint appeared to be comprised of 11 peaks where the last peak was assumed to be the ground peak and the others were vegetated peaks. Because a peak is considered a vegetative canopy return, a Gaussian curve should be centered at the peak location and therefore the decomposition technique is not suitable for the assumption. Each peak can be fitted with a Gaussian function N (μ, σ ) whose μ was assigned as the peak position or bins number (Bi) and σi was set to be the width of peak, i.e., the distance of the two valleys next to that peak. In this figure, the PDi = distance(Bi, Bground) is the distance between any vegetated peak

Hpd =

1 n−1

n−1

∑ PDi = i=1

1 n−1

n−1

Hpe =

∑ wi⋅PDi ,

and wi =

i=1

n−1

Hpa =

∑ wi⋅PDi , i=1

and wi =

n−1

∑ distance (Bi,

Bground )

i=1

(2)

PEi n−1

∑i = 1 PEi

(3)

PAi n−1

∑i = 1 PAi

(4)

2.4. Prototypes of stand height estimation model and accuracy assessment methods In Table 2, the waveform extent represents pictorially the potentially maximum canopy height, which is actually excessive in contrast to the maximum height, mean canopy height, and basal-area-weighted canopy height within a large footprint. Based on Lefsky et al. (2007), a conceptual model of canopy height can be generalized as H= wfE − correction factor , where the correction factor is a function of leE, teE, and wfE. Similarly, the prototype of harmonized stand height model (hereafter HSH model) can be written as the form of

HSH = harmonized extent metric − correction factor

(5)

where the harmonized extent metric can be one of the proposed metrics Hpd, Hpe or Hpa, and the correction factor of that specific metric is statistically a linear function of the prototypical waveform metrics and the harmonized metrics. Specifically, for example, the Hpd-based HSH model (HSHpd) determines the stand height CHDO over a footprint area by subtracting a value of corrector from that metric, that is HSHpd = Hpd − CFpd . Conceptually, the correction factor is used to fix a difference between harmonized extent metric and LiDAR derived stand height and is statistically expressed as a function of prototypical waveform metrics and harmonized extent metrics. In this study, a stepwise regression analysis was applied to derive an appropriate correctionfactor model to formulate a HSH model if the correction-factor model's F value was significant at the probability level of 0.05 as well as each of its regression coefficients being diagnosed as significantly not equal to zero by a t-test and with a value of variance inflation factor (VIF) less than 10. The estimated CHDO of footprints were evaluated with the accuracy measures of MAE, RMSE, RMSPE, and PRMSE (Lin et al., 2016a) using a cross-validation technique in which the footprints were divided into 2-, 3-, 5-, 7-, and 10-groups that indicate the leaving out of

Fig. 5. An example diagram of peak features for the determination of harmonized extent metrics of a waveform. The gray and red lines present the original and the denoised waveform of a GLAS footprint respectively. Blue curves are Gaussian fits with respect to the peak position and peak width of the denoised waveform. The green line depicts the background noise threshold. The 1st-10th peaks are the signals from the multiple returns of soft surface, i.e., the tree canopy, and the 11th peak is the signal from hard surface, the ground. PDi, PEi, and PAi denotes the distance, energy intensity, and Gaussian area of peak i. 6

Remote Sensing of Environment 235 (2019) 111436

C. Lin

Fig. 6. Examples of terrain and surface features at the footprint level. The first and second columns show the DEM and CHM of each example footprint respectively, and the third column demonstrates CHM overlaid on DEM for each footprint.

distributions. The maximum likelihood estimation method was used to determine the three continuous parameters: shape α , scale β , and location γ of LL-PDF that was separately derived from the whole dataset and the training and assessing subsets. As a result, three duplications of the LL-CDF parameters were available for ANOVA analysis and a post hoc test of μi = μj for all i ≠ j using Duncan's test. If the estimate of any fitted PDF parameters is statistically equal to the estimate of the standard PDF parameters, the two shape estimates for example α-LSH and α-HSH can be considered as having no difference.

50%, 33%, 20%, 14%, and 10% of all footprints data for each validation. This design provided duplications of the study for examining the effects of training-sample size on the CHDO estimation. For such a purpose, when one group of the partitioned subsets was left out for evaluation, the other groups were applied to derive a HSH model. As a result, the number of footprints used for modeling was correspondingly 54, 72, 86, 93, and 98. Taking the four accuracy measures of the OSH model as a baseline, the performance of HSH models was tested in the view of accuracy improvement percentage (AIP) (Lin et al., 2016a) by factorial ANOVA analysis (Lin et al., 2015; Lin and Dugarsuren, 2015) in which the number of AIP observations was 60 which accounted for the combination of treatments with 5 grouping categories by 4 model types by (41) accuracy measures.

LL − PDF: f (x ) =

α ⎛x − γ⎞ ⎜ ⎟ β⎝ β ⎠

α−1

α −2

⎛1 + ⎜⎛ x − γ ⎟⎞ ⎞ ⎜ ⎟ ⎝ β ⎠ ⎠ ⎝

(6)

α −1

β ⎞ ⎞ LL − CDF:⋅F (x ) = ⎜⎛1 + ⎜⎛ ⎟ ⎟ ⎝x − γ⎠ ⎠ ⎝

2.5. Similarity analysis of the height structure of the study site

(7)

3. Results

A probability density function (PDF) and its corresponding cumulative density function (CDF) is a probability distribution that describes the probability of a random variable (X) whose variate takes on a value equal to x or a value smaller than or equal to x. The procedure of applying statistical methods to estimate distribution parameters such as shape, scale, and location based on the sample data is known as distribution fitting. This technique has been applied to examine data distribution for deriving stand structures such as diameter, height, and age and for further exploring dynamics of a forest stand (Lin et al., 2016b). Assuming X is a continuous random variable of stand heights, then the measurements or estimates of X in the area of ICESat-1 GLAS footprints over the study site can be used to simulate the height structure of forests by fitting a series of PDF distributions. The PDF obtained from the HSH model derived stand height estimates (CHDO) can be examined to see if it is identical to the standard one determined from the LiDAR CHM derived stand height observations (LSH). According to a pretest of LiDAR-derived stand heights of the whole dataset, a distribution of 3-parameter log-logistic PDF and CDF (LL-PDF and LL-CDF) as shown in Eqs. (6) and (7) was most appropriate in describing the forest height structure based on Kolmogorov-Smirnov (K–S) test and Anderson-Darling (A-D) test among a series of

3.1. A general descriptive statistics of the LiDAR-derived stand height of the footprints All of the 108 footprints used for this study were distributed along two transecting lines across the study site. Elevation ranged from 218 to 1016 m. A variety of waveforms appeared with peaks varying between 5 and 38 m and centered at average and standard deviations of 19.45 ± 6.05, and whose waveform extents ranged from 25.65 to 79.95 m and averaged 44.66 ± 10.43 m. Three examples were gathered at the specific UTC times representing low, medium and high levels of stand height to illustrate the effects of both topography and canopy surfaces on the determination of stand height at footprint level. The results obtained are shown in Fig. 6(a)-6(c). Obviously, the difference of stand height among the cases at footprint level was greater than the deviation of their waveform extents. Fig. 7 shows the structure of the stand height that was gathered from all the footprint samples. The LSH of the forest were mostly between 10 and 18 m and centered at 13.95 m with a dispersion of 4.30 m. The fitted log-logistic probability density function (dash line) of the 7

Remote Sensing of Environment 235 (2019) 111436

C. Lin

factor models was significant at the 0.05 probability level and with a value of VIF less than 3. A common phenomena of these CF models was the similarly relative contribution of the regression in determining the offset value. According to the standardized βi of each regreessor coefficient, the harmonized extent metric was the most influential variable and then the effects of wfE, E40, leE, and E80 were gradually decreased. In contrast, the OSH model (Eq. (11)) derived using the metrics wfE, leE, and teE was able to achieve an R2 of 0.314 in determining the stand height within the GLAS footprints over the steep rugged terrain. Though the coefficient of trailing edge extent (teE) was significant merely at the probability level of P = 0.52, it was still kept as a standard model for comparing with the harmonized stand height models proposed in this study. The metrics wfE, leE, teE, Hpa, Hpd, and Hpe were measured in meters, and the metrics E40 and E80 were dimensionless with values ranging from 0 to 1. As mentioned previously, many of the waveform metrics were significantly intercorrelated. Thus, the regressor leE, E40 and E80 in Eqs. 8–10 could be replaced by E10, E30 and E70 respectively with only a minimal cost of R2 reduction. While the wfE acted as the kernel element and the additional metrics worked as critical supplementary elements, the unique variable therefore must be kept in the derivation of CF models. Terrain information is not required for such estimation.

Fig. 7. The airborne LiDAR-derived stand height histogram and the fitted LLPDF distribution of the study site.

histogram of this data (bar chart) was tested insignificant at the 0.05 probability level (K–S = 0.0506 and A-D = 0.4427) which indicated that the stand-level LSH of the study site was with the distribution of LL-PDF(x; 8.25, 19.14, -5.59). 3.2. The harmonized stand height models and their performance It was found that the stand height over mountainous forest land was positively or negatively significantly related to a variety of waveform metrics, including the wfE, leE, return-energy-based decile metrics, harmonized extent metrics and related correction factors. Many of those metrics are also significantly intercorrelated. For example, the signal of leE and E10 seemed highly interdependent because their correlation coefficient (r = 0.796) was extremely significant at a probability level of P < 0.01. A serious problem of multicollinearity occurred when most of those metrics were simultaneously used to formulate a HSH model. Fortunately, this problem was fixed by a stepwise regression method. The derived HSH model using the harmonized extent metric Hpa, Hpd, and Hpe was HSHpa = Hpa − CFpa , HSHpd = Hpd − CFpd , and HSHpe = Hpe − CFpe respectively. Based on the stepwise regression analysis, the corresponding correction-factor model of each harmonized extent metric was determined as Eqs. 8–10. Using the waveform metrics Hpa, wfE, leE, and the 4th and the 8th decile of the returned-energy E40 and E80, the CFpa, CFpd, and CFpe models were able to achieve a coefficient of determination (R2) of 0.843, 0.808, and 0.837 respectively. The regression coefficient of each regressor in the correction-

CFpa = −2.514 + 0.996Hpa − 49.146E40 − 20.836E80 − 0.193wfE + 0.289leE

(8)

CFpd = −2.505 + 1.016Hpd − 48.842E40 − 21.338E80 − 0.203wfE + 0.293leE

(9)

CFpe = −2.526 + 0.994Hpe − 49.121E40 − 20.867E80 − 0.191wfE + 0.287leE OSH = 8.566 − 0.398leE − 0.034teE + 0.185wfE

(10) (11)

In comparison, the OSH and HSH-based models showed an estimation bias which differed from each other significantly. In the training dataset, the OSH model estimation bias ranged from -18.0 to 10.2 m (SD = 3.5 m) while the HSH models showed an estimation bias between -7.9 and 7.6 m (SD = 3.0 m), indicating that estimation of the HSH models was closer to the LiDAR-derived stand height. This effect also appeared in the test dataset. The deviation of such estimation biases among the models of OSH, HSHpa, HSHpd, and HSHpe is visually

Fig. 8. Histogram pattern of the prediction bias of stand height derived by the models OSH (a), HSHpa (b), HSHpd (c), and HSHpe (d). 8

Remote Sensing of Environment 235 (2019) 111436

C. Lin

illustrated in Fig. 8. The largest observation (CHDO = 35.26 m) was extremely underestimated by the OSH model with a level of around -19 m bias, but this observation was estimated more accurately at an amount of bias around -3 m by the HSHpa, HSHpd, and HSHpe models simultaneously. In contrast, the smallest observation (CHDO = 4.22 m) was overestimated as a value of around 10.45 and 9.65 m by the OSH and the HSH models respectively. The difference between the estimation biases (6.23 m vs. 5.43 m) for that particular footprint between the OSH and HSH models was less than 1 m. In terms of generalized performance measures, the MAE and RMSE of the HSH models was 2.44 and 3.00 m for the training dataset and 2.53 and 3.13 m for the test dataset. The values showed that the increment of estimation accuracy achieved by the models HSHpa, HSHpd, and HSHpe was almost identical at a level of MAE = 0.25 m and RMSE = 0.50 m. The cross validation results also showed that the HSH models’ PRMSE and RMSPE was around 22% and 28%, so 4% and 3% better than the OSH model. Meanwhile, the AIP factorial ANOVA test concluded that the three types of HSH model achieved similar performance and were able to reduce estimation bias by 12% compared to the OSH model.

Table 3 Summary of the Duncan's test for the AIP (%) of the sampling size for stand height modeling. Subsets

2 groups

5 groups

10 groups

7 groups

3 groups

Left out samples (samples used) Meana Groupingb

50% (54)

20% (86)

10% (98)

14% (93)

-10.06 A

-12.08 B

-12.08 B

-12.85 B, C

33% (72) > -13.91 C

a b

AIP was calculated based on the accuracy measures of the OSH model. The same code in case (a) and (b) indicates insignificance at P = 0.05.

was left out in return for collecting multiple assessment samples of 108 points for cross validation. Thirdly, because the significance of the Duncan's test, carried out via factorial ANOVA analysis, is able to examine the effect of multiple factors on the observations fairly in a single test. Therefore it should be able to reflect the minimum sampling size for deriving the correction factor model of the harmonized extent metric for the estimation of stand height over a GLAS footprint. 4.2. Appropriateness of the HSH models derived stand-level canopy height distribution

4. Discussion

The footprints used in this study were spatially distributed along two lines thus resembling samples collected via the line-transect sampling method. So, aggregating those footprints provides inventory data from which canopy-height features of a forest stand become retrievable. As shown in Fig. 7, the histogram of LiDAR-derived stand height observations was well fitted with a log-logistic probability density function LL-PDF(x; α, β, γ), abbreviated LSH-PDF(x; 8.25, 19.14, -5.59) which was centered at 13 m and bounded with a lower and upper boundary at 4 and 35 m respectively. For those estimates made by the models OSH, HSHpa, HSHpd, and HSHpe, the distributions of stand heights derived from the whole dataset as well as the training and test datasets for the 1/3 leaving out cross validation, were used to carry out PDF fitting. Both K–S and A-D tests showed that the stand height estimates derived by the OSH and HSHs models were concluded to be with LL-PDF distribution for the datasets. The parameters of each PDF model in the datasets varied slightly for the three types of HSH models but changed dramatically for the OSH model. For example, those HSHbased PDF models whose α parameter was between 7.50 and 8.11 and averaged 7.78 ± 0.22, and their β parameter ranged from 11.45 to 12.16 and averaged 11.71 ± 0.27. However, the α and β of the OSHbased PDF model was between 6.50-23.90 and 9.38-33.42 and averaged 15.12 ± 7.11 and 21.08 ± 9.82 respectively. As mentioned in section 2.5, both α and β parameters controlled the shape and scale of the data distribution. The HSH-based PDF distribution appeared to be more similar to the LSH-PDF distribution. With a series of cross-validation processes, the parameters of multiple LSH-PDF and OSH-PDF distribution models were estimated. Multiple observations of the shape value difference between OSH-PDF and LSH-PDF and also HSH-PDF and LSH-PDF can be determined. Fig. 9 gives examples of the histogram of stand height estimates accompanied with an appropriate PDF model of the whole dataset for the OSH and HSH models. Let dShape be the difference of the shape values, the similarity of their shape parameters can be examined by a post hoc test. As a result, the Duncan's grouping showed the dShape between HSHsPDF and the LSH-PDF was around 0.4-0.5 which is significantly smaller than 8.0, the dShape of OSH-PDF and LSH-PDF. This also indicated that each shape parameter of the three types of HSHs-PDF distributions was statistically more similar to the shape of LSH-PDF, the LiDAR-derived stand height distribution at forest-stand level. In contrast, the difference of scale and that of the location parameters of the OSH- or HSHs-PDFs and LSH-PDF were statistically same. With consideration of the descriptive statistics of the stand heights listed in Table 4, it was concluded that the main effects of the three types of HSH models in the

4.1. Effect of sample size on the prediction accuracy of stand height at the footprint level The sample size used in the references for canopy height estimation varied between dozens (Lefsky et al., 2005; Lefsky, 2010) and a couple of hundred footprints (Lefsky, 2010; Hayashi, 2015). In practice, the effective number of samples used in canopy height modeling can be evaluated by the “sampling without replacement” algorithm (Lin et al., 2016b) and is determined as n = (E /(tα /2, ν × CV )2 + 1/ N )−1. Specifically, the sample size (n) used in this study was able to describe the canopy height of the forest with an allowable sampling error (E) at a level of 15% for the conditions of the particularly theoretical population (N) that was calculated by dividing the area of the study site to the specific size of footprints, a value of 0.95 for the coefficient of variation (CV) of the forest canopy height within the study site. The student t value is tα/2, ∞ = 1.645 at the significant probability of α = 0.10. The four accuracy measures, HSHpd, HSHpe, and HSHpa showed an estimation performance very similar to each other and significantly smaller than the OSH model among all the cross-validations with variant sample sizes varied between 54 and 98 points. In summary, the RMSE and RMSE% of the three types of HSH models was averaged 3.13 ± 0.07 m and 22.40 ± 0.53%. The small value of standard deviations found in both RMSE and RMSE% of the models revealed that the influence of sample size on the performance of the canopy height models was quite small. This is similar to the result of a posteriori sensitivity analysis in Hayashi et al. (2015) in which the RMSE of a canopy height model for a steep slope dataset varied slightly between 3.6 and 4.0 m as the samples used for modeling decreased from 451 to 91 and the bounding demE threshold increased from 3 to 25 m. With regards to the AIP, the effect of sample size on the performance of the HSH models was changeable but insignificant at the 0.05 probability level. According to Table 3, the HSH models were able to achieve an estimation significantly better than the OSH model. The bias reduction was between 10% and 14% but the difference of the bias reduction was only around 4% among the sample sizes. This suggests that creating models using a sufficient number of samples, say n ≥ 72 footprints, collected from steep rugged terrain should be able to provide sufficient information to derive an appropriate correction factor model for the HSH models to accurately estimate stand height within a footprint. This is concluded as firstly all the 108 footprints were collected from a mixed forest over a high steep rugged land with an averaged slope of 31 ± 15°. Secondly, as all the data was randomly partitioned into several subsets with a variety of sample size and one of the subsets 9

Remote Sensing of Environment 235 (2019) 111436

C. Lin

Fig. 9. A comparison to the PDF distributions of stand height derived by OSH and HSHpd model using the whole dataset. The PDF of other HSH models was not shown.

occurred in the 54-point-derived HSHpa model. With a series of multiple cross-validations using completely random process determined subset of the data as discussed in the discussion sections 4.1 and 4.2, all estimations of footprint-level canopy height made by the HSH models derived from a variety of modeling samples were averaged RMSE = 3.13 ± 0.08 m and RMSE % = 22.40 ± 0.55%. The accuracy measures showed a quite small standard deviation indicating a relatively small uncertainty in the HSH models. This indicated that the integration of waveform extent metrics and multiple-peak-based energy return via the harmonic algorithm is able to fix the potential influence of terrain effects and canopy structure on the waveform features and has great potential for general use.

estimation of stand height at the footprint level and the stand level is the ability to contribute a precise estimation for those footprints with extreme stand heights. 4.3. A comparison of the uncertainty of canopy height estimation in literature with the suggested models and the HSH models Fig. 10 shows the deviation histogram and the descriptive statistics of the estimates of the tested 10 classic models listed in Table 2 and two HSH models. The uncertainty between the HSH models and the classic models will be examined based on the maximum negative/positive deviation of all points and also their centralization toward the zero. As can be seen in Fig. 10, deviation of those classic models can be divided into three situations: serious underestimation, serious overestimation, and mild overestimation. In models TE(3,5-7), the variable demE worked as regulation factor against the wfE and thus tended to induce extremely negative deviation of canopy height for those footprints distributed over an area with a significant altitudinal range. A few extreme large estimation were observed caused a significant large bias such as 70.50 m in TE(9) and 138.79 m in TE(10). The wide dispersion of estimation bias in these two models indicates that the fixed number of six modes Gaussian decomposition may not always be sufficient to fit a variety of waveform patterns (Iqbal et al., 2013), the use of a specific amplitude and width of Gaussian peaks is most likely inducing inconsistent information for canopy height estimation. Canopy heights estimated by models TE(2,8) were moderately close to the real values among the classic models, however their large bias from -9.57 to 27.08 m still indicates inappropriateness. In contrast, Fig. 10 also demonstrates the deviation of canopy height estimation in a scale identical to the classic models. The particular HSHpa model derived from a small subset of all footprints (N = 54 or N = 98) was selected for comparison due to its accuracy being the smallest or the largest among all the tests during the cross-validation procedure. As noted, the estimation of these two models concluded similarly that the canopy height of the study site averaged 14 ± 3 m. They also showed a similar estimate of the smallest canopy height but a quite different estimate of the largest canopy height. Beyond the general metrics, this research introduced a harmonic algorithm to integrate a versatile patterns of full-waveform footprints to draw canopy height. The histogram of estimate deviations also showed a similar distribution which was centralized around the space of -5 to 0 m. The positively/ negatively maximum deviation was exactly -13.45/6.35 m that

4.4. Harmonized metrics provide compensatory information of trailing-edge extent and terrain index for describing canopy height Leading-edge and trailing-edge extents (leE and teE) have been demonstrated to be useful for describing canopy height if accompanied by the waveform extent (wfE). The leE catches the signal related to the topmost layer of canopy elements within a footprint area and is rarely influenced by changes of canopy structure. However, the teE is quite changeable due to dense canopy coverage that is most likely able to block or reduce the energy passing through a canopy to reach ground surface. As a result, the relationship between teE and canopy height becomes unstable and the capability of teE in describing canopy height is therefore dramatically reduced. This is particularly evident in this study via the t test of correlation coefficients between the variables. As noted, the leE was diagnosed to be extremely significantly related to CHDO at a moderate level of r = -0.387 (P < 0.01) but the teE was completely incapable of revealing any information regarding CHDO (r = -0.009, P = 0.93). Fortunately, the harmonized extent metrics Hpd/Hpa/Hpe is positively/negatively related to teE/leE with a significantly level of correlation coefficient 0.188 ≤ r ≤ 0.474 (P ≤ 0.05) and -0.205 ≤ r ≤ 0.511 (P ≤ 0.01) respectively, and also positively correlated with CHDO (0.284 ≤ r ≤ 0.367, P < 0.01). This implied that the versatile signals of the full-waveform data can be appropriately transformed into Hpd/ Hpa/Hpe metrics. In addition, the Hpd/Hpa/Hpe was also diagnosed to be significantly correlated with demE at a level of 0.246 ≤ r ≤ 0.255 (P ≤ 0.01) indicating that the harmonized extent metrics provide alternative information to compensate for the signals of terrain feature that have been affected by dense canopy coverage. It is therefore in

Table 4 A comparison for the descriptive statistics of stand heights derived by variant models (based on 10% observations left-out cross validation). Models

LiDAR

OSH

HSHpa

HSHpd

Datasets

SH

Training

Test

Training

Test

Training

Test

Training

Test

min max mean std

4.22 35.26 13.95 4.30

8.73 21.13 13.95 2.42

8.87 21.62 13.97 2.46

8.36 32.70 13.94 3.07

8.41 31.18 13.94 3.03

8.33 32.66 13.94 3.07

8.35 31.27 13.95 3.04

8.36 32.65 13.93 3.07

8.41 31.17 13.94 3.03

10

HSHpe

Remote Sensing of Environment 235 (2019) 111436

C. Lin

Fig. 10. A comparison of estimation deviation of canopy height models in literatures and selected HSH models. Values embedded in each subfigure indicates descriptive statistics of CHDO estimates of the footprints.

(CHMAX), crown-area-weighted maximum canopy height (CHCA), basalarea-weighted canopy height (CHL), and stand height (CHDO). A GLAS footprint works like a sample plot in forest inventory. The mean footprint major axis of the GLAS data varied between 52 and 149 m, accounting for roughly 0.2−1.6 ha coverage of land (Bae et al., 2013). The canopy height of discrete components over larger footprint scales becomes spatially averaged (Brenner et al., 2011). Thus, the CHMAXdetermined canopy height seemed inappropriate for volume estimation as this parameter is significantly larger than the stand height. If an areabased approach is applied to estimate forest volume at footprint level, serious overestimation occurs. CHDO is more appropriate to characterize canopy height variations over a GLAS footprint coverage as only the trees which occupied or extended their crown level above the average canopy stratum are generally used to determine the CHDO in forestry. In contrast to the prototypical waveform metrics, this study proposed an entirely new logic of waveform metrics to harmonize the crucial signals returned by parts of both horizontal and vertical forest

accompaniment with the decile metrics of returned energy, wfE and leE, that the correction factors (CFpd, CFpa, and CFpe) are able to catch critical compensatory information of the full-waveform complex features, particularly the ground surface feature or teE. The use of HSH models provide an effective method of integrating structural information of multi-dimensional canopy profiles and also the terrain morphological information to reduce the impacts of both dense canopy coverage and topographic relief effect to improve canopy height estimation. The method can be carried out without the need of gathering detailed information of terrain index which has been generally used as a priori in published methods and is thus also beneficial in the estimation of global scale forest carbon stocks using the full-waveform data. 5. Conclusion This paper explored the potential of GLAS full-waveform data to determine forest canopy height in terms of maximum canopy height 11

Remote Sensing of Environment 235 (2019) 111436

C. Lin

area-based approach. The size parameters and geometric structure of trees will change through time due to the interaction of physiological activities (e.g., growth) and disturbances over the space in the terrestrial ecosystem. The canopy height at both footprint level and stand level are critical for the derivation of stand structure and generating a back-to-back map of stand volume, biomass, carbon and nutrient cycling, productivity, and even the biodiversity of the ecosystem. After the era of ICESat-1 GLAS, the ICESat-2 Advanced Topographic Laser Altimeter System (ATLAS) and the Global Ecosystem Dynamics Investigation (GEDI) lidar instrument have extended the spaceborne lidar missions from 15 September 2018 and 5 December 2018 respectively. The laser detector modality of GEDI is similar to the GLAS which record the entire temporal profile of the reflected laser energy through the canopy resulting in footprints averaging 25 m in diameter (Dubayah et al., 2019). Therefore, the harmonized metrics and harmonized stand height modeling technique can be directly applied to retrieving a stand-height parameter from the full-waveform data of the GEDI lidar. By integrating with the archived GLAS data, height growth at stand level could be derived for accounting carbon balance over forests across the terrestrial ecosystem. In contrast, the ATLAS is a photon-counting lidar. Ideally, according to Neuenschwander, “if the ATLAS laser pulse is transmitted and dwelled onto the surfaces with hundreds of shots, the vertical distribution of the reflected photons are able to resemble a full waveform” (Neuenschwander et al., 2019) this may then facilitate independent determination of vegetation height (Markus et al., 2017). However, considering the effect of canopy density and topography as well as the nature of photons which arrive at the surfaces of the earth as flattened disks of photons travelling in a downward direction within a space-cube having vertical and horizontal dimensions of 0.5 m and 14 m respectively (Neuenschwander et al., 2019), the ability to differentiate canopy signal from solar background noise to reconstruct a waveform will be a challenge for harmonized stand height modeling. According to Popescu et al. (2018), the retrieval of terrain is more accurate than the canopy from simulated ICESat-2 data using multi-level noise filtering approach. The development of appropriate methods for retrieving accurate ground surface height and canopy height over both sparse and dense forests using the photoncounting lidar data remains an issue for further study.

structure elements in a footprint area. After a denoising process, the location and width of peaks were detected from the denoised full waveform. Crucial features such as the distance from each peak to the ground peak, the energy intensity of a peak, and the area of a Gaussian curve derived from the peak location and width were extracted to define harmonized extent metrics Hpd, Hpe, and Hpa. The experimental results demonstrate that the novel harmonizedstand-height algorithm can integrate the harmonized extent metrics, waveform extent, leading-edge extent, and the fourth and eighth decile of the return energy to catch the decisive signals of the full-waveform data to describe information regarding stand height at footprint level and stand level. Three HSH models (HSHpd, HSHpe, and HSHpa) were developed in this study. Each showed promise in ability to estimate stand height. According to the cross-validation, the MAE and RMSE of the HSH models was 2.44 and 3.00 m for the training dataset and 2.53 and 3.13 m for the test dataset. In contrast, the OSH model which was regressed by wfE, leE, and trE made an estimation of footprint-level stand height at an amount of bias 0.25 m MAE and 0.50 m RMSE larger than the HSH models. The cross validation results also showed that the HSH models achieved a measure of 22% PRMSE and 28% RMSPE which was 4% and 3% better than the OSH model. The improvement of accuracy was statistically significant at the 0.05 probability level. Regarding single point evaluation, the estimation bias of the OSH model (ranged from -20 to 10 m) was significantly greater than the HSH models (between -8 and 8 m). As the variation of canopy height increased, the canopy-returned energy varied dramatically and the waveform extent and rugosity increases (Harding et al., 2001). The key achievement of the HSH models were their capability of reducing the prediction bias of the footprints with a wide range of CHM and a big diverse canopy height particularly for the steep slope positions. In other words, the HSH models appeared to be more effective at reducing the prediction uncertainty of the ICESat-1 GLAS footprints via their success in integrating crucial signals of the multiple layers of vegetation canopy elements and bare-earth surface. This further contributes to HSH models providing an increased contribution in derivation of stand-level height structure for forest inventory using a line-transect sampling method. On average, the three types of HSH models performed almost identically in retrieving both the footprint-level and stand-level CHDO information, the HSHpd formulated with the original form of harmonized extent metric could be applicable for its simplicity in derivation. While making allowance for the estimation that is able to cover nearly the full range of real canopy height at a relatively smaller bias, a sufficient number of footprint samples from steep terrain for example n ≥ 72 was recommended for general applications. As Brenner et al. (2011) stated, the degree of terrain-relief variation increases with the footprint size and the difficulty of differentiating the canopy height of grouped trees within the footprint increases. On sloped land, a GLAS waveform extent increases with the terrain slope, further causing overestimation of the canopy height due to the earlier detection of signal start (Carabajal and Harding, 2006; Duncanson et al., 2010; Hilbert and Schmullius, 2012; Fayad et al., 2014). The proposed harmonized extent metrics Hpa, Hpd, and Hpe are significantly related to the extent-based metrics (waveform extent, leading-edge extent, and trailing-edge extent) and terrain features such as the mean, standard deviation, and range of the elevation while not related to the coefficient of variation of the terrain within the footprint coverage. This indicates that the harmonized extent metrics are able to integrate variant signals of the canopy-surface complexity. It also explains why the harmonized stand height model can appropriately react to the extreme stand heights at the study site and accomplish a better accurate estimation of stand height at footprint level and the stand height structure at stand level. The algorithm of deriving harmonized stand height structure of a forest stand via the method of integrating harmonized full-waveform metrics, harmonized stand height models, and probability density function fitting techniques is expected to be beneficial for mapping national/regional/global carbon distribution using the

Acknowledgement The authors would like to acknowledge the support provided by projects MOST 103-2119-M-415-002 and MOST 104-2119-M-415-001 funded by the Ministry of Science and Technology, Taiwan. Thanks are also extended to the Southern Region Water Resources Office for providing the airborne LiDAR data for the study and Li-Han Lin and ChungChun Cheng for their assistance in data processing. Appendix A. Supplementary data Supplementary data to this article can be found online at https:// doi.org/10.1016/j.rse.2019.111436. References Asner, G.P., Mascaro, J., 2014. Mapping tropical forest carbon: calibrating plot estimates to a simple LiDAR metric. Remote Sens. Environ. 140, 614–624. Avery, T.E., Burkhart, H.E., 2002. Forest Measurements, fifth ed. McGraw-Hill, New York. Bae, S., Smith, N., Schutz, B.E., 2013. The GLAS algorithm theoretical basis document for precision attitude determination (PAD). In: ICESat (GLAS) Science Processing Software Document Series. NASA/TM–2013-208641/Vol 10. National Aeronautics and Space Administration, Goddard Space Flight Center Greenbelt, Maryland 20771. Blair, J.B., Hofton, M.A., 1999. Modeling laser altimeter return waveforms over complex vegetation using high-resolution elevation data. Geophys. Res. Lett. 26, 2509–2512. Brenner, A.C., Zwally, H.J., Bentley, C.R., Csathó, B.M., Harding, D.J., Hofton, M.A., Minster, J., Roberts, L.A., Saba, J.L., Thomas, R.H., Yi, D., 2011. Derivation of range and range distributions from laser pulse waveform analysis for surface elevations, roughness, slope, and vegetation heights. In: Geoscience Laser Altimeter System (GLAS)-Algorithm Theoretical Basis Document, Version 5.0.

12

Remote Sensing of Environment 235 (2019) 111436

C. Lin

Lin, C., Popescu, S.C., Thomson, G., Tsogt, K., Chang, C.I., 2015. Classification of tree species in overstorey canopy of subtropical forest using QuickBird images. PLoS One 10 (5), e0125554. Lin, C., Dugarsuren, N., 2015. Deriving the spatiotemporal NPP pattern in terrestrial ecosystems of Mongolia using MODIS imagery. Photogramm. Eng. Remote Sens. 81 (7), 587–598. Lin, C., Thomson, G., Popescu, S.C., 2016a. An IPCC-compliant technique for forest carbon stock assessment using airborne lidar-derived tree metrics and competition index. Remote Sens.-Basel 8 (6), 528. https://doi.org/10.3390/rs8060528. Lin, C., Tsogt, K., Zandraabal, T., 2016b. A decompositional stand structure analysis for exploring stand dynamics of multiple attributes of a mixed-species forest. For. Ecol. Manage. 378, 111–121. https://doi.org/10.1016/j.foreco.2016.07.022. Lin, C.Y., Lin, C., Chang, C.I., 2018. A multilevel slicing based coding method for tree detection. In: Proceeding of Geoscience and Remote Sensing Symposium (IGARSS). 2018 IEEE International, pp. 7524–7527. Lin, J.C., Liu, I.H., Tang, S.C., 2014. The growth performances and carbon stock changes of broadleaf mixed stands. Q. J. Forest Res. 36 (1), 57–66. Lo, C.S., Lin, C., 2013. Growth-competition-based stem diameter and volume modeling for tree-level forest inventory using airborne LiDAR Data. IEEE Trans. Geosci. Remote Sens. 51, 2216–2226. https://doi.org/10.1109/TGRS.2012.2211023. Los, S.O., Rosette, J.A.B., Kljun, N., North, P.R.J., Chasmer, L., Suarez, J.C., Hopkinson, C., Hill, R.A., van Gorsel, E., Mahoney, C., Berni, J.A.J., 2012. Vegetation height and cover fraction between 60o S and 60o N from ICESat GLAS data. Geosci. Model Dev. 5, 413–432. Mallet, C., Bretar, F., 2009. Full-waveform topographic lidar: state-of-the-art. ISPRS J. Photogrammetry Remote Sens. 64, 1–16. Markus, T., Neumann, T., Martino, A., Abdalati, W., Brunt, K., Csatho, B., Farrell, S., Fricker, H., Gardner, A., Harding, D., Jasinski, M., Kwok, R., Magruder, L., Lubin, D., Luthcke, S., Morison, J., Nelson, R., Neuenschwander, A., Palm, S., Popescu, S., Shum, C.K., Schutz, B.E., Smith, B., Yang, Y., Zwally, J., 2017. The ice, cloud, and land elevation satellite-2 (ICESat-2): sciencerequirements, concept, and implementation. Remote Sens. Environ. 190, 260–273. Montesano, M., Sun, G., Dubayah, R.O., Ranson, K.J., 2016. Spaceborne potential for examining taiga–tundra ecotone form and vulnerability. Biogeosciences 13, 3847–3861. https://doi.org/10.5194/bg-13-3847-2016. Neuenschwander, A., Pitts, K., 2019. Ice, cloud, and land elevation satellite 2 (ICESat-2) algorithm theoretical basis document (ATBD) for land - vegetation along-track height (ATL08) release 001. https://icesat-2.gsfc.nasa.gov/sites/default/files/page_files/ ICESat2_ATL08_ATBD_r001_0.pdf, Accessed date: 25 August 2019 accessed. Noordermeer, L., Økseter, R., Ørka, H.O., Gobakken, T., Næsset, E., Bollandsås, O.M., 2019. Classifications of forest change by using bitemporal airborne laser scanner data. Remote Sens.-Basel 11 (18), 2145. Pang, Y., Lefsky, M.A., Andersen, H.E., Miller, M.E., Sherrill, K., 2008. Validation of the ICESat vegetation product using crown-area-wieghted mean height derived using crown delineation with discrete return lidar data. Can. J. Remote Sens. 34, 471–484. Popescu, S.C., 2007. Estimating biomass of individual pine trees using airborne lidar. Biomass Bioenerg 31, 646–655. Popescu, S.C., Zhoua, T., Nelson, R., Neuenschwander, A., Sheridan, R., Narinea, L., Walsh, K.M., 2018. Photon counting LiDAR: an adaptive ground and canopy height retrieval algorithm for ICESat-2 data. Remote Sens. Environ. 208, 154–170. Reitberger, J., Heurich, M., Krzystek, P., Stilla, U., 2007. Single tree detection in forest areas with high-density LiDAR data. Int. Arch. Photogrammetry 36, 139–144. Ritchie, M.W., Hann, D.W., 1989. Equations for Predicting the 5-year Height Growth of Six Conifer Species in Southwest Oregon. Forest Research Laboratory, Oregon State University, Corvallis, pp. 1–14 Research Paper 54. Rosette, J.A.B., North, P.R.J., Suárez, J.C., 2008. Vegetation height estimates for a mixed temperate forest using satellite laser altimetry. Int. J. Remote Sens. 29, 1475–1493. Simard, M., Pinto, N., Fisher, J.B., Baccini, A., 2011. Mapping forest canopy height globally with spaceborne lidar. J. Geophys. Res. 116, G04021. https://doi.org/10. 1029/2011JG001708. SRWRO, 2012. Landform Change and Sediment Transprotation and Sedimentation of Tsengwen Reservoir Watershed Area. Southern Region Water Resources Office of Water Resources Administration. Ministry of Economic Affairs, Kaohsiung City, Taiwan, pp. 508. Tesfamichael, S.G., van Aardt, J.A.N., Ahmed, F., 2010. Estimating plot-level tree height and volume of Eucalyptus grandis plantations using small-footprint, discrete return lidar data. Prog. Phys. Geogr. 34 (4), 515–540. https://doi.org/10.1177/ 0309133310365596. Thorpe, H.C., Astrup, R., Trowbridge, A., Coates, K.D., 2010. Competition and tree crowns: a neighborhood analysis of three boreal tree species. For. Ecol. Manage. 259, 1586–1596. Wang, Y., Li, G., Ding, J., Guo, Z., Tang, S., Wang, C., Huang, Q., Liu, R., Chen, J.M., 2016. A combined GLAS and MODIS estimation of the global distribution of mean forest canopy height. Remote Sens. Environ. 174, 24–43. Ware, G., 1990. Height growth of trees of the Morton Arboretum's tree evaluation plots. In: Proceedings of the 7th METRIA Conference, pp. 49–54.

Carabajal, C.C., Harding, D.J., 2006. SRTM C-band and ICESat laser altimetry elevation comparisons as a function of tree cover and relief. Photogramm. Eng. Remote Sens. 72, 287–298. Carron, L.C., 1968. An Outline of Forest Mensuration with Special Reference to Australia. Australian National University Press, Canberra. Chen, Q., 2010. Retrieving vegetation height of forests and woodlands over mountainous areas in the Pacific Coast region using satellite laser altimetry. Remote Sens. Environ. 114, 1610–1627. Chung, Y.S., Hsu, W.C., Liu, J.K., Shieh, C.L., Shieh, M.L., Chen, Y.S., 2016. A study on the relationship between DoD and distribution of LiDAR point clouds. J. Photogramm. Remote Sens. 20 (3), 201–215. Dubayah, R., Blair, J.B., Goetz, S., Hurtt, G., Hansen, M., Healey, S., Hofton, M., Fatoyinbo, L., Kellner, J., Luthcke, S., Armston, J., Dncanson, L., Patterson, P., Hancock, S., Jantz, P., Lee, S.K., Burns, P., Tang, H., Qi, W., Minor, D., Marselis, S., Bruening, J., Huang, C., Schaaf, C.B., Pontius, J., 2019. GEDI ecosystem lidar. http:// gedi.umd.edu, Accessed date: 25 August 2019 accessed. Duncanson, L.I., Niemann, K.O., Wulder, M.A., 2010. Estimating forest canopy height and terrain relief from GLAS waveform metrics. Remote Sens. Environ. 114, 138–154. Fang, Z., Cao, C., 2014. Estimation of forest canopy height over mountainous areas using satellite lidar. IEEE J-STARS 7, 3157–3166. Fayad, I., Baghdadi, N., Bailly, J.S., Barbier, N., Gond, V., Hajj, M.E., Fabre, F., Bourgine, B., 2014. Canopy height estimation in French guiana with LiDAR ICESat/GLAS data using principal component analysis and random forest regressions. Remote Sens.Basel 6, 11883–11914. Halla, S.A., Burke, I.C., Box, D.O., Kaufmann, M.R., Stoker, J.M., 2005. Estimating stand structure using discrete-return lidar: an example from low density, fire prone ponderosa pine forests. For. Ecol. Manage. 208, 189–209. Hancock, S., Lewis, P., Foster, M., Disney, M., Muller, J., 2012. Measuring forests with dual wavelength lidar: a simulation study over topography. Agric. For. Meteorol. 161, 123–133. Harding, D.J., Carabajal, C.C., 2005. ICESat waveform measurements of within-footprint topographic relief and vegetation vertical structure. Geophys. Res. Lett. 32, L21S10. https://doi.org/10.1029/2005GL023471. Harding, D.J., Lefsky, M.A., Parker, G.G., Blair, B., 2001. Lidar altimeter measurements of canopy structure: methods and validation for closed-canopy broadleaf forest. Remote Sens. Environ. 76, 283–297. Hayashi, M., Saigusa, N., Oguma, H., Yamagata, Y., 2013. Forest canopy height estimation using ICESat/GLAS data and error factor analysis in Hokkaido, Japan. ISPRS J. Photogrammetry Remote Sens. 81, 12–18. Hayashi, M., Saigusa, N., Oguma, H., Yamagata, Y., Takao, G., 2015. Quantitative assessment of the impact of typhoon disturbance on a Japanese forest using satellite laser altimetry. Remote Sens. Environ. 156, 216–225. Hemery, G.E., Savill, P.S., Pryor, S.N., 2005. Applications of the crown diameter–stem diameter relationship for different species of broadleaved trees. For. Ecol. Manage. 215, 285–294. Hilbert, C., Schmullius, C., 2012. Influence of surface topography on ICESat/GLAS forest height estimation and waveform shape. Remote Sens.-Basel 4, 2210–2235. Hofton, M.A., Minster, J.B., Blair, J.B., 2000. Decomposition of laser altimeter waveforms. IEEE Trans. Geosci. Remote Sens. 38, 1989–1996. Iqbal, I.A., Dash, J., Ullah, S., Ahmad, G., 2013. A novel approach to estimate canopy height using ICESat/GLAS data: a case study in the New Forest National Park, UK. Int. J. Appl. Earth Obs. 23, 109–118. Lee, A.C., Lucas, R.M., 2007. A LiDAR-derived canopy density model for tree stem and crown mapping in Australian forests. Remote Sens. Environ. 111, 493–518. Lefsky, M.A., 2010. A global forest canopy height map from the moderate resolution imaging spectroradiometer and the geoscience laser altimeter system. Geophys. Res. Lett. 37, L15401. https://doi.org/10.1029/2010GL043622. Lefsky, M.A., Keller, M., Pang, Y., de Camargo, P.B., Hunter, M.O., 2007. Revised method for forest canopy height estimation from Geoscience Laser Altimeter System waveforms. J. Appl. Remote Sens. 1, 013537. https://doi.org/10.1117/1.2795724. Lefsky, M.A., Harding, D.J., Keller, M., Cohen, W.B., Carabajal, C.C., Del Bom EspiritoSanto, F., Hunter, M.O., de Oliveira Jr., R., 2005. Estimates of forest canopy height and aboveground biomass using ICESat. Geophys. Res. Lett. 32, L22S02. https://doi. org/10.1029/2005GL023971. Leigh Jr., E.G., 1999. Tropical Forest Ecology: A View from Barro Colorado Island, Pp9vols. 3–94 Oxford University Press, NY. Lin, C., Thomson, G., Lo, C.S., Yang, M.S., 2011a. A multi-level morphological active contour algorithm for delineating tree crowns in mountainous forest. Photogramm. Eng. Remote Sens. 77 (3), 241–249. Lin, C., Lo, C.S., Thomson, G., 2011b. A textural modification of the MMAC algorithm for individual tree delineation in forest stand using aerial bitmap images. In: Proceeding of the 4th International Congress on Image Signal Processing, pp. 1604–1608. Lin, C., Tsogt, K., Chang, C.I., 2012. An empirical model-based method for signal restoration of SWIR in ASD field spectroradiometry. Photogramm. Eng. Remote Sens. 78 (2), 119–127. Lin, C., Lin, C.H., 2013. Comparison of carbon sequestration potential in agricultural and afforestation farming systems. Sci. Agric. 70 (2), 93–101.

13