Airborne laser scanning: Exploratory data analysis indicates potential variables for classification of individual trees or forest stands according to species


ISPRS Journal of Photogrammetry & Remote Sensing 59 (2005) 289 – 309 www.elsevier.com/locate/isprsjprs

T. Moffiet a,*, K. Mengersen b, C. Witte c, R. King a, R. Denham c

a University of Newcastle, School of Maths and Physical Sciences, University Drv., Callaghan N.S.W., 2308, Australia
b Queensland University of Technology, School of Mathematical Sciences, GPO Box 2434, Brisbane, 4001, Australia
c Natural Resource Sciences, Queensland Department of Natural Resources and Mines, 80 Meiers Road, Indooroopilly, Queensland 4068, Australia

Received 22 July 2004; received in revised form 7 March 2005; accepted 17 May 2005 Available online 1 July 2005

Abstract

Understanding your data through exploratory data analysis is a necessary first stage of data analysis particularly for observational data. The checking of data integrity and understanding the distributions, correlations and relationships between potentially important variables is a fundamental part of the analysis process prior to model development and hypothesis testing. In this paper, exploratory data analysis is used to assess the potential of laser return type and return intensity as variables for classification of individual trees or forest stands according to species. For narrow footprint lidar instruments that record up to two return amplitudes for each output pulse, the usual preclassification of return data into first and last intensity returns camouflages the fact that a number of the return signals have only "single amplitude" (singular) returns. The importance of singular returns for species discrimination has received little discussion in the remote sensing literature. A map view of the different types of returns overlaid on field species data indicated that it is possible to visually distinguish between vegetation types that produce a high proportion of singular returns, compared to vegetation types that produce a lower proportion of singular returns, at least when using a specific laser footprint size. Using lidar data and the corresponding field data derived from a subtropical woodland area of South East Queensland, Australia, map scatterplots of return types combined with field data enabled, in some cases, visual discrimination at the individual tree level between White Cypress Pine (Callitris glaucophylla) and Poplar Box (Eucalyptus populnea). While a clear distinction between these two species was not always visually obvious at the individual tree level, due to other extraneous sources of variation in the dataset, the observation was supported in general at the site level.
Sites dominated by Poplar Box generally exhibited a lower proportion of singular returns compared to sites dominated by Cypress Pine. While return intensity statistics for this particular dataset were not found to be as useful for classification as the proportions of laser return types, an examination of the return intensity data leads to an explanation of how return intensity statistics are

* Corresponding author. Tel.: +61 2 4947 9121; fax: +61 2 4921 6898. E-mail address: [email protected] (T. Moffiet).
0924-2716/$ - see front matter © 2005 International Society for Photogrammetry and Remote Sensing, Inc. (ISPRS). Published by Elsevier B.V. All rights reserved. doi:10.1016/j.isprsjprs.2005.05.002


T. Moffiet et al. / ISPRS Journal of Photogrammetry & Remote Sensing 59 (2005) 289–309

affected by forest structure. Exploratory data analysis indicated that a large component of variation in the intensity of the return signals from a forest canopy is associated with reflections of only part of the laser footprint. Consequently, intensity return statistics for the forest canopy, such as average and standard deviation, are related not only to the reflective properties of the vegetation, but also to the larger scale properties of the forest such as canopy openness and the spacing and type of foliage components within individual tree crowns. © 2005 International Society for Photogrammetry and Remote Sensing, Inc. (ISPRS). Published by Elsevier B.V. All rights reserved.

Keywords: lidar; intensity return; exploratory data analysis; species discrimination; data visualization

1. Introduction

The pressure for improvement in remotely sensed forest information is driven by the needs of both commerce and the global community for sustainable forest resource and ecosystem management. The challenge, for researchers and the designers of remote sensing equipment, is to specify the system parameters and build the systems that will best automate the collection of the forestry information sought. A challenge of the above nature is often met globally through an iterative and evolutionary process of research and practical application of research findings. Each new research project generally involves a research question, design of data collection (or use of pre-existing data), and data analysis through some quantitative process of statistical modeling. Although the design of remotely sensed data collection typically involves a sampling plan, the derived data is usually observational in nature with regard to many of the variables of interest. The first stage of data analysis therefore involves a process of "getting to know the data" through exploratory data analysis (EDA). This is most effectively done through use of any combination of software products that allow methods of data visualization and exploratory graphical analysis to be creatively designed for questioning of the data from any desired perspective. Interpretation of findings in this exploration stage may not always be based on quantitative statistical measures of significance but does require the support of other research results, a high level of statistical thinking, and subject matter knowledge from a variety of disciplines. While the outcomes of EDA may appear to be based on subjective interpretation and lack scientific method, this investigative, hypothesis building phase is a natural and necessary component of the scientific method

itself. At the very least, the exploration process will achieve the intended objective of getting to know the data prior to launching into more complex methods of data modelling. The theme described above may be followed through this paper. The primary purpose of the investigation from which this paper originated relates to the improvement of a vegetation index used for mapping and monitoring foliage projective cover (FPC) of woody vegetation over the state of Queensland, Australia by the Statewide Landcover and Trees Study (SLATS) using LANDSAT Thematic Mapper (TM) and Enhanced Thematic Mapper Plus (ETM+) satellite imagery (Goulevitch et al., 2002). This multiple regression vegetation index (MRVI) was initially developed using ground based FPC measurements from over 1800 accurately located field sites as the independent variable in a regression relationship. Airborne laser scanning (ALS), also known as airborne scanning lidar (light detection and ranging), described further below, is seen as a potentially efficient and reliable supplement to ground-based FPC survey for improvement and validation of the model. To further the improvement of the MRVI model our study was initiated with the objective to determine if the intensity of return reflections of scanning lidar could be used to improve on an empirically determined, linear (calibration) relationship between FPC measured by lidar and FPC (green leaf component) measured in the field. Note that lidar cannot distinguish between the green leaf and branch component of the vegetation reflections when only its ranging capabilities are used. In this paper, the specific objective of model improvement is subordinated to exploratory data analysis and the understanding of how lidar interacts with the forest canopy to produce intensity return signals. Species identification and classification was


not one of the original objectives but is an objective of this paper because it is closely tied to the understanding of how the laser interacts with the forest canopy and whether the intensity return signal carries any information on the green leaf and branch components. ALS or scanning lidar, is an active remote sensing system that utilises pulsed, single wavelength near infra-red (NIR), laser light to measure the distances from the aircraft carrying the lidar equipment to points on the terrain from which the laser is reflected. Laser ranging combined with a differential global positioning system (GPS) and determination of sensor orientation by an inertial measurement unit (IMU) enables the positions and elevations of a swath of terrain reflection points below the flight path to be determined with respect to a geodetic frame of reference (Gaveau and Hill, 2003; Wehr and Lohr, 1999). Over recent years, scanning lidar instruments have advanced from recording the first and last return amplitudes of the backscattered laser pulse to newer instruments that record up to five or more multiple returns from each pulse, or record the complete waveform of the pulse reflection. Full waveform digitising systems are now commercially available from different companies either as stand alone systems or as add-ons for older scanning lidar instruments. An assessment of the advantages and disadvantages of ALS compared with photogrammetry and other systems is given by Baltsavias (1999b). Possible future directions for laser scanning systems, in particular fused with imaging systems, was outlined by Ackermann (1999). Advancement in these predicted directions is evident in more recent literature (e.g., Masahiko et al., 2004). Development of scanning lidar as a complement to other remote sensing tools for terrain modeling, forest monitoring and structural assessment has been one focus of research by the forestry and remote sensing communities, particularly over the last decade. 
In many forestry applications it has been specifically and simply used as a ranging tool to estimate a digital terrain model (DTM) for terrain below a forest canopy cover, determine vegetation heights relative to the DTM, and determine simple survey site statistics such as FPC. The intensity of the return pulse signal, normally supplied with ranging data, has more or less been an unused byproduct of data collection in most terrain modeling projects (Jonas, 2002). Researchers, however, have shown interest in utilizing the return


intensities to deliver more information about the canopy reflecting surfaces than can be derived solely from using lidar as a ranging tool (Lim et al., 2003). In forestry applications, the challenge is to automatically extract as much information as possible on forest structure, the vertical and horizontal distribution of vegetation, the delineation of individual trees and identification of their species. One of the earliest published works on using reflection measurements, combined with height, for classification of vegetation types was by Schreier et al. (1985). This work was carried out using a prototype profiling lidar system at a time when lidar was in its infancy for use as a ranging tool for mapping terrain in forested areas. They found that it was possible to distinguish broadleaf forests from coniferous forests and low-growing vegetation cover using reflection variability measurements (percent coefficient of variation). They also found that pure broadleaf forests showed reflectance values that are significantly higher than that for pure conifer forests. Coniferous trees were shown to have significantly lower mean reflection values than broadleaf trees based on reflection values determined from the tops of individually identified trees. Openings in the forest canopy or vegetation cover were recognized as contributing to variability in the amplitude measurements. The laser system used in that study had a pulse rate of 2000 pps and a peak output of 80 W. Due to limitations of the data acquisition system, reportedly only one in every five return signals was sampled for recording. It was also reported that the return amplitude voltages represented the average of about 20 pulses. It was apparent that the system did not record first and last returns for each emitted pulse. 
While no specific altitude or footprint size was provided, it was stated that the best flight altitude was 300–450 m above ground and a footprint size of "50 cm diameter circle at half power point" was determined on a 290 m range. While many researchers have expressed a desire to extract more information from the intensity returns, Lim et al. (2003) claim that, "since the work of Schreier et al. (1985), little work has been published on the information content of the lidar intensity returns for vegetation/forest analysis." Jonas (2002) claims, "There is some hope that intensity values may be able to assist in species identification but the little research done to date has failed to find a reliable


correlation between species and laser intensity return pattern." Holmgren and Persson (2004) demonstrated the ability to discriminate between coniferous species, Norway spruce and Scots pine, using features extracted from airborne scanning laser data. These features were related to the characteristics of the crown structure and shape, dependent on identification of individual trees, as well as non-shape measures derived directly from the laser data such as the return intensity and proportions of the different types of laser returns. The different types of returns were defined as "(1) a single return with only one recorded amplitude peak, (2) the first return of a double return with two amplitude peaks, (3) the second return of a double return with two amplitude peaks." Using discriminant analysis, they found that a combination of six variables provided the best classification, although a major contribution to discriminatory power was provided by the combination of two non-shape variables: the standard deviation of intensity of all returns and the proportion of first return hits. These were specifically identified as variables that could be extracted on a stand level without any tree segmentation where the measurement intensity is too low to allow identification of individual trees. Their conclusion alluded to uncertainty in the usability of the variables for discrimination between species in datasets derived from different forest communities using different laser systems. However, the standard deviation of return intensity as a main variable fits with the work of Schreier et al. (1985), described above, for various terrains including broadleaf and coniferous forests in Canada. The work of Holmgren and Persson was conducted on data generated from a Scandinavian boreal forest using a helicopter mounted Topeye laser system. The forest contained mostly Norway spruce, Scots pine and birch.
Holmgren and Persson reported that beam divergence was 1 mrad, the flight altitude 130 m above ground and the footprint diameter given the beam divergence and flight altitude was 0.26 m on the ground. In part, this paper complements the studies of Schreier et al. (1985), and of Holmgren and Persson (2004), as it examines intensity return and the proportion of first return and single return hits on vegetation cover. In contrast to the Northern Hemisphere studies of these authors, the data for this

investigation originated from a study of an area of diverse forest and woodland communities located at Injune in South East Queensland, Australia (Tickle et al., 2001). The data were derived using an Optech ALTM1020 scanning lidar, a different laser system to either of the two used above, and with a footprint size of 0.085 m compared to 0.26 m for Holmgren and Persson (2004) and 0.5 m or more for Schreier et al. (1985). In broad terms, the contribution of this paper is the enhancement of understanding of small footprint scanning lidar: the interaction of the laser with the forest canopy and the ground, the intensity and type of the reflected pulses, and use of the type of return signal to assist in discrimination between species. While the research is based on data derived from equipment that records only the first and last returns from each laser pulse, the findings would be applicable, at least in part, to the newer multiple return laser systems.

2. The study and the data composition

2.1. Objectives

The prime objective is to determine if the intensity return of scanning lidar can be used to improve on an empirical linear relationship between FPC measured by scanning lidar and FPC (green leaf component) measured in the field. The first stage of the study, the results of which are reported in this paper, comprised the standard statistical practice of "getting to know the data" through exploratory data analysis (EDA) including raw data inspection, data visualization and graphical data exploration. The objectives for this stage were threefold:

1. Find any singular or spatially grouped data anomalies that would require explanation and possible correction before further analysis.
2. Find any spatial patterns in types of laser return that would assist with understanding the interaction of the laser with the ground and vegetation.
3. Find patterns in return intensity that would indicate a relationship with foliage distribution or forest structure. That is, understand the distributions of


intensity data from which the average and standard deviation statistics are derived.

In this paper, the greater objective of model improvement is subordinated to the objectives of the exploratory data analysis phase. Particular emphasis is placed on understanding the interaction of the laser with forest structure which includes the interaction of the laser with different forest species. Since species identification is a subject of topical interest in its own right, it has been made one of the main objectives of this paper. However, it was not a stated objective of the original study.

2.2. The study area

The geographical location and nature of the study site is shown in Fig. 1. The study area of 220,000 ha covered diverse multi-aged forests and woodlands supporting a wide variety of species communities for which the common names and species codes are: White Cypress Pine (CP), Poplar Box (PBX), Silver Leaved Ironbark (SLI), Smooth Barked Apple (SBA),


and Brigalow (BGL). The understorey genera included Sandalwood Box (SWB) and Wilga (WIL) (Lucas et al., 2001; Paterson et al., 2001; Tickle et al., 2001). Comprehensive multi-sensor data including scanning lidar data were collected from 150 primary sampling units (PSUs), each of 7.5 ha, systematically spaced on a 4 km grid covering the 220,000 ha area. Thirteen selected PSUs (or plots) were each further subdivided into 30 quarter-hectare sub-sampling units (SSUs or subplots) from which 33 (out of a possible 390) were sampled for detailed field survey (Lucas et al., 2001; Paterson et al., 2001; Tickle et al., 2001). For the remainder of this paper, the terms plot and subplot will be used to identify a particular sub-sampling unit. The general term "survey site" will sometimes be used in place of subplot.

Fig. 1. Location map and Landsat image of the Injune study area.

2.3. Lidar data

The lidar data were captured over a 1 week period in August 2000 using an Optech 1020 scanning lidar mounted on a Bell Jet Ranger helicopter. The helicopter flew at a nominal altitude of 250 m above ground with a laser beam divergence of 0.3 mrad, and a scanning swath width of approximately 200 m to ensure that each 150 by 500 m plot was covered. The flying parameters resulted in a laser point sampling density averaging in the order of 1 pulse per square meter projected on the terrain surface of each survey site. Using the formulae given by Baltsavias (1999a), a nominal footprint size of 0.085 m was calculated from the flying height and beam divergence allowing for a laser aperture of 0.01 m (Optech, pers. comm., 2004). The stated wavelength of the Optech 1020 NIR laser pulse is 1.047 μm. The laser intensity returns included in the data set had not been specifically collected for the purpose of intensity analysis. No instrument calibration steps had been carried out in that regard. It should be noted that the laser pulse intensity (irradiance or power/unit area), over a transverse section of the laser beam, is assumed to follow a theoretical Gaussian profile. The laser beam divergence and footprint diameter are therefore defined in relation to the intensity profile. One common definition for beam divergence is based on the effective edge of the beam corresponding to an intensity of 1/e times the peak intensity (36.8% of peak intensity). The 1/e definition for beam divergence, in concordance with US-FDA laser safety standards, is used for Optech ALTM instruments (Optech, pers. comm., 2004).

2.4. Field data

Data for calculating field FPC was recorded at 1 m intervals along three 50 m linear transects arranged at fixed separation in a NS direction across each survey site. For trees with diameter at breast height (DBH) > 0.05 m, the species of each tree was identified and measurements were taken of tree diameters, tree heights and crown dimensions (Lucas et al., 2001).
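As a quick check on the nominal footprint quoted in Section 2.3, the sketch below evaluates a commonly used footprint relation, aperture plus 2·h·tan(γ/2). It is an assumption that this matches the exact expression the authors took from Baltsavias (1999a); the function and parameter names are illustrative only. With the stated flying height, beam divergence and aperture it gives approximately 0.085 m.

```python
import math


def footprint_diameter(height_m, divergence_rad, aperture_m=0.0):
    """Nominal laser footprint diameter on flat ground at nadir.

    Assumed relation (not confirmed as the authors' exact formula):
        diameter = aperture + 2 * height * tan(divergence / 2)
    """
    return aperture_m + 2.0 * height_m * math.tan(divergence_rad / 2.0)


# Values stated in the text: 250 m altitude, 0.3 mrad divergence, 0.01 m aperture.
optech_footprint = footprint_diameter(250.0, 0.3e-3, aperture_m=0.01)
print(f"{optech_footprint:.3f} m")
```

The aperture term matters here: at this altitude and divergence it contributes 0.01 m of the 0.085 m total.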
The locations of trees with DBH > 0.1 m were mapped relative to a GPS reading at the plot origin (south-west corner) by using laser rangefinders to determine the distance and angle to the most visible of reflectors placed on all four subplot corners. The locations of trees with DBH 0.05–0.1 m were mapped relative to the GPS reading by using a 10 m grid and measuring

distances with tapes (Tickle et al., 2001). Subplot quadrant coordinate rectification, and hence individual tree coordinate rectification, had been carried out after geo-referencing of the field data with large-scale photography (LSP) and lidar data. For purposes of exploratory data analysis, described in this paper, the species information used for estimation of canopy cover dominance and plotting of species location and crown diameter with lidar data, was based on the trees with DBH > 0.1 m. The maximum DBH for Subplots 59-27 and 59-28 was 0.05 m and most trees were below 2 m in height. As a result, these subplots were not included in exploratory analyses involving species dominance.

2.5. The process for combining data from different sources

The lidar data had been preprocessed into first and last ground returns and first and last vegetation returns by the data supplier. The data were contained in four separate ASCII text files for each plot. The data included x, y and z coordinates (easting, northing, and height), and return intensity. Field data were contained in separate MS EXCEL® (Microsoft Corporation, 2000b) workbooks for each subplot. To combine the data for examination, all text files for the plots that contained the subplot data were transferred into MS ACCESS® (Microsoft Corporation, 2000a) as separate tables and a dedicated data extraction and inspection tool was developed in MS EXCEL®. The inspection tool was used to read all lidar and selected field data, for a nominated subplot into a single workbook for data inspection, graphical displays, production of summary statistics, and calculation of other statistics such as vegetation heights and FPC (lidar and field). Lidar FPC for each site was calculated as the proportion of laser first-return reflections off vegetation higher than 2 m, over the total ground and vegetation first-return reflections for each site.
Field FPC (green leaf component) for each site was calculated as the proportion of direct overhead green leaf sightings along the 150 m transect length. The comprehensive statistics produced for each subplot were then compiled into one spreadsheet for further analyses across subplots. The use of MS EXCEL® fostered the examination of the data from first principles. This approach was


effective for datasets derived from 0.25 ha size subplots, although, due to spreadsheet size restrictions, it was not effective for complete datasets derived from 7.5 ha plots.
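The two FPC definitions given in Section 2.5 can be sketched as simple proportions. This is a minimal illustration, not the authors' spreadsheet tool: the function names, the input layout and the sample values are hypothetical, and only the 2 m height threshold and the ratio definitions come from the text.

```python
def lidar_fpc(veg_first_heights, n_ground_first, min_height=2.0):
    """Proportion of first returns from vegetation higher than min_height,
    over all first returns (ground plus vegetation) for a site."""
    total_first = len(veg_first_heights) + n_ground_first
    if total_first == 0:
        raise ValueError("no first returns supplied")
    n_canopy = sum(1 for h in veg_first_heights if h > min_height)
    return n_canopy / total_first


def field_fpc(green_leaf_sightings):
    """Proportion of transect points with a direct overhead green leaf sighting.

    green_leaf_sightings holds one boolean per 1 m transect point
    (150 points for three 50 m transects)."""
    sightings = list(green_leaf_sightings)
    if not sightings:
        raise ValueError("no transect points supplied")
    return sum(1 for s in sightings if s) / len(sightings)


# Hypothetical inputs, for illustration only.
example_lidar = lidar_fpc([0.5, 3.0, 7.2, 1.9], n_ground_first=4)   # 2 of 8 first returns
example_field = field_fpc([True] * 60 + [False] * 90)               # 60 of 150 points
print(example_lidar, example_field)
```

Note that the two measures are not directly equivalent: the lidar ratio counts any vegetation reflection above 2 m, whereas the field ratio counts green leaf only, which is why the study seeks a calibration relationship between them.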

3. Data inspection

3.1. Removal of outliers

The first step was to graphically inspect the raw data for outliers that could be considered to be errors of data collection or processing. An occasional, isolated height coordinate (possible bird reflection), completely remote from any neighbouring height coordinates was removed. An occasional, isolated intensity return that was more than 4 standard deviations from the mean intensity was removed. The main reasons for removing outliers were to prevent distortion of graph axes and to avoid undue influence of outliers on summary statistics in the initial exploratory stage. Even after removing outliers, care with comparing summary statistics is still required. For example, subplots with similar standard deviations of vegetation height could have quite differently shaped distributions of this variable. Distribution shape differences such as symmetric versus skew and unimodal versus multimodal, are indicators that different processes may be operating at different subplot locations. These subplot differences are not apparent when only looking at summary statistics. Each distribution of data generating a summary statistic needs to be at least visually examined and compared to other corresponding distributions.

3.2. Classifying laser hit types

The different classifications of laser return types used in this paper are shown in Table 1. In the remainder of the document, the subscripts on the return types will be dropped in cases where the subscript is inferred by the context of the paragraph or the type of return applies to both ground and vegetation returns. Note the distinction between "first" returns and "first-only" returns based on whether singular returns are included or not. First returns include single amplitude returns (singular returns) as well as the first amplitude return components (first-only) of double amplitude returns.

Table 1
Classification of laser return types

Laser return type classification                           Category abbreviation
First return from vegetation including singular returns    First_veg
First return from vegetation excluding singular returns    First-only_veg
Singular return from vegetation                            Singular_veg
Last return from vegetation excluding singular returns     Last_veg–veg
First return from ground including singular returns        First_grd
First return from ground excluding singular returns        First-only_grd
Singular return from ground                                Singular_grd
Last return from ground excluding singular returns         Last_grd–grd
Last return from ground after hitting vegetation first     Last_veg–grd

3.3. Data accounting

Data accounting was carried out to check on the balance of first and last returns for vegetation and ground returns. An example of accounting for the different types of returns for Plot 114-Subplot 4 is given in Table 2.

Table 2
An example of return-type accounting for Plot 114-Subplot 4

                              Vegetation   Ground   Total
First   Singular                     360      167
        First-only                   321      724
        Total first                  681      891    1572
Last    Singular                     360      167
        Last-only                     97      724
        Last-only_veg–grd                     222
        Total last                   457     1113    1570
Overall total                       1138     2004    3142

The following set of rules was suggested by the dataset itself and was related to the way that the raw data had been originally split into four separate ASCII files:

1. The count of first returns should balance approximately with the count of last returns.


2. The count of last returns from vegetation should be less than the count of first returns from vegetation, since many of the last reflections are from the ground, not vegetation, and are included in the last return ground file.
3. The count of last ground returns should be greater than the count of first ground returns. For direct ground hits a first and a last return are recorded together. In addition, a last return ground hit is included when it follows a first return vegetation hit.

Of particular note, when only one return amplitude is received it is counted as both a first and a last return. This is standard practice (Jonas, pers. comm., 2003). Data for four subplots were found to be inconsistent with the above rules. In two cases rules 1 and 3 were broken because the last ground returns that followed a first hit on vegetation had also been included with the first ground returns. In two other cases rules 1 and 3 were broken because those last ground returns had been omitted altogether. A shortcoming of the particular data supplied in this study was that when first and last return pulses, for one emitted pulse, were recorded in separate files, the ability to specifically link them through a common time stamp was missing. This facility is important for describing the distribution of distances between first and last returns and potentially relating these distributions to different vegetation types. Holmgren and Persson (2004) used the mean vertical distance between first and last vegetation returns as a variable in their study. Lovell et al. (2003) used the linkage between first and last returns to perform calculations on laser penetration into the canopy. In retrospect, time stamping was a possible option that was not specified at the time our data were originally collected and processed.
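The three accounting rules lend themselves to an automated check on the per-file record counts. The sketch below is one possible form, not the authors' procedure: the function name, the dictionary keys and the 1% tolerance used for "approximately balanced" are all assumptions. The first call uses the totals from Table 2; the second is a hypothetical scenario in which the 222 last ground returns that followed a vegetation hit are left out of the last ground file.

```python
def check_return_accounting(first_veg, last_veg, first_grd, last_grd, rel_tol=0.01):
    """Evaluate the three accounting rules on record counts from the four files.

    rel_tol is an assumed tolerance for rule 1; the text does not give one.
    """
    total_first = first_veg + first_grd
    total_last = last_veg + last_grd
    balanced = abs(total_first - total_last) <= rel_tol * max(total_first, total_last)
    return {
        "rule1_first_last_balance": balanced,
        "rule2_last_veg_lt_first_veg": last_veg < first_veg,
        "rule3_last_grd_gt_first_grd": last_grd > first_grd,
    }


# Totals from Table 2 (Plot 114-Subplot 4).
table2_check = check_return_accounting(first_veg=681, last_veg=457,
                                       first_grd=891, last_grd=1113)

# Hypothetical faulty file: last ground returns after a vegetation hit omitted.
omitted_check = check_return_accounting(first_veg=681, last_veg=457,
                                        first_grd=891, last_grd=1113 - 222)

print(table2_check)
print(omitted_check)
```

In the hypothetical omission case rules 1 and 3 fail while rule 2 still holds, which is the same pattern of rule violations the text reports for the inconsistent subplots.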
Without time stamping to assist in the definitive counting of first and last return pulses contained in separate files, the number of singular returns was determined by counting the returns having "near identical" values for their respective x, y, z coordinates and intensity measurement in each file. The first-only and last-only returns were then determined by subtraction. To capture the singular returns, a "near identical" capture window was defined to be the nearest 0.1 m for each of the x and y coordinates, the nearest 1 m for the z coordinate and an identical value for intensity. The "singular" returns captured within this relatively large window displayed mostly no difference in the x and/or y directions (occasionally a difference of ±0.01 m) and an approximately normal distribution of small differences in the z direction. For example, the differences in the z values for the "singular" returns in the first and last ground files of Plot 142-Subplot 18 had a mean (z_first − z_last) of 0.015 m and a standard deviation of 0.022 m. For the same subplot, the "singular" vegetation returns had a mean difference (z_first − z_last) of 0.01 m and a corresponding standard deviation of 0.026 m. The reason for the differences in registration for the "same" returns contained in different files is not known. In future projects it would be useful to request identification of singular returns by the data supplier. An example of an 'almost duplicate' return record included in both first and last vegetation return files, which we have classified as a singular return, is shown in Table 3.

Table 3
An example of a singular return record included in both first and last vegetation return files for Plot 114-Subplot 4

                                             x            y              z        Intensity
Vegetation first return file for P114-Sp4    547,114.27   7,159,675.73   568.06   37
Vegetation last return file for P114-Sp4     547,114.27   7,159,675.72   568.14   37
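The "near identical" matching described above can be sketched as a pairing of records between the first and last return files. This is a hedged illustration rather than the authors' implementation: it assumes the capture window means absolute differences of at most 0.1 m in x and y and at most 1 m in z, with identical intensity, and it assumes each last-return record may be paired only once. The function name and tuple layout are hypothetical. The first record in each list is the Table 3 example; the second is invented to show a non-match.

```python
from collections import defaultdict


def match_singular_returns(first, last, xy_tol=0.1, z_tol=1.0):
    """Pair records from the first and last return files that look like
    the same single-amplitude (singular) return.

    Each record is a tuple (x, y, z, intensity).
    Returns a list of (first_index, last_index) pairs.
    """
    # Group last-return indices by intensity, since intensity must be identical.
    last_by_intensity = defaultdict(list)
    for j, record in enumerate(last):
        last_by_intensity[record[3]].append(j)

    used_last = set()
    pairs = []
    for i, (x, y, z, intensity) in enumerate(first):
        for j in last_by_intensity.get(intensity, []):
            if j in used_last:
                continue
            lx, ly, lz, _ = last[j]
            if (abs(x - lx) <= xy_tol and abs(y - ly) <= xy_tol
                    and abs(z - lz) <= z_tol):
                used_last.add(j)
                pairs.append((i, j))
                break
    return pairs


first_veg = [
    (547114.27, 7159675.73, 568.06, 37),   # Table 3, first return file
    (547120.00, 7159680.00, 570.00, 41),   # hypothetical record
]
last_veg = [
    (547114.27, 7159675.72, 568.14, 37),   # Table 3, last return file
    (547120.30, 7159680.00, 561.00, 41),   # hypothetical: x differs by 0.3 m
]

singular_pairs = match_singular_returns(first_veg, last_veg)
n_singular = len(singular_pairs)
n_first_only = len(first_veg) - n_singular   # determined by subtraction
n_last_only = len(last_veg) - n_singular
print(singular_pairs, n_first_only, n_last_only)
```

The first-only and last-only counts then follow by subtraction, as in the text. For full subplot files a spatial index would scale better than this linear scan within each intensity group, but the pairing rule would stay the same.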

4. Data visualization and graphical exploration

"Visualisation of data is a powerful tool in its own right ... it can effectively support, in particular, exploratory analysis of spatial data and phenomena." (Andrienko et al., 2003). "Geovisualisation implies the use of visual geospatial displays to explore data and through that exploration to generate hypotheses, develop problem solutions and construct knowledge." (Kraak, 2003).

Graphical exploration involved visual comparisons of separate map scatterplots and histograms across the 33 subplots as well as examination of correlations between summary statistics for the subplots. As is typical of observational, spatial, multi-dimensional data, apparent associations or lack thereof need to be interpreted with care. For example:

1. Important associations may be missed when there is a lack of balance of data across levels of an important variable.
2. The importance of associations between two variables can be incorrectly inferred when in fact they are related only through separate causal links to a third (hidden) variable.
3. Dependence between subplot observations in spatial data may add unjustified strength to some analyses and consequent conclusions.
4. The observations are conditional on instrument factors and environmental factors not recorded with the data.

Whether rigorously applying statistical tests or just carrying out visual inspection as part of data exploration, a high level of statistical thinking is important to interpretation.

4.1. Indication of species discrimination using laser hit type: observations and discussion

A scatterplot that depicts field data combined with lidar data in a map view is shown in Fig. 2. The data dimensions incorporated from the field data are tree location, tree species and crown diameter. From the lidar data, the four lidar vegetation hit types incorporated are first-onlyveg, singularveg, lastveg–veg and lastveg–grd. Only the field data for trees with diameter at breast height (DBH) > 0.1 m have been incorporated. The scatterplot shown for field Plot 142, Subplot 13 demonstrates a strong visual indication that hit type may be an important factor for discriminating between Cypress Pine and Poplar Box and possibly between other species. A relatively high proportion of singular hits is apparent at the location of the individual Cypress Pine tree compared with a relatively low proportion of singular hits at the locations of most of the Poplar Box trees. For a given Poplar Box tree, the high proportion of lastveg–grd hits indicates that after interception by vegetation at the higher levels, the path for the remaining portion of each lidar footprint to the ground is relatively unimpeded by lower levels of the tree and undergrowth. In other words, for laser beams of the nominal footprint and intensity used in this study, the Poplar Box canopy appears to be more permeable to portions of the footprint large enough to register a second reflection, compared to Cypress Pine.

[Figure: "Map scatterplot: Plot 142 Subplot 13". Axes: Easting, Northing. Large markers: tree species (Poplar Box, PBX, Area = 871.3; Cypress Pine, CP-, Area = 36.8). Small markers: lidar vegetation hits (Last Veg-Grd, Last Veg-Veg, Singular Veg, First-Only Veg).]
Fig. 2. Shows an example of a map scatterplot of types of lidar hits on vegetation, combined with field species data for trees with DBH > 10 cm.

For some of the other subplots containing Poplar Box and Cypress Pine, the ability to visually discriminate between these species on the same basis as Fig. 2 was not as clear, due to variation in lidar hit density,

variation in species mix and their spatial distributions in both the canopy and understorey levels, and variation in coordinate registration between the two types of data. While our argument of the importance of this observation may be regarded as weakly based given that it is only one particularly strong observation, it should be remembered that we are dealing with observational data not specifically designed for the purpose of species identification. We are not analysing the results of an experimental design but performing data exploration in the face of a large number of uncontrolled variables and expect to have variation in observations obscure our findings. However, this particularly strong observation provided an insight into how the laser possibly interacts with the canopy. The observation makes sense given the very different structural characteristics of those particular species. In general, the Poplar Box species exhibits larger gaps in its crown foliage enabling significant portions of the pulse footprint to pass unimpeded to the ground for a significant second reflection. Cypress Pine has a relatively dense needle like leaf structure providing little opportunity for a significant second reflection either from within the crown or from the ground. Of course, there will be variations to these generalisations depending on the age and conditions of the trees. Despite the variation in observations, the proportion of singular hits on the canopy area was indicated to be

a potentially useful variable for discrimination between Cypress Pine and other species similar to Poplar Box at the individual tree level. This observation is in direct support of the finding of Holmgren and Persson (2004) that the obverse variable, "proportion of first return hits", had high power for discriminating between Scots pine and Norway spruce. The proportion of singular hits may be regarded, in some sense, as a measure of the impermeability of the foliage cover to a laser pulse of defined footprint size. These observations indicate that a new statistic, which we term "vegetation permeability", might be informative in species discrimination/classification. Vegetation permeability is defined here as the proportion of primary vegetation laser strikes over a defined area, survey site or tree crown, for which secondary reflections from the ground are registered. Site vegetation permeability plotted against the dominant species in each subplot is shown in Fig. 3. The site statistics confirmed the previous observation at the individual tree level. Sites with Poplar Box as the dominant species exhibited relatively high vegetation permeability whereas sites dominated by Cypress Pine exhibited relatively low vegetation permeability. Despite this apparent confirmation, the factors contributing to variation in the measure of vegetation permeability require further investigation before it can be regarded as being a useful measure. On the surface, it does not appear to be very useful for

discriminating between other combinations of species. While it is obvious that the variation in percentage of canopy cover for the dominant species, the variations in species mix, and the physical variation in canopy structure of the dominant species, would be contributing factors to the observed variations in permeability, it is likely that variations in lidar parameters, such as footprint size and laser intensity, would also contribute. For example, an examination of the details behind the outlying observation for the group of Cypress Pine dominant sites revealed that while that site exhibited the highest vegetation permeability, it also, unexpectedly, had the highest of the canopy cover dominances in that group. (Cypress Pine comprised 90% of the canopy cover for that site.) The site also exhibited considerable variation in laser hit density and had the highest intensity return average for that group of sites. An attempt to understand some of the causes of variation in intensity is covered in the next section.

[Figure: "Site Vegetation Permeability V Species". y-axis: Site Vegetation Permeability (0 to 1). x-axis: Species (SLI, SBA, PBX, CP-, ECH, BGL, GBO).]
Fig. 3. Vegetation permeability plotted against dominant species type.

4.2. Exploratory analysis of intensity returns: observations and discussion

For the particular dataset examined, the following observations were made:

a) Vegetation intensity returns are highly, positively correlated with ground intensity returns (Fig. 4). The high correlation (correlation coefficient r = 0.82, p-value = 0.000) between ground and vegetation intensity depicted in Fig. 4 is noteworthy. It would seem more than coincidental that average vegetation intensity is not independent of average ground intensity if the incident lidar pulse intensities were approximately constant across all subplots. It suggests that a large component of the between site variation in both ground and vegetation intensity returns in our study is related to variation in the strength of the incident laser pulse. This could be due to factors associated with the atmosphere, the instrument, or the aircraft's altitude and attitude.

b) There is a strong positive correlation (correlation coefficient r = 0.93, p-value = 0.000) between average vegetation intensity returns and standard deviation of vegetation intensity returns (Fig. 5). This type of relationship usually evokes statistical implications for modelling, hypothesis testing and interpretation. However, in this case, the relationship first requires questioning from the perspective of constant laser incident intensity. If the laser intensity was controlled to provide a constant incident intensity, the standard deviation of vegetation return intensity may then be independent of the vegetation average return intensity and both may then be measures of different forest attributes. The

possible cause of the largest component of the variation in both average intensity returns and standard deviation of intensity returns will be further explored below.

[Figure: "Correlation between Ground Intensity and Vegetation Intensity". x-axis: Average Intensity for Subplot Ground First Returns. y-axis: Average Intensity for Subplot Vegetation First Returns. Correlation coefficient = 0.82, p-value = 0.0000.]
Fig. 4. Shows a strong correlation between ground and vegetation intensity returns.

[Figure: "Correlation between Vegetation Intensity and Standard deviation of Vegetation Intensity". x-axis: Average Intensity for Subplot Vegetation First Returns. y-axis: St. Dev. Intensity for Subplot Vegetation First Returns. Correlation coefficient = 0.93, p-value = 0.0000.]
Fig. 5. Variation in vegetation intensity returns is correlated with the average intensity return.

c) Relatively large step changes and drifts in the mean ground and vegetation intensity returns were apparent in six of the subplots. An example is shown in Fig. 6a. These changes were not associated with any observed or measured terrain features within their respective subplots. The rise or fall in average intensity in every case occurred only in the East–West direction, which was the flight direction of the aircraft. Since the orientation of the direction of change was not independent of the aircraft direction in all six cases, it was thought possible that an aircraft or instrument related factor, such as variation in aircraft height or variation in laser output power, was a greater contributor to the variation than atmosphere or terrain factors. Optech (personal communication, 2004) indicated that the "power output of the laser can vary slightly

depending on ambient temperature of the diodes and decreasing with age (i.e. hours of operation)." However, they also stated that variation from this source is controlled by "thermal-electric coolers to temperature stabilize the diodes." It is not known whether this systematic variation has been observed for other datasets. However, the contributions of variation in intensity output should be examined in future studies for which intensity of return is a particular measure of interest in the research project. For those studies, pulse output intensity data should be specified to be included with the remotely sensed data in order to confirm output intensity stability over the time of the data collection period. Formulae that show the relationships between pulse power (pulse intensity), pulse duration, pulse energy and pulse rate are given by Baltsavias (1999a).

[Figure: two panels titled (a) "Plot 111 Subplot 12 Vegetation & Ground First Intensity Returns" and (b) "Plot 138 Subplot 16 Vegetation & Ground First Intensity Returns". x-axis: Easting. y-axis: Raw Intensity. Series: Ground Intensity, Vegetation Intensity.]
Fig. 6. (a) For Plot 112 Subplot 12, there is a step change in average vegetation and ground intensity returns and a corresponding change in intensity variation, (b) For Plot 138 Subplot 16, average ground and vegetation intensity returns are relatively constant, with constant variation.

For the remaining subplots, the mean ground and vegetation intensity returns were relatively stable over the area of each subplot (Fig. 6b). As shown earlier in Fig. 4, the average ground intensities ranged from about 15 to 130 units with corresponding, correlated average vegetation intensities in each case. The changes in average intensity between subplots were of a similar nature to the within subplot changes previously described. The similarities of magnitude and nature of the changes within and between subplots suggested that the same "instrument" process was operating to produce the largest component of the intensity variation in the dataset. Any between subplot intensity differences due to terrain or atmosphere properties could not be determined without correcting for the effects of the variation due to the unknown "instrument" causes. In the absence of any basis for making appropriate corrections relative to real ground or vegetation reflectivity differences, ground intensity first returns were scaled to an average intensity of 100 units for each subplot and the corresponding vegetation intensities scaled proportionally to match. While this scaling was inadequate as a process for full correction, it allowed the exploration of the shapes of the distributions of vegetation intensity returns on a similar scale for each subplot.

d) The within subplot intensity histograms for vegetation singular returns compared to vegetation first-only returns show consistently different shapes for each type of return across the subplots (Fig. 7). The histograms for the intensities of vegetation singular returns are generally approximately symmetric

while the vegetation first-only returns are generally right skewed. The different shape of the intensity distribution for singular returns, compared to first-only returns, indicates that the singular returns are a result of a different process of interaction with the canopy. For singular returns, it is proposed that there is an insufficient coherent portion of the pulse footprint remaining with sufficient energy for a significant second reflection from a lower level. Singular returns are not just a result of unrecorded or "missing" second amplitudes. The variation in intensity of singular returns does not contain a large component that is due to variation in the portions of the footprints intercepted by vegetation objects. Instead, the shape of the distribution for singular returns is more likely a result of the variation in reflection due to the arrangement of vegetation objects, contained within "nearly whole" footprint areas projected into the canopy. Note that the reflecting vegetation "surface" is not likely to be one integral planar unit having uniform reflectivity, but instead, may contain varying surface relief, varying surface orientations, and gaps between vegetation objects over the area of the projected footprint. The overall intensity of reflection depends on the amplitude of the integral energy reflected from the individual objects at essentially the same elevation within the footprint, having their own NIR reflectivity and surface component normal to the incident beam. For first-only returns, in addition to the intensity variation described above for singular returns, a large component of the variation in intensity is due to the variation in the intercepted portions of the footprint pulse energy involved in the reflections. The larger the portion of pulse energy intercepted, the greater the opportunity for a large intensity return.

[Figure: "Plot 142 Subplot 18: Histograms of Scaled Vegetation Intensity Return by Type". x-axis: Scaled Intensity. y-axis: Relative Frequency. Upper histogram: Vegetation Singular Returns, n = 89. Lower (mirrored) histogram: Vegetation First Only Intensity Returns, n = 1049.]
Fig. 7. Shows different shape and location of intensity return distributions for singular and first-only vegetation returns.

e) The intensity distribution for lastveg–grd returns is completely different from that of the direct ground returns despite being reflected from essentially the same surface (Fig. 8). After a portion of the laser pulse is first reflected by vegetation, the maximum possible reflected intensity from the ground is reduced according to the pulse energy remaining within the portion of footprint available for a second reflection. This illustrates how a large component of the variation in combined intensity data is not necessarily just related to variation in reflective surface properties but is also related to the portions of the footprints available for reflection. Since the distribution of laser energy over the laser footprint is not uniform but Gaussian, the measured backscattered intensity of a target smaller than the footprint involved in a first reflection is not only dependent on the radar cross section of the target but also on the position

of the target within the footprint. Similarly, for a target larger than the footprint, when only a portion of the footprint is involved in a second reflection, the measured backscattered intensity depends not only on the radar cross section of the target but also the location of the footprint portion within the original footprint projection.

[Figure: "Plot 142 Subplot 18 Histograms of Last Ground Return Intensities". x-axis: Scaled Intensity. y-axis: Relative Frequency. Upper histogram: Last Veg-Grd Returns, n = 1005. Lower (mirrored) histogram: Last Grd-Grd Returns, n = 1668.]
Fig. 8. Shows a different shape and location of intensity return distributions for lastgrd–grd and lastveg–grd returns.

f) Patterns in vegetation return intensity distributions are revealed in bivariate scatterplots of intensity by height and type of reflection for each subplot. For the examples shown in Fig. 9, singular and lastveg–veg returns are predominant for vegetation reflection heights below about 5 m. At heights above this level, first-only returns are predominant and generally contained within a region characterised by an increase in range of intensity with increase in reflection height. Singular returns are present at the greater heights but distributed more

towards higher intensities. A lack of representation of strikes in the mid-height range at the higher intensities is apparent as a roughly V shaped gap in the scatterplot. Variations occur, from the generalised pattern described for the examples above, depending on the extent of canopy cover and mix of vegetation species and their height distributions. In some cases (Fig. 9), there is a reasonably clear delineation of the distribution of lastveg–veg returns with height below the canopy suggesting that they are associated with an outer reflecting surface of understorey vegetation or lower canopy layer rather than with a distribution of reflection heights through the canopy. Variations in the intensity/height patterns of the singular returns are likely to be determined by the spatial distributions of vegetation surfaces with low laser permeability in direct view of the laser (some canopy species as well as shrubs and young trees exposed in the canopy gaps). The intensity/height distribution patterns of first-only returns are more likely to be determined by the edges created by canopy openness (tree crown outlines) and the spatial distribution of within crown foliage gaps that are large enough to allow a sufficient portion of the laser footprint to penetrate and register a second reflection from a lower level. The significance of variations in the intensity component of these plots would be better understood if an appropriate intensity return correction could be made to allow for different incident intensities between survey sites as described in Section 4.2c). The importance of controlling incident intensity is highlighted in the next section.

[Figure: two panels titled (a) "Plot 142 Subplot 18: Scaled Intensity Return Versus Reflection Height" and (b) "Plot 111 Subplot 18: Scaled Intensity Return Versus Reflection Height". x-axis: Laser Hit Height (m). y-axis: Scaled Intensity. Series: Last Veg-Veg, Veg First-Only and Singular (scaled) intensity returns.]
Fig. 9. (a) and (b) Examples of intensity by height scatterplots. Similar patterns exist for the different types of vegetation returns across different subplots.

g) Variations of incident intensity show up as variations in brightness and contrast in grey scale intensity return images. Fig. 10a demonstrates that the lidar intensity measurements are sufficiently sensitive to differences in ground texture to enable unsealed roads or tracks to be differentiated from the surrounding off-road ground. While vegetation can be clearly distinguished from the ground, the intensity image does not readily allow visual discrimination between different vegetation types. The systematic intensity variation described in comment 4.2c) is clearly shown in a grey scale intensity image of Plot 114 (Fig. 10b). This variation was definitely unrelated to the terrain or atmosphere and due to the instrument. The variation in image brightness and contrast is not unlike that expected if both the camera aperture and exposure times were varied when creating photographic images. Properly correcting for the intensity variation would involve correcting for both brightness and contrast given some ground references. It follows that the laser output pulse power, duration, and footprint size, should be controlled in such a way that there is a consistent "photographic flash exposure" of ground objects to enable the intensity return statistics to be useful for discrimination between subplots having different dominant species. This would result in consistency of brightness and contrast of the intensity image, for similar soil and vegetation terrains (given similar atmosphere transmission properties for the laser, low variation in flying height and consistent detection settings of the instrument for return intensity recording).

[Figure: two grey scale intensity images, panels (a) and (b).]
Fig. 10. (a) Grey scale intensity image showing visible differences in vegetation, road and off-road intensities (Plot 144), (b) Grey scale intensity image showing variation in brightness and contrast (Plot 111).

h) Average vegetation intensity return as a site variable, after scaling to "correct" for differences in average ground return intensity, appeared to assist visual discrimination between Poplar Box and Cypress Pine dominant sites (Fig. 11). The standard deviation of scaled vegetation intensity returns for sites was not shown to have the same effect. However, we held reservations about the accuracy of any statistical significance of these intensity variables because of the difficulties described in Section 4.2g). Despite the statistical reservations, the main purpose for investigating the intensity observations, outlined in this section, is to understand why variables

based on intensity statistics may be useful for species discrimination, supporting the findings by Schreier et al. (1985) and Holmgren and Persson (2004). There are many structural features of forests and trees that will contribute to the average return intensity and the standard deviation of the return intensity distribution. Any features of forests and trees that increase the opportunity to intercept only portions of the lidar footprint will lower the average and increase the standard deviation of the vegetation intensity returns. These features relate to the length of exposed edges of tree crowns in open forest canopies and the openness between foliage components within individual tree crowns. Conversely, the features of forests and trees that will increase the opportunity for singular amplitude reflections, due to "whole footprint" interceptions by foliage components at one elevation, will increase the average and decrease the standard deviation of vegetation intensity returns. These features relate to the closure of the forest canopy and the closure of large gaps within the individual tree crowns. The vegetation structural attributes contributing to variation in the portions of intercepted footprint are on a large scale relative to the laser footprint size. On a smaller scale, features of foliage contained within the intercepted footprint area also contribute to average intensity return and standard deviation of intensity return. The features of the intercepting vegetation elements within the footprint include reflectivity, surface morphology and surface area distribution, surface angle distribution, and between surface (interleaf) gap distribution. Garcia-Haro and Sommer (2002) give a description of NIR directional reflectance properties of vegetation canopies in terms of canopy architecture parameters, but under passive illumination by sunlight.

[Figure: "Ave Vegetation Intensity (Scaled) V Species". y-axis: Ave Vegetation Intensity (Scaled). x-axis: Species (SLI, SBA, PBX, CP-, ECH, BGL, GBO).]
Fig. 11. The average vegetation return intensity (after scaling) for each subplot, plotted against the dominant species for each subplot.

The two scales of influence on intensity statistics could mean that in some cases, the larger scale, forest structural differences will be the determining factor for significant differences in intensity return statistics between sites independent of whether there are dominant species differences between sites. In other cases the smaller scale, "within footprint", differences between vegetation elements will be the main determining factor for significant differences in intensity return statistics between sites that do have dominant species differences. For example, Schreier et al. (1985) found significant differences in intensity return between Jack pine plantations having different stem densities. The site with the higher stem density "showed a small but statistically significant increase in intensity." They also found that there were no observable differences in reflection between a young Jack pine plantation and the adjacent Scots pine plantation despite the fact that they had very different tree densities. Furthermore, at an individual tree level, they found that "generally coniferous trees have significantly lower reflection values than broadleaf trees."

Following from the above discussion, it becomes obvious that vegetation intensity returns are not solely related to the NIR reflectance properties of the different species and therefore intensity return statistics on their own are not sufficient to effectively discriminate between species. Schreier et al. (1985) suggested that a combination of variables, including tree height, is required for automated analysis.
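The subplot-level variables discussed in Sections 4.1 and 4.2 can be brought together in a short sketch. It is illustrative only and rests on several assumptions: the function and argument names are hypothetical, the number of first vegetation returns is taken as the denominator for both the proportion of singular returns and vegetation permeability, the sample standard deviation is used, and the example values are invented rather than taken from the study data. The scaling step follows the description in Section 4.2c): ground first-return intensities are scaled to a subplot mean of 100 and vegetation intensities are scaled by the same factor.

```python
from statistics import mean, stdev


def subplot_variables(n_first_veg, n_singular, n_last_veg_grd,
                      ground_first_intensity, veg_first_intensity):
    """Summarise one subplot by return-type proportions and scaled intensity."""
    # Scale so that the ground first returns average 100 units.
    scale = 100.0 / mean(ground_first_intensity)
    scaled_veg = [value * scale for value in veg_first_intensity]
    return {
        # Share of first vegetation returns with a single amplitude (assumed denominator).
        "proportion_singular": n_singular / n_first_veg,
        # Share of first vegetation strikes followed by a ground reflection.
        "vegetation_permeability": n_last_veg_grd / n_first_veg,
        "scaled_mean_intensity": mean(scaled_veg),
        "scaled_sd_intensity": stdev(scaled_veg),
    }


# Hypothetical subplot with three first vegetation returns.
result = subplot_variables(
    n_first_veg=3,
    n_singular=1,
    n_last_veg_grd=2,
    ground_first_intensity=[40, 60],
    veg_first_intensity=[10, 20, 30],
)
print(result)
```

Because the scale factor is derived from the ground returns of the same subplot, any real difference in ground reflectivity between subplots is absorbed into the scaled vegetation values, which is the limitation noted in the text.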
Similarly and obversely, Törmä (2000), when estimating tree species proportions in forest stands using airborne laser scanning, concluded that only using 3D coordinates was not enough and better results would probably be achieved by using intensity data together with detection of individual trees. Holmgren and Persson (2004), following Törmä's suggestion, used high-density laser data and algorithms that enabled separation and modelling of individual tree crowns. This then enabled many tree shape parameters, intensity parameters and laser return type proportions to be determined and used in discriminant analysis. Their analysis revealed that, while a combination of six of these variables resulted in a classification accuracy of 95%, a high classification accuracy could be achieved with just two variables that could be extracted on a stand level without any tree segmentation: the standard deviation of intensity of all vegetation returns and the proportion of first returns. For their study these variables gave classification accuracies of 83.6% and 78.0% respectively when used as separate, individual classifying variables (Holmgren and Persson, 2004, p. 421).

4.3. Height distributions: observations and discussion

Separating the heights for the different types of returns and plotting separate histograms can aid in seeing where differences occur in the vertical distribution of laser hits of different types. This is a one-dimensional view of the bivariate intensity/height distributions described in Section 4.2f). Vegetation heights, measured by laser, are normally determined by the firstveg returns, which include both singular and first-only returns. In a single histogram of laser determined vegetation heights it can sometimes be difficult to see that the forest is composed of more than one distribution of heights (Fig. 12). The separate histograms of the heights of vegetation first-only returns, singular returns and last returns appear to distil separate height distribution patterns for different foliage types. The histograms of singular and last returns tend to emphasize the heights of the smaller trees and shrubs. (See also Fig. 9 and Section 4.2f).) It is not known whether different patterns in these secondary height histograms across different sites exclusively represent different forest structural information or whether variations in lidar pulse parameters also contribute.
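Separating heights by return type before building histograms can be sketched as below. This is a minimal illustration, not the processing used in the study: the function name, the 2 m bin width and the height values are all hypothetical. First vegetation returns are formed as the union of singular and first-only returns, as described in the text.

```python
from statistics import mean, stdev


def height_histogram(heights, bin_width=2.0):
    """Relative frequency of heights per bin, keyed by the bin's lower edge."""
    counts = {}
    for h in heights:
        lower = int(h // bin_width) * bin_width
        counts[lower] = counts.get(lower, 0) + 1
    n = len(heights)
    return {lower: c / n for lower, c in sorted(counts.items())}


# Hypothetical reflection heights (m) for one subplot.
singular_heights = [1.0, 1.5, 3.0, 12.0]
first_only_heights = [9.0, 11.0, 13.0, 15.0]

# First vegetation returns comprise both singular and first-only returns.
first_heights = singular_heights + first_only_heights

singular_hist = height_histogram(singular_heights)
first_hist = height_histogram(first_heights)
mean_height = mean(first_heights)
sd_height = stdev(first_heights)
print(singular_hist, first_hist, mean_height, sd_height)
```

In this invented example the singular returns concentrate in the lowest bins, which is the kind of pattern the separate histograms are intended to expose and which the combined first-return histogram tends to dilute.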
For the purpose of our investigation, average vegetation height and standard deviation of vegetation height were the only height parameters used in the exploratory linear discriminant analysis described below.

[Figure: "Plot 142 Subplot 18: Distributions of Reflection Heights". x-axis: Height (m). y-axis: Relative Frequency. Upper histogram: Vegetation First Returns, n = 1138. Lower (mirrored) histogram: Vegetation Singular Intensity Returns, n = 89.]
Fig. 12. Example of different distributions of reflection heights for first returns and singular returns.

4.4. Exploratory species classification: results and discussion

Linear discriminant analysis was used to explore the potential of lidar subplot variables to correctly predict the "dominant species" for each subplot. The dominant species for each subplot was designated to be the species having the greatest proportion of canopy area based on tree classification and measurements of crown diameter in the field for trees with DBH > 0.1 m. To create the dependent variable, each of the 31 subplots was assigned to one of four groups according to its dominant species: Cypress Pine (CP), Poplar Box (PBX), Smooth barked apple (SBA) and Other. The lidar predictor or classifying variables that were examined are listed below.

1. Proportion of singular returns
2. Vegetation permeability
3. Average vegetation first return height
4. Standard deviation of vegetation first return heights
5. Average vegetation first-only return height
6. Standard deviation of vegetation first-only return heights
7. Average of scaled vegetation intensity return
8. Standard deviation of scaled vegetation intensity return
9. FPC

The proportion of singular returns, vegetation permeability and averaged scaled intensity returns competed for the position of the main variable to be included in the discriminant model. The correlation matrix and scatterplot matrix of the predictor variables revealed statistically significant linear correlations between these competing main variables. A classification accuracy of 77% was achieved for the species grouping variable using the lidar predictor variables: proportion of singular returns, average of first return vegetation height, standard deviation of first return vegetation height, and FPC. The proportion of singular returns contributed most of the discriminatory power. Fig. 13 shows a bivariate cluster plot of this model’s discriminant scores for each of the 31 sites. The plotting symbols represent the pre-classification groups to which each site belongs. The ellipses contain the scores for sites belonging to the predicted classification. (The ellipses are not equivalent to confidence regions.)

Fig. 13. Bivariate cluster plot of the first two linear discriminants for classifying survey sites by dominant tree type. [Axes: Linear Discriminant Score 1 against Linear Discriminant Score 2; groups CP, PBX, SBA and Other.]

It should be noted that this exploratory discriminant analysis is post hoc and independent data are required for verification of the importance of the model variables. The model, and its predictive discriminative ability, makes no allowance in the dependent (predicted) variable for the degree of species dominance or contribution to the canopy dominance of the trees with DBH < 0.1 m. It also makes no allowances in the predictor variables for separate distributions of species heights within subplots, nor variation in the ages of same species trees within and between sites. Without full consideration of the mix of height distributions in complex mixed age and mixed species forests, the importance of including general height statistics in the discriminant model should be treated with some caution. For automating lidar discrimination of species in such forests, the different height distributions may need to be taken into account.
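The kind of exploratory linear discriminant analysis described in this section can be sketched as follows. This is a minimal two-predictor illustration with a pooled covariance matrix and priors proportional to group size, not the authors' model or software. The function names `fit_lda` and `predict`, the group labels `G1` to `G3` and all data values are invented for illustration. The two columns stand in for a proportion of singular returns and an average first return height.

```python
import math
from collections import defaultdict


def fit_lda(rows, labels):
    """Fit linear discriminant functions for two predictors.

    Returns {group: (w_x, w_y, constant)}; a row is assigned to the
    group whose linear score w_x*x + w_y*y + constant is largest.
    """
    groups = defaultdict(list)
    for row, label in zip(rows, labels):
        groups[label].append(row)
    n, k = len(rows), len(groups)
    means = {
        g: (sum(r[0] for r in rs) / len(rs), sum(r[1] for r in rs) / len(rs))
        for g, rs in groups.items()
    }
    # Pooled within-group covariance matrix (2 x 2, symmetric).
    sxx = sxy = syy = 0.0
    for g, rs in groups.items():
        mx, my = means[g]
        for x, y in rs:
            sxx += (x - mx) ** 2
            sxy += (x - mx) * (y - my)
            syy += (y - my) ** 2
    dof = n - k
    sxx, sxy, syy = sxx / dof, sxy / dof, syy / dof
    det = sxx * syy - sxy * sxy
    inv = ((syy / det, -sxy / det), (-sxy / det, sxx / det))
    model = {}
    for g, (mx, my) in means.items():
        wx = inv[0][0] * mx + inv[0][1] * my
        wy = inv[1][0] * mx + inv[1][1] * my
        prior = len(groups[g]) / n
        model[g] = (wx, wy, -0.5 * (wx * mx + wy * my) + math.log(prior))
    return model


def predict(model, row):
    """Return the group with the highest discriminant score for `row`."""
    x, y = row
    return max(model, key=lambda g: model[g][0] * x + model[g][1] * y + model[g][2])


# Invented subplot values: (proportion of singular returns, mean height in m).
rows = [
    (0.10, 12.0), (0.12, 13.0), (0.08, 11.5), (0.11, 12.5),
    (0.30, 9.0), (0.32, 10.0), (0.28, 9.5), (0.31, 8.5),
    (0.20, 16.0), (0.22, 17.0), (0.18, 16.5), (0.21, 15.5),
]
labels = ["G1"] * 4 + ["G2"] * 4 + ["G3"] * 4

model = fit_lda(rows, labels)
predicted = [predict(model, row) for row in rows]
accuracy = sum(p == t for p, t in zip(predicted, labels)) / len(labels)
print("resubstitution accuracy:", accuracy)
```

The accuracy printed here is a resubstitution figure computed on the same rows used for fitting, so it is optimistic. This mirrors the caution above that independent data are needed to verify the importance of the model variables.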

5. Conclusions

Through exploratory analysis of lidar data using basic statistical tools the following primary conclusions were drawn.

1. The distributions of lidar intensity returns from vegetation do not solely represent variations in vegetation reflectivity. For this reason the intensities of the individual return signals cannot be used directly to distinguish between separate vegetation components such as branches and green leaves.
2. Average lidar return intensity and intensity variation may be useful variables to assist with species discrimination but the incident NIR pulse intensity needs to be reasonably constant over the time and space in which comparisons are made. The average return intensity and standard deviation of return intensity are affected by forest structure as well as the reflective properties of the vegetation.
3. Combined intensity/height distributions for the different types of laser returns contain information on different reflective layers in the forest structure.
4. The proportion of singular returns and the vegetation permeability index are two variables that show potential to assist with species discrimination using ALS.

Acknowledgements

The authors would like to acknowledge the support provided for this work by the Australian Research Council under their linkage program, the Queensland Department of Natural Resources and Mines, and the University of Newcastle, N.S.W. They would also like to thank and acknowledge the work of those people involved in the original Injune remote sensing and field surveys from which the data for this study was sourced. The initial study was supported by the Australian Research Council, the Queensland Department of Mines and Natural Resources, the University of NSW, Bureau of Rural Sciences, Queensland University of Technology, Queensland Dept. Primary Industries Tropical Beef Centre, and the Cooperative Research Centre for Greenhouse Accounting. The authors would also like to thank Optech Incorporated, Ontario, Canada (Wayne Szameitat, Survey Applications Specialist) and AAM Hatch, Brisbane, Australia (David Jonas, ALS Operations Manager) for their communications on technical aspects of Optech equipment. In addition, the authors would like to thank Dr. Richard Lucas (University of Wales, Aberystwyth, UK) and the two anonymous reviewers for their constructive comments and suggested improvements.

References

Ackermann, F., 1999. Airborne laser scanning – present status and future expectations. ISPRS Journal of Photogrammetry and Remote Sensing 54 (2–3), 64–67.
Andrienko, G., Andrienko, N., Gitis, V., 2003. Interactive maps for visual exploration of grid and vector geodata. ISPRS Journal of Photogrammetry and Remote Sensing 57 (5–6), 380–389.
Baltsavias, E.P., 1999a. Airborne laser scanning: basic relations and formulas. ISPRS Journal of Photogrammetry and Remote Sensing 54 (2–3), 199–214.
Baltsavias, E.P., 1999b. A comparison between photogrammetry and laser scanning. ISPRS Journal of Photogrammetry and Remote Sensing 54 (2–3), 83–94.
Garcia-Haro, F.J., Sommer, S., 2002. A fast canopy reflectance model to simulate realistic remote sensing scenarios. Remote Sensing of Environment 81 (2–3), 205–227.
Gaveau, D.L.A., Hill, R.A., 2003. Quantifying canopy height underestimation by laser pulse penetration in small-footprint airborne laser scanning data. Canadian Journal of Remote Sensing 29 (5), 650–657.
Goulevitch, B.M., Danaher, T.J., Stewart, A.J., Harris, D.P., Lawrence, L.J., 2002. Mapping woody vegetation cover over the state of Queensland using Landsat TM and ETM+ imagery. Proceedings of the 11th Australasian Remote Sensing and Photogrammetry Conference, September, Brisbane, Australia.
Holmgren, J., Persson, Å., 2004. Identifying species of individual trees using airborne laser scanner. Remote Sensing of Environment 90, 415–423.
Jonas, D., 2002. Airborne laser scanning: developments in intensity and beam divergence. Proceedings of the 11th Australasian Remote Sensing and Photogrammetry Conference, September, Brisbane, Australia.
Kraak, M.-J., 2003. Geovisualization illustrated. ISPRS Journal of Photogrammetry and Remote Sensing 57 (5–6), 390–399.
Lim, K., Treitz, P., Baldwin, K., Morrison, I., Green, J., 2003. Lidar remote sensing of biophysical properties of tolerant northern hardwood forests. Canadian Journal of Remote Sensing 29 (5), 658–678.
Lovell, J.L., Jupp, D.L.B., Culvenor, D.S., Coops, N.C., 2003. Using airborne and ground-based ranging lidar to measure canopy structure in Australian forests. Canadian Journal of Remote Sensing 29 (5), 606–622.
Lucas, R.M., Tickle, P., Witte, C., Milne, A.K., 2001. Development of multistage procedures for quantifying the biomass, structure and community composition of Australian woodlands using polarimetric radar and optical data. Proceedings IEEE Int. Geoscience and Remote Sensing Symposium, University of New South Wales, Australia, pp. 1353–1355.
Masahiko, N., Ryosuke, S., Dinesh, M., Huijing, Z., 2004. Development of digital surface model and feature extraction by integrating laser scanner and CCD sensor with IMU. International Archives of Photogrammetry and Remote Sensing, XXth ISPRS Congress Symposium, July 12–23, Istanbul, pp. 781–785. Volume XXXV, part B5.
Microsoft Corporation, 2000a. Microsoft® Access version 2000. Microsoft Pty Ltd., Australian Subsidiary, North Ryde, NSW, Australia.
Microsoft Corporation, 2000b. Microsoft® Excel version 2000. Microsoft Pty Ltd., Australian Subsidiary, North Ryde, NSW, Australia.
Paterson, M., Lucas, R.M., Chisholm, L., 2001. Differentiation of selected Australian woodland species using CASI data. Proceedings IEEE Int. Geoscience and Remote Sensing Symposium, University of New South Wales, Australia, pp. 643–645.
Schreier, H., Lougheed, J., Tucker, C., Leckie, D., 1985. Automated measurements of terrain reflection and height variations using an airborne infrared laser system. International Journal of Remote Sensing 6 (1), 101–103.
Tickle, P.K., Witte, C., Lee, A., Lucas, R.M., Jones, K., Austin, J., 2001. The use of airborne scanning lidar and large scale photography within a strategic forest inventory and monitoring framework. Proceedings IEEE Int. Geoscience and Remote Sensing Symposium, University of New South Wales, Australia, pp. 1000–1003.
Törmä, M., 2000. Estimation of tree species proportions of forest stands using laser scanning. International Archives of Photogrammetry and Remote Sensing, ISPRS Congress Symposium, July 16–23, Amsterdam, pp. 1524–1531. Volume XXXIII.
Wehr, A., Lohr, U., 1999. Airborne laser scanning – an introduction and overview. ISPRS Journal of Photogrammetry and Remote Sensing 54 (2–3), 68–82.