ELSEVIER
Critique of Stepwise Multiple Linear Regression for the Extraction of Leaf Biochemistry Information from Leaf Reflectance Data Y. L. Grossman, S. L. Ustin,* S. Jacquemoud,* E. W. Sanderson,* G. Schmuck, t and J. Verdeboutt T h i s study examined the use of stepwise multiple linear regression to quantify leaf carbon, nitrogen, lignin, cellulose, dry weight, and water compositions from leaf level reflectance (R). Two fresh leaf and one dry leaf datasets containing a broad range of native and cultivated plant species were examined using unconstrained stepwise multiple linear regression and constrained regression with wavelengths reported from other leaf level studies and wavelengths derived from chemical spectroscopy. Although stepwise multiple linear regression explained large amounts of the variation in the chemical data, the bands selected were not related to known absorption bands, varied among datasets and expression bases for the chemical [concentration (g g-1) or content (g m-2)], did not correspond to bands selected in other studies, and were sensitive to the samples entered into the regression. Stepwise multiple regression using artificially constructed datasets that randomized the association between nitrogen concentration and reflectance spectra produced coefficients of determination (R2"s) between 0.41 and 0.82 for first and second derivative log(1/R) spectra. The R2"s for correctly-paired nitrogen data and first and second derivative log(1/R) only exceeded the average randomized R2"s by O.02-0.42. Replication of this randomization experiment on a larger dry ground leaf data set from the Harvard Forest showed the same trends but lower R2"s. * Department of Land, Air, and Water Resources, University of California, Davis tAdvanced Techniques Branch, Joint Research Centre, Ispra, Italy Address correspondence to Susan L. Ustin, Dept. of Land, Air, and Water Resources, Univ. of California, Davis, CA 95616. Received 5 November 1994; revised 18 October 1995. REMOTE SENS. ENVIRON. 56:182-193 (1996) ©Elsevier Science Inc,, 1996 655 Avenue of the Americas, New York, NY 10Ol0
All of these results suggest caution in the use of stepwise multiple linear regression on fresh leaf reflectance spectra. Band selection does not appear to be based upon the absorption characteristics of the chemical being examined.
INTRODUCTION The detection of biologically important compounds in plant canopies at regional scales may become possible using airborne sensors with high spectral and spatial resolution (Curran, 1989; ACCP, 1994). The first step in developing techniques to detect these compounds in remotely sensed data is to evaluate laboratory methods for extracting biochemical information from fresh leaves. Laboratory-based statistical techniques known as nearinfrared spectroscopy (NIRS) protocols, have been developed and used extensively for determining protein, carbohydrate, and other biochemical contents from dried, ground agricultural and food products (Williams and Norris, 1987; Card et al., 1988; Marten et al., 1989; Wessman, 1990). The NIRS protocols use stepwise multiple linear regression to develop a calibration equation by selecting a small number of narrow band reflectances that explain a large proportion of the variation in biochemical content (Williams and Norris, 1987; Marten et al., 1989). Frequently, derivative transformations of the reflectance data provide the best explanation of the variation (Hruschka, 1987). These measurements show a high level of reliability and reproducibility when used under carefully controlled conditions. In fact, NIRS methods are now used by many labs in place of analytical chemical techniques to determine chemical composition 0034-4257 / 96 / $15.00 SSDI 0034-4257(95)00235-9
Extraction of Leaf Biochemistry Information 1 83
of dried ground samples (Williams and Norris, 1987; Marten et al., 1989). The NIRS approach has been extended to dried ground plant materials to measure nitrogen, lignin, and other biochemical contents with the goal of developing remote sensing applications (Card et al., 1988; Peterson et al., 1988; Wessman et al., 1988; McLellan et al., 1991). Bolster et al. (1996), however, found stepwise multiple regression did not perform as well on dried green foliage samples as the full spectrum calibration method, partial least squares. The extension of NIRS type protocols to intact fresh leaves may prove difficult. Two types of difficulties arise: 1) The NIRS protocols have been developed and validated on uniformly ground, uniformly packed, optically thick layers of dried, ground material whereas, for use with spaceborne sensors, we wish to determine biochemical information from intact fresh leaves, and 2) the spectrum of water dominates fresh leaf reflectance, potentially masking the reflectance features detected in dry material (Gates, 1970; Tucker and Garratt, 1977; Curran, 1989; Elvidge, 1990). Recent studies on reflectance properties of fresh leaves have used stepwise multiple regression to select wavelengths that correlate with leaf biochemistry (Curran et al., 1992; ACCP, 1994). Johnson and Billow (1996) examined Douglas fir needles that were grown under various fertilization treatments and found that the first derivative of the fresh leaf spectra was strongly correlated with total nitrogen concentration. Yoder and Pettigrew-Crosby (1996) also found first derivative spectra were the best predictors for nitrogen and chlorophyll for bigleaf maple grown under three fertilization treatments. Jacquemoud et al. (1995) determined that stepwise multiple regression of dried leaf and optically thick sample spectra gave higher correlations than did fresh leaves for nitrogen. Curran et al. (1992) examined fresh leaves of Amaranthus and concluded that spectral overlaps between foliar biochemicals needed to be considered to obtain accurate estimates of concentration. In this study we used species-diverse sets of intact fresh and dry leaves to examine the relationship between chemical and spectral characteristics. We have examined the use of unconstrained and constrained stepwise multiple linear regressions to predict carbon, nitrogen, lignin, cellulose, water, and dry weight composition from infrared reflectance. Several data transformations were examined to determine which explains the greatest amount of the variation in the chemical data. To test for spurious correlations, we developed an empirical test to estimate the "baseline" explanation of the variation when there is no statistical relationship between the chemical composition and reflectance spectra. The consistency of the wavelengths selected by stepwise multiple linear regression to changes in dataset composition was tested by running regressions on five subsets of the one dataset.
Table 1. Spectral and Biochemical Analyses Were Conducted on the Following Plant Species Collected at Jasper Ridge Biological Preserve, Stanford University, Stanford, California (a) and at the Joint Research Centre (JRC), Ispra, Italy (b) (a) Jasper Ridge Species Acer macrophyUum Aesculus californica Arbutus menziesii Dirca occidentalis Eriodictyon californicum Heteromeles arbutifidia Ludwigia pacifica Prunus ilicifolia
Quercus agrifolia Quercus kelloggii Quercus lobata Typha angustifolia Umbellularia californica Wyethia s p p Xanthium strumarium
(b) Joint Research Centre Species Acer pseudoplatanus Alnus glutinosa Armeniaca vulgaris Bambusa acundinacea Beta vulgaris Betula alba Brassica oleracea Castanea sativa Chamaerops humilis Corylus avellana Fagus sylvatica Ficus carica Fraxinus excelsior GIycine max Hedera helix Helianthus annuus Iris germanica Juglans regia Lactuca sativa Laurus nobilis Lycopersicon esculentum Medicago sativa Morus alba
Musa ensete Oryza sativa Phleum pratense Phragmites ccnnmunis Platanus acerifolia Populus canadenis Populus tremula Prunus laurocerasus Prunus serotina Quercus pubescens Quercus rubra Robinia pseudoacacia Salix alba Salvia officinalis Solanum tuberosum Sorghum halepense Tilia platyphylla Trifidium pratense Ulmus glabra Urtica dioica Vitis silvestris Vitis vinifera Zea mays
METHODS Study Sites Spectral and biochemical data were obtained at two locations: Jasper Ridge Biological Preserve, Stanford University, Stanford, California, and the Joint Research Centre (JRC), Ispra, Italy. Fifteen native plant species were collected from several coastal California plant communities at Jasper Ridge (Table la). Forty-six plant species, including native and cultivated species, were collected at JRC (Table lb).
Jasper Ridge Biological Preserve At Jasper Ridge, multiple collections of leaf samples were made in June, September, and December 1992 and May 1993. Approximately 10 replicate leaf samples were collected from each plant for spectral analysis. One leaf disk (3.5 cm diameter) was punched from each leaf, weighed, placed into a sample holder against a quartz lens, and backed with a black background for reflectance measurements. Reflectance measurements
184
Grossman et al.
were made with a NIRSystems Model 6500 spectrophotometer (NIRSystems Inc., Silver Spring, Maryland) from 400 nm to 2498 nm at 2-nm intervals with a full width-half maximum slit width of 10 nm. This study only uses the reflectance measurements from the infrared, between 800 nm and 2498 nm. All spectra taken from leaves of one plant were averaged. Following reflectance measurements, leaf disks were wrapped in aluminum foil, frozen in liquid nitrogen, lyophilized, reweighed, and electronically scanned for area measurement. The difference in weight between fresh and dried sample was attributed to water content. An additional 15-20 leaves (bulk samples) of each plant species were wrapped in aluminum foil and frozen in liquid nitrogen for later biochemical analysis. The samples were stored on dry ice during transport, lyophilized, and ground through a 1-~m screen using a ball mill grinder. Leaf powders were stored desiccated at - 7 0 ° C until chemical analyses were performed. A subset of the lyophilized bulk samples was sent to the laboratory of Dr. John Aber, University of New Hamphire, for biochemical analysis. Total carbon and nitrogen were determined with a Perkin-Elmer 2400 CHN Elemental Analyzer (Norwalk, Connecticut). Cellulose and lignin were determined by proximate analysis, a technique of sequential extractions in dichloromethane, water, and sulfuric acid. The sulfuric acid fraction represents cellulose and the residue represents lignin (Eft]and, 1977).
Joint Research Centre (JRC) Leaf samples were collected in the vicinity of the Joint Research Centre (JRC) during the summer of 1993 (Jacquemoud et al., 1995). Five replicate leaf samples were collected from each plant for spectral analysis. Reflectance measurements were made with a PerkinElmer Lambda 19 spectrophotometer (Norwalk, Connecticut) equipped with an integrating sphere. Measurements were made on fresh and dried leaves from 400 nm to 2500 nm at 1-nm intervals, although only data from the infrared portion of the spectrum, at 2-nm intervals were used in this study to match the Jasper Ridge sample interval. Spectral resolution varied from 1-2 nm in the near-infrared to 4-5 nm in the midinfrared. Reflectance spectra from five leaves were averaged for each sample spectrum. Sufficient quantities of fresh leaves were collected to obtain at least 25 g of dry leaf material. Biochemical analyses were performed at the Centre de Recherches Agronomiques, Libramont, Belgium, as described in Jacquemoud et al. (1995). Statistical Manipulations Leaf reflectance measurements were reported as log(1 / reflectance), hereafter identified as log(1/R). Smoothed first derivatives of log(1/R) spectra were produced by taking the difference between log(1 / R) values separated
by 8 nm and then performing a five-channel running average (Johnson and Billow, 1996). Second derivatives of log(1 / R) spectra were obtained by taking the difference between first derivative values separated by 2 nm. Unconstrained stepwise multiple regression was used to determine which six or fewer regressors explained the largest amount of variation in the chemical data. To add or retain a regressor, the partial F statistic must have been significant at the 0.15 level or less. Constrained regressions for nitrogen concentrations were performed using wavelengths identified in other studies (Table 8). Regressions were performed using the reflectance input [first or second derivative log(1/R)] reported by each study. Regression was also performed using wavelengths suggested by theoretical studies based on known absorption features for nitrogen, lignin, and water (Table 9; Curran, 1989). To estimate the "baseline" coefficient of determination (i.e., the coefficient of determination if there were no linear relationship between reflectance and biochemistry), 50 randomized datasets containing nitrogen concentration and content versus log(1 /R), first-derivative, and second-derivative spectra were prepared from the actual datasets for Jasper Ridge fresh, JRC fresh, and JRC dried leaves by randomizing the association between the measured nitrogen concentration or content and the reflectance spectra. Randomizations were done by assigning random numbers to the nitrogen observations, sorting by these random numbers, then assigning the first nitrogen observation to the first spectrum in the actual dataset, the second to the second, and so on. These randomized datasets were submitted to stepwise multiple linear regression, and the resulting 50 coefficients of determination were averaged. An additional dataset, provided by Dr. John Aber and Dr. Mary Martin (described in Bolster et al., 1996) was randomized in the same manner. This dataset contained nitrogen concentrations and reflectance spectra for 186 samples of dried, ground leaves, representing 14 species from the Harvard Forest, Petersham, Massachusetts. The repeatability of wavelength selection was tested by assembling five randomly chosen subsets of 40 samples from the 63 samples in the JRC fresh leaf dataset. Stepwise multiple linear regressions of nitrogen concentration versus log(1 / R) spectra were performed on these subsets. The wavelengths chosen as the best three regressors for each subset were compared for consistency among the five regressions. All statistical procedures were performed using SAS (SAS Institute Inc., Cary, North Carolina).
RESULTS Chemistry Data The chemistry data for the Jasper Ridge dataset contained information from 21-34 samples depending on
Extraction of Leaf Biochemistry Information
185
Table 2. Leaf Chemical Concentrations (g g-l) and Contents (g m -2) for the Jasper Ridge Dataset (a) and the Joint Research Centre (JRC) Dataset (b) Mean (a) Jasper Ridge Dataset Carbon concentration Nitrogen concentration Cellulose concentration Lignin concentration Carbon content Nitrogen content Cellulose content Lignin content Dry weight Water content (b) Joint Research Centre Dataset Carbon concentration Nitrogen concentration Cellulose concentration Ligniu concentration Carbon content Nitrogen content Cellulose content Lignin content Dry weight Water content
0.476 0.018 0.372 0.185 57 2 45 22 123 183 0.474 0.034 0.197 0.102 25 2 11 6 53 115
Standard Deviation 0.021 0.009 0.086 0.031 28 0.8 22 9 57 104 0.029 0.011 0.064 0.064 12 0.5 7 6 23 67
the chemical, representing 14 dicot and 1 monocot species (Table 2a). The chemistry data for the JRC dataset contained information from 63 samples, representing 37 dicot and 9 monocot species (Table 2b). The mean carbon concentrations, 47.6%, Jasper Ridge, and 47.4%, JRC, were not significantly different (Table 2). The mean nitrogen concentration in the Jasper Ridge dataset, 1.8%, was significantly (p< 0.001) lower than the nitrogen concentration in the JRC dataset, 3.4%, although the ranges nearly completely overlapped. The higher mean nitrogen concentrations in the JRC dataset were due to the large number of cultivated plants, which typically have values near or greater than 3%. The mean cellulose and lignin concentrations were significantly higher in the Jasper Ridge dataset than in the JRC dataset, with the ranges overlapping at the lower end and the Jasper Ridge values exceeding the JRC values at the upper end. The mean specific leaf areas differed significantly (8.4 mm2mg -1 and 22.5 mm 2 mg -1 in the Jasper Ridge and JRC datasets, respectively).
Stepwise Multiple Linear Regression Unconstrained Regressions
Stepwise multiple linear regression between 800 nm and 2498 nm with a maximum of six regressors was performed with log(1 / R)-transformed reflectance values and their first and second derivatives as the independent variables. In the Jasper Ridge dataset, the second derivative regressions explained at least 82% of the variation in the chemistry data, and were the best fit regressions
Minimum 0.432 0.008 0.255 0.142 14 0.6 15 7 31 53 0.385 0.012 0.091 0.011 8 0.9 3 0.3 19 46
Maximum 0.511 0.053 0.592 0.260 127 4 99 38 256 552 0.523 0.059 0.372 0.275 67 3 45 30 135 405
n 30 30 26 26 24 24 21 21 34 34 63 63 63 63 63 63 63 63 63 63
for all data (Table 3). The first derivative regressions for lignin and dry weight content (g m- 2) explained the same amount of the variation as the second derivatives. The log(1 /R) and first derivative regressions explained lower proportions of the variation in all other cases, but at least 62% of the variation. Higher explanations of variation were obtained by expressing the data on the basis of content than by expressing the data on the basis of concentration (g g-i) for all regressions except cellulose vs. the first derivative of log(1/R) (Table 3). When the data were analyzed using unsmoothed derivatives, the coefficients of determination were slightly higher [mean difference of 0.04, and 0.03, respectively for first derivative log(1/R), and second derivative log(1/R), with the exception of first derivative log(l/ R) lignin concentration which was 0.23 higher; Grossman et al., 1994]. In most cases in the JRC fresh leaf dataset (Table 4), the first derivative of log(1/R) explained more variation than did log(1/R) or the second derivative of log(1 / R). When the data were expressed on a concentration basis (g g-i), the first derivative of log(1/R) explained the largest proportion of the variation for nitrogen, cellulose, and lignin, and the second derivative of log (1 /R) explained the largest proportion of the variation for carbon. When the data were expressed on a content basis, the first derivative of log(1/R) explained the largest proportion of the variation for carbon, nitrogen, cellulose, and log(1 / R) explained the largest proportion of the variation for lignin and dry weight. Log(1 / R) and
186 Grossman et al.
Table 3. Coefficients of Determination (R~) and Number of Regressors for Best Fit Stepwise Multiple Linear Regression with Six or Fewer Regressors of log(l/R), Smoothed First Derivative of log(l/R), Smoothed Second Derivative of log(l/R) versus Chemical Concentrations (g g-l) and Contents (g m-2) of Fresh Leaves in the Jasper Ridge Dataset Chemical
1st Derivative
log(l/R)
2nd Derivative
R2
#
R2
#
R2
#
Carbon concentration Nitrogen concentration Cellulose concentration Lignin concentration
0.82 0.73 0.73 0.69
5 5 5 4
0.81 0.76 0.90 0.62
3 6 6 2
0.93 0.82 0.91 0.90
6 6 6 6
Carbon content Nitrogen content Cellulose content Lignin content Dry weight content Water content
0.92 0.83 0.95 0.90 0.97 0.94
4 6 6 4 6 5
0.96 0.82 0.88 0.95 0.98 0.97
3 6 3 4 6 6
0.99 0.97 0.98 0.95 0.98 0.98
6 6 6 3 6 6
the first derivative both explained 99% of the variation for water content. At least 60% of the variation was explained for all chemicals except for cellulose and lignin concentrations, for which no significant regressions were found. Higher explanations of variation were obtained by expressing the data on the basis of content than by expressing the data on the basis of concentration for all chemicals except nitrogen (Table 4). In the JRC dry leaf dataset (Table 5), the first or second derivative of log(l/R) consistently explained more of the variation than did log(i/R). When the data were expressed on a concentration basis, the first derivative of log(1 / R) explained the largest proportion of the variation for carbon, cellulose, and lignin. The first and second derivatives explained equal amounts of the variation for nitrogen concentration. When the data were expressed on a content basis, the second derivative of log(1 / R) explained the largest proportion of the variation for carbon, nitrogen, cellulose, lignin, dry weight, and water. In one case, only 35% of the variation was explained. In all other cases, at least 53% of the variation
was explained. Higher explanations of variation were obtained for all chemicals by expressing the data on the basis of concentration than by expressing the data on the basis of content (Table 5). The wavelengths selected by the unconstrained regressions depended upon whether the data were expressed on the basis of concentration or content (e.g., nitrogen, Fig. 1). Over all chemicals analyzed, wavelengths chosen for concentration and content were within 10 nm for less than 6% of the comparisons. "Baseline" Coefficients of Determination for Unconstrained Regressions The largest explanation of variation in nitrogen concentration and content of the randomized datasets was provided by the second derivative of log(1/R) (Table 6). The mean coefficients of determination ranged from 57% to 82% of the variation, however individual coefficients of determination were as high as 91%. The first derivative and log(l/R) regressions explained 4148% and 4-18% of the variation, respectively. The
Table 4. Coefficients of Determination (Rz) and Number of Regressors for Best Fit Stepwise Multiple Linear Regression with Six or Fewer Regressors of log(I/R), Smoothed First Derivative of log(l/R), Smoothed Second Derivative of log(l/R) versus Chemical Concentrations (g g-l) and Contents (g m-2) of Fresh Leaves of the JRC Dataseta Chemical
log(l/R)
1st
2nd
Derivative
Derivative
Rz
#
R2
#
R2
#
Carbon concentration Nitrogen concentration Cellulose concentration Lignin concentration
0.60 0.85 ns ns
6 6 --
0.63 0.86 0.76 0.77
6 6 6 6
0.71 0.79 0.65 0.63
6 6 6 6
Carbon content Nitrogen content Cellulose content Lignin content Dry weight content Water content
0.93 0.77 0.69 0.87 0.94 0.99
6 6 6 6 6 6
0.94 0.81 0.85 0.85 0.93 0.99
6 6 6 6 6 6
0.82 0.74 0.76 0.75 0.80 0.98
6 6 6 6 6 6
ns = no significant regression found.
Extraction of Leaf Biochemistry Information 1 8 7
Table 5. Coefficients of Determination (R2) and Number of Regressors for Best Fit Stepwise Multiple Linear Regression with Six or Fewer Regressors of log(l/R), Smoothed First Derivative of log(l/R), Smoothed Second Derivative of log(l/R) versus Chemical Concentrations (g g-l) and Contents (g m -2) of Dried Leaves of the JRC Dataset Chemical
1st Derivative
log(l/R)
2nd Derivative
R2
#
R2
#
R2
#
Carbon concentration Nitrogen concentration Cellulose concentration Lignin concentration
0.68 0.83 0.81 0.63
6 6 6 5
0.77 0.84 0.89 0.80
6 6 6 6
0.71 0.84 0.82 0.73
6 6 6 6
Carbon content Nitrogen content Cellulose content Lignin content Dry weight content Water content
0.58 0.35 0.57 0.54 0.59 0.59
6 6 6 6 6 6
0.68 0.53 0.62 0.62 0.68 0.69
6 6 6 6 6 6
0.70 0.64 0.68 0.69 0.70 0.70
6 6 6 6 6 6
coefficients of determination were higher for nitrogen content than for nitrogen concentration, except for log(1 /R) regressions for the JRC fresh leaf dataset. Mean coefficients of determination for the Harvard Forest dataset were lower than those for the Jasper Ridge and JRC datasets, but exhibited the same pattern (Table 7). That is, the second derivative of log(1/R) explained the greatest amount of variation (9%), the first derivative explained an intermediate level (5%), and log(l/R) explained the lowest amount (1%). A number of factors may have contributed to lower explanation of variation by random datasets, including the greater species homogeneity in the dataset, lower experimental error due to larger sample numbers, and measurement factors, such as the use of dry ground uniformly packed leaf samples.
Constrained Regressions Wavelengths selected in other studies. The five sets of fixed wavelengths explained 25-44 % of the variation in the nitrogen concentrations in the Jasper Ridge dataset, 14-49% of the variation in the JRC fresh leaf dataset, and 30-71% of the variation in the JRC dry leaf dataset (Table 8). The coefficients of determination for the relationship between measured and predicted nitrogen concentration were less than or equal to the average Figure 1. Wavelengths selected using log(1 / R), first derivative of log(1 / R), and second derivative of log(1 / R) spectra vs. nitrogen content and concentration.
coefficient of determination for randomized datasets (randomized R2) in 10 of 15 comparisons, up to 0.03 units greater than the randomized R2 in two of 15 comparisons, and between 0.20 and 0.29 units greater than the randomized R 2 in three of 15 comparisons. High explanatory power for one dataset did not necessarily indicate high explanatory power for other datasets. For example, although the Johnson and Billow (1996) wavelengths derived from dry Douglas fir needles explained 62% of the variation in the JRC dry leaf dataset, they explained only 25 % of the variation in the Jasper Ridge dataset and 32% of the variation in the JRC fresh leaf dataset. Wavelengths suggested from theoretical studies. Multiple linear regressions using wavelengths suggested from theoretical studies based on known absorption features (Table 9) were inconsistent in the proportion of variation explained for nitrogen, lignin, and water, which explained from 0% to 94% (Table 10). None of the spectral inputs [log(l/R), first derivative log(1/R),
Table 6. Mean Coefficients of Determination (R2) for Best Fit Stepwise Multiple Linear Regression with Six or Fewer Regressors of log(i/R), Smoothed First Derivative of log(i/R), Smoothed Second Derivative of log(l/ R) versus Nitrogen Concentrations (g g ]) and Contents (g m -2) Using 50 Randomized Datasets for Jasper Ridge Fresh Leaves, JRC Fresh and Dried Leaves (Summary of 900 Regressions) Leaf log 1st 2nd Type (I/R) Derivative Derivative
R2 ~omx~
f
•
lslder.conc I-
zx
/
t o n i I"
2rid der, ( ~
~ ~ ,a
•
•
~ • •
zx Lx
zx
o
•
I
I
800
1000
1200
I 1400
•
oo
•
I
0.86
•
o elm
•
A
•
I
I 1800
0.85 0.77
• I 1800
Wavelength (nm)
I
I
I
2000
2200
2400
•
0.81
o
0.79 0,74 2600
Nitrogen concentration Jasper Ridge JRC JRC
Fresh Fresh Dry.
0.08 0.13 0.05
0.41 0.47 0.42
0.75 0.60 0.57
Nitrogen content Jasper Ridge JRC JRC
Fresh Fresh Dry
0.18 0.04 0.13
0.43 0.48 0.47
0.82 0.65 0.62
188
Grossman et al.
Table 7. Coefficients of Determination (R2) for Best Fit Stepwise Multiple Linear Regression with Six or Fewer Regressors of log(l/R), Smoothed First Derivative of log(i/R), Smoothed Second Derivative of log(i/R) versus Nitrogen Concentrations (g g-l) Using 50 Randomized Datasets for Harvard Forest Dried, Ground Leaves
Chemical
Wavelength (nm)
Water
970 1200 1400 1450 1940
Nitrogen
1020 1510 2060 2130 2180 2300
Lignin
1120 1200 1420 1450 1690 1940
I st 2nd log(1/R)Derivative Derivative
Leaf Type Harvard Forest
Table 9. Absorption Features of Water, Nitrogen, and Lignin from Curran (1989).
Dried, ground
0.01
0.05
0.09
and second derivative log(1/R)] consistently explained the nitrogen or lignin concentrations or contents across the three datasets. The results for water content were more consistent. For example, for nitrogen concentration, the coefficients of determination (R 2) for log(1 / R) and the first derivative were higher than the randomized R2's for the three datasets, except for Jasper Ridge first derivative, which was not significant. The second derivative R2's were always lower than the randomized R2's. For nitrogen content, three of the six regressions performed on log(1/R) and the first derivative were not significant, two were greater than the randomized R2's, and one was lower. For lignin concentration, two of the log(1 / R) regressions had moderate R2's but one was not significant, and all of the first derivative RZ's were less than 0.35. The results for water content were more consistent, with the two water-containing datasets (Jasper Ridge and JRC fresh) giving high R2's for log(1/R), first derivative, and second derivative regressions, but low or nonsignificant regressions for the JRC dry leaf dataset (Table 10).
Comparison of Wavelengths Selected by Regressions on Subsets of One Dataset The wavelengths chosen by stepwise multiple linear regression of l o g ( l / R ) for subsets of the JRC fresh leaf dataset exactly coincided with those for the entire dataset for one subset (subset 4, Fig. 2). The regression for the entire dataset explained 49% of the variation in nitrogen concentration. The regressions for the subsets explained 4 1 - 7 1 % of the variation in nitrogen concentration. For two subsets, only one of six selected wavelengths was within 14 nm of a selected wavelength for the entire dataset (subsets 2, 3, Fig. 2). A second selected wavelength in each of these two datasets was
Table 8. Descriptive Information and Coefficients of Determination (R2) for Five Fixed Wavelength Comparisons Using Nitrogen Concentrations and Spectra from the Jasper Ridge and JRC Datasets CoeJflcient of Determination (R2)
JRC Reflectance 1st derivative log(l/R) 980,1194,1644, 1676,2274 1st derivative log(l/R) 1928,2044,2140,2156 2nd derivative log(l/R) 1712,2128,2174 1st derivative log(l/R) 2156,2208,2338 1st derivative log(l/R) 1722,2142,2346
Plant Material
Citation
Jasper Ridge
Fresh Leaves
JRC Dry Leaves
n (Calibration)
Nitrogen Concentration Range
(g g-l)
Fresh maple leaves, single layer
Yoder and PettigrewCrosby, 1996
0.33
0.33
0.30
57
0.018-0.040
Dry, ground deciduous and coniferleaves Fresh, stacked deciduous and conifer leaves Fresh, stacked Douglas fir needles Dry, stacked Douglas fir needles
Bolster et al., 1996
0.44
0.47
0.37
558
0.007-0.028
Martinand Aber, 1996
0.39
0.14
0.71
147
0.011-0.021
Johnson and Billow, 1996 Johnson and Billow, 1996
0.28
0.49
0.70
87
0.014-0.025
0.25
0.32
0.62
87
0.014-0.025
Extraction of Leaf Biochemistry Information
Table 10. Coefficients of Determination (Re) for Regressions of Nitrogen Concentration, Nitrogen Content, Lignin Concentration, Lignin Content, and Water Content against Known Absorption Bands for Compounds Containing These Chemicals" CoeJficient of Determination (Re) 1st 2nd log(l/R) Derivative Derivative Nitrogen concentration Jasper Ridge JRC- fresh JRC- dry
0.51 0.78 0.78
ns 0.60 0.70
0.40 0.26 0.39
Nitrogen content Jasper Ridge JRC- fresh JRC- dry
0.65 0.66 ns
ns 0.47 ns
ns 0.28 ns
Lignin concentration Jasper Ridge JRC- fresh JRC- dry
0.56 ns 0.32
ns 0.23 0.34
0.54 0.21 0.34
Lignin content Jasper Ridge JRC -fresh JRC-dry
0.71 0.36 0.25
0.78 0.40 0.21
0.73 0.28 0.27
Water content Jasper Ridge JRC-fresh JRC- dry
0.88 0.93 0.29
0.65 0.75 ns
0.94 0.85 ns
ns = n o s i g n i f i c a n t r e g r e s s i o n f o u n d .
within 56 nm. For the remaining two datasets, the selected wavelengths were more than 350 nm away from those chosen for the entire dataset. DISCUSSION
Chemistry Data The chemical compositions of the leaves in the Jasper Ridge and JRC datasets were typical of most plant leaves (Table 2). Both datasets included native Mediterraneantype plant species, while the JRC dataset also contained cultivated species (Table 1). The higher mean specific leaf area and cellulose and lignin contents in the Jasper Ridge dataset suggest that the mean leaf was more xerophytic in this dataset than in the JRC dataset. The inclusion of cultivated species in the JRC dataset probably accounted for the higher nitrogen concentration compared to the Jasper Ridge dataset.
189
explained more of the variation than did log(1 / R) (Tables 3-5). Similarly, Yoder and Pettigrew-Crosby (1996) found that the first derivative of log(1/R) explained more of the variation in nitrogen concentration in bigleaf maple fresh leaves, and Johnson and Billow (1996) found the first and second derivatives explained similar amounts of variation in nitrogen concentration in Douglas fir needles. However, the first derivative did not explain a significant amount of variation in protein concentration in Amaranthus leaves (Curran et al., 1992). For the two fresh leaf datasets, more of the variation was explained by expressing the data on a content basis than a concentration basis for all chemicals except cellulose in the Jasper Ridge dataset and nitrogen in the JRC dataset (Tables 3 and 4). This could have resulted from the fact that the spectrometer beam probes the leaf on an area basis. However, this generalization did not hold true for the dry leaf dataset (Table 5). In addition, the results of the randomization study indicated that this apparent pattern may be partially dependent on random processes because the same pattern occurred in eight out of nine comparisons of nitrogen concentration and content (Table 6). In general, the proportions of variation in chemical concentrations and contents explained by the regressions with the highest R2's were somewhat higher in the Jasper Ridge dataset than in the JRC fresh leaf dataset, possibly due to the smaller number of samples in the Jasper Ridge dataset than in the JRC dataset (Tables 3 and 4). Lower R2's were generally obtained for the JRC dry leaf dataset than for the JRC fresh leaf dataset. Similarly, the coefficients of determination for dry needle regressions of Douglas fir were slightly inferior to those for fresh needles (Johnson and Billow, 1996). However, Jacquemoud et al. (1995), using reflectance as the input, found higher coefficients of determination for dry leaves. For all three datasets, the bands selected by stepwise multiple linear regressions differed depending upon whether log(1/R), first derivative log(1/R), or second derivative log(1/R) were used (Fig. 1). Similarly,
Figure 2. Bands selected by stepwise multiple linear regression with three or fewer regressors for log(1 / R) versus nitrogen concentration on the entire JRC fresh leaf dataset and five subsets each containing 63% of the samples in the entire dataset. R2
Stepwise Multiple Linear Regression
o
Entire dataset f
Unconstrained Regressions The data transformation that explained the greatest proportion of the variation in the chemical composition varied with dataset, chemical, and whether the data were expressed as concentration or content (Tables 35). For most chemicals, the first or second derivative
c~
0.49
Subse~ 1 ~-
• •
•
•
•
• Subset 5 ~
•
0.46 •
800
0.41
•
0.60 •
i I 1000
I 1200
0.61
I
I
1~0
I
I
• I
0.71 i
1~i00 ~eoo 2ooo 220o 24o0 2c~0
Wavelength (rim)
190 Grossman et al.
Jacquemoud et al. (1995) reported that different bands were selected for the same chemical in the same leaves when the data were examined in reflectance or transmission modes. As reported by Curran (1989), selected bands rarely corresponded to known features for the chemical being examined. In addition, band selection was heavily dependent upon the basis on which the chemistry was expressed, with selected bands coinciding (selection of bands within 10 nm of one another) in less than 6% of the paired concentration/content regressions (Fig. 1). This suggests that band selection by stepwise multiple linear regression on intact fresh leaves may be sensitive to factors other than the absorption characteristics of the chemicals being examined, including scattering due to cell walls and anatomical characteristics (Peterson and Hubbard, 1992) and spectral overlaps caused by the presence of other biochemicals (Curran et al., 1992). "Baseline" Coefficients of Determination The high coefficients of determination obtained for the randomized datasets suggest that stepwise multiple linear regressions of using log(1/R), first derivative log (1/R), and second derivative log(l/R) "explained" a high proportion of the variation even when no actual relationship existed between the chemical data and the reflectance spectra (Table 6). These coefficients of determination ranged from 41% to 48% for the first derivative and from 57% to 82% for the second derivative of log(1 /R) for the three datasets. Although the number of wavelengths selected was smaller than the number of samples in the datasets, the initial number of wavelengths was quite large (850). The number of samples in the Jasper Ridge and JRC datasets was relatively small, but the JRC dataset is similar in size to several recently reported studies on fresh leaf reflectance and biochemistry (Johnson and Billow, 1996; Yoder and Pettigrew-Crosby, 1996; see Table 8). The number of regressors selected for the JRC dataset meets the criterion given by Hruschka (1987): five to 15 samples for each regression and data treatment constant and for any parameter of the data treatment, such as wavelength, that is allowed to vary. By this criterion, the Jasper Ridge dataset was too small to allow the selection of six regressors. However, the patterns observed with the Jasper Ridge dataset were similar to those observed in the JRC dataset. Use of a larger dataset reduced the magnitude of the "baseline" correlation, but did not eliminate it (Table 7). In any case, when the number of samples is less than the number of initial wavelengths, the number of independent ways in which the wavelengths can vary from sample to sample is limited, a problem termed multicollinearity (Martens and Naes, 1987). Stepwise multiple regression compresses the data to remove this problem, however, the coefficient of determination is
inflated by this data compression (Rencher and Pun, 1980; Birth, 1985). Birth (1985) gives a formula to calculate the coefficient of determination that can be expected when the true correlation is zero. For 850 uncorrelated independent variables and 63 samples, this coefficient of determination is 0.19. We would expect a higher coefficient of determination from stepwise multiple linear regression because more than one regressor was allowed. The lower values obtained for the log(1/R) regressions (Table 6) suggest that some of the independent variables were correlated with others, reducing the number of truly independent variables in the regression. In the regressions using correctly-paired nitrogen concentration and content data and spectra, the first or second derivative generally explained the greatest amount of the variation [compared to log(i/R)], but because the randomized R2's only exceeded the actual R2's by 2-42%, we question how the R2's for the first and second derivative of log(1 / R) should be interpreted. Further caution is suggested since the maximum R2 values obtained from randomized runs of second derivative log(1 /R) are near values reported in the literature as significant relationships between chemistry and reflectance. The nonzero randomized R2 values suggest that to obtain statistical confidence, the "baseline" R2 for randomized data should be established before accepting regressions with high R2's as biologically significant. Constrained Regressions Prior studies have recommended constraining the regression (ACCP, 1994) by using a priori selected wavelengths. In this study, fitting regressions to the wavelengths identified in five studies of fresh or dried leaf material generally did not provide reliable predictions of nitrogen concentration (Table 8). We conclude that none of these sets of fixed bands provided adequate predictive ability across our datasets. The results of regressions using wavelengths suggested by theoretical studies were also inconsistent. Some of the coefficients of determination (R 2) for log (1/R) and the first derivative log(l/R) regressions on nitrogen concentration and content were higher than the randomized R2's for the three datasets, but others were lower or not significant. All of the R2's for the second derivative log(1 / R) were less than the randomized R2's. The R2's for lignin were similarly variable. The results for water content were more consistent, with the two water-containing datasets (Jasper Ridge and JRC fresh) giving high R2's for log(1 / R), first derivative, and second derivative regressions, but low or nonsignificant regressions for the JRC dry leaf dataset. This supports the view that, in fresh leaves, water provides a stronger signal than nitrogen or lignin (Curran, 1989; Elvidge, 1990). Additional explanations for the inability of the constrained regressions to consistently relate reflectance
Extraction of Leaf Biochemistry Information 191
spectra to chemical composition in our datasets may be the wide range of species diversity and the use of intact leaves. Typically, NIRS protocols involve developing species-specific relationships for chemistry prediction. Sample preparation is also closely monitored for drying, particle size, and packing density prior to measurement. Following these techniques, Johnson and Billow (1996) and Kupiec and Curran (1996) obtained good results with monospectific datasets of dried, ground foliage samples. Recently, however, Bolster et al. (1996) reported good stepwise multiple linear regression predictions from large datasets that included a wide range of species, both broadleaf and conifers, and a variety of plant material (leaves, stems, roots, etc.). Furthermore, the ACCP report (1994) stressed the importance of having a calibration dataset that included the range of possible chemical variation rather than limiting species diversity. With our datasets, it was not possible to test these hypotheses for improving model performance by restricting analyses to more homogeneous samples. It should be kept in mind, however, with respect to remote sensing applications, that some environmental heterogeneity is often present within a scene. Although the species in the JRC and Jasper Ridge datasets represent a wide range of foliar conditions and adaptations, both datasets were acquired from commonly occurring plants growing within a radius of 1 km of each other. Thus, potential remote sensing techniques need to accommodate the range of species variability within regions, even if site-specific relationships are to be developed. This study was conducted using single leaf layers that were not optically thick. Jacquemoud et al. (1995) compared reflectance and transmittance on fresh and dry individual leaf and stacked (optically thick) fresh and dry leaves, using five regressors. They reported that the coefficients of determination for single and stacked fresh leaf reflectances were higher for optically thick samples when protein was analyzed, lower when lignin was analyzed, and similar when cellulose and starch were analyzed. Because optically thick samples of fresh leaves did not give consistently better results than single leaves, we conclude that the patterns observed in our study cannot be attributed to the use of single leaves rather than optically thick leaf stacks. Furthermore, Jacquemoud et al. (1994) examined the effect of including different numbers of regressors in the stepwise equations. Their results indicate no significant difference among these datasets based on the number of regressors (up to 10), suggesting that the patterns, within the limits of the number of regressors chosen, are not significantly affected by overfitting. Comparison of Wavelengths Selected by Regressions on Subsets of One Dataset The effect of changing the sample composition of the dataset was tested using 5 subsets of the JRC dataset
each containing 40 samples (Fig. 2). Due to the limited number of samples, only three wavelengths were selected in order to avoid overfitting (Hruschka, 1987). For one of the subsets, the wavelengths exactly coincided with those selected for the entire dataset. Two subsets had one wavelength near a wavelength selected for the entire dataset, and two subsets had no wavelengths near those selected for the entire dataset. The coefficients of determination for the subsets ranged from 0.41 to 0.71. These results were similar to those reported by Jacquemoud et al. (1995), who randomly selected two-thirds of the JRC dataset to develop calibration relationships and tested the regressions on the remaining one-third of their data. Their results, the means of 50 selections, showed that the coefficients of determination on the validation datasets were highly variable. Johnson and Billow (1996) reported more consistent results using subsets containing 14 samples. This greater consistency may have resulted from the homogeneity of the dataset, which contained only Douglas fir needles. As the JRC dataset was deliberately selected to represent a wide range of natural and cultivated species, we were not able to able to test this hypothesis. CONCLUSION Stepwise multiple linear regression relating chemistry data and reflectance spectra produced large coefficients of determination (R2), particularly when the reflectance spectra were expressed as the first or second derivative of log(1 / R) (Tables 3-5). Similar high R2's have been reported for both fresh and dry leaves in other studies (Card et al., 1988; Wessman et al., 1988; Curran, 1989; Jacquemoud et al., 1995; Bolster et al., 1996; Dungan et al., 1996; Johnson and Billow, 1996; Martin and Aber, 1996; Yoder and Pettigrew-Crosby, 1996). However, we question the meaningfulness of these high R2's due to the results of regressions using randomized datasets, the lack of correspondence among the bands selected by stepwise multiple linear between our datasets and those reported in other studies, the dependence of band selection on the data used, and the inability of known absorption bands to explain the chemical variation in our datasets. Stepwise multiple linear regressions using artificial datasets assembled by randomizing the association between nitrogen data and reflectance spectra gave R2's of at least 0.41 and as much as 0.82 for the relationship between nitrogen concentration and content vs. first or second derivative log(1/R) (Table 6). The R2's for correctly-paired nitrogen data and first and second derivative log(1/R) only exceeded the average randomized R2's by 0.02-0.42. This suggests that high R2's for stepwise multiple linear regression on datasets containing substantially fewer samples than initial wavelengths must be examined in light of the "'baseline" R2"s for the
192
Grossman et al.
chemical being examined. Additional caution is advised when examining species-diverse datasets containing fresh, intact leaf spectra. The bands selected by stepwise multiple linear regression for a given chemical (nitrogen, carbon, lignin, etc.) and a given spectral transformation (log(1 /R), first derivative, second derivative) did not correspond among datasets in this study or with bands selected in other studies (Card et al., 1988; Wessman et al., 1988; Curran, 1989; Jacquemoud et al., 1995; Bolster et al., 1996; Dungan et al., 1996; Johnson and Billow, 1996; Martin and Aber, 1996; Yoder and Pettigrew-Crosby, 1996). Band selection depended on whether the chemical data were expressed on a concentration (g g-l) or content (g m -2) basis (Fig. 1). For a given chemical, similar bands were selected on a concentration or a content basis less than 6% of the time. Band selection was also very sensitive to the samples included in the dataset (Fig. 2). Multiple regression using bands identified in other studies to explain nitrogen concentration yielded R2's that were less than the average R2's for artificially construtted randomized datasets in eight of 15 tests (Tables 6 and 8). Multiple regression using bands that represent known absorption characteristics for nitrogen and lignin were inconsistent, yielding high R2's for some chemicals and datasets, but not others (Table 10). The results for water were more consistent, giving high R2's for log(1 / R), first and second derivative regressions for the fresh leaf datasets, and low R2's or nonsignificant regressions for the JRC dry leaf dataset. All of these results suggest caution in the use of stepwise multiple linear regression on fresh leaf reflectance spectra. Band selection does not appear to be based upon the absorption characteristics of the chemical being examined. We wish to thank Dr. Mary Martin and Dr. John Aber for providing their dry leaf dataset and to Dr. John Aber for the chemistry analyses of the Jasper Ridge dataset under the NASA Accelerated Canopy Chemistry Program. We thank Mr. Lee Johnson and Dr. Vern Vanderbilt for assisting in Jasper Ridge data collection and use of the NASA Ames Research Center NIRS 6500 spectrophotometer. We acknowledge the support of the Jasper Ridge Biological Preserve for permitting leaf collections and Dr. Neil WiUits, University of California, Davis Statistics Lab, for statistical advice. This work was supported by NASA EOS Grants NAS5-31359 and NAS5-31714 and the Digital Equipment Corporation, which provided DEC Alpha 3000 computers through the Sequoia 2000 Grant. REFERENCES
ACCP (1994), NASAAccelerated Canopy Chemistry Program, Final Report NASA-EOS-IWG, 19 October. Birth, G. S. (1985), Evaluation of correlation coefficients obtained with stepwise regression analysis, Appl. Spectrosc. 39:729-732.
Bolster, K. L., Martin, M. E., and Aber, J. D. (1996), Interactions between precision and generality in the development of calibrations for the determination of carbon fraction and nitrogen concentration in foliage by near infrared reflectance, Can. J. Forest Res., forthcoming. Card, D. H., Peterson, D. L., Matson, P. A., and Aber J. D. (1988), Prediction of leaf chemistry by the use of visible and near infrared reflectance spectroscopy, Remote Sens. Environ. 26:123-147. Curran, P. J. (1989), Remote sensing of foliar chemistry, Remote Sens. Environ. 30:271-278. Curran P. J., Dungan, J. L., Macler, B. A., Plummer, S. E., and Peterson, D. L. (1992), Reflectance spectroscopy of fresh whole leaves for the estimation of chemical concentration, Remote Sens. Environ. 39:153-166. Dungan, J. L., Johnson, L., Billow, C. et al. (1996), High spectral resolution reflectance of Douglas fir grown under different fertilization treatments: experiment design and treatment effects, Remote Sens. Environ. 55:217-228. Eflland, M. (1977), Modified procedure to determine acidinsoluble lignin in wood and pulp, Tech. Assoc. Pulp Paper Ind. 60:143-144. Elvidge, C. D. (1990), Visible and near infrared reflectance characteristics of dry plant materials, Int. J. Remote Sens. 11:1775-1795. Gates, D. M. (1970), Physical and physiological properties of plants, in Remote Sensing: With Special Reference to Agriculture and Forestry, National Academy of Sciences, Washington, D.C., pp. 224-252. Grossman, Y. L., Sanderson, E. W., and Ustin, S. L. (1994), Relationships between leaf chemistry and reflectance for plant species from Jasper Ridge Biological Preserve, California, in Int. Geosci. and Remote Sens. Symp. (IGARSS "94) Pasadena, California. Hruschka, W. R. (1987), Data analysis: wavelength selection methods, in Near-lnfrared Technology in the Agricultural and Food Industries (P. C. Williams and K. H. Norris, eds.), American Association of Cereal Chemists, St. Paul, MN, Chapt. 3. Jacquemoud, S., Verdebout, J., Schmuck, G., Andreoli, G., Hosgood, B., and Hornig, S. E. (1994), Investigation of leaf biochemistry by statistics, in Int. Geosci., and Remote Sens. Symp. (IGARSS '94) Pasadena, California. Jacquemoud, S., Verdebout, J., Schmuck, G., Andreoli, G., and Hosgood, B. (1995), Investigation of leaf biochemistry by statistics, Remote Sens. Environ. 54:180-188. Johnson, L. F., and Billow, C. R. (1996), Spectroscopic estimation of total nitrogen concentration in Douglas-fir foliage, Int. J. Remote Sens. 17:489-500. Kupiec, J. A., and Curran, P. J. (1996), Remote sensing of foliar chemistry: moving from the leaf to the canopy, forthcoming. Marten, G. C., Shenk, J. S., and Barton, F. E., II, Eds. (1989), Near Infrared Reflectance Spectroscopy (NIRS): Analysis of Forage Quality, U.S. Dept. of Agric. Handbook 643, USDA, Washington, DC, pp. 1-96. Martens, H., and Naes, T. (1987), Multivariate calibration by data compression, in Near-Infrared Technology in the Agricultural and Food Industries (P. C. Williams and K. H. Norris, Eds.), American Association of Cereal Chemists, St. Paul, MN, Chapt. 4.
Extraction of Leaf Biochemistry Information
Martin, M. E., and Abet, J. D. (1996), Determining the chemical composition of fresh leaves using near infrared spectra, J. Near Infrared Reflectance Spectrosc., forthcoming. McLellan, T., Martin, M. E., Aber, J. D., Melillo, J. M., Nadelhoffer, K., and Dewey, B. (1991), Comparison of wet chemistry and near infrared reflectance measurements of carbonfraction and nitrogen concentration of forest foliage, Can. J. For. Res. 21:1689-1693. Peterson, D. L., and Hubbard, G. S. (1992), Scientific issues and potential remote-sensing requirements for plant biochemical content, J. Imaging Sci. Technol. 36:446-456. Peterson, D. L., Aber, J. D., Matson, P. A., et al. (1988), Remote sensing of forest canopy and leaf biochemical contents, Remote Sens. Environ. 24:85-108. Rencher, A. C., and Pun, F. C. (1980), Inflation of R 2 in best subset regression, Technometrics 22:49-53.
19,3
Tucker, C. J., and Garratt, M. W. (1977), Leaf optical system modeled as a stochastic process, Appl. Opt. 16:635-642. Wessman, C. A. (1990), Evaluation of canopy chemistry, in Remote Sensing of Biosphere Functioning (R. J. Hobbs and H. A. Mooney, Eds.), Springer-Verlag, New York, pp. 135156. Wessman, C. A., Aber, J. D., Peterson, D. L., and Mellilo, J. M. (1988), Foliar analysis using near infrared reflectance spectroscopy. Can. J. For. Res. 18:6-11. Williams, P. C., and Norris, K. H., Eds. (1987), Near-lnfrared Technology in the Agricultural and Food Industries, American Association of Cereal Chemists, St. Paul, MN, 330 pp. Yoder, B. J., and Pettigrew-Crosby, R. E. (1996), Predicting nitrogen and chlorophyll from reflectance spectra (4002500 nm) at leaf and canopy scales, Remote Sens. Environ. forthcoming.