REMOTE SENS. ENVIRON. 38:77-121 (1991)
Automated Cloud Screening of AVHRR Imagery Using Split-and-Merge Clustering Timothy C. GaUaudet and James J. Simpson Scripps Institution of Oceanography, La Jolla
Previous methods to segment clouds from ocean in AVHRR imagery have shown varying degrees of success, with nighttime approaches being the most limited. An improved method of automatic image segmentation, the principal component transformation split-and-merge clustering (PCTSMC) algorithm, is presented and applied to cloud screening of both nighttime and daytime AVHRR data. The method combines spectral differencing, the principal component transforvn~tion, and split-and-merge clustering to sample objectively the natural classes in the data. This segmentation method is then augmented by supervised classification techniques to screen clouds from the imagery. Comparisons with other nighttime methods demonstrate its improved capability in this application. The sensitivity of the method to clustering parameters is presented; the results show that the method is insensitive to the split-and-merge thresholds.
1. INTRODUCTION The accurate separation of clouds from ocean in satellite imagery [e.g., Landsat; the Advanced Very High Resolution Radiometer (AVHRR); the Coastal Zone Color Scanner (CZCS); and the
Address correspondence to J. J. Simpson, Scripps Institution of Oceanography, Satellite Oceanography Center, University of California, San Diego, La Jolla, CA 92093. Received 9 November 1990; Revised 16 May 1991. 0034-4257 / 91 / $3.50 ©Elsevier Science Publishing Co. Inc., 1991 6.5.5 Aveliue of the Americas, New York, NY 10010
Geostationary Observing Earth Satellite (GOES)] is a critical step in the analysis of these data. Cloud screening and cloud classification are two methods which perform this step. Cloud screening is a specialized form of classification in which cloud containing pixels in an image are identified as contaminated while the remaining pixels are retained for analysis. This procedure is fimdamental to valid sea surface temperature (SST) determination from remotely sensed data because clouds represent the most significant souree of error in AVHRR data (Henderson-Sellers, 1982; Simpson and Humphrey, 1990). Cloud classification, on the other hand, involves both the segmentation of the image data into its naturally occurring groups and the assignment (or labeling) of these groups to the physical classes which they represent (Schowengerdt, 1983). Cloud classification requires the elimination of both land and ocean from the image and the incorporation of additional quantitative information to distinguish cloud type (e.g. Karlsson, 1990). A comprehensive review of cloud classification studies using multispectral satellite data is given by Pairman and Kittler (1986). Numerous cloud screening methods have been developed for both daytime and nighttime AVHRR imagery. The most advanced daytime method is that of Simpson and Humphrey (1990). Their approaeh dynamically determines both theoretical and empirieal criteria that are unique for every pixel in the data, ancl is therefore optimal for each individual image. No present nighttime
77
78 Gallaudet and Simpson
method for cloud screening incorporates dynamically determined criteria in this fashion. The simplest approach for nighttime cloud screening of AVHRR data is to apply a constant infrared brightness temperature threshold to every pixel in the image. This method fails for several reasons: 1) Subpixel clouds and cloud-pixel misalignment (i.e., the field of view falls on the edge of the cloud) can lead to errors because the distribution of radiances is nonuniform within the pixel (Shenk and Salomonson, 1972; Simpson and Humphrey, 1990); 2) variations in brightness temperature result from sensor geometry (Dalu, 1985; Foody, 1988) and sensor aging (Duggin, 1985); and 3) the spectral response of clouds varies with cloud type and height (Liljas, 1987). Coakley and Bretherton (1982) developed the spatial coherence method for determining clear and cloud sky radiances and, for single layer systems, cloud cover. The method has an advantage over static thresholds because it utilizes the local spatial structure of the infrared radiance field to determine cloud-free and cloud-covered pixels and infer partially filled fields of view. Their method fails, however, when the cloud system in the image is multilayered, when the clouds are everywhere smaller than the instruments field of view, and when the clouds have variable emissivities, as do cirrus. The most advanced nighttime method to date is that presented in Saunders and Kriebel (1988a). This method employs a three layer algorithm consisting of five steps: 1) a static infrared (IR) threshold to screen pixels below a user-defined radiance value, 2) a spatial uniformity test similar to that in Coakley and Bretherton (1982), and 3)-5) three tests which determine the brightness temperature differences (AT's) between the three AVHRR infrared channels. These last three tests incorporate empirically derived values for the'AT's and reject pixels as cloud-contaminated based on these criteria. This approach is limited because the AT's that they employ are both statically applied and regionally specific. Thus, their approach suffers from the threshold limitations discussed above, and is restricted to data covering specific geographic regions. The purpose of this paper is to present a new method for cloud screening nighttime AVHRR data. This method utilizes split-and-merge clustering to separate the data into its natural group-
ings and therefore is not subject to the limitations of previously mentioned nighttime methods. The method is based upon a generic image segmentation scheme; thus, it may be used as a preprocessor to either cloud screening or cloud classification. Additionally, the segmentation procedure in the method can be adapted for classification of Landsat, CZCS, and GOES data. 2. METHODS 2.1. Overview of The Cloud Screening Method An AVHRR infrared image, designated T, consisting of brightness temperatures calibrated to degrees Celsius in Bands 3, 4, and 5 (hereafter referred to as T3, T4, and Ts) is first spectrally differenced to construct a 2-banded differenced image D (i.e., T3-T4, 7"3-Ts). This spectrally differenced image now contains all the information that is needed for cloud segmentation, but only in two linearly independent bands. Next, this spectrally differenced image is transformed using the principal component transformation. The result is a 2-banded transformed image Y (PC1, PC2) in which all interband correlation is removed. Then, this transformed image is segmented using a split-and-merge clustering procedure. The result is a segmented image, Y', in which the natural classes are separated. Finally, this segmented image is retransformed to the original infrared values to produce the segmented image X'. This image then is labeled such that each of the classes is either a cloud or noneloud. The classes labeled as cloud are masked, and the classes labeled as noncloud are retained. Let T' be this cloudmasked image. Hereinafter, this algorithm is referred to as the principal component transformation split-and-merge clustering (PCTSMC) algorithm for nighttime AVHRR cloud screening. Figure 1 presents an outline of the method, and the notation is described in Appendices A-C. Each step is described below. In the following description, image space refers to the original infrared AVHRR image T, the spectrally differenced image D, the retransformed segmented image X', and the final cloud screened image T'. Transform space refers to the PCTdifferenced image ¥ and the segmented image YP. Feature space, however, is a more general term which refers to the space in which features are
Automated Cloud Screening of AVHRR Imagery 79
Input a 3-Banded A V H R R
Infrared
Image
T
1 f
STEP 1 Construct the 2-banded Differenced Image From the AVHRR Infrared Image T --, D" --* D
STEP 2
Transform the 2-banded Differenced Image With the Principal Component Transformation D --* X --* Y
1 f
STEP 3
Segment the Transformed Difference Image Using Split-and-Merge Clustering y
f
.-. y -
STEP 4
Retransform the Segmented, Transformed, Differenced Image And Label Each Cluster As Cloud or Non-Cloud Y"
--~
X"
~
T"
Output the Input Image With A
Cloud Mask
T"
derived for use as segmentation criteria. These features may be derived in either image or transform space. In the PCTSMC method, for example, the feature space is the PCT-differenced transform space, and includes the images Y and Y'.
2.2. The Principal Component Transformation The principal component transformation (PCT) is applied to the speetrally differenced AVHRR infrared image D. The PCT, also referred to as the Karhunen-Loeve (K-L) transformation, the hoteling transformation, and the eigenvector transformation, is a multivariate statistical technique that involves choosing uneorrelated linear combinations of a given set of variables in such a way that each element in this combination represents
Figure 1. Overview of the PCTSMC cloud screening algorithm for AVHRR imagery.
a decreasing amount of the variance in the original variables (Ingebritsen and Lyon, 1985). This linear transformation defines a new set of coordinate axes for the data such that: 1) The transformed origin is at the mean of the data distribution (Lillesand and Kiefer, 1987); 2) the transformed coordinate axes are mutually orthogonal (Jensen, 1986); and 3) the transformed coordinate axes are in the directions of maximum variance (Jensen, 1986). For multispeetral image data, the PCT results in an alternative description of the data in which all interband correlation is removed (Mather, 1987). This decorrelation property has resulted in its use in image enhancement (e.g., Sehowengerdt, 1983; Gillespie et al., 1986; Jensen, 1986; Riehards, 1986; Sabins, 1987). The PCT is applied in the second step of the PCTSMC
80 GaUaudetand Simpson
algorithm because it separates the spectral classes in the original data (Schowengerdt, 1983; Seddon and Hunt, 1985), thus making it a logical preprocessor to the segmentation operation performed in Step 3 (Appendix B). [Note that the PCT may also be used in the analysis of multitemporal image data in which the image space is temporal rather than the spectral (Ingebritsen and Lyon, 1985)]. Appendix A presents a thorough development of the PCT.
2.3. Split-and-Merge Clustering The third step of the PCTSMC algorithm performs image segmentation using a split-and-merge clustering procedure on the PC transform of the spectrally differenced image ¥. This results in a segmented image ¥', in which the natural spectral classes in the original image are separated into distinct groups. A brief description of the approach is given below; details are given in Appendix B.
and Kittler, 1986). Thus, global segmentation criteria were preferred for this study, and a clustering approach was selected.
Clustering Techniques Clustering techniques may be classified as either 1) supervised or 2) unsupervised (Jain and Dubes, 1988). In supervised clustering, a priori knowledge of the groups within the data is used to direct the procedure. In unsupervised clustering, however, the data are grouped on the basis of the inherent structure of the data alone (Devijver and Kittler, 1983). In order to attain full automation and maximum objectivity, an unsupervised approach was selected. Unsupervised clustering may be further subdivided into the categories of a) hierarchical or b) partitional. A hierarchical method constructs a nested sequence of partitions, while a partitional method constructs a single partition of the data. Hierarchical methods include the following:
i. Merging schemes, which start at a single Image Segmentation and Clustering Image segmentation techniques fall into two major categories: 1) direct region detection, and 2) extended edge detection (Horowitz and Pavlidis, 1976; Chen and Pavlidis, 1980). The region detection approach may be subdivided further into either a) region growing (Cross et al., 1988), and b) clustering (Fukada, 1980; Haralick and Shapiro, 1985). A region growing form of direct region detection partitions an image into homogeneous and inhomogeneous regions on the basis of local properties in the image [e.g., texture (Cross et al., 1988)]. Clustering, on the other hand, refers to any type of method that attempts to automatically partition a given data set by identifying the natural groupings of the data within a specified feature space by optimizing a chosen clustering criterion (Pairman and Kittler, 1986). Hence, this approach is dependent upon global properties of the data. As with the region growing form of region detection, edge detection techniques rely on local properties within the data. They operate by detecting edges between regions and then identifying the regions (Fukada, 1980). Edge detection techniques are unreliable for complex images with fuzzy boundaries (Horowitz and Pavlidis, 1976), which is characteristic of cloud containing infrared imagery (Seddon and Hunt, 1985; Pairman
data point (i.e., pixel) or an initial segmentation and proceed to merge pixels into their respective clusters (or regions) based on a specified merging criterion (Pavlidis, 1977; Brice and Fennema, 1970). ii. Splitting schemes, which start with the entire image and procecJ to subdivide the image into homogeneous regions based on a homogeneity (or splitting) criterion (Pavlidis, 1977; Ohlander et al., 1978; Kettig and Landgrebe, 1976; Fukada, 1980). iii. Split-and-merge schemes, which combine the above operations in an effort to incorporate the advantages of both and reduce computational time (Horowitz, 1977; 1976). These methods were first applied to one-dimensional waveforms by Pavlidis (1972; 1973) and then received their formalism in image segmentation by Horowitz and Pavlidis (1974; 1976). Partitional methods of clustering are iterative; they begin with an initial clustering of the data and, at each successive iteration, attempt to converge to a partition of the data which optimizes a chosen clustering criterion (Seddon and Hunt, 1985). The ISODATA algorithm of Ball and Hall (1965) is the classic example of this type of method (Devijver and Kittler, 1983).
Automated Cloud Screening of AVHRR Imagery 81
The Split-and-Merge Clustering Procedure The method of clustering that is employed in the PCTSMC algorithm combines both the partitional and hierarchical approaches. It consists of a partitional clustering algorithm augmented by a splitting-and-merging step at each iteration. Combining a partitional with a hierarchical method has several advantages over the use of either method alone (see Sec. 4.1.3). Step 3 of the PCTSMC algorithm is summarized as follows: a. Initialization: Determine KI vectors that are uniformly distributed throughout the data to serve as initial cluster centers (or mean vectors), where KI is the number of initial clusters. A method to dynamically determine an optimal value of KI for a given image has been devised (Appendix C). b. Clustering: Assign each vector in the image to one of the clusters depending upon the selected clustering criterion. The selected criterion function is the square error criterion (Appendix B). c. Merging: Check all possible groups of two clusters and combine them if they satisfy the chosen merging criterion. The chosen merging criterion is a threshold value of the squared Euclidean distance between cluster mean vectors, designated TMERGE (Appendix C). d. Splitting: Check all clusters and split them in half if they satisfy the chosen splitting criterion. The chosen splitting criterion is a threshold value of the squared Euclidean distance between cluster maximum and minimum vectors, designated TSPLIT(Appendix C). e. Recompute New Cluster Centers: This is performed for the K{ clusters that result from Steps b-d, where K{ is now a new number of clusters. f. Repeat Until Converged: Beginning with Step b, continue the procedure until the convergence criterion is satisfied. (Note that convergence at the jth iteration will require j + 1 iterations to be performed.) The selected convergence criterion is the minimization of within-cluster scatter (Appendix B). g. Complete Cluster Assignment: After the algorithm has converged, assign every vector to
its appropriate cluster such that withincluster scatter is minimized. An overview of the clustering procedure is given in Figure 2.
2.4. Cluster Labeling and Cloud Mask Generation In the final step in the PCTSMC cloud screening procedure, the data are retransformed back to the image space; this results in a segmented image, denoted X'. Each of the clusters (Step 3) now are labeled as either a cloud or noncloud (Step 4) to produce the matrix X'. In the first three steps, the operations performed on the data were entirely unsupervised-that is, no a priori knowledge was required. In this final step, expert knowledge is introduced in order to perform a Boolean classification. After transforming the segmented image back to the image space (i.e., the original AVHRR channel values), the cluster mean and minimum vectors are analyzed. The mean and minimum vectors of the kth cluster each have three components, corresponding to the infrared Bands 3, 4, and 5. These vectors can be represented as [T/3k) T(4k) T(~)]ir and [T(k) MIN T(k) MIN T(k) MIN]T. Then, four tests are performed on these vectors in order to label each cluster. The first two sets are modified forms of tests taken from the nighttime method of Saunders and Kriebel (1988a). [Note: The reader is cautioned to observe that the nighttime tests in Saunders and Kriebel (1988a) are inconsistent with the schematic of their algorithm presented in Figure 5 of their paper; Saunders and Kreibel (1988b) contains the correct schematic]. These two tests threshold the brightness temperature difference between the components of each cluster mean vector. The thresholds used in Saunders and Kriebel (1988a) were derived for 1024 x 1024 pixel images. These thresholds were found to be scale dependent [i.e., for similar classes of data (e.g., low cloud), smaller clusters required larger thresholds]. Thus, each threshold is sealed by the factor ~ = (Me - nk) / M 2, where nk corresponds to the size in number of pixels of the kth cluster and M2 corresponds to the image size used by Saunders and Kriebel (1988a). Thus, the kth cluster is labeled as cloud if either
82 Gallaudetand Simpson
J
Step a : Initialization
I
>
Step b : Clustering
\
Step c : Merging
->
Step d : Splitting
I Step e : Recompute New Cluster Centers After Each of Steps b) - d) // ,/ // %
Step f : Repeat Until Converged
T
Step g : Complete Cluster Assignment After Convergence Criterion is Satisfied
Figure 2. O v e r v i e w o f Step 3 o f t h e P C T S M C algorithm: s p l i t - a n d - m e r g e clustering.
1. r(4k) - T(~) > [1.0 + 3,1.0]°C, or 2. T(~) - T(5k) > [1.5 + 71.5]°C, obtains. The second two tests are from the Simpson and Humphrey (1990) daytime AVHRR cloud screening method, and these are applied to the Band 4 values of the cluster minimum vectors, T(4k)MIN. First, the set of cluster T(k)MIN values are high pass filtered to insure that none are below - 2 . 0 ° C , the minimum valid global SST [U.S. Navy, 1981]. The kth cluster is labeled as cloud if T(k)MIN is below - 2 . 0 ° C . Then the remaining set of cluster T(4k)MINvalues are bandpassfiltered such that all T(4k)MIN values are within + 4 standard deviations about the mean of the Band 4 data which have not yet been labeled as cloud,
denoted T(~)MEAN. Thus, these second two tests label a cluster as cloud if either 3. T(k)MIN < -- 2.0°C, or 4. IT(4k)MIN-- T(4k)MEANI >404 obtains. All pixels which belong to clusters that have been labeled as cloud by either of the above four tests are then masked as cloud-contaminated. Finally, the original image T is multiplied by the cloud mask X' to produce the the cloud masked image T'.
3. R E S U L T S
Testing consisted of three main procedures. First, the algorithm was tested by varying each of its
Automated Cloud Screening of AVHRR Imagery 83
Table 1. T e s t I m a g e I n f o r m a t i o n Parent Images Image Name
Satellite Name
Pass Date
Pass Time (GMT)
Image Sizea
Region
A
NOAAll
30 July 1990
22:11:00
512 x 512
B
NOAA6
11 June 1981
21:55:00
512 x 512
C
NOAA6
31 Aug 1979
16:03:00
512 x 512
D
NOAA9
14 Aug 1985
22:11:00
512 x 512
E
NOAA7
16 Oct 1981
22:19:00
512 x 512
F
NOAA7
3 May 1983
22:15:00
512 x 512
G
NOAA9
17 Oct 1988
22:32:00
512 × 512
Central California Central California Northern California Baja California Central California Baja California Northern California
a
Center, Lat/Long 36 120 35 122 38 125 26 115 35 122 26 115 38 125
00 00 00 00 30 30 00 00 00 00 00 00 00 00
N W N W N W N W N W N W N W
1.1 km x 1.1 km pixels at nadir.
steps. Next, performance was analyzed with respect to landmasking, sensor noise, and efficiency. Lastly, the algorithm was compared with other cloud screening methods. Testing was performed on seven 512 x 512 pixel high resolution AVHRR infrared images from the California Current region (Table 1), and 21 subimages that were extracted from parent Image A [see Gallaudet (1991) for subsectioning details]. These images were selected in order to represent a wide range of image sizes and variances. The meteorological conditions and cloud patterns represented in these data are typical of the region. Surface winds are driven by the gradient between the North Pacific High and continental thermal low over California (Hickey, 1979). This gradient intensifies in the summer when the low deepens (due to continental heating). Long-term mean conditions of the region show that coastal upwelling is induced by these conditions, and, in turn, coastal low stratus and fog is persistent from May to August. Offshore, the cloud deck breaks up into cumulus clouds (Nelson and Husby, 1983). In the remainder of the year, mean cloud cover is greater offshore. Images (Figs. 3, 4, 5, and 9) are used to show the details of the image segmentation and cloud screening. The first panel in each of these figures show the geographical domain of the images. These images were landmasked prior to cloud screening (Simpson, 1991). For AVHRR Band 4 (Figs. 3b, 4b, 5b, and 9b), land appears white,
and coastline appears black; also, white represents cloud, and other gray scales denote SST. For segmented versions of these images (Figures 3c, 4c, 5c, and 9c), gray scales only designate different clusters. For masked SST (Figures 3d, 3f, 4d, 4f, 5d, 5f, and 9d, 9e), black indicates cloud masked pixels and land. Other gray scales indicate valid SST.
3.1. Variations of the PCTSMC Algorithm
3.1.1. Varying the Spectral Differencing The algorithm's performance using different variations of spectral differencing was studied first. Three variations were performed: 1) two-banded spectral differencing with D = [T3 - T4, T3 - Ts], 2) two-banded spectral differencing with D = [T3- T4, T4- Ts], and 3) three-banded spectral differencing with D'= [T3 - T4, T4 - Ts, T3 - Ts]. For these variations, the algorithm's performance was measured with the following parameters: a) percent within-cluster variance in the final partition of segmented image, b) combined number of split-and-merge operations that were performed in the first iteration of the clustering procedure, c) percent of image rejected, d) number of iterations until convergence in the clustering procedure, e) number of clusters in the final partition of the segmented image. Representative results for parent Image A and the three subimages are summarized in Table 2 [see Gallaudet (1991) for the
84
Gallaudet and Simpson
AVHRR BAND 4
REFERENCE GRID
~
/
122°W
/OPOINT CONCEPTION~
120"
(b)
118"
PCTSMC DIFFERENCED MASKED IMAGE SEGMENTED IMAGE
f
~!!!~!! ~ :i!~!i~i~i~ ii!!iiii:ii~iii!:~i ~ ~iii!i ~!II~ ~
(c)
(d)
,
PCTSMC NON -DIFFERENCED MASKED IMAGE SEGMENTED IMAGE
(e) (
f
)
!
Figure 3. The effect of spectral differencing (Step 1) prior to the PCT (Step 2) in the PCTSMC algorithm for Image A.
Automated Cloud Screening of AVHRR Imagery 85
REFERENCE GRID
AVHRR BAND 4
38°N
37 °
36*
(Q) 123°W
122 °
121 °
(b)
PCTSMC DIFFERENCED SEGMENTED IMAGE MASKED IMAGE
(c)
(d)
PCTSMC NON-DIFFERENCED SEGMENTED IMAGE MASKED IMAGE
(e) Figure 4. Analogous to Figure 3, except Image A16.
(f)
86 Gallaudetand Simpson
AVHRR BAND 4
REFERENCE GRID 38"N
~ ° ~
/__J(a) 123"W
(b)
PCTSMC DIFFERENCED MASKED IMAGE SEGMENTED IMAGE
"(c)
(d)
PCTSMC NON -DIFFERENCED SEGMENTED IMAGE MASKED IMAGE
"(e) Figure 5. Analogous to Figure 3, except Image A15.
(f)
Automated Cloud Screening of AVHRR Imagery 87
Table 2. Variation of the P C T S M C Algorithm with T h r e e Different F o r m s of Spectral Differencing (Step 1) (Representative Results for P a r e n t Test Image A a n d T h r e e Subimages) a
Image
2-Banded Spectral Differencing with D = [T3 - T4,T3 - Ts]
2-Banded Spectral Differencing with D = [T3 - T4,T4 - Ts]
3-Banded Spectral Differencing with D' = [T3 - T4,T4 - Ts, T3 - T~5]
(a) Percent Within-Cluster Variance in the Final Clustering Partition A A1 A2 A3
0.361 0.508 0.407 0.588 1.163 0.908 0.455 0.537 0.522 0.204 0.313 0.306 (b) NumberofSplits and Mergesin Iteration l ofthe Clustering Procedure
A AI A2 A3
23 701 173 23 23 25 22 24 22 23 471 55 (c) Percentoflmage Roected Excluding Landmasked Pixe~
A A1 A2 A3
15,478 6,805 16.504 34.766
18.077 6.231 16.406 34.933
15.574 6.219 16.504 35.066
a Note that an initial number of clusters KI = 30 was used in the clustering procedure (Step 3) of the algorithm.
results for parameters d) and e) and the results for all 28 test images]. These show that method 1) produced the smallest percent of within-cluster variance, and hence the most robust segmentations (Appendices B and C). Methods 2) and 3) also exhibited a much greater number of splitand-merge operations in the clustering procedure than method 1), although the final number of clusters and convergence iterations of the methods were similar. The cloud masks did not differ significantly.
3.1.2. Varying the PCT Next, the algorithm's performance using different variations of the PCT was studied. Variations included: 1) no PCT, 2) applying the PCT to the spectrally differenced image, and 3) applying the PCT to the original (nondifferenced) image. For these variations, the algorithm's performance was measured with the same five parameters measured in the spectral differencing tests. Representative results for parent Image A and the three subimages are summarized in Table 3 and in Figures 3-5. They show that spectrally differencing the image prior to performing the PCT produces greatly improved segmentations and cloud masks. In general, 1) the segmented spectrally differenced images were separated into more
classes, 2) these classes were more distinct than those in the nondifferenced segmented images, and 3) classes in the segmented nondifferenced images were mixed (i.e., thin clouds and cloud edges were grouped into the same class as the ocean). For these nondifferenced images, the labeling routine often flagged a class containing ocean as cloudy (Figs. 3 and 4). The PCTSMC algorithm then was tested using two other methods of performing the PCT. In the first of these, the correlation matrix was used instead of the covariance matrix (Appendix A). In the second, the input image was demeaned prior to performing the PCT (in this context, demeaning refers to removing the spectral mean values from each band in the image). The results of these tests showed no consistent p a t t e r n - t h a t is, using the correlation matrix in the PCT or demeaning the input image prior to the transformation did not always result in a different segmentation and cloud mask (Gallaudet, 1991).
3.1.3. Varying the Clustering Procedure Algorithm sensitivity to the two clustering parameters, 1) the number of initial clusters, and 2) the split-and-merge thresholds, was also determined. The number of initial clusters Kt was varied from 1 to 1000 for 12 of the 28 test images. Representa-
88 Gallaudet and Simpson
Table 3. Variation of the PCTSMC Algorithm with T h r e e Different Forms of PCT Differencing (Steps 1 and 2) a (Representative Results for Parent Test Image A and T h r e e Subimages) b
Image
PCTSMC with the PCT and with Spectral Differencing
PCTSMC with the PCT and without Spectral Differencing
SMC without the PCT and without Spectral Differencing
(a) Percent Within-Cluster Variance in the Final Clustering Partition A A1 A2 A3
0.361 0.588 0.455 0.204
2.105 6.944 6.292 5.338
2.162 7.237 6.669 7.461
(b) NumberofSplits and Mergesin Iteration l oftheClustering Procedure A A1 A2 A3
23 23 22 23
866 174 44 877
962 624 33 962
(c) PercentoflmageRejectedExcluding LandmaskedPixels A A1 A2 A3
15.478 6.805 16.504 34.766
40.899 5.059 15.589 33.899
40.762 5.060 19.929 30.620
For all cases, two banded differencing was employed with D = IT3 - Ta,T3- Ts]. t, Note that an initial number of clusters Kt = 30 was used in the clustering procedure (Step 3) of the algorithm.
tive results (Figs. 6 and 7) showed that algorithm performance was strongly influenced by the value of K~, and the sensitivity to Kt was similar for each image, regardless of image size or variance. For low values of Kx, relatively high values of withincluster variance in the segmented image were observed and relatively large changes in the number of pixels rejected were observed; for large values of Kt, however, both the within-cluster variance and the number of pixels rejected seemed to converge to a neighborhood of values. In the split-and-merge threshold tests, 12 different images were tested each for a 10 × 10 array of thresholds. The parameters that were measured for each run in these test arrays were the five measured for the spectral differencing and PCT tests. Representative results (Table 4; Fig. 8) showed that the algorithm's performance is mostly insensitive to these thresholds [additional figures are given in Gallaudet (1991)]. For all of the images tested, variations were seen only for the smallest thresholds.
3.1.4. Varying the Labeling Procedure The effects of scaling in the labeling procedure were also analyzed. Nine subimages were tested with scaling and without scaling (Table 5). The first two thresholds in the labeling procedure
were observed to be highly scale-dependent. When these values were scaled by cluster size, accurate cloud masks consistently were obtained (e.g., Image A8 in Fig. 9).
3.2. General Performance Evaluation Next, the sensitivity of the PCTSMC algorithm was measured with respect to landmasking, sensor noise, and efficiency. In the landmasking tests, cloud masks were compared for three cases: 1) nonlandmasked images, 2) landmasked images, and 3) ocean-masked images. When images were landmasked prior to cloud screening (Fig. 10), cloud detectability was improved. Under these conditions, greater differentiation was obtained in the segmented image, and more accurate cloud labeling resulted. The noise-sensitivity studies showed that for some images, sensor noise (i.e., AVHRR Band 3) degraded the resulting PCTSMC cloud masks (Fig. lla); this effect, however, was reduced by applying a filter on the Band 3 data prior to the cloud screening process (Figs. l l e and llf). The efficiency was also evaluated by measuring the total time for one iteration of the method. Time was recorded for three different-sized images using three different values of KI. When the largest values of K~ were used with the largest
Automated Cloud Screening of AVHRR Imagery 89
60-
IMAGE=A2 N= 1024 2 : 53,015 (7Ol
IMAGE=A3
IMAGE=A7
N= 6 5 5 3 6 o"21 =453.219
O-Z-~u= 5 343
N= 2 5 0 0
40
~0-I
20-
I
Z u3
O
#
,
f
5 710
,
i
,
i
5O 100
,
,
/ ,
i
,
i
5O I00
5 0 0 1000
Z
o
b
,
i
50 I00
,
i
,
500 I000
IMAGE=All N= 3 0 0 0 2 _ 000001 - 140.458
IMAGE = A8 N = 25O0 cr~u = 231 578
IMAGE = A6 N= 1024 z =505650 O-DI
Z
,
O0
3OO
~o£.) 6 0 0 -
~7,'o
,
500 I000
50
I-I-i-Y 4 0 0 -
200
-
_3 <~ Z
(30
I
hi T 200F--
50
Z t~ Z rr ,
$
,
i
i
50 100
rY
i
,
,
500 I000
; ;i,o
t
,
/
I
T
50 1(30
"
I
'
35d
IMAGE = AI N= 16384 0-~1 = 29.772
40.
I
Z "r t-'~
:1
IMAGE = A9 N=3125 2 = 0.068 0-DI
0.4-
5710
500 I000
:50
5 0 ~00
5 0 0 1000
IMAGE =AIO N= 3 1 2 5 o-eel : 2.785
0.3-
202
o2 t
~
I-
0.1-
04 O1
i
; 5"~,o
,
1
,
i
50 I00
,
i
,
500 I000
7 I0
50 I00
NUMBER OF INITIAL
500 I000
CLUSTERS
5710
5O 100
5 0 0 100(3
(K z)
Figure 6. Sensitivityof within-clustervariance to initializationin the clustering procedure (Step 3) of the PCTSMC algorithm (representative results for nine test images).
images, decreased efficiency was observed. These large values, however, were not required for robust segmentations and cloud masks; an operationally efficient range of Kt's, which produced robust results, was determined from these results (Appendix C).
3.3. Comparison with Other Methods Finally, the method was compared to three other methods suitable for nighttime applications and one daytime method. The nighttime methods were 1) IR thresholding, 2) the nighttime method of Saunders and Kriebel (1988a), and 3) the spatial
90 GaUaudet and Simpson
IMAGE=A2 N= 1024 _ 53.015
60000 I
2O-DI
)OOT- ]
IMAGE=A3 N = 65536 Cry1 = 453.219
-~
IMAGE=A7
I
N= 2 5 0 0
tO01 /
G[~, = 5,343
50000600-
40000-
"1/ 300
50000200-
(a) , .
.
.
5 7 I;
.
,,
5; I00
5(3(9
20000
,
I000
IMAGE=A6 N= 1024 G[~ = 505650
W E
(b) 3
t
,
p
5 7 I0
,
i
,
i
,
i
50 I00
200 -
700
700 500 0
,oo4 I
0
-
IMAGE=At N= 16584 2 (7"m = 29772
12500-
-
8500
-
i , ~
5 ;',o
500 1000
500 I0(0~
Gm=140.458
1000-
70(3-
11000
14500 -
1O5O0
!d,)
50 I00
50 100
IMAGE=All N=30CO z
2500-
500
. . . . . . . .
3 5 710
~
1500-
~
3 5 7 I0
I
0
2000-
200-
3
0
500
IMAGE=A8 N= 2500 2 o- m = 231.578
!700 -
I
-
800-
, i , i
5o ,oo
/
(e)
, r ,
500 ,ooo
I
'M G ;r ]
2000-~[
o-~, = o.06e
,600 ~1
~ ; ->,b
(f) ' ~,ooo
, i
' 5'0 ,oo
t/
o-~, = 2.785
600-
1200 400-
t /
6500-
200-
800J 1
4500-
O2500
-
(g) i
; 5 ~,;
,
i ,
i
50 ,oo
'56o',ooo
3 5
7'I0
50 10(3
5(3(31000
I
3 5 7 I0
50 10(9
5(90 10(9(9
NUMBEROF INITIALCLUSTERS (KI)
Figure 7. Sensitivity of the number of pixels rejected to initialization in the clustering procedure (Step 3) of the PCTSMC algorithm (representative results for nine test images). The dashed lines in the various panels show the number of pixels rejected by the LDTNLR algorithm of Simpson and Humphrey (1990). Note that in general the LDTNLR algorithm is more conservative than the PCTSMC algorithm because the LDTNLR algorithm uses geometric constraints, which at present are not included in the PCTSMC algorithm.
Automated Cloud Screening of AVHRR Imagery 91
Table 4. Sensitivity of the PCTSMC Algorithm to the Split-and-Merge Thresholds U s e d in the Clustering P r o c e d u r e (Step 3) (Representative Results for Test Subimage A15) a TSPLIT TMERGE 0.001 × d 0.005 × d 0.010 × d 0.025 × d 0.045 x d 0.070 x d 0.100 × d 0.500×d 1.000 × d 1.500×d
0.001xd
0.005xd
0.010×d
0.025xd
0.045xd
0.070×d
0.100xd
0.500×d
1.000xd
1.500xd
0.938 0.327 0.352 0.698 3.622 1.490 5.736 7.810 10.338 23,883
0.653 0.177 0.177 0.177 0.177 0.177 0.177 0.177 0.177 0.177
0.906 52.303 0.177 0.177 0.177 0.177 0.177 0.177 0.177 0.177
0.804 0.177 0.177 0.177 0.177 0.177 0.177 0.177 0.177 0.177
2.779 0.177 0.177 0.177 0.177 0.177 0.177 0.177 0.177 0.177
2.779 0.177 0.177 0.177 0.177 0.177 0.177 0.177 0.177 0.177
2.779 0.177 0.177 0.177 0.177 0.177 0.177 0.177 0.177 0.177
2.779 0.177 0.177 0.177 0.177 0.177 0.177 0.177 0.177 0.177
2.779 0.177 0.177 0.177 0.177 0.177 0.177 0.177 0.177 0.177
2.779 0.177 0.177 0.177 0.177 0.177 0.177 0.177 0.177 0.177
a Percent within-clustervariance in the final clustering partition, given by [Tr(Sw)/(N- 1)oi~] × 100 %. The parameter d is the squared Euclidean distance between the maximum and minimum vectors in the transformed spectrally differenced image Y prior to segmentation. The parameter d provides for an automatic determination of the splitting and merging thresholds used in the clustering procedure (Step 3) of the PCTSMC algorithm on an image by image basis (Appendix C), and was computed for Subimage A15 as d = = [yuax -- YM1N]r[yM~X-- yMIN] = 6593.295°C '2, where yMAXand yM1Nare the maximum and minimum vectors in the transformed differenced image Y, respectively.
coherence method of Coakley and Bretherton (1982). The daytime method was the local dynamic threshold nonlinear Rayleigh (LDTNLR) method of Simpson and Humphrey (1990). Five threshold values were used for method 1); the first three were the AT thresholds described in Saunders and Kriebel (1988a), and the remaining two were the Band 4 thresholds taken from Simpson and Humphrey's (1990) daytime method. Results for the various methods (Images A-G) are presented in Table 6. The results for Images B, C, and D are given also in Figures 12, 13, and 14, respectively, and the spatial coherence results are presented separately in Figure 15. The PCTSMC method produced accurate cloud masks for each image. Compared to both the thresholding method and the method of Saunders and Kriebel (1988a), the PCTSMC method retained more valid ocean data with no loss of cloud mask accuracy. In every case, these two methods rejected more pixels than necessary. The PCTSMC method also detected clouds more accurately than the spatial coherence method. The spatial coherence method often screened cloud free regions containing complex thermal structure, particularly regions of coastal upwelling (e.g., Figs. 15b and 15d), and often failed to detect high cirrus and subpixel clouds (e.g., Figs. 15c and 15f). Furthermore, the spatial coherence method erroneously screens near land boundaries. The PCTSMC method compared well with the day-
time LDTNLR method of Simpson and Humphrey (1990). This can be seen by comparing the cloud masked images with the Band 2 and Band 4 data. In general, the LDTNLR method tended to be more conservative (e.g., Image B). When noise is present in the data (i.e., AVHRR Band 3), however, the PCTSMC method typically produces a less robust result compared to the LDTNLR method (e.g., Image D).
4. DISCUSSION 4.1. The PCTSMC Algorithm 4.1.1. Spectral Differencing Spectral differencing is performed in Step 1 of the PCTSMC algorithm because it increases the dynamic range of the image prior to the PCT [see the image histograms in Gallaudet (1991)]. Note that the original infrared image (T -- [T3, T4, T~]) is a map of values of brightness temperatures in three spectral channels, whereas the spectrally differenced infrared image (D'-- IT3 - I"4, T4 - T~, T3- T~]) is a map of the variations in brightness temperatures between the spectral channels. Thus, spectral differencing adds a step in the separation of spectral classes. Only two of the bands in D' are linearly independent. Hence, the two-banded spectrally differenced image D = [T3 - T4, T3 - Ts] is constructed.
92 GaUaudet and Simpson
SPLITTING THRESHOLD (TsPLIT ) ;<
.OOI xd
~
×
×
--
~o
0
~
Lo
0
0
0
0
0
0
0
--
~
~
t'--
0
0
0
~o
q q O u. q • "/~--~'k:
.O05xd
.
,,#~,{{~.~:~) ~ / .
OlOxd
•
~ ' ~"7""J.
•
q = ~ : . . . . . .
.
•
.........
.
•
.
go
•
o
.
-
•
-
•
~! 50
"
"
•
"15:10
o4s~d| . . . . . . . . . .
.20-
.( . . . . . . . . .
.070xd|
.lO0×d15.
.
.
15--, 1.50 ×d [ - ~ q
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
•
.
0
.
'25"
'
'
(b)
% WITHIN CLUSTERVARIANCE ITr (Sw)/( N-t)o-2Ui] xlO0% "-L ~.54 m , • ~r~ .001 xd ~6~_2 .s6 "~54 LtJ ~; ,O05~d F.OlOxd db .025xd 0 T . 0 4 5 xd CO LLI .070 xd rr :If .lOOxd
" 5"--
.
.
NUMBEROF SPLIT 8~MERGEOPERATIONSIN THE FIRST ITERATION OF THE CLUSTERINGPROCEDURE
.
58
.
.
.
.
.
.
.
.
.
.500×d I.OOxd
(_9
(c)
150xd
PERCENT OF IMAGE REJECTED EXCLUDINGLANDMASKEDRXELS .O01xd ~ .OOS×d
~ "
°'°"-?i
.025xd
.50 .
.
,070xd
,~o~
.
.
.
.
.
.
.
.
.
.
.
.
.
.
. .
.
. .
.
. .
.
. .
.
.
.
.
i .
.
.
(d)
.
.
.
. . . . . . . .
.
: : : i :
25
.045xd
.lOOxd
.
1_
NUMBER OF ITERATIONSUNTIL CONVERGENCE
.
.
.
l{!!!!i}!!
(e) FINAL NUMBER OF CLUSTERS KF d == [ YMAX-YMIN]T [ YMAX- YMIN]=6595.295°C 2
Figure 8. Sensitivity of the PCTSMC algorithm to the split-and-merge thresholds (representative results for test Subimage A15). The parameter d (defined in Appendix C) is the squared Euclidean distance between the maximum and minimum vectors in the transformed spectrally differenced image ¥ prior to segmentation• The parameter d provides for an automatic determination of the splitting and merging thresholds used in the clustering procedure (Step 3) of the PCTSMC algorithm on an image by image basis (Appendix C).
Automated Cloud Screening of AVHRR Imagery 93
Table 5. Variation of the Labeling P r o c e d u r e (Step 4) in t h e P C T S M C Algorithm (Representative Results for Nine Test Subimages): P e r c e n t of I m a g e R e j e c t e d Excluding L a n d m a s k e d Pixels b o t h with a n d w i t h o u t D y n a m i c Sealing a
Image
Image Size N
PCTSMC with Static Labeling Thresholds
PCTSMC with Dynamically Scaled b Labeling Thresholds
A1 A2 A3 A4 A5 A6 A7 A8 A9 A10 A11 A 12
128 × 128 32 × 32 256 × 256 64 × 64 160 x 80 64 x 16 100 × 25 25 × 100 125 × 25 25 × 125 150 × 20 50 × 50
6.950 20.447 35.066 9.699 0.942 100.000 7.600 100.000 0.256 7.360 100.000 99.000
6.805 16.504 34.766 9.000 0.938 52.832 4.760 53.080 0.000 0.992 41.867 0.000
'~These data show, in support of Figures 9 and 18, that when the cluster labeling thresholds were scaled by the image size, cloud-free regions were not erroneously labeled as cloud [i.e., images A6, A11, A12 (now shown), and A8 (Fig. 9)]. ~' Scaled by image size, as defined in the text.
This method of spectral differencing was selected over the other possibilities for three reasons. First, compared to T4 and Ts, AVHRR channel T3 contains the most variation (i.e., T4 and T5 are similar); thus, subtracting T3 from both T4 and T5 yields a spectrally differenced image with a maximum amount of variance. Second, it produced the most robust segmentations (i.e., the lowest percent within-cluster variance, Table 2a) compared to any other method of spectral differencing. This is because most of the variance in D' (e.g., 99.5% average for the 28 images tested) is retained in D when this method of spectral differencing is used. Third, two-banded spectral differencing reduced execution time (i.e., the lowest number of split-and-merge operations in the first iteration, Table 2b) compared to three-banded spectral differencing.
4.1.2. The Principal Component Transformation Spectral differencing and the PCT. The PCT is applied in Step 2 of the PCTSMC algorithm because it separates the spectral classes in the image by destroying all interband correlation. Although this property of the PCT has been extensively applied in pattern recognition [e.g., Fukunaga and Koontz, 1970a; Devijver and Kittler, 1972], it has only recently been applied to segmentation of
REFERENCE GRID
58°N~ 124°W
/
(a)
123 °
AVHRR BAND 4
(b) PCTSMC DIFFERENCED SEGMENTED IMAGE
(c) PCTSMC DIFFERENCED NON-SCALED LABELING MASK
(d) PCTSMC DIFFERENCED SCALED LABELING MASK
(e) Figure 9. T h e effects of cluster size (i.e., scaled vs. nonscaled labeling thresholds) on the cluster labeling proced u r e (Step 4) of t h e P C T S M C algorithm (representative results for test Subimage A8).
cloud containing imagery (e.g., Seddon and Hunt, 1985). An even greater class separation, however, results from spectrally differencing the image prior to performing the PCT (hereafter referred to as PCT differencing). The spectrally differenced images (figs. 3-5) produced more robust segmentations and cloud masks. Spectral differencing substantially decreased the percent within-cluster variance of the segmented image (Table 3a), thus increasing class separability (Appendix B). In each case, spectral differencing also resulted in a greater number of clusters in the segmented image [Table 3e in Gallaudet (1991)] For each segmental image, both the within-cluster variance (Table 8) and the percent within-cluster variance in the PC transform space were always less than
94 Gallaudet and Simpson
124"W
122"
120" i
56"N
54*
Cb)
(a) AVHRR BAND 2 LAND MASKED
REFERENCE GRID
(d) PCTSMC MASKED IMAGE NO LAND MASK
AVHRR BAND 4 NO LAND MASK
¢
.......
(e) PCTSMC MASKED IMAGE LAND MASKED
(f) (d)-(e)
Figure 10. Sensitivity of the PCTSMC method to landmasking (Simpson, 1991) for Image E. Landmasking, like spectral differencing, decreases the dimensionality of the data prior to the PCT.
Automated Cloud Screening of AVHRR Imagery 95
l17*W t
I
I
II 5*
I
I
I
I
28"N
26 o -
24o-
I
I
I
I
I
I
I
I
(a)
I
AVHRR BAND 5
AVHRR BAND 2
AVHRR BAND 4
:
(c)
( PCTSMC MASKED IMAGE 5x5 MEAN FILTER
(b)
(d) PCTSMC MASKED IMAGE NO FILTER
(f)
e) PCTSMC MASKED IMAGE ?x7 MEAN FILTER
Figure 11. Sensitivity of the PCTSMC method to sensor noise in AVHRR Band 3 for Image F.
96 GaUaudetand Simpson
Table 6. C o m p a r i s o n o f t h e P C T S M C M e t h o d to T h r e e N i g h t t i m e M e t h o d s a n d O n e D a y t i m e M e t h o d o f C l o u d S c r e e n i n g ( R e p r e s e n t a t i v e R e s u l t s for t h e S e v e n P a r e n t T e s t I m a g e s ) a (a) Percent of Image Rejected Excluding Landmasked Pixelsfor Each Method Static Saunders and Coakely and Infrared Kriebel ( 1 9 8 8 ) Bretherton(1982) Thresholds NighttimeMethod Spatial Coherence
Image A B C D E F G
PCTSMC 15.487 7.334 35.206 23.872 2.819 9.309 38.049
Image A B C D E F G
Simpson and Humphrey (1990)
(ST)
(SOK)
(COB)
LDTNLR
24.428 13.369 43.388 49.128 3.340 69.054 63.501
24,545 14,109 46,402 49,687 5,536 70,023 63,806
18~816 12.372 22.490 10.718 6.291 19.269 5.896
17.620 23.852 47.350 34.057 22.772 29.266 30.436
(b) Difference in Percent of Image Rejected Excluding Landmasked Pixelsfor Each Method PCTSMC-ST PCTSMC-S&K PCTSMC-C&B PCTSMC-LDTNLR - 8.941 - 6.035 - 8.182 - 25.256 - 0.521 - 59.745 - 25.452
- 9.058 - 6.775 - 11.196 25.815 - 2.717 - 60.714 - 25.757
- 3.329 - 5.038 12.711 13.154 - 3.472 - 9.960 32.153
-
- 2.133 16.518 12.144 10.185 19.953 19.917 7.613
a Note that daytime images were tested for two reasons: 1) to exemplify the general adaptability of the various nighttime methods (i.e., compare performance of nighttime methods on daytime data), and 2) to compare results with the daytime m e t h o d of Simpson and H u m p h r e y (1990), which has been rigorously evaluated and justified. Both the spatial coherence method of Coakley and Bretherton (1982), and the PCTSMC m e t h o d were developed with a general adaptability to daytime and nighttime data. The Saunders and Kriebel (1988a) nighttime method, however, was developed explicitly for nighttime data; they have developed a separate algorithm for daytime data. Their daytime method was neither implemented nor tested in this study.
those values in the feature space, whereas those values of the spectrally differenced image in PC transform space were even less (Fig. 16). These improved segmentations resulted in more accurate cloud masks [Table 3e in Gallaudet (1991) and Figures 3-5 and 16] because the mean vectors of cloud free clusters have reduced the spectral spread between the mean vector components (Table 7). Spectral differencing improves class separation not only for the reasons mentioned above, but also because spectral differencing decreases the dimensionality of the transformed image ¥ (Appendix A). Image dimensionality is simply defined as the number of eigenfunctions required to conserve image variance, and Wacker (1972) has shown that reducing the dimensionality increases class separability. The PCT increases class separability because it reduces dimensionality (Jenson, 1986; Preisendorfer, 1988); spectral differencing enhances this ability. Whereas the spectrally differenced image is composed of two bands, and therefore requires only two eigenfunc-
tions, the nondifferenced image requires three (Table 8). Other methods of the PCT. Although spectral differencing had a significant effect, other variations of the PCT produced no such result. When the correlation matrix was used to perform the PCT, no consistent effect on the segmented image or the resulting cloud mask was seen [see Gallaudet (1991) for tables of these results]. The correlation matrix method acts to normalize the data prior to the transformation. No consistent results were observed because normalizing actually changes the structure of data. When performing a clustering procedure, normalizing will change the interpoint distances and alter the separation between natural clusters in the data (Jain and Dubes, 1988). This explains the results of some images in which ocean and clouds were grouped into the same clusters when the correlation matrix was used. Demeaning the input image prior to performing the PCT produced no consistent effect for the same reason: Demeaning changes the natural clusters that are present in
Automated Cloud Screening of AVHRR Imagery 97
122"
124"W
120"
36"N
34 c
Ia)
(b) AVHRR BAND 4
AVHRR BAND 2
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
(c)
PCTSMC MASKED IMAGE
P C T S M C SEGMENTED IMAGE
....
SAUNDERS AND KRIEBEL MASKED IMAGE
(d)
"
(f)
(e) LDTNLR MASKED IMAGE
Figure 12. Image B: a) Band 2; b) Band 4; c) the PCTSMC Segmented Image; d) the PCTSMC cloudmasked image; cloud-masked images for the same data derived from: e) the Saunders and Kriebel (1981) algorithm; and f) the LDTNLR algorithm (Simpson and Humphrey, 1990). Note that gray scale conventions are defined in the text.
98 Gallaudetand Simpson
~2yw
,~"
,23"
I
I
40* N
38 ~
(b) AVHRR BAND 4
AVHRR BAND 2
(d)
(c) PCTSMC MASKED IMAGE
PCTSMC SEGMENTED IMAGE
(f)
(e) SAUNDERS AND KRIEBEL MASKED IMAGE
LDTNLR MASKED IMAGE
Figure 13. Analogous to Figure 12, except for Image C.
Automated Cloud Screening of AVHRR Imagery 99
117°W
I
I
I
I
115"
I
I
l
I
28 ° N
lip I
I
26*
24°
I
I
I
I
I
f
I
I
I
I
(a)
AVHRR BAND 2
(b) AVHRR BAND 4
(c) PCTSMC SEGMENTED IMAGE
(d) PCTSMC MASKED IMAGE
(e) SAUNDERS AND KRIEBEL MASKED IMAGE
(f) LDTNLR MASKED IMAGE
Figure 14. Analogous to Figure 12, except for Image D.
1 O0 GaUaudetand Simpson
?ili
(b)
(a) IMAGE B
IMAGE A
(c)
(d) IMAGE D
IMAGE C
(f)
, (e) IMAGE E
IMAGE F
Figure 15. Cloud screening comparisons between the PCTSMC method for Images A (Fig. 3), B (Fig.
12), C (Fig. 13), D (Fig. 14), E (Fig. 10), and F (Fig. 11) and the spatial coherence method of Coakley and Bretherton. (1982).
Automated Cloud Screening of AVHRR Imagery 101
RAW IMAGE & PCT IMAGE
PCT IMAGE & PCT-DIFFERENCED IMAGE
A2
A3 rr LIJ OO AI5
"//////////////A
Z AI6 t.d c9
"//////////////////////////////////////////A
I////I////.///I//I ~///////////1 I
rJ'J"~r,l f , J,J'J'J',~,
I
/'////////////l
AI8
/11//I///I///I I//////////]
AI9
[
(a)
I
l
(b)
I
I
I
I
I
I
I
I
I
1
0
20
40
60
80
0
20
4O
6O
80
WITHIN CLUSTER VARIANCE (°CZ)
m
RAW IMAGE PCT PCT-DIFFERENCED
Figure 16. The effect of the PCT and PCT differencingon within-clustervariance (representativeresults for seven test images). In all cases, spectral differencing(Step 1) prior to performingthe principal component transformation (Step 2) in the PCTSMC algorithmgreatly reduced the dimensionalityof the data and thereby minimized within-clustervariance.
the original data. [Note: Although the correlation matrix method of the PCT is not desirable for a clustering application, it may be desirable for image enhancement (e.g., Singh and Harrison, 1985)].
4.1.3. Split-and-Merge Clustering The SMC technique. Split-and-merge clustering (SMC) is performed on the transformed spectrally differenced image ¥ in order to segment the image into its naturally occurring classes. The SMC technique combines both the partitional and hierarchical approaches to clustering (Sec. 2), and has several advantages over the use of either method alone. They are: 1) Pure hierarchical methods are not appropriate for complex data (Muerle and Allen, 1968; Fukada, 1980; Jain and Dubes, 1988). 2) Pure hierarchical methods are more appropriate for data that is to be partioned on the basis of both local and global information, rather than global information only (Jain and Dubes, 1988). 3) Pure hierarchical methods impose a taxonomic
structure on the data (Anderberg, 1973), which is not characteristic of cloud containing AVHRR imagery. 4) Pure hierarchical methods are orderd e p e n d e n t - t h a t is, the resulting segmentation will vary depending upon the order that the regions are split-and-merged (Cheevasuvit et al., 1986); this often results in less than optimal segmentations of the data. 5) Pure partional algorithms often converge to local minima of the clustering criterion function (Pairman and Kittler, 1986; Jain and Dubes, 1988). 6) The combined approach is more efficient than pure merging or pure splitting methods of region detection (Pavlidis, 1977; Richards, 1986). 7) The combined approach is less dependent on the initial segmentation, and therefore is more capable of recovering all of the natural clusters in the data (Seddon and Hunt, 1985; Jain and Dubes, 1988); this is because the number of clusters in the initial segmentation does not need to be the same as those that actually exist in the given data (Seddon and Hunt, 1985; Pairman and Kittler, 1986).
102
Gallaudet and Simpson
Table 7. Variation of the P C T S M C Algorithm with and w i t h o u t Spectral Differencing (Step 1): C l o u d - F r e e Cluster M e a n Vectors C o r r e s p o n d i n g to Figures 5, 6, a n d 7 a PCTSMC with Spectral Differencing
PCTSMC without Spectral Differencing
Image A
a
Cluster Size in Pixels,
Cluster Mean Vector LrT(k)T(k)T(k)3f, 4 5 J
Sum of Mean Vector Component Differences A T(3k.)4+ A T~k,~
Cluster Size in Pixels,
Cluster Mean Vector trT(k)T(k)T(k)lY34 5 J
Sum of Mean Vector Component Differences
nk
(°c)
(°c)
nk
(°c)
(°c)
75,590
16.651 15.219 14.204 16.207 12.817 11.381 16.616 13.923 12.529
3.879
60,056
1.356
8.216
4422
4.087
6021
16.110 15.862 15.002 14.665 13.797 12.592 15.429 14.527 13.280
A16
8,557
A15
10,203
AT(3k,) + AT~k.~
2.941
3.057
The significance of these data is explained in the caption for Figure 18.
Other methods of clustering were implemented as part of this study and results were compared to those of the split-and-merge method [a thorough review of these results is presented in Gallaudet (1991)]. These results verified the theoretical justification presented above: The SMC technique is the most suitable method for cloud containing AVHRR imagery in which the number of natural
classes is variable and in which the boundaries between classes are indistinct. PCT differencing. PCT differencing not only affected the segmented image and resulting cloud mask; it also optimized the performance of the PCTSMC algorithm. In general, PCT differencing resulted in fewer numbers of split-and-merge operations (Table 3b) and fewer iterations to conver-
Table 8. Variances in t h e P C - T r a n s f o r m e d and N o n t r a n s f o r m e d Images (Representative Results for the Seven P a r e n t Test Images) (a) Band Variances in the Nontransformed Images Nondifferenced Image T Band Variances
Differenced I m a g e d Band Variances
Image
Total a2
Band 3 7"3
Band 4 T4
Band 5 7"5
Total o21
Band 1 7"4- T3
Band 2 T3 - T,5
A B C D E F G
272.605 105.733 209.583 169.514 131.855 120.404 153.616
126.989 40.860 57.248 72.287 45.766 50.433 86.077
72.606 32.461 76.160 50.404 44.541 35.907 33.799
73.010 32.412 76.175 46.823 41.546 34.064 33.740
298.816 17.852 48.541 23,543 2.884 28,109 55,979
139,697 8.936 24,258 11,239 1.396 13,185 27.063
159,119 8,916 24.283 12.304 1,488 14.924 28.916
(b) Principal Component Variances in the PC-Transformed Images Nondifferenced Image PC Variances
Differenced Image Y PC Variances
Image
Total a2
PC1
PC2
PC3
Total a2t
PC1
PC2
A B C D E F G
272.605 105,733 209,583 169,514 131.855 120,404 153,616
173,433 99.929 195,437 162,908 130.916 111.553 142.259
99.560 5.788 14.145 6.553 0.930 8.775 11.001
0.612 0.016 0.001 0.053 0.007 0.074 0.356
298.816 17.852 48.541 23.543 2.884 28.109 55.979
298.622 17.836 48.539 23.465 2.851 28.023 55.623
0.194 0.016 0.002 0.078 0.033 0.086 0.356
Automated Cloud Screening of AVHRR Imagery 103
FIRST ITERATION MERGES
SPLITS 500 -
500 -
400 -
~,00
300
500
-
-
200 200
..
-
..
I00 i
I00 -
OO-
(a)
cr LIJ CD
0
Z
I00
200
300
(b)
;
,oo
2&
COMBINED I000-
800
TEST IMAGE A AI6 AI5 . . . . .
-
. . . . . . . . . . . . .
600
-
400 -
200 -
(c)
0I
- - I
0
I O0
_ ~ m
200
I
300
NUMBER OF INITIAL CLUSTERS (K[)
gence [Table 3d in Gallaudet (1991)]. These resuits confirm the conclusions presented above. PCT differencing adds a step in the separation of spectral classes in the image, and therefore increases the efficiency of the clustering procedure. The clustering criterion. Although the PCT does optimize the square error criterion for a given data set (Appendix B), AVHRR data in general do not fit the required assumptions for the square error clustering criterion (Pairman and Kittler, 1986); these assumptions are that the clusters have approximately equal variances and populations. The result is that pure partitional and pure hierarchical clustering algorithms that incorporate this criterion do not perform well with AVHRR data (Pairman and Kittler, 1986) and often converge to local minima of the criterion
Figure 17. Sensitivityof the splitting and merging operations to initialization in the clustering procedure (Step 3) of the PCTSMC algorithm (representative results for three different-sized images). These data, plus those of Figures 6 and 7, show that robust results are typically obtained in the stable range of 30 ~
function (Richards, 1986; Jain and Dubes, 1988). There are two solutions to this problem. The first is to select a more sophisticated clustering criterion [a review of the possible choices is found in Gallaudet (1991)]. The second is to select a more sophisticated clustering algorithm. The most efficient choice for this application was to utilize the square error criterion within a splitand-merge clustering method. Split-and-merge capabilities overcome the deficiencies mentioned above because they allow more clusters to be formed and similar clusters to be joined (Richards, 1986; Pairman and Kittler, 1986; Jain and Dubes, 1988). The results in Gallaudet (1991) verify this. The convergence criterion. The min Tr (Sw) convergence criterion defined in Appendix B resuited in robust segmentations. The results in Table 3 and Figures 3-5 verify that this criterion
104
Gallaudet and Simpson
is an effective indicator of clustering performance. Although Seddon and Hunt (1985) suggested this, they did not implement it as a convergence criterion. They merely terminated their approach at six iterations because their implementation usually approached a "stable state" at this point. The PCTSMC algorithm has two advantages over their approach with regards to convergence. First, the PCTSMC algorithm is more efficient. The PCTdifferencing that is performed prior to segmentation separates the data to such a degree that only one iteration of the clustering procedure was required to produce a robust segmentation for the 28 images tested [e.g., Table 3d in Gallaudet (1991)]. Second, the PCTSMC algorithm is more objective because the convergence criterion used is theoretically-based. The initialization. Both clustering and cloud masking performance were sensitive to the method of initialization (Appendix C). Previous authors who have utilized split-and-merge methods have not completely addressed this issue [e.g., Chen and Pavlidis, 1980; Seddon and Hunt, 1985; Cross et al., 1988]. It is well recognized that split-andmerge capabilities reduce the effect of the initialization on the final segmentation. The results, however, show that they do not completely eliminate the effect. This has been suggested by other investigators (Richards, 1986; Jain and Dubes, 1988), but never fully explored for split-andmerge procedures. Figures 6 and 7 specifically illustrate this point. For low initial numbers of clusters Kt, the within-cluster variance in the segmented image was relatively high, and the number of pixels rejected in the final cloud masked image T' varied greatly; these low values imposed too much structure on the data. At larger values of Kz, however, both the within-cluster variance and the number of pixels rejected converged to a neighborhood (Fig. 17). For low values of Kt, the number of split-and-merge operations that were performed in the first iteration of the algorithm was relatively large for every image; for higher values of Kt, this number also converged to a neighborhood. The shape of these response curves [and additional curves in Gallaudet (1991)] were closely linked to the variance in the image, a2oi (Appendix C). The details of these results are most likely due to the method of initialization; other methods may exhibit somewhat different characteristics. Initialization effects will occur re-
gardless of the particular method chosen. A method to dynamically determine an optimal value for Kt has been developed (Appendix C). The split-and-merge thresholds. The PCTSMC method was relatively insensitive to the split-andmerge thresholds used (Appendices B and C). Large variations in performance were observed only for those values of thresholds which approached the sensor temperature resolution [from 0.13°C to 0.10°C for AVHRR Channel 4 (Bernstein, 1982)]. These results are due to the threelayer dynamic nature of the SMC technique. Each of the three operations (clustering-splitting-merging) in the PCTSMC algorithm acts to balance the effects of the other two. This issue, like initialization, also has not been completely addressed in the literature. Only Cheevasuvit et al. (1988) has developed a method of adaptive thresholding. These results also motivated the development of a dynamic method of threshold determination (Appendix C).
4.1.4. Cluster Labeling and Cloud Mask Generation Tests 1) and 2) in the labeling procedure in Step 4 of the PCTSMC algorithm were the most critical elements in the final cloud mask generation. These tests are based upon the existence of brightness temperature differences (AT's) between pixels in the three AVHRR infrared channels [see Saunders and Kriebel (1988a) for an explanation of this phenomenon]. While the nighttime method of Saunders and Kriebel (1988a) employs static values, the PCTSMC algorithm employs dynamic values in two ways: 1) the thresholds are imposed on cluster mean vectors, rather than individual pixels (i.e., AT/k) instead of AT), thus retaining the natural classes in the data, and 2) the threshold values themselves are scaled by the size of each cluster. The mean vector component differences [AT(k?4+AT(3k?5]of cloud-free clusters for all 28 of the test images revealed the scale dependence of these thresholds (e.g., Fig. 18). These A T(k)'s always decreased as cluster size increased. Several examples also illustrated the importance of scaling (e.g., Figure 9 and Table 9). The existence of a canonical relationship between cluster size and AT(3k?4+ AT(3k?5was investigated. No such relationship, however, was found; that is, the value of AT(~!4+AT(3k!5 was a function of cluster size within each image, but the value was widely vari-
Automated Cloud Screening of AVHRR Imagery 105
SENSITIVITY OF LABELING TO CLUSTER SIZE I0-
IO---
IMAGE=C N= 2 6 2 [ 4 4
IMAGE = A N= 262144
2
2
o" m =298.816
8.
IMAGE = D N= 262144 z = 23.545 CrOi
Gol = 48.541
8-
6-
6-
4-
A
'~.
2
+ ~ <1
o
?
2
(a) 1
5000
3
100(30
5oooo
I0
N= 16384 2 o'o~ = 29.772
Z u.l Z 0n
i
Iooo
IMAGE = A 15 N= 16584 Cry, 450.038
IMAGE=At5 N= 16584
IMAGE = At
w 8 ~.) z w bA u_ b_ 6 123
z <[ iJJ
i
i
1000(30
I0
I0---
z
0 o n,o I..(.,..) i.t.J >
i
5O000
I(XX)O
70000
z
Go~ = 3.843
8 -
4-
2-
2-
0
(d) ,oo
(e)
,o~
,06o0
0I00
i
i
tooo
toooo
IMAGE = A20 N= 4 0 9 6
IMAGE = A 21 N =4 0 9 6
2
Z
o"ol = 2.0:57
5ooo
IO
I0 IMAGE = A 4 N= 4 0 9 6
n" Lt.I F"(,,0 ZZ) .._.1 (._)
i
Iooo
2
Cro~ = 15.575
Got =9.822
8-
6-
4-4
4-
2-
(g) o
30
(i)
(h) ~o
,o'oo
0-
5O CLUSTER
,
5OO
~
S I Z E IN N U M B E R OF P I X E L S
I
(n u)
Figure 18. Cloud-free cluster mean vector component differences vs. cluster size (representative results for nine test k) k~ images) These data show the sum of cluster mean vector component differences ATe,4 + AT~,.~ (defined in the text) decreased for larger clusters. These results support those represented in Figure 9, and therefi~re justify the scaling that is employed in the cluster labeling procedure (Step 4) of the P C T S M C algorithm.
106 Gallaudet and Simpson
Table 9. Variation of the Labeling Procedure (Step 4) in the P C T S M C Algorithm: C o m p o n e n t Differences in Cloud-Free Cluster Mean Vectors for Three Different-Sized Images. a Values for Only T h r e e Clusters per Image are Shown. Image Size in Pixels, Image Name N C
512 x 512
A17
256 × 256
A1
128 x 128
Cluster Number in the Image
Cluster Size in Pixels, nk
Sum of Cluster Mean Vector Component Differences Used in the. Labeling Step,
1 2 3 1 2 3 1 2 3
95,754 24,332 9334 33,275 14,617 1614 13,558 1528 158
0.174°C 3.103°C 6.750°C 0.461°C 3.116°C 7.701°C 0.456°C 1.494°C 5.002°C
a These data present numerical evidence of the results shown in Figures 9 and 18 (see corresponding figure captions).
ant for equal-sized clusters in different images. Moreover, examination of second-order cluster statistics (e.g., variance) also failed to reveal any consistent relationship. We suggest that the AT(k) values may be more closely linked to higher-order cluster statistics.
4.2 General Performance Evaluation
4.2.1. The Effects of Landmasking Results showed that the discriminatory capability of the algorithm increased if the image was landmasked (Simpson, 1991) prior to differencing. Image E (Fig. 10) illustrates this point. More of the subpixel clouds in the southwest corner of the image were masked when the image was landmasked prior to segmentation. Landmasking compresses the dynamic range of the image, and thus produces an effect similar to differencing. Image E also shows that the present implementation of PCTSMC algorithm for cloud screening is not applicable over land, because the rules of labeling are empirically derived from AVHRR data over the ocean; hence a few cloud-free clusters over land were labeled as cloud contaminated. The segmentation procedure, however, is completely independent of image content. Thus, the PCTSMC algorithm can be adapted for use over land (e.g., Landsat) simply by changing the labeling rules.
4.2.2. The Effects of Sensor Noise The most limiting factor in performance of the PCTSMC algorithm was the sensor noise in AVHRR Band 3. The algorithm segments different spectral classes in the imagery. The noise level in Band 3 often produced a component of spectral variance that was greater than that produced by clouds. This result was observed in the data obtained from the older versions of each satellite (Table 1), and different satellites each had different levels. Data from NOAA 11 (Image A, Fig. 3), for example, did not exhibit any noise, while data from NOAA 9 (Images D, Fig. 14; Image G, not shown), exhibited a large amount of noise. The Band 3 data of Image F (Fig. l l b ) also is noisecontaminated. Systematic sensor noise in Band 3 also caused the stripes that appear in the segmented images of Image B and D (Figs. 12c and 14c). The degree of degradation in the resulting cloud mask was directly related to the level of noise. [A more detailed discussion of the effects of noise in image data is given by Simpson (1990)]. Image B, for example, exhibited a noise signal in the segmented image, but was masked accurately. Image F, on the other hand, was masked poorly, with systematic streaks in cloud free areas masked as cloud (Fig. 1 ld). In the segmented image, these cloud-free areas were grouped with the thin cirrus clouds that are seen in the northwestern section of the image. Filtering Band 3 with a low pass mean filter prior to PCT differencing significantly
Automated Cloud Screening of AVHRR Imagery 107
reduced the effect of noise (Figs. 11e and 11f). Different sizes and types of filters were used, and a 3 x 3 mean filter proved to be the most effective (Fig. lle).
4.2.3. Efficiency The main issue in the operational implementation of clustering algorithms is efficiency (England and Hunt, 1985). For example, each iteration in the implementation of Seddon and Hunt (1985) required approximately 7 rain on a serial computer for a 500 × 400 pixel image. The PCTSMC implementation, however, required only 3 min per iteration on a Hewlett Packard 9000 / 380 workstation (with a 24 Mbyte RAM configuration) for a 512 x 512 pixel image. This value, however, was dependent upon initial cluster size Kl. The results discussed in Appendix C indicate that robust segmentations were obtained when KI was on the order of 30. This value yielded efficient segmentations. When Kt = 300, however, the required computer time was 23.4 rain per iteration. [Note that such high values of K1 are not required for robust segmentations or cloud masks (Appendix C)]. Despite the general time limitations of clustering algorithms, the PCTSMC method is operationally applicable. PCT differencing resulted in convergence in one iteration for 100% of the images tested. (Recall that the convergence criterion dictates that convergence at jth iteration requires j + l iterations to be performed.) Thus, the PCTSMC method required approximately 6 rain for the clustering procedure, compared to approximately 42 min (six iterations) required in Seddon and Hunt (1985). Total computer time for cloud screening was 7.3 min for a 512 x 512 pixel image [6 min for the clustering procedure (Step 3) and 1.3 rain for differencing (Step 1), the PCT (Step 2), and final cloud mask generation (Step 4)]. This time is considerably less than the typical ingestion and registration time for images at the Scripps Satellite Oceanography Center in real-time support of cruise activity. 4.3. Comparisons with Other Methods Considerable care was taken to insure high quality photo reproduction of the images discussed in this section (i.e., Figs. 10-15). Nonetheless, some detail may have been lost in final printing and
photo reproduction, which varied from figure to figure (depending on the total range of gray scales). All of the images were landmasked prior to cloud screening (Simpson, 1991). For AVHRR Band 2 (Figs. 10b, 11a, 12a, 13a, and 14a), land is black, coastline is white, and clouds appear light (generally white in xeroxes). For AVHRR Band 4 (Figs. 11c, 12b, 13b, and 14b), land is white, coastline is black, warm SST appears as dark shades of gray, and cold SST appears as light shades of gray or white. For cloud masked images (e.g., Figs. 12d, 12f, and 15a-f), cloud and land are black, cold SST now appears as dark shades of gray, and warm SST appears as light shades of gray.
4.3.1. IR Thresholding First, the PCTSMC method was compared with simple IR thresholding methods of cloud screening using all seven test images (Table 6). Five thresholds tests were used: 1)-2) the two static AT tests used by Saunders and Kriebel (1988a) described in Sec. 2.4; 3) an additional piecewise static AT test from Saunders and Kriebel (1988a), which is based on a lookup table of AT4,5 values, which vary coarsely with satellite zenith angle; and 4)-5) the Band 4 SST filters used by Simpson and Humphrey (1990). For each image, these tests were applied on a pixel-by-pixel basis and numbers of pixels rejected were computed (Gallaudet, 1991). Tests 1) and 2) consistently overestimated the number of cloud contaminated pixels in the images tested. These results illustrate the inability of static criteria to account for variations in geometry, geographic region, or sensor degradation and confirmed similar conclusions of Coakley and Bretherton (1982) and Simpson and Humphrey (1990). Test 3) screened only a small amount of high cirrus in a few of the images; consistent with its use by Saunders and Kriebel (1988a) as a secondary screening criterion. Tests 4) and 5) from Simpson and Humphrey (1990) accurately screened high cloud regions; consistent with the fact that optical and geometric criteria provide the main cloud discriminating capability in their algorithm. All these results clearly emphasize that simple IR thresholding techniques cannot be used as primary cloud screening discriminators.
108 Gallaudetand Simpson
4.3.2. The Nighttime Method of Saunders and Kriebel (1988a) The PCTSMC method outperformed the nighttime method of Saunders and Kriebel (1988a) for the same reasons mentioned above. Their method uses the three AT thresholds mentioned above, and adds a local uniformity test in which pixels are masked as cloud contaminated if the local standard deviation (for 3 × 3 pixel arrays) is above a cutoff value of 0.2; this step is applied only 50 km away from the coast. Their method was derived for regions over the North Atlantic and Western Europe. When applied to California current imagery, the method grossly overestimates cloud cover. Images B, C, and D (Figs. 12e, 13e, and 14e) show that the Saunders and Kriebel (1988a) method screens much of the cloud-free region in each image, and also screens regions of large oceanic thermal gradient farther than 50 km off the California coast because the local uniformity test in their method mistakes such regions as cloud boundaries. The Saunders and Kriebel (1988a) algorithm is regionally specific, which partially explains the results obtained with their method. Although the PCTSMC algorithm uses a subset of their criteria in the supervised final step, it is far less sensitive to geographical variations than the Saunders and Kreibel (1988a) method. This results from the manner in which the data are separated in the clustering procedure of the PCTSMC algorithm prior to application of the supervised criteria.
4.3.3. The Spatial Coherence Method of Coakley and Bretherton (1982) This method was not developed for strict cloud screening; rather, it is used to determine clear and cloudy sky radiances and, for single-layered systems, cloud cover. This method was implemented and used to mask cloud-containing pixels for the images used in this study (Fig. 15). A visual intercomparison between the spatial coherence method and the other methods that were evaluated can be made by examining each panel in Figure 15 in relation to the appropriate panels in Figures 3 and 10-14. Statistics for the intercomparisons are given in Table 6. The PCTSMC method also outperformed the spatial coherence method. The spatial coherence method performed poorly for each of the seven test images for several reasons: 1) The method assumes that clouds con-
gregate in layers that persist over large areas, which was not the case for the images used in this paper, and 2) the method assumes that the thermal emission for cloud-free and completely cloud-covered regions exhibits a degree of spatial coherence usually unachieved by partially cloud covered regions. These assumptions do not apply for coastal regions which are often characterized by strong SST gradients (e.g., upwelling). This explains the erroneous masking of ocean in Images B, D, and E (Figs. 15b, 15d, and 15e). Consequently, their method does not exhibit the general applicability of the PCTSMC method. (Note the uniformly thick region along the coastline of each image that was masked by the spatial coherence method; this was due to the 8 × 8 pixel local standard deviation array that is used.)
4.3.4. The Daytime Method of Simpson and Humphrey (1990) The PCTSMC method was also compared with the lcoal dynamic threshold nonlinear Rayleigh (LDTNLR) method for daytime AVHRR imagery (Table 6). Results from the two methods generally compared well for the set of images tested (Figs. 12-14). The LDTNLR method uses optical and geometric criteria [see the Appendix in Simpson and Humphrey (1990)] that are not invoked by the PCTSMC method. In general, the LDTNLR method produced a more conservative cloud mask than the PCTSMC method, a fact consistent with the geometric constraints of the LDTNLR algorithm. Moreover, the LDTNLR method is not subject to the limitations of the PCTSMC method imposed by the occasional systematic noise (i.e., AVHRR Channel 3) associated with sensor malfunction. Image D (Fig. 14) demonstrates the effect of occasionally high noise levels in Channel 3. The PCTSMC segmented image (Fig. 14c) shows that the sensor noise pattern was not separated from the subpixel and thin clouds in the center of the image. These thin clouds were grouped into the same cluster as cloud free regions, and therefore were not masked, while the LDTNLR method produced a more accurate cloud mask under these noise-contaminated conditions (Fig. 14f). Note, however, that prefiltering Channel 3 prior to PCT differencing mitigates this problem; under these conditions, the PCTSMC algorithm produces a robust cloud mask even if the original Channel 3 data were contaminated
Automated Cloud Screening of AVHRR Imagery 109
(Fig. 11). Finally, the PCTSMC method can be used for both daytime and nighttime AVHRR data, whereas the LDTNLR method was designed to be used for daytime data only. 4.4. Advantages of the PCTSMC Method The PCTSMC method has three primary advantages over previous segmentation and nighttime cloud screening approaches for AVHRR imagery. First, the PCTSMC method provides the most robust segmentation procedure yet achieved for AVHRR imagery. Employing both spectral differencing and the PCT (Appendix A) before segmentation optimizes the clustering criterion by minimizing the within-class variation and maximizing the between-class variation (Appendix B). Thus, the spectral classes are separated such that each are as homogeneous as possible and that all are as distinct as possible (as visualized in Fig. 19). Second, to the best of our knowledge, the PCTSMC algorithm is the most efficient clustering algorithm yet developed for infrared imagery. PCT differencing causes this in two ways: 1) by decreasing the number of bands that are operated upon in the segmentation procedure from three to two, and 2) forcing convergence at the first iteration. Third, the PCTSMC algorithm is an objective and robust cloud screening method for nighttime AVHRR data, and, unlike the methods of Saunders and Kriebel (1988a) and Coakley and Bretherton (1982), it works near land and in regions of strong SST gradient.
4.5. Importance of Cloud Segmentation Procedures The ability to accurately and automatically segment clouds in remotely sensed imagery is critically important to a broad range of disciplines in earth science. Clouds significantly affect the net heating of the atmosphere and the underlying ocean-land surface by modifying solar and terrestrial radiation (Ohring and Clapp, 1980). This net radiative heating governs the thermodynamics and dynamics of the atmostphere, which in turn influence the formation and dissipation of clouds [e.g., Matveev, 1984]. The potential feedback effects associated with this cloud-radiation interaction is one of the greatest sources of uncertainty in determining the relation between changes in
(a) IMAGE SPACE AVHRR INFRARED IMAGE T PRIOR TO STEP I
I Sw(5)=
sd'L-
(b)
TRANSFORM SPACE SEGMENTED, TRANSFORMED, DIFFERENCED IMAGE Y' AFTER STEP 5
(c)
0
IMAGE SPACE
CLOUD-MASKED AVHRR IMAGE T' AFTER STEP 5
Figure 19. Conceptualizationof the PCTSMC method. The followingdefinitionsapply: Sw = within-clusterscatter, x-axis= image samples, y-axis= image lines. Note that T is a 3-banded image, Y' is a 2-banded image, and Tr is a 3-banded image. Image space and transform space are defined in the text.
climate and changes in external conditions such as solar radiation and atmospheric carbon dioxide concentration (e.g., Henderson-Sellers, 1984; Ramanathan, 1987). Clouds also affect our ability to remotely sense the properties of the atmosphere, ocean, and land; such observations are needed, for example, in weather prediction (e.g., Pailleux, 1986), oceanography (e.g., Eckstein and Simpson, 1991a,b), and agriculture [e.g., rainfall (Browning, 1986)].
110 Gallaudetand Simpson
Many of the important questions related to global change processes will be addressed using climate models of various types. One of the critical components of all such models is the parameterization of atmospheric radiative transfer processes and, hence, the parameterization of clouds (e.g., Henderson-Sellers, 1984; Ramanathan, 1987). In fact, clouds are the most important transient phenomenon incorporated into climate models (Henderson-Sellers, 1984). Cloud climatologies are used by climate modelers either as initial conditions (e.g., Meleshlo and Wetherland, 1981) and/or to validate predictions of cloud structure (e.g., Hansen et al., 1983). Oceanic cloud climatologies are not as common, global, or certain as their terrestrial counterparts. The widely used cloud climatology of London (1957) was compiled from conventional surface-based observations and the oceanic data is particularly uncertain (Henderson-Sellers, 1984). Most satellite-based oceanic cloud climatologies contain little information on interannual variability in clouds or are limited to brief periods of time and hence may be unrepresentative (HendersonSellers, 1984). The preliminary global oceanic cloud climatology of Hughes and HendersonSellers (1983) has eliminated some uncertainty about variations in oceanic cloud cover but the oceanic case is still relatively unstudied compared to its terrestrial counterpart. Robust segmentation procedures are an important component in the development of both oceanic and terrestrial cloud climatologies. Therefore, image segmentation is highly relevant to international programs such as the International Satellite Cloud Climatology Project (ISCCP), the International Satellite Land Surface Climatology Project (ISLSCCP), and the International Geosphere-Biosphere Program (IGBP).
Additionally, the nature of the Channel 3 noise will be studied in detail. Future implementations of the PCTSMC algorithm may incorporate a filter specifically designed to minimize the effects of noise contamination on algorithm performance. 5. CONCLUSIONS A new algorithm (PCTSMC) for automatic image segmentation has been developed and applied to screen clouds from nighttime AVHRR data. The algorithm is not limited by the fact that clouds, ocean, and land each exhibit different spatially and temporally varying spectral responses. Previous attempts at nighttime cloud screening of AVHRR data have been limited only to specific types of data; comparisons with these methods show that the PCTSMC algorithm is independent of geographic region, cloud type, and cloud size. The sensitivity of the method to algorithm parameters was investigated, and the results were used to optimize the algorithm. The performance of the method is limited only in the case of highly noise-contaminated data. Under these later conditions, daytime AVHRR data should be cloud screened using the LDTNLR algorithm by Simpson and Humphrey (1990). The PCTSMC method is objective, efficient, and applicable to global AVHRR data. Furthermore, the segmentation procedure in the algorithm may be applied to data from different sensors (e.g., CZCS, LANDSAT, GOES) and may be adapted to both cloud and land classification studies.
APPENDIX A: THE PRINCIPAL COMPONENT TRANSFORMATION A.I. Overview
4.6. Future Refinements Future refinements of the PCTSMC algorithm will involve improved labeling. Image C (Fig. 13) is a good example. The segmentation of this image is robust; areas of ocean and levels of cloud were distinctly separated. The labeling routine, however, did not distinguish well between areas of ocean and cloud edges. Improved labeling also has the potential for cloud classification (e.g., type, amount, height) within the PCTSMC algorithm.
In Step 2 of the PCTSMC algorithm, the principal component transformation (PCT) is applied to the AVHRR infrared differenced image. A concise mathematical development of the transformation is given here. Although the PCT is presented within the context of cloud screening an AVHRR differenced image, it is stressed by the authors that this technique may be extended to all forms of multispectral or multitemporal image data and is most commonly applied in the general context
Automated Cloud Screening of AVHRR Imagery 111
of arbitrary multivariate data. A complete treatment of the PCT and its applications is given by Preisendorfer (1988). A.2. Defining the PCT
Consider a 3-banded AVHRR image T consisting of infrared Bands 3, 4, and 5 (T3, 7"4, and Ts). The differenced image D' is a 3-banded image and is obtained by subtracting each band from one of the other two, that is, T3- T4, T4- Ts, T3- Ts. Note, however, that only two of these differenced bands are linearly independent. The 2-banded differenced image D therefore is constructed and contains T3- T4, T3- I"5. These are chosen because this combination contains most of the variance in the differenced bands (Sec. 4.1.1). Each band of D contains NL x NS pixels, where NL is the number of lines and NS is the number of samples in the original image T. Let N be the total number of pixels in one band of the differenced image, such that N= NL x NS. D may be represented as a 2 × N array X, where each row of X corresponds to a single band of D in which each line is concatenated one after the other. Let X be defined as the pretransform differenced image. Each column in the pretransform differenced image may be expressed as a 2 × 1 vector, xi. Each vector xi corresponds to a single spatial point, and each element of x~ corresponds to a single pixel in one of the two differenced image bands. Now define the 2 x 1 mean vector of the X, as 1 n m x = - ~xi, (A.1) Ni=l
in which i designates the ith spatial point in X. The covariance matrix Cx, is then found by 1
C~ -
N
Z (xi- mx)(Xi- mx)~ N-I~=I
(A.2)
The eovariance matrix is one of the most important concepts in the analysis of multispeetral imagery (Riehards, 1986). It is a symmetric positive-definite matrix. For the case of a 2-banded AVHRR differenced image C~ is a 2 × 2 matrix. The elements cij for which i =j represent the variance associated with each band i, whereas those for which i # j represent the covariance between bands i and j. The total variance in D, 021, is the sum of the diagonal elements of C×, o~, = TdCx ),
(A.3)
where Tr denotes the trace operation. These properties of the covariance matrix motivate the development of the PCT. The objective is to develop a transformation of the differenced data X such that there is no interband correlation in the transformed data. This objective then requires that the covariance matrix of the transformed data be diagonal-that is, the off-diagonal terms which represent interband covariance are zero. Let the desired transformation be represented as Y = GX.
(A.4)
Then, one must determine the linear transformation G, subject to the constraint that the covarianee matrix of the transformed differenced image Y is diagonal. Define the transformed covariance matrix as 1
Cy -
N
~] (yi - my)(yi- mu)T, N-1i=I
(A.5)
in which my is the mean vector of the transformed image 1 u 1N my=~ ~y,= SGxi = Gmx. 1¥i=1
(A.6)
Ni=l
Using Eqs. (A.1)-(A.2) and (A.4)-(A.6), the transformed eovariance matrix Cy thus becomes Cy -
1
N
E (Gx, - Gmx)(Gxi - Gmx) r N-li=l 1
--
N ~ G(xi - mx)(xi - m~)rG T, N-li=l Cy = GCxG rr.
(A.7)
The transformation matrix G is determined from (A.7) using eigenvalue decomposition. Let g be an eigenvector of Cx and ~, be its associated eigenvalue, such that Cxg = Xg.
(A.8)
If G is constructed as the matrix whose rows are the transposed eigenvectors gT, and if it is further required that the eigenvectors be orthonormal, such that GG T= GTG = I,
(A.9)
where GT= G-1, then multiplying (A.7) by G-1 yields G - 1Cy = G - 1GCx" G T
112
GaUaudet and Simpson
or, equivalently, GrCv = CxG r.
(A. 10)
C v is diagonal, and each diagonal element of C v is an eigenvalue k,. Thus, Cxgn = )~ngn,
(A.11)
where the subscript n represents the nth eigenvector-eigenvalue pair. The transformation in Eq. (A.4) is such that the row vectors of the transform matrix G are the transposed eigenvectors of the covariance matrix Cx and the resulting covariance matrix of Y is composed of their corresponding eigenvalues. Finally, it should be noted that the PCT may also be computed from the correlation matrix R. The elements of correlation matrix are related to those of the covariance matrix Cx by rij = Cij / ~ ,
where eii and e~ are the variances associated with the ith and jth bands, respectively (Richards, 1986). Using the correlation matrix in the computation of the PCT is equivalent to normalizing the covarianee matrix (Jain and Dubes, 1988). Because the elements in the correlation matrix are sealed by different factors, the eigenveetors of the correlation matrix and eovariance matrix for a given data set are different. Thus, the principal components generated by the correlation matrix and the covarianee matrix are also different (Singh and Harrison, 1985). For cloud screening applications, the covariance implementation of the PCT is recommended.
1. Total variance is preserved. The total variance in the pretransformed differenced image X is equal to that in the transformed image Y. This is expressed as B
?1
o•i= Xcii = XXi i=1
(n.13)
i=1
2. It is the only transformation that generates uncorrelated coefficients. This results from the requirement in Eq. (A.9). Thus, a 2-banded AVHRR differenced image is transformed onto a new coordinate system in which the new bands (or PCs) are uncorrelated and in which the separability of the spectral classes in the data is increased. 3. It minimizes mean square error approximations (Appendix B). Because the majority of the variance in the transformed data is compressed into the first few principal components, the original data can be approximated by these first few components, and the mean square error in this method of approximation is minimized.
APPENDIX B: THE SQUARE ERROR CRITERION
B.1. Overview
A.3. Properties of the PCT The desired transformation Y = GX is the principal component transformation. For a 2-banded AVHRR differenced image, the transformation matrix G is a 2 × 2 matrix of the eigenvectors of Cx, X is a 2 × N pretransform differenced image, and Y is a 2 x N transformed differenced image. Each row of Y corresponds to a NL × NS band in the transform space, in which the rows have been concatenated one after the other; these transformed bands are the principal component images (PCs). The eigenvalues of Cy represent the variance associated with each PC. The transformation is arranged such that Xl >X2> • • " > Xn,
where the largest eigenvalue is hi and corresponds to the PC consisting of the largest percentage of total variance. For the transformed differenced image Y, n = 2. Singh and Harrison (1985) mention three important properties of the PCT:
(A.12)
In Step 3 of the PCTSMC algorithm, two separate forms of the square error criterion are employed as 1) the clustering criterion and 2) the convergence criterion. The square error criterion is also closely related to the split-and-merge criteria. Here, the square error criterion is developed in terms of each of these uses. It is also shown that performing the PCT before clustering (Step 2 of the PCTSMC algorithm) optimizes the square error criterion in these contexts. B.2. Square Error as a Clustering Criterion
Definition. The square error criterion is the most commonly used clustering criterion and is essen-
Automated Cloud Screening of AVHRR Imagery 113
tially an interclass distance metric (Jain and Dubes, 1988). In terms of the PCTSMC algorithm, the square error is minimized at each iteration by assigning every vector yi in the transformed differenced image to that cluster for which the squared Euclidean distance to the cluster mean is minimum. As shown below, minimizing square error (or within-cluster variation) at each iteration is equivalent to maximizing betweencluster variation. At each iteration in the cloud screening algorithm, there exist K clusters which represent a partitioning of the transformed differenced image Y. The notation in Appendix A is also used here, where Y represents the transformed differenced image. Each cluster Ck contains nk vectors, y (/k), where each vector is contained in exactly one cluster, such that K
Z nk = N,
(B.1)
k=t
above also can be formulated in terms of scatter matrices (Freidman and Rubin, 1967). The scatter matrix formulation is presented for two reasons: 1) It represents more concisely the square error criterion, and 2) it demonstrates more clearly the relation of the square error criterion to the PCT. Consider the kth cluster Ck at a given iteration in the PCTSMC algorithm. Ck will consist of nk vectors, denoted by [y(~) y(~)y(3k) • • • y~)]. The scatter matrix for the kth cluster, S(k), is defined as nk
S/k) = ~ (y(~)- m(k))(y(/k/- mIk))T,
(S,5)
i=1
where m/k/ is given by (B.2). The within-cluster scatter matrix Sw is the sum of each of the k cluster scatter matrices: K
and where N = NL x NS is the number of vectors in the transformed differenced image. The mean vector of cluster Ck is given by nk
m(k) = 1 Zy(ik) '
(B.2)
nk i=1
where y(~/ is the ith vector in cluster Ck. The square error for cluster Ck is the sum of the squared Euclidean distances between each vector in Ck and it cluster mean m/k). Because the squared Euclidean distance between vectors is expressed as an inner product of their differences; the square error is written
Sw= ~ s (k/.
(B.6)
k=l
Now consider the entire set of K clusters, [Ckl. The mean vector of the kth cluster is given by Eq. (B.2), while the mean vector over the entire set of clusters is given by 1
m =-
t<
E nkm(k).
(B.7)
Nk=l
The between-cluster scatter matrix SB is defined as the scatter matrix for each of the K cluster means about the mean vector of the entire set: K
SB = E nk(m(k)- m)(m(k) - m) T,
nk
e~ = E (Y(ik) - m(k))T(Y(k) -- m(k))•
(B.3)
i=1
The square error for the entire set of K clusters associated with a given partition is the sum of the within-cluster square errors:
k=l
and application of (B.7) to the expansion of this outer product yields K
SB = ~] nkm(k)m(k)r- Nmm T.
K
E2= E e2.
(B.8)
k=l
(B.4)
k=l
The objective of this criterion is to find a partition of the K clusters in which E ~ is minimized. The square error criterion tries to make the K clusters as compact and separated as possible with regards to the Euclidean distance metric (Jain and Dubes, 1988). Square error in terms of scatter matrices. The definition of the square error criterion given
Finally, the total scatter matrix of the transform differenced image, X, is defined as K
S= ~
nk
~ (y(/k)_ m)(y(/k)_ m)~,
(B.9)
k=li=l
which is identical to the scatter matrix of the entire set of K clusters. Note that S is related to the transformed covariance matrix Cy [Eqs. (A.5) and (B.1)] by S = ( N - 1)Cy.
(B,10)
114
Gallaudet and Simpson
Similarly, the total scatter in the transformed differenced image, SDI, is related to the total variance a~), [Eq. (A.3) and (A.13)] by SDI :
(N-
1)O'21 =
( U - 1)Tr(Cy) = Tr(S).
(B.11) Using Eqs. (B.5), (B.6), (B.8), and (B.9), the three scatter matrices can be related by the following simple equation: S = SR + Sw.
(B.12)
Thus, the total scatter matrix of Y can be divided into the between-cluster and the within-cluster scatter matrices (Devijver and Kittler, 1982). The square error criterion is defined in terms of the scatter matrices given above by representing the "size" of the clusters by the trace operation. The trace of a matrix, which is denoted as Tr, is the sum of the diagonal elements. From Eq. (B.6), this quantity becomes K
Tr(Sw) = Z Tr(S(k))•
(B.13)
The trace operation also can be defined in terms of an inner product. Using Eqs. (B.5) and (B.13), Tr(Sw) also can be expressed as T~(Sw) = E T~ Z (y(/k)_ m(k))(y(k)_ m(~)) k=l
i=1
K nk
= ~ ~ (y(/k)_ m(k))T(y(/k)_ m(k)). k=li=l
With Eqs. (B.3) and (B.15), it is now apparent that the square error criterion is identical to a criterion defined by the trace of Sw: K
Tr(Sw) = ~ e~ k=l
and from (B.4) it follows that Tr(Sw) = E 2.
(B. 14)
Finally, applying the trace operation to Eq. (B.12) yields Zr(S) = Tr(SB) + Vr(Sw).
Other interclass distance metrics. In the case of all clustering criterion functions, it is often desirable to apply a transformation to the data before the clustering is performed. As we show in Sec. B.5, this technique can increase the discriminatory capability of the chosen criterion function. Freidman and Rubin (1967) have shown that the square error criterion defined by the trace operation in Eq. (B.15) is invariant under orthogonal transformations, but not under nonsingular linear transformations. This can be restrictive; thus they developed a number of clustering criteria related to the square error criterion. A thorough comparison of these metrics [e.g., the Tr(S ?vlSB) criterion given by Freidman and Rubin (1967)] and other clustering criteria [e.g., the fuzzy clustering criteria defined by Backer (1978)] is presented in Gallaudet (1991). There is no single guideline for choosing a clustering criteria (Jain and Dubes, 1988). Two general guidelines that Fukunaga and Koontz (1970b) use to evaluate the criteria given by Friedman and Rubin (1967) are: a. Performance: The resulting partition should be along the natural boundaries of the data when such boundaries are well-defined. b. Efficiency: The computational and storage requirements should be both feasible and consistent with the requirements of the application. The square error metric and the other metrics discussed in Gallaudet (1991) were evaluated in terms of these two requirements. The criterion in Eq. (B.15) was chosen for two primary reasons. First, this criterion is computationally efficient. The clustering operation is very demanding; thus an efficient criterion (which must be evaluated for all N vectors in Y during the clustering operation) was an operational imperative. Second, the PCT applied to the data in Step 2 of the PCTSMC algorithm directly optimizes this criterion.
(B.15)
Similar to Eq. (B.11), T~(Sw) is a measure of withincluster variance, and Tr(SB) is a measure of between-cluster variance. Because the total variance is conserved, Eqs. (B.11) and (B.15) necessarily imply that minimizing the within-cluster scatter Tr(Sw) is identical to maximizing the betweencluster scatter Tr(SB) (Jain and Dubes, 1988).
B.3. Square Error as a Convergence Criterion
Minimum square error is also employed as a convergence criterion. The square error clustering criterion results in minimizing the variation within clusters at a given iteration. Thus, the PCTSMC algorithm simply terminates the clustering proce-
Automated Cloud Screening of AVHRR Imagery 115
dure at the iteration for which the within-cluster scatter [Eq. (B.14)] is minimum. This criterion was chosen for two reasons. First, the alternative is to terminate the procedure when absolute convergence is reached (i.e., when no vectors change cluster assignment between successive iterations). Our results, and those by others (e.g., Seddon and Hunt, 1985; Pairman and Kittler, 1986), show that absolute convergence is almost never reached for multispeetral imagery. Second, Seddon and Hunt (1985) demonstrate that the within-cluster scatter is an indication of clustering performance; thus, minimizing within-cluster scatter (or equivalently square error) between successive iterations is a logical choice for a convergence criterion.
B.4. Square Error in Relation to the Split-and-Merge Criteria Similar to the square error criterion, the splitting and merging criteria in the clustering procedure of the PCTSMC algorithm incorporate the squared Euclidean distance metric. Although the square error criterion for a given cluster is the sum of squared Euclidean distances of each vector from the cluster mean [Eq. (B.3)], the merging criterion for two clusters is the squared Euclidean distance between their mean vectors; if this distance is below the merging threshold value TMERCE (Appendix C), the two clusters are merged into one. Similarly, the splitting criterion for a cluster is the squared Euclidean distance between that cluster's maximum and minimum vectors; if this distance is above the splitting threshold value TSPLIT(Appendix C), that cluster is split in half. Because the split-and-merge operations are performed within the clustering procedure, the selection of these criteria logically follows from the nature of the clustering criterion. [See Gallaudet (1991) for a thorough discussion of splitting and merging criteria].
B.5. Square Error under the Principal Component Transformation The PCT is performed on the pretransformed differenced image X prior to clustering for two reasons: 1) The natural groups in the original data are more separated in the transformed space (Devijver and Kittler, 1983), and 2) the PCT
optimizes the square error clustering criterion. The first reason was discussed in Appendix A and is illustrated by Figure 17. The second reason is developed below. [Note that the development normally is clone in the context of minimizing the square error between the original data (e.g., the AVHRR infrared image T in Appendix A) and a truncated orthogonal expansion of the data. The PCT minimizes the square error for such an expansion (Preisendorfer, 1988), and this property of the PCT has been exploited in the areas of data compression and feature selection in pattern recognition (Devijver and Kittler, 1982). In the PCTSMC algorithm, however, the PCT is performed in order to enhance the segmentation procedure through optimization of the clustering criterion.] Consider the clustering criterion represented in Eq. (B.15), but now let the scatter matrices defined by Eqs. (B.6), (B.8), and (B.9) correspond to the scatter matrices of the pretransformed differenced image X. The objective is to find a nonsingular orthogonal transformation G of the original data X such that the criterion in (B.15) is optimized. Designate the criterion in (B.15) as J: J(X) = Tr(S) = Tr(Sw) + Tr(SB),
(B.16)
which simultaneously minimizes the square error Tr(Sw), and maximizes Tr(SB). The transformation that we seek will be applied as Y=GX,
(B.17)
with the requirement that G transform the data X so as to optimize J(X)= TdS ). In other words, we desire the transformed criterion j(G) = Tr(GSG ~)
= Tr(GSwG T) + Tr(GSBGT).
(B.18)
This problem can be formulated as an optimization problem (or linear program). In this framework, any number of Q constraints are imposed on the transformation G such that
Tr(GCjG T) = Tr(bjbf), j = 1, 2, 3 . . . . . Q, (B.19) where Cj is a D x D matrix of linear constraints on G, bj is a D-dimensional vector of constants, and D is the dimension of the transform G. We may then find G by optimizing the square error criterion j(G) in the transformed space, subject to
116
Gallaudet and Simpson
the constraints in Eq. (B.19). Thus, the needed objective function f(G) is given by k
f(G) = J(G) - Z Tr(AjGCsGT - Ajbjbf). j=l
(S.20) where Aj is the D x D matrix of unknown Lagrange multipliers associated with the jth constraint. Only one constraint is required for this transformation: that G be orthogonal [Eq. (A.9)]. Thus, from Eq. (S.19), Q = 1, Cj= I, and
eigenvectors are identical (Preisendorfer, 1988). In Appendix A, it was shown that the PCT is defined by the eigenvectors of the covariance matrix Cx. Thus, the PCT results in optimizing the square error clustering criterion. This theoretical optimization implies that the combination of PC differencing and the square error criterion in the PCTSMC algorithm produces two desired results: 1) Each of the clusters are as homogeneous as possible; 2) the entire set of clusters are as distinct as possible.
bj=[1 1 1 . . . 1]~. With only one constraint imposed on G, the optimization problem of (B. 18) may be solved analytically by setting the gradient of f(G) with respect to G equal to zero. From Eq. (B.20), this is Vf(G) = (Sw + SB)G r - AG r = 0.
(B.21)
From this, G is found by solving the eigenvalue problem (Sw + SB)G ~c- AG r = 0.
(B.22)
Thus, the rows of the desired transformation G, which optimize the criterion in Eq. (B.15), are the transposed eigenvectors of the scatter matrix
APPENDIX C: DYNAMIC DETERMINATION OF CLUSTERING PARAMETERS C.I. The Clustering Initialization Method C. 1.1. Initialization
The clustering procedure in the PCTSMC algorithm is initialized by selecting a uniformly distributed set of initial cluster center (or mean) vectors; these are assigned as m (k)= YMIN+ ( (k --1)/(YMAX -- YMIN), (C.1)
\(K,- 1)/
S= Sw + Ss.
Finally, note that the scatter matrix S of the pretransform difference image X is related to the covariance matrix Cx by S = ( N - 1)Cx,
(B.23)
which is the pretransofrm case of Eq. (B.8). Since the two matrices differ only by a constant, their
where Kl is the number of initial cluster centers and k = 1, 2 . . . . . K~, m (k)is the kth cluster center [Eq. (B.2)], and YMAX and YMIN are the maximum and minimum vectors over the transformed differenced image E These initial cluster mean vectors are used in the first iteration of the clustering procedure (Sec. 2.3). There exist a wide vari-
Table 10. Sensitivity Analysis of the Clustering P r o c e d u r e in the P C T S M C Algorithm (Representative Results for P a r e n t Test Image A and Eight Subimages) a Differenced Image Scaled Variance, Image A A1 A3 A6 A7 A8 A9 A10 A15
0.584 0.233 1.770 15.802 0.107 4.632 0.0012 0.050 3.242
Converged b Value of Within-Cluster Variance, c
Percent Total Variance in the Converged Value of Within-Cluster Variance, c
T~(Sw)/~V- 1)
~dsw)/~- 1)a~1]× loo%
Converged b Value of lnitial Number of Clusters, KI
0.564 0.068 0.052 2.390 0.007 0.270 0.0003 0.009 0.648
0.188 0.226 0.171 0.472 0.132 0.116 0.405 0.315 0.144
40 60 23 20 80 30 600 250 24
a The meaning and significance of these data are discussed in Appendix C, and provide a justification for the dynamic initialization procedure which is defined in Appendix C. ~A value of 0.2% of total variance in the differenced image away from the converged value at Ks = 1000. c As measured in the PC transform space.
Automated Cloud Screening of AVHRR Imagery 117
ety of methods to initialize clustering techniques (see Gallaudet, 1991). Like the clustering criterion, there is no theoretically optimal method. The most suitable method of initialization depends upon both the nature of the data and the application (Jain and Dubes, 1988). The method (C.1) was chosen for two primary reasons: 1) It is compntationally efficient; and 2) of all clustering techniques, the split-and-merge technique is the least dependent on initialization (Seddon and Hunt, 1985; Jain and Dubes, 1988). Therefore, a complex method of initialization is not necessary.
C. 1.2. Algorithm Sensitivity to Initialization Although sensitivity to the initial clustering is minimized by the split-and-merge technique, initialization still influenced both segmentation and cloud screening performance in the PCTSMC algorithm (Sec. 4.1.3). Here, a more detailed discussion of this sensitivity is presented, and a method to dynamically determine the optimal initialization condition (i.e., K~) is given. Twelve images were used to investigate the sensitivity of the PCTSMC algorithm. The withincluster variance, or Tr(Sw) (Appendix B) in the final partition of the clustering procedure (Fig. 6) was the first parameter used to measure the segmentation performance sensitivty (Seddon and Hunt, 1985). The second parameter was the final number of clusters, Ke, in the segmentation [results shown in Gallaudet (1991)]; this measured clustering operation sensitivity (i.e., separation along natural data boundaries) and was not intended to reflect performance dependencies. Finally, the number of pixels rejected as cloud contaminated was determined (Fig. 7); this parameter measured cloud screening performance sensitivity. The results show that the segmentation performance was influenced only for very small values of K~, while the cloud screening performance was influenced for a greater range of values. Specific results for the within-cluster variance, Tr(Sw), or square error (Fig. 6), demonstrate that the clustering procedure produced robust segmentation for all but very low values of KI. The low values of KI (e.g., < 15) imposed too much erroneous structure on the initial set of cluster centers, [m(t')]. Therefore, the clustering procedure was not able to completely separate the data into its naturally occurring classes and produced less robust segmentations (i.e., with
higher square errors). Above these low values, however, the within-cluster variance for each test image asymptotically converged to a neighborhood of values that were typically less than 0.5 % of the total variance in transformed differenced image (Table 10). The sensitivity of the final number of clusters, KF, was different in nature to that of Tr(Sw). Typically, KF increased with KI (note, not an exact one-to-one correspondence due to the nature of the split-and-merge clustering). Thus, the clustering procedure generally produced a segmentation in which the separation of the clusters was optimized in terms of Tr(Sw), and in which KF was near KI (Fig. 6; Appendix B). Results for the number of pixels rejected, however, showed that the cloud screening performance was much more sensitive to clustering initialization (Fig. 7). Low values of KI produced the least robust results, similar to the Tr(Sw) results. Compare the values of the number of pixels rejected for each value of KI to the values obtained using the LDTNLR method (represented by the dashed lines in Fig. 7). The shape of the plots suggests that the labeling procedure (Step 4) in the PCTSMC algorithm was not as successful as the clustering procedure (Step 3). Although the within-cluster variances converged smoothly to small values, the number of pixels rejected for each image oscillated between different Kt's, converging to a neighborhood of values for some images. This conclusion is supported by the observation that the oscillations were most pronounced for data with the highest values of scaled variance [variance in the transformed differenced image was scaled by the size of the image as tr ~I / ~/r~,where aDt 2 is defined in Eqs. (A.3) and (A.13) and where N is the number of pixels in the AVHRR infrared image T in Appendix A]. Thus, the data with the greatest structure, or variance (Fig. 7d), were the most difficult to label accurately.
C.1.3. An Optimal Initialization Method Initialization for split-and-merge methods has not been completely explored by previous investigators (e.g., Seddon and Hunt, 1985; Cross et al., 1988). Based on the results discussed above, an optimal method of initialization which dynamically determines the value of Kz has been developed. Although the general objective was to opti-
118 Gallaudet and Simpson
mize the segmentation procedure, Appendix B and the above results motivate a more specific objective: Select KI such that the square error in the segmental data, Tr(Sw), is minimized, subject to the single constraint that extremely high values of K1 are not allowed. Large values of K1 would preclude operational implementation of the PCTSMC algorithm (Sec. 4.2.3). Close examination of the sensitivity tests (e.g., Fig. 6) showed that Tr(Sw) never reached absolute convergence for the range of Kl's tested. [Note that convergence in this section only applies to the initialization results represented in Fig. 6 and is completely distinct from the Tr(Sw) clustering convergence criterion described in Sec. 4.1.3 and Appendix B.] Approximate convergence in the initialization plots was defined as when Tr(Sw) was within 0.2% of a2i of Tr(Sw) at K~= 1000. This value was selected because it produced a segmentation that was both statistically robust [i.e., Tr(Sw) was still on the order of 0.6% of the total variance a 2I] and operationally feasible with regard to efficiency (i.e., K~ < 50 for a 512 × 512 pixel image). Under this definition of convergence, significant observations included: 1) Tr(Sw) typically approached its converged value at 20 ~
(C.2)
where integer [z] denotes the integer value of z, Xl = aes2rbDi / ~/N, fl(xl) = Kt, and 3' is a constant.
The value of 3' provides the lower bound on K~ for high variance data. Results show that 3' = 30 produces robust segmentations with respect to the convergence criterion defined in this appendix. An operationally imposed upper bound is required as well. Thus, in the case of extremely low variance data, we apply a cutoff value of KI that is scaled from the size of the image; this is implemented as
f2(x2) = integer[x2~/~],
(C.3)
where integer [z] again denotes the integer value of z, x2 = 4 ~ is the image dimension, f2(x2)= KI, and K is a constant, x = 30 also provided efficient and robust segmentations for the K1's generated by Eq. (C.3). The final form of the initialization method is written
Kl = minimum[fl(Xl),f2(x2)].
(C.4)
This approximation thus results in an initialization method for which Tr(Sw) is minimized, subject to the constraint that Kl cannot exceed extremely large values. Thus, an image-specific, computationally efficient KI is automatically selected which optimizes the segmentation procedure [i.e., minimum Tr(Sw)].
C.2. The Split-and-Merge Clustering Thresholds Automatic determination of the split-and-merge thresholds in the clustering procedure (Sec. B.4) also was developed. Because the choice of these thresholds has also not been fully explored by previous investigators (e.g., Chen and Pavlidis, 1980; Seddon and Hunt, 1985; Cross et al. 1988), sensitivity of the PCTSMC algorithm to TSPLIT and TMEaGE was studied (See. 4.1.3). For each image tested, an array of thresholds was evaluated where each was scaled by the squared Euclidean distance between the maximum and minimum vectors in the data. Recall that the squared Euclidean distance between vectors can be expressed as the inner product of their difference [e.g., Eq. (B.3)]. Let the difference between the maximum and minimum vectors in the transformed differenced image Y be YM-m- [YMAX-- YMIN].
(C.5)
Thus, the squared Euclidean distance between
Automated Cloud Screening of AVHRR Imagery 119
the maximum and minimum vectors in Y, denoted by d, may be defined as
d = (YM-m,YM-m) = [YMAX-- YMIN]r[YMAX-- YMIN].
(C.6)
The split-and-merge thresholds for each image spanned the following range: from a value of 0.0001 × d to a value of 1.5 x d. Data from Image A15 (Fig. 8) provides a concrete example. The value of d for this image was 6593.295°C 2 (in the PC-differenced space). Thus, the thresholds ranged from 0.65933°C e to 9,889.9°C 2. Typical splitting and merging distances (defined in Sec. B.4) in the PC-differenced data were completely included in this range. For Image A15, withincluster splitting distances ranged from 1.44°C 2 to 72.63°C 2, while between-cluster merging distances ranged from 4.50°C z to 1,738.72°C z. Results show that performance varied only for those values of TSPLIT and TMERCE< 6.593°C 2. At larger values, the results stabilized to a single level (the fiat region in Fig. 8) despite the fact that the number of split-and-merge operations varied. The largest variations occurred for thresholds < 0.6593°C 2. Thus, performance was largely insensitive to the values chosen, with the largest variations occurring for thresholds on the order of the sensor temperature resolution (Sec. 4.1.3). Similar results were obtained for all of the images tested, and these results were independent of image size and variance. Furthermore, the level at which performance stabilized corresponded to a minimum in the percent within-cluster variance (Fig. 8d, Table 4a); thus, the split-and-merge clustering procedure converged to an optimal segmentation. These results motivated a simple method of automatic threshold determination. The single consideration was to avoid thresholds lower than a value of 0.01 x d. Therefore, both the split-andmerge thresholds are set as follows: (TsPLIT, TMERGE) = ")' × (YM- m, YM-m) = "Y× d,
NASA and ONR grants. Satellite data were provided by the Scripps Satellite Oceanography Center. All work was done in ]. J. S.'s Image Analysis Laboratory. Lance A1-Rawi, John Toman, and Darrin Atkinson assisted with computer programming. Guy Trapper and Nancy Hulbirt assisted with the figure preparation. T. C. G. was sponsored by the Oceanographer of the Navy as a graduate; J. J. S. was his thesis advisor. Part of this work was also supported by a grant front the Admiral Nimitz Foundation.
REFERENCES
Anderberg, M R, (1973), Cluster Analysis for Applications, Academic, New York. Backer, E. (1978), Cluster Analysis by Optimal Decomposition of Induced Fuzzy Sets, Delft University Press, Delft, The Netherlands. Ball, G. H., and Hall, D. J. (1965), A Novel Method of Data Analysis and Pattern Classification, Stanford Research Institute, Menlo Park, CA. Bernstein, R. L. (1982), Sea surface temperature estimation using the NOAA-6 Satellite advanced very high resolution radiometer, J. Geophys. Res. 87:9455-9465. Brice, C. R., and Fennema, C. L. (1970), Scene analysis using regions, Artif Intell. 1:205-222. Browning, K. A. (1986), Use of radar and satellite imagery for the measurement and short-term prediction of rainfall in the United Kingdom, in Remote Sensing Applications in Meteorology and Climatology (R. A. Vaughn, Ed.), NATO ASI Series, Vol. 201,480 pp. Cheevasuvit, F., Maitre, H., and Vidal-Madjar, D. (1986), A robust method for picture segmentation based on a split and merge procedure, Comput. Vis. Graph. Image Process. 34:268-281. Chen, P. C., and Pavlidis, T. (1980), Image segmentation as an estimation problem, Comput. Vis. Graph. Image Process. 12:153-172. Coakley, J. A,, Jr., and Bretherton, F. E (1982), Cloud cover from high resolution scanner data: detecting and allowing for partially filled fields of view, J. Geophys. Res. 87:49174932. Cross, A. M., Mason, D. C., and Drury, S. J. (1988), Segmentation of remotely sensed images by a split and merge process, Int. J. Remote Sens. 9:1329-1345.
(c7)
Dalu, G. (1985), Emittance effect on the remotely sensed sea surface temperature, Int. J. Remote Sens. 6:733-740.
where 3, = 0.5. This method ensures that the splitand-merge operations result in an optimal segmentation.
Devijver, P. A., and Kittler, J. (1982), Pattern Recognition: A Statistical Approach, Prentice-Hall, Englewood Cliffs, NJ.
This work was sponsored by the Marine Life Research Group (MLRG) of the Scripps Institution of Oceanography and by
Duggin, M. J. (1985), Factors limiting the discrimination and quantification of terrestrial features using remotely sensed radiance, Int. J. Remote Sens. 6:3-27. Eckstein, B. A., and Simpson, J. J, (1991a), Aerosol and
120 Gallaudet and Simpson
Rayleigh radiance contributions to coastal zone color zone scanner images, Int. J. Remote Sens. 1:135-168.
Jain, A. K., and Dubes, R. C. (1988), Algorithms for Clustering Data, Prentice-Hall, Englewood Cliffs, NJ.
Eckstein, B. A., and Simpson, J. J. (1991b), Cloud screening coastal zone color zone scanner images using Channel 5, Int. J. Remote Sens., forthcoming.
Jensen, J. R. (1986), Introductory Digital Image Processing, Prentice-Hall, Englewood Cliffs, NJ.
England, C. F., and Hunt, G. E. (1985), A bispectral method for the automatic determination of parameters for use in satellite cloud retrieval, Int. J. Remote Sens. 6:1545-1553. Foody, G. M. (1988), The effects of viewing geometry on image classification, Int. J. Remote Sens. 9:1909-1915. Freidman, H. P., and Rubin, J. (1967), On some invariant criteria for grouping data, Am. Stats. Assoc. J. 62:1159. Fukada, Y. (1980), Spatial clustering procedures for region analysis, Pattern Recognition 12:395-403. Fukunaga, K., and Koontz, W. L. G. (1970a), Application of the Karhunen-Loeve expansion to feature selection and ordering, IEEE Trans. Comput. 19:311-318. Fukunaga, K., and Koontz, W. L. G. (1970b), A criterion and algorithm for grouping data, IEEE Trans. Comput. 19:917-923. Gallaudet, T. C. (1991) The application of satellite imagery to the analysis of mesoscale sea surface temperature variability and near surface acoustic structure in the California Current system. M.S. thesis, Scripps Inst. of Oceanography, La Jolla, CA, J. J. Simpson, Thesis Advisor. Gillispie, A. R., Kahle, A. B., and Walker, R. E. (1986), Color enhancement of highly correlated images: I Decorrelation and HSI contrast stretch, Remote Sens. Environ. 20:209235. Hansen, J., Russell, G., Rind, D., et al. (1983), Efficient three-dimensional global climate models for climate studies: Models I and II, Mon. Weath. Rev. 11:609-622. Haralick, R. M., and Shapiro, L. (1985), Survey: image segmentation techniques, Comput. Vis. Graph. Image Process. 29:100-132. Harlow, C. A., and Eisenbeis, S. A. (1973), The analysis of radiographic images, IEEE Trans. Computer. 7:678-689. Henderson-Sellers, A. (1982), Defogging cloud determination algorithms, Nature 298:419-420.
Karlsson, K. G. (1990), The use of cloud and precipitation information, derived from multi-spectral analysis of AVHRR data, in nowcasting and for models, Preprints of the 5th Conf. on Satellite Meteorology and Oceanography, 3-7 Sep. 1990, London, AMS, Providence. Kettig, R. L., and Landgrebe, D. A. (1976), Classification of multispectral image data by extraction and classification of homogeneous objects, IEEE Trans. Geosci. Electron. SE-14:19-26. Liljas, E. (1987), Multispectral classification of cloud, fog, and haze, in Remote Sensing Applications in Meteorology and Climatology ( R. A. Vaughn, Ed.), NATO ASI Series, Vol. 201,480 pp. Lillesand, T. M., and Kiefer, R. W. (1987), Remote Sensing and Image Interpretation, 2nd ed., Wiley, New York. London, J. (1957), A study of atmospheric heat balance, Report Contract AF19(122)-165, College of Engineering, New York University, NTIS No. 117227. Mather, P. M. (1987), Computer Processing of RemotelySensed Images: An Introduction, Wiley, New York. Matveev, L. T. (1984), Computer Processing of Remotely Sensed Images: An Introduction, Wiley, New York. Meleshlo V. P., and Wetherland, R. T. (1981), The effects of a geographical cloud distribution on climate: a numerical experiment with an atmospheric general circulation model, J. Geophys. Res. 86:11,995-12,014. Muerle, J. L., and Allen, D. C. (1968), Experimental evaluation of techniques for automatic segmentation of objects in a complex scene, in Pictorial Pattern Recognition (G. Cheng et al., Eds.), Thompson, Washington, DC, pp. 3-13. Nelson, C. S., and Husby, D. M. (1983), Climatology and surface heat fluxes over the California Current region, NOAA Technical Rep. NMFS SSRF-763, U.S. Dept. of Commerce, Washington, DC.
Hickey, B. M. (1979), The California Current-hypothesis and facts, Prog. Oceanog. 8:191-279.
Ohlander, R., Price, K., and Reddy, D. R. (1978), Picture segmentation using a recursive region splitting method, Comput. Graph. Pattern Recognition 8:313-333.
Horowitz, S. L., and Pavlidis, T. (1974), Picture segmentation by a directed split-and-merge procedure, Proc. 2nd Int. Joint Conf. Pattern Recognition, pp. 424-433.
Ohring, G., and Clapp, P. F. (1980), The effects of changes in cloud amount on the net radiation at the top of the atmosphere, J. Atmos. Sci. 37:447-454.
Horowitz, S. L., and Pavlidis, T. (1976), Picture segmentation by a tree traversal algorithm, J. Assoc. Comput. Mach. 23: 368-388.
Pailleux, E S. (1986), The impact of satellite data on global numerical weather prediction, in Remote Sensing Applications in Meteorology and Climatology (R. A. Vaughn, Ed.), NATO ASI cries, Vol. 201,480 pp.
Hughes, N. A., and Henderson-Sellers, A. (1983), A preliminary global oceanic cloud climatology from satellite albedo observations, J. Geophys. Res. 88:1475-1483. Ingerbritsen, S. E., and Lyon, J. P. (1985), Principal component analysis of multitemporal image pairs, Int. J. Remote Sens. 6:687-696.
Pairman, D., and Kittler, J. (1986), Clustering algorithms for use with images of clouds, Int J. Remote Sens. 7:855-866. Pavlidis, T. (1972), Segmentation of pictures and maps through functional approximation, Comput. Graph. Image Process. 1:360-372.
Automated Cloud Screening of AVHRR Imagery
121
Pavlidis, T. (1973), Waveform segmentation through functional approximation, IEEE Trans. Comput. C-22 (Jul.) 237-346.
Seddon, A. M., and Hunt, G. E. (1985), Segmentation of clouds using cluster analysis, Int. J. Remote Sens. 6:717731.
Pavlidis, T. (1976), The use of algorithms of piecewise approximations for picture processing applications, ACM Trans. Math. Software 2:305-321.
Shenk, W. E., and Salomonson, V. V. (1972), A simulation study exploring the effects of sensor spatial resolution on estimates of cloud cover from satellites. J. Appl. Meteorol. 12:214-220.
Pavlidis, T. (1977), Structural Pattern Recognition, SpringerVerlag, Berlin. Preisendorfer, R. W. (1988), Principal Component Analysis in Meteorology and Oceanography, Elsevier, Amsterdam.
Simpson, J. J. (1990), On the accurate detection and enhancement of features observed in satellite data, Remote Sensing Environ. 33:17-33.
Ramanathan, V. (1987), The role of earth radiation budget studies in climate and global circulation research, J. Geophys. Res. 92:4075-4095.
Simpson, J. J. (1991), Image masking using polygon fill and morphological operations, Remote Sensing Environ. forthcoming.
Richards, John A. (1986), Remote Sensing Digital Image Analysis: An Introduction, Springer-Verlag, New York.
Simpson, J. J., and Humphrey, C. (1990), An automated cloud screening algorithm for daytime AVHRR imagery, J. Geophys. Res. 95:13459-13481.
Sabins, F., Jr. (1987), Remote Sensing: Principles and Interpretation. 2nd ed., Freeman, New York. Saunders, R. W., and Kriebel, K. T. (1988a), An improved method for detecting clear sky and cloudy radiances from AVHRR data, Int. J. Remote Sens. 9:123-150. Saunders, R. W., and Kriebel, K. T. (1988b), Errata (re: An improved method for detecting clear sky and cloudy radiances from AVHRR data, Int. J. Remote Sens. 9:123150), Int. J. Remote Sens. 9:1393-1394. Schowengerdt, R. A. (1983), Techniques for Image Processing and Classification in Remote Sensing, Academic, New York.
Singh, A., and Harrison, A. (1985), Standardized principal components, Int. J. Remote Sens. 6:883-896. U.S. Navy (1981), U.S. Navy Marine Climatic Atlas of the World, Vol. IX. World-Wide Means and Standard Deviations, Naval Oceanogr. Command Detachment, Asheville, NC, May. Wacker, A, G. (1972), The minimum distance approach to classification, Ph.D. thesis, Purdue University.