Defining Anatomical Structures from Medical Images
Edward L. Chaney and Stephen M. Pizer
From the Departments of Radiation Oncology and Computer Science, University of North Carolina, Chapel Hill, NC. Supported in part by National Institutes of Health grant no. P01 CA47982, National Cancer Institute contract no. N01-CM-97565, and by a sabbatical grant from the organization Nederlands Wetenschappelijk Onderzoek of the Dutch government. Address reprint requests to Edward L. Chaney, PhD, Radiation Oncology Department, CB 7515, University of North Carolina, Chapel Hill, NC 27599-7512.
Seminars in Radiation Oncology, Vol 2, No 4 (October), 1992: pp 215-225
A number of important decisions in three-dimensional (3D) treatment planning[1] and verification rely on information obtained from images. Much of this information is obtained through localization[2] and quantitation[3] of volumes of interest, 3D display,[4] image registration and correlation,[5,6] and image enhancement.[7] Volume localization and quantitation are necessary for incorporating inhomogeneity corrections into dose calculations,[8] computing dose-volume histograms,[9] quantitating tumor response to therapy,[3] and designing field shapes. Three-dimensional display is important because complete understanding of a treatment plan requires comprehension of the spatial relationships among anatomical structures, radiation beams, and dose distributions. Registration and correlation of images from different modalities, such as computed tomography (CT) and magnetic resonance (MR) images, can improve the accuracy of localizing the tumor and other tissues of interest, and registration of images taken at different times using the same modality is important for monitoring tumor response or progression. Registration of digitized simulation images with electronic portal images may be useful for verifying patient set-up before each treatment fraction.[10,11] Image enhancement improves the efficiency of interpretation[12] and even the accuracy of interpretation, as shown by one observer study using port films.[13] Clearly, computers are required to handle and extract information from images as described previously. However, the images must first be preprocessed to define anatomical structures of special interest because, unlike a trained human observer, treatment-planning computers cannot recognize
anatomy. To a computer, a medical image is nothing more than an array of numbers that give the spatial coordinates and "intensity" of each voxel (volume element) in the image. To handle images in ways necessary for treatment planning and verification, the computer representation of each voxel must include a label that identifies the anatomical structure or the set of structures to which it belongs. (As will be discussed, a voxel can be assigned to more than one structure, with a fractional weight associated with each assignment.) Once structures have been defined in this way, volume determination of an organ, tumor, or target is in principle as simple as counting the appropriately labeled voxels, or integrating the assigned weights in the case of multiple assignment, and multiplying that number by the volume of each voxel. Likewise, in an interactive 3D display,[14] anatomical structures can be manipulated individually, eg, turned on or off, selectively colored or shaded, or moved, when the computer knows which voxels are affected by a particular user command. Image registration based on anatomical landmarks and anatomy-specific enhancement may also require the computer to recognize the anatomical identities of voxels. Three-dimensional treatment planning depends heavily on volume displays and calculations based on volumes to convey information to the radiation oncologist, physicist, and dosimetrist. The accuracy and reproducibility[2,15-17] of the methods for creating these volumes are fundamental limitations of current treatment-planning systems. Slice-by-slice manual contouring, which is extremely labor-intensive, and automatic edge detection, which has a high failure rate and requires human intervention, are representative of the current standard of practice. These methods create hard edges that often do not really exist. Even when edges do exist, their location cannot be determined from medical images with the high degree of accuracy and precision implied by a pixel-thin contour. Hard-surface volumes computed from edges yield 3D displays and volume calculations that are misleading because of the implied certainty of the surface location. The implied certainty is even more misleading when user variability and differences
between user- and computer-guided approaches are considered, raising questions about the reliability of critical decisions and calculations based on hard-surface volumes. More efficient and reproducible approaches that take into account the uncertainty of edges need to be investigated. Terminology, concepts, and fundamental approaches associated with defining objects from images, with emphasis on their relevance to treatment planning and verification, are discussed in the remainder of this article. Basic issues related to developing better methods are presented, followed by descriptions of existing and evolving techniques. The objective is to inform the reader at a level that does not require complex mathematical details.
Object Definition and Related Concepts
Object Definition
Labeling voxels that define an anatomical structure of special interest is one step in a more general task, object definition, commonly associated with computer vision. Object definition can be separated into three distinct steps. These steps are not always obvious during treatment planning because the details are usually concealed by the computer programs used for defining anatomical structures. However, the distinctions between these steps are important for an understanding of computer-aided methods. The first step is segmentation, identifying groups or clusters of voxels that comprise regions of whole objects. Drawing contours around a kidney, for example, on a slice-by-slice basis is one way of manually creating regions of the whole kidney. The next step is to group all regions that belong to the same object, such as grouping all the contours of a kidney. The third step is to tag all the voxels within the grouped regions with the correct object label. With current technology, one or both of the first two steps requires decisions by a trained user. The third step, voxel labeling, is automatically completed by the computer at no expense to the user after the name of the object is supplied. Two general methods for segmentation are the user-guided, interactive approach and automatic segmentation. In the user-guided approach, regions are defined with the aid of an interactive program, which typically combines slice-by-slice semiautomatic and manual contouring.[18] In automatic segmentation, regions are defined without human intervention. Some methods create links between regions, or
region hierarchies,[19] that can be exploited when regions are grouped to define a complete object. The efficiency of user-guided 3D segmentation depends a great deal on the design of the user interface, but can be extremely labor-intensive. For contouring methods, it is not unusual to spend up to several hours on segmentation for a plan involving 50 or more slices. In this approach, grouping regions that belong together is straightforward because the user can pre-assign contours to a specific object. If segmentation is accomplished using computer vision or another automatic approach, the resulting regions are not pre-assigned and need to be grouped together. In principle, this step can be performed by a computer that understands anatomical shapes and their variants, an approach currently under investigation[20] but beyond the state of the art. At present, the regions must be interactively grouped.[21] As with user-guided segmentation, the net expense in terms of personnel time is highly dependent on the design of the user interface. Even so, automatic segmentation followed by interactive grouping can offer a significant time savings compared with interactive segmentation and pre-assigned grouping. The segmentation pass of an object definition method usually produces "object-related" regions, ie, close approximations of the desired regions. Editing is often required to produce the desired shapes. Thus object definition programs usually incorporate interactive editing functions that allow the user to modify regions or relationships between regions. As computer vision techniques improve, the need for editing functions will decrease and perhaps disappear. The following discussion assumes the use of CT scans for object definition, the current standard practice. Special considerations involving other modalities are mentioned when important.
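The three steps outlined above can be pictured with a small bookkeeping sketch. The fragment below is a minimal illustration only, assuming NumPy arrays; the array shapes, function name, and kidney example are hypothetical and not part of any planning system discussed here. Segmentation output is represented as per-slice boolean regions, grouping as the list of regions assigned to one object, and labeling as writing the object's identifier into a voxel array.

```python
import numpy as np

# Step 1 (segmentation): candidate regions, here one boolean mask per slice.
# Step 2 (grouping): the regions judged to belong to the same anatomical object.
# Step 3 (labeling): tag every voxel covered by the grouped regions with the object id.

def label_object(label_volume, grouped_regions, object_id):
    """Write one integer label into all voxels of the grouped regions."""
    for slice_index, mask in grouped_regions:   # each entry: (slice index, 2D boolean mask)
        label_volume[slice_index][mask] = object_id
    return label_volume

labels = np.zeros((4, 16, 16), dtype=np.uint8)                 # one label per voxel, 0 = unassigned
region_a = np.zeros((16, 16), dtype=bool); region_a[4:9, 4:9] = True
region_b = np.zeros((16, 16), dtype=bool); region_b[5:10, 5:10] = True
kidney_regions = [(1, region_a), (2, region_b)]                # step 2: grouped by the user
labels = label_object(labels, kidney_regions, object_id=3)     # step 3: automatic labeling
```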
Objects Used in Treatment Planning
Two general classes of objects are commonly defined from images for treatment-planning purposes. One class includes normal anatomical structures, which are more or less clearly discernible to the trained observer. The other class includes objects that are conceptual or otherwise not clearly distinguishable in the image, such as tumor and target volumes. Unlike most normal anatomical structures, tumors often have irregular shapes and infiltrate into adjacent tissues, resulting in poorly defined boundary regions and poor image contrast. Also, it is sometimes difficult to distinguish surrounding edema or infected normal tissue from tumor. Contrast agents
that preferentially concentrate in some tissues, including tumors, can mitigate unfavorable subject contrast. However, this is not always true, especially when factors affecting the relative concentration of contrast material vary, such as microvasculature and cell state, or when microscopic extension is present. Defining objects in this class usually relies more heavily on human judgment, reducing or eliminating the ability to rely on automatic or semi-automatic object definition. An exception would be a target volume that is defined by protocol in such a way that it can be generated by a computer from relevant information such as tumor pathology, location, grade, size, or distant involvement.[22]
Two- Versus Three-Dimensional Object Definition
The use of the term "voxel" in the preceding discussion implies that object definition is accomplished using a 3D approach. In fact, in present practice this is usually not the case. Typically, regions are created on a slice-by-slice basis. However, the results are often presented in 3D formats, eg, a display or volume calculation, obscuring the 2D nature of the definition method. Slice-by-slice segmentation is well suited for user-guided definition. Furthermore, it can be argued that user-guided definition is effectively 3D because trained observers take into account information on nearby slices that affects decisions being made on the current working slice. However, this argument is valid only to a certain extent. Subtle details can go unnoticed when an image is viewed in slice format alone compared with viewing it as slices together with a high-quality interactive 3D rendering of the image data. Moreover, hand-drawn object contours that are produced slice by slice frequently fail to correspond smoothly between slices. The distinction between 2D and 3D approaches is more significant for automatic segmentation methods. Most present implementations analyze an image slice by slice. Main memory storage requirements, computing speed, and programming convenience are major factors favoring this approach. Standard image formats represent a CT study as a set of 2D arrays of numbers. It is simpler from a programming point of view, the computations are faster, and the storage in main memory at any time can be more limited when analysis is confined to one 2D array, ie, one slice, at a time. However, unlike user-guided segmentation, information on other slices is not considered even though this information is just
as important for determining the assignment of a particular pixel (picture element) or voxel. In defense of 2D approaches, only a few segmentation methods that consider the full image data set for assignment of a voxel to a region have yet been reported to be clinically practical.[23] However, more 3D implementations of computer-vision methods are approaching clinical practicality, and recent results indicate that with them objects can be defined within clinically required accuracy, with less human interaction compared with the current practice of slice-by-slice analysis.[24] This is primarily because of the following: (1) voxels rather than pixels are grouped to create regions, reducing the number of regions created and the number of operations required by human interaction to define a complete object; and (2) methods that generate hierarchical relationships among regions simplify user interaction. These relationships create links between regions, and groups of regions, based on common geometric properties in such a way that meaningful objects, such as anatomical structures, can be easily created by activating and editing links in the hierarchy, as illustrated later. In addition, 3D methods may reduce the need for region editing because incorporation of 3D information in segmentation decisions produces regions that are more likely to be well-defined pieces of a single object, rather than regions that contain fragments from more than one object. To maintain consistent terminology, the term "voxel" is used liberally throughout the remainder of this section even when the operations being discussed may be pixel-oriented.
Context-Free and Context-Related Segmentation
When a voxel is assigned to a region during segmentation, the assignment can be based on information in the immediate vicinity of the voxel, or on information gathered from a broad surrounding volume, which could include the entire image data set. The former method is called local or context-free, because the general context of the voxel is ignored. The latter approach is termed "context-related." These classifications should not be construed as distinctions between 2D and 3D methods: 2D approaches can be context-related, and 3D approaches can be context-free. The most general approaches are both 3D and context-related. User-guided segmentation is to some extent context-related. For example, edge tracking is context-related when used manually by a trained observer,
but is context-free as commonly implemented on a computer. An observer can take context into account in determining the presence and location of a poorly imaged edge even when gaps occur, for instance. Automatic tracking based on local measurements of edge strength is prone to failure in noisy images, when tracking weak or ambiguous edges, and when gaps occur.
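The practical difference between slice-by-slice and fully 3D analysis can be seen in a toy connectivity example. The sketch below is illustrative only, assuming NumPy and SciPy; the array and threshold are invented. Connected regions are labeled first slice by slice and then volumetrically; the same simple object yields one region per slice in 2D but a single region in 3D, so fewer grouping operations are needed afterward.

```python
import numpy as np
from scipy import ndimage

binary = np.zeros((6, 32, 32), dtype=bool)       # thresholded volume: (slices, rows, columns)
binary[1:5, 10:20, 10:20] = True                 # one simple "organ" spanning four slices

# 2D, slice-by-slice: each slice is analyzed in isolation, so one organ yields many regions.
regions_2d = sum(ndimage.label(binary[z])[1] for z in range(binary.shape[0]))

# 3D: voxels are connected across slices as well, so the same organ yields one region.
structure_3d = ndimage.generate_binary_structure(rank=3, connectivity=1)
_, regions_3d = ndimage.label(binary, structure=structure_3d)

print(regions_2d, "regions from 2D analysis;", regions_3d, "region from 3D analysis")
```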
Binary Versus Weighted or Fuzzy Assignment
There are two general methods for assigning a voxel to a region during segmentation. Binary assignment is unambiguous; a voxel is definitely included in or excluded from the region being created. However, the region to which a voxel should be assigned is often ambiguous. Some object definition methods, including those most commonly used in treatment planning, eg, contouring, are inherently binary in nature and unable to deal with ambiguity, except to the extent that a contour can cross through and thus divide a voxel. Another approach, most often associated with automatic segmentation methods, allows assignment of a voxel to multiple regions.[25] In this case, a weighting factor between 0 and 1 is associated with each region to which the voxel is assigned. The weighting factor can be thought of as the probability that the voxel belongs to the region associated with the weight. Thus the sum of all the weighting factors is 1, representing a probability of 100% that the voxel belongs to some region. Voxels entirely within parenchyma usually have one weighting factor close to unity. Voxels in boundary regions and regions with poor contrast are likely to carry weighting factors in the mid-range because there is greater uncertainty as to which region the voxel should be assigned. For example, a voxel in the boundary region between the kidney and the adrenal gland may carry a weighting factor of 0.7 for assignment to a region that ultimately will be labeled as kidney, and a factor of 0.3 for assignment to another region that will ultimately be labeled as adrenal gland. Fuzzy assignment offers several advantages over binary assignment. The weighting factors computed for a voxel can be in error by a fraction, as opposed to 1 or 0 as with binary labeling. This can be important when using a display for editing. All voxels in regions created by a binary method are treated the same, and a display cannot show ambiguity that conveys useful information, such as subtle features or details like tumor invasion or extension. Regions created by
weighted assignment can be volume-rendered to produce displays that make these details easily discernible. In addition, the observer can make a change in how the (local) voxel values affect the display and see the effect of the change on the whole region or image instead of just the voxels in question. For example, suppose that it is ambiguous whether or not a few voxels belong to a particular organ. Using binary object definition, a "yes" or "no" decision is made without knowledge of possible consequences affecting the rest of the image. By changing the segmentation parameters that affect voxel weighting factors to make the questionable voxels look more "organ-like," the observer can see the effect on related parts of the image (for example, whether the organ's surface conformation as a whole is closer to normal anatomy) and have some guidance as to whether the voxels belong to the structure in question. Partial volume effects are a major cause of segmentation ambiguity in boundary regions and in regions where the scale of anatomical structures is smaller than the resolution of the imaging device. In these cases, weighted assignment can be important in volume computations. In boundary regions, weighting factors can be interpreted as fractional contents. For instance, in the above example involving the kidney and adrenal gland, the weighting factors can be interpreted to mean that the voxel contains 70% kidney and 30% adrenal gland. When computing the total kidney volume, the volume of each voxel carrying a kidney label can be multiplied by the appropriate weighting factor and the results summed. Partial volume effects are always important in boundary voxels if the voxel size is well-matched to the imaging resolution. Thus, where the boundary voxels are a significant fraction of the total volume, the method of summing fractional weights has been shown in limited tests to produce more accurate volume measurements compared with summing voxels labeled with binary segmentation.[25]
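As a rough illustration of the volume bookkeeping described above, the sketch below (NumPy assumed; the voxel dimensions, labels, and weights are invented) computes a structure volume both by counting binary labels and by summing fractional weights, so that boundary voxels contribute only their estimated fractional content.

```python
import numpy as np

voxel_volume_mm3 = 0.94 * 0.94 * 3.0          # assumed in-plane pixel size and slice thickness

# Binary assignment: count labeled voxels and multiply by the voxel volume.
labels = np.array([1, 1, 1, 0, 0])            # 1 = kidney, 0 = something else
binary_volume = np.count_nonzero(labels == 1) * voxel_volume_mm3

# Fuzzy assignment: per-voxel weights for two adjacent structures sum to 1.
kidney_w = np.array([1.0, 1.0, 0.7, 0.3, 0.0])
adrenal_w = 1.0 - kidney_w
kidney_volume = kidney_w.sum() * voxel_volume_mm3    # boundary voxels count fractionally
adrenal_volume = adrenal_w.sum() * voxel_volume_mm3
```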
Segmentation Methods
Most segmentation methods are designed to work well in specific circumstances that may differ from method to method. In addition, some methods are more suitable for editing or refining regions generated by another approach. Thus, segmentation methods tend to complement one another, and most programs incorporate more than one. Only a few methods have been developed and
refined well enough to stand up to the expectations and impatience associated with routine clinical use. The most demanding expectations are speed, accuracy, absence of bugs, and minimal human involvement. Features of computer programs meeting these expectations include a well-designed user interface; simplicity of use; fast computation when the user must wait for a response before continuing; predictable performance, allowing the user to recognize circumstances where a method is likely to succeed or fail; graceful behavior when a method fails, eg, avoidance of program crashes and data losses, and production of informative error messages; handling of partial results from failed attempts, eg, saving them for review or editing or perhaps as a starting point for another method; editing methods that are well-matched with complementary primary definition methods; and thorough documentation. Clinically useful methods will be discussed along with methods that show promise for improving reliability, accuracy, reproducibility, and automation.
Manual and Semi-Automatic Contouring
The traditional method of segmentation for treatment planning is a combination of manual and automatic edge tracing.[4,18] Most algorithms for automatic contouring rely on a measure of gradient strength to detect an edge.[26] This approach works well in high-contrast regions, and contours of the skin and some bony structures can be automatically generated with minimal errors. Manual contouring is most often used for outlining objects with weak or no edges, such as tumor and target volumes, and for editing automatically generated contours. Another method for edge detection is called "active contours."[27] The method begins from an approximate contour and iteratively adjusts it to minimize an "energy" function that captures three properties, each producing a term in the function. One term responds inversely with the average "edginess" of the image intensities along the contour, eg, the average intensity gradient magnitude. Another term decreases as geometric properties of the contour become more desirable, eg, as it becomes smoother. The third term decreases as the contour conforms to user guidance, eg, as it moves away from a repelling location that the user has interactively placed to force the active contour to jump over low- or high-intensity spots in the image that might otherwise block computer-guided changes in the contour. The success of active contours depends on how well the energy function components capture the desirable boundary properties and on the starting contour. Among the ways this contour can be produced are interactive drawing, or interactive specification of a point in the object followed by automatic selection of locations of high intensity gradient magnitude along many radii in a starburst centered at the point, with the gradient commonly measured on a smoothed version of the image.
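To make the three energy terms concrete, the sketch below evaluates a discrete energy for a closed contour. It is illustrative only, assuming NumPy and SciPy; the weights and the exact form of each term are invented choices rather than the formulation of reference 27. Geometric terms favor short, smooth contours, an image term rewards high average gradient magnitude along the contour, and an optional term penalizes proximity to a user-placed repelling point. An optimizer would iteratively perturb the contour points to reduce this energy.

```python
import numpy as np
from scipy import ndimage

def snake_energy(contour, image, repel_point=None,
                 alpha=0.1, beta=0.1, gamma=1.0, delta=5.0):
    """Discrete 'energy' of a closed contour; lower is better.
    contour: (N, 2) array of (row, col) points, image: 2D array."""
    fwd = np.roll(contour, -1, axis=0)
    back = np.roll(contour, 1, axis=0)

    # Geometric terms: short, smooth contours are preferred.
    elasticity = np.sum((fwd - contour) ** 2)
    bending = np.sum((fwd - 2 * contour + back) ** 2)

    # Image term: reward high average gradient magnitude ("edginess") along the contour.
    grad_mag = ndimage.gaussian_gradient_magnitude(image.astype(float), sigma=2.0)
    edginess = np.mean(ndimage.map_coordinates(grad_mag, contour.T, order=1))

    energy = alpha * elasticity + beta * bending - gamma * edginess

    # Optional user-guidance term: a repelling point pushes the contour away from a spot.
    if repel_point is not None:
        dist = np.sqrt(np.sum((contour - np.asarray(repel_point)) ** 2, axis=1)) + 1e-6
        energy += delta * np.sum(1.0 / dist)
    return energy
```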
Interactive Pixel Selection
The most basic method for segmentation involves explicit slice-by-slice, user-guided pixel selection. Selection of a single pixel at a time is impractical for routine clinical application. Pixel painting is a version that may sometimes be practical as a primary segmentation method, but more frequently is useful for detailed editing of segments defined by some other method. In this variation, the user defines a region by moving a cursor around the desired region. Pixels touched by the cursor are labeled as part of the region. In the editing mode, pixels can be selected and deselected in the same way. The size and shape of the cursor can be customized to match the geometry and scale of the region being defined or edited.
Interactive Thresholding
Segments can sometimes be created quickly by user-guided operations that yield close approximations to the desired regions. A simple example is setting a CT window at the upper and lower bounds of the region of interest. The display is most effective when the pixels inside and outside the window are tinted in different hues.[18] If the desired region has a sharp edge, the window can usually be adjusted to clearly highlight the pixels of interest. When the user is satisfied with the window setting, a cursor can be used to select one or more patches of pixels as being part of the desired region. Sometimes the patches are flawed in one or more ways that require editing, eg, (1) they contain small islands of pixels that are not highlighted but should be included in the desired region; (2) they have "noisy" edges with many tiny bumps and valleys that are clinically insignificant; or (3) a desired patch is joined to an undesired patch by an isthmus that cannot be eliminated by windowing. Unhighlighted islands can be painted. In the second case, the selection step can invoke a scale-sensitive filter that removes unwanted edge structure corresponding to the scale of the filter; the filter can also prevent the union of an unwanted patch with the desired patch.[30] (See example from MASK,[22] Fig 1.)
Figure 1. Example from MASK of interactive thresholding with scale-sensitive filtering. (A) Window set for defining a section of the kidney (red pixels). (B) Region selected using a small-scale filter (green pixels). Many undesired pixels outside the kidney are also selected. (C) Region selected with a larger-scale filter. The selected region is closer to the desired shape. (D) Desired shape obtained by using two different filters in succession. (See color plate on p 293.)
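A minimal sketch of this windowing-and-cleanup idea is given below, assuming NumPy and SciPy; standard morphological opening stands in for MASK's scale-sensitive filter, and the function and parameter names are invented. Pixels inside the CT window are highlighted, structure smaller than the chosen radius is removed, and only the connected patch containing a user-selected seed pixel is kept.

```python
import numpy as np
from scipy import ndimage

def window_and_clean(ct_slice, low_hu, high_hu, seed_rc, opening_radius=2):
    """Highlight pixels inside a CT window, suppress small-scale structure, and keep
    only the connected patch containing the seed pixel (assumed to lie in a patch)."""
    inside = (ct_slice >= low_hu) & (ct_slice <= high_hu)

    # Scale-sensitive cleanup: opening with a disc removes islands, bumps, and thin
    # isthmuses smaller than the chosen radius.
    r = opening_radius
    yy, xx = np.ogrid[-r:r + 1, -r:r + 1]
    disc = (xx ** 2 + yy ** 2) <= r ** 2
    cleaned = ndimage.binary_opening(inside, structure=disc)

    # Keep only the connected patch that contains the user-selected seed pixel.
    labeled, _ = ndimage.label(cleaned)
    return labeled == labeled[seed_rc]
```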
Slice-to-Slice Propagation
Propagation of a segment from one slice to the next can be accomplished by duplicating on neighboring slices the operations used to define the object on an initial slice, eg, interactive thresholding or autocontouring (Fig 2). For user-guided definition, this strategy can be powerful if implemented so that user operations on one slice are automatically executed on a set of previously selected slices. On a modern workstation, the operations can be executed so quickly that they appear to be simultaneous. Alternatively, a region can be transferred directly to an adjacent slice and used as the starting point for an automatic search, or edited by the user to create the desired region.
Figure 2. Example from MASK of slice-to-slice propagation. (A) Sagittal view windowed for defining the cervical spinal canal (green pixels). The white line down the canal indicates the path of a cursor used to select slices for defining the canal and to mark the patches on each slice. A small-scale filter was used to edit the patches. The vertical green lines indicate slices on which the operation failed. (B) A slice on which the operation failed because of a large gap in the bone surrounding the canal. Such failures can be edited without affecting other slices. (C) Typical slice on which the canal was successfully defined. (See color plate on p 293.)
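One way to automate the propagation step is sketched below, assuming NumPy and SciPy; the windowing operation and overlap rule are invented illustrations, not MASK's implementation. The same thresholding operation is repeated on successive slices, and on each new slice the patch that best overlaps the previous slice's region is kept; slices on which no overlapping patch is found are left for the user to edit.

```python
import numpy as np
from scipy import ndimage

def propagate(volume, start_slice, low_hu, high_hu, n_slices):
    """Repeat one slice's windowing operation on its neighbors, keeping on each new
    slice the patch that overlaps the previous slice's region the most."""
    regions, prev_mask = {}, None
    for z in range(start_slice, start_slice + n_slices):
        inside = (volume[z] >= low_hu) & (volume[z] <= high_hu)
        labeled, n = ndimage.label(inside)
        if n == 0:
            break                                   # failed slice: stop and edit by hand
        if prev_mask is None:
            sizes = ndimage.sum(inside, labeled, index=np.arange(1, n + 1))
            keep = 1 + int(np.argmax(sizes))        # first slice: keep the largest patch
        else:
            overlap = ndimage.sum(prev_mask, labeled, index=np.arange(1, n + 1))
            if overlap.max() == 0:
                break                               # no patch continues the region
            keep = 1 + int(np.argmax(overlap))
        prev_mask = labeled == keep
        regions[z] = prev_mask
    return regions
```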
Interpolation
Regions on intermediate slices can be interpolated from regions defined on two end slices. When the results are satisfactory, as is often the case for the spinal cord, interpolation is frequently used in combination with manual contouring. If contours are drawn on every other slice, for example, user interaction time can be reduced by almost half.
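One common way to interpolate a region between two contoured end slices is sketched below, assuming NumPy and SciPy; this shape-based approach is a generic illustration rather than the method of any particular planning system. Signed distance maps of the two end regions are blended and the result is thresholded to produce the intermediate region.

```python
import numpy as np
from scipy import ndimage

def signed_distance(mask):
    """Positive inside the region, negative outside (distances in pixels)."""
    mask = np.asarray(mask, dtype=bool)
    return ndimage.distance_transform_edt(mask) - ndimage.distance_transform_edt(~mask)

def interpolate_region(mask_a, mask_b, fraction):
    """Estimated region on an intermediate slice, 'fraction' of the way from A to B."""
    d = (1.0 - fraction) * signed_distance(mask_a) + fraction * signed_distance(mask_b)
    return d > 0.0
```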
Bayesian Pixel Classification
Pixels can be assigned based on a priori knowledge of the distribution of gray levels normally associated with the object being defined.[28] This method requires the computer to contain knowledge of the expected intensities for various organs, and it works best in restricted regions of interest that include the object being defined but little of the surrounding neighborhood. For a large region of interest, this is a poor approach if surrounding context is not taken into account. For example, in defining a kidney within a region of interest much larger than the kidney, the computer is likely to include nonkidney pixels in kidney segments. Context-sensitive methods under investigation take into account both spatial information and the intensity distribution in the neighborhood of the segment.
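A toy version of such a classifier is sketched below, assuming NumPy; the organs, gray-level means, standard deviations, and priors are invented numbers standing in for the a priori knowledge the method requires. Each pixel is assigned to the class with the highest posterior probability under a Gaussian gray-level model.

```python
import numpy as np

# Assumed a priori gray-level models (mean, sd in HU) and prior probabilities for a
# restricted region of interest; the numbers are illustrative only.
classes = {
    "kidney": (30.0, 12.0, 0.5),
    "liver":  (55.0, 10.0, 0.3),
    "fat":    (-90.0, 15.0, 0.2),
}

def classify(pixels_hu):
    """Assign each pixel to the class with the highest posterior probability."""
    pixels_hu = np.asarray(pixels_hu, dtype=float)
    log_post = [
        -0.5 * ((pixels_hu - mean) / sd) ** 2 - np.log(sd) + np.log(prior)
        for mean, sd, prior in classes.values()
    ]
    names = list(classes)
    return [names[i] for i in np.argmax(log_post, axis=0)]

print(classify([32, 58, -80]))      # expected: kidney, liver, fat
```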
Statistical Pattern Recognition
Statistical pattern recognition (SPR) creates "feature vectors" from multiparameter data associated with each voxel.[29,30] SPR requires a succession of spatially registered images, each formed with different image acquisition parameters. The feature vector consists of the collection of "intensity" values associated with corresponding voxels in each image. Alternatively, features can be produced from a single
intensity image by convolving the image separately with each of a number of derivatives of Gaussians, or combinations of such derivatives, for various derivatives and various widths of Gaussian. These features capture geometry, such as object orientation, object centrality, object edginess, and object "endness," at the spatial scale of the Gaussian in question. In this case, the feature vector at a voxel consists of the collection of geometric measurements centered at that voxel.[31] When the statistical probability distribution of the feature vector is studied over a sample of voxels known to be part of a particular tissue type, other vectors falling within the probability distribution can be automatically identified. For example, a user can interactively "train" the computer to select a set of voxels that approximately define sections of the liver by selecting a small region of interest known to be within the liver. The SPR program will perform the statistical analysis on feature vectors associated with voxels within the region of interest, and then label all voxels with vectors that exhibit the same statistical pattern. Automatic methods for training have also been investigated.[23] The success of SPR can be improved if the images are preprocessed in a way that causes the probability distribution of feature vectors for each object to become narrower. One preprocessing step shown to be effective is variable conductance diffusion.[32] In its basic form, SPR is essentially a binary classification decision made at the voxel level and thus is sensitive to noise and subject to failure in regions where volume averaging occurs, such as boundaries, unless special measures are taken.[33] In a more sophisticated approach, Windham et al[25] have developed a method that allows fuzzy assignment of voxels. Their approach involves user supervision to identify a training set of voxels as being entirely within the object of interest (the "desired" tissue type), and other collections of voxels as being "undesired" tissue types. By comparing the average feature vectors for all of the tissue types identified, their method produces a transformation that optimizes the signal-to-noise ratio of a fuzzy classification that describes the likelihood of a voxel being in the desired object (Fig 3). SPR is subject to failure when image acquisition parameters are inhomogeneous across the imaging field of view, causing significant differences in feature vectors from one location to another for the same tissue type. Examples include beam hardening in CT and variations in the MRI magnetic field across the image.
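The training-and-classification cycle described above can be sketched as follows, assuming NumPy; the use of a mean vector, covariance matrix, and Mahalanobis-distance threshold is a generic illustration of SPR, not the eigenfilter method of Windham et al. Feature vectors inside a user-selected training region define a statistical pattern, and every voxel whose vector falls close to that pattern is labeled.

```python
import numpy as np

def train(feature_images, training_mask):
    """Statistics of the feature vectors inside a user-selected training region.
    feature_images: list of co-registered images, one per acquisition parameter."""
    samples = np.stack([img[training_mask] for img in feature_images], axis=1)
    return samples.mean(axis=0), np.cov(samples, rowvar=False)

def classify(feature_images, mean, cov, threshold=3.0):
    """Label every voxel whose feature vector matches the training pattern."""
    feats = np.stack([img.ravel() for img in feature_images], axis=1) - mean
    inv_cov = np.linalg.pinv(np.atleast_2d(cov))
    mahalanobis = np.sqrt(np.einsum("ij,jk,ik->i", feats, inv_cov, feats))
    return (mahalanobis < threshold).reshape(feature_images[0].shape)
```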
Perhaps the most significant weakness of SPR with features consisting of the initial intensities alone is that essential geometric information is not explicitly captured by the feature vectors. In general, the success of SPR improves as the number of parameters associated with each voxel increases. Image sets that meet the multiparameter requirements of SPR include CT scans acquired at different energies, and MR images acquired with different pulse sequences or over a number of different echo times. With current imaging technology, suitable multiparameter CT scans cannot be obtained in routine clinical practice. However, as the role of MR imaging in treatment planning increases, SPR may become a more important method for object definition.
Figure 3. Example of statistical pattern recognition using Windham's eigenfilter method on multiparameter MR images of the brain. (A) Typical slice from one of the image sets showing regions selected for training the computer to recognize ventricles, white matter, and gray matter. (B) Results for ventricles. (C) Results for white matter. (D) Results for gray matter. Desired regions are bright pixels. Note that some undesired pixels are selected in each case. (See color plate on p 293.)
Neural Networks
Neural networks can be taught to recognize patterns in the presence of other signals or noise.[34] So far, they have been successfully trained to recognize only very simple patterns from a simulated medical image.[35] Constraints imposed by memory and time requirements make current technology unsuitable for recognition of anatomical objects from tomographic
images. However, a particular network architecture, the constraint satisfaction neural network, can be used to generate contours in a gray-scale image.[36] The results, including the disadvantages, are similar to edge or isointensity tracking. The primary advantage is the computational speed offered by the parallel architecture of neural networks.
The Multiscale Medial Axis
In recent years, a great deal of attention has been devoted to creating theoretical frameworks for computer vision from which hypotheses or models can be derived and tested.[37-39] In this way, general solutions can be developed that solve a wide variety of problems. One theoretical framework is based on the premise that perception and recognition of objects by the human visual system can be described analytically[40] and thus modeled using a computer-based approach. A promising model within this framework is that the visual system relies on a set of "receptive fields" that operate at the neuron level to allow the human to see an image, or a property of an image, at multiple scales of resolution simultaneously.[41] These receptive fields are described analytically as filters or operators that must satisfy special mathematical properties.[38] Receptive fields must detect many geometric properties simultaneously and at different scales to allow the observer to recognize objects. These properties and their associated filters are the subject of current investigation that takes as its starting point that loci, such as object edges, or distance measures, such as widths, cannot be satisfactorily described as a sharp value or locus; rather, they must be adjoined with their tolerances, ie, described as a fuzzy value or locus. Recently, Pizer et al[42,43] have made a case that the medialness of a point within an object and the scale of the object at the same point are important object properties with associated receptive fields. In a 2D image such as a CT slice, or in 3D image data, points that are in the middle of an object form a sort of skeletal structure known as the medial axis. Each point on the medial axis is adjoined with a parameter that simultaneously describes the width (scale) of the object at that point and the tolerance of both the axis location and the width, leading to the name "multiscale medial axis" (MMA). Fritsch et al[43] and Morse et al[44] have developed analytic forms of 2D filters that operate at different scales to measure medial and width properties, and have successfully computed MMAs for objects within 2D medical images. Fritsch et al's method is being used as the basis for
portal image registration, with encouraging results (Fig 4). Because the filters used in MMA analysis operate at a variety of spatial scales, the method is essentially insensitive to the presence of high relative noise, which is typically confined to small-scale (high-frequency) components of the image. MMA analysis can also yield information about edge curvature and direction at object boundary points that are associated with each medial axis point through the scale parameter. The method is amenable to defining fuzzy boundary regions and thus is not subject to the disadvantages of hard-edge detectors. In principle, the MMA, with its associated object scale information, and the associated boundary information are sufficient descriptors for automatic segmentation. Hierarchical region/subregion relationships have not yet been generated, but methods for their derivation have been conceived.[43] The implementation is currently being generalized to three dimensions.
Figure 4. Multiscale medial axes computed from electronic portal images of an anthropomorphic phantom. Portions of the axes were used as fiducial structures to register the images and compute the misalignment of the anatomy in terms of rotation and translation relative to the field edges. (A) Reference portal image. (B) Image with the same collimator size and isocenter location, but with the collimator rotated 5°. In repeated passes of this controlled experiment, the computer detected the rotation to within 0.5° and isocenter congruence to within 0.5 mm. (See color plate on p 293.)
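The flavor of a multiscale medialness measurement can be conveyed with a toy sketch, assuming NumPy and SciPy; a scale-normalized Laplacian-of-Gaussian blob operator stands in for the medialness filters of references 43 and 44, which it only loosely resembles. The operator is applied at several scales, and at each pixel the strongest response and the scale at which it occurs give a crude medial strength and width estimate.

```python
import numpy as np
from scipy import ndimage

def medial_response(image, sigmas=(2, 4, 8, 16)):
    """Per pixel: strongest center-on blob response over several scales, and the scale
    at which it occurs (a rough surrogate for local object width)."""
    responses = np.stack([
        -sigma ** 2 * ndimage.gaussian_laplace(image.astype(float), sigma)
        for sigma in sigmas
    ])
    best = responses.argmax(axis=0)
    return responses.max(axis=0), np.asarray(sigmas)[best]

image = np.zeros((64, 64))
image[24:40, 24:40] = 100.0                      # a bright, 16-pixel-wide toy "object"
strength, width_scale = medial_response(image)
print(width_scale[32, 32])                       # largest response near the object center
```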
Other Multiscale, Hierarchical Approaches
Many other approaches for decomposing an image based on multiscale analysis can be imagined. The challenge is to formulate approaches that produce a hierarchy composed of a small number of segments that are closely related to objects of interest, and thus require minimal editing and grouping operations for final object definition. As an example, one approach treats the spatial distribution of pixel intensities in terms of ridges separated by valleys. A natural hierarchy is derived by finding ridges that are
subridges originating from the flanks of larger ridges.[45] These ridge-subridge relationships can be discovered by multiscale analysis. This process creates links between ridges all the way from the smallest to the largest ridge; that is, subridges originating from the same larger ridge define object-related regions (segments) of the larger ridge, and so on. In its basic form, this approach produces binary assignments. The method can be generalized to yield better object-related regions by using other properties of the image besides pixel intensity, such as the above-mentioned medialness measure, to form the ridge-based image description hierarchy, and by including boundariness measurements at various scales within regions, enabled by the fuzzy boundaries derived from the MMA, so as to produce assignments that are still fuzzy but sharper than those from the MMA alone.
Region Grouping Following Automatic Segmentation
Interactive Region Selection
In interactive region selection, a trained observer interacts with a computer display to select all the regions that completely define each object of interest.
Mathematical Morphology
Mathematical morphology relies on a priori information, eg, a dictionary of shapes and normal variants, to aid in grouping segments.[20,46] It is sometimes applied as a postprocess to the results of other methods to fill in holes and modify other geometrically improbable structural features. Compared with context-free methods, mathematical morphology allows some improvements in global analysis and thus has the ability to deal with object geometry. However, a major disadvantage is that objects with hard boundaries are produced.
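A minimal sketch of this kind of postprocessing is shown below, assuming NumPy and SciPy; a closing followed by hole filling is a generic example of morphological cleanup, not a shape-dictionary method. Small gaps in a segmented region are bridged and interior holes are filled, at the cost of producing a hard-edged result.

```python
import numpy as np
from scipy import ndimage

def tidy_region(mask, closing_radius=2):
    """Post-process a segmented region: bridge small gaps, then fill interior holes."""
    r = closing_radius
    yy, xx = np.ogrid[-r:r + 1, -r:r + 1]
    disc = (xx ** 2 + yy ** 2) <= r ** 2
    closed = ndimage.binary_closing(np.asarray(mask, dtype=bool), structure=disc)
    return ndimage.binary_fill_holes(closed)
```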
Region Algebra Operations
Region algebra operations can be used for grouping regions defined by another method. The desired regions can often be obtained by simple combinations, such as complement, union, and difference, involving the segments to be edited. For hierarchical image descriptions, reasonable groupings can be produced more efficiently with the aid of region algebra operations that can exploit the hierarchical ordering. Region algebra operations can also be used for region editing (Fig 5).
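With regions stored as boolean masks on a common grid, the algebra reduces to elementwise logic, as in the sketch below; NumPy is assumed and the masks are invented purely to illustrate the operations named above.

```python
import numpy as np

a = np.zeros((8, 8), dtype=bool); a[1:6, 1:6] = True     # eg, outer skin region including air cavities
b = np.zeros((8, 8), dtype=bool); b[3:5, 3:5] = True     # eg, internal air region

union        = a | b
intersection = a & b
difference   = a & ~b        # subtract the air region from the outer region
complement   = ~a            # everything outside the region
```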
Figure 5. Example from MASK of region algebra operations to define the skin surface in a slice of the head. (A) Window set to define skin. Note that air (gray pixels) infiltrates into the sinus cavities through the nasal passages. Autocontouring in this case would produce a highly convoluted skin contour. Region algebra can be used to eliminate inner air cavities. (B) Air selected (green pixels) as the desired region using a scale-sensitive filter that prevents penetration into the nasal passages and beyond. (C) Final result obtained by using the air region as a mask, and then adding the islands in the nasal and sinus cavities to the resulting region. (See color plate on p 293.)
Interacting With a Hierarchy
When hierarchies are generated, the links between regions and groups of regions can be activated and edited using an interactive computer program to quickly define objects of interest. Useful interactive operations are pointing at a pixel to select a leaf (small-scale) region in the hierarchy, moving to a parent in the hierarchy, taking unions and differences of regions so defined, and painting with leaf regions, as illustrated in Fig 6 using the Interactive Hierarchy Viewer.[47]
Figure 6. Region grouping with the Interactive Hierarchy Viewer (IHV). The displayed image has been segmented using a technique that creates a hierarchy. (A) Smallest regions (primitive or leaf regions) created during segmentation in the brainstem and cervical spinal cord (green), cerebral cortex (dark blue), cerebellum (red), and scalp (yellow). These regions were "turned on" by placing a cursor in each region and clicking a mouse button. (B) Regions can be grouped one by one or painted to define the desired objects. Alternatively, a group of linked regions can be selected with one click that signals the computer to follow the hierarchy to the next node. In this figure, the brainstem, scalp, and cerebellum have been defined with just a few mouse operations. The highlighted area in the cortex has been created by adding one more leaf region. (C) Continued grouping adds regions further up the hierarchy. In this case, the corpus callosum has been added to the brainstem by a single click. A similar result was obtained for the cerebellum. In each of these cases, the user could have chosen not to add the new regions, or, once added, the user can subtract them. (See color plate on p 294.)
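The interactive operations can be pictured with a small hierarchy sketch in plain Python; the Region class, its fields, and the example voxels are hypothetical and far simpler than the hierarchies discussed above. Pointing at a pixel selects the leaf region containing it, and moving to the parent grows the selection to the grouped structure with a single operation.

```python
class Region:
    """Node in a region hierarchy: a set of voxel indices plus parent/child links."""
    def __init__(self, voxels=(), parent=None):
        self.voxels, self.parent, self.children = set(voxels), parent, []
        if parent is not None:
            parent.children.append(self)
            ancestor = parent
            while ancestor is not None:            # parents cover all of their descendants
                ancestor.voxels |= self.voxels
                ancestor = ancestor.parent

def leaf_at(root, voxel):
    """'Pointing at a pixel': descend to the smallest region containing the voxel."""
    node = root
    while node.children:
        node = next(child for child in node.children if voxel in child.voxels)
    return node

root = Region()
brain = Region(parent=root)
brainstem = Region({(10, 40, 40), (11, 40, 41)}, parent=brain)
cortex = Region({(20, 10, 10)}, parent=brain)
selection = leaf_at(root, (10, 40, 40))            # select a leaf region
selection = selection.parent                       # one operation up: the whole grouping
```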
Summary
Defining anatomical objects in medical images is a critical step in 3D treatment planning. The accuracy and reproducibility of this step affect targeting, optimization based on dose-volume histograms or other volume-based measures, and the development of biological models for tumor control and complication probabilities. Efficiency is another important consideration. The current standard of practice based on edge detection is inadequate in the modern era of
conformal therapy, tight margins, and dose escalation. More sophisticated approaches based on computer vision techniques, such as those discussed here, need to be further studied in the laboratory and tested in the clinical setting to verify accuracy and reproducibility, and to develop clinically reliable performance and efficient user interfaces.
Acknowledgment
This article reflects the programming work of Graham Gash, Krishna Gadepalli, and Gregory Tracton and the research of James Coggins, David Eberly, Eric Fredericksen, Daniel Fritsch, Bryan Morse, Shobha Murthy, and Ross Whitaker.
References
1. Smith AR, Purdy JA: Three-dimensional photon treatment planning: Report of the Collaborative Working Group on the evaluation of treatment planning for external photon beam radiotherapy. Int J Radiat Oncol Biol Phys 21, 1991
2. Goitein M, Abrams M: Multidimensional treatment planning: I. Delineation of anatomy. Int J Radiat Oncol Biol Phys 9:777-787, 1983
3. Ettinger DS, Leichner PK, Siegelman SS: Computed tomography assisted volumetric analysis of primary liver tumor as a measure of response to therapy. Am J Clin Oncol 8:413-418, 1985
4. Goitein M, Laughlin J, Purdy JA, et al: Three-dimensional display in planning radiation therapy: A clinical perspective. Int J Radiat Oncol Biol Phys 21:79-89, 1991
5. Schad LR, Boesecke R, Schlegel W, et al: Three dimensional image correlation of CT, MR, and PET studies in radiotherapy treatment planning of brain tumors. J Comput Assist Tomogr 11:948-954, 1986
6. Pelizzari CA, Chen GTY, Spelbring DR, et al: Accurate three-dimensional registration of CT, PET, and/or MR images of the brain. J Comput Assist Tomogr 13:20-26, 1989
7. Pizer SM, Amburn EP, Austin JD, et al: Adaptive histogram equalization and its variations. Comput Vision Graphics Image Process 39:355-368, 1987
8. Goitein M, Laughlin J, Purdy JA, et al: Role of inhomogeneity corrections in three-dimensional photon treatment planning. Int J Radiat Oncol Biol Phys 21:59-69, 1991
9. Drzymala RE, Mohan R, Brewster L, et al: Dose-volume histograms. Int J Radiat Oncol Biol Phys 21:71-78, 1991
10. Meertens H, Bijhold J, Strackee J: A method for the measurement of field placement errors in digital portal images. Phys Med Biol 35:299-304, 1990
11. Balter JM, Vijayakumar S, Pelizzari CA, et al: Computer aided registration of portal and simulation images: An exploratory clinical study. Int J Radiat Oncol Biol Phys 19:168, 1990 (suppl) (abstr)
12. Pizer SM, Johnston RE, Ericksen JP, et al: Contrast-limited adaptive histogram equalization: Speed and effectiveness, in Ezquerra NF, Garcia F, Arkin R (eds): Proceedings of the First International Conference on Visualization in Biomedical Computing. Atlanta, GA, IEEE Cat no. 90TH0311, 1990, pp 337-345
13. Rosenman JG, Roe CA, Cromartie R, et al: Portal film enhancement: Technique and clinical utility. Int J Radiat Oncol Biol Phys, 1992 (in press)
14. Pizer S, Fuchs H, Heinz E, et al: Interactive three-dimensional display of medical images, in Deconinck F (ed): Proceedings of the 8th Conference on Information Processing in Medical Imaging. Brussels, Belgium, Martinus Nijhoff, 1984, pp 513-526
15. Friedman MA, Resser KJ, Marcus FS, et al: How accurate are computed tomographic scans in assessment of changes in tumor size? Am J Med 75:193-198, 1983
16. Mahaley MS, Gillespie GY, Hammett R: Computerized tomography brain scan tumor volume determinations. J Neurosurg 72:872-878, 1990
17. Williams DM, Bland P, Li L, et al: Liver-tumor boundary detection: Human observer vs computer edge detection. Invest Radiol 24:768-775, 1989
18. Mills PH, Fuchs H, Pizer SM, et al: IMEX: A tool for image display and contour management in a windowing environment, in Schneider RH, Dwyer SJ, Jost GR (eds): Proceedings of Medical Imaging III: Image Capture and Display. Newport Beach, CA, SPIE Vol 1091:132-142, 1989
19. Koenderink JJ: The structure of images. Biol Cybern 50:363-370, 1984
20. Ayache N, Boissonnat JD, Cohen L, et al: Steps toward the automatic interpretation of 3D images, in Hohne KH, Fuchs H, Pizer SM (eds): NATO Conference on 3D Imaging in Medicine: Algorithms, Systems, Applications. Travemunde, Germany, Springer-Verlag, 1990, pp 107-120
21. Pizer SM, Gauch JM, Coggins JM, et al: Multiscale, geometric image descriptions for interactive object definition, in Burkhardt H, Hohne KH, Neumann B (eds): Proceedings of the 11th Symposium of DAGM [The German Association for Pattern Recognition]. Hamburg, Germany, Informatik-Fachberichte Vol 219, 1989, pp 229-239
22. Purdy JA, Chaney EL, Kalet I: Radiotherapy Treatment Planning Tools Second Year Progress Report, Technical Report 91-2. St Louis, Washington University, 1991
23. Gerig G, Martin J, Kikinis R, et al: Automating segmentation of dual-echo MR head scans, in Colchester ACF, Hawkes DJ (eds): Proceedings of the 12th International Conference on Information Processing in Medical Imaging. Wye, UK, Lecture Notes in Computer Science Vol 511, pp 175-187, 1991
24. Fredericksen RE, Coggins JM, Cullip TJ, et al: Interactive object definition in medical images using multiscale, geometric image descriptions, in Ezquerra NF, Garcia F, Arkin R (eds): Proceedings of the First International Conference on Visualization in Biomedical Computing. Atlanta, GA, IEEE Cat no. 90TH0311, 1990, pp 108-114
25. Peck DJ, Windham JP, Soltanian-Zadeh H, et al: A fast and accurate algorithm for volume determination in MRI. Med Phys 19:599-605, 1992
26. Canny J: A computational approach to edge detection. IEEE Trans Pattern Analysis Machine Intelligence 8:679-698, 1986
27. Kass M, Witkin A, Terzopoulos D: Snakes: Active contour models. Int J Comput Vision 1:321-331, 1987
28. Duda RO, Hart PE: Pattern Classification and Scene Analysis. New York, NY, John Wiley and Sons, 1973
29. Windham JP, Abd-Allah MA, Reimann DA, et al: Eigenimage filtering in MR imaging. J Comput Assist Tomogr 12:1-9, 1988
30. Baxter LC, Coggins JM: Supervised pixel classification using a feature space derived from an artificial visual system, in Casasent DP (ed): Proceedings of the Conference on Intelligent Robots and Computer Vision IX: Algorithms and Techniques. Boston, MA, 1990, SPIE Vol 1381, pp 459-469
31. Coggins JM: A multiscale description of image structure for segmentation of biomedical images, in Ezquerra NF, Garcia F, Arkin R (eds): Proceedings of the First International Conference on Visualization in Biomedical Computing. Atlanta, GA, IEEE Cat no. 90TH0311, 1990, pp 123-130
32. Kubler O, Gerig G: Segmentation and analysis of multidimensional data-sets in medicine, in Hohne KH, Fuchs H, Pizer SM (eds): Proceedings of the NATO Conference on 3D Imaging in Medicine: Algorithms, Systems, Applications. Travemunde, Germany, Springer-Verlag, 1990, pp 63-81
33. Gutfinger D, Hertzberg E, Tolxdorff T, et al: Tissue identification in MR images by adaptive cluster analysis, in Loew MH (ed): Proceedings of the Conference on Medical Imaging V: Image Processing. San Jose, CA, SPIE Vol 1445, 1991, pp 288-296
34. Anderson JA, Rosenfeld E (eds): Neurocomputing: Foundations of Research. Cambridge, MA, MIT Press, 1988
35. Boone JM, Sigillito VG, Shaber GS: Neural networks in radiology: An introduction and evaluation in a signal detection task. Med Phys 17:234-241, 1990
36. Lin WC, Tsao ECK, Chen CT: Constraint satisfaction neural networks for image segmentation, in Kohonen T, Makisara K, Simula O, et al (eds): Artificial Neural Networks: Proceedings of the 1991 International Conference on Artificial Neural Networks (ICANN-91). Espoo, Finland, 24-28 June 1991, Vol 2, pp 1087-1090
37. Wechsler H: Computational Vision. San Diego, CA, Academic Press, 1990
38. ter Haar Romeny BM, Florack LMJ, Koenderink JJ, et al: Scale-space: Its natural operators and differential invariants, in Colchester ACF, Hawkes DJ (eds): Proceedings of the 12th International Conference on Information Processing in Medical Imaging. Wye, UK, Lecture Notes in Computer Science 1991, 511:239-255
39. Pizer SM, Coggins JM, Burbeck CA: Formation of image objects in human vision, in Lemke HU, Rhodes ML, Jaffe CC (eds): Proceedings of the Conference on Computer Assisted Radiology. Berlin, Germany, Springer-Verlag, 1991, pp 535-542
40. Koenderink JJ, van Doorn AJ: Representation of local geometry in the visual system. Biol Cybern 55:367-375, 1987
41. Koenderink JJ, van Doorn AJ: Receptive field families. Biol Cybern 63:291-298, 1990
42. Pizer SM, Burbeck CA, Coggins JM, et al: Object shape before boundary shape: Scale-space medial axes, in O Y-L, Toet L, Heijmans H (eds): Proceedings of the NATO Conference on Shape in Picture. Driebergen, The Netherlands, Springer-Verlag, 1992 (in press)
43. Fritsch DS, Coggins JM, Pizer SM: A multiscale medial description of greyscale image structure, in Casasent DP (ed): Proceedings of the Conference on Intelligent Robots and Computer Vision X: Algorithms and Techniques. Boston, MA, SPIE Vol 1607, 1991, pp 324-335
44. Morse BS, Pizer SM, Burbeck CA: A Hough-like medial axis response function. Technical Report TR91-044, Department of Computer Science, University of North Carolina, Chapel Hill, NC, 1991
45. Pizer SM, Whitaker R, Morse D, et al: Medical image segmentation via ridge-related image description. J Biomed Eng Soc 19:611, 1991 (abstr)
46. Bookstein FL: Thin-plate splines and the atlas problem for biomedical images, in Colchester ACF, Hawkes DJ (eds): Proceedings of the 12th International Conference on Information Processing in Medical Imaging. Wye, UK, Lecture Notes in Computer Science, 1991, 511:326-342
47. Cullip TJ, Fredericksen RE, Pizer SM: Toward interactive object definition in 3D scalar images, in Lemke HU, Rhodes ML, Jaffe CC (eds): Proceedings of the Conference on Computer Assisted Radiology. Berlin, Germany, Springer-Verlag, 1991, pp 83-106