Copyright © IFAC Artificial Intelligence in Real-Time Control, Bled. Slovenia, 1995
MEDICAL IMAGE UNDERSTANDING SYSTEM
Tatjana Zrimec* and Claude Sammut†
*Faculty of Electrical and Computer Engineering, University of Ljubljana, Slovenia
†School of Computer Science and Engineering,
University of New South Wales, Sydney, Australia 2052
Abstract: In this paper we present a general methodology developed for the computer interpretation and integration of medical images. An explicit anatomical model, as well as other domain knowledge, is used to facilitate feature extraction and the fusion of images from different modalities. A prototype system for reconstructing the human cerebral vasculature is presented.
Keywords: image recognition, model-based control, knowledge representation, medical applications, sensor fusion
1. INTRODUCTION
A number of studies have combined image processing with knowledge-based approaches in order to achieve better results in interpreting images. These have resulted in various expert systems for image processing and image understanding. In most such expert systems, the knowledge mainly concerns how to use image processing operators effectively for image analysis (Matsuyama, 1989). In contrast, image understanding systems, also known as model-based image processing systems, use object models to generate predictions during image analysis. These systems produce interpretations of image descriptions in terms of models (Brooks, 1984).
The first application of image understanding systems was in analysing aerial photographs. Two such systems are ACRONYM (Brooks, 1984) and SIGMA (Matsuyama, et al., 1985). ACRONYM is a model-based image understanding system for detecting 3D objects, and has been used to detect aircraft in low-level aerial photographs. SIGMA is a prototype system for the structural analysis of aerial photographs of a suburban area. Its purpose is to identify roads, houses and other objects. Both systems rely on models of objects, either described by the user or already stored in the knowledge base. Besides these models, both systems require knowledge of the possible spatial organisation of objects in the world, as well as of the hierarchical organisation of objects in terms of classes and subclasses. These systems interpret images by finding instances of modelled objects and by predicting the existence of objects that have not been found but that, according to the model, must be present in the scene. We propose to use the same approach for analysing medical images, which are well suited to this type of analysis. When a radiologist examines an image, he looks for known features, having in mind a model of the imaged structures. Radiologists know from experience which features to expect and how they are organised in the image.
The basic task of image understanding is to define a mapping between the scene and the image domain knowledge (Ballard and Brown, 1982). To do so, an image understanding system generally requires diverse sources of knowledge to interpret visual scenes (Matsuyama, 1989). An important part of image analysis is to formulate and describe the visual information: what types of image features can be extracted from the images, what properties those features are expected to have, and how they are related to each other. Knowledge used in image understanding has been classified into the following three levels (Matsuyama, 1989):
• scene domain knowledge: geometric models of objects in the scene, their relations, etc.;
• knowledge about the mapping between the scene and the image: physical properties of the imaging system (e.g. focal length, angle of the camera, special properties);
• image domain knowledge: types of, and relations among, image features such as edges, lines and regions.
Many researchers have used model-based strategies, in much the same way that radiologists do (Baldock, 1992; Gerig, et al., 1993; Rake and Smith, 1987). Some attempts have been made to develop image understanding systems for analysing medical images (Wang, et al., 1987; Suetens, et al., 1989; Robinson, et al., 1993). The main problem for model-based medical image processing is to find an efficient representation, both for storing knowledge extracted during image processing and for representing anatomical knowledge supplied by experts. This research investigates novel methods of symbolically representing visual data and integrating it with complex, domain-specific knowledge. It also aims to develop methods for multi-modal image fusion using a variety of medical image types, for example, MR images, MR angiograms and X-ray angiograms. We present the development of an image understanding system specifically for interpreting and labelling vessels in cerebral angiograms in order to construct a 3D model (Zrimec, et al., 1994). The role of the system is to accept data from various sources and to produce a meaningful description of the imaged object for 3D visualisation.
2. MOTIVATION AND SYSTEM ORGANISATION
In clinical practice, recent developments in medical imaging instrumentation give physicians access to large amounts of data. Different imaging procedures capture different anatomical features in different ways. Images are either high-resolution 2D projections, produced by X-rays, or sets of low-resolution cross-sectional images that form part of the volume data generated by MRI. Physicians must combine the information from different images to fully visualise the imaged structure. Consequently, a system for understanding medical images must also be able to use information from heterogeneous sources. These include images from different imaging modalities, as well as other kinds of knowledge.
Physicians usually know what they are looking for in an image. Typically, the sought-after features are described in image domain terminology. This suggests that we can use models of the imaged structures to assist feature extraction. However, using models requires us to know how to establish a correspondence between a model and the features extracted from the image. Using knowledge in an image understanding system requires making clear what knowledge is important and how it can be used. The first requirement is related to knowledge organisation, and the second to the control mechanism for using knowledge.
Knowledge Organisation: The following are the basic components of our system:
• Scene domain knowledge: a model of the cerebral vasculature, represented in a form that can be used during image processing and image fusion.
• Knowledge about the mapping between the scene and the image: structural and spatial knowledge extracted from anatomical atlases, together with the mapping scheme expressed in terms of relations (connections).
• Image domain knowledge: symbolic representations of 2D and 3D angiograms, together with the relations among the image features.
Control Mechanism: The task of a control mechanism in image interpretation is to direct the interpretation process through several levels of activity (Ballard and Brown, 1982) which map models onto image features. It determines how and when domain knowledge should be used. Image analysis, which involves large amounts of data, requires several interpretative levels, or processing stages, to construct a scene description. Each level corresponds to a different level of abstraction of the scene. Control may be bottom-up, top-down or mixed. We propose to use a mixture of model-driven prediction and data-driven analysis at different resolutions. Figure 1 shows the organisation of the entire system. This paper outlines the progress that has been made in implementing the proposed system.
Figure 1. Overview of the system. (Recoverable diagram labels: Images; Domain Knowledge; x-ray angiogram study 1; x-ray angiogram study 2; Maximum intensity projection; Models of vasculature structures; Brain structures; Control: hypothesis generation, hypothesis testing, symbolic matching; predictions; verifications; Descriptions of angiograms: tree construction, tree grafting; Symbolic level 3D MRA-based model; Visualisation; Full-resolution 3D model.)
3. MODELLING AND MODEL REPRESENTATION
The first stage in modelling the domain knowledge is to determine what is to be represented and to select a formalism for representing the information that is relevant to how the knowledge will be used. In the field of medical imaging, we deal with complex anatomical structures that must be recognised in the images. Therefore, the models must include knowledge both at the anatomical level and at the image level. To model anatomical knowledge, we must be able to encode biological shapes and to include both normal cases and variations (Baldock, 1992). The knowledge includes descriptions of anatomical components and the relations between them. At the image level, the knowledge includes information about the expected appearance of an anatomical feature in an image. To increase the interpretative capability of the system, it is important to incorporate the different appearances of the modelled structures from different views, similar to those produced by the imaging modalities.
A large body of the knowledge incorporated into the model has been acquired from different sources. Textbooks of anatomy and expert radiologists have been used to generate the symbolic description of the human anatomy. Two- and three-dimensional atlases have been used to determine the brain structure, together with the spatial organisation of, and relations among, the brain features. Textbooks of normal neuroradiology, X-ray angiograms of a patient's vasculature in the form of 2D projections, and 3D volume data in the form of a stack of magnetic resonance angiogram (MRA) slices have been used to generate symbolic descriptions of images. Production rules (Garreau, et al., 1991; Delaere, et al., 1991) and graphs (Gerig, et al., 1993; Hall and McGregor, 1993) have commonly been used for representing models and other domain knowledge. We have chosen to use a frame system to implement the model, because frames allow easy representation of the hierarchical structures that are found in the human brain. Frames have been used for modelling in various domains (Brooks, 1984; Matsuyama, et al., 1985; Robinson, et al., 1993).
The central idea of frame theory is that a large amount of knowledge can be stored in a library of frames, which are packets of knowledge that provide descriptions of typical objects and events (Minsky, 1975). Objects are represented by a set of attributes, also called slots, and their associated values. Frames may be seen as a hierarchy of nodes and relations. At the top of the hierarchy are class frames, which describe the generic properties of an entire class. Instance frames describe individual objects and are placed at the bottom of the hierarchy. Relations between classes and instances are expressed by special isa slots, which indicate that an instance is a member of a class. We may also create a generic frame which represents a subclass of another class. Two powerful features of frame systems are inheritance and procedural attachment. Inheritance is the ability of a frame to inherit properties from the generic frames above it in the hierarchy. Inheritance is a very useful feature, also found in object-oriented languages; it enables clear and concise descriptions of objects. Demons are procedures attached to a slot. Demons can be used to store information such as how to use the frame, what one can expect to happen next, or what to do if some conditions or expectations are not confirmed (Minsky, 1975). For example, when a new value is added to a slot, an if-added demon may be automatically invoked by the frame system.
In visual scene analysis, different frames may also describe a scene from different viewpoints (Minsky, 1975). We use this idea to represent knowledge about an object's appearance in images, where different imaging modalities provide different viewpoints. We use the following frame types to represent the domain knowledge of the cerebral vasculature:
• The vascular anatomy frame stores anatomical information about the properties and organisation of the vasculature.
• The brain structural anatomy frame stores the brain structure, divided into lobes and subdivisions down to particular brain features.
• The brain spatial anatomy frame stores the spatial organisation and spatial relations of the brain features. It contains 3D information for each brain feature.
• The XRA view frame stores a symbolic description of two standard projections of the vasculature: the lateral view and the anterior-posterior view.
• The MRA volume frame contains a symbolic description of a set of axial slices as produced by MR angiography.
These frames are shown in Figure 2. Relationships between vessels are represented by links between different parts within the same frame hierarchy. Relationships between vessels and other brain structures are represented by links between different hierarchies. For example, a frame representing a vessel contains a slot indicating which brain structures are supplied by that vessel, and another slot indicating in which view the vessel is best seen. The relationship between brain features and the vasculature is established through the feeding vessels, since the vessels supply or drain a particular part of the brain. This information is very useful for determining the position of the vessel in space.
Figure 2. (a) and (f) belong to the Vascular Anatomy Frame, (b) belongs to the XRA View Frame, (c) belongs to the MRA Volume Frame, (d) and (e) belong to the Brain Anatomy Frame, and (f) represents the Carotid System. (Recoverable diagram labels: XRA views; Anterior-Posterior; Above CA-CP; Brain Structural Anatomy.)
3.1 Vascular Anatomy Frame
A generic frame at the top level of this frame system contains the basic description of a blood vessel. All attributes that apply to a generic blood vessel appear as slots. At the next level, there are two more generic frames, describing blood vessels that are arteries or veins. Arteries deliver blood; they are usually larger vessels which branch into smaller and smaller vessels, resulting in a vessel tree. Veins are vessels which drain the blood. They start as small vessels and, by joining, form larger vessels until they end in a main vein. These two generic frames contain slots that represent the differences between the two types of vessel. The generic frames mentioned so far define classes of objects; they do not describe specific blood vessels. Descriptions of specific blood vessels are positioned at the bottom of the hierarchy, as shown in Figure 2a. An example of a frame describing a particular blood vessel is shown in Figure 3. Each slot has a default value for the normal vessel appearance, and a slot, exceptions, stores known variations. This frame represents the Internal Carotid Artery (ICA), which is one of the main arteries of the brain.
"ICA" ako artery with
    name: default "Internal Carotid Artery";
    size: default large;
    diameter_start: default 5.0;
    diameter_end: default 3.5;
    branches_of: default ["OpA", "PCoA", "ChA"];
    branches_terminal: default ["ACA", "MCA"];
    segments: default [ganglionic, intracavernous, intraclinoid, supraclinoid, carotid_siphon];
    shapes: default [c1, c2, c3, c4, c5, c6];
    best_seen: default [proj_lateral, proj_ap];
    position_3d: default ica_3D;
    view_XRA_lat: default ica_view_lat;
    view_XRA_ap: default ica_view_ap;
    belongs_to_system: default [carotid];
    exceptions: default [].
ica_view_lat isa view_lat_frame with
    position_start: [front, below_h_mid];
    position_end: [front, below_h_mid];
    expec_intensity: k1;
    expec_length: k2;
    expec_diameter: k3;
    orientation: [du_vertical_up];
    branches: [["OpA", left_of], ["PCoA", right_of], ["ChA", right_of]];
    bifurcation: [["MCA", right_of, left_of], ["ACA", right_of]].
ica_3D isa space_description with
    name: "ICA";
    position_CA_CP: below;
    position_VCA_VCP: front;
    direction: [up];
    points_3d: [points3d];
    territory: [[e, d], [10, 12]].
Figure 3. "ICA" is a generic frame representing an Internal Carotid Artery. The slot value position_3d is the frame ica_3D, shown at the bottom right. The slot value view_XRA_lat is the frame ica_view_lat, shown at the top right.
Each vessel frame has a slot, position_3d, which contains a reference to another frame that gives the spatial description of the vessel. The value of the slot view_XRA_lat is a reference to a frame representing an x-ray view of this vessel (see Figure 3). The absolute position of the vessels in the brain may vary from person to person, but in most cases the organisation of the vessels is less variable. The vascular model can predict the same vessel organisation as a novice radiologist. Vessels are grouped into systems, or vessel trees (Figure 2f). A vessel system frame groups other frames into a part-subpart hierarchy. The vessel system contains slots which describe the branches, and slots which describe the appearance of those vessels in a lateral or anterior-posterior x-ray. This organisation of the vessels enables us to establish a relationship between the x-ray images and the symbolic model. The system frame has slots whose values are frames describing different views of the system. The frames that correspond to different views of a system share the same vessels. The relationship between a particular vessel and a system is described by the slot belongs_to_system, which contains a reference to the vessel system frame. Each vessel system is described at different resolutions, from coarse to fine. This kind of representation enables us to coordinate information gathered from different images and is useful for reconstruction and fusion.
Frame systems can represent procedural as well as declarative knowledge. In our system, the frames that are used to represent procedural knowledge are called action frames. These frames contain heuristics to guide the process of recognition and interpretation.
4. AN EXAMPLE OF VESSEL TRACKING AND LABELLING
High-level image understanding can be used to assist feature extraction. We have developed new low-level algorithms that can make use of this high-level knowledge. The algorithms for processing x-ray angiograms are hysteresis thresholding followed by a thinning algorithm. The result of applying these two algorithms is a binary image containing a skeleton of the vessel segments. After these stages, a vessel tracking algorithm can be used to extract descriptions of the vessels in the image. We have developed a tracking algorithm that uses both the binary image and the grey-scale image: the binary image provides the vessel's skeleton, and the grey-scale image provides its diameter and average intensity. The algorithm can be used with and without the assistance of knowledge.
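The first low-level step, hysteresis thresholding, can be sketched as follows. This is a generic formulation, not the authors' code: the function name, the use of 8-connectivity, and the assumption that vessels are brighter than the background (the image would be inverted otherwise) are our choices, and the thresholds `low` and `high` would need tuning for real angiograms. Pixels at or above `high` seed the vessel mask; weaker pixels at or above `low` are kept only if they connect to a seed, which suppresses isolated noise while keeping faint vessel continuations.

```python
from collections import deque


def hysteresis_threshold(image, low, high):
    """Binary vessel mask from a grey-scale image (list of lists).

    Pixels >= high are strong seeds. Pixels >= low are kept only if they are
    8-connected, through other kept pixels, to a seed. Assumes vessels are
    brighter than the background.
    """
    rows, cols = len(image), len(image[0])
    mask = [[0] * cols for _ in range(rows)]
    queue = deque()
    # Seed the mask with all strong pixels.
    for r in range(rows):
        for c in range(cols):
            if image[r][c] >= high:
                mask[r][c] = 1
                queue.append((r, c))
    # Breadth-first growth into weak neighbours (value >= low).
    while queue:
        r, c = queue.popleft()
        for dr in (-1, 0, 1):
            for dc in (-1, 0, 1):
                nr, nc = r + dr, c + dc
                if (0 <= nr < rows and 0 <= nc < cols
                        and not mask[nr][nc] and image[nr][nc] >= low):
                    mask[nr][nc] = 1
                    queue.append((nr, nc))
    return mask


image = [
    [0, 0, 0, 0, 0, 0],
    [0, 9, 5, 5, 0, 0],  # strong pixel with a faint continuation
    [0, 0, 0, 0, 0, 5],  # faint pixel not connected to any strong pixel
    [0, 0, 0, 0, 0, 0],
]
for row in hysteresis_threshold(image, low=4, high=8):
    print(row)
# the second row prints [0, 1, 1, 1, 0, 0]; the isolated faint pixel is dropped
```

The resulting mask would then be thinned to a one-pixel-wide skeleton before tracking; the paper does not say which thinning algorithm is used.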
Blood vessels can be viewed as long, thin objects organised in a connected tree structure. The blood vessels are represented as a skeleton structure that includes the medial axes and the cross-sections of the vessels. The skeleton representation was chosen because it best preserves the topology of the tree-like blood vessel structure, and it is adequate for dealing with high variability. The same type of representation is used for vessels obtained from images and for displaying them graphically (Zrimec, et al., 1995).
The representation of the blood vessel segments contains a topological description of the vessel tree and the geometrical properties of the segments. The following data structures are used for storing the image information extracted during vessel tracking: vessel tree, vessel segment and list of skeleton points. A vessel tree is a list of vessel segments. Each vessel segment represents a named blood vessel. The list of skeleton points contains different types of points: vessel points, the starting point, branching points (where the vessel continues), bifurcation points (where the vessel divides into two vessels), and the end point of a vessel. Topological linking between the vessels is represented by the bifurcation points. Each vessel point holds the local information about the traced vessel, such as the coordinates, local diameter, local direction and local average intensity. The end point contains the average diameter, intensity and length of the vessel segment. The tracking process extracts the vessel trajectory and stores it as a sequence of vessel points. Another list, generated in parallel, stores the parent-child relationships: each parent skeleton point contains a reference to the list of its children, represented in another structure. This representation was designed to establish a link with the model described in the previous section. Some of the attributes of a vessel segment are used both in the frame system and in the vessel tracking algorithm.
X-ray images of a particular vessel system provide the starting point for vessel extraction. The model uses the starting point to predict the root of the vessel tree and its expected branches. X-ray images are usually high resolution (1024 by 1024 pixels). However, during tracing, images at different resolutions are used. The model also contains representations at different levels of abstraction, which correspond to different image resolutions. The first prediction is at the highest level of abstraction; therefore, tracing starts with a coarse image. Its task is to detect bifurcation points and to label the main vessels.
To recognise a vessel during feature extraction, an instance frame of the vessel is created and compared with the frames in the vascular model. The instance frame is created automatically by demons. The first demon to be triggered belongs to the slot best_seen. This slot refers to the frame which describes the view in which the vessel is best seen. Features, including the intensity, diameter, length and orientation of the vessel, are extracted from the image corresponding to the best view. The view frame also contains information about the intensity, diameter, length and orientation expected in this view of the vessel. The actual and expected values are compared; if they match, the vessel is recognised and labelled. If not, the process continues by trying to identify the reasons for the discrepancy. Once the main components have been found and labelled, the resolution is increased in order to extract a more detailed description of the vessel structure in the image.
A prediction given by the model helps to navigate through the image and to resolve ambiguities that are problematic when knowledge is not available. For example, at a crossover point where vessels overlap, it is difficult to decide where the vessel continues, and whether there is an overlap or a double branch. The model provides information about a vessel's branches and its relationships to other vessels. If a problem occurs during recognition, the model allows the system to look at other views in an attempt to resolve the difficulty. One of the requirements during labelling is that the blood vessel segments must have the same anatomical interpretation in all views. Thus, crossovers cannot be misinterpreted.
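The tracking data structures (vessel tree, vessel segment, typed skeleton points with parent-child links) and the comparison of measured with expected values can be sketched as below. This is an illustrative Python rendering under our own assumptions: the class names and fields are hypothetical, and the relative-tolerance rule in `matches` is an invented stand-in, since the paper does not state how a "match" between actual and expected values is decided.

```python
from dataclasses import dataclass, field
from enum import Enum
from math import dist
from typing import List


class PointType(Enum):
    START = "start"
    VESSEL = "vessel"
    BRANCH = "branch"              # the vessel continues past this point
    BIFURCATION = "bifurcation"    # the vessel divides into two vessels
    END = "end"


@dataclass
class SkeletonPoint:
    x: float
    y: float
    kind: PointType
    diameter: float = 0.0          # local diameter, from the grey-scale image
    intensity: float = 0.0         # local average intensity
    children: List["VesselSegment"] = field(default_factory=list)


@dataclass
class VesselSegment:
    name: str                      # label of the named blood vessel
    points: List[SkeletonPoint]    # traced trajectory, start to end

    def summary(self):
        """Average diameter, average intensity and length of the segment."""
        n = len(self.points)
        avg_diameter = sum(p.diameter for p in self.points) / n
        avg_intensity = sum(p.intensity for p in self.points) / n
        length = sum(dist((a.x, a.y), (b.x, b.y))
                     for a, b in zip(self.points, self.points[1:]))
        return {"diameter": avg_diameter, "intensity": avg_intensity,
                "length": length}


def matches(segment, expected, tolerance=0.25):
    """True if each measured value lies within a relative tolerance of the
    expected value (e.g. taken from a view frame). Hypothetical rule."""
    measured = segment.summary()
    return all(abs(measured[key] - value) <= tolerance * value
               for key, value in expected.items())


# A vessel tree is a list of vessel segments.
ica = VesselSegment("ICA", [
    SkeletonPoint(0, 0, PointType.START, diameter=5.0, intensity=100.0),
    SkeletonPoint(3, 4, PointType.VESSEL, diameter=4.0, intensity=90.0),
    SkeletonPoint(6, 8, PointType.BIFURCATION, diameter=3.0, intensity=80.0),
])
mca = VesselSegment("MCA", [SkeletonPoint(6, 8, PointType.START, 3.0, 80.0)])
ica.points[-1].children.append(mca)   # topological link at the bifurcation
tree = [ica, mca]

print(ica.summary())  # {'diameter': 4.0, 'intensity': 90.0, 'length': 10.0}
print(matches(ica, {"diameter": 4.2, "length": 11.0}))  # True
```

In the system described above, a failed comparison is not a final rejection: it starts a search for the reason for the discrepancy, possibly using another view.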
5. RECONSTRUCTION AND SYMBOLIC FUSION
After extracting symbolic descriptions from two spatially separated views, the system can match vessel segments across both views. This gives us partial 3D information. It is incomplete because the imaging geometry is normally not recorded with the images. However, we can compensate for the missing information by using the expected volume data for MRA slices that has been stored in the model.
MRA images of the patient also help to make the reconstruction more accurate. A correspondence is first established between an x-ray image and an MRA maximum intensity projection (MIP) from the same view as the x-ray image. A MIP is obtained by tracing rays through the MRA volume image and placing, in each picture element of a projection image, the maximum of all the voxels encountered along the ray. Once a correspondence has been established between branches in the MIP and the x-ray image, a mapping is found between the MIP and the MRA volume data. Since we know which voxels in the MRA contributed which pixels in the MIP, it is possible to locate a MIP pixel in 3D space, and therefore we can also locate the corresponding pixels in the x-ray image. Combining 2D tracing and labelling of x-ray images with 3D tracing and labelling of MRA data results in symbolic descriptions of the imaged vasculature. The next step is to perform symbolic fusion using both representations. The fusion process is guided by the models and other domain knowledge.
6. CONCLUSION
We have described a prototype of an image understanding system for interpreting medical images. The main components of the system are the domain knowledge and the control mechanism. The process of image interpretation requires the integration of qualitative symbolic reasoning with quantitative signal processing. The knowledge used in the system is explicit and separate from the programs that use it; thus, it is accessible from different levels of abstraction in the image processing. Domain knowledge and control knowledge are implemented using frames, which enable expectation-driven programming. The process of filling frame slots confirms expectations, directs the reasoning process, and gathers information about the current situation (Minsky, 1975). While the application area of this research is medical image understanding, the principles developed here apply more generally to image understanding, knowledge-based image processing and multi-sensor fusion.
ACKNOWLEDGEMENTS
Thanks to Geoff Parker for his valuable medical assistance in constructing the model, and to John Hiller, Nick Mankovich and Tim Lambert for their support. This work was supported by the Slovenian Ministry of Science and Technology and the Digital Equipment Corporation.
REFERENCES
Baldock, R.A. (1992). Trainable models for the interpretation of biological images. Image and Vision Computing, 10 (6), 444-450.
Ballard, D.H. and Brown, C.M. (1982). Computer Vision. Prentice-Hall, Inc., New Jersey.
Brooks, R.A. (1984). Model-Based Computer Vision. UMI Research Press, Michigan.
Delaere, D. et al. (1991). Knowledge-based system for the three-dimensional reconstruction of blood vessels from two angiographic projections. MBEC North Sea: Medical physics and imaging, 27-36.
Garreau, M. et al. (1991). A Knowledge-Based Approach for 3-D Reconstruction and Labelling of Vascular Network from Biplane Angiographic Projections. IEEE Trans. on Medical Imaging, 10 (2), 122-131.
Gerig, G. et al. (1993). Symbolic Description of 3-D Structures Applied to a Cerebral Vessel Tree Obtained from MR Angiography Volume Data. Information Processing in Medical Imaging, 13th Int. Conf., IPMI'93, Arizona, USA, 95-111, June.
Hall, P.M. and McGregor, J.J. (1993). A Graph Based Model of a Collection of Physical Vasculature. DICTA-93, Sydney, 414-421.
Matsuyama, T. and Hwang, V. (1985). SIGMA: A Framework for Image Understanding - Integration of Bottom-up and Top-down Analyses. In: Proc. of the Ninth Int. Joint Conf. on Artificial Intelligence, August 18-23, 908-915.
Matsuyama, T. (1989). Expert Systems for Image Processing: Knowledge-Based Composition of Image Analysis Processes. Computer Vision, Graphics, and Image Processing, 48, 22-4.
Minsky, M. (1975). A framework for representing knowledge. In: Winston, P. (ed.), The Psychology of Computer Vision, McGraw-Hill, New York, 211-277.
Rake, S.T. and Smith, L.D. (1987). The interpretation of X-ray Angiograms using a Blackboard Control Architecture. Computer Assisted Radiology, Int. Symp. CAR'87, 681-685.
Robinson, G.P., Colchester, A.C.F. and Griffin, L.D. (1993). Model Based Recognition of Anatomical Objects from Medical Images. In: Information Processing in Medical Imaging, 13th Int. Conf., IPMI'93, Arizona, USA, 197-211.
Suetens, P. et al. (1989). Reconstruction of the Coronary Blood Vessels on Angiograms Using Hierarchical Model-Based Iconic Search. ICASSP 89, 576-580.
Wang, H-Q., Ritchings, R.T. and Colchester, A.C.F. (1987). Image Understanding system for carotid angiograms. Image and Vision Computing, 5 (2), 79-83.
Zrimec, T. et al. (1994). 3-D Visualisation Using Knowledge-based Image Fusion. AAAI-94 Spring Symposium, March, 54-57.
Zrimec, T. et al. (1995). 3-D Visualisation of the Human Cerebral Vasculature. SPIE 95 Medical Imaging: Image Display, Vol. 2431, San Diego, CA.