Automatic Recognition of Anatomical Regions in Three-Dimensional Medical Images
Balázs Csébfalvi¹, László Ruskó², Márton Tóth¹

¹ Budapest University of Technology and Economics, Department of Control Engineering and Information Technology. Address: Hungary, H-1119, Budapest, Magyar tudósok körútja 2
² GE Hungary Healthcare Division. Address: Hungary, 6722, Szeged, Petőfi Sándor sgt 10

Corresponding author: [email protected]
Abstract
This paper presents a method that detects anatomy regions in three-dimensional medical images. The method labels each axial slice of the image according to the anatomy region it belongs to. The detected regions are the head (and neck), the chest, the abdomen, the pelvis, and the legs. The proposed method consists of two main parts. The core of the algorithm is based on a two-dimensional feature extraction that is followed by a random forest classification. This recognition process achieves an overall accuracy of 91.5% in slice classification, but it cannot always provide fully consistent labeling. The subsequent post-processing step incorporates the expected sequence and size of the human anatomy regions in order to improve the accuracy of the labeling. In this part of the algorithm the detected anatomy regions (represented by Gaussian distributions) are fitted to the region probabilities provided by the random forest classifier. The proposed method was evaluated on a set of whole-body MR images. The results demonstrate that the accuracy of the labeling can be increased to 94.1% using the presented post-processing. In order to demonstrate the robustness of the proposed method, it was also applied to partial MRI scans of different sizes (cut from the whole-body examinations). According to the results the proposed method works reliably (91.3%) for partial body scans (as short as 35 cm) as well.
Keywords: anatomy detection, content-based medical image processing, random-forest classification, expectation maximization
1. Introduction
The application of three-dimensional (3D) medical imaging techniques is widespread in clinical practice. Computed Tomography (CT) as well as Magnetic Resonance Imaging (MRI) is routinely applied in diagnosis, therapy planning, and monitoring. The number of cases to be processed is continuously increasing [1], so the computer-assisted processing of medical images plays a critical role in healthcare. In order to facilitate various clinical workflows, several algorithms have been developed for segmentation, registration, or visualization of medical images. These medical image processing methods are usually specialized to anatomy regions or organs. There are clinical applications for lung nodule detection, virtual colonoscopy, cardiac or cerebral vessel analysis, tumor follow-up, etc. Furthermore, many clinical workflows are specialized to a region of interest (ROI). For example, in radiation therapy planning the definition of the organs at risk varies among anatomy regions. Even basic functionalities such as visualization can be specialized to the organ of interest (e.g. a lung CT exam is read using a different window/level setting than abdominal or brain cases).

Automated pre-processing functions (e.g. atlas-based organ segmentation) are part of most medical image processing systems. Having no information about the anatomy regions included in the imaging examinations, these functions are usually triggered by pre-defined keywords inserted in the description of the image series. Since the description is filled in by a human operator, simple typos can result in incomplete processing of cases, the correction of which takes expensive time from the physician. The DICOM standard provides tags to specify the anatomical location of each slice of an image series, but these tags are undefined in most cases [2], so an automated function cannot rely on them. From the above-mentioned examples one can see that the automated detection of anatomy regions in 3D medical images would have a great impact on content-based medical image processing. It would create many opportunities to automate or optimize various types of algorithms (e.g. initialization of segmentation or registration methods), which would save a significant amount of time and workload for the clinicians. Therefore, the need for automated detection of anatomy regions does not directly come from the user, but its benefits would definitely be welcomed in medical image processing.
2. Related Work
The necessity of a well-defined anatomical knowledge representation was recognized as early as 1998 [3]. This idea revealed that an anatomical ontology could serve well in an automatic medical image processing workflow as a primary source of anatomical information. The usage of medical Picture Archiving and Communication Systems (PACS) is now part of the daily clinical routine. Florea et al. [4] compared the automatic image categorization capabilities of different PACSs and found that these systems are capable of recognizing the main anatomical structures even in a complex environment. However, the reliability of the recognition varied considerably among the different body regions. Other research groups have mainly focused on organ or anatomical landmark point detection [5] [6] [7]. They used probabilistic algorithms to automatically define salient landmark points that can be used to navigate through body scans. Their methods focused on CT scans only, mainly using full body scans. Other techniques have been developed to estimate the location of the axial slices in human body scans [8] [9] [10]. They used various methods and their accuracies varied between 16.6 mm and 28.3 mm, but they considered the CT modality only. Atlas-based registration methods are also available for anatomy detection. The most related approach [11] performs a non-rigid registration to align a statistical model to a full-body MRI scan, but its usability is limited to full-body MRI scans. The recent VISCERAL challenge [1] has also shown that automatic anatomy detection is still a very active field of medical image processing research and could be well utilized in clinical practice. In [12] He et al. used region of interest (ROI) detection for their multi-organ segmentation framework, which is a task similar to anatomy detection. Their work considered the CT modality.
Other works have considered the MRI modality as well. In [13] the authors used their manifold learning method to classify the axial slices of a low-resolution preliminary scan to estimate the patient's position. They achieved a classification result above 90%. A similar approach was used in [14] for slice classification, and a 94% overall correct classification rate was reported. Similar to [5], Criminisi et al. extended their method to analyze MRI images as well in [15] and [16]. They achieved high precision in landmark detection in MRI exams. In summary, the existing methods consider CT and MRI images and focus on landmark point or organ detection. A smaller part of the works has focused on the direct classification of the axial slices of a 3D image. This task has not received much attention yet, and none of the previous algorithms considered working with partial body images, where the spatially coherent labeling of the axial slices is even more challenging. The goal of the presented work was to develop a fully automated method that can label each slice of a 3D image series according to its anatomical location. The method shall recognize the main anatomy regions, shall handle whole as well as partial body series (to cover all clinical scenarios), shall be robust to patient variations (sex, age) and abnormalities (obesity, pathology), shall not have extreme computation demand, and shall be easy to adapt to any common 3D medical image type, so that it can be used in a wide range of clinical applications.
3. The Data Set
The set of 49 full body MRI cases referred to in this paper was acquired for research purposes at the University Hospital Zurich. The acquisitions were performed with a Discovery MR750w MRI scanner using the same protocol (LAVA-Flex sequence, T1 weighted, FA/TR = 5°/3.7 ms, acquisition matrix = 256x128, 75% phase FOV, scan time 17 s, TE1/TE2 = 1.15 ms/23 ms). The images included all major anatomical regions: the head and neck, the chest, the abdomen, the pelvis, and some part of the legs as well. The test images involved male and female patients of different age (adults only) and level of obesity. The axial resolution of each image was 512x512 pixels. The number of slices varied between 260 and 864 (average 447). The pixel size was between 0.39 mm and 1.37 mm (average 0.91 mm), and the slice thickness was between 0.47 mm and 8.8 mm (average 5.79 mm). The applied T1 protocol introduces some modality-specific image properties, but this does not reduce the usability of the method, as a T1 or similar protocol is available for all MRI devices and its acquisition is part of the regular MRI examination process. The image data set was manually labeled. The axial slices were annotated as LEG, PELVIS, ABDOMEN, CHEST or HEAD&NECK. The definition of these classes was based on the basic structure of the human body. The LEG begins at the bottom of the MRI scan and ends where the two legs join, which is the beginning of the PELVIS. The ABDOMEN begins at the top of the pelvic bone and ends at the top of the liver, which is the beginning of the CHEST. The HEAD begins at the top of the shoulders and ends at the top of the head. The test dataset involved 21,401 manually labeled axial slices. Based on the manual labeling, the average size and the variance were calculated for each region, which is presented in Table 1. Table 2 presents statistics about age, height, and weight of the scanned patients. The data set contained 20 female and 29 male patients.
4. The Method
The proposed method consists of three main steps. In the first step the axial slices of the image are separately pre-processed. This is required to eliminate several disturbing image artifacts and to reduce image complexity. In this step the axial images are normalized, so that they can be used in a uniform way in the further processing steps. This is followed by a classical image recognition step that uses two-dimensional feature extraction and a machine learning approach to classify the individual axial slices of the image by assigning the probability of each label to each slice. To calculate the feature vectors a global shape feature extraction algorithm, the Zernike transform, is applied. It is followed by a feature vector size reduction using Principal Component Analysis (PCA). This way a compact representation is generated for each slice. The feature vectors are further processed by a Random Forest classifier. This machine learning approach assigns a probability for each label to an input slice. The parameters of the PCA transform and the classifier are estimated during the training process. In the final step a post-processing method is applied, a three-dimensional coherence inspection method, which
takes the valid sequence and the mean size of the anatomy regions into consideration to eliminate false slice classifications and to guarantee the correct order of labels and the correct size of the recognized anatomy regions. The complete workflow is illustrated in Figure 1.

4.1. Slice Normalization
MR images acquired in the clinical practice are usually retrieved from different scanners, show significant variation in patient size and positioning, and have different image properties (such as intensity range and image resolution). These differences originate from the different imaging settings of the scanners and the anatomical differences of the patients. The first thing that has to be normalized is the intensity range of the image. Since MRI intensities are not standardized like in the case of CT images, the intensity range can vary on a large scale, even if the images were acquired using the same (e.g. T1) protocol. To deal with this phenomenon the intensity range of the MRI scan is normalized to the range [0, 255] using a histogram-based, linear technique. First, the histogram of the axial slice is computed, then the 98th percentile value is determined. This provides an upper limit to the intensity values and filters out the high-value outliers. After the computation of the upper limit, the rescaled intensity range is mapped linearly to the normalized range. After this normalization the main tissue types have similar intensities in different MR images, so the image can be handled uniformly in the further processing. Differences caused by the different imaging settings also have to be handled carefully. The input images may have different resolution, field of view (FOV), and even the positioning of the patients can vary. To handle these problems a multi-step process is applied. First, the resolution of the axial slices is normalized. The input slices are down-sampled to 128x128 pixels. This down-sampling reduces the later computation time significantly while it still preserves the information required to recognize an axial slice. The next step of the normalization is the correction of the patient position. Different positions can cause translation of the body on the axial slices. To correct this effect, the image moments [17] are computed for each axial slice, as shown in Equation 1:

$$M_{pq} = \sum_{x} \sum_{y} x^p\, y^q\, I_k(x, y) \qquad (1)$$

In Equation 1, $I_k(x, y)$ denotes the image function belonging to the $k$-th axial slice. To correct the translation effect, the center of gravity of each axial slice has to be moved to the center of the image. This can be done by applying a translation from $(M_{10}/M_{00},\, M_{01}/M_{00})$ to $(c_x, c_y)$, where $c_x$ and $c_y$ denote the coordinates of the center of the image.
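For illustration, the steps described so far (percentile-based intensity normalization, down-sampling, and moment-based re-centering) can be sketched as follows; the function name and interpolation settings are illustrative choices of this sketch, not taken from the original implementation.

```python
# Sketch of the per-slice normalization described above: intensity windowing at
# the 98th percentile, down-sampling to 128x128, and moment-based re-centering.
import numpy as np
from scipy import ndimage

def normalize_slice(slice_px: np.ndarray) -> np.ndarray:
    """Map one axial slice to a 128x128, [0, 255], body-centered image."""
    img = slice_px.astype(np.float64)

    # Intensity normalization: clip at the 98th percentile to suppress
    # high-value outliers, then map linearly to [0, 255].
    upper = np.percentile(img, 98)
    img = np.clip(img, 0.0, upper) / max(upper, 1e-12) * 255.0

    # Resolution normalization: down-sample to 128x128 pixels.
    img = ndimage.zoom(img, (128.0 / img.shape[0], 128.0 / img.shape[1]), order=1)

    # Translation correction: move the intensity center of gravity
    # (M10/M00, M01/M00, cf. Equation 1) to the image center.
    m00 = img.sum()
    if m00 > 0:
        ys, xs = np.mgrid[0:img.shape[0], 0:img.shape[1]]
        cy, cx = (ys * img).sum() / m00, (xs * img).sum() / m00
        shift = (img.shape[0] / 2.0 - cy, img.shape[1] / 2.0 - cx)
        img = ndimage.shift(img, shift, order=1, mode="constant", cval=0.0)
    return img
```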
After the translation correction a FOV normalization is applied. In this step the image is cropped to a fixed physical size. This ensures that the cross-section of the body fills the image regardless of the original pixel size and image resolution (which are always available in the DICOM header of the input images). The result of this step, together with the other main steps of the pre-processing, is shown in Figure 2. After this normalization process the input image is free from the most interfering artifacts and can be used in the further steps of the proposed method.

4.2. Feature Extraction
In order to represent the structural content of an axial slice and to make it suitable for classification by a machine learning technique, a reduced number of features (referred to as a feature vector) is extracted from it. The computation of these feature vectors consists of two steps. First, a global image shape descriptor is applied to capture the structural content of the image, and subsequently a dimension reduction method is applied to the generated feature descriptor to minimize its size and the complexity of the machine learning model. In the first step the Zernike transform is used, while in the second step a Principal Component Analysis (PCA) is applied.

4.3. The Zernike Transform
The original mathematical transform was introduced by F. Zernike [18] for optical problems, and his work was later applied in the field of image processing [19] for image feature extraction. The Zernike transform provides a set of orthogonal complex moments of an image. These moments can be used as a global image descriptor. The moments can capture the structural content of the image, so they are widely used as global scene/shape descriptors [20][21][22] when the displayed shape is a fundamental property of an image. There are several shape descriptors that are used in image retrieval
and classification tasks, but the Zernike transform has some important properties which can be exploited in the recognition of the axial slices of a 3D medical image. The transform can be formulated in a rotation-invariant way, which is an important property as it makes the transform insensitive to the rotation of the patient. The Zernike transform performs better in shape recognition tasks than other geometrical moments [23] and it shows resistance to some typical characteristics of MRI scans, like the unstandardized intensity or the intensity inhomogeneity. According to its mathematical formulation the transform is interpreted within a unit disk, and this geometrical arrangement is similar to the cross-section of the human body and the circular shape of the FOV of the axial slices in 3D medical images.

The Zernike moments can be expressed using a set of complex polynomials $V_{nm}(x, y)$ over the interior of the unit disk, i.e., $x^2 + y^2 \le 1$. In polar coordinates the form of the polynomials is:

$$V_{nm}(\rho, \theta) = R_{nm}(\rho)\, e^{im\theta} \qquad (2)$$

where $n$ is a non-negative integer, $m$ is an integer, $n - |m|$ is even, and $|m| \le n$; $\rho = \sqrt{x^2 + y^2}$ is the length of the vector from the center of the image $(0, 0)$ to a pixel $(x, y)$; $i$ is the complex unit and $\theta$ is the angle between the vector and the $x$ axis. $R_{nm}$ is the radial polynomial defined as:

$$R_{nm}(\rho) = \sum_{s=0}^{(n-|m|)/2} (-1)^s\, \frac{(n-s)!}{s!\left(\frac{n+|m|}{2}-s\right)!\left(\frac{n-|m|}{2}-s\right)!}\; \rho^{n-2s} \qquad (3)$$
The Zernike moments are the projections of the image function onto these basis functions. So the two-dimensional Zernike moment of order $n$ and repetition $m$ for a discretized function $f(x, y)$, which vanishes outside the unit disk, can be defined as follows:

$$Z_{nm} = \frac{n+1}{\pi} \sum_{x} \sum_{y} f(x, y)\; V_{nm}^{*}(\rho, \theta) \qquad (4)$$

where $x^2 + y^2 \le 1$. In this work, the $f(x, y)$ function represents the image (i.e. a 2D axial slice).
Suppose that one knows all the moments $Z_{nm}$ of the function $f(x, y)$ up to a given order $N$. To reconstruct a function $\hat{f}(x, y)$ whose moments exactly match those of $f(x, y)$ up to the order $N$, one should use the following equation:

$$\hat{f}(x, y) = \sum_{n=0}^{N} \sum_{m} Z_{nm}\; V_{nm}(\rho, \theta) \qquad (5)$$

where the same constraints are applied as in (2). Note that as $N$ approaches infinity, $\hat{f}(x, y)$ will approach $f(x, y)$.
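The transform of Equations (2)-(5) can be sketched as follows; this is a direct implementation of the standard definitions, so details such as the pixel-to-unit-disk mapping are assumptions of this sketch rather than the exact choices of the presented method.

```python
# Minimal sketch of Equations (2)-(5): Zernike radial polynomials, moment
# computation, and image reconstruction.
import numpy as np
from math import factorial

def radial_poly(n: int, m: int, rho: np.ndarray) -> np.ndarray:
    """R_nm(rho) from Equation (3); requires n - |m| even and |m| <= n."""
    m = abs(m)
    R = np.zeros_like(rho)
    for s in range((n - m) // 2 + 1):
        c = ((-1) ** s * factorial(n - s)
             / (factorial(s) * factorial((n + m) // 2 - s) * factorial((n - m) // 2 - s)))
        R += c * rho ** (n - 2 * s)
    return R

def unit_disk_coords(shape):
    """Map pixel coordinates into the unit disk centered on the image."""
    h, w = shape
    y, x = np.mgrid[0:h, 0:w]
    xn, yn = (2 * x - w + 1) / w, (2 * y - h + 1) / h
    return np.hypot(xn, yn), np.arctan2(yn, xn)

def zernike_moments(img: np.ndarray, order: int) -> dict:
    """Z_nm of Equation (4) for all n <= order, over the unit disk."""
    rho, theta = unit_disk_coords(img.shape)
    mask = rho <= 1.0  # the moments are defined inside the unit disk only
    moments = {}
    for n in range(order + 1):
        for m in range(-n, n + 1):
            if (n - abs(m)) % 2 == 0:
                V = radial_poly(n, m, rho) * np.exp(1j * m * theta)
                moments[(n, m)] = (n + 1) / np.pi * np.sum(img * np.conj(V) * mask)
    return moments

def reconstruct(moments: dict, shape, order: int) -> np.ndarray:
    """f_hat of Equation (5) from the moments up to the given order."""
    rho, theta = unit_disk_coords(shape)
    f_hat = np.zeros(shape, dtype=complex)
    for (n, m), Z in moments.items():
        if n <= order:
            f_hat += Z * radial_poly(n, m, rho) * np.exp(1j * m * theta)
    return f_hat.real * (rho <= 1.0)
```

For a real-valued image the moments with negative repetition are complex conjugates of their positive counterparts, so a 25th order transform yields roughly 200 independent complex moments, matching the count quoted in Section 4.4.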
Figure 3 illustrates the results of the forward transform followed by a back transform using different orders of the Zernike transform. Image 3/a shows the original axial slice, while 3/b, 3/c, and 3/d show the reconstructed images using the 5th, 25th, and 50th order transform, respectively. Applying a higher order transform captures more details but rapidly increases the number of moments, therefore the order of the transform shall be limited. Based on visual inspection of MRI images the 25th order of the transform was selected, which provides a visually recognizable reconstructed image while the number of the provided moments is still manageable.

4.4. Principal Component Analysis
PCA [24] is a mathematical procedure that uses an orthogonal transformation to map a series of observations of correlated variables into a set of uncorrelated variables. These variables are called principal components. PCA is used for dimension reduction while the variance of the samples is preserved as much as possible. The procedure is defined in a way that the first principal component has the highest variance and each succeeding component has the largest possible variance under the constraint that it has to be orthogonal to the previous components.

Although the Zernike transform captures all the necessary information, it still provides nearly 200 complex moments. This number has to be reduced to get a feature vector that can be handled by a machine learning tool. The transformation matrix of the PCA is calculated on the training data set, as this calculation is part of the training process. Later, this pre-calculated transformation is applied in the recognition phase to reduce the size of the output of the Zernike transform to approximately one tenth of its
original size. In this work the WEKA [25] data mining tool was used to perform the calculation of the PCA transform. By applying this two-step feature extraction process, a compact representation can be obtained for each axial slice of an MRI scan. Using this representation, a machine learning tool can be trained to assign an anatomy label to each axial slice of a 3D image.

4.5. Machine Learning
In many image recognition works various machine learning tools are applied to classify images based on extracted features. In this work the Random Forest classifier [26] was used. This classifier creates multiple, independently trained decision trees. It was shown in [26] that this composition of decision trees has better generalization ability than classical decision trees, thanks to the randomness applied during the training process. Supervised, discriminative classification methods have been successfully applied in many fields of medical image processing, like tumor detection [27] or organ localization [28]. Support Vector Machines (SVM) [29], AdaBoost [30] and Probabilistic Boosting Trees [31] are well-known and proven methods for these tasks. However, it has been shown that random forest performs better in multi-class problems than SVMs and is more effective than boosting [32] [33]. The random forest has another important property that can be utilized in this work: it does not only assign a label to each axial slice, but also gives the confidence of its decision. As the random forest consists of multiple decision trees, each tree assigns a label to each axial slice. The distribution of this labeling shows what percentage of the trees voted for each possible label, which can be interpreted as a confidence value for each label. This confidence distribution of the labels is used in the further steps of the proposed algorithm to correct the errors in the result of the initial classification. As all these properties of the random forest were favorable in this work, this technique was selected as the machine learning tool. An implementation of random forest is also available in the WEKA toolset [25].
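The original pipeline was built in WEKA; an equivalent sketch with scikit-learn (an assumption of this illustration, together with the dummy data) looks as follows, using the 50 components and 500 trees reported in the Appendix.

```python
# Sketch of the PCA + random forest stage with scikit-learn. Feature extraction
# is assumed to yield one Zernike-magnitude vector per axial slice.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestClassifier
from sklearn.pipeline import make_pipeline

LABELS = ["LEG", "PELVIS", "ABDOMEN", "CHEST", "HEAD&NECK"]

def build_classifier():
    return make_pipeline(
        PCA(n_components=50),                      # feature vector size reduction
        RandomForestClassifier(n_estimators=500),  # slice classifier
    )

# X_train: (num_slices, num_zernike_features); y_train: label index per slice.
rng = np.random.default_rng(0)
X_train, y_train = rng.normal(size=(1000, 182)), rng.integers(0, 5, 1000)  # dummy data
clf = build_classifier().fit(X_train, y_train)

# For each slice the forest returns one probability per label: the fraction of
# trees voting for that label. These are the confidence values used later by
# the post-processing step.
confidences = clf.predict_proba(rng.normal(size=(3, 182)))
print(confidences.shape)  # (3, 5)
```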
5. Training the Classifier
To calculate the parameters of the PCA and to train the random forest classifier, the following training process was performed (Figure 4). First, the image database is randomly split into two parts. The smaller part (including 22 patients) is used to train the system, while the larger part (including 27 patients) is used for the evaluation of the trained system. The training process estimates the parameters of the PCA and the random forest and provides a model that is used to label the test data. The initial classification is obtained by assigning the most likely label to each slice. After the labeling is performed, its result is compared with the manually assigned reference labeling. The result of the comparison is displayed using a confusion matrix. To obtain a more accurate picture of the overall accuracy of the system, the above described training process was repeated ten times, with different random divisions of the original image database. This way the training and the evaluation steps were performed multiple times and a comprehensive overview was obtained about the accuracy of the classifier using different parts of the dataset. The confusion matrix was generated for each run of the repeated training/testing process and all of them were aggregated into a common confusion matrix, which can be seen in Table 3.
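A compact sketch of this protocol is given below, assuming the data are available as per-patient feature matrices and label vectors; `build_classifier` stands for the PCA + random forest pipeline of Section 4.

```python
# Sketch of the evaluation protocol: ten repetitions of a random 22/27 patient
# split, with the per-run confusion matrices summed into one aggregate matrix.
# (Note: scikit-learn's confusion matrix has rows = ground truth, which is the
# transpose of the layout used in Table 3.)
import numpy as np
from sklearn.metrics import confusion_matrix

def evaluate(patients, build_classifier, n_repeats=10, n_train=22, seed=0):
    rng = np.random.default_rng(seed)
    total = np.zeros((5, 5), dtype=int)
    for _ in range(n_repeats):
        idx = rng.permutation(len(patients))
        train, test = idx[:n_train], idx[n_train:]
        X_tr = np.vstack([patients[i]["X"] for i in train])
        y_tr = np.concatenate([patients[i]["y"] for i in train])
        clf = build_classifier().fit(X_tr, y_tr)
        for i in test:
            pred = clf.predict(patients[i]["X"])
            total += confusion_matrix(patients[i]["y"], pred, labels=list(range(5)))
    return total
```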
6. Accuracy of the Initial Slice Classification
Table 3 represents the summarized confusion matrix based on the ten repetitions of the training and the evaluation steps. The columns of the table specify the known ground-truth labeling, while the rows represent the labeling assigned by the trained system. To assess the performance of the system the following metrics were used. For each anatomy label the class precision (last column) was calculated as the number of correctly labeled slices divided by the number of all slices that are associated with that specific label by the proposed method. The class recall (last row) metric is defined as the number of correctly labeled slices for a specific label class divided by the total number of slices belonging to that specific class according to the ground-truth labeling. The overall accuracy (black cell in the last row) is defined as the ratio of the number of correctly labeled slices (belonging to any label) to the total number of slices in all classes.
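These three metrics follow directly from the confusion matrix; a minimal sketch, assuming the row-predicted/column-ground-truth layout of Table 3:

```python
# Per-class precision and recall plus overall accuracy from a confusion matrix
# whose rows are predicted labels and whose columns are ground-truth labels.
import numpy as np

def table_metrics(cm: np.ndarray):
    correct = np.diag(cm).astype(float)
    precision = correct / cm.sum(axis=1)  # per predicted-label row
    recall = correct / cm.sum(axis=0)     # per ground-truth column
    accuracy = correct.sum() / cm.sum()
    return precision, recall, accuracy
```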
Table 3 demonstrates the summarized accuracy of the random forest classifier measured on the test sets. According to the table the overall accuracy was 91.49%, which means that 91.49% of the slices involved in the test sets were correctly labeled. The class recall was the highest (98.16%) for HEAD&NECK, which means that 98.16% of the slices which belong to the head according to the ground truth were labeled as HEAD&NECK by the initial classification. The class precision was the highest (98.46%) for HEAD&NECK, too. It means that 98.46% of the slices which were classified as HEAD&NECK did actually belong to the head (according to the ground truth). This means the head region was recognized with the highest confidence. In contrast to HEAD&NECK, the PELVIS is recognized with the least confidence (see 85.72% recall and 83.21% precision). The main diagonal of the table contains the correctly labeled slices for each class. The elements directly above and below the diagonal represent slices which were confused with a neighboring region, while the remaining elements represent large confusions in the initial labeling. The first type of confusion error is not considered a major problem, because it usually means a border shift between the neighboring regions or some alternating labels near the region borders. The second type of confusion is considered a greater mismatch, which means the classifier completely missed the labeling (e.g. 74 LEG slices were labeled as CHEST by the initial classification). This effect can be caused by several factors. As the random forest classification is a machine learning tool that performs statistical operations, it is possible that the generated model cannot handle all feature vector occurrences correctly. Another factor that should be considered is the variation of images, including image noise. Figure 5 shows an example where the exam was merged from multiple sub-scans and the original borders contain high-level noise, therefore the initial classification fails to recognize those regions. The different colors represent the different anatomy regions (yellow – HEAD&NECK, purple – CHEST, green – ABDOMEN, red – PELVIS, and blue – LEG). The left side of the image contains a coronal view of the exam, showing the merging artifacts, and on the right side the confidence values provided by the random forest are plotted.

Two other examples of the above-mentioned errors are illustrated in Figure 6. The left side of both examples shows the initial labeling (based on maximum confidence) and the confidence values provided by the random forest are plotted on the right side. As can be seen, even the most accurate classification result (6/a) includes some outliers (some slices in the middle of the ABDOMEN are labeled as PELVIS). This indicates that the independent processing of slices, even if the overall accuracy seems high enough, cannot be reliable, as it does not incorporate the neighborhood of the slices. When 6/b is considered, the results are more confusing. Although the majority of slices are correctly classified, the result contains several misclassified slices (e.g. chest detected in the pelvis). Some labels are confused with labels of other regions, so such a result cannot be considered useful information in clinical applications. The results summarized in Table 3 and the observations illustrated in Figure 6 led to the recognition of the necessity of a post-processing step that takes the neighborhood of the slices as well as the normal size of anatomy regions into consideration to assign correct labeling to an image series.
7. Establishing Coherent Labeling
Up to this point the slices are handled independently by the proposed method. This approach leads to some errors which make the recognition process less reliable. These errors are corrected by a post-processing step described in this section. This step incorporates some a priori information about the human anatomy in the recognition process to establish a coherent final labeling. The first and most obvious fact that is exploited is the order of the anatomical regions. The labels can follow each other only in the right order: LEG, …, LEG, PELVIS, …, PELVIS, ABDOMEN, …, ABDOMEN, CHEST, …, CHEST, HEAD&NECK, …. This order cannot be changed as it is a trivial property of the human body. The second consideration is that the size of the anatomy regions cannot be arbitrarily large or small. It can vary among patients, but each region has its expected length and variance (except for the leg, which can be fully or partially scanned). The post-processing can incorporate these lengths to estimate the final labeling. The mean size of each region was calculated based on the manually labeled test database, which is large enough to give a good estimation of these statistical values. The mean region sizes are displayed in Table 1.
The third element to incorporate is the reliability of the initial classifier. The previously introduced training and evaluation process provides information about the class precision values (see Table 3), but these values cannot be used directly in the post-processing: they are measured on test sets, so introducing this information in the post-processing would cause a bias in the final evaluation. However, it is still reasonable to obtain information about the reliability of the base classifier, and therefore an out-of-bag error estimation [34] was used during the training. This allows getting unbiased reliability information without significant modification of the framework. Based on the result of the out-of-bag error estimation the following order of reliability was defined: HEAD&NECK (most reliable), LEG, CHEST, PELVIS, and ABDOMEN (least reliable). Incorporating the above-mentioned set of a priori information, the proposed post-processing consists of three main steps. In the first step a set of possible region configurations is generated based on the result of the initial classification. Then, the candidate regions, represented by Gaussian curves, are fitted to the confidence values using a dynamic optimization technique. When the optimization is done for each possible region combination, the best fitting region combination is used to define the final labeling of the input image.

7.1. Selection of Possible Region Configurations
In the first step of the post-processing, those regions are selected which are detected by the random forest classifier with high confidence. This step has a very important role in case of partial exams (when not the entire body is acquired). First, the size of each detected region is computed using the detection confidence as a weight. More specifically, the following formula is computed for each region type $i$:

$$S_i = \sum_{k} c_i(k)\; t_k\; [\,c_i(k) > c_{min}\,] \qquad (6)$$

where $S_i$ is the confidence-weighted region size, $c_i(k)$ is the confidence value assigned by the Random Forest (representing the percentage of the decision trees that voted for region $i$ in case of axial slice $k$), and $t_k$ is the physical thickness of slice $k$ (this value is known from the DICOM metadata tags and is usually the same for all slices). $[\,c_i(k) > c_{min}\,]$ defines in turn a limit on the minimal confidence value that is considered a valid recognition. Confidence values below $c_{min}$ are not considered in the size estimation. This part of the equation is used to filter out the small confidence values that could sum up and cause uncertainty in the size estimation. Such small confidences can be generated by the random forest classifier and are considered small noise in the recognition process. After computing Equation 6 the regions are processed in the following order: HEAD&NECK, LEG, CHEST, PELVIS, ABDOMEN. The first region that exceeds the predefined detection limit is selected as the starting region and added to the list of candidate regions. The order of candidate region selection was derived from the out-of-bag estimation, while the detection limit represents the minimal size at which a region is expected to be detected. After the starting point of the region configuration is selected, the neighboring regions are checked to see if their weighted size is above the detection limit. If a neighboring region has the required size, it is added to the region configuration. Then, the search continues with the further neighboring regions. This process continues until the appropriate number of regions is selected. As the physical length of the image series is known, the number of regions in the input image can be estimated. Based on the manually labeled image database the average physical size of an anatomy region is $\bar{S}$. Therefore the number of regions to be detected in an image is $\lfloor L / \bar{S} \rfloor$, where $L$ is the physical size of the input image.
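A sketch of this selection logic is given below; the confidence floor `c_min` and the detection limit are illustrative values, since the paper does not state the exact thresholds.

```python
# Sketch of the candidate-region selection of Section 7.1. conf holds the
# random forest confidences (num_slices x 5); thickness is the slice thickness
# in millimeters.
import numpy as np

ORDER = ["LEG", "PELVIS", "ABDOMEN", "CHEST", "HEAD&NECK"]        # anatomical order
RELIABILITY = ["HEAD&NECK", "LEG", "CHEST", "PELVIS", "ABDOMEN"]  # out-of-bag order

def weighted_sizes(conf, thickness, c_min=0.3):
    """Equation (6): thresholded confidences summed and scaled to millimeters."""
    return {name: float(np.sum(conf[:, i] * thickness * (conf[:, i] > c_min)))
            for i, name in enumerate(ORDER)}

def candidate_regions(sizes, image_len_mm, mean_region_mm, limit_mm=50.0):
    """Grow a contiguous region configuration around the most reliable seed."""
    n_regions = int(image_len_mm // mean_region_mm)
    seed = next((r for r in RELIABILITY if sizes[r] > limit_mm), None)
    if seed is None:
        return []
    chosen = [ORDER.index(seed)]
    while len(chosen) < n_regions:
        nbrs = [j for j in (min(chosen) - 1, max(chosen) + 1)
                if 0 <= j < len(ORDER) and sizes[ORDER[j]] > limit_mm]
        if not nbrs:
            break
        chosen.append(max(nbrs, key=lambda j: sizes[ORDER[j]]))
    return [ORDER[j] for j in sorted(chosen)]
```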
After selecting a possible region configuration, two additional configurations are generated from the original one, by leaving out each terminal region from the original configuration. This is important when the original image series contains only a few slices of a terminal region and the overall accuracy of the labeling would be better if the terminal region were eliminated. This way, multiple region configurations are obtained. In the next step of the post-processing, these configurations are evaluated in order to see which one fits the confidence values of the initial classification best.

7.2. Fitting the Possible Configurations to the Confidence Curve
In this step, each configuration is fitted to the confidence distribution one by one. Each region in a configuration is associated with a Gaussian distribution function. For each region the variance is set according to the expected size of the region and the center position is set based on the confidence-
weighted center of gravity belonging to the appropriate confidence distribution. An example of this setting is illustrated in Figure 7. The background bar graph represents the confidences resulting from the initial classification. Each slice is represented by a column in the chart and the height of the bars shows what percentage of the decision trees of the Random Forest classified the slice as a member of a specific region (using the earlier introduced colors). The regions in the selected configuration are represented by Gaussian distribution functions having the same color as the corresponding confidence distribution. The vertical orange lines show the intersection points of the Gaussian distributions. These lines represent the borders of the regions. The vertical black lines show the region borders defined by the ground-truth labeling. To perform the optimization, the positioning of the Gaussian functions should be refined. This is driven by the following three main considerations:
a) the curves should maximize the sum of the correctly covered confidence values,
b) the curves should minimize the sum of the incorrectly covered confidence values,
c) the curves should minimize the overlap between each other.
The first two considerations assume that the initial classification provided by the random forest is mainly correct, therefore the Gaussian curves should maintain the main aspects of the initial classification. To align the curves to the confidence values, two forces are introduced. The first force (a) attracts the curves to the center of gravity of the corresponding confidence values. The second force (b) repulses the curves from the non-matching confidence values and minimizes the overlap. The third point (c) is introduced to prevent the change of the sequence of the curves during the iterative optimization process. To formalize point (a), the attraction force that stabilizes the position of the Gaussian distribution function, Equation 7 is presented:

$$\vec{F}_i^{att} = \sum_{k=1}^{K} \frac{\vec{v}_{ik}}{|\vec{v}_{ik}|}\; c_i(k) \qquad (7)$$

where $\vec{F}_i^{att}$ is the summarized attraction force on the Gaussian distribution function of region $i$, $\vec{v}_{ik}$ is a vector from the center of the distribution function to the $k$-th slice, $c_i(k)$ is the same as in (6), and $K$ is the number of axial slices in the input image. The second force, point (b), which affects the movement simulation, is a repulsion force generated based on the confidence distributions; in this case the Gaussian distribution function of a region is pushed away by the confidence values belonging to the other regions. This is expressed by Equation 8:

$$\vec{F}_i^{rep} = \sum_{k=1}^{K} \left( -\frac{\vec{v}_{ik}}{|\vec{v}_{ik}|}\; G_i(k) \left( \sum_{j=1}^{i-1} c_j(k) + \sum_{j=i+1}^{R} c_j(k) \right) \right) \qquad (8)$$

where $\vec{F}_i^{rep}$ is the summarized repulsion force on the Gaussian distribution function of region $i$, $G_j(k)$ is the value of the $j$-th Gaussian function at the position of the $k$-th axial slice, and $R$ is the total number of regions. The third force, point (c), which affects the movements, is the repulsion between the Gaussian distributions. This is defined by Equation 9:

$$\vec{F}_i^{over} = \sum_{j=1}^{i-1} \frac{\vec{u}_{ji}}{|\vec{u}_{ji}|} \left( \sum_{k} G_i(k)\, G_j(k) \right)^2 + \sum_{j=i+1}^{R} \frac{\vec{u}_{ji}}{|\vec{u}_{ji}|} \left( \sum_{k} G_i(k)\, G_j(k) \right)^2 \qquad (9)$$

where $\vec{F}_i^{over}$ is the summarized repulsion on the Gaussian distribution function of region $i$ exerted by the other distribution functions, $\vec{u}_{ji}$ is a vector from the $j$-th Gaussian to the $i$-th Gaussian distribution, and $\left(\sum_k G_i(k)\, G_j(k)\right)^2$ is the square of the total overlap between the Gaussian distributions, calculated over the whole image series.
labeling, the order of the regions is correct and the size of each region approximates well the expected value. 7.3. Selecting the Best Configuration After the optimization process is calculated for all region configurations, the best fitting configuration is selected. This is done by calculating a reliability factor that incorporates both the size of the included regions and the correspondence between the original confidence and the fitted Gaussian distributions. To measure the correctness of the region sizes Equation 10 is used.
7.3. Selecting the Best Configuration
After the optimization process is performed for all region configurations, the best fitting configuration is selected. This is done by calculating a reliability factor that incorporates both the size of the included regions and the correspondence between the original confidence and the fitted Gaussian distributions. To measure the correctness of the region sizes, Equation 10 is used. $E_i$ refers to the difference between the estimated size of region $i$ and the expected size of that region; $\mu_i$ is the average size of region $i$, $\sigma_i^2$ is its variance, and $S_i$ is the calculated region size. The average size and its variance are calculated on the image database (see Table 1). $G(x;\, \mu_i,\, \sigma_i^2)$ is the value of a Gaussian distribution with expected value $\mu_i$ and variance $\sigma_i^2$, where $x$ is the position at which this distribution is evaluated.

$$E_i = G(S_i;\, \mu_i,\, \sigma_i^2) \qquad (10)$$
The other component of the reliability measurement is the fit to the original confidence values. This is formulated in Equation 11, where $l(k)$ denotes the label assigned to slice $k$ by the optimization. The formula consists of two parts. In the first sum those slices are considered which are not labeled as part of region $i$ after the optimization process; this is expressed by $[\,l(k) \ne i\,]$. Among these slices only those are used which are labeled as part of the region by the Random Forest; this is expressed by $[\,\arg\max_j c_j(k) = i\,]$. Then, the confidence values of these slices are summed. This part of the equation summarizes the confidence values of slices which are initially labeled as part of region $i$, but are later modified by the optimization process. The second part of the formula is the opposite of the first part. It summarizes the confidence values of those slices which are originally not part of the region, but became members of it during the optimization. Therefore these two parts summarize the differences between the decision of the Random Forest and the decision of the optimization process.

$$D_i = \sum_{k} c_i(k)\,[\,\arg\max_j c_j(k) = i\,]\,[\,l(k) \ne i\,] \;+\; \sum_{k} c_i(k)\,[\,\arg\max_j c_j(k) \ne i\,]\,[\,l(k) = i\,] \qquad (11)$$
The final reliability score is calculated by the multiplication of these two components, as expressed by Equation 12. Based on the $R$ factor, the possible region configurations are sorted in descending order and the first one is chosen as the final labeling.

$$R = \sum_{i} E_i \cdot D_i \qquad (12)$$
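Under the reading of Equations (10)-(12) reconstructed above, the configuration score can be sketched as follows; the exact combination used in the original implementation may differ.

```python
# Sketch of the configuration reliability score: a Gaussian size prior per
# region (Equation 10) combined with the confidence-fit term (Equation 11) and
# summed over the regions of the configuration (Equation 12).
import numpy as np

def gaussian_pdf(x, mu, sigma):
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

def reliability(conf, labels, regions, mu_size, sigma_size, sizes):
    """conf: (num_slices, 5); labels: post-processed label per slice;
    mu_size/sigma_size/sizes: per-region dicts in millimeters."""
    forest = conf.argmax(axis=1)  # initial per-slice decision of the forest
    score = 0.0
    for r in regions:
        E = gaussian_pdf(sizes[r], mu_size[r], sigma_size[r])   # Equation (10)
        D = (conf[(forest == r) & (labels != r), r].sum()       # Equation (11)
             + conf[(forest != r) & (labels == r), r].sum())
        score += E * D                                          # Equation (12)
    return score
```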
8. Evaluation for whole-body images
The proposed algorithm was evaluated using the same test framework introduced in Sections 5 and 6. The corresponding confusion matrix resulting from this evaluation is displayed in Table 4. According to the results the overall accuracy increased to 94.13% and the most important issue of the initial labeling (confusion with a non-neighboring region) was completely eliminated. The confusion matrix has nonzero values only in the main diagonal (94.13%) and in the cells directly below and above the diagonal (5.87%). This means the majority of the slices are correctly labeled and there are confusions only between neighboring regions. There were no serious mistakes, in contrast to Table 3, where 0.96% of the slices were significantly misclassified. As the correct sequence of the labels is always ensured in the result, the confusions between neighboring regions mean only small displacements of the borders between two regions. Figure 9 illustrates the accuracy improvement for one selected case. It shows both the initial classification and the effect of the post-processing method. As can be seen, the most serious misclassifications were corrected, the correct sequence of the labels was established, and all detected regions have a reasonable size and fit the real anatomical structure of the human body. Some further examples are shown in Figure 10. This figure shows results for a few full body MRI cases which demonstrate normal patient variation (concerning sex and level of obesity).
9. Evaluation on partial body images
Since the acquisition of full body scans is not frequent in the clinical practice (especially in case of the MRI modality), the method was tested on partial images as well. The test images were created from the full body scans using sliding-window sampling. With this technique, different samples were cut out from the whole-body scans at different locations with different sample sizes. The sampling size varied from 350 mm to 1150 mm (with 100 mm steps), and all possible subseries were generated. This means that altogether 36,681 different partial body scans were generated, which contained 10,513,282 slices. To perform these tests, the system was trained in the same way as with full body scans, but this time the training process was not repeated multiple times; only one random partition of the image database was used to train the system. Table 5 summarizes the accuracy of the anatomy labeling measured on the partial body scans. The numbers show the overall accuracy was 91.27%, which is somewhat lower than in the case of whole body scans. The table shows that the majority of the slices are correctly labeled, there were confusions only between neighboring regions (8.73%), and there were no serious misclassifications. To show a more detailed picture of the accuracy, a graph is presented in Figure 12. The continuous black line represents the average overall accuracy rate, the dotted line shows the minimal, and the dashed line shows the maximal accuracy rate belonging to test images having the same length. The diagram shows that the worst accuracy was observed for a 350 mm scan, which had 40% labeling accuracy; but, as can be seen in Table 5, this does not mean a complete mislabeling, only a larger displacement of region boundaries. Note also the high average accuracy that is nearly 90% for all cut sizes, which means the majority of the partial scans are correctly labeled. It is also remarkable that the minimal accuracy rises with increasing length of the partial scan, and exceeds 90% when the partial scan is longer than 1150 mm. This effect is due to the information content of the initial classification. As the length of the exam rises, the post-processing has more and more information to estimate the correct labeling and to eliminate the false classifications of the initial classification. To further analyze the accuracy of the results, Table 6 is presented. This table shows the same statistical values seen in Figure 12, extended with the standard deviation of the overall accuracy for each cut size. The low deviation values (<10%) indicate that the majority of the results are very close to the average accuracy and only a few mislabeled results belong to the minimal accuracy values. An accuracy drop can be observed at 750 mm in the minimum accuracy curve. This can be explained by the discrete decisions made during the optimization process and the statistical variations in the random forest classification. Considering that the average region size is 215 mm, a partial scan of 750 mm length can contain three or four anatomical regions. This discrete decision has to be made in the post-processing and can have an impact on the final accuracy. However, this accuracy drop does not influence the tendency, as the average accuracy at 750 mm is 91.68%; only the variance is larger (5.21%) compared to the other values. To illustrate the effect of the discrete decisions in the post-processing, Figure 11 is presented. It shows a special case where two cuts differ slightly in position (by a few cm). In the left image the leg region is included, but the right image does not contain this region. This few-centimeter difference causes a drop in the confidence-weighted size of the leg region, therefore it falls below the detection limit in the right image. To illustrate the reliable results on partial scans, Figure 13 is presented. It contains examples with different cut sizes and different patient data.
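For reference, the sliding-window sampling described above can be reproduced with a few lines; the slice-index rounding is an assumption of this sketch.

```python
# Every contiguous sub-series of 350-1150 mm (100 mm steps) is cut from each
# whole-body scan; window lengths are converted to slice counts via the
# scan's slice thickness.
def partial_scans(num_slices: int, thickness_mm: float):
    """Yield (first_slice, last_slice) index pairs for all window positions."""
    for window_mm in range(350, 1151, 100):
        w = int(round(window_mm / thickness_mm))  # window length in slices
        for start in range(0, num_slices - w + 1):
            yield start, start + w

# e.g. a 447-slice scan with 5.79 mm slices (the data set averages):
print(sum(1 for _ in partial_scans(447, 5.79)))
```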
10. Examples for pathological cases
In the clinical practice the acquisition of pathological cases is part of the daily examination routine. The method shall be prepared for, and robust in, the recognition of such cases. Figure 14 includes some examples from the data set containing pathological cases. In each case the right image shows the confidence values returned by the random forest for the whole-body scan, in the middle a coronal view is presented showing the labeling provided by the post-processing considering only the critical region, and the third image shows an axial view with the pathological lesion. In Figure 14/(a) the right lung is filled with fluid, in (b) there is a tumor in the left lung, and in image (c) the MRI scan is distorted by an implant next to the spine. It can be observed that the confidence values are stable despite the presence of abnormalities, and the whole classification workflow can produce reliable classifications.
11. Comparison with other methods
To compare the results, two other potential solutions were tested. In a previous paper [35] a graph-based approach was used, similar to Nakamura's work [10], to estimate the labeling. This solution worked well for whole body exams but failed in case of partial scans. This can be explained by the applied graph structure and the dynamic programming method. In Figure 15 an example is illustrated where the PELVIS region collapsed to one slice. The method could not guarantee the correct size of the anatomical regions. Another straightforward concept could have been the application of Hidden Markov Models (HMM) [36] in the post-processing. However, this method also suffers from the problem of correctly estimating the size of the anatomical regions. Tests showed examples where the proposed method outperforms the HMM-based post-processing. Figure 16 shows an example where the HMM-based approach (middle image) extended the PELVIS region too much and compressed the ABDOMEN region, while the proposed method (right image) gave a correct estimation of the anatomical regions compared to the manual labeling (left image). As the HMM approach is based on a graph structure as well, these results are understandable. All the graph-based methods can take into consideration only a limited 3D spatial neighborhood, therefore they can make only local decisions. This limits their usability, and a global optimization is necessary to find an acceptable solution.

To show a numerical comparison, the work of Nakamura [10] was chosen. Although their method was developed for CT images, it was chosen for comparison because they use a similar body subdivision to the one presented in this work and their accuracy measurement was also the same. According to the results (displayed in Table 7) the accuracies characterizing Nakamura's method and the proposed method are very similar. Nakamura's method performs better in the identification of the CHEST (97.3%), the ABDOMEN (93.1%) and the LEG (99.8%) regions, while the method proposed in this paper was better in the recognition of the HEAD&NECK (98.6%) and PELVIS (94.7%) regions. Other related publications detect organs or anatomy landmarks. In publications [8] and [9] the authors reported localization errors between 16.6 mm and 28.3 mm. Since the detection of anatomy regions can be considered as the localization of region boundaries, the displacement of the region boundaries can also be considered a localization error. For the sake of comparison with other types of methods, the average region boundary displacement was calculated for whole-body examinations (see Table 8). Based on the results the most reliably detected boundary was between the CHEST and the ABDOMEN (average error 10.59 mm), and the least reliably detected boundary was between the ABDOMEN and the PELVIS (average error 24.07 mm). The precision of the least reliable boundary was most likely limited by the basic image feature extraction method. As the Zernike transform describes the shape properties of the axial slices, it can capture only a limited amount of difference between the ABDOMEN and the PELVIS regions. Considering other neighboring regions (e.g. HEAD and CHEST) the difference in shape is more significant, therefore the transform allows a more precise border localization. The average error considering all boundaries was 19.78 mm with 12.28 mm standard deviation, which represents the same level of precision demonstrated by other methods for detecting anatomy landmarks. It is important to emphasize a main feature of the proposed algorithm: the presented method can be applied to partial body scans. This is important because partial images are commonly acquired in the clinical practice. Nakamura's method uses a graph-based post-processing method that can ensure the correct ordering of the labels, but cannot incorporate the actual physical size of the anatomy regions, therefore its applicability is rather limited in case of partial body scans. In contrast, the proposed approach applies a post-processing method which ensures the correct and consistent recognition of the anatomical structures even if some of the regions are missing from the image. Most of the related publications [6], [7], [10], [11] assume that the anatomy structure to be detected is always found in the image, and the presented methods were not tested in cases where this assumption is not true.
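For completeness, the boundary-displacement metric used in this comparison can be sketched as follows, assuming per-slice label arrays for the prediction and the ground truth.

```python
# Boundary displacement in mm between predicted and ground-truth labelings.
import numpy as np

def boundary_positions(labels: np.ndarray) -> np.ndarray:
    """Slice indices where the label changes (region borders)."""
    return np.flatnonzero(np.diff(labels)) + 1

def boundary_errors_mm(pred: np.ndarray, truth: np.ndarray, thickness: float):
    pb, tb = boundary_positions(pred), boundary_positions(truth)
    # Assumes the same region sequence in both labelings, which the
    # post-processing of Section 7 guarantees for whole-body scans.
    return [abs(p - t) * thickness for p, t in zip(pb, tb)]
```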
12. Conclusions
This work presents a new method for automated detection of anatomy regions in 3D medical images. The presented approach uses the Zernike transform and PCA to extract feature vectors from the axial slices of a 3D medical image, and classifies them using a random forest classifier. The result of the initial classification is further processed by a 3D coherence inspection method to eliminate false labeling. According to the evaluation on MRI exams, the proposed method works well for whole-body images; the final labeling follows the correct order of the anatomical regions of the human body and provides anatomy regions with reasonable size. An important contribution of this work is the applicability of the method to partial body scans, which was not considered by previous works. The presented algorithm does not rely on the imaging properties of the MRI modality, except for the intensity normalization in the pre-processing step, therefore the method can be applied, after proper training, to other image protocols or modalities such as CT. The adaptation of the method to other modalities is one of the most important future directions of the presented work. Another important future direction is the adaptation of the method to partial FOV images (when not the entire cross section of the body is visible in the axial slices), which will make the method applicable to facilitate a wide range of algorithms including visualization, segmentation, and registration. This way it can be a good basis for content-based automated medical image processing.
13. Acknowledgments
The dataset is from the University Hospital Zurich. Special thanks to Dr. Patrick Veit-Haibach and Gaspar Delso for making the dataset available for us. This work was supported by the research projects OTKA K-101527, VKSZ-14 SCOPIA and the Analitic Healthcare Quality User Information Program of the National Research, Development and Innovation Fund, Hungarian Government, Grant VKSZ 12-1-2013-0012.

Appendix – The parameters of the machine learning tools
Table A.1 summarizes the achieved initial accuracy values using different numbers of PCA components and trees in the random forest. It can be seen from the table that the achieved accuracy highly depends on the number of trees. This is plausible, as the number of trees determines the generalization ability of the forest. However, it can also be seen in Table A.2 that the required training time increases rapidly with the number of trees. By applying a PCA transform this training time can be decreased significantly and even the initial accuracy can be increased slightly. This increase can originate from the dimension reduction. As the 25th order Zernike transform performs an orthogonal projection, the orthogonal components of the PCA can represent the data set similarly while reducing the weight of the poorly separating components. This way the random forest does not have to handle complex feature vectors and an optimal representation can be achieved. In this work 50 PCA components and 500 trees in the random forest were used.
References

[1] O. A. Jiménez del Toro, O. Goksel, B. Menze, H. Müller, G. Langs, M.-A. Weber, I. Eggel, K. Gruenberg, M. Holzer, G. Kontokotsios, M. Krenn, S. Fernandez, R. Schaer, and A. A. Taha, "VISCERAL — VISual Concept Extraction challenge in RAdioLogy: ISBI 2014 challenge organization," in Proceedings of the VISCERAL Challenge at ISBI, 2014, no. 1194, pp. 6–15.
[2] M. O. Gueld, M. Kohnen, D. Keysers, H. Schubert, B. B. Wein, J. Bredno, and T. M. Lehmann, "Quality of DICOM header information for image categorization," in Proceedings of SPIE, vol. 4685, 2002, pp. 280–287.
[3] C. Rosse, J. L. Mejino, B. R. Modayur, R. Jakobovits, K. P. Hinshaw, and J. F. Brinkley, "Motivation and organizational principles for anatomical knowledge representation," J. Am. Med. Informatics Assoc., vol. 5, no. 1, pp. 17–40, 1998.
[4] F. Florea, H. Müller, and A. Rogozan, "Medical image categorization with MedIC and MedGIFT," Med. Informatics Eur., pp. 3–11, 2006.
[5] A. Criminisi, D. Robertson, E. Konukoglu, J. Shotton, S. Pathak, S. White, and K. Siddiqui, "Regression forests for efficient anatomy detection and localization in computed tomography scans," Med. Image Anal., vol. 17, no. 8, pp. 1293–1303, Dec. 2013.
[6] B. Haas, T. Coradi, M. Scholz, P. Kunz, M. Huber, U. Oppitz, L. André, V. Lengkeek, D. Huyskens, A. van Esch, and R. Reddick, "Automatic segmentation of thoracic and pelvic CT images for radiotherapy planning using implicit anatomic knowledge and organ-specific segmentation strategies," Phys. Med. Biol., vol. 53, no. 6, p. 1751, 2008.
[7] S. Seifert, A. Barbu, S. K. Zhou, D. Liu, J. Feulner, M. Huber, M. Suehling, A. Cavallaro, and D. Comaniciu, "Hierarchical parsing and semantic navigation of full body CT data," in Proc. SPIE Medical Imaging, 2009, pp. 725902-1–725902-8.
[8] J. Feulner, S. K. Zhou, E. Angelopoulou, S. Seifert, A. Cavallaro, J. Hornegger, and D. Comaniciu, "Comparing axial CT slices in quantized N-dimensional SURF descriptor space to estimate the visible body region," Comput. Med. Imaging Graph., vol. 35, no. 3, pp. 227–236, Apr. 2011.
[9] T. Emrich, F. Graf, H.-P. Kriegel, M. Schubert, M. Thoma, and A. Cavallaro, "CT slice localization via instance-based regression," in Proc. SPIE Medical Imaging, 2010, p. 762320.
[10] K. Nakamura, Y. Li, W. Ito, and K. Shimura, "A machine learning approach for body part recognition based on CT images," in Proc. SPIE 6914, Medical Imaging 2008: Image Processing, Mar. 2008, pp. 69141U-1–69141U-9.
[11] M. Fenchel, S. Thesen, and A. Schilling, "Automatic labeling of anatomical structures in MR FastView images using a statistical atlas," Lect. Notes Comput. Sci., vol. 5241, pp. 576–584, 2008.
[12] B. He, C. Huang, and F. Jia, "Fully automatic multi-organ segmentation based on multi-boost learning and statistical shape model search," in VISCERAL@ISBI 2015: VISCERAL Anatomy3 Organ Segmentation Challenge, 2015, p. 18.
[13] G. H. Chen, C. Wachinger, and P. Golland, "Sparse projections of medical images onto manifolds," Lect. Notes Comput. Sci., vol. 7917, pp. 292–303, 2013.
[14] C. Wachinger, D. Mateus, A. Keil, and N. Navab, "Manifold learning for patient position detection in MRI," in Proc. 7th IEEE Int. Symp. Biomedical Imaging (ISBI 2010), 2010, pp. 1353–1356.
[15] O. Pauly, B. Glocker, A. Criminisi, D. Mateus, A. M. Möller, S. Nekolla, and N. Navab, "Fast multiple organ detection and localization in whole-body MR Dixon sequences," Lect. Notes Comput. Sci., vol. 6893, part 3, pp. 239–247, 2011.
[16] A. Criminisi, D. Robertson, O. Pauly, B. Glocker, E. Konukoglu, J. Shotton, D. Mateus, A. M. Möller, S. G. Nekolla, and N. Navab, "Anatomy detection and localization in 3D medical images," in Decision Forests for Computer Vision and Medical Image Analysis, Springer, 2013.
[17] M. Hu, "Visual pattern recognition by moment invariants," IRE Trans. Inf. Theory, vol. 8, no. 2, pp. 66–70, 1962.
[18] F. Zernike, "Beugungstheorie des Schneidenverfahrens und seiner verbesserten Form, der Phasenkontrastmethode," Physica, vol. 1, no. 7–12, pp. 689–704, May 1934.
[19] A. Khotanzad and Y. H. Hong, "Invariant image recognition by Zernike moments," IEEE Trans. Pattern Anal. Mach. Intell., vol. 12, no. 5, pp. 489–497, May 1990.
[20] S. A. Bakar, M. S. Hitam, and W. N. Jawahir Hj Wan Yussof, "Investigating the properties of Zernike moments for robust content based image retrieval," in Proc. Int. Conf. Computer Applications Technology (ICCAT 2013), 2013.
[21] A. Soumelidis, Z. Fazekas, M. Pap, and F. Schipp, "Generic Zernike-based surface representation of measured corneal surface data," in Proc. 2011 IEEE Int. Symp. Medical Measurements and Applications, May 2011, pp. 148–153.
[22] D. Zhang and G. Lu, "Improving retrieval performance of Zernike moment descriptor on affined shapes," in Proc. IEEE Int. Conf. Multimedia and Expo, vol. 1, 2002, pp. 205–208.
[23] S. Belkasim, M. Shridhar, and M. Ahmadi, "Pattern recognition with moment invariants: a comparative study and new results," Pattern Recognit., vol. 24, no. 12, pp. 1117–1138, 1991.
[24] I. T. Jolliffe, Principal Component Analysis. Springer, 2002.
[25] M. Hall, E. Frank, G. Holmes, B. Pfahringer, P. Reutemann, and I. H. Witten, "The WEKA data mining software: an update," SIGKDD Explor. Newsl., vol. 11, no. 1, pp. 10–18, Nov. 2009.
[26] L. Breiman, "Random forests," Mach. Learn., vol. 45, no. 1, pp. 5–32, 2001.
[27] D. Pescia, N. Paragios, and S. Chemouny, "Automatic detection of liver tumors," in Proc. 5th IEEE Int. Symp. Biomedical Imaging (ISBI 2008), 2008, pp. 672–675.
[28] A. Criminisi, J. Shotton, and S. Bucciarelli, "Decision forests with long-range spatial context for organ localization in CT volumes," in Proc. MICCAI Workshop on Probabilistic Models for Medical Image Analysis (PMMIA), 2009.
[29] C. Cortes and V. Vapnik, "Support-vector networks," Mach. Learn., vol. 20, no. 3, pp. 273–297, 1995.
[30] Y. Freund and R. E. Schapire, "A decision-theoretic generalization of on-line learning and an application to boosting," J. Comput. Syst. Sci., vol. 55, no. 1, pp. 119–139, 1997.
[31] Z. Tu, "Probabilistic boosting-tree: learning discriminative models for classification, recognition, and clustering," in Proc. IEEE Int. Conf. Computer Vision, vol. II, 2005, pp. 1589–1596.
[32] A. Bosch, A. Zisserman, and X. Munoz, "Image classification using random forests and ferns," in Proc. IEEE 11th Int. Conf. Computer Vision (ICCV 2007), 2007.
[33] P. Yin, A. Criminisi, J. Winn, and I. Essa, "Tree-based classifiers for bilayer video segmentation," in Proc. IEEE Conf. Computer Vision and Pattern Recognition, 2007.
[34] L. Breiman, "Out-of-bag estimation," Technical Report, Statistics Department, University of California, Berkeley, 1996.
[35] M. J. Tóth, T. Blaskovics, L. Ruskó, G. Delso, and B. Csébfalvi, "Automated detection of anatomical regions in magnetic resonance images," in Proc. Vision, Modeling & Visualization, 2014.
[36] L. Rabiner and B. Juang, "An introduction to hidden Markov models," IEEE ASSP Mag., vol. 3, no. 1, pp. 4–16, Jan. 1986.
1. Table Size of the anatomical regions

| Region | Average Size (mm) | Variance |
|---|---|---|
| LEG | 179.56 | 85.31 |
| PELVIS | 206.73 | 27.19 |
| ABDOMEN | 207.83 | 38.47 |
| CHEST | 188.39 | 33.82 |
| HEAD&NECK | 234.08 | 16.78 |
2. Table Patients' statistics

| | Age (years) | Height (cm) | Weight (kg) |
|---|---|---|---|
| Min | 18 | 150 | 43 |
| Max | 91 | 194 | 127 |
| Average | 60 | 169 | 74 |
3. Table Accuracy of the initial slice classification

| | true LEG | true PELVIS | true ABDOMEN | true CHEST | true HEAD&NECK | class precision |
|---|---|---|---|---|---|---|
| pred. LEG | 18888 | 1398 | 144 | 118 | 0 | 91.92% |
| pred. PELVIS | 714 | 19564 | 2790 | 439 | 5 | 83.21% |
| pred. ABDOMEN | 50 | 1820 | 20987 | 850 | 120 | 88.08% |
| pred. CHEST | 74 | 41 | 540 | 20481 | 371 | 95.23% |
| pred. HEAD&NECK | 0 | 0 | 129 | 284 | 26420 | 98.46% |
| class recall | 95.75% | 85.72% | 85.35% | 92.37% | 98.16% | 91.49% (overall) |
4. Table Accuracy of the slice classification after establishing 3D coherent labeling

| | true LEG | true PELVIS | true ABDOMEN | true CHEST | true HEAD&NECK | class precision |
|---|---|---|---|---|---|---|
| pred. LEG | 19265 | 1257 | 0 | 0 | 0 | 93.87% |
| pred. PELVIS | 819 | 22266 | 427 | 0 | 0 | 94.70% |
| pred. ABDOMEN | 0 | 1790 | 21200 | 863 | 0 | 88.88% |
| pred. CHEST | 0 | 0 | 675 | 20220 | 612 | 94.02% |
| pred. HEAD&NECK | 0 | 0 | 0 | 378 | 26455 | 98.59% |
| class recall | 95.92% | 87.96% | 95.06% | 94.22% | 97.74% | 94.13% (overall) |
5. Table Accuracy measured on partial body images

| | true LEG | true PELVIS | true ABDOMEN | true CHEST | true HEAD&NECK | class precision |
|---|---|---|---|---|---|---|
| pred. LEG | 943614 | 88995 | 0 | 0 | 0 | 91.38% |
| pred. PELVIS | 91020 | 2447127 | 118326 | 0 | 0 | 92.12% |
| pred. ABDOMEN | 0 | 238429 | 2921987 | 60043 | 0 | 90.73% |
| pred. CHEST | 0 | 0 | 217707 | 2195947 | 20135 | 90.23% |
| pred. HEAD&NECK | 0 | 0 | 0 | 83181 | 1086771 | 92.89% |
| class recall | 91.20% | 88.20% | 89.69% | 93.88% | 98.18% | 91.27% (overall) |
6. Table Accuracy statistics for labeling partial body images

| Scan width (mm) | Min | Max | Average | Std. Dev. |
|---|---|---|---|---|
| 350 | 40.49% | 100.00% | 89.15% | 9.09% |
| 450 | 45.26% | 100.00% | 90.79% | 5.90% |
| 550 | 57.74% | 99.69% | 89.59% | 6.44% |
| 650 | 70.68% | 99.74% | 91.16% | 4.68% |
| 750 | 62.81% | 99.56% | 91.68% | 5.21% |
| 850 | 76.83% | 99.20% | 92.59% | 3.33% |
| 950 | 82.07% | 99.10% | 93.54% | 2.88% |
| 1050 | 87.03% | 99.35% | 94.72% | 2.55% |
| 1150 | 93.64% | 99.41% | 96.53% | 2.09% |
7. Table Comparison of the accuracy with another published method

Nakamura et al.:

| Body Part | Precision |
|---|---|
| HEAD | 98.9% |
| HEAD&NECK | 95.3% |
| NECK | 96.6% |
| CHEST | 97.3% |
| CHEST&ABDOMEN | 93.0% |
| ABDOMEN | 93.1% |
| PELVIS | 91.5% |
| LEG | 99.8% |

Proposed method:

| Body Part | Precision |
|---|---|
| HEAD&NECK | 98.6% |
| CHEST | 94.0% |
| ABDOMEN | 88.9% |
| PELVIS | 94.7% |
| LEG | 93.9% |
8. Table Region boundary localization error

| Boundary | Average Localization Error (mm) |
|---|---|
| HEAD→CHEST | 12.02 |
| CHEST→ABDOMEN | 10.59 |
| ABDOMEN→PELVIS | 24.07 |
| PELVIS→LEG | 17.67 |
Table A.1 Initial accuracy values considering different PCA components and number of trees in the random forest

| Number of PCA components | 50 trees | 100 trees | 150 trees | 200 trees | 300 trees | 500 trees |
|---|---|---|---|---|---|---|
| none | 85.99% | 87.94% | 89.08% | 89.48% | 89.88% | 90.28% |
| 25 | 89.16% | 89.63% | 89.39% | 89.47% | 90.15% | 90.07% |
| 50 | 88.75% | 89.72% | 89.55% | 90.05% | 91.44% | 91.49% |
| 100 | 87.85% | 88.87% | 89.63% | 89.81% | 91.01% | 91.27% |
Table A.2 Training times (s) considering different PCA components and number of trees in the random forest

| Number of PCA components | 50 trees | 100 trees | 150 trees | 200 trees | 300 trees | 500 trees |
|---|---|---|---|---|---|---|
| none | 63 | 124 | 184 | 247 | 448 | 759 |
| 25 | 26 | 50 | 95 | 101 | 183 | 314 |
| 50 | 29 | 63 | 98 | 127 | 238 | 394 |
| 100 | 42 | 83 | 125 | 165 | 302 | 511 |
1. Figure The classification workflow: Input Image → Slice Normalization → Zernike Transform → PCA Transform → Random Forest Classification (feature extraction and machine learning), followed by Three-Dimensional Coherence Inspection → Final Labeling.
2. Figure Slice normalization: the original slice (a), the result of the translation correction (b), the result of the FOV normalization (c).
3. Figure Reconstruction from different orders of the Zernike transform: the original image (a) and reconstructions from the 5th-order (b), 25th-order (c), and 50th-order (d) transforms.
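As a rough illustration of per-slice Zernike feature extraction such as that behind Figure 3 (the mahotas library, the radius choice, and the stand-in image are assumptions of this sketch, not the implementation used in this work):

```python
import numpy as np
import mahotas

# Stand-in for one normalized axial slice (assumed 256x256 after the
# FOV normalization shown in Figure 2).
slice_2d = np.random.rand(256, 256)

# Magnitudes of the Zernike moments up to order 25, matching the
# transform order chosen in the Appendix; the radius is assumed to
# cover the normalized field of view (here: half the slice width).
features = mahotas.features.zernike_moments(slice_2d, radius=128, degree=25)
print(features.shape)  # one rotation-invariant feature vector per slice
```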
4. Figure The training process: the image database is split into training and test data; training yields the PCA transform and the random forest classifier (the model), which label the test data so that the confusion matrix and the accuracy can be calculated.
5. Figure Image artifacts influencing the initial classification.
6. Figure Result of the initial classification: best classification (a), worst classification (b).
7. Figure Initialization of the region curve fitting.
8. Figure Result of the region curve fitting.
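To convey the idea behind the region curve fitting of Figures 7 and 8, the following deliberately simplified Python sketch fits one Gaussian curve per region to the per-slice class probabilities with a generic optimizer; the actual post-processing uses expectation maximization together with the expected order and size of the anatomy regions, which this sketch omits.

```python
import numpy as np
from scipy.optimize import minimize

def neg_overlap(params, probs):
    """Negative overlap between Gaussian region curves and the
    per-slice class probabilities (rows: slices, columns: regions)."""
    n_slices, n_regions = probs.shape
    z = np.arange(n_slices)
    mus = params[:n_regions]
    sigmas = np.abs(params[n_regions:]) + 1e-6
    curves = np.exp(-0.5 * ((z[:, None] - mus) / sigmas) ** 2)
    return -np.sum(curves * probs)

def fit_region_curves(probs):
    n_slices, n_regions = probs.shape
    # Initialize region centers evenly along the scan (cf. Figure 7).
    x0 = np.concatenate([np.linspace(0, n_slices - 1, n_regions),
                         np.full(n_regions, n_slices / (2.0 * n_regions))])
    res = minimize(neg_overlap, x0, args=(probs,), method="Nelder-Mead")
    mus = res.x[:n_regions]
    sigmas = np.abs(res.x[n_regions:]) + 1e-6
    z = np.arange(n_slices)
    curves = np.exp(-0.5 * ((z[:, None] - mus) / sigmas) ** 2)
    # Each slice takes the label of the strongest curve (cf. Figure 8).
    return curves.argmax(axis=1)
```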
9. Figure Effect of the post-processing: initial classification (left), final labeling (right).
10. Figure Automated anatomy labeling for patients with varying sex and level of obesity.
11. Figure Effect of the estimation of the number of anatomical regions.
12. Figure Accuracy measured on partial body scans of various sizes: overall accuracy as a function of scan width (350–1150 mm).
13. Figure Results of the anatomy labeling for partial body images (cut width 350 mm, top; 650 mm, bottom).
14. Figure Pathological cases (a–c).
15. Figure Collapsed region using a graph-based post-process.
16. Figure HMM result. From left to right: Confidence values; Manual labeling; HMM labeling; Labeling using the proposed method.
Highlights

- A novel method is presented to recognize the main anatomical regions in 3D medical exams.
- Standard image processing techniques and machine learning tools are applied to obtain an initial slice-by-slice classification.
- The post-processing part of the method incorporates the expected order and size of anatomy regions to eliminate false classifications.
- The proposed method was evaluated for whole-body (torso) as well as partial body MRI exams and proved reliable.