Medical Engineering & Physics 33 (2011) 1017–1026
Automatic endotracheal tube position confirmation system based on image classification – A preliminary assessment

Dror Lederman a,∗, Samsun Lampotang b, Micha Y. Shamir c,d
a Department of Radiology, University of Pittsburgh, Pittsburgh, PA 15213, USA
b Department of Anesthesiology, University of Florida, Gainesville, FL, USA
c Department of Anesthesiology and Critical Care Medicine, Hadassah Hebrew University Medical Center, Jerusalem, Israel
d Department of Anesthesiology, Perioperative Medicine and Pain Management, University of Miami Miller School of Medicine, Miami, FL, USA
Article history: Received 11 November 2010; received in revised form 28 March 2011; accepted 12 April 2011.

Keywords: Airway management; Endotracheal intubation confirmation; Esophageal intubation detection; Medical image classification; One-lung intubation detection
Abstract

Endotracheal intubation is a complex medical procedure in which a ventilating tube is inserted into the human trachea. Improper positioning carries potentially fatal consequences, and confirmation of correct positioning is therefore mandatory. This paper introduces a novel system for endotracheal tube position confirmation. The proposed system comprises a miniature complementary metal oxide semiconductor (CMOS) sensor attached to the tip of a semi-rigid stylet and connected to a digital signal processor (DSP) with an integrated video acquisition component. Video signals are acquired and processed by a confirmation algorithm implemented on the processor. The confirmation approach is based on video image classification, i.e., identifying desired anatomical structures (upper trachea and the main bifurcation of the trachea) and undesired structures (esophagus); the desired and undesired images are indicators of correct or incorrect endotracheal tube positioning. The proposed methodology comprises a continuous and probabilistic image representation scheme using Gaussian mixture models (GMMs), estimated with a greedy algorithm. A multi-dimensional feature space consisting of several textural-based features is used to represent the images. The performance of the proposed algorithm was evaluated using two datasets: 1600 images extracted from 10 videos recorded during intubations performed on dead cows, and 358 images extracted from 8 videos recorded during intubations performed on human subjects. Each video image was classified by a medical expert into one of three categories: upper tracheal intubation, correct (carina) intubation and esophageal intubation. The results, obtained using a leave-one-case-out method, show that the system correctly classified 1530 of the 1600 cow intubation images (95.6%) and 351 of the 358 human images (98.0%). Misclassification of an image of the esophagus as carina or upper-trachea, which is potentially fatal, was extremely rare (a single case in the animal dataset and no cases in the human dataset). The classification results on the cow intubation dataset compare favorably with a state-of-the-art classification method tested on the same dataset.

© 2011 IPEM. Published by Elsevier Ltd. All rights reserved.
1. Introduction

Endotracheal intubation is a relatively common procedure (∼25 million intubations per year in the US) performed by trained health care providers, in emergency and scheduled settings, when the patient stops breathing or cannot breathe unaided. During the procedure, a tube is positioned in the patient's trachea, thus securing access to the lungs and enabling artificial ventilation. On rare occasions, the anatomy of the patient does not allow easy insertion of the endotracheal tube (ETT) and consequently the ETT might be
incorrectly positioned, either in the esophagus or too deep in the trachea (i.e., in the right main bronchus) [1,2]. Unrecognized esophageal intubation carries a fatal risk, as the stomach is ventilated rather than the lungs, resulting in severe oxygen deficiency in the vital organs that is incompatible with life. In cases of right bronchus intubation, also termed one-lung intubation (OLI), only one lung is ventilated. One-lung ventilation might cause serious complications, such as collapse of the non-ventilated lung, yielding reduced oxygen tension in the blood or lung infection, and hyperinflation or rupture of the ventilated lung resulting in shock. Both esophageal intubation and OLI might also occur after the ETT was properly positioned ("dislodgement"), for many reasons such as neck flexion [3,4]. Unrecognized OLI and esophageal intubation are the most serious complications of attempted tracheal intubation [5]. Numerous studies have investigated endotracheal misplacement rates
Fig. 1. A schematic drawing of the video-stylet, which includes the stylet and complementary metal oxide semiconductor (CMOS) sensor connected to a digital signal processor (DSP) or a personal computer (PC).
in hospital and pre-hospital settings. Several studies reported an unrecognized OLI rate between 0% and 7.8% [6–8]. Studies based on independent observers reported higher misplacement rates, varying between 6% and 25% [1,2,9–12]. These rates may seem low, but since the absolute number of intubations carried out worldwide is large, the number of people at risk of serious complications is high.

The medical literature offers a few ways to confirm proper tube positioning. For many years, auscultation of breath sounds over the lungs using a stethoscope was the most widely used technique, as it utilizes a very accessible tool. This technique requires close attention and a quiet environment, and it suffers from significant false positive and false negative rates that affect its reliability [4,13,14]. More recently, measurement of carbon dioxide (CO2) in the expired air has become standard practice. Since one of the main functions of the lung is to eliminate CO2 by exhalation, measuring a substantial amount of CO2 in the exhaled air (end-tidal CO2, ETCO2) proves that the endotracheal tube is located in the airways. De facto, it has become the gold standard for confirming correct ETT positioning. This method, however, suffers from significant shortcomings. It cannot differentiate proper ETT positioning from a too-deep one, because in both cases CO2 is exhaled at the same concentration [14,15]. Additionally, ETCO2 is unreliable for endotracheal intubation confirmation in many medical emergencies, such as cardiac arrest [5,14–17]. Other techniques have been proposed [18–22], but none of them has been proven better. Auscultation and ETCO2 measurement are therefore the common practice in the absence of a better method.

Based on a previous proof of concept [23], we further developed and tested our novel approach for automatic endotracheal intubation confirmation. The approach is based on direct visual cues, i.e., identification of specific anatomical landmarks as indicators of correct or incorrect tube positioning. In this study, the system is further developed and evaluated on animal and human intubation videos. The paper is arranged as follows. Section 2 presents the proposed confirmation system. The experimental results are presented in Section 3. A discussion appears in Section 4.

2. Materials and methods

The correct position of the ETT tip is 2–5 cm above the carina (the bifurcation of the trachea into the two main bronchi). The image of the carina is therefore the definitive anatomical landmark for confirming endotracheal intubation. The proposed approach is based on identifying the carina in the acquired video images and discriminating between the carina and other anatomical structures. The method combines a continuous and probabilistic image representation scheme employed in a textural-based feature
Fig. 2. A general scheme of the proposed confirmation system. The system consists of three Gaussian mixture models (GMMs), one representing the upper-trachea, one representing the carina and one representing the esophagus, followed by a maximum-likelihood-based decision rule.
space. In the following subsections, an overview of the proposed system is given.

2.1. The video-stylet

We designed and assembled a system that resembles an intubating stylet. The tip comprises a miniature CMOS sensor. The inner part of the stylet contains wires to transfer the image and a narrow lumen through which water or air can be sprayed to clear blood and secretions away from the camera sensor (Fig. 1). The image sensor is connected to a processor with an integrated image acquisition component. During intubation, this semi-rigid stylet is inserted into a commercial endotracheal tube with the camera at its tip. Video signals are continuously acquired and processed by the confirmation algorithm implemented on the processor.

2.2. Pre-processing and feature extraction

The confirmation algorithm is based on classification of specific anatomical landmarks, including the carina, tracheal rings (upper trachea) and esophagus. Fig. 2 presents a general block diagram of the confirmation system. The first step is conversion of the acquired color (RGB) image to a gray-scale image, followed by feature extraction. Various features have been utilized in medical image applications, including squared gray-level difference [24], cross-correlation [24] and localized intensity features [25]. In this work, textural features [26] were used. Textural features carry important information about the structural arrangement of surfaces and their relationship to the surrounding environment. In particular, features based on gray-level co-occurrence matrices (GLCM) were utilized. These features are based on the assumption that the texture information in an image is contained in the overall or "average" spatial relationship that the gray tones in the image have to one another [26]. More specifically, it is assumed that this texture information is adequately specified by a set of gray-tone spatial-dependence matrices computed for various angular relationships and distances between neighboring resolution-cell pairs in the image. One advantage of these features is that they are robust to imaging angle and scale. This property is of great importance to the task at hand, as during intubation the tube may be inserted at different angles and in different directions, depending on the technique employed by the person performing the procedure.
It was therefore hypothesized that textural features would allow reliable classification of the images, independently of the angle at which the tube is inserted. It should be noted that calculation of these features is computationally costly; however, efficient calculation methods exist [27] that may overcome this problem and allow real-time classification based on these features. In addition, their use obviates the need to perform image registration prior to classification.

A brief description of the textural features is now given. Let $f : L_x \times L_y \to I$ be an image with dimensions $L_x$ and $L_y$ and gray levels $g = 0, 1, \ldots, G-1$. Let $d$ be the distance (offset) between two pixel positions $(x_1, y_1)$ and $(x_2, y_2)$. Angles quantized to 45° intervals are considered, such that the neighbors of any pixel can lie in four possible directions: $\theta = 0°, 45°, 90°$ and $135°$. A resolution cell is considered to have eight nearest-neighbor resolution cells, as in Fig. 3. The co-occurrence matrix is constructed by observing pairs of image cells at distance $d$ from each other and incrementing the matrix position corresponding to the gray levels of both cells. The un-normalized frequencies for a direction of 45°, for instance, are defined by

$$P(i, j, d, 45^\circ) = \#\{((k, l), (m, n)) \in (L_y \times L_x) \times (L_y \times L_x) \mid (k - m = d,\ l - n = -d) \text{ or } (k - m = -d,\ l - n = d);\ I(k, l) = i,\ I(m, n) = j\}, \quad (1)$$

where $\#$ denotes the number of elements in the set. Measures for the other directions, as well as the normalized measures, can be obtained in the same way [26]. The associated cells are illustrated in Fig. 3, in which resolution cells 4 and 8 represent the 45° nearest neighbors of the center cell.

Fig. 3. A schematic drawing of the resolution cells used to calculate the textural features. Resolution cells 1 and 5 are 0° (horizontal) nearest neighbors to resolution cell *; cells 2 and 6 are 135° nearest neighbors; cells 3 and 7 are 90° nearest neighbors; and cells 4 and 8 are 45° nearest neighbors to *.

To construct the feature set utilized in the proposed system, various textural features were extracted from the GLCM. Let $p(i, j)$ denote the $(i, j)$th entry of the normalized gray-tone spatial-dependence matrix, $p(i, j) = P(i, j)/R$, where $R$ is a normalization constant set in this work to the sum of all values of $P(i, j)$, i.e., $R = \sum_{i=1}^{G} \sum_{j=1}^{G} P(i, j)$, and let $p_x(i)$ and $p_y(j)$ denote the marginal probabilities obtained by summing the rows and columns of $p(i, j)$, respectively, i.e., $p_x(i) = \sum_{j=1}^{G} p(i, j)$ and $p_y(j) = \sum_{i=1}^{G} p(i, j)$. The following features are then used to construct the feature set:

1. Contrast:
$$f_1 = \sum_{n=0}^{G-1} n^2 \Bigg\{ \sum_{i=1}^{G} \sum_{\substack{j=1 \\ |i-j|=n}}^{G} p(i, j) \Bigg\}.$$

2. Correlation:
$$f_2 = \frac{\sum_i \sum_j (ij)\, p(i, j) - \mu_x \mu_y}{\sigma_x \sigma_y},$$
where $\mu_x$, $\mu_y$ are the means and $\sigma_x$, $\sigma_y$ the standard deviations of the marginal distributions associated with $p(i, j)$.

3. Two information measures of correlation:
$$f_3 = \frac{HXY - HXY1}{\max\{HX, HY\}}, \qquad f_4 = \big(1 - \exp[-2.0\,(HXY2 - HXY)]\big)^{1/2},$$
where $HX$ and $HY$ are the entropies of $p_x$ and $p_y$, $HXY = -\sum_i \sum_j p(i, j) \log\{p(i, j)\}$, $HXY1 = -\sum_i \sum_j p(i, j) \log\{p_x(i)\, p_y(j)\}$ and $HXY2 = -\sum_i \sum_j p_x(i)\, p_y(j) \log\{p_x(i)\, p_y(j)\}$.

4. Maximal correlation coefficient:
$$f_5 = (\text{second largest eigenvalue of } Q)^{1/2}, \qquad Q(i, j) = \sum_k \frac{p(i, k)\, p(j, k)}{p_x(i)\, p_y(k)}.$$

The four values that each feature takes in the four directions are averaged to produce a rotation-invariant feature, which is employed by the classification system.
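To make the feature extraction step concrete, the following sketch (our illustration, not the authors' implementation) computes the four-direction GLCM with scikit-image and derives the contrast, correlation and first information measure of correlation described above, averaging over directions for rotation invariance. The offset distance of 1 and 256 gray levels are illustrative assumptions, and gray is assumed to be the 2-D uint8 image produced by the pre-processing step.

```python
import numpy as np
from skimage.feature import graycomatrix

def glcm_features(gray, distance=1, levels=256):
    """Rotation-invariant GLCM features averaged over 0/45/90/135 degrees."""
    angles = [0, np.pi / 4, np.pi / 2, 3 * np.pi / 4]
    # P[i, j, d, a] counts gray-level pairs; normed=True yields p(i, j).
    P = graycomatrix(gray, [distance], angles, levels=levels,
                     symmetric=True, normed=True)
    i, j = np.indices((levels, levels))
    lv, eps = np.arange(levels), np.finfo(float).eps
    feats = []
    for a in range(len(angles)):
        p = P[:, :, 0, a]
        px, py = p.sum(axis=1), p.sum(axis=0)          # marginals p_x, p_y
        mu_x, mu_y = (lv * px).sum(), (lv * py).sum()
        sd_x = np.sqrt((((lv - mu_x) ** 2) * px).sum())
        sd_y = np.sqrt((((lv - mu_y) ** 2) * py).sum())
        f1 = ((i - j) ** 2 * p).sum()                            # contrast
        f2 = ((i * j * p).sum() - mu_x * mu_y) / (sd_x * sd_y)   # correlation
        hx = -(px * np.log(px + eps)).sum()
        hy = -(py * np.log(py + eps)).sum()
        hxy = -(p * np.log(p + eps)).sum()
        hxy1 = -(p * np.log(np.outer(px, py) + eps)).sum()
        f3 = (hxy - hxy1) / max(hx, hy)   # first information measure, f_3
        feats.append([f1, f2, f3])
    return np.mean(feats, axis=0)  # average over directions (rotation invariance)
```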
2.3. The GMM framework
In order to classify the video frames, a probabilistic framework is utilized in which the images are represented in the feature space using Gaussian mixture models (GMMs). GMM-based classification methods have been widely applied to speech recognition [28] and, recently, to some medical image classification applications [25]. Mixture models, and GMMs in particular, form a common technique for probability density estimation, justified by the fact that any density can be approximated to a required degree of accuracy by a finite Gaussian mixture [29]. Their mathematical properties, their flexibility and the availability of efficient estimation algorithms make them attractive for classification problems. The most popular algorithm for estimating GMM parameters is expectation-maximization (EM) [29], which iteratively optimizes the mixture parameters under a monotonic-likelihood guarantee and has a relatively simple implementation. However, EM suffers from several drawbacks: (i) it requires a priori knowledge of the mixture order, i.e., the number of mixing components; (ii) it is highly sensitive to parameter initialization; and (iii) it tends to converge to local maxima. Greedy learning of GMMs was recently proposed [30,31] and overcomes these drawbacks: it estimates the GMM parameters in a greedy fashion and thus inherently estimates the mixture order.

2.3.1. GMM estimation

In this study, a greedy GMM-based classification technique applicable to image classification was developed. The probability density functions (pdfs) of the upper-trachea, carina and esophagus video images were represented in the feature space by three classes, each modeled by a GMM defined as a weighted sum of $K$ Gaussian components. A GMM representing a random process $x$ can be expressed as follows:

$$f_K(x) = \sum_{k=1}^{K} \alpha_k \phi_k(x), \quad (2)$$

where $\phi_k(x)$ represents the $k$th Gaussian mixture component and $\alpha_k$ represents the mixing weight, such that $\sum_{k=1}^{K} \alpha_k = 1$ and $\alpha_j \ge 0\ \forall j$. A multivariate Gaussian mixture is given by the weighted sum (2), where the $j$th component $\phi(x; \theta_j)$ is the $d$-dimensional Gaussian density

$$\phi(x; \theta_j) = (2\pi)^{-d/2} |s_j|^{-1/2} \exp\left\{-0.5\,(x - m_j)^T s_j^{-1} (x - m_j)\right\}, \quad (3)$$

which is parameterized by the mean $m_j$ and the covariance matrix $s_j$, collectively denoted by the parameter vector $\theta_j$.
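As a worked illustration of Eqs. (2) and (3), a mixture density can be evaluated directly; this is a minimal sketch under the notation above, not the authors' code.

```python
import numpy as np

def gaussian_pdf(x, m, s):
    """Eq. (3): d-dimensional Gaussian density with mean m and covariance s."""
    d = len(m)
    diff = x - m
    norm = (2 * np.pi) ** (-d / 2) * np.linalg.det(s) ** (-0.5)
    # solve(s, diff) computes s^{-1} diff without forming the explicit inverse.
    return norm * np.exp(-0.5 * diff @ np.linalg.solve(s, diff))

def gmm_pdf(x, weights, means, covs):
    """Eq. (2): f_K(x) = sum_k alpha_k * phi_k(x)."""
    return sum(a * gaussian_pdf(x, m, s)
               for a, m, s in zip(weights, means, covs))
```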
Previous studies suggested that a mixture density can be learned using a maximum-likelihood approach in a greedy fashion, i.e., by incrementally adding components to the mixture up to a desired number of components. If component insertion is carried out in an optimal way, such an incrementally computed mixture is almost as good as any mixture of the form of Eq. (2). This approach was implemented [31], further improved [30], and utilized in the present work. For the learning problem, a training set $\{x_1, \ldots, x_n\}$ of independently and identically distributed samples of the mixture is given, and the task is to estimate the parameters $\{\alpha_j, m_j, s_j\}$ of the $k$th component such that the following log-likelihood function is maximized:

$$L_k = \sum_{i=1}^{n} \log f_k(x_i). \quad (4)$$

The greedy algorithm is based on the following recursive equation:

$$f_{k+1}(x) = (1 - a)\, f_k(x) + a\, \phi(x; \theta), \quad (5)$$

where $a$ is a constant between 0 and 1. The log-likelihood function for $k + 1$ components can then be rewritten recursively as

$$L_{k+1} = \sum_{i=1}^{n} \log f_{k+1}(x_i) = \sum_{i=1}^{n} \log\left[(1 - a)\, f_k(x_i) + a\, \phi(x_i; \theta)\right]. \quad (6)$$
This means that a possible solution can be found by starting with a single Gaussian component, estimating its parameters via maximum-likelihood (ML) estimation, and then building the mixture component-wise, rather than applying EM to all mixture components simultaneously. The algorithm alternates between two steps: inserting a component into the mixture, and running EM to update the mixture parameters. This iterative procedure terminates when a given convergence condition is satisfied. In this way, the greedy learning algorithm does not require prior knowledge of the mixture order, in contrast to the conventional EM algorithm [30,32].
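The following sketch illustrates the greedy order-selection idea under a simplifying assumption: instead of the component-insertion step of Eq. (5), it refits a full EM mixture (scikit-learn's GaussianMixture) at each candidate order and stops when the log-likelihood of Eq. (4) no longer improves. The function and threshold names are ours, not the authors'.

```python
from sklearn.mixture import GaussianMixture

def greedy_gmm(X, max_order=20, min_gain=1e-3):
    """Grow the mixture order while the training log-likelihood keeps improving."""
    best = GaussianMixture(n_components=1, covariance_type="full").fit(X)
    best_ll = best.score(X)  # mean log-likelihood, i.e., L_k / n in Eq. (4)
    for k in range(2, max_order + 1):
        cand = GaussianMixture(n_components=k, covariance_type="full",
                               n_init=3).fit(X)
        ll = cand.score(X)
        if ll - best_ll < min_gain:  # no meaningful gain: keep the previous order
            break
        best, best_ll = cand, ll
    return best  # mixture order selected automatically
```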
2.3.2. GMM-based classification

The classification criterion for the greedy-GMM is based on an ML decision rule: the decision is made by finding the class $j$ that maximizes the likelihood function

$$j = \arg\max_{m=1,2,\ldots,M} f_k(x; H_m), \quad (8)$$
where $f_k(x; H_m)$ is the pdf of the observations (features) $x$ under hypothesis $H_m$, estimated during the learning phase. Fig. 4 presents a typical example of the log-likelihood score, $L_k$, as a function of the mixture order for a cow carina model. The log-likelihood function reaches a maximum for a mixture order between 10 and 12, indicating that a mixture order of 10 is appropriate for the cow carina model.

3. Results

3.1. Classification of cow intubation video images

In order to perform a preliminary evaluation of the proposed system, we recorded videos during intubations performed on dead cows. A total of 10 videos were recorded, from which 1600 images were extracted and classified by a medical expert into one of the following categories: upper-trachea (490 images), carina (550 images) and esophagus (560 images).
Table 1
Summary of classification results for the cow intubation dataset.

Recognized        Actual
                  Upper-trachea    Carina          Esophagus
Upper-trachea     443 (90.4%)      20 (3.6%)       1 (0.2%)
Carina            25 (5.1%)        528 (96.0%)     0 (0.0%)
Esophagus         22 (4.5%)        2 (0.4%)        559 (99.8%)
Total             490              550             560
Evaluation of the proposed approach was performed using a leave-one-case-out validation method: in each iteration, the images extracted from 9 videos were used to train the models, i.e., to estimate the GMMs, and the images from the remaining video were used to test system performance. This process was repeated 10 times, such that each video participated once in the testing phase.

Figs. 5 and 6 show typical examples of the images and the calculated textural features: correlation, contrast and two information measures of correlation. These examples are given to demonstrate the differences in textural patterns between carina images (Fig. 5) and non-carina images (Fig. 6). However, no specific rule by which a classification decision could be made can be determined from these figures alone; therefore, all five features were included in the feature set utilized by the classifier. Fig. 5 shows that the correlation graphs have significant peaks at the relevant offsets, indicative of a repetitive pattern in the image due to the symmetry of the carina. In the first example (f)–(j), a minimum point appears at an offset of 135°, due to the angle at which the image was taken. In the second example, which was taken when the stylet was held straight, a maximum point is seen at an offset of 0°. Likewise, the contrast and information-measure graphs show local maximum and minimum points at the relevant shifts, i.e., between pixels 50 and 200. Fig. 6 presents a typical example of upper-tracheal intubation (a)–(e) and of esophageal intubation (f)–(j), together with the calculated textural features. In this case, no significant peak appears in the correlation graph; likewise, differences between the contrast graphs are evident. These typical examples demonstrate the advantage of textural features and their suitability for the task at hand.

The classification results are summarized in Table 1, where the rows represent the predicted (recognized) classes and the columns represent the actual classes. The system achieved an overall classification rate of 95.6% (1530 out of 1600 images). These results were statistically significant (p < 0.005). Most of the errors are due to incorrect identification of carina images as upper-trachea (20 cases, i.e., 3.6%), or of upper-trachea images as carina (25 cases, i.e., 5.1%) or esophagus (22 cases, i.e., 4.5%). Only in one case (0.2%) was an esophagus image mistakenly classified as upper-tracheal, in two cases (0.4%) a carina image was mistakenly classified as esophagus, and no esophageal image was classified as carina. When judging the discrimination between correct positioning of the ETT (tracheal or carinal positions) and esophageal intubation, the algorithm had a false negative rate of 0.041 and a false positive rate of 0.0009. When testing the performance of the algorithm in diagnosing the position of the ETT within the respiratory system (tracheal vs. carinal positioning), the false positive rate was 0.045 and the false negative rate was 0.041. Fig. 7 presents typical examples of the three groups (upper-trachea, carina and esophagus images) correctly classified by the algorithm.
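A sketch of this leave-one-case-out protocol is given below. Here fit_gmm stands for any per-class density estimator (e.g., the greedy GMM sketched earlier), and the case/label layout is a hypothetical illustration of the procedure described in the text, not the authors' code.

```python
import numpy as np

def leave_one_case_out(cases, fit_gmm, n_classes=3):
    """cases: list of (features, labels) array pairs, one pair per intubation video."""
    correct = total = 0
    for held_out in range(len(cases)):
        train = [c for i, c in enumerate(cases) if i != held_out]
        X = np.vstack([x for x, _ in train])
        y = np.concatenate([l for _, l in train])
        models = [fit_gmm(X[y == c]) for c in range(n_classes)]  # one GMM per class
        X_test, y_test = cases[held_out]
        # ML decision rule of Eq. (8): pick the class whose model scores highest.
        scores = np.stack([m.score_samples(X_test) for m in models])
        pred = scores.argmax(axis=0)
        correct += int((pred == y_test).sum())
        total += len(y_test)
    return correct / total  # overall correct classification rate
```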
Fig. 4. Log-likelihood scores as a function of the mixture order for a cow carina model. The circles indicate the mixture orders that provide the maximal log-likelihood scores and were therefore selected by the greedy algorithm.
3.2. Comparison with a state-of-the-art algorithm

In order to validate the proposed textural-based classification system against the state of the art in relevant medical applications, a method based on selective measures [24] was implemented and applied to the task at hand. This method calculates a localized, modified mean-squared-error measure between acquired bronchoscope video images and reference (virtual) images. To the best of our knowledge, the present study is the first to propose classification of upper-airway images and, specifically, image-based ETT position confirmation. Therefore, as a benchmark for evaluating the performance of the proposed approach, we utilized the method in [24] and adapted it for our application. Using the cow intubation dataset, that method yielded average correct classification rates of 86.7%, 83.7% and 99.4%, compared with 90.4%, 96.0% and 99.8% obtained using our method, for the upper-trachea, carina and esophagus images, respectively. While both algorithms successfully discriminated between carina and esophagus images, our method significantly outperformed the selective measures method in discriminating between upper-trachea and carina images. Notably, an exact comparison between the two different approaches, one a probabilistic model-based approach and the
other based on localized selective measures, is a difficult task as the latter requires accurate determination of several parameters which can significantly affect performance.
3.3. Comparison with other schemes

The performance of the proposed scheme was further tested with other features, namely anisotropic circular Gaussian Markov random field (ACGMRF) features [33], the steerable Laplacian pyramid (SLP) [34] and discrete cosine transform (DCT) coefficients. In addition, the proposed scheme was compared with a support vector machine (SVM)-based scheme using the same feature sets. For this purpose, a publicly available SVM software package (SVM-Light) [35] was utilized. The results, presented in Table 2, show that the GLCM features compare favorably with the ACGMRF and SLP features and outperform the DCT coefficients. In addition, the GMM-based scheme compares favorably (with no statistically significant difference) with the SVM-based scheme.
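For reference, the SVM baseline can be reproduced in spirit with scikit-learn's SVC as a stand-in for SVM-Light [35]; the kernel and parameters below are assumptions for illustration, as they are not specified in the text.

```python
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Features are scaled before the SVM, a common choice for GLCM-type features.
svm_baseline = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0))
# Usage: svm_baseline.fit(X_train, y_train); svm_baseline.score(X_test, y_test)
```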
Fig. 5. Two examples of carina images (a, f) and the calculated textural features: correlation (b, g), contrast (c, h) and two information measures of correlation (d, i) and (e, j).
Fig. 6. Two examples of non-carina images: upper-tracheal intubation (a) and esophageal intubation (f), and the calculated textural features: correlation (b, g), contrast (c, h) and two information measures of correlation (d, i) and (e, j).
Fig. 7. Typical examples of correctly classified images: (a) carina images, (b) upper-trachea images, (c) esophagus images.
Table 2
Summary of classification results for different features.
Feature set            GMM classifier         SVM classifier
                       overall accuracy       overall accuracy
16 DCT coefficients    88.9%                  91.3%
ACGMRF [33]            95.9%                  94.7%
SLP [34]               93.2%                  93.8%
GLCM (this work)       95.6%                  95.3%
3.4. Greedy-GMM vs. EM-GMM

While the EM-GMM assumes a priori knowledge of the mixture order, the greedy-GMM automatically estimates the order from the available data. In the following experiment, the performance of both algorithms, with the same feature set, was evaluated as a function of the order K, which for the greedy-GMM represents the maximal mixture order and for the EM-GMM the preselected mixture order. Fig. 8 shows the average correct classification rates for both algorithms as a function of the mixture order. The results clearly show that, for the particular database at hand, a mixture order between 10 and 12 provides the best performance for both algorithms. For mixture orders between 1 and 10, the average correct classification rates obtained by the greedy-GMM and EM-GMM are similar. However, for mixture orders greater than 12, the performance of the EM-GMM degrades because the available database is not large enough to estimate all the model parameters (the "curse of dimensionality" phenomenon). At the same time, the performance of the greedy-GMM is not affected, i.e., the algorithm correctly estimates the mixture order that provides the best classification rates for the given database. These results demonstrate the advantage of the greedy-GMM-based classification algorithm. It should be noted, however, that this advantage might depend on the database size and characteristics. Theoretically, with an infinite database the differences between the two algorithms would be insignificant; this conclusion should therefore be revisited once a larger database is obtained.

3.5. An additional dataset of human intubation videos

While efforts to acquire a large dataset of intubation videos are ongoing, we used a dataset of videos recorded during intubations performed on 8 human subjects at the University of Florida Medical Center. From these videos, a total of 358 images were extracted and used as a second validation dataset. The distribution of images for each subject is summarized in Table 3. The new images correspond to the original dataset classes: upper-trachea, carina and esophagus. Typical examples of correct and incorrect classification by the system are shown in Figs. 9 and 10, respectively. Classification errors occur mostly when the exact position is not clearly seen due to the position and/or angle of the stylet.
Table 3
Distribution of images for each subject.

Subject    Upper-trachea    Carina    Esophagus
#1         21               16        8
#2         25               12        7
#3         20               13        10
#4         22               11        9
#5         34               13        9
#6         24               13        10
#7         22               11        8
#8         21               12        7
Total      189              101       68
Table 4
Summary of classification results for the human intubation dataset.

Recognized        Actual
                  Upper-trachea    Carina         Esophagus
Upper-trachea     185 (97.9%)      2 (2.0%)       0 (0.0%)
Carina            2 (1.1%)         98 (97.0%)     0 (0.0%)
Esophagus         2 (1.1%)         1 (1.0%)       68 (100.0%)
Total             189              101            68
In particular, the upper-tracheal images shown in Fig. 10(a) and (b) were classified as carina: although the stylet was in an upper-tracheal position, far from the carina, the carina was clearly visible, and hence the images were incorrectly classified as carina. Two examples of upper-trachea images incorrectly classified as esophagus are presented in Fig. 10(c) and (d). Using the leave-one-case-out validation method, an overall correct classification rate of 98.0% was achieved with the proposed confirmation algorithm (p < 0.005). The classification results are summarized in Table 4. The false negative rate (tube in the respiratory system recognized as esophageal intubation) was 0.042, while the more important false positive rate was 0. When checking the algorithm for its ability to discriminate between high tracheal and carinal positioning, we found a false negative rate of 0.01 and a false positive rate of 0.02.

4. Discussion

A novel system for automatic endotracheal tube positioning confirmation was introduced. According to the proposed approach, direct physical determination of the tube position with respect to the relevant anatomical structures is performed based on image classification. Video images are acquired by the video-stylet and represented using a continuous and probabilistic scheme via greedy-estimated GMMs and textural-based features. System performance was evaluated using cow and human intubation videos, from which images were extracted and classified by a medical expert into one of three categories: upper tracheal, carina and esophagus. The method achieved a high precision of 95.6% (1530 out of 1600 images) on the cow intubation dataset and 98.0% (351 out of 358 images) on the human intubation dataset. It is important to mention that, with the proposed method, mistaking esophageal intubation for tracheal positioning (a potentially fatal mistake) was extremely rare (0.2%). The more common, yet still rare, mistake occurred when the ETT was actually located within the respiratory system, which carries less serious consequences.

An exact comparison across schemes suggested in the literature is a complex task, as this is the first study focusing on a designated image-based method for endotracheal intubation confirmation. Still, in order to validate the proposed framework, we adapted a state-of-the-art method based on selective measures, originally proposed for bronchoscope tracking, and used it for the task at hand. Our approach outperformed the selective measures method when tested on the cow intubation dataset. In addition, a comparison between the greedy-GMM-based classifier and the widely used EM-GMM-based classifier, for different mixture orders, was performed using the cow intubation dataset. It was demonstrated that the greedy-GMM-based classifier inherently estimates the model order and may therefore outperform the EM-GMM-based classifier unless the model order is a priori known. Moreover, the proposed classification scheme compared favorably with several other feature sets, as well as with an SVM-based scheme.

Recently, many visualization or video-intubation devices have been proposed [36,37]. These aim to aid in visualizing the entrance to the trachea (i.e., the vocal cords) and therefore may aid in ETT insertion, but they play no role in verifying proper ETT position.
Fig. 8. Performance (average correct classification rate) as a function of the mixture order for the greedy-GMM and EM-GMM algorithms.
In addition, they suffer from several drawbacks, including the significant operator training required, limited usability in out-of-hospital settings (intense daylight) and high price. Most importantly, they do not provide a means of automatically determining the tube position and
therefore they require continuous visual inspection by trained medical personnel.
Fig. 9. Typical examples of correctly classified images: (a) carina, (b) upper-trachea, (c) esophagus.
Fig. 10. Typical examples of incorrectly classified images: (a, b) carina, (c) upper-trachea, (d) esophagus.
The proposed algorithm may be integrated into any of the commercially available video-intubation devices in order to provide automatic confirmation of endotracheal intubation. Such a confirmation system would have significant advantages over existing devices: (i) it can reliably determine the tube position in various medical conditions; (ii) it is suitable for both esophageal intubation detection and one-lung intubation detection; and (iii) it is fully automatic and, using a designated endotracheal tube, may be used for continuous and long-distance screening for tube misplacement and dislodgment. The method can be readily integrated into patient monitoring systems. Moreover, the system can be used to improve the training of medical professionals. It should also be mentioned that, according to the American Heart Association (AHA) guidelines published in 2010, health professionals are required to perform a chest X-ray for any intubated patient in order to confirm correct tube placement. The method proposed in this work may potentially eliminate the need for a chest X-ray, reducing cost and radiation exposure.

The proposed method is computationally tractable, though not yet real-time. All of the algorithms used in this work were implemented in Matlab (R2009a), and the code was not optimized for real-time processing at the current stage. Using a conventional PC equipped with dual Intel Xeon 3.4 GHz processors and 4 GB of RAM, feature extraction requires approximately 1 s per image and the GMM (ML-based) validation process requires between 1.5 and 2 s per image. To achieve a real-time confirmation system, the processing should be about 10 times faster than the current implementation.

Our ultimate goal is to develop a reliable, cost-effective, easy-to-use and fully automatic device for confirmation of correct ETT positioning. For this purpose, we plan to develop an advanced prototype which will be extensively evaluated in pre-clinical trials and, upon receiving the appropriate regulatory approvals, on humans. Based on this preliminary study, we believe that implementation of the proposed method in a real-time confirmation system will lead to a major improvement in the ability to detect intubation incidents as they occur, while the patient is still well oxygenated and stable. Possible improvements, left for future research, include the inclusion of other anatomical landmarks, such as the vocal cords, and the development of a video-analysis algorithm, both of which are expected to improve confirmation performance. Other features that may yield better representation and discrimination between upper-trachea and carina images will be considered, and a feature selection scheme will be used to choose a subset of features that may improve overall performance.

Despite the encouraging results, we recognize that this is a very preliminary study. The available database consists of only 10 cow intubation videos and 8 human intubation videos; a much larger database is required to reliably validate system performance. Various factors might challenge the system, especially fog and secretions, which could result in poor image quality. In addition, the effect of possible anatomical variability between patients on system performance is yet to be evaluated. Clearly, more work is needed to evaluate system performance in real-time detection of misplaced and dislodged intubations. With these challenges in mind, successful implementation of the proposed method into
a real-time confirmation system can serve as a major contribution to patient safety.
Acknowledgements

The authors wish to thank Yaron Daniely and Dudu Daniely for their efforts and continuous contribution to the project.
Conflict of interest

The first author is the inventor of the system presented in this paper, and founder and owner of Tube-Eye Medical Ltd., which aims to commercialize the invention. The second and third authors have no conflicts of interest.
References

[1] Silvestri S, Ralls GA, Krauss B. The effectiveness of out-of-hospital use of continuous end-tidal carbon dioxide monitoring on the rate of unrecognized misplaced intubation within a regional emergency medical services system. Ann Emerg Med 2005;45:497–503.
[2] Timmermann A, Russo SG, Eich C, Roessler M, Braun U, Rosenblatt WH, et al. The out-of-hospital esophageal and endobronchial intubations performed by emergency physicians. Crit Care Trauma 2007;104:619–23.
[3] Yap SJ, Morris RW, Pybus DA. Alterations in endotracheal tube position during general anesthesia. Anaesth Crit Care 1994:586–8.
[4] Vergese ST, Hannallah RS, Slack MC, Cross RR, Patel KM. Auscultation of bilateral breath sounds does not rule out endobronchial intubation in children. Anesth Analg 2004:56–8.
[5] Nolan JP, Deakin CD, Soar J. European resuscitation council guidelines for resuscitation. Resuscitation 2005;67:S39–86.
[6] Jacobs LM, Berrizbeitia LD, Bernnett B, Madigan C. Endotracheal intubation in the prehospital phase of emergency medical care. JAMA 1983:250.
[7] Steward RD, Paris PM, Winter PM. Field endotracheal intubation by paramedical personnel: success rates and complications. Chest 1984:85.
[8] Pointer JE. Clinical characteristics of paramedics' performance of endotracheal intubation. J Emerg Med 1988:6.
[9] Wang HE, Cook LJ, Chang CH, Yealy DM, Lave JR. Outcomes after out-of-hospital endotracheal intubation errors. Resuscitation 2009;80:50–5.
[10] Jemmet ME, Kendal KM, Fourre MW. Unrecognized misplacement of endotracheal tubes in a mixed urban to rural emergency medical services setting. Acad Emerg Med 2003;10:961–5.
[11] Jones JH, Murphy MP, Dickson RL. Emergency physician-verified out-of-hospital intubation: miss rates by paramedics. Acad Emerg Med 2004;11:707–9.
[12] Katz SH, Falk JL. Misplaced endotracheal tubes by paramedics in an urban emergency medical services system. Ann Emerg Med 2001;37:32–7.
[13] Wang HE, Lave JR, Sirion CA, Yealy M. Paramedic intubation errors: isolated events or symptoms of larger problems? Health Aff 2006;25:501–9.
[14] Webb RK, Walt JHVD, Runciman WB, Williamson JA, Cockings J, Russel WJ, et al. Which monitor? An analysis of 2000 incident reports. Anaesth Intensive Care 1993;21:529–42.
[15] Gravenstein JS, Jaffe MB, Paulus DA. Capnography: clinical aspects. Cambridge University Press; 2004.
[16] Bhende SM, Thompson AE. Evaluation of an end-tidal CO2 detector during pediatric cardiopulmonary resuscitation. Pediatrics 1995;95:395–9.
[17] Li J. Capnography alone is imperfect for endotracheal tube placement confirmation during emergency intubation. J Emerg Med 2001;20:223–9.
[18] O'Connor CJ, Mansy H, Balk RA, Tuman KJ, Sandler RH. Identification of endotracheal tube malpositions using computerized analysis of breath sounds via electronic stethoscopes. Anesth Analg 2005;101:735–9.
[19] Tejman-Yarden S, Zlotnik A, Weizman L, Tabrikian J, Cohen A, Weksler N, et al. Acoustic monitoring of lung sounds for the detection of one-lung intubation. Anesth Analg 2007;105:397–404.
[20] Tejman-Yarden S, Lederman D, Weksler N, Gurman G. Acoustic monitoring of double lumen ventilated lungs for the detection of selective unilateral lung ventilation. Anesth Analg 2006;103:1489–92.
[21] Lederman D. An energy ratio test for one lung intubation detection. In: 18th biennial international EURASIP conference; 2006.
[22] Weizman L, Tabrikian J, Cohen A. Detection of one-lung intubation incidents. Ann Biomed Eng 2008;36:1844–55.
[23] Lederman D, Shamir M. An endotracheal intubation confirmation based on image classification – a preliminary evaluation using a mannequin model. J Clin Monit Comput 2010;24:335–40.
[24] Deguchi D, Mori K, Feuerstein M, Kitasaka T, Maurer CR, Suenaga Y, et al. Selective image similarity measure for bronchoscope tracking based on image registration. Med Image Anal 2009;13:621–33.
[25] Greenspan H, Pinhas AT. Medical image categorization and retrieval for PACS using the GMM-KL framework. IEEE Trans Inf Technol Biomed 2007;11:190–202.
[26] Haralick RM, Shanmugam M, Dinstein I. Textural features for image classification. IEEE Trans Syst Man Cybern 1973;SMC-3:610–21.
[27] Clausi DA, Jernigan ME. A fast method to determine co-occurrence texture features. IEEE Trans Geosci Remote Sensing 1998;36:298–300.
[28] Rose RC, Reynolds DA. Text-independent speaker identification using automatic acoustic segmentation. In: IEEE int conf acoust, speech and signal proc; 1990. p. 293–6.
[29] Dempster AP, Laird NM, Rubin DB. Maximum likelihood from incomplete data via the EM algorithm. J Royal Stat Soc 1977;39:1–38.
[30] Verbeek J, Vlassis N, Krose B. Efficient greedy learning of Gaussian mixture models. Neural Comput 2003;15:468–85.
[31] Vlassis N, Likas A. A greedy EM algorithm for Gaussian mixture learning. Neural Process Lett 2002;15:468–85.
[32] Bilik I, Tabrikian J, Cohen A. GMM-based target classification for ground-surveillance Doppler radar. IEEE Trans Aerosp Electron Syst 2006;42:267–78.
[33] Deng H, Clausi DA. Gaussian MRF rotation-invariant features for image classification. IEEE Trans PAMI 2004;26:951–5.
[34] Greenspan H, Belongie S, Goodman R, Perona P. Rotation invariant texture recognition using a steerable pyramid. In: Proc 12th IAPR int conf pattern recognition; 1994. p. 162–7.
[35] Joachims T. Making large-scale SVM learning practical. In: Schölkopf B, Burges C, Smola A, editors. Advances in kernel methods – support vector learning. MIT Press; 1999.
[36] Weiss M, Hartmann K, Fischer J, Gerber AC. Video-intuboscopic assistance is a useful aid to tracheal intubation in pediatric patients. Can J Anaesth 2000.
[37] Weiss YG, Deutschman CS. The role of fiberoptic bronchoscope in airway management of the critically ill patient. Crit Care Clin 2000;16:445–51.