Automatic diagnosis of vocal fold paresis by employing phonovibrogram features and machine learning methods



Computer Methods and Programs in Biomedicine 99 (2010) 275–288

journal homepage: www.intl.elsevierhealth.com/journals/cmpb

Daniel Voigt a,∗, Michael Döllinger a, Anxiong Yang a, Ulrich Eysholdt a, Jörg Lohscheller b

a Department of Phoniatrics and Pediatric Audiology, University Hospital Erlangen, Bohlenplatz 21, D-91054 Erlangen, Germany
b University of Applied Science Trier, Department of Computer Science, Schneidershof, D-54293 Trier, Germany

Article info

Article history:
Received 13 July 2009
Received in revised form 2 November 2009
Accepted 9 January 2010

Keywords:
Computer-assisted diagnosis
Voice disorders
Vocal fold paresis
Phonovibrography
Feature extraction
Laryngeal high-speed video analysis

Abstract

The clinical diagnosis of voice disorders is based on examination of the rapidly moving vocal folds during phonation (f0: 80–300 Hz) with state-of-the-art endoscopic high-speed cameras. Commonly, analysis is performed in a subjective and time-consuming manner via slow-motion video playback and exhibits low inter- and intra-rater reliability. In this study an objective method to overcome this drawback is presented being based on Phonovibrography, a novel image analysis technique. For a collective of 45 normophonic and paralytic voices the laryngeal dynamics were captured by specialized Phonovibrogram features and analyzed with different machine learning algorithms. Classification accuracies reached 93% for 2-class and 73% for 3-class discrimination. The results were validated by subjective expert ratings given the same diagnostic criteria. The automatic Phonovibrogram analysis approach exceeded the experienced raters' classifications by 9%. The presented method holds a lot of potential for providing reliable vocal fold diagnosis support in the future.

© 2010 Elsevier Ireland Ltd. All rights reserved.

1. Introduction

Verbal communication plays an important part in the modern world. Hence, a vocal dysfunction seriously affects a person's social integration and perceived quality of life [1]. For the differentiation of normal and aberrant voices a variety of sophisticated examination techniques are employed clinically [2–6]. The application of a certain set of these diagnostic instruments is compulsory [7]: in order to make a proper diagnosis of a patient's voice functioning, the physician is obliged to follow this clinical guideline step by step. Usually the most revealing part of this diagnostic chain is the visual inspection of the two vocal folds (VFs), which are the main voice-producing elements in the larynx. The distinction between healthy and pathological laryngeal dynamics is commonly made based on the degree of symmetry and regularity of the VF vibrations during phonation [7]. However, examination is complicated by the fact that the VFs oscillate at a fundamental frequency of 80–300 Hz.

For the purpose of observing the rapidly moving VFs, various analysis approaches have been developed (e.g. stroboscopy [4] and videokymography [8]). To date, endoscopic high-speed (HS) cameras have turned out to be the most promising technology [5]. Nevertheless, the clinical assessment of HS recordings demands a high level of experience and time on the physician's part, as the human eye is far better adapted to process static visual information than moving images. As a consequence, the resulting subjective diagnoses are inherently error-prone and may vary significantly between different physicians examining the same patient. To remedy this weakness, different quantitative analysis approaches have been introduced enabling objective HS analysis [9–12]. Lately, Hilbert transform-based approaches [13], Nyquist plots [14], and methods from nonlinear systems analysis [15] have been suggested for this purpose. However, all of these methods lack the ability to analyze the complex VF oscillation pattern in its entirety.

A fast and clinically evaluated visualization method for capturing the whole spatio-temporal pattern of activity, namely Phonovibrography, has been recently introduced [16]. The VF deflections are automatically extracted from laryngeal HS recordings and transformed into a compact graphical representation, a so-called Phonovibrogram (PVG). In this manner the 2-d laryngeal dynamics can be inspected at a level of detail that could not be achieved by previous 1-d signal analysis approaches [10]. Aside from being a valuable diagnostic tool [17], the PVG characteristics can in turn be analyzed and described by numerical features to capture the underlying VF vibration properties. The resulting feature vectors are then taken as a basis to build classification models allowing for the reliable automatic identification of voice disorders. In doing so, valuable decision support is provided to the physician in a timely manner, enabling objectification of voice diagnosis in a clinical setting. Thus, currently available subjective examination methods (e.g. stroboscopy) can be replaced by a more sophisticated and trustworthy approach, paving the way for the clinical realization of evidence-based medicine.

In this study, clinical HS recordings of 45 female subjects with normal voices and unilateral VF pareses are processed with a novel feature extraction method for describing PVG dynamics. The contained pathological concepts are modeled using a selection of supervised machine learning (ML) algorithms in combination with an evolutionary parameter optimization strategy. The test data are classified according to different clinically meaningful decision tasks and the corresponding cross-validated accuracies are determined. Given the available clinical data set, the objective is to assess the basic classification capability of the approach, to identify the most appropriate ML algorithm for modeling, and to evaluate certain PVG feature extraction parameters. A further set of conventional 1-d features is analyzed to estimate the performance gain achieved through the novel PVG features. In order to compare the diagnostic reliability of the objective analysis approach with the subjective results of six expert raters, an additional validation is conducted using the same HS sequences underlying this study.

∗ Corresponding author. Tel.: +49 9131 85 32602. E-mail address: [email protected] (D. Voigt).
0169-2607/$ – see front matter © 2010 Elsevier Ireland Ltd. All rights reserved. doi:10.1016/j.cmpb.2010.01.004

2. Material and methods

2.1. Clinical data

To provide a gold standard for subsequent data analysis, the laryngeal dynamics of a collective of 45 female test persons (median age: 44.4 years) were thoroughly assessed by experienced physicians according to the standardized examination procedure of the European Laryngological Society (ELS) [7]. This protocol comprises a variety of examination approaches: auditory-perceptual assessment, video-laryngoscopic examination, aerodynamic and acoustic analysis, and self-rating by the patient. For a third of the women no evidence of an impaired voice was found; hence, they constituted the healthy control group. The remaining 30 cases showed a variety of manifestations of VF paresis, ranging from mildly to severely impaired. This relatively prevalent voice disorder involves the degradation of one VF side's vibratory properties, commonly caused by neural damage. For one half of the examined patients mainly the left VF side was affected by paresis, while for the other half the pathology was identified at the right side. In contrast to organic voice disorders (e.g. nodule, polyp, edema), which can be assessed quite reliably by analyzing laryngeal still images [18,19], it is widely held that an appropriate paralytic diagnosis can only be made by, among other things, considering the overall oscillation behavior of the VFs during phonation and relating both sides' dynamics to each other.

2.2. Phonovibrography

The VF movements of all patients were digitally recorded with an endoscopic HS camera system (Wolf Highspeed Endocam 5542) during production of the sustained vowel /a/, enabling the analysis of laryngeal vibration properties. The camera takes 8-bit grayscale images at a rate of 4000 frames per second with a spatial resolution of 256 × 256 pixels. The high temporal sampling frequency is necessary due to the fact that the fundamental frequency of normal speakers ranges approximately from 80 to 300 Hz. During examination the endoscope is inserted orally, providing a top view of the larynx (see Fig. 1a + b). The following image quality criteria had to be met to yield adequate HS recordings for diagnosis: sufficient lighting conditions, overall image sharpness, and complete visibility of both VF edges.

The captured HS sequence consisting of 500 laryngeal images (see Fig. 1b) was subsequently processed using a specifically designed region-growing algorithm [17], which ensures reliable image segmentation under realistic clinical conditions. As a result, the opening formed between the left and the right VF, denoted as glottis, was detected for each HS image. Based on the detected glottal area, the lateral position of both VFs was determined in a robust manner for the entire HS sequence and no corrections were applied [17]. The corresponding glottal midline connecting the most dorsal and the most ventral point of the detected glottal area (see Fig. 1b) is divided into equidistant intervals. Thus, the deflection of the $i$-th VF point at a certain point in time $t$ is characterized by its perpendicular distance $d(t, i)$ to the glottal midline. By considering the sequence of these displacement values through time, a so-called trajectory is obtained. To distinguish the trajectories of the two VF sides, an additional index $\alpha \in \{\mathrm{Left}, \mathrm{Right}\}$ is introduced.

The determined VF deflections $d_\alpha(t, i)$ are subsequently transformed into a novel 2-d color-image representation, denoted as PVG [16]. To this end the extracted VF distances are interpreted as color intensity information: the brighter the color of a certain PVG image point, the farther away the corresponding VF point is from the glottal axis (see Fig. 1c). Usually a PVG consists of three distinct colors (red, black, blue), but the grayscale representation shown here illustrates the basic idea. The deflections of the left and right VF side towards the glottal midline are shown in the upper and lower half of a PVG, respectively. While a single PVG column displays the overall deflection along the entire VF length at a certain point in time, a single PVG row represents the trajectory of an individual VF point.

Taking this compact PVG representation of spatial and temporal VF changes as a basis for diagnostic decision making, the physician can easily gain insight into the complexities of the underlying oscillation patterns and assess clinical evidence of voice pathologies. Thus, the stability and symmetry of the laryngeal dynamics as a whole can be quickly examined in an intuitive manner. The various VF examination and visualization techniques available so far do not allow for such differentiated analyses. Hence, the PVG can be of great practical help in the clinical workflow, as it supersedes costly slow-motion playback of HS videos. Nevertheless, this particular kind of PVG analysis is still based on subjective perception and may vary significantly between different physicians.

Fig. 1 – Grayscale PVG representation (c) of both VFs' oscillations over time as captured in endoscopic HS recordings (a + b). The changing VF deflections yield characteristic PVG patterns, e.g. dorsal triangular shapes. The letters "A" and "P" denote the anterior and posterior detected endings of the glottis, "L" and "R" the left and right VF side.
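The mapping from deflections to a PVG image described in this section can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function name `build_pvg`, the nested-list input layout, the assumption of non-negative distances, and the scaling to the range 0 to 1 are all choices made here for illustration, and the flipping and color coding used in real PVGs are left out.

```python
def build_pvg(d_left, d_right):
    """Stack left and right VF deflections into one intensity matrix.

    d_left, d_right: one list per frame t, each holding the distances
    d(t, i) of the N sampled VF points to the glottal midline
    (assumed non-negative).
    Returns a matrix with 2N rows (VF points) and T columns (time);
    a brighter value (closer to 1) means a larger distance to the midline.
    """
    if not d_left or len(d_left) != len(d_right):
        raise ValueError("both sides need the same, non-zero number of frames")
    n_points = len(d_left[0])
    peak = max(max(max(f) for f in d_left), max(max(f) for f in d_right))
    scale = 1.0 / peak if peak > 0 else 0.0

    def side_rows(frames):
        # Transpose: one row per VF point, i.e. its trajectory over time.
        return [[frame[i] * scale for frame in frames] for i in range(n_points)]

    # Assumption: left side in the upper half, right side in the lower half.
    return side_rows(d_left) + side_rows(d_right)
```

Each row of the returned matrix is the trajectory of one VF point and each column is one recorded frame, which mirrors the row and column interpretation given in the text.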

2.3. Feature extraction

To objectify the criteria underlying the clinical diagnosis of voice disorders using HS videos, the resulting PVG data matrix was subsequently analyzed to extract a set of numerical features describing the contained laryngeal movement information. As VF vibrations exhibit periodically recurring movement patterns (see Fig. 1c), it is appropriate to take these individual oscillation cycles, each comprising distinct opening and closing phases, as a starting point for further analysis. Hence, the first step of feature extraction consisted in automatically detecting the individual boundaries of the VF oscillation cycles by analyzing the glottal signal [20]. The detected cycles were displayed and inspected to ensure the validity of the subsequently derived features. Then, the obtained PVG cycles were normalized to a constant width and height [16] to allow for better inter-individual comparability of the derived features. In doing so, the influence of varying phonation frequencies between test persons and of different endoscopic distances to the VFs is eliminated. See Fig. 2 for exemplary normalized PVG cycles extracted from healthy and pathological VF vibrations.

Fig. 2 – Normalized PVG cycles of the left and right VF side and the corresponding extracted contour lines (h = 50%) for two individuals (a: normophonic, b: pathological).

The dynamic opening and closing properties of the VFs during oscillation are characterized through the geometric shapes within the PVG cycles, which can be seen in Fig. 2 middle. Here, according to the ELS classification given in [7], the depicted movement patterns exhibit dorsal triangular (a) and oval (b) shapes, indicating different modes of VF movement. This relevant shape information can be represented using so-called contour lines. These contours connect equivalent VF deflection states in the opening and closing phase of an oscillation cycle (see Fig. 2 right). For each trajectory, the two points in time were determined at which a defined deflection state $d_h$ is reached within the cycle, e.g. half of its displacement ($h = 50\%$) as shown in Fig. 2 right. This establishes a relationship between the dynamics of different anatomical VF sections, which incorporates both spatial and temporal information. This particular in-depth description of VF oscillation patterns could not be obtained before the availability of the PVG visualization technique.

A single PVG contour line is specified here by 256 individual contour points, each consisting of a temporal (x-axis), a longitudinal (y-axis), and a lateral position (color intensity) within the considered cycle. As adjacent contour points exhibit a strong correlation due to anatomical reasons, the amount of information used can be reduced by data aggregation without losing diagnostic evidence. For this purpose, the VF dynamics were summarized by averaging over eight intervals comprising 32 contour points each. Hence a simplified PVG contour line $C^{\alpha}_{h}$ was obtained, which approximates the spatio-temporal changes of an entire VF section instead of giving an isolated description of individual VF points.

The question of which particular oscillation state is most adequate for describing VF dynamics is still open, and the contour lines extracted from PVG cycles can be used as a means to address it. PVG contour lines were extracted at five different contour heights $h = [10; 30; 50; 70; 90]\,\%$, each capturing the dynamics of a different VF oscillation state. These deflection states are located between the extremes of minimum ($h = 0\%$) and maximum VF displacement ($h = 100\%$). Contours $C^{\alpha}_{[10;90]\%}$ near these extreme oscillation states are denoted as peripheral contours, while contours $C^{\alpha}_{[30;50;70]\%}$ located around the middle of the oscillation phase are denoted as intermediate contours.

In this manner, contour line features $C^{\mathrm{Left}}_{h}$ and $C^{\mathrm{Right}}_{h}$ were obtained for each PVG cycle and for both VF sides separately. In order to establish a connection between the two lateral descriptions and to capture the clinically relevant information on VF symmetry, proportions $P_h = C^{\mathrm{Left}}_{h} / C^{\mathrm{Right}}_{h}$ and distance-based similarities $D_h = \lVert C^{\mathrm{Left}}_{h} - C^{\mathrm{Right}}_{h} \rVert_2$ between corresponding left and right VF side were determined for each simplified contour line. The temporal changes of the VF properties at the PVG level were captured by computing the mean and the standard deviation of the derived contour and symmetry features ($\mu(C^{\alpha}_{h}), \sigma(C^{\alpha}_{h});\ \mu(P_h), \sigma(P_h);\ \mu(D_h), \sigma(D_h)$). The mean describes the average vibratory behavior of the VFs and the corresponding standard deviation quantifies the deviation from this average movement pattern in time. In doing so, PVG feature sets $F_{C10-90}$ were obtained, each comprising 448 quantitative features in total.

In order to evaluate the gain achieved through these new PVG-based features with respect to conventional VF description approaches, an additional feature set was derived from the HS videos which served as a benchmark for performance. The determined glottal features $F_G$ were [10]:

• open quotient (OQ), quantifying the proportion of time the glottis is open during an oscillation cycle,
• speed quotient (SQ), representing the temporal proportion between the opening and closing phase of a cycle,
• glottal insufficiency (GI), capturing the relation between a cycle's minimum and maximum glottal opening,
• time periodicity index (TPI), describing the temporal stability of the cycle duration,
• and amplitude periodicity index (API), measuring a VF's deflection stability.

Since these features are directly computed from the 1-d glottal signal, they do not, in principle, allow for the differentiation of the two VF sides, nor do they capture the dynamics of distinct VF sections. The glottal description approach only considers minimum and maximum glottal deflection states, in contrast to PVG features, which also incorporate intermediate deflection states. This set of conventional glottal features is the standard approach for analyzing VF movements from HS recordings.
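The contour and symmetry features of this section can be sketched roughly as follows. This is a simplified illustration under stated assumptions, not the published procedure: the deflection state $d_h$ is taken here as the fraction `h` of each trajectory's own maximum, only the temporal positions of the contour points are kept, deflections are assumed non-negative, and the function names `contour_times`, `aggregate` and `symmetry` are hypothetical.

```python
import math

def contour_times(cycle, h):
    """cycle: one normalized PVG cycle of one VF side, one row per VF point.
    Returns per row (t_open, t_close): the first and last sample index at
    which the deflection is at least h times the row maximum (assumption)."""
    points = []
    for row in cycle:
        level = h * max(row)
        above = [t for t, d in enumerate(row) if d >= level]
        points.append((above[0], above[-1]))
    return points

def aggregate(points, n_intervals=8):
    """Average opening and closing times over groups of adjacent VF points,
    giving a simplified contour line (two values per interval)."""
    size = len(points) // n_intervals
    contour = []
    for k in range(n_intervals):
        chunk = points[k * size:(k + 1) * size]
        contour.append(sum(p[0] for p in chunk) / size)
        contour.append(sum(p[1] for p in chunk) / size)
    return contour

def symmetry(c_left, c_right):
    """Element-wise proportions P_h and Euclidean distance D_h."""
    p = [l / r if r != 0 else float("nan") for l, r in zip(c_left, c_right)]
    d = math.sqrt(sum((l - r) ** 2 for l, r in zip(c_left, c_right)))
    return p, d
```

With 256 rows and `n_intervals=8`, each interval averages 32 contour points, as in the text. Computing the mean and standard deviation of these values across all cycles of a recording would then give the per-patient features.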

2.4. Data analysis setup

The derived feature sets were subsequently taken as a basis for inductively building models of normal and pathological VF vibrations, which can potentially support clinical decision making. For this purpose each patient's VF oscillation patterns were represented by numerical feature vectors computed from the corresponding PVG and glottis signal. By attaching the underlying clinical diagnoses to this representative set of feature vectors, a training set was obtained which was then analyzed with different supervised ML techniques. The objective was to identify the algorithm which allows for the most reliable differentiation of VF movement patterns. The following ML approaches were employed [21]:

• k-nearest neighbor algorithm (k-NN) with Euclidean norm,
• C4.5 decision tree with information gain splitting criterion,
• multilayer perceptron (MLP) with backpropagation (one hidden layer with (# features + # classes)/2 nodes with sigmoid activation function),
• soft-margin support vector machine (SVM) with 1st, 2nd, 3rd-order polynomial and radial basis function (RBF) kernels.

These ML algorithms automatically analyze the multidimensional feature space in order to identify structural interdependencies between features that can be exploited for the distinction of different VF vibration classes. In this study the following clinically relevant classification tasks were examined, wherein each class comprised 15 cases: Healthy vs. ParesisL, Healthy vs. ParesisR, Healthy vs. Pathological, ParesisL vs. ParesisR, and Healthy vs. ParesisL vs. ParesisR. The first three tasks address the issue of distinguishing two classes: normal and aberrant VF vibrations. While problems Healthy vs. ParesisL and Healthy vs. ParesisR consider the position of the affected VF sides explicitly, Healthy vs. Pathological does not maintain this lateral information. As such, the first three tasks are used to assess the basic capability of the classification approach to differentiate movements of healthy and impaired VFs. The data used for class Pathological form a merged and undersampled class containing randomly selected cases from classes ParesisL and ParesisR. Classification task ParesisL vs. ParesisR evaluates how well the two types of pareses can be distinguished from each other. Additionally, the performance obtained for the 3-class problem Healthy vs. ParesisL vs. ParesisR (solved using a "one-against-one" classification scheme with majority vote [22]) gives information on the overall distinguishability of the whole training set. In order to obtain reliable classification results unbiased by divergent class distributions [23], only balanced class configurations were considered in this study. The classification tasks were analyzed with different ML methods using the novel PVG contour features $F_{C10-90}$ and glottal features $F_G$.
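As an illustration of how the five balanced classification tasks could be assembled from labeled cases, consider the sketch below. The function name `build_tasks`, the dictionary layout, and in particular the near-even split of the undersampled Pathological class between left and right pareses are assumptions made here; the text only states that the cases were selected randomly.

```python
import random

def build_tasks(labels, seed=0):
    """labels: dict mapping case id -> 'Healthy', 'ParesisL' or 'ParesisR'.
    Returns a dict: task name -> {class name: list of case ids}."""
    by_class = {"Healthy": [], "ParesisL": [], "ParesisR": []}
    for case_id in sorted(labels):
        by_class[labels[case_id]].append(case_id)
    healthy = by_class["Healthy"]
    left = by_class["ParesisL"]
    right = by_class["ParesisR"]

    # Undersample the merged paresis cases to the size of the healthy class.
    # Assumption: roughly half of the cases are drawn from each side.
    rng = random.Random(seed)
    n = len(healthy)
    pathological = rng.sample(left, n // 2) + rng.sample(right, n - n // 2)

    return {
        "Healthy vs. ParesisL": {"Healthy": healthy, "ParesisL": left},
        "Healthy vs. ParesisR": {"Healthy": healthy, "ParesisR": right},
        "Healthy vs. Pathological": {"Healthy": healthy,
                                     "Pathological": pathological},
        "ParesisL vs. ParesisR": {"ParesisL": left, "ParesisR": right},
        "Healthy vs. ParesisL vs. ParesisR": {"Healthy": healthy,
                                              "ParesisL": left,
                                              "ParesisR": right},
    }
```

Calling `build_tasks` with several different `seed` values yields different Pathological class configurations, which is one way to repeat the evaluation of that task over multiple random draws.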
Each patient's laryngeal dynamics were thus described by six different feature vectors, as the PVG contours were extracted at different heights $h$. By exclusively using one feature description for learning at a time and then comparing the resulting model accuracies, the adequacy of the feature sets could be evaluated. In addition to considering the feature sets individually, all possible combinations of paired features were also examined to assess their capability to complement each other. For example, $F_{C10+90}$ denotes the combined feature set obtained by joining feature vectors from $F_{C10}$ and $F_{C90}$. Hence, 15 additional feature combinations were analyzed in this study.

Most of the employed ML algorithms possess free parameters which affect the overall training process and the classification performance of the resulting models. For the purpose of finding an appropriate ML parameter combination, a heuristic search strategy was used, namely the $(\mu + \lambda)$ evolution strategy of Schwefel and Rechenberg [24]. By repeatedly applying a sequence of elitist and tournament selection (fraction: 0.25), recombination (probability: 0.9) and Gaussian mutation to a set of five individuals representing potential problem solutions, the population's overall fitness (w.r.t. classification accuracy) is assessed and its internal structure is accordingly adapted. In this manner, after 10 iterations of the evolutionary modification scheme, suitable parameter settings were obtained for the individual ML algorithms as shown in Table 1. All remaining parameters were kept fixed at their standard values.

Table 1 – Optimized parameters of the ML algorithms including the corresponding parameter range searched using the evolution strategy approach.

| ML algorithm               | Optimized parameters | Considered range    |
|----------------------------|----------------------|---------------------|
| k-NN                       | neighborhood k       | 1.0 ... 20.0        |
| C4.5                       | minimal gain g       | 0.0 ... 10.0        |
|                            | pruning confidence c | 1.0 × 10^-7 ... 0.5 |
| MLP                        | learning rate r      | 0.0 ... 1.0         |
|                            | momentum m           | 0.0 ... 1.0         |
| SVM with polynomial kernel | cost C               | 0.0 ... 10000.0     |
| SVM with RBF kernel        | cost C               | 0.0 ... 10000.0     |
|                            | RBF width            | 0.0 ... 100.0       |

Due to the above-mentioned quality requirements concerning the HS recordings and the resulting limited amount of data, a stratified 10-fold cross-validation method (10CV) was used in this study for the evaluation of classification performance [25]. As random influences occur during splitting of the training data into individual folds, the stability of the obtained performance measure was increased by repeating the 10CV evaluation process three times with differing random seed values for each classification task. A reliable estimate for classification accuracy was obtained by averaging over the individual results. In addition, classification task Healthy vs. Pathological was evaluated multiple times using differing class configurations, since it comprises randomly undersampled and then merged data from the pathological class.
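The repeated stratified cross-validation described above can be sketched in a few lines. This is a generic illustration rather than the toolkit used in the study: the helper names `stratified_folds` and `repeated_cv_accuracy` are hypothetical, the round-robin fold assignment is one of several possible stratification schemes, and `evaluate` stands for any user-supplied function that trains a model on the training indices and returns its accuracy on the test indices.

```python
import random

def stratified_folds(labels, k=10, seed=0):
    """Split case indices into k folds with classes spread as evenly as possible."""
    rng = random.Random(seed)
    by_class = {}
    for idx, lab in enumerate(labels):
        by_class.setdefault(lab, []).append(idx)
    folds = [[] for _ in range(k)]
    offset = 0
    for lab in sorted(by_class):
        members = by_class[lab]
        rng.shuffle(members)
        # Deal the shuffled cases of this class round-robin over the folds,
        # continuing where the previous class stopped.
        for j, idx in enumerate(members):
            folds[(offset + j) % k].append(idx)
        offset += len(members)
    return folds

def repeated_cv_accuracy(labels, evaluate, k=10, repeats=3):
    """Average the fold accuracies over `repeats` runs with different seeds."""
    scores = []
    for seed in range(repeats):
        for test_idx in stratified_folds(labels, k, seed):
            held_out = set(test_idx)
            train_idx = [i for i in range(len(labels)) if i not in held_out]
            scores.append(evaluate(train_idx, test_idx))
    return sum(scores) / len(scores)
```

For a balanced 2-class task with 15 cases per class, each of the ten folds then holds three cases and contains both classes.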

2.5. Expert validation

To assess the achieved performance of the proposed PVG classification system, an additional expert validation of the diagnostic decisions was conducted. This further evaluation step was necessary because the automatically determined classifications cannot be directly related to the gold standard diagnoses: apart from the vibratory and symmetrical properties of the VFs (as captured by PVG features in this study), the physician considers further physiological evidence to make an appropriate diagnosis (e.g. the VFs' normal position and transient oscillations). To re-evaluate the stability of the physicians' judgment, given the same analytic criteria as in the automatic processing of the data, the same 45 HS sequences underlying this study were presented to six clinical experts (R1–6). The task posed to each expert was to assign the laryngeal HS videos to one of the three classes subjectively. This particular decision-making process corresponds to classification task Healthy vs. ParesisL vs. ParesisR of this study. All additional image information hinting at the diagnosis to be made was removed beforehand, so only VF movement patterns were available as the main basis for subjective classification of the patients' HS sequences. As a means to evaluate the intra-rater reliability of the experts, a selection of 6 out of the 45 available HS videos (2 for each class) was presented twice and the corresponding diagnoses were subsequently compared to each other. Thus, in total 51 HS recordings were rated by the clinical experts.

3. Results

The cross-validated classiﬁcation accuracies achieved for the different employed learning schemes, feature descriptions and decision problems are presented in the following. As the number of individual classiﬁcation results obtained is high, only averaged accuracy values and standard deviations are shown.

3.1. Machine learning performance

The classification accuracies of the different ML approaches are presented in Fig. 3. These results indicate the learning algorithms' general capability to model class membership. The shown accuracies are averaged over PVG contour features $F_{C10-90}$ and 2-class classification tasks Healthy vs. ParesisL, Healthy vs. ParesisR, and ParesisL vs. ParesisR. These particular feature sets and decision problems were selected to ensure consistency between the outcomes of different experiments and to allow for the direct comparison of the employed ML algorithms' performance. In the experiments the best results were achieved by SVMs. As shown in Fig. 3, classification accuracies of around 82% were obtained on average through this kernel-based ML technique. The decision problems could be solved most reliably using an SVM with 1st-order polynomial kernel function. This soft-margin linear separation of the PVG feature space yielded classification results as good as 85%, significantly exceeding (p < 0.05) the other inductive learning approaches (except SVM with quadratic kernel; p = 0.15). For this reason all results presented in the following were obtained exclusively with this best performing classifier, the SVM with linear kernel. The accuracy values for C4.5, MLP, and SVM with 3rd-order polynomial and RBF kernel were around 80%. The significantly lowest classification accuracy of this study (p < 0.05) was obtained with the k-NN algorithm, whose instance-based learning scheme yielded 74%. Table 2 lists the individual feature sets which yielded the best and worst classification performance for each ML algorithm.

3.2. Feature set suitability

From the average classiﬁcation accuracies of the individual feature sets FC10−90 depicted in Fig. 4 the suitability of the different PVG and glottal description approaches concerning class discrimination can be assessed.


Fig. 3 – Classiﬁcation results obtained for different ML algorithms averaging over PVG feature sets FC10−90 and classiﬁcation tasks Healthy vs. ParesisL, Healthy vs. ParesisR, and ParesisL vs. ParesisR. Vertical bars indicate the accuracies’ standard deviations.

The intermediate PVG contours $F_{C30-70}$ yielded the best classification accuracy within this study (approximately 87%), although the difference was not statistically significant (p > 0.05). The remaining peripheral PVG contour features $F_{C10}$ and $F_{C90}$ yielded lower accuracies (82% and 84%). The conventional glottal description $F_G$ yielded the weakest performance of all considered feature sets (approximately 76%).

The corresponding classification accuracies obtained from paired feature set combinations are given as radar charts in Fig. 5. The results are grouped from left to right according to the first combined feature. For each chart the radii of the semicircle indicate the accuracy level achieved for the respective feature set combination. The orientation of the radius indicates, in clockwise order, the second feature set used for pairing: while the bottommost spoke illustrates the result of feature combination $F_{C\ldots+10}$, the topmost spoke shows accuracies for $F_{C\ldots+90}$. The individual radius corresponding to a pair of identical features (e.g. $F_{C10+10} = F_{C10}$) is emphasized. As a point of reference for the gain achieved for each combined feature set, the individual feature set results are depicted as dotted half circles. In case of accuracy improvement the performance area gained is highlighted gray. The individually best performing feature set combination is marked with a small circle on the corresponding radius and an additional line indicating the best overall classification result. Since each feature set can be placed as the first and as the second part of a combination, each combined classification result is shown twice in Fig. 5: e.g. the accuracy obtained from paired feature set $F_{C10+90}$ is given in groups $F_{C10+\ldots}$ and $F_{C90+\ldots}$.

By combining PVG contours extracted at lower heights with features describing higher contours, the results of the individual features could be significantly improved (p < 0.05; e.g. illustrated by the accumulated gray area in the upper part of radar chart $F_{C10+\ldots}$). In particular, feature set combination $F_{C30+70}$ achieved a classification accuracy of around 90%, outperforming all other feature pairs. The accuracy of best performing individual feature set $F_{C50}$ could not be increased further by paired combinations. In Fig. 6 the classification performance achieved through combination of the two different VF description approaches is shown. A highly significant boost (p < 0.001) compared to its individual performance was obtained for all pairs of glottal features $F_G$ with PVG contour features $F_{C10-90}$.

Table 2 – Best and worst performing feature sets for the individual ML algorithms obtained by averaging over classification tasks Healthy vs. ParesisL, Healthy vs. ParesisR, and ParesisL vs. ParesisR.

| ML algorithm   | Best feature set | Worst feature set |
|----------------|------------------|-------------------|
| k-NN           | FC10             | FG                |
| C4.5           | FC90             | FG                |
| MLP            | FC70             | FG                |
| SVM (1st poly) | FC50             | FG                |
| SVM (2nd poly) | FC70             | FG                |
| SVM (3rd poly) | FC70             | FG                |
| SVM (RBF)      | FC70             | FG                |
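The count of 15 paired feature sets follows from choosing 2 of the 6 individual sets (five PVG contour heights plus the glottal set), i.e. 6 · 5 / 2 = 15. A small sketch of how such pairs could be enumerated and joined is given below; the list `FEATURE_SETS`, the function name, and the `"A+B"` key format are illustrative choices, not notation from the study.

```python
from itertools import combinations

# The six individual feature descriptions used per patient.
FEATURE_SETS = ["FC10", "FC30", "FC50", "FC70", "FC90", "FG"]

def paired_feature_sets(vectors):
    """vectors: dict mapping feature set name -> feature vector (list).
    Returns a dict mapping 'A+B' -> concatenated vector for every unordered pair."""
    return {
        f"{a}+{b}": vectors[a] + vectors[b]
        for a, b in combinations(FEATURE_SETS, 2)
    }
```

Because the pairs are unordered, each combination is produced once, whereas the radar charts show every pair twice, once per group.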

3.3. Class discrimination capability

The general classification capability of the presented PVG analysis approach for the individual decision tasks can be assessed from the accuracy results of best performing feature set combination $F_{C30+70}$ as shown in Fig. 7. Since problem Healthy vs. ParesisL vs. ParesisR involves discriminating three balanced classes, the baseline classification accuracy which can be achieved by assigning all cases to the same class amounts to 33.3%. This is opposed to a 50% baseline accuracy for the remaining balanced 2-class problems (see dashed horizontal lines). Results of 2-class decision tasks Healthy vs. ParesisL, Healthy vs. ParesisR, and Healthy vs. Pathological were as good as 93% on average. For the first classification task a decision accuracy of over 95% was achieved, yielding the highest reliability of all examined problems. For the task of identifying the VF side which is actually affected by paresis (ParesisL vs. ParesisR) a reduced accuracy of approx. 81% was obtained. This issue of assigning pareses laterally is also reflected in the results of problem Healthy vs. ParesisL vs. ParesisR (73%): a closer look at the averaged confusion matrix of the corresponding results given in Table 3 reveals that over 62% of misclassifications occurred between the two paralytic classes (highlighted gray).

Fig. 4 – Classification results obtained for different PVG contour features $F_{C10-90}$ (light gray) and glottal features $F_G$ (dark gray) averaging over classification tasks Healthy vs. ParesisL, Healthy vs. ParesisR, and ParesisL vs. ParesisR and using SVM with linear kernel for learning.

Fig. 5 – Classification results obtained for PVG contour feature set combinations averaging over problems Healthy vs. ParesisL, Healthy vs. ParesisR, and ParesisL vs. ParesisR and using SVM with linear kernel for learning. A single radar chart is organized as follows: each spoke represents a feature set combination; the radius of the semicircle depicts the classification accuracy achieved using a certain combination; while the small circle marks the best performing feature set combination, the additional small line indicates the best overall classification result; the corresponding individual feature set's performance is displayed as a dotted half circle; an accuracy improvement over the individual results is indicated gray. For the sake of clarity, standard deviations are not depicted.

3.4. Expert validation

In Fig. 8 the individually achieved subjective classification accuracies for the six expert raters R1–6, the corresponding averaged result R1−6 and standard deviation are depicted.

Table 3 – Averaged confusion matrix for classification task Healthy vs. ParesisL vs. ParesisR (expressed as percentage of 45 cases in total) obtained from PVG contour feature combination FC30+70 using SVM with linear kernel for learning. Rows give the suspected class, columns the true class.

                  True class
Suspected class   Healthy   ParesisL   ParesisR
Healthy           30.37%    2.96%      4.44%
ParesisL          1.48%     21.48%     8.15%
ParesisR          1.48%     8.89%      20.74%

62.2% of misclassifications between ParesisL and ParesisR
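The summary figures quoted for Table 3 can be recomputed from its entries. The following is a minimal sketch, assuming that each printed triple of values belongs to one true class (an assumption supported by each triple summing to about 33.3%, as expected for three balanced classes); the function names are illustrative and not taken from the paper.

```python
# Table 3 as percentages of all 45 cases.
# Rows: suspected class, columns: true class (order: Healthy, ParesisL, ParesisR).
TABLE_3 = [
    [30.37, 2.96, 4.44],
    [1.48, 21.48, 8.15],
    [1.48, 8.89, 20.74],
]


def overall_accuracy(matrix):
    """Sum of the diagonal (correct decisions), in percent of all cases."""
    return sum(matrix[i][i] for i in range(len(matrix)))


def lateral_error_share(matrix, left=1, right=2):
    """Percentage of all misclassifications that confuse the two paretic classes."""
    total_error = sum(sum(row) for row in matrix) - overall_accuracy(matrix)
    lateral_error = matrix[left][right] + matrix[right][left]
    return 100.0 * lateral_error / total_error


accuracy = overall_accuracy(TABLE_3)
share = lateral_error_share(TABLE_3)
print(f"accuracy: {accuracy:.1f}%, lateral share of errors: {share:.1f}%")
```

Under this reading the diagonal sums to roughly 73% and the two ParesisL/ParesisR cells account for roughly 62% of all errors, in line with the values stated in the text.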


The average classification accuracy achieved by the clinical experts when confronted with similar diagnostic conditions was 64%. The SVM classification results clearly outperformed the overall ratings of all but one expert (R5). Besides exhibiting rather low inter-rater reliability (κ = 0.35), the experts also showed reduced intra-rater reliability (approximately 75%). Healthy cases could be identified by all experts more reliably than pathological cases (see Fig. 8). The resulting averaged confusion matrix in Table 4 additionally points out that classes ParesisL and ParesisR were especially hard for the experts to distinguish, together accounting for more than 53% of misclassifications (highlighted gray). This finding is in accordance with the SVM results given in Table 3.
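The observation that healthy cases were recognized more reliably than paretic ones can also be read off Table 4. The sketch below assumes that each triple of values in Table 4 belongs to one true class and lists the suspected classes in the order Healthy, ParesisL, ParesisR; the resulting per-class rates are derived from the table only and need not coincide exactly with the bars in Fig. 8.

```python
# Table 4 as percentages of all rated cases, one triple per true class.
# Each triple lists the suspected classes in the order Healthy, ParesisL, ParesisR.
TABLE_4_BY_TRUE_CLASS = {
    "Healthy": [26.98, 4.13, 2.22],
    "ParesisL": [6.03, 19.37, 7.94],
    "ParesisR": [4.76, 11.75, 16.83],
}
CLASS_ORDER = ["Healthy", "ParesisL", "ParesisR"]


def per_class_recall(table):
    """Percentage of each true class that the raters labeled correctly."""
    recalls = {}
    for true_class, suspected in table.items():
        correct = suspected[CLASS_ORDER.index(true_class)]
        recalls[true_class] = 100.0 * correct / sum(suspected)
    return recalls


recalls = per_class_recall(TABLE_4_BY_TRUE_CLASS)
for name in CLASS_ORDER:
    print(f"{name}: {recalls[name]:.1f}% of true cases recognized")
```

With these numbers the healthy class is recognized in about four out of five cases, while both paretic classes lie markedly lower.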

Fig. 6 – Classiﬁcation results obtained for all paired feature set combinations of PVG contours and glottal parameters averaging over problems Healthy vs. ParesisL, Healthy vs. ParesisR, and ParesisL vs. ParesisR and using SVM with linear kernel for learning. For an explanation of the radar chart see Fig. 5.

Since the decision task posed to the experts corresponds to 3-class classiﬁcation problem Healthy vs. ParesisL vs. ParesisR analyzed by the ML algorithms, the average accuracy of the best performing SVM with linear kernel is also given as classiﬁcation benchmark (dashed horizontal line). The performance of the objective PVG classiﬁcation approach can thus be put into context with the experts’ subjective ratings when the only diagnostic information being available is VF vibration. Moreover, in Fig. 8 bottom the achieved inter-rater reliabilities (expressed as generalized Cohen’s kappa [26]) and intra-rater reliabilities (expressed as percentages of agreement) are presented.
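To make the two reliability measures concrete, the sketch below computes Fleiss' kappa, one common multi-rater generalization of Cohen's kappa, and a plain percentage of agreement between two rating passes. It is an illustration under assumptions: the exact kappa formulation used in the study follows [26] and may differ, and the labels in the usage lines are hypothetical, not study data.

```python
from collections import Counter


def fleiss_kappa(ratings, categories):
    """Fleiss' kappa for cases that were each rated by the same number of raters.

    ratings[i][j] is the label that rater j assigned to case i.
    Undefined (division by zero) if all ratings fall into a single category.
    """
    n_cases = len(ratings)
    n_raters = len(ratings[0])
    counts = [Counter(case) for case in ratings]

    # Observed agreement: fraction of agreeing rater pairs, averaged over cases.
    p_obs = sum(
        sum(c[k] * (c[k] - 1) for k in categories) / (n_raters * (n_raters - 1))
        for c in counts
    ) / n_cases

    # Chance agreement from the pooled category proportions.
    p_exp = sum(
        (sum(c[k] for c in counts) / (n_cases * n_raters)) ** 2 for k in categories
    )
    return (p_obs - p_exp) / (1.0 - p_exp)


def percent_agreement(first_pass, second_pass):
    """Intra-rater agreement: share of cases labeled identically in both passes."""
    same = sum(a == b for a, b in zip(first_pass, second_pass))
    return 100.0 * same / len(first_pass)


# Hypothetical labels for illustration only (H = Healthy, L = ParesisL, R = ParesisR).
labels = ["H", "L", "R"]
full_agreement = fleiss_kappa([["H"] * 3, ["L"] * 3, ["R"] * 3], labels)
repeat_agreement = percent_agreement(["H", "L", "R", "H"], ["H", "L", "L", "H"])
```

Kappa corrects the observed agreement for the agreement expected by chance, which is why it can be low even when raw agreement percentages look moderate.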

4. Discussion

4.1. Expert validation

The result achieved by the presented automatic PVG classification approach (dashed line in Fig. 8) was approximately 9% above the corresponding average result of the group of experts (R1−6 in Fig. 8) when the only diagnostic clue available was the patient's VF oscillation pattern. The PVG feature analysis based on objective criteria derived from HS videos thus surpasses subjective assessment by clinically experienced raters on average. By means of the novel PVG classification method, objective clinical decision support can be provided to the physician. However, further laryngeal properties will be considered clinically in order to make a proper diagnosis of voice functioning: besides the mandatory ELS criteria, the VFs' normal position, the phonation onset, and the general orientation of the arytenoid cartilages also give indication of the paralytic impairment and its lateral assignment. But even without incorporating these additional laryngeal properties at the current state, the automatic quantitative PVG analysis approach already achieves a more reliable classification of VF oscillation patterns than human experts. Further quantitative features describing different aspects of VF activity will be captured and analyzed with the presented classification method in the future using a more extensive data collection.

A particular problem for the experts consisted in distinguishing left- and right-sided pareses from each other (see Table 4). As this classification difficulty is in accordance with the SVM results in Table 3, it suggests high resemblance of the two pathological classes when only vibratory information is considered. The reason for this aggravated lateral distinction can be seen in the fact that a paralytic impairment does not manifest itself in the deflection symmetry of the affected VF as suspected. Yet, it may even be functionally compensated for by increased laryngeal muscle tension. From the experts' misclassifications it can be concluded that for the reliable discrimination of unilateral pareses it is essential to include additional diagnostic information.

The reduced inter- and intra-rater reliability measures presented in Fig. 8 substantiate the problem inherent to the subjective assessment of laryngeal dynamics as captured in HS videos. This issue can be attributed to the fact that human visual perception is better adapted to the processing of static information than to moving images. Comparing HS sequences of periodically recurring episodes of VF movement is consequently a rather difficult task to accomplish. This holds in particular for image sequences where no supplementary diagnostic evidence preceding the VF oscillations is available. The experts' reduced rating reliability found in this study is a further argument in support of the objective PVG analysis technique.

Fig. 7 – Classification results obtained for considered 2-class problems (light gray) and 3-class problem (dark gray) using PVG contour feature combination FC30+70 and SVM with linear kernel for learning. Dashed horizontal lines illustrate the baseline accuracies attainable by classifying all cases as one class.

Fig. 8 – Subjective classification results for the 45 HS sequences achieved by the six expert raters using exclusively vibratory VF information for diagnosis. For each rater the class-specific accuracies and the overall classification result are given from left to right. The dashed horizontal line indicates the overall accuracy achieved by the SVM classification approach for 3-class decision problem Healthy vs. ParesisL vs. ParesisR. The corresponding reliability measures indicating the experts' inter-rater and intra-rater variability are given below.

4.2. Machine learning performance

Table 4 – Averaged confusion matrix for manual classification of VF vibrations as captured in 45 HS recordings averaged over expert ratings R1–6. Rows give the suspected class, columns the true class.

                  True class
Suspected class   Healthy   ParesisL   ParesisR
Healthy           26.98%    6.03%      4.76%
ParesisL          4.13%     19.37%     11.75%
ParesisR          2.22%     7.94%      16.83%

53.5% of misclassifications between ParesisL and ParesisR

The SVM modeling approach attained the significantly highest classification accuracy in the experiments (see Fig. 3). Hence, this learning method of finding the maximum-margin hyperplane that best separates the analyzed data points turned out to be particularly suited to the problem of distinguishing VF pathologies from healthy oscillation patterns. This finding is in accordance with the results of similar studies comparing classifier performance in medical domains [27–29]. As for the choice of the employed SVM kernel function, the best average classification accuracies were obtained by using 1st-degree soft-margin polynomials, outperforming kernel functions with higher polynomial degree and RBF functions. Results of decision tree algorithm C4.5, backpropagation MLP and SVM with 3rd-order polynomial and RBF kernel were nearly equivalent. However, the C4.5 and SVM approaches should be given preference when it comes to practical implementation of a clinical decision support system, as the amount of time needed for MLP training is considerably high. The generally weak classification performance of the k-NN algorithm points towards a rather inhomogeneous class distribution in the feature space. The relatively high standard deviations in Fig. 3 (approximately 17%) can be ascribed to the limited amount of data available: for the 10CV model evaluation procedure the applied test fold comprised only three cases for the considered 2-class problems, and therefore the evaluation results were quite sensitive to the random split of the data.
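The effect of three-case test folds on the spread of the results can be illustrated with a short calculation. The sketch below assumes 30 cases for a 2-class problem (two of the three balanced classes out of 45 cases) and an illustrative true accuracy of 90%; both values are assumptions for demonstration, and the binomial formula is a simplification rather than the statistic reported in Fig. 3.

```python
import math


def held_out_fold_sizes(n_cases, n_folds):
    """Sizes of the held-out folds when n_cases are split as evenly as possible."""
    base, extra = divmod(n_cases, n_folds)
    return [base + 1 if i < extra else base for i in range(n_folds)]


def fold_accuracy_std(true_accuracy, fold_size):
    """Binomial standard deviation of the accuracy measured on a single fold."""
    return math.sqrt(true_accuracy * (1.0 - true_accuracy) / fold_size)


# Assumed setting: 2 classes x 15 cases, evaluated with 10-fold cross-validation.
sizes = held_out_fold_sizes(n_cases=30, n_folds=10)

# With three test cases, a fold can only yield one of four accuracy values.
possible = [k / sizes[0] for k in range(sizes[0] + 1)]

# Illustrative true accuracy of 90% (an assumption, not a value from the paper).
std = fold_accuracy_std(true_accuracy=0.9, fold_size=sizes[0])
print(sizes, possible, round(100 * std, 1))
```

A single misclassified case changes the accuracy of a fold by a third, so a per-fold spread in the range of 15 to 20 percentage points is plausible even for a well-performing classifier.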

4.3. Individual feature set suitability

The gradual decline of classiﬁcation results (approximately −4%) obtained from using peripheral features at the boundaries of the VF oscillation phases (see Fig. 4) suggests that characteristic VF movement patterns are expressed best in medium deﬂection states. Hence, class membership can be determined most reliably through the corresponding intermediate PVG contour features, as they appropriately capture the relevant spatio-temporal information during the opening and closing phase of a cycle. This is opposed to the conventional glottal description approach which basically implements a peak-picking strategy, and thus only allows for the consideration of maximum glottal deﬂection states. The advantage of the new PVG contours over the glottal features is conﬁrmed by the results in Fig. 4, as the accuracy gain achieved by employing individually best performing PVG feature set FC50 amounts to 11%. Hence, the additional amount of VF vibration information captured through 2-d PVG contour features signiﬁcantly improves classiﬁcation accuracy. The features extracted from the 1-d glottal signal lack the descriptive power to capture the necessary details of VF dynamics.

4.4. Combined feature set suitability

As shown in Fig. 5, high and low PVG contours complement each other in an advantageous manner. Classes could be distinguished most reliably when information derived from the first and the second third of a VF oscillation phase was considered (as captured by combined feature set FC30+70). This supports the finding of the diagnostic relevance of intermediate VF deflection states. Compared to the results of the individual features (dotted half circles in Fig. 5), most feature set combinations lead to improved accuracies (represented by gray areas). The largest gain for contour features was achieved for the individually weak performing peripheral feature sets (FC10: +4.2%; FC90: +4.3%). For the best functioning individual PVG feature set FC50, classification results could not be increased further by combination (yielding no gray area), confirming its already high suitability for class discrimination purposes. Despite the high average gain of +8.6% for the combination of glottal features FG with PVG features FC10−90 compared to the individual classification results (see extensive gray area in Fig. 6), the respective accuracy values are still surpassed by paired PVG contour features. The gain can be ascribed to the fact that certain combinations of glottal and PVG descriptions may exploit additional information which is not covered by the individual features. Still, PVG feature combinations are more discriminative, as they provide an even more detailed description of the underlying VF movement patterns through time. It can be stated that for classification purposes PVG features are more suitable than glottal features.
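The pairing of contour feature sets can be sketched as follows. The set of contour heights (10, 30, 50, 70, 90%) and the combination of two feature sets by simple concatenation of their feature vectors are assumptions made for illustration; the toy feature values are hypothetical and carry no physiological meaning.

```python
from itertools import combinations

CONTOUR_HEIGHTS = [10, 30, 50, 70, 90]  # assumed set of PVG contour heights in %


def paired_feature_sets(heights):
    """All unordered pairs of contour heights, e.g. (30, 70) for FC30+70."""
    return list(combinations(heights, 2))


def combine(features_by_height, pair):
    """Concatenate the feature vectors of both heights into one input vector."""
    low, high = pair
    return features_by_height[low] + features_by_height[high]


pairs = paired_feature_sets(CONTOUR_HEIGHTS)

# Hypothetical two-value feature vectors per height, for illustration only.
toy_features = {h: [h, 100 - h] for h in CONTOUR_HEIGHTS}
combined = combine(toy_features, (30, 70))
print(len(pairs), combined)
```

Under these assumptions there are ten distinct pairs, which is consistent with each pair appearing twice in the radar charts, once in the group of each of its two members.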

4.5. General classification capability

The presented PVG classiﬁcation approach is capable of accurately distinguishing between individuals with a healthy and a wide variety of paralytic voices (see Fig. 7). Classiﬁcation problems Healthy vs. ParesisL, Healthy vs. ParesisR, and Healthy vs. Pathological are especially important clinically, as they constitute a basic decision step of the diagnostic process: to identify whether an individual is affected by a voice disorder or not. Hence, the average classiﬁcation accuracy of over 93% obtained in this study for the differentiation of normal and paralytic VF vibrations by means of PVG features is considered successful. This ﬁnding is conﬁrmed when contrasting the results to other studies [12,30,31]. But as the automatic classiﬁcation of laryngeal HS recordings in general and PVG feature extraction in particular are utterly novel VF dynamics analysis approaches, only few directly comparable results can be found in the literature. The technique most frequently applied is digitally recording a patient’s voice signal, extracting miscellaneous acoustic features and applying different ML algorithms to classify the audio signal into normal or pathological groups [32–34]. But this classiﬁcation approach basically differs from the one presented in this paper: it makes a difference to analyze the underlying vibration patterns of the sound-producing elements in the larynx (VFs) or to analyze the phonatory outcome of this source (voice signal). The mapping from a hoarse voice (effect) to the pathology (cause) is obviously much more ambiguous than from VF movements directly. Due to this fact most studies exclusively focus on the decision problem Healthy vs. Pathological, or rather “Normal vs. Hoarse Voice”, which essentially corresponds to the amount of noise in the audio signal. The clinical question, which VF side is affected by a detected pathology, cannot be answered by analyzing the voice signal only. 
Further studies were based on feature extraction from the glottal signal (as captured by FG in this paper) and on statistically determining ranges for normophonic and dysphonic speakers [10,35,36]. However, no direct attempts to classify any data were made in these studies. In summary, it can be said that VF pathology classification results were attained in this study which are evenly matched to existing voice analysis approaches. These two branches of VF analysis and diagnosis ought to be considered jointly to benefit from the individual strengths.

4.6. Lateral classification capability

Explicit lateral classification (ParesisL vs. ParesisR, Healthy vs. ParesisL vs. ParesisR) performed relatively poorly compared to the results of diagnostic tasks Healthy vs. ParesisL, Healthy vs. ParesisR, and Healthy vs. Pathological. From the average performance loss of about −12% as shown in Fig. 7 and the relatively high lateral error rate in Table 3 it can be concluded that the employed ML algorithms have difficulties differentiating the two types of pareses. But when considering the baseline accuracy thresholds (depicted as horizontal lines in Fig. 7) for balanced 2-class (50%) and 3-class problems (33.3%), it must be noted that results of Healthy vs. ParesisL vs. ParesisR relatively outperform results of ParesisL vs. ParesisR. Hence, a classifier built to distinguish all three classes jointly achieves more reliable results than a classifier that exclusively focuses on ParesisL and ParesisR.

In order to understand this aggravated lateral distinction more thoroughly, a selection of VF oscillations which was especially hard to classify is considered in more detail in the following. For this purpose a PVG contour feature plot is shown in Fig. 9 for two selected paralytic voices from this study. These special cases are regarded as deviations from a mutual similarity criterion defining the class in question, and hint at the reason for misclassification. For both paralytic examples shown in Fig. 9 VF movement appears relatively stable, exhibiting only small variation in the contour shapes of consecutive oscillation cycles. The relation between the vibrations of the left and right VF side is quite balanced: neither of the two VFs can be characterized as being functionally inferior to the other. Hence, for these particular pathological cases the common diagnostic criterion of lateral vibration dissimilarity does not hold; by all appearances they rather resemble cases from the healthy control group. It can therefore be stated that the impact of the pareses on the VF vibration patterns is not always as strong as presumed. This can be ascribed to the fact that certain characteristics of VF vibration patterns are influenced by the phonation frequency [37,20]. Since in this study no explicit frequency ranges were specified during examination, an inappropriate choice may have been made inadvertently by the patient, and as a result no distinct evidence for a voice disorder can be seen from the vibratory patterns in question. Accordingly it is not surprising that certain pathological examples were difficult to distinguish from each other using a shape-based oscillation pattern description approach, and thus only reduced accuracy was obtained for lateral classification in this study.

Nevertheless, it can be assumed that by expanding the PVG feature description approach as a whole and incorporating additional physiological information of the patient (e.g. voice signal features, features capturing the orientation of the arytenoids) further improvement of the overall classification accuracy can be achieved. Besides adapting the feature extraction process, collecting a more comprehensive clinical data set is also an important point in order to assess the general validity of the statements concluded from the rather small clinical example set underlying this study. Furthermore, differing phonation paradigms, which are more suited to the identification of VF pathologies, should be utilized during examination (e.g. non-stationary [38,39]). In doing so, the objective PVG-based analysis technique can be aligned more closely to the diagnostic chain which is actually followed in the clinical routine. The results of this paper suggest that the presented knowledge-based PVG feature extraction and SVM classification approach holds a lot of promise to serve as a reliable diagnosis support system for voice disorders in general.

Fig. 9 – PVG contour feature plot for two exemplary pathological cases which were particularly hard to classify. While (a) shows five normalized oscillation cycles of a case where the left VF is affected by paresis, (b) shows a right VF side paresis. For each cycle the PVG contours with heights h = 30% and h = 70% are depicted for both VF sides, respectively.

5. Conclusions

An objective voice analysis and classiﬁcation approach was presented in this paper which enables the reliable identiﬁcation of paralytic voice disorders. It is based on quantitative PVG feature descriptions of the laryngeal dynamics along the entire VF length and subsequent SVM learning. Using an endoscopic HS camera for a collective of 45 normophonic and pathological female speakers the VF movements were digitally recorded during phonation. The resulting HS videos were segmented and the captured vibrations were transformed into a PVG representation. The contained spatio-temporal patterns of activity were described by means of PVG contour features at different VF deﬂection states. Conventional glottal parameters were determined as a reference for performance. The obtained feature sets were analyzed using a variety of ML algorithms with the aim to correctly distinguish healthy from pathological cases. An additional expert validation was conducted to assess the subjective discrimination of the clinical cases given the same diagnostic criteria as for the objective analysis. With the proposed PVG classiﬁcation approach a reliable distinction between normal and paralytic VF movement patterns was achieved, yielding an average classiﬁcation accuracy of over 93%. This even outperformed the results obtained for the subjective classiﬁcations of the experts. The PVG features’ advantage over the glottal features regarding their capability to describe the underlying vibratory processes was substantiated. A particular improvement was shown for the combination of PVG contour features extracted from different VF deﬂection states. Possible starting points for improving certain aspects of the PVG description approach could also be identiﬁed. On the whole, the presented PVG classiﬁcation approach holds a lot of potential to support clinical decision making in the future by providing a sound objective basis.

Conflict of interest statement

The authors hereby state that, aside from the public sources of funding explicitly specified in the following section, the work presented in this paper is free of any financial or personal relationships with other people or organizations that could inappropriately influence the results.

Acknowledgments

This work was funded by the Deutsche Forschungsgemeinschaft (DFG), grants LO1413/2 1-3 and EY15/11 3-4. The authors would like to thank the High Performance Computing Group at Regionales Rechenzentrum Erlangen (RRZE) for providing computational resources.


References

[1] T. Rasch, S. Günther, U. Hoppe, U. Eysholdt, F. Rosanowski, Voice-related quality of life in organic and functional voice disorders, Logoped. Phoniatr. Vocol. 30 (1) (2005) 9–13.
[2] P.N. Carding, J.A. Wilson, K. Mackenzie, I.J. Deary, Measuring voice outcomes: state of the science review, J. Laryngol. Otol. 123 (8) (2009) 823–829.
[3] R.J. Baken, Electroglottography, J. Voice 6 (2) (1992) 98–110.
[4] J. Wendler, Stroboscopy, J. Voice 6 (2) (1992) 149–154.
[5] M. Döllinger, The next step in voice assessment: High-speed digital endoscopy and objective evaluation, Curr. Bioinform. 4 (2) (2009) 101–111.
[6] R.T. Sataloff, The professional voice: Part II. Physical examination, J. Voice 1 (2) (1987) 191–201.
[7] P.H. Dejonckere, P. Bradley, P. Clemente, G. Cornut, L. Crevier-Buchman, G. Friedrich, P. Van De Heyning, M. Remacle, V. Woisard, Committee on Phoniatrics of the European Laryngological Society (ELS), A basic protocol for functional assessment of voice pathology, especially for investigating the efficacy of (phonosurgical) treatments and evaluating new assessment techniques, Eur. Arch. Otorhinolaryngol. 258 (2) (2001) 77–82.
[8] J.G. Švec, H.K. Schutte, Videokymography: high-speed line scanning of vocal fold vibration, J. Voice 10 (2) (1996) 201–205.
[9] A. Verikas, V. Uloza, M. Bacauskiene, A. Gelzinis, E. Kelertas, Advances in laryngeal imaging, Eur. Arch. Otorhinolaryngol. 266 (10) (2009) 1509–1520.
[10] Q. Qiu, H.K. Schutte, L. Gu, Q. Yu, An automatic method to quantify the vibration properties of human vocal folds via videokymography, Folia Phoniatr. Logop. 55 (3) (2003) 128–136.
[11] P. Mergell, H. Herzel, I.R. Titze, Irregular vocal-fold vibration – high-speed observation and modeling, J. Acoust. Soc. Am. 108 (6) (2000) 2996–3002.
[12] R. Schwarz, U. Hoppe, M. Schuster, T. Wurzbacher, U. Eysholdt, J. Lohscheller, Classification of unilateral vocal fold paralysis by endoscopic digital high-speed recordings and inversion of a biomechanical model, IEEE Trans. Biomed. Eng. 53 (6) (2006) 1099–1108.
[13] Y. Yan, K. Ahmad, M. Kunduk, D. Bless, Analysis of vocal-fold vibrations from high-speed laryngeal images using a Hilbert transform-based methodology, J. Voice 19 (2) (2005) 161–175.
[14] Y. Yan, E. Damrose, D. Bless, Functional analysis of voice using simultaneous high-speed imaging and acoustic recordings, J. Voice 21 (5) (2007) 604–616.
[15] Y. Zhang, C. Tao, J.J. Jiang, Parameter estimation of an asymmetric vocal-fold system from glottal area time series using chaos synchronization, Chaos 16 (2) (2006) 023118-1–023118-8.
[16] J. Lohscheller, U. Eysholdt, H. Toy, M. Döllinger, Phonovibrography: mapping high-speed movies of vocal-fold vibrations into 2-D diagrams for visualizing and analyzing the underlying laryngeal dynamics, IEEE Trans. Med. Imaging 27 (3) (2008) 300–309.
[17] J. Lohscheller, H. Toy, F. Rosanowski, U. Eysholdt, M. Döllinger, Clinically evaluated procedure for the reconstruction of vocal fold vibrations from endoscopic digital high-speed videos, Med. Image Anal. 11 (4) (2007) 400–413.
[18] A. Verikas, A. Gelzinis, D. Valincius, M. Bacauskiene, V. Uloza, Multiple feature sets based categorization of laryngeal images, Comput. Method Progr. Biol. 85 (3) (2007) 257–266.
[19] A. Verikas, A. Gelzinis, M. Bacauskiene, V. Uloza, Towards a computer-aided diagnosis system for vocal cord diseases, Artif. Intell. Med. 36 (1) (2006) 71–84.
[20] D. Voigt, M. Döllinger, T. Braunschweig, A. Yang, U. Eysholdt, J. Lohscheller, Classification of functional voice disorders based on phonovibrograms, Artif. Intell. Med. (2010), doi:10.1016/j.artmed.2010.01.001, in press.
[21] R.O. Duda, P.E. Hart, D.G. Stork, Pattern Classification, second ed., John Wiley & Sons, New York, USA, 2001.
[22] C.W. Hsu, C.J. Lin, A comparison of methods for multi-class support vector machines, IEEE T Neural Network 13 (2) (2002) 415–425.
[23] N. Japkowicz, S. Stephen, The class imbalance problem: a systematic study, Intell. Data Anal. 6 (5) (2002) 429–449.
[24] H. Beyer, H. Schwefel, Evolution strategies – a comprehensive introduction, Nat. Comput. 1 (1) (2002) 3–52.
[25] R. Kohavi, A study of cross-validation and bootstrap for accuracy estimation and model selection, IJCAI (1995) 1137–1145.
[26] K. Krippendorf, Content Analysis: An Introduction to Its Methodology, Sage Publications, Beverly Hills, CA, USA, 1980.
[27] D. Wang, B. Larder, A. Revell, J. Montaner, R. Harrigan, F. De Wolf, J. Lange, S. Wegener, L. Ruiz, M.J. Pérez-Elías, S. Emery, J. Gatell, A. D'Arminio Monforte, C. Torti, M. Zazzi, C. Lane, A comparison of three computational modelling methods for the prediction of virological response to combination HIV therapy, Artif. Intell. Med. 47 (1) (2009) 63–74.
[28] H. Shin, M.K. Markey, A machine learning perspective on the development of clinical decision support systems utilizing mass spectra of blood samples, J. Biomed. Inform. 39 (2) (2006) 227–248.
[29] G. Díaz, F.A. González, E. Romero, A semi-automatic method for quantification and classification of erythrocytes infected with malaria parasites in microscopic images, J. Biomed. Inform. 42 (2) (2009) 296–307.
[30] M.A. Little, P.E. McSharry, S.J. Roberts, D.A. Costello, I.M. Moroz, Exploiting nonlinear recurrence and fractal scaling properties for voice disorder detection, Biomed. Eng. Online 6 (23) (2007).
[31] A. Gelzinis, A. Verikas, M. Bacauskiene, Automated speech analysis applied to laryngeal disease categorization, Comput. Method Progr. Biol. 91 (1) (2008) 36–47.
[32] K. Umapathy, S. Krishnan, V. Parsa, D.G. Jamieson, Discrimination of pathological voices using a time-frequency approach, IEEE Trans. Biomed. Eng. 52 (3) (2005) 421–430.
[33] C.D. Crovato, A. Schuck, The use of wavelet packet transform and artificial neural networks in analysis and classification of dysphonic voices, IEEE Trans. Biomed. Eng. 54 (10) (2007) 1898–1900.
[34] J.I. Godino-Llorente, P. Gómez-Vilda, Automatic detection of voice impairments by means of short-term cepstral parameters and neural network based detectors, IEEE Trans. Biomed. Eng. 51 (2) (2004) 380–384.
[35] T. Braunschweig, J. Flaschka, P. Schelhorn-Neise, M. Döllinger, High-speed video analysis of the phonation onset, with an application to the diagnosis of functional dysphonias, Med. Eng. Phys. 30 (1) (2008) 59–66.
[36] H.S. Bonilha, D.D. Deliyski, Period and glottal width irregularities in vocally normal speakers, J. Voice 22 (6) (2008) 699–708.
[37] R.F. Orlikoff, R.J. Baken, Consideration of the relationship between the fundamental frequency of phonation and vocal jitter, Folia Phoniatr. (Basel) 42 (1) (1990) 31–40.
[38] O. Rasp, J. Lohscheller, M. Döllinger, U. Eysholdt, U. Hoppe, The pitch rise paradigm: a new task for real-time endoscopy of non-stationary phonation, Folia Phoniatr. Logop. 58 (3) (2006) 175–185.
[39] T. Wurzbacher, R. Schwarz, M. Döllinger, U. Hoppe, U. Eysholdt, J. Lohscheller, Model-based classification of nonstationary vocal fold vibrations, J. Acoust. Soc. Am. 120 (2) (2006) 1012–1027.
