Automatic diagnosis of vocal fold paresis by employing phonovibrogram features and machine learning methods


Computer Methods and Programs in Biomedicine 99 (2010) 275–288

journal homepage: www.intl.elsevierhealth.com/journals/cmpb

Daniel Voigt a,∗, Michael Döllinger a, Anxiong Yang a, Ulrich Eysholdt a, Jörg Lohscheller b

a Department of Phoniatrics and Pediatric Audiology, University Hospital Erlangen, Bohlenplatz 21, D-91054 Erlangen, Germany
b University of Applied Science Trier, Department of Computer Science, Schneidershof, D-54293 Trier, Germany

Article info

Article history:
Received 13 July 2009
Received in revised form 2 November 2009
Accepted 9 January 2010

Keywords:
Computer-assisted diagnosis
Voice disorders
Vocal fold paresis
Phonovibrography
Feature extraction
Laryngeal high-speed video analysis

Abstract

The clinical diagnosis of voice disorders is based on examination of the rapidly moving vocal folds during phonation (f0: 80–300 Hz) with state-of-the-art endoscopic high-speed cameras. Commonly, analysis is performed subjectively via time-consuming slow-motion video playback and exhibits low inter- and intra-rater reliability. In this study an objective method that overcomes this drawback is presented, based on Phonovibrography, a novel image analysis technique. For a collective of 45 normophonic and paralytic voices, the laryngeal dynamics were captured by specialized Phonovibrogram features and analyzed with different machine learning algorithms. Classification accuracies reached 93% for 2-class and 73% for 3-class discrimination. The results were validated against subjective expert ratings given the same diagnostic criteria. The automatic Phonovibrogram analysis approach exceeded the experienced raters’ classifications by 9%. The presented method holds considerable potential for providing reliable support for vocal fold diagnosis in the future.

© 2010 Elsevier Ireland Ltd. All rights reserved.

1. Introduction

Verbal communication plays an important part in the modern world. Hence, a vocal dysfunction seriously affects a person’s social integration and perceived quality of life [1]. To differentiate normal from aberrant voices, a variety of sophisticated examination techniques are employed clinically [2–6]. The application of a certain set of these diagnostic instruments is compulsory [7]: to make a proper diagnosis of a patient’s voice function, the physician is obliged to follow this clinical guideline step by step. Usually the most revealing part of this diagnostic chain is the visual inspection of the two vocal folds (VFs), which are the main voice-producing elements in the larynx. Healthy and pathological laryngeal dynamics are commonly distinguished by the degree of symmetry and regularity of the VF vibrations during phonation [7]. However, examination is complicated by the fact that the VFs oscillate at a fundamental frequency of 80–300 Hz. To observe the rapidly moving VFs, various analysis approaches have been developed (e.g. stroboscopy [4] and videokymography [8]). To date, endoscopic high-speed (HS) cameras have turned out to be the most promising technology [5]. Nevertheless, the clinical assessment of HS recordings requires considerable experience and time on the physician’s part, as the human eye is far better adapted to processing static visual information than moving images. As a consequence, the resulting subjective diagnoses are inherently error-prone and may vary significantly between different physicians examining the same patient. To remedy this weakness, different quantitative analysis approaches enabling objective HS analysis have been introduced [9–12]. Lately, Hilbert transform-based approaches [13], Nyquist plots [14], and methods from nonlinear systems analysis [15] have been suggested for this purpose. However, none of these methods can analyze the complex VF oscillation pattern in its entirety.

A fast and clinically evaluated visualization method for capturing the whole spatio-temporal pattern of activity, namely Phonovibrography, has recently been introduced [16]. The VF deflections are automatically extracted from laryngeal HS recordings and transformed into a compact graphical representation, a so-called Phonovibrogram (PVG). In this manner the 2-d laryngeal dynamics can be inspected at a level of detail that could not be achieved by previous 1-d signal analysis approaches [10]. Beyond its value as a diagnostic tool [17], the PVG can itself be analyzed and described by numerical features that capture the underlying VF vibration properties. The resulting feature vectors then serve as a basis for building classification models that allow voice disorders to be identified automatically and reliably. In doing so, the physician receives valuable and timely decision support, enabling objectified voice diagnosis in a clinical setting. Thus, currently available subjective examination methods (e.g. stroboscopy) can potentially be replaced by a more sophisticated and trustworthy approach, paving the way for the clinical realization of evidence-based medicine.

In this study, clinical HS recordings of 45 female subjects with normal voices and unilateral VF pareses are processed with a novel feature extraction method for describing PVG dynamics. The pathological classes contained in the data are modeled using a selection of supervised machine learning (ML) algorithms in combination with an evolutionary parameter optimization strategy. The test data are classified according to different clinically meaningful decision tasks, and the corresponding cross-validated accuracies are determined.

∗ Corresponding author. Tel.: +49 9131 85 32602. E-mail address: [email protected] (D. Voigt).
0169-2607/$ – see front matter © 2010 Elsevier Ireland Ltd. All rights reserved.
doi:10.1016/j.cmpb.2010.01.004

Given the available clinical data set, the objective is to assess the basic classification capability of the approach, to identify the most appropriate ML algorithm for modeling and to evaluate certain PVG feature extraction parameters. A further set of conventional 1-d features is analyzed to estimate the performance gain achieved through the novel PVG features. In order to compare the diagnostic reliability of the objective analysis approach with the subjective results of six expert raters, an additional validation is conducted using the same HS sequences underlying this study.

2. Material and methods

2.1. Clinical data

To provide a gold standard for subsequent data analysis, the laryngeal dynamics of a collective of 45 female test persons (median age: 44.4 years) were thoroughly assessed by experienced physicians according to the standardized examination procedure of the European Laryngological Society (ELS) [7]. This protocol comprises a variety of examination approaches: auditory-perceptual assessment, video-laryngoscopic examination, aerodynamic and acoustic analysis, and self-rating by the patient. For a third of the women (15 cases) no evidence of an impaired voice was found; hence, they constituted the healthy control group. The remaining 30 cases showed a variety of manifestations of VF paresis, ranging from mild to severe impairment. This relatively prevalent voice disorder involves the degradation of the vibratory properties of one VF side and is commonly caused by neural damage. In half of the patients the paresis mainly affected the left VF side; in the other half, the right side. In contrast to organic voice disorders (e.g. nodule, polyp, edema), which can be assessed quite reliably by analyzing laryngeal still images [18,19], it is widely held that an appropriate diagnosis of paresis can only be made by, among other things, considering the overall oscillation behavior of the VFs during phonation and relating the dynamics of both sides to each other.

2.2. Phonovibrography

The VF movements of all patients were digitally recorded with an endoscopic HS camera system (Wolf Highspeed Endocam 5542) while they produced the sustained vowel /a/, enabling the analysis of laryngeal vibration properties. The camera takes 8-bit grayscale images at a rate of 4000 frames per second with a spatial resolution of 256 × 256 pixels. This high temporal sampling frequency is necessary because the fundamental frequency of normal speakers ranges approximately from 80 to 300 Hz; at 4000 frames per second, one oscillation cycle is thus sampled by roughly 13 (at 300 Hz) to 50 (at 80 Hz) images. During examination the endoscope is inserted orally, providing a top view of the larynx (see Fig. 1a + b). The following image quality criteria had to be met for an HS recording to be adequate for diagnosis: sufficient lighting, overall image sharpness, and complete visibility of both VF edges. The captured HS sequence of 500 laryngeal images (see Fig. 1b), corresponding to 125 ms of phonation, was subsequently processed with a specifically designed region-growing algorithm [17], which ensures reliable image segmentation under realistic clinical conditions. As a result, the opening between the left and the right VF, denoted as the glottis, was detected in each HS image. Based on the detected glottal area, the lateral positions of both VFs were determined robustly for the entire HS sequence, and no corrections were applied [17]. The glottal midline connecting the most dorsal and the most ventral point of the detected glottal area (see Fig. 1b) is divided into equidistant intervals. Thus, the deflection of the i-th VF point at time t is characterized by its perpendicular distance d(t, i) to the glottal midline. The sequence of these displacement values over time forms a so-called trajectory. To distinguish the trajectories of the two VF sides, an additional index $\alpha \in \{\mathrm{Left}, \mathrm{Right}\}$ is introduced. The determined VF deflections $d_\alpha(t, i)$ are subsequently transformed into a novel 2-d color-image representation, denoted as PVG [16].
To this end, the extracted VF distances are interpreted as color intensity information: the brighter a PVG image point, the farther the corresponding VF point is from the glottal axis (see Fig. 1c). A PVG usually consists of three distinct colors (red, black, blue), but the grayscale representation shown here illustrates the basic idea. The deflections of the left and right VF side from the glottal midline are shown in the upper and lower half of a PVG, respectively. While a single PVG column displays the overall deflection along the entire VF length at a certain point in time, a single PVG row represents the trajectory of an individual VF point. Taking this compact representation of spatial and temporal VF changes as a basis for diagnostic decision making, the physician can easily gain insight into the complexities of the underlying oscillation patterns and assess clinical evidence of voice pathologies. Thus, the stability and symmetry of the laryngeal dynamics as a whole can be examined quickly and intuitively. The VF examination and visualization techniques available so far do not allow for such differentiated analyses. Hence, the PVG can be of great practical help in the clinical workflow, as it can supersede time-consuming slow-motion playback of HS videos. Nevertheless, this kind of PVG analysis is still based on subjective perception and may vary significantly between different physicians.

Fig. 1 – Grayscale PVG representation (c) of both VFs’ oscillations over time as captured in endoscopic HS recordings (a + b). The changing VF deflections yield characteristic PVG patterns, e.g. dorsal triangular shapes. The letters “A” and “P” denote the anterior and posterior detected endings of the glottis, “L” and “R” the left and right VF side.
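The geometric steps of Phonovibrography described in this section (equidistant sampling of the glottal midline, the perpendicular distance $d_\alpha(t, i)$, and stacking the trajectories into a grayscale PVG) can be sketched as follows. This is a minimal illustration, not the authors’ implementation: it assumes that the dorsal and ventral midline endpoints and one VF edge point per midline sample are already available from segmentation, it scales intensities by the global maximum deflection, and it places the left side above the right without any further row reordering. The names `midline_points`, `deflection` and `build_pvg` are hypothetical.

```python
import math

def midline_points(dorsal, ventral, n):
    """n equidistant points on the glottal midline, from the dorsal to the ventral end."""
    if n < 2:
        raise ValueError("need at least two midline points")
    (x0, y0), (x1, y1) = dorsal, ventral
    return [(x0 + (x1 - x0) * k / (n - 1), y0 + (y1 - y0) * k / (n - 1))
            for k in range(n)]

def deflection(edge_point, dorsal, ventral):
    """Perpendicular distance of a VF edge point to the line through dorsal and ventral."""
    (x0, y0), (x1, y1) = dorsal, ventral
    px, py = edge_point
    # |2-d cross product| / |midline direction| = point-to-line distance
    cross = (x1 - x0) * (py - y0) - (y1 - y0) * (px - x0)
    return abs(cross) / math.hypot(x1 - x0, y1 - y0)

def build_pvg(d_left, d_right):
    """Stack trajectories into a grayscale PVG matrix with intensities in [0, 1].

    d_left, d_right: one trajectory per VF point i, each a list of d(t, i) over t.
    Rows of the left side form the upper half, rows of the right side the lower half.
    """
    rows = [list(r) for r in d_left] + [list(r) for r in d_right]
    if len({len(r) for r in rows}) != 1:
        raise ValueError("all trajectories must cover the same time steps")
    peak = max(max(r) for r in rows)
    if peak <= 0:
        return [[0.0] * len(r) for r in rows]
    # brighter (closer to 1.0) means farther away from the glottal midline
    return [[v / peak for v in r] for r in rows]
```

With this layout, `pvg[i]` is a row (the trajectory of one VF point) and `[row[t] for row in pvg]` is a column (the deflection along the whole VF length at time t). For a vertical midline from (128, 20) to (128, 220), `deflection((120, 100), (128, 20), (128, 220))` returns 8.0.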

2.3. Feature extraction

To objectify the criteria underlying the clinical diagnosis of voice disorders from HS videos, the resulting PVG data matrix was analyzed to extract a set of numerical features describing the contained laryngeal movement information. As VF vibrations exhibit periodically recurring movement patterns (see Fig. 1c), it is appropriate to take the individual oscillation cycles, each comprising a distinct opening and closing phase, as the starting point for further analysis. Hence, the first step of feature extraction consisted in automatically detecting the boundaries of the individual VF oscillation cycles by analyzing the glottal signal [20]. The detected cycles were displayed and inspected to ensure the validity of the subsequently derived features. Then, the obtained PVG cycles were normalized to a constant width and height [16] to improve the inter-individual comparability of the derived features. In doing so, the influence of varying phonation frequencies between test persons and of different endoscopic distances to the VFs is eliminated. See Fig. 2 for exemplary normalized PVG cycles extracted from healthy and pathological VF vibrations.

Fig. 2 – Normalized PVG cycles of the left and right VF side and the corresponding extracted contour lines (h = 50%) for two individuals (a: normophonic, b: pathological).

The dynamic opening and closing properties of the VFs during oscillation are characterized by the geometric shapes within the PVG cycles (Fig. 2, middle). According to the ELS classification given in [7], the depicted movement patterns exhibit dorsal triangular (a) and oval (b) shapes, indicating different modes of VF movement. This relevant shape information can be represented by so-called contour lines. These contours connect equivalent VF deflection states in the opening and closing phase of an oscillation cycle (see Fig. 2, right). For each trajectory, the two points in time were determined at which a defined deflection state $d_h$ is reached within the cycle, e.g. half of its displacement (h = 50%), as shown in Fig. 2, right. This establishes a relationship between the dynamics of different anatomical VF sections that incorporates both spatial and temporal information. Such an in-depth description of VF oscillation patterns could not be obtained before the PVG visualization technique became available. A single PVG contour line is specified here by 256 individual contour points, each consisting of a temporal (x-axis), a longitudinal (y-axis), and a lateral position (color intensity) within the considered cycle. As adjacent contour points are strongly correlated for anatomical reasons, the amount of data can be reduced by aggregation without losing diagnostic evidence. For this purpose, the VF dynamics were summarized by averaging over eight intervals of 32 contour points each (8 × 32 = 256). Hence a simplified PVG contour line $C_h^{\alpha}$ was obtained, which approximates the spatio-temporal changes of an entire VF section instead of describing individual VF points in isolation.

Which particular oscillation state is most adequate for describing VF dynamics is still an open question, and the contour lines extracted from PVG cycles can be used to address it. PVG contour lines were extracted at five contour heights h ∈ {10, 30, 50, 70, 90}%, each capturing the dynamics of a different VF oscillation state. These deflection states lie between the extremes of minimum (h = 0%) and maximum VF displacement (h = 100%). Contours $C_{10\%}^{\alpha}$ and $C_{90\%}^{\alpha}$ near these extreme oscillation states are denoted as peripheral contours, while contours $C_{30\%}^{\alpha}$, $C_{50\%}^{\alpha}$ and $C_{70\%}^{\alpha}$, located around the middle of the oscillation phase, are denoted as intermediate contours.

In this manner, contour line features $C_h^{\mathrm{Left}}$ and $C_h^{\mathrm{Right}}$ were obtained for each PVG cycle, separately for both VF sides. To connect the two lateral descriptions and to capture the clinically relevant information on VF symmetry, the proportions $P_h = C_h^{\mathrm{Left}} / C_h^{\mathrm{Right}}$ and the distance-based similarities $D_h = \lVert C_h^{\mathrm{Left}} - C_h^{\mathrm{Right}} \rVert_2$ between corresponding left and right VF sides were determined for each simplified contour line. The temporal changes of the VF properties at the PVG level were captured by computing the mean and the standard deviation of the derived contour and symmetry features ($\overline{C_h^{\alpha}}, \sigma(C_h^{\alpha})$; $\overline{P_h}, \sigma(P_h)$; $\overline{D_h}, \sigma(D_h)$). The mean describes the average vibratory behavior of the VFs, and the corresponding standard deviation quantifies how much the movement pattern deviates from this average over time. In this way, the PVG feature sets FC10−90 were obtained, each comprising 448 quantitative features in total.

In order to evaluate the gain achieved by these new PVG-based features over conventional VF description approaches, an additional feature set was derived from the HS videos as a performance benchmark. The determined glottal features FG were [10]:
• open quotient (OQ), quantifying the proportion of time the glottis is open during an oscillation cycle,
• speed quotient (SQ), representing the temporal proportion between the opening and closing phase of a cycle,
• glottal insufficiency (GI), capturing the relation between a cycle’s minimum and maximum glottal opening,
• time periodicity index (TPI), describing the temporal stability of the cycle duration,
• and amplitude periodicity index (API), measuring the stability of a VF’s deflection.

Since these features are computed directly from the 1-d glottal signal, they in principle allow neither the differentiation of the two VF sides nor the capture of the dynamics of distinct VF sections. The glottal description approach considers only minimum and maximum glottal deflection states, whereas PVG features also incorporate intermediate deflection states. This set of conventional glottal features is the standard approach for analyzing VF movements from HS recordings.
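The contour and symmetry features described in this section can be sketched as follows. This is a minimal Python illustration under several assumptions the text does not fix: each normalized PVG cycle is given as one list of deflections per VF point (row) over normalized time, the threshold $d_h$ is taken relative to each trajectory’s own maximum, each trajectory contributes one contour point (opening time, closing time, $d_h$) instead of the authors’ exact temporal/longitudinal/lateral encoding, and the population standard deviation is used. All function names are hypothetical, and the sketch does not reproduce the 448-feature layout of the study.

```python
import math
from statistics import mean, pstdev

def contour_points(cycle, h):
    """Per trajectory: first/last normalized time index with d >= d_h, plus d_h itself.

    cycle: one normalized PVG cycle of one VF side, one row per VF point.
    h: contour height as a fraction, e.g. 0.5 for h = 50%.
    """
    points = []
    for row in cycle:
        d_h = h * max(row)  # deflection state relative to this trajectory's maximum
        reached = [t for t, d in enumerate(row) if d >= d_h]
        points.append((reached[0], reached[-1], d_h))  # opening and closing time
    return points

def aggregate(points, n_intervals=8):
    """Average consecutive contour points block-wise (e.g. 256 points -> 8 x 32)."""
    if len(points) % n_intervals:
        raise ValueError("number of contour points must be divisible by n_intervals")
    size = len(points) // n_intervals
    blocks = [points[k * size:(k + 1) * size] for k in range(n_intervals)]
    # flattened simplified contour line: (t_open, t_close, d_h) averaged per block
    return [mean(p[j] for p in block) for block in blocks for j in range(3)]

def symmetry(c_left, c_right):
    """P_h = C_left / C_right (element-wise) and D_h = ||C_left - C_right||_2."""
    p_h = [a / b if b != 0 else math.nan for a, b in zip(c_left, c_right)]
    d_h = math.dist(c_left, c_right)
    return p_h, d_h

def mean_std(per_cycle):
    """Element-wise mean and population standard deviation over all cycles."""
    columns = list(zip(*per_cycle))
    return [mean(c) for c in columns], [pstdev(c) for c in columns]
```

For one cycle, `symmetry(aggregate(contour_points(left, 0.5)), aggregate(contour_points(right, 0.5)))` yields $P_{50\%}$ and $D_{50\%}$; collecting such vectors over all cycles and passing them to `mean_std` gives the corresponding mean and standard-deviation features.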

2.4. Data analysis setup

The derived feature sets were subsequently used to inductively build models of normal and pathological VF vibrations, which can potentially support clinical decision making. For this purpose, each patient’s VF oscillation patterns were represented by numerical feature vectors computed from the corresponding PVG and glottal signal. By attaching the underlying clinical diagnoses to this representative set of feature vectors, a training set was obtained, which was then analyzed with different supervised ML techniques. The objective was to identify the algorithm that differentiates VF movement patterns most reliably. The following ML approaches were employed [21]:
• k-nearest neighbor algorithm (k-NN) with Euclidean norm,
• C4.5 decision tree with information gain splitting criterion,
• multilayer perceptron (MLP) with backpropagation (one hidden layer with (# features + # classes)/2 nodes and sigmoid activation function),
• soft-margin support vector machine (SVM) with 1st-, 2nd- and 3rd-order polynomial and radial basis function (RBF) kernels.

These ML algorithms automatically analyze the multidimensional feature space to identify structural interdependencies between features that can be exploited to distinguish different VF vibration classes. In this study the following clinically relevant classification tasks were examined, each class comprising 15 cases: Healthy vs. ParesisL, Healthy vs. ParesisR, Healthy vs. Pathological, ParesisL vs. ParesisR, and Healthy vs. ParesisL vs. ParesisR. The first three tasks address the distinction between two classes: normal and aberrant VF vibrations. While problems Healthy vs. ParesisL and Healthy vs. ParesisR explicitly consider the side of the affected VF, Healthy vs. Pathological does not retain this lateral information. As such, the first three tasks are used to assess the basic capability of the classification approach to differentiate movements of healthy and impaired VFs. Class Pathological is formed by merging classes ParesisL and ParesisR and randomly undersampling the merged class to 15 cases. Classification task ParesisL vs. ParesisR evaluates how well the two types of paresis can be distinguished from each other. Additionally, the performance obtained for the 3-class problem Healthy vs. ParesisL vs. ParesisR (solved using a “one-against-one” classification scheme with majority vote [22]) indicates the overall distinguishability of the whole training set. To obtain classification results that are not biased by divergent class distributions [23], only balanced class configurations were considered in this study. The classification tasks were analyzed with different ML methods using the novel PVG contour features FC10−90 and the glottal features FG.
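The construction of class Pathological and the one-against-one voting scheme can be illustrated as below. This Python sketch is not the authors’ code: `make_pathological` and `one_vs_one_predict` are hypothetical names, any callable stands in for a trained binary SVM, and ties between classes (possible with three classes when each receives one vote) are broken by the order of `classes`, a choice the text does not specify.

```python
import random
from collections import Counter
from itertools import combinations

def make_pathological(paresis_l, paresis_r, n, seed=0):
    """Merge both paresis classes and randomly undersample them to n cases."""
    rng = random.Random(seed)
    return rng.sample(list(paresis_l) + list(paresis_r), n)

def one_vs_one_predict(x, classes, pairwise):
    """Predict the class of x by majority vote over all pairwise classifiers.

    pairwise[(a, b)] is a trained binary classifier (a callable) that returns
    either a or b; keys follow the order of `classes`.
    """
    votes = Counter(pairwise[(a, b)](x) for a, b in combinations(classes, 2))
    # max() keeps the first class in `classes` order if several tie
    return max(classes, key=lambda c: votes[c])
```

With three classes, three binary classifiers are trained (Healthy/ParesisL, Healthy/ParesisR, ParesisL/ParesisR), and each case is assigned to the class with the most votes. Calling `make_pathological` with different seeds yields the differing class configurations under which Healthy vs. Pathological was evaluated repeatedly.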
Each patient’s laryngeal dynamics were thus described by six different feature vectors: five PVG contour feature sets, extracted at different heights h, and the glottal feature set. By using one feature description for learning at a time and comparing the resulting model accuracies, the adequacy of the individual feature sets could be evaluated. In addition to considering the feature sets individually, all possible pairs of feature sets were examined to assess their capability to complement each other. For example, FC10+90 denotes the combined feature set obtained by joining the feature vectors from FC10 and FC90. Hence, 15 additional feature combinations were analyzed in this study. Most of the employed ML algorithms have free parameters that affect the training process and the classification performance of the resulting models. To find an appropriate ML parameter combination, a heuristic search strategy was used, namely the (μ + λ) evolution strategy of Schwefel and Rechenberg [24]. By repeatedly applying a sequence of elitist and tournament selection (fraction: 0.25), recombination (probability: 0.9) and Gaussian mutation to a population of five individuals representing potential problem solutions, the population’s overall fitness (w.r.t. classification accuracy) is assessed and its internal structure adapted accordingly. In this manner, after 10 iterations of the evolutionary modification scheme, suitable parameter settings were obtained for the individual ML algorithms, as shown in Table 1. All remaining parameters were kept at their default values.

Table 1 – Optimized parameters of the ML algorithms including the corresponding parameter range searched using the evolution strategy approach.

ML algorithm                  Optimized parameter     Considered range
k-NN                          neighborhood k          1.0 ... 20.0
C4.5                          minimal gain g          0.0 ... 10.0
                              pruning confidence c    1.0 × 10^−7 ... 0.5
MLP                           learning rate r         0.0 ... 1.0
                              momentum m              0.0 ... 1.0
SVM with polynomial kernel    cost C                  0.0 ... 10000.0
SVM with RBF kernel           cost C                  0.0 ... 10000.0
                              RBF width               0.0 ... 100.0

Due to the above-mentioned quality requirements for the HS recordings and the resulting limited amount of data, a stratified 10-fold cross-validation (10CV) was used in this study to evaluate classification performance [25]. As random influences occur when the data are split into individual folds, the stability of the performance measure was increased by repeating the 10CV evaluation three times with differing random seed values for each classification task. A reliable estimate of classification accuracy was obtained by averaging over the individual results. In addition, classification task Healthy vs. Pathological was evaluated multiple times using differing class configurations, since class Pathological consists of randomly undersampled and merged data from the two paresis classes.
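The evaluation protocol above (stratified 10-fold cross-validation, repeated three times with different seeds and averaged) can be sketched in plain Python as follows. The round-robin fold assignment, the function names and the `train_and_predict` callback are illustrative assumptions; the study does not state which implementation was used.

```python
import random
from collections import defaultdict

def stratified_folds(labels, k=10, seed=0):
    """Split sample indices into k folds with (nearly) equal class proportions."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, y in enumerate(labels):
        by_class[y].append(idx)
    folds = [[] for _ in range(k)]
    offset = 0
    for y in sorted(by_class):
        members = by_class[y]
        rng.shuffle(members)
        for j, idx in enumerate(members):
            folds[(offset + j) % k].append(idx)
        offset += len(members)  # continue the round robin to keep fold sizes balanced
    return folds

def repeated_cv_accuracy(labels, train_and_predict, k=10, seeds=(0, 1, 2)):
    """Mean accuracy over one stratified k-fold CV run per seed.

    train_and_predict(train_idx, test_idx) trains on train_idx and returns
    the predicted labels for test_idx.
    """
    run_accuracies = []
    for seed in seeds:
        correct = 0
        for test_idx in stratified_folds(labels, k, seed):
            if not test_idx:
                continue
            held_out = set(test_idx)
            train_idx = [i for i in range(len(labels)) if i not in held_out]
            predictions = train_and_predict(train_idx, test_idx)
            correct += sum(p == labels[i] for p, i in zip(predictions, test_idx))
        run_accuracies.append(correct / len(labels))
    return sum(run_accuracies) / len(run_accuracies)
```

For a balanced 2-class task with 15 cases per class, every fold holds three cases (30 / 10); with the 45 cases of the 3-class task, the folds hold four or five cases each.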

2.5. Expert validation

To assess the performance of the proposed PVG classification system, its diagnostic decisions were additionally validated by clinical experts. This further evaluation step was needed because the automatically determined classifications cannot be directly related to the gold standard diagnoses: besides the vibratory and symmetry properties of the VFs (as captured by the PVG features in this study), the physician also considers further physiological evidence to make an appropriate diagnosis (e.g. the VFs’ normal position and transient oscillations). To re-evaluate the stability of the physicians’ judgment given the same analytic criteria as in the automatic processing of the data, the same 45 HS sequences underlying this study were presented to six clinical experts (R1–R6). Each expert was asked to assign the laryngeal HS videos subjectively to one of the three classes. This decision-making process corresponds to classification task Healthy vs. ParesisL vs. ParesisR of this study. All additional image information hinting at the diagnosis was removed beforehand, so that only VF movement patterns were available as the basis for subjective classification of the patients’ HS sequences. To evaluate the intra-rater reliability of the experts, 6 of the 45 available HS videos (2 per class) were presented twice, and the corresponding diagnoses were subsequently compared with each other. Thus, in total 51 HS recordings were rated by the clinical experts.
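The bookkeeping of the expert validation can be sketched as follows: each expert rates the 45 sequences plus 6 repeats (45 + 6 = 51), and intra-rater reliability can be expressed, for instance, as the fraction of repeated videos that received the same label both times. The study does not state which reliability measure was used; simple percent agreement is assumed here, and `intra_rater_agreement` is a hypothetical helper.

```python
N_SEQUENCES = 45  # HS sequences underlying the study
N_REPEATED = 6    # 2 per class, presented a second time
RATINGS_PER_EXPERT = N_SEQUENCES + N_REPEATED  # 51 rated recordings

def intra_rater_agreement(first, second):
    """Fraction of repeated videos given the same class in both presentations."""
    if len(first) != len(second) or not first:
        raise ValueError("need two non-empty rating lists of equal length")
    return sum(a == b for a, b in zip(first, second)) / len(first)
```

A chance-corrected measure such as Cohen’s kappa would be a natural alternative, but with only six repeated videos per expert any such estimate remains coarse.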

3. Results

The cross-validated classification accuracies achieved for the different employed learning schemes, feature descriptions and decision problems are presented in the following. As the number of individual classification results obtained is high, only averaged accuracy values and standard deviations are shown.

3.1. Machine learning performance

The classification accuracies of the different ML approaches are presented in Fig. 3. These results indicate the general capability of the learning algorithms to model class membership. The accuracies shown are averaged over the PVG contour features FC10−90 and the 2-class classification tasks Healthy vs. ParesisL, Healthy vs. ParesisR, and ParesisL vs. ParesisR. These feature sets and decision problems were selected to ensure consistency between the outcomes of different experiments and to allow a direct comparison of the employed ML algorithms. In the experiments, the best results were achieved by SVMs: as shown in Fig. 3, this kernel-based ML technique obtained classification accuracies of around 82% on average. The decision problems were solved most reliably by an SVM with a 1st-order polynomial kernel function. This soft-margin linear separation of the PVG feature space yielded classification accuracies of about 85%, significantly exceeding (p < 0.05) the other inductive learning approaches (except the SVM with quadratic kernel; p = 0.15). For this reason, all results presented in the following were obtained exclusively with this best-performing linear-kernel SVM. The accuracies for C4.5, MLP, and SVM with 3rd-order polynomial and RBF kernel were around 80%. The significantly lowest classification accuracy of this study (74%; p < 0.05) was obtained with the k-NN algorithm and its instance-based learning scheme. Table 2 lists, for each ML algorithm, the individual feature sets that yielded the best and the worst classification performance.

3.2. Feature set suitability

The average classification accuracies of the individual feature sets FC10−90 and FG depicted in Fig. 4 show how suitable the different PVG and glottal description approaches are for class discrimination.


Fig. 3 – Classification results obtained for different ML algorithms averaging over PVG feature sets FC10−90 and classification tasks Healthy vs. ParesisL, Healthy vs. ParesisR, and ParesisL vs. ParesisR. Vertical bars indicate the accuracies’ standard deviations.

The intermediate PVG contours FC30−70 yielded the best classification accuracy within this study (approximately 87%), although the difference was not statistically significant (p > 0.05). The peripheral PVG contour features FC10 and FC90 yielded lower accuracies (82% and 84%). The conventional glottal description FG yielded the weakest performance of all considered feature sets (approximately 76%).

Table 2 – Best and worst performing feature sets for the individual ML algorithms obtained by averaging over classification tasks Healthy vs. ParesisL, Healthy vs. ParesisR, and ParesisL vs. ParesisR.

ML algorithm      Best feature set   Worst feature set
k-NN              FC10               FG
C4.5              FC90               FG
MLP               FC70               FG
SVM (1st poly)    FC50               FG
SVM (2nd poly)    FC70               FG
SVM (3rd poly)    FC70               FG
SVM (RBF)         FC70               FG

The classification accuracies obtained from paired feature set combinations are given as radar charts in Fig. 5. The charts are grouped from left to right according to the first combined feature set. In each chart, the radius of the semicircle indicates the accuracy achieved for the respective feature set combination, and the orientation of the radius indicates, clockwise, the second feature set used for pairing: the bottommost spoke shows the result of combination FC...+10, the topmost spoke the accuracy of FC...+90. The radius corresponding to a pair of identical features (e.g. FC10+10 = FC10) is emphasized. As a reference for the gain achieved by each combined feature set, the individual feature set results are depicted as dotted half circles. Where accuracy improves, the gained performance area is highlighted in gray. The best-performing combination within each chart is marked with a small circle on the corresponding radius, and an additional line indicates the best overall classification result. Since each feature set can be placed as the first or the second part of a combination, each combined classification result is shown twice in Fig. 5: e.g. the accuracy obtained from paired feature set FC10+90 is given in groups FC10+... and FC90+... .

Combining PVG contours extracted at lower heights with features describing higher contours significantly improved the individual results (p < 0.05; e.g. illustrated by the accumulated gray area in the upper part of radar chart FC10+...). In particular, feature set combination FC30+70 achieved a classification accuracy of around 90%, outperforming all other feature pairs. The accuracy of the best-performing individual feature set, FC50, could not be increased further by paired combinations. Fig. 6 shows the classification performance achieved by combining the two different VF description approaches. Pairing the glottal features FG with any of the PVG contour features FC10−90 yielded a highly significant improvement (p < 0.001) over the individual performance of FG.
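The pairing scheme (six individual feature sets, 15 unordered pairs, and the Fig. 5 layout in which each PVG pair appears in two chart groups) can be made explicit with a short enumeration. The set names follow the document’s notation; the helper names `radar_group` and `groups_containing` are hypothetical.

```python
from itertools import combinations

PVG_SETS = ["FC10", "FC30", "FC50", "FC70", "FC90"]
ALL_SETS = PVG_SETS + ["FG"]

# unordered pairs of distinct feature sets, as evaluated in Section 2.4
pairs = list(combinations(ALL_SETS, 2))

def radar_group(first, sets=PVG_SETS):
    """Spokes of one Fig. 5 chart: `first` paired with every PVG set, itself included."""
    return [(first, second) for second in sets]

def groups_containing(pair, sets=PVG_SETS):
    """Charts in which a PVG pair appears (two charts if its members differ)."""
    return [first for first in sets
            if pair in radar_group(first, sets) or pair[::-1] in radar_group(first, sets)]
```

The 15 pairs split into 10 PVG/PVG pairs (Fig. 5) and 5 pairs of FG with one PVG contour set (Fig. 6); `groups_containing(("FC10", "FC90"))` returns `["FC10", "FC90"]`, matching the statement that each combined result is shown twice.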

3.3. Class discrimination capability

Fig. 4 – Classification results obtained for different PVG contour features FC10−90 (light gray) and glottal features FG (dark gray), averaged over classification tasks Healthy vs. ParesisL, Healthy vs. ParesisR, and ParesisL vs. ParesisR and using SVM with linear kernel for learning.

Fig. 5 – Classification results obtained for PVG contour feature set combinations, averaged over problems Healthy vs. ParesisL, Healthy vs. ParesisR, and ParesisL vs. ParesisR and using SVM with linear kernel for learning. A single radar chart is organized as follows: each spoke represents a feature set combination; the radius of the semicircle depicts the classification accuracy achieved using a certain combination; the small circle marks the best-performing feature set combination, and the additional small line indicates the best overall classification result; the corresponding individual feature set's performance is displayed as a dotted half circle; an accuracy improvement over the individual results is shaded gray. For the sake of clarity, standard deviations are not depicted.

The general classification capability of the presented PVG analysis approach for the individual decision tasks can be assessed from the accuracy results of the best-performing feature set combination FC30+70, shown in Fig. 7. Since problem Healthy vs. ParesisL vs. ParesisR involves discriminating three balanced classes, the baseline accuracy achievable by assigning all cases to the same class is 33.3%, compared with 50% for the remaining balanced 2-class problems (see dashed horizontal lines). The 2-class decision tasks Healthy vs. ParesisL, Healthy vs. ParesisR, and Healthy vs. Pathological reached an average accuracy of 93%. For the first of these tasks, a decision accuracy of over 95% was achieved, the highest reliability of all examined problems. For identifying which VF side is actually affected by paresis (ParesisL vs. ParesisR), a lower accuracy of approx. 81% was obtained. This difficulty in assigning pareses laterally is also reflected in the results of problem Healthy vs. ParesisL vs. ParesisR (73%): the averaged confusion matrix of the corresponding results in Table 3 reveals that over 62% of misclassifications occurred between the two paralytic classes.
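The chance baselines quoted above (33.3% for three balanced classes, 50% for two) are simply the accuracy of always predicting the most frequent class. A minimal Python sketch, assuming 15 cases per class as in the balanced problems of this study:

```python
from collections import Counter

def majority_baseline(labels):
    """Accuracy obtained by assigning every case to the most frequent class."""
    counts = Counter(labels)
    return max(counts.values()) / len(labels)

three_class = ["Healthy"] * 15 + ["ParesisL"] * 15 + ["ParesisR"] * 15
two_class = ["ParesisL"] * 15 + ["ParesisR"] * 15

if __name__ == "__main__":
    print(f"{majority_baseline(three_class):.1%}")  # 33.3%
    print(f"{majority_baseline(two_class):.1%}")    # 50.0%
```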

3.4. Expert validation

Fig. 8 depicts the subjective classification accuracies achieved by each of the six expert raters R1–6, together with the corresponding averaged result R1−6 and its standard deviation.

Table 3 – Averaged confusion matrix for classification task Healthy vs. ParesisL vs. ParesisR (expressed as percentage of 45 cases in total) obtained from PVG contour feature combination FC30+70 using SVM with linear kernel for learning.

                     Suspected class
True class      Healthy    ParesisL   ParesisR
Healthy          30.37%      1.48%      1.48%
ParesisL          2.96%     21.48%      8.89%
ParesisR          4.44%      8.15%     20.74%

62.2% of misclassifications between ParesisL and ParesisR


The average classification accuracy achieved by the clinical experts under comparable diagnostic conditions was 64%. The SVM classification results clearly outperformed the overall ratings of all experts but one (R5). Besides exhibiting rather low inter-rater reliability (κ = 0.35), the experts also showed reduced intra-rater reliability (approximately 75%). All experts identified healthy cases more reliably than pathological cases (see Fig. 8). The averaged confusion matrix in Table 4 further shows that classes ParesisL and ParesisR were especially hard for the experts to distinguish, together accounting for more than 53% of misclassifications. This finding is in accordance with the SVM results given in Table 3.
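The summary figures quoted for Tables 3 and 4 can be recomputed from the averaged confusion matrices. The Python sketch below (function names are illustrative) takes rows as the true class and columns as the suspected class, in the order Healthy, ParesisL, ParesisR, and computes the overall accuracy and the share of all misclassifications that fall between the two paralytic classes.

```python
# Averaged confusion matrices in percent of 45 cases.
# Rows: true class; columns: suspected class; order Healthy, ParesisL, ParesisR.
TABLE3_SVM = [
    [30.37, 1.48, 1.48],
    [2.96, 21.48, 8.89],
    [4.44, 8.15, 20.74],
]
TABLE4_EXPERTS = [
    [26.98, 4.13, 2.22],
    [6.03, 19.37, 7.94],
    [4.76, 11.75, 16.83],
]

def accuracy(cm):
    """Fraction of all cases lying on the diagonal (correctly classified)."""
    total = sum(sum(row) for row in cm)
    return sum(cm[i][i] for i in range(len(cm))) / total

def lateral_error_share(cm, left=1, right=2):
    """Fraction of all misclassifications that confuse the two paresis classes."""
    total = sum(sum(row) for row in cm)
    errors = total - sum(cm[i][i] for i in range(len(cm)))
    return (cm[left][right] + cm[right][left]) / errors

if __name__ == "__main__":
    print(f"SVM accuracy: {accuracy(TABLE3_SVM):.1%}")                         # 72.6%
    print(f"SVM lateral share: {lateral_error_share(TABLE3_SVM):.1%}")         # 62.2%
    print(f"Expert lateral share: {lateral_error_share(TABLE4_EXPERTS):.1%}")  # 53.5%
```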

Fig. 6 – Classification results obtained for all paired feature set combinations of PVG contours and glottal parameters averaging over problems Healthy vs. ParesisL, Healthy vs. ParesisR, and ParesisL vs. ParesisR and using SVM with linear kernel for learning. For an explanation of the radar chart see Fig. 5.

Since the decision task posed to the experts corresponds to the 3-class classification problem Healthy vs. ParesisL vs. ParesisR analyzed by the ML algorithms, the average accuracy of the best-performing SVM with linear kernel is also given as a classification benchmark (dashed horizontal line). The performance of the objective PVG classification approach can thus be put into context with the experts' subjective ratings when VF vibration is the only diagnostic information available. Moreover, the bottom part of Fig. 8 presents the inter-rater reliabilities (expressed as generalized Cohen's kappa [26]) and intra-rater reliabilities (expressed as percentages of agreement).
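The two reliability measures can be sketched in Python. Because the exact multi-rater kappa variant of [26] is not spelled out here, the sketch substitutes Fleiss' kappa, one common multi-rater generalization of chance-corrected agreement, so its values need not match the study's κ exactly. Intra-rater reliability is computed as the plain percentage of agreement between a rater's two passes over the same recordings. All labels in the example are made up.

```python
from collections import Counter

def fleiss_kappa(ratings):
    """Fleiss' kappa. `ratings` is a list of items; each item is a list with
    one category label per rater (same number of raters for every item)."""
    n_items = len(ratings)
    n_raters = len(ratings[0])
    totals = Counter()
    p_bar = 0.0
    for item in ratings:
        counts = Counter(item)
        totals.update(counts)
        # Observed agreement for this item: agreeing rater pairs / all pairs.
        agreeing_pairs = sum(n * (n - 1) for n in counts.values())
        p_bar += agreeing_pairs / (n_raters * (n_raters - 1))
    p_bar /= n_items
    # Chance agreement from the overall category proportions.
    p_e = sum((n / (n_items * n_raters)) ** 2 for n in totals.values())
    if p_e == 1.0:  # degenerate case: only one category was ever used
        return 1.0
    return (p_bar - p_e) / (1 - p_e)

def percent_agreement(first_pass, second_pass):
    """Intra-rater agreement in percent between two rating passes."""
    matches = sum(a == b for a, b in zip(first_pass, second_pass))
    return 100.0 * matches / len(first_pass)

# Hypothetical ratings: 4 recordings, 3 raters (H = Healthy, L/R = ParesisL/R).
example = [["H", "H", "H"], ["L", "L", "L"], ["R", "R", "R"], ["H", "L", "R"]]

if __name__ == "__main__":
    print(round(fleiss_kappa(example), 3))                                   # 0.625
    print(percent_agreement(["H", "L", "R", "H"], ["H", "R", "R", "H"]))     # 75.0
```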

4. Discussion

4.1. Expert validation

The result achieved by the presented automatic PVG classification approach (dashed line in Fig. 8) was approximately 9 percentage points above the corresponding average result of the group of experts (R1−6 in Fig. 8) when the only diagnostic clue available was the patient's VF oscillation pattern. On average, the PVG feature analysis based on objective criteria derived from HS videos thus surpasses subjective assessment by clinically experienced raters, and the novel PVG classification method can provide the physician with objective clinical decision support. In clinical practice, however, further laryngeal properties are considered in order to diagnose voice function properly: besides the mandatory ELS criteria, the position of the VFs, the phonation onset, and the general orientation of the arytenoid cartilages also indicate the paralytic impairment and its lateral assignment. Even without incorporating these additional laryngeal properties at the current stage, the automatic quantitative PVG analysis approach already classified VF oscillation patterns more reliably than the human experts in this study. In the future, further quantitative features describing different aspects of VF activity will be captured and analyzed with the presented classification method using a more extensive data collection. A particular problem for the experts was distinguishing left- from right-sided pareses (see Table 4). Since the SVM results in Table 3 show the same difficulty, the two pathological classes appear highly similar when only vibratory information is considered. One likely reason for this difficulty in lateral distinction is that a paresis does not necessarily manifest itself as the expected deflection asymmetry of the affected VF; it may even be functionally compensated for by increased laryngeal muscle tension. The experts' misclassifications indicate that additional diagnostic information is essential for reliably discriminating unilateral pareses. The reduced inter- and intra-rater reliability measures presented in Fig. 8 substantiate the problem inherent in the subjective assessment of laryngeal dynamics as captured in HS videos. This issue may be attributed to the fact that human visual perception is better adapted to processing static information than moving images. Comparing HS sequences of periodically recurring episodes of VF movement is consequently a rather difficult task, in particular for image sequences where no supplementary diagnostic evidence preceding the VF oscillations is available. The experts' reduced rating reliability found in this study is a further argument in support of the objective PVG analysis technique.

Fig. 7 – Classification results obtained for the considered 2-class problems (light gray) and the 3-class problem (dark gray) using PVG contour feature combination FC30+70 and SVM with linear kernel for learning. Dashed horizontal lines illustrate the baseline accuracies attainable by classifying all cases as one class.

Fig. 8 – Subjective classification results for the 45 HS sequences achieved by the six expert raters using exclusively vibratory VF information for diagnosis. For each rater the class-specific accuracies and the overall classification result are given from left to right. The dashed horizontal line indicates the overall accuracy achieved by the SVM classification approach for 3-class decision problem Healthy vs. ParesisL vs. ParesisR. The corresponding reliability measures indicating the experts' inter-rater and intra-rater variability are given below.

4.2. Machine learning performance

Table 4 – Confusion matrix for the manual classification of VF vibrations captured in 45 HS recordings, averaged over expert ratings R1–6.

                     Suspected class
True class      Healthy    ParesisL   ParesisR
Healthy          26.98%      4.13%      2.22%
ParesisL          6.03%     19.37%      7.94%
ParesisR          4.76%     11.75%     16.83%

53.5% of misclassifications between ParesisL and ParesisR

The SVM modeling approach attained significantly the highest classification accuracy in the experiments (see Fig. 3). Its strategy of finding the maximum-margin hyperplane that best separates the analyzed data points thus proved particularly well suited to distinguishing VF pathologies from healthy oscillation patterns. This finding is in accordance with the results of similar studies comparing classifier performance in medical domains [27–29]. Regarding the choice of SVM kernel function, the best average classification accuracies were obtained with 1st-degree soft-margin polynomial kernels, which outperformed kernel functions with higher polynomial degree as well as RBF kernels. The results of decision tree algorithm C4.5, the backpropagation MLP, and SVMs with 3rd-order polynomial and RBF kernels were nearly equivalent. However, the C4.5 and SVM approaches should be given preference for the practical implementation of a clinical decision support system, as MLP training takes considerably more time. The generally weak classification performance of the k-NN algorithm points towards a rather inhomogeneous class distribution in the feature space. The relatively high standard deviations in Fig. 3 (approximately 17%) can be ascribed to the limited amount of data available: in the 10-fold cross-validation (10CV) used for model evaluation, each test fold comprised only three cases for the considered 2-class problems, so the evaluation results were quite sensitive to the random split of the data.
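The effect of such small test folds can be made concrete. The Python sketch below builds a stratified 10-fold split for a hypothetical balanced 2-class problem with 15 cases per class; stratification and the fixed seed are assumptions for illustration, not necessarily the procedure used in this study. Each test fold then holds three cases, so a single fold's accuracy can only take the values 0%, 33.3%, 66.7% or 100%, which helps explain the large spread across folds.

```python
import random

def stratified_folds(labels, k=10, seed=0):
    """Split case indices into k folds, dealing each class round-robin so that
    every fold keeps roughly the overall class balance."""
    rng = random.Random(seed)
    by_class = {}
    for idx, label in enumerate(labels):
        by_class.setdefault(label, []).append(idx)
    folds = [[] for _ in range(k)]
    offset = 0
    for label in sorted(by_class):
        members = by_class[label][:]
        rng.shuffle(members)
        for j, idx in enumerate(members):
            folds[(offset + j) % k].append(idx)
        offset += len(members)  # continue dealing where the previous class stopped
    return folds

labels = ["Healthy"] * 15 + ["ParesisL"] * 15
folds = stratified_folds(labels)

if __name__ == "__main__":
    print([len(f) for f in folds])             # [3, 3, 3, 3, 3, 3, 3, 3, 3, 3]
    print([f"{c / 3:.1%}" for c in range(4)])  # attainable per-fold accuracies
```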

4.3. Individual feature set suitability

The gradual decline of classification results (by approximately 4%) obtained when using peripheral features at the boundaries of the VF oscillation phases (see Fig. 4) suggests that characteristic VF movement patterns are expressed best in medium deflection states. Hence, class membership can be determined most reliably through the corresponding intermediate PVG contour features, as they appropriately capture the relevant spatio-temporal information during the opening and closing phases of a cycle. This contrasts with the conventional glottal description approach, which basically implements a peak-picking strategy and thus only allows the consideration of maximum glottal deflection states. The advantage of the new PVG contours over the glottal features is confirmed by the results in Fig. 4: the best-performing individual PVG feature set, FC50, yields an accuracy gain of 11%. Hence, the additional VF vibration information captured through 2-d PVG contour features significantly improves classification accuracy, whereas the features extracted from the 1-d glottal signal appear to lack the descriptive power to capture the necessary details of VF dynamics.
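The contrast between peak picking and level-h contours can be illustrated on a single toy deflection cycle. The Python sketch below is a simplified 1-D analogue, not the PVG contour extraction used in this study: for one normalized cycle it returns the interpolated opening and closing times at which the deflection crosses h times its maximum. The two hypothetical cycles share the same maximum deflection, yet their crossing times at h = 50% differ, which a description based only on the maximum value would not capture.

```python
def peak(signal):
    """Peak picking: maximum deflection value of a cycle."""
    return max(signal)

def level_crossings(signal, h):
    """Return (opening, closing) times in samples at which a single cycle
    rises through and falls through h * max(signal), linearly interpolated.
    If the level is crossed several times, the first rise and last fall are kept."""
    level = h * max(signal)
    opening = closing = None
    for i in range(1, len(signal)):
        a, b = signal[i - 1], signal[i]
        if opening is None and a < level <= b:
            opening = (i - 1) + (level - a) / (b - a)
        if a >= level > b:
            closing = (i - 1) + (a - level) / (a - b)
    return opening, closing

# Two hypothetical normalized cycles with the same maximum deflection.
symmetric = [0, 1, 2, 3, 4, 3, 2, 1, 0]
fast_opening = [0, 2, 4, 3.5, 3, 2.5, 2, 1, 0]

if __name__ == "__main__":
    for name, cycle in (("symmetric", symmetric), ("fast_opening", fast_opening)):
        print(name, peak(cycle), level_crossings(cycle, 0.5))
```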

4.4. Combined feature set suitability

As shown in Fig. 5, high and low PVG contours complement each other advantageously. Classes could be distinguished most reliably when information from contours at about one third and two thirds of the VF deflection was considered (h = 30% and h = 70%, as captured by combined feature set FC30+70). This supports the finding that intermediate VF deflection states are diagnostically relevant. Compared with the results of the individual features (dotted half circles in Fig. 5), most feature set combinations led to improved accuracies (represented by gray areas). The largest gains for contour features were achieved for the individually weak peripheral feature sets (FC10: +4.2%; FC90: +4.3%). For the best-performing individual PVG feature set, FC50, classification results could not be increased further by combination (yielding no gray area), confirming its already high suitability for class discrimination. Combining glottal features FG with PVG features FC10−90 yielded a high average gain of +8.6% over the individual classification results (see the extensive gray area in Fig. 6), which can be ascribed to the fact that certain combinations of glottal and PVG descriptions may exploit additional information not covered by the individual features. Nevertheless, the resulting accuracies are still surpassed by those of paired PVG contour features, which are more discriminative because they provide an even more detailed description of the underlying VF movement patterns over time. For classification purposes, PVG features are therefore more suitable than glottal features.

4.5. General classification capability

The presented PVG classification approach is capable of accurately distinguishing between individuals with healthy voices and those with a wide variety of paralytic voices (see Fig. 7). Classification problems Healthy vs. ParesisL, Healthy vs. ParesisR, and Healthy vs. Pathological are especially important clinically, as they constitute a basic step of the diagnostic process: identifying whether or not an individual is affected by a voice disorder. Hence, the average classification accuracy of over 93% obtained in this study for differentiating normal and paralytic VF vibrations by means of PVG features is considered a success. This assessment is supported by comparison with other studies [12,30,31]. However, since the automatic classification of laryngeal HS recordings in general and PVG feature extraction in particular are very new approaches to analyzing VF dynamics, few directly comparable results can be found in the literature. The technique most frequently applied is to record a patient's voice signal digitally, extract miscellaneous acoustic features, and apply different ML algorithms to classify the audio signal into normal or pathological groups [32–34]. This approach differs fundamentally from the one presented in this paper: analyzing the underlying vibration patterns of the sound-producing elements of the larynx (the VFs) is not the same as analyzing the phonatory outcome of this source (the voice signal). The mapping from a hoarse voice (effect) to the pathology (cause) is considerably more ambiguous than the mapping from the VF movements themselves. For this reason, most studies focus exclusively on the decision problem Healthy vs. Pathological, or rather "Normal vs. Hoarse Voice", which essentially corresponds to the amount of noise in the audio signal. The clinical question of which VF side is affected by a detected pathology cannot be answered from the voice signal alone. Other studies extracted features from the glottal signal (as captured by FG in this paper) and statistically determined ranges for normophonic and dysphonic speakers [10,35,36], but made no direct attempt to classify any data. In summary, the VF pathology classification results attained in this study are on a par with those of existing voice analysis approaches. Acoustic voice analysis and direct analysis of VF vibrations ought to be considered jointly in order to benefit from the individual strengths of both branches.

4.6. Lateral classification capability

Explicit lateral classification (ParesisL vs. ParesisR, Healthy vs. ParesisL vs. ParesisR) performed relatively poorly compared with the diagnostic tasks Healthy vs. ParesisL, Healthy vs. ParesisR, and Healthy vs. Pathological. From the average performance loss of about 12% shown in Fig. 7 and the relatively high lateral error rate in Table 3, it can be concluded that the employed ML algorithms have difficulties differentiating the two types of pareses. However, when the baseline accuracies (depicted as horizontal lines in Fig. 7) for balanced 2-class (50%) and 3-class problems (33.3%) are taken into account, the results of Healthy vs. ParesisL vs. ParesisR are relatively better than those of ParesisL vs. ParesisR. Hence, a classifier built to distinguish all three classes jointly achieves more reliable results than a classifier that focuses exclusively on ParesisL and ParesisR. To understand this difficulty in lateral distinction more thoroughly, a selection of VF oscillations that were especially hard to classify is examined in more detail below. For this purpose, Fig. 9 shows a PVG contour feature plot for two selected paralytic voices from this study. These cases deviate from the similarity criterion that defines their class and thus hint at the reason for misclassification. For both paralytic examples shown in Fig. 9, VF movement appears relatively stable, exhibiting only small variation in the contour shapes of consecutive oscillation cycles. The vibrations of the left and right VF are quite balanced; neither VF can be characterized as functionally inferior to the other. Hence, for these particular pathological cases the common diagnostic criterion of lateral vibration dissimilarity does not hold; to all appearances, they rather resemble cases from the healthy control group. The impact of a paresis on the VF vibration patterns is therefore not always as strong as presumed. This may be explained by the fact that certain characteristics of VF vibration patterns are influenced by the phonation frequency [37,20]. Since no explicit frequency ranges were specified during examination in this study, a patient may inadvertently have chosen an unsuitable frequency, so that the vibratory patterns in question show no distinct evidence of a voice disorder. Accordingly, it is not surprising that certain pathological examples were difficult to distinguish from each other using a shape-based oscillation pattern description, and that only reduced accuracy was obtained for lateral classification in this study.

Nevertheless, the overall classification accuracy can presumably be improved further by expanding the PVG feature description approach as a whole and incorporating additional physiological information about the patient (e.g. voice signal features or features capturing the orientation of the arytenoids). Besides adapting the feature extraction process, collecting a more comprehensive clinical data set is important for assessing the general validity of the conclusions drawn from the rather small clinical sample underlying this study. Furthermore, phonation paradigms better suited to identifying VF pathologies (e.g. non-stationary phonation [38,39]) should be used during examination. In doing so, the objective PVG-based analysis technique can be aligned more closely with the diagnostic chain actually followed in clinical routine. The results of this paper suggest that the presented knowledge-based PVG feature extraction and SVM classification approach holds considerable promise as a reliable diagnostic support system for voice disorders in general.

Fig. 9 – PVG contour feature plot for two exemplary pathological cases that were particularly hard to classify. (a) shows five normalized oscillation cycles of a case in which the left VF is affected by paresis; (b) shows a case of right-sided paresis. For each cycle, the PVG contours with heights h = 30% and h = 70% are depicted for both VF sides.

5. Conclusions

An objective voice analysis and classification approach enabling the reliable identification of paralytic voice disorders was presented in this paper. It is based on quantitative PVG feature descriptions of the laryngeal dynamics along the entire VF length and subsequent SVM learning. The VF movements of a collective of 45 normophonic and pathological female speakers were digitally recorded during phonation using an endoscopic HS camera. The resulting HS videos were segmented, and the captured vibrations were transformed into a PVG representation. The contained spatio-temporal patterns of activity were described by PVG contour features at different VF deflection states, and conventional glottal parameters were determined as a performance reference. The obtained feature sets were analyzed with a variety of ML algorithms with the aim of correctly distinguishing healthy from pathological cases. An additional expert validation assessed the subjective discrimination of the clinical cases under the same diagnostic criteria as the objective analysis. The proposed PVG classification approach reliably distinguished normal from paralytic VF movement patterns, with an average 2-class classification accuracy of over 93%; in the 3-class comparison it even outperformed the experts' subjective classifications. The PVG features' advantage over the glottal features in describing the underlying vibratory processes was substantiated, and a particular improvement was shown for combining PVG contour features extracted at different VF deflection states. Possible starting points for improving certain aspects of the PVG description approach were also identified. Overall, the presented PVG classification approach holds considerable potential to support clinical decision making in the future by providing a sound objective basis.

Conflict of interest statement

The authors hereby state that, aside from the public sources of funding explicitly specified in the following section, the work presented in this paper is free of any financial or personal relationships with other people or organizations that could inappropriately influence the results.

Acknowledgments

This work was funded by the Deutsche Forschungsgemeinschaft (DFG), grants LO1413/2 1-3 and EY15/11 3-4. The authors would like to thank the High Performance Computing Group at Regionales Rechenzentrum Erlangen (RRZE) for providing computational resources.


References

[1] T. Rasch, S. Günther, U. Hoppe, U. Eysholdt, F. Rosanowski, Voice-related quality of life in organic and functional voice disorders, Logoped. Phoniatr. Vocol. 30 (1) (2005) 9–13.
[2] P.N. Carding, J.A. Wilson, K. Mackenzie, I.J. Deary, Measuring voice outcomes: state of the science review, J. Laryngol. Otol. 123 (8) (2009) 823–829.
[3] R.J. Baken, Electroglottography, J. Voice 6 (2) (1992) 98–110.
[4] J. Wendler, Stroboscopy, J. Voice 6 (2) (1992) 149–154.
[5] M. Döllinger, The next step in voice assessment: high-speed digital endoscopy and objective evaluation, Curr. Bioinform. 4 (2) (2009) 101–111.
[6] R.T. Sataloff, The professional voice: Part II. Physical examination, J. Voice 1 (2) (1987) 191–201.
[7] P.H. Dejonckere, P. Bradley, P. Clemente, G. Cornut, L. Crevier-Buchman, G. Friedrich, P. Van De Heyning, M. Remacle, V. Woisard, Committee on Phoniatrics of the European Laryngological Society (ELS), A basic protocol for functional assessment of voice pathology, especially for investigating the efficacy of (phonosurgical) treatments and evaluating new assessment techniques, Eur. Arch. Otorhinolaryngol. 258 (2) (2001) 77–82.
[8] J.G. Švec, H.K. Schutte, Videokymography: high-speed line scanning of vocal fold vibration, J. Voice 10 (2) (1996) 201–205.
[9] A. Verikas, V. Uloza, M. Bacauskiene, A. Gelzinis, E. Kelertas, Advances in laryngeal imaging, Eur. Arch. Otorhinolaryngol. 266 (10) (2009) 1509–1520.
[10] Q. Qiu, H.K. Schutte, L. Gu, Q. Yu, An automatic method to quantify the vibration properties of human vocal folds via videokymography, Folia Phoniatr. Logop. 55 (3) (2003) 128–136.
[11] P. Mergell, H. Herzel, I.R. Titze, Irregular vocal-fold vibration: high-speed observation and modeling, J. Acoust. Soc. Am. 108 (6) (2000) 2996–3002.
[12] R. Schwarz, U. Hoppe, M. Schuster, T. Wurzbacher, U. Eysholdt, J. Lohscheller, Classification of unilateral vocal fold paralysis by endoscopic digital high-speed recordings and inversion of a biomechanical model, IEEE Trans. Biomed. Eng. 53 (6) (2006) 1099–1108.
[13] Y. Yan, K. Ahmad, M. Kunduk, D. Bless, Analysis of vocal-fold vibrations from high-speed laryngeal images using a Hilbert transform-based methodology, J. Voice 19 (2) (2005) 161–175.
[14] Y. Yan, E. Damrose, D. Bless, Functional analysis of voice using simultaneous high-speed imaging and acoustic recordings, J. Voice 21 (5) (2007) 604–616.
[15] Y. Zhang, C. Tao, J.J. Jiang, Parameter estimation of an asymmetric vocal-fold system from glottal area time series using chaos synchronization, Chaos 16 (2) (2006) 023118-1–023118-8.
[16] J. Lohscheller, U. Eysholdt, H. Toy, M. Döllinger, Phonovibrography: mapping high-speed movies of vocal-fold vibrations into 2-D diagrams for visualizing and analyzing the underlying laryngeal dynamics, IEEE Trans. Med. Imaging 27 (3) (2008) 300–309.
[17] J. Lohscheller, H. Toy, F. Rosanowski, U. Eysholdt, M. Döllinger, Clinically evaluated procedure for the reconstruction of vocal fold vibrations from endoscopic digital high-speed videos, Med. Image Anal. 11 (4) (2007) 400–413.
[18] A. Verikas, A. Gelzinis, D. Valincius, M. Bacauskiene, V. Uloza, Multiple feature sets based categorization of laryngeal images, Comput. Methods Programs Biomed. 85 (3) (2007) 257–266.
[19] A. Verikas, A. Gelzinis, M. Bacauskiene, V. Uloza, Towards a computer-aided diagnosis system for vocal cord diseases, Artif. Intell. Med. 36 (1) (2006) 71–84.
[20] D. Voigt, M. Döllinger, T. Braunschweig, A. Yang, U. Eysholdt, J. Lohscheller, Classification of functional voice disorders based on phonovibrograms, Artif. Intell. Med. (2010), doi:10.1016/j.artmed.2010.01.001, in press.
[21] R.O. Duda, P.E. Hart, D.G. Stork, Pattern Classification, second ed., John Wiley & Sons, New York, USA, 2001.
[22] C.W. Hsu, C.J. Lin, A comparison of methods for multi-class support vector machines, IEEE Trans. Neural Netw. 13 (2) (2002) 415–425.
[23] N. Japkowicz, S. Stephen, The class imbalance problem: a systematic study, Intell. Data Anal. 6 (5) (2002) 429–449.
[24] H. Beyer, H. Schwefel, Evolution strategies: a comprehensive introduction, Nat. Comput. 1 (1) (2002) 3–52.
[25] R. Kohavi, A study of cross-validation and bootstrap for accuracy estimation and model selection, IJCAI (1995) 1137–1145.
[26] K. Krippendorff, Content Analysis: An Introduction to Its Methodology, Sage Publications, Beverly Hills, CA, USA, 1980.
[27] D. Wang, B. Larder, A. Revell, J. Montaner, R. Harrigan, F. De Wolf, J. Lange, S. Wegener, L. Ruiz, M.J. Pérez-Elías, S. Emery, J. Gatell, A. D'Arminio Monforte, C. Torti, M. Zazzi, C. Lane, A comparison of three computational modelling methods for the prediction of virological response to combination HIV therapy, Artif. Intell. Med. 47 (1) (2009) 63–74.
[28] H. Shin, M.K. Markey, A machine learning perspective on the development of clinical decision support systems utilizing mass spectra of blood samples, J. Biomed. Inform. 39 (2) (2006) 227–248.
[29] G. Díaz, F.A. González, E. Romero, A semi-automatic method for quantification and classification of erythrocytes infected with malaria parasites in microscopic images, J. Biomed. Inform. 42 (2) (2009) 296–307.
[30] M.A. Little, P.E. McSharry, S.J. Roberts, D.A. Costello, I.M. Moroz, Exploiting nonlinear recurrence and fractal scaling properties for voice disorder detection, Biomed. Eng. Online 6 (23) (2007).
[31] A. Gelzinis, A. Verikas, M. Bacauskiene, Automated speech analysis applied to laryngeal disease categorization, Comput. Methods Programs Biomed. 91 (1) (2008) 36–47.
[32] K. Umapathy, S. Krishnan, V. Parsa, D.G. Jamieson, Discrimination of pathological voices using a time-frequency approach, IEEE Trans. Biomed. Eng. 52 (3) (2005) 421–430.
[33] C.D. Crovato, A. Schuck, The use of wavelet packet transform and artificial neural networks in analysis and classification of dysphonic voices, IEEE Trans. Biomed. Eng. 54 (10) (2007) 1898–1900.
[34] J.I. Godino-Llorente, P. Gómez-Vilda, Automatic detection of voice impairments by means of short-term cepstral parameters and neural network based detectors, IEEE Trans. Biomed. Eng. 51 (2) (2004) 380–384.
[35] T. Braunschweig, J. Flaschka, P. Schelhorn-Neise, M. Döllinger, High-speed video analysis of the phonation onset, with an application to the diagnosis of functional dysphonias, Med. Eng. Phys. 30 (1) (2008) 59–66.
[36] H.S. Bonilha, D.D. Deliyski, Period and glottal width irregularities in vocally normal speakers, J. Voice 22 (6) (2008) 699–708.
[37] R.F. Orlikoff, R.J. Baken, Consideration of the relationship between the fundamental frequency of phonation and vocal jitter, Folia Phoniatr. (Basel) 42 (1) (1990) 31–40.
[38] O. Rasp, J. Lohscheller, M. Döllinger, U. Eysholdt, U. Hoppe, The pitch rise paradigm: a new task for real-time endoscopy of non-stationary phonation, Folia Phoniatr. Logop. 58 (3) (2006) 175–185.
[39] T. Wurzbacher, R. Schwarz, M. Döllinger, U. Hoppe, U. Eysholdt, J. Lohscheller, Model-based classification of nonstationary vocal fold vibrations, J. Acoust. Soc. Am. 120 (2) (2006) 1012–1027.