Journal of Voice, Vol. 12, No. 2, pp. 143-150 © 1998 Singular Publishing Group, Inc.
A Hardware-Software System for Analysis of Video Images
*Maria Inês Gonçalves and †Rebecca Leonard
*Universidade Camilo Castelo Branco, Centro de Estudos da Voz, and Instituto da Laringe, São Paulo, Brazil; †University of California, Davis, Medical Center, Sacramento, California, U.S.A.
Summary: The purpose of this paper is to describe a software/hardware system for the analysis of digitized video images and a number of applications for which it may be used. The system described includes a Macintosh computer, a frame-grabber board, and Image, a public domain software program available at no cost from the U.S. National Institutes of Health. In our clinic and laboratory, this system is routinely used to make quantitative measurements from videofluoroscopic x-ray images of dynamic swallow studies and studies performed to assess velopharyngeal dysfunction in speech. It can also be used to examine various laryngeal parameters obtained from videotaped endoscopic and stroboscopic examinations. With a videocamera attached to a microscope, the system permits quantitative analysis of tissue characteristics, e.g., thickness of epithelial or connective tissue layers of the vocal folds. The relatively low cost and ease of use of the image analysis system make it a particularly attractive option when quantitative assessment of clinical or research materials in video format is desirable. Key Words: Video image analysis.
Quantitative analyses of videotaped images can be a desirable objective for both researchers and clinicians. In the past, most image measurements were manually performed, and calculations were subject to a large margin of error. New possibilities have been realized with advances in computer hardware and software. Currently available products improve precision, provide shortcuts, save time in analysis, and permit manipulation of images in ways that enhance their quality prior to measurement. At present, powerful digital image-processing techniques are readily available, in some cases at prices that are within the reach of even the most economy-conscious budgets.
SYSTEM DESCRIPTION
One such system, heavily used at our Center, is based on the software program Image, initially developed by Wayne Rasband at the U.S. National Institute of Mental Health. (The Image program can be downloaded from the Internet at the Image Home Page, http://rsb.info.nih.gov/nih-image/. It can also be obtained by contacting the National Institute of Mental Health in Bethesda, MD, U.S.A.) This software is a public domain image processing and analysis program for Macintosh computers (a PC version that runs on Windows 95 is also available). It requires a computer (from Mac II to contemporary models) with at least 8 MB of RAM, and a monitor
Accepted for publication December 4, 1996. Address correspondence and reprint requests to Rebecca Leonard, Ph.D., University of California, Davis, Department of Otolaryngology, 2521 Stockton Blvd., Suite 7200, Sacramento, California, 95817.
with the capacity to display 8-bit or 16-bit images (256 gray levels). A frame-grabber board is also required to digitize video images. Boards made by Data Translation* or Scion** support the NIMH program and are available for around $1000. Alternatively, Image supports QuickTime digitizers, such as those built into AV Macs and selected PowerMacs.
With Image and appropriate frame-grabber hardware, a user can acquire, display, edit, enhance, analyze, print, and animate images directly from a videocamera or from a VCR. Once an image is captured, it can be subjected to a number of enhancements, including contrast and brightness adjustments, smoothing, sharpening, edge detection, and a variety of filtering processes. Digitized images can be rotated, inverted, scaled, and manipulated in several other ways as well. The program can be used to measure areas, path lengths, and angles, to average gray values, and to determine the center and angle of orientation of defined regions of interest. An additional feature of the program is its capability to perform automated particle analysis. Editing of color and grayscale images (such as that seen with MacPaint), including the option of overriding automatic operations to manually outline, select, and/or measure particular regions of interest, makes Image extremely "user-friendly." Any calculations obtained can be printed, exported to text files and spreadsheets, or copied to the "Clipboard" for further manipulation or analysis. The program supports multiple windows, which can be open simultaneously, and eight levels of magnification in which all editing, filtering, and measurement functions can operate.
In order to use Image, a videotaped frame of interest is input from a VCR or camera through the computer's frame-grabber board, digitized, and captured.
Alternatively, sequences of images can be collected, with the number of frames per second and the amount of data limited by capabilities of the frame-grabber board and available memory on the computer. A VCR with stop frame or variable playback forward and reverse speeds is useful for identifying selected frames of interest for digitization. A TV monitor used with
*Data Translation, 100 Locke Drive, Marlboro, MA 01752. **Scion, 152 West Patrick Street, Frederick, MD 21701.
the computer monitor is helpful for simultaneously visualizing video and digitized images. Once an image has been captured and enhanced or otherwise manipulated to meet the user's criteria, it can be subjected to analysis. Image contains several tools that facilitate measurement. Tools are similar to those in many draw and paint programs and include a magnifying glass, scrolling tool, selection rectangle, oval or polygon, a freehand drawing tool, line tool, pencil, eraser, paintbrush, and look-up table (LUT). The LUT permits the user to transform each of the 256 possible gray scale pixel values into color, if desirable. Areas chosen for processing are identified using rectangle, oval, or polygon selection tools. Lines are created using the line tool, and can be straight, freehand, or segmented. Any selection can be moved, stretched, added, subtracted, deleted, transferred, saved, or restored. Selection options are also useful to isolate and enhance a particular region of an image without changing other parts of the image.
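The area, path-length, and angle measurements described above reduce to simple geometry on pixel coordinates. The following Python sketch is illustrative only: it is not the code used by Image, and the function names (`polygon_area`, `path_length`, `angle_degrees`) are hypothetical. It assumes that a polygon selection or segmented line is available as a list of (x, y) pixel coordinates.

```python
import math


def polygon_area(points):
    """Area enclosed by a closed polygon, in square pixels (shoelace formula)."""
    total = 0.0
    n = len(points)
    for i in range(n):
        x0, y0 = points[i]
        x1, y1 = points[(i + 1) % n]  # wrap around to close the outline
        total += x0 * y1 - x1 * y0
    return abs(total) / 2.0


def path_length(points):
    """Length of an open, segmented line, in pixels."""
    return sum(math.dist(points[i], points[i + 1])
               for i in range(len(points) - 1))


def angle_degrees(vertex, p1, p2):
    """Angle at `vertex` between the rays toward p1 and p2, in degrees (0-180)."""
    a1 = math.atan2(p1[1] - vertex[1], p1[0] - vertex[0])
    a2 = math.atan2(p2[1] - vertex[1], p2[0] - vertex[0])
    deg = abs(math.degrees(a1 - a2)) % 360.0
    return 360.0 - deg if deg > 180.0 else deg


# Hypothetical selections, in pixel coordinates.
print(polygon_area([(0, 0), (4, 0), (4, 3), (0, 3)]))  # 12.0
print(path_length([(0, 0), (3, 4), (3, 10)]))          # 11.0
```

Results are in pixel units; converting them to physical units requires a measurement referent of known size in the image.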
APPLICATION EXAMPLES
Swallowing
One application of the Image program that we have found extremely useful is the analysis of videotaped fluoroscopic studies of swallowing. At our institution, dynamic swallow studies are performed in adults and children experiencing dysphagia related to head and neck pathology, neuromuscular disease, and neurogenic and other disorders. During these studies, patients are asked to swallow 1 cc, 3 cc, and self-selected amounts of barium, of both liquid and paste consistencies, during videofluoroscopic filming in lateral and anteroposterior views at 30 frames per second. Timing measures can be obtained without digitization, but other quantitative assessments are made possible with Image. For this purpose, a radiopaque ring of known diameter is placed on the patient's midchin to serve as a referent measurement (Fig. 1), that is, x pixels = y displacement (in mm, cm, or other measurement standard), assuming linearity of images obtained. The line tool is used first to draw a straight line across the diameter of the ring. The number of pixels traversed in this distance is then entered in the calibration window to equal the number of mm of the known diameter of the ring, for example, 32 pixels = 17 mm. All subsequent measurements will be related to this referent until the referent itself is changed.
FIG. 1. Lateral view videofluoroscopic image. Ring on midchin provides measurement referent.
Measures that we routinely collect from the recorded swallow studies include the following:
1. Pharyngeal area at rest and at maximum constriction. To obtain these measurements, the videotaped swallow study is searched for a "rest" position and then for the point of maximum pharyngeal constriction during a swallow. At each point, pharyngeal area is carefully traced and calculated. This can be done by outlining the entire area or by putting dots at selected points, which are then automatically connected. A recent study in our laboratory revealed that, in 60 normal control adults, "pharyngeal area," as shown in Fig. 2A, became essentially zero during swallow (Fig. 2B) (1). This is indicative of the critical interaction between the base of the tongue and the pharynx in propelling the bolus into the esophagus. In contrast to the control subject, note the same two points in a patient who underwent oropharyngeal resection for squamous cell carcinoma in Fig. 3, A and B. Obviously, pharyngeal area does not decrease in a normal manner. Admittedly, the "area" measure does not account well for differential function of the right or left side of the pharynx or tongue during swallow, and may lead to an overestimate of the patient's ability to constrict the pharynx. Studies at our institution indicate, however, that in selected patient populations this measurement correlates well with other measures of swallowing efficiency and safety. It is routinely collected on all patients undergoing dynamic swallow studies at our center.
2. Hyoid at rest and at maximum displacement. Elevation of the hyoid and larynx has been well established as critical to airway protection and opening of the upper esophageal sphincter during swallow. Measurements of the hyoid (and larynx) at rest and at point of maximum elevation during swallow provide useful, objective insights into both processes. An example of the measurement technique for hyoid elevation is illustrated in Fig. 4, A-D. The appropriate frames of the hyoid at rest (Fig. 4A) and then maximally elevated (Fig. 4B) are selected and digitized. In each frame, the anterior hyoid is outlined and reference lines are drawn on stable landmarks, i.e., the floor of the nose to the tubercle of the atlas and a straight line projected inferiorly from the floor of nose-tubercle line. When this has been completed, the hyoid and portions of the two reference lines are selected and copied from the rest frame as in Fig. 4C. This selection is then ready to be pasted onto the frame representing maximum elevation (Fig. 4B), with care taken to ensure that the referent landmarks are aligned. Following this pasting, the superior and anterior displacement of the hyoid from rest to its maximal elevation during swallow can be calculated, as can the most direct distance between these two points (illustrated in Fig. 4D). Another convention requires relating absolute distances to vertebral height.
3. Maximum anteroposterior upper esophageal sphincter (UES) opening. This measurement refers to the maximum opening of the UES during swallow, as measured in the anteroposterior dimension. Examples of the UES at rest and maximally open are presented in Fig. 5, A and B. If the UES does not open properly, bolus material may not enter the esophagus in a timely manner or may present a risk to the airway. Both timing and extent of UES opening are routinely calculated on patients undergoing dynamic swallow studies at our institution.
As with pharyngeal area, the opening measurement provides information only on the anterior-posterior component of UES opening. Lateral movements cannot be appreciated in the lateral view radiographic images. This limitation notwithstanding, studies in progress in our laboratory suggest it is a useful measure in characterizing the nature of swallowing impairment in dysphagic patients.
FIG. 2. A: Pharyngeal area at rest in normal adult is outlined with tools in Image. B: Pharyngeal area at point of maximum constriction during swallow in normal adult.
FIG. 3. A: Pharyngeal area at rest in adult patient with oropharyngeal resection. B: Pharyngeal area in same patient at point of maximum constriction during swallow. Large area reflects difficulty in tongue-pharynx contact caused by resection.
ARTICULATORY MOVEMENTS
An additional application of Image at our center is in determining range of tongue and jaw motions during selected speech tasks. In a recent study, for example, range of tongue motion across the vowels /i/, /a/, and /u/ in normal speakers and in speakers with glossectomy was investigated. Range of tongue motion is defined here as the total area encompassed by the tongue across the three vowels, as measured from lateral view videofluoroscopy studies. To make this measurement, steady-state portions of subjects' productions of /i/, /a/, and /u/ were first identified on the videotape, captured, and digitized. The tongue was then outlined or traced anteriorly from its insertion in the floor of mouth and posteriorly to the vallecula.
FIG. 4. A: Hyoid at rest in normal adult. Referent lines are added and anterior hyoid is outlined using tools in Image. B: Hyoid at point of maximum elevation during swallow in normal adult. Anterior hyoid is outlined and referent lines are added. C: Portions of anterior hyoid and referent lines in A are selected and copied for pasting onto the image in B. D: Selection of hyoid and referent lines in A is superimposed on image in B, with referent lines aligned. The shortest distance between the two points can be calculated; alternatively, anterior and superior displacement of hyoid, or displacement in terms of vertebral height, can be quantified.
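The displacement measures summarized in the Fig. 4 caption can be illustrated with a short Python sketch. It is a simplified illustration under stated assumptions, not the procedure built into Image: the function names are hypothetical, the two hyoid positions are assumed to be pixel coordinates taken after the referent lines have been aligned, the image axes are assumed to coincide with the anatomical reference lines, anterior is assumed to lie toward increasing x, and image y is assumed to increase downward. The calibration uses the ring referent example given in the text (32 pixels = 17 mm).

```python
import math


def mm_per_pixel(ref_pixels, ref_mm):
    """Scale factor from a referent of known size (e.g., the midchin ring)."""
    if ref_pixels <= 0:
        raise ValueError("referent length in pixels must be positive")
    return ref_mm / ref_pixels


def hyoid_displacement(rest_xy, max_xy, scale):
    """Anterior, superior, and direct displacement in mm.

    Assumes anterior = +x and that image y grows downward, so a smaller
    y value at maximum elevation means superior movement.
    """
    anterior = (max_xy[0] - rest_xy[0]) * scale
    superior = (rest_xy[1] - max_xy[1]) * scale
    return {
        "anterior_mm": anterior,
        "superior_mm": superior,
        "direct_mm": math.hypot(anterior, superior),
    }


scale = mm_per_pixel(32, 17)  # ring example from the text
# Hypothetical hyoid positions (pixels) at rest and at maximum elevation.
result = hyoid_displacement((100, 200), (112, 184), scale)
print(result)  # direct distance is about 10.6 mm for these made-up points
```

Expressing the same displacement in terms of vertebral height, the other convention mentioned, would amount to dividing by a vertebral height measured in the same image instead of multiplying by the ring-based scale.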
FIG. 5. A: Arrow indicates location of UES at rest (closed) in normal adult. B: Arrow indicates maximum opening of UES during swallow in normal adult.
An example of this tongue outline for the vowel /a/ is shown in Fig. 6A. As shown, a straight line was again projected along the floor of nose to the tubercle of the atlas, and a straight line was projected inferiorly from the tubercle. This process was then repeated for the subject's production of /i/ (Fig. 6B). With /i/ completed, the outline of the tongue and portions of the two reference lines were selected, copied (Fig. 6C), and then pasted onto the image of the speaker producing the vowel /a/ (Fig. 6D). This step was then repeated for the image of the speaker producing the vowel /u/. With each superimposition, care was taken to align the referent lines. When the composite picture was completed, the measurements of the total and shared areas of movement of the tongue for the three vowels were made, as illustrated in Fig. 6E and F. Both overall range of tongue motion and the proportion of shared area to total area are being compared in control speakers and speakers with glossectomy. Although analyses of these data have not been completed, preliminary findings suggest that speakers may strive to preserve the ratio of shared and independent areas to total area even with extensive oropharyngeal resection.
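The total and shared tongue areas described above can be thought of as the union and intersection of the three aligned tongue regions. The Python sketch below illustrates that idea only; it is not the manual tracing procedure carried out in Image. It assumes each vowel's tongue region is already aligned and represented as a set of (x, y) pixel coordinates, and it takes "independent" area to mean total minus shared area, which is an interpretation rather than a definition given in the text. The names `range_of_motion` and `filled_rect` are hypothetical.

```python
def range_of_motion(regions):
    """Total, shared, and independent areas (in pixels) across vowel regions.

    `regions` maps a vowel label to the set of (x, y) pixels inside the
    aligned tongue outline for that vowel.
    """
    masks = list(regions.values())
    if not masks:
        raise ValueError("at least one region is required")
    total = set().union(*masks)        # pixels covered by any vowel
    shared = set.intersection(*masks)  # pixels covered by every vowel
    return {
        "total_px": len(total),
        "shared_px": len(shared),
        "independent_px": len(total) - len(shared),
        "shared_to_total": len(shared) / len(total) if total else 0.0,
    }


def filled_rect(x0, y0, x1, y1):
    """Toy stand-in for a traced region: all pixels in a rectangle."""
    return {(x, y) for x in range(x0, x1) for y in range(y0, y1)}


# Made-up, overlapping regions standing in for /i/, /a/, and /u/.
areas = range_of_motion({
    "i": filled_rect(0, 0, 10, 10),
    "a": filled_rect(5, 0, 15, 10),
    "u": filled_rect(5, 5, 15, 15),
})
print(areas["total_px"], areas["shared_px"], areas["shared_to_total"])
# 200 25 0.125
```

Because the ratio of shared to total area is dimensionless, it does not depend on the pixel-to-millimeter calibration, only on consistent alignment of the three outlines.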
LARYNGEAL PARAMETERS
Additional uses of Image include relative measurements of a number of laryngeal parameters. It has not been possible to make absolute measurement of laryngeal variables due to the difficulty of locating a known measurement referent for structures of interest. However, relative measures are quite possible, and include extent and degree of closure of the vocal folds, characteristics of anterior and posterior glottal chinks, angles formed by the vocal processes or anterior commissure, length of the true vocal folds associated with frequency changes, and displacement of the vocal fold edges associated with intensity variation. A simple example is presented in Fig. 7, in which the extent of a lesion along the vibratory portion of one vocal fold edge is compared to glottic length. In the example shown, the broad-based lesion occupies about one third of the entire length of the glottis. Repeated measures over time can provide quantitative information about any reduction in the extent of the lesion with various interventions.
Other applications of Image in our setting have ranged from measures of velopharyngeal function during speech to tissue measurements from histology slides input into the computer via a videocamera attached to a microscope, but the system lends itself to any type of video information for which measurement or quantitative analysis is desirable.
With a Macintosh (or PC) computer, digitizing board or built-in digitizer, good quality VCR (preferably with stop frame and variable playback rates), and the Image software program from NIMH, the clinician or researcher has a powerful tool. Virtually any clinical or research material that can be prepared in video format can be subjected to a wide range of measurement and analysis techniques. Although many image analysis options are available, the system described here, involving free software (which is continually upgraded) and relatively inexpensive hardware, has proven to be an extremely valuable resource with a wide range of applications.
REFERENCES
1. Kendall K, McKenzie S, Leonard R, Gonçalves M, Walker A. Dynamic videofluoroscopic swallowing parameters in normal adults. Presented at Dysphagia Research Society Meeting, Aspen, CO, October, 1996.
FIG. 6. A: Lateral view videofluoroscopic frame of normal adult producing vowel /a/. Tongue is outlined from anterior floor of mouth to vallecula using tools in Image. Referent lines are added. B: Process is repeated for /i/. C: Tongue shape and portion of referent lines in B are selected and copied for pasting onto image in A. D: Selection in B is pasted onto frame of speaker producing vowel /a/, with care taken to align referent lines. E: Composite of images for the three vowels is completed, with referent lines aligned. A line connects vallecula to the anterior floor of mouth for each tongue shape. These points are then connected to form the inferior border of the composite. F: Area common to all three tongue positions (shared area) is outlined with segmented line. Measurements permit calculation of total, shared, and independent tongue areas for the three vowel productions.
FIG. 7. A lesion of the right true vocal fold is shown. Its extent along the vibratory edge of the fold is calculated as a percentage of the total length of the membranous portion of the fold.
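The relative measure in Fig. 7 is a ratio of two lengths taken from the same image, so no calibration referent is needed. A minimal Python sketch follows, assuming the lesion extent and the membranous fold length are each marked by two endpoints in pixel coordinates; the function name `lesion_percent` and the coordinates are hypothetical.

```python
import math


def lesion_percent(lesion_start, lesion_end, fold_start, fold_end):
    """Lesion extent as a percentage of fold length (both measured in pixels)."""
    fold = math.dist(fold_start, fold_end)
    if fold == 0:
        raise ValueError("fold length must be greater than zero")
    return 100.0 * math.dist(lesion_start, lesion_end) / fold


# Made-up endpoints: a lesion spanning 30 of 90 pixels along the fold edge.
pct = lesion_percent((0, 30), (0, 60), (0, 0), (0, 90))
print(round(pct, 1))  # 33.3
```

Repeating the same calculation on frames from later examinations gives a simple way to track change in relative lesion extent, provided the endpoints are marked consistently.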