Comput. & Graphics Vol. 8, No. 2, pp. 163-166, 1984. Printed in the U.S.A.
0097-8493/84 $3.00 + .00 © 1984 Pergamon Press Ltd.
Technical Section
THE USE OF COMPUTER-DRAWN FACES AS AN EDUCATIONAL AID FOR THE PRESENTATION OF STATISTICAL CONCEPTS
CLIFFORD A. PICKOVER
Remote Information Access Systems Group, Computer Sciences Department, IBM Thomas J. Watson Research Center, P.O. Box 218, Yorktown Heights, NY 10598, U.S.A.
(Received 6 February 1984)
Abstract: A description of a vector-graphics facility for coordinated computation and display of faces is presented. It is suggested that such faces are suitable as visual supplements in the presentation of statistical concepts, particularly distribution theory, to individuals inexperienced in mathematics and with no prior knowledge of the methods of statistical evaluation.
INTRODUCTION
Computer graphics has become increasingly useful in the representation and interpretation of multidimensional data with complex relationships. Pseudo-color, animation, 3-dimensional figures, and a variety of shading schemes are among the techniques used to reveal relations not easily visible from simple correlations based on two-dimensional linear theories. Various graphical methods of representing multivariate data using icons, or symbols, have been discussed previously [1-3]. In general, the n data parameters of each point are mapped onto a figure with n features, each feature varying in size or shape according to the point's coordinate in that dimension. One particularly novel method of representing multivariate data has been presented by Chernoff [1]. The data sample variables are mapped to facial characteristics; thus, each multivariate observation is visualized as a computer-drawn face. Such faces have been shown to be more reliable and more memorable than other tested icons [2] and allow the human analyst to grasp many of the essential regularities and irregularities in the data. This aspect of the graphical point displays capitalizes on the feature integration abilities of the human visual system, particularly at higher levels of cognitive processing [2]. A particularly useful application of such computer-drawn faces would be in the field of education.
In this paper, faces are used to illustrate the concept of white noise (totally random distribution) vs. Gaussian noise (normal error distribution), theories usually not introduced to individuals prior to the high-school level due to the mathematical complexity of the subject matter.
DESCRIPTION OF SYSTEM
The user console consists of a vector graphics display (Tektronix 618) and a standard CRT terminal (IBM 3277 GA). The support software is implemented in PL/I [4]. The system allows for the rapid generation of computer-drawn faces, and can be used in a variety of applications [3].
In the current applications, ten facial parameters, $F_{1,2,3,4,5,6,7,8,9,10}$, are used, and each facial characteristic has ten settings, $S_{1,2,3,4,5,6,7,8,9,10}$. The controlled features are: head eccentricity, eye eccentricity, pupil size, eyebrow slant, nose size, mouth shape, eye spacing, eye size, mouth length, and degree of mouth opening. The mouth is constructed using parabolic interpolation routines, and the other features are derived from circles, lines, and ellipses. A middle-setting face, $F_{5,5,5,5,5,5,5,5,5,5}$, is shown below.
[Figure: middle-setting face.]
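A minimal sketch of how such a ten-parameter face might be held in memory, written in Python rather than the PL/I of the original system. The identifier names, the `make_face` and `parabola_through` helpers, and the mouth control points are illustrative assumptions. The text states only that the mouth uses parabolic interpolation; a three-point Lagrange parabola is one common way to do that and is not necessarily the routine used here.

```python
# Feature order follows the list in the text; the identifiers are hypothetical.
FEATURES = (
    "head_eccentricity", "eye_eccentricity", "pupil_size", "eyebrow_slant",
    "nose_size", "mouth_shape", "eye_spacing", "eye_size", "mouth_length",
    "mouth_opening",
)
MIN_SETTING, MAX_SETTING = 1, 10


def make_face(settings):
    """Return a dict mapping each feature to an integer setting in 1..10."""
    settings = list(settings)
    if len(settings) != len(FEATURES):
        raise ValueError("expected %d settings" % len(FEATURES))
    for s in settings:
        if not (isinstance(s, int) and MIN_SETTING <= s <= MAX_SETTING):
            raise ValueError("settings must be integers from 1 to 10")
    return dict(zip(FEATURES, settings))


def parabola_through(p0, p1, p2, n=9):
    """Sample n points on the parabola through three (x, y) points (Lagrange form)."""
    (x0, y0), (x1, y1), (x2, y2) = p0, p1, p2
    points = []
    for k in range(n):
        x = x0 + (x2 - x0) * k / (n - 1)
        y = (y0 * (x - x1) * (x - x2) / ((x0 - x1) * (x0 - x2))
             + y1 * (x - x0) * (x - x2) / ((x1 - x0) * (x1 - x2))
             + y2 * (x - x0) * (x - x1) / ((x2 - x0) * (x2 - x1)))
        points.append((x, y))
    return points


# The middle-setting face: every feature at setting 5.
middle_face = make_face([5] * 10)
# A hypothetical mouth: corners at y = 0, centre pulled down to y = -0.2.
mouth = parabola_through((-1.0, 0.0), (0.0, -0.2), (1.0, 0.0))
```

Joining consecutive points in `mouth` with line segments would give a curve suitable for a vector display. How the mouth-shape, mouth-length, and mouth-opening settings map to control points is not specified in the text and is left open here.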
APPLICATIONS
For the case of white noise (Fig. 1), one hundred faces were generated, each facial characteristic having a setting determined from a random number generator. In the case of the Gaussian noise (Figs. 2 and 3), the facial settings $S_i$ are calculated from the following equation:
$$S_i = \sum_{j=1}^{N} \frac{\delta_j}{N}$$
where $\delta_j$ are random numbers, and N is 5 for weakly Gaussian noise or 20 for strongly Gaussian noise.
CONCLUSION
Graphics are generally limited to a finite number of dimensions, requiring that many multivariate data problems be reduced to fewer dimensions before analysis. It has been shown previously that icons are often useful in allowing a user to detect and comprehend important phenomena and perhaps for communicating major conclusions to others [1-3]. The most important probability distribution for use in statistical analysis is the Gaussian, or normal error, distribution defined by the equation:
$$p_G(x, \mu, \sigma) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left[-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^{2}\right]$$
Fig. 1. Computer-drawn faces calculated from white noise (random distribution).
It is a continuous bell-shaped function describing the probability that, from a parent distribution with a mean $\mu$ and a standard deviation $\sigma$, the value of a random observation would be x. Physically it is useful because it describes the distribution of random observations for most experiments. In addition, the most probable estimate of the mean from a random sample of observations is the average of those observations. Figures 2 and 3 are representations of such Gaussian noise. The facial settings are calculated simply by summing several random numbers (see Applications). The Gaussian faces clearly favor the average "middle-of-the-road" settings; i.e. wild excursions from the middle-setting face are unlikely. The larger N is, the less likely are large deviations in the facial parameters. This concept is similar to a coin-tossing experiment: if the coin is tossed repeatedly, the fraction of tosses that land heads up will asymptotically approach 1/2 as the number of tosses increases. The faces make clear the differences between Gaussian and white noise in a visual manner, which can be easily understood even by those individuals with no prior knowledge of statistics and mathematics.
Computer-drawn faces may also be useful in the presentation of other statistical concepts. Among such uses is the visual characterization of distribution kurtosis (the degree of peakedness of a distribution) and skewness (the degree of asymmetry of a distribution). Leptokurtic, platykurtic, and mesokurtic distributions [5] would be made clear in a simple manner, without resort to such mathematical concepts as moment coefficients. Other ideal applications would include presentation of J-shaped, reverse-J-shaped, U-shaped, and bimodal distributions. For very young students, the faces could be used for visualizing simpler concepts such as the mean, median, mode and other measures of central tendency.
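The contrast between the white-noise and Gaussian settings described under Applications can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the original PL/I implementation. The text does not give the range of the random numbers being averaged, so the sketch assumes uniform draws on [0.5, 10.5] and rounds the average to the nearest of the ten settings. The function names and the seed are hypothetical.

```python
import random
import statistics


def white_noise_setting(rng):
    """One setting drawn uniformly from the ten levels 1..10."""
    return rng.randint(1, 10)


def gaussian_setting(rng, n):
    """Average of n uniform draws, rounded and clamped to the levels 1..10.

    The uniform range [0.5, 10.5] is an assumption, not taken from the text.
    """
    mean = sum(rng.uniform(0.5, 10.5) for _ in range(n)) / n
    return min(10, max(1, round(mean)))


def sample_faces(rng, count, n=None):
    """Return `count` faces, each a list of ten settings (n=None: white noise)."""
    if n is None:
        draw = white_noise_setting
    else:
        draw = lambda r: gaussian_setting(r, n)
    return [[draw(rng) for _ in range(10)] for _ in range(count)]


rng = random.Random(1984)  # fixed seed so the run is repeatable
for label, n in (("white", None), ("weak (N=5)", 5), ("strong (N=20)", 20)):
    settings = [s for face in sample_faces(rng, 100, n) for s in face]
    print(label, "spread of settings:", round(statistics.pstdev(settings), 2))
```

The printed spread should shrink from white noise to N = 5 to N = 20, since the standard deviation of an average of N independent draws falls in proportion to $1/\sqrt{N}$. This is the numerical counterpart of the faces crowding toward the middle setting.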
Fig. 2. Computer-drawn faces calculated for "weakly" Gaussian noise.
Fig. 3. Computer-drawn faces calculated for "strongly" Gaussian noise.
It is hoped that the computer-drawn representations presented here will provide a useful tool for presenting a variety of statistical concepts in a way that requires little mathematical sophistication as a prerequisite.
REFERENCES
1. H. Chernoff, The use of faces to represent points in k-dimensional space graphically. J. Am. Statist. Assoc. 68, 361-367 (1973).
2. L. Wilkinson, An experimental evaluation of multivariate graphical point representations. Proc. Human Factors in Computer Systems, pp. 202-209 (March 1982).
3. C. Pickover, The use of a vector graphics system in the generation of facial multivariate point representation for data series. IBM Research (RC 10156) (1983).
4. J. Hughes, PL/I Programming. Wiley, New York (1975).
5. M. Spiegel, Schaum's Outline of Theory and Problems of Statistics. McGraw-Hill, New York (1961).