A computer program for fitting multimodal probability density functions

A computer program for fitting multimodal probability density functions

Computer Programs in Biomedicine 7 (1977) 1-20 © Elsevier/North-Holland Biomedical Press A COMPUTER PROGRAM FOR FITTING MULTIMODAL PROBABILITY DENSIT...

1MB Sizes 0 Downloads 64 Views

Computer Programs in Biomedicine 7 (1977) 1-20 © Elsevier/North-Holland Biomedical Press

A COMPUTER PROGRAM FOR FITTING MULTIMODAL PROBABILITY DENSITY FUNCTIONS 1 J.R. BRASSARD and M.J. CORREIA Departments of Otolaryngology, Physiology and Biophysics, University of Texas Medical Branch, Galveston, Texas 77550, USA

A FORTRAN IV program is described, which may be run interactively with tutorial assistance or in batch and which allows a user to selectively fit any of seven probability density functions (p.d.f.'s) or a combination of the p.d.f.'s to a unimodal or multimodal histogram of empirical data. A "best-fit", uni- or multimodal p.d.f., which may be obtained by a method of nonlinear least squares or a generated p.d.f, may be displayed on a Tektronix 4010 terminal as a continuous curve against the background of a bar, square wave, symbol or point-plot histogram. The following, supportive statistical information is also displayed: (1) Kolmogorov-Smirnov probability of goodness of fit, (2) mean square error, (3) correlation coefficient, and (4) parameter estimates. The resident driver program and six overlayable segments have been implemented on a Digital Equipment Corporation LAB-11 minicomputer (PDP-11/20). Biostatistics Components Resolution

Minicomputer Program Multimodal Mixture

Parameter Estimation Polymodal Sum

1. Introduction The biological literature is replete with examples of empirical frequency distributions which are multimodal. Selected references from the neurobiological literature illustrate this point. Multimodal interspike interval (ISI) histograms have been observed following analysis of driven or spontaneously discharging neurons in various parts of the central nervous system (CNS) of several species. Poggio and Viernstein [16] found multimodal ISI histograms for action potential trains of spontaneously active lemniscal neurons. Herz et al. [12] found multimodal ISI histograms during single unit recording from cells in the visual system. Bishop et al. [2] found that certain geniculate body neurons produced discharges which had ISI histograms with three to nine peaks. Sanderson and Kobler [20] observed multimodal sequential interval histograms for experimental data from single unit recordings in the lateral geniculate nucleus of the cat. Attempts have been made to mathematically model the underlying stochastic process or superposi1 This research was supported in part by NASA Contract, NAS9-14641.

Probability Density Functions Probability Distributions

tion of processes [ 2 1 - 2 4 ] which account for the multimodal ISI histograms that have been observed from spontaneously discharging neurons within the CNS. These attempts have been generally impeded by the lack of an automatic, precise, and convenient method of fitting a mixture of probability density functions (p.d.f.'s) to a multimodal, empirical, frequency distribution. In other areas of biology, efforts have been made to graphically determine the underlying p.d.f.'s which comprise a multimodal p.d.f. Attempts to analyse multimodal, probability distributions were begun late in the last century by Pearson [14] and continued in recent times by Harding [9] and Cassie [4] using a graphical approach, which consisted of plotting by hand on standardized normal variate, probability graph paper empirical, cumulative distribution functions. Harding showed that if a distribution were unimodal, the resultant plot was linear; otherwise, it was sigmoid-shaped. Using this technique, Harding was able to determine parameter estimates for each component p.d.f, as well as its proportional contribution to the multimodal p.d.f. Using a method developed by Rao [18] for multimodal, normal distributions, Hasselblad [ 10,11 ]

2

J.R. Brassard, M.J. Correia, Fitting multimodal density functions

wrote a program (NORMSEP), which could fit separate modes of a multimodal p.d.f, and obtain their respective parameter estimates 2 Recently, Yong and Skillman [25] have developed a program (ENORMSEP), which uses the probit transformation [8] of input, normal distribution data in order to obtain the sigmoid-shaped curve mentioned above. Polynomials are then fit regressively to this "probit" curve in order to obtain its inflection points, which are then input as initial, mode separation estimates to the NORMSEP program. A different technique was applied by Pernier and Echallier [15] in attempting to estimate the parameters of a bimodal p.d.f. They arbitrarily selected separation points for the empirical p.d.f, modes and then applied a least squares, iterative method to each mode separately in order to obtain parameter estimates. The modes were then "combined" to form a bimodal p.d.f. While the methods of Hasselblad [ 10,11 ] and Yong and Skillman [25] are more accurate than those using probability paper, they have only been applied to normal and exponential distributions. The "iteration by parts" approach taken by Pernier and Echallier [15] relies too heavily on the user's judicious selection of the modal separation points. If they are not selected properly, the correct proportionality constants cannot be obtained. In order to overcome the limitations and/or difficulties inherent in the methods described above, a computer program has been written which can be run interactively (on-line) or in batch mode (off-line). This program, which will be described in the following sections, has the capability of generating and/ or fitting multimodal p.d.f.'s containing a mixture of a maximum of four modes. Because of the partitioning process described in sections 2 and 3 below, the iterative portion of the program may be executed on a minicomputer with only 16K (16 bit) words of computer memory. Additional capabilities of the program, as well as its operational characteristics, are discussed in section 3.

described in a previous paper [3], has been extended to include multimodal p.d.f.'s (also called a mixture of distributions [10,11,19]). In the following discussion we assume without loss of generality that the theoretical, multimodal p.d.f, of interest is comprised of modes (distributions) which are each a function of three parameters. In order to obtain the "best" estimates (in the sense of least squares) of the parameters of a multimodal p.d.f, during some time period T, a set of interevent interval samples { x i l x i = t i - t i _ l ; 0 <~ ti_ 1 < T ; i = 1,2 ..... n)

of the random, independent variables X1, X 2 ..... X k are obtained and entered into N frequency classes of width Ar as counts n / ( j = 1,2 ..... N) from which an interevent interval histogram is formed as a function of the N mid-class values (rj) of the frequency classes. The empirical p.d.f, is then estimated for each class by

nj p(rj) = n A t ;

(1)

and initial parameter estimates a (/)lk, ~)2k) are obtained for the kth mode, considered over a specified range of classes as a separate p.d.f., from the formulas contained in table 1 of [3]. These formulas employ the following expressions for the kth mathematical expectation and variance, respectively Nk Ek(X) ~ ~ r/n] /xr ]=Nk_l+l n

Vk(X)~ ~

]':

_,+1

(2)

n

l

(3)

where (k = 1,2 ..... K) and where 0 = N 0, N 1..... N K = N are modal separation points. For modes which are separable, the modal proportionality parameter 4 is approximated by

2. Theoretical model

A method for fitting unimodal p.d.f.'s which we 2 These estimates consist of the mean and variance as well as the mode proportionality constants.

3 The estimates (and the estimate for nk below) are obtained by a method similar to that suggested by Hasselblad [ 101 for truncated samples. 4 This parameter is the fractional contribution made by the respective mode (i.e., population) to the total p.d.f.

J.R. Brassard, M.J. Correia, Fitting multimodal density functions Nk irk = ~ J=Nk- 1 +

n/ -1

(4)

n

It is clear from (4) that, for each k, 0 < 7rk < 1 and

K

K

Nk

G

G

G

k=l

k=l j=Nk_l+l n

N

l G ,,j= l

rl /=1

(s)

and also that for a unimodal p.d.f. (K = 1), 7rk = 1. If some of the p.d.f, modes are not visually separable (e.g.,the special cases of completely overlapping modes discussed by Bhattacharya [ 1]), then the parameters (0 lk, 02k) of those p.d.f, modes which overlap (i.e., are non-separable) are estimated as in [3] over the p.d.f, classes corresponding to the overlap. Using (4), the proportionality parameter for each of the overlapping modes is given an average weight determined by

frOm = frm/Kom ,

(6)

where o m is the ruth group of overlapping modes defined on the p.d.f, classes bounded by the modal separation points, Nm_ 1 and N m, and where Kom is the number of p.d.f)s in this ruth group. For example, if the ruth group contained three overlapping p.d.f.'s, each of them would have an initial, proportionality parameter with value frm/3. Using the relative parameter estimates defined above, a theoretical, multimodal p.d.f, can be computed for each frequency class by K

p(rf) = k~=l frkPk(O lk, 02k, "r]),

(7)

where for each k, Pk(Olk, 02k, r/) can be any one of the p.d.f.'s listed previously (see table 1, [3]), which is definitive of the kth mode. The values obtained from (1) and (7) are used in determining the residual differences s between the theoretical and empirical p.d.f.'s. This is the first step of the nonlinear, least squares, iteration process which we have previously described [3]. The reader is referred to that publication for a description of the iteration model and certain "goodness of fit" tests such as s The residual differences (residuals) are the elements of the Y-matrix (see equation (11) in [ 3 ] ).

3

the X2 statistic, Kolmogorov-Smirnov statistic and Williams and Kloot test. The present publication will describe only modifications in the iteration model and additional "goodness of fit" tests which have been incorporated into the present program. To accomodate the computing requirements of additional modes, the Hessian matrix 6 described previously [3] is created by a partitioning process (see section 3.5) in the program described herein. This process may be summarized as follows• Consider the least squares solution for the 3K parameter correction estimates, which may be expressed by

AO = A - l g

(8)

where A = qbTdp is a 3K × 3K Hessian matrix, g = ~ T y is an N × 1 gradient correction matrix, and ~ is the N × 3K matrix of first partial derivatives of the multimodal p.d.f, defined by (7). The ~-matrix may be partitioned by mode as follows qb = [p1 ! p2 ~... Pk]

(9)

where, for the estimated parameter vector 0, and for the kth mode, - 0 p ( r 1, 0) ~p(7"1,0) a0 lk

Op(r 1, O)

002k

0frk

ek=

(10)

ap('rN,O) ap(7"N,O) ~0 lk

aP(rN,O)

O02k

~#k

The Hessian matrix, A, (for k = 3 see fig. 4 below) and the g-matrix in (8) may then be expressed, respectively, as

-ex el I ere21 .. I ----i---

I--

_L___

PTP1 'P2t'2, , T I ... A = ---

,I t,~t' K ~- - *I - -• a I . . . . . f I

I

I - - - - 1 " - - -

I --i

(11)

t " -

--

I "-I . . . .

_PKTPll, e kr P21, • . ' e, ; : e x

6 This matrix is referred to as the PTP-matrix by Marquardt [ 13 ] and as the ~ T~-matrix in the previous paper [ 31.

4

J.R. Brassard, M.J. Correia Fitting multimodal density functions

FPrC P rl g:

3. Program description

.

(12)

YJ Thus, the matrices of (11) and (12) are created from the submatrices { P k } during each step of the iteration process until convergence is achieved• Following the iteration process, several statistical variables; which have been added to the original program, are computed in order to provide the user with additional information concerning the least squares "goodness of fit". The first of these variables computed is the mean square error (MSE) which is determined by N [~(r/) - p ( r / , /))12 MSE = i=1

N-

(13)

3K

where N - 3K is the number of degrees of freedom associated with the fitting process. Also, the discrete, theoretical, cumulative distribution and the residual sum are computed, respectively, using N C = ~ p(rj) /=1

(14)

N Sr = ~

1=-1

[~(r/) - - p ( T / , 0)] .

(15)

Finally, using (13), the coefficient of correlation (determination) [26] is obtained from R = + x/1 - (MSE/V~)

(16)

where Vp is an unbiased estimate of the variance of ~(r) over N probability density function classes and is defined by N V~-'

/=1 N-1

N

N(N-

1)

(17)

This program represents a significant modification and supplementation of a program we have described previously [3]. Since it can now fit multimodal p.d.f.'s, its power is increased many times. However, to avoid redundancy we refer the reader to our previous publication [3] for a description of many of the program segments. In this publication we will only describe additions or modifications to the predecessor program. These changes will also be shown in the flowcharts (see below). The multimodal p.d.f., curve-fitting program consists of a resident driver progrmn (PDF) and six overlayable program segments (PDFIN, PDFITR, MLTPDF, MLTPTP, PDFOUT and PDFWK), which are shown in the block diagram of fig. 1. The three overlays (MLTPDF, PDFITR, MLTPTP) are used to generate unimodal or multimodal p.d.f.'s or to iteratively obtain their parameter estimates. Even though the iterative process used in these three overlays could be collapsed into a single larger overlay in computers with more than 16K words of computer memory, nevertheless, a user could eliminate or reduce the graphical portions of the input and output overlays and still have a workable program which would run in 16K words of memory (see decimal core requirements in section 5). The input and output overlays (PDFIN and PDFOUT) provide the user with the following capabilities: (1) B a t c h m o d e - allows the user to obtain parameter estimates overnight with graphic and/or print output, which may be placed on mass storage. (2) 4 0 1 0 m o d e - enables the user to run the multimodal p.d.f, program interactively on a Tektronix 4010, storage, cathode ray tube (CRT) terminal even if it is not designated as the system console in DOS/BATCH [7]. (3) C o n t r o l l e d graphics - provide the user with high quality, graphical output which may be scaled and/or hardcopied. (4) P a r a m e t e r storage a n d retrieval - saves and restores parameters of user-selected models. (5) A u t o m a t i c p a r a m e t e r e s t i m a t e s for both unimodal and multimodal p.d.f.'s. Shown in table 1 is a list of subroutines (labeled with a single asterisk in fig. 1). Since these subroutines

J.R. Brassard, M.J. Correia, Fitting multimodal density fimctions

INPUT OVERLAY (PDFIN) °°

TUTORIAL OUTPUT (TOOTI*

HISTOGRAM COMPUTATION {HGROUP) °

PARAMETER ESTIMATION (PDFEST}"°

t

OUTPUT OVERLAY (POFOUT)*"

I/0 FILE QUERY (ENTRY)"

LEAST SQUARES ERROR ANO STATISTICAL COMPUTATIONS (PDFLSQ}"

HISTOGRAM PLOTTING (PDFHST)

PRINT AND PLOT OUTPUT (PDFPLT) ° °

MULTIMODAL PDF COMPUTATIONS OVERLAY (MLTPDF)

WILLIAMS & KLOOT TEST OVERLAY (PDFWK)*o

1

5

PTPMATRIX FORMATION OVERLAY {MLTPTP)

POF ITERATION OVERLAY (PDFITR)*"

I

1

MULTIMODAL PDF GENERATION AND RESIDUALS COMPUTATION (PDFGNR)

PDF FUNCTIONS AND PARTIALS COMPUTATION (POFCMP)**

MULTIMODAL PDF MATRIX EQUATION SOLUTION (POFSOL)"

LOG~ GAMMA ANO ~1 FUNCTIONS COMPUTATION (LGMPSI)O "

PTP MATRIX INVERSION (PDFINV)"

PLOT LIBRARY (TEXLIB)

MATRIX MULTIPLICATION; (POFMLT)"

SYMMETRIC MATRIX FORMATION (PTPSYM)

.r w[

MATRIX COPY (PTPCPY)

Fig. l. Flow diagram illustrating system structure and overlay paths. Table 1 Subroutines used in the unimodal p.d.f, program [3] and unmodified in the multimodal p.d.f, program Subroutine

Purpose

TOOT

Input a tutorial file from the disk and print it

ENTRY

Determine the Input/Output device and data file name and then open the file Group interevent interval data into histogram classes of a user-specified size Compute the log e Gamma function and the derivative of the log e Gamma function Solve a linear matrix equation Multiply two matrices to form a resultant matrix Invert an N × N square matrix using a GaussJordan elimination process Compute least squares error, Chi-square statistic, Kolmogorov-Smirnov statistic, and standard deviations of parameter estimates

out

ltG ROUP LGMPSI PDFSOL PDFMLT PDFINV PDFLSQ

were discussed in some detail in the original paper [3], they are listed here only to familiarize the reader with

their separate purposes. The program blocks labeled with a double asterisk in fig. 1 have undergone minor or major modification. Thus, in the sequel only the changes to these modules will be discussed unless otherwise necessitated by the discourse. The dashed lined, program s y m b o l s and paths, shown in the flowcharts that follow, indicate functions and decisions performed in the original unimodal PDF program and carried over unchanged into the new m u l t i m o d a l PDF program. Those program symbols outlined with a solid line indicate current additions to the original program.

3.1. PDF The initial function of this driver program, PDF, is to read t w o o f the processor console, switch register switches. If switch 6 is set to the "one's position", the program will run in batch m o d e , suppressing unnecessary

CALL TOOT

E~,~-

t

,_ _ _ ~ _ _ /

....

I

YES

'

YES -.

~

iI NO

•x TUTORIAL ~

NO

*~

I I

ENTER I OUTPUT i I ~ DATA i TYPE I

."

I

• ~ PROGRAM . "

-" ENCO, ".

"; .o

~ J

/ ~CONTINUE~ WILLIAMS& ~ \ ~E$_ ~.K LOOT MODE / "

i

2-

ENTER / NUMBER

V

I)1

:

I

i

I

)

I

YES

/ / AUTO • \ CLASS LIMITS ~ • ? /~

~

\

\

II

@

~(

~

~

YES

. . . . . .

/

I

~ /--

I

/

\

~"

NO

PDF ~ODE T~ EIS) /

El "ER

CALL ENTRY

i

i

ENTER INPUT DATA

...

r . . . .

,

INPUT i ~ HISTOGRAM t i CLASS I I MID-VALUES/

-\'S"~O0~AN:ESS/

T

COMPUTE PARAMETER POINTERSFOR EACH MODE

' HISTOGRAM \.

I;~, ~

N-~;.~.,-'

' INPUT FI LE /' ~'~ HEADER RECORD / '

ENTER FILE RECORD

\OF INTEREVENT/ ~ INTERVALS - ~ " ANOCLOCK/ ~ RATE t

< ,-.;&

/ / /AUTO ~ ~/ CLASS ~ ~ " ~ \ \ LIMITS ~ / "

Z ' Ls ~

INPUT / PARAMETERS / FOR EACH t MODE ,

FROM KEYBOARD ,>

. "PARAMETERS".

i

YES

~ GENERATE ~ \ . YES MODE ~" . . . . .

-. ,N:UT/

~

\

i

I

/'"\\\

NO

ODMPUTE CLASS LIMITS

,~.



NO'S ~ i

PATH = 1 " ? / ~

t

NO I

~

-~

II ;I

~HISTOGRAM", y l - $ , ,VIEW MODE. ~

YES ( '\

YESI"

/"\

NO

" 'I

LIMITS i"

t ~ _ ~ ~ PAT.- 7

I)l

~

<

i

.~p// CALL " \ HGROUP / }

I. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

:NO

N O . "~GENERATE" ~. "~ MODE / -4-q

INPUT I INTEREVENT . 4 . . . . . . . ~ INTERVALS/ ~ I

" ~ ~.'/HISTOGRAM'NO ~" ~ INPUT / "~

i

~

r . . . . . . . .

I COMPUTE ~1 TOTA L NUMBER i OF INTEREVENT INTERVALS

-- --

' . . . . . . .



I I

I

~ ~ YeS . ~ ' - ~ --I

NO ~ PASSOF ~ ".GENERATE ITERATE - ~ ~ MODE •

INPUT II CLASS / FREQUEN- I ~1" . . . . . . . CIES I

I I

/HISTOGRA M~ ~ NO ~ INPUT /~ \ ? / /

~/

"

i_

MODE

NUMBER iI --&-I I NO

MAXIMUM CLASS FOR

LOWEST I \ FREQUENCY t

,, ,.PuT,

~ "AUTO • NO ~ ' " CLASS ~" LIMITS

; .1F

CALL ITERATION OVERLAY

I I

I COMPUTE I HISTOGRAM r CLASS I MI(~VALUES t- _ _ -t . . . .

r-

/!

~

II I t i

Fig. 2. Flowchart of driver program (PDF) and input overlay (PDFIN). Solid lines indicate modifications to original program [3] (indicated by dashed lines).

MOOES

\op p/

,

\:'°

• \ YES / / ~ PRINT • \ \ OUTPUT ~• ? / / ~

"

~/WILLIAMS~ ~/ES ~ KLOOT ~• \ ~ MODE ~ "

No

• •HISTOGRAM\ \ y ~ ~ - VIEW MODE • " ~ ) //

~

""

ITERATE

-~.E/

~ "~ENERATE~

~ \

. . . . . . . . . . . . .

ENTER I PRINTING / ~ CYCLE

I i !

~

~. /~

ENTER i/ CONVERt GENCY / EPSILON I I

i CALL i ITERATION : OVERLAY . . . . . . . .

\

I I

- ", . . . . .

CALL- - - • " INPUT I OVERLAY i

. . . . . . .

ii ~

/ I

..~.ES

IN 'UT SE dEE

\/ INPUT I FREQUENCYL. CLASS I TM

J.R. Brassard, M.J. Correia, Fitting multimodal density functions

interactive printouts; the opposite switch position allows the program to run in the interactive mode. If switch 3 is set to the "one's position", the program automatically assigns both input and output to the Tektronix 4010 terminal. If switch 3 is set to the "zero's position", the user must assign the input and output devices. Switch 3 and switch 6 must not be set to the "one's position" simultaneously. The remaining functions of the driver are to force load FORTRAN COMMON blocks and library routines and to call the input overlay, PDFIN, for program execution. 3.2. PDFIN

As shown in fig. 2, after the "PRINT OUTPUT?" decision is completed and the program is not running in modes 6 or 8 (see table 2 [3]), PDFIN (the main program of the input overlay) presents the following query:

7

required for each mode type as well as initial and final pointers associated with each mode are computed. The progrmn then resumes its original path as shown in fig. 2. The next change made to PDFIN concerns the input of data from the interevent interval file. After the interevent interval file header record is read, the following request is printed: ENTER RECORD NO.: (I2) The user then enters a two digit positive integer which indicates the interevent interval record to be used in the data file currently open for processing. This feature allows the user to gather many groups of interevent interval data and store them under a single t'de name: Continuing to the next program change, after the lowest frequency class number 7 has been entered through the keyboard by the user, the following query is presented:

ENTER # OF PDF MODES: (I1) ENTER MAX CLASS FOR EACH MODE: (I3) A response of a positive integer from 1 to 4 indicates the number of modes (and, hence, the total number of parameters) that must be processed by the program. This response causes the following request to be printed:

ENTER MODEL DEV: FILNAM.EXT:

The response to this query is a three digit, positive integer (002 to 255) which is entered on a separate line for each mode. For example, if the user has chosen three modes and the mode "boundaries" (i.e., separation points) are at classes 19, 85 and 156, these are the numbers that would be entered. If the user wishes the program to find the multimodal boundaries or if there are no visible boundaries, the user may enter the number 156 8 three times on separate lines. Once the user has delimited the boundaries (and, thus, the class window 9) over which different modes are to be fit, the program calculates the number of events in the bounded, multimodal p.d.f. If the Williams and Kloot mode [3] has not been selected, a bar, frequency histogram is then formed by subroutine PDFHST and displayed on the Tektronix CRT terminal. Small arrows are displayed

The user must supply a two or three character device name, followed by a colon and the name of the file containing the information (e.g., DK: Z400.PRM or MTI: MODEL.001). After the modes have been input through the keyboard or from mass storage, the number of parameters

7 Lowest frequency class number and maximum class number (mode boundary) for each mode must be entered only when class limits are user-selected (see section 3.2 [3]). 8 The number 156 applies to this example only and would differ according to the class boundary of interest. 9 The class window was defined above by lowest frequency class number and the maximum number of the mode classes.

ENTER PDF MODE(S) BY TYPE: (I1) The user must then enter a positive integer from 1 to 7 (see fig. 5 or table 1 [3]) on as many separate lines as the number of modes previously chosen. If the number of p.d.f, modes is 0 in the second preceding query above, previously saved, model modes and parameters may be input from mass storage (disk, magtape, etc.). In this instance, a response to the following is necessary:

8

J.R. Brassard, M.J. Correia, Fitting rnultimodal density functions

at the top of the graph of the histogram to indicate mode delimiters, which have been specified by the user. Another modification to PDFIN concerns computation of parameters estimates for each of the modes of a multimodal p.d.f. This change 10 is indicated in fig. 2 (following label "G"). Using the p.d.f, mode class boundaries entered above and the parameter pointers for each p.d.f, mode, the program computes the sum of events for each mode. This sum is used by PDFEST in the computation of the modal parameter estimates (as indicated in the theoretical model). Next, if the p.d.f, being processed is multimodal, the proportionality parameter (see section 2) is computed for each mode. Following this, parameter estimates are output on the user device for each mode. 3.2.1. P D F H S T

This subroutine is used by PDFIN to plot a bar histogram, which represents the frequency distribution of the input data, with associated, labelled axes (see fig. 8). This routine has a single path command (see section 3.4.5 and table 3 [3]), a "CTL/P" followed by a "4", which can be activated after the histogram is plotted. If the user enters this combination through the keyboard, the following query results: ENTER PDF MAX SCALE (F6.0): By entering a five digit floating point number, the user may scale the ordinate axis of the frequency histogram. The histogram is then replotted and the same control options are made available to the user. 3.2.2. P D F E S T

Only one change was made to this subroutine, which is that the mean and variance of each mode are labeled as they are printed out on the user device.

this program. Instead of calling PDFCMP to obtain functions and partials of a single mode p.d.f, model, the overlay, MLTPDF, is called to perform multimodal computations using PDFCMP. A feature added to PDFITR allows the use of sense switch zero on the processor front panel as a means of viewing the progress of the iterative procedure. If this switch is set to the "one's position", the program prints out on the user device the following variables: (1) the iteration number, (2) the previous least squares error, (3) the current least squares error, (4) Marquardt's 11, (5) "r angle 11, (6) the current parameter estimates, (7) the cosine of 3' and (8) the current g-matrix 11 values. 3.4. M L T P D F

This is the main program of the overlay used to compute a unimodal or multimodal p.d.f. (see fig. 3) and, if required, the associated partial derivatives and residual sum of squares (least squares error). If the selected p.d.f, is multimodal, the absolute value of the proportionality coefficients (see section 2) associated with each mode are summed and tested for unity. If the sum is not unity, each coefficient is divided by the sum. This action constrains the coefficients to sum to unity. The functions and partials (if required) for each p.d.f, mode are then computed using subroutine PDFCMP (see section 3.4.1 below). The functions associated with each mode are then summed to form the multimodal p.d.f. Then, if the program is in generate mode 12, overlay PDFITR is brought into memory to complete the generation process. If the program is in iterate mode 12, the least squares error is computed and, if necessary, overlay MLTPTP is called for the next step in the iteration process (otherwise, a return is made to overlay PDFITR). 3. 4.1. PD F CMP

3.3. PDFITR

One small change and one addition was made to lo This change applies only if the user wants the program to determine automatic parameter estimates for input, multimodal p.d.f, data.

Two minor modifications were made to this subroutine, which is used to compute the functions and/ or partials for the modes (mode) of a multimodal (unimodal) p.d.f. 11 See section 2 [3]. 12 See table 2 [3].

J.R. Brassard, M.J. Correia, Fitting multimodal density functions

-

MLTPDF

)

PTPRD

~)

J-J

M LTPTP

®

COMPUTE RANDOM

COEFFICIENT SUM

L

CONSTRAIN COEFFICIENTS TO BE GREATER THAN OR EQUAL TO ZERO AND TO SUM TO UNITY

CALL PDFCMP

READ COLUMN OF

/

/

PARTIALS

+" (

(

PTPCPY

)

1

CALCULATE POSITIONS OF RELATIVE LEFTMOST AND RIGHTMOST COLUMN POINTERS INTO PARTIALS MATRIX

ACCESS POINTERS FOR COLUMNS OF PARTIALS MATRIX

O,REC,ACCE~ /

)

9

COPY GIVEN MATRIX INTO INDICATED POSITION IN PTP MATRIX

INPUT RELATIVE BLOCK OF COLUMN PARTIALS AND FORM SYMMETRIC PORTIONS OF HESSIAN MATRIX

REFLECT ¢-~PYI

COPY SYMMETRIC MATRIX TO PROPER DIAGONAL LOCATION IN PTP MATRIX

OF GIVEN MATRIX ACRCI~ PTP DIAGONAL

L

(,

r RETURN )

RETURN )

I PTPSYM

)

MORE PDF MODES

COMPUTE RELATIVE PORTION OF MARQUARDT "G" MATRIX

FORM UPPER

TRIANGULAR PART OF SYMMETRIC MATRIX

CALL PDFITR

)

YES

1

REFLECT

TRIANGULAR MATRIX AC ROSS THE DIAGONAL

(

,•

RETURN ) CALL POFITR

)

I

INPUT NEXT 8LOCK OF COLUMN PARTIALS

1

COMPUTE OFF-OIAGONAL BLOCK FOR PTP MATRIX

I COPY RL~KAND ~1 ITS TRANSPOSE TO PROPER LOCATIONS IN PTP MATRIX

COMPUTE RESIDUAL SUM OF SQUARES

CALL ) MLTPTP Fig. 3. Flowcharts of multimodal p.d.f., formation overlays (MLTPDF and MLTPTP).

First, if a multimodal p.d.f, has been selected, the partials with respect to the proportionality coefficient of each mode are now computed. The second modification allows this subroutine to

copy the partials associated with each mode to a random access file located on the disk. This method allows rapid retrieval of partials information, which is later used in the formation of the Hessian matrix.

10

J.R. Brassard, M.J. Correia, Fitting multirnodal density functions

Ia \

I\

. . . . . . . .

I I

!

~

~xl

as block I; block V's computation method is that of block H. As each Hessian block is computed, that portion of the g-matrix (see section2 [3]), which is associated with the block, is computed as well. After these computations have been accomplished, program control returns to the iteration overlay, PDFITR.

TIT

-

t- . . . . .

\ \

IT'

]3Z

3Z

\

.................

\

-x:-,~. . . . . . . 3.5.1. PTPRD

\

\

Fig. 4. Diagram illustrating the partitioning of a 9 × 9 symmetric matrix into nine 3 × 3 submatrices. The submatfices lI', IIl', and V' are identical to the submatrices I1, III and V across the matrix diagonal (heavy dashed line). 3.5. ML TPTP

This overlay, illustrated in fig. 3, is used to form the Hessian matrix by means of a partitioning process (see (9), (10) and (11) of section 2 and fig. 4), which uses the partials information stored on a random access file by PDFCMP. The Roman numerals, which appear in fig. 4, indicate the order in which the blocks (outlined by dashed lines) comprising the Hessian matrix are computed. The heavy dashed line represents the diagonal of the completed matrix. Every block that has a Roman numeral with a prime symbol is the mirror image 13 (across the Hessian diagonal) of the block with the same Roman numeral. After the pointers for the block of partials that are to be used in the formation o f / i n fig. 4 have been computed, the partials are input from their temporary storage location on the disk; and the upper triangular matrix IA , contained in 1, is computed and its transpose copied into lB. , The next block of partials is read from the disk and is multiplied times the first block of partials to obtain H, the transpose of which is copied into H'. Blocks III and I I f are obtained in a similar fashion. Blocks I V and VI are computed in the same manner

After computing the random access pointers for the columns of partials previously written on the disk, a direct access read of one column of partials at a time is performed until the number of columns requested to be input is satisfied. 3.5.2. P T P S Y M

This subroutine is used to form a symmetric matrix. This is done by computing the upper triangular part (e.g., IA ) of the matrix and then by copying the transpose of the upper part into the lower part (e.g.,

I8). 3.5.3. PTPCPY

After the values associated with the blocks of fig. 4 are computed using a temporary array, this subroutine is used to transfer them to their proper locations in the blocks comprising the Hessian matrix. If requested, a transpose copy (e.g., the primed Roman numerals of fig. 4) of the these values can also be obtained. 3.6. PDFOUT

As can be seen in the flowchart of this subroutine in fig. 5, if the program is running in generate mode, subroutine PDFLSQ is bypassed. After plotting and/or printing has been accomplished by PDFPLT, the user is asked: SAVE MODEL? If the response is any positive, single-digit integer, the user is then asked the name of the device and file on which to store the p.d.f, model modes and parameters. This is accomplished by answering the query: ENTER MODEL DEV: FILNAM.EXT:

13 The Hessian matrix is a symmetric matrix.

The response is identical to that of PDFIN (see section

/

\

/

\

PDFPLT

CALL

OVERLAY RETURN

/

/

OUTPUT MODEL MODES AND

/

\

CALL POFLSQ

/

~

\

\

)

SAVE PDF MODE COMBINATION IN TABLE

CHECK FOR PDF MODE COMBINATION IN TABLE

ENTER NUMBER OF PDF

POFWK

)

I

/ \\

I. . . .

COMPUTE TEST T-VALUE

-

~,

i I

PARAMETER POINTERS FOR EACH MODE

REQUEST S ~ : o T CNH

I

. . . .

I _

_

YES . . . . . .

I I

P.

I I~

-TI--__ i,~. -I

/ /

~~

SAVE TABLE INOEX TO COMBINATION

ERROR

COMPUTE LEAST -----SQUARES

/

\~/

SECOND PDF

"//\

INPUT MODE L MODES AND PARAMETERS FROM MA~S

. . . .

~I~

(~

NO ....

®

ENTER // I PROBABILITY / I ~ AND / . . . . . . TABULAR I I T-VALUE I COMPUTE CONFIDENCE INTERVAL

COMPUTE SUMS OF SQUARES

I I I. . . .

i

I

-I I I I I

I I

I

PARAMETERS FOR EACH MODE

. . . . .

_~

--

YES

. . . . .

~ INPUT J

~

~. .

I COMPUTE Ibl Z AND y 2 . '~1 I I FOR EACH CLASS I

I I I. . . . . . . .

I I I I

. . . .

. . . . . . . I COMPUTE I SLOPE AND I STANDARD I DEVIATION ~ ~

I ~ I~ i t

OUTPUT TEST

-

-~

I

....

RESULTS

/

II

\ \

./

x

"~ - --

i

I

NO

"~

i

I I

I

I

'

_rE_s_ - 1 /

\ yES PATH = 1 ~. . . . . ? / \\ / /

\\

PATH = ~

/

SAVE PARAMETERS IN TABLE

¢

\

"

ENTER / PROGRAM / ~ PATH / CHOICE / \ I I I

/

~,

II

~--

~1

I t

CHECK FOR DIFFERENCE BETWEEN I MODELS I. . . . . . . . .

I I D'l

Fig. 5. F l o w c h a r t o f the o u t p u t overlay ( P D F O U T ) and the Williams and K l o o t T e s t overlay ( P D F W K ) .

YES

PDFOUT

~

MOOE

REMAIN IN WILLIAMS& KLOOT

I

RETURN

-OVERLAY ......

/I

"

r~

4

.1.

PLOT ODE VARI |LES



IN IT PI T :

~MAC ~ 8;H A " ~O ME IE

q

CORRE ~TION COEFF ;lENT

i I

PDF V LUES

COM' J SUN JTE )F DEVIA1 DNS& SU~ Z)F

I

FIND MUTUAL MAX OF EM= PIRICAL THEORETICAL POF*S

~

EMPIRICAL PDF DATA

I+I

STATISTICS ASSOCIATED WITH DATA

T

PLOT I FINAL PARAMETER VALUES

I SCALE& PLOT COMPUTED PDF USING TEXLIB

I

I _

I I NO

I

Y~S

YES

PRINT

I CAL PDF i I VALUES/

AND THEORET "t

Ii EMPIRICAL /

~

I

Ii

._

~

t .... t PRINT tI H STOGRAMI MID.CLASSt ~ VALUES f ---i-I

....

\

YEs

NO

C T

2

~TCH

MAX

YES

YES

II

C

3

~VECTOR '

PRINT I FINAL CORREC. I TION

NO

YES

-, NO \\ /~ \ , ~ LAST \ ,// PRINT \ NO -- -- - ~ " HISTOGRAM.' \ CYCLE /'~- ~ \~ CLASS /

~\

PRINT tI t OUTPUT ii t I HEADER I

I

\

PRINT\ \ ~, OUTPUT .~ \\ ? // /

~

i

I

' 3(~lhli F'iN~AI } ~PARAMETERS ' & ASSOCTD ~1~1STANDARo / ~ DEVIA- ,' TIONS

t

T

PRINT LSE, MSE.AND

ASSOC rED STATI! INFO -TI(

PLOT INTERVALI HISTOGRAM USING TEXLIS

)

Fig. 6. Flowcharts of the print/plot subroutine (PDFPLT), the frequency histogram plotting subroutine (PDFttST) and the p.d.f, generation subroutine (PDFGNR).

©

®

i++.! I

MEAN SQUARE ERROR

POFGNR

cz,

5

13

J.R. Brassard, M.J. Correia, Fitting multimodal density functions

3.2). Then program control is returned to the driver program, PDFIN, as would also occur if the user had responded with a " 0 " to the "SAVE MODEL?" query.

Table 3 PIot options used with subroutine PDFPLT, Symbol

3.6.1, PDFPLT

+

Initially, PDFPLT (see fig. 6) computes the following: (1) mean square error, (2) sum of residuals, (3) sum of theoretical p.d.f, values and (4) coefficient of correlation. Then, if plot output is not requested, program control transfers to the print routine. Otherwise, if the user has requested plot output and is also running in batch mode, the following information is requested:

Option

-2 -I

Vector (i.e., line) Plot Point Plot

~ ~-~

Type

Bar Histogram

1

Square-wave Histogram

9

Circle

O

O •

I~





11

Solid circle



12

Circle with Bottom Half Solid

O



13

Circle with Right Half Solid

D

O

14

Square

S

15

Square with Bar

ENTER DEV: FILNAM.EXT: •

Circle with Bar



16

Solid Square

8

17

Square with Right Lower Solid



18

Small Square at 45 Deg.

e

19

Triangle



2~

Small Triangle



21

Pine Tree

The response to this query is the same as that of PDFIN (see section 3.2). After it has been entered, the plot mode variables are initialized so that plot information will be transferred to mass storage. Next, in both interactive and batch modes, a continuous curve, which represents the theoretical curve obtained using either the generate or iterate modes, is generated using subroutine PDFGNR. The mutual maximum of the empirical p.d.f, and the generated p.d.f, is then obtained. After the theoretical p.d.f, is scaled and plotted and the empirical p.d.f, is scaled, statistics associated with the data (Kolmogorov-Smirnov, Chi-square, etc.) and the final parameter estimates are plotted 14 as well as a bar histogram of the empirical p.d.f. Next, if the program is running in batch mode, a nonnegative integer is input, which is used to indicate the program path to be followed (see table 2). Table 2 Path control variable used with subroutine PDFPLT

If path " 2 " is selected, the program requests the type of plot to be used for the replot of the empirical data by:

Path Option

Meaning

ENTER PLOT TYPE (I2):

Restore automatically scaled plot Change plot type and/or scale Input plot mode and file information (interactive mode only), where the plot mode variable has the following options: -1 = CRT (Tektronix 4010) mode only 0 = Plot-file mode only 1 = Both CRT and plot-file modes

By entering one of the integral options appearing in table 3, the user is able to obtain one of a large selection of plot types listed ~s Following this, the scale for the replot is requested by:

14 In batch mode, all plots are output to user-selected mass storage,

•0.





22

Solid Triangle

+

+

23

Plus

~:< X

24

X

x

25

Tabled X

2

26

Z with Bar

~

27

Y Star

"6 :~:

]



28

X

29

Sour Glass

I

3~

Vertical Line

X

31

Tiny Square, Extended Corners

] s All symbol plots are centered at coordinates corresponding to the p.d.f, class mid-values.

14

J.R. Brassard, M.J. Correia, Fitting multimodal density functions

ENTER PDF MAX SCALE (F6.0): The response is a five-digit, floating point number, which is to be used as the maximum ordinate value for the replot. If the program is running in interactive mode and the path selected is "3", the program requests: ENTER PLT MODE: The response is one of the values specified in table 2 for the Path "3" option. If plot-file mode is selected, the program then asks for the name of the mass storage file name, using the same query that is used for plotting while running in batch mode (see above). The program then loops to the location where the plot mode variables are initialized. The remainder of this subroutine (with the exception of additional printout) is the same as that of the original publication [3]. 3. 6.2. P D F G N R

The purpose of this subroutine, shown in fig. 6, is twofold. First, it is used by PDFPLT to generate the number of ordinate points necessary for a smooth and continuous plot of a theoretical p.d.f. The function generation algorithm is identical to that of PDFCMP and will, therefore, not be described here. It differs from PDFCMP in that the complete multimodal p.d.f. is formed within the subroutine itself. Secondly, this subroutine computes the residual differences between the values of the empirical and theoretical p.d.f.'s. This option is exercised only when PDFGNR is called by PDFWK (see below). 3. 7. P D F W K

Shown in fig. 5 is the flowchart for the Williams and Kloot Test [3] overlay, which has been modified to provide additional capabilities in the areas of data input and storage. The data input modification is identical to that of PDFIN (see section 3.2) in that previously saved, model modes and parameters may be input from mass storage. These values reside in a p.d.f, mode combination table which allows the user to store in local memory precision parameter estimates for a maximum of nine different p.d.f.'s, each of which may consist of

a maximum of four modes. After the p.d.f, mode types have been input, the user is requested to enter through the keyboard the p.d.f, model type by: ENTER INPUT TYPE: (I1) A response of "0" indicates that the input p.d.f. model was iteratively obtained;whereas a response of "1" indicates a model which was "forced" (i.e., noniteratively obtained). This information is used by the program to compute the correct number of degrees of freedom for each model. After the associated mode(s) and parameters for each model have been input and if the mode combination table is not empty, a search is made to determine whether or not the model has already been stored in the table. If the combination is not found and the table is full, the following message and query is printed out: ***MODE COMBINATION LIMIT REACHED *** ENTER TABLE POSITION: (I1) A response of a digit from 1 to 9 indicates to the program the position in the table which can be filled with the mode combination of the current model; whereas a zero response indicates that the mode combination table is to be cleared and the test restarted. If the table is not full, the mode(s) of the model are stored in the next available location. If the mode combination is found and if the number of degrees of freedom of the input model match that which is stored in the table, the index to that combination is saved. The user is then allowed the option of changing all of the parameter values associated with the current mode combination. If the user does not choose to do this, the parameters are next input from the keyboard (if they have not already been input from mass storage) and saved in the table. If the number of degrees of freedom does not match, the program uses the procedure described above to obtain an available table location in which to store the mode(s) of the model. The parameter-change option just mentioned allows the user flexibility in determining what level of accuracy is required of the p.d.f, parameters in the computation of the least squares error used in the

J.R. Brassard, M.J. Correia, Fitting multimodal density functions

15

Table 4 Plotting modules contained in TEXLIB program library Module

Language

Purpose

ALS AXIS CItARS CPLOT EAS.MAC FP2DIR INUMBR LABEL LSHIFT NUMBER PLOTIT PLTMOD PLTSAV STPCHK STP410 ~ SYMBOL TITLE VTPLOT

MACRO-11 FORTRAN IV MACRO-11 MACRO-11 MACRO-11 FORTRAN IV FORTRAN IV FORTRAN IV MACRO-11 FORTRAN IV FORTRAN IV FORTRAN IV FORTRAN IV

Initialize 4010 input and output characters and plot points and vectors (lines). Plot a graphical axis beginning at any given (x, y) coordinate and at any given angle. Complete ASCII character set plus 23 symbols used in special symbol plots. Plot characters of a given scale at any given angle of rotation. Generate structured assembly language programs [171. Convert floating point numbers to a double integer, one part of which represents a fraction. Plot an integer beginning at any given (x, y) coordinate and at any given angle. Plot an axis label,beginning at any given (x, y) coordinate and at any given angle. Perform left or right logical bit shifts. Plot a floating-point number beginning at any given (x, y) coordinate and at any given angle. Replot on graphics terminal a plot file previously saved on mass storage. Set plot mode (see table 2). Used to save an unformatted plot file on mass storage.

MACRO-11 FORTRAN IV FORTRAN IV MACRO-11

Used with keyboard and Tektronix 4010, respectively, for program path control.

VTSCAL

FORTRAN IV

YNUMBR

FORTRAN IV

Plot a string of characters beginning at any given (x, y) coordinate and at any given angle. Plot an alphanumeric title beginning at any given (x, y) coordinate and at any given angle. Plot input x- and y- coordinate, raster arrays as point, symbol or vector plots or as a bar (or square wave) histogram on a Tektronix 4010 terminal (using ALS - see above) Used to compute, if requested, the minimum and maximum of an input array and its associated scaled raster array. To horizontally number selected tick marks on the y-coordinate axis (see fig. 8).

Williams and Kloot Test. After the table values have been entered, the program computes the residual differences using PDFGNR. Overlay PDFWK then performs the same actions as in the previous paper [3] until it reaches path commands 2, 3 and 4. If the path chosen by the user is "2", then program control returns to the beginning of the overlay. If path "4" is chosen, the mode combination table is cleared and path "2" is followed. If path "3" is chosen, the program returns to the point where the statistical parametric pair {c~, t) is requested (see section 3.5 [31).

3.8. T E X L I B TEXLIB is a plot library developed for use with the Tektronix 4010 terminal. Most of the subroutines contained in this library are written in FORTRAN IV. Since there are a large number of subroutines in this library and since the primary purpose of this paper is the explanation of the algorithms used in fitting multimodal p.d.f.'s, the TEXLIB modules have been listed in table 4 along with their associated purposes.

4. Sample run Four sample runs were made in order to determine the multimodal p.d.f, which best fit an empirical, p.d.f, histogram formed from 1024 interspike intervals (ISI's) measured to the nearest 1 msec. These ISI's originated from adjacent action potentials in a train of spontaneous discharges from vestibular primary afferents of the pigeon. This discharge was obtained from an "awake", paralyzed preparation which had received only local anesthesia. Generally this preparation produces unimodal ISI histograms [5]. However, multimodal distributions can also be observed. The interactive input and output of the run which produced the best fit is shown in fig. 7. A frequency histogram of the interspike interval input is present in fig. 8 (the arrows are mode delimiters), while the graphical output corresponding to the bimodal run of fig. 7 is illustrated in fig. 9b. The graphical output resulting from a unimodal run is shown in fig. 9a. A comparison of the graphs as well as the curvefitting statistics (P.K.S., M.S.E. and R) indicates that a mixture of two Wiener-Levy, first passage time (f.p.t.) p.d.f.'s (see [5]) best fit the empirical histogram.

16

J.R. Brassard, M.J. Correia, Fitting multimodal density functions

*** TUTORIAL? e

B•NO ;

PDF

***

i-YES

t976 OUTPUT? ( I t ) e PRINT DELTA: B

(X3)

ENTER B OF PDF HOPES: 2 ENTER PDF NODE(S) 3 3

([1)

BY TYPE:

(It)

PRRflHS? ( 1 1 ) B

ENTER D E V I C E : 1

t=DK ;

4=PR ;

6=KB

ENTER FILNRM. EXT: JULY41.?76 HIST e

INPUT?

(11)

ENTER RECORD NO. 1

(I2):

AUTO CLASS L I H ! T S ? 1 ENTER CLASS S I Z E 1.

(It) (HSEC):

ENTER START CLflSS: e

(Ft2.5)

(13)

ENTER HAX CLflSS FOR EriCH MODE: 35 99 FREQUENCY CLflSS L I N I T S : E = E •

e. 2 5 6 5 2 3 E O. 4 6 t 4 4 5 E

e2 02

ESTIHRTED T H E T R ( I ) , I • I ,

8.287727E

e2,

8.348926E

02,

[

E, ¥ = ¥ =

(13)

99] e. 2 0 3 9 e t E 0.84535eE

FOR PDF NODE t FOR PDF HODE 2

e2 02

6: 8.112564E e.?38822E

et, Be,

e.?seeeeE e. 2 5 e e E e E

De, De,

EPSILON? (Fi2.5) 80800081

Fig. 7. Computer printout showing interactive input and iteratively obtained output of a bimodal, Wiener-Levy, first passage time p.d.f, which has been fit to the empirical histogram of fig. 8. Examining fig. 7, we find that after bypassing tutorial assistance print/plot output is requested for the iterate mode. By responding with a zero to the

"PRINT D E L T A " query, empirical and theoretical densities will not be printed after the iteration process is completed. The Wiener-Levy p.d.f, is chosen for

J.R. Brassard, M.J. Correia, Fitting multirnodal density functions J OF I T E R R T I O N S ? 2e8

17

(13)

ENTER PLOT TYPE ( I 2 ) : 1 ENTER PDF MAX SCALE ( F E . e ) : £

***

P R O B R B I L I T Y D E N S I T Y FUNCTION OUTPUT

FOR THE FOLLONING PDF NODE(S) ¢ ) FPT N - L

(

OVER t e e CLASSES: 2 ) FPT N - L

t3

***

CONPLETE I T E R R T I O N S )

DELTR THETR

e. t e 9 2 3 9 E - e 9 -e.?ea3eSE-e9 THETA(I)

+/-

-B. 3 4 5 2 8 3 E - t e

8.654938E-e8

-e. Eee2etE-69

STANDARD D E V I A T I O N THETR(1) THETR(2) THETA(3) THETR(4) THETR(5) THETA(6)

P.D.F.

SUN -

= = -

e. 3 8 3 2 9 £ E 8 2 e. t 5 5 9 1 4 E e t

SD SO SD SD SD so

O. 574795E'eO 8.24e?lEE 82 8. E t 2 e 2 2 E e e e. 4 2 5 2 8 5 E e e

e. 999975E

88

CORR. COEFF.

=

8.987482E

"

G. 7 8 8 1 4 2 E - 8 3

MITH

9 4 DEG OF FREEDON

N, S . E .

"

8.829938E-65

NITH

9 4 DEO OF FREEDOM

CHI-S•

-

O. t 4 3 2 3 1 E

NITH

5 5 DEG OF FREEDON

83

FOR t 8 2 4 OBSERVATIONS RND DNRX = e. t 2 3 3 1 ? E - e l THE P R O B R B I L I T Y OF R GOOD F I T I S e. 998

?

7

6

*************

E PLURIBUS

= = = = = =

e. 3 6 2 2 6 9 E - 8 4 G. t 5 1 8 7 ? E - 8 5 8.383495E-B5 8.£8198tE-G3 e. 2 8 5 8 7 3 E - e 5 e. 5 3 2 i S e E - 8 5

RESIDURL SUN =

L. S . E .

I

-8. t35395E-e9

UNUN

8.25e729E-e4

8e

OCCURRING AT CLRSS

*************

t

t5

9

?

6

SAVE MODEL?

1 ENTER NODEL D E Y : F I L N R R . EXT: 3ULY4i, 9?6

***

POF

***

T U T O R I A L ? B-NO ~ i - Y E S

9

both modes of the bimodal p.d.f, to be fit, using parameter estimates automatically determined by the program. Interspike interval information is obtained from the first record of the disk file July41.776. Using a class size of 1 millisecond, classes 0 through

35 identify the limits of the first mode of the bimodal data; while classes 36 through 99 serve the same purpose for the second mode. The estimated mean and variance are as indicated for each mode; while the corresponding, initial, parameter estimates are given on

18

J.R. Brassard, M.J. Correia, Fitting multimodal density functions

minal. After the characters "CONTROL-P" followed by the digit " 2 " are entered through the keyboard, the program requests one of the plot types listed in table 3 above as well as the maximum value for the ordinate plot axis. Fig. lOa is the result of a response for a square wave histogram and maximum plot ordinate (probability density) of 0.1. Following the user entry of a "CONTROL-G" through the keyboard, the parametric correction vector and parameters are then printed out. Besides these data, four additional variables, which have been added to the output of the previous program [3], are printed: (1) the discrete, theoretical distribution represented by "P.D.F. SUM", (2) the residual sum (3) the coefficient of determination (correlation) represented by "CORR.COEFF." and (4) the mean square error (M.S.E.). Following the printed line of asterisks, the current model's parameters, mode types and indices are output to the disk file JULY41.976 (fig. 10b is an example of an output, symbol plot generated by this saved file). The program then exits normally to the system monitor after the " 9 " is entered in response to the tutorial query. To further demonstrate the power and versatility of the program described herein, an empirical histogram was obtained from fig. 6 of the work of Harding [9]. This histogram represents the distribution of the

UNIT JULY41.776 9~. ¸

>~ ,,=, =. 0

45.

== z

o

~ 0.00000

50000

I00000

INTERVAL TIME (MS)

Fig. 8. Empirical histogram of data obtained from the interspike intervals (ISl's) of the spontaneous discharges from the primary afferent UNIT JULY41.776 of the pigeon's anterior semicircular canal. separate lines. The proportionality parameter, rrl, indicates that the first mode comprises approximately 75% of the complete, initial, theoretical p.d.f.; whereas 71"2 indicates that the second mode comprises 25%. An e with a value of 10 - 8 determines the convergence criterion, which must be satisfied within 20 complete iterations. However, within 13 complete iterations e is satisfied; and fig. 9b is output to the Tektronix 4010 ter-

PD.F BASED ON THE FOLLOWING MODEL: PD.E BASED ON THE FOLLOWING MODEL: FPT W-L

I) FPT W-L

2) FPT W-L

0.089"

0.089

UNIT JULY41 776 1024 EVENTS RK.B. = 0998 R=0.96740 --

0.044

o.ooo

O00000

(a)

~

50.000

,NTERWL T,ME (MS)

leO000

0.044

o ooc

- --~.~'

0.00000

(b)

:.......:.........:""~' . " F - - ~ . ~

50.000

' "

"" Joo~ooo

INTERVAL T,ME (MS)

Fig. 9. The p.d.f.'s, which best fit the data of fig. 8, are: (a) unimodal Wiener-Levy f.p.t.p.d.f, with parameter estimates 01 = 22.64 and 02 = 0.83. (b) bimodal Wiener-Levy f.p.t.p.d.f, with parameter estimates: 011 = 38.33, 02] = 1.56, ~1 = 0.57,012 = 24.07, 022 = 0.6 l, and ~2 = 0.43.

J.R. Brassard, M.J. Correia, Fitting multimodal density functions

19

PDE BASED ON THE FOLLOWING MODEL: I)FPT W-L 2)FPT W-L

PDF BASED ON THE FOLLOWING MODEL: I) FpT W'L 2 ) F P T W-L O.lO0

0.100 UNtT JULY41. 776 1024 EVENTS F~K.S. : 0 9 9 8 R: 0.98740

~

0.050

- - - - ~ J

0.000

0.o50

.J

J

,

,

,

,

,~

, -

50.000

0.00000

(l~l)

0.000

__

'

IO0.C)O0

.

.

50.000

0.0000

INTERVAL TIME (MS)

IO000O

INTERVAL TIME (MS)

(b)

Fig. 10. Plots illustrating special program features: (a) rescaled, square wave histogram fit with the curve of fig. 9b. (b) rescalcd symbol (e) plot generated by the program using the output parameter estimates of fig. 7. Table 5 Comparison of models tested against the distribution of the lengths of 1122 fish from data provided by Harding [9] Mode Type and Number

Normal 1

Normal 2

Normal 3

Parameters a)

011

021

~1

012

022

~2

013

023

~3

Model 1 b)

14.5

1.3

0.5

11.85

1. l

0.4

9.15

0.9

O.1

Model 2 c)

15.13

1.07

0.3

12.32

1.32

0.6

9.12

1.13

0.1

The Williams and Kloot Test [3] indicated that Model 2 produced a significantly better fit to the data than did Model 1 (t = 6.96, P < 0.05, d.f. = 6). a) The parameter estimates 0 ]k 02k ~k represent the mean, standard deviation, and proportionality factor for the kth mode (k = 1, 2, 3). b) Model 1 is based on the parameter estimates obtained by Harding ([9], p. 149). c) The parameter estimates for Model 2 were iteratively obtained using the program described herein.

PD.F BASED ON THE FOLLOWING MODEL: I1 NORMAL 2)NORMAL 3)NORMAL 0.200'

HARDING DATA 1122 EVENTS RK.S. = 1.000 = . ¢3

~ q.,oc

O.OlXl

5.0(XX)

,z;oo ,.TERVAL r..,~ ("S)

lengths o f 1122 fish. Using a graphical approach, Harding d e t e r m i n e d that this histogram could best be described by a m i x t u r e o f three normal distributions, the parameters o f which are listed as Model 1 in table 5. The program described herein, after it had iterated to the best fitting m u l t i m o d a l p.d.f. (see fig. 11) consisting o f a m i x t u r e o f three normal distributions, yielded the parameters listed for Model 2 in table 5. The results of a comparison o f the two models using the Williams and K l o o t Test [3 ] are also shown in table 5.

" 2o.&o

Fig. 11. Empirical histogram of data obtained from the distribution of the lengths of 1122 fish and "best" fit with a trimodal, normal p.d.f. (see Model 2 in table 5).

5. Hardware and software specifications This program is designed to run on a Digital

20

J.R. Brassard, M.J. Correia, Fitting multimodal density functions

Equipment Corporation LAB-11 (PDP-I 1/20) minicomputer with the following, minimum number of hardware components (1) 24K (16 bit words) of memory, (2) RK05 magnetic disk unit, (3) KE11-A extended arithmetic element, (4) Tektronix 4010 graphics terminal, and (5) Disk Operating System (DOS/BATCH Version 9.20c) [6]. The driver program, the overlay main programs, and their corresponding subroutines (excluding plot routines) are written in FORTRAN IV. The subroutines contained in the plot library, TEXLIB, are written in FORTRAN IV and MACRO-11, the assembly language of the PDP-11. All computations, except for the least squares error computation performed in double precision by PDFITR, are done in single precision arithmetic. The decimal core requirements are as follows for the driver and its six overlays: (1) PDF driver segment occupies 5176 words. (2) PDFIN overlay occupies 12,425 words. (3) PDFITR overlay occupies 5867 words. (4) MLTPDF overlay occupies 5738 words. (5) MLTPTP overlay occupies 5256 words. (6) PDFOUT overlay occupies 14,974 words. (7) PDFWK overlay occupies 7751 words. DOS/BATCH requires 3 or 4 K decimal words for system routines, buffers and device drivers.

6. Mode of availability Reprints can be obtained from the first author at the address listed below. A source program will be furnished if a 9-track 800 BPI magnetic tape is mailed to the following address: John R. Brassard, Research Division, Department of Otolaryngology, 500L (R6) Clinical Sci. Bldg., University of Texas Medical Branch, Galveston, Texas 77550, USA

Acknowledgements The authors would like to thank Marian Y.Y. Yong and Robert A. Skillman of the Southwest Fisheries

Center, Honolulu, HI for sending them a listing of their program, ENORMSEP. F o r typing many drafts of this manuscript, the authors would also like to express their appreciation to Mrs. Janice (Bean) Arriola and Mrs. Joanne Alvarado.

References [ 1] C.G. Battacharya, Biometrics 23 (1967) 115. [2] P.C. Bishop, W.R. Levick and W.O. Williams, J. Physiol. 170 (1964) 598. [3] J.R. Brassard, M.J. Correia and J.P. Landolt, Computer Prog. Biomed. 5 (1975) l l . [4] R.M. Cassie, Australian J. of Mar. Fresh. Res. 5 (1954) 513. [5] M.J. Correia, manuscript in preparation. [6] DOS/BATCH Monitor (9.20C), Digital Equipment Corporation Program Library Number DEC-11-ODBMA A - (P01 to P05). [7] DOS/BATCH System Manager's Guide, Digital Equipment Corporation Manual Number DEC - 11 - OSMGA A-D, [8] D.J. Finney, Probit analysis (Cambridge University Press, London, 1964). [9] J.P. Harding, J. Mar. Biol. Assoc. 28 (1949) 141. [ 10] V. Hasselblad, Technometrics 8 (1966) 431. [11] V. Hasselblad, J. Amer. Stat. Assoc. 64 (1969) 1459. [12] V.A. Herz, O. Creutzfeldt and J. Fuster, Kybernetik 2 (1964) 61. [13] D.W. Marquardt, J. SIAM 11 (1963) 431. [14] K. Pearson, Philos. Trans, Roy. Soc. London 185A (1894) 71. [15] J. Pernier and J.F. Echallier, int. J. Biomed. Comp. 4 (1973) 241. [16] G.F. Poggio and L.J. Viernstein, J. Neurophysiol. 27 (1964) 517. [17] R.B. Price, Enhanced assembly structures (EAS.MAC), Program No. 11-256 (Digital Equipment Computer User's Society, Maynard, 1976). [18] C.R. Rao, Advanced statistical methods in biometric research (Wiley, New York, 1952). [19] P.R. Rider, Ann. Math. Stat. 32 (1961) 143. [20] A.C. Sanderson and B. Kobler, Biol. Cybernetics 22 (1976) 61. [21] M. Ten Hoopen and H.A. Reuver, J. of Appl. Prob. 2 (1965) 286. [22] M. Ten Hoopen and H.A. Reuver, Biometrika 53 (1966) 383. [23] M. Ten Hoopen, Brain Res. 3 (1966) 123. [24] M. Ten Hoopen, Kybernetik 4 (1967) 1. [25] M.Y.Y. Yong and R.A. Skillman, Fish. Bull. of the NOAA 73 (1975) 681. 126] G.U. Yule and M.G. Kendall, An introduction to the theory of statistics (Griffin, London, 1950).