© Forensic Science Society
COMMENTARY
1987
Forensic linguistics: the determination of authorship from habits of style
RN TOTTY,* RA HARDCASTLE and J PEARSON
Home Office Forensic Science Laboratory, Gooch Street North, Birmingham, United Kingdom B5 6QQ
* Present address: Home Office Forensic Science Laboratory, Usk Road, Chepstow, Gwent, United Kingdom NP6 6YE.

Abstract
The application of a method of literary stylistics (or stylometry) to forensic problems such as the authentication of disputed witness statements is described. Relatively large amounts of text are required for both specimen and questioned material before an effective comparison can be attempted and this severely restricts the number of cases in which the method may be applied. The validity of the method as applied to materials of forensic interest is considered to be as yet unproven but such methods are certainly worthy of further study.

Key Words: Document examination; Stylometry; Text analysis.

Journal of the Forensic Science Society 1987; 27: 13-28
Received 21 February 1986

Introduction
A recent paper by Gudjonsson and Haward has commented favourably on the use of Stylometry (also called Stylistics) as a method of assessing the validity of confessions [1]. This has prompted us to present a more detailed discussion of the subject and to report our own work in this field. Stylometry has been defined as "the science which describes and measures the personal elements in literary or extempore utterances, so that it can be said that one particular person is responsible for the composition rather than any other person who might have been speaking or writing at that time on the same subject for similar reasons" [2]. Stylometry sets out to determine how an utterance has been spoken or written; it does not set out to determine by whose hand the text was actually written but asks the question 'In whose words are these views expressed?'.

There are a number of approaches to literary stylistics [3]. Traditional opinions on style have been largely intuitive but advances in computer techniques have enabled linguists to obtain accurate data from extensive
texts for objective studies. Most computer-aided analyses of style have been restricted to features which are easily recognised and readily quantified. Two broad groups of features have been used, sentence length and choice and frequency of words. An example of early computer-aided authorship determination is the work of Mosteller and Wallace on the Federalist Papers, a series of documents written in 1787-8 in New York and of disputed authorship. Initial work concentrated on sentence length with inconclusive results but further work on word frequencies led to a satisfactory attribution of authorship [4].
The work of AQ Morton
A principal exponent of the application of stylometry to forensic problems is Andrew Morton who has been consulted, and prepared evidence, in a number of cases. Morton and his collaborators have worked extensively on the New Testament and other Greek prose texts and have established the use of stylometry in this field [5-7]. Morton's work is developed from the methods of WC Wake who showed that nouns depend too much on subject matter to contribute to authorship studies, and that the same is true of verbs [8]. Morton has concentrated his attention on the frequency of occurrence of common words, usually prepositions and conjunctions, and he relies heavily on the chi-squared test for determining the significance of the data. Numerous examples of applications to Greek prose are given in Morton's book on stylometry [2].

It is Morton's contention that common words, such as prepositions and conjunctions, are much less consciously selected in writing or speech than are nouns, adjectives and verbs. In order to maximize the effectiveness of his tests, Morton makes much use of pairs of common words and their relative positions rather than simple rates of occurrence.

The main group of habits used by Morton consists of collocations in which one common word, the mark word, occurs in succession with another, the target word (Table 1). For example, the mark word might be 'OF' and the target word 'THE'. In a sample of text, the total number of occurrences of the word 'OF' and the number of occurrences of the word pair 'OF THE' are determined and the ratio of these counts can be compared with a ratio determined similarly for another sample of text. The collocation need not necessarily involve adjacent words; for example, the number of occurrences of the mark word 'THE' followed by any word and then the target word 'AND' can be combined with the total number of occurrences of the word 'THE'.
Further, the target word need not be a specific word; it can be a class of word. For example, the mark word could be 'A' and the target word any adjective.
A second group of habits is 'proportional pairs' of words. For example, the total number of occurrences of the word 'NO' is divided by the same total added to the total number of occurrences of the word 'NOT'. Other pairs of words which can be combined in this way are 'A' and 'AN', 'THIS' and 'THAT' and 'WITH' and 'WITHOUT'.
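The two kinds of count described above can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: the tokenizer, the function names and the sample sentence are hypothetical and are not taken from Morton's work, and word classes such as "any adjective" would need a part-of-speech tagger that is not shown here.

```python
import re
from collections import Counter


def tokenize(text):
    """Upper-case the text and split it into word tokens (a crude assumption)."""
    return re.findall(r"[A-Za-z']+", text.upper())


def collocation_counts(tokens, mark, target, gap=0):
    """Return (pair_count, mark_count) for `mark` followed, after `gap`
    intervening words, by `target`."""
    mark_count = sum(1 for t in tokens if t == mark)
    pair_count = sum(
        1
        for i, t in enumerate(tokens)
        if t == mark
        and i + gap + 1 < len(tokens)
        and tokens[i + gap + 1] == target
    )
    return pair_count, mark_count


def proportional_pair(tokens, first, second):
    """Return count(first) / (count(first) + count(second)),
    or None when neither word occurs."""
    counts = Counter(tokens)
    total = counts[first] + counts[second]
    return counts[first] / total if total else None


# Hypothetical sample sentence, for illustration only.
sample = ("The end of the road was not the end of it, "
          "and no one saw the dog and the cat.")
tokens = tokenize(sample)

print(collocation_counts(tokens, "OF", "THE"))          # 'OF THE' among all 'OF'
print(collocation_counts(tokens, "THE", "AND", gap=1))  # 'THE x AND' among all 'THE'
print(proportional_pair(tokens, "NO", "NOT"))           # NO / (NO + NOT)
```

Each function returns raw counts rather than a percentage, because the counts themselves are what a later significance test would need.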
TABLE 1. Examples of collocations frequently occurring in writers of English

Columns: Mark word; Relation; Target word. (Row alignment and the entries of the Relation column are not recoverable; the surviving entries are listed in their original order.)

Mark words: A; AND; BEEN; HAVE; FOR; THE; THAT; THE; TO
Target words: an adjective; ALL, THE, THEN, A, IF, THE, THOUGH, WELL, LAST, THE; A, THE, A, THE, THE; A, COURSE, THE, ON; A, BE, THE

fb = followed by; pb = preceded by.
Reprinted from Literary Detection by A. Q. Morton [2]. Published by R. R. Bowker, Division of Reed Publishing, U.S.A. © 1978 by Reed Publishing U.S.A., a division of Reed Holdings, Inc. All rights reserved.
A third group of habits is the positioning of common words within sentences. Thus a ratio can be determined of the number of occurrences of a word in a particular position (first word, second word, last word, etc) compared to the total number of occurrences of this word.

Having determined rates of occurrence of selected habits within two samples of text, these ratios are compared using the chi-squared test. If several of the habits show significant differences in their rates of occurrence, this is claimed to be indicative of different authorship. With a 95% confidence level, about 6 differences are regarded as conclusive; 1 or 2 differences in about 20 habits are to be expected due to chance variation. On the other hand, if no significant differences are found, this does not prove common authorship, only that the texts are indistinguishable by the tests applied.

The successful use of Morton's technique is illustrated by his examination of some of Jane Austen's writings. The novel Sanditon was only partially completed when Jane Austen died in 1817. The manuscript was recently completed by 'Another lady' in imitation of the writer's style and published in 1975 [9]. Morton has examined the occurrence of collocations in parts of two other novels by Jane Austen, Sense and Sensibility and Emma, as well as parts of the two portions of the Sanditon text. From the eight collocations listed in Table 2 it can be seen that there are significant differences between the text of the additional part of Sanditon and the three genuine texts, which, in these respects, correspond closely.

There have been criticisms of Morton's work for his failure to apply more than a small number of tests to a set of data and for the assumptions made about authorship on the basis of statistically questionable data. His use of the chi-squared test has also been subject to unfavourable comment [3,10]. Nevertheless others have taken up this work and extended and modified the technique [10].
It is clear that it represents a creditable advance in literary scholarship.
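The chi-squared comparison of two rates can be illustrated with a short worked example. The sketch below assumes the ordinary Pearson statistic for a 2 x 2 table with one degree of freedom and no continuity correction; the paper does not state the formula used, but this form appears to reproduce the values printed in Table 7 for the two habits shown. The function name and the constant are illustrative.

```python
def chi_squared_2x2(hits_a, total_a, hits_b, total_b):
    """Pearson chi-squared (1 degree of freedom, no continuity correction).

    Each sample contributes one row: occurrences of the mark word that
    show the habit, and occurrences that do not.
    """
    a, b = hits_a, total_a - hits_a
    c, d = hits_b, total_b - hits_b
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        raise ValueError("a row or column total is zero")
    return n * (a * d - b * c) ** 2 / denom


# 95% critical value for 1 degree of freedom (the paper rounds it to 3.8).
CRITICAL_95_DF1 = 3.84

# Counts as printed in Table 7: (habit count, mark-word total) for
# St Germain, then the same for Smith.
examples = {
    "A pb preposition": (28, 125, 51, 167),
    "THE as fws": (10, 343, 45, 544),
}

for habit, counts in examples.items():
    x2 = chi_squared_2x2(*counts)
    print(f"{habit}: chi-squared = {x2:.1f}, "
          f"significant at 95% = {x2 > CRITICAL_95_DF1}")
```

A value below the critical value means only that the two rates are indistinguishable by this test, which is the caution already stated above.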
The validity of stylometry
There is, however, a very real difficulty in applying linguistic techniques to forensic problems. The techniques have been developed for literary uses, for examining style of writing, and have been established using material containing many thousands of words. For example, the chapters in Jane Austen's works used in the Sanditon exercise quoted above each contain between 1500 and 2500 words. Similarly, like is compared with like: the texts compared deal with similar subject matters and many are written as literary exercises.
In forensic applications, stylometry has been used to test the validity of claims by convicted persons that records of interviews, containing full or partial confessions, which formed part of the prosecution evidence at their trial, had been fabricated, in whole or in part. These records are usually written transcripts of utterances made by the person concerned in answer to questions and are taken down at the time by a police officer; they bear no signature or any other acknowledgement by the person being interviewed and their validity as evidence is based on statements made by the police officers that the records are true and accurate accounts of what was said. In comparing the disputed utterances with accepted utterances using stylistic methods, use has been made of almost any material, both written and verbal, that can be obtained and is accepted by the person concerned, including statements, other questionnaires and personal letters. In using the techniques developed for applications in the literary field to forensic problems, a number of questions have not been answered by the
TABLE 2. A comparison of Jane Austen and The Other Lady

Columns: Habit; Sense and Sensibility (ch. 1, 3); Emma (ch. 1, 2, 3); Sanditon, Jane Austen (ch. 1, 6); Sanditon, The Other Lady (ch. 12, 24); Chi-squared (a) and (b)

Habits (in original order): AN / (A + AN); A pb SUCH; AND fb I; THE pb ON; fws; THIS / (THIS + THAT); WITH / (WITH + WITHOUT); VERY pb THE

(a) comparison of the three genuine samples. (b) comparison of genuine samples taken together for comparison with The Other Lady.
pb = preceded by; fb = followed by; fws = first word of sentence.
Reprinted from Literary Detection by A. Q. Morton [2]. Published by R. R. Bowker, Division of Reed Publishing, U.S.A. © 1978 by Reed Publishing U.S.A., a division of Reed Holdings, Inc. All rights reserved.
protagonists of forensic stylistics. These include: Is the method valid where relatively small amounts of text are involved? Is it acceptable to compare verbal utterances with written material? Is it acceptable to compare written material of different types, e.g., statements of admission with personal letters? To what extent do circumstances change style of speech or writing? For example, do a person's writing and speaking habits remain constant when he is under pressure as compared to a totally relaxed state? To what extent are differences likely to be found in material taken at random?
It is relatively straightforward to demonstrate clear differences in style between documents but much less so to demonstrate a clear relationship between style and authorship. In our view the protagonists of stylistic analysis in forensic applications have not only failed to demonstrate such a link but have not even attempted to do so. Such a link can only be proven by the examination of material of known origin and authorship and showing that no significant stylistic differences occur in the writings of a large number of individuals. This has not been done.

Bailey [11] has commented on the difficulty of comparing confessions transcribed by the police with letters or other documents purportedly written by the same person, and notes that, whilst linguists have long been aware of some of the characteristic features of writing and speech, much remains to be learned about the statistical properties of features that distinguish each mode. His view is that it is not surprising that in Morton's work differences were found but he finds little reason to assign such differences to authorship or to allow them to support a contention that the police had tampered with or fabricated the purported confession. Bailey sets out three rules that define the circumstances necessary for forensic authorship attribution: that the number of putative authors constitute a well-defined set; that there be a sufficient quantity of attested and disputed samples to reflect linguistic habits of each candidate; and that the compared texts be commensurable. In such cases he asserts that the best conclusion to be hoped for would assign a strong probability in favour of one candidate and a significantly lower probability in favour of other candidates.

The use of collocations
Collocations must always be chosen with care to avoid those which have a frequency determined by the subject matter rather than the author.
We have reservations about the use of some of the collocations advocated by Morton and used in his work on disputed statements, in particular the use of such collocations as 'I have', 'I am', and 'my' preceded by a preposition.
The use of 'I have' in a text dealing with present or past events, in the sense 'I have a complaint' or 'I have complained', will of necessity differ from the use of 'I have' in a text dealing with future actions, as in the latter the forms 'I will complain' or 'I should complain' would be expected to be more in evidence. The investigation of context and tense effects on stylometry is worthy of much more attention than it has so far received. These effects are considered briefly by Morton in his discussion of the works of John Fowles and he points out that it is correct to scrutinize the texts for context and tense effects that may affect the choice of collocations. However, we are concerned that context and tense effects too slight to be detected during visual inspection of the text, but large enough to have a marked effect on the statistical tests, have not been the subject of any close study.
Stylometry in court: the St Germain case
Gudjonsson and Haward discuss [1] the appeal of Ronald St Germain and the linguistic evidence supporting the appeal. At St Germain's trial a number of statements by police officers, in which utterances of St Germain were included, were tendered in evidence. Certain of these utterances were denied by St Germain, although the bulk was accepted by him as genuine. Morton examined a number of utterances by St Germain, and included amongst the control material a statement written by St Germain, a questionnaire of St Germain, parts of the police officers' statements accepted by St Germain and two personal letters written by St Germain to Morton. He found that habits were consistent (i.e., not significantly different) between the accepted samples and compared them with the disputed parts of the police officers' statements which he found to differ significantly from the body of accepted material. In all, forty habits were examined; in twenty-five differences were found and eleven selected for illustration on the grounds of simplicity, Table 3. (Morton did not present calculations of chi-squared values but illustrated the differences found by looking at the standard errors).

The stylistic evidence was reviewed in detail at the hearing of the appeal of St Germain against conviction. A number of other witnesses were prepared to attest to Morton's status, to the value of stylistics, and, having read Morton's report, to the validity of his findings. Despite this battery of expertise Lord Justice Scarman said in judgement "The trouble is the absence of a broad enough basis for this stylometric research. . . . At the very best it has to be said that Dr Morton, distinguished scholar though he is, has been unable to advance his conclusions beyond that of hypothesis". This appears to us to be an excellent summary of stylistic evidence in a forensic setting.
The differences found by Morton between the two sets of utterances are clearly real but do they have any meaning? In particular are they of such a nature that the conclusion that St Germain did not make the utterances he had denied is inescapable, or are there other equally valid, or even more likely, explanations?

Further studies of the St Germain case
We have had the opportunity to examine further material written by St Germain, namely a detailed history of events leading up to a court hearing on 9 May 1977, written at some time shortly after this date and three years after the other disputed and authentic utterances examined by Morton.
We have examined the habits determined by Morton for St Germain's genuine utterances and compared them with the habits in the later document. The results are given in Table 4. Our finding that only two of these habits differ, a level of difference that could be expected by chance, is far from sufficient to conclude that there is any difference in style between the accepted material examined by Morton and written in 1974 and the
TABLE 3. A comparison of the disputed utterances of St Germain with accepted utterances of St Germain, based on data quoted by Morton in his report in the St Germain case

Columns: Habit; Disputed utterances; Accepted utterances; Chi-squared

Habits compared, by mark word:
A: pb preposition
HAVE: fb verb
I: fb HAVE
IT: pb THAT
ME: pb TO; fb THAT
MY: pb preposition
THAT: fb IS; fb personal pronoun
THE: as fws; as 2 lws

pb = preceded by; fb = followed by; fws = first word of sentence; 2 lws = second last word of sentence.
* = values of chi-squared which are significant at the 95% confidence level (i.e., greater than 3.8).
further material examined by us and written in 1977. As St Germain can have had no foreknowledge that his later text would at any time be the subject of a stylistic examination, and therefore no motive for attempting to use the same habits as in his previously accepted writings, our result provides strong support for Morton's method as applied in this case. Clearly St Germain's habits remain the same despite the three year difference between the original material and the later document, although, as we have not examined all the possible habits, simply those listed by Morton in his report, the possibility that other habits may have changed cannot be excluded. Even so the consistency between the two sets of material, considering the quite major differences Morton observed between the accepted and disputed utterances, is remarkable.
Stylometry and disputed statements It is our contention that the use of stylometry for the comparison of utterances made to the police in the form of statements or questionnaires
TABLE 4. A comparison of the accepted utterances of St Germain with the accepted later utterances of St Germain

Columns: Habit; Accepted utterances; Later utterances; Chi-squared

Habits compared, by mark word:
A: pb preposition
HAVE: fb verb
I: fb HAVE
IT: pb THAT
ME: pb TO; fb THAT
MY: pb preposition
THAT: fb IS; fb personal pronoun
THE: as fws; as 2 lws

fb = followed by; pb = preceded by; fws = first word of sentence; 2 lws = second last word of sentence.
* = values of chi-squared which are significant at the 95% confidence level (i.e., greater than 3.8).
can only be tested by the examination of utterances known to have been made by the same individuals. In this context we have examined three statements made to the police at different times and concerning different events by one person (J Smith): a statement regarding events to which Smith was a witness, written by himself in June 1981, comprising 9433 words (A); responses to a police questionnaire, the verbal answers being recorded by a police officer, in May 1981, comprising 2653 words (B); and a statement regarding events for which Smith was a defendant, recorded by a second police officer, in May 1981, comprising 3731 words (C). We have examined habits in all three texts. If no significant differences are found between them, considering the varied nature of the means of recording the utterances, both verbal and written, and the different individuals involved, then the validity of stylometry is supported.

The authenticity of these statements is beyond doubt, and there is no dispute of this. For the purposes of this work however we have taken the contents of the third statement (C) as being in question and the contents of the statements A and B as being authentic, in order to be able to make a realistic comparison of the habits in these three texts. It is a vital step in applying Morton's method to determine what habits of the author are consistent in his utterances, and to assess this consistency at the level of the questioned material; that is, we should attempt to establish the consistency of habits in utterances containing approximately the same number of words as the questioned material, in this instance 3731. For this purpose we have divided statement A into three approximately equal parts, and have examined the consistency of habits in the four texts, A1, A2, A3 and B (containing 2925, 3117, 3292 and 2653 words respectively). The results of this comparison are given in Table 5.
Of 18 habits where counts were high enough to permit analysis, 15 were found to be consistent. These 15 consistent habits were then compared with the same habits in statement C. The results of this comparison are given in Table 6. Of the 15 habits 3 proved to be significantly different. As an illustration of the level of differences in habits to be expected from random comparisons of texts, we have compared the habits of St Germain, in his later utterances, with the habits of Smith in statement A, as far as is possible within the constraints of consistency and available habits. These results are given in Table 7. Of 16 habits examined only three proved to be significantly different. The results of the comparisons made in the St Germain case and of the material uttered by Smith have been given in some detail to illustrate the collocations that can be used, the levels of collocation occurrences that can be expected and the values of chi-squared that are determined.
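The consistency check across the four texts A1, A2, A3 and B compares more than two samples at once, which is why the critical value quoted for Table 5 is 7.8 (three degrees of freedom) rather than 3.8. A minimal sketch of such a test is given below. It assumes the standard Pearson test of homogeneity; the habit counts are hypothetical, chosen only for illustration, since the counts of Table 5 are not reproduced here.

```python
def chi_squared_homogeneity(hits, totals):
    """Pearson chi-squared test that k samples share one habit rate.

    hits[i]   = occurrences of the habit in sample i
    totals[i] = occurrences of the mark word in sample i
    Returns (statistic, degrees_of_freedom) with df = k - 1.
    """
    if len(hits) != len(totals) or len(hits) < 2:
        raise ValueError("need matching counts for at least two samples")
    pooled = sum(hits) / sum(totals)  # habit rate if all samples are alike
    if pooled in (0.0, 1.0):
        raise ValueError("habit occurs never or always; test undefined")
    stat = 0.0
    for h, t in zip(hits, totals):
        expected_hit = t * pooled
        expected_miss = t * (1 - pooled)
        stat += (h - expected_hit) ** 2 / expected_hit
        stat += ((t - h) - expected_miss) ** 2 / expected_miss
    return stat, len(hits) - 1


# 95% critical values of chi-squared for 1, 2 and 3 degrees of freedom.
CRITICAL_95 = {1: 3.84, 2: 5.99, 3: 7.81}

# Hypothetical counts for one habit in four texts (illustration only).
hits = [30, 33, 35, 25]
totals = [80, 90, 85, 70]

stat, df = chi_squared_homogeneity(hits, totals)
print(f"chi-squared = {stat:.2f} on {df} degrees of freedom; "
      f"consistent at 95% = {stat <= CRITICAL_95[df]}")
```

With two samples the same function reduces to the 2 x 2 comparison used for the disputed-versus-accepted tables, so one routine could serve both steps.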
Finally, we have compared the habits of St. Germain in his later utterances, and the habits of Smith in the "accepted" utterances of statement A, with the habits of a third writer Brown. Brown's habits were determined from utterances in an admitted interview and were checked for consistency against a transcript of his evidence given at his trial. The results of these comparisons are not presented in detail but are given as part of Table 8.
TABLE 5. A comparison of the "accepted" utterances of Smith

Columns: Habit; Statement A1; Statement A2; Statement A3; Statement B; Chi-squared

Habits compared, by mark word:
A: pb preposition; fb adjective; fb X OF
FOR: fb THE
IN: as fws; fb THE
NO: NO / (NO + NOT)
OF: fb THE
ON: as fws; fb THE
THE: as fws; fb X AND; fb XX THE; as 2 lws
TO: fb BE; fb THE; bracketed by verbs

pb = preceded by; fb = followed by; fws = first word of sentence; 2 lws = second last word in sentence.
fb X OF = followed by any word and then the word OF
fb X AND = followed by any word and then the word AND
fb XX THE = followed by any two words and then the word THE
* = values of chi-squared which are significant at the 95% confidence level (i.e., greater than 7.8)
For explanation of statements A and B, see text.
This contains a summary of the number of habits compared and the number of habits found to be significantly different, in the various comparisons we have made of utterances in the form of Criminal Justice Act statements or interview records. It should be noted that all the material on which the summary in Table 8 is based is accepted by the persons concerned as being correct and true records of their utterances (although for the purpose of this research part of the utterances of Smith have been deemed to be "disputed"). The habits used have been checked for consistency and have
TABLE 6. A comparison of "disputed" utterances of Smith with "accepted" utterances of Smith

Columns: Habit; Statements A1 + A2 + A3 + B; Statement C; Chi-squared

Habits compared, by mark word:
A: pb preposition; fb adjective; fb X OF
FOR: fb THE
IN: fb THE
NO: NO / (NO + NOT)
OF: fb THE
ON: as fws; fb THE
THE: fb X AND; fb XX THE; as 2 lws
TO: fb BE; bracketed by verbs

pb = preceded by; fb = followed by; fws = first word of sentence; 2 lws = second last word of sentence.
fb X OF = followed by any word and then the word OF
fb X AND = followed by any word and then the word AND
fb XX THE = followed by any two words and then the word THE
* = values of chi-squared which are significant at the 95% confidence level (i.e., greater than 3.8)
For explanation of statements A, B and C, see text.
been subject to scrutiny for validity, that is, for the influence of context and other effects which may distort the frequency of occurrence of the various habits. It can be seen that within the writing of both St Germain and Smith, the comparison of later material with earlier material already checked carefully for consistency has revealed 2 and 3 differences, respectively, significant at the 95% level. These are clearly habits in which St Germain and Smith are not consistent in their usage and yet the normal rigorous tests for consistency have not detected them. We conclude that a significant difference in 2 or 3 habits between accepted and disputed texts could occur by chance and should be
TABLE 7. A comparison of the later utterances of St Germain with the "accepted" utterances of Smith

Habit                     St Germain   Smith   Chi-squared
A (total)                 125          167
  pb preposition          28           51      2.4
  fb X OF                 37           37      2.1
THE (total)               343          544
  as fws                  10           45      10.4*
  fb X AND                6            18      1.9
  fb XX THE               23           35      0.02
  as 2 lws                16           26      0.00
TO (total)                295          332
  fb BE                   20           20      0.1
  fb THE                  35           40      0.00
  bracketed by verbs      63           59      1.3

Habits also listed, for which no counts are reproduced here: FOR fb THE; IN fb THE; NO / (NO + NOT); OF fb THE; ON as fws; ON fb THE.

pb = preceded by; fb = followed by; fws = first word of sentence; 2 lws = second last word of sentence.
fb X OF = followed by any word and then the word OF
fb X AND = followed by any word and then the word AND
fb XX THE = followed by any two words and then the word THE
* = values of chi-squared which are significant at the 95% confidence level (i.e., greater than 3.8).
ignored. Table 8 also shows that comparison of material uttered by different persons produces 2 or 3 significant differences; that is, the tests used for stylistic analysis could not differentiate the utterances of St Germain, Smith and Brown to a level greater than might occur by chance. We view the results shown in Table 8 almost as a natural consequence of attempting to apply methods of stylistic analysis, developed for use in lengthy literary texts, to utterances, both written and verbal, which are relatively short, contain many utterances that are answers to questions, and almost invariably give low frequency counts for the various habits that can be studied. The difficulty in comparing the accepted utterances of St Germain, Smith and Brown is in finding habits that give sufficiently high frequency counts for chi-squared tests to be valid, utterances which can be tested for consistency and utterances which are free from context effects. In addition, counts relating to word positions within sentences may be affected when one person (e.g., a police officer) writes down and punctuates the verbal utterances of another person.
TABLE 8. Summary of habits in comparisons of utterances in statements

Statements compared                                                         Comparable habits   Different habits
"Accepted" utterances of St Germain vs. later utterances of St Germain      11                  2 (Table 4)
"Accepted" utterances of Smith vs. "disputed" utterances of Smith           15                  3 (Table 6)
Later utterances of St Germain vs. "accepted" utterances of Smith           16                  2 (Table 7)
"Accepted" utterances of Smith vs. accepted utterances of Brown             10                  2
Later utterances of St Germain vs. accepted utterances of Brown             12                  3
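How often 2 or 3 "significant" differences would arise by chance alone can be gauged with a simple binomial calculation. The sketch below assumes that each habit is tested independently at the 5% level. That is an idealisation, since several habits share a mark word and are therefore not independent, so the figures it prints are a rough guide and not a result of the study. The function name and the row labels are illustrative.

```python
from math import comb


def prob_at_least(k, n, p=0.05):
    """Probability of k or more significant results among n independent
    tests when each has probability p of a chance 'significant' result."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


# (comparison, comparable habits, different habits) as given in Table 8.
table8 = [
    ("St Germain accepted vs. St Germain later", 11, 2),
    ("Smith accepted vs. Smith 'disputed'", 15, 3),
    ("St Germain later vs. Smith accepted", 16, 2),
    ("Smith accepted vs. Brown", 10, 2),
    ("St Germain later vs. Brown", 12, 3),
]

for label, n, k in table8:
    print(f"{label}: P(at least {k} of {n} by chance) = {prob_at_least(k, n):.3f}")

# The rule of thumb quoted for about 20 habits: one chance difference is
# expected on average, and six or more would be very unlikely.
print(f"expected chance differences in 20 habits = {20 * 0.05:.0f}")
print(f"P(at least 6 of 20 by chance) = {prob_at_least(6, 20):.5f}")
```

Under the same assumption, the calculation is consistent with treating six differences in about twenty habits as far beyond chance while treating one or two as unremarkable.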
Conclusions
From our study of Morton's work, and our own, perhaps limited, essays in this field, we draw the following conclusions.

Morton's work is worthy of more attention and study than it has received to date, but many of the basic questions raised by Bailey and others regarding Morton's application of stylometry to forensic problems remain unanswered.

Stylometry is very time consuming. The method is easy to understand and to apply but even with computer assistance, analysis of statements and other utterances is a lengthy process. Consistency in habits can only be determined by examination of a range of utterances by the same person and
context effects can only be removed by very careful consideration of the text concerned. Even after very careful checking for consistency, 2 or 3 differences may well be encountered in comparing 'consistent' material with a further text and the number of differences found between utterances of a few thousand words by different persons may well be quite small, again of the order of 2 or 3.

Stylometry can only be applied to relatively lengthy utterances. In statements made to the police, utterances of over 1000 words are rare and 1000 words is the very minimum to which any attempt at stylistic analysis could be applied. We have experienced severe difficulty in obtaining sufficient lengthy utterances for internal comparisons, such as that discussed above in the case of Smith, to be made. The number of occasions when stylistic analysis can be used in an attempt to confirm or refute allegations that utterances have been fabricated by police officers will inevitably be few.

Whilst the results of the St Germain case examinations carried out by Morton and us are striking, we can only agree with Lord Scarman that Morton's methods cannot as yet be regarded as having a broad enough base to be accepted as wholly valid; we would, however, recognise the value of the method as a tool for the investigator.

Our own contributions to this field are the results of some ten years study of Morton's methods; much remains to be done and we report our findings in the hope of stimulating interest in this application of literary scholarship to practical and immediate problems and in the expectation that, in the course of time, sufficiently large numbers of utterances will be examined and compared to substantiate or refute Morton's hypothesis.

References
1. Gudjonsson GH and Haward LRC. Psychological analysis of confession statements. Journal of the Forensic Science Society 1983; 23: 113-120.
2. Morton AQ. Literary Detection. London: Bowker, 1978.
3. Hockey S. A Guide to Computer Applications in the Humanities. London: Duckworth, 1980.
4. Mosteller F and Wallace DL. Inference and Disputed Authorship: The Federalist. Reading, Mass: Addison-Wesley, 1964.
5. Michaelson S and Morton AQ. Last words: a test of authorship for Greek writers. New Testament Studies 1971-72; 18: 192-208.
6. Michaelson S and Morton AQ. Positional Stylometry. In: The Computer in Literary Studies. Ed. Aitken. Edinburgh: Edinburgh University Press, 1972.
7. Morton AQ, Michaelson S and Hamilton-Smith N. To Couple is the Custom: A General Stylometric Theory of Writers in English. Internal Report CSR-22-78. University of Edinburgh: Department of Computer Science, 1978.
8. Wake WC. Sentence-length distributions of Greek authors. Journal of the Royal Statistical Society 1957; 120: 331-346.
9. Austen J and Another Lady. Sanditon. London: Peter Davies, 1975.
10. Smith MWA. Recent experience and new developments of methods for the determination of authorship. Association for Literary and Linguistic Computing Bulletin 1983; 11: 73-82.
11. Bailey RW. Authorship Attribution in a Forensic Setting. In: Advances in Computer-Aided Literary and Linguistic Research. Proceedings of the Fifth International Symposium on Computers in Literary and Linguistic Research, University of Aston in Birmingham, April 1978.