Information Retrieval

The Autumn Symposium of the Forensic Science Society was held on Saturday, 15th November, 1969, at St. Mary's Hospital Medical School, London. About 150 members and guests attended the meeting. Five of the papers presented are included in this edition of the journal.

A. S. CURRY
Home Office Central Research Establishment, Aldermaston, Berkshire, England

The work of the Information Division of the Home Office Central Research Establishment is described. An account of the searching of the world's scientific literature by manual and computer techniques is given, followed by details of the dissemination of selected items to the Regional Forensic Science Laboratories. The storage of the information and the methods of retrieval are also discussed. The collection of analytical data by the Division is mentioned, together with a description of the feature card system for the identification of an unknown infrared curve.

The subject of this paper is, as far as I can discover, unique in the Forensic Science literature to date, and as such it represents an original contribution from the Home Office Central Research Establishment. It emanates from the Information Division, and I am merely the spokesman for the staff in that particular section of the Establishment. It may be of interest to recall that the Central Research Establishment came into being only three years ago, and that the Directors of the Regional Laboratories put the provision of a centralised information service high on their list of priorities for the new Establishment. In this paper I will describe our progress to date.

The Beginnings of the Information Service

Ever since the beginning of the Forensic Science Service each member of staff had been responsible for keeping himself scientifically up to date and well informed. Most laboratories are near University centres, and in addition to the use of their libraries a number of journals and books were bought for use within the laboratory. In addition, each laboratory had built up, to varying degrees, analytical collections of poisons and drugs and, in some cases, of infrared and ultraviolet curves. There were also certain specialised collections, often still tended by their originators. It was obvious that this division between a literature service and the building up of "collection" data had to be kept, and our early thoughts went little further than hoping to make the existing arrangements more effective. Our first task, therefore, was to ensure that the "desirable" journals were bought at the Central Research Establishment, so we ordered them and put in subscriptions to Chemical Abstracts and Analytical Abstracts; we then sat back waiting for them to arrive. It did occur to us that somehow the "information" we were going to get from these sources would have to be reproduced so that we could circulate it to the Regional Laboratories. I felt very strongly that it would be of little value for the Central Research Establishment to act merely as a store; the information had to be in the operational laboratories, easily accessible to the scientists doing case work. It was at this stage that we discovered that before one can get a copying machine its projected use has to be investigated by the Operations and Methods branch of the Home Office Establishment and Organisation Department. It was our contacts with Operations and Methods, and later with Mr. B. Terry of H.M. Treasury, that turned us from enthusiastic amateurs to whatever degree of professionalism in Information we have to-day. They asked searching questions, made us estimate in detail, arranged demonstrations of manufacturers' equipment, measured our requirements against practicalities and, in short, gave us every possible assistance. At the same time contacts were established with Dr. A. K. Kent at the Chemical Society Computer Unit at Nottingham University, and plans for data collection coding were helped by discussions with Dr. W. E. Batten, then at ICI and now the new Head of the United Kingdom Chemical Information Service. Slowly a pattern emerged: firstly, the information had to be found; secondly, it had to be copied and circulated; thirdly, it had to be indexed so that any item could be easily retrieved. Let me consider these separately.

The Various Sources of Information

To find the information is a massive task for the forensic scientist. His interests cover every conceivable field: chemical, from dyes to explosives, plastics and petroleum; biological, from serology and immunology to hair morphology and botany; toxicological, from arsenic and antibiotics to tranquillisers and rat poison. In addition, his interests include new instrumentation, new techniques, new products and, in short, everything the world of science has to offer that may be of use in crime detection. As some measure of the size of this field, it is pertinent to remember that over 1,000,000 papers appear annually in the chemical field alone. So far I have only considered the "science" side, but the Home Office pathologists are not forgotten: Forensic Pathology is also an area the Central Research Establishment attempts to cover in its search through the world's literature. Somehow the world's scientific literature had to be brought to the Central Research Establishment; in practice four ways have been found to give adequate coverage. Firstly, there are a number of journals in which, experience has shown, papers of interest frequently occur. Naturally one of the first orders went for the Journal of the Forensic Science Society! Secondly, there are published collections of Abstracts; thirdly, a useful publication called Current Contents, which lists the title pages of journals; and lastly, computer searching of commercially produced literature abstract services. At the Central Research Establishment all these methods are used regularly. There is a clear division here: collections of abstracts are inevitably published a considerable time after the original papers, whereas the other methods (literature searching, Current Contents, and computer techniques) aim to reach information as soon after publication as possible; they are called "current awareness" techniques.
Some measure of the efficiency of computer use may be gathered from the fact that on a number of occasions we have received details of a paper of interest from the fortnightly interrogation of the Nottingham computer, which uses Chemical Titles and Chemical and Biological Activities (CBAC) tapes flown over from the States, only to find, on going to the library for the journal, that the issue had not yet arrived from the publishers! The abstracting, programming and tape production, done at the proof stage, had beaten the printing of the journal.

The Use of the Computer

As one of the original users of the Chemical Society Unit at Nottingham, we were able to contribute towards a study of the broad profile type of search, which is carried out in the following way. The computer stores the authors' names, the titles of papers, the journals and, for CBAC (Chemical and Biological Activities), a digest of the paper; the user interrogates the store by using "key words", that is, a word or words which describe his particular interest. If the word exists in the store, the computer prints out details of the paper in which it occurs. In this way it is possible to search for papers of interest in over 1,000 journals every two weeks. This may at first seem the ideal way to search the literature; after all, all one has to do is to dial in one's key words and sit back. In fact one cannot fully realise this dream, for the forensic scientist's interests are so wide that many thousands of key words would have to be used and the output would be phenomenal. Indeed, using only 23 key words, a regular output of about 150 titles is received at the Central Research Establishment every two weeks. Over 50% of these have a bearing on our type of work; the remainder are of no value, having been retrieved because the key word occurs, but in the wrong context. Trying to get copies of the possibly relevant papers from the libraries is itself a major task, and in practice only the obviously highly relevant papers can be tackled, the other titles being retained for future use. The other point about computer searching is that it is impractical, and indeed too expensive, to search back through the literature. If one suddenly becomes interested in a particular subject, the best way to do a literature search is still to look it up in the Decennial Abstracts. Putting the subject on the computer profile will only tell you whether anything has been published on it in that particular week and, if it stays on the profile, in the week after next, and so on. One exception to this statement is that MEDLARS, the computer service covering the medical literature, did re-run all its tapes looking for "Cannabis"; it has published a list of the nearly 50 titles retrieved.
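The broad profile search described above can be sketched in a few lines of Python. This is an illustration only, not the Nottingham unit's program: the `Record` fields, the case-insensitive whole-word matching rule and the sample titles are all assumptions. The second hit shows how a key word found in the wrong context ("glass" in a polymer paper) is still retrieved and must be weeded out by hand.

```python
import re
from dataclasses import dataclass


@dataclass
class Record:
    authors: str
    title: str
    journal: str
    digest: str = ""  # CBAC-style digest; left empty for title-only records


def run_profile(records, key_words):
    """Return (record, matched key words) for each record containing a key word.

    Matching is a case-insensitive whole-word search over title and digest,
    so a key word used in the wrong context is still retrieved.
    """
    patterns = {kw: re.compile(r"\b" + re.escape(kw) + r"\b", re.IGNORECASE)
                for kw in key_words}
    hits = []
    for rec in records:
        text = f"{rec.title} {rec.digest}"
        matched = [kw for kw, pat in patterns.items() if pat.search(text)]
        if matched:
            hits.append((rec, matched))
    return hits


# Hypothetical fortnight of title records and a hypothetical profile.
SAMPLE = [
    Record("A", "Detection of barbiturates in blood by gas chromatography", "J1"),
    Record("B", "Glass transition temperatures of polymer blends", "J2"),
    Record("C", "Refractive index of glass fragments", "J3"),
    Record("D", "Enzyme kinetics in yeast", "J4"),
]
PROFILE = ["barbiturates", "glass", "cannabis"]

if __name__ == "__main__":
    for rec, matched in run_profile(SAMPLE, PROFILE):
        print(matched, rec.title)
```

Simple word matching like this is exactly why a fixed profile produces both useful hits and "false drops"; judging relevance still needs a forensic scientist.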

The Detailed Study of the Titles Obtained

The searching of computer output, abstracts, Current Contents and the journals has to be done manually; there is no short cut. I am often asked "how do you decide what is relevant?" There is no magic formula; one has to use experienced forensic scientists to weigh the value of a paper within their own expertise. At the Central Research Establishment, heads of scientific divisions are responsible for marking up papers in journals provided for them by the Head of Information Division, Mr. M. Swain. When the marked copies are returned, Xerox copies are made of the relevant pieces, and Mr. Swain then decides which pieces of information, abstracts and full copies of papers should be circulated to the Regional Laboratories in the next monthly circulation. The remaining papers are possibly pertinent to research topics, concern instrumental techniques not yet used in operational work, or are so specialised that they can be stored centrally at the Central Research Establishment, available on receipt of a query from a Regional Laboratory. They make up the Central Research Establishment's Central Bank of Information. A considerable amount of effort is put into library searches for full copies of papers which the original search has revealed only as a title or abstract. The National Lending Library, the British Medical Association Library, Reading University and the Atomic Weapons Research Establishment are used regularly, and may I particularly mention the National Institute for Research in Dairying, in Reading, to whose library Mr. M. Swain makes a pilgrimage usually at least twice a week. The Police College at Bramshill is yet another most helpful contact.

Information Obtained from the Laboratory Staff

I have, so far, made no mention of potentially the most useful source of information, that is, the members of the staffs of the Regional Laboratories themselves. In the course of their investigations into case work they often seek out information that would be of interest to other laboratories; by sending a copy to the Central Research Establishment they have the circulation done for them. This aspect of information retrieval ("user participation", as it is called) is not as highly developed as I would like to see it. It may be that the value of shared information is not yet fully appreciated, but I am certainly convinced that if all the Regional Laboratory staffs realised their responsibilities in this area, our efficiency in Information Retrieval would be much higher. One area in which this type of retrieval is highly developed is the collection from the Regional Laboratories of analytical results in cases of sudden death involving poisons and drugs. The results are correlated and circulated in the form of a "Register of Human Toxicology". In this way the operational scientist, faced with a request for analysis for a relatively uncommon poison, knows that the Central Research Establishment will have a literature file on it, and that if a similar analysis has been tackled by the Forensic Science Service in the last three years, not only are the results at his elbow but he knows where to contact the scientist who did the work in other relevant cases. Human contact is, of course, one of the most important forms of information sharing, and the Colloquia held at the Central Research Establishment at Aldermaston constitute another major way in which contact within the Service is encouraged. To date there have been 27 colloquia with invited speakers, not only from within the Service but also from Industry and the Universities.

The Copying and Circulation of the Information

From all these sources the search of the literature leads to an accumulation of data in the bank, and a circulation of highly relevant material is made monthly. This monthly circulation to the Regional Laboratories is divided into 5 folders of different colours, into which papers involving Biology, Chemistry, Toxicology, Pathology and General Interest are put. Each folder carries a single-page sheet listing the contents of the whole circulation.
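The make-up of a monthly circulation can be pictured with a small Python sketch: each selected paper goes into one of the five subject folders, and a single contents sheet covering the whole circulation is placed in every folder. The `(title, subject)` input format and the function name are hypothetical; the folder colours are not given in the text and are left out.

```python
from collections import defaultdict

FOLDERS = ("Biology", "Chemistry", "Toxicology", "Pathology", "General Interest")


def make_circulation(papers):
    """Sort (title, subject) pairs into folders and build one shared contents sheet."""
    folders = defaultdict(list)
    for title, subject in papers:
        if subject not in FOLDERS:
            raise ValueError(f"unknown subject: {subject}")
        folders[subject].append(title)
    # One contents sheet, listing the papers in every folder, goes into each folder.
    sheet = [f"{subject}: {title}" for subject in FOLDERS for title in folders[subject]]
    return {subject: {"contents": sheet, "papers": folders[subject]}
            for subject in FOLDERS}


if __name__ == "__main__":
    month = make_circulation([("Paper 1", "Toxicology"), ("Paper 2", "Biology")])
    print(month["Chemistry"]["contents"])
```

Because every folder carries the same sheet, a reader holding any one folder can see what else arrived that month.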
In 1967 and 1968 the circulation covered 2,350 papers, most of them in full, although a significant number are, of course, retrieved through abstracts. (At the Central Research Establishment the whole of Chemical Abstracts back to 1907 is held on microfilm.) The data are circulated as Xerox copies; the original machine was a Xerox 420, and our current one is a Xerox 720. Using the original machine, we made 10 copies of 1,000 infrared curves from the Harrogate collection in 34 days; the estimated rate with the new machine is 5,000 copies a day. Since the Central Research Establishment started, over 350,000 copies have been made, and if one is turning out 10 copies at a time the cost is approximately 2½d. a sheet.
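The copying figures above can be turned into a quick throughput comparison. This Python sketch uses only the numbers in the text and takes the 34 days at face value as elapsed time for the whole job; the text does not say how many hours a day the machine ran.

```python
# Figures quoted in the text for copying the Harrogate infrared curves.
copies_needed = 10 * 1_000   # 10 copies of each of 1,000 curves
days_on_xerox_420 = 34
rate_xerox_720 = 5_000       # estimated copies per day on the new machine

rate_xerox_420 = copies_needed / days_on_xerox_420   # copies per day achieved
days_on_xerox_720 = copies_needed / rate_xerox_720
speed_up = rate_xerox_720 / rate_xerox_420

if __name__ == "__main__":
    print(f"Xerox 420: about {rate_xerox_420:.0f} copies a day")
    print(f"Xerox 720: the same job in about {days_on_xerox_720:.0f} days")
    print(f"roughly {speed_up:.0f} times faster")
```

On these figures the old machine managed roughly 294 copies a day, so the estimated rate for the new machine would do the 34-day job in about 2 days.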

Storage Problems and the Use of Microfilm

This, you can see, is beginning to add up to an awful lot of paper, and it was apparent from the beginning that although after 10 years we would have a magnificent Forensic Science literature collection, we would probably have to build a library in each Region to hold it. The answer to the problem has been the use of microfilm and the provision of a 3M type 400 Reader/Printer to each Regional Laboratory. Each year's collection of 2,000 sheets of foolscap, complete with its own index, now occupies a space about the size of a packet of 20 cigarettes. Any page that is required for case notes can be copied immediately. The cost of the Reader/Printer is approximately £800 ($2,000) and each printed page costs about 1/- (12 cents). I would like to say a few words about the use of microfilm; for all of us used to sitting down with a book almost horizontal on our knees, it does require a major effort to accept the vertical screen. Many people believe that microfilm gives an inferior image; if this is so, then the film has been badly made, for a good microfilm image should be as good as a printed page. We do have problems in that often one is filming a copy of a copy, and then the master film itself has to be copied to send a cassette to each Regional Laboratory, so that the final microfilm print-out may sometimes be a fifth-generation copy. At each copying stage definition is lost, but even so the print is quite acceptable. The point is that microfilm reading is different from book reading; it requires a little practice to get used to, but the ability to flick through the pages, slowly or quickly, refer to the index and retrace one's reading is equally available with microfilm. I hope the Regional scientists are practising this technique; to date they have 8 completed cassettes, about 16,000 pages of data, and soon the bank at the Central Research Establishment is to be microfilmed so that all the holdings are available in the Regional Laboratories. Let there be no mistake about it: microfilm is here to stay, its use is increasing in libraries, and there is little doubt that journals and books will soon be published on it. I mentioned that we had Chemical Abstracts back to 1907 on microfilm; it fits into about 1 cubic foot instead of several library racks.

Fig. 1. This photograph shows a reader/printer being used with a collection of boot and shoe print impressions on roll film. Second from right is a microfiche viewer, and nearest to the camera is the infrared retrieval system.

The Updating of the Information

So far I have only considered roll film, that is, a continuous length of film which, for our purpose, is 16 mm wide and 100 ft. long and takes 2,000 pages of print. The cost of making a film is about £13 ($30), with £2 ($5) for each copy. Although this is an ideal storage medium, it has the shortcoming that one needs the capacity to "update" the bank.
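The roll-film figures lend themselves to a quick check. This Python sketch takes the 100 ft length, the 2,000-page capacity and the £13 and £2 costs from the text; it assumes pages are spread evenly along the whole film (ignoring leader), takes 10 copies per issue from the mention of "10 more films for the Regional Laboratories", and uses the pre-decimal £1 = 240d.

```python
# Roll-film figures quoted in the text.
FEET_PER_ROLL = 100
PAGES_PER_ROLL = 2_000
MASTER_COST_GBP = 13   # cost of making a film
COPY_COST_GBP = 2      # cost of each copy
REGIONAL_COPIES = 10   # assumption: one copy per Regional Laboratory, 10 in all
PENCE_PER_POUND = 240  # pre-decimal currency

# Film length per page, assuming pages are spread evenly over the whole roll.
inches_per_page = FEET_PER_ROLL * 12 / PAGES_PER_ROLL
mm_per_page = inches_per_page * 25.4

# Cost of one issue: a master film plus a copy for each Regional Laboratory.
cost_per_issue_gbp = MASTER_COST_GBP + REGIONAL_COPIES * COPY_COST_GBP

# Cost of one page on a film copy, in old pence.
pence_per_page_on_copy = COPY_COST_GBP * PENCE_PER_POUND / PAGES_PER_ROLL

if __name__ == "__main__":
    print(f"{inches_per_page:.1f} in ({mm_per_page:.1f} mm) of film per page")
    print(f"£{cost_per_issue_gbp} for a master and {REGIONAL_COPIES} copies")
    print(f"{pence_per_page_on_copy:.2f}d per page on each copy")
```

On these assumptions a page on a film copy costs about 0.24d, roughly a tenth of the 2½d quoted for a Xerox sheet.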

To take an example, the file on barbiturates at the Central Research Establishment, containing hundreds of reprints or original papers, is now so bulky that it will take one cassette to hold it; but that is not the end of our acquisition of data on barbiturates, and papers of interest will continue to appear. It is neither convenient nor practical to film these as soon as they arrive, nor is it practical to cut a master film, insert a few more pages (or images) at the end, splice it up and turn out 10 more films for the Regional Laboratories every time updating is required. With this technique, every update would mean reproducing the whole of the original. The alternative is to update using 6" x 4" film called "microfiche". In this method, when only 60 or so pages of new acquisitions on a subject have been collected, the pages are photographed, fed into a jacket and copied directly by diazo copying, producing very cheap copies quickly and easily. In this way one can create a card index of subjects, the only difference being that this card index is of film. Updating then becomes easy, cheap and quick to handle. In the Regional Laboratories the microfiche reader is separate from the 3M Reader/Printer, although I believe 3M now make an attachment for fiche.
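The argument for microfiche updating can be made concrete with a simple page-count model in Python. This is a hypothetical illustration, not a costing from the text: it assumes that updating a roll means re-filming the whole accumulated file, while a fiche update films only the new pages, and it ignores splicing labour and diazo costs. The 1,700-page starting file (kept below one 2,000-page cassette) and the four 60-page updates are made-up inputs.

```python
def pages_filmed(initial_pages, update_sizes, method):
    """Pages that must be filmed for one distributed copy over a run of updates.

    method="roll":  each update re-films the whole accumulated collection,
                    because the roll cannot simply be extended.
    method="fiche": each update films only the newly acquired pages.
    """
    if method not in ("roll", "fiche"):
        raise ValueError(f"unknown method: {method}")
    total = 0
    held = initial_pages
    for new_pages in update_sizes:
        held += new_pages
        total += held if method == "roll" else new_pages
    return total


if __name__ == "__main__":
    # Hypothetical: a file nearly filling one cassette, updated four times.
    updates = [60, 60, 60, 60]
    print("roll: ", pages_filmed(1_700, updates, "roll"))
    print("fiche:", pages_filmed(1_700, updates, "fiche"))
```

Under these assumptions four roll updates mean filming 7,400 pages against 240 for fiche, and the gap widens with every further update.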

The Retrieval of the Information

I have now reached the stage where the literature has been searched and the information collected and circulated, either on Xerox copies or on microfilm. Plans have also been made for the bank to be available to the Regional Laboratories on roll film, with updating by microfiche. The problem that remains is how "retrieval" of the information is to be achieved; by this I mean how an index is to be prepared of all the data on file. This is indeed a very big problem if more than average efficiency is to be achieved. The obvious ideas of labelling the cassettes and indexing them with a list of titles at the beginning, and of putting the microfiche in alphabetical subject order, are attractive, but a moment's reconsideration reveals many difficulties. Firstly, there is the indexing itself: on which file would one put a paper entitled "A Modification of a Mass Spectrographic Method for Use in Trace Element Analysis of Glass Fragments Used in Survey Work"? It could go on "glass", "instrumentation", "mass spectroscopy", "survey", "trace elements", etc., and even cross-referencing to the actual trace elements involved might be useful. Secondly, having prepared the index, how does one easily and quickly "update" it? An annual updating has already been criticised by our users; they want each monthly circulation to be indexed in a running form. Much thought has gone into this at the Central Research Establishment and with Organisation and Methods. The use of automatic typewriters, as well as special cards for Xerox reproduction, has been considered, but the best long-term bet would seem to be a computer retrieval service. In this way the key words ("glass", "mass spectrometry", etc.) are fed into the computer together with the paper's accession number, and this can be done as soon as a new paper arrives in the laboratory.
The user at his terminal then interrogates the computer by feeding in his key words and receives back a series of accession numbers which send him to his microfilm bank. If a number is a very recent one and the copy of the paper has not yet reached him, he will telephone the Central Research Establishment and ask for a Xerox copy to be put in the post. In this way every Regional Forensic Scientist will have immediate access to the world's scientific literature. The computer we are negotiating to use is the National Police Computer, and you will appreciate that one advantage is that each Regional Laboratory is geographically close to a police terminal. I must stress that what I am discussing here is at the fringe of our current thinking, but every profession has problems of information retrieval, and I pass on our experiences and thinking in the hope that they may be constructive.
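The running key-word index proposed above behaves like an inverted (coordinate) index: each key word points to the accession numbers of the papers filed under it, a paper can be filed under as many key words as needed, and a query returns accession numbers that lead into the microfilm bank. The Python sketch below is an assumption-laden illustration, not the National Police Computer arrangement, which the text does not describe; combining several key words with AND (set intersection) is also an assumed rule, and the accession numbers are hypothetical.

```python
from collections import defaultdict


class KeyWordIndex:
    """Running key-word index: key word -> set of accession numbers."""

    def __init__(self):
        self._postings = defaultdict(set)

    def add_paper(self, accession_number, key_words):
        # Called as soon as a paper arrives, so the index is always current.
        for kw in key_words:
            self._postings[kw.lower()].add(accession_number)

    def search(self, *key_words):
        """Accession numbers indexed under every one of the given key words."""
        sets = [self._postings.get(kw.lower(), set()) for kw in key_words]
        if not sets:
            return []
        return sorted(set.intersection(*sets))


if __name__ == "__main__":
    index = KeyWordIndex()
    index.add_paper(1001, ["glass", "instrumentation", "mass spectrometry",
                           "survey", "trace elements"])
    index.add_paper(1002, ["glass", "refractive index"])
    print(index.search("glass"))                    # both papers
    print(index.search("glass", "trace elements"))  # only the survey paper
```

Filing one paper under several key words removes the "which file?" problem, and adding each paper on arrival gives the running monthly index the users asked for.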

Laboratory Collections of Data

At the beginning of this paper I mentioned laboratory collections; this is a second area involving information handling and retrieval that I would like to discuss. Firstly, there are fairly straightforward collections which can be centrally handled, for example, boot and shoe impressions. The purpose of having a set of patterns in a Regional Laboratory is twofold. Firstly, it assists in crime detection, in that a relatively uncommon pattern may be of value in tracing the suspect; and as many thieves do not leave a pattern bearing the manufacturer's name, a collection is essential. Secondly, the value of the evidence can be judged more accurately if the characteristics of the mark itself enable the scientist to say that a particular pattern of shoe is involved; in these cases, whether the pattern is common or rare has some significance. I do not pretend this is a comprehensive collection, but I know of several cases where this centrally generated collection has been of value. Incidentally, we have found that Industry in general has reacted favourably to a single enquiry for information instead of individual requests from all the Regional Laboratories. Directors of Laboratories have also realised the value of having replies to enquiries made available to all laboratories (what is one laboratory's problem today is another's tomorrow), and have delegated this particular role to the Central Research Establishment. I acknowledge particularly the co-operation of the Shoe and Allied Trades Research Association regarding shoe and boot prints.

The Collection of Ultraviolet and Infrared Curves

The second area of collection gathering involves analytical data. So far, at the Central Research Establishment the emphasis has been on ultraviolet and infrared curves. For the non-scientist, let me say that these are methods of identification involving the comparison of instrumentally drawn curves from the unknown substance with a standard collection of curves from known compounds. The way one handles collections of this type depends on the size of the collection; in some ways the problem is analogous to fingerprint searching, except that the curves are rarely distorted and the collections are smaller.
As far as the ultraviolet curves are concerned, the Central Research Establishment made the curves for 700 "alkaloids" kindly supplied by Professor E. G. C. Clarke, arranged them in easily retrievable form (by coding in ascending order of major peak), microfilmed them and circulated them to the Regional Laboratories. 700 may not seem a large number, but the preparation of these curves and the corresponding very high-quality infrared curves on the Perkin Elmer 225 took a considerable time, well over a year. Infrared curves are much more characteristic of substances than ultraviolet curves: because of the large number of inflections in the curve, an infrared "match" amounts to a fingerprint identification. Experience has shown that over 99% of the infrared identification problems met in the Regional Laboratories can be solved with a master collection of about 3,000 curves, although the total number of curves in existence is of the order of 100,000. I took the decision that the 3,000 should be made available in each Regional Laboratory and that, on the rare occasions when a new compound came along, the Regional Laboratory should refer it to the Central Research Establishment for a full search. At the moment this occurs only about once a week, and so far the Central Research Establishment's Sadtler collection of 55,000 curves has sufficed for the purpose. ICI have kindly offered us the use of their computer search facilities, but so far this year they have not been required. The collection of 3,000 compounds had to be generated, and this was done in collaboration with the staff of the North Eastern Forensic Science Laboratory at Harrogate; it is now in its third phase of updating. Once the compounds had been obtained and the curves made, microfilmed and circulated, the problem of retrieving an unknown had to be faced.
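Coding the ultraviolet curves in ascending order of major peak makes retrieval a matter of looking up a narrow wavelength window. The Python sketch below assumes each curve is reduced to the wavelength of its strongest absorption and uses an arbitrary ±2 nm search tolerance; both are assumptions, and the compound entries are hypothetical.

```python
import bisect


def build_uv_index(curves):
    """Arrange (major_peak_nm, name) pairs in ascending major-peak order."""
    return sorted(curves)


def candidates(index, peak_nm, tolerance_nm=2.0):
    """Names whose major peak lies within +/- tolerance_nm of peak_nm.

    Because the index is sorted, two binary searches find the window
    without scanning the whole collection. The "" and "\uffff" names act
    as low and high sentinels when tuples compare equal on wavelength.
    """
    lo = bisect.bisect_left(index, (peak_nm - tolerance_nm, ""))
    hi = bisect.bisect_right(index, (peak_nm + tolerance_nm, "\uffff"))
    return [name for _, name in index[lo:hi]]


# Hypothetical entries; real curves would come from the circulated collection.
UV_INDEX = build_uv_index([
    (305.0, "compound D"),
    (254.0, "compound A"),
    (273.5, "compound C"),
    (272.0, "compound B"),
])

if __name__ == "__main__":
    print(candidates(UV_INDEX, 273.0))
```

A single-peak code like this narrows the field quickly for a few hundred curves, but the candidate lists it returns grow with the collection, which is the inefficiency the next paragraph describes.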
To search 3,000 images for a match to a complex pattern every time one wanted to do a search was impractical, and as we envisaged an expansion up to a maximum of 10,000, simple codings based on single peaks were also found to be inefficient.

The solution has been found in the use of the "peek-a-boo" type of feature cards, known as coincidence cards. The difference, however, is that our system uses polyester film, and reproduction from the master is done photographically. In this way a cheap, durable product is obtained and reproduction errors are nil. Updating is also greatly simplified. The selection of features for retrieval is a technical problem, and we had the help of the Joint Home Office and Metropolitan Police ADP Computer Unit in deciding on the best combination of features to use. I will not go into detail here about the results of this investigation, as they have already been published (Curry, A. S., Read, J. F., and Brown, C., 1968). Suffice it to say that each Regional Laboratory now has a set of 88 feature cards and two roll films, with which the scientist is able to identify any one of nearly 3,000 compounds of significance in Forensic Science and Toxicology in about 30 seconds. It is hoped that these will soon be available commercially. The system works by selecting 6 major peaks of the spectrum, superimposing the relevant feature cards, and observing the optical coincidence at the retrieval numbers of only those substances with the same characteristics. The system is applicable to other types of retrieval problem and is already being considered for wood identification. I have not time to go further into information gathering and retrieval; our first hesitant steps less than three years ago have led us on a fascinating path into a new technology. My hope is that the users of the Information Division's work find it useful, and that this account has been of interest.
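The optical coincidence principle can be modelled with bit masks: each feature card is an integer whose set bits are the punched holes (one position per retrieval number), and superimposing cards is a bitwise AND, so light passes only where every selected card has a hole. This Python sketch is illustrative only. The 50 cm⁻¹ band features are a hypothetical scheme, not the published 88-card selection, and the example uses two or three peaks rather than six for brevity; the library entries are made up.

```python
class FeatureCards:
    """Optical-coincidence ("peek-a-boo") cards modelled as bit masks.

    Each card stands for one feature; bit i is set (a hole is punched) when
    the compound with retrieval number i has that feature.
    """

    def __init__(self, n_positions):
        self.n_positions = n_positions
        self.cards = {}  # feature label -> int bit mask

    def punch(self, feature, retrieval_number):
        if not 0 <= retrieval_number < self.n_positions:
            raise ValueError("retrieval number out of range")
        self.cards[feature] = self.cards.get(feature, 0) | (1 << retrieval_number)

    def superimpose(self, features):
        """Retrieval numbers where light passes through every selected card."""
        light = (1 << self.n_positions) - 1  # with no cards, every position is lit
        for feature in features:
            light &= self.cards.get(feature, 0)  # a card with no holes blocks all light
        return [i for i in range(self.n_positions) if (light >> i) & 1]


def feature_for_peak(wavenumber, band_width=50):
    """Hypothetical feature: the band (in cm^-1) a major peak falls into."""
    low = int(wavenumber // band_width) * band_width
    return f"{low}-{low + band_width - 1}"


# Hypothetical library: retrieval number -> major peaks (cm^-1).
LIBRARY = {
    17: [1715, 1600, 1250],
    42: [1720, 1605, 1100],
    99: [1730, 1450, 1250],
}

cards = FeatureCards(3_000)
for number, peaks in LIBRARY.items():
    for peak in peaks:
        cards.punch(feature_for_peak(peak), number)

if __name__ == "__main__":
    unknown_peaks = [1710, 1260]
    print(cards.superimpose(feature_for_peak(p) for p in unknown_peaks))
```

Each extra peak adds one more card to the stack and can only remove candidates, which is why six well-chosen peaks are enough to single out one compound among thousands.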

Discussion

Dr. K. D. Amarasingham: Could members of foreign Forensic Science Laboratories be supplied with copies of retrieval information?

Dr. Curry: I regret that, as far as literature coverage is concerned, copyright restrictions do not allow circulation to anyone other than Home Office employees.

Mr. W. G. Scott: Does the system of forensic information and retrieval include information relating to questioned documents?

Dr. Curry: No.

References

CURRY, A. S., READ, J. F., and BROWN, C., 1968, J. Pharm. Pharmacol., 21, 224.

Author's Note

This paper was written in August, 1969. Since that time many advances have been made. Information Scientists interested in recent developments are invited to communicate with the author.