Int. Libr. Rev. (1978) 10, 3-22

Script Conversion and Bibliographic Control of Documents in Dissimilar Scripts: Problems and Alternatives*

HANS H. WELLISCH†
* This paper is partially based on a presentation at the IFLA Worldwide Seminar on Use and Control of Eastern Publications by East and West, held at Seoul, Korea, 31 May-5 June 1976, which has been published in the Proceedings of the IFLA Worldwide Seminar (edited by Ke Hong Park, Dorothy Anderson, and Peter Havard-Williams). Seoul: Korean Library Association, 1976. Permission to use this material has graciously been granted by Mr Ke Hong Park.
† Associate Professor, College of Library and Information Services, University of Maryland, College Park, Md 20742, U.S.A.

The question posed at the end of a previous paper on the subject of script conversion1 (“Is script conversion, and Romanization in particular, really the only reliable bibliographic control tool?”) has actually found its negative answer in the facts and figures revealed by that worldwide survey. Many comments from respondents also made it clear that the method, far from allowing reliable control (i.e. unambiguous identification) of books and other documents in non-Roman scripts, generates bibliographic chaos for librarians as well as for the users of catalogs, bibliographies and other bibliographic control tools.

1 Hans H. Wellisch (1976). Script conversion practices in the world’s libraries. Int. Libr. Rev. 8, 55-84.

Yet, in spite of the obvious shortcomings of script conversion as a bibliographic control tool, it has always been axiomatic in Western librarianship that Romanization is indispensable as an effective and “scientific” method when dealing with manuscripts, books, serials, and other documents not written in the Roman script. When Universal Bibliographic Control (UBC) began, about a decade ago, to move from an age-old dream into the realm of realizable fact, it was (and still is) also asserted that it would be indispensable for its full achievement to convert or “reduce” all bibliographic records written in any script other than the Roman by transliteration or transcription, or (since a mixture of both conversion methods is in practice used for this purpose) by Romanization. It is thus claimed that records in other scripts in which so much of mankind’s epics, poetry, imaginative writing, and scientific thought has been preserved, cannot be integrated into the world-wide network of UBC unless they are stripped of their characteristic form of expression and are “reduced” to romanized forms. Only Romanization, so the conventional wisdom says, will make it possible for bibliographic records to be handled by machines and by Western librarians and bibliographers in general, who (by implication) are thus supposed to be the only ones capable of wielding bibliographic control in the proper manner. Although the proponents of Romanization act mostly in good faith and without any malicious intent, this relentless drive towards wholesale Romanization of bibliographic records for the sake of UBC is in fact harmful to those for whom the system purportedly has been set up, namely the readers and users of written records. One of the most disturbing aspects of the demand for complete Romanization in the name of exercising control over works written in the “strange” and “exotic” scripts of Asia and Africa is the fact that the practice has in recent years been imposed on the libraries and bibliographic tools of countries in which a user naturally expects to find documents organized in his own language and script. This is a tangible expression of haughty disdain for indigenous cultures on the part of Western librarians who seek to graft their systems on the records of those cultures for the sake of “efficiency” and the “introduction of modern methods”. Unfortunately, some librarians in Asian and African countries (especially those who received their training in the United States or in the U.K.) tend to adopt these practices without considering the consequences to their countrymen whom they are supposed to serve.
By using the Roman script as a control tool in their libraries they imply that the indigenous script, though it has produced the national literature, is not good enough to be used for the organization of the books and other documents in which it is recorded. But by degrading the traditional values of their own culture they may end up only degrading themselves as well as the image of their libraries in the eyes of their fellow citizens. They would do well to heed the advice of the Canadian librarian J. M. Elrod, who spent a long time in Korea, and thus gained a better insight into the problem than most of his Western colleagues:

Present Anglo-American rules assume that the approach to the collection will be through the roman alphabet in author and title entry and through English in subject headings. This assumption cannot be made in a collection which serves patrons many of whom know neither the roman alphabet nor English. The librarian in a country with a perfectly good alphabet, character system, or syllabary of its own would find little reason to transliterate authors and titles in his own language into the Roman alphabet nor to use English subject headings.1

1 J. M. Elrod (1972). The two-language collection with the bilingual reader. International Cataloguing 1, 7.

Romanization is indeed by no means the only method by which full UBC of documents in various scripts can be achieved. As we shall see, it is possible to preserve the unique identity of written records in their original form, and yet to integrate them successfully into an international network of bibliographic control, while eliminating the barriers of Romanization which for so long have prevented rather than facilitated access to a large and important part of the written records of mankind. But before we turn our attention to alternative methods of bibliographic control, it is necessary to consider, first, Romanization as such, and second, its historical role in bibliographic control.

ROMANIZATION AND ITS PROPER USES

It should be clearly stated from the outset that Romanization (or any other kind of script conversion, such as Cyrillization, Arabization, Japanization, etc.) is not to be considered either impractical or unnecessary per se. Quite to the contrary, in our age which has reduced the time needed to cover vast physical distances to a matter of a few hours, people of different cultures, speaking and writing different languages in different scripts, meet more frequently than ever; consequently, the written records of various peoples and cultures also come into contact with each other through translations of literature and in the news dispatches of the press, radio and television. This is now also more than ever a multi-directional phenomenon: the Western world, largely using Roman script, must constantly render Russian, Arabic, Hebrew, Indian, Chinese, Korean, Japanese and other names in a form in which they can be read and pronounced by people who do not know those languages and scripts; to the same extent, Russian authors of books and editors of newspapers must render such names and words for the readers of Cyrillic script,2 and Chinese, Korean or Japanese authors or editors must make the names of foreign personalities and events intelligible to their readers. Script conversion is also a necessary operation for the authors of books and articles in history and geography, and especially in cartography, where the names of foreign places must be rendered in the script that is familiar to the users of maps, and in a form that at least approximately renders the sound of those place names.

2 L. Zorin-Obrusnikova (1976). Practical transcription and retranscription. Int. Libr. Rev. 8, 353-5.
Since there is almost never a one-to-one relationship between the spelling and pronunciation in a source script and that of a target script, the results of script conversions are neither uniform nor mutually consistent, and great confusion is sometimes generated, especially when Western names have to be rendered in Chinese or Japanese whose writing systems do not allow for anything but a very vague approximation of pronunciation. But, by and large, the various script conversions produce more or less intelligible results, since in the case of ambiguity the context or other back-up systems (such as the grid coordinates in cartography) furnish sufficient information for the identification of names and words in foreign languages.

ROMANIZATION IN HISTORICAL PERSPECTIVE
Script conversion, and in particular Romanization, has, however, been most widely used for the purposes of bibliographic control. The practice goes back to the days of Imperial Rome when Greek names had first to be rendered into Latin, and to the early Middle Ages when Hebrew names in the Bible were romanized in the Vulgate by St Jerome. Several hundred years later, the names of Arabic authors were Latinized (e.g. Averroes, Avicenna, Alfarabius) and became known in these forms to European scholars; these, however, were Latinizations rather than Romanizations, i.e. there was no attempt to render Arabic script in Roman letters. Romanization of names was introduced into bibliographical practice in the 16th century by Conrad Gessner, the “Father of Bibliography”, and the first to conceive the idea of UBC in his monumental work, the Bibliotheca Universalis. These Romanizations remained, on the whole, the only conversions of names and titles used in the catalogs of libraries and in bibliographies until Russian literature began to appear in the libraries of the West in the late 18th and early 19th centuries. The system that had worked well for Greek and Hebrew script could also be applied to Cyrillic script, and the predominant position of European scholarship in all spheres of learning made it unnecessary to even contemplate the problems which other cultures and their writing systems might encounter with dissimilar scripts (including the various adaptations of the Roman script to European languages and their alphabets).

At the end of the 19th and the beginning of the 20th century, when the zenith of Western colonialism in Africa and Asia had been reached, “Civilization” still meant largely Western culture and ways of thinking; the literature of Asia was studied in the West mainly by philologists who were more interested in the extensional features of language and script than in the intentional character of what those documents contained in spiritual, philosophical and scientific values. A corollary to this was the notion that all books and other documents written in scripts other than the Roman must of necessity be “reduced” to a romanized form if they were ever to be handled in an orderly manner in libraries and bibliographies.

Now that the days of Western colonialism are past, the idea of Romanization as the sole way of exercising bibliographic control over documents in dissimilar scripts may perhaps be considered as one of the last remaining vestiges of European predominance and exploitation in Africa and Asia. If we really wish to achieve the goal of UBC we must abandon the erroneous notions that bibliographic control is a one-way street, and that it can only be achieved through the medium of the Roman script. To be sure, it is the most widely used script in the world,1 but it would be presumptuous to suggest that the vast literature written in Cyrillic, for example, can be made subject to UBC only on condition that all bibliographic references to it be totally romanized, and the same goes for literature written in other scripts, particularly in the logographic scripts of China, Korea and Japan, for which Romanization is indeed least effective and often essentially destructive from the point of view of UBC.

1 For an analysis of the major scripts in terms of literacy of users and book production see Hans H. Wellisch (1975), The relative importance of the world’s major scripts. Libri 25, 238-50.

THE DIFFICULTIES OF ROMANIZATION FOR OPERATORS

We shall now briefly review some of the most obvious difficulties that beset Romanization as a bibliographic control method. As far as librarians and bibliographers are concerned, that is to say, people who actually apply the method as operators of their bibliographic control systems, they are confronted with the following problems:

(a) Multiplicity of schemes
For every non-Roman script there exist today at least three and sometimes up to several dozen different conversion schemes. As our previous survey has shown, there is great diversity of usage, ranging from complete Romanization (or Cyrillization, etc., as the case may be) to entirely separate catalogs for each script and language; as to Romanization, no one scheme can claim to be used by a majority for the conversion of any non-Roman script. There are many societal, technical and economic reasons for this variety of usage, and it would be unrealistic to believe that any degree of uniformity will be achieved on an international scale in the foreseeable future.
(b) Inconsistent application and local adaptations
Even where one single Romanization scheme is in use for a particular script, it is often not consistently applied, and in many cases locally devised “adaptations” are made to national or international schemes, often without any indication to outside users what those changes imply and when or to which names and words they are applied. This in turn results in:

(c) Variant forms of romanized names
Names that have been romanized according to different schemes or national usages may appear in quite different form in various bibliographic control tools.1

1 See H. H. Wellisch (1976). Int. Libr. Rev. 8, 55-84.

The remedy for such bibliographic confusion is often thought to be the adoption of international transliteration standards, at least by the world’s foremost libraries and by national bibliographies. Quite apart from the fact that many of these institutions are either reluctant or unable to adopt a Romanization system that is different from the one they have employed for a long time, the use of international standards would sometimes make things worse rather than better, because at least some of the International Organization for Standardization’s (ISO) Recommendations for transliteration have been designed with the needs of philologists in mind rather than with an eye to practical bibliographic applications. So-called “strict” transliteration which endeavours to reconstruct as faithfully as possible the image of words written in a non-Roman script with the help of Roman letters is, of course, useful and sometimes necessary for philologists and linguists, but it is almost useless for bibliographic purposes. For example, according to Recommendation ISO/R 843-1968, Transliteration of Greek Characters into Latin Characters, the name of the great Greek poet which has been known to the Western world in its romanized form for more than 2000 years as Homeros or Homerus is now to be rendered as ‘Omēros, and would thus be filed in a catalogue under O, not H, where probably few library users would look for it.

But such discrepancies in Romanization of names are minor nuisances compared with the odd and bewildering results of Romanization of Chinese and Japanese names. Most Western librarians seem to believe that Romanization of names written in Chinese characters is as straightforward as that of Greek or Russian names; they do not realize the complexities generated by the homophony of Chinese syllables or by the intricacies of Japanese onomastics.1

1 The following statement by an American librarian in charge of an Oriental collection may be typical: “For some, the omission of Asian script bodies of entries is objectionable; some would even say it renders the reproduction useless. I do not believe this is so. It is, I believe, a matter of degrees of tolerance of lack of information.” (J. Musgrave (1971). Automated bibliographic control of area research materials. In Conference on Access to Southeast Asian Research Material. Proceedings, edited by C. Hobbs. Washington, D.C.: [s.n.], p. 116.) One may well ask: how tolerant are readers expected to be when presented with a completely meaningless jumble of Roman letters which purportedly stand for the name of a Chinese or Japanese author and the titles of their works?

The romanized Chinese syllable hsi, for example, may stand for any of several dozens of Chinese characters, each of which means a completely different thing. The concatenation of three Roman letters hsi does not, therefore, mean anything to a reader who knows Chinese unless he is able to see the corresponding Chinese character; on the other hand, it is not even pronounceable in a manner remotely similar to its actual sound by a person who does not know Chinese, so actually nobody is really served. Generally speaking, romanized Chinese names cannot be disambiguated in order to locate them in Chinese reference tools, unless accompanied by the original script.

[FIG. 1. Romanization of a Japanese name according to different “readings” of the kanji (Chinese characters). The name Kōno can be written in ten different ways. Figure not reproduced.]

As to Japanese names, a person who signs himself [kanji not reproduced] (i.e. with four identical characters) is to be properly romanized as “Hiradaira, Heibei”, while the name of another Japanese author, [kanji not reproduced], may be romanized variously (and in each case correctly) as Kōno, Kōya or Takano, depending on the desired “reading” of the characters; conversely, the romanized form Kōno can be written with no less than ten different characters (see Fig. 1) so that a Japanese reader of a romanized
list of names cannot ever know which author is meant unless he also sees the name written in its original form. It must be emphasized that these examples are by no means artificially contrived; rather, the Romanization of almost every Japanese name is fraught with such ambiguities which can often be resolved only by asking the bearer of a name how he or she wishes it to be pronounced or “read”.1

1 On the Romanization of Japanese names see P. G. O’Neill (1972). Japanese Names: A Comprehensive Index by Characters and Names. New York: J. Weatherhill. See also Jack T. Tsukamoto (1973). Can uniformity in transliterating Japanese personal names be achieved? Proc. 29th Intern. Cong. Orientalists, Paris, 1973.

(d) Reversibility
To any unbiased observer, such results of Romanization would seem to be the exact opposite of control, namely chaos and confusion. But the strong belief of most librarians and bibliographers in Romanization as an effective bibliographic control tool is based on the assumption that any Romanization, if properly or “scientifically” done, is also reversible, i.e. the original script can always be reconstituted from its romanized form. Unambiguous re-transliteration is, however, possible to achieve only for pairs of fully alphabetic scripts, and even there it is in practice mostly sacrificed because of the more potent demands for easy pronounceability and approximation to the phonology of a foreign language rather than strict rendering of the graphemes by transliteration. The Romanization of Cyrillic script as performed by the Library of Congress furnishes good examples: the Cyrillic letter ё is rendered as ë in Romanization from Russian, but as io when romanized from White-Russian, to indicate its proper pronunciation, not its graphic form (which is identical in both languages); the same is true for the letter Г which is rendered as g in Russian and Bulgarian, but as h in Ukrainian and White-Russian. For languages written in scripts that do not indicate vowels at all (or only partially), e.g. Arabic and Hebrew, true reversibility is possible only for consonants but not for vowels, unless the latter appear in the original text which is very seldom the case. Finally, for logographically written languages (Chinese, Japanese and Korean) no reversibility is possible at all, since any rendering of Chinese characters can only be done phonemically, that is by trying to approximate the sound of a character when pronounced in a particular Chinese dialect, in a certain Japanese “reading”, or in Korean, as the case may be. Figure 2 shows in schematic form the extent to which reversibility is possible to achieve in Romanization of various scripts. In conversion from Roman script to a non-Roman one, or between a pair of non-Roman scripts, strict transliteration and reversibility are practically impossible to achieve, because many Roman letters stand
for different sounds not only in the same language (e.g. c, g, h, s, y in English) but are also pronounced differently in different languages (e.g. j in English, French, German and Spanish); also, virtually all orthographies are more or less unphonetic, i.e. they do not reflect the pronunciation of words unambiguously. Any rendering of names and words in languages written in a non-Roman script, whether from Roman script (e.g. English into Russian, German into Greek, etc.) or from another non-Roman script (e.g. Arabic into Russian, Japanese into Greek, etc.) can only be performed by phonological transcription, i.e. by a more or less faithful rendering of the sound of such names and words. Figure 3 shows this in schematic form: graphemic features of a source script (e.g. letters, diacritical marks, etc.) can be converted into target scripts only partially, depending on the characteristics of the latter (more faithfully in an alphabetic script such as Cyrillic, less so in Arabic or Hebrew, not at all in Chinese); phonemic features of a source script have to be rendered to a lesser or greater extent in all alphabetic scripts, and they are indeed the only ones that can be rendered in Chinese or Japanese.

[FIG. 2. Graphemic (G) and phonological (P) features characterizing Romanization schemes for various scripts. Script groups shown: Amharic, Arabic, Armenian, Burmese, Hebrew, Thai, other; Cyrillic, Devanagari, Georgian, Greek; Chinese, Japanese, Korean. Figure not reproduced.]
[FIG. 3. Graphemic (G) and phonological (P) features characterizing schemes for Arabization, Cyrillization, Hebraization, Grecization, Sinization, Japanization, etc. Script groups shown: all alphabetic scripts; Chinese, Japanese, Korean. Figure not reproduced.]
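The reversibility argument under (d) can be made concrete with a small sketch. It is an illustration only, not a reproduction of any Library of Congress table: the dictionaries below are hypothetical toy tables limited to a few letters, the function names are invented for this example, and the Chinese entries use three characters commonly romanized as hsi in Wade-Giles. A conversion is reversible only if no two source signs share the same romanized output.

```python
def romanize(text, table):
    """Convert text sign by sign; signs missing from the table pass through."""
    return "".join(table.get(ch, ch) for ch in text)


def invert(table):
    """Return romanized form -> set of source signs that produce it."""
    inverse = {}
    for source, roman in table.items():
        inverse.setdefault(roman, set()).add(source)
    return inverse


def is_reversible(table):
    """True only if every romanized form has exactly one possible source sign."""
    return all(len(sources) == 1 for sources in invert(table).values())


# Toy graphemic table: one Roman form per Cyrillic letter (Russian usage).
RUSSIAN = {"г": "g", "д": "d", "о": "o", "р": "r"}
# The same letter г rendered by pronunciation, as the text describes for Ukrainian.
UKRAINIAN = {"г": "h", "д": "d", "о": "o", "р": "r"}
# Phonemic rendering of Chinese characters: several characters share one syllable.
CHINESE = {"西": "hsi", "希": "hsi", "喜": "hsi"}

print(is_reversible(RUSSIAN))   # True
print(is_reversible(CHINESE))   # False
# One identically written word, two romanized forms depending on the language.
print(romanize("город", RUSSIAN), romanize("город", UKRAINIAN))  # gorod horod
```

Each alphabetic toy table is reversible on its own, but only if the reader already knows which language table was applied; the logographic table is not reversible under any circumstances, which is the point made in the text.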
(e) Susceptibility to errors
The complexity of non-Roman writing systems (some of which do not express vowels graphically, while others contain letters and letter combinations which are not pronounced, and so on) and the intricacies of certain Romanization schemes combine to produce inevitable errors and mistakes in conversion which will happen no matter how well the Romanizer is acquainted with the source language and its script. Such mistakes are often not detected and corrected at the source, are then transmitted to print, and find their way into catalogs and bibliographies where they wreak havoc with any attempt by users to find out which name or word was written in the original form.

(f) Alphabetization
The inter-filing of entries for documents originally written in Roman script with those that are artificially romanized results in many inconsistencies and leads to further errors; sometimes, two entirely different Romanizations are used within the same bibliographic system: in the Library of Congress catalog, the Cyrillic word stem централ- is romanized tsentral when it appears in a Russian title, but central when it is part of a Serbian title, so that the very same word, identically written, is filed under T and under C; there is, to be sure, a reason for this procedure,1 but the fact remains that Romanization often does not result in uniquely retrievable entries, which runs counter to the whole idea of bibliographic control.

1 Serbo-Croatian is virtually one language, written in Roman script by Croatians in northern and western Yugoslavia, and in Cyrillic script by Serbians in the southern and eastern parts of the country. The Library of Congress adopted the Croatian transliteration of Serbian Cyrillic because (a) the Serbian Cyrillic alphabet contains several letters not found in Russian Cyrillic, for which only Croatian transcriptions exist, and (b) the works of Yugoslavian authors may and do appear in either script or in both scripts, depending on the place of publication.

THE DIFFICULTIES OF ROMANIZATION FOR USERS

Let us now turn to the difficulties experienced by users of bibliographic control systems (which here means users other than librarians themselves). In addition to all the problems discussed before, users are faced with an almost complete lack of keys to a Romanization system which is almost never made explicit to users of catalogs or bibliographies. The burden of deciphering a Romanization in order to arrive at the original form of a name is entirely on the user and it amounts in many instances to exercises in cryptography, or the decoding of a message to which the code key is available only to the senders (the librarians) but not to the receivers (the library users).
While names are sometimes recognizable even in distorted form (especially if the author is well known), the wholesale Romanization of titles is almost useless, the more so since there are generally no dictionaries where words in romanized form can be looked up; such dictionaries exist, to be sure, for Japanese, but they are intended mostly for the use of Japanese readers and not for foreigners. In the case of Chinese titles, a Romanization such as “Kuan yü fa chan kuo min ching chi ti ti i ko wu nien (i chiu wu san nien tao i chiu wu ch’i nien) chi hua chih hsing chieh kuo ti kung pao”1 is completely meaningless for a Westerner, but it is equally useless for a Chinese reader unless it is also accompanied by the original script [Chinese characters not reproduced]. Since in most bibliographic Romanizations the tone indicators for Chinese are omitted, the Romanization of Chinese text becomes even more ambiguous and undecipherable.2,3 What has been said here of Chinese applies to the same extent to Korean, and even more to Japanese. Finally, romanized bibliographic control systems deprive users of the most basic aid to the identification of a name in its original form, namely cross-references from a romanized form of an author’s name or title, e.g. Юшкевич see Juškevič, or [Chinese character not reproduced] see Chao.

1 LC card C60-1496. The example is taken from Lee-Hsia Hsu Ting (1966). Problems of cataloging Chinese author and title entries in American libraries. Library Quarterly 36, 1-13. Translation: “Report on the result of the implementation of the first Five Year Plan (1953-1957) for the national economy”. Note that the dates, given in the original in Arabic numerals, have also been Romanized.
2 “The main complaint of the readers has been that without seeing the Chinese characters it is impossible to read off the meaning of these transliterated headings.” K. Kishibe (1974). “Cataloguing books in Chinese”, International Cataloguing 3, 4.
3 “A user who has only Romanized information often finds it difficult, if not impossible, to make use of a catalog arranged on the basis of characters. This is a common problem, since many publications . . . frequently provide Chinese-language bibliographic information only in Romanized form.” J. D. Anderson (1974). “The arrangement of Chinese-language author-title catalogs”, Library Quarterly 44, 54.

CYBERNETIC ANALYSIS OF SCRIPT CONVERSION AS A CONTROL TOOL

Even the most ardent champions of Romanization have always admitted that the practice is more of a hindrance than a help to those who can read other scripts, but they insisted that Romanization had to be performed in the interests of catalogers and reference librarians, as well as for the benefit of Western scholars, so as to be able to exercise at least a
modicum of bibliographic control over documents in dissimilar scripts, especially by librarians who did not know those scripts and languages, and when only clerical help was available for the duplication and filing of catalog cards provided by some service elsewhere in romanized form. So far, we have only pointed to some operational shortcomings of the method which, after all, is a matter of subjective evaluation and different opinions. But because Romanization is considered to be a method of control, one can examine its effectiveness by applying to it the entirely objective laws of cybernetics, the science of control, which are valid for all control processes, whether in machines, living organisms or conceptual systems. It can be shown that the problem of bibliographic control over documents in dissimilar scripts cannot be solved by graphemic transformations of any kind. To develop the proof for this statement in full would go beyond the framework of this paper, and will be found elsewhere.1 In essence, it depends on the fact that the amount of variety introduced as input to a bibliographic control system by documents written in scripts that are dissimilar from the one that is dominant in the system, cannot be effectively controlled by purely graphemic methods, thereby making the output partially or totally ambiguous and unintelligible, which defeats the primary goal of the whole system.
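The full proof is given in the work cited and is not reproduced here. As a rough illustration of what a loss of variety means, the following sketch counts distinguishable forms before and after conversion, taking variety as the base-2 logarithm of the number of distinct items. That measure, the function names and the four-entry toy table are all assumptions made for this example; the table uses Wade-Giles syllables without tone marks.

```python
import math
from collections import Counter


def variety_bits(items):
    """Variety measured as log2 of the number of distinct items."""
    return math.log2(len(set(items)))


def collisions(originals, table):
    """Count how many distinct originals collapse onto each converted form."""
    return Counter(table[o] for o in set(originals))


# Hypothetical toy input: four distinct characters, romanized without tones.
TOY_ROMANIZATION = {"西": "hsi", "希": "hsi", "喜": "hsi", "高": "kao"}

originals = list(TOY_ROMANIZATION)
romanized = [TOY_ROMANIZATION[o] for o in originals]

print(variety_bits(originals))   # 2.0
print(variety_bits(romanized))   # 1.0
print(collisions(originals, TOY_ROMANIZATION)["hsi"])  # 3
```

In this toy case the input carries two bits of variety and the romanized output only one, so a user who sees hsi alone cannot tell which of three characters was meant.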
1 Hans H. Wellisch (1977). The Conversion of Scripts: Transliteration and Transcription, a Study of Their Characteristics, History and Utilization, pp. 398-413. New York: Wiley-Interscience.

POSSIBLE ALTERNATIVE SOLUTIONS

If Romanization is ruled out as a useful instrument for effective bibliographic control of documents in non-Roman scripts, and particularly of those written in logographic scripts, there is clearly a need for other, more reliable, methods which will provide control over documents for both the operators of bibliographic systems (library catalogs, bibliographies, indexes, etc.) and for the users of these tools who wish to find and use documents. In the following, we shall give only the briefest outline of such possible alternative methods, all or some of which may have to be used in future world-wide bibliographic control systems for documents, regardless of the language and script in which they are written. Some of these methods are already in use, at least partially, others will have to be developed by collaborating teams of language experts, librarians and bibliographers, and specialists in the handling of graphemic data by computers and associated devices.
SEPARATE LISTINGS BY SCRIPT AND LANGUAGE
The totally “integrated” and romanized catalog which for so long has dominated the thinking of Western librarians can now safely be said to be not only impossible to achieve but also to be a hindrance rather than an aid to users of documents in non-Roman scripts. The first, and most important, step towards better bibliographic control is the separation of bibliographic references by script. In many instances, there must also be a separation by language: it would be unhelpful to have one listing in Arabic script for works in Arabic, Persian, Urdu and Ottoman Turkish; it is also advisable to separate Chinese from Japanese and Korean works, and it is essential to list Hebrew and Yiddish works in separate sequences, to name only a few examples. Here, too, the fact that so many different languages with widely varying orthographic practices are normally listed together in one unbroken alphabetical sequence in Roman (and to some extent in Cyrillic) script cannot serve as an example for other scripts which are used by different and linguistically unrelated languages.

As our survey has shown, the method of separate catalogs by script and/or language has, in fact, been adopted by several European libraries with large non-Roman collections; it is the rule in the libraries of the Soviet Union, and it is also applied by a large majority of Asian libraries with multi-script and multi-language collections. Among the latter, 80% of all university libraries, 73% of public libraries, and 90% of all special libraries in Asia which responded to the survey had separate collections by script and language. Only national libraries in Asia tend to favor Romanization of entries, perhaps under the influence of the practice of other large national libraries in the West, trying to make an effort to be “helpful” to their Western colleagues and to contribute their share to UBC.
As we have tried to show, this is a largely misguided or misunderstood effort that is not likely to further the true aims of UBC. In the long run, it would be more helpful if the national libraries of Asian countries would produce authoritative entries for documents in non-Roman scripts (particularly those in the script or scripts of their own countries) which could serve as perfect models for the catalogs of Western libraries where the necessary expertise is either lacking or not available to the same extent, but where excellent technical facilities exist to reproduce original entries in various ways (photographically, photo-mechanically or electronically). It goes without saying that such entries in the original script must be accompanied by Romanization of names and titles with references to the form and filing arrangement of the non-Roman retrieval tag. The difference between such cross-references to non-Roman main entries and the partial or full
Romanization of such main entries as practised now is very important: the main entry in its original script and language needs to be listed only once, and can be found in its proper place by anybody who knows the language and is therefore able to read the document to which the entry refers. Cross-references in romanized form show those who do not know the language that such and such a document in Arabic, Hindi, Chinese, etc. exists, and they can be made from as many access points as are needed according to the various ways in which a name may be romanized (or in which even the same author may have romanized it at different times and in different works). Wholesale Romanization in Western library catalogs and bibliographies is still often justified by the proponents of this method by pointing to the technical difficulties of printing in various scripts, lack of skilled typographers, and attendant high costs of production if non-Roman scripts are to be reproduced in the original form. But in our age of ever more centralized cataloging, cold-type composition by computer-controlled photo-typesetting devices and inexpensive photolithographic printing and duplicating processes, not to mention typewriters with exchangeable “golfballs” capable of carrying various alphabets, there is no reason why bibliographic entries in the original script cannot be produced as economically as, or perhaps even at less cost than, romanized entries. After all, language experts have to perform the Romanization of an entry (a process that is time-consuming and therefore costly).
If these same experts were employed to produce camera-ready entries in the original script, it would take less time and would cause fewer errors than any Romanization of the same text; the entries could then be disseminated from a central point, such as a national bibliography or a national library, to be copied and integrated into other bibliographic tools in the same way in which this is now done with romanized entries produced by the Library of Congress and other central bibliographic services.
PROBLEMS OF LOGOGRAPHIC SCRIPTS
The methods pointed out above are admittedly better suited to alphabetic scripts than to logographic ones, namely Chinese, Korean and Japanese. In these scripts, it is vitally important to preserve the image of the graphemes, while at the same time tremendous difficulties arise in their sequential ordering (the equivalent of alphabetization) and in graphic output by machines, especially computers. Obviously, conventional line printers cannot be used to print Chinese characters. But it does not follow that therefore all entries for Chinese, Japanese or Korean works have necessarily to be romanized. It is important to
distinguish between the requirements of input and output which, in the case of Chinese logograms, may be substantially different without impairing the control function of a computer-produced bibliographic listing. Above all, it is not absolutely necessary for the control function that a computer be able to print out actual Chinese characters. A method of representing Chinese logograms by different yet unambiguous graphemes has been used for many years, long before the first modern computers were even thought about: it is the simple expedient of using the four-digit serial numbers assigned to each character in Mathews' Chinese-English Dictionary and writing “Chinese by numbers”, as it were.1 While this is a cumbersome method when used manually, it seems to be ideally suited for the treatment of Chinese logograms by computer: input of text can be in numerical form only, and output may either be in the same numerical form or in the form of the actual logograms, depending on the sophistication of the equipment used. Purely numerical output would make it necessary for a user to look up the characters in an index,2 a process which, although somewhat cumbersome, takes far less time for a reader of Chinese than trying to figure out which of the many different meanings a romanized form such as shih might have (especially if it is written without tone indications, as is usual in bibliographic practice).
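The separation of numerical input from optional logographic output may be illustrated by a minimal Python sketch. The four-digit numbers used here are invented placeholders and are not the actual serial numbers of the dictionary or of the telegraphic code; the names `CODE_TO_CHAR`, `render` and `candidates` are likewise hypothetical and serve only the illustration.

```python
# Hypothetical code table: the numbers are placeholders, NOT real
# dictionary serial numbers or telegraphic-code values.
CODE_TO_CHAR = {
    "0101": "十",  # "ten"
    "0102": "史",  # "history"
    "0103": "詩",  # "poetry"
    "0104": "市",  # "market"
}

# Without tone marks, all four characters share one Wade-Giles form.
ROMANIZED = {code: "shih" for code in CODE_TO_CHAR}


def render(codes, logograms_available=True):
    """Return the text as logograms if the device can print them,
    otherwise as the numerical codes themselves."""
    if logograms_available:
        return "".join(CODE_TO_CHAR[c] for c in codes)
    return " ".join(codes)


def candidates(romanized_form):
    """List, in numerical order, every code sharing one romanized form."""
    return sorted(c for c, r in ROMANIZED.items() if r == romanized_form)


title = ["0103", "0102"]      # stored and exchanged as numbers only
print(render(title, False))   # numerical output, e.g. for a line printer
print(render(title, True))    # logographic output where the equipment allows
print(candidates("shih"))     # one romanized form corresponds to four codes
```

Each number identifies exactly one character, whereas the toneless romanized form identifies four in this small table alone; sorting the numbers also gives a fixed filing order without any reference to pronunciation.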
As to computer-controlled printing of Chinese characters, the National Diet Library in Tokyo has successfully shown that this can be done (at least for the most often used characters); in the United States, the IBM 3800 Printing Sub-system, developed in 1975, is capable of printing katakana at high speed by means of laser beams, and several different projects are presently concerned with the storage and reproduction of Chinese characters by computers.3

1 A similar method has been in use for decades in order to transmit telegrams in Chinese, and seems to work well for that purpose. It was also used by George Sarton in his Introduction to the History of Science (Baltimore: Williams & Wilkins, 1927-48), in order to make the Wade-Giles transcription of Chinese names unambiguous, as explained in vol. 1, pp. 47-48. The use of numbers for characters is not entirely unambiguous, because the indication of a character by 3097, for example, does not indicate which of the several meanings of that character is intended; but this ambiguity is inherent in the Chinese language and its writing system and cannot be resolved by purely graphemic transformations of any kind, but only by human recognition of the context.
2 Such an index has already been produced: Ching-yi Dougherty et al. (1963). Chinese Character Indexes. Berkeley: University of California Press. The work is in five volumes, allowing access to characters by the numerical telegraphic code, five different Romanization schemes, radicals, total stroke count, and by the Four Corner system.
3 A survey of recent developments with an extensive bibliography is W. Stallings (1975). The morphology of Chinese characters: a survey of models and applications. Computers and the Humanities 9, 13-24.

The difficult problem of filing Chinese characters is also solved to a certain extent by a numerical system, because the numbers, when arranged by their numerical value, will automatically result in an arrangement of characters identical with that of the key dictionary. The numerical system outlined here would perhaps not always yield perfect results (and it would have to be based on different dictionaries for Japanese with suitable provision for kana, and for Korean with provision for Hangŭl letters) but it would certainly constitute a better control tool than the completely ambiguous and largely meaningless transcriptions which are now made for the bibliographic control of documents in these scripts.1 Perhaps Western experts on Chinese, Japanese and Korean could now devote their energies to the elaboration of such a method rather than continuing the futile exercise of romanizing Chinese, Japanese and Korean names and words in a manner that is open to innumerable errors, mistakes and differences in national usages that vary enormously even in the respective country of origin (e.g. the various “readings” of Japanese personal names which are not obvious from the graphemes even for native Japanese readers, much less for a Westerner).
1 The use of numerical codes to represent Chinese logograms was recommended officially as long ago as 1961, when the National Science Foundation in its 11th Annual Report stated: “Use of the Chinese telegraphic code by groups working with Chinese texts has been recommended by the Foundation . . . in order to avoid the multiplicity of transliteration systems . . .” (Pp. 1274).

CENTRALIZED COLLECTIONS
Separate listings of documents by script and language naturally require people familiar with these languages to produce bibliographic entries in catalogs or bibliographies and indexes, and such experts are often not easy to find. Romanization as a control tool is today mainly used just for that reason: it makes it possible for librarians and bibliographers to become, as it were, masters of all languages and scripts without knowing a single one of them except their own mother tongue and its orthography; the very same people who praise Romanization as the only viable bibliographic control tool would probably be appalled if librarians in another country were to identify English, French or German literature by little colored dots because they could not read the script in the original. But a librarian who handles works written in Cyrillic, Arabic or Chinese by means of simulated and entirely artificial transformations into another writing system is equally “illiterate” from the point of view of a user who needs a Russian, Arabic or Chinese book, the author or title of which he knows in the original script. Separate catalogs and other listings would therefore probably mean
that collections of documents in non-Roman scripts would have to be concentrated in large research libraries which have the means to employ people who know the languages and scripts, and are able to deal with them adequately in terms of true bibliographic control. If bibliographic control tools in the form of catalog cards, entries in bibliographies or machine-readable data files are created at such focal points, other libraries may, of course, use these products only if they have the know-how to deal with such entries in the original script, i.e. without any further transformation made only for the benefit of clerical personnel. If not, they ought not to keep collections in scripts and languages which nobody can handle and interpret for prospective users. The illiterate librarian who pretends to “know” all languages and scripts is not an ornament to his profession.
What, then, should the user do who is in need of literature in non-Roman scripts but happens not to be near one of the focal points? He would probably be better served by addressing his request directly to one of the large libraries that are known to have collections of documents in, say, Russian or Japanese; in these days of inexpensive photocopying and micro-reproduction he might be able to get a desired document perhaps even faster and certainly with less frustrating and error-prone search procedures than if the document had been held by his local library where the librarian does not understand his request.
UNIQUE IDENTIFICATION OF AUTHORS' NAMES
The bibliographic identification of a work primarily by the name of its author is essentially a Western practice which has been adopted by Asian cultures only relatively recently. The principle of unique author identification for bibliographic control purposes has never been easy to apply and has posed innumerable difficulties even with relatively straightforward and stable names in the Western style which follow the pattern Christian name(s)-Surname. The principle breaks down, for all practical purposes, when it is applied to most African or Asian names. What constitutes the “main” part of a person’s name is difficult and sometimes impossible to decide even in that person’s native country; when a sometimes quite arbitrary Romanization is added to an artificially contrived rendering of an author’s name, the result is often a form which makes the name of an author impossible to retrieve for anyone but the cataloger or bibliographer who concocted that form of a name. It seems to be much more effective to list an author’s name in the form in which he himself wrote it on the title page of his work, even if this is a Romanization that does not conform to any particular transcription
scheme used by a library; at least, this is the form in which that person wants to become known in the Western world. Often there is considerable inconsistency in personal usage: the same author may sign his name in romanized form variously as Mahmud, Makhmud, Mahmood, Makhmood, Machmud, Mahmoud, Makhmoud, or as Cheng, Jeng, Zheng, Zeng, and so on. In the case of Chinese, Korean and in particular Japanese names, unique and unambiguous identification by Romanization is practically impossible, and the idea that such Romanization serves the purposes of exact bibliographic control is a mere illusion.1 If unique identification of authors’ names is indeed important for a particular purpose, it would probably be much better to assign numerical or alpha-numerical codes to them which are independent of language and script. A certain (albeit limited) precedent for such a method exists in the classification schedules of the Library of Congress, where “author numbers” have been assigned to thousands of authors in order to identify them unmistakably by a unique alpha-numeric code. UBC is already strongly committed to internationally recognized numbering systems such as the International Standard Book Number (ISBN) and the International Standard Serial Number (ISSN). Why not have an international system of author identification by an International Standard Author Number (ISAN)? This idea, too, is admittedly not new and it has been dismissed earlier as impracticable; but that was before the days of national and international computer networks, and when many countries (especially in Africa and Asia) did not yet have a national library or a national bibliography. Now that such an infrastructure exists to a considerable extent it would be worthwhile investigating the possibilities of such a method.
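How a script-independent author code might tie together the original form of a name and its many romanized variants can be sketched in Python. The identifier `A-000123` and the record layout are invented for illustration only; no such numbering scheme exists, and the pairing of the Arabic form with these romanizations is an assumed example.

```python
import unicodedata

# Hypothetical author register: the identifier and the record layout are
# invented for illustration and are not part of any existing standard.
AUTHORS = {
    "A-000123": {
        "original": "محمود",
        "romanized": ["Mahmud", "Makhmud", "Mahmood", "Mahmoud"],
    },
}


def normalize(name):
    """Fold case and strip combining marks so that trivially different
    spellings of the same form compare equal."""
    decomposed = unicodedata.normalize("NFKD", name)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return stripped.casefold()


def build_index(authors):
    """Map every recorded form of a name, original or romanized, to its code."""
    index = {}
    for code, record in authors.items():
        for form in [record["original"], *record["romanized"]]:
            index.setdefault(normalize(form), set()).add(code)
    return index


INDEX = build_index(AUTHORS)


def lookup(name):
    """Return the sorted author codes reachable from a given form of a name."""
    return sorted(INDEX.get(normalize(name), set()))


print(lookup("MAHMOOD"))  # any recorded romanization leads to the same code
print(lookup("محمود"))    # and so does the name in its original script
print(lookup("Mehmet"))   # an unrecorded form yields no match
```

The code itself carries no linguistic information; every romanized variant is merely an access point, so a further variant can be added without disturbing the entry under which the author is filed.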
1 In a book with the title A New Phonetic Alphabet for the Cantonese Dialect of the Chinese Language (1953). Hong Kong: Yan Sang Printing Press, the author gives his name on the title page in two different forms, thus: Tam Wing Kwong (Tsam Weing Guong); the Library of Congress romanized his name as T’an, Juang-Kuang.

UNIQUE IDENTIFICATION OF DOCUMENTS
Of more importance to UBC than the identification of authors’ names is the unique identification of individual documents: books, reports, articles, and nonbook materials of all kinds. Romanization of titles is in most cases entirely meaningless, as we have seen, and it serves only as a purely mechanical means of subarranging entries under a romanized form of an author’s name. While transcribed titles are worthless for the
user who does know the source language, title translations would be of much more help to those who do not know it but would like to find out what a work is about; in non-fiction, at least, the title often gives a reasonably good clue to the topic with which a document deals. But since translations may vary considerably, they could, of course, not serve as a unique means of identification. Here, too, a numbering system would assure better and much simpler control than the present methods. In the Western world, there is already the ISBN which is now rapidly encompassing more and more of European and American book production. Asian publishers and national bibliographies should make every effort to adopt the system on the largest possible scale, because in the case of books written in non-Roman scripts a unique identification number for each publication is even more essential than it is for Western book production. Recently, an expansion of the ISBN, namely a Universal Standard Book Number (USBN), has been proposed by F. H. Ayres et al.1,2 and this seems to hold great promise in the framework of UBC. As to literature published before 1969 (when ISBN was first applied), it is now covered to a large extent by the so-called “Mansell” numbers, the running nine-digit alpha-numeric code given to every entry in the National Union Catalog: Pre-1956 Imprints, which contains a large number of books in non-Roman scripts. The ISSN, which now covers a very large part of the world’s most important periodicals, can be used instead of cumbersome and ambiguous Romanizations of serial titles; together with a standardized format for the indication of volume, date and pages, it can serve as a unique and unambiguous retrieval code for individual articles or contributions to collections such as conference proceedings or year-books.
The use of alpha-numerical identification codes for authors’ names and for documents would make it possible to handle bibliographic data entirely independently of the script in which they are written, both manually by clerical routines, and also as input to mechanized data bases. It would not necessarily be dependent on printed output in non-Roman scripts, as pointed out above (although at some central point in the system such output must be produced for the benefit of users of documents who know the script and the language). Such simplified clerical handling of bibliographic data would make the treatment of non-Roman scripts in separate sequences actually much less complicated than it may seem to be at first glance.

1 F. H. Ayres (1974). The Universal Standard Book Number (USBN): a new method for the construction of control numbers for bibliographic records. Program 8, 166-73.
2 D. D. Beale and M. F. Lynch (1975). An evaluation of, and improvement on, Ayres’ Universal Standard Book Number. Program 9, 35-45.
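The clerical handling of such codes can be sketched in Python. The check-digit rules are those of the ten-digit ISBN and of the ISSN (weighted sum divisible by 11); the layout of the article key produced by `article_key` is an assumption made for illustration, since no particular standardized format is prescribed here, and the sample numbers are arbitrary valid values.

```python
DIGITS = "0123456789"


def _mod11_ok(number, length):
    """Validate a hyphenated number whose weighted digit sum, with weights
    running from `length` down to 1, must be divisible by 11."""
    chars = number.replace("-", "").upper()
    if len(chars) != length:
        return False
    body, check = chars[:-1], chars[-1]
    if any(c not in DIGITS for c in body) or check not in DIGITS + "X":
        return False
    values = [int(c) for c in body] + [10 if check == "X" else int(check)]
    weights = range(length, 0, -1)
    return sum(w * v for w, v in zip(weights, values)) % 11 == 0


def isbn10_is_valid(isbn):
    """Check a ten-digit ISBN (weights 10 down to 1)."""
    return _mod11_ok(isbn, 10)


def issn_is_valid(issn):
    """Check an eight-digit ISSN (weights 8 down to 1)."""
    return _mod11_ok(issn, 8)


def article_key(issn, volume, year, first_page, last_page):
    """Build a retrieval key for one article. The layout is hypothetical:
    ISSN, then volume, year in parentheses, and the page range."""
    if not issn_is_valid(issn):
        raise ValueError(f"invalid ISSN: {issn}")
    return f"{issn}:{volume}({year}){first_page}-{last_page}"


print(isbn10_is_valid("0-306-40615-2"))   # a well-formed ISBN passes
print(isbn10_is_valid("0-306-40615-3"))   # a single wrong digit is detected
print(article_key("0378-5955", 10, 1978, 3, 22))
```

Nothing in these routines depends on the script of the document described, which is the point of the argument: the key can be copied, filed and checked by staff or machines that cannot read the title at all.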
CONCLUSION
Script conversion, though useful and necessary as an aid to the rendering of foreign names and words in various forms of written communication, is not an effective means of bibliographic control. The alternative methods briefly discussed here will probably be more effective for the unambiguous identification and retrieval of documents written in scripts other than the dominant one in a library, in a bibliography, or in indexes of various kinds. Other control methods may become possible with further advances in modern communication technology. However, the idea that this technology, primarily the use of computers and associated devices such as CRT terminals, demands not only Romanization of dissimilar scripts but complete suppression of any trace of another original script, if effective control is to be achieved, is entirely erroneous. Such ideas are the result of using modern technology in a mindless way so as to perpetuate methods of recording information that were invented in the age of parchment and goose quill. The practice of Romanization for library purposes had a limited usefulness when only a small number of fairly well-known names and titles had to be converted, and when bibliographic control was almost non-existent or in its infancy. It does not become a modern control instrument for an almost unlimited variety of names, titles, and scripts by being performed with lightning speed and electronically instead of by laborious handwriting. Rather, the present and potential possibilities of computers and their ancillary equipment ought to be harnessed to the task of recording, storing, and retrieving documents irrespective of the graphic form in which they are written, and in a manner that makes it possible for those who can read them to do so without having to overcome artificial barriers to access.
It will not be easy to do this, not so much because of the technical difficulties involved, but because longstanding traditions, vested interests, and even rank prejudice will have to be fought. Yet if bibliographic control is to become truly universal, not limited to the Western world nor geared to the Roman script as the sole key to the world’s literary output, but encompassing the records of all ages, in all languages, and in all scripts, it will have to be done.