Update
BTRON promises multilingual processing

A criticism that applies to all current PC standards is that they are poor platforms for processing multiple natural languages. This poses problems for bilingual users, such as translators, educators, journalists, diplomats and those in international business. Software written for the same basic hardware standard can take a long time to be translated from one natural language environment to another. An example of this is the Lotus 1-2-3 spreadsheet, which took two years to translate from English to Japanese. The problem arises because the leading hardware and software architectures were developed in the USA, where natural language processing requires only a small number of alphanumeric characters. Foreign language extensions to these systems have not proved ideal from the non-English speaker's point of view.

BTRON machines will have provisions for processing all languages in a uniform manner using a novel character code system, and software applications will have their messages and prompts compartmentalized from the main program to make them independent of any natural language.

In processing the alphabets of English and other European languages, or even the 50-character Japanese hiragana and katakana syllabaries, the binary number assigned to each character need not be longer than 1 byte. ASCII, the current international standard code for the Roman letters, Arabic numerals, symbols and control codes used in English and other European languages, specifies only 128 items using a 7-bit code and leaves the eighth bit for use as a parity bit. On the other hand, the processing of human languages such as Chinese, Japanese and Korean, which use tens of thousands of Chinese characters, requires a 2-byte character code, which can handle a maximum of 65 536 characters.
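The arithmetic behind these figures can be checked with a short sketch. The function names are illustrative only, and even parity is an assumption made here for the example; the text does not say which parity convention is used.

```python
def code_space(bits: int) -> int:
    """Number of distinct codes available with the given number of bits."""
    return 2 ** bits


def with_even_parity(ch: str) -> int:
    """Return an 8-bit value: the 7-bit ASCII code of ch plus a parity bit.

    Even parity is assumed for illustration: the eighth bit is set so that
    the total number of 1 bits in the byte is even.
    """
    code = ord(ch)
    if code > 0x7F:
        raise ValueError("not a 7-bit ASCII character")
    parity = bin(code).count("1") % 2  # 1 when the 7-bit code has an odd number of 1 bits
    return code | (parity << 7)


print(code_space(7))    # 128 items in a 7-bit code
print(code_space(16))   # 65536 items in a 2-byte code
print(format(with_even_parity("A"), "08b"))  # 01000001
print(format(with_even_parity("C"), "08b"))  # 11000011
```

The letter A already has an even number of 1 bits, so its eighth bit stays 0, while C has three and so receives a parity bit of 1.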
However, because the number of characters used in these languages greatly exceeds the number of keys on the keyboard, an intermediate 'software step' is also required to fetch the proper characters from RAM. In the case of Japanese, this is done by a piece of basic software called a 'kana/kanji-conversion front-end processor' (FEP), which converts simple hiragana inputs into complex combinations of hiragana and kanji (Chinese characters). Although a unified character code for roughly 6500 characters is currently being used in Japan, there are a large number of kana/kanji-conversion FEPs on the market, and these FEPs are not employed in a uniform manner. Specifically, kana/kanji-conversion FEPs are used from inside word processor programs, between software applications and the operating system, and from inside the operating system. Moreover, there is no standard interface for use between these FEPs and software applications, which means not all kana/kanji-conversion FEPs can be used with all programs.

To make up for the multilingual deficiencies of today's PCs and put the processing of all languages on an equal footing, the TRON architecture has been designed with provisions that will make it possible to mix any number of natural languages into a document. These provisions are
• an efficient multilingual character code system
• language-specifier codes to switch between hundreds of different languages
• language-specific software packets that contain each language's input, display and sorting algorithms, in addition to its typesetting rules.

The TRON character set contains 48 400 characters and can handle all major written languages, including some languages of academic interest such as Sanskrit. However, this character set is not based on a simple 2-byte coding scheme as its size would suggest. Since it is used in conjunction with language-specifier codes, it has a 1-byte coding system for languages with a small number of characters (e.g. English and other European languages) and a 2-byte coding system for languages with a large number of characters (e.g. Chinese, Japanese and Korean). This saves space and increases efficiency in processing languages with small character sets.
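The idea of switching between 1-byte and 2-byte coding by means of language-specifier codes can be illustrated with a minimal sketch. It is not the actual TRON encoding: the escape byte `0xFE`, the language identifiers and the character values below are all hypothetical, and the sketch assumes that the escape byte never occurs as the first byte of a character.

```python
ESCAPE = 0xFE  # hypothetical byte that introduces a language-specifier code

# Hypothetical language identifiers mapped to (name, bytes per character).
LANGUAGES = {
    0x01: ("English", 1),
    0x02: ("Japanese", 2),
}


def decode(stream: bytes, default_lang: int = 0x01):
    """Split a mixed-language byte stream into (language, character code) pairs."""
    name, width = LANGUAGES[default_lang]
    out = []
    i = 0
    while i < len(stream):
        if stream[i] == ESCAPE:
            # A language specifier: switch language and character width.
            if i + 1 >= len(stream):
                raise ValueError("truncated language specifier")
            lang_id = stream[i + 1]
            if lang_id not in LANGUAGES:
                raise ValueError(f"unknown language id {lang_id:#04x}")
            name, width = LANGUAGES[lang_id]
            i += 2
            continue
        chunk = stream[i:i + width]
        if len(chunk) < width:
            raise ValueError("truncated character")
        out.append((name, int.from_bytes(chunk, "big")))
        i += width
    return out


# Two 1-byte characters, one 2-byte character, then back to 1-byte coding.
stream = b"Hi" + bytes([ESCAPE, 0x02, 0x24, 0x22]) + bytes([ESCAPE, 0x01]) + b"!"
for language, code in decode(stream):
    print(language, hex(code))
```

In this sketch, text in a small-alphabet language costs one byte per character, and the 2-byte width is paid only between the specifiers that select a large character set.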
The language-specifier codes that will be used in the TRON multilingual processing scheme also serve another purpose: letting BTRON machines know how many different languages are being used in a document.

The language-specific software packets in the TRON architecture will be based on a four-layer 'hierarchical environment' in which the algorithmic parameters, text storage, text display and fonts are logically separated from each other. These four layers are
• Language, which contains the input, display and sorting functions, in addition to character mapping from input codes to storage codes
• Group, which contains codes for uniformly storing text that can be written with either a single large character set or multiple character sets
[Figure: diagram with the labels 'User', 'character input device driver', 'Input language' and 'Language: Target language']
On BTRON machines, a piece of software called a character input device driver (CIDD) is used to manipulate the input algorithms that specify Language and Group codes. A normal CIDD assigns a code to each key on the keyboard, but special CIDDs can be developed to handle input from hardware such as optical character readers
0141-9331/89/08556-05 $03.00 © 1989 Butterworth & Co. (Publishers) Ltd 556
Microprocessors and Microsystems