Pass-sentence— a new approach to computer code

Computers & Security, 13 (1994) 145-160

Yishay Spector and Jacob Ginzberg

This article presents a new approach to constructing and identifying computer codes. The new methodology is based on conceptual processing of natural language using a ‘pass-sentence’ instead of a password. The focal point of the methodology is the use of a code based on semantics as opposed to the syntax-based codes in use today. This method has the following advantages: it is (1) significantly more memorable; (2) able to control the level of data security; (3) flexible in permitting different levels of access; and (4) less vulnerable than passwords to most commonly used codebreaking techniques. A computerized prototype based on the principles of a conceptual coding methodology is presented. The prototype is only an example of the many possibilities inherent in developing ‘intelligent’ models for understanding and identifying conceptual computer codes.

Keywords: Computer security, Computer passwords, Semantics and syntax-based computer codes, Conceptual dependency, Natural language processing.

1. Introduction

The area of data security covers a broad range of subjects, from protecting equipment and data against natural disasters to protecting information systems against computer viruses. This study focuses on a new method of protecting information resources against ‘intruders’. By intruders we mean unauthorized users of computer resources. Intruders can attack computer systems in various ways: they can intercept transmitted passwords, copy password files, guess the passwords, etc. [1, 2]. In a survey conducted by Hagopian [3] it was claimed that only 10% of all existing information systems have adequate means of protection. Only one in a hundred attempts at unauthorized use of information systems, whether successful or not, is detected. In most cases the owners of the system are unaware even that the offence has been committed [3], which indicates the seriousness of the problem.

Several new approaches to user identification are now commonly accepted to solve the problem of data security. Among these are: fingerprinting, magnetic cards, smart cards, voice or handwriting identification and so on. However, these relatively new methods are costly to implement and some of them, such as voice recognition and handwriting identification, have not been sufficiently developed technologically to make them commercially viable. Another problem with these techniques arises from the fact that access to computer facilities is not possible from locations unequipped with the appropriate identification installations. With the current state of the art in the world of decentralized computer systems and widespread communications, this is a cardinal constraint [4].

0167-4048/94/$7.00 © 1994, Elsevier Science Ltd

For these reasons the most effective and still the most widely used means of protection is the password. By means of a password the individual is identified and authorized to enter the computer and make use of its information. The advantages of the password over other means of identification are that it is inexpensive, convenient, may be used on any computer and does not require special hardware. Generally, either the user chooses his or her own password and reports it to the system, or the system allocates a password to the user. The advantage of the former method is the ease with which the user is able to remember the password. The disadvantage is that most users use simple passwords that are very easy to break [4]. In the latter case, the problem of ease of code breaking can be resolved. However, the harder the password is to break, the more difficult it is for the merely mortal user to remember. To overcome this obstacle the user adopts various means such as noting the password in an address book or even on the computer terminal itself. Hence, both methods have in common the failing that they open the way to intruders willing to invest an effort to gain illegal entry to the computer facilities [5, 6].

Hence, there is an unresolved conflict concerning password methodology: making the password as complicated as possible (e.g. more random symbols, not less than eight symbols, etc.) in order to make it hard to break or to be guessed [1]; or making the password as simple as possible (e.g. using names, short codes, etc.), in order to make it easy to remember.

The pass-sentence methodology described in this paper attempts to resolve this conflict. A pass-sentence is not a meaningless collection of characters, nor is it dependent upon one single word. Rather it is a collection of words with semantic meaning. The user can choose a sentence with his or her own associations to facilitate the ability to recall it later. As will be shown, it can be composed of eight or more words and still be easy to remember. In addition, pass-sentence offers flexibility in the form of different access levels to stored information. And finally, as will be shown, pass-sentences are at least as secure as conventional passwords with regard to most code-breaking techniques.

The next section presents the theory of conceptual processing of natural language which is the basis of pass-sentence implementation. In Section 3 we describe pass-sentence construction and in Section 4 we illustrate the process of identifying a keyed-in pass-sentence. In Section 5 the methodology is evaluated on the aspects of security level and ease of recall. Section 6 delineates the computerized prototype constructed according to the methodology. Conclusions are discussed in the last section.

2. Background: conceptual processing of natural language

The roots of this work lie in Schank’s theory of conceptual dependency (CD) processing. Schank [7] noted that many distinct verbs in the English language actually give rise to the same meaning. Let us consider, for example, the following two sentences: “Danny bought a pen from Jack” and “Jack sold a pen to Danny”. These two sentences are verbally different, but they share exactly the same meaning. There is no difference in meaning whether we use “buy” or “sell”, as well as other verbs like “give”, “pay”, “acquire”, “get”, “take”, etc. The basic idea is that the English language has a reduced set of basic concepts, called semantic primitives. Each verb in the English language can be essentially mapped into one of these primitives. In his original study Schank defined 14 different semantic primitives [7, 8]. Here are four of these semantic primitives:

Atrans: transfer of possession of an object from one person to another (buy, sell, trade, get, give, etc.)

Ptrans: physical transfer of an object from one person to another (give, hold, put, hand, etc.)

Mtrans: transfer of mental energy (tell, explain, etc.)

Ingest: digestion of food (eat, drink, etc.)
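As a minimal sketch of the verb-to-primitive mapping described above, the following Python fragment stores the four quoted primitives with their example verbs and looks up which primitives a verb can express. The names `PRIMITIVE_VERBS` and `primitives_for` are illustrative assumptions, not part of the prototype, and only base verb forms are handled. Note that “give” is listed under both Atrans and Ptrans in the text, so the lookup returns a set rather than a single primitive.

```python
# Verb lists copied from the four primitives quoted in the text.
# Base forms only; a real system would also need inflected forms ("bought").
PRIMITIVE_VERBS = {
    "Atrans": {"buy", "sell", "trade", "get", "give"},
    "Ptrans": {"give", "hold", "put", "hand"},
    "Mtrans": {"tell", "explain"},
    "Ingest": {"eat", "drink"},
}


def primitives_for(verb):
    """Return the set of primitives whose example verb list contains `verb`."""
    word = verb.lower()
    return {name for name, verbs in PRIMITIVE_VERBS.items() if word in verbs}


# "buy" and "sell" are different words but map to the same primitive.
print(primitives_for("buy") == primitives_for("sell"))
```

Under this mapping, “Danny bought a pen from Jack” and “Jack sold a pen to Danny” are both recognized as Atrans events, which is the property the pass-sentence method relies on.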

In the next section it will be shown how pass-sentences can be composed on the basis of conceptual processing of natural language.

3. Pass-sentence construction

In general, pass-sentences are based on a unique understanding of meaning, and not on a unique representation of a character string. This means that a pass-sentence can be represented by a number of sentences that are syntactically different from each other but identical in their semantic meaning.

Definition 1 (pass-sentence): Pass-sentence is an entity with a defined and unique semantic meaning.

Each semantic primitive has its own attributes. For example, Atrans has at least four attributes: buyer, seller, object and price. In addition, there are general attributes such as time, place etc. that can characterize any primitive. A third group of attributes may be used to characterize the previously mentioned attributes in greater detail; for example, quantity and adjectives.

For example: “Danny bought an interesting book from Jack for five dollars in the store.” This code has the unique semantic meaning of the Atrans type. Its attributes are as follows.

(a) Unique attributes:
• buyer: Danny;
• object (an article conveyed to him): the book;
• seller: Jack;
• cost: money (dollars).

(b) General attributes:
• place: a store;
• time: not mentioned in the code.

(c) Attributes of attributes:
• amount of money: five dollars (quantifier for money);
• the book: interesting (adjective for book).

As we can see, a pass-sentence does not have to include all the possible attributes. For instance, the above example excludes a time attribute. Moreover, each attribute can appear at a different detail level. For example, the attributes “buyer” and “seller” in the Atrans semantic primitive can appear at the following detail levels:

(a) General cases: organizations, stores, people, etc.

(b) Individual cases: a particular person, a specific organization, or a specific store, such as Joseph’s store.

In the above example, the place appears at the ‘general case’ level and the buyer appears at the ‘individual case’ level.

More explicitly, the pass-sentence methodology consists of the following stages:

(a) Defining the primitives

In defining the conceptual primitives that will serve in the construction of computer codes it is not necessary to use all the 14 primitives proposed by Schank [7].

(b) Defining the attributes

The following attributes should be defined for each primitive chosen as a pattern for constructing conceptual codes:
• unique attributes;
• general attributes;
• attributes of attributes.

For each kind of attribute we can construct several detail levels.

(c) Formulating the codes

This stage is carried out together with the user for whom the pass-sentence code is being constructed. The user is asked to formulate a code in the context of what has been defined at stages (a) and (b) above. This means that the codes should be in line with the primitives, attributes and detail levels as defined in the previous stages. It should be noted that only the types of primitives and attributes are static. The specific concepts underlying these types are dynamic and depend upon the user’s codes. If certain concepts are still missing, stages (a) and (b) above should be reviewed to enable the necessary refinements to be made. For example, if the user uses the verb “get” we should refine the system with that new verb as another form of the Atrans primitive.

(d) Analyzing the construction of the codes

In analyzing the construction of the code, we observe two dimensions:

(1) The width dimension: this is the number of attributes that can be found in the pass-sentence. For example: actor-attribute, object-attribute, time-attribute etc. This dimension can be roughly compared with the number of characters in the password system.

(2) The in-depth dimension: each attribute can be described by several layers of detail. For example: the time-attribute can be described as a specific day within a specific month within a specific year.

Generally speaking, having greater numbers in these dimensions implies a higher security level. The security subject will be further discussed in Section 5.

(e) Analysis of code identification

The code keyed in by the user is compared with the original code, and a criterion of the level of identity with the original code is computed. This criterion takes into account the level of identity between the keyed-in attributes and their original counterparts. As a result the keyed-in code may be accepted, or rejected outright; or, in an intermediate case, a request for further details may be issued by the system.

In the next section, we shall elaborate upon the process of identification of a keyed-in pass-sentence.
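The attribute groups and the two dimensions of Section 3 can be sketched as a small data structure. This is only an illustration under stated assumptions: the class and field names (`Attribute`, `PassSentence`, `width`, `depths`) are hypothetical, detail level 1 stands for the ‘general case’ and 2 for the ‘individual case’, and the levels given to the object, cost, amount and adjective are assumed, since the text states levels only for the buyer and the place.

```python
from dataclasses import dataclass, field


@dataclass
class Attribute:
    role: str              # e.g. "buyer", "place"
    group: str             # "unique", "general" or "attribute-of-attribute"
    value: str             # the concept chosen by the user
    detail_level: int = 1  # assumed scale: 1 = general case, 2 = individual case


@dataclass
class PassSentence:
    primitive: str
    attributes: list = field(default_factory=list)

    def width(self):
        """Width dimension: number of attributes present in the code."""
        return len(self.attributes)

    def depths(self):
        """In-depth dimension: detail level recorded for each attribute."""
        return {a.role: a.detail_level for a in self.attributes}


# "Danny bought an interesting book from Jack for five dollars in the store."
example = PassSentence("Atrans", [
    Attribute("buyer", "unique", "Danny", 2),    # individual case (stated in text)
    Attribute("seller", "unique", "Jack", 2),    # assumed level
    Attribute("object", "unique", "book", 1),    # assumed level
    Attribute("cost", "unique", "dollars", 1),   # assumed level
    Attribute("place", "general", "store", 1),   # general case (stated in text)
    Attribute("amount", "attribute-of-attribute", "five", 1),
    Attribute("quality", "attribute-of-attribute", "interesting", 1),
])

print(example.width())
print(example.depths()["buyer"], example.depths()["place"])
```

The time attribute is simply absent from the list, matching the observation that a pass-sentence need not include every possible attribute.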

4. Identification of a keyed-in pass-sentence

In comparing the keyed-in code against the original code, we identify seven states that can occur between each pair of concepts from the two codes:

(a) Full identity
By full identity we mean absolute identity (syntactical and conceptual) between the two concepts. Actually, this state means using the same true words in the two concepts.

(b) Conceptual-only identity
By ‘conceptual-only identity’ we mean full conceptual identity, at the same detail level of the attribute but with use of synonyms in describing the attribute. For example, the word “buy” may be used instead of “get”.

(c) Under-detailing
The conceptual identity between the concepts is full but the keyed-in concept is described at a lower detail level. For example, the use of the attribute “something” instead of “book”.

(d) Omission
The keyed-in code is missing completely an attribute that appears in the original code. For example, the seller is not mentioned at all in the keyed-in code, even though he is mentioned in the original code.

(e) Over-detailing
A given attribute appears at a higher detail level in the keyed-in code than it does in the original code, though conceptually the two concepts are identical; for example, “paperback book” is keyed in instead of “book”.

(f) Addition of attributes
An addition of an attribute in the keyed-in code that does not appear in the original code. For example, the time the book was purchased is mentioned only in the keyed-in code. We emphasize that the additional attribute must be recognized by the system’s dictionary; otherwise it will be ignored by the system.

(g) Lack of identity
By this we mean lack of identity between the two concepts. For example, “pencil” instead of “book” or “six dollars” instead of “five dollars”. In these cases there is a lack of conceptual identity between the two concepts.

When comparing a keyed-in code with an original code, any combination of the above states can appear. In order to reject or accept a keyed-in code, an examination of the level of identity between the keyed-in and the original code is carried out. This examination process uses the following conceptions.

Definition 2 (Compliance measure): Compliance measure (CM) is a certain percentage that reflects the level of compliance between the keyed-in pass-sentence and the original code.

Definition 3 (Acceptance/Rejection thresholds): Acceptance/Rejection thresholds are certain percentages that determine the levels of accepting or rejecting the keyed-in pass-sentence on the basis of its compliance measure.

In rejecting or accepting the keyed-in code, we distinguish the following states.

(a) Acceptance: when the value of compliance is greater than the acceptance threshold, the keyed-in code will be accepted.

(b) Intermediate range: when the value of compliance lies between the rejection threshold and the acceptance threshold, the user will be requested to answer questions about missing attributes or under-detailed attributes. It must be stressed that this is not a process of correcting errors but rather a detailing process. In order to ascertain that intruders will not take advantage of this process, two precautions were selected. The first is that during the questioning procedure the system uses the words of the keyed-in code rather than those of the original code; the second is that intermediate states should be allowed only when the security level is relatively low.

(c) Rejection: when the value of compliance is lower than the rejection threshold, the keyed-in code is rejected. However, the user will be requested to answer questions about presumably ‘missing’ or ‘under-detailed’ attributes. This is done in order to prevent the potential intruder from getting information by using some kind of an elimination process.

Let us assume that the user wishes to enter the system and keys in the following code: “Danny got something interesting for six dollars”. If we compare this with the original code: “Danny bought an interesting book from Jack for five dollars in the store”, we note the following:

(a) The word “something” is used in place of the word “book”, meaning that the detail level of the object attribute is of a lower order (‘general case’ level instead of ‘individual case’).

(b) The ‘seller’ attribute (Jack) has been omitted. Hence, this is an omission.

(c) There is an error in the ‘price’ attribute. This is a case of lack of identity (six dollars instead of five).

The process of accepting or rejecting a keyed-in code will be further described and illustrated in Section 6.

5. Evaluation of the methodology

In this section we will evaluate the new methodology concerning the aspects of ease of recall and security level.

5.1. Ease of recall

An experiment was conducted to estimate the ease of recalling pass-sentences as compared to regular passwords. Fifteen people participated in the experiment. Each participant was asked to remember two passwords and one pass-sentence. One password was constructed out of six randomly selected characters. The other password and the pass-sentence were chosen by the participants themselves. The participants were asked to construct them in such a way that they would be easily remembered but still ‘safe enough’. No maximum or minimum number of characters was imposed on the chosen password. In a similar way, there were no restrictions for the pass-sentence.

The experiment’s results are reported in Table 1. The numbers in the table represent the number of participants who remembered the codes. In the case of the passwords, the participants were given unlimited attempts to key in the right password (i.e., until they keyed in the right password or gave up trying). This is in contrast to regular computerized password systems where the number of attempts is usually limited. In the case of the pass-sentence, the participants were given only three attempts to key in the right sentence. In these attempts the computerized pass-sentence prototype system had to recognize the code or reject it, with a compliance measure of 95%.

The conclusions drawn from these results are as follows.

(1) Users can remember pass-sentences better than passwords.

(2) Assuming that the code must be changed at least once a month, the pass-sentence had 100% success!

(3) Pass-sentence is superior to a simple password when the code must be remembered for a relatively long period.

(4) Users usually choose pass-sentences with a number of attributes that is greater than the number of characters of their passwords. For example: a certain pass-sentence was: “Danny bought a beautiful Parker pen for $10 in Tel Aviv last month.” This pass-sentence includes no less than nine different attributes and 12 words. This can be compared to the five-character word “Danny” (the name of the subject’s child) that this same woman chose for her password. This implies that we can demand a relatively high width dimension (for security reasons) and still be confident that the pass-sentence will not be forgotten.

5.2. Security level

Potential intruders use three basic methods for penetrating a system:

(1) password interception;

(2) cracking a copied password file;

(3) guessing the password [2].

The first process is based on the interception of the password during transmission, typing, or any other interception opportunity. Here, the pass-sentence system has no advantage over the conventional password.

The second method for breaking into the computer system is by cracking the password file. This method can be broken down into two stages. The first stage consists of the copying of the password file by the intruder. This file is coded such that the intruder cannot decipher the passwords without knowing them beforehand. This is the case because the coding of each password is dependent on the password itself, despite the fact that the encryption method is common knowledge (the encryption function is a one-way function [9]). Hence the second stage consists of encrypting (according to the known encryption function) potential password candidates in order to find a match between an encrypted password candidate and an encrypted password existing in the file. Thus, the wider the range of potential candidates, the better the system is protected against this type of trespassing.

In analyzing the range of password possibilities, we discriminate between two main types of password:

(1) passwords bearing meaning, i.e., words or names in Latinate characters;

(2) random passwords, bearing no linguistic meaning.

The range of possible passwords of the first type is the range of meaningful words and names in the Latin alphabet. Let us assume that the number of such words and names in the language is 50000; thus the search range will be log2(50000) = 15.6 bits. The common method for cracking a file made up of such passwords is called a ‘dictionary attack’. This method is based upon comparisons between the words in the dictionary and names in the Latin alphabet, and the contents of the password file. This type of password is considered quite vulnerable, as a range of combinations of 16 bits is quite limited.

The second type of password is that which has no linguistic or nominative meaning in the English language. In this instance the range of combinations is larger than that of words in the dictionary. For instance, if a character can be any one of the 62 alphanumeric characters, the range of combinations for passwords made up of eight characters will be 62^8. In bits, the range of possibilities for such a system is log2(62^8) = 48 bits. In the same vein, the range of possibilities for a random password of 12 characters is 71 bits.

Let us now explore the range of combinations possible in the pass-sentence system. In the research we carried out we found that the average pass-sentence is 8.2 words in length, wherein the average word length is 4.85 letters. Again, let us assume that the number of words in the language is 50000; thus the range of possible combinations for a pass-sentence on average is 128 bits (since log2(50000^8.2) = 128). When one takes into account that a pass-sentence code must have content (meaning) and a correct grammatical structure, there are then fewer possible combinations. Using studies in the compression of textual information, we can attempt to evaluate the approximate reduction of search range for pass-sentence. We know that an efficient algorithm for information compression of English text causes each character of text to be at an uncertainty level of about two bits [10]. This reduction is due to the fact that the context is known and has been well studied. Hence log2(26) (assuming an alphabet of 26 letters) can be reduced to about 2 bits. Assuming an average pass-sentence of 40 characters (8.2 words of 4.85 letters each), we can arrive at the conclusion that the range of possible combinations of an average pass-sentence code is approximately 80 bits. Thus an efficient ‘attacking’ algorithm (e.g. [10] or [11]) reduces the search range from 128 bits to approximately 80 bits.

Yet, during the above calculation process, two added components were not taken into consideration. One is that it is possible to express the same idea in several different syntactical variations. The pass-sentence system allows and accepts different sentences with the same meaning as a correct and legitimate access code. The other component is the fact that the pass-sentence system allows the acceptance of code containing certain types of error, using the CM mechanism (see Section 6.4). The exact reduction of the search range as a result of these two features is being studied now.

The third type of attack on the password code is the ‘intelligent’ attack.
In this method, an intelligent being (human or artificial) attempts to guess the password. As we saw in our research, and as has been found in previous studies [1], computer users tend to choose passwords from their immediate surroundings, such as names of family members, ID numbers, etc. (86% of the passwords in our research were taken from the participant’s immediate surroundings). Here pass-sentence methodology has some advantages over the conventional passwords system. The advantage lies in the fact that the pass-sentence contains a number of elements, so that even if the user takes a code from his or her immediate surroundings, guessing all of its elements is more difficult than guessing one sole element.

To conclude this section, we find that the methodology of the pass-sentence has the potential of being at least as safe as passwords or even safer. The exact security level of the pass-sentence system, and all its components, is a subject for further inquiry which we are presently undertaking. The next section will illustrate the proposed methodology as was embedded in the computerized prototype system.
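The search-range figures quoted in Section 5.2 can be reproduced with a few lines of arithmetic. The sketch below only restates the paper’s assumptions (a vocabulary of 50000 words, 62 alphanumeric characters, an average pass-sentence of 8.2 words with 4.85 letters per word, and about 2 bits of uncertainty per character of English text); the helper name `bits` is illustrative.

```python
import math

VOCABULARY = 50_000        # assumed number of words and names in the language
ALPHANUMERIC = 62          # 26 lower-case + 26 upper-case letters + 10 digits
AVG_WORDS = 8.2            # average pass-sentence length found in the study
AVG_LETTERS = 4.85         # average word length found in the study
BITS_PER_CHAR = 2          # uncertainty per character of English text [10]


def bits(choices, length=1):
    """Search range, in bits, of `length` independent picks among `choices`."""
    return length * math.log2(choices)


dictionary_word = bits(VOCABULARY)                       # meaningful password
random_8 = bits(ALPHANUMERIC, 8)                         # random, 8 characters
random_12 = bits(ALPHANUMERIC, 12)                       # random, 12 characters
sentence_upper = bits(VOCABULARY, AVG_WORDS)             # words chosen freely
sentence_text = AVG_WORDS * AVG_LETTERS * BITS_PER_CHAR  # compression estimate

for label, value in [
    ("dictionary word", dictionary_word),
    ("random 8 chars", random_8),
    ("random 12 chars", random_12),
    ("pass-sentence (upper bound)", sentence_upper),
    ("pass-sentence (English text)", sentence_text),
]:
    print(f"{label}: {value:.1f} bits")
```

Under these assumptions the compressed-text estimate for an average pass-sentence still exceeds the range of a random 12-character password, which is the comparison the section draws.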

6. Pass-sentence computerized prototype system

6.1. General

A computerized prototype system of conceptual computer code was developed on the basis of the proposed methodology. The prototype includes the semantic primitive Atrans (any number of other primitives can be added). Additional technical description of the prototype program is given in the Appendix. A frame consists of several slots whose number and content are not fixed. A slot contains a single value: a finite symbol, called a terminal, or the frame itself [12]. The frame the system uses for defining a code of the Atrans type is basically as follows:

    (Atrans
      (Type Atrans-frame)
      (actor1 ...)
      (actor2 ...)
      (object1 ...)
      (object2 ...)
      (location ...))

Thus in principle we are in a position to accept codes that describe naive events concerned with the transference of ownership, consisting of a description of five attributes:

(a) A buyer: actor1
(b) A seller: actor2
(c) A purchased object: object1
(d) Cost: object2
(e) Location of purchase: location

6.2. Description of the attributes

As noted in the previous section, the attributes of the code’s primitive fall, in the system, into three groups:

(a) Description of unique attributes

We propose three hierarchical detail levels for attributes of the type actor1 or actor2:

(i) The most general level is “an unspecified human”, such as “somebody”, “anyone”.

(ii) The level of “a person specified by sex”, such as “man”, “woman”, “boy”, “girl”.

(iii) The level of a person specified by name.

Each of these attributes has its own frame, though it is not detailed here. For attributes of the type object1 and object2 we propose the following hierarchical detail levels:

(i) The level of the general object, such as “something”, “thing”.

(ii) The level of a family of objects, such as “food”, “money”, “toys”.

(iii) The level of the specific object, such as “pizza” (food), “hamburger”, “dollars” (money), “Puppet” (toys).

(b) General attributes

For an attribute of the type “location” we propose the following hierarchical detail levels:

(i) The most general location level, such as “someplace”, “somewhere”.

(ii) The level of a family of places, such as “store”.

(iii) The level of a specific place, such as “Martin’s place”.

(c) Attributes of attributes

We propose the following general division for attributes of quantity and quality:

(i) Descriptions of quantity, such as “half”, “one”, “many”, “few” etc.

(ii) Adjectives, such as “lovely”, “nice”, “beautiful”, “lousy”, “bad” etc.
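The hierarchical detail levels of Section 6.2 can be illustrated by storing, for each concept, its path from the most general level down to the concept itself. This is a hedged sketch rather than the prototype’s actual dictionary: the table `HIERARCHY` and the functions `detail_level` and `match_level` are hypothetical names, and only the object examples listed above are included.

```python
# Path from the most general level (index 0) down to the concept itself.
# Concepts and families are the object examples quoted in Section 6.2.
HIERARCHY = {
    "something": ("objects",),
    "thing": ("objects",),
    "food": ("objects", "food"),
    "money": ("objects", "money"),
    "toys": ("objects", "toys"),
    "pizza": ("objects", "food", "pizza"),
    "hamburger": ("objects", "food", "hamburger"),
    "dollars": ("objects", "money", "dollars"),
}


def detail_level(concept):
    """Hierarchical detail level of a concept (1 = most general)."""
    return len(HIERARCHY[concept])


def match_level(original, keyed_in):
    """Number of leading hierarchy levels on which two concepts agree."""
    level = 0
    for a, b in zip(HIERARCHY[original], HIERARCHY[keyed_in]):
        if a != b:
            break
        level += 1
    return level


print(detail_level("pizza"))          # specific object
print(match_level("pizza", "food"))   # keyed-in concept is less detailed
```

With this representation, keying in “food” where the original code says “pizza” agrees down to the family level only, which corresponds to the under-detailing state of Section 4.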

6.3. Formulating codes

To give an idea of the structure of the code in the system, we shall demonstrate how the code “Jack bought a pizza at Martin’s for half a dollar” is represented by the computerized prototype. The following is the representation of the pass-sentence as implemented in Lisp:

    (bought
      (type verb)
      (subtype1 Atrans-frame)
      (hierarchical-level 1)
      (actor1
        (Jack
          (type actor-filler)
          (subtype1 unspecified-humans)
          (subtype2 human-male)
          (subtype3 specific-name)
          (hierarchical-level 3)))
      (object1
        (pizza
          (type object-filler)
          (subtype1 objects)
          (subtype2 food)
          (subtype3 specific-food)
          (hierarchical-level 3)))
      (object2
        (dollars
          (type object-filler)
          (subtype1 objects)
          (subtype2 money)
          (subtype3 specific-money)
          (QUANT (half (type QUANT-filler) (mean 0.5)))
          (hierarchical-level 4)))
      (place
        (Martin’s
          (type place-filler)
          (subtype1 places)
          (subtype2 store)
          (subtype3 specific-store)
          (hierarchical-level 3))))

Thus we have a code of the type Atrans composed of the following attributes:

(a) actor1 (buyer): JACK, who is an element in a group of specific names, which is an element in a group of male individuals, who are an element in a group of unspecified humans. Thus actor1 is at hierarchical detail level 3.

(b) actor2 (seller): does not exist in the code.

(c) object1 (the article purchased): PIZZA, which is a specific food, which is an element in the food family, which is an element of general objects. Thus object1 is at hierarchical detail level 3.

(d) object2 (cost): DOLLARS itself is at hierarchical detail level 3 (according to the above logic). But there is also a quantifier for this object, so one more level is added to the hierarchical score, making it 4.

(e) place: “Martin’s place” is the place of purchase detailed at the level of specific place, and is therefore at a detail level of 3.

6.4. Analyzing the identification of a keyed-in conceptual code

Analyzing the identification of a keyed-in conceptual code is based on the compliance mcasurc

Computers & Security, Vol. 13, No. 2

(CM). The CM cvaluatcs bctwccn the code kcycd original code as it was fed for purposes of illustration, linear weighting method.

the level of compliance in by the user and the into the computer. Hcrc, WC have chosen to USCa

The following algorithm is only a simple illustration of a computation model for calculating the CM. The CM is dctcrmincd according to scores that arc given to the scvcn states that can occur code with the when comparing the kcycd-in original one (XC Section 6.2 for the states dcscription).

(a) Full identity

The following scores are determined according to the level at which the compliance occurs:

full match at hierarchical level 1: score 4;
full match at hierarchical level 2: score 6;
full match at hierarchical level 3: score 8;
full match at hierarchical level 4: score 10.

(b) Conceptual-only identity

It is computed as above, except that 20% is subtracted from the score.

(c) Under-detailing

Here the match stops at a certain level which is lower than the original hierarchical detail level. We compute the score according to the lower hierarchical detail level only.

(d) Omission

For the sake of simplicity, a zero score was chosen for the case of omitting attributes. Alternatively, a negative score could be considered as well.

(e) Over-detailing

There is no way to decide whether the additional details are true or not, as they do not occur in the original code. So we confine the system only to the corresponding details. Thus the scoring process is exactly as above, without any consideration of new details. In this case a negative score could also be considered.

(f) Addition of attributes

Here again we ignore the additional information, dealing only with corresponding attributes. Again, a negative score could be considered.

(g) Lack of identity

In this case there is no way to avoid a negative score. The following scores are determined according to the level at which the first mismatch was recognized:

mismatch at hierarchical level 1: score -10;
mismatch at hierarchical level 2: score -8;
mismatch at hierarchical level 3: score -6;
mismatch at hierarchical level 4: score -4.

CM is the sum of all the above individual scores divided by the sum of their maximums:

$$CM = \frac{\sum_{i=1}^{n} \mathrm{Score}_i}{\sum_{i=1}^{n} \max(\mathrm{Score}_i)} \tag{1}$$
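The scoring rules and equation (1) can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the prototype's code (which was written in Lisp): the names `Attribute`, `score`, `max_score` and `compliance_measure` are hypothetical, the maximum for each attribute is taken to be the full-match score at its original hierarchical level, and attributes in the 'addition of attributes' state are simply left out of both sums.

```python
from dataclasses import dataclass

# Score tables taken from states (a) and (g) in the text.
FULL_MATCH_SCORE = {1: 4, 2: 6, 3: 8, 4: 10}
MISMATCH_SCORE = {1: -10, 2: -8, 3: -6, 4: -4}
CONCEPTUAL_ONLY_FINE = 0.20  # state (b): 20% is subtracted


@dataclass
class Attribute:
    """One attribute of the original code and how the keyed-in code matched it.

    Hypothetical structure for illustration only.
    """
    state: str               # one of the seven state names used below
    original_level: int      # hierarchical detail level in the original code
    reached_level: int = 0   # level reached (under-detailing) or level of first mismatch


def max_score(attr: Attribute) -> float:
    # Assumption: the maximum is a full match at the original detail level.
    return FULL_MATCH_SCORE[attr.original_level]


def score(attr: Attribute) -> float:
    if attr.state in ("full_identity", "over_detailing"):
        # Over-detailing is scored on the corresponding details only.
        return FULL_MATCH_SCORE[attr.original_level]
    if attr.state == "conceptual_only_identity":
        return FULL_MATCH_SCORE[attr.original_level] * (1 - CONCEPTUAL_ONLY_FINE)
    if attr.state == "under_detailing":
        return FULL_MATCH_SCORE[attr.reached_level]
    if attr.state == "omission":
        return 0
    if attr.state == "lack_of_identity":
        return MISMATCH_SCORE[attr.reached_level]
    raise ValueError(f"unknown state: {attr.state}")


def compliance_measure(attrs: list) -> float:
    """Equation (1): sum of scores divided by the sum of their maximums."""
    counted = [a for a in attrs if a.state != "addition_of_attributes"]
    if not counted:
        raise ValueError("no attributes to compare")
    return sum(score(a) for a in counted) / sum(max_score(a) for a in counted)
```

For instance, a conceptual-only match at level 1, two full matches at level 3, one level-3 attribute matched only to level 2, and one omitted level-3 attribute give (3.2 + 8 + 6 + 8 + 0) / (4 + 8 + 8 + 8 + 8).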

The organization determines the acceptance threshold required for acceptance of the keyed-in code and provision of authorization to access the classified information according to the level of its sensitivity. The organization also determines the rejection threshold required for rejecting a code absolutely, again in accordance with the sensitivity of the stored information. For purposes of illustration, we have selected the following two types of information.

Sensitivity of information    Threshold for rejection    Threshold for acceptance

‘secret information’          80%                        95%
‘regular information’         65%                        80%
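The two thresholds in the table split the CM range into three outcomes, which can be sketched as below. This is a minimal illustration, not the prototype's code: the names `THRESHOLDS` and `decide` are hypothetical, and treating a CM exactly equal to the acceptance threshold as accepted is an assumption, since the text only speaks of a CM "below" the rejection threshold or "over" the acceptance threshold.

```python
# (rejection threshold, acceptance threshold) per sensitivity class, from the table.
THRESHOLDS = {
    "secret": (0.80, 0.95),
    "regular": (0.65, 0.80),
}


def decide(cm: float, sensitivity: str) -> str:
    """Return 'reject', 'accept', or 'ask_for_detail' for a compliance measure."""
    reject_below, accept_from = THRESHOLDS[sensitivity]
    if cm < reject_below:
        return "reject"
    if cm >= accept_from:  # assumption: reaching the threshold is enough
        return "accept"
    # Between the two thresholds the user is neither denied nor admitted.
    return "ask_for_detail"
```

With these values a CM of 70% is rejected for secret information but falls in the intermediate band for regular information.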

Let us now look again at the code “Jack bought a pizza at Martin’s for half a dollar” and assume that the user keyed in the following code: “Jack got some food at Martin’s place”. Calculation of the CM of the keyed-in code is:

(a) The word “got”: This is a ‘conceptual-only identity’ state. The basic structure is that of an Atrans primitive, similar to the original code, and the score for this attribute is therefore 4. The score is ‘fined’ 20% for using the word “got” instead of the original “bought”, so that the final score for this attribute is 3.2.

(b) The word “Jack”: This is a ‘full identity’ state. The detail level is at hierarchy 3, meaning a score of 8 points.

(c) The word “some”: This word does not occur in the system’s dictionary of concepts and the system thus skips it.

(d) The word “food”: This is an ‘under-detailing’ state. Compliance is at the hierarchical level of only 2, which gives a score of 6 out of a maximum of 8.

(e) The word “at”: This word does not occur in the system’s dictionary and the system thus skips it.

(f) The words “Martin’s place”: This is a ‘full identity’ state. Compliance is at a detail level of 3. The score for this attribute is 8.

(g) The cost: This is an ‘omission’ state and not one point of the possible 8 is given.

The sum of all these scores gives a total of 25.2. The maximum possible sum for the compliance values is 36. Thus the CM is:

$$CM = \frac{3.2 + 8 + 6 + 8 + 0}{4 + 8 + 8 + 8 + 8} = \frac{25.2}{36} = 70\% \tag{2}$$

If the user has asked for access to ‘secret information’ he will be denied, as a CM of 70% is below the secret information rejection threshold (which is 80%). The case for ‘regular information’ is a little more complicated. The threshold for rejection is only 65%, so access to regular information is not denied. But the threshold for acceptance is 80%, which is still beyond the user’s CM of 70%. This is an intermediate state where the user is neither denied nor given permission to use the requested information.

As the user did not give full details of the original attributes, the system can now inquire for greater detail. This is done by simple questions that try to acquire the missing details. As was already mentioned, this is not a process of correcting errors but rather a detailing process. The user is asked to answer questions about missing attributes or under-detailed attributes. The CM is recomputed and compared again with the thresholds. The process of question-answering ends when all omissions and under-detailing are covered, or when the user’s CM is either below the rejection threshold or over the acceptance threshold.

Let us illustrate all this by continuing the example. The “food” in the keyed-in code is under-detailed in relation to the original code. So:

QUESTION: We would like you to be more specific. What did Jack get?

ANSWER: A pizza.

This is a correct answer. The more ‘detailed’ attribute of food sets its score to 8 instead of the old 6. This yields a CM of 76%:

$$CM = \frac{25.2 + (8 - 6)}{36} \approx 76\%$$

Also notice the use of the keyed-in “get” instead of the original “buy” in order not to expose any information about the pass-sentence. But the CM is still below the 80% acceptance threshold of ‘regular information’. However, since the cost attribute is completely omitted in the keyed-in code, the following dialogue can proceed:

QUESTION: We would like you to be more specific. What did Jack give for the object he received?

ANSWER: A dollar.

The original code specified half a dollar and not one dollar. Thus, the system detects a ‘lack of identity’ state. The lack of compliance is at a hierarchical level of 4. Thus altogether the CM will be ‘fined’ 4 points. The user’s CM is re-computed as follows:

$$CM = \frac{27.2 - 4}{36} = \frac{23.2}{36} \approx 64\%$$

The mistake is fatal to the user and he is denied access to regular information as well (since the CM is below the 65% regular information rejection threshold). When this conclusion is reached, the question-answering is ended. However, the user will still be requested to answer questions about presumably ‘missing’ or ‘under-detailed’ attributes. This is done in order to prevent the potential intruder from getting information by using some kind of elimination process.

7. Summary

This article has presented the conceptual computer code methodology, a new approach to constructing and identifying computer code. This methodology is based on conceptual processing of natural language. The focal point of the methodology is the use of a pass-sentence based on semantics as well as syntax, as opposed to the syntax-only passwords in use today.

This method has the following advantages: improved memorability, flexibility in permitting different levels of access, greater security in certain cases, and the ability for interaction between the user and the computer in order to try to reach a better point of identification or rejection of the user.

A computerized prototype, based on the principles of a conceptual coding methodology, was presented. Some results of its use in a survey were reported. These results imply that pass-sentences are easier to remember than traditional passwords. The prototype is only an example of the many possibilities inherent in developing ‘intelligent’ models for understanding and identifying conceptual computer codes.

The model can provide a solution to the problems of data security in a world in which communication with computers and databases is effected from various locations. This is not true for other means of identification, such as fingerprinting, which improve the level of security but do not allow communication with the computer from outside locations. The boundaries of the model can be expanded to include additional subjects such as: developing tools for evaluating the resemblance between a keyed-in code’s attribute and the original attributes (e.g. “one dollar” is closer to “95 cents” than to “fifty dollars”); and investigating differences in the ability to remember codes of different types.

G. Hagopian, Planning and implementing a security package, Data Prot. Comput. Secur. (Winter 1987) 10-11.
J.C. Spender, Identifying computer users with authentication devices (tokens), Comput. Secur., 6 (Oct. 1987) 385-395.
J.A. Cooper, Computer-Security Technology, Lexington Books, Lexington, MA, 1984.
L.S. Smith, Authenticating users by word association, Comput. Secur., 6 (Dec. 1987) 464-470.
R.C. Schank, Conceptual Information Processing, Elsevier, New York, 1975.
R.C. Schank and R.P. Abelson, Scripts, Plans and Understanding, Erlbaum, Hillsdale, NJ, 1977.
I.H. Witten, Computer (in)security: infiltrating open systems, in P.J. Denning (ed.), Computers under Attack, Addison-Wesley, NY, 1990.
D.A. Lelewer and D.S. Hirschberg, Data compression, ACM Comput. Surv., 19(3) (Sept. 1987) 261-296.
T. Bell, I.H. Witten and J.G. Cleary, Modelling for text compression, ACM Comput. Surv., 21(4) (Dec. 1989) 557-591.
M. Minsky, A framework for representing knowledge, in I

APPENDIX: The prototype system

A.1. General

The prototype system can be roughly divided into a dictionary and an inference engine. The current dictionary contains about 70 words. The inference engine is based on an expectation-based conceptual parser [13].

The original development was on an IBM-XT, written in ‘Golden Common Lisp’ (version 1.0). Another version of the computer program is currently running under a SUN workstation in ‘Sun Common Lisp’ (version 3.0.2). Both versions of the computer program have an average execution time (for processing a ten-word pass-sentence) of less than half a second. So the main difference is not execution time, but rather the improvement in development tools.

The system’s dictionary is currently being expanded to represent additional semantic primitives (besides Atrans and Ptrans). We stress that the expansion process is mainly a dictionary refinement with only minor corrections of the parser.

A.2. How does it work?

The system’s parser parses a pass-sentence in a left-to-right manner. As it encounters each word, it saves the ‘meaning’ of that word in its memory and ‘spawns’ demons which represent expectations, either for what has already occurred or for what may occur next. Demons are active processes which ‘wait’ until their conditions are satisfied, whereupon they ‘fire’ and cause various structures to be connected together.

The best way to become familiar with this parsing method is by means of example. Let us take the following pass-sentence: “Jack bought a pizza for half a dollar”. Each word has its own dictionary definition which contains a definition frame and expectations. The definition frame contains a number of fixed slots; additional slots are added as expectations are realized. To demonstrate, the following is a partial definition (in Lisp) of the concept “bought”:

    (setf (get 'bought 'definition)
          '(bought (type verb)
                   (subtype1 Atrans)
                   (hierarchical-level 1)))

    (setf (get 'bought 'demons)
          '(((op expect)
             (slot actor1)
             (fill ((check-element-before (subtype1 unspecified-humans))
                    (no-gap st-id actor1)
                    (not-same st-id actor1)))
             (destroy ((gap-filled st-id actor1)
                       (end-of-sentence))))))


The fixed slots in the “bought” definition are its syntactic role, verb, and its semantic significance, Atrans. The actor1 slot, representing the buyer, is filled if the expectations to identify a buyer are realized. The expectations are represented by functions known here as demons. In the example, the expectation for actor1 is evaluated by three demons in the function “fill”. The ‘check-element-before’ demon verifies that the entity before the verb “bought” is represented by a ‘subtype1 unspecified-humans’ frame. The ‘no-gap’ demon verifies that this slot is still vacant. The third demon, ‘not-same’, ensures that the content of the actor1 slot is not identical to that of the actor2 slot, i.e. that the seller and the buyer are not the same person. If the three conditions are met, the expectation is realized and the frame under consideration is linked to the actor1 slot of “bought”. The actor1 slot is not necessarily filled, and if any of the demons listed under destroy is successful, the expectation concerning actor1 is cancelled.

The full definition of “bought” contains, in addition to the actor1 slot, expectations concerning actor2 (the seller), object1 (the object sold), and object2 (the remuneration). The dictionary definitions of the rest of the above words are as follows (some of the definitions have been abbreviated for the sake of simplicity):

Jack:

    (setf (get 'Jack 'definition)
          '(Jack (type actor-filler)
                 (subtype1 unspecified-humans)
                 (subtype2 human-male)
                 (subtype3 specific-name)
                 (hierarchical-level 3)))

pizza:

    (setf (get 'pizza 'definition)
          '(pizza (type object-filler)
                  (hierarchical-level 3)))

for:

    (setf (get 'for 'definition)
          '(for (type preposition)))

half:

    (setf (get 'half 'definition)
          '(half (type quant-filler)
                 (mean 0.5)
                 (hierarchical-level 'dynamic + 1)))

    (setf (get 'half 'demons)
          '(((op expect-attribute)
             (slot quant)
             (fill ((check-element-after (subtype1 object))
                    (no-gap ptr quant)))
             (destroy ((gap-filled ptr quant)
                       (end-of-sentence))))))

As an ‘attribute of attributes’ (see definition in Section 3) the “half” frame ‘searches’ for an object frame with an empty quant slot. If the search is successful the quant slot is updated with a pointer to this quant-concept. The hierarchical level of the updated frame is increased by one point.

dollar:

    (setf (get 'dollar 'definition)
          '(dollar (type object-filler)
                   (subtype1 objects)
                   (subtype2 money)
                   (subtype3 specific-money)
                   (hierarchical-level 3)))

    (setf (get 'dollar 'demons)
          '(((op expect-attribute)
             (slot object2)
             (fill ((check-element-after (subtype1 Atrans))
                    (no-gap ptr object2)))
             (destroy ((gap-filled ptr object2)
                       (end-of-sentence))))))

The “dollar” frame, as with all other money frames, does not ‘sit and wait’ to be tied with an Atrans frame. If an Atrans frame with an empty object2 (remuneration) slot is encountered, the dollar frame ‘ties’ itself to that slot. The following is the memory structure which results from processing the above pass-sentence:

    (1 (Jack (type actor-filler)
             (subtype1 unspecified-humans)
             (subtype2 human-male)
             (subtype3 specific-name)
             (hierarchical-level 3)))

    (2 (bought (type verb)
               (subtype1 Atrans)
               (hierarchical-level 1)
               (actor1 (id 1))
               (object1 (id 3))
               (object2 (id 6))))

    (3 (pizza (type object-filler)
              (subtype1 objects)
              (subtype2 food)
              (subtype3 specific-food)
              (hierarchical-level 3)))

    (4 (for (type preposition)))

    (5 (half (type quant-filler)
             (mean 0.5)))

    (6 (dollar (type object-filler)
               (subtype1 object)
               (subtype2 money)
               (subtype3 specific-money)
               (hierarchical-level (+ 3 1))
               (quant (id 5))))

The frame for the concept “bought” contains the core meaning of the pass-sentence. The expectations for a buyer, an object exchanged and a remuneration are realized by “Jack”, “pizza”, and “dollar” respectively. Thus, the “bought” frame under consideration contains pointers to the concepts for these instances. The expectation for a seller (actor2) is not realized and so has, as yet, no slot in the frame. The “dollar” frame contains a pointer to the “half” frame. This pointer was updated by the “half” expectation that found an object frame with an empty quant slot. The determiner “a” is ignored, as are all determiners, since it does not contribute to the semantics of the sentence.

A.3. Comparing a pass-sentence with a keyed-in pass-sentence

Let us assume that the user keyed in the pass-sentence: “Jack got a pizza at Martin’s place”. The following is the memory structure that results:

    (1 (Jack (type actor-filler)
             (subtype1 unspecified-humans)
             (subtype2 human-male)
             (subtype3 specific-name)
             (hierarchical-level 3)))

    (2 (got (type verb)
            (subtype1 Atrans)
            (hierarchical-level 1)
            (actor1 (id 1))
            (object1 (id 3))
            (place (id 5))))

    (3 (pizza (type object-filler)
              (subtype1 objects)
              (subtype2 food)
              (subtype3 specific-food)
              (hierarchical-level 3)))

    (4 (at (type preposition)))

    (5 (Martin’s (type place-filler)
                 (subtype1 places)
                 (subtype2 stores)
                 (subtype3 specific-stores)
                 (hierarchical-level 3)))

First we identify the core meaning of the two sentences. This is done either by simply locating the semantic primitives, or by locating the two concepts that are not pointed to by, and have the maximum number of pointers to, other concepts. Now we can compare the structures, as we described in Section 4. For example: the word “got” vs. “bought” creates a state of ‘conceptual-only identity’ between the two concepts. The word “Martin’s” creates a state of ‘addition of attributes’, as the place attribute is not mentioned in the original pass-sentence.
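The two steps just described, locating the core concept and then classifying each pair of corresponding attributes, can be sketched as follows. This is a simplified illustration in Python rather than the prototype's Lisp, and everything in it is an assumption made for the sketch: the names `Frame`, `find_core` and `classify`, the representation of a frame as a general-to-specific path with one entry per hierarchical level, and the rule that a differing surface word over an identical path counts as conceptual-only identity.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class Frame:
    """Hypothetical, flattened stand-in for a memory frame."""
    word: str
    path: Tuple[str, ...]  # general-to-specific, one entry per hierarchical level


def find_core(pointers: dict) -> int:
    """Pick the frame id that no other frame points to and that points to the most frames.

    `pointers` maps a frame id to the list of frame ids its slots refer to.
    """
    pointed_to = {t for targets in pointers.values() for t in targets}
    roots = [i for i in pointers if i not in pointed_to]
    return max(roots, key=lambda i: len(pointers[i]))


def classify(original: Optional[Frame], keyed: Optional[Frame]) -> Tuple[str, int]:
    """Return (state, hierarchical level) for one pair of corresponding attributes."""
    if original is None:
        return ("addition_of_attributes", 0)
    if keyed is None:
        return ("omission", 0)
    # Walk down the hierarchy until the first disagreement.
    for level, (o, k) in enumerate(zip(original.path, keyed.path), start=1):
        if o != k:
            return ("lack_of_identity", level)
    if len(keyed.path) < len(original.path):
        return ("under_detailing", len(keyed.path))
    if len(keyed.path) > len(original.path):
        return ("over_detailing", len(original.path))
    if original.word != keyed.word:
        # Same concept, different surface word (e.g. "got" for "bought").
        return ("conceptual_only_identity", len(original.path))
    return ("full_identity", len(original.path))
```

With the pointer lists of the original memory structure, `find_core({1: [], 2: [1, 3, 6], 3: [], 4: [], 5: [], 6: [5]})` selects frame 2, the “bought” frame.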